Pretrained model is here. #3

MaxMax2016 · 2022-02-07T04:23:15Z

Link: https://pan.baidu.com/s/1CEgyC1R3FxXEI-5AL_Sj8g
Extraction code: cym1

OnceJune · 2022-02-11T01:31:42Z

How many steps did you train for on the Chinese dataset? Thanks in advance.

MaxMax2016 · 2022-02-11T03:18:28Z

3 GPUs, batch_size 16, about 200K steps. You should focus on the losses:

baker_base INFO [200000, 0.00017334926433399393]
baker_base INFO loss_disc=2.410, loss_gen=2.418, loss_fm=6.987
baker_base INFO loss_mel=17.984, loss_dur=1.342, loss_kl=2.022
baker_base INFO Saving model and optimizer state at iteration 1143 to ./logs/baker_base/G_200000.pth
baker_base INFO Saving model and optimizer state at iteration 1143 to ./logs/baker_base/D_200000.pth
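
To compare losses across checkpoints, the `loss_name=value` pairs in log lines like the ones above can be pulled out with a small script. This is only a sketch: the helper name `parse_losses` and the regular expression are illustrative assumptions, not part of the repository's tooling.

```python
import re

# Matches pairs such as "loss_mel=17.984" in a log line.
LOSS_RE = re.compile(r"(loss_\w+)=([-+]?\d+(?:\.\d+)?)")

def parse_losses(lines):
    """Collect every loss_name=value pair found in the given log lines."""
    losses = {}
    for line in lines:
        for name, value in LOSS_RE.findall(line):
            losses[name] = float(value)
    return losses

# The two loss lines quoted in this comment.
log = [
    "baker_base INFO loss_disc=2.410, loss_gen=2.418, loss_fm=6.987",
    "baker_base INFO loss_mel=17.984, loss_dur=1.342, loss_kl=2.022",
]
losses = parse_losses(log)
print(losses)
```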

josh-zhu · 2022-02-20T08:26:05Z

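Thanks @dtx525942103 for providing the pretrained model. I ran it and the pauses are still very noticeable. Is that because of what you mentioned, "boundaries are forcibly inserted after the phonemes, and VITS forcibly inserts boundaries again"? To fix the pausing, do I need to set add_blank to false and retrain? Looking forward to your reply, thanks.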

MaxMax2016 · 2022-02-21T02:21:45Z

@josh-zhu It is already False. Does your audio differ much from the provided samples?
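
For context, in upstream VITS the add_blank option interleaves a blank token id between the symbol ids before they reach the text encoder, so an extra boundary-like token surrounds every phoneme. A minimal sketch of that interleaving, assuming blank id 0 and made-up phoneme ids (this repository's own preprocessing may differ):

```python
def intersperse(ids, blank_id=0):
    """Return ids with blank_id placed before, between, and after every element."""
    result = [blank_id] * (len(ids) * 2 + 1)
    result[1::2] = ids
    return result

# Hypothetical phoneme ids, for illustration only.
phoneme_ids = [12, 7, 33]
with_blanks = intersperse(phoneme_ids)
print(with_blanks)  # [0, 12, 0, 7, 0, 33, 0]
```

With add_blank set to False the sequence is passed through unchanged, so only the boundaries already present in the phoneme sequence remain.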

josh-zhu · 2022-02-21T07:06:15Z

It doesn't seem to have changed; the durations are all the same.

MaxMax2016 · 2022-02-21T07:07:50Z

Paste your text and audio here and I'll check whether they match what I get.

josh-zhu · 2022-02-21T07:14:08Z

Input text: 新进来的宝贝，赶紧去抢福利了哈 ("Newcomers, hurry up and grab the perks")
vits_out.tar.gz

josh-zhu · 2022-02-21T07:15:03Z

1_baker_sample.wav matches the original vits_string.txt. Thanks for the quick reply.

MaxMax2016 · 2022-02-21T07:15:37Z

Your ZIP is empty.

josh-zhu · 2022-02-21T07:18:30Z

> Your ZIP is empty.

Sorry, I've updated it.

MaxMax2016 · 2022-02-21T07:20:47Z

This already sounds very good; this is the normal result.

josh-zhu · 2022-02-21T07:23:36Z

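Oh, so there's no problem then. I compared it with the fastspeech2 output, and fastspeech2 sounds more natural in intonation. Since fastspeech2 + hifi is not e2e, I can also fine-tune the mel duration of each phoneme, which makes the overall result a bit more pleasant. Could you share a way to contact you offline? I work on a lot of speech-related things and would like to ask questions and exchange ideas when convenient.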

taotaoyuhust · 2022-04-08T15:29:16Z

I get this error:
size mismatch for enc_p.emb.weight: copying a param with shape torch.Size([219, 192]) from checkpoint, the shape in current model is torch.Size([178, 192]).
Which part do I need to modify?

MaxMax2016 · 2022-04-11T01:43:19Z

@taotaoyuhust Did you modify the modeling units in text, i.e. symbols = _pause + _initials + [i + j for i in _finals for j in _tones]?
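
The first dimension of enc_p.emb.weight is the vocabulary size, which in upstream VITS is the length of the symbol table, so a checkpoint trained with 219 symbols cannot be loaded into a model built from a 178-symbol table. A quick check along these lines can confirm the cause before loading. The inventories below are toy placeholders, not the repository's real lists, and `build_symbols` and `check_vocab` are hypothetical helper names:

```python
def build_symbols(pause, initials, finals, tones):
    """Build the symbol table the same way as the expression quoted above."""
    return pause + initials + [f + t for f in finals for t in tones]

def check_vocab(ckpt_rows, symbols):
    """Compare the checkpoint's embedding rows with the current symbol count."""
    if ckpt_rows != len(symbols):
        return (f"mismatch: checkpoint has {ckpt_rows} rows, "
                f"symbol table has {len(symbols)}")
    return "ok"

# Toy inventories for illustration only (the real lists are much longer).
_pause = ["sil", "sp"]
_initials = ["b", "p", "m"]
_finals = ["a", "o"]
_tones = ["1", "2", "3", "4", "5"]

symbols = build_symbols(_pause, _initials, _finals, _tones)
print(len(symbols))               # 2 + 3 + 2 * 5 = 15
print(check_vocab(219, symbols))  # reports a mismatch for this toy table
```

If the counts differ, restoring the symbol lists used for the pretrained checkpoint (or retraining with the new ones) is the likely fix.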

This was referenced Feb 7, 2022

Export the model to onnx format. jaywalnut310/vits#41

Open

is that able to train on Chinese dataset? jaywalnut310/vits#2

Open

codybai mentioned this issue Apr 13, 2022

Some questions about multispeaker training (进行multispeaker训练的一些问题) #10

Closed

FanhuaandLuomu mentioned this issue May 26, 2022

question about the phone seqs #19

Closed

MaxMax2016 closed this as completed Feb 22, 2023

asr-pub mentioned this issue Feb 27, 2023

Is needed to insert prosody label under Chinese data? #31

Closed

zdj97 mentioned this issue May 6, 2023

Modeling units (建模单元) #70

Closed
