Pretrained model is here. #3

Closed
MaxMax2016 opened this issue Feb 7, 2022 · 14 comments

Comments

@MaxMax2016
Collaborator

Link: https://pan.baidu.com/s/1CEgyC1R3FxXEI-5AL_Sj8g
Extraction code: cym1

@OnceJune

How many steps did you train on the Chinese dataset? Thanks in advance.

@MaxMax2016
Collaborator Author

Trained on 3 GPUs with batch_size 16 for about 200K steps. You should focus on the losses:

baker_base INFO [200000, 0.00017334926433399393]
baker_base INFO loss_disc=2.410, loss_gen=2.418, loss_fm=6.987
baker_base INFO loss_mel=17.984, loss_dur=1.342, loss_kl=2.022
baker_base INFO Saving model and optimizer state at iteration 1143 to ./logs/baker_base/G_200000.pth
baker_base INFO Saving model and optimizer state at iteration 1143 to ./logs/baker_base/D_200000.pth
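
If you want to follow these losses over a long run, a small parser can pull the values out of the log. This is only a minimal sketch that assumes the log lines keep the `loss_name=value` layout shown above; `parse_losses` and `LOSS_PATTERN` are illustrative names, not part of the repository.

```python
import re

# Matches "name=value" pairs such as "loss_mel=17.984" inside a log line.
LOSS_PATTERN = re.compile(r"(loss_\w+)=([0-9]*\.?[0-9]+)")


def parse_losses(lines):
    """Collect every loss_*=value pair found in the given log lines."""
    losses = {}
    for line in lines:
        for name, value in LOSS_PATTERN.findall(line):
            losses[name] = float(value)
    return losses


# The two loss lines quoted in the comment above.
log_lines = [
    "baker_base INFO loss_disc=2.410, loss_gen=2.418, loss_fm=6.987",
    "baker_base INFO loss_mel=17.984, loss_dur=1.342, loss_kl=2.022",
]

losses = parse_losses(log_lines)
print(losses["loss_mel"], losses["loss_dur"], losses["loss_kl"])
```

Running it over the whole training log, one step at a time, gives a series per loss that can be plotted or compared against the values posted here.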

@josh-zhu

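Thanks @dtx525942103 for providing the pretrained model. I ran it and the pauses are still very noticeable. Is that because of what you mentioned, "a boundary is already forced in after each phoneme, and then VITS forces in a boundary again"? To fix the pauses, do I need to set add_blank to false and retrain? Looking forward to your reply, thanks.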

@MaxMax2016
Collaborator Author

@josh-zhu It is already False. How different is your audio from the provided samples?
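
For context on add_blank: in the upstream VITS code, when add_blank is enabled a blank id (0) is interleaved with the symbol ids before they reach the text encoder. The sketch below reproduces that interleaving, assuming this repository follows the same convention; `to_model_input` and the example ids are made up for illustration.

```python
def intersperse(sequence, item):
    """Return a list with `item` placed before, between, and after the elements."""
    result = [item] * (len(sequence) * 2 + 1)
    result[1::2] = sequence
    return result


def to_model_input(symbol_ids, add_blank, blank_id=0):
    """Hypothetical helper: apply the blank interleaving only when add_blank is set."""
    return intersperse(symbol_ids, blank_id) if add_blank else list(symbol_ids)


ids = [5, 9, 7]  # hypothetical symbol ids for three phonemes

with_blank = to_model_input(ids, add_blank=True)
without_blank = to_model_input(ids, add_blank=False)

print(with_blank)     # [0, 5, 0, 9, 0, 7, 0]
print(without_blank)  # [5, 9, 7]
```

With add_blank enabled the sequence grows from n to 2n + 1 tokens, which is why combining it with explicit boundary symbols in the text can double up on pauses. A checkpoint trained with one setting should be used with the same setting at inference time.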

@josh-zhu

It doesn't seem to have changed; the durations are all the same.

@MaxMax2016
Collaborator Author

Please post your text and audio here so I can check whether they match what I get.

@josh-zhu

josh-zhu commented Feb 21, 2022

Input text: 新进来的宝贝,赶紧去抢福利了哈 (roughly: "To everyone who just joined, hurry up and grab the perks")
vits_out.tar.gz

@josh-zhu

1_baker_sample.wav matches the original vits_string.txt. Thanks for the quick reply.

@MaxMax2016
Collaborator Author

Your ZIP is empty.

@josh-zhu

"Your ZIP is empty."

Sorry, I have updated it.

@MaxMax2016
Collaborator Author

This already sounds very good. This is the normal result.

@josh-zhu

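Oh, so there is no problem then. I compared it with the fastspeech2 output, and fastspeech2 sounds more natural in intonation. Because fastspeech2 + hifi is not end-to-end, I can also fine-tune the mel duration of each phoneme, which makes the overall result more pleasant to listen to. Would you mind sharing an offline contact? I work on a lot of speech-related things and would like to ask questions and exchange ideas when convenient.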

@taotaoyuhust

I get this error:
size mismatch for enc_p.emb.weight: copying a param with shape torch.Size([219, 192]) from checkpoint, the shape in current model is torch.Size([178, 192]).
What do I need to change?

@MaxMax2016
Collaborator Author

@taotaoyuhust Did you modify the modeling units in text, i.e. symbols = _pause + _initials + [i + j for i in _finals for j in _tones]?
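
To make the link between symbols and the error concrete: the first dimension of enc_p.emb.weight is the number of embedding rows, which in VITS-style models normally equals len(symbols). The checkpoint was saved with 219 rows, while the model being built has 178, so the two symbol tables differ. The sketch below uses tiny placeholder inventories (the real `_pause`, `_initials`, `_finals`, and `_tones` lists are not shown in this thread) and a hypothetical `check_vocab` helper to show how the count is formed and compared.

```python
# Placeholder inventories, far shorter than the real ones in the repository.
_pause = ["sil", "sp"]
_initials = ["b", "p", "m"]
_finals = ["a", "o"]
_tones = ["1", "2", "3", "4", "5"]

# Same construction as quoted above: every final is combined with every tone.
symbols = _pause + _initials + [i + j for i in _finals for j in _tones]

# Size = len(_pause) + len(_initials) + len(_finals) * len(_tones)
expected_size = len(_pause) + len(_initials) + len(_finals) * len(_tones)


def check_vocab(checkpoint_rows, symbols):
    """Hypothetical helper: compare a checkpoint's embedding rows with the symbol table."""
    current_rows = len(symbols)
    if checkpoint_rows != current_rows:
        return (
            f"mismatch: checkpoint has {checkpoint_rows} symbols, "
            f"current symbol table has {current_rows}"
        )
    return "ok"


print(len(symbols), expected_size)
print(check_vocab(219, symbols))
```

If the check reports a mismatch, the usual cause is that the symbol table in the local text module is not the one the checkpoint was trained with, so restoring the matching symbols definition is the first thing to try.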
