Some questions about multi-speaker training #10
Comments
Some demos: vits_out.zip
I have not run into the speaker leakage problem. As for the speaking rate, could length_scale be set incorrectly in the inference script?
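To make the length_scale suggestion concrete: as far as I recall, the official VITS inference path exponentiates the predicted log-durations, multiplies them by length_scale, and rounds up to whole frames. Below is a minimal stdlib-only sketch of that arithmetic. The helper name `expand_durations` and the log-duration values are made up for illustration, and the real code works on tensors rather than lists.

```python
import math

def expand_durations(log_durations, length_scale=1.0):
    """Turn predicted log-durations into integer frame counts per phoneme.

    Hypothetical helper that mirrors the idea only, not the repo's code.
    """
    # exp() maps log-durations to frames, length_scale stretches or shrinks
    # them, and ceil() rounds up to a whole number of frames.
    return [math.ceil(math.exp(lw) * length_scale) for lw in log_durations]

logw = [0.5, 1.2, 0.0, 1.8]  # made-up predictions for four phonemes
for scale in (0.5, 1.0, 1.5):
    frames = expand_durations(logw, scale)
    # A larger length_scale yields more frames, i.e. slower speech.
    print(scale, frames, sum(frames))
```

If the script passes a length_scale below 1, the output will sound faster even when the duration predictor itself is fine, so that value is worth checking before blaming training.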
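When I generated the filelist files for my dataset I did not shuffle them, which may be what caused the leakage. I have changed that now and will see how it goes.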
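For reference, a small sketch of shuffling a filelist and checking how speakers are spread across it. It assumes each line looks like `wav_path|speaker_id|text`, which is the multi-speaker layout I believe the official VITS filelists use. The function names, the seed, and the toy entries are all hypothetical. Whether an unshuffled file matters in practice depends on whether the loader or the train/validation split reads it in order.

```python
import random

def shuffle_filelist(lines, seed=1234):
    """Return a shuffled copy of the filelist lines (reproducible via seed)."""
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    return shuffled

def speakers_per_window(lines, window=4):
    """Count distinct speaker ids in each consecutive window of lines."""
    counts = []
    for i in range(0, len(lines), window):
        sids = {line.split("|")[1] for line in lines[i:i + window]}
        counts.append(len(sids))
    return counts

# Toy filelist: 3 speakers with 4 utterances each, grouped by speaker.
filelist = [f"wavs/spk{s}_{u}.wav|{s}|dummy text"
            for s in range(3) for u in range(4)]

print(speakers_per_window(filelist))                    # [1, 1, 1]
print(speakers_per_window(shuffle_filelist(filelist)))  # mixed after shuffling
```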
Still not working.
How about trying to concat the speaker information onto the features instead of adding it?
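To spell out the difference between the two conditioning styles, here is a toy sketch on plain Python lists. It is only an illustration under assumed shapes (2 frames, hidden size 3, speaker vector of size 3). `add_condition` and `concat_condition` are hypothetical names, and the real model would operate on tensors.

```python
def add_condition(frames, spk):
    """Add the speaker vector to every frame; the sizes must match."""
    assert all(len(f) == len(spk) for f in frames)
    return [[h + g for h, g in zip(f, spk)] for f in frames]

def concat_condition(frames, spk):
    """Append the speaker vector to every frame; width grows by len(spk)."""
    return [list(f) + list(spk) for f in frames]

frames = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]  # 2 frames, hidden size 3
spk = [1.0, -1.0, 0.5]                       # toy speaker embedding

added = add_condition(frames, spk)
concatenated = concat_condition(frames, spk)
print(len(added[0]), len(concatenated[0]))   # 3 6
```

With concat the speaker values stay in their own channels instead of being mixed into the content features, but the next layer has to accept the wider input.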
The official VITS project itself includes multi-speaker TTS based on VCTK, and it works extremely well.
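I added a gin_channels=256 field to the config file. The number of speakers is currently 15, and training has reached epoch 330. In testing, the synthesized speech is very fast, and voices of several speakers appear within a single sentence. My questions are:
1. Roughly how many epochs does it take for training to stabilize?
2. Is gin_channels=256 too large for only 15 speakers? Would a smaller value such as 32 or 64 converge more easily?
3. Voices of several speakers appear within a single sentence. Is that normal at this stage?
4. Every sentence is spoken very fast. Is that abnormal for a model that has already trained for 330 epochs?
The phoneme structure of each sentence is shown below. With Tacotron 2 the same design trains a model with normal prosody, so I am fairly confident the phoneme design is not the problem: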
p iy1 #1 eh1 l #1 ah0 #1 eh1 s #1 t iy1 #1 iy1 #1 aa1 r #1 iy1 #1 aa1 r #1 eh1 s #1 t iao2 #0 l i4 #1 d uei4 #1 ^ u1 #0 r an3 #0 ^ van2 #1 p u3 #0 ch a2 #1 d e5 #1 z u3 #0 zh iii1 #1 sh iii2 #0 sh iii1 #1 z uo4 #1 l e5 #1 n ei3 #0 x ie1 #1 g uei1 #0 d ing4 #3 fin
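As a reading aid for the sequence above, here is a small sketch that groups phonemes into units at each `#<n>` token. It assumes that `#0`, `#1`, and `#3` are prosodic boundary levels and that `fin` is an end-of-sentence marker. That is my interpretation, not something stated in the thread. `split_prosody` is a hypothetical helper, and the sample is abridged from the sequence above.

```python
def split_prosody(seq):
    """Group phonemes into units, each closed by a '#<level>' boundary mark."""
    units, current = [], []
    for tok in seq.split():
        if tok.startswith("#"):
            # Close the current unit and record its boundary level.
            units.append((current, int(tok[1:])))
            current = []
        else:
            current.append(tok)
    # Tokens after the last boundary (e.g. 'fin') are returned separately.
    return units, current

# Abridged excerpt of the sequence above.
sample = "p iy1 #1 eh1 l #1 t iao2 #0 l i4 #1 g uei1 #0 d ing4 #3 fin"
units, tail = split_prosody(sample)
for phonemes, level in units:
    print(level, " ".join(phonemes))
print("tail:", tail)
```

A check like this makes it easy to confirm that every unit ends with a boundary mark and that no unit is empty before the text is fed to training.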
If any of the experts here have time, let's discuss!