Some questions about multi-speaker training #10

Closed
codybai opened this issue Apr 13, 2022 · 6 comments

codybai commented Apr 13, 2022

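I added a gin_channels=256 field to the config file, and the number of speakers is now 15. Training has reached epoch 330. In testing, the synthesized speech is extremely fast, and multiple speakers' voices appear within a single sentence. My questions are:

1. Roughly how many epochs are needed before training stabilizes?
2. Is gin_channels=256 too large for only 15 speakers? Would a smaller value such as 32 or 64 converge more easily?
3. Multiple speakers' voices appear within a single sentence. Is that normal at this stage?
4. Every sentence is spoken very fast. Isn't that abnormal after 330 epochs of training?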

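The phoneme structure of each sentence is shown below (with Tacotron2, the same input trains a model with normal prosody, so I am fairly sure the phoneme design is not the problem):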
p iy1 #1 eh1 l #1 ah0 #1 eh1 s #1 t iy1 #1 iy1 #1 aa1 r #1 iy1 #1 aa1 r #1 eh1 s #1 t iao2 #0 l i4 #1 d uei4 #1 ^ u1 #0 r an3 #0 ^ van2 #1 p u3 #0 ch a2 #1 d e5 #1 z u3 #0 zh iii1 #1 sh iii2 #0 sh iii1 #1 z uo4 #1 l e5 #1 n ei3 #0 x ie1 #1 g uei1 #0 d ing4 #3 fin

I'd appreciate any thoughts from the experts here when you have time!

codybai commented Apr 13, 2022

Some demos: vits_out.zip

@OnceJune

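I haven't run into the speaker leakage problem. As for the speed, could length_scale be set incorrectly in the inference script?
For the number of training epochs, see this issue: #9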
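
For context on the length_scale suggestion: in the reference VITS inference code, the predicted log-durations are exponentiated, multiplied by length_scale, and rounded up to whole frames, so a value below 1.0 speeds speech up and a value above 1.0 slows it down. Below is a minimal framework-free sketch of that arithmetic, assuming this repo follows the same convention. The helper name and the sample log-durations are hypothetical.

```python
import math

def frames_per_phoneme(log_durations, length_scale=1.0):
    """Turn predicted log-durations into integer frame counts.

    Hypothetical helper illustrating ceil(exp(logw) * length_scale);
    not code taken from this repository.
    """
    return [math.ceil(math.exp(logw) * length_scale) for logw in log_durations]

# Hypothetical log-durations for four phonemes
logw = [0.5, 1.0, 1.5, 0.0]

normal = frames_per_phoneme(logw, length_scale=1.0)
slower = frames_per_phoneme(logw, length_scale=1.5)
faster = frames_per_phoneme(logw, length_scale=0.5)

# Total frame count grows with length_scale, i.e. speech gets slower
print(sum(faster), sum(normal), sum(slower))
```

If the inference script passes a length_scale well below 1.0, overly fast speech would be expected regardless of how long the model has trained.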

codybai commented Apr 14, 2022

I did not shuffle the filelist when I generated it for my dataset, which may be what caused the leakage. I have changed that now and will see how it goes.
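
Shuffling a filelist before training can be sketched as follows. This assumes one utterance per line in a `wav_path|speaker_id|text` layout, as in the reference VITS multi-speaker filelists. The paths, helper names, and seed are hypothetical.

```python
import random
from collections import Counter

def shuffle_filelist(lines, seed=1234):
    """Return a shuffled copy of the non-empty filelist lines (deterministic per seed)."""
    shuffled = [ln for ln in lines if ln.strip()]
    random.Random(seed).shuffle(shuffled)
    return shuffled

def speaker_counts(lines):
    """Count utterances per speaker, assuming the second '|' field is the speaker id."""
    return Counter(ln.split("|")[1] for ln in lines)

# Hypothetical filelist, grouped by speaker as an unshuffled export would be
filelist = [f"wavs/spk{sid}_{i}.wav|{sid}|text {i}" for sid in range(3) for i in range(4)]
shuffled = shuffle_filelist(filelist)

# Shuffling must not drop lines or change speaker labels
assert speaker_counts(shuffled) == speaker_counts(filelist)
```

Checking the per-speaker counts at the same time is a cheap way to spot speaker ids that are missing or mislabeled.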

codybai commented Apr 15, 2022

Still not working.

@OnceJune

How about concatenating the speaker information instead of adding it?
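
The difference between the two ways of injecting a speaker embedding can be shown with a framework-free sketch. The helper names and toy values are hypothetical. In a real model these would be tensor operations (for example, elementwise addition versus `torch.cat` along the channel dimension), and the layer that consumes the result has to accept the wider input when concatenating.

```python
def add_condition(frames, g):
    """Add speaker vector g to every frame; requires len(g) == channels per frame."""
    if any(len(f) != len(g) for f in frames):
        raise ValueError("add requires matching channel counts")
    return [[h + s for h, s in zip(f, g)] for f in frames]

def concat_condition(frames, g):
    """Append speaker vector g to every frame; channel count grows by len(g)."""
    return [list(f) + list(g) for f in frames]

hidden = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # 3 frames, 2 channels each
speaker = [1.0, -1.0]                           # toy speaker embedding

added = add_condition(hidden, speaker)
concatenated = concat_condition(hidden, speaker)

# Adding keeps the channel count; concatenating keeps the speaker channels separate
print(len(added[0]), len(concatenated[0]))
```

With addition the speaker vector is mixed into the content channels, while with concatenation it stays in its own channels and the following layer learns how to combine them.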

@MaxMax2016
Collaborator

The official VITS project already includes multi-speaker TTS based on VCTK, and the results are excellent.
