Some questions about multi-speaker training #10

codybai · 2022-04-13T09:58:17Z

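I added a gin_channels=256 field to the config file, and the number of speakers is currently 15. Training has reached 330 epochs. In testing, the synthesized speech is very fast, and several speakers' voices appear within a single sentence. My questions are: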

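1. Roughly how many epochs does training need before it stabilizes?
2. Is gin_channels=256 too large for only 15 speakers? Would a smaller value such as 32 or 64 converge more easily?
3. Several speakers' voices appear within one sentence. Is that normal at this stage?
4. Every utterance is spoken very fast. Is that abnormal after 330 epochs of training?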

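The phoneme sequence for each utterance looks like the following (with Tacotron 2 I can train a model with normal prosody, so I am fairly sure the phoneme design is not the problem):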
p iy1 #1 eh1 l #1 ah0 #1 eh1 s #1 t iy1 #1 iy1 #1 aa1 r #1 iy1 #1 aa1 r #1 eh1 s #1 t iao2 #0 l i4 #1 d uei4 #1 ^ u1 #0 r an3 #0 ^ van2 #1 p u3 #0 ch a2 #1 d e5 #1 z u3 #0 zh iii1 #1 sh iii2 #0 sh iii1 #1 z uo4 #1 l e5 #1 n ei3 #0 x ie1 #1 g uei1 #0 d ing4 #3 fin

If anyone has time, I would appreciate your thoughts!

codybai · 2022-04-13T10:11:54Z

Some demos: vits_out.zip

OnceJune · 2022-04-14T01:16:31Z

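I have not run into the speaker leakage problem. As for the speaking rate, could length_scale be set incorrectly in the inference script?
For the number of training epochs, take a look at this issue: #9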
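
To illustrate why length_scale affects speed: in the official VITS inference code, the predicted log-durations are exponentiated, multiplied by length_scale, and rounded up to whole frames. Below is a minimal pure-Python sketch of that arithmetic. The function name and the sample log-durations are made up for illustration, and the inference script in this repository may differ.

```python
import math


def frames_per_phoneme(log_durations, length_scale=1.0):
    """Turn predicted log-durations into whole frame counts.

    Hypothetical helper: it mirrors the exp -> scale -> ceil arithmetic only,
    not the masking or alignment steps of a real inference script.
    """
    return [math.ceil(math.exp(logw) * length_scale) for logw in log_durations]


# Made-up log-durations for three phonemes.
log_durations = [0.0, 1.0, 1.5]

normal = frames_per_phoneme(log_durations, length_scale=1.0)
slower = frames_per_phoneme(log_durations, length_scale=2.0)
faster = frames_per_phoneme(log_durations, length_scale=0.5)

print(sum(faster), sum(normal), sum(slower))
```

A length_scale below 1 shortens every phoneme and so speeds up the speech, while a value above 1 slows it down. If the script already uses 1.0, the fast speech more likely comes from the duration predictor itself.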

codybai · 2022-04-14T10:07:22Z

I did not shuffle the filelist when I generated it for my dataset, which may be what causes the leakage. I have changed that now and will see how it goes.
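
For reference, a rough sketch of the shuffle, plus a per-speaker count as a sanity check. It assumes each filelist line has the form `path|speaker_id|text`; the wav paths, the function names, and the seed are hypothetical.

```python
import random
from collections import Counter


def shuffle_filelist(lines, seed=1234):
    """Return a shuffled copy of the non-empty filelist lines."""
    shuffled = [line for line in lines if line.strip()]
    random.Random(seed).shuffle(shuffled)  # seeded so the result is reproducible
    return shuffled


def utterances_per_speaker(lines):
    """Count entries per speaker id, assuming 'path|speaker_id|text' lines."""
    return Counter(line.split("|")[1] for line in lines if line.strip())


# Hypothetical entries, grouped by speaker the way an unshuffled list would be.
example = [
    "wavs/a_001.wav|0|p iy1 #1 eh1 l",
    "wavs/a_002.wav|0|t iao2 #0 l i4",
    "wavs/b_001.wav|1|d uei4 #1 ^ u1",
    "wavs/b_002.wav|1|z uo4 #1 l e5",
]

shuffled = shuffle_filelist(example)
print(utterances_per_speaker(shuffled))
```

The count also makes it easy to confirm that the number of distinct speaker ids matches the speaker count in the config.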

codybai · 2022-04-15T00:44:39Z

Still not working.

OnceJune · 2022-04-18T05:45:48Z

How about concatenating the speaker information instead of adding it?
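
To make the difference concrete, here is a toy pure-Python sketch with plain lists standing in for tensors. The function names and numbers are made up, and in the real model the change would go wherever the speaker embedding is injected.

```python
def add_speaker(frames, speaker):
    """Add the speaker vector to every frame; widths must match."""
    if any(len(frame) != len(speaker) for frame in frames):
        raise ValueError("add requires the speaker vector to match the frame width")
    return [[h + s for h, s in zip(frame, speaker)] for frame in frames]


def concat_speaker(frames, speaker):
    """Append the speaker vector to every frame along the channel axis."""
    return [list(frame) + list(speaker) for frame in frames]


# Three frames with two hidden channels each, and a two-channel speaker vector.
frames = [[0.5, 0.25], [1.0, 2.0], [3.0, 4.0]]
speaker = [1.0, -1.0]

added = add_speaker(frames, speaker)
concatenated = concat_speaker(frames, speaker)

print(len(added[0]), len(concatenated[0]))
```

With add, the speaker vector is mixed into the hidden features and the width stays the same. With concat, the speaker channels stay separate, but every layer that consumes the result has to accept the wider input.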

MaxMax2016 · 2022-04-18T10:41:38Z

The official VITS project itself already includes multi-speaker TTS based on VCTK, and it works extremely well.

MaxMax2016 closed this as completed Mar 28, 2023
