Issue Training VITS on a custom Dataset #1704

rob1392 · 2022-06-29T04:01:14Z

I have created a custom dataset from a variety of publicly accessible audio recordings and am having trouble training VITS to generate intelligible speech. The dataset has multiple speakers totaling ~200 hours, with the distribution of per-speaker durations looking like this:

I also tried training on just the top 6 speakers, which account for ~85 hours.

I have used all of the same parameters from here: https://github.com/coqui-ai/TTS/blob/dev/recipes/vctk/vits/train_vits.py with the main difference being I am training on 8 V100 GPUs.

To create the dataset I followed the advice of the HiFiTTS paper: https://arxiv.org/abs/2104.01497, using CTC to align speech and text and keeping only utterances with an SNR of at least 20 dB.
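
For readers who want to reproduce the SNR filtering step, here is a minimal sketch. It uses a crude energy-based estimate (the quietest frames are treated as the noise floor), which is an assumption for illustration and not necessarily the estimator used in the paper or in my pipeline. The function names, `frame_len`, and `noise_fraction` are all hypothetical.

```python
import math


def frame_energies(samples, frame_len=400):
    """Mean squared amplitude of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def estimate_snr_db(samples, frame_len=400, noise_fraction=0.1):
    """Crude SNR estimate in dB.

    Assumption: the quietest `noise_fraction` of frames contain only noise
    and the remaining frames contain speech.
    """
    energies = sorted(frame_energies(samples, frame_len))
    if len(energies) < 2:
        raise ValueError("need at least two frames to estimate SNR")
    n_noise = max(1, int(len(energies) * noise_fraction))
    noise = sum(energies[:n_noise]) / n_noise
    signal = sum(energies[n_noise:]) / (len(energies) - n_noise)
    eps = 1e-12  # avoids division by zero on digital silence
    return 10.0 * math.log10((signal + eps) / (noise + eps))


def keep_utterance(samples, min_snr_db=20.0):
    """Keep an utterance only if its estimated SNR meets the threshold."""
    return estimate_snr_db(samples) >= min_snr_db
```

Here `samples` is a sequence of floats in [-1, 1] for one utterance. Utterances for which `keep_utterance` returns `False` would be dropped from the training set.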

The audio clips all sounded good when I spot-checked them, and the distribution of audio lengths also seems reasonable, looking like this:

I have trained these models for a relatively short time, but the loss curves appear to be plateauing at worse values than those of successfully trained models such as VITS on VCTK (which I provide as a reference). My suspicion is therefore that more training time is not the answer and that the problem lies elsewhere.

I have attached my tensorboard charts here in the hopes that someone can help me out.

The blue lines are the baseline VITS trained on VCTK and the red lines are the same VITS params trained on my custom dataset.
To my eyes it seems like the discriminator is "winning" a lot more in the custom instance and so maybe there is some tweaking that could be done there?

Please let me know if I am missing any information and thanks for the help!

neurlang · 2022-07-05T13:07:24Z

What is your audio sample rate?
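
One quick way to check is to count the files in the dataset by sample rate and compare the result with the sample rate the training config expects. Below is a minimal sketch, assuming the clips are stored as PCM WAV files under a single directory (the thread does not say which format is used). `sample_rate_histogram` is a hypothetical helper, not part of Coqui TTS.

```python
import wave
from collections import Counter
from pathlib import Path


def sample_rate_histogram(root):
    """Count the WAV files under `root` (recursively) by sample rate in Hz."""
    counts = Counter()
    for path in Path(root).rglob("*.wav"):
        with wave.open(str(path), "rb") as wav:
            counts[wav.getframerate()] += 1
    return counts
```

If the histogram shows more than one rate, or a rate that differs from the one in the training config, the audio would need resampling before or during loading.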

xanguera · 2023-04-21T13:58:29Z

Dear @rob1392, did you manage to find out what the problem was?
Thank you.
