Training a TTS model from scratch and Finetuning on private data #1067

It takes around 15 minutes to train a single epoch of Glow TTS on a single K80 GPU. That means it could take around 10 days to train a TTS model from scratch on the LJSpeech dataset with a single GPU for 1000 epochs. Is this normal?

Yes
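
The ~10-day figure in the question follows directly from the per-epoch time. A quick sanity check of that arithmetic (numbers taken from the question itself):

```python
# Back-of-the-envelope estimate of total training time,
# using the figures quoted in the question above.
minutes_per_epoch = 15
epochs = 1000

total_minutes = minutes_per_epoch * epochs
total_days = total_minutes / 60 / 24

print(f"{total_days:.1f} days")  # ~10.4 days
```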

If I understood it correctly from the Humble FAQ: Would Tacotron 2 or Tacotron train much faster in comparison to GlowTTS?

Mostly

Is finetuning the fastest way to train on a small private dataset of another speaker (~1 hour of data)?

Yes

Given that my small private dataset is a male speaker, does it make sense to finetune on the LJSpeech dataset, which is a female speaker?

A male model would work better

Which one would produce better results: Finetuning on a p…

Answer selected by VigneshBaskar