Steps to replicate pretrained models on LibriTTS #57
Ciao Dario, our paper details how we trained the LibriTTS model. You will not be able to exactly match our training because the LSH model was trained on LJSpeech and two proprietary datasets. Nonetheless, you should be able to reproduce our results by following the steps in the paper, substituting the LSH dataset with the LJS dataset. Post issues on this repo if you have them.
|
Ciao Rafael, I decided to train on LibriTTS with a warm start from your pretrained LibriTTS model.
1 Flow
As suggested, I started with 1 flow.
Results
After running inference at different steps, I found that the checkpoints that "sounded" best were those at approximately step 580,000 (which is also where the validation loss reaches its minimum).
2 Flows
I am now training with 2 flows. I started from the checkpoint at step 580,000 and set the appropriate include_layers to
Results
When I run inference on the early steps of this 2-flow training (step 10,000), the output is still "ok".
At step 240,000, even though the losses are lower, the inference results are bad.
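For readers following along, the warm-start step described above might look like the config fragment below. This is only a sketch: the key names `warmstart_checkpoint_path` and `include_layers`, and the path, are assumptions inferred from this thread, not copied from the repo.

```json
{
  "train_config": {
    "warmstart_checkpoint_path": "models/flowtron_libritts.pt",
    "include_layers": ["speaker", "encoder", "embedding"]
  },
  "model_config": {
    "n_flows": 2
  }
}
```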
My questions:
Thanks a lot again @rafaelvalle |
Try inference again once your Attention Weights 1 look better. |
That makes sense, thanks! |
Ok, I have been running the training with 2 flows now for a while. This is what I see on TensorBoard
I would say that everything looks great there. When I run inference, however, everything looks (and sounds) bad.
@rafaelvalle What would you recommend? Thanks |
Confirm that during inference the hyperparams in config.json match what is used during training. |
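One generic way to catch such mismatches is to diff the relevant sections of the two configs programmatically. A minimal sketch, with illustrative key names and values (not taken from the actual config.json):

```python
def config_mismatches(train_cfg, infer_cfg, keys):
    """Return the keys whose values differ between training and inference configs."""
    return {k: (train_cfg.get(k), infer_cfg.get(k))
            for k in keys
            if train_cfg.get(k) != infer_cfg.get(k)}

# Hypothetical audio-processing sections of the two configs
train_cfg = {"sampling_rate": 22050, "n_mel_channels": 80, "filter_length": 1024}
infer_cfg = {"sampling_rate": 22050, "n_mel_channels": 80, "filter_length": 2048}

print(config_mismatches(train_cfg, infer_cfg,
                        ["sampling_rate", "n_mel_channels", "filter_length"]))
# {'filter_length': (1024, 2048)}
```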
If you're not already doing so, make sure to add punctuation to the phrases. |
I did add punctuation. |
Did you try a lower value of sigma? |
I was already running it with sigma=0.5 |
Try something even more conservative, 0.25. |
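For context on why lowering sigma helps: sigma scales the standard deviation of the Gaussian latent that inference samples from, so smaller values trade variability for stability. A minimal sketch of that sampling step (names and shapes are illustrative, not Flowtron's actual API; stdlib `random` stands in for a tensor library):

```python
import random

def sample_latent(n_mel_channels, n_frames, sigma, seed=0):
    """Sample z ~ N(0, sigma^2) elementwise; the flow inverts z into a mel-spectrogram."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, sigma) for _ in range(n_frames)]
            for _ in range(n_mel_channels)]

# A more conservative sigma shrinks every latent value toward the mode
z = sample_latent(n_mel_channels=80, n_frames=240, sigma=0.25)
```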
What happens if you set |
Yes, the model is trained with speaker embeddings. Here are some examples. I set sigma as low as 0.25, as you suggested.
And in the inference code:

```python
text = trainset.get_text(text).cuda()
n_frames = len(text) * 6
```

Still bad results. |
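The `n_frames = len(text)*6` line above is a rough duration heuristic: budget about 6 mel frames of output per encoded text token. A self-contained sketch of that arithmetic (the 6-frames-per-token ratio comes from the snippet above, not from any fixed property of the model):

```python
def estimate_n_frames(n_tokens, frames_per_token=6):
    """Rough output length: ~frames_per_token mel frames per input token."""
    return n_tokens * frames_per_token

# e.g. a 40-token phrase gets a 240-frame output budget
print(estimate_n_frames(40))  # 240
```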
Try these modifications to the phrases:
|
That's very surprising. Give us some time to look into it. |
Thanks a lot! I really appreciate your help. |
One thing: there are differences in the output when running the inference on different checkpoints. |
Are the speaker ids you're sharing the LibriTTS ids? The model should have about 123 speakers. |
Yes, from the LibriTTS ids: list |
I synthesized the 3 phrases with our LibriTTS-100 model trained with speaker embeddings. Your attention weights during training look really good, and your validation loss is similar to what we reached. |
Those phrases sound like what I'd like to hear. I uploaded the checkpoint I used here. There is one small difference in the dataset: This is the config file. This is the training files list. |
@rafaelvalle did you manage to run the inference using the weights I shared? |
Yes, using your model I get results similar to yours. |
First of all, thank you for the amazing paper and for releasing the code.
I have read the instructions and all the issues, but I can't find a single place with the steps that would allow me to faithfully replicate the training of the models you shared:
- The Flowtron LibriTTS model
Would it be possible to provide a detailed step by step guide to do that?
Something that would include exactly:
I am a big fan of easy reproducibility :)
Thanks again.