How is the new PortaSpeech Implementation performing? #67
Comments
The model works: it can be trained and it can be used for inference. So far, however, the results have not been good. I wonder if it's just because PortaSpeech requires more data and more training steps, or if the hyperparameters need to be improved. I'm using a different encoder than they do, but I still think it should work fine. I had to make some changes to the gradient flow to make it work at all, so I'm not sure where the problem currently lies.
I just started training my own Meta model based on the PortaSpeech branch, but if you think it needs more work, I'd better work on my 22k fs2 model and wait for PortaSpeech to be stable enough.
There is still a bug where training sometimes stops because values in the distribution turn to NaN, but I trained single-speaker models on 5 hours of high-quality data, and the quality is actually very high with PortaSpeech and the new 24kHz vocoder. Multi-speaker does not work yet, however: the model diverges and does not really learn anything in the multilingual multi-speaker case. I will have to find a way to make it work, maybe with some pretraining phases. The Glow-based PostNet is also not working perfectly yet, but I can now confirm that the basic PortaSpeech works very well.
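For anyone hitting the same NaN crash: a common guard against this failure mode is to clamp the predicted log-variance before it enters the Gaussian likelihood, so the exponential and its reciprocal stay finite. A minimal NumPy sketch (hypothetical code, not this repository's actual implementation):

```python
import numpy as np

def gaussian_nll(x, mean, log_var, min_log_var=-10.0, max_log_var=10.0):
    """Per-element negative log-likelihood of a diagonal Gaussian.

    Clamping log_var keeps exp(log_var) and 1/exp(log_var) finite, a common
    guard against NaN/inf losses when a predicted variance collapses.
    (Hypothetical sketch; the clamp range is an assumption.)
    """
    log_var = np.clip(log_var, min_log_var, max_log_var)
    return 0.5 * (np.log(2.0 * np.pi) + log_var
                  + (x - mean) ** 2 / np.exp(log_var))

# A wildly collapsed variance no longer blows up the loss:
print(np.isfinite(gaussian_nll(x=1.0, mean=0.0, log_var=-500.0)))
```

Without the clamp, `exp(-500.0)` underflows to zero and the squared-error term divides by zero, which is exactly the kind of non-finite value that then propagates through the loss.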
Thanks for the details, but I am more interested in the multi-speaker, multi-lingual part, since this project makes it very easy to work on low-resource languages. Do you think it will work in multi-speaker, multi-lingual mode anytime soon? And is the 24kHz vocoder training complete? If so, when can I expect a release? Since I am also working on a higher-sample-rate model, I will shift to 24kHz instead of 22.05kHz.
Is there any improvement in the multi-speaker, multi-lingual performance of PortaSpeech? I recently started working on a diffusion-based model, but it takes too much training time to get good results. Have you considered diffusion-based models? What is your opinion on them?
Diffusion models are pretty good, but I think they are not well suited for the low-resource settings that I'm interested in. The VAE in PortaSpeech was too unstable, and I couldn't get it to function properly, so I have more or less given up on PortaSpeech. I am now trying to apply what I learned to my own attempt at an architecture that combines a lot of ideas in a different way. Progress is very slow, however, because I have been very sick recently and it doesn't seem to go away. It will take a long time before it is ready at the current pace.
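For reference, one widely used trick against the kind of VAE instability described here is KL annealing: ramp the weight on the KL term from 0 over the first training steps so the posterior isn't crushed toward the prior before the decoder learns anything. A hypothetical sketch (the schedule and step counts are assumptions, not this project's settings):

```python
def kl_weight(step, warmup_steps=10000, max_weight=1.0):
    """Linear KL-annealing schedule for VAE training.

    Returns a weight that ramps linearly from 0 at step 0 to max_weight
    at warmup_steps, then stays constant. The warmup length is a made-up
    example value, not taken from the repository.
    """
    return max_weight * min(1.0, step / warmup_steps)

# Typical usage inside a training loop:
#   loss = reconstruction_loss + kl_weight(step) * kl_divergence
print(kl_weight(0), kl_weight(5000), kl_weight(20000))
```

The ramp is the design point: with a full-strength KL term from step 0, the easiest minimum is often posterior collapse, after which the latent carries no information and training diverges or stalls.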
I am sorry to hear that and hope you get well soon. Looking forward to your new architecture.
Hi
I noticed that you are working on a new PortaSpeech implementation. Can I ask how the model is performing? Is the implementation complete? Can I try training with my own data?
Thanks