
How is the new PortaSpeech Implementation performing? #67

Closed
bharaniyv opened this issue Dec 3, 2022 · 7 comments

Comments

@bharaniyv

Hi,
I noticed that you are working on a new PortaSpeech implementation. Can I ask how the model is performing? Is the implementation complete, and can I try training it with my own data?

Thanks

@Flux9665
Collaborator

Flux9665 commented Dec 3, 2022

The model works: it can be trained and used for inference. So far, however, the results have not been good. I wonder whether PortaSpeech simply requires more data and more training steps, or whether the hyperparameters need to be improved. I'm using a different encoder than the original authors, but I still think that should work fine. I had to make some changes to the gradient flow to get it to train at all, so I'm not sure where the problem currently lies.
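
Roughly, one generic way to cut the gradient flow (this is only a minimal sketch with placeholder module names, not necessarily the exact change in the branch) is to detach a tensor before it enters a sub-module, so that module no longer back-propagates into the encoder:

```python
import torch
import torch.nn as nn

# Minimal sketch, not the actual toolkit code: detaching a tensor cuts the
# autograd graph, so the postnet's gradients no longer reach encoder/decoder.
class TinyAcousticModel(nn.Module):
    def __init__(self, dim=80):
        super().__init__()
        self.encoder = nn.Linear(dim, dim)
        self.decoder = nn.Linear(dim, dim)
        self.postnet = nn.Linear(dim, dim)

    def forward(self, x):
        hidden = self.encoder(x)
        coarse = self.decoder(hidden)
        # .detach() stops postnet gradients from flowing back into the
        # encoder/decoder, which can stabilise training of the refinement part.
        refined = coarse + self.postnet(coarse.detach())
        return coarse, refined

model = TinyAcousticModel()
spectrogram = torch.randn(4, 80)  # fake batch of frames
coarse, refined = model(spectrogram)
loss = (coarse - spectrogram).pow(2).mean() + (refined - spectrogram).pow(2).mean()
loss.backward()
```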

@bharaniyv
Author

I just started training my own Meta model based on the PortaSpeech branch, but if you think it still needs more work, I would rather keep working on my 22kHz FastSpeech 2 model and wait for PortaSpeech to become stable enough.

@Flux9665
Collaborator

Flux9665 commented Dec 6, 2022

There is still a bug where training sometimes stops because values in the distribution turn to NaN, but I have trained single-speaker models on 5 hours of high-quality data, and the quality is actually very high with PortaSpeech and the new 24kHz vocoder.
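
A typical workaround for that kind of NaN issue (just a generic sketch, not the exact code in the toolkit) is to clamp the predicted log-scale before building the distribution and to skip the optimizer step whenever the loss turns non-finite instead of crashing the run:

```python
import torch

# Generic NaN safeguards, not the toolkit's actual code:
# (1) clamp the predicted log-scale, (2) skip batches with non-finite loss.
def safe_gaussian_nll(mean, log_std, target):
    log_std = log_std.clamp(min=-7.0, max=7.0)  # keep exp(log_std) in a safe range
    dist = torch.distributions.Normal(mean, log_std.exp())
    return -dist.log_prob(target).mean()

def training_step(optimizer, mean, log_std, target):
    loss = safe_gaussian_nll(mean, log_std, target)
    optimizer.zero_grad(set_to_none=True)
    if not torch.isfinite(loss):
        return None  # skip this batch instead of letting NaN poison the weights
    loss.backward()
    optimizer.step()
    return loss.item()

# Dummy usage with a single learnable tensor standing in for the model:
param = torch.nn.Parameter(torch.zeros(4, 80))
optimizer = torch.optim.Adam([param], lr=1e-3)
target = torch.randn(4, 80)
training_step(optimizer, mean=param, log_std=torch.zeros(4, 80), target=target)
```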

Multi-speaker does not work yet, however. The model diverges and does not really learn anything in the multilingual multi-speaker case. I will have to find a way to make it work, maybe with some pretraining phases. The Glow-based PostNet is also not working perfectly yet. But I can now confirm that the basic PortaSpeech works very well.
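
One possible shape such pretraining phases could take (only a sketch with placeholder module names, not the actual attributes in the branch) is to freeze the speaker and language conditioning at first, train the core model alone, and unfreeze the conditioning later:

```python
import torch.nn as nn

# Hedged sketch of phased training via freezing; module names are placeholders.
class ToyMultilingualTTS(nn.Module):
    def __init__(self, dim=64, n_speakers=10, n_languages=5):
        super().__init__()
        self.core = nn.Linear(dim, dim)
        self.speaker_embedding = nn.Embedding(n_speakers, dim)
        self.language_embedding = nn.Embedding(n_languages, dim)

def set_requires_grad(module, flag):
    for p in module.parameters():
        p.requires_grad = flag

model = ToyMultilingualTTS()

# Phase 1: train only the core acoustic model, conditioning frozen.
set_requires_grad(model.speaker_embedding, False)
set_requires_grad(model.language_embedding, False)

# Phase 2: unfreeze the conditioning once phase 1 has converged.
set_requires_grad(model.speaker_embedding, True)
set_requires_grad(model.language_embedding, True)
```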

@bharaniyv
Author

Thanks for the details, but I am more interested in the multi-speaker multi-lingual part, since this project makes it very easy to work on low-resource languages. Do you think it will work in multi-speaker multi-lingual mode anytime soon? Also, is the 24kHz vocoder training complete? If so, when can I expect a release? Since I am working on a higher-sample-rate model as well, I will switch to 24kHz instead of 22.05kHz.

@bharaniyv
Author

Is there any improvement in the multi-speaker multi-lingual performance of PortaSpeech? I recently started working on a diffusion-based model, but it takes too much training time to get good results. Have you considered diffusion-based models? What is your opinion on them?

@Flux9665
Collaborator

Flux9665 commented Jan 2, 2023

Diffusion models are pretty good, but I think they are not well suited for the low-resource stuff that I'm interested in.

The VAE in PortaSpeech was too unstable; I couldn't get it to function properly, so I have more or less given up on PortaSpeech. I am now trying to apply what I learned to my own attempt at an architecture that combines a lot of ideas in a different way. Progress is very slow, however, because I have been very sick recently and it doesn't seem to go away. It will take a long time before it is ready at the current pace.
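
For reference, the usual remedies for an unstable VAE, which may or may not be what was tried here, are KL-weight annealing and clamping the log-variance; a generic sketch:

```python
import torch

# Generic VAE stabilisation sketch, not the toolkit's actual code:
# ramp the KL weight up slowly ("KL annealing") and clamp the log-variance.
def kl_weight(step, warmup_steps=10000, max_weight=1.0):
    # Linear ramp from 0 to max_weight over warmup_steps.
    return max_weight * min(1.0, step / warmup_steps)

def vae_loss(recon, target, mu, log_var, step):
    log_var = log_var.clamp(min=-10.0, max=10.0)
    recon_loss = (recon - target).pow(2).mean()
    kl = -0.5 * (1 + log_var - mu.pow(2) - log_var.exp()).mean()
    return recon_loss + kl_weight(step) * kl

# Example call with dummy tensors:
x = torch.randn(4, 80)
loss = vae_loss(recon=x * 0.9, target=x, mu=torch.zeros(4, 16),
                log_var=torch.zeros(4, 16), step=2500)
```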

@bharaniyv
Author

I am sorry to hear that and hope you get well soon. Looking forward to your new architecture.
