
How is the new PortaSpeech Implementation performing? #67

Closed
bharaniyv opened this issue Dec 3, 2022 · 7 comments

Comments

@bharaniyv

Hi,
I noticed that you are working on a new PortaSpeech implementation. Can I ask how the model is performing? Is the implementation complete, and can I try training it with my own data?

Thanks

@Flux9665
Collaborator

Flux9665 commented Dec 3, 2022

The model works: it can be trained and used for inference. So far, however, the results have not been good. I wonder whether PortaSpeech simply requires more data and more training steps, or whether the hyperparameters need to be improved. I'm using a different encoder than the original authors, but I still think that should work fine. I had to make some changes to the gradient flow to get it to train at all, so I'm not sure where the problem currently lies.
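
Roughly, one generic way to cut the gradient flow (this is only a minimal sketch with placeholder module names, not necessarily the exact change in the branch) is to detach a tensor before it enters a sub-module, so that module no longer back-propagates into the encoder:

```python
import torch
import torch.nn as nn

# Minimal sketch, not the actual toolkit code: detaching a tensor cuts the
# autograd graph, so the postnet's gradients no longer reach encoder/decoder.
class TinyAcousticModel(nn.Module):
    def __init__(self, dim=80):
        super().__init__()
        self.encoder = nn.Linear(dim, dim)
        self.decoder = nn.Linear(dim, dim)
        self.postnet = nn.Linear(dim, dim)

    def forward(self, x):
        hidden = self.encoder(x)
        coarse = self.decoder(hidden)
        # .detach() stops postnet gradients from flowing back into the
        # encoder/decoder, which can stabilise training of the refinement part.
        refined = coarse + self.postnet(coarse.detach())
        return coarse, refined

model = TinyAcousticModel()
spectrogram = torch.randn(4, 80)  # fake batch of frames
coarse, refined = model(spectrogram)
loss = (coarse - spectrogram).pow(2).mean() + (refined - spectrogram).pow(2).mean()
loss.backward()
```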

@bharaniyv
Author

I just started training my own Meta model based on the PortaSpeech branch, but if you think it still needs more work, I would rather keep working on my 22kHz FastSpeech 2 model and wait for PortaSpeech to become stable enough.

@Flux9665
Collaborator

Flux9665 commented Dec 6, 2022

There is still a bug where training sometimes stops because values in the distribution turn to NaN, but I have trained single-speaker models on 5 hours of high-quality data, and the quality is actually very high with PortaSpeech and the new 24kHz vocoder.
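
A typical workaround for that kind of NaN issue (just a generic sketch, not the exact code in the toolkit) is to clamp the predicted log-scale before building the distribution and to skip the optimizer step whenever the loss turns non-finite instead of crashing the run:

```python
import torch

# Generic NaN safeguards, not the toolkit's actual code:
# (1) clamp the predicted log-scale, (2) skip batches with non-finite loss.
def safe_gaussian_nll(mean, log_std, target):
    log_std = log_std.clamp(min=-7.0, max=7.0)  # keep exp(log_std) in a safe range
    dist = torch.distributions.Normal(mean, log_std.exp())
    return -dist.log_prob(target).mean()

def training_step(optimizer, mean, log_std, target):
    loss = safe_gaussian_nll(mean, log_std, target)
    optimizer.zero_grad(set_to_none=True)
    if not torch.isfinite(loss):
        return None  # skip this batch instead of letting NaN poison the weights
    loss.backward()
    optimizer.step()
    return loss.item()

# Dummy usage with a single learnable tensor standing in for the model:
param = torch.nn.Parameter(torch.zeros(4, 80))
optimizer = torch.optim.Adam([param], lr=1e-3)
target = torch.randn(4, 80)
training_step(optimizer, mean=param, log_std=torch.zeros(4, 80), target=target)
```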

Multi-speaker does not work yet, however. The model diverges and does not really learn anything in the multilingual multi-speaker case. I will have to find a way to make it work, maybe with some pretraining phases. The Glow-based PostNet is also not working perfectly yet. But I can now confirm that the basic PortaSpeech works very well.
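
One possible shape such pretraining phases could take (only a sketch with placeholder module names, not the actual attributes in the branch) is to freeze the speaker and language conditioning at first, train the core model alone, and unfreeze the conditioning later:

```python
import torch.nn as nn

# Hedged sketch of phased training via freezing; module names are placeholders.
class ToyMultilingualTTS(nn.Module):
    def __init__(self, dim=64, n_speakers=10, n_languages=5):
        super().__init__()
        self.core = nn.Linear(dim, dim)
        self.speaker_embedding = nn.Embedding(n_speakers, dim)
        self.language_embedding = nn.Embedding(n_languages, dim)

def set_requires_grad(module, flag):
    for p in module.parameters():
        p.requires_grad = flag

model = ToyMultilingualTTS()

# Phase 1: train only the core acoustic model, conditioning frozen.
set_requires_grad(model.speaker_embedding, False)
set_requires_grad(model.language_embedding, False)

# Phase 2: unfreeze the conditioning once phase 1 has converged.
set_requires_grad(model.speaker_embedding, True)
set_requires_grad(model.language_embedding, True)
```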

@bharaniyv
Author

Thanks for the details, but I am more interested in the multi-speaker multi-lingual part, since this project makes it very easy to work on low-resource languages. Do you think it will work in multi-speaker multi-lingual mode anytime soon? Also, is the 24kHz vocoder training complete? If so, when can I expect a release? Since I am working on a higher-sample-rate model as well, I will switch to 24kHz instead of 22.05kHz.

@bharaniyv
Author

Is there any improvement in the multi-speaker multi-lingual performance of PortaSpeech? I recently started working on a diffusion-based model, but it takes too much training time to get good results. Have you considered diffusion-based models? What is your opinion on them?

@Flux9665
Collaborator

Flux9665 commented Jan 2, 2023

Diffusion models are pretty good, but I think they are not well suited for the low-resource stuff that I'm interested in.

The VAE in PortaSpeech was too unstable; I couldn't get it to function properly, so I have more or less given up on PortaSpeech. I am now trying to apply what I learned to my own attempt at an architecture that combines a lot of ideas in a different way. Progress is very slow, however, because I have been very sick recently and it doesn't seem to go away. It will take a long time before it is ready at the current pace.
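
For reference, the usual remedies for an unstable VAE, which may or may not be what was tried here, are KL-weight annealing and clamping the log-variance; a generic sketch:

```python
import torch

# Generic VAE stabilisation sketch, not the toolkit's actual code:
# ramp the KL weight up slowly ("KL annealing") and clamp the log-variance.
def kl_weight(step, warmup_steps=10000, max_weight=1.0):
    # Linear ramp from 0 to max_weight over warmup_steps.
    return max_weight * min(1.0, step / warmup_steps)

def vae_loss(recon, target, mu, log_var, step):
    log_var = log_var.clamp(min=-10.0, max=10.0)
    recon_loss = (recon - target).pow(2).mean()
    kl = -0.5 * (1 + log_var - mu.pow(2) - log_var.exp()).mean()
    return recon_loss + kl_weight(step) * kl

# Example call with dummy tensors:
x = torch.randn(4, 80)
loss = vae_loss(recon=x * 0.9, target=x, mu=torch.zeros(4, 16),
                log_var=torch.zeros(4, 16), step=2500)
```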

@bharaniyv
Author

I am sorry to hear that and hope you get well soon. Looking forward to your new architecture.
