Capacitron #510
Merging upstream dev bug fixes
…ith varying input lengths
…s the post encoder MLP plus bugfixes
… into dev-capacitron-dev
…ial implementation
Thanks, @a-froghyar, an excellent PR, congratulations again :). I will help fix the compatibility issues. I also intend to keep the VAE generic and implement support for it in Tacotron 2, the same way it was done for GST.
Also be aware of the upcoming Trainer API. It may be worth checking it as you design your PR ✨.
Update: @Edresson was kind enough to open a PR on the branch (a-froghyar#6). It is being reviewed this week and will hopefully be pushed soon. The list at the top will be updated by the end of this week, and the new Trainer API will also be taken into account.
a18d49c to 8cb16da
Little update: I'm finishing up my thesis over the next 2 months, so there won't be much activity here in the meantime; however, there is active development on the model/PR :)
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our discussion channels.
I'm submitting and presenting my thesis in the next few weeks; the PR is still active.
@a-froghyar good luck!
renamed text_encoder_output_dim in capacitron_layers for clarity
I'm done with the master's and I'll be working on this PR next week. @erogol @Edresson @WeberJulian, could I post questions here later? There seem to have been loads of changes in the past 3 months, so it'll take some time to reorganise things. :)
OK, there are way too many conflicts, so I'm going to restart the PR from a fresh dev branch. I'll post the new PR here for follow-up!
Disclaimer
@Edresson and @WeberJulian have asked me to open a PR with my newly implemented model, Capacitron. There are still some things to clean up and some fixes to make, but I'm opening this PR now to start the discussion around it. Contributions to the list below are welcome, although I'll be slow to review them over the next 10 days. I have just merged the new dev branch into this branch, which brings in the new Coqpit refactor among other changes, and I haven't had the chance to run a training yet. I implemented this model for my master's thesis and I'm still training different models for the thesis, so I won't have much capacity (hehe) to experiment in the next month.

This PR implements a new model in 🐸 TTS based on Google's Capacitron model. It is a partial implementation of the models described in the paper: hierarchical latent embeddings are still to be done and remain a TODO for later. If you'd like to get an idea of what the model does and how it works, here's a post I wrote a few weeks ago. Audio samples will follow shortly.
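As background for reviewers, the Capacitron paper trains its variational embedding with a capacity-constrained objective: the reconstruction loss plus β·(KL − C), where C is a target capacity in nats and β ≥ 0 acts as a Lagrange multiplier that is pushed up when the KL term exceeds C and down when it falls below. The sketch below is a minimal, framework-free illustration of that objective only, assuming a diagonal-Gaussian posterior, a standard-normal prior and a softplus parameterization of β. All function names are hypothetical and are not taken from this PR's code (e.g. capacitron_layers.py).

```python
import math
from typing import Sequence, Tuple


def softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)); keeps beta non-negative (assumed parameterization).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid(x: float) -> float:
    # Stable logistic function; it is the derivative of softplus.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def kl_diag_gaussian_to_std_normal(mu: Sequence[float], logvar: Sequence[float]) -> float:
    # KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dimensions.
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))


def capacitron_loss(recon_loss: float, mu: Sequence[float], logvar: Sequence[float],
                    beta_raw: float, capacity: float) -> Tuple[float, float]:
    # Hypothetical helper: reconstruction loss plus beta * (KL - C).
    # The model parameters minimize this value with beta held fixed.
    kl = kl_diag_gaussian_to_std_normal(mu, logvar)
    beta = softplus(beta_raw)
    return recon_loss + beta * (kl - capacity), kl


def update_beta_raw(beta_raw: float, kl: float, capacity: float, lr: float) -> float:
    # Gradient *ascent* on beta_raw: d/d(beta_raw)[softplus(beta_raw) * (kl - C)]
    # = sigmoid(beta_raw) * (kl - C). KL above C raises beta, KL below C lowers it.
    return beta_raw + lr * sigmoid(beta_raw) * (kl - capacity)
```

In a real training loop the KL value would come from the reference encoder's posterior for each batch, and the β step would typically use its own optimizer; splitting the minimization (model) and ascent (β) into two functions here simply makes that min-max structure explicit.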
Phonemes
I haven't followed the entire discussion about the phoneme issue, but I'm aware that espeak had to be dropped because of license issues. @Edresson mentioned that I could also try training the model with graphemes, but I haven't had time to do this yet. According to the paper, though, Capacitron does need phoneme inputs to work.
TODOs (the list is still to be updated):
capacitron_blizzad.json
tacotron_abstract.py
train_tacotron.py and training.py
capacitron_layers.py
capacitron_eval.py into a notebook
capacitron_beta
a-froghyar/Capacitron#8

To Discuss