Capacitron #510

Closed
wants to merge 41 commits into from

@a-froghyar
Contributor

a-froghyar commented May 27, 2021


Disclaimer

@Edresson and @WeberJulian have asked me to open a PR with my newly implemented model, Capacitron. There are still some things to clean up and some fixes to make, but I'm opening this PR now to get the discussion started. Contributions to the list below are welcome, although I'll be slow to review them over the next 10 days. I have just merged the new dev branch (including the new Coqpit refactor, among other changes) into this branch, and I haven't had the chance to run the training yet. I implemented this model for my master's thesis and I'm still training different models for it, so I won't have much capacity (hehe) to experiment in the next month.


This PR adds a new model to 🐸 TTS, based on the Capacitron model from Google. It's a partial implementation of the models detailed in the paper: hierarchical latent embeddings are not implemented yet and are a TODO for later. If you'd like to get an idea of what the model does and how it works, here's a post I wrote a few weeks ago. Audio samples to follow shortly.

Phonemes

I haven't followed the entire discussion about the phoneme issue, but I'm aware that espeak had to be dropped because of license issues. @Edresson mentioned that I could try training the model with graphemes as well; I haven't had time to do this yet. According to the paper, however, Capacitron does need phoneme inputs to work.

TODOs (List is still to be updated):

To Discuss

  • Some of the losses that are normally activated in Tacotron are deactivated in Capacitron - during my experiments I stripped down the loss calculation to the bare minimum (which is already complicated enough) and I haven't experimented with turning them back on again since then
  • DDC + Capacitron

a-froghyar and others added 29 commits March 13, 2021 11:24
Merging upstream dev bug fixes
This reverts commit 7030dff, reversing
changes made to 64cff14.
@Edresson
Contributor

TODOs (List is still to be updated):

  • Implement Coqpit config instead of capacitron_blizzad.json
  • Fix changes coming from formatting
  • Fix Capacitron multispeaker training (hint: tacotron_abstract.py)
  • Refactor the parameter splitting in train_tacotron.py and training.py
  • Create test for capacitron_layers.py
  • Fix tests
  • Refactor capacitron_eval.py into a notebook
  • Fix a-froghyar#4

Thanks, @a-froghyar, an excellent PR, congratulations again :). I will help fix the compatibility issues. I also intend to keep the VAE generic and add support for it in Tacotron 2, the same way it was done for GST.

@Edresson Edresson self-requested a review May 27, 2021 11:09
@erogol
Member

erogol commented May 28, 2021

Also be aware of the upcoming Trainer API. Maybe it is better for you to check it as you design your PR ✨ .

@a-froghyar
Contributor Author

Update:

@Edresson was kind enough to open a PR on the branch (a-froghyar#6). It is being reviewed this week and will hopefully be pushed soon. The list at the top will be updated by the end of this week, and the new Trainer API will also be taken into account.

@erogol erogol force-pushed the dev branch 4 times, most recently from a18d49c to 8cb16da Compare June 30, 2021 14:23
@CLAassistant

CLAassistant commented Aug 3, 2021

CLA assistant check
All committers have signed the CLA.

@a-froghyar
Contributor Author

Little update: I'm finishing up my thesis over the next 2 months, so there won't be much activity here in the meantime. The model/PR is still under active development, though :)

@a-froghyar a-froghyar mentioned this pull request Aug 30, 2021
58 tasks
@stale

stale bot commented Sep 15, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our discussion channels.

@stale stale bot added the wontfix This will not be worked on but feel free to help. label Sep 15, 2021
@a-froghyar
Contributor Author

I'm submitting and presenting my thesis in the next few weeks; the PR is still active.

@stale stale bot removed the wontfix This will not be worked on but feel free to help. label Sep 15, 2021
@erogol
Member

erogol commented Sep 16, 2021

@a-froghyar good luck!

@a-froghyar
Contributor Author

I'm done with the master's and I'll be working on this PR in the next week. @erogol @Edresson @WeberJulian could I post questions here later? There seem to have been loads of changes in the past 3 months, so it'll take some time to reorganise things. :)

@a-froghyar
Contributor Author

OK, there are way too many conflicts - I'm actually going to restart the PR from a fresh dev branch. I'll post the new PR here for follow-up!

@a-froghyar a-froghyar closed this Sep 22, 2021
4 participants