Capacitron #510

a-froghyar · 2021-05-27T08:57:21Z

Disclaimer

@Edresson and @WeberJulian asked me to open a PR with my newly implemented model, Capacitron. There are still some things to clean up and fix, but I'm opening the PR now so the discussion around it can start. Contributions to the list below are welcome, though I'll be slow to review them over the next 10 days. I have just merged the new dev branch (including the Coqpit refactor, among other changes) into this branch and haven't had the chance to run training since. I implemented this model for my master's thesis and am still training different models for it, so I won't have much capacity (hehe) to experiment in the next month.

This PR adds a new model to 🐸 TTS based on the Capacitron model from Google. It's a partial implementation of the models detailed in the paper: hierarchical latent embeddings are not implemented yet and remain a TODO for later. If you'd like to get an idea of what the model does and how it works, here's a post I wrote a few weeks ago. Audio samples to follow shortly.

Phonemes

I haven't followed the entire discussion about the phoneme issue, but I'm aware that espeak had to be dropped because of license issues. @Edresson mentioned that I could try training the model with graphemes as well, but I haven't had time to do this yet. According to the paper, however, Capacitron does need phoneme inputs to work.

TODOs (list still to be updated):

Implement Coqpit config instead of capacitron_blizzad.json

Fix changes coming from formatting

Fix capacitron multispeaker training (hint: tacotron_abstract.py)

Refactor the parameter splitting in train_tacotron.py and training.py

Create test for capacitron_layers.py

Fix tests

Refactor capacitron_eval.py into a notebook

Fix a-froghyar#4

To Discuss

Some of the losses that are normally active in Tacotron are deactivated in Capacitron. During my experiments I stripped the loss calculation down to the bare minimum (which is already complicated enough), and I haven't experimented with turning them back on since.

DDC + Capacitron

Edresson · 2021-05-27T10:56:26Z

Thanks, @a-froghyar, an excellent PR, congratulations again :). I will help fix the compatibility issues. I also intend to keep the VAE generic and add support for it in Tacotron 2, the same way it was done for GST.
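For readers following the loss discussion in this thread, here is a minimal sketch of the capacity-constrained objective as described in the Capacitron paper: a reconstruction term plus a Lagrange-multiplier term, beta * (KL - C), that pushes the KL divergence of the latent toward a target capacity C. This is not the code from this PR. The function names are hypothetical, a diagonal Gaussian posterior with a standard normal prior is assumed, and the normalization and alpha weighting mentioned in the commit titles are not modeled.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x)); used to keep beta non-negative."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def kl_diag_gaussian_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), in nats."""
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var)
    )


def capacity_constrained_loss(recon_loss, kl, beta_raw, capacity):
    """Return (model_loss, beta_loss) for one training step.

    model_loss is minimized w.r.t. the model parameters.
    beta_loss is minimized w.r.t. beta_raw only, which is the same as
    maximizing beta * (kl - capacity): beta grows while kl > capacity
    and shrinks while kl < capacity.
    """
    beta = softplus(beta_raw)  # unconstrained parameter -> beta >= 0
    constraint = kl - capacity
    model_loss = recon_loss + beta * constraint
    beta_loss = -beta * constraint
    return model_loss, beta_loss


# Hypothetical values, for illustration only.
kl = kl_diag_gaussian_to_standard_normal(mu=[1.0, -0.5], log_var=[0.0, 0.2])
model_loss, beta_loss = capacity_constrained_loss(
    recon_loss=1.0, kl=kl, beta_raw=0.0, capacity=0.5
)
```

In an autograd framework the KL term would typically be detached when computing `beta_loss`, so that the multiplier update does not send gradients into the encoder; that detail is an assumption about a reasonable implementation, not a description of this branch.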

erogol · 2021-05-28T11:04:16Z

Also be aware of the upcoming Trainer API. It may be better to check it as you design your PR ✨.

a-froghyar · 2021-06-10T09:38:49Z

Update:

@Edresson was kind enough to open a PR on the branch (a-froghyar#6). It is being reviewed this week and will hopefully be pushed soon. The list at the top will be updated by the end of this week, and the new Trainer API will also be taken into account.

CLAassistant · 2021-08-03T08:42:28Z

All committers have signed the CLA.

a-froghyar · 2021-08-04T09:17:36Z

Little update: I'm finishing up my thesis over the next 2 months, so there won't be much activity here in the meantime. There is still active development on the model/PR, though :)

stale · 2021-09-15T09:30:35Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our discussion channels.

a-froghyar · 2021-09-15T14:33:40Z

I'm submitting and presenting my thesis in the next few weeks; the PR is still active.

erogol · 2021-09-16T07:48:45Z

@a-froghyar good luck!

a-froghyar · 2021-09-22T13:01:57Z

I'm done with the master's and I'll be working on this PR in the next week. @erogol @Edresson @WeberJulian could I post questions here later? There seem to have been loads of changes in the past 3 months, so it'll take some time to reorganise things. :)

a-froghyar · 2021-09-22T13:06:02Z

OK, there are way too many conflicts, so I'm going to restart the PR from a fresh dev branch. I'll post the new PR here for follow-up!

a-froghyar and others added 29 commits March 13, 2021 11:24

started development on capacitron

b4248b0

Added Capacitron into the infrastructure

9e6f2ba

Merge branch 'dev' into dev-capacitron

ce8a3b7

Merging upstream dev bug fixes

Implemented step-wise lr decaying method

e863141

cleanup after previous commit

a613f38

added the conditional text input feature

65a3981

added ref encoder LSTM and implemented the routine for dynamic lstm with varying input lengths

8789c65

implemented the logic for sampling from prior and posterior as well as the post encoder MLP plus bugfixes

26e9ee0

full but buggy implementation, dev branch for backup

33be83d

finished the implementation, still buggy at some points

e5f4ca9

Merge branch 'dev-capacitron-dev' of github.com:a-froghyar/Capacitron into dev-capacitron-dev

bbf0158

small changes for stability

417d2f1

resolve conflicts from merging dev into dev-capacitron-dev

2458f15

second cleanup

2840cb5

new CI config

1de1284

working base modell

0ceb489

delete unused stuff

bc8e12a

add mixed precision to capacitron

fca955c

add blizzard wavegrad config

6bced9e

small bug fix for inference

4aaf406

mac eval and melgan config

64cff14

Merge branch 'dev' of github.com:a-froghyar/Capacitron into dev

19f1652

merge dec into dev-capacitron

7030dff

Revert "merge dec into dev-capacitron"

9f58071

This reverts commit 7030dff, reversing changes made to 64cff14.

change optimizer initialization for compatibility with Hifi-GAN official implementation

dacd6c5

fix devices and add hifigan

53c9310

merge up to date dev into capacitron-toMerge

9876910

deleted unnecessary files

dbe1453

gitignore delete line

45b675b

Edresson self-requested a review May 27, 2021 11:09

update to new style and gst bugfix

f36357d

Edresson added 4 commits May 28, 2021 13:28

capacitron vae Tacotron2 support

d77f217

bug fix and remove old code

a065cf9

Normalize the Capacitron VAE Loss

f23ee0b

add alpha weight to CapacitronVAE Loss

fd7d192

Edresson and others added 4 commits June 10, 2021 09:25

bug fix

63af948

bug fix

7f2aebb

small text fix

fa9f15a

renamde text_encoder_output_dim in capacitron_layers for clarity

cdf2bca

erogol force-pushed the dev branch 4 times, most recently from a18d49c to 8cb16da June 30, 2021 14:23

a-froghyar mentioned this pull request Aug 30, 2021

🐸 TTS roadmap #378

Closed

58 tasks

stale bot added the wontfix label ("This will not be worked on but feel free to help.") Sep 15, 2021

stale bot removed the wontfix label ("This will not be worked on but feel free to help.") Sep 15, 2021

a-froghyar added 3 commits September 22, 2021 11:40

resolve conflicts

e921ecb

Merge pull request #7 from a-froghyar/dev-capacitron-fixes

8350332

renamde text_encoder_output_dim in capacitron_layers for clarity

change some comments

c7b56be

a-froghyar closed this Sep 22, 2021
