Capacitron #977

Merged: 81 commits merged into coqui-ai:dev on May 20, 2022

Conversation

@a-froghyar (Contributor) commented Nov 29, 2021

This PR adds a new model to 🐸 TTS, based on the Capacitron model from Google. It's a partial implementation of the models detailed in the paper; hierarchical latent embeddings are still to be done and are a TODO for later. If you'd like to get an idea of what the model does and how it works, here's a post I did a few months ago.

I implemented this model as part of my Master's thesis at TU Berlin. The thesis itself is a detailed report on the implementation and subjective evaluation of this model. You can read my thesis and listen to some samples here. I'm also in the process of creating a website with audio samples from my pretrained models and the uploaded thesis; this is WIP.

I originally implemented this model in an earlier version (March 2020) of 🐸 TTS, so this new "re-implementation" still needs to be tested. I'm in that process right now, but I wanted to open this PR early to discuss some of the ways the Trainer API needs to be adjusted to accommodate the model.
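
For reviewers who haven't read the paper: the heart of Capacitron is a reference-encoder VAE whose KL term is driven towards a fixed capacity limit by a learnable Lagrange multiplier (beta). Below is a minimal sketch of that objective with illustrative names and values; nothing in it is copied from this PR's code:

```python
import torch
import torch.nn.functional as F

# Hypothetical example value; the real capacity is a training-config choice.
CAPACITY = 150.0  # target KL capacity in nats

# beta is stored unconstrained and mapped through softplus so the
# effective multiplier stays positive.
beta_raw = torch.nn.Parameter(torch.tensor(1.0))

def capacitron_loss(recon_loss, posterior, prior):
    """recon_loss: scalar reconstruction term (e.g. L1 on mel frames).
    posterior, prior: torch.distributions.Normal over the latent."""
    kl = torch.distributions.kl_divergence(posterior, prior).sum()
    beta = F.softplus(beta_raw)
    # The model minimises recon + beta * (KL - C), while beta is updated
    # to *maximise* the same term (dual ascent), pushing KL towards C.
    return recon_loss + beta * (kl - CAPACITY)
```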

TODOs:

  • Fix tests
  • Create T1 Capacitron Test
  • Create T2 Capacitron Test
  • Tokenizer update, delete explicit espeak flag
  • Pull out gradient clipping (see the sketch after this list)
  • Open PR in the trainer
    |-----> Implement 2 model specific methods Trainer#26
  • Test Tacotron 1 Model
  • Test Tacotron 2 Model (thanks to help from @Edresson, the Capacitron VAE module is modular, so we could test it in T2 as well, however the original authors did not explore this for some reason)
  • Test different attention algorithms [in my experiments, only Graves worked]
    - original doesn't work with Capacitron and dynamic_convolution only works with Tacotron 2
    - Use Graves with T1
    - Use DCA with T2
  • Fix beta initialisation when continuing training
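
Since two of the TODOs above touch the trainer (pulling gradient clipping out of the generic loop, and the two model-specific methods in Trainer#26), here is a rough sketch of why Capacitron can't use a single global clipping call: the acoustic-model parameters and the beta multiplier follow different update rules. All function, attribute, and parameter names below are hypothetical, not the hooks the Trainer PR settled on:

```python
import torch

def train_step(model, batch, model_opt, beta_opt, grad_clip=5.0):
    """Hypothetical model-specific optimisation step: two optimizers,
    with clipping applied to the model parameters only."""
    model_opt.zero_grad()
    beta_opt.zero_grad()
    loss = model.compute_loss(batch)  # assumed model API
    loss.backward()
    # Clip only the acoustic-model parameters; the dual variable beta is
    # deliberately left unclipped so the capacity constraint can bind.
    torch.nn.utils.clip_grad_norm_(
        [p for n, p in model.named_parameters() if n != "beta_raw"],
        grad_clip,
    )
    model_opt.step()
    beta_opt.step()  # e.g. plain SGD ascending the dual objective
```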

@erogol @Edresson @WeberJulian I'd appreciate it if you could review the changes and discuss some of the specifics in the code.

[7 resolved review threads on TTS/trainer.py (outdated)]
@a-froghyar (Contributor, Author):

Update: I've just run the first training and it is slightly off. I suspect there's an error in the loss calculation because of the reorganisations from @Edresson a few months back. I'm investigating that today and will push the new commits.

@a-froghyar (Contributor, Author):

Update: I managed to do a first training that gave some promising results. I'm now including a step-wise gradual LR scheduler (unlike the Noam scheduler, this takes hardcoded step thresholds and learning rates), which proved necessary in my previous implementation. More updates to follow. :)
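
For context, such a step-wise schedule can be expressed with PyTorch's LambdaLR; the thresholds and rates below are placeholders, not the values used in this PR:

```python
import torch

# Placeholder (step_threshold, lr) pairs; the real schedule is a
# training-config detail.
GRADUAL_SCHEDULE = [(0, 1e-3), (20_000, 5e-4), (50_000, 1e-4)]

def make_stepwise_scheduler(optimizer, schedule=GRADUAL_SCHEDULE):
    base_lr = optimizer.param_groups[0]["lr"]

    def lr_lambda(step):
        # Walk the schedule and keep the last rate whose threshold
        # has been passed.
        lr = schedule[0][1]
        for threshold, value in schedule:
            if step >= threshold:
                lr = value
        return lr / base_lr  # LambdaLR scales the base lr by this factor

    return torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
```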

Review thread on a new file (diff @@ -0,0 +1,64 @@):

'''
This will be deleted later, only for dev, to see how to infer the capacitron model

@a-froghyar (Contributor, Author):

This file will be deleted before the merge; it's only a script showing others who are experimenting with this work how to run inference with the model.

Follow-up commits:

- added reference wav and text args for posterior inference
- some formatting
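
If the final API keeps the reference wav/text arguments from these commits, posterior inference would look roughly like the sketch below. Every method and argument name here is illustrative, not the merged 🐸 TTS API:

```python
import torch

def infer_with_reference(model, text_inputs, ref_mel, ref_text=None):
    """Hypothetical posterior inference: condition the latent on a
    reference utterance (audio and, in the text-conditioned variant,
    its transcript) instead of sampling from the prior."""
    with torch.no_grad():
        posterior = model.capacitron_layer.posterior(ref_mel, ref_text)
        z = posterior.mean  # use the posterior mean for determinism
        return model.inference(text_inputs, style_latent=z)
```
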
@a-froghyar (Contributor, Author):

@erogol from my side this is ready to go. 😊 Big thanks to @WeberJulian for all the help!

@WeberJulian (Contributor):

Just waiting for my LJSpeech T2 Capacitron run to converge. If it does, I'll merge both this PR and coqui-ai/Trainer#26.

@a-froghyar (Contributor, Author) commented Apr 14, 2022

Trainings have not been converging since the reorganisation of the previous two weeks. I'll report back here soon.

Edit: the commits below fixed all issues.

@WeberJulian merged commit 8be21ec into coqui-ai:dev on May 20, 2022
@erogol (Member) commented May 30, 2022

FINALLY !!!! 🚀
