
A TensorFlow implementation of the GAN-TTS architecture


alexandruRopotica/GAN-TTS


End-to-end GAN-TTS architecture

A TensorFlow implementation of the GAN-TTS paper.

Proposed architecture

Notes

  • Text embeddings are generated by a pre-trained TensorFlow BERT model.
  • Linguistic features are not predicted by external models. Instead, a feature net trained together with the generator and the discriminator predicts them. The feature net is a simple CBHG module that takes a text embedding as input and outputs a tensor of linguistic features.
  • You can explore the data flow and the tensor dimensions in the notebook (Open In Colab). The discriminator used in the notebook differs from the original one because the Colab GPU couldn't handle the original discriminator.
  • I trained the model on a very small dataset, 17 audio-text pairs from LJSpeech, because I didn't have a suitable machine to train on.
  • To evaluate this GAN I used the Fréchet distance, with all embeddings calculated by the pre-trained VGGish TensorFlow model.
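
The Fréchet distance mentioned in the last note can be sketched as follows. This is a minimal NumPy illustration of the standard formula between two Gaussians fitted to embedding sets, not the evaluation code from this repository. The function name `frechet_distance` and the random arrays standing in for VGGish embeddings are assumptions made for the example.

```python
import numpy as np


def frechet_distance(emb_a, emb_b):
    """Fréchet distance between Gaussians fitted to two embedding sets.

    emb_a, emb_b: arrays of shape (num_samples, embedding_dim).
    Hypothetical helper for illustration; not taken from the repository.
    """
    emb_a = np.asarray(emb_a, dtype=np.float64)
    emb_b = np.asarray(emb_b, dtype=np.float64)

    # Mean and covariance of each embedding set (rows are samples).
    mu_a, mu_b = emb_a.mean(axis=0), emb_b.mean(axis=0)
    sigma_a = np.cov(emb_a, rowvar=False)
    sigma_b = np.cov(emb_b, rowvar=False)

    # Tr(sqrt(sigma_a @ sigma_b)) is the sum of the square roots of the
    # eigenvalues of the product. They are real and non-negative in exact
    # arithmetic, so small negative values from rounding are clipped.
    eigvals = np.linalg.eigvals(sigma_a @ sigma_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_a - mu_b
    return float(
        diff @ diff + np.trace(sigma_a) + np.trace(sigma_b) - 2.0 * trace_sqrt
    )


# Random stand-ins for embeddings of real and generated audio (assumption).
rng = np.random.default_rng(0)
real_emb = rng.normal(size=(200, 16))
fake_emb = rng.normal(loc=0.5, size=(200, 16))

print(frechet_distance(real_emb, real_emb))  # approximately zero, up to rounding
print(frechet_distance(real_emb, fake_emb))  # larger, since the mean is shifted
```

In an actual evaluation the two arrays would hold embeddings of reference and generated audio. With only a handful of clips the covariance estimates are poorly conditioned, so scores from a dataset this small should be read with caution.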

Papers

  • GAN-TTS - Used to implement generator and discriminator
  • Tacotron - Used to implement CBHG module

External code
