forked from coqui-ai/TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

AI-Unicamp/TTS

🧑‍🎤 Expressive Text-to-Speech

This repository is a fork of Coqui-AI's 🐸TTS, used by our AI-Unicamp-CPQD group for research on expressive TTS. The original code is kept in the "main" branch, which is not the default branch displayed here.

We use the "unicamp" branch as our default branch, while the "main" branch remains the original, kept up to date. You can see the original README.md here.

🔍 About the group

We are an expressive TTS research group located at Unicamp and CPQD (Brazil).

🔨 Implementations

Expressive Models

  • Tacotron 2
  • FastPitch

Expressive Datasets

  • EMOVDB
  • IEMOCAP
  • ESD

Style Encoders

  • Look-Up
  • Reference Encoder (Coarse/Fine-Grained)
  • GST
  • VAE
  • VQ-VAE
  • VAE+Flow
  • Diffusion

Disentanglement Blocks

  • Style Classifier
  • Speaker Classifier + GRL (Gradient Reversal Layer)
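
As a rough illustration of the Gradient Reversal Layer listed above: it acts as the identity on the forward pass and multiplies the incoming gradient by a negative factor on the backward pass, so the encoder is pushed away from features the speaker classifier can exploit. The sketch below uses plain Python with a hand-written backward step instead of an autograd framework. The class name and the `scale` parameter are hypothetical and are not taken from this repository.

```python
class GradientReversal:
    """Identity in the forward pass; negated, scaled gradient in the backward pass."""

    def __init__(self, scale=1.0):
        # Hypothetical knob (often written as lambda) controlling reversal strength.
        self.scale = scale

    def forward(self, features):
        # Features pass through unchanged.
        return list(features)

    def backward(self, grad_output):
        # The gradient flowing back to the encoder is flipped and scaled.
        return [-self.scale * g for g in grad_output]


grl = GradientReversal(scale=0.5)
encoder_features = grl.forward([1.0, -2.0])
encoder_grad = grl.backward([2.0, -4.0])
```

In a real model the same behavior would normally be expressed as a custom autograd function, placed between the style encoder and the speaker classifier.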

Style Reference Features

  • Pitch
  • Energy
  • Mel-Spectrogram

Aggregation Types

  • Sum, Concat or AdaIN
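
To make the three aggregation types concrete, here is a minimal pure-Python sketch of how a style vector could be merged into a sequence of encoder frames. It is an illustration only: the function names are hypothetical, and it assumes the AdaIN scale (`gamma`) and shift (`beta`) have already been derived from the style embedding, for example by a learned projection. The repository's own implementation may differ.

```python
import math


def aggregate_sum(content, style):
    """Add the style vector to every frame (dimensions must match)."""
    return [[c + s for c, s in zip(frame, style)] for frame in content]


def aggregate_concat(content, style):
    """Append the style vector to every frame along the channel axis."""
    return [list(frame) + list(style) for frame in content]


def aggregate_adain(content, gamma, beta, eps=1e-5):
    """Normalize each channel over time, then scale and shift it per channel."""
    n_frames, n_channels = len(content), len(content[0])
    out = [[0.0] * n_channels for _ in range(n_frames)]
    for ch in range(n_channels):
        values = [frame[ch] for frame in content]
        mean = sum(values) / n_frames
        var = sum((v - mean) ** 2 for v in values) / n_frames
        std = math.sqrt(var + eps)
        for t in range(n_frames):
            out[t][ch] = gamma[ch] * (values[t] - mean) / std + beta[ch]
    return out


content = [[1.0, 2.0], [3.0, 6.0]]  # two frames, two channels
style = [10.0, 20.0]
summed = aggregate_sum(content, style)
concatenated = aggregate_concat(content, style)
adapted = aggregate_adain(content, gamma=[2.0, 1.0], beta=[5.0, -1.0])
```

Sum keeps the feature size unchanged, Concat grows it by the style dimension, and AdaIN replaces the per-channel statistics of the content with style-dependent ones.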

Enhancing Losses

  • Orthogonal Loss
  • CLIP Loss
  • Cycle consistency Loss(*)
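
This README does not define the Orthogonal Loss, so the following is only a sketch of one common formulation: the squared Frobenius norm of the cross-product between a batch of style embeddings and a batch of speaker embeddings, which is zero when the two sets of features are uncorrelated across the batch. The function name and the choice of formulation are assumptions, not the repository's code.

```python
def orthogonal_loss(style_batch, speaker_batch):
    """Squared Frobenius norm of S^T P for S (batch x Ds) and P (batch x Dp)."""
    d_style = len(style_batch[0])
    d_speaker = len(speaker_batch[0])
    total = 0.0
    for i in range(d_style):
        for j in range(d_speaker):
            # Entry (i, j) of S^T P: correlation of style dim i with speaker dim j.
            dot = sum(s[i] * p[j] for s, p in zip(style_batch, speaker_batch))
            total += dot * dot
    return total


# Embeddings that never co-activate give zero loss.
disjoint = orthogonal_loss([[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]])
# Embeddings that co-activate are penalized.
overlapping = orthogonal_loss([[1.0], [2.0]], [[3.0], [4.0]])
```

Minimizing a term of this kind alongside the main TTS loss encourages the style and speaker representations to carry different information.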
