Support all Types of Languages

@Flux9665 Flux9665 released this 20 May 10:04
· 27 commits to Multi_Language_Multi_Speaker since this release
1ae0202

This release extends the toolkit's functionality and provides new checkpoints.

New Features:

  • support for all phonemes in the IPA standard through an extended lookup of articulatory features
  • support for some suprasegmental markers in the IPA standard through parsing (tone, lengthening, primary stress)
  • praat-parselmouth for greatly improved pitch extraction
  • faster phonemization
  • word boundaries are added, which are invisible to the aligner and the decoder, but can help the encoder in multilingual scenarios
  • tonal languages added, tested and included in the pretraining (Chinese, Vietnamese)
  • Scorer class to inspect data given a trained model and dataset cache (provided pretrained models can be used for this)
  • intuitive controls for scaling durations and variance in pitch and energy
  • various bugfixes and speed increases
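
To illustrate what scaling durations and the variance in pitch and energy can mean in practice, here is a minimal sketch in plain Python. It is not the toolkit's implementation: the function names `scale_variance` and `scale_durations` are hypothetical, and treating a value of 0.0 as an unvoiced frame that is left untouched is an assumption made for this example.

```python
def scale_variance(values, factor):
    """Stretch or compress values around their mean (illustrative only).

    Assumption: 0.0 marks an unvoiced frame, which is excluded from the
    mean and left unchanged. A factor of 1.0 changes nothing, a factor
    below 1.0 flattens the contour, a factor above 1.0 exaggerates it.
    """
    voiced = [v for v in values if v != 0.0]
    if not voiced:
        return list(values)
    mean = sum(voiced) / len(voiced)
    return [0.0 if v == 0.0 else mean + (v - mean) * factor for v in values]


def scale_durations(durations, factor):
    """Scale per-phoneme frame counts (illustrative only).

    Assumption: a phoneme with a nonzero duration keeps at least one
    frame, and zero durations stay zero.
    """
    return [max(1, round(d * factor)) if d > 0 else 0 for d in durations]
```

With this sketch, a factor of 0.0 for the variance collapses all voiced frames onto the mean, which gives a monotone contour, while a duration factor above 1.0 slows the speech down.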

Note:

  • This release breaks backwards compatibility. Make sure you are using the associated pretrained models. Old checkpoints and dataset caches are no longer compatible. Only HiFiGAN remains compatible.
  • Work on upcoming releases is already in progress. Improved voice adaptation will be our next goal.
  • To use the pretrained checkpoints, download them, create their corresponding directories and place them into your clone as follows (you have to rename the HiFiGAN and FastSpeech2 checkpoints once in place):
...
Models
└─ Aligner
      └─ aligner.pt
└─ FastSpeech2_Meta
      └─ best.pt
└─ HiFiGAN_combined
      └─ best.pt
...
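
The placement and renaming step described above can also be scripted. The following is a rough sketch, not part of the toolkit: the function name `place_checkpoints` and the keys of the mapping are hypothetical, and the paths of the downloaded files are whatever your downloads happen to be called. Only the target directory and file names are taken from the layout shown above.

```python
import shutil
from pathlib import Path

# Target locations relative to the clone, as listed in the release notes.
TARGETS = {
    "aligner": Path("Models") / "Aligner" / "aligner.pt",
    "fastspeech2": Path("Models") / "FastSpeech2_Meta" / "best.pt",
    "hifigan": Path("Models") / "HiFiGAN_combined" / "best.pt",
}


def place_checkpoints(repo_root, downloads):
    """Move downloaded checkpoints into the expected layout.

    repo_root: path to your clone of the toolkit.
    downloads: dict mapping a key from TARGETS to the path of the
               downloaded file (names are up to you; hypothetical here).
    Returns the list of final paths.
    """
    repo_root = Path(repo_root)
    placed = []
    for key, source in downloads.items():
        target = repo_root / TARGETS[key]
        # Create Models/<subdirectory> if it does not exist yet.
        target.parent.mkdir(parents=True, exist_ok=True)
        # Moving to the target path also performs the rename.
        shutil.move(str(source), str(target))
        placed.append(target)
    return placed
```

A call could look like `place_checkpoints("path/to/clone", {"hifigan": "path/to/downloaded_vocoder.pt"})`, where both paths are placeholders for your own.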