
CTTS: Controllable Text-To-Speech

Links

Controllable TTS - Grader (Github)

Controllable TTS - Grader (Gitee)

Controllable TTS (Gitee)

Quickstart

GUI

Start the Gradio GUI with

python .\gui.py

Dependencies

You can install the Python dependencies with

pip3 install -r requirements.txt

Attention: Gradio's matplotlib version requirement conflicts with FastSpeech2's. Install Gradio first, then roll matplotlib back to 3.2.2.
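One possible install order following the note above (the 3.2.2 pin comes from the README; run these in the repository root so requirements.txt is found):

```shell
# Install the project dependencies, then Gradio, then pin matplotlib back
pip3 install -r requirements.txt
pip3 install gradio
pip3 install matplotlib==3.2.2
```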

Inference

Arrange the config files in the following structure:

  .
  └── config
      └── DATASET_NAME
          ├── model.yaml
          ├── preprocess.yaml
          └── train.yaml
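The layout above can be scaffolded from the shell; MyDataset stands in for your dataset name, and the empty YAML files are placeholders to be filled in:

```shell
# Create the config directory for a dataset and stub out the three config files
mkdir -p config/MyDataset
touch config/MyDataset/model.yaml config/MyDataset/preprocess.yaml config/MyDataset/train.yaml
ls config/MyDataset
```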

The model uses the ESD_en dataset by default. For English single-speaker TTS, run

python .\synthesize.py -t "YOUR_CONTENT"

There are optional parameters; you can check out the details with --help or read the code in synthesize.py:

python .\synthesize.py --help

Here are some commonly used parameters:

-m or --model: name of the model to use.

-s or --speaker_id: specify the speaker id in multi-speaker datasets.

-e or --emotion_id: specify the emotion id in multi-emotion datasets.

-r or --restore_step: load the model from a particular checkpoint.
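Putting the flags together, a multi-speaker, multi-emotion synthesis call might look like this (the ids and checkpoint step are illustrative values, not defaults from the repository):

```shell
python .\synthesize.py -t "YOUR_CONTENT" -m ESD_en -s 1 -e 2 -r 100000
```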

The generated utterances will be saved to output/result/.

Training

Preprocess

Preprocess the dataset with the following command:

python .\preprocess.py -m ESD_en

The TextGrid files generated by MFA (Montreal Forced Aligner) should be put in ./preprocessed_data/DATASET_NAME/

Train

Train the model with the following command:

python .\train.py -m ESD_en

Configure the pretrained model path with the -pp or -pretrain_path parameter.
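For example, resuming training from a pretrained checkpoint might look like this (the checkpoint path is hypothetical):

```shell
python .\train.py -m ESD_en -pp output\ckpt\ESD_en
```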

About

Controllable Text-to-speech system, based on FastSpeech2
