Liangzheng-ZL/BEdit-TTS

BEDIT-TTS: TEXT-BASED SPEECH EDITING SYSTEM WITH BIDIRECTIONAL TRANSFORMERS

This repository contains the open-source code for our paper, "BEdit-TTS: Text-Based Speech Editing System with Bidirectional Transformers". Speech samples are available at https://liangzheng-zl.github.io/bedit-web

The model code is at espnet/nets/pytorch_backend/e2e_tts_bedit.py

Set up

The system is built on ESPnet, so install ESPnet before running the model. The model requires Python 3.7+ and PyTorch 1.10+. Other packages are listed in requirements.yaml.
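
As a quick sanity check of the version requirements above, a minimal sketch like the following can be run in the target environment. The helper names `parse_version` and `meets_minimum` are illustrative and not part of this repository; the script only reports what it finds and does not verify the ESPnet installation.

```python
import sys


def parse_version(text):
    """Return the leading (major, minor) integers of a version string.

    Local build suffixes such as "+cu113" are ignored.
    """
    core = text.split("+", 1)[0]
    parts = core.split(".")
    return int(parts[0]), int(parts[1])


def meets_minimum(found, required):
    """True if version string `found` is at least `required` (major.minor)."""
    return parse_version(found) >= parse_version(required)


if __name__ == "__main__":
    python_version = "{}.{}".format(*sys.version_info[:2])
    print("Python", python_version, "ok:", meets_minimum(python_version, "3.7"))
    try:
        import torch  # only needed for the PyTorch check
    except ImportError:
        print("PyTorch is not installed")
    else:
        print("PyTorch", torch.__version__, "ok:",
              meets_minimum(torch.__version__, "1.10"))
```

Comparing parsed integer tuples rather than raw strings matters here, because a plain string comparison would rank "1.9" above "1.10".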

Data

To obtain duration information, you can use Kaldi to train a GMM-HMM model and run forced alignment.
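
To illustrate what "duration information" means here, the sketch below converts phone-level alignments into per-phone frame counts. It assumes the alignment has been exported in Kaldi's CTM layout (`utterance channel start duration phone`, times in seconds) and assumes a 10 ms frame shift; both are assumptions, since this README does not specify the format that the data preparation scripts expect. The function name `durations_from_ctm` is hypothetical.

```python
from collections import defaultdict


def durations_from_ctm(lines, frame_shift=0.01):
    """Map each utterance to a list of (phone, n_frames) pairs.

    Each line is assumed to look like: "utt1 1 0.05 0.10 AH"
    (utterance, channel, start time, duration, phone label).
    """
    durations = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        utt, _channel, start, dur, phone = fields[:5]
        start, dur = float(start), float(dur)
        # Round the boundaries, not the length, so that the frame counts
        # of consecutive phones add up to the utterance length.
        first = round(start / frame_shift)
        last = round((start + dur) / frame_shift)
        durations[utt].append((phone, last - first))
    return dict(durations)


example = [
    "utt1 1 0.00 0.05 sil",
    "utt1 1 0.05 0.10 AH",
    "utt1 1 0.15 0.07 T",
]
print(durations_from_ctm(example))
```

The frame shift should match the hop size used for feature extraction, otherwise the frame counts will not line up with the spectrogram frames.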

To prepare the data for BEdit-TTS:

bash run.sh --stage 0 --stop_stage 0
bash pre_bedit_data.sh --stage 1 --stop_stage 2 # for training data
# bash pre_bedit_data.sh --stage 1 --stop_stage 4 # for decoding data

To apply CMVN:

bash run.sh --stage 1 --stop_stage 1
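
For readers unfamiliar with the term, CMVN (cepstral mean and variance normalization) shifts and scales each feature dimension to zero mean and unit variance. The following is a minimal pure-Python sketch of that idea over a small list of frames; it is not the code that stage 1 runs, and the function name `apply_cmvn` is illustrative. In ESPnet-style recipes the statistics are usually computed once over the training set rather than per utterance.

```python
import math


def apply_cmvn(frames, eps=1e-8):
    """Normalize each feature dimension to zero mean and unit variance.

    `frames` is a list of equal-length feature vectors (one per frame).
    """
    n_frames = len(frames)
    n_dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n_frames for d in range(n_dims)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n_frames)
        for d in range(n_dims)
    ]
    # eps guards against division by zero for constant dimensions
    return [
        [(f[d] - means[d]) / max(stds[d], eps) for d in range(n_dims)]
        for f in frames
    ]


features = [[1.0, 2.0], [3.0, 6.0]]
print(apply_cmvn(features))
```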

To prepare the dictionary and JSON data:

bash run.sh --stage 2 --stop_stage 2

To update the JSON data:

bash run.sh --stage 3 --stop_stage 3

Run

To train the model:

bash run.sh --stage 4 --stop_stage 4

To generate the spectrogram:

bash run.sh --stage 5 --stop_stage 5

The waveform can then be synthesized from the generated spectrogram with a pre-trained HiFiGAN vocoder.
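
The stage-by-stage commands in this README can be chained from a small driver. The sketch below only assembles the commands listed above, in the same order, and prints them unless `dry_run=False`; the function names `stage_command`, `build_commands`, and `run_pipeline` are illustrative, and it assumes the scripts are invoked from the directory that contains run.sh and pre_bedit_data.sh.

```python
import subprocess


def stage_command(script, start, stop):
    """Build one `bash <script> --stage A --stop_stage B` command."""
    return ["bash", script, "--stage", str(start), "--stop_stage", str(stop)]


def build_commands(prep_stop_stage=2):
    """Return the README commands in order, from data preparation to generation.

    prep_stop_stage is the last pre_bedit_data.sh stage: the README uses
    2 for training data and 4 for decoding data.
    """
    commands = [
        stage_command("run.sh", 0, 0),
        stage_command("pre_bedit_data.sh", 1, prep_stop_stage),
    ]
    # run.sh stages: 1 CMVN, 2 dictionary/JSON, 3 JSON update,
    # 4 training, 5 spectrogram generation
    commands += [stage_command("run.sh", s, s) for s in range(1, 6)]
    return commands


def run_pipeline(prep_stop_stage=2, dry_run=True):
    """Print each command; execute it as well when dry_run is False."""
    for command in build_commands(prep_stop_stage):
        print(" ".join(command))
        if not dry_run:
            # check=True stops the pipeline at the first failing stage
            subprocess.run(command, check=True)


if __name__ == "__main__":
    run_pipeline(dry_run=True)
```

Calling `run_pipeline(dry_run=False)` would execute the stages one after another and raise an error at the first stage that fails, which avoids training on incompletely prepared data.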
