Implementation of FastSpeech Text-to-Speech model trained on LJSpeech.
Run the setup script:

    chmod +x setup.sh && ./setup.sh

To train the model:

    python3 train.py -c <config_file> [-r <resume_checkpoint>] [--lr <learning_rate>] [--bs <batch_size>]

If downloading the model fails, you can manually download the model weights from here and place them at resources/fastspeech.pth.
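After a manual download, it can help to confirm that the weights ended up where the scripts expect them. The snippet below is a minimal sketch, not part of the repository: the helper name `checkpoint_ready` is hypothetical, and the only thing taken from this README is the path resources/fastspeech.pth.

```python
from pathlib import Path


def checkpoint_ready(path="resources/fastspeech.pth"):
    """Return True if the checkpoint file exists and is not empty.

    Hypothetical helper: it only checks presence and size, so a
    truncated (but non-empty) download would still pass.
    """
    p = Path(path)
    return p.is_file() and p.stat().st_size > 0
```

Calling `checkpoint_ready()` from the repository root returns False if the file is missing or empty, which is a hint to repeat the download.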
The input for inference is a text file with one source sentence per line. If no file is provided, the default samples are used.

    python3 inference.py -c <config_file> -r <checkpoint> [-s <source_file>] [-t <target_directory>]

For example (will generate default samples):

    python3 inference.py -c configs/main.json -r resources/fastspeech.pth

Default samples:
- A defibrillator is a device that gives a high energy electric shock to the heart of someone who is in cardiac arrest
- Massachusetts Institute of Technology may be best known for its math, science and engineering education
- Wasserstein distance or Kantorovich Rubinstein metric is a distance function defined between probability distributions on a given metric space
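To synthesize your own sentences instead of the defaults, the source file needs one sentence per line. The following is a rough sketch of preparing such a file and assembling the matching inference command. The helper names `write_source_file` and `inference_command`, and the file names `my_sentences.txt` and `outputs`, are made up for illustration; only the `-c`, `-r`, `-s` and `-t` flags come from the usage line in this README.

```python
import shlex
from pathlib import Path


def write_source_file(sentences, path):
    """Write sentences to a text file, one per line.

    Internal whitespace (including stray newlines) is collapsed so that
    each sentence occupies exactly one line; empty entries are skipped.
    """
    lines = [" ".join(s.split()) for s in sentences]
    lines = [line for line in lines if line]
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


def inference_command(config, checkpoint, source=None, target=None):
    """Build the argument list for inference.py; optional flags are
    added only when a value is given."""
    cmd = ["python3", "inference.py", "-c", str(config), "-r", str(checkpoint)]
    if source is not None:
        cmd += ["-s", str(source)]
    if target is not None:
        cmd += ["-t", str(target)]
    return cmd


# Only builds and prints the command; nothing is executed or written here.
cmd = inference_command(
    "configs/main.json",
    "resources/fastspeech.pth",
    source="my_sentences.txt",  # hypothetical file name
    target="outputs",           # hypothetical output directory
)
print(shlex.join(cmd))
# prints: python3 inference.py -c configs/main.json -r resources/fastspeech.pth -s my_sentences.txt -t outputs
```

The returned list can be passed to `subprocess.run` from the repository root once the source file has been written with `write_source_file`.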
- configs/ contains the configs that were used to train the model
- data/ contains the data (LJSpeech is downloaded there by default) and the train/validation split (indices into the dataset)
- src/ contains the source code
- train.py is the training script (it downloads all needed data if it is not present)
- inference.py is the inference script, which takes a text file and writes audio files to a directory