
ShamerD/fast-speech


fast-speech

An implementation of the FastSpeech text-to-speech model, trained on LJSpeech.

Installation

chmod +x setup.sh && ./setup.sh

Usage

Train:

python3 train.py -c <config_file> [-r <resume_checkpoint>] [--lr <learning_rate>] [--bs <batch_size>]
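The optional flags in the usage line above suggest that the learning rate and batch size from the config can be overridden on the command line. The following is a minimal sketch of how such a command line could be parsed, not the repository's actual code. The long option names (`--config`, `--resume`), the config keys (`lr`, `batch_size`), and the rule that command-line values take precedence over the config are all assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown in the usage line."""
    parser = argparse.ArgumentParser(description="Train FastSpeech (illustrative sketch)")
    # Only -c and -r appear in the README; the long names are assumed.
    parser.add_argument("-c", "--config", required=True, help="path to the config file")
    parser.add_argument("-r", "--resume", default=None, help="checkpoint to resume from")
    parser.add_argument("--lr", type=float, default=None, help="learning rate override")
    parser.add_argument("--bs", type=int, default=None, help="batch size override")
    return parser


def apply_overrides(config: dict, args: argparse.Namespace) -> dict:
    """Return a copy of config with CLI overrides applied (key names are assumed)."""
    updated = dict(config)
    if args.lr is not None:
        updated["lr"] = args.lr
    if args.bs is not None:
        updated["batch_size"] = args.bs
    return updated


if __name__ == "__main__":
    args = build_parser().parse_args(["-c", "configs/main.json", "--lr", "0.001"])
    print(apply_overrides({"lr": 0.0003, "batch_size": 16}, args))
```

Leaving the override defaults at `None` keeps the config file as the single source of truth unless a flag is given explicitly.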

If the automatic model download fails, you can download the model weights manually from here and save them as resources/fastspeech.pth

Inference:

The input for inference is a text file with one source sentence per line. If no file is provided, the default samples are used.

python3 inference.py -c <config_file> -r <checkpoint> [-s <source_file>] [-t <target_directory>]

For example, the following command generates the default samples:

python3 inference.py -c configs/main.json -r resources/fastspeech.pth

Default samples:

  • A defibrillator is a device that gives a high energy electric shock to the heart of someone who is in cardiac arrest
  • Massachusetts Institute of Technology may be best known for its math, science and engineering education
  • Wasserstein distance or Kantorovich Rubinstein metric is a distance function defined between probability distributions on a given metric space
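To synthesize your own sentences instead of the defaults, the source file needs one sentence per line. The sketch below shows one way to prepare such a file. The helper names and the file name `my_sentences.txt` are hypothetical, and collapsing stray line breaks inside a sentence is a precaution assumed here, not something the repository is known to require.

```python
from pathlib import Path


def write_source_file(sentences, path):
    """Write one sentence per line, collapsing internal whitespace and skipping blanks."""
    lines = [" ".join(s.split()) for s in sentences]
    lines = [line for line in lines if line]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


def read_source_file(path):
    """Read the sentences back, ignoring empty lines."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    count = write_source_file(
        ["Hello world", "Text to speech\nwith a stray line break"],
        "my_sentences.txt",  # hypothetical file name
    )
    print(count, read_source_file("my_sentences.txt"))
```

The resulting file can then be passed with the -s option, for instance with a hypothetical output directory output/:

python3 inference.py -c configs/main.json -r resources/fastspeech.pth -s my_sentences.txt -t output/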

Project structure

  • configs/ contains the configs that were used to train the model
  • data/ contains the data (LJSpeech is downloaded there by default) and the train/validation split (stored as indices into the dataset)
  • src/ contains the source code
  • train.py is the training script (it downloads any required data that is not already present)
  • inference.py is the inference script; it takes a text file and writes audio files to a directory
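Storing a train/validation split as dataset indices, as mentioned for data/ above, keeps the split reproducible without copying any audio. The following is a minimal sketch of how such an index split could be generated and saved. The function name, the JSON layout, the 10% validation fraction, and the output file name are assumptions for illustration and may differ from the files shipped in the repository.

```python
import json
import random


def make_split(num_items, val_fraction=0.1, seed=0):
    """Shuffle indices 0..num_items-1 deterministically and cut off a validation part."""
    indices = list(range(num_items))
    random.Random(seed).shuffle(indices)
    num_val = int(num_items * val_fraction)
    return {"train": sorted(indices[num_val:]), "val": sorted(indices[:num_val])}


if __name__ == "__main__":
    split = make_split(13100)  # LJSpeech contains 13,100 clips
    # Hypothetical output file; the repository's own split format is not shown here.
    with open("split_example.json", "w", encoding="utf-8") as f:
        json.dump(split, f)
    print(len(split["train"]), len(split["val"]))
```

Using a fixed seed means the same indices land in the validation set on every run, so results stay comparable across training sessions.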
