A PyTorch implementation of a WaveRNN-based neural vocoder, which predicts a raw waveform from a mel-spectrogram.
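For orientation, here is a minimal sketch of what a WaveRNN-style vocoder can look like in PyTorch. The class name, layer sizes, hop length, and quantization depth below are illustrative assumptions, not the actual model defined in this repository.

```python
import torch
import torch.nn as nn

class WaveRNNSketch(nn.Module):
    """Hypothetical WaveRNN-style vocoder: a GRU conditioned on upsampled
    mel frames predicts a categorical distribution over quantized samples."""

    def __init__(self, n_mels=80, hop_length=200, rnn_size=896, bits=10):
        super().__init__()
        self.hop_length = hop_length   # audio samples per mel frame (assumed)
        self.n_classes = 2 ** bits     # e.g. 10-bit quantized amplitude
        self.rnn = nn.GRU(n_mels + 1, rnn_size, batch_first=True)
        self.fc = nn.Linear(rnn_size, self.n_classes)

    def forward(self, prev_samples, mels):
        # prev_samples: (batch, T) floats in [-1, 1]; mels: (batch, T_mel, n_mels)
        cond = mels.repeat_interleave(self.hop_length, dim=1)  # frame rate -> sample rate
        x = torch.cat([prev_samples.unsqueeze(-1), cond], dim=-1)
        hidden, _ = self.rnn(x)
        return self.fc(hidden)  # (batch, T, n_classes) logits
```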
The preprocessing code currently supports the following datasets:

- LJSpeech (en): https://keithito.com/LJ-Speech-Dataset/
Once the dataset is downloaded, run:

```bash
python preprocess.py \
  --dataset_dir <Path to the dataset dir (location where the dataset was downloaded)> \
  --out_dir <Path to the output dir (location where the processed dataset will be written)>
```
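What preprocess.py actually computes is defined by the repository; as a rough sketch under assumed parameters (22.05 kHz audio, 80 mel bins, hop length 200, 8-bit mu-law targets), per-utterance preprocessing for a vocoder like this typically produces a log-mel-spectrogram plus a quantized target waveform:

```python
import librosa
import numpy as np

def preprocess_utterance(wav_path, sr=22050, n_fft=2048,
                         hop_length=200, n_mels=80, mu=255):
    """Illustrative feature extraction for one utterance; all parameters are assumptions."""
    wav, _ = librosa.load(wav_path, sr=sr)
    wav = wav / np.abs(wav).max() * 0.999  # peak-normalize to avoid clipping
    mel = librosa.feature.melspectrogram(
        y=wav, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels)
    log_mel = np.log(np.maximum(mel, 1e-5))  # log compression with a floor
    # mu-law companding, then quantization to integer classes in [0, mu]
    compressed = np.sign(wav) * np.log1p(mu * np.abs(wav)) / np.log1p(mu)
    quantized = np.floor((compressed + 1) / 2 * mu + 0.5).astype(np.int64)
    return log_mel.T.astype(np.float32), quantized  # shapes (T_mel, n_mels), (T,)
```

A real preprocessing script would typically iterate over all utterances in the dataset and save these arrays (e.g. as .npy files) to the output dir.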
To train the model, run:

```bash
python train.py \
  --train_data_dir <Path to the dir containing the data to train the model> \
  --checkpoint_dir <Path to the dir where training checkpoints will be saved> \
  --resume_checkpoint_path <Optional: path to a checkpoint to load and resume training from>
```
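How train.py restores state from --resume_checkpoint_path depends on what it saved; a minimal sketch of the usual PyTorch resume pattern, with hypothetical checkpoint key names, looks like this:

```python
import torch

def load_checkpoint(checkpoint_path, model, optimizer):
    """Resume training state; the key names here are assumptions, not the repo's."""
    checkpoint = torch.load(checkpoint_path, map_location="cpu")
    model.load_state_dict(checkpoint["model"])
    optimizer.load_state_dict(checkpoint["optimizer"])
    return checkpoint.get("step", 0)  # global step to resume the training loop from
```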
To synthesize waveforms from mel-spectrograms, run:

```bash
python generate.py \
  --checkpoint_path <Path to the checkpoint used to instantiate the model> \
  --eval_data_dir <Path to the dir containing the mel-spectrograms to be synthesized> \
  --out_dir <Path to the dir where the generated waveforms will be saved>
```
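Generation from a WaveRNN-style model is autoregressive: each predicted sample is fed back as input for the next step. A hedged sketch, reusing the hypothetical WaveRNNSketch fields from above (rnn, fc, n_classes) and assuming mu-law targets:

```python
import torch

@torch.no_grad()
def generate(model, mel, hop_length=200):
    """Illustrative sampling loop; generate.py's actual logic may differ."""
    model.eval()
    cond = mel.repeat_interleave(hop_length, dim=1)  # (1, T_mel, n_mels) -> sample rate
    sample = torch.zeros(1, 1)                       # start from silence
    hidden, output, mu = None, [], model.n_classes - 1
    for t in range(cond.size(1)):
        x = torch.cat([sample, cond[:, t]], dim=-1).unsqueeze(1)  # (1, 1, n_mels + 1)
        h, hidden = model.rnn(x, hidden)
        dist = torch.distributions.Categorical(logits=model.fc(h.squeeze(1)))
        compressed = dist.sample().float() / mu * 2 - 1           # class index -> [-1, 1]
        # mu-law expansion back to the raw waveform domain
        sample = (torch.sign(compressed)
                  * ((1 + mu) ** compressed.abs() - 1) / mu).unsqueeze(1)
        output.append(sample)
    return torch.cat(output, dim=1).squeeze(0)  # raw waveform, values in [-1, 1]
```

The returned tensor could then be written to disk with e.g. soundfile.write(path, wav.numpy(), sr).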
The code in this repository is based on the following papers:
- arXiv:1802.08435: Efficient Neural Audio Synthesis
- arXiv:1811.06292v2: Towards Achieving Robust Universal Neural Vocoding