PyTorch implementation of the MultiBand-WaveRNN model from "Efficient Neural Audio Synthesis" and "Duration Informed Attention Network for Multimodal Synthesis" (DurIAN).
RAW mode and unbatched generation are supported. Contributions implementing MOL mode are welcome.
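To make the distinction concrete for would-be contributors, here is a minimal sketch of the two sampling schemes. This is an illustration only, not this repo's code: the function names, tensor shapes, and the `bits`/`num_mixtures` defaults are assumptions.

```python
import torch
import torch.nn.functional as F

def sample_raw(logits, bits=9):
    # RAW mode: the network predicts a distribution over 2**bits quantized
    # amplitude classes; sample one class, then map it back to [-1, 1].
    probs = F.softmax(logits, dim=-1)                    # (2**bits,)
    cls = torch.multinomial(probs, num_samples=1)        # categorical draw
    return 2 * cls.float() / (2 ** bits - 1) - 1

def sample_mol(params, num_mixtures=10):
    # MOL mode: parameters of a mixture of logistics (weight, mean, and
    # log-scale per component); pick a component, then inverse-CDF sample.
    logit_pi, mean, log_scale = params.chunk(3, dim=-1)  # (K,) each
    k = torch.multinomial(F.softmax(logit_pi, dim=-1), 1)
    mu, s = mean[k], log_scale[k].exp()
    u = torch.rand_like(mu).clamp(1e-5, 1 - 1e-5)
    return mu + s * (torch.log(u) - torch.log1p(-u))
```

For example, `sample_raw(torch.randn(512))` draws one 9-bit sample from random logits (2**9 = 512 classes).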
Ensure you have:
- Python >= 3.6
- PyTorch 1.x with CUDA
Then install the rest with pip:
pip install -r requirements.txt
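A quick way to confirm the environment meets the prerequisites above (a minimal sanity check; assumes PyTorch is already installed):

```python
import sys
import torch

# Verify the prerequisites listed above.
assert sys.version_info >= (3, 6), "Python >= 3.6 required"
assert int(torch.__version__.split(".")[0]) >= 1, "PyTorch 1.x required"
print("CUDA available:", torch.cuda.is_available())  # should print True
```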
Download the LJSpeech Dataset.
Edit hparams.py, point wav_path to your dataset and run:
python preprocess.py
or use preprocess.py --path to point directly to the dataset
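For example, the relevant setting in hparams.py might look like this (the path itself is a placeholder; adjust it to your setup):

```python
# hparams.py -- point wav_path at the folder containing the LJSpeech .wav files
wav_path = '/path/to/LJSpeech-1.1/wavs/'  # placeholder path, not a real location
```

Equivalently, python preprocess.py --path /path/to/LJSpeech-1.1/wavs/ skips the hparams.py edit.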
Here's my recommendation on what order to run things:
1 - Train WaveRNN with:
python train_wavernn.py
2 - Generate sentences with the trained model using:
python gen_wavernn.py
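Conceptually, unbatched RAW-mode generation boils down to an autoregressive loop like the sketch below. This is my own illustration under stated assumptions, not the repo's API: model.step, model.init_hidden, and the one-sample-per-mel-frame pacing are all hypothetical (the real code upsamples the mel so each frame conditions many audio samples).

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def generate_unbatched(model, mel, bits=9):
    # Hypothetical sketch: generate one sample at a time, each conditioned
    # on the previous output sample and the current mel frame.
    model.eval()
    hidden = model.init_hidden()          # assumed helper, not the repo's API
    x = torch.zeros(1)                    # previous sample; start from silence
    out = []
    for frame in mel.unbind(-1):          # mel assumed shaped (n_mels, T)
        logits, hidden = model.step(x, frame, hidden)  # assumed API
        cls = torch.multinomial(F.softmax(logits, dim=-1), 1)
        x = 2 * cls.float() / (2 ** bits - 1) - 1      # class index -> [-1, 1]
        out.append(x)
    return torch.cat(out)
```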
Speaker | Recording | WaveRNN | Parallel WaveGAN | FB MelGAN | SingVocoder |
---|---|---|---|---|---|
#1 | |||||
#2 | |||||
#3 | |||||
#4 | |||||
#5 |||||