# Robust Universal Neural Vocoding

A PyTorch implementation of Robust Universal Neural Vocoding. Audio samples can be found here.

![network](network.png)

## Quick Start

1. Ensure you have Python 3 and PyTorch 1.0 or newer.

2. Clone the repo:

   ```
   git clone https://github.com/bshall/UniversalVocoding
   cd ./UniversalVocoding
   ```

3. Install requirements:

   ```
   pip install -r requirements.txt
   ```

4. Download and extract the ZeroSpeech2019: TTS without T English dataset:

   ```
   wget https://download.zerospeech.com/2019/english.tgz
   tar -xvzf english.tgz
   ```

5. Extract Mel spectrograms and preprocess audio (a sketch of this step follows the list):

   ```
   python preprocess.py
   ```

6. Train the model:

   ```
   python train.py
   ```

7. Generate:

   ```
   python generate.py --checkpoint=/path/to/checkpoint.pt --wav-path=/path/to/wav.wav
   ```
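
As a rough illustration of what `preprocess.py` computes, here is a minimal sketch of Mel spectrogram extraction at 16 kHz using librosa. The parameter values (`NUM_MELS`, `HOP_LENGTH`) are assumptions for illustration only; the actual settings are defined in `config.json`.

```python
import librosa
import numpy as np

SAMPLE_RATE = 16000  # the repo trains on 16 kHz audio
NUM_MELS = 80        # assumed value; the real one lives in config.json
HOP_LENGTH = 200     # assumed value; the real one lives in config.json

def extract_mel(wav_path):
    # Load the waveform at the target sample rate and peak-normalise it.
    wav, _ = librosa.load(wav_path, sr=SAMPLE_RATE)
    wav = wav / np.abs(wav).max() * 0.999

    # Compute the Mel spectrogram the vocoder is conditioned on, in log scale.
    mel = librosa.feature.melspectrogram(
        y=wav, sr=SAMPLE_RATE, n_mels=NUM_MELS, hop_length=HOP_LENGTH
    )
    return np.log(mel + 1e-5)
```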

## Pretrained Models

Pretrained weights for the 9-bit model are available here.
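
For a quick check of the downloaded weights, something along these lines should work. This is only a sketch: the `Vocoder` class name and the `"model"` checkpoint key are assumptions about `model.py` and the checkpoint layout; `generate.py` contains the authoritative loading code.

```python
import torch

from model import Vocoder  # assumed class name; see model.py

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = Vocoder().to(device)

# Load the downloaded 9-bit checkpoint; "model" is an assumed key.
checkpoint = torch.load("checkpoint.pt", map_location=device)
model.load_state_dict(checkpoint["model"])
model.eval()
```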

## Notable Differences from the Paper

  1. Trained on 16kHz audio from 102 different speakers (ZeroSpeech 2019: TTS without T English dataset)
  2. The model generates 9-bit mu-law audio (planning on training a 10-bit model soon)
  3. Uses an embedding layer instead of one-hot encoding (see the sketch after this list)
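
To make differences 2 and 3 concrete, here is a minimal sketch of 9-bit mu-law quantisation and of feeding the quantised samples through an embedding layer rather than one-hot vectors. The embedding size of 256 is illustrative, not necessarily the value this repo uses.

```python
import numpy as np
import torch
import torch.nn as nn

BITS = 9
MU = 2 ** BITS - 1  # 511 for the 9-bit model

def mulaw_encode(wav, mu=MU):
    # Standard mu-law companding of a [-1, 1] waveform, then
    # quantisation to 2**BITS discrete levels (0 .. mu).
    compressed = np.sign(wav) * np.log1p(mu * np.abs(wav)) / np.log1p(mu)
    return ((compressed + 1) / 2 * mu + 0.5).astype(np.int64)

# Quantised samples index a learned embedding table instead of being
# expanded into one-hot vectors of size 2**BITS.
embedding = nn.Embedding(2 ** BITS, 256)  # 256 is an illustrative size
samples = torch.from_numpy(mulaw_encode(np.random.uniform(-1.0, 1.0, 100)))
inputs = embedding(samples)  # shape: (100, 256)
```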

## Acknowledgements
