DillWave is a fast, high-quality neural vocoder and waveform synthesizer.

dillfrescott/dillwave

DillWave

DillWave is a fast, high-quality neural vocoder and waveform synthesizer. It starts with Gaussian noise and converts it into speech via iterative refinement. The speech can be controlled by providing a conditioning signal (e.g., a log-scaled Mel spectrogram). The model and architecture details are described in DiffWave: A Versatile Diffusion Model for Audio Synthesis.
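To make "iterative refinement" concrete, here is a minimal, standard-library-only sketch of the reverse (denoising) loop described in the DiffWave paper. It is not DillWave's actual code: `predict_noise` stands in for the trained network, `dummy_predict_noise` is a hypothetical placeholder so the loop can run, and the schedule values are only illustrative.

```python
import math
import random


def linear_beta_schedule(steps, beta_start=1e-4, beta_end=0.05):
    """Evenly spaced noise variances beta_1..beta_T (illustrative values)."""
    if steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (steps - 1)
    return [beta_start + i * step for i in range(steps)]


def sample(predict_noise, conditioning, length, betas, seed=0):
    """Run the reverse diffusion process from pure noise to a waveform."""
    rng = random.Random(seed)
    alphas = [1.0 - b for b in betas]

    # Cumulative products alpha_bar_t = alpha_1 * ... * alpha_t.
    alpha_bars = []
    running = 1.0
    for a in alphas:
        running *= a
        alpha_bars.append(running)

    # Start from Gaussian noise, one value per output sample.
    x = [rng.gauss(0.0, 1.0) for _ in range(length)]

    for t in reversed(range(len(betas))):
        # The network estimates the noise present in x at step t.
        eps = predict_noise(x, conditioning, t)
        c1 = 1.0 / math.sqrt(alphas[t])
        c2 = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        x = [c1 * (xi - c2 * ei) for xi, ei in zip(x, eps)]
        if t > 0:
            # Re-inject a smaller amount of noise on every step but the last.
            sigma = math.sqrt(
                (1.0 - alpha_bars[t - 1]) / (1.0 - alpha_bars[t]) * betas[t]
            )
            x = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]
    return x


def dummy_predict_noise(x, conditioning, t):
    """Hypothetical placeholder for the trained model (ignores conditioning)."""
    return list(x)


audio = sample(
    dummy_predict_noise,
    conditioning=None,
    length=16,
    betas=linear_beta_schedule(50),
)
```

In the real model the noise estimate depends on the conditioning spectrogram, which is what steers the output toward the desired speech.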

Credit to the original repo here.

Recommended Requirements

An NVIDIA GPU in the RTX 30-series or 40-series range.

For training, 16+ GB of VRAM is recommended. For inference, at least 4 GB of VRAM is recommended.
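If you are unsure what your card offers, a small check like the following may help. It is only a sketch: `vram_tier` and `detect_vram_bytes` are hypothetical helper names, the thresholds are the recommendations above, and the reported size is rounded to the nearest GiB on the assumption that drivers report slightly less than the nominal capacity.

```python
GIB = 1024 ** 3


def vram_tier(total_bytes):
    """Classify a VRAM size against the recommendations in this README."""
    # Rounded because a nominal "16 GB" card often reports slightly less.
    gib = round(total_bytes / GIB)
    if gib >= 16:
        return "training"
    if gib >= 4:
        return "inference"
    return "below recommendation"


def detect_vram_bytes():
    """Return the total memory of GPU 0, or None if it cannot be queried."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory


detected = detect_vram_bytes()
if detected is None:
    print("No CUDA GPU detected (or PyTorch is not installed).")
else:
    print(f"{detected / GIB:.1f} GiB VRAM: suitable for {vram_tier(detected)}")
```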

Install

First install PyTorch (the GPU build is recommended). You also need Python; version 3.10.x is recommended for dillwave.

As a package:

pip install dillwave

From GitHub:

git clone https://github.com/dillfrescott/dillwave
pip install -e dillwave

or

pip install git+https://github.com/dillfrescott/dillwave

You need Git installed for either of these "From GitHub" install methods to work.

Training

python -m dillwave.preprocess /path/to/dir/containing/wavs # expects 48000 Hz, 1 channel; clips of about 8 seconds are recommended
python -m dillwave /path/to/model/dir /path/to/dir/containing/wavs

# in another shell to monitor training progress:
tensorboard --logdir /path/to/model/dir --bind_all
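Before preprocessing, it can be worth confirming that the clips match the expected format. The sketch below uses only Python's standard `wave` module, so it handles plain PCM WAV files and may reject other encodings. `inspect_wav` and `check_dataset` are hypothetical helpers, not part of dillwave, and the recursive search is an assumption about how the dataset directory is laid out.

```python
import wave
from pathlib import Path

EXPECTED_RATE = 48000      # Hz, as stated in the preprocess comment
EXPECTED_CHANNELS = 1      # mono
RECOMMENDED_SECONDS = 8.0  # a recommendation, so it is reported, not enforced


def inspect_wav(path):
    """Return (problems, duration_in_seconds) for one PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        seconds = wav.getnframes() / rate
    problems = []
    if rate != EXPECTED_RATE:
        problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_RATE} Hz")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"{channels} channels, expected {EXPECTED_CHANNELS}")
    return problems, seconds


def check_dataset(directory):
    """Map each non-conforming .wav file under directory to its problems."""
    report = {}
    for path in sorted(Path(directory).rglob("*.wav")):
        problems, _seconds = inspect_wav(path)
        if problems:
            report[str(path)] = problems
    return report
```

For example, `check_dataset("/path/to/dir/containing/wavs")` returns an empty dictionary when every clip is 48000 Hz mono.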

Inference CLI

python -m dillwave.inference /path/to/model --spectrogram_path /path/to/spectrogram -o output.wav [--fast]
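To vocode many spectrograms from a script, the same command can be assembled programmatically. This sketch only wraps the CLI shown above: `inference_command` and `run_inference` are hypothetical helpers, and the flags are exactly those in the command line, nothing more.

```python
import subprocess
import sys


def inference_command(model, spectrogram, output, fast=False):
    """Build the argument list for the inference CLI shown above."""
    cmd = [
        sys.executable, "-m", "dillwave.inference",
        str(model),
        "--spectrogram_path", str(spectrogram),
        "-o", str(output),
    ]
    if fast:
        cmd.append("--fast")
    return cmd


def run_inference(model, spectrogram, output, fast=False):
    """Run the CLI in a subprocess and raise if it exits with an error."""
    subprocess.run(inference_command(model, spectrogram, output, fast), check=True)
```

Passing an argument list instead of a shell string avoids quoting problems when paths contain spaces.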

Pretrained models are going to be released here.
