Multi-band MelGAN

Unofficial PyTorch implementation of Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech.

Audio samples are available on the project demo page.

Model

I use Identity as the shortcut connection (instead of Linear) in the residual blocks and don't use biases, so my implementation has slightly fewer parameters than described in the paper (1.52M vs. 1.91M).
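To illustrate where the savings come from, here is a rough parameter count for a single residual block. The block layout (a 3-tap dilated convolution followed by a 1x1 convolution, with the "Linear" shortcut modeled as a 1x1 convolution), the channel width of 64, and the helper names are assumptions made for illustration; they are not taken from this repository's code, and weight-normalization parameters are ignored.

```python
def conv1d_params(in_ch, out_ch, kernel_size, bias=True):
    """Number of learnable parameters in a 1-D convolution."""
    return in_ch * out_ch * kernel_size + (out_ch if bias else 0)


def residual_block_params(channels, linear_shortcut, bias):
    """Parameters of one (assumed) residual block: 3-tap conv, 1x1 conv, shortcut."""
    main = conv1d_params(channels, channels, 3, bias) + conv1d_params(channels, channels, 1, bias)
    # A "Linear" shortcut is modeled as a 1x1 convolution; Identity has no parameters.
    shortcut = conv1d_params(channels, channels, 1, bias) if linear_shortcut else 0
    return main + shortcut


if __name__ == "__main__":
    channels = 64  # hypothetical width, chosen only for the example
    with_linear = residual_block_params(channels, linear_shortcut=True, bias=True)
    with_identity = residual_block_params(channels, linear_shortcut=False, bias=False)
    print(f"Linear shortcut + biases:    {with_linear}")
    print(f"Identity shortcut, no bias:  {with_identity}")
```

Under these assumptions the shortcut convolution accounts for most of the difference per block; the biases contribute only a few parameters per layer.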

PQMF

The cutoff ratio of the pseudo quadrature mirror filter (PQMF) bank can be set to a specific value or to None. In the latter case, the optimal filter is synthesized automatically before training starts.
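As a sketch of what such an automatic search can look like, the code below designs a Kaiser-windowed sinc prototype filter and picks the cutoff ratio whose autocorrelation is closest to an ideal 2M-th band filter (M being the number of sub-bands). This is one common approach, not necessarily the procedure this repository uses. The function names, the plain grid search, and the defaults (4 sub-bands, 62 taps, beta = 9.0) are assumptions.

```python
import math


def bessel_i0(x, terms=40):
    """Zeroth-order modified Bessel function of the first kind (power series)."""
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= (x / (2.0 * k)) ** 2
        total += term
    return total


def prototype_filter(taps=62, cutoff_ratio=0.15, beta=9.0):
    """Kaiser-windowed sinc low-pass filter with taps + 1 coefficients (taps must be even)."""
    h = []
    for n in range(taps + 1):
        m = n - taps / 2  # offset from the filter centre
        ideal = cutoff_ratio if m == 0 else math.sin(math.pi * cutoff_ratio * m) / (math.pi * m)
        r = 2.0 * n / taps - 1.0
        window = bessel_i0(beta * math.sqrt(max(0.0, 1.0 - r * r))) / bessel_i0(beta)
        h.append(ideal * window)
    return h


def objective(cutoff_ratio, subbands=4, taps=62, beta=9.0):
    """Deviation of the prototype autocorrelation from an ideal 2M-th band filter."""
    h = prototype_filter(taps, cutoff_ratio, beta)
    step = 2 * subbands
    acorr = [sum(a * b for a, b in zip(h, h[lag:])) for lag in range(0, len(h), step)]
    # Lag 0 should equal 1 / (2M); every other multiple of 2M should vanish.
    return abs(acorr[0] - 1.0 / step) + max(abs(v) for v in acorr[1:])


def find_cutoff_ratio(subbands=4, taps=62, beta=9.0, lo=0.05, hi=0.25, steps=400):
    """Grid search for the cutoff ratio that minimizes the objective."""
    candidates = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda c: objective(c, subbands, taps, beta))


if __name__ == "__main__":
    best = find_cutoff_ratio()
    print(f"cutoff_ratio = {best:.4f}, objective = {objective(best):.2e}")
```

A grid search is used here because the objective is not guaranteed to be convex; a local optimizer started from the best grid point could refine the result further.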

Train

To start training for, say, 500K iterations, run the command:

python train.py -l log -c config/mb_train.yaml -i 500000

To continue training from the last saved checkpoint for another 500K iterations, run the command:

python train.py -l log -i 500000

Training results are written to the log folder and can be viewed with TensorBoard.

Vocoder

A pretrained multi-band vocoder (config and weights) can be downloaded here. This model was trained for 500K iterations on the LJSpeech dataset.

Example

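import librosa
import sounddevice as sd
import torch
import yaml

# from_config comes from this repository; the module name below is an assumption.
from melgan import from_config

config_path = "models/melgan.yaml"
model_path = "models/melgan.pt"

# load the config and build the vocoder
with open(config_path, "r") as f:
    cfg = yaml.load(f, Loader=yaml.FullLoader)
sr = cfg["data"]["sample_rate"]
vocoder = from_config(cfg)
# load the generator weights onto the CPU
vocoder.G.load_state_dict(torch.load(model_path, map_location="cpu"))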

# out-of-distribution sample (female)
x = torch.from_numpy(librosa.load(librosa.example("libri3"), sr=sr)[0])

# wav-to-mel
y = vocoder.encode(x)
with torch.no_grad():
    # mel-to-wav
    x_hat = vocoder.decode(y)

# play restored wav (sounddevice expects a NumPy array, so convert the tensor first)
sd.play(x_hat.squeeze().cpu().numpy(), sr, blocking=True)
