Regeneration Enhancer

PyTorch implementation of the paper "High Fidelity Speech Regeneration With Application to Speech Enhancement"
This repository provides a PyTorch implementation of speech enhancement via regeneration. The algorithm follows the paper, but several changes were made to feature extraction and, consequently, to the model parameters.

TODO list:

  • add inference scripts
  • implement streaming model and its inference
  • provide multilingual enhancement models (and adapt feature extraction too)
  • make pypi package
  • release pretrained models

Requirements

This repository is tested on Ubuntu 16.04 with a GTX 1080 Ti GPU.

  • libsndfile (on Ubuntu you can install it via sudo apt install libsndfile1-dev)
  • pip requirements (defined in requirements.txt, install via pip install -r requirements.txt):
    • hydra-core 1.0.6+
    • pytorch 1.7+
    • torchaudio 0.7.2+
    • librosa 0.8.0+
    • pytest 6.2.0+
    • transformers 4.3.0+, pyworld 0.2.12+, pyannote.audio 2.0+ (for feature extraction)
  • (optional) ffmpeg (for .mp3 support; on Ubuntu you can install it via sudo apt install ffmpeg)

Installation

git clone https://github.com/SolomidHero/speech-regeneration-enhancer
pip install -e ./speech-regeneration-enhancer

Training

For training you should use the DAPS dataset, or any dataset with similar file naming (folder structure doesn't matter):

data_folder/
  wav_1_clean.wav
  dirty/
    wav_1_recoder_bathroom.wav
    wav_2_microphone_street.wav
  some_sub_tree/
    wav_2_clean.wav
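The naming convention above pairs files by a shared index: each `*_clean.wav` target has noisy variants carrying the same index. A minimal sketch of how such pairs could be collected — the helper name and the regex are illustrative, not part of this repository:

```python
import re
from pathlib import Path

def collect_pairs(data_folder):
    """Map each recording index to its clean file and noisy variants."""
    pairs = {}
    for path in Path(data_folder).rglob("*.wav"):
        # File names are assumed to look like wav_<idx>_<condition>.wav
        match = re.match(r"wav_(\d+)_(.+)\.wav", path.name)
        if match is None:
            continue
        idx, condition = match.groups()
        entry = pairs.setdefault(idx, {"clean": None, "noisy": []})
        if condition == "clean":
            entry["clean"] = path
        else:
            entry["noisy"].append(path)
    return pairs
```

Because the lookup walks the tree recursively, clean and noisy files may live in any subfolders, matching the layout shown above.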

This repository uses Hydra for configuration, so for training and inference you only need to edit the config.yaml file. Parameters can also be overridden directly from the command line.
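Hydra resolves `section.key=value` command-line arguments into the nested config. Conceptually, an override works like the toy sketch below — this is a simplified illustration of the idea, not Hydra's actual implementation (real Hydra also casts values to their declared types, which this version skips):

```python
def apply_override(config: dict, override: str) -> dict:
    """Apply one 'a.b.c=value' style override to a nested config dict."""
    dotted_key, value = override.split("=", 1)
    *parents, leaf = dotted_key.split(".")
    node = config
    for key in parents:
        node = node.setdefault(key, {})  # descend, creating sections as needed
    node[leaf] = value  # real Hydra would also convert the value's type
    return config
```

So `train.epochs=50` on the command line sets `config["train"]["epochs"]`, leaving every other key in config.yaml untouched.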

When changes to the config are made, you can check whether your parameters are acceptable with either of these commands:

pytest                       # check that everything is working
pytest tests/test_scripts.py # check that the training process can run

  1. After downloading the data and editing the config, run the preprocessing script (feature extraction happens here):

python preprocess.py dataset.wav_dir=/path/to/wavs # parameters can also be set directly in the config

  2. Finally, train the model:

python train.py train.epochs=50 train.ckpt_dir=/path/to/ckpts # parameters can also be set directly in the config

From then on, checkpoints for the generator and other components (discriminator, optimizers) will appear in /path/to/ckpts.
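To resume training or run inference from a saved checkpoint, the generator weights can be restored with standard PyTorch loading. A hedged sketch — the `"generator"` key and the helper name are assumptions about the checkpoint layout, not the repository's documented schema:

```python
import torch

def load_generator(ckpt_path: str, generator: torch.nn.Module) -> torch.nn.Module:
    # Load onto CPU first so the checkpoint opens regardless of training device.
    ckpt = torch.load(ckpt_path, map_location="cpu")
    # "generator" as the state-dict key is an assumption about the layout.
    generator.load_state_dict(ckpt["generator"])
    return generator
```

Loading with `map_location="cpu"` avoids failures when a checkpoint trained on GPU is opened on a CPU-only machine; the model can be moved to the desired device afterwards.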

Reference

"High Fidelity Speech Regeneration With Application to Speech Enhancement"