Speech Enhancement using Bayesian WaveNet

WaveNet-Enhancement contains several scripts: one holds most of the functions, including the definition of the WaveNet structure; one trains the prior model; one trains the likelihood model with a fixed prior model; one builds a fast sample-by-sample generator for the prior or the likelihood model; and one generates pseudo-clean predictions using the current model in the iterative training process described in our paper.

Train Prior Model

The prior model is a 40-layer WaveNet consisting of 4 blocks with 10 layers each.

Therefore, the input shape is 20477 by data size, and the output shape is 16384 by data size.
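The 20477/16384 shapes can be sanity-checked against the receptive field of the dilated stack. The sketch below assumes the standard WaveNet dilation pattern 1, 2, …, 512 repeated over 4 blocks with width-2 causal filters; the README does not state the dilation schedule, so this is an assumption.

```python
# Assumed dilation schedule: 1..512 per block, 4 blocks, width-2 filters.
blocks, layers_per_block = 4, 10
dilations = [2 ** i for i in range(layers_per_block)] * blocks

# A width-2 causal filter with dilation d extends the context by d samples.
receptive_field = sum(dilations) + 1  # +1 for the current sample

output_len = 16384
input_len = output_len + receptive_field  # past context prepended to the output

print(receptive_field, input_len)  # 4093 20477
```

The extra 4093 input samples are exactly the past context the first predicted sample needs.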

Input data is the bin indices of clean speech under 256-level mu-law quantization. Output data is the corresponding target shifted one sample to the right, so that the model always predicts the next sample from the past samples.
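A minimal sketch of 256-level mu-law quantization that produces such bin indices (a hypothetical helper for illustration, not the repository's own preprocessing code):

```python
import numpy as np

def mulaw_encode(x, mu=255):
    """Map audio in [-1, 1] to integer bin indices 0..mu via mu-law companding."""
    x = np.clip(x, -1.0, 1.0)
    compressed = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    # Shift [-1, 1] -> [0, mu] and round to the nearest integer bin.
    return ((compressed + 1) / 2 * mu + 0.5).astype(np.int32)

audio = np.array([-1.0, 0.0, 1.0])
print(mulaw_encode(audio))  # [  0 128 255]
```

Silence maps to the middle bin, and the log companding allocates more bins to low-amplitude samples, which dominate speech.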

Input data: train_pr.mat (DataType: int32)

Output data: target_pr.mat (DataType: uint8)
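The one-sample shift between input and target can be illustrated on a toy sequence (illustrative only; the actual .mat preparation code is not part of this README):

```python
import numpy as np

signal = np.array([10, 11, 12, 13, 14], dtype=np.int32)  # toy bin indices

inputs = signal[:-1]                   # model sees [10, 11, 12, 13]
targets = signal[1:].astype(np.uint8)  # must predict [11, 12, 13, 14]

# At step t the model conditions on inputs[:t+1] and predicts targets[t],
# i.e. always the next sample, matching the shifted target described above.
print(inputs.tolist(), targets.tolist())
```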


python bawn_pr_multi_gpu_train.py /logdir NUM_GPUS

Train Likelihood Model

In addition to the pre-trained prior model, the likelihood model consists of two more copies of the WaveNet used in the prior model.

Therefore, the clean input shape is 20477 by data size, the noisy input shape is 24570 by data size, and the output shape is 16384 by data size.
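These shapes are consistent with the noisy branch stacking two WaveNets, doubling the past context, while the clean branch keeps a single receptive field. The arithmetic below assumes the same 4093-sample receptive field per stack as in the prior model (the dilation schedule is an assumption):

```python
# Receptive field of one WaveNet stack: 4 blocks of dilations 1..512,
# width-2 causal filters (assumed, as for the prior model).
receptive_field = 4 * (2 ** 10 - 1) + 1  # 4093
output_len = 16384

clean_input_len = output_len + receptive_field      # one stack of context
noisy_input_len = output_len + 2 * receptive_field  # two stacked WaveNets

print(clean_input_len, noisy_input_len)  # 20477 24570
```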

Noisy input data is the raw noisy audio samples. Clean input data is the bin indices of its clean counterpart. Output data is the same shifted next-sample target as in prior-model training.

Clean input data: clean_train.mat (DataType: int32)

Noisy input data: noisy_train.mat (DataType: float32)

Output data: target_train.mat (DataType: uint8)
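A hedged sketch of writing and reading such .mat files with SciPy; the variable name `data` inside the files is an assumption (inspect `loadmat(...).keys()` to find the real names the repository uses):

```python
import os
import tempfile
import numpy as np
from scipy.io import savemat, loadmat

# Round-trip a tiny synthetic example with the dtype listed above.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, 'target_train.mat')
savemat(path, {'data': np.zeros((16384, 2), dtype=np.uint8)})

mat = loadmat(path)
print(mat['data'].dtype, mat['data'].shape)  # uint8 (16384, 2)
```

Checking the dtype after loading guards against silent casts, since the models expect int32, float32, and uint8 arrays respectively.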


python /logdir /path_to_prior_model NUM_GPUS


Training the likelihood model is very time-consuming due to the iterative training process. It has recently been greatly accelerated using approximations, but that acceleration has not yet been included in this branch.
