A Chainer implementation of the U-Net singing voice separation model


This is a Chainer implementation of the U-Net for vocal separation proposed at ISMIR 2017.


Python 3.5

Chainer 3.0

librosa 0.5.0

cupy 2.0 (required only if you want to train U-Net yourself; needs a CUDA environment)


Please refer to DoExperiment.py for code examples (or simply modify it!).
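
For orientation, the following is a minimal sketch of the masking step described in the ISMIR 2017 paper: the network predicts a soft mask, the mask is multiplied element-wise with the mixture's magnitude spectrogram, and the mixture's phase is reused for resynthesis. It is not taken from DoExperiment.py. `predict_mask` is a hypothetical placeholder for the trained U-Net, and using the complementary mask for the accompaniment is a simplification assumed here.

```python
import numpy as np


def predict_mask(mix_mag):
    """Hypothetical placeholder for the trained U-Net: returns a mask in [0, 1]."""
    # Toy rule for illustration only: keep bins above the mean magnitude.
    return (mix_mag > mix_mag.mean()).astype(np.float32)


def separate(mix_spec):
    """Split a complex mixture spectrogram into vocal and instrumental parts."""
    mix_mag = np.abs(mix_spec)
    phase = np.exp(1j * np.angle(mix_spec))
    mask = predict_mask(mix_mag)
    vocal = mask * mix_mag * phase          # masked magnitude, mixture phase
    inst = (1.0 - mask) * mix_mag * phase   # complementary mask (assumption)
    return vocal, inst


# Random complex spectrogram standing in for the STFT of a real mixture.
rng = np.random.default_rng(0)
mix_spec = rng.standard_normal((513, 128)) + 1j * rng.standard_normal((513, 128))
vocal_spec, inst_spec = separate(mix_spec)
```

With a complementary mask the two outputs sum back to the mixture, which is a convenient sanity check when swapping in a real model.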

How to prepare a dataset for U-Net training

* If you want to train U-Net with your own dataset, prepare the mixed, instrumental-only, and vocal-only versions of each track, and pickle their spectrograms using the util.SaveSpectrogram() function. Set PATH_FFT (in const.py) to the directory where you want to save the pickled data.

* If you have the iKala, MedleyDB, or DSD100 dataset, you can use the corresponding ProcessXX.py script. Remember to set PATH_XX in each script to the dataset's location.

* If you want to generate a dataset from "original" and "instrumental version" audio pairs (as the original work did), refer to ProcessIMAS.py.
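
The first option above can be sketched roughly as follows. This is an illustration of the idea only, not the repository's code: `save_track_spectrograms` is a hypothetical stand-in for util.SaveSpectrogram() (whose actual signature is not shown here), the temporary directory stands in for PATH_FFT, and the STFT settings (8192 Hz audio, window 1024, hop 768) and the peak normalization are assumptions based on the cited paper rather than values read from const.py. It uses plain NumPy instead of librosa so that it runs without audio files.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np

# Assumed STFT settings (from the cited paper, not read from const.py).
WINDOW_SIZE = 1024
HOP_LENGTH = 768


def magnitude_spectrogram(signal, window_size=WINDOW_SIZE, hop_length=HOP_LENGTH):
    """Return the magnitude STFT with shape (window_size // 2 + 1, n_frames)."""
    window = np.hanning(window_size)
    n_frames = 1 + (len(signal) - window_size) // hop_length
    frames = np.stack([
        signal[i * hop_length:i * hop_length + window_size] * window
        for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1)).T


def save_track_spectrograms(name, mix, inst, vocal, out_dir):
    """Hypothetical stand-in for util.SaveSpectrogram(): pickle three spectrograms."""
    specs = {
        "mix": magnitude_spectrogram(mix),
        "inst": magnitude_spectrogram(inst),
        "vocal": magnitude_spectrogram(vocal),
    }
    # Scale all three by the mixture's peak so they share one reference level.
    peak = max(specs["mix"].max(), 1e-8)
    specs = {key: (value / peak).astype(np.float32) for key, value in specs.items()}
    out_path = Path(out_dir) / (name + ".pkl")
    with open(out_path, "wb") as f:
        pickle.dump(specs, f)
    return out_path


# Synthetic two-second "track" at 8192 Hz: a 220 Hz tone plus a 440 Hz "vocal".
sr = 8192
t = np.arange(2 * sr) / sr
inst = 0.5 * np.sin(2 * np.pi * 220 * t)
vocal = 0.3 * np.sin(2 * np.pi * 440 * t)
out_dir = tempfile.mkdtemp()  # stands in for PATH_FFT
saved = save_track_spectrograms("demo", inst + vocal, inst, vocal, out_dir)
```

For real tracks, the three signals would be loaded from audio files (for example with librosa) and resampled to a common rate before this step.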


The neural network is implemented according to the following publication:

Andreas Jansson, Eric J. Humphrey, Nicola Montecchio, Rachel Bittner, Aparna Kumar, Tillman Weyde, Singing Voice Separation with Deep U-Net Convolutional Networks, Proceedings of the 18th International Society for Music Information Retrieval Conference (ISMIR), 2017.