

Given a recording that mixes background music with a speaker's voice, we want to separate the clean voice from the music. The dataset is assumed to consist of text-audio pairs, where most audio examples are voice-music mixtures and a small portion are voice-only recordings.

We apply two approaches to achieve this goal:

  • Wave-U-Net is a fully supervised neural network architecture that takes raw waveforms as input and produces raw waveforms as output (as opposed to spectrograms), separating one audio input into multiple audio outputs (link to paper).
  • SVSGAN is a semi-supervised model that uses a GAN architecture to generate the magnitude spectrograms of the vocals and the music from the magnitude spectrogram of their mixture. Each generated magnitude spectrogram is then combined with the phase spectrogram of the mixture, and the waveform is recovered with the Inverse Short-Time Fourier Transform (ISTFT) (link to paper).
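
The reconstruction step described for SVSGAN (estimated magnitude plus mixture phase, followed by ISTFT) can be sketched as follows. This is a minimal pure-Python illustration, not the code in the SVSGan submodule: the function names `stft`, `istft`, and `apply_mixture_phase`, the frame length, the hop size, and the Hann window are all assumptions made for the example, and a real pipeline would use an FFT-based library instead of the naive DFT shown here.

```python
import cmath
import math


def stft(signal, frame_len=8, hop=4):
    """Naive STFT (hypothetical helper): periodic Hann window + DFT per frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_len)]
        frames.append([
            sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len))
            for k in range(frame_len)
        ])
    return frames


def istft(frames, frame_len=8, hop=4):
    """Inverse DFT per frame followed by overlap-add (hypothetical helper).

    With a periodic Hann window and hop = frame_len / 2, overlapping windows
    sum to 1, so samples covered by two frames are reconstructed exactly.
    """
    out = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for i, spectrum in enumerate(frames):
        for n in range(frame_len):
            sample = sum(spectrum[k] * cmath.exp(2j * math.pi * k * n / frame_len)
                         for k in range(frame_len)) / frame_len
            out[i * hop + n] += sample.real
    return out


def apply_mixture_phase(est_magnitude, mixture_stft):
    """Attach the mixture's phase to an estimated magnitude spectrogram."""
    return [
        [mag * cmath.exp(1j * cmath.phase(mix))
         for mag, mix in zip(mag_frame, mix_frame)]
        for mag_frame, mix_frame in zip(est_magnitude, mixture_stft)
    ]


# Toy mixture standing in for a voice + music recording.
mixture = [math.sin(0.3 * n) + 0.5 * math.sin(1.1 * n) for n in range(32)]
mixture_spec = stft(mixture)

# Stand-in for the generator output: here simply half the mixture magnitude.
estimated_mag = [[0.5 * abs(b) for b in frame] for frame in mixture_spec]

estimated_wave = istft(apply_mixture_phase(estimated_mag, mixture_spec))
```

In this toy setup the "generator" only scales the mixture magnitude, so `estimated_wave` is the mixture at half amplitude everywhere except the first and last `hop` samples, which are covered by a single frame. In the actual model the magnitudes would come from the trained generator, one spectrogram per source.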

Team Members: Magd Bayoumi, Qian Huang, Xiuyu Li, Zhao Shen, Guandao Yang
