# Generative Source Separation with Glow


In this notebook, we demonstrate our method on two song snippets (inspired by TagBox's [demo page](https://ethman.github.io/tagbox/)) under different configurations.

Candidate values for the `modelName` input: `totalComponents = ['vocals', 'accompaniment', 'drums', 'bass', 'guitars', 'other']`


In [1]:
import numpy as np
import os, glob
import inverse_utils
from scipy.io.wavfile import read
import IPython.display as ipd
from source_separation import music_sep_batch
os.environ["CUDA_VISIBLE_DEVICES"] = "2"

def audio_preview(x_est, x_phase, stft, components):
    # waveform reconstruction from estimation
    x_hats = [[] for i in range(len(components))]
    for i, src in enumerate(x_est):
        x_hats[i] = stft.stft_fn.inverse(src[:, :513, :].cpu(), x_phase).cpu().numpy()[0]
        print('Estimated', components[i])
        ipd.display(ipd.Audio(x_hats[i], rate=22050))
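
The `:513` slice in `audio_preview` is consistent with the one-sided spectrum of a 1024-point FFT (1024 / 2 + 1 = 513), although the filter length actually used by `STFTfunc` is not stated here. As a minimal single-frame sketch under that assumption, the following combines a magnitude with a phase to recover a waveform, which is the idea behind pairing each estimated magnitude with the mixture phase. It uses only NumPy; `N_FFT` and the synthetic frame are hypothetical.

```python
import numpy as np

N_FFT = 1024  # assumed filter length; not confirmed by the notebook

# Synthetic stand-in for one frame of audio.
rng = np.random.default_rng(0)
frame = rng.standard_normal(N_FFT)

# One-sided spectrum of a real frame: N_FFT // 2 + 1 frequency bins.
spectrum = np.fft.rfft(frame)
magnitude = np.abs(spectrum)
phase = np.angle(spectrum)

# Recombine magnitude and phase, then invert back to the time domain.
rebuilt = np.fft.irfft(magnitude * np.exp(1j * phase), n=N_FFT)
```

Here the magnitude and phase come from the same frame, so the inversion is exact up to floating-point error. In the notebook, the magnitude comes from a source model and the phase from the mixture, so the recovered waveform is only an approximation.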


## 1. Singing Voice Separation

We first load the audio and the vocal source model.

In [2]:
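# load audio
import librosa  # needed for the resampling step; not imported in the first cell
f1 = './demo/ww.wav' # alternatively ./demo/ww.wav
sr, x1 = read(f1)
if sr != 22050:
    # librosa.resample expects floating-point input; scipy returns integer PCM
    x1 = librosa.resample(x1.astype(np.float32), orig_sr=sr, target_sr=22050)
    sr = 22050  # keep the playback rate in sync with the resampled signal
ipd.display(ipd.Audio(x1, rate=sr))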

mix = np.asarray(x1)
mix = mix[np.newaxis, :]

# prepare vocal model
modelFolder = '/home/ge/github-repo/GenerativeSourceSeparation/generator/glow/logs/'
vocalGen, STFTfunc = inverse_utils.load_glow(glowFolder=modelFolder,
                                             modelName='vocals', 
                                             epoch=1000)

INFO:root:Loaded checkpoint '/home/ge/github-repo/GenerativeSourceSeparation/generator/glow/logs/vocals/G_1000.pth' (iteration 1000)
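
The loading cell above turns the WAV samples into an array with a leading batch axis. As a hedged sketch of that preparation step, the helper below also scales integer PCM to floating point and downmixes stereo to mono. `prepare_mix` is a hypothetical name, and whether `music_sep_batch` expects this scaling is not confirmed by the notebook.

```python
import numpy as np

def prepare_mix(x):
    """Return a float32 array of shape (1, n_samples) from WAV samples.

    Hypothetical helper: handles signed-integer or float input, mono or
    (n_samples, n_channels) stereo. Unsigned 8-bit PCM is not handled.
    """
    x = np.asarray(x)
    if np.issubdtype(x.dtype, np.integer):
        # Scale integer PCM by the largest positive value of its dtype.
        scale = float(np.iinfo(x.dtype).max)
        x = x.astype(np.float32) / scale
    else:
        x = x.astype(np.float32)
    if x.ndim == 2:
        # Average the channels to obtain a mono signal.
        x = x.mean(axis=1)
    # Add the batch axis, as done with np.newaxis in the notebook.
    return x[np.newaxis, :]

# Tiny synthetic stereo clip in int16, the dtype scipy typically returns.
stereo = np.array([[32767, 0], [0, 0], [100, 100], [-200, -200]], dtype=np.int16)
demo_mix = prepare_mix(stereo)
```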


In the first configuration, we simply use a pretrained accompaniment model as the second source.

In [3]:
components1 = ['vocals', 'accompaniment']
accGen, _ = inverse_utils.load_glow(glowFolder=modelFolder, 
                                    modelName=components1[1], 
                                    epoch=1000)
genList1 = [vocalGen, accGen]


INFO:root:Loaded checkpoint '/home/ge/github-repo/GenerativeSourceSeparation/generator/glow/logs/accompaniment/G_1000.pth' (iteration 1000)


In [4]:
# run separation
xEst, mixPhase = music_sep_batch(mix, genList1, STFTfunc,
                                 optSpace='z', lr=0.01, 
                                 sigma=0.01, alpha1=1.0, 
                                 alpha2=1.0, iteration=150,
                                 mask=False, wiener=False)

100%|██████████| 150/150 [00:29<00:00,  5.07it/s]
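
Optimising in the latent space (`optSpace='z'`) can be pictured as searching for one latent code per source so that the generated sources add up to the mixture. The following is a toy sketch of that idea, not the implementation of `music_sep_batch`: the generators are hypothetical random linear maps rather than Glow models, the loss is a plain squared error, and the roles of `sigma`, `alpha1`, and `alpha2` are not modelled because they are not described here.

```python
import numpy as np

def separate_toy(mix, generators, n_iter=200):
    """Gradient descent on latent codes z_i so that sum_i G_i @ z_i matches mix.

    Each generator is a (n_features, n_latent) matrix standing in for a
    trained source model (an assumption made for illustration only).
    """
    stacked = np.hstack(generators)
    # Step size 1 / L, where L is the Lipschitz constant of the gradient.
    step = 1.0 / np.linalg.norm(stacked, 2) ** 2
    zs = [np.zeros(G.shape[1]) for G in generators]
    losses = []
    for _ in range(n_iter):
        residual = mix - sum(G @ z for G, z in zip(generators, zs))
        losses.append(0.5 * float(residual @ residual))
        # The gradient of the loss with respect to z_i is -G_i.T @ residual.
        zs = [z + step * (G.T @ residual) for G, z in zip(generators, zs)]
    estimates = [G @ z for G, z in zip(generators, zs)]
    return estimates, losses

# Two hypothetical "source models" and a mixture built from their outputs.
rng = np.random.default_rng(0)
G_vocals = rng.standard_normal((64, 4))
G_acc = rng.standard_normal((64, 4))
true_sources = [G_vocals @ rng.standard_normal(4), G_acc @ rng.standard_normal(4)]
toy_mix = true_sources[0] + true_sources[1]

estimates, losses = separate_toy(toy_mix, [G_vocals, G_acc])
```

In this toy setting the two estimates sum back to the mixture because the linear generators span different subspaces. With real generative models the problem is non-convex, so the learning rate and iteration count matter more.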


In [5]:
audio_preview(xEst, mixPhase, STFTfunc, components1)


Estimated vocals


Estimated accompaniment


## 2. Music Source Separation

In the second configuration, we additionally use drums, bass, and guitars as source models. In this experiment, the drums and bass tracks are separated correctly, whereas the vocals and guitars tracks are noticeably worse, probably because the guitars model is poorly trained.

In [None]:
genList2 = [vocalGen]
components2 = ['vocals', 'drums', 'bass', 'guitars']  
for inst in components2[1:]:
    instGen, _ = inverse_utils.load_glow(modelName=inst, condition=False)
    genList2.append(instGen)

In [None]:
# run separation
xEst, mixPhase = music_sep_batch(mix, genList2, STFTfunc,
                                 optSpace='z', lr=0.01, 
                                 sigma=0.0, alpha1=1.0, 
                                 alpha2=0.0, iteration=120)

In [None]:
audio_preview(xEst, mixPhase, STFTfunc, components2)
