# Separation of speakers using PIT-S-CNN

This notebook shows how to load a pretrained PIT-S-CNN source separation model and how to use the loaded model to separate individual speakers from an example waveform.

In [None]:
# Generic imports
import numpy as np
import tensorflow as tf

# Imports to play audio
from IPython.display import Audio

# Import Lab41's separation model
from magnolia.dnnseparate.pit import PITModel

# Import utilities for using the model
from magnolia.utils.clustering_utils import clustering_separate, preprocess_signal
from magnolia.features.mixer import FeatureMixer
from magnolia.features.spectral_features import istft, scale_spectrogram
from magnolia.utils.postprocessing import reconstruct
from magnolia.features.data_preprocessing import undo_preemphasis

## Paths

In [None]:
libridev = "** Path to librispeech dev hdf5 **"
model_path = "** Path to model checkpoint **"

## Hyperparameters

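    fft_size    : Number of samples in each FFT window
    overlap     : Amount of overlap between FFT windows
    sample_rate : Number of samples per second in the input signals
    numsources  : Number of sources (speakers) the model separates
    datashape   : Shape of one model input, (time steps, frequency bins)
    preemp_coef : Preemphasis coefficient passed to reconstruct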

In [None]:
fft_size = 512
overlap = 0.0256
sample_rate = 10000
numsources = 2
datashape = (51, fft_size//2 + 1)
preemp_coef = 0.95
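
As a quick sanity check on these values, the sketch below derives a few quantities from them. It assumes `overlap` is expressed in seconds, which the notebook does not state explicitly; the names `overlap_samples`, `num_freq_bins`, and `window_seconds` are introduced here for illustration only.

```python
# Hyperparameters copied from the cell above.
fft_size = 512
overlap = 0.0256
sample_rate = 10000
datashape = (51, fft_size // 2 + 1)

# Assumption: `overlap` is a duration in seconds, so multiplying by the
# sample rate converts it to a number of samples.
overlap_samples = int(round(overlap * sample_rate))

# A one-sided FFT of a real signal keeps fft_size // 2 + 1 frequency bins.
num_freq_bins = fft_size // 2 + 1

# Duration of a single FFT window in seconds.
window_seconds = fft_size / sample_rate

print(overlap_samples)  # 256
print(num_freq_bins)    # 257
print(window_seconds)   # 0.0512
```

Under that assumption, the overlap is half of the 512-sample window, and the second entry of `datashape` matches the number of one-sided frequency bins.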

### Create and load a pretrained instance of PIT-S-CNN

Here an untrained model instance is created, and then the pretrained weights are loaded into it.

In [None]:
tf.reset_default_graph()

model = PITModel(method='pit-s-cnn', num_steps=datashape[0], num_freq_bins=datashape[1], num_srcs=numsources)

config = tf.ConfigProto()
config.allow_soft_placement = True
config.gpu_options.allow_growth = True

sess = tf.Session(config=config)
model.load(model_path, sess)

### Example separation process

Samples can be generated from the dev set to qualitatively evaluate the performance of the model and to test the separation process.  For this example, a sample will be generated, converted to a raw waveform, and then separated into two sources.

In [None]:
# Create a mixer for recordings from the dev set
long_mixer = FeatureMixer([libridev, libridev], shape=(200, None))

Get an example from the mixer, convert it back into a waveform with the istft function, and undo the preemphasis.

In [None]:
data = next(long_mixer)
spec = data[0]
spec_mag, spec_phase = scale_spectrogram(spec)
signal = istft(spec, sample_rate, None, overlap, two_sided=False, fft_size=fft_size)
signal = undo_preemphasis(signal)

Audio(signal,rate=sample_rate)
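
For readers unfamiliar with preemphasis, here is a minimal NumPy sketch of a first-order preemphasis filter and its inverse. The functions `apply_preemphasis` and `invert_preemphasis` are hypothetical names written for this illustration; magnolia's `undo_preemphasis` may be implemented differently.

```python
import numpy as np

def apply_preemphasis(x, coeff=0.95):
    """Hypothetical helper: y[n] = x[n] - coeff * x[n-1], with y[0] = x[0]."""
    x = np.asarray(x, dtype=float)
    y = x.copy()
    y[1:] -= coeff * x[:-1]
    return y

def invert_preemphasis(y, coeff=0.95):
    """Hypothetical helper: x[n] = y[n] + coeff * x[n-1], with x[0] = y[0]."""
    y = np.asarray(y, dtype=float)
    x = np.empty_like(y)
    prev = 0.0
    for n, value in enumerate(y):
        # Each recovered sample depends on the previously recovered one.
        prev = value + coeff * prev
        x[n] = prev
    return x

# Round trip on a random test signal.
rng = np.random.default_rng(0)
original = rng.standard_normal(1000)
roundtrip = invert_preemphasis(apply_preemphasis(original))
print(np.allclose(original, roundtrip))  # True
```

The filter boosts high frequencies before feature extraction, so the inverse has to be applied after the inverse STFT to recover a natural-sounding waveform.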

Use the model's separate function to split the mixture's magnitude spectrogram into one spectrogram per source, then reconstruct a waveform for each source.

In [None]:
sources_spec = model.separate(spec_mag, sess)
sources = [reconstruct(x, spec, sample_rate, None, overlap, square=True, preemphasis=preemp_coef) for x in sources_spec]
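
The call to `reconstruct` receives both the estimated source spectrogram and the original mixture `spec`. A common way to use the mixture in this step is to borrow its phase for each source estimate; the sketch below illustrates that idea only, under the assumption that `reconstruct` does something similar. The function `combine_with_mixture_phase` and the toy data are hypothetical and not part of magnolia.

```python
import numpy as np

def combine_with_mixture_phase(source_magnitude, mixture_spec):
    """Hypothetical helper: attach the mixture's phase to a magnitude estimate."""
    phase = np.angle(mixture_spec)
    return source_magnitude * np.exp(1j * phase)

# Toy complex "mixture" spectrogram with the same shape as one model input.
rng = np.random.default_rng(0)
mixture = rng.standard_normal((51, 257)) + 1j * rng.standard_normal((51, 257))

# Toy magnitude estimate: pretend the source holds half of the mixture magnitude.
estimate = 0.5 * np.abs(mixture)

complex_estimate = combine_with_mixture_phase(estimate, mixture)

# The result keeps the estimated magnitude and the mixture's phase.
print(np.allclose(np.abs(complex_estimate), estimate))
print(np.allclose(complex_estimate, 0.5 * mixture))
```

The resulting complex spectrogram could then be passed through an inverse STFT, followed by undoing the preemphasis, to obtain a waveform.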

Listen to each of the separated sources.

In [None]:
Audio(sources[0], rate=sample_rate)

In [None]:
Audio(sources[1], rate=sample_rate)