# Separation of speakers using Lab41's model

This notebook shows how to load a pretrained version of Lab41's source separation model and how to use the loaded model to separate individual speakers from an example waveform.

In [1]:
# Generic imports
import json
import numpy as np
import pandas as pd

# Imports to play audio
from IPython.display import Audio, display

# Import Lab41's separation model
from magnolia.dnnseparate.L41model import L41Model

# Import utilities for using the model
from magnolia.utils.postprocessing import convert_preprocessing_parameters
from magnolia.features.preprocessing import undo_preprocessing
from magnolia.iterate.mix_iterator import MixIterator
from magnolia.utils.clustering_utils import l41_clustering_separate

### Hyperparameters

* **model_location** : Device on which to place the model (CPU or GPU)
* **model_settings** : Path to model configuration settings
* **mixes**          : List of mix configuration settings to source-separate
* **from_disk**      : Whether or not to read mixes from disk
* **mix_number**     : Number (counting from 1) of the mix to separate and play back

In [2]:
# from model settings
model_params = {
    'nonlinearity': 'tanh',
    'layer_size': 600,
    'embedding_size': 40,
    'normalize': 'False'
}
uid_settings = '/local_data/magnolia/pipeline_data/date_2017_09_27_time_13_25/settings/assign_uids_LibriSpeech_UrbanSound8K.json'
model_save_base = '/local_data/magnolia/experiment_data/date_2017_09_28_time_13_14/aux/model_saves/l41'

model_location = '/cpu:0'
model_settings = ''
mixes = ['/local_data/magnolia/pipeline_data/date_2017_09_27_time_13_25/settings/mixing_LibriSpeech_UrbanSound8K_test_in_sample.json']
from_disk = True
mix_number = 1

### Data iterator

Create a mix iterator that loops through the mixes one at a time.

In [3]:
mixer = MixIterator(mixes_settings_filenames=mixes,
                    batch_size=1,
                    from_disk=from_disk)

### Create and load a pretrained instance of Lab41's model

Here an untrained model instance is created, and then the pretrained weights are loaded into it.

In [4]:
# get frequency dimension
frequency_dim = mixer.sample_dimensions()[0]

# get number of sources
with open(uid_settings) as settings_file:
    settings = json.load(settings_file)
uid_file = settings['output_file']
uid_csv = pd.read_csv(uid_file)
number_of_sources = uid_csv['uid'].max() + 1

In [5]:
model = L41Model(**model_params,
                 num_speakers=number_of_sources,
                 F=frequency_dim,
                 device=model_location)

model.load(model_save_base)

INFO:tensorflow:Restoring parameters from /local_data/magnolia/experiment_data/date_2017_09_28_time_13_14/aux/model_saves/l41


### Example separation process

Samples can be generated from the dev set to qualitatively evaluate the performance of the model and to test the separation process. For this example, a sample will be generated, converted to a raw waveform, and then separated into two sources.

Get an example from the mixer, along with the parameters needed to convert it back into a waveform: the inverse STFT (ISTFT) arguments and the preemphasis coefficient.

In [6]:
assert mix_number <= mixer.epoch_size()

# read the preprocessing parameters used to create the first signal of the mix
with open(mixes[0]) as settings_file:
    settings = json.load(settings_file)

signal = settings['signals'][0]
with open(signal['preprocessing_settings']) as preprocessing_file:
    preprocessing_settings = json.load(preprocessing_file)
istft_args = convert_preprocessing_parameters(preprocessing_settings['processing_parameters']['stft_args'])
preemphasis_coeff = preprocessing_settings['processing_parameters']['preemphasis_coeff']

# advance the iterator until the requested mix is reached
for _ in range(mix_number):
    spec, bin_masks, source_specs, uids, snrs = next(mixer)

model_spec = spec
spec = spec[0]
bin_masks = bin_masks[0]
source_specs = source_specs[0]
uids = uids[0]
snrs = snrs[0]
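
As an illustration of what undoing the preemphasis involves, here is a minimal sketch that assumes the common first-order preemphasis filter `y[n] = x[n] - coeff * x[n-1]`. The functions `preemphasize` and `undo_preemphasis` are hypothetical names made up for this example; they are not part of magnolia, and the filter that `undo_preprocessing` actually applies may differ.

```python
import numpy as np

def preemphasize(x, coeff):
    """Apply the first-order filter y[n] = x[n] - coeff * x[n-1] (assumed form)."""
    x = np.asarray(x, dtype=float)
    y = x.copy()
    y[1:] -= coeff * x[:-1]
    return y

def undo_preemphasis(y, coeff):
    """Invert the filter with the recursion x[n] = y[n] + coeff * x[n-1]."""
    x = np.zeros(len(y), dtype=float)
    previous = 0.0
    for n, value in enumerate(y):
        previous = value + coeff * previous
        x[n] = previous
    return x

# Round trip on a synthetic waveform
rng = np.random.default_rng(0)
waveform = rng.standard_normal(1000)
restored = undo_preemphasis(preemphasize(waveform, 0.95), 0.95)
print(np.allclose(waveform, restored))
```

In the notebook itself, this step and the inverse STFT are both handled by `undo_preprocessing`, which receives `preemphasis_coeff` and `istft_args`.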

Play the mix and the original sources.

In [7]:
print('SNR of this mix: {}'.format(snrs))

y_mix = undo_preprocessing(spec, mixer.sample_length_in_bits(),
                           preemphasis_coeff=preemphasis_coeff,
                           istft_args=istft_args)

print('Mixed sample')
display(Audio(y_mix, rate=mixer.sample_rate()))

for i, source_spec in enumerate(source_specs):
    y = undo_preprocessing(source_spec, mixer.sample_length_in_bits(),
                           preemphasis_coeff=preemphasis_coeff,
                           istft_args=istft_args)
    
    print('Sample for source {}'.format(i + 1))
    display(Audio(y, rate=mixer.sample_rate()))

SNR of this mix: 0.3283302478975898
Mixed sample


Sample for source 1


Sample for source 2


Use the model and the l41_clustering_separate function to separate the mixed spectrogram into sources, then convert each separated source back into a waveform.

In [9]:
source_specs = l41_clustering_separate(model_spec, model, mixer.number_of_samples_in_mixes())

for i, source_spec in enumerate(source_specs):
    y = undo_preprocessing(source_spec, mixer.sample_length_in_bits(),
                           preemphasis_coeff=preemphasis_coeff,
                           istft_args=istft_args)

    print('Separated sample for source {}'.format(i + 1))
    display(Audio(y, rate=mixer.sample_rate()))

Separated sample for source 1


Separated sample for source 2
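
The general idea behind clustering-based separation can be sketched as follows: the model assigns an embedding vector to each time-frequency bin, the embeddings are clustered, and each cluster becomes a binary mask on the mixed spectrogram. The code below is a toy illustration under stated assumptions, not magnolia's implementation of `l41_clustering_separate`. The helper names `kmeans_labels` and `separate_by_clustering`, the `(T, F, D)` embedding layout, and the hand-rolled k-means are all hypothetical.

```python
import numpy as np

def kmeans_labels(points, num_clusters, num_iterations=20):
    """Plain k-means; returns one cluster label per row of `points`."""
    # Deterministic initialization: pick rows spread evenly over the data
    indices = np.linspace(0, len(points) - 1, num_clusters).astype(int)
    centers = points[indices].copy()
    for _ in range(num_iterations):
        # Distance from every point to every center, shape (N, num_clusters)
        distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        for k in range(num_clusters):
            members = points[labels == k]
            if len(members) > 0:
                centers[k] = members.mean(axis=0)
    return labels

def separate_by_clustering(spectrogram, embeddings, num_sources):
    """Mask a (T, F) spectrogram using clusters of (T, F, D) embeddings."""
    T, F = spectrogram.shape
    labels = kmeans_labels(embeddings.reshape(T * F, -1), num_sources).reshape(T, F)
    # One binary mask per cluster, applied to the mixed spectrogram
    return np.stack([spectrogram * (labels == k) for k in range(num_sources)])

# Toy data: each bin belongs to one of two sources, with well-separated embeddings
rng = np.random.default_rng(0)
T, F, D = 4, 6, 3
true_labels = rng.integers(0, 2, size=(T, F))
true_labels[0, 0], true_labels[-1, -1] = 0, 1  # make sure both sources occur
embeddings = true_labels[..., None] * 5.0 + 0.1 * rng.standard_normal((T, F, D))
spectrogram = rng.standard_normal((T, F)) + 1j * rng.standard_normal((T, F))

sources = separate_by_clustering(spectrogram, embeddings, 2)
print(sources.shape)
```

Because the masks are binary and disjoint, the separated spectrograms in this sketch sum back to the original mix.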
