(carnatic-separation-leakage)=
## Leakage-aware source separation model

In this section we walk through a tool trained on the Saraga dataset {cite}`saraga`. Because Carnatic Music is performed and recorded live, it is difficult, in fact currently impossible, to find fully isolated multi-stem recordings with which to train or fine-tune existing separation approaches. Saraga includes multi-stem recordings, but since these were captured during live performances, each stem contains bleeding from the other sources. The approach presented here was designed with this bleeding problem in mind.
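
To make the bleeding problem concrete, the toy sketch below builds two "leaky" stems from synthetic sine tones and measures how much of each stem is actually the intended source. Everything here is an illustrative assumption (the tones, the `leak` gain and the `target_to_leak_db` helper are hypothetical and not part of Saraga or compiam); real bleeding is also filtered and delayed by the room, which this sketch ignores.

```python
import numpy as np

sr = 44100                      # assumed sampling rate for the toy signals
t = np.arange(sr) / sr          # one second of time stamps

# Two synthetic "sources" standing in for a voice and a violin
voice = np.sin(2 * np.pi * 220 * t)
violin = np.sin(2 * np.pi * 330 * t)

# Hypothetical bleeding gain: each microphone also picks up the other source
leak = 0.2
voice_stem = voice + leak * violin
violin_stem = violin + leak * voice


def target_to_leak_db(stem, target):
    """Energy ratio (in dB) between the intended source and whatever else is in the stem."""
    residual = stem - target
    return 10 * np.log10(np.sum(target ** 2) / np.sum(residual ** 2))


voice_ratio = target_to_leak_db(voice_stem, voice)
print(f"Voice stem target-to-leak ratio: {voice_ratio:.2f} dB")
```

A model trained to reproduce `voice_stem` as if it were clean would learn to keep part of the violin, which is the issue a leakage-aware approach tries to avoid.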

This model is able to separate clean singing voices even though it has been trained solely on multi-track stems that contain bleeding. Let's test how it works on a real example. Since the model is DL-based, we first need to install TensorFlow.

In [None]:
# Installing compiam (if it is not installed yet) and importing it into the project
import importlib.util
if importlib.util.find_spec('compiam') is None:
    # Bear in mind this will only run in a Jupyter notebook / Colab session
    %pip install compiam
import compiam
from compiam import load_model

# Import extras and suppress warnings to keep the tutorial clean
import os
import numpy as np
import soundfile as sf
from pprint import pprint

import warnings
warnings.filterwarnings('ignore')

# Installing tensorflow and tensorflow_addons (in case they are not installed) and importing them
%pip install tensorflow
%pip install tensorflow_addons
import tensorflow as tf
import tensorflow_addons as tfa

In [None]:
# Loading the pre-trained leakage-aware separation model
separation_model = load_model('separation:cold-diff-sep')

# We load the same example
audio_path = os.path.join(
    "..", "audio", "59c88c32-0bde-433b-b194-0f65281e5714.mp3")
input_mixture, sr = sf.read(audio_path)

input_mixture = input_mixture.T
mean = np.mean(input_mixture, keepdims=True)
std = np.std(input_mixture, keepdims=True)
input_mixture = (input_mixture - mean) / (1e-6 + std)
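
The normalization step above rescales the whole mixture to zero mean and unit standard deviation, using a single global mean and standard deviation rather than one per channel. As a minimal sketch, the same operation can be wrapped in a helper so it can be reused for other files. The `standardize` name and its `eps` argument are hypothetical and not part of compiam.

```python
import numpy as np


def standardize(audio, eps=1e-6):
    """Scale audio to zero mean and (approximately) unit standard deviation.

    The statistics are computed over all channels and samples at once,
    mirroring the normalization used in the cell above. `eps` avoids a
    division by zero for silent input.
    """
    mean = np.mean(audio, keepdims=True)
    std = np.std(audio, keepdims=True)
    return (audio - mean) / (eps + std)


# Usage on a synthetic stereo signal with shape (channels, samples)
rng = np.random.default_rng(0)
toy_mixture = 0.3 * rng.standard_normal((2, 44100)) + 0.1
normalized = standardize(toy_mixture)
```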

In [None]:
# Getting the last 30 seconds (assuming 44.1 kHz audio) and separating
input_mixture = input_mixture[:, -44100*30:]
separation = separation_model.separate(
    input_data=input_mixture,
    input_sr=sr,
    clusters=6,
    scheduler=5,
)
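
The excerpt above is cut with a hard-coded `44100`, which only corresponds to 30 seconds if the file is sampled at 44.1 kHz. A small sketch of a helper that derives the excerpt length from the actual sampling rate is shown below. The `last_seconds` function is a hypothetical convenience, not a compiam utility, and it assumes audio shaped `(channels, samples)` as in this tutorial.

```python
import numpy as np


def last_seconds(audio, sr, seconds):
    """Return the final `seconds` of audio shaped (channels, samples).

    If the recording is shorter than requested, the whole recording is returned.
    """
    n_samples = int(round(sr * seconds))
    if n_samples <= 0:
        return audio[:, :0]
    return audio[:, -n_samples:]


# Usage with a synthetic 40-second stereo signal at 48 kHz
toy_audio = np.zeros((2, 48000 * 40))
excerpt = last_seconds(toy_audio, sr=48000, seconds=30)
```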

In [None]:
import IPython.display as ipd

# And we play it!
ipd.Audio(
    data=separation,
    rate=separation_model.sample_rate,
)

Although perceptible artifacts can be heard in the vocals, the separation is surprisingly clean, which will hopefully help musicians and musicologists extract relevant information from it. Also, less pitched noise is present in the signal, so melodic feature extraction systems may work better on these data than on a complete mixture or on a singing voice stem with source bleeding in the background.

Let's now listen to the separation on some example recordings from the Dunya corpora.

In [None]:
import glob
import soundfile as sf
separation_examples = glob.glob("../audio/separation/*.mp3")

for ex in separation_examples:
    input_mixture, sr = sf.read(ex)
    input_mixture = input_mixture.T
    input_mixture = input_mixture[:, :44100*30]
    mean = np.mean(input_mixture, keepdims=True)
    std = np.std(input_mixture, keepdims=True)
    input_mixture = (input_mixture - mean) / (1e-6 + std)
    # Running leakage-aware model
    leakage_aware_separation = separation_model.separate(
        input_data=input_mixture,
        input_sr=sr,
        clusters=6,
        scheduler=5,
    )
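    # Saving the results in the folder that is searched right after the loop
    song_name = os.path.basename(ex).replace(".mp3", "")
    output_dir = os.path.join("..", "audio", "separation", "leakage", song_name)
    os.makedirs(output_dir, exist_ok=True)
    sf.write(
        os.path.join(output_dir, "vocals.wav"),
        leakage_aware_separation.T,
        # The output is played at the model's sample rate above, so we store it at that rate
        separation_model.sample_rate,
    )

# Sorting keeps the order of the files stable across runs
separated_files = sorted(glob.glob("../audio/separation/leakage/*/vocals.wav"))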
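
When results are written in one place and collected with `glob` in another, it helps to derive both paths from a single root so they cannot drift apart. The sketch below shows one way to do this with `pathlib`. The `vocals_path` helper and the `toy_root` folder are hypothetical names used only for illustration.

```python
from pathlib import Path


def vocals_path(example_file, output_root):
    """Build <output_root>/<song name>/vocals.wav for a given input file."""
    song_name = Path(example_file).stem  # file name without the extension
    return Path(output_root) / song_name / "vocals.wav"


# Hypothetical output root shared by the writer and the reader
toy_root = Path("separation_outputs") / "leakage"
target = vocals_path("../audio/separation/some_song.mp3", toy_root)

# The matching glob pattern is built from the same root
pattern = str(toy_root / "*" / "vocals.wav")
print(target)
print(pattern)
```

Before writing, the parent folder can be created with `target.parent.mkdir(parents=True, exist_ok=True)`.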

In [None]:
print("Separation:", separated_files[0])
ipd.Audio(separated_files[0], rate=sr)

In [None]:
print("Separation:", separated_files[1])
ipd.Audio(separated_files[1], rate=sr)

In [None]:
print("Separation:", separated_files[2])
ipd.Audio(separated_files[2], rate=sr)

In [None]:
print("Separation:", separated_files[3])
ipd.Audio(separated_files[3], rate=sr)

In [None]:
print("Separation:", separated_files[4])
ipd.Audio(separated_files[4], rate=sr)

In [None]:
print("Separation:", separated_files[5])
ipd.Audio(separated_files[5], rate=sr)

In [None]:
print("Separation:", separated_files[6])
ipd.Audio(separated_files[6], rate=sr)