# BLIND SOURCE SEPARATION WITH DUET

## Practical Exercise

In this IPython notebook, we will implement the DUET algorithm to perform speech source separation.
In particular, we will learn how to:
1. import audio and listen to it in Python;
2. perform a Short-Time Fourier Transform and visualize its result as a spectrogram;
3. implement a naive version of the DUET algorithm (using only the phase information).

### References
- _Rickard, Scott. “The DUET blind source separation algorithm.” Blind Speech Separation. Springer Netherlands, 2007._
- _Yilmaz, Ozgur, and Scott Rickard. “Blind separation of speech mixtures via time-frequency masking.” IEEE Transactions on Signal Processing, 2004._

Let's begin by importing the necessary libraries, all of which can be installed with pip.

In [17]:
import numpy as np
import matplotlib.pyplot as plt

from scipy.signal import find_peaks              # for peak-picking
from mir_eval.separation import bss_eval_images  # for evaluation

import soundfile as sf # to load audio files
import stft # to perform stft

import IPython

## Some hyperparameters and file paths
Here we set the paths to the data folder and some hyperparameters.

In this tutorial, we perform _informed_ source separation, that is, we use some a priori knowledge.  
In particular, we know how many sound sources (speakers) we want to separate.

In [4]:
# some environment parameters
dir_data = "./data/" 
dir_mix = dir_data + "mixtures/"    # path to the mixtures
dir_gt = dir_data + "ground_truth/" # path to the clean sound sources
fig_size = (10, 5)

# analysis parameters
wavefile = "3speech_2chan.wav"  # file to be analysed
n_src = 2 # number of sound sources

tw = 64 # [ms] STFT analysis window length, a typical value in the literature
ov = 50 # [%]  STFT window overlap, a typical value in the literature

## Load sound in Python
Let's load the mixture signal, that is, the signal recorded at the microphones,  
and listen to it with `IPython.display.Audio`.

In [5]:
#### Load the mixture
# TODO.
#  - use soundfile's read function (sf.read) to import the wav file
#  - check the dimensions of the array (samples x channels)
#  - use IPython.display.Audio to listen to one channel

Now that our ears are satisfied, let's plot the signals so that our eyes can enjoy them as well.

In [8]:
#### Plot the mixture

# TODO 1) plot the waveforms (all the channels) against the sample index

# plt.plot(......)
# plt.legend(['channel 1', 'channel 2'])
# plt.xlabel('Samples')
# plt.ylabel('Amplitude')
# plt.gca().set_aspect('auto')
# plt.show()

# TODO 2) change the axis to display the waveforms as a function of time [seconds]

# time_support = ???????
# plt.plot(?????????)
# plt.legend(['channel 1', 'channel 2'])
# plt.xlabel('Time [s]')
# plt.ylabel('Amplitude')
# plt.gca().set_aspect('auto')
# plt.show()
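A minimal sketch of TODO 2, assuming `x` is the (samples x channels) mixture and `fs` its sampling rate: sample `n` is taken at time `n / fs`, so the time axis is just the sample index divided by `fs`. The helper `plot_waveforms` is hypothetical; it uses `plt.subplots` instead of `plt.axes()`, because `plt.axes()` creates a new, overlapping axes in recent matplotlib versions.

```python
import numpy as np
import matplotlib.pyplot as plt


def time_support(n_samples, fs):
    # sample n is taken at time n / fs seconds
    return np.arange(n_samples) / fs


def plot_waveforms(x, fs, fig_size=(10, 5)):
    """Plot every column (channel) of x against time in seconds."""
    t = time_support(x.shape[0], fs)
    fig, ax = plt.subplots(figsize=fig_size)
    ax.plot(t, x)  # a 2D x gives one line per channel
    ax.legend([f"channel {c + 1}" for c in range(x.shape[1])])
    ax.set_xlabel("Time [s]")
    ax.set_ylabel("Amplitude")
    return fig, ax
```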

## STFT representation
Most approaches in the literature work in the Short-Time Fourier Transform (STFT) domain.  
DUET does the same, so we first need to compute the STFT of the observed signals.

### Reference
https://en.wikipedia.org/wiki/Short-time_Fourier_transform
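For reference, one common definition of the discrete STFT, with an analysis window $w$ of length $N$ samples and a hop size of $H$ samples, is:

$$
X(m, k) = \sum_{n} x[n]\, w[n - mH]\, e^{-j 2\pi k n / N}, \qquad k = 0, \dots, N-1,
$$

where $m$ indexes time frames and $k$ frequency bins. For example, with `tw = 64` ms and `ov = 50` %, a recording sampled at 16 kHz (an assumed rate; check the `fs` of your file) gives $N = 0.064 \cdot 16000 = 1024$ samples and $H = N\,(1 - 0.5) = 512$ samples.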

In [9]:
#### Go to STFT domain
# TODO. Use:

# TODO 1) stft(signal, frameSize, hopSize), with frameSize and hopSize in samples

# nfft = int(???????????)
# hop  = int(???????????)
# print(nfft, hop)

# MIX = stft.stft(???????????)
# MIC1 = MIX[:,:,0]
# MIC2 = MIX[:,:,1]
# print(MIC1.shape)

#  TODO 2) display the spectrogram (e.g. the log-magnitude of the STFT)

# def plot_spectrogram(x):
#     #spectrogram = ???
#     plt.imshow(spectrogram, origin='lower')
#     plt.gca().set_aspect('auto')
#     plt.xlabel('time frames')
#     plt.ylabel('frequency bins')
#     plt.colorbar()
#     plt.show()

# plot_spectrogram(MIC1)
# plot_spectrogram(MIC2)
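Since the exact signature of the `stft` module imported above is not shown here, the sketch below swaps in `scipy.signal.stft` instead (an assumption, not the notebook's own helper). It converts `tw` and `ov` into a window length and hop in samples, computes the STFT of all channels at once, and plots the log-magnitude. `scipy.signal` is imported as a module so that the name `stft` is not shadowed. The helpers `stft_params` and `multichannel_stft` are hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import signal


def stft_params(fs, tw_ms=64, ov_pct=50):
    nfft = int(round(tw_ms * 1e-3 * fs))         # window length [samples]
    hop = int(round(nfft * (1 - ov_pct / 100)))  # hop size [samples]
    return nfft, hop


def multichannel_stft(x, fs, nfft, hop):
    # scipy transforms along the last axis: (samples, channels) -> (channels, samples)
    _, _, Z = signal.stft(x.T, fs=fs, window="hann", nperseg=nfft, noverlap=nfft - hop)
    return Z  # shape (channels, frequency bins, time frames)


def plot_spectrogram(X, eps=1e-12):
    spectrogram = 20 * np.log10(np.abs(X) + eps)  # magnitude in dB
    plt.figure()
    plt.imshow(spectrogram, origin="lower", aspect="auto")  # low frequencies at the bottom
    plt.xlabel("time frames")
    plt.ylabel("frequency bins")
    plt.colorbar(label="dB")
```

With `Z = multichannel_stft(x, fs, *stft_params(fs, tw, ov))`, the two microphone spectrograms are `MIC1 = Z[0]` and `MIC2 = Z[1]`.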

In [19]:
#### Compute log-ratio of spectrograms
# TODO.
#    - compute the log ratio:
#      - compute the log magnitude of the ratio (ILD)
#      - compute the angle of the ratio (IPD)
#      (hint: check how to compute the magnitude and phase of complex numbers)
#    - display the results (hint: similar to plot_spectrogram)
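A minimal sketch of the log-ratio, assuming `MIC1` and `MIC2` are complex STFTs of the same shape: the complex logarithm of the ratio splits into a magnitude part (ILD) and a phase part (IPD). The helper name `log_ratio` and the small `eps` guard against division by zero are assumptions.

```python
import numpy as np


def log_ratio(X1, X2, eps=1e-12):
    """Interchannel level and phase differences of two STFTs of the same shape."""
    # complex ratio at every time-frequency point; eps avoids division by zero
    R = (X2 + eps) / (X1 + eps)
    ild = np.log(np.abs(R))  # natural-log magnitude ratio (multiply by 20/ln(10) for dB)
    ipd = np.angle(R)        # phase difference in radians, in (-pi, pi]
    return ild, ipd
```

Both outputs have the shape of the spectrograms, so they can be shown with `plt.imshow(..., origin="lower", aspect="auto")` just like a spectrogram.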


## BLIND SOURCE SEPARATION WITH DUET

In this part of the tutorial you are going to implement a vanilla version of the DUET algorithm.  
The original algorithm extracts both phase and magnitude information, builds a weighted histogram over these two features, and clusters the time-frequency points around the histogram peaks, which appear as bumps in a 3D plot:  
   (Delay/Phase x Amplitude/Magnitude x histogram height)

Here we are going to use only the _magnitude_ information.
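To see why the channel ratio is informative, recall the anechoic two-microphone model used in the DUET reference (Rickard, 2007): source $j$ reaches the second microphone with a relative attenuation $a_j$ and a relative delay $\delta_j$:

$$
x_1(t) = \sum_{j=1}^{J} s_j(t), \qquad x_2(t) = \sum_{j=1}^{J} a_j\, s_j(t - \delta_j)
\quad\Longrightarrow\quad
X_2(\tau, \omega) \approx \sum_{j=1}^{J} a_j\, e^{-i\omega\delta_j}\, S_j(\tau, \omega).
$$

If the sources are W-disjoint orthogonal (at most one source is active at each time-frequency point), then at a point $(\tau, \omega)$ dominated by source $j$:

$$
\frac{X_2(\tau, \omega)}{X_1(\tau, \omega)} \approx a_j\, e^{-i\omega\delta_j}
\quad\Longrightarrow\quad
\log\left|\frac{X_2}{X_1}\right| \approx \log a_j, \qquad
\angle \frac{X_2}{X_1} \approx -\omega\,\delta_j \quad (\text{for } |\omega\,\delta_j| < \pi).
$$

So the magnitude cue depends only on the source, while the phase cue grows linearly with frequency; this is why DUET divides the phase by $\omega$ before clustering.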

### Bonus exercise
Implement the full DUET algorithm with both phase and magnitude information.
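As a starting point for the bonus, here is a sketch of the two DUET features and their 2D weighted histogram, assuming one-sided STFTs `X1`, `X2` computed with window length `nfft`. It follows the symmetric attenuation $a - 1/a$ and delay $-\angle(X_2/X_1)/\omega$ of the Rickard reference; the weights $|X_1 X_2|^p \omega^q$, the defaults `p=1`, `q=0`, the bin counts and the ranges `alpha_max`, `delta_max` are illustrative assumptions to tune. Helper names are hypothetical, and the delay estimate is only valid when the microphones are close enough to avoid phase wrapping.

```python
import numpy as np


def duet_features(X1, X2, nfft, eps=1e-12):
    """Symmetric attenuation and relative delay at every TF point (DC bin dropped)."""
    k = np.arange(1, X1.shape[0])
    omega = 2 * np.pi * k / nfft                # angular frequency [rad/sample]
    R = (X2[1:] + eps) / (X1[1:] + eps)         # skip k = 0, where omega = 0
    a = np.abs(R)
    alpha = a - 1 / a                           # symmetric attenuation
    delta = -np.angle(R) / omega[:, None]       # relative delay [samples]
    return alpha, delta, omega


def duet_histogram(X1, X2, alpha, delta, omega, p=1.0, q=0.0,
                   bins=(35, 50), alpha_max=0.7, delta_max=3.6):
    """Weighted 2D histogram over (attenuation, delay); its peaks mark the sources."""
    w = np.abs(X1[1:] * X2[1:]) ** p * omega[:, None] ** q
    keep = (np.abs(alpha) < alpha_max) & (np.abs(delta) < delta_max)
    H, a_edges, d_edges = np.histogram2d(
        alpha[keep], delta[keep], bins=bins,
        range=[[-alpha_max, alpha_max], [-delta_max, delta_max]],
        weights=w[keep])
    return H, a_edges, d_edges
```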


In [11]:
#### 4) Compute histogram of ratios
# TODO. Use:
#   - np.histogram to compute the (optionally weighted) histogram values and bin edges; you can use 200 bins
#   - normalize the histogram
#   - plot the results
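A minimal sketch of this step, assuming `feature` is the ILD or IPD map computed above. Each time-frequency point becomes one sample; optional `weights` (for instance the energy `np.abs(MIC1 * MIC2)`, as DUET does) let loud points count more. Passing a finite `value_range` keeps a few extreme log-ratios from stretching the bins. The helper name and the choice of returning bin centres are assumptions.

```python
import numpy as np


def feature_histogram(feature, weights=None, n_bins=200, value_range=None):
    """Normalized (optionally weighted) histogram of a time-frequency feature map."""
    f = np.ravel(feature)                       # one sample per TF point
    w = None if weights is None else np.ravel(weights)
    hist, edges = np.histogram(f, bins=n_bins, range=value_range, weights=w)
    hist = hist / hist.sum()                    # bins now sum to 1
    centers = 0.5 * (edges[:-1] + edges[1:])    # one x-value per bin, for plotting
    return hist, centers
```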


In [12]:
#### 5) Extract peaks of the histogram
# TODO.
#   - do peak picking. How many peaks? Do we know? (hint: check find_peaks in scipy.signal)
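Since this is informed separation, we know how many peaks to keep: `n_src`. A sketch, assuming `hist` and `centers` come from step 4: `scipy.signal.find_peaks` returns all local maxima (passing `height=0` makes it also report their heights), and we keep the `n_peaks` tallest. Note that `find_peaks` never reports a maximum at the first or last bin, and a noisy histogram may need smoothing or the `distance` argument first. The helper name is hypothetical.

```python
import numpy as np
from scipy.signal import find_peaks


def pick_peaks(hist, centers, n_peaks):
    """Indices and positions of the n_peaks tallest local maxima of a histogram."""
    idx, props = find_peaks(hist, height=0)
    # sort local maxima by height, tallest first, and keep n_peaks of them
    order = np.argsort(props["peak_heights"])[::-1][:n_peaks]
    idx = np.sort(idx[order])
    return idx, centers[idx]
```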

In [13]:
#### 6) Plot histogram and peaks
# TODO.
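One way to plot the result, assuming `hist` and `centers` from step 4 and the peak indices from step 5. The helper name and the default axis label are hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_histogram_peaks(hist, centers, peak_idx, xlabel="log-ratio", fig_size=(10, 5)):
    """Histogram as a line, selected peaks as red crosses."""
    fig, ax = plt.subplots(figsize=fig_size)
    ax.plot(centers, hist, label="histogram")
    ax.plot(centers[peak_idx], hist[peak_idx], "rx", markersize=10, label="peaks")
    ax.set_xlabel(xlabel)
    ax.set_ylabel("normalized weight")
    ax.legend()
    return fig, ax
```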

In [14]:
#### 7) Compute masks
# TODO.
#    - for every time-frequency point, decide which mask (cluster) it belongs to,
#      using the information of the peaks (and their weights).
#       - we have K masks, one per source/peak.
#    - plot the binary masks
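A simple version of this step assigns each time-frequency point to the closest peak. This nearest-peak rule is a simplification of the assignment used in the DUET references. The sketch assumes `feature` is the 2D ILD or IPD map and `peak_values` the peak positions from step 5. For the phase cue you may prefer a wrapped distance such as `np.abs(np.angle(np.exp(1j * (f - p))))`. The helper name is hypothetical.

```python
import numpy as np


def binary_masks(feature, peak_values):
    """One binary mask per peak; each TF point goes to its nearest peak."""
    peaks = np.asarray(peak_values)
    # distance of every TF point to every peak: shape (K, freq, frames)
    dist = np.abs(feature[None, :, :] - peaks[:, None, None])
    labels = np.argmin(dist, axis=0)            # index of the closest peak
    masks = np.stack([labels == k for k in range(len(peaks))])
    return masks.astype(float)                  # exactly one 1 per TF point along K
```

Each `masks[k]` can be displayed with `plt.imshow(masks[k], origin="lower", aspect="auto")`.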

In [15]:
#### 8) Apply masks 
# TODO. 
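Applying the masks is an element-wise product in the STFT domain. The sketch assumes `masks` has shape (K, freq, frames) and `X` stacks the two microphone STFTs (e.g. `np.stack([MIC1, MIC2])`). Masking every channel yields an estimate of each source's *image* at each microphone, which is what the image-based evaluation below expects. The helper name is hypothetical.

```python
import numpy as np


def apply_masks(masks, X):
    """Masked STFTs of shape (K, channels, freq, frames)."""
    # broadcast: (K, 1, F, T) * (1, channels, F, T)
    return masks[:, None, :, :] * X[None, :, :, :]
```

Because the binary masks partition the time-frequency plane, summing the result over the first axis gives back the mixture STFT.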

In [16]:
#### 9) Save sounds
# TODO. Use:
#  - istft
#  - sf.write (soundfile) to write the wav files

## EVALUATION
Now it is time to evaluate our separation. It is common to use the BSS Eval metrics (see References below).  
A sound mixture contains:
- target _sources_ of interest;
- _interferences_ (cross-talk from the other sources);
- _noise_ (reverberation or measurement noise).

Moreover, our algorithm can add spurious sounds, which we call _artifacts_.

This toolbox (originally written in MATLAB) aims to quantify these errors with the following metrics:
- SIR: Signal to Interference Ratio (estimated source vs. interference from the other sources)
- SAR: Signal to Artifacts Ratio (estimated source vs. artifacts)
- SNR: Signal to Noise Ratio (estimated source vs. noise)
- SDR: Signal to Distortion Ratio, an overall metric that accounts for all types of error
- (ISR: source Image to Spatial distortion Ratio)

In Python we can use the `mir_eval` module (Music Information Retrieval evaluation). We just need to pass it our estimates and the ground truth. Note that `bss_eval_images` returns SDR, ISR, SIR and SAR, but not SNR.

### References
- mir_eval documentation: http://craffel.github.io/mir_eval/
- BSS Eval, a toolbox for performance measurement in (blind) source separation: http://bass-db.gforge.inria.fr/bss_eval/
- _E. Vincent, R. Gribonval and C. Févotte, “Performance measurement in blind audio source separation,” IEEE Trans. Audio, Speech and Language Processing, 14(4), pp. 1462-1469, 2006._


In [None]:
#### 10) Evaluate method - load/create the ground truth:
# TODO. Use:
#  - the ground-truth sounds in dir_gt
#  - plot the ground-truth masks
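A common reference is the *ideal binary mask*: each time-frequency point goes to the source that is strongest there. The file names inside `dir_gt` are not given here, so loading is left to you. The sketch assumes you have already computed `S_ref`, the STFTs of the K ground-truth source images at one reference microphone, with the same settings used for the mixture. The helper name is hypothetical.

```python
import numpy as np


def ideal_binary_masks(S_ref):
    """S_ref: (K, freq, frames) ground-truth STFTs -> (K, freq, frames) binary masks."""
    winner = np.argmax(np.abs(S_ref), axis=0)   # dominant source at each TF point
    return np.stack([winner == k for k in range(S_ref.shape[0])]).astype(float)
```

Plotting these masks next to the estimated ones (e.g. with `plt.imshow`) shows where the clustering went wrong.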

In [None]:
#### 11) Evaluate method:
# TODO. Use:
#  - create two arrays of shape (sources, samples, channels), one stacking the ground-truth
#    source images, the other stacking the estimated ones
#  - bss_eval_images in the mir_eval module