# Speech Enhancement and Evaluations

- Implement basic speech enhancement techniques, then evaluate and visualize the quality of the enhancement.

- Implement four different filtering methods: **spectral subtraction**, **Wiener filter**, **linear filter**, and a **VAD-based filter**.

Each of these filters is run with two noise estimates:

1. a constant average-magnitude noise model

2. an ideal noise estimate, which is the true noise generated to create the noisy signal.
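
As a rough illustration of the two noise estimates, the sketch below assumes the true noise is available as a magnitude-spectrum matrix with one column per frame. The name `noise_estimates_sketch`, the `(n_bins, n_frames)` layout, and the choice of averaging per frequency bin are assumptions made for illustration; this is not the exercise's `noiseEst`.

```python
import numpy as np

def noise_estimates_sketch(noise_mag):
    """Return (ideal, average) noise estimates, both shaped like noise_mag.

    noise_mag: hypothetical array of shape (n_bins, n_frames) holding the
    magnitude spectrum of the true noise, one column per frame.
    """
    noise_mag = np.asarray(noise_mag, dtype=float)

    # Ideal estimate: the true noise itself.
    ideal = noise_mag.copy()

    # Average model (assumed form): one mean magnitude per frequency bin,
    # held constant over all frames so the shape matches the input.
    mean_per_bin = noise_mag.mean(axis=1, keepdims=True)
    average = np.repeat(mean_per_bin, noise_mag.shape[1], axis=1)

    return ideal, average
```

Both outputs have the same shape as the input, so either one can be passed to a filter frame by frame.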

The enhanced signals are evaluated by computing two signal-to-noise ratios: the global SNR and the segmental SNR.
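
A minimal sketch of the two measures, assuming the common definitions in which the error is the difference between the clean and the enhanced signal. The function names, the small `eps` guard, and the frames-as-columns layout are illustrative assumptions, not the exercise's `snrGlb` and `snrSeg`.

```python
import numpy as np

def snr_global_sketch(clean, enhanced, eps=1e-12):
    """Global SNR in dB over the whole signal (assumed definition)."""
    clean = np.asarray(clean, dtype=float)
    err = clean - np.asarray(enhanced, dtype=float)
    # eps avoids division by zero and log(0) for silent or perfect inputs.
    return 10.0 * np.log10((np.sum(clean ** 2) + eps) / (np.sum(err ** 2) + eps))

def snr_segmental_sketch(clean_frames, enhanced_frames, eps=1e-12):
    """Frame-wise SNR in dB; frames are assumed to be stored as columns."""
    clean_frames = np.asarray(clean_frames, dtype=float)
    err = clean_frames - np.asarray(enhanced_frames, dtype=float)
    sig_energy = np.sum(clean_frames ** 2, axis=0)  # energy per frame
    err_energy = np.sum(err ** 2, axis=0)           # error energy per frame
    return 10.0 * np.log10((sig_energy + eps) / (err_energy + eps))
```

The global value summarizes the whole utterance in one number, while the segmental values give one number per frame and are the ones that can be plotted over time.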

To visualize the results, the segmental SNRs of all enhanced signals are plotted.

In addition, the spectrograms of the clean signal, the noisy signal, and the three enhanced results are plotted and visually inspected.

The following functions are implemented:

1. `noiseEst`: Estimate the noise for the noisy signal based on the ideal and average noise models.

2. `spectralSub`: Enhance the noisy signal by spectral subtraction.

3. `wiener`: Enhance the noisy signal with a Wiener filter.

4. `linear`: Enhance the noisy signal with a linear filter.

5. `vadEnhance`: Enhance the noisy signal with a VAD-based filter.

6. `snrGlb`: Compute the global SNR of the enhanced signals.

7. `snrSeg`: Compute the frame-wise segmental SNR of the enhanced signals.
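
To give a feel for two of the filters listed above, here is a sketch of textbook magnitude-domain spectral subtraction and a power-based Wiener gain. It assumes the noisy signal is given as a complex spectrum and the noise estimate as a magnitude spectrum of the same shape. The function names and exact formulas are assumptions for illustration; the exercise's `spectralSub` and `wiener` may be defined differently.

```python
import numpy as np

def spectral_subtraction_sketch(noisy_spec, noise_mag):
    """Subtract the noise magnitude, floor at zero, keep the noisy phase."""
    noisy_spec = np.asarray(noisy_spec, dtype=complex)
    noisy_mag = np.abs(noisy_spec)
    enhanced_mag = np.maximum(noisy_mag - noise_mag, 0.0)
    return enhanced_mag * np.exp(1j * np.angle(noisy_spec))

def wiener_sketch(noisy_spec, noise_mag, eps=1e-12):
    """Scale each bin by estimated speech power over noisy power."""
    noisy_spec = np.asarray(noisy_spec, dtype=complex)
    noisy_pow = np.abs(noisy_spec) ** 2
    # Speech power estimate: noisy power minus noise power, floored at zero.
    speech_pow = np.maximum(noisy_pow - np.asarray(noise_mag, dtype=float) ** 2, 0.0)
    gain = speech_pow / (noisy_pow + eps)
    return gain * noisy_spec
```

In both cases, bins where the noise estimate exceeds the observation are set to zero, which is the simplest way to avoid negative magnitudes.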


## Steps

- Generate a noisy signal (additive white Gaussian noise with a power of -35 dB).

- Estimate the noise for the noisy signal by completing `noiseEst`, based on
    1. the ideal estimate
    2. the average noise model

    (Note that this function should return estimates with the same dimensions as the input noise matrix.)

- Enhance the noisy signal by implementing the filtering functions
    1. Spectral subtraction: `spectralSub`
    2. Wiener filter: `wiener`
    3. Linear filter: `linear`
    4. VAD-based filter: `vadEnhance`

- Compute the global SNR and the frame-wise segmental SNR of the enhanced signals with
    1. `snrGlb`
    2. `snrSeg`

- Plot and visualize the results.
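
The first step can be sketched as follows. The sketch assumes that "-35 dB" refers to the noise variance relative to a full-scale amplitude of 1.0, which the text does not state explicitly. The function name `add_white_noise_sketch` and the fixed `seed` are hypothetical choices for illustration.

```python
import numpy as np

def add_white_noise_sketch(clean, noise_power_db=-35.0, seed=0):
    """Add white Gaussian noise of the given power (dB re. full scale 1.0)."""
    clean = np.asarray(clean, dtype=float)
    rng = np.random.default_rng(seed)
    # Convert the power from dB to linear scale: this is the noise variance.
    noise_power = 10.0 ** (noise_power_db / 10.0)
    noise = rng.normal(0.0, np.sqrt(noise_power), size=clean.shape)
    # Return the noise as well, since it serves as the ideal noise estimate.
    return clean + noise, noise
```

Returning the noise alongside the noisy signal keeps the true noise available for the ideal estimate used later.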

```python
# Course helper module; the cells below rely on it for np, wav, sig, and windowing
from utils import *
```

### Prepare the speech signal

```python
# Read the audio file and its sampling rate
Fs_target = 16000
Fs, data_clean = wav.read('hello.wav')

# Keep only the first channel if the file has more than one
if data_clean.ndim > 1:
    data_clean = data_clean[:, 0]

# Convert the signal from int16 (-32768 to 32767) to float32 in [-1, 1)
if data_clean.dtype == np.int16:
    data_clean = data_clean.astype(np.float32) / 32768

# Make sure the sampling rate is 16 kHz
if Fs != Fs_target:
    data_clean = sig.resample_poly(data_clean, Fs_target, Fs)
    Fs = Fs_target
```

### Windowing

```python
# Split the data sequence into windows.
frame_length_ms = 25  # in milliseconds
hop_length_ms = 12.5  # in milliseconds

frame_length = int(np.around((frame_length_ms / 1000) * Fs))  # 25 ms in samples
hop_size = int(np.around((hop_length_ms / 1000) * Fs))  # 12.5 ms (25/2 ms) in samples (50% overlap)
```

```python
frame_matrix_clean = windowing(data_clean, frame_length, hop_size, 'hamming')
frame_matrix_clean.shape
```

```
(400, 311)
```
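
The `windowing` helper comes from `utils` and is not shown here. As a rough idea of what such a function might do, the sketch below splits a signal into overlapping Hamming-windowed frames stored as columns, matching the `(frame_length, n_frames)` layout of the output above. The name `windowing_sketch`, the zero-padding of the last frame, and the frame-count rule are assumptions, so the number of frames it produces may differ from the helper's.

```python
import numpy as np

def windowing_sketch(data, frame_length, hop_size):
    """Split data into overlapping Hamming-windowed frames (one per column)."""
    data = np.asarray(data, dtype=float)

    # Assumed rule: enough frames to cover the signal, zero-padding the tail.
    n_frames = 1 + int(np.ceil(max(len(data) - frame_length, 0) / hop_size))
    padded = np.zeros((n_frames - 1) * hop_size + frame_length)
    padded[:len(data)] = data

    window = np.hamming(frame_length)
    frames = np.zeros((frame_length, n_frames))
    for i in range(n_frames):
        start = i * hop_size
        frames[:, i] = padded[start:start + frame_length] * window
    return frames
```

With `frame_length = 400` and `hop_size = 200` (25 ms and 12.5 ms at 16 kHz), each column holds one 400-sample frame and consecutive frames overlap by 50%.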