<a href="https://colab.research.google.com/github/compi1234/spchlab/blob/main/lab04_feature_extaction/TimeDomainFeatures.ipynb" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Google Colab" title="Open in Google Colab"></a> 
# Time domain feature extraction

In this notebook we explore feature extraction in the time domain; i.e. features that we can derive
directly from the time-domain signal, without doing a frequency transformation first.  
We focus on two conceptually simple features that are time varying and hence computed with a sliding window approach.

_______________________________

#### 1. Energy
Energy ($E^2$) is defined as the average energy per sample in a short window.
It is computed as the total energy in a frame (no windowing is applied), normalized by the number of samples $N$ in the frame:   

$E^2 =  \frac{1}{N} \sum_{i} x^2[i] $
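
As a minimal sketch of the formula above, the NumPy function below computes the mean energy per sample for successive frames. The function name `short_time_energy` and its framing convention (complete frames only, no padding or centering) are assumptions made for illustration; the framing used inside pyspch may differ.

```python
import numpy as np

def short_time_energy(x, frame_length, frame_shift):
    """Mean energy per sample for each complete frame (rectangular window)."""
    x = np.asarray(x, dtype=float)
    # number of complete frames that fit in the signal (0 if the signal is too short)
    n_frames = max(0, 1 + (len(x) - frame_length) // frame_shift)
    energy = np.empty(n_frames)
    for k in range(n_frames):
        frame = x[k * frame_shift : k * frame_shift + frame_length]
        energy[k] = np.mean(frame ** 2)      # E^2 = (1/N) * sum_i x[i]^2
    return energy

# Illustrative input: a 100 Hz sine sampled at 8 kHz, 30 ms frames, 10 ms shift
sr = 8000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 100 * t)
e = short_time_energy(x, frame_length=240, frame_shift=80)
print(len(e), e[:3])
```

Each frame in this example spans a whole number of periods, so the energy is the same in every frame; with real speech the values change from frame to frame.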

#### 2. Pitch
For Pitch estimation we use the YIN algorithm as implemented in librosa.  It estimates the pitch period as the minimum of the difference function:   

$d(\tau) = \sum_t \left(x(t) - x(t-\tau)\right)^2 $   

$T = \arg\min_{\tau} d(\tau) $   

The estimated pitch period is expressed in *(m)sec* and the pitch frequency (in *Hz*) is obtained by inversion of the pitch period.   

$ f_0 = \frac{1}{T} $   

The above algorithm is simple but naive, since human speech is never perfectly periodic.  Moreover, non-periodic segments (unvoiced speech, silence, ...) should be recognized as such.  Therefore a few additional heuristics are required to turn this baseline algorithm into a state-of-the-art pitch estimator.
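
The difference function and its minimum translate almost literally into NumPy. The sketch below is a hypothetical helper (`naive_pitch`, not part of librosa or pyspch) that evaluates the squared difference function for one frame and picks its minimum; it omits all refinements of the real YIN algorithm. Because every multiple of the true period also gives a near-zero difference, the example keeps `fmin` above half the true pitch; the values of `fmin` and `fmax` are chosen purely for illustration.

```python
import numpy as np

def naive_pitch(frame, sr, fmin=75.0, fmax=500.0):
    """Estimate f0 (Hz) of one frame from the minimum of the squared difference function."""
    frame = np.asarray(frame, dtype=float)
    tau_min = int(sr / fmax)          # shortest candidate period (samples)
    tau_max = int(sr / fmin)          # longest candidate period (samples)
    n = len(frame) - tau_max          # number of terms in each sum
    if n <= 0:
        raise ValueError("frame is too short for the requested fmin")
    taus = np.arange(tau_min, tau_max + 1)
    # d(tau) = sum_t (x(t) - x(t - tau))^2, evaluated for every candidate lag
    d = np.array([np.sum((frame[tau:tau + n] - frame[:n]) ** 2) for tau in taus])
    period = taus[np.argmin(d)]       # T = argmin_tau d(tau), in samples
    return sr / period                # f0 = 1 / T

# Illustrative input: one 60 ms frame (480 samples at 8 kHz) with a 100 Hz fundamental
sr = 8000
t = np.arange(480) / sr
frame = np.sin(2 * np.pi * 100 * t) + 0.5 * np.sin(2 * np.pi * 200 * t)
f0 = naive_pitch(frame, sr)
print(f0)
```

Note that the frame must be longer than the longest candidate period, otherwise there are no samples left to compare.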

Reference: *A. de Cheveigné and H. Kawahara*, **YIN, a fundamental frequency estimator for speech and music** (http://audition.ens.fr/adc/pdf/2002_JASA_YIN.pdf)  

In [1]:
# uncomment the pip install command to install pyspch -- it is required!
#
#!pip install git+https://github.com/compi1234/pyspch.git
#
try:
    import pyspch
except ModuleNotFoundError:
    # pyspch is missing: explain how to install it
    print(
    """
    To enable this notebook on platforms such as Google Colab,
    install the pyspch package and its dependencies by running the following code:

    !pip install git+https://github.com/compi1234/pyspch.git
    """
    )

In [2]:
%matplotlib inline
import os,sys, math
import numpy as np

import pyspch.sp as Sps
import pyspch.core as Spch
import pyspch.display as Spd

import matplotlib.pyplot as plt
import matplotlib as mpl

import ipywidgets as widgets
from ipywidgets import interact, interact_manual, interactive, interactive_output, Layout
from IPython.display import display, clear_output, Audio, HTML

mpl.rcParams['figure.figsize'] = [12.0, 8.0]
mpl.rcParams['font.size'] = 12
mpl.rcParams['legend.fontsize'] = 'large'
mpl.rcParams['figure.titlesize'] = 'large'

dir = "demo/"
demo_files = ["friendly.wav","male1.wav","female1.wav","female2.wav","train.wav","timit_m1_sa1.wav","timit_f1_sa1.wav"]

In [5]:

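def time_domain_plot(file_name,length_ms,shift_ms):
    """Plot waveform, spectrogram, energy (dB) and pitch for one demo file."""
    wavdata, sr = Spch.load_data(dir+file_name)
    # convert the slider values from msec to sec
    shift = shift_ms/1000.
    length = length_ms/1000.
    rms,pitch,zcr = Sps.time_dom3(y=wavdata,sr=sr, shift=shift, length=length)
    rms = 10.*np.log10(rms)   # convert to dB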
    spg = Sps.spectrogram(wavdata,sample_rate=sr,f_shift=shift,f_length=length,n_mels=None,mode='dB')
    fig = Spd.PlotSpgFtrs(wavdata=wavdata,spgdata=spg,shift=shift,line_ftrs=[rms,pitch],dy=None,sample_rate=sr,
                         row_heights=[1,1,1,1])
    fig.axes[2].set_ylabel("ENERGY (dB)")
    fig.axes[3].set_ylabel("Pitch (Hz)")
    pitch_lim = np.array(fig.axes[3].get_ylim())
    pitch_lim[0] = 50*(pitch_lim[0]//50)
    pitch_lim[1] = 50*((pitch_lim[1]-1)//50 + 1)
    fig.axes[3].set_ylim(pitch_lim)
    #fig.add_seg_plot(seg,iax=0,ypos=0.8)
    #if ZCR:
    #    fig.add_line_plot(wavdata,dx=1/sr,iax=4)
    #    fig.axes[4].set_ylabel("ZCR (/sec)")
    #fig.add_line_plot(4*rms,dx=shift,iax=0)
    clear_output(wait=True)
    display(fig)
    display(Audio(wavdata,rate=sr))
    

## Questions and Things to explore

In this notebook you can explore practical issues in a sliding window analysis; this time for the extraction of two of the most essential features: energy and pitch.
You can verify that *good* length and shift values lead to estimates that are in sync with human perception:   
- tracking with TOO LARGE a window (large frame lengths) will smear out the time-varying properties, so that we may be observing global long-term properties instead of the shorter-term variations that are audible
- tracking with TOO SMALL a window (short frame lengths/shifts) may yield unstable estimates that suggest short-term variability that is perceptually irrelevant


##### 1. Frame Length \>\= Frame Shift ?   
Read the CAVEATS section first. As mentioned there, it is advised NOT to use length < shift, as the software may not be reliable in such a situation.   
Which of the explanations below is correct ?  Make a sketch that supports your answer:   
- it is technically impossible to implement a sliding window approach where length < shift 
- it makes no sense to have a frame length that is shorter than a frame shift, as this implies gaps in the observation of the signal   
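
To support the sketch asked for above, it can help to list which samples each frame actually covers. The helpers below (`frame_bounds` and `fraction_covered`, hypothetical names that are not part of pyspch) work in samples and assume that only complete frames are kept.

```python
def frame_bounds(n_samples, length, shift):
    """Return (start, end) sample indices (end exclusive) of all complete frames."""
    return [(s, s + length) for s in range(0, n_samples - length + 1, shift)]

def fraction_covered(n_samples, length, shift):
    """Fraction of samples, up to the end of the last frame, seen by at least one frame."""
    bounds = frame_bounds(n_samples, length, shift)
    if not bounds:
        return 0.0
    seen = set()
    for start, end in bounds:
        seen.update(range(start, end))
    return len(seen) / bounds[-1][1]

# Compare an overlapping, an adjacent and a "length < shift" configuration
for length, shift in [(30, 10), (10, 10), (10, 30)]:
    print(length, shift, frame_bounds(100, length, shift)[:3],
          fraction_covered(100, length, shift))
```

Drawing the printed intervals on a time axis gives the requested sketch for each configuration.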
  
##### 2. Effect of Window Length
Setup 1: What happens when using long window lengths ?
- Set the frame shift to 5.0 msec
- **INCREASE** the frame length gradually from 30.0 to 200.0 msec
    - What changes do you observe in the energy and pitch curves ?
    - Do you observe phenomena / values that seem inappropriate for speech analysis ?
    
Setup 2: What about short window lengths and shifts ?
- Set the frame shift to 2. (or 1.) msec
- **DECREASE** the frame length gradually from 30.0 to 2.0 msec
    - At some point the pitch estimate gets lost. What is happening ?
    - Explore both a male example and a female example
    - Can you relate the frame length where the pitch estimate fails to the actual pitch of the voice?  Hint: the algorithm used here relies on a time difference function
    - Come up with a rule of thumb for minimum window length to be able to do pitch estimation
    
Setup 3: 
- Use the same setup as in (2) and focus now on the energy
- The very short frame shift lets you observe fine temporal details in the energy curve
- Again let the window length slider move from high to (very) low:
    - During a single sound we expect features such as energy to be rather stable; however, for which values of the window length do you see the pitch ripple through into the energy ?  Explain this with a sketch.
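
One way to probe this numerically is with a synthetic, perfectly periodic signal instead of real speech. The sketch below assumes a test signal made of one damped 1 kHz burst per pitch period (100 Hz pitch at an 8 kHz sampling rate) and measures how much the frame energy varies across frames for several window lengths. The signal and the helper `energy_ripple_db` are illustrative assumptions, not part of pyspch.

```python
import numpy as np

sr = 8000
period = 80                                   # 100 Hz pitch at 8 kHz (assumed test signal)
n = np.arange(sr // 2)                        # 0.5 s of signal
# one damped 1 kHz burst per pitch period
x = np.exp(-(n % period) / 10.0) * np.sin(2 * np.pi * 1000 * n / sr)

def energy_ripple_db(x, length, shift):
    """Ratio (in dB) between the largest and the smallest frame energy."""
    starts = range(0, len(x) - length + 1, shift)
    e = np.array([np.mean(x[s:s + length] ** 2) for s in starts])
    return 10 * np.log10(e.max() / e.min())

# 1 ms shift (8 samples), window lengths from 2 to 40 ms
for length_ms in (2, 5, 10, 20, 40):
    length = length_ms * sr // 1000
    print(length_ms, "ms:", round(energy_ripple_db(x, length, shift=8), 2), "dB")
```

Comparing the printed ripple with the pitch period of the test signal (10 ms) is a useful starting point for the sketch.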

Global Questions:
- on the basis of the previous experiments, what can you say about reasonable values for window length and window shift ?

##### CAVEATS
- The GUI is intentionally very flexible in what values of length and shift you enter.
- The output is not guaranteed to make sense (and you may even have occasional crashes) when you enter extreme or unnatural values (e.g. shift > length).  If that is the case, just move your sliders slightly to somewhat more *normal* values and everything will be fine again (hopefully).
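
A GUI or script could guard against such settings before running the analysis. The helper below is a hypothetical sketch (`check_frame_params` is not part of pyspch) that converts the slider values from msec to samples and reports the situations mentioned above.

```python
def check_frame_params(length_ms, shift_ms, sr):
    """Convert msec to samples and collect warnings for settings likely to misbehave."""
    length = int(round(length_ms * sr / 1000.0))
    shift = int(round(shift_ms * sr / 1000.0))
    warnings = []
    if length < 1 or shift < 1:
        warnings.append("frame length and shift must be at least one sample")
    if shift > length:
        warnings.append("shift > length: parts of the signal are never analyzed")
    return length, shift, warnings

# Illustrative calls at an assumed sampling rate of 16 kHz
print(check_frame_params(30.0, 10.0, 16000))
print(check_frame_params(2.0, 10.0, 16000))
```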

## The Time-Domain GUI

#### Interaction
You can choose a **file** from a small selection   
You can interact with 2 sliders: **frame_length** and **frame_shift** (in msec). Defaults are frame_length=30 msec and frame_shift=10 msec.

#### Display
In the figure you get the following views:
- time waveform  (with energy overlay)
- Fourier Spectrogram
- RMS energy (square root of mean energy per sample), displayed in dB
- pitch (in Hz)



In [6]:
interact(time_domain_plot,
    file_name = widgets.Dropdown(options=demo_files,value=demo_files[0],description="File Name"),
    length_ms=widgets.FloatSlider(value=30.,min=2.,max=200.,step=2.,
                    continuous_update=False,description="Frame Length (msec)",
                    layout=Layout(width='50%') ,style={'description_width':'50%'}),
    shift_ms=widgets.FloatSlider(value=10.,min=2.,max=80.,step=2.,
                    description="Frame Shift (msec)",continuous_update=False,
                    layout=Layout(width='50%') ,style={'description_width':'50%'}));
