# Speech Analysis

+ ###### Author: Dirk Van Compernolle   
+ ###### Modification History: 12/12/2023
+ ###### Requires:  pyspch>=0.7

This notebook contains a collection of interactive, versatile speech analysis tools.
It is meant to be a generic workbench giving access to various spectral analysis tools in the pyspch package.
Hence, as a user you will not run this notebook cell by cell, but just select the tool(s) that you want to work with.

#### Tools  
1) **Spectrogram** with options for mel spectrogram and cepstral analysis
2) **Spectrogram + Spectral Slice**: similar to the above, but with a slider to select spectral slices
3) **Recorder** 

In [1]:
# uncomment the pip install command to install pyspch -- it is required!
#
#!pip install git+https://github.com/compi1234/pyspch.git
#
try:
    import pyspch
except ModuleNotFoundError:
    # pyspch is missing: tell the user how to install it
    print(
        """
        To enable this notebook on platforms such as Google Colab,
        install the pyspch package and its dependencies by running the following code:

        !pip install git+https://github.com/compi1234/pyspch.git
        """
    )

# Do the imports #
##################
#
%matplotlib qt
import os,sys 
import numpy as np
import pandas as pd
from IPython.display import display, Audio, HTML
import matplotlib.pyplot as plt
#   
import pyspch.sp as Sps
import pyspch.core as Spch
import pyspch.display as Spd
# make notebook cells stretch over the full screen
display(HTML(data="""
<style>
    div#notebook-container    { width: 95%; }
    div#menubar-container     { width: 65%; }
    div#maintoolbar-container { width: 99%; }
</style>
"""))

## Spectrogram GUI

**iSpectrogram** is a GUI that lets you visualize various spectral representations, including the Fourier spectrum, the mel spectrum, and the cepstrum or mel cepstrum.
The parameters of the spectral analysis can be modified in an interactive way.
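
As a rough illustration of what the mel view does to the frequency axis, the sketch below places band centers uniformly on a mel scale. The HTK-style formula, the 0 to 8000 Hz range and the helper names are assumptions made for this example; pyspch may use a different mel variant internally.

```python
import math

def hz_to_mel(f_hz):
    """One common (HTK-style) mel formula; other variants exist."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_band_centers(n_bands=80, f_low=0.0, f_high=8000.0):
    """Center frequencies (Hz) of n_bands bands spaced uniformly on the mel scale."""
    m_low, m_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (m_high - m_low) / (n_bands + 1)
    return [mel_to_hz(m_low + (i + 1) * step) for i in range(n_bands)]

centers = mel_band_centers()
# bands are densely packed at low frequencies and widely spaced at high frequencies
print([round(c) for c in centers[:3]], [round(c) for c in centers[-3:]])
```

The band centers lie close together at low frequencies and far apart at high frequencies, which is the warping you see when ticking the mel spectrogram checkbox.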

### Parameters
- RHS (boolean)  defines the screen layout
    1. RHS=False (default): standard spectrogram view
    2. RHS=True: spectrogram view combined with a spectral slice view (the frame is selected with a slider); to select this view, pass the argument "RHS=True" when calling iSpectrogram   
- figwidth (in inches): default=12.0; this may need to be adjusted depending on your screen properties
- dpi (default=100): this default is OK for most screens

### GUI controls
- frame shift (in msec): default=10.0
- frame length (in msec): default=25.0
- preemphasis: default=0.97 (range: 0.0 - 1.0)
- mel spectrogram view: checkbox
- cepstral view: checkbox  (if the mel spectrogram box is checked, MFCCs are shown; otherwise the cepstrum obtained from the Fourier spectrum is shown)
- number of mel bands: default=80
- number of cepstral coefficients: default=12
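
To make the frame controls concrete, here is a minimal sketch that converts the default shift and length from msec to samples, counts the frames, and applies a first-order preemphasis filter. The 16 kHz sample rate, the no-padding frame count and the helper names are assumptions for illustration; the framing used inside pyspch may differ (for example by padding the signal edges).

```python
def frame_layout(n_samples, sample_rate=16000, shift_ms=10.0, length_ms=25.0):
    """Return (shift, length, n_frames) in samples for a sliding-window analysis."""
    shift = int(round(shift_ms * sample_rate / 1000.0))
    length = int(round(length_ms * sample_rate / 1000.0))
    # count only frames that fit entirely inside the signal (no padding)
    n_frames = 0 if n_samples < length else 1 + (n_samples - length) // shift
    return shift, length, n_frames

def preemphasis(x, coef=0.97):
    """First-order high-pass filter y[n] = x[n] - coef * x[n-1]."""
    if not x:
        return []
    return [x[0]] + [x[n] - coef * x[n - 1] for n in range(1, len(x))]

shift, length, n_frames = frame_layout(n_samples=16000)  # 1 second of audio
overlap = 1.0 - shift / length                           # fraction shared by successive frames
print(shift, length, n_frames, overlap)
print(preemphasis([1.0, 1.0, 1.0, 1.0]))                 # a constant signal is almost removed
```

With the defaults, successive frames share 60% of their samples, and a constant (0 Hz) input is attenuated to about 3% of its amplitude by the preemphasis filter.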

*KNOWN LIMITATIONS*:
- For certain settings, the slider may not align perfectly with the spectrogram
- Interaction can be sluggish when "sliding" the slider; it is better to "click" the intended location
- When changing a text field: put your cursor at the end of the text and hit Enter to signal your modification to the GUI!  


In [6]:
Spd.iSpectrogram(figwidth=15,RHS=True,seg_pane=0)                 

iSpectrogram(children=(HBox(children=(VBox(children=(Output(layout=Layout(height='95%')), FloatSlider(value=0.…

In [3]:
times = (0.,1.)
type(times)

tuple

In [4]:
from ipywidgets import widgets
wg = widgets.FloatRangeSlider(
    value=[5, 7.5],
    min=0,
    max=10.0,
    step=0.1,
    description='Test:',
    disabled=False,
    continuous_update=False,
    orientation='horizontal',
    readout=True,
    readout_format='.1f',
)

In [5]:
print(type(wg.value))
wg.value = [0.,1.]
a = list(wg.value) # [wg.value[0], wg.value[1] ]
print(type(wg.value),type(a))

<class 'tuple'>
<class 'tuple'> <class 'list'>


In [6]:
wg

FloatRangeSlider(value=(0.0, 1.0), continuous_update=False, description='Test:', max=10.0, readout_format='.1f…

## Suggested Exercises

### Examining the SLIDING WINDOW approach

Any spectrogram or spectral slice is obtained by analyzing a long signal in successive short frames, a so-called sliding window approach.
This sliding window should be neither too short nor too long, and successive windows should not overlap too much.

The default parameters are shift=10msec, length=25msec, preemphasis=0.97.

- Describe what you observe when deviating from the defaults.
- For which parameters, and in what way, does the spectrogram deviate from speech perception?
- Choose 10, 30, 50 and 100 msec as frame_length. Which values would you describe as good, acceptable or not acceptable, and why?
- Choose 5, 10 and 20 msec as frame_shift. Which values would you describe as good, acceptable or not acceptable, and why?
- Have you discovered the Heisenberg principle for speech?
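
When comparing frame lengths, it can help to put numbers on the trade-off. The sketch below uses the rule of thumb that the frequency resolution of a frame is roughly the inverse of its duration; this is an order-of-magnitude approximation (the exact value depends on the window shape), and the helper name is hypothetical.

```python
def resolution(length_ms):
    """Return (time resolution in msec, approximate frequency resolution in Hz)."""
    # rule of thumb: frequency resolution ~ 1 / frame duration
    return length_ms, 1000.0 / length_ms

for length_ms in (10, 30, 50, 100):
    dt, df = resolution(length_ms)
    print(f"frame length {dt:4d} msec  ->  frequency resolution ~ {df:5.1f} Hz")
```

Longer frames resolve frequency more finely but smear events in time, and shorter frames do the opposite; you cannot have both at once.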

The interactive spectrogram visualizes speech in the time-frequency domain.
Some form of time-frequency analysis is the first processing step in the human auditory system, and equally so
in speech recognition systems.
 
A Fourier spectrogram is obtained by letting a sliding window compute short-time spectra; by viewing these in a 2D heatmap
we can see which frequencies are present at which moment in time.  
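
The sliding-window computation can be sketched in a few lines. This is a dependency-free illustration using a plain DFT and a Hamming window, not the pyspch implementation; in practice one would use `numpy.fft` or the routines in pyspch. The test signal, frame sizes and function names are assumptions for the example.

```python
import cmath
import math

def hamming(n_points):
    """Symmetric Hamming window of n_points samples."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_points - 1)) for n in range(n_points)]

def magnitude_spectrum(frame):
    """Magnitude of the DFT of one windowed frame (bins 0 .. N/2)."""
    n_points = len(frame)
    xw = [x * w for x, w in zip(frame, hamming(n_points))]
    return [
        abs(sum(xw[n] * cmath.exp(-2j * math.pi * k * n / n_points) for n in range(n_points)))
        for k in range(n_points // 2 + 1)
    ]

def spectrogram(signal, length, shift):
    """One magnitude spectrum per frame; frames that do not fit are dropped."""
    return [magnitude_spectrum(signal[s:s + length])
            for s in range(0, len(signal) - length + 1, shift)]

# 1 kHz sine sampled at 8 kHz: with 64-sample frames the energy sits at bin 1000 / 8000 * 64 = 8
sample_rate = 8000
signal = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(256)]
spec = spectrogram(signal, length=64, shift=32)
peak_bins = [frame.index(max(frame)) for frame in spec]
print(len(spec), peak_bins)
```

Each row of `spec` is one column of the heatmap; plotting the rows side by side (usually on a log scale) gives the familiar spectrogram picture.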

In this notebook we focus on elementary usage:

- as a popular time-frequency view of speech and audio signals in which we may discover the cognitively relevant elements (sounds, acoustic events, ... )
- where we want to create a visual impression that is consistent with our auditory perception.
 


### Instructions
- You should use
> Spd.iSpectrogram()   
which is equivalent to calling it with parameters:     
> Spd.iSpectrogram(figwidth=12, dpi=120)     

On most graphics cards and displays these parameters will work fine.
If on your display you observe a bad mismatch between character sizes in the UI and detail in the figures, you can try to modify the default settings.   
If sliders don't align well with plots,
you may also need to adjust the size of your window. 

#### Audio and Segmentation File Input
Suggested files to choose from     
(in root directory 'https://homes.esat.kuleuven.be/~spchlab/data/'):   
- misc/friendly.wav  ... a 1 second speech fragment
- misc/train.wav     ... a train whistle
- timit/audio/train/dr1/fcfj0/si1027.wav   ... an example sentence from the TIMIT corpus
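
If you want the full location of an example file, the relative names above can be joined to the root directory as in this small sketch. It only builds the URLs; the `example_url` helper is hypothetical and not part of pyspch.

```python
from urllib.parse import urljoin

ROOT = "https://homes.esat.kuleuven.be/~spchlab/data/"
EXAMPLES = [
    "misc/friendly.wav",
    "misc/train.wav",
    "timit/audio/train/dr1/fcfj0/si1027.wav",
]

def example_url(relative_path, root=ROOT):
    """Join a relative example name to the data root (the root must end with '/')."""
    return urljoin(root, relative_path)

for path in EXAMPLES:
    print(example_url(path))
```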

#### Segmentations
For the example speech files a number of segmentations are available (not all for each example). You can display them by entering the filename in the appropriate field.
They differ only in their extension: ".gra" for grapheme or letter,
".phn" for phone, ".syl" for syllable and ".wrd" for word.   
Be aware that the segmentations may look slightly different in the waveform plot vs. the spectrogram plot: segmentation boundaries given in msec are rounded to the sample level in the first plot and to the frame level in the second one. 
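
The effect of the two rounding levels can be illustrated with a small calculation. The 16 kHz sample rate, the 10 msec frame shift and the round-to-nearest convention are assumptions for this sketch; the exact rounding used by pyspch may differ.

```python
def boundary_to_sample(t_ms, sample_rate=16000):
    """Round a boundary in msec to the nearest sample index."""
    return int(round(t_ms * sample_rate / 1000.0))

def boundary_to_frame(t_ms, shift_ms=10.0):
    """Round a boundary in msec to the nearest frame index."""
    return int(round(t_ms / shift_ms))

t_ms = 123.0
sample = boundary_to_sample(t_ms)          # position used in the waveform plot
frame = boundary_to_frame(t_ms)            # position used in the spectrogram plot
print(sample, sample * 1000.0 / 16000)     # sample index and its time in msec
print(frame, frame * 10.0)                 # frame index and its time in msec
```

Here the same boundary lands at 123 msec in the waveform plot but at 120 msec in the spectrogram, a shift of up to half a frame shift that explains the small visual mismatch.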


## Exercise 1: Phonetic Segmentations

1. setting up:
    + work with iSpectrogram()
    + load friendly.wav and load also the phonetic segmentation in friendly.phn (or graphemic friendly.gra)
    + set your audio at a comfortable loudness when you play the sentence
2. focus on the first word 'friendly', evaluate the segmentations, listen and comment
    + 'f-r-ie-n-d-l-iy'
    + 'ie-n-d-l-iy' 
    + 'f' and 'f-r'
    + what was your most striking observation ?
    + to what extent do you agree with the given segmentation, based on perception, based on time waveform and based on spectrogram ?

## Exercise 2: Spectrogram Parameters

1. setting up:
    + work with iSpectrogram()
    + load again misc/friendly.wav with its phonetic transcript (or some other speech wav file)
    + use sliders to select small segments and listen to them to verify the phonetic transcript
2. 
