# Interactive Spectrograms

+ Author: Dirk Van Compernolle   
+ History:   
    - 5/10/2021: Created
    - 25/04/2022: upgrade to pyspch>=0.6
+ Requires: 
    - pyspch>=0.6

In [None]:
# uncomment the pip install command to install pyspch -- it is required!
#
#!pip install git+https://github.com/compi1234/pyspch.git
#
try:
    import pyspch
except ModuleNotFoundError:
    # pyspch is missing: explain how to install it, then re-raise the error
    print(
        """
        To enable this notebook on platforms such as Google Colab,
        install the pyspch package and its dependencies by running the following code:

        !pip install git+https://github.com/compi1234/pyspch.git
        """
    )
    raise

In [2]:
# Do the imports #
##################
#
%matplotlib inline
import os,sys 
import numpy as np
import pandas as pd
from IPython.display import display, Audio, HTML
#   
import pyspch.sp as Sps
import pyspch.core as Spch
import pyspch.display as Spd

import matplotlib.pyplot as plt



# make notebook cells stretch over the full screen
display(HTML(data="""
<style>
    div#notebook-container    { width: 95%; }
    div#menubar-container     { width: 65%; }
    div#maintoolbar-container { width: 99%; }
</style>
"""))

### Purpose and Background
The interactive spectrogram visualizes speech in the time-frequency domain.
Some form of time-frequency analysis is the first processing step in the human auditory system and equally so
in speech recognition systems.

Possible spectral representations are:   
**1. Fourier spectrogram**  
  A Fourier spectrogram is obtained by letting a sliding window compute short-time spectra; viewing these in a 2D heatmap
shows which frequencies are present at which moment in time.  
**2. mel spectrogram**  
  The mel spectrogram applies warping on the frequency axis in line with the human auditory system.
Roughly speaking the frequency axis is linear below 1kHz and logarithmically compressed above it.
Today this is the most popular feature representation for speech recognition.  
**3. MFCCs (mel frequency cepstral coefficients)**  
  Mel frequency cepstral coefficients are obtained by applying a DCT (discrete cosine transform) to the log mel spectrum, optionally followed by truncation
to a handful of coefficients. 
MFCCs are popular because almost all information is concentrated in a handful of low-order coefficients, making them a very
compact speech representation. Moreover, MFCCs are largely uncorrelated, making them well suited for mathematical modeling.
While MFCCs have little to offer when abundant data and compute power are available (as is common these days),
they are still interesting in compact systems.
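
The sliding-window analysis and the cepstral transform can be sketched in a few lines. This is an illustration only, not the pyspch implementation: the function names (`frames`, `power_spectrum`, `spectrogram`, `dct_ii`) are hypothetical, the Hamming window is an assumed choice, and a naive DFT from the standard library replaces the FFT one would normally use. For brevity the DCT-II is applied to a log Fourier spectrum; MFCCs apply the same step to log mel filterbank energies.

```python
import cmath
import math


def frames(signal, length, shift):
    """Cut a signal into overlapping frames of `length` samples, `shift` samples apart."""
    n = 1 + (len(signal) - length) // shift
    return [signal[i * shift : i * shift + length] for i in range(max(n, 0))]


def power_spectrum(frame):
    """Hamming-windowed power spectrum of one frame (naive O(N^2) DFT, bins 0..N/2)."""
    n = len(frame)
    win = [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    x = [s * w for s, w in zip(frame, win)]
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
        for k in range(n // 2 + 1)
    ]


def spectrogram(signal, fs, length_ms=30, shift_ms=10):
    """List of short-time power spectra, one per frame."""
    length = round(fs * length_ms / 1000)
    shift = round(fs * shift_ms / 1000)
    return [power_spectrum(f) for f in frames(signal, length, shift)]


def dct_ii(values, n_coef):
    """First `n_coef` coefficients of the unnormalized DCT-II of `values`."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
        for k in range(n_coef)
    ]


# Synthetic test signal: 100 ms of a 1 kHz tone sampled at 8 kHz
fs = 8000
tone = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(fs // 10)]

spg = spectrogram(tone, fs)                      # frames x frequency bins
n_frames = len(spg)
frame_len = round(fs * 30 / 1000)
peak_bin = max(range(len(spg[0])), key=lambda k: spg[0][k])
peak_hz = peak_bin * fs / frame_len              # frequency of the strongest bin

# Truncated cepstrum of the first frame (small offset avoids log(0))
cep = dct_ii([math.log(p + 1e-12) for p in spg[0]], 13)
print(n_frames, peak_hz, len(cep))
```

A flat log spectrum has all its energy in coefficient 0 of the DCT-II, which is one way to see why smooth spectral envelopes are captured by a few low-order coefficients.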
 


### Instructions
- Two slightly different versions of the interactive spectrogram are available:  
> pyspch.interactive.iSpectrogram()   
or   
> pyspch.interactive.iSpectrogram2()      
- In most situations it is fine to call these without any parameters. However, depending on your screen size and resolution, some parts of the display may not align well. If that is the case, first try to adjust the size of the window in which you are running Jupyter; otherwise try to adjust the figwidth parameter (default = 10).


#### File Input
Suggested files to choose from ('https://homes.esat.kuleuven.be/~spchlab/data/'):
- misc/friendly.wav  ... a 1 second speech fragment
- misc/train.wav     ... a train whistle
- timit/audio/train/dr1/fcfj0/si1027.wav   ... an example sentence from the TIMIT corpus
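
If you want to inspect one of these files outside the widget, the sketch below may help. It assumes that the listed names are relative to the URL given above and that the files are plain PCM WAV; neither is confirmed here. The helpers `full_url` and `wav_duration` are hypothetical names, and the demo runs offline on a synthetic file rather than downloading anything.

```python
import io
import wave
from urllib.parse import urljoin

# Assumption: the suggested file names are relative to this root
ROOT = "https://homes.esat.kuleuven.be/~spchlab/data/"


def full_url(name, root=ROOT):
    """Resolve a file name relative to the data root."""
    return urljoin(root, name)


def wav_duration(data):
    """Duration in seconds of PCM WAV data held in a bytes object."""
    with wave.open(io.BytesIO(data), "rb") as w:
        return w.getnframes() / w.getframerate()


# Offline demo: a synthetic 1 s, 16 kHz, 16-bit mono file of silence
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(16000)
    w.writeframes(b"\x00\x00" * 16000)
demo_bytes = buf.getvalue()

print(full_url("misc/friendly.wav"))
print(wav_duration(demo_bytes))

# To try a real file (requires network access):
# from urllib.request import urlopen
# print(wav_duration(urlopen(full_url("misc/friendly.wav")).read()))
```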

#### Segmentations
For the example speech files a number of segmentations are available (not all for each example). You can display them by entering the filename in the appropriate field.
The segmentation files differ only in their extension: ".gra" for grapheme (letter),
".phn" for phone, ".syl" for syllable and ".wrd" for word.   
Be aware that the segmentations may look slightly different in the waveform plot vs. the spectrogram plot: segmentations given in msec are rounded to the sample level in the first plot and to the frame level in the second one.
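
The size of this rounding effect is easy to estimate. The sketch below assumes a 16 kHz sampling rate, a 10 msec frame shift and round-to-nearest in both cases; the exact convention used by the plotting code may differ, and `ms_to_sample` and `ms_to_frame` are hypothetical helper names.

```python
def ms_to_sample(t_ms, fs):
    """Nearest sample index for a time given in msec."""
    return round(t_ms * fs / 1000)


def ms_to_frame(t_ms, shift_ms=10):
    """Nearest frame index for a time given in msec (assumed frame shift of 10 msec)."""
    return round(t_ms / shift_ms)


fs = 16000            # assumed sampling rate
shift_ms = 10         # assumed frame shift
boundary_ms = 123.0   # example segment boundary

sample = ms_to_sample(boundary_ms, fs)
frame = ms_to_frame(boundary_ms, shift_ms)

t_waveform_ms = 1000 * sample / fs    # where the boundary lands in the waveform plot
t_spectrogram_ms = frame * shift_ms   # where it lands in the spectrogram plot
print(sample, frame, t_waveform_ms, t_spectrogram_ms)
```

With these assumptions the boundary stays at 123 msec in the waveform but moves to 120 msec in the spectrogram; in general the frame-level error can be up to half a frame shift.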

#### Visualization details
Normally you shouldn't have to worry about these settings. On most displays visualization will be fine for screen/window sizes on the order of 10-24 inches. If on your display you observe a bad mismatch between character sizes in the UI and
in the figures, then you can try to modify the default settings.   
If sliders don't align well with plots,
you may also need to adjust the size of your window.   
In all cases you can change the figure width (default = 12, in inches) in the call to Spg1/Spg2
> Spd.Spg1(figwidth=14, dpi=120)

### Exercise 1: Phonetic Segmentations

1. setting up:
    + work with iSpectrogram()
    + load misc/friendly.wav and load also the phonetic segmentation in misc/friendly.phn (or graphemic misc/friendly.gra)
    + set your audio at a comfortable loudness when you play the sentence
2. focus on the first word 'friendly', evaluate the segmentations, listen and comment
    + 'f-r-ih-n-d-l-iy'
    + 'ih-n-d-l-iy' 
    + 'f' and 'f-r'
    + what was your most striking observation ?
    + to what extent do you agree with the given segmentation, based on perception, based on time waveform and based on spectrogram ?

### Exercise 2: Spectrogram Parameters

1. setting up:
    + work with iSpectrogram()
    + load again misc/friendly.wav with its phonetic transcript (or some other speech wavfile)
2. adjust different spectrogram settings, always start from defaults (shift=10msec, length=30msec, preemphasis=.97)
    + describe what you observed when deviating from the defaults
    + for what parameters and in what way does the spectrogram deviate from speech perception ?
    + choose as frame_length 10, 30, 50, 100 msec. Which values would you describe as good, acceptable, not acceptable and why ?
    + choose as frame_shift 5, 10, 20 msec. Which values would you describe as good, acceptable, not acceptable and why ?
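
To put numbers on these settings before experimenting, the following sketch tabulates a few quantities per frame length. It assumes a 1 second signal at 16 kHz and frames that must fit entirely inside the signal; the widget may pad the signal and therefore show a slightly different frame count. `frame_stats` is a hypothetical helper, and the DFT bin spacing is only a rough proxy for frequency resolution.

```python
def frame_stats(duration_s, fs, length_ms, shift_ms):
    """Basic bookkeeping for a sliding-window analysis."""
    length = round(fs * length_ms / 1000)     # samples per frame
    shift = round(fs * shift_ms / 1000)       # samples between frame starts
    n_samples = round(duration_s * fs)
    n_frames = max(0, 1 + (n_samples - length) // shift)
    return {
        "samples_per_frame": length,
        "n_frames": n_frames,
        "bin_spacing_hz": fs / length,        # DFT bin spacing without zero padding
        "overlap": 1 - shift / length,        # fraction shared by consecutive frames
    }


fs = 16000  # assumed sampling rate
for length_ms in (10, 30, 50, 100):
    print(length_ms, frame_stats(1.0, fs, length_ms, shift_ms=10))

default = frame_stats(1.0, fs, 30, 10)
longest = frame_stats(1.0, fs, 100, 10)
```

Comparing the bin spacing with typical pitch values, and the frame length with the duration of short phones, gives a starting point for the good/acceptable/not acceptable judgment.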




In [5]:
Spd.iSpectrogram(figwidth=12)                 


### Exercise 3: Mel Spectrogram

##### Setup
+ work with Spd.iSpectrogram2()
+ load any speech waveform (suggestions: misc/expansionist.wav, misc/friendly.wav)
+ for background on the mel scale, see the [mel_scale.ipynb notebook](mel_scale.ipynb) in this folder; use the plot in [mel_scale](mel_scale.png) as a mapping reference
+ add the mel spectrogram to your view (using default parameters, i.e. 80 bands)

##### Questions
+ How does the regular (Fourier) spectrogram compare with the mel spectrogram ?
+ How does the mel spectrogram change visually when you reduce the number of bands to 24 or lower ?
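
As a numerical aid for these questions, the sketch below computes where the centers of the mel bands fall on the Hz axis. It uses one common mel formula, mel = 2595 log10(1 + f/700); pyspch may use a different variant, so treat the values as indicative. The 0-8000 Hz range (16 kHz sampling) and the names `hz_to_mel`, `mel_to_hz` and `band_centers` are assumptions made for this example.

```python
import math


def hz_to_mel(f):
    """Hz to mel, using the common 2595*log10(1 + f/700) formula (an assumed variant)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def band_centers(n_bands, f_low=0.0, f_high=8000.0):
    """Center frequencies (Hz) of n_bands filters spaced uniformly on the mel axis."""
    m_low, m_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (m_high - m_low) / (n_bands + 1)
    return [mel_to_hz(m_low + (i + 1) * step) for i in range(n_bands)]


for n in (80, 24):
    c = band_centers(n)
    # spacing between neighbouring centers at the low and at the high end of the axis
    print(n, round(c[1] - c[0]), round(c[-1] - c[-2]))
```

The spacing between neighbouring bands grows with frequency, and with fewer bands every band covers a wider range, which is what you should recognize in the display.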

### Exercise 4: Spectrogram: pitch and formants


1. setting up:
    + work with Spd.iSpectrogram2()
    + load any speech waveform  (suggestions: misc/expansionist.wav, misc/friendly.wav)
    + use default spectrogram parameters   
    
    
2. Find pitch and formants in time and/or frequency domain
    + put the range cursor in the middle of a vowel
    + find pitch in three ways: time waveform, spectral slice, spectrogram: are your values consistent ?
    + could you determine gender from the obtained pitch values ?
    + find vowel identity by finding first and second formant and then looking up in formant tables   
    
    
3. Pitch and formants in the mel spectrum
    + for background on the mel scale, see the [mel_scale.ipynb notebook](mel_scale.ipynb) in this folder; use the plot in [mel_scale](mel_scale.png) as a mapping reference
    + add the mel spectrogram (and mel spectrum slice) to your view
    + find the formants in the mel spectrum both for low resolution (nb=20-30) and high resolution (>80) mel filterbanks
    + in which representation is finding formants easiest ?
    + try to map formant frequencies to mel scale and filterbank channel (hint use a number of melbands equal to 30, 60 or 90 and 

In [9]:
Spd.iSpectrogram2(figwidth=12)     
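
To cross-check the pitch values read from the plots in Exercise 4, a simple time-domain estimate can be computed from a short vowel segment. The sketch below is a minimal normalized-autocorrelation estimator, not a method provided by pyspch; `estimate_f0` is a hypothetical name, the 75-400 Hz search range is an assumption, and the input is a synthetic harmonic signal instead of real speech.

```python
import math


def estimate_f0(x, fs, f_min=75.0, f_max=400.0):
    """Estimate pitch (Hz) as the lag with the highest normalized autocorrelation."""
    best_lag, best_r = None, -1.0
    for lag in range(int(fs / f_max), int(fs / f_min) + 1):
        a, b = x[:-lag], x[lag:]              # the segment and its shifted copy
        num = sum(p * q for p, q in zip(a, b))
        den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
        r = num / den if den > 0 else 0.0
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag


# Synthetic "vowel": 30 msec with a 125 Hz fundamental and 5 harmonics, sampled at 16 kHz
fs = 16000
f0_true = 125.0
x = [
    sum(math.sin(2 * math.pi * f0_true * k * t / fs) / k for k in range(1, 6))
    for t in range(480)
]

f0 = estimate_f0(x, fs)
print(f0)
```

The selected lag corresponds to the period you can measure between repeating peaks in the time waveform, and its inverse to the spacing between harmonics in the spectral slice.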
