# Interactive Spectrograms

Date: 24/02/2022 (v0.6)

In [1]:
# Do the imports #
##################
#
%matplotlib inline
import os, sys
import numpy as np
import pandas as pd
from IPython.display import display, Audio, HTML, clear_output
import ipywidgets as widgets
from ipywidgets import HBox, VBox, Layout
import librosa
#
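# detect whether the notebook runs on Google Colab
try:
  import google.colab
  IN_COLAB = True
except ImportError:
  IN_COLAB = False

# install pyspch on the fly if it is not available yet
try:
  import pyspch
except ImportError:
  ! pip install git+https://github.com/compi1234/pyspch.git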

import pyspch.display as Spd 
# make notebook cells stretch over the full screen
display(HTML(data="""
<style>
    div#notebook-container    { width: 95%; }
    div#menubar-container     { width: 65%; }
    div#maintoolbar-container { width: 99%; }
</style>
"""))

### Purpose and Background
The interactive spectrogram visualizes speech in the time-frequency domain.
Some form of time-frequency analysis is the first processing step in the human auditory system, and equally so
in speech recognition systems.

Possible spectral representations are:   
**1. Fourier spectrogram**  
  A Fourier spectrogram is obtained by sliding a window over the signal and computing a short-time spectrum at each position. Viewed as a 2D heatmap,
it shows which frequencies are present at which moment in time.  
**2. Mel spectrogram**  
  The mel spectrogram warps the frequency axis in line with the human auditory system.
Roughly speaking, the frequency axis is linear below 1 kHz and logarithmically compressed above it.
Today this is the most popular feature representation for speech recognition.  
**3. Cepstra or MFCCs (mel frequency cepstral coefficients)**    
If the mel spectrogram box is checked, mel cepstra are computed; otherwise regular cepstra are computed.   
Mel frequency cepstral coefficients are obtained by applying a DCT to the log mel spectrum, optionally followed by truncation
to a handful of coefficients. 
MFCCs are popular because almost all information is concentrated in a handful of low-order coefficients, making them a very
compact speech representation.  Moreover, MFCCs are largely uncorrelated, which makes them well suited for mathematical modeling.
While MFCCs have little to offer when abundant data and compute power are available (as is common these days),
they are still interesting in compact systems. 
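
The three representations above can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the pyspch implementation: the 25 ms Hamming window, 10 ms frame shift, 512-point FFT, 24 mel bands, 13 cepstral coefficients, the mel formula `2595 * log10(1 + f/700)` and all function names are choices made for this example only.

```python
import numpy as np

def stft_power(x, sr, frame_len=0.025, frame_shift=0.010, n_fft=512):
    """Short-time power spectra, shape (n_frames, n_fft//2 + 1)."""
    win_len = int(round(frame_len * sr))   # assumed 25 ms analysis window
    hop = int(round(frame_shift * sr))     # assumed 10 ms frame shift
    window = np.hamming(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop: i * hop + win_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2

def hz_to_mel(f):
    # one common mel formula (an assumption; other variants exist)
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels=24):
    """Triangular filters equally spaced on the mel axis, shape (n_mels, n_fft//2 + 1)."""
    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2), n_mels + 2))
    fft_hz = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    fb = np.zeros((n_mels, len(fft_hz)))
    for m in range(n_mels):
        lo, mid, hi = edges_hz[m: m + 3]
        rising = (fft_hz - lo) / (mid - lo)
        falling = (hi - fft_hz) / (hi - mid)
        fb[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def dct2(x, n_out):
    """Unnormalised DCT-II along the last axis, keeping the first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    basis = np.cos(np.pi * k * (np.arange(n) + 0.5) / n)   # shape (n_out, n)
    return x @ basis.T

# synthetic test signal: one second of a 1 kHz tone sampled at 16 kHz
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 1000.0 * t)

power = stft_power(x, sr)                          # 1. Fourier spectrogram (power)
log_spec = 10 * np.log10(power + 1e-10)            #    in dB, as usually displayed
mel_spec = power @ mel_filterbank(sr, 512).T       # 2. mel spectrogram
log_mel = 10 * np.log10(mel_spec + 1e-10)
mfcc = dct2(log_mel, 13)                           # 3. MFCCs: truncated DCT of the log mel spectrum

print(power.shape, mel_spec.shape, mfcc.shape)
```

With these settings each FFT bin spans 16000 / 512 = 31.25 Hz, so the 1 kHz tone peaks in bin 32 of every frame. Truncating the DCT to 13 coefficients reduces each frame from 257 spectral values to 13 numbers, which is the compactness argument made above.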
 


### Instructions
Various interactive spectrogram viewers are included in the module pyspch.display.interactive.
- In default mode, you start an interactive spectrogram by calling it without any parameters:
> Spg1()   
> .. or   
> Spg2()

You may need to call the interactive spectrogram routines with different parameters that better suit your display:
- size is a percentage of the maximum display size that is possible with your current notebook setup
- the dpi parameter controls the resolution of the plot and, to some extent, the size of the plot relative to the controls


#### File Input
Suggested files to choose from (within the default root directory 'https://homes.esat.kuleuven.be/~spchlab/data/'):
- misc/friendly.wav  ... a 1 second speech fragment
- misc/train.wav     ... a train whistle
- timit/si1027.wav   ... an example sentence from the TIMIT corpus

#### Segmentations
For the example speech files, a number of segmentations are available (not all for each example). You can display one by entering its filename in the appropriate field.
The segmentation files differ only in their extension: ".gra" for grapheme (letter),
".phn" for phone, ".syl" for syllable and ".wrd" for word.

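As a small illustration of the naming convention above, the sketch below derives candidate segmentation filenames from an audio filename. The helper `segmentation_names` is hypothetical (it is not part of pyspch), and it assumes that a segmentation file shares the stem of its audio file; as noted, not every candidate exists for every example.

```python
import posixpath

# extensions as listed in the text above
SEG_EXTENSIONS = {
    'grapheme': '.gra',
    'phone': '.phn',
    'syllable': '.syl',
    'word': '.wrd',
}

def segmentation_names(wav_name):
    """Hypothetical helper: candidate segmentation filenames for an audio file.

    Assumes the segmentation files share the stem of the audio file.
    """
    stem, _ = posixpath.splitext(wav_name)
    return {tier: stem + ext for tier, ext in SEG_EXTENSIONS.items()}

names = segmentation_names('timit/si1027.wav')
for tier, name in names.items():
    print(f'{tier:9s} {name}')
```

Any of the printed names can then be tried in the segmentation field of the viewer.
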
#### Visualization details
Normally you shouldn't have to worry about these settings.  On most displays, visualization will be fine for screen or window sizes on the order of 10 to 24 inches.  If you observe a bad mismatch between character sizes in the UI and
in the figures, you can try modifying the default settings.   
If sliders don't align well with the plots,
you may also need to adjust the size of your window.   
In all cases you can change the figure width (default: 12 inches) in the call to Spg1 or Spg2:
> Spg1(figwidth=14, dpi=120)

### Spg1() shows a traditional spectrogram with options for segmental zooming

In [3]:
Spd.Spg1(figwidth=10)


### Spg2() shows a spectrogram in combination with spectral slice selection

In [4]:
Spd.Spg2(figwidth=10)                 
