# Spectrograms - Basic Usage

+ ###### Author: Dirk Van Compernolle   
+ ###### Modification History: 5/10/2021, 01/02/2023
+ ###### Requires:  pyspch>=0.7   

In [1]:
# uncomment the pip install command to install pyspch -- it is required!
#
#!pip install git+https://github.com/compi1234/pyspch.git
#
try:
    import pyspch
except ModuleNotFoundError:
    # pyspch is missing: print the install instructions for the user
    print(
    """
    To enable this notebook on platforms such as Google Colab,
    install the pyspch package and its dependencies by running the following code:

    !pip install git+https://github.com/compi1234/pyspch.git
    """
    )

In [2]:
# Do the imports #
##################
#
%matplotlib inline
import os,sys 
import numpy as np
import pandas as pd
from IPython.display import display, Audio, HTML
#   
import pyspch.sp as Sps
import pyspch.core as Spch
import pyspch.display as Spd

import matplotlib.pyplot as plt



# make notebook cells stretch over the full screen
display(HTML(data="""
<style>
    div#notebook-container    { width: 95%; }
    div#menubar-container     { width: 65%; }
    div#maintoolbar-container { width: 99%; }
</style>
"""))

### Purpose and Background
The interactive spectrogram visualizes speech in the time-frequency domain.
Some form of time-frequency analysis is the first processing step in the human auditory system and equally so
in speech recognition systems.
 
A Fourier spectrogram is obtained by sliding a window over the signal and computing a short-time spectrum at each window position.
Viewed as a 2D heatmap, it shows which frequencies are present at which moment in time.  
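
The sliding-window idea can be sketched in plain NumPy. This is a minimal illustration, not the pyspch implementation: the function name `spectrogram_db`, the Hamming window and the test tone are assumptions made for this example, and the default values mirror the spectrogram defaults quoted in Exercise 2.

```python
import numpy as np

def spectrogram_db(signal, sample_rate, frame_length=0.025,
                   frame_shift=0.010, preemphasis=0.97):
    """Hypothetical helper: log-power spectrogram, shape (n_freq_bins, n_frames)."""
    # first-order pre-emphasis: y[n] = x[n] - preemphasis * x[n-1]
    emphasized = np.append(signal[0], signal[1:] - preemphasis * signal[:-1])
    n_length = int(round(frame_length * sample_rate))   # samples per frame
    n_shift = int(round(frame_shift * sample_rate))     # samples between frame starts
    n_frames = 1 + (len(emphasized) - n_length) // n_shift
    window = np.hamming(n_length)                        # assumed window shape
    # cut the signal into overlapping, windowed frames
    frames = np.stack([emphasized[i * n_shift: i * n_shift + n_length] * window
                       for i in range(n_frames)])
    # one short-time power spectrum per frame
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # convert to dB and put frequency on the vertical axis
    return 10.0 * np.log10(power + 1e-10).T

# synthetic test signal: a 1 second, 1000 Hz tone sampled at 16 kHz
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

spg = spectrogram_db(tone, sample_rate)
n_fft = 2 * (spg.shape[0] - 1)                 # frame length in samples
peak_bin = int(np.argmax(spg.mean(axis=1)))    # strongest frequency bin on average
peak_hz = peak_bin * sample_rate / n_fft
print(spg.shape, peak_hz)
```

Passing `spg` to `plt.imshow(spg, origin='lower', aspect='auto')` would give a static heatmap of the kind the interactive tool displays.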

In this notebook we focus on elementary usage of the spectrogram:

- as a popular time-frequency view of speech and audio signals in which we may discover the cognitively relevant elements (sounds, acoustic events, ...)
- as a view in which we want to create a visual impression that is consistent with our auditory perception.
 


### Instructions
- You should use
> Spd.iSpectrogram()   
which is equivalent to calling it with the default parameters:     
> Spd.iSpectrogram(figwidth=12, dpi=120)     

On most graphics cards and displays these parameters will work fine.
If you observe a bad mismatch between the character sizes in the UI and the detail in the figures, try modifying the default settings.   
If the sliders do not align well with the plots,
you may also need to adjust the size of your window. 

#### Audio and Segmentation File Input
Suggested Files to choose from:     
(in root directory 'https://homes.esat.kuleuven.be/~spchlab/data/'):   
- misc/friendly.wav  ... a 1 second speech fragment
- misc/train.wav     ... a train whistle
- timit/audio/train/dr1/fcfj0/si1027.wav   ... an example sentence from the TIMIT corpus

#### Segmentations
For the example speech files a number of segmentations are available (not all of them for every example). You can display one by entering its filename in the appropriate field.
The segmentation files differ only in their extension: ".gra" for grapheme (letter),
".phn" for phone, ".syl" for syllable and ".wrd" for word.   
Be aware that segment boundaries may look slightly different in the waveform plot and in the spectrogram plot: boundaries given in milliseconds are rounded to the sample level in the first plot and to the frame level in the second. 
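
The size of this rounding effect can be estimated with a small calculation. The sketch below assumes a 16 kHz sample rate, a 10 msec frame shift and round-to-nearest behaviour; the helper names are hypothetical and the rounding actually used by pyspch may differ.

```python
def boundary_on_sample_grid(t_ms, sample_rate):
    """Snap a boundary in msec to the nearest sample (hypothetical helper)."""
    n = round(t_ms * sample_rate / 1000.0)      # nearest sample index
    return n * 1000.0 / sample_rate             # back to msec

def boundary_on_frame_grid(t_ms, frame_shift_ms):
    """Snap a boundary in msec to the nearest frame (hypothetical helper)."""
    return round(t_ms / frame_shift_ms) * frame_shift_ms

boundary_ms = 123.4                              # made-up segment boundary
on_samples = boundary_on_sample_grid(boundary_ms, sample_rate=16000)
on_frames = boundary_on_frame_grid(boundary_ms, frame_shift_ms=10.0)
print(on_samples, on_frames)
```

Under these assumptions the sample grid moves a boundary by at most about 0.03 msec, while the frame grid can move it by up to 5 msec, which is enough to be visible in the plots.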


## Exercise 1: Phonetic Segmentations

1. setting up:
    + work with iSpectrogram()
    + load misc/friendly.wav and load also the phonetic segmentation in misc/friendly.phn (or graphemic misc/friendly.gra)
    + set your audio at a comfortable loudness when you play the sentence
2. focus on the first word 'friendly', evaluate the segmentations, listen and comment
    + 'f-r-ih-n-d-l-iy'
    + 'ih-n-d-l-iy' 
    + 'f' and 'f-r'
    + what was your most striking observation?
    + to what extent do you agree with the given segmentation, based on perception, on the time waveform and on the spectrogram?

## Exercise 2: Spectrogram Parameters

1. setting up:
    + work with iSpectrogram()
    + load misc/friendly.wav again with its phonetic transcript (or some other speech wav file)
    + use sliders to select small segments and listen to them to verify the phonetic transcript
2. adjust spectrogram parameters, always starting from the defaults (shift=10msec, length=25msec, preemphasis=.97)
    + describe what you observed when deviating from the defaults
    + for what parameters and in what way does the spectrogram deviate from speech perception?
    + choose as frame_length 10, 30, 50, 100 msec. Which values would you describe as good, acceptable, not acceptable and why?
    + choose as frame_shift 5, 10, 20 msec. Which values would you describe as good, acceptable, not acceptable and why?
    + Have you discovered the Heisenberg principle for speech?
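
As a numerical companion to the last question, the sketch below tabulates the frame lengths from the exercise against a rough frequency resolution of 1/T for a frame of T seconds. The 16 kHz sample rate is an assumption, and the window shape changes the constant (a tapered window resolves less finely than 1/T), so treat the numbers as orders of magnitude only.

```python
sample_rate = 16000                              # assumed sample rate in Hz

tradeoff = {}
for frame_length_ms in (10, 30, 50, 100):
    n_samples = sample_rate * frame_length_ms // 1000    # samples per frame
    resolution_hz = 1000.0 / frame_length_ms             # roughly 1/T
    tradeoff[frame_length_ms] = (n_samples, resolution_hz)
    print(f"{frame_length_ms:4d} msec  {n_samples:5d} samples  ~{resolution_hz:6.1f} Hz")
```

The product of frame duration and frequency resolution stays constant, so sharper detail in time is paid for with blurrier detail in frequency and vice versa.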

In [3]:
Spd.iSpectrogram(figwidth=12)                 
