<a href="https://colab.research.google.com/github/compi1234/spchlab/blob/main/lab02_spectrogram/SpectrogramGUI.ipynb" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Google Colab" title="Open in Google Colab"></a> 
# Spectrogram GUI
This notebook is a wrapper around a Spectrogram GUI.
It includes a list of suggested exercises to run with this GUI.

In [1]:
# Uncomment the pip install command to install pyspch -- it is required!
#!pip install git+https://github.com/compi1234/pyspch.git
try:
    import pyspch
    print("pyspch was found - you are all set to continue")
except ModuleNotFoundError:
    # print() cannot raise ModuleNotFoundError, so no nested try/except is needed here
    print(
        """
        WARNING: pyspch was not found !!
        To enable this notebook on platforms such as Google Colab,
        install the pyspch package and its dependencies by running the following code:

        !pip install git+https://github.com/compi1234/pyspch.git
        """
    )

pyspch was found - you are all set to continue


In [2]:
# Do the imports #
##################
#
%matplotlib inline
import os,sys 
import numpy as np
import pandas as pd
from IPython.display import display, Audio, HTML
import matplotlib.pyplot as plt
#   
import pyspch.sp as Sps
import pyspch.core as Spch
import pyspch.display as Spd
# make notebook cells stretch over the full screen
display(HTML(data="""
<style>
    div#notebook-container    { width: 95%; }
    div#menubar-container     { width: 65%; }
    div#maintoolbar-container { width: 99%; }
</style>
"""))

## Spectrogram GUI

**iSpectrogram** is a GUI that lets you visualize several spectral representations: the Fourier spectrum, the mel spectrum, the cepstrum and the mel cepstrum.
The parameters of the spectral analysis can be modified interactively.

### Parameters to be passed when starting the GUI
The only parameters that you are likely to need are:
- type (int)
    + type=1:  standard spectrogram
    + type=2:  spectrogram with a spectral slice plot in the right-hand figure; a frame slider controls its position
- root (None or str; the default None is equivalent to "pkg_resources_data"):  root directory in which the GUI looks for sampled data and segmentation files

### GUI controls
- frame shift (in msec): default=10.0
- frame length (in msec): default=25.0
- preemphasis: default=0.97 (range: 0.0 - 1.0)
- mel spectrogram view: checkbox
- cepstral view: checkbox  (if the mel spectrogram box is checked, MFCCs are shown; otherwise the cepstrum obtained from the Fourier spectrum is shown)
- number of mel bands: default=80
- number of cepstral coefficients: default=12
- wavFile: name of waveform file (relative to root)
- Range: start and end time of the waveform selection
- segFile: name of segmentation file (relative to root)
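
The frame shift and frame length are specified in msec, while the analysis itself operates on samples. The following is a minimal sketch of that conversion and of a common first-order pre-emphasis filter, y[n] = x[n] - a x[n-1]. It assumes a 16 kHz sampling rate, and the helper names `msec_to_samples` and `preemphasize` are hypothetical illustrations, not part of pyspch; the GUI may treat details such as the first sample differently.

```python
import numpy as np

def msec_to_samples(duration_msec, sample_rate):
    """Convert a duration in milliseconds to a whole number of samples."""
    return int(round(duration_msec * sample_rate / 1000.0))

def preemphasize(signal, coeff=0.97):
    """First-order pre-emphasis: y[n] = x[n] - coeff * x[n-1], with y[0] = x[0]."""
    signal = np.asarray(signal, dtype=float)
    return np.append(signal[:1], signal[1:] - coeff * signal[:-1])

sample_rate = 16000                          # assumed sampling rate in Hz
shift = msec_to_samples(10.0, sample_rate)   # default frame shift in samples
length = msec_to_samples(25.0, sample_rate)  # default frame length in samples
print(shift, length)

# A constant signal is almost removed by pre-emphasis, which boosts high frequencies
print(preemphasize([1.0, 1.0, 1.0]))
```

With a coefficient of 0.0 the signal is left unchanged; values close to 1.0 suppress the low frequencies most strongly.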

*KNOWN LIMITATIONS*:
- For certain settings the slider may not align perfectly with the spectrogram
- Interaction can be sluggish when dragging the slider; it is better to click the intended location
- After changing a text field, put your cursor at the end of the text and hit Enter to signal your modification to the GUI!
- The segmentation filename field is not cleared when the wavFile is updated. Internally, however, the segmentation information is cleared, so the GUI field shows a stale segmentation filename while the plot shows no segmentation

## Demo Materials 

#### Audio Files
Suggested files to choose from (included in the demo directory of pyspch):
- demo/friendly.wav       ... a 1 second speech fragment (used in figures in the course notes)
- demo/male1.wav          ... a long sentence spoken by a male person
- demo/female1.wav        ... a long sentence spoken by a female person
- demo/train.wav          ... a train whistle
- demo/timit_f1_sa1.wav   ... an example sentence from the TIMIT corpus

#### Segmentations
For the example speech files a number of segmentations are available (not every segmentation exists for every example). They have the same name as the .wav file, but a different extension:
".gra" for grapheme or letter, ".phn" for phone and ".wrd" for word
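
The naming convention above can be sketched in a few lines of Python. The helper `segmentation_candidates` is a hypothetical illustration and not a pyspch function; it only builds candidate names and does not check whether a given segmentation file actually exists.

```python
import os

# Extensions as listed above: grapheme, phone and word segmentations
SEG_EXTENSIONS = {"grapheme": ".gra", "phone": ".phn", "word": ".wrd"}

def segmentation_candidates(wav_name):
    """Return candidate segmentation file names for a given .wav file name."""
    base, _ = os.path.splitext(wav_name)
    return {level: base + ext for level, ext in SEG_EXTENSIONS.items()}

candidates = segmentation_candidates("demo/friendly.wav")
for level, name in candidates.items():
    print(f"{level:9s} {name}")
```

Any of the resulting names can be typed into the segFile field, provided that segmentation is available for the chosen example.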

In [3]:
# The root argument below points the GUI to data on the ESAT spchlab server; omit it to fall back to the default root
Spd.iSpectrogram(type=2,root = "https://homes.esat.kuleuven.be/~spchlab/data/")     


## Exercise 1: Phonetic Segmentations

1. setting up:
    + Load the waveform "demo/friendly.wav" and add a phonetic (or graphemic) segmentation by specifying e.g. "demo/friendly.gra"
    + set your audio at a comfortable loudness when you play the sentence
    + use the default values for spectrogram analysis

2. focus on the first word 'friendly':
    + use the "Range" slider to select smaller segments of interest and hit the Audio button to listen to the selected segment
    + in particular, evaluate the segmentation by listening to and comparing 'f-r-ie-n-d-l-iy' vs. 'ie-n-d-l-iy'
    + does what you hear correspond to the segmentation, or does it deviate from it?
    + combine all your observations to evaluate the given segmentation from the perspectives of perception, time waveform and spectrogram
    + do you believe that segmenting into phonemes is possible?


## Exercise 2:  Spectrogram Parameters for the Sliding Window

Any spectrogram or spectral slice is obtained by cutting a long signal into successive short, overlapping frames and analyzing each frame separately: the so-called sliding window approach.   
This approach comes with a number of parameters, most notably the frame length and the frame shift.
The window should be neither too short nor too long, and successive frames should not overlap too much.
What is good or bad is determined by the properties of human hearing and key properties of speech.
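
The sliding window approach can be illustrated with a minimal NumPy sketch. This is not the pyspch implementation: the function name `sliding_window_spectrogram`, the Hamming window, the 16 kHz sampling rate and the synthetic 1 kHz test tone are all assumptions made for illustration, and pre-emphasis is left out.

```python
import numpy as np

def sliding_window_spectrogram(signal, sample_rate, shift_msec=10.0, length_msec=25.0):
    """Log-magnitude spectrogram of shape (frequency bins, frames)."""
    signal = np.asarray(signal, dtype=float)
    shift = int(round(shift_msec * sample_rate / 1000.0))
    length = int(round(length_msec * sample_rate / 1000.0))
    # Keep only frames that fit completely inside the signal
    n_frames = 1 + (len(signal) - length) // shift
    window = np.hamming(length)
    frames = np.stack([signal[i * shift: i * shift + length] * window
                       for i in range(n_frames)])
    spectrum = np.abs(np.fft.rfft(frames, axis=1))
    # Small offset avoids log(0) for silent frames
    return 20.0 * np.log10(spectrum + 1e-10).T

sample_rate = 16000                       # assumed sampling rate in Hz
t = np.arange(sample_rate) / sample_rate  # 1 second of time stamps
tone = np.sin(2 * np.pi * 1000.0 * t)     # synthetic 1 kHz test tone
spec = sliding_window_spectrogram(tone, sample_rate)
print(spec.shape)
```

Each column of `spec` is one spectral slice; changing `shift_msec` changes the number of columns, while changing `length_msec` changes the number of frequency bins per slice.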

Select demo/male1.wav or demo/female1.wav as the test file and use the range slider to limit your view to roughly 1 sec in order to see enough detail.

The default parameters are shift=10 msec, length=25 msec, preemphasis=0.97.
1. Play around with these values and observe the differences in the spectrogram
2. Make shift and length **shorter** than the defaults (best to use a male demo example)
    + keep the frame length fixed and set the frame shift to 5 and then 20 msec (the frame shift should never be bigger than the frame length)
    + set the frame shift to 5 msec and choose a frame length of 5 and then 10 msec
    + For the shortest values the spectrogram looks quite different: in which way?
    + This effect is not in line with human perception of speech. Can you explain this?
    + Any idea why we advised you to use a male sample for this experiment?
3. Make the frame length (much) **longer** than the defaults
    + Increase the frame length gradually: 30, 50, ... 100 msec.
    + Describe again what you see happening
    + Why is a frame length that is too long harmful to speech recognition?
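
To put numbers on the settings used in this exercise, the sketch below tabulates a few derived quantities. It assumes a 16 kHz sampling rate and uses the rule of thumb that the frequency resolution is roughly the inverse of the frame length; the exact value depends on the window shape. The helper `frame_summary` is hypothetical and not part of pyspch.

```python
def frame_summary(shift_msec, length_msec, sample_rate=16000):
    """Derived quantities for one sliding-window setting (rule-of-thumb values)."""
    return {
        "length_samples": round(length_msec * sample_rate / 1000),
        "frames_per_second": 1000.0 / shift_msec,
        "overlap_fraction": 1.0 - shift_msec / length_msec,
        # Approximation: resolution ~ 1 / frame length; the window shape changes the constant
        "approx_freq_resolution_hz": 1000.0 / length_msec,
    }

# (shift, length) pairs in msec: the defaults plus values from the steps above
settings = [(10, 25), (5, 5), (5, 10), (10, 50), (10, 100)]
for shift_msec, length_msec in settings:
    s = frame_summary(shift_msec, length_msec)
    print(f"shift={shift_msec:3d} ms  length={length_msec:3d} ms  "
          f"samples={s['length_samples']:5d}  frames/s={s['frames_per_second']:6.1f}  "
          f"overlap={s['overlap_fraction']:.2f}  "
          f"resolution~{s['approx_freq_resolution_hz']:6.1f} Hz")
```

Comparing the approximate frequency resolution with typical pitch values of male and female voices, and the frame length with the duration of short speech sounds, may help when answering the questions above.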