# Spectrograms for Human Speech Analysis

+ Author: Dirk Van Compernolle   
+ History:   
    - 5/10/2021: Created

In [1]:
# uncomment the pip install command to install pyspch -- it is required!
#
#!pip install git+https://github.com/compi1234/pyspch.git
#
try:
    import pyspch
except ModuleNotFoundError:
    # pyspch is required by every cell below: show install instructions, then re-raise
    print(
    """
    To enable this notebook on platforms such as Google Colab,
    install the pyspch package and its dependencies by running the following code:

    !pip install git+https://github.com/compi1234/pyspch.git
    """
    )
    raise

In [2]:
# Do the imports #
##################
#
%matplotlib inline
import os,sys 
import numpy as np
import pandas as pd
from IPython.display import display, Audio, HTML
#   
import pyspch.sp as Sps
import pyspch.core as Spch
import pyspch.display as Spd

import matplotlib.pyplot as plt



# make notebook cells stretch over the full screen
display(HTML(data="""
<style>
    div#notebook-container    { width: 95%; }
    div#menubar-container     { width: 65%; }
    div#maintoolbar-container { width: 99%; }
</style>
"""))

### Purpose and Background

Be sure to have run the exercises in Spectrograms_basics.ipynb first to familiarize yourself with the interactive spectrogram GUI.

In this notebook we explore the spectrogram, and variants of it, with a focus on human speech recognition.   Some adaptations are suggested by our understanding of the human hearing system, and some of the features that we look at in the spectrogram are best understood from the source-filter model of speech production.

A spectrogram can represent many views of which the Fourier spectrogram is just one.

Possible spectral representations are:   
**1. Fourier spectrogram**  
  A Fourier spectrogram is obtained by computing short-time spectra with a sliding window. Viewed as a 2D heatmap,
it shows which frequencies are present at which moment in time.  
**2. mel spectrogram**  
  The mel spectrogram warps the frequency axis in line with the human auditory system.
Roughly speaking, the frequency axis is linear below 1 kHz and logarithmically compressed above it.
Today this is the most popular feature representation for speech recognition.  
**3. MFCCs (mel frequency cepstral coefficients)**  
  Mel frequency cepstral coefficients are obtained by applying a DCT (discrete cosine transform) to the log mel spectrum, optionally followed by truncation
to a handful of coefficients. 
MFCCs are popular because almost all information is concentrated in a handful of low-order coefficients, making them a very
compact speech representation.  Moreover, MFCCs are largely decorrelated, making them well suited for mathematical modeling.
While MFCCs have little to offer when abundant data and compute power are available (as is common these days),
they are still interesting in compact systems. 
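
The chain from waveform to Fourier spectrogram, mel spectrogram and MFCCs can be sketched with plain NumPy. This is a minimal illustration and not the pyspch implementation: the helper names, the 30 ms window with 10 ms shift, the 512-point FFT, the `2595*log10(1 + f/700)` mel formula and the unnormalized DCT-II are all assumptions chosen for the example.

```python
import numpy as np

def hz_to_mel(f):
    # One common mel formula (assumption; pyspch may use a different variant)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def fourier_spectrogram(x, fs, frame_len=0.030, frame_shift=0.010, n_fft=512):
    """Power spectrogram of shape (n_fft//2 + 1, n_frames)."""
    n_len = int(round(frame_len * fs))
    n_shift = int(round(frame_shift * fs))
    n_frames = 1 + (len(x) - n_len) // n_shift
    window = np.hamming(n_len)
    frames = np.stack([x[i * n_shift:i * n_shift + n_len] * window
                       for i in range(n_frames)])
    spec = np.fft.rfft(frames, n=n_fft, axis=1)   # short-time spectra
    return (np.abs(spec) ** 2).T

def mel_filterbank(n_mels, n_fft, fs):
    """Triangular filters equally spaced on the mel axis, shape (n_mels, n_fft//2 + 1)."""
    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(fs / 2), n_mels + 2))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    fb = np.zeros((n_mels, len(freqs)))
    for m in range(n_mels):
        lo, ctr, hi = edges_hz[m], edges_hz[m + 1], edges_hz[m + 2]
        rising = (freqs - lo) / (ctr - lo)
        falling = (hi - freqs) / (hi - ctr)
        fb[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def dct_ii(log_mel, n_ceps):
    """Unnormalized DCT-II along the frequency axis, keeping the first n_ceps coefficients."""
    n = log_mel.shape[0]
    k = np.arange(n_ceps)[:, None]
    j = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * j + 1) / (2 * n))
    return basis @ log_mel

# Synthetic test signal: two sinusoids, 1 second at 16 kHz
fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 2500 * t)

spg = fourier_spectrogram(x, fs)              # 1. Fourier spectrogram
mel = mel_filterbank(80, 512, fs) @ spg       # 2. mel spectrogram (80 bands)
log_mel = np.log(mel + 1e-10)                 # small floor avoids log(0)
mfcc = dct_ii(log_mel, 13)                    # 3. MFCCs, truncated to 13 coefficients
print(spg.shape, mel.shape, mfcc.shape)
```

Each step reduces the number of values per frame, which is the compaction that the MFCC description refers to.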
 


### Visualization Options
- Two slightly different versions of the interactive spectrogram are available:  
> pyspch.interactive.iSpectrogram()  
which just shows a traditional spectrogram with one or several spectral representations,   
or   
> pyspch.interactive.iSpectrogram2()  
which shows the spectrograms on the left-hand side of your screen and adds a spectral slice view on the right-hand side.

- In most situations it is fine to call these without any parameters.  However, depending on your screen size and resolution, some parts of the display may not align well.  If that is the case, first try to adjust the size of the window in which you are running Jupyter, or otherwise adjust the figwidth parameter (default = 10).

## Exercise 1: Mel Spectrogram

#### Setup
+ work with Spd.iSpectrogram2()
+ load any speech waveform (suggestions: misc/expansionist.wav, misc/friendly.wav)
+ for background on the mel scale, check out the [mel_scale.ipynb notebook](mel_scale.ipynb) in this folder and use the plot in [mel_scale](mel_scale.png) as a mapping reference
+ add the mel spectrogram to your view (using default parameters, i.e. 80 bands)
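
As a numerical companion to the mel-scale plot, the short sketch below converts a few frequencies from Hz to mel. It assumes one widely used analytic formula, `2595*log10(1 + f/700)`; the function names are illustrative and the formula used inside pyspch or in mel_scale.ipynb may differ.

```python
import math

def hz_to_mel(f_hz):
    # Assumed mel formula; other variants exist
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

# Equal steps in Hz shrink on the mel axis as frequency increases
for f in (250, 500, 1000, 2000, 4000, 8000):
    print(f"{f:5d} Hz -> {hz_to_mel(f):7.1f} mel")
```

With this formula 1000 Hz maps to approximately 1000 mel, and each doubling of frequency above that adds progressively less than a doubling in mel.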

#### Questions
+ How does the regular (Fourier) spectrogram compare with the mel spectrogram ?
+ How does the mel spectrogram change visually when you reduce the number of bands from 80 to 24 or lower ?

## Exercise 2: Spectrogram: pitch and formants


1. setting up:
    + work with Spd.iSpectrogram2()
    + load any speech waveform  (suggestions: misc/expansionist.wav, misc/friendly.wav)
    + use default spectrogram parameters   
    
    
2. Find pitch and formants in time and/or frequency domain
    + put the range cursor in the middle of a vowel
    + find pitch in three ways: time waveform, spectral slice, spectrogram: are your values consistent ?
    + could you determine gender from the obtained pitch values ?
    + identify the vowel by finding the first and second formants and looking them up in formant tables   
    
    
3. Pitch and formants in the mel spectrum
    + for background on the mel scale, check out the [mel_scale.ipynb notebook](mel_scale.ipynb) in this folder and use the plot in [mel_scale](mel_scale.png) as a mapping reference
    + add the mel spectrogram (and mel spectrum slice) to your view
    + find the formants in the mel spectrum both for low resolution (nb=20-30) and high resolution (>80) mel filterbanks
    + in which representation is finding formants easiest ?
    + try to map formant frequencies to mel scale and filterbank channel (hint use a number of melbands equal to 30, 60 or 90 and 
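
Mapping a formant frequency to a filterbank channel can be checked with a few lines of code. The sketch below is an assumption-laden illustration: it uses the `2595*log10(1 + f/700)` mel formula, a 16 kHz sampling rate, and filter centers equally spaced on the mel axis between 0 and the Nyquist frequency. `mel_channel` is a hypothetical helper, not a pyspch function, and the actual channel layout in pyspch may differ.

```python
import math

def hz_to_mel(f_hz):
    # Assumed mel formula; other variants exist
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_channel(f_hz, n_bands, fs=16000):
    """0-based index of the filter whose center lies closest to f_hz.

    Assumes n_bands centers equally spaced in mel between 0 and fs/2,
    i.e. center k sits at (k + 1) * step on the mel axis.
    """
    step = hz_to_mel(fs / 2) / (n_bands + 1)
    idx = round(hz_to_mel(f_hz) / step) - 1
    return min(max(idx, 0), n_bands - 1)

# Illustrative formant values (roughly a neutral vowel), not measured data
formants_hz = (500, 1500, 2500)
for n_bands in (30, 60, 90):
    channels = [mel_channel(f, n_bands) for f in formants_hz]
    print(f"{n_bands} bands: channels {channels}")
```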

In [4]:
Spd.iSpectrogram2(figwidth=12)     

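
The pitch values read from the display in Exercise 2 can be cross-checked numerically. The sketch below estimates pitch from the autocorrelation of a single frame, which corresponds to finding the period in the time waveform. `estimate_pitch`, the 60 to 400 Hz search range and the synthetic harmonic test signal are assumptions for illustration and are not part of pyspch.

```python
import numpy as np

def estimate_pitch(frame, fs, f_min=60.0, f_max=400.0):
    """Pitch in Hz from the strongest autocorrelation peak within [f_min, f_max]."""
    frame = frame - np.mean(frame)
    # Keep non-negative lags only
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(fs / f_max)
    lag_max = int(fs / f_min)
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return fs / lag

# Synthetic voiced-like frame: 5 harmonics of 120 Hz, 40 ms at 16 kHz
fs = 16000
f0 = 120.0
t = np.arange(int(0.040 * fs)) / fs
frame = sum(np.sin(2 * np.pi * k * f0 * t) / k for k in range(1, 6))

pitch = estimate_pitch(frame, fs)
print(f"estimated pitch: {pitch:.1f} Hz")
```

On real speech, the frame would be the samples under the range cursor in the middle of a vowel. The resolution is limited to whole-sample lags, so the estimate can deviate slightly from the true value.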