# Spectrograms for Human Speech Analysis

+ ###### Author: Dirk Van Compernolle   
+ ###### Modification History:   1/1/2020, 11/02/2023
+ ###### Requires:  pyspch>=0.7

In [1]:
# uncomment the pip install command to install pyspch -- it is required!
#
#!pip install git+https://github.com/compi1234/pyspch.git
#
try:
    import pyspch
except ModuleNotFoundError:
    # pyspch is required by every later cell, so explain how to install it and stop here
    print(
        """
        To enable this notebook on platforms such as Google Colab,
        install the pyspch package and its dependencies by running the following code:

        !pip install git+https://github.com/compi1234/pyspch.git
        """
    )
    raise

In [2]:
# Do the imports #
##################
#
%matplotlib inline
import os,sys 
import numpy as np
import pandas as pd
from IPython.display import display, Audio, HTML
#   
import pyspch.sp as Sps
import pyspch.core as Spch
import pyspch.display as Spd

import matplotlib.pyplot as plt



# make notebook cells stretch over the full screen
display(HTML(data="""
<style>
    div#notebook-container    { width: 95%; }
    div#menubar-container     { width: 65%; }
    div#maintoolbar-container { width: 99%; }
</style>
"""))

### Purpose and Background

Be sure to have run the exercises in Spectrograms_basics.ipynb first to familiarize yourself with the interactive spectrogram GUI.

In this notebook we explore the spectrogram, and variants of it, in the context of human speech recognition.   


## Exercise 1: Mel Spectrogram

#### Setup
+ work with Spd.iSpectrogram2()
+ load any speech waveform (suggestions: coding/f2.wav, misc/expansionist.wav, misc/friendly.wav)
+ for background on the mel scale, see the [mel_scale.ipynb notebook](mel_scale.ipynb) in this folder; use the plot in [mel_scale](mel_scale.png) as a mapping reference
+ add the mel spectrogram to your view (using default parameters, i.e. 80 bands)

#### Questions
+ How does the regular (Fourier) spectrogram compare with the mel spectrogram?
+ How does the mel spectrogram change visually when you reduce the number of bands from 80 to 24 or lower?
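
To help interpret what you see, the sketch below computes where the filterbank channels would sit on the Hz axis for 80 and for 24 bands. It is an illustration under stated assumptions, not the pyspch implementation: it uses the common HTK-style mel formula, places channel centers at equal mel spacing, and assumes a 0 to 8000 Hz range (16 kHz sampling). The helper names `hz_to_mel`, `mel_to_hz` and `mel_centers` are hypothetical and not part of pyspch.

```python
import numpy as np

def hz_to_mel(f_hz):
    """HTK-style mel mapping (an assumption; pyspch may use another variant)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_centers(n_bands, f_low=0.0, f_high=8000.0):
    """Center frequencies (Hz) of n_bands channels equally spaced on the mel scale."""
    # n_bands centers need n_bands + 2 edge points; drop the two outer edges
    edges = np.linspace(hz_to_mel(f_low), hz_to_mel(f_high), n_bands + 2)
    return mel_to_hz(edges[1:-1])

for n_bands in (80, 24):
    centers = mel_centers(n_bands)
    spacing = np.diff(centers)
    print(f"{n_bands} bands: channel spacing from {spacing[0]:.0f} Hz (lowest) "
          f"to {spacing[-1]:.0f} Hz (highest)")
```

Comparing the printed channel spacing with typical pitch values and formant bandwidths gives a rough idea of which spectral details a given number of bands can still resolve.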

## Exercise 2: Spectrogram: pitch and formants


1. setting up:
    + work with Spd.iSpectrogram2()
    + load any speech waveform  (suggestions: coding/f2.wav, misc/expansionist.wav, misc/friendly.wav)
    + use default spectrogram parameters   
    
    
2. Find pitch and formants in time and/or frequency domain
    + put the range cursor in the middle of a vowel
    + find pitch in three ways: time waveform, spectral slice, spectrogram; are your values consistent?
    + could you determine the speaker's gender from the obtained pitch values?
    + find the vowel identity by measuring the first and second formants and looking them up in formant tables   
    
    
3. Pitch and formants in the mel spectrum
    + for background on the mel scale, see the [mel_scale.ipynb notebook](mel_scale.ipynb) in this folder; use the plot in [mel_scale](mel_scale.png) as a mapping reference
    + add the mel spectrogram (and mel spectrum slice) to your view
    + find the formants in the mel spectrum both for low resolution (nb=20 or 24) and high resolution (80) mel filterbanks
    + in which representation is finding formants easiest?
    + try to map formant frequencies to mel scale and filterbank channel (e.g. with [mel_filterbank80.png](mel_filterbank80.png))
    + Can you see the pitch in both the low and high resolution mel spectra? Which one shows it best?
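
As a numerical cross-check for the pitch values you read off by hand, pitch can also be estimated from the autocorrelation of a short frame. The sketch below is a minimal illustration on a synthetic harmonic signal, not a pyspch function: the helper `estimate_f0`, the 60 to 400 Hz search range, the 40 ms frame and the 16 kHz sampling rate are all assumptions chosen for the example.

```python
import numpy as np

def estimate_f0(frame, fs, f0_min=60.0, f0_max=400.0):
    """Estimate pitch (Hz) from the strongest autocorrelation peak in the search range."""
    frame = frame - np.mean(frame)
    # keep non-negative lags only: acf[k] is the autocorrelation at lag k samples
    acf = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(fs / f0_max)   # shortest period considered
    lag_max = int(fs / f0_min)   # longest period considered
    lag = lag_min + np.argmax(acf[lag_min:lag_max + 1])
    return fs / lag

# synthetic "vowel-like" frame: five harmonics of a 200 Hz fundamental
fs = 16000
t = np.arange(int(0.04 * fs)) / fs          # 40 ms frame
f0_true = 200.0
frame = sum(np.sin(2 * np.pi * k * f0_true * t) / k for k in range(1, 6))

print(f"estimated F0: {estimate_f0(frame, fs):.1f} Hz (true value {f0_true:.0f} Hz)")
```

To apply the same idea to real speech, pass in the samples of a frame taken from the middle of a vowel together with the sampling rate of the recording, and compare the result with your three manual estimates.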

In [6]:
Spd.iSpectrogram2(figwidth=12,fname='coding/f2.wav')     
