<a href="https://colab.research.google.com/github/compi1234/spchlab/blob/main/lab03_source_filter/SpectrogramAnalysis.ipynb" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Google Colab" title="Open in Google Colab"></a> 
# Spectrogram Analysis
### Finding Pitch and Formants from waveform, spectrum and spectrogram

In [1]:
# uncomment the pip install command to install pyspch -- it is required!
#
#!pip install git+https://github.com/compi1234/pyspch.git
#
try:
    import pyspch
except ModuleNotFoundError:
    # pyspch is missing: tell the user how to install it
    print(
        """
        To enable this notebook on platforms such as Google Colab,
        install the pyspch package and its dependencies by running the following code:

        !pip install git+https://github.com/compi1234/pyspch.git
        """
    )

In [2]:
# Do the imports #
##################
#
%matplotlib inline
import os,sys 
import numpy as np
import pandas as pd
from IPython.display import display, Audio, HTML
#   
import pyspch.sp as Sps
import pyspch.core as Spch
import pyspch.display as Spd

import matplotlib.pyplot as plt



# make notebook cells stretch over the full screen
display(HTML(data="""
<style>
    div#notebook-container    { width: 95%; }
    div#menubar-container     { width: 65%; }
    div#maintoolbar-container { width: 99%; }
</style>
"""))

# Spectrogram analysis for pitch and formants


#### Setup
+ use Spd.iSpectrogram(type=2), especially to see spectral slices where the cursor is
+ use the Range Slider to select parts to view
+ suggested files to use:
    - demo/female2.wav (female, wideband),   female2_8k  is a narrowband version of the same
    - demo/male1.wav (male, narrowband)
    - demo/timit_f1_sa2.wav                  the same "SA2" sentence by a female
    - demo/timit_m1_sa2.wav                          and male person
    
#### Ex1. Extracting pitch and formants in time and/or frequency domain
+ for 'demo/female2': analyze the speech (vowels) at time=2.33sec 
+ find pitch in three ways (for the first measurement)
    - finding the period in the time waveform
    - find harmonic distance in a spectral slice (e.g. count how many in a 1kHz range)
    - find harmonic distance in a spectrogram
+ do your different pitch estimation methods give consistent estimates ?
+ what happens if you shift your window a bit to the left or the right while remaining in the same vowel. Do your estimates change over time ?
+ now look at some different time instances (e.g. t=1.80 sec or t=3.45 sec): formants are different (that's normal), but how about the pitch ? are these pitch estimates consistent ? are the deviations "within range of expectations" ?  explain the differences if any 
+ what is (roughly) the pitch range that you observe for this speaker in this single sentence ?
+ is pitch a clear indication of gender ?
+ Also extract the formants F1 and F2 at time=2.33 sec.
    - Think "spectral envelope" when trying to measure the formant values
    - You may select the "ENVELOPE SPG"-view to get a good view of spectral envelope
+ some of the above tasks are indeed (very) difficult, given the views that you have. What is lacking in your opinion ? What is the problem ? (The information must be there, as you can easily recognize the sounds when you listen.)
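
As a cross-check for your manual readings in Ex1, the two basic measurements (period in the waveform, harmonic spacing in a spectral slice) can be sketched in a few lines of NumPy. This is only an illustration on a synthetic signal, not the method used inside pyspch: the sampling rate, the 200 Hz test pitch, the 40 ms frame and the helper names `pitch_from_period` and `pitch_from_harmonics` are all assumptions made for this example.

```python
import numpy as np

sr = 16000                       # sampling rate in Hz (assumed)
f0_true = 200.0                  # pitch of the synthetic test signal (assumed)
n = np.arange(int(0.040 * sr))   # one 40 ms analysis frame
# Harmonic-rich test signal: first 10 harmonics with 1/k amplitude decay
x = sum(np.sin(2 * np.pi * k * f0_true * n / sr) / k for k in range(1, 11))

def pitch_from_period(x, sr, fmin=60.0, fmax=500.0):
    """Time domain: the lag of the strongest autocorrelation peak is the period."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:]      # lags 0 .. len(x)-1
    lag_min, lag_max = int(sr / fmax), int(sr / fmin)     # plausible pitch lags
    period = lag_min + np.argmax(r[lag_min:lag_max + 1])  # period in samples
    return sr / period

def pitch_from_harmonics(x, sr, fmax=1100.0, nfft=4096, rel_threshold=0.1):
    """Frequency domain: average distance between harmonic peaks below fmax."""
    mag = np.abs(np.fft.rfft(x * np.hamming(len(x)), nfft))
    freqs = np.fft.rfftfreq(nfft, d=1.0 / sr)
    keep = freqs <= fmax
    mag, freqs = mag[keep], freqs[keep]
    # local maxima that rise clearly above the window sidelobes
    is_peak = ((mag[1:-1] > mag[:-2]) & (mag[1:-1] >= mag[2:])
               & (mag[1:-1] > rel_threshold * mag.max()))
    peak_freqs = freqs[1:-1][is_peak]
    return np.mean(np.diff(peak_freqs)), len(peak_freqs)

f0_time = pitch_from_period(x, sr)
f0_freq, n_harmonics = pitch_from_harmonics(x, sr)
print(f"period method: {f0_time:.1f} Hz")
print(f"harmonic spacing: {f0_freq:.1f} Hz ({n_harmonics} harmonics below 1.1 kHz)")
```

On real speech the peaks are far less clean than in this synthetic case, so the search range and the threshold would need tuning per recording.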

#### Ex2. Pitch and formants in the mel spectrum
+ make sure you are up to date on the mel-scale
+ add the mel spectrogram (and mel spectrum slice) to your view (start with high resolution: set the number of mel bands to 80 if sr=16kHz, 64 if sr=8kHz)
+ Look for the 10th harmonic both in the linear spectrum and in the mel spectrum - what's the most striking difference
+ is finding pitch easier / more difficult in the mel spectrum or in the fourier spectrum ?
+ Does your answer to the question above change if you lower the number of mel bands to 24 ?
+ in which representation is finding formants easiest ?
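
To support the comparison in Ex2, the following sketch maps a series of harmonics onto the mel axis and compares their spacing with the spacing of the mel bands. It assumes the HTK-style formula mel = 2595 log10(1 + f/700) and bands spread evenly on the mel axis between 0 Hz and the Nyquist frequency; pyspch may use a different mel convention or filterbank layout. The 200 Hz pitch and the helper names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    """HTK-style mel formula (one common convention, assumed here)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_band_spacing(n_bands, sr):
    """Mel distance between neighbouring band centres, assuming bands
    evenly spread on the mel axis from 0 Hz to sr/2."""
    return hz_to_mel(sr / 2.0) / (n_bands + 1)

sr = 16000
f0 = 200.0                                   # assumed pitch for the illustration
harmonics = f0 * np.arange(1, 41)            # 200 Hz .. 8000 Hz
spacing_mel = np.diff(hz_to_mel(harmonics))  # harmonic distance on the mel axis

# How many mel bands fit between the 10th and 11th harmonic?
gap_10_11 = spacing_mel[9]
bands_80 = gap_10_11 / mel_band_spacing(80, sr)
bands_24 = gap_10_11 / mel_band_spacing(24, sr)

print(f"mel distance between harmonics 1-2:   {spacing_mel[0]:.1f}")
print(f"mel distance between harmonics 10-11: {gap_10_11:.1f}")
print(f"bands per harmonic gap at 80 bands: {bands_80:.2f}, at 24 bands: {bands_24:.2f}")
```

The harmonics are equidistant in Hz but not in mel, which is worth keeping in mind when looking for the 10th harmonic in both views.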

#### Ex3. Simple classification tasks with simple features
Use your measurements from Ex.1 at time=2.33 (alternatively from t=3.45sec) for Ex3.
We're not so keen on doing this with the measurements at t=1.8 (Why ?)

a.  Determine **gender** using **pitch** as (only) feature
    + Use statistics of the Hillenbrand database as reference; distributions/models are shown in PitchDistribution.ipynb
    + What gender can you determine (include both adult and youngster populations) from the pitch ? Are you pretty sure or not so much ?
  
b. Determine **vowel** using **\[F1,F2\]** as features
    + Make use of the Hillenbrand data as reference data.  Scatter plots, distributions, mean values, .. are given in FormantDistribution.ipynb
    + Identify the vowel from inspecting graphs and/or tables in the above tutorial.
    + Which is the second most likely one ? (and is this a similar vowel ?)
    + Which table/graph is the most helpful ?
    + Did you make use of your knowledge about gender ? if so, in what way ? should you do this or not ?
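
If you prefer to formalize the "look it up in the table" step of Ex3b, a nearest-mean classifier is one simple option. The sketch below ranks classes by distance in the log-formant plane. Everything in it is hypothetical: the class labels, the reference means and the measured values are invented placeholders, NOT Hillenbrand statistics. Replace them with the mean values from FormantDistribution.ipynb before drawing any conclusion.

```python
import numpy as np

# PLACEHOLDER reference means (F1, F2) in Hz: invented round numbers,
# NOT taken from the Hillenbrand database. Substitute real table values.
reference_means = {
    "vowel_A": (300.0, 2300.0),
    "vowel_B": (700.0, 1200.0),
    "vowel_C": (350.0, 900.0),
}

def rank_classes(f1, f2, means):
    """Rank classes from most to least likely by Euclidean distance
    between the log-formants of the observation and of each class mean."""
    obs = np.log([f1, f2])
    dist = {name: float(np.linalg.norm(obs - np.log(m)))
            for name, m in means.items()}
    return sorted(dist, key=dist.get)

# Hypothetical measurement; use your own F1/F2 readings from Ex1 instead
ranking = rank_classes(320.0, 2100.0, reference_means)
print("most likely:", ranking[0], "| second:", ranking[1])
```

A plain distance to the mean ignores the spread of each class; the scatter plots in the tutorial show whether that simplification is reasonable for the vowels you are comparing.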

#### Additional Exercises
+ repeat for a male voice
+ Look for a recording of a question instead of a statement; what difference do you see in the pitch evolution ? 

In [3]:
#root = 'https://homes.esat.kuleuven.be/~spchlab/data/'   # use this for access to more data than the pyspch pkg
root = None
Spd.iSpectrogram(figwidth=14,type=2,root=root)  
