<a href="https://colab.research.google.com/github/compi1234/spchlab/blob/main/lab04_feature_extraction/CepstralLiftering.ipynb" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Google Colab" title="Open in Google Colab"></a> 
# Cepstral Liftering
###### Last Modification : 24/03/2024
_______________

#### Cepstral Features

A cepstrogram is laid out like a spectrogram. The only difference is that each 'column' in the display is a cepstral slice rather than a spectral slice.

The cepstrum is obtained by taking the inverse DFT of the log magnitude spectrum.  Its independent variable therefore has the same units as time.  To avoid confusion and to stress the derivation from the spectrum, we call this the *quefrency* domain, and we use our usual **msec** as units.

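As a minimal sketch of this definition, the real cepstrum of a single frame can be computed with plain NumPy. The function name `real_cepstrum`, the FFT size and the test tone are illustrative assumptions; this is not the `pyspch` implementation used further down.

```python
import numpy as np

def real_cepstrum(frame, n_fft=512, eps=1e-10):
    """Real cepstrum of one frame: inverse DFT of the log magnitude spectrum."""
    spectrum = np.fft.rfft(frame, n=n_fft)
    log_mag = np.log(np.abs(spectrum) + eps)   # eps avoids log(0)
    return np.fft.irfft(log_mag, n=n_fft)

sr = 16000                                     # assumed sampling rate in Hz
t = np.arange(400) / sr                        # one 25 msec frame
frame = np.hamming(400) * np.sin(2 * np.pi * 440.0 * t)

cep = real_cepstrum(frame)
# quefrency axis in msec: one cepstral index corresponds to one sample period
quefrency_ms = np.arange(cep.size) * 1000.0 / sr
```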
The cepstrum is remarkably well suited to speech: it provides an almost natural separation of *source* and *filter* information.
The lowest-order cepstral coefficients contain nearly all information about the spectral envelope (the filter in the source-filter model), while during segments of voiced speech
the cepstrogram gives a clean observation of the pitch in a different, higher range of cepstral coefficients.

This pitch peak in the cepstrogram corresponds to the pitch period and will therefore lie roughly in the range \[2.5 , 16\] msec (pitch frequencies from 400 Hz down to about 60 Hz).
The spectral envelope can be limited to the range below 2 msec.

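The following sketch illustrates how a pitch period can be read off as the largest cepstral peak inside the \[2.5 , 16\] msec range. The synthetic "voiced" frame (an impulse train passed through a single resonance at 700 Hz) and all parameter values are assumptions chosen for illustration only.

```python
import numpy as np

sr = 16000
period = 128                                   # pitch period in samples (8 msec, 125 Hz)

# Hypothetical vocal-tract filter: impulse response of one 2-pole resonance at 700 Hz
n = np.arange(200)
r, theta = 0.95, 2 * np.pi * 700.0 / sr
h = (r ** n) * np.sin(theta * (n + 1)) / np.sin(theta)

# Source: impulse train; the filtered result mimics a steady voiced sound
excitation = np.zeros(1024 + h.size)
excitation[::period] = 1.0
voiced = np.convolve(excitation, h)[h.size:h.size + 1024]   # skip the start-up transient
frame = voiced * np.hamming(1024)

# Cepstrum of the frame
n_fft = 2048
log_mag = np.log(np.abs(np.fft.rfft(frame, n=n_fft)) + 1e-10)
cep = np.fft.irfft(log_mag, n=n_fft)

# Search for the pitch peak between 2.5 and 16 msec
lo = int(round(2.5e-3 * sr))
hi = int(round(16e-3 * sr))
peak_lag = lo + int(np.argmax(cep[lo:hi + 1]))
f0_est = sr / peak_lag                         # pitch estimate in Hz
print("pitch period: %.2f msec, f0: %.1f Hz" % (1000.0 * peak_lag / sr, f0_est))
```

On real speech the peak is less clean than in this synthetic case, and unvoiced frames have no meaningful peak at all, so a practical pitch tracker would also need a voicing decision.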

#### Cepstral Liftering

The above understanding lets us split the cepstrum into an *envelope* part and a *pitch* part by selecting the cepstral coefficients below, respectively above, a certain cutoff.
Retaining only the low-order quefrency components and then reconstructing a spectrum from them is often called **cepstral liftering** (the cepstral equivalent of low-pass filtering).
A complementary operation selects the high-order components and reconstructs them into a *pitch spectrogram*; with a common liftering cutoff it will mainly contain pitch information (alternatively, it could be called the *residue spectrogram*, i.e. what remains after the envelope has been removed). 

___    
**Waveform &rarr; Fourier Spectrum &rarr; Cepstrum** 

**Cepstrum &rarr; select the coefficients below liftering cutoff &rarr; Spectral Envelope**   

**Cepstrum &rarr; select the coefficients above liftering cutoff &rarr; Spectral Residue**   
___    
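The two liftering paths can be sketched in a few lines of NumPy. The helper name `lifter_split` and the toy log spectrum are assumptions made for this illustration; the demo itself relies on `Sps.cep_lifter` from `pyspch`, whose internals may differ.

```python
import numpy as np

def lifter_split(log_spec, n_lifter):
    """Split a log magnitude spectrum (n_fft//2+1 bins) into envelope and residue."""
    n_fft = 2 * (log_spec.size - 1)
    cep = np.fft.irfft(log_spec, n=n_fft)          # real, symmetric cepstrum
    low = np.zeros(n_fft)
    low[:n_lifter] = 1.0                           # coefficients 0 .. n_lifter-1
    if n_lifter > 1:
        low[-(n_lifter - 1):] = 1.0                # and their mirror images
    envelope = np.fft.rfft(cep * low).real         # low quefrencies -> spectral envelope
    residue = np.fft.rfft(cep * (1.0 - low)).real  # high quefrencies -> spectral residue
    return envelope, residue

# Toy log spectrum: a slow undulation (envelope) plus a fast ripple (harmonics)
n_fft = 512
k = np.arange(n_fft // 2 + 1)
smooth = np.cos(2 * np.pi * 2 * k / n_fft)         # lives at quefrency index 2
ripple = 0.5 * np.cos(2 * np.pi * 64 * k / n_fft)  # lives at quefrency index 64
envelope, residue = lifter_split(smooth + ripple, n_lifter=20)
```

Because both operations are linear in the log spectrum, envelope and residue always add up to the original log spectrum, whatever cutoff is chosen.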


#### The Demo

In the GUI below, you see plots of waveform, spectrogram, cepstrogram, envelope spectrogram and pitch spectrogram.

You can select the liftering point with a slider.  Move the slider to observe more or less smoothing by the liftering operation.
Note that the slider uses **number of coefficients** while in the cepstrogram plot we have labeled the **Quefrency-axis in msec** to express this in physical units:
> liftering_in_msec = liftering_in_samples * (1000.0/sampling_rate)

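A small sketch of this conversion in both directions; the function names are hypothetical helpers, and the relation is meant for the plain (non-mel) cepstrum, where one coefficient corresponds to one sample period.

```python
def lifter_samples_to_msec(n_coef, sample_rate):
    """Convert a liftering cutoff from number of coefficients to msec."""
    return n_coef * 1000.0 / sample_rate

def lifter_msec_to_samples(msec, sample_rate):
    """Convert a liftering cutoff from msec to the nearest number of coefficients."""
    return int(round(msec * sample_rate / 1000.0))

# The slider default of 12 coefficients at a 16 kHz sampling rate
cutoff_ms = lifter_samples_to_msec(12, 16000)      # 0.75 msec
# The 2 msec envelope limit expressed in coefficients
cutoff_16k = lifter_msec_to_samples(2.0, 16000)    # 32 coefficients
cutoff_8k = lifter_msec_to_samples(2.0, 8000)      # 16 coefficients
```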

#### Things to explore

- select a vowel and do pitch extraction (visually) directly from the cepstrogram
- can you verify that the above pitch estimate is consistent with counting harmonics in the Spectrogram and the Pitch Spectrogram (Residue) ?
- now move the liftering point:
     + what happens if the liftering point is moved close to the minimum ?
     + what happens if the liftering point is moved close to the maximum ?

## Mel Cepstral Coefficients

We can still perform this cepstral liftering starting from the mel spectrum.   For the low resolution mel spectrum the effect is somewhat less obvious.  For the high resolution mel spectrum, the pitch smoothing effect of truncating the cepstrum becomes obvious again.
In the demo below, just tick the 'mel' checkbox.

- can you get a spectral envelope of the high resolution mel spectrum by using cepstral truncation ?
- can you get a clean pitch spectrogram from the high mel cepstral coefficients ?
- check the female voice 'coding/f2.wav'.  (Mel) cepstral smoothing is less straightforward here: in many situations you may observe some pitch leakage into the spectral envelope.  Explain why things are easier with 'misc/friendly', spoken by an average male voice.

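To make the mel variant concrete, here is a self-contained sketch of mel cepstral truncation: a triangular mel filterbank, a log, a DCT, and reconstruction from the first few coefficients. The HTK-style mel formula, the orthonormal DCT-II and every function name below are assumptions for illustration; `pyspch` may use different conventions.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters equally spaced on the mel scale, shape (n_mels, n_fft//2+1)."""
    freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, freqs.size))
    for m in range(n_mels):
        lo, ctr, hi = edges[m], edges[m + 1], edges[m + 2]
        rising = (freqs - lo) / (ctr - lo)
        falling = (hi - freqs) / (hi - ctr)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def dct2_matrix(n):
    """Orthonormal DCT-II basis, shape (n, n); its transpose is the inverse."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    basis[0] /= np.sqrt(2.0)
    return basis

n_mels, n_fft, sr = 80, 1024, 16000
fb = mel_filterbank(n_mels, n_fft, sr)
D = dct2_matrix(n_mels)

rng = np.random.default_rng(0)
power_spec = rng.random(n_fft // 2 + 1) + 0.1      # stand-in for one power spectrum slice

log_mel = np.log(fb @ power_spec + 1e-10)          # log mel spectrum
mel_cep = D @ log_mel                              # mel cepstrum
n_keep = 12
smooth_log_mel = D[:n_keep].T @ mel_cep[:n_keep]   # envelope from truncated cepstrum
full_log_mel = D.T @ mel_cep                       # all coefficients: exact reconstruction
```

Note that on the mel scale the harmonics are no longer equally spaced, so the pitch does not show up as a single sharp cepstral peak the way it does in the plain cepstrum.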

In [1]:
#!pip install git+https://github.com/compi1234/pyspch.git
try:
    import pyspch
except ModuleNotFoundError:
    # pyspch is missing: tell the user how to install it (e.g. on Google Colab)
    print(
    """
    To enable this notebook on platforms such as Google Colab,
    install the pyspch package and its dependencies by running the following code:

    !pip install git+https://github.com/compi1234/pyspch.git
    """
    )

In [2]:
%matplotlib inline
import os,sys, math, copy
import numpy as np
import librosa

import pyspch.sp as Sps
import pyspch.core as Spch
import pyspch.display as Spd

import matplotlib.pyplot as plt
import matplotlib as mpl
import ipywidgets as widgets
from ipywidgets import interact, interactive, Layout, HBox, VBox
from IPython.display import Audio, display

mpl.rcParams['figure.figsize'] = [12.0, 8.0]
mpl.rcParams['font.size'] = 12
mpl.rcParams['legend.fontsize'] = 'large'
mpl.rcParams['figure.titlesize'] = 'large'

### Load a Waveform
- Choose between male (e.g. demo/friendly) or female (e.g. demo/female2) or another sample of your liking   
- The next cell already computes a default spectrogram, after that the GUI does the further processing

In [3]:
# example utterances (uncomment the one you want) -----------

name = 'demo/expansionist'
#name = 'demo/female2'
#name = 'demo/friendly'

wavfname = name+".wav"
wavdata, sr = Spch.load_data(wavfname)
# for simplicity use maximum sampling rate of 16000 Hz
if sr > 16000: wavdata, sr = Spch.load_data(wavfname,sample_rate=16000)
wavlength = wavdata.size/float(sr)
shift = 0.01
spgdata = Sps.spectrogram(wavdata,sample_rate=sr,f_shift=shift)
(nparam,nfr) = spgdata.shape
# index of the highest FFT bin; NOTE: this rebinds `np` and shadows the numpy import above
np = nparam-1

### Define all the GUI Elements

In [4]:
# assumes a wavefile preloaded in *wavdata* and spectrogram precomputed in *spgdata*
#
def cepstral_lifter_plot(lifter=12,mel_flag=False):


    n_mels = 64 if sr == 8000 else 80
    if mel_flag:
        melspec = Sps.spg2mel(S=spgdata,n_mels=n_mels)
        cep = Sps.melcepstrum(S=spgdata,n_cep=n_mels,n_mels=n_mels)
    else:
        cep = Sps.cepstrum(S=spgdata)
    spg_env, spg_res =  Sps.cep_lifter(cep,n_lifter=lifter,n_spec=cep.shape[0])
    # we plot the mean normalized cepstrum for visual clarity
    cep_n = Sps.mean_norm(cep[1:,:])
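    # tick positions at 0, 1/4, 1/2, 3/4 and full scale, labeled in Hz and in msec
    n_bins = spgdata.shape[0] - 1   # index of the highest FFT bin (Nyquist)
    ticks4 = [0, n_bins/4, n_bins/2, 3*n_bins/4, n_bins]
    flabels4 = ["%d" % (t*float(sr/2.)/n_bins) for t in ticks4]   # frequency labels (Hz)
    tlabels4 = ["%d" % (t*1000./sr) for t in ticks4]              # quefrency labels (msec)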


    if mel_flag:
        fig = Spd.PlotSpgFtrs(wavdata=wavdata,spgdata=melspec,img_ftrs=[cep_n,spg_env,spg_res],row_heights=[1,1,1,1,1],sample_rate=sr,dy=1,figsize=(16,10))
        for i in [1,3,4]:
            fig.axes[i].set_ylabel("")
    else:
        fig = Spd.PlotSpgFtrs(wavdata=wavdata,spgdata=spgdata,img_ftrs=[cep_n,spg_env,spg_res],row_heights=[1,1,1,1,1],sample_rate=sr,dy=1,figsize=(16,10))
        fig.axes[2].set_yticks(ticks=ticks4,labels=tlabels4)
        fig.axes[2].set_ylabel("Quefrency\n msec")
        for i in [1,3,4]:
            fig.axes[i].set_yticks(ticks=ticks4,labels=flabels4)
            fig.axes[i].set_ylabel("Hz")
        
    fig.axes[1].set_title("SPECTROGRAM")
    fig.axes[2].set_title("CEPSTROGRAM")
    fig.axes[3].set_title("ENVELOPE SPECTROGRAM")
    fig.axes[4].set_title("PITCH SPECTROGRAM")
    xlim = fig.axes[2].get_xlim()
    fig.axes[2].plot(xlim,[lifter,lifter],lw=3)
     
    display(fig)
    
### GUI #####################
#############################
wg_lifter = widgets.IntSlider(value=12,min=1,max=128,step=1,
                              continuous_update=False,description="Lifter",
                              orientation='horizontal') #,width='50%', style={'description_width':'20%'})
wg_mel = widgets.Checkbox(value=False,description='mel') #,width='50%',Indent=False
output = widgets.interactive_output(cepstral_lifter_plot,{'lifter':wg_lifter,'mel_flag':wg_mel})
 
wg_instructions = widgets.HTML(
    "<b>CEPSTRAL LIFTERING</b><br> \
    Move the Liftering Slider to control the number of coefficients assigned to the envelope; the remainder goes to the residue<br>\
    Check the 'mel'-box to work with mel-spectrograms")

ui = VBox([ wg_instructions, HBox([wg_lifter,wg_mel],layout=Spd.box_layout(width='100%',border='',align_items='flex-start'))],
          layout=Spd.box_layout(width='100%',padding='10px'))

In [5]:
# Run the GUI
display(VBox([ui, output]),Audio(data=wavdata,rate=sr))
