# 2. Spectral Analysis

## 2.1. Quasi-Stationarity
* **Assumption**: speech is stationary over a **short interval** $(10\sim20\;\text{ms})$

  * Too short $\rightarrow$ insufficient time to determine properties
  * Too long $\rightarrow$ approximation becomes invalid

## 2.2. Digital Signals

>$$\text{Analog} \rightarrow \text{Digital}$$
>
>* **Step 1: Low-pass filter** (remove components with $f>0.5\;f_{sampling}$ to avoid aliasing: **Nyquist's theorem**)
>* **Step 2: Sampling** (discretisation in time)
>  * High Quality: $16\text{kHz}$ / Normal Quality: $8\text{kHz}$
>* **Step 3: ADC** (discretisation in amplitude)
>  * High Quality: $16\text{bits/sample}$ / Normal Quality: $8\text{bits/sample}$
>  * With $Q$ bits/sample, the output is an integer in the range $-2^{Q-1}, \ldots , 2^{Q-1}-1$
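
The amplitude discretisation in Step 3 can be sketched as follows. This is a minimal illustration, not a model of any particular ADC: the helper `quantise`, the assumption that samples are pre-scaled to $[-1, 1)$, and the 440 Hz test tone are all hypothetical choices made for the example.

```python
import numpy as np

def quantise(x, q_bits):
    """Uniformly quantise samples in [-1, 1) to signed q_bits-bit integer codes."""
    levels = 2 ** (q_bits - 1)                 # number of negative codes, 2^(Q-1)
    codes = np.round(np.asarray(x) * levels)   # scale to the integer grid and round
    # Clip so that +1.0 maps to the largest code 2^(Q-1) - 1 instead of overflowing
    return np.clip(codes, -levels, levels - 1).astype(np.int32)

fs = 8000                          # sampling rate in Hz ("normal quality" above)
t = np.arange(fs // 100) / fs      # sample times for 10 ms of signal
x = np.sin(2 * np.pi * 440 * t)    # hypothetical test tone, well below fs / 2

codes8 = quantise(x, 8)            # codes lie in -128 ... 127
codes16 = quantise(x, 16)          # codes lie in -32768 ... 32767
print(codes8.min(), codes8.max())
```

Each extra bit halves the quantisation step, which is why the 16-bit version tracks the waveform far more closely than the 8-bit one.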

## 2.3. Windowing
* DFT assumes periodicity $\rightarrow$ **Discontinuity** at the frame edges $\rightarrow$ Undesired high-frequency components

><img src="images/image03.png" width="400">

* **Hamming Window**
  * Attenuates the discontinuity **but** also smears the spectral peaks

>$$w(nT)=0.54-0.46\cos{ \left[ \frac{2\pi n}{N-1} \right] }$$
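
The window formula above translates directly into code. The sketch below assumes $n=0,1,...,N-1$ and a hypothetical window length of 400 samples; the function name `hamming` is chosen for the example. NumPy's built-in `np.hamming` uses the same definition, so it serves as a cross-check.

```python
import numpy as np

def hamming(N):
    """Hamming window w(n) = 0.54 - 0.46 cos(2 pi n / (N - 1)), n = 0 ... N-1."""
    n = np.arange(N)
    return 0.54 - 0.46 * np.cos(2 * np.pi * n / (N - 1))

w = hamming(400)                           # e.g. 25 ms at 16 kHz
print(w[0], w[-1])                         # both edges are tapered to about 0.08
print(np.allclose(w, np.hamming(400)))     # agrees with NumPy's implementation
```

The edges are attenuated to 0.08 rather than zero, which reduces the discontinuity at the frame boundaries without removing the edge samples entirely.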

* **Block Processing**
  * Allow overlap between windows
  * Typical values: frame shift $10\text{ms}$, window length $25\text{ms}$
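
Block processing with overlapping windows might look like the sketch below. It assumes the 10 ms figure is the shift between successive windows and the 25 ms figure is the window length; the helper `frame_signal`, the 16 kHz rate, and the noise input are illustrative stand-ins.

```python
import numpy as np

def frame_signal(x, fs, win_ms=25.0, shift_ms=10.0):
    """Cut x into overlapping windows and apply a Hamming window to each one."""
    win_len = int(round(fs * win_ms / 1000))     # samples per window
    shift = int(round(fs * shift_ms / 1000))     # samples between window starts
    if len(x) < win_len:
        raise ValueError("signal is shorter than one window")
    n_frames = 1 + (len(x) - win_len) // shift   # only complete windows are kept
    # Row i holds the sample indices i*shift ... i*shift + win_len - 1
    idx = np.arange(win_len)[None, :] + shift * np.arange(n_frames)[:, None]
    return x[idx] * np.hamming(win_len)

fs = 16000
x = np.random.default_rng(0).standard_normal(fs)   # 1 s of noise as a stand-in signal
frames = frame_signal(x, fs)
print(frames.shape)   # (98, 400)
```

With these values each window shares 15 ms with its successor, so every sample away from the signal edges contributes to two or three frames.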

## 2.4. Fourier Transform
* **Fourier Analysis**

>* **Periodic Signal of $f_0$**: constructed by sinusoids of $f_0, 2f_0, 3f_0, ...$

>  * $f_0$: fundamental frequency
>  * $nf_0 (n>1)$: harmonics

>* **Aperiodic & Stochastic Signal**: spectrum which is a continuous function of frequency
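
To illustrate the periodic case, the sketch below builds a signal from the first five harmonics of a hypothetical $f_0=100\text{Hz}$ and confirms that its spectrum has energy only at $f_0, 2f_0, ..., 5f_0$. The sampling rate, signal length, and $1/k$ harmonic amplitudes are arbitrary choices made so that every harmonic falls exactly on a DFT bin.

```python
import numpy as np

fs = 8000      # sampling rate in Hz
f0 = 100       # fundamental frequency in Hz
N = 800        # 0.1 s, i.e. exactly 10 periods of f0
t = np.arange(N) / fs

# Sum of harmonics k*f0 with amplitude 1/k
x = sum(np.sin(2 * np.pi * k * f0 * t) / k for k in range(1, 6))

spectrum = np.abs(np.fft.rfft(x))
freqs = np.arange(N // 2 + 1) * fs / N           # bin centre frequencies in Hz
peaks = freqs[spectrum > 0.1 * spectrum.max()]   # bins holding significant energy
print(peaks)   # [100. 200. 300. 400. 500.]
```

An aperiodic or noise-like input would instead spread its energy over all bins, which is the continuous-spectrum case described above.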

* **Fourier Transform**

><img src="images/image04.png" width="500">

* **FFT - Fast Fourier Transform**

>$$\mathcal{O}(N^2) \rightarrow \mathcal{O}(N\log N)$$
>
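>* Exploits the symmetry and periodicity of the complex exponentials $e^{-j\left(\frac{2\pi np}{N}\right)}$
>* The basic (radix-2) algorithm requires the window to be a power of 2 samples in size
>* This can be achieved by choosing the analysis size appropriately and/or zero-padding the frame (after windowing) to a power of 2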
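
Zero-padding a windowed frame to a power of 2 can be sketched as below. The helper `next_pow2` and the 400-sample noise frame are assumptions made for the example. NumPy's FFT accepts arbitrary lengths, so the padding here only mirrors the radix-2 requirement described above rather than being needed by the library.

```python
import numpy as np

def next_pow2(n):
    """Smallest power of 2 that is greater than or equal to n (for n >= 1)."""
    return 1 << (n - 1).bit_length()

frame = np.random.default_rng(0).standard_normal(400)   # e.g. 25 ms at 16 kHz
windowed = frame * np.hamming(len(frame))                # window first ...
n_fft = next_pow2(len(windowed))                         # ... then pad 400 -> 512

# Passing n > len(input) makes rfft append zeros before transforming
spectrum = np.fft.rfft(windowed, n=n_fft)
print(n_fft, spectrum.shape)   # 512 (257,)
```

Padding after windowing keeps the taper aligned with the real samples; it interpolates the spectrum on a finer frequency grid but does not add resolution.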

## 2.5. DFT - Discrete Fourier Transform
* **Cosine Correlation** (here $\Omega=2\pi p/N$ is the normalised angular frequency of bin $p$)

>$$c(\Omega)=\sum^{N-1}_{n=0}{s(nT)\cos{\left(\frac{2\pi np}{N}\right)}}\;\;\;,\;\;\;p=0,1,...,N-1$$

* **Sine Correlation**

>$$s(\Omega)=\sum^{N-1}_{n=0}{s(nT)\sin{\left(\frac{2\pi np}{N}\right)}}\;\;\;,\;\;\;p=0,1,...,N-1$$

* **Amplitude & Phase**

>$$a_p=\sqrt{c^2(\Omega)+s^2(\Omega)} \;\;\; , \;\;\; \phi_p=\tan^{-1}\left[\frac{s(\Omega)}{c(\Omega)}\right]$$
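
The correlation view of the DFT can be checked numerically. The sketch below evaluates both sums for every $p$ and compares the resulting amplitudes with NumPy's FFT. The function name `correlations` and the test signal (a cosine with 4 cycles in 64 samples) are hypothetical, and `np.arctan2` is used in place of a plain $\tan^{-1}$ so that the phase lands in the correct quadrant and a zero cosine term does not cause a division by zero.

```python
import numpy as np

def correlations(s):
    """Cosine and sine correlations of s for every bin p = 0 ... N-1."""
    N = len(s)
    n = np.arange(N)
    angle = 2 * np.pi * n[:, None] * n[None, :] / N   # angle[p, n] = 2 pi n p / N
    c = (s * np.cos(angle)).sum(axis=1)               # cosine correlation per bin
    sn = (s * np.sin(angle)).sum(axis=1)              # sine correlation per bin
    return c, sn

N = 64
n = np.arange(N)
s = np.cos(2 * np.pi * 4 * n / N)      # exactly 4 cycles in the analysis window

c, sn = correlations(s)
amplitude = np.sqrt(c ** 2 + sn ** 2)
phase = np.arctan2(sn, c)              # quadrant-aware version of tan^-1(s / c)

print(np.argmax(amplitude[: N // 2]))                  # 4
print(np.allclose(amplitude, np.abs(np.fft.fft(s))))   # True
```

The amplitude peaks at $p=4$ with value $N/2$, and again at the mirror bin $p=N-4$, as expected for a real-valued input.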

## 2.6. Complex Formulation of DFT


* **DFT**

>\begin{align}
S_p&=\sum^{N-1}_{n=0}{s(nT)\left[\cos{\left(\frac{2\pi np}{N}\right)}-j\sin{\left(\frac{2\pi np}{N}\right)}\right]}\\
&=\sum^{N-1}_{n=0}{s(nT) e^{-j\left(\frac{2\pi np}{N}\right)}}\;\;\;,\;\;\;p=0,1,...,N-1
\end{align}

* **Inverse DFT**

>$$s(nT)=\frac{1}{N}\sum^{N-1}_{p=0}{S_p e^{j\left(\frac{2\pi np}{N}\right)}}\;\;\;,\;\;\;n=0,1,...,N-1$$
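
A direct $\mathcal{O}(N^2)$ implementation of the two formulas above is sketched below, mainly to show that the inverse recovers the original samples. The function names `dft` and `idft` and the random test frame are assumptions for the example; in practice an FFT routine would be used instead.

```python
import numpy as np

def dft(s):
    """S_p = sum_n s[n] * exp(-j 2 pi n p / N), evaluated as a matrix product."""
    N = len(s)
    n = np.arange(N)
    W = np.exp(-2j * np.pi * np.outer(n, n) / N)   # W[p, n] = exp(-j 2 pi n p / N)
    return W @ s

def idft(S):
    """s[n] = (1 / N) * sum_p S_p * exp(+j 2 pi n p / N)."""
    N = len(S)
    n = np.arange(N)
    W = np.exp(2j * np.pi * np.outer(n, n) / N)    # conjugate of the forward matrix
    return (W @ S) / N

s = np.random.default_rng(0).standard_normal(32)   # arbitrary real test frame
S = dft(s)
print(np.allclose(S, np.fft.fft(s)))    # matches the FFT result
print(np.allclose(idft(S).real, s))     # round trip recovers the samples
```

The $1/N$ factor appears only in the inverse, matching the convention used in the equations above.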

## 2.7. Spectral Properties of Speech

* **Vowel** (iy)

><img src="images/image05.png" width="400">
>
>* **Time domain**: approximately **periodic** with $f_0=130\text{Hz}$
>* **Frequency domain**: the periodic excitation appears as regularly spaced harmonics $(\sim7.5\;\text{cycles/1000Hz})$
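
As a rough check on that spacing, assuming the ripples in the spectrum are the harmonics of $f_0$ (so adjacent peaks lie $f_0$ apart), the expected number of cycles per $1000\text{Hz}$ is:

>$$\frac{1000\text{Hz}}{f_0}=\frac{1000}{130}\approx 7.7$$

This is in line with the approximate value read from the plot.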

* **Fricative** (s)

><img src="images/image06.png" width="400">
>
>* **Time domain**: no periodicity
>* **Frequency domain**: random variations at much higher frequency

## 2.8. Spectral Features of Sounds

* **Vowels**

>* Characterised by the first 3 **formants**
>* $F_1$ and $F_2$ relate simply to jaw height and tongue position:

>||Tongue Front|Tongue Back|
|-|-|-|
|**High Jaw**|$F_1$ Low, $F_2$ High|$F_1$ Low, $F_2$ Low|
|**Low Jaw**|$F_1$ High, $F_2$ High|$F_1$ High, $F_2$ Low|

* **Consonants**

>* **Liquids**
>  * Characterised by formant position & dynamics
>  * Overall energy is lower than for vowels

>* **Nasals**
>  * Strong low $F_1$ around $250\text{Hz}$ and weak higher formants
>  * Often energy around $2.5\text{kHz}$

>* **Fricatives**
>  * Most energy in higher frequencies
>  * Voiced fricatives show weak formant structure

>* **Stops**
>  * Characterised by silence
>  * Optionally followed by a burst of high energy

## 2.9. Spectrograms
* Dimensions: **time & frequency**
* Intensity of image: **spectral energy**

><img src="images/image07.png" width="400">
>
>||Short window|Long window|
|-|-|-|
|Time resolution|Good|Poor|
|Frequency resolution|Poor|Good|
|Visible structure|Vertical lines (pitch periods)|Horizontal lines (harmonics of $f_0$)|
|Spectrogram type|Wide-band|Narrow-band|
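
The window-length trade-off can be seen by computing two spectrograms of the same signal. The sketch below is illustrative only: the helper `spectrogram`, the 5 ms and 40 ms window lengths, the 5 ms shift, and the synthetic harmonic signal with $f_0=130\text{Hz}$ are all assumptions, not values taken from the figure.

```python
import numpy as np

def spectrogram(x, fs, win_ms, shift_ms=5.0):
    """Log power spectrogram: rows are time frames, columns are frequency bins."""
    win_len = int(round(fs * win_ms / 1000))
    shift = int(round(fs * shift_ms / 1000))
    n_frames = 1 + (len(x) - win_len) // shift   # only complete windows are kept
    idx = np.arange(win_len)[None, :] + shift * np.arange(n_frames)[:, None]
    frames = x[idx] * np.hamming(win_len)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return 10 * np.log10(power + 1e-12)          # small offset avoids log(0)

fs = 16000
t = np.arange(fs) / fs                           # 1 s of sample times
# Crude voiced-like signal: 19 equal-amplitude harmonics of f0 = 130 Hz
x = sum(np.sin(2 * np.pi * k * 130 * t) for k in range(1, 20))

wide = spectrogram(x, fs, win_ms=5)      # short window: wide-band spectrogram
narrow = spectrogram(x, fs, win_ms=40)   # long window: narrow-band spectrogram

print(wide.shape, narrow.shape)   # (200, 41) (193, 321)
print(fs / 80, fs / 640)          # bin spacing: 200.0 Hz vs 25.0 Hz
```

With a 200 Hz bin spacing the short window cannot separate harmonics that are 130 Hz apart, whereas the 25 Hz spacing of the long window can, at the cost of averaging over 40 ms of signal per frame.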