# 4. HMM Speech Recognition - Introduction

## 4.1. Hidden Markov Models

* **HMM: Generative Model of Speech**

>$$p(\mathbf{O},\mathbf{X}|\lambda) = a_{x(0), x(1)} \prod^T_{t=1} b_{x(t)}(\mathbf{o}_t) a_{x(t),x(t+1)}$$

>* $N$ states: **entry** state, $N-2$ emitting states, **exit** state
>* Common choice: $b_j(\mathbf{o}) = \mathcal{N} (\mathbf{o}; \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)$ $\rightarrow$ $\boldsymbol{\Sigma}_j$ is not diagonal when the feature dimensions are highly correlated
>* Observed data: $\mathbf{O} = \{\mathbf{o}_1,...,\mathbf{o}_T\}$
>* Hidden state sequence: $\mathbf{X} = \{x(1),...,x(T)\}$ / $x(0)$: entry / $x(T+1)$: exit
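
The joint likelihood above can be evaluated directly for one fixed state sequence. The sketch below assumes scalar observations with 1-D Gaussian emissions (the notes use vector Gaussians) and 0-based state indices (0 = entry, N-1 = exit, 1..N-2 emitting). The names `log_gauss` and `log_joint` are hypothetical helpers, and the sum is taken in the log domain.

```python
import math


def log_gauss(o, mu, var):
    """Log density of a 1-D Gaussian N(o; mu, var)."""
    return -0.5 * (math.log(2 * math.pi * var) + (o - mu) ** 2 / var)


def log_joint(obs, states, A, means, variances):
    """log p(O, X | lambda) for one state sequence X.

    obs:    observations o_1..o_T (scalars)
    states: emitting states x(1)..x(T), indices in 1..N-2
    A:      N x N transition matrix; state 0 = entry, N-1 = exit
    Assumes every transition on the path has non-zero probability
    (math.log(0) would raise ValueError).
    """
    N = len(A)
    path = [0] + list(states) + [N - 1]   # x(0) = entry, x(T+1) = exit
    logp = math.log(A[path[0]][path[1]])  # a_{x(0), x(1)}
    for t, o in enumerate(obs, start=1):
        j = path[t]
        logp += log_gauss(o, means[j], variances[j])  # b_{x(t)}(o_t)
        logp += math.log(A[j][path[t + 1]])           # a_{x(t), x(t+1)}
    return logp
```

For example, with a two-emitting-state left-to-right model, `log_joint([0.1, 2.9, 3.2], [1, 2, 2], A, means, variances)` adds up one entry transition, three emission terms and three further transitions, the last of which leads into the exit state.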

* **HMM - Likelihood**

>$$p(\mathbf{O}|\lambda) = \sum_{\mathbf{X}} p(\mathbf{O}, \mathbf{X} | \lambda)$$

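>* Summing directly over all possible state sequences is impractical: with $N-2$ emitting states there are up to $(N-2)^T$ of them
>* Alternatives: **forward-backward algorithm** (its forward pass gives the exact likelihood in $O(N^2 T)$ operations) and **Viterbi algorithm** (approximates the sum by its largest term, the best state sequence)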
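
To see why the forward recursion replaces the exponential sum, the sketch below computes $p(\mathbf{O}|\lambda)$ both ways and should give the same number. It assumes 1-D Gaussian emissions and 0-based states (0 = entry, N-1 = exit); the function names are hypothetical and the brute-force version is only usable for tiny $T$.

```python
import itertools
import math


def gauss(o, mu, var):
    """1-D Gaussian density N(o; mu, var)."""
    return math.exp(-0.5 * (o - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)


def brute_force_likelihood(obs, A, means, variances):
    """p(O | lambda) by summing p(O, X | lambda) over all (N-2)^T state sequences."""
    N, T = len(A), len(obs)
    total = 0.0
    for states in itertools.product(range(1, N - 1), repeat=T):
        path = (0,) + states + (N - 1,)  # add entry and exit states
        p = A[path[0]][path[1]]
        for t in range(1, T + 1):
            j = path[t]
            p *= gauss(obs[t - 1], means[j], variances[j]) * A[j][path[t + 1]]
        total += p
    return total


def forward_likelihood(obs, A, means, variances):
    """p(O | lambda) by the forward recursion, O(N^2 T) operations."""
    N = len(A)
    emitting = range(1, N - 1)
    # alpha[j] = p(o_1..o_t, x(t) = j | lambda), starting from the entry state
    alpha = {j: A[0][j] * gauss(obs[0], means[j], variances[j]) for j in emitting}
    for o in obs[1:]:
        alpha = {j: sum(alpha[k] * A[k][j] for k in emitting)
                 * gauss(o, means[j], variances[j])
                 for j in emitting}
    # leave through the exit state
    return sum(alpha[k] * A[k][N - 1] for k in emitting)
```

The brute-force loop visits $(N-2)^T$ sequences, while the forward pass reuses the partial sums `alpha`, which is where the linear dependence on $T$ comes from.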

* **HMM - Extensions**

>* GMM-HMM & DNN-HMM
>* Continuous Density HMM & Discrete Density HMM

* **Delta & Delta-Delta parameters**

>$$\boldsymbol{\Delta y}_t = \frac{\sum^D_{\tau=1} \tau (\boldsymbol{y}_{t+\tau} - \boldsymbol{y}_{t-\tau})}{2 \sum^D_{\tau=1} \tau^2}$$

>$$\boldsymbol{\Delta}^2\boldsymbol{y}_t = \frac{\sum^D_{\tau=1} \tau (\boldsymbol{\Delta y}_{t+\tau} - \boldsymbol{\Delta y}_{t-\tau})}{2 \sum^D_{\tau=1} \tau^2}$$

>* $e_t$: **Normalised log-energy**
>* **12 MFCCs + $e_t$ = 13 static coefficients; appending $\boldsymbol{\Delta}$ and $\boldsymbol{\Delta}^2$ gives a 39-dim feature vector**
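
The delta regression formula can be sketched in plain Python as below. The window half-width `D = 2` and the choice of replicating the first/last frame for $t \pm \tau$ outside the utterance are assumptions, not taken from the notes; `deltas` and `add_dynamic_features` are hypothetical names.

```python
def deltas(frames, D=2):
    """Regression deltas for a list of T equal-length feature vectors.

    delta_t = sum_{tau=1..D} tau * (y_{t+tau} - y_{t-tau}) / (2 * sum_{tau=1..D} tau^2)
    Assumption: frames beyond either end are replaced by the first/last frame.
    """
    T = len(frames)
    denom = 2 * sum(tau * tau for tau in range(1, D + 1))
    out = []
    for t in range(T):
        vec = []
        for d in range(len(frames[0])):
            num = 0.0
            for tau in range(1, D + 1):
                nxt = frames[min(t + tau, T - 1)][d]  # y_{t+tau}, clamped
                prv = frames[max(t - tau, 0)][d]      # y_{t-tau}, clamped
                num += tau * (nxt - prv)
            vec.append(num / denom)
        out.append(vec)
    return out


def add_dynamic_features(static, D=2):
    """Append Delta and Delta^2 to each static frame (13 -> 39 dims for 12 MFCCs + e_t)."""
    d1 = deltas(static, D)
    d2 = deltas(d1, D)  # Delta^2 uses the same formula applied to the deltas
    return [s + a + b for s, a, b in zip(static, d1, d2)]
```

On a linear ramp $y_t = t$ the interior deltas equal 1 and the delta-deltas are 0 away from the edges, which is a quick sanity check of the normalising denominator.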

* **Isolated Word Recognition**

>$$\boldsymbol{O} \rightarrow p(\boldsymbol{O}|\mathcal{M}_i) \rightarrow \underset{i}{\text{argmax}} \; p(\boldsymbol{O}|\mathcal{M}_i)$$

>* In practice, $p(\boldsymbol{O}|\mathcal{M}_i)$ is approximated by $p(\boldsymbol{O},\boldsymbol{X}^*|\mathcal{M}_i)$, the likelihood along the **most likely state sequence** $\boldsymbol{X}^*$
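
A minimal sketch of isolated word recognition: score the observations against every word model with the Viterbi approximation and return the argmax. It assumes each model is a tuple `(A, means, variances)` with 1-D Gaussian emissions and 0-based entry/exit states; `viterbi_log_score` and `recognise` are hypothetical names.

```python
import math

NEG_INF = float("-inf")


def log_gauss(o, mu, var):
    """Log density of a 1-D Gaussian N(o; mu, var)."""
    return -0.5 * (math.log(2 * math.pi * var) + (o - mu) ** 2 / var)


def safe_log(p):
    """log(p), with log(0) mapped to -inf."""
    return math.log(p) if p > 0 else NEG_INF


def viterbi_log_score(obs, model):
    """log p(O, X* | M): the Viterbi approximation to log p(O | M)."""
    A, means, variances = model
    N = len(A)
    em = range(1, N - 1)  # emitting states
    phi = {j: safe_log(A[0][j]) + log_gauss(obs[0], means[j], variances[j]) for j in em}
    for o in obs[1:]:
        phi = {j: max(phi[k] + safe_log(A[k][j]) for k in em)
               + log_gauss(o, means[j], variances[j])
               for j in em}
    return max(phi[k] + safe_log(A[k][N - 1]) for k in em)


def recognise(obs, models):
    """Return the word whose model gives the highest Viterbi score."""
    return max(models, key=lambda w: viterbi_log_score(obs, models[w]))
```

Here `models` is a dict mapping each word to its model, so `recognise(obs, {"yes": m_yes, "no": m_no})` returns the key of the winning model.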

## 4.2. Composite HMMs

* **Composite HMMs**

>* Smaller HMMs $\rightarrow$ Composition & Concatenation $\rightarrow$ Larger HMM
>* Sub-word units $\rightarrow$ Word models $\rightarrow$ Sentence models
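
One way to realise concatenation with the entry/exit convention of Section 4.1 is to merge the exit state of the first model with the entry state of the second, so every transition into the first model's exit continues into the second model's first states. The sketch below assumes 0-based transition matrices (0 = entry, N-1 = exit); `concatenate` and `concat_params` are hypothetical helpers.

```python
def concatenate(A1, A2):
    """Join two HMMs by merging the exit of the first with the entry of the second.

    A1 (N1 x N1) and A2 (N2 x N2) use 0 = entry, N-1 = exit.
    The result has N1 + N2 - 2 states: entry, N1-2 + N2-2 emitting, exit.
    """
    N1, N2 = len(A1), len(A2)
    N = N1 + N2 - 2
    A = [[0.0] * N for _ in range(N)]
    off = N1 - 2  # state j of model 2 (j >= 1) becomes state off + j
    # Model 1: keep transitions from entry/emitting states into emitting states
    for i in range(N1 - 1):
        for j in range(1, N1 - 1):
            A[i][j] = A1[i][j]
    # Model 2: keep rows of its emitting states (its entry row is used in the glue)
    for i in range(1, N2 - 1):
        for j in range(1, N2):
            A[off + i][off + j] = A2[i][j]
    # Glue: (i -> exit of model 1) followed by (entry of model 2 -> j)
    for i in range(N1 - 1):
        for j in range(1, N2):
            A[i][off + j] += A1[i][N1 - 1] * A2[0][j]
    return A


def concat_params(p1, p2):
    """Concatenate per-state parameter lists (entries for entry/exit are placeholders)."""
    return p1[:-1] + p2[1:]
```

Applying `concatenate` repeatedly (phone to word, word to sentence) builds the larger models described above; if both inputs have rows summing to 1, so does the result.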

* **Recognition with a Composite Model**

>* **Isolated:** find the optimal state sequence and hence find the model
>* **Continuous:** create **sentence generators** using composite HMMs / use **token passing** 

## 4.3. Viterbi Algorithm

* **Viterbi Algorithm**
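  * Efficient: cost $O(N^2 T)$, i.e. search time linear in $T$
  * Prevent numerical underflow by working with $\log \phi_j (t)$ (products of many probabilities quickly become too small to represent)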

>* $\phi_j (t)$: probability of 'best' partial path of length $t$ through the trellis ending in state $j$
>* $X^{(t-1)}$: set of all partial paths of length $t-1$

>$$\phi_j (t) = \underset{X^{(t-1)}}{\max} \{ p(\boldsymbol{o}_{1:t}, X^{(t-1)}, x(t)=j | \lambda) \} = \underset{k}{\max} \{ \phi_k (t-1) a_{kj} \} \, b_j (\boldsymbol{o}_t)$$

>* **Step 1: Initialisation**

>$$\phi_1(0) = 1.0 \;\;\; \phi_{1<j<N}(0) = 0.0 \;\;\; \phi_1(1 \leq t \leq T) = 0.0$$

>* **Step 2: Recursion**

>`for t = 1,...,T`

>`for j = 2,...,N-1`

>`compute` $\phi_j (t) = \underset{1 \leq k < N}{\max} [\phi_k (t-1) a_{kj}] b_j (\boldsymbol{o}_t)$

>* **Step 3: Termination**

>$$p(\boldsymbol{O}, \boldsymbol{X}^* | \lambda) = \underset{1<k<N}{\max} \phi_k (T) a_{kN}$$
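
Steps 1 to 3 can be sketched in the log domain as below. The notes number states $1..N$; the code is 0-based (0 = entry, N-1 = exit) and assumes 1-D Gaussian emissions. The back-pointer array and final traceback are an addition not listed in the steps above: they are what recovers $\boldsymbol{X}^*$ itself rather than only its likelihood. Function names are hypothetical.

```python
import math

NEG_INF = float("-inf")


def safe_log(p):
    """log(p), with log(0) mapped to -inf instead of raising ValueError."""
    return math.log(p) if p > 0 else NEG_INF


def log_gauss(o, mu, var):
    """Log density of a 1-D Gaussian N(o; mu, var)."""
    return -0.5 * (math.log(2 * math.pi * var) + (o - mu) ** 2 / var)


def viterbi(obs, A, means, variances):
    """Log-domain Viterbi: returns (log p(O, X* | lambda), X*)."""
    N, T = len(A), len(obs)
    # Step 1: initialisation. log phi_j(0): only the entry state is active
    log_phi = [0.0] + [NEG_INF] * (N - 1)
    back = []  # back[t-1][j] = best predecessor of state j at time t
    # Step 2: recursion over t = 1..T and emitting states j = 1..N-2
    for t in range(1, T + 1):
        new_phi = [NEG_INF] * N  # entry (and exit) stay at -inf for t >= 1
        ptr = [0] * N
        for j in range(1, N - 1):
            # max over predecessors k = 0..N-2 (1 <= k < N in the notes)
            k_best = max(range(N - 1), key=lambda k: log_phi[k] + safe_log(A[k][j]))
            new_phi[j] = (log_phi[k_best] + safe_log(A[k_best][j])
                          + log_gauss(obs[t - 1], means[j], variances[j]))
            ptr[j] = k_best
        log_phi = new_phi
        back.append(ptr)
    # Step 3: termination through the exit state
    k_best = max(range(1, N - 1), key=lambda k: log_phi[k] + safe_log(A[k][N - 1]))
    best = log_phi[k_best] + safe_log(A[k_best][N - 1])
    # Traceback: follow back-pointers from x(T) down to x(1)
    path = [k_best]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    return best, path[::-1]
```

Working with sums of logs instead of products keeps every $\log \phi_j(t)$ in a representable range; impossible transitions simply contribute $-\infty$.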

## 4.4. Viterbi Training

* **MLE** (find the HMM model parameters)

>$$\hat{\lambda} = \underset{\lambda}{\text{argmax}} \Big\{ \prod^R_{r=1} p(\boldsymbol{O}^{(r)}|\lambda) \Big\}$$

* **Viterbi** algorithm (based on **best state sequence**, $\boldsymbol{X}^*$)

>$$p(\boldsymbol{O}|\lambda) \approx p(\boldsymbol{O},\boldsymbol{X}^*|\lambda)$$

* **Transition Parameters**

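>$$a_{ij} = \frac{\text{No. of transitions } i \rightarrow j}{\text{No. of transitions from } i}$$

>* Both counts are pooled over the best state sequences $\boldsymbol{X}^{(r)*}$ of all $R$ training utterances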
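
The counting rule can be sketched as follows, assuming each alignment lists only the emitting states $x^*(1..T)$ with 0-based indices, so the entry (0) and exit (N-1) transitions are added explicitly. `estimate_transitions` is a hypothetical name.

```python
from collections import Counter


def estimate_transitions(alignments, N):
    """a_ij = count(i -> j) / count(from i), pooled over all utterances.

    alignments: one best state sequence x*(1..T) per training utterance,
                using emitting states 1..N-2; entry 0 and exit N-1 are added here.
    """
    counts = Counter()
    for states in alignments:
        path = [0] + list(states) + [N - 1]
        counts.update(zip(path[:-1], path[1:]))  # count each i -> j step
    A = [[0.0] * N for _ in range(N)]
    for i in range(N - 1):  # the exit state has no outgoing transitions
        total = sum(counts[(i, j)] for j in range(N))
        if total:
            for j in range(N):
                A[i][j] = counts[(i, j)] / total
    return A
```

Transitions that never occur in any alignment get probability 0, so the model topology learned from the alignments can only shrink, never grow.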

* **Gaussian Parameter Estimation**

>$$\hat{\boldsymbol{\mu}}_j = \frac{ \sum^R_{r=1} \sum^{T^{(r)}}_{t=1} \delta (x^{(r)*} (t) = j) \, \boldsymbol{o}_t^{(r)}}{\sum^R_{r=1} \sum^{T^{(r)}}_{t=1} \delta (x^{(r)*} (t) = j)}$$

>$$\hat{\boldsymbol{\Sigma}}_j = \frac{ \sum^R_{r=1} \sum^{T^{(r)}}_{t=1} \delta (x^{(r)*} (t) = j) \, (\boldsymbol{o}_t^{(r)}-\hat{\boldsymbol{\mu}}_j) (\boldsymbol{o}_t^{(r)}-\hat{\boldsymbol{\mu}}_j)^{\top} }{\sum^R_{r=1} \sum^{T^{(r)}}_{t=1} \delta (x^{(r)*} (t) = j)}$$
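
With hard Viterbi alignments the indicator $\delta(x^{(r)*}(t) = j)$ just selects the frames assigned to state $j$, so each estimate is a per-state sample mean and covariance. The sketch below assumes feature vectors as Python lists, 0-based emitting states 1..N-2, and leaves a state's parameters unestimated if no frame is aligned to it; `estimate_gaussians` is a hypothetical name.

```python
def estimate_gaussians(observations, alignments, N):
    """Per-state ML mean and full covariance from hard (Viterbi) alignments.

    observations: list of utterances, each a list of T feature vectors
    alignments:   matching best state sequences x*(1..T), states 1..N-2
    Returns dicts means[j], covs[j] for every emitting state j with frames.
    """
    frames = {j: [] for j in range(1, N - 1)}
    for obs, states in zip(observations, alignments):
        for o, j in zip(obs, states):
            frames[j].append(o)  # delta(x*(t) = j) selects frame o_t
    means, covs = {}, {}
    for j, fr in frames.items():
        if not fr:
            continue  # state never visited: no estimate
        n, D = len(fr), len(fr[0])
        mu = [sum(o[d] for o in fr) / n for d in range(D)]
        # sum of outer products (o - mu)(o - mu)^T, divided by the frame count
        cov = [[sum((o[a] - mu[a]) * (o[b] - mu[b]) for o in fr) / n
                for b in range(D)] for a in range(D)]
        means[j], covs[j] = mu, cov
    return means, covs
```

Dividing by the frame count $n$ (not $n-1$) matches the maximum-likelihood formulas above.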

* **Iterative Training**

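>* **Step 1:** Viterbi algorithm with the current parameters (find the best state sequence $\boldsymbol{X}^{(r)*}$ for each training utterance)
>* **Step 2:** Update HMM parameters from these alignments, then repeat Steps 1 and 2 until the total $\log p(\boldsymbol{O}^{(r)},\boldsymbol{X}^{(r)*}|\lambda)$ stops increasing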
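
The two steps can be combined into a small training loop. This is a sketch under several assumptions not in the notes: scalar observations with 1-D Gaussians, 0-based states (0 = entry, N-1 = exit), a strict left-to-right topology so every emitting state is visited, a variance floor `var_floor`, and stopping when the total score improves by less than `tol`. All function names are hypothetical.

```python
import math

NEG_INF = float("-inf")


def safe_log(p):
    return math.log(p) if p > 0 else NEG_INF


def log_gauss(o, mu, var):
    return -0.5 * (math.log(2 * math.pi * var) + (o - mu) ** 2 / var)


def align(obs, A, mu, var):
    """Step 1: log-domain Viterbi, returns (log p(O, X* | lambda), X*)."""
    N, T = len(A), len(obs)
    phi, back = [0.0] + [NEG_INF] * (N - 1), []
    for o in obs:
        new, ptr = [NEG_INF] * N, [0] * N
        for j in range(1, N - 1):
            k = max(range(N - 1), key=lambda k: phi[k] + safe_log(A[k][j]))
            new[j] = phi[k] + safe_log(A[k][j]) + log_gauss(o, mu[j], var[j])
            ptr[j] = k
        phi = new
        back.append(ptr)
    k = max(range(1, N - 1), key=lambda k: phi[k] + safe_log(A[k][N - 1]))
    score, path = phi[k] + safe_log(A[k][N - 1]), [k]
    for t in range(T - 1, 0, -1):  # traceback through back-pointers
        path.append(back[t][path[-1]])
    return score, path[::-1]


def update(data, paths, N, mu, var, var_floor=1e-3):
    """Step 2: re-estimate transitions, means and variances from the alignments."""
    counts = [[0] * N for _ in range(N)]
    frames = {j: [] for j in range(1, N - 1)}
    for obs, states in zip(data, paths):
        full = [0] + states + [N - 1]
        for i, j in zip(full[:-1], full[1:]):
            counts[i][j] += 1
        for o, j in zip(obs, states):
            frames[j].append(o)
    A = [[c / sum(row) if sum(row) else 0.0 for c in row] for row in counts]
    mu, var = list(mu), list(var)
    for j, fr in frames.items():
        if fr:
            mu[j] = sum(fr) / len(fr)
            var[j] = max(sum((o - mu[j]) ** 2 for o in fr) / len(fr), var_floor)
    return A, mu, var


def viterbi_train(data, A, mu, var, max_iter=20, tol=1e-6):
    """Alternate Step 1 and Step 2 until the total score stops improving."""
    history = []  # total log p(O, X* | lambda) over all utterances, per iteration
    for _ in range(max_iter):
        results = [align(obs, A, mu, var) for obs in data]
        history.append(sum(score for score, _ in results))
        if len(history) > 1 and history[-1] - history[-2] < tol:
            break
        A, mu, var = update(data, [path for _, path in results], len(A), mu, var)
    return A, mu, var, history
```

Because Step 1 maximises over state sequences for fixed parameters and Step 2 maximises over parameters for fixed sequences, `history` should not decrease, apart from any effect of the variance floor.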