### Initialization:
- **HInit**: creates a single-Gaussian HMM from a prototype HMM definition and a set of training data files.
- **HRest**: once an HMM has been initialized with HInit, HRest re-estimates its parameters using the Baum-Welch algorithm. This increases the training-set likelihood and sometimes improves recognition results.

Both HInit and HRest use fixed phone boundaries, as specified by the label file, but whereas HInit uses *Viterbi alignment* during re-estimation, HRest uses *Baum-Welch* training for each phone model.
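As a rough illustration of how the two tools chain together for one phone model, the sketch below prints (but does not run) a pair of command lines. The file names `train.scp`, `train.mlf` and `proto`, the phone label `aa`, and the function name are hypothetical placeholders; the practical's own scripts may well pass different options.

```bash
#!/usr/bin/env bash
# Dry run: print (do not execute) the HInit and HRest calls for one phone.
# train.scp, train.mlf, proto and the phone label are hypothetical placeholders.

isolated_unit_commands() {
    local phone="$1"
    # HInit: initialise from the prototype using only segments labelled $phone.
    echo "HInit -S train.scp -I train.mlf -l $phone -o $phone -M hmm0 proto"
    # HRest: Baum-Welch re-estimation of the HInit output on the same segments.
    echo "HRest -S train.scp -I train.mlf -l $phone -M hmm1 hmm0/$phone"
}

isolated_unit_commands aa
```

Both commands select training segments by label (`-l`), which is what ties them to the fixed phone boundaries in the label file.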

### Avoid using the hand-produced label boundaries:
- The next stage of HMM training avoids using the hand-produced label boundaries through a technique known as **embedded training**. This is done with the command **HERest**, which simultaneously re-estimates the parameters of a full set of HMM definitions. Note that whereas HRest *repeats the re-estimation procedure until convergence is reached*, HERest performs *just one iteration of re-estimation* per invocation.
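Because HERest performs a single pass, several iterations require calling it in a loop, each time reading the previous model set and writing a new one. The dry-run sketch below prints four such calls using the hmm10 to hmm14 directory naming described under Commands. The names `config`, `train.scp`, `train.mlf` and `hmmlist`, and the function name, are hypothetical placeholders, and the real scripts may use additional options.

```bash
#!/usr/bin/env bash
# Dry run: print one HERest call per iteration (nothing is executed).
# config, train.scp, train.mlf and hmmlist are hypothetical placeholders.

herest_iterations() {
    local mix="$1" iters="$2" i
    for ((i = 1; i <= iters; i++)); do
        # Read the model set from the previous directory, write to the next one.
        echo "HERest -C config -S train.scp -I train.mlf -H hmm${mix}$((i - 1))/MMF -M hmm${mix}${i} hmmlist"
    done
}

herest_iterations 1 4
```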

### Folders:
- **convert**: feature-type specific environment files and libraries (some of which are shared between feature types).
- **data**: the coded TIMIT files.
- **doc**: the HTK book and practical handout sheets.
- **tools**: the HTK source (*htksrc*), binaries (*htkbin*), and various scripts (the *steps* and *scripts* sub-directories).

## **Feature types** (in convert directory):

### FBANK (fbk25d)
In signal processing, a **filter bank (or filterbank)** is an array of bandpass filters that separates the input signal into multiple components, each one carrying a single frequency sub-band of the original signal.

### MFCCs (mfc13d)

In sound processing, the **mel-frequency cepstrum (MFC)** is a representation of the short-term power spectrum of a sound, based on a linear cosine transform of a log power spectrum on a nonlinear mel scale of frequency. In Fourier analysis, the *cepstrum* is the result of computing the inverse Fourier transform (IFT) of the logarithm of the estimated signal spectrum. The method is a tool for investigating periodic structures in frequency spectra.

**Mel-frequency cepstral coefficients (MFCCs)** are coefficients that collectively describe an MFC. They are derived from a type of cepstral representation of the audio clip (a nonlinear "spectrum-of-a-spectrum"). The difference between the cepstrum and the mel-frequency cepstrum is that in the MFC, the frequency bands are equally spaced on the mel scale, which approximates the human auditory system's response more closely than the linearly-spaced frequency bands used in the normal spectrum. This frequency warping can allow for better representation of sound, for example, in audio compression.
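To make the frequency warping concrete, the sketch below converts a frequency in Hz to mel using one widely used formula, mel(f) = 2595 · log10(1 + f / 700). This is an assumption for illustration: the notes above do not state which mel formula the front-end uses, and the function name `hz_to_mel` is hypothetical.

```bash
#!/usr/bin/env bash
# Convert Hz to mel with mel(f) = 2595 * log10(1 + f / 700).
# awk's log() is the natural logarithm, so divide by log(10).

hz_to_mel() {
    awk -v f="$1" 'BEGIN { printf "%.1f\n", 2595 * log(1 + f / 700) / log(10) }'
}

# Equal steps in Hz give progressively smaller steps in mel.
for f in 1000 2000 3000 4000; do
    echo "$f Hz -> $(hz_to_mel "$f") mel"
done
```

With this formula 1000 Hz maps to roughly 1000 mel, and the spacing in mel shrinks as frequency rises, which is why filters spaced equally in mel become wider in Hz at higher frequencies.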


# **Front-end parameterisations**

### Filter Bank
- FBK_Z_Init
```bash
    ../tools/steps/step-mono -NUMMIXES 8 ../convert/fbk25d/env/environment_Z FBK_Z_Init/mono
    ../tools/steps/step-decode $PWD/FBK_Z_Init/mono hmm84 FBK_Z_Init/decode-mono-hmm84
```
- FBK_D_Z_Init
```bash
    ../tools/steps/step-mono -NUMMIXES 8 ../convert/fbk25d/env/environment_D_Z FBK_D_Z_Init/mono
    ../tools/steps/step-decode $PWD/FBK_D_Z_Init/mono hmm84 FBK_D_Z_Init/decode-mono-hmm84
```
- FBK_D_A_Z_Init
```bash
    ../tools/steps/step-mono -NUMMIXES 8 ../convert/fbk25d/env/environment_D_A_Z FBK_D_A_Z_Init/mono
    ../tools/steps/step-decode $PWD/FBK_D_A_Z_Init/mono hmm84 FBK_D_A_Z_Init/decode-mono-hmm84
```
- FBK_Z_FlatStart
```bash
    ../tools/steps/step-mono -NUMMIXES 8 ../convert/fbk25d/env/environment_Z FBK_Z_FlatStart/mono
    ../tools/steps/step-decode $PWD/FBK_Z_FlatStart/mono hmm84 FBK_Z_FlatStart/decode-mono-hmm84
```
- FBK_D_Z_FlatStart
```bash
    ../tools/steps/step-mono -NUMMIXES 8 ../convert/fbk25d/env/environment_D_Z FBK_D_Z_FlatStart/mono
    ../tools/steps/step-decode $PWD/FBK_D_Z_FlatStart/mono hmm84 FBK_D_Z_FlatStart/decode-mono-hmm84
```
- FBK_D_A_Z_FlatStart
```bash
    ../tools/steps/step-mono -NUMMIXES 8 ../convert/fbk25d/env/environment_D_A_Z FBK_D_A_Z_FlatStart/mono
    ../tools/steps/step-decode $PWD/FBK_D_A_Z_FlatStart/mono hmm84 FBK_D_A_Z_FlatStart/decode-mono-hmm84
```

### MFCCs
- MFC_E_Z_Init
```bash
    ../tools/steps/step-mono -NUMMIXES 8 ../convert/mfc13d/env/environment_E_Z MFC_E_Z_Init/mono
    ../tools/steps/step-decode $PWD/MFC_E_Z_Init/mono hmm84 MFC_E_Z_Init/decode-mono-hmm84
```
- MFC_E_D_Z_Init
```bash
    ../tools/steps/step-mono -NUMMIXES 8 ../convert/mfc13d/env/environment_E_D_Z MFC_E_D_Z_Init/mono
    ../tools/steps/step-decode $PWD/MFC_E_D_Z_Init/mono hmm84 MFC_E_D_Z_Init/decode-mono-hmm84
```
- MFC_E_D_A_Z_Init
```bash
    ../tools/steps/step-mono -NUMMIXES 8 ../convert/mfc13d/env/environment_E_D_A_Z MFC_E_D_A_Z_Init/mono
    ../tools/steps/step-decode $PWD/MFC_E_D_A_Z_Init/mono hmm84 MFC_E_D_A_Z_Init/decode-mono-hmm84
```
- MFC_E_Z_FlatStart
```bash
    ../tools/steps/step-mono -NUMMIXES 8 ../convert/mfc13d/env/environment_E_Z MFC_E_Z_FlatStart/mono
    ../tools/steps/step-decode $PWD/MFC_E_Z_FlatStart/mono hmm84 MFC_E_Z_FlatStart/decode-mono-hmm84
```
- MFC_E_D_Z_FlatStart
```bash
    ../tools/steps/step-mono -NUMMIXES 8 ../convert/mfc13d/env/environment_E_D_Z MFC_E_D_Z_FlatStart/mono
    ../tools/steps/step-decode $PWD/MFC_E_D_Z_FlatStart/mono hmm84 MFC_E_D_Z_FlatStart/decode-mono-hmm84
```
- MFC_E_D_A_Z_FlatStart
```bash
    ../tools/steps/step-mono -NUMMIXES 8 ../convert/mfc13d/env/environment_E_D_A_Z MFC_E_D_A_Z_FlatStart/mono
    ../tools/steps/step-decode $PWD/MFC_E_D_A_Z_FlatStart/mono hmm84 MFC_E_D_A_Z_FlatStart/decode-mono-hmm84
```
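The runs in this section all follow one pattern, so they can also be generated rather than typed by hand, which helps avoid mismatched directory names. The sketch below only prints the command pairs. It assumes that each experiment name maps to an environment file with the same qualifier suffix (for example `MFC_E_D_Z_*` to `environment_E_D_Z`); the function name `print_runs` is hypothetical, and no options beyond those listed above are added.

```bash
#!/usr/bin/env bash
# Dry run: print the step-mono / step-decode pair for each experiment.
# Assumption: experiment <PREFIX>_<SUFFIX>_<INIT> uses environment_<SUFFIX>.

print_runs() {
    local front="$1" prefix="$2"
    shift 2
    local init suffix exp
    for init in Init FlatStart; do
        for suffix in "$@"; do
            exp="${prefix}_${suffix}_${init}"
            echo "../tools/steps/step-mono -NUMMIXES 8 ../convert/${front}/env/environment_${suffix} ${exp}/mono"
            echo "../tools/steps/step-decode \$PWD/${exp}/mono hmm84 ${exp}/decode-mono-hmm84"
        done
    done
}

print_runs fbk25d FBK Z D_Z D_A_Z
print_runs mfc13d MFC E_Z E_D_Z E_D_A_Z
```

Piping the output to a file gives a script that can be reviewed before anything is run.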

# **Commands**

The **exp** directory will be used to build and test various HMM systems.

```bash
    ../tools/steps/step-mono -NUMMIXES 8 ../convert/fbk25d/env/environment_Z FBK_Z_Init/mono
```

This command performs a sequence of steps in the FBK_Z_Init/mono directory using the given environment file, which defines an FBANK parameterisation with no differentials. It first estimates models using HInit in the hmm0 sub-directory, then re-estimates them with HRest in the hmm1 sub-directory. This initialised model is used as the basis for HERest training, starting in the hmm10 directory.

The naming convention usually used (after initialisation steps) is that the first digits after hmm denote the number of Gaussians and the final digit denotes the number of HERest iterations. After 4 iterations of HERest the script increases the number of Gaussian components to 2 and performs 4 more iterations of HERest to form the MMF in hmm24. This procedure is repeated until the target number of mixture components specified by -NUMMIXES is reached.
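The naming convention can be decoded mechanically. The sketch below splits a directory name such as hmm84 into its Gaussian count and HERest iteration count. The function name `parse_hmm_dir` is hypothetical, and the rule does not apply to the initialisation directories hmm0 and hmm1.

```bash
#!/usr/bin/env bash
# Split "hmm<gaussians><iteration>" into its two parts.
# The final digit is the HERest iteration; the digits before it are the
# number of Gaussians. Not meaningful for the initialisation dirs hmm0/hmm1.

parse_hmm_dir() {
    local name="${1#hmm}"        # drop the "hmm" prefix, e.g. "84"
    local mixes="${name%?}"      # everything except the last digit
    local iters="${name#"$mixes"}" # the last digit
    echo "$mixes $iters"
}

parse_hmm_dir hmm84    # prints "8 4"
parse_hmm_dir hmm24    # prints "2 4"
```

So hmm84, the model used for decoding above, holds 8-component models after 4 HERest iterations.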

```bash
    ../tools/steps/step-decode $PWD/FBK_Z_Init/mono hmm84 FBK_Z_Init/decode-mono-hmm84
```