<h1 align="center">University of Tehran</h1>
<h2 align="center">CA2</h2>
<h3 align="center">Aryan Bastani</h3>
<h3 align="center">810100088</h3>


<br>
<br>
<h1 align="center">Hidden Markov Models</h1>

## Overview

This project involves the implementation and analysis of a Hidden Markov Model (HMM) for audio signal processing. The HMM is a statistical model in which the system being modeled is assumed to be a Markov process with hidden states. The challenge lies in determining the hidden states based on the observable data.
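
To make "determining the hidden states from the observable data" concrete, here is a minimal sketch of Viterbi decoding for a toy discrete HMM. The two-state model and all of its probabilities are invented for illustration and are not taken from this project; `viterbi` is a hypothetical helper, not the decoder used later in the notebook.

```python
import numpy as np

def viterbi(obs, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path for a discrete HMM."""
    n_states = len(start_p)
    T = len(obs)
    # Work in log space to avoid underflow on long sequences.
    log_delta = np.log(start_p) + np.log(emit_p[:, obs[0]])
    backptr = np.zeros((T, n_states), dtype=int)
    for t in range(1, T):
        # scores[i, j]: best log-probability of being in state i at t-1
        # and then moving to state j at t.
        scores = log_delta[:, None] + np.log(trans_p)
        backptr[t] = scores.argmax(axis=0)
        log_delta = scores.max(axis=0) + np.log(emit_p[:, obs[t]])
    # Trace the best path backwards from the most likely final state.
    path = [int(log_delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]

# Hypothetical toy model: 2 hidden states, 2 observation symbols.
start_p = np.array([0.5, 0.5])
trans_p = np.array([[0.9, 0.1],
                    [0.1, 0.9]])
emit_p = np.array([[0.9, 0.1],
                   [0.1, 0.9]])
toy_path = viterbi([0, 0, 1, 1], start_p, trans_p, emit_p)
```

With these numbers each state strongly prefers "its own" symbol, so the decoded path follows the observations. Real audio features are continuous, which is why Gaussian emissions are the usual choice there instead of a discrete emission table.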

### Importing needed libraries

In [5]:
import numpy as np
import glob
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from hmmlearn import hmm
from python_speech_features import mfcc as Mfcc
from scipy.io import wavfile
from scipy.stats import multivariate_normal
import sklearn.metrics as sk_metrics

### Load the data

In [2]:
import os

numbers = [str(digit) for digit in range(10)]
# One glob pattern per digit folder, e.g. .\Data\0\*.wav on Windows.
# os.path.join keeps the patterns portable and avoids backslash escapes in string literals.
numbersDataPath = [os.path.join(".", "Data", number, "*.wav") for number in numbers]
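
The patterns above only describe where the recordings live; they still have to be expanded into file names. A minimal sketch, assuming the `Data/<digit>/*.wav` layout implied by the patterns. `collect_wav_paths` is a hypothetical helper and not part of the original notebook.

```python
import glob
import os

def collect_wav_paths(root, labels):
    """Map each label to the sorted list of .wav files under root/<label>/."""
    paths = {}
    for label in labels:
        pattern = os.path.join(root, label, "*.wav")
        # sorted() makes the file order reproducible across platforms.
        paths[label] = sorted(glob.glob(pattern))
    return paths
```

Calling `collect_wav_paths("Data", numbers)` would give one list of paths per digit; each path can then be passed to `wavfile.read`, which returns the sample rate and the sample array.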

# Preprocessing

### 1. Do you think data segmentation is useful for this project? Why?


Data segmentation, meaning splitting each recording into shorter units before modeling, is a reasonable fit for this dataset and for Hidden Markov Models (HMMs), for the following reasons:

- **Sequential data modeling**: HMMs are designed for sequential data such as speech signals, time series, or natural-language sequences. Segmenting a recording turns it into an ordered sequence of observations, which is the form in which an HMM captures temporal dependencies and transitions between hidden states.
- **State transitions**: Hidden states tend to align with stretches of the signal. In speech recognition, for example, different states might represent phonemes or words, so segments map naturally onto states.
- **Training efficiency**: Shorter segments let training focus on local patterns within each segment, which can speed up convergence.
- **Noise isolation**: Segmentation helps identify noisy or irrelevant portions of the data. Isolating them improves the quality of the training data.

In summary, data segmentation aligns well with how HMMs work and can improve their performance. The segment boundaries still have to be chosen carefully, and the trade-offs depend on the specific problem and dataset.
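
One common form of segmentation is framing: cutting each recording into short overlapping windows. A minimal sketch, assuming a 25 ms window and a 10 ms hop at 16 kHz (typical speech-processing defaults, not values stated in this project); `frame_signal` is a hypothetical helper.

```python
import numpy as np

def frame_signal(signal, frame_len, hop_len):
    """Split a 1-D signal into overlapping frames (one per row), dropping the tail."""
    signal = np.asarray(signal)
    if len(signal) < frame_len:
        return np.empty((0, frame_len), dtype=signal.dtype)
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    # Row i of idx holds the sample indices that belong to frame i.
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    return signal[idx]

sample_rate = 16000                       # assumed sampling rate in Hz
frame_len = sample_rate * 25 // 1000      # 25 ms -> 400 samples
hop_len = sample_rate * 10 // 1000        # 10 ms -> 160 samples
# One second of dummy samples stands in for a real recording.
frames = frame_signal(np.arange(sample_rate), frame_len, hop_len)
```

Each row of `frames` can then be turned into one feature vector, which gives the observation sequence an HMM is trained on.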


### 2. Extracting Features and differences between them

**Feature extraction** and **feature selection** have a direct impact on model performance. The audio features below are commonly used in speech and music tasks; each entry describes what the feature is, how it is extracted, and how it relates to the others:

- #### Mel-Frequency Cepstral Coefficients (MFCC)
    - **Definition**: MFCCs are a compact representation of the short-term power spectrum of a sound signal on the Mel frequency scale.
    - **Purpose**: They capture spectral characteristics, emphasizing perceptually relevant features.
    - **Extraction Process**:
        - Split the signal into short overlapping frames and compute the power spectrum of each frame.
        - Apply the Mel filterbank to obtain energies in Mel-scale frequency bands.
        - Take the logarithm of the filterbank energies.
        - Apply the Discrete Cosine Transform (DCT) and keep the lower-order coefficients as the MFCCs.
    - **Applications**: Speech recognition, speaker identification, and music genre classification.
    - **Relation to Other Features**: Often used as input features for other models due to their effectiveness in capturing spectral information.
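
The MFCC extraction steps listed above can be sketched in plain NumPy. The notebook imports `mfcc` from `python_speech_features`, which packages these steps; the code below only illustrates them and will not reproduce that library's output exactly. The function names and the defaults (`n_fft=512`, 26 filters, 13 coefficients) are assumptions chosen for illustration.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the Mel scale, shape (n_filters, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):        # rising edge of the triangle
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):       # falling edge of the triangle
            fbank[m - 1, k] = (right - k) / (right - center)
    return fbank

def mfcc_sketch(frames, sample_rate, n_fft=512, n_filters=26, n_ceps=13):
    """frames: (n_frames, frame_len) array. Returns an (n_frames, n_ceps) array."""
    # 1. Power spectrum of each Hamming-windowed frame.
    windowed = frames * np.hamming(frames.shape[1])
    power = np.abs(np.fft.rfft(windowed, n=n_fft, axis=1)) ** 2 / n_fft
    # 2. Mel filterbank energies.
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    # 3. Logarithm (the floor avoids log(0) on silent frames).
    log_energies = np.log(np.maximum(energies, 1e-10))
    # 4. Unnormalized DCT-II along the filter axis; keep the first n_ceps coefficients.
    n = np.arange(n_filters)
    dct_basis = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_filters))
    return log_energies @ dct_basis.T
```

The intermediate `log_energies` array is a log Mel-spectrogram, which is why the two features are so closely related.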

- #### Zero Crossing Rate
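    - **Definition**: The rate at which the audio signal changes sign, i.e. the number of zero crossings in a frame divided by the frame length.
    - **Purpose**: Provides insight into temporal variation: high values suggest noisy or unvoiced content, low values suggest voiced or tonal content.
    - **Extraction Process**:
        - Take the sign of each audio sample.
        - Count the sign changes between consecutive samples and normalize by the number of sample pairs.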
    - **Applications**: Voice activity detection, audio segmentation, and noise classification.
    - **Relation to Other Features**: Complementary to features like MFCCs, focusing on temporal aspects.
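
The zero crossing rate is simple enough to compute directly. A minimal sketch, assuming the rate is normalized by the number of adjacent sample pairs and that a sample equal to zero counts as non-negative; `zero_crossing_rate` is a hypothetical helper.

```python
import numpy as np

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    frame = np.asarray(frame, dtype=float)
    if frame.size < 2:
        return 0.0
    negative = np.signbit(frame)   # True where the sample is negative
    return float(np.mean(negative[1:] != negative[:-1]))
```

Applied per frame, this yields one extra scalar per observation that can be appended to the MFCC vector.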

- #### Mel-Spectrogram
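    - **Definition**: A spectrogram whose frequency axis is mapped to the Mel scale, showing how signal power is distributed across Mel bands over time.
    - **Purpose**: Combines the time-frequency view of a spectrogram with the perceptual frequency resolution of the Mel scale.
    - **Extraction Process**:
        - Compute the Short-Time Fourier Transform (STFT).
        - Apply the Mel filterbank to the STFT magnitude (or power) spectrum.
    - **Applications**: Music genre classification, speech recognition, and sound event detection.
    - **Relation to Other Features**: MFCCs are obtained by taking the logarithm and DCT of the Mel-spectrogram, so the two share the same Mel-scale filtering.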

- #### Chroma Features
    - **Definition**: A 12-dimensional vector giving the energy in each of the 12 pitch classes (C, C#, ..., B), regardless of octave.
    - **Purpose**: Captures harmonic content and tonality.
    - **Extraction Process**:
        - Compute the STFT of the audio signal.
        - Map each STFT bin to a pitch class and sum the magnitudes per class.
    - **Applications**: Music genre classification, chord recognition, and melody extraction.
    - **Relation to Other Features**: Complementary to features like MFCCs and spectrogram-based features.
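
The chroma computation can be sketched for a single frame as follows. This assumes equal-tempered tuning with A4 = 440 Hz and ignores bins below 20 Hz; `chroma_from_frame` and its defaults are hypothetical choices for illustration, and dedicated audio libraries use more refined filterbanks.

```python
import numpy as np

def chroma_from_frame(frame, sample_rate, n_fft=4096, ref_hz=440.0):
    """Sum spectrum magnitudes into 12 pitch classes (index 0 = C, index 9 = A)."""
    windowed = np.asarray(frame, dtype=float) * np.hanning(len(frame))
    magnitude = np.abs(np.fft.rfft(windowed, n=n_fft))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    chroma = np.zeros(12)
    valid = freqs > 20.0           # skip DC and sub-audible bins
    # MIDI note number: A4 (ref_hz) = 69; pitch class = note number mod 12.
    midi = np.round(69 + 12 * np.log2(freqs[valid] / ref_hz)).astype(int)
    # np.add.at accumulates correctly when many bins share a pitch class.
    np.add.at(chroma, midi % 12, magnitude[valid])
    return chroma

# Sanity check on a synthetic 440 Hz tone (A4): pitch class A should dominate.
sample_rate = 16000
t = np.arange(4096) / sample_rate
tone_chroma = chroma_from_frame(np.sin(2 * np.pi * 440.0 * t), sample_rate)
```

Chroma is mainly a music feature; for spoken digits it is likely to be less informative than MFCCs, which is worth keeping in mind when choosing inputs for the HMM.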

The right features depend on the task and the characteristics of the data. Combining complementary features, for example MFCCs with the zero crossing rate, can improve model performance.