<h4 style="margin-left: 8%;">  CA02 - <i>An Introduction to Artificial Intelligence</i> - Dr. Fadaei 
<br>
<b> Mohammad Montazeri - 810699269 </b> </h4>

<h1 style="text-align: center;"> Hidden Markov Model in Sound Classification </h1>

## Abstract
A Hidden Markov Model (HMM) is a statistical model of a system that changes over time and whose internal state cannot be observed directly. It is a Markov chain in which the states are hidden and only the outputs (observations) emitted from them are visible. The model consists of
- a set of hidden states
- a set of possible observations
- transition probabilities, which describe the likelihood of moving from one state to another
- emission probabilities, which describe the likelihood of each observation in a given state
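
As a minimal illustration of these components, the sketch below defines a hypothetical two-state HMM with three observation symbols and evaluates the likelihood of an observation sequence with the forward algorithm. All probabilities, and the names `start_prob`, `trans_prob`, `emit_prob`, and `forward_likelihood`, are made up for this example and are not taken from the project.

```python
import numpy as np

# Hypothetical toy HMM: 2 hidden states, 3 possible observation symbols.
start_prob = np.array([0.6, 0.4])            # P(first state)
trans_prob = np.array([[0.7, 0.3],           # P(next state | current state)
                       [0.4, 0.6]])
emit_prob = np.array([[0.5, 0.4, 0.1],       # P(observation | state)
                      [0.1, 0.3, 0.6]])


def forward_likelihood(observations):
    """Return P(observations | model) computed with the forward algorithm."""
    # alpha[s] = P(observations so far, current state = s)
    alpha = start_prob * emit_prob[:, observations[0]]
    for obs in observations[1:]:
        alpha = (alpha @ trans_prob) * emit_prob[:, obs]
    return alpha.sum()


likelihood = forward_likelihood([0, 1, 2])
print(f"P(observations | model) = {likelihood:.4f}")
```

Classification with one model per class, as done in this project, amounts to computing such a likelihood under every model and choosing the model that gives the highest value.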

HMMs are widely used in speech recognition, handwriting recognition, gesture recognition, part-of-speech tagging, musical score following, and bioinformatics.
The main objective of this project is to build one HMM for each of four music genres: **blues**, **metal**, **hiphop**, and **pop**. Given a music track that is not part of the training data, the program then determines, _based on the created models_, which genre the track belongs to.

## Section One: Implementing with Libraries
In the first section, we design and implement an HMM using an existing library (hmmlearn). We then train the model on the given data and finally evaluate the results with the criteria introduced in the evaluation and analysis section.

### Initializing
Here we define the constants for our HMMs.
We also read all 400 audio files using `wavfile.read` from the *scipy.io* package and then compute their MFCC features using `mfcc` from the *python_speech_features* package.

In [None]:
import numpy as np
from hmmlearn import hmm
from python_speech_features import mfcc
from scipy.io import wavfile
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, classification_report, ConfusionMatrixDisplay


# Define the number of iterations for the training process
n_iter = 7

# Define the number of musical instances for each genre
n_instances = 100

# Define the list of genres
genres = ['blues', 'hiphop', 'pop', 'metal']

# Define the list of filenames (file indices are zero-padded to five digits)
filenames = {
    genre: [f'data/{genre}/{genre}.{i:05d}.wav' for i in range(n_instances)]
    for genre in genres
}


# Define the list of MFCCs
mfccs = {genre:[] for genre in genres}
for genre, filename in filenames.items():
    for i in range(n_instances):
        # Load the audio file
        rate, signal = wavfile.read(filename[i])

        # Extract the MFCCs
        mfcc_feat = mfcc(signal, rate, nfft=1024)

        # Append the MFCCs to the list
        mfccs[genre].append(mfcc_feat)


### Training
Here, we train four models (one per genre) on 80% of the given audio files. Note that we use a different number of hidden states for each genre, chosen according to how complex we consider the genre: the more complex genres get more hidden states.

In [None]:
# Define the number of hidden states for each genre
n_states = {'blues': 4, 'hiphop': 5, 'pop': 6, 'metal': 7}

# Define the model for each genre
models = {}
for genre in genres:
    models[genre] = hmm.GaussianHMM(n_components=n_states[genre], n_iter=n_iter)

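# Train the model for each genre on the first 80 tracks.
# `lengths` tells hmmlearn where each track ends in the concatenated array,
# so the tracks are treated as separate sequences rather than one long one.
n_train = n_instances - 20
for genre in genres:
    train = mfccs[genre][:n_train]
    data = np.concatenate(train)
    lengths = [len(track) for track in train]
    models[genre].fit(data, lengths)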
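
hmmlearn's `fit` accepts an optional `lengths` argument that marks where each sequence ends inside a concatenated array. The NumPy-only sketch below shows how the stacked array and the lengths relate. The random arrays are hypothetical stand-ins for MFCC matrices, and the variable names are chosen for this example only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for three MFCC matrices of shape (frames, 13),
# each with a different number of frames.
sequences = [rng.normal(size=(n_frames, 13)) for n_frames in (5, 8, 3)]

# Stack all frames into one 2-D array and remember each sequence's length.
stacked = np.concatenate(sequences)
lengths = [len(seq) for seq in sequences]

print(stacked.shape)
print(lengths)

# The sequence boundaries can be recovered from the lengths alone, which is
# the information a call such as model.fit(stacked, lengths) relies on.
recovered = np.split(stacked, np.cumsum(lengths)[:-1])
```

Without the lengths, the stacked array is indistinguishable from a single long recording, so transitions across track boundaries would be learned as if they were real.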



### Testing
Here, we check the models' predictions on the remaining test files (20% of the given audio files) and compute the accuracy. For each instance, the code prints the predicted genre, and at the end it prints the accuracy of the predictions for that genre. Here is an output of this part:

|                                                 |                                                 |
| ----------------------------------------------- | ----------------------------------------------- |
| ![Alt text](<Screenshot 2023-12-07 225435.png>) | ![Alt text](<Screenshot 2023-12-07 225457.png>) |
| ![Alt text](<Screenshot 2023-12-07 225457.png>) | ![Alt text](<Screenshot 2023-12-07 225605.png>) |


In [None]:
labels = {genre: [] for genre in genres}
n_test = 20
for genre in genres:
    correct = 0
    print(f'----------- {genre} -----------')
    for i in range(n_instances - n_test, n_instances):
        print(i, end='\t')
        test = mfccs[genre][i]
        # score() returns the log-likelihood of the track under each model;
        # the predicted genre is the one whose model scores highest.
        scores = {g: models[g].score(test) for g in genres}
        prediction = max(scores, key=scores.get)

        print(f'predicted genre is {prediction.capitalize()}')
        labels[genre].append(prediction)
        if prediction == genre:
            correct += 1
    print(f'This algorithm predicted the genre of unseen {genre} tracks with an accuracy of {correct/n_test*100:.1f}% \n')
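
Because `score` returns a log-likelihood, picking the highest score is the same as picking the most probable genre under the assumption that all genres are equally likely a priori. The sketch below makes that explicit by converting log-likelihoods into normalized probabilities. The function `genre_posteriors` and the example numbers are hypothetical and are not results from this project.

```python
import numpy as np


def genre_posteriors(log_likelihoods):
    """Turn per-genre log-likelihoods into probabilities, assuming equal priors."""
    names = list(log_likelihoods)
    scores = np.array([log_likelihoods[name] for name in names], dtype=float)
    # Subtract the maximum before exponentiating to avoid numerical underflow.
    shifted = np.exp(scores - scores.max())
    probs = shifted / shifted.sum()
    return dict(zip(names, probs))


# Hypothetical scores for one track (made-up numbers).
example_scores = {'blues': -1502.3, 'hiphop': -1498.7, 'pop': -1510.1, 'metal': -1499.9}
posteriors = genre_posteriors(example_scores)
predicted = max(posteriors, key=posteriors.get)
print(predicted, posteriors)
```

For long tracks the log-likelihood gaps between models can be large, in which case these probabilities end up very close to 0 or 1 and add little beyond the plain maximum.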

        


### Results
Here, for better insight, we visualize the results obtained in the previous section: the confusion matrix and several standard evaluation metrics for machine learning models. We use the `confusion_matrix()` and `classification_report()` functions from the *sklearn.metrics* package, and *matplotlib.pyplot* to draw the plot. Here is an output of this part:


![Alt text](<Screenshot 2023-12-07 225627.png>)
<br>
![Alt text](Figure_1.png)





In [None]:
n_test = 20
# True and predicted labels, both ordered genre by genre
real_lbl = [genre for genre in genres for _ in range(n_test)]
predicted_lbl = [lbl for genre in genres for lbl in labels[genre]]

# scikit-learn orders string labels alphabetically, so the same sorted list
# is passed everywhere to keep rows, columns, and names aligned.
class_names = sorted(genres)

cm = confusion_matrix(real_lbl, predicted_lbl, labels=class_names, normalize='true')
print(cm, end='\n\n')
print(classification_report(real_lbl, predicted_lbl, labels=class_names))

# Plot the confusion matrix
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=class_names)
disp.plot(cmap='BuGn')
plt.show()
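
To make the meaning of the row-normalized confusion matrix (`normalize='true'`) concrete, the NumPy-only sketch below builds one by hand and derives per-class recall and precision from it. The labels are hypothetical and chosen only for illustration; they are not results from this project.

```python
import numpy as np

classes = ['blues', 'hiphop', 'metal', 'pop']
# Hypothetical true and predicted labels (illustration only).
y_true = ['blues', 'blues', 'hiphop', 'hiphop', 'metal', 'metal', 'pop', 'pop']
y_pred = ['blues', 'pop', 'hiphop', 'hiphop', 'metal', 'blues', 'pop', 'pop']

index = {name: i for i, name in enumerate(classes)}
counts = np.zeros((len(classes), len(classes)))
for t, p in zip(y_true, y_pred):
    counts[index[t], index[p]] += 1  # rows: true class, columns: predicted class

# Row normalization: each row sums to 1, so the diagonal holds per-class recall.
row_normalized = counts / counts.sum(axis=1, keepdims=True)
recall = np.diag(row_normalized)

# Precision divides the diagonal by the column totals of the raw counts.
precision = np.diag(counts) / counts.sum(axis=0)

for name, r, p in zip(classes, recall, precision):
    print(f'{name:>7}: recall={r:.2f} precision={p:.2f}')
```

Note that precision has to be computed from the raw counts: after row normalization the column sums no longer reflect how often each genre was predicted.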

### Heatmap
Here, we plot a heatmap of the MFCCs of one instance for each of the four genres. We chose the 10th track of each genre (index 9) and plotted it using *matplotlib.pyplot*'s `imshow()` function. Here is the result:

![Alt text](heatmap-1.png)

In [None]:
fig, ax = plt.subplots(4, 1, sharex=True, figsize=(9,27))
t_min, t_max = 0, 30
y_min, y_max = 0, 13

for i,genre in enumerate(genres):
    data = mfccs[genre][9]
    data = np.swapaxes(data, 0, 1)
    ax[i].imshow(data, interpolation='nearest', origin='lower', extent=[t_min, t_max, y_min, y_max])
    ax[i].set_title(genre.upper())
    ax[i].set_xlabel('Time (s)')
    ax[i].set_ylabel('MFCC Coefficients')

# Lower the top edge of the subplot area to 70% of the figure height
fig.subplots_adjust(top=0.7)

plt.show()
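
The time axis above is fixed at 0 to 30 seconds. As a hedged alternative, the extent could be derived from the number of MFCC frames, assuming the default frame step of 0.01 s used by `python_speech_features.mfcc`. The helper `time_extent` and the zero-filled array are hypothetical and serve only as an illustration.

```python
import numpy as np


def time_extent(n_frames, winstep=0.01):
    """Approximate duration in seconds covered by n_frames MFCC frames."""
    return n_frames * winstep


# Hypothetical MFCC matrix with 2990 frames and 13 coefficients.
fake_mfcc = np.zeros((2990, 13))
t_max = time_extent(fake_mfcc.shape[0])

# Values in the order imshow's `extent` expects: [left, right, bottom, top].
extent = [0, t_max, 0, fake_mfcc.shape[1]]
print(extent)
```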
    
