## Setting up the Environment

Before we start, make sure the modules needed for this section are installed. Run the following commands:
```
pip install pandas
pip install scikit-learn
pip install librosa
```

After that, download this dataset:
- Emotion (Audio): https://www.kaggle.com/datasets/ejlok1/toronto-emotional-speech-set-tess/code


Note: This code is the one that was used in **Module 1 Summative Assessment**. The only thing that changed is the modelling part. We've also removed some unnecessary features (focusing only on MEL) to make things easier to work with.

## Audio Recognition (Emotion) [Data Preparation]

In this part, we will try to recognize emotion through audio. We will be using [this](https://www.kaggle.com/datasets/ejlok1/toronto-emotional-speech-set-tess/code) dataset and [this](https://www.kaggle.com/code/bprathibalakshmi/classification-of-speech-emotion-99-accuracy) as our main code reference for our work and analysis.

In [None]:
# First we import the libraries we will use in this project
import numpy as np                              # Used to perform mathematical operations on the data (https://numpy.org/doc/)
import pandas as pd                             # Used to read the data (https://pandas.pydata.org/)
from sklearn.preprocessing import LabelEncoder  # Used to encode the labels (https://scikit-learn.org/stable/modules/preprocessing.html)

# We will make use of librosa to extract the features of the audio files
import librosa  # Used to extract the features of the audio files (https://librosa.org/doc/latest/index.html)

# utility
from pathlib import Path # Used to access the files in a directory (https://docs.python.org/3/library/pathlib.html)
import os # Used to access the files in a directory (https://docs.python.org/3/library/os.html)

Now that we've imported all of the necessary libraries, we'll start preparing the dataset by loading it into a dataframe.

Note: We use the same technique to load the data in other sections, such as Image Recognition and Emotion Recognition, after this section. It is important to understand each step and how it works, because we won't re-explain everything in those sections.

In [None]:
# Here's a good reference for data extraction in this audio dataset: https://www.kaggle.com/code/virial23/ser-database-to-feature-extractor. We will use a different approach that I think is easier.

# We first get the path of our dataset
_filePath = Path('audioData') / 'TESS Toronto emotional speech set data' # joined with / so the path works on any operating system

# Then we create a list of file paths and labels
## This can be read as follows:
## Create a list of all the files in the directory and subdirectories that end with .wav
_allFiles = list(_filePath.glob(r"**/*.wav")) # Reference: https://docs.python.org/3/library/glob.html

# Get the label of each audio file (its classification)
## This can be read as follows:
## Create a list of labels by taking the parent folder name of every file in the list of all files
_labels = list(map(lambda x: os.path.split(os.path.split(x)[0])[1],_allFiles)) # Reference: https://docs.python.org/3/library/os.path.html

# Create a series of the file paths
_allFiles = pd.Series(_allFiles).astype(str) # convert the datatype of the series to string (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.astype.html)

# Create a series of the labels
_labels = pd.Series(_labels) # References: https://pandas.pydata.org/docs/reference/api/pandas.Series.html

# Concatenate the two series to create a dataframe
df = pd.concat([_allFiles,_labels],axis=1) # References: https://pandas.pydata.org/docs/reference/api/pandas.concat.html

# Rename the columns of the dataframe
df.columns = ['wav', 'label']

# Remove the OAF_ and YAF_ prefixes from the labels
# References: https://pandas.pydata.org/docs/reference/api/pandas.Series.str.replace.html
df['label'] = df['label'].str.replace(r'^(OAF|YAF)_', '', regex=True)

# Save the dataframe as a csv file
df.to_csv('audioDataset.csv', index=False) # References: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_csv.html

# Print for confirmation
df.head()



Unnamed: 0,wav,label
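To make the path-to-label step more concrete, here is a minimal sketch that runs the same idea (glob for `.wav` files, then use the parent folder name as the label) on a throwaway directory. The folder names, file names, and the helper `build_file_table` are made up for illustration and are not part of the original notebook. It uses `Path.parent.name`, which gives the same folder name as the nested `os.path.split` calls.

```python
import tempfile
from pathlib import Path

import pandas as pd


def build_file_table(root):
    """Hypothetical helper: one row per .wav file, labelled by its parent folder."""
    files = sorted(Path(root).glob("**/*.wav"))
    return pd.DataFrame({
        "wav": [str(f) for f in files],          # file path as a string
        "label": [f.parent.name for f in files]  # parent folder name
    })


with tempfile.TemporaryDirectory() as tmp:
    # Create a tiny fake dataset: two class folders with one empty .wav file each
    for folder, name in [("OAF_angry", "a.wav"), ("YAF_sad", "b.wav")]:
        (Path(tmp) / folder).mkdir()
        (Path(tmp) / folder / name).touch()
    (Path(tmp) / "notes.txt").touch()  # ignored, because it is not a .wav file

    demo = build_file_table(tmp)

# Strip the speaker prefix so only the emotion remains
demo["label"] = demo["label"].str.replace(r"^(OAF|YAF)_", "", regex=True)
print(demo["label"].tolist())
```

Building the dataframe from a dictionary of lists is just one option; concatenating two series, as done in the cell above, gives the same two columns.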


Now that we have a dataset that contains the path of each audio file as well as its label, it's time to do some initial processing before we build our training and test sets.

In [None]:
# Let's first create a new column that contains the numerical equivalent of the labels

# Create a LabelEncoder object
## Reference: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html
## Reference: https://scikit-learn.org/stable/modules/preprocessing.html#preprocessing-categorical-features
## A way to encode the label into numerical form in an easy way
_label_encoder = LabelEncoder()

# Convert the labels to numerical values
## References: https://www.analyticsvidhya.com/blog/2021/04/difference-between-fit-transform-fit_transform-methods-in-scikit-learn-with-python-code/
## References: https://ponder.io/scikit-learns-transformers-now-output-pandas-dataframes/
## It involves two steps: fit and transform
## In this case, we are first fitting the label encoder to the labels and then transforming the labels into numerical values
df['encoded'] = _label_encoder.fit_transform(df['label'])

df.head()

Unnamed: 0,wav,label,encoded
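As a small illustration of what `fit_transform` does, the sketch below encodes a handful of made-up labels (not taken from the dataset). `LabelEncoder` stores the sorted unique labels in `classes_`, and each label's code is its position in that array.

```python
from sklearn.preprocessing import LabelEncoder

labels = ["sad", "angry", "happy", "angry"]  # made-up labels for illustration

encoder = LabelEncoder()
encoded = encoder.fit_transform(labels)  # fit learns the classes, transform maps them to integers

# classes_ holds the sorted unique labels
print(list(encoder.classes_))
print(list(encoded))

# inverse_transform maps the integer codes back to the original strings
decoded = encoder.inverse_transform(encoded)
print(list(decoded))
```

Keeping the fitted encoder around is useful if you later need to turn numeric predictions back into readable emotion names.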


Now that we have the numerical equivalent of our labels, we'll use an existing algorithm from librosa to extract features from the audio waveforms.

In [None]:
def extract_mel(data):
    # Mel Spectrogram
    # Compute a mel-scaled spectrogram.
    # References: https://librosa.org/doc/latest/generated/librosa.feature.melspectrogram.html
    # References: https://en.wikipedia.org/wiki/Mel-frequency_cepstrum
    return np.mean(librosa.feature.melspectrogram(y=data[0], sr=data[1]).T, axis=0)
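The averaging step in `extract_mel` may be easier to see on a small array. The sketch below uses a made-up 4 x 3 array as a stand-in for a mel spectrogram (librosa returns an array laid out as mel bands by time frames, with 128 bands by default) and applies the same `np.mean(... .T, axis=0)` reduction. The name `fake_mel` and its values are assumptions for illustration only.

```python
import numpy as np

# Stand-in for a mel spectrogram: 4 mel bands (rows) x 3 time frames (columns)
fake_mel = np.array([
    [1.0, 2.0, 3.0],
    [4.0, 4.0, 4.0],
    [0.0, 0.0, 6.0],
    [2.0, 4.0, 6.0],
])

# Same reduction as in extract_mel: transpose to (frames, bands),
# then average over the frames so one value per mel band remains
features = np.mean(fake_mel.T, axis=0)

print(features.shape)
print(features)
```

Because the time axis is averaged away, every clip ends up as a vector of the same length regardless of its duration, which is what lets the clips be stacked into one feature matrix for the model.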


Now that the function is in place, it's time to apply it to our dataframe to get the features.

In [None]:
# Extract features from the audio files
df['mel'] = df['wav'].apply(lambda x: extract_mel(librosa.load(x)))

Let's take a look at our dataset for confirmation

In [None]:
# Print for confirmation
df.head()

Unnamed: 0,wav,label,encoded,mel


We don't really need the 'wav' and 'encoded' columns, so we'll just remove them.

In [None]:
df.drop(columns=['wav', 'encoded'], inplace=True)

Now let's check our dataframe again.

In [None]:
df.head()

Unnamed: 0,label,mel


Since everything looks good, it's now time to do modelling.

## Data Modelling

Let's first import and create an instance of `DecisionTreeClassifier` with `gini` as its main criterion.

In [None]:
from sklearn.tree import DecisionTreeClassifier

# Create a Decision Tree Classifier object
## References: https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html
DTC = DecisionTreeClassifier(criterion='gini')


Now, let's split the dataset into training and test sets.

In [None]:
from sklearn.model_selection import train_test_split

# Split the data into training and testing sets
## References: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
X_train, X_test, y_train, y_test = train_test_split(df['mel'], df['label'], test_size=0.33, random_state=42)
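The split above is purely random. As an optional variation that the original notebook does not use, `train_test_split` also accepts a `stratify` argument that keeps the class proportions the same in both sets. The sketch below shows this on made-up data; the variable names are hypothetical.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Made-up data: 12 samples, 3 classes with 4 samples each
X = [[i] for i in range(12)]
y = ["angry"] * 4 + ["happy"] * 4 + ["sad"] * 4

# stratify=y asks for the same class proportions in the train and test sets
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

print(Counter(y_tr))
print(Counter(y_te))
```

Stratifying matters most when some classes are much rarer than others, since a plain random split can leave a rare class under-represented in the test set.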

Now that everything is prepared, we just need to fit the model to our data and score it.

In [None]:
# Fit the model to the training data
DTC.fit(X_train.tolist(), y_train.tolist())

# Get the accuracy of the model
DTC.score(X_test.tolist(), y_test.tolist())

0.9632034632034632
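For a classifier, `score` returns the mean accuracy, so it should match `accuracy_score` computed on the model's predictions. The following sketch checks this on a tiny made-up dataset; the data and variable names are assumptions for illustration.

```python
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

# Made-up one-feature dataset with two classes
X_toy = [[0.0], [1.0], [2.0], [3.0]]
y_toy = ["sad", "sad", "happy", "happy"]

tree = DecisionTreeClassifier(criterion="gini", random_state=0)
tree.fit(X_toy, y_toy)

# Unseen points on either side of the class boundary
X_new = [[0.5], [2.5]]
y_new = ["sad", "happy"]

via_score = tree.score(X_new, y_new)                      # mean accuracy
via_metric = accuracy_score(y_new, tree.predict(X_new))   # same quantity, computed explicitly

print(via_score, via_metric)
```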

We have a pretty good score for our model. Let's gain more information by cross-validating it using the `cross_validate` function in the `sklearn` library.

In [None]:
from sklearn.model_selection import cross_validate

# Perform cross validation
## References: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_validate.html
results = cross_validate(DTC, df['mel'].tolist(), df['label'].tolist(), cv=5)

Now let's see the result.

In [None]:
results

{'fit_time': array([0.85120058, 0.79635239, 0.77088642, 0.75283885, 0.72628856]),
 'score_time': array([0.00208426, 0.0030005 , 0.00204611, 0.00307894, 0.00199914]),
 'test_score': array([1., 1., 1., 1., 1.])}
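One way to get a bit more out of `cross_validate` is to shuffle the folds and also request the training scores, so the two can be compared. The sketch below does this on scikit-learn's bundled iris data as a stand-in, since the audio features are not available outside this notebook. The choice of dataset, the shuffled `StratifiedKFold`, and the `random_state` values are assumptions, not what the cell above ran.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.tree import DecisionTreeClassifier

# Stand-in dataset bundled with scikit-learn (no download needed)
X_demo, y_demo = load_iris(return_X_y=True)

# Shuffled, stratified 5-fold split
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

cv_results = cross_validate(
    DecisionTreeClassifier(criterion="gini", random_state=42),
    X_demo,
    y_demo,
    cv=cv,
    return_train_score=True,  # also report accuracy on the training folds
)

print("train:", cv_results["train_score"].mean())
print("test: ", cv_results["test_score"].mean(), "+/-", cv_results["test_score"].std())
```

A large gap between the training and test averages is one common sign of overfitting, which is why returning both can be more informative than the test scores alone.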

Looking at the values above, every fold has a perfect test score, which suggests good accuracy. However, a perfect score on every fold might also indicate overfitting, so treat it with caution.

Now, let's just show the precision, recall, and f1-score for each class

In [None]:
from sklearn.metrics import classification_report

# Print the classification report
## References: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html
print(classification_report(y_test.tolist(), DTC.predict(X_test.tolist())))

                    precision    recall  f1-score   support

              Fear       0.99      1.00      0.99       138
 Pleasant_surprise       0.88      0.85      0.87       144
               Sad       0.97      0.95      0.96       126
             angry       0.99      0.99      0.99       294
           disgust       0.94      0.90      0.92       266
              fear       0.97      1.00      0.98       117
             happy       0.94      0.96      0.95       249
           neutral       0.99      1.00      1.00       269
pleasant_surprised       0.95      1.00      0.98       120
               sad       0.98      0.98      0.98       125

          accuracy                           0.96      1848
         macro avg       0.96      0.96      0.96      1848
      weighted avg       0.96      0.96      0.96      1848
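If you want to work with these numbers in code rather than read them from printed text, `classification_report` can return a dictionary via `output_dict=True`. The sketch below uses made-up true and predicted labels to show how the per-class values are looked up.

```python
from sklearn.metrics import classification_report

# Made-up labels for illustration
y_true = ["happy", "happy", "sad", "sad"]
y_pred = ["happy", "sad", "sad", "sad"]

report = classification_report(y_true, y_pred, output_dict=True)

# Per-class metrics are nested under the class name
print(report["happy"]["precision"], report["happy"]["recall"])
print(report["sad"]["precision"], report["sad"]["recall"])

# Overall accuracy is stored as a single float
print(report["accuracy"])
```

Here one of the two `happy` samples is predicted as `sad`, so `happy` has perfect precision but only half recall, while `sad` has perfect recall but lower precision.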



We can see very good results in the table above. Let's give it a last go by predicting emotions using our model.

In [None]:
DTC.predict(X_test.tolist())

array(['fear', 'angry', 'happy', ..., 'neutral', 'pleasant_surprised',
       'Sad'], dtype='<U18')
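The report and predictions above contain pairs of labels that differ only in capitalization or spelling (`Fear`/`fear`, `Sad`/`sad`, `Pleasant_surprise`/`pleasant_surprised`), so the model treats them as separate classes. If you would rather merge them, one possible approach is sketched below. This is not part of the original pipeline, the sample series is made up, and merging the labels would change the scores reported above.

```python
import pandas as pd

# Made-up sample of the label variants seen in the report
raw = pd.Series([
    "Fear", "fear", "Sad", "sad", "Pleasant_surprise", "pleasant_surprised",
])

# Lowercase everything, then map the remaining spelling variant onto one name
normalized = raw.str.lower().replace({"pleasant_surprised": "pleasant_surprise"})

print(sorted(normalized.unique()))
```

Applied to `df['label']` before the train/test split, the same two calls would reduce the ten labels in the report to seven emotions.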

References:
- https://www.kaggle.com/datasets/crowww/a-large-scale-fish-dataset
- https://www.kaggle.com/datasets/jonathanoheix/face-expression-recognition-dataset
- https://www.kaggle.com/datasets/ejlok1/toronto-emotional-speech-set-tess/code
- https://www.kaggle.com/code/bprathibalakshmi/classification-of-speech-emotion-99-accuracy
- https://numpy.org/doc/
- https://pandas.pydata.org/
- https://scikit-learn.org/stable/modules/preprocessing.html
- https://librosa.org/doc/latest/index.html
- https://docs.python.org/3/library/pathlib.html
- https://docs.python.org/3/library/os.html
- https://www.kaggle.com/code/virial23/ser-database-to-feature-extractor
- https://docs.python.org/3/library/glob.html
- https://docs.python.org/3/library/os.path.html
- https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.astype.html
- https://pandas.pydata.org/docs/reference/api/pandas.Series.html
- https://pandas.pydata.org/docs/reference/api/pandas.concat.html
- https://pandas.pydata.org/docs/reference/api/pandas.Series.str.replace.html
- https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_csv.html
- https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html
- https://scikit-learn.org/stable/modules/preprocessing.html#preprocessing-categorical-features
- https://www.analyticsvidhya.com/blog/2021/04/difference-between-fit-transform-fit_transform-methods-in-scikit-learn-with-python-code/
- https://ponder.io/scikit-learns-transformers-now-output-pandas-dataframes/
- https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html
- https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
- https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_validate.html
- https://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html