### Features Used in Training

#### Mel-Frequency Cepstral Coefficients (MFCC)

1. **What is MFCC?**
    - **MFCC** stands for Mel-Frequency Cepstral Coefficients. MFCCs represent the short-term power spectrum of a sound. They are computed by mapping the power spectrum of each short frame onto the non-linear mel frequency scale, taking the logarithm, and applying a discrete cosine transform (DCT).

2. **Why Use MFCC?**
    - **Human Hearing**: The mel scale approximates how the human ear perceives pitch. It is roughly linear below about 1 kHz and logarithmic above. This is why MFCCs are widely used in speech and music analysis.
    - **Feature Representation**: MFCCs capture the timbral texture of audio, which helps distinguish between music genres. They summarize each frame's spectral envelope in a few dozen coefficients (40 here) instead of the 1,025 FFT bins produced by a 2048-point FFT.
    - **Robustness**: MFCCs discard phase and fine spectral detail, so they vary less than raw waveforms between recordings of similar sounds. This makes them a more stable input for classification, although they are not immune to background noise.

3. **Parameters Used in MFCC Calculation**:
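    - `num_mfcc=40`: Number of MFCC coefficients to extract per frame. Higher-order coefficients describe finer spectral detail. More coefficients also mean a larger input and more computation.
    - `n_fft=2048`: Length of the FFT window in samples (about 93 ms at 22,050 Hz). A longer window gives finer frequency resolution at the cost of time resolution.
    - `hop_length=512`: Number of samples between the starts of successive frames (about 23 ms). Consecutive 2048-sample windows therefore overlap by 75%. A smaller hop gives finer time resolution but more frames to process.
    - `num_segment=10`: Number of equal segments each clip is split into. The preprocessing code assumes 30-second clips, so each segment is 3 seconds (66,150 samples). Each segment becomes a separate training example. This multiplies the number of samples and lets the model learn from shorter time windows.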
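
The steps in point 1 and the parameters in point 3 can be traced with a NumPy-only sketch. It is an illustration under stated assumptions, not librosa's implementation. It uses the HTK mel formula (librosa defaults to the Slaney variant and normalizes its filters), 128 mel bands, a Hann window, and hypothetical helper names (`mfcc_sketch`, `mel_filterbank`, `dct_ii_ortho`). With `n_fft=2048` and `hop_length=512`, a 3-second segment (66,150 samples) yields 130 frames of 40 coefficients. This matches the frame count the preprocessing code checks with `math.ceil(samples_per_segment / hop_length)`.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel formula (assumption; librosa defaults to the Slaney variant)
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters spaced evenly on the mel scale from 0 Hz to sr / 2."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, len(fft_freqs)))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def dct_ii_ortho(x):
    """Orthonormal DCT-II along the last axis."""
    n = x.shape[-1]
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    basis[0] /= np.sqrt(2.0)
    return x @ basis.T

def mfcc_sketch(signal, sr=22050, n_fft=2048, hop_length=512, n_mels=128, n_mfcc=40):
    # 1. Pad so frame t is centered on sample t * hop_length, then cut overlapping frames
    padded = np.pad(signal, n_fft // 2, mode="reflect")
    n_frames = 1 + (len(padded) - n_fft) // hop_length
    window = np.hanning(n_fft)
    frames = np.stack([padded[t * hop_length: t * hop_length + n_fft] * window
                       for t in range(n_frames)])
    # 2. Short-term power spectrum: n_fft // 2 + 1 = 1025 bins per frame
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # 3. Sum the power inside each mel band
    mel_energy = power @ mel_filterbank(sr, n_fft, n_mels).T
    # 4. Log compression (decibels), floored to avoid log(0)
    log_mel = 10.0 * np.log10(np.maximum(mel_energy, 1e-10))
    # 5. DCT across mel bands; keep the first n_mfcc coefficients
    return dct_ii_ortho(log_mel)[:, :n_mfcc]  # shape (n_frames, n_mfcc)

segment = np.random.default_rng(0).standard_normal(22050 * 3)  # 66,150 samples
features = mfcc_sketch(segment)
print(features.shape)  # (130, 40)
```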

### Model Architecture

#### Sequential Model

The architecture is a Sequential model. Its layers are stacked in a linear order, and each layer feeds its output to the next. This simple structure makes the model easy to inspect and tune.
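
As a sanity check on the stacked layers, the trainable parameters can be counted by hand. The sketch below assumes 40 MFCC coefficients per frame and 7 output units, one per genre in `genre_dict`. The helper names are hypothetical. The counts do not depend on the number of time steps, because an LSTM reuses the same weights at every step.

```python
def lstm_params(input_dim, units):
    # kernel (input_dim x 4*units) + recurrent kernel (units x 4*units) + bias (4*units)
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    # weight matrix (input_dim x units) + bias (units)
    return input_dim * units + units

n_mfcc, n_genres = 40, 7
layer_counts = [
    ("LSTM(64, return_sequences=True)", lstm_params(n_mfcc, 64)),    # 26,880
    ("LSTM(64)", lstm_params(64, 64)),                               # 33,024
    ("Dense(64, activation='relu')", dense_params(64, 64)),          # 4,160
    ("Dense(7, activation='softmax')", dense_params(64, n_genres)),  # 455
]
for name, count in layer_counts:
    print(f"{name:<34}{count:>7,}")
total = sum(count for _, count in layer_counts)
print(f"{'Total':<34}{total:>7,}")  # 64,519
```

For k genres, the output layer contributes 64 × k + k parameters. The total reported by `classifier.summary()` should therefore match these numbers only when the layer sizes agree.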

#### LSTM Layers

1. **LSTM (Long Short-Term Memory) Layers**:
    - **Why LSTM?**: LSTM networks are a type of recurrent neural network (RNN) capable of learning long-term dependencies. They are particularly well-suited for sequential data like audio signals.
    - **Layer 1**: `LSTM(64, input_shape=(input_shape[0], input_shape[1]), return_sequences=True)`
        - **64 units**: The number of LSTM units, which sets the size of the hidden state output at each time step. More units give the layer more capacity to model temporal patterns, at the cost of more parameters.
        - **input_shape**: Specifies the shape of one input example. `input_shape[0]` is the number of time steps (MFCC frames per segment, 130 with the parameters above), and `input_shape[1]` is the number of MFCC coefficients (40).
        - **return_sequences=True**: Makes the layer output its hidden state at every time step rather than only the last one. This is required when stacking LSTM layers, because the next LSTM layer expects a sequence as input.
    - **Layer 2**: `LSTM(64)`
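        - Another LSTM layer with 64 units. It reads the full sequence produced by Layer 1. With the default `return_sequences=False`, it returns only its final hidden state, a single 64-dimensional vector summarizing the segment.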
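
The effect of `return_sequences` can be seen in a minimal NumPy forward pass of one LSTM layer. This is a sketch with random weights and a single unbatched segment. It assumes the Keras gate ordering (input, forget, cell, output) and omits details such as dropout. The first layer returns one 64-dimensional vector per frame, and the second keeps only its last hidden state.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_forward(x, W, U, b, return_sequences=False):
    """One LSTM layer over x of shape (time_steps, input_dim).

    W: (input_dim, 4 * units), U: (units, 4 * units), b: (4 * units,).
    Gate blocks are ordered input, forget, cell, output (Keras convention, assumed here).
    """
    units = U.shape[0]
    h = np.zeros(units)  # hidden state (the layer's output at each step)
    c = np.zeros(units)  # cell state (the long-term memory)
    outputs = []
    for x_t in x:
        z = x_t @ W + h @ U + b
        i = sigmoid(z[:units])               # input gate: how much new information to write
        f = sigmoid(z[units:2 * units])      # forget gate: how much old memory to keep
        g = np.tanh(z[2 * units:3 * units])  # candidate values to write
        o = sigmoid(z[3 * units:])           # output gate: how much memory to expose
        c = f * c + i * g
        h = o * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs) if return_sequences else h

rng = np.random.default_rng(0)
time_steps, n_mfcc, units = 130, 40, 64
x = rng.standard_normal((time_steps, n_mfcc))  # one segment: 130 frames x 40 MFCCs

def random_weights(input_dim):
    return (0.1 * rng.standard_normal((input_dim, 4 * units)),
            0.1 * rng.standard_normal((units, 4 * units)),
            np.zeros(4 * units))

W1, U1, b1 = random_weights(n_mfcc)
W2, U2, b2 = random_weights(units)
seq = lstm_forward(x, W1, U1, b1, return_sequences=True)  # layer 1
last = lstm_forward(seq, W2, U2, b2)                      # layer 2
print(seq.shape, last.shape)  # (130, 64) (64,)
```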

#### Dense Layers

1. **Dense Layer 1**: `Dense(64, activation='relu')`
    - **64 units**: A fully connected layer with 64 neurons.
    - **ReLU Activation**: The Rectified Linear Unit (ReLU) activation function introduces non-linearity, enabling the network to learn more complex patterns.

2. **Output Layer**: `Dense(len(genre_dict), activation='softmax')`
    - **7 units**: One unit per genre in `genre_dict`. Each unit outputs the predicted probability that the input segment belongs to that genre.
    - **Softmax Activation**: Converts the output into a probability distribution, which is suitable for multi-class classification tasks.
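
The two dense layers reduce to a matrix product followed by ReLU, then a matrix product followed by softmax. The sketch below uses random weights. A random 64-dimensional vector stands in for the second LSTM's output, so the printed probabilities are meaningless. The point is that the softmax output is non-negative and sums to 1.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def softmax(logits):
    # Subtracting the max avoids overflow in exp and leaves the result unchanged
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=-1, keepdims=True)

genres = ["gnawa", "chaabi", "andalusian", "rai", "imazighn", "rap", "pop"]
rng = np.random.default_rng(0)
lstm_out = rng.standard_normal(64)  # stand-in for the second LSTM's output
W_hidden, b_hidden = 0.1 * rng.standard_normal((64, 64)), np.zeros(64)
W_out, b_out = 0.1 * rng.standard_normal((64, len(genres))), np.zeros(len(genres))

hidden = relu(lstm_out @ W_hidden + b_hidden)  # Dense(64, activation='relu')
probs = softmax(hidden @ W_out + b_out)        # output layer, one unit per genre
for genre, p in zip(genres, probs):
    print(f"{genre:<11}{p:.3f}")
print("sum:", round(float(probs.sum()), 6))    # sum: 1.0
```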

### Compilation and Training

1. **Optimizer**: `Adam(learning_rate=0.001)`
    - **Adam**: An adaptive learning rate optimization algorithm that combines the advantages of two other extensions of stochastic gradient descent, AdaGrad and RMSProp.
    - **Learning Rate**: Set to 0.001. This is a common starting point and the Keras default for Adam.

2. **Loss Function**: `sparse_categorical_crossentropy`
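    - Suitable for multi-class classification with integer labels (0 to 6 here) rather than one-hot vectors. It computes the cross-entropy between the true class and the predicted probability distribution. In practice, this is the negative log of the probability assigned to the correct class, averaged over the batch.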

3. **Metrics**: `['accuracy']`
    - Accuracy is the fraction of segments whose highest-probability class matches the true label. It is a straightforward way to gauge classification performance.

4. **Training**: `classifier.fit(X_train, y_train, validation_data=(X_val,y_val), batch_size=32, epochs=60, verbose=2)`
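    - **Batch Size**: 32 segments per gradient update. This is a common default that keeps memory use modest. Smaller batches give noisier gradient estimates, while larger ones give fewer updates per epoch.
    - **Epochs**: 60 full passes over the training set. More epochs give the model more time to fit the data. Beyond some point, however, they mainly increase overfitting.
    - **Validation Data**: Evaluated after each epoch to monitor performance on data the model is not trained on. On its own, this does not prevent overfitting. It does show when validation loss stops improving, which helps in choosing the number of epochs or in adding early stopping.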
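
The loss and metric can be computed by hand for a small batch. The probabilities below are made up for illustration. The sketch clips probabilities to avoid `log(0)`, and the helper names are hypothetical rather than Keras functions. It also checks that sparse categorical cross-entropy with integer labels equals categorical cross-entropy with one-hot labels.

```python
import numpy as np

def sparse_categorical_crossentropy(y_true, y_prob, eps=1e-7):
    # Negative log of the probability given to the correct class, averaged over the batch
    p_correct = y_prob[np.arange(len(y_true)), y_true]
    return float(np.mean(-np.log(np.clip(p_correct, eps, 1.0))))

def accuracy(y_true, y_prob):
    # Fraction of examples whose most likely class equals the label
    return float(np.mean(np.argmax(y_prob, axis=1) == y_true))

y_true = np.array([0, 3, 6])  # integer labels: gnawa, rai, pop
y_prob = np.array([           # made-up softmax outputs, each row sums to 1
    [0.70, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05],
    [0.10, 0.10, 0.10, 0.40, 0.10, 0.10, 0.10],
    [0.30, 0.10, 0.10, 0.10, 0.10, 0.10, 0.20],
])
loss = sparse_categorical_crossentropy(y_true, y_prob)
acc = accuracy(y_true, y_prob)

# Same loss with one-hot labels (categorical cross-entropy)
one_hot = np.eye(y_prob.shape[1])[y_true]
categorical = float(np.mean(-np.sum(one_hot * np.log(y_prob), axis=1)))

print(round(loss, 3), round(acc, 3), round(categorical, 3))  # 0.961 0.667 0.961
```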

### Summary

The model uses LSTM layers to capture temporal dependencies in the MFCC sequences, which carry the timbral information used for genre classification. The dense layers then map the final LSTM hidden state to a probability for each genre. MFCCs keep the input compact (40 coefficients per frame) while retaining the spectral envelope that distinguishes timbres. Combined with the Adam optimizer, this stack of LSTM and dense layers forms a compact baseline for music genre classification.

In [None]:
import os
import librosa
import math
import json
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import IPython
from random import randint
import ast
from sklearn.model_selection import train_test_split
import tensorflow as tf
from tensorflow.keras.models import load_model
from tensorflow.keras import layers, models, Sequential
import statistics

In [None]:
path = 'Add-Your-Path'
genre_dict = {"gnawa":0,"chaabi":1,"andalusian":2, "rai":3, "imazighn":4, "rap":5, "pop":6}

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
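def preprocess(dataset_path, num_mfcc=40, n_fft=2048, hop_length=512, num_segment=10):
    """Split each 30-second .wav clip into `num_segment` segments and extract MFCCs for each."""
    data = {"audio_path": [], "labels": [], "mfcc": []}
    sample_rate = 22050
    samples_per_segment = int(sample_rate * 30 / num_segment)
    # Expected MFCC frames for a full-length segment (130 with the defaults)
    expected_frames = math.ceil(samples_per_segment / hop_length)

    for dirpath, dirnames, filenames in os.walk(dataset_path):
        if dirpath == dataset_path:
            continue  # skip the root folder; each sub-folder holds one genre
        dirname = os.path.basename(dirpath)
        print(dirname)
        lbl = genre_dict[dirname]
        for f in sorted(filenames):
            if not f.endswith('.wav'):
                continue
            if f == "jazz.00054.wav":
                # Known unreadable file carried over from the GTZAN dataset;
                # other unreadable files are handled by the try/except below
                continue
            file_path = os.path.join(dirpath, f)

            try:
                y, sr = librosa.load(file_path, sr=sample_rate)
            except Exception as e:
                print(f"Could not load {file_path}: {e}")
                continue
            try:
                for n in range(num_segment):
                    start_sample = samples_per_segment * n
                    end_sample = start_sample + samples_per_segment

                    # Use the function's parameters instead of hard-coded values
                    mfcc = librosa.feature.mfcc(y=y[start_sample:end_sample], sr=sr,
                                                n_mfcc=num_mfcc, n_fft=n_fft, hop_length=hop_length)
                    mfcc = mfcc.T  # (frames, coefficients)

                    # Keep only full-length segments so every example has the same shape
                    if len(mfcc) == expected_frames:
                        data["audio_path"].append(file_path)
                        data["mfcc"].append(mfcc.tolist())
                        data["labels"].append(lbl)
            except Exception as e:
                print(f"Failed to extract MFCCs from {file_path} (label {lbl}): {e}")

    return data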

In [None]:
data = preprocess(path)

In [None]:
data["audio_path"][0], data["labels"][0]

In [None]:
data_df = pd.DataFrame(data)
data_df.to_csv('/content/drive/MyDrive/moroccanMusic_extracted_features.csv', index=False)
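
Storing nested MFCC lists in a CSV turns each array into a long string that later has to be parsed with `ast.literal_eval`. If the features fit in memory, NumPy's compressed `.npz` format is one alternative. The helpers below (`save_features`, `load_features`) are hypothetical. In this notebook they could be called as `save_features("moroccanMusic_features.npz", data["mfcc"], data["labels"], data["audio_path"])`, where the file name is chosen for illustration.

```python
import os
import tempfile
import numpy as np

def save_features(path, mfcc, labels, audio_paths):
    # One array per field; float32 halves the size compared with float64
    np.savez_compressed(path,
                        mfcc=np.asarray(mfcc, dtype=np.float32),
                        labels=np.asarray(labels, dtype=np.int64),
                        audio_path=np.asarray(audio_paths, dtype=str))

def load_features(path):
    # No pickling needed: every stored array has a plain numeric or string dtype
    with np.load(path) as npz:
        return npz["mfcc"], npz["labels"], npz["audio_path"]

# Round trip on dummy data shaped like three 3-second segments
with tempfile.TemporaryDirectory() as tmp_dir:
    npz_path = os.path.join(tmp_dir, "features.npz")
    dummy = np.zeros((3, 130, 40), dtype=np.float32)
    save_features(npz_path, dummy, [0, 3, 6], ["a.wav", "b.wav", "c.wav"])
    X_loaded, y_loaded, paths_loaded = load_features(npz_path)

print(X_loaded.shape, y_loaded.tolist(), paths_loaded.tolist())
```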

In [None]:
data = pd.read_csv('/content/drive/MyDrive/moroccanMusic_extracted_features.csv')

In [None]:
X = np.array(data["mfcc"])
X = np.array([ast.literal_eval(x) for x in X])
X = X.astype('float32')
y = np.array(data["labels"])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=42)
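
Each track contributes up to 10 segments. A random segment-level split can therefore place segments of the same song in both the training and test sets, which can make test accuracy optimistic. A hedged alternative is to split by track using the `audio_path` column. The helper `split_by_track` below is hypothetical and NumPy-only; scikit-learn's `GroupShuffleSplit` implements the same idea.

```python
import numpy as np

def split_by_track(audio_paths, test_size=0.25, seed=42):
    """Return boolean masks (train, test) that keep every segment of a track on one side."""
    audio_paths = np.asarray(audio_paths)
    tracks = np.unique(audio_paths)
    shuffled = np.random.default_rng(seed).permutation(tracks)
    n_test = int(round(test_size * len(tracks)))
    test_mask = np.isin(audio_paths, shuffled[:n_test])
    return ~test_mask, test_mask

# Toy example: 4 tracks with 3 segments each
paths = np.repeat(["a.wav", "b.wav", "c.wav", "d.wav"], 3)
train_mask, test_mask = split_by_track(paths, test_size=0.25)
print(sorted(set(paths[test_mask].tolist())), int(test_mask.sum()))
```

With the notebook's arrays, `X[train_mask]` and `X[test_mask]` (and likewise for `y`) would replace the first `train_test_split` call, using `np.array(data["audio_path"])` as the track identifiers.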

In [None]:
y_train[0], X_train[0]

In [None]:
np.unique(y_train)

In [None]:
input_shape = (X_train.shape[1], X_train.shape[2])  # (time steps, MFCC coefficients)
input_shape

In [None]:
classifier = Sequential()
classifier.add(layers.LSTM(64, input_shape=(input_shape[0], input_shape[1]), return_sequences=True))
classifier.add(layers.LSTM(64))
classifier.add(layers.Dense(64, activation='relu'))
classifier.add(layers.Dense(len(genre_dict), activation='softmax'))  # one output unit per genre

In [None]:
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)  # use `learning_rate`; `lr` is deprecated
classifier.compile(optimizer=optimizer, loss='sparse_categorical_crossentropy', metrics=['accuracy'])
classifier.summary()
history = classifier.fit(X_train, y_train, validation_data=(X_val, y_val), batch_size=32, epochs=60, verbose=2)

In [None]:
train_loss, train_acc = classifier.evaluate(X_train, y_train, batch_size=128)

In [None]:
test_loss, test_acc = classifier.evaluate(X_test, y_test, batch_size=128)

In [None]:
model = load_model('Add-Your-Path/model.h5')

In [None]:
number_to_genre_dict = {0:"gnawa" , 1:"chaabi", 2:"andalusian", 3:"rai", 4:"imazighn", 5:"rap", 6:"pop"}

In [None]:
def print_class_name(classes):
    print(f'the predicted class is {number_to_genre_dict[statistics.mode(classes)]}')
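
`print_class_name` combines the per-segment predictions by majority vote (`statistics.mode`). A common alternative is to average the softmax probabilities across segments before taking the argmax. This lets confident segments outweigh uncertain ones. The sketch compares both methods on made-up probabilities for three segments. `class_pred` returns only per-segment class indices, so averaging would need the raw output of `classifier.predict` instead.

```python
import statistics
import numpy as np

genres = ["gnawa", "chaabi", "andalusian", "rai", "imazighn", "rap", "pop"]

def vote(segment_probs):
    # Majority vote over each segment's most likely class (as print_class_name does)
    return statistics.mode(np.argmax(segment_probs, axis=1).tolist())

def mean_prob(segment_probs):
    # Average the probability vectors first, then pick the most likely class
    return int(np.argmax(segment_probs.mean(axis=0)))

# Made-up softmax outputs for three segments of one song (rows sum to 1)
segment_probs = np.array([
    [0.35, 0.05, 0.05, 0.40, 0.05, 0.05, 0.05],
    [0.35, 0.05, 0.05, 0.40, 0.05, 0.05, 0.05],
    [0.90, 0.02, 0.02, 0.02, 0.02, 0.01, 0.01],
])
print("vote:", genres[vote(segment_probs)])                   # vote: rai
print("mean probability:", genres[mean_prob(segment_probs)])  # mean probability: gnawa
```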

In [None]:
def class_pred(classifier, file):
    """Predict a genre index for each 3-second segment of `file` (up to 10 segments)."""
    sample_rate = 22050
    samples_per_segment = sample_rate * 3
    y, sr = librosa.load(file, sr=sample_rate)  # same sample rate as in training
    one_song = []
    for n in range(10):
        start_sample = samples_per_segment * n
        end_sample = start_sample + samples_per_segment

        mfcc = librosa.feature.mfcc(y=y[start_sample:end_sample], sr=sr, n_mfcc=40, n_fft=2048, hop_length=512)
        mfcc = mfcc.T  # (frames, coefficients)

        # Keep only full-length segments, matching the input shape used in training
        if len(mfcc) == math.ceil(samples_per_segment / 512):
            one_song.append(mfcc)

    one_song = np.asarray(one_song, dtype='float32')  # (segments, frames, coefficients)

    # Optional debug dump: one line of comma-separated coefficients per frame
    with open('output.txt', 'w') as out_file:
        for segment in one_song:
            for frame in segment:
                out_file.write(','.join(str(value) for value in frame) + '\n')

    prediction = classifier.predict(one_song)
    classes_x = np.argmax(prediction, axis=1)  # predicted genre index per segment
    return classes_x

In [None]:
classifier.save("model.h5")