# Genre Classification
This notebook builds a model using data from the FMA dataset to classify the genre of user-submitted audio.

Outline:
1. Build model
2. Get test audio
3. Classify test audio

Sources:
- [FMA: A Dataset For Music Analysis](https://github.com/mdeff/fma) by Michaël Defferrard et al. Provides the dataset used in this notebook.
- [Audio Data Analysis Using Deep Learning with Python](https://www.kdnuggets.com/2020/02/audio-data-analysis-deep-learning-python-part-1.html) by Nagesh Singh Chauhan, writing for KDnuggets. This notebook follows the same basic concepts and implementation presented in that article.

In [None]:
import librosa
import pandas as pd
import numpy as np
import os
import pathlib
import csv
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
import keras
from keras import layers
from keras.models import Sequential

## 1. Build model
Here, we use the small subset of the FMA dataset to build our model. `fma_small` contains 8,000 tracks of 30 seconds each, covering eight genres:
1. Electronic
2. Experimental
3. Folk
4. Hip-Hop
5. Instrumental
6. International
7. Pop
8. Rock

The genres are evenly represented in the dataset, with 1,000 tracks per genre.

Before we can build our model, we need to load and prepare the data. Rather than train on raw audio, we represent each track by features extracted from it. See `Features.ipynb` for details on which features we use and how we extract them. Here, we load `fma_small.csv`, a file produced by `Features.ipynb` that contains the features of the `fma_small` dataset.

Once the data is prepared, we build and fit an artificial neural network (ANN) model.
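
As a rough illustration of the label-encoding part of this preparation, the sketch below reproduces the idea on a few made-up labels using plain NumPy. It is not the notebook's code: the cell that follows uses scikit-learn's `LabelEncoder`, which likewise assigns integer codes to the sorted unique class names. The `labels` array is hypothetical.

```python
import numpy as np

# Hypothetical genre labels, standing in for the last column of fma_small.csv
labels = np.array(['Rock', 'Folk', 'Pop', 'Rock', 'Electronic', 'Folk'])

# np.unique returns the sorted class names and, for each label,
# the index of its class in that sorted array
classes, codes = np.unique(labels, return_inverse=True)

print(classes)  # ['Electronic' 'Folk' 'Pop' 'Rock']
print(codes)    # [3 1 2 3 0 1]

# Decoding is an index lookup into the class names
decoded = classes[codes]
```

Integer codes of this kind are what the `sparse_categorical_crossentropy` loss used later expects as targets.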

In [None]:
# Genre folder names (lowercase) and the number of output classes
GENRE_LIST = 'electronic experimental folk hip-hop instrumental international pop rock'.split()
GENRE_CNT = len(GENRE_LIST)
FEATURES = 'fma_small.csv'

# Load features and trim filename column
data = pd.read_csv(FEATURES)
data = data.drop(['filename'],axis=1)

# Encoding the labels
genre_list = data.iloc[:, -1]
encoder = LabelEncoder()
y = encoder.fit_transform(genre_list)

# Scaling the feature columns
scaler = StandardScaler()
X = scaler.fit_transform(np.array(data.iloc[:, :-1], dtype = float))

# Dividing data into training and testing set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

Now that we have our data set up, we can build and fit our model.

In [None]:
# Building the model
model = Sequential()
model.add(layers.Dense(256, activation='relu', input_shape=(X_train.shape[1],)))
model.add(layers.Dense(128, activation='relu'))
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(GENRE_CNT, activation='softmax'))
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Fit the model; fit() returns a History object with per-epoch loss and accuracy
history = model.fit(X_train,
                    y_train,
                    epochs=100,
                    batch_size=128)
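
To make the output layer and the loss a little more concrete, here is a minimal NumPy sketch of what a softmax layer and sparse categorical cross-entropy compute for one batch. The logits and labels are made up, and the helper names `softmax` and `sparse_categorical_crossentropy` are illustrative local functions, not Keras APIs.

```python
import numpy as np

def softmax(logits):
    """Turn each row of raw scores into probabilities that sum to 1."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def sparse_categorical_crossentropy(probs, labels):
    """Mean negative log-probability assigned to the true integer class."""
    true_class_probs = probs[np.arange(len(labels)), labels]
    return float(-np.log(true_class_probs).mean())

# Hypothetical raw scores for 2 tracks over 8 genres
logits = np.array([[2.0, 0.1, 0.0, 0.3, 0.0, 0.2, 0.1, 1.0],
                   [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]])
labels = np.array([0, 7])  # integer-encoded true genres

probs = softmax(logits)
loss = sparse_categorical_crossentropy(probs, labels)
print(probs.sum(axis=1), loss)
```

A model that guesses uniformly over eight genres assigns probability 1/8 to the true class, which corresponds to a loss of ln 8 ≈ 2.08. That gives a rough baseline for reading the loss values printed during training.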

## 2. Get test audio
Now that we've built our model, we can prepare the test audio. The cell below creates `./test_audio` and one subfolder per genre if they do not already exist. To classify your own audio, place each file in the subfolder matching its genre, then run the feature-extraction cell that follows to write the features to `test_features.csv`.

In [None]:
# Establish upload directories (one subfolder per genre)
for g in GENRE_LIST:
    genre_dir = f'./test_audio/{g}'
    if os.path.isdir(genre_dir):
        print(f'{genre_dir} already exists.')
    else:
        # makedirs also creates ./test_audio itself if it is missing
        os.makedirs(genre_dir)
        print(f'Directory {genre_dir} created.')
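
Before running the extraction, it can help to confirm that the files landed in the intended folders. The following is a small sketch, not part of the original notebook, that counts the files in each genre folder. It builds a throwaway directory tree with `tempfile` so it can run anywhere; the helper name `count_files` and the sample file names are hypothetical.

```python
import tempfile
from pathlib import Path

GENRE_LIST = 'electronic experimental folk hip-hop instrumental international pop rock'.split()

def count_files(root, genres):
    """Return {genre: number of regular files in root/genre}."""
    counts = {}
    for g in genres:
        genre_dir = Path(root) / g
        if genre_dir.is_dir():
            counts[g] = sum(1 for p in genre_dir.iterdir() if p.is_file())
        else:
            counts[g] = 0
    return counts

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / 'test_audio'
    for g in GENRE_LIST:
        (root / g).mkdir(parents=True, exist_ok=True)
    # Two hypothetical uploads in the rock folder
    (root / 'rock' / 'demo_a.wav').touch()
    (root / 'rock' / 'demo_b.mp3').touch()
    counts = count_files(root, GENRE_LIST)

print(counts['rock'], counts['folk'])
```

In the notebook itself, `count_files('./test_audio', GENRE_LIST)` would report on the real upload folders.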

Now we can extract features from our test audio. For more information on how we do so, take a look at `Features.ipynb`, where we use the same method to extract features for our training dataset.

In [None]:
# Create header for test_features.csv
header = ['filename', 'chroma_stft', 'rmse', 'spectral_centroid',
          'spectral_bandwidth', 'rolloff', 'zero_crossing_rate']
header += [f'mfcc{i}' for i in range(1, 21)]
header.append('label')

# Write the header, then one row of features per audio file
with open('test_features.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(header)

    # Feature extraction
    for g in GENRE_LIST:
        # Capitalize the folder name to get the genre label, e.g. 'Hip-Hop'
        label = 'Hip-Hop' if g == 'hip-hop' else g.capitalize()
        for filename in os.listdir(f'./test_audio/{g}'):
            # Load up to 30 s of audio as mono; named `audio` so it does not
            # overwrite the label array `y` from the training cell
            songname = f'./test_audio/{g}/{filename}'
            audio, sr = librosa.load(songname, mono=True, duration=30)
            rmse = librosa.feature.rms(y=audio)
            chroma_stft = librosa.feature.chroma_stft(y=audio, sr=sr)
            spec_cent = librosa.feature.spectral_centroid(y=audio, sr=sr)
            spec_bw = librosa.feature.spectral_bandwidth(y=audio, sr=sr)
            rolloff = librosa.feature.spectral_rolloff(y=audio, sr=sr)
            zcr = librosa.feature.zero_crossing_rate(audio)
            mfcc = librosa.feature.mfcc(y=audio, sr=sr)

            # Summarize each feature by its mean over all frames
            row = [filename.replace(' ', '_'), np.mean(chroma_stft), np.mean(rmse),
                   np.mean(spec_cent), np.mean(spec_bw), np.mean(rolloff), np.mean(zcr)]
            row += [np.mean(e) for e in mfcc]

            # Append the genre label and write the row to file
            row.append(label)
            writer.writerow(row)
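
To show how the frame-wise features collapse into one row per track, the sketch below uses random arrays as stand-ins for librosa's output, which has shape `(n_features, n_frames)`. The frame count and the values are made up; only the shapes and the averaging mirror the cell above.

```python
import numpy as np

rng = np.random.default_rng(0)
n_frames = 1292  # hypothetical number of analysis frames

# Stand-ins with the shapes librosa returns for each feature
frame_features = {
    'chroma_stft': rng.random((12, n_frames)),       # 12 pitch classes
    'rmse': rng.random((1, n_frames)),
    'spectral_centroid': rng.random((1, n_frames)),
    'spectral_bandwidth': rng.random((1, n_frames)),
    'rolloff': rng.random((1, n_frames)),
    'zero_crossing_rate': rng.random((1, n_frames)),
}
mfcc = rng.normal(size=(20, n_frames))               # 20 coefficients

# One mean per feature: np.mean averages over every entry of the array,
# so the 12 chroma rows collapse into a single number
row = [float(np.mean(values)) for values in frame_features.values()]

# The MFCCs are averaged per coefficient, giving 20 separate values
row += [float(np.mean(coeff)) for coeff in mfcc]

print(len(row))  # 26
```

Together with the `filename` and `label` columns, these 26 values account for the 28 columns in the header.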

Now that we've extracted the features from our test audio, let's format the data so our model can understand it.

In [None]:
# Load and trim data
data = pd.read_csv('test_features.csv')
filenames = data['filename']
data = data.drop(['filename'],axis=1)

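# Encoding the labels with the encoder fitted on the training data
genre_list = data.iloc[:, -1]
y_test = encoder.transform(genre_list)

# Scaling the feature columns with the scaler fitted on the training data,
# so the test features are on the same scale the model was trained on
X_test = scaler.transform(np.array(data.iloc[:, :-1], dtype=float))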
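
One detail worth checking at this step is which statistics the scaling uses. The sketch below, on made-up numbers, contrasts reusing the training mean and standard deviation with standardizing a small batch on its own. The arrays `train_feats` and `new_feats` are hypothetical and unrelated to the notebook's variables.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature matrices: many training rows, two new rows
train_feats = rng.normal(loc=100.0, scale=20.0, size=(1000, 3))
new_feats = np.array([[140.0, 140.0, 140.0],
                      [150.0, 150.0, 150.0]])

# Statistics learned from the training data
# (like StandardScaler, np.std defaults to the population formula, ddof=0)
mean = train_feats.mean(axis=0)
std = train_feats.std(axis=0)

# Option 1: reuse the training statistics
scaled_with_train = (new_feats - mean) / std

# Option 2: standardize the new batch with its own statistics
scaled_with_self = (new_feats - new_feats.mean(axis=0)) / new_feats.std(axis=0)

print(scaled_with_train.round(1))  # both rows sit well above the training mean
print(scaled_with_self)            # rows become -1 and +1 in every column
```

With its own statistics, a small batch always ends up centered on zero, so the information about how it differs from the training data is lost before the model sees it.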

## 3. Classify test audio
Now that the model and the test data are both set up, let's have the model try to classify the genre of our test audio.

The output might look something like this:

```
Results:  
50/50 - 0s - loss: 7.6068 - accuracy: 0.4281

Predictions:  
filename.wav: Rock (87.15%) - SUCCESS
filename.mp3: Electronic (75.44%) - FAILURE
etc.
```

In [None]:
# Run predictions
print('Results:')
test_scores = model.evaluate(X_test, y_test, verbose=2)
predictions = model.predict(X_test)

# Save predictions: most probable class per row and its probability
results = np.argmax(predictions, axis=1)
nums = np.max(predictions, axis=1)

# Mark successes and failures
success = np.where(results == y_test, 'SUCCESS', 'FAILURE')

# Print individual results
results = encoder.inverse_transform(results)
print('\nPredictions:')
for name, genre, prob, outcome in zip(filenames, results, nums, success):
    print('{}: {} ({:.2%}) - {}'.format(name, genre, prob, outcome))
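
The per-file SUCCESS/FAILURE list can also be summarized per genre. As one possible follow-up, not part of the original notebook, the sketch below tallies a confusion matrix with NumPy on made-up labels. The helper `confusion_matrix` is a hypothetical local function; in the notebook its inputs would be the integer-encoded `y_test` and the per-row argmax of `predictions`.

```python
import numpy as np

def confusion_matrix(true_codes, pred_codes, n_classes):
    """Count outcomes: rows are true classes, columns are predicted classes."""
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(matrix, (true_codes, pred_codes), 1)
    return matrix

# Hypothetical integer-encoded labels for six tracks over three classes
true_codes = np.array([0, 0, 1, 1, 2, 2])
pred_codes = np.array([0, 1, 1, 1, 2, 0])

cm = confusion_matrix(true_codes, pred_codes, n_classes=3)

# Fraction of each true class that was predicted correctly
per_class_accuracy = cm.diagonal() / cm.sum(axis=1)

print(cm)
print(per_class_accuracy)  # [0.5 1.  0.5]
```

This sketch assumes every class has at least one test track; a genre folder left empty would produce a division by zero in `per_class_accuracy`.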