# Train An SVM Tutorial

This tutorial walks you through training your own Support Vector Machine (SVM) for music genre prediction. To run this notebook, make sure you have downloaded the fma_small dataset from https://github.com/mdeff/fma into the "data/" directory.

In [21]:
from spectrogram_dataset import AudioFeature, create_audio_feature_dataset, create_dataframes
import torch
import numpy as np
import torch.nn as nn
import matplotlib.pyplot as plt
from sklearn.svm import SVC
from svm_prediction import svm_prediction, svm_accuracy_report
from dataset_queries import return_similar_genres, return_most_popular_song, \
    play_random_song_from_genre, play_song_from_filename

First, create dataframes from the corresponding files.

In [16]:
file_path_df, track_df, relevant_genre_df, total_genre_df = create_dataframes(
    file_paths_path='data/all_data_paths.txt',
    tracks_csv_path='data/fma_metadata/tracks.csv',
    genre_csv_path='data/fma_metadata/genres.csv')


The file_path_df contains filepaths for all of the audio files in the 'data/fma_small' folder. 

In [8]:
file_path_df.head(1)

Unnamed: 0,file_path
0,000/000002.mp3


The track_df contains information about each audio file, including song title, album title, artist name, number of track listens, and top genre. 


In [11]:
track_df.head(1)

Unnamed: 0,track_id,album_id,album_listens,album_title,artist_name,track_favorites,track_genre_top,track_genres,track_interest,track_listens,track_title
0,2,1,6073,AWOL - A Way Of Life,AWOL,2,Hip-Hop,[21],4656,1293,Food


The relevant_genre_df contains the genres that are identified as the top genre of tracks in the fma_small dataset. The total_genre_df contains all of the genres in track_df.

In [12]:
relevant_genre_df.head(1)

Unnamed: 0,index,genre_id,title
0,20,21,Hip-Hop


Next, we create the training, validation, and test data. Each entry of a dataset contains, in order, the Mel-frequency cepstral coefficients (MFCCs), the zero-crossing rate, the spectral centroid, the spectral contrast, the spectral bandwidth, the spectral rolloff, and the genre label, which corresponds to one of the labeled genres in relevant_genre_df. <b> NOTE: Generating the data can take upwards of 30 minutes. </b> We recommend generating the data once, saving it, and loading it again for later use.

In [20]:
train_data, validation_data, test_data = create_audio_feature_dataset(file_path_df, track_df, 
                                                                      relevant_genre_df, 
                                                                      test_percentage=.10, 
                                                                      validation_percentage=.10)
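
To make two of the listed features more concrete, here is a minimal NumPy sketch of a zero-crossing rate and a spectral centroid computed over a whole signal. It is an illustration only: the functions `zero_crossing_rate` and `spectral_centroid`, the 8000 Hz sample rate, and the synthetic 440 Hz tone are all assumptions made for this example, and create_audio_feature_dataset may compute its features differently (for example, frame by frame).

```python
import numpy as np

def zero_crossing_rate(signal):
    """Fraction of adjacent sample pairs whose signs differ."""
    negative = np.signbit(signal)
    return float(np.mean(negative[1:] != negative[:-1]))

def spectral_centroid(signal, sample_rate):
    """Magnitude-weighted mean frequency (in Hz) of the whole signal."""
    magnitudes = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return float(np.sum(freqs * magnitudes) / np.sum(magnitudes))

# Hypothetical test signal: one second of a pure 440 Hz tone.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)

zcr = zero_crossing_rate(tone)
centroid = spectral_centroid(tone, sample_rate)
print(f"zero-crossing rate: {zcr:.3f}, spectral centroid: {centroid:.1f} Hz")
```

A pure tone crosses zero twice per cycle, so its zero-crossing rate is roughly twice its frequency divided by the sample rate, and its spectral centroid sits at the tone's frequency.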


In [28]:
# Uncomment to save the generated datasets to disk:
# np.save('train_data', train_data)
# np.save('test_data', test_data)
# np.save('validation_data', validation_data)

# Uncomment to load previously saved datasets:
# train_data = np.load('train_data.npy', allow_pickle=True)
# test_data = np.load('test_data.npy', allow_pickle=True)
# validation_data = np.load('validation_data.npy', allow_pickle=True)
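
The save-and-reload round trip can be sketched on a small stand-in dataset. The rows below are hypothetical placeholders, not real features, and the explicit object array is an assumption made for this example: it avoids relying on how NumPy converts a list of differently shaped arrays. Loading requires `allow_pickle=True` because the entries are Python objects.

```python
import os
import tempfile
import numpy as np

# Hypothetical stand-in rows: arrays with different frame counts plus a label.
rows = [
    [np.zeros((20, 5)), np.zeros((1, 5)), 3],
    [np.ones((20, 7)), np.ones((1, 7)), 1],
]

# Pack the rows into a 1-D object array so each row is stored unchanged.
packed = np.empty(len(rows), dtype=object)
for i, row in enumerate(rows):
    packed[i] = row

# Save to a temporary directory, then load the file back.
path = os.path.join(tempfile.mkdtemp(), "train_data.npy")
np.save(path, packed)
restored = np.load(path, allow_pickle=True)

print(len(restored), restored[1][0].shape, restored[0][2])
```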

The actual features used for training the SVM can be changed. In the code below, we use the average, median, and standard deviation of each MFCC, followed by the average, median, and standard deviation of the spectral contrast.

In [30]:
# Per-coefficient MFCC statistics (row[0]) plus overall spectral contrast statistics (row[3])
train_data_array = np.array([np.concatenate([np.average(row[0], axis=1), np.median(row[0], axis=1),
                                             np.std(row[0], axis=1), [np.average(row[3])], [np.median(row[3])],
                                             [np.std(row[3])]]) for row in train_data])

# The genre label is the last entry (row[6]) of each dataset row
train_data_label = [row[6] for row in train_data]

test_data_array = np.array([np.concatenate([np.average(row[0], axis=1), np.median(row[0], axis=1),
                                            np.std(row[0], axis=1), [np.average(row[3])], [np.median(row[3])],
                                            [np.std(row[3])]]) for row in test_data])

test_data_label = [row[6] for row in test_data]
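
Since the same summary is applied to both the training and test sets, one option is to factor it into a helper. The sketch below assumes the row layout described earlier (MFCCs at index 0, spectral contrast at index 3, label at index 6). The names `summarize_row` and `build_feature_matrix`, and the synthetic rows with 20 MFCCs and 7 contrast bands, are hypothetical and chosen only for illustration.

```python
import numpy as np

def summarize_row(row, mfcc_index=0, contrast_index=3):
    """Per-coefficient MFCC statistics plus overall spectral contrast statistics."""
    mfcc = np.asarray(row[mfcc_index])
    contrast = np.asarray(row[contrast_index])
    return np.concatenate([
        np.average(mfcc, axis=1),
        np.median(mfcc, axis=1),
        np.std(mfcc, axis=1),
        [np.average(contrast), np.median(contrast), np.std(contrast)],
    ])

def build_feature_matrix(dataset, label_index=6):
    """Return (features, labels) for a list of dataset rows."""
    features = np.array([summarize_row(row) for row in dataset])
    labels = [row[label_index] for row in dataset]
    return features, labels

# Synthetic rows with varying frame counts; unused features are left as None.
rng = np.random.default_rng(0)
demo_data = []
for label, n_frames in [(0, 30), (1, 45), (0, 60), (2, 25)]:
    mfcc = rng.normal(size=(20, n_frames))
    contrast = rng.normal(size=(7, n_frames))
    demo_data.append([mfcc, None, None, contrast, None, None, label])

demo_features, demo_labels = build_feature_matrix(demo_data)
print(demo_features.shape, demo_labels)
```

With 20 MFCCs, each row becomes a vector of 3 x 20 + 3 = 63 values, regardless of how many frames the track has.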



We use sklearn to train our SVM. Kernel options include ‘linear’, ‘poly’, ‘rbf’, and ‘sigmoid’. Please see the sklearn documentation for more options.

In [31]:
clf = SVC(kernel = 'linear')
clf.fit(train_data_array, train_data_label)

SVC(kernel='linear')
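
One way to choose among the kernels is to fit one model per kernel and compare accuracy on held-out data, such as features built from validation_data. This is a sketch on synthetic data rather than part of the original workflow: the two Gaussian clusters and the names `X_train`, `X_val`, `scores`, and `best_kernel` are assumptions made for this example.

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic two-class data: two well-separated Gaussian clusters in 5 dimensions.
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0, 1, (40, 5)), rng.normal(3, 1, (40, 5))])
y_train = np.array([0] * 40 + [1] * 40)
X_val = np.vstack([rng.normal(0, 1, (20, 5)), rng.normal(3, 1, (20, 5))])
y_val = np.array([0] * 20 + [1] * 20)

# Fit one SVM per kernel and record its accuracy on the held-out split.
scores = {}
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    model = SVC(kernel=kernel)
    model.fit(X_train, y_train)
    scores[kernel] = model.score(X_val, y_val)

best_kernel = max(scores, key=scores.get)
print(scores, "best:", best_kernel)
```

Comparing on validation data rather than test data keeps the test set untouched for the final accuracy estimate.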

Predict the labels of the test set and look at the accuracy metrics.

In [98]:
pred_labels = clf.predict(test_data_array)
svm_accuracy_report(test_data_label, pred_labels)

307  test files of a total of  800 are predicted correctly for an accuracy of  38.375 %


True Positive Rate: 0.605 False Positive Rate: 0.134 Percent Correct for genre 0: 60.465
True Positive Rate: 0.163 False Positive Rate: 0.079 Percent Correct for genre 1: 16.346
True Positive Rate: 0.640 False Positive Rate: 0.124 Percent Correct for genre 2: 64.045
True Positive Rate: 0.238 False Positive Rate: 0.084 Percent Correct for genre 3: 23.762
True Positive Rate: 0.462 False Positive Rate: 0.098 Percent Correct for genre 4: 46.218
True Positive Rate: 0.247 False Positive Rate: 0.034 Percent Correct for genre 6: 24.742
True Positive Rate: 0.270 False Positive Rate: 0.049 Percent Correct for genre 8: 27.000
True Positive Rate: 0.490 False Positive Rate: 0.101 Percent Correct for genre 14: 49.038
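
For readers who want to see how per-genre rates like these can be derived, here is a minimal sketch. It is not the implementation of svm_accuracy_report, which is not shown in this tutorial: the function `per_class_rates` and the toy labels are hypothetical. For each genre, the true positive rate is the fraction of that genre's tracks predicted correctly, and the false positive rate is the fraction of other tracks wrongly assigned to that genre.

```python
import numpy as np

def per_class_rates(true_labels, pred_labels):
    """Return {label: (true_positive_rate, false_positive_rate)}."""
    true_labels = np.asarray(true_labels)
    pred_labels = np.asarray(pred_labels)
    rates = {}
    for label in np.unique(true_labels):
        is_label = true_labels == label      # tracks that belong to this genre
        predicted = pred_labels == label     # tracks predicted as this genre
        tpr = np.sum(is_label & predicted) / np.sum(is_label)
        fpr = np.sum(~is_label & predicted) / np.sum(~is_label)
        rates[int(label)] = (float(tpr), float(fpr))
    return rates

# Toy labels for three genres (hypothetical values).
true_labels = [0, 0, 0, 1, 1, 2, 2, 2]
pred_labels = [0, 0, 1, 1, 2, 2, 2, 0]

rates = per_class_rates(true_labels, pred_labels)
accuracy = float(np.mean(np.asarray(true_labels) == np.asarray(pred_labels)))

for label, (tpr, fpr) in rates.items():
    print(f"genre {label}: TPR {tpr:.3f}, FPR {fpr:.3f}")
print(f"overall accuracy: {accuracy:.3f}")
```

In the report above, the "Percent Correct" column is the true positive rate expressed as a percentage, and the overall accuracy is 307 / 800 = 38.375 %.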
