# Music genre recognition from audio features

We will compare the performance of logistic regression and of a multilayer perceptron (MLP) on the task of music genre recognition.

We will use the small subset of the FMA (Free Music Archive) dataset (Defferrard et al., FMA: A Dataset For Music Analysis, ISMIR 2017). It consists of 8000 samples of 30 seconds each, balanced across 8 genres (International, Pop, Rock, Electronic, Folk, Hip-Hop, Experimental, Instrumental). The data contains:
- X: standard features of each sample (provided by the authors)
- Y: the label of each sample

Let us first load the data. To make use of the PyTorch DataLoader, we will create a new Dataset class. This class needs three methods: one to create a new instance (__init__), one to return the number of samples in the dataset (__len__), and one to return a single data sample (__getitem__). As in the MNIST case, we will use a boolean variable to indicate whether the dataset contains the training data or the test data.

In [14]:
import numpy as np
import scipy.io as sio # This will allow us to load the data
import torch
from torch.utils.data import Dataset, DataLoader
from sklearn.model_selection import train_test_split
from sklearn import preprocessing

class MusicDataset(Dataset):

    def __init__(self, mat_file, train, transform=None):
        """
        Args:
            mat_file (string): Path to the mat file with the data
            train (boolean): Is it the training data or the test data
            transform (callable, optional): Optional transform to be applied
                on a sample.
        """
        data = sio.loadmat(mat_file)
        self.X = data['X']
        self.Y = data['Y'].squeeze()
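        # Hold out 25% of the samples as the test set (fixed seed for reproducibility)
        Xtrain, Xtest, Ytrain, Ytest = train_test_split(self.X, self.Y, test_size=0.25, random_state=1)
        # Standardize each feature to zero mean and unit variance.
        # Note: each split is scaled with its own statistics here; fitting a
        # preprocessing.StandardScaler on Xtrain only is the more common practice.
        Xtrain = preprocessing.scale(Xtrain)
        Xtest = preprocessing.scale(Xtest)
        self.Xtrain = Xtrain.astype(float)
        self.Xtest = Xtest.astype(float)
        # Labels are stored as floats and cast to integers (.long()) at training time
        self.Ytrain = Ytrain.astype(float)
        self.Ytest = Ytest.astype(float)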
        self.train = train
        self.transform = transform

    def __len__(self):
        # Number of samples in the selected split
        if self.train:
            return self.Xtrain.shape[0]
        else:
            return self.Xtest.shape[0]

    def __getitem__(self, idx):
        # Return one sample of the selected split as a dictionary
        if self.train:
            sample = {'features': self.Xtrain[idx,:], 'label': self.Ytrain[idx]}
        else:
            sample = {'features': self.Xtest[idx,:], 'label': self.Ytest[idx]}

        # Apply the optional transform (e.g. conversion to tensors)
        if self.transform:
            sample = self.transform(sample)

        return sample

## Logistic regression

As a baseline, we will make use of a logistic regression classifier.

Let us first create the corresponding training and test sets using our new class.

In [15]:
trainset = MusicDataset(mat_file='music_data.mat', train=True)
testset = MusicDataset(mat_file='music_data.mat', train=False)

We can now train the classifier on the training set.

In [16]:
from sklearn.linear_model import LogisticRegression
clf = LogisticRegression(C=1,tol=0.001)
clf.fit(trainset.Xtrain,trainset.Ytrain)

LogisticRegression(C=1, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.001,
          verbose=0, warm_start=False)

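We then evaluate it on the test set by computing the confusion matrix and the overall accuracy (the sum of the diagonal of the confusion matrix divided by the total number of test samples).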

In [17]:
Yhat = clf.predict(testset.Xtest)

In [37]:
import sklearn.metrics as skm
cmat = skm.confusion_matrix(testset.Ytest, Yhat)
cmat

array([[172,  11,  13,   9,  29,  20,  11,  14],
       [ 22,  81,  34,  23,  25,  21,  22,  16],
       [  7,  21, 181,  13,  11,   5,  20,  12],
       [ 11,  13,   4, 142,   1,  27,  16,  25],
       [ 15,  24,  15,   5, 140,   4,  19,  29],
       [ 14,  19,   4,  35,   4, 165,  15,   6],
       [ 11,  21,  16,  14,  25,  12, 100,  41],
       [  2,  10,   6,  13,  19,   2,  28, 135]])

In [19]:
np.diag(cmat).sum()/cmat.sum()

0.558
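The overall accuracy hides differences between genres. As a complementary view, the per-class recall can be read off the same confusion matrix by dividing each diagonal entry by its row sum. The sketch below copies the matrix printed above; the variable names are illustrative, and since the mapping from label index to genre name is not given here, the rows are referred to by index only.

```python
import numpy as np

# Confusion matrix copied from the output above
# (rows: true labels, columns: predicted labels)
cmat = np.array([[172,  11,  13,   9,  29,  20,  11,  14],
                 [ 22,  81,  34,  23,  25,  21,  22,  16],
                 [  7,  21, 181,  13,  11,   5,  20,  12],
                 [ 11,  13,   4, 142,   1,  27,  16,  25],
                 [ 15,  24,  15,   5, 140,   4,  19,  29],
                 [ 14,  19,   4,  35,   4, 165,  15,   6],
                 [ 11,  21,  16,  14,  25,  12, 100,  41],
                 [  2,  10,   6,  13,  19,   2,  28, 135]])

# Overall accuracy: correctly classified samples / all samples
accuracy = np.diag(cmat).sum() / cmat.sum()

# Per-class recall: correctly classified samples of a class / all samples of that class
recall = np.diag(cmat) / cmat.sum(axis=1)

print("accuracy: %.3f" % accuracy)
for k, r in enumerate(recall):
    print("class %d: recall = %.3f" % (k, r))
```

On these numbers, the class with index 1 has the lowest recall (81 of 244 samples, about 0.33) and the class with index 2 the highest (181 of 270 samples, about 0.67).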

## Multilayer Perceptron

Let us now move to the MLP.

To use PyTorch, we need to convert the data to tensors. This can be achieved by defining a new transform as a callable class, which converts the features of a sample to a float tensor and leaves the label unchanged.

In [20]:
class ToTensor(object):
    """Convert ndarrays in sample to Tensors."""

    def __call__(self, sample):
        features, label = sample['features'], sample['label']

        return {'features': torch.from_numpy(features).float(),
                'label': label}

We can now create the training and test sets, apply our new transform directly, and wrap each set in a DataLoader.

In [21]:
trainset = MusicDataset(mat_file='music_data.mat', train=True, transform=ToTensor())
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4, shuffle=True, num_workers=2)

testset = MusicDataset(mat_file='music_data.mat', train=False, transform=ToTensor())
testloader = torch.utils.data.DataLoader(testset, batch_size=4, shuffle=False, num_workers=2)

Create a new MLP by subclassing nn.Module, and test different architectures, for example by varying the number and width of the hidden layers. (TODO)

In [22]:
import torch.nn as nn
import torch.nn.functional as F
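As a possible starting point for this exercise, here is a minimal sketch of an MLP with a single hidden layer. It is one option among many, not a reference solution: the class name MLP, the hidden width of 128 and the placeholder input size of 20 are assumptions made for illustration, and the input size must be replaced by the actual number of feature columns.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MLP(nn.Module):
    """MLP with one hidden layer; the layer sizes are illustrative assumptions."""

    def __init__(self, n_features, n_hidden=128, n_classes=8):
        super().__init__()
        self.fc1 = nn.Linear(n_features, n_hidden)
        self.fc2 = nn.Linear(n_hidden, n_classes)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        # Return raw class scores (logits): nn.CrossEntropyLoss applies
        # the log-softmax internally, so no softmax is needed here
        return self.fc2(x)

# 20 is a placeholder input size for this demo, not the real feature dimension
net = MLP(n_features=20)
scores = net(torch.randn(4, 20))  # dummy batch of 4 samples
print(scores.shape)  # torch.Size([4, 8])
```

In the notebook, the input size can be read from the data, e.g. MLP(n_features=trainset.Xtrain.shape[1]). Adding further nn.Linear layers or changing n_hidden is a simple way to explore other architectures.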

As in the tutorial, we will use the cross-entropy as the loss and SGD with momentum as the optimization method. The cell below assumes that net is an instance of the MLP you created in the previous step.

In [45]:
import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.0001, momentum=0.9)

You can now train your model. Try different numbers of epochs. To obtain the inputs of a batch, you can use "features = data['features']", and for the labels "labels = data['label'].long()". (TODO)



[1,   500] loss: 1.774
[1,  1000] loss: 1.729
[1,  1500] loss: 1.685
[2,   500] loss: 1.643
[2,  1000] loss: 1.566
[2,  1500] loss: 1.562
Finished Training
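A training loop that produces a log in the format above could look like the following sketch. It is not the reference solution: to keep it self-contained it runs on a small synthetic dataset (the hypothetical ToyDataset class) that mimics the dictionary format of MusicDataset, and the network, learning rate and logging interval are chosen for the demo only.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader

class ToyDataset(Dataset):
    """Synthetic stand-in for MusicDataset, returning the same dictionary format."""

    def __init__(self, n_samples=64, n_features=20, n_classes=8):
        g = torch.Generator().manual_seed(0)
        self.X = torch.randn(n_samples, n_features, generator=g)
        # Labels are stored as floats, as in MusicDataset
        self.Y = torch.randint(0, n_classes, (n_samples,), generator=g).double()

    def __len__(self):
        return self.X.shape[0]

    def __getitem__(self, idx):
        return {'features': self.X[idx], 'label': self.Y[idx]}

def train(net, loader, criterion, optimizer, n_epochs=2, log_every=500):
    """Train net on loader and return the average loss of each epoch."""
    epoch_losses = []
    for epoch in range(n_epochs):
        running_loss, total_loss = 0.0, 0.0
        for i, data in enumerate(loader):
            features = data['features']
            labels = data['label'].long()  # CrossEntropyLoss expects integer labels

            optimizer.zero_grad()          # reset gradients from the previous step
            loss = criterion(net(features), labels)
            loss.backward()                # backpropagation
            optimizer.step()               # parameter update

            running_loss += loss.item()
            total_loss += loss.item()
            if (i + 1) % log_every == 0:   # report the average loss since the last report
                print('[%d, %5d] loss: %.3f' % (epoch + 1, i + 1, running_loss / log_every))
                running_loss = 0.0
        epoch_losses.append(total_loss / len(loader))
    print('Finished Training')
    return epoch_losses

torch.manual_seed(0)
loader = DataLoader(ToyDataset(), batch_size=4, shuffle=True)
net = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 8))
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.01, momentum=0.9)
losses = train(net, loader, criterion, optimizer, n_epochs=2, log_every=8)
```

In the notebook, the same function could be called as train(net, trainloader, criterion, optimizer, n_epochs=2), with net, criterion and optimizer as defined in the earlier cells.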


Evaluate the model on the test data, for example by computing its accuracy, and compare it with the logistic regression baseline. (TODO)
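One way to approach the evaluation is sketched below, assuming batches in the same dictionary format as above. The function name evaluate is hypothetical, and the synthetic batches and the small network only serve to make the example runnable on its own, so the printed accuracy is meaningless for the actual task.

```python
import torch
import torch.nn as nn

def evaluate(net, loader):
    """Return the fraction of correctly classified samples in loader."""
    net.eval()                      # switch layers such as dropout to inference mode
    correct, total = 0, 0
    with torch.no_grad():           # gradients are not needed for evaluation
        for data in loader:
            features = data['features']
            labels = data['label'].long()
            # Predicted class: index of the largest score for each sample
            predicted = net(features).argmax(dim=1)
            correct += (predicted == labels).sum().item()
            total += labels.size(0)
    return correct / total

torch.manual_seed(0)
# Synthetic stand-in for testloader: two batches of 4 samples each
batches = [{'features': torch.randn(4, 20),
            'label': torch.randint(0, 8, (4,)).double()} for _ in range(2)]
net = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 8))
acc = evaluate(net, batches)
print('accuracy: %.3f' % acc)
```

In the notebook, evaluate(net, testloader) gives a number that can be compared directly with the 0.558 accuracy of the logistic regression baseline.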