# Classification of new tracks

This notebook uses the model trained on the Emotify dataset to tag new music tracks with emotions.

The user supplies a folder of unlabeled music files; running the notebook produces a CSV file that pairs each file with a predicted emotion tag.

The first code cell imports all necessary libraries:

In [1]:
#import all necessary libraries
import numpy as np
import pandas as pd
import librosa, librosa.display
import sklearn
import scipy.interpolate  # import the submodule explicitly; lin_interp_2d below uses it
import matplotlib.pyplot as plt
%matplotlib inline
import matplotlib.style as ms
# On Matplotlib 3.6+ this style is named 'seaborn-v0_8-muted'; the old name was removed in 3.8
ms.use('seaborn-muted')
import os
import joblib
import csv

## Simple user interaction

In the next code cell, the user is asked for the name of the folder (the "index folder") that holds the music files. For this demonstration, the sample folder "input_files1" is used. It contains a "tracks" folder with 300 random music files (borrowed from another dataset for this demonstration).

In [4]:
user_folder = input('Type in the index folder of the folder that contains the files you want to classify. It must be placed inside the folder "New_tracks/", and the tracks must be placed inside another folder called "tracks". To just test this notebook with the sample folder, type in "input_files1"')

# Create a CSV file listing every file found in the folder of mp3s
            
with open('New_tracks/' + user_folder + '/track_id.csv', 'w', newline='') as f:
    w = csv.writer(f)
    for path, dirs, files in os.walk('New_tracks/' + user_folder + '/tracks/'):
        for filename in files:
            w.writerow([filename])
            
# skiprows=1 drops the .DS_Store file, which shows up as the first row of the listing.
header_name = ['track_id']
mycsv = pd.read_csv('New_tracks/' + user_folder + '/track_id.csv', header=None, skiprows=1, names=header_name)

mycsv

Type in the index folder of the folder that contains the files you want to classify. It must be placed inside the folder "New_tracks/", and the tracks must be placed inside another folder called "tracks". To just test this notebook with the sample folder, type in "input_files1"input_files1


Unnamed: 0,track_id
0,400.mp3
1,401.mp3
2,402.mp3
3,403.mp3
4,404.mp3
...,...
295,695.mp3
296,696.mp3
297,697.mp3
298,698.mp3
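
Skipping the first row only works when `.DS_Store` really is the first entry; on a system without that file, the first track would be dropped instead. A minimal sketch of a more defensive listing is given here, assuming the tracks sit directly inside the `tracks` folder. The helper name `list_tracks` and the `AUDIO_EXTENSIONS` tuple are illustrative and not part of the original notebook.

```python
import os

# Assumed set of audio extensions; adjust to the formats you actually use.
AUDIO_EXTENSIONS = ('.mp3', '.wav', '.flac', '.ogg')

def list_tracks(tracks_dir):
    """Return the sorted audio file names found directly inside tracks_dir."""
    names = []
    for name in os.listdir(tracks_dir):
        full_path = os.path.join(tracks_dir, name)
        is_hidden = name.startswith('.')  # e.g. .DS_Store on macOS
        is_audio = name.lower().endswith(AUDIO_EXTENSIONS)
        if os.path.isfile(full_path) and not is_hidden and is_audio:
            names.append(name)
    return sorted(names)
```

With this helper, the table of tracks could be built as `pd.DataFrame({'track_id': list_tracks('New_tracks/' + user_folder + '/tracks/')})`, without the intermediate CSV. Unlike `os.walk`, `os.listdir` does not descend into subfolders.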


## Extracting features

The next code cell extracts features from all the new music files. This can be time-consuming, depending on how many files you want to classify. All parameters must match those used in the original notebook (ACX_algorithm.ipynb).

In [6]:
sr = 22050


def lin_interp_2d(data, out_size):
    
    x_in_size = data.shape[1]
    y_in_size = data.shape[0]
    x_in = np.arange(0,x_in_size)
    y_in = np.arange(0,y_in_size)
    # NOTE: interp2d was removed in SciPy 1.14, so this cell needs an older SciPy release
    interpolator = scipy.interpolate.interp2d(x_in, y_in, data, kind='linear')
    x_out = np.arange(0,x_in_size-1,((x_in_size-1)/out_size[1]))
    y_out = np.arange(0,y_in_size-1,((y_in_size-1)/out_size[0]))
    output = interpolator(x_out, y_out)
    output = output[0:out_size[0],0:out_size[1]]
    
    return output

def extract_features(filenames, sr):

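    # Load 30 s of audio starting 2 s in, resampled to sr and mixed down to mono.
    # Keyword arguments are used because librosa 0.10+ no longer accepts these positionally.
    signal, sr = librosa.load(filenames, sr=sr, mono=True, offset=2, duration=30)

    temp0 = librosa.feature.tempogram(y=signal, sr=sr)
    temp1 = librosa.feature.mfcc(y=signal, sr=sr)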

    melspect0 = lin_interp_2d(temp0, (15, 10))
    melspect1 = lin_interp_2d(temp1, (30, 10))
    
    feature0 = melspect0.flatten()
    feature1 = melspect1.flatten()

    output = np.append(feature0, feature1)
    return output


#Loading files in dataset.

features = np.zeros((len(mycsv.index), 450))  # 450 features per track: 15*10 tempogram values + 30*10 MFCC values
classes = ['calmness', 'joyful_activation', 'nostalgia', 'power', 'tenderness', 'solemnity', 'sadness', 'tension', 'amazement']

import warnings # ignoring warning about PySoundFile failed. Trying audioread instead
warnings.filterwarnings('ignore')

print('processing .....')
for i, row in mycsv.iterrows():
    #print('processing',row['track_id'])
    features[i,:] = extract_features('New_tracks/' + user_folder + '/tracks/'+row['track_id'], sr)

print('Done!')

processing .....
Done!
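
To make the feature count concrete, the sketch below resizes two placeholder arrays to the same target sizes, (15, 10) and (30, 10), and concatenates them into one 450-value vector. It is an illustration only: `resize_linear` is a hypothetical NumPy stand-in for `lin_interp_2d`, it has not been checked to reproduce `interp2d` numerically, and the input shapes are assumptions chosen to resemble a tempogram and an MFCC matrix. It should not replace the extraction code, whose output must match what the model was trained on.

```python
import numpy as np

def resize_linear(data, out_size):
    """Resize a 2-D array to out_size = (rows, cols) by separable linear interpolation."""
    n_rows, n_cols = data.shape
    # Sample positions: out_size points spread over [0, n - 1), like the grid in lin_interp_2d
    y_out = np.arange(out_size[0]) * (n_rows - 1) / out_size[0]
    x_out = np.arange(out_size[1]) * (n_cols - 1) / out_size[1]
    # Interpolate along the column (time) axis for every row ...
    tmp = np.stack([np.interp(x_out, np.arange(n_cols), row) for row in data])
    # ... then along the row axis for every resulting column
    return np.stack([np.interp(y_out, np.arange(n_rows), col) for col in tmp.T], axis=1)

rng = np.random.default_rng(0)
fake_tempogram = rng.random((384, 1292))  # assumed shape, random placeholder values
fake_mfcc = rng.random((20, 1292))        # assumed shape, random placeholder values

feature_vector = np.append(
    resize_linear(fake_tempogram, (15, 10)).flatten(),  # 150 values
    resize_linear(fake_mfcc, (30, 10)).flatten(),       # 300 values
)
print(feature_vector.shape)  # (450,)
```

Every track is reduced to the same fixed-length vector regardless of its duration, which is what lets the rows of `features` be stacked into one matrix.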


## Applying LDA on the features and predicting labels 

Just as in the original notebook (ACX_algorithm.ipynb), Linear Discriminant Analysis (LDA) must be applied to the features: every transformation applied to the original dataset must also be applied to the newly imported one. The fitted LDA and the classifier are restored from the files saved by the original notebook, and the classifier then predicts an emotion tag for each new music file.


In [7]:
## LDA on the new dataset

restored_lda = joblib.load("projected.features")
projected_features = restored_lda.transform(features)

#restoring the classifier model from file
restored_svm = joblib.load("magic.SVM")

#applying the restored classifier model to the new data
lab_predict =  restored_svm.predict(projected_features)

#Collecting the emotion tags from the numbers array
def num2word(num):
    return classes[num]

newlist = (np.array(list(map(num2word, lab_predict.astype(int)))))

#converting the numpy array to a list
new_emotiontags = newlist.tolist()
#print(new_emotiontags)
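
The save-and-restore round trip can be sketched on synthetic data as follows. This is not the content of ACX_algorithm.ipynb: it assumes the saved objects are scikit-learn's `LinearDiscriminantAnalysis` and `SVC`, and the file names, the random data, and the reduced feature count are all placeholders.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import SVC

classes = ['calmness', 'joyful_activation', 'nostalgia', 'power', 'tenderness',
           'solemnity', 'sadness', 'tension', 'amazement']

# Synthetic training set: 20 samples per class, 40 made-up features each
rng = np.random.default_rng(0)
n_per_class, n_features = 20, 40
y_train = np.repeat(np.arange(len(classes)), n_per_class)
# A class-dependent offset makes the toy classes separable
X_train = rng.normal(size=(len(y_train), n_features)) + y_train[:, None]

# Fit LDA, then train the classifier on the projected features
lda = LinearDiscriminantAnalysis()
projected_train = lda.fit_transform(X_train, y_train)  # at most n_classes - 1 = 8 columns
svm = SVC().fit(projected_train, y_train)

# Save both fitted objects, then restore them as the classification notebook would
with tempfile.TemporaryDirectory() as tmp:
    lda_path = os.path.join(tmp, 'lda.joblib')  # hypothetical file name
    svm_path = os.path.join(tmp, 'svm.joblib')  # hypothetical file name
    joblib.dump(lda, lda_path)
    joblib.dump(svm, svm_path)
    restored_lda = joblib.load(lda_path)
    restored_svm = joblib.load(svm_path)

# New, unlabeled samples go through the same projection before prediction
X_new = rng.normal(size=(5, n_features)) + 3
predicted = restored_svm.predict(restored_lda.transform(X_new))
tags = [classes[int(i)] for i in predicted]
print(tags)
```

The restored LDA is only used with `transform`, never refitted, so new tracks are projected into the same space the classifier was trained in.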


## Exporting the tags to a CSV file

The list of music files, together with the emotion tags, is then exported to a CSV file, which can be opened in Excel.

In [8]:
#Creating the new csv file with emotion tags
mycsv.to_csv('New_tracks/' + user_folder + '/input_files1.csv', index=False) 

df = pd.read_csv('New_tracks/' + user_folder + '/input_files1.csv')
df["emotions"] = new_emotiontags
df.to_csv('New_tracks/' + user_folder + '/input_files1.csv', index=False)

df

Unnamed: 0,track_id,emotions
0,400.mp3,amazement
1,401.mp3,calmness
2,402.mp3,amazement
3,403.mp3,calmness
4,404.mp3,nostalgia
...,...,...
295,695.mp3,sadness
296,696.mp3,joyful_activation
297,697.mp3,joyful_activation
298,698.mp3,sadness
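
The export step can also be written as a single pass that builds the table in memory and writes it once. The sketch below assumes the track names and tags are plain Python lists of equal length; the helper `export_tags` and the output file name are illustrative and not part of the original notebook.

```python
import os
import tempfile

import pandas as pd

def export_tags(track_ids, emotion_tags, out_path):
    """Write one row per track with its predicted emotion tag and return the table."""
    if len(track_ids) != len(emotion_tags):
        raise ValueError('track_ids and emotion_tags must have the same length')
    df = pd.DataFrame({'track_id': track_ids, 'emotions': emotion_tags})
    df.to_csv(out_path, index=False)
    return df

# Small demonstration in a temporary folder, using two rows from the output above
with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, 'emotion_tags.csv')  # hypothetical output name
    exported = export_tags(['400.mp3', '401.mp3'], ['amazement', 'calmness'], out_path)
    round_trip = pd.read_csv(out_path)

print(round_trip)
```

In the notebook, a call such as `export_tags(mycsv['track_id'].tolist(), new_emotiontags, 'New_tracks/' + user_folder + '/' + user_folder + '.csv')` would name the output after the chosen folder instead of always using `input_files1.csv`.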
