# Music-Genre Classification with KNN

## Goal

This project develops a machine learning model, a k-nearest neighbors (KNN) classifier, that infers the genre of a music track from its audio file.

The classifier works on low-level features of the audio files, derived from their *frequency* and *time domain* representations.

Training the model requires audio tracks of similar length and similar frequency range. The **GTZAN** genre classification dataset suits this purpose, since it was collected specifically for this task.

## Data

The GTZAN dataset was collected in 2000-2001 and is named after its curator, George Tzanetakis. More here: http://marsyas.info/downloads/datasets.html. The dataset consists of 1000 audio tracks, each 30 seconds long. It contains 10 genres, each represented by 100 tracks. All tracks are 22050 Hz, mono, 16-bit audio files in .wav format. The collection is about 1.2 gigabytes in size. The genres are:

- Blues
- Classical
- Country
- Disco
- Hiphop
- Jazz
- Metal
- Pop
- Reggae
- Rock
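
The stated audio format can be checked per file before any features are extracted. The following is a minimal sketch using Python's standard `wave` module. The helper name `describe_wav` and the synthetic one-second tone are illustrative assumptions, so that the example runs without the dataset.

```python
import math
import os
import struct
import tempfile
import wave


def describe_wav(path):
    """Return (sample_rate_hz, channels, bits_per_sample, duration_s) of a PCM .wav file."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        channels = w.getnchannels()
        bits = 8 * w.getsampwidth()
        duration = w.getnframes() / rate
    return rate, channels, bits, duration


# Write a synthetic 1-second, 440 Hz tone in the GTZAN format
# (22050 Hz, mono, 16-bit) as a stand-in for a real track.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo_path, "wb") as w:
    w.setnchannels(1)        # mono
    w.setsampwidth(2)        # 2 bytes = 16 bit
    w.setframerate(22050)
    samples = [int(16383 * math.sin(2 * math.pi * 440 * n / 22050)) for n in range(22050)]
    w.writeframes(struct.pack("<%dh" % len(samples), *samples))

wav_info = describe_wav(demo_path)
print(wav_info)
```

Files that report a different rate, channel count, or bit depth would need converting before they are compared with GTZAN tracks.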

## Feature Extraction with Mel Frequency Cepstral Coefficients (MFCC)

The classification is based exclusively on MFCC features. They were designed for speech, where the goal is to identify the linguistic content and discard noise; here they serve as a compact summary of how each track sounds.

MFCCs are widely used features in automatic speech and speaker recognition. They are computed in the following steps:

1. Since audio signals change constantly, divide the signal into short frames of around 20-40 ms each.
2. Identify the frequencies present in each frame.
3. Separate linguistic frequencies from noise.
4. To discard the noise, take the Discrete Cosine Transform (DCT) of these frequencies and keep only the sequence of coefficients that has a high probability of carrying information.
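
The steps above can be sketched in plain NumPy. This is a simplified illustration, not the implementation of `python_speech_features`: it skips pre-emphasis, liftering, and scaling, and it interprets step 3 as a triangular mel filterbank followed by a logarithm. The function name `simple_mfcc` and its default parameters (26 filters, 13 coefficients, 512-point FFT) are assumptions chosen for the example.

```python
import numpy as np


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def simple_mfcc(signal, rate, frame_len_s=0.020, frame_step_s=0.010,
                n_filters=26, n_coeffs=13, n_fft=512):
    # Step 1: cut the signal into short, overlapping, windowed frames.
    frame_len = int(round(frame_len_s * rate))
    frame_step = int(frame_step_s * rate)
    n_frames = 1 + (len(signal) - frame_len) // frame_step
    idx = np.arange(frame_len)[None, :] + frame_step * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)

    # Step 2: power spectrum of each frame (frequencies present in the frame).
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft

    # Step 3: triangular filters spaced evenly on the mel scale, then log energies.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, centre):          # rising edge of the triangle
            fbank[m - 1, k] = (k - left) / (centre - left)
        for k in range(centre, right):         # falling edge of the triangle
            fbank[m - 1, k] = (right - k) / (right - centre)
    log_energy = np.log(power @ fbank.T + 1e-10)

    # Step 4: DCT-II over the filter axis; keep only the first n_coeffs coefficients.
    n = np.arange(n_filters)
    dct_basis = np.cos(np.pi * np.outer(np.arange(n_coeffs), 2 * n + 1) / (2 * n_filters))
    return log_energy @ dct_basis.T


# One second of a 440 Hz tone at the GTZAN sample rate as a stand-in signal.
t = np.arange(22050) / 22050.0
demo_features = simple_mfcc(np.sin(2 * np.pi * 440.0 * t), 22050)
print(demo_features.shape)
```

The result has one row per frame and one column per coefficient, which is the same layout that `mfcc()` returns in Step 2 of this notebook.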

## ML Methods

We will use the K-nearest neighbors (KNN) algorithm because it has shown the best results for this problem.
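
To make the idea concrete, here is a minimal toy sketch of KNN on made-up two-dimensional points with plain Euclidean distance. The function name `knn_predict` and the data are illustrative assumptions; the notebook itself uses a different distance on MFCC statistics.

```python
from collections import Counter

import numpy as np


def knn_predict(train_points, train_labels, query, k=3):
    # Euclidean distance from the query to every training point.
    dists = np.linalg.norm(train_points - query, axis=1)
    # Indices of the k closest training points.
    nearest = np.argsort(dists)[:k]
    # The most frequent label among those neighbors is the prediction.
    return Counter(train_labels[i] for i in nearest).most_common(1)[0][0]


# Two made-up clusters standing in for two genres.
train_points = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                         [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]])
train_labels = ["jazz", "jazz", "jazz", "metal", "metal", "metal"]

toy_prediction = knn_predict(train_points, train_labels, np.array([0.15, 0.1]), k=3)
print(toy_prediction)
```

KNN has no training phase in the usual sense: the "model" is the stored training set, and all the work happens at prediction time.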

## Samples for Prediction

Music samples are available here: https://freemusicarchive.org/ 
The tracks are tagged by genre, but most artists categorize their work as multi-genre, so almost no track belongs unequivocally to a single genre.

Most audio is available as mp3 today. Use the following web app to convert it to wav:
https://audio.online-convert.com/convert-to-wav

## Imports

In [1]:
from python_speech_features import mfcc
import scipy.io.wavfile as wav
import numpy as np
from tempfile import TemporaryFile
import os
import pickle
import random 
import operator
import math
import wget
import tarfile

## Step 1: Download GTZAN and Unzip

In [2]:
url = 'http://opihi.cs.uvic.ca/sound/genres.tar.gz'
out_path = f'{os.getcwd()}\\Data\\genres.tar.gz'

if not os.path.exists(out_path):
    wget.download(url, out=out_path)
    print("Download finished")
else:
    print("File exists")

File exists


In [3]:
GTZANtar = tarfile.open(out_path)
GTZANtar.extractall(f'{os.getcwd()}\\Data\\')
GTZANtar.close()
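
The paths in this notebook are written with Windows separators (`\\`). On other systems, a sketch like the following could build the same layout portably. The temporary `base_dir` is a stand-in assumption for `os.getcwd()`, and the variable names are only suggestions.

```python
import os
import tempfile

# Stand-in for os.getcwd(), so the example does not touch the working directory.
base_dir = tempfile.mkdtemp()

# Create the Data folder up front so that the download target exists.
data_dir = os.path.join(base_dir, "Data")
os.makedirs(data_dir, exist_ok=True)

# os.path.join inserts the separator that is correct for the current platform.
out_path = os.path.join(data_dir, "genres.tar.gz")
genres_dir = os.path.join(data_dir, "genres")
features_path = os.path.join(data_dir, "genre_features.dat")
print(out_path)
```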

## Step 2: Extract features, save to .dat

In [4]:
directory = f'{os.getcwd()}\\Data\\genres'
f = open(f'{os.getcwd()}\\Data\\genre_features.dat' ,'wb') # dat file to be written
i = 0

for folder in [name for name in os.listdir(directory) if os.path.isdir(os.path.join(directory, name))]: # iterate over genre folders only
    i += 1
    if i == 11:
        break
    for file in os.listdir(os.path.join(directory, folder)): # iterate over files in each genre folder
        (rate,sig) =  wav.read(os.path.join(directory, folder, file))
        mfcc_feat =   mfcc(sig,rate,winlen=0.020,appendEnergy = False)
        covariance =  np.cov(np.matrix.transpose(mfcc_feat))
        mean_matrix = mfcc_feat.mean(0)
        feature =     (mean_matrix,covariance,i)
        pickle.dump(feature,f)
f.close()
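
Each track is thus reduced to a mean vector and a covariance matrix of its MFCC frames, plus the genre index. A small sketch of the shapes involved, using random numbers as a stand-in for the output of `mfcc()` (the frame count of 3000 is an arbitrary assumption):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for mfcc(): one row per frame, 13 coefficients per row.
mfcc_feat = rng.normal(size=(3000, 13))

# Average of every coefficient over all frames.
mean_vector = mfcc_feat.mean(axis=0)

# np.cov expects one variable per row, hence the transpose.
covariance = np.cov(mfcc_feat.T)

print(mean_vector.shape, covariance.shape)
```

Whatever the length of a track, its summary always has the same size, which is what makes tracks comparable in the distance function of Step 4.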

## Step 3: Split into Train and Test

In [5]:
loaded_ds = []

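def loadDataset(filename, split, trSet, teSet):
    # Read pickled (mean, covariance, label) tuples until the end of the file.
    with open(filename, 'rb') as f:
        while True:
            try:
                loaded_ds.append(pickle.load(f))
            except EOFError:
                break  # the with-block closes the file
    # Send each track to the training set with probability `split`, else to the test set.
    for feature in loaded_ds:
        if random.random() < split:
            trSet.append(feature)
        else:
            teSet.append(feature)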

In [6]:
trainingSet = []
testSet = []
split = 0.66

loadDataset(filename=f'{os.getcwd()}\\Data\\genre_features.dat', split=split, trSet=trainingSet, teSet=testSet)
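
Because every track is assigned by an independent random draw, the split is only approximately 66/34 and changes on every run. A minimal sketch of a reproducible variant follows. The function `split_dataset` and the seed value are assumptions for illustration and not part of the notebook.

```python
import random


def split_dataset(items, split, seed=None):
    # A private generator keeps the split repeatable without touching the global seed.
    rng = random.Random(seed)
    train, test = [], []
    for item in items:
        (train if rng.random() < split else test).append(item)
    return train, test


# Integers stand in for the 1000 feature tuples.
items = list(range(1000))
train_a, test_a = split_dataset(items, 0.66, seed=42)
train_b, test_b = split_dataset(items, 0.66, seed=42)

print(len(train_a), len(test_a), train_a == train_b)
```

With a fixed seed, the accuracy reported in Step 5 would be comparable across runs.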

## Step 4: Helper Functions for KNN

Compute the distance between the features of two tracks

In [7]:
def distance(instance1, instance2, k):
    distance = 0 
    mm1 = instance1[0] 
    cm1 = instance1[1]
    mm2 = instance2[0]
    cm2 = instance2[1]
    
    distance = np.trace(np.dot(np.linalg.inv(cm2), cm1)) 
    distance += (np.dot(np.dot((mm2-mm1).transpose() , np.linalg.inv(cm2)) , mm2-mm1 )) 
    distance += np.log(np.linalg.det(cm2)) - np.log(np.linalg.det(cm1))
    distance -= k

    return distance
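
As far as I can tell, this function follows the form of the Kullback-Leibler divergence between two multivariate Gaussians, one fitted to the MFCC frames of each track, up to a factor of two. As a reference sketch, with $\mu_i$ and $\Sigma_i$ the mean vector and covariance matrix of track $i$ and $d$ the number of MFCC coefficients:

$$
D_{\mathrm{KL}}(\mathcal{N}_1 \,\|\, \mathcal{N}_2) = \frac{1}{2}\left[\operatorname{tr}\left(\Sigma_2^{-1}\Sigma_1\right) + (\mu_2-\mu_1)^{\top}\Sigma_2^{-1}(\mu_2-\mu_1) - d + \ln\frac{\det\Sigma_2}{\det\Sigma_1}\right]
$$

Note that the code subtracts `k`, the number of neighbors, where the formula has $d$. For a fixed `k` this is a constant offset on every distance, so it does not change which neighbors are nearest.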

Identify the nearest neighbors

In [8]:
def getNeighbors(trainingSet, instance, k):
    distances = []

    for x in range(len(trainingSet)):
        dist = distance(trainingSet[x], instance, k) + distance(instance, trainingSet[x], k)
        distances.append((trainingSet[x][2], dist))

    distances.sort(key = operator.itemgetter(1))

    neighbors = []

    for x in range(k):
        neighbors.append(distances[x][0])

    return neighbors
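
`getNeighbors` adds the distance in both directions because the distance on its own is not symmetric. The following sketch checks two properties numerically on random stand-in features. The helper `random_instance` is hypothetical, and `distance` is repeated here in condensed form so that the example runs on its own.

```python
import numpy as np


def distance(instance1, instance2, k):
    # Condensed restatement of the notebook's distance function.
    mm1, cm1 = instance1[0], instance1[1]
    mm2, cm2 = instance2[0], instance2[1]
    inv_cm2 = np.linalg.inv(cm2)
    d = np.trace(inv_cm2 @ cm1)
    d += (mm2 - mm1) @ inv_cm2 @ (mm2 - mm1)
    d += np.log(np.linalg.det(cm2)) - np.log(np.linalg.det(cm1))
    return d - k


def random_instance(rng, dim=13, frames=200):
    # Random frames standing in for the MFCCs of one track.
    feats = rng.normal(size=(frames, dim))
    return feats.mean(axis=0), np.cov(feats.T), 0


rng = np.random.default_rng(1)
a, b = random_instance(rng), random_instance(rng)

forward, backward = distance(a, b, 5), distance(b, a, 5)
symmetric = forward + backward       # what getNeighbors ranks by
self_dist_d = distance(a, a, 13)     # offset equal to the feature dimension
self_dist_k = distance(a, a, 5)      # offset equal to k=5, as in the notebook
print(forward, backward, symmetric, self_dist_d, self_dist_k)
```

A track compared with itself scores zero only when the subtracted constant equals the number of coefficients (13 here). With `k=5` the score is shifted by a constant, which leaves the ordering of neighbors unchanged.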

Identify the most common class among the neighbors

In [9]:
def nearestClass(neighbors):
    classVote = {}

    for x in range(len(neighbors)):
        response = neighbors[x]
        if response in classVote:
            classVote[response]+=1 
        else:
            classVote[response]=1

    sorter = sorted(classVote.items(), key = operator.itemgetter(1), reverse=True)

    return sorter[0][0]
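
The same majority vote can be written more compactly with `collections.Counter` from the standard library. This is an equivalent sketch, not the notebook's code; the name `nearest_class` and the sample labels are made up.

```python
from collections import Counter


def nearest_class(neighbors):
    # most_common(1) returns a list holding the single (label, count) pair
    # with the highest count.
    return Counter(neighbors).most_common(1)[0][0]


# Genre indices of six hypothetical neighbors: label 3 appears three times.
vote = nearest_class([3, 7, 3, 1, 7, 3])
print(vote)
```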

Model evaluation

In [10]:
def getAccuracy(testSet, predictions):
    correct = 0

    for x in range (len(testSet)):
        if testSet[x][-1]==predictions[x]:
            correct+=1

    return 1.0*correct/len(testSet)

## Step 5: Run KNN on the test set and get the accuracy

In [11]:
length_test = len(testSet)
predictions = []

for x in range(length_test):
    predictions.append(nearestClass(getNeighbors(trainingSet=trainingSet, instance=testSet[x], k=5))) 

test_accuracy = getAccuracy(testSet, predictions)
print("Test Accuracy: ", test_accuracy)

Test Accuracy:  0.6422287390029325
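
A single accuracy figure hides which genres get confused with each other. A confusion matrix is one way to look closer. The sketch below uses made-up labels, and the function `confusion_matrix` is an assumption for illustration, not part of the notebook.

```python
import numpy as np


def confusion_matrix(true_labels, predicted_labels, n_classes):
    # Rows are true genres, columns are predicted genres.
    # Labels run from 1 to n_classes, as in this notebook.
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(true_labels, predicted_labels):
        matrix[t - 1, p - 1] += 1
    return matrix


# Made-up labels for three genres, two tracks each.
true_demo = [1, 1, 2, 2, 3, 3]
pred_demo = [1, 2, 2, 2, 3, 1]

matrix_demo = confusion_matrix(true_demo, pred_demo, n_classes=3)
per_genre_recall = matrix_demo.diagonal() / matrix_demo.sum(axis=1)
overall_accuracy = matrix_demo.trace() / matrix_demo.sum()
print(matrix_demo)
print(per_genre_recall, overall_accuracy)
```

With the notebook's variables, the call would presumably be `confusion_matrix([t[-1] for t in testSet], predictions, 10)`.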


## Step 6: Predict genre on wav files

For convenience, I added the genre tag to each file name so that it can be compared with the prediction.

In [12]:
from collections import defaultdict
results = defaultdict(int)
i = 1

for folder in [name for name in os.listdir(directory) if os.path.isdir(os.path.join(directory, name))]:
    results[i]=folder
    i+=1
print(results)

defaultdict(<class 'int'>, {1: 'blues', 2: 'classical', 3: 'country', 4: 'disco', 5: 'hiphop', 6: 'jazz', 7: 'metal', 8: 'pop', 9: 'reggae', 10: 'rock'})


In [13]:
prediction_folder = f'{os.getcwd()}\\Data\\genres_predict\\'

for file in os.listdir(prediction_folder):
    (rate,sig) = wav.read(prediction_folder + file)

    # extract the features of the new file
    mfcc_feat = mfcc(sig,rate,winlen=0.020,appendEnergy=False)
    covariance = np.cov(np.matrix.transpose(mfcc_feat))
    mean_matrix = mfcc_feat.mean(0)
    feature = (mean_matrix,covariance,0)

    pred = nearestClass(getNeighbors(trainingSet=loaded_ds, instance=feature, k=5))
    print("File: " + file + '   ->   ' + results[pred])

File: Blues_Pierce_Murphy_-_02_-_Just_Give_It_Time.wav   ->   country
File: Blues_Pierce_Murphy_-_03_-_Try_To_Be_Nice.wav   ->   country
File: Country_Derek_Clegg_-_03_-_Peculiar.wav   ->   hiphop
File: Country_Thorn__Shout_-_06_-_Little_Demon.wav   ->   pop
File: Disco_Fhernando_-_07_-_Kiss_Me_Harder_Boy.wav   ->   hiphop
File: Disco_Miami_Slice_-_04_-_Solid_Gold.wav   ->   pop
File: Reggae_Dieumba__Bass_Culture_Players_-_01_-_Sin_Papeles.wav   ->   hiphop
File: Reggae_Negritage_ft_Jam_York_-_09_-_Stuck_ina_Babylon.wav   ->   pop
File: Rock_Cult_Fantastic_-_02_-_Animal.wav   ->   pop
File: Rock_JG_Hackett_-_01_-_Bootleg_Romanticism.wav   ->   pop
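
The comparison between tag and prediction can also be automated. The sketch below copies the results printed above into a dictionary and reads the tag from the part of the file name before the first underscore. The variable names are only suggestions.

```python
# File name -> predicted genre, copied from the output above.
predicted_by_file = {
    "Blues_Pierce_Murphy_-_02_-_Just_Give_It_Time.wav": "country",
    "Blues_Pierce_Murphy_-_03_-_Try_To_Be_Nice.wav": "country",
    "Country_Derek_Clegg_-_03_-_Peculiar.wav": "hiphop",
    "Country_Thorn__Shout_-_06_-_Little_Demon.wav": "pop",
    "Disco_Fhernando_-_07_-_Kiss_Me_Harder_Boy.wav": "hiphop",
    "Disco_Miami_Slice_-_04_-_Solid_Gold.wav": "pop",
    "Reggae_Dieumba__Bass_Culture_Players_-_01_-_Sin_Papeles.wav": "hiphop",
    "Reggae_Negritage_ft_Jam_York_-_09_-_Stuck_ina_Babylon.wav": "pop",
    "Rock_Cult_Fantastic_-_02_-_Animal.wav": "pop",
    "Rock_JG_Hackett_-_01_-_Bootleg_Romanticism.wav": "pop",
}

# The genre tag is the part of the file name before the first underscore.
matches = [name for name, pred in predicted_by_file.items()
           if name.split("_")[0].lower() == pred]

print(f"{len(matches)} of {len(predicted_by_file)} predictions match the genre tag")
```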


## Summary

As the output above shows, the predictions do not match the given genre tags: none of the ten tracks is assigned its tagged genre.

## An explanation

The tracks selected for prediction may be unrepresentative of the genres as the model learned them. The training set dates from 2000-2001, and the samples are about 20 years newer. Music genres have evolved: style, taste, and sound may have changed significantly in these genres, and the model does not generalize well enough to classify the new tracks.

Another issue: cross-genre tracks, as described by the artists on the website, carry traits of multiple genres, yet our model has to assign each of them to a single genre.

The prediction should be repeated with tracks from that period to test the model's predictive power.