
AMontgomerie/genre_classifier


The goal of this project was to classify music genres from melspectrograms of tracks using neural networks. There are two datasets and two models. A more detailed report of the project can be found here.

Datasets

There are two datasets, both made up of 128x1292 melspectrogram PNGs. They were created by downloading 30-second song samples through the Spotify API and converting each sample to a melspectrogram with librosa.
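As a sanity check on the 128x1292 size: `librosa.load` resamples to 22,050 Hz by default, and `librosa.feature.melspectrogram` defaults to `n_mels=128`, `hop_length=512`, and centred framing. Assuming the project kept those defaults (the README does not say), a 30-second clip works out to exactly 1292 frames. The helper below is a hypothetical illustration of that arithmetic, not code from this repository.

```python
from typing import Tuple


def mel_spectrogram_shape(duration_s: float, sr: int = 22050,
                          hop_length: int = 512, n_mels: int = 128) -> Tuple[int, int]:
    """Return (n_mels, n_frames) for a clip, assuming librosa-style centred framing."""
    n_samples = int(duration_s * sr)
    # With centring, the signal is padded by n_fft // 2 on each side,
    # so the frame count reduces to 1 + n_samples // hop_length.
    n_frames = 1 + n_samples // hop_length
    return n_mels, n_frames


print(mel_spectrogram_shape(30))  # (128, 1292)
```

Any clip shorter than 30 seconds would produce fewer frames under the same settings, so a fixed image width presumably relies on the samples being full length.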

The first dataset contains about 40,000 melspectrogram images covering 10 genres: pop, rock, rap, metal, house, R&B, classical, techno, jazz, and folk. The second dataset contains over 100,000 spectrograms covering a range of heavy metal, punk, and hardcore subgenres. It also includes additional tabular track data from the Spotify API, such as duration, mode, and key.

The second dataset is harder to classify because it has more classes and the classes are much more similar to each other.

Each melspectrogram is named after its track ID as given by the Spotify API. The labels for the first dataset are in the corresponding CSV file. The labels for the metal subgenre dataset are stored alongside the tabular data in metal_track_data.csv, inside the zip file for the second dataset.
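Because each image is named after its track ID, pairing images with labels is a lookup keyed on the file stem. A minimal sketch, assuming hypothetical column names `track_id` and `genre` (check the real CSV headers) and file names of the form `<track_id>.png`:

```python
import csv
import io
from pathlib import Path
from typing import Dict, Iterable, List, TextIO, Tuple


def load_labels(csv_file: TextIO, id_col: str = "track_id",
                label_col: str = "genre") -> Dict[str, str]:
    """Map each track ID to its label. The column names are assumptions."""
    return {row[id_col]: row[label_col] for row in csv.DictReader(csv_file)}


def pair_images_with_labels(image_paths: Iterable[str],
                            labels: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (path, label) pairs, skipping images that have no label."""
    pairs = []
    for path in image_paths:
        track_id = Path(path).stem  # "abc123.png" -> "abc123"
        if track_id in labels:
            pairs.append((path, labels[track_id]))
    return pairs


# Toy data with made-up IDs, just to exercise the functions.
toy_csv = io.StringIO("track_id,genre\nabc123,jazz\ndef456,metal\n")
labels = load_labels(toy_csv)
pairs = pair_images_with_labels(["specs/abc123.png", "specs/zzz999.png"], labels)
print(pairs)  # [('specs/abc123.png', 'jazz')]
```

For the second dataset, the same `load_labels` call could be pointed at metal_track_data.csv with `label_col` set to whatever that file's subgenre column is actually called.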

The datasets can be downloaded here:

Models

genre_classifier

A neural network that uses convolutional and recurrent layers to classify the genres of melspectrograms from the first dataset.

The network architecture is based on Convolutional Recurrent Neural Networks for Music Classification by Keunwoo Choi et al. This CRNN architecture takes an image as input and passes it through 4 convolutional layers. The output then goes through a 2-layer GRU and finally a softmax. The network achieves 80% top-1 accuracy and 98% top-5 accuracy.
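In CRNN designs like Choi et al.'s, the convolutional blocks shrink the frequency axis so that what remains can be read as a sequence along time for the recurrent layers. The sketch below traces shapes through four conv blocks on a 128x1292 input, assuming "same" padding (so only max-pooling changes the size). The pool sizes and channel counts are hypothetical choices, not values taken from this repository.

```python
from typing import List, Tuple

# Hypothetical (freq_pool, time_pool) and output channels for each conv block.
POOLS: List[Tuple[int, int]] = [(2, 2), (4, 4), (4, 4), (4, 4)]
CHANNELS: List[int] = [64, 128, 128, 128]


def trace_crnn_shapes(n_mels: int, n_frames: int) -> List[Tuple[int, int, int]]:
    """Return (channels, freq, time) after each conv + max-pool block.

    'Same' padding is assumed, so each conv keeps freq/time unchanged and
    only the floor-dividing max-pool shrinks them.
    """
    shapes = []
    freq, time = n_mels, n_frames
    for (f_pool, t_pool), ch in zip(POOLS, CHANNELS):
        freq, time = freq // f_pool, time // t_pool
        shapes.append((ch, freq, time))
    return shapes


shapes = trace_crnn_shapes(128, 1292)
for s in shapes:
    print(s)
# Once the frequency axis reaches 1, each remaining time step is one
# feature vector of length channels * freq, fed to the GRU in order.
ch, freq, time = shapes[-1]
print(f"GRU input: {time} steps x {ch * freq} features")  # GRU input: 10 steps x 128 features
```

With these assumed pools the shapes go (64, 64, 646), (128, 16, 161), (128, 4, 40), (128, 1, 10); the last GRU hidden state would then be mapped to the 10 genre scores before the softmax.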

metal_subgenre_classifier

This model uses the same CRNN architecture as genre_classifier. In addition, a 2-layer fully connected network takes the tabular track data as input, and its output is concatenated with the output of the GRU before being passed through a softmax.
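The fusion step can be sketched in plain Python: a 2-layer fully connected network embeds the tabular features, its output is concatenated with the GRU output, and the result is turned into class probabilities. The layer sizes, the untrained random weights, the feature scaling, and the linear layer placed before the softmax (needed to get one score per class) are all illustrative assumptions; the README only says the two outputs are concatenated and passed through a softmax.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def dense(x: Vector, w: Matrix, b: Vector) -> Vector:
    """Fully connected layer: one output per row of w."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def softmax(x: Vector) -> Vector:
    m = max(x)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def random_layer(n_out: int, n_in: int, rng: random.Random) -> Tuple[Matrix, Vector]:
    """Untrained stand-in weights; a real model would learn these."""
    w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out


def fused_head(gru_out: Vector, tabular: Vector, n_classes: int,
               rng: random.Random) -> Vector:
    # Hypothetical 2-layer tabular branch: len(tabular) -> 16 -> 16.
    w1, b1 = random_layer(16, len(tabular), rng)
    w2, b2 = random_layer(16, 16, rng)
    tab_embedding = relu(dense(relu(dense(tabular, w1, b1)), w2, b2))
    fused = gru_out + tab_embedding  # concatenation of the two branches
    w_out, b_out = random_layer(n_classes, len(fused), rng)
    return softmax(dense(fused, w_out, b_out))


rng = random.Random(0)
gru_out = [rng.uniform(-1, 1) for _ in range(32)]  # stand-in for the GRU output
tabular = [0.4, 1.0, 0.3]  # e.g. duration, mode, key after a hypothetical scaling
probs = fused_head(gru_out, tabular, n_classes=5, rng=rng)
print(len(probs), round(sum(probs), 6))  # 5 1.0
```

Concatenation lets the final layer weigh audio and metadata evidence jointly, which is one plausible reason the tabular data gives a small accuracy gain.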

Adding the tabular data slightly improves the model's accuracy. However, its final accuracy is only 62% top-1 (92% top-5), lower than the first model. This is probably because similar subgenres are harder to tell apart than more distinct genre categories.
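Top-1 and top-5 accuracy count a prediction as correct if the true class is the single highest-scoring class, or among the five highest-scoring classes, respectively. A minimal stdlib sketch of the metric (a generic illustration, not code from this repository):

```python
from typing import Sequence


def top_k_accuracy(scores: Sequence[Sequence[float]],
                   targets: Sequence[int], k: int) -> float:
    """Fraction of samples whose true class index is among the k highest scores."""
    hits = 0
    for row, target in zip(scores, targets):
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += target in top_k
    return hits / len(targets)


scores = [
    [0.10, 0.60, 0.20, 0.05, 0.05],  # true class 1 is ranked 1st
    [0.30, 0.25, 0.20, 0.15, 0.10],  # true class 2 is ranked 3rd
    [0.05, 0.10, 0.15, 0.30, 0.40],  # true class 0 is ranked 5th
]
targets = [1, 2, 0]
print(top_k_accuracy(scores, targets, k=1))  # 0.3333333333333333
print(top_k_accuracy(scores, targets, k=3))  # 0.6666666666666666
```

With only 10 genres in the first dataset, top-5 is a much looser criterion than top-1, which is worth keeping in mind when reading the top-5 figures.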

About

A convolutional neural network which classifies music genres based on spectrograms.
