This project attempts to classify music genres from melspectrograms of tracks using neural networks. It comprises two datasets and two models. A more detailed report of the project can be found here.

Datasets

There are two datasets, both made up of melspectrogram PNGs of size 128x1292. The melspectrograms were created by downloading 30-second song samples using the Spotify API and then converting the samples to melspectrograms using librosa.
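The 128x1292 image size is consistent with librosa's default settings, although the source does not state which parameters were actually used. The following is a minimal sketch, assuming a 22,050 Hz sample rate, a hop length of 512 samples, and 128 mel bands (all assumptions). The function names `expected_frames` and `clip_to_melspectrogram` are hypothetical and are not taken from the project code.

```python
def expected_frames(duration_s=30.0, sr=22050, hop_length=512):
    """Number of frames librosa produces with its default centered framing."""
    n_samples = int(duration_s * sr)
    return 1 + n_samples // hop_length


def clip_to_melspectrogram(path, sr=22050, n_mels=128, hop_length=512):
    """Load up to 30 s of audio and return a dB-scaled mel spectrogram."""
    # Imported here so the frame arithmetic above works without librosa installed.
    import librosa
    import numpy as np

    y, sr = librosa.load(path, sr=sr, duration=30.0)
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_mels=n_mels, hop_length=hop_length
    )
    # Convert power to decibels, the usual scaling before saving as an image.
    return librosa.power_to_db(mel, ref=np.max)  # shape: (n_mels, frames)


print(expected_frames())  # 1292
```

Under these assumed settings, a 30-second clip has 661,500 samples, which gives 1 + 661500 // 512 = 1292 frames, matching the image width described above.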

The first dataset contains about 40,000 melspectrogram images representing the following genres: pop, rock, rap, metal, house, r&b, classical, techno, jazz, and folk. The second dataset contains over 100,000 melspectrograms representing a range of heavy metal, punk, and hardcore subgenres. The second dataset also includes additional tabular track data acquired from the Spotify API, such as duration, mode, and key.

The second dataset is more difficult to classify because it has more classes and because those classes are much more similar to each other.

Each melspectrogram file is named after its track ID as given by the Spotify API. The labels for the first dataset can be found in the corresponding CSV file. The labels for the metal subgenre dataset are in the same file as the tabular data, metal_track_data.csv, inside the zip file for the second dataset.
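Because the images are named by track ID, labels can be joined to images by file name. The sketch below shows one way to do that with the standard library. The column names `track_id` and `genre` are assumptions, since the source does not describe the CSV layout, and the helper names are hypothetical.

```python
import csv
import io
from pathlib import Path


def load_labels(csv_file, id_column="track_id", label_column="genre"):
    """Map each track ID to its label. The column names are assumed."""
    reader = csv.DictReader(csv_file)
    return {row[id_column]: row[label_column] for row in reader}


def label_for_image(image_path, labels):
    """Use the file name without its extension as the track ID."""
    return labels.get(Path(image_path).stem)


# In-memory stand-in for a real label file, with made-up track IDs.
sample = io.StringIO("track_id,genre\nabc123,jazz\nxyz789,metal\n")
labels = load_labels(sample)
print(label_for_image("spectrograms/abc123.png", labels))  # jazz
```

With a real file, `load_labels` would be called on a handle opened with `open(path, newline="")`, adjusting the column names to match the actual header.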

The datasets can be downloaded here:

Models

genre_classifier

This is a neural network which uses convolutional and recurrent layers to classify the genres of melspectrograms from the first dataset.

The network architecture is based on Convolutional Recurrent Neural Networks For Music Classification by Keunwoo Choi et al. The CRNN architecture takes an image as input and passes it through 4 convolutional layers. The output is then passed through a 2-layer GRU and finally a softmax. The network achieves 80% top-1 accuracy and 98% top-5 accuracy.
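For readers unfamiliar with the two metrics, top-k accuracy counts a prediction as correct when the true class is among the k highest-scoring classes. The following is a small illustrative sketch with toy scores, not the project's evaluation code. The function name `top_k_accuracy` is hypothetical.

```python
def top_k_accuracy(scores, targets, k=1):
    """Fraction of samples whose true class is among the k highest scores."""
    hits = 0
    for row, target in zip(scores, targets):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if target in ranked[:k]:
            hits += 1
    return hits / len(targets)


# Toy example: 3 samples, 3 classes.
scores = [
    [0.1, 0.7, 0.2],  # best guess is class 1, true class is 1
    [0.5, 0.3, 0.2],  # best guess is class 0, true class is 1
    [0.2, 0.3, 0.5],  # best guess is class 2, true class is 0
]
targets = [1, 1, 0]
top1 = top_k_accuracy(scores, targets, k=1)
top2 = top_k_accuracy(scores, targets, k=2)
```

In this toy case only the first sample is correct at k=1, while the second also counts at k=2, which is why top-5 accuracy is always at least as high as top-1.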

metal_subgenre_classifier

This model is based on the same CRNN architecture as the genre_classifier. In addition, a 2-layer fully-connected network takes the tabular track data as input, and its output is concatenated with the output of the GRU before being passed through a softmax.
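The concatenation step can be illustrated in plain Python. This is only a sketch of the idea, not the project's implementation: it assumes a single linear layer maps the concatenated vector to class logits before the softmax, which the source does not specify, and the feature sizes, weights, and function names are made up for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fusion_head(gru_features, tabular_features, weights, biases):
    """Concatenate both feature vectors, apply one linear layer, then softmax."""
    fused = list(gru_features) + list(tabular_features)
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(logits)


# Toy sizes: 3 audio features, 2 tabular features, 2 classes.
gru_out = [0.5, -1.0, 0.25]   # stand-in for the final GRU output
tab_out = [1.0, 0.0]          # stand-in for the tabular network output
weights = [
    [1.0, 0.0, 0.0, 1.0, 0.0],  # class 0 weights over the 5 fused features
    [0.0, 1.0, 0.0, 0.0, 1.0],  # class 1 weights over the 5 fused features
]
biases = [0.0, 0.0]
probs = fusion_head(gru_out, tab_out, weights, biases)
```

Joining the two branches late in the network lets each branch learn from its own input type, with the tabular features only influencing the final class scores.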

The addition of the tabular data slightly improves the accuracy of the model. However, the final accuracy is lower than that of the genre_classifier: 62% top-1 accuracy and 92% top-5 accuracy. This is probably due to the increased difficulty of classifying similar subgenres rather than more distinct genre categories.
