Starter kit for the WWW2018 challenge "Learning to Recognize Musical Genre" hosted on CrowdAI. The following overview paper summarizes our experience running a challenge with open data for musical genre recognition. Those notes motivate the task and the challenge design, show some statistics about the submissions, and present the results.


The data used for this challenge comes from the FMA dataset. You are encouraged to check out that repository for Jupyter notebooks showing how to use the data, explore it, and train baseline models. This challenge uses the rc1 version of the data, so make sure to check out that version of the code. The associated paper describes the data.


| Team | Log loss, Round 1 (35k clips) | Log loss, Round 1 (3k clips) | Log loss, Round 2 (3k clips) | Rank | F1 score, Round 1 (35k clips) | F1 score, Round 1 (3k clips) | F1 score, Round 2 (3k clips) |
|---|---|---|---|---|---|---|---|
| minzwon & jaehun | 0.55 | 0.67 | 1.31 | 1 | 85% | 80% | 63% |
| hglim | 0.33 | 0.34 | 1.34 | 2 | 92% | 92% | 64% |
| benjamin_murauer | 0.82 | 0.86 | 1.44 | 3 | 74% | 74% | 60% |
| gg12 & check | 0.66 | 0.49 | 1.50 | 4 | 80% | 86% | 61% |
| viper & algohunt | 0.66 | 0.65 | 1.52 | 5 | 80% | 81% | 60% |
| mimbres | 0.41 | 0.43 | 2.08 | 6 | 90% | 90% | 60% |

The three columns per metric refer to:

  1. the best scores obtained on the public leaderboard during the first round,
  2. the scores obtained by the submitted systems on a subset of the public test set,
  3. the scores obtained by the submitted systems on a private test set collected for the second round.
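For readers unfamiliar with the two metrics, here is a minimal sketch of how they can be computed from predictions. It is not the challenge's grading code: the function names are hypothetical, and macro averaging of the F1 score is an assumption, since the text does not say which averaging the leaderboard uses.

```python
import math


def mean_log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-probability assigned to the true class.

    y_true: list of true class names.
    probs:  list of dicts mapping class name -> predicted probability.
    """
    total = 0.0
    for label, p in zip(y_true, probs):
        # Clip to avoid log(0) when a system assigns zero probability.
        total -= math.log(min(max(p[label], eps), 1 - eps))
    return total / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (assumed averaging)."""
    scores = []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

As a point of reference, a system that predicts a uniform probability of 1/16 for each of the 16 genres has a log loss of ln(16), about 2.77, whatever the true labels are.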

Find more details in the slides used to announce the results and in the overview paper.

In the interest of reproducibility and transparency for interested researchers, you'll find below links to the source code repositories of all systems submitted by the participants for the second round of the challenge. Thanks to all the participants for making this happen!

  1. Transfer Learning of Artist Group Factors to Musical Genre Classification
  2. Ensemble of CNN-based Models using various Short-Term Input
  3. Detecting Music Genre Using Extreme Gradient Boosting
  4. ConvNet on STFT spectrograms
  5. Xception on mel-scaled spectrograms
  6. Audio Dual Path Networks on mel-scaled spectrograms

The repositories should be self-contained and easily executable. You can execute any of the systems on your own mp3s by following these steps:

  1. Clone the git repository.
  2. Build a docker image with repo2docker.
  3. Execute the docker image.


Download and extract the datasets such that:

  • Training metadata csv files are accessible at data/fma_metadata/*.csv.
  • Training mp3 files are accessible at data/fma_medium/*/*.mp3.
  • Test mp3 files from fma_crowdai_www2018_test.tar.gz are accessible at data/crowdai_fma_test/*.mp3.
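A quick way to confirm that the archives were extracted to the expected places is to count the files matching each pattern. The sketch below uses only the three glob patterns listed above; the function names `count_files` and `missing_parts` are hypothetical helpers, not part of the starter kit.

```python
from pathlib import Path

# Glob patterns from the list above, relative to the repository root.
EXPECTED = {
    "training metadata": "data/fma_metadata/*.csv",
    "training audio": "data/fma_medium/*/*.mp3",
    "test audio": "data/crowdai_fma_test/*.mp3",
}


def count_files(root="."):
    """Return the number of files matching each expected pattern."""
    root = Path(root)
    return {name: sum(1 for _ in root.glob(pattern))
            for name, pattern in EXPECTED.items()}


def missing_parts(root="."):
    """Return the names of the dataset parts for which no file was found."""
    return [name for name, n in count_files(root).items() if n == 0]


if __name__ == "__main__":
    for name, n in count_files().items():
        print("{}: {} files".format(name, n))
```

An empty list from `missing_parts()` only shows that each location contains at least one file; it does not verify that the download is complete.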
git clone
cd crowdai-musical-genre-recognition-starter-kit
pip install -r requirements.txt

NOTE: This challenge requires at least version 1.0.14 of crowdai. The code in this repository and the FMA repository has been tested with Python 3.6 only.


Run python to convert data/fma_metadata/tracks.csv to a simpler data/train_labels.csv file where the first column is the track_id and the second column is the target musical genre.

You can now load the training labels with:

import pandas as pd
labels = pd.read_csv('data/train_labels.csv', index_col=0)

The path to the training mp3 with a track_id of 2 is given by:

import fma
path = fma.get_audio_path(2)

and can be loaded as a numpy array with:

import librosa
x, sr = librosa.load(path, sr=None, mono=False)

The list of testing file IDs can be obtained with:

import glob
test_ids = sorted(glob.glob('data/crowdai_fma_test/*.mp3'))
test_ids = [path.split('/')[-1][:-4] for path in test_ids]

and the path to a testing mp3 is given by:

path = 'data/crowdai_fma_test/{}.mp3'.format(test_ids[0])
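The string slicing above assumes `/` as the path separator and a four-character extension. As a hedged alternative, the same IDs can be derived with `pathlib`, which handles both for you. The helper names `list_file_ids` and `audio_path_for` are hypothetical.

```python
from pathlib import Path


def list_file_ids(test_dir="data/crowdai_fma_test"):
    """Return the sorted file IDs (file names without .mp3) in test_dir."""
    return sorted(p.stem for p in Path(test_dir).glob("*.mp3"))


def audio_path_for(file_id, test_dir="data/crowdai_fma_test"):
    """Return the path of the testing mp3 with the given file ID."""
    return str(Path(test_dir) / "{}.mp3".format(file_id))
```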

The submission file can be created with:

CLASSES = ['Blues', 'Classical', 'Country', 'Easy Listening', 'Electronic',
           'Experimental', 'Folk', 'Hip-Hop', 'Instrumental', 'International',
           'Jazz', 'Old-Time / Historic', 'Pop', 'Rock', 'Soul-RnB', 'Spoken']

submission = pd.DataFrame(1/16, pd.Index(test_ids, name='file_id'), CLASSES)
submission.to_csv('data/submission.csv', header=True)
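Before submitting, it can help to sanity-check the file. The sketch below reads a submission with the standard `csv` module and reports rows that do not look like probability distributions. It assumes that each row should hold values in [0, 1] summing to one, which is what a log loss evaluation suggests but which the text above does not state; `check_submission` is a hypothetical helper, not part of the crowdai client.

```python
import csv
import math

CLASSES = ['Blues', 'Classical', 'Country', 'Easy Listening', 'Electronic',
           'Experimental', 'Folk', 'Hip-Hop', 'Instrumental', 'International',
           'Jazz', 'Old-Time / Historic', 'Pop', 'Rock', 'Soul-RnB', 'Spoken']


def check_submission(path, classes=CLASSES, tol=1e-6):
    """Return a list of problems found in a submission CSV (empty if none)."""
    problems = []
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        # The header written above is file_id followed by the class names.
        if header != ["file_id"] + list(classes):
            return ["unexpected header"]
        for row in reader:
            file_id, values = row[0], [float(v) for v in row[1:]]
            if any(v < 0 or v > 1 for v in values):
                problems.append("{}: value outside [0, 1]".format(file_id))
            elif not math.isclose(sum(values), 1.0, abs_tol=tol):
                problems.append("{}: values sum to {:.4f}".format(
                    file_id, sum(values)))
    return problems
```

For the uniform submission created above, `check_submission('data/submission.csv')` should return an empty list.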

and then submitted with:

import crowdai
API_KEY = '<your_crowdai_api_key_here>'
challenge = crowdai.Challenge('WWWMusicalGenreRecognitionChallenge', API_KEY)
response = challenge.submit('data/submission.csv')


The script submits random predictions, to be run as:

python --round=1 --api_key=<YOUR CROWDAI API KEY>

The script extracts many audio features (with the help of librosa) from all training and testing mp3s. Extracted features are stored in data/features.csv. Script to be run as:


Note that this script can take many hours to complete on all 60k tracks. To let you play with the data right away, those features are available pre-computed on the challenge's dataset page.

The script trains a support vector classifier (SVC) with data/train_labels.csv as target and data/features.csv as features. The predictions are stored in data/submission_svm.csv. Script to be run as:


Finally, a prediction can be submitted with the script:

python --api_key=<YOUR CROWDAI API KEY> data/submission.csv

Second round

The second round requires all participants to submit their code. It will be used by our grading orchestrator to predict the genres for all the files in a secret test set. The systems have to be submitted as binder compatible repositories. You'll find all the details to package and submit your code in the following documents:

  1. Packaging guidelines
  2. Submission guidelines

Predictions will be made on an arbitrary number of mp3 files of at most 30 seconds each. During the execution of the container, all the mp3 files will be mounted at /crowdai-payload. Execution of your container will be initiated by executing /home/ During the runtime, the container will not have access to the external Internet, and will have access to:

  • 1 Nvidia GeForce GTX 1080 Ti (11 GB GDDR5X),
  • 5 cores of an Intel Xeon E5-2650 v4 (2.20-2.90 GHz),
  • 60 GB of RAM,
  • 100 GB of disk,
  • and a timeout of 10 hours.
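Because the number of files is arbitrary and the run is capped at 10 hours, it is worth estimating how much time your system can spend per file. The following is a rough sketch under stated assumptions: the mp3 files are assumed to sit directly in /crowdai-payload (the text only says they are mounted there), and the 20% safety margin and both function names are hypothetical choices, not part of the grading setup.

```python
from pathlib import Path

PAYLOAD_DIR = "/crowdai-payload"  # mount point given in the text
TIMEOUT_HOURS = 10                # timeout from the list above


def list_payload(payload_dir=PAYLOAD_DIR):
    """Return the mp3 files to predict, assuming a flat directory."""
    return sorted(Path(payload_dir).glob("*.mp3"))


def seconds_per_file(n_files, timeout_hours=TIMEOUT_HOURS, safety=0.8):
    """Average time budget per file, keeping a margin for startup and I/O."""
    if n_files <= 0:
        raise ValueError("n_files must be positive")
    return timeout_hours * 3600 * safety / n_files
```

For example, with 3,000 files, `seconds_per_file(3000)` gives about 9.6 seconds per file, model loading included.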

At the end of the process, your model will simply be an "executable" git repository. Please provide an open-source license and a README with an executive summary of how your system works. At the end of the challenge, we'll make all these repositories public. The public list of repositories will allow anybody to easily reproduce and reuse any of your systems!

License & co

The content of this repository is released under the terms of the MIT license. Please cite our paper if you use it.

  title = {Learning to Recognize Musical Genre from Audio},
  author = {Defferrard, Micha\"el and Mohanty, Sharada P. and Carroll, Sean F. and Salath\'e, Marcel},
  booktitle = {WWW '18 Companion: The 2018 Web Conference Companion},
  year = {2018},
  url = {},