
Recognizing the genre of music files using machine learning and deep learning models





Music Genre Classification


Recognizing music genre is a challenging task in the area of music information retrieval. Two approaches are studied here:

  1. Spectrogram-based end-to-end image classification using a CNN (VGG-16)
  2. A feature engineering approach using Logistic Regression, SVMs, Random Forest and eXtreme Gradient Boosting (XGB)

For a detailed description of the project, please refer to Music Genre Classification using Machine Learning Techniques, published on arXiv.


The Audio Set data released by Google is used in this study. Specifically, only the wav files that correspond to the following class labels are extracted from YouTube based on the video link, start and end times.
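As a rough illustration of the selection step, the sketch below filters rows of an AudioSet-style segments CSV (`YTID, start_seconds, end_seconds, positive_labels`) down to the clips that carry a wanted label, and builds the YouTube URL for each. The label IDs, the sample rows and the function name `select_segments` are hypothetical and are not taken from this repository. The actual download and trimming with youtube-dl is left out.

```python
import csv
import io

# Hypothetical label IDs, for illustration only. The real IDs come from the
# AudioSet ontology and the genre classes chosen for the study.
WANTED_LABELS = {"/m/label_a", "/m/label_b"}

# Made-up rows in the layout of an AudioSet segments file.
SAMPLE_CSV = '''# YTID, start_seconds, end_seconds, positive_labels
abc123XYZ_0, 30.000, 40.000, "/m/label_a,/m/other"
def456UVW_1, 0.000, 10.000, "/m/other"
'''


def select_segments(csv_text, wanted_labels):
    """Return (url, start_s, end_s, matched_labels) for rows with a wanted label."""
    # Skip blank lines and the '#' comment lines at the top of the file.
    lines = (
        line for line in io.StringIO(csv_text)
        if line.strip() and not line.startswith("#")
    )
    segments = []
    # skipinitialspace lets the quoted label field follow ", " correctly.
    for ytid, start, end, labels in csv.reader(lines, skipinitialspace=True):
        matched = sorted(set(labels.split(",")) & wanted_labels)
        if matched:
            url = "https://www.youtube.com/watch?v=" + ytid
            segments.append((url, float(start), float(end), matched))
    return segments


segments = select_segments(SAMPLE_CSV, WANTED_LABELS)
for url, start, end, labels in segments:
    print(url, start, end, labels)
```

Each returned tuple holds what a downloader needs: the video link plus the start and end times of the labelled clip.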


  • tensorflow-gpu==1.3.0
  • Keras==2.0.8
  • numpy==1.12.1
  • pandas==0.22.0
  • youtube-dl==2018.2.4
  • scipy==0.19.0
  • librosa==0.5.1
  • tqdm==4.19.1
  • Pillow==4.1.1

Note: If you encounter any problems installing these modules, you can get prebuilt packages from the unofficial Python binaries site and install the ones that match your Python version.


  1. First, the audio wav files need to be downloaded using the tool youtube-dl by running the provided download script. Note that each file is about 880 KB, adding up to as much as 34 GB in total!
  2. Next, generate mel spectrograms by running the provided spectrogram script. If needed, you may modify the same file to change the Short-Time Fourier Transform (STFT) parameters.
  3. The next step is to run the models. Please refer to the corresponding Jupyter notebooks. The deep learning based models are in notebooks 3.1, 3.2 and 3.3. Notebooks 4 and 5 contain the steps for feature extraction and for building the classifiers using sklearn.
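To get a feel for what changing the STFT parameters in step 2 does, the sketch below computes the size of the resulting mel spectrogram and the centre frequencies of the mel bands, using only the standard library. All parameter values (sample rate, clip length, `N_FFT`, `HOP_LENGTH`, `N_MELS`) are assumptions for illustration, not the values used in this repository. The mel conversion uses the common HTK-style formula, which may differ from the scale a given audio library applies by default.

```python
import math

# Assumed values for illustration; the project's script may use others.
SAMPLE_RATE = 22050   # samples per second
CLIP_SECONDS = 10     # clip duration
N_FFT = 2048          # STFT window size in samples
HOP_LENGTH = 512      # samples between successive frames
N_MELS = 128          # number of mel bands


def num_frames(num_samples, hop_length):
    """Frame count when the signal is padded so that frames are centered."""
    return 1 + num_samples // hop_length


def hz_to_mel(freq_hz):
    """HTK-style conversion from Hz to mel."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centers(n_mels, fmin, fmax):
    """Centre frequencies (Hz) of n_mels bands spaced evenly on the mel scale."""
    low, high = hz_to_mel(fmin), hz_to_mel(fmax)
    step = (high - low) / (n_mels + 1)
    return [mel_to_hz(low + step * (i + 1)) for i in range(n_mels)]


frames = num_frames(SAMPLE_RATE * CLIP_SECONDS, HOP_LENGTH)
freq_bins = N_FFT // 2 + 1
centers = mel_band_centers(N_MELS, 0.0, SAMPLE_RATE / 2)

print("linear spectrogram shape:", (freq_bins, frames))
print("mel spectrogram shape:", (N_MELS, frames))
print("first and last band centres (Hz):", round(centers[0], 1), round(centers[-1], 1))
```

A smaller hop length gives more frames (a wider image), while the number of mel bands sets the image height.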


The models are evaluated on the basis of AUC, accuracy and F-score.
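For reference, here is a minimal sketch of the three metrics in plain Python on made-up predictions. The genre names, scores and the choice of macro averaging for the F-score and one-vs-rest for AUC are assumptions for illustration. The notebooks may compute these differently (for example with sklearn).

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f_score(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for label in sorted(set(y_true)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def binary_auc(is_positive, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. Use one-vs-rest flags for a multi-class problem.
    """
    pos = [s for flag, s in zip(is_positive, scores) if flag]
    neg = [s for flag, s in zip(is_positive, scores) if not flag]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and scores, for illustration only.
y_true = ["rock", "rock", "jazz", "jazz", "pop", "pop"]
y_pred = ["rock", "jazz", "jazz", "jazz", "pop", "rock"]
rock_scores = [0.9, 0.4, 0.1, 0.2, 0.3, 0.6]  # predicted probability of "rock"
is_rock = [label == "rock" for label in y_true]

print("accuracy:", accuracy(y_true, y_pred))
print("macro F-score:", macro_f_score(y_true, y_pred))
print("AUC (rock vs rest):", binary_auc(is_rock, rock_scores))
```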

The 20 most important features according to the XGB classifier are shown below. The metric on the x-axis is the number of times a given feature appears as a decision node across all of the decision trees used to build the gradient boosting predictor.
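The split-count metric described above can be illustrated with a toy example. The sketch below walks a few hand-written trees and counts how often each feature is used as a decision node. The nested-dict tree layout and the feature names are hypothetical. In XGBoost this kind of count corresponds, to my understanding, to the "weight" importance type.

```python
from collections import Counter


def count_splits(node, counts):
    """Recursively count the feature used at every decision node of one tree."""
    if "feature" not in node:  # leaf node: nothing to count
        return
    counts[node["feature"]] += 1
    count_splits(node["left"], counts)
    count_splits(node["right"], counts)


def split_count_importance(trees):
    """Return (feature, count) pairs over all trees, most frequent first."""
    counts = Counter()
    for tree in trees:
        count_splits(tree, counts)
    return counts.most_common()


LEAF = {"value": 0.0}

# Three toy trees with hypothetical feature names.
toy_trees = [
    {"feature": "mfcc_1",
     "left": {"feature": "tempo", "left": LEAF, "right": LEAF},
     "right": LEAF},
    {"feature": "mfcc_1",
     "left": LEAF,
     "right": {"feature": "zero_crossing_rate",
               "left": {"feature": "tempo", "left": LEAF, "right": LEAF},
               "right": LEAF}},
    {"feature": "mfcc_1", "left": LEAF, "right": LEAF},
]

ranking = split_count_importance(toy_trees)
for feature, count in ranking:
    print(feature, count)
```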

The confusion matrix of the ensemble XGB and CNN classifier:
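How the two classifiers are combined is not spelled out here. As one plausible sketch, the code below averages the class probabilities of two models, picks the most likely class, and tallies a confusion matrix. The averaging rule and all numbers are assumptions for illustration, not results from this project.

```python
def average_probabilities(probs_a, probs_b):
    """Element-wise mean of two lists of per-class probability rows."""
    return [
        [(a + b) / 2 for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(probs_a, probs_b)
    ]


def argmax(row):
    """Index of the largest value in a row."""
    return max(range(len(row)), key=row.__getitem__)


def confusion_matrix(y_true, y_pred, n_classes):
    """matrix[t][p] counts samples of true class t predicted as class p."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


# Made-up probabilities for 4 samples and 3 classes.
y_true = [0, 1, 2, 2]
xgb_probs = [[0.7, 0.2, 0.1], [0.4, 0.5, 0.1], [0.2, 0.5, 0.3], [0.1, 0.2, 0.7]]
cnn_probs = [[0.6, 0.3, 0.1], [0.5, 0.3, 0.2], [0.1, 0.2, 0.7], [0.3, 0.4, 0.3]]

ensemble_probs = average_probabilities(xgb_probs, cnn_probs)
y_pred = [argmax(row) for row in ensemble_probs]
matrix = confusion_matrix(y_true, y_pred, n_classes=3)

for row in matrix:
    print(row)
```

Rows are true classes and columns are predicted classes, so off-diagonal entries show which genres get confused with each other.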









