Recognizing the genre of music files using machine learning and deep learning models
Type Name Latest commit message Commit time
data-files Initial commit with code and files Mar 25, 2018
pickle_files Initial commit with code and files Mar 25, 2018
plots Update README Mar 25, 2018
pred_probas Initial commit with code and files Mar 25, 2018
saved_models Initial commit with code and files Mar 25, 2018
spectrogram_images Initial commit with code and files Mar 25, 2018
wav_files Initial commit with code and files Mar 25, 2018
1_audio_retrieval.ipynb Initial commit with code and files Mar 25, 2018
2_plot_spectrogram.ipynb Initial commit with code and files Mar 25, 2018
3.1_vgg_model_transfer_learning.ipynb Initial commit with code and files Mar 25, 2018
3.2_vgg_model_fine_tuning.ipynb Initial commit with code and files Mar 25, 2018
3.3_feed_forward_baseline.ipynb Initial commit with code and files Mar 25, 2018
4_feature_extraction.ipynb Initial commit with code and files Mar 25, 2018
5_model_building.ipynb Initial commit with code and files Mar 25, 2018
LICENSE Create LICENSE Mar 25, 2018
Music_Genre_Classification_Paper.pdf Added paper and presentation slides Apr 3, 2018
Music_Genre_Classification_Slides.pdf Added paper and presentation slides Apr 3, 2018
README.md Update README.md Apr 29, 2019
audio_retrieval.py Initial commit with code and files Mar 25, 2018
df_features.csv Initial commit with code and files Mar 25, 2018
feature_extraction.py Initial commit with code and files Mar 25, 2018
generate_spectrograms.py Initial commit with code and files Mar 25, 2018


Music Genre Classification

Overview

Recognizing music genre is a challenging task in the area of music information retrieval. Two approaches are studied here:

  1. Spectrogram-based end-to-end image classification using a CNN (VGG-16)
  2. Feature engineering approach using Logistic Regression, SVMs, Random Forest and eXtreme Gradient Boosting (XGB)

For a detailed description of the project, please refer to Music Genre Classification using Machine Learning Techniques, published on arXiv.

Datasets

The Audio Set data released by Google is used in this study. Specifically, only the wav files that correspond to the following class labels are extracted from YouTube based on the video link, start and end times.



Requirements

  • tensorflow-gpu==1.3.0
  • Keras==2.0.8
  • numpy==1.12.1
  • pandas==0.22.0
  • youtube-dl==2018.2.4
  • scipy==0.19.0
  • librosa==0.5.1
  • tqdm==4.19.1
  • Pillow==4.1.1

Instructions

  1. First, download the audio wav files with the tool youtube-dl by running audio_retrieval.py. Note that each file is about 880 KB, and the full set totals up to 34 GB!
  2. Next, generate Mel spectrograms by running generate_spectrograms.py. If needed, you may modify that file to change the Short Time Fourier Transform (STFT) parameters.
  3. The next step is to run the models. Please refer to the corresponding Jupyter notebooks. The deep learning models are in notebooks 3.1, 3.2 and 3.3. Notebooks 4 and 5 contain the steps for feature extraction (run feature_extraction.py) and for building the classifiers using sklearn.
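
The per-file size in step 1 and the effect of the STFT parameters in step 2 can be sanity-checked with a little arithmetic. The sketch below is illustrative only: it assumes 10-second clips stored as 16-bit mono PCM at 44.1 kHz with a 44-byte WAV header, and the `n_fft` and `hop_length` values are hypothetical placeholders, not the settings used in generate_spectrograms.py. The function names are made up for this example.

```python
def wav_size_bytes(duration_s, sample_rate, bytes_per_sample=2, channels=1,
                   header_bytes=44):
    """Approximate size of an uncompressed PCM wav file.

    Assumes a canonical 44-byte header (an assumption, not read from the repo).
    """
    num_samples = int(duration_s * sample_rate)
    return header_bytes + num_samples * bytes_per_sample * channels


def stft_num_frames(num_samples, hop_length, n_fft=2048, center=True):
    """Number of STFT frames (spectrogram columns) for a signal.

    With centered (padded) frames the count is 1 + num_samples // hop_length;
    without padding, only windows that fit entirely inside the signal count.
    """
    if center:
        return 1 + num_samples // hop_length
    return 1 + (num_samples - n_fft) // hop_length


if __name__ == "__main__":
    # Assumed clip format: 10 s, 44.1 kHz, 16-bit mono.
    size = wav_size_bytes(duration_s=10, sample_rate=44100)
    print(f"approx. file size: {size / 1000:.0f} KB")

    # Hypothetical STFT settings; a smaller hop gives more time frames.
    for hop in (256, 512, 1024):
        frames = stft_num_frames(num_samples=441000, hop_length=hop)
        print(f"hop_length={hop}: {frames} frames")
```

Under these assumptions a clip comes to roughly 882 KB, which is in line with the 880 KB figure quoted in step 1. Halving the hop length roughly doubles the spectrogram width, which is worth keeping in mind if the images are later resized for the CNN.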

Results

The models are evaluated on the basis of AUC, accuracy and F-score.
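
As a reminder of what two of these metrics measure, here is a minimal pure-Python sketch of accuracy and a macro-averaged F-score. It is not the evaluation code from the notebooks; the averaging scheme (macro) is an assumption, and the class labels `genre_a`, `genre_b` and `genre_c` are placeholders. AUC is omitted because it needs predicted probabilities rather than hard labels.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Placeholder labels, not the class labels used in the study.
    y_true = ["genre_a", "genre_a", "genre_b", "genre_b", "genre_c"]
    y_pred = ["genre_a", "genre_b", "genre_b", "genre_b", "genre_c"]
    print(f"accuracy: {accuracy(y_true, y_pred):.3f}")
    print(f"macro F1: {macro_f1(y_true, y_pred):.3f}")
```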

The 20 most important features based on the XGB classifier are shown below. The metric on the x-axis is the number of times a given feature appears as a decision node across all of the decision trees used to build the gradient boosting predictor.
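
To make the split-count metric concrete, the sketch below counts how often each feature is used as a decision node in a small, hand-written set of trees. The nested-dict tree format and the feature names are hypothetical and chosen only for illustration; they are not XGBoost's internal representation. In XGBoost itself this style of importance is, to my understanding, what the "weight" importance type reports.

```python
from collections import Counter


def count_splits(node, counts):
    """Recursively count the feature used at every decision node of one tree."""
    if "feature" not in node:  # leaf node: nothing to count
        return
    counts[node["feature"]] += 1
    count_splits(node["left"], counts)
    count_splits(node["right"], counts)


def split_count_importance(trees):
    """Total number of decision nodes per feature across all trees."""
    counts = Counter()
    for tree in trees:
        count_splits(tree, counts)
    return counts


if __name__ == "__main__":
    # Two toy trees with made-up feature names and leaf values.
    trees = [
        {"feature": "mfcc_1",
         "left": {"feature": "tempo",
                  "left": {"leaf": 0.1}, "right": {"leaf": -0.2}},
         "right": {"leaf": 0.3}},
        {"feature": "tempo",
         "left": {"leaf": -0.1},
         "right": {"feature": "mfcc_1",
                   "left": {"feature": "tempo",
                            "left": {"leaf": 0.0}, "right": {"leaf": 0.2}},
                   "right": {"leaf": 0.4}}},
    ]
    for feature, n in split_count_importance(trees).most_common():
        print(feature, n)
```

A feature that is split on often is not necessarily the one that reduces the loss the most, so a split-count ranking can differ from a gain-based one.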

The confusion matrix of the ensemble XGB and CNN classifier:
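
For readers who want to build such a matrix themselves, here is a minimal sketch. It assumes the ensemble is formed by averaging the class probabilities of the two models, which is an assumption made for illustration and may differ from the combination rule used in the notebooks. The probabilities and integer class indices are made up.

```python
def average_probas(probas_a, probas_b):
    """Element-wise mean of two lists of per-class probability vectors."""
    return [[(a + b) / 2 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(probas_a, probas_b)]


def argmax(values):
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


if __name__ == "__main__":
    # Made-up predicted probabilities for three samples and three classes.
    cnn_probas = [[0.7, 0.2, 0.1], [0.4, 0.5, 0.1], [0.2, 0.2, 0.6]]
    xgb_probas = [[0.5, 0.4, 0.1], [0.6, 0.3, 0.1], [0.1, 0.6, 0.3]]
    y_true = [0, 1, 2]

    ensemble = average_probas(cnn_probas, xgb_probas)
    y_pred = [argmax(row) for row in ensemble]
    for row in confusion_matrix(y_true, y_pred, n_classes=3):
        print(row)
```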
