This project was completed during my time on the General Assembly Data Science Immersive course.
Through my analysis of audio files from the Free Music Archive (FMA) using the Python package Librosa, I aim to determine the music genre of an audio file and to understand the key predictors for this type of classification problem.
I built and compared various statistical and machine learning models for this classification problem, including one hyperparameter grid search on AWS. The model that generalised best was a Random Forest classifier, predicting the music genre with 65% accuracy against a baseline of 30%. Key insights and limitations on accuracy were identified, such as increased bias from class imbalance, inaccurately tagged metadata used for model training, and a limit on explained variance due to the required dimensionality reduction.
Future improvements and general limitations have been summarised below.
- Presentation Slides: Used to present the project, findings, limitations and future improvements to a non-technical audience.
- Technical Report: Aimed at a technical audience; contains a detailed explanation of the extraction methods employed, the general methodology, exploratory data analysis, preprocessing steps, the modelling stage, findings, limitations and future improvements.
- Jupyter Notebook files (.ipynb)
- 1.0 : Dataset Inspection
- 2.0 : Features
- 3.0 : Methodology
- SS
- PCA
- Collating genres (folders 51 - 100)
- 4.0 : EDA
- 5.0 : Modelling
- 6.0 : Evaluation (SS)
- 7.0 : Further EDA (SS)
- 8.0 : Unsupervised learning
Put very simply, the main question I am asking is: “Is there a way to automate music genre classification?” The current norm is still to assign genre labels manually, which makes tagging slow and tedious. Automating this procedure is crucial for large music databases and music information retrieval systems as the music industry accelerates into the digital realm.
My main objectives for this project were to:
- Build a 10+ music genre classifier with > 70% accuracy
- Compare unsupervised classifier clusters to the actual genre labels
- Classify sub-genres or styles
I used the publicly available data from FMA originally found on ISMIR. I downloaded the FMA medium zip file which includes:
- 16,000 tracks (22 GiB), mp3 format, 30s length
- 16 unbalanced genres
The FMA medium dataset is very unbalanced, with Rock and Blues having counts of 4,000+ and ~200 respectively. A few genres also did not seem to represent a modern genre, so I decided not to include them: ‘Easy Listening’, ‘Spoken’, ‘Old-Time / Historic’ and ‘International’.
This reduced the dataset to ~14,800 tracks covering 12 genres (a 12-class problem).
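As an illustration of this filtering step, here is a minimal sketch assuming the standard FMA tracks.csv metadata layout (multi-level header with ('set', 'subset') and ('track', 'genre_top') columns); the path is a placeholder and this is not the project's actual code.

```python
import pandas as pd

# Placeholder path; tracks.csv ships with the FMA metadata archive.
tracks = pd.read_csv("data/fma_metadata/tracks.csv", index_col=0, header=[0, 1])

# Tracks belonging to the medium subset (the small subset is nested inside it).
medium = tracks[tracks[("set", "subset")].isin(["small", "medium"])]

# Drop the genres that do not represent a modern genre.
dropped = ["Easy Listening", "Spoken", "Old-Time / Historic", "International"]
kept = medium[~medium[("track", "genre_top")].isin(dropped)]

print(kept[("track", "genre_top")].value_counts())   # ~14,800 tracks, 12 genres
```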
Librosa is a Python package for music and audio analysis. It provides the building blocks necessary to create music information retrieval systems, and includes capabilities for:
- Visualisation
- Audio Playback
- Feature Extraction (spectral, rhythm, onset detection, beats and tempo, etc.)
The Python package Librosa can be found here
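As a quick illustration of these capabilities, the sketch below loads one clip and extracts a few example features; the file path is a placeholder and the feature choices are illustrative rather than the exact set used in the project.

```python
import librosa

# Placeholder path to a 30-second FMA clip.
y, sr = librosa.load("data/fma_medium/000/000002.mp3", duration=30.0)

mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)        # timbral features, shape (13, n_frames)
contrast = librosa.feature.spectral_contrast(y=y, sr=sr)  # spectral peak/valley contrast per subband
tempo, beats = librosa.beat.beat_track(y=y, sr=sr)        # rhythm: tempo estimate and beat frames
y_harmonic, y_percussive = librosa.effects.hpss(y)        # harmonic / percussive separation

print(mfcc.shape, contrast.shape, tempo)
```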
Since the features extracted with Librosa are high-dimensional arrays, we cannot model on them directly as individual features due to limited processing power and disk space; they need to be processed further with dimensionality reduction techniques. I employed the following approaches:
- Process extracted features with summary statistics (SS)
- Principal component analysis on extracted Features (PCA)
Some brief data analysis of the classes and particular features is shown here:
Full EDA for either method can be found in SS and PCA
Given the two dimensionality reduction techniques employed, I have summarised the general methodology for each approach below (a short code sketch of both follows the list):
- After extracting the desired features, get summary statistics for each feature (mean, median, std, min, max, kurtosis, skew)
- For a given feature array with shape (a, b), calculate each statistic over the 'b' values, a total of 'a' times
- For each track, store each statistic for a given feature in a dictionary (dictionary within dictionary)
- Model on summary statistics
- Created a function which takes the amplitude spectrogram (ATS, y) for each track_id
- Calculate the decibel-scaled spectrogram (DB) for each track_id
- Apply PCA (chose number of components such that total explained variance was 97%)
- Model on PCA components
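To make these steps concrete, here is a minimal sketch of both approaches. It uses a single placeholder audio file, MFCCs as the example feature and a toy two-track matrix for the PCA step; the names and paths are illustrative, not the project's actual code.

```python
import numpy as np
import librosa
from scipy.stats import kurtosis, skew
from sklearn.decomposition import PCA

# --- Summary statistics (SS) approach --------------------------------------
def summary_stats(feature_array):
    """Collapse a (a, b) feature array to 'a' values per statistic."""
    return {
        "mean": np.mean(feature_array, axis=1),
        "median": np.median(feature_array, axis=1),
        "std": np.std(feature_array, axis=1),
        "min": np.min(feature_array, axis=1),
        "max": np.max(feature_array, axis=1),
        "kurtosis": kurtosis(feature_array, axis=1),
        "skew": skew(feature_array, axis=1),
    }

y, sr = librosa.load("track.mp3", duration=30.0)          # placeholder path
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)        # shape (13, n_frames)
track_features = {"mfcc": summary_stats(mfcc)}            # dictionary within dictionary

# --- PCA approach -----------------------------------------------------------
S = np.abs(librosa.stft(y))                               # amplitude spectrogram
S_db = librosa.amplitude_to_db(S, ref=np.max)             # decibel-scaled spectrogram

# In the real pipeline X holds one flattened spectrogram per track;
# two noisy copies stand in for it here so the sketch runs end to end.
X = np.vstack([S_db.flatten(), S_db.flatten() + np.random.randn(S_db.size)])
pca = PCA(n_components=0.97)                              # keep 97% explained variance
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)
```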
For the multiclass classification problem I selected the following classifiers:
- Random Forest
- SVM
- KNN
- Logistic Regression
- Decision Tree
- Adaboost (Decision Tree as base estimator)
- Naive Bayes
- Bagging
After hyperparameter tuning with various grid searches, the model with the highest cross-validation score was found to be a Random Forest classifier. The top 3 accuracy scores for the 12 unbalanced genre classes:
| Model | Accuracy |
|---|---|
| Random Forest | 0.64 |
| SVM | 0.61 |
| KNN | 0.58 |
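For reference, here is a minimal sketch of the tuning step using scikit-learn's GridSearchCV with a Random Forest. The parameter grid and the synthetic stand-in data are illustrative only, not the grid or features actually searched on AWS.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the real summary-statistic / PCA feature matrix (12 classes).
X, y = make_classification(n_samples=1000, n_features=50, n_informative=20,
                           n_classes=12, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

# Illustrative hyperparameter grid, not the one actually used in the project.
param_grid = {
    "n_estimators": [200, 500],
    "max_depth": [None, 20, 40],
    "max_features": ["sqrt", "log2"],
}
grid = GridSearchCV(RandomForestClassifier(random_state=42),
                    param_grid, cv=5, n_jobs=-1)
grid.fit(X_train, y_train)

print(grid.best_params_, round(grid.best_score_, 3))
print("Test accuracy:", round(grid.score(X_test, y_test), 3))
```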
We are able to identify a handful of the most important features, but not to a degree that makes them stand out significantly from the other indicators. The most recurring important features at the top of the list were 'H', 'P', 'melspec' and 'contrast', which denote the harmonic and percussive elements of the track as well as the dynamics of the spectral range (spectral peak, valley, and their difference in each frequency subband). These are some of the most intuitive criteria that one could perceive objectively to distinguish between genres.
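Continuing the grid-search sketch above, such a ranking can be read off the fitted Random Forest as shown below; the placeholder feature names stand in for the real summary-statistic column labels.

```python
import numpy as np

best_rf = grid.best_estimator_                              # fitted Random Forest from the sketch above
feature_names = [f"feat_{i}" for i in range(X.shape[1])]    # placeholder labels

# Rank features from most to least important and show the top ten.
order = np.argsort(best_rf.feature_importances_)[::-1]
for idx in order[:10]:
    print(f"{feature_names[idx]}: {best_rf.feature_importances_[idx]:.3f}")
```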
As seen below, these 'H', 'P' and spectral features seem to distinguish certain genres relatively well.
- Audio feature dimensionality: When dealing with audio features, the high dimensionality of the extracted raw features limits the number of features that can be modelled on. Moreover, the high dimensionality requires reduction techniques to be employed, limiting the variance these models can train on.
- Inaccurately tagged metadata: Inaccurately tagged metadata regarding a track's genre gives a wrong representation of the data to be modelled on, increasing false positives and false negatives and reducing overall model accuracy.
- Unknown classes: Equivalently, having tracks from other established genres in the dataset will also diminish model performance in a similar manner.
- Class imbalance: The large class imbalance increases bias, as machine learning classifiers tend to be biased towards the majority class and classify minority classes poorly. In the future, the dataset needs to be enlarged and class supports adjusted to reach a balanced data pool to train on; a brief sketch of one possible mitigation follows below.
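One possible mitigation, not used in this project, is to weight classes inversely to their frequency. A minimal scikit-learn sketch, reusing the train/test split from the grid-search sketch above:

```python
from sklearn.ensemble import RandomForestClassifier

# class_weight="balanced" reweights each class inversely to its frequency,
# so minority genres carry more weight during training.
balanced_rf = RandomForestClassifier(n_estimators=500, class_weight="balanced",
                                     random_state=42)
balanced_rf.fit(X_train, y_train)
print("Balanced-weight test accuracy:", round(balanced_rf.score(X_test, y_test), 3))
```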
Labelling music into genres is arbitrary, and the line between one genre and another is more often than not blurred, as if a genre were formed from several ‘flavours’ of other genres, with a set of sub-genres and a type of ‘style’. Genre classification can be quite subjective and is not as simple as classifying a colour, which has a measurable wavelength. However, there are perceptual criteria related to instrumentation, rhythmic structure, harmonics and the texture of the music that can play a role in characterising a particular genre. Methods for automated genre classification would add value to many music information retrieval systems, music apps and music streaming platforms, many of which are still manually labelled.
See the requirements.txt file for specific dependencies.
NumPy
Pandas
Matplotlib
Seaborn
Pickle
Scikit-learn
Librosa
Copyright © ISMIR, 2000-2021