This notebook uses various Python libraries to build a machine learning model from audio file metadata.
Given features extracted from an audio file, can we predict the genre that the audio belongs to?
The data comes from the Free Music Archive (FMA) dataset, developed by several people at École Polytechnique Fédérale de Lausanne (EPFL) and Nanyang Technological University (NTU). The research paper can be found here and the GitHub repository containing the project files can be found here.
The project is considered complete once we reach an accuracy above the highest benchmark reported in the research paper.
Here are the benchmarks taken from the FMA research paper:
Feature set | LR | kNN | SVM | MLP |
---|---|---|---|---|
1 Chroma | 44 | 44 | 48 | 49 |
2 Tonnetz | 40 | 37 | 42 | 41 |
3 MFCC | 58 | 55 | 61 | 53 |
4 Spec. centroid | 42 | 45 | 46 | 48 |
5 Spec. bandwidth | 41 | 45 | 44 | 45 |
6 Spec. contrast | 51 | 50 | 54 | 53 |
7 Spec. rolloff | 42 | 46 | 48 | 48 |
8 RMS energy | 37 | 39 | 39 | 39 |
9 Zero-crossing rate | 42 | 45 | 45 | 46 |
3 + 6 | 60 | 55 | 63 | 54 |
3 + 6 + 4 | 60 | 55 | 63 | 53 |
1 to 9 | 61 | 52 | 63 | 58 |
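Each value in the table is the classification accuracy (in %) obtained with the given feature set and model. Numbered rows are single feature sets; rows such as "3 + 6" combine the feature sets with those numbers.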
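To make the target concrete, the highest benchmark can be read off the table programmatically. The snippet below is a minimal sketch: the numbers are copied by hand from the table above, and the names `benchmarks`, `target` and `best` are illustrative rather than part of the FMA project.

```python
# Accuracies (%) copied from the benchmark table above.
# Keys are the feature sets (rows); inner keys are the models (columns).
benchmarks = {
    "Chroma":             {"LR": 44, "kNN": 44, "SVM": 48, "MLP": 49},
    "Tonnetz":            {"LR": 40, "kNN": 37, "SVM": 42, "MLP": 41},
    "MFCC":               {"LR": 58, "kNN": 55, "SVM": 61, "MLP": 53},
    "Spec. centroid":     {"LR": 42, "kNN": 45, "SVM": 46, "MLP": 48},
    "Spec. bandwidth":    {"LR": 41, "kNN": 45, "SVM": 44, "MLP": 45},
    "Spec. contrast":     {"LR": 51, "kNN": 50, "SVM": 54, "MLP": 53},
    "Spec. rolloff":      {"LR": 42, "kNN": 46, "SVM": 48, "MLP": 48},
    "RMS energy":         {"LR": 37, "kNN": 39, "SVM": 39, "MLP": 39},
    "Zero-crossing rate": {"LR": 42, "kNN": 45, "SVM": 45, "MLP": 46},
    "3 + 6":              {"LR": 60, "kNN": 55, "SVM": 63, "MLP": 54},
    "3 + 6 + 4":          {"LR": 60, "kNN": 55, "SVM": 63, "MLP": 53},
    "1 to 9":             {"LR": 61, "kNN": 52, "SVM": 63, "MLP": 58},
}

# Highest accuracy anywhere in the table: the number this project has to beat.
target = max(acc for scores in benchmarks.values() for acc in scores.values())

# Every (feature set, model) pair that reaches that accuracy.
best = [
    (feature_set, model)
    for feature_set, scores in benchmarks.items()
    for model, acc in scores.items()
    if acc == target
]

print(f"Accuracy to beat: {target}%")
for feature_set, model in best:
    print(f"  reached by {model} on feature set '{feature_set}'")
```

The best score in the table is shared by several feature-set combinations, so the sketch lists all of them instead of picking one.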
The models are defined as:
- LR = logistic regression with an L2 penalty
- kNN = k-nearest neighbours with k = 200
- SVM = support vector machine with a radial basis function (RBF) kernel
- MLP = multilayer perceptron with 100 hidden neurons
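The four model definitions above could be set up along these lines with scikit-learn. This is a sketch under stated assumptions, not the paper's exact configuration: LR is mapped to `LogisticRegression` because the benchmark is a classification task, and every hyperparameter not named in the list (such as `max_iter`) is an illustrative choice. The `models` dictionary is a hypothetical name.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# One estimator per column of the benchmark table.
# Anything not stated in the list above is an assumption.
models = {
    # L2 is scikit-learn's default penalty for LogisticRegression.
    "LR": LogisticRegression(max_iter=1000),
    # k = 200 neighbours, as stated in the list.
    "kNN": KNeighborsClassifier(n_neighbors=200),
    # Support vector classifier with an RBF kernel.
    "SVM": SVC(kernel="rbf"),
    # A single hidden layer with 100 neurons.
    "MLP": MLPClassifier(hidden_layer_sizes=(100,), max_iter=2000),
}

for name, model in models.items():
    print(name, "->", type(model).__name__)
```

Each estimator exposes the same `fit` / `predict` / `score` interface, so a single loop over `models` can train and evaluate all four on the same feature set. Note that kNN with k = 200 needs at least 200 training samples.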
Here are the features that come with each audio file: