# FMA: A Dataset For Music Analysis

Kirell Benzi, Michaël Defferrard, Pierre Vandergheynst, Xavier Bresson, EPFL LTS2.

## Baselines

We explore three types of baselines:
1. simple algorithms,
2. state-of-the-art genre recognition methods,
3. deep learning approaches,

using different input features:
1. raw audio,
2. Echonest features,
3. audio features extracted with librosa.

We aim to show that, given sufficient data, deep learning (DL) approaches can outperform all the others without domain-specific or expert knowledge.

In [1]:
%load_ext autoreload
%autoreload 2
%matplotlib inline

import utils
import librosa
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import IPython.display as ipd
import time
import os.path

from sklearn.preprocessing import MultiLabelBinarizer, LabelEncoder, StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC, LinearSVC
#from sklearn.gaussian_process import GaussianProcessClassifier
#from sklearn.gaussian_process.kernels import RBF
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.multiclass import OneVsRestClassifier

In [2]:
DATA_DIR = os.path.join('..', 'fma_small')
df = pd.read_json(os.path.join(DATA_DIR, 'fma_small.json'))

## 1 Simple classifiers

The best test accuracy observed with simple classifiers on Echonest features is around 38%; the run recorded below peaks at 36.93% (MLPClassifier).

Todo:
* Cross-validation for hyper-parameters.
* Dimensionality reduction?
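
Both todo items could be prototyped together. The following is a minimal sketch, not the notebook's method: it assumes PCA as the dimensionality reduction step and an SVC as the classifier, and it runs on synthetic data from `make_classification` rather than on the Echonest features. The grid values are arbitrary placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in with 8 features (assumption: NOT the FMA / Echonest data).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=8, n_informative=5, n_redundant=0,
    n_classes=3, random_state=42)

# Scaling and PCA live inside the pipeline, so they are re-fitted on the
# training part of each cross-validation fold only.
pipe = Pipeline([
    ('scale', StandardScaler()),
    ('pca', PCA()),
    ('svc', SVC()),
])

# Illustrative grid: number of principal components and SVC regularization.
param_grid = {
    'pca__n_components': [4, 8],
    'svc__C': [0.1, 1, 10],
}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_demo, y_demo)

print(search.best_params_)
print('{:.2f}% mean cross-validated accuracy'.format(search.best_score_ * 100))
```

Keeping the scaler inside the pipeline avoids leaking validation-fold statistics into the hyper-parameter search.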

### 1.1 Pre-processing

In [3]:
# Select features.
#features = utils.ECHONEST_AUDIO_FEATURES + utils.ECHONEST_SOCIAL_FEATURES
features = utils.ECHONEST_AUDIO_FEATURES

# Discard songs with NaN Echonest features.
# TODO: fix dataset.
keep = df[features].notnull().all(axis=1)
df = df[keep]
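
The filter above keeps only the tracks for which every selected feature is present. As a small illustration on a toy frame (the column names and values are hypothetical, not taken from the dataset), a row-wise mask and `DataFrame.dropna` with `subset` select the same rows:

```python
import numpy as np
import pandas as pd

# Toy frame with hypothetical columns; 'title' is not a selected feature.
toy = pd.DataFrame({
    'tempo': [120.0, np.nan, 90.0],
    'energy': [0.8, 0.5, np.nan],
    'title': ['a', 'b', None],
})
cols = ['tempo', 'energy']

# Boolean mask: True where all selected features are non-null.
keep = toy[cols].notnull().all(axis=1)
filtered = toy[keep]

# dropna restricted to the selected columns gives the same rows.
same = filtered.equals(toy.dropna(subset=cols))

print(keep.tolist())
print(same)
```

Only the first row survives here, because each of the other two rows misses one of the selected features.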

In [4]:
def pre_process(df, features, multi_label=False):
    if not multi_label:
        # Assign an integer value to each genre.
        enc = LabelEncoder()
        y = enc.fit_transform(df['top_genre'])
    else:
        # Create an indicator matrix.
        enc = MultiLabelBinarizer()
        y = enc.fit_transform(df['genres'])
    print('Genres ({}): {}'.format(len(enc.classes_), enc.classes_))

    # DataFrame.as_matrix() was removed from pandas; .values returns the same NumPy array.
    X = df[features].values
    
    # Split in training, validation and testing sets.
    train = df['train'] == True
    y_train = y[train]
    y_test = y[~train]
    X_train = X[train]
    X_test = X[~train]
    X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=.2, random_state=42)
    print('{} training examples, {} validation examples, {} testing examples'.format(y_train.shape[0], y_val.shape[0], y_test.shape[0]))
    print('{} features'.format(X_train.shape[1]))
    
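    # Standardize features by removing the mean and scaling to unit variance.
    # The statistics are estimated on the training set only, then applied to the other splits.
    # Results are re-assigned because copy=False does not guarantee in-place scaling
    # (e.g. when the input is not a floating-point NumPy array).
    scaler = StandardScaler(copy=False)
    X_train = scaler.fit_transform(X_train)
    X_val = scaler.transform(X_val)
    X_test = scaler.transform(X_test)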
    
    return y_train, y_val, y_test, X_train, X_val, X_test
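
To make the two target encodings in `pre_process` concrete, here is a minimal sketch on four made-up tracks (the genre assignments are illustrative, not dataset rows). `LabelEncoder` maps each top genre to one integer, while `MultiLabelBinarizer` builds one indicator column per genre.

```python
from sklearn.preprocessing import LabelEncoder, MultiLabelBinarizer

# Hypothetical tracks: one top genre each, and a list of genres each.
top_genre = ['Rock', 'Folk', 'Rock', 'Jazz']
genres = [['Rock', 'Punk'], ['Folk'], ['Rock'], ['Jazz', 'Folk']]

# Single-label case: classes are sorted, then replaced by their index.
le = LabelEncoder()
y_single = le.fit_transform(top_genre)
print(list(le.classes_))   # ['Folk', 'Jazz', 'Rock']
print(y_single.tolist())   # [2, 0, 2, 1]

# Multi-label case: one binary column per (sorted) genre.
mlb = MultiLabelBinarizer()
y_multi = mlb.fit_transform(genres)
print(list(mlb.classes_))  # ['Folk', 'Jazz', 'Punk', 'Rock']
print(y_multi.tolist())    # [[0, 0, 1, 1], [1, 0, 0, 0], [0, 0, 0, 1], [1, 1, 0, 0]]
```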

### 1.2 Single genre

In [5]:
y_train, y_val, y_test, X_train, X_val, X_test = pre_process(df, features)

classifiers = [
    LogisticRegression(),
    KNeighborsClassifier(n_neighbors=200),
    SVC(),
    SVC(kernel="linear"),
    LinearSVC(),
    #GaussianProcessClassifier(1.0 * RBF(1.0), warm_start=True),
    DecisionTreeClassifier(max_depth=5),
    RandomForestClassifier(max_depth=5, n_estimators=10, max_features=1),
    AdaBoostClassifier(n_estimators=10),
    MLPClassifier(max_iter=400),
    GaussianNB(),
    QuadraticDiscriminantAnalysis(),
]

for clf in classifiers:
    t = time.process_time()
    clf.fit(X_train, y_train)
    score = clf.score(X_test, y_test)
    print('{:.2f}% {:.2f}s {}'.format(score*100, time.process_time()-t, type(clf).__name__))

Genres (10): ['Electronic' 'Folk' 'Hip-Hop' 'Indie-Rock' 'Jazz' 'Old-Time / Historic'
 'Pop' 'Psych-Rock' 'Punk' 'Rock']
2524 training examples, 631 validation examples, 788 testing examples
8 features
33.25% 0.04s LogisticRegression
31.98% 0.07s KNeighborsClassifier
36.04% 0.37s SVC
33.50% 0.27s SVC
33.25% 1.00s LinearSVC
34.14% 0.01s DecisionTreeClassifier
33.76% 0.02s RandomForestClassifier
31.09% 0.05s AdaBoostClassifier
36.93% 4.40s MLPClassifier
31.47% 0.00s GaussianNB
31.22% 0.00s QuadraticDiscriminantAnalysis
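
These accuracies are easier to interpret next to a trivial reference point. The sketch below is an addition, not part of the original experiment: it uses scikit-learn's `DummyClassifier` to always predict the most frequent training class, and it runs on random labels rather than on the FMA genres, so the printed number says nothing about the dataset itself.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

rng = np.random.RandomState(42)

# Hypothetical stand-in: 10 classes, 8 features (features are ignored by the dummy).
y_tr = rng.randint(0, 10, size=500)
y_te = rng.randint(0, 10, size=200)
X_tr = np.zeros((500, 8))
X_te = np.zeros((200, 8))

# Always predict the class that is most frequent in the training labels.
dummy = DummyClassifier(strategy='most_frequent')
dummy.fit(X_tr, y_tr)
score = dummy.score(X_te, y_te)

print('{:.2f}% {}'.format(score * 100, type(dummy).__name__))
```

On the real splits, the same call would be `dummy.fit(X_train, y_train)` followed by `dummy.score(X_test, y_test)`.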


### 1.3 Multiple genres

In [6]:
y_train, y_val, y_test, X_train, X_val, X_test = pre_process(df, features, multi_label=True)

classifiers = [
    #LogisticRegression(),
    OneVsRestClassifier(LogisticRegression()),
    OneVsRestClassifier(SVC()),
]

for clf in classifiers:
    t = time.process_time()
    clf.fit(X_train, y_train)
    score = clf.score(X_test, y_test)
    print('{:.2f}% {:.2f}s {}'.format(score*100, time.process_time()-t, type(clf).__name__))

Genres (108): ['20th Century Classical' 'African' 'Afrobeat' 'Alternative Hip-Hop'
 'Americana' 'Asia-Far East' 'Balkan' 'Big Band/Swing' 'Bigbeat'
 'Bluegrass' 'Bollywood' 'Brazilian' 'Breakbeat' 'Breakcore - Hard'
 'British Folk' 'Chamber Music' 'Chill-out' 'Chip Music' 'Chiptune'
 'Classical' 'Composed Music' 'Country' 'Country & Western' 'Cumbia'
 'Dance' 'Disco' 'Downtempo' 'Drone' 'Dubstep' 'Easy Listening'
 'Easy Listening: Vocal' 'Electro-Punk' 'Electroacoustic' 'Electronic'
 'Europe' 'Flamenco' 'Folk' 'Freak-Folk' 'Free-Folk' 'Free-Jazz' 'French'
 'Funk' 'Gospel' 'Goth' 'Hardcore' 'Hip-Hop' 'Hip-Hop Beats' 'Holiday'
 'House' 'IDM' 'Improv' 'Indian' 'Indie-Rock' 'Industrial' 'Instrumental'
 'Interview' 'Jazz' 'Jazz: Out' 'Jazz: Vocal' 'Klezmer' 'Krautrock' 'Latin'
 'Latin America' 'Loud-Rock' 'Lounge' 'Metal' 'Middle East'
 'Minimal Electronic' 'Minimalism' 'Modern Jazz' 'Musique Concrete'
 'New Age' 'New Wave' 'No Wave' 'Nu-Jazz' 'Old-Time / Historic' 'Opera'
 'Polka' 'Pop' 'P



10.15% 0.54s OneVsRestClassifier




11.42% 2.77s OneVsRestClassifier
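
A note on reading these numbers: for multi-label indicator targets, scikit-learn's `score` reports subset accuracy, which counts a track as correct only if every one of its genre labels is predicted exactly. With 108 genres this is a strict criterion. The sketch below, on a tiny hand-made example (not model output), contrasts it with two per-label metrics, Hamming loss and micro-averaged F1, which are suggested here as complements rather than taken from the original notebook.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, hamming_loss

# Hand-made indicator matrices: 4 tracks, 3 labels (illustrative only).
y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0],
                   [0, 0, 1]])
y_pred = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [1, 1, 0],
                   [0, 1, 1]])

# Subset accuracy: fraction of rows predicted exactly (2 of 4 here).
subset_acc = accuracy_score(y_true, y_pred)
# Hamming loss: fraction of individual labels that are wrong (2 of 12 here).
ham = hamming_loss(y_true, y_pred)
# Micro-averaged F1: pools true/false positives over all labels (5/6 here).
f1_micro = f1_score(y_true, y_pred, average='micro')

print('subset accuracy: {:.2f}'.format(subset_acc))
print('Hamming loss:    {:.2f}'.format(ham))
print('micro F1:        {:.2f}'.format(f1_micro))
```

In this toy case only two single labels are wrong, yet subset accuracy drops to 50%, which is why a low exact-match score does not by itself mean the per-label predictions are poor.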
