# FMA: A Dataset For Music Analysis

Michaël Defferrard, Kirell Benzi, Pierre Vandergheynst, Xavier Bresson, EPFL LTS2.

## Usage

1. Download the dataset from <https://github.com/mdeff/fma>.
2. Uncompress the archive, e.g. with `unzip fma_small.zip`.
3. Load and play with the data in this notebook.
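
If `unzip` is not available, the extraction step can also be done from Python. The following is a minimal sketch using only the standard library; the helper name `extract_archive` is hypothetical and not part of this repository.

```python
import zipfile
from pathlib import Path


def extract_archive(archive_path, target_dir):
    """Extract every member of a zip archive into target_dir.

    Hypothetical helper for illustration; returns the list of member names.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
        return archive.namelist()
```

For example, `extract_archive('fma_small.zip', '.')` would unpack the archive from step 2 into the current directory.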

In [None]:
%matplotlib inline

import utils
import librosa
from sklearn.utils import shuffle
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import IPython.display as ipd
import os

In [None]:
# Load metadata and features.
tracks = utils.load('tracks.csv')
genres = utils.load('genres.csv')
features = utils.load('features.csv')
echonest = utils.load('echonest.csv')

np.testing.assert_array_equal(features.index, tracks.index)

# Directory where mp3 are stored.
AUDIO_DIR = os.environ.get('AUDIO_DIR')

tracks.shape, genres.shape, features.shape, echonest.shape

## 1 Metadata

The metadata table, a CSV file (`tracks.csv`) in the root directory of the archive, is composed of many columns:
1. The index is the ID of the song, taken from the FMA, used as the name of the audio file.
2. Per-track, per-album and per-artist metadata from the Free Music Archive website.
3. Two columns to indicate the subset (small, medium, large, full) and the split (training, validation, test).
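
The cells in this notebook address columns with two keys, such as `tracks['set', 'split']`, which suggests that the table uses two-level column labels. The toy table below sketches how that kind of selection behaves in pandas. All values are invented for illustration and are not taken from the dataset.

```python
import pandas as pd

# Two-level column labels: (group, field). The rows are made-up examples.
columns = pd.MultiIndex.from_tuples([
    ('track', 'title'),
    ('track', 'genre_top'),
    ('set', 'subset'),
    ('set', 'split'),
])
toy = pd.DataFrame(
    [['Song A', 'Rock', 'small', 'training'],
     ['Song B', 'Jazz', 'medium', 'test']],
    index=pd.Index([2, 5], name='track_id'),
    columns=columns,
)

track_level = toy['track']             # DataFrame with the 'track' group only
split = toy['set', 'split']            # Series for a single (group, field) pair
title = toy.loc[2, ('track', 'title')]  # scalar for one track ID
```

Selecting with the first key alone returns every field of that group, while a full pair returns a single column.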

In [None]:
ipd.display(tracks['track'].head())
ipd.display(tracks['album'].head())
ipd.display(tracks['artist'].head())
ipd.display(tracks['set'].head())

## 2 Genres

In [None]:
print('{} top-level genres'.format(len(genres['top_level'].unique())))
genres.loc[genres['top_level'].unique()].sort_values('#tracks', ascending=False)

In [None]:
genres.sort_values('#tracks').head(20)

## 3 Features

1. Features extracted from the audio for all tracks.
2. For some tracks, data collected from the [Echonest](http://the.echonest.com/) API.
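
To give an idea of what frame-level audio features look like, the sketch below computes RMS energy and zero-crossing rate on a synthetic tone with NumPy and summarizes them per signal. It is not the pipeline used to build `features.csv`; the frame length, hop length and function names are assumptions chosen for illustration.

```python
import numpy as np


def frame_signal(x, frame_length=2048, hop_length=512):
    """Slice a 1-D signal into overlapping frames (one frame per row)."""
    n_frames = 1 + (len(x) - frame_length) // hop_length
    idx = (np.arange(frame_length)[None, :]
           + hop_length * np.arange(n_frames)[:, None])
    return x[idx]


def rms_energy(frames):
    """Root-mean-square energy of each frame."""
    return np.sqrt(np.mean(frames ** 2, axis=1))


def zero_crossing_rate(frames):
    """Fraction of consecutive sample pairs whose sign differs, per frame."""
    signs = np.signbit(frames)
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)


# One second of a 440 Hz sine wave stands in for a real audio excerpt.
sr = 22050
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 440 * t)

frames = frame_signal(tone)
rms = rms_energy(frames)
zcr = zero_crossing_rate(frames)

# Summarize the frame-level values into a few numbers for the whole signal.
summary = {
    'rms_mean': rms.mean(), 'rms_std': rms.std(),
    'zcr_mean': zcr.mean(), 'zcr_std': zcr.std(),
}
```

For a pure tone, the mean RMS is close to the amplitude divided by the square root of two, and the zero-crossing rate is close to twice the frequency divided by the sampling rate.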

In [None]:
print('{1} features for {0} tracks'.format(*features.shape))
columns = ['mfcc', 'chroma_cens', 'tonnetz', 'spectral_contrast']
columns.append(['spectral_centroid', 'spectral_bandwidth', 'spectral_rolloff'])
columns.append(['rmse', 'zcr'])
for column in columns:
    ipd.display(features[column].head().style.format('{:.2f}'))

In [None]:
print('{1} features for {0} tracks'.format(*echonest.shape))
ipd.display(echonest['echonest', 'metadata'].head())
ipd.display(echonest['echonest', 'audio_features'].head())
ipd.display(echonest['echonest', 'social_features'].head())
ipd.display(echonest['echonest', 'ranks'].head())

In [None]:
ipd.display(echonest['echonest', 'temporal_features'].head())
x = echonest.loc[5, ('echonest', 'temporal_features')]
plt.figure(figsize=(15, 5))
plt.plot(x);

## 4 Audio

You can listen to an audio excerpt with the code below.

In [None]:
filename = utils.get_audio_path(AUDIO_DIR, 2)
print('File: {}'.format(filename))
ipd.Audio(filename)
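
Since the track ID is used as the name of the audio file, a path can be derived from the ID alone. The sketch below assumes a layout in which files are named with a zero-padded, six-digit ID and grouped in folders named after its first three digits. Both the layout and the function name `sketch_audio_path` are assumptions; `utils.get_audio_path` remains the reference.

```python
import os


def sketch_audio_path(audio_dir, track_id):
    """Build an mp3 path from a track ID (assumed layout, for illustration)."""
    tid_str = '{:06d}'.format(track_id)  # e.g. 2 -> '000002'
    return os.path.join(audio_dir, tid_str[:3], tid_str + '.mp3')
```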

You can also use [librosa](https://github.com/librosa/librosa) to load the raw waveform and compute audio features.

In [None]:
x, sr = librosa.load(filename)
print('Duration: {:.2f}s, {} samples'.format(x.shape[0] / sr, x.size))
ipd.display(ipd.Audio(data=x, rate=sr))

plt.figure(figsize=(15, 5))
plt.plot(x)

plt.figure(figsize=(15, 5))
S, freqs, bins, im = plt.specgram(x, NFFT=1024, Fs=sr, noverlap=512)

## 5 Genre classification

### 5.1 From features

In [None]:
small = tracks['set', 'subset'] <= 'small'

train = tracks['set', 'split'] == 'training'
val = tracks['set', 'split'] == 'validation'
test = tracks['set', 'split'] == 'test'

y_train = tracks.loc[small & train, ('track', 'genre_top')]
y_test = tracks.loc[small & test, ('track', 'genre_top')]
X_train = features.loc[small & train, 'mfcc']
X_test = features.loc[small & test, 'mfcc']

print('{} training examples, {} testing examples'.format(y_train.size, y_test.size))
print('{} features, {} classes'.format(X_train.shape[1], np.unique(y_train).size))
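
The comparison `tracks['set', 'subset'] <= 'small'` only selects the intended tracks if the subset column is an ordered categorical, so that `<=` follows the subset order rather than alphabetical order. The sketch below illustrates this on toy data, assuming the order small, medium, large, full; that the loaded column is encoded this way is an assumption about `utils.load`.

```python
import pandas as pd

# Assumed ordering of the subsets, from smallest to largest.
subset_order = ['small', 'medium', 'large', 'full']

# Toy subset labels indexed by made-up track IDs.
subset = pd.Series(
    pd.Categorical(['small', 'large', 'medium', 'small'],
                   categories=subset_order, ordered=True),
    index=[2, 3, 5, 10],
)

in_small = subset <= 'small'    # only tracks of the smallest subset
in_medium = subset <= 'medium'  # small and medium tracks
```

With plain strings instead of an ordered categorical, `'large' <= 'small'` would be true because the comparison would be lexicographic.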

In [None]:
# Be sure training samples are shuffled.
X_train, y_train = shuffle(X_train, y_train, random_state=42)

# Standardize features by removing the mean and scaling to unit variance.
# Keep the returned arrays: in-place scaling of a DataFrame is not guaranteed.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Support vector classification.
clf = SVC()
clf.fit(X_train, y_train)
score = clf.score(X_test, y_test)
print('Accuracy: {:.2%}'.format(score))
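
One way to make sure that the scaler fitted on the training set is the one applied to the test set is to chain it with the classifier in a scikit-learn pipeline. The sketch below does this on synthetic data; the data, class names and variable names are invented for illustration, and the resulting accuracy says nothing about the dataset.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Two well-separated synthetic classes with five features each.
rng = np.random.RandomState(42)
n_per_class = 100
X = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(n_per_class, 5)),
    rng.normal(loc=4.0, scale=1.0, size=(n_per_class, 5)),
])
X[:, 0] *= 1000  # give one feature a very different scale
y = np.array(['Rock'] * n_per_class + ['Jazz'] * n_per_class)

# Shuffle, then hold out the last 50 examples for testing.
order = rng.permutation(len(y))
X, y = X[order], y[order]
X_tr, X_te, y_tr, y_te = X[:150], X[150:], y[:150], y[150:]

# The pipeline fits the scaler on the training data only and reuses
# the same statistics when scoring the held-out data.
model = make_pipeline(StandardScaler(), SVC())
model.fit(X_tr, y_tr)
accuracy = model.score(X_te, y_te)
print('Accuracy: {:.2%}'.format(accuracy))
```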

### 5.2 From audio