# OpenMIC-2018 baseline model tutorial

This notebook demonstrates how to replicate the baseline modeling experiment in [(Humphrey, Durand, and McFee, 2018)](http://ismir2018.ircam.fr/doc/pdfs/203_Paper.pdf).

We'll load in the pre-computed VGGish features and labels, and fit a RandomForest model for each of the 20 instrument classes using the pre-defined train-test splits provided in the repository.

We'll then evaluate the models we fit, and show how to apply them to a new audio file that wasn't included in OpenMIC-2018.

In [100]:
# These dependencies are necessary for loading the data
import os
import json

import numpy as np
import pandas as pd

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

# We'll use the openmic package to preprocess new recordings
# for classification beyond the dataset
import openmic

## Loading the data

The OpenMIC-2018 data is provided in a Python-friendly format as `openmic-2018.npz`.

You can load it as follows:

In [28]:
OPENMIC = np.load('openmic-2018.npz')

In [29]:
# What's included?
print(list(OPENMIC.keys()))

['X', 'Y_true', 'Y_mask', 'sample_key']
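If you want to see what `np.load` hands back without the full 20000-clip file, here is a small sketch. It builds a made-up in-memory `.npz` with the same four keys and reads it back. The contents are placeholders, not OpenMIC data. For `.npz` archives, `np.load` returns an `NpzFile` whose arrays are read when their key is accessed.

```python
import io

import numpy as np

# Build a tiny in-memory .npz with the same four keys (made-up contents, 2 clips)
buf = io.BytesIO()
np.savez(buf,
         X=np.zeros((2, 10, 128), dtype=np.uint8),
         Y_true=np.full((2, 20), 0.5),
         Y_mask=np.zeros((2, 20), dtype=bool),
         sample_key=np.array(['key_a', 'key_b']))
buf.seek(0)

# np.load on an .npz returns an NpzFile; each array is read when its key is accessed
with np.load(buf) as data:
    keys = list(data.files)
    X = data['X']  # materialized as a regular array, so it stays usable after closing
print(keys)
print(X.shape)
```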


### What's included in the data?

- `X`: 20000 * 10 * 128 array of VGGish features
    - First index (0..19999) corresponds to the sample key
    - Second index (0..9) corresponds to the time within the clip
    - Third index (0..127) corresponds to the VGGish features at each point in the 10-second clip
    - Example: `X[40, 8]` is the 128-dimensional feature vector for the 9th second in the 41st example
- `Y_true`: 20000 * 20 array of *true* label probabilities
    - First index corresponds to sample key, as above
    - Second index corresponds to the label class (accordion, ..., voice)
    - Example: `Y_true[40, 4]` indicates the confidence that example #41 contains the 5th instrument
- `Y_mask`: 20000 * 20 binary mask values
    - First index corresponds to sample key
    - Second index corresponds to the label class
    - Example: `Y_mask[40, 4]` indicates whether or not we have observations for the 5th instrument for example #41
- `sample_key`: 20000 array of sample key strings
    - Example: `sample_key[40]` is the sample key for example #41
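The indexing conventions above can be tried out on toy arrays. This sketch uses made-up data with the same layout but only 3 clips. It assumes the features are 8-bit integers (as the printed values below suggest) and that unannotated label entries hold 0.5 (as in the `Y_true[40]` output below).

```python
import numpy as np

# Made-up arrays with the OpenMIC-2018 layout, but only 3 clips instead of 20000
n_clips, n_frames, n_dims, n_classes = 3, 10, 128, 20
rng = np.random.default_rng(0)
X = rng.integers(0, 256, size=(n_clips, n_frames, n_dims)).astype(np.uint8)
Y_true = np.full((n_clips, n_classes), 0.5)       # unannotated entries are 0.5 in this sketch
Y_mask = np.zeros((n_clips, n_classes), dtype=bool)

# Pretend clip index 1 was annotated for class 5, with low confidence that it is present
Y_true[1, 5] = 0.15
Y_mask[1, 5] = True

# 128-dimensional feature vector for the 9th second of the 2nd clip
frame = X[1, 8]

# Which classes were actually observed for that clip, and with what confidence?
observed = np.flatnonzero(Y_mask[1])
print(frame.shape, observed, Y_true[1, observed])
```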

In [35]:
OPENMIC['X'].shape

(20000, 10, 128)

In [37]:
# Features for the 9th second of the 81st example (row 80)
OPENMIC['X'][80, 8]

array([192,  30, 176, 126, 208,  85,  84,  95,  69, 234,  99, 118, 166,
       150, 106,  68, 165, 156, 146, 206,  75, 210, 131,  49,  61, 218,
        92, 152, 121, 167,  62, 166, 167, 237,  22, 168, 165, 137, 178,
       132, 196,  96,  54, 166, 169, 132,  59,  27,  46, 123,  89,  47,
        58, 116,  48, 188, 157,  28,  44, 252, 248, 100,  28, 154, 147,
       148, 204, 104,  95,  67, 109, 147, 204, 146, 196, 222,  90, 255,
        94, 171,  53, 133, 202, 152,  35,  55, 231, 255,  62, 227, 168,
       192,  87, 144, 130, 255,   0,   0, 163,  75, 255, 135, 216,  68,
         0, 199,   0, 193, 254, 114,  12, 255,   0,  74, 165,   0, 201,
       246,   0, 127, 211, 218, 164,  57, 238, 176, 158, 255])

In [39]:
OPENMIC['Y_true'][40]

array([0.5    , 0.5    , 0.5    , 0.5    , 0.5    , 0.15055, 0.5    ,
       0.5    , 0.5    , 0.5    , 0.5    , 0.5    , 0.5    , 0.5    ,
       0.5    , 0.5    , 0.5    , 0.5    , 0.5    , 0.5    ])

In [40]:
OPENMIC['Y_mask'][40]

array([False, False, False, False, False,  True, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False])

In [42]:
OPENMIC['sample_key'].shape

(20000,)

In [43]:
OPENMIC['sample_key'][40]

'000385_249600'

In [46]:
# It will be easier to use if we make direct variable names for everything
X, Y_true, Y_mask, sample_key = OPENMIC['X'], OPENMIC['Y_true'], OPENMIC['Y_mask'], OPENMIC['sample_key']

### Load the class map

For convenience, we provide a simple JSON object that maps instrument names to their column indices in `Y_true` and `Y_mask`.


In [82]:
with open('class-map.json', 'r') as f:
    class_map = json.load(f)

In [83]:
class_map

{'accordion': 0,
 'banjo': 1,
 'bass': 2,
 'cello': 3,
 'clarinet': 4,
 'cymbals': 5,
 'drums': 6,
 'flute': 7,
 'guitar': 8,
 'mallet_percussion': 9,
 'mandolin': 10,
 'organ': 11,
 'piano': 12,
 'saxophone': 13,
 'synthesizer': 14,
 'trombone': 15,
 'trumpet': 16,
 'ukulele': 17,
 'violin': 18,
 'voice': 19}
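Since the map goes from name to index, it is sometimes handy to invert it, for example to label a column of `Y_true` by name. A minimal sketch, using an excerpt of the map above rather than the full file:

```python
# An excerpt of the class map (instrument name -> column index), as loaded above
class_map = {'accordion': 0, 'banjo': 1, 'bass': 2, 'cymbals': 5, 'voice': 19}

# Invert it to go from a column index back to the instrument name
index_to_name = {index: name for name, index in class_map.items()}

# Instrument names ordered by column index, e.g. for labeling plots or reports
names_by_index = sorted(class_map, key=class_map.get)

print(index_to_name[5], names_by_index)
```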

## Loading the train-test splits

OpenMIC-2018 comes with a pre-defined train-test split. Great care was taken to ensure that this split is approximately balanced and that no artist appears on both sides of the split, so please use it!

The split files list sample keys, not row numbers, so you will need to go through the `sample_key` array to find which rows to slice.
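One vectorized way to turn a list of sample keys into row numbers is `np.isin`, sketched here on made-up keys. This is an alternative to the loop used further down, not the notebook's own method; like that loop, it returns rows in the order of `sample_key` and fails loudly if a key belongs to neither split.

```python
import numpy as np

# Made-up sample keys in row order, and split lists whose order differs from the rows
sample_key = np.array(['key_a', 'key_b', 'key_c', 'key_d'])
train_keys = ['key_c', 'key_a', 'key_d']
test_keys = ['key_b']

# Boolean masks over rows: True where that row's sample key is listed in the split
train_rows = np.isin(sample_key, train_keys)
test_rows = np.isin(sample_key, test_keys)

# Every row should belong to exactly one split
if not np.all(train_rows ^ test_rows):
    raise RuntimeError('Some sample keys are in neither split, or in both')

idx_train = np.flatnonzero(train_rows)  # row numbers, in row order
idx_test = np.flatnonzero(test_rows)
print(idx_train, idx_test)
```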

In [52]:
# Let's split the data into the training and test set
# Each split file has a single column of sample keys; .squeeze('columns') turns the
# one-column DataFrame into a Series (read_csv's squeeze=True flag was removed in pandas 2.0)
split_train = pd.read_csv('split01_train.csv', header=None).squeeze('columns')
split_test = pd.read_csv('split01_test.csv', header=None).squeeze('columns')

In [54]:
# These two tables contain the sample keys for training and testing examples
# Let's see the keys for the first five training examples
split_train.head(5)

0      000046_3840
1    000135_483840
2    000139_119040
3    000141_153600
4     000144_30720
Name: 0, dtype: object

In [55]:
# How many train and test examples do we have?  About 75%/25%
print('# Train: {},  # Test: {}'.format(len(split_train), len(split_test)))

# Train: 14915,  # Test: 5085


### Making sets
These sample key lists are easier to use as sets (checking whether a key belongs to a set takes constant time on average), so let's make them sets!

In [63]:
train_set = set(split_train)
test_set = set(split_test)

### Split the data

Now that we have the sample keys for the training and testing examples, we need to partition the data arrays (`X`, `Y_true`, `Y_mask`).

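This takes a little care: the split files list sample keys, so we first find the row number of each key, and then slice every array with the same row indices so that features, labels, and masks stay aligned.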
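A dictionary from sample key to row number is another way to get the row indices. This sketch uses made-up keys and tiny arrays whose rows are easy to recognize. Unlike the loop below, it returns rows in the order of the split file, and it raises a `KeyError` if the split mentions a key that is not in the data. The important part, as in the notebook, is that every array is sliced with the same index array.

```python
import numpy as np

# Made-up keys, plus tiny arrays whose row i is easy to recognize
sample_key = np.array(['key_a', 'key_b', 'key_c', 'key_d'])
X = np.repeat(np.arange(4.0)[:, None], 3, axis=1)   # row i is [i, i, i]
Y_true = np.array([0.9, 0.1, 0.5, 0.7])

train_keys = ['key_c', 'key_a', 'key_d']  # split files need not follow row order

# Map each sample key to its row number; a KeyError flags split keys missing from the data
key_to_row = {key: row for row, key in enumerate(sample_key)}
idx_train = np.array([key_to_row[key] for key in train_keys])

# Slicing every array with the *same* index array keeps features and labels aligned
X_train, Y_true_train = X[idx_train], Y_true[idx_train]
print(idx_train, X_train[:, 0], Y_true_train)
```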

In [84]:
# These loops go through all sample keys, and save their row numbers
# to either idx_train or idx_test
#
# This will be useful in the next step for slicing the array data
idx_train, idx_test = [], []

for idx, n in enumerate(sample_key):
    if n in train_set:
        idx_train.append(idx)
    elif n in test_set:
        idx_test.append(idx)
    else:
        # This should never happen, but better safe than sorry.
        raise RuntimeError('Unknown sample key={}! Abort!'.format(n))
        
# Finally, cast the idx_* arrays to numpy structures
idx_train = np.asarray(idx_train)
idx_test = np.asarray(idx_test)

In [85]:
X_train = X[idx_train]
X_test = X[idx_test]

Y_mask_train = Y_mask[idx_train]
Y_mask_test = Y_mask[idx_test]

Y_true_train = Y_true[idx_train]
Y_true_test = Y_true[idx_test]

In [86]:
# Print out the sliced shapes as a sanity check
print(X_train.shape)
print(X_test.shape)

(14915, 10, 128)
(5085, 10, 128)


In [101]:
# This dictionary will include the classifiers for each model
models = dict()

# We'll iterate over all instrument classes, and fit a model for each one
# After training, we'll print a classification report for each instrument
for instrument in class_map:
    
    # Map the instrument name to its column number
    inst_num = class_map[instrument]
    
    # Initialize a new classifier
    clf = RandomForestClassifier(max_depth=8, random_state=0)
    
    # First, we need to select down to the data for which we have annotations
    # This is what the mask arrays are for
    train_inst = Y_mask_train[:, inst_num]
    test_inst = Y_mask_test[:, inst_num]
    
    # Here, we're using the Y_mask_train array to slice out only the training examples
    # for which we have annotations for the given class
    X_train_inst = X_train[train_inst]
    
    # Arrange the data for a sklearn Random Forest model:
    # instead of time-varying features, summarize each clip by its mean feature vector over time
    X_train_inst_sklearn = np.mean(X_train_inst, axis=1)
    
    # Slice the labels the same way, and threshold the label confidences at 0.5
    # to get binary targets (True = instrument present)
    Y_true_train_inst = Y_true_train[train_inst, inst_num] >= 0.5

    # Repeat the above slicing and dicing but for the test set
    X_test_inst = X_test[test_inst]
    X_test_inst_sklearn = np.mean(X_test_inst, axis=1)
    Y_true_test_inst = Y_true_test[test_inst, inst_num] >= 0.5
    
    clf.fit(X_train_inst_sklearn, Y_true_train_inst)

    # Finally, we'll evaluate the model on both train and test
    
    Y_pred_train = clf.predict(X_train_inst_sklearn)
    Y_pred_test = clf.predict(X_test_inst_sklearn)
    
    print('-' * 52)
    print(instrument)
    print('\tTRAIN')
    print(classification_report(Y_true_train_inst, Y_pred_train))
    print('\tTEST')
    print(classification_report(Y_true_test_inst, Y_pred_test))
    
    # Store the classifier in our dictionary
    models[instrument] = clf

----------------------------------------------------
accordion
	TRAIN
             precision    recall  f1-score   support

      False       0.96      0.98      0.97      1159
       True       0.95      0.87      0.91       374

avg / total       0.96      0.96      0.96      1533

	TEST
             precision    recall  f1-score   support

      False       0.84      0.94      0.89       423
       True       0.60      0.35      0.44       115

avg / total       0.79      0.81      0.79       538

----------------------------------------------------
banjo
	TRAIN
             precision    recall  f1-score   support

      False       0.95      0.96      0.96      1148
       True       0.92      0.90      0.91       592

avg / total       0.94      0.94      0.94      1740

	TEST
             precision    recall  f1-score   support

      False       0.80      0.85      0.83       338
       True       0.58      0.50      0.54       140

avg / total       0.74      0.75      0.74    
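To make the per-instrument recipe easier to reuse, the loop body above can be condensed into a function. The helper name `fit_one_instrument` is hypothetical, and it is exercised here on synthetic data (not OpenMIC) where positive clips are shifted so a model can learn them. It follows the same steps: keep only annotated clips via the mask, mean-pool over time, threshold labels at 0.5, fit a depth-8 random forest. It reports test F1 for the positive class instead of a full classification report.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score


def fit_one_instrument(X_train, Y_true_train, Y_mask_train,
                       X_test, Y_true_test, Y_mask_test, inst_num):
    """Hypothetical helper: the body of the training loop above, for one class column."""
    train_rows = Y_mask_train[:, inst_num]  # clips annotated for this instrument
    test_rows = Y_mask_test[:, inst_num]

    # Mean-pool over time: (n_clips, 10, 128) -> (n_clips, 128)
    X_tr = X_train[train_rows].mean(axis=1)
    X_te = X_test[test_rows].mean(axis=1)

    # Threshold the label confidences at 0.5 to get binary targets
    y_tr = Y_true_train[train_rows, inst_num] >= 0.5
    y_te = Y_true_test[test_rows, inst_num] >= 0.5

    clf = RandomForestClassifier(max_depth=8, random_state=0)
    clf.fit(X_tr, y_tr)
    return clf, f1_score(y_te, clf.predict(X_te), pos_label=True)


# Synthetic data (not OpenMIC): 2 classes, only class 0 annotated
rng = np.random.default_rng(0)


def make_split(n_clips):
    X = rng.normal(size=(n_clips, 10, 128))
    present = rng.random(n_clips) < 0.5
    X[present] += 1.0                       # shift positives so they are learnable
    Y_true = np.full((n_clips, 2), 0.5)
    Y_true[:, 0] = np.where(present, 0.9, 0.1)
    Y_mask = np.zeros((n_clips, 2), dtype=bool)
    Y_mask[:, 0] = True
    return X, Y_true, Y_mask


clf, f1 = fit_one_instrument(*make_split(200), *make_split(80), inst_num=0)
print('Test F1 for class 0: {:.2f}'.format(f1))
```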

## Using the models end-to-end on new audio

In [9]:
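# First, go from audio to VGGish features.
# `featurefy` is not among the imports at the top of this notebook;
# it needs to be importable in your environment for this cell to run.
repo_root = os.path.split(os.getcwd())[0]
outpath = os.path.join(repo_root, 'tests', 'data', '')
file_in = [os.path.join(outpath, 'audio', '000046_3840.ogg')]
featurefy.main(file_in, outpath)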


INFO:tensorflow:Restoring parameters from /Users/durand/miniconda2/envs/py36/lib/python3.6/site-packages/openmic/vggish/__model__/vggish_model.ckpt




[True]

In [None]:
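# Alternative route (not run in this notebook): compute VGGish features in memory.
# Assumption: soundfile_to_examples takes the path of the audio file to featurize.
examples = openmic.vggish.soundfile_to_examples(file_in[0])
time_points, features = openmic.vggish.transform(examples)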

In [10]:
# second go from VGGish to numpy format
file_out = os.path.join(outpath,
                        os.path.extsep.join([filebase(str(file_in)), 'npz']))
vggish_new = np.load(file_out)
time_len, _ = np.shape(vggish_new['features_z'])
input_num = int(time_len / 10)
X_new = np.empty([input_num, 10, 128], dtype=int)
for ii in range(input_num):
    X_new[ii, :, :] = vggish_new['features_z'][ii * 10:(ii+1) * 10, :]
X_new_sklearn = np.concatenate((np.std(X_new, axis=1), np.std(X_new, axis=1)), axis=1)
X_new_sklearn = np.nan_to_num(X_new_sklearn)
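The windowing loop above can also be written as a trim-and-reshape. This standalone sketch uses made-up features for a 25-second recording (assuming roughly one 128-dimensional VGGish frame per second) to show what happens to a recording whose length is not a multiple of 10: the leftover frames are dropped.

```python
import numpy as np

# Made-up VGGish output for a 25-second recording: one 128-dim frame per ~second
features = np.random.default_rng(0).integers(0, 256, size=(25, 128)).astype(np.uint8)

frames_per_clip = 10
n_windows = len(features) // frames_per_clip   # 2 full windows; the last 5 frames are dropped

# Trim the leftover frames, then reshape (n_frames, 128) -> (n_windows, 10, 128)
X_new = features[:n_windows * frames_per_clip].reshape(n_windows, frames_per_clip, -1)

# Mean-pool over time, as for the training data
X_new_sklearn = X_new.mean(axis=1)
print(X_new.shape, X_new_sklearn.shape)
```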


In [99]:
# Finally, apply each classifier to every 10-second window of the new recording,
# and report the highest probability of the instrument being present in any window
for instrument, clf in models.items():
    print('Probability of', instrument, 'is:', np.max(clf.predict_proba(X_new_sklearn)[:, 1]))

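The last cell scores a recording by taking, over its 10-second windows, the largest predicted probability that the instrument is present. Here is a standalone sketch of that aggregation, using a random-forest stand-in trained on random data (not one of the OpenMIC models). It relies on `predict_proba` ordering its columns like `clf.classes_`, which for boolean targets is `[False, True]`.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Made-up stand-in for one trained instrument model (random data, not OpenMIC)
rng = np.random.default_rng(0)
clf = RandomForestClassifier(max_depth=8, random_state=0)
clf.fit(rng.normal(size=(100, 128)), rng.random(100) < 0.5)

# Two mean-pooled 10-second windows from a "new" recording
X_new_sklearn = rng.normal(size=(2, 128))

# predict_proba has one column per entry of clf.classes_, here [False, True],
# so column 1 is the probability that the instrument is present in each window
p_windows = clf.predict_proba(X_new_sklearn)[:, 1]

# Recording-level score: the instrument counts as present if any window shows it
p_recording = p_windows.max()
print(p_windows.shape, round(float(p_recording), 3))
```

Taking the maximum favors recall for instruments that appear only briefly; averaging the window scores instead would favor instruments that play throughout.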