# MCT4052 Workshop 6e: K-Fold Cross Validation

*Author: Stefano Fasciani, stefano.fasciani@imv.uio.no, Department of Musicology, University of Oslo.*

In this notebook we use K-Fold cross-validation to obtain a fairer evaluation of the performance of a supervised machine learning model. More methodological information on cross-validation is available [here](https://scikit-learn.org/stable/modules/cross_validation.html#cross-validation).

The [KFold](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html) object in scikit-learn facilitates the creation of K-Fold train/test splits, so that cross-validation can be performed manually, possibly with a deeper analysis of performance (e.g. analyzing what goes wrong in each fold).
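As a minimal sketch of the manual approach, the loop below iterates over the folds produced by `KFold` and inspects each one separately. The data is a synthetic stand-in generated with `make_classification` (not the audio features used later), and `LogisticRegression` is swapped in for speed; both are assumptions made only for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import KFold

# Synthetic stand-in for a features/labels pair (60 samples, 20 features, 3 classes).
X, y = make_classification(n_samples=60, n_features=20, n_informative=10,
                           n_classes=3, random_state=0)

# Shuffle before splitting so that each fold contains a mix of classes.
kf = KFold(n_splits=5, shuffle=True, random_state=0)

fold_accuracy = []
for fold, (train_idx, test_idx) in enumerate(kf.split(X)):
    # A fresh classifier is trained on the 4 training folds only.
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X[train_idx], y[train_idx])
    y_pred = clf.predict(X[test_idx])

    # Per-fold analysis: accuracy and confusion matrix on the held-out fold.
    fold_accuracy.append(accuracy_score(y[test_idx], y_pred))
    print('fold', fold, 'accuracy', fold_accuracy[-1])
    print(confusion_matrix(y[test_idx], y_pred, labels=[0, 1, 2]))

print('mean accuracy', np.mean(fold_accuracy))
```

Having the indices of each fold available makes it possible to look at exactly which samples are misclassified, which the summary metrics alone do not reveal.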

Instead, in this example we use the [cross_validate](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_validate.html#sklearn.model_selection.cross_validate) function, which performs the K-Fold train/test splitting internally and provides us only with performance metrics (and additional computational information such as the training time).

For more information on the possible scoring parameters that cross_validate can compute, refer to this [page](https://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter).

Keep in mind that when using [cross_validate](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_validate.html#sklearn.model_selection.cross_validate), all scores follow the convention that higher return values are better than lower return values. Thus metrics which measure the distance between the model and the data, like metrics.mean_squared_error, are available as neg_mean_squared_error, which returns the negated value of the metric.
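The sign convention can be illustrated with a small regression sketch. The data (from `make_regression`) and the `LinearRegression` model are assumptions chosen only to keep the example short; they are not part of this workshop's dataset.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_validate

# Synthetic regression problem used only to illustrate the sign convention.
X, y = make_regression(n_samples=60, n_features=5, noise=10.0, random_state=0)

# With a single scoring string, the results are stored under 'test_score'.
scores = cross_validate(LinearRegression(), X, y, cv=5,
                        scoring='neg_mean_squared_error')

neg_mse = scores['test_score']   # negated errors: values are <= 0, closer to 0 is better
mse = -neg_mse                   # flip the sign to recover the usual mean squared error

print('neg MSE per fold', neg_mse)
print('MSE per fold', mse)
```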


In [1]:
import numpy as np
import pandas as pd
import librosa
import sklearn.model_selection  # explicit submodule import, used below as sklearn.model_selection.cross_validate
import os

In [2]:
#loading files and extracting features
metadata = pd.read_csv('./data/examples4/meta.csv')
classes = list(metadata.label.unique())
print('There are',len(classes),'different classes:',classes)

sr = 22050

def extract_features(filename, sr):
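    # sr and y are passed as keywords, as required by recent librosa versions
    signal, dummy = librosa.load(filename, sr=sr, mono=True)
    # average each of the 20 MFCCs over time to get one feature vector per file
    output = np.mean(librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=20), axis=1)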
    return output

print('number of files in database',len(metadata.index))
features = np.zeros((len(metadata.index),20))
labels = np.zeros((len(metadata.index)))

for i, row in metadata.iterrows():
    features[i,:] = extract_features('./data/examples4/'+row['filename'], sr)
    labels[i] = (classes.index(row['label']))

print('Done!')

There are 5 different classes: ['cello', 'guitar', 'clarinet', 'flute', 'harmonica']
number of files in database 60
Done!


### 1. Applying cross validation to a classifier
In this example we simply show how to apply the cross_validate function to a classifier. We have not scaled the features (the ANN should be able to handle that). Note that we pass the entire set of features/labels; the splitting happens internally. We only specify the number of folds (i.e. cv).

Besides listing the score details, we also compute and display the average (mean), which is a measure of the system performance, and the variance, which, when small, indicates that the performance measure is reliable.

In [3]:
from sklearn.neural_network import MLPClassifier

#creating classifier
mlp = MLPClassifier(hidden_layer_sizes=(20,5), max_iter=10000, activation='relu')

#initializing and running the cross validator with classifier, features, labels, scores, and number of splits
#with cv=5 we partition the data into 5 splits of 20% and, in turn, use 4 for training and 1 for testing.
#common values of cv (which is the k of k-fold) are 3 to 10, or the number of samples per class
#(i.e. leaving out one sample per class in each fold) when the dataset is balanced (same number of samples per class)
scores = sklearn.model_selection.cross_validate(mlp, features, labels, cv=5,scoring=('f1_macro', 'accuracy'),return_train_score=True)

print(scores,'\n')
print('Accuracy mean and variance', np.mean(scores['test_accuracy']),np.var(scores['test_accuracy']),'\n')
print('F1 macro mean and variance', np.mean(scores['test_f1_macro']),np.var(scores['test_f1_macro']),'\n')


{'fit_time': array([1.49261808, 0.16260791, 0.07557011, 0.00897598, 0.16454887]), 'score_time': array([0.00188184, 0.00122595, 0.00168872, 0.00166488, 0.00125694]), 'test_f1_macro': array([0.72666667, 0.08571429, 0.08      , 0.05714286, 0.1030303 ]), 'train_f1_macro': array([1.        , 0.10064935, 0.06315789, 0.06896552, 0.18316498]), 'test_accuracy': array([0.75      , 0.25      , 0.25      , 0.16666667, 0.16666667]), 'train_accuracy': array([1.        , 0.20833333, 0.1875    , 0.20833333, 0.27083333])} 

Accuracy mean and variance 0.3166666666666667 0.04833333333333333 

F1 macro mean and variance 0.21051082251082248 0.06681958014280094 
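The summary figures can be checked by hand from the per-fold test accuracies in the printed output above. The sketch below copies those five values and also computes the standard deviation and the sample variance, which are additions for comparison and not part of the original cell. Note that `np.var` defaults to the population variance (`ddof=0`).

```python
import numpy as np

# Per-fold test accuracies copied from the output printed above.
test_accuracy = np.array([0.75, 0.25, 0.25, 0.16666667, 0.16666667])

mean_acc = np.mean(test_accuracy)
var_acc = np.var(test_accuracy)              # population variance (ddof=0), NumPy's default
std_acc = np.std(test_accuracy)              # same units as the accuracy itself
var_sample = np.var(test_accuracy, ddof=1)   # sample variance, divides by (k - 1)

print('mean', mean_acc, 'variance', var_acc)
print('std', std_acc, 'sample variance', var_sample)
```

With only five folds, the difference between the two variance definitions is noticeable, so it helps to state which one is being reported.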



### 2. Applying cross validation to a pipeline
Since the train/test split is done internally by the cross validator, we also need to pack together all objects that should be trained and evaluated with the same train/test split. Pipelines are essential tools for this purpose.


In [4]:
from sklearn.pipeline import Pipeline
from sklearn.neural_network import MLPClassifier
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([
        ('scaler', StandardScaler()),
        ('dim_red', PCA(n_components = 10)),
        ('classifier', MLPClassifier(hidden_layer_sizes=(20,5), max_iter=10000, activation='relu'))
        ])

#initializing and running the cross validator with classifier, features, labels, scores, and number of splits
#with cv=5 we partition the data into 5 splits of 20% and, in turn, use 4 for training and 1 for testing.
#common values of cv (which is the k of k-fold) are 3 to 10, or the number of samples per class
#(i.e. leaving out one sample per class in each fold) when the dataset is balanced (same number of samples per class)
scores = sklearn.model_selection.cross_validate(pipe, features, labels, cv=5,scoring=('f1_macro', 'accuracy'),return_train_score=True)

print(scores,'\n')
print('Accuracy mean and variance', np.mean(scores['test_accuracy']),np.var(scores['test_accuracy']),'\n')
print('F1 macro mean and variance', np.mean(scores['test_f1_macro']),np.var(scores['test_f1_macro']),'\n')


{'fit_time': array([0.43313169, 0.51307011, 0.32755208, 0.6560142 , 0.55106425]), 'score_time': array([0.00180602, 0.00203609, 0.00141191, 0.00187683, 0.00184488]), 'test_f1_macro': array([0.58095238, 0.74      , 0.58666667, 0.76      , 0.70952381]), 'train_f1_macro': array([1.        , 0.97894737, 1.        , 1.        , 1.        ]), 'test_accuracy': array([0.66666667, 0.75      , 0.66666667, 0.75      , 0.75      ]), 'train_accuracy': array([1.        , 0.97916667, 1.        , 1.        , 1.        ])} 

Accuracy mean and variance 0.7166666666666666 0.0016666666666666683 

F1 macro mean and variance 0.6754285714285715 0.005857741496598638 
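To make explicit what the pipeline does inside each fold, the sketch below repeats the same steps by hand: the scaler and the PCA are fitted on the training folds only and then applied to the held-out fold. The synthetic data and the use of `LogisticRegression` (a deterministic stand-in for the MLP) are assumptions made so that the manual loop and `cross_validate` can be compared directly.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the features/labels (60 samples, 20 features, 3 classes).
X, y = make_classification(n_samples=60, n_features=20, n_informative=10,
                           n_classes=3, random_state=0)

# For classifiers, an integer cv in cross_validate corresponds to StratifiedKFold without shuffling.
skf = StratifiedKFold(n_splits=5)

manual = []
for train_idx, test_idx in skf.split(X, y):
    scaler = StandardScaler()
    pca = PCA(n_components=10)
    clf = LogisticRegression(max_iter=1000)
    # Fit the preprocessing on the training folds only ...
    X_train = pca.fit_transform(scaler.fit_transform(X[train_idx]))
    # ... and only apply (never fit) it on the held-out fold.
    X_test = pca.transform(scaler.transform(X[test_idx]))
    clf.fit(X_train, y[train_idx])
    manual.append(clf.score(X_test, y[test_idx]))

# The same steps packed in a pipeline and handed to cross_validate.
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('dim_red', PCA(n_components=10)),
    ('classifier', LogisticRegression(max_iter=1000)),
])
auto = cross_validate(pipe, X, y, cv=5, scoring='accuracy')['test_score']

print('manual loop   ', np.array(manual))
print('cross_validate', auto)
```

Fitting the scaler or the PCA on the full dataset before cross-validating would let information from the test folds leak into training, which is what the pipeline avoids.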



### 3. Follow up activity

Apply cross-validation to the pipeline you developed for the previous notebook (i.e. Workshop 6d). Try different fold values (cv) and use different scoring metrics.
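One possible starting point is sketched below. The pipeline and the synthetic data are hypothetical placeholders (the Workshop 6d pipeline is not reproduced here), so replace `pipe`, `X`, and `y` with your own objects. The scoring names used are standard scikit-learn scoring strings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder data: replace with your own features and labels.
X, y = make_classification(n_samples=60, n_features=20, n_informative=10,
                           n_classes=3, random_state=0)

# Placeholder pipeline: replace with the one developed in Workshop 6d.
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('classifier', LogisticRegression(max_iter=1000)),
])

scoring = ('accuracy', 'f1_macro', 'recall_macro')
results = {}
for cv in (3, 5, 10):
    scores = cross_validate(pipe, X, y, cv=cv, scoring=scoring)
    # Store (mean, variance) of the test score for each metric.
    results[cv] = {m: (np.mean(scores['test_' + m]), np.var(scores['test_' + m]))
                   for m in scoring}
    print('cv =', cv, results[cv])
```

Larger values of cv leave fewer samples in each test fold, so the per-fold scores tend to become coarser and their variance tends to grow.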