# MCT4052 Workshop 6b: Saving and Restoring Trained ML Models with Joblib

*Author: Stefano Fasciani, stefano.fasciani@imv.uio.no, Department of Musicology, University of Oslo.*

When working with machine learning, it is important to be able to save a trained model for later use, because training may be time-consuming or because we want to compare a collection of trained models. In this example we save to file and restore from file the models for scaling, dimensionality reduction and classification. All three are needed when deploying ML in a real-world application, because new data must pass through the same scaling and projection before it is classified. The method detailed in this notebook works with all scikit-learn ML models. The models (i.e. Python objects) are saved to and restored from files using the [Joblib package](https://joblib.readthedocs.io/en/latest/), which can also be used to save other Python objects.
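As a minimal sketch of the save/restore round trip on a generic Python object, the snippet below dumps a dictionary to a temporary file and loads it back. The dictionary content, the temporary folder and the file name `demo_object.pkl` are assumptions made purely for illustration.

```python
import os
import tempfile

import joblib
import numpy as np

# an arbitrary Python object mixing plain values and a NumPy array
# (hypothetical content, used only to illustrate the round trip)
demo_object = {
    "classes": ["kick", "snare", "cymbal", "clap"],
    "sr": 22050,
    "weights": np.arange(5, dtype=float),
}

# write to a temporary folder so the example leaves no files behind
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo_object.pkl")
    joblib.dump(demo_object, demo_path)      # save to file
    demo_restored = joblib.load(demo_path)   # restore from file

# the restored object should carry the same content as the original
print(demo_restored["classes"])
print(np.array_equal(demo_object["weights"], demo_restored["weights"]))
```

The same two calls, `joblib.dump` and `joblib.load`, are used for every model saved later in this notebook.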


In [1]:
#import packages
import numpy as np
import pandas as pd
import librosa, librosa.display
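import sklearn
import sklearn.preprocessing, sklearn.svm, sklearn.metrics #submodules used below, imported explicitly
import scipy, scipy.interpolate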
import matplotlib.pyplot as plt
%matplotlib inline
import matplotlib.style as ms
ms.use("seaborn-v0_8")
import IPython.display as Ipd
import os
import joblib

In [2]:
sr = 22050

def lin_interp_2d(data, out_size):
    
    x_in_size = data.shape[1]
    y_in_size = data.shape[0]
    x_in = np.arange(0,x_in_size)
    y_in = np.arange(0,y_in_size)
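    #interp2d was deprecated in SciPy 1.10 and removed in later releases;
    #RegularGridInterpolator performs the same bilinear interpolation on a regular grid
    interpolator = scipy.interpolate.RegularGridInterpolator((y_in, x_in), data, method='linear')
    x_out = np.arange(0,x_in_size-1,((x_in_size-1)/out_size[1]))
    y_out = np.arange(0,y_in_size-1,((y_in_size-1)/out_size[0]))
    y_grid, x_grid = np.meshgrid(y_out, x_out, indexing='ij')
    output = interpolator(np.stack([y_grid, x_grid], axis=-1))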
    output = output[0:out_size[0],0:out_size[1]]
    
    return output

def extract_features(filename, sr):
    
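    #loading the audio file as mono, resampled to sr
    signal, dummy = librosa.load(filename, sr=sr, mono=True)
    #mel spectrogram with 60 bands (sr is passed explicitly so the mel filterbank matches the signal)
    temp = librosa.feature.melspectrogram(y=signal, sr=sr, n_mels=60)
    #resizing to a fixed 50x10 grid so that every file yields a 500-element feature vector
    melspect1 = lin_interp_2d(temp, (50,10))
    output = melspect1.flatten()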
    
    return output


filenames = os.listdir('./data/examples2')
features = np.zeros((len(filenames),500))
labels = np.zeros((len(filenames)))
classes = ['kick','snare','cymbal','clap'] 

for i in range(len(filenames)):
    features[i,:] = extract_features('./data/examples2/'+filenames[i], sr=sr)
    if filenames[i].find('kick') != -1:
        labels[i] = 0
    elif filenames[i].find('snare') != -1:
        labels[i] = 1
    elif filenames[i].find('cymbal') != -1:
        labels[i] = 2
    elif filenames[i].find('clap') != -1:
        labels[i] = 3
        
print('Done!')

Done!


In [3]:
from sklearn.model_selection import train_test_split

#splitting the dataset in training and testing parts
feat_train, feat_test, lab_train, lab_test = train_test_split(features, labels, test_size=0.2, random_state=7)

### 1. Saving and restoring all trained models to/from files

In [4]:
#learning the scaling transformation from the train data and applying it to both train and test set.

#creating scaling object
scaler = sklearn.preprocessing.StandardScaler()

#learning scaling from train set
scaler.fit(feat_train)

#saving the scaler model to file
joblib_file = "scaler_model.pkl"
joblib.dump(scaler, joblib_file)

#restoring the scaler model from file
restored_scaler = joblib.load(joblib_file)

#applying scaling to both train and test set
feat_train = restored_scaler.transform(feat_train)
feat_test = restored_scaler.transform(feat_test)
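A quick way to gain confidence in a restored model is to compare it against the one still held in memory. The sketch below does this for a scaler fitted on synthetic data. The random features, the variable names and the file name `demo_scaler.pkl` are assumptions and do not come from the workshop dataset.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.preprocessing import StandardScaler

# synthetic features standing in for the real training set (assumption)
rng = np.random.default_rng(0)
demo_train = rng.normal(loc=3.0, scale=2.0, size=(30, 4))

# learn the scaling, then save and restore it
demo_scaler = StandardScaler().fit(demo_train)
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo_scaler.pkl")
    joblib.dump(demo_scaler, demo_path)
    demo_restored_scaler = joblib.load(demo_path)

# the learned parameters (per-feature mean and scale) should be preserved
params_match = (np.allclose(demo_scaler.mean_, demo_restored_scaler.mean_)
                and np.allclose(demo_scaler.scale_, demo_restored_scaler.scale_))

# and both objects should transform the same data identically
outputs_match = np.allclose(demo_scaler.transform(demo_train),
                            demo_restored_scaler.transform(demo_train))

print("Parameters match:", params_match)
print("Outputs match:", outputs_match)
```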

In [6]:
#Creating an instance of the LDA object, which is an object capable of learning and applying LDA from/to data.
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

lda = LinearDiscriminantAnalysis()

#This will learn LDA projection from train data
lda.fit(feat_train,lab_train)

#saving the dimensionality reduction model to file
joblib_file = "dimred_model.pkl"
joblib.dump(lda, joblib_file)

#restoring the dimensionality reduction model from file
restored_lda = joblib.load(joblib_file)

#Now we project the data using LDA
projected_features_train = restored_lda.transform(feat_train)
projected_features_test = restored_lda.transform(feat_test)
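Files written by Joblib are pickle-based, so they should only be loaded from trusted sources, and a model saved with one scikit-learn version is not guaranteed to load correctly with another. One possible precaution, sketched below under the assumption of synthetic data, is to store the library version next to the model and compare it at load time. The dictionary keys and the file name `demo_dimred_bundle.pkl` are hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
import sklearn
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# synthetic 4-class data standing in for the scaled features (assumption)
rng = np.random.default_rng(1)
demo_lab = np.repeat([0, 1, 2, 3], 10)
demo_feat = rng.normal(size=(40, 6)) + demo_lab[:, None]  # shift classes apart

demo_lda = LinearDiscriminantAnalysis().fit(demo_feat, demo_lab)

# bundle the model with the scikit-learn version used to train it
demo_bundle = {"model": demo_lda, "sklearn_version": sklearn.__version__}

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo_dimred_bundle.pkl")
    joblib.dump(demo_bundle, demo_path)
    restored_bundle = joblib.load(demo_path)

# warn if the file was produced by a different scikit-learn version
if restored_bundle["sklearn_version"] != sklearn.__version__:
    print("Warning: model saved with scikit-learn", restored_bundle["sklearn_version"])

# the restored model is used exactly like the original one
demo_projected = restored_bundle["model"].transform(demo_feat)
print(demo_projected.shape)
```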

In [7]:
#Creating an instance of an SVM classifier and setting it to use an RBF kernel with C=2.0
svm = sklearn.svm.SVC(kernel='rbf', C=2.0)

#training the model
svm.fit(projected_features_train, lab_train)

#saving the classifier model to file
joblib_file = "class_model.pkl"
joblib.dump(svm, joblib_file)

#restoring the classifier model from file
restored_svm = joblib.load(joblib_file)

#applying the model to the test data (features)
lab_predict = restored_svm.predict(projected_features_test)


#print the number of misclassified samples and the accuracy (using scikit-learn metric tools)
print('Number of mislabeled samples %d out of %d' % ((lab_test != lab_predict).sum(),lab_test.size))
print('Accuracy:',sklearn.metrics.accuracy_score(lab_test, lab_predict))

Number of mislabeled samples 11 out of 34
Accuracy: 0.6764705882352942
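This notebook saves the scaler, the dimensionality reduction and the classifier in three separate files. As an alternative that is not used above, the same three steps can be chained in a scikit-learn `Pipeline` and stored in a single file, which may simplify deployment. The sketch below assumes synthetic data in place of the drum samples, and the file name `demo_pipeline.pkl` is hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# synthetic 4-class dataset standing in for the extracted features (assumption)
rng = np.random.default_rng(7)
demo_lab = np.repeat([0, 1, 2, 3], 30)
demo_feat = rng.normal(size=(120, 20)) + 2.0 * demo_lab[:, None]

x_train, x_test, y_train, y_test = train_test_split(
    demo_feat, demo_lab, test_size=0.2, random_state=7)

# chain scaling, LDA projection and SVM classification in one object
demo_pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("lda", LinearDiscriminantAnalysis()),
    ("svm", SVC(kernel="rbf", C=2.0)),
])
demo_pipeline.fit(x_train, y_train)

# a single dump/load pair now covers all three trained steps
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo_pipeline.pkl")
    joblib.dump(demo_pipeline, demo_path)
    restored_pipeline = joblib.load(demo_path)

# the restored pipeline should predict exactly like the original one
pred_original = demo_pipeline.predict(x_test)
pred_restored = restored_pipeline.predict(x_test)
print("Identical predictions:", np.array_equal(pred_original, pred_restored))
```

With a pipeline, raw features go straight into `predict`, so there is no risk of applying the steps in the wrong order or pairing a classifier with the wrong scaler file.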
