# MNIST handwritten digits classification with parameter grid search for SVM

In this notebook, we'll use [grid search](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html) and a validation set to find optimal values for our SVM model's hyperparameters.

First, the needed imports. 

In [None]:
%matplotlib inline

from pml_utils import get_mnist

import numpy as np
from sklearn import svm, datasets, __version__
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, PredefinedSplit

import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

# Suppress annoying warnings...
import warnings
from sklearn.exceptions import ConvergenceWarning
warnings.filterwarnings("ignore", category=ConvergenceWarning)

from packaging.version import Version
assert(Version(__version__) >= Version("0.20")), "Version >= 0.20 of sklearn is required."

Then we load the MNIST data. The first time this runs, it downloads the data, which can take a while.

In [None]:
X_train, y_train, X_test, y_test = get_mnist('MNIST')

print('MNIST data loaded: train:',len(X_train),'test:',len(X_test))
print('X_train:', X_train.shape)
print('y_train:', y_train.shape)
print('X_test:', X_test.shape)
print('y_test:', y_test.shape)

## Linear SVM

Let's start with a linear SVM trained on a subset of the training data (the first 10,000 images).  `C` is the penalty parameter that we need to specify.  Let's first try an initial guess, e.g., `C=1.0`.

In [None]:
%%time

clf_lsvm = svm.LinearSVC(C=1.0)

print(clf_lsvm.fit(X_train[:10000,:], y_train[:10000]))

pred_lsvm = clf_lsvm.predict(X_test)
print('Predicted', len(pred_lsvm), 'digits with accuracy:', accuracy_score(y_test, pred_lsvm))
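
For reference, with its default settings (`penalty='l2'`, `loss='squared_hinge'`), `LinearSVC` fits one binary classifier per digit (one-vs-rest), each minimizing an objective of the following form, where $y_i \in \{-1, +1\}$ and the intercept is left out for simplicity:

$$
\min_{\mathbf{w}} \; \frac{1}{2}\lVert \mathbf{w} \rVert^2 + C \sum_{i=1}^{n} \max\bigl(0,\; 1 - y_i\, \mathbf{w}^\top \mathbf{x}_i\bigr)^2
$$

A larger `C` weights the training errors more heavily (weaker regularization), while a smaller `C` favors a smaller $\lVert \mathbf{w} \rVert$ (stronger regularization). Which trade-off works best depends on the data, which is why we tune `C` below.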

Next, let's try grid search, i.e., we try several different values for the parameter `C` and keep the one that performs best.  Remember that it's important to *not* use the test set for evaluating hyperparameters.  Instead, we set aside the last 1000 images of our 10,000-image training subset as a validation set.


In [None]:
%%time

# The values for C that we will try out
param_grid = {'C': [1, 10, 100, 1000]}

# Define the validation set
valid_split = PredefinedSplit(9000*[-1] + 1000*[0])

clf_lsvm_grid = GridSearchCV(clf_lsvm, param_grid, cv=valid_split, verbose=2)
print(clf_lsvm_grid.fit(X_train[:10000,:], y_train[:10000]))
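
`PredefinedSplit` takes one fold label per sample: `-1` keeps the sample in the training part of every split, and `0` puts it in validation fold 0. Since only one fold label is used, there is exactly one train/validation split, so each value of `C` is fitted once and scored on the last 1000 images. A toy version with 6 training and 2 validation items (sizes chosen only for illustration):

```python
from sklearn.model_selection import PredefinedSplit

# Toy version of the split used above: 6 training items (-1) followed by 2 validation items (0)
test_fold = 6 * [-1] + 2 * [0]
split = PredefinedSplit(test_fold)

splits = list(split.split())       # one (train_indices, validation_indices) pair per fold
train_idx, valid_idx = splits[0]

print("number of splits:", split.get_n_splits())
print("train indices:", train_idx)
print("validation indices:", valid_idx)
```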

We can now see which value of `C` was selected as the best.

In [None]:
print(clf_lsvm_grid.best_params_)

best_C = clf_lsvm_grid.best_params_['C']
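
Conceptually, the search above boils down to a simple loop: fit one model per candidate `C` on the training part, score it on the validation part, and keep the best. The sketch below writes that loop out by hand. It is an illustration, not what `GridSearchCV` literally executes, and it assumes scikit-learn's small 8×8 `load_digits` set as stand-in data so that it runs in seconds; the scaling, subset sizes, `max_iter` and `random_state` are our own choices. Variable names are prefixed with `digits_` to avoid overwriting `best_C` in this notebook.

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score

# Stand-in data (assumption): scikit-learn's 8x8 digits instead of MNIST
X, y = load_digits(return_X_y=True)
X = X / 16.0  # pixel values are 0..16, scale them to [0, 1]

# Same idea as PredefinedSplit(n_train*[-1] + n_valid*[0]): first part trains, next part validates
n_train, n_valid = 1000, 300
X_tr, y_tr = X[:n_train], y[:n_train]
X_va, y_va = X[n_train:n_train + n_valid], y[n_train:n_train + n_valid]

digits_best_C, digits_best_score = None, -np.inf
for C in [1, 10, 100, 1000]:
    model = svm.LinearSVC(C=C, max_iter=5000, random_state=0)
    model.fit(X_tr, y_tr)
    score = accuracy_score(y_va, model.predict(X_va))
    print(f"C={C}: validation accuracy {score:.3f}")
    if score > digits_best_score:  # strict '>' keeps the first best value in case of ties
        digits_best_C, digits_best_score = C, score

print("best C:", digits_best_C)
```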

Let's try predicting with our new model, trained with the selected value of `C`.

In [None]:
clf_lsvm2 = svm.LinearSVC(C=best_C)

print(clf_lsvm2.fit(X_train[:10000,:], y_train[:10000]))

pred_lsvm2 = clf_lsvm2.predict(X_test)
print('Predicted', len(pred_lsvm2), 'digits with accuracy:', accuracy_score(y_test, pred_lsvm2))
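
Note that `GridSearchCV` has `refit=True` by default, so after the search `clf_lsvm_grid` was already refitted with the selected `C` on all 10,000 images passed to `fit` (training and validation parts together). `clf_lsvm_grid.predict(X_test)` should therefore give essentially the same result as `clf_lsvm2`; exact equality is only expected when the solver is made deterministic with a fixed `random_state`. A sketch checking this on the digits stand-in data (assumed, as is the 1000/300 split):

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, PredefinedSplit

# Stand-in data (assumption): scikit-learn's digits set, scaled to [0, 1]
X, y = load_digits(return_X_y=True)
X = X / 16.0
X_fit, y_fit = X[:1300], y[:1300]    # 1000 training + 300 validation items
X_hold, y_hold = X[1300:], y[1300:]  # remaining items play the role of the test set

split = PredefinedSplit(1000 * [-1] + 300 * [0])
base = svm.LinearSVC(max_iter=5000, random_state=0)
grid = GridSearchCV(base, {'C': [1, 10, 100]}, cv=split)  # refit=True is the default
grid.fit(X_fit, y_fit)

# Retrain by hand with the selected C on the same 1300 items
manual = svm.LinearSVC(C=grid.best_params_['C'], max_iter=5000, random_state=0)
manual.fit(X_fit, y_fit)

same = np.array_equal(grid.predict(X_hold), manual.predict(X_hold))
print("best C:", grid.best_params_['C'], "| identical predictions:", same)
```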

## Kernel SVM

A kernel SVM typically has two hyperparameters that need to be set.  For example, for a Gaussian (or RBF) kernel we have `gamma` (Greek $\gamma$) in addition to `C`. Let's first try some initial guesses for their values.

In [None]:
%%time

clf_ksvm = svm.SVC(decision_function_shape='ovr', kernel='rbf', C=1.0, gamma=1e-6)
print(clf_ksvm.fit(X_train[:10000,:], y_train[:10000]))

pred_ksvm = clf_ksvm.predict(X_test)
print('Predicted', len(pred_ksvm), 'digits with accuracy:', accuracy_score(y_test, pred_ksvm))
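
The RBF kernel measures the similarity of two inputs as $K(\mathbf{x}, \mathbf{x}') = \exp(-\gamma \lVert \mathbf{x} - \mathbf{x}' \rVert^2)$, so `gamma` sets how quickly similarity decays with distance: a large `gamma` makes the kernel very local, a small one very smooth. If the pixels are stored as raw intensities from 0 to 255 (an assumption, since `get_mnist` is not shown here), squared distances between 784-pixel images are in the millions, which would explain why useful values of `gamma` are so small. The sketch below uses two random vectors as hypothetical images and compares a manual computation with scikit-learn's `rbf_kernel`:

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
# Two hypothetical 784-pixel "images" with raw intensities in 0..255 (illustration only)
a = rng.integers(0, 256, size=(1, 784)).astype(float)
b = rng.integers(0, 256, size=(1, 784)).astype(float)

sq_dist = np.sum((a - b) ** 2)  # squared Euclidean distance ||a - b||^2
print(f"squared distance: {sq_dist:.3g}")

kernel_values = []
for gamma in [1e-8, 1e-7, 1e-6, 1e-5]:
    manual = np.exp(-gamma * sq_dist)              # K(a, b) = exp(-gamma * ||a - b||^2)
    library = rbf_kernel(a, b, gamma=gamma)[0, 0]  # scikit-learn's implementation
    kernel_values.append((gamma, manual, library))
    print(f"gamma={gamma:g}: K={manual:.3g} (sklearn: {library:.3g})")
```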

Now we can try grid search again, this time with two parameters.  We use an even smaller subset of the training set, as otherwise the search would be too slow.

In [None]:
%%time

param_grid = {'C': [1, 10, 100],
              'gamma': [1e-8, 5e-8, 1e-7, 5e-7, 1e-6]}

train_items = 3000
valid_items = 500
tot_items = train_items + valid_items

valid_split = PredefinedSplit(train_items*[-1] + valid_items*[0])

clf_ksvm_grid = GridSearchCV(clf_ksvm, param_grid, cv=valid_split, verbose=2)
print(clf_ksvm_grid.fit(X_train[:tot_items,:], y_train[:tot_items]))
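
With two parameters, the grid is the Cartesian product of the value lists, so this search fits 3 × 5 = 15 models (one per candidate, as `PredefinedSplit` defines a single split) plus a final refit of the best one. scikit-learn's `ParameterGrid` enumerates these candidates:

```python
from sklearn.model_selection import ParameterGrid

# Same values as in the kernel SVM search above
example_grid = {'C': [1, 10, 100],
                'gamma': [1e-8, 5e-8, 1e-7, 5e-7, 1e-6]}

candidates = list(ParameterGrid(example_grid))  # one dict per (C, gamma) combination
print(len(candidates), "candidate settings")
print("first:", candidates[0], "last:", candidates[-1])
```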


Again, let's see what parameters were selected.

In [None]:
print(clf_ksvm_grid.best_params_)

best_C = clf_ksvm_grid.best_params_['C']
best_gamma = clf_ksvm_grid.best_params_['gamma']

Since we did the grid search on a small subset of the training set, it probably makes sense to retrain the model with the selected parameters using a larger part of the training data.

In [None]:
clf_ksvm2 = svm.SVC(decision_function_shape='ovr', kernel='rbf', C=best_C, gamma=best_gamma)
print(clf_ksvm2.fit(X_train[:10000,:], y_train[:10000]))

pred_ksvm2 = clf_ksvm2.predict(X_test)
print('Predicted', len(pred_ksvm2), 'digits with accuracy:', accuracy_score(y_test, pred_ksvm2))
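
Instead of rebuilding the model by hand from `best_C` and `best_gamma`, one can also `clone` the best estimator found by the search, which copies its hyperparameters but not its fitted state, and train it on more data. A minimal sketch, again assuming the digits set as stand-in data; the subset sizes and the `gamma` grid (suited to features scaled to [0, 1]) are illustrative choices, not values from this notebook:

```python
from sklearn import svm
from sklearn.base import clone
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, PredefinedSplit

# Stand-in data (assumption): scikit-learn's digits set, scaled to [0, 1]
X, y = load_digits(return_X_y=True)
X = X / 16.0

# Search on a small subset: 300 training + 100 validation items
split = PredefinedSplit(300 * [-1] + 100 * [0])
grid = GridSearchCV(svm.SVC(kernel='rbf'),
                    {'C': [1, 10, 100], 'gamma': [0.01, 0.1, 1.0]},
                    cv=split)
grid.fit(X[:400], y[:400])

# clone() keeps the selected C and gamma but discards the fitted state
bigger = clone(grid.best_estimator_)
bigger.fit(X[:1500], y[:1500])
holdout_acc = bigger.score(X[1500:], y[1500:])  # accuracy on the items not used so far
print("selected:", grid.best_params_)
print("accuracy on remaining items:", holdout_acc)
```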