## SVMs for MNIST

In this notebook, we use support vector machines (SVMs) for multiclass classification on the MNIST dataset of handwritten digits.

In [None]:
%matplotlib inline
import numpy as np
import string
import random
from sklearn import svm

In [None]:
## Load the training set
train_data = np.load('MNIST/full_train_data.npy')
train_labels = np.load('MNIST/full_train_labels.npy')

## Load the testing set
test_data = np.load('MNIST/full_test_data.npy')
test_labels = np.load('MNIST/full_test_labels.npy')

## Print out their dimensions
print("Training dataset dimensions: ", np.shape(train_data))
print("Number of training labels: ", len(train_labels))
print("Testing dataset dimensions: ", np.shape(test_data))
print("Number of testing labels: ", len(test_labels))

## Multiclass Support Vector Machines

In the multiclass setting of SVMs, we are given a set of examples $(x_1, y_1), \ldots, (x_n, y_n)$. This time, however, each label $y_i$ can take values in the set $\{ 1, 2, \ldots, k \}$. Our goal is to find a set of weight vectors $w_1, \ldots, w_k \in \mathbb{R}^d$ that solves the following optimization problem:

$$ \min_{w_1, \ldots, w_k \in \mathbb{R}^d,\ \xi \in \mathbb{R}^n} \sum_{j=1}^k \| w_j \|^2 + c \sum_{i=1}^n \xi_i $$
$$ \text{such that for each } x_i \text{ with label } y_i = s \text{ we have } \langle w_s, x_i \rangle - \langle w_t, x_i \rangle \geq 1 - \xi_i \text{ for all } t \neq s, \text{ and } \xi_i \geq 0 $$

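Scikit-learn provides functionality for training linear multiclass SVMs through `LinearSVC`, whose parameter `C` plays the role of $c$ above. Note that by default `LinearSVC` trains one binary classifier per class (one-vs-rest) rather than solving the joint problem; passing `multi_class='crammer_singer'` selects a joint formulation of this kind.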
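
To make the objective concrete, here is a minimal NumPy sketch that evaluates it for a given set of weight vectors. It assumes each slack $\xi_i$ takes the smallest nonnegative value that satisfies the constraints. The function name `multiclass_svm_objective` and the toy inputs are hypothetical, and labels are 0-indexed here for convenience, unlike the $\{1, \ldots, k\}$ convention in the text.

```python
import numpy as np

def multiclass_svm_objective(W, X, y, c):
    """Evaluate sum_j ||w_j||^2 + c * sum_i xi_i (illustrative helper).

    W: (k, d) array, one weight vector per class.
    X: (n, d) array of examples.
    y: (n,) integer labels in {0, ..., k-1} (0-indexed, an assumption).
    """
    n = X.shape[0]
    scores = X @ W.T                          # scores[i, t] = <w_t, x_i>
    correct = scores[np.arange(n), y]         # <w_s, x_i> for the true label s
    # Constraint violation 1 - (<w_s, x_i> - <w_t, x_i>) for every class t.
    violations = 1.0 - (correct[:, None] - scores)
    # Ignore t = s; setting that entry to 0 also enforces xi_i >= 0.
    violations[np.arange(n), y] = 0.0
    xi = violations.max(axis=1)
    return np.sum(W ** 2) + c * np.sum(xi), xi

# Toy inputs (made up for illustration): two classes, two examples.
W = np.array([[1.0, 0.0], [0.0, 1.0]])
X = np.array([[2.0, 0.0], [0.5, 0.4]])
y = np.array([0, 0])
objective, xi = multiclass_svm_objective(W, X, y, c=1.0)
print("slacks:", xi, "objective:", objective)
```

The first example is classified with a margin of at least 1 and incurs no slack; the second is classified correctly but with a margin of only 0.1, so it needs a positive slack.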

In [None]:
## Fit Linear SVM over several values of c
c_vals = [0.01, 0.1, 1.0, 10.0, 100.0]
for c in c_vals:
    clf = svm.LinearSVC(C=c, loss='hinge')
    clf.fit(train_data,train_labels)

    ## Compute training and test error rates
    train_err = 1.0-clf.score(train_data, train_labels)
    test_err = 1.0-clf.score(test_data, test_labels)
    
    print('C=%0.2f: train error %0.3f test error %0.3f' % (c, train_err, test_err))

Unfortunately, the test error is quite high. The error is also high on the _training set_, which suggests that the problem is not overfitting: a linear classifier is likely not expressive enough for this task.
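
The same effect can be seen on a small synthetic problem. The sketch below is an illustration, not part of the MNIST experiment: it draws random 2-D points labeled by the sign of $x_1 x_2$ (an XOR-like pattern chosen for this example), fits a linear SVM, and compares its training error with that of a degree 2 polynomial kernel. All data and variable names are assumptions made for the illustration.

```python
import numpy as np
from sklearn import svm

# Synthetic XOR-like data (not MNIST): label is 1 when x1 and x2 share a sign.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)

# A linear SVM cannot separate this pattern, so its training error stays high.
linear_clf = svm.LinearSVC(C=1.0, loss='hinge', dual=True, max_iter=10000)
linear_clf.fit(X, y)
linear_train_err = 1.0 - linear_clf.score(X, y)

# A quadratic kernel can represent the product x1 * x2 and fits far better.
quad_clf = svm.SVC(C=1.0, kernel='poly', degree=2)
quad_clf.fit(X, y)
quad_train_err = 1.0 - quad_clf.score(X, y)

print('Linear training error    %0.3f' % linear_train_err)
print('Quadratic training error %0.3f' % quad_train_err)
```

When even the training error is high, adding more data or tuning `C` is unlikely to help much; a richer hypothesis class is the more promising direction.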

## Kernels



In [None]:
## Use a quadratic (degree 2 polynomial) kernel.
clf = svm.SVC(C=1.0, kernel='poly', degree=2)
clf.fit(train_data, train_labels)

## Compute errors
train_err = 1.0-clf.score(train_data, train_labels)
test_err = 1.0-clf.score(test_data, test_labels)
print('Training error %0.3f test error %0.3f' % (train_err, test_err))
print('Number of support vectors: %d' % (len(clf.support_)))
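
For reference, scikit-learn documents the `'poly'` kernel as $K(x, x') = (\gamma \langle x, x' \rangle + r)^{p}$, where $p$ is `degree` and $r$ is `coef0` (which defaults to 0 in `SVC`). The following sketch checks that formula against `sklearn.metrics.pairwise.polynomial_kernel` on random data. The value `gamma = 0.5` is an arbitrary choice for illustration, whereas the cell above relies on `SVC`'s default `gamma` setting.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

# Random toy inputs (assumed for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))

# Illustrative kernel parameters; gamma is an arbitrary choice here.
gamma, coef0, degree = 0.5, 0.0, 2

# Quadratic kernel computed directly from its definition.
K_manual = (gamma * (X @ X.T) + coef0) ** degree

# The same kernel matrix from scikit-learn's pairwise helper.
K_sklearn = polynomial_kernel(X, X, degree=degree, gamma=gamma, coef0=coef0)

print('Kernel matrices agree:', np.allclose(K_manual, K_sklearn))
```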