## SVM Problem

This exercise explores optimizing an SVM on a classification problem: the MNIST digits.

In [28]:
# Python ≥3.5 is required
import sys
assert sys.version_info >= (3, 5)

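# Scikit-Learn ≥0.20 is required
import sklearn
# Compare numeric (major, minor) parts; a plain string comparison misorders versions such as "0.9" and "0.20"
assert tuple(int(p) for p in sklearn.__version__.split(".")[:2]) >= (0, 20)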

# Common imports
import numpy as np
import os

In [29]:
%%html
<style type='text/css'>.CodeMirror{
font-size: 18px;
}
</style>

In [30]:
from sklearn.svm import LinearSVC
from sklearn.svm import SVC

### Fetch the data
The following code gets all of the data.  Set the train portion to the first 20000 and the test portion to the next 10000.

In [31]:
from sklearn.datasets import fetch_openml
mnist = fetch_openml('mnist_784', version=1, cache=True)

X = mnist["data"]
y = mnist["target"].astype(np.uint8)

X_train = X[:20000]
y_train = y[:20000]
X_test = X[20000:30000]
y_test = y[20000:30000]



### A first attempt

Use LinearSVC (Linear Support Vector Classification) to fit the data. It is similar to SVC(kernel="linear") but uses a different solver library (liblinear rather than libsvm) and tends to scale better to larger data sets.

In [32]:
lin_clf = LinearSVC(random_state=1)
lin_clf.fit(X_train, y_train)



Measure the accuracy of a prediction on your training data. You're not done tweaking the model, so don't evaluate on your testing data yet.

In [33]:
from sklearn.metrics import accuracy_score

y_pred = lin_clf.predict(X_train)
accuracy_score(y_train, y_pred)


0.9125

### Scale your data using StandardScaler.

Be sure to scale both the train and test data, fitting the scaler on the training data only. Then rerun and evaluate the same linear model as above. Keep random_state the same!

In [34]:
from sklearn.preprocessing import StandardScaler
# Fit the scaler on the training data only, then apply the same transform to the test data
scaler = StandardScaler()

X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
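
To make the scaling step concrete, here is a minimal pure-Python sketch of what standardization does to a single feature column. The helper names `fit_standardizer` and `standardize` and the sample values are hypothetical, chosen only for illustration. The sketch assumes the population standard deviation and a fallback scale of 1 for constant columns, which mirrors how StandardScaler behaves.

```python
from statistics import mean, pstdev

def fit_standardizer(column):
    """Learn the mean and population standard deviation of one feature column."""
    mu = mean(column)
    sigma = pstdev(column)
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    return mu, (sigma if sigma > 0 else 1.0)

def standardize(column, mu, sigma):
    """Apply z = (x - mu) / sigma using statistics learned elsewhere."""
    return [(value - mu) / sigma for value in column]

# Hypothetical pixel intensities for one feature
train_col = [0.0, 50.0, 100.0, 250.0]
test_col = [25.0, 300.0]

# Statistics come from the training column only...
mu, sigma = fit_standardizer(train_col)
train_scaled = standardize(train_col, mu, sigma)
# ...and are reused unchanged on the test column.
test_scaled = standardize(test_col, mu, sigma)

print(train_scaled)
print(test_scaled)
```

Reusing the training statistics on the test column keeps information about the test set out of the preprocessing step.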

In [35]:
lin_clf = LinearSVC(random_state=1)
lin_clf.fit(X_train_scaled, y_train)



In [36]:
y_pred = lin_clf.predict(X_train_scaled)
accuracy_score(y_train, y_pred)

0.945

### Did scaling matter much?

### Try fitting a non-linear SVM

Using only the first 1000 items in your scaled training data, try a generic SVM (Scikit-Learn's SVC). You can use all the defaults here. Check your accuracy using *all* of the scaled training data.

In [37]:
svm_clf = SVC(gamma="scale")
svm_clf.fit(X_train_scaled[:1000], y_train[:1000])

In [38]:
y_pred = svm_clf.predict(X_train_scaled)
accuracy_score(y_train, y_pred)

0.8673

### Results?

How well did you do?

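You probably want to do better. The right approach is to search the parameter space of
* C, the regularization parameter. Regularization strength is inversely proportional to C, so larger values penalize misclassified training points more heavily (a harder margin).
* gamma, the kernel coefficient for the rbf, polynomial, and sigmoid kernels (the linear kernel ignores it)
* the kernel: rbf, polynomial, linear, sigmoid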
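
To illustrate how gamma shapes the rbf kernel, here is a minimal sketch of the kernel formula k(a, b) = exp(-gamma * ||a - b||^2). The function name `rbf_kernel` and the two sample points are hypothetical, and the gamma values are picked only to show the trend.

```python
import math

def rbf_kernel(a, b, gamma):
    """RBF kernel value for two equal-length vectors: exp(-gamma * ||a - b||^2)."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)

a = [0.0, 0.0]
b = [3.0, 4.0]  # squared distance to a is 25

# Larger gamma makes similarity fall off faster with distance,
# so each training point influences a smaller neighborhood.
for gamma in (0.0005, 0.05, 5.0):
    print(f"gamma={gamma}: k(a, b)={rbf_kernel(a, b, gamma):.6f}")
```

With a small gamma the two points still look similar to the kernel; with a large gamma they look almost unrelated, which tends toward a more flexible decision boundary that can overfit.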

To set up a grid search, examine the GridSearchCV documentation. Start with a grid of 3-4 widely spaced values per parameter. When you find the best value, refine the grid around it and search again.

Continue training on only 1000 data points, just to speed things up.
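
One way to sketch the coarse-to-fine idea is a small helper that builds a narrower, log-spaced grid around the best value from the previous round. The helper `refine_grid`, its `span` and `num` parameters, and the example values are all hypothetical; they are not part of Scikit-Learn.

```python
import math

def refine_grid(best, span=10.0, num=3):
    """Return `num` log-spaced values from best/span to best*span (hypothetical helper)."""
    if num < 2:
        return [best]
    lo = math.log10(best / span)
    hi = math.log10(best * span)
    step = (hi - lo) / (num - 1)
    return [10 ** (lo + i * step) for i in range(num)]

# Round 1: widely spaced candidates for C (illustrative values)
coarse_C = [0.1, 1, 10, 100]

# Suppose the first search preferred C=10 (an assumption for this sketch).
# Round 2: five values between 10/3 and 10*3, centered on 10 in log space.
fine_C = refine_grid(10, span=3.0, num=5)
print([round(c, 3) for c in fine_C])
```

The refined list can then be passed as the `'C'` entry of a new `param_grid`, and the same approach applies to gamma.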

In [70]:
from sklearn.model_selection import GridSearchCV

# 3 values of C x 3 values of gamma x 4 kernels = 36 candidates
param_grid = [{
    'C': [4, 5, 6],
    'gamma': [0.00049, 0.0005, 0.00051],
    'kernel': ['rbf', 'poly', 'linear', 'sigmoid'],
}]
grid_cv = GridSearchCV(estimator=svm_clf, param_grid=param_grid, verbose=2)
grid_cv.fit(X_train_scaled[:1000], y_train[:1000])

Fitting 5 folds for each of 36 candidates, totalling 180 fits
[CV] END .....................C=4, gamma=0.00049, kernel=rbf; total time=   0.2s
[CV] END .....................C=4, gamma=0.00049, kernel=rbf; total time=   0.2s
[CV] END .....................C=4, gamma=0.00049, kernel=rbf; total time=   0.2s
[CV] END .....................C=4, gamma=0.00049, kernel=rbf; total time=   0.2s
[CV] END .....................C=4, gamma=0.00049, kernel=rbf; total time=   0.2s
[CV] END ....................C=4, gamma=0.00049, kernel=poly; total time=   0.3s
[CV] END ....................C=4, gamma=0.00049, kernel=poly; total time=   0.3s
[CV] END ....................C=4, gamma=0.00049, kernel=poly; total time=   0.3s
[CV] END ....................C=4, gamma=0.00049, kernel=poly; total time=   0.3s
[CV] END ....................C=4, gamma=0.00049, kernel=poly; total time=   0.4s
[CV] END ..................C=4, gamma=0.00049, kernel=linear; total time=   0.2s
[CV] END ..................C=4, gamma=0.00049, 
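
The fit count reported at the top of the log can be checked with a little arithmetic. The sketch below counts candidates with `itertools.product` and also shows, as a suggestion rather than part of the original exercise, how splitting the grid into two dictionaries avoids repeating the linear kernel for every gamma value, since gamma has no effect on that kernel. The variable names are illustrative.

```python
from itertools import product

n_folds = 5  # number of folds reported in the log above

# The grid used above: every combination of C, gamma, and kernel
full_grid = {
    "C": [4, 5, 6],
    "gamma": [0.00049, 0.0005, 0.00051],
    "kernel": ["rbf", "poly", "linear", "sigmoid"],
}
n_candidates = len(list(product(*full_grid.values())))
print(n_candidates, n_candidates * n_folds)

# Alternative: a list of grids, so the linear kernel is tried once per C
split_grid = [
    {"C": [4, 5, 6],
     "gamma": [0.00049, 0.0005, 0.00051],
     "kernel": ["rbf", "poly", "sigmoid"]},
    {"C": [4, 5, 6], "kernel": ["linear"]},
]
n_split = sum(len(list(product(*grid.values()))) for grid in split_grid)
print(n_split, n_split * n_folds)
```

GridSearchCV accepts a list of dictionaries like `split_grid` as its `param_grid`, searching each dictionary's combinations in turn.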

### Results of grid search

You can examine the best parameter values and the best cross-validation score.

In [68]:
grid_cv.best_estimator_

In [69]:
grid_cv.best_score_

0.8779999999999999

### Final results

Using the best estimator from your series of grid searches, train on all the scaled data.

Now test your final model on the testing data.

In [65]:
grid_cv.best_estimator_.fit(X_train_scaled, y_train)

In [66]:
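# Note: this cell scores the training set. For the held-out score,
# predict on X_test_scaled and compare against y_test instead.
y_pred = grid_cv.best_estimator_.predict(X_train_scaled)
accuracy_score(y_train, y_pred)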

0.94455
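
As a sketch of the final step, the example below reports training and held-out accuracy side by side. To stay small and runnable offline it makes several assumptions: it uses Scikit-Learn's bundled 8x8 digits dataset as a stand-in for MNIST, arbitrary split sizes, and illustrative hyperparameters rather than the ones found by the grid search above. All variable names are hypothetical.

```python
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data (assumption): the small digits set bundled with Scikit-Learn
X_demo, y_demo = load_digits(return_X_y=True)
X_tr, y_tr = X_demo[:1000], y_demo[:1000]
X_te, y_te = X_demo[1000:1500], y_demo[1000:1500]

# Fit the scaler on the training split only, then reuse it on the test split
scaler_demo = StandardScaler()
X_tr_s = scaler_demo.fit_transform(X_tr)
X_te_s = scaler_demo.transform(X_te)

# Illustrative hyperparameters, not the result of a search
final_clf = SVC(kernel="rbf", C=5, gamma="scale")
final_clf.fit(X_tr_s, y_tr)

# Score both splits; the held-out score is the one to report
train_acc = accuracy_score(y_tr, final_clf.predict(X_tr_s))
test_acc = accuracy_score(y_te, final_clf.predict(X_te_s))
print(f"train accuracy: {train_acc:.4f}")
print(f"test accuracy:  {test_acc:.4f}")
```

In the notebook itself, the equivalent calls would use `grid_cv.best_estimator_` with `X_test_scaled` and `y_test`. A large gap between the two scores would suggest overfitting to the training data.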