## Problem 5.9 - SVM Applied to MNIST

**The Problem**: 

Train an SVM classifier on the MNIST dataset. Since SVM classifiers are binary classifiers, you will need to use one-versus-the-rest to classify all 10 digits. You may want to tune the hyperparameters using small validation sets to speed up the process. What accuracy can you reach?


### Importing the MNIST Dataset

We'll reuse the import approach from Ch. 3 - Classification: try OpenML first and fall back to a GitHub mirror if that fails.

In [None]:
from sklearn.datasets import fetch_openml

# Try importing the MNIST dataset from OpenML; if that fails, use a GitHub mirror
try:
    mnist_X, mnist_y = fetch_openml('mnist_784', version=1, return_X_y=True)
except Exception as ex:
    import os
    import urllib.request
    from scipy.io import loadmat

    mnist_path = os.path.join(".", "datasets", "mnist-original.mat")
    # Make sure the target directory exists before writing to it
    os.makedirs(os.path.dirname(mnist_path), exist_ok=True)

    # Download the dataset from GitHub
    mnist_alternative_url = "https://github.com/amplab/datascience-sp14/raw/master/lab7/mldata/mnist-original.mat"
    response = urllib.request.urlopen(mnist_alternative_url)
    with open(mnist_path, "wb") as f:
        content = response.read()
        f.write(content)

    mnist_raw = loadmat(mnist_path)
    mnist = {
        "data": mnist_raw["data"].T,
        "target": mnist_raw["label"][0],
        "COL_NAMES": ["label", "data"],
        "DESCR": "mldata.org dataset: mnist-original",
    }
    # Expose the same variable names as the OpenML branch
    mnist_X, mnist_y = mnist["data"], mnist["target"]
    print("Done!")

In [None]:
import numpy as np

# Check out the dimensions of the dataset
print("Shape of mnist_X: ", np.shape(mnist_X))
print("Shape of mnist_y: ", np.shape(mnist_y))

So, it is safe to say that our _mnist_X_ variable contains our instances (the rows), each with 784 pixel values (the columns). The _mnist_y_ variable contains the label of each respective instance.
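
The 784 columns are a flattened 28 x 28 grayscale image. As a small illustration, the sketch below reshapes one row back into an image grid. It uses a random stand-in vector (the name `flat_digit` is hypothetical) rather than a real MNIST row, so it runs without downloading anything.

```python
import numpy as np

# Stand-in for one row of mnist_X: 784 pixel intensities in [0, 255].
# (Assumption: a real row would be used here instead of random values.)
rng = np.random.default_rng(0)
flat_digit = rng.integers(0, 256, size=784)

# Rows are stored flattened; reshaping restores the 28 x 28 pixel grid.
digit_image = flat_digit.reshape(28, 28)
print(digit_image.shape)
```

With a real row, `plt.imshow(digit_image, cmap="binary")` would display the digit.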

### Splitting Testing and Training Data

We have to split our testing and training data. We can do this with a scikit-learn method. Or, since MNIST is already ordered into a training set followed by a test set, we can just take the first 60,000 rows for training and the remaining 10,000 rows for testing.

In [None]:
# Split up the test and training data
mnist_X_train, mnist_X_test = mnist_X[:60000], mnist_X[60000:]
mnist_y_train, mnist_y_test = mnist_y[:60000], mnist_y[60000:]

# Make sure they are shaped correctly
print("Training set images: ", np.shape(mnist_X_train))
print("Training set categories: ", np.shape(mnist_y_train))
print("\n")
print("Test set images: ", np.shape(mnist_X_test))
print("Test set categories: ", np.shape(mnist_y_test))

This gives 60,000 training instances and 10,000 test instances, which matches the standard MNIST split, so we'll use it as is.
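
The problem statement suggests tuning on small validation sets to save time. One possible way to draw such a subset is a stratified sample, sketched below on random stand-in data. The names `X_demo`, `y_demo`, `X_small` and `y_small` are hypothetical, and the 20% fraction is an arbitrary choice; with the real data you would pass `mnist_X_train` and `mnist_y_train` instead.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data with the same column layout as the MNIST training set.
rng = np.random.default_rng(42)
X_demo = rng.random((1000, 784))
y_demo = np.repeat(np.arange(10), 100).astype(str)  # 100 rows per class

# Keep 20% of the rows; stratify=y_demo preserves the class proportions.
X_small, _, y_small, _ = train_test_split(
    X_demo, y_demo, train_size=0.2, stratify=y_demo, random_state=42
)

# Inspect how many instances of each class ended up in the subset.
labels, counts = np.unique(y_small, return_counts=True)
print(X_small.shape)
print(dict(zip(labels, counts)))
```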

### Train the SVM Classifier



In [None]:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Create the pipeline for the linear SVM classifier
svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("linear_svc", LinearSVC(C=1, loss="hinge", max_iter=5000))
])


In [None]:
from sklearn.multiclass import OneVsRestClassifier

In [None]:
# Set up the one v. rest classifier
ovr_clf = OneVsRestClassifier(svm_clf)

# Train it
ovr_clf.fit(mnist_X_train, mnist_y_train)
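
A side note on the wrapper: `LinearSVC` already trains one-versus-the-rest classifiers when given more than two classes, so `OneVsRestClassifier` makes the strategy explicit rather than adding it. The sketch below illustrates this on scikit-learn's small bundled digits dataset (8 x 8 images), which stands in for MNIST here. The settings `dual=False` and the default squared hinge loss are choices made for this illustration, not the notebook's settings.

```python
from sklearn.datasets import load_digits
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Small stand-in dataset: 1797 images of 8 x 8 pixels, 10 classes.
X_digits, y_digits = load_digits(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X_digits)

# LinearSVC on its own: one weight vector per class (one-vs-rest).
native = LinearSVC(C=1, dual=False, max_iter=5000).fit(X_scaled, y_digits)

# The explicit wrapper: one fitted binary LinearSVC per class.
wrapped = OneVsRestClassifier(
    LinearSVC(C=1, dual=False, max_iter=5000)
).fit(X_scaled, y_digits)

print(native.coef_.shape)        # rows = classes, columns = features
print(len(wrapped.estimators_))  # number of binary classifiers
```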

### Evaluate Performance with Confusion Matrix

In [None]:
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix

import matplotlib.pyplot as plt

In [None]:
# Implement cross-validation
mnist_y_train_pred = cross_val_predict(ovr_clf, mnist_X_train, mnist_y_train, cv=3)

# Plot the confusion matrix
conf_mx = confusion_matrix(mnist_y_train, mnist_y_train_pred)

plt.matshow(conf_mx, cmap=plt.cm.gray)
plt.show()
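
The raw confusion matrix is dominated by the diagonal (correct predictions), which makes the errors hard to see. A common follow-up, used in Ch. 3, is to divide each row by its total and zero out the diagonal. The sketch below shows the idea on a small made-up 3 x 3 matrix (`conf_demo` is hypothetical data, not a result from this notebook).

```python
import numpy as np

# Made-up confusion matrix: rows are true classes, columns are predictions.
conf_demo = np.array([
    [50, 2, 3],
    [4, 40, 6],
    [1, 9, 35],
], dtype=float)

# Divide each row by the number of instances in that true class,
# so entries become error rates instead of absolute counts.
row_sums = conf_demo.sum(axis=1, keepdims=True)
norm_conf = conf_demo / row_sums

# Zero the diagonal so only the misclassifications remain.
np.fill_diagonal(norm_conf, 0)
print(norm_conf.round(3))
```

Passing `norm_conf` to `plt.matshow` would then highlight which pairs of classes are confused most often.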

### Fixing the ConvergenceWarning

Here we hit the first warning, and I do not know whether what I am about to try will help. We'll take the warning's advice and increase the number of iterations.

In [None]:
# Create the pipeline for the linear SVM classifier
svm_clf_inc_iter = Pipeline([
    ("scaler", StandardScaler()),
    ("linear_svc", LinearSVC(C=1, loss="hinge", max_iter=10000))
])

# Set up the one v. rest classifier
ovr_clf_inc_iter = OneVsRestClassifier(svm_clf_inc_iter)

# Train it
ovr_clf_inc_iter.fit(mnist_X_train, mnist_y_train)
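
One way to check whether a given `max_iter` is enough is to count the `ConvergenceWarning`s raised during fitting. The following is a rough sketch on scikit-learn's small bundled digits dataset, which stands in for MNIST. The helper `count_convergence_warnings` is a hypothetical name, and `dual=True` is set explicitly because the hinge loss requires the dual formulation.

```python
import warnings

from sklearn.datasets import load_digits
from sklearn.exceptions import ConvergenceWarning
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Small stand-in dataset so the check runs in seconds.
X_digits, y_digits = load_digits(return_X_y=True)


def count_convergence_warnings(max_iter):
    """Fit the linear SVM pipeline and count ConvergenceWarnings raised."""
    clf = Pipeline([
        ("scaler", StandardScaler()),
        ("linear_svc", LinearSVC(C=1, loss="hinge", dual=True,
                                 max_iter=max_iter)),
    ])
    # Record every warning instead of printing it.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        clf.fit(X_digits, y_digits)
    return sum(issubclass(w.category, ConvergenceWarning) for w in caught)


for max_iter in (10, 500):
    print(max_iter, count_convergence_warnings(max_iter))
```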

Simply raising `max_iter` is clearly not the fix we are looking for. Instead, we'll try the **polynomial kernel trick** next and see if that helps.
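
For reference, the polynomial kernel computes $K(a, b) = (\gamma \, a^\top b + r)^d$, where $d$ corresponds to `degree` and $r$ to `coef0` in scikit-learn. The sketch below checks this formula against scikit-learn's `polynomial_kernel` helper on random stand-in vectors. The arrays `A` and `B` and the chosen `gamma` are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

# Random stand-in data: 4 and 3 instances with 5 features each.
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 5))
B = rng.normal(size=(3, 5))

degree, coef0 = 3, 1
gamma = 1.0 / A.shape[1]  # same value that gamma='auto' would use

# Kernel computed by hand: (gamma * <a, b> + coef0) ** degree
manual = (gamma * A @ B.T + coef0) ** degree

# Same kernel computed by scikit-learn.
library = polynomial_kernel(A, B, degree=degree, gamma=gamma, coef0=coef0)

print(np.allclose(manual, library))
```

The kernel only ever needs dot products between instances, which is why the SVM never has to build the polynomial features explicitly.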

In [None]:
from sklearn.svm import SVC

In [None]:
# Attempting to fix the convergence warning with the polynomial kernel trick
poly_kernel_svm_clf = Pipeline([
        ("scaler", StandardScaler()),
        ("poly_svm_clf", SVC(kernel="poly", degree=3, coef0=1, C=5))
    ])

# Set up the one v. rest classifier
ovr_clf_poly_svm = OneVsRestClassifier(poly_kernel_svm_clf)

# Implement cross-validation
mnist_y_train_pred = cross_val_predict(ovr_clf_poly_svm, mnist_X_train, mnist_y_train, cv=3)

# Plot the confusion matrix
conf_mx = confusion_matrix(mnist_y_train, mnist_y_train_pred)

print("confusion matrix: \n ", conf_mx)

plt.matshow(conf_mx, cmap=plt.cm.gray)
plt.show()

The previous cell warns about the default value of `gamma`. We'll avoid this warning by setting `gamma` explicitly and rerun the training.

In [None]:
# Same polynomial kernel pipeline as before, with gamma set explicitly
poly_kernel_svm_clf = Pipeline([
        ("scaler", StandardScaler()),
        ("poly_svm_clf", SVC(kernel="poly", degree=3, coef0=1, C=5, gamma='auto'))
    ])

# Set up the one v. rest classifier
ovr_clf_poly_svm = OneVsRestClassifier(poly_kernel_svm_clf)

# Implement cross-validation
mnist_y_train_pred = cross_val_predict(ovr_clf_poly_svm, mnist_X_train, mnist_y_train, cv=3)

# Plot the confusion matrix
conf_mx = confusion_matrix(mnist_y_train, mnist_y_train_pred)

print("confusion matrix: \n ", conf_mx)

plt.matshow(conf_mx, cmap=plt.cm.gray)
plt.show()
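
The cell above passes `gamma='auto'`. According to the scikit-learn documentation, `'auto'` means `1 / n_features`, while `'scale'` means `1 / (n_features * X.var())`. The sketch below computes both values on random stand-in data (`X_demo` is hypothetical) to show how they differ.

```python
import numpy as np

# Stand-in for scaled training data: 200 instances, 784 features.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 784))

n_features = X_demo.shape[1]

# gamma='auto': depends only on the number of features.
gamma_auto = 1.0 / n_features

# gamma='scale': also accounts for the overall variance of the data.
gamma_scale = 1.0 / (n_features * X_demo.var())

print(gamma_auto, gamma_scale)
```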

### Implementing a Grid Search for the Polynomial Kernel SVM



In [None]:
from sklearn.model_selection import GridSearchCV

In [None]:
# Polynomial kernel SVM pipeline to tune with the grid search
poly_kernel_svm_clf = Pipeline([
        ("scaler", StandardScaler()),
        ("poly_svm_clf", SVC(kernel="poly", degree=3, coef0=1, C=5, gamma='auto'))
    ])

In [None]:
# Set up the one v. rest classifier
ovr_clf_poly_svm = OneVsRestClassifier(poly_kernel_svm_clf)

To see which parameters we can vary across our grid search, we can list the keys returned by `get_params()`.

In [None]:
# Print out keys we can vary across the GridSearch
ovr_clf_poly_svm.get_params().keys()
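
The long key names follow from the nesting: `estimator` is the pipeline inside `OneVsRestClassifier`, `poly_svm_clf` is the step name inside the pipeline, and the last part is the `SVC` argument, all joined by double underscores. A minimal sketch that filters the keys down to the `SVC` arguments (the variable `svc_keys` is a hypothetical name):

```python
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Same nesting as in the notebook: wrapper -> pipeline -> SVC step.
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("poly_svm_clf", SVC(kernel="poly")),
])
ovr = OneVsRestClassifier(pipeline)

# Keep only the keys that address arguments of the SVC step.
prefix = "estimator__poly_svm_clf__"
svc_keys = sorted(k for k in ovr.get_params() if k.startswith(prefix))
print(svc_keys)
```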

In [None]:
# Choose which params to search across
param_grid = [
    {'estimator__poly_svm_clf__kernel': ["poly"],
     'estimator__poly_svm_clf__coef0': [1],
     'estimator__poly_svm_clf__C': [0.1, 1, 10],
     'estimator__poly_svm_clf__degree': [2, 5, 10]}
]

# Set up the grid search over the wrapped Pipeline
grid_search = GridSearchCV(ovr_clf_poly_svm, param_grid,
                           cv=3,
                           scoring='accuracy',
                           return_train_score=True)

# Fit the grid search to the MNIST training data
grid_search.fit(mnist_X_train, mnist_y_train)

In [None]:
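# Print the best parameters and the best estimator
# (a notebook cell only displays its last expression, so print both)
print(grid_search.best_params_)
print(grid_search.best_estimator_)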

In [None]:
cvres = grid_search.cv_results_
# The scoring is 'accuracy', so the mean test score is already the accuracy
for mean_score, params in zip(cvres["mean_test_score"], cvres["params"]):
    print(mean_score, params)

## Problem 5.10

**The Problem**: Train an SVM Regressor on the California housing dataset.