# Student: McLaughlin, Chris

# Problem 2. Support machines.

## Notes/README
- The notebook is heavy on memory usage and MUST be run in an environment with sufficient RAM. The assignment was done in Google Colab!

## Acknowledgements/Citations

- Lab 5 notebook as a resource on SVMs in scikit-learn
- https://stackoverflow.com/questions/17071871/how-do-i-select-rows-from-a-dataframe-based-on-column-values
- https://stackoverflow.com/questions/36921951/truth-value-of-a-series-is-ambiguous-use-a-empty-a-bool-a-item-a-any-o
- https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html
- https://stackoverflow.com/questions/68300381/is-scikit-learns-support-vector-classifier-hard-margin-or-soft-margin
- https://stackoverflow.com/questions/12355434/svm-with-hard-margin-and-c-value
- https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.drop.html
- https://scikit-learn.org/stable/modules/cross_validation.html
- https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_score.html
- https://scikit-learn.org/stable/modules/svm.html
- https://scikit-learn.org/stable/modules/generated/sklearn.multiclass.OneVsOneClassifier.html

## Setup

In [None]:
# This should be roughly the content of the first code cell
import numpy as np
import random
np.random.seed(1337)
random.seed(1337)

In [None]:
# Plotting support
from matplotlib import pyplot as plt
#from plotnine import *
# Standard libraries
import pandas as pd
import sklearn as sk
from keras.datasets import mnist
import tensorflow as tf
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.multiclass import OneVsOneClassifier

### Data Import and Processing

In [None]:
(train_X, train_y), (test_X, test_y) = mnist.load_data() # From assignment spec

#train_X.shape
#test_X.shape

# Flatten and normalise data while casting to float32 - same as in Problem 1, see that question's explanation and acknowledgements
train_X = train_X.reshape(60000,784) # Reshape the data into a 2D array - each 28*28 image array in the training data is converted to a 1D array of 784 values
train_X = train_X.astype(np.float32)
train_X = train_X/255

# Repeat for test data
test_X = test_X.reshape(10000,784)
test_X = test_X.astype(np.float32)
test_X = test_X/255

""" OTHER METHOD - THIS WAS SLOW AS MOLLASES SO I'M TRYING SOMETHING ELSE
# Select 1s and 7s
# For train
newtrain_X=np.ndarray(shape=(60000,784))
newtrain_y=np.ndarray(60000,)

for index in range(60000):
  print(index)
  if train_y[index]==1 or train_y[index]==7:
    np.append(newtrain_X,train_X[index])
    np.append(newtrain_y,train_y[index])

"""

# Merge X and y for filtering
newtrain=pd.DataFrame(train_X)
newtrain["RESULT"]=train_y
newtest=pd.DataFrame(test_X)
newtest["RESULT"]=test_y

# Filter data
datatrain=newtrain.loc[(newtrain["RESULT"]==1) | (newtrain["RESULT"]==7)]
datatest=newtest.loc[(newtest["RESULT"]==1) | (newtest["RESULT"]==7)]

#datatrain.head()
#datatest.head()

What we've done here is import the MNIST dataset and filter it down to the 1s and 7s, giving two pandas dataframes (train and test) that each contain 784 pixel columns and one RESULT column holding the digit label!
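For comparison, the same flatten, normalise and filter steps can be sketched with a NumPy boolean mask instead of a DataFrame. This is only an illustration on a tiny random stand-in for MNIST: the names `images`, `labels`, `flat`, `mask`, `X_17` and `y_17` are hypothetical and do not appear in the cells above.

```python
import numpy as np

# Hypothetical stand-in for MNIST: 6 tiny "images" of 28x28 uint8 pixels.
rng = np.random.default_rng(1337)
images = rng.integers(0, 256, size=(6, 28, 28), dtype=np.uint8)
labels = np.array([1, 7, 3, 1, 9, 7])

# Flatten each 28x28 image to 784 features and scale pixels to [0, 1].
flat = images.reshape(len(images), -1).astype(np.float32) / 255

# Boolean mask selecting only the digits of interest (1s and 7s).
mask = np.isin(labels, [1, 7])
X_17, y_17 = flat[mask], labels[mask]

print(X_17.shape, y_17)
```

Keeping the features and labels as separate arrays like this would also avoid calling `drop(columns=["RESULT"])` every time a model is fitted or scored.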

## a) Hard

We're going to train a hard-margin SVM classifier (approximated in scikit-learn by a linear kernel with an extremely high C hyperparameter) that determines whether a character is a 1 or a 7.

In [None]:
hardsvm = SVC(C=1e20, kernel="linear").fit(datatrain.drop(columns=["RESULT"]),datatrain["RESULT"])
score_train = hardsvm.score(datatrain.drop(columns=["RESULT"]),datatrain["RESULT"])
score_test = hardsvm.score(datatest.drop(columns=["RESULT"]),datatest["RESULT"])
print("Training Score:", score_train, "\nTesting Score:", score_test)

Training Score: 1.0 
Testing Score: 0.9916782246879334


And we regularly get over 99% testing accuracy (0.9917 in the run above)! That's pretty impressive!
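To see why a huge C behaves like a hard margin, here is a minimal sketch of the soft-margin objective, 0.5 * ||w||^2 + C * (sum of hinge losses), evaluated by hand. The 1-D toy data and the two candidate weights `w_ok` and `w_bad` are hypothetical and chosen only for illustration; nothing here is fitted by scikit-learn.

```python
def soft_margin_objective(w, b, X, y, C):
    """0.5 * ||w||^2 + C * sum of hinge losses, for labels y in {-1, +1}."""
    margin_term = 0.5 * sum(wi * wi for wi in w)
    hinge = sum(
        max(0.0, 1.0 - yi * (sum(wi * xi for wi, xi in zip(w, x)) + b))
        for x, yi in zip(X, y)
    )
    return margin_term + C * hinge

# Hypothetical 1-D toy data: negatives on the left, positives on the right.
X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [-1, -1, 1, 1]

w_ok = [1.0]   # every point has margin >= 1, so there is no hinge loss
w_bad = [0.5]  # the points at -1 and +1 fall inside the margin

for C in (0.1, 1e20):
    print(C,
          soft_margin_objective(w_ok, 0.0, X, y, C),
          soft_margin_objective(w_bad, 0.0, X, y, C))
```

With the small C the margin-violating `w_bad` has the lower objective, while with C=1e20 any violation is so expensive that only the violation-free `w_ok` is competitive, which is the hard-margin behaviour we're after.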

## b) Soft

We're going to repeat the above, but use a soft-margin SVM with varying hyperparameter C - we will select the best C by running several options and validating each with 5-fold cross-validation!

In [None]:
c_vals = [0.1,0.5,1,10,50]
for c in c_vals:
  softsvm = SVC(C=c, kernel="linear") # cross_val_score automatically takes care of fitting :D
  scores_crossval = cross_val_score(softsvm, datatrain.drop(columns=["RESULT"]), datatrain["RESULT"], cv=5) # see cross_val_score documentation
  print("C hyperparameter:", c, "\tCross-Validated Accuracy Scores:", scores_crossval, "\tAverage Score with this C:", sum(scores_crossval)/len(scores_crossval))

C hyperparameter: 0.1 	Cross-Validated Accuracy Scores: [0.99654112 0.99692544 0.99653979 0.99307958 0.99615532] 	Average Score with this C: 0.9958482532438154
C hyperparameter: 0.5 	Cross-Validated Accuracy Scores: [0.99654112 0.99654112 0.99538639 0.99384852 0.99500192] 	Average Score with this C: 0.9954638152830121
C hyperparameter: 1 	Cross-Validated Accuracy Scores: [0.99654112 0.99538816 0.99577086 0.99346405 0.99538639] 	Average Score with this C: 0.99531011693309
C hyperparameter: 10 	Cross-Validated Accuracy Scores: [0.99538816 0.99385088 0.99423299 0.99384852 0.99423299] 	Average Score with this C: 0.9943107082624463
C hyperparameter: 50 	Cross-Validated Accuracy Scores: [0.99538816 0.99385088 0.99423299 0.99384852 0.99423299] 	Average Score with this C: 0.9943107082624463


There's surprisingly little change with different values of C (the mean accuracies differ by only about 0.15 percentage points), but we can see that the lowest C value of 0.1 gives the highest mean cross-validated accuracy.
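As a quick sanity check on that comparison, the per-fold scores printed above can be averaged and ranked in a few lines. The numbers are copied from the output; the names `cv_scores`, `mean_scores`, `best_c` and `spread` are just illustrative.

```python
from statistics import mean

# Per-fold accuracies copied from the cross-validation output above.
cv_scores = {
    0.1: [0.99654112, 0.99692544, 0.99653979, 0.99307958, 0.99615532],
    0.5: [0.99654112, 0.99654112, 0.99538639, 0.99384852, 0.99500192],
    1:   [0.99654112, 0.99538816, 0.99577086, 0.99346405, 0.99538639],
    10:  [0.99538816, 0.99385088, 0.99423299, 0.99384852, 0.99423299],
    50:  [0.99538816, 0.99385088, 0.99423299, 0.99384852, 0.99423299],
}

# Mean accuracy per C, the best C, and the gap between best and worst mean.
mean_scores = {c: mean(scores) for c, scores in cv_scores.items()}
best_c = max(mean_scores, key=mean_scores.get)
spread = max(mean_scores.values()) - min(mean_scores.values())

print("Best C:", best_c)
print("Gap between best and worst mean accuracy:", round(spread, 4))
```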

## c) Kernel

Now we're going to try a few different kernels with our best performing SVM from above (C=0.1)!

### i) Polynomial Kernel


First we'll try a polynomial kernel with a few different degrees.

In [None]:
degrees = [2, 3, 4]
for deg in degrees:
  polysvm = SVC(C=0.1, kernel="poly", degree=deg).fit(datatrain.drop(columns=["RESULT"]),datatrain["RESULT"])
  score_train = polysvm.score(datatrain.drop(columns=["RESULT"]),datatrain["RESULT"])
  score_test = polysvm.score(datatest.drop(columns=["RESULT"]),datatest["RESULT"])
  print("Degree:", deg, "\tTraining Accuracy:", score_train, "\tTesting Accuracy:", score_test)

Degree: 2 	Training Accuracy: 0.9943107557469055 	Testing Accuracy: 0.9902912621359223
Degree: 3 	Training Accuracy: 0.9931575305604674 	Testing Accuracy: 0.9893666204345816
Degree: 4 	Training Accuracy: 0.9896978550011533 	Testing Accuracy: 0.9833564493758669


We can see that a second-degree polynomial kernel produces the best testing accuracy (0.9903), and that accuracy drops as the degree increases.
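For reference, scikit-learn documents its polynomial kernel as (gamma * <x, x'> + coef0) ** degree, and the cell above leaves `gamma` and `coef0` at their defaults. The sketch below computes that formula by hand; the helper `poly_kernel`, the two 3-feature vectors and the choice of gamma=1.0 are hypothetical and only there to make the numbers readable.

```python
def poly_kernel(x, z, degree, gamma=1.0, coef0=0.0):
    """Polynomial kernel (gamma * <x, z> + coef0) ** degree."""
    dot = sum(xi * zi for xi, zi in zip(x, z))
    return (gamma * dot + coef0) ** degree

# Hypothetical 3-feature vectors; gamma=1.0 is chosen only for readability.
x = [0.2, 0.5, 0.1]
z = [0.4, 0.1, 0.3]

for degree in (2, 3, 4):
    print(degree, poly_kernel(x, z, degree))
```

In this toy case the scaled dot product is below 1, so the kernel value shrinks as the degree goes up. Whether that plays a part in the accuracy drop we saw is a guess, not something the results above establish.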

### ii) Gaussian Kernel

Now we'll do a Gaussian kernel and try a few different gamma hyperparameters

Note: Be warned, this takes a hot minute to execute!

In [None]:
gammas = [0.1, 0.5, 1]
for gamma in gammas:
  gausvm = SVC(C=0.1, kernel="rbf", gamma=gamma).fit(datatrain.drop(columns=["RESULT"]),datatrain["RESULT"])
  score_train = gausvm.score(datatrain.drop(columns=["RESULT"]),datatrain["RESULT"])
  score_test = gausvm.score(datatest.drop(columns=["RESULT"]),datatest["RESULT"])
  print("Gamma:", gamma, "\tTraining Accuracy:", score_train, "\tTesting Accuracy:", score_test)

Gamma: 0.1 	Training Accuracy: 0.9830860305989083 	Testing Accuracy: 0.9852057327785483
Gamma: 0.5 	Training Accuracy: 0.7950334435304067 	Testing Accuracy: 0.7651410078594545
Gamma: 1 	Training Accuracy: 0.5183362804643653 	Testing Accuracy: 0.5247341655108645


And we see setting the gamma hyperparameter to 0.1 produces by *far* the best results!
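One plausible reading of those numbers comes from the Gaussian kernel itself, exp(-gamma * ||x - x'||^2). With 784 pixels, squared distances between images can be large, so a bigger gamma pushes the kernel value between two different images towards zero. The sketch below uses a hypothetical pair of images (`x` and `z`) and a hand-written helper `rbf_kernel`; it is an illustration, not a measurement on MNIST.

```python
import math

def rbf_kernel(x, z, gamma):
    """Gaussian (RBF) kernel exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)

# Hypothetical pair of 784-pixel images that differ by 0.25 in every pixel,
# giving a squared distance of 784 * 0.25**2 = 49.
x = [0.25] * 784
z = [0.5] * 784

for gamma in (0.1, 0.5, 1):
    print(gamma, rbf_kernel(x, z, gamma))
```

If the kernel between any two different images is effectively zero, each training point only "sees" itself, which would explain why the gamma=1 model does little better than always guessing one class.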

### iii) Linear Kernel

We already used a linear kernel earlier for our cross-validated soft-margin SVM, but we'll repeat the process here with a single fit on the 1s-and-7s training set and a score on the test set!

In [None]:
linsvm = SVC(C=0.1, kernel="linear").fit(datatrain.drop(columns=["RESULT"]),datatrain["RESULT"])
score_train = linsvm.score(datatrain.drop(columns=["RESULT"]),datatrain["RESULT"])
score_test = linsvm.score(datatest.drop(columns=["RESULT"]),datatest["RESULT"])
print("Training Accuracy:", score_train, "Testing Accuracy:", score_test)

Training Accuracy: 0.9977704313062197 Testing Accuracy: 0.9944521497919556


And we get a *very* accurate model!

## d) AVA

All-vs-all classification (also called one-vs-one) consists of training a binary classifier for each pair of classes in the data. Counting ordered pairs gives $ n(n-1) $ classifiers for $ n $ classes. At prediction time, each pairwise classifier votes for one of its two classes, and the class that receives the most votes is the one predicted by the overall all-vs-all aggregation.

Because each pairwise classifier is a binary classifier, just like the previous SVMs that determine whether a character is a 1 or a 7, the classifier for the pair (1, 7) is the same as the one for (7, 1), so we actually only need half that number, or $ n(n-1)/2 $. For ten possible characters (0 through 9), this corresponds to 45 pairwise classifiers as part of our all-vs-all classifier. Scikit-learn provides an aggregator precisely for this, called `OneVsOneClassifier` (one-vs-one is another name for the same all-vs-all scheme: the aggregate is made up of one-vs-one classifiers).
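The counting and voting described above can be sketched in plain Python. This is a simplified illustration, not scikit-learn's implementation: the `vote` helper and the made-up pairwise predictions are hypothetical, and tie-breaking is ignored.

```python
from collections import Counter
from itertools import combinations

classes = list(range(10))  # digits 0 through 9

# One binary classifier per unordered pair of classes: n(n-1)/2 in total.
pairs = list(combinations(classes, 2))
print(len(pairs))  # 45

def vote(pairwise_predictions):
    """Return the class predicted most often across the pairwise classifiers."""
    return Counter(pairwise_predictions).most_common(1)[0][0]

# Hypothetical pairwise outputs: every classifier involving 7 picks 7,
# and every other classifier picks the smaller digit of its pair.
predictions = [7 if 7 in pair else min(pair) for pair in pairs]
print(vote(predictions))  # 7
```

Here the digit 7 collects 9 votes (one from each pair it belongs to), more than any other digit, so it is the aggregate prediction.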

NOTE: Warning! This one also takes a long time to run!

In [None]:
# We'll be using newtrain and newtest here since we want the full dataset!
# We'll use a soft-margin linear svm with a c hyperparameter of 0.1 since this has given us the best test accuracy so far

ava = OneVsOneClassifier(
    SVC(C=0.1, kernel="linear")
).fit(newtrain.drop(columns=["RESULT"]), newtrain["RESULT"])
avascore_train = ava.score(newtrain.drop(columns=["RESULT"]), newtrain["RESULT"])
avascore_test = ava.score(newtest.drop(columns=["RESULT"]), newtest["RESULT"])
print("Training Accuracy:", avascore_train, "\tTesting Accuracy:", avascore_test)

Training Accuracy: 0.9579333333333333 	Testing Accuracy: 0.947


And we typically get around 95% testing accuracy, which, when one remembers we're working with the whole dataset and all ten possible digits here, is quite impressive!