# ELE435/535 LAB 8: Kernel SVM
## Version: 2022

## Objectives:  

We will use a linear SVM, a kernel SVM, and logistic regression to classify MNIST digits into $c=10$ classes. The best performance among these methods serves as a lower bound on what we expect to achieve with a neural network. It therefore provides a benchmark for the next round of methods: multinomial softmax regression, a one-hidden-layer neural network, a multilayer feedforward neural network, and a convolutional neural network.
  
The SVM and logistic regression are naturally binary classifiers. However, both can perform multi-class classification using the one-versus-the-rest method. As the name suggests, this trains $c$ binary classifiers, each of which distinguishes one class from the rest. The final classification is made by resolving conflicting classifications (no need to go into the details).
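As a rough illustration of the one-versus-the-rest idea, the sketch below trains one binary logistic-regression classifier per class on a small synthetic dataset and assigns each sample to the class whose classifier gives the largest decision score. The toy data, the helper name `ovr_predict`, and the argmax rule for resolving conflicts are assumptions made for illustration; scikit-learn's internal handling may differ.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical toy data: 3 well-separated Gaussian blobs in 2-D, 50 samples each.
num_classes = 3
centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
X = np.vstack([center + 0.5 * rng.standard_normal((50, 2)) for center in centers])
y = np.repeat(np.arange(num_classes), 50)

# Train one binary classifier per class: class k versus everything else.
binary_classifiers = [
    LogisticRegression().fit(X, (y == k).astype(int)) for k in range(num_classes)
]


def ovr_predict(classifiers, samples):
    """Assign each sample to the class whose binary classifier scores highest."""
    # decision_function returns one signed score per sample; stack into (n, c).
    scores = np.column_stack([clf.decision_function(samples) for clf in classifiers])
    return scores.argmax(axis=1)


predictions = ovr_predict(binary_classifiers, X)
ovr_accuracy = (predictions == y).mean()
print(f"One-versus-rest training accuracy: {ovr_accuracy:.3f}")
```

Taking the argmax of the decision scores is one simple way to resolve cases where several binary classifiers (or none) claim a sample.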

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import sklearn
from time import time
import datetime
%matplotlib inline

## Kernel-SVM on MNIST

**1.1)** First, import the provided subsets of the MNIST dataset:  
MNISTcwtrain1000.npy and  MNISTcwtest100.npy

Normalize the scalar data values to the range [0,1].


In [2]:
train_data = np.load('MNISTcwtrain1000.npy')
train_data = train_data.astype(dtype='float64')
test_data = np.load('MNISTcwtest100.npy')
test_data = test_data.astype(dtype='float64')

train_data = train_data/255.0
test_data = test_data/255.0
print('training data: ', train_data.shape)
print('testing data: ', test_data.shape)

training data:  (784, 10000)
testing data:  (784, 1000)


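**1.2)** The SVM can be used to classify multiclass data, and scikit-learn accepts multi-class labels directly. Note that `svm.SVC` trains one-versus-one classifiers internally; its default `decision_function_shape='ovr'` only presents the decision values in a one-versus-the-rest shape. `svm.LinearSVC` uses a true one-versus-the-rest scheme by default.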

Train a one-versus-rest SVM using a linear kernel and C=0.1.   
Report classification accuracy on the training and testing data. (Use sklearn's built-in commands for training and testing)

In [3]:
num_classes = 10
num_train_class_samples = 1000
num_train_total_samples = num_classes * num_train_class_samples
num_test_class_samples = 100
num_test_total_samples = num_classes * num_test_class_samples

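# The data columns are assumed to be grouped class by class
# (1000 per digit for training, 100 per digit for testing),
# so label k is repeated once for every sample of class k.
train_labels = np.repeat(
    np.arange(num_classes, dtype='float64'), num_train_class_samples
)

test_labels = np.repeat(
    np.arange(num_classes, dtype='float64'), num_test_class_samples
)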

print(train_labels)
# print(test_labels)


[0. 0. 0. ... 9. 9. 9.]


In [4]:
# This code is provided

from sklearn import svm
start = time()
#-----------------------------------
# Your code here

svm_classifier = svm.SVC(kernel='linear', C=0.1)
svm_classifier.fit(train_data.T, train_labels)

train_score = svm_classifier.score(train_data.T, train_labels)
test_score = svm_classifier.score(test_data.T, test_labels)

print(f"Training accuracy: {train_score}.")
print(f"Testing accuracy: {test_score}.")

#------------------------------------
# This code is provided
end = time()
print('Estimated running time:' + str(datetime.timedelta(seconds=end - start)))

Training accuracy: 0.9739.
Testing accuracy: 0.921.
Estimated running time:0:00:07.672331


**1.3)** Now, do the same using an SVM with 'rbf' (Gaussian) kernel. Search over C in the interval [0.005,0.1] and 'gamma' in the interval [0.005, 0.1] and report the best test accuracy (use sklearn's built-in commands). Hint: In order to get a feeling for selecting an appropriate value for gamma, take a look at http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html.
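For some intuition about `gamma`: scikit-learn's `'rbf'` kernel is $k(x, x') = \exp(-\gamma \|x - x'\|^2)$, so a larger `gamma` makes the kernel value fall off faster with distance. The NumPy sketch below evaluates this kernel on random vectors that stand in for normalized 784-pixel images; the helper name `rbf_kernel_matrix` and the random data are assumptions for illustration only.

```python
import numpy as np


def rbf_kernel_matrix(A, B, gamma):
    """Gaussian kernel exp(-gamma * ||a - b||^2) for all rows a of A and b of B."""
    sq_dists = (
        (A ** 2).sum(axis=1)[:, None]
        + (B ** 2).sum(axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Guard against tiny negative values caused by round-off.
    sq_dists = np.maximum(sq_dists, 0.0)
    return np.exp(-gamma * sq_dists)


rng = np.random.default_rng(0)
# Hypothetical stand-in for a few normalized images (rows = samples).
X_demo = rng.uniform(0.0, 1.0, size=(5, 784))

for gamma in (0.005, 0.02, 0.1):
    K = rbf_kernel_matrix(X_demo, X_demo, gamma)
    off_diag = K[~np.eye(len(K), dtype=bool)]
    print(f"gamma={gamma}: mean off-diagonal kernel value = {off_diag.mean():.6f}")
```

When the off-diagonal values are close to zero, every sample looks dissimilar to every other one and the classifier tends to overfit; when they are all close to one, the kernel can barely distinguish samples.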

In [5]:
# This code is provided
start = time()
#--------------------------------------------
# Your code here

c_list = np.arange(0.005, 0.105, 0.005)
gamma_list = np.arange(0.005, 0.105, 0.005)

c_gamma_list = np.array(np.meshgrid(c_list, gamma_list)).T.reshape(-1, 2)
svc_list = [svm.SVC(kernel='rbf', C=c, gamma=gamma) for (c, gamma) in c_gamma_list]
for svc in svc_list:
    svc.fit(train_data.T, train_labels)
test_score_list = np.array([svc.score(test_data.T, test_labels) for svc in svc_list])
test_score_optimal = test_score_list.max()
c_gamma_optimal = c_gamma_list[test_score_list.argmax()]

print(f'Best testing accuracy: {test_score_optimal}.')
print(f'Optimal C: {c_gamma_optimal[0]}.')
print(f'Optimal gamma: {c_gamma_optimal[1]}.')

#---------------------------------------------
# This code is provided
end = time()
print('Estimated running time:' + str(datetime.timedelta(seconds=end - start)))

Best testing accuracy: 0.915.
Optimal C: 0.1.
Optimal gamma: 0.02.
Estimated running time:3:17:38.409192
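The search above picks C and gamma by their accuracy on the test set, as the exercise asks. If one instead wanted to choose hyperparameters without touching the test data, a cross-validated grid search is a common alternative. The following is a minimal sketch using scikit-learn's `GridSearchCV` on the small `load_digits` dataset bundled with scikit-learn (8x8 digits, not the lab's MNIST subsets); the grid values, the 3-fold cross-validation, and the 75/25 split are assumptions chosen so that the example runs in seconds.

```python
from sklearn import svm
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split

# Small bundled digits dataset; pixel values range from 0 to 16.
digits = load_digits()
X = digits.data / 16.0
y = digits.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Illustrative grid; each (C, gamma) pair is scored by 3-fold cross-validation
# on the training split only.
param_grid = {"C": [0.1, 1.0, 10.0], "gamma": [0.01, 0.1, 1.0]}
search = GridSearchCV(svm.SVC(kernel="rbf"), param_grid, cv=3)
search.fit(X_train, y_train)

# The best model is refit on the full training split and then scored once
# on the held-out test split.
held_out_accuracy = search.score(X_test, y_test)
print(f"Best parameters: {search.best_params_}")
print(f"Held-out accuracy: {held_out_accuracy:.3f}")
```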


**1.3)** Now, do the same using $\ell_2$-regularized logistic regression. For multi-class data, the strategy used by scikit-learn's `LogisticRegression` depends on the library version and solver: older versions default to one-versus-the-rest, while recent versions with the default `lbfgs` solver fit a multinomial (softmax) model. The regularization parameter $C$ plays the role of $1/\lambda$: smaller values of $C$ mean stronger regularization. Search over three or four values in the interval [0.01, 1] to find the best testing performance.
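To see the effect of $C$ acting like $1/\lambda$, the sketch below fits `LogisticRegression` for a few values of `C` on hypothetical two-class toy data and prints the norm of the learned weight vector. The data and the chosen `C` values are assumptions for illustration; the point is only that stronger regularization (smaller `C`) shrinks the weights.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical two-class toy data: overlapping Gaussian clouds in 5-D.
X = np.vstack([
    rng.standard_normal((100, 5)) - 0.5,
    rng.standard_normal((100, 5)) + 0.5,
])
y = np.repeat([0, 1], 100)

c_values = [0.01, 0.1, 1.0]
weight_norms = []
for c in c_values:
    # The default penalty is l2; max_iter is raised so the solver converges.
    clf = LogisticRegression(C=c, max_iter=1000).fit(X, y)
    weight_norms.append(np.linalg.norm(clf.coef_))
    print(f"C={c}: ||w|| = {weight_norms[-1]:.4f}")
```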

In [6]:
# This code is provided
from sklearn.linear_model import LogisticRegression
start = time()
#--------------------------------------------
# Your code here

c_list = np.geomspace(0.01, 1, 5)
regression_list = [LogisticRegression(C=c).fit(train_data.T, train_labels) for c in c_list]
test_score_list = np.array([regression.score(test_data.T, test_labels) for regression in regression_list])
test_score_optimal = test_score_list.max()
c_optimal = c_list[test_score_list.argmax()]

print(f'Best testing accuracy: {test_score_optimal}.')
print(f'Optimal C: {c_optimal}.')

#---------------------------------------------
# This code is provided
end = time()
print('Estimated running time:' + str(datetime.timedelta(seconds=end - start)))

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(

Best testing accuracy: 0.896.
Optimal C: 0.1.
Estimated running time:0:00:08.069240
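The iteration-limit warnings in the output above indicate that the solver stopped before converging, so the reported accuracy comes from a partially optimized model. One way to address this, sketched below, is to pass a larger `max_iter` and then inspect `n_iter_`. The example uses the bundled `load_digits` dataset rather than the lab's data, and the budget of 5000 iterations is an illustrative assumption.

```python
import warnings

from sklearn.datasets import load_digits
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression

# Small bundled digits dataset; pixel values range from 0 to 16.
digits = load_digits()
X = digits.data / 16.0
y = digits.target

max_iter = 5000  # illustrative budget, well above the default of 100
with warnings.catch_warnings():
    # Turn a convergence warning into an error so a silent failure is impossible.
    warnings.simplefilter("error", category=ConvergenceWarning)
    clf = LogisticRegression(C=0.1, max_iter=max_iter).fit(X, y)

# n_iter_ holds the number of iterations the solver actually used.
iterations_used = int(clf.n_iter_.max())
print(f"Solver iterations used: {iterations_used} (limit {max_iter})")
print(f"Training accuracy: {clf.score(X, y):.3f}")
```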


**1.4)** With what accuracy can the best of the above classifiers predict the classes of the 1,000 test images? This is the benchmark to beat.


ANS: $0.921$ by linear SVM with $C=0.1$.