# Support Vector Machine exercise

*Complete this worksheet and hand it in (including its outputs and any supporting code outside of the worksheet) with your assignment submission.*

In this exercise you will:
    
- implement the fully-vectorized **loss function** for the linear SVM
- use a validation set to **tune the learning rate and regularization strength**
- **optimize** the loss function with **SGD**


## SVM Classifier

Your code for this section will be written inside **svm/classifiers/**. 

Please implement the fully-vectorized **loss function** for the linear SVM.

In [1]:
from svm.classifiers.linear_svm import linear_svm_loss_vectorized
import time
import pandas as pd
from sklearn.model_selection import train_test_split
import numpy as np

import matplotlib.pyplot as plt

diabetes = pd.read_csv('svm/datasets/diabetes.csv')
# Features are every column except 'Outcome'; 'Outcome' is the label.
X_all = np.array(diabetes.loc[:, diabetes.columns != 'Outcome'])
y_all = np.array(diabetes['Outcome'])
x_train, x_test, y_train, y_test = train_test_split(
    X_all, y_all, stratify=y_all, random_state=66)
print(x_train.shape, x_test.shape)
# Split the data into train, val, and test sets. In addition we will
# create a small development set as a subset of the training data;
# we can use this for development so our code runs faster.
num_training = 480
num_validation = 96 
num_test = 160
num_dev = 32

# Change label set from {0, 1} to {-1, 1}
y_train = 2 * y_train - 1
y_test = 2 * y_test - 1

# Our validation set will be num_validation points from the original
# training set.
mask = range(num_training, num_training + num_validation)
X_val = x_train[mask]
y_val = y_train[mask]

# Our training set will be the first num_training points from the original
# training set.
mask = range(num_training)
X_train = x_train[mask]
y_train = y_train[mask]

# We will also make a development set, which is a small subset of
# the training set.
mask = np.random.choice(num_training, num_dev, replace=False)
X_dev = X_train[mask]
y_dev = y_train[mask]

# We use the first num_test points of the original test set as our
# test set.
mask = range(num_test)
X_test = x_test[mask]
y_test = y_test[mask]


 
# Generate a random SVM weight vector of small numbers.
W = np.random.randn(9) * 0.0001
# Evaluate the implementation of the linear SVM loss we provided for you:
loss, grad = linear_svm_loss_vectorized(W, X_dev, y_dev, 0.00001)

#print scores.shape
print('loss: %f' % (loss, ))


(576, 8) (192, 8)
loss: 1.002500


The `grad` returned by the function above is currently all zeros. Derive the gradient of the SVM cost function and implement it inline inside `linear_svm_loss_vectorized`. You will find it helpful to interleave your new code with the existing code in that function.
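As a rough illustration of what a vectorized loss and gradient can look like, here is a minimal sketch for a binary linear SVM. It is not the provided `linear_svm_loss_vectorized`: the name `hinge_loss_and_grad` is hypothetical, the regularizer is assumed to be `reg * ||w||^2`, and labels are assumed to be in {-1, +1}. The worksheet passes a 9-element `W` for 8 feature columns, which suggests the provided function handles a bias term somewhere; this sketch does not try to reproduce that.

```python
import numpy as np

def hinge_loss_and_grad(w, X, y, reg):
    """Binary linear SVM loss and (sub)gradient. Hypothetical helper.

    Assumptions: X has shape (N, D), y has labels in {-1, +1},
    w has shape (D,), and the regularizer is reg * ||w||^2.
    """
    num_samples = X.shape[0]
    margins = 1.0 - y * (X @ w)       # shape (N,)
    active = margins > 0              # samples that violate the margin
    loss = np.sum(margins[active]) / num_samples + reg * np.dot(w, w)
    # Each active sample contributes -y_i * x_i; inactive samples contribute 0.
    grad = -(X[active].T @ y[active]) / num_samples + 2.0 * reg * w
    return loss, grad
```

A common way to gain confidence in a derived gradient is to compare it against a centered finite-difference estimate on a small random problem. The hinge loss is not differentiable exactly where a margin equals zero, so the expression above is a subgradient at those points.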


### Stochastic Gradient Descent

We now have efficient, vectorized expressions for the loss and the gradient, so we are ready to use SGD to minimize the SVM loss.
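The general shape of a minibatch SGD loop can be sketched as follows. This is an illustrative outline only, not the course's `train()` method: the function name `sgd_train`, the `batch_size` and `seed` parameters, and the initialization scale are all assumptions made for the example.

```python
import numpy as np

def sgd_train(X, y, loss_fn, learning_rate=1e-3, reg=1e-2,
              num_iters=200, batch_size=32, seed=0):
    """Minimize loss_fn with minibatch SGD. Hypothetical helper.

    loss_fn(w, X_batch, y_batch, reg) must return (loss, grad).
    """
    rng = np.random.default_rng(seed)
    num_samples, dim = X.shape
    w = 0.001 * rng.standard_normal(dim)   # small random initial weights
    loss_history = []
    for _ in range(num_iters):
        # Sample a minibatch without replacement.
        size = min(batch_size, num_samples)
        idx = rng.choice(num_samples, size=size, replace=False)
        loss, grad = loss_fn(w, X[idx], y[idx], reg)
        loss_history.append(loss)
        # Step against the gradient estimate.
        w -= learning_rate * grad
    return w, loss_history
```

Any function with the signature `(w, X_batch, y_batch, reg)` that returns a `(loss, grad)` pair can be passed as `loss_fn`. Plotting `loss_history` against the iteration number is a quick way to see whether the learning rate is too large (loss oscillates or diverges) or too small (loss barely moves).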

In [2]:
# In the file svm_classifier.py, implement SGD in the function
# SVMClassifier.train() and then run it with the code below.
from svm.classifiers import LinearSVM
svm = LinearSVM()
tic = time.time()
loss_hist = svm.train(X_train, y_train, learning_rate=1e-7, reg=5e4,
                      num_iters=1500, verbose=True)

toc = time.time()
print('That took %fs' % (toc - tic))
# print('loss history: ')
# print(loss_hist)

That took 6.047517s


In [3]:
# Write the SVMClassifier.predict function and evaluate the performance on both the training and validation set
y_train_pred = svm.predict(X_train)
print('training accuracy: %f' % (np.mean(y_train == y_train_pred), ))
y_val_pred = svm.predict(X_val)
print('validation accuracy: %f' % (np.mean(y_val == y_val_pred), ))

training accuracy: 0.654167
validation accuracy: 0.635417
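For a binary linear classifier with labels in {-1, +1}, prediction usually reduces to taking the sign of the linear score. The sketch below assumes that setup; `predict_labels` and `accuracy` are hypothetical helper names, ties at a score of exactly zero are assigned to +1 by choice, and no bias handling is included.

```python
import numpy as np

def predict_labels(w, X):
    """Return labels in {-1, +1} from the sign of the score X @ w."""
    scores = X @ w
    # A score of exactly 0 is mapped to +1 (an arbitrary tie-break).
    return np.where(scores >= 0, 1, -1)

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return float(np.mean(y_true == y_pred))
```

When reading accuracies like the ones above, it can help to compare them against the accuracy of always predicting the most frequent class in the same split.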


In [9]:
# Use the validation set to tune hyperparameters (regularization strength and
# learning rate). You should experiment with different ranges for the learning
# rates and regularization strengths.

best_val = -1   # The highest validation accuracy that we have seen so far.
best_svm = None # The SVM object that achieved the highest validation accuracy.

################################################################################
# TODO:                                                                        #
# Write code that chooses the best hyperparameters by tuning on the validation #
# set. For each combination of hyperparameters, train an SVM on the            #
# training set, compute its accuracy on the training and validation sets.      #
# In addition, store the best                                                  #
# validation accuracy in best_val and the SVM object that achieves this        #
# accuracy in best_svm.                                                        #
#                                                                              #
# Hint: You should use a small value for num_iters as you develop your         #
# validation code so that the SVMs don't take much time to train; once you are #
# confident that your validation code works, you should rerun the validation   #
# code with a larger value for num_iters.                                      #
################################################################################

# Grid Search
reg_list = [1e-3, 1e-2, 1e-1, 1, 10, 100, 1000]
rate_list = [1e-6, 1e-5, 1e-4, 1e-3, 1e-2]

trn_accuracy_val = -1 # highest accuracy on the training set seen so far

best_trn_reg = None
best_trn_rate = None
best_val_reg = None
best_val_rate = None

for reg in reg_list:
    for rate in rate_list:
        svm = LinearSVM()
        loss_hist = svm.train(X_train, y_train, learning_rate=rate, reg=reg,
                      num_iters=500, verbose=True)

        y_train_pred = svm.predict(X_train)
        val = np.mean(y_train == y_train_pred)
        if val > trn_accuracy_val:
            trn_accuracy_val = val
            best_trn_reg = reg
            best_trn_rate = rate
        
        y_val_pred = svm.predict(X_val)
        val = np.mean(y_val == y_val_pred)
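        if val > best_val:
            best_val = val
            best_val_reg = reg
            best_val_rate = rate
            # Keep the model too, so best_svm is set even if the finer
            # search below does not improve on this result.
            best_svm = svm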

# After the coarse grid search, the highest accuracy usually occurs at
# reg=10 and learning rate=1e-4. Next, run a finer search over the
# regularization strength around the best coarse value.

reg_step = best_val_reg / 10

for i in range(5, 15):
    svm = LinearSVM()
    reg = i * reg_step
    loss_hist = svm.train(X_train, y_train, learning_rate=best_val_rate, reg=reg,
                      num_iters=500, verbose=True)
    y_val_pred = svm.predict(X_val)
    val = np.mean(y_val == y_val_pred)

    print('reg: %f; val: %f' % (reg, val))
    if val > best_val:
        best_val = val
        best_val_reg = reg
        best_svm = svm



################################################################################
#                              END OF YOUR CODE                                #
################################################################################
    
# Print out results.
# Your code
    
print('best validation accuracy achieved during cross-validation: %f' % best_val)
print('regularization strength when best accuracy occurs: %f' % best_val_reg)
print('learning rate when best accuracy occurs: %f' % best_val_rate)

reg: 5.000000; val: 0.697917
reg: 6.000000; val: 0.697917
reg: 7.000000; val: 0.666667
reg: 8.000000; val: 0.697917
reg: 9.000000; val: 0.677083
reg: 10.000000; val: 0.697917
reg: 11.000000; val: 0.708333
reg: 12.000000; val: 0.687500
reg: 13.000000; val: 0.677083
reg: 14.000000; val: 0.687500
best validation accuracy achieved during cross-validation: 0.708333
regularization strength when best accuracy occurs: 10.000000
learning rate when best accuracy occurs: 0.000100


In [21]:
# Evaluate the best linear SVM on test set
# Your code

clf = LinearSVM()
loss_hist = clf.train(X_train, y_train, learning_rate=best_val_rate, reg=best_val_reg,
                      num_iters=5000, verbose=True)
y_pred = clf.predict(X_test)
val = np.mean(y_test == y_pred)

print('validation accuracy achieved on test dataset: %f' % val)

validation accuracy achieved on test dataset: 0.693750
