# Chapter 5 Support Vector Machine

This chapter introduces the basic idea of the Support Vector Machine (SVM). This notebook contains my own solutions to the book's exercises, covering Ex. 8, Ex. 9 and Ex. 10.

I tested the code on my MBP. Grid search is a little slow on this machine, so I set the hyperparameters of the model directly and train it on the data.

## Exercise 9: SVM on MNIST

Requirement: Train an SVM classifier on the MNIST dataset. Since an SVM is a binary classifier, you need a one-versus-rest strategy to classify all 10 digits. You may also want to tune the hyperparameters on a small validation set to speed up the process. Then check the precision score of your final model.

As with other machine learning algorithms, we first need to prepare the dataset. We again use the ```mnist-original.mat``` file from Chapter 3.
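To make the one-versus-rest idea concrete, here is a minimal sketch on scikit-learn's small built-in digits dataset. `load_digits` is an assumption chosen only because it trains in seconds; it is not the MNIST file used in this notebook, and the `*_small` and `ovr_demo` names are hypothetical. The wrapper fits one binary SVM per class and predicts the class whose SVM gives the highest decision score.

```python
from sklearn.datasets import load_digits
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# Small stand-in dataset (8x8 digit images, 10 classes); not the MNIST file.
X_small, y_small = load_digits(return_X_y=True)

ovr_demo = OneVsRestClassifier(SVC(kernel="rbf", C=1, gamma="scale"))
ovr_demo.fit(X_small, y_small)

# One binary SVC is trained per class: "this digit" versus "all other digits".
print("Number of binary classifiers:", len(ovr_demo.estimators_))

# decision_function returns one score per class; predict picks the largest.
scores = ovr_demo.decision_function(X_small[:5])
print("Score matrix shape:", scores.shape)
print("Predicted labels:", ovr_demo.predict(X_small[:5]))
```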

In [None]:
import scipy.io as sio
import numpy as np
from sklearn.preprocessing import StandardScaler

mnist = sio.loadmat("./mnist/mnist-original.mat")
# The .mat file stores data as (features, samples); flatten the labels to 1-D.
data, target = np.transpose(mnist["data"]), np.ravel(mnist["label"])
X_train, y_train, X_test, y_test = data[:60000], target[:60000], data[60000:], target[60000:]

# fit_transform returns the scaled array, so assign it back to X_train.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train.astype(np.float64))
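A point that is easy to miss: `fit_transform` returns the scaled array rather than modifying its input, so the result has to be assigned. The following sketch shows this on a made-up array (all `toy*` names are hypothetical).

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

toy = np.array([[0.0, 10.0], [2.0, 20.0], [4.0, 30.0]])  # made-up data

toy_scaler = StandardScaler()
toy_scaled = toy_scaler.fit_transform(toy)  # returns a new array

print(toy)         # the input is left unchanged
print(toy_scaled)  # each column now has zero mean and unit variance

# New data is scaled with the statistics learned from the training data.
toy_new = toy_scaler.transform(np.array([[2.0, 20.0]]))
print(toy_new)
```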

We create a OneVsRestClassifier for multiclass classification. (Hyperparameter tuning is omitted.)

In [None]:
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import precision_score, roc_auc_score
from sklearn.preprocessing import label_binarize

split_folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

for train_index, val_index in split_folds.split(X_train, y_train):
    X_train_fold, X_val_fold = X_train[train_index], X_train[val_index]
    y_train_fold, y_val_fold = y_train[train_index], y_train[val_index]
    ovr_clf = OneVsRestClassifier(SVC(kernel="rbf", C=1, gamma="scale"))
    ovr_clf.fit(X_train_fold, y_train_fold)
    y_pred = ovr_clf.predict(X_val_fold)
    # Multiclass precision needs an averaging strategy; "macro" weights all digits equally.
    print("Validation Precision Score:", precision_score(y_val_fold, y_pred, average="macro"))
    # AUC needs continuous scores: compare per-class decision scores with one-hot labels.
    y_val_onehot = label_binarize(y_val_fold, classes=ovr_clf.classes_)
    y_score = ovr_clf.decision_function(X_val_fold)
    print("Validation AUC:", roc_auc_score(y_val_onehot, y_score, average="macro"))

final_model = OneVsRestClassifier(SVC(kernel="rbf", C=1, gamma="scale"))
final_model.fit(X_train, y_train)
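The tuning step that this notebook skips could look roughly like the sketch below, which searches on a small subset to keep the runtime manageable. Everything here is an assumption rather than the setup used above: the stand-in `load_digits` data, the subset size of 500, the candidate values for `C` and `gamma`, and the choice of `RandomizedSearchCV` with accuracy scoring.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import RandomizedSearchCV
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# Stand-in data so the sketch runs in seconds; a subset of MNIST would go here.
X_demo, y_demo = load_digits(return_X_y=True)
X_sub, y_sub = X_demo[:500], y_demo[:500]  # small tuning subset

# Parameters of the wrapped SVC are addressed with the "estimator__" prefix.
param_distributions = {
    "estimator__C": [0.1, 1, 10],
    "estimator__gamma": ["scale", 0.01, 0.001],
}

search = RandomizedSearchCV(
    OneVsRestClassifier(SVC(kernel="rbf")),
    param_distributions,
    n_iter=5,          # try 5 of the 9 possible combinations
    cv=3,
    scoring="accuracy",
    random_state=42,
)
search.fit(X_sub, y_sub)

print("Best parameters:", search.best_params_)
print("Best cross-validated accuracy:", search.best_score_)
```

The best parameters found on the subset can then be used to train a single model on the full training set, which avoids running the whole search on 60,000 samples.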

Check the precision score and AUC of the model.

In [None]:
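from sklearn.preprocessing import label_binarize

X_test = scaler.transform(X_test)
predictions = final_model.predict(X_test)
# AUC is computed from per-class decision scores against one-hot labels.
test_scores = final_model.decision_function(X_test)
print("Test Precision score:", precision_score(y_test, predictions, average="macro"))
print("Test AUC:", roc_auc_score(label_binarize(y_test, classes=final_model.classes_), test_scores, average="macro"))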
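With more than two classes, `precision_score` needs an `average` argument that says how the per-class precisions are combined. The sketch below uses made-up labels (the `*_demo` names are hypothetical) to show the per-class values next to the macro and micro averages.

```python
from sklearn.metrics import precision_score

y_true_demo = [0, 0, 1, 1, 2, 2]  # made-up true labels
y_pred_demo = [0, 1, 1, 1, 2, 0]  # made-up predictions

# average=None returns one precision value per class.
per_class = precision_score(y_true_demo, y_pred_demo, average=None)
# "macro" is the unweighted mean of the per-class precisions.
macro = precision_score(y_true_demo, y_pred_demo, average="macro")
# "micro" pools all predictions before computing precision.
micro = precision_score(y_true_demo, y_pred_demo, average="micro")

print("Per-class precision:", per_class)
print("Macro precision:", macro)
print("Micro precision:", micro)
```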

## Exercise 10: SVM on California housing dataset

Requirement: Create an SVM regressor on the California housing dataset. Then check the performance of your final model with a regression metric such as RMSE.

Load the California housing dataset with sklearn and preprocess it. When fetched through sklearn, the dataset is already cleaned and can be used directly.

In [None]:
from sklearn.datasets import fetch_california_housing
from sklearn.preprocessing import StandardScaler

# The loader is imported from sklearn.datasets directly.
housing = fetch_california_housing()
data, target = housing["data"], housing["target"]
X_train, X_test = data[:20000], data[20000:]
y_train, y_test = target[:20000], target[20000:]

# fit_transform returns the scaled array, so assign it back to X_train.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)

Next, create an SVM regressor.

In [None]:
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error, r2_score

# The targets are continuous, so use plain KFold; StratifiedKFold needs class labels.
split_folds = KFold(n_splits=10, shuffle=True, random_state=42)

for train_index, val_index in split_folds.split(X_train):
    X_train_fold, X_val_fold = X_train[train_index], X_train[val_index]
    y_train_fold, y_val_fold = y_train[train_index], y_train[val_index]
    svm_reg = SVR(kernel="rbf", C=1, gamma="scale")
    svm_reg.fit(X_train_fold, y_train_fold)
    y_pred = svm_reg.predict(X_val_fold)
    # Precision and AUC are classification metrics; report RMSE and R^2 instead.
    print("Validation RMSE:", np.sqrt(mean_squared_error(y_val_fold, y_pred)))
    print("Validation R^2:", r2_score(y_val_fold, y_pred))

final_model = SVR(kernel="rbf", C=1, gamma="scale")
final_model.fit(X_train, y_train)
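One detail worth spelling out for regression: stratified splitting relies on discrete class labels, so a continuous target calls for a plain `KFold` split. The sketch below uses made-up data (`X_cont` and `y_cont` are hypothetical names) to show `KFold` producing folds where `StratifiedKFold` rejects the target.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

X_cont = np.arange(20).reshape(10, 2)  # made-up features
y_cont = np.linspace(0.5, 5.0, 10)     # made-up continuous targets

# KFold ignores the target values, so it works for regression.
kfold = KFold(n_splits=5, shuffle=True, random_state=42)
fold_sizes = [len(val_idx) for _, val_idx in kfold.split(X_cont)]
print("Validation fold sizes:", fold_sizes)

# StratifiedKFold balances class frequencies and refuses a continuous target.
stratified_failed = False
try:
    next(StratifiedKFold(n_splits=5).split(X_cont, y_cont))
except ValueError as err:
    stratified_failed = True
    print("StratifiedKFold raised:", err)
```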

Finally check the performance of the model.

In [None]:
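import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# transform returns the scaled array, so assign it back before predicting.
X_test = scaler.transform(X_test)
predictions = final_model.predict(X_test)
print("Test RMSE:", np.sqrt(mean_squared_error(y_test, predictions)))
print("Test R^2:", r2_score(y_test, predictions))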
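For a regressor, error-based metrics such as RMSE and R² are the usual choice, since precision and AUC are defined for classification. As a small worked example on made-up numbers (the `*_reg` names are hypothetical):

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_true_reg = np.array([1.0, 2.0, 3.0, 4.0])  # made-up targets
y_pred_reg = np.array([1.5, 2.0, 2.5, 5.0])  # made-up predictions

# RMSE: square root of the mean squared error, in the same units as the target.
rmse = np.sqrt(mean_squared_error(y_true_reg, y_pred_reg))
# R^2: 1 minus the ratio of residual variance to total variance.
r2 = r2_score(y_true_reg, y_pred_reg)

print("RMSE:", rmse)
print("R^2:", r2)
```

Here the squared errors sum to 1.5, so the MSE is 0.375 and the RMSE is about 0.612; the total sum of squares around the mean 2.5 is 5, giving R² = 1 - 1.5/5 = 0.7.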

Here is a script for exporting the housing dataset to a CSV file.

In [None]:
import csv

# newline="" stops the csv module from writing blank rows on Windows.
with open("dataset.csv", "w", newline="") as csvfile:
    csv_writer = csv.writer(csvfile)
    # Header: the target column first, then the feature names.
    csv_writer.writerow(["label", *housing["feature_names"]])

    # One row per sample: the target value followed by its features.
    for label, features in zip(target, data):
        csv_writer.writerow([label, *features])
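To check that an export like this round-trips, the file can be read back with `csv.reader`. The sketch below uses an in-memory buffer and made-up rows with hypothetical column names (`f1`, `f2`) instead of the real `dataset.csv`, so it runs without the housing data.

```python
import csv
import io

feature_names_demo = ["f1", "f2"]                      # hypothetical column names
rows_demo = [(1.5, [0.1, 0.2]), (2.5, [0.3, 0.4])]     # made-up (label, features) pairs

# Write the header and rows in the same layout as the export script.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["label", *feature_names_demo])
for label, features in rows_demo:
    writer.writerow([label, *features])

# Read everything back and convert the numeric fields from strings to floats.
buffer.seek(0)
reader = csv.reader(buffer)
header_back = next(reader)
rows_back = [[float(value) for value in row] for row in reader]

print(header_back)
print(rows_back)
```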