# Supervised Learning Algorithms: Kernelized Support Vector Machines

*In this template, only the **data input** and the **input/target variables** need to be specified (see the "Data Input and Variables" section for further instructions). None of the other sections needs to be adjusted. As a data input example, a .csv file from the IBM Box web repository is used.*

## 1. Libraries

*Run to import the required libraries.*

In [1]:
%matplotlib notebook
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

## 2. Data Input and Variables

*Define the data input as well as the input (X) and target (y) variables, then run the code. Do not change the names **['df', 'X', 'y']**, as they are used in later sections.*

In [2]:
### Data Input
# df = 

### Defining Variables  
# X = 
# y = 

### Data Input Example 
df = pd.read_csv('https://ibm.box.com/shared/static/q6iiqb1pd7wo8r3q28jvgsrprzezjqk3.csv')

X = df[['horsepower']]
y = df['price']
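
*The same three names can be assigned from any other source. The sketch below is only an illustration: the small in-memory frame and its values are invented, and only the column names `horsepower` and `price` are taken from the example above.*

```python
import pandas as pd

# Hypothetical stand-in for a loaded dataset; the values are invented.
df = pd.DataFrame({
    'horsepower': [111, 154, 102, 115],
    'price': [13495, 16500, 13950, 17450],
})

# Double brackets keep X two-dimensional (a DataFrame), which is the
# shape scikit-learn estimators expect for the feature matrix.
X = df[['horsepower']]

# Single brackets return a one-dimensional Series for the target.
y = df['price']

print(X.shape, y.shape)
```

*With a real file, replace the `pd.DataFrame(...)` call with `pd.read_csv(...)` and keep the two selection lines unchanged.*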

## 3. The Model

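*Run to build the SVM with the default radial basis function (RBF) kernel. The polynomial-kernel variant is commented out because it can take a long time to fit; uncomment it to compare the two kernels.*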

In [5]:
from sklearn.svm import SVC

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state = 0)

# The default SVC kernel is radial basis function (RBF)
clf = SVC().fit(X_train, y_train)

print('Accuracy of RBF-kernel SVC on training set: {:.2f}'
     .format(clf.score(X_train, y_train)))
print('Accuracy of RBF-kernel SVC on test set: {:.2f}'
     .format(clf.score(X_test, y_test)))

### THIS MIGHT TAKE A WHILE
# # Compare decision boundaries with polynomial kernel, degree = 3
# clf = SVC(kernel = 'poly', degree = 3).fit(X_train, y_train)

# print('Accuracy of poly-kernel SVC on training set: {:.2f}'
#      .format(clf.score(X_train, y_train)))
# print('Accuracy of poly-kernel SVC on test set: {:.2f}'
#      .format(clf.score(X_test, y_test)))

Accuracy of RBF-kernel SVC on training set: 0.37
Accuracy of RBF-kernel SVC on test set: 0.00
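
*A likely reason for the 0.00 test accuracy is that `price` is a numeric target with many distinct values, and `SVC` treats every distinct value as its own class. The sketch below illustrates this on synthetic data (an assumption, not the cars dataset) and shows `SVR`, the regression counterpart in scikit-learn, as one possible alternative. The variable names `X_demo`, `y_demo`, and `reg` are introduced here for illustration only.*

```python
import numpy as np
from sklearn.svm import SVC, SVR

# Synthetic stand-in data (an assumption, not the cars dataset):
# one feature and an integer-valued, price-like target.
rng = np.random.RandomState(0)
X_demo = rng.uniform(50, 250, size=(60, 1))
y_demo = np.round(100 * X_demo[:, 0] + rng.normal(0, 500, size=60)).astype(int)

# SVC treats every distinct target value as a separate class label.
clf_demo = SVC().fit(X_demo, y_demo)
n_distinct = len(np.unique(y_demo))
n_classes = len(clf_demo.classes_)
print('Distinct target values:', n_distinct)
print('Classes learned by SVC:', n_classes)

# SVR fits a regression function instead; its score() returns R^2,
# not classification accuracy.
reg = SVR(kernel='rbf', C=1000).fit(X_demo, y_demo)
r2_train = reg.score(X_demo, y_demo)
print('SVR R^2 on the training data: {:.2f}'.format(r2_train))
```

*When almost every training target is unique, a test sample can only be counted as correct if its exact value also occurs in the training set, which is rarely the case for prices.*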


### 3.1. Support Vector Machine with RBF kernel: gamma parameter

In [7]:
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state = 0)

for this_gamma in [0.00001, 100]:
    clf = SVC(kernel = 'rbf', gamma=this_gamma).fit(X_train, y_train)
    print('SVM (RBF) with gamma = {}'.format(this_gamma))
    print('Accuracy of SVM (RBF) classifier on training set: {:.2f}'
         .format(clf.score(X_train, y_train)))
    print('Accuracy of SVM (RBF) classifier on test set: {:.2f}\n'
         .format(clf.score(X_test, y_test)))

SVM (RBF) with gamma = 1e-05
Accuracy of SVM (RBF) classifier on training set: 0.09
Accuracy of SVM (RBF) classifier on test set: 0.00

SVM (RBF) with gamma = 100
Accuracy of SVM (RBF) classifier on training set: 0.37
Accuracy of SVM (RBF) classifier on test set: 0.00
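
*The `gamma` parameter controls the width of the RBF kernel, K(x, x') = exp(-gamma * ||x - x'||^2). As a rough sketch of what the two values in the loop above mean, the code below evaluates the kernel for two hypothetical one-feature samples (the values 100 and 110 are invented) both by hand and with scikit-learn's `rbf_kernel` helper.*

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

# Two hypothetical one-feature samples (values invented for illustration).
a = np.array([[100.0]])
b = np.array([[110.0]])

results = {}
for gamma in [0.00001, 100]:
    # Manual evaluation of exp(-gamma * squared Euclidean distance).
    manual = np.exp(-gamma * np.sum((a - b) ** 2))
    # The same quantity computed by scikit-learn's helper.
    library = rbf_kernel(a, b, gamma=gamma)[0, 0]
    results[gamma] = (manual, library)
    print('gamma = {}: manual = {:.6f}, rbf_kernel = {:.6f}'
          .format(gamma, manual, library))
```

*With the small gamma the two samples are almost fully similar (kernel value close to 1), which gives a very smooth decision function. With the large gamma the kernel value is effectively 0, so each training point only influences its immediate neighborhood.*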



### 3.2. Support Vector Machine with RBF kernel: using both the C and gamma parameters

In [10]:
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split


X_train, X_test, y_train, y_test = train_test_split(X, y, random_state = 0)

for this_gamma in [0.01, 1, 5]:
    for this_C in [0.1, 1, 15, 250]:
        clf = SVC(kernel = 'rbf', gamma = this_gamma, C = this_C).fit(X_train, y_train)
        print('SVM (RBF) with gamma = {} and C = {}'.format(this_gamma, this_C))
        print('Accuracy of SVM (RBF) classifier on training set: {:.2f}'
             .format(clf.score(X_train, y_train)))
        print('Accuracy of SVM (RBF) classifier on test set: {:.2f}\n'
             .format(clf.score(X_test, y_test)))

SVM (RBF) with gamma = 0.01 and C = 0.1
Accuracy of SVM (RBF) classifier on training set: 0.09
Accuracy of SVM (RBF) classifier on test set: 0.00

SVM (RBF) with gamma = 0.01 and C = 1
Accuracy of SVM (RBF) classifier on training set: 0.15
Accuracy of SVM (RBF) classifier on test set: 0.00

SVM (RBF) with gamma = 0.01 and C = 15
Accuracy of SVM (RBF) classifier on training set: 0.33
Accuracy of SVM (RBF) classifier on test set: 0.00

SVM (RBF) with gamma = 0.01 and C = 250
Accuracy of SVM (RBF) classifier on training set: 0.37
Accuracy of SVM (RBF) classifier on test set: 0.00

SVM (RBF) with gamma = 1 and C = 0.1
Accuracy of SVM (RBF) classifier on training set: 0.11
Accuracy of SVM (RBF) classifier on test set: 0.00

SVM (RBF) with gamma = 1 and C = 1
Accuracy of SVM (RBF) classifier on training set: 0.37
Accuracy of SVM (RBF) classifier on test set: 0.00

SVM (RBF) with gamma = 1 and C = 15
Accuracy of SVM (RBF) classifier on training set: 0.37
Accuracy of SVM (RBF) classifier on te
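
*The nested loops above can also be expressed with scikit-learn's `GridSearchCV`, which picks the combination by cross-validation on the training split instead of by comparing test scores. The following is a minimal sketch on synthetic classification data (an assumption for illustration); the names `X_demo`, `y_demo`, `param_grid`, and `search` do not appear elsewhere in this template.*

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic binary classification data (an assumption for illustration).
X_demo, y_demo = make_classification(n_samples=200, n_features=4,
                                     random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=0)

# Same candidate values as the nested loops above.
param_grid = {'gamma': [0.01, 1, 5], 'C': [0.1, 1, 15, 250]}

# 5-fold cross-validation on the training split selects the combination,
# so the test split is used only once, for the final score.
search = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5)
search.fit(X_tr, y_tr)

test_accuracy = search.score(X_te, y_te)
print('Best parameters:', search.best_params_)
print('Cross-validated accuracy: {:.2f}'.format(search.best_score_))
print('Test accuracy: {:.2f}'.format(test_accuracy))
```

*Selecting parameters on the test set, as a manual loop invites, tends to give an optimistic estimate; keeping the test split out of the search avoids that.*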

### 3.3. SVMs with normalized data (feature preprocessing using MinMax scaling)

In [12]:
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

clf = SVC(C=10).fit(X_train_scaled, y_train)
print('Cars dataset (normalized with MinMax scaling)')
print('RBF-kernel SVC (with MinMax scaling) training set accuracy: {:.2f}'
     .format(clf.score(X_train_scaled, y_train)))
print('RBF-kernel SVC (with MinMax scaling) test set accuracy: {:.2f}'
     .format(clf.score(X_test_scaled, y_test)))

Cars dataset (normalized with MinMax scaling)
RBF-kernel SVC (with MinMax scaling) training set accuracy: 0.09
RBF-kernel SVC (with MinMax scaling) test set accuracy: 0.00
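
*The scale-then-fit steps above can also be bundled into a single estimator with `make_pipeline`, which fits the scaler on the training split only and reuses those minimum and maximum values for the test split. The sketch below uses the breast cancer dataset bundled with scikit-learn as a stand-in (an assumption, not the cars data); `model` and the `*_tr`/`*_te` names are introduced here for illustration.*

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Bundled scikit-learn dataset used as a stand-in (not the cars data).
X_demo, y_demo = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=0)

# The pipeline fits the scaler on the training split only and applies
# the learned min/max values when scoring on the test split.
model = make_pipeline(MinMaxScaler(), SVC(C=10))
model.fit(X_tr, y_tr)

train_accuracy = model.score(X_tr, y_tr)
test_accuracy = model.score(X_te, y_te)
print('Training set accuracy: {:.2f}'.format(train_accuracy))
print('Test set accuracy: {:.2f}'.format(test_accuracy))
```

*Because the scaler lives inside the pipeline, the same object can be passed to cross-validation or grid search without leaking test-set statistics into the scaling step.*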
