<a href="https://colab.research.google.com/github/Jaeger47/A.I-Seminar/blob/main/Support_Vector_Machines.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Download the Dataset

This notebook uses the Pima Indians diabetes dataset and the Energy Efficiency dataset to demonstrate the Support Vector Machine algorithm. The Iris example loads its data directly from scikit-learn.

Before proceeding to the next sections, upload `pima-indians-diabetes.data.csv` and `Energy_efficiency_DataSet.csv` using one of the options below:

### Option 1: Upload the data from your Local File System

In [None]:
# Uploading the data from Local File System
from google.colab import files

uploaded = files.upload()

### Option 2: Mount your Google Drive

In [None]:
# Mount your Google Drive and follow the authorization prompt to allow access
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

# Locate the file in your Google Drive directory
%cd drive/My\ Drive/Colab\ Notebooks/ML\ training  

# Uncomment this if you want to list files in the directory to check if the file is there
# %ls

# Support Vector Machine as a Classification Algorithm

## Iris Dataset example

In [None]:
# SVM classification using Iris Dataset
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.svm import SVC

# Load the data
iris = datasets.load_iris()

# Select first 2 features / variables
X = iris.data[:, :2] 
y = iris.target
feature_names = iris.feature_names[:2]
classes = iris.target_names
print(f"Features: {feature_names}")
print(f"Classes: {classes}")

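def make_meshgrid(x, y, h=.02):
    # Build a grid of points covering the data range, padded by 1 on each side;
    # h is the step size between grid points
    x_min, x_max = x.min() - 1, x.max() + 1
    y_min, y_max = y.min() - 1, y.max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
    return xx, yy

def plot_contours(ax, clf, xx, yy, **params):
    # Predict a class for every grid point and draw the decision regions
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    out = ax.contourf(xx, yy, Z, **params)
    return out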

# The classification SVC model
model = SVC(kernel="linear")
clf = model.fit(X, y)
fig, ax = plt.subplots()

print(f"First 4 support vectors: \n{clf.support_vectors_[:4,:]}") # first four support vectors
print(f"Support vector indices: \n{clf.support_}") # indices of the support vectors in X
print(f"No. of support vectors per class: \n{clf.n_support_}") # number of support vectors for each class

# Title for the plots
title = 'Decision surface of linear SVC'

# Set-up grid for plotting.
X0, X1 = X[:, 0], X[:, 1]       #X0 - sepal length, X1 - sepal width
xx, yy = make_meshgrid(X0, X1)
plot_contours(ax, clf, xx, yy, cmap=plt.cm.coolwarm, alpha=0.8)
ax.scatter(X0, X1, c=y, cmap=plt.cm.coolwarm, s=20, edgecolors="k")
ax.set_xlabel("{}".format(feature_names[0]))  # X0 (sepal length) is plotted on the x-axis
ax.set_ylabel("{}".format(feature_names[1]))  # X1 (sepal width) is plotted on the y-axis
ax.set_xticks(())
ax.set_yticks(())
ax.set_title(title)
plt.show()


## Pima Indians example

In [None]:
# SVM Classification
from pandas import read_csv
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Load the dataset
filename = 'pima-indians-diabetes.data.csv'
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
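# header=0 treats the first row of the file as a header and replaces it with `names`;
# use header=None instead if your copy of the file has no header row
dataframe = read_csv(filename, header=0, names=names)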
array = dataframe.values
print(dataframe.head())

# Assign the feature columns to X
# This is the data we fit to our model
X = array[:,0:8]

# Assign the ground truth or 'class' column to Y
# This is our target variable that our model will try to predict
Y = array[:,8]

# Split the dataset into 10 folds 
kfold = KFold(n_splits=10, random_state=None)

# Evaluate an SVC with 10-fold cross-validation and report the mean accuracy
model = SVC()
results = cross_val_score(model, X, Y, cv=kfold)
print(f"Accuracy: {results.mean():f}")

# Support Vector Machine as a Regression Algorithm

Support Vector Machines (SVMs) were originally developed for binary classification. The technique has since been extended to predict real-valued targets, an approach called Support Vector Regression (SVR). Like the `SVC` class used in the classification examples, scikit-learn's `SVR` class is built on the LIBSVM library. You can create an SVM model for regression using the `SVR` class.

For more information, see the [scikit-learn SVR documentation](http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVR.html).
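
To make the idea of predicting real-valued targets concrete, here is a minimal sketch that fits `SVR` to synthetic data. The data (a noisy sine curve), the variable names, and the `C` and `epsilon` values are assumptions chosen for illustration only; they are not taken from the datasets used in this notebook.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic 1-D data (illustrative assumption): y = sin(x) plus Gaussian noise
rng = np.random.default_rng(0)
X_demo = np.sort(rng.uniform(0, 5, size=(80, 1)), axis=0)
y_demo = np.sin(X_demo).ravel() + rng.normal(scale=0.1, size=80)

# epsilon sets the width of the tube inside which errors are ignored;
# C sets the penalty for points that fall outside the tube
svr = SVR(kernel="rbf", C=10.0, epsilon=0.1)
svr.fit(X_demo, y_demo)

y_pred = svr.predict(X_demo)
mse_demo = np.mean((y_demo - y_pred) ** 2)
print(f"Support vectors used: {len(svr.support_)} of {len(X_demo)}")
print(f"Training MSE: {mse_demo:.4f}")
```

Only the training points on or outside the epsilon tube become support vectors, so a wider tube generally yields a sparser model at the cost of a looser fit.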


In [1]:
# Uploading the data from Local File System
from google.colab import files

uploaded = files.upload()

Saving Energy_efficiency_DataSet.csv to Energy_efficiency_DataSet.csv


In [26]:
# SVM Regression
from pandas import read_csv
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR
import matplotlib.pyplot as plt

# Load dataset (lines starting with '#' in the file are skipped as comments)
filename = 'Energy_efficiency_DataSet.csv'
names = ['RC', 'SA', 'WA', 'RA', 'OH', 'O', 'GA', 'GAD', 'HL', 'CL']
dataframe = read_csv(filename, names=names, comment='#')
array = dataframe.values
print(dataframe.head())

# Assign the feature columns to X
# This is the data we fit to our model
X = array[:,0:-2]

# Assign the last two columns to the targets: Y1 is 'HL' and Y2 is 'CL'
# These are the real-valued variables that our models will try to predict
Y1 = array[:,-2]
Y2 = array[:,-1]

# Split the dataset into 10 folds
num_folds = 10
kfold = KFold(n_splits=num_folds, random_state=None)

# Evaluate one SVR per target with cross-validation
# scikit-learn negates the MSE so that higher scores are better
model1 = SVR()
model2 = SVR()

scoring = 'neg_mean_squared_error'
results1 = cross_val_score(model1, X, Y1, cv=kfold, scoring=scoring)
results2 = cross_val_score(model2, X, Y2, cv=kfold, scoring=scoring)
print(results1.mean())
print(results2.mean())



     RC     SA     WA      RA   OH  O   GA  GAD     HL     CL
0  0.98  514.5  294.0  110.25  7.0  2  0.0    0  15.55  21.33
1  0.98  514.5  294.0  110.25  7.0  3  0.0    0  15.55  21.33
2  0.98  514.5  294.0  110.25  7.0  4  0.0    0  15.55  21.33
3  0.98  514.5  294.0  110.25  7.0  5  0.0    0  15.55  21.33
4  0.90  563.5  318.5  122.50  7.0  2  0.0    0  20.84  28.28
-30.426694806412865
-25.915327114477158
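
The two values printed above are negated mean squared errors, because the `neg_mean_squared_error` scorer follows scikit-learn's convention that higher scores are better. As a rough sketch, the snippet below converts them to root-mean-squared errors, which are in the same units as the targets. The variable names are illustrative, and the numbers are copied from the output above.

```python
import math

# Mean negated MSE values copied from the cross-validation output above
neg_mse_hl = -30.426694806412865
neg_mse_cl = -25.915327114477158

# Flip the sign to recover the MSE, then take the square root
rmse_hl = math.sqrt(-neg_mse_hl)
rmse_cl = math.sqrt(-neg_mse_cl)
print(f"HL RMSE: {rmse_hl:.2f}")
print(f"CL RMSE: {rmse_cl:.2f}")
```

Note that the square root of the mean fold MSE is not the same as the mean of the per-fold RMSE values, so treat these figures as an approximate summary.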


In [24]:
print(Y1)

[15.55 15.55 15.55 15.55 20.84 21.46 20.71 19.68 19.5  19.95 19.34 18.31
 17.05 17.41 16.95 15.98 28.52 29.9  29.63 28.75 24.77 23.93 24.77 23.93
  6.07  6.05  6.01  6.04  6.37  6.4   6.37  6.4   6.85  6.79  6.77  6.81
  7.18  7.1   7.1   7.1  10.85 10.54 10.77 10.56  8.6   8.49  8.45  8.5
 24.58 24.63 24.63 24.59 29.03 29.87 29.14 28.09 26.28 26.91 26.37 25.27
 23.53 24.03 23.54 22.58 35.56 37.12 36.9  35.94 32.96 32.12 32.94 32.21
 10.36 10.43 10.36 10.39 10.71 10.8  10.7  10.75 11.11 11.13 11.09 11.16
 11.68 11.69 11.7  11.69 15.41 15.2  15.42 15.21 12.96 12.97 12.93 13.02
 24.29 24.31 24.13 24.25 28.88 29.68 28.83 27.9  26.48 27.02 26.33 25.36
 23.75 24.23 23.67 22.79 35.65 37.26 36.97 36.03 33.16 32.4  33.12 32.41
 10.42 10.46 10.32 10.45 10.64 10.72 10.55 10.68 11.45 11.46 11.32 11.49
 11.45 11.42 11.33 11.43 15.41 15.18 15.34 15.19 12.88 13.   12.97 13.04
 24.28 24.4  24.11 24.35 28.07 29.01 29.62 29.05 25.41 26.47 26.89 26.46
 22.93 23.84 24.17 23.87 35.78 35.48 36.97 36.7  32.

In [25]:
print(Y2)

[21.33 21.33 21.33 21.33 28.28 25.38 25.16 29.6  27.3  21.97 23.49 27.87
 23.77 21.46 21.16 24.93 37.73 31.27 30.93 39.44 29.79 29.68 29.79 29.4
 10.9  11.19 10.94 11.17 11.27 11.72 11.29 11.67 11.74 12.05 11.73 11.93
 12.4  12.23 12.4  12.14 16.78 16.8  16.75 16.67 12.07 12.22 12.08 12.04
 26.47 26.37 26.44 26.29 32.92 29.87 29.58 34.33 30.89 25.6  27.03 31.73
 27.31 24.91 24.61 28.51 41.68 35.28 34.43 43.33 33.87 34.07 34.14 33.67
 13.43 13.71 13.48 13.7  13.8  14.28 13.87 14.27 14.28 14.61 14.3  14.45
 13.9  13.72 13.88 13.65 19.37 19.43 19.34 19.32 14.34 14.5  14.33 14.27
 25.95 25.63 26.13 25.89 32.54 29.44 29.36 34.2  30.91 25.63 27.36 31.9
 27.38 25.02 24.8  28.79 41.07 34.62 33.87 42.86 33.91 34.07 34.17 33.78
 13.39 13.72 13.57 13.79 13.67 14.11 13.8  14.21 13.2  13.54 13.32 13.51
 14.86 14.75 15.   14.74 19.23 19.34 19.32 19.3  14.37 14.57 14.27 14.24
 25.68 26.02 25.84 26.14 34.14 32.85 30.08 29.67 31.73 31.01 25.9  27.4
 28.68 27.54 25.35 24.93 43.12 41.22 35.1  34.29 33.85