![logo](https://user-images.githubusercontent.com/59526258/124226124-27125b80-db3b-11eb-8ba1-488d88018ebb.png)

> **Copyright (c) 2021 CertifAI Sdn. Bhd.**<br>
 <br>
This program is part of OSRFramework. You can redistribute it and/or modify
<br>it under the terms of the GNU Affero General Public License as published by
<br>the Free Software Foundation, either version 3 of the License, or
<br>(at your option) any later version.
<br>
<br>This program is distributed in the hope that it will be useful,
<br>but WITHOUT ANY WARRANTY; without even the implied warranty of
<br>MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
<br>GNU Affero General Public License for more details.
<br>
<br>You should have received a copy of the GNU Affero General Public License
<br>along with this program.  If not, see <http://www.gnu.org/licenses/>.
<br>

# Introduction to Support Vector Machines (SVM)

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn import model_selection
from sklearn import metrics
from sklearn.metrics import confusion_matrix

# Import the Support Vector Classifier
from sklearn.svm import SVC

%matplotlib inline

Create a dummy binary classification dataset with two features.

In [None]:
X, y = datasets.make_classification(n_samples=100, n_features=2,
                                    n_redundant=0, n_classes=2,
                                    random_state=123)

Display the dimensions of the data.

In [None]:
X.shape, y.shape

Visualizing the data

In [None]:
plt.figure(figsize=(10, 6))
plt.scatter(X[:,0], X[:,1], c=y, cmap=plt.cm.coolwarm, s=100)
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')

Split the dataset into a training set and a test set.

In [None]:
# Split the data into a training set (70%) and a test set (30%)
X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=0.3,
                                                                    random_state=123)

Create a Linear SVM classifier.

In [None]:
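# Build a classifier with a linear kernel
# (gamma only affects the 'rbf', 'poly' and 'sigmoid' kernels, so it is ignored here)
params = {'kernel':'linear'}
classifier = SVC(**params, gamma='auto')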

Train the classifier model.

In [None]:
# Fit the model to the training data
classifier.fit(X_train, y_train)

Get the predictions.

In [None]:
# Predict the labels of the test set
predictions = classifier.predict(X_test)

Get the confusion matrix.

In [None]:
# Compute the confusion matrix (rows: true labels, columns: predicted labels)
print(confusion_matrix(y_test, predictions))

Compute the accuracy of the predictions against the ground-truth labels.

In [None]:
print(metrics.accuracy_score(y_test, predictions))
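
Both metrics can be reproduced by hand, which makes their meaning concrete: entry `[i, j]` of scikit-learn's confusion matrix counts samples whose true label is class `i` and whose predicted label is class `j`, so the diagonal holds the correct predictions and accuracy is the diagonal sum divided by the total number of samples. The sketch below uses small hypothetical label arrays rather than the notebook's results, and `manual_confusion_matrix` is an illustrative helper, not a scikit-learn function.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score

# Hypothetical ground-truth labels and predictions, for illustration only
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_pred = np.array([0, 1, 1, 1, 0, 0, 1, 0])

def manual_confusion_matrix(y_true, y_pred, labels):
    """Illustrative helper: entry [i, j] counts samples whose true label
    is labels[i] and whose predicted label is labels[j]."""
    index = {label: k for k, label in enumerate(labels)}
    cm = np.zeros((len(labels), len(labels)), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[index[t], index[p]] += 1
    return cm

cm = manual_confusion_matrix(y_true, y_pred, labels=[0, 1])
print(cm)
print(confusion_matrix(y_true, y_pred))

# Accuracy = correct predictions (the diagonal) / all samples
print(np.trace(cm) / cm.sum(), accuracy_score(y_true, y_pred))
```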

In [None]:
def plot_decision_boundary(classifier, X_test, y_test):
    
    # create a mesh to plot in
    
    h = 0.02  # step size in mesh
    x_min, x_max = X_test[:, 0].min() - 1, X_test[:, 0].max() + 1
    y_min, y_max = X_test[:, 1].min() - 1, X_test[:, 1].max() + 1
    
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    
    X_hypo = np.c_[xx.ravel().astype(np.float32),
                   yy.ravel().astype(np.float32)]
    zz = classifier.predict(X_hypo)
    zz = zz.reshape(xx.shape)
    
    plt.contourf(xx, yy, zz, cmap=plt.cm.coolwarm, alpha=0.9)
    plt.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap=plt.cm.coolwarm, s=200)
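
`plot_decision_boundary` works by classifying every point of a fine grid: `np.meshgrid` builds the grid, `np.c_[xx.ravel(), yy.ravel()]` flattens it into an `(n_points, 2)` array that the classifier can consume, and the predictions are reshaped back to the grid shape for `plt.contourf`. The sketch below traces those steps on a tiny 2 × 3 grid, with a hypothetical rule (`x + y >= 2`) standing in for the trained classifier.

```python
import numpy as np

# A tiny grid: x in {0, 1, 2}, y in {0, 1}
xx, yy = np.meshgrid(np.arange(0, 3), np.arange(0, 2))
print(xx.shape)  # one row per y value, one column per x value

# Flatten the grid into an (n_points, 2) array of (x, y) coordinates
grid_points = np.c_[xx.ravel(), yy.ravel()]
print(grid_points)

# Hypothetical stand-in for classifier.predict: label 1 where x + y >= 2
flat_labels = (grid_points.sum(axis=1) >= 2).astype(int)

# Reshape the flat predictions back to the grid shape, as plt.contourf expects
zz = flat_labels.reshape(xx.shape)
print(zz)
```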

Visualize the decision boundary of the linear classifier.

In [None]:
plt.figure(figsize=(10, 6))
plot_decision_boundary(classifier, X_test, y_test)
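
The straight boundary above is the hyperplane learned by the linear kernel. With `kernel='linear'`, scikit-learn's `SVC` exposes the weight vector as `coef_` and the bias as `intercept_`, and for a binary problem the decision function is w · x + b, with positive scores mapped to `classes_[1]`. The sketch below checks that relationship on the same synthetic data; for brevity it fits on all 100 points rather than on the training split (an assumption made only for this illustration).

```python
import numpy as np
from sklearn import datasets
from sklearn.svm import SVC

# Regenerate the synthetic data from above so this sketch runs on its own
X, y = datasets.make_classification(n_samples=100, n_features=2,
                                    n_redundant=0, n_classes=2,
                                    random_state=123)

clf = SVC(kernel='linear')
clf.fit(X, y)

# For a linear kernel, SVC exposes the hyperplane weights w and bias b
w = clf.coef_[0]
b = clf.intercept_[0]

# Decision function w . x + b; positive scores map to classes_[1]
manual_scores = X @ w + b
manual_pred = np.where(manual_scores > 0, clf.classes_[1], clf.classes_[0])

print(np.allclose(manual_scores, clf.decision_function(X)))
print(np.array_equal(manual_pred, clf.predict(X)))
```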

# Build a non-linear classifier using SVM

Create an SVM classifier that uses the **RBF** (radial basis function) kernel.

In [None]:
# Use the RBF kernel
params = {'kernel':'rbf'}
classifier = SVC(**params, gamma='auto')
classifier.fit(X_train, y_train)
predictions = classifier.predict(X_test)

print(confusion_matrix(y_test, predictions))
print(metrics.accuracy_score(y_test, predictions))

The accuracy increases to 95% when we use a non-linear classifier!

Since the data has only 2 features, it is easy to plot, so we can visualize the effect of the non-linear SVM classifier on our test dataset.

In [None]:
plt.figure(figsize=(10, 6))
plot_decision_boundary(classifier, X_test, y_test)
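
The curved boundary comes from the RBF kernel, which scores the similarity of two points as K(x, x') = exp(-gamma · ||x - x'||²), so similarity decays with squared distance. In `SVC`, `gamma='auto'` sets gamma to 1 / n_features, i.e. 0.5 for this two-feature data. A minimal sketch that evaluates the formula by hand for two hypothetical points and compares it with `sklearn.metrics.pairwise.rbf_kernel`:

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

# Two hypothetical points with 2 features each
x1 = np.array([[1.0, 2.0]])
x2 = np.array([[2.0, 1.0]])

# gamma='auto' in SVC means 1 / n_features
gamma = 1.0 / x1.shape[1]            # 0.5

# K(x, x') = exp(-gamma * ||x - x'||^2)
sq_dist = np.sum((x1 - x2) ** 2)     # (1 - 2)^2 + (2 - 1)^2 = 2
manual = np.exp(-gamma * sq_dist)    # exp(-1)

print(manual, rbf_kernel(x1, x2, gamma=gamma)[0, 0])
```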

We can try the other non-linear SVM kernels by changing the `kernel` parameter, and visualize the effect of each on our test dataset.

Create an SVM classifier that uses the **polynomial** kernel.

In [None]:
# Use a polynomial kernel of degree 3
params = {'kernel':'poly','degree':3}
classifier = SVC(**params, gamma='auto')
classifier.fit(X_train,y_train)
predictions = classifier.predict(X_test)

print(confusion_matrix(y_test,predictions))
print(metrics.accuracy_score(y_test, predictions))

Visualize the decision boundary of the **polynomial** kernel

In [None]:
plt.figure(figsize=(10, 6))
plot_decision_boundary(classifier, X_test, y_test)
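
The polynomial kernel is K(x, x') = (gamma · <x, x'> + coef0)^degree. `SVC` uses `coef0=0.0` by default, whereas `sklearn.metrics.pairwise.polynomial_kernel` defaults to `coef0=1`, so the sketch below passes it explicitly when comparing a hand computation for two hypothetical points against the library function.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

# Two hypothetical points with 2 features each
x1 = np.array([[1.0, 2.0]])
x2 = np.array([[2.0, 1.0]])

gamma = 1.0 / x1.shape[1]   # gamma='auto' -> 1 / n_features = 0.5
coef0 = 0.0                 # SVC's default coef0
degree = 3                  # matches the degree used above

# K(x, x') = (gamma * <x, x'> + coef0) ** degree
dot = x1[0] @ x2[0]                        # 1*2 + 2*1 = 4
manual = (gamma * dot + coef0) ** degree   # (0.5 * 4) ** 3 = 8

library = polynomial_kernel(x1, x2, degree=degree, gamma=gamma, coef0=coef0)[0, 0]
print(manual, library)
```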

Create an SVM classifier that uses the **sigmoid** kernel.

In [None]:
params = {'kernel':'sigmoid'}
classifier = SVC(**params, gamma='auto')
classifier.fit(X_train,y_train)
predictions = classifier.predict(X_test)

print(confusion_matrix(y_test,predictions))
print(metrics.accuracy_score(y_test, predictions))

Visualize the decision boundary of the **sigmoid** kernel

In [None]:
plt.figure(figsize=(10, 6))
plot_decision_boundary(classifier, X_test, y_test)
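
The sigmoid kernel is K(x, x') = tanh(gamma · <x, x'> + coef0). As with the polynomial kernel, `SVC` defaults to `coef0=0.0` while `sklearn.metrics.pairwise.sigmoid_kernel` defaults to `coef0=1`, so this sketch sets it explicitly for the same two hypothetical points.

```python
import numpy as np
from sklearn.metrics.pairwise import sigmoid_kernel

# Two hypothetical points with 2 features each
x1 = np.array([[1.0, 2.0]])
x2 = np.array([[2.0, 1.0]])

gamma = 1.0 / x1.shape[1]   # gamma='auto' -> 1 / n_features = 0.5
coef0 = 0.0                 # SVC's default coef0

# K(x, x') = tanh(gamma * <x, x'> + coef0)
manual = np.tanh(gamma * (x1[0] @ x2[0]) + coef0)   # tanh(0.5 * 4) = tanh(2)

library = sigmoid_kernel(x1, x2, gamma=gamma, coef0=coef0)[0, 0]
print(manual, library)
```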

# Classifying the Iris dataset using a Support Vector Machine

### Load Data
Here we load the Iris dataset from *scikit-learn*. As usual, we use `iris.data` as the features and `iris.target` as the target labels.

In [None]:
iris = datasets.load_iris()

As usual, `dir(iris)` lists the attributes of the Iris dataset object. Among them:
- `iris.data.shape` shows the shape of the data.
- `iris.target_names` shows the classes that we want to classify.
- `iris.feature_names` shows the names of the features that we train on.

In [None]:
dir(iris)

In [None]:
iris.data.shape

In [None]:
iris.target_names

In [None]:
iris.feature_names
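
`dir(iris)` works because `load_iris()` returns a scikit-learn `Bunch`, a dictionary subclass whose keys can also be read as attributes. A short sketch of that dual access:

```python
from sklearn import datasets

iris = datasets.load_iris()

# A Bunch is a dict, so its keys can be listed like any dictionary
print(sorted(iris.keys()))

# ...and each key is also readable as an attribute
print(iris['data'] is iris.data)   # both return the same array

print(iris.data.shape)
print(list(iris.target_names))
print(iris.feature_names)
```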

The integer labels assigned to the three classes in the target (y):

In [None]:
np.unique(iris.target)
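
`np.unique` can also count how many samples carry each label via `return_counts=True`, a quick check of class balance before splitting. For Iris, each of the three classes has 50 samples.

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()

# return_counts=True also returns the number of occurrences of each label
labels, counts = np.unique(iris.target, return_counts=True)
for label, count in zip(labels, counts):
    print(label, iris.target_names[label], count)
```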

In [None]:
print(iris.target)

In [None]:
data = iris.data.astype(np.float32)
target = iris.target.astype(np.float32)

In [None]:
print(len(data))
print(len(target))

Split the data into a training set and a test set.

In [None]:
X_train, X_test, y_train, y_test = model_selection.train_test_split(data, 
                                                                    target,
                                                                    test_size=0.3, 
                                                                    random_state=123)

In [None]:
X_train.shape, y_train.shape

In [None]:
X_test.shape, y_test.shape
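
The shapes follow from simple arithmetic: with 150 samples and `test_size=0.3`, `train_test_split` rounds the test size up, giving ceil(0.3 × 150) = 45 test samples, and the remaining 105 samples form the training set. A sketch that computes the sizes by hand and checks them against the split:

```python
import math
from sklearn import datasets, model_selection

iris = datasets.load_iris()
n_samples = iris.data.shape[0]      # 150

# The test size is rounded up; the training set gets the remainder
n_test = math.ceil(0.3 * n_samples)
n_train = n_samples - n_test
print(n_train, n_test)

X_train, X_test, y_train, y_test = model_selection.train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=123)
print(X_train.shape, X_test.shape)
```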

Declare the parameters for a linear SVM.

In [None]:
params = {'kernel':'linear'}
classifier = SVC(**params, gamma='auto')
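
Iris has three classes, while an SVM separates only two at a time. scikit-learn's `SVC` handles multi-class problems internally with a one-vs-one scheme, training one binary classifier per pair of classes: 3 × 2 / 2 = 3 classifiers here. By default `decision_function` is reshaped into one-vs-rest form; the sketch below sets `decision_function_shape='ovo'` to expose one score per pairwise classifier, and fits on all 150 samples for brevity.

```python
from sklearn import datasets
from sklearn.svm import SVC

iris = datasets.load_iris()
n_classes = len(iris.target_names)       # 3

# One-vs-one trains a binary SVM for every pair of classes
n_pairs = n_classes * (n_classes - 1) // 2
print(n_pairs)

# decision_function_shape='ovo' returns one score per pairwise classifier
clf = SVC(kernel='linear', decision_function_shape='ovo')
clf.fit(iris.data, iris.target)
scores = clf.decision_function(iris.data[:2])
print(scores.shape)                      # 2 samples x n_pairs scores
```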

Train the classifier.

In [None]:
# Train the classifier on the training data
classifier.fit(X_train, y_train)

Get the predictions.

In [None]:
# Predict the labels of the test set
predictions = classifier.predict(X_test)

Display the confusion matrix and the accuracy.

In [None]:
print(confusion_matrix(y_test, predictions))

In [None]:
print(metrics.accuracy_score(y_test, predictions))

Use SVMs with **non-linear** kernels to perform the classification.

Try the polynomial, RBF, and sigmoid kernels in turn.

In [None]:
# Parameters for each kernel

poly_params = {'kernel':'poly', 'degree':3}
rbf_params = {'kernel':'rbf'}
sigmoid_params = {'kernel':'sigmoid'}

params_list = [poly_params,rbf_params,sigmoid_params]

Use a `for` loop to train one model per kernel.

In [None]:
for params in params_list:
    
    # Build and train a classifier with the current kernel
    classifier = SVC(**params, gamma='auto')
    classifier.fit(X_train, y_train)
    predictions = classifier.predict(X_test)

    print("Kernel: " + params['kernel'])
    print("Confusion Matrix:")
    print(confusion_matrix(y_test, predictions))
    print("Accuracy:")
    print(metrics.accuracy_score(y_test, predictions))
    print("")

As you can see, the RBF and polynomial kernels improve the accuracy score only slightly, because the Iris dataset is almost linearly separable and the linear kernel already performs well. The sigmoid kernel does not perform well on this dataset.