### SVM (Support Vector Machines)
1) A support vector machine (SVM) is a supervised machine learning algorithm used for classification and regression.<br>
2) An SVM takes the data points (inputs) and outputs the hyperplane (which in two dimensions is simply a line) that best separates them.<br>
3) The objective of the support vector machine algorithm is to find a hyperplane in an N-dimensional space (where N is the number of features) that distinctly classifies the data points.<br>
4) It uses a technique called the <b>kernel trick</b> to implicitly transform the data into a higher-dimensional space and, based on this transformation, finds an optimal boundary between the possible outputs.<br>
5) <b>SVM Intuition - The main idea is to identify the optimal separating hyperplane which maximizes the margin of the training data.</b><br>
6) There can be multiple hyperplanes, but which one of them is the best separating hyperplane? In the figure below, line B is the one that best separates the two classes.

<img src="svm1.png" width="450" height="400">

7) There can be multiple separating hyperplanes as well. How do we find the optimal one? Intuitively, if we select a hyperplane that is close to the data points of one class, it might not generalize well (it may not produce good results on the test data). <b>So the aim is to choose the hyperplane which is as far as possible from the data points of each category.</b>
Therefore, maximizing the distance between the hyperplane and the nearest points of each class yields the optimal separating hyperplane. This distance is called the margin.
The goal of an SVM is to find this optimal hyperplane, the one with the largest margin, because it not only classifies the existing dataset but also helps predict the class of unseen data.<br>

8)	To separate the two classes of data points, there are many possible hyperplanes that could be chosen. The SVM's objective is to find the one with the maximum margin, i.e. the maximum distance between the hyperplane and the nearest data points of both classes.

<img src="svm2.png">
<img src="svm3.png">
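To make the idea of a maximum margin concrete, here is a minimal sketch that fits a linear SVM and reads the margin width off the learned weights. The six 2-D points are made up for illustration, and using a very large `C` to approximate a hard-margin SVM is an assumption of this sketch, not something the notebook itself does.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable 2-D data (made up for illustration)
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
              [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# A very large C approximates a hard-margin SVM (no misclassification allowed)
clf = SVC(kernel='linear', C=1e6)
clf.fit(X, y)

w = clf.coef_[0]        # normal vector of the hyperplane w.x + b = 0
b = clf.intercept_[0]   # offset of the hyperplane

# The margin boundaries are w.x + b = -1 and w.x + b = +1,
# so the distance between them is 2 / ||w||
margin_width = 2.0 / np.linalg.norm(w)

print('w =', w, 'b =', b)
print('margin width =', margin_width)
```

For this toy set the nearest points of the two classes are 2.5·√2 ≈ 3.54 apart along the normal direction, so the printed width should come out close to that value.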

### Terminologies

1)<b> Hyperplane </b>- Hyperplanes are decision boundaries that help classify the data points. Data points falling on either side of the hyperplane can be attributed to different classes. Also, the dimension of the hyperplane depends upon the number of features. If the number of input features is 2, then the hyperplane is just a line. If the number of input features is 3, then the hyperplane becomes a two-dimensional plane. It becomes difficult to imagine when the number of features exceeds 3.<br>

2)<b> Support Vectors </b>- Support vectors are data points that are closest to the hyperplane and influence the position and orientation of the hyperplane. Using these support vectors, the model maximizes the margin of the classifier. These are the points that help us build our SVM.

3)<b> Margin </b>– It is the distance between support vectors and the hyperplane.

<img src="svm4.png">
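The role of the support vectors can be checked directly in scikit-learn, which exposes them through the `support_` and `support_vectors_` attributes of a fitted `SVC`. The sketch below uses made-up toy points and a very large `C` (both assumptions for illustration) and shows that refitting on the support vectors alone gives roughly the same hyperplane.

```python
import numpy as np
from sklearn.svm import SVC

# Made-up toy points, two linearly separable classes
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
              [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel='linear', C=1e6).fit(X, y)

print('support vector indices:', clf.support_)
print('support vectors:\n', clf.support_vectors_)

# Support vectors sit on the margin, where the decision value is about -1 or +1
sv_values = clf.decision_function(clf.support_vectors_)
print('decision values at support vectors:', sv_values)

# Keep only the support vectors and refit: the hyperplane should not move
keep = clf.support_
clf_sv_only = SVC(kernel='linear', C=1e6).fit(X[keep], y[keep])
same_hyperplane = np.allclose(clf.coef_, clf_sv_only.coef_, atol=1e-2)
print('same hyperplane after dropping the other points:', same_hyperplane)
```

Points that are not support vectors can be removed without changing the boundary, which is why only the closest points "build" the SVM.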

### Kernel Trick

<img src="svm_kernel.png" height="350" width="500">
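Non-linearly separable data can be classified using SVMs: the kernel implicitly maps the data into a higher-dimensional space where a linear separator may exist.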
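One way to see what the kernel trick buys is to compare a kernel value with the dot product of explicitly transformed points. The sketch below uses the homogeneous degree-2 polynomial kernel K(x, z) = (x.z)^2 on 2-D inputs. The feature map `phi` and the two sample vectors are illustrative choices, not taken from the notebook.

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2-D vector (3-D output)."""
    x1, x2 = v
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def poly2_kernel(a, b):
    """Same quantity computed in the original 2-D space."""
    return np.dot(a, b) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

explicit = np.dot(phi(x), phi(z))   # transform first, then take the dot product
implicit = poly2_kernel(x, z)       # kernel trick: never build phi(x) at all

print(explicit, implicit)
```

Both numbers agree, so the SVM can work with the cheap kernel evaluation and still behave as if it were operating in the higher-dimensional space.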

### SVM Parameters

<b>1) Gamma</b> - The gamma parameter defines how far the influence of a single training example reaches, with low values meaning ‘far’ and high values meaning ‘close’. <br>
Low gamma values give weight to data points far away from the hyperplane.<br>
High gamma values give weight mainly to data points close to the hyperplane.<br>
Used with the rbf kernel (scikit-learn also applies it to the poly and sigmoid kernels). We take values like 0.001, 0.01, 0.1, 1, 10, 100, 1000 and so on.


<b>2) C</b> - The C parameter tells the SVM optimization how much you want to avoid misclassifying each training example. Small values of C tolerate more misclassification (a wider margin), and larger values of C tolerate less misclassification (a narrower margin). <br>
We take values like 0.001, 0.01, 0.1, 1, 10, 100, 1000 and so on.

<b>3) kernel</b> - SVM follows a technique called the kernel trick to transform the data and, based on these transformations, finds an optimal boundary between the possible outputs. The kernel options include linear, rbf and poly.

<b>4) degree</b> - Defines the degree of the polynomial for the 'poly' kernel.
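The candidate values listed above for C and gamma are usually tried systematically rather than by hand. A minimal sketch with scikit-learn's `GridSearchCV` follows. It loads the copy of iris bundled with scikit-learn (`load_iris`) instead of a local CSV, and the grid and `cv=5` are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Bundled iris data (assumption: used here in place of a local CSV file)
X, y = load_iris(return_X_y=True)

# Powers of ten, in the spirit of the values suggested above
param_grid = {
    'C': [0.01, 0.1, 1, 10, 100],
    'gamma': [0.001, 0.01, 0.1, 1, 10],
}

# 5-fold cross-validation over every (C, gamma) combination
search = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5)
search.fit(X, y)

print('best parameters:', search.best_params_)
print('best mean CV accuracy:', search.best_score_)
```

Searching on a log scale first and then refining around the best cell is a common way to keep the tuning cost manageable.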



### Kernel Types

<b>1) Linear</b>
<b>K(xi, xj) = xi.xj</b>
Here, xi and xj are the feature vectors of two data points.

<b>2) Poly</b>
<b>K(xi, xj) = (xi.xj+1)^d</b>
Here ‘.’ denotes the dot product of the two vectors, and d denotes the degree of the polynomial.
K(xi, xj) measures the similarity between the two points; the decision boundary that separates the classes is built from these kernel values.

<b>3) RBF (Radial Basis Function) </b>
<b>K(xi, xj) = exp(-γ * ||xi - xj||^2)</b>
Here ||xi - xj|| is the Euclidean distance between the two points, and γ > 0.
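The three formulas above can be written out in NumPy and compared against scikit-learn's pairwise kernel helpers. This is a sketch: the function names `k_linear`, `k_poly` and `k_rbf` and the sample vectors are made up for illustration. Note that scikit-learn's polynomial kernel is (gamma * xi.xj + coef0)^d, so the formula above corresponds to `gamma=1, coef0=1`.

```python
import numpy as np
from sklearn.metrics.pairwise import linear_kernel, polynomial_kernel, rbf_kernel

def k_linear(xi, xj):
    return np.dot(xi, xj)

def k_poly(xi, xj, d=3):
    return (np.dot(xi, xj) + 1) ** d

def k_rbf(xi, xj, gamma=0.1):
    return np.exp(-gamma * np.sum((xi - xj) ** 2))

xi = np.array([1.0, 2.0])
xj = np.array([2.0, 0.5])

# scikit-learn's helpers expect 2-D arrays of shape (n_samples, n_features)
A, B = xi.reshape(1, -1), xj.reshape(1, -1)

print(k_linear(xi, xj), linear_kernel(A, B)[0, 0])
print(k_poly(xi, xj), polynomial_kernel(A, B, degree=3, gamma=1, coef0=1)[0, 0])
print(k_rbf(xi, xj), rbf_kernel(A, B, gamma=0.1)[0, 0])
```

Each line prints the hand-written value next to the library value. `SVC(kernel='poly')` uses `coef0=0` and `gamma='scale'` by default, so its polynomial kernel differs from the (xi.xj+1)^d form unless those parameters are set explicitly.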


### SVM Pros and Cons
#### Pros

1) SVM is suitable for both linearly and non-linearly separable data (using the kernel trick).<br>
2) SVM often works well even when little is known in advance about the structure of the data.<br>
3) The kernel trick lets an SVM learn non-linear boundaries without computing the high-dimensional transformation explicitly.<br>

#### Cons

1) SVMs are not well suited to very large datasets because training can be slow and computationally intensive.<br>
2) Tuning the parameters (kernel, C, gamma) can be time consuming.<br>


In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [3]:
df = pd.read_csv('iris.csv')
df.head()

   sepal_length  sepal_width  petal_length  petal_width        label
0           4.9          3.0           1.4          0.2  Iris-setosa
1           4.7          3.2           1.3          0.2  Iris-setosa
2           4.6          3.1           1.5          0.2  Iris-setosa
3           5.0          3.6           1.4          0.2  Iris-setosa
4           5.4          3.9           1.7          0.4  Iris-setosa


In [5]:
df.isnull().sum()

sepal_length    0
sepal_width     0
petal_length    0
petal_width     0
label           0
dtype: int64

In [7]:
df.dtypes

sepal_length    float64
sepal_width     float64
petal_length    float64
petal_width     float64
label            object
dtype: object

In [9]:
df['label'].value_counts()

Iris-versicolor    50
Iris-virginica     50
Iris-setosa        49
Name: label, dtype: int64

In [11]:
x = df.iloc[:,:-1]  # x = df.drop('label',axis=1)
y = df.iloc[:,-1]
print(x.shape)
print(y.shape)

(149, 4)
(149,)


In [13]:
from sklearn.model_selection import train_test_split

In [28]:
x_train,x_test,y_train,y_test = train_test_split(x,y,test_size=0.25)
print(x_train.shape)
print(x_test.shape)
print(y_train.shape)
print(y_test.shape)

(111, 4)
(38, 4)
(111,)
(38,)
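The split above is random, so the scores that follow change from run to run. A hedged variant that fixes the seed and keeps the class proportions equal in both parts is sketched below. `random_state=42` and `stratify=y` are illustrative choices, and the bundled `load_iris` data (150 rows) stands in for the 149-row CSV.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Bundled iris data (assumption: 150 rows, integer labels 0/1/2)
X, y = load_iris(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.25,
    random_state=42,  # fixed seed so the split is reproducible
    stratify=y,       # keep the class ratio the same in train and test
)

print(X_train.shape, X_test.shape)
print('test samples per class:', np.bincount(y_test))
```

With stratification the three classes are represented almost equally in the test set, which makes per-class metrics easier to compare.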


#### Applying SVM with linear kernel

In [29]:
from sklearn.svm import SVC # SupportVectorClassifier

In [30]:
m1 = SVC(kernel='linear',C=1)
m1.fit(x_train,y_train)

SVC(C=1, kernel='linear')

In [31]:
# Accuracy
print('Training Score',m1.score(x_train,y_train))
print('Testing Score',m1.score(x_test,y_test))

Training Score 0.990990990990991
Testing Score 0.9736842105263158


In [32]:
ypred_m1 = m1.predict(x_test)
print('ypred\n',ypred_m1)

ypred
 ['Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-virginica'
 'Iris-virginica' 'Iris-virginica' 'Iris-versicolor' 'Iris-versicolor'
 'Iris-versicolor' 'Iris-virginica' 'Iris-virginica' 'Iris-versicolor'
 'Iris-setosa' 'Iris-virginica' 'Iris-versicolor' 'Iris-versicolor'
 'Iris-versicolor' 'Iris-setosa' 'Iris-virginica' 'Iris-virginica'
 'Iris-versicolor' 'Iris-virginica' 'Iris-setosa' 'Iris-virginica'
 'Iris-versicolor' 'Iris-versicolor' 'Iris-versicolor' 'Iris-versicolor'
 'Iris-setosa' 'Iris-versicolor' 'Iris-virginica' 'Iris-versicolor'
 'Iris-virginica' 'Iris-versicolor' 'Iris-virginica' 'Iris-versicolor'
 'Iris-versicolor' 'Iris-setosa']


In [33]:
from sklearn.metrics import confusion_matrix,classification_report

In [34]:
cm_m1 = confusion_matrix(y_test,ypred_m1)
print(cm_m1)
print(classification_report(y_test,ypred_m1))

[[ 8  0  0]
 [ 0 16  0]
 [ 0  1 13]]
                 precision    recall  f1-score   support

    Iris-setosa       1.00      1.00      1.00         8
Iris-versicolor       0.94      1.00      0.97        16
 Iris-virginica       1.00      0.93      0.96        14

       accuracy                           0.97        38
      macro avg       0.98      0.98      0.98        38
   weighted avg       0.98      0.97      0.97        38
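The figures in the classification report can be recomputed from the confusion matrix printed above, which helps when reading such reports. In scikit-learn's confusion matrix, rows are the true classes and columns are the predicted classes. The sketch below hard-codes the matrix from the output above.

```python
import numpy as np

# Confusion matrix copied from the output above (rows = true, columns = predicted)
cm = np.array([[8, 0, 0],
               [0, 16, 0],
               [0, 1, 13]])
labels = ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']

true_pos = np.diag(cm)
precision = true_pos / cm.sum(axis=0)  # column sums = how often each class was predicted
recall = true_pos / cm.sum(axis=1)     # row sums = how often each class actually occurs
f1 = 2 * precision * recall / (precision + recall)
accuracy = true_pos.sum() / cm.sum()

for name, p, r, f in zip(labels, precision, recall, f1):
    print(f'{name:16s} precision={p:.2f} recall={r:.2f} f1={f:.2f}')
print(f'accuracy={accuracy:.2f}')
```

The single off-diagonal entry (one Iris-virginica sample predicted as Iris-versicolor) lowers the precision of Iris-versicolor to 16/17 and the recall of Iris-virginica to 13/14.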



#### Applying SVM with rbf kernel

In [37]:
m2 = SVC(kernel='rbf',gamma=0.1,C=0.1)
m2.fit(x_train,y_train)

SVC(C=0.1, gamma=0.1)

In [36]:
print('Training Score',m2.score(x_train,y_train))
print('Testing Score',m2.score(x_test,y_test))

Training Score 0.963963963963964
Testing Score 0.8947368421052632


#### Applying SVM with poly kernel

In [39]:
m3 = SVC(kernel='poly',degree=3,C=10)
m3.fit(x_train,y_train)

SVC(C=10, kernel='poly')

In [40]:
print('Training Score',m3.score(x_train,y_train))
print('Testing Score',m3.score(x_test,y_test))

Training Score 0.990990990990991
Testing Score 0.868421052631579
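Because the three models above were scored on a single random split, the differences between their test scores may partly reflect that particular split. A minimal sketch of a steadier comparison uses 5-fold cross-validation with the same kernel settings. The bundled `load_iris` data and `cv=5` are assumptions of this sketch.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Bundled iris data (assumption: used here in place of the local CSV)
X, y = load_iris(return_X_y=True)

# Same kernels and parameters as the three models fitted above
models = {
    'linear': SVC(kernel='linear', C=1),
    'rbf': SVC(kernel='rbf', gamma=0.1, C=0.1),
    'poly': SVC(kernel='poly', degree=3, C=10),
}

scores = {}
for name, model in models.items():
    cv_scores = cross_val_score(model, X, y, cv=5)  # accuracy on each of 5 folds
    scores[name] = cv_scores.mean()
    print(f'{name:6s} mean accuracy = {scores[name]:.3f}')
```

Averaging over folds gives each model several train/test splits, so a ranking based on these means is less sensitive to one lucky or unlucky split.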
