# Support Vector Machine (SVM)
***
In machine learning, __support-vector machines__ are <font color=yellow>supervised learning models</font> with associated learning algorithms that analyze data used for <font color=yellow>classification and regression analysis</font>.
***
### History
- The original maximum-margin algorithm was developed by Vapnik and Chervonenkis in the 1960s. The kernelized version was introduced at COLT-92 by Boser, Guyon & Vapnik, and SVMs became popular after these refinements in the 1990s.
- Theoretically well-motivated algorithm: developed from Statistical Learning Theory (Vapnik & Chervonenkis).
- Centralized website: __(www.kernel-machines.org)__.
- Several textbooks, e.g. <font color=yellow>“An Introduction to Support Vector Machines”</font> by Cristianini and Shawe-Taylor.

***

### How it works
<table style="width:80%">
    <tr>
        <td><img src="./images/Screenshot (101).png" height = 300 width = 300/></td>
        <td><img src="./images/Screenshot (102).png" height = 300 width = 300/></td>
    </tr>
    <tr>
        <td> <br> </td> <!--The br tag did what i was looking for -->
    </tr>
    <tr>
        <td><img src="./images/Screenshot (103).png" height = 300 width = 300/></td>
        <td><img src="./images/Screenshot (106).png" height = 300 width = 300/></td>
    </tr>
</table>

We have to find the <font color=yellow>best line or the optimal line</font> that separates the 2 classes of data points. So how does SVM help in finding this line?

***

#### Maximum Margin
<table style="width:80%">
    <tr>
        <td align="center" valign="center"><img src="./images/Screenshot (107).png" height = 500 width = 500/></td>
    </tr>
</table>

The <font color=yellow>Maximal-Margin Classifier</font> is a hypothetical classifier that best explains how SVM works in practice.

The numeric input variables (x) in your data (the columns) form an n-dimensional space. For example, if you had 2 input variables, this would form a 2-dimensional space.

In a 2-dimensional space, a <font color=yellow>hyperplane is a line that splits the input variable space</font>. The distance between the line and the closest data points is referred to as the <font color=yellow>margin</font>. The best or optimal line that can separate the 2 classes is the line that has the largest margin. This is called the <font color=yellow>Maximal-Margin hyperplane</font>.
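
The margin comparison described above can be sketched in a few lines of NumPy. The four data points and the two candidate lines are invented for illustration, and `margin` is a hypothetical helper, not part of any library.

```python
import numpy as np

# Toy 2-D data, invented for illustration: two points per class.
X = np.array([[1.0, 1.0], [2.0, 0.0], [4.0, 4.0], [5.0, 3.0]])
y = np.array([-1, -1, 1, 1])


def margin(w, b, X, y):
    """Smallest signed distance from any point to the line w . x + b = 0.

    A negative result means at least one point lies on the wrong side.
    """
    w = np.asarray(w, dtype=float)
    distances = y * (X @ w + b) / np.linalg.norm(w)
    return distances.min()


# Two hypothetical candidate lines that both separate the classes.
margin_a = margin([1.0, 0.0], -3.0, X, y)  # vertical line x = 3
margin_b = margin([1.0, 1.0], -5.0, X, y)  # diagonal line x + y = 5

print(f"margin of x = 3:     {margin_a:.2f}")
print(f"margin of x + y = 5: {margin_b:.2f}")
```

Both lines separate the classes, but the diagonal line leaves the larger margin (about 2.12 against 1.00), so a maximal-margin classifier would prefer it.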
***
#### Support Vectors
<table style="width:80%">
    <tr>
        <td align="center" valign="center"><img src="./images/Screenshot (109).png" height = 500 width = 500/></td>
    </tr>
</table>

The margin is calculated as the perpendicular distance from the line to only the closest points. Only these points are relevant in defining the line and in the construction of the classifier. These points are called the <font color=yellow>support vectors</font>. They support or define the hyperplane.
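
As a minimal sketch, assuming scikit-learn is installed, a fitted `SVC` exposes its support vectors through the `support_vectors_` and `support_` attributes. The six data points below are invented for illustration.

```python
import numpy as np
from sklearn.svm import SVC

# Toy data invented for illustration: three points per class.
X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0],
              [4.0, 4.0], [5.0, 5.0], [6.0, 4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# A linear kernel with a large C approximates the maximal-margin classifier.
clf = SVC(kernel="linear", C=1000.0).fit(X, y)

support_vectors = clf.support_vectors_  # coordinates of the support vectors
support_indices = clf.support_          # their row indices in X

print(support_vectors)
print(support_indices)
```

On this data the two closest points of opposite classes, (2, 2) and (4, 4), are expected to come back as the only support vectors. Moving any of the other points without crossing the margin should leave the fitted line unchanged.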

***

#### Hyperplanes
<table style="width:80%">
    <tr>
        <td align="center" valign="center"><img src="./images/Screenshot (112).png" height = 500 width = 500/></td>
    </tr>
</table>

The hyperplane is learned from training data using an optimization procedure that maximizes the margin. In an n-dimensional input space, the hyperplane has n-1 dimensions.
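
A rough sketch of what the learned hyperplane looks like in code, assuming scikit-learn's `SVC` with a linear kernel: the hyperplane is the set of points where `w . x + b = 0`, with `w` read from `coef_` and `b` from `intercept_`. The 3-feature data is invented for illustration, so the hyperplane here is a 2-dimensional plane.

```python
import numpy as np
from sklearn.svm import SVC

# Toy 3-feature data invented for illustration (n = 3).
X = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0],
              [3.0, 3.0, 3.0], [4.0, 3.0, 2.0], [3.0, 4.0, 4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1000.0).fit(X, y)

w = clf.coef_[0]       # normal vector of the hyperplane, one weight per feature
b = clf.intercept_[0]  # offset of the hyperplane

# The sign of w . x + b tells on which side of the hyperplane a point lies.
scores = X @ w + b
predicted = (scores > 0).astype(int)

# Distance between the two margin boundaries: 2 / ||w||.
margin_width = 2.0 / np.linalg.norm(w)

print("w =", w, "b =", b)
print("predicted:", predicted, "margin width:", round(margin_width, 3))
```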
***
<table style="width:80%">
    <tr>
        <td align="center" valign="center"><img src="./images/Svm_separating_hyperplanes_(SVG).svg.png" height = 300 width = 300 style = "background-color:white;"/></td>
    </tr>
</table>
H1 does not separate the classes. H2 does, but only with a small margin. H3 separates them with the maximal margin.<br>

***

#### Non-linear Hyperplanes
<table style="width:80%">
    <tr>
        <td align="center" valign="center"><img src="./images/non_linear_hyperplane.png" height = 300 width = 300/></td>
        <td align="center" valign="center"><img src="./images/non_linear_hyperplane2.png" height = 300 width = 300/></td>
    </tr>
</table>
No linear hyperplane can separate the 2 classes depicted above, so how does SVM classify them? SVMs can efficiently perform non-linear classification using what is called the <font color=yellow>kernel trick</font>, which implicitly maps the inputs into a high-dimensional feature space where a linear separator exists. The effect is similar to introducing additional features by hand. Here, the additional feature is <font color=yellow>z=x^2+y^2</font>, which is depicted in the image on the right.
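
The idea can be tried out on synthetic data. The sketch below assumes scikit-learn and uses `make_circles` as a stand-in for the picture above: it compares a linear SVM on the raw coordinates, a linear SVM with the extra feature z = x^2 + y^2 added by hand, and an RBF-kernel SVM that needs no explicit feature.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Synthetic concentric rings, standing in for the picture above.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# 1) A linear SVM on the raw (x, y) coordinates cannot separate the rings.
acc_raw = SVC(kernel="linear").fit(X, y).score(X, y)

# 2) Adding z = x^2 + y^2 by hand makes the classes linearly separable.
z = (X ** 2).sum(axis=1, keepdims=True)
X_lifted = np.hstack([X, z])
acc_lifted = SVC(kernel="linear").fit(X_lifted, y).score(X_lifted, y)

# 3) An RBF kernel yields a non-linear boundary without explicit new features.
acc_rbf = SVC(kernel="rbf").fit(X, y).score(X, y)

print(f"linear, raw: {acc_raw:.2f}")
print(f"linear, with z: {acc_lifted:.2f}")
print(f"rbf, raw: {acc_rbf:.2f}")
```

The scores are training accuracies, which is enough to show separability but says nothing about generalization.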

***

### How to implement SVM in Python

Run svm.ipynb
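
For readers without the notebook at hand, a minimal stand-alone sketch with scikit-learn might look like the following. It is not taken from `svm.ipynb`; the iris data set, the 70/30 split and the parameter values are assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Load a small built-in data set and hold out 30% of it for testing.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Fit an SVM classifier and evaluate it on the held-out samples.
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X_train, y_train)
test_accuracy = clf.score(X_test, y_test)

print(f"test accuracy: {test_accuracy:.2f}")
```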

***

### How to tune Parameters of SVM?
Tuning the parameter values of a machine learning algorithm can effectively improve model performance. Let’s look at the list of parameters available with SVM.

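`sklearn.svm.SVC(C=1.0, kernel='rbf', degree=3, gamma=0.0, coef0=0.0, shrinking=True, probability=False, tol=0.001, cache_size=200, class_weight=None, verbose=False, max_iter=-1, random_state=None)`

This signature is from an older scikit-learn release; recent releases default to `gamma='scale'` and add further parameters.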

##### <font color=green>kernel : </font> 
The kernel parameter offers various options, such as <font color=yellow>“linear”, “rbf”, “poly” and others (default value is “rbf”)</font>. Here “rbf” and “poly” are useful for non-linear hyperplanes. Let’s look at an example of the linear and rbf kernels on 2 features of the iris data set to classify the samples.
<table style="width:80%">
    <tr>
        <td><img src="./images/SVM_linear_kernel.png" height = 300 width = 300/></td>
        <td><img src="./images/SVM_rbf.png" height = 300 width = 300/></td>
    </tr>
</table>
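
A comparison of this kind can be sketched as follows. Which 2 iris features the plots above use is not stated, so the first two columns (sepal length and sepal width) are an assumption, as are the values of `C` and `gamma`.

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC

iris = load_iris()
X = iris.data[:, :2]  # assumption: sepal length and sepal width
y = iris.target

# Fit one classifier per kernel and record its training accuracy.
train_accuracy = {}
for kernel in ("linear", "rbf", "poly"):
    clf = SVC(kernel=kernel, C=1.0, gamma="scale").fit(X, y)
    train_accuracy[kernel] = clf.score(X, y)

for kernel, acc in train_accuracy.items():
    print(f"{kernel:>6}: {acc:.2f}")
```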

##### <font color=green>gamma : </font>
<font color=yellow>Kernel coefficient</font> for ‘rbf’, ‘poly’ and ‘sigmoid’. The higher the value of gamma, the more closely the model tries to fit the training data set exactly, which increases the generalization error and causes over-fitting.
Example: Let’s check the difference between gamma values such as 0, 10 and 100 (in older scikit-learn releases, gamma=0 is replaced by 1/n_features).
<table style="width:80%">
    <tr>
        <td align="center" valign="center"><img src="./images/gamma.png" height = 650 width = 650 style = "background-color:white;"/></td>
    </tr>
</table>
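
The effect of gamma can be checked numerically by comparing training and test accuracy. This is a sketch under assumptions: the noisy `make_moons` data and the gamma values 0.1, 10 and 100 are chosen for illustration and are not the settings behind the figure above.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Noisy synthetic data, chosen so that over-fitting is easy to provoke.
X, y = make_moons(n_samples=300, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scores = {}
for gamma in (0.1, 10, 100):
    clf = SVC(kernel="rbf", gamma=gamma).fit(X_train, y_train)
    # Store (training accuracy, test accuracy) for each gamma.
    scores[gamma] = (clf.score(X_train, y_train), clf.score(X_test, y_test))

for gamma, (train_acc, test_acc) in scores.items():
    print(f"gamma={gamma:<5} train={train_acc:.2f} test={test_acc:.2f}")
```

A training accuracy that keeps rising with gamma while the test accuracy stalls or drops is the usual sign of over-fitting.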

##### <font color=green>C : </font>
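<font color=yellow>Penalty parameter</font> C of the error term. It controls the trade-off between a smooth decision boundary and classifying the training points correctly: a larger C penalizes misclassified training points more heavily, at the cost of a less smooth boundary.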
<table style="width:80%">
    <tr>
        <td align="center" valign="center"><img src="./images/c.png" height = 650 width = 650 style = "background-color:white;"/></td>
    </tr>
</table>
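
One way to observe this trade-off is to count the support vectors for different values of C. The sketch below makes its own assumptions: synthetic `make_moons` data and the C values 0.01, 1 and 100, none of which come from the figure above.

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.3, random_state=0)

results = {}
for C in (0.01, 1, 100):
    clf = SVC(kernel="rbf", C=C, gamma="scale").fit(X, y)
    # n_support_ holds the number of support vectors per class.
    results[C] = (clf.score(X, y), int(clf.n_support_.sum()))

for C, (train_acc, n_sv) in results.items():
    print(f"C={C:<5} train accuracy={train_acc:.2f} support vectors={n_sv}")
```

A small C tolerates many points inside the margin, so more of them end up as support vectors and the boundary stays smooth. A large C typically keeps fewer support vectors and follows the training points more closely.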

***

### Pros & Cons :

##### Advantages
- Effective in high dimensional spaces.
- Still effective in cases where number of dimensions is greater than the number of samples.
- Uses a subset of training points in the decision function (called support vectors), so it is also memory efficient.
- Versatile: different Kernel functions can be specified for the decision function. Common kernels are provided, but it is also possible to specify custom kernels.

##### Disadvantages

- If the number of features is much greater than the number of samples, choosing the kernel function and regularization term carefully is crucial to avoid over-fitting.
- SVMs do not directly provide probability estimates; these are calculated using an expensive five-fold cross-validation.
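
As a minimal sketch of the last point, assuming scikit-learn's `SVC`: probability estimates are switched on with `probability=True`, which makes fitting slower because of the internal cross-validation. The iris data set is used only as a convenient example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# probability=True triggers the extra internal cross-validation at fit time.
clf = SVC(kernel="rbf", probability=True, random_state=0).fit(X, y)

# One row per sample, one column per class; each row sums to 1.
proba = clf.predict_proba(X[:3])
print(np.round(proba, 3))
```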

***

### Applications
SVMs can be used to solve various real-world problems:

- SVMs are helpful in <font color=yellow>text and hypertext categorization</font>, as their application can significantly reduce the need for labeled training instances in both the standard inductive and transductive settings.
- <font color=yellow>Classification of images</font> can also be performed using SVMs. Experimental results show that SVMs achieve significantly higher search accuracy than traditional query refinement schemes after just three to four rounds of relevance feedback. This is also true for image segmentation systems.
- <font color=yellow>Hand-written characters</font> can be recognized using SVM.
- The SVM algorithm has been widely applied in the <font color=yellow>biological and other sciences</font>. SVMs have been used to classify proteins with up to 90% of the compounds classified correctly. Permutation tests based on SVM weights have been suggested as a mechanism for interpretation of SVM models.
***