## Programming Exercise 6: Support Vector Machines
#### Author - Rishabh Jain

In [1]:
import warnings
warnings.simplefilter('ignore')

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
%matplotlib inline

from scipy.io import loadmat

#### Learning Resources
1. [SVM Video Lecture (MIT)](https://www.youtube.com/watch?v=_PwhiWxHK8o)
2. [29 to 33 SVM Video Lectures (University of Buffalo)](https://www.youtube.com/watch?v=N4pai7eZW_o&list=PLhuJd8bFXYJsSXPMrGlueK6TMPdHubICv&index=29)
3. [Support Vector Machine Succinctly (PDF)](./Lectures/SVM_succinctly.pdf)
4. [An Idiot’s guide to Support vector machines](./Lectures/SVM_notes.pdf)

### 0&nbsp;&nbsp;&nbsp;&nbsp;Maths Behind SVM (Maximum Margin Classifier)

For a two-class, linearly separable dataset such as the one shown below, there are many possible linear separators. Intuitively, a decision boundary drawn in the middle of the void between the data items of the two classes seems better than one that passes very close to examples of one or both classes. Some learning methods, such as logistic regression, find just any linear separator. **The SVM, in particular, looks for a decision surface that is MAXIMALLY far away from any data point**. The distance from the decision surface to the closest data point determines the margin of the classifier.

<img src="./images/svm1.png" width="380">

Let's imagine a vector $\vec{w}$ perpendicular to the decision boundary (the median line of the margin) and an unknown data point $\vec{u}$, which can lie on either side of it. To find out which side $\vec{u}$ is on, we project $\vec{u}$ onto $\vec{w}$ (dot product) and compare the result with some constant $c$; setting $b=-c$ gives the boxed form:

$$\vec{w}.\vec{u}\geq c$$
$$\boxed{\vec{w}.\vec{u}+b\geq 0}\;\;(1)$$ 

If the projection of $\vec{u}$ plus some constant $b$ is greater than or equal to zero, then it's a positive sample; otherwise it's a negative sample. **Eq. (1) is our DECISION RULE**. The problem is that we don't yet know which $\vec{w}$ and $b$ to use.
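
As a minimal sketch of the decision rule in Eq. (1), the snippet below classifies two points with NumPy. The values of `w`, `b` and the points in `U` are made up for illustration, and the helper name `decision_rule` is hypothetical.

```python
import numpy as np

def decision_rule(w, b, U):
    """Apply Eq. (1): +1 where w.u + b >= 0, otherwise -1."""
    scores = U @ w + b          # projection onto w, shifted by b
    return np.where(scores >= 0, 1, -1)

# Hypothetical parameters, chosen only for illustration
w = np.array([1.0, 1.0])
b = -3.0

# Two unknown points, one per row
U = np.array([[3.0, 2.0],    # 3 + 2 - 3   =  2.0 -> positive side
              [0.5, 1.0]])   # 0.5 + 1 - 3 = -1.5 -> negative side

labels = decision_rule(w, b, U)
print(labels)
```

Note that only the sign of $\vec{w}.\vec{u}+b$ matters here, so scaling $\vec{w}$ and $b$ by the same positive factor leaves the predictions unchanged.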

**An unknown sample may be located anywhere, inside or outside the margin (i.e. its decision value is simply >0 or <0), but for a known positive sample $\vec{x_{+}}$ the SVM decision rule should insist that the dot product plus the constant $b$ be 1 or greater.** Likewise, for a known negative sample $\vec{x_{-}}$, the dot product plus $b$ should be less than or equal to -1. Hence:

$\vec{w}.\vec{x_{+}}+b\geq 1 $   
$\vec{w}.\vec{x_{-}}+b\leq -1 $ 

Introducing a variable $y_i$ such that :  

$$\begin{equation}
  y_{i}=\begin{cases}
    +1 & \text{for +ve samples}\\
    -1 & \text{for -ve samples}
  \end{cases}
\end{equation}$$

Multiplying the above two inequalities by $y_i$:

For +ve sample : $y_{i}(\vec{w}.\vec{x_{i}}+b) \geq 1$  
For -ve sample : $y_{i}(\vec{w}.\vec{x_{i}}+b) \geq 1$

###### Note : The sign changes from $\leq$ to $\geq$ because $y_i$ is -1 for -ve samples
Since both inequalities are the same, we can rewrite them as :

$$\boxed{y_{i}(\vec{w}.\vec{x_{i}}+b)\geq 1}\;\;(2)$$

$$\boxed{y_{i}(\vec{w}.\vec{x_{i}}+b)-1= 0}\;\;(3)\;\;\text{For samples on margin}$$

Eq. (2) is basically a **constraint** for our margin, which means that **all the training samples should be on the correct side OR on the margin** (i.e. +ve samples on the right and -ve samples on the left side of the margin) and **NO training sample should be inside the margin at all, meaning ZERO TRAINING ERROR.**
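
The constraint in Eq. (2) and the on-margin condition in Eq. (3) can be checked numerically. The sketch below uses a made-up four-point dataset and hypothetical values for `w` and `b`; the helper name `functional_margins` is also an assumption, not something from the exercise.

```python
import numpy as np

def functional_margins(w, b, X, y):
    """Return y_i * (w.x_i + b) for every training sample."""
    return y * (X @ w + b)

# Toy, linearly separable data (illustrative values only)
X = np.array([[2.0, 2.0], [3.0, 3.0],   # +ve samples
              [1.0, 1.0], [0.0, 0.0]])  # -ve samples
y = np.array([1, 1, -1, -1])

# Hypothetical separator
w = np.array([1.0, 1.0])
b = -3.0

margins = functional_margins(w, b, X, y)

# Eq. (2): every sample must satisfy y_i (w.x_i + b) >= 1
feasible = bool(np.all(margins >= 1 - 1e-9))

# Eq. (3): samples with y_i (w.x_i + b) - 1 = 0 sit exactly on the margin
on_margin = np.isclose(margins, 1.0)

print(margins, feasible, on_margin)
```

With these values the points `[2, 2]` and `[1, 1]` lie exactly on the margin, while the other two lie strictly outside it.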

###### Let's calculate the width of the margin.

<img src="./images/svm2.png" width="400">

Let's imagine two vectors $\vec{x_+}$ and $\vec{x_-}$, a known +ve and a known -ve sample respectively, each lying exactly on its side of the margin. Their difference is a vector $\vec{R}$, where :

$$\vec{R}=\vec{x_+}-\vec{x_-}$$

All we need is a unit vector $\hat{u}$ perpendicular to the margin, **so that the WIDTH of the margin will be the projection of $\vec{R}$ onto $\hat{u}$**. From the first image, we already know a vector $\vec{w}$ in that direction, so we normalize it:

$$\hat{u}=\frac{\vec{w}}{||w||}$$

**WIDTH** $=\vec{R}.\hat{u} $  

$\;\;\;\;\;\;\;\;\;\;=(\vec{x_+}-\vec{x_-}).\frac{\vec{w}}{||w||}$  
$\;\;\;\;\;\;\;\;\;\;=\frac{(\vec{x_+}.\vec{w}-\vec{x_-}.\vec{w})}{||w||}$

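Using Eq. (3), samples on the margin satisfy $\vec{w}.\vec{x_+}=1-b$ and $\vec{w}.\vec{x_-}=-1-b$, so we get

$\;\;\;\;\;\;\;\;\;\;=\frac{(1-b)-(-1-b)}{||w||}=\frac{(1-b+1+b)}{||w||}$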
$$\boxed{\text{WIDTH}=\frac{2}{||w||}}\;\;(4)$$
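
As a quick numerical check of Eq. (4), the sketch below computes the width in two ways: by projecting $\vec{R}$ onto $\hat{u}$ and via $\frac{2}{||w||}$. The vectors `w`, `x_pos` and `x_neg` are illustrative values chosen so that both samples lie on the margin for $b=-3$.

```python
import numpy as np

# Hypothetical separator and two on-margin samples (illustrative values)
w = np.array([1.0, 1.0])
x_pos = np.array([2.0, 2.0])   # w.x + b = +1 when b = -3
x_neg = np.array([1.0, 1.0])   # w.x + b = -1 when b = -3

# Unit vector perpendicular to the margin
u_hat = w / np.linalg.norm(w)

# Width as the projection of R = x_pos - x_neg onto u_hat
width_projection = (x_pos - x_neg) @ u_hat

# Width from Eq. (4)
width_formula = 2.0 / np.linalg.norm(w)

print(width_projection, width_formula)
```

Both quantities agree, and shrinking $||w||$ is what widens the margin.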

Now, we want to maximize the margin while incurring zero training error.

max $\frac{2}{||w||}$ with 0 loss OR (Flipping for mathematical convenience)

min $\frac{||w||}{2}\;$ with 0 loss OR (Squaring the numerator for mathematical convenience)

min $\frac{||w||^2}{2}$ with 0 loss **(STILL A CONSTRAINED OPTIMIZATION: "0 loss" means Eq. (2) must hold for every training sample)**

##### Optimization Formulation

> minimize $\;\;\frac{||w||^2}{2}$  
> subject to $\;\;y_{i}(\vec{w}.\vec{x_{i}}+b)\geq 1\;\;,i=1,2,\ldots,N$
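
To make the formulation concrete, here is a minimal sketch that hands the problem to a general-purpose solver, `scipy.optimize.minimize` with the SLSQP method. This is only one possible way to solve it, not necessarily how the exercise or a dedicated SVM library does it. The toy dataset and the helper names `objective` and `margin_constraints` are assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Toy, linearly separable data (illustrative values only)
X = np.array([[2.0, 2.0], [3.0, 3.0],   # +ve samples
              [1.0, 1.0], [0.0, 0.0]])  # -ve samples
y = np.array([1, 1, -1, -1])

def objective(params):
    """||w||^2 / 2, where params = [w_1, ..., w_d, b]."""
    w = params[:-1]
    return 0.5 * w @ w

def margin_constraints(params):
    """y_i (w.x_i + b) - 1, which must be >= 0 for every sample."""
    w, b = params[:-1], params[-1]
    return y * (X @ w + b) - 1.0

result = minimize(
    objective,
    x0=np.zeros(X.shape[1] + 1),   # start from w = 0, b = 0
    method="SLSQP",
    constraints=[{"type": "ineq", "fun": margin_constraints}],
)

w_opt, b_opt = result.x[:-1], result.x[-1]
print(result.success, w_opt, b_opt, 2.0 / np.linalg.norm(w_opt))
```

For this toy set the optimum can be worked out by hand as $\vec{w}=(1,1)$, $b=-3$, so the solver output should land close to those values, up to numerical tolerance.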