# Support Vector Machine Algorithm

A support vector machine (SVM) is a classification algorithm. It fits a hyperplane that separates the classes of data points with the widest possible margin. <br> <br>Simple Hyperplane Example :
![SVM.png](SVM.png)

Suppose we are given a plot of points from two labeled classes. We need to fit a hyperplane that splits the classes with the <b>maximum margin</b>, that is, a <b>street</b> of maximum width.

Let <b>$\overline{\rm w}$</b> be a vector of <b>unknown length that is perpendicular to the median line of the street</b>. For any data point <b>u</b>, to decide whether it is on the right side or the left side of the street, we <b>project</b> the vector $\overline{\rm u}$ onto $\overline{\rm w}$ and check whether the projection is long enough to cross the median line. If it is, the point is green; if it is not, the point is red.
![svm2.png](svm2.png)
So, what we are looking for is if the inequality below is true or not: <br> <br>
$$\overline{\rm u} \bullet \overline{\rm w} \enspace\geq\enspace some\enspace constant\enspace C$$ <br><br>
or we can write
$$\overline{\rm u} \bullet \overline{\rm w} + b \enspace\geq\enspace 0$$<br><br> where $C = -b$.<br> We will take this inequality as our <br><br><h4>DECISION RULE:</h4> <br><br>
If   $\boxed{\overline{\rm u} \bullet \overline{\rm w} + b \enspace\geq\enspace 0}$ is true, then u is green.
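
As a minimal sketch of the decision rule, the snippet below classifies a point from the sign of $\overline{\rm u} \bullet \overline{\rm w} + b$. The helper names `dot` and `classify` and the values of `w` and `b` are assumptions made up for illustration; they are not derived from any data.

```python
def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def classify(u, w, b):
    """Decision rule: green if u . w + b >= 0, otherwise red."""
    return "green" if dot(u, w) + b >= 0 else "red"

# Hypothetical w and b, chosen only to illustrate the rule.
w = (1.0, 0.0)
b = -1.0

print(classify((3.0, 2.0), w, b))   # 3*1 + 2*0 - 1 = 2    -> green
print(classify((0.5, -4.0), w, b))  # 0.5*1 - 4*0 - 1 = -0.5 -> red
```

A point lying exactly on the median line gives a value of 0 and is labeled green under this rule.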

What we don't know yet is which constant $b$ and which $\overline{\rm w}$ to use. We only know that $\overline{\rm w}$ needs to be perpendicular to the median line, and that alone is not enough to fix a particular $b$ and $\overline{\rm w}$. What we are going to do is put forward some constraints in order to calculate a $b$ and a $\overline{\rm w}$. <br> Let's define two additional planes parallel to the median of the street, one along each edge of the street:<br><br>
     $$(1) \qquad\enspace\boxed{ H_+:\enspace\overline{\rm w} \bullet \overline{\rm x} + b \enspace=1   \qquad
   \enspace\enspace H_-:\enspace\overline{\rm w} \bullet \overline{\rm x} + b \enspace=-1  }$$ <br><br> 
In order to combine these two equations into one and make life a little easier, we define:<br><br>

$$y_i = 
\begin{cases} 
      +1, & \text{if } x_i \text{ is green} \\
      -1, & \text{if } x_i \text{ is red}
   \end{cases}
$$<br><br>
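Every green sample must lie on or beyond $H_+$ (so $\overline{\rm x_i}\bullet \overline{\rm w} + b \geq 1$) and every red sample on or beyond $H_-$ (so $\overline{\rm x_i}\bullet \overline{\rm w} + b \leq -1$). Multiplying by $y_i$ turns both conditions into a single inequality: <br><br>
$$ y_i(\overline{\rm x_i}\bullet \overline{\rm w} + b)\geq 1$$<br><center>with equality</center> <br>
$$(2)\qquad \boxed{y_i(\overline{\rm x_i}\bullet \overline{\rm w} + b)= 1}$$ <br><center>for $x_i$'s which are on $H_+$ or $H_-$</center>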
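
To make the combined constraint concrete, here is a small sketch that computes $y_i(\overline{\rm x_i}\bullet \overline{\rm w} + b)$ for a toy data set. The samples, `w`, `b`, and the helper `margins` are all hypothetical values picked for illustration; samples whose value equals 1 lie on $H_+$ or $H_-$.

```python
def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def margins(X, y, w, b):
    """Return y_i * (x_i . w + b) for every sample."""
    return [yi * (dot(xi, w) + b) for xi, yi in zip(X, y)]

# Hypothetical toy data: +1 = green, -1 = red.
X = [(2.0, 0.0), (3.0, 1.0), (0.0, 0.0), (-1.0, 2.0)]
y = [1, 1, -1, -1]
w, b = (1.0, 0.0), -1.0  # assumed separating hyperplane

m = margins(X, y, w, b)
print(m)  # [1.0, 2.0, 1.0, 2.0]

# Every sample satisfies the inequality y_i (x_i . w + b) >= 1 ...
all_satisfied = all(mi >= 1.0 - 1e-9 for mi in m)
# ... and the ones that meet it with equality sit on H+ or H-.
on_gutter = [xi for xi, mi in zip(X, m) if abs(mi - 1.0) < 1e-9]
print(all_satisfied, on_gutter)
```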

Let's remember what our aim is: <br>
We are trying to place a line such that the street separates the green points from the red ones <b>as widely as possible</b>. <br>To formulate the width of the street, assume $x_+$ is a point on $H_+$ and $x_-$ is a point on $H_-$:

![svm3%282%29.png](svm3%282%29.png) <br><br>

We already know that $\overline{\rm w}$ is a normal vector to the median line. Therefore, we can calculate the width of the street as:<br><br>

$$Width=\frac{(\overline{\rm x_+} - \overline{\rm x_-})\bullet \overline{\rm w}}{||w||}$$ <br><br>
$$= \frac{\overline{\rm w}\bullet \overline{\rm x_+} - \overline{\rm w}\bullet \overline{\rm x_-}}{||w||} $$<br><br>
If we use the equation (2): <br><br>
$$(2)\qquad y_i(\overline{\rm w}\bullet \overline{\rm x_i} + b) = 1$$ <br>
<center>We know that $y_+ = 1$ and $y_- = -1$ so:</center> <br>
$$y_+(\overline{\rm w}\bullet \overline{\rm x_+} + b) = 1 \Longrightarrow 1(\overline{\rm w}\bullet \overline{\rm x_+} + b) = 1$$<br>
$$\Longrightarrow \overline{\rm w}\bullet \overline{\rm x_+} = 1-b$$ <br>
$$y_-(\overline{\rm w}\bullet \overline{\rm x_-} + b) = 1 \Longrightarrow -1(\overline{\rm w}\bullet \overline{\rm x_-} + b) = 1$$<br>
$$\Longrightarrow \overline{\rm w}\bullet \overline{\rm x_-} = -1-b $$ <br>
$$\Longrightarrow \overline{\rm w}\bullet \overline{\rm x_+} - \overline{\rm w}\bullet \overline{\rm x_-} = 1-b+b+1 = 2$$<br><br>
$$\Longrightarrow \boxed{Width=\frac{2}{||w||}}$$ <br>
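
The result can be checked numerically. The sketch below compares the projection formula for the width with $\frac{2}{||w||}$, using a hypothetical $\overline{\rm w}$, $b$, and one assumed point on each of $H_+$ and $H_-$; the function names are illustrative only.

```python
import math

def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def norm(w):
    """Euclidean length of w."""
    return math.sqrt(dot(w, w))

def projected_width(x_plus, x_minus, w):
    """Width as the projection of (x_plus - x_minus) onto the unit vector w/||w||."""
    diff = [p - q for p, q in zip(x_plus, x_minus)]
    return dot(diff, w) / norm(w)

def street_width(w):
    """Closed form derived above: 2 / ||w||."""
    return 2.0 / norm(w)

# Hypothetical hyperplane and one point on each edge of the street.
w, b = (1.0, 0.0), -1.0
x_plus = (2.0, 5.0)    # w . x + b = 2 - 1 = +1, so it lies on H+
x_minus = (0.0, -3.0)  # w . x + b = 0 - 1 = -1, so it lies on H-

print(projected_width(x_plus, x_minus, w))  # 2.0
print(street_width(w))                      # 2.0
print(street_width((3.0, 4.0)))             # 2 / 5 = 0.4
```

Note that the two chosen points are not directly opposite each other; the projection onto $\overline{\rm w}$ discards the component along the street, which is why any pair of points on $H_+$ and $H_-$ gives the same width.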



So now we know that we are trying to maximize $\frac{2}{||w||}$, which is the same as minimizing $||w||$. Equivalently, we can minimize $\frac{1}{2}||w||^2$, which has the same minimizer and is mathematically more convenient because it is differentiable everywhere. <br>
$$min(\frac{1}{2}||w||^2)$$ <br>
Now we have an expression whose extremum we would like to find, and we have the constraints $y_i(\overline{\rm x_i}\bullet \overline{\rm w} + b)\geq 1$ that we need to satisfy. Therefore, we use <b>Lagrange multipliers</b> $\alpha_i \geq 0$, one per constraint, to solve this quadratic optimization problem.<br>
<br>

$$min \qquad L=\frac{1}{2}||w||^2 - \sum_{i=1}^{m} \alpha_i[y_i(\overline{\rm w}\bullet \overline{\rm x_i} + b)-1]$$ <br>
Since we are looking for an extremum, we set the derivatives with respect to $\overline{\rm w}$ and $b$ to zero:<br>
$$\frac{\partial L}{\partial \overline{\rm w}} = \overline{\rm w} - \sum_{i=1}^{m} \alpha_i y_i x_i = 0$$<br>
$$\Longrightarrow \boxed{ \overline{\rm w} = \sum_{i=1}^{m} \alpha_i y_i x_i} \qquad (*)$$ <br>
$$\frac{\partial L}{\partial b} = - \sum_{i=1}^{m} \alpha_i y_i = 0$$ <br>
$$\boxed{\sum_{i=1}^{m} \alpha_i y_i = 0}\qquad (**)$$<br>
If we substitute $(*)$ in $L$, we get: <br><br>
$$L = \frac{1}{2}\left(\sum_{i=1}^{m} \alpha_i y_i x_i\right)\bullet\left(\sum_{j=1}^{m} \alpha_j y_j x_j\right) - \left(\sum_{i=1}^{m} \alpha_i y_i x_i\right)\bullet\left(\sum_{j=1}^{m} \alpha_j y_j x_j\right) - \underbrace{\sum_{i=1}^{m} \alpha_i y_i b}_{0 \thinspace from \thinspace(**)} + \sum_{i=1}^{m} \alpha_i$$ <br><br>The first two terms combine into $-\frac{1}{2}$ times the same double sum, and then we get:
$$(3) \qquad \boxed{L = \sum_{i=1}^{m} \alpha_i - \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\alpha_i\alpha_j y_i y_j x_i \bullet x_j}$$<br><br>
So we can see that the optimization problem depends only on the dot products of pairs of samples. <br><br>
$$L = \sum_{i=1}^{m} \alpha_i - \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\alpha_i\alpha_j y_i y_j\boxed{ x_i \bullet x_j}$$<br><br>Maximizing this expression over $\alpha_i \geq 0$ subject to $(**)$ with numerical methods gets us the $\alpha_i$'s.<br>
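
To see the numerical step on the smallest possible case, here is a rough sketch with one green and one red point. The quadratic term enters with a minus sign, since the substitution step yields $\frac{1}{2}A - A = -\frac{1}{2}A$ for the double sum $A$. With two samples, $(**)$ forces $\alpha_1 = \alpha_2$, so a plain grid scan over one value is enough here. That scan is a stand-in chosen for brevity, not the method a real SVM solver uses, and the data and function names are hypothetical.

```python
def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def dual_objective(alphas, X, y):
    """L = sum_i alpha_i - 1/2 * sum_i sum_j alpha_i alpha_j y_i y_j (x_i . x_j)."""
    m = len(X)
    quad = sum(alphas[i] * alphas[j] * y[i] * y[j] * dot(X[i], X[j])
               for i in range(m) for j in range(m))
    return sum(alphas) - 0.5 * quad

# Hypothetical toy data: one green (+1) and one red (-1) point.
X = [(2.0, 0.0), (0.0, 0.0)]
y = [1, -1]

# (**) gives alpha_1 * 1 + alpha_2 * (-1) = 0, so alpha_1 = alpha_2 = a.
# Scan a over [0, 1] in steps of 0.01 and keep the maximizer of L.
grid = [k / 100 for k in range(101)]
best_a = max(grid, key=lambda a: dual_objective([a, a], X, y))
alphas = [best_a, best_a]

# Recover w from (*): w = sum_i alpha_i y_i x_i.
w = tuple(sum(alphas[i] * y[i] * X[i][d] for i in range(len(X)))
          for d in range(2))
# Recover b from a point on H+ or H-: y_s (w . x_s + b) = 1, so b = y_s - w . x_s.
b = y[0] - dot(w, X[0])

print(best_a, w, b)  # 0.5 (1.0, 0.0) -1.0
```

For this data $L(a) = 2a - 2a^2$, which peaks at $a = 0.5$; the recovered $\overline{\rm w} = (1, 0)$ gives a street of width $\frac{2}{||w||} = 2$, the distance between the two points.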
If we use $(*)$ back in our <b>DECISION RULE</b>, it becomes:<br><br>
<center>If $\boxed {\sum_{i=1}^{m} \alpha_i y_i \overline{\rm x_i} \bullet \overline{\rm u} + b \geq 0}$ is true, then $u$ is green</center><br><br> 
and the points $x_i$ on $H_+$ or $H_-$ are called <b>support vectors</b>.
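
A short sketch of this form of the decision rule follows. The $\alpha_i$ values, `b`, and the data are assumed for illustration rather than computed here, and `decide` and `support_vectors` are hypothetical helper names. Samples with $\alpha_i = 0$ drop out of the sum, so in this toy example only the points on $H_+$ and $H_-$ influence the prediction.

```python
def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def decide(u, alphas, X, y, b):
    """Decision rule in terms of the samples: sum_i alpha_i y_i (x_i . u) + b >= 0."""
    score = sum(a * yi * dot(xi, u) for a, xi, yi in zip(alphas, X, y)) + b
    return "green" if score >= 0 else "red"

def support_vectors(alphas, X, tol=1e-9):
    """Samples whose multiplier is non-zero, i.e. those that contribute to the sum."""
    return [xi for a, xi in zip(alphas, X) if a > tol]

# Hypothetical training data and multipliers (assumed, not fitted here).
X = [(2.0, 0.0), (0.0, 0.0), (3.0, 1.0)]
y = [1, -1, 1]
alphas = [0.5, 0.5, 0.0]  # the third point lies beyond H+, so its alpha is 0
b = -1.0

print(decide((4.0, 1.0), alphas, X, y, b))   # 0.5*8 - 0 + 0 - 1 = 3  -> green
print(decide((-1.0, 0.0), alphas, X, y, b))  # 0.5*(-2) - 0 + 0 - 1 = -2 -> red
print(support_vectors(alphas, X))            # [(2.0, 0.0), (0.0, 0.0)]
```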