# Support Vector Machines

## What is an SVM?

A support vector machine (SVM) is a supervised learning algorithm that can be used for both classification and regression, though it is mostly used for classification. An SVM classifies data by finding a hyperplane (a "dividing line" in two dimensions, or its higher-dimensional analogue) that separates the classes in the training data. The hyperplane is chosen to **maximize the distance between itself and the closest data points (the "margin")**.

<img src="https://miro.medium.com/max/469/0*j6b6qNc-E0RfBxFj">
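To make the geometry concrete, here is a minimal Python sketch with a hand-picked hyperplane $w \cdot x + b = 0$. The values of `w` and `b` and the toy points are hypothetical, chosen for illustration rather than learned from data. A point's predicted class is the sign of $w \cdot x + b$, its distance to the hyperplane is $|w \cdot x + b| / \lVert w \rVert$, and the margin is the smallest such distance.

```python
import math

# Hypothetical hyperplane w.x + b = 0 (the line x1 + x2 = 3), picked by hand.
w = [1.0, 1.0]
b = -3.0


def decision(x):
    """Signed score w.x + b; its sign is the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def distance_to_hyperplane(x):
    """Euclidean distance from x to the hyperplane: |w.x + b| / ||w||."""
    return abs(decision(x)) / math.sqrt(sum(wi * wi for wi in w))


# Toy labelled points: +1 above the line, -1 below it.
points = [([3.0, 2.0], 1), ([4.0, 4.0], 1), ([1.0, 1.0], -1), ([0.0, 1.0], -1)]

for x, y in points:
    predicted = 1 if decision(x) > 0 else -1
    print(x, "label:", y, "predicted:", predicted,
          "distance:", round(distance_to_hyperplane(x), 3))

# Margin of this hyperplane = distance to the closest point.
margin = min(distance_to_hyperplane(x) for x, _ in points)
print("margin:", round(margin, 3))
```

Here the closest point, (1, 1), sits $1/\sqrt{2} \approx 0.707$ away. Training an SVM amounts to searching over `w` and `b` for the hyperplane that makes this number as large as possible while still classifying the points correctly.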

## How does the SVM draw the hyperplane?

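The hyperplane is chosen as the dividing line that separates the classes *as widely as possible*, which is why the margin is maximized. Picture two parallel boundary hyperplanes: one passes through the point(s) of class A closest to class B, and the other passes through the point(s) of class B closest to class A. The data points lying on these boundaries are the *support vectors*; they are the training points that pin down where the hyperplane ends up. The final hyperplane is drawn midway between the two boundaries, and training picks its orientation so that the gap between them is as wide as possible.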
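Real SVM libraries find this hyperplane with dedicated quadratic-programming solvers. As a simpler stand-in, the Python sketch below trains a *soft-margin* linear SVM by full-batch subgradient descent on the regularized hinge loss $\frac{\lambda}{2}\lVert w \rVert^2 + \frac{1}{n}\sum_i \max(0, 1 - y_i(w \cdot x_i + b))$. The toy data and the values of `lam`, `lr`, and `epochs` are illustrative assumptions, not tuned settings. Only points with $y_i(w \cdot x_i + b) < 1$ (inside the margin or misclassified) contribute to the gradient, which mirrors the idea that only points near the boundary shape the solution.

```python
# Toy linearly separable data: (features, label), labels are +1 or -1.
data = [([2.0, 2.0], 1), ([3.0, 3.0], 1), ([2.0, 3.0], 1),
        ([-2.0, -2.0], -1), ([-3.0, -3.0], -1), ([-2.0, -3.0], -1)]


def score(w, b, x):
    """Signed score w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_linear_svm(data, lam=0.01, lr=0.1, epochs=200):
    """Soft-margin linear SVM via full-batch subgradient descent on
    lam/2 * ||w||^2 + mean_i max(0, 1 - y_i * (w.x_i + b))."""
    dim = len(data[0][0])
    n = len(data)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the regularizer (bias is not regularized)
        grad_b = 0.0
        for x, y in data:
            if y * score(w, b, x) < 1:  # hinge loss is active for this point
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


w, b = train_linear_svm(data)
predictions = [1 if score(w, b, x) > 0 else -1 for x, _ in data]
print("w =", w, "b =", b)
print("predictions:", predictions)
```

The weight `lam` trades margin width against margin violations: a smaller `lam` penalizes a large `w` less, so the model tolerates fewer points inside the margin at the cost of a narrower margin.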

## How are SVM algorithms implemented in practice?

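We use something called the *kernel trick*. The idea is to map the lower-dimensional input data into a higher-dimensional feature space, where it is easier to find a hyperplane that separates the classes. The "trick" is that this mapping never has to be computed explicitly: the SVM only needs inner products between pairs of points in the higher-dimensional space, and a kernel function computes those directly from the original inputs. See the image for a visual example.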

<img src="https://miro.medium.com/max/700/0*ZnINGVLyQZfrcZYG">
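The identity behind the kernel trick can be checked numerically. This Python sketch assumes 2-D inputs and uses the degree-2 feature map $\phi(x) = (a^2, \sqrt{2}\,ab, b^2)$ for $x = (a, b)$. Taking the inner product after mapping to 3-D gives the same number as squaring the inner product in the original 2-D space, so the kernel $(x^T z)^2$ never needs to build $\phi(x)$.

```python
import math


def phi(x):
    """Explicit degree-2 feature map for a 2-D point (a, b): (a^2, sqrt(2)*a*b, b^2)."""
    a, b = x
    return [a * a, math.sqrt(2) * a * b, b * b]


def dot(u, v):
    """Inner product: sum of products of corresponding components."""
    return sum(ui * vi for ui, vi in zip(u, v))


def kernel(x, z):
    """The same feature-space inner product, computed in the original 2-D space."""
    return dot(x, z) ** 2


x, z = [1.0, 2.0], [3.0, 4.0]
explicit = dot(phi(x), phi(z))  # map to 3-D first, then take the inner product
implicit = kernel(x, z)         # never builds the 3-D vectors
print(explicit, implicit)
```

Both values are 121, up to floating-point rounding in the explicit version. For higher degrees, or for the RBF kernel (whose feature space is infinite-dimensional), the explicit mapping becomes very large or impossible to build, which is why computing the kernel directly matters.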

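How do we implement the kernel trick? The linear SVM can be rewritten so that the training data appear only through the **inner product** of pairs of observations. The inner product of two input vectors is the sum of the products of their corresponding components; for example, $(1, 2)$ and $(3, 4)$ have inner product $1 \cdot 3 + 2 \cdot 4 = 11$. Replacing this inner product with a different kernel function turns the linear SVM into a nonlinear one.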
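A minimal Python sketch of the rewritten (dual) linear SVM: the decision function becomes $f(x) = \sum_i \alpha_i y_i \, x_i^T x + b$, so a new point is compared with the training points only through inner products. The training points, labels, coefficients `alpha`, and bias `b` are hypothetical values chosen for illustration; they satisfy $\sum_i \alpha_i y_i = 0$, as the dual requires, but were not obtained by training. The sketch also checks that this equals the familiar $w^T x + b$ with $w = \sum_i \alpha_i y_i x_i$.

```python
def inner(u, v):
    """Inner product: sum of products of corresponding components."""
    return sum(ui * vi for ui, vi in zip(u, v))


# Hypothetical training points, labels, dual coefficients and bias (not trained).
train_set = [([1.0, 2.0], 1), ([2.0, 0.5], -1), ([0.0, 1.0], 1)]
alpha = [0.5, 0.75, 0.25]  # sum(alpha_i * y_i) = 0.5 - 0.75 + 0.25 = 0
b = -0.2


def decision_dual(x):
    """f(x) = sum_i alpha_i * y_i * <x_i, x> + b: only inner products are needed."""
    return sum(a * y * inner(xi, x) for a, (xi, y) in zip(alpha, train_set)) + b


# Equivalent primal weight vector w = sum_i alpha_i * y_i * x_i.
w = [sum(a * y * xi[j] for a, (xi, y) in zip(alpha, train_set)) for j in range(2)]


def decision_primal(x):
    """f(x) = <w, x> + b."""
    return inner(w, x) + b


x_new = [1.5, 1.0]
print(decision_dual(x_new), decision_primal(x_new))
```

Both forms give about -0.825, so `x_new` falls on the negative side. Swapping `inner` inside `decision_dual` for another kernel (polynomial or RBF) is all it takes to make the classifier nonlinear.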

A **kernel function** computes the inner product of two inputs in the (implicit) higher-dimensional space. Types of kernel functions used in SVMs include:
- Linear kernel (the plain inner product described above)
- Polynomial kernel
- RBF (Radial Basis Function) kernel


We would use a polynomial or RBF kernel (RBF is the more common choice) when the data set is not linearly separable.
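To see why a nonlinear kernel helps, take the classic XOR layout, where no single straight line separates the classes. This Python sketch uses the dual decision function with the degree-2 polynomial kernel $(x^T z)^2$ (the polynomial kernel below with $c = 0$, $d = 2$). The coefficients $\alpha_i = 1/8$ and $b = 0$ were worked out by hand for this symmetric toy problem rather than by a solver; they place every training point exactly on the margin.

```python
def poly2(x, z):
    """Degree-2 polynomial kernel with c = 0: (x^T z)^2."""
    return sum(xi * zi for xi, zi in zip(x, z)) ** 2


# XOR-style data: no straight line separates the +1 points from the -1 points.
xor_data = [([1.0, 1.0], 1), ([-1.0, -1.0], 1), ([1.0, -1.0], -1), ([-1.0, 1.0], -1)]

# Hand-derived dual coefficients and bias for this toy problem.
alpha = [0.125] * 4
b = 0.0


def f(x):
    """Dual decision function sum_i alpha_i * y_i * K(x_i, x) + b."""
    return sum(a * y * poly2(xi, x) for a, (xi, y) in zip(alpha, xor_data)) + b


for x, y in xor_data:
    print(x, "label:", y, "score:", f(x))  # y * f(x) == 1 for every training point
```

Expanding the sum shows that, for a point with coordinates $(u, v)$, $f$ simplifies to $uv$: positive in the first and third quadrants and negative in the second and fourth, which is exactly the XOR pattern.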

### Polynomial kernel
Instead of the plain inner product, the polynomial kernel adds a constant $c \ge 0$ to the inner product of the input vectors $x_1$ and $x_2$ and raises the result to the power $d$ (the degree):

$$ K(x_1, x_2) = (x_1^Tx_2 + c)^d $$
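A direct Python translation of the formula, as a sketch; the defaults `c=1.0` and `d=3` are arbitrary assumptions, not recommendations. With $c = 0$ and $d = 1$ it reduces to the linear kernel, while a positive $c$ mixes in all lower-degree terms of the expansion.

```python
def polynomial_kernel(x1, x2, c=1.0, d=3):
    """Polynomial kernel K(x1, x2) = (x1^T x2 + c)^d for equal-length vectors."""
    dot_product = sum(a * b for a, b in zip(x1, x2))  # x1^T x2
    return (dot_product + c) ** d


print(polynomial_kernel([1, 2], [3, 4], c=1, d=2))  # (11 + 1)^2 = 144
print(polynomial_kernel([1, 2], [3, 4], c=0, d=1))  # plain inner product: 11
```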

### RBF kernel
Defined mathematically as

$$ K(x_1, x_2) = \exp\left(-\frac{\lVert x_1 - x_2 \rVert^2}{2\sigma^2}\right) $$

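Here $\lVert x_1 - x_2 \rVert^2$ is the squared Euclidean distance between the two feature vectors, and $\sigma$ is a free parameter that controls the kernel's width: the kernel equals 1 for identical points and decays toward 0 as they move apart, faster for smaller $\sigma$.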
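The same formula as a short Python sketch (the default `sigma=1.0` is an assumption for illustration). Some libraries, such as scikit-learn, parametrize this kernel with $\gamma = 1 / (2\sigma^2)$ instead of $\sigma$.

```python
import math


def rbf_kernel(x1, x2, sigma=1.0):
    """RBF kernel K(x1, x2) = exp(-||x1 - x2||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x1, x2))  # squared Euclidean distance
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


print(rbf_kernel([0.0, 0.0], [0.0, 0.0]))             # identical points: 1.0
print(rbf_kernel([0.0, 0.0], [1.0, 1.0]))             # exp(-1), about 0.368
print(rbf_kernel([0.0, 0.0], [1.0, 1.0], sigma=0.5))  # smaller sigma: exp(-4), about 0.018
```

Shrinking `sigma` makes the kernel decay faster, so each training point influences only its close neighbours and the decision boundary can bend more tightly around the data.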

## When would you use SVM over Random Forest?
* When the data set is not linearly separable: an SVM can then use the kernel trick, for example with an RBF kernel.
* When the data is very high-dimensional, for example in text classification and other NLP problems.


# Links
* https://machinelearningmastery.com/support-vector-machines-for-machine-learning/
* https://en.wikipedia.org/wiki/Support-vector_machine

