# 1. Introduction

Week 7 is all about Support Vector Machines ("<b>SVM</b>").  This is the final <b>supervised</b> learning algorithm explored in Andrew Ng's Introduction to Machine Learning course.  The previous supervised techniques were:

* Linear Regression


* Logistic Regression


* Neural Networks

# 2. The Basics

## 2.1. What is an SVM?

SVMs:

* Are supervised learning models for classification and regression problems.


* Can solve linear and non-linear problems.


* Are most commonly used in <b>classification</b> problems.

## 2.2. When to use an SVM?

SVMs can be used for:

1. Regression problems; and


2. Classification problems.

## 2.2. How does a support vector machine work?

SVMs are based on the simple idea of finding a <b>hyperplane</b> that best divides a dataset into two classes.

<p align = "center">
<img src="..\Images\SVMsimple.PNG" width="40%"/>
</p>

To better understand this, we need to first define some key SVM terminology:

### 2.2.1. SVM Terminology

* High level SVM components:

<p align = "center">
<img src="..\Images\Hyperplanes.PNG" width="60%"/>
</p>

* <b>Hyperplane:</b> a flat boundary with one dimension fewer than the space it sits in, which separates and classifies a set of data.  In 2d it is a <b>line</b>, in 3d it is a <b>plane</b>, and in 4+ dimensions it is simply called a hyperplane.

    Intuitively, the further our data points lie from the hyperplane, the more confident we are that they have been correctly classified.  We therefore want our data points to be as far away from the hyperplane as possible, while still being on the correct side of it.
    
    

* <b>Support Vectors:</b> the data points <b>nearest to the hyperplane</b>.  If removed, they would alter the position of the dividing hyperplane, so the support vectors ("<b>SV</b>") can be considered the critical elements of a dataset.  This is unlike linear regression and neural networks, where <b>all</b> data points influence optimization.


* <b>Margin:</b> the distance between the hyperplane and the nearest data point from either class.  The goal is to choose the hyperplane that makes this distance as large as possible and equal on both sides, i.e. not too close to one class or the other:

<p align = "center">
<img src="..\Images\Margin.PNG" width="60%"/>
</p>
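The margin and support vector definitions above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the data points are made up, and the hyperplane is written as `w . x + b = 0`, where the weight vector `w` and offset `b` are notation introduced here rather than taken from the course notes.

```python
import math

def distance_to_hyperplane(point, w, b):
    """Perpendicular distance from `point` to the hyperplane w . x + b = 0."""
    score = sum(wi * xi for wi, xi in zip(w, point)) + b
    return abs(score) / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical 2d dataset: the first three points belong to one class,
# the last three to the other.
points = [(1.0, 1.0), (2.0, 2.0), (2.0, 0.0), (4.0, 4.0), (5.0, 5.0), (6.0, 4.0)]

# Assumed candidate hyperplane (a line in 2d): x + y - 6 = 0.
w, b = (1.0, 1.0), -6.0

distances = [distance_to_hyperplane(p, w, b) for p in points]

# The margin is the distance to the nearest point ...
margin = min(distances)

# ... and the support vectors are the points that sit exactly at that distance.
support_vectors = [p for p, d in zip(points, distances) if math.isclose(d, margin)]

print(f"margin = {margin:.3f}")
print(f"support vectors = {support_vectors}")
```

Removing any point other than the two support vectors leaves the margin for this line unchanged, which is why only the support vectors matter for positioning the hyperplane.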

* <b>Linearly vs. Non-Linearly Separable:</b> figure A shows data that can be separated by a linear boundary (i.e. a straight line), whereas figure B shows data that needs a non-linear boundary (i.e. a bendy, in this case circular, line):

<p align = "center">
<img src="..\Images\SVMLinSep.PNG" width="40%"/>
</p>


* <b>Gamma:</b> defines how far the influence of a single training example reaches, with low values meaning "far" and high values meaning "close".  In other words, with low gamma, data points far away from plausible hyperplanes are also considered when calculating the hyperplane, whereas with high gamma only the data points close to the plausible hyperplanes are considered.

<p align = "center">
<img src="..\Images\Gamma.PNG" width="60%"/>
</p>
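To make the "reach" of gamma concrete, here is a minimal sketch. It assumes the Gaussian (RBF) kernel, which these notes do not name explicitly but which is the usual setting where gamma appears. The kernel scores the similarity of two points as `exp(-gamma * squared_distance)`. The point coordinates and gamma values are made up for illustration.

```python
import math

def rbf_kernel(x, z, gamma):
    """Gaussian (RBF) similarity between two points: exp(-gamma * ||x - z||^2)."""
    squared_distance = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * squared_distance)

# Hypothetical points: one training example, one neighbour and one distant point.
training_point = (0.0, 0.0)
near_point = (0.5, 0.0)
far_point = (3.0, 0.0)

for gamma in (0.1, 10.0):
    near = rbf_kernel(training_point, near_point, gamma)
    far = rbf_kernel(training_point, far_point, gamma)
    # Low gamma: the far point still has noticeable similarity (long reach).
    # High gamma: similarity collapses towards zero even for fairly near points.
    print(f"gamma={gamma}: near={near:.3f}, far={far:.3g}")
```

With the low gamma the distant point still registers a meaningful similarity, whereas with the high gamma its similarity is effectively zero, so only very close points influence the boundary.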

### 2.2.2. What if there is no clear hyperplane?

In order to classify a messy (i.e. real world) dataset, we need to move away from a 2d view of the data to a 3d view.  

For instance, imagine we are trying to classify green balls from blue balls but their distribution is like this:

<p align = "center">
<img src="..\Images\SVM2d.PNG" width="40%"/>
</p>

We can imagine floating these balls in 3d space and using a sheet to separate them:

<p align = "center">
<img src="..\Images\SVM3d.PNG" width="80%"/>
</p>

This "floating" of the balls represents mapping the data from 2d to 3d, i.e. to a higher dimension.  This is known as "<b>Kernelling</b>".

Once in 3d, the line can no longer be a line: it is a <b>plane</b>.  The idea is to then experiment with mapping the data to higher and higher dimensions until a hyperplane can be formed to adequately separate it into the correct classes.
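The "floating" idea can be sketched with a hand-picked mapping. The example below is an assumption-laden illustration, not how an SVM library does it internally: the ball coordinates are invented, and the lifting function `z = x^2 + y^2` is chosen by hand because the blue balls sit inside a ring of green ones.

```python
# Hypothetical 2d data: blue balls cluster near the origin,
# green balls form a ring around them, so no straight line separates them.
blue = [(0.5, 0.0), (-0.3, 0.4), (0.0, -0.6), (0.2, 0.2)]
green = [(2.0, 0.0), (0.0, 2.2), (-1.8, 1.0), (1.5, -1.6)]

def lift(point):
    """Map a 2d point (x, y) to 3d by adding the height z = x^2 + y^2."""
    x, y = point
    return (x, y, x * x + y * y)

blue_3d = [lift(p) for p in blue]
green_3d = [lift(p) for p in green]

# In 3d, the flat sheet z = threshold now sits between the two classes.
# The threshold value is picked by eye for this toy data.
threshold = 2.0
blue_below = all(z < threshold for _, _, z in blue_3d)
green_above = all(z > threshold for _, _, z in green_3d)

print(f"all blue below the sheet:  {blue_below}")
print(f"all green above the sheet: {green_above}")
```

The sheet `z = 2` is a plane in 3d, but projected back to 2d it corresponds to a circle around the blue balls, which is the kind of non-linear boundary the original data needed.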

### 2.2.3. How is SVM's hyperplane different to linear classifiers (e.g. Logistic Regression)?

Unlike other linear classifiers, e.g. Logistic Regression, the <i>motivation</i> of SVMs is to <b>maximise the margin</b>.  In other words, an SVM tries to find the classifier whose decision boundary is furthest away from any data point.
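As a rough sketch of what "maximise the margin" means, the snippet below compares two candidate lines that both classify a toy dataset correctly and prefers the one with the larger margin. The data, the `+1`/`-1` label encoding and the two candidate lines are all assumptions made for illustration. A real SVM searches over all possible hyperplanes rather than a hand-written shortlist.

```python
import math

def signed_margin(points, labels, w, b):
    """Smallest label-adjusted distance to the line w . x + b = 0.

    A negative result means at least one point is on the wrong side.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        label * (sum(wi * xi for wi, xi in zip(w, p)) + b) / norm
        for p, label in zip(points, labels)
    )

# Hypothetical dataset with labels encoded as -1 and +1.
points = [(1.0, 1.0), (2.0, 2.0), (4.0, 4.0), (5.0, 5.0)]
labels = [-1, -1, 1, 1]

# Two hand-picked candidate lines, each given as (w, b). Both separate the data.
candidates = {
    "close to the negative class": ((1.0, 1.0), -4.5),
    "centred between the classes": ((1.0, 1.0), -6.0),
}

margins = {name: signed_margin(points, labels, w, b) for name, (w, b) in candidates.items()}
best = max(margins, key=margins.get)

for name, value in margins.items():
    print(f"{name}: margin = {value:.3f}")
print(f"preferred boundary: {best}")
```

Both lines give zero training errors, so a criterion based only on correct classification cannot tell them apart. The margin criterion picks the centred line.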

## 2.3. Pros and Cons of SVMs

### 2.3.1. Pros

* Can achieve high accuracy


* Works well on smaller cleaner datasets


* Can be more efficient because it uses only a subset of the training points (the support vectors)

### 2.3.2. Cons

* Is not suited to large datasets as the training time with SVMs can be high


* Less effective on noisier datasets with overlapping classes

## 2.4. SVM Uses

SVM is used for:

* Text classification tasks such as category assignment, detecting spam and sentiment analysis. 


* Image recognition challenges, performing particularly well in aspect-based recognition and color-based classification. 


* Handwritten digit recognition, such as in postal automation services.

# 3. Vectors (Brief) Recap

## 3.1. What are vectors?

A vector:

* is an <b>n dimensional object</b>;


* has <b>magnitude</b>, i.e. length; 


* has <b>direction</b>; and


* starts from the origin, (0, 0).

## 3.2. Components Explained

### 3.2.1. Notation

* Vectors are usually identified by the letters of their <b>tail</b> and <b>head</b>, like so: $\overrightarrow{\rm AB}$


* Vector magnitude is denoted by: $\| a \|$.


* Vector direction is denoted by: $\theta$.

### 3.2.2. Calculations

* The most common approach to vector calculations is to first break up vectors into their constituent $x$ and $y$ parts, i.e. horizontal and vertical vectors, like so:

<p align = "center">
<img src="..\Images\vectorComponents.PNG" width="20%"/>
</p>

### 3.2.2. Magnitude (Length)

* Magnitude or length of a vector is written as $\| a \|$, and calculated as follows: $\| a \| = \sqrt{x^2 + y^2}$

### 3.2.3. Direction

* Direction is written as: $\theta = \tan^{-1}\left(\frac{y}{x}\right)$.
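The magnitude and direction formulas can be checked with a short Python sketch. The example vector is made up. Note one deliberate substitution: the code uses `math.atan2(y, x)` rather than a literal `atan(y / x)`, because `atan2` also handles `x = 0` and picks the correct quadrant. For vectors pointing right (`x > 0`) the two agree.

```python
import math

def magnitude(x, y):
    """Length of the vector (x, y): sqrt(x^2 + y^2)."""
    return math.hypot(x, y)

def direction_degrees(x, y):
    """Angle of the vector (x, y), measured anticlockwise from the positive x axis."""
    return math.degrees(math.atan2(y, x))

# Hypothetical vector with horizontal part 3 and vertical part 4.
x, y = 3.0, 4.0

print(f"magnitude = {magnitude(x, y)}")
print(f"direction = {direction_degrees(x, y):.2f} degrees")
```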

### 3.2.4. Operations Explained

See [here](https://www.mathsisfun.com/algebra/vectors.html) via www.mathsisfun.com!

## 3.3. Useful Resources

- https://www.mathsisfun.com/algebra/vectors.html


- http://teachers.henrico.k12.va.us/math/ito_08/10AdditionalTrig/10LES4/vector_notate_n.pdf

# 4. SVM Maths

<i>To be covered</i> using [this article](https://www.jeremyjordan.me/support-vector-machines/) as its inspiration.

# 5. Useful Resources

- https://www.kdnuggets.com/2016/07/support-vector-machines-simple-explanation.html


- https://www.quantstart.com/articles/Support-Vector-Machines-A-Guide-for-Beginners


- https://maviccprp.github.io/a-support-vector-machine-in-just-a-few-lines-of-python-code/


- https://medium.com/@LSchultebraucks/introduction-to-support-vector-machines-9f8161ae2fcb


- http://web.mit.edu/6.034/wwwbob/svm-notes-long-08.pdf


- https://medium.com/machine-learning-101/chapter-2-svm-support-vector-machine-theory-f0812effc72


- https://www.jeremyjordan.me/support-vector-machines/


- https://cling.csd.uwo.ca/cs860/papers/SVM_Explained.pdf