## Module 01: Supervised Learning

### Lesson 06: Support Vector Machines

> Support vector machines are a common method used for classification problems. They have been proven effective using what is known as the 'kernel' trick!

#### 01. Intro

Support vector machines are a powerful classification algorithm that aims not only to classify the data but also to find the best possible boundary, namely the one that maintains the largest distance from the points.

#### 02. Which line is better?

#### 03. Minimizing Distances

#### 04. Error Function Intuition

$\text{Error} = \text{Classification Error} + \text{Margin Error}$

#### 05. Perceptron Algorithm

The perceptron algorithm minimizes this error using gradient descent in order to find the W and b that give us the best possible cut.

#### 06. Classification Error

The classification error is measured from the bottom line and the top line separately.
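The lesson does not write out a formula here, so the following is only a sketch under an assumption: labels are +1 and -1, the top and bottom lines are $Wx + b = 1$ and $Wx + b = -1$, and each point is penalized by how far it falls on the wrong side of its own line (a hinge-style penalty). The function name `classification_error` and the sample values are made up for illustration.

```python
import numpy as np

def classification_error(X, y, W, b):
    """Sum of hinge-style penalties; y must contain +1 / -1 labels."""
    scores = X @ W + b
    # A point contributes 0 once it is on the correct side of its own line
    # (score >= 1 for label +1, score <= -1 for label -1).
    return np.sum(np.maximum(0.0, 1.0 - y * scores))

X = np.array([[2.0, 0.0], [0.5, 0.0], [-2.0, 0.0]])
y = np.array([1, 1, -1])
W = np.array([1.0, 0.0])
b = 0.0

# Only the middle point sits inside the margin, so it is the only one penalized.
print(classification_error(X, y, W, b))  # prints 0.5
```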

#### 07. Margin Error

* Large margin, small error; Small margin, large error
* $\text{Margin} = \frac{2}{|W|}$
* $\text{Error} = |W|^2$
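To see the two formulas above move in opposite directions, here is a small sketch. The helper name `margin_and_error` and the weight vectors are hypothetical values chosen only for illustration.

```python
import numpy as np

def margin_and_error(W):
    """Return (margin, margin error) for a weight vector W."""
    norm_W = np.linalg.norm(W)
    return 2.0 / norm_W, norm_W ** 2

# |W| = 5: a narrow margin and a large margin error.
margin, error = margin_and_error(np.array([3.0, 4.0]))

# |W| = 0.5: shrinking W widens the margin and lowers the margin error.
wide_margin, small_error = margin_and_error(np.array([0.3, 0.4]))

print(margin, error)
print(wide_margin, small_error)
```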

#### 08. (Optional) Margin Error Calculation

#### 09. Error Function

We minimize the error function using gradient descent.
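A minimal sketch of that minimization, not the course's implementation. It assumes +1 / -1 labels, a hinge-style classification error (points with $y(Wx + b) < 1$ are penalized), and a margin error of $|W|^2$. The function name `fit_svm_gd`, the learning rate, and the toy data are all made up for illustration.

```python
import numpy as np

def fit_svm_gd(X, y, learning_rate=0.01, epochs=1000):
    """Gradient descent on (hinge-style classification error) + |W|^2."""
    W = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        scores = X @ W + b
        violating = y * scores < 1  # points that still add classification error
        # The gradient of |W|^2 is 2W; each violating point adds -y * x (and -y for b).
        grad_W = 2 * W - (y[violating][:, None] * X[violating]).sum(axis=0)
        grad_b = -y[violating].sum()
        W -= learning_rate * grad_W
        b -= learning_rate * grad_b
    return W, b

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
y = np.array([1, 1, -1, -1])

W, b = fit_svm_gd(X, y)
print(np.sign(X @ W + b))  # the sign of the score is the predicted label
```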

#### 10. The C Parameter

$\text{Error} = C \cdot \text{Classification Error} + \text{Margin Error}$, where $C$ is a constant.

* Small C $\to$ large margin $\to$ may make classification errors
* Large C $\to$ small margin $\to$ classifies points well
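The effect of C on the margin can be checked with a linear `SVC`, whose `coef_` attribute holds W. This is a sketch on made-up data; the six points and the two C values are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: two classes lying along a diagonal.
X = np.array([[1, 1], [2, 2], [3, 3], [-1, -1], [-2, -2], [-3, -3]], dtype=float)
y = np.array([1, 1, 1, 0, 0, 0])

margins = {}
for C in (0.01, 100.0):
    model = SVC(kernel='linear', C=C).fit(X, y)
    # coef_ holds W for a linear kernel, so the margin is 2 / |W|.
    margins[C] = 2.0 / np.linalg.norm(model.coef_)
    print(f"C={C}: margin width = {margins[C]:.2f}")
```

The small C should report the wider margin, matching the first bullet above.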

#### 11. Polynomial Kernel 1

#### 12. Polynomial Kernel 2

#### 13. Polynomial Kernel 3

A kernel is a set of functions that help us separate data that a straight line alone cannot.
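One way to picture this, sketched on made-up data: points on a line where one class sits between the other cannot be split by a single threshold, but adding the function $x^2$ as a second coordinate makes them separable by a straight line. The explicit feature `x ** 2` stands in for what a degree-2 polynomial kernel does implicitly; the data and variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.svm import SVC

x = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = np.array([0, 0, 1, 1, 1, 0, 0])  # class 1 sits between the class 0 points

X_line = x.reshape(-1, 1)               # original 1-D data
X_plane = np.column_stack([x, x ** 2])  # same data with x^2 added as a feature

acc_line = SVC(kernel='linear').fit(X_line, y).score(X_line, y)
acc_plane = SVC(kernel='linear').fit(X_plane, y).score(X_plane, y)

print("linear boundary on x only:   ", acc_line)
print("linear boundary on (x, x^2): ", acc_plane)
```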

#### 14. RBF Kernel 1

RBF: radial basis function kernel (Gaussian kernel function)


#### 15. RBF Kernel 2

#### 16. RBF Kernel 3

* Large $\gamma$ tends to overfit; small $\gamma$ tends to underfit
* Normal distribution: $y = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
* $\gamma = \frac{1}{2\sigma^2}$

As long as we think of gamma as some parameter that is associated with the width of the curve in an inverse way, then we are grasping the concept of the gamma parameter and the RBF kernel.
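The inverse relationship between gamma and the width of the curve can be checked numerically. This sketch uses the RBF form $e^{-\gamma \lVert x - c \rVert^2}$, which is the form scikit-learn uses for its `'rbf'` kernel. The helper names `rbf_kernel_value` and `gamma_from_sigma` are made up for illustration.

```python
import numpy as np

def rbf_kernel_value(x, center, gamma):
    """Height of the RBF bump centered at `center`, evaluated at `x`."""
    return np.exp(-gamma * np.sum((x - center) ** 2))

def gamma_from_sigma(sigma):
    """gamma = 1 / (2 * sigma^2): a wide curve (large sigma) means a small gamma."""
    return 1.0 / (2.0 * sigma ** 2)

center = np.zeros(2)
point = np.array([1.0, 0.0])  # one unit away from the center

for sigma in (0.25, 1.0, 4.0):
    gamma = gamma_from_sigma(sigma)
    # Large gamma: the bump has almost vanished one unit away (narrow curve).
    # Small gamma: the bump is still close to 1 (wide curve).
    print(f"sigma={sigma}, gamma={gamma}, value={rbf_kernel_value(point, center, gamma):.4f}")
```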

#### 17. SVMs in sklearn

**Support Vector Machines in sklearn**

For your support vector machine model, you'll be using scikit-learn's SVC class. This class provides the functions to define and fit the model to your data.

```
>>> from sklearn.svm import SVC
>>> model = SVC()
>>> model.fit(x_values, y_values)
```

In the example above, the model variable is a support vector machine model that has been fitted to the data x_values and y_values. Fitting the model means finding the best boundary that fits the training data. Let's make two predictions using the model's predict() function.

```
>>> print(model.predict([ [0.2, 0.8], [0.5, 0.4] ]))
[0. 1.]
```

The model returned an array of predictions, one prediction for each input array. The first input, `[0.2, 0.8]`, got a prediction of `0.`. The second input, `[0.5, 0.4]`, got a prediction of `1.`.
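The snippets above assume that `x_values` and `y_values` already exist. For a version that runs on its own, here is a sketch with six made-up training points; the data is an assumption for illustration and is not the lesson's dataset.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical training data: class 0 in the upper left, class 1 in the lower right.
x_values = np.array([[0.1, 0.9], [0.3, 0.9], [0.2, 0.6],
                     [0.9, 0.1], [0.7, 0.1], [0.8, 0.4]])
y_values = np.array([0, 0, 0, 1, 1, 1])

model = SVC()
model.fit(x_values, y_values)

# predict() returns a 1-D array with one label per input row.
predictions = model.predict([[0.2, 0.8], [0.5, 0.4]])
print(predictions)
```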

**Hyperparameters**

When we define the model, we can specify the hyperparameters. As we've seen in this section, the most common ones are

* `C`: The C parameter.
* `kernel`: The kernel. The most common ones are 'linear', 'poly', and 'rbf'.
* `degree`: If the kernel is polynomial, this is the maximum degree of the monomials in the kernel.
* `gamma`: If the kernel is rbf, this is the gamma parameter.

For example, here we define a model with a polynomial kernel of degree 4, and a C parameter of 0.1.
```
>>> model = SVC(kernel='poly', degree=4, C=0.1)
```
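To get a feel for how much the `kernel` choice matters, the sketch below fits three kernels to the same data and compares training accuracy. The dataset (two concentric rings from `make_circles`) and its settings are assumptions picked for illustration, not the lesson's data.

```python
from sklearn.datasets import make_circles
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC

# Hypothetical data: an inner ring of one class surrounded by a ring of the other.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

scores = {}
for kernel in ('linear', 'poly', 'rbf'):
    # degree is only used by the 'poly' kernel and ignored by the others.
    model = SVC(kernel=kernel, degree=2).fit(X, y)
    scores[kernel] = accuracy_score(y, model.predict(X))
    print(f"{kernel}: training accuracy = {scores[kernel]:.2f}")
```

A straight line cannot separate an inner ring from an outer one, so the linear kernel should score noticeably lower than the rbf kernel here.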

**Support Vector Machines Quiz**

You'll need to complete each of the following steps:

1. Build a support vector machine model
    * Create a support vector machine classification model using scikit-learn's `SVC` and assign it to the variable `model`.
2. Fit the model to the data
    * If necessary, specify some of the hyperparameters. The goal is to obtain an accuracy of 100% in the dataset. Hint: Not every kernel will work well.
3. Predict using the model
    * Predict the labels for the training set, and assign this list to the variable y_pred.
4. Calculate the accuracy of the model
    * For this, use sklearn's `accuracy_score` function.

When you hit Test Run, you'll be able to see the boundary region of your model, which will help you tune the correct parameters, in case you need them.

```
# Import statements
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
import pandas as pd
import numpy as np

# Read the data.
data = np.asarray(pd.read_csv('../../data/svm_data.csv', header=None))
# Assign the features to the variable X, and the labels to the variable y.
X = data[:,0:2]
y = data[:,2]

# TODO: Create the model and assign it to the variable model.
# Find the right parameters for this model to achieve 100% accuracy on the dataset.
model = SVC(kernel='rbf', gamma=27)

# TODO: Fit the model.
model.fit(X, y)

# TODO: Make predictions. Store them in the variable y_pred.
y_pred = model.predict(X)

# TODO: Calculate the accuracy and assign it to the variable acc.
acc = accuracy_score(y, y_pred)
acc
```

```
1.0
```

#### 18. Recap & Additional Resources

**Recap**

In this lesson, you learned about Support Vector Machines (or SVMs). SVMs are a popular algorithm used for classification problems. You saw three different ways that SVMs can be implemented:

* Maximum Margin Classifier
* Classification with Inseparable Classes
* Kernel Methods

1. Maximum Margin Classifier

When your data can be completely separated, the `linear version of SVMs` attempts to maximize the distance from the linear boundary to the closest points (called the `support vectors`).
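In scikit-learn, a fitted `SVC` exposes these closest points through its `support_vectors_` attribute. The sketch below uses six made-up points; with this data the points nearest the boundary are `[1, 1]` and `[-1, -1]`, so they should appear among the support vectors.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical, completely separable data along a diagonal.
X = np.array([[1, 1], [2, 2], [3, 3], [-1, -1], [-2, -2], [-3, -3]], dtype=float)
y = np.array([1, 1, 1, 0, 0, 0])

model = SVC(kernel='linear').fit(X, y)

# The points that pin down the boundary; the remaining points do not affect it.
support_vectors = {tuple(v) for v in model.support_vectors_}
print(model.support_vectors_)
```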

2. Classification with Inseparable Classes

Unfortunately, data in the real world is rarely completely separable. For this reason, we introduced a new `hyper-parameter called C`. The C hyper-parameter determines `how flexible we are willing to be with the points that fall on the wrong side of our dividing boundary`. The value of C ranges between 0 and infinity. When C is large, you are forcing your boundary to have fewer errors than when it is a small value. (`Small C, Large margin; Large C, Small margin`)

Note: when `C is too large` for a particular set of data, you `might not get convergence` at all because your data cannot be separated with the small number of errors allotted with such a large value of C.

3. Kernels

Finally, we looked at what makes SVMs truly powerful: kernels. Kernels in SVMs allow us to separate data when the boundary between them is `nonlinear`. Specifically, you saw two types of kernels:

* polynomial
* rbf

By far the most popular kernel is the rbf kernel (which stands for radial basis function). The rbf kernel lets you classify points that seem hard to separate in any space. This is a density-based approach that looks at the closeness of points to one another. This introduces another `hyper-parameter gamma`. When gamma is large, the outcome is similar to having a large value of C; that is, your algorithm will attempt to classify every point correctly. Alternatively, small values of gamma will try to cluster in a more general way that will make more mistakes, but may perform better when it sees new data.
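A rough way to observe this trade-off is to compare accuracy on the training data with accuracy on held-out data for several gamma values. The sketch below uses a made-up noisy dataset from `make_moons`; the dataset, split, and gamma values are assumptions for illustration, and the exact numbers will depend on them.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Hypothetical noisy two-class data, split into training and held-out sets.
X, y = make_moons(n_samples=200, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

results = {}
for gamma in (0.001, 1.0, 1000.0):
    model = SVC(kernel='rbf', gamma=gamma).fit(X_train, y_train)
    # Store (training accuracy, held-out accuracy) for each gamma.
    results[gamma] = (model.score(X_train, y_train), model.score(X_test, y_test))
    print(f"gamma={gamma}: train={results[gamma][0]:.2f}, test={results[gamma][1]:.2f}")
```

A very large gamma should fit the training points almost perfectly, and the gap between its training and held-out accuracy is the sign of overfitting to look for.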