# Support Vector Machines

An SVM classifies a point by defining a decision boundary and then checking which side of that boundary the point falls on. The decision boundary is learned from a training set of labeled points.

Decision boundaries are easiest to wrap your head around when the data has two features. In this case, the decision boundary is a line. Take a look at the example below.

![image.png](attachment:image.png)

After finding a decision boundary using the training set, you could give the SVM an unlabeled data point, and it will predict whether or not that team will make the playoffs.

Decision boundaries exist even when your data has more than two features. With three features the boundary is a plane, and with more than three it is a hyperplane.
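As a minimal sketch of what "which side of the boundary" means, a linear decision boundary can be written as w·x + b = 0, and the sign of w·x + b tells you which side a point is on. The weights `w`, the offset `b`, and the helper `side_of_boundary` below are hypothetical values picked for illustration, not something an SVM produced.

```python
# Hypothetical boundary: the line 3*x + 1*y - 17 = 0 in a two-feature space.
w = [3.0, 1.0]
b = -17.0

def side_of_boundary(point, w, b):
    """Return 1 if the point lies on the positive side of w.x + b = 0, else 0."""
    score = sum(w_i * x_i for w_i, x_i in zip(w, point)) + b
    return 1 if score > 0 else 0

print(side_of_boundary([3, 4], w, b))   # score: 3*3 + 4 - 17 = -4
print(side_of_boundary([6, 7], w, b))   # score: 3*6 + 7 - 17 = 8

# With three features the boundary is a plane, and the code is unchanged.
print(side_of_boundary([1, 2, 3], [1.0, 1.0, 1.0], -5.0))  # score: 1 + 2 + 3 - 5 = 1
```

The same function works for any number of features, because it only relies on `w` and the point having the same length.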

## Optimal decision boundaries

In general, we want our decision boundary to be as far away from training points as possible. The distance between the boundary and the nearest training points is called the margin.

![image-2.png](attachment:image-2.png)
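To make "as far away from training points as possible" concrete, the sketch below scores two candidate lines by the distance from each line to its nearest training point. The points and both candidate lines are made-up values for illustration, and `margin` is a hypothetical helper, not a library function.

```python
import math

# Made-up training points: the first three belong to one class, the last three to the other.
points = [[1, 2], [1, 5], [2, 2], [7, 5], [9, 4], [8, 2]]

def margin(w, b, points):
    """Distance from the line w.x + b = 0 to the closest point."""
    norm = math.hypot(w[0], w[1])
    return min(abs(w[0] * x + w[1] * y + b) / norm for x, y in points)

# Two candidate boundaries that both separate the two groups.
candidate_a = ([1.0, 0.0], -3.0)    # vertical line x = 3
candidate_b = ([3.0, 1.0], -17.0)   # line 3x + y = 17

print(margin(*candidate_a, points))
print(margin(*candidate_b, points))
```

Both lines classify every training point correctly, but the second one keeps a larger distance to its nearest point, so it is the better boundary by this criterion.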

**Support vectors** are the points in the training set closest to the decision boundary. These vectors are crucial in defining the decision boundary, which is where the “support” comes from. There is always at least one support vector from each class, and there are often far fewer support vectors than training points.

![image-3.png](attachment:image-3.png)

Because the support vectors are so critical in defining the decision boundary, many of the other training points can be ignored. This is one of the advantages of SVMs. Many supervised machine learning algorithms use every training point in order to make a prediction, even though many of those training points aren’t relevant. Once an SVM is trained, its predictions depend only on the support vectors, which keeps prediction fast.

## Scikit-learn


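```python
from sklearn.svm import SVC

# Linear kernel: the decision boundary is a straight line in two dimensions.
classifier = SVC(kernel='linear')

# Six training points with two features each, and one label per point.
training_points = [[1, 2], [1, 5], [2, 2], [7, 5], [9, 4], [8, 2]]
labels = [1, 1, 1, 0, 0, 0]
classifier.fit(training_points, labels)

# Predict the class of two unlabeled points.
print(classifier.predict([[3, 4]]))
print(classifier.predict([[6, 7]]))
```

```
[1]
[0]
```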
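The support vectors described earlier can be inspected on a fitted classifier. The sketch below refits the same six points and reads scikit-learn's `support_vectors_`, `n_support_`, `coef_`, and `intercept_` attributes. No expected output is shown, because which points are reported can depend on the solver.

```python
from sklearn.svm import SVC

training_points = [[1, 2], [1, 5], [2, 2], [7, 5], [9, 4], [8, 2]]
labels = [1, 1, 1, 0, 0, 0]

classifier = SVC(kernel='linear')
classifier.fit(training_points, labels)

# Coordinates of the training points the model kept as support vectors.
print(classifier.support_vectors_)
# Number of support vectors per class, in the order of classifier.classes_.
print(classifier.n_support_)
# With a linear kernel, the boundary is coef_ . x + intercept_ = 0.
print(classifier.coef_, classifier.intercept_)
```

Note that `coef_` is only available when the kernel is linear.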


## Outliers

SVMs try to maximize the size of the margin while still correctly separating the points of each class. As a result, outliers can be a problem. Consider the image below.

![image.png](attachment:image.png)

SVMs have a parameter C that determines how much error the SVM will allow for. If C is large, then the SVM has a hard margin — it won’t allow for many misclassifications, and as a result, the margin could be fairly small. If C is too large, the model runs the risk of overfitting. It relies too heavily on the training data, including the outliers.

On the other hand, if C is small, the SVM has a soft margin. Some points might fall on the wrong side of the line, but the margin will be large. This is resistant to outliers, but if C gets too small, you run the risk of underfitting. The SVM will allow for so much error that the training data won’t be represented.
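A rough way to see this trade-off is to fit the same data with a small and a large C and compare the margin width. The data below reuses the six points from the scikit-learn section and adds one made-up outlier, and the two C values are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.svm import SVC

# The point [6, 4] is a made-up class-1 outlier sitting close to class 0.
training_points = [[1, 2], [1, 5], [2, 2], [6, 4], [7, 5], [9, 4], [8, 2]]
labels = [1, 1, 1, 1, 0, 0, 0]

margins = {}
for C in (0.01, 1000):
    classifier = SVC(kernel='linear', C=C)
    classifier.fit(training_points, labels)
    # For a linear SVM the margin width is 2 / ||w||, where w is coef_.
    margins[C] = 2 / np.linalg.norm(classifier.coef_)
    # Print C, the margin width, and the predicted class of the outlier.
    print(C, margins[C], classifier.predict([[6, 4]]))
```

The small C should produce the wider margin, at the cost of being willing to put the outlier on the wrong side. The large C squeezes the margin so that the outlier is still classified correctly.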


## Kernels

![image.png](attachment:image.png)

It’s impossible to draw a straight line to separate the red points from the blue points!

Luckily, SVMs have a way of handling these data sets. Remember when we set kernel = 'linear' when creating our SVM? Kernels are the key to creating a decision boundary between data points that are not linearly separable.

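Nonlinear kernels effectively add extra dimensions to the data, so that a boundary that is flat in the larger space can separate points that no straight line could separate in the original space.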
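The idea of adding a dimension can be sketched by hand. The example below uses made-up data, with one class near the origin and the other in a ring around it, and a hypothetical helper `add_dimension` that appends x² + y² as a third coordinate. This explicit mapping is only an illustration: a kernel SVM gets a comparable effect without ever constructing the new coordinates.

```python
import math

# Made-up data: one class sits near the origin, the other forms a ring around it.
inner = [[0.5 * math.cos(t), 0.5 * math.sin(t)] for t in range(8)]
outer = [[3.0 * math.cos(t), 3.0 * math.sin(t)] for t in range(8)]

def add_dimension(point):
    """Map (x, y) to (x, y, x^2 + y^2)."""
    x, y = point
    return [x, y, x * x + y * y]

inner_3d = [add_dimension(p) for p in inner]
outer_3d = [add_dimension(p) for p in outer]

# In the third coordinate the classes no longer overlap,
# so a flat plane such as z = 1 separates them.
print(max(p[2] for p in inner_3d))   # about 0.25
print(min(p[2] for p in outer_3d))   # about 9.0
```

No straight line separates the two groups in the original two dimensions, but in three dimensions a plane does.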

## Radial Basis Function Kernel

The most commonly used kernel in SVMs is a radial basis function (rbf) kernel. This is the default kernel used in scikit-learn’s SVC object. If you don’t set the kernel to something else, such as "linear" or "poly", the SVC object will use an rbf kernel. If you want to be explicit, you can set kernel = "rbf", although that is redundant.

```python
classifier = SVC(kernel="rbf", gamma=0.5, C=2)
```

`gamma` plays a role similar to the C parameter: it tunes how sensitive the model is to the training data. It controls how far the influence of a single training point reaches. A higher gamma, say 100, keeps each point’s influence very local, puts more importance on the individual training points, and could result in overfitting. Conversely, a lower gamma like 0.01 spreads each point’s influence widely, makes individual training points less relevant, and can result in underfitting.
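One way to get a feel for gamma is to fit the same data with several values and compare the results. The sketch below reuses the six points from the scikit-learn section with C fixed at 2. The three gamma values are arbitrary choices, and the data set is far too small to draw real conclusions from.

```python
from sklearn.svm import SVC

training_points = [[1, 2], [1, 5], [2, 2], [7, 5], [9, 4], [8, 2]]
labels = [1, 1, 1, 0, 0, 0]

scores = {}
for gamma in (0.01, 0.5, 100):
    classifier = SVC(kernel='rbf', gamma=gamma, C=2)
    classifier.fit(training_points, labels)
    # score() here is accuracy on the training points themselves.
    scores[gamma] = classifier.score(training_points, labels)
    # Print gamma, training accuracy, and the number of support vectors.
    print(gamma, scores[gamma], len(classifier.support_vectors_))
```

Training accuracy on its own cannot reveal overfitting, since a very high gamma can fit the training points perfectly. In practice you would compare these models on held-out data.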



```python
for i in range(0, 100, 10):
    print(i)
```

```
0
10
20
30
40
50
60
70
80
90
```
