# Introduction
SVM stands for Support Vector Machine. SVMs are typically used for classification tasks, much like K-Nearest Neighbors. They work very well on high-dimensional data and allow us to classify data that is not linearly separable, for example a data set like the one below.
![svm-data.png](attachment:svm-data.png)

Attempting to use K-Nearest Neighbors on this data would likely give us a very low accuracy score. This is where SVMs are useful.

# Importing modules

In [1]:
import sklearn
from sklearn import svm
from sklearn import datasets

# Data
The data sets come from sklearn. They are much easier and nicer to work with and have some handy methods that make loading data very quick.
We will be using the breast cancer data set. It consists of many features describing a tumor and classifies each tumor as either malignant (cancerous) or benign (non-cancerous).


In [2]:
# load our data
cancer = datasets.load_breast_cancer()

In [3]:
# See list of features
print("Features: ", cancer.feature_names)

Features:  ['mean radius' 'mean texture' 'mean perimeter' 'mean area'
 'mean smoothness' 'mean compactness' 'mean concavity'
 'mean concave points' 'mean symmetry' 'mean fractal dimension'
 'radius error' 'texture error' 'perimeter error' 'area error'
 'smoothness error' 'compactness error' 'concavity error'
 'concave points error' 'symmetry error' 'fractal dimension error'
 'worst radius' 'worst texture' 'worst perimeter' 'worst area'
 'worst smoothness' 'worst compactness' 'worst concavity'
 'worst concave points' 'worst symmetry' 'worst fractal dimension']


In [4]:
# see labels
print("Labels: ", cancer.target_names)

Labels:  ['malignant' 'benign']
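Before splitting, it can help to check how much data there is. The sketch below is an addition rather than part of the original notebook; it reloads the data set so that it runs on its own and uses NumPy's `bincount` to count the instances of each label.

```python
import numpy as np
from sklearn import datasets

cancer = datasets.load_breast_cancer()

# Number of instances and number of features per instance
n_samples, n_features = cancer.data.shape
print(n_samples, n_features)

# Count how many instances carry each label (0 = malignant, 1 = benign)
counts = np.bincount(cancer.target)
for name, count in zip(cancer.target_names, counts):
    print(name, count)
```

The two classes are not the same size, which is worth keeping in mind when reading accuracy scores later on.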


# Splitting Data

In [5]:
# Now split it into training and testing data.
from sklearn.model_selection import train_test_split

x = cancer.data  # All of the features
y = cancer.target  # The labels (0 = malignant, 1 = benign)

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
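Because the split is random, the accuracy scores later in this notebook will change from run to run. If reproducible numbers are wanted, one option is to fix the seed. This is a minimal sketch, not the notebook's own cell: `random_state=42` is an arbitrary choice and `stratify` is an optional extra that keeps the class proportions similar in both parts.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

cancer = datasets.load_breast_cancer()

# Fixing random_state makes the same rows land in the test set on every run.
# stratify keeps the malignant/benign ratio roughly equal in both parts.
x_train, x_test, y_train, y_test = train_test_split(
    cancer.data,
    cancer.target,
    test_size=0.2,
    random_state=42,
    stratify=cancer.target,
)

print(x_train.shape, x_test.shape)  # (455, 30) (114, 30)
```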

In [6]:
# To look at our data we can print the first few instances
print(x_train[:5], y_train[:5])

[[1.532e+01 1.727e+01 1.032e+02 7.133e+02 1.335e-01 2.284e-01 2.448e-01
  1.242e-01 2.398e-01 7.596e-02 6.592e-01 1.059e+00 4.061e+00 5.946e+01
  1.015e-02 4.588e-02 4.983e-02 2.127e-02 1.884e-02 8.660e-03 1.773e+01
  2.266e+01 1.198e+02 9.288e+02 1.765e-01 4.503e-01 4.429e-01 2.229e-01
  3.258e-01 1.191e-01]
 [1.283e+01 2.233e+01 8.526e+01 5.032e+02 1.088e-01 1.799e-01 1.695e-01
  6.861e-02 2.123e-01 7.254e-02 3.061e-01 1.069e+00 2.257e+00 2.513e+01
  6.983e-03 3.858e-02 4.683e-02 1.499e-02 1.680e-02 5.617e-03 1.520e+01
  3.015e+01 1.053e+02 7.060e+02 1.777e-01 5.343e-01 6.282e-01 1.977e-01
  3.407e-01 1.243e-01]
 [1.113e+01 1.662e+01 7.047e+01 3.811e+02 8.151e-02 3.834e-02 1.369e-02
  1.370e-02 1.511e-01 6.148e-02 1.415e-01 9.671e-01 9.680e-01 9.704e+00
  5.883e-03 6.263e-03 9.398e-03 6.189e-03 2.009e-02 2.377e-03 1.168e+01
  2.029e+01 7.435e+01 4.211e+02 1.030e-01 6.219e-02 4.580e-02 4.044e-02
  2.383e-01 7.083e-02]
 [1.350e+01 1.271e+01 8.569e+01 5.662e+02 7.376e-02 3.614e-02 2.758

# What Does an SVM Do?
An SVM has many applications, but in machine learning it is typically used for classification. It is a powerful tool and a good choice for classifying complicated data with a large number of dimensions (features). Note that K-Nearest Neighbors tends not to perform well on high-dimensional data.

# How A Support Vector Machine Works
In short, a support vector machine works by dividing data into classes using something called a hyper-plane. A hyper-plane is a fancy word for something flat that can divide data points. In 2D space a hyper-plane is simply a line, and in 3D space it is a plane. In any space higher than 3D it is simply called a hyper-plane.

Here is an example of a hyper-plane for the data points on the 2D graph below.
![svm2.png](attachment:svm2.png)
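To make the idea concrete, a hyper-plane can be written as `w . x + b = 0`, and a point is classified by the side it falls on. The sketch below uses made-up values for `w`, `b` and the points (they are not taken from the image) and a hypothetical helper named `side_of_hyperplane`.

```python
import numpy as np

# Hypothetical hyper-plane in 2D: w . x + b = 0 (here the vertical line x1 = 2)
w = np.array([1.0, 0.0])
b = -2.0


def side_of_hyperplane(points, w, b):
    """Return +1 for points on the positive side of the hyper-plane, -1 otherwise."""
    return np.where(points @ w + b >= 0, 1, -1)


points = np.array([[0.5, 1.0], [3.0, -1.0], [1.9, 4.0]])
labels = side_of_hyperplane(points, w, b)
print(labels)  # [-1  1 -1]
```

Training an SVM amounts to choosing `w` and `b`; prediction is just this sign check.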

# Hyper-Planes

When we create a hyper-plane we need to do the following. We must pick two points, one from each class, that are known as our support vectors. These points must be the closest points to the hyper-plane, and their distances from the hyper-plane must be identical. In the example above the two circled points are our support vectors: their distance to the hyper-plane is the same and they are also the closest points. With this rule we can actually create an infinite number of hyper-planes.
![svm4.png](attachment:svm4.png)
![svm3.png](attachment:svm3.png)

Both images above are valid hyper-planes.
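scikit-learn exposes the support vectors of a fitted model through the `support_vectors_` attribute. The sketch below fits a linear SVC on six made-up 2D points (an assumption for illustration, not the points in the images) and prints the ones the model chose. A large `C` is used so the model behaves roughly like the hard-margin case described here.

```python
import numpy as np
from sklearn import svm

# Two small, clearly separated groups of made-up points
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
              [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# A large C approximates a hard margin on separable data
clf = svm.SVC(kernel="linear", C=1000)
clf.fit(X, y)

# Only the points closest to the hyper-plane are kept as support vectors
print(clf.support_vectors_)
print("Support vectors per class:", clf.n_support_)
```

Note that a fitted model may keep more than two support vectors when several points are equally close to the hyper-plane.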

# Picking a Hyper Plane
Once we create a hyper-plane we are going to use it to classify our data. If a test point is on the left side of the plane we would classify it as red (in our examples above) and if it is on the right we would classify it as green. So how can we pick a hyper-plane that will give us the best classification predictions?

Have a look at the hyper-planes above and determine which you think would give the best classification for a mystery test point. What do you notice about the hyper-plane?

Well, the best hyper-plane is the one in the first hyper-plane image above. Notice that the distance between the support vectors and the hyper-plane is far greater than for the other hyper-planes.

When we pick a hyper-plane we want to pick one that has the greatest possible margin.

# Margin
The margin is the distance between the hyper-plane and the closest points of each class (the support vectors) in our training data. The blue lines below show the margin for this particular data and hyper-plane. Typically, the greater our margin the better our classification will be.
![svm5.png](attachment:svm5.png)

Note: Imagine the blue lines are parallel to the black...
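For a linear SVM the margin can be computed from the fitted weights: the gap between the two margin lines is `2 / ||w||`. The sketch below checks this on made-up points (not the data in the image), where the closest points of the two classes lie on the lines `x1 + x2 = 3` and `x1 + x2 = 8`.

```python
import numpy as np
from sklearn import svm

# Made-up, linearly separable points
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
              [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = svm.SVC(kernel="linear", C=1000).fit(X, y)

# coef_ holds w for a linear kernel; the full margin width is 2 / ||w||
w = clf.coef_[0]
margin_width = 2 / np.linalg.norm(w)

# Distance between the lines x1 + x2 = 3 and x1 + x2 = 8
expected_width = 5 / np.sqrt(2)
print(margin_width, expected_width)
```

The two printed numbers should agree up to the solver's tolerance.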

# Kernels
Let us say our data isn't as pretty and we have some points that look like this:
![svm6.png](attachment:svm6.png)

Can you determine which hyper-plane would be the best for this data? Even if you could it would make a horrible classifier. This is where we introduce something called kernels.

Kernels provide a way for us to create a hyper-plane for data like that seen above. We use a kernel to bring our data up to a higher dimension (in this case from 2D to 3D). We hope that by doing this our points will be arranged in a way that lets us divide them using a hyper-plane.

By applying a kernel to our data above we hope to get something that looks like the following:

![svm8.png](attachment:svm8.png)

You can see that we can now divide our points with a plane in 3D. By applying the kernel our data has become separable.

# What is a Kernel?

In simplified terms, a kernel is a function that takes our features as input (x1, x2 in our example) and returns a value equal to the third-dimensional coordinate (x3). An example of a kernel could be the equation:
(x1)^2 + (x2)^2 = x3
![svm7.png](attachment:svm7.png)
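The effect of this equation can be tried out directly. The sketch below is an illustration under assumptions: it uses scikit-learn's `make_circles` as a stand-in for the ring-shaped data in the images, adds `x3 = (x1)^2 + (x2)^2` as a third column, and compares a linear SVC on the 2D data with one on the lifted 3D data.

```python
import numpy as np
from sklearn import datasets, svm

# Stand-in data: one class forms a ring around the other
X, y = datasets.make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Lift the data to 3D using x3 = x1^2 + x2^2
x3 = X[:, 0] ** 2 + X[:, 1] ** 2
X_lifted = np.column_stack([X, x3])

# A straight line struggles in 2D, but a plane can split the lifted data
flat_acc = svm.SVC(kernel="linear").fit(X, y).score(X, y)
lifted_acc = svm.SVC(kernel="linear").fit(X_lifted, y).score(X_lifted, y)

print("2D accuracy:", flat_acc)
print("3D accuracy:", lifted_acc)
```

In practice we do not add the column by hand; passing a non-linear `kernel` to `SVC` achieves a similar effect implicitly.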

Typically when we use a kernel we use a pre-existing one. There is much debate about which kernel is best, but here are some examples of popular kernels:
- Linear
- Polynomial
- Circular
- Hyperbolic Tangent (Sigmoid)

# Soft & Hard Margin

The last topic to touch on is soft and hard margins. A hard margin is precisely what you've learned already: no points may exist inside the margin. However, if we have outlier points we sometimes want to allow them to exist inside the margin and use points that are not the closest to the hyper-plane as our support vectors. Doing this is called a soft margin.

![svm9.png](attachment:svm9.png)

You can see in the example above that there is a point that exists inside the margin. If we had not allowed this not only would it be difficult to create a hyper-plane but our classification would perform poorly.

The number of points we allow to exist inside the margin is something we can control with a hyper-parameter.
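In scikit-learn this hyper-parameter is the `C` argument of `SVC`. The sketch below is a rough illustration on made-up overlapping blobs (generated with `make_blobs`, not the data in the image). It fits a linear SVC for several values of `C` and prints the resulting margin width and number of support vectors.

```python
import numpy as np
from sklearn import datasets, svm

# Two made-up, overlapping clusters so that some points must violate the margin
X, y = datasets.make_blobs(
    n_samples=100, centers=[[0, 0], [3, 3]], cluster_std=1.5, random_state=0
)

widths = {}
n_support = {}
for C in (0.01, 1, 100):
    clf = svm.SVC(kernel="linear", C=C).fit(X, y)
    widths[C] = 2 / np.linalg.norm(clf.coef_)  # full margin width
    n_support[C] = len(clf.support_vectors_)
    print(f"C={C}: margin width={widths[C]:.2f}, support vectors={n_support[C]}")
```

A small `C` tolerates more points inside the margin, so the margin is wider and typically more points become support vectors; a large `C` penalizes violations heavily and narrows the margin.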

# Implementing a SVM
We can simply create a new model and call .fit() on our training data

In [20]:
from sklearn import svm

clf = svm.SVC()
clf.fit(x_train, y_train)

SVC(C=1.0, break_ties=False, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='scale', kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)

In [21]:
# To score our model we will use a useful tool from the sklearn module
from sklearn import metrics

y_pred = clf.predict(x_test)  # Predict values for our test data
acc = metrics.accuracy_score(y_test, y_pred)  # Test them against our correct values
print(acc)

0.9385964912280702


# Adding a Kernel
We never specified a kernel, so the model above used the default, rbf (see the printed parameters). Choosing a kernel that suits the data can increase our accuracy.

Kernel Options:
- linear
- poly
- rbf
- sigmoid
- precomputed

We will use linear for this data-set.

In [12]:
clf = svm.SVC(kernel="linear")

In [14]:
acc

0.9385964912280702
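One way to choose between the kernel options is simply to try each of them on the same split. The sketch below is an assumption-laden comparison, not the notebook's own cell: it reloads the data, fixes `random_state=0` so that every kernel sees the same train/test split, and skips `precomputed` because that option expects a kernel matrix rather than raw features.

```python
from sklearn import datasets, metrics, svm
from sklearn.model_selection import train_test_split

cancer = datasets.load_breast_cancer()
x_train, x_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, test_size=0.2, random_state=0
)

# Fit one model per kernel and record its accuracy on the held-out data
results = {}
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    clf = svm.SVC(kernel=kernel)
    clf.fit(x_train, y_train)
    y_pred = clf.predict(x_test)
    results[kernel] = metrics.accuracy_score(y_test, y_pred)

for kernel, acc in results.items():
    print(f"{kernel}: {acc:.3f}")
```

The exact numbers depend on the split, so a difference of a few test points between kernels should not be over-interpreted.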

# Changing the Margin
By default SVC uses a soft margin with C=1. C controls how strongly points inside the margin are penalized: a smaller C gives a softer margin (more points are allowed inside it), while a larger C pushes the model towards a hard margin. C must be strictly positive, so it cannot be set to 0. Playing with this value should alter your result slightly.
If you want to play around with some other parameters have a look here: https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html

In [19]:
clf = svm.SVC(kernel = "linear", C=2)

# Comparing to KNearest Neighbors
If we want to see how this algorithm performs in comparison to KNN we can run the KNN classifier on the data set and compare our accuracy values.
Changing to a KNN classifier is quite simple:
from sklearn.neighbors import KNeighborsClassifier

clf = KNeighborsClassifier(n_neighbors=11)

Simply change clf to what is above

Note that KNN still does well on this data set but hovers around the 90% mark.
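For a side-by-side comparison, both classifiers can be trained and scored on the same split. This is a minimal sketch that reloads the data so it runs on its own; `random_state=0` is an arbitrary choice to make the comparison repeatable, and the scores it prints will differ from the ones shown elsewhere in this notebook.

```python
from sklearn import datasets, metrics, svm
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

cancer = datasets.load_breast_cancer()
x_train, x_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, test_size=0.2, random_state=0
)

# Both models share the same fit/predict interface
models = {
    "SVM (linear)": svm.SVC(kernel="linear"),
    "KNN (k=11)": KNeighborsClassifier(n_neighbors=11),
}

scores = {}
for name, model in models.items():
    model.fit(x_train, y_train)
    scores[name] = metrics.accuracy_score(y_test, model.predict(x_test))
    print(f"{name}: {scores[name]:.3f}")
```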

# Final Code


In [22]:
clf = svm.SVC(kernel="linear")
clf.fit(x_train,y_train)

SVC(C=1.0, break_ties=False, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='scale', kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)

In [24]:
y_pred = clf.predict(x_test)

acc = metrics.accuracy_score(y_test, y_pred)
print(acc)

0.9824561403508771
