# Support Vector Machines

Support Vector Machines (SVMs) are a powerful supervised learning method used for both **classification** and **regression**. For binary classification, an SVM finds the hyperplane that separates the two classes with the largest possible margin, i.e. the largest distance to the closest training points of either class.
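
For reference, the margin maximization can be written as an optimization problem. The following is the standard hard-margin formulation for linearly separable data, assuming labels $y_i \in \{-1, +1\}$; the symbols $w$ (normal vector of the hyperplane) and $b$ (offset) are notation introduced here for illustration.

$$
\min_{w,\, b} \; \frac{1}{2} \lVert w \rVert^2
\quad \text{subject to} \quad
y_i \, (w^\top x_i + b) \ge 1 \quad \text{for all } i
$$

The two margin boundaries are the sets where $w^\top x + b = \pm 1$. Their distance is $2 / \lVert w \rVert$, so minimizing $\lVert w \rVert$ maximizes the margin.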

In [None]:
%matplotlib inline

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn

from sklearn.datasets import make_blobs, make_circles
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Use seaborn's default plot styling
seaborn.set()

## A Simple Example

In [None]:
X, y = make_blobs(n_samples=60, centers=2, random_state=0, cluster_std=0.60)
plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='spring')
plt.xlim(-1, 3.5);

We fit a support vector machine with a linear kernel.

In [None]:
clf = SVC(kernel='linear')
clf.fit(X, y)

We plot the decision boundary. In the following plot the dashed lines touch the *support vectors*, which are stored in the ``support_vectors_`` attribute of the classifier.

In [None]:
def plot_svc_decision_function(clf, ax=None):
    """Plot the decision boundary and margins of a fitted 2D SVC."""
    if ax is None:
        ax = plt.gca()
    # Evaluate the decision function on a 30x30 grid spanning the current axes
    x = np.linspace(*ax.get_xlim(), 30)
    y = np.linspace(*ax.get_ylim(), 30)
    Y, X = np.meshgrid(y, x)
    grid = np.column_stack([X.ravel(), Y.ravel()])
    P = clf.decision_function(grid).reshape(X.shape)
    # Draw the boundary (level 0) and the two margins (levels -1 and +1)
    ax.contour(X, Y, P, colors='k', levels=[-1, 0, 1], alpha=0.5, linestyles=['--', '-', '--'])
    
plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='spring')
plot_svc_decision_function(clf)

We additionally highlight the support vectors.

In [None]:
plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='spring')
plt.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1], s=200, facecolors='none',edgecolors="black");
plot_svc_decision_function(clf)
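
To connect the plot to the fitted model, the following sketch reads the hyperplane parameters from the classifier and checks where the support vectors lie relative to the dashed lines. It rebuilds the same toy data so that it runs on its own. The names `margin_width`, `functional_margin` and `is_sv` are introduced here for illustration only.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Same toy data and model as in the cells above
X, y = make_blobs(n_samples=60, centers=2, random_state=0, cluster_std=0.60)
clf = SVC(kernel='linear').fit(X, y)

# For a linear kernel, coef_ holds the normal vector w of the hyperplane
w = clf.coef_[0]
margin_width = 2 / np.linalg.norm(w)

# Decision values are 0 on the hyperplane and -1 / +1 on the dashed lines
y_signed = 2 * y - 1                      # map labels {0, 1} to {-1, +1}
functional_margin = y_signed * clf.decision_function(X)

# Boolean mask marking the support vectors among the training points
is_sv = np.zeros(len(X), dtype=bool)
is_sv[clf.support_] = True

print("number of support vectors:", is_sv.sum())
print("margin width:", margin_width)
print("largest y*f(x) among support vectors:", functional_margin[is_sv].max())
print("smallest y*f(x) among other points:", functional_margin[~is_sv].min())
```

Up to the solver's numerical tolerance, support vectors should have `y*f(x)` of at most 1 (on or inside the margin), while all other points should have values of at least 1.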

The dataset above was non-overlapping (or linearly separable), which means we could come up with a hyperplane that separated the dataset perfectly. Let us now consider a dataset where no perfect separation is possible. In this case the SVM allows some datapoints to violate the margin and penalizes these violations, trading off a wide margin against the number and size of the violations. Datapoints that lie inside the margin or on the wrong side of the hyperplane are considered support vectors as well.

First, we generate the datapoints of the first class by sampling from a normal distribution with standard deviation 1.3 and mean (2,4).

In [None]:
num_entries=100
X=np.zeros((2*num_entries,2))

for i in range(0,num_entries):
    X[i,0]=np.random.normal()*1.3+2
    X[i,1]=np.random.normal()*1.3+4
y = num_entries*[0]

Next, we sample the data points from the second class with standard deviation 1.0 and mean (1,0). 

In [None]:
for i in range(num_entries,2*num_entries):
    X[i,0]=np.random.normal()+1
    X[i,1]=np.random.normal()
y2 = num_entries*[1]

Let us combine the class vectors `y` and `y2`

In [None]:
y.extend(y2)

assert len(X) == len(y)

Let us visualize the generated data

In [None]:
plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='spring')
_ = plt.xlim(-1, 3.5)

Now fit a linear SVM to find the best separating hyperplane.

In [None]:
clf = SVC(kernel='linear')
clf.fit(X, y)

Let us again visualize the hyperplane and the support vectors.

In [None]:
plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='spring')
plt.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1], s=200, facecolors='none',edgecolors="black");
plot_svc_decision_function(clf)
_ = plt.xlim(-1, 3.5)
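
How strongly margin violations are penalized is controlled by the `C` parameter of `SVC` (default 1.0). The sketch below refits the linear SVM for a few values of `C` on data generated like above. The fixed seed, the chosen `C` values and the names `X_overlap`, `y_overlap`, `n_support` and `margins` are assumptions made for this illustration.

```python
import numpy as np
from sklearn.svm import SVC

# Overlapping two-class data, generated like above but with a fixed seed
rng = np.random.default_rng(0)
n = 100
X_overlap = np.vstack([
    rng.normal(loc=[2, 4], scale=1.3, size=(n, 2)),   # class 0
    rng.normal(loc=[1, 0], scale=1.0, size=(n, 2)),   # class 1
])
y_overlap = np.array(n * [0] + n * [1])

# Smaller C tolerates more margin violations and gives a wider margin
n_support = {}
margins = {}
for C in [0.01, 1, 100]:
    model = SVC(kernel='linear', C=C).fit(X_overlap, y_overlap)
    n_support[C] = len(model.support_vectors_)
    margins[C] = 2 / np.linalg.norm(model.coef_)
    print("C={:<6} support vectors: {:3d}  margin width: {:.2f}".format(
        C, n_support[C], margins[C]))
```

A wider margin usually contains more points, so the number of support vectors tends to grow as `C` shrinks.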

## Support Vector Machine with Kernels

Kernels are useful when the decision boundary is not linear. A kernel measures the similarity of two data points as an inner product after an (implicit) mapping to some higher-dimensional space, in which a linear separation may become possible. Let us generate a data set that is not linearly separable at all.

In [None]:
X_circles, y_circles = make_circles(100, factor=.1, noise=.1)

We fit a linear SVM to `X_circles` and `y_circles` and visualize the result.

In [None]:
clf = SVC(kernel='linear').fit(X_circles, y_circles)

plt.scatter(X_circles[:, 0], X_circles[:, 1], c=y_circles, s=50, cmap='spring')
plot_svc_decision_function(clf);
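
To see why a mapping to a higher-dimensional space helps, the following sketch adds a hand-crafted third feature, the squared distance from the origin, and fits a linear SVM in that three-dimensional space. This explicit feature is only an illustration and not what a kernel SVM computes internally. The names `X_c`, `y_c`, `r2` and `X_lifted` and the fixed `random_state` are assumptions made here.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# random_state is fixed only to make this sketch reproducible
X_c, y_c = make_circles(100, factor=.1, noise=.1, random_state=0)

# Hand-crafted third feature: squared distance from the origin
r2 = (X_c ** 2).sum(axis=1)
X_lifted = np.column_stack([X_c, r2])

# Training accuracy of a linear SVM with and without the extra feature
acc_2d = SVC(kernel='linear').fit(X_c, y_c).score(X_c, y_c)
acc_3d = SVC(kernel='linear').fit(X_lifted, y_c).score(X_lifted, y_c)

print("training accuracy, original 2D features:", acc_2d)
print("training accuracy, with squared radius: ", acc_3d)
```

In the lifted space the inner and outer circle differ clearly along the new axis, so a plane can separate them even though no line can in the original two dimensions.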

The **radial basis function (rbf)** kernel will do the job.

In [None]:
clf = SVC(kernel='rbf')
clf.fit(X_circles, y_circles)

In [None]:
plt.scatter(X_circles[:, 0], X_circles[:, 1], c=y_circles, s=50, cmap='spring')
plot_svc_decision_function(clf)
plt.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1], s=200, facecolors='none', edgecolors="black");
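
The RBF kernel used above has the form `exp(-gamma * ||a - b||^2)`, so it is 1 for identical points and decays towards 0 as the points move apart. The sketch below evaluates it by hand and cross-checks against `sklearn.metrics.pairwise.rbf_kernel`. The helper `rbf_similarity`, the three points and the `gamma` value are made up for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def rbf_similarity(a, b, gamma):
    """RBF kernel value exp(-gamma * ||a - b||^2) for two points."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

a = np.array([0.0, 0.0])
b = np.array([1.0, 0.0])
c = np.array([0.1, 0.0])
gamma = 0.5   # arbitrary value chosen for illustration

print("k(a, a) =", rbf_similarity(a, a, gamma))   # identical points
print("k(a, c) =", rbf_similarity(a, c, gamma))   # nearby points
print("k(a, b) =", rbf_similarity(a, b, gamma))   # distant points

# Cross-check against scikit-learn's pairwise implementation
K = rbf_kernel(np.vstack([a, b, c]), gamma=gamma)
print(K)
```

Larger `gamma` values make the similarity drop off faster with distance, which lets the decision boundary follow the training points more closely.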

## Skin Disease Dataset

We want to apply the SVM to segment skin diseases. Each row is an image pixel to which 14 different image filters have been applied (feature engineering, columns `t0` to `t13`). The class (target variable) indicates whether the pixel shows healthy skin or a skin disease (labels from medical doctors).

In [None]:
df = pd.read_csv("skin_disease.csv")
df.head()

SVMs are not very fast to train on large datasets. To save time, we only use 10000 entries for training and validation. We also display a histogram of the target variable and observe that the data is extremely imbalanced. This is why we will use the f1-score for performance measurement below.

In [None]:
df = df.sample(10000)

_ = df['class'].hist()
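
The following sketch illustrates why accuracy can be misleading on imbalanced data. It uses made-up labels (95 negatives, 5 positives), not the skin disease data, and a trivial "classifier" that always predicts the majority class.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical labels: 95 negatives (0) and 5 positives (1)
y_true = np.array(95 * [0] + 5 * [1])

# A useless "classifier" that always predicts the majority class
y_majority = np.zeros_like(y_true)

acc = accuracy_score(y_true, y_majority)
# zero_division=0 returns 0 instead of warning when no positives are predicted
f1 = f1_score(y_true, y_majority, zero_division=0)

print("accuracy: {:.2f}".format(acc))
print("f1-score: {:.2f}".format(f1))
```

The accuracy is 0.95 although not a single positive example is found, while the f1-score of the positive class is 0.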

Let us split this dataset into training and validation set (we do not need a test set here)

In [None]:
train, valid = train_test_split(df, test_size=0.5)

X_train = train.drop('class', axis=1)
X_valid = valid.drop('class', axis=1)

y_train = train["class"]
y_valid = valid["class"]
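
With strongly imbalanced classes, a purely random split can leave the two halves with somewhat different class proportions. One option is the `stratify` argument of `train_test_split`. The sketch below shows it on a made-up stand-in DataFrame (`toy`, with a single feature `t0` and 10% positive labels), not on the skin disease data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the skin data: one feature, 10% positive labels
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    't0': rng.normal(size=1000),
    'class': np.array(900 * [0] + 100 * [1]),
})

# stratify keeps the class proportions equal in both halves
toy_train, toy_valid = train_test_split(
    toy, test_size=0.5, stratify=toy['class'], random_state=0
)

print("positive share in train:", toy_train['class'].mean())
print("positive share in valid:", toy_valid['class'].mean())
```

Fixing `random_state` additionally makes the split reproducible across runs.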

We train and evaluate several classifiers on this dataset, which can take a few minutes.
We compare an SVM with the `rbf` kernel and a `gamma` value of 0.1, an SVM with a linear kernel, and a decision tree.
We measure the f1-score on the validation set.

In [None]:
classifiers = {
    'SVM with RBF kernel' : SVC(kernel='rbf', gamma=0.1),
    'SVM with linear kernel' : SVC(kernel='linear'),
    'Decision Tree' : DecisionTreeClassifier(max_depth=5)
}

for name, model in classifiers.items():
    # Fit on the training split and score on the validation split
    model.fit(X_train, y_train)
    y_pred = model.predict(X_valid)
    f1 = f1_score(y_valid, y_pred)

    print("F1-score of {} is {:.3f}".format(name, f1))


## Playground for Exercises

In [None]:
data = [[0,0,-1], [3,0,-1], [0,2,1], [2,3,1]]
df = pd.DataFrame(data, columns=['x', 'y', 'label'])

_ = plt.scatter(df['x'], df['y'], c=df['label'], s=50, cmap='rainbow')

In [None]:
clf = SVC(kernel='linear').fit(df[['x', 'y']].values, df['label'])

plt.scatter(df['x'], df['y'], c=df['label'], s=50, cmap='rainbow')
plt.quiver([0], [1], [0], [1], angles='xy', scale_units='xy', scale=1)
plot_svc_decision_function(clf);
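
For checking a solution worked out by hand, the hyperplane parameters can be read directly from a fitted linear SVM. The sketch below refits the four toy points so that it runs on its own. The names `points`, `labels`, `toy_clf`, `w` and `b` are introduced here for illustration.

```python
import numpy as np
from sklearn.svm import SVC

# The four toy points and labels from the cell above
points = np.array([[0, 0], [3, 0], [0, 2], [2, 3]])
labels = np.array([-1, -1, 1, 1])

toy_clf = SVC(kernel='linear').fit(points, labels)

w = toy_clf.coef_[0]        # normal vector of the separating line
b = toy_clf.intercept_[0]   # offset

print("w =", w)
print("b =", b)
print("margin width =", 2 / np.linalg.norm(w))
print("decision values:", toy_clf.decision_function(points))
```

For these points the result should be close to `w = (0, 1)` and `b = -1`, i.e. the separating line `y = 1`, up to the solver's numerical tolerance.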

In [None]:
x = np.linspace(-5,5,100)

plt.scatter(df['x'], df['y'], c=df['label'], s=50, cmap='rainbow')
plt.plot(x, 1/3 * x + 1, '-r', label='y = 1/3 * x + 1')
plt.plot(x, 0*x + 1, '-b', label='y = 1')
plt.quiver([0,0], [1,1], [0,1/3], [1,-1], angles='xy', scale_units='xy', scale=1)

plt.axis((-5,5,-5,5))

plt.show()

In [None]:
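# Refit an RBF SVM on the skin disease training split and evaluate it
# on the validation split (the toy `clf` above only knows two features)
clf = SVC(kernel='rbf', gamma=0.1).fit(X_train, y_train)
y_pred = clf.predict(X_valid)
accuracy = accuracy_score(y_valid, y_pred)
f1 = f1_score(y_valid, y_pred)

print("f1 SVM:", f1)
print("accuracy SVM:", accuracy)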