# Unfamiliar SVMs with a Familiar Dataset

To motivate our basic understanding of SVMs, let's explore them using our favorite iris dataset.

In [None]:
import numpy as np
import pandas as pd
from sklearn import svm
from sklearn import datasets

import matplotlib.pyplot as plt
import seaborn as sb
%matplotlib inline

In [None]:
iris = datasets.load_iris()

In [None]:
iris.feature_names

In [None]:
X = iris.data
X

In [None]:
y = iris.target
y

Pretty standard procedure so far. We've created a design matrix X, and a vector of responses y.

We'll also fit, predict, score, etc. in the same way as other sklearn models. Let's take a look at the SVC arguments...

In [None]:
# Let's look at some of the parameters
model = svm.SVC()
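
Recent sklearn versions only show non-default parameters when a model is displayed, so one way to see the full list of SVC arguments is `get_params()`. The snippet below is a minimal sketch; the exact defaults (gamma in particular) depend on the installed sklearn version.

```python
from sklearn import svm

model = svm.SVC()

# get_params() returns a dict of every constructor argument and its current value
params = model.get_params()
for name in ("C", "kernel", "gamma", "degree"):
    print(name, "=", params[name])
```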

In [None]:
model.fit(X,y)

In [None]:
model.predict(X)

In [None]:
# Pretty good, though we kind of expected this
model.score(X,y)
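
The score above is computed on the same data the model was trained on, so it is an optimistic estimate. As a rough sketch of a fairer check, we can hold out part of the data with `train_test_split`. The 70/30 split, the `random_state`, and the variable names here are illustrative choices, not part of the original walkthrough.

```python
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()

# Hold out 30% of the samples, keeping class proportions the same in both parts
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, stratify=iris.target, random_state=0)

held_out_model = svm.SVC().fit(X_train, y_train)
train_acc = held_out_model.score(X_train, y_train)
test_acc = held_out_model.score(X_test, y_test)
print("train accuracy:", train_acc)
print("test accuracy:", test_acc)
```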

In [None]:
# How many support vectors exist for each class
model.n_support_

In [None]:
# Indices of the support vectors
model.support_

In [None]:
# Locations of the support vectors
model.support_vectors_
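
The three attributes above describe the same set of points from different angles. The sketch below checks how they relate; the names `X_all`, `y_all`, and `clf` are only used here so that the notebook's own `X`, `y`, and `model` are left untouched.

```python
import numpy as np
from sklearn import datasets, svm

iris = datasets.load_iris()
X_all, y_all = iris.data, iris.target
clf = svm.SVC().fit(X_all, y_all)

# support_ holds row indices into the training data; support_vectors_ holds those rows
same_points = np.allclose(X_all[clf.support_], clf.support_vectors_)

# n_support_ counts the support vectors per class, so it sums to the total
total_matches = clf.n_support_.sum() == len(clf.support_)

# Counting the class labels of the support vectors reproduces n_support_
per_class = np.bincount(y_all[clf.support_], minlength=3)
print(same_points, total_matches, per_class, clf.n_support_)
```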

# Plotting Decision Boundaries

We are going to explore what happens to the **decision boundary** and **support vectors** as we change C, kernel, and kernel parameters.

In [None]:
X = iris.data[:,[0,2]]
y = iris.target

In [None]:
model = svm.SVC()
model.fit(X,y)
print("# of Support Vectors Per Class:", model.n_support_)

In [None]:
# Create a grid of points to predict over
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.1), np.arange(y_min, y_max, 0.1))

In [None]:
Z = model.predict(np.c_[xx.ravel(),yy.ravel()])
Z = Z.reshape(xx.shape)


plt.contourf(xx,yy,Z,alpha=0.4)
plt.scatter(X[:,0],X[:,1],c=y,alpha=0.8)
plt.xlabel(iris.feature_names[0])
plt.ylabel(iris.feature_names[2])
plt.title("Iris Decision Boundaries for SVM")
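
Since we will redraw this plot many times while changing parameters, it can help to wrap the steps above in a function. The helper below, `plot_svc_boundary`, is a hypothetical name and just one possible sketch; it also circles the support vectors so we can watch them change. The gamma and C values in the usage lines are arbitrary.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets, svm

def plot_svc_boundary(model, X, y, step=0.1, ax=None):
    """Hypothetical helper: shade the predicted regions and circle the support vectors."""
    if ax is None:
        ax = plt.gca()
    # Grid of points covering the data, padded by 1 unit on each side
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, step),
                         np.arange(y_min, y_max, step))
    # Predict a class for every grid point and reshape back to the grid
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    ax.contourf(xx, yy, Z, alpha=0.4)
    ax.scatter(X[:, 0], X[:, 1], c=y, alpha=0.8)
    # Draw hollow circles around the support vectors
    sv = model.support_vectors_
    ax.scatter(sv[:, 0], sv[:, 1], s=120, facecolors="none", edgecolors="k")
    return Z

iris = datasets.load_iris()
X2 = iris.data[:, [0, 2]]
y2 = iris.target
clf = svm.SVC(kernel="rbf", gamma=1.0, C=1.0).fit(X2, y2)
Z = plot_svc_boundary(clf, X2, y2)
plt.title("Iris Decision Boundaries with Support Vectors Circled")
```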

# Exercise

1. Use the "rbf" (radial basis function) kernel when creating the svm.SVC classifier. This is the Gaussian drop-off we learned about.
  1. Vary the gamma parameter from .001 to 100. Observe what happens to the number of support vectors and to the shape of the decision boundaries as you increase gamma.
  2. How does this relate to our conversation about underfitting/overfitting?
2. Using the same "rbf" kernel, do not supply an argument for gamma. In older versions of sklearn this sets gamma = 1 / n_features (gamma="auto"); since version 0.22 the default is gamma="scale", which is 1 / (n_features * X.var()).
  1. Now, vary "C" from .1 to 1 to 1000. What happens to the decision boundaries? C is the penalty on misclassified training points. A high C tells the SVM to work harder to classify every training point correctly, at the cost of a narrower margin.
3. If you finish early, explore the "poly" kernel (polynomial; requires the "degree" parameter), as well as the "linear" kernel.
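
As a possible starting point for the first exercise, the loop below refits the classifier for several gamma values and records the total number of support vectors and the training accuracy. It is only a scaffold under our own choice of gamma values; the same pattern should work for C by swapping the keyword argument.

```python
from sklearn import datasets, svm

iris = datasets.load_iris()
X2 = iris.data[:, [0, 2]]
y2 = iris.target

gamma_values = [0.001, 0.01, 0.1, 1, 10, 100]
results = []
for gamma in gamma_values:
    clf = svm.SVC(kernel="rbf", gamma=gamma).fit(X2, y2)
    # Store (gamma, total support vectors, accuracy on the training data)
    results.append((gamma, int(clf.n_support_.sum()), clf.score(X2, y2)))

for gamma, n_sv, acc in results:
    print("gamma=%g  support vectors=%d  training accuracy=%.3f" % (gamma, n_sv, acc))
```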

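There are also other SVM classifiers within sklearn, including LinearSVC. LinearSVC is slightly different from SVC with a linear kernel: it is built on liblinear rather than libsvm, uses the squared hinge loss by default, and handles multiple classes one-vs-rest instead of one-vs-one.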

More information here: http://scikit-learn.org/stable/modules/svm.html#svm
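
To get a feel for the difference, the sketch below fits both classifiers on the full iris data and counts how often their predictions disagree. The `max_iter` value is an assumption chosen to give LinearSVC room to converge; we make no claim about which of the two scores higher.

```python
import numpy as np
from sklearn import datasets, svm

iris = datasets.load_iris()
X_all, y_all = iris.data, iris.target

# SVC with a linear kernel (libsvm) versus LinearSVC (liblinear)
kernel_linear = svm.SVC(kernel="linear", C=1.0).fit(X_all, y_all)
linear_svc = svm.LinearSVC(C=1.0, max_iter=10000).fit(X_all, y_all)

acc_kernel = kernel_linear.score(X_all, y_all)
acc_liblinear = linear_svc.score(X_all, y_all)

# Number of training points on which the two models predict different classes
n_disagree = int(np.sum(kernel_linear.predict(X_all) != linear_svc.predict(X_all)))
print("SVC(kernel='linear') training accuracy:", acc_kernel)
print("LinearSVC training accuracy:", acc_liblinear)
print("points where they disagree:", n_disagree)
```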

# Hyperparameter search

So we've seen that with the "rbf" (radial basis function) kernel we have to tune the C parameter of the SVM, which controls how heavily training errors are penalized. We also have to tune gamma, which dictates the width of the Gaussian kernels: a larger gamma means a narrower kernel.
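
To make the role of gamma concrete, the RBF kernel between two points is exp(-gamma * squared distance). The sketch below computes it by hand for two made-up points and compares the result with sklearn's `rbf_kernel`; the points and gamma values are arbitrary.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

# Two illustrative points (not taken from the iris data)
a = np.array([[0.0, 0.0]])
b = np.array([[1.0, 1.0]])
sq_dist = np.sum((a - b) ** 2)

kernel_values = []
for gamma in [0.01, 1.0, 100.0]:
    manual = np.exp(-gamma * sq_dist)               # formula written out by hand
    library = rbf_kernel(a, b, gamma=gamma)[0, 0]   # same quantity from sklearn
    kernel_values.append((gamma, manual, library))
    print("gamma=%g  manual=%.6g  sklearn=%.6g" % (gamma, manual, library))
```

As gamma grows, the similarity between the two points falls toward zero, so each training point influences a smaller neighborhood.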

We're going to use GridSearchCV to search over both parameters. In addition, we're going to finalize our example. Remember that SVMs rely on distances between points, but we have not normalized any of our features! We'll start doing that now.

In [None]:
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.model_selection import GridSearchCV

from sklearn.preprocessing import StandardScaler

In [None]:
X = iris.data
y = iris.target
X[0:5,0:2]

In [None]:
scaler = StandardScaler()
X = scaler.fit_transform(X)
X[0:5,0:2]

In [None]:
# Let's confirm that each feature now has mean ~0 and variance ~1
print(np.mean(X,axis=0))
print(np.var(X,axis=0))

In [None]:
# We're going to create a space of C and gamma values to grid search over
# And then map it to a dictionary
C_range = np.logspace(-3, 10, 13)
gamma_range = np.logspace(-13, 3, 13)
param_grid = dict(gamma=gamma_range, C=C_range)

In [None]:
# In sklearn.model_selection the splitter takes n_splits and receives y when fit is called
cv = StratifiedShuffleSplit(n_splits=5, test_size=0.2)
grid = GridSearchCV(svm.SVC(), param_grid=param_grid, cv=cv)
grid.fit(X, y)

In [None]:
print("Best Params:", grid.best_params_)
print("Best Score:", grid.best_score_)

In [None]:
# Let's see how the score changes over range of C and gamma
# cv_results_ replaces the older grid_scores_ attribute
scores = grid.cv_results_["mean_test_score"]
scores = np.array(scores).reshape(len(C_range), len(gamma_range))

In [None]:
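# The grid is iterated in sorted key order (C, then gamma), so rows of scores are C and columns are gamma
sb.heatmap(scores, xticklabels=["%.1e" % g for g in gamma_range], yticklabels=["%.1e" % c for c in C_range], cmap=plt.cm.Reds)
plt.xlabel("gamma")
plt.ylabel("C")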
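
One caveat: the cells above standardize all of X before cross-validation, so each held-out fold has already influenced the scaler. A common alternative is to put the scaler and the SVM in a `Pipeline`, so that scaling is refit on the training part of every split. The sketch below assumes a smaller grid than the one above to keep it quick, and the step names "scaler" and "svc" are our own choices.

```python
import numpy as np
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X_raw, y_raw = iris.data, iris.target

# Scaling happens inside the pipeline, so it is fit only on each training split
pipe = Pipeline([("scaler", StandardScaler()), ("svc", svm.SVC(kernel="rbf"))])

# Pipeline parameters are addressed as <step name>__<parameter name>
small_grid = {
    "svc__C": np.logspace(-2, 3, 6),
    "svc__gamma": np.logspace(-3, 2, 6),
}
splitter = StratifiedShuffleSplit(n_splits=5, test_size=0.2, random_state=0)
pipe_search = GridSearchCV(pipe, param_grid=small_grid, cv=splitter)
pipe_search.fit(X_raw, y_raw)

print("Best Params:", pipe_search.best_params_)
print("Best Score:", pipe_search.best_score_)
```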