# Lab 08: Support Vector Machine

In this lab we will explore SVMs on the Iris data set, a small classic data set in machine learning.

In [None]:
%pylab inline

## Importing data

In [None]:
from sklearn.datasets import load_iris
iris = load_iris()

Description of the Iris data set:

In [None]:
print(iris.DESCR)

In [None]:
# Feature names
iris.feature_names
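
The class labels can be inspected in the same way. The following is a minimal sketch, using the `target_names` attribute of the object returned by `load_iris`, that shows which integer in `iris.target` corresponds to which species. This helps when reading masks such as `iris.target != 1` later in the lab. The names `label_to_name` and `class_counts` are introduced here for illustration only.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# map each integer label used in iris.target to its species name
label_to_name = {label: str(name) for label, name in enumerate(iris.target_names)}
print(label_to_name)

# number of samples available for each class
class_counts = np.bincount(iris.target)
print(class_counts)
```

Note that scikit-learn spells the second class `versicolor`.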

## Linear SVM

Let us for now use only **two classes**: setosa and virginica, and **two features** (for better visualization): sepal length and sepal width. Let us train a **linear SVM** and plot the separating hyperplane (a line in 2D).

We'll use the [SVC](http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC) class of the svm module of scikit-learn.

In [None]:
from sklearn import svm

In [None]:
# select data for 2 classes and 2 features
X = iris.data[iris.target!=1, :2]
print("X shape:", X.shape)
y = iris.target[iris.target!=1]
print("y shape:", y.shape)

# initialize a model
clf = svm.SVC(kernel='linear', C=1000)

# fit the model
clf.fit(X, y)
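
With a linear kernel, the fitted model exposes the hyperplane parameters directly. The sketch below refits the same model so that it is self-contained, reads the weight vector and intercept from `coef_` and `intercept_`, and checks that `w . x + b` reproduces `decision_function`. The variable names `w`, `b`, `manual` and `margin_width` are illustrative and not part of the original lab.

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data[iris.target != 1, :2]
y = iris.target[iris.target != 1]

clf = svm.SVC(kernel='linear', C=1000)
clf.fit(X, y)

# coef_ and intercept_ are only defined for a linear kernel
w = clf.coef_[0]
b = clf.intercept_[0]

# the decision function of a linear SVM is w . x + b
manual = X @ w + b
print("matches decision_function:", np.allclose(manual, clf.decision_function(X)))

# distance between the two margin lines (decision function equal to -1 and +1)
margin_width = 2 / np.linalg.norm(w)
print("w:", w, "| b:", b, "| margin width:", margin_width)
print("support vectors per class:", clf.n_support_)
```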

Let us plot the separating hyperplane.

In [None]:
plt.figure(figsize=(10, 10))

# plot the point cloud
plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap=plt.cm.Paired)

# get frame limits
ax = plt.gca()
xlim = ax.get_xlim()
ylim = ax.get_ylim()

# visualize support vectors with a cross
ax.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1], 
           s=50, linewidth=1, marker='x', color='k')

# create grid to evaluate model
xx = np.linspace(xlim[0], xlim[1], 30)
yy = np.linspace(ylim[0], ylim[1], 30)
YY, XX = np.meshgrid(yy, xx)
xy = np.vstack([XX.ravel(), YY.ravel()]).T
Z = clf.decision_function(xy).reshape(XX.shape)

# plot decision boundary and margins
ax.contour(XX, YY, Z, colors='k', levels=[-1, 0, 1], 
           alpha=0.5, linestyles=['--', '-', '--'])

# format the plot
plt.xticks(fontweight="bold", fontsize=15)
plt.yticks(fontweight="bold", fontsize=15)
plt.tight_layout()

plt.show()

__Question:__ Where are the support vectors located?

Let us now check the classifier's performance.

In [None]:
print(clf.score(X, y))

__Question:__ Which performance metric is computed by `clf.score`? Please consult the [documentation](http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC.score). What does a score of 1.0 mean?

Let us now consider versicolour vs virginica.

In [None]:
# select data for 2 classes and 2 features
X = iris.data[iris.target!=0, :2]
print("X shape:", X.shape)
y = iris.target[iris.target!=0]
print("y shape:", y.shape)

# initialize a model
clf = svm.SVC(kernel='linear', C=1000)

# fit the model
clf.fit(X, y)

In [None]:
plt.figure(figsize=(10, 10))

# plot the point cloud
plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap=plt.cm.Paired)

# get frame limits
ax = plt.gca()
xlim = ax.get_xlim()
ylim = ax.get_ylim()

# visualize support vectors with a cross
ax.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1], 
           s=50, linewidth=1, marker='x', color='k')

# create grid to evaluate model
xx = np.linspace(xlim[0], xlim[1], 30)
yy = np.linspace(ylim[0], ylim[1], 30)
YY, XX = np.meshgrid(yy, xx)
xy = np.vstack([XX.ravel(), YY.ravel()]).T
Z = clf.decision_function(xy).reshape(XX.shape)

# plot decision boundary and margins
ax.contour(XX, YY, Z, colors='k', levels=[-1, 0, 1], 
           alpha=0.5, linestyles=['--', '-', '--'])

# format the plot
plt.xticks(fontweight="bold", fontsize=15)
plt.yticks(fontweight="bold", fontsize=15)
plt.tight_layout()

plt.show()

__Question:__ Where are the support vectors?

__Question:__ What is the model's performance?

In [None]:
print(clf.score(X, y))

Let us now see if we can get better performance with a non-linear kernel.

## Kernel SVM

Let us use a **Gaussian RBF kernel** for several values of the gamma parameter. Written in terms of a bandwidth sigma, the kernel is:

\begin{align}
k(x, x') = \exp\bigg[-\frac{||x - x'||^2}{2 \sigma^2}\bigg]
\end{align}
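
scikit-learn's `SVC` does not take sigma directly: its RBF kernel is written `exp(-gamma * ||x - x'||^2)`, which matches the formula above when `gamma = 1 / (2 * sigma**2)`. The sketch below checks this numerically with `sklearn.metrics.pairwise.rbf_kernel`. The two points and the value of sigma are arbitrary values made up for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

# two arbitrary 2D points (illustrative values only)
x = np.array([[5.0, 3.0]])
x_prime = np.array([[6.5, 2.8]])

sigma = 0.5                      # hypothetical bandwidth
gamma = 1 / (2 * sigma ** 2)     # equivalent scikit-learn parameter

# kernel value computed from the sigma-based formula
k_formula = np.exp(-np.sum((x - x_prime) ** 2) / (2 * sigma ** 2))

# kernel value computed by scikit-learn from gamma
k_sklearn = rbf_kernel(x, x_prime, gamma=gamma)[0, 0]

print(k_formula, k_sklearn)
```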

__Question:__ What does the gamma parameter mean?

Let us try a grid of gamma parameters.

In [None]:
# select data for Versicolour and Virginica classes
X = iris.data[iris.target!=0, :2]
y = iris.target[iris.target!=0]

# gamma values
gamma_range = np.linspace(0.1, 50, 20)

for param in gamma_range:
    clf = svm.SVC(kernel='rbf', C=0.01, gamma=param)
    clf.fit(X, y)
    score = clf.score(X, y)
    print("gamma: {0:.2f} | score: {1:.2f}".format(param, score))

__Question:__ Now plot the decision boundary (the separating hyperplane in feature space) for the best of these models.

__Question:__ Which points are the support vectors? Do you think the model will generalize well?

We will check the generalization ability of this model using a train/test split.

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
# split the dataset between train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, 
                                                    test_size=.4, 
                                                    random_state=56)

# test the performance for different values of gamma
acc_train, acc_test = list(), list()
gamma_range = np.linspace(0.1, 50, 20)
for param in gamma_range:
    clf = svm.SVC(kernel='rbf', C=0.01, gamma=param)
    clf.fit(X_train, y_train)
    acc_train.append(clf.score(X_train, y_train))
    acc_test.append(clf.score(X_test, y_test))

In [None]:
plt.figure(figsize=(10, 5))

# plot train and test scores for different gamma values
plt.plot(gamma_range, acc_train, label='train set', lw=4)
plt.plot(gamma_range, acc_test, label='test set', lw=4)

# add a legend
plt.legend(loc='best', fontsize=12)

# format the plot
plt.xlabel("Gamma", fontweight="bold", fontsize=20)
plt.ylabel("Performance", fontweight="bold", fontsize=20)
plt.xticks(fontweight="bold", fontsize=15)
plt.yticks(fontweight="bold", fontsize=15)
plt.tight_layout()

plt.show()

__Question:__ Do you observe overfitting?

Let us perform a __cross-validation__ on the training set to select the kernel and the value of C, using [GridSearchCV](http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html).

In [None]:
from sklearn.model_selection import GridSearchCV
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)

In [None]:
# define a set of parameters to test
parameters = {'kernel':('linear', 'rbf'), 
              'C':[0.1, 1, 10]}

# initialize a model
svc = svm.SVC()

# initialize cross validation
clf = GridSearchCV(estimator=svc, 
                   param_grid=parameters,
                   cv=5)

# run the cross validation using train dataset
clf.fit(X_train, y_train)
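
Once the search has run, the selected hyperparameters and their scores can be read from the fitted object. The sketch below repeats the split and the search so that it runs on its own, then prints `best_params_`, `best_score_` and the accuracy on the held-out test set. The names `search` and `test_accuracy` are illustrative. It relies on the default `refit=True`, under which the best model is refit on the whole training set.

```python
from sklearn import svm
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split

iris = load_iris()
X = iris.data[iris.target != 0, :2]
y = iris.target[iris.target != 0]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=.4, random_state=56)

parameters = {'kernel': ('linear', 'rbf'), 'C': [0.1, 1, 10]}
search = GridSearchCV(estimator=svm.SVC(), param_grid=parameters, cv=5)
search.fit(X_train, y_train)

# hyperparameters with the best mean cross-validated accuracy
print("best parameters:", search.best_params_)
print("mean CV accuracy of the best model:", search.best_score_)

# score() uses the best model, refit on all of X_train
test_accuracy = search.score(X_test, y_test)
print("test accuracy:", test_accuracy)
```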

In [None]:
plt.figure(figsize=(10, 3))

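# format results from gridsearch: candidates are enumerated with C varying slowest
# and kernel fastest, so reshape to (C, kernel) and transpose to get kernels as rows
scores = clf.cv_results_['mean_test_score'].reshape(len(parameters['C']), len(parameters['kernel'])).T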

# plot performance scores
plt.imshow(scores, interpolation='none')
#plt.imshow(scores, interpolation='none', cmap="RdBu_r", vmin=0, vmax=1)

# add a colorbar
plt.colorbar()

# format the plot
plt.title("Score", fontweight="bold", fontsize=20)
plt.xlabel("C", fontweight="bold", fontsize=18)
plt.ylabel("Kernel", fontweight="bold", fontsize=18)
plt.ylim((-0.5, 1.5))
plt.xticks(np.arange(len(parameters['C'])), parameters['C'], fontsize=15)
plt.yticks(np.arange(len(parameters['kernel'])), parameters['kernel'], rotation=90, fontsize=15)
plt.tight_layout()

plt.show()
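
The layout of this heat map depends on the order in which the grid search enumerates the candidates, since `mean_test_score` is a flat array in that order. A quick way to check the order is to list the grid with `ParameterGrid`, which `GridSearchCV` uses to generate its candidates. The name `grid` is illustrative. The same information is available after fitting in `clf.cv_results_['params']`.

```python
from sklearn.model_selection import ParameterGrid

parameters = {'kernel': ('linear', 'rbf'), 'C': [0.1, 1, 10]}

# list every candidate in the order in which it is evaluated
grid = list(ParameterGrid(parameters))
for index, params in enumerate(grid):
    print(index, params)
```

The parameter names are iterated in sorted order, so `C` changes slowest and `kernel` fastest. A flat score array therefore has to be reshaped to `(len(C), len(kernel))` before being transposed to show kernels as rows.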

__Question:__ What is the role of C and what happens when C grows?

Let us now select the best SVM hyperparameters for classifying with all 4 of the available features.

__Question:__ Train an SVM classifier to separate setosa from virginica, then versicolour from virginica. Use cross-validation on the training set. What is your optimal model and how does it perform on the test set?

### 1) Setosa vs virginica

In [None]:
# Answer

### 2) Versicolour vs virginica

In [None]:
# Answer

__Question:__ How would you build a multi-class classifier, using SVMs, to classify samples between setosa, virginica and versicolour?