# Support Vector Machines 


## The Data
In this exercise we will use the famous [Iris flower data set](http://en.wikipedia.org/wiki/Iris_flower_data_set).

The data set consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica and Iris versicolor), so 150 total samples. Four features were measured from each sample: the length and the width of the sepals and petals, in centimeters.

Here's a picture of the three different Iris types:

In [None]:
# The Iris Setosa
from IPython.display import Image
url = 'http://upload.wikimedia.org/wikipedia/commons/5/56/Kosaciec_szczecinkowaty_Iris_setosa.jpg'
Image(url,width=300, height=300)

In [None]:
# The Iris Versicolor
from IPython.display import Image
url = 'http://upload.wikimedia.org/wikipedia/commons/4/41/Iris_versicolor_3.jpg'
Image(url,width=300, height=300)

In [None]:
# The Iris Virginica
from IPython.display import Image
url = 'http://upload.wikimedia.org/wikipedia/commons/9/9f/Iris_virginica.jpg'
Image(url,width=300, height=300)

The iris dataset contains measurements for 150 iris flowers from three different species.

The three classes in the Iris dataset:

    Iris-setosa (n=50)
    Iris-versicolor (n=50)
    Iris-virginica (n=50)

The four features of the Iris dataset:

    sepal length in cm
    sepal width in cm
    petal length in cm
    petal width in cm

## Get the data

**Use seaborn to get the Iris data as we have done in the Seaborn Lab.**

In [None]:
import seaborn as sns
iris = sns.load_dataset('iris')

## Data Visualization
**Import some libraries you think you will need for data visualization.**

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

**Create a pairplot of the dataset. Which flower species seems to be the most separable?**

In [None]:
# Setosa is the most separable. 
sns.pairplot(iris,hue='species',palette='Dark2')

**Create a kde plot of sepal_length versus sepal_width for the setosa species of flower.**

In [None]:
setosa = iris[iris['species']=='setosa']
# seaborn >= 0.11 expects keyword x/y and fill=; older releases used shade= and shade_lowest=
sns.kdeplot(x=setosa['sepal_width'], y=setosa['sepal_length'],
            cmap="plasma", fill=True, thresh=0.05)

## Train Test Split

**Split your data into a training set and a test set.**

In [None]:
X = iris.drop('species',axis=1)
y = iris['species']

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30)

## Train a Model

**Call the SVC() model from sklearn and fit the model to the training data.**

In [None]:
from sklearn.svm import SVC
svc_model = SVC()
svc_model.fit(X_train,y_train)

## Model Evaluation

**Get predictions from the model, then create a confusion matrix and a classification report, and check the accuracy score.**

In [None]:
predictions = svc_model.predict(X_test)

In [None]:
from sklearn.metrics import classification_report,confusion_matrix,accuracy_score

In [None]:
# Confusion matrix
print(confusion_matrix(y_test,predictions))

In [None]:
# Test accuracy score
print('Test accuracy score: '+ str(accuracy_score(y_test,predictions)))

In [None]:
# Classification report
print(classification_report(y_test,predictions))

## Gridsearch Practice

A standard (hard-margin) SVM seeks a margin that separates all positive and negative examples. However, this can lead to poorly fit models if any examples are mislabeled or extremely unusual.
To account for this, in 1995 Cortes and Vapnik proposed the "soft margin" SVM, which allows some examples to be "ignored" or placed on the wrong side of the margin; this innovation often leads to a better overall fit.
C is the parameter of the soft-margin cost function. It sets how heavily margin violations are penalized: a large C penalizes errors strongly and fits the training data more tightly, while a small C tolerates more violations in exchange for a wider, more stable margin.
A standard SVM is a linear classifier built on dot products. However, in 1992 Boser, Guyon, and Vapnik proposed a way to model more complicated relationships by replacing each dot product with a nonlinear kernel function (such as a Gaussian radial basis function or a polynomial kernel). $\gamma$ is the free parameter of the Gaussian radial basis function kernel, $K(x, z) = \exp(-\gamma \lVert x - z \rVert^2)$; it controls how far the influence of a single training example reaches.
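
To make the role of $\gamma$ concrete, the following minimal sketch evaluates the Gaussian radial basis function by hand and compares it with Scikit-Learn's `rbf_kernel`. The helper `gaussian_rbf` and the two sample points are illustrative assumptions, not part of the exercise.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel


def gaussian_rbf(x, z, gamma):
    """Hypothetical helper: K(x, z) = exp(-gamma * ||x - z||^2)."""
    diff = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return np.exp(-gamma * np.dot(diff, diff))


# Two made-up points (sepal length, sepal width), chosen only for illustration
x = np.array([5.1, 3.5])
z = np.array([6.2, 2.9])

for gamma in (0.01, 0.1, 1.0):
    manual = gaussian_rbf(x, z, gamma)
    library = rbf_kernel(x.reshape(1, -1), z.reshape(1, -1), gamma=gamma)[0, 0]
    print(f"gamma={gamma}: manual={manual:.4f}, sklearn={library:.4f}")
```

As $\gamma$ grows, the kernel value for two distinct points shrinks toward 0, so each training example influences a smaller neighborhood.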

Let's tune the hyper-parameters. Before starting the next sections, review the theory in a little more detail: what a kernel is for an SVM, which kernels are available, and what role the parameters C and $\gamma$ play (where they apply).
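
One way to get a feel for C before tuning it is to count the support vectors a model keeps at different values. The sketch below is only an illustration under stated assumptions: it loads Iris from Scikit-Learn, uses a linear kernel, and the names `X_demo`, `y_demo`, `C_value`, and `support_counts` are made up for this example.

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC

# Separate copy of the data so the notebook's own X and y are left untouched
X_demo, y_demo = load_iris(return_X_y=True)

support_counts = {}
for C_value in (0.01, 1, 100):
    clf = SVC(kernel='linear', C=C_value).fit(X_demo, y_demo)
    # n_support_ holds the number of support vectors for each class
    support_counts[C_value] = int(clf.n_support_.sum())
    print(f"C={C_value}: {support_counts[C_value]} support vectors")
```

A softer margin (small C) typically leaves more training points on or inside the margin, so more of them end up as support vectors.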

**Import `GridSearchCV` from Scikit-Learn. Check the documentation to understand how it works.**

In [None]:
from sklearn.model_selection import GridSearchCV

**Create a dictionary called param_grid and fill out some parameters for C and $\gamma$.**

In [None]:
param_grid = {'C': [0.1,1, 10, 100], 'gamma': [1,0.1,0.01,0.001]} 

**Create a GridSearchCV object and fit it to the training data.**

In [None]:
grid = GridSearchCV(SVC(),param_grid,refit=True,verbose=2)
grid.fit(X_train,y_train)
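
After fitting, it can help to look at what the search actually selected. The sketch below is self-contained and makes some assumptions: it loads Iris from Scikit-Learn rather than seaborn, fixes `random_state=0` for repeatability, and uses the illustrative names `demo_grid`, `X_tr`, and `y_tr`. The attributes `best_params_`, `best_score_`, and `cv_results_` are standard on a fitted `GridSearchCV`.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X_demo, y_demo = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.30, random_state=0)

demo_grid = GridSearchCV(
    SVC(),
    {'C': [0.1, 1, 10, 100], 'gamma': [1, 0.1, 0.01, 0.001]},
    refit=True)
demo_grid.fit(X_tr, y_tr)

# Best hyper-parameter combination and its mean cross-validated accuracy
print(demo_grid.best_params_)
print(demo_grid.best_score_)
# One entry per candidate: 4 values of C times 4 values of gamma
print(len(demo_grid.cv_results_['params']))
```

Because `refit=True`, the best combination is retrained on the whole training set, which is why `predict` can be called on the grid object directly.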

**Now take that grid model and create some predictions using the test set, then create a classification report, a test accuracy score, and a confusion matrix.**

In [None]:
grid_predictions = grid.predict(X_test)

In [None]:
# Confusion matrix
print(confusion_matrix(y_test,grid_predictions))

In [None]:
# Test accuracy score
print('Test accuracy score: '+ str(accuracy_score(y_test,grid_predictions)))

In [None]:
# Classification report
print(classification_report(y_test,grid_predictions))

**Were you able to improve? What can you conclude from these results?**

You should have done about the same, or exactly the same. This makes sense: there is essentially one point that is too noisy to classify correctly, and we do not want an overfit model that bends to capture it.

## Gridsearch Extra 

**Now try tuning the kernel type as well. Investigate which kernel types you can use and how they work. Repeat the previous GridSearch step, but extend the parameter grid with the kernel information.**

In [None]:
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

In [None]:
param_grid = [
  {'C': [1, 10, 100, 1000], 'kernel': ['linear']},
  {'C': [0.1, 1, 10, 100], 'gamma': [0.1, 1, 10], 'kernel': ['poly', 'rbf']},
 ]

**Create a GridSearchCV object, fit it to the training data, and check the best estimator.**

In [None]:
svc_model = SVC()
svc_grid = GridSearchCV(SVC(),param_grid,refit=True,verbose=2)
svc_grid.fit(X_train,y_train)

In [None]:
svc_best = svc_grid.best_estimator_
print(svc_best)

In [None]:
grid_pred = svc_grid.predict(X_test)

In [None]:
from sklearn.metrics import classification_report,confusion_matrix,accuracy_score

# Confusion matrix
print(confusion_matrix(y_test,grid_pred))

In [None]:
# Test accuracy score
print('Test accuracy score: '+ str(accuracy_score(y_test,grid_pred)))

In [None]:
# Classification report
print(classification_report(y_test,grid_pred))

## Decision Boundaries Visualization

We will import the Iris dataset again, this time from Scikit-Learn.

In [None]:
from sklearn import svm, datasets
import numpy as np
import matplotlib.pyplot as plt
iris = datasets.load_iris()

#### Take the first two features of the dataset. The target is already created for you.

In [None]:
X = iris.data[:, :2]
y = iris.target

**Create three SVC instances: the first with a linear kernel, the second with an rbf kernel and $\gamma = 0.7$, the third with a polynomial kernel of degree 3. All three models use parameter C=1.0.**

**Then fit the three models.**

In [None]:
C = 1.0  # SVM regularization parameter
models = (svm.SVC(kernel='linear', C=C),
          svm.SVC(kernel='rbf', gamma=0.7, C=C),
          svm.SVC(kernel='poly', degree=3, C=C))
models = (clf.fit(X, y) for clf in models)

In [None]:
def make_meshgrid(x, y, h=.02):
    """Create a mesh of points to plot in

    Parameters
    ----------
    x: data to base x-axis meshgrid on
    y: data to base y-axis meshgrid on
    h: stepsize for meshgrid, optional

    Returns
    -------
    xx, yy : ndarray
    """
    x_min, x_max = x.min() - 1, x.max() + 1
    y_min, y_max = y.min() - 1, y.max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    return xx, yy


def plot_contours(ax, clf, xx, yy, **params):
    """Plot the decision boundaries for a classifier.

    Parameters
    ----------
    ax: matplotlib axes object
    clf: a classifier
    xx: meshgrid ndarray
    yy: meshgrid ndarray
    params: dictionary of params to pass to contourf, optional
    """
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    out = ax.contourf(xx, yy, Z, **params)
    return out

#### Fill in the TODOs to complete the plot.

In [None]:
titles = ('SVC with linear kernel',
          'SVC with RBF kernel',
          'SVC with polynomial (degree 3) kernel')

# Set-up 2x2 grid for plotting.
fig, sub = plt.subplots(2, 2)
plt.subplots_adjust(wspace=0.4, hspace=0.4)

X0, X1 = X[:, 0], X[:, 1]
xx, yy = make_meshgrid(X0, X1)


for clf, title, ax in zip(models, titles, sub.flatten()):
    plot_contours(ax, clf, xx, yy,
                  cmap=plt.cm.coolwarm, alpha=0.8)
    ax.scatter(X0, X1, c=y, cmap=plt.cm.coolwarm, s=20, edgecolors='k')
    ax.set_xlim(xx.min(), xx.max())
    ax.set_ylim(yy.min(), yy.max())
    ax.set_xlabel('Sepal length')
    ax.set_ylabel('Sepal width')
    ax.set_xticks(())
    ax.set_yticks(())
    ax.set_title(title)

plt.show()