# *Deep Learning* - Part I

## Theory

Two important figures from Chapter 5:

![](img/fig.5.3.png)

![](img/fig5.6.png)

## Practical

We train an SVM in [scikit-learn](http://scikit-learn.org/stable/) and choose its hyperparameters using cross-validation. We use a polynomial kernel and tune its degree $d$:

$$
\kappa(\mathbf{u}, \mathbf{v}) = (\mathbf{u}^T \mathbf{v} + c)^d
$$
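To make the kernel concrete, here is a minimal NumPy sketch that evaluates it for two small vectors. The helper name `poly_kernel` and the example values are assumptions made for illustration. Note that scikit-learn's `SVC` additionally scales the inner product by `gamma`, i.e. it computes `(gamma * u^T v + coef0)^degree`, so the values below only match the library for `gamma=1`.

```python
import numpy as np

def poly_kernel(u, v, c=0.0, d=3):
    """Polynomial kernel (u^T v + c)^d for two 1-D vectors (illustrative helper)."""
    return (np.dot(u, v) + c) ** d

u = np.array([1.0, 2.0])
v = np.array([3.0, 4.0])

# u^T v = 1*3 + 2*4 = 11, so (11 + 1)^2 = 144
k = poly_kernel(u, v, c=1.0, d=2)
print(k)
```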

We use the [Iris flower data set](https://en.wikipedia.org/wiki/Iris_flower_data_set), first introduced by Ronald Fisher, which contains:

- 150 samples (50 per class)
- 4 features (Sepal length, Sepal width, Petal length, Petal width)
- 3 classes
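The dimensions of the data set can be checked directly. The following is a small sketch using `load_iris` from scikit-learn; the variable name `counts` is an assumption made for illustration.

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()
X, y = iris.data, iris.target

print(X.shape)              # (number of samples, number of features)
print(iris.feature_names)   # names of the four measurements

# number of samples in each of the three classes
counts = np.bincount(y)
print(counts)
```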

In [1]:
import numpy as np
import pandas as pd
from sklearn import svm, datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

In [2]:
# load iris data
iris = datasets.load_iris()
X = iris.data
y = iris.target

In [3]:
X[:3]

array([[ 5.1,  3.5,  1.4,  0.2],
       [ 4.9,  3. ,  1.4,  0.2],
       [ 4.7,  3.2,  1.3,  0.2]])

In [4]:
y[:3]

array([0, 0, 0])

Randomly select 20% of the samples as the test set.

In [5]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
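As a quick sanity check, the sizes of the two parts can be inspected after the split. The sketch below repeats the split so that it runs on its own. Passing `stratify=y`, which is not used in this notebook, would be one way to keep the class proportions equal in both parts.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

X, y = datasets.load_iris(return_X_y=True)

# same call as above: 20% of the samples are held out for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

print(X_train.shape, X_test.shape)
```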

Using cross-validation, try out $d=1,2,\ldots,20$.
Use accuracy as the score on the training and validation folds.

In [6]:
parameters = {'degree':list(range(1, 21))}
svc = svm.SVC(kernel='poly')
clf = GridSearchCV(svc, parameters, scoring='accuracy')
clf.fit(X_train, y_train)

GridSearchCV(cv=None, error_score='raise',
       estimator=SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape=None, degree=3, gamma='auto', kernel='poly',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False),
       fit_params={}, iid=True, n_jobs=1,
       param_grid={'degree': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]},
       pre_dispatch='2*n_jobs', refit=True, return_train_score=True,
       scoring='accuracy', verbose=0)
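What the grid search does can be approximated by an explicit loop over the candidate degrees. The following is a rough sketch, not the exact internals of `GridSearchCV`. It uses a reduced grid ($d=1,\ldots,5$) to stay quick, and sets `cv=3` explicitly because the results in this notebook contain three splits, whereas newer scikit-learn releases default to five folds. The names `mean_scores` and `best_degree` are assumptions made for illustration.

```python
from sklearn import svm, datasets
from sklearn.model_selection import cross_val_score, train_test_split

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

mean_scores = {}
for d in range(1, 6):
    svc = svm.SVC(kernel='poly', degree=d)
    # accuracy on each of the 3 validation folds of the training data
    scores = cross_val_score(svc, X_train, y_train, cv=3, scoring='accuracy')
    mean_scores[d] = scores.mean()

# degree with the highest mean validation accuracy (first one wins on ties)
best_degree = max(mean_scores, key=mean_scores.get)
print(mean_scores, best_degree)
```

The loop only produces validation scores. Unlike `GridSearchCV` with `refit=True`, it does not refit the winning model on the full training set.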

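The cross-validation results can be loaded into a [pandas](http://pandas.pydata.org/) DataFrame. In the rows shown, for polynomial degrees $>3$ the mean training accuracy climbs to 1.0 while the mean validation accuracy (`mean_test_score`) stays below its value at $d=3$, so the model starts overfitting.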

In [7]:
pd.DataFrame(clf.cv_results_)

,mean_fit_time,mean_score_time,mean_test_score,mean_train_score,param_degree,params,rank_test_score,split0_test_score,split0_train_score,split1_test_score,split1_train_score,split2_test_score,split2_train_score,std_fit_time,std_score_time,std_test_score,std_train_score
0,0.000769,0.000468,0.958333,0.974996,1,{'degree': 1},1,0.97561,0.962025,0.9,1.0,1.0,0.962963,0.00017,0.000123,0.042432,0.017685
1,0.000608,0.000312,0.95,0.979216,2,{'degree': 2},3,0.95122,0.974684,0.9,1.0,1.0,0.962963,9e-06,1.6e-05,0.040575,0.015456
2,0.000837,0.000414,0.958333,0.98755,3,{'degree': 3},1,0.97561,0.987342,0.9,1.0,1.0,0.975309,0.000204,0.000106,0.042432,0.010081
3,0.01244,0.000452,0.933333,0.99177,4,{'degree': 4},11,0.95122,1.0,0.9,1.0,0.948718,0.975309,0.013862,4.9e-05,0.023592,0.01164
4,0.016245,0.000461,0.933333,1.0,5,{'degree': 5},11,0.95122,1.0,0.9,1.0,0.948718,1.0,0.011013,5.4e-05,0.023592,0.0
5,0.013771,0.000355,0.933333,1.0,6,{'degree': 6},11,0.95122,1.0,0.9,1.0,0.948718,1.0,0.010346,3.9e-05,0.023592,0.0
6,0.012751,0.000335,0.941667,1.0,7,{'degree': 7},9,0.97561,1.0,0.9,1.0,0.948718,1.0,0.010838,4.5e-05,0.031441,0.0
7,0.024844,0.000338,0.95,1.0,8,{'degree': 8},3,0.97561,1.0,0.925,1.0,0.948718,1.0,0.026482,5.7e-05,0.020808,0.0
8,0.035246,0.000426,0.95,1.0,9,{'degree': 9},3,0.97561,1.0,0.925,1.0,0.948718,1.0,0.039927,1e-05,0.020808,0.0
9,0.047215,0.000335,0.95,1.0,10,{'degree': 10},3,0.97561,1.0,0.925,1.0,0.948718,1.0,0.058638,5.6e-05,0.020808,0.0
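The full table is wide, so it can help to keep only the columns relevant to overfitting and to compute the gap between training and validation accuracy. The sketch below reruns a reduced grid search ($d=1,\ldots,5$) so that it is self-contained. The names `summary` and `gap` are assumptions made for illustration, and `return_train_score=True` is passed explicitly because newer scikit-learn releases no longer record training scores by default.

```python
import pandas as pd
from sklearn import svm, datasets
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

clf = GridSearchCV(svm.SVC(kernel='poly'), {'degree': list(range(1, 6))},
                   scoring='accuracy', cv=3, return_train_score=True)
clf.fit(X_train, y_train)

results = pd.DataFrame(clf.cv_results_)
cols = ['param_degree', 'mean_train_score', 'mean_test_score', 'rank_test_score']
summary = results[cols].copy()
# a growing gap between training and validation accuracy hints at overfitting
summary['gap'] = summary['mean_train_score'] - summary['mean_test_score']
print(summary.sort_values('rank_test_score'))
```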


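Finally, train the model with the lowest mean test error in cross-validation on all training data and determine the error on the test set. Degrees 1 and 3 tie for the best mean score (0.958333). Note that `clf.estimator` is the base `SVC` that was passed to `GridSearchCV`, not the tuned model; its default `degree=3` happens to be one of the tied winners. Because the search ran with `refit=True`, the model it selected is also available, already refit on the full training set, as `clf.best_estimator_`.

In [8]:
# clf.estimator is the base SVC (default degree=3), not the grid-search winner;
# the selected, refit model is available as clf.best_estimator_
e = clf.estimator.fit(X_train, y_train)
e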

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape=None, degree=3, gamma='auto', kernel='poly',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

In [9]:
y_pred = e.predict(X_test)
accuracy_score(y_test, y_pred)

1.0
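As an alternative sketch, the model chosen by the grid search can be used directly through `best_params_` and `best_estimator_`, which `GridSearchCV` provides when `refit=True`. The example below reruns a reduced grid ($d=1,\ldots,5$, with `cv=3` assumed) so that it is self-contained. Its result is not guaranteed to match the output above, since the selected degree and library defaults may differ.

```python
from sklearn import svm, datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

clf = GridSearchCV(svm.SVC(kernel='poly'), {'degree': list(range(1, 6))},
                   scoring='accuracy', cv=3)
clf.fit(X_train, y_train)

print(clf.best_params_)            # hyperparameters with the best mean CV score

# best_estimator_ has already been refit on all of X_train
y_pred = clf.best_estimator_.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print(acc)
```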