![Fixel Algorithms](https://fixelalgorithms.co/images/CCExt.png)

# <center> Machine Learning Methods </center>
## <center> Lecture 5 - Nonlinear Classification</center>
### <center> Kernel SVM - Solution</center>

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/FixelAlgorithmsTeam/FixelCourses/blob/master/MachineLearningMethod/05_NonlinearClassfication/MainGaussianSVM%20-%20Solution.ipynb)

In [1]:
import numpy             as np
import matplotlib.pyplot as plt
import matplotlib

matplotlib.rc('font', **{'size' : 16})

#### Polynomial kernel:
$$
K\left(\boldsymbol{x}_{i},\boldsymbol{x}_{j}\right)=\left(1+\boldsymbol{x}_{i}^{T}\boldsymbol{x}_{j}\right)^{d}
$$
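
As a minimal sketch of the formula above, the polynomial kernel matrix between two sets of samples can be computed directly with NumPy. The helper name `polynomial_kernel_matrix` and the toy matrix `mA` are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

def polynomial_kernel_matrix(mA, mB, d):
    # Hypothetical helper: K[i, j] = (1 + <a_i, b_j>)^d for rows a_i of mA and b_j of mB
    return (1.0 + mA @ mB.T) ** d

# Toy data (assumed for illustration): two samples in R^2
mA = np.array([[1.0,  2.0],
               [0.0, -1.0]])

mK = polynomial_kernel_matrix(mA, mA, d=2)
print(mK)
```

For $d = 1$ the kernel reduces to the linear kernel plus a constant, so the degree controls how far the model departs from a linear SVM.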

#### Gaussian kernel (RBF):
$$K\left(\boldsymbol{x}_{i},\boldsymbol{x}_{j}\right)=\exp\left(-\gamma\left\Vert \boldsymbol{x}_{i}-\boldsymbol{x}_{j}\right\Vert _{2}^{2}\right)=\exp\left(-\frac{1}{2\sigma^{2}}\left\Vert \boldsymbol{x}_{i}-\boldsymbol{x}_{j}\right\Vert _{2}^{2}\right)$$
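
The relation $\gamma = \frac{1}{2\sigma^{2}}$ in the formula above can be checked numerically. The following is a sketch under stated assumptions: the helper name `gaussian_kernel_matrix` and the toy points are made up for illustration.

```python
import numpy as np

def gaussian_kernel_matrix(mA, mB, sigma):
    # Hypothetical helper: K[i, j] = exp(-gamma * ||a_i - b_j||^2), with gamma = 1 / (2 * sigma^2)
    gamma = 1.0 / (2.0 * sigma ** 2)
    vA2   = np.sum(mA ** 2, axis=1)[:, None]  # squared norms of the rows of mA (column vector)
    vB2   = np.sum(mB ** 2, axis=1)[None, :]  # squared norms of the rows of mB (row vector)
    # Squared distances via ||a||^2 + ||b||^2 - 2<a, b>, clipped at 0 to absorb round-off
    mD2   = np.maximum(vA2 + vB2 - 2.0 * mA @ mB.T, 0.0)
    return np.exp(-gamma * mD2)

# Toy data (assumed for illustration): two points at Euclidean distance 5
mA = np.array([[0.0, 0.0],
               [3.0, 4.0]])

mK = gaussian_kernel_matrix(mA, mA, sigma=5.0)
print(mK)
```

The diagonal is always 1 (each point has zero distance to itself), and a larger $\sigma$ (smaller $\gamma$) makes distant points look more similar.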

### Exercise
* Train a kernel SVM (either `poly` or `rbf`) on the breast cancer data: `load_breast_cancer`.
* Use cross validation with 50 folds.
* Find optimal hyper-parameters.
* Can you get better performance than a linear SVM?
* What is your best accuracy (averaged over the 50 folds)?

In [2]:
from sklearn.datasets import load_breast_cancer

dData = load_breast_cancer()
mX    = dData.data
vY    = dData.target

#-- Normalize data:
mX    = mX - np.mean(mX, axis=0)
mX    = mX / np.std (mX, axis=0)

#-- Map the labels {0, 1} to {-1, +1}:
vY[vY == 0] = -1
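
The normalization step in the cell above is a per-feature z-score. A minimal sketch of the same operation as a reusable function follows. The name `standardize`, the guard against constant columns, and the demo matrix are assumptions added for illustration.

```python
import numpy as np

def standardize(mX):
    # Hypothetical helper: subtract the per-column mean and divide by the per-column std
    vMean = np.mean(mX, axis=0)
    vStd  = np.std(mX, axis=0)
    vStd  = np.where(vStd == 0, 1.0, vStd)  # assumed guard: avoid dividing by zero for constant features
    return (mX - vMean) / vStd

# Demo data (assumed): the third column is constant
mDemo = np.array([[1.0, 10.0, 7.0],
                  [2.0, 20.0, 7.0],
                  [3.0, 30.0, 7.0]])

mZ = standardize(mDemo)
print(mZ.mean(axis=0), mZ.std(axis=0))
```

Scaling matters for kernel SVMs because both kernels depend on inner products or distances, which are otherwise dominated by the features with the largest numeric range.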

In [3]:
import pandas as pd
from   sklearn.svm             import SVC
from   sklearn.model_selection import cross_val_score, KFold

dRes = pd.DataFrame(columns=['C', 'Accuracy'])

for C in np.linspace(1e-3, 2, 10):
    vAccuracy           = cross_val_score(SVC(C=C, kernel='linear'), mX, vY, cv=KFold(50, shuffle=True))
    dRes.loc[len(dRes)] = [C, np.mean(vAccuracy)]

dRes.sort_values(by='Accuracy', ascending=False)


|   | C | Accuracy |
|---:|---:|---:|
| 1 | 0.223111 | 0.976818 |
| 4 | 0.889444 | 0.975455 |
| 3 | 0.667333 | 0.975 |
| 5 | 1.111556 | 0.975 |
| 2 | 0.445222 | 0.973636 |
| 6 | 1.333667 | 0.973485 |
| 7 | 1.555778 | 0.971667 |
| 8 | 1.777889 | 0.971515 |
| 9 | 2.0 | 0.971515 |
| 0 | 0.001 | 0.943788 |


In [4]:
import pandas as pd
from   sklearn.model_selection import cross_val_score, KFold

dRes = pd.DataFrame(columns=['C', 'P', 'Accuracy'])

for C in np.linspace(.1, 5, 10):
    for P in range(1, 4):
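        #-- Note: SVC's `poly` kernel is (gamma * <x_i, x_j> + coef0)^degree with coef0 = 0 by default,
        #-- so it matches the formula above only when gamma = 1 and coef0 = 1 are passed explicitly.
        vAccuracy           = cross_val_score(SVC(C=C, kernel='poly', degree=P), mX, vY, cv=KFold(50, shuffle=True))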
        dRes.loc[len(dRes)] = [C, P, np.mean(vAccuracy)]

dRes.sort_values(by='Accuracy', ascending=False)


|   | C | P | Accuracy |
|---:|---:|---:|---:|
| 27 | 5.0 | 1.0 | 0.979091 |
| 6 | 1.188889 | 1.0 | 0.978939 |
| 24 | 4.455556 | 1.0 | 0.978636 |
| 21 | 3.911111 | 1.0 | 0.977273 |
| 12 | 2.277778 | 1.0 | 0.977273 |
| 15 | 2.822222 | 1.0 | 0.977121 |
| 9 | 1.733333 | 1.0 | 0.97697 |
| 18 | 3.366667 | 1.0 | 0.97697 |
| 3 | 0.644444 | 1.0 | 0.971364 |
| 0 | 0.1 | 1.0 | 0.952727 |


In [5]:
import pandas as pd
from   sklearn.model_selection import cross_val_score, KFold

dRes = pd.DataFrame(columns=['C', 'σ', 'Accuracy'])

for C in np.linspace(1e-3, 2, 10):
    for σ in np.linspace(3, 6, 10):
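        #-- Note: gamma = 1/σ² here, whereas the formula above uses gamma = 1/(2σ²),
        #-- so each σ in this search corresponds to σ/√2 in the formula's parametrization.
        vAccuracy           = cross_val_score(SVC(C=C, kernel='rbf', gamma=1/σ**2), mX, vY, cv=KFold(50, shuffle=True))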
        dRes.loc[len(dRes)] = [C, σ, np.mean(vAccuracy)]

dRes.sort_values(by='Accuracy', ascending=False)


|   | C | σ | Accuracy |
|---:|---:|---:|---:|
| 87 | 1.777889 | 5.333333 | 0.982727 |
| 97 | 2.000000 | 5.333333 | 0.982273 |
| 96 | 2.000000 | 5.000000 | 0.982273 |
| 85 | 1.777889 | 4.666667 | 0.982121 |
| 74 | 1.555778 | 4.333333 | 0.981212 |
| ... | ... | ... | ... |
| 5 | 0.001000 | 4.666667 | 0.627273 |
| 6 | 0.001000 | 5.000000 | 0.627121 |
| 4 | 0.001000 | 4.333333 | 0.626818 |
| 8 | 0.001000 | 5.666667 | 0.626061 |
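
The searches above draw a fresh random `KFold` split for every hyper-parameter combination, so differences of a fraction of a percent between rows may partly reflect the split rather than the parameters. One possible variant, sketched here as an alternative and not as the notebook's method, fixes the split with `random_state`, moves the scaling into a pipeline so it is fitted on the training folds only, and lets `GridSearchCV` run the loop. The grid values, the seed, and the use of $\gamma = 1/(2\sigma^{2})$ are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets        import load_breast_cancer
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline        import make_pipeline
from sklearn.preprocessing   import StandardScaler
from sklearn.svm             import SVC

mX, vY = load_breast_cancer(return_X_y=True)

#-- Same 50 folds for every candidate (the seed 0 is an arbitrary assumption):
oCV = KFold(n_splits=50, shuffle=True, random_state=0)

#-- Scaling is part of the model, so it is re-fitted inside each training fold:
oPipe = make_pipeline(StandardScaler(), SVC(kernel='rbf'))

#-- Small illustrative grid; gamma follows the formula gamma = 1 / (2 * sigma^2):
vSigma = np.linspace(3, 6, 4)
dGrid  = {
    'svc__C'     : np.linspace(0.5, 2, 4),
    'svc__gamma' : 1 / (2 * vSigma ** 2),
}

oSearch = GridSearchCV(oPipe, dGrid, cv=oCV, scoring='accuracy')
oSearch.fit(mX, vY)

print(oSearch.best_params_)
print(oSearch.best_score_)  # mean accuracy over the 50 folds for the best candidate
```

Because every candidate is scored on identical folds, the ranking in `oSearch.cv_results_` compares parameters rather than splits. The best score is still an optimistic estimate, since the same folds are used for selection and for reporting.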
