<a href="https://colab.research.google.com/github/WelfLowe/ML4developers/blob/main/5_Kernel_Methods_and_SVMs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# SVMs and parameter selection

Here is the entry point into the corresponding [sklearn documentation](https://scikit-learn.org/stable/modules/cross_validation.html).

Import necessary libraries.

In [42]:
import pandas as pd
from sklearn import datasets
from sklearn import svm
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import GridSearchCV

Load Iris.

In [48]:
iris = datasets.load_iris()
X = iris.data
y = iris.target
X[0:5,:]

array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       [4.6, 3.1, 1.5, 0.2],
       [5. , 3.6, 1.4, 0.2]])

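SVM classifiers are sensitive to different value ranges of the predictors (in contrast to some other classifiers, such as decision trees). Hence, it is suggested to normalize the predictors. Note that `preprocessing.normalize`, as used below, rescales each sample (row) to unit length rather than bringing each predictor (column) to a common range.

**OBS**: as a matter of fact, normalization does not help on the Iris dataset. Find out by commenting out the line below and rerunning the notebook.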

In [49]:
X = preprocessing.normalize(X)
X[0:5,:]

array([[0.80377277, 0.55160877, 0.22064351, 0.0315205 ],
       [0.82813287, 0.50702013, 0.23660939, 0.03380134],
       [0.80533308, 0.54831188, 0.2227517 , 0.03426949],
       [0.80003025, 0.53915082, 0.26087943, 0.03478392],
       [0.790965  , 0.5694948 , 0.2214702 , 0.0316386 ]])
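To make the effect of the normalization step concrete, here is a minimal sketch that compares row-wise normalization with column-wise standardization. `StandardScaler` is not used in this notebook; it is brought in here only as an assumed alternative for comparison, and the variable names are illustrative.

```python
import numpy as np
from sklearn import datasets, preprocessing

X_raw = datasets.load_iris().data

# normalize() works row by row: each sample is divided by its own L2 norm.
X_rows = preprocessing.normalize(X_raw)
row_norms = np.linalg.norm(X_rows, axis=1)

# StandardScaler works column by column: each predictor gets mean 0 and std 1.
X_cols = preprocessing.StandardScaler().fit_transform(X_raw)
col_means = X_cols.mean(axis=0)
col_stds = X_cols.std(axis=0)

print(row_norms[:3])
print(col_means.round(6), col_stds.round(6))
```

After row-wise normalization every sample has length 1, which changes the geometry of the data rather than equalizing the value ranges of the four predictors.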

Train an SVM model with fixed hyperparameters and assess the model using a train-test split.

In [55]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
clf = svm.SVC(kernel='linear', C=1).fit(X_train, y_train)
score = clf.score(X_test, y_test)
score

0.6333333333333333

An accuracy of about 0.63 on the 30 test samples suggests that the classifier is quite bad.

Train SVM models with the same fixed hyperparameters and assess the models using 5-fold cross validation.

In [56]:
clf = svm.SVC(kernel='linear', C=1, random_state=42)
scores = cross_val_score(clf, X, y, cv=5)  # one accuracy score per fold
scores

array([1.        , 0.93333333, 0.93333333, 0.93333333, 1.        ])

Calculate the mean score over all 5 folds (and its standard deviation).

In [57]:
print("%0.2f accuracy with a standard deviation of %0.2f" % (scores.mean(), scores.std()))

0.96 accuracy with a standard deviation of 0.03


Obviously, the train-test split is random and the accuracy varies with it. Averaging over several splits, as done with cross validation, reduces the uncertainty in the accuracy estimation. 
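The dependence on the split can be checked directly. The following is a small sketch, assuming the same normalized Iris data and the same fixed hyperparameters as above; the seed range `0..9` and the name `split_scores` are arbitrary choices made for illustration.

```python
from sklearn import datasets, preprocessing, svm
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X = preprocessing.normalize(iris.data)
y = iris.target

# Repeat the 80/20 split with different seeds and record the test accuracy each time.
split_scores = []
for seed in range(10):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed)
    clf = svm.SVC(kernel='linear', C=1).fit(X_train, y_train)
    split_scores.append(clf.score(X_test, y_test))

print(split_scores)
print("min %0.2f, max %0.2f" % (min(split_scores), max(split_scores)))
```

The spread between the minimum and the maximum gives a rough impression of how much a single train-test split can mislead.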

Still, we need to guess the hyperparameters: here the $C$ hyperparameter, i.e., the penalty parameter of the error term, and the selected kernel, which may or may not come with further hyperparameters of its own, here $\gamma$.

In [58]:
# Candidate values for the kernel, the penalty C, and the kernel coefficient gamma.
parameters = {'kernel': ['linear', 'rbf', 'poly', 'sigmoid'], 'C': [0.1, 1, 10, 100], 'gamma': [1, 0.1, 0.01, 0.001]}
svc = svm.SVC()
# GridSearchCV evaluates every combination with 5-fold cross validation by default.
clf = GridSearchCV(svc, parameters)
clf.fit(X, y)
sorted(clf.cv_results_.keys())

['mean_fit_time',
 'mean_score_time',
 'mean_test_score',
 'param_C',
 'param_gamma',
 'param_kernel',
 'params',
 'rank_test_score',
 'split0_test_score',
 'split1_test_score',
 'split2_test_score',
 'split3_test_score',
 'split4_test_score',
 'std_fit_time',
 'std_score_time',
 'std_test_score']
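Besides `cv_results_`, a fitted `GridSearchCV` object exposes the winning configuration directly through `best_params_`, `best_score_`, and `best_estimator_`. The sketch below uses a deliberately small grid, chosen here only to keep the example short; it is not the grid used in this notebook.

```python
from sklearn import datasets, preprocessing, svm
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()
X = preprocessing.normalize(iris.data)
y = iris.target

# Small illustrative grid (an assumption for this sketch, not the notebook's grid).
small_grid = {'kernel': ['linear', 'rbf'], 'C': [1, 10]}
search = GridSearchCV(svm.SVC(), small_grid)
search.fit(X, y)

print(search.best_params_)     # hyperparameters with the highest mean test score
print(search.best_score_)      # that mean cross-validated score
print(search.best_estimator_)  # an SVC refitted on all of X, y with those settings
```

Because `refit=True` is the default, `best_estimator_` can be used for prediction right away.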

In [59]:
df = pd.DataFrame.from_dict(clf.cv_results_)
df[[
 'mean_test_score',
 'param_C',
 'param_kernel',
 'param_gamma',
 'std_test_score']].sort_values([
 'mean_test_score',
 'std_test_score',
 'param_kernel'],ascending=False).head(15)

| | mean_test_score | param_C | param_kernel | param_gamma | std_test_score |
|---|---|---|---|---|---|
| 35 | 0.973333 | 10.0 | sigmoid | 1.0 | 0.03266 |
| 51 | 0.973333 | 100.0 | sigmoid | 1.0 | 0.03266 |
| 33 | 0.973333 | 10.0 | rbf | 1.0 | 0.024944 |
| 53 | 0.973333 | 100.0 | rbf | 0.1 | 0.024944 |
| 55 | 0.966667 | 100.0 | sigmoid | 0.1 | 0.029814 |
| 34 | 0.966667 | 10.0 | poly | 1.0 | 0.029814 |
| 32 | 0.966667 | 10.0 | linear | 1.0 | 0.029814 |
| 36 | 0.966667 | 10.0 | linear | 0.1 | 0.029814 |
| 40 | 0.966667 | 10.0 | linear | 0.01 | 0.029814 |
| 44 | 0.966667 | 10.0 | linear | 0.001 | 0.029814 |


The linear kernel with $C=1$ is not among the champions, so we did not do so well with the initial fixed setting.

Recall the definition of [kernel functions](https://scikit-learn.org/dev/modules/svm.html#svm-kernels). The parameter $\gamma$ is not used in the linear kernel. Since $\gamma$ does not matter here, the best linear kernel with $C=10$ comes in four equally good parameterizations with $\gamma \in \{1,0.1,0.01,0.001\}$. Testing all of them adds to the training time; this is the drawback of passing one full grid to the grid search library instead of programming nested loops manually.
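One possible way to avoid the redundant fits without writing nested loops is to pass `param_grid` as a list of dictionaries, which `GridSearchCV` accepts; each dictionary then spans its own sub-grid. The sketch below assumes the same normalized Iris data and candidate values as above; the names `C_values`, `gamma_values`, and `n_candidates` are illustrative.

```python
from sklearn import datasets, preprocessing, svm
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()
X = preprocessing.normalize(iris.data)
y = iris.target

C_values = [0.1, 1, 10, 100]
gamma_values = [1, 0.1, 0.01, 0.001]

# Two sub-grids: gamma is only varied for the kernels that actually use it.
param_grid = [
    {'kernel': ['linear'], 'C': C_values},
    {'kernel': ['rbf', 'poly', 'sigmoid'], 'C': C_values, 'gamma': gamma_values},
]

clf = GridSearchCV(svm.SVC(), param_grid)
clf.fit(X, y)

n_candidates = len(clf.cv_results_['params'])
print(n_candidates)  # 4 + 3 * 4 * 4 = 52 candidates instead of 4 * 4 * 4 = 64
```

The linear kernel is now evaluated once per value of $C$, so the four duplicated rows per $C$ disappear from the results.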