### Codio Activity 16.8: Tuning the `SVC` Classifier

**Expected Time = 60 minutes**

**Total Points = 40**

This activity focuses on tuning the parameters of the `SVC` classifier to improve its performance on the wine data.  An `SVC` typically needs some parameter tuning.  In practice, be deliberate about which parameters you tune and avoid overly exhaustive searches, since every added value multiplies the number of models a grid search must fit, which makes large searches computationally and energy intensive.  Here, you will compare different kernels and values of the `gamma` parameter of the classifier.
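
To make the cost of a search concrete, the sketch below counts the fits implied by the kernel and `gamma` grid used later in Problem 3, then caps the work with `RandomizedSearchCV`. The values `n_iter=6` and `random_state=42` are illustrative assumptions, not part of the activity.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import ParameterGrid, RandomizedSearchCV, train_test_split
from sklearn.svm import SVC

# Same grid as Problem 3, repeated here so the sketch is self-contained.
params = {'kernel': ['rbf', 'poly', 'linear', 'sigmoid'],
          'gamma': [0.1, 1.0, 10.0, 100.0]}

n_candidates = len(ParameterGrid(params))  # 4 kernels x 4 gamma values
n_folds = 5
n_cv_fits = n_candidates * n_folds         # cross-validation fits, before the final refit
print(n_candidates, n_cv_fits)             # 16 80

# A randomized search samples a fixed number of candidates instead of trying all of them.
X, y = load_wine(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
search = RandomizedSearchCV(SVC(), param_distributions=params, n_iter=6,
                            cv=n_folds, random_state=42).fit(X_train, y_train)
n_sampled = len(search.cv_results_['params'])
print(n_sampled)                           # 6
```

Adding one more parameter with four values would quadruple the grid to 64 candidates (320 cross-validation fits), whereas the randomized search stays at whatever `n_iter` allows.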

#### Index

- [Problem 1](#-Problem-1)
- [Problem 2](#-Problem-2)
- [Problem 3](#-Problem-3)
- [Problem 4](#-Problem-4)

In [1]:
import pandas as pd
import seaborn as sns
from sklearn.datasets import load_breast_cancer
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split, GridSearchCV, RandomizedSearchCV

In [2]:
from sklearn.linear_model import LogisticRegression
from sklearn.dummy import DummyClassifier
from sklearn.metrics import confusion_matrix
from sklearn.datasets import load_wine

In [3]:
X, y = load_wine(return_X_y=True, as_frame=True)

In [4]:
y.value_counts(normalize = True)

1    0.398876
0    0.331461
2    0.269663
Name: target, dtype: float64

In [5]:
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state = 42)

[Back to top](#-Index)

### Problem 1

#### Baseline for Classifier

**10 Points**

Below, determine the baseline score for the classifier by fitting a `DummyClassifier` on the training data.  Score the estimator on the test set and assign the result to `baseline_score` below.  **Note**: The `DummyClassifier` works just like the other estimators you have encountered and has `.fit` and `.score` methods.

In [6]:
### GRADED
dummy_clf = ''
baseline_score = ''

    
# YOUR CODE HERE
dummy_clf = DummyClassifier().fit(X_train, y_train)
baseline_score = dummy_clf.score(X_test, y_test)

### ANSWER CHECK
print(baseline_score)

0.4
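
One way to see where this baseline comes from is to reproduce it by hand. The sketch below assumes the same split as above and sets `strategy="most_frequent"` explicitly (an assumption made for clarity; the default strategy in recent scikit-learn releases also predicts the most frequent training class). The names `majority_class` and `manual_baseline` are hypothetical helpers.

```python
from sklearn.datasets import load_wine
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# The dummy model ignores the features and always predicts the majority training class.
dummy = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
dummy_score = dummy.score(X_test, y_test)

# Manual equivalent: the share of test labels equal to the majority training class.
majority_class = y_train.value_counts().idxmax()
manual_baseline = (y_test == majority_class).mean()

print(majority_class, dummy_score, manual_baseline)
```

Any classifier worth keeping should score clearly above this number on the test set.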


[Back to top](#-Index)

### Problem 2

#### Default Settings with `SVC`

**10 Points**

Now, fit an `SVC` estimator with default settings on the training data and score it on the test data.  Assign the score as a float to `svc_defaults` below.

In [7]:
### GRADED
svc = ''
svc_defaults = ''

    
# YOUR CODE HERE
svc = SVC().fit(X_train, y_train)
svc_defaults = svc.score(X_test, y_test)

### ANSWER CHECK
print(svc_defaults)

0.7111111111111111
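
To see what "default settings" means here, the short sketch below lists a few of the defaults that the grid search in the next problem varies or depends on. It only inspects the estimator with `get_params()`; the exact default for `gamma` depends on the installed scikit-learn version (recent releases use `'scale'`).

```python
from sklearn.svm import SVC

# Inspect the hyperparameters an unconfigured SVC starts with.
defaults = SVC().get_params()
for name in ('kernel', 'gamma', 'C', 'degree'):
    print(name, '=', defaults[name])
```

The default kernel is `rbf`, so the score above reflects a single kernel with a single `gamma` setting; the other kernels have not been tried yet.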


[Back to top](#-Index)

### Problem 3

#### Grid Searching with `SVC`

**10 Points**

While your `SVC` should improve upon the baseline score, there is likely room for further improvement.  Below, grid search the different kernels available with the `SVC` estimator as well as some different `gamma` values using the `params` dictionary below.  Create your grid with `cv = 5`, which is the default.  Assign the score on the test data to `grid_score` below.

In [8]:
params = {'kernel': ['rbf', 'poly', 'linear', 'sigmoid'], 'gamma': [0.1, 1.0, 10.0, 100.0],}

In [13]:
### GRADED
grid = ''
grid_score = ''

    
# YOUR CODE HERE
grid = GridSearchCV(svc, param_grid=params).fit(X_train, y_train)
grid_score = grid.score(X_test, y_test)

### ANSWER CHECK
print(grid_score)

1.0


[Back to top](#-Index)

### Problem 4

#### Optimal Kernel Function

**10 Points**

Based on your grid search above, which kernel function performs best?  Assign your answer as a string -- `linear`, `poly`, `rbf`, or `sigmoid` -- to `best_kernel` below.

In [10]:
### GRADED
best_kernel = ''

    
# YOUR CODE HERE
best_kernel = grid.best_params_['kernel']

### ANSWER CHECK
print(best_kernel)

poly


In [11]:
grid.best_params_

{'gamma': 10.0, 'kernel': 'poly'}

In [14]:
grid.cv_results_

{'mean_fit_time': array([0.00179992, 0.08565207, 0.04360809, 0.00163546, 0.00156436,
        0.07817631, 0.03976512, 0.00140371, 0.00158353, 0.07681193,
        0.0418035 , 0.00119991, 0.00159993, 0.07025833, 0.03973751,
        0.00106564]),
 'std_fit_time': array([0.00074905, 0.06185169, 0.0263322 , 0.00048885, 0.0004665 ,
        0.04250176, 0.02220743, 0.00048735, 0.000478  , 0.05234388,
        0.02320495, 0.00040009, 0.00048971, 0.04216135, 0.02222853,
        0.00013189]),
 'mean_score_time': array([0.00120039, 0.00123472, 0.00120745, 0.00038495, 0.0012259 ,
        0.00123591, 0.00102625, 0.00079441, 0.00119896, 0.00110893,
        0.00121522, 0.00100012, 0.00079989, 0.00122142, 0.00100579,
        0.00080004]),
 'std_score_time': array([3.99615439e-04, 3.84062758e-04, 3.96423171e-04, 4.72064105e-04,
        3.88530223e-04, 4.52808201e-04, 3.15439944e-05, 3.97225473e-04,
        3.97467910e-04, 1.95387329e-04, 3.93520363e-04, 2.78041453e-07,
        3.99947177e-04, 3.90080629e-

In [19]:
list(grid.cv_results_.keys())

['mean_fit_time',
 'std_fit_time',
 'mean_score_time',
 'std_score_time',
 'param_gamma',
 'param_kernel',
 'params',
 'split0_test_score',
 'split1_test_score',
 'split2_test_score',
 'split3_test_score',
 'split4_test_score',
 'mean_test_score',
 'std_test_score',
 'rank_test_score']

In [20]:
grid.cv_results_['mean_test_score']

array([0.3985755 , 0.91025641, 0.90997151, 0.3985755 , 0.3985755 ,
       0.91025641, 0.90997151, 0.3985755 , 0.3985755 , 0.91766382,
       0.90997151, 0.3985755 , 0.3985755 , 0.91766382, 0.90997151,
       0.3985755 ])
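
The flat array above is easier to read when laid out by parameter. The sketch below reruns the same grid search and pivots the mean cross-validation scores into a `gamma` by `kernel` table; `results` and `score_table` are hypothetical names, and exact scores may vary slightly across scikit-learn versions.

```python
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

params = {'kernel': ['rbf', 'poly', 'linear', 'sigmoid'],
          'gamma': [0.1, 1.0, 10.0, 100.0]}
grid = GridSearchCV(SVC(), param_grid=params, cv=5).fit(X_train, y_train)

# One row per candidate: its parameters plus its mean cross-validation accuracy.
results = pd.DataFrame(grid.cv_results_['params'])
results['mean_test_score'] = grid.cv_results_['mean_test_score']

# Rows are gamma values, columns are kernels.
score_table = results.pivot(index='gamma', columns='kernel', values='mean_test_score')
print(score_table.round(3))
```

In the run recorded above, the `linear` column is 0.90997 at every `gamma`, which is expected because the linear kernel does not use `gamma`. The `rbf` and `sigmoid` kernels stay near 0.399 for every `gamma` in this grid, roughly the majority-class share, while `poly` reaches the top mean score of about 0.918 at `gamma` values of 10.0 and 100.0.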