# Parameter Tuning

Once you have selected a model, the next step in the machine learning process is hyperparameter tuning: searching for the hyperparameter values that give the best model performance.


## Grid Search
One way of doing so is grid search. Grid search is a brute-force technique that evaluates every combination of the candidate hyperparameter values you supply and selects the best one. It is quite easy to write the implementation yourself, but scikit-learn's `GridSearchCV` wrapper adds features such as automatic model evaluation, which help you make an informed choice.
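
To make the brute-force idea concrete, here is a minimal hand-written sketch, not the way `GridSearchCV` is implemented internally. The grid values mirror the examples further down; the variable names (`param_grid`, `best_score`, `best_params`) are illustrative, and `cross_val_score` stands in for the automatic evaluation the wrapper provides.

```python
from itertools import product

from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

iris = datasets.load_iris()

# Illustrative grid: 2 kernels x 2 values of C = 4 combinations
param_grid = {'kernel': ['linear', 'rbf'], 'C': [1, 10]}
names = list(param_grid)

best_score, best_params = -1.0, None
# Try every combination of values (the Cartesian product of the grid)
for values in product(*(param_grid[name] for name in names)):
    params = dict(zip(names, values))
    model = SVC(gamma='scale', **params)
    # Mean accuracy over 5 cross-validation folds for this combination
    score = cross_val_score(model, iris.data, iris.target, cv=5).mean()
    # Strict comparison: on ties, the first combination tried is kept
    if score > best_score:
        best_score, best_params = score, params

print(best_params, best_score)
```

The cost grows with the product of the list lengths, so adding one more hyperparameter with three values would triple the number of fits.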

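The CV in `GridSearchCV` is short for cross-validation. It's a resampling technique: each candidate is trained and scored on several different train/test splits of the data and the scores are averaged, which makes the evaluation process more robust than a single split.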
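
As a rough illustration of what cross-validation produces for a single candidate, the sketch below scores one fixed model on five folds. The choice of `kernel='linear'` and `C=1` is only an example.

```python
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

iris = datasets.load_iris()

# cv=5 splits the data into 5 folds; each fold is held out once for scoring
fold_scores = cross_val_score(SVC(kernel='linear', C=1), iris.data, iris.target, cv=5)

print(fold_scores)         # one accuracy value per fold
print(fold_scores.mean())  # average across folds
print(fold_scores.std())   # spread across folds
```

The per-fold values, their mean, and their standard deviation correspond to the `split*_test_score`, `mean_test_score`, and `std_test_score` columns that a grid search reports for each candidate.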


### Example 1

Basic usage

In [32]:
from sklearn import svm, datasets
from sklearn.model_selection import GridSearchCV
import pandas as pd

# load the iris data set
iris = datasets.load_iris()

# candidate hyperparameter values: 2 kernels x 2 values of C = 4 combinations
parameters = {'kernel': ('linear', 'rbf'), 'C': [1, 10]}
svc = svm.SVC(gamma='scale')

# run 5-fold cross-validation for each combination, using all CPU cores
clf = GridSearchCV(svc, parameters, cv=5, n_jobs=-1, return_train_score=False)
clf.fit(iris.data, iris.target)

# results of the grid search, scored with the estimator's default metric (accuracy for SVC)
pd.DataFrame(clf.cv_results_)

| | mean_fit_time | std_fit_time | mean_score_time | std_score_time | param_C | param_kernel | params | split0_test_score | split1_test_score | split2_test_score | split3_test_score | split4_test_score | mean_test_score | std_test_score | rank_test_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.000901 | 0.000102 | 0.000377 | 2.4e-05 | 1 | linear | {'C': 1, 'kernel': 'linear'} | 0.966667 | 1.0 | 0.966667 | 0.966667 | 1.0 | 0.98 | 0.01633 | 1 |
| 1 | 0.001237 | 0.001086 | 0.000331 | 1.6e-05 | 1 | rbf | {'C': 1, 'kernel': 'rbf'} | 0.966667 | 1.0 | 0.966667 | 0.966667 | 1.0 | 0.98 | 0.01633 | 1 |
| 2 | 0.000809 | 0.000452 | 0.000324 | 3.4e-05 | 10 | linear | {'C': 10, 'kernel': 'linear'} | 1.0 | 1.0 | 0.9 | 0.966667 | 1.0 | 0.973333 | 0.038873 | 4 |
| 3 | 0.000619 | 3.3e-05 | 0.000324 | 2e-05 | 10 | rbf | {'C': 10, 'kernel': 'rbf'} | 0.966667 | 1.0 | 0.966667 | 0.966667 | 1.0 | 0.98 | 0.01633 | 1 |


In [53]:
# we can also obtain the best parameter combination found by the search
clf.best_params_

{'C': 1, 'kernel': 'linear'}
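
Beyond `best_params_`, the fitted search object exposes a few related attributes. The sketch below repeats the setup from Example 1 so that it runs on its own; it assumes the default `refit=True`, under which the best combination is retrained on the full data set.

```python
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()
parameters = {'kernel': ('linear', 'rbf'), 'C': [1, 10]}
clf = GridSearchCV(svm.SVC(gamma='scale'), parameters, cv=5)
clf.fit(iris.data, iris.target)

print(clf.best_score_)      # mean cross-validated score of the best combination
print(clf.best_estimator_)  # model refit on the full data with best_params_

# With refit=True, the search object itself predicts using the best model
predictions = clf.predict(iris.data[:5])
print(predictions)
```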

### Example 2

Alternatively, you can score each candidate with multiple metrics. In that case, `refit` names the metric used to select the best parameters.

In [54]:
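from sklearn.metrics import fbeta_score, make_scorer
from sklearn.model_selection import GridSearchCV

# example of a scorer built from a metric function; not used in the search below.
# iris is multiclass, so fbeta_score needs an averaging strategy
# (average='macro' is an assumption, the original omitted it)
ftwo_scorer = make_scorer(fbeta_score, beta=2, average='macro')

# metrics to evaluate, keyed by the name that appears in cv_results_
scoring = {'accuracy': 'accuracy', 'balanced_accuracy': 'balanced_accuracy'}

svc = svm.SVC(gamma='scale')
# refit picks the metric that decides the best parameters
clf = GridSearchCV(svc, parameters, cv=5, n_jobs=1, return_train_score=False,
                   scoring=scoring, refit='balanced_accuracy')
clf.fit(iris.data, iris.target)
clf.best_params_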

{'C': 1, 'kernel': 'linear'}
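
With several metrics, `cv_results_` holds one set of columns per metric, named after the keys of the `scoring` dictionary. The following sketch repeats the setup so that it runs on its own and picks out the per-metric means and ranks; the `columns` list and `summary` name are illustrative.

```python
import pandas as pd
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()
parameters = {'kernel': ('linear', 'rbf'), 'C': [1, 10]}
scoring = {'accuracy': 'accuracy', 'balanced_accuracy': 'balanced_accuracy'}

clf = GridSearchCV(svm.SVC(gamma='scale'), parameters, cv=5,
                   scoring=scoring, refit='balanced_accuracy')
clf.fit(iris.data, iris.target)

results = pd.DataFrame(clf.cv_results_)
# Columns follow the pattern mean_test_<name> and rank_test_<name>
columns = ['params',
           'mean_test_accuracy', 'rank_test_accuracy',
           'mean_test_balanced_accuracy', 'rank_test_balanced_accuracy']
summary = results[columns]
print(summary)
```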

### Example 3

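You can also define your own custom scoring function. The function below is a loss (lower is better), so `greater_is_better=False` tells `make_scorer` to negate it, and the search then favors the smallest loss.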

In [56]:
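import numpy as np
from sklearn.metrics import make_scorer


def my_custom_loss_func(y_true, y_pred):
    # largest absolute difference between true and predicted labels, log-scaled
    diff = np.abs(y_true - y_pred).max()
    return np.log1p(diff)


# wrap the loss as a scorer; greater_is_better=False flips its sign
score = make_scorer(my_custom_loss_func, greater_is_better=False)

svc = svm.SVC(gamma='scale')
clf = GridSearchCV(svc, parameters, cv=5, n_jobs=1, return_train_score=False, scoring=score)
clf.fit(iris.data, iris.target)
clf.best_params_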

{'C': 10, 'kernel': 'linear'}
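
To see what `greater_is_better=False` does in practice, the sketch below compares the raw loss with the value the scorer returns for the same fitted model. The names `loss_scorer`, `raw_loss`, and `scored` are illustrative, and the model is fitted and scored on the full data set purely for demonstration.

```python
import numpy as np
from sklearn import datasets, svm
from sklearn.metrics import make_scorer


def my_custom_loss_func(y_true, y_pred):
    # largest absolute difference between true and predicted labels, log-scaled
    diff = np.abs(y_true - y_pred).max()
    return np.log1p(diff)


loss_scorer = make_scorer(my_custom_loss_func, greater_is_better=False)

iris = datasets.load_iris()
model = svm.SVC(gamma='scale').fit(iris.data, iris.target)

# The raw loss, computed directly from the predictions
raw_loss = my_custom_loss_func(iris.target, model.predict(iris.data))
# The scorer calls model.predict itself and returns the negated loss
scored = loss_scorer(model, iris.data, iris.target)
print(raw_loss, scored)
```

Because the scorer's value is the negated loss, the `mean_test_score` entries in `cv_results_` for such a search are zero or negative, and the best candidate is the one closest to zero.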