## Hyperparameter Tuning using Evolutionary Algorithms
Hyperparameter tuning is an essential step in building an accurate machine learning model. Hyperparameters are settings fixed before training, such as the kernel and regularization strength `C` of a support vector machine. They shape both the model's structure and its learning behavior, so choosing good values matters for performance. Grid search and randomized search are the two most widely used tuning techniques. Grid search evaluates every combination in a predefined grid. Randomized search evaluates a fixed number of randomly sampled combinations.
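The following pure-Python sketch shows how the two strategies pick candidates from the same search space used later in this notebook. It does not use scikit-learn. It only enumerates or samples parameter combinations. The helper names `grid_candidates` and `random_candidates` are hypothetical. Real tools such as scikit-learn's `GridSearchCV` and `RandomizedSearchCV` also fit and score a model for every candidate.

```python
import itertools
import random

# Same search space as the notebook's paramgrid, written with plain lists:
# 10 log-spaced C values from 1 to 1000 and 4 gamma values from 1e-4 to 1e-3.
param_grid = {
    "kernel": ["rbf", "linear"],
    "C": [10 ** (3 * i / 9) for i in range(10)],
    "gamma": [10 ** (-4 + i / 3) for i in range(4)],
}


def grid_candidates(grid):
    """Grid search: every combination of values, in a fixed order."""
    names = list(grid)
    return [dict(zip(names, values)) for values in itertools.product(*grid.values())]


def random_candidates(grid, n_iter, seed=0):
    """Randomized search (simplified): n_iter distinct combinations drawn at random."""
    rng = random.Random(seed)
    return rng.sample(grid_candidates(grid), n_iter)


all_combos = grid_candidates(param_grid)
print(len(all_combos))                                # 80 = 2 * 10 * 4
print(len(random_candidates(param_grid, n_iter=10)))  # 10
```

Grid search cost grows multiplicatively with every parameter added. Randomized search keeps the budget fixed at `n_iter` evaluations, but it may miss the best point.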

Here we demonstrate how to use an evolutionary algorithm, specifically a genetic algorithm (GA), to tune the hyperparameters of a scikit-learn model.

Most evolutionary algorithms start with a population of individuals, where each individual represents one set of hyperparameters for the machine learning model. The algorithm mimics the way biological populations evolve. It selects, recombines (crosses over), and mutates individuals to produce new hyperparameter sets, favoring those that scored well on a fitness metric such as cross-validation accuracy. This process repeats over several generations.
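The loop described above can be sketched in plain Python. This is an illustrative genetic algorithm under stated assumptions. It is not the implementation used by `EvolutionaryAlgorithmSearchCV`.
- The search space `SPACE` is a small hypothetical one.
- `toy_fitness` is a made-up stand-in for cross-validation accuracy.
- The operators are common textbook choices: tournament selection, per-gene uniform crossover, and per-gene random-reset mutation.

The parameter names loosely mirror the ones passed to the search object below.

```python
import random

# Hypothetical discrete search space: each gene is an index into one of these lists
SPACE = {
    "kernel": ["rbf", "linear"],
    "C": [1, 10, 100, 1000],
    "gamma": [1e-4, 1e-3],
}
NAMES = list(SPACE)


def decode(ind):
    """Turn a list of gene indices into a hyperparameter dict."""
    return {name: SPACE[name][gene] for name, gene in zip(NAMES, ind)}


def toy_fitness(params):
    """Stand-in for cross-validation accuracy (an assumption for illustration):
    it prefers the linear kernel and smaller C."""
    base = 0.9 if params["kernel"] == "linear" else 0.8
    return base - 0.01 * SPACE["C"].index(params["C"])


def tournament(pop, fits, k, rng):
    """Tournament selection: sample k individuals and return the fittest."""
    contenders = rng.sample(range(len(pop)), k)
    return pop[max(contenders, key=lambda i: fits[i])]


def evolve(pop_size=20, generations=10, cx_prob=0.5, mut_prob=0.1, k=3, seed=1):
    rng = random.Random(seed)
    # Initial population: random gene indices
    pop = [[rng.randrange(len(SPACE[n])) for n in NAMES] for _ in range(pop_size)]
    best, best_fit = None, float("-inf")
    for _ in range(generations):
        fits = [toy_fitness(decode(ind)) for ind in pop]
        # Remember the best individual seen so far
        for ind, fit in zip(pop, fits):
            if fit > best_fit:
                best, best_fit = ind, fit
        children = []
        for _ in range(pop_size):
            mum = tournament(pop, fits, k, rng)
            dad = tournament(pop, fits, k, rng)
            # Uniform crossover: each gene comes from the second parent with prob cx_prob
            child = [d if rng.random() < cx_prob else m for m, d in zip(mum, dad)]
            # Mutation: each gene is reset to a random valid index with prob mut_prob
            child = [rng.randrange(len(SPACE[n])) if rng.random() < mut_prob else g
                     for n, g in zip(NAMES, child)]
            children.append(child)
        pop = children  # generational replacement
    return decode(best), best_fit


print(evolve())
```

The best individual seen so far is kept outside the population. This means a good solution cannot be lost when each generation replaces the previous one. Libraries often get the same effect through elitism or a "hall of fame".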

```python
import random

import numpy as np
from sklearn import datasets
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.svm import SVC
# EvolutionaryAlgorithmSearchCV is provided by the sklearn-deap package
from evolutionary_search import EvolutionaryAlgorithmSearchCV

# Load the breast cancer dataset (569 samples, 30 features, binary target)
dataset = datasets.load_breast_cancer()
X = dataset.data
Y = dataset.target

# Hold out 20% of the data as a test set
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=0)

# Search space: 2 kernels x 10 values of C x 4 values of gamma = 80 combinations
paramgrid = {"kernel": ["rbf", "linear"],
             "C":      np.logspace(0, 3, num=10),
             "gamma":  np.logspace(-4, -3, num=4)}

# The search (built on DEAP) draws from Python's random module; seed it for repeatable runs
random.seed(1)

# 50 individuals evolved for 10 generations, scored by 4-fold stratified CV accuracy
cv = EvolutionaryAlgorithmSearchCV(estimator=SVC(),
                                   params=paramgrid,
                                   scoring="accuracy",
                                   cv=StratifiedKFold(n_splits=4),
                                   verbose=1,
                                   population_size=50,
                                   gene_mutation_prob=0.10,
                                   gene_crossover_prob=0.5,
                                   tournament_size=3,
                                   generations_number=10,
                                   n_jobs=1)
cv.fit(X_train, y_train)
```

```text
Types [1, 2, 2] and maxint [1, 9, 3] detected
--- Evolve in 80 possible combinations ---
gen	nevals	avg     	min     	max     	std      
0  	50    	0.941758	0.916484	0.958242	0.0126273
1  	32    	0.952352	0.931868	0.958242	0.00611353
2  	28    	0.95556 	0.947253	0.958242	0.00302276
3  	23    	0.957626	0.953846	0.958242	0.00152522
4  	30    	0.957363	0.947253	0.958242	0.00244737
5  	24    	0.957802	0.949451	0.958242	0.00158486
6  	22    	0.958066	0.953846	0.958242	0.000861359
7  	33    	0.956747	0.947253	0.958242	0.00330227
8  	28    	0.957714	0.949451	0.958242	0.00167726
9  	27    	0.957758	0.942857	0.958242	0.00229626
10 	30    	0.957275	0.934066	0.958242	0.00403247
Best individual is: {'kernel': 'linear', 'C': 1.0, 'gamma': 0.0001}
with fitness: 0.9582417582417583
```
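The first log line can be read against the grid. This sketch assumes each hyperparameter becomes an integer gene that indexes into its list of values. That assumption is consistent with the numbers but is not documented here. Under it, `maxint` is the largest valid index for each gene, and the number of possible combinations is the product of the list lengths. The gene vector `[1, 0, 0]` below is hypothetical. It is the index vector that would decode to the best individual reported above, and `decode` is a hypothetical helper.

```python
import math

# The notebook's grid, rebuilt with plain lists instead of np.logspace
paramgrid = {
    "kernel": ["rbf", "linear"],
    "C": [10 ** (3 * i / 9) for i in range(10)],      # like np.logspace(0, 3, num=10)
    "gamma": [10 ** (-4 + i / 3) for i in range(4)],  # like np.logspace(-4, -3, num=4)
}

# Largest valid index for each gene
maxint = [len(values) - 1 for values in paramgrid.values()]
# Size of the search space: product of the list lengths
n_combinations = math.prod(len(values) for values in paramgrid.values())


def decode(genes, grid):
    """Map a vector of integer genes to a hyperparameter dict (hypothetical helper)."""
    return {name: values[g] for (name, values), g in zip(grid.items(), genes)}


print(maxint)          # [1, 9, 3]
print(n_combinations)  # 80
print(decode([1, 0, 0], paramgrid))
```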


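After fitting, the search object exposes the usual scikit-learn search attributes, such as `best_params_`, `best_score_`, `best_estimator_` and `cv_results_`. It also has `all_history_` and `all_logbooks_`, whose names suggest they record the evolutionary run. Listing the attributes with `dir` (output truncated):

```python
dir(cv)
```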

```text
['__abstractmethods__',
 '__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__setstate__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_abc_impl',
 '_check_is_fitted',
 '_check_n_features',
 '_check_refit_for_multimetric',
 '_cv_results',
 '_estimator_type',
 '_fit',
 '_format_results',
 '_get_param_names',
 '_get_tags',
 '_individual_evals',
 '_more_tags',
 '_pairwise',
 '_repr_html_',
 '_repr_html_inner',
 '_repr_mimebundle_',
 '_required_parameters',
 '_run_search',
 '_validate_data',
 'all_history_',
 'all_logbooks_',
 'best_estimator_',
 'best_index_',
 'best_mem_params_',
 'best_mem_score_',
 'best_params_',
 'best_score_',
 'classes_',
 'cv',
 'cv_results_',
 'decision_function',
 'error_s
 ...
```
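Most of that listing is Python and scikit-learn internals. By scikit-learn convention, attributes learned during `fit` are public and end with a trailing underscore. A small filter can therefore narrow the list to the results of the search. The helper name `fitted_attributes` is hypothetical. The example runs it on a stand-in class so it works without the fitted `cv`; on the real object you would call `fitted_attributes(cv)`.

```python
def fitted_attributes(obj):
    """Public names ending in an underscore: scikit-learn's convention
    for attributes estimated during fit."""
    return [name for name in dir(obj)
            if name.endswith("_") and not name.startswith("_")]


class FakeSearch:
    # Hypothetical stand-in using a few of the attribute names listed above
    def __init__(self):
        self.cv = 4
        self.best_params_ = {"kernel": "linear"}
        self.best_score_ = 0.958
        self._cv_results = None


print(fitted_attributes(FakeSearch()))  # ['best_params_', 'best_score_']
```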

```python
cv.best_params_
```

```text
{'kernel': 'linear', 'C': 1.0, 'gamma': 0.0001}
```
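`SVC` uses `gamma` only for the `rbf`, `poly` and `sigmoid` kernels. The `gamma` value in this result therefore has no effect on the chosen linear model. For the same reason, the 80 grid points describe fewer distinct models, because every `gamma` value paired with the linear kernel yields the same classifier. The sketch below counts the distinct models by collapsing `gamma` for the linear kernel. `effective_configs` is a hypothetical helper, and it works on value indices rather than on the actual `C` and `gamma` values.

```python
import itertools

kernels = ["rbf", "linear"]
n_C, n_gamma = 10, 4  # sizes of the notebook's C and gamma ranges


def effective_configs(kernels, n_C, n_gamma):
    """Count distinct SVC models, treating gamma as irrelevant for 'linear'."""
    seen = set()
    for kernel, c, g in itertools.product(kernels, range(n_C), range(n_gamma)):
        # For the linear kernel, collapse all gamma indices to one placeholder
        seen.add((kernel, c, None if kernel == "linear" else g))
    return len(seen)


print(effective_configs(kernels, n_C, n_gamma))  # 50 (40 rbf + 10 linear)
```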

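```python
cv.best_score_
```

```text
0.9582417582417583
```

This is the cross-validated accuracy of the best individual on the training set (four stratified folds), matching the fitness reported in the search log. The held-out `X_test`/`y_test` split is not used in this notebook.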