## Improve Performance with Algorithm Tuning



Algorithm tuning is a final step in the process of applied machine learning before finalizing your
model. It is sometimes called hyperparameter optimization: the algorithm parameters that you set
are referred to as hyperparameters, whereas the coefficients found by the machine learning
algorithm itself are referred to as parameters. The word optimization reflects the search nature of
the problem. Phrased as a search problem, you can use different search strategies to find a good and
robust parameter or set of parameters for an algorithm on a given problem. Python scikit-learn
provides two simple methods for algorithm parameter tuning:

 - Grid Search Parameter Tuning
 - Random Search Parameter Tuning
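To make the hyperparameter/parameter distinction concrete, here is a minimal from-scratch sketch of ridge regression in NumPy. It is not scikit-learn's implementation: it assumes no intercept and uses synthetic data invented purely for illustration. The penalty `alpha` is chosen before fitting, while the coefficients are solved for from the data; looping over candidate `alpha` values is the kind of search that the two tuning methods automate.

```python
import numpy as np

def ridge_coefficients(X, y, alpha):
    """Closed-form ridge solution w = (X^T X + alpha * I)^-1 X^T y, without an intercept."""
    n_features = X.shape[1]
    # alpha is a hyperparameter: supplied by us, not learned from the data
    A = X.T @ X + alpha * np.eye(n_features)
    # the coefficients are parameters: computed from the data
    return np.linalg.solve(A, X.T @ y)

# Synthetic data for illustration only (not the diabetes data)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=50)

norms = {}
for alpha in [0.0, 1.0, 10.0, 100.0]:
    w = ridge_coefficients(X, y, alpha)
    norms[alpha] = float(np.linalg.norm(w))
    print(f"alpha={alpha:>6}: coefficients={np.round(w, 3)}")
```

Larger `alpha` values shrink the learned coefficients toward zero, which is why the choice of `alpha` changes the fitted model and is worth tuning.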


### Grid Search Parameter Tuning
Grid search is an approach to parameter tuning that will methodically build and evaluate a
model for each combination of algorithm parameters specified in a grid. You can perform a grid search using the [GridSearchCV class](http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html). The example below evaluates different alpha values for
the Ridge Regression algorithm on the Pima Indians diabetes dataset. This is a one-dimensional
grid search. Note that `Ridge` treats the 0/1 `class` column as a continuous target, so the
example demonstrates the tuning API rather than a recommended model for this classification dataset.
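Conceptually, `GridSearchCV` forms every combination of the values listed in `param_grid` and cross-validates one model per combination. The sketch below approximates that loop by hand with `ParameterGrid` and `cross_val_score`. It is a simplification (the real class also refits the best model on all the data, can run in parallel, and records much more in `cv_results_`), and it swaps in scikit-learn's bundled `load_diabetes` regression dataset, which is a different dataset from the Pima CSV, so that it runs without a data file. The second hyperparameter `fit_intercept` is added only to show how the grid grows.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import ParameterGrid, cross_val_score

X, y = load_diabetes(return_X_y=True)

# Two hyperparameters: 3 alpha values x 2 fit_intercept values = 6 combinations
param_grid = {'alpha': [1.0, 0.1, 0.01], 'fit_intercept': [True, False]}
candidates = list(ParameterGrid(param_grid))
print(len(candidates))  # 6

results = []
for params in candidates:
    # Build and evaluate one model per combination with 5-fold cross-validation
    scores = cross_val_score(Ridge(**params), X, y, cv=5)
    results.append((scores.mean(), params))

# Keep the combination with the highest mean cross-validated score
best_score, best_params = max(results, key=lambda r: r[0])
print(best_params, round(best_score, 4))
```

Because the number of combinations is the product of the number of values per hyperparameter, grids grow quickly as you add dimensions.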


```python
# Suppress warnings (e.g. deprecation notices) to keep the output readable
import warnings
warnings.filterwarnings('ignore')
```

```python
# Grid Search for Algorithm Tuning
import numpy
from pandas import read_csv
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
# Load the Pima Indians diabetes data: 8 input columns and a 0/1 class column
filename = 'pima-indians-diabetes.data.csv'
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
dataframe = read_csv(filename, names=names)
array = dataframe.values
X = array[:,0:8]
Y = array[:,8]
# Candidate alpha values: one model is cross-validated per value
alphas = numpy.array([1,0.1,0.01,0.001,0.0001,0])
param_grid = dict(alpha=alphas)
model = Ridge()
grid = GridSearchCV(estimator=model, param_grid=param_grid)
grid.fit(X, Y)
# Mean cross-validated score (R^2 for Ridge) of the best candidate, and its alpha
print(grid.best_score_)
print(grid.best_estimator_.alpha)
```

```
0.2761084412929244
1.0
```


Running the example prints the best mean cross-validated score achieved (R², the default score
for `Ridge`) and the alpha value in the grid that achieved it. In this case, the best alpha value is 1.0.
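`best_score_` reports only the winning candidate. Since no `scoring` argument is passed, `GridSearchCV` uses the estimator's own `score` method, which for `Ridge` is R², and the per-candidate results are stored in `cv_results_`. Because 1.0 is the largest value in the grid, the best alpha sits on the edge of the search space, so a natural follow-up is to extend the grid upward. The sketch below does that using scikit-learn's bundled `load_diabetes` data instead of the Pima CSV; the extra alpha values 10 and 100 are illustrative assumptions, and its scores will not match the output above.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

X, y = load_diabetes(return_X_y=True)

# Illustrative grid that extends past 1.0 to check whether the edge of the
# original grid was limiting the search
alphas = np.array([100, 10, 1, 0.1, 0.01, 0.001, 0.0001, 0])
grid = GridSearchCV(estimator=Ridge(), param_grid={'alpha': alphas}, cv=5)
grid.fit(X, y)

# Mean cross-validated score for every alpha, not just the best one
for alpha, score in zip(grid.cv_results_['param_alpha'],
                        grid.cv_results_['mean_test_score']):
    print(f"alpha={alpha}: mean R^2={score:.4f}")
print("best:", grid.best_params_, round(grid.best_score_, 4))
```

If the best value again lands on an edge of the extended grid, that is a sign to widen the range further.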


### Random Search Parameter Tuning
Random search is an approach to parameter tuning that will sample algorithm parameters from
a random distribution (e.g. uniform) for a fixed number of iterations. A model is constructed
and evaluated for each combination of parameters chosen. You can perform a random search
for algorithm parameters using the [RandomizedSearchCV class](http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html). The example below evaluates
different random alpha values between 0 and 1 for the Ridge Regression algorithm on the same
Pima Indians diabetes dataset. A total of 100 iterations are performed, each with an alpha value
drawn uniformly at random from the range 0 to 1. This range is a choice made for the example:
`Ridge` accepts any non-negative alpha, so alpha is not limited to values between 0 and 1.
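In `scipy.stats`, `uniform(loc, scale)` covers the interval [loc, loc + scale], so the default `uniform()` used in the example below draws from [0, 1]; `RandomizedSearchCV` draws each candidate by calling the distribution's `rvs` method. This sketch draws a few samples directly to show that behavior. It also shows `scipy.stats.loguniform` (available in SciPy 1.4 and later), an alternative you might consider for a penalty such as alpha whose useful values can span several orders of magnitude; the bounds 1e-4 and 1e2 are illustrative assumptions, not values from the example.

```python
from scipy.stats import loguniform, uniform

# Default uniform(): loc=0, scale=1, i.e. values in [0, 1]
unit = uniform().rvs(size=5, random_state=7)
# uniform(loc=0, scale=10) covers [0, 10]; with the same seed it is 10x the draws above
wide = uniform(loc=0, scale=10).rvs(size=5, random_state=7)
# Log-uniform: equal probability per order of magnitude between 1e-4 and 1e2
log_scale = loguniform(1e-4, 1e2).rvs(size=5, random_state=7)

print(unit)
print(wide)
print(log_scale)
```

Any of these frozen distributions could be passed as the value for `'alpha'` in `param_distributions`.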

```python
# Randomized Search for Algorithm Tuning
from pandas import read_csv
from scipy.stats import uniform
from sklearn.linear_model import Ridge
from sklearn.model_selection import RandomizedSearchCV
# Load the Pima Indians diabetes data: 8 input columns and a 0/1 class column
filename = 'pima-indians-diabetes.data.csv'
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
dataframe = read_csv(filename, names=names)
array = dataframe.values
X = array[:,0:8]
Y = array[:,8]
# Sample alpha from scipy's default uniform distribution over [0, 1]
param_grid = {'alpha': uniform()}
model = Ridge()
# Evaluate 100 randomly sampled candidates; random_state makes the draws repeatable
rsearch = RandomizedSearchCV(estimator=model, param_distributions=param_grid, n_iter=100,
                             random_state=7)
rsearch.fit(X, Y)
# Mean cross-validated score (R^2 for Ridge) of the best candidate, and its alpha
print(rsearch.best_score_)
print(rsearch.best_estimator_.alpha)
```

```
0.2761075573402854
0.9779895119966027
```


Running the example produces results much like those of the grid search example above: the best
alpha found is about 0.978, close to 1.0, with an almost identical mean score.
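Scores from two searches are only directly comparable when both evaluate candidates on the same cross-validation folds. With the default `cv`, recent scikit-learn versions (0.22 and later) use unshuffled 5-fold splits for a regressor, so the two examples above already share folds. The sketch below makes that explicit by passing one shared `KFold` splitter to both searches; it uses scikit-learn's bundled `load_diabetes` data rather than the Pima CSV, and the shuffled split with `random_state=7` is an illustrative assumption.

```python
import numpy as np
from scipy.stats import uniform
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold, RandomizedSearchCV

X, y = load_diabetes(return_X_y=True)

# One shared splitter so both searches score candidates on identical folds
cv = KFold(n_splits=5, shuffle=True, random_state=7)

grid = GridSearchCV(Ridge(), {'alpha': np.array([1, 0.1, 0.01, 0.001, 0.0001, 0])}, cv=cv)
grid.fit(X, y)

rsearch = RandomizedSearchCV(Ridge(), {'alpha': uniform()}, n_iter=100,
                             cv=cv, random_state=7)
rsearch.fit(X, y)

print("grid:  ", grid.best_params_, round(grid.best_score_, 4))
print("random:", rsearch.best_params_, round(rsearch.best_score_, 4))
```

Because a `KFold` with a fixed integer `random_state` yields the same splits on every call, any difference between the two best scores comes from the candidates tried, not from the data partitioning.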