## Hyperparameter Tuning

In [56]:
data = pd.read_csv('Mobile_Price_train.csv')

In [57]:
data.head()

Unnamed: 0,battery_power,blue,clock_speed,dual_sim,fc,four_g,int_memory,m_dep,mobile_wt,n_cores,...,px_height,px_width,ram,sc_h,sc_w,talk_time,three_g,touch_screen,wifi,price_range
0,842,0,2.2,0,1,0,7,0.6,188,2,...,20,756,2549,9,7,19,0,0,1,1
1,1021,1,0.5,1,0,1,53,0.7,136,3,...,905,1988,2631,17,3,7,1,1,0,2
2,563,1,0.5,1,2,1,41,0.9,145,5,...,1263,1716,2603,11,2,9,1,1,0,2
3,615,1,2.5,0,0,0,10,0.8,131,6,...,1216,1786,2769,16,8,11,1,0,0,2
4,1821,1,1.2,0,13,1,44,0.6,141,2,...,1208,1212,1411,8,2,15,1,1,0,1


In [58]:
data.shape

(2000, 21)

In [65]:
# import packages 
import numpy as np 
import pandas as pd 
from sklearn.ensemble import RandomForestClassifier 
from sklearn import metrics
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.preprocessing import StandardScaler 
from hyperopt import tpe, hp, fmin, STATUS_OK,Trials
from hyperopt.pyll.base import scope

from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

import warnings
warnings.filterwarnings("ignore")

In [61]:
# split data into features and target 
X = data.drop("price_range", axis=1).values 
y = data.price_range.values

# standardize the feature variables 

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)




In [70]:
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y)

In [69]:
rf = RandomForestClassifier()

params = [{
    "n_estimators": [100, 200, 300, 400,500,600],
    "max_depth":  [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15],
    "criterion": ["gini", "entropy"],
}]

## Grid Search

In [71]:
%%time
gs = GridSearchCV(rf, param_grid=params, cv=5, n_jobs=-1)
gs.fit(X_train, y_train)

print('Best Parameters for Grid Search :',gs.best_params_)

Best Parameters for Grid Search : {'criterion': 'entropy', 'max_depth': 13, 'n_estimators': 200}
Wall time: 6min 10s


In [78]:
gs_rf = RandomForestClassifier(n_estimators = 200,criterion = 'entropy',max_depth = 13)
gs_rf.fit(X_train,y_train)

gs_rf.score(X_test,y_test)

0.882

Grid search took about 6 minutes and reached a test accuracy of 0.882 (about 88%).
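
The runtime follows from the size of the grid. As a rough check, the sketch below counts the candidates with scikit-learn's `ParameterGrid`, using the same values as the grid above; the variable names are illustrative only.

```python
from sklearn.model_selection import ParameterGrid

# Same grid as in the search above
param_grid = {
    "n_estimators": [100, 200, 300, 400, 500, 600],
    "max_depth": list(range(1, 16)),
    "criterion": ["gini", "entropy"],
}

n_candidates = len(ParameterGrid(param_grid))  # 6 * 15 * 2 combinations
n_fits = n_candidates * 5                      # cv=5 folds per candidate

print(n_candidates, n_fits)  # 180 900
```

Grid search therefore trains 900 forests during cross-validation, plus one final refit on the whole training set.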

## Randomised Search

In [87]:
%%time
params = {
    "n_estimators": [100, 200, 300, 400,500,600],
    "max_depth":  [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15],
    "criterion": ["gini", "entropy"],
}

rs = RandomizedSearchCV(rf, param_distributions=params, cv=5, n_jobs=-1)
rs.fit(X_train, y_train)

print('Best Parameters for Random Search :',rs.best_params_)

Best Parameters for Random Search : {'n_estimators': 500, 'max_depth': 9, 'criterion': 'entropy'}
Wall time: 26 s


In [88]:
rs_rf = RandomForestClassifier(n_estimators = 500,criterion = 'entropy',max_depth = 9)
rs_rf.fit(X_train,y_train)

rs_rf.score(X_test,y_test)

0.872

Randomized search took 26 seconds and reached a test accuracy of 0.872 (about 87%).
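
The speed-up comes from evaluating only a sample of the grid: `RandomizedSearchCV` was called without `n_iter`, so it used the default of 10 candidates. The sketch below illustrates that sampling with scikit-learn's `ParameterSampler`; the `random_state` value and variable names are assumptions made for the example, so the sampled candidates will not match the run above.

```python
from sklearn.model_selection import ParameterSampler

# Same candidate values as in the search above
param_distributions = {
    "n_estimators": [100, 200, 300, 400, 500, 600],
    "max_depth": list(range(1, 16)),
    "criterion": ["gini", "entropy"],
}

# random_state=0 is an arbitrary choice to make the sketch reproducible
candidates = list(ParameterSampler(param_distributions, n_iter=10, random_state=0))
n_fits = len(candidates) * 5  # cv=5 folds per candidate

for candidate in candidates[:3]:
    print(candidate)
print(len(candidates), n_fits)
```

With 10 candidates and 5 folds, random search runs 50 cross-validation fits instead of 900.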

## Bayesian Optimization

Implementing Bayesian optimization with Hyperopt involves four parts: an objective function, a domain space, an optimization algorithm, and a results history.



#### Objective function
The objective function can be any function that returns a real value to minimize. Here, we want to minimize the validation error of a machine learning model with respect to its hyperparameters. When the metric is one we want to maximize, such as accuracy, the function should return its negative.

In [None]:
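# define objective function

def hyperparameter_tunning(params):
    # hp.quniform samples floats (e.g. 14.0), but max_depth must be an integer
    params = {**params, "max_depth": int(params["max_depth"])}
    clf = RandomForestClassifier(**params, n_jobs=-1)
    # average the cross-validated accuracy over the folds
    acc = cross_val_score(clf, X_train, y_train, scoring='accuracy').mean()
    # hyperopt minimizes the loss, so return the negative accuracy
    return {'loss': -acc, 'status': STATUS_OK}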

#### Domain space
The domain space is the range of values that we want to evaluate for each hyperparameter. In each iteration of the search, the Bayesian algorithm chooses one value for each hyperparameter from the domain space. In Bayesian optimization, this space is defined by a probability distribution for each hyperparameter rather than by a list of discrete values. When first tuning a model, we should create a wide domain space centered around the default values and then refine it in subsequent searches.

In [83]:
# Define space for optimization
space = {
    "n_estimators": hp.choice("n_estimators", [100, 200, 300, 400,500,600]),
    "max_depth": hp.quniform("max_depth", 1, 15,1),
    "criterion": hp.choice("criterion", ["gini", "entropy"]),
}


There are different domain distribution types:

| Distribution Name | Utility |
|---|---|
| choice | categorical variables |
| quniform | discrete uniform (evenly spaced values, such as integers) |
| uniform | continuous uniform (floats drawn uniformly from a range) |
| loguniform | continuous log uniform (floats drawn uniformly on a log scale) |

#### Optimization Algorithm
Hyperopt provides the Tree-structured Parzen Estimator (TPE) algorithm along with random search. During optimization, TPE builds a probability model from the past results and decides the next set of hyperparameters to evaluate in the objective function by maximizing the expected improvement.

#### Results History
Hyperopt tracks the results internally for the algorithm. To inspect them, we can use a Trials object, which stores basic training information, the sampled hyperparameters, and the dictionary returned from the objective function (which includes the loss).

In [None]:
# Initialize trials object

trials = Trials()

In [84]:
%%time

best = fmin(fn=hyperparameter_tunning,
           space = space,
           algo = tpe.suggest,
           max_evals = 100,
           trials = trials
           )

print("Best : {}".format(best))

100%|█████████████████████████████████████████████| 100/100 [05:09<00:00,  3.09s/trial, best loss: -0.8886996774653766]
Best : {'criterion': 1, 'max_depth': 14.0, 'n_estimators': 5}
Wall time: 5min 9s


Within each iteration, the algorithm chooses new hyperparameter values from the surrogate function, which is constructed from the previous results, and evaluates these values in the objective function. This continues for max_evals evaluations of the objective function, with the surrogate function updated after each new result.

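The best object returned from fmin contains the hyperparameters that yielded the lowest loss on the objective function. For parameters defined with hp.choice, the returned value is the index of the chosen option rather than the option itself, so 'n_estimators': 5 corresponds to 600 and 'criterion': 1 corresponds to 'entropy'. Once we have these hyperparameters, we can use them to train a model on the full training data and then evaluate it on the test data.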

In [85]:
by_rf = RandomForestClassifier(n_estimators = 600,criterion = 'entropy',max_depth = 14)
by_rf.fit(X_train,y_train)

by_rf.score(X_test,y_test)

0.87

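Because the search space is small in this problem, random search does well: it reaches a test accuracy of 0.872 in 26 seconds, close to the 0.882 from grid search and the 0.87 from Bayesian optimization, both of which took more than 5 minutes. As the number of hyperparameters grows, Bayesian optimization tends to perform better.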