# Automate Tuning
After having become quite a bit more comfortable with manual tuning, it is time to start including some automated tuning. 
I will be looking at Hyperopt: 
- using https://www.kaggle.com/prashant111/a-guide-on-xgboost-hyperparameters-tuning as a guide

# HYPERPOT
## Bayesian Optimization
The optimization process consists of 4 parts which are as follows-

1. Initialize domain space:
    - The domain space is the input values over which we want to search.

2. Define objective function:
    - The objective function can be any function which returns a real value that we want to minimize. In this case, we want to minimize the validation error of a machine learning model with respect to the hyperparameters. If the real value is accuracy, then we want to maximize it. Then the function should return the negative of that metric.

3. Optimization algorithm:
    - It is the method used to construct the surrogate objective function and choose the next values to evaluate.

4. Results:
    - Results are score or value pairs that the algorithm uses to build the model.

# TODO: 
1. https://gist.github.com/abhi1868sharma/7a9fc1cd34e4b9167849b8c3e761494e :: cross-validation objective function?
2. speed up convergence of hyperopt?

In [9]:
import pandas as pd
import numpy as np 
import seaborn as sns 
import matplotlib.pyplot as plt
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.metrics import mean_squared_error as mse
from sklearn.model_selection import train_test_split, GridSearchCV, KFold
import xgboost as xgb
from yellowbrick.regressor import residuals_plot, prediction_error
from fast_ml.model_development import train_valid_test_split
from hyperopt import STATUS_OK, Trials, fmin, hp, tpe

In [5]:
X_train = pd.read_csv('D:/Projects/Prediction/Techniques Practice/Trees/XGBoost/Suicides/temp data/X_train.csv').drop('Unnamed: 0',axis=1)
y_train = pd.read_csv('D:/Projects/Prediction/Techniques Practice/Trees/XGBoost/Suicides/temp data/y_train.csv').drop('Unnamed: 0',axis=1)
X_val = pd.read_csv('D:/Projects/Prediction/Techniques Practice/Trees/XGBoost/Suicides/temp data/X_val.csv').drop('Unnamed: 0',axis=1)
y_val = pd.read_csv('D:/Projects/Prediction/Techniques Practice/Trees/XGBoost/Suicides/temp data/y_val.csv').drop('Unnamed: 0',axis=1)
X_test = pd.read_csv('D:/Projects/Prediction/Techniques Practice/Trees/XGBoost/Suicides/temp data/X_test.csv').drop('Unnamed: 0',axis=1)
y_test = pd.read_csv('D:/Projects/Prediction/Techniques Practice/Trees/XGBoost/Suicides/temp data/y_test.csv').drop('Unnamed: 0',axis=1)

In [None]:
space={ 'n_estimators': 2000, #? looks like you can't vary n_boosting_rounds
        'eta' : 0.07 ,
        'max_depth': hp.quniform("max_depth", 3, 18, 1),
        'gamma': hp.uniform ('gamma', 1,9),
        'reg_alpha' : hp.quniform('reg_alpha', 40,180,1),
        'reg_lambda' : hp.uniform('reg_lambda', 0,1),
        'colsample_bytree' : hp.uniform('colsample_bytree', 0.5,1),
        'min_child_weight' : hp.quniform('min_child_weight', 0, 10, 1),
        'seed': 0
    }
def objective(space):
    clf=xgb.XGBRegressor(n_estimators = space['n_estimators'], max_depth = int(space['max_depth']),learning_rate=space['eta'], gamma = space['gamma'],
                    reg_alpha = int(space['reg_alpha']),min_child_weight=int(space['min_child_weight']),
                    colsample_bytree=int(space['colsample_bytree']), tree_method = "gpu_hist",single_precision_histogram=True, gpu_id=0)
    evaluation = [( X_train, y_train), ( X_val, y_val)]
    
    clf.fit(X_train, y_train,
            eval_set=evaluation, eval_metric="rmse",
            early_stopping_rounds=10,verbose=False)
    

    pred = clf.predict(X_test)
    accuracy = np.sqrt(mse(y_test, pred))
    print ("RMSE:", accuracy)
    return {'loss': -accuracy, 'status': STATUS_OK }

In [31]:
space={'learning_rate': 0.07, 'max_depth': 6, 'n_estimators': 2100}
def objective(space):
   
    clf=xgb.XGBRegressor(**space, tree_method = "gpu_hist",single_precision_histogram=True, gpu_id=0)
    evaluation = [( X_train, y_train), ( X_val, y_val)]
    
    clf.fit(X_train, y_train,
            eval_set=evaluation, eval_metric="rmse",
            early_stopping_rounds=10,verbose=False)
    

    pred = clf.predict(X_test)
    accuracy = np.sqrt(mse(y_test, pred))
    print ("RMSE:", accuracy)
    return {'loss': -accuracy, 'status': STATUS_OK }

In [30]:
trials = Trials()

best_hyperparams = fmin(fn = objective,
                        space = space,
                        algo = tpe.suggest,
                        max_evals = 500,
                        trials = trials)

RMSE:                                                                                                                  
0.8784864662298276                                                                                                     
RMSE:                                                                                                                  
0.8744319247583034                                                                                                     
RMSE:                                                                                                                  
0.8755733775037771                                                                                                     
RMSE:                                                                                                                  
0.8746966465114387                                                                                                     
RMSE:                                   

KeyboardInterrupt: 

In [32]:
trials = Trials()

best_hyperparams = fmin(fn = objective,
                        space = space,
                        algo = tpe.suggest,
                        max_evals = 100,
                        trials = trials)

RMSE:                                                                                                                  
0.04784008726303311                                                                                                    
RMSE:                                                                                                                  
0.04784008726303311                                                                                                    
RMSE:                                                                                                                  
0.04784008726303311                                                                                                    
  3%|█▍                                            | 3/100 [00:09<05:13,  3.23s/trial, best loss: -0.04784008726303311]


KeyboardInterrupt: 

In [None]:
#....
# DO YOU SEE WHAT I SEE
# what is this shit 
# trying to optimize all params leads to error of 0.88 and having the simplesssttt search space yields 0.04 right off the bat???

okay. lesson learned. let's stick to what we learned by tuning manually and try to see if we can get even the teensiest squeeze on that? 

In [37]:
space={ 'n_estimators': hp.quniform("n_estimators", 1500, 3000, 100), 
        'eta' : hp.quniform('eta', 0.05, 0.1, 0.02 ), 
        'max_depth': hp.quniform("max_depth", 4, 12, 1),
        'gamma': hp.uniform ('gamma', 0,9),
        'reg_alpha' : hp.quniform('reg_alpha', 0,180,1),
        'reg_lambda' : hp.uniform('reg_lambda', 0,1),
        'colsample_bytree' : hp.uniform('colsample_bytree', 0.5,1),
        'min_child_weight' : hp.quniform('min_child_weight', 0, 4, 1),
        'seed': 0
    }

def objective(space):
   
    clf=xgb.XGBRegressor(n_estimators = int(space['n_estimators']), max_depth = int(space['max_depth']),learning_rate=space['eta'], gamma = space['gamma'],
                    reg_alpha = int(space['reg_alpha']),min_child_weight=int(space['min_child_weight']),
                    colsample_bytree=int(space['colsample_bytree']), tree_method = "gpu_hist",single_precision_histogram=True, gpu_id=0)
    evaluation = [( X_train, y_train), ( X_val, y_val)]
    
    clf.fit(X_train, y_train,
            eval_set=evaluation, eval_metric="rmse",
            early_stopping_rounds=10,verbose=False)
    

    pred = clf.predict(X_test)
    accuracy = np.sqrt(mse(y_test, pred))
    print ("RMSE:", accuracy)
    return {'loss': -accuracy, 'status': STATUS_OK }

trials = Trials()

best_hyperparams = fmin(fn = objective,
                        space = space,
                        algo = tpe.suggest,
                        max_evals = 100,
                        trials = trials)

RMSE:                                                                                                                  
0.8746131302838872                                                                                                     
RMSE:                                                                                                                  
0.7440878628116653                                                                                                     
RMSE:                                                                                                                  
0.8650550793202926                                                                                                     
RMSE:                                                                                                                  
0.857987875137898                                                                                                      
RMSE:                                   

is it the introduction of regularization that completely just BENDS things over?

In [38]:
best_hyperparams

{'colsample_bytree': 0.7517777300309474,
 'eta': 0.08,
 'gamma': 4.162497969488593,
 'max_depth': 10.0,
 'min_child_weight': 1.0,
 'n_estimators': 2000.0,
 'reg_alpha': 180.0,
 'reg_lambda': 0.9528689904241021}

what if I take out reg_alpha completely 

In [45]:
space={ 'n_estimators': hp.normal("n_estimators", 1800, 300 ), 
        'eta' : hp.normal('eta', 0.05, 0.1 ), 
        'max_depth': hp.quniform("max_depth", 5, 12, 1),
        'gamma': hp.uniform ('gamma', 0,9),
        'reg_alpha' : hp.quniform('reg_alpha', 0,180,1),
        'reg_lambda' : hp.uniform('reg_lambda', 0,1),
        'colsample_bytree' : hp.uniform('colsample_bytree', 0.5,1),
        'min_child_weight' : hp.quniform('min_child_weight', 0, 4, 1),
        'seed': 0
    }

def objective(space):
   
    clf=xgb.XGBRegressor(n_estimators = abs(int(space['n_estimators'])), max_depth = abs(int(space['max_depth'])),learning_rate=abs(space['eta']), gamma = space['gamma'],
                    min_child_weight=int(space['min_child_weight']), reg_alpha = int(space['reg_alpha']),
                    colsample_bytree=int(space['colsample_bytree']), tree_method = "gpu_hist",single_precision_histogram=True, gpu_id=0)
    evaluation = [( X_train, y_train), ( X_val, y_val)]
    
    clf.fit(X_train, y_train,
            eval_set=evaluation, eval_metric="rmse",
            early_stopping_rounds=10,verbose=False)
    

    pred = clf.predict(X_test)
    accuracy = np.sqrt(mse(y_test, pred))
    print ("RMSE:", accuracy)
    return {'loss': accuracy, 'status': STATUS_OK } #adding the - to accuracy made hyperopt find the max

trials = Trials()

best_hyperparams = fmin(fn = objective,
                        space = space,
                        algo = tpe.suggest,
                        max_evals = 1500,
                        trials = trials)

RMSE:                                                                                                                  
0.847493529904555                                                                                                      
RMSE:                                                                                                                  
0.8138848368866455                                                                                                     
RMSE:                                                                                                                  
0.8696551409695308                                                                                                     
RMSE:                                                                                                                  
0.8309962811803352                                                                                                     
RMSE:                                   

In [49]:
best_hyperparams

{'colsample_bytree': 0.5510293765856744,
 'eta': 1.1616406880087637,
 'gamma': 3.2395204247720217,
 'max_depth': 5.0,
 'min_child_weight': 2.0,
 'n_estimators': 1964.832966411027,
 'reg_alpha': 6.0,
 'reg_lambda': 0.759339869021229}

In [56]:
params = {'colsample_bytree': 0.5510293765856744,
 'eta': 1.1616406880087637,
 'gamma': 3.2395204247720217,
 'max_depth': 5,
 'min_child_weight': 2.0,
 'n_estimators': 1965,
 'reg_alpha': 6.0,
 'reg_lambda': 0.759339869021229}
xgb_reg = xgb.XGBRegressor(**params, tree_method = "gpu_hist",single_precision_histogram=True, gpu_id=0)
xgb_reg.fit(X_train, y_train)

XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
             colsample_bynode=1, colsample_bytree=0.5510293765856744,
             enable_categorical=False, eta=1.1616406880087637,
             gamma=3.2395204247720217, gpu_id=0, importance_type=None,
             interaction_constraints='', learning_rate=1.16164064,
             max_delta_step=0, max_depth=5, min_child_weight=2.0, missing=nan,
             monotone_constraints='()', n_estimators=1965, n_jobs=12,
             num_parallel_tree=1, predictor='auto', random_state=0,
             reg_alpha=6.0, reg_lambda=0.759339869021229, scale_pos_weight=1,
             single_precision_histogram=True, subsample=1,
             tree_method='gpu_hist', validate_parameters=1, verbosity=None)

In [57]:
preds = xgb_reg.predict(X_test)
np.sqrt(mse(preds, y_test))

0.3219330690314253

In [None]:
#error 10x larger than manual tuning. Which is itself larger than defaults

In [58]:
#last experiment : fix params we've tuned manually 

space={ 'n_estimators': 2100, 
        'eta' : 0.07, 
        'max_depth': 6,
        'gamma': hp.uniform ('gamma', 0,9),
        'reg_alpha' : hp.quniform('reg_alpha', 0,180,1),
        'reg_lambda' : hp.uniform('reg_lambda', 0,1),
        'colsample_bytree' : hp.uniform('colsample_bytree', 0.5,1),
        'min_child_weight' : hp.quniform('min_child_weight', 0, 4, 1),
        'seed': 0
    }

def objective(space):
   
    clf=xgb.XGBRegressor(n_estimators = abs(int(space['n_estimators'])), max_depth = abs(int(space['max_depth'])),learning_rate=abs(space['eta']), gamma = space['gamma'],
                    min_child_weight=int(space['min_child_weight']), reg_alpha = int(space['reg_alpha']),
                    colsample_bytree=int(space['colsample_bytree']), tree_method = "gpu_hist",single_precision_histogram=True, gpu_id=0)
    evaluation = [( X_train, y_train), ( X_val, y_val)]
    
    clf.fit(X_train, y_train,
            eval_set=evaluation, eval_metric="rmse",
            early_stopping_rounds=10,verbose=False)
    

    pred = clf.predict(X_test)
    accuracy = np.sqrt(mse(y_test, pred))
    print ("RMSE:", accuracy)
    return {'loss': accuracy, 'status': STATUS_OK } #adding the - to accuracy made hyperopt find the max

trials = Trials()

best_hyperparams = fmin(fn = objective,
                        space = space,
                        algo = tpe.suggest,
                        max_evals = 1500,
                        trials = trials)

RMSE:                                                                                                                  
0.8811070946920265                                                                                                     
RMSE:                                                                                                                  
0.8778507681366817                                                                                                     
RMSE:                                                                                                                  
0.8535696171328029                                                                                                     
RMSE:                                                                                                                  
0.8547779377377658                                                                                                     
RMSE:                                   

In [None]:
# conclulsion: fixing params doesn't help. anyway. at least I got a bit of pracitce with hyperopt

In [59]:
X_test

Unnamed: 0,x0_Albania,x0_Antigua and Barbuda,x0_Argentina,x0_Armenia,x0_Aruba,x0_Australia,x0_Austria,x0_Azerbaijan,x0_Bahamas,x0_Bahrain,...,x3_Generation X,x3_Generation Z,x3_Millenials,x3_Silent,year,population,suicides_p100k,HDI for year,gdp_for_year,gdp_per_capita
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,-0.502825,1.148962,3.478894,-0.996957,-0.272038,-0.837627
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,1.0,0.0,0.559887,-0.303766,-0.675913,1.171238,-0.200617,0.393892
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,1.0,0.0,0.914124,-0.469533,-0.675913,-0.225580,-0.305957,-0.355238
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,1.0,-1.801694,-0.159229,-0.356312,-0.851021,-0.279882,-0.673548
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,1.0,0.0,0.087570,-0.422264,-0.675913,0.483253,-0.280303,0.042438
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2777,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,1.0,1.032203,-0.321863,-0.441750,0.723006,-0.244967,0.027560
2778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,1.0,0.0,0.559887,-0.046863,-0.196512,-0.069220,-0.221554,-0.574381
2779,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,1.0,0.0,-1.093220,-0.350485,-0.653762,-0.955261,-0.304338,-0.840910
2780,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,1.0,0.0,-1.093220,0.670271,-0.645324,0.451981,1.154079,0.583544
