# Hyperparameter tuning with XGBoost, Ray Tune, Hyperopt and BayesOpt

Finding the best hyperparameters for complex modern machine learning algorithms is time-consuming. XGBoost, LightGBM, and neural networks have so many tuning parameters and combinations that a fine-grained grid search may be infeasible.

Fortunately, modern hyperparameter search algorithms like HyperOpt and Optuna, combined with a framework like Ray Tune, can run many trials concurrently on a single machine or on a cluster. This accelerates the tuning process, saves time, and can yield better hyperparameters. This post demonstrates how to speed up hyperparameter tuning with Ray Tune, using the HyperOpt and BayesOpt search algorithms, including on a cluster.

I will use the [Housing Prices Competition for Kaggle Learn Users](https://www.kaggle.com/c/house-prices-advanced-regression-techniques) dataset. The response we are predicting is the log-transformed SalePrice, based on house features like square footage, neighborhood, property features like a pool, and condition. I already did [some feature engineering and feature selection](https://github.com/druce/iowa), and my submission was in the top 5% when I submitted it in 2019.

Outline:
- Baseline linear regression with no hyperparameters
- ElasticNet with L1 and L2 regularization using ElasticNetCV hyperparameter optimization
- ElasticNet with GridSearchCV hyperparameter optimization
- XGBoost with sequential grid search over hyperparameter subsets with early stopping 
- XGBoost with Ray, HyperOpt and BayesOpt search algorithms
- Accelerate advanced algorithms with a Ray cluster


| ML Algo           | Hyperparameter search algo   | CV Error (RMSE in $)  | Time     |
|-------------------|------------------------------|-----------------------|----------|
| Linear Regression | None                         | $18192                |   0:01s  |
| ElasticNet        | ElasticNetCV (Grid Search)   | $18122                |   0:02s  |
| ElasticNet        | GridSearchCV                 | $18061                |   0:05s  |
| XGB               | Sequential Grid Search       | $18783                |   36:09  |
| XGB               | HyperOpt (128 samples)       | $18808                |   21:41  |
| XGB               | BayesOpt                     | $18506                | 1:15:04  |
| XGB               | Optuna                       | $18618                |          |

In [1]:
from itertools import product
from datetime import datetime, timedelta
import os
import random
import string

import numpy as np
import pandas as pd

import sklearn
from sklearn.linear_model import LinearRegression, ElasticNet, ElasticNetCV, Ridge, RidgeCV
from sklearn.model_selection import train_test_split, cross_val_score, cross_val_predict, GridSearchCV, KFold
from sklearn.preprocessing import RobustScaler
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.pipeline import make_pipeline

#!conda install -y -c conda-forge  xgboost 
import xgboost
from xgboost import XGBRegressor
from xgboost import plot_importance

import ray
from ray import tune
from ray.tune.suggest import ConcurrencyLimiter
from ray.tune.schedulers import AsyncHyperBandScheduler
from ray.tune.suggest.bayesopt import BayesOptSearch
from ray.tune.suggest.hyperopt import HyperOptSearch
from ray.tune.suggest.optuna import OptunaSearch
from ray.tune.logger import DEFAULT_LOGGERS
from ray.tune.integration.wandb import WandbLogger

# pip install hyperopt
# pip install optuna

import wandb
os.environ['WANDB_NOTEBOOK_NAME'] = 'hyperparameter_optimization.ipynb'

print(datetime.now())

print ("%-20s %s"% ("numpy", np.__version__))
print ("%-20s %s"% ("pandas", pd.__version__))
print ("%-20s %s"% ("sklearn", sklearn.__version__))
print ("%-20s %s"% ("xgboost", xgboost.__version__))
print ("%-20s %s"% ("ray", ray.__version__))


2020-10-11 00:55:50.826820
numpy                1.19.1
pandas               1.1.3
sklearn              0.23.2
xgboost              1.2.0
ray                  1.1.0.dev0


In [2]:
# set seed for reproducibility
RANDOMSTATE = 42
np.random.seed(RANDOMSTATE)


In [3]:
def get_random_tag(length):
    """random tag for experiments"""
    letters_and_digits = string.ascii_letters + string.digits
    result_str = ''.join((random.choice(letters_and_digits) for i in range(length)))
    return result_str.upper()

get_random_tag(8)

'JKK2T97D'

In [4]:
# import train data
df = pd.read_pickle('x4.pickle')

response = 'SalePrice'
predictors = ['YearBuilt',
              'BsmtFullBath',
              'FullBath',
              'KitchenAbvGr',
              'GarageYrBlt',
              'LotFrontage',
              'MasVnrArea',
              '1stFlrSF',
              'GrLivArea',
              'GarageArea',
              'WoodDeckSF',
              'PorchSF',
              'AvgBltRemod',
              'FireBathRatio',
              'TotalSF x OverallQual x OverallCond',
              'AvgBltRemod x Functional x TotalFinSF',
              'Functional x OverallQual',
              'KitchenAbvGr x KitchenQual',
              'GarageCars x GarageYrBlt',
              'GarageQual x GarageCond x GarageCars',
              'HeatingQC x Heating',
              'monthnum',
              'log_YearBuilt',
              'log_LotArea',
              'log_TotalFinSF',
              'log_GarageRatio',
              'log_TotalSF x OverallQual x OverallCond',
              'log_TotalSF x OverallCond',
              'log_AvgBltRemod x TotalFinSF',
              'sq_2ndFlrSF',
              'sq_BsmtFinSF',
              'sq_BsmtFinSF x BsmtQual',
              'sq_BsmtFinSF x BsmtBath',
              'BldgType_4',
              'BsmtExposure_1',
              'BsmtExposure_4',
              'BsmtFinType1_1',
              'BsmtFinType1_2',
              'BsmtFinType1_4',
              'BsmtFinType1_5',
              'BsmtFinType1_6',
              'CentralAir_0',
              'CentralAir_1',
              'Condition1_1',
              'Condition1_3',
              'ExterCond_2',
              'ExterQual_2',
              'Exterior1st_4',
              'Exterior1st_5',
              'Exterior1st_10',
              'Fence_0',
              'Fence_2',
              'Foundation_1',
              'Foundation_5',
              'GarageCars_1',
              'GarageFinish_2',
              'GarageFinish_3',
              'GarageType_2',
              'HouseStyle_2',
              'KitchenQual_4',
              'LotConfig_0',
              'LotConfig_4',
              'MSSubClass_30',
              'MSSubClass_70',
              'MSZoning_0',
              'MSZoning_1',
              'MSZoning_4',
              'MasVnrType_2',
              'MasVnrType_3',
              'MoSold_1',
              'MoSold_5',
              'MoSold_6',
              'MoSold_11',
              'Neighborhood_3',
              'Neighborhood_4',
              'Neighborhood_5',
              'Neighborhood_10',
              'Neighborhood_11',
              'Neighborhood_16',
              'Neighborhood_17',
              'Neighborhood_19',
              'Neighborhood_22',
              'Neighborhood_24',
              'OverallCond_7',
              'OverallQual_5',
              'OverallQual_6',
              'OverallQual_7',
              'OverallQual_9',
              'PavedDrive_0',
              'PavedDrive_2',
              'SaleCondition_1',
              'SaleCondition_2',
              'SaleCondition_5',
              'SaleType_4',
              'BedroomAbvGr_1',
              'BedroomAbvGr_4',
              'BedroomAbvGr_5',
              'HalfBath_1',
              'TotalBath_1.0',
              'TotalBath_2.5']

X_train, X_test, y_train, y_test = train_test_split(df, df[response], test_size=.25)

display(df[predictors].head())
display(df[[response]].head())


| Id | YearBuilt | BsmtFullBath | FullBath | KitchenAbvGr | GarageYrBlt | LotFrontage | MasVnrArea | 1stFlrSF | GrLivArea | GarageArea | ... | SaleCondition_1 | SaleCondition_2 | SaleCondition_5 | SaleType_4 | BedroomAbvGr_1 | BedroomAbvGr_4 | BedroomAbvGr_5 | HalfBath_1 | TotalBath_1.0 | TotalBath_2.5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 7 | 1 | 2 | 1 | 7 | 65.0 | 196.0 | 856 | 1710 | 548.0 | ... | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| 2 | 34 | 0 | 2 | 1 | 34 | 80.0 | 0.0 | 1262 | 1262 | 460.0 | ... | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
| 3 | 9 | 1 | 2 | 1 | 9 | 68.0 | 162.0 | 920 | 1786 | 608.0 | ... | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| 4 | 95 | 1 | 1 | 1 | 12 | 60.0 | 0.0 | 961 | 1717 | 642.0 | ... | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 5 | 10 | 1 | 2 | 1 | 10 | 84.0 | 350.0 | 1145 | 2198 | 836.0 | ... | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |

| Id | SalePrice |
|---|---|
| 1 | 12.247699 |
| 2 | 12.109016 |
| 3 | 12.317171 |
| 4 | 11.849405 |
| 5 | 12.42922 |


In [5]:
# we are training on a response which is the log of 1 + the sale price
# transform prediction back to original basis with expm1 and evaluate vs. original

def evaluate(y_train, y_pred_train, y_test, y_pred_test):
    """evaluate in train_test split"""
    print('Train RMSE', np.sqrt(mean_squared_error(np.expm1(y_train), np.expm1(y_pred_train))))
    print('Train R-squared', r2_score(np.expm1(y_train), np.expm1(y_pred_train)))
    print('Train MAE', mean_absolute_error(np.expm1(y_train), np.expm1(y_pred_train)))
    print()
    print('Test RMSE', np.sqrt(mean_squared_error(np.expm1(y_test), np.expm1(y_pred_test))))
    print('Test R-squared', r2_score(np.expm1(y_test), np.expm1(y_pred_test)))
    print('Test MAE', mean_absolute_error(np.expm1(y_test), np.expm1(y_pred_test)))

MEAN_RESPONSE=df[response].mean()
def cv_to_raw(cv_val):
    """convert log1p rmse to underlying SalePrice error"""
    return np.expm1(MEAN_RESPONSE+cv_val) - np.expm1(MEAN_RESPONSE)
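
`cv_to_raw` turns an RMSE measured on the log1p scale into an approximate dollar error: it asks how much the price of a house at the mean log1p price changes when its log1p value moves up by one RMSE. A standalone sketch, assuming a hypothetical mean price of $180,000 instead of the dataset's actual `MEAN_RESPONSE`:

```python
import numpy as np

# hypothetical mean response: log1p of a $180,000 house (not the real dataset mean)
mean_response = np.log1p(180_000)

def cv_to_raw(cv_val, mean_response=mean_response):
    """convert an RMSE on the log1p scale into a dollar difference at the mean price"""
    return np.expm1(mean_response + cv_val) - np.expm1(mean_response)

# a log1p RMSE of 0.10 is a multiplicative move of exp(0.10) - 1, about 10.5%, from the mean price
raw = cv_to_raw(0.10)
print(round(raw))
```

Because the mapping is exponential, the dollar figure depends on the price level it is evaluated at, so it is a convenient summary rather than an exact RMSE in dollars.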

In [6]:
# always use same k-folds for reproducibility
kfolds = KFold(n_splits=10, shuffle=True, random_state=RANDOMSTATE)
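
Reusing one seeded `KFold` object means every model below is scored on exactly the same splits, so differences in CV error come from the models rather than from the folds. A quick check on toy data (5 folds on a 10-row array, not the notebook's `df`):

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)  # toy data: 10 rows, 2 columns

# shuffle=True with a fixed random_state reshuffles identically on every split() call
kfolds = KFold(n_splits=5, shuffle=True, random_state=42)
first = [test_idx.tolist() for _, test_idx in kfolds.split(X)]
second = [test_idx.tolist() for _, test_idx in kfolds.split(X)]
print(first == second)
```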


## Baseline linear regression
- Raw CV RMSE 18191.9791
- Wall time 2.81 s

In [7]:
%%time
# Baseline: ordinary least squares, no hyperparameters to tune
print("LinearRegression")

print(len(predictors), "predictors")

lr = LinearRegression()

#train and evaluate in train/test split
lr.fit(X_train[predictors], y_train)

y_pred_train = lr.predict(X_train[predictors])
y_pred_test = lr.predict(X_test[predictors])
evaluate(y_train, y_pred_train, y_test, y_pred_test)

# evaluate using kfolds, same process as train/test split but average results over 10 folds
# more sample-efficient, less CPU-efficient

scores = -cross_val_score(lr, df[predictors], df[response],
                          scoring="neg_root_mean_squared_error",
                          cv=kfolds,
                          n_jobs=-1)
raw_scores = [cv_to_raw(x) for x in scores]
print()
print("Log1p CV RMSE %.04f (STD %.04f)" % (np.mean(scores), np.std(scores)))
print("Raw CV RMSE %.04f (STD %.04f)" % (np.mean(raw_scores), np.std(raw_scores)))


LinearRegression
100 predictors
Train RMSE 16551.16362069584
Train R-squared 0.955449013087463
Train MAE 10998.860387471454

Test RMSE 17846.171920180186
Test R-squared 0.936430176939307
Test MAE 12755.395673705598

Log1p CV RMSE 0.1037 (STD 0.0099)
Raw CV RMSE 18191.9791 (STD 1838.6678)
CPU times: user 222 ms, sys: 278 ms, total: 501 ms
Wall time: 940 ms


## Native Sklearn xxxCV
- LogisticRegressionCV, LassoCV, RidgeCV, ElasticNetCV, etc.
- Test many hyperparameters in parallel with multithreading
- Note improvement vs. LinearRegression due to controlling overfitting
- RMSE $18122
- Wall time 1.65 s


In [8]:
%%time
# Tune the elasticnet search space over alphas and l1_ratio
# the predictor selection that built this training set used lasso,
# so we expect the best l1_ratio to be close to 0
# we could use ridge (i.e. elasticnet with no L1 regularization),
# but that tunes only 1 parameter; elasticnet is more general and a more useful example
print("ElasticnetCV")

# make pipeline
# with regularization must scale predictors
elasticnetcv = make_pipeline(RobustScaler(),
                             ElasticNetCV(max_iter=100000, 
                                          l1_ratio=[0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99],
                                          alphas=np.logspace(-4, -2, 9),
                                          cv=kfolds,
                                          n_jobs=-1,
                                          verbose=1,
                                         ))

#train and evaluate in train/test split
elasticnetcv.fit(X_train[predictors], y_train)

y_pred_train = elasticnetcv.predict(X_train[predictors])
y_pred_test = elasticnetcv.predict(X_test[predictors])
evaluate(y_train, y_pred_train, y_test, y_pred_test)
l1_ratio = elasticnetcv._final_estimator.l1_ratio_
alpha = elasticnetcv._final_estimator.alpha_
print('l1_ratio', l1_ratio)
print('alpha', alpha)

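# evaluate using kfolds on the full dataset
# elasticnetcv was fit on the train split only, so its own CV errors (mse_path_) don't cover
# the full dataset; re-run CV with cross_val_score for comparability with the other models
# note: unlike the pipeline above, this ElasticNet is fit on unscaled predictors,
# so the same alpha does not give exactly the same amount of regularization
elasticnet = ElasticNet(alpha=alpha,
                        l1_ratio=l1_ratio,
                        max_iter=10000)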

scores = -cross_val_score(elasticnet, df[predictors], df[response],
                          scoring="neg_root_mean_squared_error",
                          cv=kfolds,
                          n_jobs=-1)
raw_scores = [cv_to_raw(x) for x in scores]
print()
print("Log1p CV RMSE %.04f (STD %.04f)" % (np.mean(scores), np.std(scores)))
print("Raw CV RMSE %.04f (STD %.04f)" % (np.mean(raw_scores), np.std(raw_scores)))


ElasticnetCV


[Parallel(n_jobs=-1)]: Using backend ThreadingBackend with 8 concurrent workers.
........................................................................................................................................................................................................................................................................................................................................[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    0.4s
.................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................

Train RMSE 16782.099077546347
Train R-squared 0.9541971157728129
Train MAE 11025.896226946621

Test RMSE 17457.333905669526
Test R-squared 0.9391701570474736
Test MAE 12389.820674779758
l1_ratio 0.01
alpha 0.005623413251903491

Log1p CV RMSE 0.1033 (STD 0.0112)
Raw CV RMSE 18122.0127 (STD 2074.9545)
CPU times: user 6.17 s, sys: 4.66 s, total: 10.8 s
Wall time: 1.65 s


## GridSearchCV
- Useful for algos with no native multithreaded xxxCV
- Test many hyperparameter combinations in parallel with multithreading
- Similar result to ElasticNetCV, but not identical; note that ElasticNetCV above was fit on the 75% training split, while GridSearchCV below is fit on the full dataset


In [9]:
%%time
gs = make_pipeline(RobustScaler(),
                   GridSearchCV(ElasticNet(max_iter=100000),
                                param_grid={'l1_ratio': [0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99],
                                            'alpha': np.logspace(-4, -2, 9),
                                           },
                                scoring='neg_mean_squared_error',
                                refit=True,
                                cv=kfolds,
                                n_jobs=-1,
                                verbose=1
                               ))

# do cv using kfolds on full dataset
print("\nCV on full dataset")
gs.fit(df[predictors], df[response])
print('best params', gs._final_estimator.best_params_)
print('best score', -gs._final_estimator.best_score_)
l1_ratio = gs._final_estimator.best_params_['l1_ratio']
alpha = gs._final_estimator.best_params_['alpha']

elasticnet = ElasticNet(alpha=alpha,
                        l1_ratio=l1_ratio,
                        max_iter=100000)
print(elasticnet)

scores = -cross_val_score(elasticnet, df[predictors], df[response],
                          scoring="neg_root_mean_squared_error",
                          cv=kfolds,
                          n_jobs=-1)
raw_scores = [cv_to_raw(x) for x in scores]
print()
print("Log1p CV RMSE %.06f (STD %.04f)" % (np.mean(scores), np.std(scores)))
print("Raw CV RMSE %.06f (STD %.04f)" % (np.mean(raw_scores), np.std(raw_scores)))

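# GridSearchCV's best_score_ above (MSE 0.010638, i.e. RMSE ~0.1031) differs from this cross_val_score
# with the same alpha, l1_ratio and kfolds. Two likely contributors:
# - GridSearchCV averages per-fold MSE, and sqrt(mean of MSE) is not the mean of per-fold RMSE
# - GridSearchCV was fit on RobustScaler-scaled predictors, this ElasticNet on unscaled predictors
# weighting folds by number of samples (below) does not explain the gap
nsamples = [len(z[1]) for z in kfolds.split(df)]
print("weighted average %.06f" % np.average(scores, weights=nsamples))
# ElasticNetCV also runs fewer jobs and takes less time: it fits the whole alpha path for each
# fold and l1_ratio in one job, warm-starting each alpha from the previous solution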



CV on full dataset
Fitting 10 folds for each of 117 candidates, totalling 1170 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  56 tasks      | elapsed:    0.4s
[Parallel(n_jobs=-1)]: Done 1123 tasks      | elapsed:    3.9s
[Parallel(n_jobs=-1)]: Done 1170 out of 1170 | elapsed:    4.0s finished


best params {'alpha': 0.0031622776601683794, 'l1_ratio': 0.01}
best score 0.010637685614240399
ElasticNet(alpha=0.0031622776601683794, l1_ratio=0.01, max_iter=100000)

Log1p CV RMSE 0.103003 (STD 0.0109)
Raw CV RMSE 18060.902698 (STD 2008.2407)
weighted average 0.103023
CPU times: user 834 ms, sys: 342 ms, total: 1.18 s
Wall time: 4.46 s
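
In the cell above, `RobustScaler` wraps `GridSearchCV`, so the scaler is fit once on all rows, including each fold's held-out rows. One alternative, sketched here on synthetic data (an assumption, not the notebook's `df`), nests the scaler inside the estimator that GridSearchCV cross-validates, so it is re-fit on each training fold; scoring with RMSE also makes `best_score_` a mean of per-fold RMSE, directly comparable to `cross_val_score`:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler

X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)
kfolds = KFold(n_splits=5, shuffle=True, random_state=42)

# the scaler is a pipeline step, so GridSearchCV re-fits it on each training fold only
pipe = make_pipeline(RobustScaler(), ElasticNet(max_iter=100000))
gs = GridSearchCV(pipe,
                  # make_pipeline names each step after its lowercased class name
                  param_grid={'elasticnet__alpha': np.logspace(-4, -2, 5),
                              'elasticnet__l1_ratio': [0.1, 0.5, 0.9]},
                  scoring='neg_root_mean_squared_error',
                  cv=kfolds)
gs.fit(X, y)
print(gs.best_params_, -gs.best_score_)
```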


In [10]:
# roll-our-own CV 
# matches cross_val_score
alpha = 0.0031622776601683794
l1_ratio = 0.01
regressor = ElasticNet(alpha=alpha,
                       l1_ratio=l1_ratio,
                       max_iter=10000)
print(regressor)
cverrors = []
for train_fold, cv_fold in kfolds.split(df): 
    fold_X_train=df[predictors].values[train_fold]
    fold_y_train=df[response].values[train_fold]
    fold_X_test=df[predictors].values[cv_fold]
    fold_y_test=df[response].values[cv_fold]
    regressor.fit(fold_X_train, fold_y_train)
    y_pred_test=regressor.predict(fold_X_test)
    cverrors.append(np.sqrt(mean_squared_error(fold_y_test, y_pred_test)))
    
print("%.06f" % np.average(cverrors))
    

ElasticNet(alpha=0.0031622776601683794, l1_ratio=0.01, max_iter=10000)
0.103003
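
This loop averages per-fold RMSE, as `cross_val_score` does. GridSearchCV's `best_score_` in the previous cell averaged per-fold MSE instead, and the square root of a mean of squares is at least the mean of the roots (equal only when every fold has the same error). A small numeric illustration with made-up fold errors:

```python
import numpy as np

# made-up per-fold RMSE values, for illustration only
fold_rmse = np.array([0.09, 0.10, 0.12, 0.095, 0.11])

# what cross_val_score and the loop above report: mean of per-fold RMSE
mean_of_rmse = fold_rmse.mean()
# what sqrt(-best_score_) gives with MSE scoring: root of the mean per-fold MSE
sqrt_of_mean_mse = np.sqrt((fold_rmse ** 2).mean())

print("mean of per-fold RMSE  %.6f" % mean_of_rmse)
print("sqrt of mean fold MSE  %.6f" % sqrt_of_mean_mse)
```

So the two numbers are not directly comparable, even for identical models.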


## XGBoost CV 
- XGBoost has native multithreading, CV
- XGBoost has many tuning parameters so a complete grid search has an unreasonable number of combinations
- We tune reduced sets sequentially and use early stopping. 

### Tuning methodology
- Set an initial set of starting parameters
- Do 10-fold CV
- Use early stopping to halt training in each fold if there is no improvement after, e.g., 100 rounds; pick the hyperparameters that minimize the average error over the k folds
- Tune sequentially on groups of hyperparameters that don't interact much across groups, to reduce the number of combinations
- Tune max_depth and min_child_weight 
- Tune subsample and colsample_bytree
- Tune alpha, lambda and gamma (regularization)
- Tune learning rate: lower learning rate will need more rounds/n_estimators
- Retrain on full dataset with best learning rate and best n_estimators (average stopping point over kfolds)
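
A minimal sketch of the sequential, group-by-group search described above, with a hypothetical toy objective `toy_cv_error` standing in for 10-fold CV error (the notebook's own cell below is what actually runs XGBoost). Each round is an exhaustive grid over one group while the other parameters stay at their current best values, so it can miss interactions across groups; that is why the groups are chosen to interact little.

```python
from itertools import product


def sequential_group_search(objective, start, groups):
    """Greedy group-by-group grid search: exhaustive within a group, other params held fixed."""
    best = dict(start)
    best_score = objective(best)
    for group in groups:
        names = list(group)
        # try every combination of this group's values, keeping the best values from earlier groups
        for values in product(*(group[name] for name in names)):
            candidate = {**best, **dict(zip(names, values))}
            score = objective(candidate)
            if score < best_score:
                best, best_score = candidate, score
    return best, best_score


def toy_cv_error(p):
    """hypothetical stand-in for CV error, with a known minimum"""
    return ((p["max_depth"] - 3) ** 2 + (p["min_child_weight"] - 2) ** 2
            + (p["subsample"] - 0.6) ** 2 + (p["colsample_bytree"] - 0.2) ** 2)


start = {"max_depth": 5, "min_child_weight": 5, "subsample": 0.5, "colsample_bytree": 0.5}
groups = [
    {"max_depth": [1, 2, 3, 4], "min_child_weight": [1, 2, 3, 4]},                   # round 1
    {"subsample": [0.4, 0.5, 0.6, 0.7, 0.8], "colsample_bytree": [0.1, 0.2, 0.3]},  # round 2
]
best, best_score = sequential_group_search(toy_cv_error, start, groups)
print(best, best_score)
```

Two rounds cost 16 + 15 = 31 evaluations here, versus 240 for the full grid over all four parameters.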

### Notes
- It doesn't seem possible to combine XGBoost early stopping with GridSearchCV: early stopping needs each fold's held-out data passed to `fit` as an `eval_set`, and GridSearchCV doesn't pass the k-folds to XGBoost that way
- 2 alternative approaches:
    - use native `xgboost.cv`, which understands early stopping but doesn't use the sklearn API (it takes a DMatrix, not a NumPy array or DataFrame)
    - use the sklearn API and roll our own grid search instead of GridSearchCV (used below)
- XGBoost terminology differs from sklearn:
    - num_boost_round (native API) = n_estimators (sklearn API)
    - eta = learning_rate
- parameter reference: https://xgboost.readthedocs.io/en/latest/parameter.html
- training reference: https://xgboost.readthedocs.io/en/latest/python/python_api.html#module-xgboost.training
- times are wall times on an amazon t2.2xlarge instance with 
- to set up environment:
    - `conda create --name hyperparam python=3.8`
    - `conda activate hyperparam`
    - `conda install jupyter`
    - `pip install -r requirements.txt`
- round 1 Wall time: 6min 23s
- round 2 Wall time: 19min 22s
- round 3 Wall time: 5min 30s
- round 4 Wall time: 4min 54s
- total time 36:09
- RMSE 18783.117031
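
The per-fold early stopping and the "average stopping point over kfolds" step can be sketched in plain Python. This mimics the idea with hypothetical validation-error curves; it is not XGBoost's internal implementation. Iterations are 0-based indices, and the average is truncated with `int()` as in the notebook's `N_ESTIMATORS`.

```python
def early_stopping_iteration(val_errors, patience):
    """Return (best_iteration, best_error), stopping after `patience` rounds without improvement."""
    best_iter, best_err = 0, float("inf")
    for i, err in enumerate(val_errors):
        if err < best_err:
            best_iter, best_err = i, err
        elif i - best_iter >= patience:
            break  # no improvement in the last `patience` rounds
    return best_iter, best_err


# hypothetical validation-error curves, one per fold
fold_curves = [
    [5.0, 4.0, 3.0, 3.5, 3.2, 3.1, 3.4, 3.6],
    [6.0, 5.0, 4.5, 4.0, 3.8, 3.9, 4.1, 4.2, 4.3],
    [4.0, 3.0, 2.9, 3.0, 3.1, 3.2],
]
best_iters = [early_stopping_iteration(c, patience=3)[0] for c in fold_curves]
# number of rounds for the final refit: average stopping point over folds
n_estimators = int(sum(best_iters) / len(best_iters))
print(best_iters, n_estimators)
```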

In [10]:
%%time
# this cell runs a single round of the sequential search
# the full process is:
# - set the initial XGBoost parameters
# - remove the overrides (search for TODO: in this cell)
# - run round 1 and override the initial max_depth, min_child_weight with the best values (search for TODO:)
# - run round 2 and override subsample and colsample_bytree with the best values
# - run round 3 and override reg_alpha, reg_lambda, reg_gamma with the best values
# - run round 4 to obtain the best learning_rate and number of boosting rounds
# this is a representative, not exhaustive, list of the most important parameters to tune
# see https://xgboost.readthedocs.io/en/latest/parameter.html for all parameters

# in XGBoost > 1.0.2, passing early_stopping_rounds to the XGBRegressor constructor (as below)
# triggers the warning "Parameters: { early_stopping_rounds } might not be used."
# the constructor forwards unknown keyword arguments to the booster, which doesn't use this one;
# early stopping still works because my_cv also passes early_stopping_rounds to fit()

max_depth = 5
min_child_weight=5
colsample_bytree = 0.5
subsample = 0.5
reg_alpha = 1e-05
reg_lambda = 1
reg_gamma = 0
learning_rate = 0.01

BOOST_ROUNDS=50000   # we use early stopping so make this arbitrarily high
EARLY_STOPPING_ROUNDS=100 # stop if no improvement after 100 rounds

# round 1: tune depth and min_child_weight
max_depths = list(range(1,5))
min_child_weights = list(range(1,5))
gridsearch_params_1 = product(max_depths, min_child_weights)

# round 2: tune subsample and colsample_bytree
subsamples = np.linspace(0.1, 1.0, 10)
colsample_bytrees = np.linspace(0.1, 1.0, 10)
gridsearch_params_2 = product(subsamples, colsample_bytrees)

# round 2 (refined): tune subsample and colsample_bytree
subsamples = np.linspace(0.4, 0.8, 9)
colsample_bytrees = np.linspace(0.05, 0.25, 5)
gridsearch_params_2 = product(subsamples, colsample_bytrees)

# round 3: tune alpha, lambda, gamma
reg_alphas = np.logspace(-3, -2, 3)
reg_lambdas = np.logspace(-2, 1, 4)
reg_gammas = [0]
#reg_gammas = np.linspace(0, 5, 6)
gridsearch_params_3 = product(reg_alphas, reg_lambdas, reg_gammas)

# round 4: learning rate
learning_rates = reversed(np.logspace(-3, -1, 5).tolist())
gridsearch_params_4 = learning_rates

# TODO: remove these overrides to reset the search
# override initial parameters after search
# round 1:
max_depth=2
min_child_weight=2
# round 2:
subsample=0.60
colsample_bytree=0.05
# round 3:
reg_alpha = 0.003162
reg_lambda = 0.1
reg_gamma = 0

def my_cv(df, predictors, response, kfolds, regressor, verbose=False):
    """Roll our own CV over kfolds with early stopping"""
    metrics = []
    best_iterations = []

    for train_fold, cv_fold in kfolds.split(df): 
        fold_X_train=df[predictors].values[train_fold]
        fold_y_train=df[response].values[train_fold]
        fold_X_test=df[predictors].values[cv_fold]
        fold_y_test=df[response].values[cv_fold]
        regressor.fit(fold_X_train, fold_y_train,
                      early_stopping_rounds=EARLY_STOPPING_ROUNDS,
                      eval_set=[(fold_X_test, fold_y_test)],
                      eval_metric='rmse',
                      verbose=verbose
                     )
        y_pred_test=regressor.predict(fold_X_test)
        metrics.append(np.sqrt(mean_squared_error(fold_y_test, y_pred_test)))
        best_iterations.append(regressor.best_iteration)  # use the regressor passed in, not the global xgb
    return np.average(metrics), np.std(metrics), np.average(best_iterations)

results = []
best_iterations = []

# TODO: iteratively uncomment 1 of the following 4 lines
# for i, (max_depth, min_child_weight) in enumerate(gridsearch_params_1): # round 1
# for i, (subsample, colsample_bytree) in enumerate(gridsearch_params_2): # round 2
# for i, (reg_alpha, reg_lambda, reg_gamma) in enumerate(gridsearch_params_3): # round 3
for i, learning_rate in enumerate(gridsearch_params_4): # round 4

    params = {
        'max_depth': max_depth,
        'min_child_weight': min_child_weight,
        'subsample': subsample,
        'colsample_bytree': colsample_bytree,
        'reg_alpha': reg_alpha,
        'reg_lambda': reg_lambda,
        'gamma': reg_gamma,
        'learning_rate': learning_rate,
    }
    print("%s params  %3d: %s" % (datetime.strftime(datetime.now(), "%T"), i, params))
    
    xgb = XGBRegressor(
        objective='reg:squarederror',
        n_estimators=BOOST_ROUNDS,
        early_stopping_rounds=EARLY_STOPPING_ROUNDS,
        random_state=RANDOMSTATE,    
        verbosity=1,
        n_jobs=-1,
        **params
    )
    
    metric_rmse, metric_std, best_iteration = my_cv(df, predictors, response, kfolds, xgb, verbose=False)    
    results.append([max_depth, min_child_weight, subsample, colsample_bytree, reg_alpha, reg_lambda, reg_gamma, 
                   learning_rate, metric_rmse, metric_std, best_iteration])
    
    print("%s %3d result mean: %.6f std: %.6f, iter: %.2f" % (datetime.strftime(datetime.now(), "%T"), i, metric_rmse, metric_std, best_iteration))


results_df = pd.DataFrame(results, columns=['max_depth', 'min_child_weight', 'subsample', 'colsample_bytree', 
                               'reg_alpha', 'reg_lambda', 'reg_gamma', 'learning_rate', 'rmse', 'std', 'best_iter']).sort_values('rmse')
results_df


18:56:19 params    0: {'max_depth': 2, 'min_child_weight': 2, 'subsample': 0.6, 'colsample_bytree': 0.05, 'reg_alpha': 0.003162, 'reg_lambda': 0.1, 'gamma': 0, 'learning_rate': 0.1}
Parameters: { early_stopping_rounds } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.



| | max_depth | min_child_weight | subsample | colsample_bytree | reg_alpha | reg_lambda | reg_gamma | learning_rate | rmse | std | best_iter |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 2 | 0.6 | 0.05 | 0.003162 | 0.1 | 0 | 0.031623 | 0.105996 | 0.012395 | 1134.8 |
| 2 | 2 | 2 | 0.6 | 0.05 | 0.003162 | 0.1 | 0 | 0.01 | 0.106046 | 0.013608 | 2888.1 |
| 3 | 2 | 2 | 0.6 | 0.05 | 0.003162 | 0.1 | 0 | 0.003162 | 0.106838 | 0.013188 | 7783.7 |
| 4 | 2 | 2 | 0.6 | 0.05 | 0.003162 | 0.1 | 0 | 0.001 | 0.107344 | 0.013132 | 22413.3 |
| 0 | 2 | 2 | 0.6 | 0.05 | 0.003162 | 0.1 | 0 | 0.1 | 0.109786 | 0.010449 | 476.0 |


In [11]:
max_depth = int(results_df.iloc[0]['max_depth'])
min_child_weight = results_df.iloc[0]['min_child_weight']
subsample = results_df.iloc[0]['subsample']
colsample_bytree = results_df.iloc[0]['colsample_bytree']
reg_alpha = results_df.iloc[0]['reg_alpha']
reg_lambda = results_df.iloc[0]['reg_lambda']
reg_gamma = results_df.iloc[0]['reg_gamma']
learning_rate = results_df.iloc[0]['learning_rate']
N_ESTIMATORS = int(results_df.iloc[0]['best_iter'])

params = {
    'max_depth': int(max_depth),
    'min_child_weight': min_child_weight,
    'subsample': subsample,
    'colsample_bytree': colsample_bytree,
    'reg_alpha': reg_alpha,
    'reg_lambda': reg_lambda,
    'gamma': reg_gamma,
    'learning_rate': learning_rate,
    'n_estimators': N_ESTIMATORS,    
}

print(params)

{'max_depth': 2, 'min_child_weight': 2.0, 'subsample': 0.6, 'colsample_bytree': 0.05, 'reg_alpha': 0.003162, 'reg_lambda': 0.1, 'gamma': 0.0, 'learning_rate': 0.03162277660168379, 'n_estimators': 1134}


In [12]:
%%time
# evaluate without early stopping

xgb = XGBRegressor(
    objective='reg:squarederror',
    random_state=RANDOMSTATE,    
    verbosity=1,
    n_jobs=-1,
    **params
)
print(xgb)

scores = -cross_val_score(xgb, df[predictors], df[response],
                          scoring="neg_root_mean_squared_error",
                          cv=kfolds,
                          n_jobs=-1)
raw_scores = [cv_to_raw(x) for x in scores]
print()
print("Log1p CV RMSE %.06f (STD %.04f)" % (np.mean(scores), np.std(scores)))
print("Raw CV RMSE %.06f (STD %.04f)" % (np.mean(raw_scores), np.std(raw_scores)))


XGBRegressor(base_score=None, booster=None, colsample_bylevel=None,
             colsample_bynode=None, colsample_bytree=0.05, gamma=0.0,
             gpu_id=None, importance_type='gain', interaction_constraints=None,
             learning_rate=0.03162277660168379, max_delta_step=None,
             max_depth=2, min_child_weight=2.0, missing=nan,
             monotone_constraints=None, n_estimators=1134, n_jobs=-1,
             num_parallel_tree=None, random_state=42, reg_alpha=0.003162,
             reg_lambda=0.1, scale_pos_weight=None, subsample=0.6,
             tree_method=None, validate_parameters=None, verbosity=1)

Log1p CV RMSE 0.106893 (STD 0.0125)
Raw CV RMSE 18783.117031 (STD 2307.2858)
CPU times: user 27.4 ms, sys: 8.45 ms, total: 35.9 ms
Wall time: 1.66 s


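In [11]:
# refactor as a Ray Tune function trainable
def my_xgb(config):

    # fix these configs for hyperopt:
    # hyperopt needs the integer range to start at 0, but we want max_depth to start at 2
    config['max_depth'] += 2
    # searchers may return floats (loguniform, or BayesOpt's continuous space); XGBoost needs ints here
    config['max_depth'] = int(config['max_depth'])
    config['n_estimators'] = int(config['n_estimators'])

    # the "wandb" entry only configures WandbLogger; it is not an XGBoost parameter
    xgb_params = {k: v for k, v in config.items() if k != 'wandb'}

    xgb = XGBRegressor(
        objective='reg:squarederror',
        n_jobs=1,    # one thread per trial; Ray runs trials in parallel
        random_state=RANDOMSTATE,
        **xgb_params,
    )
    # cross_val_score returns negated MSE per fold; sqrt gives per-fold RMSE
    scores = np.sqrt(-cross_val_score(xgb, df[predictors], df[response],
                                      scoring="neg_mean_squared_error",
                                      cv=kfolds))
    # note: the metric is named 'mse' but it is the mean per-fold RMSE on the log1p scale
    tune.report(mse=np.mean(scores))
    return {'mse': np.mean(scores)}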
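
The adjustments at the top of `my_xgb` can be moved into a small pure function and checked without starting Ray. Below is a minimal sketch. `prepare_xgb_params` is a hypothetical helper, not part of the original notebook. It mirrors the shift-by-2 and the `int` casts, drops the `wandb` logging entry, and works on a copy instead of mutating the config that Ray passes in.

```python
def prepare_xgb_params(config, depth_offset=2):
    """Map a sampled Ray Tune config to XGBRegressor keyword arguments (hypothetical helper)."""
    # copy, skipping the W&B logging settings, so the caller's dict is not mutated
    params = {k: v for k, v in config.items() if k != "wandb"}
    # max_depth is sampled starting at 0 (possibly as a float); shift it, then truncate to int
    params["max_depth"] = int(params["max_depth"] + depth_offset)
    # loguniform samples floats, but n_estimators must be an integer
    params["n_estimators"] = int(params["n_estimators"])
    return params


sampled = {"max_depth": 1.0, "n_estimators": 435.406, "learning_rate": 0.0105,
           "wandb": {"project": "iowa"}}
print(prepare_xgb_params(sampled))
# {'max_depth': 3, 'n_estimators': 435, 'learning_rate': 0.0105}
```

Inside the trainable, this would be used as `XGBRegressor(objective='reg:squarederror', **prepare_xgb_params(config))`.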


In [12]:
# sanity-check my_xgb outside Ray Tune, using the params from the sequential grid search above
# (my_xgb adds 2 to max_depth, so subtract 2 first)
config = dict(params)
config['max_depth'] = params['max_depth'] - 2

result = my_xgb(config)
print(result)

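## HyperOpt

References:
- [HyperOpt paper, Bergstra et al., SciPy 2013](https://conference.scipy.org/proceedings/scipy2013/pdfs/bergstra_hyperopt.pdf)
- [HyperOpt on GitHub](https://github.com/hyperopt/hyperopt)
- [HyperOpt documentation](http://hyperopt.github.io/hyperopt/)
- [Domino Data Lab blog: HyperOpt Bayesian hyperparameter optimization](https://blog.dominodatalab.com/hyperopt-bayesian-hyperparameter-optimization/)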
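
HyperOpt's default search algorithm, which Ray Tune's `HyperOptSearch` uses, is the Tree-structured Parzen Estimator (TPE). TPE works in three steps:

1. Split the trials seen so far into a "good" group (the best `gamma` fraction, 0.25 by default) and a "bad" group.
2. Fit a density to each group: l(x) for the good trials and g(x) for the bad ones.
3. Propose the candidate that maximizes l(x)/g(x).

The sketch below is a deliberately simplified one-dimensional illustration of that idea, using Gaussian kernel densities with a fixed bandwidth. The function names, the bandwidth and the candidate-sampling scheme are assumptions. HyperOpt's real implementation is more involved: it uses adaptive bandwidths, priors and tree-structured spaces.

```python
import math
import random


def gaussian_kde(points, bandwidth):
    """Return a Gaussian kernel density estimate built from `points`."""
    norm = 1.0 / (len(points) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)

    return density


def tpe_suggest(history, low, high, rng, gamma=0.25, n_candidates=24, bandwidth=0.1):
    """Propose the next x in [low, high] from (x, loss) pairs, TPE-style (simplified)."""
    ordered = sorted(history, key=lambda pair: pair[1])          # lowest loss first
    n_good = max(1, math.ceil(gamma * len(ordered)))
    good = [x for x, _ in ordered[:n_good]]
    bad = [x for x, _ in ordered[n_good:]] or good              # avoid an empty group
    l_density = gaussian_kde(good, bandwidth)                    # l(x): where good trials are
    g_density = gaussian_kde(bad, bandwidth)                     # g(x): where bad trials are
    # draw candidates around good points (clipped to the range), keep the best l(x) / g(x)
    candidates = [min(high, max(low, rng.gauss(rng.choice(good), bandwidth)))
                  for _ in range(n_candidates)]
    return max(candidates, key=lambda x: l_density(x) / (g_density(x) + 1e-12))


def loss(x):
    # toy objective with its minimum at x = 0.3
    return (x - 0.3) ** 2


rng = random.Random(0)
history = [(x, loss(x)) for x in (rng.uniform(0.0, 1.0) for _ in range(10))]  # random start
for _ in range(30):
    x = tpe_suggest(history, 0.0, 1.0, rng)
    history.append((x, loss(x)))

best_x, best_loss = min(history, key=lambda pair: pair[1])
print(f"best x = {best_x:.3f}, loss = {best_loss:.5f}")
```

In practice the starting points come from random search, which is what HyperOpt's initial trials do. TPE then concentrates later suggestions where good results have clustered.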
    

In [16]:
NUM_SAMPLES=128

start_time = datetime.now()
print("%-20s %s" % ("Start Time", start_time))

algo = HyperOptSearch(random_state_seed=RANDOMSTATE)
# uncomment and set max_concurrent to limit number of cores
# algo = ConcurrencyLimiter(algo, max_concurrent=10)
scheduler = AsyncHyperBandScheduler()

tune_kwargs = {
    "num_samples": NUM_SAMPLES,
    "config": {
        "n_estimators": tune.loguniform(100, 10000),   # float; cast to int in my_xgb
        "max_depth": tune.randint(0, 6),               # 0-5 (upper bound excluded); my_xgb shifts to 2-7
        'min_child_weight': tune.randint(0, 6),
        "subsample": tune.quniform(0.4, 0.9, 0.05),
        "colsample_bytree": tune.quniform(0.05, 0.8, 0.05),
        "reg_alpha": tune.loguniform(1e-04, 1),
        "reg_lambda": tune.loguniform(1e-04, 100),
        "gamma": 0,
        "learning_rate": tune.loguniform(0.001, 0.1),
        "wandb": {
            "project": "iowa",
            "api_key_file": "secrets/wandb.txt",
            "log_config": True
        }    
    }
}

analysis = tune.run(my_xgb,
                    name="xgb_hyperopt",
                    metric="mse",
                    mode="min",
                    search_alg=algo,
                    scheduler=scheduler,
                    verbose=1,
                    loggers=DEFAULT_LOGGERS + (WandbLogger, ),
                    **tune_kwargs)

end_time = datetime.now()
print("%-20s %s" % ("Start Time", start_time))
print("%-20s %s" % ("End Time", end_time))
print(timedelta(seconds=int((end_time - start_time).total_seconds())))   # .seconds would drop whole days

| Trial name | status | colsample_bytree | gamma | learning_rate | max_depth | min_child_weight | n_estimators | reg_alpha | reg_lambda | subsample | iter | total time (s) | mse |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| my_xgb_a9219292 | TERMINATED | 0.7 | 0 | 0.0105391 | 0 | 4 | 435.406 | 0.000188888 | 9.90358 | 0.65 | 2 | 8.45924 | 0.203631 |
| my_xgb_a9219293 | TERMINATED | 0.3 | 0 | 0.00673344 | 3 | 2 | 428.153 | 0.538855 | 2.32463 | 0.6 | 2 | 5.8163 | 0.681993 |
| my_xgb_a9219294 | TERMINATED | 0.15 | 0 | 0.00133237 | 5 | 4 | 7988.18 | 0.00163546 | 19.9476 | 0.8 | 1 | 134.862 | 0.114781 |
| my_xgb_a9219295 | TERMINATED | 0.2 | 0 | 0.0187659 | 2 | 3 | 735.64 | 0.000262686 | 32.5338 | 0.6 | 2 | 11.313 | 0.11581 |
| my_xgb_a9219296 | TERMINATED | 0.6 | 0 | 0.00155207 | 0 | 3 | 4032.08 | 0.0246711 | 0.00107456 | 0.8 | 2 | 76.4402 | 0.124603 |
| my_xgb_a9219297 | TERMINATED | 0.15 | 0 | 0.00106609 | 1 | 5 | 2700.39 | 0.0127785 | 0.0424791 | 0.8 | 1 | 30.8362 | 0.661069 |
| my_xgb_a9219298 | TERMINATED | 0.4 | 0 | 0.00556109 | 3 | 2 | 238.15 | 0.000451262 | 43.2731 | 0.75 | 2 | 2.43737 | 3.25644 |
| my_xgb_a9219299 | TERMINATED | 0.55 | 0 | 0.0154611 | 0 | 1 | 5605.97 | 0.00272955 | 4.48938 | 0.7 | 2 | 104.367 | 0.109036 |
| my_xgb_a921929a | TERMINATED | 0.5 | 0 | 0.00126552 | 5 | 3 | 4734.92 | 0.00967981 | 33.1173 | 0.65 | 1 | 143.681 | 0.1519 |
| my_xgb_ab1a337e | TERMINATED | 0.75 | 0 | 0.00153686 | 2 | 2 | 678.644 | 0.306625 | 1.59856 | 0.9 | 1 | 11.6458 | 4.08285 |


2020-10-10 20:30:26,481	INFO tune.py:439 -- Total run time: 4627.55 seconds (4625.19 seconds for the tuning loop).


Start Time           2020-10-10 19:13:18.929332
End Time             2020-10-10 20:30:26.841673
1:17:07
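
The search space above mixes three kinds of distributions. The stdlib helpers below are a rough sketch of what each one means. They are illustrative stand-ins, not Ray's own samplers, and they assume the semantics described in the Ray Tune docs:

- `loguniform` samples uniformly in log space.
- `quniform` rounds a uniform sample to the nearest multiple of `q`.
- `randint` excludes its upper bound.

Drawing configurations this way is essentially what plain random search would do.

```python
import math
import random


def loguniform(rng, low, high):
    # uniform in log space: 100-1000 and 1000-10000 are equally likely
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def quniform(rng, low, high, q):
    # uniform sample rounded to the nearest multiple of q
    return round(rng.uniform(low, high) / q) * q


def randint(rng, low, high):
    # integer in [low, high): the upper bound is excluded
    return rng.randrange(low, high)


def sample_config(rng):
    return {
        "n_estimators": loguniform(rng, 100, 10000),
        "max_depth": randint(rng, 0, 6),
        "min_child_weight": randint(rng, 0, 6),
        "subsample": quniform(rng, 0.4, 0.9, 0.05),
        "colsample_bytree": quniform(rng, 0.05, 0.8, 0.05),
        "reg_alpha": loguniform(rng, 1e-4, 1),
        "reg_lambda": loguniform(rng, 1e-4, 100),
        "gamma": 0,
        "learning_rate": loguniform(rng, 0.001, 0.1),
    }


rng = random.Random(42)
configs = [sample_config(rng) for _ in range(1000)]
# about half of the n_estimators draws should fall below the geometric midpoint, 1000
share_below_1000 = sum(c["n_estimators"] < 1000 for c in configs) / len(configs)
print(round(share_below_1000, 2))
```

Log-uniform ranges suit `n_estimators`, `learning_rate` and the regularization terms because their useful values span orders of magnitude. A plain uniform range over 100-10000 would spend about 90% of its samples above 1000.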

In [17]:
analysis.results_df.columns

Index(['mse', 'time_this_iter_s', 'done', 'timesteps_total', 'episodes_total',
       'training_iteration', 'experiment_id', 'date', 'timestamp',
       'time_total_s', 'pid', 'hostname', 'node_ip', 'time_since_restore',
       'timesteps_since_restore', 'iterations_since_restore', 'experiment_tag',
       'config.n_estimators', 'config.max_depth', 'config.min_child_weight',
       'config.subsample', 'config.colsample_bytree', 'config.reg_alpha',
       'config.reg_lambda', 'config.gamma', 'config.learning_rate'],
      dtype='object')

In [18]:
analysis_results_df = analysis.results_df[['mse', 'date', 'time_this_iter_s',
       'config.n_estimators', 'config.max_depth', 'config.min_child_weight', 'config.subsample',
       'config.colsample_bytree', 'config.reg_alpha', 'config.reg_lambda', 'config.gamma',
       'config.learning_rate']].sort_values('mse')
analysis_results_df


| trial_id | mse | date | time_this_iter_s | config.n_estimators | config.max_depth | config.min_child_weight | config.subsample | config.colsample_bytree | config.reg_alpha | config.reg_lambda | config.gamma | config.learning_rate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3080ae2e | 0.105949 | 2020-10-10_19-41-01 | 0.093306 | 8397 | 3 | 1 | 0.45 | 0.25 | 0.000332 | 0.000525 | 0 | 0.004963 |
| 904770b2 | 0.106154 | 2020-10-10_20-20-16 | 0.003645 | 3252 | 3 | 3 | 0.50 | 0.10 | 0.000827 | 0.000102 | 0 | 0.016355 |
| ee4d43ea | 0.106205 | 2020-10-10_20-20-24 | 0.004241 | 7110 | 3 | 4 | 0.55 | 0.15 | 0.000102 | 0.011954 | 0 | 0.009865 |
| b3a1218e | 0.106234 | 2020-10-10_19-55-19 | 0.006352 | 8008 | 3 | 1 | 0.45 | 0.35 | 0.000154 | 0.000183 | 0 | 0.005256 |
| a6a2f000 | 0.106246 | 2020-10-10_19-31-25 | 0.070678 | 7707 | 4 | 3 | 0.45 | 0.25 | 0.000120 | 0.000649 | 0 | 0.005331 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| c091cb4a | 4.762144 | 2020-10-10_19-14-37 | 1.229263 | 110 | 4 | 2 | 0.60 | 0.35 | 0.228840 | 17.810306 | 0 | 0.008214 |
| a0336ff4 | 5.151138 | 2020-10-10_20-10-59 | 10.486364 | 104 | 2 | 0 | 0.55 | 0.25 | 0.000547 | 0.077015 | 0 | 0.007725 |
| 585751e4 | 6.436485 | 2020-10-10_19-48-40 | 71.252799 | 481 | 3 | 1 | 0.40 | 0.45 | 0.000593 | 0.003833 | 0 | 0.001212 |
| a1bfe016 | 6.976963 | 2020-10-10_19-58-23 | 65.139262 | 274 | 5 | 1 | 0.45 | 0.50 | 0.001270 | 0.004657 | 0 | 0.001832 |
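
Each run also passes `AsyncHyperBandScheduler()`, Ray Tune's implementation of asynchronous successive halving (ASHA). When a trial reports at a milestone ("rung"), it continues only if its metric is within the best 1/`reduction_factor` of the results already recorded at that rung. Otherwise it is stopped, and the first trial to reach a rung always continues. The sketch below is a simplified illustration of that rule for a metric to minimize. The class and method names are hypothetical, and it assumes Ray's defaults of `grace_period=1`, `reduction_factor=4` and `max_t=100`. Ray's scheduler also handles brackets and time attributes.

```python
import math


def percentile(values, pct):
    """Percentile with linear interpolation (numpy's default convention)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * pct / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


class AshaRungs:
    """Simplified ASHA stopping rule for a metric to minimize (illustrative, not Ray's code)."""

    def __init__(self, max_t=100, grace_period=1, reduction_factor=4):
        self.rf = reduction_factor
        # rung milestones grow geometrically: grace_period * rf**k while below max_t
        self.rungs = []
        t = grace_period
        while t < max_t:
            self.rungs.append((t, {}))   # (milestone, {trial_id: loss recorded there})
            t *= reduction_factor
        self.rungs.reverse()             # check the highest milestone reached first

    def on_result(self, trial_id, iteration, loss):
        for milestone, recorded in self.rungs:
            if iteration < milestone or trial_id in recorded:
                continue
            # stop unless the loss is within the best 1/rf of earlier results at this rung
            stop = bool(recorded) and loss > percentile(list(recorded.values()), 100.0 / self.rf)
            recorded[trial_id] = loss
            return "STOP" if stop else "CONTINUE"
        return "CONTINUE"


asha = AshaRungs()
# first report (iteration 1) from four trials, in arrival order
reports = [("a", 0.20), ("b", 0.12), ("c", 0.11), ("d", 0.68)]
decisions = {trial: asha.on_result(trial, 1, loss) for trial, loss in reports}
print(decisions)
# {'a': 'CONTINUE', 'b': 'CONTINUE', 'c': 'CONTINUE', 'd': 'STOP'}
```

`my_xgb` reports a single score only after the full cross-validation. A stop decision at iteration 1 therefore arrives after the work is already done, so in this setup the scheduler likely saves little time. That is probably also why some trials in the tables show `iter` 1 rather than 2. ASHA pays off when a trainable reports intermediate results, for example per boosting round.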


In [19]:
max_depth = analysis_results_df.iloc[0]['config.max_depth']
min_child_weight = analysis_results_df.iloc[0]['config.min_child_weight']
subsample = analysis_results_df.iloc[0]['config.subsample']
colsample_bytree = analysis_results_df.iloc[0]['config.colsample_bytree']
reg_alpha = analysis_results_df.iloc[0]['config.reg_alpha']
reg_lambda = analysis_results_df.iloc[0]['config.reg_lambda']
reg_gamma = analysis_results_df.iloc[0]['config.gamma']
learning_rate = analysis_results_df.iloc[0]['config.learning_rate']
N_ESTIMATORS = analysis_results_df.iloc[0]['config.n_estimators']    


In [20]:
best_config = {
    'max_depth': max_depth,
    'min_child_weight': min_child_weight,
    'subsample': subsample,
    'colsample_bytree': colsample_bytree,
    'reg_alpha': reg_alpha,
    'reg_lambda': reg_lambda,
    'gamma': reg_gamma,
    'learning_rate': learning_rate,
    'n_estimators':  N_ESTIMATORS
}

xgb = XGBRegressor(
    objective='reg:squarederror',
    random_state=RANDOMSTATE,    
    verbosity=1,
    n_jobs=-1,
    **best_config
)
print(xgb)

scores = -cross_val_score(xgb, df[predictors], df[response],
                          scoring="neg_root_mean_squared_error",
                          cv=kfolds)

raw_scores = [cv_to_raw(x) for x in scores]
print()
print("Log1p CV RMSE %.06f (STD %.04f)" % (np.mean(scores), np.std(scores)))
print("Raw CV RMSE %.06f (STD %.04f)" % (np.mean(raw_scores), np.std(raw_scores)))


XGBRegressor(base_score=None, booster=None, colsample_bylevel=None,
             colsample_bynode=None, colsample_bytree=0.25, gamma=0, gpu_id=None,
             importance_type='gain', interaction_constraints=None,
             learning_rate=0.004962862480146764, max_delta_step=None,
             max_depth=3, min_child_weight=1, missing=nan,
             monotone_constraints=None, n_estimators=8397, n_jobs=-1,
             num_parallel_tree=None, random_state=42,
             reg_alpha=0.00033205448063754536, reg_lambda=0.0005251623626252946,
             scale_pos_weight=None, subsample=0.45, tree_method=None,
             validate_parameters=None, verbosity=1)

Log1p CV RMSE 0.105949 (STD 0.0120)
Raw CV RMSE 18607.426140 (STD 2227.8137)
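
The nine assignments and the `best_config` dict are repeated verbatim after each search. Below is a minimal sketch of a hypothetical helper that builds the same dict from the top results row by stripping the `config.` prefix. It uses a plain dict in place of the pandas row. A pandas Series also has `.items()`, so passing `analysis_results_df.iloc[0]` should work unchanged, though that is untested here. The matching HyperOpt scores (0.105949 from tuning and from the re-run) suggest that the logged config values already include the adjustments made inside `my_xgb`. If so, they can be passed straight to `XGBRegressor`.

```python
def best_params_from_row(row, prefix="config."):
    """Collect hyperparameters from a results row whose keys look like 'config.<name>'."""
    params = {}
    for key, value in row.items():
        if key.startswith(prefix):
            params[key[len(prefix):]] = value
    return params


row = {"mse": 0.105949, "date": "2020-10-10_19-41-01",
       "config.n_estimators": 8397, "config.max_depth": 3, "config.learning_rate": 0.004963}
print(best_params_from_row(row))
# {'n_estimators': 8397, 'max_depth': 3, 'learning_rate': 0.004963}
```

With the real results this would become `XGBRegressor(objective='reg:squarederror', **best_params_from_row(analysis_results_df.iloc[0]))`.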


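## BayesOpt

BayesOpt (the `bayes_opt` package, wrapped by Ray Tune's `BayesOptSearch`) fits a Gaussian-process model of the objective and searches only continuous ranges. The quantization in `tune.quniform` is therefore not applied, as the fractional `max_depth` and `min_child_weight` values in the trial table below show. This is why `my_xgb` casts `max_depth` and `n_estimators` to `int`.

In [12]:
# bayesopt
NUM_SAMPLES=128

start_time = datetime.now()
print("%-20s %s" % ("Start Time", start_time))

# "ucb" = upper confidence bound: kappa trades off exploration against exploitation
# ("xi" is only used by the "ei" and "poi" utilities, so it has no effect here)
algo = BayesOptSearch(utility_kwargs={
    "kind": "ucb",
    "kappa": 2.5,
    "xi": 0.0
})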

# uncomment and set max_concurrent to limit number of cores
# algo = ConcurrencyLimiter(algo, max_concurrent=10)
scheduler = AsyncHyperBandScheduler()

tune_kwargs = {
    "num_samples": NUM_SAMPLES,
    "config": {
        "n_estimators": tune.loguniform(100, 10000),
        "max_depth": tune.quniform(0, 6, 1),
        'min_child_weight': tune.quniform(0, 6, 1),
        "subsample": tune.quniform(0.4, 0.9, 0.05),
        "colsample_bytree": tune.quniform(0.05, 0.8, 0.05),
        "reg_alpha": tune.loguniform(1e-04, 1),
        "reg_lambda": tune.loguniform(1e-04, 100),
        "gamma": 0,
        "learning_rate": tune.loguniform(0.001, 0.1),
        "wandb": {
            "project": "iowa",
            "api_key_file": "secrets/wandb.txt",
            "log_config": True
        }    
    }
}

analysis = tune.run(my_xgb,
                    name="xgb_bayesopt",
                    metric="mse",
                    mode="min",
                    search_alg=algo,
                    scheduler=scheduler,
                    verbose=1,
                    loggers=DEFAULT_LOGGERS + (WandbLogger, ),
                    **tune_kwargs)

end_time = datetime.now()
print("%-20s %s" % ("Start Time", start_time))
print("%-20s %s" % ("End Time", end_time))
print(timedelta(seconds=int((end_time - start_time).total_seconds())))   # .seconds would drop whole days

| Trial name | status | colsample_bytree | learning_rate | max_depth | min_child_weight | n_estimators | reg_alpha | reg_lambda | subsample | iter | total time (s) | mse |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| my_xgb_ab5585ca | TERMINATED | 0.330905 | 0.0951207 | 4.39196 | 3.59195 | 1644.58 | 0.156079 | 5.80846 | 0.833088 | 1 | 115.305 | 0.116099 |
| my_xgb_ab5585cb | TERMINATED | 0.500836 | 0.0710992 | 0.123507 | 5.81946 | 8341.18 | 0.212418 | 18.1826 | 0.491702 | 2 | 352.564 | 0.114678 |
| my_xgb_ab5585cc | TERMINATED | 0.278182 | 0.0529509 | 2.59167 | 1.74737 | 6157.34 | 0.13958 | 29.2145 | 0.583181 | 1 | 315.873 | 0.116462 |
| my_xgb_ab5585cd | TERMINATED | 0.392052 | 0.0787324 | 1.19804 | 3.08541 | 5964.9 | 0.0465458 | 60.7545 | 0.485262 | 1 | 304.088 | 0.117542 |
| my_xgb_ab5585ce | TERMINATED | 0.0987887 | 0.0949397 | 5.79379 | 4.85038 | 3115.68 | 0.0977623 | 68.4233 | 0.620076 | 1 | 134.776 | 0.116519 |
| my_xgb_ab5585cf | TERMINATED | 0.141529 | 0.0500225 | 0.206331 | 5.45592 | 2661.92 | 0.662556 | 31.1712 | 0.660034 | 2 | 24.8708 | 0.110889 |
| my_xgb_ab5585d0 | TERMINATED | 0.460033 | 0.0193006 | 5.81751 | 4.6508 | 9401.04 | 0.894838 | 59.79 | 0.860937 | 1 | 527.072 | 0.114999 |
| my_xgb_ab5585d1 | TERMINATED | 0.116369 | 0.0204023 | 0.271364 | 1.95198 | 3947.91 | 0.271422 | 82.8738 | 0.578377 | 1 | 37.0207 | 0.110967 |
| my_xgb_ab5585d2 | TERMINATED | 0.260701 | 0.0547269 | 0.845545 | 4.81318 | 838.051 | 0.986888 | 77.2245 | 0.499358 | 1 | 39.9745 | 0.120372 |
| my_xgb_bd344e2a | TERMINATED | 0.0541416 | 0.0817307 | 4.24114 | 4.37404 | 7735.58 | 0.0741372 | 35.8466 | 0.457935 | 1 | 223.816 | 0.117329 |


2020-10-10 22:25:22,068	INFO tune.py:439 -- Total run time: 5936.10 seconds (5933.64 seconds for the tuning loop).


Start Time           2020-10-10 20:46:25.971966
End Time             2020-10-10 22:25:22.436346
1:38:56
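
With `kind="ucb"`, BayesOpt scores each candidate by an upper confidence bound on its Gaussian-process surrogate: predicted mean + `kappa` × predicted standard deviation. It then evaluates the candidate with the highest score. The library maximizes, so Ray Tune negates the metric when `mode="min"`. A larger `kappa` favors uncertain, unexplored regions. The sketch below uses made-up surrogate predictions, not values from this run, to show how `kappa` changes the choice.

```python
def ucb(mean, std, kappa):
    # upper confidence bound: an optimistic estimate of the (maximized) objective
    return mean + kappa * std


def pick_next(candidates, kappa):
    """Return the name of the candidate with the highest UCB score."""
    return max(candidates, key=lambda name: ucb(*candidates[name], kappa))


# hypothetical GP predictions (mean, std) of the negated CV RMSE at three candidate configs
candidates = {
    "near_best": (-0.106, 0.001),    # close to the best trial, little uncertainty
    "unexplored": (-0.115, 0.006),   # worse expected score, much more uncertain
    "poor": (-0.130, 0.002),
}
print(pick_next(candidates, kappa=0.0))   # near_best  (pure exploitation)
print(pick_next(candidates, kappa=2.5))   # unexplored (the setting used above)
```

With `kappa=2.5` the uncertain candidate scores -0.115 + 2.5 × 0.006 = -0.100. That beats -0.106 + 2.5 × 0.001 = -0.1035, so BayesOpt spends the next trial exploring.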


In [14]:
analysis_results_df = analysis.results_df[['mse', 'date', 'time_this_iter_s',
       'config.n_estimators', 'config.max_depth', 'config.min_child_weight', 'config.subsample',
       'config.colsample_bytree', 'config.reg_alpha', 'config.reg_lambda', 'config.gamma',
       'config.learning_rate']].sort_values('mse')
analysis_results_df

| trial_id | mse | date | time_this_iter_s | config.n_estimators | config.max_depth | config.min_child_weight | config.subsample | config.colsample_bytree | config.reg_alpha | config.reg_lambda | config.gamma | config.learning_rate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| aec612b4 | 0.108979 | 2020-10-10_22-07-09 | 29.829220 | 2458 | 2 | 4.637463 | 0.597422 | 0.111506 | 0.444802 | 5.230673 | 0 | 0.032992 |
| 2d5dfdfa | 0.109755 | 2020-10-10_21-58-48 | 0.080215 | 2462 | 4 | 2.702549 | 0.567215 | 0.196221 | 0.280367 | 12.512252 | 0 | 0.022958 |
| 1ff1d9e6 | 0.110118 | 2020-10-10_21-00-05 | 0.070961 | 2464 | 2 | 0.370562 | 0.791494 | 0.100331 | 0.438020 | 48.311842 | 0 | 0.035826 |
| 4d551230 | 0.110348 | 2020-10-10_21-08-02 | 0.319639 | 3950 | 2 | 0.313258 | 0.535908 | 0.140109 | 0.038970 | 80.603354 | 0 | 0.054104 |
| 6bc0945e | 0.110413 | 2020-10-10_22-01-27 | 17.358009 | 2458 | 2 | 3.268031 | 0.705119 | 0.662432 | 0.220652 | 14.186939 | 0 | 0.031578 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 06ab4e9a | 0.134785 | 2020-10-10_22-10-31 | 105.000010 | 2456 | 3 | 0.194705 | 0.461668 | 0.267900 | 0.090546 | 9.129955 | 0 | 0.002555 |
| 95095af6 | 0.135519 | 2020-10-10_21-05-20 | 270.285912 | 2464 | 6 | 2.365190 | 0.821552 | 0.281026 | 0.123086 | 52.885852 | 0 | 0.002970 |
| 14338c3a | 0.152903 | 2020-10-10_21-37-46 | 291.565789 | 2460 | 4 | 5.259133 | 0.883229 | 0.670976 | 0.486580 | 41.671012 | 0 | 0.002409 |
| 1111bf50 | 0.172168 | 2020-10-10_22-05-15 | 208.060097 | 2466 | 4 | 5.298071 | 0.830726 | 0.210578 | 0.187721 | 10.811405 | 0 | 0.002001 |


In [16]:
max_depth = analysis_results_df.iloc[0]['config.max_depth']
min_child_weight = analysis_results_df.iloc[0]['config.min_child_weight']
subsample = analysis_results_df.iloc[0]['config.subsample']
colsample_bytree = analysis_results_df.iloc[0]['config.colsample_bytree']
reg_alpha = analysis_results_df.iloc[0]['config.reg_alpha']
reg_lambda = analysis_results_df.iloc[0]['config.reg_lambda']
reg_gamma = analysis_results_df.iloc[0]['config.gamma']
learning_rate = analysis_results_df.iloc[0]['config.learning_rate']
N_ESTIMATORS = analysis_results_df.iloc[0]['config.n_estimators']    

best_config = {
    'max_depth': max_depth,
    'min_child_weight': min_child_weight,
    'subsample': subsample,
    'colsample_bytree': colsample_bytree,
    'reg_alpha': reg_alpha,
    'reg_lambda': reg_lambda,
    'gamma': reg_gamma,
    'learning_rate': learning_rate,
    'n_estimators':  N_ESTIMATORS
}

xgb = XGBRegressor(
    objective='reg:squarederror',
    random_state=RANDOMSTATE,    
    verbosity=1,
    n_jobs=-1,
    **best_config
)
print(xgb)

scores = -cross_val_score(xgb, df[predictors], df[response],
                          scoring="neg_root_mean_squared_error",
                          cv=kfolds)

raw_scores = [cv_to_raw(x) for x in scores]
print()
print("Log1p CV RMSE %.06f (STD %.04f)" % (np.mean(scores), np.std(scores)))
print("Raw CV RMSE %.06f (STD %.04f)" % (np.mean(raw_scores), np.std(raw_scores)))


XGBRegressor(base_score=None, booster=None, colsample_bylevel=None,
             colsample_bynode=None, colsample_bytree=0.15000000000000002,
             gamma=0, gpu_id=None, importance_type='gain',
             interaction_constraints=None, learning_rate=0.009178764335388945,
             max_delta_step=None, max_depth=3, min_child_weight=2.0,
             missing=nan, monotone_constraints=None, n_estimators=6274,
             n_jobs=-1, num_parallel_tree=None, random_state=42,
             reg_alpha=0.004690341140695094, reg_lambda=0.07189641064457597,
             scale_pos_weight=None, subsample=0.75, tree_method=None,
             validate_parameters=None, verbosity=1)

Log1p CV RMSE 0.105404 (STD 0.0122)
Raw CV RMSE 18506.866243 (STD 2247.7111)


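## Optuna

With no sampler specified, Ray Tune's `OptunaSearch` uses Optuna's default sampler. Like HyperOpt, this is a Tree-structured Parzen Estimator (TPE).

In [12]:
# optuna
NUM_SAMPLES=128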
optuna_xgb = my_xgb

start_time = datetime.now()
print("%-20s %s" % ("Start Time", start_time))

algo = OptunaSearch()
# uncomment and set max_concurrent to limit number of cores
# algo = ConcurrencyLimiter(algo, max_concurrent=10)
scheduler = AsyncHyperBandScheduler()

tune_kwargs = {
    "num_samples": NUM_SAMPLES,
    "config": {
        "n_estimators": tune.loguniform(100, 10000),
        "max_depth": tune.quniform(0, 6, 1),
        'min_child_weight': tune.quniform(0, 6, 1),
        "subsample": tune.quniform(0.4, 0.9, 0.05),
        "colsample_bytree": tune.quniform(0.05, 0.8, 0.05),
        "reg_alpha": tune.loguniform(1e-04, 1),
        "reg_lambda": tune.loguniform(1e-04, 100),
        "gamma": 0,
        "learning_rate": tune.loguniform(0.001, 0.1),
        "wandb": {
            "project": "iowa",
            "api_key_file": "secrets/wandb.txt",
            "log_config": True,
            "name": get_random_tag(6)
        }           
    }
}

analysis = tune.run(optuna_xgb,
                    name="xgb_optuna",
                    metric="mse",
                    mode="min",
                    search_alg=algo,
                    scheduler=scheduler,
                    verbose=1,
                    loggers=DEFAULT_LOGGERS + (WandbLogger, ),                    
                    **tune_kwargs)

end_time = datetime.now()
print("%-20s %s" % ("Start Time", start_time))
print("%-20s %s" % ("End Time", end_time))
print(timedelta(seconds=int((end_time - start_time).total_seconds())))   # .seconds would drop whole days

| Trial name | status | colsample_bytree | learning_rate | max_depth | min_child_weight | n_estimators | reg_alpha | reg_lambda | subsample | iter | total time (s) | mse |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| my_xgb_930d3bd4 | TERMINATED | 0.15 | 0.0314171 | 6 | 5 | 6807.39 | 0.00575988 | 27.0103 | 0.9 | 2 | 128.999 | 0.11142 |
| my_xgb_930d3bd5 | TERMINATED | 0.15 | 0.0042362 | 3 | 5 | 167.418 | 0.0812339 | 0.00332861 | 0.5 | 2 | 3.27641 | 5.67701 |
| my_xgb_930d3bd6 | TERMINATED | 0.3 | 0.0104766 | 0 | 1 | 607.451 | 0.210119 | 0.0102476 | 0.9 | 2 | 10.3329 | 0.123781 |
| my_xgb_930d3bd7 | TERMINATED | 0.3 | 0.0238368 | 2 | 2 | 170.764 | 0.00602881 | 0.561848 | 0.4 | 2 | 6.33881 | 0.235453 |
| my_xgb_930d3bd8 | TERMINATED | 0.25 | 0.00305347 | 1 | 0 | 3817.32 | 0.00665748 | 0.0544615 | 0.65 | 2 | 60.2732 | 0.110122 |
| my_xgb_930d3bd9 | TERMINATED | 0.45 | 0.00870765 | 5 | 5 | 1147.77 | 0.0457967 | 0.0429545 | 0.9 | 2 | 50.9841 | 0.110807 |
| my_xgb_930d3bda | TERMINATED | 0.55 | 0.00203052 | 5 | 1 | 2174.38 | 0.050622 | 1.31777 | 0.65 | 1 | 75.8827 | 0.195515 |
| my_xgb_930d3bdb | TERMINATED | 0.65 | 0.00104443 | 5 | 3 | 6833.58 | 0.557335 | 0.000138857 | 0.65 | 1 | 890.088 | 0.116843 |
| my_xgb_930d3bdc | TERMINATED | 0.05 | 0.0178592 | 0 | 0 | 5758.97 | 0.999132 | 72.5317 | 0.55 | 2 | 40.0678 | 0.111974 |
| my_xgb_97ad1042 | TERMINATED | 0.1 | 0.00106316 | 6 | 4 | 701.528 | 0.00424684 | 96.2884 | 0.85 | 1 | 4.42204 | 5.81415 |


2020-10-11 03:57:17,336	INFO tune.py:439 -- Total run time: 10859.68 seconds (10857.27 seconds for the tuning loop).


Start Time           2020-10-11 00:56:17.650110
End Time             2020-10-11 03:57:17.752208
3:01:00


In [13]:
analysis_results_df = analysis.results_df[['mse', 'date', 'time_this_iter_s',
       'config.n_estimators', 'config.max_depth', 'config.min_child_weight', 'config.subsample',
       'config.colsample_bytree', 'config.reg_alpha', 'config.reg_lambda', 'config.gamma',
       'config.learning_rate']].sort_values('mse')
analysis_results_df

| trial_id | mse | date | time_this_iter_s | config.n_estimators | config.max_depth | config.min_child_weight | config.subsample | config.colsample_bytree | config.reg_alpha | config.reg_lambda | config.gamma | config.learning_rate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| de28909c | 0.105995 | 2020-10-11_03-54-01 | 0.010888 | 4675 | 4 | 4.0 | 0.50 | 0.10 | 0.000161 | 0.148360 | 0 | 0.013047 |
| e2625d1e | 0.106060 | 2020-10-11_03-55-28 | 2.107011 | 5010 | 4 | 4.0 | 0.60 | 0.10 | 0.000153 | 0.145994 | 0 | 0.009320 |
| baafac72 | 0.106665 | 2020-10-11_03-53-40 | 0.044884 | 6156 | 3 | 5.0 | 0.50 | 0.10 | 0.046471 | 0.075936 | 0 | 0.006274 |
| 5e9b6ff8 | 0.106832 | 2020-10-11_03-42-55 | 0.069611 | 2996 | 4 | 4.0 | 0.55 | 0.15 | 0.000119 | 0.464172 | 0 | 0.014943 |
| e04d44d0 | 0.106884 | 2020-10-11_03-55-25 | 0.004738 | 6034 | 4 | 4.0 | 0.50 | 0.15 | 0.000149 | 0.143899 | 0 | 0.007961 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 99bc355c | 3.337908 | 2020-10-11_00-56-52 | 17.239963 | 754 | 4 | 2.0 | 0.50 | 0.35 | 0.079077 | 0.007286 | 0 | 0.001644 |
| a8e0abb2 | 4.166211 | 2020-10-11_00-57-04 | 3.944388 | 221 | 3 | 0.0 | 0.75 | 0.30 | 0.004161 | 0.024865 | 0 | 0.004599 |
| 930d3bd5 | 5.677007 | 2020-10-11_00-56-24 | 0.009199 | 167 | 5 | 5.0 | 0.50 | 0.15 | 0.081234 | 0.003329 | 0 | 0.004236 |
| 97ad1042 | 5.814153 | 2020-10-11_00-56-35 | 4.422036 | 701 | 8 | 4.0 | 0.85 | 0.10 | 0.004247 | 96.288407 | 0 | 0.001063 |


In [14]:
max_depth = analysis_results_df.iloc[0]['config.max_depth']
min_child_weight = analysis_results_df.iloc[0]['config.min_child_weight']
subsample = analysis_results_df.iloc[0]['config.subsample']
colsample_bytree = analysis_results_df.iloc[0]['config.colsample_bytree']
reg_alpha = analysis_results_df.iloc[0]['config.reg_alpha']
reg_lambda = analysis_results_df.iloc[0]['config.reg_lambda']
reg_gamma = analysis_results_df.iloc[0]['config.gamma']
learning_rate = analysis_results_df.iloc[0]['config.learning_rate']
N_ESTIMATORS = analysis_results_df.iloc[0]['config.n_estimators']    

best_config = {
    'max_depth': max_depth,
    'min_child_weight': min_child_weight,
    'subsample': subsample,
    'colsample_bytree': colsample_bytree,
    'reg_alpha': reg_alpha,
    'reg_lambda': reg_lambda,
    'gamma': reg_gamma,
    'learning_rate': learning_rate,
    'n_estimators':  N_ESTIMATORS
}

xgb = XGBRegressor(
    objective='reg:squarederror',
    random_state=RANDOMSTATE,    
    verbosity=1,
    n_jobs=-1,
    **best_config
)
print(xgb)

scores = -cross_val_score(xgb, df[predictors], df[response],
                          scoring="neg_root_mean_squared_error",
                          cv=kfolds)

raw_scores = [cv_to_raw(x) for x in scores]
print()
print("Log1p CV RMSE %.06f (STD %.04f)" % (np.mean(scores), np.std(scores)))
print("Raw CV RMSE %.06f (STD %.04f)" % (np.mean(raw_scores), np.std(raw_scores)))

XGBRegressor(base_score=None, booster=None, colsample_bylevel=None,
             colsample_bynode=None, colsample_bytree=0.1, gamma=0, gpu_id=None,
             importance_type='gain', interaction_constraints=None,
             learning_rate=0.01304679013796074, max_delta_step=None,
             max_depth=4, min_child_weight=4.0, missing=nan,
             monotone_constraints=None, n_estimators=4675, n_jobs=-1,
             num_parallel_tree=None, random_state=42,
             reg_alpha=0.00016078583273545032, reg_lambda=0.14836039021092462,
             scale_pos_weight=None, subsample=0.5, tree_method=None,
             validate_parameters=None, verbosity=1)

Log1p CV RMSE 0.105995 (STD 0.0133)
Raw CV RMSE 18618.765060 (STD 2454.0224)
