# Hyperparameter tuning with XGBoost, Ray Tune, Hyperopt and BayesOpt

Finding the best hyperparameters for complex modern machine learning algorithms is time-consuming. XGBoost, LightGBM, and neural networks have so many tuning parameters and combinations that a fine-grained grid search may be infeasible.

Fortunately, modern hyperparameter tuning libraries such as HyperOpt and Optuna can run many trials concurrently on a single machine or on a cluster, which saves time and often yields better hyperparameters. This post demonstrates how to speed up hyperparameter tuning with Ray Tune, HyperOpt and BayesOpt on a cluster.

I will use the [Housing Prices Competition for Kaggle Learn Users](https://www.kaggle.com/c/house-prices-advanced-regression-techniques). The response we are predicting is the log-transformed SalePrice, based on house features such as square footage, neighborhood, amenities like a pool, and condition. I already did [some feature engineering and feature selection](https://github.com/druce/iowa), and my submission was in the top 5% when I submitted it in 2019.

Outline:
- Baseline linear regression with no hyperparameters
- ElasticNet with L1 and L2 regularization using ElasticNetCV hyperparameter optimization
- ElasticNet with GridSearchCV hyperparameter optimization
- XGBoost with sequential grid search over hyperparameter subsets with early stopping 
- XGBoost with Ray, HyperOpt and BayesOpt search algorithms
- Accelerate advanced algorithms with a Ray cluster


| ML Algo           | Hyperparameter search algo   | CV Error (RMSE in $)  | Time     |
|-------------------|------------------------------|-----------------------|----------|
| Linear Regression | None                         | $18192                |   0:01s  |
| ElasticNet        | ElasticNetCV (Grid Search)   | $18122                |   0:02s  |
| ElasticNet        | GridSearchCV                 | $18061                |   0:05s  |
| XGB               | Sequential Grid Search       | $18783                |   36:09  |
| XGB               | HyperOpt (128 samples)       | $18808                |   21:41  |
| XGB               | BayesOpt                     | $18506                | 1:15:04  |
| XGB               | Optuna                       | $18618                |          |

In [1]:
from itertools import product
from datetime import datetime, timedelta
import os
import random
import string

import numpy as np
import pandas as pd

import sklearn
from sklearn.linear_model import LinearRegression, ElasticNet, ElasticNetCV, Ridge, RidgeCV
from sklearn.model_selection import train_test_split, cross_val_score, cross_val_predict, GridSearchCV, KFold
from sklearn.preprocessing import RobustScaler
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.pipeline import make_pipeline

#!conda install -y -c conda-forge  xgboost 
import xgboost
from xgboost import XGBRegressor
from xgboost import plot_importance

import ray
from ray import tune
from ray.tune.suggest import ConcurrencyLimiter
from ray.tune.schedulers import AsyncHyperBandScheduler
from ray.tune.suggest.bayesopt import BayesOptSearch
from ray.tune.suggest.hyperopt import HyperOptSearch
from ray.tune.suggest.optuna import OptunaSearch
from ray.tune.logger import DEFAULT_LOGGERS
from ray.tune.integration.wandb import WandbLogger

# pip install hyperopt
# pip install optuna

import wandb
os.environ['WANDB_NOTEBOOK_NAME'] = 'hyperparameter_optimization.ipynb'

print(datetime.now())

print ("%-20s %s"% ("numpy", np.__version__))
print ("%-20s %s"% ("pandas", pd.__version__))
print ("%-20s %s"% ("sklearn", sklearn.__version__))
print ("%-20s %s"% ("xgboost", xgboost.__version__))
print ("%-20s %s"% ("ray", ray.__version__))


2020-10-11 05:27:09.616465
numpy                1.19.1
pandas               1.1.3
sklearn              0.23.2
xgboost              1.2.0
ray                  1.1.0.dev0


In [2]:
# set seed for reproducibility
RANDOMSTATE = 42
np.random.seed(RANDOMSTATE)


In [3]:
def get_random_tag(length):
    """random tag for experiments"""
    letters_and_digits = string.ascii_letters + string.digits
    result_str = ''.join((random.choice(letters_and_digits) for i in range(length)))
    return result_str.upper()

get_random_tag(8)

'5R6LESKK'

In [4]:
# import train data
df = pd.read_pickle('x4.pickle')

response = 'SalePrice'
predictors = ['YearBuilt',
              'BsmtFullBath',
              'FullBath',
              'KitchenAbvGr',
              'GarageYrBlt',
              'LotFrontage',
              'MasVnrArea',
              '1stFlrSF',
              'GrLivArea',
              'GarageArea',
              'WoodDeckSF',
              'PorchSF',
              'AvgBltRemod',
              'FireBathRatio',
              'TotalSF x OverallQual x OverallCond',
              'AvgBltRemod x Functional x TotalFinSF',
              'Functional x OverallQual',
              'KitchenAbvGr x KitchenQual',
              'GarageCars x GarageYrBlt',
              'GarageQual x GarageCond x GarageCars',
              'HeatingQC x Heating',
              'monthnum',
              'log_YearBuilt',
              'log_LotArea',
              'log_TotalFinSF',
              'log_GarageRatio',
              'log_TotalSF x OverallQual x OverallCond',
              'log_TotalSF x OverallCond',
              'log_AvgBltRemod x TotalFinSF',
              'sq_2ndFlrSF',
              'sq_BsmtFinSF',
              'sq_BsmtFinSF x BsmtQual',
              'sq_BsmtFinSF x BsmtBath',
              'BldgType_4',
              'BsmtExposure_1',
              'BsmtExposure_4',
              'BsmtFinType1_1',
              'BsmtFinType1_2',
              'BsmtFinType1_4',
              'BsmtFinType1_5',
              'BsmtFinType1_6',
              'CentralAir_0',
              'CentralAir_1',
              'Condition1_1',
              'Condition1_3',
              'ExterCond_2',
              'ExterQual_2',
              'Exterior1st_4',
              'Exterior1st_5',
              'Exterior1st_10',
              'Fence_0',
              'Fence_2',
              'Foundation_1',
              'Foundation_5',
              'GarageCars_1',
              'GarageFinish_2',
              'GarageFinish_3',
              'GarageType_2',
              'HouseStyle_2',
              'KitchenQual_4',
              'LotConfig_0',
              'LotConfig_4',
              'MSSubClass_30',
              'MSSubClass_70',
              'MSZoning_0',
              'MSZoning_1',
              'MSZoning_4',
              'MasVnrType_2',
              'MasVnrType_3',
              'MoSold_1',
              'MoSold_5',
              'MoSold_6',
              'MoSold_11',
              'Neighborhood_3',
              'Neighborhood_4',
              'Neighborhood_5',
              'Neighborhood_10',
              'Neighborhood_11',
              'Neighborhood_16',
              'Neighborhood_17',
              'Neighborhood_19',
              'Neighborhood_22',
              'Neighborhood_24',
              'OverallCond_7',
              'OverallQual_5',
              'OverallQual_6',
              'OverallQual_7',
              'OverallQual_9',
              'PavedDrive_0',
              'PavedDrive_2',
              'SaleCondition_1',
              'SaleCondition_2',
              'SaleCondition_5',
              'SaleType_4',
              'BedroomAbvGr_1',
              'BedroomAbvGr_4',
              'BedroomAbvGr_5',
              'HalfBath_1',
              'TotalBath_1.0',
              'TotalBath_2.5']

X_train, X_test, y_train, y_test = train_test_split(df, df[response], test_size=.25)

display(df[predictors].head())
display(df[[response]].head())


Id,YearBuilt,BsmtFullBath,FullBath,KitchenAbvGr,GarageYrBlt,LotFrontage,MasVnrArea,1stFlrSF,GrLivArea,GarageArea,...,SaleCondition_1,SaleCondition_2,SaleCondition_5,SaleType_4,BedroomAbvGr_1,BedroomAbvGr_4,BedroomAbvGr_5,HalfBath_1,TotalBath_1.0,TotalBath_2.5
1,7,1,2,1,7,65.0,196.0,856,1710,548.0,...,0,0,0,1,0,0,0,1,0,0
2,34,0,2,1,34,80.0,0.0,1262,1262,460.0,...,0,0,0,1,0,0,0,0,0,1
3,9,1,2,1,9,68.0,162.0,920,1786,608.0,...,0,0,0,1,0,0,0,1,0,0
4,95,1,1,1,12,60.0,0.0,961,1717,642.0,...,1,0,0,1,0,0,0,0,0,0
5,10,1,2,1,10,84.0,350.0,1145,2198,836.0,...,0,0,0,1,0,1,0,1,0,0


Id,SalePrice
1,12.247699
2,12.109016
3,12.317171
4,11.849405
5,12.42922


In [5]:
# we are training on a response which is the log of 1 + the sale price
# transform prediction back to original basis with expm1 and evaluate vs. original

def evaluate(y_train, y_pred_train, y_test, y_pred_test):
    """evaluate in train_test split"""
    print('Train RMSE', np.sqrt(mean_squared_error(np.expm1(y_train), np.expm1(y_pred_train))))
    print('Train R-squared', r2_score(np.expm1(y_train), np.expm1(y_pred_train)))
    print('Train MAE', mean_absolute_error(np.expm1(y_train), np.expm1(y_pred_train)))
    print()
    print('Test RMSE', np.sqrt(mean_squared_error(np.expm1(y_test), np.expm1(y_pred_test))))
    print('Test R-squared', r2_score(np.expm1(y_test), np.expm1(y_pred_test)))
    print('Test MAE', mean_absolute_error(np.expm1(y_test), np.expm1(y_pred_test)))

MEAN_RESPONSE=df[response].mean()
def cv_to_raw(cv_val):
    """convert log1p rmse to underlying SalePrice error"""
    return np.expm1(MEAN_RESPONSE+cv_val) - np.expm1(MEAN_RESPONSE)
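
The conversion in `cv_to_raw` can be read as "typical price times relative error", because `expm1(m + e) - expm1(m)` equals `exp(m) * (exp(e) - 1)`. The sketch below illustrates this with the standard library only. The mean response of 12.0 is a made-up value for illustration; the notebook computes `MEAN_RESPONSE` from the data.

```python
import math

# Hypothetical mean of the log1p-transformed response (illustration only;
# the notebook derives MEAN_RESPONSE from df[response].mean()).
MEAN_RESPONSE = 12.0

def cv_to_raw(cv_val, mean_response=MEAN_RESPONSE):
    """Dollar error implied by a log1p-scale error of cv_val at the mean response."""
    return math.expm1(mean_response + cv_val) - math.expm1(mean_response)

# expm1(m + e) - expm1(m) == exp(m) * (exp(e) - 1): the dollar error is the
# typical price scaled by the relative error exp(e) - 1.
for err in (0.05, 0.10, 0.15):
    relative = math.exp(err) - 1
    print(f"log1p RMSE {err:.2f} -> about {relative:.1%} of price, ${cv_to_raw(err):,.0f}")
```

Because the mapping is anchored at the mean response, the dollar figures are an approximation of the error for a typical house, not an exact RMSE in dollars.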

In [6]:
# always use same k-folds for reproducibility
kfolds = KFold(n_splits=10, shuffle=True, random_state=RANDOMSTATE)


## Baseline linear regression
- Raw CV RMSE 18191.9791
- Wall time 2.81 s

In [7]:
%%time
# Baseline: ordinary least squares, no hyperparameters to tune
print("LinearRegression")

print(len(predictors), "predictors")

lr = LinearRegression()

#train and evaluate in train/test split
lr.fit(X_train[predictors], y_train)

y_pred_train = lr.predict(X_train[predictors])
y_pred_test = lr.predict(X_test[predictors])
evaluate(y_train, y_pred_train, y_test, y_pred_test)

# evaluate using kfolds, same process as train/test split but average results over 10 folds
# more sample-efficient, less CPU-efficient

scores = -cross_val_score(lr, df[predictors], df[response],
                          scoring="neg_root_mean_squared_error",
                          cv=kfolds,
                          n_jobs=-1)
raw_scores = [cv_to_raw(x) for x in scores]
print()
print("Log1p CV RMSE %.04f (STD %.04f)" % (np.mean(scores), np.std(scores)))
print("Raw CV RMSE %.04f (STD %.04f)" % (np.mean(raw_scores), np.std(raw_scores)))


LinearRegression
100 predictors
Train RMSE 16551.16362069584
Train R-squared 0.955449013087463
Train MAE 10998.860387471454

Test RMSE 17846.171920180186
Test R-squared 0.936430176939307
Test MAE 12755.395673705598

Log1p CV RMSE 0.1037 (STD 0.0099)
Raw CV RMSE 18191.9791 (STD 1838.6678)
CPU times: user 255 ms, sys: 280 ms, total: 535 ms
Wall time: 921 ms


## Native Sklearn xxxCV
- LogisticRegressionCV, LassoCV, RidgeCV, ElasticNetCV, etc.
- Test many hyperparameters in parallel with multithreading
- Note improvement vs. LinearRegression due to controlling overfitting
- RMSE $18122
- Time 2s
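
For reference, this is the objective that ElasticNet minimizes as given in the scikit-learn documentation. `alpha` and `l1_ratio` are the two hyperparameters searched in this section; the symbols $n$ (number of samples), $X$, $y$ and $w$ (coefficients) are notation chosen here.

```latex
\min_{w} \; \frac{1}{2n} \lVert y - Xw \rVert_2^2
  + \alpha \, \rho \, \lVert w \rVert_1
  + \frac{\alpha (1 - \rho)}{2} \lVert w \rVert_2^2,
\qquad \rho = \texttt{l1\_ratio}
```

With $\rho$ close to 0 the penalty is almost purely L2 (ridge-like); with $\rho$ close to 1 it is almost purely L1 (lasso-like).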


In [8]:
%%time
# Tune elasticnet search space for alphas and L1_ratio
# predictor selection used to create the training set used lasso
# so l1 parameter is close to 0
# could use ridge (eg elasticnet with 0 L1 regularization)
# but then only 1 param, more general and useful to do this with elasticnet
print("ElasticnetCV")

# make pipeline
# with regularization must scale predictors
elasticnetcv = make_pipeline(RobustScaler(),
                             ElasticNetCV(max_iter=100000, 
                                          l1_ratio=[0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99],
                                          alphas=np.logspace(-4, -2, 9),
                                          cv=kfolds,
                                          n_jobs=-1,
                                          verbose=1,
                                         ))

#train and evaluate in train/test split
elasticnetcv.fit(X_train[predictors], y_train)

y_pred_train = elasticnetcv.predict(X_train[predictors])
y_pred_test = elasticnetcv.predict(X_test[predictors])
evaluate(y_train, y_pred_train, y_test, y_pred_test)
l1_ratio = elasticnetcv._final_estimator.l1_ratio_
alpha = elasticnetcv._final_estimator.alpha_
print('l1_ratio', l1_ratio)
print('alpha', alpha)

# evaluate using kfolds on full dataset
# I don't see API to get CV error from elasticnetcv, so we use cross_val_score
elasticnet = ElasticNet(alpha=alpha,
                        l1_ratio=l1_ratio,
                        max_iter=10000)

scores = -cross_val_score(elasticnet, df[predictors], df[response],
                          scoring="neg_root_mean_squared_error",
                          cv=kfolds,
                          n_jobs=-1)
raw_scores = [cv_to_raw(x) for x in scores]
print()
print("Log1p CV RMSE %.04f (STD %.04f)" % (np.mean(scores), np.std(scores)))
print("Raw CV RMSE %.04f (STD %.04f)" % (np.mean(raw_scores), np.std(raw_scores)))


ElasticnetCV


[Parallel(n_jobs=-1)]: Using backend ThreadingBackend with 8 concurrent workers.
........................................................................................................................................................................................................................................................................................................................................[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    0.4s
.................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................

Train RMSE 16782.099077546347
Train R-squared 0.9541971157728129
Train MAE 11025.896226946621

Test RMSE 17457.333905669526
Test R-squared 0.9391701570474736
Test MAE 12389.820674779758
l1_ratio 0.01
alpha 0.005623413251903491

Log1p CV RMSE 0.1033 (STD 0.0112)
Raw CV RMSE 18122.0127 (STD 2074.9545)
CPU times: user 6.22 s, sys: 4.75 s, total: 11 s
Wall time: 1.66 s


## GridSearchCV
- Useful for algos with no native multithreaded xxxCV
- Test many hyperparameter combinations in parallel (joblib worker processes in the run below)
- Similar but not identical result vs. ElasticNetCV; why they differ needs more investigation
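
One possible contributor to small gaps like this (an assumption, not verified against this run): the GridSearchCV cell is scored with `neg_mean_squared_error`, so its best score is an average of per-fold MSEs, while `cross_val_score` here averages per-fold RMSEs. The square root of a mean is at least the mean of the square roots, so the two summaries differ unless every fold has the same error. A minimal sketch with made-up fold errors:

```python
import math

# Hypothetical per-fold mean squared errors, for illustration only.
fold_mse = [0.008, 0.010, 0.012, 0.015, 0.009]

# What averaging per-fold RMSE (scoring="neg_root_mean_squared_error") gives:
mean_of_rmse = sum(math.sqrt(m) for m in fold_mse) / len(fold_mse)

# What taking the square root of an averaged MSE score gives:
rmse_of_mean = math.sqrt(sum(fold_mse) / len(fold_mse))

print(f"mean of per-fold RMSE: {mean_of_rmse:.6f}")
print(f"sqrt of mean MSE:      {rmse_of_mean:.6f}")
```

A second difference worth checking is scaling: the grid search runs inside a pipeline with `RobustScaler`, whereas the follow-up `cross_val_score` call fits `ElasticNet` on the unscaled predictors.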


In [9]:
%%time
gs = make_pipeline(RobustScaler(),
                   GridSearchCV(ElasticNet(max_iter=100000),
                                param_grid={'l1_ratio': [0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99],
                                            'alpha': np.logspace(-4, -2, 9),
                                           },
                                scoring='neg_mean_squared_error',
                                refit=True,
                                cv=kfolds,
                                n_jobs=-1,
                                verbose=1
                               ))

# do cv using kfolds on full dataset
print("\nCV on full dataset")
gs.fit(df[predictors], df[response])
print('best params', gs._final_estimator.best_params_)
print('best score', -gs._final_estimator.best_score_)
l1_ratio = gs._final_estimator.best_params_['l1_ratio']
alpha = gs._final_estimator.best_params_['alpha']

elasticnet = ElasticNet(alpha=alpha,
                        l1_ratio=l1_ratio,
                        max_iter=100000)
print(elasticnet)

scores = -cross_val_score(elasticnet, df[predictors], df[response],
                          scoring="neg_root_mean_squared_error",
                          cv=kfolds,
                          n_jobs=-1)
raw_scores = [cv_to_raw(x) for x in scores]
print()
print("Log1p CV RMSE %.06f (STD %.04f)" % (np.mean(scores), np.std(scores)))
print("Raw CV RMSE %.06f (STD %.04f)" % (np.mean(raw_scores), np.std(raw_scores)))

# difference in average CV scores reported by GridSearchCV and cross_val_score
# with same alpha, l1_ratio, kfolds
# one reason could be that we used simple average, GridSearchCV is weighted by # of samples per fold?
nsamples = [len(z[1]) for z in kfolds.split(df)]
print("weighted average %.06f" % np.average(scores, weights=nsamples))
# not sure why; ElasticNetCV also reports fewer fits and takes less time



CV on full dataset
Fitting 10 folds for each of 117 candidates, totalling 1170 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  56 tasks      | elapsed:    0.4s
[Parallel(n_jobs=-1)]: Done 1123 tasks      | elapsed:    3.8s
[Parallel(n_jobs=-1)]: Done 1170 out of 1170 | elapsed:    3.9s finished


best params {'alpha': 0.0031622776601683794, 'l1_ratio': 0.01}
best score 0.010637685614240399
ElasticNet(alpha=0.0031622776601683794, l1_ratio=0.01, max_iter=100000)

Log1p CV RMSE 0.103003 (STD 0.0109)
Raw CV RMSE 18060.902698 (STD 2008.2407)
weighted average 0.103023
CPU times: user 963 ms, sys: 424 ms, total: 1.39 s
Wall time: 4.42 s


In [10]:
# roll-our-own CV 
# matches cross_val_score
alpha = 0.0031622776601683794
l1_ratio = 0.01
regressor = ElasticNet(alpha=alpha,
                       l1_ratio=l1_ratio,
                       max_iter=10000)
print(regressor)
cverrors = []
for train_fold, cv_fold in kfolds.split(df): 
    fold_X_train=df[predictors].values[train_fold]
    fold_y_train=df[response].values[train_fold]
    fold_X_test=df[predictors].values[cv_fold]
    fold_y_test=df[response].values[cv_fold]
    regressor.fit(fold_X_train, fold_y_train)
    y_pred_test=regressor.predict(fold_X_test)
    cverrors.append(np.sqrt(mean_squared_error(fold_y_test, y_pred_test)))
    
print("%.06f" % np.average(cverrors))
    

ElasticNet(alpha=0.0031622776601683794, l1_ratio=0.01, max_iter=10000)
0.103003


## XGBoost CV 
- XGBoost has native multithreading, CV
- XGBoost has many tuning parameters so a complete grid search has an unreasonable number of combinations
- We tune reduced sets sequentially and use early stopping. 
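
To make the size of the problem concrete, the sketch below counts candidates using the grid sizes from the search cell in this section (with the refined round 2 grid). It only does arithmetic on those sizes and does not train anything.

```python
from math import prod

# Grid sizes per tuning round, taken from the search cell in this section
# (round 2 uses the refined grid).
round_sizes = {
    "max_depth x min_child_weight": 4 * 4,
    "subsample x colsample_bytree": 9 * 5,
    "reg_alpha x reg_lambda x gamma": 3 * 4 * 1,
    "learning_rate": 5,
}

full_grid = prod(round_sizes.values())   # every combination searched at once
sequential = sum(round_sizes.values())   # one group of parameters at a time

print(f"full grid:  {full_grid:,} candidates, {full_grid * 10:,} fits with 10-fold CV")
print(f"sequential: {sequential:,} candidates, {sequential * 10:,} fits with 10-fold CV")
```

A full grid over these values would need 43,200 candidates (432,000 fits with 10 folds), while tuning the four groups one after another needs 78 candidates (780 fits).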

### Tuning methodology
- Set an initial set of starting parameters
- Do 10-fold CV
- Use early stopping to halt training in each fold if no improvement after eg 100 rounds, pick hyperparameters to minimize average error over kfolds
- Tune sequentially on groups of hyperparameters that don't interact too much between groups to reduce combinations
- Tune max_depth and min_child_weight 
- Tune subsample and colsample_bytree
- Tune alpha, lambda and gamma (regularization)
- Tune learning rate: lower learning rate will need more rounds/n_estimators
- Retrain on full dataset with best learning rate and best n_estimators (average stopping point over kfolds)
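
The round-by-round procedure above can be sketched as a small loop. This is a minimal illustration, not the notebook's implementation: `toy_cv_error` is a hypothetical stand-in for the cross-validated RMSE, and `sequential_grid_search` is a name made up for this example.

```python
from itertools import product

def toy_cv_error(params):
    """Hypothetical stand-in for a cross-validated error; lower is better."""
    return ((params["max_depth"] - 3) ** 2
            + (params["min_child_weight"] - 2) ** 2
            + (params["subsample"] - 0.6) ** 2
            + (params["learning_rate"] - 0.03) ** 2)

def sequential_grid_search(start, rounds, score):
    """Tune one group of parameters per round, holding all the others fixed."""
    best = dict(start)
    for grid in rounds:
        names = list(grid)
        candidates = []
        for values in product(*(grid[name] for name in names)):
            trial = {**best, **dict(zip(names, values))}
            candidates.append((score(trial), trial))
        # keep this round's best candidate as the starting point for the next round
        best = min(candidates, key=lambda c: c[0])[1]
    return best

start = {"max_depth": 5, "min_child_weight": 5, "subsample": 0.5, "learning_rate": 0.01}
rounds = [
    {"max_depth": [1, 2, 3, 4], "min_child_weight": [1, 2, 3, 4]},
    {"subsample": [0.4, 0.6, 0.8]},
    {"learning_rate": [0.1, 0.03, 0.01]},
]
best = sequential_grid_search(start, rounds, toy_cv_error)
print(best)
```

The toy objective has no interactions between groups, so the sequential search finds its optimum. With a real model, parameters in different groups can interact, and tuning them in sequence may miss combinations a joint search would find.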

### Notes
- It doesn't seem possible to combine XGBoost early stopping with GridSearchCV: GridSearchCV doesn't pass each fold's held-out data to XGBoost as the `eval_set` that early stopping needs
- 2 alternative approaches 
    - use the native `xgboost.cv`, which understands early stopping but doesn't use the sklearn API (it takes a DMatrix, not a NumPy array or DataFrame)
    - use the sklearn API and roll our own grid search instead of GridSearchCV (used below)
- XGBoost native terminology differs from sklearn
    - num_boost_round = n_estimators
    - eta = learning_rate
    - eta = learning_rate
- parameter reference: https://xgboost.readthedocs.io/en/latest/parameter.html
- training reference: https://xgboost.readthedocs.io/en/latest/python/python_api.html#module-xgboost.training
- times are wall times on an Amazon t2.2xlarge instance
- to set up environment:
    - `conda create --name hyperparam python=3.8`
    - `conda activate hyperparam`
    - `conda install jupyter`
    - `pip install -r requirements.txt`
- round 1 Wall time: 6min 23s
- round 2 Wall time: 19min 22s
- round 3 Wall time: 5min 30s
- round 4 Wall time: 4min 54s
- total time 36:09
- RMSE 18783.117031
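
Since early stopping drives both the number of rounds and the reported error, here is a simplified sketch of a patience-based stopping rule. It is a pure-Python illustration of the idea, not XGBoost's code; the function name and the validation curve are made up.

```python
def early_stopping_round(val_errors, patience):
    """Return (best_round, best_error) the way a patience-based stopper would.

    val_errors: validation error after each boosting round, in order.
    patience:   stop once this many consecutive rounds fail to improve.
    """
    best_round, best_error = 0, float("inf")
    for rnd, err in enumerate(val_errors):
        if err < best_error:
            best_round, best_error = rnd, err
        elif rnd - best_round >= patience:
            break  # no improvement for `patience` rounds, stop training
    return best_round, best_error

# Hypothetical validation curve: improves, then starts to overfit.
curve = [0.30, 0.20, 0.15, 0.12, 0.11, 0.115, 0.12, 0.13, 0.10]
print(early_stopping_round(curve, patience=3))
```

With a patience of 3 the loop stops before it reaches the last, lower value in the curve. That is the trade-off behind choosing a generous setting such as 100 rounds together with an arbitrarily high round limit.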

In [11]:
%%time
# this cell runs a single round
# the full process is 
# set initial XGboost parameters
# remove overrides (search for TODO: in this cell)
# run round 1 and override initial max_depth, min_child_weight based on best values (search for TODO:)
# run round 2 and override subsample and colsample_bytree based on best values
# run round 3 and override reg_alpha, reg_lambda, reg_gamma based on best values
# run round 4 and obtain learning_rate and best n_iterations
# this is not an exhaustive list but a representative list of most important parameters to tune
# see https://xgboost.readthedocs.io/en/latest/parameter.html for all parameters

# in XGBoost > 1.0.2, passing early_stopping_rounds to the XGBRegressor constructor gives a warning:
# Parameters: { early_stopping_rounds } might not be used.
# early stopping still seems to work correctly because my_cv also passes it to fit()

max_depth = 5
min_child_weight=5
colsample_bytree = 0.5
subsample = 0.5
reg_alpha = 1e-05
reg_lambda = 1
reg_gamma = 0
learning_rate = 0.01

BOOST_ROUNDS=50000   # we use early stopping so make this arbitrarily high
EARLY_STOPPING_ROUNDS=100 # stop if no improvement after 100 rounds

# round 1: tune depth and min_child_weight
max_depths = list(range(1,5))
min_child_weights = list(range(1,5))
gridsearch_params_1 = product(max_depths, min_child_weights)

# round 2: tune subsample and colsample_bytree
subsamples = np.linspace(0.1, 1.0, 10)
colsample_bytrees = np.linspace(0.1, 1.0, 10)
gridsearch_params_2 = product(subsamples, colsample_bytrees)

# round 2 (refined): tune subsample and colsample_bytree
subsamples = np.linspace(0.4, 0.8, 9)
colsample_bytrees = np.linspace(0.05, 0.25, 5)
gridsearch_params_2 = product(subsamples, colsample_bytrees)

# round 3: tune alpha, lambda, gamma
reg_alphas = np.logspace(-3, -2, 3)
reg_lambdas = np.logspace(-2, 1, 4)
reg_gammas = [0]
#reg_gammas = np.linspace(0, 5, 6)
gridsearch_params_3 = product(reg_alphas, reg_lambdas, reg_gammas)

# round 4: learning rate
learning_rates = reversed(np.logspace(-3, -1, 5).tolist())
gridsearch_params_4 = learning_rates

# TODO: remove these overrides to reset the search
# override initial parameters after search
# round 1:
max_depth=2
min_child_weight=2
# # round 2:
subsample=0.60
colsample_bytree=0.05
# # round 3:  
reg_alpha = 0.003162
reg_lambda = 0.1
reg_gamma = 0

def my_cv(df, predictors, response, kfolds, regressor, verbose=False):
    """Roll our own CV over kfolds with early stopping"""
    metrics = []
    best_iterations = []

    for train_fold, cv_fold in kfolds.split(df): 
        fold_X_train=df[predictors].values[train_fold]
        fold_y_train=df[response].values[train_fold]
        fold_X_test=df[predictors].values[cv_fold]
        fold_y_test=df[response].values[cv_fold]
        regressor.fit(fold_X_train, fold_y_train,
                      early_stopping_rounds=EARLY_STOPPING_ROUNDS,
                      eval_set=[(fold_X_test, fold_y_test)],
                      eval_metric='rmse',
                      verbose=verbose
                     )
        y_pred_test=regressor.predict(fold_X_test)
        metrics.append(np.sqrt(mean_squared_error(fold_y_test, y_pred_test)))
        best_iterations.append(regressor.best_iteration)  # use the argument, not the global xgb
    return np.average(metrics), np.std(metrics), np.average(best_iterations)

results = []
best_iterations = []

# TODO: iteratively uncomment 1 of the following 4 lines
# for i, (max_depth, min_child_weight) in enumerate(gridsearch_params_1): # round 1
# for i, (subsample, colsample_bytree) in enumerate(gridsearch_params_2): # round 2
# for i, (reg_alpha, reg_lambda, reg_gamma) in enumerate(gridsearch_params_3): # round 3
for i, learning_rate in enumerate(gridsearch_params_4): # round 4

    params = {
        'max_depth': max_depth,
        'min_child_weight': min_child_weight,
        'subsample': subsample,
        'colsample_bytree': colsample_bytree,
        'reg_alpha': reg_alpha,
        'reg_lambda': reg_lambda,
        'gamma': reg_gamma,
        'learning_rate': learning_rate,
    }
    print("%s params  %3d: %s" % (datetime.strftime(datetime.now(), "%T"), i, params))
    xgb = XGBRegressor(
        objective='reg:squarederror',
        n_estimators=BOOST_ROUNDS,
        early_stopping_rounds=EARLY_STOPPING_ROUNDS,
        random_state=RANDOMSTATE,    
        verbosity=1,
        n_jobs=-1,
        booster='gbtree',   
        base_score=0.5, 
        scale_pos_weight=1,
        **params
    )
    
    metric_rmse, metric_std, best_iteration = my_cv(df, predictors, response, kfolds, xgb, verbose=False)    
    results.append([max_depth, min_child_weight, subsample, colsample_bytree, reg_alpha, reg_lambda, reg_gamma, 
                   learning_rate, metric_rmse, metric_std, best_iteration])
    
    print("%s %3d result mean: %.6f std: %.6f, iter: %.2f" % (datetime.strftime(datetime.now(), "%T"), i, metric_rmse, metric_std, best_iteration))


results_df = pd.DataFrame(results, columns=['max_depth', 'min_child_weight', 'subsample', 'colsample_bytree', 
                               'reg_alpha', 'reg_lambda', 'reg_gamma', 'learning_rate', 'rmse', 'std', 'best_iter']).sort_values('rmse')
results_df


04:04:57 params    0: {'max_depth': 2, 'min_child_weight': 2, 'subsample': 0.6, 'colsample_bytree': 0.05, 'reg_alpha': 0.003162, 'reg_lambda': 0.1, 'gamma': 0, 'learning_rate': 0.1}
Parameters: { early_stopping_rounds } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


,max_depth,min_child_weight,subsample,colsample_bytree,reg_alpha,reg_lambda,reg_gamma,learning_rate,rmse,std,best_iter
1,2,2,0.6,0.05,0.003162,0.1,0,0.031623,0.105996,0.012395,1134.8
2,2,2,0.6,0.05,0.003162,0.1,0,0.01,0.106046,0.013608,2888.1
3,2,2,0.6,0.05,0.003162,0.1,0,0.003162,0.106838,0.013188,7783.7
4,2,2,0.6,0.05,0.003162,0.1,0,0.001,0.107344,0.013132,22413.3
0,2,2,0.6,0.05,0.003162,0.1,0,0.1,0.109786,0.010449,476.0


In [12]:
max_depth = int(results_df.iloc[0]['max_depth'])
min_child_weight = results_df.iloc[0]['min_child_weight']
subsample = results_df.iloc[0]['subsample']
colsample_bytree = results_df.iloc[0]['colsample_bytree']
reg_alpha = results_df.iloc[0]['reg_alpha']
reg_lambda = results_df.iloc[0]['reg_lambda']
reg_gamma = results_df.iloc[0]['reg_gamma']
learning_rate = results_df.iloc[0]['learning_rate']
N_ESTIMATORS = int(results_df.iloc[0]['best_iter'])

params = {
    'max_depth': int(max_depth),
    'min_child_weight': min_child_weight,
    'subsample': subsample,
    'colsample_bytree': colsample_bytree,
    'reg_alpha': reg_alpha,
    'reg_lambda': reg_lambda,
    'gamma': reg_gamma,
    'learning_rate': learning_rate,
    'n_estimators': N_ESTIMATORS,    
}

print(params)

{'max_depth': 2, 'min_child_weight': 2.0, 'subsample': 0.6, 'colsample_bytree': 0.05, 'reg_alpha': 0.003162, 'reg_lambda': 0.1, 'gamma': 0.0, 'learning_rate': 0.03162277660168379, 'n_estimators': 1134}


In [13]:
%%time
# evaluate without early stopping

xgb = XGBRegressor(
    objective='reg:squarederror',
    random_state=RANDOMSTATE,    
    verbosity=1,
    n_jobs=-1,
    booster='gbtree',   
    base_score=0.5, 
    scale_pos_weight=1,
    **params
)
print(xgb)

scores = -cross_val_score(xgb, df[predictors], df[response],
                          scoring="neg_root_mean_squared_error",
                          cv=kfolds,
                          n_jobs=-1)
raw_scores = [cv_to_raw(x) for x in scores]
print()
print("Log1p CV RMSE %.06f (STD %.04f)" % (np.mean(scores), np.std(scores)))
print("Raw CV RMSE %.06f (STD %.04f)" % (np.mean(raw_scores), np.std(raw_scores)))


XGBRegressor(base_score=None, booster=None, colsample_bylevel=None,
             colsample_bynode=None, colsample_bytree=0.05, gamma=0.0,
             gpu_id=None, importance_type='gain', interaction_constraints=None,
             learning_rate=0.03162277660168379, max_delta_step=None,
             max_depth=2, min_child_weight=2.0, missing=nan,
             monotone_constraints=None, n_estimators=1134, n_jobs=-1,
             num_parallel_tree=None, random_state=42, reg_alpha=0.003162,
             reg_lambda=0.1, scale_pos_weight=None, subsample=0.6,
             tree_method=None, validate_parameters=None, verbosity=1)

Log1p CV RMSE 0.106893 (STD 0.0125)
Raw CV RMSE 18783.117031 (STD 2307.2858)
CPU times: user 26.7 ms, sys: 7.75 ms, total: 34.4 ms
Wall time: 1.63 s


In [26]:
# refactor for ray.tune
def my_xgb(config):
    
    # fix up the sampled config in place before building the model
    config['max_depth'] += 2   # HyperOpt's randint starts at 0, but we want tree depth to start at 2
    config['max_depth'] = int(config['max_depth'])
    config['n_estimators'] = int(config['n_estimators'])   # samplers such as loguniform return floats; XGBoost needs an int
    
    xgb = XGBRegressor(
        objective='reg:squarederror',
        n_jobs=1,
        random_state=RANDOMSTATE,
        booster='gbtree',   
        base_score=0.5, 
        scale_pos_weight=1, 
        **config,
    )
    # sqrt of each fold's MSE gives a per-fold RMSE, so the value reported under the key 'mse' is a mean RMSE
    scores = np.sqrt(-cross_val_score(xgb, df[predictors], df[response],
                                      scoring="neg_mean_squared_error",
                                      cv=kfolds))
    tune.report(mse=np.mean(scores))
    return {'mse': np.mean(scores)}
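
The config fix-up at the top of `my_xgb` can be looked at in isolation. The following is a minimal sketch, not the notebook's code: `normalize_xgb_config` is a hypothetical helper that applies the same depth shift and integer casts to a copy of the sampled config and drops the `wandb` logger settings. The cells that later read `config.max_depth` back from the results table appear to rely on the in-place update, so treat this as an illustration rather than a drop-in replacement.

```python
def normalize_xgb_config(config):
    """Return a copy of a sampled config with XGBoost-friendly types (hypothetical helper)."""
    fixed = dict(config)                                   # copy so the caller's dict is untouched
    fixed.pop("wandb", None)                               # logger settings are not model parameters
    fixed["max_depth"] = int(fixed["max_depth"] + 2)       # search space starts at 0, trees at depth 2
    fixed["n_estimators"] = int(fixed["n_estimators"])     # log-uniform samples arrive as floats
    return fixed


# Example values in the style of the trial table; not taken from a specific trial
sampled = {"max_depth": 1.0, "n_estimators": 428.153, "learning_rate": 0.0067,
           "wandb": {"project": "iowa"}}
fixed = normalize_xgb_config(sampled)
print(fixed)
```

Working on a copy keeps the sampled values and the values passed to the model separate, which makes it easier to tell which of the two a results table is showing.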


In [28]:
z = XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
             colsample_bytree=0.15, gamma=0, importance_type='gain',
             learning_rate=0.01, max_delta_step=0, max_depth=3,
             min_child_weight=0, missing=None, n_estimators=5623, n_jobs=-1,
             nthread=None, objective='reg:linear', random_state=0,
             reg_alpha=1e-05, reg_lambda=1, scale_pos_weight=1, seed=None,
             silent=True, subsample=0.55)
scores = np.sqrt(-cross_val_score(z, df[predictors], df[response],
                                  scoring="neg_mean_squared_error",
                                  cv=kfolds))
print( {'mse': np.mean(scores)})


Parameters: { silent } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.



In [27]:
config = {
    'max_depth': max_depth-2,
    'min_child_weight': min_child_weight,
    'subsample': subsample,
    'colsample_bytree': colsample_bytree,
    'reg_alpha': reg_alpha,
    'reg_lambda': reg_lambda,
    'gamma': reg_gamma,
    'learning_rate': learning_rate,
    'n_estimators': N_ESTIMATORS,
}

result = my_xgb(config)

print(result)




{'mse': 0.10584381162278081}


## HyperOpt
- https://conference.scipy.org/proceedings/scipy2013/pdfs/bergstra_hyperopt.pdf
- https://github.com/hyperopt/hyperopt
- http://hyperopt.github.io/hyperopt/
- https://blog.dominodatalab.com/hyperopt-bayesian-hyperparameter-optimization/
    

In [25]:
NUM_SAMPLES=256

start_time = datetime.now()
print("%-20s %s" % ("Start Time", start_time))

algo = HyperOptSearch(random_state_seed=RANDOMSTATE)
# uncomment and set max_concurrent to limit number of cores
# algo = ConcurrencyLimiter(algo, max_concurrent=10)
scheduler = AsyncHyperBandScheduler()

tune_kwargs = {
    "num_samples": NUM_SAMPLES,
    "config": {
        "n_estimators": tune.loguniform(100, 10000),
        "max_depth": tune.randint(0, 6),
        'min_child_weight': tune.randint(0, 6),
        "subsample": tune.quniform(0.4, 0.9, 0.05),
        "colsample_bytree": tune.quniform(0.05, 0.8, 0.05),
        "reg_alpha": tune.loguniform(1e-04, 1),
        "reg_lambda": tune.loguniform(1e-04, 100),
        "gamma": 0,
        "learning_rate": tune.loguniform(0.001, 0.1),
        "wandb": {
            "project": "iowa",
            "api_key_file": "secrets/wandb.txt",
            "log_config": True
        }    
    }
}

analysis = tune.run(my_xgb,
                    name="xgb_hyperopt",
                    metric="mse",
                    mode="min",
                    search_alg=algo,
                    scheduler=scheduler,
                    verbose=1,
                    loggers=DEFAULT_LOGGERS + (WandbLogger, ),
                    **tune_kwargs)

end_time = datetime.now()
print("%-20s %s" % ("Start Time", start_time))
print("%-20s %s" % ("End Time", end_time))
print(str(timedelta(seconds=(end_time-start_time).seconds)))

| Trial name | status | loc | colsample_bytree | gamma | learning_rate | max_depth | min_child_weight | n_estimators | reg_alpha | reg_lambda | subsample | wandb/api_key_file | wandb/log_config | wandb/project |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| my_xgb_3877f187 | RUNNING | | 0.3 | 0 | 0.00673344 | 3 | 2 | 428.153 | 0.538855 | 2.32463 | 0.6 | secrets/wandb.txt | True | iowa |
| my_xgb_3877f18b | RUNNING | | 0.15 | 0 | 0.00106609 | 1 | 5 | 2700.39 | 0.0127785 | 0.0424791 | 0.8 | secrets/wandb.txt | True | iowa |
| my_xgb_3877f18c | RUNNING | | 0.4 | 0 | 0.00556109 | 3 | 2 | 238.15 | 0.000451262 | 43.2731 | 0.75 | secrets/wandb.txt | True | iowa |
| my_xgb_3877f18e | RUNNING | | 0.5 | 0 | 0.00126552 | 5 | 3 | 4734.92 | 0.00967981 | 33.1173 | 0.65 | secrets/wandb.txt | True | iowa |
| my_xgb_3dac578c | RUNNING | | 0.3 | 0 | 0.00354348 | 4 | 4 | 322.93 | 0.092285 | 8.07198 | 0.4 | secrets/wandb.txt | True | iowa |
| my_xgb_4171b236 | RUNNING | | 0.35 | 0 | 0.00591754 | 2 | 5 | 413.973 | 0.693163 | 0.00697508 | 0.85 | secrets/wandb.txt | True | iowa |
| my_xgb_4354848e | RUNNING | | 0.1 | 0 | 0.00222835 | 5 | 0 | 562.024 | 0.251502 | 0.000119977 | 0.65 | secrets/wandb.txt | True | iowa |
| my_xgb_4535a1ac | PENDING | | 0.75 | 0 | 0.00156672 | 5 | 4 | 968.572 | 0.000895231 | 33.5653 | 0.7 | secrets/wandb.txt | True | iowa |
| my_xgb_3877f186 | ERROR | | 0.7 | 0 | 0.0105391 | 0 | 4 | 435.406 | 0.000188888 | 9.90358 | 0.65 | secrets/wandb.txt | True | iowa |
| my_xgb_3877f188 | ERROR | | 0.15 | 0 | 0.00133237 | 5 | 4 | 7988.18 | 0.00163546 | 19.9476 | 0.8 | secrets/wandb.txt | True | iowa |

Trial name,# failures,error file
my_xgb_3877f186,1,"/home/ubuntu/ray_results/xgb_hyperopt/my_xgb_3877f186_1_colsample_bytree=0.7,gamma=0,learning_rate=0.010539,max_depth=0,min_child_weight=4,n_estimators=435.41,reg_alpha_2020-10-11_14-44-09/error.txt"
my_xgb_3877f188,1,"/home/ubuntu/ray_results/xgb_hyperopt/my_xgb_3877f188_3_colsample_bytree=0.15,gamma=0,learning_rate=0.0013324,max_depth=5,min_child_weight=4,n_estimators=7988.2,reg_alp_2020-10-11_14-44-09/error.txt"
my_xgb_3877f189,1,"/home/ubuntu/ray_results/xgb_hyperopt/my_xgb_3877f189_4_colsample_bytree=0.2,gamma=0,learning_rate=0.018766,max_depth=2,min_child_weight=3,n_estimators=735.64,reg_alpha_2020-10-11_14-44-09/error.txt"
my_xgb_3877f18a,1,"/home/ubuntu/ray_results/xgb_hyperopt/my_xgb_3877f18a_5_colsample_bytree=0.6,gamma=0,learning_rate=0.0015521,max_depth=0,min_child_weight=3,n_estimators=4032.1,reg_alph_2020-10-11_14-44-09/error.txt"
my_xgb_3877f18d,1,"/home/ubuntu/ray_results/xgb_hyperopt/my_xgb_3877f18d_8_colsample_bytree=0.55,gamma=0,learning_rate=0.015461,max_depth=0,min_child_weight=1,n_estimators=5606.0,reg_alph_2020-10-11_14-44-09/error.txt"
my_xgb_3bca9960,1,"/home/ubuntu/ray_results/xgb_hyperopt/my_xgb_3bca9960_10_colsample_bytree=0.75,gamma=0,learning_rate=0.0015369,max_depth=2,min_child_weight=2,n_estimators=678.64,reg_al_2020-10-11_14-44-18/error.txt"
my_xgb_3f90568e,1,"/home/ubuntu/ray_results/xgb_hyperopt/my_xgb_3f90568e_12_colsample_bytree=0.55,gamma=0,learning_rate=0.027463,max_depth=1,min_child_weight=4,n_estimators=1059.5,reg_alp_2020-10-11_14-44-24/error.txt"


2020-10-11 14:44:33,905	ERROR trial_runner.py:793 -- Trial my_xgb_3877f18c: Error processing event.
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/hyperparam2/lib/python3.8/site-packages/ray/tune/trial_runner.py", line 726, in _process_trial
    result = self.trial_executor.fetch_result(trial)
  File "/home/ubuntu/anaconda3/envs/hyperparam2/lib/python3.8/site-packages/ray/tune/ray_trial_executor.py", line 489, in fetch_result
    result = ray.get(trial_future[0], timeout=DEFAULT_GET_TIMEOUT)
  File "/home/ubuntu/anaconda3/envs/hyperparam2/lib/python3.8/site-packages/ray/worker.py", line 1450, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(TuneError): ray::ImplicitFunc.train() (pid=801, ip=172.30.1.217)
  File "python/ray/_raylet.pyx", line 482, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 436, in ray._raylet.execute_task.function_executor
  File "/home/ubuntu/anaconda3/envs/hyperparam2/lib/python3.8/site-p

Failed to query for notebook name, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable
wandb: Currently logged in as: druce (use `wandb login --relogin` to force relogin)


Problem at:

wandb: ERROR Control-C detected -- Run data was not synced


 /home/ubuntu/anaconda3/envs/hyperparam2/lib/python3.8/site-packages/ray/tune/integration/wandb.py 189 run


Process _WandbLoggingProcess-791:
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/hyperparam2/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/ubuntu/anaconda3/envs/hyperparam2/lib/python3.8/site-packages/ray/tune/integration/wandb.py", line 189, in run
    wandb.init(*self.args, **self.kwargs)
  File "/home/ubuntu/anaconda3/envs/hyperparam2/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 484, in init
    raise e
  File "/home/ubuntu/anaconda3/envs/hyperparam2/lib/python3.8/site-packages/wandb/interface/interface.py", line 437, in _communicate
    return future.get(timeout)
  File "/home/ubuntu/anaconda3/envs/hyperparam2/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 463, in init
    run = wi.init()
  File "/home/ubuntu/anaconda3/envs/hyperparam2/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 366, in init
    ret = backend.interface.communicate_check_version()
  File "/home/ubuntu/a

KeyboardInterrupt: 
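
The search space in the HyperOpt cell mixes log-uniform and quantized-uniform distributions. As a rough illustration of what those two shapes mean, here is a standard-library sketch. `sample_loguniform` and `sample_quniform` are hypothetical helpers that reflect my understanding of `tune.loguniform` and `tune.quniform`; they are not Ray's implementation.

```python
import math
import random


def sample_loguniform(rng, low, high):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def sample_quniform(rng, low, high, q):
    """Draw uniformly on [low, high], then round to the nearest multiple of q."""
    return round(rng.uniform(low, high) / q) * q


rng = random.Random(42)
n_estimators = [sample_loguniform(rng, 100, 10000) for _ in range(1000)]
subsample = [sample_quniform(rng, 0.4, 0.9, 0.05) for _ in range(1000)]

# Roughly half of the log-uniform draws land below the geometric midpoint (1000),
# so small and large tree counts get similar attention.
share_below_1000 = sum(v < 1000 for v in n_estimators) / len(n_estimators)
print(round(share_below_1000, 2))
```

A plain uniform distribution on [100, 10000] would put only about 9% of its draws below 1000, which is why log scales are the usual choice for parameters such as `n_estimators`, `learning_rate` and the regularization strengths.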

In [13]:
analysis.results_df.columns

Index(['mse', 'time_this_iter_s', 'done', 'timesteps_total', 'episodes_total',
       'training_iteration', 'experiment_id', 'date', 'timestamp',
       'time_total_s', 'pid', 'hostname', 'node_ip', 'time_since_restore',
       'timesteps_since_restore', 'iterations_since_restore', 'experiment_tag',
       'config.n_estimators', 'config.max_depth', 'config.min_child_weight',
       'config.subsample', 'config.colsample_bytree', 'config.reg_alpha',
       'config.reg_lambda', 'config.gamma', 'config.learning_rate',
       'config.wandb.project', 'config.wandb.api_key_file',
       'config.wandb.log_config'],
      dtype='object')

In [14]:
analysis_results_df = analysis.results_df[['mse', 'date', 'time_this_iter_s',
       'config.n_estimators', 'config.max_depth', 'config.min_child_weight', 'config.subsample',
       'config.colsample_bytree', 'config.reg_alpha', 'config.reg_lambda', 'config.gamma',
       'config.learning_rate']].sort_values('mse')
analysis_results_df


| trial_id | mse | date | time_this_iter_s | config.n_estimators | config.max_depth | config.min_child_weight | config.subsample | config.colsample_bytree | config.reg_alpha | config.reg_lambda | config.gamma | config.learning_rate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| d85a282a | 0.104774 | 2020-10-11_07-24-38 | 0.092840 | 8322 | 3 | 1 | 0.60 | 0.10 | 0.003999 | 0.005157 | 0 | 0.004503 |
| 0aa53b4a | 0.105058 | 2020-10-11_07-16-41 | 1.241848 | 8095 | 3 | 0 | 0.40 | 0.05 | 0.011135 | 0.006914 | 0 | 0.005647 |
| e44d4e46 | 0.105163 | 2020-10-11_06-51-25 | 4.145429 | 9915 | 3 | 2 | 0.40 | 0.10 | 0.045455 | 0.000328 | 0 | 0.003634 |
| 2e496312 | 0.105265 | 2020-10-11_06-58-25 | 0.134120 | 8370 | 3 | 2 | 0.55 | 0.10 | 0.025667 | 0.000280 | 0 | 0.003237 |
| 508153ea | 0.105430 | 2020-10-11_06-57-33 | 0.115717 | 6686 | 3 | 2 | 0.40 | 0.05 | 0.044706 | 0.001424 | 0 | 0.005173 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1f075434 | 7.077912 | 2020-10-11_07-44-32 | 18.572371 | 185 | 6 | 5 | 0.55 | 0.10 | 0.012127 | 0.002499 | 0 | 0.002635 |
| cc546550 | 7.146072 | 2020-10-11_07-10-16 | 2.588564 | 100 | 6 | 1 | 0.45 | 0.60 | 0.479140 | 0.001024 | 0 | 0.004776 |
| 5677ca94 | 7.627630 | 2020-10-11_07-36-50 | 30.107122 | 141 | 6 | 0 | 0.60 | 0.25 | 0.003002 | 0.000327 | 0 | 0.002926 |
| 4256e3e2 | 7.904806 | 2020-10-11_08-11-41 | 14.530406 | 258 | 3 | 1 | 0.45 | 0.05 | 0.041836 | 0.000839 | 0 | 0.001463 |


In [15]:
max_depth = analysis_results_df.iloc[0]['config.max_depth']
min_child_weight = analysis_results_df.iloc[0]['config.min_child_weight']
subsample = analysis_results_df.iloc[0]['config.subsample']
colsample_bytree = analysis_results_df.iloc[0]['config.colsample_bytree']
reg_alpha = analysis_results_df.iloc[0]['config.reg_alpha']
reg_lambda = analysis_results_df.iloc[0]['config.reg_lambda']
reg_gamma = analysis_results_df.iloc[0]['config.gamma']
learning_rate = analysis_results_df.iloc[0]['config.learning_rate']
N_ESTIMATORS = analysis_results_df.iloc[0]['config.n_estimators']    
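
The nine lookups above all read from the same top row, so they could also be collected in one pass. This is a sketch under assumptions: `best_config_from_row` is a hypothetical helper, and the `row` dict is typed in by hand from the first line of the HyperOpt results table. In the notebook, `analysis_results_df.iloc[0].to_dict()` would supply the same mapping.

```python
def best_config_from_row(row, prefix="config."):
    """Collect the hyperparameters from one results row, dropping the column prefix."""
    return {key[len(prefix):]: value
            for key, value in row.items()
            if key.startswith(prefix) and not key.startswith(prefix + "wandb")}


# Values copied from the top row (trial d85a282a) of the HyperOpt results table
row = {
    "mse": 0.104774,
    "date": "2020-10-11_07-24-38",
    "time_this_iter_s": 0.092840,
    "config.n_estimators": 8322,
    "config.max_depth": 3,
    "config.min_child_weight": 1,
    "config.subsample": 0.60,
    "config.colsample_bytree": 0.10,
    "config.reg_alpha": 0.003999,
    "config.reg_lambda": 0.005157,
    "config.gamma": 0,
    "config.learning_rate": 0.004503,
}
best = best_config_from_row(row)
print(best)
```

The resulting dict has the same keys as the hand-built `best_config`, so it could be passed to `XGBRegressor(**best)` in the same way.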


In [16]:
best_config = {
    'max_depth': max_depth,
    'min_child_weight': min_child_weight,
    'subsample': subsample,
    'colsample_bytree': colsample_bytree,
    'reg_alpha': reg_alpha,
    'reg_lambda': reg_lambda,
    'gamma': reg_gamma,
    'learning_rate': learning_rate,
    'n_estimators':  N_ESTIMATORS
}

xgb = XGBRegressor(
    objective='reg:squarederror',
    random_state=RANDOMSTATE,    
    verbosity=1,
    n_jobs=-1,
    **best_config
)
print(xgb)

scores = -cross_val_score(xgb, df[predictors], df[response],
                          scoring="neg_root_mean_squared_error",
                          cv=kfolds)

raw_scores = [cv_to_raw(x) for x in scores]
print()
print("Log1p CV RMSE %.06f (STD %.04f)" % (np.mean(scores), np.std(scores)))
print("Raw CV RMSE %.06f (STD %.04f)" % (np.mean(raw_scores), np.std(raw_scores)))


XGBRegressor(base_score=None, booster=None, colsample_bylevel=None,
             colsample_bynode=None, colsample_bytree=0.1, gamma=0, gpu_id=None,
             importance_type='gain', interaction_constraints=None,
             learning_rate=0.0045032167062562, max_delta_step=None, max_depth=3,
             min_child_weight=1, missing=nan, monotone_constraints=None,
             n_estimators=8322, n_jobs=-1, num_parallel_tree=None,
             random_state=42, reg_alpha=0.003998544135195575,
             reg_lambda=0.005157242700140977, scale_pos_weight=None,
             subsample=0.6000000000000001, tree_method=None,
             validate_parameters=None, verbosity=1)

Log1p CV RMSE 0.104774 (STD 0.0130)
Raw CV RMSE 18392.440858 (STD 2395.4341)


In [17]:
# bayesopt
NUM_SAMPLES=256

start_time = datetime.now()
print("%-20s %s" % ("Start Time", start_time))

algo = BayesOptSearch(utility_kwargs={
    "kind": "ucb",
    "kappa": 2.5,
    "xi": 0.0
})

# uncomment and set max_concurrent to limit number of cores
# algo = ConcurrencyLimiter(algo, max_concurrent=10)
scheduler = AsyncHyperBandScheduler()

tune_kwargs = {
    "num_samples": NUM_SAMPLES,
    "config": {
        "n_estimators": tune.loguniform(100, 10000),
        "max_depth": tune.quniform(0, 6, 1),
        'min_child_weight': tune.quniform(0, 6, 1),
        "subsample": tune.quniform(0.4, 0.9, 0.05),
        "colsample_bytree": tune.quniform(0.05, 0.8, 0.05),
        "reg_alpha": tune.loguniform(1e-04, 1),
        "reg_lambda": tune.loguniform(1e-04, 100),
        "gamma": 0,
        "learning_rate": tune.loguniform(0.001, 0.1),
        "wandb": {
            "project": "iowa",
            "api_key_file": "secrets/wandb.txt",
            "log_config": True
        }    
    }
}

analysis = tune.run(my_xgb,
                    name="xgb_bayesopt",
                    metric="mse",
                    mode="min",
                    search_alg=algo,
                    scheduler=scheduler,
                    verbose=1,
                    loggers=DEFAULT_LOGGERS + (WandbLogger, ),
                    **tune_kwargs)

end_time = datetime.now()
print("%-20s %s" % ("Start Time", start_time))
print("%-20s %s" % ("End Time", end_time))
print(str(timedelta(seconds=(end_time-start_time).seconds)))

| Trial name | status | loc | colsample_bytree | learning_rate | max_depth | min_child_weight | n_estimators | reg_alpha | reg_lambda | subsample | iter | total time (s) | mse |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| my_xgb_24bea0ca | TERMINATED | | 0.330905 | 0.0951207 | 4.39196 | 3.59195 | 1644.58 | 0.156079 | 5.80846 | 0.833088 | 1 | 125.185 | 0.116099 |
| my_xgb_24bea0cb | TERMINATED | | 0.500836 | 0.0710992 | 0.123507 | 5.81946 | 8341.18 | 0.212418 | 18.1826 | 0.491702 | 2 | 408.05 | 0.114678 |
| my_xgb_256ba608 | TERMINATED | | 0.278182 | 0.0529509 | 2.59167 | 1.74737 | 6157.34 | 0.13958 | 29.2145 | 0.583181 | 1 | 376.674 | 0.116462 |
| my_xgb_256ba609 | TERMINATED | | 0.392052 | 0.0787324 | 1.19804 | 3.08541 | 5964.9 | 0.0465458 | 60.7545 | 0.485262 | 1 | 367.405 | 0.117542 |
| my_xgb_256ba60a | TERMINATED | | 0.0987887 | 0.0949397 | 5.79379 | 4.85038 | 3115.68 | 0.0977623 | 68.4233 | 0.620076 | 1 | 135.749 | 0.116519 |
| my_xgb_26d8037e | TERMINATED | | 0.141529 | 0.0500225 | 0.206331 | 5.45592 | 2661.92 | 0.662556 | 31.1712 | 0.660034 | 2 | 54.5333 | 0.110889 |
| my_xgb_26d8037f | TERMINATED | | 0.460033 | 0.0193006 | 5.81751 | 4.6508 | 9401.04 | 0.894838 | 59.79 | 0.860937 | 1 | 607.753 | 0.114999 |
| my_xgb_27e222ae | TERMINATED | | 0.116369 | 0.0204023 | 0.271364 | 1.95198 | 3947.91 | 0.271422 | 82.8738 | 0.578377 | 1 | 62.1068 | 0.110967 |
| my_xgb_27e222af | TERMINATED | | 0.260701 | 0.0547269 | 0.845545 | 4.81318 | 838.051 | 0.986888 | 77.2245 | 0.499358 | 1 | 12.322 | 0.120372 |
| my_xgb_51bbd052 | TERMINATED | | 0.0541416 | 0.0817307 | 4.24114 | 4.37404 | 7735.58 | 0.0741372 | 35.8466 | 0.457935 | 1 | 291.707 | 0.117329 |


2020-10-11 11:59:15,407	INFO tune.py:439 -- Total run time: 12902.55 seconds (12901.50 seconds for the tuning loop).


Start Time           2020-10-11 08:24:12.854637
End Time             2020-10-11 11:59:16.328385
3:35:03
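
The BayesOpt cell above sets `"kind": "ucb"` and `"kappa": 2.5`. As I understand it, an upper-confidence-bound acquisition function scores each candidate by its predicted value plus `kappa` times the predicted uncertainty, and `xi` only matters for the other acquisition kinds. The sketch below illustrates that rule with made-up surrogate predictions. The `ucb` helper and the candidate numbers are hypothetical, and the scores are written as negated RMSE on the assumption that the optimizer maximizes internally.

```python
def ucb(mean, std, kappa=2.5):
    """Upper confidence bound: predicted value plus an exploration bonus."""
    return mean + kappa * std


# Hypothetical surrogate predictions (mean, std) for three candidate configs.
# Means are negated RMSE, so higher is better.
candidates = {
    "well explored": (-0.109, 0.001),
    "uncertain": (-0.112, 0.004),
    "poor": (-0.130, 0.002),
}

scores = {name: ucb(mean, std) for name, (mean, std) in candidates.items()}
best = max(scores, key=scores.get)

# With kappa=0 the rule is purely greedy and ignores uncertainty
greedy = max(candidates, key=lambda name: ucb(*candidates[name], kappa=0.0))
print(best, "|", greedy)
```

A larger `kappa` pushes the search toward regions the surrogate model is unsure about, while a smaller one keeps it close to the best configurations seen so far.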


In [18]:
analysis_results_df = analysis.results_df[['mse', 'date', 'time_this_iter_s',
       'config.n_estimators', 'config.max_depth', 'config.min_child_weight', 'config.subsample',
       'config.colsample_bytree', 'config.reg_alpha', 'config.reg_lambda', 'config.gamma',
       'config.learning_rate']].sort_values('mse')
analysis_results_df

| trial_id | mse | date | time_this_iter_s | config.n_estimators | config.max_depth | config.min_child_weight | config.subsample | config.colsample_bytree | config.reg_alpha | config.reg_lambda | config.gamma | config.learning_rate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8701b50a | 0.108979 | 2020-10-11_09-50-21 | 0.006845 | 2458 | 2 | 4.637463 | 0.597422 | 0.111506 | 0.444802 | 5.230673 | 0 | 0.032992 |
| c1eb07d0 | 0.109085 | 2020-10-11_10-20-22 | 0.014628 | 1641 | 2 | 2.434736 | 0.450785 | 0.679544 | 0.236696 | 2.023811 | 0 | 0.039621 |
| fc105f54 | 0.109088 | 2020-10-11_11-03-20 | 9.627358 | 1642 | 2 | 0.109626 | 0.764238 | 0.095365 | 0.214865 | 3.890424 | 0 | 0.067549 |
| 2c989b0e | 0.109253 | 2020-10-11_10-05-19 | 0.011696 | 2463 | 2 | 1.877177 | 0.473323 | 0.756797 | 0.281675 | 5.422852 | 0 | 0.033162 |
| 923173b2 | 0.109482 | 2020-10-11_10-55-15 | 0.005312 | 1647 | 5 | 5.188313 | 0.630089 | 0.102769 | 0.150823 | 12.151102 | 0 | 0.036363 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4f7cd028 | 0.135519 | 2020-10-11_08-45-14 | 279.897717 | 2464 | 6 | 2.365190 | 0.821552 | 0.281026 | 0.123086 | 52.885852 | 0 | 0.002970 |
| 0c37725a | 0.152903 | 2020-10-11_09-19-01 | 294.071997 | 2460 | 4 | 5.259133 | 0.883229 | 0.670976 | 0.486580 | 41.671012 | 0 | 0.002409 |
| 5e1d9d54 | 0.360064 | 2020-10-11_10-10-20 | 130.627082 | 2462 | 2 | 4.404157 | 0.867470 | 0.384085 | 0.608909 | 25.681836 | 0 | 0.001547 |
| ab674c38 | 0.485701 | 2020-10-11_08-45-31 | 142.897333 | 2451 | 6 | 2.435895 | 0.443114 | 0.573659 | 0.975435 | 34.549821 | 0 | 0.001472 |


In [19]:
max_depth = analysis_results_df.iloc[0]['config.max_depth']
min_child_weight = analysis_results_df.iloc[0]['config.min_child_weight']
subsample = analysis_results_df.iloc[0]['config.subsample']
colsample_bytree = analysis_results_df.iloc[0]['config.colsample_bytree']
reg_alpha = analysis_results_df.iloc[0]['config.reg_alpha']
reg_lambda = analysis_results_df.iloc[0]['config.reg_lambda']
reg_gamma = analysis_results_df.iloc[0]['config.gamma']
learning_rate = analysis_results_df.iloc[0]['config.learning_rate']
N_ESTIMATORS = analysis_results_df.iloc[0]['config.n_estimators']    

best_config = {
    'max_depth': max_depth,
    'min_child_weight': min_child_weight,
    'subsample': subsample,
    'colsample_bytree': colsample_bytree,
    'reg_alpha': reg_alpha,
    'reg_lambda': reg_lambda,
    'gamma': reg_gamma,
    'learning_rate': learning_rate,
    'n_estimators':  N_ESTIMATORS
}

xgb = XGBRegressor(
    objective='reg:squarederror',
    random_state=RANDOMSTATE,    
    verbosity=1,
    n_jobs=-1,
    **best_config
)
print(xgb)

scores = -cross_val_score(xgb, df[predictors], df[response],
                          scoring="neg_root_mean_squared_error",
                          cv=kfolds)

raw_scores = [cv_to_raw(x) for x in scores]
print()
print("Log1p CV RMSE %.06f (STD %.04f)" % (np.mean(scores), np.std(scores)))
print("Raw CV RMSE %.06f (STD %.04f)" % (np.mean(raw_scores), np.std(raw_scores)))


XGBRegressor(base_score=None, booster=None, colsample_bylevel=None,
             colsample_bynode=None, colsample_bytree=0.11150621606546184,
             gamma=0, gpu_id=None, importance_type='gain',
             interaction_constraints=None, learning_rate=0.03299212530435644,
             max_delta_step=None, max_depth=2,
             min_child_weight=4.6374634743962755, missing=nan,
             monotone_constraints=None, n_estimators=2458, n_jobs=-1,
             num_parallel_tree=None, random_state=42,
             reg_alpha=0.44480160318532497, reg_lambda=5.230672607404104,
             scale_pos_weight=None, subsample=0.5974219363869536,
             tree_method=None, validate_parameters=None, verbosity=1)

Log1p CV RMSE 0.108979 (STD 0.0136)
Raw CV RMSE 19172.349919 (STD 2516.3095)


In [20]:
# optuna
NUM_SAMPLES=256
optuna_xgb = my_xgb

start_time = datetime.now()
print("%-20s %s" % ("Start Time", start_time))

algo = OptunaSearch()
# uncomment and set max_concurrent to limit number of cores
# algo = ConcurrencyLimiter(algo, max_concurrent=10)
scheduler = AsyncHyperBandScheduler()

tune_kwargs = {
    "num_samples": NUM_SAMPLES,
    "config": {
        "n_estimators": tune.loguniform(100, 10000),
        "max_depth": tune.quniform(0, 6, 1),
        'min_child_weight': tune.quniform(0, 6, 1),
        "subsample": tune.quniform(0.4, 0.9, 0.05),
        "colsample_bytree": tune.quniform(0.05, 0.8, 0.05),
        "reg_alpha": tune.loguniform(1e-04, 1),
        "reg_lambda": tune.loguniform(1e-04, 100),
        "gamma": 0,
        "learning_rate": tune.loguniform(0.001, 0.1),
        "wandb": {
            "project": "iowa",
            "api_key_file": "secrets/wandb.txt",
            "log_config": True,
            "name": get_random_tag(6)
        }           
    }
}

analysis = tune.run(optuna_xgb,
                    name="xgb_optuna",
                    metric="mse",
                    mode="min",
                    search_alg=algo,
                    scheduler=scheduler,
                    verbose=1,
                    loggers=DEFAULT_LOGGERS + (WandbLogger, ),                    
                    **tune_kwargs)

end_time = datetime.now()
print("%-20s %s" % ("Start Time", start_time))
print("%-20s %s" % ("End Time", end_time))
print(str(timedelta(seconds=(end_time-start_time).seconds)))

| Trial name | status | loc | colsample_bytree | learning_rate | max_depth | min_child_weight | n_estimators | reg_alpha | reg_lambda | subsample | iter | total time (s) | mse |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| my_xgb_3ee52942 | TERMINATED | | 0.15 | 0.00252703 | 5 | 4 | 1149.23 | 0.000596532 | 0.00646105 | 0.75 | 1 | 43.1326 | 0.641638 |
| my_xgb_3ee52943 | TERMINATED | | 0.1 | 0.00167144 | 0 | 0 | 434.525 | 0.0376275 | 0.00279115 | 0.85 | 1 | 13.1252 | 5.58187 |
| my_xgb_3ee52944 | TERMINATED | | 0.5 | 0.00512339 | 0 | 4 | 1018.22 | 0.0110368 | 0.000797972 | 0.7 | 2 | 27.4788 | 0.139832 |
| my_xgb_3fbbd064 | TERMINATED | | 0.1 | 0.00598366 | 5 | 4 | 228.89 | 0.288919 | 0.000187992 | 0.85 | 2 | 13.9509 | 2.94112 |
| my_xgb_3fbbd065 | TERMINATED | | 0.45 | 0.0120626 | 3 | 2 | 1457.3 | 0.00156884 | 0.154117 | 0.45 | 2 | 163.621 | 0.107663 |
| my_xgb_4113ddbc | TERMINATED | | 0.25 | 0.00246191 | 1 | 6 | 596.733 | 0.0616538 | 0.000963104 | 0.7 | 1 | 12.4006 | 2.65782 |
| my_xgb_4113ddbd | TERMINATED | | 0.6 | 0.0245369 | 3 | 1 | 7052.47 | 0.0791086 | 0.0665227 | 0.9 | 1 | 997.406 | 0.11035 |
| my_xgb_428dc144 | TERMINATED | | 0.55 | 0.0125369 | 3 | 0 | 341.743 | 0.041832 | 84.8668 | 0.4 | 2 | 14.3489 | 0.419528 |
| my_xgb_428dc145 | TERMINATED | | 0.4 | 0.0384812 | 0 | 1 | 2024.31 | 0.0338043 | 0.000261005 | 0.7 | 2 | 72.0168 | 0.108844 |
| my_xgb_502999e0 | TERMINATED | | 0.5 | 0.00868591 | 4 | 4 | 1424.19 | 0.200658 | 0.00991074 | 0.45 | 2 | 257.259 | 0.110028 |


2020-10-11 14:34:57,723	INFO tune.py:439 -- Total run time: 9316.06 seconds (9315.20 seconds for the tuning loop).


Start Time           2020-10-11 11:59:41.664330
End Time             2020-10-11 14:34:58.664379
2:35:17
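
The timing code in these cells formats the run time with `timedelta(seconds=(end_time-start_time).seconds)`. The `.seconds` attribute holds only the part of the interval that is left after whole days are removed, which is fine for the runs shown here but would under-report a run longer than 24 hours. A small sketch of an alternative using `total_seconds()` follows; `format_elapsed` is a hypothetical helper, and the timestamps are the Optuna run's start and end times truncated to whole seconds.

```python
from datetime import datetime, timedelta


def format_elapsed(start, end):
    """Format end - start as H:MM:SS, keeping whole days in the total."""
    return str(timedelta(seconds=int((end - start).total_seconds())))


start = datetime(2020, 10, 11, 11, 59, 41)
end = datetime(2020, 10, 11, 14, 34, 58)
print(format_elapsed(start, end))  # 2:35:17

# A run that crosses the 24-hour mark keeps its day component
long_run = format_elapsed(start, start + timedelta(days=1, hours=1))
print(long_run)  # 1 day, 1:00:00
```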


In [21]:
analysis_results_df = analysis.results_df[['mse', 'date', 'time_this_iter_s',
       'config.n_estimators', 'config.max_depth', 'config.min_child_weight', 'config.subsample',
       'config.colsample_bytree', 'config.reg_alpha', 'config.reg_lambda', 'config.gamma',
       'config.learning_rate']].sort_values('mse')
analysis_results_df

| trial_id | mse | date | time_this_iter_s | config.n_estimators | config.max_depth | config.min_child_weight | config.subsample | config.colsample_bytree | config.reg_alpha | config.reg_lambda | config.gamma | config.learning_rate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| d66f79bc | 0.105844 | 2020-10-11_13-12-28 | 0.110363 | 7208 | 2 | 0.0 | 0.85 | 0.10 | 0.000262 | 0.010011 | 0 | 0.008432 |
| 730e28f0 | 0.105979 | 2020-10-11_13-10-28 | 0.012445 | 7735 | 2 | 0.0 | 0.85 | 0.10 | 0.000290 | 0.007960 | 0 | 0.009047 |
| 1bd44472 | 0.106095 | 2020-10-11_14-05-07 | 0.006889 | 7689 | 2 | 0.0 | 0.80 | 0.05 | 0.000278 | 0.000143 | 0 | 0.006150 |
| 3fd206b6 | 0.106108 | 2020-10-11_13-30-35 | 2.913600 | 9965 | 2 | 0.0 | 0.85 | 0.05 | 0.000474 | 0.001280 | 0 | 0.007076 |
| f43f8790 | 0.106109 | 2020-10-11_13-35-33 | 1.576206 | 9608 | 2 | 0.0 | 0.85 | 0.05 | 0.000476 | 0.001163 | 0 | 0.007165 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3fbbd064 | 2.941120 | 2020-10-11_12-00-07 | 0.011216 | 228 | 7 | 4.0 | 0.85 | 0.10 | 0.288919 | 0.000188 | 0 | 0.005984 |
| 3ee52943 | 5.581871 | 2020-10-11_12-00-08 | 13.125238 | 434 | 2 | 0.0 | 0.85 | 0.10 | 0.037628 | 0.002791 | 0 | 0.001671 |
| 06915e40 | 6.380450 | 2020-10-11_12-27-29 | 26.584936 | 478 | 2 | 0.0 | 0.50 | 0.05 | 0.000395 | 0.001734 | 0 | 0.001238 |
| c71843d8 | 6.618268 | 2020-10-11_14-05-18 | 1.001238 | 114 | 2 | 0.0 | 0.80 | 0.05 | 0.000854 | 0.000143 | 0 | 0.004861 |


In [22]:
max_depth = analysis_results_df.iloc[0]['config.max_depth']
min_child_weight = analysis_results_df.iloc[0]['config.min_child_weight']
subsample = analysis_results_df.iloc[0]['config.subsample']
colsample_bytree = analysis_results_df.iloc[0]['config.colsample_bytree']
reg_alpha = analysis_results_df.iloc[0]['config.reg_alpha']
reg_lambda = analysis_results_df.iloc[0]['config.reg_lambda']
reg_gamma = analysis_results_df.iloc[0]['config.gamma']
learning_rate = analysis_results_df.iloc[0]['config.learning_rate']
N_ESTIMATORS = analysis_results_df.iloc[0]['config.n_estimators']    

best_config = {
    'max_depth': max_depth,
    'min_child_weight': min_child_weight,
    'subsample': subsample,
    'colsample_bytree': colsample_bytree,
    'reg_alpha': reg_alpha,
    'reg_lambda': reg_lambda,
    'gamma': reg_gamma,
    'learning_rate': learning_rate,
    'n_estimators':  N_ESTIMATORS
}

xgb = XGBRegressor(
    objective='reg:squarederror',
    random_state=RANDOMSTATE,    
    verbosity=1,
    n_jobs=-1,
    **best_config
)
print(xgb)

scores = -cross_val_score(xgb, df[predictors], df[response],
                          scoring="neg_root_mean_squared_error",
                          cv=kfolds)

raw_scores = [cv_to_raw(x) for x in scores]
print()
print("Log1p CV RMSE %.06f (STD %.04f)" % (np.mean(scores), np.std(scores)))
print("Raw CV RMSE %.06f (STD %.04f)" % (np.mean(raw_scores), np.std(raw_scores)))

XGBRegressor(base_score=None, booster=None, colsample_bylevel=None,
             colsample_bynode=None, colsample_bytree=0.1, gamma=0, gpu_id=None,
             importance_type='gain', interaction_constraints=None,
             learning_rate=0.00843249834406594, max_delta_step=None,
             max_depth=2, min_child_weight=0.0, missing=nan,
             monotone_constraints=None, n_estimators=7208, n_jobs=-1,
             num_parallel_tree=None, random_state=42,
             reg_alpha=0.00026158145985933696, reg_lambda=0.01001083397641177,
             scale_pos_weight=None, subsample=0.8500000000000001,
             tree_method=None, validate_parameters=None, verbosity=1)

Log1p CV RMSE 0.105844 (STD 0.0130)
Raw CV RMSE 18590.228049 (STD 2406.9578)
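
With all three search algorithms evaluated, the cross-validation scores printed above can be lined up side by side. The snippet below is only a convenience sketch: the numbers are copied by hand from the three "Log1p CV RMSE" and "Raw CV RMSE" output lines (raw values rounded to cents), and nothing is re-run.

```python
# (Log1p CV RMSE, Raw CV RMSE in $) copied from the cell outputs above
results = {
    "HyperOpt": (0.104774, 18392.44),
    "BayesOpt": (0.108979, 19172.35),
    "Optuna": (0.105844, 18590.23),
}

# Sort by Log1p CV RMSE, lowest (best) first
ranked = sorted(results.items(), key=lambda item: item[1][0])
for name, (log_rmse, raw_rmse) in ranked:
    print("%-10s Log1p CV RMSE %.6f   Raw CV RMSE $%.0f" % (name, log_rmse, raw_rmse))
```

In this run HyperOpt has the lowest error and BayesOpt the highest. Each fold standard deviation printed above is roughly 0.013, which is several times larger than the gaps between the three means.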


In [None]:
# tune LightGBM
print("LightGBM")
#!conda install -y -c conda-forge lightgbm

import lightgbm
from lightgbm import LGBMRegressor

NUM_SAMPLES=256

start_time = datetime.now()
print("%-20s %s" % ("Start Time", start_time))

def my_lgbm(config):
    
    # fix these configs 
    config['n_estimators'] = int(config['n_estimators'])   # pass float eg loguniform distribution, use int
    config['num_leaves'] = 2 + int(config['num_leaves'])
    
    lgbm = LGBMRegressor(objective='regression',
                         num_leaves=config['num_leaves'],
                         learning_rate=config['learning_rate'],
                         n_estimators=config['n_estimators'],
                         max_bin=200,
                         bagging_fraction=config['bagging_fraction'],
                         feature_fraction=config['feature_fraction'],
                         feature_fraction_seed=7,
                         min_data_in_leaf=2,
                         verbose=-1,
                         # early stopping params, maybe in fit
                         #early_stopping_rounds=early_stopping_rounds,
                         #valid_sets=[xgtrain, xgvalid], valid_names=['train','valid'], evals_result=evals_results
                         #num_boost_round=num_boost_round,
                         )
    
    scores = np.sqrt(-cross_val_score(lgbm, df[predictors], df[response],
                                      scoring="neg_mean_squared_error",
                                      cv=kfolds))
    tune.report(mse=np.mean(scores))
    return {'mse': np.mean(scores)}

algo = HyperOptSearch(random_state_seed=RANDOMSTATE)
# uncomment and set max_concurrent to limit number of cores
# algo = ConcurrencyLimiter(algo, max_concurrent=10)
scheduler = AsyncHyperBandScheduler()

tune_kwargs = {
    "num_samples": NUM_SAMPLES,
    "config": {
        "n_estimators": tune.loguniform(100, 10000),
        'num_leaves': tune.randint(0, 10),
        "bagging_fraction": tune.uniform(0.5, 0.8),
        "feature_fraction": tune.uniform(0.01, 0.8),
        "learning_rate": tune.loguniform(0.001, 0.1),
        "wandb": {
            "project": "iowa",
            "api_key_file": "secrets/wandb.txt",
            "log_config": True
        }    
    }
}

analysis = tune.run(my_lgbm,
                    name="lgbm_hyperopt",
                    metric="mse",
                    mode="min",
                    search_alg=algo,
                    scheduler=scheduler,
                    verbose=1,
                    loggers=DEFAULT_LOGGERS + (WandbLogger, ),
                    **tune_kwargs)

end_time = datetime.now()
print("%-20s %s" % ("Start Time", start_time))
print("%-20s %s" % ("End Time", end_time))
print(str(timedelta(seconds=(end_time-start_time).seconds)))


| Trial name | status | loc | bagging_fraction | feature_fraction | learning_rate | n_estimators | num_leaves | wandb/api_key_file | wandb/log_config | wandb/project | iter | total time (s) | mse |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| my_lgbm_c919702e | RUNNING | | 0.604441 | 0.297149 | 0.012299 | 5555.75 | 7 | secrets/wandb.txt | True | iowa | | | |
| my_lgbm_cf0ad770 | RUNNING | 172.30.1.217:20697 | 0.623412 | 0.20919 | 0.0404391 | 3977.01 | 3 | secrets/wandb.txt | True | iowa | 1.0 | 13.9989 | 0.10958 |
| my_lgbm_dc48e49a | RUNNING | | 0.500905 | 0.0470378 | 0.00401401 | 8553.65 | 4 | secrets/wandb.txt | True | iowa | | | |
| my_lgbm_de3b6750 | RUNNING | | 0.514938 | 0.733492 | 0.00125931 | 7953.31 | 4 | secrets/wandb.txt | True | iowa | | | |
| my_lgbm_e22244f6 | RUNNING | | 0.553996 | 0.197099 | 0.00626516 | 4857.42 | 4 | secrets/wandb.txt | True | iowa | | | |
| my_lgbm_e4238e7c | RUNNING | | 0.627606 | 0.0108571 | 0.00493818 | 3191.1 | 3 | secrets/wandb.txt | True | iowa | | | |
| my_lgbm_e60ad16e | RUNNING | | 0.581444 | 0.578532 | 0.00777553 | 6838.91 | 7 | secrets/wandb.txt | True | iowa | | | |
| my_lgbm_e814debe | PENDING | | 0.559657 | 0.237643 | 0.0023008 | 9082.54 | 5 | secrets/wandb.txt | True | iowa | | | |
| my_lgbm_87f7d680 | TERMINATED | | 0.645556 | 0.262363 | 0.00137437 | 4626.62 | 9 | secrets/wandb.txt | True | iowa | 1.0 | 36.3423 | 0.113285 |
| my_lgbm_87f7d681 | TERMINATED | | 0.701044 | 0.259482 | 0.0734068 | 2853.98 | 7 | secrets/wandb.txt | True | iowa | 2.0 | 17.4983 | 0.109697 |


Failed to query for notebook name, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable

wandb: Currently logged in as: druce (use `wandb login --relogin` to force relogin)




