# 5 - Hyperparameter Tuning

Going to attempt tuning on the entire feature set first.

In [154]:
import pandas as pd
X_train_dc = pd.read_pickle('../pickles/split/X_train_dc.pkl')
y_train_dc = pd.read_pickle('../pickles/split/y_train_dc.pkl')

X_train_l = pd.read_pickle('../pickles/split/X_train_l.pkl')
y_train_l = pd.read_pickle('../pickles/split/y_train_l.pkl')

Based on initial performance, I'm going to tune `RandomForestRegressor` and `XGBRegressor`, as they were the most effective. I'm going to start with all of the features to see if one of the two models does significantly better than the other. If one is outperforming the other, that will be the only one I tune on the sets with restricted features.

In [155]:
from sklearn.metrics import root_mean_squared_error, r2_score, make_scorer
from sklearn.ensemble import RandomForestRegressor
from xgboost import XGBRegressor
from sklearn.model_selection import GridSearchCV
import warnings
warnings.filterwarnings('ignore')

In [156]:
forest = RandomForestRegressor(random_state=42)
boost = XGBRegressor(objective='reg:squarederror',random_state=42)

### Random Forest Regressor

**DC set**, all features

In [157]:
space_forest_dc = {
    'n_estimators': [125,130,135,140,145,150,155],
    'max_depth': [19,20,21]}

In [158]:
search_forest_dc1 = GridSearchCV(forest, space_forest_dc, scoring='r2',n_jobs=-1)
result_forest_dc1 = search_forest_dc1.fit(X_train_dc,y_train_dc)
print('Best Score: %s' % result_forest_dc1.best_score_)
print('Best Hyperparameters: %s' % result_forest_dc1.best_params_)

Best Score: 0.6895321510957303
Best Hyperparameters: {'max_depth': 20, 'n_estimators': 150}


In [159]:
search_forest_dc2 = GridSearchCV(forest, space_forest_dc, scoring='neg_root_mean_squared_error',n_jobs=-1)
result_forest_dc2 = search_forest_dc2.fit(X_train_dc,y_train_dc)
print('Best Score: %s' % (result_forest_dc2.best_score_*-1))
print('Best Hyperparameters: %s' % result_forest_dc2.best_params_)

Best Score: 71.84172119174976
Best Hyperparameters: {'max_depth': 20, 'n_estimators': 130}


**London set**, all features

In [160]:
space_forest_l = {
    'n_estimators': [180,185,190],
    'max_depth': [19,20,21]}

In [161]:
search_forest_l1 = GridSearchCV(forest, space_forest_l, scoring='r2',n_jobs=-1)
result_forest_l1 = search_forest_l1.fit(X_train_l,y_train_l)
print('Best Score: %s' % result_forest_l1.best_score_)
print('Best Hyperparameters: %s' % result_forest_l1.best_params_)

Best Score: 0.9028084805222928
Best Hyperparameters: {'max_depth': 20, 'n_estimators': 185}


In [162]:
search_forest_l2 = GridSearchCV(forest, space_forest_l, scoring='neg_root_mean_squared_error',n_jobs=-1)
result_forest_l2 = search_forest_l2.fit(X_train_l,y_train_l)
print('Best Score: %s' % (result_forest_l2.best_score_*-1))
print('Best Hyperparameters: %s' % result_forest_l2.best_params_)

Best Score: 312.8454875514148
Best Hyperparameters: {'max_depth': 20, 'n_estimators': 185}


### XGBoost Regressor

**DC set**, all features

In [163]:
space_boost_dc1 = {
    'alpha': [3,3.5,4],
    'lambda': [90,100,110],
    'max_depth': [7,8,9],
    'learning_rate': [0.5,0.4,0.3],
    'n_estimators': [65,70,75]}

In [164]:
search_boost_dc1 = GridSearchCV(boost, space_boost_dc1, scoring='r2',n_jobs=-1)
result_boost_dc1 = search_boost_dc1.fit(X_train_dc,y_train_dc)
print('Best Score: %s' % result_boost_dc1.best_score_)
print('Best Hyperparameters: %s' % result_boost_dc1.best_params_)

Best Score: 0.7963137030601501
Best Hyperparameters: {'alpha': 3.5, 'lambda': 100, 'learning_rate': 0.4, 'max_depth': 8, 'n_estimators': 70}
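For a sense of what this search costs, here is a small sketch that counts the fits implied by the grid above. It assumes the default of 5 cross-validation folds, which is what `GridSearchCV` uses when `cv` is not passed (as in the cells here); the final refit on the full training set is not counted.

```python
from math import prod

# Same grid as space_boost_dc1 above
space_boost_dc1 = {
    'alpha': [3, 3.5, 4],
    'lambda': [90, 100, 110],
    'max_depth': [7, 8, 9],
    'learning_rate': [0.5, 0.4, 0.3],
    'n_estimators': [65, 70, 75]}

# Every combination of values is one candidate model
n_candidates = prod(len(values) for values in space_boost_dc1.values())

# Assumption: cv is left at GridSearchCV's default of 5 folds
cv_folds = 5
n_fits = n_candidates * cv_folds

print('Candidates:', n_candidates)
print('Model fits:', n_fits)
```

Adding a fourth value to any one parameter would raise the candidate count from 243 to 324, which is why the grids are kept to three values each.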


In [165]:
space_boost_dc2 = {
    'alpha': [0,0.1,0.01],
    'lambda': [90,100,110],
    'max_depth': [5,6,7],
    'learning_rate': [0.4,0.3,0.2],
    'n_estimators': [175,180,185]}

In [166]:
search_boost_dc2 = GridSearchCV(boost, space_boost_dc2, scoring='neg_root_mean_squared_error',n_jobs=-1)
result_boost_dc2 = search_boost_dc2.fit(X_train_dc,y_train_dc)
print('Best Score: %s' % (result_boost_dc2.best_score_*-1))
print('Best Hyperparameters: %s' % result_boost_dc2.best_params_)

Best Score: 56.490411500995606
Best Hyperparameters: {'alpha': 0.1, 'lambda': 100, 'learning_rate': 0.3, 'max_depth': 6, 'n_estimators': 180}


**London set**, all features

In [167]:
space_boost_l1 = {
    'alpha': [0.3,0.2,0.1],
    'lambda': [100,110,120],
    'max_depth': [5,6,7],
    'learning_rate': [0.3,0.2,0.1],
    'n_estimators': [120,130,140]}

In [168]:
search_boost_l1 = GridSearchCV(boost, space_boost_l1, scoring='r2',n_jobs=-1)
result_boost_l1 = search_boost_l1.fit(X_train_l,y_train_l)
print('Best Score: %s' % result_boost_l1.best_score_)
print('Best Hyperparameters: %s' % result_boost_l1.best_params_)

Best Score: 0.919543719291687
Best Hyperparameters: {'alpha': 0.2, 'lambda': 110, 'learning_rate': 0.2, 'max_depth': 6, 'n_estimators': 130}


In [169]:
space_boost_l2 = {
    'alpha': [1,0.75,0.5],
    'lambda': [100,125,150],
    'max_depth': [6,7,8],
    'learning_rate': [0.4,0.3,0.2],
    'n_estimators': [60,70,80]}

In [170]:
search_boost_l2 = GridSearchCV(boost, space_boost_l2, scoring='neg_root_mean_squared_error',n_jobs=-1)
result_boost_l2 = search_boost_l2.fit(X_train_l,y_train_l)
print('Best Score: %s' % (result_boost_l2.best_score_*-1))
print('Best Hyperparameters: %s' % result_boost_l2.best_params_)

Best Score: 293.8955040541115
Best Hyperparameters: {'alpha': 0.75, 'lambda': 125, 'learning_rate': 0.3, 'max_depth': 7, 'n_estimators': 70}


Now to compare the models, both to each other and to the untuned model. (I am copying in the results from the prior notebook rather than running them again.)

In [171]:
print('Random Forest: \nDC set: \nBase model r2: 0.8908068409097215 \nTuned model r2:',result_forest_dc1.best_score_)
print('\nBase model rmse: 72.87654665983804 \nTuned model rmse:',(result_forest_dc2.best_score_*-1))
print('\nLondon set: \nBase model r2: 0.9343686316506019 \nTuned model r2:',result_forest_l1.best_score_)
print('\nBase model rmse: 289.163573168845 \nTuned model rmse:',(result_forest_l2.best_score_*-1))

Random Forest: 
DC set: 
Base model r2: 0.8908068409097215 
Tuned model r2: 0.6895321510957303

Base model rmse: 72.87654665983804 
Tuned model rmse: 71.84172119174976

London set: 
Base model r2: 0.9343686316506019 
Tuned model r2: 0.9028084805222928

Base model rmse: 289.163573168845 
Tuned model rmse: 312.8454875514148


RandomForest seems to perform worse after hyperparameter tuning, aside from a tiny improvement in the error score for the DC data. 
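One caveat on this comparison: `best_score_` is a mean cross-validated score. If the base scores copied from the prior notebook were computed on a single split rather than with cross-validation, the two sets of numbers are not measured the same way. A minimal sketch of a like-for-like check follows, using synthetic data from `make_regression` and a deliberately tiny grid as stand-ins (both are assumptions for illustration, not the data or grids used here).

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, cross_val_score

# Synthetic stand-in for the real training data (assumption)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=42)

# Baseline: untuned model scored with the same 5-fold CV that GridSearchCV uses
base_scores = cross_val_score(
    RandomForestRegressor(n_estimators=20, random_state=42),
    X, y, scoring='neg_root_mean_squared_error', cv=5)
base_rmse = -base_scores.mean()

# Tuned: tiny illustrative grid, same scoring and same folds
search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    {'n_estimators': [10, 20], 'max_depth': [3, 5]},
    scoring='neg_root_mean_squared_error', cv=5)
search.fit(X, y)
tuned_rmse = -search.best_score_

print('Base CV rmse: ', base_rmse)
print('Tuned CV rmse:', tuned_rmse)
```

With an integer `cv` and a regressor, both calls split the data with the same unshuffled k-fold scheme, so the two numbers are computed on identical folds.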

In [172]:
print('XGBoost: \nDC set: \nBase model r2: 0.8994784355163574 \nTuned model r2:',result_boost_dc1.best_score_)
print('\nBase model rmse: 69.92294989890084 \nTuned model rmse:',(result_boost_dc2.best_score_*-1))
print('\nLondon set: \nBase model r2: 0.9133748412132263 \nTuned model r2:',result_boost_l1.best_score_)
print('\nBase model rmse: 332.20786503370334 \nTuned model rmse:',(result_boost_l2.best_score_*-1))

XGBoost: 
DC set: 
Base model r2: 0.8994784355163574 
Tuned model r2: 0.7963137030601501

Base model rmse: 69.92294989890084 
Tuned model rmse: 56.490411500995606

London set: 
Base model r2: 0.9133748412132263 
Tuned model r2: 0.919543719291687

Base model rmse: 332.20786503370334 
Tuned model rmse: 293.8955040541115


Here the R2 scores still do poorly, either getting worse or improving a very small amount, but there is a more meaningful improvement in the error score, which noticeably decreases. As a result I am going to use `neg_root_mean_squared_error` as my scoring method during grid search. <p>
Because `XGBRegressor` performs much better than `RandomForestRegressor`, it is the only model I am going to tune on the feature selection sets.
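As an aside, the separate R2 and RMSE searches above could in principle be run as a single search, since `GridSearchCV` accepts a dict of scorers and a `refit` key naming the one used to pick the best candidate. This is a rough sketch on synthetic data with a small random forest grid (both assumptions for illustration); note that it evaluates one shared grid, whereas the searches above used a different grid per metric.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the real training data (assumption)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=42)

search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    {'n_estimators': [10, 20], 'max_depth': [3, 5]},
    # Record both metrics for every candidate
    scoring={'r2': 'r2', 'rmse': 'neg_root_mean_squared_error'},
    # Choose the best candidate by (negated) RMSE
    refit='rmse')
search.fit(X, y)

# Both metrics for the candidate that won on RMSE
best = search.best_index_
best_rmse = -search.cv_results_['mean_test_rmse'][best]
best_r2 = search.cv_results_['mean_test_r2'][best]

print('Best Hyperparameters:', search.best_params_)
print('rmse:', best_rmse)
print('r2:  ', best_r2)
```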

In [173]:
print('XGBoost best parameters for all features: \nFor DC set:\n',result_boost_dc2.best_params_)
print('\nFor London set:\n',result_boost_l2.best_params_)

XGBoost best parameters for all features: 
For DC set:
 {'alpha': 0.1, 'lambda': 100, 'learning_rate': 0.3, 'max_depth': 6, 'n_estimators': 180}

For London set:
 {'alpha': 0.75, 'lambda': 125, 'learning_rate': 0.3, 'max_depth': 7, 'n_estimators': 70}


### Tuning on selected features

#### Forward/Backward Selection

In [174]:
X_train_dc_fw = pd.read_pickle('../pickles/split/feat-select/forward/X_train_dc_fw.pkl')
X_train_dc_bw = pd.read_pickle('../pickles/split/feat-select/backward/X_train_dc_bw.pkl')

X_train_l_fwbw = pd.read_pickle('../pickles/split/feat-select/fwbw/X_train_l_fwbw.pkl')

In [175]:
boost = XGBRegressor(objective='reg:squarederror',random_state=42)

**DC Set**<p>
Forward Select

In [176]:
space_dc_fw = {
    'alpha': [1,2,3],
    'lambda': [40,50,60],
    'max_depth': [5,6,7],
    'learning_rate': [0.3,0.2,0.1],
    'n_estimators': [360,370,380]}

In [177]:
search_dc_fw = GridSearchCV(boost, space_dc_fw, scoring='neg_root_mean_squared_error',n_jobs=-1)
result_dc_fw = search_dc_fw.fit(X_train_dc_fw,y_train_dc)
print('Best Score: %s' % (result_dc_fw.best_score_*-1))
print('Best Hyperparameters: %s' % result_dc_fw.best_params_)

Best Score: 56.837789313067205
Best Hyperparameters: {'alpha': 2, 'lambda': 50, 'learning_rate': 0.2, 'max_depth': 6, 'n_estimators': 380}
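One thing worth checking in a result like this is whether any best value sits on the edge of its grid, since that may mean the optimum lies outside the range searched. The helper below, `params_on_grid_edge`, is a hypothetical name introduced for this sketch; the grid and best parameters are copied from the cells above.

```python
def params_on_grid_edge(space, best_params):
    """Return names of parameters whose best value is the smallest or
    largest candidate in its grid."""
    on_edge = []
    for name, candidates in space.items():
        if best_params[name] in (min(candidates), max(candidates)):
            on_edge.append(name)
    return on_edge

# Grid and best parameters from the forward-select search above
space_dc_fw = {
    'alpha': [1, 2, 3],
    'lambda': [40, 50, 60],
    'max_depth': [5, 6, 7],
    'learning_rate': [0.3, 0.2, 0.1],
    'n_estimators': [360, 370, 380]}
best_dc_fw = {'alpha': 2, 'lambda': 50, 'learning_rate': 0.2,
              'max_depth': 6, 'n_estimators': 380}

edge_params = params_on_grid_edge(space_dc_fw, best_dc_fw)
print('On grid edge:', edge_params)
```

Here only `n_estimators` (380, the largest candidate) lands on an edge, so a follow-up grid extending past 380 might be worth trying.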


Backward Select

In [178]:
space_dc_bw = {
    'alpha': [0.6,0.7,0.8],
    'lambda': [70,80,90],
    'max_depth': [5,6,7],
    'learning_rate': [0.3,0.2,0.1],
    'n_estimators': [270,280,290]}

In [179]:
search_dc_bw = GridSearchCV(boost, space_dc_bw, scoring='neg_root_mean_squared_error',n_jobs=-1)
result_dc_bw = search_dc_bw.fit(X_train_dc_bw,y_train_dc)
print('Best Score: %s' % (result_dc_bw.best_score_*-1))
print('Best Hyperparameters: %s' % result_dc_bw.best_params_)

Best Score: 55.76142506506576
Best Hyperparameters: {'alpha': 0.7, 'lambda': 80, 'learning_rate': 0.2, 'max_depth': 6, 'n_estimators': 280}


**London Set** <br>
(This is forward *and* backward select.)

In [180]:
space_l_fwbw = {
    'alpha': [0.001,0.01,0.1],
    'lambda': [5,10,15],
    'max_depth': [5,6,7],
    'learning_rate': [0.2,0.1,0.01],
    'n_estimators': [140,150,160]}

In [181]:
search_l_fwbw = GridSearchCV(boost, space_l_fwbw, scoring='neg_root_mean_squared_error',n_jobs=-1)
result_l_fwbw = search_l_fwbw.fit(X_train_l_fwbw,y_train_l)
print('Best Score: %s' % (result_l_fwbw.best_score_*-1))
print('Best Hyperparameters: %s' % result_l_fwbw.best_params_)

Best Score: 287.379267057117
Best Hyperparameters: {'alpha': 0.01, 'lambda': 10, 'learning_rate': 0.1, 'max_depth': 7, 'n_estimators': 140}


#### Lasso

In [182]:
X_train_dc_lasso = pd.read_pickle('../pickles/split/feat-select/lasso/X_train_dc_lasso.pkl')
X_train_l_lasso = pd.read_pickle('../pickles/split/feat-select/lasso/X_train_l_lasso.pkl')

**DC set**

In [183]:
space_dc_lasso = {
    'alpha': [0.1,1,2],
    'lambda': [120,130,140],
    'max_depth': [5,6,7],
    'learning_rate': [0.3,0.2,0.1],
    'n_estimators': [180,190,200]}

In [184]:
search_dc_lasso = GridSearchCV(boost, space_dc_lasso, scoring='neg_root_mean_squared_error',n_jobs=-1)
result_dc_lasso = search_dc_lasso.fit(X_train_dc_lasso,y_train_dc)
print('Best Score: %s' % (result_dc_lasso.best_score_*-1))
print('Best Hyperparameters: %s' % result_dc_lasso.best_params_)

Best Score: 60.833681953480436
Best Hyperparameters: {'alpha': 1, 'lambda': 130, 'learning_rate': 0.2, 'max_depth': 6, 'n_estimators': 190}


**London Set**

In [185]:
space_l_lasso = {
    'alpha': [0.1,0.01,0.001],
    'lambda': [5,10,15],
    'max_depth': [5,6,7],
    'learning_rate': [0.2,0.1,0.01],
    'n_estimators': [150,160,170]}

In [186]:
search_l_lasso = GridSearchCV(boost, space_l_lasso, scoring='neg_root_mean_squared_error',n_jobs=-1)
result_l_lasso = search_l_lasso.fit(X_train_l_lasso,y_train_l)
print('Best Score: %s' % (result_l_lasso.best_score_*-1))
print('Best Hyperparameters: %s' % result_l_lasso.best_params_)

Best Score: 287.6278104393208
Best Hyperparameters: {'alpha': 0.01, 'lambda': 10, 'learning_rate': 0.1, 'max_depth': 6, 'n_estimators': 160}


Let's put all of the results next to each other for easy comparison.

In [187]:
print(f'XGBoost, rmse: \nDC set: \n  Base model: 69.92294989890084 \n  All features: {(result_boost_dc2.best_score_*-1)} \n  Fw features: {(result_dc_fw.best_score_*-1)}')
print(f'  Bw features: {(result_dc_bw.best_score_*-1)} \n  Lasso features: {(result_dc_lasso.best_score_*-1)} \n\nLondon set: \n  Base model: 332.20786503370334')
print(f'  All features: {(result_boost_l2.best_score_*-1)} \n  FwBw features: {(result_l_fwbw.best_score_*-1)} \n  Lasso features: {(result_l_lasso.best_score_*-1)}')

XGBoost, rmse: 
DC set: 
  Base model: 69.92294989890084 
  All features: 56.490411500995606 
  Fw features: 56.837789313067205
  Bw features: 55.76142506506576 
  Lasso features: 60.833681953480436 

London set: 
  Base model: 332.20786503370334
  All features: 293.8955040541115 
  FwBw features: 287.379267057117 
  Lasso features: 287.6278104393208


Both sets see a significant reduction in their error values; taking the best tuned result for each, DC drops by about 20% and London by about 13%. These are only cross-validated scores on the training set, however, so I still need to run the final models and see how they do, and see which variation of the features does best, if there is even a significant difference.
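The percentages come from comparing each base model's RMSE with the lowest tuned RMSE for that set (backward-select features for DC, forward/backward features for London). A quick sketch of the arithmetic, with the values copied from the output above:

```python
# RMSE values copied from the comparison printed above
base_rmse = {'DC': 69.92294989890084, 'London': 332.20786503370334}
best_tuned_rmse = {'DC': 55.76142506506576, 'London': 287.379267057117}

# Relative reduction in RMSE, as a percentage of the base model's error
reduction_pct = {
    name: 100 * (base_rmse[name] - best_tuned_rmse[name]) / base_rmse[name]
    for name in base_rmse}

for name, pct in reduction_pct.items():
    print(f'{name}: {pct:.1f}% lower rmse than the base model')
```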

I'm going to actually run and do the final evaluation of the models in another notebook for the sake of usability, so I need to print out the best parameters to use when instantiating the models.

In [188]:
print('dc allfeat params:',result_boost_dc2.best_params_)
print('lond allfeat params',result_boost_l2.best_params_)
print('\ndc fw params:', result_dc_fw.best_params_)
print('dc bw params:', result_dc_bw.best_params_)
print('lond fwbw params:', result_l_fwbw.best_params_)
print('\ndc lasso params:', result_dc_lasso.best_params_)
print('lond lasso params:', result_l_lasso.best_params_)

dc allfeat params: {'alpha': 0.1, 'lambda': 100, 'learning_rate': 0.3, 'max_depth': 6, 'n_estimators': 180}
lond allfeat params {'alpha': 0.75, 'lambda': 125, 'learning_rate': 0.3, 'max_depth': 7, 'n_estimators': 70}

dc fw params: {'alpha': 2, 'lambda': 50, 'learning_rate': 0.2, 'max_depth': 6, 'n_estimators': 380}
dc bw params: {'alpha': 0.7, 'lambda': 80, 'learning_rate': 0.2, 'max_depth': 6, 'n_estimators': 280}
lond fwbw params: {'alpha': 0.01, 'lambda': 10, 'learning_rate': 0.1, 'max_depth': 7, 'n_estimators': 140}

dc lasso params: {'alpha': 1, 'lambda': 130, 'learning_rate': 0.2, 'max_depth': 6, 'n_estimators': 190}
lond lasso params: {'alpha': 0.01, 'lambda': 10, 'learning_rate': 0.1, 'max_depth': 6, 'n_estimators': 160}
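An alternative to copying these dictionaries by hand would be to write them to a JSON file that the evaluation notebook reads back in. This is only a sketch: the file name and its location in the temp directory are hypothetical, and the dictionaries are typed out from the output above so the example stands on its own (in this notebook the `best_params_` attributes could be used directly).

```python
import json
import tempfile
from pathlib import Path

# Best parameters, copied from the output above
best_params = {
    'dc_allfeat': {'alpha': 0.1, 'lambda': 100, 'learning_rate': 0.3, 'max_depth': 6, 'n_estimators': 180},
    'lond_allfeat': {'alpha': 0.75, 'lambda': 125, 'learning_rate': 0.3, 'max_depth': 7, 'n_estimators': 70},
    'dc_fw': {'alpha': 2, 'lambda': 50, 'learning_rate': 0.2, 'max_depth': 6, 'n_estimators': 380},
    'dc_bw': {'alpha': 0.7, 'lambda': 80, 'learning_rate': 0.2, 'max_depth': 6, 'n_estimators': 280},
    'lond_fwbw': {'alpha': 0.01, 'lambda': 10, 'learning_rate': 0.1, 'max_depth': 7, 'n_estimators': 140},
    'dc_lasso': {'alpha': 1, 'lambda': 130, 'learning_rate': 0.2, 'max_depth': 6, 'n_estimators': 190},
    'lond_lasso': {'alpha': 0.01, 'lambda': 10, 'learning_rate': 0.1, 'max_depth': 6, 'n_estimators': 160}}

# Hypothetical output location; any path both notebooks can reach would do
params_path = Path(tempfile.gettempdir()) / 'xgb_best_params.json'
params_path.write_text(json.dumps(best_params, indent=2))

# In the evaluation notebook: load the file and unpack one entry into the model,
# e.g. XGBRegressor(objective='reg:squarederror', random_state=42, **loaded['dc_bw'])
loaded = json.loads(params_path.read_text())
print(loaded['dc_bw'])
```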
