<h1> Fine Tuning </h1>

***Please switch the kernel to possum_regression.***

In [485]:
%run ml_functions.ipynb

In [44]:
X, y = train_set_new_ready, train_set_labels
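
The helper `regression_randomized_tuning` is loaded from ml_functions.ipynb and is not shown in this notebook. Judging by the printed tuples, it appears to return the train RMSE, the cross-validated RMSE and the best parameters, in that order. The sketch below is only an assumption about how such a helper could be built on scikit-learn's `RandomizedSearchCV`; the name `regression_randomized_tuning_sketch`, the 5-fold default and the scoring choice are hypothetical and not the actual implementation.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import RandomizedSearchCV


def regression_randomized_tuning_sketch(X, y, distributions, model,
                                        n_iter=10, r_state=42, cv=5):
    """Hypothetical stand-in for the helper defined in ml_functions.ipynb."""
    search = RandomizedSearchCV(
        model,
        param_distributions=distributions,
        n_iter=n_iter,
        scoring="neg_root_mean_squared_error",  # assumed metric
        cv=cv,                                  # assumed number of folds
        random_state=r_state,
    )
    search.fit(X, y)

    # best_score_ is the mean negated RMSE over the folds, so flip the sign.
    cv_rmse = float(-search.best_score_)

    # With refit=True (the default) the best estimator is refit on all of X, y.
    train_pred = search.best_estimator_.predict(X)
    train_rmse = float(np.sqrt(mean_squared_error(y, train_pred)))

    return train_rmse, cv_rmse, search.best_params_
```

A large gap between the first two returned values is what the summaries in this notebook refer to as overfitting.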

<h1> First Approach </h1>

<h2> XGBRegressor (GradientBoostingRegressor) </h2>

As a side note, XGBoost performs a form of feature selection for us, since its trees only split on features that improve the objective.

<h4> Fix learning rate and number of estimators </h4>

We will start tuning from a set of parameters with commonly used values. Additionally, let's fix learning_rate at 0.1 and find a good value for n_estimators.

In [134]:
distributions = { 'learning_rate':[0.1], # fixed; 0.1 is a good starting value, generally use something in 0.05-0.3
                  'n_estimators': randint(20,80), # 40-70
                  'max_depth': [5], # 5-8
                  'min_child_weight': [1], # the dataset is very small, so start with 1
                  'gamma': [0], # 0 is a good starting value, 0.1-0.2 is also common
                  'subsample': [0.8], # starting value should be in 0.5-0.9
                  'colsample_bytree': [0.8], # starting value should be in 0.5-0.9
                }

print(regression_randomized_tuning(X, y, distributions, XGBRegressor(), n_iter=61, r_state=42))

(0.2304519789548993, 1.725707102139057, {'colsample_bytree': 0.8, 'gamma': 0, 'learning_rate': 0.1, 'max_depth': 5, 'min_child_weight': 1, 'n_estimators': 43, 'subsample': 0.8})



Let's set n_estimators to 43.

<h4> Tune max_depth and min_child_weight </h4>

In [135]:
distributions = { 'learning_rate':[0.1], # fixed
                  'n_estimators': [43], # tuned
                  'max_depth': [None, 1, 2, 3, 4, 5], # tuning
                  'min_child_weight': [None, 1, 2, 3 , 4, 5, 6], # tuning
                  'gamma': [0], # good value to start is 0, also in 0.1 to 0.2
                  'subsample': [0.8], # value to start should be in 0.5-0.9
                  'colsample_bytree': [0.8], # value to start should be in 0.5-0.9
                }

print(regression_randomized_tuning(X, y, distributions, XGBRegressor(), n_iter=42, r_state=42))

(1.2485443889419527, 1.668903323744559, {'subsample': 0.8, 'n_estimators': 43, 'min_child_weight': 4, 'max_depth': 1, 'learning_rate': 0.1, 'gamma': 0, 'colsample_bytree': 0.8})



As we can see, the best pair found is max_depth = 1 and min_child_weight = 4.
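
Note that the search in this step is effectively exhaustive. When every parameter is given as a list, scikit-learn samples candidates without replacement, and the two tuned lists span 6 * 7 = 42 combinations, which equals n_iter. The sketch below illustrates this with `ParameterGrid` and `ParameterSampler`; it assumes the helper passes the dictionary to `RandomizedSearchCV` unchanged, which this notebook does not show.

```python
from sklearn.model_selection import ParameterGrid, ParameterSampler

# The two tuned lists from the cell above; all other parameters are single-valued.
grid = {
    "max_depth": [None, 1, 2, 3, 4, 5],
    "min_child_weight": [None, 1, 2, 3, 4, 5, 6],
}

n_combinations = len(ParameterGrid(grid))
print(n_combinations)  # 6 * 7 = 42

# ParameterSampler is what RandomizedSearchCV uses to draw its candidates.
candidates = list(ParameterSampler(grid, n_iter=42, random_state=42))
unique = {tuple(sorted(c.items())) for c in candidates}
print(len(unique))  # every combination is drawn exactly once
```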

<h4> Tune gamma </h4>

In [136]:
distributions = { 'learning_rate':[0.1], # fixed
                  'n_estimators': [43], # tuned
                  'max_depth': [1], # tuned
                  'min_child_weight': [4], # tuned
                  'gamma': uniform(loc=0.4, scale=0.3), # tuning
                  'subsample': [0.8], # value to start should be in 0.5-0.9
                  'colsample_bytree': [0.8], # value to start should be in 0.5-0.9
                }

print(regression_randomized_tuning(X, y, distributions, XGBRegressor(), n_iter=48, r_state=42))

(1.2485443889419527, 1.7056237368481366, {'colsample_bytree': 0.8, 'gamma': 0.5123620356542088, 'learning_rate': 0.1, 'max_depth': 1, 'min_child_weight': 4, 'n_estimators': 43, 'subsample': 0.8})
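
The SciPy distributions used in these cells are parameterized in a way that is easy to misread: `uniform(loc, scale)` covers the interval from loc to loc + scale (not from loc to scale), and `randint(low, high)` excludes the upper bound. A small check, independent of the notebook's data:

```python
from scipy.stats import randint, uniform

gamma_dist = uniform(loc=0.4, scale=0.3)   # continuous on [0.4, 0.7]
n_estimators_dist = randint(20, 80)        # integers 20..79, 80 is excluded

gamma_samples = gamma_dist.rvs(size=1000, random_state=42)
tree_samples = n_estimators_dist.rvs(size=1000, random_state=42)

print(gamma_samples.min(), gamma_samples.max())
print(tree_samples.min(), tree_samples.max())
```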



Let's update gamma and tune n_estimators again.

In [137]:
distributions = { 'learning_rate':[0.1], # fixed
                  'n_estimators': randint(20,80), # tuning
                  'max_depth': [1], # tuned
                  'min_child_weight': [4], # tuned
                  'gamma': [0.51], # tuned
                  'subsample': [0.8], # value to start should be in 0.5-0.9
                  'colsample_bytree': [0.8], # value to start should be in 0.5-0.9
                }

print(regression_randomized_tuning(X, y, distributions, XGBRegressor(), n_iter=48, r_state=42))

(1.2099438214624203, 1.7044300475072558, {'colsample_bytree': 0.8, 'gamma': 0.51, 'learning_rate': 0.1, 'max_depth': 1, 'min_child_weight': 4, 'n_estimators': 49, 'subsample': 0.8})



Now, we set n_estimators to 49.

<h4> Tune subsample and colsample_bytree </h4>

In [139]:
distributions = { 'learning_rate':[0.1], # fixed
                  'n_estimators': [49], # tuned x2
                  'max_depth': [1], # tuned
                  'min_child_weight': [4], # tuned
                  'gamma': [0.51], # tuned
                  'subsample': uniform(loc=0.6, scale=0.3), # tuning
                  'colsample_bytree': uniform(loc=0.6, scale=0.3), # tuning
                }

print(regression_randomized_tuning(X, y, distributions, XGBRegressor(), n_iter=80, r_state=42))

(1.2189847522223576, 1.7152590035573587, {'colsample_bytree': 0.677633994480005, 'gamma': 0.51, 'learning_rate': 0.1, 'max_depth': 1, 'min_child_weight': 4, 'n_estimators': 49, 'subsample': 0.7987566853061946})



A good pair is subsample = 0.8 and colsample_bytree = 0.68.

<h4> Tuning Regularization Parameters </h4>

In [140]:
distributions = { 'learning_rate':[0.1], # fixed
                  'n_estimators': [49], # tuned x2
                  'max_depth': [1], # tuned
                  'min_child_weight': [4], # tuned
                  'gamma': [0.51], # tuned
                  'subsample': [0.8], # tuned
                  'colsample_bytree': [0.68], # tuned
                  'reg_alpha': [0.00001, 0.0001, 0.001, 0.01, 0.1, 0, 1, 10, 100, 1000], # tuning
                  'reg_lambda': [0.00001, 0.0001, 0.001, 0.01, 0.1, 0, 1, 10, 100, 1000], # tuning
                }

print(regression_randomized_tuning(X, y, distributions, XGBRegressor(), n_iter=81, r_state=42))

(1.3074638431001107, 1.7265690179540432, {'subsample': 0.8, 'reg_lambda': 10, 'reg_alpha': 0.001, 'n_estimators': 49, 'min_child_weight': 4, 'max_depth': 1, 'learning_rate': 0.1, 'gamma': 0.51, 'colsample_bytree': 0.68})



Let's narrow down the search.

In [141]:
distributions = { 'learning_rate':[0.1], # fixed
                  'n_estimators': [49], # tuned x2
                  'max_depth': [1], # tuned
                  'min_child_weight': [4], # tuned
                  'gamma': [0.51], # tuned
                  'subsample': [0.8], # tuned
                  'colsample_bytree': [0.68], # tuned
                  'reg_alpha': uniform(loc=0.0001, scale=0.01), # tuning x2
                  'reg_lambda': uniform(loc=1, scale=99), # tuning x2
                }

print(regression_randomized_tuning(X, y, distributions, XGBRegressor(), n_iter=160, r_state=42))

(1.2639711696868277, 1.7119251415345833, {'colsample_bytree': 0.68, 'gamma': 0.51, 'learning_rate': 0.1, 'max_depth': 1, 'min_child_weight': 4, 'n_estimators': 49, 'reg_alpha': 0.00588280140996174, 'reg_lambda': 4.5582851058774665, 'subsample': 0.8})


In [142]:
distributions = { 'learning_rate':[0.1], # fixed
                  'n_estimators': [49], # tuned x2
                  'max_depth': [1], # tuned
                  'min_child_weight': [4], # tuned
                  'gamma': [0.51], # tuned
                  'subsample': [0.8], # tuned
                  'colsample_bytree': [0.68], # tuned
                  'reg_alpha': uniform(loc=0.0001, scale=0.0009), # tuning x3
                  'reg_lambda': uniform(loc=1, scale=9), # tuning x3
                }

print(regression_randomized_tuning(X, y, distributions, XGBRegressor(), n_iter=160, r_state=42))

(1.2511079932398912, 1.703689847623607, {'colsample_bytree': 0.68, 'gamma': 0.51, 'learning_rate': 0.1, 'max_depth': 1, 'min_child_weight': 4, 'n_estimators': 49, 'reg_alpha': 0.0004887505167779041, 'reg_lambda': 3.6210622617823773, 'subsample': 0.8})


Good values are 0.0005 for reg_alpha and 3.62 for reg_lambda.

<h4> Reducing Learning Rate </h4>

I ran this stage as well, but it clearly doesn't help this time.
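
The cells for this stage are not included in the notebook. A common way to run it (an assumption here, not necessarily what was done) is to lower learning_rate and raise n_estimators so that their product stays roughly constant. The helper `scaled_n_estimators` below is hypothetical and only illustrates that rule of thumb with the values tuned above.

```python
base_learning_rate = 0.1   # value kept fixed throughout the tuning above
base_n_estimators = 49     # value tuned above


def scaled_n_estimators(learning_rate):
    """Keep learning_rate * n_estimators roughly constant (rule of thumb)."""
    return round(base_n_estimators * base_learning_rate / learning_rate)


# Candidate (learning_rate -> n_estimators) pairs to evaluate with the tuning helper.
candidates = {lr: scaled_n_estimators(lr) for lr in (0.1, 0.05, 0.02, 0.01)}
print(candidates)
```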

<h4> Final model </h4>

In [144]:
distributions = { 'learning_rate':[0.1], # fixed
                  'n_estimators': [49], # tuned x2
                  'max_depth': [1], # tuned
                  'min_child_weight': [4], # tuned
                  'gamma': [0.51], # tuned
                  'subsample': [0.8], # tuned
                  'colsample_bytree': [0.68], # tuned
                  'reg_alpha': [0.0005], # tuned
                  'reg_lambda': [3.62], # tuned
                }

print(regression_randomized_tuning(X, y, distributions, XGBRegressor(), n_iter=1, r_state=42))

(1.2510986572004386, 1.7172635259575793, {'subsample': 0.8, 'reg_lambda': 3.62, 'reg_alpha': 0.0005, 'n_estimators': 49, 'min_child_weight': 4, 'max_depth': 1, 'learning_rate': 0.1, 'gamma': 0.51, 'colsample_bytree': 0.68})


<h4> Summary </h4>

The best model obtained doesn't overfit as much as the one before tuning. Also, its cv_rmse is better, which matters more than the rmse on the train set.
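
To make the comparison concrete, the gap between cv_rmse and train rmse can be computed directly from the printed results. The sketch assumes the tuples are ordered as (train rmse, cv rmse, parameters) and takes the first cell of this section as the "before tuning" reference; `overfit_gap` is a hypothetical helper name.

```python
# (train_rmse, cv_rmse) pairs copied from the outputs above.
before_tuning = (0.2304519789548993, 1.725707102139057)   # first cell, starting values
after_tuning = (1.2510986572004386, 1.7172635259575793)   # final model


def overfit_gap(train_rmse, cv_rmse):
    """Difference between validation and training error; larger means more overfitting."""
    return cv_rmse - train_rmse


gap_before = overfit_gap(*before_tuning)
gap_after = overfit_gap(*after_tuning)
cv_change = after_tuning[1] - before_tuning[1]

print(f"gap before tuning: {gap_before:.3f}")
print(f"gap after tuning:  {gap_after:.3f}")
print(f"cv_rmse change:    {cv_change:+.4f}")
```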

<h2> ExtraTreesRegressor </h2>

<h4> Explore Number of Trees </h4>

https://machinelearningmastery.com/extra-trees-ensemble-with-python/

In [None]:
distributions = { 'n_estimators': randint(20,180) # tuning
                }

print(regression_randomized_tuning(X, y, distributions, ensemble.ExtraTreesRegressor(), n_iter=161, r_state=42))

<h4> Explore Number of Features </h4>

In [None]:
distributions = { 'n_estimators': [32], # tuned
                  'max_features': [None, 1, 2, 3, 4, 5, 6, 7, 8, 9] # tuning
                }

print(regression_randomized_tuning(X, y, distributions, ensemble.ExtraTreesRegressor(), n_iter=10, r_state=42))

<h4> Explore Minimum Samples per Split </h4>

In [None]:
distributions = { 'n_estimators': [32], # tuned
                  'max_features': [6], # tuned
                  'min_samples_split': randint(2,14)
                }

print(regression_randomized_tuning(X, y, distributions, ensemble.ExtraTreesRegressor(), n_iter=14, r_state=42))

<h4> Max depth tuning </h4>

In [None]:
distributions = { 'n_estimators': [32], # tuned
                  'max_features': [6], # tuned
                  'min_samples_split': [8], # tuned
                  'max_depth': [None, 9, 10, 11, 12, 13, 14, 15], # tuning 
                }

print(regression_randomized_tuning(X, y, distributions, ensemble.ExtraTreesRegressor(), n_iter=8, r_state=42))

<h4> Tuning other crucial hyperparameters </h4>

In [None]:
X, y = train_set_new_ready, train_set_labels

distributions = { 'n_estimators': [32], # tuned
                  'max_features': [6], # tuned
                  'min_samples_split': [8], # tuned
                  'max_depth': [9], # tuned 
                  'max_leaf_nodes': [None, 2, 3, 4, 5, 6, 7, 8], # tuning 
                  'max_samples': [None, 0.7, 0.8, 0.9, 1.0], # tuning; float 1.0 means all samples, int 1 would mean a single sample
                }

print(regression_randomized_tuning(X, y, distributions, ensemble.ExtraTreesRegressor(), n_iter=40, r_state=42))

At the end, let's re-tune n_estimators (mainly to prevent overfitting).

In [None]:
distributions = { 'n_estimators': randint(20, 180), # tuning x2
                  'max_features': [6], # tuned
                  'min_samples_split': [8], # tuned
                  'max_depth': [9], # tuned 
                  'max_leaf_nodes': [None], # tuned 
                  'max_samples': [None], # tuned 
                }

print(regression_randomized_tuning(X, y, distributions, ensemble.ExtraTreesRegressor(), n_iter=161, r_state=42))

<h4> Final Model </h4>

In [157]:
X, y = train_set_new_ready, train_set_labels

distributions = { 'n_estimators': [21], # tuned x2
                  'max_features': [6], # tuned
                  'min_samples_split': [8], # tuned
                  'max_depth': [9], # tuned 
                  'max_leaf_nodes': [None], # tuned 
                  'max_samples': [None], # tuned 
                }

print(regression_randomized_tuning(X, y, distributions, ensemble.ExtraTreesRegressor(), n_iter=1, r_state=42))

(0.9446131396652445, 1.6977702805890822, {'n_estimators': 21, 'min_samples_split': 8, 'max_samples': None, 'max_leaf_nodes': None, 'max_features': 6, 'max_depth': 9})


<h4> Summary </h4>

Unfortunately, the rmse on the train set is still much better than the cv rmse, so the model still overfits.

<h2> RandomForestRegressor </h2>

<h4> Tuning number of trees </h4>

In [None]:
distributions = { 'n_estimators': randint(20, 80), # tuning
                }

print(regression_randomized_tuning(X, y, distributions, ensemble.RandomForestRegressor(), n_iter=61, r_state=42))

<h4> Tuning number of features </h4>

In [None]:
distributions = { 'n_estimators': [47], # tuned
                  'max_features': randint(1,9), # tuning
                }

print(regression_randomized_tuning(X, y, distributions, ensemble.RandomForestRegressor(), n_iter=9, r_state=42))

<h4> Tuning max_depth </h4>

In [None]:
distributions = { 'n_estimators': [47], # tuned
                  'max_features': [4], # tuned
                  'max_depth': [None, 1, 2, 3, 4, 5] # tuning
                }

print(regression_randomized_tuning(X, y, distributions, ensemble.RandomForestRegressor(), n_iter=6, r_state=42))

<h4> Tuning other crucial hyperparameters </h4>

In [None]:
distributions = { 'n_estimators': [47], # tuned
                  'max_features': [4], # tuned
                  'max_depth': [5], # tuned
                  'max_leaf_nodes': [None, 2, 3, 4, 5, 6], # tuning 
                  'max_samples': [None, 0.7, 0.8, 0.9, 1.0], # tuning; float 1.0 means all samples, int 1 would mean a single sample
                }

print(regression_randomized_tuning(X, y, distributions, ensemble.RandomForestRegressor(), n_iter=30, r_state=42))

At the end, let's re-tune n_estimators (mainly to prevent overfitting).

In [None]:
distributions = { 'n_estimators': randint(20,80), # tuning x2
                  'max_features': [4], # tuned
                  'max_depth': [5], # tuned
                  'max_leaf_nodes': [4], # tuned
                  'max_samples': [0.9], # tuned
                }

print(regression_randomized_tuning(X, y, distributions, ensemble.RandomForestRegressor(), n_iter=30, r_state=42))

<h4> Final Model </h4>

In [153]:
distributions = { 'n_estimators': [43], # tuned
                  'max_features': [4], # tuned
                  'max_depth': [5], # tuned
                  'max_leaf_nodes': [4], # tuned
                  'max_samples': [0.9], # tuned
                }

print(regression_randomized_tuning(X, y, distributions, ensemble.RandomForestRegressor(), n_iter=1, r_state=42))

(1.3651434704757237, 1.7171097459889522, {'n_estimators': 43, 'max_samples': 0.9, 'max_leaf_nodes': 4, 'max_features': 4, 'max_depth': 5})


<h4> Summary </h4>

Again, we have largely prevented overfitting.

<h1> Second Approach </h1>

<h2> RandomForestRegressor </h2>

<h4> Tuning number of trees </h4>

In [393]:
distributions = { 'n_estimators': randint(20, 80), # tuning
                }

print(regression_randomized_tuning(X, y, distributions, ensemble.RandomForestRegressor(), n_iter=61, r_state=42))

(0.6568380504278021, 1.7412595581861383, {'n_estimators': 70})


<h4> Tuning number of features </h4>

In [394]:
distributions = { 'n_estimators': [70], # tuned
                  'max_features': randint(1,9), # tuning
                }

print(regression_randomized_tuning(X, y, distributions, ensemble.RandomForestRegressor(), n_iter=9, r_state=42))

(0.5974051030438702, 1.6053268279045483, {'max_features': 4, 'n_estimators': 70})


<h4> Tuning max_depth </h4>

In [395]:
distributions = { 'n_estimators': [70], # tuned
                  'max_features': [4], # tuned
                  'max_depth': [None, 1, 2, 3, 4, 5] # tuning
                }

print(regression_randomized_tuning(X, y, distributions, ensemble.RandomForestRegressor(), n_iter=6, r_state=42))

(1.3109711674817301, 1.6765264728004179, {'n_estimators': 70, 'max_features': 4, 'max_depth': 3})


<h4> Tuning other crucial hyperparameters </h4>

In [396]:
distributions = { 'n_estimators': [70], # tuned
                  'max_features': [4], # tuned
                  'max_depth': [3], # tuned
                  'max_leaf_nodes': [None, 2, 3, 4, 5, 6, 7, 8], # tuning 
                  'max_samples': [None, 0.7, 0.8, 0.9, 1.0], # tuning; float 1.0 means all samples, int 1 would mean a single sample
                }

print(regression_randomized_tuning(X, y, distributions, ensemble.RandomForestRegressor(), n_iter=40, r_state=42))

(1.3395238077937672, 1.645106034018814, {'n_estimators': 70, 'max_samples': 0.9, 'max_leaf_nodes': 7, 'max_features': 4, 'max_depth': 3})


At the end, let's re-tune n_estimators (mainly to prevent overfitting).

In [397]:
distributions = { 'n_estimators': randint(20,80), # tuning x2
                  'max_features': [4], # tuned
                  'max_depth': [3], # tuned
                  'max_leaf_nodes': [7], # tuned
                  'max_samples': [0.9], # tuned
                }

print(regression_randomized_tuning(X, y, distributions, ensemble.RandomForestRegressor(), n_iter=30, r_state=42))

(1.3434233885668212, 1.692232358850735, {'max_depth': 3, 'max_features': 4, 'max_leaf_nodes': 7, 'max_samples': 0.9, 'n_estimators': 30})


<h4> Final Model </h4>

In [398]:
distributions = { 'n_estimators': [30], # tuned x2
                  'max_features': [4], # tuned
                  'max_depth': [3], # tuned
                  'max_leaf_nodes': [7], # tuned
                  'max_samples': [0.9], # tuned
                }

print(regression_randomized_tuning(X, y, distributions, ensemble.RandomForestRegressor(), n_iter=1, r_state=42))

(1.3279880144665084, 1.6452845979119477, {'n_estimators': 30, 'max_samples': 0.9, 'max_leaf_nodes': 7, 'max_features': 4, 'max_depth': 3})


<h4> Summary </h4>

We have largely prevented overfitting, and the cv rmse has decreased as well.

<h2> GaussianProcessRegressor </h2>

In [416]:
%%capture --no-display

distributions = { 'kernel': [None, ConstantKernel(), DotProduct(), ExpSineSquared(), Matern(),
                             PairwiseKernel(), RationalQuadratic(), RBF(), WhiteKernel()], # tuning
                  'alpha': [1e-10, 1e-9, 1e-8, 1e-7, 1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1], # tuning; alpha must be a non-negative number, so None is left out
                  'n_restarts_optimizer': [0, 1, 2, 3, 4] # tuning
                }

# 9 kernels * 11 alphas * 5 restart settings = 495 combinations
results = regression_randomized_tuning(X, y, distributions, GaussianProcessRegressor(), n_iter=495, r_state=42)

In [417]:
results

(0.20133381531671257,
 1.6309649415131617,
 {'n_restarts_optimizer': 3,
  'kernel': RationalQuadratic(alpha=1, length_scale=1),
  'alpha': 0.1})

This model clearly overfits. Additionally, it does not offer a way to evaluate feature importances, so the main option here would be to understand the kernels better and customize them. That might be time-consuming and goes beyond the assumptions of this project, so I will tune other models instead: SVR and NuSVR.
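
For reference, customizing a kernel in scikit-learn mostly means combining the basic kernels with `+` and `*`. The sketch below is not part of this project's pipeline: it uses synthetic stand-in data and an illustrative kernel (a scaled RBF plus an explicit `WhiteKernel` noise term, which may help when a model fits the training data too closely). All names and values in it are assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel, WhiteKernel

# Synthetic stand-in data; the notebook's X, y are not used here.
rng = np.random.RandomState(42)
X_demo = rng.uniform(-3, 3, size=(40, 2))
y_demo = np.sin(X_demo[:, 0]) + 0.5 * X_demo[:, 1] + rng.normal(scale=0.2, size=40)

# Signal variance * smooth RBF component + explicit noise term.
kernel = ConstantKernel(1.0) * RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)

gpr = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=2, random_state=42)
gpr.fit(X_demo, y_demo)

# kernel_ holds the kernel with hyperparameters optimized during fit.
print(gpr.kernel_)

# Predictive mean and standard deviation for the first five points.
mean, std = gpr.predict(X_demo[:5], return_std=True)
```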

<h2> SVR </h2>

https://12ft.io/proxy?q=https%3A%2F%2Ftowardsdatascience.com%2Fhyperparameter-tuning-for-support-vector-machines-c-and-gamma-parameters-6a5097416167

<h3> Let's find a kernel </h3>

In [418]:
distributions = { 'kernel': ['rbf', 'linear', 'poly', 'sigmoid'], # tuning
                }

print(regression_randomized_tuning(X, y, distributions, SVR(), n_iter=4, r_state=42))

(1.3387143200154634, 1.6438048009634274, {'kernel': 'rbf'})


<h3> Tune C and gamma </h3>

In [419]:
distributions = { 'kernel': ['rbf'], # tuned
                  'gamma': ['scale', 'auto', 0.0001, 0.001, 0.01, 0.1, 1, 10], # tuning
                  'C': [0.1, 1, 10, 100], #tuning
                }

print(regression_randomized_tuning(X, y, distributions, SVR(), n_iter=32, r_state=42))

(1.3163027240685563, 1.7353874408950853, {'kernel': 'rbf', 'gamma': 0.1, 'C': 1})
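
For context on the gamma grid: according to the scikit-learn documentation, the string options are shortcuts for data-dependent values, with 'scale' meaning 1 / (n_features * X.var()) and 'auto' meaning 1 / n_features. The hypothetical helper `resolve_svr_gamma` below mirrors that rule for dense input so the string options can be compared with the numeric candidates; it runs on synthetic data, not on the notebook's X.

```python
import numpy as np


def resolve_svr_gamma(gamma, X):
    """Return the numeric gamma that 'scale' or 'auto' would correspond to (dense X)."""
    X = np.asarray(X, dtype=float)
    n_features = X.shape[1]
    if gamma == "scale":
        return 1.0 / (n_features * X.var())
    if gamma == "auto":
        return 1.0 / n_features
    return float(gamma)


# Synthetic stand-in with 4 features.
rng = np.random.RandomState(42)
X_demo = rng.normal(size=(100, 4))

for option in ("scale", "auto", 0.1):
    print(option, resolve_svr_gamma(option, X_demo))
```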


<h4> Final Model </h4>

In [421]:
distributions = { 'kernel': ['rbf'], # tuned
                  'gamma': [0.1], # tuned
                  'C': [1], #tuned
                }

print(regression_randomized_tuning(X, y, distributions, SVR(), n_iter=1, r_state=42))

(1.3163027240685563, 1.6415807654522472, {'kernel': 'rbf', 'gamma': 0.1, 'C': 1})


<h4> Summary </h4>

SVR looks promising.

<h2> NuSVR </h2>

<h3> Find a kernel and nu value. </h3>

In [422]:
distributions = { 'kernel': ['rbf', 'linear', 'poly', 'sigmoid'], # tuning
                  'nu': uniform(loc=0, scale=1), # tuning
                }

print(regression_randomized_tuning(X, y, distributions, NuSVR(), n_iter=200, r_state=42))

(1.3361812965298494, 1.7315117588221176, {'kernel': 'rbf', 'nu': 0.8389335020693633})


In [423]:
distributions = { 'kernel': ['rbf'], # tuned
                  'nu': uniform(loc=0.74, scale=0.1), # tuning x2
                }

print(regression_randomized_tuning(X, y, distributions, NuSVR(), n_iter=200, r_state=42))

(1.3353284651103843, 1.6386226718272303, {'kernel': 'rbf', 'nu': 0.7992414568862043})


<h3> Tune C and gamma </h3>

In [424]:
distributions = { 'kernel': ['rbf'], # tuned
                  'nu': [0.8], # tuned
                  'gamma': ['scale', 'auto', 0.0001, 0.001, 0.01, 0.1, 1, 10], # tuning
                  'C': [0.1, 1, 10, 100], # tuning
                }

print(regression_randomized_tuning(X, y, distributions, NuSVR(), n_iter=32, r_state=42))

(1.3140729740215606, 1.7362166917804525, {'nu': 0.8, 'kernel': 'rbf', 'gamma': 0.1, 'C': 1})


<h3> Final Model </h3>

In [428]:
distributions = { 'kernel': ['rbf'], # tuned
                  'nu': [0.8], # tuned
                  'gamma': [0.1], # tuned
                  'C': [1], #tuned
                }

print(regression_randomized_tuning(X, y, distributions, NuSVR(), n_iter=1, r_state=42))

(1.3140729740215606, 1.6362666242496062, {'nu': 0.8, 'kernel': 'rbf', 'gamma': 0.1, 'C': 1})


<h3> Summary </h3>

NuSVR also looks promising.