In [1]:
data_file = '../combined_data.csv'  # CSV path; data_loader reads this variable to locate the file.
%run ../data_loader.ipynb 

Data loaded:
- x_train_sc: Scaled training features.
- x_test_sc: Scaled testing features.
- x_train - Training features.
- x_test - Testing features.
- y_train - Training labels.
- y_test - Testing labels.


In [8]:
%run ../FeatureEngineering/PCA.ipynb

PCA applied to the training and testing features:
- x_train_pca_Trans_sc: Scaled training features.
- x_test_pca_Trans_sc: Scaled testing features.
- x_train_pca_Trans - Training features.
- x_test_pca_Trans - Testing features.


Run BayesSearchCV for a list of models
---

### Expects `models`: a list of `[estimator, search_space]` pairs


In [35]:
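from sklearn.ensemble import RandomForestRegressor
from skopt.space import Integer, Real

# Random forest regressor and the hyperparameter ranges to search over
rd_model = RandomForestRegressor(random_state=42)
rd_search_space = {
    'n_estimators': Integer(10, 40),
    'max_depth': Integer(3, 10),
    'min_samples_split': Real(0.01, 0.1)  # float => fraction of training samples
}
# Each entry is [estimator, search_space]
models = [[rd_model, rd_search_space]]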

In [29]:
import time

import numpy as np
from skopt import BayesSearchCV

# Workaround for "AttributeError: module 'numpy' has no attribute 'int'":
# np.int was removed in NumPy 1.24, but older scikit-optimize releases still use it.
np.int = int

In [32]:
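results = []  # one entry per model, so earlier searches are not overwritten

for estimator, search_space in models:
    start_time = time.time()

    # Initialize Bayesian optimization: 8 sampled parameter settings, 3-fold CV, MSE-based scoring
    opt = BayesSearchCV(estimator, search_space, n_iter=8, random_state=42, cv=3,
                        scoring='neg_mean_squared_error')
    # Perform the optimization on the unscaled features (tree ensembles do not need scaling)
    opt.fit(x_train, y_train)

    # Record how long this model's search took
    duration = time.time() - start_time
    results.append((estimator, opt, duration))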

In [33]:
print(f"Bayesian Optimization took {duration:.2f} sec.")
print("Best parameters found: ", opt.best_params_)
# best_score_ is the mean CV score under 'neg_mean_squared_error', so negate it to report MSE
print("Best score found: ", -opt.best_score_)
print("Best estimator found: ", opt.best_estimator_)

Bayesian Optimization took 69.04
Best parameters found:  OrderedDict([('max_depth', 9), ('min_samples_split', 0.025468440525690465), ('n_estimators', 21)])
Best score found:  0.4779925255478792
Best estimator found:  RandomForestRegressor(max_depth=9, min_samples_split=0.025468440525690465,
                      n_estimators=21, random_state=42)
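Because `scoring='neg_mean_squared_error'` follows scikit-learn's "greater is better" convention, `opt.best_score_` is the negated mean cross-validated MSE, which is why the cell above flips its sign. The sketch below uses plain Python with hypothetical `mse` and `neg_mse` helpers (not the scikit-learn scorer itself) to show that maximizing the negated score selects the same candidate as minimizing MSE, and converts the reported score into the square root of the mean CV MSE, which is in the units of the target:

```python
import math

def mse(y_true, y_pred):
    """Mean squared error of two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def neg_mse(y_true, y_pred):
    """Hypothetical scorer in scikit-learn's 'greater is better' convention."""
    return -mse(y_true, y_pred)

y_true = [1.0, 2.0, 3.0]
candidates = {
    "a": [1.1, 2.1, 2.9],  # small errors
    "b": [0.5, 2.5, 3.5],  # larger errors
}

# Maximizing the negated score picks the candidate with the lowest MSE.
best_by_score = max(candidates, key=lambda k: neg_mse(y_true, candidates[k]))
best_by_mse = min(candidates, key=lambda k: mse(y_true, candidates[k]))

# Convert the best_score_ reported above back to MSE, then take its square root.
best_score = -0.4779925255478792
cv_mse = -best_score
cv_rmse = math.sqrt(cv_mse)
print(best_by_score, best_by_mse, round(cv_rmse, 4))  # prints: a a 0.6914
```

Note that this square root is taken of the fold-averaged MSE, so it is not exactly the mean of per-fold RMSE values.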


In [34]:
"""Future work:
   - Accept the training set and the models list as variables so this notebook can be called from other notebooks.

* How to fix the 'numpy.int' attribute error when using skopt.BayesSearchCV in scikit-learn?

https://stackoverflow.com/questions/76321820/how-to-fix-the-numpy-int-attribute-error-when-using-skopt-bayessearchcv-in-sci

"""

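The first future-work item could be approached by wrapping the search loop in a function that receives the data and the `models` list as arguments and returns one result per model. A minimal sketch, assuming a hypothetical `run_searches` helper and a `make_search` factory argument, so the function does not import scikit-optimize itself and can be exercised with a stub:

```python
import time
from typing import Any, Callable, Dict, List, Sequence

def run_searches(
    models: Sequence[Sequence[Any]],
    X: Any,
    y: Any,
    make_search: Callable[[Any, Any], Any],
) -> List[Dict[str, Any]]:
    """Fit one hyperparameter search per [estimator, search_space] pair.

    Assumes each search object exposes scikit-learn's fit / best_* interface
    and was created with scoring='neg_mean_squared_error'. Results are sorted
    by cross-validated MSE, best first.
    """
    results = []
    for estimator, search_space in models:
        start = time.perf_counter()
        search = make_search(estimator, search_space)
        search.fit(X, y)
        results.append({
            "estimator": search.best_estimator_,
            "params": dict(search.best_params_),
            "cv_mse": -search.best_score_,  # undo the neg_ scoring sign
            "seconds": time.perf_counter() - start,
        })
    return sorted(results, key=lambda r: r["cv_mse"])
```

In this notebook it might be called as `run_searches(models, x_train, y_train, lambda est, space: BayesSearchCV(est, space, n_iter=8, cv=3, random_state=42, scoring='neg_mean_squared_error'))`. Passing the factory in keeps the search settings with the caller, so another notebook could reuse the function with different `n_iter` or `cv` values.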