
**Chapter 7 – Boosting**

_This notebook covers material from Chapter 7 of the textbook on Gradient Boosting. For sample code, see page 205._

Train a Gradient Boosting regressor on the California housing dataset.

Let's load the dataset using Scikit-Learn's `fetch_california_housing()` function:

In [1]:
from sklearn.datasets import fetch_california_housing

housing = fetch_california_housing()
X = housing["data"]
y = housing["target"]

In [2]:
y

array([4.526, 3.585, 3.521, ..., 0.923, 0.847, 0.894])

Split it into a training set and a test set:

In [3]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [4]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

Q1. Use the `GradientBoostingRegressor` model in two settings: 1) learning rate 1.0 with 3 estimators, and 2) learning rate 0.1 with 200 estimators. What do you observe? Describe your observation in text format in the Canvas PCA.

In [19]:
from sklearn.ensemble import GradientBoostingRegressor
gbrt1 = GradientBoostingRegressor(n_estimators=3, learning_rate=1.0)
gbrt1.fit(X_train_scaled, y_train)
y_pred1 = gbrt1.predict(X_test_scaled)

gbrt2 = GradientBoostingRegressor(n_estimators=200, learning_rate=0.1)
gbrt2.fit(X_train_scaled, y_train)
y_pred2 = gbrt2.predict(X_test_scaled)

In [20]:
from sklearn.metrics import mean_squared_error

print(mean_squared_error(y_test, y_pred1))
print(mean_squared_error(y_test, y_pred2))

0.4473509371594874
0.2614592448892024
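
To make the two errors easier to interpret, it can help to convert them to RMSE. The sketch below assumes the two MSE values printed above and relies on the dataset's documented target unit (median house value in units of $100,000); the variable names are illustrative and do not appear elsewhere in the notebook.

```python
import math

# MSE values copied from the output above
mse_3_trees = 0.4473509371594874    # learning_rate=1.0, n_estimators=3
mse_200_trees = 0.2614592448892024  # learning_rate=0.1, n_estimators=200

# RMSE is in the same unit as the target
rmse_3_trees = math.sqrt(mse_3_trees)
rmse_200_trees = math.sqrt(mse_200_trees)

# Assumption: the target is expressed in units of $100,000
print(f"RMSE with 3 trees:   {rmse_3_trees:.3f} (about ${rmse_3_trees * 100_000:,.0f})")
print(f"RMSE with 200 trees: {rmse_200_trees:.3f} (about ${rmse_200_trees * 100_000:,.0f})")
```

Read this way, the slower learning rate with many more trees gives a noticeably smaller typical error on the test set.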


Q2. What is a good choice of values for these two hyperparameters?
Hint:

To find the optimal number of trees, you can use early stopping. A simple way to implement this is to use the `staged_predict()` method: it returns an iterator over the predictions made by the ensemble at each stage of training (with one tree, two trees, etc.). See the example code on page 206 of the book.

In [35]:
import numpy as np

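# Train a large ensemble once, then score every intermediate stage
gbrt = GradientBoostingRegressor(n_estimators=1000, learning_rate=0.1)
gbrt.fit(X_train_scaled, y_train)
errors = [mean_squared_error(y_test, y_pred)
          for y_pred in gbrt.staged_predict(X_test_scaled)]
# argmin is 0-based, so add 1 to get the number of trees
bst_n_estimators = np.argmin(errors) + 1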

bst_n_estimators

991

In [36]:
gbrt_best = GradientBoostingRegressor(n_estimators=bst_n_estimators, learning_rate=0.1)
gbrt_best.fit(X_train_scaled, y_train)
y_pred = gbrt_best.predict(X_test_scaled)
print(mean_squared_error(y_test, y_pred))

0.22319500733262296
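
The cells above choose the number of trees by scoring each stage against the test set, which tends to make the final test error look better than it would on unseen data. A minimal sketch of the same `staged_predict()` idea with a separate validation split follows. It uses synthetic data from `make_regression` as a stand-in so it runs offline, and the names `X_demo`, `gbrt_demo`, and `best_n` are illustrative assumptions, not part of the notebook.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Assumption: synthetic stand-in data instead of the housing dataset
X_demo, y_demo = make_regression(n_samples=600, n_features=8, noise=15.0,
                                 random_state=42)

# Hold out the test set first, then carve a validation set from the rest
X_rest, X_te, y_rest, y_te = train_test_split(X_demo, y_demo, test_size=0.2,
                                              random_state=42)
X_tr, X_val, y_tr, y_val = train_test_split(X_rest, y_rest, test_size=0.25,
                                            random_state=42)

gbrt_demo = GradientBoostingRegressor(n_estimators=200, learning_rate=0.1,
                                      random_state=42)
gbrt_demo.fit(X_tr, y_tr)

# Score every stage on the validation set only
val_errors = [mean_squared_error(y_val, y_pred)
              for y_pred in gbrt_demo.staged_predict(X_val)]
best_n = int(np.argmin(val_errors)) + 1

# Refit with the chosen size and touch the test set exactly once
gbrt_final = GradientBoostingRegressor(n_estimators=best_n, learning_rate=0.1,
                                       random_state=42)
gbrt_final.fit(X_tr, y_tr)
test_mse = mean_squared_error(y_te, gbrt_final.predict(X_te))
print(best_n, test_mse)
```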


In [30]:
# warm_start=True keeps the existing trees when fit() is called again,
# so each iteration only trains the newly added tree
gbrtw = GradientBoostingRegressor(warm_start=True)

min_val_error = float("inf")
error_going_up = 0
for n_estimators in range(1, 1000):
    gbrtw.n_estimators = n_estimators
    gbrtw.fit(X_train, y_train)
    y_predw = gbrtw.predict(X_test)
    val_error = mean_squared_error(y_test, y_predw)
    if val_error < min_val_error:
        min_val_error = val_error
        error_going_up = 0
    else:
        error_going_up += 1
        if error_going_up == 5:
            break  # early stopping: no improvement for 5 consecutive rounds

In [31]:
n_estimators

317

In [None]:

gbrtw2 = GradientBoostingRegressor(warm_start=True)

min_val_error = float("inf")
error_going_up = 0
for n_estimators in range(1, 120):
    gbrtw2.n_estimators = n_estimators
    gbrtw2.fit(X_train_scaled, y_train)
    # Predict on the scaled test set to match the scaled training features
    y_predw = gbrtw2.predict(X_test_scaled)
    val_error = mean_squared_error(y_test, y_predw)
    if val_error < min_val_error:
        min_val_error = val_error
        error_going_up = 0
    else:
        error_going_up += 1
        if error_going_up == 5:
            break  # early stopping: no improvement for 5 consecutive rounds

In [32]:
mean_squared_error(y_test, y_predw)

0.24746468738434815
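
As an alternative to the manual `warm_start` loop, `GradientBoostingRegressor` has built-in early stopping through its `n_iter_no_change`, `validation_fraction`, and `tol` parameters. The sketch below is a hedged illustration on synthetic data (an assumption, so that it runs offline); the names `X_syn`, `y_syn`, and `gbrt_es` are illustrative.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Assumption: synthetic stand-in data instead of the housing dataset
X_syn, y_syn = make_regression(n_samples=600, n_features=8, noise=15.0,
                               random_state=42)

gbrt_es = GradientBoostingRegressor(
    n_estimators=500,         # upper bound on the number of trees
    learning_rate=0.1,
    validation_fraction=0.2,  # share of the training data held out internally
    n_iter_no_change=5,       # stop after 5 rounds without improvement
    tol=1e-4,                 # minimum improvement that counts
    random_state=42,
)
gbrt_es.fit(X_syn, y_syn)

# n_estimators_ reports how many trees were actually fitted
print(gbrt_es.n_estimators_)
```

With this approach the validation data is taken from the training set, so the test set stays untouched until the final evaluation.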

In [39]:
from sklearn.model_selection import RandomizedSearchCV

param_grid = {
    'learning_rate': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
    'n_estimators': range(1, 200)
}


# Initialize the RandomizedSearchCV
random_search = RandomizedSearchCV(GradientBoostingRegressor(), param_distributions=param_grid, n_iter=10, verbose=2, cv=3, random_state=42)

# Fit the RandomizedSearchCV to the data
random_search.fit(X_train, y_train)

# Get the best parameters
best_params = random_search.best_params_

print("Best parameters: ", best_params)



Fitting 3 folds for each of 10 candidates, totalling 30 fits
[CV] END ................learning_rate=0.6, n_estimators=132; total time=   5.7s
[CV] END ................learning_rate=0.6, n_estimators=132; total time=   5.6s
[CV] END ................learning_rate=0.6, n_estimators=132; total time=   3.9s
[CV] END .................learning_rate=0.8, n_estimators=67; total time=   2.7s
[CV] END .................learning_rate=0.8, n_estimators=67; total time=   1.9s
[CV] END .................learning_rate=0.8, n_estimators=67; total time=   1.9s
[CV] END .................learning_rate=0.5, n_estimators=65; total time=   1.8s
[CV] END .................learning_rate=0.5, n_estimators=65; total time=   1.8s
[CV] END .................learning_rate=0.5, n_estimators=65; total time=   1.8s
[CV] END ................learning_rate=0.7, n_estimators=101; total time=   3.7s
[CV] END ................learning_rate=0.7, n_estimators=101; total time=   2.9s
[CV] END ................learning_rate=0.7, n_es

In [41]:
# Refit with fixed hyperparameters and score on the test set
gbrt_search = GradientBoostingRegressor(n_estimators=136, learning_rate=0.6)
gbrt_search.fit(X_train_scaled, y_train)
y_pred_best = gbrt_search.predict(X_test_scaled)
mean_squared_error(y_test, y_pred_best)

0.2496484627810789
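
Rather than retyping the winning values by hand, one option is to reuse the fitted search object directly: with the default `refit=True`, `RandomizedSearchCV` exposes the refitted model as `best_estimator_`. The sketch below assumes synthetic stand-in data and a deliberately small search space so it runs quickly; the names `X_syn`, `search`, and `best_model` are illustrative.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Assumption: synthetic stand-in data instead of the housing dataset
X_syn, y_syn = make_regression(n_samples=600, n_features=8, noise=15.0,
                               random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, test_size=0.2,
                                          random_state=42)

# Small illustrative search space
param_distributions = {
    "learning_rate": [0.05, 0.1, 0.3, 1.0],
    "n_estimators": [10, 20, 30, 40, 50],
}

search = RandomizedSearchCV(
    GradientBoostingRegressor(random_state=42),
    param_distributions=param_distributions,
    n_iter=4,
    cv=3,
    scoring="neg_mean_squared_error",  # rank candidates by MSE instead of R^2
    random_state=42,
)
search.fit(X_tr, y_tr)

# best_estimator_ is already refitted on the full training set
best_model = search.best_estimator_
test_mse = mean_squared_error(y_te, best_model.predict(X_te))
print(search.best_params_, test_mse)
```

This keeps the evaluated model in sync with whatever the search actually selected.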