In [None]:
# === Environment Setup ===
import os
import numpy as np
import matplotlib.pyplot as plt
from IPython.display import display, Markdown, Image
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# --- Configuration ---
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams.update({'font.size': 14, 'figure.figsize': (10, 6), 'figure.dpi': 150})
np.set_printoptions(suppress=True, linewidth=120, precision=4)

# --- Utility Functions ---
def note(msg): display(Markdown(f"<div class='alert alert-info'>📝 {msg}</div>"))
def sec(title): print(f'\n{80*"="}\n| {title.upper()} |\n{80*"="}')

note("Environment initialized for Gradient Boosting Machines.")

# Chapter 7.2: Gradient Boosting Machines

---

### Table of Contents

1.  [**Boosting Intuition: Learning from Errors**](#intro)
2.  [**The Gradient Boosting Algorithm**](#algorithm)
3.  [**XGBoost: The Workhorse of Tabular Data**](#xgboost)
4.  [**Code Lab: Predicting House Prices with XGBoost**](#code-lab)
5.  [**Summary**](#summary)

<a id='intro'></a>
## 1. Boosting Intuition: Learning from Errors

**Boosting** is an ensemble technique that builds models sequentially: each new model is trained to correct the errors of its predecessors. Whereas bagging mainly reduces variance by averaging independently trained models, boosting mainly reduces bias by repeatedly adding models that target what the current ensemble still gets wrong.

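The core idea is to combine many weak learners (e.g., shallow decision trees), each only modestly better than guessing, into one strong model. In AdaBoost, the classic formulation for classification, each learner is fit to a reweighted version of the data in which observations misclassified by earlier models receive more weight.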

![The Boosting Process](images/07-Machine-Learning/boosting_process.png)
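
The reweighting step can be made concrete with a single round of an AdaBoost-style update. This is a minimal sketch, not a full implementation: the five labels and the weak learner's predictions are made-up values chosen so that exactly one point is misclassified.

```python
import numpy as np

# Hypothetical labels in {-1, +1} and one weak learner's predictions.
y_true = np.array([1, 1, -1, -1, 1])
y_weak = np.array([1, -1, -1, -1, 1])      # the second point is misclassified

w = np.full(len(y_true), 1 / len(y_true))  # start from uniform weights
miss = y_weak != y_true
err = w[miss].sum()                        # weighted error rate of this learner
alpha = 0.5 * np.log((1 - err) / err)      # this learner's vote in the ensemble

# Up-weight the mistake, down-weight correct points, then renormalize.
w_new = w * np.exp(-alpha * y_true * y_weak)
w_new /= w_new.sum()

print("weighted error:", err)
print("new weights:", np.round(w_new, 3))
```

After the update, the single misclassified point carries half of the total weight (0.5) and the four correctly classified points carry 0.125 each, so the next weak learner is pushed to get that point right.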

<a id='algorithm'></a>
## 2. The Gradient Boosting Algorithm

**Gradient Boosting** frames boosting as gradient descent in function space. Each new weak learner is trained to fit the negative gradient of the loss function with respect to the current ensemble's predictions, and its output, usually scaled by a small learning rate, is added to the ensemble. For squared error loss, the negative gradient is simply the residual, so each new tree is fit to the *residuals* (the errors) of the current ensemble.

> **Historical Context: Gradient Boosting**
> Gradient boosting was introduced by Jerome Friedman in 1999. It generalizes AdaBoost by allowing any differentiable loss function, so the same procedure covers regression, classification, and ranking. It remains one of the most widely used machine learning algorithms today.

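The residual-fitting loop for squared error loss can be sketched in a few lines of NumPy. This is an illustrative toy, not how any library implements it: `fit_stump` is a hypothetical helper that fits a one-split regression tree, and the sine-wave data, learning rate, and number of rounds are arbitrary choices.

```python
import numpy as np

def fit_stump(x, r):
    """Fit a one-split regression stump to targets r on 1-D inputs x."""
    best = None
    for t in np.unique(x)[:-1]:            # candidate thresholds (right side never empty)
        left, right = r[x <= t], r[x > t]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, t, left_val, right_val = best
    return lambda z: np.where(z <= t, left_val, right_val)

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = np.sin(x) + rng.normal(0, 0.1, 200)

learning_rate, n_rounds = 0.1, 100
F = np.full_like(y, y.mean())              # initial model: a constant prediction
mse_history = [np.mean((y - F) ** 2)]

for _ in range(n_rounds):
    residuals = y - F                      # negative gradient of 0.5 * (y - F)^2
    stump = fit_stump(x, residuals)        # weak learner fit to the residuals
    F = F + learning_rate * stump(x)       # take a small step in function space
    mse_history.append(np.mean((y - F) ** 2))

print(f"training MSE: {mse_history[0]:.3f} -> {mse_history[-1]:.3f}")
```

Each round can only lower the training error here, because the stump predicts the mean residual in each region and the step size is below 1. Swapping in a different loss would change only the line that computes `residuals`.
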
<a id='xgboost'></a>
## 3. XGBoost: The Workhorse of Tabular Data

**XGBoost (eXtreme Gradient Boosting)** is a fast, widely used implementation of gradient boosting. Its key innovations include:
- **Regularization:** The objective adds L1 and L2 penalties on the leaf weights, plus a penalty on the number of leaves, to curb overfitting.
- **Sparsity Awareness:** At each split it learns a default direction for missing values, so data with missing entries can be used without imputation.
- **Parallelization:** Trees are still added one after another, but the search for the best split within each tree runs in parallel across threads.

XGBoost is often the go-to algorithm for competitions and real-world applications involving tabular data.

<a id='code-lab'></a>
## 4. Code Lab: Predicting House Prices with XGBoost

Let's use XGBoost to predict house prices on a small synthetic dataset with five features, of which only the first three actually influence the price.

In [None]:
sec("XGBoost for House Price Prediction")

# Generate synthetic data
np.random.seed(42)
X = np.random.rand(100, 5) * 10
y = 50 + (X[:, 0] * 1.5) + (X[:, 1] * 0.8) + (X[:, 2] * 2.1) + np.random.randn(100) * 5

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# We instantiate the XGBoost regressor, specifying the squared error loss function and the number of trees to build.
# The scikit-learn wrapper takes `random_state` (not `seed`) for reproducibility.
xg_reg = xgb.XGBRegressor(objective='reg:squarederror', n_estimators=100, random_state=42)
# We then fit the model to the training data.
xg_reg.fit(X_train, y_train)

# We can then use the trained model to make predictions on the test set.
y_pred = xg_reg.predict(X_test)
# We evaluate the model's performance using the root mean squared error.
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
note(f"RMSE: {rmse:.4f}")

# XGBoost provides a built-in function to visualize feature importance
# (by default, how often each feature is used in a split).
xgb.plot_importance(xg_reg)
plt.title('Feature Importance')
# Create the output directory if needed, then save and display the figure.
os.makedirs('images/07-Machine-Learning', exist_ok=True)
plt.savefig('images/07-Machine-Learning/feature_importance.png', bbox_inches='tight')
plt.close()
display(Image(filename='images/07-Machine-Learning/feature_importance.png'))
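
An RMSE is easier to interpret next to a couple of yardsticks. The sketch below regenerates the same synthetic data (assuming the generator in the code lab is unchanged) and computes two reference values on the full dataset: the error of always predicting the mean price, and the standard deviation of the injected noise, which no model can explain. The variable names are illustrative.

```python
import numpy as np

# Same generator as the code lab (assumed unchanged).
np.random.seed(42)
X = np.random.rand(100, 5) * 10
noise_std = 5.0
y = 50 + 1.5 * X[:, 0] + 0.8 * X[:, 1] + 2.1 * X[:, 2] + np.random.randn(100) * noise_std

# Upper yardstick: RMSE of always predicting the mean price.
baseline_rmse = np.sqrt(np.mean((y - y.mean()) ** 2))

print(f"Mean-predictor RMSE: {baseline_rmse:.2f}")
print(f"Noise floor (std of injected noise): {noise_std:.2f}")
```

A useful model should land well below the mean-predictor RMSE, while a test RMSE far below the noise floor would more likely signal a lucky split than real skill. With only 20 test points, the reported RMSE is itself a noisy estimate.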

<a id='summary'></a>
## 5. Summary

Gradient Boosting Machines, and XGBoost in particular, are powerful and widely used models for tabular data. Because each tree is fit to the errors of the ensemble built so far, they can reach high accuracy with simple base learners, and implementations like XGBoost add the efficiency and regularization needed for real-world applications.