# Gradient Boosting

Gradient Boosting is an ensemble method that builds many weak learners, typically shallow decision trees, **sequentially**: each new tree is fitted to correct the errors of the ensemble built so far.



### How it Works
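1.  **Sequential Learning:** Instead of training all trees independently (like Random Forest), Gradient Boosting trains them one after another and adds each tree's output, scaled by the learning rate, to the running prediction.
2.  **Error Correction:** Each new tree is fitted to the **"residuals"** (errors) of the current ensemble, not just of the previous tree. For squared-error loss the residual is simply `y - prediction`; for other losses it is the negative gradient of the loss.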
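
The two steps above can be sketched by hand for squared-error loss. This is a minimal illustration, not how any particular library implements boosting; the synthetic sine data, the tree depth, and names such as `mse_history` are assumptions made for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic 1-D regression problem (assumed for illustration only)
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
n_trees = 50

# Stage 0: start from a constant prediction (the mean minimizes squared error)
prediction = np.full_like(y, y.mean())
trees = []
mse_history = [np.mean((y - prediction) ** 2)]

for _ in range(n_trees):
    # Residuals of the current ensemble = negative gradient of squared error
    residuals = y - prediction
    tree = DecisionTreeRegressor(max_depth=2, random_state=0)
    tree.fit(X, residuals)
    # Add a shrunken version of the new tree to the running prediction
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)
    mse_history.append(np.mean((y - prediction) ** 2))

print(f"Training MSE: start={mse_history[0]:.4f}, end={mse_history[-1]:.4f}")
```

Each tree only ever sees the residuals, never the raw target, which is why the trees can stay shallow while the ensemble as a whole fits a curved function.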

### Key Libraries (The Big Three)
* **XGBoost:** Known for speed and strong predictive performance; a long-standing favorite in machine learning competitions.
* **LightGBM:** Developed by Microsoft; uses histogram-based, leaf-wise tree growth, which is typically fast and memory-efficient on large datasets.
* **CatBoost:** Developed by Yandex; handles categorical features natively, without manual one-hot or label encoding.

---

### Common Pitfalls and Solutions

#### 1. Overfitting
**Symptoms:**
* Training error keeps decreasing while validation error increases.
* Model becomes too complex.

**Solutions:**
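* Use a smaller `learning_rate` with more `n_estimators`.
* Constrain the trees: lower `max_depth`, or raise `min_samples_split`.
* Use `subsample < 1.0` (Stochastic Gradient Boosting), so each tree is fitted on a random fraction of the training rows.
* Implement **Early Stopping**: stop adding trees once the validation score stops improving.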
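
A minimal sketch of these remedies combined, assuming scikit-learn's `GradientBoostingRegressor` and a synthetic dataset from `make_regression`. The specific values (0.05, 0.8, a patience of 10 iterations) are illustrative, not recommendations.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Synthetic data (assumed for illustration only)
X, y = make_regression(n_samples=500, n_features=10, noise=20.0, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = GradientBoostingRegressor(
    n_estimators=500,         # upper bound; early stopping may use fewer
    learning_rate=0.05,       # small steps
    max_depth=2,              # shallow trees
    min_samples_split=10,     # do not split tiny nodes
    subsample=0.8,            # stochastic gradient boosting
    validation_fraction=0.2,  # internal hold-out used for early stopping
    n_iter_no_change=10,      # stop after 10 rounds without improvement
    random_state=0,
)
model.fit(X_train, y_train)

print("Trees actually fitted:", model.n_estimators_)
print("Train R2:", model.score(X_train, y_train))
print("Validation R2:", model.score(X_val, y_val))
```

`n_estimators_` reports how many trees were kept; if it equals `n_estimators`, early stopping never triggered and the upper bound may be too low.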

#### 2. Underfitting
**Symptoms:**
* Both training and validation errors are high.
* Model is too simple.

**Solutions:**
* Increase `max_depth`.
* Increase `n_estimators`.
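* Increase `learning_rate` (with caution: large steps can overshoot and overfit).
* Relax regularization, for example by lowering `min_samples_split` or raising `subsample` toward 1.0.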
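
To make the symptom concrete, the sketch below compares a deliberately weak configuration with a higher-capacity one. The Friedman #1 synthetic dataset and both parameter sets are assumptions chosen for illustration.

```python
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Nonlinear synthetic regression problem (assumed for illustration only)
X, y = make_friedman1(n_samples=800, noise=1.0, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)

configs = {
    "too simple": dict(n_estimators=10, max_depth=1, learning_rate=0.05),
    "more capacity": dict(n_estimators=300, max_depth=3, learning_rate=0.1),
}

scores = {}
for name, params in configs.items():
    model = GradientBoostingRegressor(random_state=0, **params)
    model.fit(X_train, y_train)
    # Store (train R2, validation R2) for each configuration
    scores[name] = (model.score(X_train, y_train), model.score(X_val, y_val))
    print(f"{name}: train R2={scores[name][0]:.3f}, val R2={scores[name][1]:.3f}")
```

When both scores are low and close together, as for the first configuration, the model is underfitting and more depth or more trees is the natural next step.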

#### 3. Computational Efficiency
**Optimizations:**
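* Use **histogram-based boosting** (like LightGBM or scikit-learn's `HistGradientBoostingRegressor`) for large datasets; binning feature values makes split finding much cheaper.
* Reduce `n_estimators` while increasing `learning_rate`; this trades some accuracy for shorter training time.
* Use `max_features` to limit the number of features considered at each split.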

---

### Best Practices
* **Start Simple:** Begin with default parameters and iterate.
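* **Feature Scaling:** Tree-based Gradient Boosting splits on thresholds, so it is insensitive to monotonic rescaling of features; normalization or standardization is generally unnecessary.
* **Handle Missing Values:** Several implementations (such as XGBoost, LightGBM, and scikit-learn's histogram-based estimators) handle `NaN`s natively; others may need imputation first.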
* **Monitor Learning:** Use validation curves to detect overfitting early.
* **Ensemble Diversity:** Consider blending with other models for maximum performance.
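
One way to monitor learning is to score the model after every boosting stage. The sketch below uses scikit-learn's `staged_predict` for this; the synthetic data and the name `best_n` are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data (assumed for illustration only)
X, y = make_regression(n_samples=600, n_features=10, noise=25.0, random_state=1)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=1
)

model = GradientBoostingRegressor(
    n_estimators=300, learning_rate=0.1, max_depth=3, random_state=1
)
model.fit(X_train, y_train)

# staged_predict yields the ensemble's prediction after 1, 2, ..., n trees
train_mse = [mean_squared_error(y_train, p) for p in model.staged_predict(X_train)]
val_mse = [mean_squared_error(y_val, p) for p in model.staged_predict(X_val)]

# Number of trees at which validation error was lowest
best_n = int(np.argmin(val_mse)) + 1
print(f"Lowest validation MSE at {best_n} trees: {val_mse[best_n - 1]:.2f}")
print(f"Validation MSE with all 300 trees: {val_mse[-1]:.2f}")
```

Plotting `train_mse` and `val_mse` against the stage index gives the validation curve; the point where `val_mse` flattens or turns upward while `train_mse` keeps falling marks the onset of overfitting.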

# Gradient Boosting: Classifier vs. Regressor

### 1. Gradient Boosting Classifier
**Use this when:** Your target variable is **Categorical** (classes/labels).

* **Goal:** Predict the probability of a sample belonging to a class.
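* **Loss Function:** Log Loss (also called deviance), or Exponential Loss (binary classification only in scikit-learn, where it recovers AdaBoost).
* **Output:** Probability scores, which are thresholded (0.5 by default for binary problems) to get a class label.
* **Python Class:** `sklearn.ensemble.GradientBoostingClassifier` or `xgboost.XGBClassifier`.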

**Examples:**
* Will a user click on this ad? (Yes/No)
* Is this transaction fraudulent? (Fraud/Not Fraud)
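
The probability-then-threshold behavior can be sketched as follows. The imbalanced synthetic dataset stands in for something like fraud detection, and the 0.3 cutoff is an arbitrary value chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Imbalanced binary problem: roughly 10% positives (assumed for illustration)
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.9, 0.1], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

clf = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, random_state=0)
clf.fit(X_train, y_train)

# Column 1 holds the predicted probability of the positive class
proba = clf.predict_proba(X_test)[:, 1]

# predict() applies the default 0.5 cutoff for binary problems
default_labels = clf.predict(X_test)

# A lower, hand-picked cutoff flags more samples as positive
custom_labels = (proba >= 0.3).astype(int)

print("Positives at default cutoff:", int(default_labels.sum()))
print("Positives at 0.3 cutoff:", int(custom_labels.sum()))
```

Lowering the cutoff usually trades precision for recall, which can be worthwhile when missing a positive case is costlier than a false alarm.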

### 2. Gradient Boosting Regressor
**Use this when:** Your target variable is **Continuous** (numerical values).

* **Goal:** Predict a specific quantity or amount.
* **Loss Function:** Squared Error (MSE), Absolute Error (MAE), or Huber Loss (for robustness to outliers).
* **Output:** A numerical value.
* **Python Class:** `sklearn.ensemble.GradientBoostingRegressor` or `XGBRegressor`.

**Examples:**
* Predicting the price of a used car.
* Predicting the duration of a taxi ride.

---

### Comparison Table

| Feature | Gradient Boosting **Classifier** | Gradient Boosting **Regressor** |
| :--- | :--- | :--- |
| **Target** | Categories (0 or 1, Red or Blue) | Numbers (10.5, 5000, -2.1) |
| **Objective** | Minimize log loss (penalizes confident wrong probabilities) | Minimize squared error (MSE) |
| **Prediction** | Class Label or Probability | Continuous Value |
| **Key Hyperparameter** | `loss='log_loss'` (default) | `loss='squared_error'` (default) |

### Code Example

```python
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor

# 1. Classification (Target: Yes/No)
# Useful for: Fraud detection, Churn prediction
clf = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1)
# clf.fit(X_train, y_train_classes)

# 2. Regression (Target: Numerical Price)
# Useful for: House pricing, Stock trends
reg = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1)
# reg.fit(X_train, y_train_numbers)

```

```python
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Load dataset
data = fetch_california_housing()
X = data.data
y = data.target

# Split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Model: many shallow trees with a small learning rate
model = GradientBoostingRegressor(
    n_estimators=300,
    learning_rate=0.05,
    max_depth=3,
    random_state=42
)

# Train
model.fit(X_train, y_train)

# Predict on the held-out test set
y_pred = model.predict(X_test)

# Evaluation
print("MSE:", mean_squared_error(y_test, y_pred))
print("R2 Score:", r2_score(y_test, y_pred))

# Impurity-based importances, paired with the dataset's feature names
for name, importance in zip(data.feature_names, model.feature_importances_):
    print(f"{name}: {importance:.3f}")
```
