# Gradient Boosting Regression: Advanced Theory & Interview Q&A

## Theory
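Gradient Boosting Regression is an ensemble technique that builds models sequentially: each new model is fit to the errors left by the current ensemble. It typically uses shallow decision trees as weak learners and minimizes a differentiable loss function by gradient descent in function space, where each tree approximates the negative gradient of the loss with respect to the current predictions. Key hyperparameters include the learning rate, the number of estimators, the subsampling fraction, and tree-level regularization such as maximum depth.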

| Aspect                | Details                                                                 |
|----------------------|-------------------------------------------------------------------------|
| Algorithm            | Sequential ensemble of weak learners (trees)                             |
| Loss Function        | Customizable (MSE, MAE, Huber, etc.)                                     |
| Optimization         | Gradient descent in function space (each tree fits the negative gradient) |
| Regularization       | Shrinkage (learning rate), subsampling, tree depth, min samples split    |
| Strengths            | Handles complex, non-linear data; flexible loss functions; robust to outliers when a robust loss (Huber, MAE) is used |
| Weaknesses           | Prone to overfitting, computationally intensive (trees are built sequentially), sensitive to hyperparameters |
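
The sequential fitting described above can be sketched from scratch for the squared-error case. This is a minimal illustration, not how scikit-learn implements it: the helper names `fit_gradient_boosting` and `predict_gradient_boosting` and the synthetic sine data are assumptions made for this example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_gradient_boosting(X, y, n_estimators=100, learning_rate=0.1, max_depth=3):
    """Fit a squared-error boosting ensemble; returns (initial prediction, trees)."""
    init = float(np.mean(y))          # constant model that minimizes squared error
    pred = np.full(len(y), init)
    trees = []
    for _ in range(n_estimators):
        residuals = y - pred          # negative gradient of 1/2 * (y - F)^2
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residuals)
        pred += learning_rate * tree.predict(X)   # shrunken update
        trees.append(tree)
    return init, trees


def predict_gradient_boosting(X, init, trees, learning_rate=0.1):
    """Sum the initial constant and the shrunken tree predictions."""
    pred = np.full(X.shape[0], init)
    for tree in trees:
        pred += learning_rate * tree.predict(X)
    return pred


# Hypothetical toy data: a noisy sine curve
rng = np.random.default_rng(0)
X_demo = rng.uniform(-3, 3, size=(200, 1))
y_demo = np.sin(X_demo[:, 0]) + rng.normal(scale=0.1, size=200)

init, trees = fit_gradient_boosting(X_demo, y_demo)
fitted = predict_gradient_boosting(X_demo, init, trees)

baseline_mse = np.mean((y_demo - init) ** 2)
boosted_mse = np.mean((y_demo - fitted) ** 2)
print("Training MSE, constant model:", baseline_mse)
print("Training MSE, boosted model: ", boosted_mse)
```

The same `learning_rate` must be passed to both helpers, since the trees were fit against shrunken updates.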

## Advanced Interview Q&A
**Q1: How does Gradient Boosting differ from AdaBoost?**
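A1: AdaBoost re-weights the training samples after each round so that the next learner focuses on the points the ensemble currently gets wrong. Gradient Boosting instead fits each new model to the negative gradient of a differentiable loss (the pseudo-residuals), which for squared error equals the ordinary residuals. This makes Gradient Boosting applicable to any differentiable loss.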
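
The pseudo-residual idea can be written out as a short derivation. The notation is an assumption introduced here, not taken from the text above: $F_{m-1}$ is the ensemble after $m-1$ trees, $h_m$ is the new tree, $\nu$ is the learning rate, and $L$ is the loss.

$$
r_{im} = -\left[\frac{\partial L\big(y_i, F(x_i)\big)}{\partial F(x_i)}\right]_{F = F_{m-1}}
$$

$$
F_m(x) = F_{m-1}(x) + \nu\, h_m(x), \qquad h_m \text{ fit to } \{(x_i, r_{im})\}_{i=1}^{n}
$$

For squared error, $L(y, F) = \tfrac{1}{2}(y - F)^2$, the pseudo-residual reduces to the ordinary residual:

$$
r_{im} = y_i - F_{m-1}(x_i)
$$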

**Q2: What is the role of the learning rate in Gradient Boosting?**
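A2: The learning rate (shrinkage) scales the contribution of each tree before it is added to the ensemble. Lower values usually generalize better but need more trees to reach the same training loss, so the learning rate and the number of estimators should be tuned together.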
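
A rough way to see this trade-off is to hold the number of trees fixed and vary only the learning rate. The sketch below uses synthetic data from `make_regression` and illustrative hyperparameter values; both are assumptions for this example, and the exact numbers will differ on real data.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic regression problem (assumed for illustration)
X_lr, y_lr = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_lr, y_lr, test_size=0.25, random_state=0)

lr_results = {}
for lr in (0.01, 0.1, 1.0):
    model = GradientBoostingRegressor(
        n_estimators=100, learning_rate=lr, max_depth=3, random_state=0
    )
    model.fit(X_tr, y_tr)
    train_mse = mean_squared_error(y_tr, model.predict(X_tr))
    test_mse = mean_squared_error(y_te, model.predict(X_te))
    lr_results[lr] = (train_mse, test_mse)
    print(f"learning_rate={lr}: train MSE={train_mse:.1f}, test MSE={test_mse:.1f}")
```

Comparing the train and test columns for each setting gives a feel for when a rate is too small for the tree budget or too large to generalize.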

**Q3: How can you prevent overfitting in Gradient Boosting?**
A3: Use early stopping, reduce tree depth, increase the minimum samples required to split a node, apply row subsampling, and lower the learning rate (compensating with more trees).
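
Several of these controls can be combined in a single scikit-learn estimator. The following is a sketch on synthetic data; the specific values (`n_iter_no_change=10`, `subsample=0.8`, and so on) are illustrative assumptions rather than recommended settings.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic data (assumed for illustration)
X_es, y_es = make_regression(n_samples=1000, n_features=10, noise=15.0, random_state=0)

gbr = GradientBoostingRegressor(
    n_estimators=1000,        # upper bound; early stopping may use fewer
    learning_rate=0.1,
    max_depth=3,              # shallow trees
    min_samples_split=10,     # require more samples before splitting a node
    subsample=0.8,            # row subsampling per tree
    validation_fraction=0.2,  # held-out share of the training data for early stopping
    n_iter_no_change=10,      # stop after 10 rounds without validation improvement
    tol=1e-4,
    random_state=0,
)
gbr.fit(X_es, y_es)

# n_estimators_ is the number of trees actually fitted
print("Trees fitted before stopping:", gbr.n_estimators_)
```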

**Q4: Explain the concept of stochastic gradient boosting.**
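A4: Stochastic gradient boosting fits each tree on a random subsample of the training rows, drawn without replacement (in scikit-learn, `subsample` < 1.0). The added randomness decorrelates the trees, which tends to reduce variance and overfitting, and it also speeds up training.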
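
In scikit-learn, setting `subsample` below 1.0 also exposes the `oob_improvement_` attribute, which records the loss improvement on the rows left out of each tree's subsample. The sketch below uses synthetic data and an assumed `subsample=0.5`; treating the peak of the cumulative out-of-bag improvement as a stopping point is a heuristic, not a guarantee.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic data (assumed for illustration)
X_sg, y_sg = make_regression(n_samples=800, n_features=10, noise=10.0, random_state=0)

sgb = GradientBoostingRegressor(
    n_estimators=100,
    learning_rate=0.1,
    subsample=0.5,     # each tree sees a random half of the rows
    random_state=0,
)
sgb.fit(X_sg, y_sg)

# Cumulative out-of-bag improvement across boosting iterations
cumulative_oob = np.cumsum(sgb.oob_improvement_)
oob_best_iter = int(np.argmax(cumulative_oob)) + 1
print("Iteration with the highest cumulative OOB improvement:", oob_best_iter)
```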

**Q5: How do you select the optimal number of estimators?**
A5: Use cross-validation and early stopping to find the point where additional trees no longer improve validation performance.
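
One way to locate that point with a single fit is `staged_predict`, which yields the ensemble's predictions after each added tree. This sketch assumes synthetic data and a single hold-out validation split; cross-validation would give a more stable estimate.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data (assumed for illustration)
X_ne, y_ne = make_regression(n_samples=800, n_features=10, noise=20.0, random_state=0)
X_fit, X_val, y_fit, y_val = train_test_split(X_ne, y_ne, test_size=0.25, random_state=0)

staged_model = GradientBoostingRegressor(
    n_estimators=300, learning_rate=0.1, max_depth=3, random_state=0
)
staged_model.fit(X_fit, y_fit)

# Validation MSE after 1, 2, ..., 300 trees
val_errors = [
    mean_squared_error(y_val, stage_pred)
    for stage_pred in staged_model.staged_predict(X_val)
]
best_n = int(np.argmin(val_errors)) + 1
print("Best number of estimators on the validation split:", best_n)
```

The model can then be refit with `n_estimators=best_n` on the full training data.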


# Gradient Boosting Regression — Theory & Interview Q&A

Gradient Boosting Regression is an ensemble learning method that builds models sequentially, each correcting the errors of the previous, using decision trees as weak learners to predict continuous outcomes.

| Aspect                | Details                                                                 |
|-----------------------|------------------------------------------------------------------------|
| **Definition**        | Sequentially builds models to minimize errors, using decision trees.     |
| **Equation**          | $F_m(x) = F_{m-1}(x) + \nu\, h_m(x)$, where tree $h_m$ is fit to the negative gradient of the loss and $\nu$ is the learning rate |
| **Use Cases**         | Price prediction, time series, environmental modeling                   |
| **Assumptions**       | Each weak learner only needs to improve slightly on the current ensemble's fit; the loss must be differentiable |
| **Pros**              | High accuracy, handles mixed data, flexible loss functions              |
| **Cons**              | Prone to overfitting, slower to train                                   |
| **Key Parameters**    | n_estimators, learning_rate, max_depth, subsample                      |
| **Evaluation Metrics**| MSE, RMSE, R² Score                                                     |

## Interview Q&A

**Q1: What is Gradient Boosting Regression?**  
A: An ensemble method that builds models sequentially, each correcting previous errors for regression tasks.

**Q2: How does Gradient Boosting Regression work?**  
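A: It starts from a constant prediction, then repeatedly fits a small tree to the residuals of the current ensemble (more generally, the negative gradient of the loss) and adds that tree to the ensemble, scaled by the learning rate.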

**Q3: What are the advantages of Gradient Boosting Regression?**  
A: High accuracy, flexible, handles mixed data types.

**Q4: What are the limitations?**  
A: Prone to overfitting, slower to train.

**Q5: How do you prevent overfitting in Gradient Boosting Regression?**  
A: Use early stopping, limit tree depth, use subsampling.

**Q6: How do you evaluate Gradient Boosting Regression?**  
A: Using MSE, RMSE, and R² score.
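
These three metrics are simple enough to compute by hand, which helps when checking library output. The sketch below uses a small made-up set of targets and predictions (an assumption for illustration) and compares the manual results with scikit-learn's.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical targets and predictions
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_hat = np.array([2.5, 0.0, 2.0, 8.0])

mse = np.mean((y_true - y_hat) ** 2)             # mean squared error
rmse = np.sqrt(mse)                              # same units as the target
ss_res = np.sum((y_true - y_hat) ** 2)           # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
r2 = 1.0 - ss_res / ss_tot                       # coefficient of determination

print("MSE: ", mse)
print("RMSE:", rmse)
print("R2:  ", r2)

# The manual values should agree with scikit-learn
print(np.isclose(mse, mean_squared_error(y_true, y_hat)))
print(np.isclose(r2, r2_score(y_true, y_hat)))
```

RMSE is often the easiest of the three to interpret because it is expressed in the same units as the target.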

In [None]:
# 1️⃣ Import Libraries
import numpy as np
import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.pipeline import Pipeline
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt
import seaborn as sns

# 2️⃣ Load Dataset
data = fetch_california_housing()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target)

# 3️⃣ Split Dataset
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 4️⃣ Create Pipeline
pipeline = Pipeline([
    ('scaler', StandardScaler()),  # Not required for tree-based models (splits are scale-invariant); kept to illustrate the pipeline
    ('gb', GradientBoostingRegressor(random_state=42))
])

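# 5️⃣ Hyperparameter Tuning
# Note: this grid has 2*3*3*3*3*3 = 486 combinations, i.e. 2,430 fits with cv=5.
# Shrink the grid for a quick run.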
param_grid = {
    'gb__n_estimators': [100, 200],
    'gb__learning_rate': [0.01, 0.1, 0.2],
    'gb__max_depth': [3, 4, 5],
    'gb__min_samples_split': [2, 5, 10],
    'gb__min_samples_leaf': [1, 2, 4],
    'gb__max_features': ['sqrt', 'log2', None]
}

grid_search = GridSearchCV(pipeline, param_grid, cv=5, scoring='r2', n_jobs=-1)  # n_jobs=-1 runs fits in parallel on all cores
grid_search.fit(X_train, y_train)

# 6️⃣ Evaluate Best Model
best_model = grid_search.best_estimator_
y_pred = best_model.predict(X_test)

print("Best Parameters:", grid_search.best_params_)
print("R2 Score:", r2_score(y_test, y_pred))
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)
print("Root Mean Squared Error:", np.sqrt(mse))

# 7️⃣ Feature Importance Visualization
importances = best_model.named_steps['gb'].feature_importances_
feat_imp_df = pd.DataFrame({'Feature': X.columns, 'Importance': importances})
feat_imp_df = feat_imp_df.sort_values(by='Importance', ascending=False)

plt.figure(figsize=(10,6))
sns.barplot(x='Importance', y='Feature', data=feat_imp_df, palette='coolwarm')
plt.title("Feature Importance in Gradient Boosting Regressor")
plt.show()

# 8️⃣ Predicted vs Actual Visualization
plt.figure(figsize=(8,6))
plt.scatter(y_test, y_pred, alpha=0.6, color='teal')
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--')
plt.xlabel("Actual Values")
plt.ylabel("Predicted Values")
plt.title("Gradient Boosting Regressor: Predicted vs Actual")
plt.show()
