# C6: BOOSTING

## Boosting

- **Definition:** Boosting is an ensemble learning technique that combines multiple weak learners to form a strong learner.  
- **Goal:** Improve accuracy by focusing more on the errors made by previous models.  
- **Key idea:** Each new model is trained to correct the mistakes of the prior ones.  
- **Applications:** Classification, regression, ranking, anomaly detection.  

### Common Algorithms

- AdaBoost (Adaptive Boosting)  
- Gradient Boosting  
- XGBoost  
- LightGBM  
- CatBoost  

### Working Steps

1. Train a weak learner.  
2. Evaluate its errors.  
3. Increase the weight/importance of misclassified points (AdaBoost) or compute the remaining errors, i.e. the residuals (gradient boosting).  
4. Train the next learner focusing on those hard examples or residuals.  
5. Repeat for multiple learners.  
6. Combine all learners into a weighted majority vote or weighted sum.  
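
The steps above can be sketched as a small from-scratch version of discrete AdaBoost. This is a minimal illustration rather than scikit-learn's implementation: the helper names `fit_adaboost` and `predict_adaboost` are hypothetical, the data is synthetic, and labels are assumed to be mapped to -1 and +1.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def fit_adaboost(X, y, n_rounds=20):
    """Discrete AdaBoost for labels in {-1, +1} (hypothetical helper)."""
    n = len(y)
    w = np.full(n, 1.0 / n)                       # start with uniform sample weights
    stumps, alphas = [], []
    for _ in range(n_rounds):                     # step 5: repeat
        stump = DecisionTreeClassifier(max_depth=1, random_state=0)
        stump.fit(X, y, sample_weight=w)          # steps 1 and 4: train on weighted data
        pred = stump.predict(X)
        err = np.sum(w[pred != y])                # step 2: weighted error
        err = np.clip(err, 1e-10, 1 - 1e-10)      # guard against log(0)
        alpha = 0.5 * np.log((1 - err) / err)     # this learner's vote weight
        w = w * np.exp(-alpha * y * pred)         # step 3: up-weight the mistakes
        w = w / w.sum()                           # keep weights a distribution
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, np.array(alphas)


def predict_adaboost(stumps, alphas, X):
    """Step 6: sign of the alpha-weighted sum of stump votes."""
    votes = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.where(votes >= 0, 1, -1)


X, y01 = make_classification(n_samples=500, n_features=10, random_state=0)
y = 2 * y01 - 1                                   # map {0, 1} to {-1, +1}

stumps, alphas = fit_adaboost(X, y)
ensemble_pred = predict_adaboost(stumps, alphas, X)
train_acc = np.mean(ensemble_pred == y)
first_stump_acc = np.mean(stumps[0].predict(X) == y)
print("First stump training accuracy:", first_stump_acc)
print("Ensemble training accuracy:", train_acc)
```

Comparing the two printed accuracies shows how much the weighted vote adds over a single stump on the training data; a held-out set would be needed to judge generalization.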

### Advantages

- Converts weak models into strong ones.  
- Often achieves state-of-the-art accuracy.  
- Captures both linear and complex non-linear relationships, especially with tree-based weak learners.  

### Disadvantages

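- Prone to overfitting if the number of learners, learning rate, and learner complexity are not tuned well.  
- Learners are trained sequentially, so training is harder to parallelize and often slower than bagging methods like Random Forest.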
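
One way to act on the overfitting risk noted above is to monitor validation accuracy as learners are added and keep only as many as help. The sketch below assumes scikit-learn's `GradientBoostingClassifier` and its `staged_predict` method, with a synthetic dataset and an arbitrary cap of 200 trees chosen purely for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic data (illustrative assumption)
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
                           n_redundant=5, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Fit more trees than we expect to need
model = GradientBoostingClassifier(
    n_estimators=200, learning_rate=0.1, max_depth=3, random_state=42
)
model.fit(X_train, y_train)

# staged_predict yields the ensemble's predictions after 1, 2, ..., 200 trees
val_acc = [accuracy_score(y_val, pred) for pred in model.staged_predict(X_val)]
best_n = int(np.argmax(val_acc)) + 1

print("Best number of trees on the validation set:", best_n)
print("Validation accuracy at that point:", max(val_acc))
print("Validation accuracy with all 200 trees:", val_acc[-1])
```

Because `best_n` is selected on the validation split, a separate test set would be needed for an unbiased estimate of the final accuracy.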


In [None]:
# AdaBoost example
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Create synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
                           n_redundant=5, random_state=42)

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Define base learner (weak learner)
base_learner = DecisionTreeClassifier(max_depth=1)  # Decision stump

# Define AdaBoost model
ada_model = AdaBoostClassifier(
    estimator=base_learner,  # named base_estimator before scikit-learn 1.2
    n_estimators=50,       # number of weak learners
    learning_rate=1.0,     # weight applied to each learner's contribution
    random_state=42
)

# Train model
ada_model.fit(X_train, y_train)

# Predictions
y_pred = ada_model.predict(X_test)

# Accuracy
print("AdaBoost Accuracy:", accuracy_score(y_test, y_pred))


AdaBoost Accuracy: 0.8366666666666667


In [None]:
# Gradient boost example
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Create synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
                           n_redundant=5, random_state=42)

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Define Gradient Boosting model
gb_model = GradientBoostingClassifier(
    n_estimators=100,     # number of trees
    learning_rate=0.1,    # shrinkage / step size
    max_depth=3,          # depth of each tree
    random_state=42
)

# Train model
gb_model.fit(X_train, y_train)

# Predictions
y_pred = gb_model.predict(X_test)

# Accuracy
print("Gradient Boosting Accuracy:", accuracy_score(y_test, y_pred))

Gradient Boosting Accuracy: 0.8866666666666667


|                    | AdaBoost                                                                                                 | Gradient Boost                                                                                                                          |
| ------------------ | -------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------- |
| **Idea**           | Focuses on misclassified samples                                                                         | Focuses on residual errors                                                                                                              |
| **How**            | Assigns higher weights to misclassified data points so the next weak learner pays more attention to them | Each new learner is trained to predict the residuals of the current ensemble (more generally, the negative gradient of the loss)        |
| **Weak learners**  | Often uses decision stumps                                                                               | Typically deeper decision trees                                                                                                         |
| **Loss functions** | Exponential loss                                                                                         | Very flexible, supports many differentiable loss functions                                                                              |
| **Strengths**      | - Very simple and easy to implement <br> - Works well for binary classification                          | - More flexible and powerful <br> - Works for classification, regression, ranking <br> - Usually achieves higher accuracy than AdaBoost |
| **Weaknesses**     | - Sensitive to outliers and noisy data <br> - Less flexible than GBM                                     | - More computationally expensive <br> - Needs careful tuning                                                                            |
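
To make the "residual errors" column concrete, here is a minimal sketch of gradient boosting for regression with squared-error loss, where the negative gradient is simply the residual. The toy sine data, the `predict_gb` helper, and the hyperparameter values are illustrative assumptions, not part of the examples above.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy 1-D regression problem (illustrative assumption)
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
n_rounds = 50

baseline = y.mean()                        # initial constant prediction
prediction = np.full_like(y, baseline)
trees = []
mse_history = []

for _ in range(n_rounds):
    residuals = y - prediction             # what the ensemble still gets wrong
    tree = DecisionTreeRegressor(max_depth=2, random_state=0)
    tree.fit(X, residuals)                 # new learner predicts the residuals
    prediction += learning_rate * tree.predict(X)   # shrunken additive update
    trees.append(tree)
    mse_history.append(np.mean((y - prediction) ** 2))


def predict_gb(X_new):
    """Baseline plus the shrunken sum of all tree outputs (hypothetical helper)."""
    return baseline + learning_rate * sum(t.predict(X_new) for t in trees)


print("Training MSE after 1 round:", mse_history[0])
print("Training MSE after 50 rounds:", mse_history[-1])
```

Unlike AdaBoost, no sample weights are updated here; each round only changes the regression target that the next tree is fitted to.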


In [None]:
# XGBoost: eXtreme Gradient Boosting
import xgboost as xgb
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Load dataset
X, y = load_diabetes(return_X_y=True)

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create XGBoost model
model = xgb.XGBRegressor(objective="reg:squarederror", n_estimators=100)

# Train the model
model.fit(X_train, y_train)

# Predict
y_pred = model.predict(X_test)

# Evaluate
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)

Mean Squared Error: 3351.001637862091


## Stacking

- **Definition:** Stacking combines several different models and uses a meta-model to make the final prediction from their outputs.  

### Working

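- **Base models:** Each is trained on the training data; its predictions become input features for the meta model.  
- **Meta model:**  
    - Learns how to best combine the outputs of the base models.  
    - Usually a simple model, such as linear or logistic regression.  
- **Process:**  
    1. Train base models on the training data.  
    2. Collect their out-of-fold predictions (via cross-validation), so no base model predicts on samples it was trained on.  
    3. Train the meta model on those out-of-fold predictions.  
    4. For new data, base models make predictions, and the meta model provides the final prediction.  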
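
The process can also be written out by hand to show where the meta model's training features come from. This is a rough sketch that assumes `cross_val_predict` with 5 folds for the out-of-fold predictions; the choice of base models and the variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

base_models = [
    DecisionTreeClassifier(max_depth=3, random_state=42),
    KNeighborsClassifier(),
]

# Steps 1-2: out-of-fold class probabilities, so each training row is
# predicted by a model that never saw it during fitting
oof_features = np.hstack([
    cross_val_predict(m, X_train, y_train, cv=5, method="predict_proba")
    for m in base_models
])

# Step 3: the meta model learns from the base models' outputs only
meta_model = LogisticRegression(max_iter=1000)
meta_model.fit(oof_features, y_train)

# Step 4: refit base models on all training data, then stack on new data
for m in base_models:
    m.fit(X_train, y_train)
test_features = np.hstack([m.predict_proba(X_test) for m in base_models])
y_pred = meta_model.predict(test_features)

stack_acc = np.mean(y_pred == y_test)
print("Manual stacking accuracy:", stack_acc)
```

With two base models and three classes, the meta model sees six features per sample (one probability per class per base model).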

### Advantages

- Captures different strengths of models.  
- Often performs better than individual models.  
- Reduces the risk of relying on a single model's biases.  

### Disadvantages

- Can overfit if not done carefully.  
- Requires out-of-fold predictions for training the meta model.  
- More computationally expensive.  


In [13]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# 1. Load dataset
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 2. Define base learners
base_learners = [
    ('decision_tree', DecisionTreeClassifier(max_depth=3, random_state=42)),
    ('svm', SVC(probability=True, kernel='linear', random_state=42))
]

# 3. Define meta-learner (final estimator)
meta_learner = LogisticRegression()

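# 4. Build Stacking Classifier (by default it uses 5-fold cross-validation
#    to produce the out-of-fold predictions the meta-learner is trained on)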
stacking_model = StackingClassifier(
    estimators=base_learners,
    final_estimator=meta_learner
)

# 5. Train
stacking_model.fit(X_train, y_train)

# 6. Predict & Evaluate
y_pred = stacking_model.predict(X_test)
print("Stacking Model Accuracy:", accuracy_score(y_test, y_pred))

Stacking Model Accuracy: 1.0


## Voting

- **Definition:** Voting trains several models independently and aggregates their predictions, by majority vote for classification or by averaging for regression.  

### Types of Voting

1. **Hard Voting**  
    - Each model votes for a class.  
    - The final prediction is the class with the majority of votes.  

2. **Soft Voting**  
    - Models provide class probabilities instead of just votes.  
    - The probabilities are averaged, and the class with the highest average probability is chosen.  
    - Often performs better than hard voting when the models output well-calibrated probabilities.  

3. **Voting Regressor**  
    - Instead of predicting a class, predictions are averaged across models.  
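
The difference between the voting types is easiest to see on a single sample. The probabilities and regression outputs below are made-up numbers for three hypothetical models, chosen so that hard and soft voting disagree.

```python
import numpy as np

# Hypothetical class probabilities from three models for ONE sample (2 classes)
probas = np.array([
    [0.55, 0.45],   # model A: weakly prefers class 0
    [0.60, 0.40],   # model B: weakly prefers class 0
    [0.05, 0.95],   # model C: strongly prefers class 1
])

# Hard voting: each model casts one vote for its most likely class
hard_votes = probas.argmax(axis=1)
hard_pred = np.bincount(hard_votes, minlength=2).argmax()

# Soft voting: average the probabilities, then pick the largest
avg_proba = probas.mean(axis=0)
soft_pred = avg_proba.argmax()

# Voting regressor: average the numeric predictions (made-up values)
regression_preds = np.array([2.0, 3.0, 4.0])
voting_regression = regression_preds.mean()

print("Hard voting picks class:", hard_pred)
print("Averaged probabilities:", avg_proba)
print("Soft voting picks class:", soft_pred)
print("Voting regressor output:", voting_regression)
```

Here two lukewarm votes outnumber one confident vote under hard voting, while soft voting lets the confident model tip the average toward class 1.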


In [14]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# 1. Load dataset
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 2. Define base learners
log_clf = LogisticRegression(max_iter=1000)
knn_clf = KNeighborsClassifier()
dt_clf = DecisionTreeClassifier(random_state=42)

# 3. Voting Classifier
voting_clf = VotingClassifier(
    estimators=[('lr', log_clf), ('knn', knn_clf), ('dt', dt_clf)],
    voting='hard'   # change to 'soft' for soft voting
)

# 4. Train
voting_clf.fit(X_train, y_train)

# 5. Predict & Evaluate
y_pred = voting_clf.predict(X_test)
print("Voting Classifier Accuracy:", accuracy_score(y_test, y_pred))


Voting Classifier Accuracy: 1.0
