## Bagging (Bootstrap Aggregating)

**Literature:**

* Bagging, introduced by Leo Breiman in 1996, aims to reduce variance in models, particularly unstable ones like decision trees.
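* It relies on the fact that averaging $B$ independent, unbiased predictors, each with variance $\sigma^2$, yields a predictor with variance $\sigma^2 / B$. Models trained on bootstrap samples of the same data are correlated, so the reduction in practice is smaller, but the average is typically still less variable than any single model.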
* "Bootstrap" refers to the sampling with replacement from the training data. "Aggregating" refers to combining the predictions of the individual models.
* Random Forest is a popular extension of bagging: it uses decision trees as base estimators and additionally considers only a random subset of features at each split, which decorrelates the trees.
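
The variance argument can be checked numerically. The sketch below is an idealized simulation, not bagging itself: it assumes each "model" returns the true value plus independent Gaussian noise, and the names `true_value`, `sigma`, and `variance_of_average` are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

true_value = 2.0      # quantity every model is trying to predict (illustrative)
sigma = 1.0           # standard deviation of a single model's error (illustrative)
n_trials = 20_000     # number of repeated experiments


def variance_of_average(B):
    """Empirical variance of the mean of B independent, unbiased predictions."""
    preds = true_value + sigma * rng.standard_normal((n_trials, B))
    return preds.mean(axis=1).var()


for B in (1, 10, 100):
    empirical = variance_of_average(B)
    print(f"B={B:3d}  empirical variance={empirical:.4f}  theory={sigma**2 / B:.4f}")
```

The empirical variance should track $\sigma^2 / B$ closely here because the simulated predictions are independent. Real bagged models share training data, so their errors are correlated and the gain is smaller.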

**Mathematical Formulas:**

* Let $D = \{(x_1, y_1), (x_2, y_2), ..., (x_N, y_N)\}$ be the training dataset.
* Generate $B$ bootstrap samples $D_1, D_2, ..., D_B$ by sampling with replacement from $D$.
* Train a base learner $h_b(x)$ on each bootstrap sample $D_b$.
* For classification, the final prediction $H(x)$ is obtained by majority voting:
    * $H(x) = \text{mode}\{h_1(x), h_2(x), ..., h_B(x)\}$
* For regression, the final prediction $H(x)$ is obtained by averaging:
    * $H(x) = \frac{1}{B} \sum_{b=1}^{B} h_b(x)$
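
The steps above can be sketched by hand for the classification case. This is a minimal illustration rather than a replacement for `BaggingClassifier`: the helper names `fit_bagging` and `predict_majority` are hypothetical, and the voting code assumes class labels are the integers $0, 1, \dots, K-1$ (true for Iris).

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def fit_bagging(X, y, B=25, seed=0):
    """Train B trees, each on a bootstrap sample D_b drawn with replacement from D."""
    rng = np.random.default_rng(seed)
    n = len(X)
    learners = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)  # N indices sampled with replacement
        learners.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))
    return learners


def predict_majority(learners, X):
    """H(x) = mode{h_1(x), ..., h_B(x)}; ties go to the smallest class label."""
    votes = np.stack([h.predict(X) for h in learners])  # shape (B, n_samples)
    n_classes = int(votes.max()) + 1
    # Count the votes per class for every sample: shape (n_classes, n_samples)
    counts = np.apply_along_axis(np.bincount, 0, votes, minlength=n_classes)
    return counts.argmax(axis=0)


X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

learners = fit_bagging(X_train, y_train)
y_pred = predict_majority(learners, X_test)
accuracy = (y_pred == y_test).mean()
print(f"Hand-rolled bagging accuracy: {accuracy:.3f}")
```

For regression, the same loop applies with a regressor as the base learner and `np.mean` over the stacked predictions in place of the vote count.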

## Boosting (AdaBoost - Adaptive Boosting)

**Literature:**

* AdaBoost (Adaptive Boosting), proposed by Freund and Schapire in 1996, focuses on reducing bias by sequentially training models that learn from the errors of previous models.
* It assigns weights to training samples, giving more weight to misclassified samples, forcing subsequent models to focus on them.
* It also assigns weights to the models themselves, giving more weight to models with lower error rates.

**Mathematical Formulas:**

* Let $D = \{(x_1, y_1), (x_2, y_2), ..., (x_N, y_N)\}$ be the training dataset, with binary labels $y_i \in \{-1, +1\}$; each weak learner also outputs $h_t(x) \in \{-1, +1\}$.
* Initialize sample weights $w_i = \frac{1}{N}$ for $i = 1, 2, ..., N$.
* For $t = 1, 2, ..., T$ (number of boosting iterations):
    * Train a weak learner $h_t(x)$ on the weighted training data.
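    * Calculate the weighted error: $\epsilon_t = \sum_{i=1}^{N} w_i \cdot I(h_t(x_i) \neq y_i)$, where $I$ is the indicator function and the weights sum to 1.
    * Calculate the learner's weight: $\alpha_t = \frac{1}{2} \ln\left(\frac{1 - \epsilon_t}{\epsilon_t}\right)$, which is positive whenever $\epsilon_t < \frac{1}{2}$, i.e. whenever the learner beats random guessing.
    * Update sample weights: $w_i \leftarrow \frac{w_i \cdot \exp(-\alpha_t \cdot y_i \cdot h_t(x_i))}{Z_t}$, where $Z_t = \sum_{j=1}^{N} w_j \cdot \exp(-\alpha_t \cdot y_j \cdot h_t(x_j))$ normalizes the weights to sum to 1. With labels and predictions in $\{-1, +1\}$, the product $y_i \cdot h_t(x_i)$ is $-1$ for misclassified samples, so their weights grow by a factor of $e^{\alpha_t}$ before normalization.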
* Final prediction: $H(x) = \text{sign}(\sum_{t=1}^{T} \alpha_t \cdot h_t(x))$.
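
The loop above translates almost line by line into code. The following is a minimal sketch of binary AdaBoost under some assumptions of its own: decision stumps (`max_depth=1`) as weak learners, a synthetic dataset from `make_classification`, and hypothetical helper names `fit_adaboost` and `predict_adaboost`. The clipping of $\epsilon_t$ is a numerical safeguard that is not part of the formulas.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def fit_adaboost(X, y, T=50):
    """Discrete AdaBoost with decision stumps; y must contain only -1 and +1."""
    n = len(X)
    w = np.full(n, 1.0 / n)                    # w_i = 1/N
    learners, alphas = [], []
    for _ in range(T):
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = h.predict(X)
        eps = np.sum(w * (pred != y))          # weighted error epsilon_t
        eps = np.clip(eps, 1e-10, 1 - 1e-10)   # safeguard: avoid log(0) and division by zero
        alpha = 0.5 * np.log((1 - eps) / eps)  # learner weight alpha_t
        w = w * np.exp(-alpha * y * pred)      # misclassified samples gain weight
        w /= w.sum()                           # divide by Z_t so the weights sum to 1
        learners.append(h)
        alphas.append(alpha)
    return learners, np.array(alphas)


def predict_adaboost(learners, alphas, X):
    """H(x) = sign(sum_t alpha_t * h_t(x)); a score of exactly 0 maps to +1."""
    scores = sum(a * h.predict(X) for h, a in zip(learners, alphas))
    return np.where(scores >= 0, 1, -1)


# Synthetic binary problem; labels are remapped from {0, 1} to {-1, +1}.
X, y01 = make_classification(n_samples=400, n_features=10, n_informative=5, random_state=0)
y = 2 * y01 - 1
X_train, y_train = X[:300], y[:300]
X_test, y_test = X[300:], y[300:]

learners, alphas = fit_adaboost(X_train, y_train, T=50)
train_acc = (predict_adaboost(learners, alphas, X_train) == y_train).mean()
test_acc = (predict_adaboost(learners, alphas, X_test) == y_test).mean()
print(f"train accuracy: {train_acc:.3f}  test accuracy: {test_acc:.3f}")
```

Passing `sample_weight` to the stump is one way to "train on the weighted data"; resampling the training set in proportion to the weights is a common alternative.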

In [None]:
import numpy as np
from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

In [None]:
# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

In [None]:
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Bagging: 100 shallow trees, each trained on a bootstrap sample of the training set
base_estimator = DecisionTreeClassifier(max_depth=3)
# scikit-learn >= 1.2 names this parameter `estimator`; older releases used `base_estimator`
bagging_model = BaggingClassifier(estimator=base_estimator, n_estimators=100, random_state=42)
bagging_model.fit(X_train, y_train)
bagging_predictions = bagging_model.predict(X_test)
bagging_accuracy = accuracy_score(y_test, bagging_predictions)
print(f"Bagging Accuracy: {bagging_accuracy:.3f}")

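# AdaBoost: trees are trained sequentially on reweighted data.
# Iris has three classes, so scikit-learn applies a multi-class generalization (SAMME)
# of the binary update rule given above.
# scikit-learn >= 1.2 names this parameter `estimator`; older releases used `base_estimator`
boosting_model = AdaBoostClassifier(estimator=base_estimator, n_estimators=100, random_state=42)
boosting_model.fit(X_train, y_train)
boosting_predictions = boosting_model.predict(X_test)
boosting_accuracy = accuracy_score(y_test, boosting_predictions)
print(f"AdaBoost Accuracy: {boosting_accuracy:.3f}")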