# Bagging Ensemble (Bootstrap Aggregating)

## 🧠 What is Bagging?

**Bagging (Bootstrap Aggregating)** is an ensemble method that:

1. Trains multiple models (usually the same algorithm, e.g. Decision Trees)
2. Fits each one on a different random subset of the training data (sampled with replacement)
3. Combines their predictions (by averaging for regression or voting for classification)
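
The three steps above can be sketched by hand. This is a minimal illustration, not how scikit-learn implements it: the names `trees`, `all_preds`, and `y_vote`, the choice of 25 models, and the seeds are assumptions made for the example. It uses hard majority voting, whereas `BaggingClassifier` averages predicted probabilities when the base model provides them.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)
n_samples = X.shape[0]
n_models = 25  # illustrative choice

# Steps 1 and 2: fit one tree per bootstrap sample (rows drawn with replacement)
trees = []
for _ in range(n_models):
    idx = rng.integers(0, n_samples, size=n_samples)
    trees.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))

# Step 3: majority vote across the trees
all_preds = np.stack([t.predict(X) for t in trees])  # shape (n_models, n_samples)
votes = np.apply_along_axis(np.bincount, 0, all_preds, minlength=3)  # shape (3, n_samples)
y_vote = votes.argmax(axis=0)

print("agreement with training labels:", (y_vote == y).mean())
```

The score printed here is measured on the same rows the trees were trained on, so it only checks that the voting logic works; it is not an estimate of generalization.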

### 💬 In short:

> "Train many high-variance models on slightly different data → average or vote → a more stable, usually more accurate model."

---

## 📘 How It Improves Accuracy

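- **Reduces variance**, the main source of overfitting in flexible models
- **Leaves bias mostly unchanged**: the ensemble's expected prediction stays close to that of a single base model
- **Works well with unstable models** such as decision trees, where small changes in the training data produce very different fits
- Each tree sees slightly different data → makes different errors → averaging cancels part of them out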

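### Mathematically:

If the $M$ models were uncorrelated, each with variance $\text{Var}(f)$:

$$\text{Var}(\bar{f}) = \frac{1}{M^2} \sum_{i=1}^{M} \text{Var}(f_i) = \frac{1}{M} \text{Var}(f)$$

In practice, bootstrap samples overlap, so the models are positively correlated. With average pairwise correlation $\rho$:

$$\text{Var}(\bar{f}) = \rho \, \text{Var}(f) + \frac{1 - \rho}{M} \text{Var}(f)$$

→ The more estimators ($M$), the lower the variance, with diminishing returns as it approaches the floor $\rho \, \text{Var}(f)$.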
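
A quick simulation can make the effect of $M$ concrete. The sketch below is a toy model, not real trees: it draws Gaussian "predictions" with unit variance and a chosen pairwise correlation `rho`, then measures the variance of their average. The function name `variance_of_mean`, the trial count, and the values of `rho` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials = 20_000  # number of simulated ensembles per setting


def variance_of_mean(M, rho):
    """Empirical variance of the average of M unit-variance predictions
    whose pairwise correlation is rho."""
    # A shared component induces the correlation; the private part is independent.
    shared = rng.normal(size=(n_trials, 1))
    private = rng.normal(size=(n_trials, M))
    preds = np.sqrt(rho) * shared + np.sqrt(1.0 - rho) * private
    return preds.mean(axis=1).var()


for M in (1, 10, 100):
    # Theory for unit variance: rho + (1 - rho) / M
    print(M, variance_of_mean(M, rho=0.0), variance_of_mean(M, rho=0.3))
```

With `rho=0.0` the variance should shrink roughly as $1/M$, while with `rho=0.3` it should level off near 0.3 however many models are averaged.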

---

## 🔧 Key Parameters

| Parameter            | Description                                                                                |
| -------------------- | ------------------------------------------------------------------------------------------ |
| `estimator`          | Base model (e.g. `DecisionTreeClassifier`); named `base_estimator` before scikit-learn 1.2 |
| `n_estimators`       | Number of models in the ensemble                                                           |
| `max_samples`        | Number (int) or fraction (float) of samples drawn per model                                |
| `max_features`       | Number (int) or fraction (float) of features drawn per model                               |
| `bootstrap`          | Draw samples with replacement (True = bagging)                                             |
| `bootstrap_features` | Draw features with replacement (subset size is set by `max_features`)                      |
| `n_jobs`             | Number of parallel jobs (`-1` = all CPU cores)                                             |
| `oob_score`          | Compute the "out-of-bag" score, a built-in validation estimate (needs `bootstrap=True`)    |

In [2]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score



## Step 1: Load Dataset and Split

We'll use the Iris dataset to demonstrate bagging classification.

In [None]:
data = load_iris()
x, y = data.data, data.target

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

print(x_train.shape, x_test.shape, y_train.shape, y_test.shape)
print(x)
print(y)

## Step 2: Create Bagging Ensemble

We create a `BaggingClassifier` with:
- **Base estimator**: a fully grown Decision Tree (a low-bias, high-variance learner)
- **50 estimators**: 50 different trees trained on different bootstrap samples
- **max_samples=0.8**: Each tree is trained on a bootstrap sample whose size is 80% of the training set
- **bootstrap=True**: Samples are drawn with replacement (this is what makes it "bagging")
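
Because the rows are drawn with replacement, a sample of that size contains duplicates, so each tree sees fewer distinct rows than `max_samples` suggests. The sketch below estimates that fraction with NumPy only. It assumes the 120-row training set produced by the split above; the names `n_draw` and `unique_fractions` and the 1000 repetitions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)
n_train = 120                 # 80% of the 150 Iris rows, as in the split above
n_draw = int(0.8 * n_train)   # max_samples=0.8 -> 96 draws per tree

unique_fractions = []
for _ in range(1000):
    idx = rng.integers(0, n_train, size=n_draw)  # sample row indices with replacement
    unique_fractions.append(np.unique(idx).size / n_train)

print(f"mean fraction of distinct training rows per tree: {np.mean(unique_fractions):.2f}")
```

Analytically, the expected fraction is $1 - (1 - 1/n)^{0.8n} \approx 1 - e^{-0.8} \approx 0.55$, so each tree typically sees a little over half of the distinct training rows.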

In [3]:

# Base model: an unpruned decision tree (low bias, high variance)
base_tree = DecisionTreeClassifier(random_state=42)

# Bagging ensemble
bag_model = BaggingClassifier(
    estimator=base_tree,   # keyword is `base_estimator` before scikit-learn 1.2
    n_estimators=50,       # number of trees
    max_samples=0.8,       # each bootstrap sample has 80% of the training-set size
    bootstrap=True,        # sample with replacement
    random_state=42
)

## Step 3: Train and Evaluate

Train the bagging ensemble and evaluate its accuracy on the test set.

In [4]:
bag_model.fit(x_train, y_train)
y_pred = bag_model.predict(x_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Bagging Classifier Accuracy: {accuracy:.2f}") 

Bagging Classifier Accuracy: 1.00
