# Bagging Ensemble Learning with Visualization

**Bagging** (Bootstrap Aggregating) is an ensemble learning technique that improves the stability and accuracy of machine learning algorithms by combining the predictions of multiple models, each trained on a different bootstrap sample of the training data.

## How Bagging Works
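1. Draw multiple bootstrap samples from the original dataset. Each sample is the same size as the original and is drawn with replacement, so some rows appear several times and others not at all.
2. Train a base model on each sample independently.
3. Aggregate the predictions (majority vote for classification, averaging for regression).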
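
The three steps above can be sketched from scratch in a few lines. This is a minimal illustration, not scikit-learn's implementation: `bagging_fit` and `bagging_predict` are hypothetical helper names, the ensemble size of 25 is an arbitrary choice, and the majority vote assumes non-negative integer class labels (0/1 in the breast cancer data used below).

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def bagging_fit(X, y, n_estimators=25, random_state=0):
    """Hypothetical helper: train one decision tree per bootstrap sample (steps 1 and 2)."""
    rng = np.random.default_rng(random_state)
    n_samples = X.shape[0]
    models = []
    for _ in range(n_estimators):
        # Step 1: draw n_samples row indices *with replacement*.
        idx = rng.integers(0, n_samples, size=n_samples)
        # Step 2: fit an independent base model on that bootstrap sample.
        tree = DecisionTreeClassifier(random_state=int(rng.integers(1_000_000)))
        tree.fit(X[idx], y[idx])
        models.append(tree)
    return models


def bagging_predict(models, X):
    """Hypothetical helper: aggregate by majority vote over the base models (step 3)."""
    # Shape (n_estimators, n_samples); assumes non-negative integer labels.
    votes = np.stack([m.predict(X) for m in models]).astype(int)
    # For each column (one test sample), return the most frequent label.
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)


X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

models = bagging_fit(X_train, y_train)
y_pred = bagging_predict(models, X_test)
accuracy = np.mean(y_pred == y_test)
print(f"Hand-rolled bagging accuracy: {accuracy:.4f}")
```

An odd ensemble size avoids ties in the binary vote; `BaggingClassifier` used below handles these details (and more) for you.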

```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, ConfusionMatrixDisplay
sns.set_theme(style='whitegrid')
```

```python
# Breast cancer dataset: 569 samples, 30 numeric features, binary target
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```

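```python
# A single fully grown tree: low bias, high variance
dt = DecisionTreeClassifier(random_state=42)

# Bagging of 50 trees; scikit-learn >= 1.2 names this parameter `estimator`
# (`base_estimator` was deprecated in 1.2 and removed in 1.4)
bagging_clf = BaggingClassifier(
    estimator=dt,
    n_estimators=50,
    bootstrap=True,
    oob_score=True,  # also compute the out-of-bag accuracy estimate
    random_state=42
)
rf_clf = RandomForestClassifier(n_estimators=50, random_state=42)
```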

```python
models = {
    'Decision Tree': dt,
    'Bagging': bagging_clf,
    'Random Forest': rf_clf
}

# 5-fold cross-validation on the training set (each model is cloned per fold)
cv_results = {}
for name, model in models.items():
    scores = cross_val_score(model, X_train, y_train, cv=5)
    cv_results[name] = scores
    print(f"{name}: {scores.mean():.4f} ± {scores.std():.4f}")

# Plotting: one box per model
plt.figure(figsize=(10, 6))
sns.boxplot(data=list(cv_results.values()), orient='v')
plt.xticks(ticks=range(len(cv_results)), labels=list(cv_results.keys()))
plt.ylabel('Accuracy')
plt.title('Cross-Validation Accuracy Comparison')
plt.show()
```

```python
# Fit on the full training set, then evaluate on the held-out test set
bagging_clf.fit(X_train, y_train)
y_pred = bagging_clf.predict(X_test)
print(f"Test Accuracy (Bagging): {accuracy_score(y_test, y_pred):.4f}")
# OOB estimate: each training row is scored only by the trees that did not see it
print(f"OOB Score (Bagging): {bagging_clf.oob_score_:.4f}")

# Confusion Matrix
ConfusionMatrixDisplay.from_estimator(bagging_clf, X_test, y_test)
plt.title('Confusion Matrix - Bagging Classifier')
plt.show()
```
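
The OOB score needs no separate validation set because each bootstrap sample leaves some training rows out. A given row is missed by a single draw with probability $1 - 1/n$, so it is absent from a bootstrap sample of size $n$ with probability $(1 - 1/n)^n$, which approaches $1/e \approx 0.368$. The simulation below is a sketch that checks this for $n = 455$, the training-set size produced by the 80/20 split of the 569-row dataset; the number of simulated samples and the seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 455          # training-set size after the 80/20 split of 569 rows
n_draws = 2_000  # number of simulated bootstrap samples (arbitrary)

# For each simulated bootstrap sample, measure the fraction of rows never drawn.
oob_fractions = []
for _ in range(n_draws):
    idx = rng.integers(0, n, size=n)
    n_in_bag = np.unique(idx).size
    oob_fractions.append(1 - n_in_bag / n)

empirical = float(np.mean(oob_fractions))
theoretical = (1 - 1 / n) ** n  # probability that a given row is never drawn
print(f"empirical OOB fraction: {empirical:.4f}")
print(f"(1 - 1/n)^n:            {theoretical:.4f}")
print(f"limit 1/e:              {np.exp(-1):.4f}")
```

So each tree leaves roughly a third of the training rows unseen, and with 50 trees almost every row is out-of-bag for some of them, which is what makes the OOB estimate possible.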

## Conclusion

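Bagging improves model performance mainly by reducing variance: averaging many trees fit to different bootstrap samples smooths out the idiosyncrasies that any single tree learns from its particular sample. It is most effective with high-variance, low-bias base learners such as fully grown decision trees, and offers little benefit for stable learners whose predictions barely change from one bootstrap sample to the next.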
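
The variance argument can be made concrete. If each of $B$ base models produces predictions with variance $\sigma^2$ and any two of them have correlation $\rho$, the variance of their average is $\rho\sigma^2 + \frac{1-\rho}{B}\sigma^2$. Adding models shrinks only the second term; the $\rho\sigma^2$ floor remains, which is why decorrelating the models matters. The simulation below is an idealized sketch: it treats predictions as correlated Gaussian random variables with assumed values $\sigma = 1$ and $\rho = 0.3$, not as real trees.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, rho, n_trials = 1.0, 0.3, 100_000  # assumed, illustrative values

results = {}
for B in (1, 10, 50):
    # A shared component plus an independent component per model gives each
    # prediction variance sigma^2 and pairwise correlation rho.
    shared = rng.normal(0, sigma, size=(n_trials, 1))
    own = rng.normal(0, sigma, size=(n_trials, B))
    preds = np.sqrt(rho) * shared + np.sqrt(1 - rho) * own
    empirical = preds.mean(axis=1).var()  # variance of the ensemble average
    theory = rho * sigma**2 + (1 - rho) * sigma**2 / B
    results[B] = (empirical, theory)
    print(f"B={B:>2}: empirical variance {empirical:.4f}, formula {theory:.4f}")
```

Going from 10 to 50 models barely helps once the average approaches the $\rho\sigma^2 = 0.3$ floor; lowering $\rho$ is the only way to push it down further.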

**Random Forest** is a widely used extension of bagging that also restricts each split to a random subset of the features, which further decorrelates the individual trees.