1. Use any classification dataset.
2. Implement a Bagging classifier with Decision Trees.
3. Compare its performance with Random Forest and AdaBoost.
4. Report accuracy, precision, and recall for each method, and briefly explain which ensemble worked best and why.

In [1]:
# Step 1: Import libraries
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier, AdaBoostClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score

In [2]:
# Step 2: Load dataset
data = load_breast_cancer()
X, y = data.data, data.target

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)
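`stratify=y` keeps the malignant/benign ratio the same in the training and test sets. The quick check below confirms this. It also shows that scikit-learn encodes this dataset as 0 = malignant and 1 = benign. That matters for the evaluation step, because `precision_score` and `recall_score` treat label 1 as the positive class by default.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X, y = data.data, data.target

# Same split as in Step 2
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# scikit-learn's label encoding for this dataset: 0 = malignant, 1 = benign
print(dict(enumerate(data.target_names)))

# Class counts and proportions in the full data and in each split
for name, labels in [("all", y), ("train", y_train), ("test", y_test)]:
    counts = np.bincount(labels)
    print(f"{name:>5}: n={len(labels)}, counts={counts}, "
          f"proportions={np.round(counts / counts.sum(), 3)}")
```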

In [3]:
# Step 3: Define classifiers
bagging = BaggingClassifier(estimator=DecisionTreeClassifier(), n_estimators=50, random_state=42)
rf = RandomForestClassifier(n_estimators=50, random_state=42)
ada = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=1), n_estimators=50, random_state=42)

In [4]:
# Step 4: Train models
bagging.fit(X_train, y_train)
rf.fit(X_train, y_train)
ada.fit(X_train, y_train)

In [5]:
# Step 5: Predictions
y_pred_bagging = bagging.predict(X_test)
y_pred_rf = rf.predict(X_test)
y_pred_ada = ada.predict(X_test)

In [6]:
# Step 6: Evaluation function
def evaluate(y_true, y_pred):
    return {
        "Accuracy": accuracy_score(y_true, y_pred),
        "Precision": precision_score(y_true, y_pred),
        "Recall": recall_score(y_true, y_pred)
    }

In [7]:
# Step 7: Compare models
results = pd.DataFrame({
    "Bagging": evaluate(y_test, y_pred_bagging),
    "Random Forest": evaluate(y_test, y_pred_rf),
    "AdaBoost": evaluate(y_test, y_pred_ada)
})

print("Test-set metrics:\n")
print(results)

Test-set metrics:

            Bagging  Random Forest  AdaBoost
Accuracy   0.941520       0.923977  0.959064
Precision  0.944954       0.943396  0.946429
Recall     0.962617       0.934579  0.990654


Explanation:
## Bagging:
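Reduces variance by training many unpruned Decision Trees on bootstrap samples (rows drawn with replacement) and averaging their predictions. It works well, but it uses no feature randomness: every split in every tree can choose among all features, so trees grown from overlapping samples stay fairly correlated. This limits how much variance averaging can remove.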
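The bootstrap-and-average idea can be made concrete with a minimal hand-rolled sketch. The helper names `fit_bagged_trees` and `predict_bagged` are hypothetical and not part of scikit-learn. The sketch averages the trees' predicted probabilities, which is what `BaggingClassifier` does when the base estimator provides `predict_proba`. Its random sampling differs, so the score will not match the table exactly.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Hypothetical helpers: a sketch of bagging, not scikit-learn's implementation
def fit_bagged_trees(X, y, n_estimators=50, seed=42):
    """Train one unpruned tree per bootstrap sample."""
    rng = np.random.default_rng(seed)
    n = len(X)
    trees = []
    for _ in range(n_estimators):
        idx = rng.integers(0, n, size=n)  # n row indices drawn with replacement
        tree = DecisionTreeClassifier(random_state=int(rng.integers(1_000_000)))
        trees.append(tree.fit(X[idx], y[idx]))
    return trees

def predict_bagged(trees, X):
    """Average the trees' class probabilities and pick the most probable class."""
    proba = np.mean([t.predict_proba(X) for t in trees], axis=0)
    return trees[0].classes_[np.argmax(proba, axis=1)]

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
trees = fit_bagged_trees(X_train, y_train)
y_pred = predict_bagged(trees, X_test)
print("Hand-rolled bagging accuracy:", accuracy_score(y_test, y_pred))
```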

## Random Forest:
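Adds feature randomness on top of bagging. Each split considers only a random subset of the features (by default sqrt(p) of the p features in scikit-learn's classifier), which makes the trees more diverse and less correlated. This often, though not always, improves on plain bagging. On this particular split it did not (0.924 vs. 0.942 accuracy).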

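## AdaBoost:
Trains weak learners (here, depth-1 decision stumps) sequentially. After each round it increases the weight of the samples the previous learners misclassified, and it combines all learners in a weighted vote. This mainly reduces bias. It works well on clean data but can be sensitive to noisy labels and outliers, because their weights keep growing.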
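The reweighting loop can be sketched minimally as classic two-class discrete AdaBoost, with labels mapped to {-1, +1}. `fit_adaboost` and `predict_adaboost` are hypothetical helpers. scikit-learn's `AdaBoostClassifier` differs in details such as its SAMME variants and stopping rules, so its scores will not match exactly.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Hypothetical helpers: a sketch of discrete AdaBoost, not scikit-learn's implementation
def fit_adaboost(X, y, n_estimators=50):
    """Discrete AdaBoost for labels in {0, 1}, using depth-1 stumps as weak learners."""
    y_signed = np.where(y == 1, 1, -1)          # work with labels in {-1, +1}
    w = np.full(len(y), 1 / len(y))             # start with uniform sample weights
    stumps, alphas = [], []
    for _ in range(n_estimators):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y_signed, sample_weight=w)
        pred = stump.predict(X)
        err = np.sum(w[pred != y_signed])       # weighted error (weights sum to 1)
        if err >= 0.5:                          # no better than chance: stop early
            break
        err = max(err, 1e-10)                   # avoid division by zero for a perfect stump
        alpha = 0.5 * np.log((1 - err) / err)   # lower error -> larger vote
        # Up-weight misclassified samples, down-weight correct ones, renormalise
        w = w * np.exp(-alpha * y_signed * pred)
        w = w / w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, np.array(alphas)

def predict_adaboost(stumps, alphas, X):
    """Weighted vote of the stumps; the sign of the score gives the class."""
    score = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.where(score >= 0, 1, 0)

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
stumps, alphas = fit_adaboost(X_train, y_train)
y_pred = predict_adaboost(stumps, alphas, X_test)
print("Hand-rolled AdaBoost accuracy:", accuracy_score(y_test, y_pred))
```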

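On this train/test split, AdaBoost worked best. It had the highest accuracy (0.959), precision (0.946), and recall (0.991), followed by Bagging and then Random Forest. In absolute terms, AdaBoost misclassified 7 of the 171 test samples, against 10 for Bagging and 13 for Random Forest.

Precision and recall here treat benign (label 1) as the positive class. Read that way, the table also shows that each model labelled 6 of the 64 malignant test tumours as benign. The models differ only in how many benign tumours they flagged as malignant: 1 for AdaBoost, 4 for Bagging, and 7 for Random Forest.

A plausible reason for AdaBoost's lead is that this dataset is fairly clean and a few features separate the classes well. That suits an additive combination of stumps, and the low label noise limits AdaBoost's main weakness.

In general, Random Forest is a strong default because its decorrelated trees reduce variance with little tuning. However, with only 171 test samples, a gap of a few predictions is not conclusive, and a cross-validated comparison would give a more reliable ranking.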