Bagging Meta-Estimator
Bagging builds multiple independent models on different random subsets of the training data and aggregates their predictions. Purpose: reduce variance of a high-variance base learner (e.g., fully-grown decision trees) without modifying the learner itself. Works best for complex, unstable models (trees) rather than weak learners (boosting domain).

In scikit-learn, bagging methods are a unified BaggingClassifier and BaggingRegressor meta-estimator, taking as input a user-specified estimator along with parameters specifying the strategy to draw random subsets.

In [13]:
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

In [14]:
# 1. Create a synthetic dataset
X, y = make_classification(
    n_samples=1000,
    n_features=20,
    n_informative=15,
    n_redundant=5,
    random_state=42
)

# 2. Create our training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

print(f"Data shape: X_train={X_train.shape}, y_train={y_train.shape}")

Data shape: X_train=(700, 20), y_train=(700,)


### 1. K-Neighbor Classifier

In [None]:
# --- Base Models ---
from sklearn.neighbors import KNeighborsClassifier
# --- Ensemble Models ---
from sklearn.ensemble import BaggingClassifier


In [5]:
# 3. Base Model - K-neighbors classifier
base_model = KNeighborsClassifier()

In [9]:
base_model1 = base_model.fit(X_train, y_train)

In [10]:
y_pred = base_model1.predict(X_test)
print(f"Base Model Test Accuracy: {accuracy_score(y_test, y_pred):.4f}")

Base Model Test Accuracy: 0.9033


In [6]:
# 4. Bagging Classifier

bagging_model = BaggingClassifier(
    estimator=base_model,
    n_estimators=25, # How many copies of the base model to create.

    max_samples=0.5, # count of samples per base model
    max_features=0.5, # count of features per base model
    bootstrap=True, # bagging - Sampling samples with replacement
    bootstrap_features=True, # Sampling features with replacement
    oob_score=True # Use out-of-bag samples for validation
)


In [7]:
# 5. Fit the model
print("Training Bagging model...")
bagging_model.fit(X, y)

Training Bagging model...


In [8]:
# 6. Evaluate
y_pred = bagging_model.predict(X_test)
print(f"Bagging Out-of-Bag (OOB) score: {bagging_model.oob_score_:.4f}")
print(f"Bagging Test Accuracy: {accuracy_score(y_test, y_pred):.4f}")

Bagging Out-of-Bag (OOB) score: 0.9220
Bagging Test Accuracy: 0.9667


### 2. Decision Tree Classifier

In [15]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier

In [16]:
base_model = DecisionTreeClassifier()
bagging_model = BaggingClassifier(
    estimator=base_model,
    n_estimators=50,
    max_samples=0.8,
    oob_score=True,
    random_state=42
)

In [17]:
# base model
base_model1 = base_model.fit(X_train, y_train)
y_pred = base_model1.predict(X_test)
print(f"Base Model Accuracy : {accuracy_score(y_test, y_pred):.4f}")

Base Model Accuracy : 0.7900


In [18]:
# 3. Fit the model
print("Training Bagging model...")
bagging_model.fit(X_train, y_train)

Training Bagging model...


In [19]:
# 4. Evaluate
y_pred = bagging_model.predict(X_test)
print(f"Bagging Out-of-Bag (OOB) score: {bagging_model.oob_score_:.4f}")
print(f"Bagging Test Accuracy: {accuracy_score(y_test, y_pred):.4f}")

Bagging Out-of-Bag (OOB) score: 0.8629
Bagging Test Accuracy: 0.8767


### 3. Random Forests - Classifier
It is an specialized and optimized version of Bagging for Decision Tree. It adds another layer of randomness such that each tree only sees a random subset of features at each split. It makes the trees more diverse and the model more robust.

In [20]:
from sklearn.ensemble import RandomForestClassifier

In [21]:
rf_model = RandomForestClassifier(
    n_estimators=100,
    max_features="sqrt",
    max_depth=10,
    oob_score=True,
    random_state=42
)

In [22]:
# 2. Fit the model
print("\nTraining Random Forest model...")
rf_model.fit(X_train, y_train)


Training Random Forest model...


In [23]:
# 3. Evaluate
y_pred = rf_model.predict(X_test)
print(f"Random Forest Out-of-Bag (OOB) score: {rf_model.oob_score_:.4f}")
print(f"Random Forest Test Accuracy: {accuracy_score(y_test, y_pred):.4f}")

Random Forest Out-of-Bag (OOB) score: 0.8943
Random Forest Test Accuracy: 0.8767
