Bagging

Implementation Steps of Bagging

Step 1: Multiple subsets of equal size are drawn from the original data set by sampling observations with replacement (bootstrap sampling).

Step 2: A base model is created for each of these subsets.

Step 3: Each model is trained on its own subset, in parallel and independently of the others.

Step 4: The final prediction is obtained by combining the predictions of all the models, typically by majority vote for classification or by averaging for regression.
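
To make the four steps concrete, here is a minimal hand-rolled sketch. It assumes the Wine data, ten decision trees, and a hard majority vote as the combination rule. The names `n_models`, `models`, and `manual_bagging_pred` are illustrative only, and the sketch is not meant to mirror how any library implements bagging internally.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X_all, y_all = load_wine(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_all, y_all, test_size=0.3, random_state=42)

rng = np.random.default_rng(42)
n_models = 10  # assumed ensemble size, chosen for illustration
models = []

for _ in range(n_models):
    # Step 1: bootstrap sample of the same size as the training set (with replacement)
    idx = rng.integers(0, len(X_tr), size=len(X_tr))
    # Steps 2-3: fit an independent base model on that sample
    tree = DecisionTreeClassifier(random_state=0).fit(X_tr[idx], y_tr[idx])
    models.append(tree)

# Step 4: combine the predictions by majority vote
all_preds = np.stack([m.predict(X_te) for m in models])  # shape (n_models, n_test)
n_classes = len(np.unique(y_tr))
votes = np.array([np.bincount(col, minlength=n_classes) for col in all_preds.T])
manual_bagging_pred = votes.argmax(axis=1)  # most-voted class per test sample

manual_bagging_acc = (manual_bagging_pred == y_te).mean()
print("Manual bagging - Testing Accuracy:", manual_bagging_acc)
```

Because each tree sees a different bootstrap sample, the loop iterations do not depend on one another, which is why this step can be run in parallel.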

In [11]:
# Importing necessary libraries
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the Wine dataset
wine = load_wine()
X = wine.data
y = wine.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize the base classifier (in this case, a decision tree)
base_classifier = DecisionTreeClassifier()

# Initialize the BaggingClassifier
bagging_classifier = BaggingClassifier(estimator=base_classifier, n_estimators=10, random_state=42)

# Train the BaggingClassifier
bagging_classifier.fit(X_train, y_train)

# Make predictions on the test set
y_pred = bagging_classifier.predict(X_test)

# Calculate accuracy
train_accuracy = accuracy_score(y_train, bagging_classifier.predict(X_train))
print("Training Accuracy:", train_accuracy)

test_accuracy = accuracy_score(y_test, y_pred)
print("Testing Accuracy:", test_accuracy)

Training Accuracy: 0.9919354838709677
Testing Accuracy: 0.9259259259259259


Boosting

Step 1: Initialise the dataset and assign an equal weight to each data point.

Step 2: Provide the weighted data as input to the model and identify the wrongly classified data points.

Step 3: Increase the weights of the wrongly classified data points so that the next model focuses on them.

Step 4: If the required results are obtained, go to Step 5; otherwise, go back to Step 2.

Step 5: End.
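
The reweighting loop described above can be sketched by hand. This is a minimal illustration under several assumptions: a two-class synthetic data set from `make_classification`, decision stumps as the weak learner, a fixed number of rounds as the stopping rule, and the classic discrete AdaBoost weight update. The names `n_rounds`, `stumps`, and `alphas` are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Assumed toy data: two classes, relabelled to {-1, +1} for the AdaBoost update
X_demo, y01 = make_classification(n_samples=300, n_features=8, random_state=0)
y_demo = 2 * y01 - 1

n_rounds = 20  # assumed stopping rule: a fixed number of boosting rounds
weights = np.full(len(y_demo), 1.0 / len(y_demo))  # equal weight for every point
stumps, alphas = [], []

for _ in range(n_rounds):
    # Fit a weak learner on the weighted data and find its mistakes
    stump = DecisionTreeClassifier(max_depth=1, random_state=0)
    stump.fit(X_demo, y_demo, sample_weight=weights)
    pred = stump.predict(X_demo)
    miss = pred != y_demo

    # Weighted error, clipped to keep the logarithm finite
    err = np.clip(weights[miss].sum(), 1e-10, 1 - 1e-10)
    if err >= 0.5:
        break  # the weak learner is no better than chance; stop early

    # The model's say in the final vote grows as its weighted error shrinks
    alpha = 0.5 * np.log((1 - err) / err)

    # Increase the weights of misclassified points, decrease the rest, renormalise
    weights *= np.exp(-alpha * y_demo * pred)
    weights /= weights.sum()

    stumps.append(stump)
    alphas.append(alpha)

# Final prediction: sign of the alpha-weighted vote of all weak learners
scores = sum(a * s.predict(X_demo) for a, s in zip(alphas, stumps))
boosted_acc = (np.sign(scores) == y_demo).mean()
print("Manual boosting - Training Accuracy:", boosted_acc)
```

Unlike bagging, each round depends on the weights produced by the previous one, so the models have to be trained sequentially.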

In [12]:
from sklearn.ensemble import AdaBoostClassifier  # For Classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score


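# Reuses X_train, X_test, y_train, y_test from the Wine split in the Bagging cell above.
# A decision tree is used as the base estimator; any learner whose fit() accepts
# sample_weight can be used instead.
# Note: an unrestricted tree can fit the training set perfectly, in which case boosting
# may stop after the first round; a shallow tree (e.g. max_depth=1) is the more common choice.
dt = DecisionTreeClassifier()
clf = AdaBoostClassifier(n_estimators=100, estimator=dt, learning_rate=1)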

# training the model on the training set
clf.fit(X_train, y_train)

# calculate and print training and testing accuracy
train_accuracy = accuracy_score(y_train, clf.predict(X_train))
test_accuracy = accuracy_score(y_test, clf.predict(X_test))

print("Adaboost - Training Accuracy:", train_accuracy)
print("Adaboost - Testing Accuracy:", test_accuracy)

Adaboost - Training Accuracy: 1.0
Adaboost - Testing Accuracy: 0.9629629629629629




In [14]:
from sklearn.ensemble import GradientBoostingClassifier

gb_clf = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0, max_depth=1, random_state=42)
gb_clf.fit(X_train, y_train)

# Calculate accuracy
gb_train_accuracy = accuracy_score(y_train, gb_clf.predict(X_train))
gb_test_accuracy = accuracy_score(y_test, gb_clf.predict(X_test))

print("Gradient Boosting - Training Accuracy:", gb_train_accuracy)
print("Gradient Boosting - Test Accuracy:", gb_test_accuracy)

#-------------------------#
# XGBoost Classifier
# Make sure xgboost is installed: pip install xgboost
from xgboost import XGBClassifier

# use_label_encoder is no longer used by recent xgboost releases, so it is omitted here.
# Wine has three classes, so the multiclass log loss ('mlogloss') is the matching metric.
xgb_clf = XGBClassifier(n_estimators=100, learning_rate=0.1, eval_metric='mlogloss', random_state=42)
xgb_clf.fit(X_train, y_train)

# Calculate accuracy
xgb_train_accuracy = accuracy_score(y_train, xgb_clf.predict(X_train))
xgb_test_accuracy = accuracy_score(y_test, xgb_clf.predict(X_test))

print("XGBoost - Training Accuracy:", xgb_train_accuracy)
print("XGBoost - Test Accuracy:", xgb_test_accuracy)

Gradient Boosting - Training Accuracy: 1.0
Gradient Boosting - Test Accuracy: 1.0
XGBoost - Training Accuracy: 1.0
XGBoost - Test Accuracy: 0.9814814814814815



Stacking

Base Models: Multiple models (level-0 models) are trained on the same dataset.

Meta-Model: A new model (level-1 model, or meta-model) is trained to combine the base models, using their predictions as its input features.
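
The two levels can also be wired together by hand, which shows where the meta-model's input features come from. This is a minimal sketch, assuming the Iris data, class probabilities as the level-0 outputs, and 5-fold out-of-fold predictions so that the meta-model is not trained on predictions the base models made for their own training samples. The names `level0`, `meta_train`, and `meta_test` are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.svm import SVC

X_all, y_all = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_all, y_all, test_size=0.3, random_state=42)

# Level-0 (base) models
level0 = [
    RandomForestClassifier(n_estimators=50, random_state=42),
    SVC(probability=True, random_state=42),
]

# Out-of-fold class probabilities become the meta-model's training features
meta_train = np.hstack([
    cross_val_predict(m, X_tr, y_tr, cv=5, method="predict_proba") for m in level0
])

# Refit each base model on the full training set, then build the test features
for m in level0:
    m.fit(X_tr, y_tr)
meta_test = np.hstack([m.predict_proba(X_te) for m in level0])

# Level-1 (meta) model learns how to combine the base models' outputs
meta = LogisticRegression(max_iter=1000).fit(meta_train, y_tr)
manual_stack_acc = meta.score(meta_test, y_te)
print("Manual stacking - Testing Accuracy:", manual_stack_acc)
```

With two base models and three classes, the meta-model sees six features per sample: one probability per class from each base model.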

In [15]:
# Import necessary libraries
from sklearn.ensemble import StackingClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Define base models
base_models = [
('rf', RandomForestClassifier(n_estimators=100, random_state=42)),
('svm', SVC(probability=True, random_state=42))
]

# Define meta-model
meta_model = LogisticRegression()

# Initialize and train the StackingClassifier
stacking_clf = StackingClassifier(estimators=base_models, final_estimator=meta_model)
stacking_clf.fit(X_train, y_train)

# Make predictions
y_pred = stacking_clf.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_train, stacking_clf.predict(X_train))
print(f"Stacking Classifier Training Accuracy: {accuracy:.2f}")

accuracy = accuracy_score(y_test, y_pred)
print(f"Stacking Classifier Testing Accuracy: {accuracy:.2f}")

Stacking Classifier Training Accuracy: 0.97
Stacking Classifier Testing Accuracy: 1.00
