Ensemble learning is a method where we use many small models instead of just one. Each of these models may not be very strong on its own, but when we put their results together, we get a better and more accurate answer. It's like asking a group of people for advice instead of just one person—each one might be a little wrong, but together, they usually give a better answer.

There are three main types of ensemble methods:

Bagging (Bootstrap Aggregating):

Models are trained independently on different random subsets of the training data. Their results are then combined—usually by averaging (for regression) or voting (for classification). This helps reduce variance and prevents overfitting.

Boosting:

Models are trained one after another. Each new model focuses on fixing the errors made by the previous ones. The final prediction is a weighted combination of all models, which helps reduce bias and improve accuracy.

Stacking (Stacked Generalization):   

Multiple different models (often of different types) are trained, and their predictions are used as inputs to a final model, called a meta-model. The meta-model learns how to best combine the predictions of the base models, aiming for better performance than any individual model.



In [3]:
# Importing Libraries
from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier, RandomForestClassifier, GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier
from catboost import CatBoostClassifier
import pandas as pd

In [4]:
# Loading Dataset

df = pd.read_csv(r"C:\Users\KIIT\Downloads\cleaned_titanic_data.csv")
X = df.drop(columns = 'Pclass', axis = 1)
y = df['Survived']

In [6]:
# Splitting test and train data

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


In [10]:
# Base Classifier

base_classifier = DecisionTreeClassifier()


In [11]:
# Bagging 

bagging_classifier = BaggingClassifier(base_classifier, n_estimators=10, random_state=42)
bagging_classifier.fit(X_train, y_train)

y_pred = bagging_classifier.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

Accuracy: 1.0


In [13]:
# Random Forest - Bagging

rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)
rf_classifier.fit(X_train, y_train)
rf_pred = rf_classifier.predict(X_test)
rf_accuracy = accuracy_score(y_test, rf_pred)
print("Random Forest Accuracy:", rf_accuracy)

# Random Subspace Method - Bagging
# This is similar to RandomForest but with a single tree and max_features < total features
subspace_classifier = BaggingClassifier(
    estimator=DecisionTreeClassifier(),
    n_estimators=10,
    max_features=2,  
    random_state=42
)
subspace_classifier.fit(X_train, y_train)
subspace_pred = subspace_classifier.predict(X_test)
subspace_accuracy = accuracy_score(y_test, subspace_pred)
print("Random Subspace Method Accuracy:", subspace_accuracy)

Random Forest Accuracy: 1.0
Random Subspace Method Accuracy: 0.9832402234636871


In [12]:
# Boosting

adaboost_classifier = AdaBoostClassifier(base_classifier, n_estimators=50, learning_rate=1.0, random_state=42)
adaboost_classifier.fit(X_train, y_train)

accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

Accuracy: 1.0




In [14]:
# Gradient Boosting Machines

gbm_classifier = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, random_state=42)
gbm_classifier.fit(X_train, y_train)
gbm_pred = gbm_classifier.predict(X_test)
gbm_accuracy = accuracy_score(y_test, gbm_pred)
print("Gradient Boosting Machines Accuracy:", gbm_accuracy)

# Extreme Gradient Boosting Machines (XGBoost)

xgb_classifier = XGBClassifier(n_estimators=100, learning_rate=0.1, use_label_encoder=False, eval_metric='mlogloss', random_state=42)
xgb_classifier.fit(X_train, y_train)
xgb_pred = xgb_classifier.predict(X_test)
xgb_accuracy = accuracy_score(y_test, xgb_pred)
print("XGBoost Accuracy:", xgb_accuracy)

# AdaBoost (Adaptive Boosting)
# Already done in previous cell, but for completeness:
adaboost_pred = adaboost_classifier.predict(X_test)
adaboost_accuracy = accuracy_score(y_test, adaboost_pred)
print("AdaBoost Accuracy:", adaboost_accuracy)

# CatBoost

catboost_classifier = CatBoostClassifier(iterations=100, learning_rate=0.1, verbose=0, random_seed=42)
catboost_classifier.fit(X_train, y_train)
catboost_pred = catboost_classifier.predict(X_test)
catboost_accuracy = accuracy_score(y_test, catboost_pred)
print("CatBoost Accuracy:", catboost_accuracy)

Gradient Boosting Machines Accuracy: 1.0
XGBoost Accuracy: 1.0
AdaBoost Accuracy: 1.0


Parameters: { "use_label_encoder" } are not used.



CatBoost Accuracy: 1.0
