# Task 9: AdaBoost or Gradient Boosting

● Train an AdaBoostClassifier or GradientBoostingClassifier.

● Use a suitable dataset.

● Compare it with Random Forest and Decision Tree in terms of:

○ Accuracy

○ F1-score

○ Training time (optional)
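
For reference, the first two criteria in the list above can be computed directly from confusion-matrix counts. The sketch below is pure Python with made-up labels (`y_true`, `y_pred` and the helper `accuracy_and_f1` are illustrative names, not part of the task or the Titanic data). For binary labels, it is meant to match what scikit-learn's `accuracy_score` and `f1_score` return with their default settings.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels, treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # F1 is the harmonic mean of precision and recall: 2*TP / (2*TP + FP + FN)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Illustrative labels only (assumption): 3 TP, 3 TN, 1 FP, 1 FN
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print("Accuracy:", acc)
print("F1 Score:", f1)
```

Accuracy counts every correct prediction equally, while F1 ignores true negatives, so two models can tie on accuracy and still differ on F1.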

In [1]:
url = "https://raw.githubusercontent.com/ShubhamSinghal12/GLA_pythonML2025/refs/heads/main/LogisticRegression/Titanic.csv"

In [2]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
import time

# Load the Titanic dataset (url is defined in the cell above)
df = pd.read_csv(url)

# Select relevant features

features = ['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare', 'Embarked']
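# .copy() makes df an independent frame, so the column assignments below
# cannot trigger pandas' SettingWithCopyWarning
df = df[features + ['Survived']].copy()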

# Handle missing values

df['Age'] = df['Age'].fillna(df['Age'].median())
df['Embarked'] = df['Embarked'].fillna(df['Embarked'].mode()[0])

# Encode categorical variables

df['Sex'] = df['Sex'].map({'male': 0, 'female': 1})
df = pd.get_dummies(df, columns=['Embarked'], drop_first=True)

# Split dataset

X = df.drop('Survived', axis=1)
y = df['Survived']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# -------------------------
# 1. Decision Tree
# -------------------------

# Time only fit() so the reported training time excludes prediction
dt = DecisionTreeClassifier(random_state=42, max_depth=3)
start_dt = time.perf_counter()
dt.fit(X_train, y_train)
end_dt = time.perf_counter()
dt_pred = dt.predict(X_test)

# -------------------------
# 2. Random Forest
# -------------------------

# Time only fit() so the reported training time excludes prediction
rf = RandomForestClassifier(random_state=42, n_estimators=100)
start_rf = time.perf_counter()
rf.fit(X_train, y_train)
end_rf = time.perf_counter()
rf_pred = rf.predict(X_test)

# ------------------------------
# 3. GradientBoostingClassifier
# ------------------------------

from sklearn.ensemble import GradientBoostingClassifier

# Time only fit() so the reported training time excludes prediction
gb = GradientBoostingClassifier(random_state=42)
start_gb = time.perf_counter()
gb.fit(X_train, y_train)
end_gb = time.perf_counter()
gb_pred = gb.predict(X_test)

# -------------------------
# Evaluation
# -------------------------

print("\nDecision Tree\n")

print("Accuracy:", accuracy_score(y_test, dt_pred))
print("F1 Score:", f1_score(y_test, dt_pred))
print("Training Time:", round(end_dt - start_dt, 4), "seconds")

print("\nRandom Forest\n")
print("Accuracy:", accuracy_score(y_test, rf_pred))
print("F1 Score:", f1_score(y_test, rf_pred))
print("Training Time:", round(end_rf - start_rf, 4), "seconds")

print("\nGradient Boosting\n")
print("Accuracy:", accuracy_score(y_test, gb_pred))
print("F1 Score:", f1_score(y_test, gb_pred))
print("Training Time:", round(end_gb - start_gb, 4), "seconds")



Decision Tree

Accuracy: 0.7988826815642458
F1 Score: 0.7391304347826086
Training Time: 0.0048 seconds

Random Forest

Accuracy: 0.7988826815642458
F1 Score: 0.7534246575342466
Training Time: 0.1343 seconds

Gradient Boosting

Accuracy: 0.8044692737430168
F1 Score: 0.7407407407407407
Training Time: 0.1153 seconds
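
The accuracy gap above (0.8045 versus 0.7989) is consistent with a single extra correct prediction on a 179-row test set, so one train/test split may not be enough to rank the three models. A possible follow-up is k-fold cross-validation, sketched below with scikit-learn's `cross_validate`. To keep the snippet self-contained it uses synthetic data from `make_classification` (an assumption; `X_demo` and `y_demo` are placeholder names), which would be swapped for the Titanic `X` and `y` prepared earlier.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption); replace with the Titanic X and y
X_demo, y_demo = make_classification(n_samples=400, n_features=8, random_state=42)

# Same model settings as in the notebook cell above
models = {
    "Decision Tree": DecisionTreeClassifier(random_state=42, max_depth=3),
    "Random Forest": RandomForestClassifier(random_state=42, n_estimators=100),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
}

summary = {}
for name, estimator in models.items():
    # 5-fold CV; fit_time is measured per fold and excludes scoring time
    cv = cross_validate(estimator, X_demo, y_demo, cv=5, scoring=["accuracy", "f1"])
    summary[name] = {
        "accuracy": cv["test_accuracy"].mean(),
        "f1": cv["test_f1"].mean(),
        "fit_time": cv["fit_time"].mean(),
    }
    print(f"{name}: accuracy={summary[name]['accuracy']:.3f}, "
          f"f1={summary[name]['f1']:.3f}, fit_time={summary[name]['fit_time']:.4f}s")
```

Averaging over five folds reduces the influence of any single split; the printed numbers for synthetic data say nothing about the Titanic results.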
