You should compare AdaBoost to at least one of the following: a bagging model, a stacking model. Based on the visualizations seen at the links above you're probably also thinking that this classification task should not be that difficult. So, a secondary goal of this assignment is to test the effects of the AdaBoost function arguments on the algorithm's performance. You should explore at least 3 different sets of settings for the function inputs, and you should do your best to find values for these inputs that actually change the results of your modelling. That is, try not to run three different sets of inputs that result in the same performance. The goal here is for you to better understand how to set these input values yourself in the future. Comment on what you discover about these inputs and how they behave. Your submission should be built and written with non-experts as the target audience. All of your code should still be included, but do your best to narrate your work in accessible ways. Again, submit an HTML, ipynb, or Colab link. Be sure to rerun your entire notebook fresh before submitting!

In [None]:
import pandas as pd
from sklearn.ensemble import AdaBoostClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.ensemble import BaggingClassifier
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

In [5]:
penguins = pd.read_csv("penguins_size.csv")
penguins.dropna(inplace=True)

AttributeError: module 'pandas' has no attribute 'read_csv'

In [18]:

# Encode only the categorical (non-numeric) columns; numeric measurements stay as-is
for col in penguins.select_dtypes(exclude='number').columns:
    penguins[col] = LabelEncoder().fit_transform(penguins[col])

# Separate features and target
X = penguins.drop('species', axis=1)
Y = penguins['species']

# Split into train/test sets
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42, stratify=Y)


# Bagging

In [19]:
bagging_model = BaggingClassifier(
    estimator=DecisionTreeClassifier(max_depth=5),  # Use a deeper tree than AdaBoost
    n_estimators=100,
    random_state=42
)

bagging_model.fit(X_train, Y_train)
bagging_pred = bagging_model.predict(X_test)
bagging_acc = accuracy_score(Y_test, bagging_pred)

print(f"Bagging Model Accuracy (100 estimators, DT max_depth=5): {bagging_acc * 100:.2f}%")


Bagging Model Accuracy (100 estimators, DT max_depth=5): 98.51%


# AdaBoost

In [20]:
model1 = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),
    n_estimators=50,
    learning_rate=1.0,
    algorithm='SAMME'
)

model1.fit(X_train, Y_train)
pred1 = model1.predict(X_test)
acc1 = accuracy_score(Y_test, pred1)

print(f"Model 1 Accuracy (50 estimators, LR=1.0): {acc1 * 100:.2f}%")


Model 1 Accuracy (50 estimators, LR=1.0): 97.01%




In [21]:
model2 = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),
    n_estimators=200,
    learning_rate=0.5,
    algorithm='SAMME'
)

model2.fit(X_train, Y_train)
pred2 = model2.predict(X_test)
acc2 = accuracy_score(Y_test, pred2)

print(f"Model 2 Accuracy (200 estimators, LR=0.5): {acc2 * 100:.2f}%")


Model 2 Accuracy (200 estimators, LR=0.5): 98.51%




In [22]:
model3 = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),
    n_estimators=400,
    learning_rate=0.1,
    algorithm='SAMME'
)

model3.fit(X_train, Y_train)
pred3 = model3.predict(X_test)
acc3 = accuracy_score(Y_test, pred3)

print(f"Model 3 Accuracy (400 estimators, LR=0.1): {acc3 * 100:.2f}%")




Model 3 Accuracy (400 estimators, LR=0.1): 97.01%
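One way to see how accuracy changes as estimators are added, without fitting a separate model for each count, is to score a single fitted model after every boosting round. The sketch below is only an illustration under stated assumptions: it uses scikit-learn's built-in iris data as a stand-in for the penguins file, and the names `X_demo`, `staged_model`, and `staged_acc` are hypothetical and not part of the notebook above.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Stand-in data (assumption): scikit-learn's built-in iris set, not the penguins CSV.
X_demo, y_demo = load_iris(return_X_y=True)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# With no base estimator given, AdaBoost uses a depth-1 decision tree (a "stump"),
# which matches the base learner used in the models above.
staged_model = AdaBoostClassifier(n_estimators=200, learning_rate=0.5, random_state=42)
staged_model.fit(Xd_train, yd_train)

# staged_score yields the test accuracy after each boosting round.
staged_acc = list(staged_model.staged_score(Xd_test, yd_test))

# Print a few checkpoints; boosting can stop early, so guard the index.
for rounds in (1, 10, 50, 100, 200):
    if rounds <= len(staged_acc):
        print(f"After {rounds:3d} estimators: {staged_acc[rounds - 1] * 100:.2f}%")
```

If the printed accuracy levels off early, adding more estimators at that learning rate is unlikely to help much on that data.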


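The three AdaBoost models produced slightly different accuracies on the test set. Model 1, with 50 estimators and a learning rate of 1.0, achieved an accuracy of 97.01%. Model 2, which increased the number of estimators to 200 and lowered the learning rate to 0.5, performed best with an accuracy of 98.51%. Model 3 used even more estimators (400) but a much smaller learning rate of 0.1, resulting in the same accuracy as Model 1 (97.01%). These results suggest that simply increasing the number of estimators does not guarantee better performance. The learning rate scales how much each new estimator contributes, so a smaller learning rate generally needs more estimators to reach the same fit, and the two settings have to be tuned together. Because both settings changed from one model to the next, these runs cannot separate the effect of either one alone. The gap is also small: 97.01% versus 98.51% is what one additional correct prediction would produce on a test set of 67 rows (65 of 67 versus 66 of 67).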
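To look at the number of estimators and the learning rate separately, one option is to try every combination of the values used above and score each with cross-validation, which is less sensitive to a single train/test split. This is a minimal sketch, not the method used in this notebook: the iris data is a stand-in for the penguins file, and the 5-fold setup and the names `X_demo`, `cv`, and `sweep_results` are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Stand-in data (assumption): scikit-learn's built-in iris set, not the penguins CSV.
X_demo, y_demo = load_iris(return_X_y=True)

# 5-fold stratified cross-validation with a fixed seed so reruns give the same folds.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

sweep_results = {}
for n_estimators in (50, 200, 400):
    for learning_rate in (1.0, 0.5, 0.1):
        # Default base learner is a depth-1 decision tree, as in the models above.
        model = AdaBoostClassifier(
            n_estimators=n_estimators,
            learning_rate=learning_rate,
            random_state=42,
        )
        scores = cross_val_score(model, X_demo, y_demo, cv=cv, scoring="accuracy")
        sweep_results[(n_estimators, learning_rate)] = (scores.mean(), scores.std())
        print(
            f"n_estimators={n_estimators:3d}, learning_rate={learning_rate:.1f}: "
            f"{scores.mean() * 100:.2f}% (+/- {scores.std() * 100:.2f})"
        )
```

Reading across a row (fixed number of estimators) or down a column (fixed learning rate) shows which setting is responsible for a change in accuracy.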

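The Bagging model achieved 98.51% accuracy, matching AdaBoost's best result. Bagging reached this with a single configuration (100 trees of depth 5), while AdaBoost's accuracy moved between 97.01% and 98.51% as its settings changed. Since only one Bagging configuration was tested, this comparison shows that Bagging performs well here without tuning, but it does not show how Bagging behaves across different settings.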
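A way to check how Bagging responds to its own settings would be to vary the tree depth and the number of trees in the same manner as was done for AdaBoost. The sketch below is an illustration only, under stated assumptions: it uses the built-in iris data rather than the penguins file, and the chosen depths, tree counts, and the names `X_demo`, `cv`, and `bagging_results` are hypothetical.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (assumption): scikit-learn's built-in iris set, not the penguins CSV.
X_demo, y_demo = load_iris(return_X_y=True)

# 5-fold stratified cross-validation with a fixed seed so reruns give the same folds.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

bagging_results = {}
for max_depth in (1, 3, 5):
    for n_estimators in (10, 100):
        # The base tree is passed as the first positional argument.
        model = BaggingClassifier(
            DecisionTreeClassifier(max_depth=max_depth),
            n_estimators=n_estimators,
            random_state=42,
        )
        scores = cross_val_score(model, X_demo, y_demo, cv=cv, scoring="accuracy")
        bagging_results[(max_depth, n_estimators)] = scores.mean()
        print(
            f"max_depth={max_depth}, n_estimators={n_estimators:3d}: "
            f"{scores.mean() * 100:.2f}%"
        )
```

If the accuracies stay close together across these rows, that would support describing Bagging as stable on the data tested. A large spread would suggest it needs tuning too.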