<a href="https://colab.research.google.com/github/wingated/cs473/blob/main/labs/cs473_lab_week_11.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a><p><b>After clicking the "Open in Colab" link, copy the notebook to your own Google Drive before getting started, or it will not save your work</b></p>

# BYU CS 473 Lab Week 11

## Introduction:

In this lab you will be comparing different boosting techniques, specifically the AdaBoost and LogitBoost algorithms.

AdaBoost is an ensemble learning algorithm that combines many weak classifiers into a strong one. It trains a sequence of simple models one after another, and each new model focuses more on the training instances that earlier models misclassified. All training samples start with equal weights. After each round, the weights of misclassified samples are increased, so the next base classifier concentrates on the harder cases. This adaptive reweighting mainly reduces bias, and the resulting ensemble is often more accurate and robust than any single weak classifier. AdaBoost combines the weak learners' predictions with a weighted vote, in which a learner's weight grows as its weighted error shrinks.
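You can check the reweighting step by hand on a hypothetical round. The sketch below uses the unscaled learner weight α = log((1 − err)/err) and multiplies the weight of each misclassified sample by exp(α). The AdaBoost cell later in this notebook instead uses 0.5·log(...) with a factor of 2 in the exponent, and that produces the same sample weights. All numbers here are made up for illustration.

```python
import numpy as np

# Hypothetical boosting round: 10 equally weighted samples, 3 of which the
# current weak learner misclassifies.
w = np.full(10, 0.1)
incorrect = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 0], dtype=bool)

# Weighted error of this learner.
err = w[incorrect].sum() / w.sum()

# Learner weight: larger when the weighted error is smaller.
alpha = np.log((1 - err) / err)

# Multiply misclassified weights by exp(alpha) = (1 - err) / err,
# leave correctly classified ones unchanged, then renormalize to sum to 1.
w = w * np.exp(alpha * incorrect)
w /= w.sum()

print(f"err = {err:.2f}, alpha = {alpha:.3f}")
print(f"total weight on misclassified samples: {w[incorrect].sum():.3f}")
```

After the update, the misclassified samples hold exactly half of the total weight (this prints 0.500). Under the new weights, the previous learner is therefore no better than chance, which forces the next stump to look at the hard cases.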

LogitBoost is another boosting algorithm for classification that builds an ensemble of weak learners, but it is designed to estimate class probabilities. AdaBoost minimizes the exponential loss. LogitBoost minimizes the logistic (log) loss instead, which makes it a better fit for tasks where well-calibrated probability estimates matter. LogitBoost fits its weak learners stage-wise. At each boosting round, it computes pseudo-residuals called "working responses" and matching sample weights from the current model's probability estimates. It then fits the base learner to those responses by weighted least squares, and each round refines the ensemble's probability estimates. The logistic loss grows more slowly than the exponential loss on badly misclassified points, so LogitBoost tends to be more robust to noisy labels and outliers than AdaBoost. Its probability outputs can also feed into downstream decisions or serve as input to other models. This makes LogitBoost a strong choice when you want both high accuracy and meaningful probability estimates.
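One common way to see why LogitBoost is less sensitive to outliers is to compare the two losses as a function of the margin $m = y f(x)$, with $y \in \{-1, +1\}$. The sketch assumes the parameterization $\pi = 1/(1 + e^{-2f})$ used in the LogitBoost cell below. Under that parameterization the logistic loss is $\log(1 + e^{-2m})$. The margins are arbitrary example values.

```python
import numpy as np

# Margin m = y * f(x) with y in {-1, +1}: positive means correctly classified,
# large negative means confidently wrong (e.g. a mislabeled point).
margins = np.array([-3.0, -1.0, 0.0, 2.0])

# AdaBoost's exponential loss.
exp_loss = np.exp(-margins)

# Logistic loss log(1 + exp(-2m)), computed stably via logaddexp(0, x) = log(1 + e^x).
log_loss = np.logaddexp(0.0, -2.0 * margins)

for m, e, l in zip(margins, exp_loss, log_loss):
    print(f"margin {m:+.1f}: exponential {e:7.3f}   logistic {l:7.3f}")
```

At the most negative margin, the exponential loss is already more than three times the logistic loss, and the gap widens as the margin decreases. The exponential loss grows exponentially in $|m|$, while the logistic loss grows roughly like $2|m|$, so a mislabeled point pulls much harder on AdaBoost's sample weights.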

See Sections 18.5.3 and 18.5.4 of the textbook for more details on these algorithms.

---
## Grading standards   

Your notebook will be graded on the following:

* 40% Correct implementation of the AdaBoost algorithm
* 40% Correct implementation of the LogitBoost algorithm
* 20% Analysis of the different methods
---

#### Imports and Data Preprocessing

In [1]:
from datasets import load_dataset
import numpy as np  # used by the AdaBoost and LogitBoost cells below
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
from sklearn.metrics import accuracy_score, classification_report

data = load_dataset("AiresPucrs/adult-census-income", split="train").to_pandas()

data.replace('?', pd.NA, inplace=True)
data = data.dropna()

# Data cleaning: label-encode every categorical (object-dtype) column, including the target
categoricals = data.columns[data.dtypes == object].values
encoders = {col: LabelEncoder() for col in categoricals}
for col in categoricals:
    data[col] = encoders[col].fit_transform(data[col])

target_col = data.columns[-1]

# Use this data for the single classifier and LogitBoost
X, y = data.iloc[:, :-1], data.iloc[:, -1]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Change target labels to -1 and 1 instead of 0 and 1 to work with AdaBoost algorithm
# Use this data for AdaBoost
data[target_col] = data[target_col].replace({0: -1, 1: 1})
X, y = data.iloc[:, :-1], data.iloc[:, -1]
X_train_adaboost, X_test_adaboost, y_train_adaboost, y_test_adaboost = train_test_split(X, y, test_size=0.2, random_state=42)
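For {0, 1} labels, the `.replace({0: -1, 1: 1})` call above is equivalent to the arithmetic map $2y - 1$, and $(y + 1)/2$ maps back. Mapping back can be handy when comparing AdaBoost's {-1, +1} predictions against `y_test`. The toy labels below are hypothetical.

```python
import numpy as np

# Hypothetical {0, 1} labels, as produced by LabelEncoder above.
y01 = np.array([0, 1, 1, 0, 1])

# {0, 1} -> {-1, +1}: same result as .replace({0: -1, 1: 1}).
y_pm = 2 * y01 - 1

# {-1, +1} -> {0, 1}: integer division keeps the result integer-valued.
y_back = (y_pm + 1) // 2

print(y_pm)    # [-1  1  1 -1  1]
print(y_back)  # [0 1 1 0 1]
```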



### Single Classifier

Below is a single decision tree classifier fit to the training data. It is a depth-1 "stump", the same weak learner the boosting methods below will use. Run the cell to see how it does on this dataset.

In [2]:
clf = DecisionTreeClassifier(max_depth=1, random_state=42)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)

# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Single Decision Tree Classifier Accuracy: {accuracy:.4f}")

# Detailed classification report with zero_division parameter
print(classification_report(y_test, y_pred, zero_division=0))

Single Decision Tree Classifier Accuracy: 0.7514
              precision    recall  f1-score   support

           0       0.75      1.00      0.86      4533
           1       0.00      0.00      0.00      1500

    accuracy                           0.75      6033
   macro avg       0.38      0.50      0.43      6033
weighted avg       0.56      0.75      0.64      6033
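The report shows that the stump never predicts class 1 (recall 0.00). A quick check that uses only the support counts printed above confirms that its accuracy equals the majority-class rate:

```python
# Support counts copied from the classification report above (test split).
support = {0: 4533, 1: 1500}
n_test = sum(support.values())

# A model that always predicts class 0 is correct exactly on the class-0 samples.
majority_acc = support[0] / n_test
print(f"Majority-class baseline accuracy: {majority_acc:.4f}")
```

This prints 0.7514, identical to the stump's accuracy. On this imbalanced dataset, a single depth-1 tree is therefore no better than always guessing the majority class.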



### Implement AdaBoost

Following Algorithm 18.1 on page 616 of the textbook, implement the AdaBoost algorithm. For the base classifier, use sklearn.tree.DecisionTreeClassifier with max_depth=1, as in the single-classifier example above. After you finish your AdaBoost implementation, get your model's predictions on X_test_adaboost and print the accuracy and classification report, as was done for the single classifier.

![Adaboost Algorithm](https://raw.githubusercontent.com/wingated/cs473/main/labs/images/lab_week_11_adaboost.png)

Hints:

* AdaBoost relies on class labels in {+1, -1}. To handle this, use X_train_adaboost/y_train_adaboost, whose labels have already been mapped to {-1, +1}.

* w should be shape (N,) with one value per datapoint.

* When computing alpha, add a small epsilon term (1e-10) to the denominator for numerical stability.

* For each iteration, you should save your classifier and your alpha to use for the final prediction (last line of the pseudocode).

* The sgn[] in the last line means you should take the sign of the weighted sum inside the brackets.
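Two of the hints above guard against edge cases, and both are easy to reproduce on hypothetical values. Without epsilon, a stump with zero weighted error gives an infinite alpha. Separately, `np.sign` returns 0 when the weighted vote is exactly zero. The tie-break toward +1 shown here is an arbitrary choice, not something the pseudocode specifies.

```python
import numpy as np

epsilon = 1e-10

# Edge case 1 (hypothetical): a stump with zero weighted error.
err_m = np.float64(0.0)
with np.errstate(divide="ignore"):
    alpha_unsafe = 0.5 * np.log((1 - err_m) / err_m)         # 1/0 -> inf
alpha_safe = 0.5 * np.log((1 - err_m) / (err_m + epsilon))   # about 11.5, finite

# Edge case 2: np.sign maps an exactly-zero weighted vote to 0,
# which is neither -1 nor +1 and would always count as an error.
scores = np.array([-0.3, 0.0, 0.2])
signs = np.sign(scores)

# One possible tie-break (an arbitrary choice): send ties to +1.
labels = np.where(scores >= 0, 1, -1)

print(alpha_unsafe, alpha_safe)
print(signs, labels)
```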

In [8]:
# Parameters
M = 50  # number of boosting rounds
epsilon = 1e-10  # small term for numerical stability
N = X_train_adaboost.shape[0]

# Initialize weights
w = np.ones(N) / N

# Lists to store classifiers and their weights
classifiers = []
alphas = []

# AdaBoost algorithm
for m in range(M):
    # Fit a weak learner with current weights
    clf = DecisionTreeClassifier(max_depth=1, random_state=42)
    clf.fit(X_train_adaboost, y_train_adaboost, sample_weight=w)
    y_pred_train = clf.predict(X_train_adaboost)

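    # Compute weighted error (compare as NumPy arrays so w stays an ndarray)
    incorrect = (y_pred_train != y_train_adaboost.to_numpy())
    err_m = np.sum(w * incorrect) / np.sum(w)

    # Compute alpha (half the log-odds; the factor of 2 in the update compensates)
    alpha_m = 0.5 * np.log((1 - err_m + epsilon) / (err_m + epsilon))

    # Update weights: exp(2 * alpha_m) = (1 - err_m) / err_m (up to epsilon) for
    # misclassified points, the same reweighting as alpha = log((1 - err) / err)
    w = w * np.exp(alpha_m * incorrect * 2)  # incorrect=1 if misclassified, 0 otherwise
    w /= np.sum(w)  # normalize weights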

    # Save classifier and alpha
    classifiers.append(clf)
    alphas.append(alpha_m)

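# Final prediction function: sign of the alpha-weighted vote
def adaboost_predict(X):
    final_pred = np.zeros(X.shape[0])
    for clf, alpha in zip(classifiers, alphas):
        final_pred += alpha * clf.predict(X)
    # np.sign would return 0 on an exact tie; map ties to +1 so every prediction is in {-1, +1}
    return np.where(final_pred >= 0, 1, -1)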

# Predictions on test set
y_pred_adaboost_manual = adaboost_predict(X_test_adaboost)

# Evaluate
accuracy = accuracy_score(y_test_adaboost, y_pred_adaboost_manual)
print(f"Manual AdaBoost Accuracy: {accuracy:.4f}")

print(classification_report(y_test_adaboost, y_pred_adaboost_manual, zero_division=0))

Manual AdaBoost Accuracy: 0.8419
              precision    recall  f1-score   support

          -1       0.87      0.93      0.90      4533
           1       0.73      0.57      0.64      1500

    accuracy                           0.84      6033
   macro avg       0.80      0.75      0.77      6033
weighted avg       0.83      0.84      0.84      6033
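The ensemble is a running weighted sum, so you can read off test accuracy after every number of rounds without refitting. The sketch below uses a hypothetical example with 3 rounds and 4 samples. `H`, `alphas_toy`, and `y_true` are made-up stand-ins.

```python
import numpy as np

# Hypothetical per-round stump predictions H[m, i] in {-1, +1}
# (3 rounds x 4 samples), their learner weights, and the true labels.
H = np.array([[ 1,  1, -1, -1],
              [ 1, -1,  1, -1],
              [-1,  1,  1,  1]])
alphas_toy = np.array([0.4, 1.0, 0.3])
y_true = np.array([1, -1, 1, -1])

# Cumulative weighted vote after 1, 2, ..., M rounds (one row per round).
staged_scores = np.cumsum(alphas_toy[:, None] * H, axis=0)
staged_preds = np.where(staged_scores >= 0, 1, -1)

# Accuracy of the ensemble truncated after each round.
staged_acc = (staged_preds == y_true).mean(axis=1)
print(staged_acc)
```

With the fitted model above, `H` could be built as `np.vstack([clf.predict(X_test_adaboost) for clf in classifiers])` and `np.array(alphas)` used in place of `alphas_toy`. That would show how test accuracy changes as M grows.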



### Implement LogitBoost

Following Algorithm 18.2 on page 617 of the textbook, implement the LogitBoost algorithm. As your base learner, use sklearn.tree.DecisionTreeRegressor with max_depth=1.

![LogitBoost Algorithm](https://raw.githubusercontent.com/wingated/cs473/main/labs/images/lab_week_11_logitboost.png)

Hints:

* LogitBoost uses class labels in {0, 1}. X_train/y_train are already encoded this way.

* w will again be shape (N,), as will pi.

* Similar to alpha in AdaBoost, when computing z, add a small epsilon term (1e-10) to the denominator for numerical stability.

* Calculating $F_{m}$ will be as simple as using the DecisionTreeRegressor's fit function (using X and z for your data, and w for the sample_weight).

* $f(x)$ will initially be all zeros of shape (N,).

* Again, save each of your regressors for the final prediction step.
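The first round is easy to check by hand. With $\pi_i = 0.5$, every working response is $z_i = \pm 2$ and every weight is $0.25$. The labels below are hypothetical. The "perfect fit" step assumes, purely for illustration, a regressor that reproduces $z$ exactly.

```python
import numpy as np

epsilon = 1e-10

# Hypothetical {0, 1} labels and the initial probabilities pi = 0.5.
y = np.array([1, 0, 1, 1])
pi = np.full(y.shape, 0.5)

# Working response and weights for one LogitBoost round.
z = (y - pi) / (pi * (1 - pi) + epsilon)
w = pi * (1 - pi)

# Best case for illustration: the regressor's predictions equal z exactly.
f = 0.5 * z
pi_new = 1.0 / (1.0 + np.exp(-2.0 * f))

print(z, w)
print(np.round(pi_new, 4))
```

After this idealized round, each probability moves from 0.5 toward its label (about 0.881 for class 1 and 0.119 for class 0).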

In [9]:
# Parameters
M = 50  # number of boosting rounds
epsilon = 1e-10
N = X_train.shape[0]  # for {0,1} labels

# Initialize F(x) and pi
f = np.zeros(N)
pi = np.full(N, 0.5)

# Lists to store regressors
regressors = []

# LogitBoost algorithm
for m in range(M):
    # Compute working response z
    z = (y_train - pi) / (pi * (1 - pi) + epsilon)

    # Compute weights w
    w = pi * (1 - pi)

    # Fit base learner to weighted least squares
    reg = DecisionTreeRegressor(max_depth=1)
    reg.fit(X_train, z, sample_weight=w)

    # Update F(x)
    f += 0.5 * reg.predict(X_train)

    # Update pi
    pi = 1 / (1 + np.exp(-2 * f))

    # Save the regressor
    regressors.append(reg)

# Final prediction function
def logitboost_predict(X):
    f_pred = np.zeros(X.shape[0])
    for reg in regressors:
        f_pred += 0.5 * reg.predict(X)
    # Convert to binary predictions {0,1}
    return (f_pred > 0).astype(int)

# Predictions on test set
y_pred_logitboost = logitboost_predict(X_test)

# Evaluate
accuracy = accuracy_score(y_test, y_pred_logitboost)
print(f"Manual LogitBoost Accuracy: {accuracy:.4f}")

print(classification_report(y_test, y_pred_logitboost, zero_division=0))


Manual LogitBoost Accuracy: 0.8546
              precision    recall  f1-score   support

           0       0.88      0.94      0.91      4533
           1       0.76      0.61      0.68      1500

    accuracy                           0.85      6033
   macro avg       0.82      0.77      0.79      6033
weighted avg       0.85      0.85      0.85      6033
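The introduction stresses LogitBoost's probability estimates. The score $f(x)$ accumulated in `logitboost_predict` maps to $P(y = 1 \mid x) = 1/(1 + e^{-2f(x)})$, the same formula used to update `pi` during training. The helper name `logitboost_proba_from_scores` is hypothetical. It uses the identity $1/(1 + e^{-2f}) = (1 + \tanh f)/2$ to avoid overflow for large $|f|$.

```python
import numpy as np

def logitboost_proba_from_scores(f):
    """Map LogitBoost scores f(x) to P(y = 1 | x) = 1 / (1 + exp(-2 f)).

    Uses the identity 1 / (1 + exp(-2f)) = (1 + tanh(f)) / 2,
    which stays finite for very large |f|.
    """
    return 0.5 * (1.0 + np.tanh(f))

# Hypothetical scores, e.g. the f_pred accumulated inside logitboost_predict.
f_scores = np.array([-1.0, 0.0, 0.5, 800.0])
p = logitboost_proba_from_scores(f_scores)
print(np.round(p, 4))
```

Thresholding these probabilities at 0.5 reproduces `logitboost_predict`, because $p > 0.5$ exactly when $f > 0$.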



### Analysis

Compare the accuracy of each model (single decision tree classifier, AdaBoost, and LogitBoost) and describe what you learned about the effectiveness of boosting methods (2-3 sentences).

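The single decision stump scored lowest at 75.14%, which is exactly the majority-class rate. Its report shows 0.00 recall on class 1, so it predicts class 0 for every test example. AdaBoost reached 84.19% (+9.05 points over the stump), and LogitBoost was best at 85.46% (+10.32 points over the stump, +1.27 over AdaBoost). Combining 50 depth-1 stumps, when a single stump alone is no better than always predicting the majority class, yields a much stronger classifier. Most of the gain comes from recovering the minority class: class-1 recall is 0.57 for AdaBoost and 0.61 for LogitBoost.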



| Algorithm     | Strengths                            | Weaknesses                                                     | Best Use                                                     |
| ------------- | ------------------------------------ | -------------------------------------------------------------- | ------------------------------------------------------------ |
| Decision Tree | Simple, interpretable, fast          | Deep trees overfit; a depth-1 stump underfits (here it predicted only the majority class) | Baseline, feature understanding                              |
| AdaBoost      | High accuracy, reduces bias          | Sensitive to outliers, may overfit, not probability-calibrated | Standard classification, clean datasets                      |
| LogitBoost    | Probability outputs, robust to noise | Slower, more complex                                           | Classification with probability needs, imbalanced/noisy data |
