# Exercise 2: Boosting

## Do not start the exercise until you fully understand the submission guidelines.


* The homework assignments are executed automatically. 
* Failure to comply with the following instructions will result in a significant penalty. 
* Appeals regarding your failure to read these instructions will be denied. 
* Kind reminder: the homework assignments contribute 60% of the final grade.


## Read the following instructions carefully:

1. This Jupyter notebook contains all the step-by-step instructions needed for this exercise.
1. Write **efficient**, **vectorized** code whenever possible. Some calculations in this exercise may take several minutes when implemented efficiently, and might take much longer otherwise. Unnecessary loops will result in point deductions.
1. You are responsible for the correctness of your code and should add as many tests as you see fit to this jupyter notebook. Tests will not be graded nor checked.
1. You are allowed to use functions and methods from the [Python Standard Library](https://docs.python.org/3/library/).
1. Your code must run without errors. Use at least `numpy` 1.15.4. Any code that cannot run will not be graded.
1. Write your own code. Cheating will not be tolerated.
1. Submission includes a zip file that contains this notebook, with your ID as the file name. For example, `hw1_123456789_987654321.zip` if you submitted in pairs and `hw1_123456789.zip` if you submitted the exercise alone. The name of the notebook should follow the same structure.
   
Please use only a **zip** file in your submission.

---
---

## Please sign that you have read and understood the instructions: 

### *** YOUR RUNI EMAILS HERE ***
---
---


In [None]:
# Import necessary libraries
import numpy as np
from sklearn.datasets import make_classification, make_gaussian_quantiles, make_moons
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix, roc_curve, auc
import matplotlib.pyplot as plt

from sklearn.tree import DecisionTreeClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.decomposition import PCA
from itertools import product
import pandas as pd

np.random.seed(42)

# Design your algorithm
Make sure to describe the algorithm, its limitations, and describe use-cases.

Our design begins by initializing every sample with equal weight. In each of the T rounds, we train a simple decision stump $h$ on the weighted data, compute its weighted error $\epsilon$, and assign the stump a weight
$\alpha \;=\;\tfrac{1}{2}\ln\!\bigl(\tfrac{1-\epsilon}{\epsilon}\bigr)$.
We then update every sample weight:
$w_i \leftarrow \frac{w_i \cdot e^{-\alpha y_i h(x_i)}}{2\sqrt{\epsilon\cdot\left(1-\epsilon\right)}}$.
This boosts the weights of misclassified points and lowers those of correctly classified ones. After T iterations, the final classifier is
$H(x)\;=\;\mathrm{sign}\Bigl(\sum_{t=1}^T\alpha_t\,h_t(x)\Bigr)$,
a weighted majority vote over all stumps.
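
To make one boosting round concrete, here is a minimal sketch on a hand-made toy example. The labels, stump predictions, and variable names are hypothetical and chosen only so the arithmetic is easy to follow; it is not the implementation used later in this notebook.

```python
import numpy as np

# Hypothetical toy data: 5 samples with labels in {-1, +1}
y = np.array([1, 1, -1, -1, 1])
h = np.array([1, -1, -1, -1, 1])   # stump predictions; only sample 1 is wrong
w = np.full(5, 1 / 5)              # uniform initial weights

# Weighted error: total weight of the misclassified samples
eps = np.sum(w[h != y])

# Stump weight (learning rate assumed to be 1)
alpha = 0.5 * np.log((1 - eps) / eps)

# Weight update followed by the closed-form normalizer
w_new = w * np.exp(-alpha * y * h) / (2 * np.sqrt(eps * (1 - eps)))

print(eps, alpha, w_new, w_new.sum())
```

Here $\epsilon = 0.2$ and $\alpha = \ln 2$, so the single misclassified sample ends up with weight 0.5 and each correctly classified sample with weight 0.125: after the update, the misclassified points always carry half of the total weight.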

Key points of the design include the number of boosting rounds T, the learning rate (which scales each $\alpha$), and two stability mechanisms: a small constant added to $\epsilon$ in the denominator, which avoids a division by zero when a stump classifies everything correctly, and early stopping when $\epsilon \ge 0.5$. A smaller learning rate with more rounds yields smoother convergence, while a larger rate learns faster but can overfit noisy data. Balancing these settings controls accuracy, robustness to noise, and computational efficiency.

The main limitations of the algorithm are its tendency to overfit noisy or mislabeled data, because misclassified points keep gaining weight, and its slow running time when too many rounds (T) are used. The algorithm works well on clean, structured data where weak learners can gradually refine the decision boundary. It also works well for classification problems with a small number of features.

# Your implementations
You may add new cells, write helper functions or test code as you see fit.
Please use the cell below and include a description of your implementation.
Explain code design consideration, algorithmic choices and any other details you think is relevant to understanding your implementation.
Failing to explain your code will lead to point deductions.

In this implementation of AdaBoost, we followed the core principles of the algorithm as taught in class.

As described in the design, we start by giving equal weight to all n samples ($w_i = \tfrac{1}{n}$). Then, for each of up to T rounds, we train a decision stump (`DecisionTreeClassifier(max_depth=1)`) using those weights. After predicting on the training set, we compute the weighted error:
$\epsilon = \sum_{h_t(x_i) \neq y_i} w_i$.

As explained in the design, we update each sample’s weight by multiplying it by $\exp(-\alpha\,y_i\,h(x_i))$, which increases weights for misclassified points and decreases weights for correctly classified points. Then we normalize so that the weight vector sums to one; with a learning rate of 1 the normalizer is $2\sqrt{\epsilon(1-\epsilon)}$. Early stopping when $\epsilon \ge 0.5$ prevents adding a stump that’s no better than random.
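
The value of the normalizer can be checked with a short derivation, sketched here under two assumptions: the current weights sum to one, and the learning rate is 1 so that $\alpha = \tfrac{1}{2}\ln\tfrac{1-\epsilon}{\epsilon}$ exactly. Correctly classified points have $y_i h(x_i) = +1$ and total weight $1-\epsilon$, while misclassified points have $y_i h(x_i) = -1$ and total weight $\epsilon$:

$$Z \;=\; \sum_i w_i\, e^{-\alpha y_i h(x_i)} \;=\; (1-\epsilon)\,e^{-\alpha} + \epsilon\,e^{\alpha} \;=\; (1-\epsilon)\sqrt{\tfrac{\epsilon}{1-\epsilon}} + \epsilon\sqrt{\tfrac{1-\epsilon}{\epsilon}} \;=\; 2\sqrt{\epsilon(1-\epsilon)}$$

With a learning rate other than 1 this identity no longer holds, and the weights would need to be divided by their actual sum instead.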

Prediction simply aggregates each stump’s vote weighted by its $\alpha$:
$H(x)=\mathrm{sign}\Bigl(\sum_t \alpha_t\,h_t(x)\Bigr)$.

Our implementation includes a predict_proba method that turns the weighted sum of stumps into probabilities using a simple logistic sigmoid, so we can later plot ROC curves and compute AUC. The main hyperparameters we have are the number of rounds (T) and learning rate.
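
The margin-to-probability mapping can be sketched on its own as follows. The helper name `margin_to_proba` is hypothetical. The factor 2 assumes the usual reading of the AdaBoost margin $F(x)=\sum_t \alpha_t h_t(x)$ as roughly half the log-odds of the positive class, so the outputs should be treated as ranking scores rather than calibrated probabilities.

```python
import numpy as np

def margin_to_proba(margin):
    """Map margins F(x) to rows of [P(y = -1), P(y = +1)] (hypothetical helper)."""
    margin = np.asarray(margin, dtype=float)
    # Logistic sigmoid of 2 * F(x): large positive margins approach 1
    p_pos = 1.0 / (1.0 + np.exp(-2.0 * margin))
    return np.column_stack([1.0 - p_pos, p_pos])

# A negative, a zero, and a positive margin
probs = margin_to_proba([-3.0, 0.0, 3.0])
print(probs)
```

A zero margin maps to a probability of 0.5 for each class, and the mapping is monotone in the margin, which is all that ROC curves and AUC depend on.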

In [None]:
# Your code here
# Part 1: Implementing AdaBoost
class AdaBoostCustom:
    def __init__(self, T=10, epsilon=1e-10, learning_rate=1.0):
        self.T = T
        self.alphas = []
        self.models = []
        # Note: You may add more attributes
        self.epsilon = epsilon    
        self.learning_rate = learning_rate  

    def fit(self, X, y):
        n_samples, n_features = X.shape

        # init weights
        w = np.ones(n_samples) / n_samples

        for t in range(self.T):
            # using a weak classifier (decision stump)
            model = DecisionTreeClassifier(max_depth=1)
            model.fit(X, y, sample_weight=w)
            y_pred = model.predict(X)

            # compute the error
            err = np.sum(w * (y_pred != y)) / np.sum(w)

            if err >= 0.5:
                break

            # compute alpha
            alpha = self.learning_rate * 0.5 * np.log((1 - err) / (err + self.epsilon))

            self.alphas.append(alpha)
            self.models.append(model)

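            # update weights
            # (increase weights for misclassified samples,
            # decrease weights for correctly classified samples)
            w *= np.exp(-alpha * y * y_pred)
            # renormalize so the weights sum to one; this matches dividing by
            # 2 * sqrt(err * (1 - err)) when learning_rate == 1 and stays
            # correct for other learning rates
            w /= np.sum(w)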
    
    def predict(self, X):  
        y_pred = np.zeros(X.shape[0])

        # get predictions from all models
        for alpha, model in zip(self.alphas, self.models):
            y_pred += alpha * model.predict(X)

        # return the sign of the accumulated predictions
        return np.sign(y_pred)
    
    def predict_proba(self, X):
        # approximate probabilities via sigmoid of margin
        y_pred = np.zeros(X.shape[0])

        for alpha, model in zip(self.alphas, self.models):
            y_pred += alpha * model.predict(X)

        # convert margin to probabilities using sigmoid function
        probs = 1 / (1 + np.exp(-2 * y_pred))

        return np.vstack([1 - probs, probs]).T

# Generate data
Please use the cell below to discuss your dataset choice and why it is appropriate (or not) for this algorithm.

We use `make_gaussian_quantiles` to draw two Gaussian blobs in 2D, one per class, and stack them into a single two-class dataset. This dataset is appropriate for AdaBoost because it is low-dimensional and has some class overlap, which tests the algorithm’s ability to improve decision boundaries through boosting. The 2D format also makes the results easy to visualize. Labels are converted to $\pm 1$ to match the label format our AdaBoost implementation expects.

In [None]:
# create gaussian quantiles dataset
# gaussian one
X1, Y1 = make_gaussian_quantiles(
    n_samples=2000,
    n_features=2,
    n_classes=1,
    mean=[-0.6, -0.6],
    cov=0.4,
    random_state=42
)
Y1 = np.zeros(len(Y1))  # All class 0

# gaussian two 
X2, Y2 = make_gaussian_quantiles(
    n_samples=2000,
    n_features=2,
    n_classes=1,
    mean=[0.6, 0.6],
    cov=0.4,
    random_state=123
)
Y2 = np.ones(len(Y2))   # All class 1

# Combine the two gaussians
X = np.vstack([X1, X2])
Y = np.hstack([Y1, Y2])

# convert labels to -1 and 1
Y = np.where(Y == 1, 1, -1)

# split the dataset into training and testing sets
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

# AdaBoost demonstration 
Demonstrate your AdaBoost implementation.

Add plots and figures. 

Please use the cell below to describe your results and tests.

Describe the difference between your implementation and the sklearn implementation. Hint: you can look at the documentation.

We train and evaluate our custom AdaBoost implementation and compare it to sklearn’s AdaBoostClassifier. We use the same dataset and compare their performance using accuracy, ROC curves, and decision boundary plots. (The accuracy is shown in the top-left corner of each plot.)

Our custom AdaBoost uses decision stumps and updates sample weights in the same spirit as sklearn’s AdaBoostClassifier. Both use a learning rate to control each weak learner’s impact. Our custom model stops early if the weighted error reaches 0.5, while sklearn stops on a perfect fit (zero error) or after the maximum number of rounds. Sklearn has extra optimizations and handles sample weights more flexibly. Our custom version is simpler, while sklearn is made for more general use. (info from sklearn's docs)
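
One way to see how close the two update rules are is the sketch below. It assumes, as our reading of the discrete SAMME rule and not something verified against a specific sklearn version, that the two-class SAMME estimator weight is $\ln\frac{1-\epsilon}{\epsilon}$ (twice our $\alpha$ at learning rate 1) and that only misclassified samples are multiplied by its exponential. The data and variable names are hypothetical.

```python
import numpy as np

# Hypothetical toy round: labels, stump predictions, current weights
y = np.array([1, 1, -1, -1, 1, -1])
h = np.array([1, -1, -1, 1, 1, -1])   # samples 1 and 3 are misclassified
w = np.array([0.1, 0.2, 0.1, 0.1, 0.3, 0.2])

eps = np.sum(w[h != y])   # weighted error

# Rule used in this notebook: scale every sample by exp(-alpha * y * h)
alpha = 0.5 * np.log((1 - eps) / eps)
w_custom = w * np.exp(-alpha * y * h)
w_custom /= w_custom.sum()

# SAMME-style rule (assumed form): scale only the misclassified samples
alpha_samme = np.log((1 - eps) / eps)
w_samme = w * np.exp(alpha_samme * (h != y))
w_samme /= w_samme.sum()

print(np.allclose(w_custom, w_samme))
```

Under these assumptions both rules give the same normalized weights, because in each case a misclassified sample is scaled by $e^{2\alpha}$ relative to a correctly classified one. Any remaining differences between the two models would then come from other details, such as stopping criteria and how probabilities are computed.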

We also perform a grid search over different numbers of estimators and learning rates for the custom model to find the best configuration, showing how hyperparameter tuning affects accuracy.

Then we plotted ROC curves and decision boundaries to illustrate the differences in classifier confidence and decision surfaces between the two implementations on both training and test data.

As seen in the decision boundary plots, both implementations classify and separate the classes well, with accuracy above 90%, which indicates that the custom algorithm was implemented correctly.
In the ROC plots, we observe AUC scores close to 1, suggesting excellent performance in distinguishing between the two classes. This further confirms that the custom AdaBoost behaves similarly to the Scikit-learn implementation in both accuracy and confidence of predictions.

In [None]:
# helper function to plot decision boundary
def plot_decision_boundary(model, X, Y, acc=None, title="", pca=None, ax=None):
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 300),
                         np.linspace(y_min, y_max, 300))
    grid = np.c_[xx.ravel(), yy.ravel()]
    # map the 2D grid back to the original feature space when a PCA is given
    grid_input = pca.inverse_transform(grid) if pca is not None else grid
    Z = model.predict(grid_input).reshape(xx.shape)

    if ax:
        ax.contourf(xx, yy, Z, alpha=0.2, cmap='bwr')
        ax.scatter(X[:, 0], X[:, 1], c=Y, cmap='bwr', edgecolor='k', s=20)
        if acc is not None:
            ax.text(x_min, y_max, f"Accuracy: {acc:.2f}", fontsize=12,
            verticalalignment='top', bbox=dict(boxstyle="round", facecolor='white', alpha=0.7))
        ax.set_title(title)
        ax.set_xticks([])
        ax.set_yticks([])
    else:
        plt.figure(figsize=(10, 6))
        plt.contourf(xx, yy, Z, alpha=0.2, cmap='bwr')
        plt.scatter(X[:, 0], X[:, 1], c=Y, cmap='bwr', edgecolor='k', s=20)
        if acc is not None:
            plt.text(x_min, y_max, f"Accuracy: {acc:.2f}", fontsize=12,
                     verticalalignment='top', bbox=dict(boxstyle="round", facecolor='white', alpha=0.7))
        plt.title(title)
        plt.xlabel("Feature 0")
        plt.ylabel("Feature 1")
        plt.grid(True)
        plt.tight_layout()
        plt.show()

In [None]:
# HYPERPARAMETER TUNING

estimator_range = [1, 5, 10, 25, 50, 75, 100]
learning_rates = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
results = []

# grid search over T and learning_rate
for T, lr in product(estimator_range, learning_rates):
    model = AdaBoostCustom(T=T, learning_rate=lr)
    model.fit(X_train, Y_train)
    acc = accuracy_score(Y_test, model.predict(X_test))
    results.append((T, lr, acc))

df = pd.DataFrame(results, columns=["T", "lr", "accuracy"])
pivot = df.pivot(index="T", columns="lr", values="accuracy")

best_idx = np.unravel_index(np.argmax(pivot.values), pivot.shape)
best_T = pivot.index[best_idx[0]]
best_lr = pivot.columns[best_idx[1]]
best_acc = pivot.values[best_idx]

# print the best configuration
print(f"\nBest Custom AdaBoost Configuration:")
print(f"  - Number of Estimators (T): {best_T}")
print(f"  - Learning Rate: {best_lr}")
print(f"  - Test Accuracy: {best_acc:.4f}")


# VISUALIZATION OF DECISION BOUNDARIES

# using the best parameters from grid search
custom_model = AdaBoostCustom(T=int(best_T), learning_rate=float(best_lr))
custom_model.fit(X_train, Y_train)

# same hyperparameters for sklearn so the comparison is like for like
sk_model = AdaBoostClassifier(n_estimators=int(best_T), learning_rate=float(best_lr), random_state=42)
sk_model.fit(X_train, Y_train)

# compute accuracies
custom_train_acc = accuracy_score(Y_train, custom_model.predict(X_train))
custom_test_acc = accuracy_score(Y_test, custom_model.predict(X_test))
sk_train_acc = accuracy_score(Y_train, sk_model.predict(X_train))
sk_test_acc = accuracy_score(Y_test, sk_model.predict(X_test))

# plot decision boundaries
plot_decision_boundary(custom_model, X_train, Y_train, custom_train_acc, "Custom AdaBoost Decision Boundary (Train Data)")
plot_decision_boundary(sk_model, X_train, Y_train, sk_train_acc, "Sklearn AdaBoost Decision Boundary (Train Data)")
plot_decision_boundary(custom_model, X_test, Y_test, custom_test_acc, "Custom AdaBoost Decision Boundary (Test Data)")
plot_decision_boundary(sk_model, X_test, Y_test, sk_test_acc, "Sklearn AdaBoost Decision Boundary (Test Data)")


# PLOT ROC CURVES

# plot ROC curves for both custom and sklearn AdaBoost models
plt.figure(figsize=(10, 6))

# ROC for custom model
custom_proba = custom_model.predict_proba(X_test)[:, 1]
custom_fpr, custom_tpr, _ = roc_curve(Y_test == 1, custom_proba)
custom_roc_auc = auc(custom_fpr, custom_tpr)
plt.plot(custom_fpr, custom_tpr, color='blue', lw=2, 
         label=f'Custom AdaBoost ROC (area = {custom_roc_auc:.2f})')

# ROC for sklearn model
sklearn_proba = sk_model.predict_proba(X_test)[:, 1]
sklearn_fpr, sklearn_tpr, _ = roc_curve(Y_test == 1, sklearn_proba)
sklearn_roc_auc = auc(sklearn_fpr, sklearn_tpr)
plt.plot(sklearn_fpr, sklearn_tpr, color='red', lw=2, 
         label=f'Sklearn AdaBoost ROC (area = {sklearn_roc_auc:.2f})')

# plot results
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve Comparison: Custom vs Sklearn AdaBoost')
plt.legend(loc="lower right")
plt.grid(True, alpha=0.3)
plt.show()

# Generate additional data sets
Generate at least two experimental datasets with binary labels, designed to demonstrate specific properties of AdaBoost (e.g., handling noise or overfitting).

Add plots and figures.

Please use the cell below to describe your suggested approach in detail. Use formal notations where appropriate.

Describe and discuss your results.

We generated three datasets to highlight distinct behaviors of AdaBoost: dealing with noise, risk of overfitting, and capturing complex patterns.

1. High noise dataset:
Created by adding significant label noise and feature overlap, this set challenges AdaBoost’s robustness. The similar performance on training and test sets indicates the model’s ability to focus on hard-to-classify points without overfitting noisy labels.

2. Overfitting-prone dataset:
With clean, higher-dimensional data and a very large number of boosting rounds, this dataset exposes AdaBoost’s tendency to overfit by perfectly fitting the training data while losing accuracy on unseen samples, showing some gap between train and test results.

3. Non-linear dataset:
Based on a noisy two-moons pattern, this dataset tests AdaBoost’s flexibility to learn non-linear boundaries. The balanced accuracy across train and test sets demonstrates effective adaptation to complex class shapes without overfitting.
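
The label noise in the first dataset can be sketched as a small helper. The function name `flip_labels` and its signature are hypothetical; it assumes labels in $\{-1, +1\}$ and flips a fixed fraction $\rho$ of them, chosen uniformly at random without replacement.

```python
import numpy as np

def flip_labels(y, flip_ratio, rng):
    """Return a copy of y (values in {-1, +1}) with a fraction of labels negated."""
    y_noisy = np.array(y, copy=True)
    n_flip = int(flip_ratio * len(y_noisy))
    # Choose distinct indices so exactly n_flip labels change
    idx = rng.choice(len(y_noisy), size=n_flip, replace=False)
    y_noisy[idx] *= -1
    return y_noisy

rng = np.random.default_rng(0)
y_clean = np.where(np.arange(1000) < 500, -1, 1)
y_noisy = flip_labels(y_clean, 0.2, rng)

# Fraction of labels that differ from the clean ones
print((y_noisy != y_clean).mean())
```

Returning a copy keeps the clean labels available, so the noisy and noise-free versions of the same dataset can be compared directly.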

In [None]:
# # Generate additional data sets
# and
# # Split data sets

# dataset 1: noisy labels
# gaussian one
X1, Y1 = make_gaussian_quantiles(
    n_samples=1500,
    n_features=2,
    n_classes=1,
    mean=[-0.6, -0.6],
    cov=0.4,
    random_state=42
)
Y1 = np.zeros(len(Y1))  # All class 0

# gaussian two 
X2, Y2 = make_gaussian_quantiles(
    n_samples=1500,
    n_features=2,
    n_classes=1,
    mean=[0.6, 0.6],
    cov=0.4,
    random_state=123
)
Y2 = np.ones(len(Y2))   # All class 1

# Combine the two gaussians
X1 = np.vstack([X1, X2])
Y1 = np.hstack([Y1, Y2])

# convert labels to -1 and 1
Y1 = np.where(Y1 == 1, 1, -1)

# flip labels for noise (applied to Y1, the labels actually used below)
flip_ratio = 0.2
n_flip = int(flip_ratio * len(Y1))
flip_indices = np.random.choice(len(Y1), size=n_flip, replace=False)
Y1[flip_indices] *= -1

X1_train, X1_test, y1_train, y1_test = train_test_split(X1, Y1, test_size=0.2, random_state=42)

# dataset 2: high-dimensional and prone to overfitting
X2, y2 = make_classification(
    n_samples=2000,
    n_features=10,
    n_informative=2,
    n_redundant=3,
    n_clusters_per_class=1,
    n_classes=2,
    flip_y=0.0,
    class_sep=1.5,
    random_state=42
)

y2 = 2 * y2 - 1
X2_train, X2_test, y2_train, y2_test = train_test_split(X2, y2, test_size=0.2, random_state=42)

# dataset 3: non-linear structure
X3, y3 = make_moons(n_samples=3000, noise=0.2, random_state=42)
y3 = 2 * y3 - 1
X3_train, X3_test, y3_train, y3_test = train_test_split(X3, y3, test_size=0.2, random_state=42)

# pca for visualization only
pca1 = PCA(n_components=2).fit(X1_test)
X1_test_pca = pca1.transform(X1_test)

pca2 = PCA(n_components=2).fit(X2_test)
X2_test_pca = pca2.transform(X2_test)

pca3 = PCA(n_components=2).fit(X3_test)
X3_test_pca = pca3.transform(X3_test)

In [None]:
datasets = [
    ("Dataset 1", X1_train, y1_train, X1_test, y1_test, pca1, 50),
    ("Dataset 2", X2_train, y2_train, X2_test, y2_test, pca2, 5000),
    ("Dataset 3", X3_train, y3_train, X3_test, y3_test, pca3, 50)
]

fig, axes = plt.subplots(3, 2, figsize=(12, 10))

for i, (label, X_train, y_train, X_test, y_test, pca, T) in enumerate(datasets):
    custom = AdaBoostCustom(T=T)
    custom.fit(X_train, y_train)
    sk_model = AdaBoostClassifier(n_estimators=T, random_state=42)
    sk_model.fit(X_train, y_train)

    acc_custom = accuracy_score(y_test, custom.predict(X_test))
    acc_sk = accuracy_score(y_test, sk_model.predict(X_test))

    # project the test points into the same 2D PCA space as the plotting grid
    X_test_2d = pca.transform(X_test)

    plot_decision_boundary(
        custom, X_test_2d, y_test,
        acc=acc_custom,
        title=f"Custom AdaBoost - {label}",
        pca=pca,
        ax=axes[i, 0]
    )
    plot_decision_boundary(
        sk_model, X_test_2d, y_test,
        acc=acc_sk,
        title=f"Sklearn AdaBoost - {label}",
        pca=pca,
        ax=axes[i, 1]
    )

plt.tight_layout()
plt.show()


# Test algorithms
Test your AdaBoost, a library implementation of AdaBoost and at least two additional models, one of which must be another boosting algorithm on your two datasets.

Add plots and figures.

Please use the cell below to describe your suggested approach in detail. Use formal notations where appropriate.

Describe and discuss your results.

We compared four models on a binary classification task: 
1. custom AdaBoost
2. Scikit-learn’s AdaBoost
3. Gradient Boosting
4. Random Forest.

The custom and Scikit-learn AdaBoost models performed similarly, both producing clear boundaries and balanced train/test accuracy near 90%. This suggests that our custom implementation works well and reproduces AdaBoost faithfully.

Gradient Boosting showed a smoother decision boundary and achieved slightly lower accuracy compared to AdaBoost. Its focus on minimizing loss directly, rather than reweighting samples like AdaBoost, makes it more stable in the presence of noise and overlapping data.

Random Forest achieved perfect accuracy on the training set but with overly complex boundaries, pointing to overfitting. Its reliance on deep, unpruned trees and bootstrapping leads to high variance if not regularized.

The ROC and accuracy plots support these findings, highlighting Gradient Boosting’s strong generalization and AdaBoost’s close alignment between theory and practice.

In [None]:
# Set up models 

# create gaussian quantiles dataset
# gaussian one
X1, Y1 = make_gaussian_quantiles(
    n_samples=1500,
    n_features=2,
    n_classes=1,
    mean=[-0.5, -0.5],
    cov=0.8,
    random_state=42
)
Y1 = np.zeros(len(Y1))  # All class 0

# gaussian two 
X2, Y2 = make_gaussian_quantiles(
    n_samples=1500,
    n_features=2,
    n_classes=1,
    mean=[0.5, 0.5],
    cov=0.5,
    random_state=123
)
Y2 = np.ones(len(Y2))   # All class 1

# Combine the two gaussians
X = np.vstack([X1, X2])
Y = np.hstack([Y1, Y2])

# convert labels to -1 and 1
Y = np.where(Y == 1, 1, -1)

# split the dataset into training and testing sets
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

# models configuration
models = {
    'Custom AdaBoost': lambda: AdaBoostCustom(T=75, learning_rate=0.5),
    'Sklearn AdaBoost': lambda: AdaBoostClassifier(n_estimators=75, learning_rate=0.5, random_state=42),
    'Random Forest': lambda: RandomForestClassifier(n_estimators=75, random_state=42),
    'Gradient Boosting': lambda: GradientBoostingClassifier(n_estimators=75, learning_rate=0.5, random_state=42)
}

In [None]:
# Test and visualize
results = {}
trained = {}

for name, ctor in models.items():
    model = ctor()
    model.fit(X_train, Y_train)
    trained[name] = model  # keep the fitted model for the boundary plots
    train_acc = accuracy_score(Y_train, model.predict(X_train))
    test_acc = accuracy_score(Y_test, model.predict(X_test))
    proba = model.predict_proba(X_test)[:, 1]
    fpr, tpr, _ = roc_curve(Y_test == 1, proba)
    auc_score = auc(fpr, tpr)
    results[name] = {
        'train_acc': train_acc,
        'test_acc': test_acc,
        'fpr': fpr,
        'tpr': tpr,
        'auc': auc_score
    }

# plot decision boundaries for each model (reusing the fitted models)

fig, axes = plt.subplots(2, 2, figsize=(12, 10))
for ax, (name, model) in zip(axes.ravel(), trained.items()):
    acc = results[name]['test_acc']
    auc_score = results[name]['auc']
    plot_decision_boundary(
        model, X_train, Y_train, acc=acc,
        title=f"{name}\nTest Acc={acc:.2f}, AUC={auc_score:.2f}",
        ax=ax
    )
plt.tight_layout()
plt.show()

# plot ROC curve for each model
plt.figure(figsize=(8,6))
for name, res in results.items():
    plt.plot(res['fpr'], res['tpr'], label=f"{name} (AUC={res['auc']:.2f})")
plt.plot([0,1],[0,1],'k--')
plt.xlabel("False Positive Rate"); plt.ylabel("True Positive Rate")
plt.title("ROC Curves")
plt.legend(loc="lower right"); plt.grid(True); plt.tight_layout(); plt.show()

# plot comparison of train and test accuracies and compare models
labels = list(models.keys())
train_accs = [results[n]['train_acc'] for n in labels]
test_accs  = [results[n]['test_acc']  for n in labels]
x = np.arange(len(labels))
width = 0.35

plt.figure(figsize=(10,6))
plt.bar(x - width/2, train_accs, width, label='Train Acc')
plt.bar(x + width/2, test_accs,  width, label='Test Acc')
plt.xticks(x, labels, rotation=45)
plt.ylabel("Accuracy"); plt.title("Train vs Test Accuracy for Each Model")
plt.legend(); plt.grid(axis='y'); plt.tight_layout(); plt.show()

# Use of generative AI
Please use the cell below to describe your use of generative AI in this assignment. 

We used generative AI tools such as ChatGPT and Claude to assist with technical aspects like using Python packages, improving code readability, and debugging issues. We also used them for creating visualizations and making the code more efficient; for example, they helped us find the right NumPy operations to optimize performance. The core algorithm design, theoretical understanding, data processing, and full implementation were based on material taught in class and additional study from online sources.