# Abstract

# Loading The Model and Packages

Source Code: 

# Part 0: Considering p > n

When the number of features \( p \) exceeds the number of observations \( n \), the standard closed-form solution for linear regression no longer applies. The formula \( \hat{\mathbf{w}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y} \) requires inverting the \( p \times p \) matrix \( \mathbf{X}^T\mathbf{X} \). When \( p > n \), the \( n \times p \) matrix \( \mathbf{X} \) has more columns than rows, so \( \operatorname{rank}(\mathbf{X}^T\mathbf{X}) = \operatorname{rank}(\mathbf{X}) \le n < p \). A \( p \times p \) matrix whose rank is less than \( p \) is singular (rank-deficient) and therefore not invertible. As a result, \( (\mathbf{X}^T\mathbf{X})^{-1} \) is undefined, and the expression for \( \hat{\mathbf{w}} \) in Equation 1 breaks down. The closed-form solution is therefore valid only when \( \mathbf{X} \) has full column rank \( p \), which is possible only when \( n \ge p \); this is what guarantees that \( \mathbf{X}^T\mathbf{X} \) is invertible.
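The rank argument can be checked numerically. The sketch below is illustrative only: it uses NumPy rather than the notebook's torch to stay self-contained, and the sizes \( n = 5 \), \( p = 10 \) and the random design matrix are hypothetical. It confirms that \( \mathbf{X}^T\mathbf{X} \) has rank \( n \) rather than \( p \), and shows the practical consequence: a nonzero vector \( \mathbf{v} \) with \( \mathbf{X}\mathbf{v} = \mathbf{0} \) exists, so \( \mathbf{w} \) and \( \mathbf{w} + \mathbf{v} \) give identical predictions and the weights are not uniquely determined.

```python
import numpy as np

# Hypothetical sizes: more features (p) than observations (n).
n, p = 5, 10
rng = np.random.default_rng(0)
X = rng.standard_normal((n, p))   # synthetic n x p design matrix, for illustration only

# X^T X is p x p, but its rank is at most n < p, so it is singular.
gram = X.T @ X
rank = int(np.linalg.matrix_rank(gram))
print(f"X^T X has shape {gram.shape} and rank {rank}")

# Because rank(X) < p, X has a nonzero null space. With full_matrices=True
# (the default), Vt is p x p and its last rows span that null space.
_, _, Vt = np.linalg.svd(X)
v = Vt[-1]                        # unit vector with X @ v ~ 0

# Adding v to any weight vector leaves the predictions X @ w unchanged,
# so the least-squares weights are not unique when p > n.
w = rng.standard_normal(p)
print("same predictions:", np.allclose(X @ w, X @ (w + v)))
```

Replacing `np.linalg.inv(gram)` in the closed form would not reliably raise an error here, because floating-point round-off can make a singular matrix look merely ill-conditioned; checking the rank directly is the more dependable diagnostic.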


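```python
# Reload edited local modules automatically without restarting the kernel
%load_ext autoreload
%autoreload 2
import torch
import matplotlib.pyplot as plt
# Model class implemented for this project in KernelLogistic.py
from KernelLogistic import KernelLogisticRegression
plt.style.use('seaborn-v0_8-whitegrid')
```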

# Discussion

In this project, we implemented Sparse Kernel Logistic Regression to classify nonlinear data, focusing on how the kernel and regularization parameters affect model performance. Changing \( \lambda \) also caused more entries of the weight vector to move away from zero, whereas we had hoped to keep most of them near zero. Larger values of \( \gamma \) tended to make the decision boundary fit the training data more closely, often leading to overfitting, while smaller values of \( \gamma \) tended to cause the model to miss the patterns in the data. Through visualization and ROC curve analysis, we observed that high values of \( \gamma \), such as 1000, led the model to overfit: it fit the noise in the training data and performed poorly on new data. This process highlighted the importance of carefully tuning hyperparameters and of evaluating models with metrics like AUC, which summarize performance across all decision thresholds. Overall, this project deepened our understanding of kernel methods, regularization, and the balance between model complexity and generalization.
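Why a large \( \gamma \) encourages overfitting can be sketched by looking at the kernel matrix itself. This sketch assumes an RBF (Gaussian) kernel \( k(\mathbf{x}, \mathbf{x}') = \exp(-\gamma \lVert \mathbf{x} - \mathbf{x}' \rVert^2) \), which the kernel inside `KernelLogisticRegression` may or may not match; the helper `rbf_kernel` and the random 2-D points are hypothetical, and NumPy is used instead of torch to keep the example self-contained. As \( \gamma \) grows, the similarity between distinct training points collapses toward zero, so the kernel matrix approaches the identity and each prediction is driven almost entirely by the nearest training points.

```python
import numpy as np

def rbf_kernel(X1, X2, gamma):
    """Assumed RBF kernel: k(x, x') = exp(-gamma * ||x - x'||^2)."""
    # Pairwise squared Euclidean distances via broadcasting: shape (len(X1), len(X2))
    sq_dists = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(50, 2))     # synthetic 2-D points, for illustration only
off_diag_mask = ~np.eye(len(X), dtype=bool)

mean_similarity = {}
for gamma in (0.1, 1000):
    K = rbf_kernel(X, X, gamma)
    # Average similarity between distinct training points (diagonal entries are always 1)
    mean_similarity[gamma] = K[off_diag_mask].mean()
    print(f"gamma={gamma}: mean off-diagonal kernel value = {mean_similarity[gamma]:.4f}")
```

Expect a large average similarity for \( \gamma = 0.1 \), where most points influence one another and the resulting boundary is smooth, and an average near zero for \( \gamma = 1000 \), where each point effectively only "sees" itself, which is consistent with the memorization behavior described above.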