# Q1 Analysis

### Introduction
This experiment implements logistic regression, using gradient descent to minimize the negative log-likelihood with respect to the parameter vector w. We analyze convergence behavior and the effect of the learning rate.
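As a point of reference, the setup described above can be sketched as follows. This is a minimal illustration, not the code used to produce the plots: the function names, the 0/1 label encoding, the leading bias column in `X`, and the synthetic data are all assumptions.

```python
import numpy as np

def sigmoid(z):
    # tanh form of the logistic function; avoids overflow for large |z|
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def nll(w, X, y):
    # Negative log-likelihood for labels y in {0, 1}:
    # sum_i [ log(1 + exp(z_i)) - y_i * z_i ], with z = X @ w
    z = X @ w
    return float(np.sum(np.logaddexp(0.0, z) - y * z))

def nll_gradient(w, X, y):
    # Gradient of the NLL summed over all points: X^T (sigmoid(Xw) - y)
    return X.T @ (sigmoid(X @ w) - y)

def batch_gradient_descent(X, y, eta, n_iters):
    # One update per iteration, using the gradient over the full data set
    w = np.zeros(X.shape[1])
    history = [nll(w, X, y)]
    for _ in range(n_iters):
        w = w - eta * nll_gradient(w, X, y)
        history.append(nll(w, X, y))
    return w, history

def make_toy_data(n_per_class=50, seed=0):
    # Hypothetical data: two overlapping 2-D Gaussian blobs plus a bias column
    rng = np.random.default_rng(seed)
    x0 = rng.normal(loc=-1.0, scale=1.5, size=(n_per_class, 2))
    x1 = rng.normal(loc=1.0, scale=1.5, size=(n_per_class, 2))
    features = np.vstack([x0, x1])
    X = np.hstack([np.ones((2 * n_per_class, 1)), features])
    y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])
    return X, y

X, y = make_toy_data()
w, history = batch_gradient_descent(X, y, eta=0.001, n_iters=200)
print(f"NLL: {history[0]:.2f} -> {history[-1]:.2f}")
```

Recording the negative log-likelihood at every iteration is what makes a loss-versus-iteration plot possible; the separator path can be tracked the same way by storing w after each update.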


# Before Fix 

<div style="display: flex; gap: 10px;">
  <img src="attachment:1f76d13e-63ae-4fe2-8f48-f56593f00448.png" alt="Before Fix 1" width="300"/>
  <img src="attachment:b4055b20-4695-403c-8aac-5f4d7687f180.png" alt="Before Fix 2" width="300"/>
</div>


Figure 1 (Separator Path in Slope-Intercept Space):
The separator path oscillated significantly, indicating unstable updates to the weight vector w.


Figure 2 (Negative Log-Likelihood Over Iterations):
The negative log-likelihood exhibited oscillations, showing the model was not converging steadily.

Reason for Oscillation:
The learning rate (eta = 0.003) was too high for batch gradient descent on this data. Each step overshot the minimum of the negative log-likelihood, so the weight updates were unstable, which produced the oscillations seen in both the separator path and the loss curve.
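The overshoot mechanism is easiest to see in one dimension. The sketch below runs gradient descent on the quadratic f(w) = 0.5 * a * w^2, where each step multiplies w by (1 - eta * a). The curvature `a = 500` is a hypothetical value chosen only so that eta = 0.003 overshoots and eta = 0.001 does not; it is not measured from the experiment's loss surface.

```python
def gd_on_quadratic(eta, a=500.0, w0=1.0, steps=6):
    """Gradient descent on f(w) = 0.5 * a * w**2, whose gradient is a * w."""
    w = w0
    path = [w]
    for _ in range(steps):
        w = w - eta * a * w  # same as multiplying w by (1 - eta * a)
        path.append(w)
    return path

# Compare the two learning rates from the experiment on the assumed curvature
for eta in (0.003, 0.001):
    print(eta, [round(w, 4) for w in gd_on_quadratic(eta)])
```

With these assumed numbers, eta = 0.003 gives a factor of about -0.5 per step, so w flips sign on every update while shrinking. With eta = 0.001 the factor is about +0.5, and w approaches the minimum from one side. If |1 - eta * a| exceeded 1, the iterates would diverge instead of oscillating toward the minimum.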


# Fix Implemented

Change:
Reduced the learning rate from eta = 0.003 to eta = 0.001.

Result:
The model converged smoothly, with the separator path showing a steady trajectory and the negative log-likelihood curve decreasing monotonically.

# After Fix

<div style="display: flex; gap: 10px;">
  <img src="attachment:8bf521b4-c658-4ac6-9fb6-77cfd4433f43.png" alt="Image 1" width="300"/>
  <img src="attachment:74fdb9a4-c0e4-481b-89e6-85d45c0b0338.png" alt="Image 2" width="300"/>
</div>



Figure 1: Separator in Data Space (After Fix)

Figure 2: Negative Log-Likelihood Plot (After Fix, Smooth Decrease)

These plots confirm that gradient descent stabilized and the negative log-likelihood decreased smoothly after reducing eta. A high learning rate caused oscillations due to unstable weight updates. Reducing the learning rate allowed gradient descent to converge smoothly, effectively minimizing the negative log-likelihood.


# Summary

The oscillations observed in the separator path (slope-intercept space) and negative log-likelihood plot were due to a high learning rate (eta = 0.003). This caused the weight updates to overshoot the minimum, leading to instability and oscillatory convergence behavior. 

To address this, I reduced the learning rate to eta = 0.001, which allowed smaller, more stable updates to the weights. This resulted in smooth convergence of the weights and a monotonic decrease in the negative log-likelihood.

Results After Fix

After the fix, the updated plots showed:

- A smooth separator path in slope-intercept space.
- A steady, monotonic decrease in negative log-likelihood.
- Better classifier stability, as seen in the separator line in data space.

# Q2 Analysis

Stochastic Gradient Descent (SGD) Implementation
In this task, I modified the logistic regression implementation to use Stochastic Gradient Descent (SGD) instead of Batch Gradient Descent. In SGD, weights w are updated after each data point, which results in more frequent, but noisier updates. One iteration is defined as a full pass through all training points.

- Learning rate used: η = 0.003
- Maximum iterations: 50
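The per-point update described above can be sketched as follows. This is an illustration under stated assumptions, not the implementation behind the figures: the function names, the 0/1 labels, the bias column, the synthetic data, and in particular the reshuffling of points on every pass are all hypothetical choices, since the write-up does not say in what order the points are visited.

```python
import numpy as np

def sigmoid(z):
    # tanh form of the logistic function; avoids overflow for large |z|
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def nll(w, X, y):
    # Negative log-likelihood over the whole data set, labels in {0, 1}
    z = X @ w
    return float(np.sum(np.logaddexp(0.0, z) - y * z))

def sgd(X, y, eta=0.003, n_iters=50, seed=0):
    # One "iteration" is a full pass over the data, with one update per point
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    history = []
    for _ in range(n_iters):
        for i in rng.permutation(len(y)):  # assumed: reshuffle on every pass
            error = sigmoid(X[i] @ w) - y[i]
            w = w - eta * error * X[i]     # gradient of a single point's NLL
        history.append(nll(w, X, y))
    return w, history

# Hypothetical toy data: two overlapping Gaussian blobs plus a bias column
rng = np.random.default_rng(1)
features = np.vstack([rng.normal(-1.0, 1.5, (50, 2)),
                      rng.normal(1.0, 1.5, (50, 2))])
X = np.hstack([np.ones((100, 1)), features])
y = np.concatenate([np.zeros(50), np.ones(50)])

w, history = sgd(X, y)
print(f"NLL after pass 1: {history[0]:.2f}, after pass 50: {history[-1]:.2f}")
```

Each single-point step uses only one term of the summed gradient, so it is much smaller than a batch step taken with the same η.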


## Results Using SGD (η = 0.003)

<img src="attachment:997fca2b-c54e-425a-bc72-0ba551b3cffb.png" alt="Resized Image" width="400"/>


Figure: Separator path in slope-intercept space using SGD


### 📊 Visualizations

### 📌 Figures 1–3: Separator in Data Space (Iterations 47, 48, 49)

<div style="display: flex; gap: 10px;">
  <div style="text-align: center;">
    <img src="attachment:34ded560-a3d8-4900-8817-9719404b878e.png" alt="Iter 47" width="250"/>
    <p style="margin-top: 5px;">Iter 47</p>
  </div>
  <div style="text-align: center;">
    <img src="attachment:89252daa-c2fb-4301-a7c4-b12b62f41612.png" alt="Iter 48" width="250"/>
    <p style="margin-top: 5px;">Iter 48</p>
  </div>
  <div style="text-align: center;">
    <img src="attachment:e215b562-1d5d-42e6-8bea-bd6cca903c7d.png" alt="Iter 49" width="250"/>
    <p style="margin-top: 5px;">Iter 49</p>
  </div>
</div>

### 📉 Figure 4: Negative Log-Likelihood Over Iterations (SGD)

<img src="attachment:33861dac-2e43-4777-b361-f1ddcf9bcba8.png" alt="Loss Curve" width="300"/>


# Observations
The separator line stabilized over the iterations, progressively improving classification between the two classes.

The negative log-likelihood decreased steadily with no significant oscillations, indicating that SGD was stable with η = 0.003, the same learning rate that made batch gradient descent oscillate in Q1. Compared to batch gradient descent, SGD converged faster in early iterations, with only small fluctuations in later stages.

# Conclusion
Stochastic Gradient Descent allowed for faster early convergence due to frequent updates, and with an appropriate learning rate, it converged smoothly and effectively minimized the negative log-likelihood. These results support SGD's practical efficiency for logistic regression optimization.

# Q3 Analysis

# Objective
This experiment explores the effect of L2 regularization in logistic regression by minimizing:

$$
E_{\text{total}}(w) = \text{Negative Log-Likelihood} + \frac{1}{2} \lambda \|w\|^2
$$
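For reference, differentiating this objective gives the gradient that a gradient-descent step would use. Here NLL(w) abbreviates the negative log-likelihood and η is the learning rate; the update is written as a standard identity, not as a description of the specific code behind the plots.

$$
\nabla E_{\text{total}}(w) = \nabla\,\text{NLL}(w) + \lambda w,
\qquad
w \leftarrow w - \eta\left(\nabla\,\text{NLL}(w) + \lambda w\right) = (1 - \eta\lambda)\,w - \eta\,\nabla\,\text{NLL}(w)
$$

The factor (1 - ηλ) pulls w toward zero on every step, so a larger λ shrinks the weights more strongly.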

I evaluated the impact of λ = 0.1, 1, 10, 100 on model performance, specifically analyzing:

- Final negative log-likelihood
- Final ||w|| (norm of the weight vector)

# Graphs Generated from Code

## For λ = 0.1
<div style="display: flex; gap: 10px;">
  <img src="attachment:87e69b9e-d3af-4809-bf3d-50d9c4f99416.png" alt="Lambda 0.1 - Graph 1" width="400"/>
  <img src="attachment:87632625-c5e4-472d-b926-b3a0b6060062.png" alt="Lambda 0.1 - Graph 2" width="400"/>
</div>

## For λ = 1
<div style="display: flex; gap: 10px;">
  <img src="attachment:dcee5201-ca9e-4012-acbf-6ac9f0cd2006.png" alt="Lambda 1 - Graph 1" width="400"/>
  <img src="attachment:de3fa656-9f09-41d0-8d6b-13d0b039d77d.png" alt="Lambda 1 - Graph 2" width="400"/>
</div>

## For λ = 10
<div style="display: flex; gap: 10px;">
  <img src="attachment:5815e75c-bcc5-48ca-a02e-abe28d163398.png" alt="Lambda 10 - Graph 1" width="400"/>
  <img src="attachment:4c6e08b6-738b-4ab3-851d-d3cab8b8e497.png" alt="Lambda 10 - Graph 2" width="400"/>
</div>

## For λ = 100
<div style="display: flex; gap: 10px;">
  <img src="attachment:99307b69-7998-44d5-bac5-f55fc5ba8936.png" alt="Lambda 100 - Graph 1" width="400"/>
  <img src="attachment:d98c9b20-16e0-45a0-a76a-bca1c4ee0c87.png" alt="Lambda 100 - Graph 2" width="400"/>
</div>

## Summary of Run 
| λ | Final negative log-likelihood | Final norm of w |
|---|---|---|
| 0.1 | 60.9930 | 4.7769 |
| 1 | 66.4473 | 3.9003 |
| 10 | 94.8889 | 1.2250 |
| 100 | 113.1325 | 0.3741 |

# Observations and Explanation
- As λ increased from 0.1 to 100, ||w|| decreased consistently (from 4.7769 to 0.3741) because the regularization term penalizes large weights more heavily.
- At the same time, the negative log-likelihood increased (from 60.9930 to 113.1325): with stronger regularization the model became less flexible and underfit the data.
- For λ = 0.1, the model achieved the lowest error (60.9930) and the largest weight norm (4.7769), fitting the data well under mild regularization.
- For λ = 100, the model over-penalized the weights, resulting in the smallest weight norm (0.3741) but the highest error (113.1325), which indicates underfitting.
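The trend reported above can be reproduced qualitatively with a short sweep over λ. The sketch below is only illustrative: it uses synthetic data, a hypothetical `fit_l2` helper, an assumed learning rate and iteration count, and it penalizes every component of w including the intercept, so its numbers will not match the values reported in this section.

```python
import numpy as np

def sigmoid(z):
    # tanh form of the logistic function; avoids overflow for large |z|
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def nll(w, X, y):
    # Negative log-likelihood only (without the penalty), labels in {0, 1}
    z = X @ w
    return float(np.sum(np.logaddexp(0.0, z) - y * z))

def fit_l2(X, y, lam, eta=0.001, n_iters=5000):
    """Batch gradient descent on NLL(w) + 0.5 * lam * ||w||^2."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        # Penalty gradient lam * w is applied to every weight (assumed)
        grad = X.T @ (sigmoid(X @ w) - y) + lam * w
        w = w - eta * grad
    return w

# Hypothetical toy data: two overlapping Gaussian blobs plus a bias column
rng = np.random.default_rng(0)
features = np.vstack([rng.normal(-1.0, 1.5, (50, 2)),
                      rng.normal(1.0, 1.5, (50, 2))])
X = np.hstack([np.ones((100, 1)), features])
y = np.concatenate([np.zeros(50), np.ones(50)])

results = []
for lam in (0.1, 1, 10, 100):
    w = fit_l2(X, y, lam)
    results.append((lam, nll(w, X, y), float(np.linalg.norm(w))))
    print(f"lambda={lam:<5} NLL={results[-1][1]:.4f}  ||w||={results[-1][2]:.4f}")
```

Reporting the negative log-likelihood without the penalty term, as done here, keeps the error values comparable across different λ.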

# Conclusion
L2 regularization effectively controls model complexity by shrinking weight magnitudes, reducing overfitting risk. However, excessive regularization (high λ) can severely underfit the data, as the rise in negative log-likelihood at λ = 10 and λ = 100 shows. An optimal λ balances error minimization and generalization.