# Vanilla Gradient Descent (From Scratch)

In this notebook, we implement **logistic regression with a single feature** using **vanilla gradient descent**.  
The main training loop does not use PyTorch’s automatic differentiation (`requires_grad=True`); autograd appears only afterwards, as a point of comparison. Instead, we will:

- Define a tiny dataset with one input feature and binary labels.
- Initialize weights (`w`) and bias (`b`) manually.
- Perform the **forward pass** (linear function + sigmoid).
- Compute the **binary cross-entropy loss** for monitoring.
- Derive and apply **manual gradients** for both weight and bias.
- Update parameters with **gradient descent**.
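The steps above can be traced by hand for a single iteration. The sketch below uses plain Python (no PyTorch) and assumes a starting point of `w = b = 0` instead of the notebook's random initialization, so that every intermediate value is easy to check: all predictions start at 0.5, and only `w` moves on the first step.

```python
import math

# Same toy dataset as the notebook
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 0.0, 1.0, 1.0]

# Assumed starting point (not the notebook's random init): w = b = 0
w, b, lr = 0.0, 0.0, 0.1

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# 1) Forward pass: logits, then probabilities (all 0.5 when w = b = 0)
y_hat = [sigmoid(x * w + b) for x in xs]

# 2) Loss for monitoring: BCE averaged over the N samples
n = len(xs)
loss = -sum(y * math.log(p) + (1 - y) * math.log(1 - p) for y, p in zip(ys, y_hat)) / n

# 3) Manual gradients, using dL/dz = y_hat - y
dz = [p - y for p, y in zip(y_hat, ys)]        # [0.5, 0.5, -0.5, -0.5]
dw = sum(x * d for x, d in zip(xs, dz)) / n    # (0 + 0.5 - 1.0 - 1.5) / 4 = -0.5
db = sum(dz) / n                               # 0.0

# 4) Gradient-descent update
w -= lr * dw
b -= lr * db

print(f"loss={loss:.4f}, dw={dw}, db={db}, w={w}, b={b}")
# -> loss=0.6931, dw=-0.5, db=0.0, w=0.05, b=0.0
```

The initial loss equals log 2, the BCE of a model that predicts 0.5 for every sample. The negative `dw` pushes `w` upward, which raises the predicted probability for the larger inputs (labeled 1).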

---

## Learning Goals
- Understand the mechanics of gradient descent updates without relying on autograd.
- Practice deriving gradients for logistic regression by hand.
- Build intuition for how `w` and `b` evolve step by step to reduce loss.

---

## Key Formulas
- **Prediction (logits → sigmoid):**

$$
z = xw + b, \quad \hat{y} = \sigma(z) = \frac{1}{1 + e^{-z}}
$$

- **Binary Cross-Entropy Loss:**

$$
L = -\frac{1}{N} \sum_{i=1}^{N} \Big( y_i \log(\hat{y}_i) + (1-y_i)\log(1-\hat{y}_i) \Big)
$$

- **Gradients:**

$$
\frac{\partial L}{\partial w} = \frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)\,x_i,
\quad
\frac{\partial L}{\partial b} = \frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)
$$
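These gradients follow from the chain rule. Writing $\ell_i$ for the loss of sample $i$ (so that $L = \frac{1}{N}\sum_{i=1}^{N} \ell_i$) and using $\sigma'(z) = \sigma(z)\big(1 - \sigma(z)\big)$:

$$
\begin{aligned}
\ell_i &= -\big( y_i \log(\hat{y}_i) + (1-y_i)\log(1-\hat{y}_i) \big), \qquad \hat{y}_i = \sigma(z_i), \quad z_i = x_i w + b \\
\frac{\partial \ell_i}{\partial \hat{y}_i} &= -\frac{y_i}{\hat{y}_i} + \frac{1-y_i}{1-\hat{y}_i},
\qquad
\frac{\partial \hat{y}_i}{\partial z_i} = \hat{y}_i(1-\hat{y}_i) \\
\frac{\partial \ell_i}{\partial z_i} &= \Big( -\frac{y_i}{\hat{y}_i} + \frac{1-y_i}{1-\hat{y}_i} \Big)\hat{y}_i(1-\hat{y}_i)
= -y_i(1-\hat{y}_i) + (1-y_i)\hat{y}_i = \hat{y}_i - y_i \\
\frac{\partial L}{\partial w} &= \frac{1}{N}\sum_{i=1}^{N} \frac{\partial \ell_i}{\partial z_i}\,\frac{\partial z_i}{\partial w} = \frac{1}{N}\sum_{i=1}^{N} (\hat{y}_i - y_i)\,x_i,
\qquad
\frac{\partial L}{\partial b} = \frac{1}{N}\sum_{i=1}^{N} (\hat{y}_i - y_i)
\end{aligned}
$$

The shared factor $\hat{y}_i - y_i$ is what the code stores as `dz`; since $\partial z_i / \partial w = x_i$ and $\partial z_i / \partial b = 1$, the weight gradient scales it by $x_i$ and the bias gradient uses it directly.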

---

By the end of this notebook, you will see **vanilla gradient descent in action** and understand how parameter updates reduce the loss step by step.
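Before training, the hand-written loss can be checked against PyTorch's built-in BCE. A minimal sketch, assuming illustrative parameter values `w = 0.5`, `b = -0.5` (not taken from the training run): `F.binary_cross_entropy` takes probabilities, while `F.binary_cross_entropy_with_logits` takes the logits `z` directly.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([0.0, 1.0, 2.0, 3.0])
y = torch.tensor([0.0, 0.0, 1.0, 1.0])   # float targets, as the built-ins expect

# Illustrative parameter values (hypothetical, not from the notebook's training run)
w, b = torch.tensor([0.5]), torch.tensor([-0.5])
z = x * w + b
y_hat = torch.sigmoid(z)

# Hand-written BCE, same form as the notebook (1e-8 guards against log(0))
loss_manual = -(y * torch.log(y_hat + 1e-8) + (1 - y) * torch.log(1 - y_hat + 1e-8)).mean()

# Built-ins: one takes probabilities, the other takes logits
loss_prob = F.binary_cross_entropy(y_hat, y)
loss_logits = F.binary_cross_entropy_with_logits(z, y)

print(loss_manual.item(), loss_prob.item(), loss_logits.item())
```

All three values should agree to about six decimal places. The logits version fuses the sigmoid and the log, which avoids the need for an epsilon offset and is the more numerically stable choice when predictions saturate near 0 or 1.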


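```python
import torch
torch.manual_seed(42)

# Dataset: one input feature, binary labels (float dtype to match the parameters)
x = torch.tensor([0.0, 1.0, 2.0, 3.0])
y = torch.tensor([0.0, 0.0, 1.0, 1.0])

# Parameters: plain tensors without requires_grad (gradients are computed by hand)
w = torch.randn(1)
b = torch.randn(1)

# Hyperparameters
epochs = 50
lr = 0.1

# Training loop (epochs + 1 iterations so that epoch 50 is also logged)
for epoch in range(epochs + 1):
    # Forward pass: logits, then sigmoid probabilities
    z = x * w + b
    y_hat = torch.sigmoid(z)

    # BCE loss, for monitoring only (1e-8 avoids log(0))
    loss = -(y * torch.log(y_hat + 1e-8) + (1 - y) * torch.log(1 - y_hat + 1e-8)).mean()

    # Manual gradients: dL/dz = y_hat - y
    dz = y_hat - y
    dw = (x * dz).mean()
    db = dz.mean()

    # Gradient-descent update
    w -= lr * dw
    b -= lr * db

    # The logged loss is computed before this epoch's update
    if epoch % 10 == 0:
        print(f'Loss in epoch {epoch} is: {loss.item():.4f}')
```

```
Loss in epoch 0 is: 0.5901
Loss in epoch 10 is: 0.5422
Loss in epoch 20 is: 0.5035
Loss in epoch 30 is: 0.4703
Loss in epoch 40 is: 0.4414
Loss in epoch 50 is: 0.4159
```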
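The loss decreases steadily, which is consistent with the manual gradients pointing downhill. A more direct check compares `dw` and `db` with central finite differences, $(L(p+h) - L(p-h)) / 2h$. The sketch below assumes an arbitrary test point (`w0 = 0.3`, `b0 = 0.1`) and step `h = 1e-6`, and uses float64 so rounding error stays far below the tolerance; the helper `bce` is hypothetical, written only for this check.

```python
import torch

# Float64 keeps finite-difference rounding error small
x = torch.tensor([0.0, 1.0, 2.0, 3.0], dtype=torch.float64)
y = torch.tensor([0.0, 0.0, 1.0, 1.0], dtype=torch.float64)

def bce(w, b):
    """Mean BCE for scalar parameters (no epsilon needed at these values)."""
    y_hat = torch.sigmoid(x * w + b)
    return -(y * torch.log(y_hat) + (1 - y) * torch.log(1 - y_hat)).mean()

# Arbitrary test point and finite-difference step (assumed for illustration)
w0, b0, h = 0.3, 0.1, 1e-6

# Manual gradients from the Key Formulas
dz = torch.sigmoid(x * w0 + b0) - y
dw_manual = (x * dz).mean()
db_manual = dz.mean()

# Central differences approximate each partial derivative
dw_numeric = (bce(w0 + h, b0) - bce(w0 - h, b0)) / (2 * h)
db_numeric = (bce(w0, b0 + h) - bce(w0, b0 - h)) / (2 * h)

print(dw_manual.item(), dw_numeric.item())
print(db_manual.item(), db_numeric.item())
```

Each pair should match to many decimal places; a sign error or a missing `x` factor in the manual formulas would show up immediately as a mismatch.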


```python
# Parameters: this time PyTorch tracks gradients for w and b.
# The RNG is not re-seeded here, so w and b start from different values than in the first cell.
w = torch.randn(1, requires_grad=True)
b = torch.randn(1, requires_grad=True)

learning_rate = 0.1

# Training loop
for step in range(50):
    # Forward pass
    z = x * w + b
    y_hat = torch.sigmoid(z)

    # Binary cross-entropy loss
    loss = -(y * torch.log(y_hat + 1e-8) + (1 - y) * torch.log(1 - y_hat + 1e-8)).mean()

    # Backward pass: autograd fills w.grad and b.grad
    loss.backward()

    # Update parameters (vanilla GD) without recording the update in the graph
    with torch.no_grad():
        w -= learning_rate * w.grad
        b -= learning_rate * b.grad

    # Zero gradients; otherwise backward() accumulates them across steps
    w.grad.zero_()
    b.grad.zero_()

    # w and b are printed after this step's update; loss is from before it
    if step % 10 == 0:
        print(f"Step {step+1}: w = {w.item():.4f}, b = {b.item():.4f}, loss = {loss.item():.4f}")
```

```
Step 1: w = 2.1888, b = -0.6667, loss = 0.5521
Step 11: w = 2.0109, b = -0.9328, loss = 0.4471
Step 21: w = 1.8670, b = -1.1601, loss = 0.3724
Step 31: w = 1.7611, b = -1.3504, loss = 0.3231
Step 41: w = 1.6927, b = -1.5080, loss = 0.2923
```
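Because the autograd cell draws new random values for `w` and `b` (the generator is not re-seeded), its trajectory cannot be lined up with the manual run. Comparing the two gradient computations at identical parameters is a more direct verification. A minimal sketch, assuming the illustrative starting point `w = 0.3`, `b = 0.1`:

```python
import torch

x = torch.tensor([0.0, 1.0, 2.0, 3.0])
y = torch.tensor([0.0, 0.0, 1.0, 1.0])

# Shared starting point for both methods (values assumed for illustration)
w = torch.tensor([0.3], requires_grad=True)
b = torch.tensor([0.1], requires_grad=True)

# Autograd: build the loss graph and backpropagate
y_hat = torch.sigmoid(x * w + b)
loss = -(y * torch.log(y_hat + 1e-8) + (1 - y) * torch.log(1 - y_hat + 1e-8)).mean()
loss.backward()

# Manual gradients from the same forward pass, without tracking the graph
with torch.no_grad():
    dz = y_hat - y
    dw_manual = (x * dz).mean()
    db_manual = dz.mean()

print(w.grad.item(), dw_manual.item())
print(b.grad.item(), db_manual.item())
```

The two columns should agree up to float32 rounding. The tiny `1e-8` offset inside the logs shifts the autograd gradient by far less than that, so the manual formulas and autograd describe the same update.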


## Summary

- Implemented logistic regression with **vanilla gradient descent**.
- Derived and applied **manual gradients** for weight and bias.
- Re-ran the same training loop with **PyTorch autograd** (`loss.backward()`) in place of the hand-written gradients.
- Observed how parameters `w` and `b` evolve while the loss decreases.

📌 This notebook builds intuition for how gradient descent works under the hood before moving to larger models.
