# 🎯 Gradient Descent from Scratch in PyTorch

This notebook implements gradient descent to find optimal weights for predicting employee bonuses. We'll see how the algorithm iteratively adjusts weights to minimize prediction error.

**Problem:** Find weights w₁, w₂, w₃ and bias b such that:
```
bonus = w₁×performance + w₂×experience + w₃×projects + b
```

**Ground Truth:** w₁=12, w₂=6, w₃=2, b=20

---
## Setup & Data Loading

In [None]:
import pandas as pd
import torch

### Load the Employee Dataset

The dataset was generated with known weights, so we can check whether gradient descent recovers them.

In [None]:
df = pd.read_csv('emp_bonus.csv')
print(f"Dataset shape: {df.shape}")
df.head()
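If `emp_bonus.csv` is not at hand, a dataset of the same form can be approximated. The sketch below is only an illustration: the helper name `make_synthetic_bonus_data`, the sample count, the feature ranges, and the absence of noise are all assumptions, and only the weights 12, 6, 2 and bias 20 come from this notebook.

```python
import torch

def make_synthetic_bonus_data(n_samples=100, seed=0):
    """Hypothetical helper: build features and a noise-free bonus target."""
    gen = torch.Generator().manual_seed(seed)
    # Feature ranges below are assumptions, not taken from emp_bonus.csv
    performance = torch.randint(1, 11, (n_samples,), generator=gen).float()
    experience = torch.randint(0, 11, (n_samples,), generator=gen).float()
    projects = torch.randint(1, 21, (n_samples,), generator=gen).float()
    # Ground-truth weights and bias stated at the top of this notebook
    bonus = 12.0 * performance + 6.0 * experience + 2.0 * projects + 20.0
    return performance, experience, projects, bonus

perf, yrs, proj, target = make_synthetic_bonus_data()
print(target[:5])
```

Because the target is built directly from the stated formula, any gap between learned and true weights on such data would come from the optimization itself rather than from noise.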

---
## Convert Data to PyTorch Tensors

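PyTorch's autograd works on tensors, so we convert each column. We cast to `float32` explicitly: it is PyTorch's default floating-point dtype (the one `torch.rand` returns for the weights below), whereas pandas columns typically convert to `int64` or `float64`.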

In [None]:
# Convert features and target to tensors
performance = torch.tensor(df['performance'].values, dtype=torch.float32)
years_of_experience = torch.tensor(df['years_of_experience'].values, dtype=torch.float32)
projects_completed = torch.tensor(df['projects_completed'].values, dtype=torch.float32)
bonus = torch.tensor(df['bonus'].values, dtype=torch.float32)

print(f"Number of samples: {len(bonus)}")
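To see why the explicit `dtype` matters, here is a minimal sketch with made-up values (not rows from the dataset) comparing the inferred dtype of integer data with the dtype of a randomly initialized weight.

```python
import torch

# Integer data (e.g. a column of project counts) is inferred as int64
counts = torch.tensor([3, 5, 7])
print(counts.dtype)

# Casting explicitly yields the same dtype that torch.rand produces
counts_f32 = torch.tensor([3, 5, 7], dtype=torch.float32)
weight = torch.rand(1)
print(counts_f32.dtype, weight.dtype)
```

This assumes PyTorch's default dtype has not been changed with `torch.set_default_dtype`.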

---
## Initialize Weights Randomly

We start with random weights and let gradient descent find the optimal values.

**Key:** `requires_grad=True` tells PyTorch to track operations for automatic differentiation.

In [None]:
# Initialize weights randomly (gradient descent will optimize these)
w1 = torch.rand(1, requires_grad=True)   # Weight for performance
w2 = torch.rand(1, requires_grad=True)   # Weight for experience
w3 = torch.rand(1, requires_grad=True)   # Weight for projects
bias = torch.rand(1, requires_grad=True) # Bias term

print(f"Initial weights: w1={w1.item():.4f}, w2={w2.item():.4f}, w3={w3.item():.4f}, bias={bias.item():.4f}")
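As a small illustration of what `requires_grad=True` does, the sketch below uses toy values (3 and 2, chosen arbitrarily) and checks the gradient PyTorch computes against the derivative worked out by hand.

```python
import torch

w = torch.tensor([3.0], requires_grad=True)  # tracked parameter
x = torch.tensor([2.0])                      # plain data, not tracked

# y = (w * x)^2, so dy/dw = 2 * w * x^2 = 2 * 3 * 4 = 24
y = ((w * x) ** 2).sum()
y.backward()  # populates w.grad

print(w.grad)  # tensor([24.])
print(x.grad)  # None, because x was never tracked
```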

---
## Set Hyperparameters

| Hyperparameter | Value | Purpose |
|----------------|-------|--------|
| Learning Rate | 0.006 | Step size for weight updates |
| Epochs | 5000 | Number of complete passes through data |

In [None]:
epochs = 5000
learning_rate = 0.006
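The effect of the step size is easiest to see on a toy problem. The sketch below minimizes f(w) = w² rather than the bonus loss, and the function name `run_gd` and the two learning rates are illustrative assumptions, not values used in this notebook.

```python
def run_gd(learning_rate, steps=20, w=1.0):
    """Minimize f(w) = w**2 with plain gradient descent; return the final w."""
    for _ in range(steps):
        grad = 2.0 * w                # derivative of w**2
        w = w - learning_rate * grad  # gradient descent update
    return w

small = run_gd(0.1)  # each step multiplies w by 0.8, shrinking toward 0
large = run_gd(1.1)  # each step multiplies w by -1.2, so |w| grows
print(f"lr=0.1 -> w={small:.4f}")
print(f"lr=1.1 -> w={large:.4f}")
```

A step size that is too large overshoots the minimum on every update, which is why the learning rate has to suit the scale of the data and the loss.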

---
## Training Loop: Gradient Descent

Each iteration:
1. **Forward pass:** Compute predictions using current weights
2. **Compute loss:** Mean Squared Error between predictions and actual values
3. **Backward pass:** Compute gradients via backpropagation
4. **Update weights:** Adjust weights in the direction that reduces loss
5. **Zero gradients:** Clear gradients for next iteration
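Step 5 is needed because PyTorch adds each new gradient to whatever is already stored in `.grad`. A minimal sketch with toy values shows the accumulation:

```python
import torch

w = torch.tensor([1.0], requires_grad=True)
x = torch.tensor([3.0])

(w * x).sum().backward()
first = w.grad.clone()    # d(w*x)/dw = x = 3

(w * x).sum().backward()  # second backward pass without zeroing
second = w.grad.clone()   # 3 + 3 = 6: the new gradient was added

w.grad.zero_()            # reset in place, as the training loop does
print(first, second, w.grad)
```

Without the reset, each update would use the sum of all past gradients instead of the current one.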

In [None]:
for epoch in range(epochs):
    # Forward pass: compute predicted bonus
    predicted_bonus = w1 * performance + w2 * years_of_experience + w3 * projects_completed + bias
    
    # Compute Mean Squared Error loss
    loss = ((predicted_bonus - bonus) ** 2).mean()
    
    # Backward pass: compute gradients
    loss.backward()
    
    # Update weights using gradient descent
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad
        w3 -= learning_rate * w3.grad
        bias -= learning_rate * bias.grad
    
    # Zero gradients for next iteration (IMPORTANT!)
    w1.grad.zero_()
    w2.grad.zero_()
    w3.grad.zero_()
    bias.grad.zero_()
    
    # Print progress every 100 epochs
    if epoch % 100 == 0:
        print(f"Epoch {epoch}: Loss = {loss.item():.2f}")
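For the MSE loss, the gradients that `loss.backward()` produces can also be derived by hand: with error eᵢ = predictedᵢ − actualᵢ, the gradient for a weight is the mean of 2·eᵢ·xᵢ and the gradient for the bias is the mean of 2·eᵢ. The sketch below checks this on a one-feature model with made-up random data (not `emp_bonus.csv`).

```python
import torch

torch.manual_seed(0)
# Small made-up inputs; not taken from emp_bonus.csv
x = torch.rand(8)
y = torch.rand(8)
w = torch.rand(1, requires_grad=True)
b = torch.rand(1, requires_grad=True)

pred = w * x + b
loss = ((pred - y) ** 2).mean()
loss.backward()  # autograd gradients land in w.grad and b.grad

# Hand-derived gradients of the MSE loss
with torch.no_grad():
    error = pred - y
    grad_w = (2 * error * x).mean().reshape(1)
    grad_b = (2 * error).mean().reshape(1)

print(torch.allclose(w.grad, grad_w, atol=1e-6))
print(torch.allclose(b.grad, grad_b, atol=1e-6))
```

The same formulas extend to the three-feature model above, with one gradient per weight.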

---
## Results: Learned Weights

Compare learned weights to ground truth:

| Parameter | True Value | Learned Value |
|-----------|------------|---------------|
| w₁ (performance) | 12 | ? |
| w₂ (experience) | 6 | ? |
| w₃ (projects) | 2 | ? |
| bias | 20 | ? |

In [None]:
print("Learned weights:")
print(f"  w1 (performance): {w1.item():.2f} (true: 12)")
print(f"  w2 (experience):  {w2.item():.2f} (true: 6)")
print(f"  w3 (projects):    {w3.item():.2f} (true: 2)")
print(f"  bias:             {bias.item():.2f} (true: 20)")

---
## Make a Prediction

Use the learned weights to predict bonus for a new employee:
- Performance: 7
- Years of experience: 4
- Projects completed: 7

**Expected:** 12×7 + 6×4 + 2×7 + 20 = 84 + 24 + 14 + 20 = **142**

In [None]:
# Predict bonus for new employee
new_performance = 7
new_experience = 4
new_projects = 7

# Inference only, so skip gradient tracking
with torch.no_grad():
    predicted = w1 * new_performance + w2 * new_experience + w3 * new_projects + bias
print(f"Predicted bonus: ${predicted.item():.2f}")
print("Expected bonus:  $142.00")
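The expected value can be reproduced with the ground-truth weights written as a vector. This is a sketch for checking the arithmetic, and the names `true_weights`, `true_bias`, and `features` are introduced here for illustration only.

```python
import torch

# Ground-truth values stated at the top of the notebook
true_weights = torch.tensor([12.0, 6.0, 2.0])
true_bias = 20.0

# New employee: performance, years of experience, projects completed
features = torch.tensor([7.0, 4.0, 7.0])

# Dot product computes 12*7 + 6*4 + 2*7, then the bias is added
expected = torch.dot(true_weights, features) + true_bias
print(expected.item())  # 142.0
```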

---
## Summary

**What we learned:**
1. Gradient descent iteratively adjusts weights to minimize loss
2. `requires_grad=True` enables automatic gradient computation
3. Zero the gradients every iteration, because PyTorch accumulates them across `backward()` calls
4. The learning rate trades convergence speed against stability

**Next:** See `gd_vs_mini_gd_vs_sgd.ipynb` to compare Batch GD, Mini-Batch GD, and SGD.