# PyTorch Deep Dive: Training Your First Model

We have the Data (Tensor). We have the Machine (Model). We have the Math (Autograd).

Now we need to teach the machine. This is **Training**.

## Learning Objectives
- **The Vocabulary**: What are an "Epoch", a "Batch", a "Loss", and an "Optimizer"?
- **The Intuition**: Training as "Learning to Ride a Bike".
- **The Loop**: The 5-step process that repeats on every iteration, often many thousands of times in a real training run.
- **The Visual**: Watching the loss go down.


In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import matplotlib.pyplot as plt

torch.manual_seed(42)

## Part 1: The Vocabulary (Definitions First)

Training a model is like training an athlete. Here are the terms:

### 1. Epoch
- One full pass through the entire dataset.
- Example: If you have 1000 images and you look at all 1000, that's 1 Epoch.
- Analogy: Reading the textbook cover-to-cover once.

### 2. Batch
- A small chunk of data processed at once.
- We usually don't update on 1 example at a time (slow, and the gradients are noisy), nor on the whole dataset at once (it may not fit in memory).
- Analogy: Studying one chapter at a time.
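
To make the relationship between epochs and batches concrete, here is a minimal sketch using `DataLoader`. The dataset (1000 random examples with 3 features) and the batch size of 32 are made-up values for illustration, not something this lesson depends on.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy dataset: 1000 examples with 3 features each.
features = torch.randn(1000, 3)
targets = torch.randn(1000, 1)
dataset = TensorDataset(features, targets)

batch_size = 32  # illustrative choice
loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

# One epoch = iterating over the loader once, so every example is seen once.
batches_per_epoch = 0
examples_seen = 0
for batch_features, batch_targets in loader:
    batches_per_epoch += 1
    examples_seen += batch_features.shape[0]

print(batches_per_epoch)  # 32 (31 full batches of 32, plus one batch of 8)
print(examples_seen)      # 1000
```

So with these numbers, 1 epoch means 32 weight updates rather than 1.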

### 3. Loss Function (The Scorecard)
- Measures how bad the model's prediction is.
- Example: MSE (Mean Squared Error) for regression (predicting numbers), Cross-Entropy for classification (predicting categories).
- Analogy: The grade on a practice test.
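
As a small worked example, MSE can be computed by hand and compared with `nn.MSELoss`. The two predictions and targets below are made-up numbers chosen so the arithmetic is easy to follow.

```python
import torch
import torch.nn as nn

predictions = torch.tensor([2.0, 4.0])
targets = torch.tensor([1.0, 2.0])

# By hand: errors are 1 and 2, squares are 1 and 4, mean is 2.5.
manual_mse = ((predictions - targets) ** 2).mean()

# The built-in loss computes the same quantity (default reduction="mean").
builtin_mse = nn.MSELoss()(predictions, targets)

print(manual_mse.item())   # 2.5
print(builtin_mse.item())  # 2.5
```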

### 4. Optimizer (The Coach)
- The algorithm that updates the weights to reduce the loss.
- Example: SGD (Stochastic Gradient Descent), Adam.
- Analogy: The coach telling you "Lean left!" or "Pedal harder!".
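
What the "coach" actually does can be sketched with a single parameter. Plain SGD (no momentum, no weight decay) moves each weight a small step against its gradient: `w_new = w - lr * grad`. The toy loss `w**2` and the learning rate of 0.1 are assumptions picked for easy arithmetic.

```python
import torch
import torch.optim as optim

lr = 0.1

# A single parameter w = 3.0 and a toy loss w**2, whose gradient is 2*w = 6.0.
w = torch.tensor([3.0], requires_grad=True)
optimizer = optim.SGD([w], lr=lr)

loss = (w ** 2).sum()
loss.backward()

# What plain SGD should produce: w - lr * grad = 3.0 - 0.1 * 6.0 = 2.4
expected = (w - lr * w.grad).detach().clone()

optimizer.step()  # updates w in place

print(w.item())         # approximately 2.4
print(expected.item())  # approximately 2.4
```

Optimizers such as Adam use the same gradients but a more elaborate update rule.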

## Part 2: The Intuition (Learning to Ride a Bike)

How do you learn to ride a bike?

1. **Try**: You get on and pedal. (Forward Pass).
2. **Fail**: You fall over. (Compute Loss).
3. **Blame**: You realize you leaned too far left. (Compute Gradients).
4. **Adjust**: You lean a bit to the right next time. (Update Parameters).
5. **Repeat**: You do it again.

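Neural networks learn with the same loop: predict, measure the error, work out which parameters caused it, and nudge them in the right direction.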

## Part 3: The Setup (Data, Model, Loss, Optimizer)

Before the loop, we need 4 things:

1. **Data**: $X$ (Inputs) and $y$ (Targets).
2. **Model**: The network.
3. **Loss Function**: The Scorecard.
4. **Optimizer**: The Coach.

In [None]:
# 1. Data (Linear Regression: y = 2x + 1)
X = torch.linspace(0, 10, 100).view(-1, 1) # 100 inputs
y = 2 * X + 1 + torch.randn(X.shape) * 0.5 # 100 targets (with noise)

# 2. Model (Linear Layer)
model = nn.Linear(1, 1)

# 3. Loss Function (MSE: Mean Squared Error)
criterion = nn.MSELoss()

# 4. Optimizer (SGD: Stochastic Gradient Descent)
optimizer = optim.SGD(model.parameters(), lr=0.01)

## Part 4: The Training Loop (The 5 Steps)

This loop is the heartbeat of Deep Learning. Memorize these 5 steps.

1. **Forward Pass**: `pred = model(X)`
2. **Calculate Loss**: `loss = criterion(pred, y)`
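3. **Zero Gradients**: `optimizer.zero_grad()` (Don't forget! PyTorch adds new gradients to the old ones on every `backward()` call, so this must run before step 4.)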
4. **Backpropagation**: `loss.backward()` (Compute gradients)
5. **Step**: `optimizer.step()` (Update weights)
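
Why step 3 matters can be seen in a tiny sketch: calling `backward()` twice without resetting adds the gradients together. The single tensor `w` and the expression `2 * w` are made up for illustration. Here the reset is done by hand with `w.grad.zero_()`, whereas in the training loop `optimizer.zero_grad()` clears the gradients of every parameter the optimizer manages.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)

# First backward pass: d(2*w)/dw = 2
(2 * w).backward()
grad_after_first = w.grad.item()

# Second backward pass without zeroing: the new gradient is ADDED to the old one.
(2 * w).backward()
grad_accumulated = w.grad.item()

# Reset, then run backward again: back to the correct value.
w.grad.zero_()
(2 * w).backward()
grad_after_reset = w.grad.item()

print(grad_after_first, grad_accumulated, grad_after_reset)  # 2.0 4.0 2.0
```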

In [None]:
epochs = 100
losses = []

for epoch in range(epochs):
    # 1. Forward Pass (on the whole dataset at once, so here one epoch = one batch = one update)
    predictions = model(X)
    
    # 2. Calculate Loss
    loss = criterion(predictions, y)
    losses.append(loss.item())
    
    # 3. Zero Gradients
    optimizer.zero_grad()
    
    # 4. Backpropagation
    loss.backward()
    
    # 5. Step
    optimizer.step()
    
    if epoch % 10 == 0:
        print(f"Epoch {epoch}: Loss = {loss.item():.4f}")

print("Training Complete!")
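
The loop above feeds all 100 points to the model at once. To connect it with the "Batch" vocabulary from Part 1, here is a sketch of the same 5 steps run on mini-batches. The batch size of 20 and the use of `DataLoader` and `TensorDataset` are illustrative choices, not part of the original setup. The block rebuilds the data and model so it can run on its own.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(42)

# Same kind of synthetic data as in Part 3 (y = 2x + 1 plus noise).
X = torch.linspace(0, 10, 100).view(-1, 1)
y = 2 * X + 1 + torch.randn(X.shape) * 0.5

# Batch size 20 is an illustrative choice: 100 points -> 5 batches per epoch.
loader = DataLoader(TensorDataset(X, y), batch_size=20, shuffle=True)

model = nn.Linear(1, 1)
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

epoch_losses = []
for epoch in range(100):
    running_loss = 0.0
    for batch_X, batch_y in loader:
        predictions = model(batch_X)            # 1. Forward pass
        loss = criterion(predictions, batch_y)  # 2. Calculate loss
        optimizer.zero_grad()                   # 3. Zero gradients
        loss.backward()                         # 4. Backpropagation
        optimizer.step()                        # 5. Step
        running_loss += loss.item() * batch_X.shape[0]
    # Average loss per example over this epoch
    epoch_losses.append(running_loss / len(X))

print(f"First epoch loss: {epoch_losses[0]:.4f}")
print(f"Last epoch loss:  {epoch_losses[-1]:.4f}")
```

With 5 batches per epoch, the model now gets 5 weight updates per epoch instead of 1.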

## Part 5: Visualization (Did it learn?)

Let's see if the model learned the line $y = 2x + 1$.

In [None]:
# Plot Loss Curve
plt.figure(figsize=(10, 4))
plt.subplot(1, 2, 1)
plt.plot(losses)
plt.title("Loss Curve (Lower is Better)")
plt.xlabel("Epoch")
plt.ylabel("MSE Loss")

# Plot Predictions
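plt.subplot(1, 2, 2)
plt.scatter(X.numpy(), y.numpy(), label="Data")
with torch.no_grad(): # Don't track gradients for plotting
    plt.plot(X.numpy(), model(X).numpy(), color='red', label="Prediction")
plt.title("Model Fit")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.tight_layout() # Keep the two subplots from overlapping
plt.show()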

# Check learned parameters
# (after only 100 updates the bias in particular may not have fully reached 1.0)
print(f"Learned Weight: {model.weight.item():.2f} (True: 2.0)")
print(f"Learned Bias: {model.bias.item():.2f} (True: 1.0)")

## Summary Checklist

1. **Epoch** = One full pass through the dataset.
2. **Batch** = A small chunk of the dataset processed at once.
3. **Loss** = The error metric we want to minimize.
4. **Optimizer** = The algorithm (SGD, Adam) that updates weights.
5. **The 5 Steps**: Forward -> Loss -> Zero -> Backward -> Step.

You have now trained your first AI model from scratch. Congratulations!