# Lesson 15: Your First PyTorch Model - Linear Regression

Welcome! In the last lesson, we learned about Tensors (the data) and Autograd (the gradient calculator). Today, we will combine these concepts to build our very first functional model that can *learn* from data.

We will build a **Linear Regression** model. This is the "Hello, World!" of machine learning. 

**Our Goal:** Find the line of best fit for a set of data points. We are trying to teach the computer to find the values for `w` (weight) and `b` (bias) in the famous equation: `y = w * x + b`.

## The 5 Steps of Deep Learning Training

To train *any* model in PyTorch, we will always follow these steps. We will repeat this pattern in every lesson from now on.

**The Training Loop:**
1.  **Forward Pass:** Our model makes a prediction.
2.  **Calculate Loss:** We measure *how wrong* the prediction is.
3.  **Zero Gradients:** We reset the gradients so they don't accumulate from previous steps.
4.  **Backward Pass:** PyTorch calculates the gradients (the "slope" of our error) for every parameter.
5.  **Update Parameters:** We use an optimizer to take a small step in the correct direction to reduce the error.
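Beginners most often forget step 3. PyTorch *adds* each new gradient into a parameter's `.grad` instead of overwriting it. The minimal sketch below shows this accumulation on a single made-up parameter `w`. Setting `w.grad = None` by hand is roughly what `optimizer.zero_grad()` does for every parameter it manages.

```python
import torch

# A single trainable parameter (illustrative value)
w = torch.tensor(3.0, requires_grad=True)

# d(2w)/dw = 2, so one backward pass gives a gradient of 2
(2 * w).backward()
first = w.grad.item()

# A second backward pass WITHOUT resetting adds another 2 on top
(2 * w).backward()
accumulated = w.grad.item()

# Reset the gradient by hand, then backpropagate again
w.grad = None
(2 * w).backward()
after_reset = w.grad.item()

print(first, accumulated, after_reset)  # 2.0 4.0 2.0
```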

Let's learn this by doing it.

In [None]:
import torch
import torch.nn as nn # nn is PyTorch's module for building neural networks
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline

## Step 1: Create the Data

In a real project, we would load data. For this first example, let's *create* some data. We'll start with a known line, `y = 2x + 1`, and add some random "noise" to it. 

Our model's job will be to discover that `w` is close to `2` and `b` is close to `1`.

In [None]:
# Create 100 data points
X = torch.randn(100, 1) * 10 # 100 rows, 1 column. Multiplied by 10 to spread the data out

# Our "true" line is y = 2x + 1
# We add random noise to make it a realistic problem
y = 2 * X + 1 + torch.randn(100, 1) * 2 # Add noise with a standard deviation of 2

# Let's visualize our data
plt.figure(figsize=(10, 6))
plt.scatter(X.numpy(), y.numpy())
plt.title('Our Synthetic Data (What the model sees)')
plt.xlabel('X (Input Feature)')
plt.ylabel('y (Target Value)')
plt.show()
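Because of the noise, no training procedure can recover exactly `w = 2` and `b = 1`. The best a straight line can do is the least-squares fit of this particular sample, and gradient descent on MSE converges toward that fit. As an optional sanity check, this sketch computes the fit directly with `torch.linalg.lstsq`. It regenerates its own data with an arbitrary seed (an assumption for reproducibility), so its numbers will not exactly match yours.

```python
import torch

torch.manual_seed(0)  # arbitrary seed so this sketch is reproducible

# Same recipe as above: y = 2x + 1 plus noise with standard deviation 2
X = torch.randn(100, 1) * 10
y = 2 * X + 1 + torch.randn(100, 1) * 2

# Design matrix [x, 1]: the column of ones lets the solver fit the bias b
A = torch.cat([X, torch.ones_like(X)], dim=1)  # shape (100, 2)

# Solve min ||A @ [w, b] - y||^2 in closed form
solution = torch.linalg.lstsq(A, y).solution    # shape (2, 1)
w_best, b_best = solution[0].item(), solution[1].item()

print(f"Best possible fit: w = {w_best:.4f}, b = {b_best:.4f}")
```

Expect `w` to land very close to 2. `b` is noticeably less certain, because the noise affects the intercept estimate more than the slope when `X` is this spread out.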

## Part 1: The "Manual" Way (Building from scratch)

Let's build the model by manually defining our parameters `w` and `b`. This is the best way to understand what's happening under the hood.

### Step 2: Initialize Parameters

We need to start with a guess for `w` and `b`. Let's guess `w = 0.0` and `b = 0.0`. 

Crucially, we must set `requires_grad=True`. This tells PyTorch: "Track every calculation that uses these tensors, because we will need to calculate gradients for them later."

In [None]:
w = torch.tensor(0.0, requires_grad=True)
b = torch.tensor(0.0, requires_grad=True)
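To see what `requires_grad=True` buys us, compare a tracked tensor with an untracked one. In this small sketch, `c` is an illustrative constant created without the flag.

```python
import torch

w = torch.tensor(0.0, requires_grad=True)  # tracked by autograd
c = torch.tensor(5.0)                      # requires_grad defaults to False

out = 3 * w + c
out.backward()

print(w.grad)             # tensor(3.): d(out)/dw = 3
print(c.grad)             # None: c was never tracked
print(out.requires_grad)  # True: anything computed from w is tracked too
```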

### Step 3: Define the Model, Loss, and Optimizer

**1. The Model (Forward Pass):** This is just a function that implements our equation.

In [None]:
def forward(x):
    return w * x + b
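`w` and `b` are single-number (0-dimensional) tensors, so PyTorch *broadcasts* them across every row of `x`. One call therefore produces a prediction for every data point at once. Here is a quick check with hand-picked values, which are for illustration only (not the zeros used above):

```python
import torch

# Illustrative parameter values
w = torch.tensor(0.5, requires_grad=True)
b = torch.tensor(1.0, requires_grad=True)

def forward(x):
    return w * x + b

x = torch.tensor([[0.0], [1.0], [2.0]])  # 3 samples, 1 feature: same layout as X
y_pred = forward(x)

print(y_pred.shape)                       # torch.Size([3, 1])
print(y_pred.detach().squeeze().tolist()) # [1.0, 1.5, 2.0]
```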

**2. The Loss Function (Criterion):**

How do we measure "how wrong" our predictions are? We use a **Loss Function**.

**Analogy:** Imagine you're playing a guessing game. You guess a number, and your friend says "you're off by 10" or "you're off by 2". That "you're off by" number is the **loss**. A high number means you're very wrong. A low number means you're close, and 0 means you're exactly right.

For regression, the most common loss function is **Mean Squared Error (MSE)**.

$$\text{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left(\text{prediction}_i - \text{actual}_i\right)^2$$

In simple terms: *"For every data point, find the difference between our guess and the real answer. Square that difference (to make it positive and to punish big mistakes more). Then, take the average of all those squared differences."*

Luckily, PyTorch has this built-in: `nn.MSELoss()`.

In [None]:
loss_fn = nn.MSELoss()
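With its default `reduction='mean'`, `nn.MSELoss()` computes exactly the formula above. The worked example below uses made-up numbers to check the built-in against a one-line manual version:

```python
import torch
import torch.nn as nn

pred = torch.tensor([[2.0], [4.0], [6.0]])
actual = torch.tensor([[1.0], [4.0], [8.0]])

# Errors are 1, 0 and -2; squared: 1, 0, 4; mean: 5 / 3
manual_mse = ((pred - actual) ** 2).mean()
builtin_mse = nn.MSELoss()(pred, actual)

print(manual_mse.item(), builtin_mse.item())  # both about 1.6667
```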

**3. The Optimizer:**

If the Loss Function is the *measurement* of how wrong we are, and the Gradient (from Lesson 14) tells us which direction makes the loss *grow* fastest, the **Optimizer** is the tool that *takes a step* in the opposite direction to reduce it.

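We will use the simplest and most famous optimizer: **Stochastic Gradient Descent (SGD)**. (Strictly speaking, we feed all 100 data points to the model on every step, so what we are doing here is plain full-batch gradient descent. The "stochastic" part comes into play when you train on small random batches of data instead.)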

We need to tell the optimizer two things:
1.  **What parameters to update?** In our case, `w` and `b`.
2.  **How big of a step to take?** This is the **Learning Rate (`lr`)**, often the most important knob to tune. Picture the loss as a hilly landscape where we want to reach the bottom of the valley:
    * A **high `lr`** is like taking giant leaps down the hill. You might be fast, but you might leap right over the valley.
    * A **low `lr`** is like taking tiny baby steps. You'll be very precise, but it might take forever to get to the bottom.
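The step-size trade-off is easy to see on a toy problem. This sketch uses a hypothetical helper `run_gd` (not part of PyTorch) to minimize the one-parameter loss (w - 3)², whose minimum is at w = 3, with three different learning rates:

```python
import torch

def run_gd(lr, steps=20):
    """Hypothetical helper: minimize (w - 3)^2 starting from w = 0 with SGD."""
    w = torch.tensor(0.0, requires_grad=True)
    optimizer = torch.optim.SGD([w], lr=lr)
    for _ in range(steps):
        loss = (w - 3) ** 2
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return w.item()

for lr in [0.01, 0.1, 1.1]:
    print(f"lr = {lr}: w after 20 steps = {run_gd(lr):.4f}")
```

For this loss, each step multiplies the distance to 3 by (1 - 2 * lr). That factor is 0.98 for `lr = 0.01` (tiny steps, still far away after 20 steps) and 0.8 for `lr = 0.1` (close to 3). For `lr = 1.1` it is -1.2, so every step overshoots the valley by more than the last, and `w` diverges.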

In [None]:
learning_rate = 0.001 
optimizer = torch.optim.SGD([w, b], lr=learning_rate)
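For plain SGD as configured here (no momentum, no weight decay), `optimizer.step()` applies the update `param = param - lr * param.grad` to every parameter it was given. The sketch below checks this on the toy loss (w - 3)². It updates one copy of a made-up parameter by hand and another with `torch.optim.SGD`.

```python
import torch

lr = 0.1  # illustrative learning rate

# Two copies of the same made-up parameter: one updated by hand, one by SGD
w_manual = torch.tensor(1.0, requires_grad=True)
w_optim = torch.tensor(1.0, requires_grad=True)
optimizer = torch.optim.SGD([w_optim], lr=lr)

# Toy loss (w - 3)^2; its gradient at w = 1 is 2 * (1 - 3) = -4

# Manual update: w <- w - lr * grad
loss_manual = (w_manual - 3) ** 2
loss_manual.backward()
with torch.no_grad():  # the update itself must not be recorded by autograd
    w_manual -= lr * w_manual.grad
w_manual.grad = None   # reset, as zero_grad() would

# The same update through torch.optim.SGD
loss_optim = (w_optim - 3) ** 2
optimizer.zero_grad()
loss_optim.backward()
optimizer.step()

print(w_manual.item(), w_optim.item())  # both approximately 1.4 (= 1 - 0.1 * -4)
```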

### Step 4: The Training Loop

Now we'll put all 5 steps together and repeat them many times. One full pass through our data is called one **epoch**.

In [None]:
epochs = 2000  # b learns much more slowly than w at this small learning rate, so train longer

for epoch in range(epochs):
    # 1. Forward Pass: Make a prediction
    y_pred = forward(X)
    
    # 2. Calculate Loss: How wrong are we?
    loss = loss_fn(y_pred, y)
    
    # 3. Zero Gradients: Reset gradients from the previous iteration
    optimizer.zero_grad()
    
    # 4. Backward Pass: Calculate gradients (the magic!)
    loss.backward()
    
    # 5. Update Parameters: Take a step
    optimizer.step()
    
    # Print progress every 200 epochs
    if (epoch + 1) % 200 == 0:
        print(f'Epoch [{epoch+1}/{epochs}], Loss: {loss.item():.4f}')

print(f"\nTraining finished!\nLearned w: {w.item():.4f}, Learned b: {b.item():.4f}")
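If you print `w` and `b` during training, you will notice that `w` settles within a few dozen epochs while `b` creeps toward its target much more slowly. The reason is the scale of the gradients. The gradient for `w` is multiplied by each `x` value, and our `X` was spread out by a factor of 10. This sketch inspects both gradients on the very first step, regenerating similar data with an arbitrary seed:

```python
import torch

torch.manual_seed(0)  # arbitrary seed for reproducibility
X = torch.randn(100, 1) * 10
y = 2 * X + 1 + torch.randn(100, 1) * 2

w = torch.tensor(0.0, requires_grad=True)
b = torch.tensor(0.0, requires_grad=True)

loss = torch.nn.MSELoss()(w * X + b, y)
loss.backward()

# dL/dw is scaled by the x values (spread out to roughly +/-30); dL/db is not
print(f"grad w: {w.grad.item():.2f}, grad b: {b.grad.item():.2f}")
```

Both parameters share one learning rate. A step size small enough to keep `w` stable is therefore tiny for `b`, which is why `b` needs many more epochs to converge.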

### Step 5: Visualize the Result

After training, `w` should be close to `2.0` and `b` close to `1.0`. The exact values vary from run to run because the data and noise are random. Let's plot the line our model learned against the original data!

In [None]:
# Get the model's final predictions
predicted = forward(X).detach() # .detach() removes it from the autograd graph

plt.figure(figsize=(10, 6))
plt.scatter(X.numpy(), y.numpy(), label='Original Data')
plt.plot(X.numpy(), predicted.numpy(), 'r-', label='Fitted Line')
plt.title('Linear Regression - Manual Model')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.show()
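Why call `.detach()` before `.numpy()`? A tensor that is part of the autograd graph cannot be converted to a NumPy array directly. Here is a minimal demonstration with made-up values:

```python
import torch

w = torch.tensor(2.0, requires_grad=True)
x = torch.tensor([1.0, 2.0])
pred = w * x  # part of the autograd graph, so pred.requires_grad is True

try:
    pred.numpy()  # PyTorch refuses: NumPy cannot carry autograd history
    converted_directly = True
except RuntimeError as err:
    converted_directly = False
    print("RuntimeError:", err)

print(pred.detach().numpy())  # [2. 4.]
```

A common alternative is to compute predictions inside `with torch.no_grad():`. The graph is then never built, so the result does not require gradients in the first place.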

---

## Part 2: The "PyTorch Way" (Using `nn.Module`)

Manually tracking `w` and `b` is great for learning, but it's not practical for large models with millions of parameters. 

The "proper" way to build models in PyTorch is to create a class that inherits from `nn.Module`. This class will hold all our layers and parameters.

### Step 2 (Revisited): Define the Model as a Class

Instead of creating `w` and `b` by hand, we will use a built-in **Linear Layer**: `nn.Linear(in_features, out_features)`.

* `in_features=1`: Our `X` data has 1 feature.
* `out_features=1`: We want 1 output value (`y_pred`).

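This `nn.Linear` layer automatically creates and manages the parameters for us: `self.linear.weight` plays the role of `w`, and `self.linear.bias` plays the role of `b`. Unlike our manual version, they start from random values rather than zeros.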
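For a single input feature, `nn.Linear` is literally `w * x + b`: it computes `x @ weight.T + bias`. Here is a quick check with a freshly created layer (the seed is arbitrary and only makes the random starting values reproducible):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)    # nn.Linear starts from random values
layer = nn.Linear(1, 1)

print(layer.weight.shape, layer.bias.shape)  # torch.Size([1, 1]) torch.Size([1])

x = torch.tensor([[0.0], [1.0], [2.0]])
# x @ weight.T + bias: with one feature this is just w * x + b for every row
manual = x @ layer.weight.T + layer.bias
print(torch.allclose(layer(x), manual))      # True
```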

In [None]:
class LinearRegressionModel(nn.Module):
    def __init__(self, input_dim, output_dim):
        super(LinearRegressionModel, self).__init__()
        # Define the layers
        self.linear = nn.Linear(input_dim, output_dim)
    
    def forward(self, x):
        # Define the flow of data
        return self.linear(x)

model = LinearRegressionModel(input_dim=1, output_dim=1)

### Step 3 (Revisited): Define Loss and Optimizer

The Loss Function is the same. The Optimizer is *almost* the same, but instead of passing `[w, b]`, we can just ask the model for all its parameters automatically using `model.parameters()`.

In [None]:
learning_rate = 0.001
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
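`model.parameters()` finds the weight and bias automatically. Assigning an `nn.Linear` to `self.linear` registers it as a submodule, and `parameters()` walks every registered submodule. To see exactly what the optimizer receives, use `named_parameters()`, which yields the same tensors paired with their names. This sketch repeats the class definition so it runs on its own:

```python
import torch.nn as nn

class LinearRegressionModel(nn.Module):
    def __init__(self, input_dim, output_dim):
        super().__init__()
        self.linear = nn.Linear(input_dim, output_dim)

    def forward(self, x):
        return self.linear(x)

model = LinearRegressionModel(input_dim=1, output_dim=1)

# named_parameters() yields the same tensors as parameters(), plus their names
for name, param in model.named_parameters():
    print(name, tuple(param.shape))
# linear.weight (1, 1)
# linear.bias (1,)
```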

### Step 4 (Revisited): The Training Loop

The loop is almost identical. The only change is that we call `model(X)` instead of our old `forward(X)` function.
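Why `model(X)` rather than `model.forward(X)`? `nn.Module` defines `__call__`, which runs any registered hooks and then calls your `forward` method. Both forms compute the same result, but only `model(X)` triggers the hooks, so it is the recommended form. The small sketch below uses a forward hook; the `calls` list exists only for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(1, 1)  # any nn.Module behaves the same way
x = torch.tensor([[1.0], [2.0]])

calls = []
# A forward hook runs only when the module is invoked as model(x)
model.register_forward_hook(lambda module, inputs, output: calls.append("hook"))

out_call = model(x)
out_direct = model.forward(x)

print(torch.equal(out_call, out_direct))  # True: same computation
print(calls)                              # ['hook']: only model(x) ran the hook
```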

In [None]:
epochs = 2000  # same reasoning as before: the bias needs many small steps

for epoch in range(epochs):
    # 1. Forward Pass
    y_pred = model(X)
    
    # 2. Calculate Loss
    loss = loss_fn(y_pred, y)
    
    # 3. Zero Gradients
    optimizer.zero_grad()
    
    # 4. Backward Pass
    loss.backward()
    
    # 5. Update Parameters
    optimizer.step()
    
    if (epoch + 1) % 200 == 0:
        print(f'Epoch [{epoch+1}/{epochs}], Loss: {loss.item():.4f}')

print("\nTraining finished!\n")

# You can inspect the learned parameters inside the model
list(model.parameters())

### Step 5 (Revisited): Visualize the Result

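The result should be nearly identical to our manual model. It won't match digit for digit, because `nn.Linear` starts from random values instead of zeros, but both versions should end up on essentially the same line. This confirms that `nn.Linear` was doing the same `w*x + b` operation all along.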

In [None]:
# Get the model's final predictions
predicted = model(X).detach()

plt.figure(figsize=(10, 6))
plt.scatter(X.numpy(), y.numpy(), label='Original Data')
plt.plot(X.numpy(), predicted.numpy(), 'r-', label='Fitted Line (nn.Module)')
plt.title('Linear Regression - PyTorch nn.Module')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.show()

## Congratulations!

You have just built and trained your first machine learning model from scratch. You learned the 5-step training loop, which is the foundation for all deep learning, from simple linear regression to giant models like ChatGPT.

In the next lesson, we will use these exact same steps to build a true **neural network** to solve a more exciting problem: classifying images of handwritten digits.