# PyTorch Workflow: Linear Regression End-to-End

This notebook walks through a complete linear regression workflow in PyTorch, from creating data to saving a trained model, with practical intuition and coding exercises along the way.


## Topics Covered

1. Creating a dataset for linear regression
2. Creating training and test sets
3. Creating our first PyTorch model
4. Important model-building classes
5. Checking model internals
6. Making predictions with our model
7. Training a model with PyTorch (intuition)
8. Setting up a loss function and optimizer
9. Training loop intuition
10. Running training loop epoch by epoch
11. Writing testing loop code
12. Saving and loading a model
13. Putting everything together


## Learning Goals

By the end, you should be able to:
- Build synthetic regression data with known true parameters.
- Split data into train/test sets correctly.
- Define an `nn.Module` model for regression.
- Train and evaluate with proper `train()` / `eval()` behavior.
- Save and reload model weights safely.
- Package the whole process into reusable functions.


In [None]:
import torch
from torch import nn
import matplotlib.pyplot as plt

print('Torch version:', torch.__version__)


In [None]:
# Reproducibility
RANDOM_SEED = 42
torch.manual_seed(RANDOM_SEED)


## Creating a Dataset for Linear Regression

We build a synthetic dataset from a known linear relationship, where `w` is the weight (slope) and `b` is the bias (intercept):

\[
y = w x + b
\]

Since we choose `w` and `b`, we can verify whether training recovers those values.
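
As a quick sanity target, here is a minimal sketch of why the true values are recoverable: with noise-free data, any two distinct points determine the line. The names `w_true`, `b_true`, `x_demo`, `y_demo`, `w_est`, and `b_est` are illustrative and not part of the notebook's workflow.

```python
import torch

# Assumed values for illustration, matching the notebook's defaults
w_true, b_true = 0.7, 0.3
x_demo = torch.arange(0, 1, 0.02).unsqueeze(dim=1)
y_demo = w_true * x_demo + b_true

# With noise-free data, the first and last points are enough to recover the line
w_est = (y_demo[-1] - y_demo[0]) / (x_demo[-1] - x_demo[0])
b_est = y_demo[0] - w_est * x_demo[0]

print(f'estimated w={w_est.item():.4f}, b={b_est.item():.4f}')
```

Training should approach these same values; with noisy data this shortcut no longer works and an optimizer (or a least-squares fit) is needed.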


In [None]:
# True parameters
weight_true = 0.7
bias_true = 0.3

# Create input values
X = torch.arange(0, 1, 0.02).unsqueeze(dim=1)

# Create labels using the true linear relationship
y = weight_true * X + bias_true

print('X shape:', X.shape)
print('y shape:', y.shape)


In [None]:
# Visualize full dataset
plt.figure(figsize=(7, 4))
plt.scatter(X.numpy(), y.numpy(), s=18, c='royalblue')
plt.title('Synthetic Linear Regression Data')
plt.xlabel('x')
plt.ylabel('y')
plt.grid(alpha=0.2)
plt.show()


### Exercise 1

- Change `weight_true` and `bias_true`.
- Regenerate `y`.
- Re-plot the data and describe how slope/intercept changed.


## Creating Training and Test Sets

A model should be trained on one subset and evaluated on unseen examples.
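
The split used in this notebook is sequential: the first 80% of points (the smallest `x` values) become the training set, so the test set checks extrapolation beyond the training range. A shuffled split is a common alternative. The sketch below uses `torch.randperm` for this; the `*_demo` names and the index variables are illustrative assumptions, not part of the original workflow.

```python
import torch

torch.manual_seed(42)

# Illustrative data with the same shape as the notebook's dataset
X_demo = torch.arange(0, 1, 0.02).unsqueeze(dim=1)
y_demo = 0.7 * X_demo + 0.3

# Shuffle the indices once, then cut them into train and test parts
perm = torch.randperm(len(X_demo))
n_train = int(0.8 * len(X_demo))
train_idx, test_idx = perm[:n_train], perm[n_train:]

X_train_demo, y_train_demo = X_demo[train_idx], y_demo[train_idx]
X_test_demo, y_test_demo = X_demo[test_idx], y_demo[test_idx]

print('Train samples:', len(X_train_demo))
print('Test samples :', len(X_test_demo))
```

Indexing `X_demo` and `y_demo` with the same index tensor keeps each input paired with its label.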


In [None]:
train_split = int(0.8 * len(X))
X_train, y_train = X[:train_split], y[:train_split]
X_test, y_test = X[train_split:], y[train_split:]

print('Train samples:', len(X_train))
print('Test samples :', len(X_test))


In [None]:
# Plot train vs test
plt.figure(figsize=(7, 4))
plt.scatter(X_train.numpy(), y_train.numpy(), s=18, c='royalblue', label='Train')
plt.scatter(X_test.numpy(), y_test.numpy(), s=18, c='tomato', label='Test')
plt.legend()
plt.title('Train/Test Split')
plt.xlabel('x')
plt.ylabel('y')
plt.grid(alpha=0.2)
plt.show()


### Exercise 2

- Try split ratios `70/30` and `90/10`.
- Compare how many test points each gives.
- Which split gives a more reliable estimate of generalization for this small dataset, and why?


## Creating Our First PyTorch Model

We start with the most explicit version: two learnable parameters (`weight`, `bias`) as `nn.Parameter` objects.


In [None]:
class LinearRegressionModelV1(nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(1, dtype=torch.float), requires_grad=True)
        self.bias = nn.Parameter(torch.randn(1, dtype=torch.float), requires_grad=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.weight * x + self.bias


model_0 = LinearRegressionModelV1()
print(model_0)


## Important Model-Building Classes

- `nn.Module`: Base class for all neural network modules.
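- `nn.Parameter`: A tensor that is registered as a learnable parameter when assigned as an attribute of an `nn.Module`, so it shows up in `model.parameters()` and can be updated by an optimizer.
- `nn.Linear`: Built-in affine layer (`y = xA^T + b`) that creates and manages its own `weight` and `bias` parameters.
- `torch.optim`: Package of optimizers such as SGD and Adam that update parameters from their gradients.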
- Loss functions: quantify prediction error (`nn.L1Loss`, `nn.MSELoss`, etc.).
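
To make the `nn.Linear` formula concrete, the following sketch checks that the layer's output matches a manual `x @ W.T + b` computation. The names `layer`, `x_demo`, `manual`, and `builtin` are illustrative.

```python
import torch
from torch import nn

torch.manual_seed(0)

# One input feature, one output feature: weight has shape (1, 1), bias has shape (1,)
layer = nn.Linear(in_features=1, out_features=1)
x_demo = torch.tensor([[0.0], [0.5], [1.0]])

# Manual affine transform using the layer's own parameters
manual = x_demo @ layer.weight.T + layer.bias
builtin = layer(x_demo)

print('weight shape:', tuple(layer.weight.shape))
print('bias shape  :', tuple(layer.bias.shape))
print('outputs match:', torch.allclose(manual, builtin))
```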


In [None]:
# Equivalent model using nn.Linear (more common)
class LinearRegressionModelV2(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear_layer = nn.Linear(in_features=1, out_features=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear_layer(x)


model_1 = LinearRegressionModelV2()
print(model_1)


### Exercise 3

- Print both `model_0` and `model_1` parameters.
- Explain how `model_1` hides `weight`/`bias` inside `nn.Linear`.


## Checking Out the Internals of Our Model


In [None]:
print('model_0 parameters:')
for name, param in model_0.named_parameters():
    print(f'{name:10s} | shape={tuple(param.shape)} | value={param.data}')

print()
print('model_0 state_dict:')
print(model_0.state_dict())


In [None]:
print('model_1 parameters:')
for name, param in model_1.named_parameters():
    print(f'{name:20s} | shape={tuple(param.shape)} | value={param.data}')

print()
print('model_1 state_dict:')
print(model_1.state_dict())


## Making Predictions with Our Model

Before training, predictions are usually poor because the parameters are still at their random initial values.


In [None]:
with torch.inference_mode():
    y_preds_before = model_0(X_test)

plt.figure(figsize=(7, 4))
plt.scatter(X_train.numpy(), y_train.numpy(), s=18, c='royalblue', label='Train')
plt.scatter(X_test.numpy(), y_test.numpy(), s=18, c='tomato', label='Test')
plt.scatter(X_test.numpy(), y_preds_before.numpy(), s=18, c='seagreen', label='Predictions (before training)')
plt.legend()
plt.title('Predictions Before Training')
plt.xlabel('x')
plt.ylabel('y')
plt.grid(alpha=0.2)
plt.show()


### Exercise 4

- Compare `y_preds_before[:5]` with `y_test[:5]`.
- Estimate whether the model is over- or under-predicting on average.


## Training a Model with PyTorch (Intuition)

Each training step repeats the same five-part cycle:
1. Forward pass (predictions)
2. Compute loss
3. Zero gradients
4. Backward pass (`loss.backward()`)
5. Optimizer step (`optimizer.step()`)
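
The five steps above can be sketched as a single training step. This is a minimal illustration on a stand-in `nn.Linear` model; the names `demo_model`, `demo_loss_fn`, `demo_optimizer`, `x_demo`, and `y_demo` are assumptions made for this example.

```python
import torch
from torch import nn

torch.manual_seed(42)

# Illustrative data and model
x_demo = torch.arange(0, 1, 0.02).unsqueeze(dim=1)
y_demo = 0.7 * x_demo + 0.3
demo_model = nn.Linear(in_features=1, out_features=1)
demo_loss_fn = nn.L1Loss()
demo_optimizer = torch.optim.SGD(demo_model.parameters(), lr=0.01)

weight_before = demo_model.weight.detach().clone()

demo_model.train()
pred = demo_model(x_demo)           # 1. forward pass
loss = demo_loss_fn(pred, y_demo)   # 2. compute loss
demo_optimizer.zero_grad()          # 3. clear gradients left over from earlier steps
loss.backward()                     # 4. backward pass fills .grad on each parameter
demo_optimizer.step()               # 5. update parameters using those gradients

weight_after = demo_model.weight.detach().clone()
print(f'loss before the update: {loss.item():.5f}')
print('weight changed:', not torch.equal(weight_before, weight_after))
```

Zeroing gradients can happen anywhere before `loss.backward()`; what matters is that stale gradients are cleared before new ones are accumulated.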


## Setting Up a Loss Function and Optimizer

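`nn.L1Loss` computes the mean absolute error (MAE) between predictions and labels. It is a common choice in introductory linear regression examples; `nn.MSELoss` (mean squared error) is the other usual option.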
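
As a rough illustration of how MAE differs from the other common regression loss, MSE (`nn.MSELoss`), the sketch below evaluates both on a tiny hand-made example. The tensors `pred_demo` and `target_demo` are made up for this purpose.

```python
import torch
from torch import nn

# Absolute errors are 0, 1, and 3
pred_demo = torch.tensor([1.0, 2.0, 4.0])
target_demo = torch.tensor([1.0, 1.0, 1.0])

mae = nn.L1Loss()(pred_demo, target_demo)    # mean of |error|  = (0 + 1 + 3) / 3
mse = nn.MSELoss()(pred_demo, target_demo)   # mean of error^2  = (0 + 1 + 9) / 3

print(f'MAE: {mae.item():.4f}')
print(f'MSE: {mse.item():.4f}')
```

Because MSE squares each error, the single large error dominates it, while MAE weights all errors linearly.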


In [None]:
loss_fn = nn.L1Loss()
optimizer = torch.optim.SGD(params=model_0.parameters(), lr=0.01)


## Training Loop Intuition

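The short loop below runs three optimization steps and prints the loss and the gradients of `weight` and `bias` at each step, just before the optimizer applies the update.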
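
For plain SGD without momentum or weight decay (the configuration used in this notebook), the update behind `optimizer.step()` can be sketched as follows. The symbols \(\eta\) (the learning rate `lr`), \(N\) (the number of training samples), and \(\hat{y}_i\) (the prediction for sample \(i\)) are notation introduced here for illustration.

\[
\hat{y}_i = w x_i + b, \qquad L = \frac{1}{N} \sum_{i=1}^{N} \lvert \hat{y}_i - y_i \rvert
\]

\[
\frac{\partial L}{\partial w} = \frac{1}{N} \sum_{i=1}^{N} \operatorname{sign}(\hat{y}_i - y_i)\, x_i, \qquad
\frac{\partial L}{\partial b} = \frac{1}{N} \sum_{i=1}^{N} \operatorname{sign}(\hat{y}_i - y_i)
\]

\[
w \leftarrow w - \eta \frac{\partial L}{\partial w}, \qquad b \leftarrow b - \eta \frac{\partial L}{\partial b}
\]

One consequence: while every prediction lies above its label, the bias gradient is exactly 1 and the weight gradient equals the mean of the training inputs, so the printed gradients can stay constant from step to step even as the loss falls.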


In [None]:
torch.manual_seed(RANDOM_SEED)
model_0.train()

for step in range(3):
    y_pred = model_0(X_train)
    loss = loss_fn(y_pred, y_train)

    optimizer.zero_grad()
    loss.backward()

    print(
        f'Step {step} | loss={loss.item():.5f} | '
        f'grad_weight={model_0.weight.grad.item():.5f} | '
        f'grad_bias={model_0.bias.grad.item():.5f}'
    )

    optimizer.step()


### Exercise 5

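- Increase `lr` to `0.1` and run the 3-step loop again from a fresh model. Recreate the optimizer as well, since the existing one still points at the old model's parameters.
- How do the loss and the size of each parameter update change compared with `lr=0.01`?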


## Running Our Training Loop Epoch by Epoch


In [None]:
# Reinitialize model for clean training
torch.manual_seed(RANDOM_SEED)
model_0 = LinearRegressionModelV1()

loss_fn = nn.L1Loss()
optimizer = torch.optim.SGD(params=model_0.parameters(), lr=0.01)

epochs = 300
train_loss_values = []
test_loss_values = []
epoch_count = []

for epoch in range(epochs):
    model_0.train()

    # Training step
    y_pred = model_0(X_train)
    train_loss = loss_fn(y_pred, y_train)

    optimizer.zero_grad()
    train_loss.backward()
    optimizer.step()

    # Testing step
    model_0.eval()
    with torch.inference_mode():
        test_pred = model_0(X_test)
        test_loss = loss_fn(test_pred, y_test)

    if epoch % 20 == 0:
        epoch_count.append(epoch)
        train_loss_values.append(train_loss.item())
        test_loss_values.append(test_loss.item())
        print(f'Epoch: {epoch:3d} | Train loss: {train_loss.item():.5f} | Test loss: {test_loss.item():.5f}')


In [None]:
plt.figure(figsize=(7, 4))
plt.plot(epoch_count, train_loss_values, label='Train loss')
plt.plot(epoch_count, test_loss_values, label='Test loss')
plt.title('Loss Curves')
plt.xlabel('Epoch')
plt.ylabel('L1 Loss')
plt.legend()
plt.grid(alpha=0.2)
plt.show()


## Writing Testing Loop Code

A correct testing loop should:
- switch to `model.eval()`
- disable gradient tracking (`torch.inference_mode()`)
- compute metrics on unseen test data
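
The effect of the first two points can be checked directly. In this minimal sketch (the names `demo_model`, `x_demo`, `pred_tracked`, and `pred_untracked` are illustrative), the same forward pass is run with and without gradient tracking.

```python
import torch
from torch import nn

demo_model = nn.Linear(in_features=1, out_features=1)
x_demo = torch.tensor([[0.25], [0.75]])

# Training mode, normal forward pass: the output is part of the autograd graph
demo_model.train()
pred_tracked = demo_model(x_demo)

# Evaluation mode inside inference_mode: no autograd bookkeeping
demo_model.eval()
with torch.inference_mode():
    pred_untracked = demo_model(x_demo)

print('model.training          :', demo_model.training)
print('tracked requires_grad   :', pred_tracked.requires_grad)
print('untracked requires_grad :', pred_untracked.requires_grad)
```

For a plain linear layer the predicted values are the same in both modes; `eval()` changes outputs only for layers such as dropout or batch normalization.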


In [None]:
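def evaluate_model(model: nn.Module, X_data: torch.Tensor, y_data: torch.Tensor, loss_fn):
    """Return the loss and predictions of `model` on the given data, without tracking gradients."""
    model.eval()  # switch layers such as dropout/batch norm to evaluation behavior
    with torch.inference_mode():  # no autograd bookkeeping inside this block
        pred = model(X_data)
        loss = loss_fn(pred, y_data)
    return {'loss': loss.item(), 'predictions': pred}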


test_results = evaluate_model(model_0, X_test, y_test, loss_fn)
print('Test loss:', round(test_results['loss'], 6))


In [None]:
with torch.inference_mode():
    y_preds_after = model_0(X_test)

plt.figure(figsize=(7, 4))
plt.scatter(X_train.numpy(), y_train.numpy(), s=18, c='royalblue', label='Train')
plt.scatter(X_test.numpy(), y_test.numpy(), s=18, c='tomato', label='Test')
plt.scatter(X_test.numpy(), y_preds_after.numpy(), s=18, c='seagreen', label='Predictions (after training)')
plt.legend()
plt.title('Predictions After Training')
plt.xlabel('x')
plt.ylabel('y')
plt.grid(alpha=0.2)
plt.show()


### Exercise 6

- Add `MSELoss` to the evaluation function.
- Report both MAE and MSE for the same model.
- Which metric is larger numerically and why?


## Saving and Loading a Model

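The recommended practice is to save the model's `state_dict()` (a dictionary mapping parameter names to tensors) rather than pickling the whole model object. The weights are later loaded into a freshly constructed model of the same class.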
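
To see what a `state_dict()` holds and how a round trip works without touching the file system, here is a small sketch that saves to an in-memory buffer. The buffer and the names `source_model` and `restored_model` are assumptions for this example, as is the use of the `weights_only=True` argument of `torch.load`, which is available in recent PyTorch releases.

```python
import io

import torch
from torch import nn

torch.manual_seed(42)
source_model = nn.Linear(in_features=1, out_features=1)

# The state_dict maps parameter names to tensors
print(list(source_model.state_dict().keys()))

# Save to an in-memory buffer instead of a file (illustration only)
buffer = io.BytesIO()
torch.save(source_model.state_dict(), buffer)
buffer.seek(0)

# Load the weights into a freshly constructed model of the same architecture
restored_model = nn.Linear(in_features=1, out_features=1)
restored_model.load_state_dict(torch.load(buffer, weights_only=True))

for name, tensor in source_model.state_dict().items():
    print(name, torch.equal(tensor, restored_model.state_dict()[name]))
```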


In [None]:
from pathlib import Path

MODEL_PATH = Path('PyTorch Course/models')
MODEL_PATH.mkdir(parents=True, exist_ok=True)

MODEL_NAME = 'linear_regression_model_v1.pth'
MODEL_SAVE_PATH = MODEL_PATH / MODEL_NAME

torch.save(model_0.state_dict(), MODEL_SAVE_PATH)
print('Saved model to:', MODEL_SAVE_PATH)


In [None]:
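# Load weights into a new model instance
# weights_only=True restricts unpickling to tensors and plain containers (available in recent PyTorch releases)
loaded_model = LinearRegressionModelV1()
loaded_model.load_state_dict(torch.load(MODEL_SAVE_PATH, weights_only=True))
loaded_model.eval()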

# Verify predictions match
with torch.inference_mode():
    loaded_preds = loaded_model(X_test)

same = torch.allclose(y_preds_after, loaded_preds)
print('Loaded model predictions identical to trained model:', same)


### Exercise 7

- Save `model_1` as a separate file.
- Load it into a fresh `LinearRegressionModelV2` instance.
- Verify test predictions are identical pre/post load.


## Putting Everything Together

This section wraps the workflow into reusable functions for repeatable experiments.


In [None]:
def create_linear_data(weight=0.7, bias=0.3, start=0, end=1, step=0.02):
    X = torch.arange(start, end, step).unsqueeze(1)
    y = weight * X + bias
    return X, y


def train_test_split_tensors(X, y, split_ratio=0.8):
    split = int(split_ratio * len(X))
    return X[:split], y[:split], X[split:], y[split:]


def train_linear_model(model, X_train, y_train, X_test, y_test, epochs=300, lr=0.01):
    loss_fn = nn.L1Loss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)

    history = {'epoch': [], 'train_loss': [], 'test_loss': []}

    for epoch in range(epochs):
        model.train()
        y_pred = model(X_train)
        train_loss = loss_fn(y_pred, y_train)

        optimizer.zero_grad()
        train_loss.backward()
        optimizer.step()

        model.eval()
        with torch.inference_mode():
            test_pred = model(X_test)
            test_loss = loss_fn(test_pred, y_test)

        if epoch % 20 == 0:
            history['epoch'].append(epoch)
            history['train_loss'].append(train_loss.item())
            history['test_loss'].append(test_loss.item())

    return history


In [None]:
# End-to-end run
X_all, y_all = create_linear_data(weight=1.2, bias=-0.1)
X_train2, y_train2, X_test2, y_test2 = train_test_split_tensors(X_all, y_all, split_ratio=0.8)

torch.manual_seed(RANDOM_SEED)
final_model = LinearRegressionModelV1()

history = train_linear_model(final_model, X_train2, y_train2, X_test2, y_test2, epochs=400, lr=0.01)

print('Learned weight:', final_model.weight.item())
print('Learned bias  :', final_model.bias.item())


In [None]:
plt.figure(figsize=(7, 4))
plt.plot(history['epoch'], history['train_loss'], label='Train loss')
plt.plot(history['epoch'], history['test_loss'], label='Test loss')
plt.title('Final Workflow Loss Curves')
plt.xlabel('Epoch')
plt.ylabel('L1 Loss')
plt.legend()
plt.grid(alpha=0.2)
plt.show()


## Capstone Exercises

1. Add Gaussian noise to `y` when creating data and compare final losses.
2. Replace SGD with Adam and compare convergence speed.
3. Train with very small and very large learning rates and summarize failure modes.
4. Add early stopping logic when test loss does not improve for `N` checks.
5. Re-run experiments with 3 different random seeds and compare learned parameters.


## Quick Knowledge Check

- Why should test data never influence optimizer updates?
- What is the difference between `model.train()` and `model.eval()`?
- Why is `optimizer.zero_grad()` required every training step?
- What does `state_dict()` contain?
- When would you prefer MAE over MSE?
