# A Typical Machine Learning Workflow

Training a machine learning model typically happens in a **loop**: you adjust the model's weights and bias, measure performance, and repeat until the model is good enough to finalize.

The training loop generally consists of **four main steps**:

1. **Prediction (Forward Pass)**  
   Compute the model's output based on the current weights and input data.

2. **Loss Computation**  
   Measure how far the predictions are from the true values using a **loss function**.

3. **Gradient Calculation (Backward Pass)**  
   Compute gradients of the loss with respect to model parameters using **automatic differentiation**.

4. **Parameter Updates**  
   Adjust the weights and bias in the direction that minimizes the loss (e.g., using **gradient descent**).
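
The four steps above can be sketched in plain Python for a model with one weight and one bias. This is only a minimal illustration: the toy data (`xs`, `ys`), the learning rate, the epoch count, and the helper name `train` are assumptions chosen for the sketch, and the gradients are derived by hand rather than by automatic differentiation.

```python
# Minimal sketch of the four-step training loop (plain Python, no libraries).
# Assumed toy data generated from y = 2x + 1; names and values are illustrative.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]


def train(xs, ys, learning_rate=0.05, epochs=2000):
    w, b = 0.0, 0.0  # initial guesses for weight and bias
    n = len(xs)
    losses = []
    for _ in range(epochs):
        # 1. Prediction (forward pass)
        preds = [w * x + b for x in xs]
        # 2. Loss computation (mean squared error)
        errors = [p - y for p, y in zip(preds, ys)]
        losses.append(sum(e * e for e in errors) / n)
        # 3. Gradient calculation (derived by hand here)
        dw = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        db = 2.0 / n * sum(errors)
        # 4. Parameter update (gradient descent)
        w -= learning_rate * dw
        b -= learning_rate * db
    return w, b, losses


w, b, losses = train(xs, ys)
print(f"w = {w:.3f}, b = {b:.3f}, final loss = {losses[-1]:.2e}")
```

With these settings the learned values should approach the slope 2 and intercept 1 used to generate the data.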


Through the examples below, we will move from performing all of these steps manually to automating them with **PyTorch**.

# Example 1 (performing all steps manually)
This example shows how each step of the training loop executes. For simplicity, we use a linear regression with a single weight and no bias: f(x) = w * x.

In [2]:
"""
Simple Linear Regression from Scratch using NumPy
-------------------------------------------------
This script demonstrates how machine learning training works internally,
without using frameworks like PyTorch or TensorFlow.

We model a simple linear relationship:
    y = w * x

Goal:
    Learn the optimal value of weight `w` that minimizes Mean Squared Error (MSE)
    between predicted and true values.

Steps:
    1. Initialize parameters
    2. Forward pass (prediction)
    3. Compute loss (MSE)
    4. Compute gradient (slope of loss wrt w)
    5. Update weights (gradient descent)
"""

import numpy as np

# ---------------------------------------------------
# 1. Training Data
# ---------------------------------------------------
# True relationship: y = 2x
X = np.array([1, 2, 3, 4], dtype=np.float32)
Y = np.array([2, 4, 6, 8], dtype=np.float32)  # true target values

# ---------------------------------------------------
# 2. Initialize Model Parameter
# ---------------------------------------------------
# Start with an arbitrary weight
w = 0.0  # initial guess for slope

# ---------------------------------------------------
# 3. Define Core Functions
# ---------------------------------------------------

def forward(x: np.ndarray) -> np.ndarray:
    """
    Forward pass: Predicts the output for given input `x`
    using the current weight `w`.
    Equation: y_pred = w * x
    """
    return w * x


def loss(y: np.ndarray, y_predicted: np.ndarray) -> float:
    """
    Computes Mean Squared Error (MSE)
    Equation: MSE = (1/N) * Σ(y_pred - y)^2
    """
    return ((y_predicted - y) ** 2).mean()


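def gradient(x: np.ndarray, y: np.ndarray, y_predicted: np.ndarray) -> float:
    """
    Computes the gradient of the loss function
    with respect to weight `w`.

    Derivation:
        MSE = (1/N) * Σ(w*x - y)^2
        dMSE/dw = (2/N) * Σ(x * (w*x - y))

    Note: np.dot already sums over the samples and returns a scalar, so the
    trailing .mean() has no effect. The returned value is
    2 * Σ(x * (w*x - y)), i.e. N times dMSE/dw, which behaves like a
    learning rate N times larger in the update step.
    """
    return np.dot(2 * x, y_predicted - y).mean()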


# ---------------------------------------------------
# 4. Before Training: Initial Prediction
# ---------------------------------------------------
print(f"Prediction before training: f(5) = {forward(5):.3f}")

# ---------------------------------------------------
# 5. Training Loop (Gradient Descent)
# ---------------------------------------------------
learning_rate = 0.01
epochs = 20

for epoch in range(epochs):

    # Forward pass: compute predictions
    y_pred = forward(X)

    # Compute loss
    l = loss(Y, y_pred)

    # Compute gradient (how the loss changes with w)
    dw = gradient(X, Y, y_pred)

    # Update weight: gradient descent step
    # new_w = old_w - learning_rate * gradient
    w -= learning_rate * dw

    # Log progress every 5 epochs
    if epoch % 5 == 0:
        print(f"Epoch {epoch+1:02d}: w = {w:.3f}, loss = {l:.8f}")

# ---------------------------------------------------
# 6. After Training: Evaluate Model
# ---------------------------------------------------
print(f"Prediction after training: f(5) = {forward(5):.3f}")


Prediction before training: f(5) = 0.000
Epoch 01: w = 1.200, loss = 30.00000000
Epoch 06: w = 1.992, loss = 0.00314574
Epoch 11: w = 2.000, loss = 0.00000033
Epoch 16: w = 2.000, loss = 0.00000000
Prediction after training: f(5) = 10.000
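
The closed-form derivative given in the `gradient()` docstring, dMSE/dw = (2/N) * Σ(x * (w*x - y)), can be sanity-checked against a numerical estimate. The sketch below is a hedged illustration in plain Python. The helper names `mse`, `analytic_grad`, and `numeric_grad` and the step size `h` are assumptions made for this check, not part of the script above.

```python
# Sketch: compare the hand-derived MSE gradient with a central finite difference.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]


def mse(w):
    """Mean squared error of the model y_pred = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def analytic_grad(w):
    """dMSE/dw = (2/N) * sum(x * (w*x - y))."""
    return 2.0 / len(xs) * sum(x * (w * x - y) for x, y in zip(xs, ys))


def numeric_grad(w, h=1e-5):
    """Central difference approximation of dMSE/dw."""
    return (mse(w + h) - mse(w - h)) / (2.0 * h)


for w in (0.0, 1.0, 2.0):
    print(f"w = {w}: analytic = {analytic_grad(w):.4f}, numeric = {numeric_grad(w):.4f}")
```

For this data the mean-based gradient is -30 at `w = 0` and vanishes at `w = 2`, the true slope, so the two columns should agree to within rounding error.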


# Example 2 (partially automated)

Instead of NumPy arrays, we will use PyTorch **tensors**.

**Why?**

By defining the weight `w` as a tensor with `requires_grad=True`, we ask PyTorch's **autograd** engine to record the operations performed on `w` in a computation graph so that it can calculate the gradient for us.

Calling `.backward()` on the loss computes the gradient and accumulates it into the parameter's `.grad` attribute, so we no longer need the custom `gradient()` function.
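
A minimal sketch of this behaviour, assuming PyTorch is installed: after `backward()` the gradient appears in `w.grad`, and a second `backward()` call without a reset adds to the stored value. The variable names are illustrative.

```python
import torch

# Same toy data as the NumPy example: y = 2x
X = torch.tensor([1.0, 2.0, 3.0, 4.0])
Y = torch.tensor([2.0, 4.0, 6.0, 8.0])

# requires_grad=True asks autograd to track operations on w
w = torch.tensor(0.0, requires_grad=True)

# First forward and backward pass
loss = ((w * X - Y) ** 2).mean()
loss.backward()
first_grad = w.grad.item()

# Second pass without zeroing: the new gradient is added to w.grad
loss = ((w * X - Y) ** 2).mean()
loss.backward()
accumulated_grad = w.grad.item()

# Reset before the next iteration, as a training loop would
w.grad.zero_()
reset_grad = w.grad.item()

print(first_grad, accumulated_grad, reset_grad)
```

At `w = 0` the MSE gradient for this data is -30, so the second call leaves -60 in `w.grad` until it is reset. This is why a training loop zeroes the gradient on every iteration.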


In [None]:
"""
Simple Linear Regression using PyTorch (Manual Training Loop)
---------------------------------------------------------------
This script demonstrates how PyTorch automates gradient computation
using its **autograd** engine.

We're modeling the relationship:
    y = 2 * x

Goal:
    Learn the optimal weight `w` such that predictions match the true values y.

Key Improvement Over Manual NumPy Version:
---------------------------------------------
- In the NumPy version, we manually computed the gradient (dL/dw).
- In this PyTorch version, **autograd automatically calculates gradients**
  during the backward pass using `l.backward()`, eliminating the need
  for manual differentiation.
"""

import torch

# ---------------------------------------------------
# 1. Define Training Data
# ---------------------------------------------------
# True relationship: y = 2x
X = torch.tensor([1, 2, 3, 4], dtype=torch.float32)
Y = torch.tensor([2, 4, 6, 8], dtype=torch.float32)

# ---------------------------------------------------
# 2. Initialize Model Parameter
# ---------------------------------------------------
# requires_grad=True lets PyTorch track operations on w for autograd
w = torch.tensor(0.0, dtype=torch.float32, requires_grad=True)

# ---------------------------------------------------
# 3. Define Core Functions
# ---------------------------------------------------
def forward(x: torch.Tensor) -> torch.Tensor:
    """
    Forward pass: Predict output using the current weight.
    Equation: y_pred = w * x
    """
    return w * x


def loss(y: torch.Tensor, y_predicted: torch.Tensor) -> torch.Tensor:
    """
    Compute Mean Squared Error (MSE) loss.
    MSE = (1/N) * Σ(y_pred - y)^2
    """
    return ((y_predicted - y) ** 2).mean()


# ---------------------------------------------------
# 4. Before Training: Check Initial Prediction
# ---------------------------------------------------
print(f"Prediction before training: f(5) = {forward(5):.3f}")

# ---------------------------------------------------
# 5. Training Configuration
# ---------------------------------------------------
learning_rate = 0.01
epochs = 100

# ---------------------------------------------------
# 6. Training Loop
# ---------------------------------------------------
for epoch in range(epochs):

    # Forward pass: compute predictions
    y_pred = forward(X)

    # Compute loss
    l = loss(Y, y_pred)

    # Backward pass: autograd computes dL/dw and accumulates it into w.grad
    l.backward()

    # Update weight: gradient descent step.
    # torch.no_grad() keeps the in-place update itself out of the computation graph.
    with torch.no_grad():
        w -= learning_rate * w.grad

    # Reset the gradient to zero; otherwise the next backward() call would add to it.
    w.grad.zero_()

    # Log progress every 10 epochs
    if epoch % 10 == 0:
        print(f"Epoch {epoch+1:03d}: w = {w:.3f}, loss = {l:.8f}")

# ---------------------------------------------------
# 7. After Training: Evaluate Model
# ---------------------------------------------------
print(f"Prediction after training: f(5) = {forward(5):.3f}")


Prediction before training: f(5) = 0.000
Epoch 001: w = 0.300, loss = 30.00000000
Epoch 011: w = 1.665, loss = 1.16278565
Epoch 021: w = 1.934, loss = 0.04506890
Epoch 031: w = 1.987, loss = 0.00174685
Epoch 041: w = 1.997, loss = 0.00006770
Epoch 051: w = 1.999, loss = 0.00000262
Epoch 061: w = 2.000, loss = 0.00000010
Epoch 071: w = 2.000, loss = 0.00000000
Epoch 081: w = 2.000, loss = 0.00000000
Epoch 091: w = 2.000, loss = 0.00000000
Prediction after training: f(5) = 10.000
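
The logged weights follow a pattern that can be worked out by hand. For this data set the MSE gradient is (2/N) * Σ(x * (w*x - y)) = 15 * (w - 2), so each step with a learning rate of 0.01 multiplies the distance from the true slope by 1 - 0.01 * 15 = 0.85. The sketch below reproduces the logged values under that assumption. It uses exact arithmetic and ignores float32 rounding, and the helper name `w_after` is hypothetical.

```python
# Sketch: closed-form weight after k gradient descent steps on y = 2x.
# Assumes w starts at 0.0 and uses the mean-based MSE gradient.
xs = [1.0, 2.0, 3.0, 4.0]
learning_rate = 0.01
true_w = 2.0

# Gradient is curvature * (w - true_w), with curvature = (2/N) * sum(x^2)
curvature = 2.0 / len(xs) * sum(x * x for x in xs)
shrink = 1.0 - learning_rate * curvature  # error multiplier per step


def w_after(steps, w0=0.0):
    """Weight after `steps` updates, starting from w0."""
    return true_w + (w0 - true_w) * shrink ** steps


for epoch in (1, 11, 21, 31):
    print(f"Epoch {epoch:03d}: w = {w_after(epoch):.3f}")
```

This prints 0.300, 1.665, 1.934, and 1.987, matching the weights logged for epochs 1, 11, 21, and 31 above.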


# Example 3 (fully automated)

Here we automate the previous example further:
1. Use PyTorch's built-in linear model (`nn.Linear`), which replaces our `forward()` function.
2. Replace our hand-written loss and weight-update code with PyTorch's loss function (`nn.MSELoss`) and optimizer (`torch.optim.SGD`).

**Note:**
`X` and `Y` are reshaped into 4 x 1 matrices because `nn.Linear` expects input of shape `(n_samples, n_features)`: each row is one sample and each column is one feature.
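
As an illustration of this shape requirement, the sketch below turns a flat tensor into a column of samples and passes it through `nn.Linear`. It assumes PyTorch is installed; `flat_x` and `column_x` are illustrative names.

```python
import torch
import torch.nn as nn

flat_x = torch.tensor([1.0, 2.0, 3.0, 4.0])  # shape (4,): four bare numbers
column_x = flat_x.reshape(-1, 1)              # shape (4, 1): 4 samples, 1 feature

model = nn.Linear(in_features=1, out_features=1)
predictions = model(column_x)                  # one prediction per row

print(column_x.shape)     # torch.Size([4, 1])
print(predictions.shape)  # torch.Size([4, 1])
```

The `-1` in `reshape(-1, 1)` lets PyTorch infer the number of rows, so the same line works for any number of samples.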

In [39]:
"""
Linear Regression using PyTorch (High-Level API)
--------------------------------------------------
This script demonstrates how PyTorch's built-in modules
automate most of the training process for a linear regression model.

We're modeling the relationship:
    y = 2 * x

Goal:
    Learn the optimal weight (w) and bias (b) that best fit the data.

Key Improvement Over Previous Version:
-----------------------------------------
- In the earlier version, we manually defined the forward pass and the
  loss, and updated the parameter `w` by hand using the gradient from autograd.
- In this version, PyTorch's built-in modules handle each piece:
   - `nn.Linear` handles model definition (w and b)
   - `nn.MSELoss` handles loss computation
   - `torch.optim.SGD` handles weight updates

This represents a fully modular, high-level approach compared to
manual gradient descent.
"""

import torch
import torch.nn as nn

# ---------------------------------------------------
# 1. Define Training and Test Data
# ---------------------------------------------------
X = torch.tensor([[1], [2], [3], [4]], dtype=torch.float32)
Y = torch.tensor([[2], [4], [6], [8]], dtype=torch.float32)
X_test = torch.tensor([5], dtype=torch.float32)

# Extract input/output sizes
n_samples, n_features = X.shape
print(f"Samples: {n_samples}, Features: {n_features}")

input_size = n_features
output_size = Y.shape[1]  # number of target values per sample (1 here)

# ---------------------------------------------------
# 2. Define Model
# ---------------------------------------------------
# nn.Linear automatically creates weight and bias parameters
model = nn.Linear(in_features=input_size,
                  out_features=output_size,
                  bias=True)

print(f"Prediction before training: f(5) = {model(X_test).item():.3f}")

# ---------------------------------------------------
# 3. Define Loss and Optimizer
# ---------------------------------------------------
# Mean Squared Error (MSE) loss function
criterion = nn.MSELoss()

# Stochastic Gradient Descent optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# ---------------------------------------------------
# 4. Training Loop
# ---------------------------------------------------
epochs = 400
for epoch in range(epochs):

    # Forward pass: generate predictions
    y_pred = model(X)

    # Compute loss (nn.MSELoss takes the prediction first, then the target)
    l = criterion(y_pred, Y)

    # Backward pass: compute gradients automatically
    l.backward()

    # Update weights and bias (handled by optimizer)
    optimizer.step()

    # Reset gradients before next iteration
    optimizer.zero_grad()

    # Log progress every 20 epochs
    if epoch % 20 == 0:
        [w, b] = model.parameters()
        print(f"Epoch {epoch+1:03d}: w = {w[0][0].item():.3f}, b = {b.item():.3f}, loss = {l:.8f}")

# ---------------------------------------------------
# 5. After Training: Evaluate Model
# ---------------------------------------------------
print(f"Prediction after training: f(5) = {model(X_test).item():.3f}")


Samples: 4, Features: 1
Prediction before training: f(5) = -3.002
Epoch 001: w = -0.400, b = 0.984, loss = 46.45131302
Epoch 021: w = 1.435, b = 1.503, loss = 0.41872454
Epoch 041: w = 1.512, b = 1.431, loss = 0.34410033
Epoch 061: w = 1.542, b = 1.348, loss = 0.30519095
Epoch 081: w = 1.568, b = 1.269, loss = 0.27069762
Epoch 101: w = 1.593, b = 1.195, loss = 0.24010263
Epoch 121: w = 1.617, b = 1.126, loss = 0.21296556
Epoch 141: w = 1.639, b = 1.060, loss = 0.18889576
Epoch 161: w = 1.660, b = 0.999, loss = 0.16754624
Epoch 181: w = 1.680, b = 0.940, loss = 0.14860973
Epoch 201: w = 1.699, b = 0.886, loss = 0.13181348
Epoch 221: w = 1.716, b = 0.834, loss = 0.11691565
Epoch 241: w = 1.733, b = 0.786, loss = 0.10370149
Epoch 261: w = 1.748, b = 0.740, loss = 0.09198096
Epoch 281: w = 1.763, b = 0.697, loss = 0.08158500
Epoch 301: w = 1.777, b = 0.656, loss = 0.07236403
Epoch 321: w = 1.790, b = 0.618, loss = 0.06418534
Epoch 341: w = 1.802, b = 0.582, loss = 0.05693090
Epoch 361: w =

# Example 4 (defining a custom model)

Here we define a custom linear model as an `nn.Module` subclass and run the same demo.
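
Subclassing `nn.Module` registers any layers assigned as attributes in `__init__`, so their parameters become visible to an optimizer through `model.parameters()`. A small sketch of that mechanism, assuming PyTorch is installed (the class name `TinyLinear` is hypothetical and used only for illustration):

```python
import torch
import torch.nn as nn


class TinyLinear(nn.Module):
    """Hypothetical wrapper around a single nn.Linear layer."""

    def __init__(self, input_dim: int, output_dim: int):
        super().__init__()
        self.lin = nn.Linear(input_dim, output_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.lin(x)


model = TinyLinear(1, 1)

# Layers assigned as attributes are registered automatically
param_names = [name for name, _ in model.named_parameters()]
print(param_names)  # ['lin.weight', 'lin.bias']

# Calling model(x) runs forward(x)
out = model(torch.tensor([[5.0]]))
print(out.shape)  # torch.Size([1, 1])
```

Because the wrapped layer's weight and bias are registered this way, the rest of the training code can stay the same as with a bare `nn.Linear`.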

In [40]:
"""
Linear Regression using PyTorch (Custom nn.Module)
--------------------------------------------------
This script repeats the previous demo, but defines the model as a
custom `nn.Module` subclass instead of using `nn.Linear` directly.

We're modeling the relationship:
    y = 2 * x

Goal:
    Learn the optimal weight (w) and bias (b) that best fit the data.

Key Change From Previous Version:
-----------------------------------------
- In the earlier version, the model was a bare `nn.Linear` layer.
- In this version, a `LinearRegression` class wraps that layer:
   - `__init__` defines the layers (a single `nn.Linear`)
   - `forward` defines how input flows through them
   - `nn.MSELoss` and `torch.optim.SGD` are used exactly as before
"""

import torch
import torch.nn as nn

# ---------------------------------------------------
# 1. Define Training and Test Data
# ---------------------------------------------------
X = torch.tensor([[1], [2], [3], [4]], dtype=torch.float32)
Y = torch.tensor([[2], [4], [6], [8]], dtype=torch.float32)
X_test = torch.tensor([5], dtype=torch.float32)

# Extract input/output sizes
n_samples, n_features = X.shape
print(f"Samples: {n_samples}, Features: {n_features}")

input_size = n_features
output_size = Y.shape[1]  # number of target values per sample (1 here)

# ---------------------------------------------------
# 2. Define Model
# ---------------------------------------------------
# A custom module that wraps nn.Linear, replacing the bare
# nn.Linear(input_size, output_size) model from the previous example
class LinearRegression(nn.Module):
    def __init__(self, input_dim, output_dim):
        super().__init__()
        # define our layers
        self.lin = nn.Linear(input_dim, output_dim)

    def forward(self, x):
        return self.lin(x)

model = LinearRegression(input_size, output_size)

print(f"Prediction before training: f(5) = {model(X_test).item():.3f}")

# ---------------------------------------------------
# 3. Define Loss and Optimizer
# ---------------------------------------------------
# Mean Squared Error (MSE) loss function
criterion = nn.MSELoss()

# Stochastic Gradient Descent optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# ---------------------------------------------------
# 4. Training Loop
# ---------------------------------------------------
epochs = 400
for epoch in range(epochs):

    # Forward pass: generate predictions
    y_pred = model(X)

    # Compute loss (nn.MSELoss takes the prediction first, then the target)
    l = criterion(y_pred, Y)

    # Backward pass: compute gradients automatically
    l.backward()

    # Update weights and bias (handled by optimizer)
    optimizer.step()

    # Reset gradients before next iteration
    optimizer.zero_grad()

    # Log progress every 20 epochs
    if epoch % 20 == 0:
        [w, b] = model.parameters()
        print(f"Epoch {epoch+1:03d}: w = {w[0][0].item():.3f}, b = {b.item():.3f}, loss = {l:.8f}")

# ---------------------------------------------------
# 5. After Training: Evaluate Model
# ---------------------------------------------------
print(f"Prediction after training: f(5) = {model(X_test).item():.3f}")


Samples: 4, Features: 1
Prediction before training: f(5) = 0.942
Epoch 001: w = 0.472, b = 0.038, loss = 24.80678368
Epoch 021: w = 1.805, b = 0.458, loss = 0.05376185
Epoch 041: w = 1.848, b = 0.443, loss = 0.03299318
Epoch 061: w = 1.858, b = 0.417, loss = 0.02925432
Epoch 081: w = 1.866, b = 0.393, loss = 0.02594792
Epoch 101: w = 1.874, b = 0.370, loss = 0.02301523
Epoch 121: w = 1.881, b = 0.349, loss = 0.02041401
Epoch 141: w = 1.888, b = 0.328, loss = 0.01810675
Epoch 161: w = 1.895, b = 0.309, loss = 0.01606024
Epoch 181: w = 1.901, b = 0.291, loss = 0.01424512
Epoch 201: w = 1.907, b = 0.274, loss = 0.01263509
Epoch 221: w = 1.912, b = 0.258, loss = 0.01120705
Epoch 241: w = 1.917, b = 0.243, loss = 0.00994039
Epoch 261: w = 1.922, b = 0.229, loss = 0.00881692
Epoch 281: w = 1.927, b = 0.216, loss = 0.00782038
Epoch 301: w = 1.931, b = 0.203, loss = 0.00693652
Epoch 321: w = 1.935, b = 0.191, loss = 0.00615253
Epoch 341: w = 1.939, b = 0.180, loss = 0.00545717
Epoch 361: w = 1