### Welcome to my tutorial on training a PyTorch model from scratch

In [1]:
import torch

In [2]:
# We'll create a "batch" of 10 data points
N = 10
# Each data point has 1 feature
D_in = 1
# The output for each data point is a single value
D_out = 1

# Create our input data X
# Shape: (10 rows, 1 column)
X = torch.randn(N, D_in)


In [5]:
# Create our true target labels y by applying the "true" function
# and adding some noise for realism
true_W = torch.tensor([[2.0]])
true_b = torch.tensor(1.0)
y_true = X @ true_W + true_b + torch.randn(N, D_out) * 0.1 # Add a little noise

print(f"Input Data X (first 3 rows):\n {X[:3]}\n")
print(f"True Labels y_true (first 3 rows):\n {y_true[:3]}")

Input Data X (first 3 rows):
 tensor([[ 2.0452],
        [-0.5060],
        [-1.0434]])

True Labels y_true (first 3 rows):
 tensor([[ 5.0715],
        [ 0.1660],
        [-1.0289]])
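
Because `X` and the noise both come from `torch.randn`, the numbers above will differ on every run. As a minimal sketch, assuming you want repeatable output, you can seed the random number generator first. The helper `make_data` and the seed value `0` are illustrative choices, not part of the original notebook.

```python
import torch

def make_data(seed, n=10, d_in=1, d_out=1):
    # Hypothetical helper: rebuilds the same kind of dataset as above,
    # but seeds the generator first so the draws are repeatable.
    torch.manual_seed(seed)
    X = torch.randn(n, d_in)
    true_W = torch.tensor([[2.0]])
    true_b = torch.tensor(1.0)
    y_true = X @ true_W + true_b + torch.randn(n, d_out) * 0.1
    return X, y_true

X_a, y_a = make_data(seed=0)
X_b, y_b = make_data(seed=0)

# Same seed, same data
print(torch.equal(X_a, X_b), torch.equal(y_a, y_b))  # True True
```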


## 5.3. The Parameters: The Model's "Brain"
Now we create the parameters `W` and `b` that our model will learn.
We initialize them with random values. Most importantly, we set `requires_grad=True` so that PyTorch's Autograd engine records every operation performed on them.

In [7]:
# Initialize our parameters with random values
# Shapes must be correct for matrix multiplication: X(10,1) @ W(1,1) -> (10,1)
W = torch.randn(D_in, D_out, requires_grad=True)
b = torch.randn(1, requires_grad=True)

print(f"Initial Weight W:\n {W}\n")
print(f"Initial Bias b:\n {b}")

Initial Weight W:
 tensor([[1.1082]], requires_grad=True)

Initial Bias b:
 tensor([0.9393], requires_grad=True)
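
To see what `requires_grad=True` changes, it can help to compare a parameter with a plain data tensor. The sketch below uses fresh random tensors rather than the exact values printed above. It only inspects attributes, so the result does not depend on the random draw.

```python
import torch

W = torch.randn(1, 1, requires_grad=True)
b = torch.randn(1, requires_grad=True)
X = torch.randn(10, 1)  # plain data: not tracked by Autograd

print(W.requires_grad, b.requires_grad)  # True True
print(X.requires_grad)                   # False

# Tensors we create directly are "leaf" tensors; these are the ones
# whose .grad attribute gets filled in by a backward pass.
print(W.is_leaf)                         # True

# No backward pass has run yet, so there is no gradient to show.
print(W.grad is None)                    # True
```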


## 5.4. The Implementation: From Math to Code
Now for the main event. We translate our mathematical formula `ŷ = XW + b` directly into a single line of PyTorch code.

In [9]:
# Perform the forward pass to get our first prediction
y_hat = X @ W + b

print(f"Shape of our prediction y_hat: {y_hat.shape}\n")
print(f"Prediction y_hat (first 3 rows):\n {y_hat[:3]}\n")
print(f"True Labels y_true (first 3 rows):\n {y_true[:3]}")

Shape of our prediction y_hat: torch.Size([10, 1])

Prediction y_hat (first 3 rows):
 tensor([[ 3.2057],
        [ 0.3786],
        [-0.2169]], grad_fn=<SliceBackward0>)

True Labels y_true (first 3 rows):
 tensor([[ 5.0715],
        [ 0.1660],
        [-1.0289]])
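
One detail worth spelling out: `X @ W` has shape `(10, 1)` while `b` has shape `(1,)`, so the addition relies on broadcasting. The sketch below, again with fresh random tensors, checks that the one-line forward pass matches a version where `b` is expanded to the full shape by hand.

```python
import torch

X = torch.randn(10, 1)
W = torch.randn(1, 1)
b = torch.randn(1)

# b (shape [1]) is broadcast across all 10 rows of X @ W
y_hat = X @ W + b

# The same computation with the bias expanded explicitly
y_hat_explicit = X @ W + b.expand(10, 1)

print((X @ W).shape)                          # torch.Size([10, 1])
print(torch.allclose(y_hat, y_hat_explicit))  # True
```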


### 6.1. Defining Error: The Loss Function

We need a single number that tells us how "wrong" our predictions are. This is called the **Loss**. For regression, the most common loss function is the **Mean Squared Error (MSE)**.

The formula is simple:
`L = (1/N) * Σ(ŷ_i - y_i)²`

In plain English: "For every data point, find the difference between the prediction and the truth, square it, and then take the average of all these squared differences."

Let's translate this directly into PyTorch code, using the `y_hat` from Part 5.

In [10]:
# y_hat is our prediction from the forward pass
# y_true is the ground truth
# Let's calculate the loss manually
error = y_hat - y_true
squared_error = error ** 2
loss = squared_error.mean()

print(f"Prediction (first 3):\n {y_hat[:3]}\n")
print(f"Truth (first 3):\n {y_true[:3]}\n")
print(f"Loss (a single number): {loss}")

Prediction (first 3):
 tensor([[ 3.2057],
        [ 0.3786],
        [-0.2169]], grad_fn=<SliceBackward0>)

Truth (first 3):
 tensor([[ 5.0715],
        [ 0.1660],
        [-1.0289]])

Loss (a single number): 1.1553044319152832
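
As a sanity check, the manual calculation can be compared with PyTorch's built-in `torch.nn.functional.mse_loss`, which averages the squared differences by default. The small tensors below are made-up values for illustration, not the tutorial's data.

```python
import torch
import torch.nn.functional as F

# Illustrative values only
y_hat = torch.tensor([[3.0], [0.5], [-0.2]])
y_true = torch.tensor([[5.0], [0.0], [-1.0]])

# Manual MSE: difference, square, mean
manual_loss = ((y_hat - y_true) ** 2).mean()

# Built-in MSE (default reduction is "mean")
builtin_loss = F.mse_loss(y_hat, y_true)

print(torch.allclose(manual_loss, builtin_loss))  # True
```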


### 6.2. The Magic Command: `loss.backward()`

This is where the magic of Autograd happens. With a single command, we tell PyTorch to send a signal backward from the `loss` through the entire computation graph it built during the forward pass.

This command calculates the gradient of the `loss` with respect to every parameter that has `requires_grad=True` and took part in computing the loss. In our case, it will compute:
*   `∂L/∂W` (the gradient of the Loss with respect to our Weight `W`)
*   `∂L/∂b` (the gradient of the Loss with respect to our Bias `b`)


In [11]:
loss.backward()  

PyTorch has now populated the `.grad` attribute for our `W` and `b` tensors.


### 6.3. Inspecting the Result: The `.grad` Attribute

The `.grad` attribute now holds the gradient for each parameter. This is the "signal" that tells us how to adjust our knobs.


In [12]:
# The gradients are now stored in the .grad attribute of our parameters
print(f"Gradient for W (∂L/∂W):\n {W.grad}\n")
print(f"Gradient for b (∂L/∂b):\n {b.grad}")


Gradient for W (∂L/∂W):
 tensor([[-2.6305]])

Gradient for b (∂L/∂b):
 tensor([0.0684])
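
For this model the gradients can also be written out by hand: differentiating `L = (1/N) * Σ(ŷ_i - y_i)²` with `ŷ = XW + b` gives `∂L/∂W = (2/N) * Xᵀ(ŷ - y)` and `∂L/∂b = (2/N) * Σ(ŷ_i - y_i)`. The sketch below checks these formulas against Autograd on a freshly generated dataset. The seed and the noise-free targets are assumptions made to keep the example short.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only for repeatability
N = 10
X = torch.randn(N, 1)
y_true = 2.0 * X + 1.0  # noise-free targets, for illustration only

W = torch.randn(1, 1, requires_grad=True)
b = torch.randn(1, requires_grad=True)

# Forward pass, loss, and backward pass as in the tutorial
y_hat = X @ W + b
loss = ((y_hat - y_true) ** 2).mean()
loss.backward()

# Hand-derived gradients of the MSE loss
error = (y_hat - y_true).detach()
manual_W_grad = (2.0 / N) * X.T @ error            # shape (1, 1)
manual_b_grad = (2.0 / N) * error.sum().reshape(1)  # shape (1,)

print(torch.allclose(W.grad, manual_W_grad))
print(torch.allclose(b.grad, manual_b_grad))
```

If both checks pass, Autograd and the hand derivation agree, so `loss.backward()` is doing the calculus for us and nothing more mysterious.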


#### **How to Interpret These Gradients:**

*   **`W.grad` is -2.6305:** The negative sign is key. It means that if we were to *increase* `W`, the loss would *decrease*. The gradient points in the direction of the steepest *increase* in loss, so we'll want to move in the opposite direction.
*   **`b.grad` is 0.0684:** This one is positive, so increasing `b` would slightly *increase* the loss, and the opposite direction here means nudging `b` down. Its small magnitude also tells us the loss is currently far less sensitive to `b` than to `W`.
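
The "move in the opposite direction" idea can be checked numerically. The sketch below uses a tiny hand-made dataset (not the tutorial's data) and an arbitrary step size of `0.01`. It nudges each parameter a small amount against its gradient and compares the loss before and after.

```python
import torch

# Tiny hand-made dataset following y = 2x + 1
X = torch.tensor([[1.0], [2.0], [3.0]])
y_true = 2.0 * X + 1.0

# Deliberately wrong starting parameters
W = torch.tensor([[1.0]], requires_grad=True)
b = torch.tensor([0.0], requires_grad=True)

def mse(W, b):
    # Mean squared error of the linear model on the dataset above
    return ((X @ W + b - y_true) ** 2).mean()

loss_before = mse(W, b)
loss_before.backward()

step = 0.01  # arbitrary small step size, chosen for illustration
with torch.no_grad():
    # Move each parameter against the sign of its gradient
    W_new = W - step * W.grad
    b_new = b - step * b.grad
    loss_after = mse(W_new, b_new)

print(f"W.grad: {W.grad.item():.4f}, b.grad: {b.grad.item():.4f}")
print(f"loss before: {loss_before.item():.4f}, after: {loss_after.item():.4f}")
```

Here both gradients come out negative, so both parameters are nudged upward and the loss drops. A positive gradient would call for a nudge downward by the same rule.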

We now have everything we need to improve our model:
1.  A way to measure error (the loss).
2.  The exact direction to turn our parameter "knobs" to reduce that error (the gradients).

We have completed the analysis. The final step is to actually *act* on this information—to update our weights and biases.

This leads us to the heart of the training process. Let's move on to **Part 7: The Training Loop - Gradient Descent in Action**.
