# Compute every step manually

In [1]:
import numpy as np

In [2]:
# Linear regression with a single weight and no bias: f = w * x
# The training data below follows f = 2 * x, so the target weight is 2
X = np.array([1, 2, 3, 4], dtype=np.float32)
Y = np.array([2, 4, 6, 8], dtype=np.float32)

In [3]:
w = 0.0  # initial weight

In [4]:
# model output
def forward(x):
    return w * x

In [5]:
# loss = MSE
def loss(y, y_pred):
    return ((y_pred - y)**2).mean()
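
As a quick sanity check of the MSE definition, the sketch below evaluates the loss at the initial weight on the toy data. It redefines the data and `loss` locally so that it runs on its own; `w0` and `loss_at_w0` are illustrative names that do not appear in the notebook.

```python
import numpy as np

X = np.array([1, 2, 3, 4], dtype=np.float32)
Y = np.array([2, 4, 6, 8], dtype=np.float32)

def loss(y, y_pred):
    # mean squared error over all samples
    return ((y_pred - y) ** 2).mean()

w0 = 0.0  # illustrative name for the initial weight
loss_at_w0 = loss(Y, w0 * X)

# With w = 0 every prediction is 0, so the squared errors are 4, 16, 36, 64
print(loss_at_w0)  # mean of those four values: 30.0
```

The value 30 is simply the mean of the squared targets, since the model predicts 0 everywhere at this point.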

To compute $\frac{dJ}{dw}$, we differentiate the loss $ J $ with respect to $ w $. The loss is the mean squared error over the $ N $ samples:

$$ J = \frac{1}{N} \sum_{i=1}^{N} (w x_i - y_i)^2 $$

Here, $ J $ depends on $ w $, $ x_i $, and $ y_i $, but the data $ x_i $ and $ y_i $ are treated as constants when differentiating with respect to $ w $. The steps are:

1. **Apply the chain rule:**

Let $ u_i = w x_i - y_i $ be the error on sample $ i $. Then:

$$ J = \frac{1}{N} \sum_{i=1}^{N} u_i^2 $$

$$ \frac{dJ}{dw} = \sum_{i=1}^{N} \frac{\partial J}{\partial u_i} \cdot \frac{du_i}{dw} $$

2. **Differentiate $ J $ with respect to $ u_i $:**

$$ \frac{\partial J}{\partial u_i} = \frac{1}{N} \cdot 2u_i = \frac{2}{N} (w x_i - y_i) $$

3. **Differentiate $ u_i $ with respect to $ w $:**

$$ \frac{du_i}{dw} = x_i $$

4. **Combine the results using the chain rule:**

$$ \frac{dJ}{dw} = \frac{2}{N} \sum_{i=1}^{N} (w x_i - y_i) \cdot x_i $$

So, the derivative of $ J $ with respect to $ w $ is:

$$ \frac{dJ}{dw} = \frac{2}{N} \sum_{i=1}^{N} x_i (w x_i - y_i) $$

This is the gradient of the loss function $ J $ with respect to the weight $ w $. Equivalently, it is the mean of $ 2 x_i (w x_i - y_i) $ over the samples.
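
As a worked check, take the toy data used in this notebook, $ x = (1, 2, 3, 4) $ and $ y = (2, 4, 6, 8) $ with $ N = 4 $, and average the per-sample gradient terms at the initial weight $ w = 0 $:

$$ \left. \frac{dJ}{dw} \right|_{w=0} = \frac{2}{4} \sum_{i=1}^{4} x_i (0 \cdot x_i - y_i) = \frac{1}{2} (-2 - 8 - 18 - 32) = -30 $$

The negative sign means the loss decreases as $ w $ increases, so a gradient descent step moves $ w $ upward, toward the target value 2.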

In [6]:
# J = MSE = 1/N * sum((w*x - y)**2)
# dJ/dw = 1/N * sum(2*x*(w*x - y))
def gradient(x, y, y_pred):
    return np.mean(2*x*(y_pred - y))
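
One way to gain confidence in a hand-derived gradient is to compare it with a central finite difference of the loss. The sketch below does this for the same toy data; `loss_at`, `w_test`, and `eps` are hypothetical helpers introduced only for this check, and `float64` is used here (an assumption, the notebook uses `float32`) to keep rounding error small.

```python
import numpy as np

# Same toy data as the notebook, but in float64 to limit rounding error
X = np.array([1, 2, 3, 4], dtype=np.float64)
Y = np.array([2, 4, 6, 8], dtype=np.float64)

def loss_at(w):
    # MSE of the model f = w * x at a given weight (hypothetical helper)
    return np.mean((w * X - Y) ** 2)

def gradient(x, y, y_pred):
    # analytic gradient, as defined in the notebook
    return np.mean(2 * x * (y_pred - y))

w_test = 0.5  # arbitrary weight at which to compare the two gradients
eps = 1e-6    # finite-difference step size

# Central difference: (J(w + eps) - J(w - eps)) / (2 * eps)
numeric = (loss_at(w_test + eps) - loss_at(w_test - eps)) / (2 * eps)
analytic = gradient(X, Y, w_test * X)

print(numeric, analytic)  # the two values should agree closely
```

Because the loss is quadratic in `w`, the central difference is exact up to floating-point rounding, so any large mismatch would point to an error in the derivation or the code.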

In [7]:
print(f'Prediction before training: f(5) = {forward(5):.3f}')

Prediction before training: f(5) = 0.000


In [10]:
# Training
# Note: w is a global, so re-running this cell continues from the current w, not from 0.0
learning_rate = 0.01
n_iters = 30

for epoch in range(n_iters):
    # predict = forward pass
    y_pred = forward(X)

    # loss
    l = loss(Y, y_pred)
    
    # calculate gradients
    dw = gradient(X, Y, y_pred)

    # update weights
    w -= learning_rate * dw

    # epoch % 1 is always 0, so this logs every epoch; raise the modulus to log less often
    if epoch % 1 == 0:
        print(f'epoch {epoch+1}: w = {w:.3f}, loss = {l:.8f}')

epoch 1: w = 1.934, loss = 0.04506905
epoch 2: w = 1.944, loss = 0.03256244
epoch 3: w = 1.952, loss = 0.02352631
epoch 4: w = 1.960, loss = 0.01699772
epoch 5: w = 1.966, loss = 0.01228092
epoch 6: w = 1.971, loss = 0.00887291
epoch 7: w = 1.975, loss = 0.00641072
epoch 8: w = 1.979, loss = 0.00463173
epoch 9: w = 1.982, loss = 0.00334642
epoch 10: w = 1.985, loss = 0.00241778
epoch 11: w = 1.987, loss = 0.00174685
epoch 12: w = 1.989, loss = 0.00126211
epoch 13: w = 1.991, loss = 0.00091188
epoch 14: w = 1.992, loss = 0.00065882
epoch 15: w = 1.993, loss = 0.00047601
epoch 16: w = 1.994, loss = 0.00034391
epoch 17: w = 1.995, loss = 0.00024848
epoch 18: w = 1.996, loss = 0.00017952
epoch 19: w = 1.996, loss = 0.00012971
epoch 20: w = 1.997, loss = 0.00009371
epoch 21: w = 1.997, loss = 0.00006770
epoch 22: w = 1.998, loss = 0.00004892
epoch 23: w = 1.998, loss = 0.00003534
epoch 24: w = 1.998, loss = 0.00002554
epoch 25: w = 1.999, loss = 0.00001845
epoch 26: w = 1.999, loss = 0.0000

In [9]:
print(f'Prediction after training: f(5) = {forward(5):.3f}')

Prediction after training: f(5) = 9.612
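
For this particular dataset the update rule has a simple closed form, which can help interpret the printed values. Since $ y = 2x $, the gradient reduces to $ 2 \, \overline{x^2} \, (w - 2) $ with $ \overline{x^2} = 7.5 $, so each step multiplies the distance $ w - 2 $ by $ 1 - 0.01 \cdot 15 = 0.85 $. The sketch below compares an explicit loop with that closed form. The names `n_steps`, `w_loop`, `w_closed`, and `factor` are illustrative, and starting from `w = 0` with 20 steps is an assumption chosen for the example, not something the notebook states.

```python
import numpy as np

X = np.array([1, 2, 3, 4], dtype=np.float64)
Y = 2 * X
learning_rate = 0.01
n_steps = 20  # assumed step count, chosen for illustration

# Explicit gradient descent loop starting from w = 0
w_loop = 0.0
for _ in range(n_steps):
    dw = np.mean(2 * X * (w_loop * X - Y))
    w_loop -= learning_rate * dw

# Closed form: (w - 2) shrinks by a constant factor on every step
factor = 1 - learning_rate * 2 * np.mean(X ** 2)  # 1 - 0.01 * 15 = 0.85
w_closed = 2 - 2 * factor ** n_steps

print(f'{w_loop:.3f} {w_closed:.3f} f(5) = {5 * w_closed:.3f}')
# prints: 1.922 1.922 f(5) = 9.612
```

Under these assumptions the result agrees with the prediction printed above, and one further step from that weight gives `w` of about 1.934, the first value in the training log. This is consistent with the log having been produced by re-running the training cell without resetting `w`.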
