## Backprop and gradient descent

<img src="pictures/chain rule.png" width="50%">

#### The steps:

<img src="pictures/steps.png" width="50%">

In [2]:
import torch
import torch.nn as nn
import numpy as np

Consider a simple neuron with one input and no bias, so the prediction is `y_pred = w*x`. We use the squared error `(y_pred - y_true)**2` as the loss.

In [None]:
x = torch.tensor(1.0)
w = torch.tensor(1.0, requires_grad=True) # since we need to update w in backprop
y_true = torch.tensor(2.0)

#linear regression
y_pred = w*x
loss = (y_pred - y_true)**2

In [8]:
loss

tensor(1., grad_fn=<PowBackward0>)

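A single call to `.backward()` on the loss applies the chain rule through the computation graph and stores the derivative of the loss with respect to `w` in `w.grad`. Gradients are kept only for leaf tensors with `requires_grad=True`, _not_ for intermediate variables such as `y_pred`, as seen in earlier notebooks.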

In [None]:
loss.backward()

In [11]:
w.grad

tensor(-2.)
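
To see where -2 comes from, the chain rule can be applied by hand. The sketch below repeats the forward pass in plain Python and multiplies the two local gradients; the names `dloss_dy` and `dy_dw` are illustrative and not part of the notebook.

```python
# Forward pass for the single neuron above (plain Python floats, no autograd)
x, w, y_true = 1.0, 1.0, 2.0
y_pred = w * x                   # 1.0
loss = (y_pred - y_true) ** 2    # 1.0

# Local gradients
dloss_dy = 2 * (y_pred - y_true)  # d(loss)/d(y_pred) = -2.0
dy_dw = x                         # d(y_pred)/d(w)    =  1.0

# Chain rule: d(loss)/d(w) = d(loss)/d(y_pred) * d(y_pred)/d(w)
dloss_dw = dloss_dy * dy_dw
print(dloss_dw)  # -2.0, matching w.grad
```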

In [13]:
# update weights with learning rate 0.1; no_grad keeps the update out of the autograd graph
with torch.no_grad():
    w -= 0.1*w.grad

In [None]:
y_pred2 = w*x
loss2 = (y_pred2 - y_true)**2

loss2

tensor(0.6400, grad_fn=<PowBackward0>)

The loss has decreased from 1.0 to 0.64 after one gradient descent step.

In [15]:
loss2.backward()

In [16]:
w.grad

tensor(-3.6000)

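Note that `w.grad` is now -3.6 rather than -1.6: PyTorch accumulates gradients across `.backward()` calls, so the new gradient 2*(1.2 - 2)*1 = -1.6 was added to the earlier -2. Reset the gradient (for example with `w.grad.zero_()`) before each backward pass, then repeat the update step as many times as needed.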
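
The value -3.6 above is the sum of two backward passes, because PyTorch adds each new gradient to whatever is already stored in `.grad`. Here is a minimal sketch of the same two steps with the gradient reset in between, assuming the same `x`, `w`, and `y_true` as above and a learning rate of 0.1. The list `grads` is only there for inspection.

```python
import torch

x = torch.tensor(1.0)
y_true = torch.tensor(2.0)
w = torch.tensor(1.0, requires_grad=True)

grads = []
for step in range(2):
    loss = (w * x - y_true) ** 2   # forward pass
    loss.backward()                # adds d(loss)/dw to w.grad
    grads.append(w.grad.item())
    with torch.no_grad():
        w -= 0.1 * w.grad          # gradient descent update
    w.grad.zero_()                 # reset so the next backward() starts from zero

print(grads)  # roughly [-2.0, -1.6]
```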


#### Manual linear regression 

PyTorch can automate all of these steps with the `torch.nn` and `torch.optim` modules, but we will first do a fully manual run on a simple one-dimensional problem.

In [38]:
# Linear regression using mean squared error and gradient descent in single dimension

# define dataset
X = np.array([-1,1,2,3,4], dtype=np.float32)
# simple function: y = 2*x 
y = np.array([-2,2,4,6,8], dtype=np.float32)

# w = np.random.rand(1)
w = np.zeros(1)

def predict(w,x):
    return w*x

def loss(y_pred, y_true):
    return ((y_pred-y_true)**2).mean()

def gradient(x, y_true, y_pred):
    return np.mean(2*x*(y_pred - y_true))

# let's run gradient descent:

print("Before training f(5) = ", (w*5)[0])

n_iters = 40
learning_rate = 0.01

for i in range(n_iters):

    # calculate prediction 
    y_pred = predict(w,X)

    # loss
    mse_loss = loss(y_pred, y)

    # calculate grad
    grad = gradient(X, y, y_pred)

    # update w
    w -= learning_rate*grad

    if (i % 4 ==0) :
        print(f"epoch {i}: w = {w} | loss = {mse_loss}")


print("After training f(5) = ", (w*5)[0])




Before training f(5) =  0.0
epoch 0: w = [0.248] | loss = 24.8
epoch 4: w = [0.9683069] | loss = 8.599724336109533
epoch 8: w = [1.39247109] | loss = 2.982066881333652
epoch 12: w = [1.64224692] | loss = 1.0340706907786803
epoch 16: w = [1.7893314] | loss = 0.35857753567528283
epoch 20: w = [1.87594444] | loss = 0.12434144999713344
epoch 24: w = [1.92694791] | loss = 0.04311702393256027
epoch 28: w = [1.95698211] | loss = 0.014951391936026497
epoch 32: w = [1.97466823] | loss = 0.005184590689151571
epoch 36: w = [1.98508299] | loss = 0.001797824625897734
After training f(5) =  9.949862318549542


- True value: f(5) = 2*5 = 10
- Predicted after 40 iterations: 9.95

We can see the loss decrease. 
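
The `gradient` function above follows from differentiating the mean squared error with respect to `w`. Writing $N$ for the number of samples (a symbol introduced here for illustration):

$$
L(w) = \frac{1}{N}\sum_{i=1}^{N} (w x_i - y_i)^2
\quad\Rightarrow\quad
\frac{\partial L}{\partial w} = \frac{1}{N}\sum_{i=1}^{N} 2\, x_i\, (w x_i - y_i)
$$

For the first iteration, $w = 0$ gives $L = \frac{1}{5}(4 + 4 + 16 + 36 + 64) = 24.8$ and $\frac{\partial L}{\partial w} = -\frac{2}{5}(2 + 2 + 8 + 18 + 32) = -24.8$, so the update is $w \leftarrow 0 - 0.01 \cdot (-24.8) = 0.248$, which matches the first line of the output.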

Now let's automate parts of the NumPy code above with PyTorch.

<img src = 'pictures/pipeline.png' width = '30%'>


### Training pipeline in PyTorch:

<ol>
    <li> Design model (input size, output size, forward pass)
    <li> Construct loss (mean squared error, cross-entropy, etc.) and optimizer (GD, SGD, Newton, etc.)
    <li> Training loop:
    <ul>
        <li> Forward pass: computation of y_pred
        <li> Backward pass: compute gradients for the leaf nodes (the parameters)
        <li> Update weights and biases
    </ul>
</ol>

In [3]:
# 0) Training samples, watch the shape!
X = torch.tensor([[1], [2], [3], [4]], dtype=torch.float32)
Y = torch.tensor([[2], [4], [6], [8]], dtype=torch.float32)

n_samples, n_features = X.shape
print(f'#samples: {n_samples}, #features: {n_features}')
# 0) create a test sample
X_test = torch.tensor([5], dtype=torch.float32)

# 1) Design Model, the model has to implement the forward pass!
# Here we can use a built-in model from PyTorch
input_size = n_features
output_size = Y.shape[1]  # one target value per sample (equals n_features here only by coincidence)

# we can call this model with samples X
model = nn.Linear(input_size, output_size)

'''
class LinearRegression(nn.Module):
    def __init__(self, input_dim, output_dim):
        super(LinearRegression, self).__init__()
        # define different layers
        self.lin = nn.Linear(input_dim, output_dim)

    def forward(self, x):
        return self.lin(x)

model = LinearRegression(input_size, output_size)
'''

print(f'Prediction before training: f(5) = {model(X_test).item():.3f}')

# 2) Define loss and optimizer
learning_rate = 0.01
n_iters = 100

loss = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

# 3) Training loop
for epoch in range(n_iters):
    # predict = forward pass with our model
    y_predicted = model(X)

    # loss: nn.MSELoss expects (prediction, target)
    l = loss(y_predicted, Y)

    # calculate gradients = backward pass
    l.backward()

    # update weights
    optimizer.step()

    # zero the gradients after updating
    optimizer.zero_grad()

    if epoch % 10 == 0:
        [w, b] = model.parameters() # unpack parameters
        print('epoch ', epoch+1, ': w = ', w[0][0].item(), ' loss = ', l)

print(f'Prediction after training: f(5) = {model(X_test).item():.3f}')

#samples: 4, #features: 1
Prediction before training: f(5) = -3.499
epoch  1 : w =  -0.2859164774417877  loss =  tensor(54.8922, grad_fn=<MseLossBackward0>)
epoch  11 : w =  1.417089819908142  loss =  tensor(1.5142, grad_fn=<MseLossBackward0>)
epoch  21 : w =  1.6973741054534912  loss =  tensor(0.1277, grad_fn=<MseLossBackward0>)
epoch  31 : w =  1.7486273050308228  loss =  tensor(0.0867, grad_fn=<MseLossBackward0>)
epoch  41 : w =  1.7628586292266846  loss =  tensor(0.0808, grad_fn=<MseLossBackward0>)
epoch  51 : w =  1.7709583044052124  loss =  tensor(0.0760, grad_fn=<MseLossBackward0>)
epoch  61 : w =  1.7778998613357544  loss =  tensor(0.0716, grad_fn=<MseLossBackward0>)
epoch  71 : w =  1.7844887971878052  loss =  tensor(0.0675, grad_fn=<MseLossBackward0>)
epoch  81 : w =  1.790859341621399  loss =  tensor(0.0635, grad_fn=<MseLossBackward0>)
epoch  91 : w =  1.7970378398895264  loss =  tensor(0.0598, grad_fn=<MseLossBackward0>)
Prediction after training: f(5) = 9.593


#### Explanation:

We used the built-in `nn.Linear` layer from the `torch.nn` module as our linear regression model. It is equivalent to a neural network with no hidden layer: a single neuron with one weight and one bias.

The loss is `nn.MSELoss()`, also from the `torch.nn` module.

Finally, the SGD optimizer comes from the `torch.optim` module.
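
For plain SGD without momentum or weight decay, `optimizer.step()` applies the same update we wrote by hand earlier: each parameter moves by `-lr * grad`. The sketch below checks this on two identically initialised `nn.Linear` layers; `model_a`, `model_b`, and the seed are illustrative choices, not part of the notebook.

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.tensor([[1.0], [2.0], [3.0], [4.0]])
Y = torch.tensor([[2.0], [4.0], [6.0], [8.0]])
lr = 0.01

model_a = nn.Linear(1, 1)
model_b = copy.deepcopy(model_a)   # same initial weight and bias
loss_fn = nn.MSELoss()

# (a) update through the optimizer
optimizer = torch.optim.SGD(model_a.parameters(), lr=lr)
loss_fn(model_a(X), Y).backward()
optimizer.step()
optimizer.zero_grad()

# (b) the same update written out manually
loss_fn(model_b(X), Y).backward()
with torch.no_grad():
    for p in model_b.parameters():
        p -= lr * p.grad
        p.grad.zero_()

# compare the two models parameter by parameter
same = all(torch.allclose(pa, pb)
           for pa, pb in zip(model_a.parameters(), model_b.parameters()))
print(same)
```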

Compared with our earlier NumPy version, the backprop step (`.backward()` on the loss) and the weight update (`optimizer.step()`) are now automated. <br>
The weights and biases are returned by `model.parameters()`. The weight of `nn.Linear` is a 2-D tensor of shape `(output_size, input_size)`, here `(1, 1)`, so the single weight is accessed as `w[0][0]`.
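
A quick way to inspect what `model.parameters()` yields, sketched on a fresh `nn.Linear(1, 1)` (the variable names are illustrative):

```python
import torch
import torch.nn as nn

model = nn.Linear(1, 1)

# parameters() is an iterator over the layer's tensors: weight first, then bias
w, b = model.parameters()
print(w.shape)   # torch.Size([1, 1]) -> (out_features, in_features)
print(b.shape)   # torch.Size([1])

# w[0][0] picks the single scalar weight; .item() converts it to a Python float
print(type(w[0][0].item()))

# named_parameters() gives the same tensors together with their names
names = [name for name, _ in model.named_parameters()]
print(names)     # ['weight', 'bias']
```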