# Error Backpropagation

In this homework, our goal is to compare two approaches to implementing backpropagation in neural networks. The neural network we consider is depicted below:

![](files/net.svg.png)

## Exercise 1: Implementing backpropagation (20 P)

The following code loads the data and current parameters, applies the neural network forward pass, and computes the error. Pre-activations and activations at each layer are kept in separate variables (`Z1`, `A1`, `Z2`, and so on) so that they can be reused for the backward pass.

In [10]:
import numpy,utils

# 1. Get the data and parameters

X,T = utils.getdata()
W,B = utils.getparams()

# 2. Run the forward pass
Z1 = X.dot(W[0])+B[0]
A1 = numpy.maximum(0,Z1)
Z2 = A1.dot(W[1])+B[1]
A2 = numpy.maximum(0,Z2)
Z3 = A2.dot(W[2])+B[2]
A3 = numpy.maximum(0,Z3)
Y  = A3.dot(W[3])+B[3]

# 3. Compute the error

err = ((Y-T)**2).mean()

Here, you are asked to implement the backward pass, and obtain the gradient with respect to the weight and bias parameters.

**Task:**

 * **Write code that computes the gradient (and format it in the same way as the parameters themselves, i.e. as lists of arrays).**
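
The backward pass follows from the chain rule applied layer by layer. As a sketch, assuming a single output unit (so that the mean in the error $E$ runs over the $N$ samples only), and writing $\odot$ for the elementwise product, $[Z_l > 0]$ for the 0/1 mask of positive pre-activations, and $\mathbf{1}$ for a column of $N$ ones, the quantities to compute are roughly the following. The symbols $W_1, \dots, W_4$ and $B_1, \dots, B_4$ stand for `W[0]`, ..., `W[3]` and `B[0]`, ..., `B[3]`.

$$
\begin{aligned}
\delta_{Z_4} &:= \frac{\partial E}{\partial Y} = \frac{2}{N}\,(Y - T), \\
\delta_{Z_l} &= \left(\delta_{Z_{l+1}} W_{l+1}^\top\right) \odot [Z_l > 0], \qquad l = 3, 2, 1, \\
\frac{\partial E}{\partial W_l} &= A_{l-1}^\top\, \delta_{Z_l}, \qquad
\frac{\partial E}{\partial B_l} = \mathbf{1}^\top \delta_{Z_l}, \qquad l = 4, \dots, 1, \quad A_0 := X.
\end{aligned}
$$

The factor $1/N$ can equally be applied at the end, when forming the weight and bias gradients, since every step of the recursion is linear in $\delta$.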

In [2]:
#pip install torch

In [27]:
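def exercise1(W,B,X,Z1,A1,Z2,A2,Z3,A3,Y,T):
    # Manual backward pass for err = ((Y-T)**2).mean().
    # Note: dividing by len(X) matches .mean() only when Y has a single output column.

    W1,W2,W3,W4 = W

    # Gradient of the squared error w.r.t. the output (the 1/N factor is applied below)
    DY = 2*(Y - T)

    # Output layer (linear)
    DB4 = DY.mean(axis = 0)
    DW4 = A3.T.dot(DY) / len(X)

    # Layer 3: propagate through W4, then through the ReLU (derivative is 1 where Z3 > 0)
    DZ3 = DY.dot(W4.T)*(Z3 > 0)
    DB3 = DZ3.mean(axis = 0)
    DW3 = A2.T.dot(DZ3) / len(X)

    # Layer 2
    DZ2 = DZ3.dot(W3.T) * (Z2 > 0)
    DB2 = DZ2.mean(axis = 0)
    DW2 = A1.T.dot(DZ2) / len(X)

    # Layer 1
    DZ1 = DZ2.dot(W2.T)*(Z1 > 0)
    DB1 = DZ1.mean(axis = 0)
    DW1 = X.T.dot(DZ1) / len(X)

    # Same format as the parameters: one list of weight gradients, one of bias gradients
    return [DW1, DW2, DW3, DW4], [DB1, DB2, DB3, DB4]
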
DW,DB = exercise1(W,B,X,Z1,A1,Z2,A2,Z3,A3,Y,T)


To test the implementation, we print the gradient with respect to the first weight of the first layer, `DW[0][0,0]`.

In [28]:
print(numpy.linalg.norm(DW[0][0,0]))

1.542282152339245
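
One way to gain confidence in a hand-written gradient, independently of any library, is to compare it with a central finite-difference estimate. The sketch below is only illustrative: the data and parameters are random toy arrays with assumed layer widths (the real ones come from `utils`, which is not reproduced here), and the helper names `forward`, `error` and `numerical_gradient` are hypothetical. It checks one weight of the output layer, where the analytic gradient is a one-liner.

```python
import numpy

def forward(W, B, X):
    """ReLU hidden layers followed by a linear output layer.
    Returns the last hidden activation and the output."""
    A = X
    for w, b in zip(W[:-1], B[:-1]):
        A = numpy.maximum(0, A.dot(w) + b)
    return A, A.dot(W[-1]) + B[-1]

def error(W, B, X, T):
    """Mean squared error, as in the notebook."""
    return ((forward(W, B, X)[1] - T) ** 2).mean()

def numerical_gradient(W, B, X, T, layer, i, j, eps=1e-6):
    """Central-difference estimate of d err / d W[layer][i, j]."""
    original = W[layer][i, j]
    W[layer][i, j] = original + eps
    err_plus = error(W, B, X, T)
    W[layer][i, j] = original - eps
    err_minus = error(W, B, X, T)
    W[layer][i, j] = original  # restore the parameter
    return (err_plus - err_minus) / (2 * eps)

# Toy problem with assumed (hypothetical) layer widths
rng = numpy.random.default_rng(0)
sizes = [4, 5, 5, 5, 1]
X = rng.normal(size=(20, sizes[0]))
T = rng.normal(size=(20, sizes[-1]))
W = [rng.normal(size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
B = [rng.normal(size=n) for n in sizes[1:]]

# Analytic gradient for the output-layer weights: A3^T (2 (Y - T)) / (number of outputs)
A3, Y = forward(W, B, X)
analytic = (A3.T.dot(2 * (Y - T)) / Y.size)[0, 0]
numeric = numerical_gradient(W, B, X, T, layer=len(W) - 1, i=0, j=0)

print(abs(analytic - numeric))  # should be tiny compared with the gradient itself
```

With the notebook's own variables, the same idea would apply to the first layer: call `numerical_gradient(W, B, X, T, layer=0, i=0, j=0)` and compare the result with `DW[0][0,0]`.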


## Exercise 2: Using Automatic Differentiation (10 P)

Because manual computation of gradients can be tedious and error-prone, it is now more common to use libraries that perform automatic differentiation. In this exercise, we use the PyTorch library: you are asked to compute the error of the neural network within that framework, so that it can then be differentiated automatically.
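
As a minimal illustration of what automatic differentiation does (a toy scalar example, not part of the exercise), PyTorch records the operations applied to a tensor created with `requires_grad=True` and fills its `.grad` attribute when `backward()` is called on the result:

```python
import torch

# A scalar parameter that autograd should track.
w = torch.tensor(3.0, requires_grad=True)

# Build the error from differentiable tensor operations.
err = (2.0 * w - 1.0) ** 2

# Backpropagate: d err / d w = 2 * (2w - 1) * 2 = 20 at w = 3.
err.backward()
print(w.grad)  # tensor(20.)
```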

In [53]:
import torch
import torch.nn as nn

# 1. Get the data and parameters

X,T = utils.getdata()
W,B = utils.getparams()

# 2. Convert to PyTorch objects

X = torch.Tensor(X)
T = torch.Tensor(T)
W = [nn.Parameter(torch.Tensor(w)) for w in W]
B = [nn.Parameter(torch.Tensor(b)) for b in B]

**Task:**

 * **Write code that computes the forward pass and the error in a way that can be differentiated automatically by PyTorch.**

In [54]:
def exercise2(W,B,X,T):
    
    W1,W2,W3,W4 = W
    B1,B2,B3,B4 = B
    

    Z1 = torch.mm(X, W1) + B1  # matrix multiplication
    A1 = torch.clamp(Z1, min = 0) # rectified linear unit
    
    Z2 = torch.mm(A1, W2) + B2
    A2 = torch.clamp(Z2, min = 0)
    
    Z3 = torch.mm(A2, W3) + B3
    A3 = torch.clamp(Z3, min =0)
    
    Y = torch.mm(A3, W4) + B4
    
    return ((Y - T) ** 2).mean()
    
    
    
    
err = exercise2(W,B,X,T)


Now that the error has been computed, we can apply automatic differentiation to get its gradient with respect to the parameters. As in the first exercise, we print the gradient of the first weight parameter of the first layer.

In [58]:
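# retain_graph=True keeps the graph, so this cell can be re-run on the same err
err.backward(retain_graph=True)

print(numpy.linalg.norm(W[0].grad[0,0]))

# Reset the gradients of all parameters: backward() accumulates into .grad,
# so re-running the cell would otherwise add to the previous values
for p in W+B:
    p.grad = None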

1.5422822


Here, we can verify that the gradient values obtained by manual and automatic differentiation agree up to single-precision rounding (1.542282152339245 vs. 1.5422822).
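
Comparing a single entry by eye leaves the other parameters unchecked. A possible way to compare every weight and bias gradient at once is sketched below. It assumes random toy data with hypothetical layer widths rather than the `utils` data, writes the manual recursion of Exercise 1 as a loop, and uses double precision on both sides so that `numpy.allclose` can be applied with its default tolerances.

```python
import numpy
import torch

# Toy problem with assumed (hypothetical) layer widths
rng = numpy.random.default_rng(0)
sizes = [4, 5, 5, 5, 1]
X = rng.normal(size=(20, sizes[0]))
T = rng.normal(size=(20, sizes[-1]))
W = [rng.normal(size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
B = [rng.normal(size=n) for n in sizes[1:]]

# Manual differentiation: forward pass with cached activations A and pre-activations Z
A, Z = [X], []
for w, b in zip(W[:-1], B[:-1]):
    Z.append(A[-1].dot(w) + b)
    A.append(numpy.maximum(0, Z[-1]))
Y = A[-1].dot(W[-1]) + B[-1]

# Backward pass: D holds the gradient w.r.t. the current layer's pre-activation
D = 2 * (Y - T) / Y.size
DW, DB = [None] * len(W), [None] * len(B)
for k in reversed(range(len(W))):
    DW[k] = A[k].T.dot(D)
    DB[k] = D.sum(axis=0)
    if k > 0:
        D = D.dot(W[k].T) * (Z[k - 1] > 0)

# Automatic differentiation on the same numbers (float64 tensors)
Xt, Tt = torch.tensor(X), torch.tensor(T)
Wt = [torch.tensor(w, requires_grad=True) for w in W]
Bt = [torch.tensor(b, requires_grad=True) for b in B]
H = Xt
for w, b in zip(Wt[:-1], Bt[:-1]):
    H = torch.clamp(H @ w + b, min=0)
err = ((H @ Wt[-1] + Bt[-1] - Tt) ** 2).mean()
err.backward()

# Compare every gradient array, weights first, then biases
all_match = all(
    numpy.allclose(manual, auto.grad.numpy())
    for manual, auto in zip(DW + DB, Wt + Bt)
)
print(all_match)
```

In the notebook itself the PyTorch parameters are single precision, so a looser tolerance (for example `rtol=1e-4`) would likely be needed for the same comparison.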

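As a further illustration of autograd, the following cell fits a third-order polynomial to sin(x) by gradient descent, with the gradients computed by `loss.backward()`.

In [32]: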
import torch
import math

dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0")  # Uncomment this to run on GPU

# Create Tensors to hold input and outputs.
# By default, requires_grad=False, which indicates that we do not need to
# compute gradients with respect to these Tensors during the backward pass.
x = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=dtype)
y = torch.sin(x)

# Create random Tensors for weights. For a third order polynomial, we need
# 4 weights: y = a + b x + c x^2 + d x^3
# Setting requires_grad=True indicates that we want to compute gradients with
# respect to these Tensors during the backward pass.
a = torch.randn((), device=device, dtype=dtype, requires_grad=True)
b = torch.randn((), device=device, dtype=dtype, requires_grad=True)
c = torch.randn((), device=device, dtype=dtype, requires_grad=True)
d = torch.randn((), device=device, dtype=dtype, requires_grad=True)

learning_rate = 1e-6
for t in range(2000):
    # Forward pass: compute predicted y using operations on Tensors.
    y_pred = a + b * x + c * x ** 2 + d * x ** 3

    # Compute and print loss using operations on Tensors.
    # Now loss is a 0-dimensional Tensor (shape ()).
    # loss.item() gets the scalar value held in the loss.
    loss = (y_pred - y).pow(2).sum()
    if t % 100 == 99:
        print(t, loss.item())

    # Use autograd to compute the backward pass. This call will compute the
    # gradient of loss with respect to all Tensors with requires_grad=True.
    # After this call a.grad, b.grad, c.grad and d.grad will be Tensors holding
    # the gradient of the loss with respect to a, b, c, d respectively.
    loss.backward()

    # Manually update weights using gradient descent. Wrap in torch.no_grad()
    # because weights have requires_grad=True, but we don't need to track this
    # in autograd.
    with torch.no_grad():
        a -= learning_rate * a.grad
        b -= learning_rate * b.grad
        c -= learning_rate * c.grad
        d -= learning_rate * d.grad

        # Manually zero the gradients after updating weights
        a.grad = None
        b.grad = None
        c.grad = None
        d.grad = None

print(f'Result: y = {a.item()} + {b.item()} x + {c.item()} x^2 + {d.item()} x^3')

99 5970.38037109375
199 3954.881103515625
299 2620.96875
399 1738.095458984375
499 1153.7125244140625
599 766.8773193359375
699 510.79052734375
799 341.24688720703125
899 228.98988342285156
999 154.65667724609375
1099 105.4305419921875
1199 72.8281478881836
1299 51.233089447021484
1399 36.927330017089844
1499 27.449447631835938
1599 21.169198989868164
1699 17.00717544555664
1799 14.248553276062012
1899 12.419769287109375
1999 11.207252502441406
Result: y = 0.011601278558373451 + 0.8104267120361328 x + -0.002001413842663169 x^2 + -0.08674260228872299 x^3
