# LIS 640 Applied Deep Learning: Computation Graphs in PyTorch

# Set up code

In [1]:
import utils
import torch

In this exercise we will implement a two-layer network using a more modular approach. For each layer we will implement a `forward` and a `backward` function. The `forward` function receives inputs, weights, and other parameters, and returns both an output and a `cache` object that stores the data needed for the backward pass, like this:

```python
def forward(x, w):
  """Receive inputs x and weights w."""
  # Do some computations ...
  z = ...  # placeholder: some intermediate value
  # Do some more computations ...
  out = ...  # placeholder: the output

  cache = (x, w, z, out)  # Values we need to compute gradients

  return out, cache
```

The backward pass will receive upstream derivatives and the `cache` object, and will return gradients with respect to the inputs and weights, like this:

```python
def backward(dout, cache):
  """
  Receive dout (derivative of loss with respect to outputs) and cache,
  and compute derivatives with respect to inputs and weights.
  """
  # Unpack cache values
  x, w, z, out = cache

  # Use values in cache to compute derivatives
  dx = ...  # placeholder: derivative of loss with respect to x
  dw = ...  # placeholder: derivative of loss with respect to w

  return dx, dw
```
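
To make the template concrete, here is a minimal sketch of the same `forward`/`backward` pattern for a sigmoid layer. The `Sigmoid` class is a hypothetical example and is not part of the assignment code; it has no weights, so `backward` returns only `dx`. The result is cross-checked against PyTorch autograd.

```python
import torch

class Sigmoid:
  """Hypothetical example layer (not part of the assignment code)."""

  @staticmethod
  def forward(x):
    out = torch.sigmoid(x)  # out-of-place: x is left untouched
    cache = out             # the sigmoid derivative only needs the output
    return out, cache

  @staticmethod
  def backward(dout, cache):
    out = cache
    # Chain rule: dL/dx = dL/dout * dout/dx, where dout/dx = out * (1 - out)
    dx = dout * out * (1.0 - out)
    return dx

torch.manual_seed(0)
x = torch.randn(4, 3, dtype=torch.float64)
dout = torch.randn(4, 3, dtype=torch.float64)  # stand-in upstream gradient

out, cache = Sigmoid.forward(x)
dx = Sigmoid.backward(dout, cache)

# Cross-check the hand-written backward pass against autograd
x_ag = x.clone().requires_grad_(True)
torch.sigmoid(x_ag).backward(dout)
max_diff = (dx - x_ag.grad).abs().max().item()
print('max abs difference vs autograd:', max_diff)
```

Caching only `out` is a design choice: the cache should hold whatever makes the backward pass cheapest, which is not always the raw input.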

After implementing a bunch of layers this way, we will be able to easily combine them to build classifiers with different architectures. Your task here is to implement the `ReLU` activation function using this modular approach.


To validate our implementation, we will compare the analytically computed gradients with numerical approximations of the gradient as done in previous assignments. You can inspect the numeric gradient function `utils.compute_numeric_gradient`. Please note that we have updated the function to accept upstream gradients to allow us to debug intermediate layers easily.
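
The idea behind a numeric gradient check that accepts upstream gradients can be sketched as follows. This is an illustrative centered-difference helper, not the actual code of `utils.compute_numeric_gradient`; the name `numeric_gradient`, its signature, and the step size `h` are assumptions. It perturbs `x` in place and restores each entry, and it assumes `x` is contiguous and does not require grad.

```python
import torch

def numeric_gradient(f, x, dout=None, h=1e-6):
  """Centered-difference estimate of d(sum(f(x) * dout)) / dx.

  Hypothetical helper for illustration; the course utility may differ.
  """
  if dout is None:
    dout = torch.ones_like(f(x))  # no upstream gradient: differentiate sum(f(x))
  flat_x = x.view(-1)             # shares storage with x (x must be contiguous)
  grad = torch.zeros_like(x)
  flat_grad = grad.view(-1)
  for i in range(flat_x.numel()):
    old = flat_x[i].item()
    flat_x[i] = old + h
    plus = (f(x) * dout).sum().item()
    flat_x[i] = old - h
    minus = (f(x) * dout).sum().item()
    flat_x[i] = old               # restore the original entry
    flat_grad[i] = (plus - minus) / (2 * h)
  return grad

torch.manual_seed(0)
x = torch.randn(3, 4, dtype=torch.float64)
dout = torch.randn(3, 4, dtype=torch.float64)

grad_num = numeric_gradient(lambda t: t ** 2, x, dout)
grad_exact = 2 * x * dout         # analytic gradient of sum(x**2 * dout)
max_err = (grad_num - grad_exact).abs().max().item()
print('max abs error:', max_err)
```

Weighting the output by `dout` before differencing is what lets an intermediate layer be checked in isolation: the result is directly comparable to what the layer's `backward` returns for the same `dout`.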
  

# ReLU activation function

We will now implement the ReLU activation function. As above, we will define a class with two empty static methods and implement them in the upcoming cells. The class structure can be found in `problem3.py`.

## ReLU activation: forward
Implement the forward pass for the ReLU activation function in the `ReLU.forward` function. You **should not** change the input tensor with an in-place operation. 
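
As a reminder of what "in-place" means here, the short sketch below contrasts an out-of-place operation with an in-place one on a toy tensor. It deliberately uses multiplication rather than ReLU; the variable names are illustrative only.

```python
import torch

x = torch.tensor([-1.0, 2.0, -3.0])

# Out-of-place: a new tensor is allocated and x keeps its values
y = x * 2
print(x)  # still the original values

# In-place: methods with a trailing underscore (and forms such as
# `t *= 2` or `t[mask] = 0`) overwrite the existing storage
t = x.clone()
alias = t      # a second reference to the same storage, e.g. a cached input
t.mul_(2)
print(alias)   # the alias sees the overwritten values too
```

If `forward` modified its input in place, both the caller's tensor and any cached reference to it would silently change, which would corrupt the backward pass and the numeric gradient check.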

Run the following to test your implementation of the ReLU forward pass. Your errors should be less than `1e-7`.

In [2]:
from problem3 import ReLU

utils.reset_seed(0)
x = torch.linspace(-0.5, 0.5, steps=12, dtype=torch.float64, device='cuda')
x = x.reshape(3, 4)

out, _ = ReLU.forward(x)
correct_out = torch.tensor([[ 0.,          0.,          0.,          0.,        ],
                            [ 0.,          0.,          0.04545455,  0.13636364,],
                            [ 0.22727273,  0.31818182,  0.40909091,  0.5,       ]],
                            dtype=torch.float64,
                            device='cuda')

# Compare your output with ours. The error should be on the order of e-8
print('Testing ReLU.forward function:')
print('difference: ', utils.rel_error(out, correct_out))

Testing ReLU.forward function:
difference:  4.5454545613554664e-09
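
For intuition about the number reported above, here is one plausible definition of a relative-error metric. It is an assumption about what `utils.rel_error` computes, not its actual source; inspect `utils` for the real definition.

```python
import torch

def rel_error(x, y, eps=1e-10):
  """Largest absolute difference, scaled by the largest combined magnitude.

  Hypothetical sketch; the course utility may be defined differently.
  """
  top = (x - y).abs().max().item()
  bot = (x.abs() + y.abs()).clamp(min=eps).max().item()  # avoid dividing by zero
  return top / bot

a = torch.tensor([1.0, 2.0], dtype=torch.float64)
b = a + 1e-9
print(rel_error(a, a))  # identical tensors give zero error
print(rel_error(a, b))  # a small perturbation gives a small error
```

Under a definition like this, an error of a few `1e-9` in the forward test comes from the reference values being written out to only eight decimal places, not from a flaw in the implementation.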


## ReLU activation: backward
Now implement the backward pass for the ReLU activation function.

Again, you should not change the input tensor with an in-place operation.

Run the following to test your implementation of `ReLU.backward`. Your errors should be less than `1e-8`.

In [3]:
from problem3 import ReLU

utils.reset_seed(0)
x = torch.randn(10, 10, dtype=torch.float64, device='cuda')
dout = torch.randn(*x.shape, dtype=torch.float64, device='cuda')

dx_num = utils.compute_numeric_gradient(lambda x: ReLU.forward(x)[0], x, dout)

_, cache = ReLU.forward(x)
dx = ReLU.backward(dout, cache)

# The error should be less than 1e-8
print('Testing ReLU.backward function:')
print('dx error: ', utils.rel_error(dx_num, dx))

Testing ReLU.backward function:
dx error:  2.6317796097761553e-10


# Two-layer network
In problem2 you implemented a two-layer neural network in a single monolithic class. Now that you have implemented modular versions of the necessary layers, you will reimplement the two-layer network using these modular implementations.

Complete the implementation of the `TwoLayerNet` class. This class will serve as a model for the other networks you will implement in this assignment, so read through it to make sure you understand the API. 
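
The way modular layers chain together can be sketched with two toy layers. The `Linear` (no bias) and `Tanh` classes below are hypothetical stand-ins, not the layers or the `TwoLayerNet` API from `problem3.py`. The point is the pattern: run the forward passes in order while keeping each cache, then run the backward passes in reverse order, feeding each layer the gradient produced by the one after it.

```python
import torch

class Linear:
  """Hypothetical bias-free linear layer, for illustration only."""
  @staticmethod
  def forward(x, w):
    return x @ w, (x, w)

  @staticmethod
  def backward(dout, cache):
    x, w = cache
    return dout @ w.t(), x.t() @ dout  # (dx, dw)

class Tanh:
  """Hypothetical activation layer, for illustration only."""
  @staticmethod
  def forward(x):
    out = torch.tanh(x)
    return out, out

  @staticmethod
  def backward(dout, cache):
    return dout * (1.0 - cache ** 2)

torch.manual_seed(0)
N, D, H, C = 3, 5, 4, 2
X = torch.randn(N, D, dtype=torch.float64)
W1 = torch.randn(D, H, dtype=torch.float64)
W2 = torch.randn(H, C, dtype=torch.float64)
dscores = torch.randn(N, C, dtype=torch.float64)  # stand-in for the loss gradient

# Forward: layer by layer, keeping every cache
h, cache1 = Linear.forward(X, W1)
a, cache2 = Tanh.forward(h)
scores, cache3 = Linear.forward(a, W2)

# Backward: same layers in reverse order
da, dW2 = Linear.backward(dscores, cache3)
dh = Tanh.backward(da, cache2)
dX, dW1 = Linear.backward(dh, cache1)

# Cross-check against autograd
W1_ag = W1.clone().requires_grad_(True)
W2_ag = W2.clone().requires_grad_(True)
(torch.tanh(X @ W1_ag) @ W2_ag).backward(dscores)
err_W1 = (dW1 - W1_ag.grad).abs().max().item()
err_W2 = (dW2 - W2_ag.grad).abs().max().item()
print('dW1 error:', err_W1, 'dW2 error:', err_W2)
```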

Once you have finished implementing the forward and backward passes of your two-layer net, run the following to test your implementation:

In [4]:
from problem3 import TwoLayerNet
torch.set_printoptions(precision=12, threshold=None, edgeitems=None, linewidth=None, profile=None)

utils.reset_seed(0)
N, D, H, C = 3, 5, 50, 7
X = torch.randn(N, D, dtype=torch.float64, device='cuda')
y = torch.randint(C, size=(N,), dtype=torch.int64, device='cuda')

std = 1e-3
model = TwoLayerNet(
          input_dim=D,
          hidden_dim=H,
          num_classes=C,
          weight_scale=std,
          dtype=torch.float64,
          device='cuda'
        )

print('Testing initialization ... ')
W1_std = torch.abs(model.params['W1'].std() - std)
W2_std = torch.abs(model.params['W2'].std() - std)
assert W1_std < std / 10, 'First layer weights do not seem right'
assert W2_std < std / 10, 'Second layer weights do not seem right'

print('Testing test-time forward pass ... ')
model.params['W1'] = torch.linspace(-0.7, 0.3, steps=D * H, dtype=torch.float64, device='cuda').reshape(D, H)
model.params['W2'] = torch.linspace(-0.3, 0.4, steps=H * C, dtype=torch.float64, device='cuda').reshape(H, C)
X = torch.linspace(-5.5, 4.5, steps=N * D, dtype=torch.float64, device='cuda').reshape(D, N).t()
scores = model.loss(X)
correct_scores = torch.tensor(
        [[ 8.56847057,  9.12177260,  9.67507463, 10.22837667, 10.78167870,
         11.33498073, 11.88828277],
        [ 9.09451046,  9.57617926, 10.05784805, 10.53951685, 11.02118564,
         11.50285444, 11.98452323],
        [ 9.62055036, 10.03058591, 10.44062147, 10.85065703, 11.26069259,
         11.67072814, 12.08076370]],
    dtype=torch.float64, device='cuda')
scores_diff = torch.abs(scores - correct_scores).sum()
assert scores_diff < 1e-6, 'Problem with test-time forward pass'

print('Testing training loss (no regularization)')
y = torch.tensor([0, 5, 1])
loss, grads = model.loss(X, y)
correct_loss = 2.881451052641
assert abs(loss - correct_loss) < 1e-10, 'Problem with training-time loss'

# Errors should be around e-6 or less
print('Running numeric gradient check:')
loss, grads = model.loss(X, y)

for name in sorted(grads):
  f = lambda _: model.loss(X, y)[0]
  grad_num = utils.compute_numeric_gradient(f, model.params[name])
  print('%s relative error: %.2e' % (name, utils.rel_error(grad_num, grads[name])))

Testing initialization ... 
Testing test-time forward pass ... 
Testing training loss (no regularization)
Running numeric gradient check:
W1 relative error: 1.87e-07
W2 relative error: 1.46e-09
