# Neural Architectures

First, we'll discuss Torch Autograd and its significance, along with the implications for how we build neural networks. Then, we'll take a high-level look at the following sample NN architectures: 
+ Dense
+ CNN
+ RNN
+ [VGG](https://arxiv.org/pdf/1409.1556.pdf)
+ [ResNet](https://arxiv.org/pdf/1512.03385.pdf)
+ AE/VAE (autoencoders and variational autoencoders)

Finally, we'll open up discussion to focus on how to learn more on your own, be it by participating in labs or dissecting research papers.

In [1]:
import torch

## Torch Autograd

PyTorch comes with a package called `autograd`, which underlies much of its flexibility and functionality. Autograd records the operations applied to tensors and dynamically builds a computational graph, with the initial tensors as leaf nodes and a terminal value (e.g. the loss) at the other end. Calling backward on that terminal value then differentiates back through the recorded graph (see: Workshop 4).

We begin by creating a torch tensor with the `requires_grad` flag set to `True`. This prompts autograd to track every operation on this tensor and on any tensor derived from it. One potential source of trouble is in-place modification: a leaf tensor that requires gradients can't be modified in place, so you can't write `x *= 3` or `x.mul_(3)`. Thinking back to the computational graphs we defined before, overwriting a node with a function of itself would amount to a self-loop, making our graph ill-defined. Instead, we record the operation and store its output as a *new* variable/node, as in `y = x * 3`.

In [51]:
# Create a differentiable tensor
x = torch.ones(2, 2, requires_grad=True)
print(x)

# Multiply x by 3 and store result as a *new* node y
y = x * 3
print(y)

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
tensor([[3., 3.],
        [3., 3.]], grad_fn=<MulBackward0>)
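
To make the leaf/interior distinction concrete, here is a minimal sketch that inspects the two tensors from the cell above. It only uses the standard tensor attributes `is_leaf`, `grad_fn`, and `requires_grad`.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)  # created directly by us: a leaf node
y = x * 3                                  # produced by an operation: an interior node

# Leaf tensors have no grad_fn, since no operation produced them
print(x.is_leaf, x.grad_fn)

# y remembers the multiplication that created it
print(y.is_leaf, y.grad_fn)

# requires_grad propagates from inputs to results
print(y.requires_grad)
```

The `grad_fn` attribute is what autograd walks through during the backward pass, which is why it shows up in the printed representation of `y` but not of `x`.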


We can chain together as many operations as we want (physical memory permitting) to build even the most complex machine learning architectures. For now, we'll stick to a couple of simple operations. Remember, each result is recorded as a new variable. Our operations culminate in a final variable that we'll call `out`, and this is the tensor we'll run the backward pass from in order to get our partial derivatives.

In [52]:
z = y * y * 3
out = z.mean()

print(z)
print(out)

tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward0>)
tensor(27., grad_fn=<MeanBackward0>)


It's important to note that calling `tensor.backward()` with no arguments requires the calling tensor to contain a single scalar. That is to say, it shouldn't be a vector, matrix, or general k-th order tensor (those need an explicit `gradient` argument). Here, `out` is a tensor containing a single scalar, and hence we can call `.backward()` on it!
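
For completeness, here is a small sketch of what happens with a non-scalar tensor. It assumes the same `x` as before and uses the `gradient` argument of `backward()`, which supplies the upstream gradient that a scalar would otherwise provide implicitly. The flag name `raised` is just an illustrative variable.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)

# A 2x2 result is not a scalar, so backward() with no arguments fails
try:
    (x * 3).backward()
    raised = False
except RuntimeError:
    raised = True
print(raised)

# Supplying an upstream gradient of the same shape makes the call valid.
# Using all ones is equivalent to calling y.sum().backward().
y = x * 3
y.backward(gradient=torch.ones_like(y))
print(x.grad)  # each entry is d(3 * x_ij)/d(x_ij) = 3
```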

In [53]:
out.backward()

Great, now the backward pass has completed and autograd has populated the partial derivatives! This covers every individual element of `x`, which means we simultaneously computed $$\frac{\partial (out)}{\partial x_{i,j}}\text{ for } 1\leq i,j\leq 2.$$ If other tensors with `requires_grad=True` had been used to compute `out`, then `out.backward()` would have computed their gradients as well. In this case, we only used `x`. To access the gradient for `x`, we simply read `x.grad`.

In [54]:
print(x.grad)

tensor([[13.5000, 13.5000],
        [13.5000, 13.5000]])
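
As a sanity check, we can work the gradient out by hand. Since `y = 3x` and `z = 3y^2`, we have $$out = \frac{1}{4}\sum_{i,j} 27\,x_{i,j}^2 \quad\Rightarrow\quad \frac{\partial (out)}{\partial x_{i,j}} = \frac{27}{2}\,x_{i,j},$$ which equals 13.5 when every entry of `x` is 1. The sketch below reruns the cells above from scratch in a fresh session and compares against that value; the name `expected` is just an illustrative variable.

```python
import torch

# Rebuild the same graph as in the cells above
x = torch.ones(2, 2, requires_grad=True)
y = x * 3
z = y * y * 3
out = z.mean()
out.backward()

# Analytic gradient: (27 / 2) * x, evaluated at the current values of x
expected = 13.5 * x.detach()

print(x.grad)
print(torch.allclose(x.grad, expected))
```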


**Aside:** As we mentioned above, autograd doesn't allow in-place operations on a leaf tensor that requires gradients. Instead, one must *clone* the starting tensor into a new one using `.clone()`, then apply the in-place operations to the new tensor. The first cell below shows the failure, and the second shows the fix.

In [48]:
# Starting tensor
a = torch.ones(2, 2, requires_grad=True)
print(a)

# In-place operation on the starting tensor
a[1, 0] = a[1, 0] * 3
print(a)

# Scalar tensor for gradient calculation
out = a.mean()

# Backward pass
out.backward()

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
tensor([[1., 1.],
        [3., 1.]], grad_fn=<CopySlices>)


RuntimeError: leaf variable has been moved into the graph interior
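
The exact message, and the line where the error surfaces, depends on the PyTorch version. As far as we know, recent releases reject the in-place write itself rather than waiting for the backward pass. The sketch below is written to catch the `RuntimeError` in either case; `failed` is just an illustrative flag.

```python
import torch

a = torch.ones(2, 2, requires_grad=True)

try:
    # In-place write into a leaf tensor that requires grad
    a[1, 0] = a[1, 0] * 3
    # Older versions only complain once the backward pass runs
    a.mean().backward()
    failed = False
except RuntimeError as err:
    failed = True
    print("RuntimeError:", err)
```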

In [49]:
# Starting tensor
a = torch.ones(2, 2, requires_grad=True)
print(a)

# New tensor using .clone()
A = a.clone()
print(A)

# In-place operation on the cloned tensor
A[1, 0] = A[1, 0] * 3
print(A)

# Scalar tensor for gradient calculation
out = A.mean()

# Backward pass
out.backward()

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
tensor([[1., 1.],
        [1., 1.]], grad_fn=<CloneBackward>)
tensor([[1., 1.],
        [3., 1.]], grad_fn=<CopySlices>)


In [50]:
a.grad

tensor([[0.2500, 0.2500],
        [0.7500, 0.2500]])
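
Cloning is not the only option. As an alternative sketch (not part of the workshop's original approach), the same result can be obtained without any in-place write on a tracked tensor, by multiplying with a constant mask. The tensor `scale` here is a hypothetical helper introduced for illustration.

```python
import torch

a = torch.ones(2, 2, requires_grad=True)

# Constant scaling mask: 3 at position [1, 0], 1 everywhere else.
# Writing into it in place is fine because it does not require grad.
scale = torch.ones(2, 2)
scale[1, 0] = 3.0

# Out-of-place multiply creates a new node; a itself is never modified
A = a * scale
out = A.mean()
out.backward()

print(a.grad)  # equals scale / 4, matching the clone-based result
```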