- Understanding and working with PyTorch computation graphs;
- Working with PyTorch tensor objects;
- Solving the classic XOR problem and understanding model capacity;
- Building complex NN models using PyTorch's `nn.Sequential` class and the `nn.Module` class;
- Computing gradients using automatic differentiation and `torch.autograd`.

# PyTorch's computation graphs

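PyTorch performs its computations based on a directed acyclic graph (DAG): it records the operations applied to tensors and uses that record to derive gradients from the outputs back to the inputs.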

Let's say that we have rank 0 tensors $a$, $b$, and $c$ and we want to evaluate $z = 2\times(a-b)+c$.

In [1]:
import torch

def compute_z(a, b, c):
    # Evaluate z = 2*(a - b) + c one operation at a time
    r1 = torch.sub(a, b)   # r1 = a - b
    r2 = torch.mul(r1, 2)  # r2 = 2 * r1
    z = torch.add(r2, c)   # z = r2 + c
    return z

print('Scalar inputs: ', compute_z(torch.tensor(1),torch.tensor(2),torch.tensor(3)))
print('Rank 1 inputs: ', compute_z(torch.tensor([1]),torch.tensor([2]),torch.tensor([3])))
print('Rank 2 inputs: ', compute_z(torch.tensor([[1]]),torch.tensor([[2]]),torch.tensor([[3]])))

Scalar inputs:  tensor(1)
Rank 1 inputs:  tensor([1])
Rank 2 inputs:  tensor([[1]])
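
To see the graph that PyTorch records for this computation, one option is to rerun it on floating-point inputs that track gradients and follow the `grad_fn` links backward from the result. The following is a minimal sketch under those assumptions: the input values 1, 2, and 3 mirror the call above, the list `names` is introduced only for illustration, and the node class names it collects are autograd internals that may differ between PyTorch versions.

```python
import torch

def compute_z(a, b, c):
    r1 = torch.sub(a, b)
    r2 = torch.mul(r1, 2)
    z = torch.add(r2, c)
    return z

# Gradient tracking requires floating-point tensors
a = torch.tensor(1.0, requires_grad=True)
b = torch.tensor(2.0, requires_grad=True)
c = torch.tensor(3.0, requires_grad=True)
z = compute_z(a, b, c)

# Follow the first input of each operation: add -> mul -> sub
names = []
node = z.grad_fn
for _ in range(3):
    names.append(type(node).__name__)
    node = node.next_functions[0][0]
print(names)

# Backpropagate through the recorded graph
z.backward()
print(a.grad, b.grad, c.grad)  # dz/da = 2, dz/db = -2, dz/dc = 1
```

The gradients follow directly from $z = 2\times(a-b)+c$, which is a quick way to check that the recorded graph matches the intended expression.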


In PyTorch, the parameters of a model are stored in tensors for which gradients are tracked, which is what allows them
to be updated during training. Such a tensor can be created by passing `requires_grad=True` along with user-specified
initial values.

In [2]:
a = torch.tensor(3.14, requires_grad=True)
print(a)
b = torch.tensor([1.,2.,3.], requires_grad=True)
print(b)

tensor(3.1400, requires_grad=True)
tensor([1., 2., 3.], requires_grad=True)


`requires_grad` is `False` by default, but it can be switched on afterwards by calling the in-place method `.requires_grad_()` on the tensor.
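
As a brief illustration of the in-place method, the sketch below toggles the flag on an existing tensor. It also shows a related restriction: integer tensors cannot track gradients, so the call raises a `RuntimeError` for them. The names `w` and `int_ok` are used only for this example.

```python
import torch

w = torch.tensor([1.0, 2.0, 3.0])
print(w.requires_grad)  # False

w.requires_grad_()      # trailing underscore: modifies w in place
print(w.requires_grad)  # True

# Only floating-point (and complex) tensors can require gradients
try:
    torch.tensor([1, 2, 3]).requires_grad_()
    int_ok = True
except RuntimeError:
    int_ok = False
print(int_ok)           # False
```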

You will recall that for NN models, initializing model parameters with random weights is necessary to break the
symmetry during backpropagation. If all neurons in a layer start with the same weights, they receive the same gradients
and continue to update in the same way, which makes them redundant. A multilayer NN initialized this way would be no
more useful than a single-layer NN like logistic regression.
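
The symmetry problem can be observed directly. The sketch below is a toy setup, not taken from the text above: a hypothetical two-layer network with three hidden units, a single made-up input `x` and target `y`, and every weight set to the same constant 0.5. After one backward pass, each hidden unit has received exactly the same gradient.

```python
import torch

# Hypothetical toy data: one sample with two features and a scalar target
x = torch.tensor([[0.5, -1.0]])
y = torch.tensor([[1.0]])

# Every weight starts at the same constant value (no random initialization)
W1 = torch.full((2, 3), 0.5, requires_grad=True)  # input -> 3 hidden units
W2 = torch.full((3, 1), 0.5, requires_grad=True)  # hidden -> output

h = torch.tanh(x @ W1)           # all three hidden activations are equal
out = h @ W2
loss = ((out - y) ** 2).mean()
loss.backward()

# Each column of W1.grad belongs to one hidden unit; the columns are identical
print(W1.grad)
same = torch.allclose(W1.grad[:, 0], W1.grad[:, 1]) and \
       torch.allclose(W1.grad[:, 1], W1.grad[:, 2])
print(same)  # True
```

Because the gradients match, a gradient-descent step leaves the three hidden units identical again, so they never learn different features.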

When creating a PyTorch tensor, we can also fill it using a random initialization scheme. The `torch.nn.init` module
provides schemes based on several different probability distributions.

Let's look at how we can create a tensor with Glorot initialization, a classic random initialization scheme
proposed by Xavier Glorot and Yoshua Bengio.

In [3]:
import torch.nn as nn
torch.manual_seed(1)
w = torch.empty(2,3)
nn.init.xavier_normal_(w)
print(w)

tensor([[ 0.4183,  0.1688,  0.0390],
        [ 0.3930, -0.2858, -0.1051]])
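
With only six values, it is hard to see what distribution the weights come from. According to the PyTorch documentation, `nn.init.xavier_normal_` samples from a normal distribution with mean 0 and standard deviation $\sqrt{2/(\text{fan\_in}+\text{fan\_out})}$ when the default gain of 1 is used. A rough check, assuming a larger hypothetical weight matrix of shape (400, 600) so that the sample statistics are stable:

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(1)

# For a 2D tensor, fan_out is the number of rows and fan_in the number of columns
fan_out, fan_in = 400, 600
w_big = torch.empty(fan_out, fan_in)
nn.init.xavier_normal_(w_big)

expected_std = math.sqrt(2.0 / (fan_in + fan_out))
empirical_std = w_big.std().item()
print(f"expected std:  {expected_std:.4f}")
print(f"empirical std: {empirical_std:.4f}")
print(f"empirical mean: {w_big.mean().item():.4f}")
```

The two standard deviations should agree closely, and the mean should be near zero. Scaling the spread by the layer sizes in this way is what distinguishes Glorot initialization from sampling with a fixed standard deviation.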
