In [2]:
# Dependencies:
import torch
import numpy as np

## Tensors:

Tensors are multi-dimensional arrays that support `autograd` operations such as `backward()`. PyTorch tensors are similar to NumPy's `ndarray`s, except that they can also live on a GPU.

### Creating Tensors:

- $\texttt{torch.rand(shape)}$ &mdash; returns a tensor with random values drawn from a uniform distribution on interval $[0, 1)$
- $\texttt{torch.zeros(shape)}$ &mdash; returns a tensor where all components are initialised with zeroes
- $\texttt{torch.tensor(data)}$ &mdash; creates a tensor from the supplied data list/array 
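
The three constructors listed above can be tried side by side. This is a minimal sketch; the variable names (`shape`, `u`, `z`, `t`) are illustrative only, and the dtypes in the comments assume PyTorch's default settings.

```python
import torch

shape = (2, 3)

u = torch.rand(shape)                # uniform samples from [0, 1)
z = torch.zeros(shape)               # all zeros, float32 by default
t = torch.tensor([[1, 2], [3, 4]])   # dtype inferred from the data (int64 for Python ints)

print(u.shape, z.dtype, t.dtype)
```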


### Tensor Properties:

Tensors have the following methods:
- $\texttt{size()}$ &mdash; returns a tuple-like object containing the tensor's shape
- $\texttt{transpose(dim0, dim1)}$ &mdash; returns a transposed tensor
    - E.g. `x.transpose(0, 1)` swaps dimensions 0 and 1, which transposes a 2D tensor
- $\texttt{reshape(shape)}$ &mdash; returns a reshaped tensor, containing the same data


In [64]:

x = torch.rand((2, 3))
print("===== Original tensor =====")
print(x)
print("Size: {}".format(x.size()))

x = x.transpose(0, 1)
print("===== Transpose =====")
print(x)

print("===== Reshape ====")
x = x.reshape((1, 6))
print(x)

===== Original tensor =====
tensor([[0.1869, 0.9153, 0.8237],
        [0.8429, 0.8415, 0.9888]])
Size: torch.Size([2, 3])
===== Transpose =====
tensor([[0.1869, 0.8429],
        [0.9153, 0.8415],
        [0.8237, 0.9888]])
===== Reshape ====
tensor([[0.1869, 0.8429, 0.9153, 0.8415, 0.8237, 0.9888]])


### Tensor operations:

The elementwise math operators `+`, `-`, `*`, `/` can be used on any two tensors whose shapes match or can be broadcast to a common shape.


In [66]:
x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([4.0, 5.0, 6.0])
print("x = {}".format(x))
print("y = {}".format(y))

print("===== Operations =====")
print("x + y = {}".format(x + y))
print("x - y = {}".format(x - y))
print("x * y = {}".format(x * y))
print("x / y = {}".format(x / y))

x = tensor([1., 2., 3.])
y = tensor([4., 5., 6.])
===== Operations =====
x + y = tensor([5., 7., 9.])
x - y = tensor([-3., -3., -3.])
x * y = tensor([ 4., 10., 18.])
x / y = tensor([0.2500, 0.4000, 0.5000])
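
The operands above have identical shapes. As a small sketch of what happens when the shapes differ, the example below adds a `(2, 1)` tensor to a `(3,)` tensor, which PyTorch broadcasts to `(2, 3)`, and shows that shapes which cannot be broadcast raise an error. The names `a`, `b`, `c` and `failed` are illustrative.

```python
import torch

a = torch.tensor([[1.0], [2.0]])        # shape (2, 1)
b = torch.tensor([10.0, 20.0, 30.0])    # shape (3,)

c = a + b                               # broadcast to shape (2, 3)
print(c.shape)
print(c)

# Shapes (2,) and (3,) cannot be broadcast together
failed = False
try:
    torch.ones(2) + torch.ones(3)
except RuntimeError as err:
    failed = True
    print("incompatible shapes:", type(err).__name__)
```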


#### Converting between torch tensor and numpy array:

In [78]:
import numpy as np

print("===== torch.Tensor to numpy.ndarray =====")
x = torch.tensor([1, 2, 3])
print("Torch tensor: {}".format(x))
x = x.numpy()
print("Numpy array:  {}".format(x))

print("===== numpy.ndarray to torch.Tensor =====")
x = np.array([1, 2, 3])
print("Numpy array:  {}".format(x))
x = torch.from_numpy(x)
print("Torch tensor: {}".format(x))

===== torch.Tensor to numpy.ndarray =====
Torch tensor: tensor([1, 2, 3])
Numpy array:  [1 2 3]
===== numpy.ndarray to torch.Tensor =====
Numpy array:  [1 2 3]
Torch tensor: tensor([1, 2, 3])
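
One detail the conversion above does not show: `torch.from_numpy` shares memory with the source array, whereas `torch.tensor` copies the data. A minimal sketch (variable names are illustrative):

```python
import numpy as np
import torch

a = np.array([1, 2, 3])

shared = torch.from_numpy(a)   # shares memory with `a`
copied = torch.tensor(a)       # independent copy of the data

a[0] = 100                     # modify the NumPy array in place

print(shared)                  # reflects the change
print(copied)                  # still holds the original values
```

The same sharing applies in the other direction: `.numpy()` on a CPU tensor returns an array backed by the tensor's memory.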


#### CUDA Tensors:

In [98]:
# let us run this cell only if CUDA is available
# We will use ``torch.device`` objects to move tensors in and out of GPU
if torch.cuda.is_available():
    device = torch.device("cuda")          # a CUDA device object
    y = torch.rand((1, 3), device=device)  # directly create a tensor on GPU
    x = x.to(device)                       # or just use strings ``.to("cuda")``
    z = x + y
    print(z)
    print(z.to("cpu", torch.double))       # ``.to`` can also change dtype together!

tensor([[1.0929, 2.1365, 3.7604]], device='cuda:0')
tensor([[1.0929, 2.1365, 3.7604]], dtype=torch.float64)
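
The cell above does nothing on a machine without a GPU. A common alternative, sketched here under the assumption that the same code should run on either device, is to pick the device once and pass it everywhere:

```python
import torch

# Fall back to the CPU when CUDA is not available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.tensor([1, 2, 3]).to(device)   # move an existing tensor
y = torch.rand((1, 3), device=device)    # create a tensor directly on the device

z = x + y                                # both operands live on the same device
print(z.device, z.dtype)
```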


## Autograd &mdash; Automatic Differentiation:

The `autograd` package provides automatic differentiation for all operations on Tensors.

- Enabling tracking:
    - Setting `requires_grad=True` on a new tensor tracks all the computation done on it. Once the computations are done, you can call `.backward()` to have all the gradients computed automatically. 
- Disabling tracking:
    - `.detach()` returns a new tensor that shares the same data but is excluded from computation tracking
    - Wrapping a code block in `with torch.no_grad():` disables tracking for everything inside the block. This is useful when evaluating a model rather than training it
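
The two ways of disabling tracking can be compared directly through the `requires_grad` flag. A minimal sketch with illustrative variable names:

```python
import torch

x = torch.ones(2, 2, requires_grad=True)

y = x * 2
print(y.requires_grad)        # True: y is part of the tracked graph

d = y.detach()                # same data, but cut off from the graph
print(d.requires_grad)        # False

with torch.no_grad():
    w = x * 2                 # computed without recording any history
print(w.requires_grad)        # False
```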

#### Computation tracking example:

In [12]:
x = torch.ones(2, 2, requires_grad=True)
print("x = {}".format(x))

y = x + 2
print("y = {}".format(y))
print("y.grad_fn = {}".format(y.grad_fn))   # y was created as a result of an operation on x, so it has a grad_fn

z = y * y * 3
print("z = {}".format(z))
print("z.grad_fn = {}".format(z.grad_fn))

theta = z.mean()
print("Theta of z = {}".format(theta))
print("Theta.grad_fn = {}".format(theta.grad_fn))

# Doing backpropagation:  (note that backward() with no arguments is only valid on a scalar)
theta.backward()

print("dθ/dx = {}".format(x.grad))


x = tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
y = tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)
y.grad_fn = <AddBackward0 object at 0x7f785d98e1f0>
z = tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward0>)
z.grad_fn = <MulBackward0 object at 0x7f785d9bc250>
Theta of z = 27.0
Theta.grad_fn = <MeanBackward0 object at 0x7f785d9bc3a0>
dθ/dx = tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])


Every operation here is elementwise, so the chain rule applies entry by entry; $\odot$ denotes the elementwise product:

$$
\begin{split}
    \frac{\partial \theta}{\partial x} & = \frac{\partial \theta}{\partial z} \odot \frac{\partial z}{\partial y} \odot \frac{\partial y}{\partial x} \\
    & = 
        \begin{pmatrix}
            \frac{1}{4} & \frac{1}{4}\\
            \frac{1}{4} & \frac{1}{4}
        \end{pmatrix}
     \odot 6\big(
        \begin{pmatrix}
            x_{11} & x_{12} \\
            x_{21} & x_{22}
        \end{pmatrix}
     +
        \begin{pmatrix}
            2 & 2 \\
            2 & 2
        \end{pmatrix}
    \big) \odot 
        \begin{pmatrix}
            1 & 1 \\
            1 & 1
        \end{pmatrix}
    \\
    & = 
        \begin{pmatrix}
            \frac{1}{4} & \frac{1}{4}\\
            \frac{1}{4} & \frac{1}{4}
        \end{pmatrix}
    \odot 6\big(
        \begin{pmatrix}
            1 & 1 \\
            1 & 1
        \end{pmatrix}
    +
        \begin{pmatrix}
            2 & 2 \\
            2 & 2
        \end{pmatrix}
    \big) \odot 
        \begin{pmatrix}
            1 & 1 \\
            1 & 1
        \end{pmatrix} \\
    & = 
        \begin{pmatrix}
            4.5 & 4.5 \\
            4.5 & 4.5 
        \end{pmatrix}
\end{split}
$$
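
The derivation above can be checked numerically. Since $z = 3(x+2)^2$ and $\theta$ is the mean over four entries, each entry of the gradient is $\frac{1}{4} \cdot 6(x+2) = 1.5(x+2)$. The sketch below compares that closed form (named `analytic` here for illustration) with what autograd computes:

```python
import torch

x = torch.ones(2, 2, requires_grad=True)

theta = (3 * (x + 2) ** 2).mean()    # same computation as y, z, theta above
theta.backward()

analytic = 1.5 * (x.detach() + 2)    # (1/4) * 6 * (x + 2), entry by entry

print(x.grad)
print(torch.allclose(x.grad, analytic))
```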

### Network Example: 

Below is a network that classifies handwritten digits.

<img src="images/pytorch-sample-network.png" width="100%">


In [30]:
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        
        # Convolutional layers
        self.conv1 = nn.Conv2d(1, 6, 3)  # 1 input image channel, 6 output channels, 3x3 convolution kernel
        self.conv2 = nn.Conv2d(6, 16, 3) # 6 input channels, 16 output channels, 3x3 kernels
        
        # Fully connected layers
        self.fc1 = nn.Linear(16 * 6 * 6, 120)  # 16 channels of 6x6 feature maps, given a 32x32 input image
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    # Note: the backward() function is automatically defined by autograd
    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2)) # Max pooling over a (2, 2) window on conv1
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)      # Max pooling over a (2, 2) window on conv2
        x = x.view(-1, self.num_flat_features(x))       # Flatten everything except the batch dimension
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features


net = Net()
print(net)

# Network parameters are accessible under net.parameters()
print("Network parameters")
for param in net.parameters():
    print("    {}".format(param.size()))


Net(
  (conv1): Conv2d(1, 6, kernel_size=(3, 3), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(3, 3), stride=(1, 1))
  (fc1): Linear(in_features=576, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)
Network parameters
    torch.Size([6, 1, 3, 3])
    torch.Size([6])
    torch.Size([16, 6, 3, 3])
    torch.Size([16])
    torch.Size([120, 576])
    torch.Size([120])
    torch.Size([84, 120])
    torch.Size([84])
    torch.Size([10, 84])
    torch.Size([10])
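
The `576` input features of `fc1` follow from the layer sizes. The sketch below traces the spatial size through the network, assuming a 32x32 input image, no padding, stride-1 convolutions and 2x2 pooling with stride 2, as in the `Net` class above. The helper names `conv_out` and `pool_out` are hypothetical and not part of PyTorch.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel, stride=None):
    """Output length of max pooling; the stride defaults to the kernel size."""
    stride = kernel if stride is None else stride
    return (size - kernel) // stride + 1


size = 32
size = conv_out(size, 3)   # conv1: 32 -> 30
size = pool_out(size, 2)   # pool:  30 -> 15
size = conv_out(size, 3)   # conv2: 15 -> 13
size = pool_out(size, 2)   # pool:  13 -> 6

print(size, 16 * size * size)   # 6 and 16 * 6 * 6 = 576
```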


#### Predicting:

In [41]:
# Making a prediction, zeroing the gradients, then backpropagating:

input = torch.randn(1, 1, 32, 32)
out = net(input)
print("Output layer: {}".format(out))

net.zero_grad()   # Need to zero the gradients prior to backpropagation
out.backward(torch.randn(1, 10))   # out is not a scalar, so backward() needs a gradient of the same shape

Output layer: tensor([[ 0.0034,  0.0779, -0.0674,  0.0440,  0.0717,  0.0629, -0.1683,  0.0298,
         -0.1209,  0.0345]], grad_fn=<AddmmBackward>)
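
Passing a tensor to `backward()`, as in the cell above, computes a vector-Jacobian product. A small sketch on a toy function (the tensors `x` and `v` are illustrative) shows that this is equivalent to weighting the outputs by `v`, summing to a scalar, and calling `backward()` without arguments:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
v = torch.tensor([0.1, 1.0, 10.0])

out = x ** 2                      # non-scalar output
out.backward(v)                   # gradient is v * d(out)/dx = v * 2x
g1 = x.grad.clone()

x.grad.zero_()                    # reset before the second pass
(x ** 2 * v).sum().backward()     # equivalent scalar formulation
g2 = x.grad.clone()

print(g1, g2)
```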


#### Computing Loss:

In [42]:
output = net(input)
target = torch.randn(10)     # a dummy target, for example
target = target.view(1, -1)  # Reshape to (1, 10) so it matches the shape of output
criterion = nn.MSELoss()

loss = criterion(output, target)

print(loss)

tensor(0.7599, grad_fn=<MseLossBackward>)
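
With its default settings, `nn.MSELoss` averages the squared differences over all elements. The sketch below checks this against a manual computation on random stand-in tensors (not the network output above):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
output = torch.randn(1, 10)    # stand-in for a network output
target = torch.randn(1, 10)

loss = nn.MSELoss()(output, target)
manual = ((output - target) ** 2).mean()

print(loss.item(), manual.item())
```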


This is the sequence of computations in a forward pass:

<img src="images/pytorch-sample-feedforward-sequence.png" width="50%">


#### Backpropagation:
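Calling `loss.backward()` differentiates the whole computational graph: the gradient of the loss with respect to every parameter that has `requires_grad=True` is accumulated into that parameter's `.grad` attribute.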

In [43]:
net.zero_grad()     # zeroes the gradient buffers of all parameters

print('=== conv1.bias.grad before backward() ===')
print(net.conv1.bias.grad)

loss.backward()

print('=== conv1.bias.grad after backward() ===')
print(net.conv1.bias.grad)

=== conv1.bias.grad before backward() ===
tensor([0., 0., 0., 0., 0., 0.])
=== conv1.bias.grad after backward() ===
tensor([-0.0092,  0.0256,  0.0169, -0.0281, -0.0075,  0.0155])


#### Using optimisers:
All that's left to do at this point is to update the weights using an optimiser.

In [None]:
import torch.optim as optim

optimiser = optim.SGD(net.parameters(), lr=0.01)

# This goes in the training loop:
optimiser.zero_grad()
output = net(input)
loss = criterion(output, target)
loss.backward()
optimiser.step()      # step() proceeds with the update
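
For context, here is a sketch of what a complete loop around those four calls might look like. It uses a small `nn.Linear` model and synthetic data as hypothetical stand-ins for `net`, `input` and `target`, so that it runs quickly on its own; the learning rate and step count are arbitrary choices.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

model = nn.Linear(4, 1)                       # hypothetical stand-in for `net`
inputs = torch.randn(64, 4)                   # synthetic batch
targets = inputs.sum(dim=1, keepdim=True)     # synthetic targets

criterion = nn.MSELoss()
optimiser = optim.SGD(model.parameters(), lr=0.1)

losses = []
for step in range(100):
    optimiser.zero_grad()                     # clear gradients from the previous step
    loss = criterion(model(inputs), targets)  # forward pass and loss
    loss.backward()                           # compute gradients
    optimiser.step()                          # update the parameters
    losses.append(loss.item())

print("first loss: {:.4f}, last loss: {:.4f}".format(losses[0], losses[-1]))
```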

<hr style="height: 2px;" />

## <a href="https://pytorch.org/docs/stable/nn.html">`torch.nn`</a>

`nn.Module` &mdash; base class for all neural network modules. It encapsulates parameters, keeps track of state and provides helpers for moving them to the GPU
- `parameters()` &mdash; returns an iterator over the network's parameters
- `zero_grad()` &mdash; zeroes the gradient buffers of all parameters. It's necessary to zero the gradients prior to backpropagation because PyTorch accumulates gradients on subsequent backward passes by default
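
The accumulation behaviour is easy to observe on a single tensor. In this sketch (variable names are illustrative), the same backward pass is run twice without zeroing and once after zeroing:

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

(w * 3).sum().backward()
first = w.grad.clone()        # gradient of one pass: [3., 3.]

(w * 3).sum().backward()      # no zeroing: the new gradient is added to the old one
second = w.grad.clone()       # [6., 6.]

w.grad.zero_()                # manual equivalent of zero_grad() for one tensor
(w * 3).sum().backward()
third = w.grad.clone()        # back to [3., 3.]

print(first, second, third)
```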


#### Loss Functions:
There are several loss functions available in the `nn` package. Each takes a $\texttt{(prediction, target)}$ pair and returns a value that indicates the magnitude of the prediction error.
- `MSELoss` &mdash; mean squared error
- `CrossEntropyLoss` &mdash; cross entropy error
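
Unlike `MSELoss`, `CrossEntropyLoss` is typically given raw scores (logits) of shape `(batch, classes)` and integer class indices of shape `(batch,)`. The sketch below uses made-up values and checks the result against log-softmax followed by the negative log-likelihood loss:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])    # raw scores, shape (batch=2, classes=3)
labels = torch.tensor([0, 2])               # correct class index for each sample

ce = nn.CrossEntropyLoss()(logits, labels)

# The same value, computed in two explicit steps
manual = F.nll_loss(F.log_softmax(logits, dim=1), labels)

print(ce.item(), manual.item())
```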

### Optimisers (from `torch.optim`):
- `SGD(net.parameters(), lr, momentum)`
- `Adam([var, var2], lr)`

#### Optimiser methods:
- `zero_grad()` &mdash; resets the gradients of all parameters the optimiser manages
- `step()` &mdash; updates the network parameters. Called once the gradients have been computed by `backward()`
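
To make `step()` concrete: for `SGD` without momentum or weight decay, the update is `w <- w - lr * grad`. The sketch below (toy tensor and loss chosen for illustration) computes that by hand and compares it with what the optimiser does:

```python
import torch
import torch.optim as optim

w = torch.tensor([1.0, -2.0], requires_grad=True)
optimiser = optim.SGD([w], lr=0.1)

loss = (w ** 2).sum()                    # gradient of this loss is 2w
loss.backward()

expected = w.detach() - 0.1 * w.grad     # plain SGD rule, computed by hand
optimiser.step()                         # in-place update of w

print(w.detach(), expected)
```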


### Building Blocks:

- `Linear(in_size, out_size)` &mdash; applies linear transformation: $y=xA^T+b$
- `Conv2d(in_channels, out_channels, kernel_size, stride, padding)` &mdash; a 2D convolutional layer
- `MaxPool2d(kernel_size, stride)` &mdash; a 2D max-pooling layer; `stride` defaults to `kernel_size`
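
A quick way to get a feel for these building blocks is to chain them and watch the tensor shapes. This is a minimal sketch; the layer sizes are chosen for illustration and assume a single-channel 32x32 input.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 32, 32)            # (batch, channels, height, width)

conv = nn.Conv2d(1, 6, kernel_size=3)    # 1 input channel, 6 output channels
pool = nn.MaxPool2d(kernel_size=2)       # stride defaults to the kernel size
fc = nn.Linear(6 * 15 * 15, 10)          # flattened features to 10 outputs

h = conv(x)
print(h.shape)                           # spatial size shrinks from 32 to 30
h = pool(h)
print(h.shape)                           # pooling halves it to 15
h = h.flatten(1)                         # keep the batch dimension, flatten the rest
y = fc(h)
print(y.shape)
```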
