# PyTorch Introduction

This notebook will cover:

* Tensors in PyTorch
* Autograd - PyTorch's built-in automatic differentiation engine
* CUDA GPU Acceleration

---

## Notebook Setup

In [2]:
import torch
import numpy as np

## Initialising Tensors

In [3]:
x = torch.ones(3, 2)
print(x)
x = torch.zeros(3, 2)
print(x)
x = torch.rand(3, 2)
print(x)
x = torch.randn(3,2)
print(x)

tensor([[1., 1.],
        [1., 1.],
        [1., 1.]])
tensor([[0., 0.],
        [0., 0.],
        [0., 0.]])
tensor([[0.4267, 0.9134],
        [0.2652, 0.5657],
        [0.5505, 0.6887]])
tensor([[ 0.6811, -0.8585],
        [-0.9085,  0.2148],
        [-0.9884, -1.3167]])


`ones()` and `zeros()` are both self-explanatory.

`rand()` samples floats uniformly from the half-open interval [0, 1).

`randn()` samples from the standard normal distribution (mean 0, variance 1).
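
As a quick sanity check of the two samplers, the sketch below draws a larger batch and inspects its range and moments. The seed and sample size are arbitrary choices made here for repeatability, not values from the notebook.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only so the run is repeatable

u = torch.rand(10_000)   # uniform samples on [0, 1)
g = torch.randn(10_000)  # standard normal samples

# Every uniform sample is at least 0 and strictly below 1
print(u.min().item() >= 0.0, u.max().item() < 1.0)

# Sample mean and standard deviation are close to 0 and 1, not exact
print(g.mean().item(), g.std().item())
```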

Next up: `torch.empty()` allocates a tensor without initialising it, so it contains whatever values happened to be in memory. `torch.zeros_like(A)` returns a tensor of zeros with the same shape and dtype as `A`.

In [4]:
x = torch.empty(2,2)
print(x)
y = torch.zeros_like(x)
print(y)

tensor([[1.1210e-44, 0.0000e+00],
        [0.0000e+00, 0.0000e+00]])
tensor([[0., 0.],
        [0., 0.]])
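
The `*_like` constructors copy more than the shape. A minimal sketch, assuming a `float64` source tensor chosen here purely for illustration:

```python
import torch

a = torch.empty(2, 3, dtype=torch.float64)  # contents are uninitialised

z = torch.zeros_like(a)  # same shape and dtype as a, filled with 0
o = torch.ones_like(a)   # same shape and dtype as a, filled with 1

print(z.shape, z.dtype)
print(o)
```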



### Tensor Operations

We can easily add, subtract and matrix-multiply tensors. Note that `torch.mm` is a matrix product, not a dot product.

In [5]:
x = torch.ones(2,2)
y = torch.ones(2,2)

In [6]:
z = torch.mm(x,y)
print(z)
z = x + y  # We can use infix operators or functions
print(z)
z = torch.add(x,y)
print(z)

tensor([[2., 2.],
        [2., 2.]])
tensor([[2., 2.],
        [2., 2.]])
tensor([[2., 2.],
        [2., 2.]])
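
With all-ones inputs the matrix product and the sum happen to print the same values, which can hide the difference between operations. The sketch below uses two small hand-picked matrices (not from the notebook) to separate the matrix product from the element-wise product.

```python
import torch

a = torch.tensor([[1., 2.], [3., 4.]])
b = torch.tensor([[0., 1.], [1., 0.]])

matrix_product = torch.mm(a, b)  # rows of a combined with columns of b
same_product = a @ b             # infix form of the matrix product
elementwise = a * b              # multiplies matching entries only

print(matrix_product)  # [[2., 1.], [4., 3.]]
print(elementwise)     # [[0., 2.], [3., 0.]]
```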


### Inplace Operations

A trailing '`_`' (underscore) on an operation's name makes it operate in place rather than returning a new tensor.
E.g. `z = x.add_(y)` returns the result as `z` but also overwrites `x` with it.
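
A short sketch of the difference between the out-of-place and in-place forms, using the tensor methods `add` and `add_`:

```python
import torch

x = torch.ones(2, 2)
y = torch.ones(2, 2)

out_of_place = x.add(y)   # new tensor, x is left unchanged
x_after_add = x.clone()   # snapshot of x, still all ones

in_place = x.add_(y)      # x itself is overwritten with x + y

# The returned tensor shares storage with x
print(in_place.data_ptr() == x.data_ptr())
print(x)
```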

---

# Autograd

Autograd is the workhorse of PyTorch. It records the operations performed on tensors and uses that record to compute gradients automatically whenever `backward()` is called.

PyTorch stores the gradient information on the tensors themselves. Tracking is enabled by setting `requires_grad=True`. Note that only tensors with a floating-point (or complex) dtype can require gradients.



In [7]:
x = torch.randn(2,2,requires_grad=True)

# Alternatively, we can also specify requires_grad after the fact

x = np.array([1.,2.,3.])
x = torch.from_numpy(x)
x.requires_grad_(True) # Notice the inplace operator

tensor([1., 2., 3.], dtype=torch.float64, requires_grad=True)
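
The dtype restriction is easy to trip over when a tensor is built from Python integers. A minimal sketch of what happens, and one way around it by casting first:

```python
import torch

ints = torch.tensor([1, 2, 3])  # integer literals give dtype torch.int64

try:
    ints.requires_grad_(True)
    raised = False
except RuntimeError as err:
    # Integer tensors cannot require gradients
    raised = True
    print(err)

# Casting to a floating-point dtype first works
floats = ints.to(torch.float32).requires_grad_(True)
print(floats)
```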

Autograd tracks gradients by recording every operation performed on a tensor in a graph, called a Dynamic Computational Graph (DCG). For each operation the graph stores the corresponding backward function, because backpropagation traverses the graph from the output back to the inputs.

Here is a simple DCG for multiplication of two tensors:


![title](img/dcg_simple.png)


Let's examine the elements of each node:

* `data`: the tensor values that the variable holds.
* `requires_grad`: if `True`, the DCG records every operation performed on the tensor.
* `grad`: the gradient stored for this tensor. It stays `None` until `backward()` is called on *some other node* that depends on it, at which point the chain rule is applied backwards through the graph. The gradient is taken with respect to the node `backward()` was called on. For example, if you call `z.backward()` on a node `z` which depends on `x`, then `x.grad` is filled in with $\frac{\partial z}{\partial x}$.
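
These attributes can be inspected directly. The sketch below mirrors the two-tensor multiplication from the diagram, with scalar values 2 and 3 picked here for illustration.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = torch.tensor(3.0, requires_grad=True)
z = x * y

grad_before = x.grad  # None: backward() has not been called yet
print(x.data, x.requires_grad, grad_before)

z.backward()

# dz/dx = y = 3 and dz/dy = x = 2
print(x.grad, y.grad)
```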

With `requires_grad=True` set, PyTorch records the backward function of each operation, like so:

![title](img/dcg_grad.png)
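
The recorded backward functions are visible through each tensor's `grad_fn` attribute. A small sketch, assuming a two-step computation chosen for illustration; the exact class names printed may vary between PyTorch versions.

```python
import torch

x = torch.ones(2, requires_grad=True)
y = x + 5
z = y * y

# x was created by the user, so it is a leaf with no grad_fn
print(x.is_leaf, x.grad_fn)

# y and z were produced by operations, so each has a backward node
print(type(y.grad_fn).__name__)
print(type(z.grad_fn).__name__)

# next_functions links a node to the backward nodes of its inputs
print(z.grad_fn.next_functions)
```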

Let's see this in action. We won't quite build a NN yet, but we'll play with the differentiation engine itself.

We'll initialise a variable `x`, and then define `y` and `z` as functions of `x`.


In [8]:
x = torch.ones([3,2],requires_grad=True)

y = x + 5

z = y*y + 1

`backward()` runs backpropagation. Called without arguments, it only works on scalar nodes, for example a network output or loss that is a single number. It propagates gradients back through the graph, computing the gradient of that output node with respect to each node that requires gradients. In other words, it finds the strength and direction of the effect each node has on the specified output node.
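
The scalar restriction can be seen directly, along with one way around it: passing an explicit `gradient` tensor of the same shape as the output. The sketch below uses a tensor shaped like the notebook's `x`; weighting every output entry by 1 is equivalent to summing first.

```python
import torch

x = torch.ones(3, 2, requires_grad=True)
z = (x + 5) ** 2 + 1  # non-scalar output, shape (3, 2)

try:
    z.backward()  # no implicit gradient for a non-scalar output
    raised = False
except RuntimeError as err:
    raised = True
    print(err)

# Supplying the upstream gradient explicitly makes the call valid
z.backward(gradient=torch.ones_like(z))
print(x.grad)
```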

In [9]:
t = torch.sum(z)
print(t)
t.backward()

tensor(222., grad_fn=<SumBackward0>)


So we have an output of 222. We want to find the direction and magnitude of the effect that `x` has on it.
PyTorch has already computed this for us.
We can simply read `x.grad` to get it.

In [10]:
x.grad

tensor([[12., 12.],
        [12., 12.],
        [12., 12.]])

This is $\frac{\partial t}{\partial x}$ evaluated at $x = 1$ in every entry.
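
The value 12 can be checked by hand with the chain rule. As a worked sketch, writing $x_{ij}$ for a single entry of `x` (index notation introduced here for illustration):

$$ t = \sum_{i,j} z_{ij}, \qquad z_{ij} = y_{ij}^2 + 1, \qquad y_{ij} = x_{ij} + 5 $$

$$ \frac{\partial t}{\partial x_{ij}} = \frac{\partial t}{\partial z_{ij}} \cdot \frac{\partial z_{ij}}{\partial y_{ij}} \cdot \frac{\partial y_{ij}}{\partial x_{ij}} = 1 \cdot 2 y_{ij} \cdot 1 = 2(x_{ij} + 5) = 12 \quad \text{at } x_{ij} = 1 $$

The same substitution gives $t = 6 \cdot (6^2 + 1) = 222$, matching the printed output.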

### Autograd Toy Problem

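Let's use Autograd to fit a linear regression.

Each training step draws a single input vector `x` with `N` entries, and the target `y` is a scalar given by:
$$ y = \sum_{n=1}^{N} 3 x_n - 2 $$

The model is $\hat{y} = \sum_{n=1}^{N} w_n x_n + b$, so every entry of `w` should approach 3 and `b` should approach -2.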

In [93]:
learning_rate = 0.01
epochs=200
N = 10

w = torch.rand([N], requires_grad=True)
b = torch.ones([1],requires_grad=True)

print(w)

tensor([0.1941, 0.4434, 0.9535, 0.7798, 0.9168, 0.3934, 0.2934, 0.5543, 0.0208,
        0.7791], requires_grad=True)


In [94]:
for i in range(epochs):

    # Each epoch draws a fresh random input vector x with N entries
    x = torch.randn([N])
    y = torch.dot(3*torch.ones([N]), x) - 2

    y_hat = torch.dot(w, x) + b
    loss = torch.sum((y_hat - y)**2)
  
    loss.backward() # We want to find the impacts of w and b on loss
  
    with torch.no_grad():
        w -= learning_rate * w.grad
        b -= learning_rate * b.grad
    
    w.grad.zero_()
    b.grad.zero_()
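
The `zero_()` calls matter because `backward()` adds to `.grad` rather than overwriting it. A minimal sketch with a single scalar parameter, chosen here for illustration:

```python
import torch

w = torch.tensor(1.0, requires_grad=True)

(2 * w).backward()
first = w.grad.clone()   # gradient of 2*w is 2

(2 * w).backward()
second = w.grad.clone()  # 2 + 2 = 4: the new gradient was added on

w.grad.zero_()           # reset before the next step
print(first, second, w.grad)
```

Without the reset, each update in the training loop would use the sum of all gradients seen so far.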

In [95]:
print(torch.mean(w).item(),b.item())

2.970419406890869 -1.9342337846755981
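
For comparison, here is a sketch of a scalar-weight variant in which each of `N` data points satisfies $y_n = 3 x_n - 2$ and a single `w` and `b` are learned. The evenly spaced inputs, zero initialisation, mean-squared-error loss, learning rate and epoch count are all assumptions made for this example, not the notebook's settings.

```python
import torch

N = 10
x = torch.linspace(-1.0, 1.0, N)  # fixed, evenly spaced inputs
y = 3 * x - 2                     # one target per data point

w = torch.zeros(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
learning_rate = 0.1

for _ in range(500):
    y_hat = w * x + b                    # broadcasts over all N points
    loss = torch.mean((y_hat - y) ** 2)  # mean squared error

    loss.backward()

    with torch.no_grad():  # parameter updates must not be tracked
        w -= learning_rate * w.grad
        b -= learning_rate * b.grad

    w.grad.zero_()
    b.grad.zero_()

print(w.item(), b.item())
```

With these settings `w` and `b` should end up very close to 3 and -2.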


---
### References

1. https://towardsdatascience.com/pytorch-autograd-understanding-the-heart-of-pytorchs-magic-2686cd94ec95