
# Introduction to PyTorch

In this lab, we go over the basics of [PyTorch](https://pytorch.org/), an open-source machine learning framework.

In [None]:
import numpy as np

# the module is named `torch`
import torch

## Tensors

At its core, PyTorch is a library for processing tensors.  
A tensor is a number, vector, matrix, or any n-dimensional array. It plays the same role as NumPy's `np.ndarray`, but it can also live on a GPU and take part in automatic differentiation.
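Since tensors mirror `np.ndarray`, they expose similar metadata. The short comparison below is a sketch (the names `t` and `arr` are only for illustration). Note that `t.shape` and `t.size()` give equal `torch.Size` values, and that NumPy's default float type is 64-bit while PyTorch's is 32-bit.

```python
import numpy as np
import torch

# a 2x3 tensor of zeros and a NumPy array of the same shape
t = torch.zeros(2, 3)
arr = np.zeros((2, 3))

print(t.shape, t.ndim, t.dtype)        # shape, number of dimensions, element type
print(arr.shape, arr.ndim, arr.dtype)

# torch.Size is a tuple subclass, so it compares directly with a NumPy shape tuple
same_shape = tuple(t.shape) == arr.shape
print(same_shape, t.size() == t.shape)
```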

In [None]:
def show_tensor(t):
    print("dtype:", t.dtype)
    print("size:", t.size())
    print(t)

In [None]:
# Number
t0 = torch.tensor(4)
show_tensor(t0)

In [None]:
# Vector
t1 = torch.tensor([1.0, 2.0, 3.0, 4.0])
show_tensor(t1)

Notice that, by default, PyTorch stores Python integers as `torch.int64` (64 bits) and Python floats as `torch.float32` (32 bits).
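These defaults can be overridden. A minimal sketch: pass `dtype=` when creating a tensor, or convert an existing tensor with `.to()`. `torch.get_default_dtype()` reports the current floating-point default.

```python
import torch

print(torch.get_default_dtype())  # floating-point default, normally torch.float32

t_int = torch.tensor(4)                                 # inferred as torch.int64
t_f64 = torch.tensor([1.0, 2.0], dtype=torch.float64)   # explicit 64-bit floats
t_i32 = torch.tensor([1.5, 2.7]).to(torch.int32)        # float -> int truncates toward zero

print(t_int.dtype, t_f64.dtype, t_i32)
```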

In [None]:
# Matrix
t2 = torch.tensor(
    [
        [1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0],
        [7.0, 8.0, 9.0],
    ]
)
show_tensor(t2)

In [None]:
# 3d array
t3 = torch.tensor(
    [
        [[1, 2], [3, 4]],
        [[5, 6], [7, 8]],
    ]
)
show_tensor(t3)

Like NumPy, PyTorch expects its tensors to have a regular (rectangular) shape. The cell below raises an error because its last row has only one element.

In [None]:
t4 = torch.tensor(
    [
        [1, 2],
        [3, 4],
        [5],
    ]
)
show_tensor(t4)
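If your data genuinely has rows of different lengths, one common workaround is to pad them to a common length. A sketch using `torch.nn.utils.rnn.pad_sequence`, assuming 0 is an acceptable padding value for your data:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# the same ragged rows as above, each as its own 1-D tensor
rows = [torch.tensor([1, 2]), torch.tensor([3, 4]), torch.tensor([5])]

# pad every row to the length of the longest one; batch_first=True gives shape (rows, length)
padded = pad_sequence(rows, batch_first=True, padding_value=0)
print(padded)
```

The padded positions are indistinguishable from real zeros, so in practice you would usually keep the original lengths alongside the padded tensor.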

## Interoperability with numpy

You can convert between a PyTorch tensor and a NumPy `ndarray` in either direction.

In [None]:
t5 = torch.tensor([[1, 2], [3, 4]])
# call .numpy() to convert a tensor to np array
a5 = t5.numpy()

print("type:", type(a5))
print("dtype:", a5.dtype)
print(a5)

In [None]:
# use torch.from_numpy to go from np array to tensor
t6 = torch.from_numpy(a5)
show_tensor(t6)
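One detail worth knowing: for CPU tensors, `.numpy()` and `torch.from_numpy` do not copy the data. The tensor and the array share memory, so an in-place change to one is visible in the other, whereas `torch.tensor(...)` always makes a copy. A small illustration (variable names are illustrative):

```python
import numpy as np
import torch

arr = np.array([1, 2, 3])
t = torch.from_numpy(arr)   # no copy: t and arr share memory
arr[0] = 100                # modify the array in place...
print(t)                    # ...and the tensor sees the change

back = t.numpy()            # also no copy for a CPU tensor
t[1] = 200
print(back)                 # the array sees the tensor's change

t_copy = torch.tensor(arr)  # torch.tensor always copies
arr[2] = 300
print(t_copy)               # unaffected by the last change to arr
```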

Tensors support the same indexing and slicing notation as NumPy arrays. For example, `t2[:, 1]` selects the second column of `t2`.

In [None]:
print(t2)
print()
print(t2[:, 1])
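A few more NumPy-style indexing forms also work on tensors. They are sketched here on the same 3x3 matrix, re-created so the cell runs on its own:

```python
import torch

t2 = torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])

row = t2[1]              # second row
block = t2[:2, 1:]       # first two rows, last two columns
picked = t2[[0, 2], 0]   # integer-array indexing: first column of rows 0 and 2
masked = t2[t2 > 5]      # boolean mask: 1-D tensor of the elements greater than 5

print(row, block, picked, masked, sep="\n")
```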

## Torch gradients

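PyTorch tensors can track gradients. Creating a tensor with `requires_grad=True` tells PyTorch to record every operation performed on it, so that derivatives with respect to that tensor can be computed later.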

In [None]:
# create tensors; only w and b track gradients
x = torch.tensor(3.0)
w = torch.tensor(4.0, requires_grad=True)
b = torch.tensor(5.0, requires_grad=True)
print(x, w, b, sep="\n")

Let's take this simple function:
$y = wx + b$

Its partial derivatives are
\begin{align}\frac{\partial y}{\partial w} = x\end{align}
and
\begin{align}\frac{\partial y}{\partial b} = 1\end{align}
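With the values used in the cell above ($x = 3$, $w = 4$, $b = 5$), we can work out by hand what PyTorch should report:

\begin{align}y = 4 \cdot 3 + 5 = 17, \qquad \frac{\partial y}{\partial w} = x = 3, \qquad \frac{\partial y}{\partial b} = 1\end{align}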

In [None]:
y = w * x + b
print(y)

PyTorch records that `y` was obtained by multiplying `w` by `x` and then adding `b`.  
It uses this record (the computation graph) when computing the partial derivatives of `y` with respect to `w` and `b`.

Note that `x` was not created with `requires_grad=True`, so no gradient is computed for it: `x.grad` remains `None`.
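You can inspect what autograd recorded. In the sketch below, which re-creates `x`, `w` and `b` so it runs on its own, `requires_grad` shows which tensors are tracked. `is_leaf` distinguishes tensors you created directly from results of operations, and `grad_fn` is the graph node autograd will use to backpropagate through `y`.

```python
import torch

x = torch.tensor(3.0)
w = torch.tensor(4.0, requires_grad=True)
b = torch.tensor(5.0, requires_grad=True)
y = w * x + b

print(x.requires_grad, w.requires_grad, y.requires_grad)  # False True True
print(w.is_leaf, y.is_leaf)                               # True False
print(y.grad_fn)                 # the node for the final addition
print(y.grad_fn.next_functions)  # its inputs: the multiplication node and an AccumulateGrad node for b
```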

In [None]:
# compute derivatives
y.backward()

In [None]:
# display gradients
print("dy/dx:", x.grad)  # None, since x was not created with requires_grad=True
print("dy/dw:", w.grad)  # equals x, i.e. 3
print("dy/db:", b.grad)  # equals 1
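One behavior to keep in mind: `.grad` accumulates. Each call to `backward()` adds to the stored gradient rather than replacing it, which is why training loops reset gradients between steps. A minimal sketch with fresh tensors:

```python
import torch

x = torch.tensor(3.0)
w = torch.tensor(4.0, requires_grad=True)
b = torch.tensor(5.0, requires_grad=True)

for _ in range(2):
    y = w * x + b  # rebuild the graph: backward() frees its buffers after use
    y.backward()   # adds dy/dw = x and dy/db = 1 to w.grad and b.grad

accumulated = (w.grad.item(), b.grad.item())
print("after two backward passes:", accumulated)

# reset in place before the next computation (optimizers provide zero_grad() for this)
w.grad.zero_()
b.grad.zero_()
print("after zeroing:", w.grad.item(), b.grad.item())
```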

## Differentiation in Autograd
Let's take a look at how `autograd` collects gradients. We create two tensors `a` and `b` with `requires_grad=True`. This signals to `autograd` that every operation on them should be tracked.

In [None]:
a = torch.tensor([2.0, 3.0], requires_grad=True)
b = torch.tensor([6.0, 4.0], requires_grad=True)

Now, let's take the following function, applied elementwise to the vectors `a` and `b`:
$Q = 3a^3 - b^2$

In [None]:
Q = (3 * a ** 3) - (b ** 2)

Suppose `a` and `b` are the parameters of a neural network (NN) and `Q` is the error. In NN training, we want the gradients of the error with respect to the parameters, i.e.

\begin{align}\frac{\partial Q}{\partial a} = 9a^2\end{align}
\begin{align}\frac{\partial Q}{\partial b} = -2b\end{align}
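For the values chosen above ($a = [2, 3]$, $b = [6, 4]$), applying these formulas elementwise gives the values we expect autograd to produce:

\begin{align}\frac{\partial Q}{\partial a} = 9 \cdot [2^2, 3^2] = [36, 81], \qquad \frac{\partial Q}{\partial b} = -2 \cdot [6, 4] = [-12, -8]\end{align}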


When we call `.backward()` on `Q`, autograd calculates these gradients and stores them in the respective tensors' `.grad` attribute.

We need to pass a `gradient` argument explicitly in `Q.backward()` because `Q` is a vector rather than a scalar. `gradient` is a tensor of the same shape as `Q`, and it represents the gradient of `Q` with respect to itself, i.e.

\begin{align}\frac{dQ}{dQ} = 1\end{align}

Equivalently, we can reduce `Q` to a scalar and call `backward()` without a `gradient` argument, as in `Q.sum().backward()`.

In [None]:
external_grad = torch.tensor([1.0, 1.0])
Q.backward(gradient=external_grad)

The gradients are now stored in `a.grad` and `b.grad`.

In [None]:
# check if collected gradients are correct
print(9 * a ** 2 == a.grad)
print(-2 * b == b.grad)

In [None]:
print(a.grad, b.grad, sep="\n")
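The claim that `Q.sum().backward()` is equivalent to passing a vector of ones can be checked directly. The sketch below also passes a different `gradient` vector. Autograd then computes a weighted combination (a vector-Jacobian product), and because each element of `Q` depends only on the matching elements of `a` and `b`, the weights simply scale each gradient entry. Fresh tensors (`a1`, `b1`, `a2`, `b2`, illustrative names) are used so the `.grad` values above are not modified.

```python
import torch

def make_inputs():
    # same values as above, with gradient tracking enabled
    a = torch.tensor([2.0, 3.0], requires_grad=True)
    b = torch.tensor([6.0, 4.0], requires_grad=True)
    return a, b

# 1) reduce Q to a scalar: no gradient argument needed
a1, b1 = make_inputs()
(3 * a1 ** 3 - b1 ** 2).sum().backward()

# 2) weight the first element of Q by 1 and the second by 0
a2, b2 = make_inputs()
weights = torch.tensor([1.0, 0.0])
(3 * a2 ** 3 - b2 ** 2).backward(gradient=weights)

print(a1.grad, b1.grad)  # same as with gradient=torch.tensor([1.0, 1.0])
print(a2.grad, b2.grad)  # second entries are zero
```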

## Visualizing the computation graph

We use the `torchviz` package from [pytorchviz](https://github.com/szagoruyko/pytorchviz) for this. It draws the graph with Graphviz.

In [None]:
!pip install -q torchviz
from torchviz import make_dot

In [None]:
make_dot(Q, params={"Q": Q, "a": a, "b": b})
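If Graphviz is not available, a rough text view of the same graph can be printed by walking `grad_fn.next_functions` recursively. `print_graph` below is a hypothetical helper of our own, not part of PyTorch or torchviz, sketched on a fresh copy of `Q`:

```python
import torch

def print_graph(fn, depth=0, seen=None):
    """Print the autograd graph rooted at node `fn`, one node per line; return the node names."""
    seen = set() if seen is None else seen
    if fn is None:  # inputs that need no gradient (e.g. the constant 3)
        return []
    name = type(fn).__name__
    print("  " * depth + name)
    if fn in seen:  # do not expand a shared node twice
        return [name]
    seen.add(fn)
    names = [name]
    for child, _ in fn.next_functions:
        names += print_graph(child, depth + 1, seen)
    return names

a = torch.tensor([2.0, 3.0], requires_grad=True)
b = torch.tensor([6.0, 4.0], requires_grad=True)
Q = (3 * a ** 3) - (b ** 2)
node_names = print_graph(Q.grad_fn)
```

Each `AccumulateGrad` line corresponds to a leaf tensor (`a` or `b`) whose `.grad` would be filled in by `backward()`.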

# Resources
- https://towardsdatascience.com/pytorch-autograd-understanding-the-heart-of-pytorchs-magic-2686cd94ec95
- https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html