<a href="https://colab.research.google.com/github/dkaratzas/XNAP2021-22/blob/main/Week%201%20-%20Revision/W01_03_Intro_Tensors.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

[![Open In SageMaker Studio Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/dkaratzas/XNAP2021-22/blob/main/Week%201%20-%20Revision/W01_03_Intro_Tensors.ipynb)

# What is PyTorch?

<a href="https://pytorch.org/">PyTorch</a> is a Python-based scientific computing package targeted at two types of audience:

-  At the low level, it is a tensor library capable of exploiting the computational power of GPUs
-  At the high level, it is a deep learning research platform that provides maximum flexibility and speed

## Import the library

In [None]:
import torch

## Getting help in Jupyter

The fastest way to get help in Jupyter is to just ask! Type the name of any Python object followed by a question mark `?` and its documentation will be loaded in your notebook. Try it with `torch`.

In [None]:
torch?

The following command lists all objects in `torch` whose names end with "Tensor".

In [None]:
# In Colab, you can press <esc> to get out of help
torch.*Tensor?

If you use Colab, you also have a handy autocomplete feature. For example, start writing a function name such as `torch.sqrt`; if you pause after the first few characters, a context menu with possible completions will appear. Select the term you meant and press Tab or Enter to autocomplete. Note that this will not work in Jupyter Lab / Notebook out of the box; you would need to install an extension to enable it.

In [None]:
torch.sq  # complete by typing a little bit more, wait and then use <Tab> or <Enter> to autocomplete

In Jupyter Lab (but not in Colab) you can access the documentation by clicking on the Python object and pressing `<Shift>` + `<Tab>`. Try it in the line below (if you are using Jupyter Lab).

In [None]:
torch.nn.Module()  # <Shift>+<Tab>

You should see the same result as with the line below.

In [None]:
# Annotate your functions / classes!
torch.nn.Module?

Where does this documentation come from? Part of it comes from the code itself (for example, the signature), and part of it from the annotations (docstrings, a special kind of comment) placed in the function / class definitions. To have a look at the actual code of a function, use a double `??`. See the example below, and get used to annotating your own functions / classes as well!

In [None]:
torch.nn.Module??
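
As a small illustration of where that help text comes from, here is a minimal sketch of an annotated function. The function `scale` and its parameters are hypothetical and only serve the example; the point is that the string right under the `def` line is what `scale?` or `help(scale)` would display.

```python
def scale(values, factor=2.0):
    """Multiply every element of `values` by `factor`.

    Args:
        values: a list of numbers.
        factor: the multiplier applied to each element.

    Returns:
        A new list with the scaled values.
    """
    return [v * factor for v in values]


# The docstring is stored on the function object itself.
# In Jupyter, `scale?` shows it; in plain Python, help(scale) does.
print(scale.__doc__)
```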

## Torch!

At the core of PyTorch there is the `Tensor` class. It is very much like numpy's arrays, but supports autograd.

In [None]:
# Generate a tensor of size 2x3x4 (its values are uninitialised)
t = torch.Tensor(2, 3, 4)
type(t)

In [None]:
# Get the size of the tensor
t.size()

In [None]:
# prints dimensional space and sub-dimensions
print(f'point in a {t.numel()} dimensional space')
print(f'organised in {t.dim()} sub-dimensions')

In [None]:
t

In [None]:
# Mind the underscore!
# Any operation that mutates a tensor in-place is post-fixed with an _.
# For example: x.copy_(y), x.t_(), x.random_(n) will change x.
t.random_(10)

In [None]:
r = t.view(3, 8)
r

In [None]:
# zero_() fills r with 0's in place; r is a view of t, so t is zeroed as well
r.zero_()

In [None]:
t

In [None]:
# What are strides, and how are they related to shapes?
print(t.stride(), r.stride())
print(t.shape, r.shape)
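
One way to read the printed strides: for a contiguous tensor, the stride of a dimension is the number of elements you skip in memory when its index grows by one, which equals the product of the sizes of all later dimensions. The sketch below checks this; the helper `contiguous_strides` is a hypothetical name written for this example, not a PyTorch function.

```python
import torch

t = torch.zeros(2, 3, 4)
r = t.view(3, 8)


def contiguous_strides(shape):
    """Strides of a contiguous (row-major) tensor with the given shape."""
    strides, step = [], 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))


print(t.stride(), contiguous_strides(t.shape))  # (12, 4, 1) for both
print(r.stride(), contiguous_strides(r.shape))  # (8, 1) for both

# Element [i, j, k] sits at flat offset i*12 + j*4 + k*1
flat = torch.arange(24.).view(2, 3, 4)
i, j, k = 1, 2, 3
offset = sum(idx * s for idx, s in zip((i, j, k), flat.stride()))
print(flat[i, j, k].item(), flat.flatten()[offset].item())  # 23.0 23.0
```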

In [None]:
# Let's try that again without doing the operations in place
t.random_(10)

In [None]:
# Not in place
r = t.view(3, 8)
r = torch.zeros_like(r)
r

In [None]:
t

In [None]:
# What are strides?
print(t.stride(), r.stride())

In [None]:
# This *is* important: clone() makes a copy that has its own memory
s = r.clone()

In [None]:
# In-place fill of 1's
s.fill_(1)
s

In [None]:
# Because we cloned r, even though we did an in-place operation, this doesn't affect r
r
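
The difference between a view and a clone can be checked directly. This is a minimal sketch using `data_ptr()`, which returns the memory address of a tensor's first element; the variable names `view` and `copy` are chosen only for this example.

```python
import torch

t = torch.zeros(2, 3, 4)
view = t.view(3, 8)   # same memory as t, different shape
copy = t.clone()      # new memory holding the same values

print(view.data_ptr() == t.data_ptr())  # True
print(copy.data_ptr() == t.data_ptr())  # False

# An in-place operation on the view also changes t, but not the clone
view.fill_(1)
print(t.sum().item(), copy.sum().item())  # 24.0 0.0
```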

## Vectors (1D Tensors)

In [None]:
# Creates a 1D float tensor holding the values 1 to 4
v = torch.Tensor([1, 2, 3, 4])
v

In [None]:
# Print number of dimensions (1D) and size of tensor
print(f'dim: {v.dim()}, size: {v.size()[0]}')

In [None]:
w = torch.Tensor([1, 0, 2, 0])
w

In [None]:
# Element-wise multiplication
v * w

In [None]:
# Scalar product: 1*1 + 2*0 + 3*2 + 4*0
v @ w
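
For 1D tensors the `@` operator computes the scalar (dot) product. As a quick sanity check, this sketch compares it with `torch.dot` and with the element-wise product summed by hand.

```python
import torch

v = torch.tensor([1., 2., 3., 4.])
w = torch.tensor([1., 0., 2., 0.])

by_operator = v @ w
by_function = torch.dot(v, w)
by_hand = (v * w).sum()

# 1*1 + 2*0 + 3*2 + 4*0 = 7
print(by_operator, by_function, by_hand)  # each is tensor(7.)
```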

In [None]:
# Fill a tensor of 5 elements in place with random integers from 0 to 9
x = torch.Tensor(5).random_(10)
x

In [None]:
print(f'first: {x[0]}, last: {x[-1]}')

In [None]:
# Extract sub-Tensor [from:to)
x[1:2 + 1]

In [None]:
# Create a tensor with integers ranging from 1 to 5, excluding 5
v = torch.arange(1, 4 + 1)
v

In [None]:
# Square all elements (pow returns a new tensor; v itself is unchanged)
print(v.pow(2), v)

## Matrices (2D Tensors)

In [None]:
# Create a 2x4 tensor
m = torch.Tensor([[2, 5, 3, 7],
                  [4, 2, 1, 9]])
m

In [None]:
m.dim()

In [None]:
print(m.size(0), m.size(1), m.size(), sep=' -- ')

In [None]:
# Indexing row 0, column 2 (0-indexed)
m[0][2]

In [None]:
# Indexing row 0, column 2 (0-indexed)
m[0, 2]

In [None]:
# Indexing column 1, all rows (returns size 2)
m[:, 1]

In [None]:
# Indexing column 1, all rows (returns size 2x1)
m[:, [1]]

In [None]:
# Indexes row 0, all columns (returns 1x4)
m[[0], :]
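
The indexing cells above differ only in whether the indexed dimension is kept. A short sketch that prints the resulting shapes side by side (the length-1 slice `1:2` is an extra variant added here for comparison):

```python
import torch

m = torch.tensor([[2., 5., 3., 7.],
                  [4., 2., 1., 9.]])

print(m[:, 1].shape)    # torch.Size([2])     an integer index drops the dimension
print(m[:, [1]].shape)  # torch.Size([2, 1])  a list index keeps it
print(m[:, 1:2].shape)  # torch.Size([2, 1])  so does a length-1 slice
print(m[[0], :].shape)  # torch.Size([1, 4])
```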

In [None]:
# Create tensor of numbers from 1 to 4 (excluding 5)
v = torch.arange(1., 5)
v

In [None]:
m

In [None]:
# Matrix-vector product: one scalar product per row of m
m @ v

In [None]:
# Calculated by 1*2 + 2*5 + 3*3 + 4*7
m[0, :] @ v

In [None]:
# Calculated by 1*4 + 2*2 + 3*1 + 4*9
m[[1], :] @ v

In [None]:
# Add a random tensor of size 2x4 to m
m + torch.rand(2, 4)

In [None]:
# Subtract a random tensor of size 2x4 from m
m - torch.rand(2, 4)

In [None]:
# Multiply m element-wise by a random tensor of size 2x4
m * torch.rand(2, 4)

In [None]:
# Divide m by a random tensor of size 2x4
m / torch.rand(2, 4)

In [None]:
m.size()

In [None]:
# Transpose tensor m, turning its 2x4 shape into 4x2
m.t()

In [None]:
# Same as
m.transpose(0, 1)
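
Transposing ties back to strides: as far as the sketch below shows, `t()` does not move any data, it returns a view whose strides are swapped. The names `mt` and `mc` are used only in this example.

```python
import torch

m = torch.tensor([[2., 5., 3., 7.],
                  [4., 2., 1., 9.]])
mt = m.t()

print(m.stride(), mt.stride())        # (4, 1) (1, 4)
print(mt.data_ptr() == m.data_ptr())  # True: same memory
print(mt.is_contiguous())             # False

# contiguous() copies the data into row-major order for the 4x2 shape
mc = mt.contiguous()
print(mc.stride(), mc.data_ptr() == m.data_ptr())  # (2, 1) False
```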

## Broadcasting

Two tensors are “broadcastable” if the following rules hold:

*   Each tensor has at least one dimension.
*   When iterating over the dimension sizes, starting at the trailing dimension, the dimension sizes must either be equal, one of them is 1, or one of them does not exist.


In [None]:
x=torch.empty(5,7,3)
y=torch.empty(5,7,3)
# x and y are broadcastable since all dimensions are equal

x=torch.empty((0,))
y=torch.empty(2,2)
# x and y are not broadcastable: x is empty (its only dimension has size 0),
# and in the trailing dimension 0 is neither 1 nor equal to 2

x=torch.empty(5,3,4,1)
y=torch.empty(  3,1,1)
# x and y are broadcastable.
# 1st trailing dimension: both have size 1
# 2nd trailing dimension: y has size 1
# 3rd trailing dimension: x size == y size
# 4th trailing dimension: y dimension doesn't exist

# but:
x=torch.empty(5,2,4,1)
y=torch.empty(  3,1,1)
# x and y are not broadcastable, because in the 3rd trailing dimension 2 != 3

In [None]:
# How is the output dimension calculated?
x=torch.empty(5,1,4,1)
y=torch.empty(3,1,1)
print((x+y).size())

x=torch.empty(1)
y=torch.empty(3,1,7)
print((x+y).size())
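
To make the output-shape calculation explicit, here is a minimal sketch of the broadcasting rule written in plain Python. The helper `broadcast_shape` is a hypothetical name written for this example; it is compared against `torch.broadcast_shapes`, which is available in recent PyTorch versions.

```python
import torch


def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or raise ValueError."""
    result = []
    # Walk both shapes from the trailing dimension;
    # a dimension that does not exist is treated as size 1
    for k in range(1, max(len(shape_a), len(shape_b)) + 1):
        a = shape_a[-k] if k <= len(shape_a) else 1
        b = shape_b[-k] if k <= len(shape_b) else 1
        if a != b and a != 1 and b != 1:
            raise ValueError(f"sizes {a} and {b} are not broadcastable")
        result.append(b if a == 1 else a)
    return tuple(reversed(result))


print(broadcast_shape((5, 1, 4, 1), (3, 1, 1)))         # (5, 3, 4, 1)
print(broadcast_shape((1,), (3, 1, 7)))                 # (3, 1, 7)
print(torch.broadcast_shapes((5, 1, 4, 1), (3, 1, 1)))  # torch.Size([5, 3, 4, 1])
```

Each output dimension takes the size that is not 1, which is why a dimension of size 1 is the one that gets "stretched".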

## Constructors

In [None]:
# Create tensor from 3 to 8
torch.arange(3., 8 + 1)

In [None]:
# Create tensor from 5.7 to -2.1 with step -3
torch.arange(5.7, -2.1, -3)

In [None]:
# Create 20 equally spaced values from 3 to 8 (both included), viewed as a 1x20 tensor
torch.linspace(3, 8, 20).view(1, -1)

In [None]:
# Create a tensor filled with 0's
torch.zeros(3, 5)

In [None]:
# Create a tensor filled with 1's
torch.ones(3, 2, 5)

In [None]:
# Create a tensor with the diagonal filled with 1
torch.eye(3)

In [None]:
from matplotlib import pyplot as plt
plt.rcParams["figure.figsize"] = (20,10)

In [None]:
# Numpy bridge!
plt.hist(torch.randn(1000).numpy(), 100);

In [None]:
plt.hist(torch.randn(10**6).numpy(), 100);

## Casting

In [None]:
# List the available tensor types
torch.*Tensor?

In [None]:
m = torch.Tensor([[2, 5, 3, 7],
                  [4, 2, 1, 9]])
m

In [None]:
# Cast to a 64-bit float tensor
m_double = m.double()
m_double

In [None]:
# This creates a tensor of type uint8 (unsigned 8-bit integer)
m_byte = m.byte()
m_byte
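
The resulting type of each cast can be read from the `dtype` attribute. A short sketch, also showing the more general `to()` method for a type that has no dedicated shortcut used above:

```python
import torch

m = torch.tensor([[2., 5., 3., 7.],
                  [4., 2., 1., 9.]])

print(m.dtype)                 # torch.float32
print(m.double().dtype)        # torch.float64
print(m.byte().dtype)          # torch.uint8
print(m.to(torch.int8).dtype)  # torch.int8

# Casting returns a new tensor; m itself keeps its original type
print(m.dtype)                 # torch.float32
```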

In [None]:
# Converts tensor to numpy array
m_np = m.numpy()
m_np

In [None]:
# Set the element at row 0, column 0 to -1 in place
m_np[0, 0] = -1
m_np

In [None]:
m

In [None]:
# Create a tensor of integers ranging from 0 to 4
import numpy as np
n_np = np.arange(5)
n = torch.from_numpy(n_np)
print(n_np, n)

In [None]:
# In-place multiplication of all elements by 2 for tensor n
n.mul_(2)
n_np
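
The cells above rely on the fact that the NumPy bridge shares memory in both directions. The sketch below contrasts `torch.from_numpy`, which shares the array's memory, with `torch.tensor`, which copies it; the names `shared` and `copied` are used only in this example.

```python
import numpy as np
import torch

n_np = np.arange(5)
shared = torch.from_numpy(n_np)  # shares memory with n_np
copied = torch.tensor(n_np)      # copies the data

shared.mul_(2)                   # in-place change through the shared tensor
print(n_np)    # [0 2 4 6 8]
print(copied)  # tensor([0, 1, 2, 3, 4])
```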

## Using the GPU

In [None]:
# If this cell fails you need to change the runtime of your colab notebook to GPU
# Go to Runtime -> Change Runtime Type and select GPU
assert torch.cuda.is_available(), "GPU is not enabled"

# use the first gpu available if possible
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

In [None]:
# Tensors can be moved between gpu and cpu memory

tensor = torch.randn(5, 5) # create a 5x5 matrix filled with random numbers
print(f"tensor's device: {tensor.device}") # by default tensors are stored in cpu memory (RAM)

# Move your tensor to GPU device 0 if there is one (first GPU in the system)
if torch.cuda.is_available():
    tensor = tensor.to(device) # tensor.cuda() is an alternative although not recommended
print(f"tensor's device: {tensor.device}")

In [None]:
# A common mistake: operating on tensors that live on different devices
a = torch.randn(5, 2, device=device)
b = torch.randn(1, 2)

# This throws an exception, since you can't operate on tensors stored in
# different devices, and the error message is pretty clear about that
c = a * b
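
A minimal sketch of one way to avoid that exception: move both operands to the same device before operating on them. It falls back to the CPU when no GPU is present, so it should run in either setting.

```python
import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

a = torch.randn(5, 2, device=device)
b = torch.randn(1, 2)   # created in CPU memory by default

# Move b to wherever a lives before multiplying
c = a * b.to(a.device)
print(c.device, c.shape)
```

Creating `b` directly with `device=a.device` would work just as well and avoids the extra copy.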

# Gradient Computation



In [None]:
# Tensors also track the operations applied on them in order to differentiate them

# setting requires_grad to true tells the autograd engine that we want to compute
# gradients for this tensor
a = torch.tensor([2., 3.], requires_grad=True)

L = 3*a**3
L.sum().backward()
print(f"Gradient of L with respect to a: {a.grad}")

Let's check whether the computed gradients are correct:

$\frac{\partial{L}}{\partial{a}} = [9 * a_1^2, 9 * a_2^2]$

$\frac{\partial{L}}{\partial{a}} = [9 * 2^2, 9 * 3^2]$

$\frac{\partial{L}}{\partial{a}} = [36, 81]$

As we can see, the gradient vector matches the one computed by the autograd engine (no surprise there).
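
The same check can be done in code. This sketch recomputes the gradient with autograd and compares it against the analytic expression $9 a^2$; the name `analytic` is used only in this example.

```python
import torch

a = torch.tensor([2., 3.], requires_grad=True)

L = 3 * a**3
L.sum().backward()

# Analytic gradient, computed on a detached copy so autograd does not track it
analytic = 9 * a.detach()**2

print(a.grad, analytic)                  # both tensor([36., 81.])
print(torch.allclose(a.grad, analytic))  # True
```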

In [None]:
# Notice that the output tensor of an operation will require gradients even 
# if only a single input tensor has requires_grad=True.

x = torch.rand(5, 5)
y = torch.rand(5, 5)
z = torch.rand((5, 5), requires_grad=True)

a = x + y
print(f"Does a require gradients? : {a.requires_grad}")
b = x + z
print(f"Does b require gradients?: {b.requires_grad}")

## Much more

There's definitely much more, but these were the basics of `Tensor`s.

The full *Torch* API can be found [here](https://pytorch.org/docs/stable/index.html).
There you'll find 100+ `Tensor` operations described, including transposing, indexing, slicing, mathematical operations, linear algebra, random numbers, etc.

# Homework

<font color="blue">**Exercise 1:** The code below simulates a tiny neural network; however, it throws an exception. As you build neural networks in PyTorch you will see this exception often. Look at the error message, explain what's happening and make the necessary changes to the code to get an output from this tiny network.</font>

In [None]:
### Generate some data
torch.manual_seed(7) # Set the random seed so things are predictable

# Features are 5 random normal variables
features = torch.randn((1, 5))
# True weights for our data, random normal variables again
weights = torch.randn_like(features)
# and a true bias term
bias = torch.randn((1, 1))
fts = torch.mm(features, weights)
print(fts + bias)
print(fts.shape, bias.shape)

<font color="blue">**Exercise 2:** Once you manage to successfully run the code above, notice how the shapes of the tensors ```fts``` and ```bias``` are drastically different, yet they can be added together. Which internal PyTorch mechanism makes this addition happen?</font>

# More Homework

<font color="blue">**Exercise 3:** Answer the following questions about the cell below</font>

1. Does the value of ```t``` change? Why?
2. Does the shape of ```t``` change? Why?
3. Explain, in your own words, what the stride of a tensor is. Why is it convenient to have strides?
4. Pick a mathematical operation like cosine or square root (not those though 🙂). Can you find the corresponding function in the [torch library](https://pytorch.org/docs/stable/torch.html#pointwise-ops)?
5. Apply the function element-wise to ```a```.
6. Is there a version of the function that operates in place? Does it return an error? Why? How can it be fixed?
7. Run the same function on the GPU. Do you notice any difference in runtime? If not, why do you think that is?

In [None]:
t = torch.tensor(list(range(9)))

a = t.view(3, 3)
a.mul_(2)