<a href="https://colab.research.google.com/github/DanieleAngioni97/Introductory-Seminar-PyTorch/blob/main/notebooks/01_tensor_basics.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Tensor Basics

Here we will see what a tensor is and how we can manipulate this data structure.

First, let's import some libraries that we will need later.

In [1]:
import matplotlib.pyplot as plt
import torch
import numpy as np

Do you remember the Python list? It is a basic data structure in which we can collect data in an organized manner.
For example, we can create a list of integers and access its values through the corresponding indices (remember that we start counting from 0).

In [None]:
l = [1, 0, 3]
print(l[0])

1


We can also create a list of lists to simulate a matrix-like data structure.
For example, we can create a 2x3 matrix (2 rows and 3 columns) as a list of 2 lists with 3 elements each.
Here we obtain each element by first selecting the row and then the column.

In [2]:
nested_list = [[0, 1, 2],
               [1, 2, 3]]
print(nested_list[1][2])

3


Following the same idea, we can create a nested list and pass it as an argument to `torch.tensor()` to create a PyTorch tensor.
Now we can index it in a more matrix-like fashion, with both indices inside a single pair of brackets.







In [None]:
import torch
t = torch.tensor([[0, 1, 2],
                  [1, 2, 3]])
print(t[0, 1]) # indexing tensors (we will see more indexing tricks later)

tensor(1)


## Fancy Indexing

With tensors, we can use fancy indexing (like [NumPy indexing](https://numpy.org/doc/stable/user/basics.indexing.html)).

In [None]:
x = torch.tensor([0, 1, 2, 3, 4, 5, 6]) # 1-d tensor
element = x[0] # element at index 0
first_elements = x[:3] # from the start up to index 3 (excluded): indices 0, 1, 2
last_elements = x[3:] # from index 3 to the end
some_elements = x[3:5] # from index 3 up to index 5 (excluded): indices 3, 4

When tensors are too large it can be difficult to keep track of the last indices, but there is a trick ;)

In [None]:
x = torch.arange(100) # 1-d tensor

last_element = x[99]

# A clever alternative
last_element = x[-1]

third_to_last_elements = x[-3:] # the last three elements

# Some combinations... what should it print?
some_elements = x[90: -5]
some_elements = x[-10: -5]
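
To check your answer to the question above, here is a small sketch that prints the slices. The variable names `a` and `b` are only illustrative and are not used elsewhere in this notebook.

```python
import torch

x = torch.arange(100)  # values 0, 1, ..., 99

# Negative indices count from the end: -1 is the last element.
print(x[-1])    # tensor(99)
print(x[-3:])   # tensor([97, 98, 99])

# Start and stop can mix positive and negative indices; the stop is excluded.
a = x[90:-5]    # -5 corresponds to index 95, so this selects indices 90..94
b = x[-10:-5]   # -10 corresponds to index 90, so this is the same slice
print(a)                  # tensor([90, 91, 92, 93, 94])
print(torch.equal(a, b))  # True
```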

Indexing works similarly with 2-d tensors (rows and columns):

In [None]:
x = torch.tensor([[0, 1, 2], [1, 2, 3]]) # 2-d tensor
element = x[0, 0] # row 0, column 0
row = x[0, :] # first row (x[0] also works in this case)
column = x[:, 0] # first column
some_rows = x[1:, :] # from row 1 to the end, all columns
some_elements = x[1:2, :1] # row 1 only (stop 2 excluded), column 0 only (stop 1 excluded)
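
A detail that is easy to miss: an integer index removes a dimension, while a slice keeps it. The sketch below prints the shapes of the results to make this visible.

```python
import torch

x = torch.tensor([[0, 1, 2], [1, 2, 3]])  # shape (2, 3)

# An integer index drops the indexed dimension...
print(x[0, :].shape)   # torch.Size([3])
print(x[:, 0].shape)   # torch.Size([2])

# ...while a slice keeps it, even when it selects a single row or column.
print(x[1:, :].shape)  # torch.Size([1, 3])
print(x[1:2, :1])      # tensor([[1]])
```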

## Tensor element types

In [None]:
double_precision = torch.tensor([0, 1], dtype=torch.double)
print(double_precision.dtype)
short_tensor = double_precision.short()
print(short_tensor.dtype)
bool_tensor = double_precision.bool()
print(bool_tensor.dtype)

torch.float64
torch.int16
torch.bool


In general, you can call the `.type()` method and pass the desired torch data type (the complete list is in the [documentation](https://pytorch.org/docs/stable/tensor_attributes.html)).

In [None]:
x = torch.tensor([0, 1], dtype=torch.double)
x.type(torch.uint8)

tensor([0, 1], dtype=torch.uint8)
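
Conversions can also be written with `.to(dtype)`. The sketch below uses illustrative values (not from the cells above) to show that a conversion returns a new tensor and that casting floats to integers truncates toward zero.

```python
import torch

x = torch.tensor([0.5, 1.7, -2.9], dtype=torch.double)

as_float = x.to(torch.float32)  # new tensor, x itself is unchanged
as_int = x.to(torch.int64)      # the fractional part is dropped (truncation toward zero)
as_bool = x.bool()              # every nonzero value becomes True

print(x.dtype)         # torch.float64
print(as_float.dtype)  # torch.float32
print(as_int)          # tensor([ 0,  1, -2])
print(as_bool)         # tensor([True, True, True])
```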

## Boolean indexing
Similarly to NumPy, we can use boolean tensors to select certain elements of another tensor.

In [None]:
x = torch.tensor([[-4, -1, 2], [1, -2, 3]]) # 2-d tensor

boolean_mask = (x > 0)  # this could be any boolean expression
print(boolean_mask)
print(x[boolean_mask])

tensor([[False, False,  True],
        [ True, False,  True]])
tensor([2, 1, 3])
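
Boolean masks can also be combined and used for assignment. The following is a minimal sketch on the same tensor; the names `in_range` and `y` are illustrative.

```python
import torch

x = torch.tensor([[-4, -1, 2], [1, -2, 3]])

# Masks can be combined with & (and), | (or) and ~ (not).
in_range = (x > -3) & (x < 3)
print(x[in_range])  # tensor([-1,  2,  1, -2])

# Summing a boolean mask counts the True entries.
print((x > 0).sum())  # tensor(3)

# A mask can also be used to assign values; we work on a copy to keep x intact.
y = x.clone()
y[y < 0] = 0
print(y)
# tensor([[0, 0, 2],
#         [1, 0, 3]])
```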


## Basic tensor operations

In [None]:
a = torch.ones(3, 2) # 3x2 tensor of only ones
b = torch.zeros(3, 1) # 3x1 tensor of only zeros
c = torch.zeros_like(a) # same shape and type as a
a_t = a.t() # 2x3 tensor (transpose of a)
print(a.shape) # prints the shape (i.e., all the sizes of the dimensions)

torch.Size([3, 2])


In [None]:
absolute_values = torch.abs(a) # pointwise operations
mean_value = torch.mean(a) # reduction operations
s = a + c # element-wise sum
p = a * c # element-wise product
z = torch.mm(a, c.t()) # matrix multiplication (careful with shapes!)
broadcasting = a + torch.tensor([1, 2]) # torch tries to match shapes
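
To see what "matching shapes" means in practice, here is a small sketch that prints the shapes involved in broadcasting and in matrix multiplication. The tensors `row` and `col` are illustrative.

```python
import torch

a = torch.ones(3, 2)

# Shape (2,) is stretched along the rows to match (3, 2).
row = torch.tensor([1, 2])
print(a + row)
# tensor([[2., 3.],
#         [2., 3.],
#         [2., 3.]])

# Shape (3, 1) is stretched along the columns to match (3, 2).
col = torch.tensor([[10.], [20.], [30.]])
print((a + col).shape)  # torch.Size([3, 2])

# Matrix multiplication needs matching inner sizes: (3, 2) x (2, 3) -> (3, 3).
z = torch.mm(a, a.t())
print(z.shape)                    # torch.Size([3, 3])
print(torch.equal(z, a @ a.t()))  # True: @ is the matrix-multiplication operator
```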

## [Tensor storage](https://pytorch.org/docs/stable/generated/torch.Tensor.data_ptr.html#torch-tensor-data-ptr)


In [None]:
a = torch.tensor([1, 2, 3, 4])
b = a[1] # different Tensor, same storage (points to the same location)
c = a.reshape([2, 2]) # same storage, different stride
print(a.storage())
print(c.storage())
print(a.data_ptr() == c.data_ptr()) # same storage
print(c.stride()) # how many storage items to skip for incrementing each dimension

Remember: the underlying memory is allocated only once, which makes view
operations very lightweight even for large storages.
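
A consequence of sharing storage is that writing through a view also changes the original tensor. The sketch below illustrates this, and uses `clone()` to obtain an independent copy; the name `d` is illustrative.

```python
import torch

a = torch.tensor([1, 2, 3, 4])
c = a.reshape(2, 2)  # for a contiguous tensor, reshape returns a view

# Writing through the view modifies the shared storage.
c[0, 0] = 100
print(a)                       # tensor([100,   2,   3,   4])
print(a.stride(), c.stride())  # (1,) (2, 1)

# clone() allocates new memory, so changes to d do not affect a.
d = a.clone()
d[1] = -1
print(a[1])                          # tensor(2)
print(a.data_ptr() == d.data_ptr())  # False
```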

## Modifying stored values: in-place operations

In-place operations directly modify the stored values. They can be recognized by
the trailing underscore `_` in their name. The most used one is `zero_()`, which
sets all values to zero.

In [None]:
a = torch.ones(3, 2)
a.zero_() # in-place operation, does not create a new tensor
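
To make the difference concrete, the sketch below compares `add` (which returns a new tensor) with its in-place counterpart `add_`, and checks that the memory address does not change after an in-place operation.

```python
import torch

a = torch.ones(3, 2)
ptr_before = a.data_ptr()

b = a.add(1)  # out-of-place: returns a new tensor, a is unchanged
print(a[0, 0], b[0, 0])  # tensor(1.) tensor(2.)

a.add_(1)     # in-place: a itself now contains twos
print(a[0, 0])  # tensor(2.)

a.zero_()     # in-place: a itself now contains zeros
print(a.data_ptr() == ptr_before)  # True: no new memory was allocated for a
```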

## Moving tensors to the GPU

In [None]:
gpu_tensor = torch.zeros(1, device='cuda') # created on the GPU
cpu_tensor = torch.zeros(1)
to_gpu = cpu_tensor.to(device='cuda') # this creates a copy of the tensor!
to_gpu_another = cpu_tensor.cuda() # shorthand for the previous command
again_to_cpu = to_gpu.cpu() # shorthand for copying the tensor to cpu
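
The cell above raises an error on a machine without a CUDA-capable GPU. A common pattern, sketched below, is to pick the device once and fall back to the CPU when no GPU is available; the names `device` and `moved` are illustrative.

```python
import torch

# Use the GPU when available, otherwise fall back to the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

cpu_tensor = torch.zeros(1)
moved = cpu_tensor.to(device)  # copies to the GPU, or returns the same tensor if already on `device`
print(moved.device)

back = moved.cpu()  # always ends up on the CPU
print(back.device)  # cpu
```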

## Serializing tensors

In [None]:
torch.save(a, 'tensor.pth') # note that the extension is arbitrary

In [None]:
b = torch.load('tensor.pth')
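
As a quick sanity check, the sketch below saves a tensor to a temporary directory (so it leaves no files behind) and verifies that the loaded tensor equals the original. The `map_location='cpu'` argument is optional; it is useful when a tensor saved from the GPU must be loaded on a machine without one.

```python
import os
import tempfile

import torch

a = torch.arange(6).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'tensor.pth')
    torch.save(a, path)
    b = torch.load(path, map_location='cpu')  # load onto the CPU regardless of where it was saved

print(torch.equal(a, b))  # True
```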

## Exercise 1.1

Write code that creates a `torch.tensor` with the following contents:
$\begin{bmatrix}
1 & 2.2 & 9.6 \\ 4 & -7.2 & 6.3 \\ 9 & 0 & -1
\end{bmatrix}$

Now with this tensor:
1. Print the second element in the second row
2. Print the last column
3. Print the shape of the tensor
4. Compute the average of each row
5. Print the tensor of all positive values and save it (*hint: use boolean indexing*)
6. Load the latter tensor and compute its average
7. Move the original tensor to the GPU

In [None]:
# TODO 1)

In [None]:
# TODO 2)

In [None]:
# TODO 3)

In [None]:
# TODO 4)

In [None]:
# TODO 5)

In [None]:
# TODO 6)

In [None]:
# TODO 7)

# Autograd
As we will see in the next chapters, in deep learning we need to compute **gradients**.
PyTorch computes the gradient of any differentiable function w.r.t. its inputs by using **automatic differentiation** (even for extremely complex functions!).
This PyTorch component is called **autograd**.

When operating with tensors, PyTorch automatically builds the corresponding computational graph by keeping track of every operation between tensors.
In particular each node remembers:
*   the parent tensors that originated it
*   the operation performed on the parent tensors

Given a function $y = f(x)$ (of any complexity),
where $x$ and $y$ can be respectively the input and output tensors and $f$ the series of operations applied to them,
we can compute the derivative $\frac{dy}{dx}$ by:


1. first computing the **forward pass** to have an actual scalar value of $y$,
2. calling the method `y.backward()`,
3. accessing the gradient value in the field `x.grad`.

Remember to set the attribute `requires_grad=True` whenever you need to retrieve gradients from a tensor. To keep the code efficient, PyTorch computes gradients only for the tensors where this is specified.


Let's see this procedure in practice!


In [None]:
x = torch.tensor(3., requires_grad=True)
print(x.grad)
y = x**2    # forward pass: here we define and use the function
print(y)
y.backward()    # backward pass
print(x.grad)

None
tensor(9., grad_fn=<PowBackward0>)
tensor(6.)
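
The role of `requires_grad` can be seen by mixing a tensor that tracks gradients with one that does not. This is a minimal sketch with illustrative tensors `a`, `b` and `c`.

```python
import torch

a = torch.tensor(2.)                      # requires_grad defaults to False
b = torch.tensor(3., requires_grad=True)

c = a * b                                 # the result tracks gradients because b does
print(c.requires_grad)                    # True
print(c.grad_fn is not None)              # True: c remembers the operation that created it

c.backward()
print(a.grad)                             # None: no gradient was requested for a
print(b.grad)                             # tensor(2.), since dc/db = a = 2
```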


What happens if we run the following cell multiple times?
Why is the gradient no longer equal to the correct one?

In [None]:
y = x**2
y.backward()
print(x.grad)

tensor(6.)


Remember that every time we call `backward()`, the gradients are not overwritten but accumulated (summed) into `x.grad`.

This is an important property needed for complex operations, but if disregarded it can lead to wrong results.

To fix this issue, it is sufficient to call the `.zero_()` method on the `grad` attribute.


In [None]:
y = x**2
y.backward()
print(x.grad)
x.grad.zero_()

tensor(6.)


tensor(0.)
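
The accumulation can be observed directly by repeating the forward and backward passes in a loop. The sketch below starts from a fresh tensor so that its output does not depend on which cells were run before.

```python
import torch

x = torch.tensor(3., requires_grad=True)

# Without resetting, each backward pass adds 2 * x = 6 to x.grad.
for step in range(3):
    y = x ** 2
    y.backward()
    print(step, x.grad)  # 6., then 12., then 18.

# After zeroing the gradient, a new backward pass gives the correct value again.
x.grad.zero_()
y = x ** 2
y.backward()
print(x.grad)  # tensor(6.)
```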

When the function is more complex, such as a nested one, the code for computing the gradients does not change: PyTorch takes care of applying the chain rule automatically to compute derivatives throughout the computational graph!

In [None]:
x = torch.tensor(3., requires_grad=True)
print(x.grad)
y = (x**2).log()    # forward (only part of the code that changed)
print(y)
y.backward()    # backward
print(x.grad)

None
tensor(2.1972, grad_fn=<LogBackward0>)
tensor(0.6667)
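
We can verify this result by hand. By the chain rule, $\frac{d}{dx}\log(x^2) = \frac{2x}{x^2} = \frac{2}{x}$, which for $x=3$ gives $0.6667$. The sketch below compares the autograd result with this analytic value; the name `analytic` is illustrative.

```python
import torch

x = torch.tensor(3., requires_grad=True)
y = (x ** 2).log()
y.backward()

# Analytic derivative: d/dx log(x^2) = 2x / x^2 = 2 / x
analytic = 2 / x.detach()  # detach() gives a tensor that is not tracked by autograd

print(x.grad)                            # tensor(0.6667)
print(torch.allclose(x.grad, analytic))  # True
```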


## Exercise 1.2
Consider the function $f = \log(w \cdot x + b)$.

1. Compute the forward pass setting $x=2$, $w=0.5$ and $b=1.5$ and print the intermediate and output tensors.


2. Compute the backward pass and print the gradients $\frac{\partial f}{\partial x}$, $\frac{\partial f}{\partial w}$ and $\frac{\partial f}{\partial b}$.

In [None]:
###########################################
# 1. Forward pass
###########################################
# TODO

In [None]:
###########################################
# 2. Backward pass
###########################################
# TODO