# Exercise 2
## Authors: E. Vercesi; A. Dei Rossi, T. Marzi, M. Palomba

The goal of this exercise is to learn the basics of Pytorch.

```python
import torch
import numpy as np
torch.manual_seed(42)
```

**Question (for fun):** Why is the seed often 42?

## 1 - Creating tensors
Learn how to create a tensor using the following methods:
### 1.1 - From scratch

Create a 2D tensor of floats with `torch.tensor` (don't explicitly specify the float type, just use float numbers).

```python
v_1 = ...  # TODO: create a 2D tensor of floats with torch.tensor
```
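One possible way to fill in the cell above, as a sketch (the values are arbitrary): writing the entries as float literals is enough for `torch.tensor` to infer a floating-point dtype.

```python
import torch

# Float literals, no explicit dtype: torch infers its default floating-point type
v_1 = torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(v_1.dtype)   # torch.float32 (the default floating-point dtype)
print(v_1.shape)   # torch.Size([2, 3])
```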

### 1.2 - From a numpy array (and back)

Create a 2D numpy array and convert it to a tensor, then go back to a numpy array.

Check the data type of the numpy arrays (in and out) and of the torch tensor defined in section 1.1. **Question:** can you explain this behaviour?
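A minimal sketch of the round trip, assuming we use `torch.from_numpy` and `Tensor.numpy()` (both share memory with the source; `torch.tensor(a_in)` would copy instead). It hints at the explanation: numpy's default float type is `float64`, the conversion preserves it, while torch's own default is `float32`.

```python
import numpy as np
import torch

a_in = np.array([[1.0, 2.0], [3.0, 4.0]])   # numpy defaults to float64
t = torch.from_numpy(a_in)                   # keeps the numpy dtype and shares memory
a_out = t.numpy()                            # back to numpy, still sharing memory

print(a_in.dtype, t.dtype, a_out.dtype)     # float64 torch.float64 float64
print(torch.tensor([[1.0, 2.0]]).dtype)     # torch.float32 (torch's own default)
```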

### 1.3 - From a list

Given a list of lists, convert it to a pytorch tensor:

```python
list_v = [[1,2,3],[4,5,6]]
```
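A sketch of the conversion: `torch.tensor` accepts nested lists directly, and since every entry is a Python `int`, the resulting dtype is `torch.int64`.

```python
import torch

list_v = [[1, 2, 3], [4, 5, 6]]
t_v = torch.tensor(list_v)   # Python ints -> torch.int64
print(t_v.dtype, t_v.shape)  # torch.int64 torch.Size([2, 3])
```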

### 1.4 - Create an identity matrix

Just like numpy, pytorch makes it easy to create some common kinds of tensors. Try first with an identity matrix:

**Question:** what happens if you specify two non-matching dimensions?
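A short sketch with `torch.eye`: with two different sizes it does not fail, it returns a rectangular matrix with ones on the main diagonal and zeros elsewhere.

```python
import torch

i3 = torch.eye(3)        # 3x3 identity, dtype float32
r23 = torch.eye(2, 3)    # non-square: ones on the main diagonal, zeros elsewhere
print(r23)
# tensor([[1., 0., 0.],
#         [0., 1., 0.]])
```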

### 1.5 - Given the shape, create a tensor of zeros/ones

Say we want to create matrices $\in \mathbb{R}^{2 \times 3}$...
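A sketch of the usual constructors: the shape can be given as separate integers or as a tuple. `torch.full` (for an arbitrary constant) is an extra not mentioned in the text.

```python
import torch

z = torch.zeros(2, 3)         # shape passed as separate ints...
o = torch.ones((2, 3))        # ...or as a tuple
f = torch.full((2, 3), 7.0)   # any constant value
```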


### 1.6 - Given a tensor, create another tensor of ones/zeros/random values with the same shape as the input tensor


**Question:** Try to use the `rand_like` method with the following tensor: `torch.tensor([1,2,3])`.

Why does it raise an error?
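A sketch of the `*_like` constructors. They copy the shape and, by default, the dtype of their input; `rand_like` samples uniformly in `[0, 1)`, which is not implemented for an integer dtype such as the `int64` of `torch.tensor([1,2,3])`. Passing a float `dtype` explicitly avoids the error.

```python
import torch

x = torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
ones = torch.ones_like(x)     # same shape (and dtype/device) as x
zeros = torch.zeros_like(x)
rnd = torch.rand_like(x)      # uniform samples in [0, 1)

x_int = torch.tensor([1, 2, 3])   # dtype torch.int64
try:
    torch.rand_like(x_int)        # would produce int64 uniform samples: not supported
except RuntimeError as e:
    print("RuntimeError:", e)

# Overriding the dtype works:
rnd_float = torch.rand_like(x_int, dtype=torch.float32)
```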

### 1.7 - Create some evenly-spaced tensors
`arange` → creates a tensor with values from `start` to `end` (excluded), spaced `step` apart

**Question:** in the `arange` method, what happens to the dtype of the resulting vector? Try different combinations of `start`, `end`, and `step`.


Keep in mind: just like numpy, torch also supports linspace.
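A sketch of the dtype rule for `arange`: if all of `start`, `end` and `step` are integers the result is `int64`; if any of them is a float, the default float dtype is used. `linspace` instead takes the number of points and includes both endpoints.

```python
import torch

a = torch.arange(0, 5)             # all ints -> torch.int64: tensor([0, 1, 2, 3, 4])
b = torch.arange(0, 5, 0.5)        # a float step -> default float dtype
c = torch.arange(0.0, 5)           # a float start is enough to get a float result
d = torch.linspace(0, 1, steps=5)  # 5 points, both endpoints included
print(a.dtype, b.dtype, c.dtype)   # torch.int64 torch.float32 torch.float32
print(d)                           # tensor([0.0000, 0.2500, 0.5000, 0.7500, 1.0000])
```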

### 1.8 - Random tensors in numpy-style

Check the documentation for different distributions (standard normal, normal, bernoulli, poisson, ...). Define a random tensor:
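A sketch covering a few of the samplers in the torch namespace; the means, rates and probabilities are arbitrary illustrative values. Note that `bernoulli` and `poisson` take a tensor of parameters, one per output element.

```python
import torch

torch.manual_seed(42)
std_normal = torch.randn(2, 3)                          # N(0, 1)
normal = torch.normal(mean=5.0, std=2.0, size=(2, 3))   # N(5, 2^2)
uniform = torch.rand(2, 3)                              # U[0, 1)
ints = torch.randint(0, 10, (2, 3))                     # integers in [0, 10)
bern = torch.bernoulli(torch.full((2, 3), 0.3))         # 1 with prob. 0.3, else 0
pois = torch.poisson(torch.full((2, 3), 4.0))           # Poisson with rate 4
```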

## 2 - Operations with tensors
### 2.1 - Arithmetic operations
Create two tensors with the same shape (e.g. `(2,3)`) and sum/subtract/multiply/divide them.
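A sketch with two hand-picked `(2,3)` tensors: the arithmetic operators act element-wise and are shorthands for the corresponding torch functions.

```python
import torch

a = torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
b = torch.tensor([[2.0, 2.0, 2.0], [0.5, 0.5, 0.5]])
print(a + b)   # element-wise, same as torch.add(a, b)
print(a - b)   # torch.sub
print(a * b)   # torch.mul
print(a / b)   # torch.div
```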

### 2.2 - Different ways to multiply tensors
Learn and explain the differences between `torch.dot`, `@`, `torch.matmul` and `*`. Then, perform these operations on matrices of different sizes. If you get errors, try to explain why.

**Remark:** you can compute the transpose of a matrix `x` using `x.T` or `x.t()`.
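A sketch of the four products: `torch.dot` accepts only 1-D tensors; `@` and `torch.matmul` are the same matrix product (also handling 1-D operands and batches); `*` is element-wise. The shapes below are chosen to show one shape mismatch and how a transpose fixes it.

```python
import torch

u = torch.tensor([1.0, 2.0, 3.0])
w = torch.tensor([4.0, 5.0, 6.0])
A = torch.arange(6.0).reshape(2, 3)   # shape (2, 3)
B = torch.arange(6.0).reshape(3, 2)   # shape (3, 2)

dot = torch.dot(u, w)        # 1-D only: 1*4 + 2*5 + 3*6 = 32
mm = A @ B                   # matrix product, shape (2, 2); same as torch.matmul(A, B)
had = A * A                  # element-wise (Hadamard) product, shape (2, 3)
mv = torch.matmul(A, u)      # matrix-vector product, shape (2,)

try:
    A @ A                    # (2, 3) @ (2, 3): inner sizes 3 and 2 do not match
except RuntimeError as e:
    print("RuntimeError:", e)
print((A @ A.T).shape)       # torch.Size([2, 2])
```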

### 2.3 - Other operations
Try some operations (e.g. add, sum, mean, max, argmax) on a tensor of shape `(2,3)`, each time using different dimensions.

**Remark:** unlike `numpy`, the argument name in torch built-in functions is `dim`, not `axis`. However, the pytorch developers apparently introduced `axis` as an alias for `dim` (be aware that we haven't tested this for all pytorch functions).

**Remark 2:** unlike `numpy`, if you specify `dim` in `max`, it returns both the values and their indices (check the result against `argmax`).
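A sketch on a hand-picked `(2,3)` tensor: the reduced dimension is the one that disappears from the output shape. Without `dim`, reductions run over all elements, and `argmax` returns an index into the flattened tensor.

```python
import torch

x = torch.tensor([[1.0, 5.0, 3.0], [4.0, 2.0, 6.0]])   # shape (2, 3)
print(x.sum())            # all elements: tensor(21.)
print(x.sum(dim=0))       # collapse rows -> shape (3,): tensor([5., 7., 9.])
print(x.mean(dim=1))      # collapse columns -> shape (2,): tensor([3., 4.])
values, indices = x.max(dim=1)   # with dim, max returns (values, indices)
print(values, indices)    # tensor([5., 6.]) tensor([1, 2])
print(x.argmax(dim=1))    # tensor([1, 2]), same as indices above
print(x.argmax())         # no dim: index into the flattened tensor -> tensor(5)
```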

### 2.4 - Broadcasting on tensors operations
Try to sum tensors of different sizes (e.g. `(3,1)` and `(1,3)`) and explain the behaviour.
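A sketch of broadcasting with the suggested shapes: sizes are compared from the last dimension, and every dimension of size 1 is stretched to match the other operand, so `(3,1) + (1,3)` gives `(3,3)`. The values are arbitrary.

```python
import torch

col = torch.tensor([[0], [10], [20]])   # shape (3, 1)
row = torch.tensor([[1, 2, 3]])         # shape (1, 3)
s = col + row                           # both are stretched to (3, 3)
print(s)
# tensor([[ 1,  2,  3],
#         [11, 12, 13],
#         [21, 22, 23]])
```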

### 2.5 - Pytorch functionals
Pytorch functionals (e.g. activation functions) can be imported from the `torch.nn.functional` module.
We now create a random tensor `v` of shape `(3,3)` with values sampled from a uniform distribution in `(-2,2)`:

```python
import torch.nn.functional as F
from torch.distributions.uniform import Uniform # hint: you need to sample from this object
```

Try out the following activation functions from the functional module yourself:
* relu
* tanh
* sigmoid
* softmax (**remark:** you must normalize with respect to a dimension...)
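A sketch of the exercise, assuming `Uniform(low, high).sample(shape)` is used to draw `v` (samples lie in `[-2, 2)`). For softmax, `dim=1` is one possible choice: it normalizes each row so that it sums to 1.

```python
import torch
import torch.nn.functional as F
from torch.distributions.uniform import Uniform

torch.manual_seed(42)
v = Uniform(-2.0, 2.0).sample((3, 3))   # shape (3, 3), values in [-2, 2)

r = F.relu(v)               # max(0, v) element-wise
t = F.tanh(v)               # equivalent to torch.tanh(v)
s = F.sigmoid(v)            # equivalent to torch.sigmoid(v)
p = F.softmax(v, dim=1)     # each row sums to 1
```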

## 3 - Access the information of a tensor
Similarly to numpy, pytorch tensors contain useful attributes such as type and shape.

Moreover, pytorch tensors can be stored on both CPU and GPU.

We now learn how to access this information.

### 3.1 - Size of a tensor

**Question:** what is the difference between `size()`, `shape` and `ndim`? Why does `size()` have the brackets?
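A sketch comparing the three: `size()` is a method, so it can take an optional dimension index; `shape` is an attribute returning the same `torch.Size`; `ndim` is the number of dimensions.

```python
import torch

x = torch.zeros(2, 3, 4)
print(x.size())     # torch.Size([2, 3, 4]): a method, so it can take an argument
print(x.size(1))    # 3: the size of a single dimension
print(x.shape)      # torch.Size([2, 3, 4]): an attribute (no arguments)
print(x.ndim)       # 3: the number of dimensions (same as x.dim())
```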

### 3.2 - Type of a tensor

**Question:** what is the difference between `dtype` and `type`? The `d` should be a hint, not an answer.

#### Exercise on tensor types
* what is the default type of a tensor?
* learn how to create a tensor with a hard-coded type.
* what happens if you choose int type with float numbers?
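A sketch for this exercise on a CPU tensor: `dtype` is the element type, `type()` returns the tensor type as a string, and Python's built-in `type()` is just `torch.Tensor`. Requesting an integer dtype for float data truncates the values toward zero.

```python
import torch

x = torch.tensor([1.5, 2.7])
print(x.dtype)     # torch.float32: the element type (a torch.dtype object)
print(x.type())    # 'torch.FloatTensor': the tensor type as a string
print(type(x))     # <class 'torch.Tensor'>: Python's built-in type()

y = torch.tensor([1.5, 2.7], dtype=torch.float64)   # hard-coded dtype
z = torch.tensor([1.5, 2.7], dtype=torch.int32)     # floats truncated toward zero
print(z)           # tensor([1, 2], dtype=torch.int32)
w = x.to(torch.int64)                               # converting an existing tensor
```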

### 3.3 - Memory location
Define a tensor and retrieve the device in which the tensor is stored.

Create a tensor directly on the GPU (**remark:** you have to turn on the GPU first):

```python
device = "cuda" if torch.cuda.is_available() else "cpu"      # you will see this line plenty of times!
```

**Remark 1:** the notebook counts the hours during which a GPU is assigned to you, not its actual usage. So don't forget to turn it off when you don't need it!

**Remark 2:** for Apple silicon chips (e.g. M1), the syntax is:

`device = "mps" if torch.backends.mps.is_available() else "cpu"`
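A sketch that runs on any machine: it falls back to the CPU when CUDA is not available. Tensors live on the CPU unless a `device` is given at creation time or they are moved with `.to()`.

```python
import torch

x = torch.ones(2, 3)
print(x.device)                       # cpu: tensors are created on the CPU by default

device = "cuda" if torch.cuda.is_available() else "cpu"
y = torch.ones(2, 3, device=device)   # created directly on `device`
z = x.to(device)                      # copy of x moved to `device`
print(y.device, z.device)
back = z.cpu()                        # e.g. before calling .numpy()
```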

## 4 - Manipulating dimensionality

We now want to learn how to manipulate tensor dimensions.

### 4.1 - Reshape a tensor

Create a random tensor of shape `(2,3,4)`. Then, use `torch.reshape()` to reshape it into a tensor with the following sizes (you have to guess the correct `x` for each size):
* `(2,x)`
* `(4,x)`
* `(3,x,2)`
* `(1,x,24)`

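**Remark:** `Tensor.view()` can also be used for the same purpose. It always shares the underlying data with the original tensor (and raises an error if the tensor's memory layout does not allow it), whereas `torch.reshape()` returns a view when possible and a copy otherwise.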
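A sketch of the four reshapes: the tensor has 2·3·4 = 24 elements, so the missing size is 24 divided by the product of the given ones. Passing `-1` lets torch compute it for you. The last lines show that a `view` shares storage with the original.

```python
import torch

x = torch.rand(2, 3, 4)                # 24 elements in total
a = torch.reshape(x, (2, 12))          # or (2, -1): torch infers the missing size
b = torch.reshape(x, (4, 6))
c = torch.reshape(x, (3, 4, 2))
d = torch.reshape(x, (1, 1, 24))

v = x.view(6, 4)                       # view: same storage, no copy
v[0, 0] = 100.0
print(x[0, 0, 0])                      # tensor(100.): the change is visible in x
```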

### 4.2 - Concatenate tensors
Try to use `torch.cat()` to concatenate two tensors of shape `(2,3,4)` along different dimensions. What will be their output shape?

If you want to concatenate along a new dimension (**don't forget to specify where to insert it!**), you have to use instead the `torch.stack()` method. Try it with different dimensions.
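A sketch of both operations on two `(2,3,4)` tensors: `cat` grows an existing dimension, while `stack` inserts a new dimension of size 2 at the requested position.

```python
import torch

a = torch.zeros(2, 3, 4)
b = torch.ones(2, 3, 4)
print(torch.cat((a, b), dim=0).shape)    # torch.Size([4, 3, 4])
print(torch.cat((a, b), dim=1).shape)    # torch.Size([2, 6, 4])
print(torch.cat((a, b), dim=2).shape)    # torch.Size([2, 3, 8])
print(torch.stack((a, b), dim=0).shape)  # torch.Size([2, 2, 3, 4])
print(torch.stack((a, b), dim=3).shape)  # torch.Size([2, 3, 4, 2])
```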

### 4.3 - Add dimensions (and back)
Create a tensor of shape `(2, 3, 4)` and use `torch.unsqueeze()` to insert a new dimension as the last axis (check its shape).

Then, go back to the original tensor using `torch.squeeze()`.

**Question:** what happens if there is more than one dimension of size 1? Guess, then try it yourself with a tensor of shape `(2, 1, 4, 1)`.
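A sketch of both directions: without a `dim` argument, `squeeze` removes every dimension of size 1, so it is safer to say which one you mean.

```python
import torch

x = torch.rand(2, 3, 4)
u = torch.unsqueeze(x, -1)     # new last axis -> (2, 3, 4, 1); same as x.unsqueeze(3)
back = torch.squeeze(u, -1)    # remove that axis again -> (2, 3, 4)

y = torch.rand(2, 1, 4, 1)
print(torch.squeeze(y).shape)      # no dim: every size-1 dim removed -> torch.Size([2, 4])
print(torch.squeeze(y, 1).shape)   # only dim 1 -> torch.Size([2, 4, 1])
```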

### 4.4 - Permute
Given a tensor of shape `(2,3,4)`, use `torch.permute()` to permute its dimensions. It takes a tuple of dimension indices (e.g. `(2, 0, 1)`) that specifies the new order of the dimensions:

`dims = (2, 3, 4)`

`new_dims = (dims[2], dims[0], dims[1]) = (4, 2, 3)`
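A sketch of the example above: new dimension `i` is old dimension `(2, 0, 1)[i]`, and the elements move accordingly, so `p[k, i, j]` equals `x[i, j, k]`.

```python
import torch

x = torch.rand(2, 3, 4)
p = torch.permute(x, (2, 0, 1))   # new dim i is old dim (2, 0, 1)[i]
print(p.shape)                    # torch.Size([4, 2, 3])
print(p[3, 1, 2] == x[1, 2, 3])   # tensor(True): p[k, i, j] == x[i, j, k]
```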

### 4.5 - Slicing, accessing elements
These operations work in the same way as in `numpy`, so make sure you understand how they work there. For example:

```python
n = torch.arange(0, 36).reshape((6, 6))
print(n)
print(n[3, 5])  # access a single element
```

Output:

```
tensor([[ 0,  1,  2,  3,  4,  5],
        [ 6,  7,  8,  9, 10, 11],
        [12, 13, 14, 15, 16, 17],
        [18, 19, 20, 21, 22, 23],
        [24, 25, 26, 27, 28, 29],
        [30, 31, 32, 33, 34, 35]])
tensor(23)
```


Now try to access the "core" 2x2 matrix of the tensor `n`, i.e. the block:

$\begin{pmatrix}
  14 & 15 \\
  20 & 21
 \end{pmatrix}$

**Question:** Is `n[3,5]` an integer from a Python point of view? Check out the `item()` method.
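One possible solution sketch: slices exclude their end index, so rows and columns 2 to 3 are `2:4`. Indexing a single element returns a 0-dimensional tensor; `item()` converts it to a plain Python number.

```python
import torch

n = torch.arange(0, 36).reshape((6, 6))
core = n[2:4, 2:4]        # rows 2-3 and columns 2-3 (end index excluded)
print(core)
# tensor([[14, 15],
#         [20, 21]])

elem = n[3, 5]
print(type(elem))         # <class 'torch.Tensor'>: a 0-dim tensor, not a Python int
print(type(elem.item()))  # <class 'int'>
```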

## 5 - Gradients
Pytorch allows us to compute the gradient using automatic differentiation.

Every `torch.Tensor` has an attribute `requires_grad`, which specifies whether the gradient must be computed for that tensor. By default, the attribute is set to `False` (try it out yourself).

Try to create a tensor with `requires_grad=True` and use `detach()` to create a copy of the tensor with `requires_grad=False`.
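A sketch of both checks: `detach()` returns a tensor with the same data (shared storage) that is cut out of the autograd graph.

```python
import torch

a = torch.tensor([1.0, 2.0])
print(a.requires_grad)    # False by default

b = torch.tensor([1.0, 2.0], requires_grad=True)
c = b.detach()            # same data, but outside the autograd graph
print(c.requires_grad)    # False
```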

We now want to analyze how gradients are propagated. We define two tensors `x` and `y` with `requires_grad=True` and a variable `z` that is a function of `x` and `y`. Then, we compute the gradients using `backward()` and print the gradient associated with each variable.



```python
x = torch.tensor([1.2], requires_grad=True)
y = torch.tensor([2.], requires_grad=True)
z = x*x + 3*y
# TODO
```
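One possible completion of the cell above, as a sketch: since $z = x^2 + 3y$, we expect $\partial z/\partial x = 2x = 2.4$ and $\partial z/\partial y = 3$. `z` has a single element, so `backward()` needs no explicit gradient argument.

```python
import torch

x = torch.tensor([1.2], requires_grad=True)
y = torch.tensor([2.], requires_grad=True)
z = x*x + 3*y

z.backward()      # populates .grad on the leaf tensors x and y
print(x.grad)     # dz/dx = 2x -> tensor([2.4000])
print(y.grad)     # dz/dy = 3  -> tensor([3.])
```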

#### Questions
* If the variable `z` does not depend on `y`, what happens if I try to print the gradient associated with `y`?
* What happens if I try to define an integer tensor with `requires_grad=True`?
* What happens if I try to compute the gradient of `z` again? (**Hint:** check out `torch.optim.Optimizer.zero_grad`)
* What happens if I call `numpy()` on a tensor that has `requires_grad=True`?

Try to answer these questions yourself before running any code.
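Once you have made your guesses, a sketch like the following lets you check the first, second and fourth questions (the values are arbitrary). Each failing call is wrapped in `try` so the script keeps running.

```python
import torch

# Q1: z does not depend on y -> y.grad stays None after backward()
x = torch.tensor([1.2], requires_grad=True)
y = torch.tensor([2.0], requires_grad=True)
z = x * x
z.backward()
print(y.grad)                 # None

# Q2: only floating-point (and complex) tensors can require gradients
try:
    torch.tensor([1, 2], requires_grad=True)
except RuntimeError as e:
    print("RuntimeError:", e)

# Q4: numpy() refuses tensors that require grad; detach first
try:
    x.numpy()
except RuntimeError as e:
    print("RuntimeError:", e)
print(x.detach().numpy())
```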


Note that gradients accumulate over multiple backward calls of computations involving that tensor, unless explicitly told otherwise.
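A sketch of the accumulation. Calling `backward()` twice on the same graph fails by default because its intermediate buffers are freed after the first call, so the graph is rebuilt at each iteration. Resetting with `x.grad.zero_()` is done by hand here; an optimizer's `zero_grad()` resets the gradients of its parameters.

```python
import torch

x = torch.tensor([1.2], requires_grad=True)

for _ in range(2):
    z = x * x           # rebuild the graph each time
    z.backward()
print(x.grad)           # 2.4 + 2.4 accumulated -> tensor([4.8000])

x.grad.zero_()          # reset the accumulated gradient in place
z = x * x
z.backward()
print(x.grad)           # back to tensor([2.4000])
```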