# Exercise 2
## Authors: E. Vercesi; A. Dei Rossi, G. Dominici, S. Huber

In this exercise session you are going to learn the basics of PyTorch. 
PyTorch is a Python library for scientific computing, much like NumPy, with the addition that it can run on GPUs. 
This makes it a computing library of choice for Deep Learning applications. 
PyTorch is developed by Meta. You might also have heard of its main competitor, TensorFlow (Google). Although both offer essentially the same functionality, in this course we would like you to stick to PyTorch.
If you haven't done Exercise 1 on NumPy yet, we highly encourage you to do it first: the functionality of NumPy and PyTorch overlaps to a large extent, so understanding NumPy first will greatly boost your understanding of PyTorch.
To begin with, make sure you have PyTorch installed. 

In [1]:
import torch  # If you see errors, use conda or pip to install torch in your virtual environment.
import numpy as np

torch.manual_seed(42)  # manual seed is to ensure repeatability of random numbers. 

<torch._C.Generator at 0x14ea05610>

**Question (for fun):** Why is the seed often [42](https://www.youtube.com/watch?v=aboZctrHfK8)?


It's the answer to life, the universe and everything according to "The Hitchhiker's Guide to the Galaxy" by Douglas Adams.

## Create tensors

Tensors are like NumPy arrays, but they can live on the GPU.

1. Create a tensor out of a Python list [1, 2, 3]
2. Create a tensor out of a NumPy array [[2, 3, 4], [4, 3, 2]] (see method [`.from_numpy()`](https://pytorch.org/docs/stable/generated/torch.from_numpy.html))
3. Convert the tensor of point 2 back to a NumPy array. (see method [`.numpy()`](https://pytorch.org/docs/stable/generated/torch.Tensor.numpy.html))

In [2]:
## 1: create a tensor out of a Python list
v = torch.tensor([1, 2, 3])
print("## 1:", v)

## 2: create a tensor out of a NumPy array
array = np.array([[2, 3, 4], [4, 3, 2]])
v2 = torch.from_numpy(array)
print("## 2: from numpy array:\n", v2)

## 3: Convert the tensor back to NumPy.
a = v2.numpy()
print("## 3: back to numpy array:\n", a)

## 1: tensor([1, 2, 3])
## 2: from numpy array:
 tensor([[2, 3, 4],
        [4, 3, 2]])
## 3: back to numpy array:
 [[2 3 4]
 [4 3 2]]
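
One detail worth checking here: `torch.from_numpy()` shares memory with the source array, whereas `torch.tensor()` copies the data. The sketch below illustrates the difference; the names `shared` and `copied` are illustrative and not part of the exercise.

```python
import numpy as np
import torch

array = np.array([[2, 3, 4], [4, 3, 2]])
shared = torch.from_numpy(array)   # shares memory with `array`
copied = torch.tensor(array)       # makes an independent copy

array[0, 0] = 100                  # modify the NumPy array in place

print(shared[0, 0])  # the change is visible through the shared tensor
print(copied[0, 0])  # the copy still holds the original value
```

The same sharing applies in the other direction: for a CPU tensor, `.numpy()` returns an array backed by the same memory.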


Check the `.dtype` attribute of the above created tensors. Create a tensor of size 3 with values [1, 2, 3] but forcing the dtype to be float64.

In [4]:
print(v.dtype, v2.dtype)

## 1: create [1, 2, 3] with dtype float64
v = torch.tensor([1, 2, 3], dtype=torch.float64)
print("## 1:", v.dtype)

torch.int64 torch.int64
## 1: torch.float64


PyTorch also offers some more advanced functions that can be used to create well known matrices:

1. Create an identity matrix of size (5, 5) (see [`torch.eye()`](https://pytorch.org/docs/stable/generated/torch.eye.html)).
2. Create a matrix of all zeros of size (3, 4) (see [`torch.zeros()`](https://pytorch.org/docs/stable/generated/torch.zeros.html)).
3. Create a matrix of all ones of size (2, 3) (see [`torch.ones()`](https://pytorch.org/docs/stable/generated/torch.ones.html)).
4. Given a tensor of size (3, 2) of your choice, create a matrix of the same size (3, 2) filled with ones (or, equivalently, with zeros) (see [`torch.ones_like()`](https://pytorch.org/docs/stable/generated/torch.ones_like.html) and [`torch.zeros_like()`](https://pytorch.org/docs/stable/generated/torch.zeros_like.html)).
5. Create a matrix of size (3, 4) filled with numbers from 0 to 11 inclusive (same as in NumPy). Try both [`torch.arange()`](https://pytorch.org/docs/stable/generated/torch.arange.html) and [`torch.linspace()`](https://pytorch.org/docs/stable/generated/torch.linspace.html).

In [11]:
## 1:
eye = torch.eye(5)
print("## 1: eye\n", eye)

## 2:
zeros = torch.zeros(3, 4)
print("## 2: zeros\n", zeros)

## 3:
ones = torch.ones(2, 3)
print("## 3: ones\n", ones)

## 4:
v = torch.tensor([[1, 2], [3, 4], [5, 6]])
vones = torch.ones_like(v)  # torch.zeros_like(v) works the same way, filling with zeros
print("## 4: ones like\n", vones)

## 5:
## torch.arange()
# start 0, end 12 (exclusive)
varange = torch.arange(0, 12).reshape(3, 4)
## torch.linspace()
# start 0, end 11 (inclusive), 12 steps
vlinspace = torch.linspace(0, 11, 12).reshape(3, 4)
print("## 5:\narange\n", varange, "\nlinspace\n", vlinspace)

## 1: eye
 tensor([[1., 0., 0., 0., 0.],
        [0., 1., 0., 0., 0.],
        [0., 0., 1., 0., 0.],
        [0., 0., 0., 1., 0.],
        [0., 0., 0., 0., 1.]])
## 2: zeros
 tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])
## 3: ones
 tensor([[1., 1., 1.],
        [1., 1., 1.]])
## 4: ones like
 tensor([[1, 1],
        [1, 1],
        [1, 1]])
## 5:
arange
 tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]]) 
linspace
 tensor([[ 0.,  1.,  2.,  3.],
        [ 4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11.]])


Notice that `arange` creates integer numbers, `linspace` floating point numbers.
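
A small sketch to make the dtype difference explicit. It assumes the default floating point dtype has not been changed (it is `torch.float32` out of the box); the names `ints`, `lin` and `ints_as_float` are illustrative.

```python
import torch

ints = torch.arange(0, 12)                             # integer arguments give torch.int64
lin = torch.linspace(0, 11, 12)                        # default floating point dtype
ints_as_float = torch.arange(0, 12, dtype=torch.float32)

print(ints.dtype, lin.dtype, ints_as_float.dtype)
# Once the dtype matches, both constructions hold the same values
print(torch.allclose(ints_as_float, lin))
```

Passing `dtype=` explicitly is usually the safest choice when the result is later combined with floating point tensors.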

### Random arrays

As in NumPy, you have a wide choice of random distributions to sample your arrays from.
Try to recreate the random arrays you built with NumPy in Exercise 1:
1) Create a random tensor of size 4 of uniform floating point numbers in the interval [0, 1). (see [`torch.rand`](https://pytorch.org/docs/stable/generated/torch.rand.html))
2) Create a random tensor of size (3, 2) of uniform floating point numbers in the interval [0, 5). (hint: generate numbers in the interval [0, 1) and scale them up by 5).
3) Create a random tensor of size (2, 1, 2) of integers in the interval [10, 20]. (see [`torch.randint`](https://pytorch.org/docs/stable/generated/torch.randint.html), careful with border conditions!)
4) Create a random tensor of size 10 over the normal distribution, mean 3 and std dev 2. (see [`torch.normal`](https://pytorch.org/docs/stable/generated/torch.normal.html))

In [5]:
## 1:
rand = torch.rand(4)
print("## 1: rand\n", rand)

## 2: 
rand05 = torch.rand(3, 2) * 5
print("## 2: rand * 5\n", rand05)

## 3:
# low inclusive, high exclusive
randint = torch.randint(10, 21, (2, 1, 2))
print("## 3: randint\n", randint)

## 4:
normal = torch.normal(mean=3., std=2., size=(10,))
print("## 4: normal distribution\n", normal)

## 1: rand
 tensor([0.8823, 0.9150, 0.3829, 0.9593])
## 2: rand * 5
 tensor([[1.9522, 3.0045],
        [1.2829, 3.9682],
        [4.7039, 0.6659]])
## 3: randint
 tensor([[[11, 20]],

        [[13, 14]]])
## 4: normal distribution
 tensor([3.7531, 2.6384, 3.7861, 3.8654, 0.2746, 5.7129, 4.3376, 1.5846, 2.3466,
        2.4424])
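
The following sketch checks two things used above: that re-seeding with `torch.manual_seed` reproduces the same random numbers, and that the upper bound of `torch.randint` is exclusive. The names `first`, `second` and `samples` are illustrative.

```python
import torch

torch.manual_seed(42)
first = torch.rand(4)
torch.manual_seed(42)      # reset the generator to the same state
second = torch.rand(4)
print(torch.equal(first, second))   # identical samples after re-seeding

# With high=21 the value 20 can be drawn, 21 cannot
samples = torch.randint(10, 21, (1000,))
print(samples.min().item() >= 10, samples.max().item() <= 20)
```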


## Device (GPU vs CPU)

In this section we will learn how to run computations on the GPU instead of the CPU: this is the main reason why PyTorch is used over NumPy in Deep Learning applications.

By default, tensors are allocated on the CPU. You can check this easily with the [`.device`](https://pytorch.org/docs/stable/tensor_attributes.html#torch.device) attribute.
1) Create an identity matrix of size (4, 4) and access its device attribute.

In [6]:
## 1: see .device of a matrix

v = torch.eye(4, 4)  # create a tensor
print("## 1: device =", v.device)

## 1: device = cpu


Hence, every time we want to use the GPU, we need to explicitly move the tensors to the desired device. Careful here: your laptop doesn't necessarily have a dedicated GPU. And, even if it has one, it might not be compatible with CUDA (the NVIDIA interface that allows computations to be performed on the GPU).

You can check if CUDA is available on your machine by simply using [`cuda.is_available()`](https://pytorch.org/docs/stable/generated/torch.cuda.is_available.html).

In [7]:
## The output might be different on your machine. 
torch.cuda.is_available()

False

If the above returns False, either CUDA is not installed correctly, or your laptop doesn't have a GPU compatible with it. 
If you have a MacBook with an Apple Silicon processor, you can still use the device `mps`:

In [8]:
# For mac M1/2/3 users
torch.backends.mps.is_available()

True

We can set the device to one of these three options:
- `cuda` (if you have an NVIDIA graphics card). It might be `cuda:0`, `cuda:1`, etc. if you have more than one.
- `mps` (if you have a MacBook with an M1/2/3 processor)
- `cpu` otherwise

If your laptop doesn't have any of the above-mentioned devices apart from the CPU, you can use Google Colab or Kaggle notebooks: they offer free GPU hours every week. Be aware that they count the hours the kernel is running, not the hours you are actually using the notebook, so remember to shut it down when you don't use it!

In [9]:
device = torch.device('cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu')
device

device(type='mps')

Finally, move the tensor `v` you created earlier to the most convenient device. Use the method [`.to`](https://pytorch.org/docs/stable/generated/torch.Tensor.to.html). Careful: is it an in-place method? Check that the device is indeed correct.

In [10]:
# Move vector v to the correct device. Check it is indeed on the desired device.
w = v.to(device)
print("w device =", w.device, ", v device =", v.device)

w device = mps:0 , v device = cpu


Notice that `w` lives on the GPU, whereas `v` remains on the CPU. `.to` is not an in-place method: when the target device differs from the current one, it returns a new tensor allocated on the desired device. Notice that changing `w` doesn't affect `v`!

In [11]:
w[0, 0] = 10.
print("w after change:", w[0, 0], ", v after change:", v[0, 0], ": they are different objects!")

w after change: tensor(10., device='mps:0') , v after change: tensor(1.) : they are different objects!
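
One caveat, sketched below so that it runs on a CPU-only machine: when the tensor already has the requested device and dtype, `.to` returns the tensor itself rather than a copy. A copy is made only when something actually changes (here the dtype). The names `same` and `converted` are illustrative.

```python
import torch

v = torch.eye(4)                   # float32 tensor on the CPU
same = v.to("cpu")                 # nothing to change: the very same object comes back
converted = v.to(torch.float64)    # dtype changes, so a new tensor is allocated

print(same is v)
print(converted is v)

converted[0, 0] = 10.
print(v[0, 0])                     # v is unaffected by the change to the copy
```

So if `device` turns out to be `cpu` on your machine, `w = v.to(device)` and `v` are the same object, and modifying one modifies the other.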


You can also create a tensor and send it directly to the correct device. 
1. Create a tensor of ones of size (3, 3) and pass the `device` argument when creating it. Check that, indeed, the tensor has been initialized on the correct device.

In [12]:
## 1: Create a tensor and initialize it on the correct device.
ones = torch.ones(3, 3, device=device)
print("The tensor is allocated in:", ones.device)

The tensor is allocated in: mps:0


Later on, we will measure empirically how much faster GPUs are than CPUs on large computations.

If you are using the Kaggle platform for your projects (we recommend you do), you have at your disposal 30h/week of free GPU time: in order to activate it, you need to open a notebook, go to settings -> accelerator and select a GPU from there. If the GPU options are not clickable, it is because you have to verify your account using your phone number. Go to home -> your picture (top right corner) -> settings -> phone verification. Before the options actually become clickable you will need to wait a few minutes (less than 5).

If you are using Google Colab (also recommended), you can activate GPUs by opening a notebook -> top-right arrow pointing downward -> change runtime type -> select something which is not `CPU`.

## Working with tensors' dimensions

In this section we will learn how to manipulate tensors' dimensions. Notice that the methods are extremely similar to the NumPy ones: hence, if you have done Exercise 1, this section should be quite straightforward.

### Access elements and slicing 

Create an identity matrix of size (4, 4) and access 
1. The element in position [0, 0]
2. The last element
3. Element in position [2, 3]
Check that the returned elements are what you expect.

In [13]:
# Create the identity matrix
eye = torch.eye(4)

## 1: access element in [0, 0]
print("## 1: we expect 1, we get", eye[0, 0])

## 2: access element in [3, 3], remember -1 is for last position
print("## 2: we expect 1, we get", eye[-1, -1])

## 3: access element in [2, 3]
print("## 3: we expect 0, we get", eye[2, 3])


## 1: we expect 1, we get tensor(1.)
## 2: we expect 1, we get tensor(1.)
## 3: we expect 0, we get tensor(0.)


### Slicing

1. Create a random tensor of size (3, 4) of integers in the interval [5, 10].
2. Print the second row.
3. Print the third column.
4. Print the sub-matrix spanning from the second to the third row, from the second to the third column.

In [14]:
## 1: create a random tensor of size [3, 4].
v = torch.randint(5, 11, size=(3, 4))
print("## 1\n", v)

## 2: print the second row.
print("## 2: second row", v[1, :])  # second row has index 1

## 3: print the third column.
print("## 3: third column", v[:, 2])  # Third column has index 2

## 4: sub-matrix
print("## 4: sub-matrix\n", v[1:3, 1:3])

## 1
 tensor([[10,  9, 10,  7],
        [ 9,  5,  7,  5],
        [ 6,  8,  8, 10]])
## 2: second row tensor([9, 5, 7, 5])
## 3: third column tensor([10,  7,  8])
## 4: sub-matrix
 tensor([[5, 7],
        [8, 8]])
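
As in NumPy, basic slicing returns a view on the original data rather than a copy. The sketch below shows the consequence; the names `sub` and `independent` are illustrative.

```python
import torch

v = torch.arange(12).reshape(3, 4)
sub = v[1:3, 1:3]          # a view on rows 1-2, columns 1-2
sub[0, 0] = -1             # writes through to v[1, 1]
print(v[1, 1])

independent = v[1:3, 1:3].clone()   # clone() makes a real copy
independent[0, 0] = 99
print(v[1, 1])             # unchanged by the write to the clone
```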


### Access tensors' dimensions

1. Create a tensor $v$ of size (3, 4, 2, 4, 1) of random floats in [0, 1)
2. Print its shape. You can use both `.shape` and `.size()`, try them both.
3. Print the size of its third dimension (2 in our example). Check the `.size()` method.
4. Print the number of dimensions of the tensor (5 in our example). Check `.ndim`.

In [17]:
## 1: create a random tensor v of size (3, 4, 2, 4, 1).
v = torch.rand(3, 4, 2, 4, 1)

## 2: print v's shape using .shape and .size().
print("## 2: v.size() =", v.size(), "; v.shape =", v.shape)
print("Pam from Dunder Mifflin: they are the same picture;)")

## 3: print the size of the third dimension of v.
# Index of the third dimension is 2!!
print("## 3: third dimension, we expect 2, we get:", v.size(2))
print("We could also use .shape:", v.shape[2])

## 4: print the number of dimensions of v.
print("## 4: number of dimensions:", v.ndim)

## 2: v.size() = torch.Size([3, 4, 2, 4, 1]) ; v.shape = torch.Size([3, 4, 2, 4, 1])
Pam from Dunder Mifflin: they are the same picture;)
## 3: third dimension, we expect 2, we get: 2
We could also use .shape: 2
## 4: number of dimensions: 5


## Permute dimensions

You can rearrange the order of the dimensions of a tensor. Create a random tensor of integers in the interval [0, 10) of size (2, 3, 4) and permute its dimensions so that the final size is (4, 2, 3). See [`torch.permute`](https://pytorch.org/docs/stable/generated/torch.permute.html)

In [20]:
# Create a random tensor. Check its shape (2, 3, 4)
v = torch.randint(0, 10, (2, 3, 4))
print("Original shape:", v.size())

# Permute its dimensions. Check its shape (4, 2, 3)
# first we want the third dimension, second the first dimension, and last the second dimension
# (indices start from 0!)
w = torch.permute(v, (2, 0, 1))
print("Shape of w after having permuted v's dimensions:", w.size())
print("Notice that v has remained unchanged:", v.size())


Original shape: torch.Size([2, 3, 4])
Shape of w after having permuted v's dimensions: torch.Size([4, 2, 3])
Notice that v has remained unchanged: torch.Size([2, 3, 4])
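
To see where individual elements end up, here is a small sketch that uses `torch.arange` instead of random numbers so that the values are easy to follow. The tensor `m` is an illustrative addition.

```python
import torch

v = torch.arange(24).reshape(2, 3, 4)
w = torch.permute(v, (2, 0, 1))     # shape (4, 2, 3)

# Element (i, j, k) of v ends up at position (k, i, j) of w
print(v[1, 2, 3].item(), w[3, 1, 2].item())

# For a 2-dimensional tensor, swapping the two axes is the transpose
m = torch.arange(6).reshape(2, 3)
print(torch.equal(m.permute(1, 0), m.T))
```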


## Squeeze/unsqueeze

If you want to increase the number of dimensions of your tensor (similar to `np.newaxis`, this might prove useful in the context of broadcasting), you can use [`torch.unsqueeze`](https://pytorch.org/docs/stable/generated/torch.unsqueeze.html). If you want to reduce the number of dimensions of your tensor by dropping dimensions of size 1, you can use [`torch.squeeze`](https://pytorch.org/docs/stable/generated/torch.squeeze.html) instead.

1. Create a random tensor uniform in [0, 1) of size (2, 2). Insert a new dimension so that the final shape is (2, 1, 2)
2. Add a dimension to the tensor of point 1, so that the final shape is (2, 1, 2, 1). Try to use negative indices as the argument of `torch.unsqueeze()`.
3. Turn the tensor back to its original shape (2, 2) by using `torch.squeeze()`.

In [21]:
## 1: Create a tensor of size (2, 2). Unsqueeze it so that its final shape is (2, 1, 2)
v = torch.rand(2, 2)
# We insert a new dimension as second dimension (index 1)
v = torch.unsqueeze(v, 1)
print("## 1: v's shape after having added a dimension:", v.shape)

## 2: Add an additional dimension to the tensor so that its shape is (2, 1, 2, 1). Use negative indices
v = torch.unsqueeze(v, -1)  # Add a new dimension as the last dimension.
print("## 2: v's dimension after having added a new dimension as its last:", v.size())

## 3: Turn the tensor back to shape (2, 2)
# Simply use squeeze() without arguments: it will drop all dimensions of size 1.
v = torch.squeeze(v)
print("## 3: v back to normal:", v.shape)

## 1: v's shape after having added a dimension: torch.Size([2, 1, 2])
## 2: v's dimension after having added a new dimension as its last: torch.Size([2, 1, 2, 1])
## 3: v back to normal: torch.Size([2, 2])
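
Calling `squeeze()` without arguments drops every dimension of size 1, which is not always what you want. A short sketch of the `dim` argument, plus indexing with `None` as an alternative to `unsqueeze` (the tensor `u` is illustrative):

```python
import torch

v = torch.rand(2, 1, 2, 1)
print(torch.squeeze(v).shape)         # all size-1 dimensions dropped: (2, 2)
print(torch.squeeze(v, dim=1).shape)  # only dimension 1 dropped: (2, 2, 1)
print(torch.squeeze(v, dim=0).shape)  # dimension 0 has size 2, so nothing changes

# Indexing with None inserts a dimension, as np.newaxis does in NumPy
u = torch.rand(2, 2)
print(u[:, None, :].shape)            # (2, 1, 2), same as torch.unsqueeze(u, 1)
```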


## Concatenate and stack

If you have two tensors of compatible sizes, you can merge them into a single tensor along one of their axes.
In order to get some intuition, think about having two 2-dimensional tensors of size (3, 4). You can merge them along the first axis and get a final shape of (6, 4), or you can merge them along the second axis and get a final shape of (3, 8), or you can go 3D by stacking one over the other (along the z-axis) and get a shape of (2, 3, 4). 

This is precisely what [`torch.concat`](https://pytorch.org/docs/stable/generated/torch.cat.html#torch.cat) (also called `.cat`) and [`torch.stack`](https://pytorch.org/docs/stable/generated/torch.stack.html) do. 
You should already be familiar with the NumPy `axis` argument. In PyTorch it is called `dim`.

Create two random integer tensors $v$ and $w$ of size (3, 4), then:
1. Concatenate $v$ and $w$ along the first dimension. Check that the final shape is (6, 4).
2. Concatenate $v$ and $w$ along the second dimension. Check that the final shape is (3, 8).
3. Stack $v$ and $w$ along a new dimension. Check that the final shape is (2, 3, 4).

In [22]:
v = torch.randint(0, 10, (3,4))
w = torch.randint(0, 10, (3,4))

## 1: concat along first dimension.
# v, w need to be passed as a tuple/list of tensors.
x = torch.concat((v, w), dim=0)
print("## 1: concatenating along first dimension:")
print(x)
print("The shape is:", x.shape)

## 2: concat along second dimension.
y = torch.concat((v, w), dim=1)
print("## 2: concatenating along the second dimension:")
print(y)
print("The shape is:", y.shape)

## 3: concat along new dimension.
# In order to add a new dimension we use stack
z = torch.stack((v, w))
print("## 3: concatenating along a new dimension:")
print(z)
print("The shape is:", z.shape)


## 1: concatenating along first dimension:
tensor([[7, 3, 2, 9],
        [1, 3, 9, 0],
        [9, 5, 8, 2],
        [3, 7, 7, 3],
        [9, 7, 6, 9],
        [7, 2, 0, 0]])
The shape is: torch.Size([6, 4])
## 2: concatenating along the second dimension:
tensor([[7, 3, 2, 9, 3, 7, 7, 3],
        [1, 3, 9, 0, 9, 7, 6, 9],
        [9, 5, 8, 2, 7, 2, 0, 0]])
The shape is: torch.Size([3, 8])
## 3: concatenating along a new dimension:
tensor([[[7, 3, 2, 9],
         [1, 3, 9, 0],
         [9, 5, 8, 2]],

        [[3, 7, 7, 3],
         [9, 7, 6, 9],
         [7, 2, 0, 0]]])
The shape is: torch.Size([2, 3, 4])
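
`torch.stack` also accepts a `dim` argument that chooses where the new axis is inserted. The sketch below lists the three possibilities for two (3, 4) tensors and checks that stacking amounts to unsqueezing followed by concatenation; the name `manual` is illustrative.

```python
import torch

v = torch.randint(0, 10, (3, 4))
w = torch.randint(0, 10, (3, 4))

print(torch.stack((v, w), dim=0).shape)  # (2, 3, 4)
print(torch.stack((v, w), dim=1).shape)  # (3, 2, 4)
print(torch.stack((v, w), dim=2).shape)  # (3, 4, 2)

# stack is the same as unsqueezing each tensor and then concatenating
manual = torch.cat((v.unsqueeze(0), w.unsqueeze(0)), dim=0)
print(torch.equal(manual, torch.stack((v, w))))
```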


## Broadcasting

As in NumPy, PyTorch tensors support [broadcasting](https://pytorch.org/docs/stable/notes/broadcasting.html).
When performing element-wise operations (like sums) on two tensors of mismatching sizes, the smaller tensor can adapt to the size of the larger tensor provided these simple rules hold:

- Each tensor has at least one dimension.
- When iterating over the dimension sizes, starting at the trailing (right-most) dimension, the dimension sizes must either be equal, one of them is 1, or one of them does not exist.

Let us see an example:

Assume you have $v = [[1, 2, 3], [4, 5, 6]]$ shape (2, 3) and $w=[3, 2, 1]$ shape (3). If we want to perform $v + w$ (element by element sum), it is clear that the dimensions don't match, but with the help of broadcasting we can still do it: $w$ is simply enlarged to reach size (2, 3) by copying itself on the first axis twice. Then, it is possible to perform element by element sum $v+w$.

Let's put broadcasting in practice:

1. Perform the above described example $v+w$ using tensors, check that the result size is (2, 3) and that numbers add up.
2. Let $r = [[1, 2], [3, 4], [5, 6]]$ and $l=[1, 2, 3]$. Compute $r + l$. It should raise an error. Why?
3. Adjust the size of $l$ in example 2 so that the sum works. What size should $l$ have in order for broadcasting to work on $r + l$?
4. Create random tensors $s$ of size (2, 1, 3, 1) and $t$ of size (1, 3, 1, 3). Does broadcasting work here in order to compute $s+t$? If it does, predict the final shape of the result. 
 
 

In [26]:
## 1: compute v + w
v = torch.tensor([[1, 2, 3], [4, 5, 6]])
w = torch.tensor([3, 2, 1])
print("## 1: v + w\n", v + w)

## 2: compute r + l. It doesn't work, why?
r = torch.tensor([[1, 2], [3, 4], [5, 6]])
l = torch.tensor([1, 2, 3])
try:
    r + l
except RuntimeError:
    print("## 2: broadcasting doesn't work, trailing dimensions don't match:", r.shape[-1], ",", l.shape[-1])

## 3: adjust the size of l, and compute r + l
# Since r has shape (3, 2) and l has shape (3),
# We need to add an additional axis to l so that its shape is (3, 1).
print("## 3: use unsqueeze() to add a dimension to l:\n", r + torch.unsqueeze(l, -1))

## 4: compute s + t
s = torch.rand(2, 1, 3, 1)
t = torch.rand(1, 3, 1, 3)
# yes, it works. proceeding right to left, 
# we fix dimensions with size 1 so that the two tensors' shapes coincide.
print("## 4: the final shape should be (2, 3, 3, 3): and it is", (s + t).shape) 


## 1: v + w
 tensor([[4, 4, 4],
        [7, 7, 7]])
## 2: broadcasting doesn't work, trailing dimensions don't match: 2 , 3
## 3: use unsqueeze() to add a dimension to l:
 tensor([[2, 3],
        [5, 6],
        [8, 9]])
## 4: the final shape should be (2, 3, 3, 3): and it is torch.Size([2, 3, 3, 3])
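
If you want to check a prediction without building the tensors, `torch.broadcast_shapes` applies the rules to bare shapes (this assumes a reasonably recent PyTorch version). The sketch also uses `expand` to make the "copying" of $w$ explicit; `w_expanded` is an illustrative name.

```python
import torch

print(torch.broadcast_shapes((2, 3), (3,)))
print(torch.broadcast_shapes((2, 1, 3, 1), (1, 3, 1, 3)))

try:
    torch.broadcast_shapes((3, 2), (3,))   # trailing sizes 2 and 3 clash
except RuntimeError as err:
    print("not broadcastable:", err)

v = torch.tensor([[1, 2, 3], [4, 5, 6]])
w = torch.tensor([3, 2, 1])
w_expanded = w.expand(2, 3)                # w repeated along a new first axis
print(torch.equal(v + w, v + w_expanded))
```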


## PyTorch functions

In this section we are going to learn the basic functions of PyTorch.

### Mean, min, max, sum ...

These functions are quite self-explanatory, and they work the same way as in NumPy. The only detail we ought to pay attention to is the axis along which we want to apply the function (in NumPy it is called `axis`, in PyTorch `dim`).

Create a random tensor $v$ of integers of size (3, 2, 4) and print it.

In order to be sure you have understood what is going on, always try to predict the result first and then check whether your prediction is correct.

1. Compute the min value in the entire tensor.
2. Compute the max value along axis 0.
3. Compute the min along axis 1.
4. Multi-dimensional axes: take the sum over axes (0, 1). 


In [27]:
# Create v of shape (3, 2, 4)
v = torch.randint(0, 100, (3, 2, 4))
print(v)

tensor([[[16, 99, 60, 99],
         [75, 17, 60,  9]],

        [[28, 25, 90,  7],
         [40, 88, 79, 56]],

        [[12, 37, 36, 38],
         [58, 91, 12, 15]]])


In [29]:
## 1: Compute the min value of v.
print("## 1: min is ", v.min())

## 2: Compute the max along axis 0.
# Notice that max() also tells you the 0-axis indices where the max happens.
print("## 2: max along axis 0 has shape (2, 4):\n", v.max(dim=0))

## 3: Compute the min along axis 1.
print("## 3: min along axis 1 has shape (3, 4):\n", v.min(dim=1))

## 4: Compute the sum over axes (0, 1)
print("## 4: sum over axes (0, 1). " +
      "It means both axes 0 and 1 are squashed, the final size is only 4:", v.sum(dim=(0,1)))

## 1: min is  tensor(7)
## 2: max along axis 0 has shape (2, 4):
 torch.return_types.max(
values=tensor([[28, 99, 90, 99],
        [75, 91, 79, 56]]),
indices=tensor([[1, 0, 1, 0],
        [0, 2, 1, 1]]))
## 3: min along axis 1 has shape (3, 4):
 torch.return_types.min(
values=tensor([[16, 17, 60,  9],
        [28, 25, 79,  7],
        [12, 37, 12, 15]]),
indices=tensor([[0, 1, 0, 1],
        [0, 0, 1, 0],
        [0, 0, 1, 1]]))
## 4: sum over axes (0, 1). It means both axes 0 and 1 are squashed, the final size is only 4: tensor([229, 357, 337, 224])
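
A few related details, sketched under the same setup: the result of `max(dim=...)` can be unpacked into values and indices, `keepdim=True` keeps the reduced axis with size 1 (handy for broadcasting afterwards), and `mean` needs a floating point tensor, so the integer tensor is converted first.

```python
import torch

v = torch.randint(0, 100, (3, 2, 4))

values, indices = v.max(dim=0)      # the named tuple unpacks into two tensors
print(values.shape, indices.shape)  # both (2, 4)

print(v.sum(dim=1).shape)                 # (3, 4)
print(v.sum(dim=1, keepdim=True).shape)   # (3, 1, 4)

# amax returns only the values, without the indices
print(torch.equal(v.amax(dim=0), values))

# mean is defined for floating point tensors, so convert the integers first
print(v.float().mean(dim=(0, 1)).shape)   # (4,)
```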


### dot, matmul, transpose, *

Compared to NumPy, PyTorch has a stricter policy on these operations:

- `*`: is the Hadamard product, i.e. the element-wise product.
- `dot`: only used to compute the dot product of two 1-dimensional tensors. Remember how confusing the dot product between multi-dimensional NumPy arrays is (see Exercise 1)? PyTorch avoids this issue by simply forbidding input tensors with more than one dimension.
- `matmul`: or its alias `@` computes the matrix product. It can be used for tensors with more than 2 dimensions (it applies broadcasting, as in NumPy). Notice that the complexity of multiplying two $n\times n$ matrices with the standard algorithm is $O(n^3)$. We will take advantage of this relatively high time complexity to show how much faster GPUs are compared to CPUs.

1. Create two random integer tensors $A$ and $B$ of compatible(?) sizes and compute their Hadamard product (element by element product). Try these sizes (predict whether they work or not):
    - $A$ size (3, 4), $B$ size (3, 4).
    - $A$ size (3, 4), $B$ size (4, 4).
    - $A$ size (3, 4), $B$ size (1, 4).
2. Create two random 1-dimensional tensors $v, w$ and compute their dot product. If you can use multiple ways to compute it, check that indeed they return the same value.
3. Create $C$ of size (3, 4) and $D$ of size (4, 3). Compute the matrix product. Are the sizes compatible?
4. Create $E$ of size (3, 3) and $F$ of size (4, 3). Compute the matrix product. Are the sizes compatible? If not, use the transpose operator to adjust the dimensions of one of the two matrices and compute the matrix product.




In [32]:
## 1: Create A, B and perform hadamard product
# A (3,4), B (3,4)
A = torch.rand(3, 4)
B = torch.rand(3, 4)
print("## 1: when both A and B have the same size then the hadamard product works:\n", (A * B).shape)

# A (3,4), B (4,4)
# They have mismatching sizes!! Broadcasting cannot work either!

# A (3,4), B (1,4)
# Although they don't have the same size, broadcasting enlarges B to get to size (3, 4)
A = torch.rand(3, 4)
B = torch.rand(1, 4)
print("## 1: B gets enlarged by broadcasting have the same size then the hadamard product works:\n", (A * B).shape)

## 2: Create 1-dimensional tensors v, w and compute their dot product.
v = torch.rand(3)
w = torch.rand(3)
print("## 2: dot product", v.dot(w))
print("alternatively with matmul:", v.matmul(w))
print("alternatively with @:", v @ w)

## 3: Compute matrix product of C and D.
C = torch.rand(3, 4)
D = torch.rand(4, 3)
# Sizes are compatible (last of C = first of D)
print("## 3: C @ D:\n", (C @ D).shape)

## 4: adjust dimensions using .T, and compute the matrix product E @ F
E = torch.rand(3, 3)
F = torch.rand(4, 3)
# Sizes are not compatible. We can transpose F so that the size is (3, 4)
print("## 4: E @ F.T:", (E @ F.T).shape)

## 1: when both A and B have the same size then the hadamard product works:
 torch.Size([3, 4])
## 1: B gets enlarged by broadcasting have the same size then the hadamard product works:
 torch.Size([3, 4])
## 2: dot product tensor(1.2744)
alternatively with matmul: tensor(1.2744)
alternatively with @: tensor(1.2744)
## 3: C @ D:
 torch.Size([3, 3])
## 4: E @ F.T: torch.Size([3, 4])
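
Two of the claims above can be checked directly. The sketch below multiplies a batch of matrices by a single matrix (broadcasting over the leading dimension) and shows that `torch.dot` rejects 2-dimensional inputs; the names `batch` and `M` are illustrative.

```python
import torch

batch = torch.rand(10, 3, 4)   # ten matrices of size (3, 4)
M = torch.rand(4, 5)
print((batch @ M).shape)       # M is applied to every matrix in the batch: (10, 3, 5)

C = torch.rand(3, 4)
D = torch.rand(4, 3)
try:
    torch.dot(C, D)
except RuntimeError as err:
    print("dot refuses 2-dimensional inputs:", err)

# For 1-dimensional tensors, the dot product is the sum of the element-wise product
v = torch.rand(3)
w = torch.rand(3)
print(torch.allclose(v.dot(w), (v * w).sum()))
```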


Now, we try to verify empirically that GPUs are actually faster than CPUs at doing large calculations.

Create large tensors $G, H$ both of size (15000, 15000). Take their matrix product and measure how long it takes (use `%%time` cell magic notebook function).

In [33]:
# Create G and H
G = torch.rand(15000, 15000)
H = torch.rand(15000, 15000)

In [34]:
%%time
G @ H

CPU times: user 1min 37s, sys: 2.23 s, total: 1min 39s
Wall time: 9.1 s


tensor([[3747.3000, 3708.0327, 3715.6504,  ..., 3745.9561, 3738.2327,
         3755.8008],
        [3786.8828, 3728.3523, 3704.2632,  ..., 3764.4802, 3742.7209,
         3758.7354],
        [3792.3147, 3735.1011, 3735.4006,  ..., 3760.5247, 3771.2302,
         3780.5728],
        ...,
        [3792.9155, 3742.2104, 3725.9783,  ..., 3759.2773, 3745.9236,
         3777.8081],
        [3777.2478, 3744.6997, 3736.3315,  ..., 3783.3958, 3738.6348,
         3775.3091],
        [3783.7690, 3756.2952, 3728.4785,  ..., 3748.6155, 3739.5891,
         3749.5991]])

Move $G$ and $H$ to the most convenient device at your disposal (different from the CPU, if possible), and compute the same matrix product.

In [35]:
# Move the tensors to GPU in another cell, so that the time is not counted.
G = G.to(device)
H = H.to(device)

In [36]:
%%time
G @ H

CPU times: user 7.09 ms, sys: 7.59 ms, total: 14.7 ms
Wall time: 18.4 ms


tensor([[3747.3137, 3708.0364, 3715.6570,  ..., 3745.9404, 3738.2397,
         3755.8044],
        [3786.8901, 3728.3479, 3704.2625,  ..., 3764.4812, 3742.7344,
         3758.7249],
        [3792.3008, 3735.1030, 3735.3962,  ..., 3760.5261, 3771.2334,
         3780.5669],
        ...,
        [3792.9199, 3742.2112, 3725.9788,  ..., 3759.2810, 3745.9275,
         3777.8079],
        [3777.2510, 3744.6982, 3736.3340,  ..., 3783.3857, 3738.6384,
         3775.3181],
        [3783.7734, 3756.2986, 3728.4778,  ..., 3748.6125, 3739.5864,
         3749.5984]], device='mps:0')

Side note: on my laptop (MacBook) I noticed a speed-up of roughly $\times 10$. On Kaggle the performance improvement is much larger (from >20'' to <<1').
When you have done this task, you might want to shut down your notebook and start from the cells below since resource usage might be quite demanding. Also, if you are using Kaggle, you might consider shutting down the GPU, since the bottom cells can be done with CPU only.
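
A word of caution on the measurement itself: GPU operations are typically launched asynchronously, so a timer may stop before the computation has actually finished. A more careful measurement waits for the device before reading the clock. The sketch below is one possible way to do it, not the method used for the numbers above: `timed_matmul` is a hypothetical helper, `torch.mps.synchronize()` is assumed to be available (recent PyTorch versions), and the matrices are kept small so that it runs quickly anywhere.

```python
import time
import torch

def timed_matmul(a, b):
    """Return (result, seconds), waiting for pending GPU work before reading the clock."""
    def sync():
        # On the CPU there is nothing to wait for
        if a.device.type == "cuda":
            torch.cuda.synchronize()
        elif a.device.type == "mps":
            torch.mps.synchronize()

    sync()
    start = time.perf_counter()
    result = a @ b
    sync()
    return result, time.perf_counter() - start

n = 1000  # much smaller than 15000 so the sketch runs quickly
G = torch.rand(n, n)
H = torch.rand(n, n)
product, seconds = timed_matmul(G, H)
print(f"{G.device}: {seconds:.4f} s")
```

To compare devices, call the helper once with CPU tensors and once with `G.to(device)` and `H.to(device)`.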

### PyTorch functionals

As you will learn by attending this class, one of the key features of neural networks is their non-linear functions.
PyTorch already implements many of them in the package `torch.nn.functional`.
Create a random tensor $A$ of size (3, 2) and apply to it:

1. [ReLU](https://pytorch.org/docs/stable/generated/torch.nn.functional.relu.html)
2. [Tanh](https://pytorch.org/docs/stable/generated/torch.nn.functional.tanh.html)
3. [Sigmoid](https://pytorch.org/docs/stable/generated/torch.nn.functional.sigmoid.html)
4. [Softmax](https://pytorch.org/docs/stable/generated/torch.nn.functional.softmax.html) (it requires an axis: pick axis 1, predict the output shape).

If you are not familiar with them don't worry, you will learn in the remainder of the course what these functions are used for.

In [37]:
import torch
import torch.nn.functional as F

In [39]:
# Create A in the interval (-5, 5)
A = torch.rand(3, 2) * 10 - 5
print(A)

## 1: apply F.relu
print("## 1: ReLU. Notice that negative numbers are set to 0.\n", F.relu(A))

## 2: apply F.tanh
print("## 2: tanh. Notice that numbers are in the interval (-1, 1)\n", F.tanh(A))

## 3: apply F.sigmoid
print("## 3: sigmoid. Notice that numbers are in the interval (0, 1)\n", F.sigmoid(A))

## 4: apply F.softmax with dim=1
print("## 4: softmax, notice that along axis 1 numbers sum to 1. The shape is still (3, 2).\n", F.softmax(A, dim=1))

tensor([[-4.7276,  1.4128],
        [-0.1018,  3.6263],
        [ 1.6128,  4.2273]])
## 1: ReLU. Notice that negative numbers are set to 0.
 tensor([[0.0000, 1.4128],
        [0.0000, 3.6263],
        [1.6128, 4.2273]])
## 2: tanh. Notice that numbers are in the interval (-1, 1)
 tensor([[-0.9998,  0.8881],
        [-0.1015,  0.9986],
        [ 0.9236,  0.9996]])
## 3: sigmoid. Notice that numbers are in the interval (0, 1)
 tensor([[0.0088, 0.8042],
        [0.4746, 0.9741],
        [0.8338, 0.9856]])
## 4: softmax, notice that along axis 1 numbers sum to 1. The shape is still (3, 2).
 tensor([[0.0021, 0.9979],
        [0.0235, 0.9765],
        [0.0682, 0.9318]])
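
To connect these functions with their definitions, the sketch below recomputes three of them by hand and compares the results with the library versions: $\mathrm{ReLU}(a) = \max(0, a)$, $\sigma(a) = 1/(1+e^{-a})$ and $\mathrm{softmax}(a)_j = e^{a_j} / \sum_k e^{a_k}$. The `*_manual` names are illustrative, and the naive softmax is only meant for small inputs like these.

```python
import torch
import torch.nn.functional as F

A = torch.rand(3, 2) * 10 - 5

relu_manual = torch.clamp(A, min=0)
sigmoid_manual = 1 / (1 + torch.exp(-A))
softmax_manual = torch.exp(A) / torch.exp(A).sum(dim=1, keepdim=True)

print(torch.allclose(relu_manual, F.relu(A)))
print(torch.allclose(sigmoid_manual, torch.sigmoid(A)))
print(torch.allclose(softmax_manual, F.softmax(A, dim=1)))
print(F.softmax(A, dim=1).sum(dim=1))   # each row sums to 1
```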


## Gradients

One of the most useful features of PyTorch is that it can compute the gradient of functions automatically. 
As you will see, the gradient of a function is one of the key ingredients of the backpropagation algorithm, used to train neural nets.

Assume we have tensors $x = [2]$ and $y = [2]$, and define $z = 2x^2 + 3y = [14]$.

We know that $\frac{\partial z}{\partial x} = 4x$ and $\frac{\partial z}{\partial y} = 3$. Since we are evaluating at the point $x=2, y=2$, the gradient is (8, 3). If we specify the option `requires_grad=True`, the gradients are going to be stored in `x.grad` and `y.grad`. We can let PyTorch compute the gradients by invoking `z.backward()`. Check that `x.grad` and `y.grad` indeed hold the desired values.


In [40]:
x = torch.tensor([2], dtype=torch.float64, requires_grad=True)
y = torch.tensor([2], dtype=torch.float64, requires_grad=True)
z = 2 * x*x + 3 * y
z.backward()
print(x.grad)
print(y.grad)

tensor([8.], dtype=torch.float64)
tensor([3.], dtype=torch.float64)
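
The same function can be differentiated at several points at once. Called without arguments, `backward()` expects a scalar, so in this sketch the element-wise results are summed first; each entry of `x.grad` is then $4x_i$ and each entry of `y.grad` is 3. The input values are illustrative.

```python
import torch

x = torch.tensor([1., 2., 3.], requires_grad=True)
y = torch.tensor([2., 2., 2.], requires_grad=True)

# Sum the element-wise results to obtain a scalar before calling backward()
z = (2 * x**2 + 3 * y).sum()
z.backward()

print(x.grad)   # 4 * x for every entry
print(y.grad)   # 3 for every entry
```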


1. Create tensors $s = [1]$ and $t = [1]$, define a new variable $w = 5s + 6$ and compute its gradients. What is the gradient associated with $t$? (Notice that $w$ does not depend on $t$). 
2. What happens if I try to define an integer tensor with `requires_grad=True`?
3. What happens if I call `numpy()` on a tensor that has `requires_grad=True`?

In [43]:
s = torch.tensor([1.], requires_grad=True)
t = torch.tensor([1.], requires_grad=True)
w = 5 * s + 6
w.backward()

## 1: gradient of t for w = 5s + 6.
print("## 1: the gradient wrt t is:", t.grad, ", whereas the gradient for s is:", s.grad)

## 2: integer tensor with requires_grad.
try:
    t = torch.tensor([1], requires_grad=True)
except RuntimeError:
    print("## 2: only floating point tensors can have require_gradients=True!")

## 3: compute .numpy() of a tensor with requires_grad.
try:
    s_numpy = s.numpy()
except RuntimeError:
    print("## 3: only tensors without gradients can be turned to NumPy!")

## 1: the gradient wrt t is: None , whereas the gradient for s is: tensor([5.])
## 2: only floating point tensors can have require_gradients=True!
## 3: only tensors without gradients can be turned to NumPy!
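
Both errors have standard workarounds, sketched below: `detach()` returns a tensor that is cut off from the computation graph and can be converted to NumPy, `torch.no_grad()` disables gradient tracking inside its block, and an integer tensor can be converted to floating point before gradients are requested.

```python
import torch

s = torch.tensor([1.], requires_grad=True)

# detach() gives a tensor without gradient tracking, which numpy() accepts
s_numpy = s.detach().numpy()
print(s_numpy)

# Inside no_grad(), operations are not recorded for backpropagation
with torch.no_grad():
    w = 5 * s + 6
print(w.requires_grad)

# Convert integers to floating point first, then enable gradients in place
t = torch.tensor([1]).float().requires_grad_()
print(t.dtype, t.requires_grad)
```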


In this last section we point out a very important feature of gradients, namely that they are *cumulative*: every call to `backward()` adds to the values already stored in `.grad`. In order to see what that means, let's work through the example that was given in class:

1. Create tensors $x=[2], y=[3]$ (with flag `requires_grad=True`).
2. Compute $z = x * x + y$ and perform the backward pass.
3. Check that the gradients are as expected: $\frac{\partial z}{\partial x}=2x=4$, $\frac{\partial z}{\partial y} = 1$.
4. Compute $g = xy + 3x$ and perform the backward pass.
5. Check the gradients: we expect $\frac{\partial g}{\partial x}=y + 3=6$, $\frac{\partial g}{\partial y} = x = 2$.
6. Do you see the expected values? Can you explain why? (hint: gradients are *cumulative*).
7. In order to fix this potential issue, call `x.grad.zero_()` and `y.grad.zero_()` in between the computation of $z$ and $g$. Do you observe the expected gradients now?

In [45]:
## 1: create x, y.
x = torch.tensor([2.], requires_grad=True)
y = torch.tensor([3.], requires_grad=True)

## 2: compute z.
z = x * x + y
z.backward()

## 3: check out gradients of x, y.
print("## 3: grad of x:", x.grad, ", grad of y:", y.grad)

## 4: compute g.
g = x * y + 3 * x
g.backward()

## 5: check out gradients of x, y
print("## 5: grad of x:", x.grad, ", grad of y:", y.grad)
print("Gradients are cumulative, hence x.grad = 4 + 6 = 10, y.grad = 1 + 2 = 3.")

## 7: Repeat 1-5 calling .grad.zero_() on x and y
# We'll repeat everything, zeroing the gradients in between z and g:
x = torch.tensor([2.], requires_grad=True)
y = torch.tensor([3.], requires_grad=True)
z = x * x + y
z.backward()
# reset the accumulated gradients in place
x.grad.zero_()
y.grad.zero_()
g = x * y + 3 * x
g.backward()
print("## 7: grad of x:", x.grad, ", grad of y:", y.grad)
print("Now we see the expected gradient!")



## 3: grad of x: tensor([4.]) , grad of y: tensor([1.])
## 5: grad of x: tensor([10.]) , grad of y: tensor([3.])
Gradients are cumulative, hence x.grad = 4 + 6 = 10, y.grad = 1 + 2 = 3.
## 7: grad of x: tensor([6.]) , grad of y: tensor([2.])
Now we see the expected gradient!
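
The accumulation is easiest to see by repeating the same backward pass several times. In this sketch the gradient of $z = x^2$ at $x = 2$ is 4, yet `x.grad` grows by 4 at every iteration. Setting `x.grad = None` is shown as an alternative way to reset it; the name `accumulated` is illustrative.

```python
import torch

x = torch.tensor([2.], requires_grad=True)

for step in range(3):
    z = x * x             # dz/dx = 2x = 4 at x = 2
    z.backward()
    print(step, x.grad)   # 4, then 8, then 12

accumulated = x.grad.clone()

x.grad = None             # discard the accumulated gradient
z = x * x
z.backward()
print(x.grad)             # back to 4
```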
