# NOTE: This notebook is not mandatory to pass the exercise (8 Bonus Points)

As explained in the README file, this notebook covers basic and advanced concepts of PyTorch. It also contains some exercises that will grant you bonus points.

It consists of two sections: the first one is on Tensor Manipulation, while the second one is on Tensor Operations and Einsum.

Prerequisites:

- Python

Authors:
- Andrea Sanchietti
- Niklas Berndt
- Eyvaz Najafli
- Based on a notebook by Prof. Emanuele Rodolà (rodola@di.uniroma1.it) and Dr. Donato Crisostomi (crisostomi@di.uniroma1.it).



# Section 1 - Tensor Manipulation

- This part covers PyTorch Tensors: creation, GPU tensors, shape manipulation, and indexing.


## 1.1 Introduction

Many Deep Learning frameworks have emerged for Python. Arguably the most notable ones in 2024 are **PyTorch**, **TensorFlow** (with the Keras frontend) and **JAX**.
We will use PyTorch, which is [the leading DL framework](https://thegradient.pub/state-of-ml-frameworks-2019-pytorch-dominates-research-tensorflow-dominates-industry/) for research and [continues to gain popularity](https://openai.com/blog/openai-pytorch/).

The fundamental data structure of these frameworks is the **tensor**, which is more or less the same everywhere. _A solid understanding of how tensors work is required in deep learning_ and will definitely come in handy in other areas.

This part of the exercise concerns the basics of tensors and operations between tensors.

## 1.2 Numpy

The adoptive father of Python's deep learning frameworks is Numpy, the historical library which added support for large, multi-dimensional arrays and matrices to Python.

As we will see, modern deep learning frameworks (and especially PyTorch) have drawn largely from Numpy's API, while at the same time overcoming its limitations such as the absence of GPU support or automatic differentiation. The student has become the master.

![img](https://i.imgur.com/KaUdmee.png)

We will mainly use PyTorch tensors for implementing our Deep Learning systems, but knowing how to use Numpy remains very important. Note that:

- **Numpy arrays** and **PyTorch tensors** are very similar: most of the features that we will explain for PyTorch tensors also apply to Numpy arrays.
- In real DL systems you need to constantly switch between PyTorch and Numpy.
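
As a minimal sketch of such a round trip (the variable names are illustrative, and CPU tensors are assumed): on CPU, `torch.from_numpy` and `Tensor.numpy` share the underlying memory instead of copying it, so a change made on one side is visible on the other.

```python
import numpy as np
import torch

n = np.zeros(3, dtype=np.float32)
t = torch.from_numpy(n)   # t shares memory with n (no copy is made)

n[0] = 1.0                # modify the NumPy array...
print(t)                  # ...and the change is visible through the tensor

back = t.numpy()          # back to NumPy, again without copying
t[1] = 2.0                # modify the tensor...
print(back)               # ...and the change is visible through the array
```

When an independent copy is needed, `t.clone()` (PyTorch) or `n.copy()` (NumPy) can be used before converting.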

In [None]:
import numpy as np

## 1.3 PyTorch

During the exercise we'll use and learn many parts of the PyTorch API.
You should also familiarize yourself with the [PyTorch Documentation](https://pytorch.org/docs/stable/), as it will greatly assist you.




In [None]:
import torch
torch.__version__

### 1.3.1 **PyTorch Tensor**

The ``Tensor`` class is very similar to numpy's ``ndarray`` and provides most of its functionality.


However, it also has two important distinctions:

- ``Tensor`` supports GPU computations.
- ``Tensor`` may store extra information needed for back-propagation:
  - The gradient tensor w.r.t. some variable (e.g. the loss)
  - A node representing an operation in the computational graph that produced this tensor.
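
The extra information listed above can be inspected directly. The following is a small sketch, assuming a toy function $y = \sum_i x_i^2$ chosen only for illustration: `requires_grad=True` asks PyTorch to track operations on `x`, `y.grad_fn` is the graph node that produced `y`, and after `backward()` the gradient is stored in `x.grad`.

```python
import torch

# Ask PyTorch to record the operations applied to x
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

y = (x ** 2).sum()   # y = x_1^2 + x_2^2 + x_3^2
print(y.grad_fn)     # the node of the computational graph that produced y

y.backward()         # back-propagate from y
print(x.grad)        # dy/dx = 2 * x
```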


Keep in mind:
- Usually **tensor operations are not in-place**.

#### **Tensor instantiation**

A tensor represents an n-dimensional grid of values, **all of the same type**.
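
A short sketch of what "all of the same type" implies in practice (the variable names are illustrative): when the input values have mixed Python types, PyTorch picks a single dtype for the whole tensor, and an explicit conversion returns a new tensor.

```python
import torch

ints = torch.tensor([1, 2, 3])      # only integers: stored as torch.int64
mixed = torch.tensor([1, 2.5, 3])   # one float: everything is stored in the default float dtype
print(ints.dtype, mixed.dtype)

as_float = ints.to(torch.float32)   # explicit conversion, returns a new tensor
print(as_float.dtype, ints.dtype)   # ints itself is unchanged
```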

In [None]:
# Basic tensor creation from python lists
torch.tensor([[1, 2, 3], [4, 5, 6]], dtype=torch.int32)

In [None]:
# Some other tensor construction methods
torch.zeros((3,5))

In [None]:
torch.ones((2,5), dtype=torch.float64)

In [None]:
torch.eye(4)

In [None]:
torch.rand((2,2))  # from which distribution are these random numbers sampled? Check the PyTorch documentation

**Pro tip**: Bookmark the [PyTorch docs](https://pytorch.org/docs/stable/).

In [None]:
torch.randint(0, 100, (3,3))

In [None]:
t = torch.rand((3, 3))
torch.ones_like(t)

One can easily convert to/from Numpy arrays:

In [None]:
t = torch.rand((3, 3), dtype=torch.float32)
t.numpy()

In [None]:
n = np.random.rand(3,3).astype(np.float16)
torch.from_numpy(n)

There are many other functions available to create tensors!
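
A few of them, as an illustrative and by no means exhaustive sample:

```python
import torch

r = torch.arange(0, 10, 2)        # integers from 0 (included) to 10 (excluded), step 2
l = torch.linspace(0.0, 1.0, 5)   # 5 evenly spaced values between 0 and 1 (both included)
f = torch.full((2, 3), 7.0)       # a 2x3 tensor filled with the value 7
z = torch.zeros_like(f)           # zeros with the same shape and dtype as f

print(r)
print(l)
print(f)
print(z)
```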

> **EXERCISE**
>
> Create a matrix $M \in \mathbb{R}^{3 \times 3}$ that is filled with 2 along the diagonal and 1 elsewhere, that is:
>
> $$
m_{ij} =
\begin{cases}
2 & \text{if } i = j \\
1 & \text{otherwise}
\end{cases}
$$

In [None]:
# 📝 write your solution in this cell

Expected Output:

```
tensor([[2., 1., 1.],
        [1., 2., 1.],
        [1., 1., 2.]])
```

#### **Tensor properties**

The **type** of a tensor is the type of each element contained in the tensor:

In [None]:
t = torch.rand((3, 3))
t.dtype


The **shape** of a tensor is a tuple of integers giving the size of the tensor along each dimension, e.g. for a matrix $M \in \mathbb{R}^{3 \times 5}$:

In [None]:
t = torch.rand((3,5))
t.shape

The **device** of a tensor indicates the memory in which the tensor is currently stored: RAM (denoted as ``cpu``) or GPU memory (denoted as ``cuda``)

In [None]:
t = torch.rand((3,5))
t.device

> **EXERCISE**
>
> Given a matrix $X \in \mathbb{R}^{m \times n}$, create another matrix $Y \in \mathbb{R}^{m \times 3}$ filled with ones using $X$.

In [None]:
# Exercise variables
X = torch.rand(20,42)

# Your solution:
# Y = ?

Expected Output:

```
tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]])
```

#### **Using the GPU**

> If you don't have a GPU, the code of this section may not work

Thanks to the explosion of the videogame industry in the last 50 years, the performance of the chips specialized in rendering and processing graphics --known as GPUs-- has dramatically improved.

In 2007 NVIDIA realized the potential of parallel GPU computing outside the videogame world, and released the first version of the CUDA framework, allowing software developers to use GPUs for general purpose processing.

Graphics operations are mostly linear algebra operations, and accelerating them can turn out to be very useful in many other fields.

In 2012 Krizhevsky, Sutskever, and Hinton [demonstrated](https://en.wikipedia.org/wiki/AlexNet) the huge potential of GPUs in training deep neural networks, starting *de facto* the glorious days of deep learning.

In [None]:
# Check if the GPU is available
torch.cuda.is_available()

In [None]:
# If available use the GPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device

In [None]:
t = torch.rand((3,3))
t = t.to(device)  # Note that we are assigning back to t, otherwise t won't be updated!
t

In [None]:
# Construct tensors directly in the memory of the target device
# (`device` was defined above and falls back to the CPU when no GPU is available)
t = torch.ones((5, 5), device=device)
t

In [None]:
t = torch.rand((3,3))

# Other shortcuts to transfer tensors between devices

# Be careful of hardcoded cuda calls: the code will not run if a GPU is not available
t = t.cuda()
t

In [None]:
t = t.cpu()
t

In [None]:
# Utility function to print tensors nicely. We will use this all the time.

from typing import Union

def print_arr(
    *arr: Union[torch.Tensor, np.ndarray], prefix: str = ""
) -> None:
    """
    Pretty print tensors, together with their shape and type

    :param arr: one or more tensors
    :param prefix: prefix to use when printing the tensors
    """
    print(
        "\n\n".join(
            f"{prefix}{str(x)} <shape: {x.shape}> <dtype: {x.dtype}>" for x in arr
        )
    )

t = torch.rand((3,3), dtype=torch.float32)
print_arr(t, prefix='My tensor = ')

#### **Tensor rank**

In Numpy and PyTorch, the **rank of a tensor** denotes the number of dimensions. For example, any matrix is a tensor of rank 2.

Don't confuse this with the rank of a matrix, which has a completely different meaning in linear algebra!
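
To make the difference concrete, here is a small sketch using an all-ones matrix (chosen only for illustration): its tensor rank, i.e. its number of dimensions, is 2, while its linear-algebra rank is 1 because all of its rows are identical.

```python
import torch

m = torch.ones((3, 3))

print(m.ndim)                        # number of dimensions (tensor rank): 2
print(torch.linalg.matrix_rank(m))   # linear-algebra rank: 1, all rows are equal
```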

- **rank-0** tensors are just scalars

In [None]:
t0 = torch.tensor(3, dtype=torch.double)

print_arr(t0)  # notice torch.Size in the printed output

In [None]:
item = t0.item()  # convert the tensor scalar to a python base type
item, type(item)

In [None]:
# Be careful, a non-scalar tensor cannot be converted with an .item() call
try:
  x = torch.ones(3).item()
except RuntimeError as e:
  print('Error:', e)

- **rank-1** tensors are sequences of numbers. A sequence of length ``n`` has the shape ``(n,)``

In [None]:
# A rank-1 tensor
t1 = torch.tensor([1, 2, 3])

print_arr(t1)

In [None]:
# A rank-1 tensor with a single scalar
print_arr(torch.tensor([42]))

`.item()` is not limited to rank-0 tensors: it works on any tensor that contains exactly one element, whatever its rank.

Note that this is not **broadcasting**, which is a different mechanism covered in Section 2.

In [None]:
# A rank-1 tensor with a single element can be converted to a Python scalar
torch.tensor([42]).item()

- **rank-2** tensors have the shape ``(n, m)``

In [None]:
t2 = torch.tensor([[1, 2, 3], [4, 5, 6]])

print_arr(t2)

In [None]:
# element (i,j) of a rank-2 tensor just means the j-th element of the i-th rank-1 tensor
t2[1, 2].item()

In [None]:
# To mimic the notion of a column vector from linear algebra, we can use a rank-2 tensor
t_col = t1.reshape(-1, 1)

print_arr(t_col)

In [None]:
# ...and similarly for row vectors
t_row = t1.reshape(1, -1)

print_arr(t_row)

- **rank-k** tensors have a shape of $(n_1, \dots, n_k)$

In [None]:
print_arr(torch.zeros((2, 3, 4)))

In [None]:
print_arr(torch.ones((2, 2, 2, 2)))

> **EXERCISE**
>
> Build a tensor $X \in \mathbb{R}^{k \times k}$ filled with zeros and the sequence $[0, ..., k-1]$ along the diagonal

In [None]:
# your solution
k = 12
# ...

Expected Output for k=12:

```
tensor([[ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
        [ 0,  1,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
        [ 0,  0,  2,  0,  0,  0,  0,  0,  0,  0,  0,  0],
        [ 0,  0,  0,  3,  0,  0,  0,  0,  0,  0,  0,  0],
        [ 0,  0,  0,  0,  4,  0,  0,  0,  0,  0,  0,  0],
        [ 0,  0,  0,  0,  0,  5,  0,  0,  0,  0,  0,  0],
        [ 0,  0,  0,  0,  0,  0,  6,  0,  0,  0,  0,  0],
        [ 0,  0,  0,  0,  0,  0,  0,  7,  0,  0,  0,  0],
        [ 0,  0,  0,  0,  0,  0,  0,  0,  8,  0,  0,  0],
        [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  9,  0,  0],
        [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0, 10,  0],
        [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0, 11]]) <shape: torch.Size([12, 12])> <dtype: torch.int64>
```

> **EXERCISE**
>
> What is the shape of the following tensor?
>
> ```python
> torch.tensor(
>     [
>         [[1.0, 1.0, 1.0],
>          [1.0, 1.0, 1.0]],
>
>         [[1.0, 1.0, 1.0],
>          [1.0, 1.0, 1.0]],
>
>         [[1.0, 1.0, 1.0],
>          [1.0, 1.0, 1.0]],
>
>         [[1.0, 1.0, 1.0],
>          [1.0, 1.0, 1.0]],
>     ]
> )
> ```



In [None]:
# Think about it, then confirm your answer by writing code here

### 1.3.2 **Changing and adding dimensions**

PyTorch provides several functions to manipulate tensor shapes.


#### **Transpose dimension**

In [None]:
a = torch.ones((3, 5))
a[0, -1] = 0  # index -1 denotes the last element, as in common python indexing
print("a: ")
print_arr(a)

In [None]:
a.T

In [None]:
a.transpose(1, 0)  # Swap dimension 1 and 0

In [None]:
torch.einsum('ij -> ji', a)  # transpose using Einstein notation

# We will explain the Einstein notation later

#### Transpose in k-dimensions and in Numpy


In [None]:
a = torch.ones((2, 3, 6))
a[1, 2, 4] = 42
print_arr(a)

In [None]:
a.transpose(2, 1)

In [None]:
torch.einsum('ijk->ikj', a)

Shortcuts are handy, but your code becomes less readable.
Most of the time readability is the most important goal to aim for!

> **NOTE**
>
> In Numpy the transpose function is different!
>
> PyTorch:
> `torch.transpose(input, dim0, dim1) → Tensor`
>
> NumPy:
> `numpy.transpose(a, axes=None) -> numpy.ndarray`
>
> Compare the docs from [numpy](https://numpy.org/doc/stable/reference/generated/numpy.transpose.html) and [pytorch](https://pytorch.org/docs/stable/generated/torch.transpose.html)
>
> In PyTorch the transpose swaps two dimensions. In NumPy you can specify a complete mapping to change all the dimensions.
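
The PyTorch counterpart of NumPy's full-mapping `transpose` is `permute`. A minimal sketch on an arbitrary `(2, 3, 4)` shape, chosen only for illustration:

```python
import numpy as np
import torch

n = np.zeros((2, 3, 4))
t = torch.zeros((2, 3, 4))

# NumPy: transpose takes the complete new order of the axes
print(n.transpose(2, 0, 1).shape)   # (4, 2, 3)

# PyTorch: permute takes the complete new order of the dimensions...
print(t.permute(2, 0, 1).shape)     # torch.Size([4, 2, 3])

# ...while transpose only swaps the two given dimensions
print(t.transpose(0, 2).shape)      # torch.Size([4, 3, 2])
```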

In [None]:
a = np.arange(10).reshape(2, 5)
a

In [None]:
a.transpose(1, 0)

In [None]:
a.transpose(0, 1)

In [None]:
torch.from_numpy(a).transpose(0, 1)

In [None]:
# einsum is cross-platform: it works with consistent semantics
# pretty much everywhere: PyTorch, NumPy, TensorFlow, JAX, ...
# We will see the power of einsum in Section 2
np.einsum('ij -> ji', a)

#### **Reshape**

Another important feature is **reshaping** a tensor into different dimensions:

- We need to make sure to **preserve the same number of elements**.
- `-1` in one of the dimensions means **"figure it out"**.


❌❌❌ Pay attention that **transposing and reshaping are two fundamentally different operations**:

In [None]:
a = torch.arange(12).reshape(3, 4)
a

In [None]:
# The classical transpose
a.t()

In [None]:
# Reshape into the transpose shape
a.reshape(4, 3)

#### **What is `reshape` really doing?**



Think of the `reshape` operation as unrolling the tensor **row-wise**, to obtain a rank-1 tensor *(MATLAB users: MATLAB unrolls **column-wise**, pay attention when converting code!)*. Then it arranges the values of this rank-1 tensor according to the specified dimensions.

```python
tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])
```
$-$ unrolling $ \to $

```python
tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
```

Then, reading the target shape from right to left, organize the values into the dimensions:

- e.g. reshape into `[4, 3]`:

```python
tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
```

$-$ organize in groups of $3$ $ \to $

```python
tensor([[0,  1,  2],  [3,  4,  5],  [6,  7,  8],  [9, 10, 11]])
```

$-$ organize in groups of $4$ $ \to $

```python
tensor([[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8],
        [ 9, 10, 11]])

# same shape as the corresponding transpose, but the values are arranged differently!
```

- e.g. reshape into `[2, 2, 3]`:

```python
tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
```

$-$ organize in groups of $3$ $ \to $

```python
tensor([[0,  1,  2],  [3,  4,  5],  [6,  7,  8],  [9, 10, 11]])
```

$-$ organize in groups of $2$ $ \to $

```python
tensor([[[0,  1,  2],  [3,  4,  5]],  [[6,  7,  8],  [9, 10, 11]]])
```

$-$ organize in groups of $2$ $ \to $

```python
tensor([[[ 0,  1,  2],
         [ 3,  4,  5]],

        [[ 6,  7,  8],
         [ 9, 10, 11]]])
```
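
The walkthrough above can be checked with a short sketch (variable names are illustrative): unrolling first and regrouping afterwards gives the same result as a direct `reshape`, and that result differs from the transpose even though the shapes agree.

```python
import torch

a = torch.arange(12).reshape(3, 4)

flat = a.reshape(-1)      # unroll row-wise into a rank-1 tensor
r = flat.reshape(4, 3)    # regroup into 4 rows of 3 values each

print(torch.equal(r, a.reshape(4, 3)))   # True: same as reshaping directly
print(r.shape == a.t().shape)            # True: same shape as the transpose...
print(torch.equal(r, a.t()))             # False: ...but different values
```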

In [None]:
a = torch.arange(12)
print_arr(a)

In [None]:
a.reshape(6, 2)

In [None]:
a.reshape(2, 6)

In [None]:
a.reshape(2, 2, 3)

In [None]:
try:
  a.reshape(5, -1)
except RuntimeError as e:
  print('Error:', e)

In [None]:
a.reshape(1, -1)

In [None]:
a.reshape(-1, 1)

In [None]:
a.reshape(-1)  # we are flattening the rank-k tensor into a rank-1 tensor

> **NOTE**
>
> We can add or remove dimensions of size `1` using `torch.unsqueeze` or `torch.squeeze`

In [None]:
a

In [None]:
a.shape

In [None]:
a.unsqueeze(0).shape  # adds a new dimension at the beginning

In [None]:
a.unsqueeze(-1).shape  # adds a new dimension at the end

> **NOTE**
>
> Often the reshape does not require a physical copy of the data, but just a logical
> reorganization.
>
> If you are curious about the NumPy/PyTorch tensor internals, a good starting point to learn about *strides* is this [SO answer](https://stackoverflow.com/questions/53097952/how-to-understand-numpy-strides-for-layman).
> TL;DR: often you can reshape a tensor by changing only its strides and shape. The strides are the separation between consecutive items along each dimension, measured in bytes in NumPy and in number of elements in PyTorch.
>
> To be sure to obtain a *view* of the tensor, which shares the same underlying data, you can use the `Tensor.view` method.
> Its semantics are similar to `reshape`, but it works only when the requested shape is compatible with the current strides of the tensor (which is always the case for [`contiguous` tensors](https://discuss.pytorch.org/t/contigious-vs-non-contigious-tensor/30107/2)): it never copies the data and raises an error otherwise.
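
A small sketch of these internals, under the assumption of a CPU `int64` tensor (8 bytes per element); `data_ptr()` is used here only as a simple way to check whether two tensors start at the same memory address.

```python
import torch

a = torch.arange(12).reshape(3, 4)
print(a.stride())                     # (4, 1): strides in elements (PyTorch)
print(a.numpy().strides)              # (32, 8): strides in bytes (NumPy), 8 bytes per int64

v = a.view(2, 6)                      # a view: same data, different shape and strides
print(v.data_ptr() == a.data_ptr())   # True

t = a.t()                             # the transpose is a non-contiguous view
print(t.stride(), t.is_contiguous())  # (1, 4) False

try:
    t.view(-1)                        # no stride-only solution exists here
except RuntimeError as e:
    print("Error:", e)

c = t.contiguous().view(-1)           # make a contiguous copy first, then view it
print(c.data_ptr() == a.data_ptr())   # False: the data was copied
```

In the last step `t.reshape(-1)` would behave the same way, copying the data only because a view is not possible.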

> **EXERCISE**
>
> Given a sequence of increasing numbers from `0` to `9`, defined as:
>
> ```python
> a = torch.arange(10)
> ```
>
> Use only the `reshape` and `transpose` functions to obtain the following tensor from `a`:
>
> ```python
> tensor([0, 2, 4, 6, 8, 1, 3, 5, 7, 9])
> ```

In [None]:
a = torch.arange(10)

# Write your solution here


print(a)

#### **Concatenation**

PyTorch provides many functions to manipulate tensors.
Two of the most common functions are:

- `torch.stack`: Adds a **new** dimension, and concatenates the given tensors along that dimension.
- `torch.cat`: Concatenates the given tensors along one of the **existing** dimensions.
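
One way to keep the two apart, sketched below with illustrative tensors: stacking is the same as first adding a new dimension of size 1 to every tensor with `unsqueeze` and then concatenating along it.

```python
import torch

a = torch.zeros(3, 4)
b = torch.ones(3, 4)

stacked = torch.stack((a, b), dim=0)                            # new dimension: shape (2, 3, 4)
via_cat = torch.cat((a.unsqueeze(0), b.unsqueeze(0)), dim=0)    # same result using cat

print(stacked.shape)
print(torch.equal(stacked, via_cat))     # True
print(torch.cat((a, b), dim=0).shape)    # existing dimension: shape (6, 4)
```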

In [None]:
a = torch.arange(12).reshape(3, 4)
b = torch.arange(12).reshape(3, 4) + 100
print_arr(a, b)

In [None]:
out = torch.stack((a, b), dim=0)
print_arr(out)

In [None]:
out = torch.cat((a, b), dim=0)
print_arr(out)

In [None]:
out = torch.cat((a, b), dim=1)
print_arr(out)

> **EXERCISE**
>
> Given a tensor $X \in \mathbb{R}^{3 \times 1920 \times 5 \times 1080}$ reorganize it in order to obtain a tensor $Y \in \mathbb{R}^{5 \times 1920 \times 1080 \times 3}$
>
> Think of $X$ as a tensor that represents $5$ RGB images of size $1080\times 1920$. Your goal is to reorganize this tensor in a sensible (and usable) way.
>
> *HINT: there are different ways of solving this problem. Look at the documentation for **transpose** or **permute***

In [None]:
a = torch.rand(3, 1920, 5, 1080)
a.shape

In [None]:
# Your solution


### 1.3.3 **Tensor indexing**

PyTorch offers several ways to index tensors


#### **Standard indexing**

Like standard Python lists, PyTorch tensors support the Python indexing conventions:

In [None]:
a = torch.arange(10)
a

In [None]:
print(a[0])  # first element
print(a[5])  # sixth element

In [None]:
print(a[-1])  # last element
print(a[-2])  # second-to-last element

#### **Multidimensional indexing**

Since tensors may be multidimensional, you can specify **one index for each dimension**:

In [None]:
a = torch.arange(10).reshape(2, 5)
a

In [None]:
a[1, 3]

In [None]:
a[0]

In [None]:
a[1]

In [None]:
a[0, -1]

#### **Slicing**

Similar to Python sequences and Numpy arrays, PyTorch tensors can be easily sliced using the slice notation:

```python
a[start:stop]  # items from start to stop-1 (i.e. the last element is excluded)
a[start:]      # items from start through the rest of the array
a[:stop]       # items from the beginning through stop-1
a[:]           # the whole array (a shallow copy for a list, a view for a tensor)
```

There is also an optional step value, which can be used with any of the above:

```python
a[start:stop:step] # from start to at most stop-1, by step
```
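
One difference from Python lists and NumPy arrays is worth a sketch: to the best of our knowledge, at the time of writing PyTorch does not accept a negative `step`, so the usual `a[::-1]` idiom raises an error. `torch.flip` can be used to reverse a tensor instead.

```python
import torch

a = torch.arange(5)

try:
    a[::-1]   # works for lists and NumPy arrays, but not for PyTorch tensors
except ValueError as e:
    print("Error:", e)

rev = torch.flip(a, dims=[0])   # reverse along dimension 0 (returns a copy)
print(rev)
```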

In [None]:
# Sum with scalar acts element-wise
a = torch.arange(10) + 10
a

In [None]:
# Take the elements in positions 5..6
a[5:7]

In [None]:
# Take the last 5 elements
a[-5:]

In [None]:
# Select every element having an even index
a[::2]

With multidimensional tensors we can perform **multidimensional slicing**:

In [None]:
a = torch.arange(10).reshape(2, 5)
a

In [None]:
# Take the second column
a[:, 1]

In [None]:
# Take the last column
a[:, -1]

In [None]:
# Take a slice from the last row
a[-1, -3:]

You can **assign** to sliced tensors, therefore *modifying the original tensor*.

This means that sliced tensors are **views**, not copies: the resulting tensors **share the underlying data** with the original tensor.
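
When an independent copy is needed instead, `clone()` can be called on the slice. A minimal sketch with illustrative names:

```python
import torch

a = torch.arange(10).reshape(2, 5)

view = a[0, :3]           # shares memory with a
copy = a[0, :3].clone()   # independent copy of the same values

view[0] = -1              # writes through to a
copy[1] = -2              # leaves a untouched

print(a[0])               # tensor([-1,  1,  2,  3,  4])
```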

In [None]:
a = torch.arange(10).reshape(2, 5)
a

In [None]:
b = a[0:2, 1:3]
b

In [None]:
b[-1, :] = -999
b

In [None]:
# The original tensor has been modified
a

In [None]:
a[-1, -1] = -1
a

> **NOTE**
>
> Indexing with **integers yields lower rank tensors**
>
> Integer indexing simply means we don't use slices (:) or boolean masks for indexing.

In [None]:
a = torch.arange(12).reshape(3, 4)
print_arr(a)

In [None]:
# Rank-1 view of the second row of a
row_r1 = a[1, :]
print_arr(row_r1)  # notice the rank of the resulting tensor, which is now lower than that of the original tensor

In [None]:
# Rank-2 view of the second row of a
row_r2 = a[1:2, :]
print_arr(row_r2)

In [None]:
# Rank-2 copy of the second row of a (indexing with a list copies the data instead of returning a view)
row_r3 = a[[1], :]
print_arr(row_r3)

In [None]:
# Same with the columns
print_arr(a[:, 1])
print_arr(a[:, [1]])

#### **Integer array indexing**

When we use slices (:), the resulting tensor view will always be a subarray of the original tensor.

In contrast, if we index with lists or tensors of integers, we can construct arbitrary tensors using the data from another tensor.

In [None]:
a = torch.arange(1, 7).reshape(3, 2)
print_arr(a)

In [None]:
# Example of integer array indexing
# The returned array will have shape (3,)
b = a[[0, 1, 2], [0, 1, 0]]
print_arr(b)

In [None]:
# The above is equivalent to:
v1, v2, v3 = a[0, 0], a[1, 1], a[2, 0]
b = torch.tensor([v1, v2, v3])
print_arr(b)

In [None]:
# You can re-use the same element of the source tensor multiple times!
print_arr(a[[0, 0], [1, 1]])
print_arr(torch.tensor([a[0, 1], a[0, 1]]))

In [None]:
# You can use another tensor to perform the indexing,
# as long as it has an integer dtype (torch.int64, a synonym for torch.long, is the usual choice)
i = torch.ones(3, dtype=torch.int64)
i

In [None]:
j = torch.tensor([0, 1, 0])
j

In [None]:
out = a[i, j]

print_arr(a, out)

> **EXERCISE**
>
> Using a single assignment, change the elements of a tensor $X \in \mathbb{R}^{4 \times 3}$ as follows:
>
> `X[0,2] = -1`
>
> `X[1,1] = 0`
>
> `X[2,0] = 1`
>
> `X[3,1] = 2`



In [None]:
# Mutate one element from each row of a matrix
a = torch.arange(12).reshape(4, 3)
a

Expected Output:

```
tensor([[ 0,  1, -1],
        [ 3,  0,  5],
        [ 1,  7,  8],
        [ 9,  2, 11]])
```

> ❌❌❌ **NOTE**
>
> **Slice indexing vs Array indexing**
>
> Be careful, since slice indexing and array indexing are different operations!

In [None]:
a = torch.arange(16).reshape(4, 4)
a

In [None]:
a[0:3, 0:3]

In [None]:
a[[0, 1, 2], [0, 1, 2]]

In [None]:
a[torch.arange(0,3), torch.arange(0,3)]

In [None]:
a[0:5:2, 0:5:2]

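With *slice indexing* you get a sub-tensor (here a $3 \times 3$ block), while *array indexing* pairs the indices element by element and returns the individual entries `a[0, 0]`, `a[1, 1]` and `a[2, 2]`.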
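
The two also differ in how they treat memory. In the sketch below (variable names are illustrative) the slice is a view of `a`, while the result of array indexing is a copy:

```python
import torch

a = torch.arange(16).reshape(4, 4)

s = a[0:3, 0:3]               # slice indexing: a 3x3 view of a
p = a[[0, 1, 2], [0, 1, 2]]   # array indexing: a copy of a[0, 0], a[1, 1], a[2, 2]
print(s.shape, p.shape)       # torch.Size([3, 3]) torch.Size([3])

s[0, 0] = -1                  # modifies a as well
p[1] = -5                     # does not modify a

print(a[0, 0], a[1, 1])       # tensor(-1) tensor(5)
```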

#### **Boolean array indexing**

This type of indexing is used to select the elements of a tensor that satisfy some condition (similar to MATLAB's logical indexing):

In [None]:
a = torch.arange(6).reshape(3, 2)
a

In [None]:
bool_idx = (a > 2)
bool_idx

In [None]:
a[bool_idx]  # remember that NumPy and PyTorch unroll row-wise and not column-wise like Matlab

If you want to know more about indexing in PyTorch and Numpy read the [docs](https://numpy.org/doc/stable/user/basics.indexing.html#basics-indexing)
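
Boolean masks are ordinary tensors of dtype `torch.bool`, so they can be combined and counted. A short sketch with illustrative conditions:

```python
import torch

a = torch.arange(6).reshape(3, 2)

mask = (a > 1) & (a % 2 == 0)   # element-wise AND of two conditions
print(a[mask])                  # tensor([2, 4])
print(mask.sum())               # number of selected elements: tensor(2)
print(a[~mask])                 # elements NOT selected: tensor([0, 1, 3, 5])
```

Note the parentheses around each condition: `&`, `|` and `~` bind more tightly than the comparison operators in Python.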

### 1.3.4 Exercises

> **EXERCISE**
>
> Build a 3D tensor $X \in \mathbb{R}^{3 \times 3 \times 3}$ that has ones along the 3D-diagonal and zeros elsewhere, i.e. a 3D identity.

In [None]:
# Write your solution here
# X = ?

Expected Output:

```
tensor([[[1., 0., 0.],
         [0., 0., 0.],
         [0., 0., 0.]],

        [[0., 0., 0.],
         [0., 1., 0.],
         [0., 0., 0.]],

        [[0., 0., 0.],
         [0., 0., 0.],
         [0., 0., 1.]]])
```

> **EXERCISE**
>
> Now, given a number N, build a 3D tensor $X \in \mathbb{R}^{N \times N \times N}$ that has ones along the 3D-diagonal and zeros elsewhere, i.e. a 3D identity.

In [None]:
# Write your solution here
N = torch.randint(0, 10, (1, )).item()
# X = ?

Expected Output for N=2:

```
tensor([[[1., 0.],
         [0., 0.]],

        [[0., 0.],
         [0., 1.]]])
```
Expected Output for N=5:

```
tensor([[[1., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.]],

        [[0., 0., 0., 0., 0.],
         [0., 1., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.]],

        [[0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 1., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.]],

        [[0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 1., 0.],
         [0., 0., 0., 0., 0.]],

        [[0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 1.]]])
```

> **EXERCISE**
>
> You are given a 3D tensor $X \in \mathbb{R}^{w \times h \times 3}$ representing a $w \times h$ image with `(r, g, b)` color channels. Assume that colors take values in $[0, 1]$.
>
> Color the image $X$ completely red, i.e. `(1, 0, 0)` in the `(r, g, b)` format.

In [None]:
# Create and visualize a black image
x = torch.zeros(100, 200, 3)

%matplotlib inline
import matplotlib.pyplot as plt
img = plt.imshow(x)

In [None]:
# Write your solution here

Expected Output:

![Red Image](../data/red_image_output.png)

> **EXERCISE**
>
> You are given the GitHub logo $X \in \mathbb{R}^{560 \times 560}$.  Assume the logo is in gray scale, with the color $c \in [0, 1]$ (remember 0 $\to$ black).
>
> 1. Change the black-ish color into light gray: $0.8$.
> 2. Then draw a diagonal and anti-diagonal black line (i.e. an X) on the new image, to mark that the new logo is wrong.

In [None]:
from skimage import io

image = io.imread('https://github.githubassets.com/images/modules/logos_page/GitHub-Mark.png', as_gray=True)
_ = plt.imshow(image, cmap='gray', vmin=0, vmax=1)

In [None]:
# Change the black into light-gray
X = torch.from_numpy(image.copy())  # PyTorch CPU and Numpy share the memory!
# Write your code here

_ = plt.imshow(X, cmap='gray', vmin=0, vmax=1)

In [None]:
# Mark the new image as wrong with a big black X
# Write your code here

_ = plt.imshow(X, cmap='gray', vmin=0, vmax=1)

Expected Output:

![Expected Output](../data/gh_expected_result.png)

---

# Section 2 - Tensor Operations and 3D Transformations

- This part covers tensor operations (broadcasting, element-wise and non-element-wise operations, tensor contraction, einsum).

## Introduction

In this part of the exercise we will continue to learn basic tensor usage: we will cover broadcasting, fundamental linear algebra operations, and `einsum`!

All these tensor operations will come in handy to build our deep neural networks.
Yet, the high-level API offered by PyTorch to perform GPU-accelerated linear algebra operations may turn out to be useful in many other fields, from microbiology to fluid dynamics.

The GPU computing paradigm offers several benefits over single-core machines or traditional supercomputers equipped with many single-core nodes.
Deep learning frameworks such as the one we are studying are a very good compromise between simplicity and expressiveness to unleash the power of GPU computing.

To get even more control you can work directly with the CUDA language, but we won't go there in this course!

## PyTorch

**Reminder:** Familiarize yourself with the [PyTorch Documentation](https://pytorch.org/docs/stable/), as it will greatly assist you.






#### Set torch and numpy random seeds for reproducibility

As we will see, several operations in deep learning (e.g. training a network) rely on randomness in order to work effectively. This means that we will get different results each time we run a test, which can make design and debugging difficult.

To this end, we usually **set a fixed seed** for the pseudo-random number generator, so that we are sure to always see the "same randomness" that makes our tests reproducible.
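
A minimal sketch of what "same randomness" means (the seed value `0` is arbitrary): re-seeding the generator restarts the pseudo-random sequence, so the same numbers are drawn again.

```python
import torch

torch.manual_seed(0)
first = torch.rand(3)

torch.manual_seed(0)     # re-seeding restarts the pseudo-random sequence
second = torch.rand(3)
third = torch.rand(3)    # without re-seeding, the sequence simply continues

print(torch.equal(first, second))   # True: same seed, same numbers
print(torch.equal(second, third))   # False: later draws from the same sequence
```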

> Once your model works, remember to test multiple times _without_ a fixed seed! The results you got at design time may be due to overfitting the seed (e.g. you have chosen hyperparameters that happen to work particularly well with a given seed), or just to luck.

If you are going to use a GPU, two further options must be set.

In [None]:
import random

np.random.seed(42)
random.seed(0)

torch.manual_seed(0)
torch.cuda.manual_seed_all(0)
torch.backends.cudnn.deterministic = True  # Note that this Deterministic mode can have a performance impact
torch.backends.cudnn.benchmark = False

# some frameworks aid the reproducibility of your code,
# e.g. PyTorch Lightning exposes a `seed_everything` function by default:
# https://github.com/PyTorchLightning/pytorch-lightning/blob/e1f5eacab98670bc1de72c88657404a15aadd527/pytorch_lightning/utilities/seed.py#L29

### **Tensor operations**



In [None]:
t = torch.rand(3,3)
t

Functions that operate on tensors are often accessible in different ways:

- From the **`torch` module**...:

In [None]:
torch.add(t, t)

- ...or as tensor **methods**:

In [None]:
t.add(t)

- ...or even through **overloaded** operators:




In [None]:
t + t

None of the above operates in-place:

In [None]:
# t is unchanged
t

These functions are all equivalent: they are *aliases* of the same method.
Personal preference, code consistency, and readability should guide your decision of which one to use.

> e.g. `torch.add(...)` may be too verbose, but in some cases it may be preferable since it makes explicit to the code-reader that you are dealing with tensors. Nevertheless, if you are using [types](https://docs.python.org/3/library/typing.html) -- and you should be using types -- it will be rarely necessary.


Most operations in PyTorch are **not in-place**. This means that the resulting tensor is a *new* tensor, and it does not share the underlying data with other tensors. Changes to the new tensor are not reflected in other tensors.


In-place operations are still available in PyTorch, and in some cases (e.g. when you don't need autodiff) they can be useful; they can be more efficient, since they do not need to allocate a new tensor for the result.
They are normally recognized by a trailing `_`:

In [None]:
t

In [None]:
t.add_(t)  # notice the trailing _

In [None]:
t  # t itself changed!

Another common in-place operation is the assignment:

In [None]:
t[0] = 42
t

#### **Basic operations and broadcasting**

Basic mathematical operations (`+`, `-`, `*`, `/`, `**`) are applied **element-wise**: for example, if `x` and `y` are two tensors of the same shape, the product `x * y` is a tensor of that shape whose values are the element-wise products of the two tensors. In mathematics, this is also called the Hadamard product.
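
To avoid confusing the element-wise product with the matrix product, here is a small sketch comparing the two on illustrative $2 \times 2$ matrices (`@` is the matrix-multiplication operator):

```python
import torch

x = torch.tensor([[1, 2], [3, 4]])
y = torch.tensor([[5, 6], [7, 8]])

hadamard = x * y   # element-wise (Hadamard) product
matmul = x @ y     # matrix product (rows of x times columns of y)

print(hadamard)    # tensor([[ 5, 12], [21, 32]])
print(matmul)      # tensor([[19, 22], [43, 50]])
```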

**Broadcasting** is another powerful mechanism that allows PyTorch to perform operations on tensors of different shapes. The most basic example is adding a scalar (a rank-0 tensor) to a matrix (a rank-2 tensor).

In [None]:
x = torch.tensor([[1, 2], [3, 4]], dtype=torch.float64)
y = torch.tensor([[5, 6], [7, 8]], dtype=torch.float64)

print(x + y)  # element-wise sum
print(x + 4.2)  # broadcasting

In [None]:
# other examples
print(x * y - 5)
print((x - y) / y)  # element-wise division!

Broadcasting is quite powerful! When you perform an operation between two tensors with different shapes, PyTorch automatically "broadcasts" the smaller tensor across the larger tensor so that they have compatible shapes.

In the example below, the sequence `v` is replicated (_without actually copying data!_) along the missing dimension so that it fits the shape of matrix `m`:

In [None]:
m = torch.arange(12).reshape(4, 3)
v = torch.tensor([100, 0, 200])
n = m + v
print_arr(m, v, n)

In this other example `m` and `u` are both rank-2, but the smaller one (`u`) is expanded along the dimension where it has size 1 to fit `m`:

In [None]:
m = torch.arange(12).reshape(4, 3)
u = torch.tensor([0, 10, 0, 20]).reshape(4,1)
n = m + u
print_arr(m, u, n)

In the following example, both tensors are expanded along their size-1 dimensions, so that the sum makes sense:

In [None]:
w = u + v
print_arr(u, v, w)

Mastering broadcasting is hard!

However, it is very convenient as it allows writing **vectorized** code, i.e., code that avoids explicit Python loops, which are slow and cannot be efficiently parallelized.
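
As a small sketch of what "vectorized" means here (the tensors are illustrative), the explicit loop and the broadcast expression below compute the same result, but the second one delegates the iteration to PyTorch:

```python
import torch

m = torch.arange(12).reshape(4, 3)
v = torch.tensor([100, 0, 200])

# Explicit Python loop: add v to every row of m, one row at a time
looped = torch.empty_like(m)
for i in range(m.shape[0]):
    looped[i] = m[i] + v

# Vectorized version: broadcasting does the same in a single expression
vectorized = m + v

print(torch.equal(looped, vectorized))   # True
```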

Technically, broadcasting takes advantage of the underlying C implementation of PyTorch and Numpy (on CPU) or the CUDA implementation of PyTorch (on GPU). Here's a take-home illustration for your convenience:

![broadcasting](https://jakevdp.github.io/PythonDataScienceHandbook/figures/02.05-broadcasting.png)

##### **EXERCISE**
>
> Given two vectors $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^m$, compute the differences between all possible pairs of their elements, and organize these differences in a matrix $Z \in \mathbb{R}^{n \times m}$:
> $$ z_{ij} = x_i - y_j $$

In [None]:
x = torch.tensor([1, 2, 3])
y = torch.tensor([4, 5])

# ✏️ your code here

Expected Output:

```
tensor([[-3, -4],
        [-2, -3],
        [-1, -2]]) <shape: torch.Size([3, 2])> <dtype: torch.int64>
```

#### 📖 **Broadcasting, let's take a peek under the hood**

To recap: if a PyTorch operation supports broadcasting, then **its tensor arguments can be implicitly expanded to be of equal sizes** (without making copies of the data).

###### **Broadcastable tensors**

Two tensors are "broadcastable" if:
- Each tensor has at least one dimension
- When iterating over the dimension sizes, starting at the trailing dimension, at each step the two **sizes** are either **equal**, or **one of them is 1**, or **one of them does not exist**.
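
The trailing-dimension condition can be sketched as a few lines of plain Python. The helper `is_broadcastable` is hypothetical (it is not part of PyTorch) and checks only the second condition, working on shapes rather than tensors.

```python
def is_broadcastable(shape_a, shape_b):
    """Check the trailing-dimension rule on two shapes (hypothetical helper)."""
    # zip stops at the shorter shape, so dimensions that "do not exist" are accepted
    for size_a, size_b in zip(reversed(shape_a), reversed(shape_b)):
        if size_a != size_b and size_a != 1 and size_b != 1:
            return False
    return True

print(is_broadcastable((4, 3), (3,)))    # True
print(is_broadcastable((4, 3), (4, 1)))  # True
print(is_broadcastable((4, 3), (4,)))    # False: trailing sizes 3 and 4 differ
```

The last case is a common source of errors: a vector with one entry per row must be reshaped to `(4, 1)` before it can be combined with a `(4, 3)` matrix.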


###### **Broadcasting rules**

Broadcasting two tensors together follows these rules:

1. If the input tensors have different ranks, **singleton dimensions are prepended to the shape** of the smaller one until it has the same rank as the other
2. The size in each dimension of the **output shape** is the maximum size in that dimension between the two tensors
3. An input can be used in the computation if its size in a particular **dimension either matches** the output size in that dimension, **or is a singleton dimension**
4. If an input has a dimension size of 1 in its shape, the **first data entry in that dimension will be used for all calculations** along that dimension.

**Example**:

- `m` has shape `[4, 3]`
- `v` has shape `[3,]`.


In [None]:
print_arr(m, v)

In [None]:
n = m + v
print_arr(n)


Following the Broadcasting logic, this is what happened:

- `v` has fewer dimensions than `m`, so a dimension of size `1` is **prepended** $\to$ `v` is now `[1, 3]`.
- Output shape will be `[max(1, 4), max(3, 3)] = [4, 3]`.
- `dim 1` of `v` matches exactly `3`; `dim 0` is `1`, so we can use the first data entry in that dimension (i.e. the whole `row 0` of `v`) each time any row is accessed. This is effectively like converting `v` from `[1, 3]` to `[4, 3]` by stacking the repeated row four times.
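
The three steps above can be reproduced by hand. This is a sketch of the logic using `unsqueeze`, `torch.broadcast_shapes` and `expand`; the intermediate names `v_row` and `v_full` are made up for the example.

```python
import torch

m = torch.arange(12).reshape(4, 3)
v = torch.tensor([100, 0, 200])

v_row = v.unsqueeze(0)                                    # prepend a singleton dim -> [1, 3]
out_shape = torch.broadcast_shapes(m.shape, v_row.shape)  # output shape -> [4, 3]
v_full = v_row.expand(out_shape)                          # reuse row 0 for every row (no copy)

print(out_shape)                       # torch.Size([4, 3])
print(torch.equal(m + v, m + v_full))  # True
```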


For more on broadcasting, see the [documentation](https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html).

In NumPy, functions that operate element-wise and support broadcasting are known as universal functions (ufuncs). You can find the list of all universal functions in the [documentation](https://docs.scipy.org/doc/numpy/reference/ufuncs.html#available-ufuncs).

#### **EXERCISE (2 Points)**
>
> Given a tensor $Y \in \mathbb{R}^{n \times m}$ and an index pair $(a,b)$, for each element of $Y$ compute its $L_p$ distance to $(a,b)$, and store the resulting distance value in the corresponding cell of $Y$.
>
> In brief, compute:
> $$ y_{ij} = d_{L_p}\left( (i,j), (a,b) \right) \text{ for all }  i,j$$
>
> and visualize the resulting $Y$.
>
> Try different values of $p>0$ to see what happens.
>
> ---
>
> The [$L_p$ distance](https://en.wikipedia.org/wiki/Lp_space#The_p-norm_in_finite_dimensions) between two points $x$ and $y$ can be computed as: $d_{L_p}(x, y)=\left( \sum_{i=1}^n|x_i - y_i|^p\right)^{1/p}$
>
> Example: The $L_1$ distance between $(i,j) = (3, 5)$ and $(a,b) = (14, 20)$ is:
> $$ y_{3,5} = d_{L_1}( (3, 5), (14, 20) ) = |3 - 14| + |5 - 20| $$
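
As a sanity check of the formula, the sketch below evaluates the $L_p$ distance for the single pair of points from the example. The helper `lp_distance` is hypothetical and handles one pair only; the exercise asks for all cells of $Y$ at once, which is a different (broadcasting) problem.

```python
import torch

def lp_distance(x: torch.Tensor, y: torch.Tensor, p: float) -> torch.Tensor:
    """L_p distance between two points given as rank-1 tensors (illustrative helper)."""
    return (x - y).abs().pow(p).sum().pow(1.0 / p)

point = torch.tensor([3.0, 5.0])
target = torch.tensor([14.0, 20.0])

print(lp_distance(point, target, p=1))  # |3 - 14| + |5 - 20| = 26
print(lp_distance(point, target, p=2))  # Euclidean distance, sqrt(11^2 + 15^2)
```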

In [None]:
# @title Utility function, you can execute and safely ignore this cell

import plotly.express as px

def plot_row_images(images: Union[torch.Tensor, np.ndarray]) -> None:
  """ Plots the images in a subplot with multiple rows.

  Handles correctly grayscale images.

  :param images: tensor with shape [number of images, width, height, <colors>]
  """
  from plotly.subplots import make_subplots
  import plotly.graph_objects as go
  fig = make_subplots(rows=1, cols=images.shape[0] ,
                      specs=[[{}] * images.shape[0]])

  # Convert grayscale image to something that go.Image likes
  if images.dim() == 3:
    images = torch.stack((images, images, images), dim= -1)
  elif (images.dim() == 4 and images.shape[-1] == 1):
    images = torch.cat((images, images, images), dim= -1)

  assert images.shape[-1] == 3 or images.shape[-1] == 4

  for i in range(images.shape[0]):
    i_image = np.asarray(images[i, ...])

    fig.add_trace(
        go.Image(z = i_image, zmin=[0, 0, 0, 0], zmax=[1, 1, 1, 1]),
        row=1, col=i + 1
    )

  fig.show()


# When using plotly, pay attention: it often does not like PyTorch tensors
# ...and it does not raise any error, it just shows an empty plot.

In [None]:
x = torch.zeros(300, 300)
a = 150
b = 150

x[a, b] = 1  # this will be overwritten by your distance-calculating code
plot_row_images(x[None, :])

In [None]:
# ✏️ your code here

# First write a function to calculate the distance.
# Fill the matrix with the Lp distance of each point to the center.
def distance(input: torch.Tensor, p: int) -> torch.Tensor:
    raise NotImplementedError("You need to implement this function")


P = 1
# Call your function and plot ...
mat = distance(x, P)
px.imshow(mat).show()
P = 8
# Call your function and plot ...
mat = distance(x, P)
px.imshow(mat).show()

Expected Output for P=1:

![L1 expected output](../data/l1_norm_expected_plot.png)

Expected Output for P=8:

![L8 expected output](../data/l8_norm_expected_plot.png)

#### **Non-elementwise operations**


PyTorch and NumPy provide many useful functions to perform computations on tensors:

In [None]:
x = torch.tensor([[1, 2, 3], [3, 4, 5]], dtype=torch.float32)
print_arr(x)

In [None]:
# Sum up all the elements
print_arr(torch.sum(x))

In [None]:
# Compute the mean of each column
print_arr(torch.mean(x, dim=0))

> **REMEMBER!**
>
> In order to avoid confusion with the `dim` parameter, you can think of it as an **index over the list returned by `tensor.shape`**. The operation is performed by iterating over that dimension.
>
> Example above: since our tensor `x` has shape `[2, 3]`, the dimension `dim=0` operates along the `2`.
>
> Visually (here array means _tensor_):
>
><img src="https://qph.fs.quoracdn.net/main-qimg-30be20ab9458b5865b526d287b4fef9a" width="500" >

In [None]:
print_arr(x)

In [None]:
# Compute the product of each row
print_arr(torch.prod(x, dim=1))

In [None]:
# Max along the rows (i.e. max value in each column)
values, indices = torch.max(x, dim=0)
print_arr(values)

In [None]:
# Max along the columns (i.e. max value in each row)
values, indices = torch.max(x, dim=1)
print_arr(values)
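
To tie the `dim` convention to shapes, here is a short sketch on the same tensor: the reduced dimension disappears from the output shape unless `keepdim=True` is passed, and `torch.max` with `dim` also returns where each maximum was found.

```python
import torch

x = torch.tensor([[1, 2, 3], [3, 4, 5]], dtype=torch.float32)  # shape [2, 3]

print(x.sum(dim=0).shape)                # torch.Size([3]): the size-2 dimension collapses
print(x.sum(dim=1).shape)                # torch.Size([2]): the size-3 dimension collapses
print(x.sum(dim=0, keepdim=True).shape)  # torch.Size([1, 3]): kept as a singleton

# torch.max with `dim` also returns the position of each maximum along that dim
values, indices = torch.max(x, dim=1)
print(values)   # tensor([3., 5.])
print(indices)  # tensor([2, 2])
```

Keeping the reduced dimension as a singleton is handy when the result must be broadcast back against the original tensor.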

##### **Dim parameter, let's take a peek under the hood**


Let's see what the `dim` parameter exactly does:

In [None]:
dim = 2

a = torch.arange(2*3*4).reshape(2, 3, 4)
out = a.sum(dim=dim)
out

In [None]:
# It is summing over the `dim` dimension, i.e.:
a.shape

In [None]:
# The `dim` dimension has 4 elements
a.shape[dim]

In [None]:
# The dimension dim collapses, the output tensor will have shape:
new_shape = a.shape[:dim] + a.shape[dim + 1:]
new_shape

In [None]:
# Explicitly compute the sum over dim
out = torch.zeros(new_shape)

# iterate over all the rows
for r in range(a.shape[0]):
  # iterate over all the columns in the r-th row
  for c in range(a.shape[1]):

    for i in range(a.shape[dim]): # <- sum over 'dim'

      out[r, c] += a[r, c, i]

out

# **DO NOT** use for loops in your code
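
The same reasoning holds for any choice of `dim`. The sketch below checks, for each dimension of `a`, that the output shape is the input shape with that entry removed, and that the result equals the sum of the slices taken along it. The loop is there only to run the check for every `dim`.

```python
import torch

a = torch.arange(2 * 3 * 4).reshape(2, 3, 4)

for dim in range(a.dim()):
    out = a.sum(dim=dim)
    expected_shape = a.shape[:dim] + a.shape[dim + 1:]
    # Same reduction written with indexing: add up the slices taken along `dim`
    manual = sum(a.select(dim, i) for i in range(a.shape[dim]))
    print(dim, tuple(out.shape), out.shape == expected_shape, torch.equal(out, manual))
```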

###### **EXERCISE**
>
> Given a matrix $X \in \mathbb{R}^{k \times k}$:
> - Compute the mean of the values along its diagonal.
>
> Perform this computation in at least two different ways, then check that the results are the same.

In [None]:
x = torch.rand(4, 4)
print_arr(x)

In [None]:
# ✏️ your code here

With input x = 

```
[[0.0259, 0.9557, 0.8247, 0.2847],
[0.6301, 0.8209, 0.2687, 0.1967],
[0.1685, 0.0114, 0.9896, 0.5087],
[0.4334, 0.5446, 0.1283, 0.8798]]
```

The expected output is:

```
tensor(0.6791) <shape: torch.Size([])> <dtype: torch.float32>
```

##### **EXERCISE**
>
> Given a binary non-symmetric matrix $X \in \{0, 1\}^{n\times n}$, build the symmetric matrix $Y \in \{0, 1\}^{n \times n}$ defined as:
> $$
y_{ij} =
\begin{cases}
1 & \text{if } x_{ij} = 1 \\
1 & \text{if } x_{ji} = 1 \\
0 & \text{otherwise}
\end{cases}
$$
>
> *Hint*: search for `clamp` in the [docs](https://pytorch.org/docs/stable/index.html)

In [None]:
x = torch.randint(0, 2, (5, 5))  # Non-symmetric matrix
x

In [None]:
# ✏️ your code here

With input x = 

```
tensor([[0, 1, 0, 1, 1],
        [0, 0, 1, 0, 0],
        [1, 0, 1, 1, 0],
        [0, 1, 0, 0, 1],
        [1, 1, 0, 0, 1]])
```

the expected output is:

```
tensor([[0, 1, 1, 1, 1],
        [1, 0, 1, 1, 1],
        [1, 1, 1, 1, 0],
        [1, 1, 1, 0, 1],
        [1, 1, 0, 1, 1]])
```

#### **Tensor contractions**

##### **Matrix multiplication**

Given $X \in \mathbb{R}^{n \times d}$ and $Y \in \mathbb{R}^{d \times v}$, their matrix product $Z \in \mathbb{R}^{n \times v}$ is defined as:

$$ z_{ij} = \sum_{k=1}^{d} x_{ik} y_{kj} $$


In [None]:
x = torch.tensor([[1, 2], [3, 4], [5, 6]])
y = torch.tensor([[1, 2], [2, 1]])
print_arr(x, y)

In [None]:
# as we will see, matmul's functionality is not limited to matrix-matrix multiplication
torch.matmul(x, y)

In [None]:
x @ y  # Operator overload for matmul

In [None]:
torch.mm(x, y)  # PyTorch function, only works for rank-2 tensors (matrices) https://pytorch.org/docs/stable/generated/torch.mm.html

In [None]:
x.mm(y)  # Tensor method

In [None]:
torch.einsum('ik, kj -> ij', (x, y))  # Einsum notation!

# It sums over the dimension labeled with the index `k`
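
For reference, the definition of the matrix product can be spelled out with explicit loops over `i`, `j` and the summed index `k`. This is only a didactic sketch to compare against `x @ y`; it is far slower than the built-in operators.

```python
import torch

x = torch.tensor([[1, 2], [3, 4], [5, 6]])  # n x d = 3 x 2
y = torch.tensor([[1, 2], [2, 1]])          # d x v = 2 x 2

n, d = x.shape
_, v = y.shape

z = torch.zeros(n, v, dtype=x.dtype)
for i in range(n):
    for j in range(v):
        for k in range(d):  # the summed (contracted) index
            z[i, j] += x[i, k] * y[k, j]

print(z)                      # tensor([[ 5,  4], [11, 10], [17, 16]])
print(torch.equal(z, x @ y))  # True
```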

##### **Dot product**
Also known as scalar product or inner product.
Given $x \in \mathbb{R}^k$ and $y \in \mathbb{R}^k$, the dot product $z \in \mathbb{R}$ is defined as:

$$ z = \sum_{i=1}^{k} x_i y_i $$

In [None]:
x = torch.tensor([1, 2, 3])
y = torch.tensor([4, 5, 6])
print_arr(x, y)

In [None]:
# We want to perform:
(1 * 4) + (2 * 5) + (3 * 6)

In [None]:
torch.dot(x, y)  # PyTorch function

In [None]:
x.dot(y) # Tensor method

In [None]:
x @ y  # PyTorch operator again overloading matmul

In [None]:
torch.einsum('i, i ->', (x, y))  # Einstein notation!

# Read it as:
# - iterate with i along x
# - iterate with i along y
# - compute the product at each iteration
# - sum the products and return a scalar (-> means return a scalar)

# More in general, Einstein notation:
# Multiply point-wise repeated indices in the input
# Sum up along the indices that `do not` appear in the output

# More on this below!
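
The role of the output indices is easiest to see by changing only the right-hand side of the einsum string. In this sketch, keeping `i` in the output gives the element-wise products, while dropping it also sums them.

```python
import torch

x = torch.tensor([1, 2, 3])
y = torch.tensor([4, 5, 6])

# `i` is kept in the output: multiply only, no summation
products = torch.einsum('i,i->i', x, y)
# `i` is dropped from the output: multiply, then sum over i
dot = torch.einsum('i,i->', x, y)

print(products)  # tensor([ 4, 10, 18])
print(dot)       # tensor(32)
```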

##### **Batch matrix multiplication**

Often we want to perform several operations at once. Why?
- Reduce the **overhead of transferring** each tensor to/from the GPU memory
- **Better parallelization** of the computation

Given two 3D tensors, each one containing $b$ matrices,
$X \in \mathbb{R}^{b \times n \times m}$
and
$Y \in \mathbb{R}^{b \times m \times p}$,
we want to multiply together each pair of matrices sharing the same batch index, obtaining a tensor $Z \in \mathbb{R}^{b \times n \times p}$ defined as (with a slight abuse of notation, the subscript $b$ below is the batch index):

$$ z_{bij} = \sum_{k=1}^m x_{bik} y_{bkj} $$

In [None]:
# here b = 2 matrices
x = torch.tensor([[[1, 2], [3, 4], [5, 6]], [[1, 2], [3, 4], [5, 6]]])  # 3x2 matrices
y = torch.tensor([[[1, 2], [2, 1]], [[1, 2], [2, 1]]])  # 2x2 matrices
print_arr(x, y)

In [None]:
torch.bmm(x, y)  # **not** torch.mm

In [None]:
# Operator overload! again, matmul is actually doing the job
x @ y

In [None]:
torch.einsum('bik, bkj -> bij', (x, y)) # Einstein notation!
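
A batched product is equivalent to multiplying the pairs one at a time and stacking the results. The sketch below checks this on small integer tensors chosen for the example (the two matrices in each batch differ, unlike in the cells above).

```python
import torch

x = torch.arange(12).reshape(2, 3, 2)  # b=2 matrices of shape 3x2
y = torch.arange(8).reshape(2, 2, 2)   # b=2 matrices of shape 2x2

# Multiply the pairs one at a time, then stack the results along a new batch dim
looped = torch.stack([x[b] @ y[b] for b in range(x.shape[0])], dim=0)
batched = torch.bmm(x, y)

print(batched.shape)                 # torch.Size([2, 3, 2])
print(torch.equal(looped, batched))  # True
```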

##### **Broadcast matrix multiplication**

Given a matrix $Y \in \mathbb{R}^{m \times p}$ and $b$ matrices of size $n \times m$ organized in a 3D tensor $X \in \mathbb{R}^{b \times n \times m}$, we want to multiply together each matrix $X_{i,:,:}$ with $Y$, obtaining a tensor $Z \in \mathbb{R}^{b \times n \times p}$ defined as:

$$ z_{bij} = \sum_{k=1}^m x_{bik} y_{kj} $$


In [None]:
x = torch.tensor([[[1, 2], [3, 4], [5, 6]], [[1, 2], [3, 4], [5, 6]]])
y = torch.tensor([[1, 2], [2, 1]])
print_arr(x, y)

In [None]:
torch.matmul(x, y)  # always uses the last two dimensions

In [None]:
x @ y   # still using the last two dimensions since @ overloads matmul
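
One way to read the broadcast product is that `y` is given an implicit batch dimension. The sketch below makes that dimension explicit with `expand` and compares the result with `torch.bmm`; it illustrates the equivalence and is not a statement about how `matmul` is implemented.

```python
import torch

x = torch.arange(12).reshape(2, 3, 2)  # b=2 matrices of shape 3x2
y = torch.tensor([[1, 2], [2, 1]])     # a single 2x2 matrix

# Give y an explicit batch dimension; expand() creates a view, not a copy
y_batched = y.expand(x.shape[0], -1, -1)  # shape [2, 2, 2]

print(y_batched.shape)                                           # torch.Size([2, 2, 2])
print(torch.equal(torch.matmul(x, y), torch.bmm(x, y_batched)))  # True
```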

##### **EXERCISE**
>
> Use the einsum notation to compute the equivalent broadcast matrix multiplication!

In [None]:
# ✏️ your code here

Expected Output:

```
tensor([[[ 5,  4],
         [11, 10],
         [17, 16]],

        [[ 5,  4],
         [11, 10],
         [17, 16]]])
```

### **Einsum notation**

Einstein notation is a way to express complex operations on tensors.

- It is **concise but expressive enough** to perform almost every operation you will need in building your neural networks, allowing you to think of the only thing that matters... **dimensions!**
- You will **not need to check your dimensions** after an einsum operation, since the dimensions themselves are *defining* the tensor operation.
- You will **not need to shape-comment** your tensors. Those comments do not work: they are bound to get outdated.
- You will not need to explicitly code **intermediate operations** such as reshaping, transposing and intermediate tensors.
- It is **not library-specific**, being available in ``numpy``, ``pytorch``, ``tensorflow`` and ``jax`` with the same signature. So you do not need to remember the function signatures in all the frameworks.
- It can sometimes be compiled to high-performing code (e.g. [Tensor Comprehensions](https://pytorch.org/blog/tensor-comprehensions/))

Check [this blog post by Olexa Bilaniuk](https://obilaniu6266h16.wordpress.com/2016/02/04/einstein-summation-in-numpy/) to take a peek under the hood of einsum and [this one by Tim Rocktäschel](https://rockt.github.io/2018/04/30/einsum) for several examples.

Its formal behavior is well described in the [Numpy documentation](https://docs.scipy.org/doc/numpy/reference/generated/numpy.einsum.html).
However, it is very intuitive and better explained through examples.

![alt text](https://obilaniu6266h16.files.wordpress.com/2016/02/einsum-fmtstring.png?w=676)
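
As a first taste of the "no intermediate operations" point, the sketch below computes the same batched product twice: once with an explicit transpose and once with an einsum string that names the contracted index directly. The tensors `q` and `k` and their shapes are arbitrary choices for this example.

```python
import torch

q = torch.randn(2, 4, 8)  # [batch, rows, features]
k = torch.randn(2, 5, 8)  # [batch, other_rows, features]

# Standard API: transpose k explicitly before the batched product
scores_matmul = q @ k.transpose(1, 2)  # shape [2, 4, 5]
# einsum: the index string states directly which dimension is contracted
scores_einsum = torch.einsum('bif,bjf->bij', q, k)

print(scores_einsum.shape)  # torch.Size([2, 4, 5])
print(torch.allclose(scores_matmul, scores_einsum, atol=1e-5))
```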

> *Historical note (taken from O.Bilaniuk's post)*
>
> Einstein had no part in the development of this notation. He merely popularized it, by expressing his entire theory of General Relativity in it. In a letter to [Tullio Levi-Civita](https://en.wikipedia.org/wiki/Tullio_Levi-Civita), co-developer alongside [Gregorio Ricci-Curbastro](https://en.wikipedia.org/wiki/Gregorio_Ricci-Curbastro) of Ricci calculus (of which this summation notation was only a part), Einstein wrote:
>
> " *I admire the elegance of your method of computation; it must be nice to ride through these fields upon the horse of true mathematics while the like of us have to make our way laboriously on foot.* "

In [None]:
a = torch.arange(6).reshape(2, 3)  # will use this in the examples below

###### **Matrix transpose**

$$ B_{ji} = A_{ij} $$

In [None]:
# The characters are indices along each dimension
b = torch.einsum('ij -> ji', a)
print_arr(a, b)

###### **Sum**

$$ b = \sum_i \sum_j A_{ij} := A_{ij} $$


In [None]:
# Indices that do not appear in the output tensor are summed up
b = torch.einsum('ij -> ', a)
print_arr(a, b)

###### **Column sum**

$$ b_j = \sum_i A_{ij} := A_{ij} $$

In [None]:
# Indices that do not appear in the output tensor are summed up,
# ...even if some other index appears
b = torch.einsum('ij -> j', a)
print_arr(a, b)

###### **EXERCISE**
>
> Given a binary tensor $X \in \{0, 1\}^{n \times m}$ return a tensor $y \in \mathbb{R}^{n}$ that has in the $i$-th position the **number of ones** in the $i$-th row of $X$.
>
>Give a solution using `einsum`, and a solution using standard manipulation.

In [None]:
x = (torch.rand(100, 200) > 0.5).int()

In [None]:
# Display a binary matrix with plotly

fig = px.imshow(x)
fig.show()

In [None]:
row_ones_einsum = ...  # your code here

In [None]:
# Check your result by comparing your result with the sum operator

row_ones = torch.sum(x, dim=-1)  # recall that -1 refers to the last dimension

torch.equal(row_ones_einsum, row_ones) # True if the two tensors are equal

In [None]:
px.imshow(row_ones[:, None]).show()
print(f'Sum up the row counts: {row_ones.sum()}\nSum directly all the ones in the matrix: {x.sum()}')

###### **Matrix-vector multiplication**

$$ c_i = \sum_k A_{ik}b_k := A_{ik}b_k $$

In [None]:
# Repeated indices in different input tensors indicate pointwise multiplication
a = torch.arange(6).reshape(2, 3)
b = torch.arange(3)
c = torch.einsum('ik, k -> i', [a, b])  # Multiply on k, then sum up on k
print_arr(a, b, c)

###### **Matrix-matrix multiplication**

$$ C_{ij} = \sum_k A_{ik}B_{kj} := A_{ik}B_{kj} $$

📖 Understanding einsum, what happens inside?

![alt text](https://obilaniu6266h16.files.wordpress.com/2016/02/einsum-matrixmul.png?w=676)

In [None]:
a = torch.arange(6).reshape(2, 3)
b = torch.arange(15).reshape(3, 5)
c = torch.einsum('ik, kj -> ij', [a, b])
print_arr(a, b, c)
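
Conceptually, `'ik, kj -> ij'` multiplies the entries that share the index `k` and then sums `k` away. The sketch below reproduces this with broadcasting and `sum`; it mirrors the semantics only, and an actual einsum implementation need not build the intermediate `[2, 3, 5]` tensor.

```python
import torch

a = torch.arange(6).reshape(2, 3)   # indices i, k
b = torch.arange(15).reshape(3, 5)  # indices k, j

# Align the shared index k, multiply with broadcasting, then sum k away
products = a[:, :, None] * b[None, :, :]  # shape [2, 3, 5], indices i, k, j
c_manual = products.sum(dim=1)            # shape [2, 5], indices i, j

print(torch.equal(c_manual, torch.einsum('ik, kj -> ij', a, b)))  # True
```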

###### **Dot product multiplication**

$$ c = \sum_i a_i b_i := a_i b_i $$

In [None]:
a = torch.arange(3)
b = torch.arange(3,6)
c = torch.einsum('i,i->', (a, b))
print_arr(a, b, c)

###### **Point-wise multiplication**
Also known as Hadamard product:

$$ C_{ij} = A_{ij} B_{ij} $$

In [None]:
a = torch.arange(6).reshape(2, 3)
b = torch.arange(6,12).reshape(2, 3)
c = torch.einsum('ij, ij -> ij', (a, b))
print_arr(a, b, c)

###### **Outer product**
Given two column vectors of length $m$ and $n$ respectively,
\begin{align*}
\mathbf{a}=\left[\begin{array}{cccc}
a_{1} &
a_{2} &
\dots &
a_{m}
\end{array}\right]^\top, \quad \mathbf{b}=\left[\begin{array}{cccc}
b_{1} &
b_{2} &
\dots &
b_{n}
\end{array}\right]^\top
\end{align*}
their outer product, denoted $\mathbf{a} \otimes \mathbf{b}$, is defined as the $m \times n$ matrix $\mathbf{C}$ obtained by multiplying each element of $\mathbf{a}$ by each element of $\mathbf{b}$:
\begin{align*}
\mathbf{a} \otimes \mathbf{b}=\mathbf{C}=\left[\begin{array}{cccc}
a_{1} b_{1} & a_{1} b_{2} & \ldots & a_{1} b_{n} \\
a_{2} b_{1} & a_{2} b_{2} & \ldots & a_{2} b_{n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m} b_{1} & a_{m} b_{2} & \ldots & a_{m} b_{n}
\end{array}\right]
\end{align*}
Or, in index notation,
$$ C_{ij} = a_i b_j $$

In [None]:
a = torch.arange(3)
b = torch.arange(3,7)
c = torch.einsum('i, j -> ij', (a, b))
print_arr(a, b, c)

In [None]:
# Using the standard PyTorch API
torch.outer(a, b)

In [None]:
# Using broadcasting black magic
a[:, None] * b[None, :]

###### 📖 **Batch matrix multiplication**

$$ c_{bij} = \sum_k a_{bik} b_{bkj} $$

In [None]:
a = torch.randn(2,2,5)
b = torch.randn(2,5,3)
c = torch.einsum('bik,bkj->bij', [a, b])
print_arr(a, b, c)

#### Singleton dimensions

 In deep learning it is very common to **add or remove dimensions of size $1$** in a tensor. As we mentioned, this is called **unsqueezing** and **squeezing**, and it occurs often during batch processing, manipulating feature maps, making network layers compatible, and in several other occasions.

There are several ways to perform these operations; feel free to use whichever is most comfortable for you! Again, **prefer readability to cryptic one-liners** for the sanity of a hypothetical unknown reader or your future self.

In the example below, we transform a rank-1 tensor into a rank-2 "column", and back to a rank-1:

In [None]:
# Define a rank-1 tensor we will use
x = torch.arange(6)
print_arr(x)

Transform **`x` into a column tensor** in four different ways.

Remember that the shape of a column tensor is in the form: `(rows, 1)`

In [None]:
# 1)
# Use the `reshape` or `view` functions

y1 = x.reshape(-1, 1)
y2 = x.view(-1, 1)

print_arr(y1, y2)

In [None]:
# 2)
# Use the specific `unsqueeze` function to unsqueeze a dimension

y3 = x.unsqueeze(dim=-1)
y4 = x.unsqueeze(dim=1)

print_arr(y3, y4)

In [None]:
# 3)
# Explicitly index a non-existing dimension with `None`

y5 = x[:, None]

print_arr(y5)

In [None]:
# 4)
# Same as before, but do not assume a rank-2 tensor and index the last one.
# This approach is useful to write functions that work both for
# batched or non-batched data

y6 = x[..., None]

print_arr(y6)
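
To illustrate why the `...` form helps with batched and non-batched data, here is a minimal sketch. The helper `add_feature_dim` is hypothetical and simply wraps the indexing expression.

```python
import torch

def add_feature_dim(t: torch.Tensor) -> torch.Tensor:
    """Append a trailing singleton dimension, whatever the rank of `t` (hypothetical helper)."""
    return t[..., None]

single = torch.arange(6)                # shape [6]
batch = torch.arange(24).reshape(4, 6)  # shape [4, 6]

print(add_feature_dim(single).shape)  # torch.Size([6, 1])
print(add_feature_dim(batch).shape)   # torch.Size([4, 6, 1])
```

Writing `t[:, None]` instead would insert the new dimension in the middle of the batched tensor, giving shape `[4, 1, 6]`.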

In [None]:
# Now we go back to a rank-1 tensor

x1 = y1.reshape(-1)
x2 = y2.view(-1)          # Explicitly request a view of the tensor, without copying data
x3 = y3.squeeze(dim=-1)
x4 = y4.squeeze(dim=1)
x5 = y5[:, 0]             # Manually collapse the dimension with an integer indexing
x6 = y6[..., 0]

print_arr(x1, x2, x3, x4, x5, x6)
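
Two details are worth keeping in mind when adding and removing singleton dimensions. The sketch below shows that a view shares memory with the original tensor, and that `squeeze()` without a `dim` argument removes every size-1 dimension, which may include a batch dimension of size 1.

```python
import torch

x = torch.arange(6)
col = x.view(-1, 1)

# A view shares storage with the original tensor: writes are visible in both
col[0, 0] = 100
print(x[0])  # tensor(100)

# squeeze() without `dim` removes every singleton dimension, including a batch of size 1
batch = torch.zeros(1, 6, 1)
print(batch.squeeze().shape)        # torch.Size([6])
print(batch.squeeze(dim=-1).shape)  # torch.Size([1, 6])
```

Passing `dim` explicitly is usually the safer choice in code that must work for any batch size.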

> **NOTE**
>
> Indexing with `...` means **keeping all the other dimensions the same**.
> Keep in mind that `...` is just a Python singleton object (just like `None`).
> It is the built-in `Ellipsis` object:


In [None]:
...

In [None]:
x = torch.rand(3,3,3)
x[:, :, 0]

In [None]:
x[..., 0]

### Tensor types
Pay attention to the tensor types!
Several methods are available to convert tensors to different types:

In [None]:
a = torch.rand(3, 3) + 0.5

In [None]:
a.int()

In [None]:
a.long()

In [None]:
a.float()

In [None]:
a.double()

In [None]:
a.bool()

In [None]:
a.to(torch.double)

In [None]:
a.to(torch.uint8)

In [None]:
a.bool().int()
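
A few conversion behaviours are easy to overlook. The sketch below uses hand-picked values to show that casting to an integer type truncates instead of rounding, that any non-zero value becomes `True`, and that mixing dtypes promotes the result.

```python
import torch

a = torch.tensor([0.0, 0.7, 1.5, -1.5])

print(a.int())   # tensor([ 0,  0,  1, -1], dtype=torch.int32): truncation toward zero
print(a.bool())  # tensor([False,  True,  True,  True]): any non-zero value is True

# Mixing dtypes in an operation promotes the result
i = torch.tensor([1, 2, 3])        # torch.int64
f = torch.tensor([0.5, 0.5, 0.5])  # torch.float32
print((i + f).dtype)  # torch.float32
print((i / 2).dtype)  # torch.float32: `/` is true division even for integer tensors
```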

**Pro tip:** Do not try to memorize all the PyTorch API!

> Learn to understand what operation should already exist and search for it, when you need it. If it is something common, and it usually is, chances are it already exists.

Google, StackOverflow and the docs are your friends!

### Einops

If you liked the `einsum` operation, have fun with the [einops](https://github.com/arogozhnikov/einops) package! 🚀

It is a third-party library, compatible with most frameworks, that brings superpowers to `einsum`. We will not use the `einops` library in the tutorials, however, feel free to read the [docs](https://github.com/arogozhnikov/einops) and use it.

![](http://arogozhnikov.github.io/images/einops/einops_video.gif)


### Exercises

These final exercises are designed to showcase the elegant solutions of einsum.


#### **EXERCISE 1 (2 Points)**
>
> You are given $b$ images with size $w \times h$. Each pixel in each image has three color channels, `(r, g, b)`. These images are organized in a tensor $X \in \mathbb{R}^{w \times b \times c \times h}$.
>
> You want to apply a linear transformation to the color channel of each single image. In particular, you want to:
> - **Convert each image into a grey scale image**.
> - **After that, transpose the images** to swap the height and width.
>
> The linear transformation that converts from `(r, g, b)` to grey scale is simply a linear combination of `r`, `g` and `b`. It can be encoded in the following rank-1 tensor $y \in \mathbb{R}^3$:

In [None]:
y = torch.tensor([0.2989, 0.5870, 0.1140], dtype=torch.float)


> At the end, you want to obtain a tensor $Z \in \mathbb{R}^{b \times w \times h}$.
>
> Write the PyTorch code that performs this operation.

In [None]:
# Create the input tensors for the exercise
# Execute and ignore this cell

from skimage import io
from skimage.transform import resize

size = 100

image1 = io.imread('https://upload.wikimedia.org/wikipedia/commons/thumb/6/6f/Earth_Eastern_Hemisphere.jpg/260px-Earth_Eastern_Hemisphere.jpg')
image1 = torch.from_numpy(resize(image1, (size, size), anti_aliasing=True)).float()  # Convert to float type
image1 = image1[..., :3]  # remove alpha channel

image2 = io.imread('https://upload.wikimedia.org/wikipedia/commons/thumb/b/b4/The_Sun_by_the_Atmospheric_Imaging_Assembly_of_NASA%27s_Solar_Dynamics_Observatory_-_20100819.jpg/628px-The_Sun_by_the_Atmospheric_Imaging_Assembly_of_NASA%27s_Solar_Dynamics_Observatory_-_20100819.jpg')
image2 = torch.from_numpy(resize(image2, (size, size), anti_aliasing=True)).float()
image2 = image2[..., :3]  # remove alpha channel

image3 = io.imread('https://upload.wikimedia.org/wikipedia/commons/thumb/8/80/Wikipedia-logo-v2.svg/1920px-Wikipedia-logo-v2.svg.png')
image3 = torch.from_numpy(resize(image3, (size, size), anti_aliasing=True)).float()
image3 = image3[..., :3]  # remove alpha channel

source_images = torch.stack((image1, image2, image3), dim=0)
images = torch.einsum('bwhc -> wbch', source_images)

In [None]:
# Plot source images
plot_row_images(source_images)

In [None]:
# ✏️ your code here
gray_images = ...

gray_images_tr = ...

In [None]:
# Plot the gray images
plot_row_images(gray_images)

Expected Output:

![grayscale expected output](../data/grayscale.png)

In [None]:
# Plot the gray transposed images
plot_row_images(gray_images_tr)

Expected Output:

![transposed expected output](../data/transposed.png)

#### **EXERCISE 2**
>
> Given $k$ points organized in a tensor $X \in \mathbb{R}^{k \times 2}$ apply a reflection along the $y$ axis as a linear transformation.


In [None]:
# Define some points in R^2
x = torch.arange(100, dtype=torch.float)
y = x ** 2

# Stack them into a tensor of shape [k, 2]
data = torch.stack((x, y), dim=0).t()

In [None]:
px.scatter(x = data[:, 0].numpy(), y = data[:, 1].numpy())

In [None]:
# ✏️ your code here
mirrored_data = ...

In [None]:
# Plot the new points
px.scatter(x = mirrored_data[:, 0].numpy(), y = mirrored_data[:, 1].numpy())

Expected Output:

![inverted expected output](../data/inverted.pngnewplot.png)

#### **EXERCISE 3**
>
>  You are given $b$ images with size $w \times h$. Each pixel in each image has `(r, g, b)` channels. These images are organized in a tensor $X \in \mathbb{R}^{w \times b \times c \times h}$, i.e. the same tensor as in the exercise 1.
>
> You want to swap the `red` color with the `blue` color, and decrease the intensity of the `green` by half.
>
> Perform the transformation on all the images simultaneously.

In [None]:
images.shape

In [None]:
# ✏️ your code here
rb_images = ...

Expected Output:

![changed colors expected output](../data/change_color.png)

In [None]:
plot_row_images(rb_images)

#### **EXERCISE 4 (4 Points)**
>
>  You are given $b$ images with size $w \times h$. Each pixel in each image has `(r, g, b)` colors. These images are organized in a tensor $X \in \mathbb{R}^{w \times b \times c \times h}$, i.e. the same tensor as exercise 1 and 3.
>
> You want to **convert each image into a 3D point cloud**:
> - the `(x, y)` coordinates of each point in the point cloud are the **indices** of the pixels in the original image
> - the `z` coordinate of each point in the point cloud is the $L_2$ norm of the color of the corresponding pixel, multiplied by $10$
>
> *Hint*: you may need some other PyTorch function, search the docs!

In [None]:
# Fill the missing code

# Just normalize the tensor into the common form [batch, width, height, colors]
imgs = ...  # Fill here
print(imgs.shape) # 3 images, 100x100 pixels, 3 colors

# The x, y coordinates of the point cloud are all the possible pairs of indices (i, j)
row_indices = torch.arange(imgs.shape[1], dtype=torch.float)
col_indices = torch.arange(imgs.shape[2], dtype=torch.float)
xy = ...  # Fill here
# hint: you need the *cartesian product* of the two. Check the PyTorch documentation for a function that does this

# Compute the L2 norm for each pixel in each image
depth = ...  # Fill here

# For every pair (i, j), retrieve the L2 norm of that pixel
z = depth[:, xy[:, 0].long(), xy[:, 1].long()] * 10

# Adjust the dimensions, repeat and concatenate accordingly
xy = xy.repeat(imgs.shape[0], 1, 1)  # x,y coordinates are constant for the three images
# concatenate xy and z
clouds = ...  # Fill here

# Three images, 10000 points, each point with coordinates x,y,z in 3D
print(clouds.shape)

In [None]:
# Utility function
# Execute and ignore this cell

from typing import Union

def plot_3d_point_cloud(cloud: Union[torch.Tensor, np.ndarray]) -> None:
  """ Plot a single 3D point cloud

  :param cloud: tensor with shape [number of points, coordinates]
  """
  import pandas as pd
  df = pd.DataFrame(np.asarray(cloud), columns=['x', 'y', 'z'])
  fig = px.scatter_3d(df, x=df.x, y=df.y, z=df.z, color=df.z, opacity=1, range_z=[0, 30])
  fig.update_layout({'scene_aspectmode': 'data', 'scene_camera':  dict(
          up=dict(x=0., y=0., z=0.),
          eye=dict(x=0., y=0., z=3.)
      )})
  fig.update_traces(marker=dict(size=3,),
                    selector=dict(mode='markers'))
  _ = fig.show()

In [None]:
# Loading expected results. Ignore this cell
clouds_gt = np.load("../data/clouds.npy", allow_pickle=True)

In [None]:
plot_3d_point_cloud(clouds[0, ...])

In [None]:
# Expected result
plot_3d_point_cloud(clouds_gt[0, ...])

In [None]:
plot_3d_point_cloud(clouds[1, ...])

In [None]:
# Expected result
plot_3d_point_cloud(clouds_gt[1, ...])

In [None]:
plot_3d_point_cloud(clouds[2, ...])

In [None]:
# Expected result
plot_3d_point_cloud(clouds_gt[2, ...])