# Laboratory 04: PyTorch and Tensor Operations

In this laboratory you'll be introduced to [PyTorch](http://pytorch.org/), a framework for building and training neural networks. Specifically, today we'll explore the operations that can be applied to tensors using this framework. Understanding how tensors work is essential for building, training and inspecting a neural network.

As you'll see, PyTorch tensors behave in many ways like the arrays you may love from NumPy. These NumPy arrays, after all, are just tensors. PyTorch takes these tensors and makes it simple to move them to GPUs for the faster processing needed when training neural networks.

The main characteristics of the PyTorch framework are:
 - it is a thin framework over Python that inherits its core from its Lua predecessor, Torch
 - it generates neural network computational graphs dynamically
 - it is object oriented, with powerful debugging support

These adhere to a development philosophy that promotes linear code flow and full interoperability with the Python ecosystem, while aiming to be as fast as other frameworks like TensorFlow, Keras or CNTK.


In [4]:
import numpy as np

## PyTorch Main Components

At the top level, the PyTorch package and tensor library is simply called [`torch`](https://pytorch.org/docs/stable/torch.html). The features that allow us to build and train state-of-the-art networks are split among:
- [`torch.nn`](https://pytorch.org/docs/stable/nn.html) a subpackage that contains modules and extensible classes for building neural networks.
- [`torch.autograd`](https://pytorch.org/docs/stable/autograd.html?highlight=autograd#) a subpackage that supports all the differentiable tensor operations.
- [`torch.nn.functional`](https://pytorch.org/docs/stable/nn.functional.html#torch-nn-functional) a functional interface that contains typical operations used for building neural networks like loss functions, activation functions and convolution operations.
- [`torch.optim`](https://pytorch.org/docs/stable/optim.html#module-torch.optim) a subpackage that contains standard optimization algorithms such as SGD, Adam and so on.
- [`torch.utils`](https://pytorch.org/docs/stable/data.html) a subpackage that contains utility classes like data sets and data loaders that make data preprocessing easier.

Aside from these components the framework also includes packages such as [`torchvision`](https://pytorch.org/docs/stable/torchvision/index.html), [`torchaudio`](https://pytorch.org/audio/) and [`torchtext`](https://pytorch.org/text/) that provide access to popular datasets, neural networks architectures and transformations for computer vision, audio and natural language processing.

## PyTorch Tensors

It turns out neural network computations are just a bunch of linear algebra operations on *tensors*, a generalization of vectors and matrices. Recall that a vector is a 1-dimensional tensor, a matrix is a 2-dimensional tensor, and an array with three indices is a 3-dimensional tensor (an RGB color image, for example). Tensors are the fundamental data structure for neural networks, and PyTorch (as well as pretty much every other deep learning framework) is built around them.

<img src="res/tensor_examples.svg" width=600px>

In PyTorch tensors are instances of the [`torch.Tensor`](https://pytorch.org/docs/stable/tensors.html) class and each tensor has the following attributes:
- `dtype` (a `torch.dtype`) is the data type of the tensor's elements.
- `device` (a `torch.device`) is the device where the data is allocated, CPU or GPU.
- `layout` (a `torch.layout`) describes how the tensor is laid out in memory.
- `shape` (a `torch.Size`) is, as the name implies, the shape of the tensor.

Note that, since a tensor can be allocated either on the CPU or on the GPU, the concrete tensor type depends on both the `dtype` and the `device` attributes. The table below summarizes the data types supported by PyTorch tensors:

| Data Type  | dtype  | CPU tensor  | GPU tensor  |
|---|---|---|---|
| 32-bit floating point  | torch.float32  | torch.FloatTensor  | torch.cuda.FloatTensor  |
| 64-bit floating point  | torch.float64  | torch.DoubleTensor  | torch.cuda.DoubleTensor  |
| 16-bit floating point  | torch.float16  | torch.HalfTensor  | torch.cuda.HalfTensor  |
| 8-bit integer (unsigned)  | torch.uint8  | torch.ByteTensor  | torch.cuda.ByteTensor  |
| 8-bit integer (signed)  | torch.int8  | torch.CharTensor  | torch.cuda.CharTensor  |
| 16-bit integer (signed)  | torch.int16  | torch.ShortTensor   | torch.cuda.ShortTensor  |
| 32-bit integer (signed)  | torch.int32  | torch.IntTensor  | torch.cuda.IntTensor  |
| 64-bit integer (signed)  | torch.int64  | torch.LongTensor  | torch.cuda.LongTensor  |

**Exercise 1**

Ok, now let's get down to business by first importing the `torch` package.

In [3]:
# TODO 1.1. Import the torch package
import torch
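
With `torch` imported, the attributes listed above can be inspected on any tensor. The snippet below is a minimal sketch on a small example tensor (the values are arbitrary and chosen only for illustration); it assumes the default CPU device.

```python
import torch

# A small tensor built from a nested list; integer data is inferred as int64.
t = torch.tensor([[1, 2], [3, 4]])

print(t.dtype)   # element data type
print(t.device)  # where the data lives (CPU unless moved elsewhere)
print(t.layout)  # memory layout, strided for ordinary dense tensors
print(t.shape)   # length of each axis

# Converting the dtype returns a new tensor; the original is unchanged.
t_float = t.to(torch.float32)
print(t_float.dtype)
```
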

### Creating A Tensor

In PyTorch there are four options to create a tensor, namely:

1. Using the constructor `torch.Tensor(data)`
2. Using the factory function `torch.tensor(data)`
3. Using the conversion function `torch.as_tensor(data)`, which accepts several input types
4. Using the conversion from NumPy `torch.from_numpy(data)`.

A couple of things to note here: i) the constructor uses the default dtype (`torch.float32`) without performing any inference based on the input type, whereas `torch.tensor(data)` infers the dtype from the data, and ii) `from_numpy(data)` shares memory with the input array, as does `as_tensor(data)` whenever it can avoid a copy (e.g. for a NumPy array on the CPU), i.e. changing the elements of one will change the other.

We can see the first point by creating two identical tensors from the same list of numbers.

In [None]:
data = [[1,2],[3,4]]

# TODO 1.2. Create a tensor t1 using the constructor & "data"
t1 = torch.Tensor(data)
print(t1)

# TODO 1.3. Print its dtype attribute
print(t1.dtype)

# TODO 1.4. Similarly create t2, but using the factory method
t2 = torch.tensor(data)
print(t2)
print(t2.dtype)

tensor([[1., 2.],
        [3., 4.]])
torch.float32
tensor([[1, 2],
        [3, 4]])
torch.int64


Now let's see what happens if we create a tensor from a NumPy array and then change the array elements.

In [None]:
data = np.array([[1, 2],[3, 4]])

# TODO 1.5. Use the numpy conversion method to create t
t = torch.from_numpy(data)

# TODO: 1.6. Print the tensor t
print(t)

# TODO: 1.7. Change element at index (0, 0) of data
data[0, 0] = 13

# TODO: 1.8. Print the tensor t again
print(t)



tensor([[1, 2],
        [3, 4]])
tensor([[13,  2],
        [ 3,  4]])
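
The memory sharing seen above can be contrasted with the copying behaviour of `torch.tensor()`. The following is a small sketch with made-up values; it relies on `torch.as_tensor()` reusing the buffer of a NumPy array when no dtype or device change is requested, and on it having to copy when given a plain Python list.

```python
import numpy as np
import torch

arr = np.array([1, 2, 3])

shared = torch.as_tensor(arr)   # reuses the array's memory when it can
copied = torch.tensor(arr)      # always copies the data

arr[0] = 99  # modify the NumPy array in place

print(shared)  # reflects the change
print(copied)  # still holds the original values

# A Python list has no buffer to share, so as_tensor has to copy it.
lst = [1, 2, 3]
from_list = torch.as_tensor(lst)
lst[0] = 99
print(from_list)
```
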


Of course, similarly to NumPy we also have at our disposal functions such as `rand()`, `zeros()`, `ones()`, `eye()` and so on. For example, creating a rank 1 tensor of 3 random numbers can be accomplished through:

```Python
t = torch.rand(3)
```

Now, to exercise this yourself, please create a rank 2 tensor of all ones in the TODO below. (see [.ones()](https://pytorch.org/docs/stable/generated/torch.ones.html?highlight=ones#torch.ones))

In [None]:
# TODO 1.9. Create a rank two tensor and print it
t = torch.ones(3, 2)
print(t)

tensor([[1., 1.],
        [1., 1.],
        [1., 1.]])


### Tensor Shape

Suppose that we have the following tensor:
$$
t = \begin{bmatrix}
1 &1 &1 &1 \\ 
2 &2 &2 &2 \\ 
3 &3 &3 &3 
\end{bmatrix}
$$

The shape of this tensor is 3 x 4, with a rank of 2. Remember, *rank* means the number of dimensions present within a tensor. 

**Exercise 2**

Now let's create this tensor and play around with it a bit.

In [5]:
# TODO 2.1. Create the tensor t from the example
t = torch.tensor([[1] * 4, [2] * 4, [3] * 4])
print(t)

tensor([[1, 1, 1, 1],
        [2, 2, 2, 2],
        [3, 3, 3, 3]])


PyTorch provides us with two ways to find the shape of a tensor:
- via the t.shape attribute
- via the t.size() method

Both return a `torch.Size` object. Let's see this in action.

In [None]:
# TODO 2.2: Print the shape of the tensor
print(t.shape) 

torch.Size([3, 4])


As mentioned in the course, the length of a tensor's shape is simply the number of its dimensions. Hence, the length of the shape object is the rank of the tensor.

In [None]:
# TODO 2.3. Print the rank of the tensor
len(t.shape)

2

We can also compute the number of individual elements in a tensor. It simply represents the product of the shape component values. Hence we can do something like:

```Python
torch.tensor(t.shape).prod()
```
However, the result will not be a Python integer; it will be a tensor that contains our integer. To obtain the integer directly we can use the [`.numel()`](https://pytorch.org/docs/stable/generated/torch.numel.html?highlight=numel#torch.numel) method on our tensor.

In [None]:
# TODO 2.4. Print the number of elements in our tensor
t.numel()

12
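
To tie the shape, rank and element count together, here is a short sketch on the same 3 x 4 tensor. It uses `.dim()` as an alternative way to read the rank and `.item()` to pull the Python integer out of a rank 0 tensor; neither is required by the exercise, they are shown only for comparison.

```python
import torch

t = torch.tensor([[1] * 4, [2] * 4, [3] * 4])

# Both ways of asking for the shape give the same torch.Size value.
print(t.shape == t.size())

# The rank is the length of the shape; .dim() reports the same number.
print(len(t.shape), t.dim())

# The product of the shape is a rank 0 tensor; .item() extracts the Python int.
n_from_shape = torch.tensor(t.shape).prod()
print(n_from_shape, n_from_shape.item(), t.numel())
```
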

### Tensor Reshaping

Reshaping is one of the most frequently used operations and it is required to format tensors as they pass from the output of one type of neural network layer to the input of a different type of layer, for example when passing from a convolution layer to a fully connected layer. More on this later in the course. The operation is lossless in the sense that we don't change, remove or add any data entries. As the name implies, the reshaping operation only changes how the data entries are organized within the tensor.

The simplest type of reshaping is the one that does not change the tensor rank. In PyTorch this can be accomplished via the [`.reshape()`](https://pytorch.org/docs/stable/tensors.html?highlight=reshape#torch.Tensor.reshape) or [`.view()`](https://pytorch.org/docs/stable/tensors.html?highlight=reshape#torch.Tensor.view) methods. For example, we could reshape our previous tensor so that it looks more like a vector than a matrix:

In [None]:
# Original tensor and its shape
print(t)
print(t.shape)

tensor([[1, 1, 1, 1],
        [2, 2, 2, 2],
        [3, 3, 3, 3]])
torch.Size([3, 4])


In [None]:
# Reshaping to 1 row and 12 columns
new_t = t.reshape(1, 12)

# Reshaped tensor and its shape
print(new_t)
print(new_t.shape)

tensor([[1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]])
torch.Size([1, 12])


Note that although the new tensor looks more like a vector, it still has rank 2. That is, it still has two axes: the first with a length of 1 and the second with a length of 12. Also note that the new axis lengths we passed yield a total of 12 data entries. If we were to pass axis lengths that are incompatible with the total number of entries, we would get an error similar to the one below:

In [None]:
t.reshape(5,6)

RuntimeError: ignored

A neat trick we can use when doing reshapes like the one above is to let the `.reshape()` / `.view()` methods decide which is the correct length for one of the axes. For example, the previous reshape could have been written as:

In [None]:
# Reshaping to 1 row and 12 columns
new_t = t.reshape(1, -1)

# Reshaped tensor and its shape
print(new_t)
print(new_t.shape)

tensor([[1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]])
torch.Size([1, 12])
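
The two cases above, an incompatible shape and an inferred axis length, can be put side by side. This is only an illustrative sketch on the same 12-element tensor; the `failed` flag is a helper name introduced here, not part of the lab.

```python
import torch

t = torch.tensor([[1] * 4, [2] * 4, [3] * 4])  # 12 elements

# -1 asks PyTorch to infer the axis length: 12 elements / 2 rows = 6 columns.
inferred = t.reshape(2, -1)
print(inferred.shape)

# A shape whose product differs from 12 cannot work and raises RuntimeError.
try:
    t.reshape(5, 6)
    failed = False
except RuntimeError as err:
    failed = True
    print("reshape failed:", err)
```
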


While here we have used `.reshape()`, you can use `.view()` in the same manner to achieve the same result. The difference is that `.view()` never copies the data: it returns a tensor that shares the original data and raises an error when the requested shape is incompatible with the tensor's memory layout (e.g. for a non-contiguous tensor), whereas `.reshape()` returns a view when possible and copies the data only when it has to.
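
A quick way to check how `.view()` and `.reshape()` treat the underlying data is to try both on a contiguous and on a non-contiguous tensor. The sketch below uses a transpose to obtain a non-contiguous tensor and compares `data_ptr()` values to see whether storage is shared; the `view_ok` flag is a helper name introduced for this example.

```python
import torch

t = torch.tensor([[1] * 4, [2] * 4, [3] * 4])

# On a contiguous tensor, .view() shares storage with the original.
v = t.view(1, -1)
v[0, 0] = 100
print(t[0, 0])  # the change made through the view is visible in t

# Transposing gives a non-contiguous tensor: same storage, different strides.
tt = t.t()
print(tt.is_contiguous())

# .view() refuses to flatten it, since that would require a copy.
try:
    tt.view(-1)
    view_ok = True
except RuntimeError:
    view_ok = False

# .reshape() falls back to copying, so the result has its own storage.
r = tt.reshape(-1)
print(view_ok, r.data_ptr() == t.data_ptr())
```
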

**Exercise 3**

Now it's your turn to play around with reshaping. So, repeat the steps above for the following sizes \[2, 6\], \[3, 4\], \[6, 2\], \[12, 1\], \[2, -1 \], \[-1, 1\].

In [None]:
# TODO 3.1. Reshape and print each reshaped tensor
print(t.reshape(2, 6))
print(t.reshape(3, 4))
print(t.reshape(6, 2))
print(t.reshape(12, 1))
print(t.reshape(2, -1))
print(t.reshape(-1, 1))

tensor([[1, 1, 1, 1, 2, 2],
        [2, 2, 3, 3, 3, 3]])
tensor([[1, 1, 1, 1],
        [2, 2, 2, 2],
        [3, 3, 3, 3]])
tensor([[1, 1],
        [1, 1],
        [2, 2],
        [2, 2],
        [3, 3],
        [3, 3]])
tensor([[1],
        [1],
        [1],
        [1],
        [2],
        [2],
        [2],
        [2],
        [3],
        [3],
        [3],
        [3]])
tensor([[1, 1, 1, 1, 2, 2],
        [2, 2, 3, 3, 3, 3]])
tensor([[1],
        [1],
        [1],
        [1],
        [2],
        [2],
        [2],
        [2],
        [3],
        [3],
        [3],
        [3]])


Of course, we can also do reshapes that do change rank. That is, we can introduce a new axis or remove an existing one. As long as we preserve the number of data entries in the tensor the reshape will work as expected. To see this complete the TODOs below:

In [None]:
# TODO 3.2. Reshape tensor to a shape of [2,2,3]
new_t = t.reshape(2, 2, 3)

# TODO 3.3. Print the new tensor
print(new_t)

# TODO 3.4. Print the new tensor shape
print(new_t.shape)

tensor([[[1, 1, 1],
         [1, 2, 2]],

        [[2, 2, 3],
         [3, 3, 3]]])
torch.Size([2, 2, 3])


In [None]:
# TODO 3.5. Reshape tensor to a shape of [2,-1,3]
new_t = t.reshape(2, -1, 3)

# TODO 3.6. Print the new tensor
print(new_t)

# TODO 3.7. Print the new tensor shape
print(new_t.shape)

tensor([[[1, 1, 1],
         [1, 2, 2]],

        [[2, 2, 3],
         [3, 3, 3]]])
torch.Size([2, 2, 3])


We can also remove or add axes by using [`.squeeze()`](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.squeeze) or [`.unsqueeze()`](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.unsqueeze), respectively. That is, *squeezing* a tensor removes the dimensions or axes that have a length of 1, whereas *unsqueezing* a tensor adds a dimension with a length of 1. Hence these methods allow us to modify the rank, either by expanding or by shrinking the tensor in question. Note that the `.unsqueeze()` method requires us to specify, via the `dim=` parameter, the position at which the new axis is added. We can of course specify whatever axis we want.

To see how `.squeeze()` works complete the TODO's below.

In [6]:
# Reshaping the tensor
new_t = t.reshape(1,-1)

# TODO 3.8. Print the new tensor shape
print(new_t)
print(new_t.shape)

# TODO 3.9. Squeeze the new_t and print its shape
new_t = new_t.squeeze()
print(new_t)
print(new_t.shape)

tensor([[1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]])
torch.Size([1, 12])
tensor([1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3])
torch.Size([12])


Now do the opposite operation on the new tensor via [`.unsqueeze()`](https://pytorch.org/docs/stable/tensors.html?highlight=unsqueeze#torch.Tensor.unsqueeze) to add back its length 1 axis on the appropriate dimension. Note, that the dimension indexes start at 0.

In [None]:
# TODO 3.10. Unsqueeze the new_t and print its shape
new_t = new_t.unsqueeze(dim = 0)
print(new_t)
print(new_t.shape)

tensor([[1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]])
torch.Size([1, 12])
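
The effect of the `dim=` argument is easier to see on a tensor with several length 1 axes. The following sketch uses an arbitrary tensor of zeros with shape \[1, 3, 1, 4\], chosen only to illustrate which axes are removed or added.

```python
import torch

x = torch.zeros(1, 3, 1, 4)

print(x.squeeze().shape)        # all length 1 axes removed
print(x.squeeze(dim=0).shape)   # only axis 0 removed
print(x.squeeze(dim=1).shape)   # axis 1 has length 3, so nothing changes

# unsqueeze inserts a length 1 axis at the requested position.
y = torch.zeros(3, 4)
print(y.unsqueeze(dim=0).shape)
print(y.unsqueeze(dim=1).shape)
print(y.unsqueeze(dim=-1).shape)  # negative indices count from the end
```
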


Ok, so we can reshape a tensor and add/remove length 1 axes more easily via squeezing and unsqueezing. However, there is one reshape operation that is used so heavily within the forward flow of tensors through a neural network that we have a dedicated method for it, namely [`.flatten()`](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.flatten) for the so-called *flattening operation*.

The flattening operation simply reshapes a tensor such that its data entries are laid out in sequence on a single axis. Without any arguments the method returns a rank 1 tensor that resembles a 1D array of numbers. Nonetheless, this method provides us with some flexibility that allows us to keep the first few axes intact while flattening the rest via the `start_dim=` argument. As its name implies, the flattening operation starts at the specified dimension.

Complete the TODO's below.

In [None]:
# Suppose we have the following tensor
t = torch.tensor([
        [[1, 1, 1, 1],
         [1, 1, 1, 1],
         [1, 1, 1, 1],
         [1, 1, 1, 1]],

        [[2, 2, 2, 2],
         [2, 2, 2, 2],
         [2, 2, 2, 2],
         [2, 2, 2, 2]],

        [[3, 3, 3, 3],
         [3, 3, 3, 3],
         [3, 3, 3, 3],
         [3, 3, 3, 3]]])

# TODO 3.11. Print the tensor shape
print(t.shape)

# TODO 3.12. Flatten the entire tensor and save it in new_t 
new_t = t.flatten()

# TODO 3.13. Print new_t tensor and its shape
print(new_t)
print(new_t.shape)

# TODO 3.14. Flatten t starting on dimension one
new_t = t.flatten(start_dim = 1)

# TODO 3.13. Print new_t tensor and its shape
print(new_t)
print(new_t.shape)


torch.Size([3, 4, 4])
tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2,
        2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3])
torch.Size([48])
tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
        [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]])
torch.Size([3, 16])


#### PyTorch Image Tensor Example

When dealing with neural networks that classify images, we need a tensor representation. In PyTorch this representation is given by a rank 4 tensor with the format \[B, C, H, W \] where:

- B represents the batch size, i.e. number of images in a tensor
- C represents the number of channels in an image, e.g. monochrome images have $C=1$, whereas color images usually have $C=3$ for the RGB color system. 
- H represents the image height in the number of pixels
- W represents the image width also in the number of pixels

Now suppose we have a tensor such as the one given below:

In [None]:
# an image batch tensor
a_image_batch = torch.tensor([
        [[[1, 1, 1, 1],
         [1, 1, 1, 1],
         [1, 1, 1, 1],
         [1, 1, 1, 1]],

        [[2, 2, 2, 2],
         [2, 2, 2, 2],
         [2, 2, 2, 2],
         [2, 2, 2, 2]],

        [[3, 3, 3, 3],
         [3, 3, 3, 3],
         [3, 3, 3, 3],
         [3, 3, 3, 3]]]])

print(a_image_batch)
print(a_image_batch.shape)


tensor([[[[1, 1, 1, 1],
          [1, 1, 1, 1],
          [1, 1, 1, 1],
          [1, 1, 1, 1]],

         [[2, 2, 2, 2],
          [2, 2, 2, 2],
          [2, 2, 2, 2],
          [2, 2, 2, 2]],

         [[3, 3, 3, 3],
          [3, 3, 3, 3],
          [3, 3, 3, 3],
          [3, 3, 3, 3]]]])
torch.Size([1, 3, 4, 4])


This tensor is a batch containing only 1 image, which has 3 channels and a size of 4x4 pixels in height and width. When flattening this tensor we usually want to keep the batch axis intact, since each image is different, but flatten the rest. So flatten the `a_image_batch` tensor while keeping the batch size intact in the TODO's below.

In [None]:
# TODO 3.14. Flatten a_image_batch via .flatten() and print it and its shape
t_flatten = a_image_batch.flatten(start_dim = 1)
print(t_flatten)
print(t_flatten.shape)

# TODO 3.15. Flatten a_image_batch via .reshape() and print it and its shape
t_reshape = a_image_batch.reshape(1, -1)
print(t_reshape)
print(t_reshape.shape)

# TODO 3.15. Flatten a_image_batch via .view() and print it and its shape
t_view = a_image_batch.view(1, -1)
print(t_view)
print(t_view.shape)


tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2,
         2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]])
torch.Size([1, 48])
tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2,
         2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]])
torch.Size([1, 48])
tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2,
         2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]])
torch.Size([1, 48])
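
With a single image the batch axis is easy to hard-code as 1. As a sketch of the more general case, the snippet below builds a hypothetical batch of 2 images (filled with consecutive integers purely for illustration) and reads the batch size from the tensor itself, so the same lines work for any batch size.

```python
import torch

# Hypothetical batch of 2 three-channel 4x4 images, filled with 0..95.
batch = torch.arange(2 * 3 * 4 * 4).reshape(2, 3, 4, 4)

flat_a = batch.flatten(start_dim=1)
flat_b = batch.reshape(batch.shape[0], -1)   # batch size read from the tensor
flat_c = batch.view(batch.shape[0], -1)

print(flat_a.shape)
print(torch.equal(flat_a, flat_b), torch.equal(flat_a, flat_c))

# Each row holds one image: row 0 starts at 0, row 1 at 3 * 4 * 4 = 48.
print(flat_a[:, 0])
```
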


#### Concatenating PyTorch Tensors

Similarly to Pandas, in PyTorch we can concatenate tensors along a specific axis via the [`torch.cat()`](https://pytorch.org/docs/stable/generated/torch.cat.html#torch.cat) function. The function accepts a tuple of tensors (e.g. `(t1, t2)`) and the dimension `dim=` on which to concatenate them. Note that the tensors which we want to concatenate must have the same shape (except in the dimension we are concatenating on) or be empty.

Let's suppose we have the following two tensors:

In [None]:
t1 = torch.tensor([
    [1,2],
    [3,4]
])

t2 = torch.tensor([
    [5,6],
    [7,8]
])

print('t1 shape: ', t1.shape)
print('t2 shape: ', t2.shape)

t1 shape:  torch.Size([2, 2])
t2 shape:  torch.Size([2, 2])


We can concatenate row-wise (axis at index 0) in the following way:

In [None]:
t = torch.cat((t1, t2), dim=0)

print(t)
print(t.shape)

tensor([[1, 2],
        [3, 4],
        [5, 6],
        [7, 8]])
torch.Size([4, 2])


As can be observed the result joins the two tensors based on the first axis, i.e. row wise. Now repeat the process such that the concatenation occurs column-wise and print the result.

In [None]:
# TODO 3.16. Concatenate t1, t2 column wise
t = torch.cat((t1, t2), dim = 1)

# TODO 3.17. Print the result and its shape
print(t)
print(t.shape)

tensor([[1, 2, 5, 6],
        [3, 4, 7, 8]])
torch.Size([2, 4])
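
The shape requirement for concatenation can be checked with two tensors that differ only in their number of rows. This is a small sketch with made-up values; `cat_ok` is a helper flag introduced for the example.

```python
import torch

a = torch.tensor([[1, 2], [3, 4]])   # shape [2, 2]
b = torch.tensor([[5, 6]])           # shape [1, 2]

# Along dim=0 only the number of columns has to match, so this works.
rows = torch.cat((a, b), dim=0)
print(rows.shape)

# Along dim=1 the number of rows would have to match, which it does not.
try:
    torch.cat((a, b), dim=1)
    cat_ok = True
except RuntimeError as err:
    cat_ok = False
    print("cat failed:", err)
```
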


#### Stacking PyTorch Tensors

Aside from combining tensors via concatenation, we may wish to combine tensors on a new axis (e.g. to create a batch out of separate images). This is called stacking and is accomplished via the [`torch.stack()`](https://pytorch.org/docs/stable/generated/torch.stack.html#torch.stack) function. For example, suppose we have the following separate monochrome 4x4 images:

In [None]:
t1 = torch.ones(4,4)
t2 = 2*torch.ones(4,4)
t3 = 3*torch.ones(4,4)

print('t1 shape: ', t1.shape)
print('t2 shape: ', t2.shape)
print('t3 shape: ', t3.shape)

t1 shape:  torch.Size([4, 4])
t2 shape:  torch.Size([4, 4])
t3 shape:  torch.Size([4, 4])


To create an image batch of shape \[3, 1, 4, 4\] we can use `torch.stack()` in combination with `.unsqueeze()` on the second axis in the TODO's below.

In [None]:
# TODO 3.18. Create a [3, 1, 4, 4] image batch using t1, t2, t3
t = torch.stack([t1, t2, t3], dim = 0)
t = t.unsqueeze(dim = 1)

# TODO 3.19. Print the result and its shape
print(t)
print(t.shape)

tensor([[[[1., 1., 1., 1.],
          [1., 1., 1., 1.],
          [1., 1., 1., 1.],
          [1., 1., 1., 1.]]],


        [[[2., 2., 2., 2.],
          [2., 2., 2., 2.],
          [2., 2., 2., 2.],
          [2., 2., 2., 2.]]],


        [[[3., 3., 3., 3.],
          [3., 3., 3., 3.],
          [3., 3., 3., 3.],
          [3., 3., 3., 3.]]]])
torch.Size([3, 1, 4, 4])
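
One way to see how stacking relates to concatenation is to rebuild the stacked tensor with `torch.cat()`. The sketch below reuses the three images from above and also shows an alternative order of operations (unsqueeze first, then stack) that reaches the same \[3, 1, 4, 4\] shape.

```python
import torch

t1 = torch.ones(4, 4)
t2 = 2 * torch.ones(4, 4)
t3 = 3 * torch.ones(4, 4)

# stack creates a new leading axis: three [4, 4] images become [3, 4, 4].
stacked = torch.stack((t1, t2, t3), dim=0)

# The same result from cat: give each image a length 1 axis first.
via_cat = torch.cat((t1.unsqueeze(0), t2.unsqueeze(0), t3.unsqueeze(0)), dim=0)
print(torch.equal(stacked, via_cat))

# Adding the channel axis before stacking yields the [B, C, H, W] batch directly.
batch = torch.stack((t1.unsqueeze(0), t2.unsqueeze(0), t3.unsqueeze(0)), dim=0)
print(batch.shape)
```
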


### Tensor Element-wise Operations

With element-wise operations we do all kinds of arithmetic and comparison operations between tensors. An element-wise operation is an operation between two tensors, of the same shape, that operates on corresponding elements within the respective tensors. Here, two elements are said to be corresponding if they occupy the same position within tensors. The position is determined by the indexes used to locate each element. 

#### Arithmetic Operations

Tensors support all kinds of arithmetic operations like addition, subtraction, multiplication or division. In PyTorch these operations are supported through the `.add()`, `.sub()`, `.mul()` and `.div()` methods and their overloaded operators `+`, `-`, `*` and `/`. Suppose we have the following tensors:

In [None]:
t1 = torch.tensor([
    [1,1],
    [1,1]
], dtype=torch.float32)

t2 = torch.tensor([
    [1,2],
    [3,4]
], dtype=torch.float32)

To add t1 and t2 we can use either the method or the operator as in the example below.

In [None]:
# Adding via the operator
t = t1 + t2
print(t)

# Adding via the method
t = t1.add(t2)
print(t)

tensor([[2., 3.],
        [4., 5.]])
tensor([[2., 3.],
        [4., 5.]])


**Exercise 4**

Now, proceed to subtract, multiply and divide the two tensors via operators and corresponding methods in the TODO's below. As in the example above print out the result after each operation. 

In [None]:
# TODO 4.1. Subtract the two tensors
t = t1 - t2
print(t)
t = t1.sub(t2)
print(t)
# TODO 4.2. Multiply the two tensors
t = t1 * t2
print(t)
t = t1.mul(t2)
print(t)
# TODO 4.3. Divide the two tensors
t = t1 / t2
print(t)
t = t1.div(t2)
print(t)

tensor([[ 0., -1.],
        [-2., -3.]])
tensor([[ 0., -1.],
        [-2., -3.]])
tensor([[1., 2.],
        [3., 4.]])
tensor([[1., 2.],
        [3., 4.]])
tensor([[1.0000, 0.5000],
        [0.3333, 0.2500]])
tensor([[1.0000, 0.5000],
        [0.3333, 0.2500]])


We can also do the same operations between tensors and scalars. Complete the TODO's below to see how this works.


In [None]:
# TODO 4.4. Add 2 to the t1 tensor and print the result
add_2 = t1 + 2
print(add_2)

# TODO 4.5. Divide by 2 the t1 tensor and print the result
div_2 = t1 / 2
print(div_2)

tensor([[3., 3.],
        [3., 3.]])
tensor([[0.5000, 0.5000],
        [0.5000, 0.5000]])


As you may have noticed, there is something out of place with this type of operation. That is, in our example we combined a rank 2 tensor with a scalar (i.e. a rank 0 tensor). According to the definition at the beginning of the section this should not be possible.

#### Broadcasting

However, this is possible due to broadcasting. In our example the scalar 2 is first broadcast to a rank 2 tensor with all components equal to 2. Then, the arithmetic operation is carried out as usual. That is, the operations that conceptually happen are the following:

In [None]:
# The scalar is broadcast to a tensor with the same shape as t1
t_broadcast = torch.tensor(np.broadcast_to(2, t1.shape), dtype=torch.float32)
print(t_broadcast)

# The actual addition between tensors of same shape
t = t1 + t_broadcast
print(t)

tensor([[2., 2.],
        [2., 2.]])
tensor([[3., 3.],
        [3., 3.]])


  


Broadcasting also works for different rank tensors. Suppose we have the following two tensors:

In [None]:
t1 = torch.tensor([
    [1,2],
    [3,4]
], dtype=torch.float32)

t2 = torch.tensor([
    2, 4
], dtype=torch.float32)

print(t1.shape)
print(t2.shape)

torch.Size([2, 2])
torch.Size([2])


We can do the element-wise operation in a similar manner as before:

In [8]:
t = t1 + t2
print(t)

tensor([[3., 6.],
        [5., 8.]])
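
To make the broadcast of a lower-rank tensor explicit, the sketch below uses `.expand()` to show the tensor that `t2` is treated as during the addition, and then contrasts it with a column vector of shape \[2, 1\]. The column-vector case is an extra illustration and not part of the exercise.

```python
import torch

t1 = torch.tensor([[1, 2], [3, 4]], dtype=torch.float32)  # shape [2, 2]
t2 = torch.tensor([2, 4], dtype=torch.float32)            # shape [2]

# expand() shows the tensor that t2 is treated as: the row repeated twice.
t2_expanded = t2.expand(2, 2)
print(t2_expanded)

# The broadcast addition matches the explicit same-shape addition.
print(torch.equal(t1 + t2, t1 + t2_expanded))

# A column vector of shape [2, 1] is repeated across the columns instead.
col = t2.reshape(2, 1)
print(t1 + col)
```
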

### Comparison operations

Comparison operations are also element-wise. For a given comparison operation between two tensors, a new tensor of the same shape is returned, with each element containing either `False` or `True` based on the comparison. As with arithmetic operations we can use methods or their operator counterparts; the complete list can be found [here](https://pytorch.org/docs/stable/tensors.html#torch.Tensor). For example, given the tensor below we can check which values are zero as follows:

In [10]:
t = torch.tensor([
    [0,5,0],
    [6,0,7],
    [0,8,0]
], dtype=torch.float32)


result = t.eq(0) # equal to method
print(result)

result = t==0 # equal to operator
print(result)

tensor([[ True, False,  True],
        [False,  True, False],
        [ True, False,  True]])
tensor([[ True, False,  True],
        [False,  True, False],
        [ True, False,  True]])


Of course, all other comparison operations are supported, i.e. `.gt()`, `.lt()`, `.le()`, `.ge()` for greater than, less than, less than or equal and greater than or equal, respectively. Try out some of these operations in the TODO's below.

In [None]:
# TODO 4.6. Test t for values greater than 6 and print the result
result = t.gt(6)
print(result)

# TODO 4.7. Use the operator instead of t.le(6) and print the result
result = t <= 6
print(result)

tensor([[False, False, False],
        [False, False,  True],
        [False,  True, False]])
tensor([[ True,  True,  True],
        [ True,  True, False],
        [ True, False,  True]])
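
The boolean tensors produced by comparisons are often used as masks. As a brief sketch on the same tensor `t`, the snippet below selects the elements that satisfy a condition and counts them; boolean indexing and summing a mask are shown only as common follow-up uses.

```python
import torch

t = torch.tensor([[0, 5, 0], [6, 0, 7], [0, 8, 0]], dtype=torch.float32)

mask = t > 6
print(mask.dtype)        # comparison results are boolean tensors

print(t[mask])           # boolean indexing keeps only the matching elements
print(mask.sum())        # True counts as 1, so this counts the matches
print((t == 0).sum())    # number of zeros in t
```
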


### Function operations

Function operations are applied to each of the data entries in a tensor, e.g. `.neg()`, `.abs()`, `.sqrt()` and so on. 

**Exercise 5**

Compute some of these operations on the previous tensor `t` in the TODO's below.

In [11]:
# TODO 5.1. Compute the 2's complement of t and print the result
#result = t.neg()
result = t.to(torch.int32).bitwise_not().to(torch.float32) + 1
print(result)

# TODO 5.2. Compute the absolute value of the previous result and print the result
result = result.abs()
print(result)

# TODO 5.3. Compute the cosine of t and print the result
result = t.cos()
print(result)

tensor([[ 0., -5.,  0.],
        [-6.,  0., -7.],
        [ 0., -8.,  0.]])
tensor([[0., 5., 0.],
        [6., 0., 7.],
        [0., 8., 0.]])
tensor([[ 1.0000,  0.2837,  1.0000],
        [ 0.9602,  1.0000,  0.7539],
        [ 1.0000, -0.1455,  1.0000]])


### Tensor Reduction

So far in this course, we've learned that tensors are a generalization of the data representations that we use to implement Machine Learning models. Specifically in regard to ML models, tensors are the data structures which help us manipulate our data and are widely supported by a number of frameworks, including PyTorch. For this reason, tensors are super important, but ultimately, what we are doing with the operations we've been learning about in this course is managing the data elements contained within our tensors.

**Reshaping** operations gave us the ability to position our elements along particular axes, while **Element-wise** operations allow us to perform operations on elements between two tensors. **Reduction** operations, on the other hand, allow us to perform operations on elements within a single tensor.


**Definition:** *A reduction operation on a tensor is an operation that reduces the number of elements contained within the tensor.*

We'll focus mainly on the frequently used [`.argmax()`](https://pytorch.org/docs/stable/tensors.html?highlight=argmax#torch.Tensor.argmax) method to illustrate the reduction operation. Suppose we have the following 3x3 rank-2 tensor:

In [None]:
t = torch.tensor([
    [0,1,0],
    [2,0,2],
    [0,3,0]
], dtype=torch.float32)
print('t shape: ', t.shape)
print('# elements: ', t.numel())

t shape:  torch.Size([3, 3])
# elements:  9


**Exercise 6**

As a first reduction operation we can use the summation method `.sum()` on a tensor.

In [None]:
# TODO 6.1. Compute the sum of elements within a tensor
total = t.sum()  # named total to avoid shadowing Python's built-in sum()
# TODO 6.2. Print the sum result and its shape
print(total)
print(total.shape)
# TODO 6.3. Print result number of elements
print(total.numel())

tensor(8.)
torch.Size([])
1


As you can see, the result of this call is a scalar valued tensor with a single element whose value is 8. This means that we have reduced our tensor to a rank 0 tensor whose number of elements is 1.
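
When the plain Python number is needed rather than a rank 0 tensor, `.item()` extracts it. The following is a minimal sketch on the same 3x3 tensor; the variable name `total` is chosen here for illustration.

```python
import torch

t = torch.tensor([[0, 1, 0], [2, 0, 2], [0, 3, 0]], dtype=torch.float32)

total = t.sum()
print(total.dim())         # rank 0: no axes left
print(total.item())        # the plain Python float held by the tensor
print(type(total.item()))
```
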

#### Common tensor reduction operations

Common reduction operations include summing and computing the product, the mean and the standard deviation, i.e. `.sum()`, `.prod()`, `.mean()` and `.std()`. We have already seen `.sum()`. Try the other operations on the previous tensor and print the results in the TODO's below.

In [None]:
# TODO 6.4. Compute the product of the t tensor elements
print(t.prod())

# TODO 6.5. Compute the mean of the t tensor elements
print(t.mean())

# TODO 6.6. Compute the standard deviation of the t tensor elements
print(t.std())

tensor(0.)
tensor(0.8889)
tensor(1.1667)


All of these tensor methods reduce the tensor to a single element scalar valued tensor by operating on all the tensor's elements. That is, reduction operations in general allow us to compute aggregate (total) values across data structures. In our case, our structures are tensors.

**Question:** *Do reduction operations always reduce a tensor to a single element?*

**Answer:** Certainly **NO!** 

In fact, we often reduce specific axes at a time. This process is important. It's just like we saw with reshaping when we aimed to flatten tensors within a batch while still maintaining the batch axis.

#### Reducing tensors by axes

To reduce a tensor with respect to a specific axis, we use the same methods, and we just pass a value for the dimension `dim=` parameter indicating the axis on which the operation should occur. Suppose we're dealing with the tensor below.

In [None]:
t = torch.tensor([
    [1,1,1,1],
    [2,2,2,2],
    [3,3,3,3]
], dtype=torch.float32)

This is a 3 x 4 rank-2 tensor having different lengths for the two axes. Now let's consider the `.sum()` method again. Only, this time, we will specify a dimension to reduce. We have two axes so we'll do both to see what result they yield.

In [None]:
# Summing on the first dimension
t.sum(dim=0)

tensor([6., 6., 6., 6.])

In [None]:
# TODO 6.7. Sum on the second dimension
t.sum(dim=1)

tensor([ 4.,  8., 12.])

As you can see, summing on the first dimension produces a tensor of 4 elements that represents the element-wise addition of the rows, i.e. addition along the first axis. Similarly, summing on the second dimension adds up the elements inside each row, i.e. the addition proceeds along the second axis, which gives one value per row. We usually describe this type of operation as proceeding along some axis of the input tensor.
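The effect on the shape is easy to verify: the axis passed as `dim` disappears from the result. The short sketch below repeats the tensor from above and also uses the `keepdim=True` argument, which is not covered in this lab, to keep the reduced axis with length 1.

```python
import torch

t = torch.tensor([
    [1, 1, 1, 1],
    [2, 2, 2, 2],
    [3, 3, 3, 3]
], dtype=torch.float32)

col_sums = t.sum(dim=0)   # collapses the first axis (length 3)
row_sums = t.sum(dim=1)   # collapses the second axis (length 4)
print(col_sums.shape)     # torch.Size([4])
print(row_sums.shape)     # torch.Size([3])

# Summing on dim=0 is the same as adding the rows element-wise
manual = t[0] + t[1] + t[2]
print(torch.equal(col_sums, manual))   # True

# keepdim=True keeps the reduced axis with length 1
row_sums_kept = t.sum(dim=1, keepdim=True)
print(row_sums_kept.shape)             # torch.Size([3, 1])
```

Keeping the reduced axis can be convenient when the result has to be broadcast against the original tensor, for example `t / row_sums_kept`.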

#### Argmax tensor reduction operation example

`argmax()` is a mathematical function that tells us which argument, when supplied to a function as input, results in the function's max output value. Hence, in software terms, `argmax()` returns the index location of the maximum value inside a tensor.

When we call the `argmax()` method on a tensor, the tensor is reduced to a new tensor that contains an index value indicating where the max value is located inside the tensor. Consider the tensor below.

In [None]:
t = torch.tensor([
    [1,0,0,2],
    [0,3,3,0],
    [4,0,0,5]
], dtype=torch.float32)

What is its max value? Complete the TODO below.

In [None]:
# TODO 6.8. Compute the maximum value of t
t.max()

tensor(5.)

However, what is the resulting tensor when we use the `argmax()` method?

In [None]:
# TODO 6.9. Compute the argmax of tensor t
t.argmax()

tensor(11)

This does not return the index pair (2, 3) corresponding to the maximum value. Instead, we get a single index, which is 11. The reason is that, when no dimension is supplied, the result is an index into the flattened tensor. To see this, complete the TODOs below.

In [None]:
# TODO 6.10. Flatten the tensor t
t_flatten = t.flatten()

# TODO 6.11. Print the flattened tensor
print(t_flatten)

# TODO 6.12. Print the maximum of the flattened tensor
print(t_flatten.max())

# TODO 6.13. Compute the argmax() of the flattened tensor
argmax = t_flatten.argmax()

# TODO 6.14. Print the result
print(argmax)

tensor([1., 0., 0., 2., 0., 3., 3., 0., 4., 0., 0., 5.])
tensor(5.)
tensor(11)
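
If the index pair is what we actually need, the flat index can be converted back by hand. The sketch below is one possible way to do it, assuming the usual row-major layout of a rank-2 tensor; the names `flat_index`, `row` and `col` are only illustrative.

```python
import torch

t = torch.tensor([
    [1, 0, 0, 2],
    [0, 3, 3, 0],
    [4, 0, 0, 5]
], dtype=torch.float32)

flat_index = t.argmax().item()   # index into the flattened tensor: 11
n_cols = t.shape[1]              # length of the second axis: 4

# Row-major layout: flat_index = row * n_cols + col
row, col = divmod(flat_index, n_cols)
print(row, col)                  # 2 3
print(t[row, col])               # tensor(5.)
```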


Now, if we compute the maximum and the argmax while supplying a dimension, we get the results we were expecting in the first place.

In [None]:
# TODO 6.15. Print the maximum values in each row
print(t.max(dim = 1))

# TODO 6.16. Print the argmax values in each row
print(t.argmax(dim = 1))

# TODO 6.17. Print the maximum values in each column
print(t.max(dim = 0))

# TODO 6.18. Print the argmax values in each column
print(t.argmax(dim = 0))

torch.return_types.max(
values=tensor([2., 3., 5.]),
indices=tensor([3, 1, 3]))
tensor([3, 1, 3])
torch.return_types.max(
values=tensor([4., 3., 3., 5.]),
indices=tensor([2, 1, 1, 2]))
tensor([2, 1, 1, 2])


Notice how the call to the `max()` method returns two tensors. The first tensor contains the max values and the second tensor contains the index locations for the max values. The index location is what `argmax()` gives us.

For the first axis, the max values are 4, 3, 3, and 5. These values are determined by taking the element-wise maximum across the arrays that run along the first axis.

For each of these maximum values, the `argmax()` method tells us the index along the first axis where the value lives.

- The 4 lives at index two of the first axis.
- The first 3 lives at index one of the first axis.
- The second 3 lives at index one of the first axis.
- The 5 lives at index two of the first axis.

For the second axis, the max values are 2, 3, and 5. These values are determined by taking the maximum inside each array of the second axis. We have three groups of four, which gives us 3 maximum values.

The argmax values here tell us the index inside each respective array where the max value lives.
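The relationship between `max()` and `argmax()` described above can be checked directly. The sketch below unpacks the two tensors returned by `max(dim=...)` and then uses `gather()`, a method not introduced in this lab, to look up the values that the argmax indices point to.

```python
import torch

t = torch.tensor([
    [1, 0, 0, 2],
    [0, 3, 3, 0],
    [4, 0, 0, 5]
], dtype=torch.float32)

# max(dim=...) returns a (values, indices) pair that can be unpacked
col_values, col_indices = t.max(dim=0)
print(col_values)                                  # tensor([4., 3., 3., 5.])
print(torch.equal(col_indices, t.argmax(dim=0)))   # True

# The fields can also be accessed by name
row_result = t.max(dim=1)
print(row_result.values)                           # tensor([2., 3., 5.])

# Use the argmax indices to pick the max value out of each row
row_indices = t.argmax(dim=1)
picked = t.gather(1, row_indices.unsqueeze(1)).squeeze(1)
print(torch.equal(picked, row_result.values))      # True
```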

In practice, we often use the `argmax()` method on a network's output prediction tensor to determine which category has the highest prediction value.
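As a rough illustration of that use case, the sketch below applies `argmax()` to a made-up prediction tensor. The scores in `preds` and the values in `labels` are hypothetical and do not come from a real network.

```python
import torch

# Hypothetical prediction scores: 3 samples (rows) x 4 categories (columns)
preds = torch.tensor([
    [0.1, 0.2, 0.6, 0.1],
    [0.7, 0.1, 0.1, 0.1],
    [0.2, 0.2, 0.1, 0.5]
])

# Hypothetical ground-truth category for each sample
labels = torch.tensor([2, 0, 1])

# Index of the highest score inside each row
predicted = preds.argmax(dim=1)
print(predicted)                  # tensor([2, 0, 3])

# Count how many predictions match the labels
num_correct = predicted.eq(labels).sum().item()
print(num_correct)                # 2
```

The reduction runs over `dim=1` because, in this layout, each row holds the scores of one sample.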

### Accessing elements inside tensors

The last common type of operation that we need for tensors is the ability to access data from within the tensor. Suppose we have the following tensor:

In [None]:
import torch

t = torch.tensor([
    [1,2,3],
    [4,5,6],
    [7,8,9]
], dtype=torch.float32)

**Exercise 7**

Compute the mean of the entire tensor and print the result.

In [None]:
# TODO 7.1. Compute the mean of elements within the tensor
mean = t.mean()

As expected, we get a rank 0 tensor that contains the mean. To actually get the value as a number, we use the [`.item()`](https://pytorch.org/docs/stable/tensors.html?highlight=item#torch.Tensor.item) tensor method, which works for scalar-valued tensors.

In [None]:
# TODO 7.2. Call item() on the mean of the tensor
mean.item()

5.0

For multiple values, we have to use a conversion method such as `.tolist()` to convert the result to a Python list or `.numpy()` to convert the result to a NumPy array.

In [None]:
# TODO 7.3. Compute the mean along the first axis
result = t.mean(dim = 0)

# TODO 7.4. Print the result
print(result)

# TODO 7.5. Convert the result to a Python list
result_list = result.tolist()

# TODO 7.6. Print the result
print(result_list)

# TODO 7.7. Convert the result to a NumPy array
result_np = result.numpy()

# TODO 7.8. Print the result
print(result_np)

tensor([4., 5., 6.])
[4.0, 5.0, 6.0]
[4. 5. 6.]


**Remember:** When we compute the mean across the first axis, multiple values are returned, and we can access the numeric values by transforming the output tensor into a Python list or a NumPy array.
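One detail to keep in mind when choosing between the two conversions: for a CPU tensor, `.numpy()` returns an array that shares memory with the tensor, while `.tolist()` creates an independent copy. The sketch below illustrates the difference; the variable names are only illustrative.

```python
import torch

t = torch.tensor([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
], dtype=torch.float32)

result = t.mean(dim=0)         # tensor([4., 5., 6.])

as_list = result.tolist()      # independent Python list
as_np = result.numpy()         # NumPy array sharing memory with result

# Modify the tensor in place
result[0] = 100.0

print(as_list)                 # the list still holds the original values
print(as_np)                   # the first entry of the array is now 100

# Clone first when an independent NumPy array is needed
independent = result.clone().numpy()
result[1] = -1.0
print(independent)             # not affected by the second change
```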


With NumPy ndarray objects, we have a pretty robust set of operations for indexing and slicing, and PyTorch tensor objects support most of these operations as well. Use this as a resource for [advanced indexing and slicing](https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html).
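
A few of the NumPy-style indexing and slicing operations that also work on tensors are sketched below, using the same 3 x 3 tensor as above. This is only a small sample, not a complete overview.

```python
import torch

t = torch.tensor([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
], dtype=torch.float32)

first_row = t[0]             # tensor([1., 2., 3.])
second_col = t[:, 1]         # tensor([2., 5., 8.])
single = t[1, 2].item()      # 6.0 as a Python number
block = t[:2, 1:]            # first two rows, last two columns
large = t[t > 5]             # boolean mask: tensor([6., 7., 8., 9.])

print(first_row)
print(second_col)
print(single)
print(block)
print(large)
```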