# **Introduction to PyTorch**

The material in this notebook is based on the [PyTorch documentation](https://pytorch.org/docs/stable/index.html). In the `Tensors` chapter, ChatGPT helped with a few example code chunks and with summarizing the broadcasting rules :wink:.

As you already know, Python has libraries for nearly all tasks. For machine learning, the two most popular ones are [PyTorch](https://pytorch.org/) and [TensorFlow](https://www.tensorflow.org/). Even though TensorFlow may be a little easier to start with, PyTorch offers more flexibility, which is why we like to work with it.

We will introduce you to `Tensors` and `Modules`, which are the basis of all the helpful neural network building blocks in PyTorch. We will also show you how to prepare data for neural network training and how to build a simple network.

We hope you will have fun and find this material useful.

<img src="../figures/HeaDS_logo_large_withTitle.png" width="200">

(This notebook was created by Viktoria Schuster)

# Tensors

Tensors are THE data structure in machine learning. A tensor is very similar to a NumPy array, but it can also be used on GPUs. And we really need GPUs if we don't want to grow old waiting...

<img src="../figures/tensors_image.jpeg" width="400">

## Basics: initialization, shape, type and device

### Initialization

You can create tensors from numpy arrays and various other data types.

In [1]:
# create a dummy array
import numpy as np
example_array = np.array([1, 2, 3, 4, 5])
print("the array: ", example_array)

the array:  [1 2 3 4 5]


In [2]:
# convert the array to a tensor
import torch
example_tensor = torch.tensor(example_array)
print("the tensor: ", example_tensor)
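# there are other ways to convert to a tensor (e.g. torch.from_numpy, which shares memory with the array),
# but I like torch.tensor because it is more general: it accepts many input types and always copies the data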

the tensor:  tensor([1, 2, 3, 4, 5])
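A small sketch of why the choice of conversion function can matter: `torch.tensor` copies the data, whereas `torch.from_numpy` (another conversion function) shares memory with the NumPy array, so in-place changes to the array show up in that tensor.

```python
import numpy as np
import torch

example_array = np.array([1, 2, 3, 4, 5])

copied = torch.tensor(example_array)      # always copies the data
shared = torch.from_numpy(example_array)  # shares memory with the numpy array

example_array[0] = 100  # modify the original array in place

print(copied)  # first element is still 1: the copy is unaffected
print(shared)  # first element is now 100: the shared tensor sees the change
```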


In [26]:
# other initialization options
# list
torch.tensor([1,2,3,4,5])
# tuple
torch.tensor((1,2,3,4,5))
# ranges
torch.tensor(range(1,6))
# floats and integers
torch.tensor(0.5)
# dataframes
import pandas as pd
df = pd.DataFrame({'a': [1,2,3,4,5], 'b': [6,7,8,9,10]})
torch.tensor(df.values)

tensor([[ 1,  6],
        [ 2,  7],
        [ 3,  8],
        [ 4,  9],
        [ 5, 10]])

In [12]:
# another very helpful one: create a tensor of zeros (great placeholder)

# create a tensor of 2 zeros
torch.zeros(2)

tensor([0., 0.])

More creation options can be found [in the docs](https://pytorch.org/docs/stable/torch.html#creation-ops)
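A few other creation functions from that part of the docs that come up often, shown as a quick sketch (the variable names are just for illustration):

```python
import torch

ones = torch.ones(2, 3)             # 2x3 tensor filled with 1.
steps = torch.arange(0, 10, 2)      # like Python's range(): 0, 2, 4, 6, 8
sevens = torch.full((2, 2), 7)      # 2x2 tensor filled with 7
uniform = torch.rand(3)             # 3 random values drawn uniformly from [0, 1)
normal = torch.randn(2, 2)          # 2x2 random values from a standard normal distribution
like_ones = torch.zeros_like(ones)  # zeros with the same shape (and dtype) as `ones`

print(steps)
print(like_ones)
```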

### Shape, dtype and device

Three attributes of a tensor are especially important:
- shape
- data type
- device

You will often see these words in error messages, because PyTorch complains when shapes don't match or when data types or devices differ.

In [3]:
example_array = np.array([1, 2, 3, 4, 5])
print("the array: ", example_array)
print("with shape: ", example_array.shape)
print("and dtype: ", example_array.dtype)

the array:  [1 2 3 4 5]
with shape:  (5,)
and dtype:  int64


In [4]:
example_tensor = torch.tensor(example_array)

#### Shape

In [5]:
print("tensor shape: ", example_tensor.shape)

tensor shape:  torch.Size([5])


In [6]:
# tensors can have as many dimensions as you wish
example_tensor2 = torch.zeros(2, 3)
print(example_tensor2)
print("shape: ", example_tensor2.shape)

tensor([[0., 0., 0.],
        [0., 0., 0.]])
shape:  torch.Size([2, 3])


#### Data type

In [7]:
print("tensor dtype: ", example_tensor.dtype)

tensor dtype:  torch.int64


In [8]:
print("its size (in memory): ", example_tensor.element_size()) # returns the size of ONE element in bytes

# you can initialize a tensor with a specific dtype
example_tensor2 = torch.tensor(example_array, dtype=torch.int8)
print("new dtype: ", example_tensor2.dtype)
print("new size (in memory): ", example_tensor2.element_size())

its size (in memory):  8
new dtype:  torch.int8
new size (in memory):  1
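You can also convert an existing tensor to another dtype. This is a common fix for dtype errors, since neural network layers typically work with `torch.float32`. A short sketch:

```python
import torch

int_tensor = torch.tensor([1, 2, 3])     # integers default to torch.int64
float_tensor = torch.tensor([1.0, 2.5])  # floats default to torch.float32

converted = int_tensor.to(torch.float32)  # returns a new tensor with the requested dtype
also_converted = int_tensor.float()       # shorthand for .to(torch.float32)

print(int_tensor.dtype, float_tensor.dtype, converted.dtype)
```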


#### Device

The device is where the tensor is stored: in the memory of the CPU or of a GPU. The GPU makes things faster, so we love it. The most common options are:
- cpu
- cuda (for NVIDIA GPUs)
- mps (for GPUs on Apple Macs)

In [9]:
print("tensor device: ", example_tensor.device)

tensor device:  cpu


In [12]:
# you can check what you can use on your computer
print(torch.cuda.is_available())
print(torch.backends.mps.is_available())

False
False


In [10]:
# the device (where your tensor is stored) is CPU per default
# if you have a GPU, you can move your tensor there

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("device available: ", device)
example_tensor = example_tensor.to(device)
print("device after moving: ", example_tensor.device)

device available:  cpu
device after moving:  cpu
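The cell above only checks for CUDA. A sketch of a small helper that also considers MPS; `pick_device` is a hypothetical name, not a PyTorch function. Tensors that take part in the same operation must live on the same device, so new tensors can be created directly there with the `device=` argument.

```python
import torch

def pick_device() -> torch.device:
    """Hypothetical helper: return CUDA if available, else MPS, else CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device = pick_device()
example_tensor = torch.tensor([1.0, 2.0, 3.0, 4.0, 5.0]).to(device)
print(example_tensor.device)

# create the second tensor directly on the same device so the addition works
other = torch.ones(5, device=device)
print(example_tensor + other)
```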


## Tensor operations

### Indexing, slicing and concatenating

In [13]:
# create a tensor
tensor = torch.tensor([[1, 2, 3],
                       [4, 5, 6],
                       [7, 8, 9]])

In [14]:
# select the first row
result1 = tensor[0]

print("first row: ", result1)

first row:  tensor([1, 2, 3])


In [15]:
# select the first two columns
result2 = tensor[:, :2]

print("first 2 columns:\n", result2)

first 2 columns:
 tensor([[1, 2],
        [4, 5],
        [7, 8]])


In [16]:
# select elements with a Boolean mask
mask = tensor > 5
result3 = tensor[mask] # careful: the result is a flat 1D tensor, the original 2D structure is lost

print("locations where tensor > 5:\n", mask)
print("elements that are > 5 (or elements of the mask locations)", result3)

locations where tensor > 5:
 tensor([[False, False, False],
        [False, False,  True],
        [ True,  True,  True]])
elements that are > 5 (or elements of the mask locations) tensor([6, 7, 8, 9])


In [91]:
# create two tensors
tensor1 = torch.tensor([[1, 2],
                        [3, 4]])
tensor2 = torch.tensor([[5, 6],
                        [7, 8]])

# concatenate the two tensors along the second dimension
result1 = torch.cat((tensor1, tensor2), dim=1)

print(result1)

tensor([[1, 2, 5, 6],
        [3, 4, 7, 8]])


More options can be found [in the docs](https://pytorch.org/docs/stable/torch.html#indexing-slicing-joining-mutating-ops)
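A related function is `torch.stack`. As a sketch of the difference: `torch.cat` joins tensors along an existing dimension, while `torch.stack` creates a new dimension.

```python
import torch

tensor1 = torch.tensor([[1, 2],
                        [3, 4]])
tensor2 = torch.tensor([[5, 6],
                        [7, 8]])

rows = torch.cat((tensor1, tensor2), dim=0)       # join along existing dim 0 -> shape (4, 2)
stacked = torch.stack((tensor1, tensor2), dim=0)  # add a NEW dim 0 -> shape (2, 2, 2)

print(rows.shape)
print(stacked.shape)
```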

### Element-wise operations

In [17]:
# create two tensors
tensor1 = torch.tensor([[1, 1],
                        [2, 2]])
tensor2 = torch.tensor([[1, 2],
                        [3, 4]])

In [18]:
# add the two tensors element-wise
result = torch.add(tensor1, tensor2)

print(result)

tensor([[2, 3],
        [5, 6]])


These functions give the same results as the usual math operators (see below). In fact, `tensor1 + tensor2` calls the same addition under the hood, so there is no real performance difference. The function form can still be handy because it accepts extra arguments, such as `alpha` (a multiplier for the second tensor) or `out`.

In [47]:
# add the two tensors element-wise
result = tensor1 + tensor2

print(result)

tensor([[2, 3],
        [5, 6]])


More options can be found [in the docs](https://pytorch.org/docs/stable/torch.html#pointwise-ops)

Some useful examples are
- sub
- mul
- div
- pow
- exp
- log
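A quick sketch of the operations listed above, using the two tensors from the addition example (re-created so the cell runs on its own). Note that `torch.div` performs true division and returns floats even for integer inputs; `exp` and `log` are applied to float versions of the tensors here.

```python
import torch

tensor1 = torch.tensor([[1, 1],
                        [2, 2]])
tensor2 = torch.tensor([[1, 2],
                        [3, 4]])

difference = torch.sub(tensor1, tensor2)  # same as tensor1 - tensor2 -> [[0, -1], [-1, -2]]
product = torch.mul(tensor1, tensor2)     # same as tensor1 * tensor2 -> [[1, 2], [6, 8]]
quotient = torch.div(tensor1, tensor2)    # same as tensor1 / tensor2, result is float
squared = torch.pow(tensor2, 2)           # same as tensor2 ** 2 -> [[1, 4], [9, 16]]
exponential = torch.exp(tensor1.float())  # e^x for every element
logarithm = torch.log(tensor2.float())    # natural logarithm of every element

print(difference, product, quotient, squared, exponential, logarithm, sep="\n")
```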

### Reduction operations

In [19]:
# create a tensor
tensor = torch.tensor([[1, 2],
                       [3, 4]])
print(tensor)

tensor([[1, 2],
        [3, 4]])


In [20]:

# sum the tensor along the first dimension (dim=0): the rows are added up, giving one sum per column
result = torch.sum(tensor, dim=0)

print(result)

tensor([4, 6])


More options can be found [in the docs](https://pytorch.org/docs/stable/torch.html#reduction-ops)

Some useful examples are
- mean
- min
- std
- count_nonzero
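A sketch of these reductions. One pitfall: `torch.mean` and `torch.std` need a floating-point tensor (calling them on an integer tensor like the one above raises a `RuntimeError`), so the values are written as floats here.

```python
import torch

tensor = torch.tensor([[1., 2.],
                       [3., 4.]])  # floats, because mean and std need a floating-point dtype

col_means = torch.mean(tensor, dim=0)    # one mean per column: [2., 3.]
row_min = torch.min(tensor, dim=1)       # with dim, returns the values AND their indices
overall_std = torch.std(tensor)          # sample standard deviation of all 4 elements
n_nonzero = torch.count_nonzero(tensor)  # number of non-zero elements: 4

print(col_means)
print(row_min.values, row_min.indices)   # values [1., 3.] at indices [0, 0]
print(overall_std)
print(n_nonzero)
```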


### Matrix operations

In [21]:
# create two matrices as tensors
matrix1 = torch.tensor([[1, 1],
                        [2, 2]])
matrix2 = torch.tensor([[1, 2],
                        [3, 4]])

In [22]:
# multiply the two matrices
result = torch.matmul(matrix1, matrix2)
# result[i, j] = sum over k of m1[i, k] * m2[k, j], e.g. result[0, 0] = 1*1 + 1*3 = 4

print(result)

tensor([[ 4,  6],
        [ 8, 12]])


More options can be found [in the docs](https://pytorch.org/docs/stable/torch.html#blas-and-lapack-operations)
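The `@` operator is a shorthand for `torch.matmul`, and it is easy to confuse with `*`, which is element-wise. The sketch below contrasts the two and writes out the matrix product with explicit loops (slow, only for illustration).

```python
import torch

matrix1 = torch.tensor([[1, 1],
                        [2, 2]])
matrix2 = torch.tensor([[1, 2],
                        [3, 4]])

product = matrix1 @ matrix2      # matrix product, same as torch.matmul -> [[4, 6], [8, 12]]
elementwise = matrix1 * matrix2  # element-wise product, NOT a matrix product -> [[1, 2], [6, 8]]

# the matrix product written out: result[i, j] = sum over k of matrix1[i, k] * matrix2[k, j]
manual = torch.zeros(2, 2, dtype=matrix1.dtype)
for i in range(2):
    for j in range(2):
        for k in range(2):
            manual[i, j] += matrix1[i, k] * matrix2[k, j]

print(product)
print(elementwise)
print(manual)
```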

## More advanced stuff

Here are some behaviours of tensors that are, in my opinion, important and that I wish I had known earlier. They are not necessary to get started with PyTorch, so let's see how far we get in the course. Otherwise, feel free to have a look yourselves.

### Broadcasting

Broadcasting is a PyTorch feature that enables you to perform element-wise operations on tensors with different shapes. In my opinion, this is one of the most useful but hidden features of torch.
The rules about broadcasting are:
- If two tensors have different numbers of dimensions, the shape of the tensor with fewer dimensions is padded with dimensions of size 1 on the left until the numbers of dimensions match.
- The shapes are then compared dimension by dimension, starting from the last (rightmost) one. In each dimension, the sizes must either be equal or one of them must be 1. A dimension of size 1 is stretched to match the other size.

Here is a figure demonstrating the rules:

<img src="../figures/tensor_broadcasting.png" width="300">

So let's have a look at the code.

In [60]:
# tensor with shape (2, 3)
tensor1 = torch.tensor([[1, 2, 3],
                        [4, 5, 6]])
# tensor with shape (3,)
tensor2 = torch.tensor([1, 2, 3])
result = tensor1 + tensor2
print(result)

tensor([[2, 4, 6],
        [5, 7, 9]])


In [61]:
# tensor with shape (2, 3)
tensor1 = torch.tensor([[1, 2, 3],
                        [4, 5, 6]])
# tensor with shape (2,)
tensor2 = torch.tensor([1, 2])
result = tensor1 + tensor2
print(result)

RuntimeError: The size of tensor a (3) must match the size of tensor b (2) at non-singleton dimension 1

In [62]:
# tensor with shape (2, 3)
tensor1 = torch.tensor([[1, 2, 3],
                        [4, 5, 6]])
# tensor with shape (2, 1)
tensor2 = torch.tensor([1, 2]).unsqueeze(1)
result = tensor1 + tensor2
print(result)

tensor([[2, 3, 4],
        [6, 7, 8]])
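To check the results above by hand, here is a sketch of a helper that computes the broadcast shape of two shapes: pad the shorter shape with 1s on the left, then compare sizes dimension by dimension; they must be equal or one of them must be 1. `broadcast_shape` is a hypothetical name; PyTorch's own `torch.broadcast_shapes` is used for comparison.

```python
import torch

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: compute the broadcast shape of two shapes or raise a ValueError."""
    # pad the shorter shape with 1s on the left
    ndim = max(len(shape_a), len(shape_b))
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    for size_a, size_b in zip(a, b):
        if size_a == size_b or size_b == 1:
            result.append(size_a)
        elif size_a == 1:
            result.append(size_b)
        else:
            raise ValueError(f"cannot broadcast {shape_a} and {shape_b}")
    return tuple(result)

print(broadcast_shape((2, 3), (3,)))           # (2, 3): first example above
print(broadcast_shape((2, 3), (2, 1)))         # (2, 3): third example above
print(torch.broadcast_shapes((2, 3), (2, 1)))  # PyTorch's own computation
```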


### Tensor modification

As briefly shown, we can change the dimensions and shapes of tensors as we please.

You can add and remove dimensions:
- tensor.<span style="color:slateblue">squeeze</span>() removes a given dimension if its size is 1
- tensor.<span style="color:slateblue">unsqueeze</span>() adds a dimension in a given position

In [17]:
tensor1 = torch.tensor([[1, 2, 3],
                        [4, 5, 6]])

print("original shape:\n   ", tensor1.shape)
print("unsqueezing the first dimension:\n   ", tensor1.unsqueeze(0).shape)
print("unsqueezing the second dimension:\n   ", tensor1.unsqueeze(1).shape)
print("unsqueezing the last dimension:\n   ", tensor1.unsqueeze(-1).shape)

original shape:
    torch.Size([2, 3])
unsqueezing the first dimension:
    torch.Size([1, 2, 3])
unsqueezing the second dimension:
    torch.Size([2, 1, 3])
unsqueezing the last dimension:
    torch.Size([2, 3, 1])


In [90]:
# what can we squeeze?

# if the dimension to be squeezed has a size larger than 1, nothing happens
print("shape of squeezed tensor:\n   ", tensor1.squeeze(0).shape)

tensor2 = torch.tensor([[1, 2, 3]])
print("new tensor of shape:\n   ", tensor2.shape)
print("shape of the squeezed new tensor:\n   ", tensor2.squeeze(0).shape)

shape of squeezed tensor:
    torch.Size([2, 3])
new tensor of shape:
    torch.Size([1, 3])
shape of the squeezed new tensor:
    torch.Size([3])
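If you call `squeeze()` without a dimension, it removes all dimensions of size 1 at once. This is convenient, but it can also silently remove a batch dimension that happens to have size 1, so passing the dimension explicitly is often safer. A short sketch:

```python
import torch

tensor4 = torch.zeros(1, 3, 1)

print(tensor4.squeeze().shape)    # all size-1 dimensions removed -> torch.Size([3])
print(tensor4.squeeze(-1).shape)  # only the last dimension removed -> torch.Size([1, 3])
```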


You can also change the shapes completely
- tensor.<span style="color:slateblue">view</span>()
- tensor.<span style="color:slateblue">expand</span>()

You can think of viewing and reshaping as taking the tensor's values in their natural order. Our tensor

| | c1 | c2 | c3 |
| --- | :---: | :---: | :---: |
| r1 | 1 | 2 | 3 |
| r2 | 4 | 5 | 6 |

is read row by row: first all elements of row r1 from left to right, then those of row r2. You can see this order by flattening the tensor.

In [18]:
tensor1 = torch.tensor([[1, 2, 3],
                        [4, 5, 6]])
print(tensor1.flatten())

tensor([1, 2, 3, 4, 5, 6])


When using view, the elements are filled into the newly defined shape in this order.

In [19]:
print(tensor1.view(3,2))

tensor([[1, 2],
        [3, 4],
        [5, 6]])


I can also let torch do some of the thinking for me. For a 2D tensor, I only have to specify the new size of one dimension and pass -1 for the other one, which is then inferred `automatically`.

In [20]:
print(tensor1.view(-1,2))

tensor([[1, 2],
        [3, 4],
        [5, 6]])
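A sketch of the limits of `-1`: it also works with more dimensions, but only one dimension can be inferred, and the requested sizes must be compatible with the number of elements (6 here). Invalid shapes raise a `RuntimeError`.

```python
import torch

tensor1 = torch.tensor([[1, 2, 3],
                        [4, 5, 6]])

print(tensor1.view(-1).shape)        # flatten: torch.Size([6])
print(tensor1.view(1, -1).shape)     # -1 is inferred as 6: torch.Size([1, 6])
print(tensor1.view(3, 1, -1).shape)  # -1 is inferred as 2: torch.Size([3, 1, 2])

failures = []
for bad_shape in [(-1, -1), (4, -1)]:  # two inferred dims / 6 is not divisible by 4
    try:
        tensor1.view(*bad_shape)
    except RuntimeError as err:
        failures.append(bad_shape)
        print(bad_shape, "->", err)
```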


So what are these reshaped tensors?

In [79]:
print("memory address of tensor1:\n",hex(tensor1.untyped_storage().data_ptr()))

tensor2 = tensor1.view(-1)

print("memory address of tensor2:\n",hex(tensor2.untyped_storage().data_ptr()))

memory address of tensor1:
 0x7f7b8b56cf40
memory address of tensor2:
 0x7f7b8b56cf40


See that tensor1 and tensor2 have the same memory address?
`view` does not copy any data: it returns a new tensor object that shares the same underlying memory as tensor1. In PyTorch, tensor2 is called a `view` of tensor1; you can think of it as a pointer to the same data, so changing one of them changes the other one too. If you want a new and independent tensor, make a copy using tensor.<span style="color:slateblue">clone</span>().

In [80]:
tensor2 = tensor1.clone().view(-1,2)
print("memory address of tensor2:\n",hex(tensor2.untyped_storage().data_ptr()))

memory address of tensor2:
 0x7f7b8b572a40
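A small sketch of what sharing memory means in practice: an in-place change to the original tensor is visible through the view, but not through the clone.

```python
import torch

tensor1 = torch.tensor([[1, 2, 3],
                        [4, 5, 6]])

view_of_1 = tensor1.view(-1)          # shares memory with tensor1
copy_of_1 = tensor1.clone().view(-1)  # independent copy

tensor1[0, 0] = 100  # modify the original in place

print(view_of_1[0])  # 100: the view sees the change
print(copy_of_1[0])  # 1: the clone does not
```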


If you need to make the tensor bigger in a dimension of size 1, you can use tensor.<span style="color:slateblue">expand</span>(). Like view, it does not copy any data: the repeated entries all refer to the same memory.

In [26]:
tensor3 = torch.tensor([[1, 2, 3]])
print("original tensor size:\n", tensor3.shape)

# expanding the tensor in dimension 0 (first dimension), keeping the other dimension the same
tensor3.expand(2, -1)

original tensor size:
 torch.Size([1, 3])


tensor([[1, 2, 3],
        [1, 2, 3]])
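To see that `expand` does not copy data, you can compare it with `repeat`, which creates a real copy. In the sketch below, a stride of 0 means that moving along that dimension does not move in memory at all.

```python
import torch

tensor3 = torch.tensor([[1, 2, 3]])

expanded = tensor3.expand(2, -1)  # a view: no new memory is allocated
repeated = tensor3.repeat(2, 1)   # a real copy: repeat dim 0 twice, dim 1 once

print(expanded.stride())  # (0, 1): both rows point to the same memory
print(repeated.stride())  # (3, 1): two separate rows in memory
print(torch.equal(expanded, repeated))  # True: same values
```

Because several entries of an expanded tensor share memory, it is usually best to treat it as read-only and call `.clone()` (or use `repeat`) if you need to modify the result.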

# Parameters and Modules

## Parameter

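Machine learning involves the optimization of trainable parameters. A `Parameter` is a subclass of `Tensor` with two special properties: it requires gradients by default (`requires_grad=True`), so it can receive gradients during training, and when it is assigned as an attribute of a `Module` (see below), it is automatically registered as one of that module's parameters.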

In [23]:
# initializing an empty parameter tensor
param = torch.nn.Parameter()
print(param)

Parameter containing:
tensor([], requires_grad=True)
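A sketch of both properties. The toy "loss" is the sum of squares, so its gradient with respect to each element is 2 times that element. `Holder` is a hypothetical minimal module used only to show which attributes get registered.

```python
import torch
import torch.nn as nn

param = nn.Parameter(torch.tensor([2.0, 3.0]))
print(param.requires_grad)  # True by default

loss = (param ** 2).sum()   # toy loss: 2^2 + 3^2 = 13
loss.backward()             # computes d(loss)/d(param) = 2 * param
print(param.grad)           # gradient: [4., 6.]

class Holder(nn.Module):    # hypothetical module for illustration
    def __init__(self):
        super().__init__()
        self.p = nn.Parameter(torch.zeros(3))  # registered automatically
        self.t = torch.zeros(3)                # plain tensor: NOT registered

names = [name for name, _ in Holder().named_parameters()]
print(names)  # ['p']
```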


## Module

The `Module` class is a container and the base class for neural networks in PyTorch. You can build any model you want on top of it, keeping 3 things in mind:
- it needs an `__init__()` constructor
- the constructor must initialize the parent class by calling `super().__init__()` (basic class stuff)
- it needs a `forward()` method that defines the computation

In [24]:
# let's create a dummy child of Module

import torch.nn as nn

class SimpleLinear(nn.Module):
    def __init__(self):
        super().__init__()
        self.weights = torch.nn.Parameter(torch.randn(2, 2))
        self.bias = torch.nn.Parameter(torch.randn(2))

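    def forward(self, x):
        # note: * is an element-wise product, so broadcasting turns the (2, 2) weights
        # and the (2,) input into a (2, 2) output; a "real" linear layer uses a matrix product (@)
        return self.weights * x + self.bias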

linear = SimpleLinear()

input_vals = torch.randn(2)

output_vals = linear(input_vals)

print(output_vals)

tensor([[ 1.3468, -0.7039],
        [ 1.0307, -0.7146]], grad_fn=<AddBackward0>)
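Because `SimpleLinear` multiplies element-wise, broadcasting produces a (2, 2) output instead of one value per output feature. Here is a sketch of a variant that uses a matrix product, following the same formula $y = xA^T + b$ as PyTorch's `nn.Linear`. `MatmulLinear` is a hypothetical name for illustration.

```python
import torch
import torch.nn as nn

class MatmulLinear(nn.Module):
    """Hypothetical variant of SimpleLinear that uses a matrix product."""
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weights = nn.Parameter(torch.randn(out_features, in_features))
        self.bias = nn.Parameter(torch.randn(out_features))

    def forward(self, x):
        # x: (in_features,) or (batch, in_features) -> output: (..., out_features)
        return x @ self.weights.T + self.bias

linear = MatmulLinear(2, 2)
output_vals = linear(torch.randn(2))
print(output_vals.shape)  # torch.Size([2]): one value per output feature
print(sum(p.numel() for p in linear.parameters()))  # 2*2 weights + 2 biases = 6
```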


# Datasets and Dataloaders

PyTorch helps us a lot with the data during training. `Dataloaders` split the data into (optionally shuffled) mini-batches in every epoch of training for us, but they need to know how to access the data and what exactly the data is. This is where the `Dataset` class comes in.

## Dataset

Let's find a super simple dataset to work with. Many of you may be familiar with the iris dataset from R. It is also available in Python, for example through scikit-learn.

In [4]:
# import some super basic data
from sklearn.datasets import load_iris

# load the iris dataset
iris = load_iris(as_frame=True)
print(iris.keys())

dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename', 'data_module'])


With the `Dataset` class, we can prepare the iris dataset for PyTorch. We need to inherit from the PyTorch `Dataset` class and implement the `__init__()`, `__len__()` and `__getitem__()` methods.

Out of the box, the dataloader can only combine certain types of data into batches, among them tensors and numpy arrays. So we should know what format our data is in and convert it if necessary.

In [5]:
print(type(iris.data))
print(type(iris.target))

<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.series.Series'>


The iris features are stored in a pandas DataFrame and the targets in a pandas Series, so we will convert both into tensors in our `__init__()` method.

In [6]:
# create a Dataset class for the iris data

# import torch and the Dataset class
import torch
from torch.utils.data import Dataset

class IrisDataset(Dataset):
    """This is a child of Dataset providing the iris data to the dataloader.
    The __init__, __len__ and __getitem__ methods are needed."""
    def __init__(self, data, targets):
        super().__init__()
        # torch.Tensor(...) converts the pandas values into float32 tensors
        self.data = torch.Tensor(data.values)
        self.targets = torch.Tensor(targets.values)
    
    def __len__(self):
        return self.data.shape[0]
    
    def __getitem__(self, idx):
        # for a given index, return the data and target
        return self.data[idx,:], self.targets[idx]

iris_dataset = IrisDataset(iris.data, iris.target)

This was already the most complicated part in dealing with data.

## Dataloader

Now we are ready to use a dataloader to provide us with random batches of our data during training. There is a lot in the docs that makes it look complicated, but don't worry. Getting started with dataloaders is really simple. For a while, you will not need more than this:

In [7]:
# now we can create a dataloader
from torch.utils.data import DataLoader
iris_dataloader = DataLoader(iris_dataset, batch_size=8, shuffle=True)

# now we can iterate over the dataloader
for data, target in iris_dataloader:
    print(data)
    print(target)
    break

tensor([[6.4000, 2.8000, 5.6000, 2.1000],
        [6.7000, 3.3000, 5.7000, 2.1000],
        [5.1000, 3.8000, 1.6000, 0.2000],
        [4.7000, 3.2000, 1.3000, 0.2000],
        [7.9000, 3.8000, 6.4000, 2.0000],
        [5.7000, 4.4000, 1.5000, 0.4000],
        [7.2000, 3.6000, 6.1000, 2.5000],
        [6.0000, 3.0000, 4.8000, 1.8000]])
tensor([2., 2., 0., 0., 2., 0., 2., 2.])
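A sketch of what happens over several epochs. To keep the cell self-contained, it uses hypothetical random stand-in data with the same size as iris (150 samples, 4 features) wrapped in PyTorch's `TensorDataset`. With `batch_size=8`, one epoch has 19 batches, the last one holding only 150 - 18 * 8 = 6 samples; `drop_last=True` skips that incomplete batch.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# hypothetical stand-in for the iris data: 150 samples, 4 features, 3 classes
features = torch.randn(150, 4)
labels = torch.randint(0, 3, (150,))
dataset = TensorDataset(features, labels)

loader = DataLoader(dataset, batch_size=8, shuffle=True)
print(len(loader))  # batches per epoch: 19

for epoch in range(2):  # the order is reshuffled in every epoch
    batch_sizes = [x.shape[0] for x, y in loader]
    print(epoch, batch_sizes[-1])  # last batch has 6 samples

loader_drop = DataLoader(dataset, batch_size=8, shuffle=True, drop_last=True)
print(len(loader_drop))  # 18: the incomplete last batch is dropped
```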


# Neural Networks in PyTorch

Let's put our knowledge to use and see how one builds a neural network in PyTorch for a given task. If we stick to the iris dataset from before, we know that we have 4 features for every datapoint (flower) and want to predict the species.

This already gives us a lot of information about what we should do. We want to build a classifier that takes the 4 features as input and spits out the probabilities of each species.

We can start again with a class inheriting from `nn.Module`, as we did before. Now we will use more of the modules and functions that PyTorch provides for doing actual machine learning. There are lots of nice building blocks that we can use; [have a look](https://pytorch.org/docs/stable/nn.html).

We often call these building blocks `Layers`. They are also PyTorch `Modules`, with specific parameters and operations that are executed for you. We will start very simple, with linear layers and the rectified linear unit (ReLU) activation. The `Linear` layer applies a simple linear transformation to data $x$:

$y = xA^T + b$, with weight matrix $A$ and bias $b$ being trainable parameters of class `Parameter`.

In [1]:
# let's build a little classifier for iris

import torch.nn as nn
import torch.nn.functional as F

# define the network
class Classifier(nn.Module):
    def __init__(self, in_features:int, hidden_features:int, out_features:int):
        super().__init__()
        self.fc1 = nn.Linear( # define the first fully connected layer
            in_features,
            hidden_features
        )
        self.fc2 = nn.Linear( # define the second fully connected layer
            hidden_features,
            out_features
        )
        
    def forward(self, x):
        z = self.fc1(x) # apply the first fully connected layer
        z = F.relu(z) # apply the relu activation function (non-linearity)
        z = self.fc2(z) # apply the second fully connected layer
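        # note: nn.CrossEntropyLoss expects raw scores (logits), so for training with it you would usually return z instead
        return F.softmax(z, dim=1) # apply the softmax activation function to return probabilities for each class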

In [2]:
iris_classifier = Classifier(in_features=4, hidden_features=64, out_features=3) # create an instance of the network
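A short sketch of what the final softmax does with a batch of outputs. The random `logits` are a hypothetical stand-in for the output of the last linear layer (5 samples, 3 classes). `dim=1` is used because dimension 0 is the batch dimension, so every row becomes a probability distribution over the classes.

```python
import torch
import torch.nn.functional as F

logits = torch.randn(5, 3)        # hypothetical raw outputs: 5 samples, 3 classes
probs = F.softmax(logits, dim=1)  # normalize over the class dimension

print(probs.sum(dim=1))     # each row sums to 1 (up to floating-point error)
print(probs.argmax(dim=1))  # predicted class per sample
```

Since softmax preserves the order of the values in each row, the predicted class is the same whether you take the argmax of the logits or of the probabilities.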

Now that we have an example, we can get a better intuition about how the Linear layer works.

In [7]:
### printing some things that happen in the network forward pass and architecture
dummy_input = torch.randn(1, 4)
print("an input sample would have the shape: ", dummy_input.shape)
print("the first layer weights have the shape: ", iris_classifier.fc1.weight.shape)
print("the first layer bias has the shape: ", iris_classifier.fc1.bias.shape)
print("the first layer output is of shape: ", iris_classifier.fc1(dummy_input).shape)

an input sample would have the shape:  torch.Size([1, 4])
the first layer weights have the shape:  torch.Size([64, 4])
the first layer bias has the shape:  torch.Size([64])
the first layer output is of shape:  torch.Size([1, 64])


In principle, the linear layer computes the following (but more efficiently, in compiled code):

In [8]:
pseudo_linear_output = torch.matmul(dummy_input, iris_classifier.fc1.weight.t()) + iris_classifier.fc1.bias
print(pseudo_linear_output.shape)

torch.Size([1, 64])
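Matching shapes is a good first check, but you can also verify that the values agree. The sketch below uses a fresh `nn.Linear(4, 64)` with the same sizes as `fc1`, so it runs on its own.

```python
import torch
import torch.nn as nn

fc = nn.Linear(4, 64)  # same sizes as fc1 in the classifier
x = torch.randn(1, 4)

manual = torch.matmul(x, fc.weight.t()) + fc.bias  # y = x A^T + b
builtin = fc(x)

print(torch.allclose(manual, builtin, atol=1e-6))  # True: same result up to rounding
```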


The ReLU activation is one of the simplest and most popular activation functions and looks like this:

<img src="../figures/ReLU_pytorch.png" width="300">

(image from [PyTorch docs](https://pytorch.org/docs/stable/generated/torch.nn.ReLU.html))
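In formula form, ReLU is $\mathrm{ReLU}(x) = \max(0, x)$: negative inputs become 0 and positive inputs pass through unchanged. A sketch comparing the functional form used in the classifier, the module form and an equivalent `torch.clamp` call:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.tensor([-2.0, -0.5, 0.0, 1.5, 3.0])

functional = F.relu(x)           # as used in the classifier's forward()
module = nn.ReLU()(x)            # module version, same result
clamped = torch.clamp(x, min=0)  # max(0, x) written with clamp

print(functional)  # negative values become 0, positive values stay
```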