# Setup

Our recommendation is to use [Miniconda](https://docs.anaconda.com/miniconda/) to manage your `Python` environment.
If you have not installed Miniconda or Anaconda yet, follow the installation instructions [here](https://docs.anaconda.com/miniconda/#quick-command-line-install).

Once Miniconda is installed, create a new environment named `deeplearning` by running:
```bash
conda create -n "deeplearning" python=3.11.5
```

After creating the environment, use the following command to activate it:
```bash
conda activate deeplearning
```

If you open this Jupyter notebook in VS Code, select the Python interpreter of the `deeplearning` conda environment in the upper right corner.
VS Code will then prompt you to install the `ipykernel` package; confirm the installation.

In [1]:
!python --version

Python 3.11.5


We will now install [PyTorch](https://pytorch.org/) 2.4.1.
+ If you have an NVIDIA GPU, this will automatically install the CUDA 12.1 build.
+ If you need a different CUDA version, check the [official website](https://pytorch.org/) for the correct installation command.
+ A CPU-only machine is also fine.


In [None]:
%pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1

In [3]:
%pip show torch

Name: torch
Version: 2.4.1
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: /home/twede/miniforge3/envs/deeplearning/lib/python3.11/site-packages
Requires: filelock, fsspec, jinja2, networkx, nvidia-cublas-cu12, nvidia-cuda-cupti-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-runtime-cu12, nvidia-cudnn-cu12, nvidia-cufft-cu12, nvidia-curand-cu12, nvidia-cusolver-cu12, nvidia-cusparse-cu12, nvidia-nccl-cu12, nvidia-nvtx-cu12, sympy, triton, typing-extensions
Required-by: torchaudio, torchvision
Note: you may need to restart the kernel to use updated packages.
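
After restarting the kernel, a quick sanity check can confirm that the installation is usable. The following is a minimal sketch; the printed values depend on your machine, so no specific output is assumed here.

```python
import torch

# Version of the installed package (expected to start with "2.4.1" for the command above)
print(torch.__version__)

# True only if PyTorch was built with CUDA support AND a usable GPU is visible
print(torch.cuda.is_available())
```

If `torch.cuda.is_available()` prints `False`, the rest of the notebook still runs on the CPU.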


## IDEs

Recommended:

+ [Visual Studio Code](https://code.visualstudio.com/) (with the `Jupyter` and `Python` extensions)
+ [JetBrains PyCharm](https://www.jetbrains.com/pycharm/) (The free Community Edition is sufficient for this course, but you can also get the Professional edition for free with your [university email](https://www.jetbrains.com/shop/eform/students).)

# PyTorch Introduction

This tutorial is based on the excellent tutorial from the [PyTorch website](https://pytorch.org/tutorials/beginner/basics/intro.html), written by Suraj Subramanian, Seth Juarez, Cassie Breviu, Dmitry Soshnikov and Ari Bornstein.

## Tensors

+ Tensors are essentially **multidimensional arrays**.
+ Used in `PyTorch` to encode model inputs, outputs, and parameters.
+ Similar to NumPy’s `ndarray`s, but can run on **GPUs and other hardware accelerators**.
+ Tensors on the CPU and NumPy arrays can share the same memory, avoiding data copying.
+ Optimized for **automatic differentiation**.

In [2]:
import torch
import numpy as np

Creating a tensor from a nested Python list:

In [2]:
x = [[1, 2], [3, 4], [5, 6]]

x_torch = torch.tensor(x)
x_torch

tensor([[1, 2],
        [3, 4],
        [5, 6]])

... which is akin to how we create numpy arrays:

In [3]:
x_np = np.array(x)
x_np

array([[1, 2],
       [3, 4],
       [5, 6]])

We can create a torch tensor directly from a numpy array, which shares the same memory:

In [4]:
x_torch_from_np = torch.from_numpy(x_np)
x_torch_from_np

tensor([[1, 2],
        [3, 4],
        [5, 6]])
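
Memory sharing has a practical consequence: an in-place change to the NumPy array is visible in the tensor created with `torch.from_numpy`, whereas `torch.tensor` copies the data. The sketch below illustrates this; the variable names `a`, `shared`, and `copied` are chosen only for this example.

```python
import numpy as np
import torch

a = np.array([1, 2, 3])
shared = torch.from_numpy(a)  # shares memory with `a`
copied = torch.tensor(a)      # makes an independent copy of the data

a[0] = 100                    # modify the NumPy array in place

print(shared)  # first element is now 100
print(copied)  # first element is still 1
```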

We can also convert a tensor to a numpy array using the `.numpy()` method.

In [5]:
x_torch.numpy()

array([[1, 2],
       [3, 4],
       [5, 6]])
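
Calling `.numpy()` directly only works for CPU tensors that do not require gradients; otherwise PyTorch raises an error. A common pattern, sketched here with an illustrative tensor `t`, is to detach the tensor from the computation graph and move it to the CPU first.

```python
import torch

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# A tensor that tracks gradients and may live on the GPU
t = torch.ones(3, device=DEVICE, requires_grad=True)

# detach(): drop gradient tracking, cpu(): move to host memory, numpy(): convert
arr = t.detach().cpu().numpy()
print(type(arr), arr)
```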

### Tensor attributes: `dtype`, `shape`, and `device`

Tensors have a `dtype` attribute that specifies the data type of the elements in the tensor. 

Since the original list contains only integers, PyTorch infers the type `torch.int64`.
The default type for floating-point values is `torch.float32`.

In [6]:
x_torch.dtype

torch.int64

If we create a tensor from a list of *floats*, the tensor will be of type `torch.float32`:

In [7]:
x_torch = torch.tensor([[1., 2.], [3., 4.], [5., 6.]])
x_torch.dtype

torch.float32

We can explicitly set the dtype when constructing a tensor:

In [8]:
x_torch = torch.tensor([[1, 2], [3, 4], [5, 6]], dtype=torch.float64)
x_torch.dtype

torch.float64
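
The dtype of an existing tensor can also be changed afterwards. The following sketch shows an explicit cast with `.to(...)` and the automatic type promotion that happens when integer and floating-point tensors are combined; the variable names are illustrative.

```python
import torch

ints = torch.tensor([1, 2, 3])                # inferred as torch.int64
floats = ints.to(torch.float32)               # explicit cast, returns a new tensor
mixed = ints + torch.tensor([0.5, 0.5, 0.5])  # int64 + float32 is promoted to float32

print(ints.dtype, floats.dtype, mixed.dtype)
```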

Like in `numpy`, each torch tensor has a `shape` attribute that describes the size of each dimension of the tensor.

In [9]:
x_torch = torch.tensor([[1., 2.], [3., 4.], [5., 6.]])
print(x_torch)
print(f'\nShape: {x_torch.shape}')

tensor([[1., 2.],
        [3., 4.],
        [5., 6.]])

Shape: torch.Size([3, 2])


i.e., `x_torch` $\in \mathbb{R}^{3 \times 2}$

By default, `torch` tensors are created on the CPU. 
We can check which device the tensor is on using the `.device` attribute.

In [10]:
x_torch.device

device(type='cpu')

If a GPU is available, we can move the tensor to it using the `to` method:

In [11]:
DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'

x_torch = x_torch.to(DEVICE)
x_torch.device

device(type='cuda', index=0)
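
Operands of an operation generally need to live on the same device. Instead of creating a tensor on the CPU and moving it, a tensor can be created on the target device directly via the `device` argument. A minimal sketch that works with or without a GPU:

```python
import torch

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

a = torch.zeros(3, 2, device=DEVICE)  # created directly on the target device
b = torch.ones(3, 2).to(DEVICE)       # created on the CPU, then moved

c = a + b                             # both operands are on the same device
print(c.device)
```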

### Creating Tensors with default values

There are other convenient ways to create tensors:

In [72]:
ones = torch.ones(3, 2)
ones

tensor([[1., 1.],
        [1., 1.],
        [1., 1.]])

In [59]:
zeros = torch.zeros(3, 2)
zeros

tensor([[0., 0.],
        [0., 0.],
        [0., 0.]])

In [64]:
normal_samples = torch.randn(3, 2)
normal_samples

tensor([[ 1.3683, -0.1100],
        [-0.3104,  0.6630],
        [ 0.1123, -1.2856]])

In [79]:
arange = torch.arange(6.)
arange

tensor([0., 1., 2., 3., 4., 5.])

If we already have a tensor and want to create a new tensor with the same `shape`, `dtype`, and `device`, we can use the corresponding `*_like` functions:

In [12]:
ones = torch.ones_like(x_torch)
ones

tensor([[1., 1.],
        [1., 1.],
        [1., 1.]], device='cuda:0')

In [13]:
zeros = torch.zeros_like(x_torch)
normal_samples = torch.randn_like(x_torch)

### Reshaping


In [15]:
arange = torch.arange(8.)
print(arange)
print(f'\nShape: {arange.shape}')

tensor([0., 1., 2., 3., 4., 5., 6., 7.])

Shape: torch.Size([8])


In [16]:
arange = arange.reshape(4, 2)
arange

tensor([[0., 1.],
        [2., 3.],
        [4., 5.],
        [6., 7.]])

In [17]:
arange = arange.reshape(2, 2, 2)
arange

tensor([[[0., 1.],
         [2., 3.]],

        [[4., 5.],
         [6., 7.]]])
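
When reshaping, one dimension can be given as `-1`, in which case PyTorch infers its size from the number of elements. The sketch below shows this together with `flatten`, which collapses a tensor back into one dimension.

```python
import torch

t = torch.arange(8.)

a = t.reshape(-1, 4)     # -1 is inferred as 2, giving shape (2, 4)
b = t.reshape(2, -1, 2)  # -1 is inferred as 2, giving shape (2, 2, 2)
flat = b.flatten()       # back to a one-dimensional tensor of 8 elements

print(a.shape, b.shape, flat.shape)
```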

### Indexing

In [18]:
tensor = torch.arange(16.).reshape(4, 4)
tensor

tensor([[ 0.,  1.,  2.,  3.],
        [ 4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11.],
        [12., 13., 14., 15.]])

Indexing works the same way as in `numpy`:

In [132]:
tensor[2, 1]

tensor(9.)

In [20]:
print(f"First row: {tensor[0, :]}")
print(f"First column: {tensor[:, 0]}")
print(f"Second column: {tensor[:, 1]}")
print(f"Last column: {tensor[:, -1]}")

First row: tensor([0., 1., 2., 3.])
First column: tensor([ 0.,  4.,  8., 12.])
Second column: tensor([ 1.,  5.,  9., 13.])
Last column: tensor([ 3.,  7., 11., 15.])
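
Slices and boolean masks also carry over from `numpy`. The following sketch extracts a 2×2 sub-block and then selects all entries greater than 12; the names `block`, `mask`, and `selected` are illustrative.

```python
import torch

tensor = torch.arange(16.).reshape(4, 4)

# Rows 1-2 and columns 1-2 (the end index is exclusive)
block = tensor[1:3, 1:3]
print(block)      # [[5., 6.], [9., 10.]]

# Boolean mask with the same shape as `tensor`
mask = tensor > 12
selected = tensor[mask]
print(selected)   # tensor([13., 14., 15.])
```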


### Operations

In [127]:
tensor

tensor([[ 0.,  1.,  2.,  3.],
        [ 4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11.],
        [12., 13., 14., 15.]])

[Broadcast operation](https://pytorch.org/docs/stable/notes/broadcasting.html):

In [137]:
t2 = tensor + 1.5
t2

tensor([[ 1.5000,  2.5000,  3.5000,  4.5000],
        [ 5.5000,  6.5000,  7.5000,  8.5000],
        [ 9.5000, 10.5000, 11.5000, 12.5000],
        [13.5000, 14.5000, 15.5000, 16.5000]])
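
Broadcasting is not limited to scalars. As a small sketch, adding a column of shape `(4, 1)` to a vector of shape `(4,)` expands both operands to shape `(4, 4)`; the names `col` and `row` are chosen for this example.

```python
import torch

col = torch.arange(4.).reshape(4, 1)      # shape (4, 1): values 0, 1, 2, 3
row = torch.tensor([10., 20., 30., 40.])  # shape (4,)

# Both operands are broadcast to shape (4, 4):
# result[i, j] = col[i, 0] + row[j]
result = col + row
print(result.shape)
print(result)
```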

Add two tensors of the same shape:

In [142]:
tensor + t2

tensor([[ 1.5000,  3.5000,  5.5000,  7.5000],
        [ 9.5000, 11.5000, 13.5000, 15.5000],
        [17.5000, 19.5000, 21.5000, 23.5000],
        [25.5000, 27.5000, 29.5000, 31.5000]])

Component-wise product:

In [143]:
tensor * t2

tensor([[  0.0000,   2.5000,   7.0000,  13.5000],
        [ 22.0000,  32.5000,  45.0000,  59.5000],
        [ 76.0000,  94.5000, 115.0000, 137.5000],
        [162.0000, 188.5000, 217.0000, 247.5000]])

We can *transpose* the tensor using the `T` attribute:

In [146]:
print(tensor)
print('\n')
print(tensor.T)

tensor([[ 0.,  1.,  2.,  3.],
        [ 4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11.],
        [12., 13., 14., 15.]])


tensor([[ 0.,  4.,  8., 12.],
        [ 1.,  5.,  9., 13.],
        [ 2.,  6., 10., 14.],
        [ 3.,  7., 11., 15.]])


Matrix multiplication with the `@` operator:

In [129]:
tensor.T @ tensor

tensor([[224., 248., 272., 296.],
        [248., 276., 304., 332.],
        [272., 304., 336., 368.],
        [296., 332., 368., 404.]])
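
Unlike the component-wise product, matrix multiplication only requires the inner dimensions to match. The sketch below multiplies a `(2, 3)` matrix by a `(3, 4)` matrix and shows that `@` and `torch.matmul` give the same result; `A` and `B` are illustrative names.

```python
import torch

A = torch.arange(6.).reshape(2, 3)   # shape (2, 3)
B = torch.arange(12.).reshape(3, 4)  # shape (3, 4)

C = A @ B                            # shape (2, 4)
C_alt = torch.matmul(A, B)           # function form of the same operation

print(C.shape)
print(torch.equal(C, C_alt))
```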

### Aggregation functions

In [150]:
print(f'Sum: {torch.sum(tensor)}')
print(f'Product: {torch.prod(tensor)}')
print(f'Max: {torch.max(tensor)}')
print(f'Min: {torch.min(tensor)}')
print(f'Mean: {torch.mean(tensor)}')

Sum: 120.0
Product: 0.0
Max: 15.0
Min: 0.0
Mean: 7.5


We can also aggregate along a single dimension. With `dim=0`, the sum runs over the rows, which yields one value per column:

In [154]:
tensor

tensor([[ 0.,  1.,  2.,  3.],
        [ 4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11.],
        [12., 13., 14., 15.]])

In [151]:
tensor.shape

torch.Size([4, 4])

In [155]:
col_sums = torch.sum(tensor, dim=0)
col_sums

tensor([24., 28., 32., 36.])
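
Analogously, `dim=1` aggregates over the columns and yields one value per row. Passing `keepdim=True` keeps the reduced dimension with size 1, which is convenient for broadcasting the result back against the original tensor. A short sketch:

```python
import torch

tensor = torch.arange(16.).reshape(4, 4)

row_sums = torch.sum(tensor, dim=1)                 # shape (4,)
row_sums_kept = torch.sum(tensor, dim=1, keepdim=True)  # shape (4, 1)

# Divide every row by its own sum, so that each row sums to 1
normalized = tensor / row_sums_kept

print(row_sums)             # tensor([ 6., 22., 38., 54.])
print(row_sums_kept.shape)
print(normalized.sum(dim=1))
```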

# Automatic Differentiation

Let $\mathbf{y} = (y_1, \dots, y_n)^T$ and $\hat{\mathbf{y}} = (\hat{y}_1, \dots, \hat{y}_n)^T$.
The *mean squared error function* is given by
$$\mathcal{L}(\mathbf{y}, \hat{\mathbf{y}}) = \frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2 $$

In [14]:
def loss_fn(y_true, y_pred):
    return torch.mean((y_true - y_pred) ** 2)

For performance reasons, avoid Python loops over tensor elements wherever possible. Instead, use built-in functions that operate on the entire tensor at once.

In [20]:
# Do *not* do this:
def loss_fn_slow_please_dont_do_this(y_true, y_pred):
    loss = 0
    for y, y_hat in zip(y_true, y_pred):
        loss += (y - y_hat) ** 2
    
    return loss / len(y_true)
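
To get a feeling for the difference, both variants can be timed on the same data. The sketch below redefines the two functions under the illustrative names `mse_vectorized` and `mse_loop` so that it runs on its own; the measured times depend on your hardware, so none are stated here.

```python
import time
import torch

def mse_vectorized(y_true, y_pred):
    # One call that operates on the whole tensor
    return torch.mean((y_true - y_pred) ** 2)

def mse_loop(y_true, y_pred):
    # Element-by-element Python loop, shown only for comparison
    loss = 0.0
    for y, y_hat in zip(y_true, y_pred):
        loss += (y - y_hat) ** 2
    return loss / len(y_true)

y_true = torch.randn(1_000)
y_pred = torch.randn(1_000)

start = time.perf_counter()
fast = mse_vectorized(y_true, y_pred)
t_fast = time.perf_counter() - start

start = time.perf_counter()
slow = mse_loop(y_true, y_pred)
t_slow = time.perf_counter() - start

print(f"Vectorized: {t_fast:.6f} s, loop: {t_slow:.6f} s")
print(f"Same result: {torch.allclose(fast, slow, rtol=1e-3)}")
```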

Generate dummy data. Note that we set `requires_grad=True` for `y_hat` because we will compute gradients with respect to it.

In [37]:
# Set random seed for reproducibility
torch.manual_seed(302)
np.random.seed(302)

y_true = torch.randn(10)
y_hat = torch.randn(10, requires_grad=True)

In [38]:
print(f'True: {y_true}')
print(f'Predicted: {y_hat}')

True: tensor([ 1.2101,  0.9257, -0.6577, -1.6145,  0.7098, -1.2300, -1.1954, -0.1903,
         0.2244, -1.3593])
Predicted: tensor([-0.2696,  0.9912,  2.6123, -0.2423, -1.8687,  0.2160, -0.1764,  1.1408,
        -0.4920,  1.2591], requires_grad=True)


Let's test *automatic differentiation* with PyTorch.

For example, let's compute $$ \nabla_{\hat{\mathbf{y}}} \mathcal{L}(\mathbf{y}, \hat{\mathbf{y}})$$
for our **particular values** of $\mathbf{y}$ and $\hat{\mathbf{y}}$.

In [39]:
loss = loss_fn(y_true, y_hat)
loss

tensor(3.3690, grad_fn=<MeanBackward0>)

In [40]:
loss.backward()

To access $ \nabla_{\hat{\mathbf{y}}} \mathcal{L}(\mathbf{y}, \hat{\mathbf{y}})$, we can use the `grad` attribute of the tensor `y_hat`.

In [41]:
y_hat.grad

tensor([-0.2959,  0.0131,  0.6540,  0.2744, -0.5157,  0.2892,  0.2038,  0.2662,
        -0.1433,  0.5237])
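
For the mean squared error, the gradient can also be derived by hand: $\frac{\partial \mathcal{L}}{\partial \hat{y}_i} = \frac{2}{n} (\hat{y}_i - y_i)$. The following sketch compares this analytic expression with the result of `backward()`. It uses its own seed and fresh tensors, so the numbers differ from the cells above.

```python
import torch

torch.manual_seed(0)

y_true = torch.randn(10)
y_hat = torch.randn(10, requires_grad=True)

# Mean squared error, followed by automatic differentiation
loss = torch.mean((y_true - y_hat) ** 2)
loss.backward()

# Analytic gradient: 2/n * (y_hat - y_true)
n = y_true.numel()
analytic = 2.0 / n * (y_hat.detach() - y_true)

print(torch.allclose(y_hat.grad, analytic))
```

The same check applies to the values shown above: for the first component, $\frac{2}{10} \cdot (-0.2696 - 1.2101) \approx -0.2959$, which matches the first entry of `y_hat.grad`.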