# PyTorch
In this exercise, we will look at some basic functionality of PyTorch. Your are free to use other DL frameworks for your exercises and your project. However, the master solutions and code examples will be in PyTorch.

The [PyTorch documentation](https://pytorch.org/docs/stable/index.html) offers information on its functionality. A lot of the time, your specific question will also have been asked on the [PyTorch Forum](https://discuss.pytorch.org/), often with competent answers by the core developers (Google will find the relevant thread for you).

First, we have to install PyTorch. We will install the basic version for this exercise. For your project, if you want to run on a GPU, you'll have to make sure to have a PyTorch version installed that is compatible with the CUDA version of your NVIDIA drivers. PyTorch has an [installation guide](https://pytorch.org/get-started/locally/) that will help you with getting the right version.

In [1]:
%pip install -q -U numpy
%pip install -q torch ipywidgets

[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
numba 0.61.0 requires numpy<2.2,>=1.24, but you have numpy 2.2.3 which is incompatible.[0m[31m
Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.


In [2]:
import torch

## Tensor operations
Most of PyTorch's operations have the same name as in NumPy. The basic object for storing data is the `torch.tensor`, the equivalent of the `np.array`. With the help of the [Tensor tutorial](https://pytorch.org/tutorials/beginner/blitz/tensor_tutorial.html), do the following:

- Create a `torch.tensor` with the elements `[[1, 2], [3, 4]]`
- Create a tensor of ones/zeros with the same shape and dtype
- Create a random tensor of the same shape
- Print the tensor's shape, data type and device
- Try to move it to the GPU
- For Mac users: Try to move it to [MPS](https://pytorch.org/docs/stable/notes/mps.html)
- Check out indexing/slicing operations, and how you can assign values to a slice.
- Combine tensors with `torch.cat` and `torch.stack`. What are the differences?
- Multiply tensors, element-wise and with matrix multiplication.

### Instantiating Tensors

In [16]:
data = [[1, 2], [3, 4]]
x_data = torch.tensor(data, dtype=torch.float)
print(x_data)
print(x_data.shape)
print(x_data.dtype)
print(x_data.device)

tensor([[1., 2.],
        [3., 4.]])
torch.Size([2, 2])
torch.float32
cpu


In [17]:
x_data = torch.ones((2, 2), dtype=torch.float)
print(x_data)
print(x_data.shape)
print(x_data.dtype)
print(x_data.device)

tensor([[1., 1.],
        [1., 1.]])
torch.Size([2, 2])
torch.float32
cpu


In [18]:
x_data = torch.zeros((2, 2), dtype=torch.float)
print(x_data)
print(x_data.shape)
print(x_data.dtype)
print(x_data.device)

tensor([[0., 0.],
        [0., 0.]])
torch.Size([2, 2])
torch.float32
cpu


In [19]:
x_data = torch.rand((2, 2), dtype=torch.float)
print(x_data)
print(x_data.shape)
print(x_data.dtype)
print(x_data.device)

tensor([[0.7517, 0.9648],
        [0.4269, 0.5382]])
torch.Size([2, 2])
torch.float32
cpu


### Moving tensor to GPU

In [20]:
if torch.cuda.is_available():
  x_data = x_data.to('cuda')
  print(x_data.device)

cuda:0


### Concatenating and stacking tensors

In [29]:
print(torch.cat([x_data, x_data, x_data], dim=0))

tensor([[0.7517, 0.9648],
        [0.4269, 0.5382],
        [0.7517, 0.9648],
        [0.4269, 0.5382],
        [0.7517, 0.9648],
        [0.4269, 0.5382]], device='cuda:0')


In [28]:
print(torch.stack([x_data, x_data, x_data], dim=0))

tensor([[[0.7517, 0.9648],
         [0.4269, 0.5382]],

        [[0.7517, 0.9648],
         [0.4269, 0.5382]],

        [[0.7517, 0.9648],
         [0.4269, 0.5382]]], device='cuda:0')


### Mutliplying tensors

In [31]:
print(x_data * x_data)

tensor([[0.5650, 0.9309],
        [0.1823, 0.2896]], device='cuda:0')


In [32]:
print(x_data @ x_data.T)

tensor([[1.4959, 0.8401],
        [0.8401, 0.4719]], device='cuda:0')


## Neural Network Basics
Solve the followings tasks with the help of the [Neural networks tutorial](https://pytorch.org/tutorials/beginner/blitz/neural_networks_tutorial.html).

The `nn.Module` is the basic class for layers, networks and models. All parameters of an `nn.Module` are automatically discovered by PyTorch and updated by back-propagation.

First, define a neural network (as a subclass of `nn.Module`) with two linear layers and a ReLU non-linearity in between. Make the input, output, and inner dimensions parameters of your network.

In [62]:
import torch.nn as nn
import torch.nn.functional as F

In [63]:
input_dim = 16
output_dim = 8
inner_dim = 32

In [64]:
class Network(nn.Module):
    def __init__(self, i_dim: int, o_dim: int, n_dim: int):
        super().__init__()
        self.fc1 = nn.Linear(i_dim, n_dim)
        self.fc2 = nn.Linear(n_dim, o_dim)

    def forward(self, data):
        x1 = F.relu(self.fc1(data))
        output = F.relu(self.fc2(x1))
        return output

model = Network(input_dim, output_dim, inner_dim)

Move the entire network to the GPU/MPS.

In [65]:
if torch.cuda.is_available():
    model = model.to('cuda')


Print the parameters of your network.

In [66]:
print(list(model.parameters()))

[Parameter containing:
tensor([[ 1.1357e-01, -1.2461e-01, -9.6427e-02,  2.1063e-01,  2.4405e-01,
          1.9475e-01,  1.4092e-01,  1.0269e-01, -2.1790e-01, -6.4003e-02,
          3.8139e-02, -2.2471e-01,  1.2052e-01, -2.2103e-01,  2.0432e-01,
          1.8947e-01],
        [-1.4985e-02,  1.0859e-01,  1.4339e-01, -7.6691e-02, -1.3636e-01,
          1.6803e-01,  2.4665e-02, -2.3384e-02, -2.0710e-01, -8.8228e-02,
         -1.6718e-01, -5.9927e-02, -1.7012e-01, -2.3862e-01,  1.0852e-02,
          9.2305e-02],
        [ 1.5696e-01, -1.4668e-01, -7.9758e-02, -2.3428e-01, -1.7764e-01,
          2.4710e-01,  1.5189e-01, -2.1949e-01,  8.2772e-02, -2.3328e-01,
          2.0154e-01,  2.2618e-01,  1.9442e-01,  2.2126e-02,  1.1351e-01,
          1.6278e-01],
        [ 2.4086e-01, -1.4690e-01,  5.7770e-02,  1.9331e-01,  1.9940e-01,
         -1.4034e-01,  8.5880e-02, -8.5437e-02,  1.3639e-01, -2.3808e-01,
         -1.0914e-01, -8.5457e-02, -1.1195e-01, -1.9593e-01, -1.6752e-01,
          1.6535e-01

Run a single forward-pass with a random input.

In [70]:
input = torch.randn(input_dim).to('cuda')
output = model.forward(input)

Define a `nn.MSELoss` and a random target.

In [71]:
target = torch.rand(output_dim).to('cuda')
criterion = nn.MSELoss()

Compute the loss and run backpropagation.

In [72]:
loss = criterion(output, target)

model.zero_grad()     # zeroes the gradient buffers of all parameters

print('fc1.bias.grad before backward')
print(model.fc1.bias.grad)

loss.backward()

print('fc1.bias.grad after backward')
print(model.fc1.bias.grad)

fc1.bias.grad before backward
None
fc1.bias.grad after backward
tensor([ 0.0000,  0.0000,  0.0000, -0.0186,  0.0435, -0.0100,  0.0000,  0.0070,
         0.0000,  0.0000,  0.0000, -0.0166, -0.0069,  0.0113,  0.0352,  0.0000,
         0.0000,  0.0000,  0.0088,  0.0000,  0.0000,  0.0000, -0.0028,  0.0272,
         0.0000, -0.0173,  0.0000, -0.0100, -0.0082,  0.0000, -0.0437,  0.0064],
       device='cuda:0')


Update the parameters of your network with a learning rate of 0.01.

In [73]:
print('fc1.bias before update')
print(model.fc1.bias)

learning_rate = 0.01
for f in model.parameters():
    f.data.sub_(f.grad.data * learning_rate)

print('fc1.bias after update')
print(model.fc1.bias)

fc1.bias before update
Parameter containing:
tensor([-0.0835, -0.0490,  0.1702, -0.0213,  0.1640,  0.1711,  0.1046,  0.0776,
         0.0779, -0.2013, -0.2531,  0.2261,  0.0535,  0.1303,  0.0930, -0.0091,
        -0.2097,  0.0518,  0.1130, -0.0166,  0.0245,  0.1981,  0.0575,  0.2252,
         0.0303,  0.1074, -0.1837, -0.2381,  0.0373, -0.1711,  0.0095, -0.0986],
       device='cuda:0', requires_grad=True)
fc1.bias after update
Parameter containing:
tensor([-0.0835, -0.0490,  0.1702, -0.0211,  0.1635,  0.1712,  0.1046,  0.0775,
         0.0779, -0.2013, -0.2531,  0.2263,  0.0536,  0.1302,  0.0926, -0.0091,
        -0.2097,  0.0518,  0.1129, -0.0166,  0.0245,  0.1981,  0.0576,  0.2249,
         0.0303,  0.1076, -0.1837, -0.2380,  0.0374, -0.1711,  0.0099, -0.0986],
       device='cuda:0', requires_grad=True)


Use the `AdamOptimizer` instead to update your parameters (see the [torch.optim documentation](https://pytorch.org/docs/stable/optim.html)).

In [57]:
from torch import optim

In [69]:
optimizer = optim.Adam(model.parameters(), lr=0.01)
loss = criterion(output, target)

print('fc1.bias before update')
print(model.fc1.bias)

optimizer.zero_grad()
loss.backward()
optimizer.step()

print('fc1.bias after update')
print(model.fc1.bias)

fc1.bias before update
Parameter containing:
tensor([-0.0835, -0.0490,  0.1702, -0.0313,  0.1640,  0.1611,  0.1046,  0.0776,
         0.0879, -0.1913, -0.2431,  0.2161,  0.0435,  0.1203,  0.0930, -0.0091,
        -0.2097,  0.0418,  0.1230, -0.0166,  0.0245,  0.1981,  0.0575,  0.2252,
         0.0303,  0.0974, -0.1837, -0.2481,  0.0273, -0.1611,  0.0095, -0.1086],
       device='cuda:0', requires_grad=True)
fc1.bias after update
Parameter containing:
tensor([-0.0835, -0.0490,  0.1702, -0.0213,  0.1640,  0.1711,  0.1046,  0.0776,
         0.0779, -0.2013, -0.2531,  0.2261,  0.0535,  0.1303,  0.0930, -0.0091,
        -0.2097,  0.0518,  0.1130, -0.0166,  0.0245,  0.1981,  0.0575,  0.2252,
         0.0303,  0.1074, -0.1837, -0.2381,  0.0373, -0.1711,  0.0095, -0.0986],
       device='cuda:0', requires_grad=True)
