# Day 1 (Cont'd): Introduction to PyTorch
>![PyTorch](https://upload.wikimedia.org/wikipedia/commons/9/96/Pytorch_logo.png)

PyTorch is one of the most widely used open-source machine learning libraries. It provides primitives for defining functions on tensors and automatically computing their derivatives. It is powerful and has been very successful for implementing machine learning and other algorithms that involve a large number of mathematical operations.

PyTorch was originally developed by Meta AI and is one of the most popular machine learning libraries on GitHub.
PyTorch tensors are similar to NumPy arrays and can additionally be used on a CUDA-capable GPU or a TPU.
Thus, PyTorch is mainly used as:
>* A tool for tensor computation (with GPU support)
>* A Deep Learning (research) platform which uses tape-based automatic differentiation

PyTorch is similar to NumPy, so it will feel quite familiar if you have used NumPy before.


This tutorial gives a brief introduction to the core components of PyTorch, including:
>* Basic PyTorch: tensors, computational graph, parameters, GPU support
>* Flowchart: build graph, get output, gradient computation

## Tensor

Tensors are the basic data structures in PyTorch.
Formally, tensors are multilinear maps from vector spaces to the real numbers. In practice, a PyTorch tensor is an N-dimensional array, which means that it can be used to represent N-dimensional data.

A scalar is a tensor, a vector is a tensor, and a matrix is also a tensor; they only differ in their number of dimensions.

![tensor](https://cdn-images-1.medium.com/max/1000/1*Wv9adjSwmgl4wLE7lSTRIw.png)
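
As a small illustration of the statement above, the following sketch creates tensors with zero to three dimensions and prints their number of dimensions and shape. The variable names are chosen for this example only.

```python
import torch

scalar = torch.tensor(3.14)              # 0-dimensional tensor
vector = torch.tensor([1.0, 2.0, 3.0])   # 1-dimensional tensor
matrix = torch.ones(2, 3)                # 2-dimensional tensor
cube = torch.zeros(2, 3, 4)              # 3-dimensional tensor

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("cube", cube)]:
    # ndim is the number of dimensions, shape the size along each of them
    print(name, t.ndim, tuple(t.shape))
```

The `ndim` attribute grows from 0 to 3, while `shape` lists the size along each dimension.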

As the number of dimensions increases, the data representation becomes more and more complex. To process given tensors, we can apply PyTorch operators, which are similar to their counterparts in NumPy.

In [1]:
import torch
import torch.nn as nn
import numpy as np

print(torch.__version__)
print(np.__version__)

2.2.2+cu121
1.26.4


In [2]:
### with numpy
a = np.eye(2); b = np.ones((2,2))
print(np.sum(b))
print(a.shape)
print(np.reshape(a, (1,4)))
print(np.matmul(a, b))

4.0
(2, 2)
[[1. 0. 0. 1.]]
[[1. 1.]
 [1. 1.]]


In [3]:
### repeat it in PyTorch
a = torch.eye(2); b = torch.ones((2,2))
print(torch.sum(b)) # or print(b.sum())
print(a.size()) # or just use a.shape, like in numpy
print(torch.reshape(a, (1,4))) # or print(a.view(1,4))
print(torch.mm(a, b))

tensor(4.)
torch.Size([2, 2])
tensor([[1., 0., 0., 1.]])
tensor([[1., 1.],
        [1., 1.]])


For more NumPy-style matrix operations you can refer to the [PyTorch documentation](https://pytorch.org/docs/stable/index.html).

## Computational Graph

Now that we understand what tensors are, it's time to look at the dynamically created computational graph. Unlike frameworks built around static graphs, such as TensorFlow 1.x, PyTorch takes a dynamic approach to graph creation: the graph is built on the go while the code runs. This makes it possible to change the graph even at runtime.

A computational graph is a series of PyTorch operations arranged into a graph.

For example, look at the graph below:

>![dynamic graph](https://raw.githubusercontent.com/pytorch/pytorch/master/docs/source/_static/img/dynamic_graph.gif)

Some rules for the computational graph:
1. A graph always starts from tensors, so an operation can never occur without inputs
2. Each operation accepts tensors and produces new tensors
3. Complex operations are arranged in hierarchical order
4. Operations in nodes on the same level are independent of each other
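
To make the "created on the go" idea more concrete, here is a minimal sketch in which the shape of the graph depends on an ordinary Python argument. The helper `repeated_product` is a hypothetical function written for this example and is not part of PyTorch.

```python
import torch

def repeated_product(x, n_steps):
    # Hypothetical helper: the number of multiplications is only known at runtime
    out = x
    for _ in range(n_steps):
        out = out * x
    return out

x_sq = torch.tensor(2.0, requires_grad=True)
repeated_product(x_sq, n_steps=1).backward()     # graph of x * x
print(x_sq.grad)                                 # tensor(4.) = 2 * x

x_cube = torch.tensor(2.0, requires_grad=True)
repeated_product(x_cube, n_steps=2).backward()   # graph of x * x * x
print(x_cube.grad)                               # tensor(12.) = 3 * x**2
```

Each call records a different graph, and the gradient follows whatever operations were actually executed.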

### Automatic Differentiation

PyTorch uses reverse-mode auto-differentiation in order to compute the gradient of a function with respect to the inputs. Automatic differentiation utilises the chain rule, which allows for calculating complex derivatives by splitting them and recombining them later. Therefore, it is a very useful tool for neural networks.

`requires_grad` determines whether PyTorch needs to calculate the gradients with respect to this tensor later in the optimisation steps or not. This argument is set to **False** by default. Setting `requires_grad=True` tells PyTorch that, once the gradient of a tensor built from this tensor is calculated, the derivative with respect to this tensor should be stored in its `grad` attribute.
To make this possible, each time a new tensor is created by operating on tensors that require gradients, the derivative function of that operation is stored in the new tensor's `grad_fn` attribute.

When the `backward()` function of a tensor is called, it computes the gradient of that tensor w.r.t. the graph leaves, i.e., it iteratively combines the derivative (`grad_fn`) functions via the chain rule until the derivatives w.r.t. all leaf tensors with `requires_grad=True` are calculated. **Attention:** You might need to zero the `.grad` attributes or set them to `None` before calling it.
If the tensor on which `backward()` is called is non-scalar (i.e. its data has more than one element), you additionally have to pass the `gradient` argument to `backward()`. However, most of the time we only have a scalar output, since we compute the loss for our model.
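
The non-scalar case can be sketched as follows. The names `v` and `squared` are chosen for this example; passing a tensor of ones as `gradient` weights every output element equally, which gives the same result as calling `squared.sum().backward()`.

```python
import torch

v = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
squared = v ** 2                  # non-scalar output with three elements

# squared.backward() without arguments would raise a RuntimeError here,
# because the gradient can only be created implicitly for scalar outputs
squared.backward(gradient=torch.ones_like(squared))

print(v.grad)                     # tensor([2., 4., 6.]) = 2 * v
```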

In [4]:
a = torch.tensor(1., requires_grad=True)
b = 2

x = torch.randn(5, requires_grad=True)
y = torch.randn(5)
print(x)
print(y)

print("\nSome Operations")
print(a * b)
print(x + b)
print(y + b)

tensor([ 0.4812,  0.4688,  1.3136,  1.2392, -0.3129], requires_grad=True)
tensor([ 0.2507, -1.8078, -1.6042,  0.5388, -1.2312])

Some Operations
tensor(2., grad_fn=<MulBackward0>)
tensor([2.4812, 2.4688, 3.3136, 3.2392, 1.6871], grad_fn=<AddBackward0>)
tensor([2.2507, 0.1922, 0.3958, 2.5388, 0.7688])


When we print the outputs of the above example, we see that results computed from tensors with `requires_grad=True` carry a `grad_fn` attribute, while `y + b` does not. This is the gradient function which is automatically created by PyTorch and is used for backpropagation.

In [5]:
c = a * b
# a.grad.zero_()
c.backward()
print("Operation c = a * b: ", c)
print("Gradient of c w.r.t. a: ", a.grad)

w = torch.mean(x + b)
# x.grad.zero_()
w.backward()
print("\nOperation mean(x + b): ", w)
print("Gradient of w w.r.t. x: ", x.grad)

Operation c = a * b:  tensor(2., grad_fn=<MulBackward0>)
Gradient of c w.r.t. a:  tensor(2.)

Operation mean(x + b):  tensor(2.6380, grad_fn=<MeanBackward0>)
Gradient of w w.r.t. x:  tensor([0.2000, 0.2000, 0.2000, 0.2000, 0.2000])


In [6]:
# Example where no gradient is computed for d (requires_grad defaults to False)
d = torch.tensor(2.)
z = a * d
z.backward()
print("Operation z = a * d: ", z)
print("Gradient: ", d.grad)

Operation z = a * d:  tensor(2., grad_fn=<MulBackward0>)
Gradient:  None


### Zeroing out the gradients

PyTorch accumulates the gradients on subsequent backward passes, i.e., on every `loss.backward()` call. This can be beneficial if you want to calculate the gradient summed over multiple mini-batches. Typically, however, you want to zero out the gradients at the start of each training iteration so that the parameter update is done correctly. If the gradient is not set to zero before backpropagation, it would be a combination of the old (already used) gradient and the newly calculated one.

In order to zero out the gradients you can do the following, before calling the `backward()` function:

In [7]:
a.grad.zero_()
x.grad.zero_()

tensor([0., 0., 0., 0., 0.])
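
The accumulation behaviour described in this section can be observed directly. The following minimal sketch uses a hypothetical tensor `p` and calls `backward()` twice without zeroing in between.

```python
import torch

p = torch.tensor(1.0, requires_grad=True)   # hypothetical example tensor

(2 * p).backward()
(2 * p).backward()            # second backward pass without zeroing in between
accumulated = p.grad.clone()
print(accumulated)            # tensor(4.): the two gradients 2 + 2 were summed

p.grad.zero_()                # reset in place; setting p.grad = None also works
(2 * p).backward()
fresh = p.grad.clone()
print(fresh)                  # tensor(2.)
```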

**Note:** When you are using an optimiser from the `torch.optim` package you can use the `zero_grad()` function to zero out all gradients, e.g., `optimiser.zero_grad()`.
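
As a rough sketch of where `zero_grad()` usually sits in a training loop, the example below fits a single weight to a toy problem. The data, the learning rate and the number of steps are assumptions made for this illustration only.

```python
import torch

# Hypothetical toy problem: learn the factor 3 in targets = 3 * inputs
weight = torch.nn.Parameter(torch.tensor(0.0))
optimiser = torch.optim.SGD([weight], lr=0.1)

inputs = torch.tensor([1.0, 2.0, 3.0])
targets = 3.0 * inputs

for step in range(50):
    optimiser.zero_grad()                             # clear gradients of the previous step
    loss = ((weight * inputs - targets) ** 2).mean()  # mean squared error
    loss.backward()                                   # fill weight.grad
    optimiser.step()                                  # update weight using weight.grad

print(weight.item())   # close to 3.0
```

Calling `optimiser.zero_grad()` first in every iteration ensures that each update only uses the gradient of the current loss.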

## Custom PyTorch Modules and Parameters

PyTorch uses modules to represent neural networks. `torch.nn.Module` is the base class for all neural network modules, so your own models should subclass it as well.
When you need a model with more complex building blocks than the existing ones, you have to define your own custom modules and use them in your model.

To register a tensor as a model parameter, wrap it in `torch.nn.Parameter()` and assign it as an attribute of the module. This way, the tensor will automatically be added to the model's parameter list. **Attention:** this is not the case when simply using a normal tensor.

For more information, please refer to the [PyTorch modules documentation](https://pytorch.org/docs/stable/notes/modules.html).
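
The difference between a registered parameter and a normal tensor can be checked with `named_parameters()`. The module `Demo` below is a hypothetical class written only for this comparison.

```python
import torch

class Demo(torch.nn.Module):   # hypothetical module for illustration
    def __init__(self):
        super().__init__()
        self.registered = torch.nn.Parameter(torch.zeros(3))   # added to the parameter list
        self.plain = torch.zeros(3, requires_grad=True)        # normal tensor, not registered

    def forward(self, x):
        return x * self.registered + self.plain

demo = Demo()
names = [name for name, _ in demo.named_parameters()]
print(names)   # ['registered']
```

Only `registered` shows up, so an optimiser built from `demo.parameters()` would never update `plain`.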

The following example implements our own module, which calculates $f(x) = ax^2 + bx + c$, with $x$ being the input data and $a$, $b$, $c$ being the weights:

In [8]:
class CustomModule(torch.nn.Module):
    def __init__(self, size):
        super().__init__()
        self.a = torch.nn.Parameter(torch.randn(size))
        self.b = torch.nn.Parameter(torch.randn(size))
        self.c = torch.nn.Parameter(torch.randn(size))
    
    def forward(self, x):
        return self.a * x**2 + self.b * x + self.c

    def __repr__(self):
        return f'Polynomial2(a: {self.a.shape}, b: {self.b.shape}, c: {self.c.shape})'
        

module = CustomModule(size=1)
print("Module: ", module) # Uses our string representation
print("\nModule Parameters:\n", list(module.parameters()))
output = module(7)
print(module(7))
print(module.forward(7))

Module:  Polynomial2(a: torch.Size([1]), b: torch.Size([1]), c: torch.Size([1]))

Module Parameters:
 [Parameter containing:
tensor([-0.2613], requires_grad=True), Parameter containing:
tensor([0.0803], requires_grad=True), Parameter containing:
tensor([-0.3050], requires_grad=True)]
tensor([-12.5484], grad_fn=<AddBackward0>)
tensor([-12.5484], grad_fn=<AddBackward0>)


#### Further explanations w.r.t. the above custom module

* The Python `__init__()` function initialises a newly created object using the passed arguments. It is called when the class is called to create an instance.
* The Python `__call__()` function is invoked when the instance is called, which allows an instance of the class to be used like a function. Moreover, **it is already defined** in `nn.Module`, where it runs all registered hooks and calls your `forward()` function. That is, your module can be used as a function. For instance, above we can simply say `output = module(7)` instead of `output = module.forward(7)`.
* The Python `__repr__()` function returns the object's representation as a string. It is called when the `repr()` function is applied to the object.
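
The practical difference between calling the module and calling `forward()` directly shows up as soon as a hook is registered. The sketch below uses a hypothetical `Square` module and a forward hook that simply records every output it sees.

```python
import torch

class Square(torch.nn.Module):   # hypothetical module for illustration
    def forward(self, x):
        return x ** 2

calls = []
square = Square()
# The hook receives the module, its positional inputs and its output
handle = square.register_forward_hook(
    lambda module, inputs, output: calls.append(output)
)

square(torch.tensor(3.0))           # goes through __call__, so the hook runs
square.forward(torch.tensor(3.0))   # bypasses __call__, so the hook is skipped
print(len(calls))                   # 1

handle.remove()                     # detach the hook again
```

This is why calling the instance itself is generally preferred over calling `forward()` directly.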

## Conversion from NumPy to PyTorch and back

All previous examples used manually defined tensors as input data, but how can we get external data into PyTorch?
A simple solution is to use `torch.from_numpy`:

In [9]:
a = np.random.random((3,3))
print(a)

# Convert numpy array to tensor
ta = torch.from_numpy(a) 
print(ta)
# or ta = torch.tensor(a)  (copies the data instead of sharing memory)
# or ta = torch.Tensor(a)  (copies and converts to the default dtype, float32)

# Convert back
na = ta.detach().cpu().numpy()
print(na)

[[0.43565476 0.02605827 0.58459374]
 [0.97909302 0.48270368 0.14847264]
 [0.52390551 0.09153113 0.09173771]]
tensor([[0.4357, 0.0261, 0.5846],
        [0.9791, 0.4827, 0.1485],
        [0.5239, 0.0915, 0.0917]], dtype=torch.float64)
[[0.43565476 0.02605827 0.58459374]
 [0.97909302 0.48270368 0.14847264]
 [0.52390551 0.09153113 0.09173771]]
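
One detail worth knowing is that `torch.from_numpy` shares memory with the NumPy array, whereas `torch.tensor` makes a copy. The following sketch, with names chosen for this example, shows the effect of modifying the array afterwards.

```python
import numpy as np
import torch

arr = np.zeros(3)                  # float64 NumPy array
shared = torch.from_numpy(arr)     # shares memory with arr
copied = torch.tensor(arr)         # independent copy of the data

arr[0] = 1.0                       # modify the NumPy array in place

print(shared[0].item())            # 1.0, the change is visible in the tensor
print(copied[0].item())            # 0.0, the copy is unaffected
print(shared.dtype, copied.dtype)  # both keep NumPy's float64
```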


## GPU support

In [10]:
# Check for CUDA availability
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create tensor directly on the device
x = torch.ones(5, device=device)

# Move a tensor to the device (a no-op here, since x already lives on it)
x = x.to(device)
print(x)

tensor([1., 1., 1., 1., 1.])
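
In practice, both the model and its inputs have to live on the same device. The following sketch assumes a small `torch.nn.Linear` layer and random input data purely for illustration; it falls back to the CPU when no GPU is available.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(4, 2).to(device)   # moves the parameters of the module
batch = torch.randn(8, 4).to(device)       # returns a tensor on the target device

output = model(batch)                      # computed on the chosen device
print(output.shape, output.device)

# Bring the result back to the CPU before converting it to NumPy
result = output.detach().cpu().numpy()
print(result.shape)
```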


## Exercise 2:
For the following 10*10 array A, 



> [[0.69145505, 0.86931882, 0.88576413, 0.82707554, 0.94754421, 0.54767962,  0.51818679, 0.27907936, 0.95212406, 0.22750068],

> [0.345638,   0.16172159, 0.87807572, 0.38444467, 0.84255332, 0.69666159,  0.43339905, 0.91927538, 0.58666126, 0.83215206],

>[0.91359442, 0.06356911, 0.37205853, 0.18242315, 0.37961342, 0.93335263,  0.34068447, 0.48598708, 0.24260729, 0.70004846],

>[0.75245372, 0.64147803, 0.84013461, 0.6152693,  0.02235612, 0.4492574,  0.55206705, 0.69409179, 0.1666939,  0.67387225],

>[0.30664677, 0.87559232, 0.07164895, 0.85516997, 0.77945438, 0.51948711,  0.18721151, 0.7690967 , 0.53605078, 0.55431431],

>[0.1750064,  0.95009262, 0.57121048, 0.87359026, 0.05715099, 0.43202169,  0.3648696,  0.24367817, 0.06807447, 0.46999578],

>[0.41121198, 0.10125657, 0.0869751,  0.91816382, 0.01738795, 0.19420588,  0.00127754, 0.19281699, 0.56083174, 0.55424236],

>[0.34467108, 0.18352578, 0.69203741, 0.48087863, 0.39596428, 0.28107969,  0.09727506, 0.11236618, 0.82687268, 0.22700161],

>[0.92788092, 0.87184167, 0.72492497, 0.94086364, 0.86998108, 0.35178978,  0.45463869, 0.0242793,  0.75607483, 0.21317889],

>[0.15680697, 0.13109825, 0.93463861, 0.78143659, 0.30680001, 0.67935342,  0.3583568,  0.7522564,  0.19810852, 0.22378965]],



answer the following questions by programming with PyTorch. Again, avoid any explicit for-/while-loops or list comprehensions.


1.   What is the row index in A that has the largest last element? 
2.   What is the row index in A that has the second largest last element?
3.   What is the row index in A that has a row sum greater than 5?
4.   What is the sum of all elements of the form A\[i, i+1] in A?
5.   Multiply elementwise every row in matrix A by the vector w \[0.49039597 0.73424538 0.08249155 0.0488797  0.62525918 0.29331343 0.76435348 0.68825002 0.53465669 0.3399619], and then sum the results for every row
6.   Do the above task in 5. using matrix multiplication
7.   Set all elements of A that are larger than 0.5 to 0. What is the sum of the resulting matrix?
8.   Subtract 1 from all elements of A that are smaller than 0.5. What is the sum of the resulting matrix?
9.   What is the sum of all elements in A that are smaller than their column's mean?
10.  Create a diagonal matrix B of size 10*10, with B\[i,i] = i


In [11]:
import torch 

A = torch.tensor([[0.69145505, 0.86931882, 0.88576413, 0.82707554, 0.94754421, 0.54767962,  0.51818679, 0.27907936, 0.95212406, 0.22750068],
[0.345638,   0.16172159, 0.87807572, 0.38444467, 0.84255332, 0.69666159,  0.43339905, 0.91927538, 0.58666126, 0.83215206],
[0.91359442, 0.06356911, 0.37205853, 0.18242315, 0.37961342, 0.93335263,  0.34068447, 0.48598708, 0.24260729, 0.70004846],
[0.75245372, 0.64147803, 0.84013461, 0.6152693,  0.02235612, 0.4492574,  0.55206705, 0.69409179, 0.1666939,  0.67387225],
[0.30664677, 0.87559232, 0.07164895, 0.85516997, 0.77945438, 0.51948711,  0.18721151, 0.7690967 , 0.53605078, 0.55431431],
[0.1750064,  0.95009262, 0.57121048, 0.87359026, 0.05715099, 0.43202169,  0.3648696,  0.24367817, 0.06807447, 0.46999578],
[0.41121198, 0.10125657, 0.0869751,  0.91816382, 0.01738795, 0.19420588,  0.00127754, 0.19281699, 0.56083174, 0.55424236],
[0.34467108, 0.18352578, 0.69203741, 0.48087863, 0.39596428, 0.28107969,  0.09727506, 0.11236618, 0.82687268, 0.22700161],
[0.92788092, 0.87184167, 0.72492497, 0.94086364, 0.86998108, 0.35178978,  0.45463869, 0.0242793,  0.75607483, 0.21317889],
[0.15680697, 0.13109825, 0.93463861, 0.78143659, 0.30680001, 0.67935342,  0.3583568,  0.7522564,  0.19810852, 0.22378965]])


w = torch.tensor([0.49039597, 0.73424538, 0.08249155, 0.0488797,  0.62525918, 0.29331343, 0.76435348, 0.68825002, 0.53465669, 0.3399619])

print(A)

print(w)

tensor([[0.6915, 0.8693, 0.8858, 0.8271, 0.9475, 0.5477, 0.5182, 0.2791, 0.9521,
         0.2275],
        [0.3456, 0.1617, 0.8781, 0.3844, 0.8426, 0.6967, 0.4334, 0.9193, 0.5867,
         0.8322],
        [0.9136, 0.0636, 0.3721, 0.1824, 0.3796, 0.9334, 0.3407, 0.4860, 0.2426,
         0.7000],
        [0.7525, 0.6415, 0.8401, 0.6153, 0.0224, 0.4493, 0.5521, 0.6941, 0.1667,
         0.6739],
        [0.3066, 0.8756, 0.0716, 0.8552, 0.7795, 0.5195, 0.1872, 0.7691, 0.5361,
         0.5543],
        [0.1750, 0.9501, 0.5712, 0.8736, 0.0572, 0.4320, 0.3649, 0.2437, 0.0681,
         0.4700],
        [0.4112, 0.1013, 0.0870, 0.9182, 0.0174, 0.1942, 0.0013, 0.1928, 0.5608,
         0.5542],
        [0.3447, 0.1835, 0.6920, 0.4809, 0.3960, 0.2811, 0.0973, 0.1124, 0.8269,
         0.2270],
        [0.9279, 0.8718, 0.7249, 0.9409, 0.8700, 0.3518, 0.4546, 0.0243, 0.7561,
         0.2132],
        [0.1568, 0.1311, 0.9346, 0.7814, 0.3068, 0.6794, 0.3584, 0.7523, 0.1981,
         0.2238]])
tensor([0

1. What is the row index in A that has the largest last element?

In [12]:
row_index_with_largest_last_element = torch.argmax(A[:,-1])
print(row_index_with_largest_last_element)

tensor(1)


2. What is the row index in A that has the second largest last element?

In [13]:
# simple but inefficient: sorts the whole column although only one entry is needed
sorted_indices = torch.argsort(A[:,-1])
print(sorted_indices[-2])

tensor(2)
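
A possible alternative that avoids sorting the full column is `torch.topk`. To stay self-contained, the sketch below hard-codes the last column of A, rounded to four decimals as in the printout of A above.

```python
import torch

# Last column of A, rounded to four decimals
last_column = torch.tensor([0.2275, 0.8322, 0.7000, 0.6739, 0.5543,
                            0.4700, 0.5542, 0.2270, 0.2132, 0.2238])

# topk returns the k largest values and their indices in descending order
values, indices = torch.topk(last_column, k=2)
second_largest_row = indices[1]
print(second_largest_row)   # tensor(2)
```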


3. What is the row index in A that has a row sum greater than 5?

In [14]:
# note: several indices!
row_with_sum_greater_5 = torch.where(torch.sum(A, dim=1) > 5)[0]
print(row_with_sum_greater_5)

tensor([0, 1, 3, 4, 8])


4. What is the sum of all elements of the form A\[i, i+1] in A?

In [15]:
assert torch.isclose(A[0,1], torch.diag(A, diagonal=1)[0])
print(A.diag(diagonal=1).sum())

tensor(4.0694)


5. Multiply elementwise every row in matrix A by the vector w \[0.49039597 0.73424538 0.08249155 0.0488797 0.62525918 0.29331343 0.76435348 0.68825002 0.53465669 0.3399619], and then sum the results for every row

In [16]:
result1 = A * w[None, :]  # broadcast w across all rows of A
print(result1)
result2 = result1.sum(dim=1)  # sum over the columns of each row
print(result2)

tensor([[0.3391, 0.6383, 0.0731, 0.0404, 0.5925, 0.1606, 0.3961, 0.1921, 0.5091,
         0.0773],
        [0.1695, 0.1187, 0.0724, 0.0188, 0.5268, 0.2043, 0.3313, 0.6327, 0.3137,
         0.2829],
        [0.4480, 0.0467, 0.0307, 0.0089, 0.2374, 0.2738, 0.2604, 0.3345, 0.1297,
         0.2380],
        [0.3690, 0.4710, 0.0693, 0.0301, 0.0140, 0.1318, 0.4220, 0.4777, 0.0891,
         0.2291],
        [0.1504, 0.6429, 0.0059, 0.0418, 0.4874, 0.1524, 0.1431, 0.5293, 0.2866,
         0.1884],
        [0.0858, 0.6976, 0.0471, 0.0427, 0.0357, 0.1267, 0.2789, 0.1677, 0.0364,
         0.1598],
        [0.2017, 0.0743, 0.0072, 0.0449, 0.0109, 0.0570, 0.0010, 0.1327, 0.2999,
         0.1884],
        [0.1690, 0.1348, 0.0571, 0.0235, 0.2476, 0.0824, 0.0744, 0.0773, 0.4421,
         0.0772],
        [0.4550, 0.6401, 0.0598, 0.0460, 0.5440, 0.1032, 0.3475, 0.0167, 0.4042,
         0.0725],
        [0.0769, 0.0963, 0.0771, 0.0382, 0.1918, 0.1993, 0.2739, 0.5177, 0.1059,
         0.0761]])
tensor([3

6. Do the above task in 5. using matrix multiplication

In [17]:
print(A @ w)

tensor([3.0185, 2.6711, 2.0080, 2.3030, 2.6282, 1.6785, 1.0178, 1.3853, 2.6890,
        1.6532])
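
That tasks 5 and 6 yield the same result can be checked by hand on a small case. The matrix `M` and vector `v` below are hypothetical values chosen so that the arithmetic is easy to follow.

```python
import torch

M = torch.tensor([[1.0, 2.0],
                  [3.0, 4.0]])      # hypothetical small matrix
v = torch.tensor([10.0, 1.0])       # hypothetical weight vector

by_broadcasting = (M * v).sum(dim=1)   # scale each column by v, then sum per row
by_matmul = M @ v                      # matrix-vector product

print(by_broadcasting)   # tensor([12., 34.])
print(by_matmul)         # tensor([12., 34.])
print(torch.allclose(by_broadcasting, by_matmul))
```

Row 0 gives 1 * 10 + 2 * 1 = 12 and row 1 gives 3 * 10 + 4 * 1 = 34 in both variants, since a matrix-vector product is exactly this weighted row sum.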


7. Set all elements of A that are larger than 0.5 to 0. What is the sum of the resulting matrix?

In [18]:
newA = A.detach().clone()
newA[newA > 0.5] = 0
print(newA.sum())

tensor(12.4601)


8. Subtract 1 from all elements of A that are smaller than 0.5. What is the sum of the resulting matrix?

In [19]:
newA = A.detach().clone()
newA[newA < 0.5] -= 1
print(newA.sum())

tensor(-1.1536)


9. What is the sum of all elements in A, that are smaller than their column's mean?

In [20]:
column_mean = A.mean(dim=0)  # mean of every column
indices_smaller = torch.where(A < column_mean[None, :])
print(A[indices_smaller].sum())

tensor(10.7386)


10. Create a diagonal matrix B of size 10*10, with B\[i,i] = i

In [21]:
B = torch.arange(10).diag()
print(B)

tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 2, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 3, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 4, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 5, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 6, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 7, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 8, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 9]])
