# Introduction to PyTorch

To extend the SGDClassifier from the previous notebook, this notebook makes the jump from the sklearn library to the PyTorch library.

## Instructions for All Labs
* Read each cell and implement the TODOs sequentially. The markdown/text cells also contain instructions which you need to follow to get the whole notebook working.
* Do not change the variable names unless the instructor allows you to.
* Some markdown cells contain questions.
  * For questions <span style="color:red;">colored in red</span>, you must submit your answers in the corresponding Assignment in the course page. Make sure that you enter your responses in the item with the matching question code. Answers that do not follow the prescribed format will automatically be marked wrong by the checker.
  * For questions <span style="color:green;">colored in green</span>, you don't have to submit your answers, but you must think about these questions as they will help enrich your understanding of the concepts covered in the labs.
* You are expected to look up how some functions work on the Internet or in the docs.
* You may add new cells for "scrap work".
* The notebooks will undergo a "Restart and Run All" command, so make sure that your code is working properly.
* You may not reproduce this notebook or share it with anyone.

This notebook provides a short introduction to PyTorch. For a more in-depth introduction, refer to the official PyTorch [website](https://pytorch.org/tutorials/).

PyTorch is an open-source machine learning library developed by Facebook. Its primary use is implementing neural networks, but it is not exclusive to them: it can also be used for general-purpose scientific computing, which is beyond the scope of this notebook.

You have to install PyTorch,

```shell
pip install torch
```

To take advantage of a GPU, it is recommended to upload this notebook to [Google Colab](https://colab.research.google.com/), but please do not use any code-generative features of the platform so you can actually learn the materials and concepts. [Lightning AI](https://lightning.ai/) is a nice alternative, but it usually requires account verification (within 24-48 hours). 

Note that Colab may have a PyTorch installation readily available.

## PyTorch Basics

We start with reviewing the basic concepts of PyTorch. Familiarity with `numpy` is recommended as it shares similar concepts with PyTorch. If you are not familiar with `numpy`, you may refer to their [official guide](https://numpy.org/devdocs/user/quickstart.html). 

Let's check the version of the package we have installed.

In [1]:
import torch


torch.__version__

'2.6.0+cpu'

The suffix after the version number identifies the build. `+cpu`, as shown above, is a CPU-only build, while a suffix such as `+cu124` indicates a build that can use an Nvidia GPU through CUDA 12.4.
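A quick way to confirm at runtime whether a GPU is actually usable is sketched below. The variable names `device` and `x` are only illustrative; the snippet falls back to the CPU when no CUDA device is visible, so it should run on either kind of build.

```python
import torch

# Pick the GPU when PyTorch can see one, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Tensors are created on the CPU by default; .to(device) moves them.
x = torch.ones(2, 2).to(device)

print(device)
print(x.device)
```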

To control the stochasticity of the pseudorandom number generator, we should set a seed for the environment. For completeness' sake, we also set the seed for other libraries, as some functions might be using them under the hood. PyTorch also provides other settings that enforce deterministic behavior.

In [2]:
import random  # Python built-in library
import numpy as np

np.__version__  # for sanity checking

seed = 73
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed(seed)

torch.use_deterministic_algorithms(True)
torch.backends.cudnn.deterministic = True
# benchmark mode lets cuDNN time several algorithms and pick the fastest
# one for your hardware, which can vary between runs, so we turn it off
torch.backends.cudnn.benchmark = False
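As a small sanity check of what seeding buys us, the sketch below resets the PyTorch generator to the same seed twice and compares the random tensors drawn after each reset. The names `first` and `second` are illustrative only.

```python
import torch

torch.manual_seed(73)
first = torch.rand(3)

torch.manual_seed(73)  # reset the generator to the same state
second = torch.rand(3)

# Same seed, same sequence of draws, so the two tensors are identical.
print(torch.equal(first, second))  # True
```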

To initialize a tensor (a generalization of scalars, vectors, and matrices to any number of dimensions, e.g. a scalar is a 0-d tensor, a vector is a 1-d tensor, and a matrix is a 2-d tensor), we can use the following function.

In [3]:
initial_values = torch.empty(5, 3)

print(initial_values)

tensor([[nan, nan, nan],
        [nan, nan, nan],
        [nan, nan, nan],
        [nan, nan, nan],
        [nan, nan, nan]])


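Note that `torch.empty` only allocates memory without initializing it, so the values printed above are arbitrary and should not be relied on. To create a tensor of the same shape filled with random values drawn uniformly from $[0, 1)$, use the following function,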

In [4]:
initial_values = torch.rand(5, 3)

print(initial_values)

tensor([[0.5286, 0.1616, 0.8870],
        [0.6216, 0.0459, 0.3856],
        [0.2258, 0.7837, 0.2052],
        [0.1868, 0.9023, 0.9923],
        [0.4589, 0.7409, 0.4562]])


To check the size or shape of the tensor,

In [5]:
print(initial_values.size())
print(initial_values.shape)

torch.Size([5, 3])
torch.Size([5, 3])


To perform arithmetic operations in PyTorch, we may use the basic Python operations or the PyTorch-specific functions.

In [7]:
print(initial_values + 1)
print()
print(torch.add(initial_values, 1))

# try torch.sub(), torch.mul(), torch.div()
print(torch.sub(initial_values,1))
print(torch.mul(initial_values,1))
print(torch.div(initial_values,1))


tensor([[1.5286, 1.1616, 1.8870],
        [1.6216, 1.0459, 1.3856],
        [1.2258, 1.7837, 1.2052],
        [1.1868, 1.9023, 1.9923],
        [1.4589, 1.7409, 1.4562]])

tensor([[1.5286, 1.1616, 1.8870],
        [1.6216, 1.0459, 1.3856],
        [1.2258, 1.7837, 1.2052],
        [1.1868, 1.9023, 1.9923],
        [1.4589, 1.7409, 1.4562]])
tensor([[-0.4714, -0.8384, -0.1130],
        [-0.3784, -0.9541, -0.6144],
        [-0.7742, -0.2163, -0.7948],
        [-0.8132, -0.0977, -0.0077],
        [-0.5411, -0.2591, -0.5438]])
tensor([[0.5286, 0.1616, 0.8870],
        [0.6216, 0.0459, 0.3856],
        [0.2258, 0.7837, 0.2052],
        [0.1868, 0.9023, 0.9923],
        [0.4589, 0.7409, 0.4562]])
tensor([[0.5286, 0.1616, 0.8870],
        [0.6216, 0.0459, 0.3856],
        [0.2258, 0.7837, 0.2052],
        [0.1868, 0.9023, 0.9923],
        [0.4589, 0.7409, 0.4562]])


There are also in-place operations, which modify the tensor itself instead of returning a new one. They use the same function name with an underscore suffix.

In [8]:
print(initial_values.add(1))
print(initial_values)

print()

print(initial_values.add_(1))
print(initial_values)  # notice anything?

tensor([[1.5286, 1.1616, 1.8870],
        [1.6216, 1.0459, 1.3856],
        [1.2258, 1.7837, 1.2052],
        [1.1868, 1.9023, 1.9923],
        [1.4589, 1.7409, 1.4562]])
tensor([[0.5286, 0.1616, 0.8870],
        [0.6216, 0.0459, 0.3856],
        [0.2258, 0.7837, 0.2052],
        [0.1868, 0.9023, 0.9923],
        [0.4589, 0.7409, 0.4562]])

tensor([[1.5286, 1.1616, 1.8870],
        [1.6216, 1.0459, 1.3856],
        [1.2258, 1.7837, 1.2052],
        [1.1868, 1.9023, 1.9923],
        [1.4589, 1.7409, 1.4562]])
tensor([[1.5286, 1.1616, 1.8870],
        [1.6216, 1.0459, 1.3856],
        [1.2258, 1.7837, 1.2052],
        [1.1868, 1.9023, 1.9923],
        [1.4589, 1.7409, 1.4562]])
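To make the difference concrete, the following sketch compares memory addresses using `data_ptr()`. The tensor names are illustrative; the point is only that `add` allocates a new tensor while `add_` writes into the existing one.

```python
import torch

x = torch.zeros(3)
ptr_before = x.data_ptr()

y = x.add(1)   # out-of-place: returns a new tensor, x is unchanged
z = x.add_(1)  # in-place: x itself is modified and returned

print(y.data_ptr() == ptr_before)  # False, y lives in its own memory
print(z.data_ptr() == ptr_before)  # True, z is the same storage as x
print(x)                           # tensor([1., 1., 1.])
```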


Slicing is the same as numpy,

In [9]:
print(initial_values[:, 0])
print(initial_values[:, 1])
print(initial_values[:, 2])
print(initial_values[0, :])

tensor([1.5286, 1.6216, 1.2258, 1.1868, 1.4589])
tensor([1.1616, 1.0459, 1.7837, 1.9023, 1.7409])
tensor([1.8870, 1.3856, 1.2052, 1.9923, 1.4562])
tensor([1.5286, 1.1616, 1.8870])
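One detail worth keeping in mind, as in numpy, is that basic slicing returns a view rather than a copy. The sketch below (with illustrative names `m`, `first_col`, and `col_copy`) shows that writing through a slice changes the original tensor, while `clone()` gives an independent copy.

```python
import torch

m = torch.zeros(2, 3)

first_col = m[:, 0]  # basic slicing returns a view of m
first_col[0] = 7.0   # writing through the view also changes m
print(m[0, 0])       # tensor(7.)

col_copy = m[:, 0].clone()  # clone() makes an independent copy
col_copy[1] = -1.0          # m is not affected by this write
print(m[1, 0])              # tensor(0.)
```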


Initialize one-tensor or zero-tensor,

In [10]:
ones = torch.ones(5, 3)
zeros = torch.zeros(5, 3)

print(ones)
print(zeros)

tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]])
tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]])


Perform matrix transpose,

In [11]:
# a 5x3 matrix transposed is a 3x5 matrix
# we transpose by giving the indices of the two dimensions to swap
print(initial_values.transpose(1, 0))

tensor([[1.5286, 1.6216, 1.2258, 1.1868, 1.4589],
        [1.1616, 1.0459, 1.7837, 1.9023, 1.7409],
        [1.8870, 1.3856, 1.2052, 1.9923, 1.4562]])
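For 2-d tensors there are a few equivalent spellings of the transpose. The sketch below uses a small illustrative matrix `m` and checks that `transpose(1, 0)`, `transpose(0, 1)`, `.T`, and `.t()` all give the same result.

```python
import torch

m = torch.arange(15.0).reshape(5, 3)

a = m.transpose(1, 0)  # swap dimensions 1 and 0
b = m.transpose(0, 1)  # the order of the two indices does not matter
c = m.T                # attribute shorthand
d = m.t()              # method shorthand, defined for tensors up to 2-d

print(a.shape)  # torch.Size([3, 5])
print(torch.equal(a, b) and torch.equal(a, c) and torch.equal(a, d))  # True
```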


Use some transformation functions like logistic and softmax,

In [12]:
print(torch.nn.functional.sigmoid(initial_values))
# pass dim explicitly: softmax is taken over each row (dim=1)
print(torch.nn.functional.softmax(initial_values, dim=1))

tensor([[0.8218, 0.7616, 0.8684],
        [0.8350, 0.7400, 0.7999],
        [0.7731, 0.8562, 0.7694],
        [0.7662, 0.8701, 0.8800],
        [0.8114, 0.8508, 0.8110]])
tensor([[0.3201, 0.2218, 0.4581],
        [0.4252, 0.2391, 0.3358],
        [0.2684, 0.4688, 0.2629],
        [0.1893, 0.3871, 0.4236],
        [0.3009, 0.3990, 0.3001]])



Recall the formulas being,

$\sigma(z) = \dfrac{1}{1 + \exp(-z)}$

for logistic function (sigmoid). 

And the following for softmax,

$\sigma(z)_i = \dfrac{\exp(z_i)}{\sum_j \exp(z_j)}$

In [13]:
print(
    1 / (1 + torch.exp(-initial_values))
)

print(
    torch.exp(initial_values) / torch.sum(torch.exp(initial_values), axis=1, keepdims=True)
)

tensor([[0.8218, 0.7616, 0.8684],
        [0.8350, 0.7400, 0.7999],
        [0.7731, 0.8562, 0.7694],
        [0.7662, 0.8701, 0.8800],
        [0.8114, 0.8508, 0.8110]])
tensor([[0.3201, 0.2218, 0.4581],
        [0.4252, 0.2391, 0.3358],
        [0.2684, 0.4688, 0.2629],
        [0.1893, 0.3871, 0.4236],
        [0.3009, 0.3990, 0.3001]])


However, performing softmax this way is numerically unstable. When an input value is large, `torch.exp` overflows to `inf`, and the resulting `inf / inf` division produces `nan`.

In [14]:
samples = torch.Tensor([10, 2, 10_000, 4])

# axis=0 since this is only a vector
print(
    torch.exp(samples) / (torch.sum(torch.exp(samples), axis=0, keepdims=True))
)

tensor([0., 0., nan, 0.])


How do we resolve this? Use max-value subtraction.

In [15]:
samples = torch.Tensor([10, 2, 10_000, 4])
# subtract the maximum before exponentiating, so the largest term is exp(0) = 1
samples = torch.exp(samples - torch.max(samples))

# the values are already exponentiated, so we only normalize here
# axis=0 since this is only a vector
print(
    samples / torch.sum(samples, axis=0, keepdims=True)
)

tensor([0., 0., 1., 0.])
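Max-value subtraction works because softmax is unchanged when the same constant is subtracted from every input: the factor $\exp(-c)$ cancels between the numerator and the denominator. A minimal sketch of this idea is given below, wrapped in a hypothetical helper `stable_softmax` (not part of PyTorch) and compared against the built-in `torch.softmax`.

```python
import torch


def stable_softmax(z, dim=-1):
    """Softmax with max-value subtraction (illustrative helper)."""
    # Shift so the largest entry along `dim` becomes 0, hence exp(0) = 1.
    shifted = z - z.max(dim=dim, keepdim=True).values
    exp = torch.exp(shifted)  # exponentiate exactly once
    return exp / exp.sum(dim=dim, keepdim=True)


z = torch.tensor([10.0, 2.0, 10_000.0, 4.0])

ours = stable_softmax(z, dim=0)
builtin = torch.softmax(z, dim=0)

print(ours)
print(builtin)
print(torch.allclose(ours, builtin))
```

The result contains no `nan`, sums to 1, and agrees with the built-in function.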


To perform some linear algebra operations,

In [16]:
torch.matmul(
    initial_values, initial_values.transpose(1, 0)
)

tensor([[7.2467, 6.3083, 6.2200, 7.7833, 7.0001],
        [6.3083, 5.6433, 5.5233, 6.6746, 6.2042],
        [6.2200, 5.5233, 6.1368, 7.2490, 6.6486],
        [7.7833, 6.6746, 7.2490, 8.9964, 7.9442],
        [7.0001, 6.2042, 6.6486, 7.9442, 7.2795]])

In [17]:
## inner product
torch.dot(samples, (samples - 1))

tensor(0.)

In [18]:
## L2 norm
print(initial_values.norm(dim=1, p=2))
print(torch.linalg.norm(initial_values, dim=1, ord=2))
print(initial_values.pow(2).sum(dim=1).sqrt())

tensor([2.6920, 2.3756, 2.4773, 2.9994, 2.6981])
tensor([2.6920, 2.3756, 2.4773, 2.9994, 2.6981])
tensor([2.6920, 2.3756, 2.4773, 2.9994, 2.6981])


Stacking tensors,

In [19]:
a = torch.randn(4)
b = torch.randn(4)
stacked_tensors = torch.stack([a, b])
print(stacked_tensors)
print(stacked_tensors.shape)

tensor([[1.4051, 0.5739, 0.8014, 0.3398],
        [0.3198, 0.3518, 2.7384, 0.0552]])
torch.Size([2, 4])
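`torch.stack` creates a new dimension, which is different from `torch.cat`, which joins tensors along an existing one. The sketch below uses two illustrative vectors to show the resulting shapes, including the effect of the `dim` argument.

```python
import torch

a = torch.zeros(4)
b = torch.ones(4)

rows = torch.stack([a, b])         # new leading dimension
cols = torch.stack([a, b], dim=1)  # new dimension inserted at position 1
joined = torch.cat([a, b])         # no new dimension, just a longer vector

print(rows.shape)    # torch.Size([2, 4])
print(cols.shape)    # torch.Size([4, 2])
print(joined.shape)  # torch.Size([8])
```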


Flatten a tensor, e.g. from a 2x4 tensor to a 1-d tensor with 8 elements,

In [20]:
print(stacked_tensors.view(-1))
print(stacked_tensors.flatten())

tensor([1.4051, 0.5739, 0.8014, 0.3398, 0.3198, 0.3518, 2.7384, 0.0552])
tensor([1.4051, 0.5739, 0.8014, 0.3398, 0.3198, 0.3518, 2.7384, 0.0552])
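The two calls above give the same result here, but they are not always interchangeable: `view` requires a compatible memory layout, while `flatten` copies the data when it has to. The sketch below illustrates this with a transposed (non-contiguous) tensor; the names `t` and `flat` are illustrative.

```python
import torch

t = torch.arange(6).reshape(2, 3).t()  # transposed view, shape 3x2
print(t.is_contiguous())               # False

view_failed = False
try:
    t.view(-1)  # view cannot reinterpret this non-contiguous layout
except RuntimeError:
    view_failed = True
print(view_failed)  # True

flat = t.flatten()  # works, copying the data when needed
print(flat)         # tensor([0, 3, 1, 4, 2, 5])
```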


Convert a numpy array to a tensor,

In [21]:
samples = np.random.randn(2, 10)
print(samples)

samples = torch.from_numpy(samples)
print(samples)

[[ 0.57681305  2.1311088   2.44021967  0.26332687 -1.49612065 -0.03673531
   0.43069579 -1.52947433 -0.73025968  1.05131524]
 [ 1.61979267 -1.60501337  0.33100953 -0.21095236  0.2981767  -1.14607352
   0.57536202 -0.36390663  0.03639919 -0.52056399]]
tensor([[ 0.5768,  2.1311,  2.4402,  0.2633, -1.4961, -0.0367,  0.4307, -1.5295,
         -0.7303,  1.0513],
        [ 1.6198, -1.6050,  0.3310, -0.2110,  0.2982, -1.1461,  0.5754, -0.3639,
          0.0364, -0.5206]], dtype=torch.float64)


`torch.from_numpy()` vs `torch.Tensor()`

* `torch.Tensor()` creates a new copy of the array, and automatically converts the array to float32
* `torch.from_numpy()` does not create a new copy, and preserves the dtype of the array
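
The difference can be observed directly. In the sketch below, assuming a default float64 numpy array, the tensor from `torch.from_numpy()` follows later changes to the array, while the one from `torch.Tensor()` does not. The names `arr`, `shared`, and `copied` are illustrative.

```python
import numpy as np
import torch

arr = np.zeros(3)               # float64 by default
shared = torch.from_numpy(arr)  # shares memory with arr, keeps float64
copied = torch.Tensor(arr)      # converted to float32, so a separate copy

arr[0] = 5.0                    # modify the numpy array in place

print(shared)  # first element is now 5, the change is visible
print(copied)  # still all zeros
print(shared.dtype, copied.dtype)  # torch.float64 torch.float32
```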

## Exercise

Given a list of three tensors, with each tensor representing a set of hypothetical model outputs, put them together so that each row would have all outputs for a given sample.

In [22]:
out_1 = torch.Tensor(
    [
        [0.2562, 0.1650, 0.0918, 0.0045, 0.0175, 0.1096, 0.0831, 0.2002, 0.0532, 0.0188],
        [0.0553, 0.1154, 0.0719, 0.0945, 0.0705, 0.2141, 0.0665, 0.0610, 0.2023, 0.0487]
    ]
)

out_2 = torch.Tensor(
    [
        [0.0942, 0.0448, 0.0929, 0.0316, 0.0296, 0.0272, 0.4189, 0.0804, 0.0988, 0.0816],
        [0.2645, 0.0424, 0.0199, 0.1344, 0.0226, 0.1131, 0.2144, 0.1160, 0.0421, 0.0306]
    ]
)

out_3 = torch.Tensor(
    [
        [0.1159, 0.1263, 0.1011, 0.0870, 0.1503, 0.0259, 0.0609, 0.1611, 0.0082, 0.1633],
        [0.1230, 0.0577, 0.3204, 0.0145, 0.0699, 0.0975, 0.0466, 0.1191, 0.0841, 0.0671]
    ]
)

In [None]:
# TO DO:
#  1. Simulate predicted probability distributions by passing them through softmax
#  2. Stack the tensors to get a tensor of shape 2x3x10 (`torch.Size([2, 3, 10])`)

import torch.nn.functional as F

#applying softmax to each tensor
out_1_softmax = F.softmax(out_1, dim=-1)
out_2_softmax = F.softmax(out_2, dim=-1)
out_3_softmax = F.softmax(out_3, dim=-1)

#stacking 
stacked_tensor = torch.stack([out_1_softmax, out_2_softmax, out_3_softmax], dim=1)

print(stacked_tensor.shape)
print(stacked_tensor)


torch.Size([2, 3, 10])
tensor([[[0.1165, 0.1064, 0.0989, 0.0906, 0.0918, 0.1006, 0.0980, 0.1102,
          0.0951, 0.0919],
         [0.0988, 0.0940, 0.0986, 0.0928, 0.0926, 0.0924, 0.1367, 0.0974,
          0.0992, 0.0975],
         [0.1015, 0.1025, 0.1000, 0.0986, 0.1050, 0.0927, 0.0960, 0.1062,
          0.0911, 0.1064]],

        [[0.0955, 0.1014, 0.0971, 0.0993, 0.0969, 0.1119, 0.0965, 0.0960,
          0.1106, 0.0948],
         [0.1175, 0.0941, 0.0920, 0.1032, 0.0922, 0.1010, 0.1117, 0.1013,
          0.0941, 0.0930],
         [0.1020, 0.0955, 0.1242, 0.0915, 0.0967, 0.0994, 0.0945, 0.1016,
          0.0981, 0.0964]]])


<span style="color:red;">**Question 6-1:** What are the probability values for the first instance (first row) by the third hypothetical model?</span>

In [27]:
probability_value = out_3_softmax[0]
print(probability_value)

tensor([0.1015, 0.1025, 0.1000, 0.0986, 0.1050, 0.0927, 0.0960, 0.1062, 0.0911,
        0.1064])


**Answer**:  [0.1015, 0.1025, 0.1000, 0.0986, 0.1050, 0.0927, 0.0960, 0.1062, 0.0911,
        0.1064]

<span style="color:red;">**Question 6-1:** What are the probability values for the second instance (second row) by the second hypothetical model?</span>

In [25]:
probability_value = out_2_softmax[1]
print(probability_value)

tensor([0.1175, 0.0941, 0.0920, 0.1032, 0.0922, 0.1010, 0.1117, 0.1013, 0.0941,
        0.0930])


**Answer**: [0.1175, 0.0941, 0.0920, 0.1032, 0.0922, 0.1010, 0.1117, 0.1013, 0.0941,
        0.0930]

<span style="color:red;">**Question 6-1:** What are the probability values for the first instance (first row) by the first hypothetical model?</span>

In [26]:
probability_value = out_1_softmax[0]
print(probability_value)

tensor([0.1165, 0.1064, 0.0989, 0.0906, 0.0918, 0.1006, 0.0980, 0.1102, 0.0951,
        0.0919])


**Answer**: [0.1165, 0.1064, 0.0989, 0.0906, 0.0918, 0.1006, 0.0980, 0.1102, 0.0951,
        0.0919]

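Other useful functions that you may explore,
* `torch.nn.BCELoss` is the binary cross entropy; it expects probabilities, i.e. outputs that have already passed through the logistic function
* `torch.nn.BCEWithLogitsLoss` is the binary cross entropy that takes raw logits and applies the logistic function internally
* `torch.nn.CrossEntropyLoss` is the softmax cross entropy loss; it takes raw logits and applies log-softmax internally
* `torch.nn.MSELoss` is the Mean Squared Error
* `torch.nn.NLLLoss` is the negative log likelihood; it expects log-probabilities as input and does not compute the softmax itself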
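
The relationships between these losses can be checked numerically. The sketch below uses made-up logits and targets (purely illustrative data) to show that `CrossEntropyLoss` on raw logits matches `NLLLoss` on log-softmax outputs, and that `BCEWithLogitsLoss` on raw logits matches `BCELoss` on sigmoid outputs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Multi-class case: 4 samples, 3 classes, integer class labels.
logits = torch.randn(4, 3)
targets = torch.tensor([0, 2, 1, 2])

ce = nn.CrossEntropyLoss()(logits, targets)
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), targets)
print(ce, nll)

# Binary case: 4 samples, float labels in {0, 1}.
binary_logits = torch.randn(4)
binary_targets = torch.tensor([0.0, 1.0, 1.0, 0.0])

bce_with_logits = nn.BCEWithLogitsLoss()(binary_logits, binary_targets)
bce = nn.BCELoss()(torch.sigmoid(binary_logits), binary_targets)
print(bce_with_logits, bce)
```

In practice the logits-based versions are usually preferred, since they fold the logistic or softmax step into the loss in a numerically stable way.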