
# **SAD2 lab 1**

*Today's lab is based on the lab scenario by Kazik Oksza-Orzechowski.*


# Introduction to PyTorch

In today's lab, we will familiarize ourselves with [PyTorch](https://pytorch.org/docs/stable/index.html). At the beginning of the semester we will mainly use PyTorch [Distributions](https://pytorch.org/docs/stable/distributions.html); however, familiarity with PyTorch itself will be important later in the semester for the classes on Variational Autoencoders.

First, let's follow a quick tutorial based on [this source](https://uvadlc-notebooks.readthedocs.io/en/latest/tutorial_notebooks/tutorial2/Introduction_to_PyTorch.html).

## Basics

We will start by reviewing the basic concepts of PyTorch. As a prerequisite, we recommend being familiar with the `numpy` package, as most machine learning frameworks are based on very similar concepts. If you are not familiar with `numpy` yet, don't worry: [here](https://numpy.org/doc/stable/user/quickstart.html) is a tutorial to go through.

Let's start with importing PyTorch. The package is called `torch`, based on its original framework [Torch](http://torch.ch/). As a first step, we can check its version:

In [None]:
import torch

print("Using torch", torch.__version__)

Using torch 2.8.0+cu126


As in every machine learning framework, PyTorch provides stochastic functions, such as random number generation. However, it is good practice to set up your code so that it is reproducible with the exact same random numbers. This is why we set a seed below. As everyone knows, 42 is the [best seed](https://medium.com/geekculture/the-story-behind-random-seed-42-in-machine-learning-b838c4ac290a).

In [None]:
torch.manual_seed(42) # Setting the seed

<torch._C.Generator at 0x7a471c834d70>

For custom operators, you might need to set the Python seed as well:


In [None]:
import random
random.seed(42)

If you or any of the libraries you are using rely on NumPy, you can seed the global NumPy RNG with:

In [None]:
import numpy as np
np.random.seed(42)
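
To see what seeding buys us, here is a minimal sketch (the variable names `first`, `second`, and `third` are only illustrative): re-seeding the global PyTorch generator replays the same random stream, while continuing without re-seeding moves on to new values.

```python
import torch

# Re-seeding the global generator replays the same random stream.
torch.manual_seed(42)
first = torch.rand(3)

torch.manual_seed(42)
second = torch.rand(3)

print(torch.equal(first, second))  # True: both draws are identical

# Without re-seeding, the generator has advanced, so this draw is a new one.
third = torch.rand(3)
print(third)
```

Note that `torch.manual_seed` only controls PyTorch's own generator; `random` and NumPy keep separate state, which is why they are seeded separately above.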

## Tensors



Let's start by looking at different ways of creating a tensor. There are many options; the simplest is to call `torch.Tensor`, passing the desired shape as the input argument:

In [None]:
x = torch.Tensor(2, 3, 4)
print(x)

tensor([[[1.6272e+01, 4.3865e-41, 1.6272e+01, 4.3865e-41],
         [2.0930e-07, 0.0000e+00, 2.2230e-20, 4.3865e-41],
         [2.0930e-07, 0.0000e+00, 9.1477e-41, 0.0000e+00]],

        [[1.3452e-43, 0.0000e+00, 3.5873e-43, 0.0000e+00],
         [2.3526e-20, 4.3865e-41, 0.0000e+00, 0.0000e+00],
         [0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00]]])


The function `torch.Tensor` allocates memory for the desired tensor but does not initialize it, so the tensor contains whatever values were already in that memory. To directly assign values to the tensor during initialization, there are many alternatives, including:

* `torch.zeros`: Creates a tensor filled with zeros
* `torch.ones`: Creates a tensor filled with ones
* `torch.rand`: Creates a tensor with random values uniformly sampled between 0 and 1
* `torch.randn`: Creates a tensor with random values sampled from a normal distribution with mean 0 and variance 1
* `torch.arange`: Creates a tensor containing the values $N,N+1,N+2,...,M-1$ when called as `torch.arange(N, M)` (the end value $M$ is excluded)
* `torch.Tensor` (input list): Creates a tensor from the list elements you provide

In [None]:
# Create a tensor from a (nested) list
x = torch.Tensor([[1, 2], [3, 4]])
print(x)

tensor([[1., 2.],
        [3., 4.]])
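
A related point that is easy to miss: `torch.Tensor(...)` always produces the default floating-point dtype, whereas the lowercase factory function `torch.tensor(...)` infers the dtype from the data. The sketch below assumes PyTorch's default settings (default dtype `float32`); the names `a`, `b`, and `c` are only illustrative.

```python
import torch

a = torch.Tensor([[1, 2], [3, 4]])   # constructor: always the default float dtype
b = torch.tensor([[1, 2], [3, 4]])   # factory function: dtype inferred from the data
c = torch.tensor([[1, 2], [3, 4]], dtype=torch.float64)  # explicit dtype

print(a.dtype)  # torch.float32 (under the default settings)
print(b.dtype)  # torch.int64
print(c.dtype)  # torch.float64
```

When creating a tensor from existing data, `torch.tensor` with an explicit `dtype` is usually the less surprising choice.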


In [None]:
x = torch.zeros(1, 2, 13)
print(x)

tensor([[[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]]])


In [None]:
x = torch.ones(3, 1, 1)
print(x)

tensor([[[1.]],

        [[1.]],

        [[1.]]])


In [None]:
x = torch.rand(2, 3, 4)
print(x)

tensor([[[0.1053, 0.2695, 0.3588, 0.1994],
         [0.5472, 0.0062, 0.9516, 0.0753],
         [0.8860, 0.5832, 0.3376, 0.8090]],

        [[0.5779, 0.9040, 0.5547, 0.3423],
         [0.6343, 0.3644, 0.7104, 0.9464],
         [0.7890, 0.2814, 0.7886, 0.5895]]])


In [None]:
x = torch.randn(3, 3, 3)
print(x)

tensor([[[-0.9138, -0.6581,  0.0780],
         [ 0.5258, -0.4880,  1.1914],
         [-0.8140, -0.7360, -1.4032]],

        [[ 0.0360, -0.0635, -0.1423],
         [ 0.1971, -1.1441,  0.3383],
         [ 1.6992,  0.0109, -0.3387]],

        [[-1.3407,  0.4584, -0.5644],
         [ 1.0563, -1.4692,  1.4332],
         [ 0.7440, -0.4816, -1.0495]]])


In [None]:
x = torch.arange(1, 4)
print(x)

tensor([1, 2, 3])
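
As the output above shows, the end value is excluded, just like in Python's `range`. A short sketch of the common call patterns follows (variable names are illustrative; `torch.linspace` is included only for contrast, since it does include the end point):

```python
import torch

r1 = torch.arange(1, 4)        # start=1, end=4 (excluded)
r2 = torch.arange(0, 10, 2)    # with a step of 2
r3 = torch.arange(5)           # start defaults to 0
r4 = torch.linspace(0, 1, 5)   # 5 evenly spaced points, both ends included

print(r1)  # tensor([1, 2, 3])
print(r2)  # tensor([0, 2, 4, 6, 8])
print(r3)  # tensor([0, 1, 2, 3, 4])
print(r4)  # tensor([0.0000, 0.2500, 0.5000, 0.7500, 1.0000])
```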


In [None]:
# Create a tensor with random values between 0 and 1 with the shape [2, 3, 4]
x = torch.rand(2, 3, 4)
print(x)

tensor([[[0.8823, 0.9150, 0.3829, 0.9593],
         [0.3904, 0.6009, 0.2566, 0.7936],
         [0.9408, 0.1332, 0.9346, 0.5936]],

        [[0.8694, 0.5677, 0.7411, 0.4294],
         [0.8854, 0.5739, 0.2666, 0.6274],
         [0.2696, 0.4414, 0.2969, 0.8317]]])


In [None]:
x = torch.Tensor([[[1, 2], [3, 4]],[[1, 2], [3, 4]]])
print(x)

tensor([[[1., 2.],
         [3., 4.]],

        [[1., 2.],
         [3., 4.]]])


You can obtain the shape of a tensor in the same way as in `numpy` (`x.shape`), or using the `.size` method:

In [None]:
shape = x.shape
print("Shape:", shape)

size = x.size()
print("Size:", size)

dim1, dim2, dim3 = x.size()
print("Size:", dim1, dim2, dim3)

Shape: torch.Size([2, 2, 2])
Size: torch.Size([2, 2, 2])
Size: 2 2 2


## Tensors and NumPy

Tensors can be converted to numpy arrays, and numpy arrays back to tensors. To transform a numpy array into a tensor, we can use the function `torch.from_numpy`:

In [None]:
import numpy as np

np_arr = np.array([[1, 2], [3, 4]])
tensor = torch.from_numpy(np_arr)

print("Numpy array:", np_arr)
print("PyTorch tensor:", tensor)

Numpy array: [[1 2]
 [3 4]]
PyTorch tensor: tensor([[1, 2],
        [3, 4]])
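
One detail worth knowing: `torch.from_numpy` does not copy the data, so the tensor and the array share the same memory. The sketch below contrasts it with `torch.tensor`, which makes an independent copy (the names `shared` and `copied` are only illustrative).

```python
import numpy as np
import torch

np_arr = np.array([[1, 2], [3, 4]])
shared = torch.from_numpy(np_arr)   # shares memory with np_arr
copied = torch.tensor(np_arr)       # independent copy of the data

np_arr[0, 0] = 100                  # modify the NumPy array in place

print(shared[0, 0].item())  # 100: the change is visible through the tensor
print(copied[0, 0].item())  # 1: the copy is unaffected
```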


To transform a PyTorch tensor back to a numpy array, we can call the `.numpy()` method on the tensor:

In [None]:
tensor = torch.arange(4)
np_arr = tensor.numpy()

print("PyTorch tensor:", tensor)
print("Numpy array:", np_arr)

PyTorch tensor: tensor([0, 1, 2, 3])
Numpy array: [0 1 2 3]


Converting a tensor to numpy requires the tensor to be on the CPU, not the GPU (more on GPU support in a later section). If you have a tensor on the GPU, you need to call `.cpu()` on it beforehand. Hence, you get a line like `np_arr = tensor.cpu().numpy()`.
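
A minimal device-agnostic sketch of this pattern is given below. It assumes nothing about your hardware: it uses the GPU only if `torch.cuda.is_available()` reports one. The second half shows a related requirement: a tensor that tracks gradients must be detached before conversion. The names `w` and `w_np` are only illustrative.

```python
import torch

# Use the GPU if one is available, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tensor = torch.arange(4, device=device)

# .cpu() returns the tensor itself if it is already on the CPU.
np_arr = tensor.cpu().numpy()
print(type(np_arr), np_arr)

# Tensors that require gradients must be detached before calling .numpy().
w = torch.ones(2, requires_grad=True)
w_np = w.detach().cpu().numpy()
print(w_np)
```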

## Operations on tensors

Most operations that exist in numpy also exist in PyTorch. A full list of operations can be found in the [PyTorch documentation](https://pytorch.org/docs/stable/tensors.html#), but we will review the most important ones here.

The simplest operation is to add two tensors:

In [None]:
x1 = torch.rand(2, 3)
x2 = torch.rand(2, 3)
y = x1 + x2

print("X1", x1)
print("X2", x2)
print("Y", y)

X1 tensor([[0.1371, 0.5117, 0.1585],
        [0.0758, 0.2247, 0.0624]])
X2 tensor([[0.1816, 0.9998, 0.5944],
        [0.6541, 0.0337, 0.1716]])
Y tensor([[0.3187, 1.5115, 0.7529],
        [0.7299, 0.2583, 0.2340]])


Calling `x1 + x2` creates a new tensor containing the sum of the two inputs. However, we can also use in-place operations, which are applied directly to the memory of a tensor. Such an operation changes the values of `x2` and leaves no way to access the values `x2` held before the operation. An example is shown below:

In [None]:
x1 = torch.rand(2, 3)
x2 = torch.rand(2, 3)
print("X1 (before)", x1)
print("X2 (before)", x2)

x2.add_(x1)
print("X1 (after)", x1)
print("X2 (after)", x2)

X1 (before) tensor([[0.3336, 0.5782, 0.0600],
        [0.2846, 0.2007, 0.5014]])
X2 (before) tensor([[0.3139, 0.4654, 0.1612],
        [0.1568, 0.2083, 0.3289]])
X1 (after) tensor([[0.3336, 0.5782, 0.0600],
        [0.2846, 0.2007, 0.5014]])
X2 (after) tensor([[0.6475, 1.0435, 0.2212],
        [0.4414, 0.4090, 0.8302]])


In-place operations are usually marked with an underscore suffix (e.g. `add_` instead of `add`).
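
To make the difference concrete, here is a small sketch comparing `add` and `add_` on the same tensor. It uses `data_ptr()` (the address of the tensor's underlying storage) to confirm that the in-place version does not allocate new memory; the variable names are illustrative.

```python
import torch

x = torch.ones(3)

y = x.add(1)          # out-of-place: returns a new tensor, x is unchanged
print(x)              # tensor([1., 1., 1.])
print(y)              # tensor([2., 2., 2.])

ptr_before = x.data_ptr()
x.add_(1)             # in-place: overwrites the values stored in x
print(x)              # tensor([2., 2., 2.])
print(x.data_ptr() == ptr_before)  # True: same memory, new values
```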

Another common operation aims at changing the shape of a tensor. A tensor of size $(2,3)$ can be re-organized to any other shape with the same number of elements (e.g. a tensor of size $(6)$, or $(3,2)$, etc.). In PyTorch, this operation is called `view`:

In [None]:
x = x1.view(2,3)
print("X", x)

X tensor([[0.3336, 0.5782, 0.0600],
        [0.2846, 0.2007, 0.5014]])
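
The call above keeps the shape $(2,3)$ unchanged, so here is a short sketch that actually reshapes a tensor. It uses `torch.arange(6)` as illustrative data, so that the element order is easy to follow, and also shows that a view shares memory with the original tensor.

```python
import torch

x = torch.arange(6).view(2, 3)   # tensor([[0, 1, 2], [3, 4, 5]])

flat = x.view(6)                 # shape (6,)
tall = x.view(3, 2)              # shape (3, 2)
auto = x.view(-1, 2)             # -1 lets PyTorch infer that dimension: (3, 2)

print(flat.shape, tall.shape, auto.shape)
print(tall)                      # rows are filled in the original element order

# A view does not copy data: writing through the view changes x as well.
flat[0] = 100
print(x[0, 0].item())            # 100
```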


Other commonly used operations include matrix multiplications, which are essential for neural networks. Quite often, we have an input vector $\mathbf{x}$, which is transformed using a learned weight matrix $\mathbf{W}$. There are multiple ways and functions to perform matrix multiplication, some of which we list below:

* `torch.matmul`: Performs the matrix product over two tensors, where the specific behavior depends on the dimensions. If both inputs are matrices (2-dimensional tensors), it performs the standard matrix product. For higher dimensional inputs, the function supports broadcasting (for details see the [documentation](https://pytorch.org/docs/stable/generated/torch.matmul.html?highlight=matmul#torch.matmul)). Can also be written as `a @ b`, similar to numpy.
* `torch.mm`: Performs the matrix product over two matrices, but doesn't support broadcasting (see [documentation](https://pytorch.org/docs/stable/generated/torch.mm.html?highlight=torch%20mm#torch.mm))
* `torch.bmm`: Performs the matrix product with support for a batch dimension. If the first tensor $T$ is of shape ($b\times n\times m$), and the second tensor $R$ is of shape ($b\times m\times p$), the output $O$ is of shape ($b\times n\times p$), and is calculated by performing $b$ matrix multiplications of the submatrices of $T$ and $R$: $O_i = T_i @ R_i$
* `torch.einsum`: Performs matrix multiplications and more (i.e. sums of products) using the Einstein summation convention. An explanation of the Einstein sum can be found in assignment 1.
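
As a rough sketch of how these functions relate, the snippet below computes the same batched product in four ways and checks that the results agree. The shapes follow the notation of the `torch.bmm` item above ($b=5$, $n=2$, $m=3$, $p=4$); the variable names and the einsum subscript string are illustrative choices.

```python
import torch

torch.manual_seed(0)
T = torch.rand(5, 2, 3)   # b x n x m
R = torch.rand(5, 3, 4)   # b x m x p

out_bmm = torch.bmm(T, R)                         # batched matrix product
out_matmul = T @ R                                # same as torch.matmul(T, R)
out_einsum = torch.einsum("bnm,bmp->bnp", T, R)   # sum over the shared index m
out_loop = torch.stack([torch.mm(T[i], R[i]) for i in range(T.shape[0])])

print(out_bmm.shape)                              # torch.Size([5, 2, 4])
print(torch.allclose(out_bmm, out_matmul))
print(torch.allclose(out_bmm, out_einsum))
print(torch.allclose(out_bmm, out_loop))

# torch.matmul also broadcasts: a single (3, 4) matrix is applied to every batch entry.
W = torch.rand(3, 4)
out_broadcast = torch.matmul(T, W)
print(out_broadcast.shape)                        # torch.Size([5, 2, 4])
```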

Usually, we use `torch.matmul` or `torch.bmm`. Below, we try batched matrix multiplication with `torch.bmm` and then a broadcast product with `torch.matmul`.

In [None]:
x1 = torch.rand(5, 2, 3)
x2 = torch.rand(5, 3, 2)
print("X1", x1)
print("X2", x2)

y = torch.bmm(x1,x2)

print("Y", y)

X1 tensor([[[0.1054, 0.9192, 0.4008],
         [0.9302, 0.6558, 0.0766]],

        [[0.8460, 0.3624, 0.3083],
         [0.0850, 0.0029, 0.6431]],

        [[0.3908, 0.6947, 0.0897],
         [0.8712, 0.1330, 0.4137]],

        [[0.6044, 0.7581, 0.9037],
         [0.9555, 0.1035, 0.6258]],

        [[0.2849, 0.4452, 0.1258],
         [0.9554, 0.1330, 0.7672]]])
X2 tensor([[[0.6757, 0.6625],
         [0.2297, 0.9545],
         [0.6099, 0.5643]],

        [[0.0594, 0.7099],
         [0.4250, 0.2709],
         [0.9295, 0.6115]],

        [[0.2234, 0.2469],
         [0.4761, 0.7792],
         [0.3722, 0.2147]],

        [[0.3288, 0.1265],
         [0.6783, 0.8870],
         [0.0293, 0.6161]],

        [[0.7583, 0.5907],
         [0.3219, 0.7610],
         [0.7628, 0.6870]]])
Y tensor([[[0.5267, 1.1733],
         [0.8259, 1.2854]],

        [[0.4908, 0.8873],
         [0.6040, 0.4543]],

        [[0.4514, 0.6570],
         [0.4119, 0.4076]],

        [[0.7394, 1.3057],
         [0.4027, 0.59

In [None]:
x1 = torch.rand(5, 1, 3)
x2 = torch.rand(5, 3, 10)
print("X1", x1)
print("X2", x2)

y = torch.bmm(x1,x2)

print("Y", y)

X1 tensor([[[0.7401, 0.9208, 0.7619]],

        [[0.6265, 0.4951, 0.1197]],

        [[0.0716, 0.0323, 0.7047]],

        [[0.2545, 0.3994, 0.2122]],

        [[0.4089, 0.1481, 0.1733]]])
X2 tensor([[[0.6659, 0.3514, 0.8087, 0.3396, 0.1332, 0.4118, 0.2576, 0.3470,
          0.0240, 0.7797],
         [0.1519, 0.7513, 0.7269, 0.8572, 0.1165, 0.8596, 0.2636, 0.6855,
          0.9696, 0.4295],
         [0.4961, 0.3849, 0.0825, 0.7400, 0.0036, 0.8104, 0.8741, 0.9729,
          0.3821, 0.0892]],

        [[0.6124, 0.7762, 0.0023, 0.3865, 0.2003, 0.4563, 0.2539, 0.2956,
          0.3413, 0.0248],
         [0.9103, 0.9192, 0.4216, 0.4431, 0.2959, 0.0485, 0.0134, 0.6858,
          0.2255, 0.1786],
         [0.4610, 0.3335, 0.3382, 0.5161, 0.3939, 0.3278, 0.2606, 0.0931,
          0.9193, 0.2999]],

        [[0.6325, 0.3265, 0.5406, 0.9662, 0.7304, 0.0667, 0.6985, 0.9746,
          0.6315, 0.8352],
         [0.9929, 0.4234, 0.6038, 0.1525, 0.3970, 0.8703, 0.7563, 0.1836,
          0.0991, 0.1583

In [None]:
tensor1 = torch.randn(10, 3, 4)
tensor2 = torch.randn(4, 5)
print('X1', tensor1)
print('X2', tensor2)

y = torch.matmul(tensor1, tensor2)
print('Y', y)

X1 tensor([[[ 0.9770, -0.3444, -0.1889,  0.0417],
         [-0.3357, -1.2594, -0.2131,  0.3444],
         [-0.0357, -0.8881, -0.5891,  0.1307]],

        [[ 1.7127,  0.6464,  0.1379,  0.5234],
         [ 1.2479,  0.0929, -0.7844,  0.0350],
         [ 0.8422, -0.2108,  0.8012,  0.0169]],

        [[ 0.0330, -1.2598, -0.7298,  1.2975],
         [-0.0965,  1.3945, -1.3005, -0.7347],
         [-0.1303,  1.7551,  0.0675, -0.3978]],

        [[ 0.7583, -0.5347, -0.1458,  0.9213],
         [-0.3893,  0.6138, -0.2786,  0.5885],
         [ 0.7091, -0.2645, -2.9836, -0.4146]],

        [[ 0.4463, -0.5218,  0.8302, -0.0510],
         [ 1.4310,  0.3673, -0.0192, -1.0667],
         [-0.4834,  0.5600, -1.0602, -1.4201]],

        [[-0.5559,  1.6862,  0.9885,  1.3676],
         [ 0.1919,  1.1600, -0.0088,  0.8505],
         [-0.8496, -1.4020,  0.1723, -0.2206]],

        [[ 1.2056,  1.3690, -0.6950,  1.4324],
         [ 0.2276, -1.1287,  1.1242, -0.2815],
         [ 0.1488, -0.2378, -0.1014,  0.9077]

## Indexing

We often need to select a part of a tensor. Indexing works just like in numpy, so let's try it:

In [None]:
x = torch.arange(12).view(3, 4)
print("X: ", x)

print("Second column")
print(x[:, 1])

print("First row")
print(x[0])

print("First two rows, last column")
print(x[:2, -1])

print("Last two rows")
print(x[1:3, :])

X:  tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])
Second column
tensor([1, 5, 9])
First row
tensor([0, 1, 2, 3])
First two rows, last column
tensor([3, 7])
Last two rows
tensor([[ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])


# PyTorch Distributions

Source: https://pytorch.org/docs/stable/distributions.html

The `distributions` package contains parameterizable probability distributions and sampling functions. This allows the construction of stochastic computation graphs and stochastic gradient estimators for optimization. This package generally follows the design of the [TensorFlow Distributions](https://arxiv.org/abs/1711.10604) package.

During this course, we focus on probabilistic modelling and will be working with various probability distributions. Thus we need to be able to:

- efficiently sample from the distribution,

- efficiently compute pointwise probability density,

- differentiate the probability density function with respect to the parameters.

Those conditions are generally met by the distributions available in PyTorch Distributions.
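
The three requirements above can be sketched with a univariate `Normal`. This is only an illustration under assumed values ($\mu = 0$, $\sigma = 2$, evaluation point $x = 1$); the variable names are not part of the lab. For the normal density, $\partial \log p / \partial \mu = (x-\mu)/\sigma^2$ and $\partial \log p / \partial \sigma = -1/\sigma + (x-\mu)^2/\sigma^3$, which is what autograd should reproduce.

```python
import torch
from torch.distributions import Normal

torch.manual_seed(42)

# Parameters that track gradients (illustrative values).
mu = torch.tensor(0.0, requires_grad=True)
sigma = torch.tensor(2.0, requires_grad=True)
dist = Normal(loc=mu, scale=sigma)

# 1. Sampling: five i.i.d. draws, shape (5,)
samples = dist.sample(torch.Size([5]))
print(samples.shape)

# 2. Pointwise log-density at x = 1
x = torch.tensor(1.0)
log_p = dist.log_prob(x)
print(log_p)

# 3. Gradient of the log-density with respect to the parameters
log_p.backward()
print(mu.grad)     # (x - mu) / sigma^2 = 0.25
print(sigma.grad)  # -1/sigma + (x - mu)^2 / sigma^3 = -0.375
```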


## Exercise 1: Sampling from Multivariate Normal distribution

1. Import `torch`, print the version you are using.

2. Set a seed; you can use the `set_seed` function below.

3. Define a multivariate normal distribution with $\mu = [1, 2]$ and $\Sigma = \begin{bmatrix} 1 & 0 \\ 0 & 2\end{bmatrix}$. Use `torch.distributions.MultivariateNormal`.

4. Sample $n = 100$ i.i.d. observations from the defined distribution using the `sample` method (Hint: `sample_shape = torch.Size([n])`).

5. Compute the probability and log-probability of the sampled batch obtained in 4 under the distribution from 3 (see: `log_prob` method).

6. Explain **why we often work with log-probs rather than probs**.

7. How is the log-probability (the log of the density function) connected to the likelihood? What is the difference?

In [None]:
import random

import numpy as np
import torch


def set_seed(seed):
    """Seed the PyTorch, NumPy, and Python RNGs for reproducibility."""
    torch.manual_seed(seed)
    np.random.seed(seed)
    random.seed(seed)
    print(f"Seed set to {seed}")

In [None]:
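### PUT YOUR CODE HERE
import torch
print(torch.__version__)

set_seed(42)
n = 100
# Float literals give the floating-point dtype the distribution expects
distribution = torch.distributions.MultivariateNormal(
    loc=torch.tensor([1.0, 2.0]),
    covariance_matrix=torch.tensor([[1.0, 0.0], [0.0, 2.0]]),
)
sample = distribution.sample(sample_shape=torch.Size([n]))  # shape (n, 2)

# Per-observation log-density and density
log_prob = distribution.log_prob(sample)  # shape (n,)
print('log_probability', log_prob)
prob = log_prob.exp()  # stay in torch; np.exp on a tensor emits a warning
print('probability', prob)

# Joint log-probability of the i.i.d. sample: sum of the log-densities
joint_log_prob = 0
for i in range(n):
    joint_log_prob += log_prob[i]
print(joint_log_prob)

# Joint probability: product of the densities (underflows to 0 in float32)
joint_prob = 1
for i in range(n):
    joint_prob = joint_prob * prob[i]
print(joint_prob)

# Nicer: the same quantities without explicit loops
print(log_prob.sum())
print(log_prob.sum().exp())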



2.8.0+cu126
Seed set to 42
log_probability tensor([-5.1470, -4.8067, -3.1766, -3.4729, -3.8264, -3.2465, -2.6058, -2.7707,
        -3.5458, -2.4048, -3.0532, -3.9173, -3.8429, -3.2615, -2.2121, -2.5858,
        -3.5226, -3.6841, -2.3254, -2.5312, -3.8931, -2.7521, -5.2493, -3.0657,
        -2.8186, -2.3257, -3.0132, -2.7866, -3.1697, -2.4147, -3.8905, -3.8431,
        -3.5954, -4.7812, -2.2640, -3.5572, -2.3365, -2.2769, -2.5557, -4.9004,
        -2.2419, -3.2545, -2.4658, -2.8370, -2.5771, -2.9175, -4.0102, -3.1921,
        -5.4524, -2.4926, -2.5598, -2.8549, -2.4849, -2.2823, -3.0300, -2.2305,
        -3.2512, -2.4777, -2.4523, -2.3995, -2.2135, -2.4853, -2.8417, -2.5283,
        -4.5611, -3.8544, -3.5310, -2.5647, -2.8361, -4.2431, -2.7885, -2.6230,
        -3.1168, -4.5475, -2.2429, -2.4584, -2.7134, -3.5806, -5.3471, -3.2290,
        -4.0166, -2.2446, -2.4927, -2.7981, -4.3831, -3.0164, -3.6671, -4.5386,
        -2.1924, -2.5305, -3.8843, -2.9662, -2.2874, -2.6473, -2.7688, -2.404



Shapes of distributions may be confusing. A perfect guide to make your life easier: https://bochang.me/blog/posts/pytorch-distributions/
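
As a small illustration of the shape conventions, the sketch below compares a `MultivariateNormal` with a diagonal covariance against a batch of two independent `Normal` distributions built from the same parameters as in Exercise 1. The variable names are illustrative, and `Independent` is used here only to show how a batch dimension can be reinterpreted as an event dimension.

```python
import torch
from torch.distributions import Independent, MultivariateNormal, Normal

loc = torch.tensor([1.0, 2.0])
variances = torch.tensor([1.0, 2.0])

# One distribution over 2-dimensional vectors.
mvn = MultivariateNormal(loc=loc, covariance_matrix=torch.diag(variances))
print(mvn.batch_shape, mvn.event_shape)          # torch.Size([]) torch.Size([2])

# A batch of two separate 1-dimensional distributions.
normals = Normal(loc=loc, scale=variances.sqrt())
print(normals.batch_shape, normals.event_shape)  # torch.Size([2]) torch.Size([])

x = mvn.sample(torch.Size([100]))                # shape (100, 2)
print(mvn.log_prob(x).shape)                     # torch.Size([100]): one value per vector
print(normals.log_prob(x).shape)                 # torch.Size([100, 2]): one value per coordinate

# Independent(..., 1) treats the last batch dimension as an event dimension,
# so the per-coordinate log-probs are summed.
indep = Independent(normals, 1)
print(indep.log_prob(x).shape)                   # torch.Size([100])
print(torch.allclose(indep.log_prob(x), mvn.log_prob(x), atol=1e-5))
```

With a diagonal covariance matrix the two formulations describe the same density, which is why the final check compares their log-probabilities.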

Additionally: https://stats.stackexchange.com/questions/2641/what-is-the-difference-between-likelihood-and-probability
