<img align="center" style="max-width: 1000px" src="https://raw.githubusercontent.com/HSG-AIML-Teaching/AI2024-Lab/main/banner.png?raw=1">

<img align="right" style="max-width: 200px; height: auto" src="https://raw.githubusercontent.com/HSG-AIML-Teaching/AI2024-Lab/main/hsg_logo.png?raw=1">

##  Lab 02 - "PyTorch and Tensor Processing"

Artificial Intelligence (Spring 2024), University of St. Gallen


In this lab, we will learn about tensors and some basics of tensor processing using PyTorch.

In [1]:
# Install required packages
# !pip install torch torchvision

### 1.0 Tensors

In [2]:
import torch

Tensors are a type of data structure used in linear algebra, and they are especially important in various fields of artificial intelligence, including deep learning. Essentially, tensors are a generalization of scalars, vectors, and matrices to higher dimensions.

Here's how you can think of different tensor dimensions:

+ A 0-dimensional tensor is a scalar (a single number).
+ A 1-dimensional tensor is a vector (a list of numbers).
+ A 2-dimensional tensor is a matrix (a 2D array of numbers).
+ Tensors with three or more dimensions extend these concepts to higher dimensions.

Tensors are key in machine learning and deep learning because they allow for the efficient storage and manipulation of the data sets that are essential for training models.

In PyTorch, a popular deep learning framework, tensors are used extensively. Let's see how you can create and manipulate tensors in PyTorch.

In [3]:
# A 0-dimensional tensor (scalar)
scalar = torch.tensor(5)
print("Scalar (0D tensor):", scalar)

# A 1-dimensional tensor (vector)
vector = torch.tensor([1, 2, 3])
print("Vector (1D tensor):", vector)

# A 2-dimensional tensor (matrix)
matrix = torch.tensor([[1, 2], 
                       [3, 4], 
                       [5, 6]])
print("Matrix (2D tensor):", matrix)

# A 3-dimensional tensor
tensor_3d = torch.tensor([
    [[1, 2], 
     [3, 4]], 
    [[5, 6], 
     [7, 8]]
])
print("3D Tensor:", tensor_3d)

# Show the shapes of each tensor
print("Shapes of tensors:", scalar.shape, vector.shape, matrix.shape, tensor_3d.shape)


Scalar (0D tensor): tensor(5)
Vector (1D tensor): tensor([1, 2, 3])
Matrix (2D tensor): tensor([[1, 2],
        [3, 4],
        [5, 6]])
3D Tensor: tensor([[[1, 2],
         [3, 4]],

        [[5, 6],
         [7, 8]]])
Shapes of tensors: torch.Size([]) torch.Size([3]) torch.Size([3, 2]) torch.Size([2, 2, 2])
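
Besides `shape`, a few other attributes are handy when inspecting tensors. The short sketch below is an illustrative addition (the variable names are made up for this example). It shows `ndim`, `numel()`, and `dtype`, and how the element type is inferred from the Python values unless it is requested explicitly.

```python
import torch

# Integer inputs give an integer tensor; float inputs give a float tensor
int_matrix = torch.tensor([[1, 2], [3, 4], [5, 6]])
float_vector = torch.tensor([1.0, 2.0, 3.0])

print("Number of dimensions:", int_matrix.ndim)
print("Number of elements:", int_matrix.numel())
print("Inferred dtypes:", int_matrix.dtype, float_vector.dtype)

# The dtype can also be requested explicitly at creation time
explicit_float = torch.tensor([[1, 2], [3, 4]], dtype=torch.float32)
print("Explicit dtype:", explicit_float.dtype)
```

With PyTorch's default settings, the integer tensor should be `torch.int64` and the float tensor `torch.float32`.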


### 2.0 A Comparison Between NumPy and PyTorch

PyTorch and NumPy are both powerful libraries in Python, but they serve different purposes and have different strengths.

+ Array vs. Tensor: NumPy provides support for large, multi-dimensional arrays and matrices, while PyTorch provides multi-dimensional arrays called tensors with strong support for deep learning and dynamic computational graphs.
+ Computational Graphs and Gradients: PyTorch tensors can be part of a computational graph and can keep track of gradients, something NumPy arrays cannot do.
+ GPU Support: PyTorch tensors can be moved to a GPU to perform massively parallel, fast computations. NumPy, on the other hand, operates only on the CPU.
+ API and Usage: PyTorch and NumPy share many similar operations, but their APIs differ, and PyTorch's is more closely aligned with deep learning workflows.
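
To make the relationship between the two libraries concrete, here is a minimal sketch of converting between NumPy arrays and PyTorch tensors. It is an illustrative addition that assumes NumPy is installed; the variable names are made up for this example. It contrasts `torch.tensor`, which copies the data, with `torch.from_numpy`, which shares memory with the array.

```python
import numpy as np
import torch

array = np.array([[1.0, 2.0], [3.0, 4.0]])

# torch.tensor copies the data; torch.from_numpy shares memory with the array
copied = torch.tensor(array)
shared = torch.from_numpy(array)

# Modifying the NumPy array is visible through the shared tensor only
array[0, 0] = 10.0
print("Copied tensor:\n", copied)
print("Shared tensor:\n", shared)

# Converting back: .numpy() works for CPU tensors that do not require gradients
back_to_numpy = shared.numpy()
print("Back to NumPy:", type(back_to_numpy), back_to_numpy.dtype)
```

Note that NumPy defaults to 64-bit floats, so the resulting tensors have dtype `torch.float64` rather than PyTorch's usual `torch.float32`.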

<style>
.beautifulTable {
    width: 100%;
    border-collapse: collapse;
    text-align: center;
}
.beautifulTable th, .beautifulTable td {
    padding: 10px;
    border: 1px solid #ddd;
}
.beautifulTable th {
    background-color: #4CAF50;
    color: white;
}
.beautifulTable tr:nth-child(even) {
    background-color: white;
    color: black;
}
.beautifulTable tr:hover {
    background-color: yellow;
    color: black;

}
</style>

<table class="beautifulTable">
    <tr>
        <th>Feature</th>
        <th>NumPy</th>
        <th>PyTorch</th>
    </tr>
    <tr>
        <td><strong>Core Data Structure</strong></td>
        <td>Multi-dimensional array (ndarray)</td>
        <td>Multi-dimensional array (Tensor)</td>
    </tr>
    <tr>
        <td><strong>Computational Backend</strong></td>
        <td>CPU only</td>
        <td>CPU and GPU</td>
    </tr>
    <tr>
        <td><strong>Dynamic Computational Graph</strong></td>
        <td>Not available</td>
        <td>Available (Dynamic computation graph for backpropagation)</td>
    </tr>
    <tr>
        <td><strong>Automatic Differentiation</strong></td>
        <td>Not available</td>
        <td>Available (through the autograd system)</td>
    </tr>
    <tr>
        <td><strong>Data Handling for AI</strong></td>
        <td>Basic array operations</td>
        <td>Comprehensive support for data loading, transformations, and batching for AI</td>
    </tr>
    <tr>
        <td><strong>Parallel Computing</strong></td>
        <td>Limited (through libraries like Dask)</td>
        <td>Extensive (native GPU support for parallel computing)</td>
    </tr>
    <tr>
        <td><strong>Memory Management</strong></td>
        <td>Manual control and efficiency</td>
        <td>Advanced (with in-place operations, shared memory)</td>
    </tr>
    <tr>
        <td><strong>Interoperability</strong></td>
        <td>Extensive (used as a base for many scientific computing in Python)</td>
        <td>Can convert between NumPy arrays and PyTorch tensors easily</td>
    </tr>
    <tr>
        <td><strong>API Consistency</strong></td>
        <td>Stable API, widely used in scientific computing</td>
        <td>Designed for deep learning, changes more frequently</td>
    </tr>
    <tr>
        <td><strong>Speed</strong></td>
        <td>Fast for array operations on CPU</td>
        <td>Faster for large-scale operations, especially on GPU</td>
    </tr>
    <tr>
        <td><strong>Community and Support</strong></td>
        <td>Very large user community, extensive documentation</td>
        <td>Growing rapidly, especially among researchers and AI practitioners</td>
    </tr>
</table>


### What features does PyTorch offer?

<img align="center" style="max-width: 700px" src="images/pytorch_packages.jpg">


+ `autograd`: This package is used for automatic differentiation. The autograd package is essential for training neural networks using backpropagation, as it allows users to easily compute gradients of the loss function with respect to the model parameters.

+ `nn`: This package provides a high-level API for building neural networks in PyTorch. It includes the most common types of layers such as convolutional layers, pooling layers, and linear layers, as well as activation functions and loss functions. The `nn` module also provides tools for building custom layers and models using PyTorch tensors.

+ `optim`: This package provides various optimization algorithms for training neural networks in PyTorch. It includes popular optimization methods such as Stochastic Gradient Descent (SGD), Adam, and Adagrad. The optim module also provides tools for customizing the learning rate and weight decay, as well as implementing learning rate schedulers.

+ `utils`: This package provides a variety of utility functions, such as data loading and visualization. For example, the `torch.utils.data` module contains classes and functions for loading and preprocessing data, and the `torch.utils.tensorboard` module provides support for visualizing training and validation metrics via `TensorBoard`.
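
As a rough illustration of how these packages fit together, the sketch below trains a single linear layer on toy data. It is not part of the original lab: the data set ($y = 2x + 1$), the batch size, the learning rate, and the number of epochs are all assumptions chosen only to keep the example small.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hypothetical toy data: targets follow y = 2x + 1
inputs = torch.linspace(-1, 1, 32).unsqueeze(1)   # shape (32, 1)
targets = 2 * inputs + 1

# utils: batching and shuffling of the data
loader = DataLoader(TensorDataset(inputs, targets), batch_size=8, shuffle=True)

# nn: model and loss function; optim: optimization algorithm
model = nn.Linear(1, 1)
loss_fn = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

epoch_losses = []
for epoch in range(20):
    total = 0.0
    for batch_x, batch_y in loader:
        optimizer.zero_grad()                      # reset old gradients
        loss = loss_fn(model(batch_x), batch_y)
        loss.backward()                            # autograd: compute gradients
        optimizer.step()                           # update the parameters
        total += loss.item()
    epoch_losses.append(total / len(loader))

print("First epoch loss:", epoch_losses[0])
print("Last epoch loss:", epoch_losses[-1])
```

The average loss should shrink over the epochs as the layer's weight and bias approach 2 and 1.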




### 3.0 Transferring tensors between compute devices

We can easily transfer a tensor to the target device by calling `.to(DEVICE_NAME)` on the tensor directly.

In [5]:
matrix = torch.tensor([[1, 2], 
                       [3, 4], 
                       [5, 6]])


print("Compute device - before: ", matrix.device)

# Transfer the tensor to the GPU (enable the line that matches your hardware)
# matrix = matrix.to("cuda") # CUDA-compatible GPU
matrix = matrix.to("mps")    # MPS-compatible GPU - MacBooks with M-series chips

print("Compute device - after: ", matrix.device)

Compute device - before:  cpu
Compute device - after:  mps:0
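
Hard-coding a device name fails on machines that do not have that kind of GPU. A common pattern, sketched below as an illustrative addition, is to pick the first available device and fall back to the CPU. The helper name `pick_device` is hypothetical and not part of PyTorch.

```python
import torch

def pick_device() -> torch.device:
    """Return the first available device among CUDA, MPS, and CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    # torch.backends.mps may be missing in older PyTorch builds
    mps_backend = getattr(torch.backends, "mps", None)
    if mps_backend is not None and mps_backend.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device = pick_device()
matrix = torch.tensor([[1, 2], [3, 4], [5, 6]])
matrix_on_device = matrix.to(device)
print("Selected device:", device)
print("Tensor device:", matrix_on_device.device)

# Move back to the CPU, e.g. before converting to NumPy
matrix_on_cpu = matrix_on_device.cpu()
print("Back on:", matrix_on_cpu.device)
```

Tensors that take part in the same operation need to live on the same device, so it is convenient to define the device once and reuse it.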


### 4.0 Computational Graphs and Automatic Differentiation

#### What are computational graphs and why do we need them?

A computational graph is a directed acyclic graph (DAG) that represents the flow of information through the network. It consists of nodes that represent mathematical operations and edges that represent the flow of data between the nodes.


Let's assume we have a very simple function:

$$f(x) = w \times x + b$$

Here $x$ is the input and $w$ and $b$ are (learnable) parameters. We want to change $w$ and $b$ such that the output of the function gets as close as possible to a target output (ground-truth). We (randomly) initialize $w=0.2$ and $b=0.0$.

Now let's calculate $f(0.4)$ in PyTorch:

In [6]:
# Example input and output
x = torch.tensor(0.4)  # input tensor
y = torch.tensor(1.0)  # expected output

# Initialize w and b (normally with random values; here we set them to 0.2 and 0.0)
w = torch.tensor(0.2, requires_grad=True) # requires_grad=True -> learnable parameter
b = torch.tensor(0.0, requires_grad=True) # requires_grad=True -> learnable parameter

# Calculate f(x)
z = w * x + b

print(z)

tensor(0.0800, grad_fn=<AddBackward0>)


Let's assume we want $f(0.4)=1.0$, but currently  $f(0.4)=0.08$:

$f(0.4) = 0.08$ $\color{red}{\neq}$ $\color{green}{f(0.4) = 1.0}$

To close this gap, we first measure the difference between the desired output and the actual output of the function, which we call the loss ($l$):

$$l = ||f(0.4) - 1.0||_{2}$$

Then, to estimate how much $w$ and $b$ need to change to get closer to the desired value, we compute the gradients of the loss w.r.t. the function's parameters:

$$\frac{\partial l}{\partial w}, \frac{\partial l}{\partial b}$$

And finally, we update $w$ and $b$ using gradient descent, where $\alpha$ is the learning rate:
$$w_{new} \leftarrow w - \alpha  \frac{\partial l}{\partial w}$$
$$b_{new} \leftarrow b - \alpha \frac{\partial l}{\partial b}$$

In [7]:
loss = torch.norm(z - y, p=2)
print(loss)

tensor(0.9200, grad_fn=<LinalgVectorNormBackward0>)


Running the tensor operations above creates the following computational graph, which enables automatic differentiation.

<img align="center" style="max-width: 700px" src="images/comp-graph.png">

<sup> Image adapted from: <a href="https://pytorch.org/tutorials/beginner/basics/autogradqs_tutorial.html">https://pytorch.org/tutorials/beginner/basics/autogradqs_tutorial.html</a> </sup>

Technically, PyTorch creates the computational graph of the function above dynamically (on the fly), while the operations are being executed.
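
Because the graph is recorded while the code runs, ordinary Python control flow can decide what the graph looks like. The sketch below is an illustrative addition; the function `repeated_product` is a made-up example and not part of the lab. The loop length determines how many multiplication nodes are recorded, and autograd differentiates whatever was actually executed.

```python
import torch

def repeated_product(x: torch.Tensor, n: int) -> torch.Tensor:
    """Multiply x by itself n times, so the result is x ** (n + 1)."""
    out = x
    for _ in range(n):
        out = out * x
    return out

x = torch.tensor(2.0, requires_grad=True)
y = repeated_product(x, n=2)   # computes x ** 3
y.backward()

print("y:", y)
print("dy/dx:", x.grad)        # derivative of x ** 3 is 3 * x ** 2
```

Calling the function with a different `n` records a different graph, without any extra setup.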

#### Computing gradients

Once the loss is computed and the computational graph is formed (in the background), we can compute the gradients for the learnable parameters. But first, let's check the gradient values of the (learnable) parameters $w$ and $b$ before computing the gradients:

In [8]:
print(w.grad)
print(b.grad)

None
None


The easiest way to compute all gradients in a computational graph is to call `.backward()` on the loss:

In [9]:
loss.backward()

Now, let's check the gradients again.

In [10]:
print(w.grad)
print(b.grad)

tensor(-0.4000)
tensor(-1.)


Voilà! The gradients are there. Remember that after calling `.backward()`, the computational graph is freed by default to save memory. Most applications do not need to keep the graph; there are ways to retain it, but they are outside the scope of this tutorial.
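
The printed gradients can be checked by hand. The following derivation is an added sanity check. It assumes the loss exactly as computed in the code, `torch.norm(z - y, p=2)`, which for a scalar reduces to the absolute value $|z - y|$, with $z = w \times x + b$. For $z \neq y$, the chain rule gives:

$$\frac{\partial l}{\partial w} = \operatorname{sign}(z - y) \cdot x, \qquad \frac{\partial l}{\partial b} = \operatorname{sign}(z - y)$$

With $x = 0.4$, $z = 0.08$ and $y = 1.0$, we have $\operatorname{sign}(z - y) = -1$, so:

$$\frac{\partial l}{\partial w} = -1 \cdot 0.4 = -0.4, \qquad \frac{\partial l}{\partial b} = -1$$

These values match the contents of `w.grad` and `b.grad`.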

#### Another way to compute gradients

We can also compute partial derivatives w.r.t. particular parameters of the model directly, using `torch.autograd.grad`:

In [11]:
x = torch.tensor(0.4)  # input tensor
y = torch.tensor(1.0)  # expected output
w = torch.tensor(0.2, requires_grad=True) # requires_grad=True -> learnable parameter
b = torch.tensor(0.0, requires_grad=True) # requires_grad=True -> learnable parameter
z = x * w + b
loss = torch.norm(z - y, p=2)


In [12]:
torch.autograd.grad(loss, [w, b])

(tensor(-0.4000), tensor(-1.))
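
The gradients are only half of the gradient descent update described earlier. The sketch below is an illustrative addition that applies the update rule for a few steps. The learning rate `alpha = 0.1` and the number of steps are assumptions chosen for the example, not values from the lab.

```python
import torch

x = torch.tensor(0.4)  # input tensor
y = torch.tensor(1.0)  # expected output
w = torch.tensor(0.2, requires_grad=True)
b = torch.tensor(0.0, requires_grad=True)
alpha = 0.1            # learning rate (hypothetical value)

losses = []
for step in range(5):
    z = w * x + b
    loss = torch.norm(z - y, p=2)
    losses.append(loss.item())
    loss.backward()

    # Update the parameters without recording the update in the graph
    with torch.no_grad():
        w -= alpha * w.grad
        b -= alpha * b.grad

    # Gradients accumulate across backward() calls, so reset them
    w.grad.zero_()
    b.grad.zero_()

print("Losses:", losses)
print("w:", w.item(), "b:", b.item())
```

The loss should decrease at every step. In practice, the manual update and the gradient reset are usually delegated to an optimizer from `torch.optim`.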

### 5.0 Slicing and Indexing 

Slicing, indexing, and reshaping work similarly to NumPy:

In [13]:
tensor = torch.tensor([[1, 2], 
                       [3, 4], 
                       [5, 6]])

# Indexing and slicing
print("\nFirst row: ", tensor[0])
print("First column: ", tensor[:, 0])
print("Last column: ", tensor[:, -1])

# Reshaping (view(3, 2) matches the current shape, so the layout is unchanged here)
reshaped_tensor = tensor.view(3, 2)
print("\nReshaped Tensor:\n", reshaped_tensor)



First row:  tensor([1, 2])
First column:  tensor([1, 3, 5])
Last column:  tensor([2, 4, 6])

Reshaped Tensor:
 tensor([[1, 2],
        [3, 4],
        [5, 6]])
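
A few more indexing and reshaping patterns are sketched below as an illustrative addition. Unlike a reshape to the tensor's existing shape, this example changes the layout from (3, 2) to (2, 3), and it also shows flattening and boolean masking.

```python
import torch

tensor = torch.tensor([[1, 2],
                       [3, 4],
                       [5, 6]])

# Select a single element and a block of rows
print("Element in row 2, column 1:", tensor[2, 1])
print("Last two rows:\n", tensor[1:])

# Reshape the 3x2 tensor into 2x3; -1 lets PyTorch infer that dimension
reshaped = tensor.view(2, 3)
flattened = tensor.reshape(-1)
print("Reshaped to (2, 3):\n", reshaped)
print("Flattened:", flattened)

# Boolean masks work as in NumPy
large_values = tensor[tensor > 3]
print("Elements greater than 3:", large_values)
```

Reshaping keeps the row-major order of the elements, so the values are read row by row and refilled into the new shape.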


### 6.0 Tensor Operations

We can also transpose a 2D tensor, sum its elements, or apply element-wise functions by calling the corresponding methods and functions directly on the tensor:

In [14]:
# Matrix operations
transposed_tensor = tensor.t()
print("\nTransposed Tensor:\n", transposed_tensor)

# Reduction operations
sum_tensor = tensor.sum()
print("\nSum of elements:", sum_tensor)

# Element-wise operations
sin_tensor = torch.sin(tensor)
print("\nSine of Tensor:\n", sin_tensor)



Transposed Tensor:
 tensor([[1, 3, 5],
        [2, 4, 6]])

Sum of elements: tensor(21)

Sine of Tensor:
 tensor([[ 0.8415,  0.9093],
        [ 0.1411, -0.7568],
        [-0.9589, -0.2794]])
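
Reductions can also run along a single dimension, and tensors support matrix multiplication and broadcasting. The sketch below is an illustrative addition that uses the same 3x2 example values.

```python
import torch

tensor = torch.tensor([[1, 2],
                       [3, 4],
                       [5, 6]])

# Reduce along one dimension instead of over all elements
column_sums = tensor.sum(dim=0)   # one value per column
row_sums = tensor.sum(dim=1)      # one value per row
print("Column sums:", column_sums)
print("Row sums:", row_sums)

# Matrix multiplication: (2, 3) @ (3, 2) -> (2, 2)
product = tensor.t() @ tensor
print("Matrix product:\n", product)

# Broadcasting: the 1D tensor is applied to every row
scaled = tensor * torch.tensor([10, 100])
print("Scaled columns:\n", scaled)
```

The `dim` argument names the dimension that is collapsed, so `dim=0` sums over the rows and leaves one value per column.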


### 7.0 Concatenating and Splitting Tensors

For many applications we will need to concatenate two tensors or split an existing tensor into chunks:

In [15]:
# Concatenation
concatenated_tensor = torch.cat([tensor, tensor], dim=0)
print("\nConcatenated Tensor:\n", concatenated_tensor)

# Splitting (a chunk size of 3 along dim=0 covers all 3 rows, so a single chunk is returned)
split_tensors = torch.split(tensor, split_size_or_sections=3, dim=0)
print("\nSplit Tensors:", split_tensors)



Concatenated Tensor:
 tensor([[1, 2],
        [3, 4],
        [5, 6],
        [1, 2],
        [3, 4],
        [5, 6]])

Split Tensors: (tensor([[1, 2],
        [3, 4],
        [5, 6]]),)
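
The sketch below is an illustrative addition with a few variations: concatenating along the column dimension, stacking along a new dimension, and splitting with a chunk size that actually yields more than one piece.

```python
import torch

tensor = torch.tensor([[1, 2],
                       [3, 4],
                       [5, 6]])

# Concatenate along the columns: (3, 2) and (3, 2) -> (3, 4)
side_by_side = torch.cat([tensor, tensor], dim=1)
print("Concatenated along dim=1:", side_by_side.shape)

# Stack along a new leading dimension: two (3, 2) tensors -> (2, 3, 2)
stacked = torch.stack([tensor, tensor], dim=0)
print("Stacked:", stacked.shape)

# Split into chunks of 2 rows; the last chunk holds the remaining row
chunks = torch.split(tensor, 2, dim=0)
for index, chunk in enumerate(chunks):
    print(f"Chunk {index}:\n", chunk)
```

`torch.cat` joins tensors along an existing dimension, while `torch.stack` creates a new one.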


### 8.0 In-Place Operations and Operations Without Gradients

Usually, when we perform an operation on a tensor, a new tensor is created to store the result. Some operations can instead be performed "in-place", directly changing the values of the original tensor. In PyTorch, in-place methods end with an underscore, such as `add_`:

In [16]:
# In-place operations
print("\nOriginal tensor:", tensor)
tensor.add_(1)  # In-place addition
print("Tensor after in-place addition:", tensor)



Original tensor: tensor([[1, 2],
        [3, 4],
        [5, 6]])
Tensor after in-place addition: tensor([[2, 3],
        [4, 5],
        [6, 7]])
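
One way to see the difference between the two kinds of operations is to compare memory addresses. The sketch below is an illustrative addition that uses `data_ptr()`, which returns the address of a tensor's first element.

```python
import torch

tensor = torch.tensor([[1, 2], [3, 4], [5, 6]])
address_before = tensor.data_ptr()

# Out-of-place: the result lives in newly allocated memory
result = tensor.add(1)
print("Out-of-place result uses new memory:", result.data_ptr() != address_before)

# In-place (trailing underscore): the original memory is overwritten
tensor.add_(1)
print("In-place keeps the same memory:", tensor.data_ptr() == address_before)
print("Both hold the same values:", torch.equal(tensor, result))
```

In-place operations save memory, but they overwrite the original values, so they should be used with care on tensors that are still needed elsewhere.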


By default, PyTorch records the operations applied to tensors that require gradients so that gradients can be computed later. When gradients are not needed, we can skip this tracking and save memory by using the `torch.no_grad()` context manager:

In [17]:
# Using no_grad for memory efficiency
tensor_with_grad = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
with torch.no_grad():
    out = tensor_with_grad * 2
    print("Tensor operations without gradient tracking:\n", out)


Tensor operations without gradient tracking:
 tensor([2., 4., 6.])


In [18]:
out = tensor_with_grad * 2
print("Tensor operations with gradient tracking:\n", out)

Tensor operations with gradient tracking:
 tensor([2., 4., 6.], grad_fn=<MulBackward0>)
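
Whether a result is tracked can also be checked programmatically through `requires_grad` and `grad_fn`. The sketch below is an illustrative addition (the variable names are made up); it also shows `detach()`, which returns an untracked tensor from one that is already tracked.

```python
import torch

params = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Computed inside no_grad: not connected to the graph
with torch.no_grad():
    untracked = params * 2

# Computed normally: connected to the graph
tracked = params * 2

# detach() gives a tensor with the same values but no gradient history
detached = tracked.detach()

print("untracked.requires_grad:", untracked.requires_grad)
print("tracked.requires_grad:", tracked.requires_grad)
print("detached.requires_grad:", detached.requires_grad)
print("untracked.grad_fn:", untracked.grad_fn)
```

`torch.no_grad()` is typically used for a whole block of code, such as model evaluation, while `detach()` is convenient for individual tensors.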
