# Introduction to PyTorch

[PyTorch](https://pytorch.org/) is an open source machine learning and deep learning framework based on the Torch library.

## Why PyTorch?

PyTorch is a popular starting point for deep learning research because of its flexibility and ease of use. TensorFlow has often been recommended for **production-level projects**, while PyTorch is a common choice for **research and experimentation**. More explicitly, PyTorch has the following advantages:

- **Dynamic computation graph**: PyTorch builds its computation graph on the fly as operations are executed. This contrasts with the static graphs of TensorFlow 1.x, which had to be defined in full before being run (TensorFlow 2.x also executes eagerly by default). A dynamic graph makes the code easier to debug and understand.

- **Pythonic**: PyTorch is designed to be Pythonic, which means that it follows Python idioms and is easy to read and write. Graph-based TensorFlow code has traditionally been more verbose.

- **Imperative programming**: PyTorch uses imperative programming, which means that model code looks and runs like regular Python code, line by line. Static-graph frameworks such as TensorFlow 1.x instead use a declarative style, where the computation is described first and executed later.
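
As a minimal sketch of what the dynamic graph means in practice, the snippet below uses an ordinary Python `if` inside a function and then asks autograd for gradients. The function name `piecewise` and the input values are illustrative assumptions, not part of the course material.

```python
import torch


def piecewise(x):
    # Ordinary Python control flow decides which operations get recorded
    if x.sum() > 0:
        return (x ** 2).sum()
    return (-x).sum()


x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = piecewise(x)   # the graph is built while this line runs
y.backward()       # gradients follow the branch that was actually taken

print(y.item())    # 14.0
print(x.grad)      # tensor([2., 4., 6.])
```

Because the graph is rebuilt on every call, a different input could take the other branch without any change to the code.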

## Setting up the working environment

We are going to use different Python modules throughout this course. It is not necessary to be familiar with all of them at the moment. Some of these libraries enable us to work with data and perform numerical operations, while others are used for visualization purposes.

In [1]:
from pathlib import Path
import sys

helper_utils = Path.cwd().parent
sys.path.append(str(helper_utils))

from utils.data import download_dataset
import pandas as pd
import torch
import random

Faculty of Science and Engineering 🔬
The University of Manchester
Invoking utils version: 0.7.0


# Introduction to tensors

> **Tensor**: A tensor is a generalisation of vectors and matrices that can have an arbitrary number of dimensions. In simple terms, a tensor is a multidimensional array.

Similar to arrays, tensors can have different shapes and sizes. The number of dimensions of a tensor is called its **rank**. Here are some examples of tensors:

- **Scalar**: A scalar is a single number, denoted as a tensor of rank 0.
- **Vector**: A vector is an array of numbers, denoted as a tensor of rank 1.
- **Matrix**: A matrix is a 2D array of numbers, denoted as a tensor of rank 2.
- **3D tensor**: A 3D tensor is a cube of numbers, denoted as a tensor of rank 3.
- **nD tensor**: An nD tensor is a generalisation of the above examples, denoted as a tensor of rank n.

<figure style="background-color: white; border-radius: 10px; padding: 20px; text-align: center; margin: 0 auto;">
    <img src="..\figs\tensors.png" alt="Visual representation of tensors" align="center" style="width: 60%; height: auto; margin: 0 auto;" />
</figure>


The power of tensors comes in the form of their operations. Tensors can be added, multiplied, and manipulated in various ways. In the next section, we will see how to create tensors using PyTorch.

## Creating tensors
To create a tensor in PyTorch, we can use the `torch.tensor` function, which returns an instance of the class `torch.Tensor`.

> 📚 **Documentation**: [torch.Tensor](https://pytorch.org/docs/stable/tensors.html)

We are going to create a scalar tensor using a random integer value.

In [2]:
# Create a random integer
rand_int = random.randint(0, 100)

# Create a tensor from the random integer
scalar = torch.tensor(rand_int)

print(f'Scalar value: {scalar}')

scalar, scalar.shape, scalar.ndim, scalar.dtype

Scalar value: 61


(tensor(61), torch.Size([]), 0, torch.int64)

In [3]:
# Float scalar
rand_float = random.uniform(0, 100)
scalar = torch.tensor(rand_float)
print(scalar)
scalar.shape, scalar.ndim, scalar.dtype

tensor(32.6290)


(torch.Size([]), 0, torch.float32)

When we create a tensor, we are creating a Python object that represents the multidimensional array. Like any other Python object, the tensor has attributes and methods that we can use to manipulate it.

In the examples above, we created scalar tensors with a single element. Looking at their attributes, we can see that each tensor has a shape of `torch.Size([])`, which means that it has no dimensions. The first tensor, created from a Python integer, has a data type of `torch.int64`, while the second, created from a Python float, has a data type of `torch.float32`.

> **Note**: The data type of a tensor is determined by the data type of the elements that it contains. It is important to be aware of the data type of a tensor, as it can affect the results of operations that are performed on it. Good practice is to always specify the data type of a tensor when creating it.
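
The short sketch below illustrates why the data type matters. It shows the default types PyTorch infers, what happens when integer and float tensors are mixed, and how casting a float tensor to an integer type discards the fractional part. The variable names are illustrative.

```python
import torch

ints = torch.tensor([1, 2, 3])           # inferred as torch.int64
floats = torch.tensor([1.0, 2.0, 3.0])   # inferred as torch.float32

# Mixing integer and float tensors promotes the result to float
mixed = ints + floats

# Casting float values to an integer type truncates toward zero
truncated = torch.tensor([1.9, -1.9]).to(torch.int64)

print(ints.dtype, floats.dtype)   # torch.int64 torch.float32
print(mixed.dtype)                # torch.float32
print(truncated)                  # tensor([ 1, -1])
```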

As we can see, our single element is now stored in a container: operations act on the tensor rather than on the underlying Python number. To extract the element as a regular Python number, we can use the `item()` method.

In [4]:
scalar.item()

32.62903594970703

We can specify the data type of a tensor by passing the `dtype` argument to the `torch.tensor` function. Alternatively, we can use the `torch.Tensor.type` method to change the data type of an existing tensor.

In [5]:
# Create a scalar tensor with a specific data type
scalar_tensor = torch.tensor(42, dtype=torch.float32)
print(scalar_tensor)

# Change the data type of a tensor
scalar_tensor = scalar_tensor.type(torch.int64)
print(scalar_tensor)

# Another way to change the data type of a tensor
scalar_tensor = scalar_tensor.int()
print(scalar_tensor)

# Not recommended, as it can be confused with the use of
# the .to() method to move tensors to different devices
scalar_tensor = scalar_tensor.to(torch.float64)
print(scalar_tensor)


tensor(42.)
tensor(42)
tensor(42, dtype=torch.int32)
tensor(42., dtype=torch.float64)


## Initializing tensors

PyTorch has several functions to create tensors with different initial values. This is useful when we want to create tensors with specific shapes to represent data. For example, we can create a tensor of zeros, ones, or random values. Here are some examples of how to create tensors with different initial values:

In [6]:
# Create a 1D tensor using a range
torch.arange(0, 9)

tensor([0, 1, 2, 3, 4, 5, 6, 7, 8])

In [7]:
# Create a 3x3x3 tensor of zeros
torch.zeros(3, 3, 3)

tensor([[[0., 0., 0.],
         [0., 0., 0.],
         [0., 0., 0.]],

        [[0., 0., 0.],
         [0., 0., 0.],
         [0., 0., 0.]],

        [[0., 0., 0.],
         [0., 0., 0.],
         [0., 0., 0.]]])

In [8]:
# Create a 3x3x3 tensor of ones
torch.ones(3, 3, 3)

tensor([[[1., 1., 1.],
         [1., 1., 1.],
         [1., 1., 1.]],

        [[1., 1., 1.],
         [1., 1., 1.],
         [1., 1., 1.]],

        [[1., 1., 1.],
         [1., 1., 1.],
         [1., 1., 1.]]])

In [9]:
# Create a 3x3x3 tensor of random numbers using a uniform distribution between 0 and 1
torch.rand(3, 3, 3)

tensor([[[0.5284, 0.5199, 0.2040],
         [0.5786, 0.1376, 0.0844],
         [0.3773, 0.9143, 0.8173]],

        [[0.6608, 0.4748, 0.3692],
         [0.0484, 0.1147, 0.9300],
         [0.7445, 0.5994, 0.3852]],

        [[0.6493, 0.7546, 0.5238],
         [0.3814, 0.9689, 0.6665],
         [0.7511, 0.0962, 0.9858]]])

In [10]:
# Create a 3x3 diagonal tensor with 1s on the diagonal
torch.eye(3)

tensor([[1., 0., 0.],
        [0., 1., 0.],
        [0., 0., 1.]])

In [11]:
# Create a 1D tensor with 5 evenly spaced values between 0 and 10
torch.linspace(0, 10, 5)

tensor([ 0.0000,  2.5000,  5.0000,  7.5000, 10.0000])

## Indexing tensors

To access the elements of a tensor, we can use the same indexing methods used for lists and NumPy arrays, through the `[]` operator.

> **Note:** The indexing is zero-based, which means that the first element has an index of 0.

In [12]:
myTensor = torch.tensor(
    [[[1, 2, 3],
      [4, 5, 6],
      [7, 8, 9]],
     [[10, 11, 12],
      [13, 14, 15],
      [16, 17, 18]]])


print(f'Element at [0, 1, 2]: {myTensor[0, 1, 2]}')
print(f'Element at [1, 0, 1]: {myTensor[1, 0, 1]}')

Element at [0, 1, 2]: 6
Element at [1, 0, 1]: 11


We can also use slice notation (`start:stop:step`) to access a range of elements, just as with lists and NumPy arrays. A bare `:` selects every element along that dimension.

Furthermore, we can use the ellipsis (`...`) as shorthand for as many full slices (`:`) as are needed to cover the remaining dimensions. This is useful when we only want to index the first or last dimensions of a tensor.

Finally, we can use negative indexing to access elements from the end of a dimension. For instance, `-2` refers to the second-to-last element.

In [13]:
# Slicing
print(myTensor[1, 1:3, 1:3])

tensor([[14, 15],
        [17, 18]])


In [14]:
# Slicing with step
print(myTensor[0, 1:3, 1:3:2])

tensor([[5],
        [8]])


In [15]:
# Slicing with ellipsis
# Ellipsis (...) can be used to represent multiple colons
print(myTensor[..., 0:2])

tensor([[[ 1,  2],
         [ 4,  5],
         [ 7,  8]],

        [[10, 11],
         [13, 14],
         [16, 17]]])


In [16]:
# Negative indexing
print(myTensor[..., -1])  # Last element in the last dimension

tensor([[ 3,  6,  9],
        [12, 15, 18]])
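
To make the role of the ellipsis concrete, the sketch below checks that `...` selects the same elements as writing out the full slices, and uses `-2` to pick a second-to-last row. It rebuilds a tensor with the same values as `myTensor` under the illustrative name `t`.

```python
import torch

# Same values as myTensor above: 1..18 arranged as (2, 3, 3)
t = torch.arange(1, 19).reshape(2, 3, 3)

# The ellipsis stands in for the two leading full slices
same = torch.equal(t[..., 0:2], t[:, :, 0:2])

# Negative index: second-to-last row of the first 3x3 block
second_to_last_row = t[0, -2]

print(same)                 # True
print(second_to_last_row)   # tensor([4, 5, 6])
```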


# Tensor operations

PyTorch allows us to manipulate tensors in different ways. Its tensor API closely mirrors NumPy's, so most array operations have a direct counterpart in the `torch` module, and tensors can be converted to and from NumPy arrays. Due to the Pythonic nature of PyTorch, we can also use the standard Python arithmetic operators, which are applied element-wise.

In [17]:
a = torch.tensor([[1, 2], [3, 4]])
b = torch.tensor([[5, 6], [7, 8]])

In [18]:
# Tensor Addition
c = a + b
print(f'Addition:\n' + '-' * 20)
print(c, '\n')

# Tensor subtraction
c = a - b
print(f'Subtraction:\n' + '-' * 20)
print(c, '\n')

# Tensor multiplication (element-wise)
c = a * b
print(f'Multiplication:\n' + '-' * 20)
print(c, '\n')

# Tensor division
c = a / b
print(f'Division:\n' + '-' * 20)
print(c, '\n')

# Tensor exponentiation
c = a ** b
print(f'Exponentiation:\n' + '-' * 20)
print(c, '\n')

# Tensor square root
c = a ** (1/2)
print(f'Square root:\n' + '-' * 20)
print(c, '\n')

# Tensor logarithm
c = torch.log(a)
print(f'Logarithm:\n' + '-' * 20)
print(c, '\n')


Addition:
--------------------
tensor([[ 6,  8],
        [10, 12]]) 

Subtraction:
--------------------
tensor([[-4, -4],
        [-4, -4]]) 

Multiplication:
--------------------
tensor([[ 5, 12],
        [21, 32]]) 

Division:
--------------------
tensor([[0.2000, 0.3333],
        [0.4286, 0.5000]]) 

Exponentiation:
--------------------
tensor([[    1,    64],
        [ 2187, 65536]]) 

Square root:
--------------------
tensor([[1.0000, 1.4142],
        [1.7321, 2.0000]]) 

Logarithm:
--------------------
tensor([[0.0000, 0.6931],
        [1.0986, 1.3863]]) 
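
The operators used above also have function counterparts in the `torch` module. The sketch below compares a few of them and shows how the result's data type depends on the operation. It redefines `a` and `b` so that it can run on its own; the other variable names are illustrative.

```python
import torch

a = torch.tensor([[1, 2], [3, 4]])
b = torch.tensor([[5, 6], [7, 8]])

# Function forms of the arithmetic operators
sum_fn = torch.add(a, b)     # same as a + b
prod_fn = torch.mul(a, b)    # same as a * b (element-wise)

# True division returns floats, floor division keeps integers
true_div = a / b
floor_div = a // b

# torch.sqrt on a float tensor matches raising to the power 0.5
root = torch.sqrt(a.to(torch.float32))

print(true_div.dtype, floor_div.dtype)   # torch.float32 torch.int64
```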



## Matrix operations

Matrix multiplication is a common operation in linear algebra and is used in many machine learning algorithms. We can perform:

- **Matrix multiplication**: This is the standard matrix product, written with the `@` operator (equivalent to `torch.matmul`). Each entry of the result is the dot product of a row of the first matrix with a column of the second.
- **Element-wise multiplication**: This is the entry-by-entry multiplication of two matrices of the same shape, written with the `*` operator. This operation is also known as the Hadamard product.
- **Matrix transpose**: This is the operation of flipping a matrix over its main diagonal, so that rows become columns. It is available through the `.T` attribute.
- **Matrix inverse**: This is the operation of finding the matrix that, multiplied by the original square matrix, gives the identity matrix. It is available through the `torch.inverse()` function.
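
As a small sketch of the last item, which the cell further down does not cover, the snippet below inverts a diagonal matrix with `torch.linalg.inv` (the function that `torch.inverse` is an alias of) and checks that the product with the original gives the identity. It also computes a vector dot product for comparison. The matrix and vector values are illustrative.

```python
import torch

# A simple invertible matrix (diagonal, so the inverse is easy to verify by hand)
M = torch.tensor([[2.0, 0.0],
                  [0.0, 4.0]])

M_inv = torch.linalg.inv(M)
identity = M @ M_inv   # should be (numerically close to) the identity

# Dot product of two vectors: 1*4 + 2*5 + 3*6
u = torch.tensor([1.0, 2.0, 3.0])
v = torch.tensor([4.0, 5.0, 6.0])
dot = torch.dot(u, v)

print(M_inv)
print(dot.item())   # 32.0
```

Note that only square, non-singular matrices have an inverse; for other cases the call raises an error.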

<figure style="background-color: white; border-radius: 10px; padding: 20px; text-align: center; margin: 0 auto; display: flex; justify-content: center; align-items: center; overflow: hidden;">
    <img src="..\figs\matrix_mul.gif" alt="Matrix Multiplication" style="width: 40%; height: 220px; object-fit: cover;">
</figure>

In [19]:
# Tensor matrix multiplication
A = torch.ones(3, 3) * 3
B = torch.randint(0, 10, (3, 3)).to(torch.float32)
print(f'Matrix A:\n' + '-' * 20)
print(A, '\n')
print(f'Matrix B:\n' + '-' * 20)
print(B, '\n')

print(f'Matrix multiplication:\n' + '-' * 20)
print(A @ B, '\n')  # Equivalent to A.matmul(B)

print(f'Element-wise multiplication:\n' + '-' * 20)
print(A * B, '\n') # Element-wise multiplication

print(f'Matrix transpose:\n' + '-' * 20)
print(B.T, '\n')  # Transpose of B

Matrix A:
--------------------
tensor([[3., 3., 3.],
        [3., 3., 3.],
        [3., 3., 3.]]) 

Matrix B:
--------------------
tensor([[5., 9., 4.],
        [5., 0., 8.],
        [0., 7., 0.]]) 

Matrix multiplication:
--------------------
tensor([[30., 48., 36.],
        [30., 48., 36.],
        [30., 48., 36.]]) 

Element-wise multiplication:
--------------------
tensor([[15., 27., 12.],
        [15.,  0., 24.],
        [ 0., 21.,  0.]]) 

Matrix transpose:
--------------------
tensor([[5., 5., 0.],
        [9., 0., 7.],
        [4., 8., 0.]]) 



## Tensor Broadcasting

PyTorch follows the same broadcasting semantics as NumPy. Broadcasting is how arrays and tensors with different shapes are handled during arithmetic operations. It allows us to perform operations on tensors of different shapes without having to explicitly reshape them. This is done by virtually expanding the smaller tensor to match the shape of the larger one, comparing the shapes from the last dimension backwards.

For example, if we have a 1D tensor of shape `(2,)` and a 2D tensor of shape `(3, 2)`, we can add them together without having to reshape the 1D tensor. It is automatically expanded along the first dimension to match the shape of the 2D tensor.

> 📚 **Documentation**: [Broadcasting](https://numpy.org/doc/stable/user/basics.broadcasting.html)
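
The sketch below illustrates the rule that shapes are compared starting from the last dimension: a `(2,)` tensor broadcasts against a `(3, 2)` tensor, while a `(3,)` tensor does not until it is given a trailing dimension of size 1. The variable names are illustrative.

```python
import torch

matrix = torch.ones(3, 2)
row = torch.tensor([10.0, 20.0])      # shape (2,): matches the last dimension
col = torch.tensor([1.0, 2.0, 3.0])   # shape (3,): does not match the last dimension

ok = matrix + row                     # (3, 2) + (2,) -> (3, 2)

try:
    matrix + col                      # (3, 2) + (3,) cannot be aligned
    failed = False
except RuntimeError:
    failed = True

# Adding a trailing singleton dimension makes the shapes compatible
fixed = matrix + col.unsqueeze(1)     # (3, 2) + (3, 1) -> (3, 2)

print(ok.shape, failed, fixed.shape)
```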

In [20]:
a = torch.tensor([[1, 2], [3, 4]])
b = torch.tensor([[5], [6]])

# Broadcasting
c = a + b
print(f'Broadcasting:\n' + '-' * 20)
print(c, '\n')

c = a * b
print(f'Broadcasting with multiplication:\n' + '-' * 20)
print(c, '\n')


Broadcasting:
--------------------
tensor([[ 6,  7],
        [ 9, 10]]) 

Broadcasting with multiplication:
--------------------
tensor([[ 5, 10],
        [18, 24]]) 



## Reshaping tensors
### Unsqueeze
If two tensors are not broadcastable as they are, we can use different methods to make them broadcastable. For example, we can use the `unsqueeze` method to add a dimension of size 1 to a tensor. The `unsqueeze` method takes an integer argument that specifies the position at which the new dimension is inserted.

In [21]:
images = torch.rand(8, 3, 32, 32)  #  batch of 8 images, 3 channels (RGB), 32x32 resolution
scaling_factors = torch.tensor([0.5, 1.5, 2.0])  # Shape: (3,)

try:
    scaled_images = images * scaling_factors
except RuntimeError as e:
    print(f'Error: {e}')

# To apply the scaling factors to each channel of the images, we need to unsqueeze the scaling_factors tensor
scaling_factors = scaling_factors.unsqueeze(0).unsqueeze(2).unsqueeze(3)  # Shape: (1, 3, 1, 1)
scaled_images = images * scaling_factors

print(f'Scaled images shape: {scaled_images.shape}')


Error: The size of tensor a (32) must match the size of tensor b (3) at non-singleton dimension 3
Scaled images shape: torch.Size([8, 3, 32, 32])


**Why unsqueeze(0), unsqueeze(2), and unsqueeze(3)?**

When applying channel-specific scaling to images, we need to align the dimensions correctly:

**1. Understanding the Shapes**
- **Images**: `torch.Size([8, 3, 32, 32])`
    - 8 images in the batch
    - 3 channels (RGB)
    - 32×32 pixels per image

- **Scaling Factors**: `torch.Size([3])`
    - One scaling factor per channel

**2. Dimension Transformation**
- **Original**: `[3]` (just channel values)
- **After unsqueeze(0)**: `[1, 3]` (adds batch dimension)
- **After unsqueeze(2)**: `[1, 3, 1]` (adds height dimension)
- **After unsqueeze(3)**: `[1, 3, 1, 1]` (adds width dimension)

**3. Broadcasting in Action**
- The `[1, 3, 1, 1]` tensor broadcasts to `[8, 3, 32, 32]`
- Each scaling factor is applied to its corresponding channel across all images
- Batch dimension (1→8), height (1→32), and width (1→32) dimensions are all broadcast
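
The chain of `unsqueeze` calls can also be written with `None` indexing, which inserts a singleton dimension wherever `None` appears. The sketch below checks that both forms give the same shape and that each channel receives its own factor; it uses a tensor of ones instead of random images so the result is easy to verify.

```python
import torch

scaling_factors = torch.tensor([0.5, 1.5, 2.0])   # shape (3,)

# Step-by-step version, as in the explanation above
step_by_step = scaling_factors.unsqueeze(0).unsqueeze(2).unsqueeze(3)

# Equivalent one-liner: None adds a dimension of size 1 at that position
with_none = scaling_factors[None, :, None, None]

images = torch.ones(8, 3, 32, 32)
scaled = images * with_none

print(step_by_step.shape)   # torch.Size([1, 3, 1, 1])
print(with_none.shape)      # torch.Size([1, 3, 1, 1])
print(scaled[0, :, 0, 0])   # tensor([0.5000, 1.5000, 2.0000])
```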

### View
The `view` method allows us to reshape a tensor while sharing the underlying data. This is useful when we want to change the shape of a tensor without copying the data. The `view` method takes the target shape as a sequence of integers, and the total number of elements must stay the same.

In [22]:
images = torch.rand(8, 3, 32, 32)  #  batch of 8 images, 3 channels (RGB), 32x32 resolution
scaling_factors = torch.tensor([0.5, 1.5, 2.0])  # Shape: (3,)

scaling_factors = scaling_factors.view(1, 3, 1, 1)
scaled_images = images * scaling_factors
print(f'Scaled images shape: {scaled_images.shape}')

Scaled images shape: torch.Size([8, 3, 32, 32])


### Expand
The `expand` method allows us to expand the singleton dimensions (dimensions of size 1) of a tensor to a larger size without copying the data. This is useful when we want a tensor to match a specific shape without allocating new memory. The `expand` method takes the target shape as a sequence of integers.

In [23]:
a = torch.rand(3,4) # Shape (3, 4)
b = torch.rand(3) # Shape (3,)

try:
    c = a + b
except RuntimeError as e:
    print(f'Error: {e}')

b_expanded = b.unsqueeze(1).expand(3, 4) # Shape (3, 1) -> (3, 4)
c = a + b_expanded
print(f'Expanded b shape: {b_expanded.shape}')
print(f'Final result shape: {c.shape}')

Error: The size of tensor a (4) must match the size of tensor b (3) at non-singleton dimension 1
Expanded b shape: torch.Size([3, 4])
Final result shape: torch.Size([3, 4])


### Reshape
The `reshape` method works similarly to the `view` method, but it can also handle tensors that are not contiguous in memory. In that case, it may create a copy of the data if the memory layout is not compatible with the requested shape. The `reshape` method takes the target shape as a sequence of integers.

In [24]:
matrix = torch.rand(5, 3)  # Shape: (5, 3)
vector = torch.rand(5)     # Shape: (5,)

# Reshape vector from (5,) to (5,1) to allow broadcasting over (5,3)
vector_reshaped = vector.reshape(5, 1)
result = matrix + vector_reshaped
print(f'Matrix shape: {matrix.shape}')
print(f'Reshaped vector shape: {vector_reshaped.shape}')

Matrix shape: torch.Size([5, 3])
Reshaped vector shape: torch.Size([5, 1])



| Method | Function |
|--------|----------|
| `unsqueeze(dim)` | Adds a singleton dimension at dim |
| `expand(*sizes)` | Expands singleton dimensions without copying data |
| `view(*sizes)` | Reshapes without copying data; fails if the memory layout is incompatible |
| `reshape(*sizes)` | Reshapes, may create a new memory copy |
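
The differences summarised in the table can be observed directly. The sketch below uses a transposed (non-contiguous) tensor to show where `view` fails and `reshape` copies, and inspects the strides of an expanded tensor to show that no data is duplicated. The variable names are illustrative.

```python
import torch

x = torch.arange(6).reshape(2, 3)   # contiguous, shape (2, 3)
xt = x.T                            # transposed, shape (3, 2), not contiguous

# reshape handles the non-contiguous tensor by copying the data
flat_copy = xt.reshape(6)

# view refuses, because the memory layout does not match the requested shape
try:
    xt.view(6)
    view_failed = False
except RuntimeError:
    view_failed = True

# view on the contiguous tensor shares memory with x
flat_view = x.view(6)
flat_view[0] = 100

# expand repeats a singleton dimension by giving it a stride of 0
col = torch.tensor([1, 2, 3]).unsqueeze(1)   # shape (3, 1)
wide = col.expand(3, 4)                      # shape (3, 4), no data copied

print(view_failed)           # True
print(x[0, 0].item())        # 100 (changed through the view)
print(flat_copy[0].item())   # 0 (the copy is unaffected)
print(wide.stride())         # (1, 0)
```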


# Data to tensors

As mentioned before, PyTorch's inherently Pythonic nature allows us to easily convert existing data structures to tensors. Thus, we can use different data science libraries to load data and convert it to tensors. We are going to use the `pandas` library to load data from CSV files and convert it to tensors.

In [25]:
data_path = Path(Path.cwd(), 'datasets')
dataset_path = download_dataset('ARKOMA',
                                dest_path=data_path,
                                extract=True)

dataset_path = dataset_path / 'Dataset on NAO Robot Arms' / 'Left Arm Dataset' / 'LTrain_x.csv'

Downloading:
ARKOMA: The Dataset to Build Neural Networks-Based Inverse Kinematics for NAO Robot Arms
> Authors: Arif Nugroho, Eko Mulyanto Yuniarno, Mauridhi Hery Purnomo
> Year: 2020
> Website: https://www.sciencedirect.com/science/article/pii/S2352340923007989



Downloading brg4dz8nbb-1.zip: 100%|██████████| 658k/658k [00:00<00:00, 4.38MiB/s]
Extracting brg4dz8nbb-1.zip: 100%|██████████| 1/1 [00:00<00:00, 248.43it/s]
Extracting Dataset on NAO Robot Arms.zip: 100%|██████████| 15/15 [00:00<00:00, 806.02it/s]


DataFrames in pandas are similar to tables in SQL or Excel. They are two-dimensional data structures that can hold different types of data. DataFrames have rows and columns, where each column can have a different data type. We can use the `pandas` library to load data from CSV files and convert it to DataFrames.
> 📚 **Documentation**: [pandas](https://pandas.pydata.org/)

In [26]:
# Read the dataset
df = pd.read_csv(dataset_path)
df.describe().T

| | count | mean | std | min | 25% | 50% | 75% | max |
|----|-------|------|-----|-----|-----|-----|-----|-----|
| Px | 6000.0 | 98.928583 | 76.226124 | -149.87 | 45.2975 | 116.345 | 161.75 | 218.63 |
| Py | 6000.0 | 179.584893 | 52.8084 | -30.44 | 145.8625 | 180.81 | 213.7625 | 315.66 |
| Pz | 6000.0 | 88.939072 | 124.252501 | -117.3 | -24.62 | 74.065 | 206.29 | 318.99 |
| Rx | 6000.0 | -4.468263 | 46.874867 | -179.42 | -32.7 | -0.91 | 26.7625 | 159.17 |
| Ry | 6000.0 | -1.739148 | 55.525642 | -171.43 | -42.1325 | -0.305 | 38.6725 | 166.35 |
| Rz | 6000.0 | 4.639302 | 32.807616 | -157.1 | -13.2125 | 4.635 | 23.5825 | 153.66 |


In [27]:
# Get the data as a numpy array
type(df.Px.values)

numpy.ndarray

In [28]:
# Convert the numpy array to a PyTorch tensor
px_tensor = torch.tensor(df.Px.values)
px_tensor, px_tensor.shape, px_tensor.ndim, px_tensor.dtype, px_tensor.device

(tensor([-27.9200, 110.5800, 180.9600,  ..., 104.3300, 122.9200, 139.4300],
        dtype=torch.float64),
 torch.Size([6000]),
 1,
 torch.float64,
 device(type='cpu'))

In [29]:
# Alternatively, we can use torch.from_numpy, which shares memory
# with the NumPy array instead of copying it
px_tensor = torch.from_numpy(df.Px.values)
px_tensor, px_tensor.shape, px_tensor.ndim, px_tensor.dtype, px_tensor.device

(tensor([-27.9200, 110.5800, 180.9600,  ..., 104.3300, 122.9200, 139.4300],
        dtype=torch.float64),
 torch.Size([6000]),
 1,
 torch.float64,
 device(type='cpu'))

In [30]:
# Create a tensor from the entire DataFrame
data = torch.tensor(df.values)
data.shape, data.ndim, data.dtype, data.device

(torch.Size([6000, 6]), 2, torch.float64, device(type='cpu'))
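
The two conversion routes used above differ in how they treat memory: `torch.tensor` copies the data, while `torch.from_numpy` shares it with the source array. The sketch below demonstrates this on a small hand-made NumPy array rather than the dataset, and also converts the result to `float32`, a common choice for model inputs. The array values and variable names are illustrative.

```python
import numpy as np
import torch

arr = np.array([1.0, 2.0, 3.0])    # float64, like the DataFrame columns above

copied = torch.tensor(arr)         # independent copy of the data
shared = torch.from_numpy(arr)     # shares memory with arr

arr[0] = -1.0                      # modify the NumPy array in place

print(copied[0].item())            # 1.0 (the copy is unaffected)
print(shared[0].item())            # -1.0 (the change is visible)

# Convert from float64 to float32; this creates a new tensor
as_float32 = shared.to(torch.float32)
print(as_float32.dtype)            # torch.float32
```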

# Using the GPU
PyTorch allows us to use the GPU to accelerate computations. This is done by moving the tensors to the GPU memory. We can do this by using the `to` method of a tensor and passing the device as an argument. The device can be either `cuda` or `cpu`. The `cuda` device refers to the GPU, while the `cpu` device refers to the CPU.

> **Note**: Not all operations are supported on the GPU. PyTorch does not silently move tensors between devices: an operation that is not implemented for the GPU typically raises an error, and the tensor has to be moved to the CPU explicitly to perform the operation there. Transfers between devices can lead to performance issues, so it is important to be aware of which operations are supported on the GPU.

## Checking for GPU availability
We can check if a GPU is available by using the `torch.cuda.is_available()` method. This method returns a boolean value that indicates whether a GPU is available or not. If a GPU is available, we can use it to accelerate computations.

## When to use the GPU
Using the GPU is beneficial when we are working with large tensors or when we are performing operations that are computationally expensive. For example, training a deep learning model on a large dataset can be accelerated by using the GPU. However, if we are working with small tensors or performing simple operations, using the CPU may be faster. 

Typically, we use the GPU for computer vision and natural language processing tasks, where the data is large and the operations are computationally expensive.

> **Note**: When choosing between the CPU and GPU, it is important to make sure that all tensors and models are on the same device. If a tensor is on the CPU and a model is on the GPU, PyTorch does not move the tensor automatically; the operation fails with a `RuntimeError`. It is important to be aware of which device each tensor and model is on, and to move them explicitly with the `to` method.


In [31]:
# Check GPU availability
if torch.cuda.is_available():
    device = torch.device('cuda')
    print(f'GPU is available: {device}')
else:
    device = torch.device('cpu')
    print(f'GPU is not available, using CPU: {device}')

# Move the tensor to the selected device
px_tensor = px_tensor.to(device)
print(f'Tensor moved to device: {px_tensor.device}')

GPU is available: cuda
Tensor moved to device: cuda:0
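
A common pattern that follows from the cell above is to choose the device once and then create or move every tensor onto it, so that the same code runs with or without a GPU. The sketch below is one way to write this; the tensor sizes and variable names are illustrative.

```python
import torch

# Pick the device once; falls back to the CPU when no GPU is present
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Create one tensor directly on the device, and move another one explicitly
x = torch.rand(100, 100, device=device)
w = torch.rand(100, 100)   # created on the CPU by default
w = w.to(device)           # .to() returns a tensor on the target device

# Both operands are on the same device, so the operation is valid
y = x @ w

# Bring the result back to the CPU, e.g. before converting it to NumPy
result = y.cpu()

print(y.device.type, result.device.type)
```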
