In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import torch
print(torch.__version__)

2.5.1+cu124


## Introduction to Tensors
Ref: https://www.learnpytorch.io/00_pytorch_fundamentals/#creating-tensors
### Creating Tensors
PyTorch tensors are created using `torch.tensor()`

In [2]:
scalar = torch.tensor(7)
scalar

tensor(7)

We can check the number of dimensions of a tensor using the `ndim` attribute.

In [3]:
scalar.ndim

0

Retrieve the value from the tensor as a plain Python number using `item()` (this only works for single-element tensors).

In [4]:
scalar.item()

7

In [5]:
# Vector
vector = torch.tensor([-5,0])
vector

tensor([-5,  0])

In [6]:
# Check the number of dimensions of the vector
vector.ndim

1

In [7]:
# Check shape of a vector
vector.shape

torch.Size([2])

In [8]:
# Matrix
MATRIX = torch.tensor([[7,8],[9,10]])
MATRIX

tensor([[ 7,  8],
        [ 9, 10]])

In [9]:
# Check the number of dimensions
MATRIX.ndim

2

In [10]:
# Check shape
MATRIX.shape

torch.Size([2, 2])

In [11]:
TENSOR = torch.tensor([[[1,2,3],[4,5,6],[7,8,9]]])
TENSOR

tensor([[[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]])

In [12]:
TENSOR.ndim

3

In [13]:
TENSOR.shape

torch.Size([1, 3, 3])

<img src="images/00-pytorch-different-tensor-dimensions.png"> 

To summarise:

| Name | What is it? | Number of dimensions | Naming convention |
|:--------:|:--------:|:--------:| :--------: |
|  scalar  |  a single number   |  0 | lowercase (`a`)   |
|  vector  |  1-dimensional array |  1  | lowercase (`y`) |
|  matrix  |  2-dimensional array |  2  | uppercase (`Q`) |
| tensor | an n-dimensional array | n | uppercase (`X`) |


<image src="images/00-scalar-vector-matrix-tensor.png" width=30% />

### Random tensors

In [14]:
# Create a random tensor of size(3, 4)
random_tensor = torch.rand(size=(3, 4))
random_tensor, random_tensor.dtype

(tensor([[0.0202, 0.3253, 0.8353, 0.3862],
         [0.7080, 0.9983, 0.1873, 0.4760],
         [0.9020, 0.1138, 0.1311, 0.4415]]),
 torch.float32)

- The flexibility of `torch.rand()` is that we can adjust the size to be whatever we want.
- For example, say you wanted a random tensor in the common image shape of `[224, 224, 3]` (`[height, width, color_channels]`).
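
A minimal sketch of the image-shaped case mentioned above; the variable name `random_image_tensor` is only illustrative:

```python
import torch

# Random tensor in the common image layout [height, width, color_channels]
random_image_tensor = torch.rand(size=(224, 224, 3))

# Values are drawn uniformly from [0, 1)
print(random_image_tensor.shape, random_image_tensor.ndim)
```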


### Zeros and Ones

In [15]:
# Create a tensor of all zeros
zeros = torch.zeros(size=(3, 5))
zeros, zeros.dtype

(tensor([[0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.]]),
 torch.float32)

In [16]:
# Create a tensor of all ones
ones = torch.ones(size=(3, 4))
ones, ones.dtype

(tensor([[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]]),
 torch.float32)

### Creating a range and tensors like
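Use `torch.arange(start, end, step)` to create a range of values from `start` (inclusive) to `end` (exclusive) in increments of `step`.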

In [17]:
zero_to_ten = torch.arange(0, 10, 1)
zero_to_ten

tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

- Sometimes you might want a tensor of a certain type with the same shape as another tensor.
- For example, a tensor of all zeros with the same shape as a previous tensor.
- To do so you can use `torch.zeros_like(input)` or `torch.ones_like(input)`, which return a tensor filled with zeros or ones respectively, in the same shape (and dtype) as `input`.

In [18]:
# Create a tensor of zeros similar to another tensor
ten_zeros = torch.zeros_like(input=zero_to_ten)
ten_zeros

tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
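
The `torch.ones_like()` counterpart works the same way. A short sketch, with `ten_ones` as an illustrative name:

```python
import torch

zero_to_ten = torch.arange(0, 10, 1)

# Same shape and dtype as zero_to_ten, but filled with ones
ten_ones = torch.ones_like(input=zero_to_ten)
print(ten_ones)
print(ten_ones.shape == zero_to_ten.shape, ten_ones.dtype == zero_to_ten.dtype)
```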

### Tensor Datatypes
- There are many different tensor datatypes available in PyTorch. 
- Some are specific to the CPU and some are better suited to the GPU.
- Getting to know which one to use can take some time.
- Generally, if you see `torch.cuda` anywhere, the tensor is being used on a GPU (since Nvidia GPUs use a computing toolkit called CUDA).
- The most common type (and generally the default) is `torch.float32` or `torch.float`. This is referred to as "32-bit floating point".
- But there's also 16-bit floating point (`torch.float16` or `torch.half`) and 64-bit floating point (`torch.float64` or `torch.double`).
- And to confuse things even more, there are also 8-bit, 16-bit, 32-bit and 64-bit integers.
- The reason for all of these has to do with precision in computing. Precision is the amount of detail used to describe a number.
- The higher the precision value (8, 16, 32, 64), the more detail, and hence data, used to express a number.
- This matters in deep learning and numerical computing because you perform so many operations: the more detail you have to calculate on, the more compute you have to use.
- So lower-precision datatypes are generally faster to compute on but can sacrifice some performance on evaluation metrics like accuracy.
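
To make the precision trade-off concrete, here is a small sketch that stores the same number in three float dtypes and compares the storage size and rounding error. The choice of `1 / 3` as the test value and the `errors` dictionary are assumptions made for illustration:

```python
import torch

value = 1 / 3  # cannot be represented exactly in binary floating point
errors = {}

for dtype in (torch.float16, torch.float32, torch.float64):
    t = torch.tensor(value, dtype=dtype)
    # element_size() is the number of bytes used per element
    errors[dtype] = abs(t.item() - value)
    print(f"{dtype}: {t.element_size()} bytes, rounding error {errors[dtype]:.2e}")
```

Fewer bytes per element means less memory and usually faster computation, at the cost of a larger rounding error.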

In [19]:
# Create a tensor with specific datatype
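# dtype=None: infer the dtype from the data (Python floats become torch.float32, the default float dtype)
# device=None: use the default device (CPU unless configured otherwise)
# requires_grad=False: do not record operations on this tensor for gradient computation
float_32_tensor = torch.tensor([3.0, 6.0, 9.0], dtype=None, device=None, requires_grad=False)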
float_32_tensor.shape, float_32_tensor.dtype, float_32_tensor.device

(torch.Size([3]), torch.float32, device(type='cpu'))

Aside from shape issues (tensor shapes don't match up), two of the other most common issues you'll come across in PyTorch are **datatype and device issues**. For example, one of your tensors is `torch.float32` and the other is `torch.float16` (*PyTorch often likes tensors to be the same format*). Or one of your tensors is on the `CPU` and the other is on the `GPU` (*PyTorch likes calculations between tensors to be on the same device*).
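
A hedged sketch of the datatype case: matrix multiplication between a `float32` and a `float16` tensor typically raises a `RuntimeError`, and converting one operand with `.to()` resolves it. The tensor names here are illustrative, and the exact error message depends on the PyTorch version:

```python
import torch

a = torch.rand(2, 3, dtype=torch.float32)
b = torch.rand(3, 2, dtype=torch.float16)

# Mixed dtypes in a matrix multiplication are usually rejected
try:
    torch.matmul(a, b)
except RuntimeError as err:
    print(f"dtype mismatch: {err}")

# Align the dtype explicitly before combining the tensors
b32 = b.to(a.dtype)
result = torch.matmul(a, b32)
print(result.dtype, result.shape)
```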

### Getting Information From tensors
Three of the most common attributes we want to find out about tensors are:
1. `shape`: What shape is the tensor? (Some operations require specific shape rules)
2. `dtype`: What datatype are the elements within the tensor stored in?
3. `device`: What device is the tensor stored on? (usually GPU or CPU)

In [20]:
some_tensor = torch.rand(3, 4)
# Find the details
print(some_tensor)
print(f'Shape: {some_tensor.shape}')
print(f'Datatype: {some_tensor.dtype}')
print(f'Device: {some_tensor.device}')

tensor([[0.9679, 0.8841, 0.3502, 0.5061],
        [0.0281, 0.3315, 0.4322, 0.7651],
        [0.4030, 0.3256, 0.0921, 0.5250]])
Shape: torch.Size([3, 4])
Datatype: torch.float32
Device: cpu


**Note:** When you run into issues in PyTorch, it's very often to do with one of the three attributes above. So when error messages show up, sing yourself a little song called "what, what, where": what shape are my tensors, what datatype are they, and where are they stored?

### Manipulating tensors
In deep learning, data (images, text, video, audio, protein structures, etc) gets represented as tensors.

A model learns by investigating those tensors and performing a series of operations (could be 1,000,000s+) on tensors to create a representation of the patterns in the input data.

These operations are often a wonderful dance between:
1. Addition
2. Subtraction
3. Multiplication (element-wise)
4. Division
5. Matrix multiplication

And that's it. Sure, there are a few more here and there, but these are the basic building blocks of neural networks.

By stacking these building blocks in the right way, you can create the most sophisticated of neural networks (just like lego!).

In [21]:
# Addition
tensor = torch.tensor([1, 2, 3])
tensor + 10

tensor([11, 12, 13])

In [22]:
# Multiply it by 10
tensor * 10

tensor([10, 20, 30])

Notice how the tensor values above didn't end up being `tensor([110, 120, 130])`. This is because the values inside the tensor don't change unless they're reassigned.

In [23]:
# Subtract and reassign
tensor = tensor - 10
tensor

tensor([-9, -8, -7])

In [24]:
# Add and reassign
tensor += 10
tensor

tensor([1, 2, 3])

PyTorch also has a bunch of built-in functions like `torch.mul()` (short for multiplication; `torch.multiply()` is an alias) and `torch.add()` to perform basic operations.

In [25]:
torch.multiply(tensor, 10)

tensor([10, 20, 30])

In [26]:
torch.add(tensor,10)

tensor([11, 12, 13])

In [27]:
# Elementwise multiplication
tensor * tensor

tensor([1, 4, 9])

### Matrix Multiplication
- PyTorch implements matrix multiplication with the `torch.matmul()` function.
- `@` is Python's matrix multiplication operator; on tensors it behaves like `torch.matmul()`.

#### Case-1: If both tensors are 1D

In [28]:
tensor = torch.tensor([1, 2, 3])
torch.matmul(tensor, tensor)

tensor(14)

In [29]:
tensor @ tensor

tensor(14)

#### Case-2: If Both are 2D

In [30]:
A = torch.rand(size=(2,2))
B = torch.rand(size=(2,2))
torch.matmul(A,B) # Matrix-Matrix Product

tensor([[1.0833, 0.6784],
        [0.8199, 0.5251]])

#### Case-3: 1D @ 2D
In this case, the first tensor is treated as a row vector by prepending a 1 to its shape. After the matrix multiplication the prepended dimension is removed.

In [31]:
a = torch.tensor([1, 2]) # Shape(2,)
B = torch.tensor([[3,4],[5,6]]) # Shape(2,2)
# To support matrix multiplication, the shape of a must be (1, 2)
# Internally a is treated as [[1,2]], now shape is (1, 2)
# Matrix multiplication is performed and the result is [[13, 16]]
# Remove the prepended dimension: [13, 16]
torch.matmul(a,B)

tensor([13, 16])

#### Case-4: 2D @ 1D
The second tensor is treated as a column vector by appending a 1 to its shape. The result is a matrix-vector product, and the appended dimension is removed afterwards.

In [32]:
A = torch.tensor([[1,2],[3,4]]) # Shape(2,2)
b = torch.tensor([5,6]) # Shape(2,)
# b is treated as [[5],[6]], now Shape(2,1)
# Multiplication result: [[17],[39]], Shape(2,1)
# Remove the appended dimension: [17, 39], Shape(2,)
A @ b

tensor([17, 39])
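
The two mixed 1D/2D cases above can be reproduced by adding and removing the extra dimension by hand. This is a sketch of the equivalent steps using `unsqueeze()` and `squeeze()`, not a description of how `torch.matmul()` is implemented internally:

```python
import torch

A = torch.tensor([[1, 2], [3, 4]])
v = torch.tensor([5, 6])

# 1D @ 2D: treat v as a row vector (1, 2), multiply, then drop the added dim
row_result = torch.matmul(v.unsqueeze(0), A).squeeze(0)

# 2D @ 1D: treat v as a column vector (2, 1), multiply, then drop the added dim
col_result = torch.matmul(A, v.unsqueeze(1)).squeeze(1)

print(row_result, torch.equal(row_result, torch.matmul(v, A)))
print(col_result, torch.equal(col_result, torch.matmul(A, v)))
```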

#### Case-5: Batched Matrix Multiplication (N-Dimensional tensors)
The rules are as follows:
1. The last two dimensions are treated as matrices.
2. All dimensions before the last two are treated as batch dimensions.
3. Batch dimensions must either:
   1. Match exactly
   2. Be broadcastable to each other
4. The output shape combines the broadcast batch dimensions with the dimensions of the resulting matrix product.

In [33]:
# Batched Matrix multiplication
A = torch.rand(2, 3, 4) # Shape: (2, 3, 4)
B = torch.rand(2, 4, 5) # Shape: (2, 4, 5)
result = torch.matmul(A, B) # Shape: (2, 3, 5)
print(result.shape)

torch.Size([2, 3, 5])


<image src="images/batched-matrix-multiplication.jpg" width=30%/>

In [34]:
# Broadcasting batch dimensions
A = torch.rand(1, 3, 4)
B = torch.rand(2, 4, 5)
result = A @ B
print(result.shape) # Shape: (2, 3, 5)

torch.Size([2, 3, 5])


<image src="images/broadcasting-batch-dimension.jpg" width=30%/>
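
One way to convince yourself of the broadcasting rule is to compare the batched result with an explicit loop over the batch. A minimal sketch; the seed and the name `looped` are illustrative:

```python
import torch

torch.manual_seed(0)
A = torch.rand(1, 3, 4)   # one matrix, batch dimension of size 1
B = torch.rand(2, 4, 5)   # two matrices

batched = A @ B           # A is broadcast across B's batch dimension

# Equivalent explicit version: multiply the single matrix in A with each matrix in B
looped = torch.stack([A[0] @ B[i] for i in range(B.shape[0])])

print(batched.shape, torch.allclose(batched, looped))
```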

1D and N-dimensional batched multiplication: the 1D tensor is treated as a row vector and multiplied with every matrix in the batch.

In [35]:
a = torch.tensor([1.0, 2.0])
B = torch.randn(2, 2, 3)
# Prepend a Dimension to a: a = [[1, 2]] Shape(1, 2)
# For the first matrix in B, matrix dimensions : (1 x 2) x (2 x 3) = (1 x 3)
# For the Second Matrix in B, matrix dimensions: (1 x 2) x (2 x 3) = (1 x 3)
# Combine result: Shape(2, 3)
result = torch.matmul(a, B) 
print(result.shape)

torch.Size([2, 3])


### Transpose
`torch.transpose(input, dim0, dim1)` swaps the two specified dimensions of the input tensor. It returns a view that rearranges how the tensor is indexed along those axes without copying or changing the underlying data.

#### Function Parameters
1. input: The tensor you want to transpose
2. dim0: The first dimension to swap
3. dim1: The second dimension to swap
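
Because the transposed tensor shares storage with its input, writing to one is visible in the other. A small sketch to illustrate this; the tensor values are arbitrary:

```python
import torch

a = torch.arange(6).reshape(2, 3)
t = torch.transpose(a, 0, 1)         # shape (3, 2)

print(t.shape)
print(t.data_ptr() == a.data_ptr())  # both tensors point at the same storage
print(t.is_contiguous())             # only the strides changed, so the view is not contiguous

t[0, 1] = 100                        # writes through to a[1, 0]
print(a)
```

If you need an independent copy, call `.clone()` (or `.contiguous()` when a contiguous layout is required) on the result.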

In [36]:
# Transpose of a matrix 
tensor = torch.tensor([[1, 2], [3, 4]])
tensor.T # Equivalent to torch.transpose(tensor, 0, 1) for a 2D tensor

tensor([[1, 3],
        [2, 4]])

In [37]:
# Swapping Axes of a 3D Tensor
a = torch.tensor([[[1,2],[3,4]],
                  [[5,6],[7,8]]])
b = torch.transpose(a, 0, 1)
b

tensor([[[1, 2],
         [5, 6]],

        [[3, 4],
         [7, 8]]])

This tensor has 3 dimensions:
- Dimension 0: 2 matrices (or slices): [[1, 2], [3, 4]] and [[5, 6], [7, 8]].
- Dimension 1: Each matrix has 2 rows.
- Dimension 2: Each row has 2 columns.
  
Before Transpose:
- dim0 (Axis 0): Represents slices [[1, 2], [3, 4]] and [[5, 6], [7, 8]].
- dim1 (Axis 1): Represents rows within each slice.

After Swapping dim0 and dim1:
- What used to be rows (dim1) now become slices (dim0).
- What used to be slices (dim0) now become rows (dim1).

In [38]:
torch.transpose(a, 0, 2)

tensor([[[1, 5],
         [3, 7]],

        [[2, 6],
         [4, 8]]])

In [39]:
torch.transpose(a, 2, 1)

tensor([[[1, 3],
         [2, 4]],

        [[5, 7],
         [6, 8]]])

### Neural networks are full of matrix multiplications and dot products.

The `torch.nn.Linear()` module, also known as a feed-forward layer or fully connected layer, implements matrix multiplication between an input `x` and a weights matrix `A`.
$$y = x \cdot A^T + b$$

Where 
- `x` is the input to the layer
- `A` is the weights matrix created by the layer. It starts out as random numbers that get adjusted as the neural network learns to better represent patterns in the data (notice the "`T`": the weights matrix gets transposed).
- `b` is the bias term used to slightly offset the weights and inputs.
- `y` is the output (a manipulation of the input in the hopes to discover patterns in it)

In [40]:
torch.manual_seed(42) # To make the things reproducible
linear = torch.nn.Linear(in_features=2, # Matches inner dimension of Input
                         out_features=6)
x = torch.tensor([[1, 2],
                  [3, 4],
                  [5, 6]], dtype=torch.float32)
output = linear(x)
print(f"Input shape: {x.shape}")
print(f"Output: \n{output}\n\nOutput shape: {output.shape}")

Input shape: torch.Size([3, 2])
Output: 
tensor([[2.2368, 1.2292, 0.4714, 0.3864, 0.1309, 0.9838],
        [4.4919, 2.1970, 0.4469, 0.5285, 0.3401, 2.4777],
        [6.7469, 3.1648, 0.4224, 0.6705, 0.5493, 3.9716]],
       grad_fn=<AddmmBackward0>)

Output shape: torch.Size([3, 6])
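
The formula for the linear layer can be checked by hand using the layer's `weight` and `bias` attributes. A short sketch; `manual` is an illustrative name:

```python
import torch

torch.manual_seed(42)
linear = torch.nn.Linear(in_features=2, out_features=6)
x = torch.tensor([[1, 2],
                  [3, 4],
                  [5, 6]], dtype=torch.float32)

# linear.weight has shape (out_features, in_features) = (6, 2), hence the transpose
manual = x @ linear.weight.T + linear.bias

print(linear.weight.shape, linear.bias.shape)
print(torch.allclose(linear(x), manual))
```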


**Question**: What happens if you change `in_features` from 2 to 3 above? Does it error? How could you change the shape of the input (`x`) to resolve the error? One option is to transpose the input so that its last dimension is 3, as in the next cell.

In [41]:
torch.manual_seed(42) # To make the things reproducible
linear = torch.nn.Linear(in_features=3, # Matches inner dimension of Input
                         out_features=6)
x = torch.tensor([[1, 2],
                  [3, 4],
                  [5, 6]], dtype=torch.float32)
output = linear(x.T)
print(f"Input shape: {x.T.shape}") # Shape of the transposed input that is actually passed to the layer
print(f"Output: \n{output}\n\nOutput shape: {output.shape}")

Input shape: torch.Size([2, 3])
Output: 
tensor([[0.9332, 0.8805, 3.0149, 1.5545, 1.8186, 2.0634],
        [1.7186, 1.4009, 3.5818, 1.7408, 2.6017, 2.5123]],
       grad_fn=<AddmmBackward0>)

Output shape: torch.Size([2, 6])
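
To see the error the question refers to, you can pass the untransposed input to the layer and catch the exception. A sketch; the `raised` flag is only there to record what happened:

```python
import torch

linear = torch.nn.Linear(in_features=3, out_features=6)
x = torch.tensor([[1, 2],
                  [3, 4],
                  [5, 6]], dtype=torch.float32)  # shape (3, 2)

try:
    linear(x)  # last dimension is 2, but the layer expects 3
    raised = False
except RuntimeError as err:
    raised = True
    print(f"Shape mismatch: {err}")

# x.T has shape (2, 3), so its last dimension matches in_features
print(linear(x.T).shape)
```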


### Finding the min, max, mean, sum, etc (aggregation)

In [42]:
# Create a tensor
x = torch.arange(0, 100, 10)
x

tensor([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90])

#### Find the max, min, mean and sum

In [43]:
print(f"Minimum: {x.min()}")
print(f"Maximum: {x.max()}")
print(f"Mean: {x.type(torch.float32).mean()}") # mean() requires a floating point (or complex) dtype, so convert from int64 first
print(f"Sum: {x.sum()}")

Minimum: 0
Maximum: 90
Mean: 45.0
Sum: 450
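
The datatype requirement for the mean can be seen directly by calling `mean()` on the integer tensor. A sketch showing the failure and two ways around it:

```python
import torch

x = torch.arange(0, 100, 10)   # dtype is torch.int64

try:
    x.mean()
except RuntimeError as err:
    print(f"mean() failed on {x.dtype}: {err}")

# Two ways to get a floating-point mean
mean_via_cast = x.float().mean()
mean_via_dtype = torch.mean(x, dtype=torch.float32)
print(mean_via_cast, mean_via_dtype)
```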


### Positional min/max
You can also find the index of a tensor where the max or minimum occurs with `torch.argmax()` and `torch.argmin()` respectively.

In [44]:
# Return index of max and min values
print(f"Index where max value occurs: {x.argmax()}")
print(f"Index where min value occurs: {x.argmin()}")

Index where max value occurs: 9
Index where min value occurs: 0
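
The returned index can be used to look the value back up, and for multi-dimensional tensors the `dim` argument selects the axis to search along. A small sketch; the matrix `m` is an illustrative example:

```python
import torch

x = torch.arange(0, 100, 10)
max_idx = x.argmax()
print(x[max_idx], x.max())   # indexing with the argmax recovers the maximum

m = torch.tensor([[1, 9, 3],
                  [7, 2, 8]])
row_argmax = m.argmax(dim=1)   # column index of the largest value in each row
flat_argmax = m.argmax()       # index into the flattened tensor
print(row_argmax, flat_argmax)
```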


### Change tensor datatype
As mentioned, a common issue with deep learning operations is having your tensors in different datatypes.

If one tensor is in `torch.float64` and another is in `torch.float32`, you might run into some errors.

But there's a fix.

You can change the datatypes of tensors using `torch.Tensor.type(dtype=None)` where the `dtype` parameter is the datatype you'd like to use.

In [45]:
# Create a tensor and check the datatype
tensor = torch.arange(10.,100.,10.)
tensor.dtype

torch.float32

In [46]:
# Create a float16 tensor from the above tensor
tensor_float16 = tensor.type(torch.float16)
tensor_float16

tensor([10., 20., 30., 40., 50., 60., 70., 80., 90.], dtype=torch.float16)

In [47]:
# Create an int8 tensor
tensor_int8 = tensor.type(torch.int8)
tensor_int8

tensor([10, 20, 30, 40, 50, 60, 70, 80, 90], dtype=torch.int8)
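
`Tensor.to()` and shorthand methods such as `Tensor.half()` are alternative ways to perform the same conversion. A brief sketch comparing them:

```python
import torch

tensor = torch.arange(10., 100., 10.)

via_type = tensor.type(torch.float16)
via_to = tensor.to(torch.float16)
via_half = tensor.half()

print(via_to.dtype, torch.equal(via_type, via_to), torch.equal(via_to, via_half))
print(tensor.dtype)  # the original tensor keeps its dtype; conversions return a new tensor
```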

`Exercise:` So far we've covered a fair few tensor methods, but there are many more in the `torch.Tensor` [documentation](https://pytorch.org/docs/stable/tensors.html). I'd recommend spending 10 minutes scrolling through and looking into any that catch your eye. Click on them and then write them out in code yourself to see what happens.

In [48]:
# Tensor.real
# Returns a new tensor containing the real values of the self tensor for a complex-valued input tensor.
# The returned tensor and self share the same underlying storage.
# Returns self if self is a real-valued tensor.
x = torch.randn(4, dtype=torch.cfloat)
print(f"x: {x}")
print(f"x_real: {x.real}")

x: tensor([ 0.7851-1.1949j, -0.6993+0.6774j,  0.9349+0.5778j, -0.5415-0.5308j])
x_real: tensor([ 0.7851, -0.6993,  0.9349, -0.5415])


In [49]:
# torch.abs()
torch.abs(torch.tensor([-1, -2, 3]))

tensor([1, 2, 3])

In [50]:
# torch.acos(): Computes the inverse cosine of each element in input.
# Inputs outside [-1, 1] produce nan, which can happen with torch.randn().
a = torch.randn(4)
inverse_cos_a = torch.acos(a)
print(f"a: {a}")
print(f"Inverse_Cos_of_a: {inverse_cos_a}")

a: tensor([ 1.3525,  0.6863, -0.3278,  0.7950])


In [51]:
# torch.addbmm(input, batch1, batch2, *, beta=1, alpha=1, out=None)
# Performs a batch matrix-matrix product of matrices stored in batch1 and batch2,
# with a reduced add step (all matrix multiplications get accumulated along the first dimension).
# input is added to the final result.
# Ref: https://pytorch.org/docs/stable/generated/torch.addbmm.html#torch.addbmm
M = torch.randn(3, 5)
batch1 = torch.randn(10, 3, 4)
batch2 = torch.randn(10, 4, 5)
torch.addbmm(M, batch1, batch2)

tensor([[ 4.1059e+00, -3.1359e-03, -2.0432e+00,  6.6354e-01,  6.4135e+00],
        [ 2.7884e+00,  5.5228e-01, -4.5921e+00,  8.5628e+00,  4.4435e+00],
        [-2.5552e+00,  6.3446e+00, -9.0685e+00,  4.4779e+00,  2.1720e+00]])

In [52]:
# torch.equal(input, other)
# True if two tensors have the same size and elements, False otherwise.

torch.equal(torch.tensor([1,2]), torch.tensor([1, 2]))

True

### Reshaping, stacking, squeezing and unsqueezing
<image src="images/popular_methods_reshape_view.png" />

Why do any of these?

Because deep learning models (neural networks) are all about manipulating tensors in some way. And because of the rules of matrix multiplication, if you've got shape mismatches, you'll run into errors. These methods help you make sure the right elements of your tensors are mixing with the right elements of other tensors.
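
A compact sketch of these shape-manipulation methods in action. The tensor `x` and the other variable names are illustrative, and `permute()` is included as a common companion to the four methods named in the heading:

```python
import torch

x = torch.arange(1., 8.)                 # shape (7,)

reshaped = x.reshape(1, 7)               # same elements, new shape (1, 7)
stacked = torch.stack([x, x, x], dim=0)  # new leading dimension: (3, 7)
squeezed = reshaped.squeeze()            # drops all size-1 dimensions: (7,)
unsqueezed = squeezed.unsqueeze(dim=0)   # adds a size-1 dimension at position 0: (1, 7)

# permute reorders dimensions, e.g. [height, width, channels] -> [channels, height, width]
image = torch.rand(224, 224, 3)
permuted = image.permute(2, 0, 1)

for name, t in [("reshaped", reshaped), ("stacked", stacked), ("squeezed", squeezed),
                ("unsqueezed", unsqueezed), ("permuted", permuted)]:
    print(f"{name}: {tuple(t.shape)}")
```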

In [53]:
torch.cuda.is_available()

True

In [54]:
# Set device type
device = "cuda" if torch.cuda.is_available() else "cpu"
device

'cuda'

In [55]:
# Count number of devices
torch.cuda.device_count()

1
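
With a `device` string like the one above, tensors can be moved with `Tensor.to()`. A minimal device-agnostic sketch that also runs on a CPU-only machine; the variable names are illustrative:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

tensor = torch.tensor([1, 2, 3])
tensor_on_device = tensor.to(device)   # returns a tensor on the target device
print(tensor.device, tensor_on_device.device)

# NumPy works on CPU memory, so move the tensor back before converting
back_on_cpu = tensor_on_device.cpu().numpy()
print(back_on_cpu)
```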