# Tensors

## Introduction to Tensors

At the core of PyTorch lies the `Tensor` object, which serves as the fundamental data structure for all computations. Tensors are multidimensional arrays, conceptually similar to NumPy arrays, but with added capabilities tailored for deep learning and high-performance computing.

PyTorch tensors support a wide variety of operations, including arithmetic, indexing, reshaping, and broadcasting. More importantly, they can be transferred seamlessly between the CPU and GPU, allowing for accelerated computation with minimal code changes.

In practice, tensors represent everything from scalar values to high-dimensional data such as images, sequences, and model parameters. Understanding tensors and how to manipulate them efficiently is essential for working with PyTorch and developing neural network models.

PyTorch tensors are created using [`torch.tensor`](https://pytorch.org/docs/stable/tensors.html). For example, we may create the simplest tensor – a scalar – in the following way:


In [2]:
import torch
# scalar
scalar = torch.tensor(2)
scalar

tensor(2)

A scalar is a $0$-dimensional tensor. We can check its number of dimensions with the `ndim` attribute:

In [70]:
scalar.ndim


0

To extract the value of a single-element tensor as a plain Python number, we use the `item()` method:

In [71]:
scalar.item()

2

The next structure is a vector, i.e. a $1$-dimensional tensor (this one holds $2$ elements). We can create it by passing a plain Python list to the `torch.tensor()` constructor:

In [72]:
# vector
vector = torch.tensor([2, 7])
vector

tensor([2, 7])

Note: be careful not to confuse `torch.tensor()` with `torch.Tensor()`. The latter is a lower-level class constructor: when given shape arguments (e.g., `torch.Tensor(2, 3)`), it creates an *uninitialized* tensor whose values are whatever happened to be in memory, which can lead to unintended behavior. So, always use `torch.tensor()` when starting from data.
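A quick way to see the difference, as a minimal sketch: with data, `torch.tensor()` infers the shape and dtype from the list, while `torch.Tensor()` with integer arguments interprets them as a shape and returns an uninitialized `float32` tensor (assuming the default dtype has not been changed with `torch.set_default_dtype()`).

```python
import torch

# torch.tensor() builds a tensor from data and infers the dtype from it
from_data = torch.tensor([2, 3])
print(from_data.shape, from_data.dtype)  # torch.Size([2]) torch.int64

# torch.Tensor(2, 3) treats the arguments as a *shape*: this allocates an
# uninitialized 2x3 float32 tensor, so its values are arbitrary
uninitialized = torch.Tensor(2, 3)
print(uninitialized.shape, uninitialized.dtype)  # torch.Size([2, 3]) torch.float32
```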


Tensor dimensions and shapes are very important in PyTorch. When manipulating tensors, we must pay close attention to their shapes, or we may run into errors or silent miscalculations.


In [73]:
vector.ndim


1

The number of dimensions is different from the shape, which gives the size of the tensor along each dimension:

In [74]:
vector.shape


torch.Size([2])

We can step things up and create a matrix:

In [75]:
matrix = torch.tensor([[1, 2], [7, 8]])
matrix

tensor([[1, 2],
        [7, 8]])

In [76]:
print(f"matrix dims: {matrix.ndim}")
print(f"matrix shape: {matrix.shape}")


matrix dims: 2
matrix shape: torch.Size([2, 2])


This tells us that our matrix is 2-dimensional, and in fact it's a 2×2 square matrix. How about a rectangular matrix? Very simple:

In [77]:
rect_matrix = torch.tensor([[1, 2], [3, 4], [4, 5]])
print(f"matrix dims: {rect_matrix.ndim}")
print(f"matrix shape: {rect_matrix.shape}")


matrix dims: 2
matrix shape: torch.Size([3, 2])


This time the matrix has $3$ rows of $2$ elements each, so its shape is $3\times2$.  
Let's see how this extends to tensors with more than two dimensions:

In [78]:
# TENSOR
TENSOR = torch.tensor([[[1, 2, 3],
                        [3, 6, 9],
                        [2, 4, 5],
                        [6, 7, 8]]])
TENSOR


tensor([[[1, 2, 3],
         [3, 6, 9],
         [2, 4, 5],
         [6, 7, 8]]])

In [79]:
print(f"Tensor dim: {TENSOR.ndim}")


Tensor dim: 3


In [80]:
print(f"Tensor shape: {TENSOR.shape}")


Tensor shape: torch.Size([1, 4, 3])


Why this shape? We can think of tensors as stacks of matrices. Reading the shape from left to right: the outermost dimension holds 1 matrix, which has 4 rows of 3 elements each.

Understanding shapes takes some practice, but once you get used to it, it becomes second nature.

For example, a 4-dimensional tensor of shape `[2, 3, 3, 3]` could represent 2 images, each of size 3×3 pixels, with 3 colour channels (RGB).
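To make the relationship between `ndim`, `shape` and the total number of elements concrete, here is a short sketch covering the tensors from this section plus a batch of 2 random RGB images of 3×3 pixels. The `[batch, colour_channels, height, width]` layout for the images is an assumption for illustration.

```python
import torch

examples = {
    "scalar": torch.tensor(2),
    "vector": torch.tensor([2, 7]),
    "matrix": torch.tensor([[1, 2], [3, 4], [4, 5]]),
    "TENSOR": torch.tensor([[[1, 2, 3], [3, 6, 9], [2, 4, 5], [6, 7, 8]]]),
    # assumed layout: [batch, colour_channels, height, width]
    "images": torch.rand(2, 3, 3, 3),
}

for name, t in examples.items():
    # ndim is the length of shape; numel() is the product of its entries
    print(f"{name:>6}: ndim={t.ndim}, shape={tuple(t.shape)}, elements={t.numel()}")
```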


## Random Tensors

Random tensors are very important in PyTorch workflows because of the way neural networks learn: they start from randomly initialized weights and adapt these values through training (by minimizing a loss function). So the first step is usually to initialize the weights randomly.

To create random tensors we can use [torch.rand()](https://docs.pytorch.org/docs/main/generated/torch.rand.html), which fills a tensor of the requested shape with values sampled uniformly from the interval $[0, 1)$.

In [81]:
random_tensor = torch.rand(3, 4)
print(f"random tensor dimension: {random_tensor.ndim}") # dims will be 2
print(f"random tensor shape: {random_tensor.shape}") # shape will be torch.Size([3, 4])

random tensor dimension: 2
random tensor shape: torch.Size([3, 4])
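Random values change on every run, which can make debugging awkward. A minimal sketch, using `torch.manual_seed()` (not covered in the text above), showing that reseeding reproduces the same "random" tensor and that `torch.rand()` values stay in $[0, 1)$:

```python
import torch

# Seeding the random number generator makes "random" tensors reproducible
torch.manual_seed(42)
a = torch.rand(3, 4)
torch.manual_seed(42)
b = torch.rand(3, 4)

print(torch.equal(a, b))                 # True: same seed, same values
print(bool(((a >= 0) & (a < 1)).all()))  # True: torch.rand samples from [0, 1)
```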


A common use of tensors is encoding images. An image is usually represented as a tensor of shape `[colour_channels, height, width]`. For an RGB image of size `224 x 224` we would use something like this:

![Example of encoding an RGB image](imgs/00-tensor-shape-example-of-image.png)
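A minimal sketch of the shape described above, using random values in place of real pixel data. The extra batch dimension added with `unsqueeze(0)` is an assumption about what a typical model expects, not something the figure shows.

```python
import torch

# A hypothetical random "image": 3 colour channels (RGB), 224 pixels high, 224 wide
random_image = torch.rand(size=(3, 224, 224))
print(random_image.shape, random_image.ndim)  # torch.Size([3, 224, 224]) 3

# Many models expect a batch of images, so a leading batch dimension is added
batch = random_image.unsqueeze(0)
print(batch.shape)  # torch.Size([1, 3, 224, 224])
```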

## 0's and 1's Tensors
These are useful for creating masks.


In [82]:
# create a tensor of all zeros
zeros = torch.zeros(3,4)
zeros

tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])

In [83]:
# create a tensor of all ones
ones = torch.ones(3,4)
ones

tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])
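To illustrate the masking use mentioned above, here is a minimal sketch: a hypothetical mask built from `torch.ones()` with some entries set to zero, multiplied element-wise with the data so that only the first two columns survive.

```python
import torch

data = torch.arange(12, dtype=torch.float32).reshape(3, 4)

# Hypothetical mask: start from all ones, then zero out the last two columns
mask = torch.ones(3, 4)
mask[:, 2:] = 0

masked = data * mask  # element-wise: positions where mask == 0 become 0
print(masked)
```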

Note: factory functions such as `torch.zeros()`, `torch.ones()` and `torch.rand()` create tensors with PyTorch's default floating-point type, `torch.float32`. To check a tensor's type, use its `.dtype` attribute. We will see more about types later on.

In [84]:
ones.dtype

torch.float32
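The `float32` default applies to factory functions and to floating-point data; when you pass integer data to `torch.tensor()`, or integer arguments to `torch.arange()`, the dtype is inferred as `int64`. A quick check:

```python
import torch

print(torch.tensor([2, 7]).dtype)      # torch.int64   (inferred from Python ints)
print(torch.tensor([2.0, 7.0]).dtype)  # torch.float32 (default floating-point type)
print(torch.ones(3, 4).dtype)          # torch.float32 (factory default)
print(torch.arange(1, 11).dtype)       # torch.int64   (integer arguments)
```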

## Creating ranges of tensors and tensors-like

In [85]:
# use torch.arange() (the older torch.range() is deprecated)
one_to_Ten = torch.arange(1,11)
one_to_Ten

tensor([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

Its parameters are `start`, `end` and `step`. The sequence stops *before* `end` (so with an integer step of $1$ the last value is `end - 1`). By default the step is $1$, but we can change it:

In [86]:
zero_to_hundred = torch.arange(start=0, end=100, step=2)
zero_to_hundred

tensor([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34,
        36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70,
        72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98])
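`torch.arange()` also accepts floating-point arguments, and `end` is still excluded. A short sketch:

```python
import torch

# end is exclusive, also for floating-point steps
quarters = torch.arange(0, 1, 0.25)
print(quarters)        # tensor([0.0000, 0.2500, 0.5000, 0.7500])
print(quarters.dtype)  # torch.float32
```

With non-integer steps, floating-point rounding can affect whether a value close to `end` is included, so `torch.linspace()` (which takes the number of points instead of a step) may be the safer choice in that case.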

Another useful family is the `*_like()` functions. For example, `torch.zeros_like()` creates a tensor with the same shape as an existing tensor (and, by default, the same dtype and device), filled with zeros. There are many variants: you can fill it with a chosen value (`torch.full_like()`), with random numbers (`torch.rand_like()`), and so on.

In [87]:
ten_zeros = torch.zeros_like(one_to_Ten)
ten_zeros

tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
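A short sketch of the other `*_like()` variants mentioned above. Since `one_to_Ten` is an integer tensor and `torch.rand_like()` produces floats in $[0, 1)$, we convert it to a floating-point tensor first.

```python
import torch

one_to_Ten = torch.arange(1, 11)  # int64 tensor of shape (10,)

ones = torch.ones_like(one_to_Ten)       # same shape and dtype, filled with 1
sevens = torch.full_like(one_to_Ten, 7)  # same shape and dtype, filled with 7
# rand_like samples floats in [0, 1), so we give it a floating-point tensor
randoms = torch.rand_like(one_to_Ten.float())

print(ones)
print(sevens)
print(randoms.shape, randoms.dtype)  # torch.Size([10]) torch.float32
```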

## Tensor datatypes

The default floating-point type is `torch.float32`, but we can change it by setting the `dtype` argument explicitly. See [torch.dtype](https://docs.pytorch.org/docs/stable/tensor_attributes.html#torch-dtype).

In [88]:
# Float 32 tensor
float_32_tensor = torch.rand(2, 6)
float_32_tensor

tensor([[0.3513, 0.6349, 0.8420, 0.9925, 0.9101, 0.7914],
        [0.1295, 0.1268, 0.9806, 0.1038, 0.3814, 0.0941]])

In [89]:
float_32_tensor.dtype

torch.float32

In [90]:
# Float 16 tensor
float_16_tensor = torch.rand((2, 6), dtype=torch.float16)
float_16_tensor.dtype

torch.float16

Or, if you are creating a tensor from existing data, you can specify the type in the same way:

In [91]:
data = [3, 6, 9, 10]
tensor = torch.tensor(data, dtype=torch.float16)
tensor 

tensor([ 3.,  6.,  9., 10.], dtype=torch.float16)

The higher the precision (8, 16, 32 or 64 bits), the more bits are used to represent each number, and hence the more detail it can express.

This matters in deep learning and numerical computing because models perform an enormous number of operations: the more detail each number carries, the more memory and compute every operation needs.

So lower-precision datatypes are generally faster to compute with and use less memory, but they can sacrifice some performance on evaluation metrics like accuracy.
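A minimal sketch of this trade-off: storing $1/3$ in three floating-point types and comparing the memory per element (`element_size()`, in bytes) with the rounding error relative to Python's own float (which is a 64-bit float).

```python
import torch

results = {}
for dtype in (torch.float16, torch.float32, torch.float64):
    x = torch.tensor(1 / 3, dtype=dtype)
    error = abs(x.item() - 1 / 3)  # rounding error w.r.t. Python's float (float64)
    results[dtype] = (x.element_size(), error)
    print(f"{str(dtype):14} bytes per element: {x.element_size()}  rounding error: {error:.1e}")
```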



`dtype` is one of the three main arguments of `torch.tensor()`. The other two are just as important, if not more so, and we will see more about them later on: `device`, which specifies on which device (CPU or GPU) the tensor is created, and `requires_grad`, which specifies whether PyTorch should record the operations performed on the tensor so that gradients can be computed during backpropagation.

In [3]:
# Default datatype for floating-point tensors is float32
float_32_tensor = torch.tensor([3.0, 6.0, 9.0],
                               dtype=None, # dtype of the tensor; None means it is inferred from the data (float32 for Python floats)
                               device='cpu', # What device is your tensor on
                               requires_grad=False) # whether or not to track gradients for backpropagation (defaults to False)

float_32_tensor.dtype, float_32_tensor.device, float_32_tensor.requires_grad

(torch.float32, device(type='cpu'), False)
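A minimal sketch of the other two arguments in action. It assumes a CUDA GPU as the only accelerator (other backends are not covered here) and falls back to the CPU otherwise; the `requires_grad` part computes the derivative of $y = w^2$ at $w = 3$, which is $2w = 6$.

```python
import torch

# Pick the GPU if one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.tensor([3.0, 6.0, 9.0], device=device)
print(x.device)

# requires_grad=True asks PyTorch to record operations for backpropagation
w = torch.tensor(3.0, requires_grad=True)
y = w ** 2     # y = w^2
y.backward()   # computes dy/dw = 2 * w
print(w.grad)  # tensor(6.)
```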

Operations between tensors of different types often do not raise an error: PyTorch applies *type promotion*, converting the operands to a common type (for example, adding an `int64` tensor to a `float32` tensor gives a `float32` result). Nonetheless, you should always be careful to *perform operations between tensors of the same type*, to avoid surprises such as lost precision or unexpected dtypes.
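A minimal sketch of how PyTorch picks the result type when mixing dtypes. Note that the result is not always `float32`: it depends on the operands.

```python
import torch

ints = torch.tensor([1, 2, 3])                            # torch.int64
f32 = torch.tensor([0.5, 0.5, 0.5])                       # torch.float32
f16 = torch.tensor([0.5, 0.5, 0.5], dtype=torch.float16)  # torch.float16

print((ints + f32).dtype)  # torch.float32
print((ints + f16).dtype)  # torch.float16 (the floating-point operand's type wins)
print((f16 + f32).dtype)   # torch.float32 (the wider float wins)
print((ints + 0.5).dtype)  # torch.float32 (a Python float promotes to the default float type)
```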

## Getting Information from Tensors

Once you've created tensors (or someone else or a PyTorch module has created them for you), you might want to get some information from them.

We've seen these before, but three of the most common attributes you'll want to find out about tensors are:
- `shape` - what shape is the tensor? (some operations require specific shape rules)
- `dtype` - what datatype are the elements within the tensor stored in?
- `device` - what device is the tensor stored on? (usually GPU or CPU) 

Let's create a random tensor and find out details about it.

In [4]:
# Create a tensor
some_tensor = torch.rand(3, 4)

# Find out details about it
print(some_tensor)
print(f"Shape of tensor: {some_tensor.shape}")
print(f"Datatype of tensor: {some_tensor.dtype}")
print(f"Device tensor is stored on: {some_tensor.device}") # will default to CPU

tensor([[0.3923, 0.4521, 0.5415, 0.3870],
        [0.9300, 0.6538, 0.9350, 0.3785],
        [0.0193, 0.2008, 0.8574, 0.0714]])
Shape of tensor: torch.Size([3, 4])
Datatype of tensor: torch.float32
Device tensor is stored on: cpu


## Basic Operations and Broadcasting

In deep learning, data (images, text, video, audio, protein structures, etc) gets represented as tensors.

A model learns by performing a long series of operations (often millions of them) on those tensors to build a representation of the patterns in the input data.

These operations are often a wonderful dance between:

- Addition
- Subtraction
- Multiplication (element-wise)
- Division
- Matrix multiplication

And that's it. Sure there are a few more here and there but these are the basic building blocks of neural networks.

Let's start with a few of the fundamental operations: addition (`+`), subtraction (`-`) and multiplication (`*`).

In [5]:
# Create a tensor of values and add a number to it
tensor = torch.tensor([1, 2, 3])
tensor + 10

tensor([11, 12, 13])

In [7]:
# Multiply it by 10
tensor * 10

tensor([10, 20, 30])

Notice how the tensor values above didn't end up being `tensor([110, 120, 130])`, this is because the values inside the tensor don't change unless they're reassigned.

In [8]:
# Tensors don't change unless reassigned
tensor

tensor([1, 2, 3])

Let's subtract a number and this time we'll reassign the tensor variable.

In [10]:
# Subtract and reassign
tensor = tensor - 10
tensor

tensor([-9, -8, -7])

In [12]:
# Add and reassign
tensor = tensor + 10
tensor

tensor([1, 2, 3])
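The operators above also have function equivalents (`torch.add()`, `torch.sub()`, `torch.mul()`, `torch.div()`). As a small sketch, this also shows the exception to "tensors don't change unless reassigned": in-place methods whose names end in an underscore, such as `add_()`, modify the tensor directly.

```python
import torch

tensor = torch.tensor([1, 2, 3])
print(torch.add(tensor, 10))  # tensor([11, 12, 13])
print(torch.sub(tensor, 10))  # tensor([-9, -8, -7])
print(torch.mul(tensor, 10))  # tensor([10, 20, 30])
print(torch.div(tensor, 2))   # tensor([0.5000, 1.0000, 1.5000]): true division returns floats

# Methods ending in "_" work in place: they modify the tensor without reassignment
tensor.add_(10)
print(tensor)                 # tensor([11, 12, 13])
```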


### Note on Broadcasting 

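Broadcasting allows PyTorch to perform operations between tensors of different shapes by automatically expanding the smaller tensor to match the shape of the larger one (conceptually: the data is not actually copied). Two shapes are compatible if, comparing them from the last dimension backwards, each pair of sizes is either equal or one of them is $1$; a missing dimension counts as $1$.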

For example, in the operation we saw above:
```python
tensor = torch.tensor([1, 2, 3])
tensor + 10
```

PyTorch is *broadcasting* the scalar `10` across all elements of the tensor. Internally, it's as if it expanded `10` to match the shape of tensor:

In [15]:
# Equivalent to:
torch.tensor([1, 2, 3]) + torch.tensor([10, 10, 10])

tensor([11, 12, 13])

#### Broadcasting a Column Vector to a Matrix
Let's see another example:


In [19]:
# A 3x3 matrix
matrix = torch.tensor([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

# A column vector (3 rows, 1 column)
col_vector = torch.tensor([
    [10],
    [20],
    [30]
])

# Broadcasting: adds col_vector to each column of the matrix
result = matrix + col_vector
print(result)


tensor([[11, 12, 13],
        [24, 25, 26],
        [37, 38, 39]])


In this example, we add a 3×1 column vector to a 3×3 matrix. PyTorch automatically broadcasts the column vector along the second dimension (columns), so it gets added to each column of the matrix: every element in row $i$ is increased by the $i$-th entry of the vector.

- Matrix shape: `(3, 3)`
- Column vector shape: `(3, 1)`

PyTorch expands the column vector to match the shape of the matrix:

```
[[10, 10, 10],
 [20, 20, 20],
 [30, 30, 30]]
```


And then adds it element-wise to the matrix.
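A short sketch of the compatibility rule from the broadcasting note: a 1D tensor lines up with the *last* dimension, so it is added to every row, and `torch.broadcast_shapes()` computes the resulting shape without doing any arithmetic. The last case fails because the trailing sizes (3 and 2) are neither equal nor 1.

```python
import torch

matrix = torch.tensor([[1, 2, 3],
                       [4, 5, 6],
                       [7, 8, 9]])

# A 1D tensor of shape (3,) is treated as a row of shape (1, 3),
# so it is added to every row of the matrix
row_vector = torch.tensor([10, 20, 30])
print(matrix + row_vector)

# torch.broadcast_shapes computes the result shape without doing any arithmetic
print(torch.broadcast_shapes((3, 3), (3, 1)))  # torch.Size([3, 3])
print(torch.broadcast_shapes((3, 3), (3,)))    # torch.Size([3, 3])

# Trailing sizes 3 and 2 are neither equal nor 1, so this cannot broadcast
broadcast_failed = False
try:
    matrix + torch.tensor([1, 2])
except RuntimeError as err:
    broadcast_failed = True
    print("RuntimeError:", err)
```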

## Matrix Multiplication (*dot product*)

One of the most common operations in machine learning and deep learning algorithms (like neural networks) is matrix multiplication. It is closely related to the *dot product*: each entry of the result is the dot product of a row of the first matrix with a column of the second.

PyTorch implements matrix multiplication in the [torch.matmul()](https://docs.pytorch.org/docs/stable/generated/torch.matmul.html#torch.matmul) function. 

The main two rules for matrix multiplication to remember are:

The inner dimensions must match:
```
(3, 2) @ (3, 2) # won't work
(2, 3) @ (3, 2) # will work
(3, 2) @ (2, 3) # will work
```
The resulting matrix has the shape of the outer dimensions:
```
(2, 3) @ (3, 2) -> (2, 2)
(3, 2) @ (2, 3) -> (3, 3)
```
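Both rules can be checked directly with random matrices. This sketch also verifies the dot-product view of matrix multiplication for one entry of the result.

```python
import torch

A = torch.rand(2, 3)
B = torch.rand(3, 2)

print(torch.matmul(A, B).shape)  # torch.Size([2, 2]): outer dimensions of (2, 3) @ (3, 2)
print(torch.matmul(B, A).shape)  # torch.Size([3, 3]): outer dimensions of (3, 2) @ (2, 3)

# Each entry of the result is the dot product of a row of A with a column of B
C = torch.matmul(A, B)
print(torch.allclose(C[0, 1], (A[0, :] * B[:, 1]).sum()))  # True
```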

> **Note:** `@` is Python's operator for matrix multiplication. For tensors it calls `torch.matmul()`, so `A @ B` and `torch.matmul(A, B)` are equivalent and you can use whichever reads better.

> **Note:** `torch.mm()` is similar to `torch.matmul()`, with one important difference: `torch.mm()` works *only with 2D matrices* and does not broadcast, while `torch.matmul()` also accepts 1D vectors and higher-dimensional (batched) tensors, broadcasting the batch dimensions when needed.



Let's create a tensor and perform element-wise multiplication and matrix multiplication on it.

In [31]:
tensor = torch.tensor([1, 2, 3])
tensor.shape

torch.Size([3])

In [30]:
# Element-wise multiplication
tensor * tensor

tensor([1, 4, 9])

In [34]:
# Matrix multiplication (for two 1D tensors this is the dot product)
torch.matmul(tensor, tensor)

tensor(14)

In [33]:
# Can also use the "@" operator, which calls torch.matmul() under the hood
tensor @ tensor

tensor(14)

In [None]:
torch.mm(tensor, tensor)    # this won't work

RuntimeError: self must be a matrix

In [40]:
# to use .mm() with vectors we should manually change shapes (what .matmul() does automatically)
tensor1 = tensor.reshape(1,3)
tensor2 = tensor.reshape(3,1)
result = torch.mm(tensor1, tensor2) # shape (1,1)

print(f"result: {result}")  #tensor([[14]])
print(f"result value: {result.item()}") # 14 

result: tensor([[14]])
result value: 14


To summarize, for our `tensor` variable with values `[1, 2, 3]`:

| Operation | Calculation | Code |
|-----------|-------------|------|
| Element-wise multiplication | `[1*1, 2*2, 3*3] = [1, 4, 9]` | `tensor * tensor` |
| Matrix multiplication (dot product) | `1*1 + 2*2 + 3*3 = 14` | `torch.matmul(tensor, tensor)` |


### Transposition and Reshaping

Because much of deep learning consists of multiplying and otherwise operating on matrices, and matrix multiplication has strict rules about which shapes can be combined, one of the most common errors you'll run into in deep learning is a shape mismatch.

Of course if we are creating tensors "on the fly" we could just create them with matching shapes. But what if these tensors already exist and we need to multiply them? For example:

In [41]:
# Shapes need to be compatible for matrix multiplication
tensor_A = torch.tensor([[1, 2],
                         [3, 4],
                         [5, 6]], dtype=torch.float32)

tensor_B = torch.tensor([[7, 10],
                         [8, 11], 
                         [9, 12]], dtype=torch.float32)

torch.matmul(tensor_A, tensor_B) # (this will error)

RuntimeError: mat1 and mat2 shapes cannot be multiplied (3x2 and 3x2)

We want the inner dimensions to match. In this case we can simply transpose the second matrix, using either:
- `torch.transpose(input, dim0, dim1)` - where `input` is the tensor to transpose and `dim0` and `dim1` are the dimensions to be swapped.
- `tensor.T` - where `tensor` is the tensor to transpose (intended for 2D tensors).

Let's try the latter.

In [43]:
# View tensor_A and tensor_B
print(tensor_A)
print(tensor_B)

tensor([[1., 2.],
        [3., 4.],
        [5., 6.]])
tensor([[ 7., 10.],
        [ 8., 11.],
        [ 9., 12.]])


In [44]:
# View tensor_A and tensor_B.T
print(tensor_A)
print(tensor_B.T)

tensor([[1., 2.],
        [3., 4.],
        [5., 6.]])
tensor([[ 7.,  8.,  9.],
        [10., 11., 12.]])


In [45]:
# The operation works when tensor_B is transposed
print(f"Original shapes: tensor_A = {tensor_A.shape}, tensor_B = {tensor_B.shape}\n")
print(f"New shapes: tensor_A = {tensor_A.shape} (same as above), tensor_B.T = {tensor_B.T.shape}\n")
print(f"Multiplying: {tensor_A.shape} * {tensor_B.T.shape} <- inner dimensions match\n")
print("Output:\n")
output = torch.matmul(tensor_A, tensor_B.T)
print(output) 
print(f"\nOutput shape: {output.shape}")

Original shapes: tensor_A = torch.Size([3, 2]), tensor_B = torch.Size([3, 2])

New shapes: tensor_A = torch.Size([3, 2]) (same as above), tensor_B.T = torch.Size([2, 3])

Multiplying: torch.Size([3, 2]) * torch.Size([2, 3]) <- inner dimensions match

Output:

tensor([[ 27.,  30.,  33.],
        [ 61.,  68.,  75.],
        [ 95., 106., 117.]])

Output shape: torch.Size([3, 3])
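A minimal sketch confirming that the two transposition options agree for a 2D tensor, and showing why `reshape()` is not a substitute: it also produces shape `(2, 3)`, but keeps the elements in their original row-major order instead of swapping rows and columns.

```python
import torch

tensor_B = torch.tensor([[7, 10],
                         [8, 11],
                         [9, 12]], dtype=torch.float32)

# torch.transpose(input, 0, 1) swaps dimensions 0 and 1, exactly like .T for a 2D tensor
print(torch.equal(torch.transpose(tensor_B, 0, 1), tensor_B.T))  # True

# reshape gives shape (2, 3) too, but the rows become [7, 10, 8] and [11, 9, 12],
# so it is NOT a transpose
reshaped = tensor_B.reshape(2, 3)
print(reshaped)
print(torch.equal(reshaped, tensor_B.T))  # False
```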


Remember, matrix multiplication is the building block of neural networks (that's why the mathematics of deep learning isn't usually that hard). So matrix multiplication is all you need.

![Matrix multiplication is all you need](imgs/00_matrix_multiplication_is_all_you_need.jpeg)