# An introduction to PyTorch and Tensors

In [1]:
# Import the needed libraries and check the PyTorch version
import torch
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

print(torch.__version__)

2.2.0+cu121


## What are tensors?
To understand tensors, we need to take a step back and define a few other concepts: scalar, vector and matrix.
* A scalar is a single number.
* A vector is an ordered list of numbers, often pictured as an arrow with magnitude and direction.
* A matrix is a table of numbers arranged in rows and columns.
* A tensor is a multi-dimensional container for numbers, extending beyond the concept of a matrix.

**Tensors** are therefore the next level up from matrices. They can have any number of dimensions, which makes them a flexible way to describe things that are too complex for a single scalar, vector or matrix. A simple way to understand a tensor is to think of it as a container that can hold scalars, vectors and matrices, stacked to as many layers of depth as needed.

*For example, the colors in a digital image can be represented as a tensor, where each color (red, green, blue) at each pixel is a number in the tensor.*

<img src="./Images/Scalar-Vector-Matrix-Tensor.png" alt="Scalar Vector Matrix Tensor Image" />

All of the above can be held in a tensor of varying dimensionality. Let's explore that below.

In [4]:
# First, let's define some tensors

# Scalar
scalar = torch.tensor(7)

# Vector
vector = torch.tensor([7,7])

# Matrix
matrix = torch.tensor([[7,8],[9, 10]])

# Tensor
tensor = torch.tensor([[[1,2,3],[3,6,9],[2,4,5]]])


In [15]:
print(f'Our scalar has {scalar.ndim} dimensions\nOur vector has {vector.ndim} dimension\nOur matrix has {matrix.ndim} dimensions\nOur tensor has {tensor.ndim} dimensions')

Our scalar has 0 dimensions
Our vector has 1 dimension
Our matrix has 2 dimensions
Our tensor has 3 dimensions


What do these look like once they are all represented as tensors? Let's use some Python to inspect their shapes:

In [16]:
print(f'Our scalar has the shape {scalar.shape}\nOur vector has the shape {vector.shape}\nOur matrix has the shape {matrix.shape}\nOur tensor has the shape {tensor.shape}')

Our scalar has the shape torch.Size([])
Our vector has the shape torch.Size([2])
Our matrix has the shape torch.Size([2, 2])
Our tensor has the shape torch.Size([1, 3, 3])


* **Scalar (torch.Size([]))**: This represents a scalar, which is a single number. The empty brackets [] indicate that there are no dimensions, just one value. In PyTorch, even a single number is considered a tensor, but with no dimensions.
* **Vector (torch.Size([2]))**: This shape indicates a vector with 2 elements. The number inside the brackets [2] tells you the length of the vector. So, this is a list of 2 numbers, which can represent something with magnitude and direction in a 2-dimensional space.
* **Matrix (torch.Size([2, 2]))**: This shape describes a matrix with 2 rows and 2 columns. The numbers inside the brackets [2, 2] show the matrix's dimensions. You can think of this as a table or grid that contains 4 numbers in total, arranged in 2 rows and 2 columns.
* **Tensor (torch.Size([1, 3, 3]))**: This shape is for a tensor that has 3 dimensions. The numbers [1, 3, 3] tell you the size of each dimension. This particular tensor can be thought of as having 1 layer, where each layer contains a 3x3 matrix. It's like a book with only 1 page, and on that page, there's a grid of 3 rows and 3 columns.
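
To make the shapes above concrete, the following sketch indexes into the same `scalar` and `tensor` values defined earlier (re-created here so the snippet runs on its own). Each index that is supplied removes one dimension, which is one way to see the "book, page, row" picture in practice.

```python
import torch

scalar = torch.tensor(7)
tensor = torch.tensor([[[1, 2, 3], [3, 6, 9], [2, 4, 5]]])

# A 0-dimensional tensor can be turned back into a plain Python number
value = scalar.item()

# Indexing the first dimension selects the single 3x3 "page"
page = tensor[0]

# Indexing two dimensions selects one row of that page
row = tensor[0, 1]

# Indexing all three dimensions selects a single element
element = tensor[0, 1, 2]

print(value)       # 7
print(page.shape)  # torch.Size([3, 3])
print(row)         # tensor([3, 6, 9])
print(element)     # tensor(9)
```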

## Random tensors
Random tensors are important in machine learning because many neural networks start with tensors full of random numbers and then adjust those numbers to better represent the data they are trying to model.

Very simplistically, this is how they work:
`Start with random numbers -> look at the data -> update the random numbers -> look at the data again -> update the random numbers.`
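
The loop above can be sketched on a toy problem. This is only an illustration under stated assumptions, not how a real neural network is trained: the data (`y = 2 * x`), the single `weight`, the `learning_rate` value and the hand-written gradient of the mean squared error are all choices made for this example.

```python
import torch

torch.manual_seed(42)  # make the random starting point reproducible

# Toy data that follows y = 2 * x
x = torch.tensor([1.0, 2.0, 3.0, 4.0])
y = 2 * x

# Start with a random number
weight = torch.rand(1)
start_error = ((weight * x - y) ** 2).mean()

learning_rate = 0.01  # hypothetical step size chosen for this toy problem
for step in range(100):
    # Look at the data: how far off are the current predictions?
    error = weight * x - y
    # Update the number in the direction that reduces the mean squared error
    gradient = 2 * (error * x).mean()
    weight = weight - learning_rate * gradient

end_error = ((weight * x - y) ** 2).mean()
print(weight)                  # close to 2
print(start_error, end_error)  # the error shrinks as the weight is updated
```

The random starting value is unimportant here; what matters is that repeated small updates move it towards the value that fits the data.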

Let's explore creating a tensor filled with random numbers in PyTorch:

In [18]:
# Create a tensor filled with random numbers of size 3 x 4
random_tensor = torch.rand(3,4)
random_tensor

tensor([[0.8642, 0.5504, 0.0658, 0.9212],
        [0.8350, 0.6604, 0.2170, 0.1070],
        [0.4252, 0.2398, 0.4214, 0.7776]])

In [19]:
print(f'Here we have created a tensor of {random_tensor.ndim} dimensions and of shape {random_tensor.shape}')

Here we have created a tensor of 2 dimensions and of shape torch.Size([3, 4])

We can also make larger tensors with more dimensions:

In [20]:
random_tensor = torch.rand(1,10,10)
random_tensor

tensor([[[0.9972, 0.4375, 0.8007, 0.0257, 0.7878, 0.6261, 0.6823, 0.2414,
          0.7736, 0.8829],
         [0.7341, 0.1779, 0.2873, 0.5280, 0.8427, 0.3950, 0.3152, 0.9940,
          0.5507, 0.3117],
         [0.0020, 0.1574, 0.1020, 0.5805, 0.9213, 0.5165, 0.6973, 0.3236,
          0.4469, 0.4705],
         [0.2569, 0.8970, 0.4913, 0.7579, 0.2204, 0.6195, 0.4873, 0.4412,
          0.9604, 0.7719],
         [0.2790, 0.2687, 0.8167, 0.9973, 0.7488, 0.6443, 0.6540, 0.7764,
          0.1999, 0.8666],
         [0.1039, 0.3868, 0.4793, 0.3949, 0.5854, 0.7829, 0.0241, 0.6927,
          0.9757, 0.2450],
         [0.1075, 0.9201, 0.4762, 0.2821, 0.0881, 0.8523, 0.2073, 0.6401,
          0.5034, 0.8078],
         [0.3503, 0.3018, 0.3794, 0.8251, 0.1291, 0.9127, 0.8098, 0.1972,
          0.9213, 0.2583],
         [0.4588, 0.3679, 0.7099, 0.1036, 0.9696, 0.7534, 0.1193, 0.4321,
          0.7916, 0.8649],
         [0.2468, 0.9902, 0.0660, 0.1937, 0.4037, 0.3572, 0.7999, 0.3999,
          0.3380,

In [21]:
print(f'Here we have created a tensor of {random_tensor.ndim} dimensions and of shape {random_tensor.shape}')

Here we have created a tensor of 3 dimensions and of shape torch.Size([1, 10, 10])


In [23]:
random_tensor = torch.rand(10,10,10)
random_tensor

tensor([[[0.7098, 0.0710, 0.4738, 0.5933, 0.1720, 0.0855, 0.2142, 0.5943,
          0.4354, 0.4134],
         [0.7597, 0.8588, 0.4299, 0.9546, 0.9550, 0.2900, 0.1761, 0.1618,
          0.0273, 0.4046],
         [0.1886, 0.2542, 0.0730, 0.9392, 0.3261, 0.7835, 0.6379, 0.8070,
          0.3404, 0.2625],
         [0.0241, 0.3867, 0.8074, 0.9284, 0.5520, 0.9403, 0.7983, 0.3739,
          0.0700, 0.6273],
         [0.0306, 0.8675, 0.6679, 0.9613, 0.0602, 0.4383, 0.7797, 0.2431,
          0.1429, 0.8274],
         [0.3915, 0.7344, 0.6699, 0.0293, 0.3863, 0.2913, 0.8673, 0.7184,
          0.5454, 0.7647],
         [0.5184, 0.2849, 0.6915, 0.5468, 0.9505, 0.6770, 0.5271, 0.4215,
          0.3268, 0.9618],
         [0.8065, 0.8280, 0.4248, 0.8596, 0.5074, 0.7614, 0.7524, 0.5292,
          0.5049, 0.2402],
         [0.8879, 0.7586, 0.6921, 0.5886, 0.1960, 0.0073, 0.7049, 0.0093,
          0.9619, 0.2911],
         [0.2087, 0.4803, 0.3824, 0.6428, 0.3593, 0.0888, 0.0334, 0.0723,
          0.8935,

In [24]:
print(f'Here we have created a tensor of {random_tensor.ndim} dimensions and of shape {random_tensor.shape}')

Here we have created a tensor of 3 dimensions and of shape torch.Size([10, 10, 10])


Tensors can hold a wide variety of data. For example, we can create a tensor that holds 1 second of video at 30 frames per second. We need to define the height and width of each frame (224 x 224 in this case), the number of colour channels (3) and the number of frames (30).

In [25]:
random_video_size_tensor = torch.rand(size=(224,224,3,30)) 

In [26]:
print(f'Our video tensor is {random_video_size_tensor.ndim} dimensions and of shape {random_video_size_tensor.shape}')

Our video tensor is 4 dimensions and of shape torch.Size([224, 224, 3, 30])


## Tensor datatypes
The three most common problems that data scientists run into with tensors, from beginner to pro, are:
1. Tensors are not the right **datatype**
2. Tensors are not the right **shape**
3. Tensors are not on the right **device**

In this section we will learn about some of these problems in order to know where to look for these types of issues.

In [28]:
# Let's create a tensor to work with
float_32_tensor = torch.tensor([3.0, 6.0, 9.0],
                               dtype=None, # Datatype of the tensor, e.g. torch.float32 or torch.float16 (None infers it; float values default to float32)
                               device=None, # Where the tensor lives: "cpu" by default, but it can also be "cuda", for example
                               requires_grad=False # Whether to track gradients for operations on this tensor
                              )

Now we can do a few things that intuitively shouldn't work, but with tensors they do:

In [29]:
float_16_tensor = float_32_tensor.type(torch.float16) # We can make a float16 copy of the tensor

In [30]:
float_16_tensor * float_32_tensor # We can multiply different datatypes together

tensor([ 9., 36., 81.])

In [32]:
int_32_tensor = torch.tensor([3,6,9], dtype=torch.int32) # Let's make an int32 for this next trick

In [33]:
float_32_tensor * int_32_tensor # This also shouldn't work but it does!

tensor([ 9., 36., 81.])

In [34]:
int_32_tensor = torch.tensor([3, 6, 9], dtype=torch.long) # Note: torch.long is int64, so the variable name no longer matches the dtype

In [35]:
float_32_tensor * int_32_tensor # This also works

tensor([ 9., 36., 81.])

When you perform an operation like multiplication between tensors of different data types in PyTorch, such as a floating-point tensor (`float_32_tensor`) and an integer tensor (`int_32_tensor`), PyTorch automatically performs type promotion (a form of implicit type casting) so that the operation can proceed. Type promotion rules determine the data type of the result of an operation involving tensors of different types.

Here's how it works for the example above:

- **`float_32_tensor`**: This tensor is a floating-point data type, `torch.float32`. This data type represents 32-bit floating-point numbers.

- **`int_32_tensor`**: Despite the naming, if we follow the earlier definition, this tensor is actually of type `torch.long` (or `torch.int64`), which is a 64-bit integer.

- **Operation**: When you multiply these tensors (`float_32_tensor * int_32_tensor`), PyTorch automatically promotes the integer tensor to a floating-point tensor so that the operation is between two tensors of the same type (floating-point). The rules for type promotion ensure that the operation does not result in unintended data loss. In this case, the result of the multiplication will be a floating-point tensor.

Type promotion is a feature designed to make tensor operations more intuitive and to prevent common mistakes that can occur when dealing with tensors of different data types. It allows for operations between tensors of different types by automatically converting them to a common type that is capable of representing the information in both tensors as accurately as possible.
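
The promotion described above can be checked directly by inspecting the `dtype` of each result. The sketch below re-creates the tensors from the earlier cells (the name `int_64_tensor` is chosen here to match its actual dtype) and also uses `torch.result_type` and `torch.promote_types`, which report the promoted dtype without performing the operation.

```python
import torch

float_32_tensor = torch.tensor([3.0, 6.0, 9.0])            # float values default to torch.float32
float_16_tensor = float_32_tensor.type(torch.float16)
int_64_tensor = torch.tensor([3, 6, 9], dtype=torch.long)  # torch.long is torch.int64

# The result takes the dtype that can represent both inputs
half_times_float = float_16_tensor * float_32_tensor
float_times_int = float_32_tensor * int_64_tensor
print(half_times_float.dtype)  # torch.float32
print(float_times_int.dtype)   # torch.float32

# Ask PyTorch which dtype an operation would produce, without running it
print(torch.result_type(float_32_tensor, int_64_tensor))  # torch.float32
print(torch.promote_types(torch.int32, torch.int64))      # torch.int64
```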

## How do we extract information from tensors?
To help us squash the three common bugs outlined above, we can easily extract the relevant information from our tensors using Python for each of the three scenarios.

1. Tensors not the right **datatype** -  to get datatype from a tensor we use `tensor.dtype`
2. Tensors not the right **shape** - to get the shape from a tensor we use `tensor.shape`
3. Tensors not on the right **device** - to get the device from a tensor we use `tensor.device`

In [37]:
# Let's create a tensor to play with
some_tensor = torch.rand([3,4])

In [38]:
# Find out some details about our tensor
print(some_tensor)
print(f'Datatype of tensor: {some_tensor.dtype}')
print(f'Shape of tensor: {some_tensor.shape}')
print(f'Device tensor lives on: {some_tensor.device}')

tensor([[0.9676, 0.6742, 0.0454, 0.9786],
        [0.1619, 0.6151, 0.4271, 0.1533],
        [0.3807, 0.6424, 0.0373, 0.6684]])
Datatype of tensor: torch.float32
Shape of tensor: torch.Size([3, 4])
Device tensor lives on: cpu
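
As an illustration of why these attributes are worth checking, here is a small sketch of a datatype bug and its fix. Not every operation benefits from automatic type promotion: `torch.mean` requires a floating-point (or complex) input, so calling it on an integer tensor raises an error. The variable names below are chosen for this example only.

```python
import torch

int_tensor = torch.tensor([3, 6, 9])  # integer values default to torch.int64
print(int_tensor.dtype)               # torch.int64

# torch.mean cannot infer an output dtype for integer input, so this fails
try:
    torch.mean(int_tensor)
    mean_failed = False
except RuntimeError as error:
    mean_failed = True
    print(f"RuntimeError: {error}")

# Checking .dtype points at the fix: convert to a float type first
float_tensor = int_tensor.type(torch.float32)
mean_value = torch.mean(float_tensor)
print(mean_value)  # tensor(6.)
```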


## How do we manipulate tensors? 

Tensor operations are just like those associated with matrix manipulations so if you remember matrix math you can do tensor math. 
Tensor operators include:
* Addition
* Subtraction
* Multiplication (element-wise)
* Division
* Matrix Multiplication

As with many things in life, there is more than one way to complete this task. PyTorch provides built-in functions such as `torch.add` and `torch.mul`, but people generally use Python's built-in operators (`+`, `-`, `*`, `/` and `@`).

In [43]:
# Create a tensor
tensor = torch.tensor([1,2,3])

# Add 10 to each element
print(tensor + 10)

# Multiply each element by 10
print(tensor * 10)

# Subtract 10 from each element
print(tensor - 10)

tensor([11, 12, 13])
tensor([10, 20, 30])
tensor([-9, -8, -7])
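
For comparison, the same element-wise operations can be written with PyTorch's functions instead of Python operators. This sketch re-creates the tensor from the cell above and shows that both styles give the same values, and that neither changes the original tensor.

```python
import torch

tensor = torch.tensor([1, 2, 3])

# Function equivalents of tensor + 10, tensor * 10 and tensor - 10
added = torch.add(tensor, 10)
multiplied = torch.mul(tensor, 10)
subtracted = torch.sub(tensor, 10)

# Division returns a floating-point tensor, even for integer input
divided = torch.div(tensor, 10)

print(added, multiplied, subtracted)
print(divided.dtype)  # torch.float32

# The original tensor is unchanged unless the result is reassigned
print(tensor)  # tensor([1, 2, 3])
```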


## A recap of Matrix Multiplication
There are two main ways of performing multiplication on a matrix or tensor:

1. Element-wise multiplication
2. Matrix multiplication (dot product)

We covered element-wise multiplication above.  

### Matrix multiplication
Two rules govern matrix multiplication:
1. The inner dimensions of the two matrices must match:
   * `(3,2) @ (3,2)` **Will not** work
   * `(2,3) @ (3,2)` **Will** work
   * `(3,2) @ (2,3)` **Will** work
     
  
2. The resulting matrix has the shape of the outer dimensions:
   * `(2,3) @ (3,2)` -> (2,2)
   * `(3,2) @ (2,3)` -> (3,3)
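
Both rules can be checked with a short sketch. The tensors `a` and `b` are random placeholders; only their shapes matter here. The `@` operator and `torch.matmul` perform the same operation.

```python
import torch

a = torch.rand(2, 3)
b = torch.rand(3, 2)

# Inner dimensions match (3 and 3), result takes the outer dimensions
print((a @ b).shape)             # torch.Size([2, 2])
print((b @ a).shape)             # torch.Size([3, 3])
print(torch.matmul(a, b).shape)  # torch.Size([2, 2])

# (3,2) @ (3,2): inner dimensions 2 and 3 differ, so this raises an error
try:
    b @ b
    mismatch_raised = False
except RuntimeError as error:
    mismatch_raised = True
    print(f"RuntimeError: {error}")

# Transposing one operand makes the inner dimensions match again
print((b.T @ b).shape)           # (2,3) @ (3,2) -> torch.Size([2, 2])
```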
  

### Recap of matrices
The matrix $A$ below has $2$ rows and $3$ columns. The element $a_{2,1}$ is the entry in the second row and first column of $A$, so $a_{2,1} = 5$.

$$ A = \begin{bmatrix}-2 & 5 & 6 \\ 5 & 2 & 7\end{bmatrix}$$

#### Scalar multiplication

When we work with matrices, real numbers are referred to as **scalars**.

$$ 2 \cdot \begin{bmatrix}5 & 2 \\ 3 & 1\end{bmatrix} = \begin{bmatrix} 2\cdot 5 & 2 \cdot 2 \\ 2 \cdot 3 & 2 \cdot 1\end{bmatrix} $$

Scalar multiplication refers to the product of a real number and a matrix. In this case **each** entry in the matrix is multiplied by the given scalar.

This differs from matrix multiplication which refers to the product of two matrices.  
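
The scalar multiplication example above can be reproduced in PyTorch with the `*` operator. The names `matrix` and `scaled` are chosen for this sketch.

```python
import torch

matrix = torch.tensor([[5, 2], [3, 1]])

# Every entry is multiplied by the scalar 2
scaled = 2 * matrix
print(scaled)  # values: [[10, 4], [6, 2]]
```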


#### $n$-tuples and the dot product
Ordered pairs and ordered triples are written as $(2,5)$ and $(3,1,8)$ respectively. An $n$-tuple is a generalization of this: it is an ordered list of $n$ numbers. The **dot product** of two $n$-tuples of equal length is found by summing the products of the corresponding entries.

For example, find the dot product $(2,5) \cdot (3,1)$:
$$ (\textcolor{purple}2,\textcolor{green}5) \cdot (\textcolor{purple}3,\textcolor{green}1) = \textcolor{purple}{2 \cdot 3} + \textcolor{green}{5 \cdot 1} $$ $$= 6 + 5$$ $$=11 $$

Ordered $n$-tuples are often denoted by a variable with an arrow over the top. For example, $\overrightarrow a = (3,1,8)$ and $\overrightarrow b = (4,2,3)$. The dot product is calculated as follows:

$$ \overrightarrow a \cdot \overrightarrow b = (\textcolor{purple}3,\textcolor{green}1,\textcolor{magenta}8) \cdot (\textcolor{purple}4,\textcolor{green}2,\textcolor{magenta}3)$$
$$= \textcolor{purple}{3 \cdot 4} + \textcolor{green}{1 \cdot 2} + \textcolor{magenta}{8 \cdot 3}$$
$$=12 + 2 + 24$$
$$=38$$

*Note:* The dot product of two $n$-tuples of equal length is always a single real number.
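
Both worked examples can be verified in PyTorch. The sketch below computes the dot product by hand (multiply matching entries, then sum) and with `torch.dot`, which expects two 1-dimensional tensors of equal length.

```python
import torch

a = torch.tensor([3, 1, 8])
b = torch.tensor([4, 2, 3])

# Multiply corresponding entries, then sum the products
by_hand = (a * b).sum()

# The built-in equivalent for 1-dimensional tensors
built_in = torch.dot(a, b)

print(by_hand)   # tensor(38)
print(built_in)  # tensor(38)

# The first example from the text: (2,5) . (3,1)
small = torch.dot(torch.tensor([2, 5]), torch.tensor([3, 1]))
print(small)     # tensor(11)
```

Note that the result is a 0-dimensional tensor, which matches the observation that a dot product is a single number.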

#### $n$-tuples and matrices

When considering matrices for multiplication, each row and each column can be treated as an $n$-tuple.

$$
\begin{array}{cc}
& \textcolor{olive}{\overrightarrow{c_1} \quad \overrightarrow{c_2}}\\
& \textcolor{olive}{\downarrow \quad \downarrow} \\
\begin{array}{c}
\textcolor{teal}{\overrightarrow{r_1} \quad \rightarrow} \\
\textcolor{teal}{\overrightarrow{r_2} \quad \rightarrow}
\end{array} &
\begin{bmatrix}
6 & 2 \\
4 & 3
\end{bmatrix}
\end{array}
$$

Thus in this example:

Row 1 is denoted by the ordered pair $\textcolor{teal}{\overrightarrow{r_1}}=(6,2)$ and row 2 is denoted by the ordered pair $\textcolor{teal}{\overrightarrow{r_2}}=(4,3)$. 

Columns are denoted similarly: $\textcolor{olive}{\overrightarrow{c_1}}=(6,4)$ represents column 1 and $\textcolor{olive}{\overrightarrow{c_2}}=(2,3)$ represents column 2.
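
In PyTorch, the rows and columns of this matrix can be pulled out with indexing. This sketch uses the same $2 \times 2$ matrix; note that PyTorch counts from 0, so row 1 in the text is index `0` in code. The variable names are chosen to mirror the notation above.

```python
import torch

matrix = torch.tensor([[6, 2], [4, 3]])

# Rows: index the first dimension
r1 = matrix[0]
r2 = matrix[1]

# Columns: take every row (:) and pick one index in the second dimension
c1 = matrix[:, 0]
c2 = matrix[:, 1]

print(r1, r2)  # tensor([6, 2]) tensor([4, 3])
print(c1, c2)  # tensor([6, 4]) tensor([2, 3])
```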


### Matrix multiplication: a worked example

Taking the following example we will work through matrix multiplication. 

Given $A = \begin{bmatrix} 1 & 7 \\ 2 & 4 \end{bmatrix}$ and $B = \begin{bmatrix} 3 & 3 \\ 5 & 2 \end{bmatrix}$ find $C = AB$.

Let's split it up into rows and columns as before: rows in matrix $A$ and columns in matrix $B$. We define matrix $C$ below:
$$\begin{array}{ccc}
& \textcolor{olive}{\quad \quad \quad \quad \overrightarrow{b_1} \quad \overrightarrow{b_2}}\\
& \textcolor{olive}{\quad \quad \quad \quad \downarrow \quad \downarrow} \\
\begin{array}{c}
\textcolor{teal}{\overrightarrow{a_1} \quad \rightarrow} \\
\textcolor{teal}{\overrightarrow{a_2} \quad \rightarrow}
\end{array} &
\begin{bmatrix}
1 & 7 \\
2 & 4
\end{bmatrix} \cdot
\begin{bmatrix}
3 & 3 \\
5 & 2
\end{bmatrix} & = \begin{bmatrix}
\textcolor{teal}{\overrightarrow{a_1}} \cdot \textcolor{olive}{\overrightarrow{b_1}} & \textcolor{teal}{\overrightarrow{a_1}} \cdot \textcolor{olive}{\overrightarrow{b_2}} \\
\textcolor{teal}{\overrightarrow{a_2}} \cdot \textcolor{olive}{\overrightarrow{b_1}} & \textcolor{teal}{\overrightarrow{a_2}} \cdot \textcolor{olive}{\overrightarrow{b_2}}
\end{bmatrix}
\end{array}$$
$$\quad \; A \quad \quad \quad B \quad \quad \quad \quad \quad \quad C $$

Each entry in $C$ is the dot product of a row of matrix $A$ and a column of matrix $B$.
More generally, the element $c_{\textcolor{teal}i,\textcolor{olive}j}$ is the dot product of $\textcolor{teal}{\overrightarrow{a_i}}$ and $\textcolor{olive}{\overrightarrow{b_j}}$.
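
The rule can be checked numerically for the matrices $A$ and $B$ given at the start of this example. The sketch below builds $C$ one entry at a time from row-by-column dot products (the name `C_manual` is chosen for this illustration) and compares it with the result of the `@` operator.

```python
import torch

A = torch.tensor([[1, 7], [2, 4]])
B = torch.tensor([[3, 3], [5, 2]])

# Build C entry by entry: c[i, j] = (row i of A) . (column j of B)
C_manual = torch.zeros(2, 2, dtype=A.dtype)
for i in range(2):
    for j in range(2):
        C_manual[i, j] = torch.dot(A[i], B[:, j])

# The built-in matrix multiplication gives the same result
C = A @ B

print(C_manual)  # values: [[38, 17], [26, 14]]
print(C)         # values: [[38, 17], [26, 14]]
```

For instance, the top-left entry is $(1,7) \cdot (3,5) = 1 \cdot 3 + 7 \cdot 5 = 38$.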