<a href="https://colab.research.google.com/github/Chood16/DSCI222/blob/main/lectures/(14)_PyTorch_Fundamentals.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# PyTorch Fundamentals

[PyTorch](https://pytorch.org/) is an open source machine learning and deep learning framework. It was developed and released in 2017 by Meta (Facebook). Version 2 was released in 2023.

PyTorch allows you to manipulate and process data and write machine learning algorithms using Python code.

In [None]:
# Fortunately for us, Colab natively supports torch
import torch
torch.__version__

A suffix such as `cu126` in the version string means this build of PyTorch targets CUDA 12.6.

CUDA (Compute Unified Device Architecture) is NVIDIA's parallel computing platform and programming model.

It lets you use the power of an NVIDIA GPU (Graphics Processing Unit) for general-purpose computing, not just graphics.

Have you heard in the past few years about [NVIDIA stock](https://www.google.com/search?q=nvidia+stock&oq=nvidi&gs_lcrp=EgZjaHJvbWUqEwgAEEUYJxg7GEYY-gEYgAQYigUyEwgAEEUYJxg7GEYY-gEYgAQYigUyEwgBEC4YgwEYxwEYsQMY0QMYgAQyBggCEEUYOTIKCAMQABixAxiABDIKCAQQABixAxiABDINCAUQLhixAxiABBjlBDIKCAYQABixAxiABDIKCAcQABixAxiABDIKCAgQABixAxiABDIHCAkQABiPAtIBCDE5NjhqMGo3qAIAsAIA&sourceid=chrome&ie=UTF-8) and valuation skyrocketing? This is one of the largest reasons why!

A CPU has only a handful of cores, so code on it runs largely sequentially, a few things at a time.

By utilizing the GPU, you can run multiple operations at the same time (in parallel).

## Introduction to tensors

Now we've got PyTorch imported, it's time to learn about tensors.

Many libraries we've worked with have a core data structure associated with them. NumPy uses `ndarray`, Pandas has `DataFrame` and `Series`, and PyTorch has `Tensor`.

A tensor is a general concept for multidimensional numerical arrays and that is exactly what this data structure represents. We access these in PyTorch through the `torch.Tensor` class.

| Name | What is it? | Number of dimensions | Lower or upper (usually/example) |
| ----- | ----- | ----- | ----- |
| **scalar** | a single number | 0 | Lower (`a`) |
| **vector** | a number with direction (e.g. a 1-dimensional array of numbers) | 1 | Lower (`y`) |
| **matrix** | a 2-dimensional array of numbers | 2 | Upper (`W`) |
| **tensor** | an n-dimensional array of numbers | can be any number; a 0-dimension tensor is a scalar, a 1-dimension tensor is a vector | Upper (`X`) |



### Creating tensors

In [None]:
# Scalar
scalar = torch.tensor(7)
scalar

We can check the dimensions of a tensor using the `ndim` attribute.

In [None]:
scalar.ndim

In [None]:
# Get the Python number within a tensor (only works with one-element tensors)
scalar.item()

In [None]:
# Vector
vector = torch.tensor([7, 7])
vector
# vector.item()  # <-- raises an error: vector has more than one element

In [None]:
# Check the number of dimensions of vector
vector.ndim

In [None]:
# Check shape of vector
vector.shape

In [None]:
# Matrix
MATRIX = torch.tensor([[7, 8],
                       [9, 10],
                       [11, 12]])
MATRIX

In [None]:
# Check number of dimensions
MATRIX.ndim

Notice: counting the opening square brackets at the start of the output is a quick way to identify the number of dimensions.

In [None]:
MATRIX.shape

In [None]:
# Tensor
TENSOR = torch.tensor([[[7, 8],
                       [9, 10],
                       [11, 12]]])
TENSOR

In [None]:
print(TENSOR.ndim)
print(TENSOR.shape)

Alright, it outputs `torch.Size([1, 3, 2])`. Why?

The dimensions go outer to inner: the outermost bracket holds 1 block, that block holds 3 rows, and each row holds 2 values.
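One way to convince yourself of the outer-to-inner reading is to peel off one bracket at a time and watch the shape shrink from the left. The snippet below is a small sketch that rebuilds the same values under the hypothetical name `t` so it runs on its own.

```python
import torch

# Same values as TENSOR above, rebuilt so this snippet is self-contained.
t = torch.tensor([[[7, 8],
                   [9, 10],
                   [11, 12]]])

print(t.shape)           # torch.Size([1, 3, 2]) -> 1 block, 3 rows, 2 values
print(t[0].shape)        # torch.Size([3, 2])    -> inside the outer bracket
print(t[0][0].shape)     # torch.Size([2])       -> inside one row
print(t[0][0][0].shape)  # torch.Size([])        -> a single value (a scalar)
```

Each index removes the leftmost entry of the shape, which is why the first number always describes the outermost bracket.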

> **Note:** You might've noticed us using lowercase letters for `scalar` and `vector` and uppercase letters for `MATRIX` and `TENSOR`. This was on purpose. In practice, you'll often see scalars and vectors denoted as lowercase letters such as `y` or `a`. And matrices and tensors denoted as uppercase letters such as `X` or `W`.

![scalar vector matrix tensor and what they look like](https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/images/00-scalar-vector-matrix-tensor.png)

### Random tensors

We've established tensors represent some form of data.

And machine learning models such as neural networks manipulate and seek patterns within tensors.

But when building machine learning models with PyTorch, it's rare you'll create tensors by hand (like what we've been doing). In neural networks, tensors are used to represent the input data, the hidden layers, and the output data.

For the hidden layers and output, a machine learning model often starts out with large random tensors of numbers and adjusts these random numbers as it works through data to better represent it.

In essence:

`Start with random numbers -> look at data -> update random numbers -> look at data -> update random numbers...`

As a data scientist, you can define how the machine learning model starts (initialization), looks at data (representation) and updates (optimization) its random numbers. This will be our ultimate goal in creating a Neural Network.
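To make the "start random, look at data, update" loop concrete, here is a minimal sketch with a single random number instead of a full network. The toy data (`y = 2 * x`), the learning rate of `0.05`, the mean squared error and the hand-written gradient are all assumptions chosen for illustration, not how a real PyTorch model is normally trained.

```python
import torch

torch.manual_seed(0)  # make the random start reproducible

# Toy data following y = 2 * x (the "pattern" we hope to discover).
x = torch.tensor([1.0, 2.0, 3.0, 4.0])
y = 2 * x

# 1. Start with a random number.
w = torch.rand(1)
start = w.item()

for _ in range(100):
    # 2. Look at the data: how far off are the current predictions?
    error = w * x - y
    # 3. Update the number: step against the gradient of the mean squared error.
    grad = 2 * (error * x).mean()
    w = w - 0.05 * grad

print(f"start: {start:.4f} -> end: {w.item():.4f}")
```

The random starting value ends up very close to `2`, the value hidden in the data. A neural network does the same thing with large tensors of random numbers.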


In [None]:
# Create a random tensor of size (3, 4)
random_tensor = torch.rand(size=(3, 4))
random_tensor, random_tensor.dtype

The flexibility of `torch.rand()` is that we can adjust the `size` to be whatever we want.

For example, say you wanted a random tensor in the common image shape of `[224, 224, 3]` (`[height, width, color_channels]`).

In [None]:
# Create a random tensor of size (224, 224, 3)
random_image_size_tensor = torch.rand(size=(224, 224, 3))
print(f'Shape: {random_image_size_tensor.shape}\nDimensions: {random_image_size_tensor.ndim}')

### Some methods borrowed from NumPy

Sometimes you'll just want to fill tensors with zeros or ones.

This happens a lot with masking (like masking some of the values in one tensor with zeros to let a model know not to learn them).

Let's create a tensor full of zeros with `torch.zeros()`

Again, the `size` parameter comes into play.

In [None]:
# Create a tensor of all zeros
zeros = torch.zeros(size=(3, 4))
zeros, zeros.dtype

In [None]:
# Compare to numpy
import numpy as np
zeros_np = np.zeros((3, 4))
zeros_np, zeros_np.dtype # <-- we'll talk about dtype more later

We can create a tensor of all ones the same way, using `torch.ones()` instead.

In [None]:
# Create a tensor of all ones
ones = torch.ones(size=(3, 4))
ones, ones.dtype

Sometimes you might want a range of numbers, such as 1 to 10 or 0 to 100.

You can use `torch.arange(start, end, step)` to do so.

Where:
* `start` = start of range (e.g. 0), included in the output
* `end` = end of range (e.g. 10), excluded from the output
* `step` = the gap between consecutive values (e.g. 1)

In [None]:
# Create a range of values 0 to 9 (`end=10` is not included)
zero_to_ten = torch.arange(start=0, end=10, step=1)
zero_to_ten
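Like Python's `range()`, `torch.arange()` stops *before* `end`. The sketch below shows that behaviour and two ways to adjust it. The variable names are hypothetical.

```python
import torch

# `end` is exclusive, so this stops at 9.
zero_to_nine = torch.arange(start=0, end=10, step=1)
print(zero_to_nine)  # tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

# To include 10, push `end` one step past it.
zero_to_ten_inclusive = torch.arange(start=0, end=11, step=1)
print(zero_to_ten_inclusive)

# A larger step: 0, 10, 20, ..., 100.
tens = torch.arange(start=0, end=101, step=10)
print(tens)
```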

Sometimes you might want one tensor of a certain type with the same shape as another tensor.

For example, a tensor of all zeros with the same shape as a previous tensor.

To do so you can use `torch.zeros_like(input)` or `torch.ones_like(input)`, which return a tensor filled with zeros or ones respectively, in the same shape as the `input`.

In [None]:
# Can also create a tensor of zeros similar to another tensor
ten_zeros = torch.zeros_like(input=zero_to_ten) # will have same shape
ten_zeros
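The `*_like` functions copy more than the shape: the new tensor also takes on the datatype of `input` unless you override it. A short sketch with hypothetical variable names:

```python
import torch

source = torch.arange(start=0, end=10, step=1)  # shape (10,), dtype torch.int64

zeros = torch.zeros_like(source)
ones = torch.ones_like(source)
print(zeros.shape, zeros.dtype)  # same shape and dtype as `source`

# Pass `dtype` to keep the shape but change the datatype.
ones_float = torch.ones_like(source, dtype=torch.float32)
print(ones_float.dtype)
```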

### Tensor datatypes

There are many different tensor datatypes available in PyTorch

Some are specific for CPU and some are better for a GPU `device`.

Getting to know which one can take some time.

Generally if you see `torch.cuda` anywhere, the tensor is being used for GPU.

The most common type (and generally the default) is `torch.float32`

This is referred to as "32-bit floating point".

But there's also 16-bit floating point (`torch.float16` or `torch.half`) and 64-bit floating point (`torch.float64` or `torch.double`).

And to confuse things even more there's also 8-bit, 16-bit, 32-bit and 64-bit integers.

The reason for all of these is to do with **precision in computing**.

Precision is the amount of detail used to describe a number.

The higher the precision value (8, 16, 32), the more detail and hence data used to express a number.

This matters in deep learning and numerical computing because you're performing so many operations: the more detail you have to calculate on, the more compute you have to use.

So lower precision datatypes are generally faster to compute on but sacrifice some performance on evaluation metrics like accuracy (faster to compute but less accurate).
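To see the trade-off in numbers, the sketch below stores the same value (`1 / 3`, chosen arbitrarily) at three precisions and reports the memory used per element and how far the stored value drifts from the original. The dictionary names are hypothetical.

```python
import torch

value = 1 / 3
sizes = {}
errors = {}

for dtype in (torch.float16, torch.float32, torch.float64):
    t = torch.tensor(value, dtype=dtype)
    sizes[dtype] = t.element_size()          # bytes used per element
    errors[dtype] = abs(t.item() - value)    # rounding error after storing
    print(f"{str(dtype):<15} bytes: {sizes[dtype]}  "
          f"stored: {t.item():.12f}  error: {errors[dtype]:.2e}  "
          f"eps: {torch.finfo(dtype).eps:.2e}")
```

Each step up in precision doubles the memory per element and shrinks the rounding error, which is the cost/accuracy trade-off described above.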

In [None]:
# Default datatype for tensors is float32
float_32_tensor = torch.tensor([3.0, 6.0, 9.0],
                               dtype=None, # defaults to None, which infers the dtype from the data (torch.float32 for Python floats)
                               device=None) # defaults to None, which uses the current default device (CPU unless changed)

print(f"Shape of tensor: {float_32_tensor.shape}")
print(f"Datatype of tensor: {float_32_tensor.dtype}")
print(f"Device tensor is stored on: {float_32_tensor.device}")

Aside from shape issues (tensor shapes don't match up), two of the other most common issues you'll come across in PyTorch are datatype and device issues.

For example, if one of your tensors is `torch.float32` and the other is `torch.float16`, an operation on them may throw an error or give unexpected results.

Or one of your tensors is on the CPU and the other is on the GPU.

We'll see more of this device talk later on.

For now let's create a tensor with `dtype=torch.float16`.

In [None]:
float_16_tensor = torch.tensor([3.0, 6.0, 9.0],
                               dtype=torch.float16) # torch.half would also work

float_16_tensor.dtype

In [None]:
# Adding two float16 tensors gives a float16 result
(float_16_tensor + float_16_tensor).dtype

In [None]:
# Adding two float32 tensors gives a float32 result
(float_32_tensor + float_32_tensor).dtype

In [None]:
float_mix_tensor = float_32_tensor + float_16_tensor
float_mix_tensor.dtype

In [None]:
# We'll talk about what @ does as a mathematical operation later (try to guess!)
float_32_tensor @ float_32_tensor
# float_32_tensor @ float_16_tensor # <-- Won't work
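The cells above hint at two different behaviours: element-wise operations quietly promote mixed datatypes, while `@` expects the datatypes to match. The sketch below rebuilds the two tensors under the hypothetical names `a32` and `a16` and shows one way to handle the mismatch, an explicit cast.

```python
import torch

a32 = torch.tensor([3.0, 6.0, 9.0], dtype=torch.float32)
a16 = torch.tensor([3.0, 6.0, 9.0], dtype=torch.float16)

# Element-wise operations promote to the wider datatype.
print(torch.result_type(a32, a16))  # torch.float32
print((a32 + a16).dtype)            # torch.float32

# The dot product expects both tensors to share a datatype.
try:
    a32 @ a16
except RuntimeError as err:
    print(f"RuntimeError: {err}")

# Casting one side explicitly resolves the mismatch.
result = a32 @ a16.to(torch.float32)
print(result)  # tensor(126.)
```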

> **Note:** When you run into issues in PyTorch, it's very often to do with one of the three attributes above (shape, datatype or device).

## Manipulating tensors (tensor operations)

* Addition
* Subtraction
* Multiplication (element-wise)
* Division
* Matrix multiplication

### Basic operations

Operations with scalars

In [None]:
# Create a tensor of values and add a number to it
tensor = torch.tensor([1, 2, 3])
print(tensor + 10)
print(tensor * 10)
print(tensor - 10)

Operations with vectors

In [None]:
print(tensor + tensor)
print(tensor - tensor)
print(tensor * tensor)
print(tensor @ tensor) # <-- ???

In [None]:
tensor1_2d = torch.rand(3, 4)   # uniform random values in [0, 1)
tensor2_2d = torch.randn(3, 4)  # random values from a standard normal distribution
print(tensor1_2d)
print(tensor2_2d)

In [None]:
print(tensor1_2d + tensor2_2d)
print(tensor1_2d - tensor2_2d)
print(tensor1_2d * tensor2_2d)
# print(tensor1_2d @ tensor2_2d)

### Matrix multiplication

One of the most common operations in machine learning and deep learning algorithms (like neural networks) is matrix multiplication.

PyTorch implements matrix multiplication in the `torch.matmul()` function, which you can also call with the `@` operator.

> Note: `@` calculates the dot product for vectors

The main two rules for matrix multiplication to remember are:

1. The **inner dimensions** must match:
   * `(3, 2) @ (3, 2)` won't work
   * `(2, 3) @ (3, 2)` will work
   * `(3, 2) @ (2, 3)` will work
2. The resulting matrix has the shape of the **outer dimensions**:
   * `(2, 3) @ (3, 2)` -> `(2, 2)`
   * `(3, 2) @ (2, 3)` -> `(3, 3)`
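The two rules above can be checked directly on small random matrices. This is a sketch with hypothetical names `A`, `B` and `v`, and it also shows the dot-product case for 1-dimensional tensors.

```python
import torch

A = torch.rand(3, 2)
B = torch.rand(2, 3)

# Inner dimensions match (2 and 2), result takes the outer dimensions.
print((A @ B).shape)             # torch.Size([3, 3])
print(torch.matmul(B, A).shape)  # torch.Size([2, 2])

# (3, 2) @ (3, 2): inner dimensions 2 and 3 don't match.
try:
    A @ A
except RuntimeError as err:
    print(f"RuntimeError: {err}")

# Transposing the second matrix makes the inner dimensions line up.
print((A @ A.T).shape)           # torch.Size([3, 3])

# For 1-dimensional tensors, @ is the dot product: 1*1 + 2*2 + 3*3.
v = torch.tensor([1, 2, 3])
print(v @ v)                     # tensor(14)
```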



In [None]:
print(tensor1_2d.shape)
print(tensor2_2d.shape)

In [None]:
mat_mul = tensor1_2d @ tensor2_2d.T
mat_mul.shape

In [None]:
tensor1_2d @ tensor2_2d.T

### Let's make our first neural network layer!

Neural networks are full of matrix multiplications and dot products. A layer in a neural network is essentially a computational unit that transforms its input into an output.

The `torch.nn.Linear()` module allows us to create a layer of our network called a feed-forward layer (also known as a linear or fully connected layer). It does so by performing matrix operations (called a linear transformation) on our input `x`:

$$
y = x\cdot{A^T} + b
$$

Where:
* `x` is the input to the layer
* `A` is the weights matrix (notice the "`T`", that's because the weights matrix gets transposed).
* `b` is the bias term used to slightly offset the weights and inputs.
* `y` is the output (a manipulation of the input in the hopes to discover patterns in it).

This is a linear function, just like $y = mx+b$, but handles vectors (and even matrices) as inputs and outputs instead of just scalars

In [None]:
linear = torch.nn.Linear(in_features=4, # in_features = matches inner dimension of input
                         out_features=2) # out_features = describes outer value

X = tensor1_2d
print(X)
y = linear(X)
print(y)

This `linear` object takes our input `X`, does some sort of manipulation, and gives us an output. Let's break down how it does this.

When we make the object `linear`, PyTorch randomly generates weight and bias based on [Kaiming uniform initialization](https://docs.pytorch.org/docs/stable/nn.init.html#torch.nn.init.kaiming_uniform_).

`linear(x)` then performs the linear transformation we outlined above to `x`, resulting in our output.

* `x` = our input
* `A` = `linear.weight`
* `b` = `linear.bias`
* `y` = our output

Let's try manually replicating the transformation produced by this feed-forward neural network layer.



In [None]:
A = linear.weight
print(A)
b = linear.bias
print(b)



In [None]:
y = X @ A.T + b
print(y)
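Rather than comparing the two printed outputs by eye, you can let PyTorch compare them. The sketch below builds its own layer and input (hypothetical names `layer`, `y_layer`, `y_manual`; the seed is arbitrary) so it runs independently of the cells above, then checks the manual formula against the layer with `torch.allclose()`.

```python
import torch

torch.manual_seed(42)  # arbitrary seed, only for reproducibility

layer = torch.nn.Linear(in_features=4, out_features=2)
X = torch.rand(3, 4)

y_layer = layer(X)                           # what the layer computes
y_manual = X @ layer.weight.T + layer.bias   # y = x A^T + b, written by hand

print(layer.weight.shape)  # torch.Size([2, 4]) -> (out_features, in_features)
print(y_layer.shape)       # torch.Size([3, 2])
print(torch.allclose(y_layer, y_manual, atol=1e-6))
```

The weight matrix is stored as `(out_features, in_features)`, which is why it needs transposing before it can multiply an input of shape `(3, 4)`.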

### Finding the min, max, mean, sum, etc (aggregation)

Now we've seen a few ways to manipulate tensors, let's run through a few ways to aggregate them.

First we'll create a tensor and then find the max, min, mean and sum of it.





In [None]:
# Create a tensor
x = torch.randn(2,4,2)
x



In [None]:
print(f"Minimum: {x.min()}")
print(f"Maximum: {x.max()}")
print(f"Mean1: {x.mean()}")
print(f"Mean2: {x.type(torch.float32).mean()}") # same result: x is already float32, the cast only matters for integer tensors
print(f"Sum: {x.sum()}")
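The tensor above is already `float32`, so both mean calls succeed. The cast starts to matter with integer tensors. Below is a small sketch using a hypothetical integer tensor `int_tensor`.

```python
import torch

int_tensor = torch.arange(0, 100, 10)  # dtype torch.int64
print(int_tensor.dtype)

# mean() is not defined for integer tensors.
try:
    int_tensor.mean()
except RuntimeError as err:
    print(f"RuntimeError: {err}")

# Casting to a floating-point datatype first makes it work.
mean_value = int_tensor.type(torch.float32).mean()
print(mean_value)  # tensor(45.)

# min, max and sum work on integer tensors as they are.
print(int_tensor.min(), int_tensor.max(), int_tensor.sum())
```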

> **Note:** Some methods such as `torch.mean()` require tensors to have a floating-point datatype such as `torch.float32`; on integer tensors the operation will fail.

### Positional min/max

You can also find the index of a tensor where the max or minimum occurs with `torch.argmax()` and `torch.argmin()` respectively.

In [None]:
tensor = torch.randn(2, 4, 2)
print("Tensor:")
print(tensor)

max_index = tensor.argmax()
min_index = tensor.argmin()

# Without `dim`, argmax()/argmin() return an index into the flattened tensor


print(f"\nIndex where max value occurs (flat): {max_index}")
print(f"Index where min value occurs (flat): {min_index}")


In [None]:
max_coords = torch.unravel_index(max_index, tensor.shape)
min_coords = torch.unravel_index(min_index, tensor.shape)

print(f"Index where max value occurs (multi-dim): {max_coords}")
print(f"Index where min value occurs (multi-dim): {min_coords}")
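With a small tensor whose values you can read at a glance, it's easier to see what the flat index means. The sketch below uses a hypothetical 2-dimensional tensor `t` and plain `divmod()` to recover the row and column (this shortcut only works for two dimensions), then shows the `dim` argument, which returns one index per row or column instead.

```python
import torch

t = torch.tensor([[1, 9, 3],
                  [7, 2, 8]])

# Flat index: position of the max after flattening to [1, 9, 3, 7, 2, 8].
flat_index = t.argmax().item()
row, col = divmod(flat_index, t.shape[1])
print(flat_index, (row, col), t[row, col])  # 1 (0, 1) tensor(9)

# With `dim`, the search runs along that dimension instead.
print(t.argmax(dim=0))  # tensor([1, 0, 1]) -> row of the max in each column
print(t.argmax(dim=1))  # tensor([1, 2])    -> column of the max in each row
```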

### Reshaping, stacking, squeezing and unsqueezing

Often you'll want to reshape or change the dimensions of your tensors without actually changing the values inside them.

To do so, some popular methods are:

| Method | One-line description |
| ----- | ----- |
| `torch.reshape(input, shape)` | Reshapes `input` to `shape` (if compatible), can also use `torch.Tensor.reshape()`. |
| `torch.stack(tensors, dim=0)` | Concatenates a sequence of `tensors` along a new dimension (`dim`), all `tensors` must be same size. |
| `torch.squeeze(input)` | Squeezes `input` to remove all the dimensions of size `1`. |
| `torch.unsqueeze(input, dim)` | Returns `input` with a dimension of size `1` added at `dim`. |
| `torch.permute(input, dims)`| Returns a *view* of the original `input` with its dimensions permuted (rearranged) to `dims`. |

Why do any of these?

Because deep learning models (neural networks) are all about manipulating tensors in some way. And because of the rules of matrix multiplication, if you've got shape mismatches, you'll run into errors. These methods help you make sure the right elements of your tensors are mixing with the right elements of other tensors.
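Two rows of the table, `reshape` and `permute`, can be sketched on their own. The names and the image-like shape `(224, 224, 3)` below are just illustrative.

```python
import torch

# reshape: same 6 values, new arrangement (2 * 3 == 6, so it is compatible).
x = torch.arange(1, 7)        # shape (6,)
x_reshaped = x.reshape(2, 3)  # shape (2, 3)
print(x_reshaped)

# permute: reorder the dimensions, e.g. [height, width, color_channels]
# -> [color_channels, height, width].
image = torch.rand(224, 224, 3)
image_permuted = image.permute(2, 0, 1)
print(image.shape, "->", image_permuted.shape)

# permute returns a view, so both names refer to the same underlying data.
image[0, 0, 0] = 0.5
print(image_permuted[0, 0, 0])  # tensor(0.5000)
```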

In [None]:
# Create a tensor
x = torch.rand(1,5,1)
x, x.shape

If we wanted to stack our new tensor on top of itself four times, we could do so with `torch.stack()`.

In [None]:
# Stack tensors on top of each other
x_stacked = torch.stack([x, x, x, x])
x_stacked
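The `dim` argument decides where the new dimension goes. A sketch with a small hypothetical vector `v` makes the difference easy to see:

```python
import torch

v = torch.tensor([1, 2, 3])

# dim=0 (the default): each copy of v becomes a row.
stacked_rows = torch.stack([v, v, v, v], dim=0)
print(stacked_rows.shape)  # torch.Size([4, 3])

# dim=1: each copy of v becomes a column.
stacked_cols = torch.stack([v, v, v, v], dim=1)
print(stacked_cols.shape)  # torch.Size([3, 4])

# Stacking always adds one dimension: four (1, 5, 1) tensors -> (4, 1, 5, 1).
x = torch.rand(1, 5, 1)
print(torch.stack([x, x, x, x]).shape)
```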

How about removing all single dimensions from a tensor?

To do so you can use `torch.squeeze()` (I remember this as *squeezing* the tensor to only have dimensions over 1).

In [None]:
print(f"Previous tensor: {x}")
print(f"Previous shape: {x.shape}")

# Remove dimensions of size 1 from x
x_squeezed = x.squeeze()
print(f"\nNew tensor: {x_squeezed}")
print(f"New shape: {x_squeezed.shape}")

And to do the reverse of `torch.squeeze()` you can use `torch.unsqueeze()` to add a dimension value of 1 at a specific index.

In [None]:
print(f"Previous tensor: {x_squeezed}")
print(f"Previous shape: {x_squeezed.shape}")

# Add an extra dimension with unsqueeze
x_unsqueezed = x_squeezed.unsqueeze(dim=1) # <-- what do you think dim=0 does?
print(f"\nNew tensor: {x_unsqueezed}")
print(f"New shape: {x_unsqueezed.shape}")

## Indexing

Indexing tensors in PyTorch is very similar to what we've previously seen with NumPy arrays.

In [None]:
import torch
x = torch.arange(1, 10).reshape(1, 3, 3)
x, x.shape

Indexing values goes outer dimension -> inner dimension (check out the square brackets).

In [None]:
# Let's index!
print(x[0])
print(x[0,0])
print(x[0,0,0])

PRACTICE!!

In [None]:
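# Only the last expression in a cell is displayed, so print each result
print(x[:, 0])
print(x[:, :, 1])
print(x[:, 1, 1])
print(x[0, 0, :])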

## PyTorch tensors & NumPy

The two main methods you'll want to use for NumPy to PyTorch (and back again) are:
* `torch.from_numpy(ndarray)` - NumPy array -> PyTorch tensor.
* `Tensor.numpy()` - PyTorch tensor -> NumPy array.

Let's try them out.

In [None]:
# NumPy array to tensor
import torch
import numpy as np
array = np.arange(1.0, 8.0)
tensor = torch.from_numpy(array)
print(f'NumPy array: {array}\nTensor: {tensor}')

> **Note:** By default, NumPy creates floating-point arrays with the datatype `float64`, and if you convert one to a PyTorch tensor, it'll keep the same datatype (as above).
>
> However, many PyTorch calculations default to using `float32`.
>
> So we'll want to get in the habit of converting the datatype when going from a NumPy array to a PyTorch tensor. If one tensor is in `torch.float64` and another is in `torch.float32`, you might run into some errors.
>
> You can change the datatype of a tensor using `Tensor.type(dtype=None)`, where the `dtype` parameter is the datatype you'd like to use.

In [None]:
tensor = tensor.type(torch.float32)
tensor

This is what it looks like all together

In [None]:
# NumPy array to tensor
import torch
import numpy as np
array = np.arange(1.0, 8.0)
tensor = torch.from_numpy(array).type(torch.float32)
print(f'Numpy Array: {array}')
print(f'Array dtype: {array.dtype}')
print(f'\nTensor: {tensor}')
print(f'Tensor dtype: {tensor.dtype}')

If you want to go from PyTorch tensor to NumPy array, you can call `tensor.numpy()`.

In [None]:
# Tensor to NumPy array
tensor = torch.ones(7)
numpy_tensor = tensor.numpy()
tensor, numpy_tensor

In [None]:
numpy_tensor.astype(np.float64)
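One detail worth knowing about these conversions: `torch.from_numpy()` and `Tensor.numpy()` share memory with the original object on the CPU, so changing one in place changes the other. Converting the datatype, on the other hand, produces an independent copy. The sketch below uses hypothetical names `shared`, `copied` and `view` to show both cases.

```python
import numpy as np
import torch

array = np.arange(1.0, 8.0)
shared = torch.from_numpy(array)                      # float64, shares memory with `array`
copied = torch.from_numpy(array).type(torch.float32)  # dtype change makes a new copy

array[0] = 100.0      # modify the NumPy array in place
print(shared[0])      # follows the change (100.)
print(copied[0])      # still the old value (1.)

# The same sharing applies in the other direction.
tensor = torch.ones(7)
view = tensor.numpy()
tensor.add_(1)        # in-place add on the tensor
print(view)           # the NumPy array now holds 2s
```

If you need a NumPy array or tensor that will not change behind your back, make an explicit copy (for example with `array.copy()` or `tensor.clone()`).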

## Running tensors on GPUs (and making faster computations)

Deep learning algorithms require a lot of numerical operations.

And by default these operations are often done on a CPU (central processing unit).

However, there's another common piece of hardware called a GPU (graphics processing unit), which is often much faster at performing the specific types of operations neural networks need (matrix multiplications) than CPUs.

Your computer might have one.

If so, you should look to use it whenever you can to train neural networks because chances are it'll speed up the training time dramatically.




### 1. Getting a GPU

There are a few ways to access a GPU

| **Method** | **Difficulty to setup** | **Pros** | **Cons** | **How to setup** |
| ----- | ----- | ----- | ----- | ----- |
| Google Colab | Easy | Free to use, almost zero setup required, can share work with others as easy as a link | Doesn't save your data outputs, limited compute, subject to timeouts | Follow the Google Colab Guide|
| Use your own | Medium | Run everything locally on your own machine | GPUs aren't free, require upfront cost | Follow the PyTorch installation guidelines |
| Cloud computing (AWS, GCP, Azure) | Medium-Hard | Small upfront cost, access to almost infinite compute | Can get expensive if running continually, takes some time to setup right | Follow the PyTorch installation guidelines |

To check if you've got access to an NVIDIA GPU, you can run `!nvidia-smi`, where the `!` (also called bang) means "run this on the command line".



In [None]:
!nvidia-smi



### 2. Getting PyTorch to run on the GPU

Once you've got a GPU ready to access, the next step is getting PyTorch to use it for storing data (tensors) and computing on data (performing operations on tensors).


In [None]:
import torch

# Is a GPU available in our environment?
print(torch.cuda.is_available())

If the above outputs `True`, PyTorch can see and use the GPU. If it outputs `False`, it can't see the GPU.

In Google Colab, we can change this by selecting `Change runtime type` in the upper right corner

In [None]:
!nvidia-smi

Now, let's say you wanted to set up your code so it ran on the CPU *or* the GPU if it was available.

That way, if you or someone decides to run your code, it'll work regardless of the computing device they're using.

Let's create a `device` variable to store what kind of device is available.

In [None]:
# Set device type
device = "cuda" if torch.cuda.is_available() else "cpu"
device

If the above output `"cuda"` it means we can set all of our PyTorch code to use the available CUDA device (a GPU) and if it output `"cpu"`, our PyTorch code will stick with the CPU.

> **Note:** In PyTorch, it's best practice to write **device agnostic code**. This means code that'll run on CPU (always available) or GPU (if available).

If you want to do faster computing you can use a GPU but if you want to do *much* faster computing, you can use multiple GPUs.

You can count the number of GPUs PyTorch has access to using `torch.cuda.device_count()`.

In [None]:
# Count number of devices
torch.cuda.device_count()

Knowing the number of GPUs PyTorch has access to is helpful in case you want to run a specific process on one GPU and another process on another (PyTorch also has features to let you run a process across *all* GPUs).

### 3. Putting tensors (and models) on the GPU

You can put tensors (and models) on a specific device by calling `to(device)` on them, where `device` is the target device you'd like the tensor (or model) to go to.

Why do this?

GPUs offer far faster numerical computing than CPUs do and if a GPU isn't available, because of our **device agnostic code** (see above), it'll run on the CPU.
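As a sketch of what device agnostic code can look like in practice, the snippet below creates a tensor directly on the chosen device and moves a small layer there too. The layer sizes and variable names are arbitrary choices for illustration.

```python
import torch

# Pick the GPU if PyTorch can see one, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Tensors can be created directly on the target device...
x = torch.rand(3, 4, device=device)

# ...and models (here a single layer) can be moved there with .to(device).
layer = torch.nn.Linear(in_features=4, out_features=2).to(device)

# Inputs and model must live on the same device for this to work.
y = layer(x)
print(y.shape, y.device)
```

The same code runs unchanged on a machine with or without a GPU. Only the value of `device` differs.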

> **Note:** Putting a tensor on the GPU using `to(device)` (e.g. `some_tensor.to(device)`) returns a copy of that tensor on the target device and leaves the original where it was, i.e. the same data will be on both CPU and GPU. To overwrite tensors, reassign them:
>
> `some_tensor = some_tensor.to(device)`

Let's try creating a tensor and putting it on the GPU (if it's available).

In [None]:
# Create tensor (default on CPU)
tensor = torch.tensor([1, 2, 3])

# Tensor not on GPU
print(tensor, tensor.device)

# Move tensor to GPU (if available)
tensor = tensor.to(device)
print(tensor, tensor.device)


If a GPU is available, notice the second tensor has `device='cuda:0'`. This means it's stored on the 0th GPU available (GPUs are 0 indexed, if two GPUs were available, they'd be `'cuda:0'` and `'cuda:1'` respectively, up to `'cuda:n'`).



### 4. Moving tensors back to the CPU

What if we wanted to move the tensor back to CPU?

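For example, NumPy only works with data in CPU memory, so calling `tensor.numpy()` directly on a GPU tensor raises an error. Copy the tensor to the CPU first with `Tensor.cpu()`.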

In [None]:
# Copy the tensor back to cpu
tensor_back_on_cpu = tensor.cpu().numpy()
tensor_back_on_cpu
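The sketch below puts the two cases side by side. On a machine with a GPU it shows the error you get from calling `.numpy()` on a GPU tensor. On a CPU-only machine that branch is skipped. Either way, `.cpu().numpy()` works, which is why it is a safe habit in device agnostic code.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
tensor = torch.tensor([1, 2, 3]).to(device)

# Only relevant when the tensor actually lives on a GPU.
if tensor.is_cuda:
    try:
        tensor.numpy()
    except TypeError as err:
        print(f"TypeError: {err}")

# .cpu() copies a GPU tensor to CPU memory (and does nothing harmful on CPU).
array = tensor.cpu().numpy()
print(array, type(array))
```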