
Created while following PyTorch documentation and Daniel Bourke's PyTorch course.
<br>
https://docs.pytorch.org/docs/stable/index.html<br>
https://youtu.be/Z_ikDlimN6A?si=64iIEmE2UiIbqjKw

# PyTorch Docs - Quickstart

## Setup

In [None]:
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

In [None]:
# Download training data from open datasets.
training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor(),
)

# Download test data from open datasets.
test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor(),
)

100%|██████████| 26.4M/26.4M [00:02<00:00, 12.5MB/s]
100%|██████████| 29.5k/29.5k [00:00<00:00, 200kB/s]
100%|██████████| 4.42M/4.42M [00:01<00:00, 3.80MB/s]
100%|██████████| 5.15k/5.15k [00:00<00:00, 7.23MB/s]


We pass the `Dataset` as an argument to `DataLoader`. This wraps an iterable over our dataset, and supports automatic batching, sampling, shuffling and multiprocess data loading. Here we define a batch size of 64, i.e. each element in the dataloader iterable will return a batch of 64 features and labels.

In [None]:
batch_size = 64

# Create data loaders.
train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)

for X, y in test_dataloader:
    print(f"Shape of X [N, C, H, W]: {X.shape}")
    print(f"Shape of y: {y.shape} {y.dtype}")
    break

Shape of X [N, C, H, W]: torch.Size([64, 1, 28, 28])
Shape of y: torch.Size([64]) torch.int64
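
To make the batching behaviour easier to see, here is a minimal sketch on a toy dataset. The `TensorDataset`, the sample count of 10, and the batch size of 4 are assumptions chosen for illustration and are not part of the quickstart.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy data: 10 samples with 3 features each, plus integer labels.
features = torch.arange(30, dtype=torch.float32).reshape(10, 3)
labels = torch.arange(10)
toy_dataset = TensorDataset(features, labels)

# With batch_size=4, 10 samples are split into batches of 4, 4 and 2.
toy_loader = DataLoader(toy_dataset, batch_size=4, shuffle=False)
batch_sizes = [len(X) for X, y in toy_loader]

print(f"Number of batches: {len(toy_loader)}")
print(f"Batch sizes: {batch_sizes}")
```

The last batch is smaller because 10 is not a multiple of 4. Passing `drop_last=True` to `DataLoader` would discard it instead.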


## Creating Models

To define a neural network in PyTorch, we create a class that inherits from **nn.Module**. We define the layers of the network in the `__init__` function and specify how data passes through the network in the `forward` function. To accelerate operations in the neural network, we move it to an **accelerator** such as CUDA, MPS, MTIA, or XPU if one is available. Otherwise, we use the CPU.

In [None]:
device = torch.accelerator.current_accelerator().type if torch.accelerator.is_available() else "cpu"
print(f"Using {device} device")

# Define model
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork().to(device)
print(model)

Using cpu device
NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)
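
The sketch below shows what `forward` returns for a batch of inputs. It assumes a smaller stand-in model (`tiny_model`, a hypothetical name) and a random fake batch rather than the `NeuralNetwork` class and FashionMNIST data used above.

```python
import torch
from torch import nn

# Stand-in model (an assumption): flatten 28x28 images, then one linear layer.
tiny_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))

# A fake batch of 4 grayscale 28x28 images in [N, C, H, W] layout.
fake_batch = torch.rand(4, 1, 28, 28)

with torch.no_grad():
    logits = tiny_model(fake_batch)        # raw scores, shape [4, 10]
    probs = torch.softmax(logits, dim=1)   # each row now sums to 1
    predicted_class = probs.argmax(dim=1)  # one class index per image

print(logits.shape, predicted_class.shape)
```

Calling the model directly (`tiny_model(fake_batch)`) runs `forward` for us. The outputs are logits, so a softmax is only needed when probabilities are wanted.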


## Optimizing the Model Parameters

In [None]:
# Loss function and optimizer
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)


The training loop makes predictions on each batch of the training dataset and backpropagates the prediction error to adjust the model’s parameters.

In [None]:
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)

        # Compute prediction error
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        if batch % 100 == 0:
            loss, current = loss.item(), (batch + 1) * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")
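
As a rough illustration of the same three calls (`backward`, `step`, `zero_grad`), here is a sketch that repeats the update on one fixed synthetic batch. The linear model, the random data, the seed, and the learning rate of 0.1 are all assumptions made for the example.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical toy setup: 8 input features, 3 classes, one fixed batch of 16.
toy_model = nn.Linear(8, 3)
toy_loss_fn = nn.CrossEntropyLoss()
toy_optimizer = torch.optim.SGD(toy_model.parameters(), lr=0.1)

X = torch.rand(16, 8)
y = torch.randint(0, 3, (16,))

losses = []
for _ in range(20):
    loss = toy_loss_fn(toy_model(X), y)  # compute prediction error
    loss.backward()                      # compute gradients
    toy_optimizer.step()                 # update parameters
    toy_optimizer.zero_grad()            # reset gradients for the next step
    losses.append(loss.item())

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

Without `zero_grad()`, gradients from earlier steps would accumulate in each parameter's `.grad` and distort later updates.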

We also check the model’s performance against the test dataset to ensure it is learning.

In [None]:
def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()
    test_loss, correct = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
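
The accuracy line in `test` can be traced by hand on a small example. The logits and labels below are made-up values used only to illustrate the `argmax` comparison.

```python
import torch

# Hypothetical logits for 4 samples over 3 classes, and their true labels.
pred = torch.tensor([[2.0, 0.1, 0.3],
                     [0.2, 1.5, 0.1],
                     [0.3, 0.2, 0.9],
                     [1.2, 0.4, 0.8]])
y = torch.tensor([0, 1, 2, 2])

predicted_classes = pred.argmax(1)  # index of the largest logit per row
num_correct = (predicted_classes == y).type(torch.float).sum().item()
accuracy = num_correct / len(y)

print(predicted_classes)  # tensor([0, 1, 2, 0])
print(accuracy)           # 0.75
```

Three of the four predictions match their labels, giving 75% accuracy.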

The training process is conducted over several iterations (epochs). During each epoch, the model learns parameters to make better predictions. We print the model’s `accuracy` and `loss` at each epoch; we’d like to see the accuracy increase and the loss decrease with every epoch.

In [None]:
epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_dataloader, model, loss_fn, optimizer)
    test(test_dataloader, model, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 2.303888  [   64/60000]
loss: 2.286021  [ 6464/60000]
loss: 2.266347  [12864/60000]
loss: 2.265998  [19264/60000]
loss: 2.241197  [25664/60000]
loss: 2.210743  [32064/60000]
loss: 2.223500  [38464/60000]
loss: 2.180671  [44864/60000]
loss: 2.171514  [51264/60000]
loss: 2.146654  [57664/60000]
Test Error: 
 Accuracy: 44.7%, Avg loss: 2.139349 

Epoch 2
-------------------------------
loss: 2.146621  [   64/60000]
loss: 2.132837  [ 6464/60000]
loss: 2.071890  [12864/60000]
loss: 2.096262  [19264/60000]
loss: 2.019951  [25664/60000]
loss: 1.965499  [32064/60000]
loss: 2.001622  [38464/60000]
loss: 1.909685  [44864/60000]
loss: 1.914742  [51264/60000]
loss: 1.838361  [57664/60000]
Test Error: 
 Accuracy: 57.1%, Avg loss: 1.840133 

Epoch 3
-------------------------------
loss: 1.880162  [   64/60000]
loss: 1.841166  [ 6464/60000]
loss: 1.722552  [12864/60000]
loss: 1.770128  [19264/60000]
loss: 1.638416  [25664/60000]
loss: 1.604070  [32064/600

## Saving Models

In [None]:
torch.save(model.state_dict(), "model.pth")
print("Saved PyTorch Model State to model.pth")

Saved PyTorch Model State to model.pth
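
A `state_dict` stores only the parameter tensors, so restoring it requires a fresh instance of the same architecture. The round trip below is a sketch using a tiny `nn.Linear` model and a temporary file, both of which are assumptions for illustration.

```python
import os
import tempfile

import torch
from torch import nn

source_model = nn.Linear(4, 2)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_model.pth")  # hypothetical file name
    torch.save(source_model.state_dict(), path)

    # Re-create the architecture, then load the saved parameters into it.
    restored_model = nn.Linear(4, 2)
    restored_model.load_state_dict(torch.load(path, weights_only=True))

weights_match = all(
    torch.equal(a, b)
    for a, b in zip(source_model.state_dict().values(),
                    restored_model.state_dict().values())
)
print(list(source_model.state_dict().keys()), weights_match)
```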


## Loading Models

In [None]:
model = NeuralNetwork().to(device)
model.load_state_dict(torch.load("model.pth", weights_only=True))

<All keys matched successfully>

The loaded model can now be used to make predictions.

In [None]:
classes = [
    "T-shirt/top",
    "Trouser",
    "Pullover",
    "Dress",
    "Coat",
    "Sandal",
    "Shirt",
    "Sneaker",
    "Bag",
    "Ankle boot",
]

model.eval()
x, y = test_data[0][0], test_data[0][1]
with torch.no_grad():
    x = x.to(device)
    pred = model(x)
    predicted, actual = classes[pred[0].argmax(0)], classes[y]
    print(f'Predicted: "{predicted}", Actual: "{actual}"')

Predicted: "Ankle boot", Actual: "Ankle boot"


# PyTorch Fundamentals

In [2]:
import torch
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
print(torch.__version__)

2.8.0+cu126


## Tensors

Creating tensors

In [None]:
scalar = torch.tensor(7)
scalar

tensor(7)

In [None]:
scalar.ndim

0

In [None]:
# Tensor as Python int
scalar.item()

7

In [None]:
# Vector
vector = torch.tensor([7, 7])
vector

tensor([7, 7])

In [None]:
vector.ndim

1

In [None]:
vector.shape

torch.Size([2])

In [None]:
# MATRIX
MATRIX = torch.tensor([[7, 8],
                       [9, 10]])
MATRIX

tensor([[ 7,  8],
        [ 9, 10]])

In [None]:
MATRIX.ndim

2

In [None]:
MATRIX[1]

tensor([ 9, 10])

In [None]:
MATRIX.shape

torch.Size([2, 2])

In [None]:
# Tensor
TENSOR = torch.tensor([[[1, 2, 3],
                        [3, 6, 9],
                        [2, 4, 5]]])
TENSOR

tensor([[[1, 2, 3],
         [3, 6, 9],
         [2, 4, 5]]])

In [None]:
TENSOR.ndim

3

In [None]:
TENSOR.shape

torch.Size([1, 3, 3])

In [None]:
TENSOR[0]

tensor([[1, 2, 3],
        [3, 6, 9],
        [2, 4, 5]])

## Random tensors

Neural networks start with tensors full of random numbers and then adjust those random numbers to better represent the data.

In [8]:
# Random tensor of size (3, 4)
random_tensor = torch.rand(3, 4)
random_tensor

tensor([[0.9660, 0.7701, 0.1496, 0.9633],
        [0.3812, 0.3887, 0.4191, 0.3924],
        [0.7652, 0.1830, 0.9913, 0.5161]])

In [None]:
random_tensor.ndim

2

In [None]:
# Random tensor with similar shape to an image tensor
# 3 is the number of color channels
# Color channels can come either first or last
random_image_size_tensor = torch.rand(size=(224, 224, 3))
random_image_size_tensor.shape, random_image_size_tensor.ndim

(torch.Size([224, 224, 3]), 3)
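
`torch.rand` is one of several random constructors. The sketch below compares three of them. The shapes and the integer range are arbitrary choices for the example.

```python
import torch

uniform = torch.rand(2, 3)               # uniform samples in [0, 1)
normal = torch.randn(2, 3)               # samples from a standard normal
integers = torch.randint(0, 10, (2, 3))  # integers in [0, 10)

print(uniform.dtype, normal.dtype, integers.dtype)
```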

## Zeros and ones

In [6]:
# Create a tensor of all zeros
zeros = torch.zeros((3, 4))
zeros

tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])

In [9]:
# Multiply your target tensor by the zeros tensor to make the model ignore the values
zeros*random_tensor

tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])
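
Multiplying by zeros is usually applied selectively, through a mask of zeros and ones. Here is a minimal sketch with made-up values.

```python
import torch

values = torch.tensor([[1.0, 2.0, 3.0],
                       [4.0, 5.0, 6.0]])

# Hypothetical mask: 1 keeps a value, 0 hides it.
mask = torch.tensor([[1.0, 0.0, 1.0],
                     [0.0, 1.0, 0.0]])

masked = values * mask  # element-wise multiplication
print(masked)
```

Only the positions where the mask is 1 keep their original value.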

In [10]:
# Tensor of all ones
ones = torch.ones((3, 4))
ones

tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])

In [11]:
ones.dtype

torch.float32

In [12]:
random_tensor.dtype

torch.float32

## Tensor ranges and tensors-like

In [15]:
# Use torch.arange() instead of torch.range(), which is deprecated
# Default step is 1
torch.arange(start=0, end=1000, step=77)

tensor([  0,  77, 154, 231, 308, 385, 462, 539, 616, 693, 770, 847, 924])

In [16]:
one_to_ten = torch.arange(start=1, end=11, step=1)
one_to_ten

tensor([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [18]:
# Create a tensor of zeros with the same shape and dtype as another tensor
ten_zeros = torch.zeros_like(input=one_to_ten)
ten_zeros

tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

## Tensor datatypes

In [22]:
# Float 32 tensor (dtype=None falls back to the default float dtype, torch.float32)
float_32_tensor = torch.tensor([3.0, 6.0, 9.0],
                               dtype=None, # What datatype the tensor is (e.g. torch.float16, torch.float32)
                               device=None, # What device the tensor is on (e.g. "cuda")
                               requires_grad=False) # Whether or not to track gradients for this tensor's operations
float_32_tensor

tensor([3., 6., 9.])

In [26]:
# Converting float 32 to float 16
float_16_tensor = float_32_tensor.type(torch.float16)
float_16_tensor

tensor([3., 6., 9.], dtype=torch.float16)

In [27]:
float_16_tensor * float_32_tensor

tensor([ 9., 36., 81.])

In [30]:
int_32_tensor = torch.tensor([3, 6, 9], dtype=torch.int32)
int_32_tensor

tensor([3, 6, 9], dtype=torch.int32)

In [31]:
float_32_tensor * int_32_tensor

tensor([ 9., 36., 81.])
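
The two multiplications above work because PyTorch promotes mixed dtypes to a common one. The sketch below checks the resulting dtypes explicitly. The variable names are shortened versions of the ones above, and the float16 with int32 case is an extra combination added for illustration.

```python
import torch

f32 = torch.tensor([3.0, 6.0, 9.0])
f16 = f32.type(torch.float16)
i32 = torch.tensor([3, 6, 9], dtype=torch.int32)

print((f16 * f32).dtype)  # torch.float32
print((f32 * i32).dtype)  # torch.float32
print((f16 * i32).dtype)  # torch.float16

# torch.result_type reports the promoted dtype without doing the arithmetic.
promoted = torch.result_type(f16, f32)
print(promoted)           # torch.float32
```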

## Getting information from tensors

How to get each tensor attribute:
- `tensor.dtype`: to get datatype from a tensor
- `tensor.shape`: to get shape from a tensor
- `tensor.device`: to get device from a tensor

In [37]:
some_tensor = torch.rand(3, 4)
some_tensor

tensor([[0.6167, 0.1128, 0.7814, 0.0429],
        [0.8781, 0.0469, 0.4747, 0.0138],
        [0.3139, 0.3137, 0.2349, 0.9759]])

In [40]:
print(some_tensor)
print(f"Datatype of tensor: {some_tensor.dtype}")
print(f"Shape of tensor: {some_tensor.shape}")
print(f"Device tensor is on: {some_tensor.device}")

tensor([[0.6167, 0.1128, 0.7814, 0.0429],
        [0.8781, 0.0469, 0.4747, 0.0138],
        [0.3139, 0.3137, 0.2349, 0.9759]])
Datatype of tensor: torch.float32
Shape of tensor: torch.Size([3, 4])
Device tensor is on: cpu


## Manipulating tensors (tensor operations)

- Addition
- Subtraction
- Multiplication (element-wise)
- Division
- Matrix multiplication

In [42]:
# Addition
tensor = torch.tensor([1, 2, 3])
tensor + 100

tensor([101, 102, 103])

In [43]:
# Multiplication
tensor * 10

tensor([10, 20, 30])

In [44]:
# Subtraction
tensor - 10

tensor([-9, -8, -7])

In [46]:
# PyTorch built-in functions
torch.mul(tensor, 10)

tensor([10, 20, 30])

In [47]:
torch.add(tensor, 10)

tensor([11, 12, 13])
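
Division is listed above but not demonstrated, so here is a short sketch on the same `[1, 2, 3]` tensor. The divisor 2 is an arbitrary choice.

```python
import torch

tensor = torch.tensor([1, 2, 3])

true_div = tensor / 2    # true division returns floats
floor_div = tensor // 2  # floor division keeps the integer dtype
same_as_function = torch.equal(torch.div(tensor, 2), true_div)

print(true_div, true_div.dtype)    # tensor([0.5000, 1.0000, 1.5000]) torch.float32
print(floor_div, floor_div.dtype)  # tensor([0, 1, 1]) torch.int64
```

None of these operations modify `tensor` itself. Each one returns a new tensor.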

## Matrix multiplication

In [48]:
# Element-wise multiplication
print(tensor, "*", tensor)
print(f"Equals: {tensor * tensor}")

tensor([1, 2, 3]) * tensor([1, 2, 3])
Equals: tensor([1, 4, 9])


In [49]:
# Matrix multiplication (for two 1-D tensors this is the dot product)
# It's faster than multiplying manually in a Python loop
torch.matmul(tensor, tensor)

tensor(14)

In [80]:
%%time
value = 0
for i in range(len(tensor)):
  value += tensor[i] * tensor[i]
print(value)

tensor(14)
CPU times: user 1.52 ms, sys: 0 ns, total: 1.52 ms
Wall time: 1.98 ms


In [90]:
%%time
torch.matmul(tensor, tensor)

CPU times: user 278 µs, sys: 9 µs, total: 287 µs
Wall time: 230 µs


tensor(14)
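
For two 1-D tensors, the three expressions below give the same result, which is a quick way to confirm what `torch.matmul` computes here. This is a sketch on the same `[1, 2, 3]` tensor.

```python
import torch

tensor = torch.tensor([1, 2, 3])

dot_via_matmul = torch.matmul(tensor, tensor)
dot_via_operator = tensor @ tensor     # `@` is the matrix multiplication operator
dot_via_sum = (tensor * tensor).sum()  # 1*1 + 2*2 + 3*3

print(dot_via_matmul, dot_via_operator, dot_via_sum)  # all tensor(14)
```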

## Shape errors

1. The inner dimensions must match:
* `(3, 2) @ (3, 2)` won't work
* `(2, 3) @ (3, 2)` will work
* `(3, 2) @ (2, 3)` will work

2. The resulting matrix has the shape of the outer dimensions:
* `(2, 3) @ (3, 2)` -> `(2, 2)`
* `(3, 2) @ (2, 3)` -> `(3, 3)`
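
The two rules can be written as a small check. `matmul_result_shape` is a hypothetical helper written for this note, not a PyTorch function, and it only covers 2-D shapes.

```python
import torch

def matmul_result_shape(shape_a, shape_b):
    """Hypothetical helper: shape of a 2-D matrix product, or None if the inner dimensions differ."""
    if shape_a[1] != shape_b[0]:
        return None
    return (shape_a[0], shape_b[1])

for shape_a, shape_b in [((3, 2), (3, 2)), ((2, 3), (3, 2)), ((3, 2), (2, 3))]:
    expected = matmul_result_shape(shape_a, shape_b)
    if expected is None:
        print(f"{shape_a} @ {shape_b}: inner dimensions do not match")
    else:
        # Compare the predicted shape with what torch.matmul actually returns.
        actual = tuple(torch.matmul(torch.rand(shape_a), torch.rand(shape_b)).shape)
        print(f"{shape_a} @ {shape_b}: expected {expected}, got {actual}")
```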

In [94]:
# Won't work
torch.matmul(torch.rand(3, 2), torch.rand(3, 2))

RuntimeError: mat1 and mat2 shapes cannot be multiplied (3x2 and 3x2)

In [95]:
# Will work
torch.matmul(torch.rand(2, 3), torch.rand(3, 2))

tensor([[1.1536, 0.9779],
        [1.4037, 0.8051]])

In [97]:
# Results in a 3x3 matrix
torch.matmul(torch.rand(3, 2), torch.rand(2, 3))

tensor([[0.5401, 0.5637, 0.7234],
        [0.7801, 0.5598, 1.0120],
        [0.5325, 0.3659, 0.6887]])

In [98]:
# Shapes for matrix multiplication
tensor_A = torch.tensor([[1, 2],
                         [3, 4],
                         [5, 6]])

tensor_B = torch.tensor([[7, 10],
                         [8, 11],
                         [9, 12]])
torch.mm(tensor_A, tensor_B) # torch.mm is matmul for 2-D matrices only (no broadcasting)

RuntimeError: mat1 and mat2 shapes cannot be multiplied (3x2 and 3x2)

In [99]:
tensor_A.shape, tensor_B.shape

(torch.Size([3, 2]), torch.Size([3, 2]))

We can fix this issue by manipulating the shape of one of our tensors using a **transpose**.

In [105]:
tensor_B.T, tensor_B.T.shape

(tensor([[ 7,  8,  9],
         [10, 11, 12]]),
 torch.Size([2, 3]))

In [106]:
# Now it will work
torch.matmul(tensor_A, tensor_B.T)

tensor([[ 27,  30,  33],
        [ 61,  68,  75],
        [ 95, 106, 117]])

## Finding the min, mean, sum, etc. (tensor aggregation)

In [119]:
x = torch.arange(1, 100, 10)
x

tensor([ 1, 11, 21, 31, 41, 51, 61, 71, 81, 91])

In [120]:
torch.max(x), x.max()

(tensor(91), tensor(91))

In [121]:
torch.mean(x.type(torch.float32))

tensor(46.)

In [122]:
torch.sum(x)

tensor(460)
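
The cast to `float32` before `torch.mean` is needed because the mean of an integer tensor is not defined in PyTorch. The sketch below shows the failure and two equivalent ways around it. The flag name `mean_failed` is made up for the example.

```python
import torch

x = torch.arange(1, 100, 10)  # dtype is int64

try:
    torch.mean(x)
    mean_failed = False
except RuntimeError as err:
    mean_failed = True
    print(f"mean on int64 failed: {err}")

mean_via_cast = x.type(torch.float32).mean()   # convert first, then average
mean_via_dtype = x.mean(dtype=torch.float32)   # or ask mean to do the conversion

print(mean_via_cast, mean_via_dtype)  # both tensor(46.)
```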

## Positional min and max

In [123]:
x.argmin()

tensor(0)

In [124]:
x[0]

tensor(1)

In [125]:
x.argmax()

tensor(9)

In [126]:
x[9]

tensor(91)

## Reshaping, stacking, squeezing and unsqueezing tensors

* Reshaping - reshapes an input tensor to a defined shape
* View - returns a view of an input tensor with a certain shape, sharing the same memory as the original tensor
* Stacking - combines multiple tensors on top of each other (vstack) or side by side (hstack)
* Squeeze - removes all dimensions of size `1` from a tensor
* Unsqueeze - adds a dimension of size `1` to a target tensor
* Permute - returns a view of the input with its dimensions permuted (swapped) in a certain way

In [134]:
x = torch.arange(1., 11.)
x, x.shape

(tensor([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.]), torch.Size([10]))

In [151]:
# Add an extra dimension
x_reshaped = x.reshape(10, 1)
x_reshaped, x_reshaped.shape

(tensor([[ 1.],
         [ 2.],
         [ 3.],
         [ 4.],
         [ 5.],
         [ 6.],
         [ 7.],
         [ 8.],
         [ 9.],
         [10.]]),
 torch.Size([10, 1]))

In [152]:
# Changing the view
z = x.view(1, 10)
z, z.shape

(tensor([[ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.]]),
 torch.Size([1, 10]))

Changing `z` changes `x`, because a view of a tensor shares the same memory as the original.

In [153]:
z[:, 0] = 5
z, x

(tensor([[ 5.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.]]),
 tensor([ 5.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.]))
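
The shared memory can be confirmed with `data_ptr()`, and `clone()` gives an independent copy when the original must stay untouched. This is a sketch with a freshly created `x`, and `independent` is a name chosen for the example.

```python
import torch

x = torch.arange(1., 11.)
z = x.view(1, 10)

# A view points at the same underlying storage as the original tensor.
shares_memory = z.data_ptr() == x.data_ptr()

# Cloning first creates a separate copy, so writes no longer reach x.
independent = x.clone().view(1, 10)
independent[:, 0] = 5

print(shares_memory)      # True
print(x[0], independent[0, 0])
```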

In [154]:
# Stack tensors on top of each other
x_stacked = torch.stack([x, x, x, x], dim=0)
x_stacked

tensor([[ 5.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.],
        [ 5.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.],
        [ 5.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.],
        [ 5.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.]])
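
The `dim` argument of `torch.stack` and the `vstack`/`hstack` variants each produce a different layout. The sketch below compares the resulting shapes on a short 3-element tensor, chosen to keep the output small.

```python
import torch

x = torch.arange(1., 4.)  # tensor([1., 2., 3.])

stacked_rows = torch.stack([x, x], dim=0)  # new leading dimension: shape (2, 3)
stacked_cols = torch.stack([x, x], dim=1)  # new trailing dimension: shape (3, 2)
v_stacked = torch.vstack([x, x])           # rows on top of each other: shape (2, 3)
h_stacked = torch.hstack([x, x])           # 1-D tensors joined end to end: shape (6,)

print(stacked_rows.shape, stacked_cols.shape, v_stacked.shape, h_stacked.shape)
```

Note that `torch.stack` always adds a new dimension, while `torch.hstack` on 1-D tensors concatenates along the existing one.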

In [155]:
x_reshaped.squeeze()

tensor([ 5.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.])

In [158]:
# Shape before squeezing
x_reshaped.shape

torch.Size([10, 1])

In [159]:
# Shape after squeezing
x_reshaped.squeeze().shape

torch.Size([10])

In [163]:
# Unsqueeze
x_squeezed = x_reshaped.squeeze()

x_unsqueezed = x_squeezed.unsqueeze(dim=0)
x_unsqueezed, x_unsqueezed.shape

(tensor([[ 5.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.]]),
 torch.Size([1, 10]))

In [167]:
# Permute
x_original = torch.rand(size=(224, 224, 3)) # image of height: 224, width: 224 and 3 color channels

# Permute the original tensor to rearrange the axis (or dim) order
x_permuted = x_original.permute(2, 0, 1) # shifts axis 0 to 1, 1 to 2 and 2 to 0

print(f"Previous shape: {x_original.shape}")
print(f"New shape: {x_permuted.shape}")

Previous shape: torch.Size([224, 224, 3])
New shape: torch.Size([3, 224, 224])
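
Because `permute` returns a view, the permuted tensor still shares data with the original. Here is a sketch on a small `(4, 5, 3)` tensor, a size chosen only to keep the example light.

```python
import torch

x_original = torch.zeros(4, 5, 3)         # height 4, width 5, 3 channels
x_permuted = x_original.permute(2, 0, 1)  # channels first: shape (3, 4, 5)

# Writing to the original is visible through the permuted view.
x_original[0, 0, 0] = 1.0
value_in_view = x_permuted[0, 0, 0].item()

# The view is not laid out contiguously; .contiguous() makes a compact copy.
print(value_in_view, x_permuted.is_contiguous(), x_permuted.contiguous().is_contiguous())
```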


## Indexing

In [171]:
x = torch.arange(1, 10).reshape(1, 3, 3)
x

tensor([[[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]])

In [172]:
x[0]

tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])

In [173]:
x[0][0]

tensor([1, 2, 3])

In [174]:
x[0][0][0]

tensor(1)

In [175]:
# ':' can select all of a target dimension
x[:, 0]

tensor([[1, 2, 3]])

In [178]:
x[0, 0, :]

tensor([1, 2, 3])

In [181]:
x[:, 1, 1]

tensor([5])

In [183]:
x[0, :, 0]

tensor([1, 4, 7])
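
A few more index combinations on the same `(1, 3, 3)` tensor, as a sketch for practice. The variable names are made up for the example.

```python
import torch

x = torch.arange(1, 10).reshape(1, 3, 3)

last_column = x[:, :, 2]  # every row, last column (keeps the outer dimension)
last_row = x[0, 2, :]     # first block, last row
center = x[0, 1, 1]       # a single element

print(last_column)  # tensor([[3, 6, 9]])
print(last_row)     # tensor([7, 8, 9])
print(center)       # tensor(5)
```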

## PyTorch tensors and NumPy

PyTorch has functionality to interact with NumPy.  
NumPy is a Python numerical computing library.

In [188]:
# NumPy array to tensor
# NumPy's default datatype is float64
array = np.arange(1.0, 8.0)
tensor = torch.from_numpy(array).type(torch.float32) # You can change the type to float32
array, tensor, tensor.dtype

(array([1., 2., 3., 4., 5., 6., 7.]),
 tensor([1., 2., 3., 4., 5., 6., 7.]),
 torch.float32)

In [190]:
# `array + 1` creates a new array, so the tensor (already a float32 copy) is unchanged
array = array + 1
array, tensor

(array([2., 3., 4., 5., 6., 7., 8.]), tensor([1., 2., 3., 4., 5., 6., 7.]))

In [192]:
# Tensor to array
# The array inherits the datatype of the tensor (float32 here)
tensor = torch.ones(7)
numpy_tensor = tensor.numpy()
tensor, numpy_tensor

(tensor([1., 1., 1., 1., 1., 1., 1.]),
 array([1., 1., 1., 1., 1., 1., 1.], dtype=float32))

In [193]:
# `tensor + 1` creates a new tensor, so numpy_tensor is unchanged
tensor = tensor + 1
tensor, numpy_tensor

(tensor([2., 2., 2., 2., 2., 2., 2.]),
 array([1., 1., 1., 1., 1., 1., 1.], dtype=float32))
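
The examples above create new objects (`array + 1`, `tensor + 1`, or a dtype conversion), which is why the other side never changes. When no copy is made, `torch.from_numpy` and `Tensor.numpy()` on a CPU tensor share memory, and in-place edits show up on both sides. The sketch below uses made-up variable names to show the difference.

```python
import numpy as np
import torch

array = np.arange(1.0, 4.0)                           # float64 array [1., 2., 3.]
shared = torch.from_numpy(array)                      # shares memory with array
copied = torch.from_numpy(array).type(torch.float32)  # dtype conversion makes a copy

array += 1  # in-place update of the NumPy array
print(shared)  # follows the array: values 2, 3, 4
print(copied)  # unchanged: values 1, 2, 3

tensor = torch.ones(3)
numpy_view = tensor.numpy()  # shares memory with the CPU tensor
tensor.add_(1)               # in-place update of the tensor
print(numpy_view)            # follows the tensor: values 2, 2, 2
```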

## Reproducibility

Trying to take the random out of random. <br>  
Neural networks start with random numbers, perform tensor operations on them, and then update those numbers so they better represent the data. This process repeats many times.



To reduce the randomness in neural networks, PyTorch lets you set a **random seed**, which makes the pseudorandom number generator produce the same sequence of numbers on every run.

In [214]:
random_tensor_A = torch.rand(3, 4)
random_tensor_B = torch.rand(3, 4)

print(random_tensor_A)
print(random_tensor_B)
print(random_tensor_A == random_tensor_B) # Compare the two tensors element-wise

tensor([[0.0093, 0.0559, 0.9600, 0.1316],
        [0.0959, 0.4369, 0.7624, 0.2867],
        [0.7448, 0.2385, 0.8769, 0.8824]])
tensor([[0.1128, 0.8313, 0.3947, 0.2769],
        [0.4555, 0.1754, 0.4359, 0.6174],
        [0.7879, 0.5627, 0.1923, 0.2928]])
tensor([[False, False, False, False],
        [False, False, False, False],
        [False, False, False, False]])


In [220]:
# Creating random reproducible tensors

# Setting random seed
RANDOM_SEED = 42
torch.manual_seed(RANDOM_SEED)
random_tensor_C = torch.rand(3, 4)
torch.manual_seed(RANDOM_SEED)
random_tensor_D = torch.rand(3, 4)

print(random_tensor_C)
print(random_tensor_D)
print(random_tensor_C == random_tensor_D)

tensor([[0.8823, 0.9150, 0.3829, 0.9593],
        [0.3904, 0.6009, 0.2566, 0.7936],
        [0.9408, 0.1332, 0.9346, 0.5936]])
tensor([[0.8823, 0.9150, 0.3829, 0.9593],
        [0.3904, 0.6009, 0.2566, 0.7936],
        [0.9408, 0.1332, 0.9346, 0.5936]])
tensor([[True, True, True, True],
        [True, True, True, True],
        [True, True, True, True]])
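
An alternative to resetting the global seed before every call is to pass an explicit `torch.Generator`. The sketch below assumes two separate generators seeded with the same value, named `gen_a` and `gen_b` for the example.

```python
import torch

gen_a = torch.Generator().manual_seed(42)
gen_b = torch.Generator().manual_seed(42)

first_from_a = torch.rand(3, 4, generator=gen_a)
first_from_b = torch.rand(3, 4, generator=gen_b)
second_from_a = torch.rand(3, 4, generator=gen_a)  # the generator has advanced

print(torch.equal(first_from_a, first_from_b))   # True: same seed, same first draw
print(torch.equal(first_from_a, second_from_a))  # False: later draws differ
```

This also shows why `torch.manual_seed` is called twice above: a seed fixes the whole sequence, not each individual call.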
