<a href="https://colab.research.google.com/github/christophergaughan/PyTorch/blob/main/ComputerVision_PyTorch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Computer Vision Using PyTorch

**Basis**

Pixels are read as RGB color values and turned into --> numbers (tensors), i.e. a `numerical encoding` --> model (algorithm) --> output probability that the image is X or Y or Z

**Details**

Image tensors carry the following information:
1. Width of the image
2. Height of the image
3. Color channels: 3 for RGB, 1 for grayscale

Depending on the library or algorithm you're working with, the tensor dimensions are ordered in one of two ways:

 [batch_size, height, width, color_channels] OR [batch_size, color_channels, height, width]

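The models we build will mainly be convolutional neural networks (CNNs).

We will be working with `torch.nn.Conv2d`, which expects the channels-first layout `[batch_size, color_channels, height, width]`.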
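
As a rough illustration of the two layouts above, the sketch below converts a channels-last batch to channels-first with `permute` and passes it through a `torch.nn.Conv2d` layer. The batch contents and the layer sizes (8 output channels, 3x3 kernel) are assumptions chosen only for the example.

```python
import torch
from torch import nn

# A made-up batch of 4 RGB images, 28x28 pixels, in channels-last layout
# [batch_size, height, width, color_channels]
images_nhwc = torch.rand(4, 28, 28, 3)

# Reorder the dimensions to channels-first:
# [batch_size, color_channels, height, width]
images_nchw = images_nhwc.permute(0, 3, 1, 2)

# Hypothetical layer sizes, for illustration only
conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, padding=1)
features = conv(images_nchw)

print(images_nhwc.shape)
print(images_nchw.shape)
print(features.shape)
```

With `padding=1` and a 3x3 kernel the height and width are preserved, so only the channel dimension changes.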

## Computer vision libraries in PyTorch

* `torchvision` - base domain library for PyTorch computer vision:
  https://pytorch.org/vision/stable/index.html
* `torchvision.datasets` - datasets and data-loading functions:
  https://pytorch.org/vision/stable/datasets.html#built-in-datasets
* `torchvision.models` - pre-trained computer vision models (architectures with pretrained weights) that you can leverage for your own problems.
* `torchvision.transforms` - functions for manipulating your vision data (images) so it is suitable for use with an ML model.
* `torch.utils.data.Dataset` - base dataset class for PyTorch.
* `torch.utils.data.DataLoader` - creates a Python iterable over a dataset.

Torchvision supports common computer vision transformations in the `torchvision.transforms` and `torchvision.transforms.v2` modules. Transforms can be used to transform or augment data for training or inference of different tasks (image classification, detection, segmentation, video classification).

* PIL is the Python Imaging Library by Fredrik Lundh and contributors.

### torchvision.datasets

All datasets are subclasses of `torch.utils.data.Dataset`, i.e. they have `__getitem__` and `__len__` methods implemented. Hence, they can all be passed to a `torch.utils.data.DataLoader`, which can load multiple samples in parallel using `torch.multiprocessing` workers. For example:
```
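import torch
import torchvision

# 'path/to/imagenet_root/' is a placeholder; ImageNet must already be present there
imagenet_data = torchvision.datasets.ImageNet('path/to/imagenet_root/')
data_loader = torch.utils.data.DataLoader(imagenet_data,
                                          batch_size=4,
                                          shuffle=True,
                                          num_workers=2)  # number of loader worker processes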
```
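
To make the `__getitem__` / `__len__` contract concrete, here is a small sketch of a custom dataset. `RandomImageDataset` is a hypothetical class that serves random tensors; it is not part of torchvision and exists only to show the interface a `DataLoader` relies on.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class RandomImageDataset(Dataset):
    """Hypothetical dataset of random grayscale images, for illustration only."""

    def __init__(self, num_samples: int = 100, num_classes: int = 10):
        generator = torch.Generator().manual_seed(0)
        self.images = torch.rand(num_samples, 1, 28, 28, generator=generator)
        self.labels = torch.randint(0, num_classes, (num_samples,), generator=generator)

    def __len__(self) -> int:
        # Number of samples in the dataset
        return len(self.images)

    def __getitem__(self, index: int):
        # One (image, label) pair
        return self.images[index], self.labels[index]


dataset = RandomImageDataset()
# num_workers=0 loads data in the main process; larger values use worker processes
loader = DataLoader(dataset, batch_size=4, shuffle=True, num_workers=0)
batch_images, batch_labels = next(iter(loader))

print(len(dataset))
print(batch_images.shape, batch_labels.shape)
```

The `DataLoader` stacks individual samples into batches, so each image batch gains a leading batch dimension.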

In [None]:
import torch
import torchvision
from torchvision import datasets
from torchvision import transforms
from torchvision.transforms import ToTensor
from torch.utils.data import Dataset
from torch.utils.data import DataLoader
import numpy as np
import matplotlib.pyplot as plt

print(torch.__version__)
print(torchvision.__version__)

## Getting a dataset

We will be using the `FashionMNIST` dataset: grayscale images of clothing. It is a basic dataset that works well for a first implementation.

Be aware that ImageNet is widely treated as the gold standard for computer vision evaluations.

`torchvision.datasets.FashionMNIST(root: str, train: bool = True, transform: Union[Callable, NoneType] = None, target_transform: Union[Callable, NoneType] = None, download: bool = False) → None`

### Fashion-MNIST Dataset.

Parameters:
* **root (string)** - Root directory of dataset where FashionMNIST/processed/training.pt and FashionMNIST/processed/test.pt exist.
* **train (bool, optional)** - If True, creates dataset from training.pt, otherwise from test.pt.
* **download (bool, optional)** - If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.
* **transform (callable, optional)** - A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
* **target_transform (callable, optional)** - A function/transform that takes in the target and transforms it.

In [None]:
# Setup Training data
train_data = datasets.FashionMNIST(
    root="data", # where to download data to
    train=True, # do we want the training dataset?
    download=True, # do we want to download?
    transform=torchvision.transforms.ToTensor(), # how to transform the data
    target_transform=None # how do we want to transform the labels/target
)

test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=torchvision.transforms.ToTensor(),
    target_transform=None
)



In [None]:
len(train_data), len(test_data)

In [None]:
# See the first training sample: an (image, label) pair, with the image as a tensor [C, H, W]. NOTE: grayscale images have only 1 color channel
image, label = train_data[0]
image, label

In [None]:
class_names = train_data.classes
class_names

In [None]:
class_to_idx = train_data.class_to_idx
class_to_idx

In [None]:
train_data.targets

In [None]:
# Check shape of our image
print(f"Image Shape: {image.shape} --> [color_channels, height, width], Image Label: {class_names[label]}")

## Visualizing our data

In [None]:
image, label = train_data[0]
print(f"Image Shape: {image.shape}")
# squeeze() drops the single color-channel dimension ([1, 28, 28] -> [28, 28]) so matplotlib can plot it
plt.imshow(image.squeeze(), cmap="gray")
plt.title(class_names[label])
plt.axis("off")

In [None]:
# Plot more images
torch.manual_seed(42)
fig = plt.figure(figsize=(9, 9))
row, cols = 4, 4
for i in range(1, row * cols + 1):
    random_idx = torch.randint(0, len(train_data), size=[1]).item()
    img, label = train_data[random_idx]
    fig.add_subplot(row, cols, i)
    plt.imshow(img.squeeze(), cmap="gray")
    plt.title(class_names[label])
    plt.axis(False)

## Check Input/Output shapes of Data

In [None]:
print(f"Image Shape: {image.shape}")
print(f"Image Label: {class_names[label]}")

Visualizing data

In [None]:
image, label = train_data[0]
print(f"Image Shape: {image.shape}")
# matplotlib expects [height, width] or [height, width, color_channels] (channels last),
# but our image is [1, 28, 28] (channels first); squeeze() removes the leading 1
plt.imshow(image.squeeze(), cmap="plasma")
plt.title(class_names[label])
plt.axis("off")

In [None]:
from matplotlib import colormaps
list(colormaps)

In [None]:
# Plot more images
torch.manual_seed(42)
fig = plt.figure(figsize=(9, 9))
row, cols = 4, 4
for i in range(1, row * cols + 1):
    random_idx = torch.randint(0, len(train_data), size=[1]).item()
    img, label = train_data[random_idx]
    fig.add_subplot(row, cols, i)
    plt.imshow(img.squeeze(), cmap="gray")
    plt.title(class_names[label])
    plt.axis(False);

Could these items of clothing (images) be modelled with straight (linear) lines only? Or will we have to introduce some non-linearity? Just a thought.
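
One way to make this question concrete: linear layers stacked with no activation between them collapse into a single linear map, so depth alone adds no non-linearity. The sketch below checks this numerically. The layer sizes (784 inputs, 10 hidden units, 10 outputs) are assumptions picked to match 28x28 images and 10 classes.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Two linear layers stacked with no activation in between
linear_1 = nn.Linear(784, 10)
linear_2 = nn.Linear(10, 10)
stacked = nn.Sequential(linear_1, linear_2)

with torch.no_grad():
    # Fold both layers into one: W = W2 @ W1, b = W2 @ b1 + b2
    combined = nn.Linear(784, 10)
    combined.weight.copy_(linear_2.weight @ linear_1.weight)
    combined.bias.copy_(linear_2.weight @ linear_1.bias + linear_2.bias)

    x = torch.rand(5, 784)  # made-up batch of 5 flattened images
    same_output = torch.allclose(stacked(x), combined(x), atol=1e-5)

print(same_output)
```

Placing a non-linear function such as `nn.ReLU()` between the two layers breaks this equivalence, which is what lets a network model non-linear patterns.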

In [None]:
train_data, test_data

## Prepare DataLoader

Right now, our data is in the form of PyTorch Datasets.

A DataLoader turns our dataset into a Python iterable.

More specifically, we want to turn our data into batches (or mini-batches).

Q) Why do we do this?

A) The data takes up memory, and we have 60,000 training images and 10,000 testing images. To alleviate this memory load, we break the data up into batches. More specifically:

1. It is more computationally efficient: your computing hardware may not be able to look at (store in memory) 60,000 images at once. Thus we break these images up into batches of 32 (`batch_size=32`), which is a very common batch size.
2. It gives our neural network more chances to update its parameters per epoch: one gradient update per batch rather than one per pass over the whole dataset. See the video by Andrew Ng: https://www.youtube.com/watch?v=4qJaSmvhxi8 for more info about this.
3. One parameter of the DataLoader is `shuffle`. We shuffle the training data in case it has some pre-determined order. Shuffling randomizes the order in which the training loop sees the images, so that order is not grafted onto our model, which would produce a poor model. We don't want our model to 'memorize' the order of the data.
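
The batch arithmetic behind these points can be checked directly. This sketch assumes the dataset sizes quoted above and the `DataLoader` default of keeping the final, smaller batch (`drop_last=False`).

```python
import math

num_train, num_test, batch_size = 60_000, 10_000, 32

# Number of batches (and so parameter updates) per epoch
train_batches = math.ceil(num_train / batch_size)
test_batches = math.ceil(num_test / batch_size)

# The last test batch is smaller because 10,000 is not a multiple of 32
last_test_batch = num_test - (test_batches - 1) * batch_size

print(train_batches)    # 1875
print(test_batches)     # 313
print(last_test_batch)  # 16
```

So one epoch over the training data gives 1,875 parameter updates instead of a single update on all 60,000 images.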


In [None]:
# Batchify our dataset
from torch.utils.data import DataLoader
BATCH_SIZE = 32
# Turn our datasets into iterables (batches)
train_dataloader = DataLoader(dataset=train_data,
                              batch_size=BATCH_SIZE,
                              shuffle=True)

test_dataloader = DataLoader(dataset=test_data,
                             batch_size=BATCH_SIZE,
                             shuffle=False) # we don't shuffle the test dataset

train_dataloader, test_dataloader

In [None]:
# Let's check out what we've created
print(f"Length of train dataloader: {len(train_dataloader)} batches of {BATCH_SIZE}")
print(f"Length of test dataloader: {len(test_dataloader)} batches of {BATCH_SIZE}")

## Check out what is inside the training dataloader

In [None]:
train_features_batch, train_labels_batch = next(iter(train_dataloader))
train_features_batch.shape, train_labels_batch.shape

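Note that in the shapes above the batch dimension comes first, followed by the color channels, then height and width: `[batch_size, color_channels, height, width]`.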

In [None]:
torch.manual_seed(42)
random_idx = torch.randint(0, len(train_features_batch), size=[1]).item()
image, label = train_features_batch[random_idx], train_labels_batch[random_idx]
plt.imshow(image.squeeze(), cmap="gray")  # plot the sample picked from this batch
plt.title(class_names[label])
plt.axis(False)
print(f"Image Shape: {image.shape}")
print(f"Label: {label}, label_size: {label.shape}")

## Model 0: Build a baseline model

When starting to build a series of machine learning modelling experiments, it's best practice to start with a *baseline model*.

A baseline model is a model you will try to improve upon with subsequent models/experiments.

AKA: start simply and add/experiment with complexity when necessary 🧪

In [None]:
# Create a flatten layer
flatten_model = torch.nn.Flatten()

# Get a single sample
x = train_features_batch[0]

# Flatten the sample
output = flatten_model(x)

# Print out what happened
print(f"Shape before flattening: {x.shape} -> [color_channels, height, width]")
print(f"Shape after flattening: {output.shape} -> [color_channels, height*width]")

The flattened sample has shape `[1, 784]`: the leading 1 is the color-channel dimension, and 784 is the product of height and width, 28 x 28.
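
The same flattening applied to a whole batch can be sketched as follows, assuming a made-up batch of 32 random images. By default `nn.Flatten` keeps the first dimension and merges the rest (`start_dim=1`, `end_dim=-1`).

```python
import torch
from torch import nn

flatten = nn.Flatten()  # defaults: start_dim=1, end_dim=-1

# Made-up batch: [batch_size, color_channels, height, width]
batch = torch.rand(32, 1, 28, 28)
flat_batch = flatten(batch)

# Equivalent manual reshape: keep the batch dimension, merge the rest
manual = batch.reshape(batch.shape[0], -1)

print(flat_batch.shape)
print(torch.equal(flat_batch, manual))
```

Each image becomes one row of 1 x 28 x 28 = 784 values, which is the input size a following `nn.Linear` layer needs.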

In [None]:
import torch
from torch import nn

torch.manual_seed(42)

class FashionMNISTModelV0(nn.Module):
    def __init__(self, input_shape: int, hidden_units: int, output_shape: int):
        super().__init__()
        self.layer_stack = nn.Sequential(
            nn.Flatten(),  # [batch, 1, 28, 28] -> [batch, 784]
            nn.Linear(in_features=input_shape, out_features=hidden_units),
            nn.Linear(in_features=hidden_units, out_features=output_shape)
        )

    def forward(self, x):
        return self.layer_stack(x)


In [None]:
model_0 = FashionMNISTModelV0(
    input_shape=784,  # 28 * 28 pixels per flattened image
    hidden_units=10,  # number of units in the hidden layer
    output_shape=len(class_names)  # one output per class
).to("cpu")  # Move model to CPU
print(model_0)

In [None]:
dummy_x = torch.rand([1, 1, 28, 28])
model_0(dummy_x)

In [None]:
model_0.state_dict()
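
The entries of the state dict are the weights and biases of the two linear layers. As a quick check on their sizes, this sketch rebuilds a stand-in layer stack with the same sizes (784 inputs, 10 hidden units, 10 outputs for the 10 FashionMNIST classes) and counts the parameters. It uses a plain `nn.Sequential` rather than the model class above.

```python
from torch import nn

# Stand-in with the same layer sizes: 784 inputs, 10 hidden units, 10 classes
layers = nn.Sequential(nn.Flatten(), nn.Linear(784, 10), nn.Linear(10, 10))

# Each linear layer contributes a weight matrix and a bias vector
for name, tensor in layers.state_dict().items():
    print(name, tuple(tensor.shape))

# 784*10 + 10 (first layer) + 10*10 + 10 (second layer)
total_params = sum(p.numel() for p in layers.parameters())
print(total_params)  # 7960
```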

## Setup loss, optimizer and evaluation metrics

* Loss function - since we're working with multi-class data, our loss function will be `nn.CrossEntropyLoss()`
* Optimizer - our optimizer will be `torch.optim.SGD()` (stochastic gradient descent)
* Evaluation metric - since this is a classification problem, we'll use accuracy
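
A rough sketch of how these three pieces fit together in a single optimization step, on random stand-in data. The model, the data and the `accuracy_percent` helper are hypothetical and defined here only for illustration; they are not the notebook's own objects.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-ins: a tiny linear classifier and one random batch
model = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))
images = torch.rand(32, 1, 28, 28)
labels = torch.randint(0, 10, (32,))

loss_fn = nn.CrossEntropyLoss()  # expects raw logits and integer class labels
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)


def accuracy_percent(y_true: torch.Tensor, y_pred: torch.Tensor) -> float:
    """Hypothetical helper: percentage of predicted labels that match the targets."""
    return (y_true == y_pred).sum().item() / len(y_true) * 100


# One optimization step: forward pass, loss, backward pass, parameter update
logits = model(images)
loss = loss_fn(logits, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Accuracy compares class labels, so turn logits into predicted labels with argmax
acc = accuracy_percent(labels, logits.argmax(dim=1))
print(f"loss: {loss.item():.4f}, accuracy: {acc:.2f}%")
```

Note that `nn.CrossEntropyLoss` takes the raw model outputs (logits) directly, while the accuracy metric needs predicted class indices.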


In [None]:
import requests
from pathlib import Path

# Download helper functions (including accuracy_fn) from the Learn PyTorch repo
if Path("helper_functions.py").is_file():
  print("helper_functions.py already exists, skipping download....")
else:
  print("Downloading helper_functions.py")
  request = requests.get("https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/helper_functions.py")
  with open("helper_functions.py", "wb") as f:
    f.write(request.content)

In [None]:
# Import accuracy metric
from helper_functions import accuracy_fn

In [None]:
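# accuracy_fn compares label tensors, so convert the prediction probabilities to a predicted label first
accuracy_fn(y_true=torch.tensor([2]), y_pred=torch.tensor([[0.2, 0.5, 0.3]]).argmax(dim=1))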

In [None]:
# Setup loss and optimizer functions
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(params=model_0.parameters(), lr=0.1)