<a href="https://colab.research.google.com/github/ArthurCBx/PyTorch-DeepLearning-Udemy/blob/main/06_transfer_learning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 06. PyTorch Transfer Learning

What is transfer learning?

Transfer learning involves taking the parameters that one model has learned on another dataset and applying them to our own problem.

* Pretrained models = foundation models

Where to find pre-trained models:
* PyTorch documentation
* Torch Image Models (`timm` library)
* HuggingFace Hub
* Paperswithcode SOTA (state of the art)

In [None]:
import torch
import torchvision
print(torch.__version__)
print(torchvision.__version__)

Now that we've got the versions of `torch` and `torchvision` we're after, let's import the code we've written in previous sections so that we don't have to write it all again.

In [None]:
# Continue with regular imports
import matplotlib.pyplot as plt
import torch
import torchvision

from torch import nn
from torchvision import transforms

# Try to get torchinfo, install it if it doesn't work
try:
    from torchinfo import summary
except ImportError:
    print("[INFO] Couldn't find torchinfo... installing it.")
    !pip install -q torchinfo
    from torchinfo import summary

# Try to import the going_modular directory, download it from GitHub if it doesn't work
try:
    from going_modular.going_modular import data_setup, engine
except ImportError:
    # Get the going_modular scripts
    print("[INFO] Couldn't find going_modular scripts... downloading them from GitHub.")
    !git clone https://github.com/mrdbourke/pytorch-deep-learning
    !mv pytorch-deep-learning/going_modular .
    !rm -rf pytorch-deep-learning
    from going_modular.going_modular import data_setup, engine

In [None]:
# Setup device agnostic code
device = "cuda" if torch.cuda.is_available() else "cpu"

## 1. Get data

We need our pizza, steak, sushi data to build a transfer learning model on.

In [None]:
!wget https://github.com/mrdbourke/pytorch-deep-learning/raw/refs/heads/main/data/pizza_steak_sushi.zip

In [None]:
import os
import zipfile

from pathlib import Path
import requests

data_path = Path("data/")
image_path = data_path / "pizza_steak_sushi"

if image_path.is_dir():
  print(f"{image_path} directory exists, skipping re-download.")
else:
  print(f"Did not find {image_path} directory, downloading it...")
  image_path.mkdir(parents=True, exist_ok=True)

  with open(data_path / "pizza_steak_sushi.zip", "wb") as f:
    request = requests.get("https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi.zip")
    print("Downloading, pizza, steak, sushi data...")
    f.write(request.content)

  with zipfile.ZipFile(data_path / "pizza_steak_sushi.zip", "r") as zip_ref:
    print("Unzipping pizza, steak, sushi data...")
    zip_ref.extractall(image_path)

  os.remove(data_path / "pizza_steak_sushi.zip")


In [None]:
train_dir = image_path / "train"
test_dir = image_path / "test"

train_dir, test_dir

## 2. Create Datasets and DataLoaders

Now that we've got some data, we want to turn it into PyTorch DataLoaders.

To do so, we can use `data_setup.py` and the `create_dataloaders()` function we made in 05.

There's one thing we have to think about when loading the data: how to **transform** it?

And with `torchvision` 0.13+ there are two ways to do this:

1. Manually created transforms - you define what transforms you want your data to go through.
2. Automatically created transforms - the transforms for your data are defined by the model you'd like to use.

Important point: when using a pretrained model, the data you pass through it (including your custom data) must be **transformed** in the same way as the data the model was trained on.

### 2.1 Creating a transform for `torchvision.models` (manual creation)

`torchvision.models` contains pretrained models (models ready for transfer learning) right within `torchvision`.

> All pre-trained models expect input images normalized in the same way i.e. mini-batches of 3-channel RGB images of shape (3 x H x W), where H and W are expected to be at least 224. The images have to be loaded in to a range of [0,1] and then normalized with the models parameters.
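
To make the normalization step concrete, here is a minimal sketch of the per-channel arithmetic, `(value - mean) / std`, applied to a single RGB pixel that has already been scaled to [0, 1]. The pixel values and the helper name `normalize_pixel` are made up for illustration; the mean and standard deviation are the ImageNet statistics used in this notebook's manual transform.

```python
# ImageNet channel statistics (RGB), as used in this notebook's manual transform
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

def normalize_pixel(pixel, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Apply (value - mean) / std to each channel of one RGB pixel in [0, 1]."""
    return [(value - m) / s for value, m, s in zip(pixel, mean, std)]

# Hypothetical pixel: mid-grey after scaling to [0, 1]
pixel = [0.5, 0.5, 0.5]
normalized = normalize_pixel(pixel)
print([round(v, 3) for v in normalized])
```

A pixel exactly equal to the channel means would normalize to zero in every channel, which is the point of the operation: the inputs end up centred and scaled the way the pretrained model expects.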

In [None]:
from torchvision import transforms
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

manual_transforms = transforms.Compose([
    transforms.Resize((224,224)),
    transforms.ToTensor(),
    normalize
])

In [None]:
from going_modular.going_modular import data_setup

train_dataloader, test_dataloader, class_names = data_setup.create_dataloaders(
    train_dir=train_dir,
    test_dir=test_dir,
    transform=manual_transforms,
    batch_size=32
    )

train_dataloader,test_dataloader, class_names

### 2.2 Creating a transform for `torchvision.models` (auto creation)

As of `torchvision` v0.13+ there is now support for automatic data transform creation based on the pretrained model weights you're using.

In [None]:
# Get a set of pretrained model weights

weights = torchvision.models.EfficientNet_V2_S_Weights.DEFAULT # "DEFAULT" = best available weights
weights

In [None]:
# Get the transforms used to create our pretrained weights

auto_transforms = weights.transforms()
auto_transforms

In [None]:
# Create DataLoaders using automatic transforms

train_dataloader, test_dataloader, class_names = data_setup.create_dataloaders(train_dir=train_dir,
                                                                               test_dir=test_dir,
                                                                               transform=auto_transforms,
                                                                               batch_size=32)

train_dataloader,test_dataloader, class_names

## 3. Getting a pretrained model

There are various places to get a pretrained model, such as:
1. PyTorch domain libraries
2. Libraries like `timm` (torch image models)
3. HuggingFace Hub (for plenty of different models)
4. Paperswithcode (for models across different problem spaces/domains)

### 3.1 Which pretrained model should you use?

*Experiment, experiment, experiment!*

The whole idea of transfer learning: take an already well-performing model from a problem space similar to your own and then customize it to your own problem.

Three things to consider:
1. Speed - how fast does it run?
2. Size - how big is the model?
3. Performance - how well does it perform on your chosen problem (e.g. how well does it classify food images?)

Where does the model live?

Is it on device (like a self-driving car) or does it live on a server?
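
Size is easy to check before committing to a model. A minimal sketch, assuming a tiny stand-in network (`tiny_model` is hypothetical, not one of the pretrained models in this notebook) and the default 32-bit float parameters, counts the parameters and estimates their memory footprint:

```python
import torch
from torch import nn

# Hypothetical stand-in network; swap in any nn.Module you want to inspect
tiny_model = nn.Sequential(
    nn.Linear(in_features=10, out_features=5),  # 10*5 weights + 5 biases = 55
    nn.ReLU(),
    nn.Linear(in_features=5, out_features=2),   # 5*2 weights + 2 biases = 12
)

# Total number of parameters (weights and biases)
total_params = sum(p.numel() for p in tiny_model.parameters())

# Approximate parameter memory in bytes (element_size() is 4 for float32)
param_bytes = sum(p.numel() * p.element_size() for p in tiny_model.parameters())

print(f"Parameters: {total_params}")
print(f"Approx. size: {param_bytes / 1024**2:.6f} MB")
```

The same two lines work on a full pretrained model, which gives a rough first answer to whether it could fit on the device you are targeting.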

### 3.2 Setting up a pretrained model

We want to create an instance of a pretrained EfficientNet_V2_S.

In [None]:
# Old method of creating a pretrained model (prior to torchvision v0.13)
#model = torchvision.models.efficientnet_v2_s(pretrained=True)

# New method of creating a pretrained model
weights = torchvision.models.EfficientNet_V2_S_Weights.DEFAULT
model = torchvision.models.efficientnet_v2_s(weights=weights)
model

In [None]:
model.features

In [None]:
model.classifier

### 3.3 Getting a summary of our model with `torchinfo.summary()`


In [None]:
from torchinfo import summary
summary(model=model,
        input_size=(1,3,224,224),
        col_names=["input_size","output_size","num_params","trainable"],
        col_width=20,
        row_settings=["var_names"])

### 3.4 Freezing the base model and changing the output layer to suit our needs

With a feature extractor model, typically you will "freeze" the base layers of a pretrained/foundation model and update the output layers to suit your own problem.
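
As a small illustration of what freezing does, the sketch below uses a hypothetical two-part model (`ToyModel`, with `features` and `classifier` attributes named to mirror the layout of the `torchvision` model in this notebook) and counts trainable parameters before and after setting `requires_grad = False` on the base layers.

```python
import torch
from torch import nn

class ToyModel(nn.Module):
    """Hypothetical stand-in with a base (`features`) and a head (`classifier`)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(8, 4), nn.ReLU())
        self.classifier = nn.Linear(4, 3)

    def forward(self, x):
        return self.classifier(self.features(x))

def count_trainable(model: nn.Module) -> int:
    """Count parameters that will receive gradient updates."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

toy_model = ToyModel()
trainable_before = count_trainable(toy_model)  # features + classifier

# Freeze the base layers only
for param in toy_model.features.parameters():
    param.requires_grad = False

trainable_after = count_trainable(toy_model)  # classifier only
print(f"Trainable before: {trainable_before} | after: {trainable_after}")
```

Only the head's parameters remain trainable, so the optimizer leaves the frozen base untouched.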

In [None]:
# Freeze all of the base layers
for param in model.features.parameters():
  param.requires_grad = False

In [None]:
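# Update the classifier head of our model to suit our problem
torch.manual_seed(42)
torch.cuda.manual_seed(42)

model.classifier = nn.Sequential(
    nn.Dropout(p=0.2,inplace=True),
    nn.Linear(in_features=1280,
              out_features=len(class_names))
)

# Move the model to the target device after replacing the head,
# so the new classifier layers end up on the same device as the base layers
model = model.to(device)
model.classifier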

## 4. Train model

In [None]:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)


In [None]:
# Import train function
from going_modular.going_modular import engine

# Set the manual seeds
torch.manual_seed(42)
torch.cuda.manual_seed(42)

# Start the timer
from timeit import default_timer as timer
start_time = timer()

# Setup training and save the results
results = engine.train(model=model,
                       train_dataloader=train_dataloader,
                       test_dataloader=test_dataloader,
                       optimizer=optimizer,
                       loss_fn=loss_fn,
                       epochs=10,
                       device=device)

end_time = timer()

print(f"Total training time: {end_time-start_time:.3f} seconds")

## 5. Evaluate model by plotting loss curves

In [None]:
try:
  from helper_functions import plot_loss_curves
except:
  print(f"[INFO] Couldn't find helper_functions.py, downloading it...")
  with open("helper_functions.py","wb") as f:
    request = requests.get("https://github.com/mrdbourke/pytorch-deep-learning/raw/main/helper_functions.py")
    print("Downloading helper_functions.py...")
    f.write(request.content)
  from helper_functions import plot_loss_curves

# Plot the loss curves of our model
plot_loss_curves(results)

## 6. Make predictions on images from the test set

Let's adhere to the data explorer's motto of *visualize, visualize, visualize*!

And make some qualitative predictions on our test set.

Some things to keep in mind when making predictions/inference on test data/custom data.

We have to make sure that our test/custom data is:
* Same shape - images need to be the same shape as the ones the model was trained on
* Same datatype - custom data should be in the same data type
* Same device - custom/test data should be on the same device as the model
* Same transform - if you've transformed your training data, ideally you will transform the test data and custom data in the same way

To do all of this automatically, let's create a function called `pred_and_plot_image()`:
1. Take in a trained model, a list of class names, a filepath to a target image, an image size, a transform and a target device
2. Open the image with `PIL.Image.open()`
3. Create a transform if one doesn't exist
4. Make sure the model is on the target device
5. Turn the model to `model.eval()` mode to make sure it's ready for inference (this will turn off things like `nn.Dropout()`)
6. Transform the target image and make sure its dimensionality is suited for the model (this mainly relates to batch size)
7. Make a prediction on the image by passing to the model
8. Plot the image with `matplotlib` and set the title to the prediction label and prediction probability
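
Steps 6 to 8 hinge on a few tensor operations. A minimal sketch, assuming a made-up image tensor and made-up logits (no real model is involved, and `example_class_names` is only a placeholder list), shows how the batch dimension is added and how raw outputs become a label and a probability:

```python
import torch

# Hypothetical single image tensor: (colour_channels, height, width)
image = torch.rand(3, 224, 224)

# Models expect a batch, so add a leading dimension: (1, 3, 224, 224)
batch = image.unsqueeze(dim=0)
print(batch.shape)

# Made-up raw outputs (logits) for one image across three classes
logits = torch.tensor([[2.0, 0.5, -1.0]])
example_class_names = ["pizza", "steak", "sushi"]

# Logits -> probabilities (each row sums to 1) -> index of the most likely class
pred_probs = torch.softmax(logits, dim=1)
pred_label = torch.argmax(pred_probs, dim=1).item()

print(f"Pred: {example_class_names[pred_label]} | Prob: {pred_probs.max().item():.3f}")
```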

In [None]:
from typing import List, Tuple

from PIL import Image
from torchvision import transforms
# 1. Take in a trained model

def pred_and_plot_image(model: torch.nn.Module,
                        image_path: str,
                        class_names: List[str],
                        image_size: Tuple[int,int]=(224,224),
                        transform: torchvision.transforms = None,
                        device: torch.device = device):

  # 2. Open the image with PIL
  img = Image.open(image_path)

  # 3. Create a transform if one doesn't exist
  if transform is not None:
    image_transform = transform
  else:
    image_transform = transforms.Compose([
        transforms.Resize(image_size),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225])
    ])

  ### Predict on image ###
  # 4. Make sure the model is on target device
  model.to(device)

  # 5. Turn on inference and eval mode
  model.eval()
  with torch.inference_mode():
    # 6. Transform the image and add an extra batch dimension
    transformed_image = image_transform(img).unsqueeze(dim=0)

    # 7. Make a prediction on the transformed image by passing it to the model
    target_image_pred = model(transformed_image.to(device))

  target_image_pred_probs = torch.softmax(target_image_pred, dim=1)

  target_image_pred_label = torch.argmax(target_image_pred_probs, dim=1)

  # 8. Plot image with predicted label and probability
  plt.figure()
  plt.imshow(img)
  plt.axis("off")
  plt.title(f"Pred: {class_names[target_image_pred_label]} | Prob: {target_image_pred_probs.max():.3f}")

In [None]:
# Get a random list of image paths from the test set
import random

num_images_to_plot = 3
test_image_path_list = list(Path(test_dir).glob("*/*.jpg"))
test_image_path_sample = random.sample(population=test_image_path_list,
                                       k=num_images_to_plot)
for image_path in test_image_path_sample:
  pred_and_plot_image(model=model,
                      image_path=image_path,
                      class_names=class_names,
                      transform=auto_transforms,
                      device=device)

### 6.1 Making predictions on a custom image

In [None]:
import requests

custom_image_path = data_path / "04-pizza-dad.jpeg"

if not custom_image_path.is_file():
  with open(custom_image_path, "wb") as f:
    request = requests.get("https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/04-pizza-dad.jpeg")
    print("Downloading custom image...")
    f.write(request.content)
else:
  print(f"{custom_image_path} already exists")

In [None]:
pred_and_plot_image(model=model,
                      image_path=custom_image_path,
                      class_names=class_names,
                      transform=auto_transforms,
                      device=device)

## Exercises


### 1. Make predictions on the entire test dataset and plot a confusion matrix for the results of our model compared to the truth labels.
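
Before computing it with a library, it can help to see what a confusion matrix counts. The sketch below is a plain-Python illustration with made-up labels; the helper name `confusion_matrix` and the row/column convention (rows are true classes, columns are predicted classes) are choices made for this example.

```python
def confusion_matrix(targets, preds, num_classes):
    """Count (true class, predicted class) pairs; rows are true, columns are predicted."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for target, pred in zip(targets, preds):
        matrix[target][pred] += 1
    return matrix

# Made-up labels for six samples across three classes (0, 1, 2)
example_targets = [0, 0, 1, 1, 2, 2]
example_preds = [0, 1, 1, 1, 2, 0]

example_matrix = confusion_matrix(example_targets, example_preds, num_classes=3)
for row in example_matrix:
    print(row)
```

Correct predictions land on the diagonal, so off-diagonal entries show which classes get confused with which.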

In [None]:
!pip install -q torchmetrics

In [None]:
from torchmetrics import ConfusionMatrix
from mlxtend.plotting import plot_confusion_matrix

confmat = ConfusionMatrix(task="multiclass",num_classes=len(class_names))

predictions_label = []
targets = []

model.eval()
with torch.inference_mode():
  for X,y in test_dataloader:
    X = X.to(device)
    y_pred = model(X)
    y_pred_label = torch.argmax(torch.softmax(y_pred,dim=1),dim=1)
    predictions_label.append(y_pred_label.cpu())
    # Collect the targets in the same pass so they line up with the predictions
    targets.append(y)

y_pred_tensor = torch.cat(predictions_label)
target_tensor = torch.cat(targets)

confmat_tensor = confmat(y_pred_tensor,target=target_tensor)
fig, ax = plot_confusion_matrix(conf_mat=confmat_tensor.numpy(),
                                  figsize=(10,10),
                                  colorbar=True,
                                  class_names=class_names)


### 2. Get the "most wrong" of the predictions on the test dataset and plot the 5 "most wrong" images. You can do this by:
* Predicting across all of the test dataset, storing the labels and predicted probabilities.
* Sort the predictions by wrong prediction and then descending predicted probabilities, this will give you the wrong predictions with the highest prediction probabilities, in other words, the "most wrong".
* Plot the top 5 "most wrong" images, why do you think the model got these wrong?
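
The sorting step can be sketched in plain Python. The records below are made up for illustration (`example_records` and the file names are hypothetical); the idea is to sort on a two-part key so that wrong predictions come first and, within them, the most confident come first.

```python
# Made-up prediction records: whether each was correct and how confident the model was
example_records = [
    {"image": "a.jpg", "correct": True, "pred_prob": 0.99},
    {"image": "b.jpg", "correct": False, "pred_prob": 0.55},
    {"image": "c.jpg", "correct": False, "pred_prob": 0.91},
    {"image": "d.jpg", "correct": True, "pred_prob": 0.60},
]

# False sorts before True, so wrong predictions come first;
# negating pred_prob puts the most confident ones at the top
most_wrong = sorted(example_records, key=lambda r: (r["correct"], -r["pred_prob"]))

for record in most_wrong[:2]:
    print(record["image"], record["pred_prob"])
```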

In [None]:
import os
from pathlib import Path

# Get all test data paths
test_data_paths = list(Path(test_dir).glob("*/*.jpg"))
test_labels = [path.parent.stem for path in test_data_paths]

# Create a function to return a list of dictionaries with sample,label, pred_prob

def pred_and_store(test_paths, model, transform,class_names):
  test_pred_list = []
  for path in test_paths:
    pred_dict = {}

    pred_dict["image_path"] = path
    class_name = path.parent.stem
    pred_dict["class_name"] = class_name

    from PIL import Image
    img = Image.open(path)
    transformed_image = transform(img).unsqueeze(0) # Add batch dimension

    model.eval()
    with torch.inference_mode():
      pred_prob = torch.softmax(model(transformed_image.to(device)),dim=1)
      pred_label = torch.argmax(pred_prob,dim=1)
      pred_class = class_names[pred_label.item()]

      pred_dict["pred_prob"] = pred_prob.max().item()
      pred_dict["pred_class"] = pred_class

    pred_dict["correct"] = pred_class == class_name

    test_pred_list.append(pred_dict)

  return test_pred_list

pred_list = pred_and_store(test_paths=test_data_paths,model=model,transform=auto_transforms,class_names=class_names)

In [None]:
pred_list

In [None]:
import pandas as pd
test_pred_df = pd.DataFrame(pred_list)
top_5_most_wrong = test_pred_df.sort_values(by=["correct","pred_prob"],ascending=[True,False]).head()
top_5_most_wrong


In [None]:
import torchvision
for _, row in top_5_most_wrong.iterrows():
  # Access columns by name rather than by position
  image_path = row["image_path"]
  true_label = row["class_name"]
  pred_prob = row["pred_prob"]
  pred_class = row["pred_class"]
  img = torchvision.io.read_image(str(image_path))
  plt.figure()
  plt.imshow(img.permute(1,2,0))
  plt.title(f"True: {true_label} | Pred: {pred_class} | Prob: {pred_prob:.3f}")
  plt.axis("off")


### 4. Train the model from section 4 above for longer (10 epochs should do), what happens to the performance?

With more epochs, the model's accuracy increased, and more so on the test data than on the training data, so there's no sign of overfitting yet.

### 5. Train the model from section 4 above with more data, say 20% of the images from Food101 of Pizza, Steak and Sushi images.

In [None]:
from pathlib import Path
import requests
import zipfile
import os

bigger_data_path = Path("bigger_data")
bigger_data_dir = bigger_data_path / "pizza_steak_sushi_20_percent"
zip_path = bigger_data_path / "pizza_steak_sushi_20_percent.zip"

if bigger_data_dir.is_dir():
  print(f"{bigger_data_dir} directory already exists, skipping download.")
else:
  print(f"Creating {bigger_data_dir} directory")
  bigger_data_dir.mkdir(parents=True,exist_ok=True)

  # Download the zip file only when the data isn't already on disk
  with open(zip_path,"wb") as f:
    request = requests.get("https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi_20_percent.zip")
    f.write(request.content)

  # Extract into the directory that the check above looks for
  with zipfile.ZipFile(zip_path,"r") as zip_ref:
    print("Unzipping pizza_steak_sushi_20_percent.zip...")
    zip_ref.extractall(bigger_data_dir)

  # Remove the zip file once it has been closed
  os.remove(zip_path)

train_dir = bigger_data_dir / "train"
test_dir = bigger_data_dir / "test"

In [None]:
weights = torchvision.models.EfficientNet_V2_M_Weights.DEFAULT
model = torchvision.models.efficientnet_v2_m(weights=weights)
transformation = weights.transforms()

In [None]:
from going_modular.going_modular import data_setup

train_dataloader, test_dataloader, class_names = data_setup.create_dataloaders(train_dir=train_dir,
                                                                               test_dir=test_dir,
                                                                               transform=transformation,
                                                                               batch_size=32,
                                                                               num_workers=os.cpu_count(),
                                                                               )

In [None]:
for param in model.features.parameters():
  param.requires_grad = False

model.classifier = nn.Sequential(
    nn.Dropout(p=0.2,inplace=True),
    nn.Linear(in_features=1280,out_features=3)
)

In [None]:
optimizer = torch.optim.Adam(model.parameters(),lr=0.001)
loss_fn = nn.CrossEntropyLoss()

In [None]:
from going_modular.going_modular import engine
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
results = engine.train(model=model,
                       train_dataloader=train_dataloader,
                       test_dataloader=test_dataloader,
                       optimizer=optimizer,
                       loss_fn=loss_fn,
                       epochs=10,
                       device=device)



In [None]:
confmat = ConfusionMatrix(task="multiclass",num_classes=len(class_names))

predictions_label = []
targets = []

model.eval()
with torch.inference_mode():
  for X,y in test_dataloader:
    X = X.to(device)
    y_pred = model(X)
    y_pred_label = torch.argmax(torch.softmax(y_pred,dim=1),dim=1)
    predictions_label.append(y_pred_label.cpu())
    # Collect the targets in the same pass so they line up with the predictions
    targets.append(y)

y_pred_tensor = torch.cat(predictions_label)
target_tensor = torch.cat(targets)

confmat_tensor = confmat(y_pred_tensor,target=target_tensor)
fig, ax = plot_confusion_matrix(conf_mat=confmat_tensor.numpy(),
                                  figsize=(10,10),
                                  colorbar=True,
                                  class_names=class_names)

In [None]:
pathlist = list(Path(test_dir).glob("*/*.jpg"))

pred_and_plot_image(model=model,
                    image_path=random.choice(pathlist),
                    transform=transformation,
                    device=device,
                    class_names=class_names)