<img src="https://futurejobs.my/wp-content/uploads/2021/05/d-min-1024x297.png" width="300"> </img>

> **Copyright &copy; 2021 Skymind Education Group Sdn. Bhd.**<br>
 <br>
This program and the accompanying materials are made available under the
terms of the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0). \
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations
under the License. <br>
<br>**SPDX-License-Identifier: Apache-2.0** 

# Convolutional Neural Network: Fruits Classification Using Transfer Learning

## Introduction

In this hands-on, we will build a fruits classifier using a **pretrained CNN model**. The model will classify images of fruits into three classes, namely **apple, grapes and lemon**.

## Objectives
In this hands-on, we will:-

1. Download the fruits image dataset from a provided link.
2. Make the dataset iterable by wrapping it in a DataLoader object.
3. Instantiate a pretrained model (VGG) class.
4. Instantiate the Loss class.
5. Instantiate the Optimizer class.
6. Train the CNN model.
7. Visualize metrics of the CNN model.
8. Save and load the CNN model.
9. Classify the test images.

## Transfer Learning 

The key motivation for transfer learning is that most models that solve complex problems need a lot of data to train on in order to perform well, especially in deep learning. However, getting a large dataset for a specific domain is hard.

**Transfer learning** enables us to reuse knowledge from previously learned tasks and apply it to new, related ones. Instead of training a new neural network from scratch, we “transfer” the learned features and use a model trained on a similar problem as the starting point.

A **pre-trained model** is a saved network that was previously trained on a large dataset, typically on a large-scale image-classification task. The intuition is that if a model is trained on a large and general enough dataset, it will effectively serve as a generic model of the visual world.

In practice, very few people train a CNN from scratch (with random initialization), because it is relatively rare to have a dataset of sufficient size. Instead, it is common to take a CNN pretrained on a very large dataset (e.g. ImageNet, which contains 1.2 million images in 1000 categories) and use it either as an initialization or as a fixed feature extractor for the task of interest. Read more about transfer learning [here](https://cs231n.github.io/transfer-learning/).
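
The two ways of reusing a pretrained CNN differ mainly in which parameters stay trainable. The sketch below illustrates this with a tiny, randomly initialized `nn.Sequential` standing in for a real pretrained backbone; the helper names `make_backbone`, `build_model` and `count_trainable` are hypothetical and chosen only for this example. In this notebook the backbone will instead come from `torchvision.models`.

```python
import torch
from torch import nn


def make_backbone() -> nn.Sequential:
    # Hypothetical stand-in for a pretrained convolutional backbone
    # (a real one would be loaded from torchvision.models with pretrained weights)
    return nn.Sequential(
        nn.Conv2d(3, 8, kernel_size=3, padding=1),  # 3*8*3*3 + 8 = 224 parameters
        nn.ReLU(),
        nn.AdaptiveAvgPool2d(1),  # -> (N, 8, 1, 1)
        nn.Flatten(),             # -> (N, 8)
    )


def build_model(backbone: nn.Module, num_classes: int, mode: str) -> nn.Sequential:
    """Attach a new classifier head to the backbone.

    mode="feature_extractor": backbone weights are frozen, only the head is trained.
    mode="fine_tune": backbone weights serve as an initialization and keep training.
    """
    for param in backbone.parameters():
        param.requires_grad = (mode == "fine_tune")
    head = nn.Linear(8, num_classes)  # new, randomly initialized: 8*3 + 3 = 27 parameters
    return nn.Sequential(backbone, head)


def count_trainable(model: nn.Module) -> int:
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


extractor = build_model(make_backbone(), num_classes=3, mode="feature_extractor")
finetuned = build_model(make_backbone(), num_classes=3, mode="fine_tune")
print(count_trainable(extractor))  # 27
print(count_trainable(finetuned))  # 251
print(extractor(torch.rand(2, 3, 32, 32)).shape)  # torch.Size([2, 3])
```

This notebook follows the feature-extractor route: the pretrained layers are frozen and only the newly added layers are trained.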

## Download Data

In [None]:
from pathlib import Path
from urllib import request
import zipfile
from tqdm import tqdm

In [None]:
class DownloadProgressBar(tqdm):
  def update_to(self, b=1, bsize=1, tsize=None):
    if tsize is not None: self.total = tsize
    self.update(b * bsize - self.n)

download_link = 'https://s3.eu-central-1.wasabisys.com/certifai/deployment-training-labs/fruits_image_classification-20210604T123547Z-001.zip'
DATASET_BASE_PATH = Path("../datasets/FruitsClassification").resolve()

# Create the dataset folder (and any missing parent folders)
DATASET_BASE_PATH.mkdir(parents=True, exist_ok=True)

destination_file = Path.joinpath(DATASET_BASE_PATH, "fruits_classification_zip.zip")
if not destination_file.exists():
  with DownloadProgressBar(unit='B', unit_scale=True, miniters=1, desc=download_link.split('/')[-1]) as t:
    request.urlretrieve(download_link, destination_file, reporthook=t.update_to)
  # Extract the archive next to the downloaded zip file
  with zipfile.ZipFile(destination_file) as zipr:
    zipr.extractall(DATASET_BASE_PATH)
else:
  print(f"{destination_file} already exists, skipping download!")

Open the `fruits_image_classification` folder inside `../datasets/FruitsClassification` and take a look at the folder structure and the images within it.

In the first level, there are four subfolders namely:-
- train      : Images used to train the model
- test       : Images used to test the model
- validation : Images used to validate the model
- dirty_test : Images that represent real-life input, which may be more challenging for a classifier model.

In the second level, each of these folders contains three subfolders, one for each class:-
- apple
- grapes
- lemon
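
To check the layout described above programmatically, a short `pathlib` sketch can count the images per split and class. The function name `count_images` is hypothetical, and the sketch assumes only that images sit two levels below the dataset folder (split/class/file). So that it runs on its own, it builds a small throwaway tree with the same folder names.

```python
import tempfile
from pathlib import Path


def count_images(data_dir: Path, extensions=(".jpg", ".jpeg", ".png")) -> dict:
    """Return {split: {class_name: number_of_images}} for a split/class/image tree."""
    counts = {}
    for split_dir in sorted(p for p in data_dir.iterdir() if p.is_dir()):
        counts[split_dir.name] = {
            class_dir.name: sum(1 for f in class_dir.iterdir() if f.suffix.lower() in extensions)
            for class_dir in sorted(p for p in split_dir.iterdir() if p.is_dir())
        }
    return counts


# Build a tiny fake tree that mimics the fruits dataset layout
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for split, n in [("train", 3), ("validation", 1), ("test", 1), ("dirty_test", 1)]:
        for fruit in ["apple", "grapes", "lemon"]:
            (root / split / fruit).mkdir(parents=True)
            for i in range(n):
                (root / split / fruit / f"{fruit}_{i}.jpg").touch()
    counts = count_images(root)

print(counts["train"])  # {'apple': 3, 'grapes': 3, 'lemon': 3}
```

Once `data_dir` is defined in the next cell, calling `count_images(data_dir)` would give the same summary for the real dataset, assuming its images use one of the listed extensions.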

In [None]:
data_dir = Path.joinpath(DATASET_BASE_PATH, "fruits_image_classification")
train_dir = Path.joinpath(data_dir,"train")
valid_dir = Path.joinpath(data_dir,"validation")
test_dir = Path.joinpath(data_dir,"test")
dirtytest_dir = Path.joinpath(data_dir,"dirty_test")
num_classes = 3

## Data Augmentation and Preprocessing

The idea of data augmentation is to artificially increase the number of training images by applying random transformations to the images. For each epoch, a different random transformation is applied to each training image. 

Data augmentation is used to improve the robustness of deep learning models. It helps reduce overfitting and improves generalization.

Note that each pre-trained model may have different input requirements. The torchvision models pretrained on ImageNet expect 3-channel RGB images of at least 224 x 224 pixels, with values loaded into the range [0, 1] and then normalized with the per-channel mean and standard deviation shown below.

Quick References

1. Why use those specific numbers in image normalization? [ref1](https://discuss.pytorch.org/t/discussion-why-normalise-according-to-imagenet-mean-and-std-dev-for-transfer-learning/115670/7) [ref2](https://pytorch.org/vision/stable/models.html)
2. The VGG paper, [Very Deep Convolutional Networks for Large-Scale Image Recognition](https://arxiv.org/abs/1409.1556)


In [None]:
import torch
from torch.utils.data import DataLoader
from torch import nn
from torch.nn import functional as F
from torch.utils.tensorboard import SummaryWriter

import torchvision
from torchvision import transforms
from torchvision import datasets
from torchvision import models

import numpy as np
import matplotlib.pyplot as plt
%load_ext tensorboard
%matplotlib inline

In [None]:
random_seed = 123
torch.manual_seed(random_seed)

In [None]:
image_transforms = {
    # Augment data only for the training dataset
    "train": transforms.Compose([
                                 transforms.RandomResizedCrop(224),
                                 transforms.RandomRotation(degrees=(-15,15)),
                                 transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2), # Example strengths: ColorJitter() with no arguments leaves images unchanged
                                 transforms.RandomHorizontalFlip(),
                                 transforms.ToTensor(),
                                 transforms.Normalize( 
                                     mean=[0.485, 0.456, 0.406], # torchvision's ImageNet-pretrained models expect inputs normalized with these settings
                                     std=[0.229, 0.224, 0.225] # Refer to the Quick References section above for more details
                                     )
    ]),
    "valid":  transforms.Compose([
                                  transforms.Resize(256),
                                  transforms.CenterCrop(224),
                                  transforms.ToTensor(),
                                  transforms.Normalize(
                                     mean=[0.485, 0.456, 0.406],
                                     std=[0.229, 0.224, 0.225]
                                     )
    ])
}

Since our dataset directory follows the directory-tree convention described in the official documentation of [torchvision.datasets.ImageFolder](https://pytorch.org/vision/stable/datasets.html#torchvision.datasets.ImageFolder), we can use this class directly to read our input images; the class labels are inferred from the subfolder names.

In [None]:
data = {
    "train": datasets.ImageFolder(root = train_dir, transform=image_transforms["train"]),
    "valid": datasets.ImageFolder(root = valid_dir, transform=image_transforms["valid"])
}

batch_size = 10
data_loaders = {
    "train": DataLoader(data["train"], batch_size = batch_size, shuffle=True),
    # Shuffling is not needed for evaluation
    "valid": DataLoader(data["valid"], batch_size = batch_size, shuffle=False)
}

Let's view the images in the dataloader. You will notice that the transformations have been applied.

In [None]:
# To check the iterative behavior of the DataLoader (Optional)
# Iterate through the dataloader once
features, labels = next(iter(data_loaders["train"]))
features.shape, labels.shape

The shape of `features` shows that each iteration of the `DataLoader` object yields a batch of 10 samples, each of which is a 3-channel (RGB) image of 224 x 224 pixels.

And as expected there are 10 labels, one for each sample.
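
Each batch holds `batch_size` samples except possibly the last one, which contains the remainder when the dataset size is not a multiple of the batch size (the training loop below accounts for this when it averages the loss). A minimal sketch with 25 dummy samples, a number chosen only for illustration; the `dummy_` names avoid overwriting the notebook's `features` and `labels`.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 25 dummy "images" (3 x 8 x 8) with labels cycling through 3 classes
dummy_images = torch.zeros(25, 3, 8, 8)
dummy_labels = torch.arange(25) % 3
dummy_loader = DataLoader(TensorDataset(dummy_images, dummy_labels), batch_size=10)

batch_sizes = [batch_images.size(0) for batch_images, _ in dummy_loader]
print(batch_sizes)  # [10, 10, 5]

# drop_last=True would discard the incomplete final batch instead
print(len(DataLoader(TensorDataset(dummy_images, dummy_labels), batch_size=10, drop_last=True)))  # 2
```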

In [None]:
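# Visualization of the samples in the first iteration of the DataLoader object
grid = torchvision.utils.make_grid(features, nrow=10)
# Undo the ImageNet normalization so that imshow receives values in [0, 1]
mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)
grid = (grid * std + mean).clamp(0, 1)
plt.figure(figsize=(15,15))
plt.imshow(grid.permute(1, 2, 0).numpy())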

## Pretrained Model

PyTorch's `torchvision` library has a `models` subpackage that contains model architectures and models pretrained on ImageNet (about 1.2 million images in 1000 classes). See the list of models available in `torchvision.models` [here](https://pytorch.org/vision/stable/models.html).

For this example, we’ll be using VGG-16. Assuming our training samples (images) are not sufficient to build a classifier from scratch, VGG-16 can easily be leveraged as a feature extractor, since it was trained on over a million images. Although it does not achieve the lowest error among the available models, and its large number of parameters makes inference relatively slow, its simple architecture makes it easy to adapt, and training is relatively quick because only the new final layers are updated.

Here are the general steps to use a pre-trained model:
- Load a model with weights pretrained on a large dataset.
- Freeze the weights in the front (convolutional) layers. How many layers to freeze depends on how similar the new task is to the original large dataset.
- Replace the end of the network with a custom classifier (set the number of outputs to the number of classes).
- Train only the unfrozen layers on the new task.

### Instantiate VGG Model

In [None]:
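# On torchvision >= 0.13, `pretrained=True` is deprecated in favor of `weights=models.VGG16_Weights.IMAGENET1K_V1`
vgg16_model = models.vgg16(pretrained=True)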
vgg16_model

Notice that there are three major sections in VGG16's architecture:
- `features`
- `avgpool`
- `classifier`
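
The `classifier` section starts with a linear layer that expects 25088 inputs. A short calculation shows where that number comes from: VGG16's 3 x 3 convolutions use padding 1 and keep the spatial size, each of its five 2 x 2 max-pooling layers halves it, and `avgpool` (an `nn.AdaptiveAvgPool2d((7, 7))` in torchvision's implementation) always outputs 7 x 7 maps. The helper name `vgg16_feature_shape` is hypothetical.

```python
import torch
from torch import nn


def vgg16_feature_shape(height: int, width: int, num_pools: int = 5, channels: int = 512):
    """Shape after VGG16's `features`: padded 3x3 convs keep H and W,
    and each of the 5 max-pool layers (kernel 2, stride 2) halves them."""
    for _ in range(num_pools):
        height, width = height // 2, width // 2
    return channels, height, width


c, h, w = vgg16_feature_shape(224, 224)
print((c, h, w))  # (512, 7, 7)
print(c * h * w)  # 25088, the in_features of classifier[0]

# avgpool resizes any feature map to 7 x 7, so the flattened size stays 25088
avgpool = nn.AdaptiveAvgPool2d((7, 7))
print(avgpool(torch.rand(1, 512, 9, 9)).flatten(1).shape)  # torch.Size([1, 25088])
```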

In [None]:
total_params = 0
for name, param in vgg16_model.named_parameters():
  print(name, param.numel())
  total_params += param.numel()
print("------------------")
print("Total parameters: ", total_params)

This model, with almost 140 million parameters, has already been trained on over 1.2 million images! This greatly shortens the time needed to develop a fruits classifier model.

To use the trained features for our fruits classifier, we need to make several modifications, since the model was developed for 1000 classes. We will freeze all the layers to retain their weights and biases, and entirely replace the final layer of the `classifier` sequential layers, `classifier[6]`, which is a fully-connected layer with 4096 input neurons and 1000 output neurons.
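
The parameter counts printed by the cells here can be checked by hand: a fully-connected layer with `n_in` inputs and `n_out` outputs has `n_in * n_out` weights plus `n_out` biases, and a 3 x 3 convolution has `c_in * c_out * 9` weights plus `c_out` biases. The plain-Python sketch below redoes this arithmetic for VGG16 and for a replacement head of 4096 -> 256 -> 3 units like the one built in the next cells; the helper names are hypothetical.

```python
def linear_params(n_in: int, n_out: int) -> int:
    # weights + biases of nn.Linear(n_in, n_out)
    return n_in * n_out + n_out


def conv3x3_params(c_in: int, c_out: int) -> int:
    # weights + biases of nn.Conv2d(c_in, c_out, kernel_size=3)
    return c_in * c_out * 3 * 3 + c_out


# Channel sequence of VGG16's 13 convolutional layers
channels = [3, 64, 64, 128, 128, 256, 256, 256, 512, 512, 512, 512, 512, 512]
features_params = sum(conv3x3_params(a, b) for a, b in zip(channels[:-1], channels[1:]))
classifier_params = linear_params(25088, 4096) + linear_params(4096, 4096) + linear_params(4096, 1000)
print(features_params)                      # 14714688
print(features_params + classifier_params)  # 138357544

# Replacing classifier[6] (4096 -> 1000) with 4096 -> 256 -> 3
new_head_params = linear_params(4096, 256) + linear_params(256, 3)
total_after = features_params + classifier_params - linear_params(4096, 1000) + new_head_params
print(new_head_params)                                # 1049603
print(round(new_head_params / total_after * 100, 2))  # 0.78
```

ReLU, Dropout and pooling layers have no parameters, so they do not change these totals.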

In [None]:
# Freeze model weights and biases
for param in vgg16_model.parameters():
  param.requires_grad = False

In [None]:
print("--------------")
print("Before modification")
print("--------------")
print(f"{vgg16_model.classifier}\n")

# Custom create a few additional layers to replace classifier[6]
# No Softmax at the end: nn.CrossEntropyLoss applies log-softmax itself and expects raw logits
new_layers = nn.Sequential(
    nn.Linear(4096, 256),
    nn.ReLU(),
    nn.Dropout(0.4),
    nn.Linear(256, num_classes)
)
vgg16_model.classifier[6] = new_layers
print("--------------")
print("After modification")
print("--------------")
print(f"{vgg16_model.classifier}\n")

In [None]:
total_params = sum(p.numel() for p in vgg16_model.parameters())
print(f"total_param = {total_params}")
total_trainable_params = sum(
    p.numel() for p in vgg16_model.parameters() if p.requires_grad)
print(f'total_trainable_params = {total_trainable_params}')
print(f"{round(total_trainable_params/total_params*100, 2)}% of the model's parameters require training")

Now that we have added the new layers, let's train the trainable parameters using our training dataset.

In [None]:
# Instantiate optimizer and loss function
# Only the parameters of the new classifier head require gradients, so only they are passed to the optimizer
optimizer = torch.optim.SGD(
    [p for p in vgg16_model.parameters() if p.requires_grad],
    lr=1e-3,
    momentum=0.9
)

criterion = nn.CrossEntropyLoss()

# Setup Hyperparameters
epochs = 60
n_epochs_stop = 10
loss_score = {"train": [], "valid": []}

# Setup Tensorboard writer
TENSORBOARD_LOGS_PATH = Path("./run_VGG16_Fruits").resolve()
writer = SummaryWriter(TENSORBOARD_LOGS_PATH)

# Setup model saving path
MODEL_SAVE_BASE = Path("../generated_models").resolve()
if not MODEL_SAVE_BASE.exists(): MODEL_SAVE_BASE.mkdir() # Create folder
MODEL_SAVE_PATH = Path.joinpath(MODEL_SAVE_BASE, "VGG16_TL_FruitClassifier.pt")

Use GPU, if available, else just use CPU.

In [None]:
# Check availability of CUDA, cuDNN and check model
# --STRONGLY RECOMMEND TO USE GPU, ELSE IT'LL BE AGES FOR YOUR TRAINING TO COMPLETE!--

device = None
if torch.cuda.is_available() and torch.backends.cudnn.enabled:
  print("GPU is available, training will use GPU instead of CPU.")
  # Move model and variables to gpu
  device = torch.device("cuda:0")
  vgg16_model.to(device)
  criterion.to(device)
else:
  print("GPU is unavailable, training remains in the default CPU setting.")

## (OPTIONAL): Verify the device the model parameters are at by uncommenting the following line
# next(vgg16_model.parameters()).device
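
The cell above uses `device = None` to mean "stay on the CPU". A common alternative PyTorch idiom is to always hold a `torch.device` and call `.to(...)` unconditionally, which behaves the same way on CPU-only machines. A minimal sketch; the name `preferred_device` is hypothetical and chosen so it does not overwrite the notebook's `device`.

```python
import torch
from torch import nn

# Pick the GPU when PyTorch can see one, otherwise fall back to the CPU
preferred_device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

layer = nn.Linear(4, 2).to(preferred_device)   # modules are moved in place (and returned)
batch = torch.rand(5, 4).to(preferred_device)  # tensors return a moved copy, so reassign
output = layer(batch)
print(preferred_device, output.device, output.shape)
```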

In [None]:
def train(model, loader, optimizer, criterion, model_saving_path, writer, n_epochs_stop, epochs):
  epochs_no_improve = 0
  min_val_loss = np.inf
  early_stop = False
  curr_epoch = 0

  while curr_epoch < epochs and early_stop is False:
  
    for phase in ["train", "valid"]:
      running_loss = 0.0
      running_size = 0.0
      epoch_loss = 0.0
      correct = 0

      if phase == "train":
        model.train()
      else:
        # Set layers to evaluation mode before running inference
        model.eval()
      
      for images, labels in loader[phase]:
        # Move tensor to GPU if available
        if device: # device=None if 'cpu'
          images, labels = images.to(device), labels.to(device)
        
        # Predict and compute loss
        with torch.set_grad_enabled(phase=="train"):
          y_pred = model(images)
          loss = criterion(y_pred, labels)

          if phase == "train":
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        # If data size is not divisible by batch size
        # The last batch size will be the remainder, thus, we multiply the loss with the number of items in batch 
        running_loss += loss.item()*y_pred.size(0)
        running_size += y_pred.size(0)

        # The predictions is the index of the maximum values in the output of model
        predictions = torch.max(y_pred, 1)[1]
        correct += (predictions == labels).sum().item()
      
      # Calculate the epoch_loss (running_size is equal to the respective data size)
      epoch_loss = running_loss / running_size
      epoch_accuracy = correct / running_size
      writer.add_scalars('Loss', {phase: epoch_loss}, curr_epoch)
      writer.add_scalars('Accuracy', {phase: epoch_accuracy}, curr_epoch)

      # Print score every 5 epochs
      if (curr_epoch % 5 == 0):
        if phase == 'train':
            print(f'Epoch {curr_epoch}:')
        print(f'  {phase.upper()} Loss: {epoch_loss}')
        print(f'  {phase.upper()} Accuracy: {epoch_accuracy}')

      loss_score[phase].append(epoch_loss)

      # Early stopping
      if phase == 'valid':
        if epoch_loss < min_val_loss:
          # Save the model whenever the validation loss reaches a new minimum
          torch.save(model.state_dict(), model_saving_path)
          print(f"Model saved at Epoch {curr_epoch} \n")
          epochs_no_improve = 0
          min_val_loss = epoch_loss

        else:
          # Add 1 to epochs_no_improve if the validation epoch_loss is not lower than min_val_loss
          epochs_no_improve += 1
          print(f'\tepochs_no_improve: {epochs_no_improve} at Epoch: {curr_epoch}')
          
          if epochs_no_improve == n_epochs_stop:
            print(f'\n Early stopping condition was met when there were no improvements in validation loss for {n_epochs_stop} epochs, continuously!')
            early_stop = True
      
    curr_epoch += 1

  # Close the TensorBoard writer once training has finished
  writer.close()

In [None]:
%%time

# Training with early-stopping technique
train(vgg16_model, data_loaders, optimizer, criterion, MODEL_SAVE_PATH, writer, n_epochs_stop, epochs)
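
The early-stopping rule inside `train` can be isolated from the model to see exactly when training halts: the model is saved whenever the validation loss reaches a new minimum, and training stops once `n_epochs_stop` consecutive epochs fail to improve on it. A plain-Python sketch of that rule; the function name `early_stopping_epoch` and the loss values are hypothetical.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (best_epoch, stop_epoch) following the same rule as `train`.

    best_epoch is the epoch whose model would have been saved last;
    stop_epoch is None when the patience is never exhausted.
    """
    min_val_loss = float("inf")
    best_epoch = None
    epochs_no_improve = 0
    for epoch, loss in enumerate(val_losses):
        if loss < min_val_loss:
            # New best validation loss: this is where train() saves the model
            min_val_loss, best_epoch = loss, epoch
            epochs_no_improve = 0
        else:
            epochs_no_improve += 1
            if epochs_no_improve == patience:
                return best_epoch, epoch
    return best_epoch, None


print(early_stopping_epoch([1.0, 0.8, 0.9, 0.85, 0.82, 0.7], patience=3))  # (1, 4)
print(early_stopping_epoch([1.0, 0.9, 0.8], patience=2))                    # (2, None)
```

In the first example training stops at epoch 4, so the later improvement to 0.7 is never seen; a larger `n_epochs_stop` trades longer training for a better chance of catching such late improvements.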

The cell output already displays a lot of information as text. You can also use `TensorBoard` to visualize the accuracies and losses per epoch.

If you're using **Windows** and are not able to view your plots after running the following cell, there is a temporary workaround [(Reference)](https://github.com/tensorflow/tensorboard/issues/2481#issuecomment-516974546). Run the following commands in `CMD.exe` or `Powershell` and try running the cell again:-
>`taskkill /im tensorboard.exe /f`</br>
>`del /q %TMP%\.tensorboard-info\*` (CMD.exe only)</br>
>`del $env:TMP\.tensorboard-info\*` (Powershell only)</br>

In [None]:
%tensorboard --logdir={TENSORBOARD_LOGS_PATH.as_posix()}

### Model inference

If the training logs show that our model does well on both the training and validation datasets, the ultimate determinant of its performance is still how it scores on a hold-out set that it has not seen before.

Before we start inference, let's assume we only have the best model that we saved, i.e. its `state_dict`.

In [None]:
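import copy

# Load the best model's state_dict by first instantiating the VGG16 architecture, without its pretrained weights and biases
model_inference = models.vgg16()
# Attach a fresh copy of the custom head so it does not share parameters with vgg16_model
model_inference.classifier[6] = copy.deepcopy(new_layers)
# map_location="cpu" lets the weights load on machines without a GPU; the model is moved to the GPU later if available
model_inference.load_state_dict(torch.load(MODEL_SAVE_PATH, map_location="cpu"))
# Switch Dropout to evaluation behavior before inference (this also displays the model)
model_inference.eval()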
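
Saving only the `state_dict` means the architecture has to be rebuilt in code before the weights can be loaded, as done above. The self-contained sketch below performs the same round trip on a tiny stand-in model, writing to an in-memory buffer instead of a `.pt` file; `build_tiny_model` and the other names are hypothetical. It also shows why `eval()` matters when the network contains `Dropout`: only in evaluation mode are the outputs deterministic.

```python
import io

import torch
from torch import nn


def build_tiny_model() -> nn.Sequential:
    # Hypothetical small network with a Dropout layer, like the new VGG16 head
    return nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Dropout(0.4), nn.Linear(16, 3))


trained = build_tiny_model()
buffer = io.BytesIO()  # stands in for a .pt file on disk
torch.save(trained.state_dict(), buffer)
buffer.seek(0)

restored = build_tiny_model()  # same architecture, fresh random weights
restored.load_state_dict(torch.load(buffer))
trained.eval()
restored.eval()  # disables Dropout so outputs are deterministic

x = torch.rand(4, 8)
with torch.no_grad():
    same = torch.allclose(trained(x), restored(x))
print(same)  # True
```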

In [None]:
%%time

# Load the test images from test_dir
test_data = datasets.ImageFolder(test_dir, transform=image_transforms['valid'])

# Evaluate the whole test set as a single batch
batch_size = len(test_data)
test_loader = DataLoader(test_data, batch_size, shuffle=False)

# Move model_inference to GPU if available, and make sure Dropout is disabled
if device:
  model_inference.to(device)
model_inference.eval()

correct = 0
# Gradients are not needed for inference, so skip tracking them to save memory
with torch.no_grad():
  for images, labels in test_loader:
    if device:
      images, labels = images.to(device), labels.to(device)
    y_pred = model_inference(images)
    predictions = torch.max(y_pred, 1)[1]
    print(predictions)
    print(labels)
    correct += (predictions == labels).sum().item()
accuracy = correct / len(test_data)
print(f"Test Accuracy: {accuracy}")
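
A single accuracy number hides which fruits get confused with each other. A confusion matrix, with one row per true class and one column per predicted class, can be computed from the same kind of `labels` and `predictions` tensors. The helper below is a hypothetical sketch built on `torch.bincount`, shown on made-up values.

```python
import torch


def confusion_matrix(true_labels: torch.Tensor, predicted: torch.Tensor, num_classes: int) -> torch.Tensor:
    """Entry [i, j] counts samples of true class i that were predicted as class j."""
    # Encode each (true, predicted) pair as a single index, then count occurrences
    pair_index = true_labels * num_classes + predicted
    counts = torch.bincount(pair_index, minlength=num_classes * num_classes)
    return counts.reshape(num_classes, num_classes)


# Made-up example; indices follow ImageFolder's order (0=apple, 1=grapes, 2=lemon)
example_true = torch.tensor([0, 0, 1, 2, 2])
example_pred = torch.tensor([0, 1, 1, 2, 0])
cm = confusion_matrix(example_true, example_pred, num_classes=3)
print(cm)
# tensor([[1, 1, 0],
#         [0, 1, 0],
#         [1, 0, 1]])
print(cm.diag().sum().item() / cm.sum().item())  # 0.6 (overall accuracy)
```

After the test cell above has run, `confusion_matrix(labels.cpu(), predictions.cpu(), num_classes)` should give the matrix for the real test set.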

Note that we have provided an extra test set named `dirty_test`, containing images that are harder to classify. Let's test our model on it.

In [None]:
%%time

# Load the images from dirtytest_dir
dirtytest_data = datasets.ImageFolder(dirtytest_dir, transform=image_transforms['valid'])

batch_size = len(dirtytest_data)
dirtytest_loader = DataLoader(dirtytest_data, batch_size, shuffle=False)

def evaluate(model, loader, device, name="Dirty Test"):
  """Print the predictions, labels and accuracy of `model` on every batch of `loader`."""
  model.to(device)
  model.eval()
  correct = 0
  with torch.no_grad():
    for images, labels in loader:
      images, labels = images.to(device), labels.to(device)
      predictions = torch.max(model(images), 1)[1]
      print(predictions)
      print(labels)
      correct += (predictions == labels).sum().item()
  print(f"{name} Accuracy: {correct / len(loader.dataset)}")

try:
  # Use the GPU if available (device is None when running on CPU)
  evaluate(model_inference, dirtytest_loader, device or torch.device("cpu"))
except RuntimeError as e:
  print(e)
  # Fall back to CPU inference if the GPU runs out of memory
  device = torch.device("cpu")
  print(f"Moving model to {device}")
  evaluate(model_inference, dirtytest_loader, device)


Optionally, you may try to build your own CNN model and compare its performance with the VGG16 pretrained model.

### Appendix <a id=appendix></a>

**PyTorch Hub Pre-trained Model**

PyTorch Hub provides an additional route to obtain models. It is intended to be a central location for published models, covering not only images, as in the torchvision model library, but all kinds of data: **images, text, audio, video, or any other type of data**. To obtain a model this way, use the `torch.hub` module:

In [None]:
# resnet_model = torch.hub.load('pytorch/vision', 'resnet50', pretrained=True)

The first parameter points to a GitHub owner and repository (optionally with a tag/branch identifier in the string as well); the second is the requested model (in this case, resnet50); and finally, the third indicates whether to download pretrained weights. You can also use `torch.hub.list('pytorch/vision')` to discover all the models inside that repository that are available to download.
