# <font style="color:blue">Project 2: Kaggle Competition - Classification</font>

#### Maximum Points: 100

<div>
    <table>
        <tr><td><h3>Sr. no.</h3></td> <td><h3>Section</h3></td> <td><h3>Points</h3></td> </tr>
        <tr><td><h3>1</h3></td> <td><h3>Data Loader</h3></td> <td><h3>10</h3></td> </tr>
        <tr><td><h3>2</h3></td> <td><h3>Configuration</h3></td> <td><h3>5</h3></td> </tr>
        <tr><td><h3>3</h3></td> <td><h3>Evaluation Metric</h3></td> <td><h3>10</h3></td> </tr>
        <tr><td><h3>4</h3></td> <td><h3>Train and Validation</h3></td> <td><h3>5</h3></td> </tr>
        <tr><td><h3>5</h3></td> <td><h3>Model</h3></td> <td><h3>5</h3></td> </tr>
        <tr><td><h3>6</h3></td> <td><h3>Utils</h3></td> <td><h3>5</h3></td> </tr>
        <tr><td><h3>7</h3></td> <td><h3>Experiment</h3></td><td><h3>5</h3></td> </tr>
        <tr><td><h3>8</h3></td> <td><h3>TensorBoard Dev Scalars Log Link</h3></td> <td><h3>5</h3></td> </tr>
        <tr><td><h3>9</h3></td> <td><h3>Kaggle Profile Link</h3></td> <td><h3>50</h3></td> </tr>
    </table>
</div>


# <font style="color:green">Project Approach</font>

The objective of this project is to correctly classify the following food types in the KenyanFood13 data set.

<img src="https://raw.githubusercontent.com/monajalal/Kenyan-Food/master/img/KenyanFood13.png" alt="Sample images from the KenyanFood13 data set" width="600px" />

Rather than create my own CNN architecture, I will take advantage of existing models that have been trained on the ImageNet data set.

___

The **[Transfer learning and the art of using Pre-trained Models in Deep Learning](https://www.analyticsvidhya.com/blog/2017/06/transfer-learning-the-art-of-fine-tuning-a-pre-trained-model/)** blog post outlines four ways to fine-tune a model that has been trained on a different data set. The following is a lightly edited excerpt from this post.

The following diagram depicts four scenarios of using a pretrained model on a new data set.

<img src="https://cdn.analyticsvidhya.com/wp-content/uploads/2017/05/31112715/finetune1.jpg" alt="Transfer learning approaches" width="300px">

**Scenario 1: Size of the data set is small and the data similarity is high.** In this case, since the data similarity is very high, we do not need to retrain the model. All we need to do is to customize and modify the output layers according to our problem statement. We use the pretrained model as a feature extractor and retrain the classification block/layer.

**Scenario 2: Size of the data set is small and the data similarity is low.** In this case, we can freeze the initial (let’s say k) layers of the pretrained model and retrain just the remaining (n-k) layers. The top layers are then customized to the new data set. Since the new data set has low similarity, it is important to retrain and customize the higher layers according to the new data set. The small size of the data set is compensated by the fact that the initial layers keep their pretrained weights (learned previously on a large data set) and are frozen.

**Scenario 3: Size of the data set is large and the data similarity is low.** In this case, since we have a large data set, training the neural network would be effective. However, since our data is very different from the data used to train the pretrained models, predictions made using the pretrained models would not be effective. Hence, it's best to train the neural network from scratch on your data.

**Scenario 4: Size of the data set is large and the data similarity is high.** This is the ideal situation. In this case, the pretrained model should be most effective. The best way to use it is to retain both the architecture and the initial weights of the model, and then retrain the whole model starting from the pretrained weights.

___
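The four scenarios differ mainly in which parameters are trainable and whether the pretrained weights are kept. The sketch below is not this notebook's implementation; it assumes, for illustration, that the pretrained backbone is an `nn.Sequential` of blocks followed by a new classification head (for a TorchVision ResNet, the head would be the `fc` layer), and the helper name `apply_transfer_scenario` and the block-level meaning of `k` are hypothetical.

```python
import torch.nn as nn


def apply_transfer_scenario(
    backbone: nn.Sequential, head: nn.Module, scenario: int, k: int = 0
) -> nn.Sequential:
    """
    Hypothetical helper that sets requires_grad according to the four transfer
    learning scenarios and returns the backbone followed by the head.

    1: freeze the whole backbone and train only the new head (feature extractor)
    2: freeze the first k backbone blocks and train the remaining blocks and the head
    3: re-initialize all backbone weights and train everything from scratch
    4: keep the pretrained weights as initialization and train everything
    """
    if scenario not in (1, 2, 3, 4):
        raise ValueError(f"unknown scenario: {scenario}")

    if scenario == 3:
        # discard the pretrained weights by re-initializing every layer that supports it
        for module in backbone.modules():
            if hasattr(module, "reset_parameters"):
                module.reset_parameters()

    # number of leading backbone blocks whose parameters stay frozen
    num_frozen = {1: len(backbone), 2: k, 3: 0, 4: 0}[scenario]
    for index, block in enumerate(backbone):
        for param in block.parameters():
            param.requires_grad = index >= num_frozen

    # the new classification head is always trained
    for param in head.parameters():
        param.requires_grad = True

    return nn.Sequential(backbone, head)
```

The optimizer should then receive only the parameters with `requires_grad=True`, so frozen blocks are neither updated nor tracked by the optimizer state.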

Since I did not know how to program in Python before this class, I used this project to improve my Python proficiency. Consequently, I not only explored using pretrained models on a new data set, but also spent significant time developing class hierarchies that allow me to easily conduct experiments on the following pretrained TorchVision models using any of the scenarios described above.

* ResNet-18
* ResNet-34
* ResNet-50
* ResNet-101
* ResNet-152
* ResNeXt-50-32x4d
* ResNeXt-101-32x8d
* Wide ResNet-50-2
* Wide ResNet-101-2
* VGG-11 with batch normalization
* VGG-13 with batch normalization
* VGG-16 with batch normalization
* VGG-19 with batch normalization
* DenseNet-121
* DenseNet-169
* DenseNet-201
* DenseNet-161

I used and modified the trainer module rather than PyTorch Lightning. Modifications to the trainer module include, but are not limited to, adding configuration parameters; adding the ability to stop training early when either the loss or the accuracy does not significantly improve over a certain number of epochs; and extending the visualization base and TensorBoard classes to log images, figures, graphs, and PR curves.
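The accuracy-based stopping rule can be illustrated with a small, self-contained sketch. It assumes, hypothetically, that the validation accuracy is smoothed with an exponential moving average (in the spirit of the trainer's `stop_acc_epochs`, `stop_acc_ema_alpha`, and `stop_acc_threshold` parameters) and that training stops once the smoothed accuracy fails to improve by at least the threshold for a given number of epochs. The class name `EmaEarlyStopper` is hypothetical, and the trainer module's actual rule may differ.

```python
from typing import Optional


class EmaEarlyStopper:
    """
    Hypothetical sketch of accuracy-based early stopping: the validation accuracy is
    smoothed with an exponential moving average (EMA), and training stops once the
    EMA has not improved by at least `threshold` for `patience` consecutive epochs.
    A patience of 0 disables early stopping.
    """

    def __init__(self, patience: int, alpha: float = 0.3, threshold: float = 2.0):
        self.patience = patience
        self.alpha = alpha          # weight of the newest accuracy in the EMA
        self.threshold = threshold  # minimum EMA gain (in accuracy points) that counts
        self.ema: Optional[float] = None
        self.best: Optional[float] = None
        self.epochs_without_improvement = 0

    def step(self, accuracy: float) -> bool:
        """Records one epoch's validation accuracy; returns True if training should stop."""
        if self.ema is None:
            self.ema = accuracy
        else:
            self.ema = self.alpha * accuracy + (1 - self.alpha) * self.ema
        if self.best is None or self.ema >= self.best + self.threshold:
            self.best = self.ema
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.patience > 0 and self.epochs_without_improvement >= self.patience


stopper = EmaEarlyStopper(patience=3)
print([stopper.step(acc) for acc in [50.0, 60.0, 70.0, 58.0, 58.0, 58.0]])
# [False, False, False, False, False, True]
```

Smoothing with an EMA keeps a single noisy epoch from resetting the patience counter, at the cost of reacting a few epochs later to a genuine plateau.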

Experiments are identified by three uppercase letters, i.e., identifiers that match the regular expression `[A-Z][A-Z][A-Z]`. The first and second letters designate the experiment group and set, respectively, while the last letter designates an individual experiment. Hence, all experiments whose identifiers begin with "A" (e.g., "ExpA…") belong to Group A, while all experiments whose identifiers begin with "AB" (e.g., "ExpAB…") belong to Group A, Set B.
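The naming scheme can be parsed with a single regular expression. This sketch assumes that experiment names consist of an optional "Exp" prefix followed by the three identifier letters; `parse_experiment_id`, `ExperimentId`, and the example names are hypothetical.

```python
import re
from typing import NamedTuple

# assumed format: optional "Exp" prefix followed by group, set, and experiment letters
EXPERIMENT_ID = re.compile(r"^(?:Exp)?([A-Z])([A-Z])([A-Z])$")


class ExperimentId(NamedTuple):
    group: str
    set_id: str
    experiment: str


def parse_experiment_id(name: str) -> ExperimentId:
    """Splits a hypothetical experiment name such as "ExpABC" into its three letters."""
    match = EXPERIMENT_ID.match(name)
    if match is None:
        raise ValueError(f"not a valid experiment identifier: {name!r}")
    return ExperimentId(*match.groups())


# select every experiment in Group A, Set B
names = ["ExpAAA", "ExpABA", "ExpABB", "ExpBAA"]
group_a_set_b = [n for n in names if parse_experiment_id(n)[:2] == ("A", "B")]
print(group_a_set_b)  # ['ExpABA', 'ExpABB']
```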

I implemented the following groups of experiments.

* Group A to explore the data and verify the training pipeline.
* Group B to explore the four transfer learning scenarios on a model from the ResNet, VGG, and DenseNet families.
* Group C to explore optimizing the transfer learning approach that worked best.
* Group D to explore whether an ensemble performs better than its constituent parts.
* Group E to explore miscellaneous issues, e.g., performance of Project 1 model, normalizing KenyanFood13 data by its mean/std, training on grayscale images, impact of no or poor data augmentation.

In [None]:
# This cell initializes the notebook for execution on different hosts.

import os
import sys

def get_host() -> str:
    """
    The get_ipython() function returns the following from different hosts.

    colab:  <google.colab._shell.Shell object at 0x7f23c5e386d8>
    brule:  <ipykernel.zmqshell.ZMQInteractiveShell object at 0x7f1990f22a50>
    kaggle: <ipykernel.zmqshell.ZMQInteractiveShell object at 0x7f9d093aebd0>
    """
    
    if 'google.colab' in str(get_ipython()):
        return "colab"
    else:
        # ToDo: Determine whether running on kaggle.
        return "brule"

def init_host(host:str):
    if host == "brule":
        # set data and project directories
        if os.path.isdir("./trainer"):
            data_dir = "./data"
            proj_dir = "./"
        elif os.path.isdir("./project2/trainer"):
            data_dir = "./project2/data"
            proj_dir = "./project2"
        else:
            raise SystemExit("Cannot locate trainer module.")

    elif host == "colab":
        # mount Google Drive
        from google.colab import drive
        drive.mount("/content/gdrive")

        # set data and project directories
        data_dir = "/content/data"
        proj_dir = "/content/gdrive/MyDrive/Colab Notebooks/project2"

        # fetching data from Google Drive is very, very slow ...
        # hence, we will unzip the dataset to /content/data if it is not there
        dataset = os.path.join(proj_dir, "data", "pytorch-opencv-course-classification.zip")
        if not os.path.isdir(data_dir):
            os.makedirs(data_dir)
            import zipfile
            with zipfile.ZipFile(dataset, 'r') as zip_ref:
                zip_ref.extractall(data_dir)

    else:
        raise SystemExit("Unknown host! Cannot continue.")

    sys.path.append(proj_dir)
    return data_dir, proj_dir

data_dir, proj_dir = init_host(get_host())

print(f"data_dir: {data_dir}")
!ls -lh {data_dir.replace(" ", "\\ ")}

print(f"proj_dir: {proj_dir}")
!ls -lh {proj_dir.replace(" ", "\\ ")}
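The `ToDo` in `get_host` could be addressed along the lines of the sketch below. It assumes that a Kaggle notebook can be recognized by the presence of its `/kaggle/input` directory (the data root used in the course's configuration example); `detect_host` is a hypothetical name, and `init_host` would also need a `"kaggle"` branch.

```python
import os


def detect_host(shell_repr: str, kaggle_input_dir: str = "/kaggle/input") -> str:
    """
    Hypothetical replacement for get_host(). `shell_repr` is str(get_ipython()),
    passed in explicitly so that the function can be tested outside IPython.
    """
    if "google.colab" in shell_repr:
        return "colab"
    # assumption: only Kaggle notebooks provide the /kaggle/input directory
    if os.path.isdir(kaggle_input_dir):
        return "kaggle"
    return "brule"
```

In the notebook, it would be called as `detect_host(str(get_ipython()))`.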

In [None]:
# import organizer @ https://pypi.org/project/importanize/

from abc import ABC, abstractmethod, abstractproperty
from collections import namedtuple
from dataclasses import dataclass, replace
from enum import Enum, auto
from operator import itemgetter
from typing import Callable, Iterable, List, Optional, Tuple

import itertools
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import PIL
import random
from PIL import Image

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import torchvision

from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from torch.utils.data import DataLoader, Dataset
from torchvision import models, transforms

from trainer import Trainer, configuration, hooks
from trainer.configuration import SystemConfig
from trainer.configuration import DataAugConfig, DatasetConfig, DataLoaderConfig
from trainer.configuration import OptimizerConfig, SchedulerConfig, TrainerConfig
from trainer.metrics import AccuracyEstimator
from trainer.tensorboard_visualizer import TensorBoardVisualizer
from trainer.utils import patch_configs, setup_system

## <font style="color:green">1. Data Loader [10 Points]</font>

In this section, you have to write a class or methods that will be used to get the training and validation data loaders.

You will have to write a custom dataset class to load data.

**Note that there is no separate validation data, so you will have to create your validation set by dividing the train data into training and validation data. Usually, in practice, we use an `80:20` ratio for training and validation, respectively.**

For example,

```
from torch.utils.data import Dataset

class KenyanFood13Dataset(Dataset):
    """
    
    """
    
    def __init__(self, *args):
        ...

    def __len__(self):
        ...
    
    def __getitem__(self, idx):
        ...
```

```
def get_data(args1, *args):
    ...
    return train_loader, test_loader
```

In [None]:
class KenyanFood13Data:
    """
    This class parses the KenyanFood13's test.csv and train.csv files and divides the training data
    into training and validation sets preserving the relative ratios of the number of images of each
    class type.
    """
    
    def __init__(self, data_root, valid_size = 0.2, random_seed = 42):
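        """
        Args:
            data_root: directory that contains test.csv, train.csv, and images/images/.
            valid_size: fraction of the training data held out for validation.
            random_seed: seed passed to train_test_split for reproducible splits.
        """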
        
        # the root path of the images
        self.__image_root = os.path.join(data_root, 'images', 'images')
        
        # parse the test CSV file to obtain filenames (labels are not given)
        test_data_frame = self.__parse_data_file(data_root, 'test.csv')
        self.__test_fnames = test_data_frame.values[:,0]
        
        # parse the train CSV file to obtain filenames and labels       
        train_data_frame = self.__parse_data_file(data_root, 'train.csv')
        fnames = train_data_frame.values[:,0]
        labels = train_data_frame.values[:,1]
        
        # get the classes and class counts
        self.__classes, self.__class_counts = np.unique(labels, return_counts=True)
        num_classes = len(self.__classes)
        
        # create a dictionary of text labels to integer labels
        label_dict = {}
        for key, value in zip(self.__classes, np.arange(num_classes)):
            label_dict[key] = value
                
        # convert the text labels to their numeric equivalents
        labels = [label_dict[label] for label in labels]

        # retain the complete unsplit training dataset for visualization purposes
        self.__unsplit_fnames = fnames
        self.__unsplit_labels = labels

        # create a library (dictionary) that maps each integer label to the list of its image filenames
        self.__library = {key : [fname for fname, label in zip(fnames, labels) if label == key] 
                          for key in range(num_classes)}

        # split the training data into training and validation sets
        self.__train_fnames, self.__valid_fnames, self.__train_labels, self.__valid_labels = train_test_split(
            fnames,                      # image file names w/o path or extension
            labels,                      # image labels
            test_size = valid_size,      # test size
            random_state = random_seed,  # random seed for reproducibility
            shuffle = True,              # shuffle data before splitting into training and validation sets
            stratify = labels            # maintain equal class representation in training and validation sets
        )

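        # create subsets of the training and validation sets for pipeline checks; the same
        # fraction is applied to both sets, so the subsets hold about 256 training and, for
        # valid_size = 0.2, about 64 validation images, preserving the train/validation ratio
        subset_size = 256.0 / len(self.__train_fnames)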

        _, self.__train_fnames_subset, _, self.__train_labels_subset = train_test_split(
            self.__train_fnames,
            self.__train_labels,
            test_size = subset_size,
            random_state = random_seed,
            shuffle = True,
            stratify = self.__train_labels
        )

        _, self.__valid_fnames_subset, _, self.__valid_labels_subset = train_test_split(
            self.__valid_fnames,
            self.__valid_labels,
            test_size = subset_size,
            random_state = random_seed,
            shuffle = True,
            stratify = self.__valid_labels
        )

        
    def __parse_data_file(self, data_root, file):
        path = os.path.join(data_root, file)
        return pd.read_csv(path, delimiter=',', dtype={'id': 'str'}, engine='python')
    
    @property
    def image_root(self):
        return self.__image_root

    @property
    def classes(self):
        return self.__classes
    
    @property
    def class_counts(self):
        return self.__class_counts
    
    @property
    def test_fnames(self):
        return self.__test_fnames
    
    @property
    def train_fnames(self):
        return self.__train_fnames
    
    @property
    def train_labels(self):
        return self.__train_labels
    
    @property
    def valid_fnames(self):
        return self.__valid_fnames
    
    @property
    def valid_labels(self):
        return self.__valid_labels

    @property
    def train_fnames_subset(self):
        return self.__train_fnames_subset
    
    @property
    def train_labels_subset(self):
        return self.__train_labels_subset
    
    @property
    def valid_fnames_subset(self):
        return self.__valid_fnames_subset
    
    @property
    def valid_labels_subset(self):
        return self.__valid_labels_subset

    @property
    def unsplit_fnames(self):
        return self.__unsplit_fnames
    
    @property
    def unsplit_labels(self):
        return self.__unsplit_labels

    @property
    def library(self):
        return self.__library
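The dictionary-based conversion of text labels to integers in `__init__` can also be expressed with a single `np.unique` call: with `return_inverse=True`, it returns each label's index into the sorted array of classes, which is the same mapping that `label_dict` builds. The food names below are illustrative placeholders, not the data set's actual labels.

```python
import numpy as np

labels = np.array(["ugali", "chapati", "ugali", "githeri", "chapati", "ugali"])

# classes are sorted; int_labels maps each label to its index in classes
classes, int_labels, class_counts = np.unique(labels, return_inverse=True, return_counts=True)

print(classes)       # ['chapati' 'githeri' 'ugali']
print(int_labels)    # [2 0 2 1 0 2]
print(class_counts)  # [2 1 3]

# same result as the dictionary approach used in KenyanFood13Data
label_dict = {key: value for value, key in enumerate(classes)}
assert [label_dict[label] for label in labels] == int_labels.tolist()
```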

In [None]:
class KenyanFood13Dataset(Dataset):
    """
    This custom PyTorch dataset contains images and classification labels from
    Kaggle's KenyanFood13 dataset.
    """
    
    def __init__(self, image_root, fnames, labels=None, transform=None):
        super().__init__()
        self.__fnames = fnames
        self.__labels = labels
        self.__transform = transform
        self.__image_root = image_root

    def __len__(self):
        """
        Returns the dataset's length, i.e., the number of image/label pairs.
        """

        return len(self.__fnames)
    
    def __getitem__(self, idx):
        """
        Returns the (optionally transformed) image at the specified index together with
        its integer label or, if the dataset has no labels (test set), its filename.
        """

        path = os.path.join(self.__image_root, self.__fnames[idx] + ".jpg")
        # conversion to RGB removes the alpha channel, if present, and expands grayscale images
        image = Image.open(path).convert("RGB")
        
        if self.__transform is not None:
            image = self.__transform(image)

        if self.__labels is not None:
            extra = self.__labels[idx]  # return target with image
        else:
            extra = self.__fnames[idx]  # return filename with image

        return image, extra

In [None]:
def get_datasets(
    data: KenyanFood13Data,
    test_transforms,
    train_transforms,
    subset = False
):
    """
    Creates the training, validation, and test datasets. If subset is True, the small
    training and validation subsets are used instead of the full splits.
    """

    if not subset:

        train_dataset = KenyanFood13Dataset(
            image_root = data.image_root, 
            fnames = data.train_fnames, 
            labels = data.train_labels, 
            transform = train_transforms)

        valid_dataset = KenyanFood13Dataset(
            image_root = data.image_root, 
            fnames = data.valid_fnames, 
            labels = data.valid_labels, 
            transform = test_transforms)


    else:
        
        train_dataset = KenyanFood13Dataset(
            image_root = data.image_root, 
            fnames = data.train_fnames_subset, 
            labels = data.train_labels_subset, 
            transform = train_transforms)

        valid_dataset = KenyanFood13Dataset(
            image_root = data.image_root, 
            fnames = data.valid_fnames_subset, 
            labels = data.valid_labels_subset, 
            transform = test_transforms)

    test_dataset = KenyanFood13Dataset(
        image_root = data.image_root, 
        fnames = data.test_fnames, 
        transform = test_transforms)

    return train_dataset, valid_dataset, test_dataset

In [None]:
def get_data_loaders(
    train_dataset: Dataset,
    valid_dataset: Dataset,
    test_dataset: Dataset,
    batch_size = 16, 
    num_workers = 2
):
    """
    This function creates and returns the training, validation, and test data loaders.
    Only the training data loader shuffles its data.
    """
    
    train_data_loader = torch.utils.data.DataLoader(
        train_dataset, 
        batch_size=batch_size, 
        num_workers=num_workers, 
        shuffle=True)
    
    valid_data_loader = torch.utils.data.DataLoader(
        valid_dataset, 
        batch_size=batch_size, 
        num_workers=num_workers, 
        shuffle=False)

    test_data_loader = torch.utils.data.DataLoader(
        test_dataset, 
        batch_size=batch_size, 
        num_workers=num_workers, 
        shuffle=False)
    

    return train_data_loader, valid_data_loader, test_data_loader
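Because `KenyanFood13Dataset` returns the image filename instead of a label when no labels are given, each test batch yields `(images, fnames)`, and PyTorch's default collate function leaves the filenames as a sequence of strings rather than stacking them into a tensor. The sketch below collects predictions from such a loader; `ToyTestDataset` and the `nn.Linear` model are stand-ins for the real dataset and classifier, and `predict` is a hypothetical helper.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Dataset


class ToyTestDataset(Dataset):
    """Stand-in that mimics KenyanFood13Dataset without labels: returns (tensor, filename)."""

    def __init__(self, num_items: int = 5):
        self.fnames = [f"img{i:03d}" for i in range(num_items)]
        self.images = torch.randn(num_items, 4)

    def __len__(self):
        return len(self.fnames)

    def __getitem__(self, idx):
        return self.images[idx], self.fnames[idx]


@torch.no_grad()
def predict(model: nn.Module, data_loader: DataLoader, device: str = "cpu"):
    """Returns a list of (filename, predicted class index) pairs."""
    model.eval()
    results = []
    for images, fnames in data_loader:
        logits = model(images.to(device))
        predictions = logits.argmax(dim=1).tolist()
        # fnames is a sequence of strings: default collation does not stack strings
        results.extend(zip(fnames, predictions))
    return results


model = nn.Linear(4, 13)  # 13 outputs, one per KenyanFood13 class
loader = DataLoader(ToyTestDataset(), batch_size=2, shuffle=False)
print(predict(model, loader)[:2])
```

Keeping `shuffle=False` for the test loader, as `get_data_loaders` does, keeps the predictions in the same order as the test CSV file.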

In [None]:
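def get_mean_std(data_loader=None):
    """
    Returns the per-channel mean and standard deviation used for normalization.

    If data_loader is None, the values used by the pretrained TorchVision
    classification models (computed on ImageNet) are returned. Otherwise, the
    values are computed from the images returned by the specified data loader.
    Since this computation takes a long time and the data for this notebook are
    fixed, it was run once and its result was copied to the normalization transform.

    Note that the computed std is the average of the per-image channel standard
    deviations, which approximates, but is not identical to, the standard
    deviation over all pixels of the data set.

    For comparison, the mean and standard deviation of the KenyanFood13 images
    computed with the train_dataset and preprocess transforms are as follows.

        mean = [0.5778, 0.4631, 0.3471]
        std = [0.2380, 0.2461, 0.2464]
    """
    
    if data_loader is None:
        # mean and standard deviation used by the pretrained classification models
        mean = [0.485, 0.456, 0.406] 
        std = [0.229, 0.224, 0.225]
    
    else:
        # accumulate per-image channel statistics over all images from the data loader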
        
        std = 0.
        mean = 0.
        for images, _ in data_loader:
            batch_samples = images.size(0)
            images = images.view(batch_samples, images.size(1), -1)
            std += images.std(2).sum(0)
            mean += images.mean(2).sum(0)
        std /= len(data_loader.dataset)
        mean /= len(data_loader.dataset)

    return mean, std
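The loop above averages per-image standard deviations, which is close to, but not the same as, the standard deviation over all pixels of the data set. If exact per-channel values are wanted, a sketch that accumulates running sums of x and x² (in float64 to limit rounding error) could look like this; `get_exact_mean_std` is a hypothetical name, it returns the population (not sample) standard deviation, and the random tensors stand in for the KenyanFood13 images.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def get_exact_mean_std(data_loader: DataLoader):
    """
    Hypothetical helper: per-channel mean and population standard deviation over
    every pixel of every image, using running sums of x and x**2.
    """
    num_pixels = 0
    channel_sum = None
    channel_sq_sum = None
    for images, _ in data_loader:
        # (batch, channels, height, width) -> (channels, batch * height * width)
        flat = images.to(torch.float64).transpose(0, 1).reshape(images.size(1), -1)
        if channel_sum is None:
            channel_sum = torch.zeros(flat.size(0), dtype=torch.float64)
            channel_sq_sum = torch.zeros(flat.size(0), dtype=torch.float64)
        channel_sum += flat.sum(dim=1)
        channel_sq_sum += (flat ** 2).sum(dim=1)
        num_pixels += flat.size(1)
    mean = channel_sum / num_pixels
    # Var[x] = E[x^2] - E[x]^2; clamp guards against tiny negative rounding errors
    std = (channel_sq_sum / num_pixels - mean ** 2).clamp(min=0).sqrt()
    return mean, std


# usage with synthetic data standing in for the KenyanFood13 images
images = torch.rand(10, 3, 8, 8)
loader = DataLoader(TensorDataset(images, torch.zeros(10)), batch_size=4)
mean, std = get_exact_mean_std(loader)
```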

In [None]:
class ImageTransforms:
    """
    This utility class has methods to create transforms used to train and evaluate a model as
    well as visualize images.
    """
    
    def __init__(
            self, 
            resize = 256, 
            crop_size = 224, 
            mean = [0.485, 0.456, 0.406], 
            std = [0.229, 0.224, 0.225],
            config = DataAugConfig()
        ):
        self.__resize = resize
        self.__crop_size = crop_size
        self.__mean = mean
        self.__std = std
        self.__config = config

    def preprocess(self, augment=False):
        """
        These transformations convert PIL images to uniformly sized tensors whose dimensions
        are crop_size x crop_size pixels. If the augment parameter is True, then the following
        data augmentation transforms are applied: color jitter, horizontal flip, vertical flip,
        rotation, translation, scaling, and erasing.
        """
        return transforms.Compose(self.__create_transform_list(normalize=False, augment=augment))
    
    def common(self):
        """
        These transformations convert PIL images to uniformly sized tensors whose dimensions
        are crop_size x crop_size pixels and values are normalized by the mean and standard
        deviation.
        """
        return transforms.Compose(self.__create_transform_list(normalize=True, augment=False))
    
    def augment(self):
        """
        These transformations convert PIL images to uniformly sized tensors whose dimensions
        are crop_size x crop_size pixels and values are normalized by the mean and standard
        deviation with the following random data augmentations: color jitter, horizontal flip,
        vertical flip, rotation, translation, scaling, and erasing.
        """
        return transforms.Compose(self.__create_transform_list(normalize=True, augment=True))

    def __create_transform_list(self, normalize, augment):
        tlist = []

        # resize before data augmentation to reduce execution time
        tlist.append(transforms.Resize(
            size = self.__resize, 
            interpolation = PIL.Image.BILINEAR
        ))

        if augment:
            # apply rotation before center cropping to avoid "corner voids"
            tlist.extend(self.__get_color_jitter())
            tlist.extend(self.__get_random_vertical_flip())
            tlist.extend(self.__get_random_horizontal_flip())
            tlist.extend(self.__get_random_affine())

        tlist.append(transforms.CenterCrop(self.__crop_size))
        tlist.append(transforms.ToTensor())

        if normalize:
            tlist.append(transforms.Normalize(self.__mean, self.__std, inplace=True))

        if augment:
            tlist.extend(self.__get_random_erasing())

        return tlist

    def __get_color_jitter(self):
        tlist = []
        if self.__config.color_enabled:
            tlist.append(transforms.ColorJitter(
                brightness = self.__config.color_brightness, 
                contrast = self.__config.color_contrast, 
                saturation = self.__config.color_saturation, 
                hue = self.__config.color_hue
            ))
        return tlist

    def __get_random_vertical_flip(self):
        tlist = []
        if self.__config.vert_flip_prob > 0:
            tlist.append(transforms.RandomVerticalFlip(
                p=self.__config.vert_flip_prob
            ))
        return tlist 

    def __get_random_horizontal_flip(self):
        tlist = []
        if self.__config.horz_flip_prob > 0:
            tlist.append(transforms.RandomHorizontalFlip(
                p=self.__config.horz_flip_prob
            ))
        return tlist 

    def __get_random_affine(self):
        tlist = []
        if self.__config.affine_enabled:
            tlist.append(transforms.RandomAffine(
                degrees = self.__config.affine_rotation,
                translate = self.__config.affine_translate,
                scale = self.__config.affine_scale,
                resample=PIL.Image.BILINEAR
            ))
        return tlist

    def __get_random_erasing(self):
        tlist = []
        if self.__config.erasing_prob > 0:
            tlist.append(transforms.RandomErasing(
                p = self.__config.erasing_prob,
                scale = self.__config.erasing_scale,
                ratio = self.__config.erasing_ratio,
                inplace = True
            ))
        return tlist 
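The comment about applying rotation before center cropping to avoid "corner voids" can be quantified. For a square image of side s rotated by θ about its center, the largest centered, axis-aligned square that contains no fill pixels has side s / (|cos θ| + |sin θ|). The sketch below assumes a square input after `Resize` (for a non-square image, using its shorter side as s gives a conservative bound) and ignores the translation and scaling that `RandomAffine` also applies; both function names are hypothetical. With `resize = 256` and `crop_size = 224`, it suggests that rotations beyond roughly 9° can still leave some fill pixels inside the crop.

```python
import math


def max_void_free_crop(side: float, degrees: float) -> float:
    """Side of the largest centered square without fill pixels after rotating a side x side image."""
    theta = math.radians(degrees)
    return side / (abs(math.cos(theta)) + abs(math.sin(theta)))


def max_void_free_rotation(side: float, crop: float) -> float:
    """Largest rotation in degrees (0 to 45) that keeps a centered crop x crop square free of fill pixels."""
    ratio = side / crop
    if ratio < 1:
        return 0.0  # the crop is larger than the image even without rotation
    if ratio >= math.sqrt(2):
        return 45.0  # cos(t) + sin(t) <= sqrt(2), so no rotation creates voids
    # cos(t) + sin(t) = sqrt(2) * sin(t + 45 deg) = ratio
    return math.degrees(math.asin(ratio / math.sqrt(2))) - 45.0


print(round(max_void_free_crop(256, 45), 1))       # 181.0
print(round(max_void_free_rotation(256, 224), 2))  # 8.91
```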

## <font style="color:green">2. Configuration [5 Points]</font>

Define your configuration in this section.

For example,

```
@dataclass
class TrainingConfiguration:
    '''
    Describes configuration of the training process
    '''
    batch_size: int = 10 
    epochs_count: int = 50  
    init_learning_rate: float = 0.1  # initial learning rate for lr scheduler
    log_interval: int = 5  
    test_interval: int = 1  
    data_root: str = "/kaggle/input/pytorch-opencv-course-classification/" 
    num_workers: int = 2  
    device: str = 'cuda'  
    
```

## <font style="color:blue">Assignment Response</font>

Since I am using the **trainer** module, I made minor modifications to the <u>configuration.py</u> file. In addition, I created a _MasterConfig_ data class that encapsulates the individual configuration data classes. Lastly, I created helper functions to instantiate the _MasterConfig_ class with experiment-specific overrides.

The following is the output of the `create_master_config` function without any parameter overrides.
```
MasterConfig(
    system=SystemConfig(
        proj_dir='./project2',
        seed=42,
        cudnn_deterministic=True,
        cudnn_benchmark_enabled=False
    ),
    data_aug=DataAugConfig(
        color_enabled=True,
        color_brightness=(0.85, 1.15),
        color_contrast=(0.5, 1.5),
        color_saturation=(0.5, 2.0),
        color_hue=(-0.03, 0.03),
        horz_flip_prob=0.5,
        vert_flip_prob=0.5,
        affine_enabled=True,
        affine_rotation=45,
        affine_translate=(0.1, 0.1),
        affine_scale=(0.9, 1.1),
        erasing_prob=0.5,
        erasing_scale=(0.02, 0.33),
        erasing_ratio=(0.3, 3.3)
    ),
    dataset=DatasetConfig(
        data_dir='./project2/data',
        valid_size=0.2,
        train_transforms=Compose(
            Resize(size=256, interpolation=PIL.Image.BILINEAR)
            ColorJitter(brightness=(0.85, 1.15), contrast=(0.5, 1.5), saturation=(0.5, 2.0), hue=(-0.03, 0.03))
            RandomVerticalFlip(p=0.5)
            RandomHorizontalFlip(p=0.5)
            RandomAffine(degrees=[-45.0, 45.0], translate=(0.1, 0.1), scale=(0.9, 1.1), resample=PIL.Image.BILINEAR)
            CenterCrop(size=(224, 224))
            ToTensor()
            Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
            RandomErasing()
        ),
        test_transforms=Compose(
            Resize(size=256, interpolation=PIL.Image.BILINEAR)
            CenterCrop(size=(224, 224))
            ToTensor()
            Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
        ),
        visual_transforms=Compose(
            Resize(size=256, interpolation=PIL.Image.BILINEAR)
            CenterCrop(size=(224, 224))
            ToTensor()
        ),
        visual_aug_transforms=Compose(
            Resize(size=256, interpolation=PIL.Image.BILINEAR)
            ColorJitter(brightness=(0.85, 1.15), contrast=(0.5, 1.5), saturation=(0.5, 2.0), hue=(-0.03, 0.03))
            RandomVerticalFlip(p=0.5)
            RandomHorizontalFlip(p=0.5)
            RandomAffine(degrees=[-45.0, 45.0], translate=(0.1, 0.1), scale=(0.9, 1.1), resample=PIL.Image.BILINEAR)
            CenterCrop(size=(224, 224))
            ToTensor()
            RandomErasing()
        )
    ),
    data_loader=DataLoaderConfig(
        batch_size=32,
        num_workers=4
    ),
    optimizer=OptimizerConfig(
        learning_rate=0.001,
        momentum=0.9,
        weight_decay=0.0001,
        betas=(0.9, 0.999)
    ),
    scheduler=SchedulerConfig(
        gamma=0.1,
        step_size=10,
        milestones=(20, 30, 40),
        patience=10,
        threshold=0.0001
    ),
    trainer=TrainerConfig(
        device='cuda',
        training_epochs=50,
        weighted_loss_fn=True,
        progress_bar=True,
        model_dir='models',
        model_saving_period=0,
        visualizer_dir='runs',
        stop_loss_epochs=0,
        stop_acc_epochs=0,
        stop_acc_ema_alpha=0.3,
        stop_acc_threshold=2.0
    )
)
```
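A master configuration that encapsulates the individual configuration data classes can be sketched with nested dataclasses. The classes below are simplified stand-ins (a few of the fields shown above, with the same default values), not the trainer module's actual definitions; `dataclasses.replace` produces an experiment-specific copy without mutating the defaults.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class DataLoaderSketch:
    batch_size: int = 32
    num_workers: int = 4


@dataclass(frozen=True)
class OptimizerSketch:
    learning_rate: float = 0.001
    momentum: float = 0.9
    weight_decay: float = 0.0001


@dataclass(frozen=True)
class MasterConfigSketch:
    # default_factory gives each MasterConfigSketch its own nested instances
    data_loader: DataLoaderSketch = field(default_factory=DataLoaderSketch)
    optimizer: OptimizerSketch = field(default_factory=OptimizerSketch)


base = MasterConfigSketch()
# experiment-specific override: a larger learning rate, everything else unchanged
experiment = replace(base, optimizer=replace(base.optimizer, learning_rate=0.01))
print(experiment.optimizer.learning_rate, base.optimizer.learning_rate)  # 0.01 0.001
```

Making the classes frozen prevents one experiment from accidentally modifying a configuration object shared with another.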

In [None]:
def create_system_config() -> SystemConfig:
    return SystemConfig(
        proj_dir = proj_dir
    )

In [None]:
def create_data_aug_config(
    color_enabled: Optional[bool] = None,
    color_brightness: Optional[Tuple[float, float]] = None,
    color_contrast: Optional[Tuple[float, float]] = None,
    color_saturation: Optional[Tuple[float, float]] = None,
    color_hue: Optional[Tuple[float, float]] = None,
    horz_flip_prob: Optional[float] = None,
    vert_flip_prob: Optional[float] = None,
    affine_enabled: Optional[bool] = None,
    affine_rotation: Optional[float] = None,
    affine_translate: Optional[Tuple[float, float]] = None,
    affine_scale: Optional[Tuple[float, float]] = None,
    erasing_prob: Optional[float] = None,
    erasing_scale: Optional[Tuple[float, float]] = None,
    erasing_ratio: Optional[Tuple[float, float]] = None
) -> DataAugConfig:
    config = DataAugConfig()
    if color_enabled is None:
        color_enabled = config.color_enabled
    if color_brightness is None:
        color_brightness = config.color_brightness
    if color_contrast is None:
        color_contrast = config.color_contrast
    if color_saturation is None:
        color_saturation = config.color_saturation
    if color_hue is None:
        color_hue = config.color_hue
    if horz_flip_prob is None:
        horz_flip_prob = config.horz_flip_prob
    if vert_flip_prob is None:
        vert_flip_prob = config.vert_flip_prob
    if affine_enabled is None:
        affine_enabled = config.affine_enabled
    if affine_rotation is None:
        affine_rotation = config.affine_rotation
    if affine_translate is None:
        affine_translate = config.affine_translate
    if affine_scale is None:
        affine_scale = config.affine_scale
    if erasing_prob is None:
        erasing_prob = config.erasing_prob
    if erasing_scale is None:
        erasing_scale = config.erasing_scale
    if erasing_ratio is None:
        erasing_ratio = config.erasing_ratio
    return DataAugConfig(
        color_enabled = color_enabled,
        color_brightness = color_brightness,
        color_contrast = color_contrast,
        color_saturation = color_saturation,
        color_hue = color_hue,
        horz_flip_prob = horz_flip_prob,
        vert_flip_prob = vert_flip_prob,
        affine_enabled = affine_enabled,
        affine_rotation = affine_rotation,
        affine_translate = affine_translate,
        affine_scale = affine_scale,
        erasing_prob = erasing_prob,
        erasing_scale = erasing_scale,
        erasing_ratio = erasing_ratio
    )
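The repeated `if ... is None` checks in these helper functions could be condensed with `dataclasses.replace`, which this notebook already imports. A minimal sketch, assuming the configuration classes are ordinary dataclasses; `with_overrides` and `AugSketch` are hypothetical names.

```python
from dataclasses import dataclass, replace
from typing import Any, Optional, Tuple


@dataclass
class AugSketch:
    """Stand-in for DataAugConfig with two of its fields and their default values."""
    horz_flip_prob: float = 0.5
    color_brightness: Tuple[float, float] = (0.85, 1.15)


def with_overrides(config: Any, **overrides: Optional[Any]) -> Any:
    """Returns a copy of the dataclass `config` with every non-None override applied."""
    changes = {name: value for name, value in overrides.items() if value is not None}
    return replace(config, **changes)


config = with_overrides(AugSketch(), horz_flip_prob=0.0, color_brightness=None)
print(config)  # AugSketch(horz_flip_prob=0.0, color_brightness=(0.85, 1.15))
```

With such a helper, the body of `create_data_aug_config` could reduce to a single `with_overrides(DataAugConfig(), ...)` call. As in the original functions, `None` means "keep the default", so a field cannot be deliberately set to `None` this way.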

In [None]:
def create_dataset_config(
    resize: int = 256, 
    crop_size: int = 224,
    data_aug_config = DataAugConfig()
) -> DatasetConfig:
    mean, std = get_mean_std()
    # named image_transforms to avoid shadowing the torchvision transforms module
    image_transforms = ImageTransforms(
        resize = resize, 
        crop_size = crop_size, 
        mean = mean, 
        std = std,
        config = data_aug_config
    )
    return DatasetConfig(
        data_dir = data_dir,
        test_transforms = image_transforms.common(),
        train_transforms = image_transforms.augment(),
        visual_transforms = image_transforms.preprocess(augment=False),
        visual_aug_transforms = image_transforms.preprocess(augment=True)
    )
    )

In [None]:
def create_data_loader_config(
    batch_size: Optional[int] = None, 
    num_workers: Optional[int] = None
) -> DataLoaderConfig:
    config = DataLoaderConfig()
    if batch_size is None:
        batch_size = config.batch_size
    if num_workers is None:
        num_workers = config.num_workers
    return DataLoaderConfig(
        batch_size = batch_size,
        num_workers = num_workers
    )

In [None]:
def create_optimizer_config(
    learning_rate: Optional[float] = None, 
    momentum: Optional[float] = None, 
    weight_decay: Optional[float] = None,
    betas: Optional[Tuple[float, float]] = None
) -> OptimizerConfig:
    config = OptimizerConfig()
    if learning_rate is None:
        learning_rate = config.learning_rate
    if momentum is None:
        momentum = config.momentum
    if weight_decay is None:
        weight_decay = config.weight_decay
    if betas is None:
        betas = config.betas
    return OptimizerConfig(
        learning_rate = learning_rate,
        momentum = momentum,
        weight_decay = weight_decay,
        betas = betas
    )

In [None]:
def create_scheduler_config(
    gamma: Optional[float] = None,
    step_size: Optional[int] = None,
    milestones: Optional[Iterable] = None,
    patience: Optional[int] = None,
    threshold: Optional[float] = None
) -> SchedulerConfig:
    config = SchedulerConfig()
    if gamma is None:
        gamma = config.gamma
    if step_size is None:
        step_size = config.step_size
    if milestones is None:
        milestones = config.milestones
    if patience is None:
        patience = config.patience
    if threshold is None:
        threshold = config.threshold
    return SchedulerConfig(
        gamma = gamma,
        step_size = step_size,
        milestones = milestones,
        patience = patience,
        threshold = threshold
    )

In [None]:
def create_trainer_config(
    training_epochs: Optional[int] = None,
    weighted_loss_fn: Optional[bool] = None,
    model_saving_period: Optional[int] = None,
    stop_loss_epochs: Optional[int] = None,
    stop_acc_epochs: Optional[int] = None, 
    stop_acc_ema_alpha: Optional[float] = None,
    stop_acc_threshold: Optional[float] = None
) -> TrainerConfig:
    config = TrainerConfig()
    if training_epochs is None:
        training_epochs = config.training_epochs
    if weighted_loss_fn is None:
        weighted_loss_fn = config.weighted_loss_fn
    if model_saving_period is None:
        model_saving_period = config.model_saving_period
    if stop_loss_epochs is None:
        stop_loss_epochs = config.stop_loss_epochs
    if stop_acc_epochs is None:
        stop_acc_epochs = config.stop_acc_epochs
    if stop_acc_ema_alpha is None:
        stop_acc_ema_alpha = config.stop_acc_ema_alpha
    if stop_acc_threshold is None:
        stop_acc_threshold = config.stop_acc_threshold
    return TrainerConfig(
        training_epochs = training_epochs,
        weighted_loss_fn = weighted_loss_fn,
        model_saving_period = model_saving_period,
        stop_loss_epochs = stop_loss_epochs,
        stop_acc_epochs = stop_acc_epochs,
        stop_acc_ema_alpha = stop_acc_ema_alpha,
        stop_acc_threshold = stop_acc_threshold
    )

In [None]:
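from dataclasses import dataclass, field

@dataclass
class MasterConfig:
    # default_factory builds each sub-config when MasterConfig() is created;
    # Python 3.11+ rejects non-frozen dataclass instances as plain defaults
    system: SystemConfig = field(default_factory=create_system_config)
    data_aug: DataAugConfig = field(default_factory=create_data_aug_config)
    dataset: DatasetConfig = field(default_factory=create_dataset_config)
    data_loader: DataLoaderConfig = field(default_factory=create_data_loader_config)
    optimizer: OptimizerConfig = field(default_factory=create_optimizer_config)
    scheduler: SchedulerConfig = field(default_factory=create_scheduler_config)
    trainer: TrainerConfig = field(default_factory=create_trainer_config)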

In [None]:
def create_master_config(
    transform_resize: int = 256,
    transform_crop_size: int = 224,
    data_aug_color_enabled: Optional[bool] = None,
    data_aug_color_brightness: Optional[Tuple[float, float]] = None,
    data_aug_color_contrast: Optional[Tuple[float, float]] = None,
    data_aug_color_saturation: Optional[Tuple[float, float]] = None,
    data_aug_color_hue: Optional[Tuple[float, float]] = None,
    data_aug_horz_flip_prob: Optional[float] = None,
    data_aug_vert_flip_prob: Optional[float] = None,
    data_aug_affine_enabled: Optional[bool] = None,
    data_aug_affine_rotation: Optional[float] = None,
    data_aug_affine_translate: Optional[Tuple[float, float]] = None,
    data_aug_affine_scale: Optional[Tuple[float, float]] = None,
    data_aug_erasing_prob: Optional[float] = None,
    data_aug_erasing_scale: Optional[Tuple[float, float]] = None,
    data_aug_erasing_ratio: Optional[Tuple[float, float]] = None,
    data_loader_batch_size: Optional[int] = None,
    data_loader_num_workers: Optional[int] = None,
    optimizer_learning_rate: Optional[float] = None,
    optimizer_momentum: Optional[float] = None,
    optimizer_weight_decay: Optional[float] = None,
    optimizer_betas: Optional[Tuple[float, float]] = None,
    lr_scheduler_gamma: Optional[float] = None,
    lr_scheduler_step_size: Optional[int] = None,
    lr_scheduler_milestones: Optional[Iterable] = None,
    lr_scheduler_patience: Optional[int] = None,
    lr_scheduler_threshold: Optional[float] = None,
    trainer_training_epochs: Optional[int] = None,
    trainer_weighted_loss_fn: Optional[bool] = None,
    trainer_model_saving_period: Optional[int] = None,
    trainer_stop_loss_epochs: Optional[int] = None,
    trainer_stop_acc_epochs: Optional[int] = None,
    trainer_stop_acc_ema_alpha: Optional[float] = None,
    trainer_stop_acc_threshold: Optional[float] = None       
) -> MasterConfig:
    # used to initialize the MasterConfig data class and as a parameter to the
    # create_dataset_config function
    data_aug_config = create_data_aug_config(
        data_aug_color_enabled,
        data_aug_color_brightness,
        data_aug_color_contrast,
        data_aug_color_saturation,
        data_aug_color_hue,
        data_aug_horz_flip_prob,
        data_aug_vert_flip_prob,
        data_aug_affine_enabled,
        data_aug_affine_rotation,
        data_aug_affine_translate,
        data_aug_affine_scale,
        data_aug_erasing_prob,
        data_aug_erasing_scale,
        data_aug_erasing_ratio
    )
    return MasterConfig(
        system = create_system_config(),
        data_aug = data_aug_config,
        dataset = create_dataset_config(
            transform_resize,
            transform_crop_size,
            data_aug_config
        ),
        data_loader = create_data_loader_config(
            data_loader_batch_size,
            data_loader_num_workers
        ),
        optimizer = create_optimizer_config(
            optimizer_learning_rate,
            optimizer_momentum,
            optimizer_weight_decay,
            optimizer_betas
        ),
        scheduler = create_scheduler_config(
            lr_scheduler_gamma,
            lr_scheduler_step_size,
            lr_scheduler_milestones,
            lr_scheduler_patience,
            lr_scheduler_threshold   
        ),
        trainer = create_trainer_config(
            trainer_training_epochs,
            trainer_weighted_loss_fn,
            trainer_model_saving_period,
            trainer_stop_loss_epochs,
            trainer_stop_acc_epochs,
            trainer_stop_acc_ema_alpha,
            trainer_stop_acc_threshold       
        )
    )    

## <font style="color:green">3. Evaluation Metric [10 Points]</font>

Define methods or classes that will be used in model evaluation, for example, accuracy, f1-score, etc.

### Loss Function

The classes are moderately imbalanced. The most represented class, chapati (862 images), has approximately five times as many images as the least represented class, kukuchoma (173 images). Consequently, I will explore both weighted and unweighted cross-entropy loss functions. I will use `nn.CrossEntropyLoss()` with and without a tensor of per-class rescaling weights. The computation of the rescaling weights is described below.

The number of images per class was obtained via the following code.
```
    config = create_master_config()
    setup_system(config.system)       
    data = KenyanFood13Data(
        data_root = config.dataset.data_dir,
        valid_size = config.dataset.valid_size,
        random_seed = config.system.seed
    )
    images_per_class = np.column_stack((data.classes, data.class_counts))
    print(images_per_class)

    [['bhaji' 632]
     ['chapati' 862]
     ['githeri' 479]
     ['kachumbari' 494]
     ['kukuchoma' 173]
     ['mandazi' 620]
     ['masalachips' 438]
     ['matoke' 483]
     ['mukimo' 212]
     ['nyamachoma' 784]
     ['pilau' 329]
     ['sukumawiki' 402]
     ['ugali' 628]]
```
The normalized rescaling weights given to each class were obtained via the following code.
```
    weights = np.sum(data.class_counts) / data.class_counts
    norm_weights = weights / np.sum(weights)
    print(norm_weights)

    [0.04989366 0.03658097 0.06583046 0.06383156 0.18227048 0.05085934
     0.07199268 0.06528528 0.14873959 0.0402204  0.09584435 0.07843978
     0.05021145]
```
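
The hard-coded weights can be checked without loading the dataset. The following is a small standard-library sketch of the same inverse-frequency formula. The `class_counts` dictionary is copied from the printed output above. The helper name `inverse_frequency_weights` is hypothetical and is not part of the notebook. Because the total image count `N` cancels during normalization, each weight is simply proportional to `1 / n_c`.

```python
# Per-class image counts copied from the printed output above
class_counts = {
    "bhaji": 632, "chapati": 862, "githeri": 479, "kachumbari": 494,
    "kukuchoma": 173, "mandazi": 620, "masalachips": 438, "matoke": 483,
    "mukimo": 212, "nyamachoma": 784, "pilau": 329, "sukumawiki": 402,
    "ugali": 628,
}

def inverse_frequency_weights(counts):
    """Normalized inverse-frequency weights: w_c = (N / n_c) / sum_k (N / n_k)."""
    total = sum(counts)
    raw = [total / n for n in counts]  # rarer classes get larger raw weights
    norm = sum(raw)
    return [w / norm for w in raw]     # rescale so the weights sum to 1

weights = inverse_frequency_weights(list(class_counts.values()))
for name, w in zip(class_counts, weights):
    print(f"{name:12s} {w:.8f}")
```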

In [None]:
loss_rescaling_weight = torch.tensor([
    0.04989366, 0.03658097, 0.06583046, 0.06383156, 0.18227048,
    0.05085934, 0.07199268, 0.06528528, 0.14873959, 0.04022040,
    0.09584435, 0.07843978, 0.05021145,
])
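
The `nn.CrossEntropyLoss` documentation describes how a `weight` tensor combines with the default `reduction='mean'`. The loss is the weighted sum of per-sample negative log-likelihoods divided by the sum of the target-class weights. The pure-Python sketch below mirrors that formula. It is an illustration, not PyTorch's implementation. It shows that multiplying every weight by a constant leaves the loss unchanged, so normalizing the weights to sum to 1 is optional under mean reduction; only their relative sizes matter.

```python
import math

def weighted_cross_entropy(logits, targets, weights):
    """Weighted cross-entropy with mean reduction: sum_i w[y_i]*nll_i / sum_i w[y_i]."""
    weighted_nll = 0.0
    weight_sum = 0.0
    for row, y in zip(logits, targets):
        m = max(row)  # log-sum-exp trick for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in row))
        nll = log_sum_exp - row[y]  # equals -log(softmax(row)[y])
        weighted_nll += weights[y] * nll
        weight_sum += weights[y]
    return weighted_nll / weight_sum

logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0], [1.5, 1.4, 0.0]]
targets = [0, 2, 1]
w = [0.2, 0.3, 0.5]

loss = weighted_cross_entropy(logits, targets, w)
scaled = weighted_cross_entropy(logits, targets, [10 * x for x in w])
print(loss, scaled)  # identical up to floating-point rounding
```

With `reduction='sum'` the scale of the weights would matter, because there is no division by the weight sum.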

### <font style="color:blue">Metric Function</font>

I am using the <b>trainer</b> module's <i>AccuracyEstimator</i> class from the <u>metrics.py</u> file.
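
The `AccuracyEstimator` interface is not reproduced here. The class below is a hypothetical, minimal stand-in (`RunningAccuracy` is an illustrative name) showing the idea such an estimator relies on. It accumulates correct and total counts across batches instead of averaging per-batch accuracies. The two approaches differ whenever batches have different sizes, such as a smaller final batch.

```python
class RunningAccuracy:
    """Hypothetical stand-in: accumulates correct/total counts over batches."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.correct = 0
        self.total = 0

    def update(self, preds, targets):
        # count matches in this batch and add them to the running totals
        self.correct += sum(int(p == t) for p, t in zip(preds, targets))
        self.total += len(targets)

    def value(self):
        return self.correct / self.total if self.total else 0.0

acc = RunningAccuracy()
acc.update([0, 1], [0, 0])  # batch of 2, one correct
acc.update([2], [2])        # batch of 1, one correct
print(acc.value())          # 2/3, whereas the mean of batch accuracies is 0.75
```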

## <font style="color:green">4. Train and Validation [5 Points]</font>

Write the methods or classes that will be used for training and validation.

## Assignment Response

Since I am using the **trainer** module, I made the following modifications to the `trainer.py` file.
* Added the ability to save the model only when the test loss reaches a new minimum.
* Added the ability to terminate training after a specified number of epochs where the test loss is not further reduced.
* Added the ability to terminate training after a specified number of epochs in which the exponential moving average of the test accuracy does not increase by more than a specified threshold.
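
The modified `trainer.py` is not shown here. The following is a hypothetical sketch of the three rules listed above, with parameters named after the `TrainerConfig` fields `stop_loss_epochs`, `stop_acc_epochs`, `stop_acc_ema_alpha`, and `stop_acc_threshold`. The exact semantics are assumptions: for example, it assumes the EMA must beat its best value by more than the threshold to reset the counter. `EarlyStopper` is an illustrative name.

```python
import math

class EarlyStopper:
    """Hypothetical checkpoint/early-stopping rules; step() -> (save_model, stop)."""

    def __init__(self, stop_loss_epochs, stop_acc_epochs, ema_alpha, acc_threshold):
        self.stop_loss_epochs = stop_loss_epochs
        self.stop_acc_epochs = stop_acc_epochs
        self.ema_alpha = ema_alpha
        self.acc_threshold = acc_threshold
        self.best_loss = math.inf
        self.epochs_since_best_loss = 0
        self.acc_ema = None
        self.best_acc_ema = -math.inf
        self.epochs_since_acc_gain = 0

    def step(self, test_loss, test_acc):
        # save a checkpoint only when the test loss reaches a new minimum
        save_model = test_loss < self.best_loss
        if save_model:
            self.best_loss = test_loss
            self.epochs_since_best_loss = 0
        else:
            self.epochs_since_best_loss += 1

        # exponential moving average of the test accuracy
        if self.acc_ema is None:
            self.acc_ema = test_acc
        else:
            self.acc_ema = self.ema_alpha * test_acc + (1 - self.ema_alpha) * self.acc_ema

        # count epochs in which the EMA fails to improve by more than the threshold
        if self.acc_ema > self.best_acc_ema + self.acc_threshold:
            self.best_acc_ema = self.acc_ema
            self.epochs_since_acc_gain = 0
        else:
            self.epochs_since_acc_gain += 1

        stop_training = (
            self.epochs_since_best_loss >= self.stop_loss_epochs
            or self.epochs_since_acc_gain >= self.stop_acc_epochs
        )
        return save_model, stop_training
```

In a training loop, `save, stop = stopper.step(test_loss, test_acc)` would be called once per epoch, after validation.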

I made the following modifications to the `visualizer.py` and `tensorboard_visualizer.py` files.
* Added an <code>add_image(self, tag, image)</code> method to visualize the dataset.
* Added an <code>add_figure(self, tag, figure, close=True)</code> method to visualize matplotlib figures, e.g., confusion matrices.
* Added an <code>add_graph(self, model, images)</code> method to document the model.
* Added an <code>add_pr_curves(self, classes, pred_probs, targets)</code> method to document the precision-recall curves of the fully trained model for each class type.

In [None]:
class Optimizer(Enum):
    SGD = auto()
    ADAM = auto()
    
def get_optimizer(
    model: nn.Module,
    optimizer: Optimizer = Optimizer.SGD,
    config: OptimizerConfig = OptimizerConfig()
):
    """
    Gets the specified optimizer.
    """
    
    if optimizer == Optimizer.SGD:
        return optim.SGD(
            model.parameters(),
            lr = config.learning_rate,
            weight_decay = config.weight_decay,
            momentum = config.momentum
        )
    
    elif optimizer == Optimizer.ADAM:
        return optim.Adam(
            model.parameters(),
            lr = config.learning_rate,
            betas = config.betas
        )
    
    else:
        raise SystemExit("Invalid optimizer value.")

In [None]:
class LrScheduler(Enum):
    STEP = auto()
    MULTI_STEP = auto()
    EXPONENTIAL = auto()
    REDUCE_ON_PLATEAU = auto()
    
def get_lr_scheduler(
    optimizer: optim.Optimizer,
    lr_scheduler: LrScheduler = LrScheduler.STEP,
    config: SchedulerConfig = SchedulerConfig()
):
    """
    Gets the specified LR scheduler.
    """

    if lr_scheduler == LrScheduler.STEP:
        return optim.lr_scheduler.StepLR(
            optimizer,
            step_size = config.step_size,
            gamma = config.gamma
        )
    
    elif lr_scheduler == LrScheduler.MULTI_STEP:
        return optim.lr_scheduler.MultiStepLR(
            optimizer, 
            milestones = config.milestones, 
            gamma = config.gamma
        )
    
    elif lr_scheduler == LrScheduler.EXPONENTIAL:
        return optim.lr_scheduler.ExponentialLR(
            optimizer, 
            gamma = config.gamma
        )
    
    
    elif lr_scheduler == LrScheduler.REDUCE_ON_PLATEAU:
        return optim.lr_scheduler.ReduceLROnPlateau(
            optimizer, 
            factor = config.gamma,
            patience = config.patience,
            threshold = config.threshold
        )
    
    else:
        raise SystemExit("Invalid lr_scheduler value.")
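
`StepLR`, `MultiStepLR`, and `ExponentialLR` follow simple closed-form schedules. After `epoch` calls to `scheduler.step()`, they produce the values computed in the pure-Python sketch below. The function names are illustrative only and are not PyTorch APIs. `ReduceLROnPlateau` has no closed form: it multiplies the learning rate by `factor` (here `config.gamma`) only after the metric passed to `scheduler.step(metric)` has failed to improve for `patience` epochs. It must therefore be stepped with a validation metric rather than with no argument.

```python
from bisect import bisect_right

def step_lr(base_lr, epoch, step_size, gamma):
    # StepLR: decay by gamma every step_size scheduler steps
    return base_lr * gamma ** (epoch // step_size)

def multi_step_lr(base_lr, epoch, milestones, gamma):
    # MultiStepLR: decay by gamma once for each milestone already reached
    return base_lr * gamma ** bisect_right(sorted(milestones), epoch)

def exponential_lr(base_lr, epoch, gamma):
    # ExponentialLR: decay by gamma after every scheduler step
    return base_lr * gamma ** epoch

for epoch in (0, 4, 5, 8, 10):
    print(epoch,
          step_lr(0.1, epoch, step_size=10, gamma=0.1),
          multi_step_lr(0.1, epoch, milestones=[5, 8], gamma=0.1),
          exponential_lr(0.1, epoch, gamma=0.9))
```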

In [None]:
def predict_batch(model, data, max_prob=True):
    """
    Get predictions for a batch of data. This function assumes the model and
    data have been sent to the appropriate device and the model is in
    evaluation mode.
    """

    # inference only: disable gradient tracking to save memory and time
    with torch.no_grad():
        output = model(data)

    # get probability scores using softmax
    prob = F.softmax(output, dim=1)

    # get the max probability and its index (the predicted class)
    max_values, pred_index = prob.max(dim=1)

    if max_prob:
        # return only the max probability
        pred_prob = max_values
    else:
        # return all probabilities
        pred_prob = prob

    return pred_index.cpu().numpy(), pred_prob.cpu().numpy()
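
Softmax is monotonic, so the predicted index from `predict_batch` is also the argmax of the raw logits. Softmax is needed only when probabilities are wanted, for example for the precision-recall curves. The standard-library sketch below illustrates this. It also shows the usual max-subtraction trick that keeps `exp` from overflowing for large logits. `softmax` and `argmax` here are illustrative helpers, not library functions.

```python
import math

def softmax(logits):
    # subtracting the max logit avoids overflow and does not change the result
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(values):
    return max(range(len(values)), key=values.__getitem__)

logits = [1000.0, 1001.0, 999.0]  # naive exp(1000.0) would overflow
probs = softmax(logits)
print(probs, argmax(probs), argmax(logits))
```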

In [None]:
def get_targets_and_pred_probs(model, dataloader, device):
    """
    Get targets and prediction probabilities.
    """
    
    model.to(device)  # send model to cpu or cuda
    model.eval()      # set model to evaluation mode

    targets = []
    pred_probs = []

    for _, (data, target) in enumerate(dataloader):
        _, probs = predict_batch(model, data.to(device), max_prob=False)       
        pred_probs.append(probs)
        targets.append(target.numpy())
        
    targets = np.concatenate(targets).astype(int)
    pred_probs = np.concatenate(pred_probs, axis=0)
    
    return targets, pred_probs

In [None]:
def predict_test_data(model, dataloader, device):
    """
    Predict the class of the test data.
    """

    model.to(device)  # send model to cpu or cuda
    model.eval()      # set model to evaluation mode

    fnames = []
    preds = []

    for _, (data, fname) in enumerate(dataloader):
        pred, _ = predict_batch(model, data.to(device), max_prob=True)       
        fnames.append(fname)
        preds.append(pred)
        
    fnames = np.concatenate(fnames)
    preds = np.concatenate(preds).astype(int)
    
    return fnames, preds

In [None]:
def predict_valid_data(model, dataloader, device):
    """
    Predict the class of the validation data.
    """

    model.to(device)  # send model to cpu or cuda
    model.eval()      # set model to evaluation mode

    targets = []
    preds = []

    for _, (data, target) in enumerate(dataloader):
        pred, _ = predict_batch(model, data.to(device), max_prob=True)       
        targets.append(target.numpy())
        preds.append(pred)
        
    targets = np.concatenate(targets)
    preds = np.concatenate(preds).astype(int)
    
    return targets, preds

## <font style="color:green">5. Model [5 Points]</font>

Define your model in this section.

**You are allowed to use any pre-trained model.**

## Assignment Response

Since my approach is to explore transfer learning with numerous pretrained models, I created classes
to easily set the "tuning level" of the ResNet, VGG, and DenseNet families of TorchVision models. I
also want to see how the model I developed for Project 1 performs, so I created a class for it as well.

In [None]:
TuningParam = namedtuple("TuningParam", ["level", "block", "layers"])

In [None]:
class TorchVisionModel(nn.Module):
    """
    Base class for TorchVision models, which provides a method to freeze network
    layers, allowing fine tuning. This class does not change the network's output
    layer. Derived classes must do this!
    """
    
    def __init__(self, network: nn.Module):
        super().__init__()
        self._network = network
        
    def forward(self, x):
        return self._network(x)
    
    def _freeze_layers(
        self, 
        tuning_params: List[TuningParam], 
        pretrained: bool, 
        tuning_level: int
    ):
        # freeze network if using a pretrained model
        if pretrained:
            self._set_requires_grad(self._network, False)
        
        # unfreeze blocks/layers based on tuning_level
        for param in tuning_params:
            if param.level <= tuning_level:
                block = getattr(self._network, param.block)
                if param.layers is None:
                    self._set_requires_grad(block, True)
                else:
                    for layer in param.layers:
                        if isinstance(layer, int):
                            self._set_requires_grad(block[layer], True)
                        else:
                            self._set_requires_grad(getattr(block, layer), True)
            
    def _set_requires_grad(self, block, value):
        for param in block.parameters():
            param.requires_grad = value
            
    def _inclusive_range(self, start: int, stop: int) -> List[int]:
        return list(range(start, stop + 1))
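
The selection rule in `_freeze_layers` can be dry-run without building a network. With a pretrained model, everything is frozen first. Each `TuningParam` whose `level` is at most `tuning_level` is then unfrozen. The pure-Python sketch below applies that rule to the ResNet table from `ResNetBase`. `trainable_blocks` is a hypothetical helper and is not part of the notebook. When `pretrained=False`, nothing is frozen, so every block trains regardless of the level.

```python
from collections import namedtuple

# Same fields as the notebook's TuningParam namedtuple
TuningParam = namedtuple("TuningParam", ["level", "block", "layers"])

# ResNet tuning table, copied from ResNetBase
RESNET_TUNING = [
    TuningParam(0, "fc", None),
    TuningParam(1, "avgpool", None),
    TuningParam(1, "layer4", None),
    TuningParam(2, "layer3", None),
    TuningParam(3, "layer2", None),
    TuningParam(4, "layer1", None),
    TuningParam(5, "maxpool", None),
    TuningParam(5, "relu", None),
    TuningParam(5, "bn1", None),
    TuningParam(5, "conv1", None),
]

def trainable_blocks(tuning_params, tuning_level):
    """Blocks that _freeze_layers would unfreeze for a pretrained model."""
    return [p.block for p in tuning_params if p.level <= tuning_level]

for level in range(6):
    print(level, trainable_blocks(RESNET_TUNING, level))
```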

In [None]:
class ResNetBase(TorchVisionModel):
    """
    Base class for ResNet models that may be pretrained and fine tuned. The
    tuning_level parameter controls the degree of fine tuning as depicted in
    the table below.
        
        ResNet     tuning_level
        -------    ------------
        conv1          >= 5        
        bn1            >= 5
        relu           >= 5
        maxpool        >= 5
        layer1         >= 4
        layer2         >= 3
        layer3         >= 2
        layer4         >= 1
        avgpool        >= 1
        fc             >= 0
        
    If tuning_level = 0, then only the classifier layer is trained.
    If tuning_level = 5, then the entire network is trained.
    """
    
    def __init__(self, model_fn: Callable, pretrained=True, tuning_level=0):
        super().__init__(model_fn(pretrained=pretrained))

        # change the output layer
        last_layer_in = self._network.fc.in_features
        self._network.fc = nn.Linear(last_layer_in, 13)

        # ToDo: Omit layer types that do not have trainable parameters
        tuning_params = [
            TuningParam(0, "fc", None),
            TuningParam(1, "avgpool", None),
            TuningParam(1, "layer4", None),
            TuningParam(2, "layer3", None),
            TuningParam(3, "layer2", None),
            TuningParam(4, "layer1", None),
            TuningParam(5, "maxpool", None),
            TuningParam(5, "relu", None),
            TuningParam(5, "bn1", None),
            TuningParam(5, "conv1", None)
        ]

        self._freeze_layers(tuning_params, pretrained, tuning_level)
    
    def forward(self, x):
        return self._network(x)

In [None]:
class ResNet18(ResNetBase):
    def __init__(self, pretrained=True, tuning_level=0):
        super().__init__(models.resnet18, pretrained, tuning_level)

In [None]:
class ResNet34(ResNetBase):
    def __init__(self, pretrained=True, tuning_level=0):
        super().__init__(models.resnet34, pretrained, tuning_level)

In [None]:
class ResNet50(ResNetBase):
    def __init__(self, pretrained=True, tuning_level=0):
        super().__init__(models.resnet50, pretrained, tuning_level)

In [None]:
class ResNet101(ResNetBase):
    def __init__(self, pretrained=True, tuning_level=0):
        super().__init__(models.resnet101, pretrained, tuning_level)

In [None]:
class ResNet152(ResNetBase):
    def __init__(self, pretrained=True, tuning_level=0):
        super().__init__(models.resnet152, pretrained, tuning_level)

In [None]:
class ResNeXt50(ResNetBase):
    def __init__(self, pretrained=True, tuning_level=0):
        super().__init__(models.resnext50_32x4d, pretrained, tuning_level)

In [None]:
class ResNeXt101(ResNetBase):
    def __init__(self, pretrained=True, tuning_level=0):
        super().__init__(models.resnext101_32x8d, pretrained, tuning_level)

In [None]:
class WideResNet50(ResNetBase):
    def __init__(self, pretrained=True, tuning_level=0):
        super().__init__(models.wide_resnet50_2, pretrained, tuning_level)

In [None]:
class WideResNet101(ResNetBase):
    def __init__(self, pretrained=True, tuning_level=0):
        super().__init__(models.wide_resnet101_2, pretrained, tuning_level)

In [None]:
class VGGBase(TorchVisionModel):
    """
    Base class for VGG models that may be pretrained and fine tuned.
    """
    
    def __init__(self, model_fn: Callable, pretrained=True):
        super().__init__(model_fn(pretrained=pretrained))

        last_layer_in = self._network.classifier[6].in_features
        self._network.classifier[6] = nn.Linear(last_layer_in, 13)
    
    def forward(self, x):
        return self._network(x)

In [None]:
class VGG11BN(VGGBase):
    """
    VGG11BN model that may be pretrained and fine tuned. The tuning_level
    parameter controls the degree of fine tuning as depicted in the table
    below.
    
        VGG11_BN            tuning_level
        ----------------    ------------
        features
          [00-02] CNR           >= 5
          [03] MaxPool2d        >= 5
          [04-06] CNR           >= 4
          [07] MaxPool2d        >= 4
          [08-10] CNR           >= 3
          [11-13] CNR           >= 3
          [14] MaxPool2d        >= 3
          [15-17] CNR           >= 2
          [18-20] CNR           >= 2
          [21] MaxPool2d        >= 2
          [22-24] CNR           >= 1
          [25-27] CNR           >= 1
          [28] MaxPool2d        >= 1
        avgpool                 >= 1
        classifier              
          [00-02] LRD           >= 0
          [03-05] LRD           >= 0
          [06] Linear           >= 0
    """

    def __init__(self, pretrained=True, tuning_level=0):
        super().__init__(models.vgg11_bn, pretrained)
            
        # ToDo: Omit layer types that do not have trainable parameters
        tuning_params = [
            TuningParam(0, "classifier", None),
            TuningParam(1, "avgpool", None),
            TuningParam(1, "features", self._inclusive_range(22, 28)),
            TuningParam(2, "features", self._inclusive_range(15, 21)),
            TuningParam(3, "features", self._inclusive_range(8, 14)),
            TuningParam(4, "features", self._inclusive_range(4, 7)),
            TuningParam(5, "features", self._inclusive_range(0, 3))
        ]

        self._freeze_layers(tuning_params, pretrained, tuning_level)

In [None]:
class VGG13BN(VGGBase):
    """
    VGG13BN model that may be pretrained and fine tuned. The tuning_level
    parameter controls the degree of fine tuning as depicted in the table
    below.
    
        VGG13_BN            tuning_level
        ----------------    ------------
        features
          [00-02] CNR           >= 5
          [03-05] CNR           >= 5
          [06] MaxPool2d        >= 5
          [07-09] CNR           >= 4
          [10-12] CNR           >= 4
          [13] MaxPool2d        >= 4
          [14-16] CNR           >= 3
          [17-19] CNR           >= 3
          [20] MaxPool2d        >= 3
          [21-23] CNR           >= 2
          [24-26] CNR           >= 2
          [27] MaxPool2d        >= 2
          [28-30] CNR           >= 1
          [31-33] CNR           >= 1
          [34] MaxPool2d        >= 1
        avgpool                 >= 1
        classifier
          [00-02] LRD           >= 0
          [03-05] LRD           >= 0
          [06] Linear           >= 0
    """

    def __init__(self, pretrained=True, tuning_level=0):
        super().__init__(models.vgg13_bn, pretrained)
            
        # ToDo: Omit layer types that do not have trainable parameters
        tuning_params = [
            TuningParam(0, "classifier", None),
            TuningParam(1, "avgpool", None),
            TuningParam(1, "features", self._inclusive_range(28, 34)),
            TuningParam(2, "features", self._inclusive_range(21, 27)),
            TuningParam(3, "features", self._inclusive_range(14, 20)),
            TuningParam(4, "features", self._inclusive_range(7, 13)),
            TuningParam(5, "features", self._inclusive_range(0, 6))
        ]

        self._freeze_layers(tuning_params, pretrained, tuning_level)

In [None]:
class VGG16BN(VGGBase):
    """
    VGG16BN model that may be pretrained and fine tuned. The tuning_level
    parameter controls the degree of fine tuning as depicted in the table
    below.
    
        VGG16_BN            tuning_level
        ----------------    ------------
        features
          [00-02] CNR           >= 5
          [03-05] CNR           >= 5
          [06] MaxPool2d        >= 5
          [07-09] CNR           >= 4
          [10-12] CNR           >= 4
          [13] MaxPool2d        >= 4
          [14-16] CNR           >= 3
          [17-19] CNR           >= 3
          [20-22] CNR           >= 3
          [23] MaxPool2d        >= 3
          [24-26] CNR           >= 2
          [27-29] CNR           >= 2
          [30-32] CNR           >= 2
          [33] MaxPool2d        >= 2
          [34-36] CNR           >= 1
          [37-39] CNR           >= 1
          [40-42] CNR           >= 1
          [43] MaxPool2d        >= 1
        avgpool                 >= 1
        classifier              
          [00-02] LRD           >= 0
          [03-05] LRD           >= 0
          [06] Linear           >= 0
    """

    def __init__(self, pretrained=True, tuning_level=0):
        super().__init__(models.vgg16_bn, pretrained)
            
        # ToDo: Omit layer types that do not have trainable parameters
        tuning_params = [
            TuningParam(0, "classifier", None),
            TuningParam(1, "avgpool", None),
            TuningParam(1, "features", self._inclusive_range(34, 43)),
            TuningParam(2, "features", self._inclusive_range(24, 33)),
            TuningParam(3, "features", self._inclusive_range(14, 23)),
            TuningParam(4, "features", self._inclusive_range(7, 13)),
            TuningParam(5, "features", self._inclusive_range(0, 6))
        ]

        self._freeze_layers(tuning_params, pretrained, tuning_level)

In [None]:
class VGG19BN(VGGBase):
    """
    VGG19BN model that may be pretrained and fine tuned. The tuning_level
    parameter controls the degree of fine tuning as depicted in the table
    below.

        VGG19_BN            tuning_level
        ----------------    ------------
        features
          [00-02] CNR           >= 5
          [03-05] CNR           >= 5
          [06] MaxPool2d        >= 5
          [07-09] CNR           >= 4
          [10-12] CNR           >= 4
          [13] MaxPool2d        >= 4
          [14-16] CNR           >= 3
          [17-19] CNR           >= 3
          [20-22] CNR           >= 3
          [23-25] CNR           >= 3
          [26] MaxPool2d        >= 3
          [27-29] CNR           >= 2
          [30-32] CNR           >= 2
          [33-35] CNR           >= 2
          [36-38] CNR           >= 2
          [39] MaxPool2d        >= 2
          [40-42] CNR           >= 1
          [43-45] CNR           >= 1
          [46-48] CNR           >= 1
          [49-51] CNR           >= 1
          [52] MaxPool2d        >= 1
        avgpool                 >= 1
        classifier              
          [00-02] LRD           >= 0
          [03-05] LRD           >= 0
          [06] Linear           >= 0
    """

    def __init__(self, pretrained=True, tuning_level=0):
        super().__init__(models.vgg19_bn, pretrained)
            
        # ToDo: Omit layer types that do not have trainable parameters
        tuning_params = [
            TuningParam(0, "classifier", None),
            TuningParam(1, "avgpool", None),
            TuningParam(1, "features", self._inclusive_range(40, 52)),
            TuningParam(2, "features", self._inclusive_range(27, 39)),
            TuningParam(3, "features", self._inclusive_range(14, 26)),
            TuningParam(4, "features", self._inclusive_range(7, 13)),
            TuningParam(5, "features", self._inclusive_range(0, 6))
        ]

        self._freeze_layers(tuning_params, pretrained, tuning_level)
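
The `features` index ranges in the four VGG tables can be derived rather than counted by hand. In a batch-normalized VGG, each convolution contributes three modules (Conv2d, BatchNorm2d, ReLU, the "CNR" rows), and each max-pool contributes one module that closes a block. The sketch below assumes the layer configurations match torchvision's VGG `cfgs` ('A' = VGG11, 'B' = VGG13, 'D' = VGG16, 'E' = VGG19). The helper names are hypothetical. The resulting ranges agree with the `TuningParam` entries above.

```python
# Layer configurations, assumed to match torchvision's VGG cfgs
VGG_CFGS = {
    "A": [64, "M", 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"],
    "B": [64, 64, "M", 128, 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"],
    "D": [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
          512, 512, 512, "M", 512, 512, 512, "M"],
    "E": [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
          512, 512, 512, 512, "M", 512, 512, 512, 512, "M"],
}

def vgg_bn_block_ranges(cfg):
    """Inclusive (start, stop) indices of each block in a VGG-BN `features` module."""
    ranges = []
    index = 0
    start = 0
    for entry in cfg:
        if entry == "M":
            ranges.append((start, index))  # the MaxPool2d ends the block
            index += 1
            start = index
        else:
            index += 3  # Conv2d, BatchNorm2d, ReLU
    return ranges

def tuning_levels(cfg):
    # the first block is unfrozen only at level 5, the last block at level 1
    return {5 - i: r for i, r in enumerate(vgg_bn_block_ranges(cfg))}

print(tuning_levels(VGG_CFGS["E"]))
```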

In [None]:
class DenseNetBase(TorchVisionModel):
    """
    Base class for DenseNet models that may be pretrained and fine tuned. The
    tuning_level parameter controls the degree of fine tuning as depicted in
    the table below.
        
        DenseNet          tuning_level
        -------------     ------------
        features
          conv0               >= 5
          norm0               >= 5
          relu0               >= 5
          pool0               >= 5
          denseblock1         >= 4
          transition1         >= 4
          denseblock2         >= 3
          transition2         >= 3
          denseblock3         >= 2
          transition3         >= 2
          denseblock4         >= 1
          norm5               >= 1
        classifier            >= 0

    """

    def __init__(self, model_fn: Callable, pretrained=True, tuning_level=0):
        super().__init__(model_fn(pretrained=pretrained))

        # change the output layer
        last_layer_in = self._network.classifier.in_features
        self._network.classifier = nn.Linear(last_layer_in, 13)

        # ToDo: Omit layer types that do not have trainable parameters
        tuning_params = [
            TuningParam(0, "classifier", None),
            TuningParam(1, "features", ["denseblock4", "norm5"]),
            TuningParam(2, "features", ["denseblock3", "transition3"]),
            TuningParam(3, "features", ["denseblock2", "transition2"]),
            TuningParam(4, "features", ["denseblock1", "transition1"]),
            TuningParam(5, "features", ["conv0", "norm0", "relu0", "pool0"])
        ]

        self._freeze_layers(tuning_params, pretrained, tuning_level)
    
    def forward(self, x):
        return self._network(x)

In [None]:
class DenseNet121(DenseNetBase):
    def __init__(self, pretrained=True, tuning_level=0):
        super().__init__(models.densenet121, pretrained, tuning_level)

In [None]:
class DenseNet169(DenseNetBase):
    def __init__(self, pretrained=True, tuning_level=0):
        super().__init__(models.densenet169, pretrained, tuning_level)

In [None]:
class DenseNet201(DenseNetBase):
    def __init__(self, pretrained=True, tuning_level=0):
        super().__init__(models.densenet201, pretrained, tuning_level)

In [None]:
class DenseNet161(DenseNetBase):
    def __init__(self, pretrained=True, tuning_level=0):
        super().__init__(models.densenet161, pretrained, tuning_level)

In [None]:
class Project1Model(nn.Module):
    """
    Modified the last layer to output 13, rather than 3, features.
    """
    def __init__(self):
        super().__init__()

        # Convolution layers
        self._body = nn.Sequential(
            # input 3 x 224 x 224
            nn.Conv2d(in_channels=3, out_channels=16, kernel_size=7, padding=3),
            nn.BatchNorm2d(16),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2),

            # input 16 x 112 x 112
            nn.Conv2d(in_channels=16, out_channels=24, kernel_size=5, padding=2),
            nn.BatchNorm2d(24),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2),

            # input 24 x 56 x 56
            nn.Conv2d(in_channels=24, out_channels=36, kernel_size=5, padding=2),
            nn.BatchNorm2d(36),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2),

            # input 36 x 28 x 28
            nn.Conv2d(in_channels=36, out_channels=54, kernel_size=5, padding=2),
            nn.BatchNorm2d(54),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2),

            # input 54 x 14 x 14
            nn.Conv2d(in_channels=54, out_channels=81, kernel_size=5, padding=2),
            nn.BatchNorm2d(81),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2),
        )

        # Fully connected layers
        self._head = nn.Sequential(
            nn.Dropout(0.5),
            nn.Linear(in_features=81*7*7, out_features=1024), 
            nn.ReLU(inplace=True),

            nn.Dropout(0.5),
            nn.Linear(in_features=1024, out_features=256), 
            nn.ReLU(inplace=True),

            nn.Linear(in_features=256, out_features=13)            
        )
        
    def forward(self, x):
        x = self._body(x)
        x = x.view(x.size()[0], -1)
        x = self._head(x)
        return x

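The `81*7*7` input size of the first `Linear` layer follows from the body: every `Conv2d` uses "same" padding, so only the five `MaxPool2d(kernel_size=2)` layers change the spatial size, halving it each time. The plain-Python sketch below applies the standard output-size formula to check this; the helper names `conv_out`, `BODY_SPECS`, and `body_output_shape` are illustrative and not part of the project.

```python
from typing import List, Tuple

def conv_out(size: int, kernel: int, padding: int = 0, stride: int = 1) -> int:
    """Output size of Conv2d/MaxPool2d along one axis (dilation 1, floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1

# (in_channels, out_channels, kernel_size, padding) of each Conv2d in Project1Model._body
BODY_SPECS: List[Tuple[int, int, int, int]] = [
    (3, 16, 7, 3),
    (16, 24, 5, 2),
    (24, 36, 5, 2),
    (36, 54, 5, 2),
    (54, 81, 5, 2),
]

def body_output_shape(size: int = 224) -> Tuple[int, int, int]:
    """Return (channels, height, width) after the conv body for a square input."""
    channels = 3
    for in_ch, out_ch, kernel, padding in BODY_SPECS:
        if in_ch != channels:
            raise ValueError(f"expected {channels} input channels, got {in_ch}")
        size = conv_out(size, kernel, padding)     # "same" conv keeps the size
        size = conv_out(size, kernel=2, stride=2)  # MaxPool2d(2): stride defaults to 2
        channels = out_ch
    return channels, size, size

channels, height, width = body_output_shape(224)
print(channels, height, width, channels * height * width)  # 81 7 7 3969
```

Because the flattened size depends on the input resolution, a different `transform_crop_size` (256, for example, gives 81 x 8 x 8 = 5184 features) would not match the hard-coded `in_features=81*7*7`. Adding an `nn.AdaptiveAvgPool2d((7, 7))` before flattening is one common way to decouple the head from the crop size.
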
## <font style="color:green">6. Utils [5 Points]</font>

Define your methods or classes which are not covered in the above sections.

In [None]:
def create_confusion_matrix(cm, classes, model_name=None):
    """
    Creates and returns a confusion matrix figure that can be saved to a file or logged
    to the TensorBoard visualizer. Each cell shows the raw count and the row-normalized
    fraction.
    """

    from mpl_toolkits.axes_grid1 import make_axes_locatable

    # compute accuracy, normalized confusion matrix
    accuracy = np.trace(cm) / float(np.sum(cm))
    cm_norm = cm.astype("float") / cm.sum(axis=1)[:, np.newaxis]

    # initialize the plot tick marks and title
    tick_marks = np.arange(len(classes))
    title = "Confusion Matrix"
    if model_name is not None:
        title = title + " ({})".format(model_name)
    
    # plot the confusion matrix
    plt.style.use('default')
    fig = plt.figure(figsize=(11,10), tight_layout=True)
    im = plt.imshow(cm_norm, interpolation="nearest", cmap=plt.cm.Blues, vmin=0., vmax=1.)

    plt.title(title + "\n")
    plt.xticks(tick_marks, classes, rotation=22.5)
    plt.yticks(tick_marks, classes)

    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(
            j, i,
            format(cm[i, j], "d") + "\n" + format(cm_norm[i, j], ".2f"), 
            horizontalalignment="center",
            verticalalignment="center",
            color="white" if cm_norm[i, j] > 0.5 else "black"
        )

    plt.ylabel("Target Labels")
    plt.xlabel("Predicted Labels\nAccuracy={:0.4f}".format(accuracy))
    
    # plot the color bar
    divider = make_axes_locatable(plt.gca())
    cax = divider.append_axes("right", size=0.3, pad=0.2)
    plt.colorbar(im, cax=cax)

    # close the plot and return the figure
    plt.close()
    return fig

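`create_confusion_matrix` reports overall accuracy as the trace of the matrix divided by its total, and colors each cell by its row-normalized value (the fraction of a target class predicted as each label). The plain-Python sketch below reproduces those two quantities; the function name `confusion_stats` is hypothetical. One deliberate difference: a class with no validation samples makes the NumPy row division produce `NaN` for that row, whereas the sketch returns zeros for it.

```python
from typing import List, Tuple

def confusion_stats(cm: List[List[int]]) -> Tuple[float, List[List[float]]]:
    """Return (accuracy, row-normalized matrix) for a square confusion matrix.

    Rows are target labels and columns are predicted labels, matching
    sklearn.metrics.confusion_matrix.
    """
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))  # the trace
    accuracy = correct / total

    cm_norm = []
    for row in cm:
        row_total = sum(row)
        # guard against classes that never occur as a target
        cm_norm.append([count / row_total if row_total else 0.0 for count in row])
    return accuracy, cm_norm

accuracy, cm_norm = confusion_stats([[3, 1], [0, 4]])
print(accuracy)  # 0.875
print(cm_norm)   # [[0.75, 0.25], [0.0, 1.0]]
```
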
In [None]:
def get_requires_grad_status(block) -> str:
    """
    Summarizes the requires_grad flags of a block's parameters as "True" (all
    trainable), "False" (all frozen), "Mixed", or "N/A" (no parameters).
    """
    flags = [param.requires_grad for param in block.parameters()]
    if not flags:
        return "N/A"
    if all(flags):
        return "True"
    if not any(flags):
        return "False"
    return "Mixed"

In [None]:
def print_top_level_model_blocks(
    model: nn.Module,
    include_grandchildren: bool = False,
    display_requires_grad: bool = False
):
    """
    Prints the model's class name followed by its top-level (child) blocks and,
    optionally, their children and requires_grad status.
    """
    status = ""
    if display_requires_grad:
        status = f", requires_grad={get_requires_grad_status(model)}"
    print(f"{type(model).__name__}{status}")
    for child_name, child in model.named_children():
        if display_requires_grad:
            status = f", requires_grad={get_requires_grad_status(child)}"
        print(f"  {child_name}{status}")
        if include_grandchildren:
            for grandchild_name, grandchild in child.named_children():
                if display_requires_grad:
                    status = f", requires_grad={get_requires_grad_status(grandchild)}"
                # nn.Sequential children are named "0", "1", ...; show their types instead
                if not grandchild_name.isnumeric():
                    print(f"    {grandchild_name}{status}")
                else:
                    print(f"    [{grandchild_name}] {type(grandchild).__name__}{status}")

### Output the architecture of several pretrained PyTorch models.

The following models all have the same high-level ResNet architecture.
* ResNet-18
* ResNet-34
* ResNet-50
* ResNet-101
* ResNet-152
* ResNeXt-50-32x4d
* Wide ResNet-50-2
* Wide ResNet-101-2

The following models all have the same high-level DenseNet architecture.
* Densenet-121
* Densenet-169
* Densenet-201
* Densenet-161

```
print_top_level_model_blocks(models.resnet18(), False)
print_top_level_model_blocks(models.densenet121(), True)
print_top_level_model_blocks(models.vgg11_bn(), True)
print_top_level_model_blocks(models.vgg13_bn(), True)
print_top_level_model_blocks(models.vgg16_bn(), True)
print_top_level_model_blocks(models.vgg19_bn(), True)
```

The (formatted) output of the `print_top_level_model_blocks` statements is shown below.

Note:
* Groups of Conv2d, BatchNorm2d, and ReLU layers have been condensed to CNR.
* Groups of Linear, ReLU, and Dropout layers have been condensed to LRD.
* MaxPool denotes a MaxPool2d layer.

```
ResNet           | DenseNet         | VGG11_BN         | VGG13_BN         | VGG16_BN         | VGG19_BN
  conv1          |   features       |   features       |   features       |   features       |   features
  bn1            |     conv0        |     [00-02] CNR  |     [00-02] CNR  |     [00-02] CNR  |     [00-02] CNR
  relu           |     norm0        |                  |     [03-05] CNR  |     [03-05] CNR  |     [03-05] CNR
  maxpool        |     relu0        |     [03] MaxPool |     [06] MaxPool |     [06] MaxPool |     [06] MaxPool
  layer1         |     pool0        |     [04-06] CNR  |     [07-09] CNR  |     [07-09] CNR  |     [07-09] CNR
  layer2         |     denseblock1  |                  |     [10-12] CNR  |     [10-12] CNR  |     [10-12] CNR
  layer3         |     transition1  |     [07] MaxPool |     [13] MaxPool |     [13] MaxPool |     [13] MaxPool
  layer4         |     denseblock2  |     [08-10] CNR  |     [14-16] CNR  |     [14-16] CNR  |     [14-16] CNR
  avgpool        |     transition2  |     [11-13] CNR  |     [17-19] CNR  |     [17-19] CNR  |     [17-19] CNR
  fc             |     denseblock3  |                  |                  |     [20-22] CNR  |     [20-22] CNR
                 |     transition3  |                  |                  |                  |     [23-25] CNR
                 |     denseblock4  |     [14] MaxPool |     [20] MaxPool |     [23] MaxPool |     [26] MaxPool
                 |     norm5        |     [15-17] CNR  |     [21-23] CNR  |     [24-26] CNR  |     [27-29] CNR
                 |   classifier     |     [18-20] CNR  |     [24-26] CNR  |     [27-29] CNR  |     [30-32] CNR
                 |                  |                  |                  |     [30-32] CNR  |     [33-35] CNR
                 |                  |                  |                  |                  |     [36-38] CNR
                 |                  |     [21] MaxPool |     [27] MaxPool |     [33] MaxPool |     [39] MaxPool
                 |                  |     [22-24] CNR  |     [28-30] CNR  |     [34-36] CNR  |     [40-42] CNR
                 |                  |     [25-27] CNR  |     [31-33] CNR  |     [37-39] CNR  |     [43-45] CNR
                 |                  |                  |                  |     [40-42] CNR  |     [46-48] CNR
                 |                  |                  |                  |                  |     [49-51] CNR
                 |                  |     [28] MaxPool |     [34] MaxPool |     [43] MaxPool |     [52] MaxPool
                 |                  |   avgpool        |   avgpool        |   avgpool        |   avgpool
                 |                  |   classifier     |   classifier     |   classifier     |   classifier
                 |                  |     [00-02] LRD  |     [00-02] LRD  |     [00-02] LRD  |     [00-02] LRD
                 |                  |     [03-05] LRD  |     [03-05] LRD  |     [03-05] LRD  |     [03-05] LRD
                 |                  |     [06] Linear  |     [06] Linear  |     [06] Linear  |     [06] Linear
```

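The index ranges in the VGG columns follow directly from the layer configurations in torchvision's `vgg.py` (configurations "A", "B", "D", and "E"): in a `_bn` model each convolution contributes three consecutive modules (`Conv2d`, `BatchNorm2d`, `ReLU`) and each `"M"` entry contributes one `MaxPool2d`. The sketch below regenerates the `features` rows from those configuration lists, transcribed here by hand; the names `VGG_CFGS` and `feature_rows` are illustrative.

```python
from typing import Dict, List, Union

# Layer configurations of the torchvision VGG variants ("M" = MaxPool2d).
VGG_CFGS: Dict[str, List[Union[int, str]]] = {
    "VGG11_BN": [64, "M", 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"],
    "VGG13_BN": [64, 64, "M", 128, 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"],
    "VGG16_BN": [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
                 512, 512, 512, "M", 512, 512, 512, "M"],
    "VGG19_BN": [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
                 512, 512, 512, 512, "M", 512, 512, 512, 512, "M"],
}

def feature_rows(cfg: List[Union[int, str]]) -> List[str]:
    """Return rows like '[00-02] CNR' and '[03] MaxPool' for a batch-normalized VGG."""
    rows, idx = [], 0
    for item in cfg:
        if item == "M":
            rows.append(f"[{idx:02d}] MaxPool")
            idx += 1
        else:
            # Conv2d + BatchNorm2d + ReLU occupy three consecutive indices
            rows.append(f"[{idx:02d}-{idx + 2:02d}] CNR")
            idx += 3
    return rows

for name, cfg in VGG_CFGS.items():
    rows = feature_rows(cfg)
    print(f"{name}: {len(rows)} rows, last = {rows[-1]}")
```
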
The `print_top_level_model_blocks` function was also used to verify that the fine-tuning code sets `requires_grad` on the intended blocks. For example:

```
model = ResNet18(pretrained=True, tuning_level=0)
print_top_level_model_blocks(model._network, include_grandchildren=False, display_requires_grad=True)

    ResNet, requires_grad=Mixed
      conv1, requires_grad=False
      bn1, requires_grad=False
      relu, requires_grad=N/A
      maxpool, requires_grad=N/A
      layer1, requires_grad=False
      layer2, requires_grad=False
      layer3, requires_grad=False
      layer4, requires_grad=False
      avgpool, requires_grad=N/A
      fc, requires_grad=True

model = ResNet18(pretrained=True, tuning_level=1)
print_top_level_model_blocks(model._network, include_grandchildren=False, display_requires_grad=True)

    ResNet, requires_grad=Mixed
      conv1, requires_grad=False
      bn1, requires_grad=False
      relu, requires_grad=N/A
      maxpool, requires_grad=N/A
      layer1, requires_grad=False
      layer2, requires_grad=False
      layer3, requires_grad=False
      layer4, requires_grad=True
      avgpool, requires_grad=N/A
      fc, requires_grad=True

...

model = ResNet18(pretrained=True, tuning_level=5)
print_top_level_model_blocks(model._network, include_grandchildren=False, display_requires_grad=True)

    ResNet, requires_grad=True
      conv1, requires_grad=True
      bn1, requires_grad=True
      relu, requires_grad=N/A
      maxpool, requires_grad=N/A
      layer1, requires_grad=True
      layer2, requires_grad=True
      layer3, requires_grad=True
      layer4, requires_grad=True
      avgpool, requires_grad=N/A
      fc, requires_grad=True
```

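The outputs above are consistent with a schedule in which `tuning_level=0` trains only `fc` and each higher level unfreezes one more block, working back from the output. The sketch below encodes that schedule as data. The intermediate levels (2 to 4, elided by "..." above) and the grouping of `conv1` with `bn1` are assumptions, and `UNFREEZE_ORDER` and `trainable_blocks` are hypothetical names, not the project's `ResNet18` implementation.

```python
from typing import Dict, List

# Hypothetical schedule: parameterized top-level ResNet blocks, from the output
# back to the input. conv1 and bn1 are assumed to be unfrozen together.
UNFREEZE_ORDER: List[List[str]] = [
    ["fc"],
    ["layer4"],
    ["layer3"],
    ["layer2"],
    ["layer1"],
    ["conv1", "bn1"],
]

def trainable_blocks(tuning_level: int) -> Dict[str, bool]:
    """Map each parameterized block to the requires_grad flag it would get."""
    status: Dict[str, bool] = {}
    for step, names in enumerate(UNFREEZE_ORDER):
        for name in names:
            # step 0 (fc) is always trainable; level n unfreezes steps 0..n
            status[name] = step <= tuning_level
    return status

print(trainable_blocks(1))
```

In PyTorch the flags would then be applied block by block, e.g. `for p in getattr(network, name).parameters(): p.requires_grad = flag`, where `network` is the torchvision ResNet.
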
In [None]:
def create_submission_csv(path, exp):
    """
    Predicts the test-set labels with the experiment's trained model and writes
    them to a Kaggle submission CSV with "id" and "class" columns.

    ToDo: Need to test and execute on the best model.
    """

    # create a dictionary of numeric labels to text labels
    label_dict = dict(enumerate(exp.classes))

    # get predictions for the test data using the trained model
    fnames, labels = predict_test_data(exp.trained_model, exp.test_loader, exp.device)

    # convert the numeric labels to their text equivalents
    labels = [label_dict[int(label)] for label in labels]

    # create a pandas data frame and write it to a CSV file; index=False keeps
    # pandas from writing its row index as an extra, unnamed first column
    data_frame = pd.DataFrame({"id": fnames, "class": labels})
    data_frame.to_csv(path, index=False)

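The submission file needs only a header row and one `id,class` row per test image, with no index column. The standard-library sketch below shows that layout; `write_submission` and the sample file names are illustrative, and it assumes predictions arrive as integer class indices in the same order as `exp.classes`.

```python
import csv
import io
from typing import Sequence, TextIO

def write_submission(
    fnames: Sequence[str],
    labels: Sequence[int],
    classes: Sequence[str],
    stream: TextIO,
) -> None:
    """Write an 'id,class' CSV with one row per prediction and no index column."""
    writer = csv.writer(stream)
    writer.writerow(["id", "class"])
    for fname, label in zip(fnames, labels):
        writer.writerow([fname, classes[int(label)]])

buffer = io.StringIO()
write_submission(["img_001", "img_002"], [1, 0], ["chapati", "githeri"], buffer)
print(buffer.getvalue())
```

When writing to disk, open the file with `open(path, "w", newline="")`, as the `csv` module documentation recommends.
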
## <font style="color:green">7. Experiment [5 Points]</font>

Choose your optimizer and LR-scheduler and use the above methods and classes to train your model.

### Base Experiment Classes

The following base classes facilitate rapid experiment creation.
* Experiment - Base class for the following classes.
* VisualExperiment - Conduct data visualization experiments.
* ModelExperiment - Conduct model training experiments.

In [None]:
class Experiment(ABC):
    def __init__(
        self,
        abbr: Optional[str] = None,
        transform_resize: int = 256,
        transform_crop_size: int = 224,
        data_aug_color_enabled: Optional[bool] = None,
        data_aug_color_brightness: Optional[Tuple[float, float]] = None,
        data_aug_color_contrast: Optional[Tuple[float, float]] = None,
        data_aug_color_saturation: Optional[Tuple[float, float]] = None,
        data_aug_color_hue: Optional[Tuple[float, float]] = None,
        data_aug_horz_flip_prob: Optional[float] = None,
        data_aug_vert_flip_prob: Optional[float] = None,
        data_aug_affine_enabled: Optional[bool] = None,
        data_aug_affine_rotation: Optional[float] = None,
        data_aug_affine_translate: Optional[Tuple[float, float]] = None,
        data_aug_affine_scale: Optional[Tuple[float, float]] = None,
        data_aug_erasing_prob: Optional[float] = None,
        data_aug_erasing_scale: Optional[Tuple[float, float]] = None,
        data_aug_erasing_ratio: Optional[Tuple[float, float]] = None,
        data_loader_batch_size: Optional[int] = None,
        data_loader_num_workers: Optional[int] = None,
        optimizer_learning_rate: Optional[float] = None,
        optimizer_momentum: Optional[float] = None,
        optimizer_weight_decay: Optional[float] = None,
        optimizer_betas: Optional[Tuple[float, float]] = None,
        lr_scheduler_gamma: Optional[float] = None,
        lr_scheduler_step_size: Optional[int] = None,
        lr_scheduler_milestones: Optional[Iterable] = None,
        lr_scheduler_patience: Optional[int] = None,
        lr_scheduler_threshold: Optional[float] = None,
        trainer_training_epochs: Optional[int] = None,
        trainer_weighted_loss_fn: Optional[bool] = None,
        trainer_model_saving_period: Optional[int] = None,
        trainer_stop_loss_epochs: Optional[int] = None,
        trainer_stop_acc_epochs: Optional[int] = None,
        trainer_stop_acc_ema_alpha: Optional[float] = None,
        trainer_stop_acc_threshold: Optional[float] = None
    ):

        """
        This base class for data visualization and model training experiments does the following:

            - Creates the master configuration instance, accommodating constructor overrides
            - Sets up the system, e.g., ensures reproducibility, enables CUDA acceleration, etc.
            - Initializes the KenyanFood13 dataset
            - Configures experiment visualization

        """

        # if the experiment abbreviation is not specified, use the class name, removing
        # the prefix "Exp" if present
        if abbr is None:
            name = type(self).__name__
            self._abbr = name
            if name.startswith("Exp"):
                self._abbr = name[3:]
        else:
            self._abbr = abbr
        
        # ToDo: Apply patch if CUDA is not available.
        self._resize = transform_resize
        self._crop_size = transform_crop_size
        self._config = create_master_config(
            transform_resize,
            transform_crop_size,
            data_aug_color_enabled,
            data_aug_color_brightness,
            data_aug_color_contrast,
            data_aug_color_saturation,
            data_aug_color_hue,
            data_aug_horz_flip_prob,
            data_aug_vert_flip_prob,
            data_aug_affine_enabled,
            data_aug_affine_rotation,
            data_aug_affine_translate,
            data_aug_affine_scale,
            data_aug_erasing_prob,
            data_aug_erasing_scale,
            data_aug_erasing_ratio,
            data_loader_batch_size,
            data_loader_num_workers,
            optimizer_learning_rate,
            optimizer_momentum,
            optimizer_weight_decay,
            optimizer_betas,
            lr_scheduler_gamma,
            lr_scheduler_step_size,
            lr_scheduler_milestones,
            lr_scheduler_patience,
            lr_scheduler_threshold,  
            trainer_training_epochs,
            trainer_weighted_loss_fn,
            trainer_model_saving_period,
            trainer_stop_loss_epochs,
            trainer_stop_acc_epochs,
            trainer_stop_acc_ema_alpha,
            trainer_stop_acc_threshold       
        )
        

        setup_system(self._config.system)
        
        self._data = KenyanFood13Data(
            data_root = self._config.dataset.data_dir,
            valid_size = self._config.dataset.valid_size,
            random_seed = self._config.system.seed
        )

        self._classes = self._data.classes
        self._library = self._data.library
        self.__visualizer = None
        
    @property
    def classes(self):
        return self._classes

    @property
    def library(self):
        return self._library

    """
    Protected methods that may or must be overridden by derived classes.
    """
    
    @property
    @abstractmethod
    def _visualizer_name(self) -> str:
        pass

    def _open_visualizer(self):
        if self.__visualizer is None:
            self.__visualizer = TensorBoardVisualizer(os.path.join(
                self._config.system.proj_dir,
                self._config.trainer.visualizer_dir, 
                self._visualizer_name
            ))
        return self.__visualizer

    def _close_visualizer(self):
        if self.__visualizer is not None:
            self.__visualizer.close_tensorboard()
            self.__visualizer = None

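`Experiment.__init__` derives a short run abbreviation from the subclass name, and `VisualExperiment` combines it with the resize and crop sizes to name its TensorBoard run. The sketch below mirrors both rules in plain Python; `derive_abbr`, `visual_run_name`, and the example class names are hypothetical.

```python
from typing import Optional

def derive_abbr(class_name: str, abbr: Optional[str] = None) -> str:
    """An explicit abbreviation wins; otherwise strip a leading "Exp" from the class name."""
    if abbr is not None:
        return abbr
    return class_name[3:] if class_name.startswith("Exp") else class_name

def visual_run_name(abbr: str, resize: int = 256, crop_size: int = 224) -> str:
    """Run name built the same way as VisualExperiment._visualizer_name."""
    return abbr + f"-DV-RS_{resize}-CS_{crop_size}"

print(derive_abbr("ExpResNet18"))                   # ResNet18
print(visual_run_name(derive_abbr("ExpAugColor")))  # AugColor-DV-RS_256-CS_224
```

Because the check is a plain prefix match, a class named, say, `ExperimentA` would be abbreviated to `erimentA`; passing `abbr` explicitly avoids such surprises.
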
In [None]:
class VisualExperiment(Experiment):
    """
    This is the base class for data visualization experiments.
    """

    def __init__(
        self,
        abbr: Optional[str] = None,
        log_originals: bool = True,
        log_augmentations: bool = True,
        transform_resize: int = 256,
        transform_crop_size: int = 224,
        data_aug_color_enabled: Optional[bool] = None,
        data_aug_color_brightness: Optional[Tuple[float, float]] = None,
        data_aug_color_contrast: Optional[Tuple[float, float]] = None,
        data_aug_color_saturation: Optional[Tuple[float, float]] = None,
        data_aug_color_hue: Optional[Tuple[float, float]] = None,
        data_aug_horz_flip_prob: Optional[float] = None,
        data_aug_vert_flip_prob: Optional[float] = None,
        data_aug_affine_enabled: Optional[bool] = None,
        data_aug_affine_rotation: Optional[float] = None,
        data_aug_affine_translate: Optional[Tuple[float, float]] = None,
        data_aug_affine_scale: Optional[Tuple[float, float]] = None,
        data_aug_erasing_prob: Optional[float] = None,
        data_aug_erasing_scale: Optional[Tuple[float, float]] = None,
        data_aug_erasing_ratio: Optional[Tuple[float, float]] = None
    ):
        super().__init__(
            abbr,
            transform_resize,
            transform_crop_size,
            data_aug_color_enabled,
            data_aug_color_brightness,
            data_aug_color_contrast,
            data_aug_color_saturation,
            data_aug_color_hue,
            data_aug_horz_flip_prob,
            data_aug_vert_flip_prob,
            data_aug_affine_enabled,
            data_aug_affine_rotation,
            data_aug_affine_translate,
            data_aug_affine_scale,
            data_aug_erasing_prob,
            data_aug_erasing_scale,
            data_aug_erasing_ratio
        )
        
        self.__log_originals = log_originals
        self.__log_augmentations = log_augmentations

    def log_sample_images(
        self, 
        num_of_contact_sheets: int = 1, 
        log_originals: Optional[bool] = None, 
        log_augmentations: Optional[bool] = None
    ):
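        """
        For each type of food, log num_of_contact_sheets 6 x 6 grids of original
        and/or augmented images to the visualizer. Both grids are built from the
        same shuffled file list, so each augmented grid matches its original.
        """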

        if log_originals is None:
            log_originals = self.__log_originals

        if log_augmentations is None:
            log_augmentations = self.__log_augmentations

        # abort if not logging either originals or augmentations
        if not log_originals and not log_augmentations:
            return

        visualizer = self._open_visualizer()

        for food, fnames in self._library.items():
            
            # since we want to visualize the same images before and after data
            # augmentation, we need to shuffle the data ourselves rather than
            # have the data loader do this for us
            shuffled = random.sample(fnames, len(fnames))

            # create food specific datasets
            original_dataset = KenyanFood13Dataset(
                image_root = self._data.image_root,
                fnames = shuffled,
                transform = self._config.dataset.visual_transforms
            )
            augmented_dataset = KenyanFood13Dataset(
                image_root = self._data.image_root,
                fnames = shuffled,
                transform = self._config.dataset.visual_aug_transforms
            )

            # load 36 original and augmented images
            original_dataloader = DataLoader(original_dataset, batch_size=36, shuffle=False)
            original_iter = iter(original_dataloader)
            augmented_dataloader = DataLoader(augmented_dataset, batch_size=36, shuffle=False)
            augmented_iter = iter(augmented_dataloader)

            for idx in range(num_of_contact_sheets):
                try:
                    original_images, _ = next(original_iter)
                    augmented_images, _ = next(augmented_iter)

                    # save images to project's image directory (needs to exist!)
                    # proj_dir = self._config.system.proj_dir
                    # if log_originals:
                    #     original_name = f"{self._classes[food]}{(idx + 1):02d}.jpg"
                    #     original_path = os.path.join(proj_dir, "images", original_name)
                    #     torchvision.utils.save_image(original_images, fp=original_path, nrow=6)
                    # if log_augmentations:
                    #     augmented_name = f"{self._classes[food]}{(idx + 1):02d}_aug.jpg"
                    #     augmented_path = os.path.join(proj_dir, "images", augmented_name)
                    #     torchvision.utils.save_image(augmented_images, fp=augmented_path, nrow=6)

                    # add image grid to visualizer
                    if log_originals:
                        visualizer.add_image(
                            tag=self._classes[food], 
                            image=torchvision.utils.make_grid(original_images, nrow=6)
                        )
                    if log_augmentations:
                        visualizer.add_image(
                            tag=self._classes[food] + " (augmented)", 
                            image=torchvision.utils.make_grid(augmented_images, nrow=6)
                        )

                except StopIteration:
                    break
        
        self._close_visualizer()
        
    @property
    def _visualizer_name(self) -> str:
        return self._abbr + f"-DV-RS_{self._resize}-CS_{self._crop_size}"

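The comment in `log_sample_images` explains why the file list is shuffled once up front: both datasets are then read with `shuffle=False`, so batch *k* of the original loader and batch *k* of the augmented loader contain the same files. The sketch below models that pairing with plain lists, using string functions as stand-ins for the two transform pipelines; `paired_batches` and its arguments are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

def paired_batches(
    fnames: Sequence[str],
    original_view: Callable[[str], str],
    augmented_view: Callable[[str], str],
    batch_size: int = 36,
    seed: int = 0,
) -> List[Tuple[List[str], List[str]]]:
    """Shuffle once, then batch both views in the same order (like shuffle=False loaders)."""
    rng = random.Random(seed)
    shuffled = rng.sample(list(fnames), len(fnames))
    pairs = []
    for start in range(0, len(shuffled), batch_size):
        chunk = shuffled[start:start + batch_size]
        pairs.append(([original_view(f) for f in chunk], [augmented_view(f) for f in chunk]))
    return pairs

pairs = paired_batches([f"img{i}.jpg" for i in range(80)], str, lambda f: f + " (aug)")
print([len(orig) for orig, _ in pairs])  # [36, 36, 8]
```

As with a `DataLoader` whose `drop_last` is left at its default, the final batch is smaller when a class has fewer images than a full multiple of 36, so its contact sheet is only partially filled.
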
In [None]:
class ModelExperiment(Experiment):
    """
    This is the base class for model training experiments.
    """
    def __init__(
        self,
        abbr: Optional[str] = None,
        data_augmentation: bool = True,
        optimizer: Optimizer = Optimizer.SGD,
        lr_scheduler: LrScheduler = LrScheduler.STEP,
        transform_resize: int = 256,
        transform_crop_size: int = 224,
        data_aug_color_enabled: Optional[bool] = None,
        data_aug_color_brightness: Optional[Tuple[float, float]] = None,
        data_aug_color_contrast: Optional[Tuple[float, float]] = None,
        data_aug_color_saturation: Optional[Tuple[float, float]] = None,
        data_aug_color_hue: Optional[Tuple[float, float]] = None,
        data_aug_horz_flip_prob: Optional[float] = None,
        data_aug_vert_flip_prob: Optional[float] = None,
        data_aug_affine_enabled: Optional[bool] = None,
        data_aug_affine_rotation: Optional[float] = None,
        data_aug_affine_translate: Optional[Tuple[float, float]] = None,
        data_aug_affine_scale: Optional[Tuple[float, float]] = None,
        data_aug_erasing_prob: Optional[float] = None,
        data_aug_erasing_scale: Optional[Tuple[float, float]] = None,
        data_aug_erasing_ratio: Optional[Tuple[float, float]] = None,
        data_loader_batch_size: Optional[int] = None,
        data_loader_num_workers: Optional[int] = None,
        optimizer_learning_rate: Optional[float] = None,
        optimizer_momentum: Optional[float] = None,
        optimizer_weight_decay: Optional[float] = None,
        optimizer_betas: Optional[Tuple[float, float]] = None,
        lr_scheduler_gamma: Optional[float] = None,
        lr_scheduler_step_size: Optional[int] = None,
        lr_scheduler_milestones: Optional[Iterable] = None,
        lr_scheduler_patience: Optional[int] = None,
        lr_scheduler_threshold: Optional[float] = None,
        trainer_training_epochs: Optional[int] = None,
        trainer_weighted_loss_fn: Optional[bool] = None,
        trainer_model_saving_period: Optional[int] = None,
        trainer_stop_loss_epochs: Optional[int] = None,
        trainer_stop_acc_epochs: Optional[int] = None,
        trainer_stop_acc_ema_alpha: Optional[float] = None,
        trainer_stop_acc_threshold: Optional[float] = None,
        use_data_subsets: bool = False
    ):
        super().__init__(
            abbr,
            transform_resize,
            transform_crop_size,
            data_aug_color_enabled,
            data_aug_color_brightness,
            data_aug_color_contrast,
            data_aug_color_saturation,
            data_aug_color_hue,
            data_aug_horz_flip_prob,
            data_aug_vert_flip_prob,
            data_aug_affine_enabled,
            data_aug_affine_rotation,
            data_aug_affine_translate,
            data_aug_affine_scale,
            data_aug_erasing_prob,
            data_aug_erasing_scale,
            data_aug_erasing_ratio,
            data_loader_batch_size,
            data_loader_num_workers,
            optimizer_learning_rate,
            optimizer_momentum,
            optimizer_weight_decay,
            optimizer_betas,
            lr_scheduler_gamma,
            lr_scheduler_step_size,
            lr_scheduler_milestones,
            lr_scheduler_patience,
            lr_scheduler_threshold,
            trainer_training_epochs,
            trainer_weighted_loss_fn,
            trainer_model_saving_period,
            trainer_stop_loss_epochs,
            trainer_stop_acc_epochs,
            trainer_stop_acc_ema_alpha,
            trainer_stop_acc_threshold
        )

        test_transforms = self._config.dataset.test_transforms
        train_transforms = self._config.dataset.train_transforms
        if not data_augmentation:
            train_transforms = test_transforms

        train_dataset, valid_dataset, test_dataset = get_datasets(
            data = self._data,
            test_transforms = test_transforms,
            train_transforms = train_transforms,
            subset = use_data_subsets
        )

        self.__train_loader, self.__valid_loader, self.__test_loader = get_data_loaders(
            train_dataset = train_dataset,
            valid_dataset = valid_dataset,
            test_dataset = test_dataset,
            batch_size = self._config.data_loader.batch_size,
            num_workers = self._config.data_loader.num_workers
        )                
    
        weight = None
        if trainer_weighted_loss_fn:
            weight = loss_rescaling_weight

        self.__model, model_id = self._get_model()
        self.__model_name = self._abbr + "-" + model_id
        self.__model_dir = os.path.join(self._config.system.proj_dir, self._config.trainer.model_dir)
        self.__loss_fn = nn.CrossEntropyLoss(weight=weight)
        self.__metric_fn = AccuracyEstimator(topk=(1, )) # ToDo: Fix! (trainer.py expects a dictionary w/ 'top1' key)
        self.__optimizer = get_optimizer(self.__model, optimizer, self._config.optimizer)
        self.__lr_scheduler = get_lr_scheduler(self.__optimizer, lr_scheduler, self._config.scheduler)

    @property
    def test_loader(self) -> DataLoader:
        return self.__test_loader

    @property
    def train_loader(self):
        return self.__train_loader
    
    @property
    def valid_loader(self):
        return self.__valid_loader

    @property
    def device(self) -> torch.device:
        return torch.device(self._config.trainer.device)

    @property
    def trained_model_path(self):
        return os.path.join(self.__model_dir, self.__model_name + ".pt")
    
    @property
    def trained_model(self) -> nn.Module:
        self.__load_model()
        return self.__model

    def train(self):
        device = self.device
        self.__model = self.__model.to(device)
        self.__loss_fn = self.__loss_fn.to(device)

        visualizer = self._open_visualizer()
        model_trainer = Trainer(
            model=self.__model,
            loader_train=self.__train_loader,
            loader_test=self.__valid_loader,
            loss_fn=self.__loss_fn,
            metric_fn=self.__metric_fn,
            optimizer=self.__optimizer,
            lr_scheduler=self.__lr_scheduler,
            model_save_dir=self.__model_dir,
            model_name=self.__model_name,
            model_saving_period=self._config.trainer.model_saving_period,
            stop_loss_epochs=self._config.trainer.stop_loss_epochs,
            stop_acc_ema_alpha=self._config.trainer.stop_acc_ema_alpha,
            stop_acc_epochs=self._config.trainer.stop_acc_epochs,
            stop_acc_threshold=self._config.trainer.stop_acc_threshold,
            device=device,
            data_getter=itemgetter(0),
            target_getter=itemgetter(1),
            stage_progress=self._config.trainer.progress_bar,
            visualizer=visualizer,
            get_key_metric=itemgetter("top1")
        )
        model_trainer.register_hook("end_epoch", hooks.end_epoch_hook_classification)
        metrics = model_trainer.fit(self._config.trainer.training_epochs)
        self._close_visualizer()

        return metrics
    
    def log_graph(self):
        model = self.trained_model
        images, _ = next(iter(self.valid_loader))
        device = self.device

        visualizer = self._open_visualizer()
        visualizer.add_graph(model.to(device), images.to(device))
        self._close_visualizer()
        
    
    def log_pr_curves(self):
        targets, pred_probs = get_targets_and_pred_probs(
            self.trained_model, 
            self.valid_loader,
            self.device
        )

        visualizer = self._open_visualizer()
        visualizer.add_pr_curves(self._classes, targets, pred_probs)
        self._close_visualizer()
    
    def log_confusion_matrix(self):
        targets, preds = predict_valid_data(
            self.trained_model,
            self.valid_loader,
            self.device
        )

        visualizer = self._open_visualizer()
        cm = confusion_matrix(targets, preds)
        tag = f"Confusion Matrix ({self.__model_name})"
        figure = create_confusion_matrix(cm, self.classes, self.__model_name)
        visualizer.add_figure(tag=tag, figure=figure, close=True)
        self._close_visualizer()

    """
    Protected methods that may or must be overridden by derived classes.
    """
    
    @property
    def _visualizer_name(self) -> str:
        return self.__model_name
            
    @abstractmethod
    def _get_model(self) -> Tuple[nn.Module, str]:
        pass
    
    """
    Private methods that should only be called by this base class.
    """
    
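    def __load_model(self):
        path = self.trained_model_path
        if os.path.exists(path):
            # map_location lets weights saved on a GPU be loaded on a CPU-only machine
            self.__model.load_state_dict(torch.load(path, map_location=self.device))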

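`ModelExperiment` works as a template: the base class builds the loaders, loss, optimizer, and scheduler, and asks each subclass only for `_get_model()`, which returns the network together with an identifier used to name the run and the saved weights. The framework-free sketch below shows that contract with `abc`; `ModelRunBase`, `DemoRun`, and the identifier `ResNet18-TL0` are hypothetical stand-ins, and a plain `object()` takes the place of an `nn.Module`.

```python
import os
from abc import ABC, abstractmethod
from typing import Any, Tuple

class ModelRunBase(ABC):
    """Minimal stand-in for ModelExperiment's naming logic."""

    def __init__(self, abbr: str, model_dir: str):
        self._model, model_id = self._get_model()
        self._model_name = abbr + "-" + model_id
        self._model_dir = model_dir

    @property
    def trained_model_path(self) -> str:
        return os.path.join(self._model_dir, self._model_name + ".pt")

    @abstractmethod
    def _get_model(self) -> Tuple[Any, str]:
        """Return (model, model_id); implemented by each concrete experiment."""

class DemoRun(ModelRunBase):
    def _get_model(self) -> Tuple[Any, str]:
        return object(), "ResNet18-TL0"

print(DemoRun("Demo", "models").trained_model_path)  # models/Demo-ResNet18-TL0.pt on POSIX
```

Forgetting to implement `_get_model` makes instantiation fail with a `TypeError` before `__init__` runs, so the mistake surfaces before any data is loaded.
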
In [None]:
class AnalyzeTensorBoardRun:
    """
    A utility class to do the following for TensorBoard runs:

        - return a list of runs matching the filter
        - return a dictionary of scalars *
        - return the accuracy at the epoch where the test loss is lowest *
        - return the overfitting metric *
        - return a summary of all runs
        - return a loss plot figure *

        * specific to a given run

    Note: The overfitting metric is the slope of a straight line fitted to the
          per-epoch ratio of test loss to train loss. A slope near zero means the
          gap between test and train loss is not growing; a positive slope means
          overfitting increases as training proceeds.
    """

    def __init__(self, visualizer_dir, filter:str="^.*"):
        import re
        runs = os.listdir(visualizer_dir)
        runs = [run for run in runs if re.search(filter, run) is not None]
        runs.sort()
        self.__visualizer_dir = visualizer_dir
        self.__runs = runs 

    @property
    def runs(self) -> List[str]:
        return self.__runs
    
    def get_scalars(self, run: str):
        from tensorboard.backend.event_processing import event_accumulator
        path = os.path.join(self.__visualizer_dir, run)
        event_acc = event_accumulator.EventAccumulator(path)
        event_acc.Reload()

        scalars = {}
        for tag in sorted(event_acc.Tags()["scalars"]):
            x, y = [], []
            for scalar_event in event_acc.Scalars(tag):
                x.append(scalar_event.step)
                y.append(scalar_event.value)
            scalars[tag] = (np.asarray(x), np.asarray(y))
        return scalars

    def get_accuracy(self, scalars) -> float:
        index = np.argmin(scalars["data/test_loss"][1])
        return scalars["data/test_metric:top1"][1][index].item()

    def get_overfitting_metric(self, scalars, alpha:float=0.3) -> float:
        from numpy.polynomial.polynomial import polyfit
        loss_tst = scalars["data/test_loss"][1]
        loss_trn = scalars["data/train_loss"][1]
        ratio = [tst / trn for tst, trn in zip(loss_tst, loss_trn)]   
        _, m = polyfit(list(range(len(ratio))), ratio, 1)
        return m

    def get_summary(self) -> List[Tuple[str, float, float]]:
        summary = []
        for run in self.runs:
            scalars = self.get_scalars(run)
            accuracy = self.get_accuracy(scalars)
            overfitting = self.get_overfitting_metric(scalars)
            summary.append((run, accuracy, overfitting))
        return summary

    def get_loss_plot(self, scalars, title, alpha:float=0.3):
        from numpy.polynomial.polynomial import polyfit
        accuracy = self.get_accuracy(scalars)
        loss_tst = scalars["data/test_loss"][1]
        loss_trn = scalars["data/train_loss"][1]
        loss_tst_ema = self.__ema_smoothing(loss_tst, alpha)
        loss_trn_ema = self.__ema_smoothing(loss_trn, alpha)

        x = list(range(len(loss_tst_ema)))
        ratio = [tst / trn for tst, trn in zip(loss_tst, loss_trn)]       
        b, m = polyfit(x, ratio, 1)
        ratio_bf = [m * x1 + b for x1 in x] 

        fig = plt.figure(figsize=(8,4))
        plt.suptitle(f"{title} ({accuracy:.2f}%)")
        plt.subplot(1, 2, 1)
        plt.title("loss")
        plt.plot(x, loss_trn_ema, "r", label="train")
        plt.plot(x, loss_tst_ema, "b", label="test")
        plt.legend()
        plt.subplot(1, 2, 2)
        plt.title(f"loss (overfit: {m:.3f})")
        plt.gca().set_ylim([0.0, 4.0])
        plt.plot(x, ratio, color="#a0ffa0")
        plt.plot(x, ratio_bf, color="#00ff00", label="test/train")
        plt.legend()
        plt.close()
        return fig      

    def __ema_smoothing(self, data, alpha=0.3) -> List[float]:
        data_ema = []
        last = data[0]
        for datum in data:
            last = alpha * datum + (1 - alpha) * last
            data_ema.append(last)
        return data_ema
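To make the overfitting metric concrete, the following sketch recomputes it in plain Python on made-up loss values (hypothetical numbers, not results from this project). It fits a least-squares line to the per-epoch test/train loss ratio and returns the slope, which is what `get_overfitting_metric` obtains from `numpy.polynomial.polynomial.polyfit`. Note that `polyfit` returns its coefficients lowest degree first, so the intercept comes first and the slope second. The function name `overfitting_slope` is illustrative only.

```python
from typing import Sequence


def overfitting_slope(test_loss: Sequence[float], train_loss: Sequence[float]) -> float:
    """Slope of the least-squares line fitted to test_loss / train_loss per epoch."""
    ratio = [tst / trn for tst, trn in zip(test_loss, train_loss)]
    n = len(ratio)
    x = range(n)
    x_mean = sum(x) / n
    y_mean = sum(ratio) / n
    # Ordinary least squares: slope = cov(x, y) / var(x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, ratio))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical losses: the train loss keeps falling while the test loss stalls and rises.
train = [1.0, 0.8, 0.6, 0.5]
test = [1.0, 0.9, 0.9, 1.0]
print(f"{overfitting_slope(test_loss=test, train_loss=train):.4f}")  # 0.3375

# Identical curves give a constant ratio of 1.0 and therefore a slope of 0.
print(overfitting_slope(train, train))  # 0.0
```

A ratio-based slope, rather than the raw gap between the curves, keeps the metric comparable across models whose losses live on different scales.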

In [None]:
def conduct(
    exp: Experiment, 
    log_graph:bool = True,
    log_pr_curves:bool = True,
    log_confusion_matrix:bool = True,
    free_experiment:bool = True):   
    """
    This method conducts a visual or model experiment. A visual experiment logs original
    and/or augmented images in 6 x 6 contact sheets to TensorBoard. A model experiment
    performs the following steps and returns its training metrics.

    1. Logs the model's graph.
    2. Trains the model.
    3. Logs the precision-recall curve for each food class.
    4. Logs the confusion matrix.
    
    Note: Steps 3 and 4 are performed on the validation data set using the model weights
          that achieved the lowest average loss on the validation data set.
    """
    
    result = None

    if isinstance(exp, VisualExperiment):
        exp.log_sample_images()

    elif isinstance(exp, ModelExperiment):
        if log_graph:
            exp.log_graph()
        result = exp.train()
        if log_pr_curves:
            exp.log_pr_curves()
        if log_confusion_matrix:
            exp.log_confusion_matrix()

    if free_experiment:
        # After running several experiments, a "RuntimeError: CUDA out of memory"
        # exception was raised. Explicitly deleting the experiment and emptying the
        # CUDA cache is an attempt to release GPU memory between experiments.
        del exp
        torch.cuda.empty_cache()

    return result

In [None]:
def analyze_tensor_board_runs(filter: str = "^B"):
    config = create_trainer_config()
    analyze = AnalyzeTensorBoardRun(os.path.join(proj_dir, config.visualizer_dir), filter=filter)

    # create a markdown table
    print("|Experiment|Accuracy|Overfitting Metric|")
    print("|:---|:---:|:---:|")
    for item in analyze.get_summary():
        print(f"|{item[0]}|{item[1]:.2f}|{item[2]:.3f}|")

    # save loss images (the destination directory is created if it does not exist)
    loss_dir = os.path.join(proj_dir, "images", "loss")
    os.makedirs(loss_dir, exist_ok=True)
    for run in analyze.runs:
        path = os.path.join(loss_dir, run + ".png")
        scalars = analyze.get_scalars(run)
        fig = analyze.get_loss_plot(scalars, run)
        fig.savefig(path, facecolor="#ffffff")
        plt.close(fig)

### Experiment Group A: Data Visualization Experiments and Training Pipeline Check

**Summary**

The first set of experiments logs contact sheets (6 x 6 grids of images) of each food type to the visualizer, with and without data augmentation.

The second set of experiments trains only the classifier layer (fc) of the pretrained ResNet18 model on a subset of the data without augmentation to check the training pipeline. In the first experiment, training stops after 40 epochs. In the second experiment, training stops after 40 epochs or when the smoothed accuracy (computed as an exponential moving average with alpha = 0.3) does not improve by 2% within 10 epochs.
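The trainer's stopping logic is not shown in this section, so the following is only a sketch of one plausible reading of this criterion: smooth the per-epoch accuracy (in percent) with the same seeded EMA used by `__ema_smoothing` above, and stop once the smoothed value has gained less than `threshold` points over the last `patience` epochs. The names `ema` and `should_stop` are hypothetical; `patience`, `alpha`, and `threshold` are assumed to correspond to `trainer_stop_acc_epochs`, `trainer_stop_acc_ema_alpha`, and `trainer_stop_acc_threshold`.

```python
from typing import List, Sequence


def ema(values: Sequence[float], alpha: float = 0.3) -> List[float]:
    """Exponential moving average seeded with the first value."""
    smoothed, last = [], values[0]
    for v in values:
        last = alpha * v + (1 - alpha) * last
        smoothed.append(last)
    return smoothed


def should_stop(accuracy: Sequence[float], patience: int = 10,
                alpha: float = 0.3, threshold: float = 2.0) -> bool:
    """Assumed rule: stop when the smoothed accuracy has not improved by at
    least `threshold` percentage points over the last `patience` epochs."""
    if len(accuracy) <= patience:
        return False  # not enough history yet
    smoothed = ema(accuracy, alpha)
    return smoothed[-1] - smoothed[-1 - patience] < threshold


print(should_stop([50.0] * 15))                          # True: accuracy has plateaued
print(should_stop([float(10 * i) for i in range(15)]))   # False: still improving
```

Smoothing before comparing keeps a single noisy epoch from triggering (or postponing) the stop.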

The third set of experiments varies the number of data loader worker processes to determine the optimal number for future experiments. These experiments stop after 11 epochs. Data loading efficiency is evaluated as the time between logging the 2nd and 11th epochs' test metrics divided by 10. Saving the model's state is disabled to eliminate its contribution to the timing.
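As a sketch of how such timings could be compared, the helpers below average the interval between consecutive logged epochs and pick the worker count with the lowest average. In TensorBoard's `EventAccumulator`, each scalar event carries a `wall_time` alongside `step` and `value`, so the timestamps could be collected the same way `get_scalars` collects steps. The function names and the timing numbers are hypothetical.

```python
from typing import Dict, Sequence


def mean_epoch_time(wall_times: Sequence[float]) -> float:
    """Average seconds between consecutive logged epochs."""
    if len(wall_times) < 2:
        raise ValueError("need at least two timestamps")
    return (wall_times[-1] - wall_times[0]) / (len(wall_times) - 1)


def fastest_worker_count(timings: Dict[int, Sequence[float]]) -> int:
    """Return the num_workers value whose run had the lowest mean epoch time."""
    return min(timings, key=lambda workers: mean_epoch_time(timings[workers]))


# Hypothetical wall times (seconds) of the logged test metrics per worker count.
timings = {
    1: [0.0, 30.0, 60.0],
    8: [0.0, 11.0, 22.0],
    12: [0.0, 11.5, 23.0],
}
print(mean_epoch_time(timings[1]))    # 30.0
print(fastest_worker_count(timings))  # 8
```

Because every run covers the same epoch range, the choice of divisor does not change which worker count comes out fastest; it only scales the reported time per epoch.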

___

**Results**

I used the data augmentation transforms I created for Project 1. The data visualization experiment revealed an issue with these transforms: the color jitter transform dramatically changed the images' colors. While this was not detrimental when classifying cats, dogs, and pandas, I suspect it may reduce accuracy on the KenyanFood13 dataset. I will test this hypothesis in the last group of experiments after the assignment has been completed.

To properly set the data augmentation transform parameters, I ran several experiments not shown in this notebook. Each of these experiments disabled all but one type of augmentation in order to "tune" it. For example, to set the hue parameter of the color jitter transform, I disabled the horizontal/vertical flip, affine, and erasing transforms, and set the color jitter's brightness, contrast, and saturation to the values that reproduce the original image. I then found acceptable minimum and maximum values for the hue parameter. After conducting all of these tuning experiments, I updated the configuration file and re-ran the data visualization experiment. I visualized the entire dataset to external files, but logged only the following 6 x 6 contact sheets to TensorBoard:
* ExpAAA - Original Images
* ExpAAA - "Tuned" Augmented Images
* ExpAAB - "Untuned" Augmented Images
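The one-augmentation-at-a-time tuning procedure described above can be sketched as a neutral baseline that leaves every image unchanged, with only the parameter under study overridden. The keys mirror the `data_aug_*` arguments that `ExpAAB` passes to `VisualExperiment`; the neutral values, the helper name `isolate`, and the example hue range are assumptions for illustration, not the values used in this project.

```python
from typing import Any, Dict

# Assumed "neutral" settings: each one leaves the image unchanged.
NEUTRAL_AUGMENTATION: Dict[str, Any] = {
    "data_aug_color_enabled": True,
    "data_aug_color_brightness": (1.0, 1.0),  # factor 1.0 keeps the original brightness
    "data_aug_color_contrast": (1.0, 1.0),
    "data_aug_color_saturation": (1.0, 1.0),
    "data_aug_color_hue": (0.0, 0.0),          # no hue shift
    "data_aug_horz_flip_prob": 0.0,
    "data_aug_vert_flip_prob": 0.0,
    "data_aug_affine_enabled": False,
    "data_aug_erasing_prob": 0.0,
}


def isolate(**overrides: Any) -> Dict[str, Any]:
    """Start from the neutral settings and change only the parameters being tuned."""
    settings = dict(NEUTRAL_AUGMENTATION)  # copy so the baseline is never mutated
    settings.update(overrides)
    return settings


# Tune only the hue range (hypothetical candidate values).
kwargs = isolate(data_aug_color_hue=(-0.05, 0.05))
print(kwargs["data_aug_color_hue"])       # (-0.05, 0.05)
print(kwargs["data_aug_horz_flip_prob"])  # 0.0
# In a VisualExperiment subclass, these could be forwarded with
# super().__init__(log_originals=False, log_augmentations=True, **kwargs)
```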

The training pipeline check experiment performed as expected. The training and test losses decreased, and the accuracy increased to ~60%.

#### Group A, Set A (Data Visualization)

In [None]:
class ExpAAA(VisualExperiment):
    """
    Log original and augmented images using "tuned" parameters.
    """
    def __init__(self):
        super().__init__()

In [None]:
class ExpAAB(VisualExperiment):
    """
    Log augmented images using Project 1's "untuned" parameters.
    """   
    def __init__(self):
        super().__init__(
            log_originals = False,
            log_augmentations = True,
            data_aug_color_enabled = True,
            data_aug_color_brightness = (0.75, 1.25),
            data_aug_color_contrast = (0.75, 1.25),
            data_aug_color_saturation = (0.75, 1.25),
            data_aug_color_hue = (-0.25, 0.25),
            data_aug_horz_flip_prob = 0.5,
            data_aug_vert_flip_prob = 0.5,
            data_aug_affine_enabled = True,
            data_aug_affine_rotation = 45,
            data_aug_affine_translate = (0.2, 0.2),
            data_aug_affine_scale = (0.8, 1.2),
            data_aug_erasing_prob = 0.5,
            data_aug_erasing_scale = (0.02, 0.33),
            data_aug_erasing_ratio = (0.3, 3.3)
        )

#### Group A, Set B (Training Pipeline Check)

In [None]:
class ExpABA(ModelExperiment):
    def __init__(self):
        super().__init__(
            use_data_subsets = True,
            data_augmentation = False,
            trainer_training_epochs = 40 
        )
    def _get_model(self):
        return ResNet18(pretrained=True, tuning_level=0), "ResNet18-PT0"

In [None]:
class ExpABB(ModelExperiment):
    def __init__(self):
        super().__init__(
            use_data_subsets = True,
            data_augmentation = False,
            trainer_training_epochs = 40, 
            trainer_stop_acc_epochs = 10,
            trainer_stop_acc_ema_alpha = 0.3,
            trainer_stop_acc_threshold = 2.0
        )
    def _get_model(self):
        return ResNet18(pretrained=True, tuning_level=0), "ResNet18-PT0"

#### Group A, Set C (Data Loader Optimization)

In [None]:
class ExpAC_(ModelExperiment):
    def __init__(
        self,
        exp_id: str, # expects a single uppercase letter
        data_loader_num_workers: int
    ):
        super().__init__(
            abbr = "AC" + exp_id,
            lr_scheduler_step_size = 20,
            trainer_training_epochs = 11,
            trainer_model_saving_period = -1, # disable
            data_loader_num_workers = data_loader_num_workers
        )
    def _get_model(self):
        return ResNet18(pretrained=True, tuning_level=0), "ResNet18-PT0"

#### Group A - Conduct Function

In [None]:
def conduct_group_A():
    # set 1 - visualize data
    conduct(ExpAAA())
    conduct(ExpAAB())
    # set 2 - check trainer pipeline
    conduct(ExpABA(), log_graph=False, log_pr_curves=False, log_confusion_matrix=False)
    conduct(ExpABB(), log_graph=False, log_pr_curves=False, log_confusion_matrix=False)
    # set 3 - optimize the data loader (1 to 16 workers, experiments ACA to ACP)
    for num_workers in range(1, 17):
        exp_id = chr(ord('A') + num_workers - 1)
        conduct(ExpAC_(exp_id, num_workers), log_graph=False, log_pr_curves=False, log_confusion_matrix=False)

### Experiment Group B: Exploring Transfer Learning Approaches

**Summary**

This group of experiments explores transfer learning. Each set in this group applies transfer learning to a specific model and conducts the following experiments:
* ExpB?A - Train the entire untrained model
* ExpB?B - Train the pretrained model's classifier layer (tuning_level = 0)
* ExpB?C - Train the pretrained model's classifier layer and last convolution block (tuning_level = 1)
* ExpB?D - Train the pretrained model's classifier layer and last 2 convolution blocks (tuning_level = 2)
* ExpB?E - Train the pretrained model's classifier layer and last 3 convolution blocks (tuning_level = 3)
* ExpB?F - Train the pretrained model's classifier layer and last 4 convolution blocks (tuning_level = 4)
* ExpB?G - Train the entire pretrained model (tuning_level = 5)
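The `TorchVisionModel` implementation is not shown in this section, so the following sketch only illustrates one assumed interpretation of `tuning_level` for a torchvision ResNet: level 0 trains the `fc` classifier, levels 1 to 4 additionally unfreeze the last 1 to 4 residual stages (`layer4` back to `layer1`), and level 5 trains everything. The helper `trainable_blocks` and the block lists are hypothetical.

```python
from typing import List

# Top-level parameterized blocks of a torchvision ResNet, from input to output.
RESNET_BLOCKS = ["conv1", "bn1", "layer1", "layer2", "layer3", "layer4", "fc"]
CONV_STAGES = ["layer1", "layer2", "layer3", "layer4"]


def trainable_blocks(tuning_level: int) -> List[str]:
    """Blocks left unfrozen for a given tuning level (assumed interpretation)."""
    if not 0 <= tuning_level <= 5:
        raise ValueError("tuning_level must be between 0 and 5")
    if tuning_level == 5:
        return list(RESNET_BLOCKS)  # fine-tune the entire model
    # Keep the last `tuning_level` residual stages plus the classifier trainable.
    return CONV_STAGES[len(CONV_STAGES) - tuning_level:] + ["fc"]


for level in range(6):
    print(level, trainable_blocks(level))

# With PyTorch, the freezing step could then look like:
#   trainable = set(trainable_blocks(level))
#   for name, param in model.named_parameters():
#       param.requires_grad = name.split(".")[0] in trainable
```

Freezing by top-level block name keeps the rule independent of how many layers each residual stage contains; VGG and DenseNet would need their own block lists.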

Preliminary tests indicate that the larger models can overfit even when the KenyanFood13 images are heavily augmented. My first inclination, which I rejected, was to explore transfer learning on one representative from each of the ResNet, VGG, and DenseNet model families, selecting the model with the lowest ImageNet Top-1 error. However, these representatives are significantly more complex than their siblings, so they are more likely to overfit. Consequently, I will perform transfer learning experiments on every implemented model:
* ResNet-18
* ResNet-34
* ResNet-50
* ResNet-101
* ResNet-152
* ResNeXt-50-32x4d
* ResNeXt-101-32x8d
* Wide ResNet-50-2
* Wide ResNet-101-2
* VGG-11 with batch normalization
* VGG-13 with batch normalization
* VGG-16 with batch normalization
* VGG-19 with batch normalization
* DenseNet-121
* DenseNet-169
* DenseNet-201
* DenseNet-161

Training stops after 100 epochs or when the smoothed accuracy (computed as an exponential moving average with alpha = 0.3) does not improve by 1% within 10 epochs.

___

**Results**

TBD

#### Group B, Sets A, B, C, ... (Transfer Learning) 

In [None]:
class ExpB__(ModelExperiment):
    def __init__(
        self,
        set_id: str, # expects a single uppercase letter
        exp_id: str, # expects a single uppercase letter
        model_type: TorchVisionModel,
        model_abbr: str
    ):
        # exp_id -> (pretrained, tuning_level, model abbreviation suffix)
        settings = {
            'A': (False, 0, ""),
            'B': (True, 0, "-PT0"),
            'C': (True, 1, "-PT1"),
            'D': (True, 2, "-PT2"),
            'E': (True, 3, "-PT3"),
            'F': (True, 4, "-PT4"),
            'G': (True, 5, "-PT5"),
        }
        self.__model_type = model_type
        self.__pretrained, self.__tuning_level, suffix = settings[exp_id]
        self.__model_abbr = model_abbr + suffix

        super().__init__(
            abbr = 'B' + set_id + exp_id,
            data_loader_num_workers = 12,
            trainer_training_epochs = 100, 
            trainer_stop_acc_epochs = 10,
            trainer_stop_acc_ema_alpha = 0.3,
            trainer_stop_acc_threshold = 1.0
        )

    def _get_model(self):
        return self.__model_type(self.__pretrained, self.__tuning_level), self.__model_abbr

#### Group B - Conduct Function

In [None]:
def conduct_group_B():

    model_types = [
        ResNet18, ResNet34, ResNet50, ResNet101, ResNet152,
        ResNeXt50, ResNeXt101, WideResNet50, WideResNet101,
        VGG11BN, VGG13BN, VGG16BN, VGG19BN,
        DenseNet121, DenseNet169, DenseNet201, DenseNet161
    ]

    set_id = 'A'
    for model_type in model_types:
        model_name = model_type.__name__
        for exp_id in "ABCDEFG":
            conduct(ExpB__(set_id, exp_id, model_type, model_name))
        set_id = chr(ord(set_id[0]) + 1)

#### Group C, Sets A, B, C (Learning Rate Tuning)

In [None]:
class ExpC__(ModelExperiment):
    def __init__(
        self,
        set_id: str, # expects a single uppercase letter
        exp_id: str, # expects a single uppercase letter
        model_type: TorchVisionModel,
        model_abbr: str
    ):
        # exp_id -> (learning rate, model abbreviation suffix)
        settings = {
            'A': (1e-3, "-LR1E-3"),
            'B': (5e-4, "-LR5E-4"),
            'C': (1e-4, "-LR1E-4"),
            'D': (5e-5, "-LR5E-5"),
            'E': (1e-5, "-LR1E-5"),
        }
        self.__model_type = model_type
        lr, suffix = settings[exp_id]
        self.__model_abbr = model_abbr + suffix

        super().__init__(
            abbr = 'C' + set_id + exp_id,
            data_loader_num_workers = 12,
            optimizer_learning_rate = lr,
            lr_scheduler_step_size = 1000,
            trainer_training_epochs = 100,
            trainer_stop_loss_epochs = 20,
        )

    def _get_model(self):
        return self.__model_type(pretrained=True, tuning_level=2), self.__model_abbr

#### Group C - Conduct Function

In [None]:
def conduct_group_C():

    model_types = [ResNeXt101, VGG19BN, DenseNet161]

    set_id = 'A'
    for model_type in model_types:
        model_name = model_type.__name__
        for exp_id in "ABCDE":
            conduct(ExpC__(set_id, exp_id, model_type, model_name))
        set_id = chr(ord(set_id[0]) + 1)

### <font style="color:blue">Main Function</font>

A simple function that conducts groups of experiments.

In [None]:
def main():
    # list the experiment groups to conduct, e.g. ['A', 'B', 'C']; empty by default
    for group in []:
        if group == 'A':
            conduct_group_A()
        elif group == 'B':
            conduct_group_B()
        elif group == 'C':
            conduct_group_C()
    return

In [None]:
if __name__ == '__main__':
    main()

### Analyze Experiment Groups

In [None]:
# Uncomment to run the analysis ... make sure the image destination directory exists!
# analyze_tensor_board_runs(filter="^B")  # analyze Group B experiments
# analyze_tensor_board_runs(filter="^C")  # analyze Group C experiments

## <font style="color:green">8. TensorBoard Dev Scalars Log Link [5 Points]</font>

Share your TensorBoard scalar logs link in this section. You can also (optionally) share your GitHub link if you have pushed this project to GitHub.

For example, [Find Project2 logs here](https://tensorboard.dev/experiment/kMJ4YU0wSNG0IkjrluQ5Dg/#scalars).

## <font style="color:green">9. Kaggle Profile Link [50 Points]</font>

Share your Kaggle profile link here with us so that we can give points for the competition score. 

You should have a minimum accuracy of `75%` on the test data to get all points. If accuracy is less than `70%`, you will not get any points for the section. 

**You must submit `submission.csv` (predictions for the images in `test.csv`) in the `Submit Predictions` tab on Kaggle to receive any evaluation in this section.**