<a href="https://colab.research.google.com/github/Martinmbiro/Malaria/blob/main/01%20Malaria%20modular.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Implementing Modules**
> In this notebook, I'll create the modules for the end-to-end project on cell classification

> 💎 **Pro Tip**
+ Modules help organize code logically, promote code reusability and cleaner code

In [2]:
# to delete helper_modules directory and modules zip folder:
# !rm -rf /content/helper_modules
# !rm -rf /content/modules.zip

In [3]:
# import torch, torchvision, pathlib
import torch, torchvision, pathlib

# print versions
print(f'torch version: {torch.__version__}')
print(f'torchvision version: {torchvision.__version__}')

torch version: 2.6.0+cu124
torchvision version: 0.21.0+cu124


### Load the data
> First, we'll create a directory to hold all the custom modules we write
+ To create directories, we'll make use of the [`pathlib`](https://docs.python.org/3/library/pathlib.html) python module

> 📝 **Note**  
+ To **write** a code cell's content into a `*.py` file, we'll use the _magic command_ `%%writefile filename.py`
+ To **append** a code cell's content to a `*.py` file, we'll use the _magic command_ `%%writefile -a filename.py`
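
Outside a notebook, the two modes map onto ordinary file writes. The sketch below only illustrates that write-versus-append behaviour in a temporary directory; the file name `demo_module.py` and its contents are hypothetical and not part of this project.

```python
import tempfile
from pathlib import Path

# hypothetical module file inside a throwaway directory
tmp_dir = Path(tempfile.mkdtemp())
module_path = tmp_dir / 'demo_module.py'

# rough equivalent of `%%writefile demo_module.py`: create / overwrite the file
module_path.write_text('import math\n')

# rough equivalent of `%%writefile -a demo_module.py`: append to the existing file
with module_path.open(mode='a') as f:
    f.write('\ndef area(r):\n  return math.pi * r ** 2\n')

written_source = module_path.read_text()
print(written_source)
```

Running a `%%writefile` cell (without `-a`) a second time overwrites the file, which is why each module in this notebook starts with a plain write and continues with appends.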

In [4]:
# create directory for helper modules
HELPER_MODULES = pathlib.Path('helper_modules')
HELPER_MODULES.mkdir(parents=True, exist_ok=True)

In [5]:
%%writefile helper_modules/data_loader.py
import torch, kagglehub as kh, shutil, random, zipfile, torchvision, os, numpy as np
from pathlib import Path
from PIL import Image
from torch.utils.data import random_split, Dataset, DataLoader
import torchvision.transforms.v2 as T
from itertools import chain
from random import shuffle, sample

gen = torch.Generator().manual_seed(42)


Writing helper_modules/data_loader.py


In [6]:
%%writefile -a helper_modules/data_loader.py

# a pytorch Dataset to hold images
class CellsDataset(Dataset):
  def __init__(self, cells_list:list, transforms:torchvision.transforms.v2.Compose):
    self.ls = cells_list
    self.transforms = transforms

    # get targets
    _targets = list()
    for x in range(len(self.ls)):
      lb = 0 if self.ls[x].parent.name.lower()=='uninfected' else 1
      _targets.append(lb)
    self.targets = np.array(_targets)

  def __len__(self):
    return len(self.ls)

  def __getitem__(self, idx):
    image = self.transforms(Image.open(self.ls[idx]))
    label = self.targets[idx].item()
    return image, label


Appending to helper_modules/data_loader.py


In [7]:
%%writefile -a helper_modules/data_loader.py

# define image transforms
_cell_transform = T.Compose([
    T.PILToTensor(),
    T.ToDtype(torch.float32, scale=True),
    T.Resize((128, 128)),
    # imagenet normalization:
    #  * the values can be read from the pretrained_cfg attribute of
    #    a timm pretrained model
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])


Appending to helper_modules/data_loader.py


In [8]:
%%writefile -a helper_modules/data_loader.py

# function to download dataset from kaggle, and extract content
def _download_dataset():
  # create download folder
  DOWNLOAD_DIR = Path('malaria')
  DOWNLOAD_DIR.mkdir(parents=True, exist_ok=True)

  # download dataset
  cache = kh.dataset_download(handle='iarunava/cell-images-for-detecting-malaria')

  # archive cache
  shutil.make_archive(base_name=DOWNLOAD_DIR/'malaria', format='zip', root_dir=cache)

  # extract zipped file
  with zipfile.ZipFile(file=DOWNLOAD_DIR/'malaria.zip', mode='r') as zipf:
    zipf.extractall(path=DOWNLOAD_DIR)

  # delete duplicate sub-directories
  if Path.cwd().joinpath('malaria/cell_images/cell_images').is_dir():
    pth = Path.cwd().joinpath('malaria/cell_images/cell_images')
    shutil.rmtree(path=pth, ignore_errors=True)

Appending to helper_modules/data_loader.py


In [9]:
%%writefile -a helper_modules/data_loader.py

# function to return all image paths as shuffled lists
def _make_cells_list() -> tuple[list, list]:
  # set random seed
  random.seed(0)
  # create paths of infected / uninfected
  parasitized = Path.cwd().joinpath('malaria/cell_images/Parasitized')
  uninfected = Path.cwd().joinpath('malaria/cell_images/Uninfected')

  # lists of parasitized / uninfected paths
  sick_list = list(parasitized.glob('*.png'))
  healthy_list = list(uninfected.glob('*.png'))

  # all cells list
  cell_list = sick_list + healthy_list

  # sample random image paths from parasitized & uninfected directories
  # 80% of random parasitized and unparasitized images will be sampled for training
  # the rest will be split into validation and test datasets
  train_list = chain(
        sample(population=sick_list, k=int(len(sick_list)*0.80)),
        sample(population=healthy_list, k=int(len(healthy_list)*0.80)))

  # and make a list of images to be used for training
  train_list = list(train_list)

  # delete the extracted image paths in train_list from cell_list
  for i in train_list:
    if i.is_file():
      cell_list.remove(i)

  # shuffle both lists in place
  shuffle(cell_list)
  shuffle(train_list)

  return cell_list, train_list


Appending to helper_modules/data_loader.py


In [10]:
%%writefile -a helper_modules/data_loader.py

# function to get dataloaders, and label-mapper dict
def get_dataloaders() -> tuple[DataLoader, DataLoader, DataLoader, dict[int, str]]:
  """
    Splits a dataset into training, validation, and test sets, and creates corresponding DataLoader
    instances for each set. It also returns a dictionary that maps class labels to their respective
    string labels.

    The dataset is split into three subsets with the following approximate proportions:
    - 80% for training
    - 15% for validation
    - 5% for testing

    The resulting DataLoaders are configured with a batch size of 32, use of all available CPU cores
    for parallel data loading, and memory pinning for faster data transfer to the GPU.

    Args:
        None

    Returns:
        tuple: A tuple containing the following elements:
            - train_dl (DataLoader): DataLoader for the training set.
            - test_dl (DataLoader): DataLoader for the test set.
            - val_dl (DataLoader): DataLoader for the validation set.
            - class_mapper (dict): A dictionary that maps integer class labels to string labels,
              where 0 is mapped to 'Uninfected' and 1 is mapped to 'Infected'.

    Example:
        train_dl, test_dl, val_dl, class_mapper = get_dataloaders()
    """
  # download dataset
  _download_dataset()

  # get lists
  cell_list, train_list = _make_cells_list()

  # create a torch Dataset from training images
  tr_set = CellsDataset(cells_list=train_list, transforms=_cell_transform)
  # create a torch Dataset from the rest of images
  val_ts_set = CellsDataset(cells_list=cell_list, transforms=_cell_transform)

  # specify size of validation and test sets
  val_size = int(0.75*len(cell_list))
  ts_size = int(len(cell_list) - val_size)

  # test and validation subsets
  ts_set, val_set = random_split(
      dataset=val_ts_set, lengths=[ts_size, val_size], generator=gen)

  # create dataloaders for train, validation, test,
  train_dl = DataLoader(
      dataset=tr_set, batch_size=32, num_workers=os.cpu_count(), pin_memory=True, shuffle=True)
  test_dl = DataLoader(
      dataset=ts_set, batch_size=32, num_workers=os.cpu_count(), pin_memory=True, shuffle=False)
  val_dl = DataLoader(
      dataset=val_set, batch_size=32, num_workers=os.cpu_count(), pin_memory=True, shuffle=False)

  # 0 -> Uninfected, 1 -> Infected
  class_mapper = {0: 'Uninfected', 1:'Infected'}

  return train_dl, test_dl, val_dl, class_mapper


Appending to helper_modules/data_loader.py
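
The split sizes produced by `_make_cells_list()` and `get_dataloaders()` can be checked with a little arithmetic. The sketch below follows the same steps (80% of each class for training, then a 75/25 split of the remainder), but the class counts are hypothetical round numbers chosen for illustration, not the real dataset sizes.

```python
# hypothetical class sizes, for illustration only
n_parasitized, n_uninfected = 1000, 1000
total = n_parasitized + n_uninfected

# 80% of each class goes to training (as in _make_cells_list)
train_size = int(n_parasitized * 0.80) + int(n_uninfected * 0.80)

# the remaining paths are split 75/25 (as in get_dataloaders)
rest = total - train_size
val_size = int(0.75 * rest)
test_size = rest - val_size

# fraction of the whole dataset that ends up in each subset
split = {
    'train': train_size / total,
    'validation': val_size / total,
    'test': test_size / total,
}
print(split)
```

Under these assumptions, the training set holds 80% of all images, the validation set 15% and the test set 5%.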


## Build the [`resnet`](https://www.digitalocean.com/community/tutorials/popular-deep-learning-architectures-resnet-inceptionv3-squeezenet) architecture
> With the help of the [`timm`](https://huggingface.co/docs/timm/v1.0.15/en/quickstart#quickstart) library, I'll load the [`resnet18`](https://huggingface.co/timm/resnet18.a1_in1k) CNN architecture and alter the `classifier` layer by specifying `2` classes

> The function defined here will return a `model`, `Optimizer` and `loss function`
+ Also, we'll use [`torch.optim.Adam`](https://pytorch.org/docs/stable/generated/torch.optim.Adam.html#adam) as optimizer and [`torch.nn.CrossEntropyLoss`](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html#crossentropyloss) as loss function, (since the model will return _logits_ in the shape of `[*, 2]`)
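
To make the link between _logits_ and the loss concrete, here is a minimal plain-Python sketch of what cross-entropy computes for a single sample: a softmax over the two logits, followed by the negative log-probability of the true class. The helper names and the logit values are hypothetical; `torch.nn.CrossEntropyLoss` does the equivalent on whole batches and, by default, averages the result.

```python
import math

def softmax(logits):
    # subtract the max for numerical stability
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    # negative log-probability assigned to the true class
    return -math.log(softmax(logits)[target])

# hypothetical logits for one image: [score_uninfected, score_infected]
logits = [0.5, 2.0]
probs = softmax(logits)
loss_if_infected = cross_entropy(logits, target=1)
loss_if_uninfected = cross_entropy(logits, target=0)
print(probs, loss_if_infected, loss_if_uninfected)
```

The loss is small when the larger logit belongs to the true class and grows when the model favours the wrong class.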


> 🔔 **Info**
+ The model will predict `0` for an `Uninfected` cell and `1` for an `Infected` cell
+ For better generalization and to reduce training time, we'll leverage **transfer learning**. Hence, the model we declare here will be **pre-trained** from the start
+ Also, as earlier specified in `_cell_transform` above, the images will be normalized according to the normalization applied when pretraining the model. See how to do that from the `timm` quickstart guide linked [`here`](https://huggingface.co/docs/timm/v1.0.15/en/quickstart#image-augmentation)
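
As a small illustration of what that normalization does, the sketch below applies the per-channel formula `(value - mean) / std` to a single pixel and then inverts it. It uses the same ImageNet statistics as `_cell_transform`; the pixel values and helper names are hypothetical.

```python
# ImageNet statistics, as used in _cell_transform
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]

def normalize_pixel(rgb):
    # rgb holds channel values already scaled to [0, 1]
    return [(v - m) / s for v, m, s in zip(rgb, mean, std)]

def denormalize_pixel(rgb):
    # inverse operation, handy when displaying normalized images
    return [v * s + m for v, m, s in zip(rgb, mean, std)]

pixel = [0.8, 0.4, 0.6]  # hypothetical pixel
normed = normalize_pixel(pixel)
restored = denormalize_pixel(normed)
print(normed, restored)
```

Channel values above the mean become positive and values below it become negative, which is why normalized images look off when plotted directly.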

In [11]:
%%writefile helper_modules/model_builder.py
import timm, torch

# function to return model, optimizer and loss function
def get_model(device:str) -> tuple[torch.nn.Module, torch.optim.Optimizer, torch.nn.Module]:
  """
    Creates and initializes a model, optimizer, and loss function for training.

    Parameters
    ----------
    device : str
        The device to which the model should be moved. Common values include 'cuda' for GPU or 'cpu' for CPU.

    Returns
    -------
    tuple
        A tuple containing three elements:
        - model (torch.nn.Module): The model initialized with the resnet18 architecture for 2 classes.
        - opt (torch.optim.Optimizer): The Adam optimizer initialized with the model's parameters.
        - loss_fn (torch.nn.Module): The CrossEntropyLoss function used for multi-class classification.
    """
  # model
  model = timm.create_model(
      model_name='resnet18',
      num_classes=2,
      pretrained=True).to(device)

  # optimizer
  opt = torch.optim.Adam(params=model.parameters(), lr=0.001)

  # loss function
  loss_fn = torch.nn.CrossEntropyLoss()

  return model, opt, loss_fn

Writing helper_modules/model_builder.py


## Early stopping
> 💎 **Pro Tip**

> [Early stopping](https://www.linkedin.com/advice/1/what-benefits-drawbacks-early-stopping#:~:text=Early%20stopping%20is%20a%20form,to%20increase%20or%20stops%20improving.) is a mechanism of stopping training when the validation loss stops improving; with a view to preventing _overfitting_ on the training data
+ Here, we'll create a class to take care of _early-stopping_
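
Before the full class, the core idea can be sketched in plain Python: track the best validation loss seen so far, count the epochs without improvement, and stop once that count reaches `patience`. The function name and the loss values below are hypothetical; the sketch covers only the `'loss'` direction and leaves out the model checkpointing.

```python
def best_epoch_with_patience(val_losses, patience=5, min_delta=0.0):
    """Return (best_epoch, stop_epoch) for a list of validation losses."""
    best_epoch, best_loss, counter = None, None, 0
    for epoch, loss in enumerate(val_losses):
        if best_loss is None or best_loss - loss >= min_delta:
            # improvement: remember this epoch and reset the counter
            best_epoch, best_loss, counter = epoch, loss, 0
        else:
            counter += 1
            if counter >= patience:
                return best_epoch, epoch  # training would stop here
    return best_epoch, None  # early stopping never triggered

# hypothetical validation losses, one per epoch
losses = [0.90, 0.70, 0.60, 0.62, 0.61, 0.65, 0.63]
best_epoch, stop_epoch = best_epoch_with_patience(losses, patience=3)
print(best_epoch, stop_epoch)
```

Here the loss last improves at epoch `2`, and training stops three epochs later, at epoch `5`.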

In [12]:
%%writefile helper_modules/utils.py
import torch, pathlib, numpy as np, matplotlib.pyplot as plt
import torch.nn.functional as F
from sklearn.metrics import ConfusionMatrixDisplay
from torch.utils.data import Dataset, Subset
from copy import deepcopy

class EarlyStopping:
  """
  Early stopping to prevent overfitting.

  Attributes
  ----------
  counter : int
      Counter to track the number of epochs without improvement.
  patience : int
      Number of epochs to wait after the last best score.
  min_delta : float
      Minimum change in the monitored quantity to qualify as an improvement.
  score_type : str
      'loss' or 'metric', determines the direction of improvement.
  best_epoch : int
      Epoch with the best score.
  best_score : float
      Best score achieved so far.
  best_state_dict : dict
      State dictionary of the model at the best score.
  stop_early : bool
      Flag to indicate if early stopping should be triggered.
  """

  def __init__(self, score_type: str, min_delta: float = 0.0, patience: int = 5):
    """
    Initializes the EarlyStopping object.

    Parameters
    ----------
    score_type : str
        'loss' or 'metric', determines the direction of improvement.
    min_delta : float, optional
        Minimum change in the monitored quantity to qualify as an improvement. Defaults to 0.0.
    patience : int, optional
        Number of epochs to wait after the last best score. Defaults to 5.

    Raises
    ------
    Exception
        If score_type is not 'metric' or 'loss'.
    """
    self.counter = 0
    self.patience = patience
    self.min_delta = min_delta
    self.score_type = score_type
    self.best_epoch = None
    self.best_score = None
    self.best_state_dict = None
    self.stop_early = False

    if (self.score_type != 'metric') and (self.score_type != 'loss'):
        err_msg = 'score_type can only be "metric" or "loss"'
        raise Exception(err_msg)

  def __call__(self, model: torch.nn.Module, ep: int, ts_score: float):
    """
    Checks if early stopping should be triggered based on the current score.

    Parameters
    ----------
    model : torch.nn.Module
        The model being trained.
    ep : int
        The current epoch number.
    ts_score : float
        The current score (loss or metric).
    """
    if self.best_epoch is None:
        self.best_epoch = ep
        self.best_score = ts_score
        self.best_state_dict = deepcopy(model.state_dict())

    elif (self.best_score - ts_score >= self.min_delta) and (self.score_type == 'loss'):
        self.best_epoch = ep
        self.best_score = ts_score
        self.best_state_dict = deepcopy(model.state_dict())
        self.counter = 0

    elif (ts_score - self.best_score >= self.min_delta) and (self.score_type == 'metric'):
        self.best_epoch = ep
        self.best_score = ts_score
        self.best_state_dict = deepcopy(model.state_dict())
        self.counter = 0

    else:
        self.counter += 1
        if self.counter >= self.patience:
            self.stop_early = True


Writing helper_modules/utils.py


## Model training / evaluation
> Here, I'll define functions for training and testing batches of data, as well as a function to return true labels, `y_true`, prediction labels, `y_pred` and prediction probabilities `y_proba`
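
For reference, the two metrics tracked during training can be written out in a few lines of plain Python. This is only a sketch with hypothetical labels and helper names; the modules below rely on `accuracy_score` and `recall_score` from scikit-learn, where class `1` (Infected) is the positive class by default for binary labels.

```python
def accuracy(y_true, y_pred):
    # fraction of predictions that match the true label
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def recall(y_true, y_pred, positive=1):
    # of all truly positive samples, the fraction predicted positive
    true_pos = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    return true_pos / actual_pos if actual_pos else 0.0

# hypothetical labels: 0 -> Uninfected, 1 -> Infected
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)
print(acc, rec)
```

Recall matters here because a missed infected cell (a false negative) is usually the costlier mistake. Note that averaging per-batch values, as the functions below do, can differ slightly from computing a metric once over all samples.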

In [13]:
%%writefile helper_modules/train_test.py
import torch, numpy as np
import torch.nn.functional as F
from sklearn.metrics import recall_score, accuracy_score

# function for model training
def train_batches(model: torch.nn.Module, train_dl: torch.utils.data.DataLoader,
                  optimizer: torch.optim.Optimizer, loss_fn: torch.nn.Module, device: str) -> tuple[float, float, float]:
  """
  Trains model on all batches of the training set DataLoader and returns
  average training loss, accuracy, and recall score.

  Parameters
  ----------
  model : torch.nn.Module
      The model being trained.
  train_dl : torch.utils.data.DataLoader
      DataLoader for training data.
  optimizer : torch.optim.Optimizer
      The optimizer.
  loss_fn : torch.nn.Module
      Function used to calculate loss.
  device : str
      The device on which computation occurs.

  Returns
  -------
  tuple
      A tuple containing:
          - ls (float): Average training loss across all batches.
          - acc (float): Average training accuracy across all batches.
          - rec (float): Average training recall score across all batches.
  """
  # for reproducibility
  torch.manual_seed(0)
  torch.cuda.manual_seed(0)
  ls, acc, rec = 0, 0, 0

  # training mode
  model.train()

  for x, y in train_dl:
      # move x, y to device
      x, y = x.to(device), y.to(device)
      # zero_grad
      optimizer.zero_grad()

      # forward pass
      logits = model(x)
      y_pred = F.softmax(logits, dim=1).argmax(dim=1).cpu().numpy()

      # loss
      loss = loss_fn(logits, y)
      # accumulate values
      ls += loss.item()
      acc += accuracy_score(y_true=y.cpu().numpy(), y_pred=y_pred)
      rec += recall_score(y_true=y.cpu().numpy(), y_pred=y_pred)

      # back propagation
      loss.backward()
      # optimizer step
      optimizer.step()

  # compute averages
  ls /= len(train_dl)
  acc /= len(train_dl)
  rec /= len(train_dl)

  # return values
  return ls, acc, rec


def test_batches(model: torch.nn.Module, val_dl: torch.utils.data.DataLoader,
                 loss_fn: torch.nn.Module, device: str) -> tuple[float, float, float]:
  """
  Evaluates model on all batches of the test set DataLoader and returns
  average test loss, accuracy, and recall score.

  Parameters
  ----------
  model : torch.nn.Module
      The model being evaluated.
  val_dl : torch.utils.data.DataLoader
      DataLoader for validation (or test) data.
  loss_fn : torch.nn.Module
      Function used to calculate loss.
  device : str
      The device on which computation occurs.

  Returns
  -------
  tuple
      A tuple containing:
          - ls (float): Average test loss across all batches.
          - acc (float): Average test accuracy across all batches.
          - rec (float): Average test recall score across all batches.
  """
  ls, rec, acc = 0, 0, 0

  # evaluation-mode
  model.eval()

  with torch.inference_mode():
    for x, y in val_dl:
        # move x, y to device
        x, y = x.to(device), y.to(device)

        # forward pass
        logits = model(x)
        y_pred = F.softmax(logits, dim=1).argmax(dim=1).cpu().numpy()

        # loss
        loss = loss_fn(logits, y)

        # accumulate values
        ls += loss.item()
        acc += accuracy_score(y_true=y.cpu().numpy(), y_pred=y_pred)
        rec += recall_score(y_true=y.cpu().numpy(), y_pred=y_pred)

  # compute averages
  ls /= len(val_dl)
  acc /= len(val_dl)
  rec /= len(val_dl)

  # return values
  return ls, acc, rec


def true_preds_proba(model: torch.nn.Module, test_dl: torch.utils.data.DataLoader,
                     device: str) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
  """
  A function that returns true labels, predictions, and prediction probabilities
  from the passed DataLoader.

  Parameters
  ----------
  model : torch.nn.Module
      A neural network that subclasses torch.nn.Module.
  test_dl : torch.utils.data.DataLoader
      A DataLoader for the test dataset.
  device : str
      The device on which computation occurs.

  Returns
  -------
  tuple
      A tuple containing:
          - y_true (np.ndarray): A numpy array with true labels.
          - y_pred (np.ndarray): A numpy array with predicted labels.
          - y_proba (np.ndarray): A numpy array with predicted probabilities.
  """
  # empty lists
  y_true, y_preds, y_proba = list(), list(), list()
  with torch.inference_mode():
      model.eval()  # set eval mode
      for x, y in test_dl:
          # move x to device
          x = x.to(device)

          # make prediction
          logits = model(x)

          # prediction probabilities and predicted labels
          proba = F.softmax(logits, dim=1)
          pred = proba.argmax(dim=1)

          # append
          y_preds.append(pred)
          y_proba.append(proba)
          y_true.append(y)

  y_preds = torch.concatenate(y_preds).cpu().numpy()
  y_proba = torch.concatenate(y_proba).cpu().numpy()
  y_true = torch.concatenate(y_true).numpy()

  return y_true, y_preds, y_proba


Writing helper_modules/train_test.py


## Plot results
> Here, I'll define helper functions for plotting training metrics

In [14]:
%%writefile -a helper_modules/utils.py

# function to plot train and test results
def plot_train_results(ep_list: list, train_score: list, test_score: list,
                       ylabel: str, title: str, best_epoch: int):
  """
  Plots training and test results against each other.

  Parameters
  ----------
  ep_list : list
      A list containing all epochs used in the optimization loop.
  train_score : list
      A list containing the training scores from the optimization loop.
  test_score : list
      A list containing the test scores from the optimization loop.
  ylabel : str
      Label for the y-axis of the plot.
  title : str
      Title for the plot.
  best_epoch : int
      Best epoch for which early stopping occurred.

  Returns
  -------
  None
  """
  f, ax = plt.subplots(figsize=(5, 3), layout='constrained')

  # train loss
  ax.plot(ep_list, train_score, label='Training',
          linewidth=1.7, color='#0047ab')

  # test loss
  ax.plot(ep_list, test_score, label='Validation',
          linewidth=1.7, color='#990000')
  # vertical line (for early stopping)
  if best_epoch is not None:
      ax.axvline(best_epoch, linestyle='--', color='#000000', linewidth=1.0,
                  label=f'Best ep ({best_epoch})')

  # axis, title
  ax.set_title(title, weight='black')
  ax.set_ylabel(ylabel)
  ax.set_xlabel('Epoch')
  ax.tick_params(axis='both', labelsize=9)
  plt.grid(color='#e5e4e2')

  # legend
  f.legend(fontsize=9, loc='upper right',
            bbox_to_anchor=(1.28, 0.93),
            fancybox=False)

  plt.show()


def plot_confusion_matrix(y_true: np.ndarray, y_pred: np.ndarray):
  """
  Plots a confusion matrix for all classes.

  Parameters
  ----------
  y_true : np.ndarray
      An ndarray containing the true label values.
  y_pred : np.ndarray
      An ndarray containing the predicted label values.

  Returns
  -------
  None
  """
  # define figure and plot
  _, ax = plt.subplots(figsize=(3.0, 3.0), layout='compressed')
  # plot
  ConfusionMatrixDisplay.from_predictions(
      y_true=y_true,
      y_pred=y_pred, cmap='Blues', colorbar=False, ax=ax)

  # set x and y labels
  ax.set_ylabel('True Labels', weight='black')
  ax.set_xlabel('Predicted Labels', weight='black',
                color='#dc143c')
  # set tick size and position
  ax.xaxis.tick_top()
  ax.xaxis.set_label_position('top')
  ax.tick_params(axis='both', labelsize=9)

  # change annotation font
  for txt in ax.texts:
      txt.set_fontsize(9)

  plt.show()


Appending to helper_modules/utils.py


## Save model
> 🔔 **Info**

> Pytorch's recommended way of saving a model is by saving its [`state_dict`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.state_dict). To do this, the [documentation](https://pytorch.org/tutorials/beginner/saving_loading_models.html#save-load-state-dict-recommended) recommends calling [`torch.save(obj=model.state_dict(), f=PATH)`](https://pytorch.org/docs/stable/generated/torch.save.html#torch-save)
+ `f` - a file-like object or a string or `os.PathLike` object containing a file name. To work with paths, we'll use Python's [`pathlib`](https://docs.python.org/3/library/pathlib.html) module
+ A common PyTorch convention is to save models using either a `.pt` or `.pth` file extension
+ Also, it's good practice to move the model to the `cpu` before saving its `state_dict`

In [15]:
%%writefile -a helper_modules/utils.py

# function to save model to specified directory
def save_model(model: torch.nn.Module, path: pathlib.PosixPath):
    """
    Saves the model's state_dict to a specified path.

    Parameters
    ----------
    model : torch.nn.Module
        The model to save.
    path : pathlib.PosixPath
        The path where the model's state_dict will be saved.

    Returns
    -------
    None
    """
    torch.save(obj=model.cpu().state_dict(), f=path)
    print(f"MODEL'S state_dict SAVED TO: {path}")


Appending to helper_modules/utils.py


### Load saved model
> To load a previously saved model's `state_dict`, we call
 [`torch.load(f=PATH, weights_only=True)`](https://pytorch.org/docs/stable/generated/torch.load.html#torch.load) that loads an object saved using [`torch.save()`](https://pytorch.org/docs/stable/generated/torch.save.html#torch-save) from a file:

```
    model = TheModelClass(*args, **kwargs)
    model.load_state_dict(torch.load(PATH, weights_only=True))
    model.eval()
```

> 🔔 **Info**
+ Remember that you must call `model.eval()` before running inference
+ `f` - a file-like object or a string or `os.PathLike` object containing a file name. To work with paths, we'll use Python's [`pathlib`](https://docs.python.org/3/library/pathlib.html) module
+ Note that a `model` class must have been defined earlier, before calling [`model.load_state_dict()`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.load_state_dict) on the object

In [16]:
%%writefile -a helper_modules/utils.py

# function to load model from a specified path
def load_model(model: torch.nn.Module, path: pathlib.PosixPath):
  """
  Loads the model's state_dict from a specified path.

  Parameters
  ----------
  model : torch.nn.Module
      A new object of the model class.
  path : pathlib.PosixPath
      Path pointing to a previously saved model's state_dict.

  Returns
  -------
  model : torch.nn.Module
      The model returned after loading the state_dict.
  """
  # overwrite state_dict
  model.load_state_dict(torch.load(f=path, weights_only=True))
  return model


Appending to helper_modules/utils.py


## Make inference
> Here, I'll declare functions to make inference:
+ On a single random image
+ On multiple `[12]` random images

In [17]:
%%writefile -a helper_modules/utils.py

# function to make inference on a single random image
def make_single_inference(model: torch.nn.Module, dataset: torch.utils.data.Dataset,
                          label_map: dict, device: str):
  """
  Makes inference using a random data point from the test dataset.

  Parameters
  ----------
  model : torch.nn.Module
      A model (subclassing torch.nn.Module) to make inference.
  dataset : torch.utils.data.Dataset
      The Dataset to use for testing purposes.
  label_map : dict
      A dictionary mapping indices to labels (e.g., {0: 'Uninfected', 1: 'Infected'}).
  device : str
      Device on which to perform computation.

  Returns
  -------
  None
  """
  # get random image from test_set
  idx = np.random.choice(len(dataset))
  img, lb = dataset[idx]

  # make prediction
  with torch.inference_mode():
    model.to(device)  # move model to device
    model.eval()  # set eval mode
    lgts = model(img.unsqueeze(0).to(device))
    pred = F.softmax(lgts, dim=1).argmax(dim=1)

  # print actual retrieved image
  plt.figure(figsize=(2.0, 2.0))
  # title with label
  if pred == lb:
    plt.title(
        f'Actual: {label_map[lb]}\nPred: {label_map[pred.item()]}',
        fontsize=7)
  else:  # if labels do not match, title = with red color
    plt.title(
        f'Actual: {label_map[lb]}\nPred: {label_map[pred.item()]}',
        fontsize=7, color='#de3163', weight='black')
  plt.axis(False)
  plt.imshow(img.permute(1,2,0).clamp(min=0, max=1))
  plt.show()


def make_multiple_inference(model: torch.nn.Module, dataset: torch.utils.data.Dataset,
                          label_map: dict, device: str):
  """
  Makes inference on multiple random images from the test dataset.

  Parameters
  ----------
  model : torch.nn.Module
      A model (subclassing torch.nn.Module) to make inference.
  dataset : torch.utils.data.Dataset
      The Dataset used for evaluation purposes.
  label_map : dict
      A dictionary mapping indices to labels (e.g., {0: 'Uninfected', 1: 'Infected'}).
  device : str
      Device on which to perform computation.

  Returns
  -------
  None
  """
  # get array of 12 random indices of images in test_dataset
  indices = np.random.choice(len(dataset), size=12, replace=False)
  # create subset from the 12 indices
  sub_set = Subset(dataset=dataset, indices=indices)

  # define a figure and subplots
  f, axs = plt.subplots(2, 6, figsize=(7.5, 5.5), layout='compressed')

  # move model to device & set eval mode
  model.to(device)
  model.eval()

  # loop through each subplot
  for i, ax in enumerate(axs.flat):
    img, lb = sub_set[i]  # return image and label

    # make inference on image returned
    with torch.inference_mode():
        lg = model(img.unsqueeze(0).to(device))
        pred = F.softmax(lg, dim=1).argmax(dim=1)

    ax.imshow(img.permute(1,2,0).clamp(min=0, max=1))
    ax.axis(False)
    if pred == lb:
        ax.set_title(
            f'Actual: {label_map[lb]}\nPred: {label_map[pred.item()]}',
            fontsize=7)
    else:  # if labels do not match, title = with red color
        ax.set_title(
            f'Actual: {label_map[lb]}\nPred: {label_map[pred.item()]}',
            fontsize=7, color='#de3163', weight='black')

  f.suptitle('Inference Made on 12 Random Test Images',
              weight='black', y=0.83)
  plt.show()

Appending to helper_modules/utils.py


## Archive modules
> Here, we'll create a function to archive all the modules `*.py` files into a `*.zip` file with the help of [`make_archive`](https://docs.python.org/3/library/shutil.html#shutil.make_archive) function from the [`shutil`](https://docs.python.org/3/library/shutil.html#module-shutil) python module

> ✋ **Info**
+ The `zip` file containing the helper modules will be then uploaded to the GitHub repository [here](https://github.com/Martinmbiro/Malaria/raw/refs/heads/main/helper%20modules/modules.zip). That way, the modules can be downloaded and extracted dynamically in code
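
The archive-then-extract round trip can be tried locally before involving GitHub. The sketch below works in a temporary directory with a hypothetical one-file module folder; the download step is left out, and all paths and file names are made up for illustration.

```python
import shutil, tempfile, zipfile
from pathlib import Path

# throwaway workspace with a hypothetical module directory
work = Path(tempfile.mkdtemp())
src = work / 'helper_modules'
src.mkdir()
(src / 'demo.py').write_text('VALUE = 1\n')

# archive the directory: creates <work>/modules.zip
zip_path = shutil.make_archive(
    base_name=str(work / 'modules'), format='zip', root_dir=src)

# extract it somewhere else, as a consumer of the archive would
dest = work / 'extracted'
with zipfile.ZipFile(zip_path, mode='r') as zipf:
    zipf.extractall(path=dest)

extracted_files = sorted(p.name for p in dest.iterdir())
print(extracted_files)
```

Because `root_dir` points at the module folder itself, the archive contains the `*.py` files at its top level rather than nested inside a `helper_modules/` directory.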

In [18]:
import shutil, pathlib

def archive_modules(path_to_files: pathlib.PosixPath, zip_name: str):
    """
    Archive a directory into a ZIP file.

    Parameters
    ----------
    path_to_files : pathlib.PosixPath
        The path to the directory or files to be archived.

    zip_name : str
        The name of the resulting ZIP file (without the extension).

    Returns
    -------
    None
        This function does not return any value. It creates a ZIP archive at the specified location.

    Notes
    -----
    This function uses `shutil.make_archive` to create the archive. The archive will be created
    in the current working directory unless a full path is provided in the `zip_name`.

    Examples
    --------
    >>> archive_modules(pathlib.Path('/path/to/files'), 'my_archive')
    This will create a ZIP archive named 'my_archive.zip' containing the files from the specified directory.

    """
    shutil.make_archive(
        base_name=zip_name, format='zip', root_dir=path_to_files)

In [19]:
%%time
# archive modules
archive_modules(HELPER_MODULES, 'modules')

CPU times: user 1.47 ms, sys: 1.92 ms, total: 3.39 ms
Wall time: 6.9 ms


> ▶️ **Up Next**
+ Having created modules out of the most reusable code, I'll implement an end-to-end project for classifying cell images in the subsequent notebook, `02 Malaria end to end.ipynb`