# BiteMe | Train

This notebook includes the most important part of the project - the modelling. The notebook tests methodologies for training, and in it the chosen algorithm is decided. Validation also occurs before final testing, which is conducted in the test notebook. This stage is highly iterative, so all model artefacts, logs and configurations are recorded and saved to disk automatically. This initial setup of what will eventually become MLOps for the final product will be really useful, and helps keep track of what is successful and what isn't.

Models to try:
 - resnet50v2
 - resnet101v2
 - resnet152v2
 - vgg19
 - densenet169
 - densenet121
 - densenet201
 - inceptionv3
 - inception_resnetv2
 - resnext50
 - resnext101
 - xception
 - efficientnet_b0
 - efficientnet_b1
 - efficientnet_b2
 - efficientnet_b3
 - efficientnet_b4
 - efficientnet_b5

Initial model work is done by using simple, typical image recognition models (CNN architectures) to see how effective these models can be for the problem. Although I don't expect them to be particularly successful, it's important to establish baselines and take a holistic approach to modelling when it's possible.


# TODO:

Was previously working on dataloader - have to convert labels to one hot encoded!

In [2]:
# Basic imports
import pandas as pd
import numpy as np
import os
import sys
from argparse import ArgumentParser


# Data visualisation
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn

# Image processing
import cv2
import albumentations as A
import imgaug as ia
import imgaug.augmenters as iaa

# Model evaluation
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import accuracy_score, recall_score, precision_score, roc_auc_score, f1_score

import torch
import pretrainedmodels
import pytorch_lightning as pl
from pytorch_lightning.callbacks import EarlyStopping, ModelCheckpoint

# Local imports
sys.path.append("..")
from models.models import se_resnet50
from utils.loss_function import CrossEntropyLossOneHot
from utils.lrs_scheduler import WarmRestart, warm_restart
from utils.utils import read_images, augs, get_augs, seed_reproducer, init_logger
from utils.constants import *

plt.rcParams["figure.figsize"] = (14, 8)

In [3]:
# Define directories
base_dir_path = "../"

data_dir_path = os.path.join(base_dir_path, "data")
data_preprocessed_dir_path = os.path.join(data_dir_path, "preprocessed")
data_preprocessed_train_dir_path = os.path.join(data_dir_path, "preprocessed/train")

data_dir = os.listdir(data_dir_path)
data_preprocessed_dir = os.listdir(data_preprocessed_dir_path)
data_preprocessed_train_dir = os.listdir(data_preprocessed_train_dir_path)

metadata_preprocessed_path = os.path.join(data_preprocessed_dir_path, "metadata.csv")
metadata = pd.read_csv(metadata_preprocessed_path)
# Subset to train only
metadata = metadata.loc[metadata.split == "train"]

metadata.head()

Unnamed: 0,img_name,img_path,label,split
0,7059b14d2aa03ed6c4de11afa32591995181d31c.jpg,../data/cleaned/none/7059b14d2aa03ed6c4de11afa...,none,train
1,ea1b100b581fcdb7ddfae52cc62347a99e304ba4.jpg,../data/cleaned/none/ea1b100b581fcdb7ddfae52cc...,none,train
3,6eac051b9c45ff6821ec8675216f371711b7cea9.jpg,../data/cleaned/none/6eac051b9c45ff6821ec86752...,none,train
4,fc72767f8520df9b2b83941077dc0ee013eb9399.jpg,../data/cleaned/none/fc72767f8520df9b2b8394107...,none,train
5,cf812984268e2aec9a167d3ebe1026f610dd862b.jpg,../data/cleaned/none/cf812984268e2aec9a167d3eb...,none,train


In [4]:
# Read in train images
X_train = read_images(
    data_dir_path=data_preprocessed_train_dir_path, 
    rows=ROWS, 
    cols=COLS, 
    channels=CHANNELS, 
    write_images=False, 
    output_data_dir_path=None,
    verbose=VERBOSE
)

# Get labels
y_train = np.array(metadata["label"])

Reading images from: ../data/preprocessed/train
Rows set to 512
Columns set to 512
Channels set to 3
Writing images is set to: False
Reading images...


100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 25/25 [00:00<00:00, 195.45it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 26/26 [00:00<00:00, 103.28it/s]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 21/21 [00:00<00:00, 72.68it/s]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 25/25 [00:00<00:00, 56.91it/s]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 25/25 [00:00<00:00, 45.60it/s]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 22/22 [00:00<00:00, 38.

Image reading complete.
Image array shape: (192, 512, 512, 3)


## Set Parameters

In [5]:
# Choose augmentations to use in preprocessing
# For full list see helpers.py
augs_to_select = [
    "Fliplr", 
    "Flipud"
]
# Subset augs based on those selected
augs = dict((aug_name, augs[aug_name]) for aug_name in augs_to_select)

# Modelling constants - add this to constants.py when needed
MODEL_NAME = "se_resnet50"
MAX_EPOCHS = 6

# Create dictionary of configurations used in modelling
# this will be updated as modelling progresses if necessary, for logging
conf = {
    "device": "cuda" if torch.cuda.device_count() > 0 else "cpu",
    "device_name": torch.cuda.get_device_name(0),
    "n_workers": 128,
    "rows": ROWS,
    "cols": COLS,
    "channels": CHANNELS,
    "seed": SEED,
    "n_classes": len(metadata["label"].unique()),
    "classes": np.unique(y_train, return_counts=True)[0],
    "class_counts": np.unique(y_train, return_counts=True)[1],
    "test_size": TEST_SIZE,
    "num_train_sample": y_train.shape[0],
    "num_augs": len(augs),
    "augs": augs,
    "model_name": MODEL_NAME,
    "train_batch_size": 16,
    "val_batch_size": 16,
    "max_epochs": MAX_EPOCHS,
    "lr": 1e-5,
    "optimizer": "Adam",
    "n_splits": N_SPLITS,
    "precision": PRECISION,
    "gradient_clip_val": GRADIENT_CLIP_VAL
}

In [6]:
def init_hparams():
    """
    Initialise hyperparameters for modelling.
    
    Returns
    ---------
    hparams : argparse.Namespace
        Parsed hyperparameters
    """
    parser = ArgumentParser(add_help=False)
    parser.add_argument("-backbone", "--backbone", type=str, default="se_resnet50")
    parser.add_argument("--gpus", default=[0])
    parser.add_argument("--n_workers", type=int, default=conf["n_workers"])
    parser.add_argument("--image_size", nargs="+", default=[conf["rows"], conf["cols"]])
    parser.add_argument("--seed", type=int, default=conf["seed"])
    parser.add_argument("--max_epochs", type=int, default=conf["max_epochs"])
    parser.add_argument("-tbs", "--train_batch_size", type=int, default=conf["train_batch_size"])
    parser.add_argument("-vbs", "--val_batch_size", type=int, default=conf["val_batch_size"])
    parser.add_argument("--n_splits", type=int, default=conf["n_splits"])
    parser.add_argument("--precision", type=int, default=conf["precision"])
    parser.add_argument("--gradient_clip_val", type=float, default=conf["gradient_clip_val"])

    # This needs more work
    parser.add_argument("--log_dir", type=str, default="logs")
    
    try:
        hparams, unknown = parser.parse_known_args()
    except:
        hparams, unknown = parser.parse_args([])

    if len(hparams.gpus) == 1:
        hparams.gpus = [int(hparams.gpus[0])]
    else:
        hparams.gpus = [int(gpu) for gpu in hparams.gpus]

    hparams.image_size = [int(size) for size in hparams.image_size]
    
    return hparams

In [7]:
hparams = init_hparams()

### Create Model

In [8]:
class CoolSystem(pl.LightningModule):
    def __init__(self, hparams):
        super().__init__()
        self.hparams = hparams

        seed_reproducer(self.hparams.seed)

        self.model = se_resnet50()
        self.criterion = CrossEntropyLossOneHot()
        self.logger_kun = init_logger("kun_in", hparams.log_dir)

    def forward(self, x):
        return self.model(x)

    def configure_optimizers(self):
        self.optimizer = torch.optim.Adam(self.parameters(), lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0)
        self.scheduler = WarmRestart(self.optimizer, T_max=10, T_mult=1, eta_min=1e-5)
        return [self.optimizer], [self.scheduler]

    def training_step(self, batch, batch_idx):
        step_start_time = time()
        images, labels, data_load_time = batch

        scores = self(images)
        loss = self.criterion(scores, labels)
        # self.logger_kun.info(f"loss : {loss.item()}")
        # ! can only return scalar tensor in training_step
        # must return key -> loss
        # optional return key -> progress_bar optional (MUST ALL BE TENSORS)
        # optional return key -> log optional (MUST ALL BE TENSORS)
        data_load_time = torch.sum(data_load_time)

        return {
            "loss": loss,
            "data_load_time": data_load_time,
            "batch_run_time": torch.Tensor([time() - step_start_time + data_load_time]).to(data_load_time.device),
        }

    def training_epoch_end(self, outputs):
        # outputs is the return of training_step
        train_loss_mean = torch.stack([output["loss"] for output in outputs]).mean()
        self.data_load_times = torch.stack([output["data_load_time"] for output in outputs]).sum()
        self.batch_run_times = torch.stack([output["batch_run_time"] for output in outputs]).sum()

        self.current_epoch += 1
        if self.current_epoch < (self.trainer.max_epochs - 4):
            self.scheduler = warm_restart(self.scheduler, T_mult=2)

        return {"train_loss": train_loss_mean}

    def validation_step(self, batch, batch_idx):
        step_start_time = time()
        images, labels, data_load_time = batch
        data_load_time = torch.sum(data_load_time)
        scores = self(images)
        loss = self.criterion(scores, labels)

        # must return key -> val_loss
        return {
            "val_loss": loss,
            "scores": scores,
            "labels": labels,
            "data_load_time": data_load_time,
            "batch_run_time": torch.Tensor([time() - step_start_time + data_load_time]).to(data_load_time.device),
        }

    def validation_epoch_end(self, outputs):
        # compute loss
        val_loss_mean = torch.stack([output["val_loss"] for output in outputs]).mean()
        self.data_load_times = torch.stack([output["data_load_time"] for output in outputs]).sum()
        self.batch_run_times = torch.stack([output["batch_run_time"] for output in outputs]).sum()

        # compute roc_auc
        scores_all = torch.cat([output["scores"] for output in outputs]).cpu()
        labels_all = torch.round(torch.cat([output["labels"] for output in outputs]).cpu())
        val_roc_auc = roc_auc_score(labels_all, scores_all)

        # terminal logs
        self.logger_kun.info(
            f"{self.hparams.fold_i}-{self.current_epoch} | "
            f"lr : {self.scheduler.get_lr()[0]:.6f} | "
            f"val_loss : {val_loss_mean:.4f} | "
            f"val_roc_auc : {val_roc_auc:.4f} | "
            f"data_load_times : {self.data_load_times:.2f} | "
            f"batch_run_times : {self.batch_run_times:.2f}"
        )
        # f"data_load_times : {self.data_load_times:.2f} | "
        # f"batch_run_times : {self.batch_run_times:.2f}"
        # must return key -> val_loss
        return {"val_loss": val_loss_mean, "val_roc_auc": val_roc_auc}

## Cross Validation

In [2]:
np.eye(1)#[targets]

NameError: name 'np' is not defined

In [17]:
y_train

array(['none', 'none', 'none', 'none', 'none', 'none', 'none', 'none',
       'none', 'none', 'none', 'none', 'none', 'none', 'none', 'none',
       'none', 'none', 'none', 'none', 'none', 'none', 'none', 'none',
       'none', 'tick', 'tick', 'tick', 'tick', 'tick', 'tick', 'tick',
       'tick', 'tick', 'tick', 'tick', 'tick', 'tick', 'tick', 'tick',
       'tick', 'tick', 'tick', 'tick', 'tick', 'tick', 'tick', 'tick',
       'tick', 'tick', 'tick', 'mite', 'mite', 'mite', 'mite', 'mite',
       'mite', 'mite', 'mite', 'mite', 'mite', 'mite', 'mite', 'mite',
       'mite', 'mite', 'mite', 'mite', 'mite', 'mite', 'mite', 'mite',
       'mosquito', 'mosquito', 'mosquito', 'mosquito', 'mosquito',
       'mosquito', 'mosquito', 'mosquito', 'mosquito', 'mosquito',
       'mosquito', 'mosquito', 'mosquito', 'mosquito', 'mosquito',
       'mosquito', 'mosquito', 'mosquito', 'mosquito', 'mosquito',
       'mosquito', 'mosquito', 'mosquito', 'mosquito', 'mosquito',
       'horsefly', 'horsef

In [9]:
# Split cross validation idx
# Subset images and labels for cross validation
# Create image augmentations and additional labels
# Read in pretrained weights
# Any additional layers
# Create model instance
# Create error metric
# Run training
# Make val predictions
# Val error metric
# Create directory for instance
# Save model
# Save log and config 
# Append train/val errors to csv

In [16]:
train_data#.iloc[index, 0]

NameError: name 'index' is not defined

In [None]:
cv2.cvtColor(
            cv2.imread(train_data.iloc[index, 0]), 
            cv2.COLOR_BGR2RGB
        )

Exception ignored in: <function _MultiProcessingDataLoaderIter.__del__ at 0x7fd2267d4a60>
Traceback (most recent call last):
  File "/home/edwardsims/miniconda3/envs/biteme/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1328, in __del__
    self._shutdown_workers()
  File "/home/edwardsims/miniconda3/envs/biteme/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1268, in _shutdown_workers
    if not self._shutdown:
AttributeError: '_MultiProcessingDataLoaderIter' object has no attribute '_shutdown'
Exception ignored in: <function _MultiProcessingDataLoaderIter.__del__ at 0x7fd2267d4a60>
Traceback (most recent call last):
  File "/home/edwardsims/miniconda3/envs/biteme/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1328, in __del__
    self._shutdown_workers()
  File "/home/edwardsims/miniconda3/envs/biteme/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1301, in _shutdown_workers
    w.join(timeout=_utils.MP_STATUS_

In [13]:
import os
from time import time

# Third party libraries
import cv2
import numpy as np
import pandas as pd
import torch
from albumentations import Compose, Normalize, Resize

from torch.utils.data import DataLoader, Dataset


class BiteDataset(Dataset):
    """ 
    ADD DOCSTRING
    """

    def __init__(self, data, transforms=None):
        self.data = data
        self.transforms = transforms

    def __getitem__(self, index):
        start_time = time()
        # Read image
        image = cv2.cvtColor(
            cv2.imread(self.data.iloc[index, 0]), 
            cv2.COLOR_BGR2RGB
        )

        # Convert if not the right shape
        if image.shape != IMG_SHAPE:
            image = image.transpose(1, 0, 2)

        # Perform augmentation
        if self.transforms is not None:
            image = self.transforms(image=image)["image"].transpose(2, 0, 1)

        label = torch.FloatTensor(self.data.iloc[index, 1].values.astype(np.int64))
        print(image)
        print(label)
        
        
        return image, label, time() - start_time

    def __len__(self):
        return len(self.data)


def generate_transforms(image_size):
    """
    ADD DOCSTRING
    """

    train_transform = Compose(
        [
            Resize(height=image_size[0], width=image_size[1]),
            #OneOf([RandomBrightness(limit=0.1, p=1), RandomContrast(limit=0.1, p=1)]),
            #OneOf([MotionBlur(blur_limit=3), MedianBlur(blur_limit=3), GaussianBlur(blur_limit=3)], p=0.5),
            #VerticalFlip(p=0.5),
            #HorizontalFlip(p=0.5),
            #ShiftScaleRotate(
            #    shift_limit=0.2,
            #    scale_limit=0.2,
            #    rotate_limit=20,
            #    interpolation=cv2.INTER_LINEAR,
            #    border_mode=cv2.BORDER_REFLECT_101,
            #    p=1,
            #),
            Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225), max_pixel_value=255.0, p=1.0),
        ]
    )

    val_transform = Compose(
        [
            Resize(height=image_size[0], width=image_size[1]),
            Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225), max_pixel_value=255.0, p=1.0),
        ]
    )

    return {"train_transforms": train_transform, "val_transforms": val_transform}


def generate_dataloaders(hparams, train_data, val_data, transforms):
    """
    ADD DOCSTRING
    """
    
    train_dataset = BiteDataset(
        data=train_data, transforms=None
    )
    
    val_dataset = BiteDataset(
        data=val_data, transforms=None
    )
    
    train_dataloader = DataLoader(
        train_dataset,
        batch_size=hparams.train_batch_size,
        shuffle=True,
        num_workers=hparams.n_workers,
        pin_memory=True,
        drop_last=True,
    )
    
    val_dataloader = DataLoader(
        val_dataset,
        batch_size=hparams.val_batch_size,
        shuffle=False,
        num_workers=hparams.n_workers,
        pin_memory=True,
        drop_last=False,
    )

    return train_dataloader, val_dataloader

In [14]:
# List for validation scores 
valid_roc_auc_scores = []
# Initialise cross validation
folds = StratifiedKFold(n_splits=3, shuffle=True, random_state=hparams.seed)

# Start cross validation
for fold_i, (train_index, val_index) in enumerate(folds.split(metadata, y_train)):
    hparams.fold_i = fold_i
    # Split train images and validation sets
    train_data = metadata.iloc[train_index, :].reset_index(drop=True)
    val_data = metadata.iloc[val_index, :].reset_index(drop=True)
    
    train_dataloader, val_dataloader = generate_dataloaders(hparams, train_data, val_data, None)
    
    # TO REMOVE
    epoch=1
    val_loss=20
    val_roc_auc=25
    # TO REMOVE
    
    checkpoint_callback = ModelCheckpoint(
        monitor="val_roc_auc",
        save_top_k=6,
        mode="max",
        filepath=os.path.join(hparams.log_dir, f"fold={fold_i}" + "-{epoch}-{val_loss:.4f}-{val_roc_auc:.4f}")
    )
    
    
    early_stop_callback = EarlyStopping(monitor="val_roc_auc", patience=10, mode="max", verbose=True)
    
    # Instance Model, Trainer and train model
    model = CoolSystem(hparams)
    trainer = pl.Trainer(
        gpus=hparams.gpus,
        min_epochs=2,
        max_epochs=hparams.max_epochs,
        early_stop_callback=early_stop_callback,
        checkpoint_callback=checkpoint_callback,
        progress_bar_refresh_rate=0,
        precision=hparams.precision,
        num_sanity_val_steps=0,
        profiler=False,
        weights_summary=None,
        gradient_clip_val=hparams.gradient_clip_val,
    )
    trainer.fit(model, train_dataloader, val_dataloader)
    
    print("-"*40)


GPU available: True, used: True
TPU available: False, using: 0 TPU cores
CUDA_VISIBLE_DEVICES: [0]
Using native 16bit precision.
[ WARN:0@209.217] global /home/conda/feedstock_root/build_artifacts/libopencv_1654062669265/work/modules/imgcodecs/src/loadsave.cpp (239) findDecoder imread_('ff05114ed0314e63fe19e4fcade287d3207b3438.jpg'): can't open/read file: check file path/integrity
[ WARN:0@209.217] global /home/conda/feedstock_root/build_artifacts/libopencv_1654062669265/work/modules/imgcodecs/src/loadsave.cpp (239) findDecoder imread_('8f578f71581bbb75e78b56318c7f3c924ad8380b.jpg'): can't open/read file: check file path/integrity
[ WARN:0@209.217] global /home/conda/feedstock_root/build_artifacts/libopencv_1654062669265/work/modules/imgcodecs/src/loadsave.cpp (239) findDecoder imread_('6c47b92ec8500adbc823b2e520f5c87a31c65244.jpg'): can't open/read file: check file path/integrity
[ WARN:0@209.218] global /home/conda/feedstock_root/build_artifacts/libopencv_1654062669265/work/modules/i

error: Caught error in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/edwardsims/miniconda3/envs/biteme/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/edwardsims/miniconda3/envs/biteme/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/edwardsims/miniconda3/envs/biteme/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/tmp/ipykernel_29515/3780588977.py", line 26, in __getitem__
    image = cv2.cvtColor(
cv2.error: OpenCV(4.5.5) /home/conda/feedstock_root/build_artifacts/libopencv_1654062669265/work/modules/imgproc/src/color.cpp:182: error: (-215:Assertion failed) !_src.empty() in function 'cvtColor'



In [None]:
import torch.nn as nn
nn.Sequential(
    pretrainedmodels.__dict__["se_resnet50"]
)

In [None]:
pretrainedmodels.__dict__["se_resnet50"](num_classes=1000, pretrained='imagenet')