# 🐳 Happywhale - Whale and Dolphin Identification on Kaggle

**Goal: reach 0.50 and release the code publicly**

**[📈 WandB Experiment Record for this Notebook](https://wandb.ai/snoop2head/HappyWhale)**

### Done
- ~~Log top-5 accuracy metrics~~
- ~~Integrate wandb~~
- ~~Compare CrossEntropy and FocalLoss -> FocalLoss turned out to be better~~
- ~~Stack grayscale images from [height, width] into [h, w, 3] images~~
- ~~Stratified K-Fold for training~~
- ~~Stratified K-Fold for Out of Fold inference~~

### To-Do
**Train**
- Apply Arcface Head (or Loss)
- Check how AMP model.half() affects performance and time saving
- **Contrastive Learning using negative batching from the same species**
- Find optimal learning rate using efficientnetB0
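
As a rough illustration of the ArcFace item above, the sketch below computes margin-adjusted logits in plain Python. It is not the notebook's implementation: the function names, `scale=30.0`, and `margin=0.5` are assumptions chosen for the example, and it skips the edge case where the angle plus the margin exceeds pi.

```python
import math

def l2_normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

def arcface_logits(embedding, class_weights, target, scale=30.0, margin=0.5):
    """Return scaled cosine logits with an angular margin added to the target class.

    Hypothetical helper: scale and margin are example values, not tuned settings.
    """
    e = l2_normalize(embedding)
    logits = []
    for class_index, weight in enumerate(class_weights):
        w = l2_normalize(weight)
        cos_theta = sum(a * b for a, b in zip(e, w))
        cos_theta = max(-1.0, min(1.0, cos_theta))  # guard acos against rounding
        if class_index == target:
            # penalize the true class by widening its angle to the embedding
            cos_theta = math.cos(math.acos(cos_theta) + margin)
        logits.append(scale * cos_theta)
    return logits

logits = arcface_logits([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], target=0)
print(logits)
```

The resulting logits would then go into an ordinary softmax-based loss; the margin only makes the target class harder to satisfy during training.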

**Inference**
- Inference Method with KNN
- Set threshold for new_individual in inference
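
One possible way to realize the `new_individual` threshold item is sketched below. The helper name `insert_new_individual` and the threshold of 0.5 are assumptions for illustration; a real threshold would need to be tuned on validation data.

```python
def insert_new_individual(labels, probs, threshold=0.5, k=5):
    """Place "new_individual" ahead of the first prediction below the threshold.

    labels and probs are expected to be sorted by descending probability.
    """
    ranked = []
    inserted = False
    for label, prob in zip(labels, probs):
        if not inserted and prob < threshold:
            ranked.append("new_individual")
            inserted = True
        ranked.append(label)
    if not inserted:
        # every prediction is confident; new_individual goes last
        ranked.append("new_individual")
    return ranked[:k]

top5 = insert_new_individual(
    ["a", "b", "c", "d", "e"], [0.9, 0.3, 0.1, 0.05, 0.01], threshold=0.5
)
print(top5)
```

With five confident predictions the appended `new_individual` is truncated away, so the list always holds at most `k` entries.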

### Dependencies Installation

In [None]:
from IPython.display import clear_output
!pip install albumentations==0.4.6
!pip install timm
!pip install adamp
!pip install wandb
clear_output()

## Define Train / Validation Configuration

**You may restart runtime from here**
- A factory reset of the runtime deletes the files attached to Colab and disconnects from the machine
- Restarting the runtime only clears the variables of the instance and keeps the connection to the server

In [None]:
import os
import wandb
wandb.login()
CFG = wandb.config # wandb.config provides functionality of easydict.EasyDict

CFG.DEBUG = False

print(f"YOU ARE WORKING ON DEBUG = {CFG.DEBUG} MODE")

In [None]:
CFG.learning_rate = 3e-4 # Efficientnet Learning rate
CFG.weight_decay = 3e-6 # AdamP weight decay (https://github.com/clovaai/AdamP); kept very small (3e-6) for this small dataset
# CFG.learning_rate = 1e-5 # swin transformers learning rate for finetuning: In ImageNet-1K fine-tuning, we train the models for 30 epochs with a batch size of 1024, a constant learning rate of 10−5, 
# CFG.weight_decay = 1e-8 # and a weightdecay of 10−8.
CFG.gamma = 0.5 # gamma for focal loss
CFG.image_resolution = 512
CFG.input_resolution = 512
CFG.rgb_mean = [0.43818492, 0.49098103, 0.54812671]
CFG.rgb_sd = [0.16021039, 0.16227327, 0.16930125]
CFG.train_batch_size = 22
CFG.val_batch_size = 16
CFG.num_epochs = 9
CFG.split_ratio = 0.0
CFG.num_folds = 5
CFG.logging_steps = 900

# overwrite configuration when debug mode
if CFG.DEBUG:
  CFG.image_resolution = 512
  CFG.input_resolution = 384
  # EfficientNetB0 & 224 resolution & batch_size 400 = 40GB on GPU -> 224 resolution is fast, but the resolution is too low
  # EfficientNetB0 & 384 resolution & batch_size 128 = 38.4GB on GPU
  CFG.train_batch_size = 128 # bigger the batch, faster the iteration.
  CFG.val_batch_size = 64 #
  CFG.num_epochs = 10
  
print(f"YOU ARE WORKING ON DEBUG = {CFG.DEBUG} MODE")

### Get dataset and Define Paths

In [None]:
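# serialize CUDA kernel launches so errors surface at the failing call (easier debugging, slower execution)
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
# get current working directory
PROJECT_PATH = os.getcwd()
PROJECT_PATH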

In [None]:
ROOT_DIR = os.path.join(
    os.getcwd(), 
    f"{CFG.image_resolution}x{CFG.image_resolution}_resized_dataset"
)
# TRAIN_DIR = os.path.join(ROOT_DIR, "train_images")
TRAIN_DIR = ROOT_DIR
os.path.exists(ROOT_DIR)

In [None]:
if not os.path.exists(ROOT_DIR):
    from google.colab import drive
    drive.mount('/content/drive')

In [None]:
import shutil
if not os.path.exists(ROOT_DIR):
    # get train dataset from google drive
    shutil.copy(f"/content/drive/MyDrive/HappyWhale/data/{CFG.image_resolution}x{CFG.image_resolution}_resized_dataset.zip", "./")

    # get test dataset
    shutil.copy(f"/content/drive/MyDrive/HappyWhale/data/{CFG.image_resolution}x{CFG.image_resolution}_resized_test_dataset.zip", "./")

    # copy csv files
    shutil.copy("/content/drive/MyDrive/HappyWhale/data/sample_submission.csv", f"./{CFG.image_resolution}x{CFG.image_resolution}_resized_dataset")
    shutil.copy("/content/drive/MyDrive/HappyWhale/data/train.csv", f"./{CFG.image_resolution}x{CFG.image_resolution}_resized_dataset")

In [None]:
# unzip train dataset
import os
from IPython.display import clear_output

if not os.path.exists(ROOT_DIR):
    TRAIN_ZIP_FILE_PATH = f"/content/{CFG.image_resolution}x{CFG.image_resolution}_resized_dataset.zip"
    TEST_ZIP_FILE_PATH = f"/content/{CFG.image_resolution}x{CFG.image_resolution}_resized_test_dataset.zip"
    !unzip $TRAIN_ZIP_FILE_PATH
    !unzip $TEST_ZIP_FILE_PATH
    os.remove(TRAIN_ZIP_FILE_PATH)
    os.remove(TEST_ZIP_FILE_PATH)
    clear_output()

In [None]:
ROOT_DIR

In [None]:
if os.path.exists(ROOT_DIR):
    from google.colab import drive
    drive.flush_and_unmount()

### Read training labels

In [None]:
import random
import torch
import numpy as np

def seed_everything(seed) :
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed) # if use multi-GPU
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    np.random.seed(seed)
    random.seed(seed)
seed_everything(42)

In [None]:
import pandas as pd
import numpy as np
path_label = os.path.join(ROOT_DIR, "train.csv")
label = pd.read_csv(path_label)
label.head()

In [None]:
def get_train_file_path(id):
    return f"{TRAIN_DIR}/{id}"

df = pd.read_csv(path_label)
df['file_path'] = df['image'].apply(get_train_file_path)
df.head()

In [None]:
species = df.species.unique().tolist()
print('Number of species in the dataset:', len(species))

In [None]:
individual_ids = df.individual_id.unique().tolist()
print('Number of Mr and Mrs Dolphins/Whales in the dataset:', len(individual_ids))

In [None]:
path_submission = os.path.join(ROOT_DIR, "sample_submission.csv")
df_sample_submission = pd.read_csv(path_submission)
df_sample_submission.head()

## Encoding individual_ids (string objects) to integer labels with LabelEncoder

In [None]:
from sklearn.preprocessing import LabelEncoder

# encode object string label into integer label mapping
# https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html
le_species = LabelEncoder()
le_species.fit(df.species)
df.species = le_species.transform(df.species)
df.head()

In [None]:
from sklearn.preprocessing import LabelEncoder

le_individual_id = LabelEncoder()
le_individual_id.fit(df.individual_id)
df.individual_id = le_individual_id.transform(df.individual_id)
df.head()

In [None]:
# image counts of the 100 most frequent individuals
print(df.individual_id.value_counts().values[:100])

In [None]:
# image counts of the 100 least frequent individuals
print(df.individual_id.value_counts().values[-100:])

In [None]:
import joblib

with open("le_individual_id.pkl", "wb") as fp:
    joblib.dump(le_individual_id, fp)
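
The encoder is saved because the integer labels are only meaningful together with the class list that produced them. The sketch below mimics the mapping in plain Python: sorted unique strings become indices, and decoding needs the same sorted list. The whale IDs are made up for the example.

```python
# made-up identifiers standing in for the individual_id column
ids = ["whale_c", "whale_a", "whale_b", "whale_a"]

# sorted unique values play the role of LabelEncoder.classes_
classes = sorted(set(ids))
to_int = {name: index for index, name in enumerate(classes)}

encoded = [to_int[name] for name in ids]      # string -> integer label
decoded = [classes[index] for index in encoded]  # integer label -> string

print(encoded)
print(decoded)
```

If the class list were rebuilt from a different subset of the data, the same integer could point at a different individual, which is why the fitted encoder is reloaded at inference time.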

## Set Dataset Class

In [None]:
import torch
from torch.utils.data import Dataset, DataLoader
from PIL import Image

class HappyWhaleDataset(Dataset):
    def __init__(self, df, transforms=None):
        self.df = df
        self.file_names = df.file_path.values
        self.species = df.species.values
        self.labels = df.individual_id.values
        self.transforms = transforms
    
    def __getitem__(self, index):
        image_path = self.file_names[index]
        image = Image.open(image_path)
        image = np.array(image)
        # take care of grayscale image by adding channel
        if len(image.shape) == 2: 
          image = np.dstack((image,)*3)

        label = self.labels[index]
        
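        # the per-split pipeline is assigned through set_transform();
        # fall back to the raw image when none has been set
        transform = getattr(self, "transform", None)
        if transform is not None:
            image = transform(image=image)['image']
        return image, label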

    def __len__(self):
        return len(self.df)
    
    def set_transform(self, transform):
        self.transform = transform

In [None]:
from albumentations import *
from albumentations.pytorch import ToTensorV2


def get_transforms(
    need=('train', 'val'), 
    img_size=(CFG.input_resolution, CFG.input_resolution), 
    mean= [0.43818492, 0.49098103, 0.54812671],
    std= [0.16021039, 0.16227327,0.16930125]
    ):
    # https://vfdev-5-albumentations.readthedocs.io/en/docs_pytorch_fix/api/augmentations.html
    transformations = {}
    if 'train' in need:  
        transformations['train'] = Compose([
            Resize(img_size[0], img_size[1], p=1.0),
            # CenterCrop(height = CFG.input_resolution, width = CFG.input_resolution), # add centercrop
            
            # shape augmentation
            HorizontalFlip(p=0.5),
            ShiftScaleRotate(p=0.5), ## NEED TO CHECK WHETHER THIS IS GOOD OR NOT. IF PERFORMANCE DROPS EVEN AFTER CHANGING THE MODEL TO EFFB5, THIS MIGHT BE THE REASON

            # pixel level augmentation
            HueSaturationValue(hue_shift_limit=0.2, sat_shift_limit=0.2, val_shift_limit=0.2, p=0.2),
            RandomBrightnessContrast(brightness_limit=(-0.1, 0.1), contrast_limit=(-0.1, 0.1), p=0.2),
            # GaussNoise(p=0.5),

            # normalizing and tensorizing
            Normalize(mean=mean, std=std, max_pixel_value=255.0, p=1.0),
            ToTensorV2(p=1.0),
        ], p=1.0)
    
    if 'val' in need:
        transformations['val'] = Compose([
            Resize(img_size[0], img_size[1]),
            # CenterCrop(height = CFG.input_resolution, width = CFG.input_resolution), # add centercrop
            Normalize(mean=mean, std=std, max_pixel_value=255.0, p=1.0),
            ToTensorV2(p=1.0),
        ], p=1.0)
    
    # in debug mode, reuse the validation pipeline (no augmentation) for training
    if CFG.DEBUG == True:
        transformations['train'] = transformations['val']
    
    return transformations

In [None]:
if CFG.DEBUG == True:
  transform = get_transforms(
      img_size=(384, 384), 
      mean=CFG.rgb_mean, 
      std=CFG.rgb_sd
  )
else:
  transform = get_transforms(
      img_size = (CFG.input_resolution, CFG.input_resolution),
      mean=CFG.rgb_mean, 
      std=CFG.rgb_sd
  )
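
For reference, the `Normalize` step with `max_pixel_value=255.0` amounts to scaling each channel to [0, 1], subtracting the channel mean, and dividing by the channel standard deviation. The plain-Python sketch below applies that arithmetic to a single RGB pixel using the statistics from the configuration; `normalize_pixel` is a hypothetical helper, not part of albumentations.

```python
RGB_MEAN = [0.43818492, 0.49098103, 0.54812671]
RGB_SD = [0.16021039, 0.16227327, 0.16930125]

def normalize_pixel(rgb, mean=RGB_MEAN, std=RGB_SD, max_pixel_value=255.0):
    """Normalize one RGB pixel: (value / max_pixel_value - mean) / std per channel."""
    return [(v / max_pixel_value - m) / s for v, m, s in zip(rgb, mean, std)]

# a pixel equal to the dataset mean maps to roughly zero in every channel
mean_pixel = [m * 255.0 for m in RGB_MEAN]
print(normalize_pixel(mean_pixel))

# a pure white pixel ends up a few standard deviations above zero
print(normalize_pixel([255, 255, 255]))
```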

In [None]:
from torch.utils.data import random_split

if CFG.num_folds:
  print(f"Stratified K-Fold Training Scheme {CFG.num_folds}")
  train_dataset = HappyWhaleDataset(df, transforms=transform)
  val_dataset = HappyWhaleDataset(df, transforms=transform)

  train_dataset.set_transform(transform['train'])
  val_dataset.set_transform(transform['val'])

elif CFG.num_folds==0 and CFG.split_ratio != 0:
  print(f"Random Split Scheme {CFG.split_ratio}")
  dataset = HappyWhaleDataset(df, transforms=transform)
  
  # split train and validation dataset
  n_val = int(len(dataset) * CFG.split_ratio)
  n_train = len(dataset) - n_val
  train_dataset, val_dataset = random_split(dataset, [n_train, n_val])

  # after random split assign augmentation method
  train_dataset.dataset.set_transform(transform['train'])
  val_dataset.dataset.set_transform(transform['val'])

elif CFG.num_folds==0 and CFG.split_ratio == 0:
  print(f"Train Dataset only scheme")
  train_dataset = HappyWhaleDataset(df, transforms=transform)
  train_dataset.set_transform(transform['train'])

CFG.transformations = transform['train'] # record transformation on config

In [None]:
# Swin Transformer, batch size of 8, img size of 384 x 384 exceeds 16GB
# Swin Transformer, batch size of 24, img size of 384 x 384 uses about 40GB
# Swin Transformer, batch size of 42, img size of 224 x 224 uses about 20GB
# Swin Transformer, batch size of 64, img size of 224 x 224 uses about 28GB

## Custom Transfer Learning

In [None]:
# device designation
if torch.cuda.is_available():    
    print(f'There are {torch.cuda.device_count()} GPU(s) available.')
    print('GPU Name:', torch.cuda.get_device_name(0))
    device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
else:
    print('No GPU, using CPU.')
    device = torch.device("cpu")

# if cpu then num_workers are 0 else num_workers = 2
NUM_WORKERS = 2 if torch.cuda.is_available() else 0

## Use Pretrained models as backbone model
- [Swin Transformer](https://github.com/rwightman/pytorch-image-models/blob/ef72ad417709b5ba6404d85d3adafd830d507b2a/timm/models/swin_transformer.py#L47-L89)
- [ConvNeXt](https://github.com/rwightman/pytorch-image-models/blob/738a9cd63554104635351ced21d6f5808c1b6072/timm/models/convnext.py#L40-L71)
- [EfficientNet](https://github.com/rwightman/pytorch-image-models/blob/83b40c5a58b1fc43d053de537ef3201362cc4753/timm/models/efficientnet.py#L190-L201)

In [None]:
# import the backbone libraries (torchvision, timm) and define the models
from torch import nn
from torchvision import models
import timm

individual_ids = df.individual_id.unique().tolist()
print('Number of Mr.Whales in the dataset:', len(individual_ids))

class SwinTransformersModel(nn.Module):
    def __init__(self, num_classes: int = len(individual_ids)):
        super(SwinTransformersModel, self).__init__()        
        # https://github.com/rwightman/pytorch-image-models/blob/ef72ad417709b5ba6404d85d3adafd830d507b2a/timm/models/swin_transformer.py#L47-L89
        # model_architecture = "swin_large_patch4_window7_224" # pretrained with classifier output with 1000 classes
        # model_architecture = "swin_large_patch4_window7_224_in22k" # pretrained with classifier output with 22000 classes
        model_architecture = "swin_large_patch4_window12_384_in22k" # pretrained model with bigger resolution
        self.model = timm.create_model(model_architecture, pretrained=True)
        # self.backbone = timm.create_model(model_architecture, pretrained=True)
        # self.backbone.classifier.out_features = num_classes
        num_input_features = self.model.head.in_features # pretrained model's default fully connected Linear Layer
        self.model.head = nn.Linear(in_features=num_input_features, out_features=num_classes, bias=True)  # replacing output with class number

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.model(x)
        return x

class ConvNextModel(nn.Module):
    def __init__(self, num_classes: int = len(individual_ids)):
        super(ConvNextModel, self).__init__()
        
        model_architecture = "convnext_large_384_in22ft1k"
        # timm's ConvNeXt keeps its classifier under `head`, not `classifier` or `fc`,
        # so let timm build the new output layer through `num_classes`
        self.backbone = timm.create_model(
            model_architecture, pretrained=True, num_classes=num_classes
        )
        
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.backbone(x)
        return x

class EfficientNetB0Model(nn.Module):
    # https://github.com/lukemelas/EfficientNet-PyTorch
    # https://github.com/rwightman/pytorch-image-models/blob/83b40c5a58b1fc43d053de537ef3201362cc4753/timm/models/efficientnet.py#L190-L201
    # inputsize: https://github.com/lukemelas/EfficientNet-PyTorch/blob/7e8b0d312162f335785fb5dcfa1df29a75a1783a/efficientnet_pytorch/utils.py#L457-L479
    def __init__(self, num_classes: int = len(individual_ids)):
        super(EfficientNetB0Model, self).__init__()

        self.backbone = models.efficientnet_b0(pretrained=True)
        self.backbone.classifier = nn.Sequential(
            nn.Dropout(p=0.2, inplace=True),
            nn.Linear(in_features=1280, out_features=num_classes, bias=True),
        )
        
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.backbone(x)
        return x

class EfficientNetB5Model(nn.Module):
    # https://github.com/lukemelas/EfficientNet-PyTorch
    # https://github.com/rwightman/pytorch-image-models/blob/83b40c5a58b1fc43d053de537ef3201362cc4753/timm/models/efficientnet.py#L190-L201
    # inputsize: https://github.com/lukemelas/EfficientNet-PyTorch/blob/7e8b0d312162f335785fb5dcfa1df29a75a1783a/efficientnet_pytorch/utils.py#L457-L479
    def __init__(self, num_classes: int = len(individual_ids)):
        super(EfficientNetB5Model, self).__init__()

        self.backbone = models.efficientnet_b5(pretrained=True)
        self.backbone.classifier = nn.Sequential(
            nn.Dropout(p=0.4, inplace=True),
            nn.Linear(in_features=2048, out_features=num_classes, bias=True),
        )
        
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.backbone(x)
        return x

## Loss function Optimizer and scheduler

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

# https://github.com/clcarwin/focal_loss_pytorch/blob/master/focalloss.py
class FocalLoss(nn.Module):
    def __init__(self, gamma=0, alpha=None, size_average=True):
        super(FocalLoss, self).__init__()
        self.gamma = gamma
        self.alpha = alpha
        if isinstance(alpha,(float,int)): self.alpha = torch.Tensor([alpha,1-alpha])
        if isinstance(alpha,list): self.alpha = torch.Tensor(alpha)
        self.size_average = size_average

    def forward(self, input, target):
        if input.dim()>2:
            input = input.view(input.size(0),input.size(1),-1)  # N,C,H,W => N,C,H*W
            input = input.transpose(1,2)    # N,C,H*W => N,H*W,C
            input = input.contiguous().view(-1,input.size(2))   # N,H*W,C => N*H*W,C
        target = target.view(-1,1)

        logpt = F.log_softmax(input, dim=1)  # softmax over the class dimension
        logpt = logpt.gather(1,target)
        logpt = logpt.view(-1)
        pt = Variable(logpt.data.exp())

        if self.alpha is not None:
            if self.alpha.type()!=input.data.type():
                self.alpha = self.alpha.type_as(input.data)
            at = self.alpha.gather(0,target.data.view(-1))
            logpt = logpt * Variable(at)

        loss = -1 * (1-pt)**self.gamma * logpt
        if self.size_average: return loss.mean()
        else: return loss.sum()
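
To make the effect of `gamma` concrete, here is a minimal plain-Python version of the per-sample formula used above, `-(1 - pt) ** gamma * log(pt)`, without the `alpha` weighting. The logits are made-up numbers; with `gamma=0` the expression reduces to ordinary cross-entropy.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def focal_loss(logits, target, gamma=0.5):
    """Focal loss for one sample: -(1 - pt)^gamma * log(pt)."""
    pt = softmax(logits)[target]
    return -((1.0 - pt) ** gamma) * math.log(pt)

logits = [2.0, 0.0, 0.0]

# target 0 is an "easy" sample (high pt), target 1 a "hard" one (low pt)
for target in (0, 1):
    ce = focal_loss(logits, target, gamma=0.0)
    fl = focal_loss(logits, target, gamma=0.5)
    print(f"target={target} cross_entropy={ce:.4f} focal={fl:.4f} ratio={fl / ce:.4f}")
```

The ratio of focal loss to cross-entropy is smaller for the easy sample, which is the intended behavior: well-classified samples contribute less to the total loss.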

In [None]:
# criterion = nn.CrossEntropyLoss()
criterion = FocalLoss(gamma=CFG.gamma)

## Train and Validation Functions

In [None]:
# Metrics Class definition
# Reference: https://github.com/pytorch/examples/blob/00ea159a99f5cb3f3301a9bf0baa1a5089c7e217/imagenet/main.py#L361-L450

from enum import Enum

class Summary(Enum):
    NONE = 0
    AVERAGE = 1
    SUM = 2
    COUNT = 3

class AverageMeter(object):
    """Computes and stores the average and current value"""
    def __init__(self, name, fmt=':f', summary_type=Summary.AVERAGE):
        self.name = name
        self.fmt = fmt
        self.summary_type = summary_type
        self.reset()

    def reset(self):
        self.val = 0
        self.avg = 0
        self.sum = 0
        self.count = 0

    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count

    def __str__(self):
        fmtstr = '{name} {val' + self.fmt + '} ({avg' + self.fmt + '})'
        return fmtstr.format(**self.__dict__)
    
    def summary(self):
        fmtstr = ''
        if self.summary_type is Summary.NONE:
            fmtstr = ''
        elif self.summary_type is Summary.AVERAGE:
            fmtstr = '{name} {avg:.3f}'
        elif self.summary_type is Summary.SUM:
            fmtstr = '{name} {sum:.3f}'
        elif self.summary_type is Summary.COUNT:
            fmtstr = '{name} {count:.3f}'
        else:
            raise ValueError('invalid summary type %r' % self.summary_type)
        
        return fmtstr.format(**self.__dict__)

class ProgressMeter(object):
    def __init__(self, num_batches, meters, prefix=""):
        self.batch_fmtstr = self._get_batch_fmtstr(num_batches)
        self.meters = meters
        self.prefix = prefix

    def display(self, batch):
        entries = [self.prefix + self.batch_fmtstr.format(batch)]
        entries += [str(meter) for meter in self.meters]
        print('\t'.join(entries))
        
    def display_summary(self):
        entries = [" *"]
        entries += [meter.summary() for meter in self.meters]
        print(' '.join(entries))

    def _get_batch_fmtstr(self, num_batches):
        num_digits = len(str(num_batches // 1))
        fmt = '{:' + str(num_digits) + 'd}'
        return '[' + fmt + '/' + fmt.format(num_batches) + ']'

def accuracy(output, target, topk=(1,)):
    """Computes the accuracy over the k top predictions for the specified values of k"""
    with torch.no_grad():
        maxk = max(topk)
        batch_size = target.size(0)

        _, pred = output.topk(maxk, 1, True, True)
        pred = pred.t()
        correct = pred.eq(target.view(1, -1).expand_as(pred))

        res = []
        for k in topk:
            correct_k = correct[:k].reshape(-1).float().sum(0, keepdim=True)
            res.append(correct_k.mul_(100.0 / batch_size))
        return res
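
The `accuracy` function above counts a sample as correct when its true label appears among the `k` highest-scoring classes. A plain-Python sketch of the same idea on a toy batch follows; `topk_accuracy` and the scores are illustrative, not part of the notebook.

```python
def topk_accuracy(scores, targets, k=5):
    """Percentage of samples whose target is among the k highest-scoring classes."""
    hits = 0
    for row, target in zip(scores, targets):
        # class indices sorted by descending score, truncated to k
        best = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        hits += target in best
    return 100.0 * hits / len(targets)

scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
]
targets = [1, 1, 0]

print(topk_accuracy(scores, targets, k=1))
print(topk_accuracy(scores, targets, k=2))
```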

In [None]:
# validation function
import time  # validate() calls time.time()
from tqdm.notebook import tqdm

def validate(val_loader, model, criterion, device):
    # Reference: https://github.com/pytorch/examples/blob/00ea159a99f5cb3f3301a9bf0baa1a5089c7e217/imagenet/main.py#L313-L353
    batch_time = AverageMeter('Time', ':6.3f', Summary.NONE)
    losses = AverageMeter('Loss', ':.4f', Summary.AVERAGE)
    top1 = AverageMeter('Acc@1', ':6.2f', Summary.AVERAGE)
    top5 = AverageMeter('Acc@5', ':6.2f', Summary.AVERAGE)
    progress = ProgressMeter(
        len(val_loader),
        [batch_time, losses, top1, top5],
        prefix='Test: ')

    # switch to evaluate mode
    model.eval()

    with torch.no_grad():
        end = time.time()
        for i, (images, labels) in enumerate(tqdm(val_loader)):
            images = images.to(device)
            labels = labels.to(device)

            # compute output
            output = model(images)
            loss = criterion(output, labels)

            # measure accuracy and record loss
            acc1, acc5 = accuracy(output, labels, topk=(1, 5))
            losses.update(loss.item(), images.size(0))
            top1.update(acc1[0], images.size(0))
            top5.update(acc5[0], images.size(0))

            # measure elapsed time
            batch_time.update(time.time() - end)
            end = time.time()

        progress.display_summary()

    return losses.avg, top1.avg, top5.avg

In [None]:
import time
from tqdm.notebook import tqdm

def train(model, epochs, train_loader, valid_loader, optimizer, save_path, scheduler = None):
    best_valid_acc = 0
    best_valid_loss = float("inf")  # any first validation loss counts as an improvement
    
    for epoch in range(epochs):
        # Train Code Reference: https://github.com/pytorch/examples/blob/00ea159a99f5cb3f3301a9bf0baa1a5089c7e217/imagenet/main.py#L266-L310
        batch_time = AverageMeter('Time', ':6.3f')
        data_time = AverageMeter('Data', ':6.3f')
        losses = AverageMeter('Loss', ':.4f')
        top1 = AverageMeter('Acc@1', ':6.2f')
        top5 = AverageMeter('Acc@5', ':6.2f')
        progress = ProgressMeter(
            len(train_loader),
            [batch_time, data_time, losses, top1, top5],
            prefix="Epoch: [{}]".format(epoch))

        end = time.time()
        for iter, (images, labels) in enumerate(tqdm(train_loader)):
            # initialize gradients
            optimizer.zero_grad()

            # assign images and labels to the device
            # images, labels = images.type(torch.FloatTensor).to(device), labels.to(device)
            images, labels = images.to(device), labels.to(device)

            # switch to train mode
            model.train()
            
            # compute output
            output = model(images)
            loss = criterion(output, labels)

            # measure accuracy and record loss
            acc1, acc5 = accuracy(output, labels, topk=(1, 5))
            losses.update(loss.item(), images.size(0))
            top1.update(acc1[0], images.size(0))
            top5.update(acc5[0], images.size(0))

            # compute gradient and do SGD step
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            if scheduler:
              scheduler.step()
            
            # measure elapsed time
            batch_time.update(time.time() - end)
            end = time.time()

            if iter % CFG.logging_steps == 0:
                progress.display(iter) # display train status
                wandb.log(
                    {
                      'epoch': epoch + 1,
                      'train/loss':losses.avg, 
                      'train/top5_accuracy':top5.avg, 
                      'train/top1_accuracy':top1.avg,
                      'learning_rate':optimizer.param_groups[0]['lr'],
                    }
                )
        
        # Validate on each epoch
        print("Epoch Finished... Validating")
        valid_loss, valid_acc1, valid_acc5 = validate(valid_loader, model, criterion, device)
        wandb.log(
                    {
                      'valid/loss':valid_loss, 
                      'valid/top5_accuracy':valid_acc5, 
                      'valid/top1_accuracy':valid_acc1,
                    }
        )
        if valid_loss < best_valid_loss:
            print("New valid model for val loss! saving the model...")
            torch.save(model.state_dict(),save_path + f"{epoch:03}_loss_{valid_loss:4.2}.ckpt")
            best_valid_loss = valid_loss
            wandb.log({'best_valid_loss':best_valid_loss})
        
        if valid_acc5 > best_valid_acc:
            print("New valid model for val accuracy! saving the model...")
            torch.save(model.state_dict(),save_path + f"{epoch:03}_loss_{valid_loss:4.2}.ckpt")
            best_valid_acc = valid_acc5
            wandb.log({'best_valid_top5_acc':best_valid_acc})
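
The checkpoint name is built with the format spec `4.2`, which pads the loss to width 4 and keeps two significant digits. The short sketch below shows the resulting name for made-up values of `epoch` and `valid_loss`, next to a hypothetical fixed-point alternative that avoids the embedded space.

```python
epoch, valid_loss = 8, 7.4567  # example values only

# same pattern as the training loop: width 4, precision 2, no type code
name = f"{epoch:03}_loss_{valid_loss:4.2}.ckpt"
print(repr(name))

# hypothetical alternative: fixed-point with four decimals and no padding
alt_name = f"{epoch:03}_loss_{valid_loss:.4f}.ckpt"
print(repr(alt_name))
```

The padding explains why a saved file can contain a space in its name, which matters when the path is typed by hand or passed to a shell command.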

## Training Iteration (K-Fold)

In [None]:
# K-Fold training iteration
from torch.optim import Adam, AdamW
from adamp import AdamP, SGDP

from sklearn.model_selection import StratifiedKFold
from torch.utils.data.dataset import Subset

if CFG.num_folds != 0:
  SAVE_PATH = "./"
  run_name = "EfficientNetB5Model-5-fold"
  wandb.init(project="HappyWhale", name=run_name)

  stf = StratifiedKFold(n_splits = CFG.num_folds, shuffle = True, random_state = 42) # seed_everything() returns None, so pass the seed directly

  for fold_num, (train_idx, valid_idx) in enumerate(stf.split(df, list(df['individual_id']))):
    print(f"#################### Fold: {fold_num + 1} ######################")

    # make subset
    train_set = Subset(train_dataset, train_idx)
    val_set = Subset(val_dataset, valid_idx)

    # make dataloader out of subset
    train_loader = DataLoader(
      train_set,
      batch_size=CFG.train_batch_size,
      num_workers=NUM_WORKERS,
      shuffle=True
    )
    valid_loader = DataLoader(
        val_set,
        batch_size=CFG.val_batch_size,
        num_workers=NUM_WORKERS,
        shuffle=False
    )

    # designate model
    if CFG.DEBUG == True:
      model = EfficientNetB0Model(num_classes=len(individual_ids))
      model = model.to(device)
    else:
      model = EfficientNetB5Model(num_classes=len(individual_ids))
      model = model.to(device)
    
    model_name = model.__class__.__name__
    print(f"Training with {model_name}")

    # get optimizer and scheduler
    optimizer = AdamP(model.parameters(), lr=CFG.learning_rate, betas=(0.9, 0.999), weight_decay= CFG.weight_decay)

    # scheduler comparison: https://www.kaggle.com/isbhargav/guide-to-pytorch-learning-rate-scheduling
    # scheduler = torch.optim.lr_scheduler.OneCycleLR(
    #     optimizer, 
    #     max_lr=CFG.learning_rate, 
    #     steps_per_epoch=len(train_loader),
    #     epochs=CFG.num_epochs,
    #     anneal_strategy='linear'
    # )

    # conduct training
    train(model, CFG.num_epochs, train_loader, valid_loader, optimizer, SAVE_PATH, scheduler=None)

    # Prevent Out Of Memory error
    model.cpu()
    del model
    torch.cuda.empty_cache()
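
To show what stratification does to the folds, the sketch below spreads the indices of each label round-robin across the folds. It illustrates the idea in plain Python and is not scikit-learn's exact algorithm; `stratified_folds` is a hypothetical helper and the labels are toy data.

```python
import random
from collections import defaultdict

def stratified_folds(labels, num_folds=5, seed=42):
    """Assign sample indices to folds so each label is spread as evenly as possible."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    folds = [[] for _ in range(num_folds)]
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        offset = rng.randrange(num_folds)  # avoid always starting at fold 0
        for position, index in enumerate(indices):
            folds[(offset + position) % num_folds].append(index)
    return folds

# label 0 has ten images, label 1 has five, label 2 has a single image
labels = [0] * 10 + [1] * 5 + [2]
folds = stratified_folds(labels)
for fold_num, fold in enumerate(folds):
    print(fold_num, sorted(labels[i] for i in fold))
```

An individual with a single image lands in exactly one fold. When that fold serves as validation, the model has not seen the individual during training, which is worth keeping in mind when reading per-fold accuracy.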

## Training Iteration (Not K-Fold)

In [None]:
# DataLoader
import numpy as np 

if CFG.num_folds==0 and CFG.split_ratio != 0:
    train_loader = DataLoader(
        train_dataset,
        batch_size=CFG.train_batch_size,
        num_workers=NUM_WORKERS,
        shuffle=True
    )

    valid_loader = DataLoader(
            val_dataset,
            batch_size=CFG.val_batch_size,
            num_workers=NUM_WORKERS,
            shuffle=False
    )

elif CFG.num_folds==0 and CFG.split_ratio == 0:
    train_loader = DataLoader(
      train_dataset,
      batch_size=CFG.train_batch_size,
      num_workers=NUM_WORKERS,
      shuffle=True
    )

images, labels = next(iter(train_loader))
print(f'images shape: {images.shape}')
print(f'labels shape: {labels.shape}')

In [None]:
# Training Iteration when not using K-Fold
if CFG.num_folds==0:
    
    SAVE_PATH = "./"
    run_name = "EfficientNetB5Model-90to10-split"
    wandb.init(project="HappyWhale", name=run_name)
    
    # designate model
    if CFG.DEBUG == True:
      model = EfficientNetB0Model(num_classes=len(individual_ids))
      model = model.to(device)
    else:
      model = EfficientNetB5Model(num_classes=len(individual_ids))
      model = model.to(device)
    
    model_name = model.__class__.__name__
    print(f"Training with {model_name}")

    # get optimizer: AdamP = AdamW > Adam > SGDP > SGD https://github.com/clovaai/AdamP. 
    optimizer = AdamP(model.parameters(), lr=CFG.learning_rate, betas=(0.9, 0.999), weight_decay= CFG.weight_decay)

    # scheduler: https://www.kaggle.com/isbhargav/guide-to-pytorch-learning-rate-scheduling
    # scheduler = torch.optim.lr_scheduler.OneCycleLR(
    #     optimizer, 
    #     max_lr=CFG.learning_rate, 
    #     steps_per_epoch=len(train_loader),
    #     epochs=CFG.num_epochs,
    #     anneal_strategy='linear'
    # )

    # conduct training
    train(model, CFG.num_epochs, train_loader, valid_loader, optimizer, SAVE_PATH, scheduler=None)

In [None]:
!nvidia-smi

In [None]:
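# release GPU memory; after the K-Fold loop the model has already been deleted
if "model" in globals():
    model.cpu()
    del model
torch.cuda.empty_cache()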

## Inference: Out of Fold Ensemble

In [None]:
# reload the label encoder saved as le_individual_id.pkl so predictions can be decoded

import joblib
with open("le_individual_id.pkl", "rb") as fp:
    le_individual_id = joblib.load(fp)

In [None]:
TEST_DIR = os.path.join(PROJECT_PATH, f"{CFG.image_resolution}x{CFG.image_resolution}_resized_test_dataset")

def get_test_file_path(id):
    return f"{TEST_DIR}/{id}"

path_submission = os.path.join(ROOT_DIR, "sample_submission.csv")
df_sample_submission = pd.read_csv(path_submission)
df_sample_submission['file_path'] = df_sample_submission['image'].apply(get_test_file_path)
df_sample_submission.head()

In [None]:
df_sample_submission.shape

In [None]:
df_sample_submission.describe()

In [None]:
import torch
from torch.utils.data import Dataset, DataLoader
from PIL import Image

class HappyWhaleTestDataset(Dataset):
    def __init__(self, df, transforms=None):
        self.df = df
        self.image_name = df.image.values
        self.file_names = df.file_path.values
        self.transforms = transforms
    
    def __getitem__(self, index):
        image_name = self.image_name[index]
        image_path = self.file_names[index]
        image = Image.open(image_path)
        image = np.array(image)
        
        if len(image.shape) == 2: 
            image = np.dstack((image,)*3)
        
        # the pipeline is assigned through set_transform();
        # fall back to the raw image when none has been set
        transform = getattr(self, "transform", None)
        if transform is not None:
            image = transform(image=image)['image']
        return image, image_name, image_path

    def __len__(self):
        return len(self.df)
    
    def set_transform(self, transform):
        self.transform = transform

In [None]:
test_dataset = HappyWhaleTestDataset(df_sample_submission, transforms=transform)
test_dataset.set_transform(transform['val'])
test_loader = DataLoader(test_dataset, batch_size=CFG.val_batch_size, shuffle=False, num_workers=NUM_WORKERS)

In [None]:
# inference loop that works for both K-Fold and single-model checkpoints
from tqdm import tqdm
from torch import topk

oof_model_paths = [
              "/content/008_loss_ 7.5.ckpt"
]

if len(oof_model_paths) == 1:
  model_path = oof_model_paths[0]
  ckpt_name = model_path.split("/")[-1]
  ckpt_name_without_type = ckpt_name.split(".")[0]
  model_loss = ckpt_name.split(".")[1]
  ckpt_name = ckpt_name_without_type + model_loss

oof_pred = [] # out of fold prediction list
for model_path in oof_model_paths:
  if CFG.DEBUG == True:
    model = EfficientNetB0Model(num_classes=len(individual_ids))
  else:
    model = EfficientNetB5Model(num_classes=len(individual_ids))
  model_name = model.__class__.__name__
  model.load_state_dict(torch.load(model_path, map_location=device)) # map_location places the checkpoint tensors on `device` regardless of where they were saved
  model.to(device)
  model.eval()
  
  output_pred = []
  with torch.no_grad():
    for images, image_name, path in tqdm(test_loader):
      images = images.float().to(device)
      outputs = model(images)
      output_pred.extend(outputs.cpu().numpy())
  # convert logits to probabilities
  output_proba = F.softmax(torch.from_numpy(np.stack(output_pred)).float(), dim=1)
  oof_pred.append(output_proba.numpy()[:, np.newaxis])

  # Prevent OOM error
  model.cpu()
  del model
  torch.cuda.empty_cache()

# average the per-fold probabilities (softmax was already applied inside the loop)
oof_pred = np.mean(oof_pred, axis=0)
# convert back to a torch tensor for topk
oof_pred = torch.Tensor(oof_pred)

# top-5 probabilities and their class indices, computed once
top5_probabilities, top5_indices = topk(oof_pred, 5)
all_predictions = top5_indices.cpu().numpy() # indices of the 5 most probable classes
all_probabilities = top5_probabilities.cpu().numpy() # probabilities of those 5 classes
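
The ensembling step above averages class probabilities across folds and then takes the five most probable classes per image. A minimal NumPy sketch of the same idea, assuming random logits in place of real model outputs; `softmax`, `ensemble_top_k`, and the array sizes are hypothetical and not part of this notebook. Unlike the cell above, the sketch keeps no singleton axis:

```python
import numpy as np

def softmax(logits):
    # Subtract the row-wise max for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def ensemble_top_k(fold_logits, k=5):
    """Average per-fold class probabilities, then return top-k (probabilities, indices)."""
    fold_probs = np.stack([softmax(logits) for logits in fold_logits])  # (folds, images, classes)
    mean_probs = fold_probs.mean(axis=0)                                # (images, classes)
    top_idx = np.argsort(-mean_probs, axis=1)[:, :k]                    # most probable first
    top_probs = np.take_along_axis(mean_probs, top_idx, axis=1)
    return top_probs, top_idx

# Hypothetical logits: 2 folds, 3 images, 8 classes.
rng = np.random.default_rng(0)
fold_logits = [rng.normal(size=(3, 8)) for _ in range(2)]
top_probs, top_idx = ensemble_top_k(fold_logits, k=5)
print(top_idx.shape, top_probs.shape)
```

Averaging probabilities rather than raw logits keeps every fold on the same scale, which is what the cell above does by applying softmax before `np.mean`.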

In [None]:
def flatten(t):
    return [item for sublist in t for item in sublist]

# flatten batched items into array
flattened_labels = flatten(all_predictions)
flattened_probabilities = flatten(all_probabilities)

# decode integer prediction labels into string labels
decoded_labels = []
for item in flattened_labels:
    top_5_label = le_individual_id.inverse_transform(item)
    str_top_5_label = np.array2string(top_5_label, separator=',')
    str_top_5_label_without_ln = str_top_5_label.replace("\n", "")
    decoded_labels.append(str_top_5_label_without_ln)

In [None]:
from ast import literal_eval

pd.options.display.max_colwidth = 100 # display max length for pandas
df_empty = pd.DataFrame({})
df_empty["image"] = df_sample_submission.image

# make rows for predictions
df_empty['predictions'] = decoded_labels
df_empty['predictions'] = df_empty['predictions'].apply(literal_eval) # str obj val -> list obj val

df_empty['probabilities'] = flattened_probabilities
df_empty['sum_probability'] = df_empty['probabilities'].apply(lambda x: np.sum(x))
df_empty['probabilities_sd'] = df_empty['probabilities'].apply(lambda x: np.std(x))
df_empty.head()

In [None]:
# sum of the top-5 probabilities: the lower it is, the less confident the model
# (side note) the top-5 sum is generally below 1, because the remaining classes hold the rest
quantile_sum = df_empty['sum_probability'].quantile(0.25)

# standard deviation of the top-5 probabilities: the lower it is, the harder the classification was
quantile_sd = df_empty['probabilities_sd'].quantile(0.25)

print("sum:", quantile_sum, "standard deviation:", quantile_sd)

In [None]:
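# if the top-5 probability sum falls in the lowest quartile, the model is not confident:
# replace the 5th (least likely) prediction with "new_individual"
is_low_confidence = df_empty['sum_probability'] < quantile_sum
df_empty['predictions'] = pd.Series(
    [preds[:-1] + ["new_individual"] if low else preds
     for preds, low in zip(df_empty['predictions'], is_low_confidence)],
    index=df_empty.index,
)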
df_empty.head()
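
The To-Do list mentions setting a threshold for `new_individual`. One possible alternative to the quantile rule above is sketched here: insert `new_individual` ahead of the first prediction whose probability drops below a fixed threshold. The function name `insert_new_individual`, the threshold value `0.5`, and the labels are assumptions for illustration; the threshold would need tuning on a validation fold:

```python
def insert_new_individual(labels, probabilities, threshold=0.5, k=5):
    """Place "new_individual" ahead of the first prediction whose probability is below `threshold`."""
    ranked = []
    inserted = False
    for label, prob in zip(labels, probabilities):
        if not inserted and prob < threshold:
            ranked.append("new_individual")
            inserted = True
        ranked.append(label)
    if not inserted:
        # Every prediction cleared the threshold: "new_individual" goes last
        # and is dropped by the slice below when k labels are already present.
        ranked.append("new_individual")
    return ranked[:k]

# Hypothetical labels and probabilities, sorted from most to least probable.
confident = insert_new_individual(["a", "b", "c", "d", "e"], [0.9, 0.05, 0.02, 0.01, 0.01])
unsure = insert_new_individual(["a", "b", "c", "d", "e"], [0.2, 0.1, 0.1, 0.05, 0.05])
print(confident)
print(unsure)
```

With this rule a confident first pick keeps rank 1, while an unsure image gets `new_individual` at rank 1 instead of rank 5.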

In [None]:
df_submission = df_empty[['image','predictions']].copy() # copy to avoid pandas' SettingWithCopyWarning
df_submission['predictions'] = df_submission['predictions'].apply(lambda x: ' '.join(x).strip())
df_submission.head()

In [None]:
file_name = f"{model_name}_submission_{ckpt_name}_{CFG.num_folds}-folds_{CFG.split_ratio}-Split.csv"
df_submission.to_csv(file_name, index=False)
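
Before uploading, the space-separated prediction strings can be scored on a held-out fold with MAP@5, the metric described on the competition evaluation page. A minimal sketch, assuming one correct label per image; the function names and the toy labels are hypothetical:

```python
def average_precision_at_5(true_label, predicted_labels):
    """With one correct label per image, AP@5 is 1/rank of the hit within the top 5, else 0."""
    for rank, label in enumerate(predicted_labels[:5], start=1):
        if label == true_label:
            return 1.0 / rank
    return 0.0

def map_at_5(true_labels, prediction_strings):
    """Mean of AP@5 over images; each prediction string holds space-separated labels."""
    scores = [
        average_precision_at_5(true_label, prediction.split())
        for true_label, prediction in zip(true_labels, prediction_strings)
    ]
    return sum(scores) / len(scores)

# Toy validation data in the same format as the `predictions` column of the submission.
true_labels = ["a", "b", "c"]
prediction_strings = ["a x y z w", "x b y z w", "x y z w v"]
score = map_at_5(true_labels, prediction_strings)
print(score)
```

The first image scores 1, the second 1/2, and the third 0, so the mean is 0.5.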

### References
- [📄 Kaggle Competition Description](https://www.kaggle.com/c/happy-whale-and-dolphin/overview/evaluation)
- [EFFNET B6 WHALE COMP 0.605](https://www.kaggle.com/manojprabhaakr/effnet-b6-whale-comp)
- [Accuracy 0.586 code with efficientnet b6, KFold](https://www.kaggle.com/aikhmelnytskyy/happywhale-arcface-baseline-eff-net-kfold5-0-586)
- [HappyWhale ArcFace Baseline (TPU) 0.522](https://www.kaggle.com/ks2019/happywhale-arcface-baseline-tpu)
- [😊🐳&🐬 - EDA and Baseline Solution 0.402](https://www.kaggle.com/dschettler8845/eda-and-baseline-solution#model_baseline)

- [EDA](https://www.kaggle.com/bsridatta/happywhale)
- [Pytorch + VGG16](https://www.kaggle.com/palash97/happywhale-pytorch-vgg16)
- [Timm + EfficientNetB0 + ArcFace](https://www.kaggle.com/debarshichanda/pytorch-arcface-gem-pooling-starter)
