# Removing failure cases of cropping bounding boxes
A good cropping method seems crucial in this competition to boost the performance of any solution. It removes unnecessary information and potential distractions for the model, such as mountains in the background, people, or boats.<br>

In this notebook, I start from the amazing work of @AWSAF in the notebook [Happywhale: BoundingBox [YOLOv5] 📦](https://www.kaggle.com/awsaf49/happywhale-boundingbox-yolov5/), and try to develop a way of keeping only the crops that are correct.<br>
The work achieved in the above-mentioned notebook is amazing; however, there are a significant number of diverse failure cases which cannot be ignored.

**EDIT**: (v4) I modified this notebook to take the detic cropped dataset as input

In this notebook I will use PyTorch Lightning ⚡

# Failure cases ❌
## Overcropped images
![](https://i.imgur.com/ZqsaZio.png)
## Undercropped images
![](https://i.imgur.com/0cWky4A.png)
## Images cropped on something other than a 🐳 or a 🐬
![](https://i.imgur.com/5drD858.png)
## "Adversarial" examples
![](https://i.imgur.com/HNu69Ey.png)
## Wtf are you doing examples
![](https://i.imgur.com/RLn2lSA.png)

# Possible methods 🙋
There are several possibilities to achieve this goal.<br>
The first one I thought of was to use OOD (Out Of Distribution) detection methods, because a wrong bounding box can be interpreted as an OOD sample.
However, I'm not familiar enough with these methods, so I'll try a homemade method inspired by what I know of them.<br>

# Method of this notebook 👨‍🎓
I've trained an EfficientNetB0 classifier to recognize the species of the train images, and I plan to use it to detect failure cases by:
1. Making a prediction on the raw images
2. Making a prediction on the cropped images
3. Comparing the predictions

💡 The idea is that if a bounding box is correct, then the prediction on the crop should be similar if not better (because the details of the animal are easier to detect), while if it's not, the prediction should be less confident and/or wrong.<br>
As we want to make predictions for the test set, we do not have the correct species labels, so we will not use them. Instead, we will only look at the confidence scores and the entropy of the predictions.
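
The comparison can be sketched on toy data. This is only an illustration, not the code used later in this notebook: the logits are made up, and `softmax` and `confidence_drop` are hypothetical helper names.

```python
import numpy as np

def softmax(logits):
    """Turn a vector of logits into probabilities (numerically stable)."""
    z = np.exp(logits - np.max(logits))
    return z / z.sum()

def confidence_drop(raw_logits, crop_logits):
    """Max confidence on the raw image minus max confidence on the crop.

    A large positive value means the model became less confident after cropping.
    """
    return softmax(raw_logits).max() - softmax(crop_logits).max()

# Made-up logits for a 4-class problem
raw = np.array([4.0, 1.0, 0.5, 0.0])
good_crop = np.array([6.0, 1.0, 0.5, 0.0])  # crop keeps the animal: sharper prediction
bad_crop = np.array([1.0, 0.9, 0.8, 0.7])   # crop misses the animal: flat prediction

print(confidence_drop(raw, good_crop))  # negative: confidence went up
print(confidence_drop(raw, bad_crop))   # positive: confidence went down
```

No label is needed anywhere in this comparison, which is what makes it usable on the test set.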

## Examples
The prediction should be somewhat similar if not better:
![](https://i.imgur.com/CykezjQ.png)

The prediction should be very different and much less confident:
![image](https://i.imgur.com/Kceynis.png)

## Things to be careful about
🚨 As we will use the predictions, we have to make sure that the samples were not part of the training data of the model that predicts them. That's why I'll use 4 models trained using 4 folds.<br>
For the training data, I will make the predictions using the model that was not trained on their corresponding fold. For the test data, I'll take the average of the predictions of the 4 models.<br>
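
A minimal sketch of this out-of-fold scheme, under the assumption that model `k` was validated on fold `k`. The data is random and `fake_model_predict` is a hypothetical stand-in for a real model:

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_folds, n_classes = 8, 4, 3

# Fold assignment of each training sample (toy data)
fold = np.arange(n_samples) % n_folds

def fake_model_predict(model_id, n):
    """Stand-in for a real model: returns random logits of shape (n, n_classes)."""
    return rng.normal(size=(n, n_classes)) + model_id

# Training data: each sample is predicted by the model that did NOT see it,
# i.e. the model for which the sample's fold was the validation fold.
oof_logits = np.zeros((n_samples, n_classes))
for k in range(n_folds):
    held_out = fold == k
    oof_logits[held_out] = fake_model_predict(k, int(held_out.sum()))

# Test data: no model has seen it, so the predictions of all models are averaged.
n_test = 5
test_logits = np.mean([fake_model_predict(k, n_test) for k in range(n_folds)], axis=0)

print(oof_logits.shape, test_logits.shape)  # (8, 3) (5, 3)
```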

🚨 While I attempt to fix most of the failure cases, some still remain with this method. Indeed, when a failure case is detected, the crop is replaced by the original image, so no cropping is performed. Also, this approach doesn't fix the images that the original notebook did not annotate in the first place.
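
The fallback itself is simple. As a sketch (the directory names and the `select_image_path` helper are hypothetical, not the ones used in this notebook):

```python
import os

RAW_DIR = "raw_images"          # hypothetical directory of uncropped images
CROPPED_DIR = "cropped_images"  # hypothetical directory of cropped images

def select_image_path(image_name, is_flagged):
    """Return the cropped image unless its bounding box was flagged as a failure."""
    base = RAW_DIR if is_flagged else CROPPED_DIR
    return os.path.join(base, image_name)

print(select_image_path("a.jpg", is_flagged=False))  # uses the crop
print(select_image_path("b.jpg", is_flagged=True))   # falls back to the raw image
```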

In [None]:
!pip install -q bbox-utility # check https://github.com/awsaf49/bbox for source code

In [None]:
import numpy as np
import pandas as pd
import os
import torch
import torchmetrics
import math
import copy
import cv2
import matplotlib.pyplot as plt
import time
import glob
import shutil
import albumentations
from tqdm.notebook import tqdm
from bbox.utils import yolo2voc, draw_bboxes
from sklearn.model_selection import StratifiedKFold
from torch.utils.data import Dataset
from albumentations.pytorch.transforms import ToTensorV2
from torchvision import transforms
from torch import nn, Tensor
from torch.nn import functional as F
from pytorch_lightning import Callback, LightningModule, Trainer

In [None]:
class CFG:
    # Seed to have deterministic results
    SEED = 0
    # Size of the images
    size = 224
    # inference batch size
    batch_size = 128 * 8#64
    # Number of workers to load the data
    num_workers = 2
    # Number of folds
    FOLDS = 4
    # Number of classes
    N_CLASSES = 26
    # Expected proportion of the data to filter
    FLAG_QUANTILE = 0.015
    # Minimum number of flags for a sample to be treated as an out-of-distribution sample
    N_FLAGS = 2

In [None]:
def plot_images(batch, row=2, col=2, base_path="../input/w-d-224x224-fast-dataset/train_images/"):
    """
        Copied and adapted from https://www.kaggle.com/awsaf49/happywhale-data-distribution
    """
    plt.figure(figsize=(col*3, row*3))
    # Never index past the end of the batch when it holds fewer than row*col images
    for i in range(min(row*col, len(batch))):
        plt.subplot(row, col, i+1)
        img = cv2.imread(os.path.join(base_path,  batch["image"].iloc[i]))
        if img is None:
            continue
        img = img[:, :, ::-1]
        plt.imshow(img)
        if "species" in batch:
            plt.title(batch["species"].iloc[i])
        plt.axis('off')
    plt.tight_layout()
    plt.show()


In [None]:
class WandDLoader(Dataset):
    def __init__(self, data, d_type, crop=False):
        self.crop = crop
        self.images = (data["image"].str[:-3] + "bmp").values.tolist()
        
        transformations = albumentations.Compose([
            albumentations.Normalize(),
            ToTensorV2(p=1.0)
        ])

        def albumentations_transform(image, transform=transformations):
            # Always return a dict with an "image" key, as __getitem__ expects
            image_np = np.array(image)
            if transform is None:
                return {"image": image_np}
            return transform(image=image_np)

        self.transforms = transforms.Compose([
            transforms.Lambda(albumentations_transform),
        ])
        path = "../input/w-d-fast-224x224-cropped-dataset" if crop else "../input/w-d-224x224-fast-dataset"
        self.base_path = os.path.join(path, f"{d_type}_images")
                        
    def __getitem__(self, idx):
        image_name = self.images[idx]
            
        path = os.path.join(self.base_path, image_name)
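        img = cv2.imread(path)
        # cv2.imread silently returns None when the file is missing or unreadable
        if img is None:
            raise FileNotFoundError(f"Could not read image: {path}")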
            
        if self.transforms is not None:
            img = self.transforms(img)["image"]
        return img
    
    def __len__(self):
        return len(self.images)

In [None]:
"""
    https://pytorch.org/vision/main/_modules/torchvision/models/efficientnet.html
"""

from typing import Any, Callable, Optional, List, Sequence
from functools import partial
try:
    from torch.hub import load_state_dict_from_url  # noqa: 401
except ImportError:
    from torch.utils.model_zoo import load_url as load_state_dict_from_url  # noqa: 401

def stochastic_depth(input: Tensor, p: float, mode: str, training: bool = True) -> Tensor:
    """
    Implements the Stochastic Depth from `"Deep Networks with Stochastic Depth"
    <https://arxiv.org/abs/1603.09382>`_ used for randomly dropping residual
    branches of residual architectures.
    Args:
        input (Tensor[N, ...]): The input tensor of arbitrary dimensions with the first one
                    being its batch i.e. a batch with ``N`` rows.
        p (float): probability of the input to be zeroed.
        mode (str): ``"batch"`` or ``"row"``.
                    ``"batch"`` randomly zeroes the entire input, ``"row"`` zeroes
                    randomly selected rows from the batch.
        training: apply stochastic depth if is ``True``. Default: ``True``
    Returns:
        Tensor[N, ...]: The randomly zeroed tensor.
    """
    if p < 0.0 or p > 1.0:
        raise ValueError(f"drop probability has to be between 0 and 1, but got {p}")
    if mode not in ["batch", "row"]:
        raise ValueError(f"mode has to be either 'batch' or 'row', but got {mode}")
    if not training or p == 0.0:
        return input

    survival_rate = 1.0 - p
    if mode == "row":
        size = [input.shape[0]] + [1] * (input.ndim - 1)
    else:
        size = [1] * input.ndim
    noise = torch.empty(size, dtype=input.dtype, device=input.device)
    noise = noise.bernoulli_(survival_rate)
    if survival_rate > 0.0:
        noise.div_(survival_rate)
    return input * noise

class StochasticDepth(nn.Module):
    """
    See :func:`stochastic_depth`.
    """

    def __init__(self, p: float, mode: str) -> None:
        super().__init__()
        self.p = p
        self.mode = mode

    def forward(self, input: Tensor) -> Tensor:
        return stochastic_depth(input, self.p, self.mode, self.training)

    def __repr__(self) -> str:
        tmpstr = self.__class__.__name__ + "("
        tmpstr += "p=" + str(self.p)
        tmpstr += ", mode=" + str(self.mode)
        tmpstr += ")"
        return tmpstr

def _make_divisible(v: float, divisor: int, min_value: Optional[int] = None) -> int:
    """
    This function is taken from the original tf repo.
    It ensures that all layers have a channel number that is divisible by 8
    It can be seen here:
    https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py
    """
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    # Make sure that round down does not go down by more than 10%.
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v

class ConvNormActivation(torch.nn.Sequential):
    """
    Configurable block used for Convolution-Normalization-Activation blocks.

    Args:
        in_channels (int): Number of channels in the input image
        out_channels (int): Number of channels produced by the Convolution-Normalization-Activation block
        kernel_size: (int, optional): Size of the convolving kernel. Default: 3
        stride (int, optional): Stride of the convolution. Default: 1
        padding (int, tuple or str, optional): Padding added to all four sides of the input. Default: None, in which case it will be calculated as ``padding = (kernel_size - 1) // 2 * dilation``
        groups (int, optional): Number of blocked connections from input channels to output channels. Default: 1
        norm_layer (Callable[..., torch.nn.Module], optional): Norm layer that will be stacked on top of the convolution layer. If ``None`` this layer won't be used. Default: ``torch.nn.BatchNorm2d``
        activation_layer (Callable[..., torch.nn.Module], optional): Activation function which will be stacked on top of the normalization layer (if not None), otherwise on top of the conv layer. If ``None`` this layer won't be used. Default: ``torch.nn.ReLU``
        dilation (int): Spacing between kernel elements. Default: 1
        inplace (bool): Parameter for the activation layer, which can optionally do the operation in-place. Default ``True``
        bias (bool, optional): Whether to use bias in the convolution layer. By default, biases are included if ``norm_layer is None``.

    """

    def __init__(
        self,
        in_channels: int,
        out_channels: int,
        kernel_size: int = 3,
        stride: int = 1,
        padding: Optional[int] = None,
        groups: int = 1,
        norm_layer: Optional[Callable[..., torch.nn.Module]] = torch.nn.BatchNorm2d,
        activation_layer: Optional[Callable[..., torch.nn.Module]] = torch.nn.ReLU,
        dilation: int = 1,
        inplace: Optional[bool] = True,
        bias: Optional[bool] = None,
    ) -> None:
        if padding is None:
            padding = (kernel_size - 1) // 2 * dilation
        if bias is None:
            bias = norm_layer is None
        layers = [
            torch.nn.Conv2d(
                in_channels,
                out_channels,
                kernel_size,
                stride,
                padding,
                dilation=dilation,
                groups=groups,
                bias=bias,
            )
        ]
        if norm_layer is not None:
            layers.append(norm_layer(out_channels))
        if activation_layer is not None:
            params = {} if inplace is None else {"inplace": inplace}
            layers.append(activation_layer(**params))
        super().__init__(*layers)
        self.out_channels = out_channels

class SqueezeExcitation(torch.nn.Module):
    """
    This block implements the Squeeze-and-Excitation block from https://arxiv.org/abs/1709.01507 (see Fig. 1).
    Parameters ``activation``, and ``scale_activation`` correspond to ``delta`` and ``sigma`` in eq. 3.

    Args:
        input_channels (int): Number of channels in the input image
        squeeze_channels (int): Number of squeeze channels
        activation (Callable[..., torch.nn.Module], optional): ``delta`` activation. Default: ``torch.nn.ReLU``
        scale_activation (Callable[..., torch.nn.Module]): ``sigma`` activation. Default: ``torch.nn.Sigmoid``
    """

    def __init__(
        self,
        input_channels: int,
        squeeze_channels: int,
        activation: Callable[..., torch.nn.Module] = torch.nn.ReLU,
        scale_activation: Callable[..., torch.nn.Module] = torch.nn.Sigmoid,
    ) -> None:
        super().__init__()
        self.avgpool = torch.nn.AdaptiveAvgPool2d(1)
        self.fc1 = torch.nn.Conv2d(input_channels, squeeze_channels, 1)
        self.fc2 = torch.nn.Conv2d(squeeze_channels, input_channels, 1)
        self.activation = activation()
        self.scale_activation = scale_activation()

    def _scale(self, input: Tensor) -> Tensor:
        scale = self.avgpool(input)
        scale = self.fc1(scale)
        scale = self.activation(scale)
        scale = self.fc2(scale)
        return self.scale_activation(scale)

    def forward(self, input: Tensor) -> Tensor:
        scale = self._scale(input)
        return scale * input


model_urls = {
    # Weights ported from https://github.com/rwightman/pytorch-image-models/
    "efficientnet_b0": "https://download.pytorch.org/models/efficientnet_b0_rwightman-3dd342df.pth",
    "efficientnet_b1": "https://download.pytorch.org/models/efficientnet_b1_rwightman-533bc792.pth",
    "efficientnet_b2": "https://download.pytorch.org/models/efficientnet_b2_rwightman-bcdf34b7.pth",
    "efficientnet_b3": "https://download.pytorch.org/models/efficientnet_b3_rwightman-cf984f9c.pth",
    "efficientnet_b4": "https://download.pytorch.org/models/efficientnet_b4_rwightman-7eb33cd5.pth",
    # Weights ported from https://github.com/lukemelas/EfficientNet-PyTorch/
    "efficientnet_b5": "https://download.pytorch.org/models/efficientnet_b5_lukemelas-b6417697.pth",
    "efficientnet_b6": "https://download.pytorch.org/models/efficientnet_b6_lukemelas-c76e70fd.pth",
    "efficientnet_b7": "https://download.pytorch.org/models/efficientnet_b7_lukemelas-dcc49843.pth",
}


class MBConvConfig:
    # Stores information listed at Table 1 of the EfficientNet paper
    def __init__(
        self,
        expand_ratio: float,
        kernel: int,
        stride: int,
        input_channels: int,
        out_channels: int,
        num_layers: int,
        width_mult: float,
        depth_mult: float,
    ) -> None:
        self.expand_ratio = expand_ratio
        self.kernel = kernel
        self.stride = stride
        self.input_channels = self.adjust_channels(input_channels, width_mult)
        self.out_channels = self.adjust_channels(out_channels, width_mult)
        self.num_layers = self.adjust_depth(num_layers, depth_mult)

    def __repr__(self) -> str:
        s = self.__class__.__name__ + "("
        s += "expand_ratio={expand_ratio}"
        s += ", kernel={kernel}"
        s += ", stride={stride}"
        s += ", input_channels={input_channels}"
        s += ", out_channels={out_channels}"
        s += ", num_layers={num_layers}"
        s += ")"
        return s.format(**self.__dict__)

    @staticmethod
    def adjust_channels(channels: int, width_mult: float, min_value: Optional[int] = None) -> int:
        return _make_divisible(channels * width_mult, 8, min_value)

    @staticmethod
    def adjust_depth(num_layers: int, depth_mult: float):
        return int(math.ceil(num_layers * depth_mult))


class MBConv(nn.Module):
    def __init__(
        self,
        cnf: MBConvConfig,
        stochastic_depth_prob: float,
        norm_layer: Callable[..., nn.Module],
        se_layer: Callable[..., nn.Module] = SqueezeExcitation,
    ) -> None:
        super().__init__()

        if not (1 <= cnf.stride <= 2):
            raise ValueError("illegal stride value")

        self.use_res_connect = cnf.stride == 1 and cnf.input_channels == cnf.out_channels

        layers: List[nn.Module] = []
        activation_layer = nn.SiLU

        # expand
        expanded_channels = cnf.adjust_channels(cnf.input_channels, cnf.expand_ratio)
        if expanded_channels != cnf.input_channels:
            layers.append(
                ConvNormActivation(
                    cnf.input_channels,
                    expanded_channels,
                    kernel_size=1,
                    norm_layer=norm_layer,
                    activation_layer=activation_layer,
                )
            )

        # depthwise
        layers.append(
            ConvNormActivation(
                expanded_channels,
                expanded_channels,
                kernel_size=cnf.kernel,
                stride=cnf.stride,
                groups=expanded_channels,
                norm_layer=norm_layer,
                activation_layer=activation_layer,
            )
        )

        # squeeze and excitation
        squeeze_channels = max(1, cnf.input_channels // 4)
        layers.append(se_layer(expanded_channels, squeeze_channels, activation=partial(nn.SiLU, inplace=True)))

        # project
        layers.append(
            ConvNormActivation(
                expanded_channels, cnf.out_channels, kernel_size=1, norm_layer=norm_layer, activation_layer=None
            )
        )

        self.block = nn.Sequential(*layers)
        self.stochastic_depth = StochasticDepth(stochastic_depth_prob, "row")
        self.out_channels = cnf.out_channels

    def forward(self, input: Tensor) -> Tensor:
        result = self.block(input)
        if self.use_res_connect:
            result = self.stochastic_depth(result)
            result += input
        return result


class EfficientNet(nn.Module):
    def __init__(
        self,
        inverted_residual_setting: List[MBConvConfig],
        dropout: float,
        stochastic_depth_prob: float = 0.2,
        num_classes: int = 1000,
        block: Optional[Callable[..., nn.Module]] = None,
        norm_layer: Optional[Callable[..., nn.Module]] = None,
        **kwargs: Any,
    ) -> None:
        """
        EfficientNet main class

        Args:
            inverted_residual_setting (List[MBConvConfig]): Network structure
            dropout (float): The dropout probability
            stochastic_depth_prob (float): The stochastic depth probability
            num_classes (int): Number of classes
            block (Optional[Callable[..., nn.Module]]): Module specifying inverted residual building block for mobilenet
            norm_layer (Optional[Callable[..., nn.Module]]): Module specifying the normalization layer to use
        """
        super().__init__()

        if not inverted_residual_setting:
            raise ValueError("The inverted_residual_setting should not be empty")
        elif not (
            isinstance(inverted_residual_setting, Sequence)
            and all([isinstance(s, MBConvConfig) for s in inverted_residual_setting])
        ):
            raise TypeError("The inverted_residual_setting should be List[MBConvConfig]")

        if block is None:
            block = MBConv

        if norm_layer is None:
            norm_layer = nn.BatchNorm2d

        layers: List[nn.Module] = []

        # building first layer
        firstconv_output_channels = inverted_residual_setting[0].input_channels
        layers.append(
            ConvNormActivation(
                3, firstconv_output_channels, kernel_size=3, stride=2, norm_layer=norm_layer, activation_layer=nn.SiLU
            )
        )

        # building inverted residual blocks
        total_stage_blocks = sum(cnf.num_layers for cnf in inverted_residual_setting)
        stage_block_id = 0
        for cnf in inverted_residual_setting:
            stage: List[nn.Module] = []
            for _ in range(cnf.num_layers):
                # copy to avoid modifications. shallow copy is enough
                block_cnf = copy.copy(cnf)

                # overwrite info if not the first conv in the stage
                if stage:
                    block_cnf.input_channels = block_cnf.out_channels
                    block_cnf.stride = 1

                # adjust stochastic depth probability based on the depth of the stage block
                sd_prob = stochastic_depth_prob * float(stage_block_id) / total_stage_blocks

                stage.append(block(block_cnf, sd_prob, norm_layer))
                stage_block_id += 1

            layers.append(nn.Sequential(*stage))

        # building last several layers
        lastconv_input_channels = inverted_residual_setting[-1].out_channels
        lastconv_output_channels = 4 * lastconv_input_channels
        layers.append(
            ConvNormActivation(
                lastconv_input_channels,
                lastconv_output_channels,
                kernel_size=1,
                norm_layer=norm_layer,
                activation_layer=nn.SiLU,
            )
        )

        self.features = nn.Sequential(*layers)
        self.avgpool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Sequential(
            nn.Dropout(p=dropout, inplace=True),
            nn.Linear(lastconv_output_channels, num_classes),
        )

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode="fan_out")
                if m.bias is not None:
                    nn.init.zeros_(m.bias)
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.ones_(m.weight)
                nn.init.zeros_(m.bias)
            elif isinstance(m, nn.Linear):
                init_range = 1.0 / math.sqrt(m.out_features)
                nn.init.uniform_(m.weight, -init_range, init_range)
                nn.init.zeros_(m.bias)

    def _forward_impl(self, x: Tensor) -> Tensor:
        channels = self.features(x)

        x = self.avgpool(channels)
        features = torch.flatten(x, 1)

        x = self.classifier(features)

        return x, features

    def forward(self, x: Tensor) -> Tensor:
        return self._forward_impl(x)
    
    
def _efficientnet(
    arch: str,
    width_mult: float,
    depth_mult: float,
    dropout: float,
    pretrained: bool,
    progress: bool,
    **kwargs: Any,
) -> EfficientNet:
    bneck_conf = partial(MBConvConfig, width_mult=width_mult, depth_mult=depth_mult)
    inverted_residual_setting = [
        bneck_conf(1, 3, 1, 32, 16, 1),
        bneck_conf(6, 3, 2, 16, 24, 2),
        bneck_conf(6, 5, 2, 24, 40, 2),
        bneck_conf(6, 3, 2, 40, 80, 3),
        bneck_conf(6, 5, 1, 80, 112, 3),
        bneck_conf(6, 5, 2, 112, 192, 4),
        bneck_conf(6, 3, 1, 192, 320, 1),
    ]
    model = EfficientNet(inverted_residual_setting, dropout, **kwargs)
    if pretrained:
        if model_urls.get(arch, None) is None:
            raise ValueError(f"No checkpoint is available for model type {arch}")
        state_dict = load_state_dict_from_url(model_urls[arch], progress=progress)
        model.load_state_dict(state_dict)
    return model

def efficientnet_b0(pretrained: bool = False, progress: bool = True, **kwargs: Any) -> EfficientNet:
    """
    Constructs an EfficientNet B0 architecture from
    `"EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks" <https://arxiv.org/abs/1905.11946>`_.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    return _efficientnet("efficientnet_b0", 1.0, 1.0, 0.2, pretrained, progress, **kwargs)


def efficientnet_b1(pretrained: bool = False, progress: bool = True, **kwargs: Any) -> EfficientNet:
    """
    Constructs an EfficientNet B1 architecture from
    `"EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks" <https://arxiv.org/abs/1905.11946>`_.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    return _efficientnet("efficientnet_b1", 1.0, 1.1, 0.2, pretrained, progress, **kwargs)


def efficientnet_b2(pretrained: bool = False, progress: bool = True, **kwargs: Any) -> EfficientNet:
    """
    Constructs an EfficientNet B2 architecture from
    `"EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks" <https://arxiv.org/abs/1905.11946>`_.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    return _efficientnet("efficientnet_b2", 1.1, 1.2, 0.3, pretrained, progress, **kwargs)


def efficientnet_b3(pretrained: bool = False, progress: bool = True, **kwargs: Any) -> EfficientNet:
    """
    Constructs an EfficientNet B3 architecture from
    `"EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks" <https://arxiv.org/abs/1905.11946>`_.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    return _efficientnet("efficientnet_b3", 1.2, 1.4, 0.3, pretrained, progress, **kwargs)


def efficientnet_b4(pretrained: bool = False, progress: bool = True, **kwargs: Any) -> EfficientNet:
    """
    Constructs an EfficientNet B4 architecture from
    `"EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks" <https://arxiv.org/abs/1905.11946>`_.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    return _efficientnet("efficientnet_b4", 1.4, 1.8, 0.4, pretrained, progress, **kwargs)


def efficientnet_b5(pretrained: bool = False, progress: bool = True, **kwargs: Any) -> EfficientNet:
    """
    Constructs an EfficientNet B5 architecture from
    `"EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks" <https://arxiv.org/abs/1905.11946>`_.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    return _efficientnet(
        "efficientnet_b5",
        1.6,
        2.2,
        0.4,
        pretrained,
        progress,
        norm_layer=partial(nn.BatchNorm2d, eps=0.001, momentum=0.01),
        **kwargs,
    )



def efficientnet_b6(pretrained: bool = False, progress: bool = True, **kwargs: Any) -> EfficientNet:
    """
    Constructs an EfficientNet B6 architecture from
    `"EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks" <https://arxiv.org/abs/1905.11946>`_.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    return _efficientnet(
        "efficientnet_b6",
        1.8,
        2.6,
        0.5,
        pretrained,
        progress,
        norm_layer=partial(nn.BatchNorm2d, eps=0.001, momentum=0.01),
        **kwargs,
    )



def efficientnet_b7(pretrained: bool = False, progress: bool = True, **kwargs: Any) -> EfficientNet:
    """
    Constructs an EfficientNet B7 architecture from
    `"EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks" <https://arxiv.org/abs/1905.11946>`_.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    return _efficientnet(
        "efficientnet_b7",
        2.0,
        3.1,
        0.5,
        pretrained,
        progress,
        norm_layer=partial(nn.BatchNorm2d, eps=0.001, momentum=0.01),
        **kwargs,
    )

In [None]:
class WDModel(LightningModule):
    def __init__(self, weights=torch.ones(CFG.N_CLASSES)):
        super().__init__()
        #self.save_hyperparameters()
        self.model = efficientnet_b0(True)
        self.classification_head = nn.Linear(1280, CFG.N_CLASSES)
        #self.criterion = LabelSmoothingCrossEntropy(0.0)
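        # as_tensor accepts lists, arrays and tensors without the copy warning raised by torch.tensor(tensor)
        self.criterion = nn.CrossEntropyLoss(weight=torch.as_tensor(weights, dtype=torch.float))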
        
        acc = torchmetrics.Accuracy()
        # use .clone so that each metric can maintain its own state
        self.train_acc = acc.clone()
        # assign all metrics as attributes of module so they are detected as children
        self.val_acc = acc.clone()
        
        self.validation_outputs = []
    
    def forward(self, x):
        _, x = self.model(x)
        x = self.classification_head(x)
        return x

In [None]:
train_data = pd.read_csv("../input/w-d-classification/preds.csv")
train_data.index = train_data["image"].str[:-3] + "jpg"
sample_submission = pd.read_csv("../input/w-d-224x224-fast-dataset/sample_submission.csv")
sample_submission.index = sample_submission["image"].str[:-3] + "jpg"

# Inference 🔮
For each model, I want to make 4 loaders for:
1. The raw training images
2. The cropped training images
3. The raw test images
4. The cropped test images

<br>
The raw training images already have a prediction coming from the training notebook, so all I have left to do for each model is:<br>

1. Make predictions on the cropped training images that were not used for training
2. Make predictions on the raw test images
3. Make predictions on the cropped test images

In [None]:
test_dataset = WandDLoader(sample_submission, "test")
test_crop_dataset = WandDLoader(sample_submission, "test", True)

test_loader = torch.utils.data.DataLoader(
    test_dataset,
    batch_size=CFG.batch_size,
    num_workers=CFG.num_workers,
    pin_memory=True,
    shuffle=False
)
test_crop_loader = torch.utils.data.DataLoader(
    test_crop_dataset,
    batch_size=CFG.batch_size,
    num_workers=CFG.num_workers,
    pin_memory=True,
    shuffle=False
)

test_preds = []
test_crop_preds = []
cropped_species_pred = np.zeros((len(train_data), CFG.N_CLASSES), dtype=float)
for i in range(CFG.FOLDS):
    base_dir = f"../input/w-d-classification/model-wandd-species-fold{i}-val/"
    file = sorted(os.listdir(base_dir))[0]
    path = os.path.join(base_dir, file)

    model = WDModel.load_from_checkpoint(path)
    trainer = Trainer(
        gpus=1
    )
    
    train_crop_dataset = WandDLoader(train_data[train_data["fold"] == i], "train", True)
    train_crop_loader = torch.utils.data.DataLoader(
        train_crop_dataset,
        batch_size=CFG.batch_size,
        num_workers=CFG.num_workers,
        pin_memory=True,
        shuffle=False
    )
    
    preds = trainer.predict(model, dataloaders=train_crop_loader)
    preds = torch.cat(preds, dim=0)
    cropped_species_pred[(train_data["fold"] == i).values] = preds.numpy()
    
    preds = trainer.predict(model, dataloaders=test_loader)
    preds = torch.cat(preds, dim=0)
    test_preds.append(preds.numpy())
    
    preds = trainer.predict(model, dataloaders=test_crop_loader)
    preds = torch.cat(preds, dim=0)
    test_crop_preds.append(preds.numpy())
    
sample_submission["species_pred"] = np.mean(test_preds, axis=0).tolist()
sample_submission["cropped_species_pred"] = np.mean(test_crop_preds, axis=0).tolist()
train_data["cropped_species_pred"] = cropped_species_pred.tolist()

In [None]:
# Apply softmax to logits (the max is subtracted first for numerical stability)
import ast

def softmax(logits):
    z = np.exp(np.asarray(logits, dtype=float) - np.max(logits))
    return z / z.sum()

# The raw training predictions come from a CSV file, so they are strings and must be parsed first
train_data["species_pred"] = train_data["species_pred"].map(ast.literal_eval).map(softmax)
train_data["cropped_species_pred"] = train_data["cropped_species_pred"].map(softmax)
sample_submission["species_pred"] = sample_submission["species_pred"].map(softmax)
sample_submission["cropped_species_pred"] = sample_submission["cropped_species_pred"].map(softmax)

In [None]:
def plot_flagged_samples(flagged_samples):
    print("Number of flagged samples:", len(flagged_samples))
    print("Original image")
    plot_images(flagged_samples, row=3, col=4, base_path="../input/w-d-224x224-fast-dataset/train_images")
    print("Cropped")
    plot_images(flagged_samples, row=3, col=4, base_path="../input/w-d-fast-224x224-cropped-dataset/train_images")

# Compute the metrics 📈
For each metric, we compute its value on both the original and the cropped image, as well as the difference between the two (original minus cropped).<br>

## 1. Max confidence
The first metric is the maximum confidence, i.e. the highest class probability in the prediction. It is expected to be lower for OOD samples.<br>

## 2. Entropy
The second metric is the Shannon entropy of the prediction distribution. To give some intuition, a high entropy means that the prediction is not confident about any class, while a low entropy means high confidence in a single class.
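
To make the two metrics more concrete, here is a minimal sketch on two made-up probability vectors (the values are illustrative assumptions, not outputs of the model). The helper names `max_confidence` and `shannon_entropy` are only used for this illustration.

```python
import numpy as np

def max_confidence(probs):
    """Largest class probability of a prediction."""
    return float(np.max(probs))

def shannon_entropy(probs):
    """Shannon entropy in bits; zero-probability classes contribute nothing."""
    probs = np.asarray(probs, dtype=float)
    nonzero = probs[probs > 0]
    return float(-np.sum(nonzero * np.log2(nonzero)))

# Illustrative (made-up) distributions over 4 classes
confident = np.array([0.97, 0.01, 0.01, 0.01])
uncertain = np.array([0.25, 0.25, 0.25, 0.25])

for name, probs in [("confident", confident), ("uncertain", uncertain)]:
    print(name, max_confidence(probs), shannon_entropy(probs))
```

The uniform vector has the lowest possible max confidence and the highest possible entropy for 4 classes (2 bits), which is the kind of signature I expect from a bad crop.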

In [None]:
"""
    Compute the metrics
"""

def compute_max_confidence_metrics(data):
    data["pred_max_conf"] = data["species_pred"].map(np.max)
    data["cropped_pred_max_conf"] = data["cropped_species_pred"].map(np.max)
    # Positive delta: the crop made the model less confident
    data["pred_max_conf_delta"] = data["pred_max_conf"] - data["cropped_pred_max_conf"]

def shannon_entropy(probs):
    # Entropy in bits; zero probabilities are skipped to avoid 0 * log2(0) = nan
    probs = probs[probs > 0]
    return -np.sum(probs * np.log2(probs))

def compute_entropy_metrics(data):
    data["pred_entropy"] = data["species_pred"].map(shannon_entropy)
    data["cropped_pred_entropy"] = data["cropped_species_pred"].map(shannon_entropy)
    # Negative delta: the crop made the prediction more uncertain
    data["pred_entropy_delta"] = data["pred_entropy"] - data["cropped_pred_entropy"]
    
compute_max_confidence_metrics(train_data)
compute_max_confidence_metrics(sample_submission)
compute_entropy_metrics(train_data)
compute_entropy_metrics(sample_submission)

In [None]:
# Save the results
train_data.to_csv("train.csv", index=False)
sample_submission.to_csv("test.csv", index=False)

# Examples of flagged samples 🔎

In [None]:
n_flags = np.zeros(len(train_data))

## Flag based on the predicted class confidence

In [None]:
cropped_pred_max_conf = train_data["cropped_pred_max_conf"].values
plt.title("Distribution of the metric in the dataset")
plt.hist(cropped_pred_max_conf)
plt.show()
m = train_data["cropped_pred_max_conf"] <= np.quantile(cropped_pred_max_conf, CFG.FLAG_QUANTILE)
flagged_samples = train_data[m]
n_flags[m] += 1
plot_flagged_samples(flagged_samples)

## Flag based on the delta between the predicted class confidence

In [None]:
pred_max_conf_delta = train_data["pred_max_conf_delta"].values
plt.title("Distribution of the metric in the dataset")
plt.hist(pred_max_conf_delta)
plt.show()
m = train_data["pred_max_conf_delta"] > np.quantile(pred_max_conf_delta, 1 - CFG.FLAG_QUANTILE)
flagged_samples = train_data[m]
n_flags[m] += 1
plot_flagged_samples(flagged_samples)

## Flag based on the entropy of the prediction distribution

In [None]:
cropped_pred_entropy = train_data["cropped_pred_entropy"].values
plt.title("Distribution of the metric in the dataset")
plt.hist(cropped_pred_entropy)
plt.show()
m = train_data["cropped_pred_entropy"] > np.quantile(cropped_pred_entropy, 1 - CFG.FLAG_QUANTILE)
flagged_samples = train_data[m]
n_flags[m] += 1
plot_flagged_samples(flagged_samples)

## Flag based on the delta between the entropy of the prediction distribution

In [None]:
pred_entropy_delta = train_data["pred_entropy_delta"].values
plt.title("Distribution of the metric in the dataset")
plt.hist(pred_entropy_delta)
plt.show()
m = pred_entropy_delta <= np.quantile(pred_entropy_delta, CFG.FLAG_QUANTILE)
flagged_samples = train_data[m]
n_flags[m] += 1
plot_flagged_samples(flagged_samples)

# Ensembling the flags 🚩
Now that we have computed the different metrics, we can try to filter out some false positives by removing only the samples that were flagged at least N times.
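
As a rough illustration of this voting step, the sketch below applies the same four quantile rules to a toy table. The numbers are made up, and `FLAG_QUANTILE` and `N_FLAGS` are hypothetical stand-ins for `CFG.FLAG_QUANTILE` and `CFG.N_FLAGS`.

```python
import numpy as np
import pandas as pd

# Toy metric table (made-up values); column names mirror the ones used in this notebook
toy = pd.DataFrame({
    "cropped_pred_max_conf": [0.99, 0.30, 0.95, 0.40, 0.98],
    "pred_max_conf_delta":   [0.00, 0.65, 0.02, 0.01, -0.01],
    "cropped_pred_entropy":  [0.05, 2.10, 0.20, 1.90, 0.10],
    "pred_entropy_delta":    [0.01, -1.90, -0.05, 0.00, 0.02],
})

FLAG_QUANTILE = 0.4  # hypothetical stand-in for CFG.FLAG_QUANTILE
N_FLAGS = 3          # hypothetical stand-in for CFG.N_FLAGS

# One boolean mask per rule: low tail for confidence and entropy delta,
# high tail for confidence delta and entropy
flags = [
    toy["cropped_pred_max_conf"] <= np.quantile(toy["cropped_pred_max_conf"], FLAG_QUANTILE),
    toy["pred_max_conf_delta"] > np.quantile(toy["pred_max_conf_delta"], 1 - FLAG_QUANTILE),
    toy["cropped_pred_entropy"] > np.quantile(toy["cropped_pred_entropy"], 1 - FLAG_QUANTILE),
    toy["pred_entropy_delta"] <= np.quantile(toy["pred_entropy_delta"], FLAG_QUANTILE),
]

# Count how many rules flagged each sample, then keep the ones with enough votes
toy["n_flags"] = sum(flag.astype(int) for flag in flags)
flagged = toy[toy["n_flags"] >= N_FLAGS]
print(toy["n_flags"].tolist())
print(flagged.index.tolist())
```

In this toy table several rows are flagged by individual rules, but only the row that is bad on every metric collects enough votes, which is the behaviour I'm after.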

In [None]:
train_data["n_flags"] = n_flags
flagged_samples = train_data[train_data["n_flags"] >= CFG.N_FLAGS]
plot_flagged_samples(flagged_samples)

# Conclusion 🤷
🤓  We can see that the proposed filtering method still yields some false positives, but the hyperparameters can be tweaked depending on the use case. The result of this notebook could be used with very conservative parameters to create a dataset of valid bounding boxes on which we could train another YOLOv5 model.<br>
📈  Results could be improved by using other models or other OOD detection metrics.<br>
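
For the conservative use case mentioned above, one possible sketch is to keep only the samples that were never flagged. The toy table, the `image` column and `MAX_ALLOWED_FLAGS` are assumptions made for the example, not part of the pipeline above.

```python
import pandas as pd

# Toy stand-in for train_data once the "n_flags" column has been added
toy = pd.DataFrame({
    "image": ["a.jpg", "b.jpg", "c.jpg", "d.jpg"],  # hypothetical file names
    "n_flags": [0, 4, 1, 0],
})

MAX_ALLOWED_FLAGS = 0  # hypothetical, most conservative setting

# Keep only the crops that no rule complained about
valid_boxes = toy[toy["n_flags"] <= MAX_ALLOWED_FLAGS].reset_index(drop=True)
print(valid_boxes["image"].tolist())
```

Raising `MAX_ALLOWED_FLAGS` keeps more samples at the cost of letting more bad crops through.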

👍  If you found this notebook helpful, please consider giving it an upvote, and if you disagree with the content, I'll be pleased to discuss it with you in the comments.<br>

😊  Happy Kaggling, everyone!