# Version
* `v4`: **2-cls filter**
* `v5`: **2-cls filter** + [**1x1 bbox trick** 🔥](https://www.kaggle.com/c/vinbigdata-chest-xray-abnormalities-detection/discussion/211971)

# 🌟2 Class Filter🌟
Previously, I trained `YOLOv5` on the `14`-class data. Because that model produces false positives (`FP`), we can tackle them with a simple `2 class filter`. Here I use the predictions of a 2-class model (`AUC`: `0.98`) to filter out the `FP` predictions. I used `EfficientNetB6` to generate these 2-class predictions.
This should increase the score, since the number of `FP` predictions would be reduced significantly.

**Notebooks**
* [14 class train](https://www.kaggle.com/awsaf49/vinbigdata-cxr-ad-yolov5-14-class-train)
* [14 class infer](https://www.kaggle.com/awsaf49/vinbigdata-cxr-ad-yolov5-14-class-infer)

**Dataset:**
* [YOLOv5 Labels](https://www.kaggle.com/awsaf49/vinbigdata-yolo-labels-dataset)
* [1024x1024 Dataset](https://www.kaggle.com/awsaf49/vinbigdata-1024-image-dataset)
* [512x512 Dataset](https://www.kaggle.com/awsaf49/vinbigdata-512-image-dataset)
* [256x256 Dataset](https://www.kaggle.com/awsaf49/vinbigdata-512-image-dataset)
* [Original Size '.jpg'](https://www.kaggle.com/awsaf49/vinbigdata-original-image-dataset)

# Loading Package

In [None]:
import pandas as pd
import numpy as np
from glob import glob
import shutil

# Threshold For `2 Class Filter`
**NB**: The thresholds were chosen arbitrarily.

In [None]:
low_thr  = 0.08
high_thr = 0.95
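
Since the thresholds are hand-picked, it can help to see how many images land in each of the three bands before committing to them. The sketch below is only an illustration, assuming the 2-class probabilities are available as a sequence of floats; `band_counts` and the toy values are hypothetical and not part of the original notebook.

```python
import pandas as pd

def band_counts(probs, low_thr=0.08, high_thr=0.95):
    """Count probabilities below, between, and above the two thresholds."""
    probs = pd.Series(probs, dtype=float)
    return {
        'below_low': int((probs < low_thr).sum()),
        'between': int(((probs >= low_thr) & (probs < high_thr)).sum()),
        'above_high': int((probs >= high_thr).sum()),
    }

# Toy probabilities, made up for illustration only
toy_probs = [0.01, 0.08, 0.50, 0.95, 0.99]
counts = band_counts(toy_probs)
print(counts)
```

Once the predictions are merged later in the notebook, something like `band_counts(pred['target'])` would show how much of the test set each threshold choice affects.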

# Loading csv

In [None]:
pred_14cls = pd.read_csv('../input/vinbigdata-14-class-submission-lb0154/submission.csv')
pred_2cls = pd.read_csv('../input/vinbigdata-2class-prediction/2-cls test pred.csv')

In [None]:
pred_14cls.head()

In [None]:
pred_2cls.head()

In [None]:
pred = pd.merge(pred_14cls, pred_2cls, on = 'image_id', how = 'left')
pred.head()
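
Because this is a left merge, any `image_id` missing from the 2-class predictions ends up with a `NaN` in `target`, and `filter_2cls` later raises a `ValueError` for such rows. A minimal sketch of how one might check for unmatched rows, using pandas' `indicator=True` option on toy frames (the data below is made up for illustration):

```python
import pandas as pd

# Toy stand-ins for the two prediction tables (hypothetical values)
left = pd.DataFrame({
    'image_id': ['a', 'b', 'c'],
    'PredictionString': ['0 0.9 10 20 30 40', '14 1 0 0 1 1', '3 0.5 10 20 30 40'],
})
right = pd.DataFrame({'image_id': ['a', 'b'], 'target': [0.97, 0.02]})

# indicator=True adds a `_merge` column telling where each row came from
merged = pd.merge(left, right, on='image_id', how='left', indicator=True)

# Rows present only in the left table have no 2-class probability
missing_ids = merged.loc[merged['_merge'] == 'left_only', 'image_id'].tolist()
n_missing_target = int(merged['target'].isna().sum())
print(missing_ids, n_missing_target)
```

If `missing_ids` is empty on the real data, every image has a 2-class probability and the filter can run safely.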

# Number of `No Finding` Before 2 Class Filter

In [None]:
pred['PredictionString'].value_counts().iloc[[0]]

# 2 Class Filter + [**1x1 bbox trick** 🔥](https://www.kaggle.com/c/vinbigdata-chest-xray-abnormalities-detection/discussion/211971)

In [None]:
def filter_2cls(row, low_thr=low_thr, high_thr=high_thr):
    # 2-class model probability for this image
    prob = row['target']
    if prob < low_thr:
        # Low chance of any disease: replace all boxes with a single `No Finding` prediction
        row['PredictionString'] = '14 1 0 0 1 1'
    elif low_thr <= prob < high_thr:
        # Higher chance of disease: keep the boxes and append a 1x1 `No Finding` bbox
        # with `prob` as its confidence (the 1x1 bbox trick)
        row['PredictionString'] += f' 14 {prob} 0 0 1 1'
    elif high_thr <= prob:
        # Good chance of disease: trust the object detection model as-is
        pass
    else:
        # Reached when no comparison holds, e.g. `prob` is NaN
        raise ValueError('Prediction must be from [0-1]')
    return row

In [None]:
sub = pred.apply(filter_2cls, axis=1)
sub.head()
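
To make the three branches concrete, here is a small worked example on toy rows. It is a sketch only: `filter_row` is a standalone copy of the rule written for illustration, and the prediction strings and probabilities are made up.

```python
import pandas as pd

LOW_THR, HIGH_THR = 0.08, 0.95  # same values as the notebook's thresholds

def filter_row(row, low_thr=LOW_THR, high_thr=HIGH_THR):
    """Standalone copy of the three-band rule, for illustration."""
    row = row.copy()
    prob = row['target']
    if prob < low_thr:
        # Replace everything with a single `No Finding` prediction
        row['PredictionString'] = '14 1 0 0 1 1'
    elif low_thr <= prob < high_thr:
        # Keep the boxes and append a 1x1 `No Finding` bbox
        row['PredictionString'] += f' 14 {prob} 0 0 1 1'
    elif high_thr <= prob:
        # Leave the detector output untouched
        pass
    else:
        raise ValueError('Prediction must be from [0-1]')
    return row

toy = pd.DataFrame({
    'image_id': ['a', 'b', 'c'],
    'PredictionString': ['0 0.9 10 20 30 40'] * 3,
    'target': [0.02, 0.5, 0.97],
})
result = toy.apply(filter_row, axis=1)['PredictionString'].tolist()
for s in result:
    print(s)
```

The first row collapses to `No Finding`, the second keeps its box and gains the extra 1x1 box, and the third is returned unchanged.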

# Number of `No Finding` After 2 Class Filter

In [None]:
sub['PredictionString'].value_counts().iloc[[0]]

As shown above, applying the `2 class filter` significantly increases the number of `'No Finding'` predictions (**549 -> 1912**). The `1x1 bbox trick` also improves the result.
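
The counts above come from `value_counts().iloc[[0]]`, which reports the most frequent prediction string and therefore assumes that string is the `No Finding` one. A slightly more explicit alternative is to count exact matches. This is a sketch with a hypothetical helper, `count_no_finding`, and made-up rows:

```python
import pandas as pd

NO_FINDING = '14 1 0 0 1 1'  # the pure `No Finding` prediction string

def count_no_finding(pred_strings):
    """Count rows whose prediction is exactly the `No Finding` string."""
    return int((pd.Series(pred_strings) == NO_FINDING).sum())

# Toy prediction strings, for illustration only
toy_preds = [
    '14 1 0 0 1 1',
    '0 0.9 10 20 30 40 14 0.5 0 0 1 1',  # boxes plus appended 1x1 bbox: not counted
    '14 1 0 0 1 1',
    '3 0.4 5 5 50 50',
]
n_no_finding = count_no_finding(toy_preds)
print(n_no_finding)
```

On the real data this would be called as `count_no_finding(sub['PredictionString'])`. Rows that only have the appended 1x1 box are not counted, which matches what `value_counts` reports.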

In [None]:
sub[['image_id', 'PredictionString']].to_csv('submission.csv',index = False)

# Result
As we can see, applying the `2 class filter` improves the result significantly, from `0.154` to `0.201`. Bear in mind, though, that choosing the `threshold` values can be a bit `tricky`.

# Please Upvote If You Have Found This Notebook Useful 😃