# Version
* `v4`: **2-cls filter**
* `v5`: **2-cls filter** + [**1x1 bbox trick** 🔥](https://www.kaggle.com/c/vinbigdata-chest-xray-abnormalities-detection/discussion/211971)

# 🌟2 Class Filter🌟
I previously trained `YOLOv5` on the `14` class data. Because that model produces false positives (`FP`), we can tackle them with a simple `2 class filter`: here I use the predictions of a 2 class model (`AUC`: `0.98`) to filter out `FP` predictions. I used `EfficientNetB6` to generate these predictions.
This should increase the score, since the number of `FP`s should drop significantly.

**Notebooks**
* [14 class train](https://www.kaggle.com/awsaf49/vinbigdata-cxr-ad-yolov5-14-class-train)
* [14 class infer](https://www.kaggle.com/awsaf49/vinbigdata-cxr-ad-yolov5-14-class-infer)

**Dataset:**
* [YOLOv5 Labels](https://www.kaggle.com/awsaf49/vinbigdata-yolo-labels-dataset)
* [1024x1024 Dataset](https://www.kaggle.com/awsaf49/vinbigdata-1024-image-dataset)
* [512x512 Dataset](https://www.kaggle.com/awsaf49/vinbigdata-512-image-dataset)
* [256x256 Dataset](https://www.kaggle.com/awsaf49/vinbigdata-512-image-dataset)
* [Original Size '.jpg'](https://www.kaggle.com/awsaf49/vinbigdata-original-image-dataset)

# Loading Packages

In [1]:
import pandas as pd
import numpy as np
from glob import glob
import shutil

# Thresholds for the `2 Class Filter`
**NB**: The thresholds were chosen arbitrarily.

In [2]:
low_thr  = 0.08
high_thr = 0.95
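Since the thresholds are hand-picked, it can help to check how many test images fall into each band before submitting. A minimal sketch, assuming `probs` holds the 2-class probabilities (e.g. `pred_2cls['target']`); `band_counts` is a hypothetical helper, not part of the original notebook. It uses the same boundaries as `filter_2cls` below: `prob < low_thr`, `low_thr <= prob < high_thr` and `prob >= high_thr`.

```python
import numpy as np

low_thr, high_thr = 0.08, 0.95  # same (arbitrary) values as the cell above

def band_counts(probs, low_thr=low_thr, high_thr=high_thr):
    """Count how many probabilities fall into each band of the 2 class filter.

    index 0: prob < low_thr             -> replaced by 'No Finding'
    index 1: low_thr <= prob < high_thr -> 1x1 'No Finding' box appended
    index 2: prob >= high_thr           -> 14-class detections kept
    """
    probs = np.asarray(probs, dtype=float)
    if np.isnan(probs).any():
        # np.digitize does not reject NaN, so check explicitly
        raise ValueError('NaN probability found')
    # with right=False, np.digitize returns i such that bins[i-1] <= x < bins[i]
    bands = np.digitize(probs, bins=[low_thr, high_thr])
    return np.bincount(bands, minlength=3)

# the five probabilities shown in pred_2cls.head() below
print(band_counts([0.013326, 0.037235, 0.9397, 0.123799, 0.654006]))  # [2 3 0]
```

On the full test set, `band_counts(pred_2cls['target'])` would show how many images each threshold change moves between bands.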

# Loading CSVs

In [3]:
pred_14cls = pd.read_csv('../input/vinbigdata-14-class-submission-lb0154/submission.csv')
pred_2cls = pd.read_csv('../input/vinbigdata-2class-prediction/2-cls test pred.csv')

In [4]:
pred_14cls.head()

|   | image_id | PredictionString |
|---|---|---|
| 0 | 83caa8a85e03606cf57e49147d7ac569 | 0 0.2 1057 742 1328 990 3 0.8 814 1164 1737 1496 |
| 1 | 7550347fa2bb96c2354a3716dfa3a69c | 0 0.5 1234 731 1527 1028 5 0.5 1755 1288 2201 ... |
| 2 | 74b23792db329cff5843e36efb8aa65a | 14 1 0 0 1 1 |
| 3 | 94568a546be103177cb582d3e91cd2d8 | 0 0.6 974 1025 1211 1343 3 0.8 650 1614 1507 1936 |
| 4 | 6da36354fc904b63bc03eb3884e0c35c | 11 0.3 578 292 859 353 0 0.4 1076 548 1301 747... |
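Each `PredictionString` is a flat list of boxes, six tokens per box: `class_id confidence x_min y_min x_max y_max` (the competition's submission format). Class `14` is `No finding`, submitted as the 1x1 box `14 1 0 0 1 1`. A small parser makes the layout explicit; `parse_prediction_string` is a hypothetical helper, not from the original notebook.

```python
def parse_prediction_string(pred_str):
    """Split a PredictionString into (class_id, conf, x_min, y_min, x_max, y_max) tuples."""
    tokens = pred_str.split()
    if len(tokens) % 6 != 0:
        raise ValueError(f'Expected a multiple of 6 tokens, got {len(tokens)}')
    boxes = []
    for i in range(0, len(tokens), 6):
        class_id = int(tokens[i])        # 0-13 = abnormality, 14 = No finding
        conf = float(tokens[i + 1])      # detection confidence
        x_min, y_min, x_max, y_max = map(float, tokens[i + 2:i + 6])
        boxes.append((class_id, conf, x_min, y_min, x_max, y_max))
    return boxes

print(parse_prediction_string('0 0.2 1057 742 1328 990 3 0.8 814 1164 1737 1496'))
print(parse_prediction_string('14 1 0 0 1 1'))  # the 'No Finding' box
```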


In [5]:
pred_2cls.head()

|   | image_id | target |
|---|---|---|
| 0 | 002a34c58c5b758217ed1f584ccbcfe9 | 0.013326 |
| 1 | 004f33259ee4aef671c2b95d54e4be68 | 0.037235 |
| 2 | 008bdde2af2462e86fd373a445d0f4cd | 0.9397 |
| 3 | 009bc039326338823ca3aa84381f17f1 | 0.123799 |
| 4 | 00a2145de1886cb9eb88869c85d74080 | 0.654006 |


In [6]:
pred = pd.merge(pred_14cls, pred_2cls, on = 'image_id', how = 'left')
pred.head()

|   | image_id | PredictionString | target |
|---|---|---|---|
| 0 | 83caa8a85e03606cf57e49147d7ac569 | 0 0.2 1057 742 1328 990 3 0.8 814 1164 1737 1496 | 0.970583 |
| 1 | 7550347fa2bb96c2354a3716dfa3a69c | 0 0.5 1234 731 1527 1028 5 0.5 1755 1288 2201 ... | 0.039873 |
| 2 | 74b23792db329cff5843e36efb8aa65a | 14 1 0 0 1 1 | 0.01024 |
| 3 | 94568a546be103177cb582d3e91cd2d8 | 0 0.6 974 1025 1211 1343 3 0.8 650 1614 1507 1936 | 0.065679 |
| 4 | 6da36354fc904b63bc03eb3884e0c35c | 11 0.3 578 292 859 353 0 0.4 1076 548 1301 747... | 0.838772 |
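With `how='left'`, any test image missing from the 2-class CSV silently gets `target = NaN`, and `filter_2cls` would then raise its `ValueError`. A minimal sketch of a stricter merge using pandas' `validate` and `indicator` arguments; the toy frames below are made-up stand-ins for `pred_14cls` and `pred_2cls`.

```python
import pandas as pd

# Toy frames standing in for pred_14cls / pred_2cls (made-up ids and values)
det = pd.DataFrame({'image_id': ['a', 'b', 'c'],
                    'PredictionString': ['14 1 0 0 1 1'] * 3})
cls = pd.DataFrame({'image_id': ['a', 'b'], 'target': [0.01, 0.97]})

merged = pd.merge(det, cls, on='image_id', how='left',
                  validate='one_to_one',  # raise if an image_id is duplicated on either side
                  indicator=True)         # adds a '_merge' column: left_only / right_only / both
missing = merged.loc[merged['_merge'] == 'left_only', 'image_id'].tolist()
print(missing)  # ['c'] -> this row has target = NaN
merged = merged.drop(columns='_merge')
```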


# Number of `No Finding` Predictions Before the 2 Class Filter

In [7]:
pred['PredictionString'].value_counts().iloc[[0]]

14 1 0 0 1 1    614
Name: PredictionString, dtype: int64
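`value_counts().iloc[[0]]` returns the single most frequent `PredictionString`, which is `No Finding` only as long as it stays the most common value. Counting exact matches is more direct. A minimal sketch with a hypothetical `count_no_finding` helper; rows where the 1x1 box was *appended* are not counted, because they no longer match the string exactly.

```python
import pandas as pd

NO_FINDING = '14 1 0 0 1 1'  # 'No finding' as a 1x1 box

def count_no_finding(pred_strings):
    """Count rows whose PredictionString is exactly the 'No Finding' box."""
    return int((pd.Series(pred_strings) == NO_FINDING).sum())

s = pd.Series(['14 1 0 0 1 1',
               '0 0.2 1057 742 1328 990',
               '0 0.2 1057 742 1328 990 14 0.5 0 0 1 1',  # appended box: not counted
               '14 1 0 0 1 1'])
print(count_no_finding(s))  # 2
```

On this notebook, `count_no_finding(pred['PredictionString'])` should match the `614` shown above.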

# 2 Class Filter + [**1x1 bbox trick** 🔥](https://www.kaggle.com/c/vinbigdata-chest-xray-abnormalities-detection/discussion/211971)

In [8]:
def filter_2cls(row, low_thr=low_thr, high_thr=high_thr):
    prob = row['target']  # 2-class probability that the image has any abnormality
    if prob<low_thr:
        ## Low chance of any disease: replace all detections with 'No Finding'
        row['PredictionString'] = '14 1 0 0 1 1'
    elif low_thr<=prob<high_thr:
        ## Moderate chance of disease: keep the detections and append a 1x1 'No Finding' box (1x1 bbox trick)
        row['PredictionString']+=f' 14 {prob} 0 0 1 1'
    elif high_thr<=prob:
        ## High chance of disease: trust the object detection model and keep its predictions unchanged
        pass
    else:
        ## Only reached when prob is NaN (e.g. image_id missing from the 2-class CSV)
        raise ValueError('Prediction must be from [0-1]')
    return row
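`DataFrame.apply(axis=1)` calls `filter_2cls` once per row, which is fine for this test set but slow on larger frames. The same three bands can be expressed with `np.select`. A minimal sketch (`filter_2cls_vectorized` is a hypothetical name, not from the original notebook); unlike the original it also rejects probabilities outside `[0, 1]`, and the appended confidence is formatted with `astype(str)`, which may not print every float exactly like the f-string does.

```python
import numpy as np
import pandas as pd

def filter_2cls_vectorized(df, low_thr=0.08, high_thr=0.95):
    """Vectorized sketch of filter_2cls: same three bands, no row-wise apply."""
    prob = df['target']
    if prob.isna().any() or not prob.between(0, 1).all():
        raise ValueError('Prediction must be from [0-1]')
    pred_str = df['PredictionString']
    out = df.copy()
    # np.select picks the first condition that is True for each row
    out['PredictionString'] = np.select(
        [prob < low_thr, prob < high_thr],
        [
            '14 1 0 0 1 1',                                     # low band: 'No Finding' only
            pred_str + ' 14 ' + prob.astype(str) + ' 0 0 1 1',  # middle band: append 1x1 box
        ],
        default=pred_str,                                       # high band: keep detections
    )
    return out

demo = pd.DataFrame({'image_id': ['a', 'b', 'c'],
                     'PredictionString': ['0 0.2 1 2 3 4'] * 3,
                     'target': [0.01, 0.5, 0.97]})
print(filter_2cls_vectorized(demo)['PredictionString'].tolist())
```

Usage on this notebook would be `sub = filter_2cls_vectorized(pred, low_thr, high_thr)`.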

In [9]:
sub = pred.apply(filter_2cls, axis=1)
sub.head()

|   | image_id | PredictionString | target |
|---|---|---|---|
| 0 | 83caa8a85e03606cf57e49147d7ac569 | 0 0.2 1057 742 1328 990 3 0.8 814 1164 1737 1496 | 0.970583 |
| 1 | 7550347fa2bb96c2354a3716dfa3a69c | 14 1 0 0 1 1 | 0.039873 |
| 2 | 74b23792db329cff5843e36efb8aa65a | 14 1 0 0 1 1 | 0.01024 |
| 3 | 94568a546be103177cb582d3e91cd2d8 | 14 1 0 0 1 1 | 0.065679 |
| 4 | 6da36354fc904b63bc03eb3884e0c35c | 11 0.3 578 292 859 353 0 0.4 1076 548 1301 747... | 0.838772 |


# Number of `No Finding` Predictions After the 2 Class Filter

In [10]:
sub['PredictionString'].value_counts().iloc[[0]]

14 1 0 0 1 1    1912
Name: PredictionString, dtype: int64

As we can see above, applying the `2 class filter` significantly increases the number of `'No Finding'` predictions **[614->1912]**. The `1x1 bbox trick` also improves the result.

In [11]:
sub[['image_id', 'PredictionString']].to_csv('submission.csv',index = False)
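Before uploading, a few quick checks on the written file can catch formatting slips. A minimal sketch; `check_submission` is a hypothetical helper, and the checks (exactly two columns, unique `image_id`, six tokens per box) are assumptions based on the `PredictionString` format used throughout this notebook. The demo reads from an in-memory buffer so it runs anywhere.

```python
import io
import pandas as pd

def check_submission(path_or_buf, n_expected=None):
    """Quick sanity checks on a submission CSV (hypothetical helper)."""
    sub = pd.read_csv(path_or_buf)
    assert list(sub.columns) == ['image_id', 'PredictionString'], list(sub.columns)
    assert sub['image_id'].is_unique, 'duplicate image_id rows'
    assert sub['PredictionString'].notna().all(), 'empty PredictionString'
    n_tokens = sub['PredictionString'].str.split().str.len()
    assert (n_tokens % 6 == 0).all(), 'each box needs 6 tokens'
    if n_expected is not None:
        assert len(sub) == n_expected, f'expected {n_expected} rows, got {len(sub)}'
    return sub

demo = io.StringIO('image_id,PredictionString\n'
                   'a,14 1 0 0 1 1\n'
                   'b,0 0.2 1 2 3 4 14 0.5 0 0 1 1\n')
checked = check_submission(demo, n_expected=2)
print(len(checked))  # 2
```

On the real file this would be `check_submission('submission.csv')`.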

# Result
Applying the `2 class filter` improves the score significantly, from `0.154` to `0.201`. Bear in mind, though, that choosing the thresholds can be tricky.

# Please Upvote If You Have Found This Notebook Useful 😃