# [SIIM-FISABIO-RSNA COVID-19 Detection](https://www.kaggle.com/c/siim-covid19-detection)
> Identify and localize COVID-19 abnormalities on chest radiographs

![](https://storage.googleapis.com/kaggle-competitions/kaggle/26680/logos/header.png)

# Overview:
* The basic idea is to use a **classification** model for **Study-Level** predictions and a **detection** model for **Image-Level** predictions.

# Notebooks:

#### Study-Level:
* **train**: [SIIM-COVID-19: EffNetB6 Study-Level [train] TPU🩺](https://www.kaggle.com/awsaf49/siim-covid-19-effnetb6-study-level-train-tpu/)
* **infer**: [SIIM-COVID-19: EffNetB6 Study-Level [infer]🩺](https://www.kaggle.com/awsaf49/siim-covid-19-effnetb6-study-level-infer) [LB: **0.338**]
* **data**: [SIIM-COVID-19: 512x512 tfrec Data](https://www.kaggle.com/awsaf49/siim-covid-19-512x512-tfrec-data)

#### Image-Level:
* **train**: [SIIM-COVID-19: YOLOv5 Image-Level [train]](https://www.kaggle.com/awsaf49/siim-covid-19-yolov5-image-level-train)
* **infer**: [SIIM-COVID-19: YOLOv5 Image-Level [infer]](https://www.kaggle.com/awsaf49/siim-covid-19-yolov5-image-level-infer) **placeholder**: something seems to be wrong with the `image-level` data, as it gives a very low score of `0.051`.

# Dataset:

#### JPEG
* [1024x1024](https://www.kaggle.com/awsaf49/siimcovid19-1024-jpg-image-dataset)
* [512x512](https://www.kaggle.com/awsaf49/siimcovid19-512-jpg-image-dataset)
* [256x256](https://www.kaggle.com/awsaf49/siimcovid19-256-jpg-image-dataset)

#### TFRECORD
* [1024x1024](https://www.kaggle.com/awsaf49/siimcovid19-1024x1024-tfrec-dataset)
* [512x512](https://www.kaggle.com/awsaf49/siimcovid19-512x512-tfrec-dataset)
* [256x256](https://www.kaggle.com/awsaf49/siimcovid19-256x256-tfrec-dataset)

In [None]:
!pip install -q --upgrade seaborn

In [None]:
import numpy as np, pandas as pd
from glob import glob
import shutil, os
import matplotlib.pyplot as plt
from sklearn.model_selection import GroupKFold
from tqdm.notebook import tqdm
import seaborn as sns

In [None]:
dim = 512 #512, 256, 'original'
epochs = 25
batch_size = 16
fold = 0

In [None]:
train_df = pd.read_csv('../input/siim-covid19-yolov5-2class-labels/meta.csv')
train_df['image_path'] = '../input/siimcovid19-512-jpg-image-dataset/train/'+train_df.image_id+'.jpg'
train_df.head()

# Split

In [None]:
gkf  = GroupKFold(n_splits = 5)
train_df['fold'] = -1
# Use a separate loop variable so the `fold` selected above is not overwritten
for fold_idx, (train_idx, val_idx) in enumerate(gkf.split(train_df, groups = train_df.StudyInstanceUID.tolist())):
    train_df.loc[val_idx, 'fold'] = fold_idx
train_df.head()
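
Since one study can contain several images, it may be worth confirming that no `StudyInstanceUID` lands in more than one fold. Below is a minimal, dependency-free sketch of such a check. The helper name `find_leaking_groups` and the toy lists are hypothetical and only serve as illustration; they are not part of the original notebook.

```python
from collections import defaultdict

def find_leaking_groups(groups, folds):
    """Return the groups whose rows were assigned to more than one fold."""
    folds_per_group = defaultdict(set)
    for group, fold_id in zip(groups, folds):
        folds_per_group[group].add(fold_id)
    return sorted(g for g, f in folds_per_group.items() if len(f) > 1)

# Toy data (made up): both images of study "s1" share fold 0, so nothing leaks.
toy_groups = ["s1", "s1", "s2", "s3"]
toy_folds = [0, 0, 1, 2]
leaks = find_leaking_groups(toy_groups, toy_folds)
print(leaks)  # an empty list means the split is group-safe
```

With the frame built above, the same check could presumably be run as `find_leaking_groups(train_df.StudyInstanceUID, train_df.fold)`.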

In [None]:
train_files = []
val_files   = []
val_files += list(train_df[train_df.fold==fold].image_path.unique())
train_files += list(train_df[train_df.fold!=fold].image_path.unique())
len(train_files), len(val_files)

# Copying Files

In [None]:
os.makedirs('/kaggle/working/siim-covid-19/labels/train', exist_ok = True)
os.makedirs('/kaggle/working/siim-covid-19/labels/val', exist_ok = True)
os.makedirs('/kaggle/working/siim-covid-19/images/train', exist_ok = True)
os.makedirs('/kaggle/working/siim-covid-19/images/val', exist_ok = True)
label_dir = '/kaggle/input/siim-covid19-yolov5-2class-labels/labels/'
for file in tqdm(train_files):
    shutil.copy(file, '/kaggle/working/siim-covid-19/images/train')
    filename = file.split('/')[-1].split('.')[0]
    shutil.copy(os.path.join(label_dir, filename+'.txt'), '/kaggle/working/siim-covid-19/labels/train')
    
for file in tqdm(val_files):
    shutil.copy(file, '/kaggle/working/siim-covid-19/images/val')
    filename = file.split('/')[-1].split('.')[0]
    shutil.copy(os.path.join(label_dir, filename+'.txt'), '/kaggle/working/siim-covid-19/labels/val')
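
The `.txt` files copied above are expected to follow the YOLOv5 label format: one line per box, written as `class x_center y_center width height` with coordinates normalized to the image size. The sketch below shows how a pixel-space box would map to such a line. The function `to_yolo_line` and the example box are hypothetical, and I am assuming the dataset's labels were produced in an equivalent way.

```python
def to_yolo_line(class_id, xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert a pixel-space corner box to a normalized YOLO label line."""
    x_center = (xmin + xmax) / 2 / img_w
    y_center = (ymin + ymax) / 2 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

# Made-up opacity box on a 512x512 image
line = to_yolo_line(0, 128, 64, 384, 320, 512, 512)
print(line)
```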

# Get Class Name

In [None]:
class_ids  = {0:'opacity'}
class_names = ['opacity']

# [YOLOv5](https://github.com/ultralytics/yolov5)
![](https://user-images.githubusercontent.com/26833433/98699617-a1595a00-2377-11eb-8145-fc674eb9b1a7.jpg)
![](https://user-images.githubusercontent.com/26833433/90187293-6773ba00-dd6e-11ea-8f90-cd94afc0427f.png)

# YOLOv5 Stuff

In [None]:
from os.path import join
import yaml

cwd = '/kaggle/working/'

with open(join( cwd , 'train.txt'), 'w') as f:
    for path in glob('/kaggle/working/siim-covid-19/images/train/*'):
        f.write(path+'\n')
            
with open(join( cwd , 'val.txt'), 'w') as f:
    for path in glob('/kaggle/working/siim-covid-19/images/val/*'):
        f.write(path+'\n')

data = dict(
    train =  join( cwd , 'train.txt') ,
    val   =  join( cwd , 'val.txt' ),
    nc    = 1,
    names = class_names
    )

with open(join( cwd , 'siim-covid-19.yaml'), 'w') as outfile:
    yaml.dump(data, outfile, default_flow_style=False)

# Read the file back inside a context manager so the handle is closed afterwards
with open(join( cwd , 'siim-covid-19.yaml'), 'r') as f:
    print('\nyaml:')
    print(f.read())

In [None]:
# https://www.kaggle.com/ultralytics/yolov5
# !git clone https://github.com/ultralytics/yolov5  # clone repo
# %cd yolov5
shutil.copytree('/kaggle/input/yolov5-official-v50-dataset/', '/kaggle/working/yolov5')
os.chdir('/kaggle/working/yolov5')
%pip install -qr requirements.txt # install dependencies

import torch
from IPython.display import Image, clear_output  # to display images

clear_output()
print('Setup complete. Using torch %s %s' % (torch.__version__, torch.cuda.get_device_properties(0) if torch.cuda.is_available() else 'CPU'))

In [None]:
!python detect.py --weights yolov5s.pt --img 640 --conf 0.25 --source data/images/
Image(filename='runs/detect/exp/zidane.jpg', width=600)

## Pretrained Checkpoints:

| Model | AP<sup>val</sup> | AP<sup>test</sup> | AP<sub>50</sub> | Speed<sub>GPU</sub> | FPS<sub>GPU</sub> | params | FLOPS |
|---------- |------ |------ |------ | -------- | ------| ------ | :------: |
| [YOLOv5s](https://github.com/ultralytics/yolov5/releases/tag/v3.0)    | 37.0     | 37.0     | 56.2     | **2.4ms** | **416** | 7.5M   | 13.2B |
| [YOLOv5m](https://github.com/ultralytics/yolov5/releases/tag/v3.0)    | 44.3     | 44.3     | 63.2     | 3.4ms     | 294     | 21.8M  | 39.4B |
| [YOLOv5l](https://github.com/ultralytics/yolov5/releases/tag/v3.0)    | 47.7     | 47.7     | 66.5     | 4.4ms     | 227     | 47.8M  | 88.1B |
| [YOLOv5x](https://github.com/ultralytics/yolov5/releases/tag/v3.0)    | **49.2** | **49.2** | **67.7** | 6.9ms     | 145     | 89.0M  | 166.4B |
| [YOLOv5x](https://github.com/ultralytics/yolov5/releases/tag/v3.0) + TTA | **50.8** | **50.8** | **68.9** | 25.5ms    | 39      | 89.0M  | 354.3B |
| [YOLOv3-SPP](https://github.com/ultralytics/yolov5/releases/tag/v3.0) | 45.6     | 45.5     | 65.2     | 4.5ms     | 222     | 63.0M  | 118.0B |

# Selecting Models
In this notebook I'm using `v5x`. To select your preferred model, replace `--weights yolov5x.pt` in the training command with one of the following:
* `v5s` : `--cfg models/yolov5s.yaml --weights yolov5s.pt`
* `v5m` : `--cfg models/yolov5m.yaml --weights yolov5m.pt`
* `v5l` : `--cfg models/yolov5l.yaml --weights yolov5l.pt`
* `v5x` : `--cfg models/yolov5x.yaml --weights yolov5x.pt`
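
To make switching between variants less error-prone, the argument list could be assembled in Python first. This is only a sketch: `build_train_args` is a hypothetical helper, and it uses only the `train.py` flags that already appear in this notebook (`--img`, `--batch`, `--epochs`, `--data`, `--cfg`, `--weights`, `--cache`).

```python
def build_train_args(variant, img, batch, epochs, data_yaml, use_cfg=False):
    """Assemble train.py arguments for one YOLOv5 variant ('s', 'm', 'l' or 'x')."""
    if variant not in {"s", "m", "l", "x"}:
        raise ValueError(f"unknown YOLOv5 variant: {variant!r}")
    args = ["--img", str(img), "--batch", str(batch),
            "--epochs", str(epochs), "--data", data_yaml]
    if use_cfg:
        # Optional: pass the matching model definition explicitly
        args += ["--cfg", f"models/yolov5{variant}.yaml"]
    args += ["--weights", f"yolov5{variant}.pt", "--cache"]
    return args

args = build_train_args("x", 512, 16, 25, "/kaggle/working/siim-covid-19.yaml")
print("python train.py " + " ".join(args))
```

Changing the first argument to `"s"`, `"m"` or `"l"` swaps the weights (and, with `use_cfg=True`, the config) together, so the two cannot drift apart.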

# Train

In [None]:
# !WANDB_MODE="dryrun" python train.py --img 640 --batch 16 --epochs 3 --data coco128.yaml --weights yolov5s.pt --nosave --cache 
!WANDB_MODE="dryrun" python train.py --img $dim --batch $batch_size\
--epochs $epochs --data /kaggle/working/siim-covid-19.yaml\
--weights yolov5x.pt --cache

# Class Distribution

In [None]:
plt.figure(figsize = (20,20))
plt.axis('off')
plt.imshow(plt.imread('runs/train/exp/labels_correlogram.jpg'));

In [None]:
plt.figure(figsize = (20,20))
plt.axis('off')
plt.imshow(plt.imread('runs/train/exp/labels.jpg'));

# Batch Image

In [None]:
# Show the first three augmented training batches saved by YOLOv5
for batch_idx in range(3):
    plt.figure(figsize = (15, 15))
    plt.imshow(plt.imread(f'runs/train/exp/train_batch{batch_idx}.jpg'))

# GT Vs Pred

In [None]:
fig, ax = plt.subplots(3, 2, figsize = (3*5,4*5), constrained_layout = True)
for row in range(3):
    ax[row][0].imshow(plt.imread(f'runs/train/exp/test_batch{row}_labels.jpg'))
    ax[row][0].set_xticks([])
    ax[row][0].set_yticks([])
    ax[row][0].set_title(f'test_batch{row}_labels.jpg', fontsize = 12)
    
    ax[row][1].imshow(plt.imread(f'runs/train/exp/test_batch{row}_pred.jpg'))
    ax[row][1].set_xticks([])
    ax[row][1].set_yticks([])
    ax[row][1].set_title(f'test_batch{row}_pred.jpg', fontsize = 12)
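
Ground-truth and predicted boxes are usually compared through their intersection over union (IoU). As a rough illustration of what "matching" means in the panels above, here is a small sketch; `box_iou` and the two example boxes are hypothetical and not taken from the YOLOv5 code base.

```python
def box_iou(a, b):
    """IoU of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so disjoint boxes get no intersection area
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Made-up ground-truth and predicted boxes that overlap in a 50x50 corner
iou = box_iou((0, 0, 100, 100), (50, 50, 150, 150))
print(round(iou, 4))
```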

# (Loss, mAP) Vs Epoch

In [None]:
plt.figure(figsize=(30,15))
plt.axis('off')
plt.imshow(plt.imread('runs/train/exp/results.png'));

# Confusion Matrix

In [None]:
plt.figure(figsize=(30,15))
plt.axis('off')
plt.imshow(plt.imread('runs/train/exp/confusion_matrix.png'));

# Precision, Recall, Precision-Recall, F1 Curve

In [None]:
plt.figure(figsize=(2*10, 2*8))
for idx, tag in enumerate(['P', 'R', 'PR', 'F1']):
    plt.subplot(2, 2, idx+1)
    plt.imshow(plt.imread(f'runs/train/exp/{tag}_curve.png'));
    plt.axis('off')
    plt.title(tag, fontsize=15)
plt.tight_layout()
plt.show()
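
The F1 curve combines the precision and recall curves into a single number per confidence threshold, which can help when picking a threshold for inference. A minimal sketch of that relationship follows; the function names and the toy `(confidence, precision, recall)` points are made up for illustration and are not results from this run.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def best_threshold(points):
    """Pick the confidence whose (precision, recall) pair maximizes F1."""
    return max(points, key=lambda p: f1_score(p[1], p[2]))[0]

# Toy points (made up): (confidence, precision, recall)
toy_points = [(0.1, 0.30, 0.90), (0.3, 0.50, 0.70), (0.5, 0.80, 0.40)]
best = best_threshold(toy_points)
print(best)
```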

# Removing Files

In [None]:
shutil.rmtree('/kaggle/working/siim-covid-19')
shutil.rmtree('runs/detect')
for file in (glob('**/*.png', recursive = True)+glob('**/*.jpg', recursive = True)):
    os.remove(file)