## 🦄 Acknowledgement
- Title...................: [Train] COVID-19 Detection using YOLOv5
- Link....................: https://www.kaggle.com/ayuraj/train-covid-19-detection-using-yolov5#%E2%98%80%EF%B8%8F-Imports-and-Setup
- Author..............: Ayush Thakur (https://www.kaggle.com/ayuraj)
- Version.............: 10

This is the inference version of the [notebook](https://www.kaggle.com/ayuraj/train-covid-19-detection-using-yolov5#%E2%98%80%EF%B8%8F-Imports-and-Setup) above. Because the inference code has to run against the hidden test set with internet access disabled, this notebook processes the test data from scratch and loads both the pre-trained model and the YOLOv5 repository from attached inputs.

**Notice:**
- This code only predicts image-level samples; study-level samples keep the same "PredictionString" as in the sample submission (`sample_submission.csv`).
- This notebook uses a YOLOv5s model trained for 20 epochs as the pretrained model. You can train your own YOLOv5 model with [this notebook](https://www.kaggle.com/ayuraj/train-covid-19-detection-using-yolov5#%E2%98%80%EF%B8%8F-Imports-and-Setup). To improve accuracy, consider using YOLOv5x instead of YOLOv5s, but note that both training and prediction will take longer.

![img](https://user-images.githubusercontent.com/26833433/114313216-f0a5e100-9af5-11eb-8445-c682b60da2e3.png)

## ⌨️ Unzip YOLOv5

In [None]:
%cd /kaggle
!mkdir YOLO
!unzip -o input/github-yolov5/yolov5.zip -d YOLO/

In [None]:
%ls YOLO/yolov5

## 📷 Transform test data

In [None]:
!conda install '/kaggle/input/pydicom-conda-helper/libjpeg-turbo-2.1.0-h7f98852_0.tar.bz2' -c conda-forge -y
!conda install '/kaggle/input/pydicom-conda-helper/libgcc-ng-9.3.0-h2828fa1_19.tar.bz2' -c conda-forge -y
!conda install '/kaggle/input/pydicom-conda-helper/gdcm-2.8.9-py37h500ead1_1.tar.bz2' -c conda-forge -y
!conda install '/kaggle/input/pydicom-conda-helper/conda-4.10.1-py37h89c1867_0.tar.bz2' -c conda-forge -y
!conda install '/kaggle/input/pydicom-conda-helper/certifi-2020.12.5-py37h89c1867_1.tar.bz2' -c conda-forge -y
!conda install '/kaggle/input/pydicom-conda-helper/openssl-1.1.1k-h7f98852_0.tar.bz2' -c conda-forge -y

In [None]:
import numpy as np
import pandas as pd
import os
from PIL import Image
from tqdm.auto import tqdm
import pydicom
from pydicom.pixel_data_handlers.util import apply_voi_lut

In [None]:
# reference: https://www.kaggle.com/xhlulu/siim-covid-19-convert-to-jpg-256px/notebook?scriptVersionId=63196459
def read_xray(path, voi_lut = True, fix_monochrome = True):
    # Original from: https://www.kaggle.com/raddar/convert-dicom-to-np-array-the-correct-way
    dicom = pydicom.dcmread(path)  # dcmread is the current name; read_file is a deprecated alias
    
    # VOI LUT (if available by DICOM device) is used to transform raw DICOM data to 
    # "human-friendly" view
    if voi_lut:
        data = apply_voi_lut(dicom.pixel_array, dicom)
    else:
        data = dicom.pixel_array
               
    # depending on this value, X-ray may look inverted - fix that:
    if fix_monochrome and dicom.PhotometricInterpretation == "MONOCHROME1":
        data = np.amax(data) - data
        
    data = data - np.min(data)
    data = data / np.max(data)
    data = (data * 255).astype(np.uint8)
        
    return data


def resize(array, size, keep_ratio=False, resample=Image.LANCZOS):
    # Original from: https://www.kaggle.com/xhlulu/vinbigdata-process-and-resize-to-image
    im = Image.fromarray(array)
    
    if keep_ratio:
        im.thumbnail((size, size), resample)
    else:
        im = im.resize((size, size), resample)
    
    return im
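
To make the scaling steps in `read_xray` concrete, here is a minimal sketch that applies the same inversion and min-max normalization to a tiny synthetic array instead of a DICOM file. The helper name `normalize_to_uint8` and the sample values are made up for illustration, and the zero-range guard is an addition that the original function does not have.

```python
import numpy as np

def normalize_to_uint8(data, invert=False):
    """Mirror the scaling steps of read_xray on a plain array (illustrative helper)."""
    data = data.astype(np.float64)
    if invert:
        # MONOCHROME1 stores high values as dark pixels, so flip the intensities
        data = np.amax(data) - data
    data = data - np.min(data)
    peak = np.max(data)
    if peak > 0:
        # Guard added in this sketch: a constant image would otherwise divide by zero
        data = data / peak
    return (data * 255).astype(np.uint8)

# Made-up 2x2 "X-ray" with 16-bit style intensities
raw = np.array([[100, 600], [1100, 4100]], dtype=np.uint16)
plain = normalize_to_uint8(raw)
inverted = normalize_to_uint8(raw, invert=True)
print(plain.tolist())     # [[0, 31], [63, 255]]
print(inverted.tolist())  # [[255, 223], [191, 0]]
```

The cast to `uint8` truncates rather than rounds, which is why 31.875 becomes 31.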

In [None]:
TEST_PATH = '/kaggle/tmp/test_data/'
os.makedirs(TEST_PATH, exist_ok=True)
dims_mapping = dict()

for dirname, _, filenames in tqdm(os.walk('/kaggle/input/siim-covid19-detection/test')):
    for file in filenames:
        # pass keep_ratio=True to resize() to preserve the original aspect ratio
        xray = read_xray(os.path.join(dirname, file))
        im = resize(xray, size=256)
        # replace only the extension, not any other 'dcm' substring in the name
        im.save(os.path.join(TEST_PATH, file.replace('.dcm', '.jpg')))
        dims_mapping[file.replace('.dcm', '')] = xray.shape

## 🎨 Predict

In [None]:
MODEL_PATH = '/kaggle/input/yolov5s-20epochs/best.pt'
IMG_SIZE = 256

In [None]:
%cd /kaggle/YOLO/yolov5
!python detect.py --weights {MODEL_PATH} \
                  --source {TEST_PATH} \
                  --img {IMG_SIZE} \
                  --conf-thres 0.281 \
                  --iou-thres 0.5 \
                  --max-det 3 \
                  --save-txt \
                  --save-conf

In [None]:
PRED_PATH = 'runs/detect/exp/labels'
prediction_files = os.listdir(PRED_PATH)
print('Number of test images predicted as opacity: ', len(prediction_files))
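
With `--save-txt` and `--save-conf`, YOLOv5 writes one text file per image that has at least one detection, and each line holds a class id, a normalized box (`x_center y_center width height`), and a confidence. As a rough sketch of that layout, the snippet below parses a single line; the helper name `parse_label_line` and the sample values are invented for illustration.

```python
def parse_label_line(line):
    """Split one YOLOv5 label line into (class_id, box, confidence)."""
    values = [float(v) for v in line.split()]
    class_id = int(values[0])
    box = values[1:5]        # x_center, y_center, width, height, all in [0, 1]
    confidence = values[5]   # present only when --save-conf is used
    return class_id, box, confidence

# Made-up line in the "class x_center y_center width height conf" layout
sample = "0 0.5 0.25 0.2 0.1 0.87\n"
class_id, box, confidence = parse_label_line(sample)
print(class_id, box, confidence)
```

Images with no detections produce no file at all, which is why the file count above can be read as the number of images predicted to contain an opacity.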

## 💾 Submit

In [None]:
# The submission requires xmin, ymin, xmax, ymax format in original-image pixels.
# YOLOv5 returns normalized x_center, y_center, width, height.
def correct_bbox_format(bboxes, id_name):
    correct_bboxes = []
    H, W = dims_mapping[id_name]
    for b in bboxes:
        xc, yc = int(np.round(b[0] * W)), int(np.round(b[1] * H))
        w, h = int(np.round(b[2] * W)), int(np.round(b[3] * H))

        xmin = xc - int(np.round(w/2))
        xmax = xc + int(np.round(w/2))
        ymin = yc - int(np.round(h/2))
        ymax = yc + int(np.round(h/2))
        
        correct_bboxes.append([xmin, ymin, xmax, ymax])
        
    return correct_bboxes

# Read the txt file generated by YOLOv5 during inference and extract 
# confidence and bounding box coordinates.
def get_conf_bboxes(file_path):
    confidence = []
    bboxes = []
    with open(file_path, 'r') as file:
        for line in file:
            preds = line.strip('\n').split(' ')
            preds = list(map(float, preds))
            confidence.append(preds[-1])
            bboxes.append(preds[1:-1])
    return confidence, bboxes
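
As a worked example of the conversion in `correct_bbox_format`, the sketch below maps one normalized YOLO box back to pixel corners of the original image. Because the boxes are normalized, multiplying by the original width and height undoes the 256px resize directly. The function name `yolo_to_corners` and the image size are hypothetical; the built-in `round` is used in place of `np.round` to keep the snippet dependency-free.

```python
def yolo_to_corners(box, height, width):
    """Convert a normalized (x_center, y_center, w, h) box to pixel corners."""
    xc, yc, w, h = box
    # Scale to the original image size first, as correct_bbox_format does
    xc, yc = round(xc * width), round(yc * height)
    w, h = round(w * width), round(h * height)
    half_w, half_h = round(w / 2), round(h / 2)
    return [xc - half_w, yc - half_h, xc + half_w, yc + half_h]

# Hypothetical original image of 2000 rows by 3000 columns
corners = yolo_to_corners([0.5, 0.25, 0.2, 0.1], height=2000, width=3000)
print(corners)  # [1200, 400, 1800, 600]
```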

In [None]:
sub_df = pd.read_csv('/kaggle/input/siim-covid19-detection/sample_submission.csv')
sub_df.tail()

In [None]:
# Prediction loop for submission
predictions = []

for i in tqdm(range(len(sub_df))):
    row = sub_df.loc[i]
    id_name = row.id.split('_')[0]
    id_level = row.id.split('_')[-1]
    
    if id_level == 'study':
        # do study-level classification
        predictions.append("negative 1 0 0 1 1") # dummy prediction
        
    elif id_level == 'image':
        # We could run a separate image-level classifier here,
        # or rely on the object detector's own class predictions.
        # This example submission uses YOLO's predictions:
        # inference has already run, so any image with a label file is treated as opacity.
        if f'{id_name}.txt' in prediction_files:
            # opacity label
            confidence, bboxes = get_conf_bboxes(f'{PRED_PATH}/{id_name}.txt')
            bboxes = correct_bbox_format(bboxes, id_name)
            pred_string = ''
            for j, conf in enumerate(confidence):
                pred_string += f'opacity {conf} ' + ' '.join(map(str, bboxes[j])) + ' '
            predictions.append(pred_string[:-1]) 
        else:
            predictions.append("none 1 0 0 1 1")
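
The image-level branch above concatenates one `opacity <confidence> <xmin> <ymin> <xmax> <ymax>` group per detected box. A minimal standalone sketch of that assembly, using a hypothetical helper `build_prediction_string` and made-up detections, could look like this:

```python
def build_prediction_string(confidences, boxes):
    """Join detections into 'opacity conf xmin ymin xmax ymax' groups."""
    parts = []
    for conf, (xmin, ymin, xmax, ymax) in zip(confidences, boxes):
        parts.append(f"opacity {conf} {xmin} {ymin} {xmax} {ymax}")
    # Joining with a space avoids having to trim a trailing separator
    return " ".join(parts)

# Made-up confidences and pixel boxes for two detections
pred = build_prediction_string(
    [0.87, 0.4],
    [[1200, 400, 1800, 600], [10, 20, 30, 40]],
)
print(pred)  # opacity 0.87 1200 400 1800 600 opacity 0.4 10 20 30 40
```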

In [None]:
sub_df['PredictionString'] = predictions
sub_df.to_csv('/kaggle/working/submission.csv', index=False)
sub_df.tail()
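
Before submitting, it may be worth checking that every `PredictionString` is made of complete six-token groups (a label followed by five numbers). The check below is only a structural sketch under that assumption; `is_well_formed` is a hypothetical helper and it does not verify that labels or coordinates are valid for the competition.

```python
def is_well_formed(prediction_string):
    """Return True if the string splits into groups of: label + 5 numbers."""
    tokens = prediction_string.split()
    if not tokens or len(tokens) % 6 != 0:
        return False
    for i in range(0, len(tokens), 6):
        try:
            # Confidence and the four coordinates must all parse as numbers
            [float(t) for t in tokens[i + 1:i + 6]]
        except ValueError:
            return False
    return True

print(is_well_formed("none 1 0 0 1 1"))         # True
print(is_well_formed("opacity 0.87 1200 400"))  # False
```

On the real frame this could be applied as `sub_df['PredictionString'].map(is_well_formed).all()`.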