# COVID-19 Detection on Chest Radiographs
**Please upvote if you found the notebook helpful :)**

In this competition, you’ll identify COVID-19 abnormalities on chest radiographs. In particular, you'll categorize the radiographs as **negative** for pneumonia or **typical**, **indeterminate**, or **atypical** for COVID-19. 

If successful, you'll help radiologists diagnose the millions of COVID-19 patients more confidently and quickly. This will also enable doctors to see the extent of the disease and help them make decisions regarding treatment. 

Dataset files used: 
* **train_study_level.csv**: the train study-level metadata, with one row for each study, including correct labels.

* **train_image_level.csv**: the train image-level metadata, with one row for each image, including both correct labels and any bounding boxes in a dictionary format. Some images in both test and train have multiple bounding boxes.

* **sample_submission.csv**: a sample submission file containing all image- and study-level IDs.


In [None]:
import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt
from glob import glob
import pydicom
from pydicom.pixel_data_handlers.util import apply_voi_lut
from skimage import exposure
import ast
import cv2
import warnings
import os


# Loading and exploring data (EDA)

In [None]:
files_path = '../input/siim-covid19-detection'
train_img_df = pd.read_csv("../input/siim-covid19-detection/train_image_level.csv")
train_study_data = pd.read_csv("../input/siim-covid19-detection/train_study_level.csv")

| Column | Description                                                  |
|--------|--------------------------------------------------------------|
| id     | unique image identifier                                      |
| boxes  | bounding boxes in easily-readable dictionary format          |
| label  | the correct prediction label for the provided bounding boxes |

# train_image_level.csv file

In [None]:
train_img_df.head()

**Total (rows, cols)**

In [None]:
train_img_df.shape

In [None]:
train_img_df.info()

In [None]:
# Count missing values
train_img_df.isnull().sum()

In [None]:
# missing values display

train_img_df[train_img_df.isnull().any(axis=1)].head()

The `boxes` column contains 2,040 null values.
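
The missing `boxes` entries can be cross-checked against the `label` column. The sketch below runs on a tiny hand-made frame, not the competition data, and assumes that an image without boxes carries a label starting with `none`. That convention is an assumption here and is worth verifying on the real CSV before relying on it.

```python
import pandas as pd

# Toy stand-in for train_image_level.csv (all values are made up for illustration).
toy_df = pd.DataFrame({
    "id": ["a_image", "b_image", "c_image"],
    "boxes": ["[{'x': 10, 'y': 20, 'width': 30, 'height': 40}]", None, None],
    "label": ["opacity 1 10 20 40 60", "none 1 0 0 1 1", "none 1 0 0 1 1"],
})

# Rows where the boxes column is empty.
missing_boxes = toy_df["boxes"].isnull()

# Assumption: a label starting with "none" marks an image without boxes.
none_label = toy_df["label"].str.startswith("none")

n_missing = int(missing_boxes.sum())
masks_agree = bool((missing_boxes == none_label).all())
print(f"missing boxes: {n_missing}, matches 'none' labels: {masks_agree}")
```

If the two masks agree on the real data, the nulls are simply images with nothing to localise rather than corrupted rows.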

# train_study_level.csv file

In [None]:
train_study_data = pd.read_csv("../input/siim-covid19-detection/train_study_level.csv")

In [None]:
train_study_data.head()

**Total (rows, cols)**

In [None]:
train_study_data.shape

In [None]:
train_study_data.info()

In [None]:
study_result = ['Negative for Pneumonia',
                'Typical Appearance',
                'Indeterminate Appearance',
                 'Atypical Appearance']
np.unique(train_study_data[study_result].values, axis=0)


**train_study_level.csv:**


| Column                   | Description                                              | One-hot value |
|--------------------------|----------------------------------------------------------|---------------|
| id                       | unique study identifier                                  |               |
| Negative for Pneumonia   | 1: if the study is negative for pneumonia, 0: otherwise  | 1 0 0 0       |
| Typical Appearance       | 1: if the study has this appearance, 0: otherwise        | 0 1 0 0       |
| Indeterminate Appearance | 1: if the study has this appearance, 0: otherwise        | 0 0 1 0       |
| Atypical Appearance      | 1: if the study has this appearance, 0: otherwise        | 0 0 0 1       |
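
Because each study activates exactly one of the four columns, the one-hot encoding can be collapsed into a single categorical label. A minimal sketch on a made-up frame (`toy_study` and its rows are hypothetical, only the column names come from the table above):

```python
import pandas as pd

study_cols = ["Negative for Pneumonia", "Typical Appearance",
              "Indeterminate Appearance", "Atypical Appearance"]

# Toy stand-in for train_study_level.csv (rows are made up for illustration).
toy_study = pd.DataFrame(
    [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0]],
    columns=study_cols,
)

# Sanity check: every row should have exactly one active class.
one_hot_ok = bool((toy_study[study_cols].sum(axis=1) == 1).all())

# idxmax along the columns returns the name of the column holding the 1.
toy_study["target"] = toy_study[study_cols].idxmax(axis=1)
print(toy_study["target"].tolist())
```

`idxmax` only gives a meaningful answer when the one-hot check passes, so it is worth keeping the two steps together.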

In [None]:
plt.figure(figsize = (10,5))
plt.bar([1,2,3,4], train_study_data[study_result].values.sum(axis=0))
plt.xticks([1,2,3,4],study_result)
plt.ylabel('Frequency')
plt.show()

In [None]:

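def img_list(path, voi_lut=True, fix_monochrome=True):
    """Read one DICOM file and return its pixels as a uint8 array in [0, 255]."""
    # dcmread replaces read_file, which is deprecated in newer pydicom releases
    dicom = pydicom.dcmread(path)
    # Apply the VOI LUT (when available) to get the intended display values
    if voi_lut:
        data = apply_voi_lut(dicom.pixel_array, dicom)
    else:
        data = dicom.pixel_array
    # MONOCHROME1 stores inverted intensities, so flip them back
    if fix_monochrome and dicom.PhotometricInterpretation == "MONOCHROME1":
        data = np.amax(data) - data
    # Min-max scale to the 0-255 range
    data = data - np.min(data)
    data = data / np.max(data)
    data = (data * 255).astype(np.uint8)
    return data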
        
    
def plot_img(img, size=(7, 7), is_rgb=True,
             title="", cmap='gray'):
    plt.figure(figsize=size)
    plt.imshow(img, cmap=cmap)
    plt.suptitle(title)
    plt.show()

def plot_imgs(imgs, cols=5, size=7, is_rgb=True, title="",
              cmap='gray', img_size=(500,500)):
    # Ceiling division avoids an empty trailing row when len(imgs) is a multiple of cols
    rows = max(1, (len(imgs) + cols - 1) // cols)
    fig = plt.figure(figsize=(cols*size, rows*size))
    for i, img in enumerate(imgs):
        if img_size is not None:
            img = cv2.resize(img, img_size)
        fig.add_subplot(rows, cols, i+1)
        plt.imshow(img, cmap=cmap)
    plt.suptitle(title)
    plt.show()

# Load and display the first 10 training images
dicom_paths = glob(f'{files_path}/train/*/*/*.dcm')
imgs = [img_list(path) for path in dicom_paths[:10]]
plot_imgs(imgs)
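
The scaling steps inside `img_list` can be tried without any DICOM file. The sketch below repeats the MONOCHROME1 inversion and the min-max scaling on a small synthetic array. `to_uint8` is a hypothetical helper written for this illustration, and the guard against a constant image is an addition that `img_list` does not have.

```python
import numpy as np

def to_uint8(data, invert=False):
    """Min-max scale an array to uint8, optionally inverting intensities first."""
    data = data.astype(np.float64)
    if invert:
        # Same idea as the MONOCHROME1 fix: flip dark and bright
        data = data.max() - data
    data = data - data.min()
    peak = data.max()
    if peak > 0:
        # Guard (not in img_list): a constant image would otherwise divide by zero
        data = data / peak
    return (data * 255).astype(np.uint8)

# Synthetic 12-bit "pixel data" standing in for dicom.pixel_array
raw = np.array([[0, 1024], [2048, 4095]], dtype=np.uint16)
normal = to_uint8(raw)
inverted = to_uint8(raw, invert=True)
print(normal)
print(inverted)
```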


**Apply histogram equalization on images (contrast adjustment).**

In [None]:
imgs = [exposure.equalize_hist(img) for img in imgs]
plot_imgs(imgs)
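
To see what histogram equalization does, here is a plain NumPy sketch of the idea: each pixel is mapped through the normalised cumulative histogram, which spreads a narrow intensity range over [0, 1]. It is an illustration on synthetic data, not the scikit-image implementation. `equalize_hist_sketch` and the random test image are hypothetical.

```python
import numpy as np

def equalize_hist_sketch(img, nbins=256):
    """Map pixel values through the normalised cumulative histogram (CDF)."""
    hist, bin_edges = np.histogram(img.ravel(), bins=nbins)
    cdf = hist.cumsum().astype(np.float64)
    cdf /= cdf[-1]  # normalise so the CDF ends at 1.0
    bin_centers = (bin_edges[:-1] + bin_edges[1:]) / 2
    # Interpolate each pixel's CDF value from the bin centers
    out = np.interp(img.ravel(), bin_centers, cdf)
    return out.reshape(img.shape)

# Synthetic low-contrast image: values squeezed into the range 100..139
rng = np.random.default_rng(0)
low_contrast = rng.integers(100, 140, size=(64, 64)).astype(np.uint8)
equalized = equalize_hist_sketch(low_contrast)

print("input range :", low_contrast.min(), low_contrast.max())
print("output range:", equalized.min(), equalized.max())
```

Like `exposure.equalize_hist`, the sketch returns floats in [0, 1] rather than `uint8` values, which matters if later steps expect 0-255 images.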

In [None]:
boxes = ast.literal_eval(train_img_df.loc[0, 'boxes'])
boxes
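
The parsed `boxes` value is convenient to convert into corner coordinates, which is what drawing functions such as `cv2.rectangle` expect. This sketch assumes each entry is a dictionary with `x`, `y`, `width` and `height` keys. That layout is an assumption to check against the output of the cell above, and `boxes_to_corners` plus the example string are hypothetical.

```python
import ast

def boxes_to_corners(boxes_str):
    """Convert a boxes string into a list of (x_min, y_min, x_max, y_max) tuples."""
    boxes = ast.literal_eval(boxes_str)
    # Assumption: each box is a dict with 'x', 'y', 'width', 'height'
    return [
        (b["x"], b["y"], b["x"] + b["width"], b["y"] + b["height"])
        for b in boxes
    ]

# Made-up example in the assumed format
example = "[{'x': 100.0, 'y': 200.0, 'width': 50.0, 'height': 80.0}]"
corners = boxes_to_corners(example)
print(corners)
```

Rows whose `boxes` value is null need to be skipped or handled before calling the helper, since `ast.literal_eval` only accepts strings.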

In [None]:
train_study_data['StudyInstanceUID'] = train_study_data['id'].apply(lambda x: x.replace('_study', ''))
del train_study_data['id']
train_img_df = train_img_df.merge(train_study_data, on='StudyInstanceUID')
train_img_df.head()
def bar_plot(train_img_df, variable):
    var = train_img_df[variable]
    varValue = var.value_counts()
    
    # visualize
    plt.figure(figsize = (9,3))
    plt.bar(varValue.index, varValue)
    plt.xticks(varValue.index, varValue.index.values)
    plt.ylabel("Frequency")
    plt.title(variable)
    plt.show()
    print("{}: \n {}".format(variable,varValue))
    
train_img_df['target'] = 'Negative for Pneumonia'
train_img_df.loc[train_img_df['Typical Appearance']==1, 'target'] = 'Typical Appearance'
train_img_df.loc[train_img_df['Indeterminate Appearance']==1, 'target'] = 'Indeterminate Appearance'
train_img_df.loc[train_img_df['Atypical Appearance']==1, 'target'] = 'Atypical Appearance'
bar_plot(train_img_df, 'target') 

In [None]:
train_img_df["target"].value_counts().plot(kind='pie', autopct='%1.1f%%', figsize=(6, 6))
print(train_img_df["target"].value_counts())

# Plot Images with bounding box

In [None]:
# The column order here must match the one-hot keys in label2color below
appearance_types = ['Typical Appearance', 'Indeterminate Appearance', 'Atypical Appearance', 'Negative for Pneumonia']
unique_classes = np.unique(train_img_df[appearance_types].values, axis=0)
imgs = []
label2color = {
    '[1, 0, 0, 0]': [0,255,0], # Typical Appearance
    '[0, 1, 0, 0]': [255,0,0], # Indeterminate Appearance
    '[0, 0, 1, 0]': [0,0,255], # Atypical Appearance
    '[0, 0, 0, 1]': [255,255,0], # Negative for Pneumonia
}
print('Typical Appearance: green')
print('Indeterminate Appearance: red')
print('Atypical Appearance: blue')
print('Negative for Pneumonia: yellow')
print(' \n  \n ')
thickness = 5
scale = 5

for _, row in train_img_df[train_img_df['Negative for Pneumonia']==0].iloc[:10].iterrows():
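    study_id = row['StudyInstanceUID']
    # Assumption: the DICOM file name is the image id without the "_image" suffix.
    # Matching on it keeps the boxes paired with the image they were drawn on,
    # which matters for studies that contain more than one image.
    image_id = row['id'].replace('_image', '')
    img_path = glob(f'{files_path}/train/{study_id}/*/{image_id}.dcm')[0]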
    img = img_list(path=img_path)
    img = cv2.resize(img, None, fx=1/scale, fy=1/scale)
    img = np.stack([img, img, img], axis=-1)
    
    # Cast to plain ints so the key reads like '[1, 0, 0, 0]' whatever the dtype
    claz = [int(v) for v in row[appearance_types].values]
    color = label2color[str(claz)]

    bboxes = []
    bbox = []
    for i, l in enumerate(row['label'].split(' ')):
        if (i % 6 == 0) | (i % 6 == 1):
            continue
        bbox.append(float(l)/scale)
        if i % 6 == 5:
            bboxes.append(bbox)
            bbox = []    
    
    for box_frame in bboxes:
        img = cv2.rectangle(
            img,
            (int(box_frame[0]), int(box_frame[1])),
            (int(box_frame[2]), int(box_frame[3])),
            color, thickness
        )
    img = cv2.resize(img, (500,500))
    imgs.append(img)
    
plot_imgs(imgs, cmap=None)
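
The index arithmetic in the loop above (skip positions 0 and 1 of every group of six tokens, keep the next four) can also be written as a small parser. The sketch assumes the `label` string is a flat sequence of `class confidence x_min y_min x_max y_max` groups, which is the layout that loop implies. `parse_label` and the example string are hypothetical.

```python
def parse_label(label, scale=1.0):
    """Split a label string into (class, confidence, x_min, y_min, x_max, y_max) tuples."""
    tokens = label.split()
    if len(tokens) % 6 != 0:
        raise ValueError("expected groups of 6 tokens per box")
    parsed = []
    for i in range(0, len(tokens), 6):
        name = tokens[i]
        confidence = float(tokens[i + 1])
        # Divide by scale so the coordinates match a downscaled image
        coords = [float(t) / scale for t in tokens[i + 2:i + 6]]
        parsed.append((name, confidence, *coords))
    return parsed

# Made-up label with two boxes, rescaled for an image shrunk by a factor of 5
parsed = parse_label("opacity 1 100 200 300 400 opacity 1 10 20 30 40", scale=5)
for entry in parsed:
    print(entry)
```

Keeping the class name and confidence alongside the coordinates makes it easy to filter out placeholder entries before drawing.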



# Submit the result

In [None]:
submission_df = pd.read_csv('../input/siim-covid19-detection/sample_submission.csv')
submission_df.head()

In [None]:
submission_df.to_csv('submission.csv', index=False)
