## <font color='#ff0000'>Overview</font>


<img src="https://i.imgur.com/ieiJVoI.jpg" alt="Xray.png">


COVID-19 is a contagious disease that has spread worldwide. Many people have lost their lives to it, and the pandemic has caused a financial crisis in which many lost their jobs. Maintaining social distancing is one way to slow its spread.

Symptoms of COVID-19 vary, but cases often include:

* Fever
* Cough
* Fatigue
* Breathing difficulties
* Loss of smell or taste

One method of detecting COVID-19 is from chest X-ray scans of patients.

In this competition, we'll identify and localize COVID-19 abnormalities on chest radiographs. In particular, we'll categorize the radiographs as negative for pneumonia or as typical, indeterminate, or atypical for COVID-19. This is an object detection and classification problem.

## <font color='#ff0000'>Data description</font>

In [None]:
import warnings # importing warnings module
warnings.filterwarnings('ignore') # we will ignore all the warnings generated

In [None]:
import os
path = '../input/siim-covid19-detection' # specifying the path
print(os.listdir(path)) # printing all the files in the given path

The dataset has five entries: the files `sample_submission.csv`, `train_image_level.csv`, and `train_study_level.csv`, plus the `train` and `test` folders.

## <font color='#ff0000'>Loading data</font>

In [None]:
import pandas as pd
train_image_level  = pd.read_csv(path+'/train_image_level.csv') #read_csv is used to read the .csv file
train_image_level.head()  # head() is used to print the first five rows

`train_image_level.csv`: the train image-level metadata, with one row for each image, including both correct labels and any bounding boxes in a dictionary format. Some images in both test and train have multiple bounding boxes.

Columns:

* `id` - unique image identifier
* `boxes` - bounding boxes in an easily readable dictionary format
* `label` - the correct prediction label for the provided bounding boxes
* `StudyInstanceUID` - identifier of the study the image belongs to (used further down to merge with the study-level table)
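The `boxes` column is read by pandas as plain text, so each cell has to be parsed before the boxes can be used. Below is a minimal sketch of one way to do that. It assumes each cell is a string holding a list of dictionaries with the keys `x`, `y`, `width`, and `height`, and that images without boxes have a missing (NaN) value; both are assumptions to verify against `train_image_level.head()`. The helper name `parse_boxes` is made up for this illustration.

```python
import ast
import math


def parse_boxes(cell):
    """Return the list of box dicts stored in one `boxes` cell.

    Assumes the cell is either missing (None / NaN) or a string such as
    "[{'x': ..., 'y': ..., 'width': ..., 'height': ...}]".
    """
    if cell is None or (isinstance(cell, float) and math.isnan(cell)):
        return []  # image without any bounding box
    return ast.literal_eval(cell)  # safe parsing of the literal, no eval()


# Toy value for illustration only, not taken from the dataset
example = "[{'x': 10.0, 'y': 20.0, 'width': 30.0, 'height': 40.0}]"
boxes = parse_boxes(example)

# Convert (x, y, width, height) to corner format (xmin, ymin, xmax, ymax)
corners = [
    (b["x"], b["y"], b["x"] + b["width"], b["y"] + b["height"])
    for b in boxes
]
print(corners)
```

On the real dataframe this could be applied with `train_image_level['boxes'].apply(parse_boxes)`.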

In [None]:
train_study_level  = pd.read_csv(path+'/train_study_level.csv') #read_csv is used to read the .csv file
train_study_level.head() # head() is used to print the first five rows

`train_study_level.csv`: the train study-level metadata, with one row for each study, including correct labels.

Columns:

* `id` - unique study identifier
* `Negative for Pneumonia` - 1 if the study is negative for pneumonia, 0 otherwise
* `Typical Appearance` - 1 if the study has this appearance, 0 otherwise
* `Indeterminate Appearance` - 1 if the study has this appearance, 0 otherwise
* `Atypical Appearance` - 1 if the study has this appearance, 0 otherwise


## <font color='#ff0000'>Distribution of classes</font>

In [None]:
import plotly.express as pe # plotly.express is used for beautiful visualization
count_of_classes = train_study_level.loc[ : , train_study_level.columns != 'id'].sum() #get the sum of the classes
column_names = train_study_level.columns[1:] # creating a list with the column names
data = {'Count_of_classes':list(count_of_classes),
        'class_names':list(column_names)}
df = pd.DataFrame(data) # creating a dataframe with count of classes and column names
pie_chart = pe.pie(df, values='Count_of_classes', names='class_names', title='Distribution of classes in data')
pie_chart.show() # displaying the pie chart

Typical appearance is the most common class at 47.2%, and atypical appearance is the least common at 7.83%.

Key takeaway: the classes in the given data are imbalanced.
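One common way to account for this kind of imbalance during training is to weight each class inversely to its frequency. The sketch below shows the arithmetic only. The counts are toy numbers chosen for illustration (not the real class counts), and `balanced_class_weights` is a hypothetical helper name; whether weighting actually helps on this dataset is something to test.

```python
def balanced_class_weights(counts):
    """Inverse-frequency weights: total / (n_classes * class_count).

    A class that holds exactly 1/n_classes of the samples gets weight 1.0;
    rarer classes get larger weights.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {name: total / (n_classes * n) for name, n in counts.items()}


# Toy counts for illustration only
toy_counts = {"class_a": 60, "class_b": 30, "class_c": 10}
weights = balanced_class_weights(toy_counts)

for name, weight in weights.items():
    print(f"{name}: {weight:.3f}")
```

With the real data, the counts could come from `count_of_classes.to_dict()`.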

## <font color='#ff0000'>Plotting raw images</font>

In [None]:
# reference : https://www.kaggle.com/trungthanhnguyen0502/eda-vinbigdata-chest-x-ray-abnormalities
from glob import glob
import pydicom
from pydicom.pixel_data_handlers.util import apply_voi_lut
import numpy as np
import matplotlib.pyplot as plt
import cv2
def dicom_images(path, voi_lut=True, fix_monochrome=True):
    '''Read a DICOM file and return its pixel data as a uint8 array'''
    dicom = pydicom.dcmread(path) # reading the DICOM file (read_file is a deprecated alias of dcmread)
    # VOI LUT (if available by DICOM device) is used to
    # transform raw DICOM data to "human-friendly" view
    if voi_lut:
        data = apply_voi_lut(dicom.pixel_array, dicom)
    else:
        data = dicom.pixel_array
    # depending on this value, X-ray may look inverted - fix that:
    if fix_monochrome and dicom.PhotometricInterpretation == "MONOCHROME1":
        data = np.amax(data) - data
    data = data - np.min(data)
    data = data / np.max(data)
    data = (data * 255).astype(np.uint8)
    return data
        
    
def plot_img(img, size=(7, 7), is_rgb=True, title="", cmap='gray'):
    '''This function is used to plot the images'''
    plt.figure(figsize=size)
    plt.imshow(img, cmap=cmap)
    plt.suptitle(title)
    plt.show()


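def plot_imgs(imgs, cols=4, size=7, is_rgb=True, title="", cmap='gray', img_size=(500,500)):
    '''This function is used to plot several images in a grid'''
    rows = max(1, (len(imgs) + cols - 1) // cols) # ceiling division, so no empty extra row is reserved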
    fig = plt.figure(figsize=(cols*size, rows*size))
    for i, img in enumerate(imgs):
        if img_size is not None:
            img = cv2.resize(img, img_size)
        fig.add_subplot(rows, cols, i+1)
        plt.imshow(img, cmap=cmap)
    plt.suptitle(title)
    plt.show()

In [None]:
image_paths = glob(f'{path}/train/*/*/*.dcm') # specifying the image paths
imgs = [dicom_images(p) for p in image_paths[:4]] # loop variable named p so it does not shadow the dataset path
plot_imgs(imgs) # plotting images

## <font color='#ff0000'>Preprocessing raw images - [Convert dicom to np.array - the correct way]</font>

Thanks to this great idea: https://www.kaggle.com/raddar/convert-dicom-to-np-array-the-correct-way

Raw DICOM data is not linearly convertible to a "human-friendly" PNG/JPG. In fact, most DICOM files store pixel values on an exponential scale, which standard DICOM viewers resolve. So, in order to get the JPG/PNG that radiologists would initially see in their workspace, you need to apply some transformations. The DICOM metadata stores the information needed to make such "human-friendly" transformations.
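To give a feel for what such a transformation looks like, here is a minimal sketch of the linear window function defined in the DICOM standard, driven by a window center and a window width. It is only an illustration: `apply_voi_lut` from pydicom also handles lookup tables and other cases, and the function name `linear_window` and the toy pixel values below are made up for this example.

```python
import numpy as np


def linear_window(pixels, center, width, out_max=255.0):
    """Map raw pixel values to [0, out_max] with a linear DICOM window.

    Values below the window are clipped to 0, values above it to out_max,
    and values inside it are scaled linearly. Requires width > 1.
    """
    pixels = np.asarray(pixels, dtype=np.float64)
    low = center - 0.5 - (width - 1) / 2    # lower edge of the window
    high = center - 0.5 + (width - 1) / 2   # upper edge of the window
    scaled = ((pixels - (center - 0.5)) / (width - 1) + 0.5) * out_max
    scaled = np.where(pixels <= low, 0.0, scaled)
    scaled = np.where(pixels > high, out_max, scaled)
    return scaled


# Toy 12-bit values for illustration only
raw = np.array([0, 2048, 4095])
print(linear_window(raw, center=2048, width=4096))
```

A narrower window (smaller `width`) stretches a smaller range of raw values over the full output range, which is how viewers increase contrast in a region of interest.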

In [None]:
image_paths = glob(f'{path}/train/*/*/*.dcm') # collecting the DICOM image paths
imgs = [dicom_images(p, fix_monochrome=False) for p in image_paths[:4]] # VOI LUT still applied, MONOCHROME1 not inverted
plot_imgs(imgs) # plotting images

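With `fix_monochrome=False`, the VOI LUT is still applied (`voi_lut=True` by default); the only change from the previous plot is that MONOCHROME1 images are no longer inverted.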

## <font color='#ff0000'>Preprocessing raw images - trick (2) [Histogram equalization]</font>

In [None]:
from skimage import exposure
imgs = [exposure.equalize_hist(img) for img in imgs] # equalizing histograms
plot_imgs(imgs)

With histogram equalization, the images show clearer contrast than the raw images.
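Histogram equalization works by remapping each gray level through the cumulative distribution of the image's pixel values, so that frequently occurring levels are spread over a wider output range. The following is a minimal NumPy sketch of that idea for 8-bit images. It is not the scikit-image implementation (`exposure.equalize_hist` returns floats in [0, 1], while this sketch returns `uint8`), and the name `equalize_uint8` is made up for illustration.

```python
import numpy as np


def equalize_uint8(img):
    """Equalize an 8-bit grayscale image using its cumulative histogram."""
    hist = np.bincount(img.ravel(), minlength=256)  # pixel count per gray level
    cdf = hist.cumsum() / img.size                  # cumulative distribution in (0, 1]
    return (cdf[img] * 255).astype(np.uint8)        # look up the new level for each pixel


# Toy low-contrast image: all values squeezed into 100..103
toy = np.array([[100, 101], [102, 103]], dtype=np.uint8)
print(equalize_uint8(toy))
```

After equalization the four gray levels of the toy image are spread across a much wider part of the 0 to 255 range.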

## <font color='#ff0000'>Plotting bounding boxes</font>

In [None]:
train_study_level['id'][:5] # displaying 1st five rows of id column

In [None]:
train_image_level['StudyInstanceUID'][:5] # displaying the first five values of the StudyInstanceUID column

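Looking closely, the `id` values in the study-level dataframe carry a `_study` suffix, while `StudyInstanceUID` in the image-level dataframe has no suffix. We will remove the suffix and merge the two dataframes on the study identifier.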

In [None]:
train_study_level['StudyInstanceUID'] = train_study_level['id'].apply(lambda x: x.replace('_study', '')) # replacing the suffix with ''
del train_study_level['id'] # deleting the id column in one dataframe
train = train_study_level.merge(train_image_level, on='StudyInstanceUID') # merging the dataframes
train.head()
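Because the merge depends on the identifiers matching exactly after the suffix is removed, it can be worth checking the result. The sketch below repeats the same steps on two toy dataframes and uses the `validate` argument of `DataFrame.merge` to assert that each study appears only once on the left side. The IDs and the `_image` suffix are made up for illustration.

```python
import pandas as pd

# Toy stand-ins for the two tables; the IDs are made up
study = pd.DataFrame({
    "id": ["aaa_study", "bbb_study"],
    "Typical Appearance": [1, 0],
})
image = pd.DataFrame({
    "id": ["img1_image", "img2_image", "img3_image"],
    "StudyInstanceUID": ["aaa", "aaa", "bbb"],
})

# Strip the suffix as a literal string (regex=False) and drop the old id
study = study.assign(
    StudyInstanceUID=study["id"].str.replace("_study", "", regex=False)
).drop(columns="id")

# validate raises MergeError if a study id is duplicated in `study`
merged = study.merge(image, on="StudyInstanceUID", validate="one_to_many")

# Image rows whose study id found no partner would be listed here
unmatched = set(image["StudyInstanceUID"]) - set(study["StudyInstanceUID"])
print(len(merged), unmatched)
```

If `unmatched` is empty and `len(merged)` equals the number of image rows, no image was lost in the merge.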

In [None]:
# reference : https://www.kaggle.com/yujiariyasu/plot-3positive-classes by YujiAriyasu
class_names = ['Typical Appearance', 'Indeterminate Appearance', 'Atypical Appearance']
imgs = []
label2color = {
    '[1, 0, 0]': [0,255,0], # Typical Appearance
}

thickness = 3
scale = 5

for _, row in train[train['Typical Appearance'] == 1].iloc[:16].iterrows():
    study_id = row['StudyInstanceUID']
    img_path = glob(f'{path}/train/{study_id}/*/*')[0]
    img = dicom_images(path=img_path)
    img = cv2.resize(img, None, fx=1/scale, fy=1/scale)
    img = np.stack([img, img, img], axis=-1)
    
    claz = row[class_names].values
    color = label2color[str(claz.tolist())]

    bboxes = []
    bbox = []
    # each box takes six tokens; the first two are skipped and the last four are the corner coordinates
    for i, l in enumerate(row['label'].split(' ')):
        if i % 6 in (0, 1):
            continue
        bbox.append(float(l)/scale)
        if i % 6 == 5:
            bboxes.append(bbox)
            bbox = []    
    
    for box in bboxes:
        img = cv2.rectangle(
            img,
            (int(box[0]), int(box[1])),
            (int(box[2]), int(box[3])),
            color, thickness
        )
    img = cv2.resize(img, (500,500))
    imgs.append(img)
    
plot_imgs(imgs, cmap=None)
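The box parsing inside the loop above can also be written as a small standalone helper, which makes it easier to test in isolation. This is a sketch that assumes, as the loop does, that the `label` string is a sequence of six-token groups where the first two tokens are skipped and the last four are the corner coordinates. The name `parse_label_string` and the sample string are made up for illustration.

```python
def parse_label_string(label, scale=1.0):
    """Return a list of [xmin, ymin, xmax, ymax] boxes from a label string.

    Each box is assumed to occupy six whitespace-separated tokens, with the
    coordinates in the last four. Coordinates are divided by `scale` so they
    match an image that was downscaled by the same factor.
    """
    tokens = label.split()
    boxes = []
    # step through complete six-token groups only
    for start in range(0, len(tokens) - 5, 6):
        coords = tokens[start + 2:start + 6]
        boxes.append([float(t) / scale for t in coords])
    return boxes


# Made-up label string for illustration only
sample = "opacity 1 10 20 30 40 opacity 1 50 60 70 80"
print(parse_label_string(sample, scale=5))
```

Inside the plotting loop, `bboxes = parse_label_string(row['label'], scale)` would then replace the inner `for` loop.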

# <font color='#0398fc'>More work in progress...</font>
