[<font size="5">Description of COVID-19 Detection Problem</font>](#1)

Five times more deadly than the flu, COVID-19 causes significant morbidity and mortality. Like other pneumonias, pulmonary infection with COVID-19 results in inflammation and fluid in the lungs. COVID-19 looks very similar to other viral and bacterial pneumonias on chest radiographs, which makes it difficult to diagnose. This computer vision model for detection and localization of COVID-19 would help doctors provide a quick and confident diagnosis. As a result, patients could get the right treatment before the most severe effects of the virus take hold.

Currently, COVID-19 can be diagnosed either by polymerase chain reaction (PCR), which detects genetic material from the virus, or by chest radiograph. However, it can take a few hours and sometimes days before the molecular test results are back. By contrast, chest radiographs can be obtained in minutes. While guidelines exist to help radiologists differentiate COVID-19 from other types of infection, their assessments vary. In addition, non-radiologists could be supported with better localization of the disease, such as with a visual bounding box.

In this competition, the task is to identify and localize COVID-19 abnormalities on chest radiographs. In particular, each radiograph is categorized as negative for pneumonia, or as typical, indeterminate, or atypical for COVID-19.

In [None]:
# First, import the libraries used throughout the notebook
import numpy as np  # arrays and matrices
import pandas as pd  # reading files such as CSVs into DataFrames
import matplotlib  # plotting library
import matplotlib.pyplot as plt  # plotting interface for graphs and figures
import pydicom as dicom  # reading DICOM (Digital Imaging and Communications in Medicine) files
import os  # operating system interaction, e.g. listing directories
from tqdm import tqdm  # progress bar that wraps around any iterable

<font size="5">Categories of Radiographs </font>


1. NEGATIVE FOR PNEUMONIA - No lung opacities

2. TYPICAL APPEARANCE - Multifocal bilateral, peripheral opacities with rounded morphology, lower lung–predominant distribution

3. INDETERMINATE APPEARANCE - Absence of typical findings AND unilateral, central or upper lung predominant distribution

4. ATYPICAL APPEARANCE - Pneumothorax, pleural effusion, pulmonary edema, lobar consolidation, solitary lung nodule or mass, diffuse tiny nodules, cavity

<font size ="5"> Data Files </font>


1. train_study_level.csv - the train study-level metadata, with one row for each study, including correct labels.
    
2. train_image_level.csv - the train image-level metadata, with one row for each image, including both correct labels and any bounding boxes in a dictionary format. Some images in both test and train have multiple bounding boxes.
    
3. sample_submission.csv - a sample submission file containing all image- and study-level IDs.

4. train folder - comprises 6334 chest scans in DICOM format, stored in paths with the form study/series/image
    
5. test folder - The hidden test dataset is of roughly the same scale as the training dataset. Studies in the test set may contain more than one label.

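The image-level file is described above as storing bounding boxes "in a dictionary format". The following is a minimal sketch of how such a cell might be parsed, assuming each `boxes` value is a string holding a Python-style list of dictionaries with the keys `x`, `y`, `width`, and `height`. That layout, the sample string, and the helper name `parse_boxes` are assumptions made for illustration.

```python
import ast

def parse_boxes(cell):
    """Turn one `boxes` cell into a list of (xmin, ymin, xmax, ymax) tuples."""
    # Missing values (e.g. NaN for images without boxes) are not strings.
    if not isinstance(cell, str):
        return []
    boxes = ast.literal_eval(cell)  # safely parse a literal list of dicts
    return [
        (b["x"], b["y"], b["x"] + b["width"], b["y"] + b["height"])
        for b in boxes
    ]

# Hypothetical cell value, made up for illustration (assumed key names).
sample = "[{'x': 10.0, 'y': 20.0, 'width': 30.0, 'height': 40.0}]"
print(parse_boxes(sample))
```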
[<font size="5">Interpreting the Problem </font>](#2)

The problem combines classification and detection. In the test set we have to find *bounding boxes*, which are rectangular boxes that mark where a certain object is located, and we also have to predict a class.
The class labels are 'Negative for Pneumonia', 'Typical Appearance', 'Indeterminate Appearance', and 'Atypical Appearance'. From them we have to build a prediction string, for example:
> atypical 1 0 0 1 1

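As a rough sketch of how such a prediction string could be assembled, the snippet below assumes the fields are ordered as class, confidence, xmin, ymin, xmax, ymax, and that a class-only prediction uses the placeholder box `0 0 1 1` seen in the example. The helper `format_prediction` and the short class names other than `atypical` are assumptions, not taken from the competition files.

```python
# Assumed mapping from the study-level labels to short class names;
# only "atypical" appears in the example above.
SHORT_NAMES = {
    "Negative for Pneumonia": "negative",
    "Typical Appearance": "typical",
    "Indeterminate Appearance": "indeterminate",
    "Atypical Appearance": "atypical",
}

def format_prediction(label, confidence, box=(0, 0, 1, 1)):
    """Build 'class confidence xmin ymin xmax ymax' for one prediction."""
    name = SHORT_NAMES[label]
    fields = [name, confidence, *box]
    return " ".join(str(f) for f in fields)

print(format_prediction("Atypical Appearance", 1))
```

Several predictions for the same row could then be joined with single spaces, again assuming that is the expected separator.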

In [None]:
path = '/kaggle/input/siim-covid19-detection/'
train_image_level = pd.read_csv(path + "train_image_level.csv")


In [None]:
identity = train_image_level["id"]  # select a specific column, in this case id
print(identity)

# to show the first two columns, pass a list of names: train_image_level[["id", "boxes"]]

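Selecting one column returns a Series, while selecting several columns requires a list of names inside the brackets and returns a DataFrame. A small sketch on a toy frame (the values are made up and only stand in for `train_image_level`):

```python
import pandas as pd

# Toy stand-in for train_image_level; all values are made up.
toy = pd.DataFrame({
    "id": ["a_image", "b_image"],
    "boxes": [None, None],
    "label": ["label_a", "label_b"],
})

one_column = toy["id"]               # single name -> Series
two_columns = toy[["id", "boxes"]]   # list of names -> DataFrame

print(type(one_column).__name__)
print(type(two_columns).__name__)
print(list(two_columns.columns))
```

Passing two names without the inner list, as in `toy["id", "boxes"]`, makes pandas look for a single column keyed by the tuple and raises a `KeyError`.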


In [None]:
print(train_image_level.describe())  # summary statistics; printed because a cell only displays its last expression
train_image_level.head(8)  # show the first 8 rows


In [None]:
train_image_level.loc[0, :]  # show the first row across all columns
# general form: train_image_level.loc[row, column]


[<font size="5">Plotting in Graphic Form </font>](#3)

In [None]:
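# Extract the pixel data of one training image.
# Note: this uses training_set, which is built in the merge cell further down,
# so run that cell before calling extract_image.

def extract_image(i):
    # studies are stored as train/<study>/<series>/<image>.dcm
    study_dir = os.path.join(path, 'train', training_set.loc[i, 'StudyInstanceUID'])
    series_dir = os.listdir(study_dir)[0]  # assumes the image is in the first series folder
    img_id = training_set.loc[i, 'id_y'].replace('_image', '.dcm')
    print(img_id)
    data_file = dicom.dcmread(os.path.join(study_dir, series_dir, img_id))
    img = data_file.pixel_array  # pixel data as a NumPy array
    return img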

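The path handling in `extract_image` can be tried without the real dataset. The sketch below rebuilds the `study/series/image` layout in a temporary directory and resolves a file path with `os.path.join`. The helper name `find_dicom_path` and all identifiers are hypothetical, and, like the function above, it simply takes the first series folder it finds.

```python
import os
import tempfile

def find_dicom_path(train_dir, study_uid, image_id):
    """Resolve train_dir/study/series/image.dcm using the first series folder."""
    study_dir = os.path.join(train_dir, study_uid)
    series = sorted(os.listdir(study_dir))[0]  # first series folder by name
    file_name = image_id.replace("_image", ".dcm")
    return os.path.join(study_dir, series, file_name)

# Build a throwaway directory tree with made-up identifiers.
with tempfile.TemporaryDirectory() as train_dir:
    os.makedirs(os.path.join(train_dir, "study1", "series1"))
    result = find_dicom_path(train_dir, "study1", "img1_image")
    relative = os.path.relpath(result, train_dir)

print(relative)
```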
In [None]:
fig, axes = plt.subplots(3, 4, figsize=(20, 20))  # grid of 3 rows by 4 columns; figure size (width, height) = (20, 20) inches
fig.subplots_adjust(hspace=0.1, wspace=0.1)  # hspace: vertical space between rows of plots; wspace: horizontal space between columns

# the comments above explain the plotting calls used in this problem
# note that the figures are still blank at this point


In [None]:
train_study_level = pd.read_csv(path + "train_study_level.csv")
train_study_level_key = train_study_level.id.str[:-6]  # strip the trailing "_study" to match StudyInstanceUID
training_set = pd.merge(left = train_study_level, right = train_image_level, how = 'right', left_on = train_study_level_key, right_on = 'StudyInstanceUID')
training_set = training_set.drop(['id_x'], axis = 1)  # drop returns a new DataFrame, so assign the result

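To see what this merge does, here is a small sketch on toy frames. It is a variant of the cell above, not the same call: the key is first stored as a `StudyInstanceUID` column on the study frame so that both sides can be joined with `on=`. All values are made up.

```python
import pandas as pd

# Toy stand-ins for the two CSV files; all values are made up.
study = pd.DataFrame({
    "id": ["s1_study", "s2_study"],
    "Typical Appearance": [1, 0],
})
image = pd.DataFrame({
    "id": ["i1_image", "i2_image", "i3_image"],
    "StudyInstanceUID": ["s2", "s1", "s1"],
})

# Strip the 6-character "_study" suffix to get a key matching StudyInstanceUID.
study = study.assign(StudyInstanceUID=study["id"].str[:-6])

# Right join: one output row per image. Both frames have an "id" column,
# so pandas renames them to id_x (study) and id_y (image).
merged = pd.merge(study, image, how="right", on="StudyInstanceUID")
merged = merged.drop(columns=["id_x"])  # drop returns a new frame

print(merged)
```

Each image row picks up the study-level label of its study, which is why `id_y` (the image ID) is the column used later to locate the DICOM file.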

In [None]:
training_set.head()

In [None]:
training_set.loc[0,'label'] # show the value in row 0 of the label column


In [None]:
# for a generic Axes object ax, the following functions are useful

#ax.set_xticklabels([]) # sets the x-axis tick labels; the empty list [] leaves them blank
#ax.set_yticklabels([]) # sets the y-axis tick labels; the empty list [] leaves them blank
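The grid and tick-label calls above can be combined into one loop over the axes. This is only a sketch: random arrays stand in for the DICOM pixel data, and the `Agg` backend line is there so the snippet also runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Random arrays stand in for DICOM pixel data in this sketch.
images = [rng.random((64, 64)) for _ in range(12)]

fig, axes = plt.subplots(3, 4, figsize=(20, 20))
fig.subplots_adjust(hspace=0.1, wspace=0.1)

# axes is a 3x4 array; .flat iterates over it row by row.
for ax, img in zip(axes.flat, images):
    ax.imshow(img, cmap="gray")
    ax.set_xticklabels([])  # blank tick labels on the x axis
    ax.set_yticklabels([])  # blank tick labels on the y axis

n_drawn = sum(len(ax.images) for ax in axes.flat)
print(n_drawn)
```

In the notebook, the random arrays would be replaced by the output of `extract_image(i)` for twelve row indices.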