## COVID-19 - STAGE 0 PREDIAGNOSIS TOOL

The COVID-19 X-ray image dataset we'll be using for this tutorial was curated by Dr. Joseph Cohen, a postdoctoral fellow at the University of Montreal.

The database contains chest X-ray and CT images of COVID-19 cases.
All images and data were collected by Dr. Cohen and are publicly available in this [GitHub](https://github.com/ieee8023/covid-chestxray-dataset) repo. Thank you for this huge contribution.

Inside the repo you'll find examples of COVID-19 cases, as well as MERS, SARS, and ARDS.

Dr. Cohen's paper is available at this [link](https://arxiv.org/pdf/2003.11597.pdf).

For healthy X-ray images, we'll use this [Kaggle](https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia) dataset, which contains images diagnosed with pneumonia as well as healthy ones.

---

## <span style='background :yellow'> <font color='black'> The project is under construction (WIP). </font> </span>

---

## LOAD DATA

---

In [1]:
# import the necessary packages
import pandas as pd

### COVID-19 Px

In [2]:
data = pd.read_csv('../covid-chestxray-dataset/metadata.csv')
data.tail(3)

| | patientid | offset | sex | age | finding | survival | intubated | intubation_present | went_icu | needed_supplemental_O2 | ... | date | location | folder | filename | doi | url | license | clinical_notes | other_notes | Unnamed: 27 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 351 | 196 | 4 | M | 73.0 | COVID-19 | | Y | Y | | | ... | | | images | extubation-4.jpg | | https://radiologyassistant.nl/chest/covid-19-c... | | Day 4: bilateral consolidations intubated. His... | | |
| 352 | 196 | 8 | M | 73.0 | COVID-19 | | Y | | | | ... | | | images | extubation-8.jpg | | https://radiologyassistant.nl/chest/covid-19-c... | | Day 8: bilateral consolidation. History: 73 ye... | | |
| 353 | 196 | 13 | M | 73.0 | COVID-19 | | | | | | ... | | | images | extubation-13.jpg | | https://radiologyassistant.nl/chest/covid-19-c... | | Day 13: extubation. History: 73 year old male ... | | |
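The `folder` and `filename` columns together point at each image file. The following is a minimal sketch of how full paths could be assembled, assuming the images sit inside the cloned repo under the same `../covid-chestxray-dataset` prefix used for `metadata.csv`. `REPO_DIR` and the toy frame are hypothetical names introduced for illustration only.

```python
from pathlib import Path

import pandas as pd

# Assumption: the repo is cloned next to the notebook, as in the read_csv call above
REPO_DIR = Path("../covid-chestxray-dataset")

# Toy frame reusing two folder/filename pairs shown in the output above
toy = pd.DataFrame({
    "folder": ["images", "images"],
    "filename": ["extubation-4.jpg", "extubation-8.jpg"],
})

# Join repo directory, folder, and filename row by row
toy["path"] = [
    REPO_DIR / folder / name
    for folder, name in zip(toy["folder"], toy["filename"])
]

# as_posix() gives a forward-slash string regardless of the operating system
paths = [p.as_posix() for p in toy["path"]]
print(paths)
```

Checking `Path.exists()` on each entry before loading images would catch metadata rows whose file is missing from the local clone.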


In [3]:
# Rows / columns
print('**********\n** Data **\n**********')
print('The dataset contains {0} observations (Px) and {1} attributes describing each of the observations'.format(data.shape[0], (data.shape[1])))

**********
** Data **
**********
The dataset contains 354 observations (Px) and 28 attributes describing each of the observations


In [4]:
# Features
print('*************************\n** Feature description **\n*************************')
data.columns

*************************
** Feature description **
*************************


Index(['patientid', 'offset', 'sex', 'age', 'finding', 'survival', 'intubated',
       'intubation_present', 'went_icu', 'needed_supplemental_O2', 'extubated',
       'temperature', 'pO2_saturation', 'leukocyte_count', 'neutrophil_count',
       'lymphocyte_count', 'view', 'modality', 'date', 'location', 'folder',
       'filename', 'doi', 'url', 'license', 'clinical_notes', 'other_notes',
       'Unnamed: 27'],
      dtype='object')

In [5]:
# Keep information regarding patients and their conditions
px_covid = data[['patientid','finding','survival','view','filename']]

# Keep only COVID-19 cases ('COVID-19' and 'COVID-19, ARDS'), ignoring MERS, SARS, and ARDS-only cases
px_covid = px_covid[px_covid['finding'].isin(['COVID-19', 'COVID-19, ARDS'])]

px_covid.survival.unique()

array(['Y', nan, 'N'], dtype=object)
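Filtering with `isin` keeps a row when its `finding` matches any of the listed labels, which is the same as OR-ing the equality tests. A common slip is to combine the tests with `&` instead, which matches nothing because a single cell cannot equal two labels at once. The sketch below illustrates this on a toy frame; its rows are made up for the example and are not taken from the real metadata.

```python
import pandas as pd

# Toy metadata (hypothetical rows, for illustration only)
toy = pd.DataFrame({
    "patientid": [1, 2, 3, 4],
    "finding": ["COVID-19", "SARS", "COVID-19, ARDS", "ARDS"],
})

wanted = ["COVID-19", "COVID-19, ARDS"]

# isin keeps rows whose finding matches ANY of the wanted labels
by_isin = toy[toy["finding"].isin(wanted)]

# Equivalent filter written with OR (|)
by_or = toy[(toy["finding"] == "COVID-19") | (toy["finding"] == "COVID-19, ARDS")]

# AND (&) requires one cell to equal both labels, so no row can match
by_and = toy[(toy["finding"] == "COVID-19") & (toy["finding"] == "COVID-19, ARDS")]

print(by_isin["patientid"].tolist())  # [1, 3]
print(by_or["patientid"].tolist())    # [1, 3]
print(len(by_and))                    # 0
```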

In [6]:
px_covid.tail(3)

| | patientid | finding | survival | view | filename |
|---|---|---|---|---|---|
| 351 | 196 | COVID-19 | | PA | extubation-4.jpg |
| 352 | 196 | COVID-19 | | PA | extubation-8.jpg |
| 353 | 196 | COVID-19 | | PA | extubation-13.jpg |


In [7]:
# Keep only the records with a posteroanterior (PA) view of the lungs
px_covid = px_covid[px_covid['view'] == 'PA']

# Rows / columns
print('**********\n** Data **\n**********')
print('Processed dataset contains {0} observations (Px) and {1} attributes describing each of the observations'.format(px_covid.shape[0], (px_covid.shape[1])))
px_covid.tail(5)

**********
** Data **
**********
Processed dataset contains 153 observations (Px) and 5 attributes describing each of the observations


| | patientid | finding | survival | view | filename |
|---|---|---|---|---|---|
| 349 | 195 | COVID-19 | N | PA | 7-fatal-covid19.jpg |
| 350 | 196 | COVID-19 | | PA | extubation-1.jpg |
| 351 | 196 | COVID-19 | | PA | extubation-4.jpg |
| 352 | 196 | COVID-19 | | PA | extubation-8.jpg |
| 353 | 196 | COVID-19 | | PA | extubation-13.jpg |
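Each row describes one image, so the same patient can appear several times (patient 196 has four PA images in the tail above). The row count is therefore an image count, not a patient count. As a rough sketch, distinct patients can be counted with `nunique`. The toy frame below reuses only the five rows shown in the tail.

```python
import pandas as pd

# Toy frame built from the five tail rows shown above
toy = pd.DataFrame({
    "patientid": [195, 196, 196, 196, 196],
    "filename": [
        "7-fatal-covid19.jpg",
        "extubation-1.jpg",
        "extubation-4.jpg",
        "extubation-8.jpg",
        "extubation-13.jpg",
    ],
})

n_images = len(toy)                        # one row per image
n_patients = toy["patientid"].nunique()    # distinct patients
images_per_patient = toy.groupby("patientid").size()

print(n_images, n_patients)  # 5 2
print(images_per_patient)
```

This matters later if the data is split into training and test sets, since images from one patient could otherwise land on both sides of the split.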


In [8]:
px_covid.survival.unique()

array(['Y', nan, 'N'], dtype=object)
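`unique()` shows that `survival` takes three states (`Y`, `N`, and missing) but not how many rows fall into each. A minimal sketch of counting them, including the missing entries, follows. The toy values are hypothetical and do not reflect the real distribution.

```python
import numpy as np
import pandas as pd

# Toy survival column (hypothetical values) with the same three states
toy = pd.DataFrame({"survival": ["Y", np.nan, "N", np.nan, "Y", np.nan]})

# dropna=False keeps missing entries as their own bucket instead of dropping them
counts = toy["survival"].value_counts(dropna=False)
print(counts)

# Number of rows with an unknown outcome
n_missing = int(toy["survival"].isna().sum())
print(n_missing)  # 3
```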

### Healthy Px

The [NIH Chest X-ray dataset](https://cloud.google.com/healthcare/docs/resources/public-datasets/nih-chest) can be downloaded [here](https://nihcc.app.box.com/v/ChestXray-NIHCC).


In [9]:
data_normal = pd.read_csv('../covid-chestxray-dataset/Normal_Data_Entry_2017_v2020.csv')
data_normal.tail(3)

| | Image Index | Finding Labels | Follow-up # | Patient ID | Patient Age | Patient Gender | View Position | OriginalImage[Width | Height] | OriginalImagePixelSpacing[x | y] |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 112117 | 00030803_000.png | No Finding | 0 | 30803 | 42 | F | PA | 2048 | 2500 | 0.168 | 0.168 |
| 112118 | 00030804_000.png | No Finding | 0 | 30804 | 29 | F | PA | 2048 | 2500 | 0.168 | 0.168 |
| 112119 | 00030805_000.png | No Finding | 0 | 30805 | 26 | M | PA | 2048 | 2500 | 0.171 | 0.171 |


In [10]:
print('**********\n** Data **\n**********')
print('Original dataset contains {0} **observations** (Px) and {1} attributes describing each of the observations'.format(data_normal.shape[0], (data_normal.shape[1])))
data_normal = data_normal[data_normal['Finding Labels'].isin(['No Finding'])] 
print('No Covid-19 dataset contains {0} **observations** (Px) and {1} attributes describing each of the observations'.format(data_normal.shape[0], (data_normal.shape[1])))

# Keep only the records with a posteroanterior (PA) view of the lungs
data_normal = data_normal[data_normal['View Position'] == 'PA']
print('Processed dataset contains **{0}** observations (Px) and {1} attributes describing each of the observations'.format(data_normal.shape[0], (data_normal.shape[1])))

**********
** Data **
**********
Original dataset contains 112120 **observations** (Px) and 11 attributes describing each of the observations
No Covid-19 dataset contains 60361 **observations** (Px) and 11 attributes describing each of the observations
Processed dataset contains **39302** observations (Px) and 11 attributes describing each of the observations
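Both datasets go through the same two steps (keep certain finding labels, then keep the PA view), only with different column names. One possible way to express that once is a small helper. `keep_labels_and_view` is a hypothetical name introduced here, not part of the notebook, and the toy rows are made up for illustration.

```python
import pandas as pd


def keep_labels_and_view(df, label_col, labels, view_col, view="PA"):
    """Return rows whose label is in `labels` and whose view equals `view`."""
    mask = df[label_col].isin(labels) & (df[view_col] == view)
    return df[mask]


# Toy frame (hypothetical rows) using the NIH column names shown above
toy_nih = pd.DataFrame({
    "Image Index": ["a.png", "b.png", "c.png", "d.png"],
    "Finding Labels": ["No Finding", "Infiltration", "No Finding", "No Finding"],
    "View Position": ["PA", "PA", "AP", "PA"],
})

healthy_pa = keep_labels_and_view(
    toy_nih, "Finding Labels", ["No Finding"], "View Position"
)
print(healthy_pa["Image Index"].tolist())  # ['a.png', 'd.png']
```

The same call with `label_col="finding"`, `labels=["COVID-19", "COVID-19, ARDS"]`, and `view_col="view"` would cover the COVID-19 metadata.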


.

.

.


WIP