# eICU Collaborative Research Database

# Workshop 1: Exploring the `patient` table

Before starting this workshop, copy the eICU demo database file (`eicu_demo.sqlite3`) to the `data` directory.

Documentation on the eICU Collaborative Research Database can be found at: http://eicu-crd.mit.edu/. The `patient` table contains patient demographics and admission and discharge details for hospital and ICU stays. For more detail on the `patient` table, see: http://eicu-crd.mit.edu/eicutables/patient/

In [None]:
# Import libraries
import pandas as pd
import matplotlib.pyplot as plt
import sqlite3
import os

In [None]:
# Plot settings
%matplotlib inline
plt.style.use('ggplot')
fontsize = 20 # size for x and y ticks
plt.rcParams['legend.fontsize'] = fontsize
plt.rcParams.update({'font.size': fontsize})

In [None]:
# Connect to the database
fn = os.path.join('data','eicu_demo.sqlite3')
con = sqlite3.connect(fn)
cur = con.cursor()

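`sqlite3.connect` creates a new, empty database when the file does not exist, so a missing or misplaced demo file only shows up later as a "no such table" error. The sketch below guards against that. The helper name `connect_existing` is hypothetical and not part of the workshop code.

```python
import os
import sqlite3


def connect_existing(path):
    """Open an SQLite database only if the file already exists (hypothetical helper)."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Database file not found: {path}")
    return sqlite3.connect(path)


# Demonstration with a path that is assumed not to exist
try:
    connect_existing(os.path.join('data', 'no_such_file.sqlite3'))
    missing_detected = False
except FileNotFoundError:
    missing_detected = True

print(missing_detected)
```

In the workshop itself, the same check could be applied to `fn` before calling `sqlite3.connect(fn)`.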
## 1. Display list of tables

In [None]:
query = \
"""
SELECT type, name
FROM sqlite_master 
WHERE type='table'
ORDER BY name;
"""

list_of_tables = pd.read_sql_query(query,con)

In [None]:
list_of_tables

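Beyond listing tables, SQLite can also report the columns of a single table through `PRAGMA table_info`. The sketch below runs against a small in-memory database with made-up tables (`con_demo` and its two-column `patient` table are illustrative only, not the real eICU schema), so it works without the demo file.

```python
import sqlite3
import pandas as pd

# In-memory database with illustrative tables (not the real eICU schema)
con_demo = sqlite3.connect(':memory:')
con_demo.execute("CREATE TABLE patient (patientunitstayid INTEGER, age TEXT)")
con_demo.execute("CREATE TABLE lab (labid INTEGER)")

# List the tables, as in the cell above
tables = pd.read_sql_query(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name;",
    con_demo)
print(tables)

# List the columns and declared types of one table
cols = pd.read_sql_query("PRAGMA table_info(patient);", con_demo)
print(cols[['name', 'type']])

con_demo.close()
```

Running the same `PRAGMA table_info(patient);` query against `con` should list the columns of the real `patient` table.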
## 2. Reviewing the patient table

In [None]:
# query to load data from the patient table
query = \
"""
SELECT *
FROM patient
"""

print(query)

In [None]:
# run the query and assign the output to a variable
patient_tab = pd.read_sql_query(query,con)

In [None]:
# display the first few rows of the dataframe
patient_tab.head()

In [None]:
# list all of the columns in the table
patient_tab.columns

### Questions

- What does `patientunitstayid` represent? (hint: see http://eicu-crd.mit.edu/eicutables/patient/)
- What does `patienthealthsystemstayid` represent?
- What does `uniquepid` represent?

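One way to explore these questions empirically is to count how many distinct values each identifier takes. The sketch below uses a hypothetical toy table (the values are invented, not real eICU data) in which one patient has two hospital stays and three unit stays.

```python
import pandas as pd

# Hypothetical toy rows, invented for illustration (not real eICU data)
toy = pd.DataFrame({
    'uniquepid': ['001-1', '001-1', '001-1', '002-7'],
    'patienthealthsystemstayid': [10, 10, 11, 20],
    'patientunitstayid': [100, 101, 102, 200],
})

id_cols = ['uniquepid', 'patienthealthsystemstayid', 'patientunitstayid']

# Number of distinct values per identifier column
n_unique = toy[id_cols].nunique()
print(n_unique)

# Number of distinct unit stays recorded for each patient
stays_per_patient = toy.groupby('uniquepid')['patientunitstayid'].nunique()
print(stays_per_patient)
```

Applying `patient_tab[id_cols].nunique()` to the real table shows how the three identifiers nest in the demo data.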
In [None]:
# select a limited number of columns to view
columns = ['patientunitstayid','gender','age','unitdischargestatus']
patient_tab[columns].head()

In [None]:
# what are the unique values for age?
age_col = 'age'
patient_tab[age_col].sort_values().unique()

### Questions

- Try plotting a histogram of ages using the commands in the cell below. Why does the plot fail?

```python
# try plotting a histogram of ages
figsize = (18,8)
patient_tab[age_col].plot(kind='hist',
                          figsize=figsize, 
                          fontsize=fontsize,
                          bins=15)
```

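A quick way to diagnose this kind of failure is to check whether the column is numeric and, if not, which entries cannot be converted. The sketch below uses a hypothetical toy series. The `'> 89'` and empty-string entries are assumptions about what a text-coded age column might contain.

```python
import pandas as pd

# Hypothetical toy ages stored as text (assumed values, not real eICU data)
toy_age = pd.Series(['45', '72', '> 89', '', '60'], name='age')

# A text column is not numeric, so a histogram cannot be drawn from it
is_numeric = pd.api.types.is_numeric_dtype(toy_age)
print(is_numeric)

# Convert where possible, then list the original entries that failed to parse
as_num = pd.to_numeric(toy_age, errors='coerce')
non_numeric = toy_age[as_num.isna()]
print(non_numeric.value_counts())
```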
In [None]:
# create a column containing numerical ages
# errors='coerce' sets any value that cannot be parsed to NaN
agenum_col = 'age_num'
patient_tab[agenum_col] = pd.to_numeric(patient_tab[age_col], errors='coerce')
patient_tab[agenum_col].sort_values().unique()

In [None]:
# plot a histogram of the numerical ages
figsize = (18,8)
patient_tab[agenum_col].plot(kind='hist',
                             figsize=figsize, 
                             fontsize=fontsize,
                             bins=15)

### Questions

- Use the `mean()` method to find the mean age (hint: `patient_tab[agenum_col].mean()`). What is the mean? Why might we expect this to be lower than the true mean?
- Use the `describe()` method to explore the `admissionweight` of patients. What issue do you see? What are some methods that you could use to deal with this issue?

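To see why coercion can pull the mean down, the sketch below compares two treatments of a top-coded age category on hypothetical toy data. It assumes the oldest patients are recorded as the string `'> 89'`, and the substitute value of 90 is an arbitrary lower bound chosen for illustration, not a recommended correction.

```python
import pandas as pd

# Hypothetical toy ages (assumed values, not real eICU data)
toy_age = pd.Series(['45', '72', '> 89', '60', '> 89'])

# Option 1: coerce to numeric; the '> 89' entries become NaN and are
# ignored by mean(), so the oldest patients drop out of the average
mean_coerced = pd.to_numeric(toy_age, errors='coerce').mean()

# Option 2: replace '> 89' with an assumed lower bound of 90 before converting
mean_floor = pd.to_numeric(toy_age.replace('> 89', '90'), errors='coerce').mean()

print(mean_coerced, mean_floor)
```

Even the second estimate is still a lower bound, because the true ages in the top-coded group are at least 90.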
In [None]:
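# set threshold at the 99th percentile (0.99 quantile)
adweight_col = 'admissionweight'
quant = patient_tab[adweight_col].quantile(0.99)
# blank out only the outlying weights, leaving the other columns intact
patient_tab.loc[patient_tab[adweight_col] > quant, adweight_col] = None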

In [None]:
# describe the admission weights
patient_tab[adweight_col].describe()

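Setting extreme values to missing is only one way to handle outliers. As a rough comparison, the sketch below applies two options to a hypothetical toy table: masking the outlying weights as NaN, and clipping them to the threshold. The data are invented, and the 0.75 quantile is used only because the toy table is tiny.

```python
import pandas as pd

# Hypothetical toy data with one implausible weight (not real eICU data)
toy = pd.DataFrame({
    'patientunitstayid': [1, 2, 3, 4, 5],
    'admissionweight': [60.0, 75.0, 82.0, 90.0, 900.0],
})

# Toy threshold; the workshop cell uses the 0.99 quantile instead
threshold = toy['admissionweight'].quantile(0.75)

# Option 1: mask outlying weights as missing, touching only the weight column
masked = toy.copy()
masked.loc[masked['admissionweight'] > threshold, 'admissionweight'] = float('nan')

# Option 2: clip outlying weights down to the threshold
clipped = toy.copy()
clipped['admissionweight'] = clipped['admissionweight'].clip(upper=threshold)

print(masked)
print(clipped)
```

Masking drops the suspect measurement from later statistics, whereas clipping keeps the row in the calculation at a capped value. Which is preferable depends on whether the extreme values are believed to be data-entry errors.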
### Questions

- What is the average change in weight between `admissionweight` and `dischargeweight`?
- Plot the distribution of the weight change.

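When computing the average change, note that a difference is only defined for rows where both weights are present. The sketch below illustrates this on hypothetical toy data, using admission weight minus discharge weight so that a positive value means weight lost during the stay.

```python
import pandas as pd

# Hypothetical toy weights in kg, with missing entries (not real eICU data)
toy = pd.DataFrame({
    'admissionweight': [80.0, 70.0, None, 95.0],
    'dischargeweight': [78.0, 71.5, 60.0, None],
})

# Admission minus discharge; rows missing either weight yield NaN
toy['weight_change'] = toy['admissionweight'] - toy['dischargeweight']

# mean() skips NaN, so only complete pairs contribute
mean_change = toy['weight_change'].mean()
n_used = toy['weight_change'].notna().sum()

print(mean_change, n_used)
```

Reporting the number of complete pairs alongside the mean makes it clear how much of the table the average is based on.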
In [None]:
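# set threshold at the 99th percentile (0.99 quantile)
disweight_col = 'dischargeweight'
quant = patient_tab[disweight_col].quantile(0.99)
# blank out only the outlying weights, leaving the other columns intact
patient_tab.loc[patient_tab[disweight_col] > quant, disweight_col] = None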

In [None]:
# describe the discharge weights
patient_tab[disweight_col].describe()

In [None]:
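# weight change, computed as admission weight minus discharge weight
# (a positive value means weight was lost during the stay)
patient_tab['weight_change'] = patient_tab[adweight_col] - patient_tab[disweight_col]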

In [None]:
# plot the weight changes
figsize = (18,8)
patient_tab['weight_change'].plot(kind='hist',
                             figsize=figsize, 
                             fontsize=fontsize,
                             bins=50)