# eICU Collaborative Research Database

Before starting this workshop, you will need to copy the eICU demo database file (`eicu_demo.sqlite3`) to the `data` directory.

Documentation on the eICU Collaborative Research Database can be found at: http://eicu-crd.mit.edu/.

In [None]:
# Import libraries
import pandas as pd
import matplotlib.pyplot as plt
import sqlite3
import os

In [None]:
# Plot settings
%matplotlib inline
plt.style.use('ggplot')
fontsize = 20  # base font size for the legend and all other plot text
plt.rcParams['legend.fontsize'] = fontsize
plt.rcParams.update({'font.size': fontsize})

In [None]:
# Connect to the database
fn = os.path.join('data','eicu_demo.sqlite3')
con = sqlite3.connect(fn)
cur = con.cursor()
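
Note that `sqlite3.connect` silently creates a new, empty database if the file does not exist, so a misplaced file only shows up later as "no such table" errors. A minimal sketch for checking the connection is shown below; the helper name `list_tables` is hypothetical and not part of the workshop code.

```python
import os
import sqlite3

def list_tables(con):
    """Return the names of all tables in a SQLite database, sorted alphabetically."""
    rows = con.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]

# Path taken from the connection cell above
fn = os.path.join('data', 'eicu_demo.sqlite3')
if os.path.exists(fn):
    con = sqlite3.connect(fn)
    print(list_tables(con))
else:
    print('Database file not found at', fn)
```

If the printed list is empty or the file is reported as missing, check that the demo file was copied into the `data` directory.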

## Choose a project, or try your own!

If you're interested, you could try one of the following projects:

* Congestive heart failure is a common illness among ICU patients, but the severity of the illness can vary substantially. Are there distinct subgroups among patients admitted with congestive heart failure? For example, are patients with preserved ejection fraction substantially different from those without?
* Sepsis is a life-threatening condition usually associated with infection, but little research investigates how septic patients vary based on the source of the infection. As APACHE diagnoses are organ specific (e.g. `SEPSISPULM`, `SEPSISGI`, `SEPSISUTI`), can we find any substantial differences among septic patients based upon the initial location of the infection?
* Lab results can take up to 6 hours to come back and can be costly. Can we predict a future lab measurement based upon previous measurements and simultaneous non-invasive measurements?
* Dialysis is a major intervention provided in the ICU, and it serves to replace the function of the kidneys (cleaning the blood). Can we predict if a patient will receive dialysis from physiology?
* Mechanical ventilation is a similar treatment used to replace the function of the lungs. This intervention eases the work of breathing, allowing the patient's own lungs to heal. However, knowing when to cease mechanical ventilation (extubate) is an open problem, and clinicians largely utilize a "try and see" approach. Can we predict future extubation from the patient's current data?
* Missing data is extremely common in clinical data. Measurements are sparse, noisy, irregularly sampled, and not missing at random. Gaussian Processes are an elegant technique for handling difficult data scenarios such as this. As many algorithms require complete data, techniques to impute realistic values for missing data are increasingly relevant. Can we use a Gaussian Process to estimate the trajectory of physiologic data and as a generative model for imputing missing data?
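
As a possible starting point for the sepsis project, the sketch below counts stays per infection source from organ-specific diagnosis codes. It runs on a toy DataFrame. The column name `admitdiagnosis` and the helper `sepsis_source_counts` are assumptions for illustration, so check the eICU documentation for where these codes are actually stored before applying it to real tables.

```python
import pandas as pd

def sepsis_source_counts(df, dx_col='admitdiagnosis'):
    """Count rows per diagnosis code, keeping only codes that start with 'SEPSIS'."""
    codes = df[dx_col].dropna().astype(str)
    sepsis = codes[codes.str.startswith('SEPSIS')]
    return sepsis.value_counts()

# Toy data only; the column names are assumed, not read from the database
toy = pd.DataFrame({
    'patientunitstayid': [1, 2, 3, 4, 5],
    'admitdiagnosis': ['SEPSISPULM', 'SEPSISGI', 'SEPSISPULM', 'CHF', None],
})
counts = sepsis_source_counts(toy)
print(counts)
```

The same function could be applied to a real table once the relevant diagnosis column has been identified, for example by passing a different `dx_col`.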

## Load data from all tables

If you prefer a purely Python approach, you can load all the tables using the code below. It should take only 1-2 minutes. To speed it up further, comment out the tables you don't need.

In [None]:
admissiondx = pd.read_sql_query("select * from admissiondx", con)
apacheapsvar = pd.read_sql_query("select * from apacheapsvar", con)
apachepatientresult = pd.read_sql_query("select * from apachepatientresult", con)
apachepredvar = pd.read_sql_query("select * from apachepredvar", con)
careplancareprovider = pd.read_sql_query("select * from careplancareprovider", con)
careplaneol = pd.read_sql_query("select * from careplaneol", con)
careplangeneral = pd.read_sql_query("select * from careplangeneral", con)
careplangoal = pd.read_sql_query("select * from careplangoal", con)
careplaninfectiousdisease = pd.read_sql_query("select * from careplaninfectiousdisease", con)
diagnosis = pd.read_sql_query("select * from diagnosis", con)
lab = pd.read_sql_query("select * from lab", con)
pasthistory = pd.read_sql_query("select * from pasthistory", con)
patient = pd.read_sql_query("select * from patient", con)
treatment = pd.read_sql_query("select * from treatment", con)
vitalaperiodic = pd.read_sql_query("select * from vitalaperiodic", con)
vitalperiodic = pd.read_sql_query("select * from vitalperiodic", con)
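
If you would rather choose tables by name than comment lines out, one alternative is to loop over a list of names and collect the DataFrames in a dictionary. This is a minimal sketch, not the workshop's method: the helper `load_tables` is hypothetical, and the in-memory database with made-up rows stands in for the demo file so the example runs on its own.

```python
import sqlite3
import pandas as pd

def load_tables(con, table_names):
    """Read each named table into a DataFrame and return a dict keyed by table name."""
    tables = {}
    for name in table_names:
        # Table names cannot be bound as SQL parameters, so the name is quoted
        # as an identifier; only pass names you trust.
        tables[name] = pd.read_sql_query(f'SELECT * FROM "{name}"', con)
    return tables

# Small in-memory stand-in so the sketch runs without the demo file
demo_con = sqlite3.connect(':memory:')
demo_con.execute('CREATE TABLE patient (patientunitstayid INTEGER, age TEXT)')
demo_con.execute("INSERT INTO patient VALUES (1, '65'), (2, '> 89')")
demo_con.execute('CREATE TABLE lab (patientunitstayid INTEGER, labname TEXT)')
demo_con.execute("INSERT INTO lab VALUES (1, 'glucose')")
demo_con.commit()

tables = load_tables(demo_con, ['patient', 'lab'])
print({name: df.shape for name, df in tables.items()})
```

With the workshop connection, the equivalent call would be `tables = load_tables(con, ['patient', 'lab'])`, after which `tables['patient']` plays the role of the `patient` variable above.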