Wei-Hung Weng (ckbjimmy [at] mit [dot] edu)
12 October 2018
- Know where to find the existing codes for similar problems
- Understand basic SQL syntax
- Extract fluid/medication information from both MIMIC/eICU databases
-
Code repository MIT-LCP/mimic-code
SELECT [columns]
FROM [table]
WHERE [condition1 AND (condition2 OR condition3)]
GROUP BY [columns]
ORDER BY [columns] [ASC/DESC]
LEFT JOIN
INNER JOIN
RIGHT JOIN
SELECT [columns]
FROM [table1] t1
LEFT JOIN [table2] t2
ON t1.[column] = t2.[column]
WHERE t1.[condition1]
The syntax WITH
can create a temporary table to break a large SQL query to smaller pieces. Here is a simple case to demonstrate how to use it. You can also create materialized views in your database, which you can literally use those views as if they are real tables.
WITH dates AS (
SELECT subject_id, admittime, deathtime, deathtime-admittime AS los
FROM admissions
WHERE deathtime IS NOT NULL
)
SELECT subject_id, los
FROM dates
In the workshop, we use the MIMIC and eICU databases located on Google Cloud Platform, and connect to them through the R package bigrquery
. For demonstration, we just extract the data from demo dataset (mimiciii_demo
and eicu_crd_demo
). Please modify the path of dataset in R and SQL scripts into the full dataset (mimiciii_clinical
and eicu_crd
) for your projects.
# Establishing GCP connection
library(bigrquery)
project_id <- "hst-953-2018"
options(httr_oauth_cache=FALSE)
run_query <- function(query){
data <- query_exec(query, project=project_id, use_legacy_sql=FALSE)
return(data)
}
# test and get auth code
run_query(paste0("SELECT count(1) FROM `physionet-data.mimiciii_demo`.`patients`"))
To make our files neat and organized, we put all SQL scripts under the folder ./sql/
. We will use the following get_sql
function to read the SQL code in the file, and pass it to the execution function run_query
later.
get_sql <- function(f){
con = file(f, "r")
sql.string <- ""
while (TRUE) {
line <- readLines(con, n=1)
if (length(line) == 0) {
break
}
line <- gsub("\\t", " ", line)
if (grepl("--", line) == TRUE) {
line <- paste(sub("--", "/*", line), "*/")
}
sql.string <- paste(sql.string, line)
}
close(con)
return(sql.string)
}
If you plan to use Python instead of R, the following chunk can help you do the same task as the above R code. You can also try colab version.
import os
import pandas as pd
from google.colab import auth
from google.cloud import bigquery
auth.authenticate_user()
project_id = 'hst-953-2018'
os.environ["GOOGLE_CLOUD_PROJECT"] = project_id
def run_query(query):
return pd.io.gbq.read_gbq(query,
project_id=project_id,
verbose=False,
configuration={'query':{'useLegacySql': False}})
def get_sql(f):
file = open(f, 'r')
s = " ".join(file.readlines())
return s
df = run_query("SELECT * FROM `physionet-data.mimiciii_demo`.`patients` limit 10")
df
q = get_sql('./sql/bq-eicu-med.sql')
df = run_query(q)
df
In the following sections, we provide some MIMIC and eICU SQL codes related to our topics, intake/output and medication, which are adopted from MIT-LCP repository but modified for the databases on Google Cloud. For people who want to run the codes on local postgreSQL server, please go back to the GitHub repo and use the original SQL. (This is because that the Google BigQuery and PostgreSQL still have some differences in syntax.)
We thank Dr. Alistair Johnson, Dr. Tom Pollar, and other MIT LCP members for their effort constructing all these code repositories and sharing their queries to us.
- Using
outputevents
table. The first example is to sum up the urine output using theoutputevents
table. There are two scripts doing the same task with slightly different code writing styles.
- concepts/fluid-balance/urine-output.sql
- concepts/pivot/pivoted-uo.sql
- You can use the
itemid
in the SQL codes for identifying urine output items
q <- get_sql("sql/bq-mimic-uo.sql")
res <- run_query(q)
head(res)
q <- get_sql("sql/bq-mimic-pivoted-uo.sql")
res2 <- run_query(q)
head(res2)
- Using
chartevents
table. The second example for MIMIC-III is to identify the duration of continuous renal replacement therapy (CRRT).
- concepts/durations/crrt-durations.sql
- You can use the
itemid
here to identify the items for hemodialysis. It summarizes all the item ids inchartevents
, CareVue and MetaVision versions of coding.
q <- get_sql("sql/bq-mimic-crrt-durations.sql")
res <- run_query(q)
head(res)
- Using
inputevents_mv
andinputevents_cv
tables. This example is to identify how many crystalloid bolus was given for each patient.
- You can use the
itemid
here to identify the items for crystalloid fluid. Crystalloid fluid is the main component of measuring fluid input.
q <- get_sql("sql/bq-mimic-crystalloid-bolus.sql")
res <- run_query(q)
head(res)
- Using
intakeoutput
table. The goal of this query is to identify the daily fluid balance (output - input) for each ICU patient.
- In
intakeoutput
table, we can simply identify intake/output items using string matching incellpath
column. - We thank Dr. Matthieu Komorowski providing the script
q <- get_sql("sql/bq-eicu-io.sql")
res <- run_query(q)
head(res)
- Using
prescriptions
table. In this example, we try to identify whether the antibiotics is used on each patient during their ICU stay (binary). We need to use string matching to identify if your target medications show up in thedrug_name_generic
column ofprescriptions
table.
- You can use the medication list here for antibiotics use
- You need to have clinicians (physicians or pharmacists) to help you ensure what medications are needed for your study. e.g.
%lol%
for beta-blockers
q <- get_sql("sql/bq-mimic-med.sql")
res <- run_query(q)
head(res)
- Using
inputevents_mv
andinputevents_cv
tables. Please check the concepts/durations/ sql codes for computing dose and duration of infusion drugs. The way to modify the code is similar to the CRRT example above.
- Using
medication
table. This is almost the same as the query for MIMICprescriptions
table, since they both need to do string matching for identifying medications.
q <- get_sql("sql/bq-eicu-med.sql")
res <- run_query(q)
head(res)
- Using
medication
table but also using HICL codes.drughiclseqno
column is considered in this case. In this example, we want to see whether the patient has received some specific infusion medications. It is still similar to the first case, but in slightly different coding style.
q <- get_sql("sql/bq-eicu-pivoted-med.sql")
res <- run_query(q)
head(res)
- Using
infusiondrug
table. This example is to identify the usage of infusion drugs (dopamine, dobutamine, norepinephrine, phenylephrine, vasopressin, milrinone, heparin) for each patient.
- concepts/pivoted/pivoted-infusion.sql
- You can use the identified strings of seven infusion drugs in the script as your infusion drug list.
- You may compare the result of this table and
medication
table. For the formal analysis, you need to consider the union of the results from two tables.
q <- get_sql("sql/bq-eicu-pivoted-infusion.sql")
res <- run_query(q)
head(res)