Skip to content

HST.953 Workshop: Using MIMIC/eICU I/O and Medication Data

License

Notifications You must be signed in to change notification settings

ckbjimmy/hst953_iomed

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Workshop - I/O and Medication

Wei-Hung Weng (ckbjimmy [at] mit [dot] edu)
12 October 2018

Objective

  • Know where to find the existing codes for similar problems
  • Understand basic SQL syntax
  • Extract fluid/medication information from both MIMIC/eICU databases

Resource

MIMIC-III

eICU

Basic SQL Syntax

SELECT [columns]
FROM   [table]
WHERE  [condition1 AND (condition2 OR condition3)]
GROUP BY [columns]
ORDER BY [columns] [ASC/DESC]

JOIN

  • LEFT JOIN
  • INNER JOIN
  • RIGHT JOIN
SELECT [columns]
FROM   [table1] t1
LEFT JOIN [table2] t2
ON t1.[column] = t2.[column]
WHERE t1.[condition1]

WITH

The syntax WITH can create a temporary table to break a large SQL query to smaller pieces. Here is a simple case to demonstrate how to use it. You can also create materialized views in your database, which you can literally use those views as if they are real tables.

WITH dates AS (
  SELECT subject_id, admittime, deathtime, deathtime-admittime AS los
  FROM admissions
  WHERE deathtime IS NOT NULL
)
SELECT subject_id, los
FROM dates

Exercise

Preparation

In the workshop, we use the MIMIC and eICU databases located on Google Cloud Platform, and connect to them through the R package bigrquery. For demonstration, we just extract the data from demo dataset (mimiciii_demo and eicu_crd_demo). Please modify the path of dataset in R and SQL scripts into the full dataset (mimiciii_clinical and eicu_crd) for your projects.

# Establishing GCP connection
library(bigrquery)

project_id <- "hst-953-2018"
options(httr_oauth_cache=FALSE)

run_query <- function(query){
  data <- query_exec(query, project=project_id, use_legacy_sql=FALSE)
  return(data)
}

# test and get auth code
run_query(paste0("SELECT count(1) FROM `physionet-data.mimiciii_demo`.`patients`"))

To make our files neat and organized, we put all SQL scripts under the folder ./sql/. We will use the following get_sql function to read the SQL code in the file, and pass it to the execution function run_query later.

get_sql <- function(f){
  con = file(f, "r")
  sql.string <- ""
  while (TRUE) {
    line <- readLines(con, n=1)
    if (length(line) == 0) {
      break
    }
    line <- gsub("\\t", " ", line)
    if (grepl("--", line) == TRUE) {
      line <- paste(sub("--", "/*", line), "*/")
    }
    sql.string <- paste(sql.string, line)
  }
  close(con)
  return(sql.string)
}

If you plan to use Python instead of R, the following chunk can help you do the same task as the above R code. You can also try colab version.

import os
import pandas as pd

from google.colab import auth
from google.cloud import bigquery

auth.authenticate_user()

project_id = 'hst-953-2018'
os.environ["GOOGLE_CLOUD_PROJECT"] = project_id

def run_query(query):
  return pd.io.gbq.read_gbq(query, 
                            project_id=project_id, 
                            verbose=False, 
                            configuration={'query':{'useLegacySql': False}})

def get_sql(f):
  file = open(f, 'r')
  s = " ".join(file.readlines())
  return s

df = run_query("SELECT * FROM `physionet-data.mimiciii_demo`.`patients` limit 10")
df

q = get_sql('./sql/bq-eicu-med.sql')
df = run_query(q)
df

In the following sections, we provide some MIMIC and eICU SQL codes related to our topics, intake/output and medication, which are adopted from MIT-LCP repository but modified for the databases on Google Cloud. For people who want to run the codes on local postgreSQL server, please go back to the GitHub repo and use the original SQL. (This is because that the Google BigQuery and PostgreSQL still have some differences in syntax.)

We thank Dr. Alistair Johnson, Dr. Tom Pollar, and other MIT LCP members for their effort constructing all these code repositories and sharing their queries to us.

Intake/Output

MIMIC-III

  1. Using outputevents table. The first example is to sum up the urine output using the outputevents table. There are two scripts doing the same task with slightly different code writing styles.
q <- get_sql("sql/bq-mimic-uo.sql")
res <- run_query(q)
head(res)

q <- get_sql("sql/bq-mimic-pivoted-uo.sql")
res2 <- run_query(q)
head(res2)
  1. Using chartevents table. The second example for MIMIC-III is to identify the duration of continuous renal replacement therapy (CRRT).
  • concepts/durations/crrt-durations.sql
  • You can use the itemid here to identify the items for hemodialysis. It summarizes all the item ids in chartevents, CareVue and MetaVision versions of coding.
q <- get_sql("sql/bq-mimic-crrt-durations.sql")
res <- run_query(q)
head(res)
  1. Using inputevents_mv and inputevents_cv tables. This example is to identify how many crystalloid bolus was given for each patient.
  • You can use the itemid here to identify the items for crystalloid fluid. Crystalloid fluid is the main component of measuring fluid input.
q <- get_sql("sql/bq-mimic-crystalloid-bolus.sql")
res <- run_query(q)
head(res)

eICU

  1. Using intakeoutput table. The goal of this query is to identify the daily fluid balance (output - input) for each ICU patient.
  • In intakeoutput table, we can simply identify intake/output items using string matching in cellpath column.
  • We thank Dr. Matthieu Komorowski providing the script
q <- get_sql("sql/bq-eicu-io.sql")
res <- run_query(q)
head(res)

Medication

MIMIC-III

  1. Using prescriptions table. In this example, we try to identify whether the antibiotics is used on each patient during their ICU stay (binary). We need to use string matching to identify if your target medications show up in the drug_name_generic column of prescriptions table.
  • You can use the medication list here for antibiotics use
  • You need to have clinicians (physicians or pharmacists) to help you ensure what medications are needed for your study. e.g. %lol% for beta-blockers
q <- get_sql("sql/bq-mimic-med.sql")
res <- run_query(q)
head(res)
  1. Using inputevents_mv and inputevents_cv tables. Please check the concepts/durations/ sql codes for computing dose and duration of infusion drugs. The way to modify the code is similar to the CRRT example above.

eICU

  1. Using medication table. This is almost the same as the query for MIMIC prescriptions table, since they both need to do string matching for identifying medications.
q <- get_sql("sql/bq-eicu-med.sql")
res <- run_query(q)
head(res)
  1. Using medication table but also using HICL codes. drughiclseqno column is considered in this case. In this example, we want to see whether the patient has received some specific infusion medications. It is still similar to the first case, but in slightly different coding style.
q <- get_sql("sql/bq-eicu-pivoted-med.sql")
res <- run_query(q)
head(res)
  1. Using infusiondrug table. This example is to identify the usage of infusion drugs (dopamine, dobutamine, norepinephrine, phenylephrine, vasopressin, milrinone, heparin) for each patient.
  • concepts/pivoted/pivoted-infusion.sql
  • You can use the identified strings of seven infusion drugs in the script as your infusion drug list.
  • You may compare the result of this table and medication table. For the formal analysis, you need to consider the union of the results from two tables.
q <- get_sql("sql/bq-eicu-pivoted-infusion.sql")
res <- run_query(q)
head(res)

About

HST.953 Workshop: Using MIMIC/eICU I/O and Medication Data

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Languages