From 4729bfb3ce54285244d402e2ebed30ad9b9ee705 Mon Sep 17 00:00:00 2001 From: Alistair Johnson Date: Wed, 21 Jun 2017 20:36:38 -0400 Subject: [PATCH] added notebook on crrt/how to extract data --- notebooks/crrt-notebook.ipynb | 2073 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 2073 insertions(+) create mode 100644 notebooks/crrt-notebook.ipynb diff --git a/notebooks/crrt-notebook.ipynb b/notebooks/crrt-notebook.ipynb new file mode 100644 index 0000000..9d4b0f8 --- /dev/null +++ b/notebooks/crrt-notebook.ipynb @@ -0,0 +1,2073 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Continuous renal replacement therapy (CRRT)\n", + "\n", + "This notebook overviews the process of defining CRRT: a treatment used to dialyse or filter a patient's blood continuously. Key to CRRT is its lower speed compared to conventional dialysis: avoidance of rapid solute/fluid loss is suspected to be the main reason why CRRT tends to be tolerated better than intermittent hemodialysis.\n", + "\n", + "The primary aim of this notebook is to define the start and end times of CRRT for patients in the MIMIC-III database v1.4.\n", + "\n", + "A secondary aim of this notebook is to provide insight into how to extract clinical concepts from the MIMIC-III database.\n", + "\n", + "Many thanks to Sharon O'Donoghue for her invaluable advice in the creation of this notebook." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Outline\n", + "\n", + "The main steps in defining a clinical concept in MIMIC-III are as follows:\n", + "\n", + "1. Identification of key terms and phrases which describe the concept\n", + "2. Search for these terms in D_ITEMS (or D_LABITEMS if searching for a laboratory measurement)\n", + "3. Extraction of the data from tables specified in the `LINKSTO` column of D_ITEMS\n", + "4. Definition of the concept using rules applied to the data extracted\n", + "5. Validation of the concepts by individual inspection and aggregate statistics\n", + "\n", + "This process is iterative and not as clear cut as the above - validation may lead you to redefine data extraction, and so on. Furthermore, in the case of MIMIC-III v1.4, this process must be repeated twice: once for Metavision, once for CareVue.\n", + "\n", + "## MetaVision vs. CareVue\n", + "\n", + "One issue in MIMIC-III is that it is a combination of two ICU database systems. As a result, concepts are split among different `ITEMID` values. For example, a patient's heart rate is a relatively simple concept to extract, however, if we look in the D_ITEMS table for labels matching 'heart rate', we find at least two `ITEMID`:\n", + "\n", + "itemid | label | abbreviation | dbsource | linksto\n", + "--------|-------------------------|-----------------|------------|-------------\n", + "211 | Heart Rate | | carevue | chartevents\n", + "220045 | Heart Rate | HR | metavision | chartevents\n", + "\n", + "Both these `ITEMID` values capture heart rate - but one is used for the CareVue database system (`dbsource = 'carevue'`) and one is used for the MetaVision database system (`dbsource = 'metavision'`). The data extraction step must be repeated twice: once for `dbsource = 'carevue'` and once for `dbsource = 'metavision'`. In general, it is recommended to extract data from MetaVision first, as the data is better structured and provides useful information for what data elements to include. For example, `ITEMID` values in MetaVision have abbrevations with each label - these abbreviations can then be used to search for data elements in CareVue." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Step 0: import libraries, connect to the database" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "# Import libraries\n", + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "import psycopg2\n", + "from IPython.display import display, HTML # used to print out pretty pandas dataframes\n", + "import matplotlib.dates as dates\n", + "import matplotlib.lines as mlines\n", + "\n", + "%matplotlib inline\n", + "plt.style.use('ggplot') \n", + "\n", + "# specify user/password/where the database is\n", + "sqluser = 'postgres'\n", + "sqlpass = 'postgres'\n", + "dbname = 'mimic'\n", + "schema_name = 'mimiciii'\n", + "host = 'localhost'\n", + "\n", + "query_schema = 'SET search_path to ' + schema_name + ';'\n", + "\n", + "# connect to the database\n", + "con = psycopg2.connect(dbname=dbname, user=sqluser, password=sqlpass, host=host)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Step 1: Identification of key terms\n", + "\n", + "We are interested in continuous renal replacement therapy (CRRT). First, we look for 'CRRT' in the database, isolating ourselves to metavision data:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "\n", + "\n", + "query = query_schema + \"\"\"\n", + "select itemid, label, category, linksto\n", + "from d_items\n", + "where dbsource = 'metavision'\n", + "and lower(label) like '%crrt%'\n", + "\"\"\"\n", + "df = pd.read_sql_query(query,con)\n", + "cur.close()\n", + "\n", + "df" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The above gives us some hints to expand our initial search:\n", + "\n", + "* `category = 'Dialysis'`\n", + "* `lower(label) like '%dialysis%'`\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Step 2: Extraction of `ITEMID`s from tables\n", + "\n", + "## Get list of `itemid` related to CRRT" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "cur = con.cursor()\n", + "cur.execute('SET search_path to ' + schema_name)\n", + "\n", + "\n", + "query = \"\"\"\n", + "select itemid, label, category, linksto\n", + "from d_items di\n", + "where dbsource = 'metavision'\n", + "and (lower(label) like '%dialy%'\n", + "or category = 'Dialysis'\n", + "or lower(label) like '%crrt%'\n", + ")\n", + "order by linksto, category, label\n", + "\"\"\"\n", + "df = pd.read_sql_query(query,con)\n", + "cur.close()\n", + "\n", + "HTML(df.head().to_html().replace('NaN', ''))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Manually label above `itemid`\n", + "\n", + "The above is a list of all the potential data elements which could be used to define CRRT. The next step is to identify the specific elements which can be used to define start/stop time. This process requires clinical expertise in the area.\n", + "\n", + "The following tables are a result of reviewing all `ITEMID` labels and flagging them as \"consider for further review\" or \"not relevant\".\n", + "\n", + "\n", + "### Links to CHARTEVENTS\n", + "\n", + "itemid | label | category | linksto | Included/comment\n", + "--- | --- | --- | --- | ---\n", + "225740 | Dialysis Catheter Discontinued | Access Lines - Invasive | chartevents | No - access line\n", + "227357 | Dialysis Catheter Dressing Occlusive | Access Lines - Invasive | chartevents | No - access line\n", + "225776 | Dialysis Catheter Dressing Type | Access Lines - Invasive | chartevents | No - access line\n", + "226118 | Dialysis Catheter placed in outside facility | Access Lines - Invasive | chartevents | No - access line\n", + "227753 | Dialysis Catheter Placement Confirmed by X-ray | Access Lines - Invasive | chartevents | No - access line\n", + "225323 | Dialysis Catheter Site Appear | Access Lines - Invasive | chartevents | No - access line\n", + "225725 | Dialysis Catheter Tip Cultured | Access Lines - Invasive | chartevents | No - access line\n", + "227124 | Dialysis Catheter Type | Access Lines - Invasive | chartevents | No - access line\n", + "225126 | Dialysis patient | Adm History/FHPA | chartevents | No - admission information\n", + "224149 | Access Pressure | Dialysis | chartevents | Yes - CRRT setting\n", + "224404 | ART Lumen Volume | Dialysis | chartevents | Yes - CRRT setting\n", + "224144 | Blood Flow (ml/min) | Dialysis | chartevents | Yes - CRRT setting\n", + "228004 | Citrate (ACD-A) | Dialysis | chartevents | Yes - CRRT setting\n", + "227290 | CRRT mode | Dialysis | chartevents | Yes - CRRT setting\n", + "225183 | Current Goal | Dialysis | chartevents | Yes - CRRT setting\n", + "225977 | Dialysate Fluid | Dialysis | chartevents | Yes - CRRT setting\n", + "224154 | Dialysate Rate | Dialysis | chartevents | Yes - CRRT setting\n", + "224135 | Dialysis Access Site | Dialysis | chartevents | No - access line\n", + "225954 | Dialysis Access Type | Dialysis | chartevents | No - access line\n", + "224139 | Dialysis Site Appearance | Dialysis | chartevents | No - access line\n", + "225810 | Dwell Time (Peritoneal Dialysis) | Dialysis | chartevents | No - peritoneal dialysis\n", + "224151 | Effluent Pressure | Dialysis | chartevents | Yes - CRRT setting\n", + "224150 | Filter Pressure | Dialysis | chartevents | Yes - CRRT setting\n", + "226499 | Hemodialysis Output | Dialysis | chartevents | No - hemodialysis\n", + "225958 | Heparin Concentration (units/mL) | Dialysis | chartevents | Yes - CRRT setting\n", + "224145 | Heparin Dose (per hour) | Dialysis | chartevents | Yes - CRRT setting\n", + "224191 | Hourly Patient Fluid Removal | Dialysis | chartevents | Yes - CRRT setting\n", + "225952 | Medication Added #1 (Peritoneal Dialysis) | Dialysis | chartevents | No - peritoneal dialysis\n", + "227638 | Medication Added #2 (Peritoneal Dialysis) | Dialysis | chartevents | No - peritoneal dialysis\n", + "225959 | Medication Added Amount #1 (Peritoneal Dialysis) | Dialysis | chartevents | No - peritoneal dialysis\n", + "227639 | Medication Added Amount #2 (Peritoneal Dialysis) | Dialysis | chartevents | No - peritoneal dialysis\n", + "225961 | Medication Added Units #1 (Peritoneal Dialysis) | Dialysis | chartevents | No - peritoneal dialysis\n", + "227640 | Medication Added Units #2 (Peritoneal Dialysis) | Dialysis | chartevents | No - peritoneal dialysis\n", + "228005 | PBP (Prefilter) Replacement Rate | Dialysis | chartevents | Yes - CRRT setting\n", + "225965 | Peritoneal Dialysis Catheter Status | Dialysis | chartevents | No - peritoneal dialysis\n", + "225963 | Peritoneal Dialysis Catheter Type | Dialysis | chartevents | No - peritoneal dialysis\n", + "225951 | Peritoneal Dialysis Fluid Appearance | Dialysis | chartevents | No - peritoneal dialysis\n", + "228006 | Post Filter Replacement Rate | Dialysis | chartevents | Yes - CRRT setting\n", + "225956 | Reason for CRRT Filter Change | Dialysis | chartevents | Yes - CRRT setting\n", + "225976 | Replacement Fluid | Dialysis | chartevents | Yes - CRRT setting\n", + "224153 | Replacement Rate | Dialysis | chartevents | Yes - CRRT setting\n", + "224152 | Return Pressure | Dialysis | chartevents | Yes - CRRT setting\n", + "225953 | Solution (Peritoneal Dialysis) | Dialysis | chartevents | No - peritoneal dialysis\n", + "224146 | System Integrity | Dialysis | chartevents | Yes - CRRT setting\n", + "226457 | Ultrafiltrate Output | Dialysis | chartevents | Yes - CRRT setting\n", + "224406 | VEN Lumen Volume | Dialysis | chartevents | Yes - CRRT setting\n", + "225806 | Volume In (PD) | Dialysis | chartevents | No - peritoneal dialysis\n", + "227438 | Volume not removed | Dialysis | chartevents | No - peritoneal dialysis\n", + "225807 | Volume Out (PD) | Dialysis | chartevents | No - peritoneal dialysis\n", + "\n", + "### Links to DATETIMEEVENTS\n", + "\n", + "itemid | label | category | linksto | Included/comment\n", + "--- | --- | --- | --- | ---\n", + "225318 | Dialysis Catheter Cap Change | Access Lines - Invasive | datetimeevents | No - access lines\n", + "225319 | Dialysis Catheter Change over Wire Date | Access Lines - Invasive | datetimeevents | No - access lines\n", + "225321 | Dialysis Catheter Dressing Change | Access Lines - Invasive | datetimeevents | No - access lines\n", + "225322 | Dialysis Catheter Insertion Date | Access Lines - Invasive | datetimeevents | No - access lines\n", + "225324 | Dialysis CatheterTubing Change | Access Lines - Invasive | datetimeevents | No - access lines\n", + "225128 | Last dialysis | Adm History/FHPA | datetimeevents | No - admission information\n", + "\n", + "### Links to INPUTEVENTS_MV\n", + "\n", + "itemid | label | category | linksto | Included/comment\n", + "--- | --- | --- | --- | ---\n", + "227525 | Calcium Gluconate (CRRT) | Medications | inputevents_mv | Yes - CRRT setting\n", + "227536 | KCl (CRRT) | Medications | inputevents_mv | Yes - CRRT setting\n", + "\n", + "### Links to PROCEDUREEVENTS_MV\n", + "\n", + "itemid | label | category | linksto | Included/comment\n", + "--- | --- | --- | --- | ---\n", + "225441 | Hemodialysis | 4-Procedures | procedureevents_mv | No - hemodialysis\n", + "224270 | Dialysis Catheter | Access Lines - Invasive | procedureevents_mv | No - access lines\n", + "225436 | CRRT Filter Change | Dialysis | procedureevents_mv | Yes - CRRT setting\n", + "225802 | Dialysis - CRRT | Dialysis | procedureevents_mv | Yes - CRRT setting\n", + "225803 | Dialysis - CVVHD | Dialysis | procedureevents_mv | Yes - CRRT setting\n", + "225809 | Dialysis - CVVHDF | Dialysis | procedureevents_mv | Yes - CRRT setting\n", + "225955 | Dialysis - SCUF | Dialysis | procedureevents_mv | Yes - CRRT setting\n", + "225805 | Peritoneal Dialysis | Dialysis | procedureevents_mv | No - peritoneal dialysis\n", + "\n", + "## Reasons for inclusion/exclusion\n", + "\n", + "* CRRT Setting - yes (included) - these settings are only documented when a patient is receiving CRRT.\n", + "* Access lines- no (excluded) - these ITEMIDs are not included as the presence of an access line does *not* guarantee that CRRT is being delivered. While having an access line is a requirement of performing CRRT, these lines are present even when a patient is not actively being hemodialysed.\n", + "* Peritoneal dialysis - no (excluded) - Peritoneal dialysis is a different form of dialysis, and is not CRRT\n", + "* Hemodialysis - no (excluded) - Similar as above, hemodialysis is a different form of dialysis and is not CRRT" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Define rules based upon ITEMIDs\n", + "\n", + "Above, we acquired a list of `itemid` which we determined to be related to administration of CRRT. The next step is to determine *how* these `itemid` relate to CRRT: do they indicate it is started, stopped, continuing, or something else.\n", + "\n", + "We will evaluate `itemid` from three tables, in turn: CHARTEVENTS, INPUTEVENTS_MV, and PROCEDUREEVENTS_MV. Note that the \\_MV subscript indicates that the table only has data from MetaVision (half the patients), while \\_CV indicates the table only has data from CareVue (the other half of patients). Note that after we extract data from MetaVision patients, we will repeat this exercise for CareVue patients.\n", + "\n", + "### table 1 of 3: `itemid` from CHARTEVENTS\n", + "\n", + "These are the included CRRT settings in CHARTEVENTS:\n", + "\n", + "itemid | label | param_type \n", + "--------|----------------------------------|------------\n", + "224144 | Blood Flow (ml/min) | Numeric\n", + "224145 | Heparin Dose (per hour) | Numeric\n", + "224146 | System Integrity | Text\n", + "224149 | Access Pressure | Numeric\n", + "224150 | Filter Pressure | Numeric\n", + "224151 | Effluent Pressure | Numeric\n", + "224152 | Return Pressure | Numeric\n", + "224153 | Replacement Rate | Numeric\n", + "224154 | Dialysate Rate | Numeric\n", + "224191 | Hourly Patient Fluid Removal | Numeric\n", + "224404 | ART Lumen Volume | Numeric\n", + "224406 | VEN Lumen Volume | Numeric\n", + "225183 | Current Goal | Numeric\n", + "225956 | Reason for CRRT Filter Change | Text\n", + "225958 | Heparin Concentration (units/mL) | Text\n", + "225976 | Replacement Fluid | Text\n", + "225977 | Dialysate Fluid | Text\n", + "226457 | Ultrafiltrate Output | Numeric\n", + "227290 | CRRT mode | Text\n", + "228004 | Citrate (ACD-A) | Numeric\n", + "228005 | PBP (Prefilter) Replacement Rate | Numeric\n", + "228006 | Post Filter Replacement Rate | Numeric\n", + "\n", + "First, we examine the numeric fields. These fields are the core CRRT settings which, according to clinical advice, should be documented hourly for patients actively on CRRT:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "cur = con.cursor()\n", + "cur.execute('SET search_path to ' + schema_name)\n", + "\n", + "\n", + "query = \"\"\"\n", + "select\n", + " ce.icustay_id, di.label, ce.charttime\n", + " , ce.value\n", + " , ce.valueuom\n", + "from chartevents ce\n", + "inner join d_items di\n", + "on ce.itemid = di.itemid\n", + "where ce.icustay_id = 246866\n", + "and ce.itemid in\n", + "(\n", + " 224404, -- | ART Lumen Volume\n", + " 224406, -- | VEN Lumen Volume\n", + " 228004, -- | Citrate (ACD-A)\n", + " 224145, -- | Heparin Dose (per hour)\n", + " 225183, -- | Current Goal\n", + " 224149, -- | Access Pressure\n", + " 224144, -- | Blood Flow (ml/min)\n", + " 224154, -- | Dialysate Rate\n", + " 224151, -- | Effluent Pressure\n", + " 224150, -- | Filter Pressure\n", + " 224191, -- | Hourly Patient Fluid Removal\n", + " 228005, -- | PBP (Prefilter) Replacement Rate\n", + " 228006, -- | Post Filter Replacement Rate\n", + " 224153, -- | Replacement Rate\n", + " 224152, -- | Return Pressure\n", + " 226457 -- | Ultrafiltrate Output\n", + ")\n", + "order by ce.icustay_id, ce.charttime, di.label;\n", + "\"\"\"\n", + "df = pd.read_sql_query(query,con)\n", + "cur.close()\n", + "\n", + "HTML(df.head().to_html().replace('NaN', ''))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Above we can see that `ART Lumen Volume and VEN Lumen Volume` are documented at a drastically different time than the other settings. Upon discussion with a clinical expert, they confirmed that this is expected, as these volumes indicate settings to keep open the line and are not directly relevant to the administration of CRRT - at best they are superfluous and at worst they can mislead the start/stop times. As a result `ART Lumen Volume` and `VEN Lumen Volume` are excluded. This leaves us with the final set of `ITEMID`s:\n", + "\n", + "```sql\n", + "224149, -- Access Pressure\n", + "224144, -- Blood Flow (ml/min)\n", + "228004, -- Citrate (ACD-A)\n", + "225183, -- Current Goal\n", + "224154, -- Dialysate Rate\n", + "224151, -- Effluent Pressure\n", + "224150, -- Filter Pressure\n", + "224145, -- Heparin Dose (per hour)\n", + "224191, -- Hourly Patient Fluid Removal\n", + "228005, -- PBP (Prefilter) Replacement Rate\n", + "228006, -- Post Filter Replacement Rate\n", + "224153, -- Replacement Rate\n", + "224152, -- Return Pressure\n", + "226457 -- Ultrafiltrate Output\n", + "```\n", + "\n", + "The next step is to examine the remaining text based `ITEMID`:\n", + "\n", + "\n", + "\n", + "itemid | label | param_type \n", + "--------|----------------------------------|------------\n", + "224146 | System Integrity | Text\n", + "225956 | Reason for CRRT Filter Change | Text\n", + "225958 | Heparin Concentration (units/mL) | Text\n", + "225976 | Replacement Fluid | Text\n", + "225977 | Dialysate Fluid | Text\n", + "227290 | CRRT mode | Text\n", + "\n", + "We define a helper function which prints out the number of observations for a given `itemid`:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "def print_itemid_info(con, itemid):\n", + " # get name of itemid\n", + " query = 'SET search_path to ' + schema_name + \"\"\";\n", + " select label\n", + " from d_items\n", + " where itemid = \"\"\" + str(itemid)\n", + " \n", + " df = pd.read_sql_query(query,con)\n", + " print('Values for {} - {}...'.format(itemid, df['label'][0]))\n", + " \n", + " \n", + " query = 'SET search_path to ' + schema_name + \"\"\";\n", + " select value\n", + " , count(distinct icustay_id) as number_of_patients\n", + " , count(icustay_id) as number_of_observations\n", + " from chartevents\n", + " where itemid = \"\"\" + str(itemid) + \"\"\" \n", + " group by value\n", + " order by value\n", + " \"\"\"\n", + " df = pd.read_sql_query(query,con)\n", + " display(HTML(df.to_html().replace('NaN', '')))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### 224146 - System Integrity" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "print_itemid_info(con, 224146)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In discussion with a clinical expert, each of these settings indicate different stages of the CRRT treatment. We can simplify them into three modes: started, stopped, or active. Since active implies that the CRRT is running, the first active event could also be a start time, therefore we call it \"active/started\". Here we list the manually curated mapping:\n", + "\n", + "value | count | interpretation\n", + "--- | --- | ---\n", + "Active | 539 | CRRT active/started\n", + "Clots Increasing | 245 | CRRT active/started\n", + "Clots Present | 427 | CRRT active/started\n", + "Clotted | 233 | CRRT **stopped**\n", + "Discontinued | 339 | CRRT **stopped**\n", + "Line pressure inconsistent | 127 | CRRT active/started\n", + "New Filter | 357 | CRRT **started**\n", + "No Clot Present | 275 | CRRT active/started\n", + "Recirculating | 172 | CRRT **stopped**\n", + "Reinitiated | 336 | CRRT **started**\n", + "\n", + "Later on we will code special rules to incorporate this `itemid`." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### 225956 - Reason for CRRT Filter Change" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "print_itemid_info(con, 225956)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The above is a **stop** time as the filter needed to be changed at this time. Any subsequent CRRT would be a restart of CRRT - and not a continuation of an ongoing CRRT session." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### 225958 - Heparin Concentration (units/mL)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "print_itemid_info(con, 225958)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The above is a normal setting and can be combined with the numeric fields." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### 225976 - Replacement Fluid" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "print_itemid_info(con, 225976)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The above is a normal setting and can be combined with the numeric fields." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### 225977 - Dialysate Fluid" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "print_itemid_info(con, 225977)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The above is a normal setting and can be combined with the numeric fields." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### 227290 - CRRT mode" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "print_itemid_info(con, 227290)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "While all of this looks good, it's feasible that the documentation of the CRRT mode is not done directly concurrent to the actual administration of CRRT. We thus investigate whether CRRT mode is available for all patients with a CRRT setting." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "# Examining CRRT mode\n", + "query = query_schema + \"\"\"\n", + "with t1 as \n", + "(\n", + "select icustay_id,\n", + " max(case when itemid = 227290 then 1 else 0 end) as HasMode\n", + "from chartevents ce\n", + "where itemid in\n", + "(\n", + " 227290, -- CRRT mode\n", + " 228004, -- Citrate (ACD-A)\n", + " 225958, -- Heparin Concentration (units/mL)\n", + " 224145, -- Heparin Dose (per hour)\n", + " 225183, -- Current Goal -- always there\n", + " 224149, -- Access Pressure\n", + " 224144, -- Blood Flow (ml/min)\n", + " 225977, -- Dialysate Fluid\n", + " 224154, -- Dialysate Rate\n", + " 224151, -- Effluent Pressure\n", + " 224150, -- Filter Pressure\n", + " 224191, -- Hourly Patient Fluid Removal\n", + " 228005, -- PBP (Prefilter) Replacement Rate\n", + " 228006, -- Post Filter Replacement Rate\n", + " 225976, -- Replacement Fluid\n", + " 224153, -- Replacement Rate\n", + " 224152, -- Return Pressure\n", + " 226457 -- Ultrafiltrate Output\n", + ")\n", + "group by icustay_id\n", + ")\n", + "select count(icustay_id) as Num_ICUSTAY_ID\n", + ", sum(hasmode) as Num_With_Mode\n", + "from t1\n", + "\"\"\"\n", + "df = pd.read_sql_query(query,con)\n", + "cur.close()\n", + "\n", + "HTML(df.to_html().replace('NaN', ''))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can take this analysis a bit further and ask: is CRRT mode is present when *none* of the other settings are present?" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "query = 'SET search_path to ' + schema_name + \"\"\";\n", + "with t1 as \n", + "(\n", + "select icustay_id, charttime\n", + " , max(case when itemid = 227290 then 1 else 0 end) as HasCRRTMode\n", + " , max(case when itemid != 227290 then 1 else 0 end) as OtherITEMID\n", + "from chartevents ce\n", + "where itemid in\n", + "(\n", + " 227290, -- CRRT mode\n", + " 228004, -- Citrate (ACD-A)\n", + " 225958, -- Heparin Concentration (units/mL)\n", + " 224145, -- Heparin Dose (per hour)\n", + " 225183, -- Current Goal -- always there\n", + " 224149, -- Access Pressure\n", + " 224144, -- Blood Flow (ml/min)\n", + " 225977, -- Dialysate Fluid\n", + " 224154, -- Dialysate Rate\n", + " 224151, -- Effluent Pressure\n", + " 224150, -- Filter Pressure\n", + " 224191, -- Hourly Patient Fluid Removal\n", + " 228005, -- PBP (Prefilter) Replacement Rate\n", + " 228006, -- Post Filter Replacement Rate\n", + " 225976, -- Replacement Fluid\n", + " 224153, -- Replacement Rate\n", + " 224152, -- Return Pressure\n", + " 226457 -- Ultrafiltrate Output\n", + ")\n", + "group by icustay_id, charttime\n", + ")\n", + "select count(icustay_id) as NumObs\n", + ", sum(case when HasCRRTMode = 1 and OtherITEMID = 1 then 1 else 0 end) as Both\n", + ", sum(case when HasCRRTMode = 1 and OtherITEMID = 0 then 1 else 0 end) as OnlyCRRTMode\n", + ", sum(case when HasCRRTMode = 0 and OtherITEMID = 1 then 1 else 0 end) as NoCRRTMode\n", + "from t1\n", + "\"\"\"\n", + "df = pd.read_sql_query(query,con)\n", + "\n", + "HTML(df.to_html().replace('NaN', ''))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As CRRT mode is relatively redundant, doesn't necessarily indicate CRRT is being actively performed, and documentation for it is not 100% compliant, we exclude it from the list of `ITEMID`.\n", + "\n", + "### CHARTEVENTS wrap up\n", + "\n", + "The following is the final set of `ITEMID` from CHARTEVENTS which indicate CRRT is started/ongoing:\n", + "\n", + "```sql\n", + "224149, -- Access Pressure\n", + "224144, -- Blood Flow (ml/min)\n", + "228004, -- Citrate (ACD-A)\n", + "225183, -- Current Goal\n", + "225977, -- Dialysate Fluid\n", + "224154, -- Dialysate Rate\n", + "224151, -- Effluent Pressure\n", + "224150, -- Filter Pressure\n", + "225958, -- Heparin Concentration (units/mL)\n", + "224145, -- Heparin Dose (per hour)\n", + "224191, -- Hourly Patient Fluid Removal\n", + "228005, -- PBP (Prefilter) Replacement Rate\n", + "228006, -- Post Filter Replacement Rate\n", + "225976, -- Replacement Fluid\n", + "224153, -- Replacement Rate\n", + "224152, -- Return Pressure\n", + "226457 -- Ultrafiltrate Output\n", + "```\n", + "\n", + "The following `ITEMID` are the final set which indicate CRRT is started/stopped/ongoing (i.e. require special rules):\n", + "\n", + "```sql\n", + "224146, -- System Integrity\n", + "225956 -- Reason for CRRT Filter Change\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### table 2 of 3: INPUTEVENTS_MV\n", + "\n", + "The following is the final set of ITEMID from INPUTEVENTS_MV:\n", + "\n", + "```sql\n", + "227525,-- Calcium Gluconate (CRRT)\n", + "227536 -- KCl (CRRT)\n", + "```\n", + "\n", + "No special examination is required for these fields - they are guaranteed to be CRRT (as verified by a clinician) - we can use these to indicate that CRRT is active/started." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### table 3 of 3: PROCEDUREEVENTS_MV" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The following are the set of `itemid` from above related to PROCEDUREEVENTS_MV:\n", + "\n", + "itemid | label\n", + "--- | ---\n", + "225436 | CRRT Filter Change\n", + "225802 | Dialysis - CRRT\n", + "225803 | Dialysis - CVVHD\n", + "225809 | Dialysis - CVVHDF\n", + "225955 | Dialysis - SCUF\n", + "\n", + "The only contentious `ITEMID` is 225436 (CRRT Filter Change). This `ITEMID` indicates a break from CRRT, and it reinitiates at the end of this change. While in principle this could be used as an end time, documentation on it is not 100%, and as recommended by staff it's easier to ignore this and use the filter change field from CHARTEVENTS to define the end of CRRT events.\n", + "\n", + "The final set of `ITEMID` used for CRRT are:\n", + "\n", + "```sql\n", + "225802, -- Dialysis - CRRT\n", + "225803, -- Dialysis - CVVHD\n", + "225809, -- Dialysis - CVVHDF\n", + "225955 -- Dialysis - SCUF\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Step 4: definition of concept using rules\n", + "\n", + "Let's review the goal of this notebook. We would like to define the duration of CRRT for each patient. Concretely, this means we must define, for each `ICUSTAY_ID`:\n", + "\n", + "* a `STARTTIME`\n", + "* an `ENDTIME`\n", + "\n", + "As CRRT can be started/stopped throughout a patient's stay, there may be multiple `STARTTIME` and `ENDTIME` for a single `ICUSTAY_ID` - but they should not overlap.\n", + "\n", + "Recall that CHARTEVENTS stores data at charted times (`CHARTTIME`), and as a result the settings are stored at a single point in time. For CHARTEVENTS, the main task thus becomes converting a series of `CHARTTIME` into pairs of `STARTTIME` and `ENDTIME`. Intuitively this can be done by looking for consecutive settings each hour, and combining these into a single CRRT event. The first observed `CHARTTIME` becomes the `STARTTIME`, and the last observed `CHARTTIME` becomes the `ENDTIME`. However, CHARTEVENTS is not the only source of data. To improve the accuracy of our calculation, we also include data from INPUTEVENTS_MV and PROCEDUREEVENTS_MV. For INPUTEVENTS_MV, this does not complicate things too much. Each observation in INPUTEVENTS_MV is also stored at a single `CHARTTIME`, and so we simply need to combine this table with CHARTEVENTS before proceeding (likely by using the SQL `UNION` command).\n", + "\n", + "PROCEDUREEVENTS_MV is more complicated as it actually stores data with a `STARTTIME` and an `ENDTIME` column already. We need to merge the extracted data from CHARTEVENTS/INPUTEVENTS_MV with this already nicely formatted data from PROCEDUREEVENTS_MV.\n", + "\n", + "With the task laid out, let's get started. We will:\n", + "\n", + "1. Aggregate INPUTEVENTS_MV into durations\n", + "2. Convert CHARTEVENTS into durations\n", + "2. Compare these durations with PROCEDUREVENTS_MV and decide on a rule for merging the two\n", + "3. Merge PROCEDUREEVENTS_MV with INPUTEVENTS_MV/CHARTEVENTS for a final durations table for Metavision" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "# define the example ICUSTAY_ID for the below code\n", + "# originally, this was 246866 - if changed, the interpretation provided will no longer make sense\n", + "query_where_clause = \"and icustay_id = 246866\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To make sure we don't display data we don't have to, we define a function which: (i) doesn't display icustay_id, and (ii) simplifies the date by removing the month/year." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "ie['starttime'].dtype" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "def display_df(df):\n", + " col = [x for x in df.columns if x != 'icustay_id']\n", + " df_tmp = df[col].copy()\n", + " for c in df_tmp.columns:\n", + " if '[ns]' in str(df_tmp[c].dtype):\n", + " df_tmp[c] = df_tmp[c].dt.strftime('Day %d, %H:%M')\n", + " \n", + " display(HTML(df_tmp.to_html().replace('NaN', '')))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Aggregating INPUTEVENTS_MV\n", + "\n", + "First, let's look at INPUTEVENTS_MV. Each entry is stored with a `starttime` and an `endtime`. Note we have to exclude `statusdescription = 'Rewritten'` as these are undelivered medications which have been rewritten (useful for auditing purposes but does not give you information about drugs delivered to the patient)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "print(\"Durations from INPUTEVENTS for one patient with KCl...\")\n", + "query = 'SET search_path to ' + schema_name + \"\"\";\n", + "select \n", + " linkorderid\n", + " , orderid\n", + " , case when itemid = 227525 then 'Calcium' else 'KCl' end as label\n", + " , starttime, endtime\n", + " , rate, rateuom\n", + " , statusdescription\n", + "from inputevents_mv\n", + "where itemid in\n", + "(\n", + " --227525,-- Calcium Gluconate (CRRT)\n", + " 227536 -- KCl (CRRT)\n", + ")\n", + "and statusdescription != 'Rewritten'\n", + "\"\"\" + query_where_clause + \"\"\"\n", + "order by starttime, endtime\n", + "\"\"\"\n", + "ie = pd.read_sql_query(query,con)\n", + "\n", + "display_df(ie)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Normally `linkorderid` links together administrations which are consecutive but may have changes in rate, but from the above we can note that `linkorderid` seems to rarely group entries. Rows 8-10 and 16-18 are grouped (i.e. they are sequential administrations where the rate may or may not have changed), but many aren't *even though* they occur sequentially. We'd like to merge together sequential events to simplify the durations - and it appears we can greatly simplify this data by merging two rows if `endtime(row-1) == starttime(row)`.\n", + "\n", + "We can do this in three steps:\n", + "\n", + "1. Create a binary flag that indicates when new \"events\" occur, where an \"event\" is defined as a continuous segment of administration, i.e. the binary flag is 1 if the row does not immediately follow the previous row, and 0 if the row does immediately follow the previous row\n", + "2. Aggregate this binary flag so each individual event is assigned a unique integer (i.e. create a partition over these events)\n", + "3. Create an integer to identify the last row in the event (so we can get useful information from this row)\n", + "4. Group the data based off the partition to result in a single `starttime` and `endtime` for each continguous medication administration\n", + "\n", + "Now we'll go through the code for doing this step by step." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Step 1: create a binary flag for new events\n", + "\n", + "```sql\n", + "with t1 as\n", + "(\n", + "select icustay_id\n", + " , case when itemid = 227525 then 'Calcium' else 'KCl' end as label\n", + " , starttime\n", + " , endtime\n", + " , case\n", + " when lag(endtime) over (partition by icustay_id, itemid order by starttime, endtime) = starttime\n", + " then 0\n", + " else 1 end\n", + " as new_event_flag\n", + " , rate, rateuom\n", + " , statusdescription\n", + "from inputevents_mv\n", + "where itemid in\n", + "(\n", + " --227525,-- Calcium Gluconate (CRRT)\n", + " 227536 -- KCl (CRRT)\n", + ")\n", + "and statusdescription != 'Rewritten'\n", + "\"\"\" + query_where_clause + \"\"\"\n", + "```\n", + "\n", + "This selects data from INPUTEVENTS_MV for just KCl using a single patient specified by the `query_where_clause` (this is so it can act as an example - you can omit the single patient and it will work on all the data).\n", + "\n", + "The key code block is here:\n", + "\n", + "```sql\n", + " , case\n", + " when lag(endtime) over (partition by icustay_id, itemid order by starttime, endtime) = starttime\n", + " then 0\n", + " else 1 end\n", + " as new_event_flag\n", + "```\n", + "\n", + "This creates a boolean flag which is 1 every time the current `starttime` is not equal to the previous `endtime`, i.e. it marks new \"events\". We can see it in action here:\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "print(\"Durations from INPUTEVENTS_MV, new events noted with time_partition...\")\n", + "query = 'SET search_path to ' + schema_name + \"\"\";\n", + "with t1 as\n", + "(\n", + "select \n", + " icustay_id\n", + " , case when itemid = 227525 then 'Calcium' else 'KCl' end as label\n", + " , starttime, endtime\n", + " , lag(endtime) over (partition by icustay_id, itemid order by starttime, endtime) as endtime_lag\n", + " , case\n", + " when lag(endtime) over (partition by icustay_id, itemid order by starttime, endtime) = starttime\n", + " then 0\n", + " else 1 end\n", + " as new_event_flag\n", + " , rate, rateuom\n", + " , statusdescription\n", + "from inputevents_mv\n", + "where itemid in\n", + "(\n", + " --227525,-- Calcium Gluconate (CRRT)\n", + " 227536 -- KCl (CRRT)\n", + ")\n", + "and statusdescription != 'Rewritten'\n", + "\"\"\" + query_where_clause + \"\"\"\n", + ")\n", + "select \n", + " label\n", + ", starttime\n", + ", endtime\n", + ", endtime_lag\n", + ", new_event_flag\n", + ", rate, rateuom\n", + ", statusdescription\n", + "from t1\n", + "\"\"\"\n", + "ie = pd.read_sql_query(query,con)\n", + "\n", + "display_df(ie)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Note we have added the `endtime_lag` column to give a clearer idea of how the query is working. We can see the first row starts with `new_event_flag = 1` since `endtime_lag` is null. Next, the `endtime_lag != starttime`, so `new_event_flag` is again `= 1`.\n", + "\n", + "Finally, for row 2 (marked by **2** on the far left), the `endtime_lag == starttime` - and so `new_event_flag` is 0. This continues all the way until row **9**, where we can again see `endtime_lag != starttime`. Note that the `statusdescription` on row **8** even informs us why: it states that the administration has been \"Paused\". This is why we mentioned earlier that we were interested in the last row from an event.\n", + "\n", + "### Step 2: create a binary flag for new events\n", + "\n", + "With SQL, in order to aggregate groups of rows, we need a *partition*. That is, we need some key (usually an integer) which is unique for that set of rows. Once we have this unique key, we can do all the standard SQL aggregations like `max()`, `min()`, and so on (note: SQL \"window\" functions operate on the same principle, except you define the partition explicitly from a combination of columns).\n", + "\n", + "With this in mind, our next step is to use this flag to create a unique integer for each set of rows we'd like grouped. Since we'd like to group new events together, we can run a cumulative sum along `new_event_flag`: every time a new event occurs, the integer will increase and consequently that event will all have the same unique key. The code to do this is:\n", + "\n", + "```sql\n", + "SUM(new_event_flag) OVER (partition by icustay_id, label order by starttime, endtime) as time_partition \n", + "```\n", + "\n", + "Let's see this in action:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "print(\"Durations from INPUTEVENTS for one patient with KCl...\")\n", + "query = 'SET search_path to ' + schema_name + \"\"\";\n", + "with t1 as\n", + "(\n", + "select icustay_id\n", + " , case when itemid = 227525 then 'Calcium' else 'KCl' end as label\n", + " , starttime\n", + " , endtime\n", + " , case\n", + " when lag(endtime) over (partition by icustay_id, itemid order by starttime, endtime) = starttime\n", + " then 0\n", + " else 1 end\n", + " as new_event_flag\n", + " , rate, rateuom\n", + " , statusdescription\n", + "from inputevents_mv\n", + "where itemid in\n", + "(\n", + " --227525,-- Calcium Gluconate (CRRT)\n", + " 227536 -- KCl (CRRT)\n", + ")\n", + "and statusdescription != 'Rewritten'\n", + "\"\"\" + query_where_clause + \"\"\"\n", + ")\n", + ", t2 as\n", + "(\n", + " select \n", + " icustay_id\n", + " , label\n", + " , starttime, endtime\n", + " , new_event_flag\n", + " , SUM(new_event_flag) OVER (partition by icustay_id, label order by starttime, endtime) as time_partition \n", + " , rate, rateuom, statusdescription\n", + " from t1\n", + ")\n", + "select \n", + " label\n", + " , starttime\n", + " , endtime\n", + " , new_event_flag\n", + " , time_partition \n", + " , rate, rateuom, statusdescription\n", + "\n", + "from t2\n", + "order by starttime, endtime\n", + "\"\"\"\n", + "ie = pd.read_sql_query(query,con)\n", + "\n", + "display_df(ie)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The above (hopefully) makes it clear how a unique partition for each continuous segment of KCl administration can be delineated by cumulatively summing `new_event_flag` to create `time_partition`.\n", + "\n", + "### Step 3: create an integer to mark the last row of an event\n", + "\n", + "From above, it appears as though the *last* `statusdescription` would provide us useful debugging information as to why the administration event stopped - so we have another inline view where we create an integer which is 1 for the last `statusdescription`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "print(\"Durations from INPUTEVENTS for one patient with KCl...\")\n", + "query = 'SET search_path to ' + schema_name + \"\"\";\n", + "with t1 as\n", + "(\n", + "select icustay_id\n", + " , case when itemid = 227525 then 'Calcium' else 'KCl' end as label\n", + " , starttime\n", + " , endtime\n", + " , case\n", + " when lag(endtime) over (partition by icustay_id, itemid order by starttime, endtime) = starttime\n", + " then 0\n", + " else 1 end\n", + " as new_event_flag\n", + " , rate, rateuom\n", + " , statusdescription\n", + "from inputevents_mv\n", + "where itemid in\n", + "(\n", + " --227525,-- Calcium Gluconate (CRRT)\n", + " 227536 -- KCl (CRRT)\n", + ")\n", + "and statusdescription != 'Rewritten'\n", + "\"\"\" + query_where_clause + \"\"\"\n", + ")\n", + ", t2 as\n", + "(\n", + " select \n", + " icustay_id\n", + " , label\n", + " , starttime, endtime\n", + " , SUM(new_event_flag) OVER (partition by icustay_id, label order by starttime, endtime) as time_partition \n", + " , rate, rateuom, statusdescription\n", + " from t1\n", + ")\n", + ", t3 as\n", + "(\n", + "select\n", + " icustay_id\n", + " , label\n", + " , starttime, endtime\n", + " , time_partition \n", + " , rate, rateuom, statusdescription\n", + " , ROW_NUMBER() over (PARTITION BY icustay_id, label, time_partition order by starttime desc, endtime desc) as lastrow\n", + "from t2\n", + ")\n", + "select \n", + "label\n", + ", starttime\n", + ", endtime\n", + ", time_partition\n", + ", rate, rateuom\n", + ", statusdescription\n", + ", lastrow\n", + " from t3\n", + "order by starttime, endtime\n", + "\"\"\"\n", + "ie = pd.read_sql_query(query,con)\n", + "\n", + "display_df(ie)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Step 4: aggregate to merge together contiguous start/end times\n", + "\n", + "Now we aggregate the `starttime` and `endtime` together by grouping by `time_partition`, as follows:\n", + "\n", + "* we want the first `starttime`, so we use `min(starttime)`\n", + "* we want the last `endtime`, so we use `max(endtime)`\n", + "* we want the `statusdescription` at the last row, so we aggregate a column where all rows except the last are null\n", + "\n", + "To give more detail on the last step, let's look at the SQL code:\n", + "\n", + "```sql\n", + ", min(case when lastrow = 1 then statusdescription else null end) as statusdescription\n", + "```\n", + "\n", + "Aggregate functions ignore null values, so if we set the column to null for all but `lastrow = 1`, then the aggregate function is guaranteed to only return the value at `lastrow = 1`. The use of aggregate function could be either `min()` or `max()` - since it only effectively operates on a single value.\n", + "\n", + "Tying it all together, we have the final query:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false, + "scrolled": true + }, + "outputs": [], + "source": [ + "print(\"Durations from INPUTEVENTS for one patient with KCl...\")\n", + "query = 'SET search_path to ' + schema_name + \"\"\";\n", + "with t1 as\n", + "(\n", + "select icustay_id\n", + " , case when itemid = 227525 then 'Calcium' else 'KCl' end as label\n", + " , starttime\n", + " , endtime\n", + " , case\n", + " when lag(endtime) over (partition by icustay_id, itemid order by starttime, endtime) = starttime\n", + " then 0\n", + " else 1 end\n", + " as new_event_flag\n", + " , rate, rateuom\n", + " , statusdescription\n", + "from inputevents_mv\n", + "where itemid in\n", + "(\n", + " 227525,-- Calcium Gluconate (CRRT)\n", + " 227536 -- KCl (CRRT)\n", + ")\n", + "and statusdescription != 'Rewritten'\n", + "\"\"\" + query_where_clause + \"\"\"\n", + ")\n", + ", t2 as\n", + "(\n", + " select \n", + " icustay_id\n", + " , label\n", + " , starttime, endtime\n", + " , SUM(new_event_flag) OVER (partition by icustay_id, label order by starttime, endtime) as time_partition \n", + " , rate, rateuom, statusdescription\n", + " from t1\n", + ")\n", + ", t3 as\n", + "(\n", + "select\n", + " icustay_id\n", + " , label\n", + " , starttime, endtime\n", + " , time_partition \n", + " , rate, rateuom, statusdescription\n", + " , ROW_NUMBER() over (PARTITION BY icustay_id, label, time_partition order by starttime desc, endtime desc) as lastrow\n", + "from t2\n", + ")\n", + "select\n", + " label\n", + " --, time_partition\n", + " , min(starttime) AS starttime\n", + " , max(endtime) AS endtime\n", + " , min(rate) AS rate_min\n", + " , max(rate) AS rate_max\n", + " , min(rateuom) AS rateuom\n", + " , min(case when lastrow = 1 then statusdescription else null end) as statusdescription\n", + "from t3\n", + "group by icustay_id, label, time_partition\n", + "order by starttime, endtime\n", + "\"\"\"\n", + "ie = pd.read_sql_query(query,con)\n", + "\n", + "display_df(ie)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The above looks good - so we save the query to `query_inputevents` without the clause that isolates the data to one patient." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "query_inputevents = 'SET search_path to ' + schema_name + \"\"\";\n", + "with t1 as\n", + "(\n", + "select icustay_id\n", + " , case when itemid = 227525 then 'Calcium' else 'KCl' end as label\n", + " , starttime\n", + " , endtime\n", + " , case\n", + " when lag(endtime) over (partition by icustay_id, itemid order by starttime, endtime) = starttime\n", + " then 0\n", + " else 1 end\n", + " as new_event_flag\n", + " , rate, rateuom\n", + " , statusdescription\n", + "from inputevents_mv\n", + "where itemid in\n", + "(\n", + " 227525,-- Calcium Gluconate (CRRT)\n", + " 227536 -- KCl (CRRT)\n", + ")\n", + "and statusdescription != 'Rewritten'\n", + ")\n", + ", t2 as\n", + "(\n", + " select \n", + " icustay_id\n", + " , label\n", + " , starttime, endtime\n", + " , SUM(new_event_flag) OVER (partition by icustay_id, label order by starttime, endtime) as time_partition \n", + " , rate, rateuom, statusdescription\n", + " from t1\n", + ")\n", + ", t3 as\n", + "(\n", + "select\n", + " icustay_id\n", + " , label\n", + " , starttime, endtime\n", + " , time_partition \n", + " , rate, rateuom, statusdescription\n", + " , ROW_NUMBER() over (PARTITION BY icustay_id, label, time_partition order by starttime desc, endtime desc) as lastrow\n", + "from t2\n", + ")\n", + "select\n", + " icustay_id\n", + " , time_partition as num\n", + " , min(starttime) AS starttime\n", + " , max(endtime) AS endtime\n", + " , label\n", + " --, min(rate) AS rate_min\n", + " --, max(rate) AS rate_max\n", + " --, min(rateuom) AS rateuom\n", + " --, min(case when lastrow = 1 then statusdescription else null end) as statusdescription\n", + "from t3\n", + "group by icustay_id, label, time_partition\n", + "order by starttime, endtime\n", + "\"\"\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Conclusion\n", + "\n", + "We now have a good method of combining contiguous events from `INPUTEVENTS_MV`. Note that this is *usually* not required, as the `linkorderid` is meant to partition these events for us. For example, lets look at a very common sedative agent used in the ICU, propofol:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "print(\"Durations from INPUTEVENTS for one patient given propofol...\")\n", + "query = 'SET search_path to ' + schema_name + \"\"\";\n", + "with t1 as\n", + "(\n", + "select icustay_id\n", + " , di.label\n", + " , mv.linkorderid, mv.orderid\n", + " , starttime\n", + " , endtime\n", + " , rate, rateuom\n", + " , amount, amountuom\n", + "from inputevents_mv mv\n", + "inner join d_items di\n", + "on mv.itemid = di.itemid\n", + "and statusdescription != 'Rewritten'\n", + "\"\"\" + query_where_clause + \"\"\"\n", + "and mv.itemid = 222168\n", + ")\n", + "select \n", + " label\n", + " , linkorderid, orderid\n", + " , starttime\n", + " , endtime\n", + " , rate, rateuom\n", + " , amount, amountuom\n", + "from t1\n", + "order by starttime, endtime\n", + "\"\"\"\n", + "ie = pd.read_sql_query(query,con)\n", + "\n", + "display_df(ie)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here we see that `linkorderid` nicely delineates contiguous events without us having to put in the effort of above. It also separates *distinct* administrations. Above, at row **6**, we can see a \"1 minute\" delivery of propofol. This is how MetaVision tables (those which end in `_mv`) mark \"instant\" events - in the case of drug delivery, these are boluses of drugs administered to the patient. \n", + "\n", + "When using this data, we can group like events on a partition (as we did above), but we don't have to create the partition: it already exists with `linkorderid`. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false, + "scrolled": true + }, + "outputs": [], + "source": [ + "print(\"Grouped durations from INPUTEVENTS for one patient given propofol...\")\n", + "query = 'SET search_path to ' + schema_name + \"\"\";\n", + "with t1 as\n", + "(\n", + "select icustay_id\n", + " , di.itemid, di.label\n", + " , mv.linkorderid, mv.orderid\n", + " , starttime\n", + " , endtime\n", + " , amount, amountuom\n", + " , rate, rateuom\n", + "from inputevents_mv mv\n", + "inner join d_items di\n", + "on mv.itemid = di.itemid\n", + "and statusdescription != 'Rewritten'\n", + "\"\"\" + query_where_clause + \"\"\"\n", + "and mv.itemid = 222168\n", + ")\n", + "select icustay_id\n", + " , label\n", + " , linkorderid\n", + " , min(starttime) as starttime\n", + " , max(endtime) as endtime\n", + " , min(rate) as rate_min\n", + " , max(rate) as rate_max\n", + " , max(rateuom) as rateuom\n", + " , min(amount) as amount_min\n", + " , max(amount) as amount_max\n", + " , max(amountuom) as amountuom\n", + "from t1\n", + "group by icustay_id, itemid, label, linkorderid\n", + "order by starttime, endtime\n", + "\"\"\"\n", + "ie = pd.read_sql_query(query,con)\n", + "\n", + "display_df(ie)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "It's also worth noting that bolus administrations *do not have a `rate`*. They only have an `amount`." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Convert CHARTEVENTS into durations" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "# convert CHARTEVENTS into durations\n", + "# NOTE: we only look at a single patient as an exemplar\n", + "print(\"Durations from CHARTEVENTS...\")\n", + "query = 'SET search_path to ' + schema_name + \"\"\";\n", + "with crrt_settings as\n", + "(\n", + "select ce.icustay_id, ce.charttime\n", + ", max(\n", + " case\n", + " when ce.itemid in\n", + " (\n", + " 224149, -- Access Pressure\n", + " 224144, -- Blood Flow (ml/min)\n", + " 228004, -- Citrate (ACD-A)\n", + " 225183, -- Current Goal\n", + " 225977, -- Dialysate Fluid\n", + " 224154, -- Dialysate Rate\n", + " 224151, -- Effluent Pressure\n", + " 224150, -- Filter Pressure\n", + " 225958, -- Heparin Concentration (units/mL)\n", + " 224145, -- Heparin Dose (per hour)\n", + " 224191, -- Hourly Patient Fluid Removal\n", + " 228005, -- PBP (Prefilter) Replacement Rate\n", + " 228006, -- Post Filter Replacement Rate\n", + " 225976, -- Replacement Fluid\n", + " 224153, -- Replacement Rate\n", + " 224152, -- Return Pressure\n", + " 226457 -- Ultrafiltrate Output\n", + " ) then 1\n", + " else 0 end)\n", + " as RRT\n", + "-- Below indicates that a new instance of CRRT has started\n", + ", max(\n", + " case\n", + " -- System Integrity\n", + " when ce.itemid = 224146 and value in ('New Filter','Reinitiated')\n", + " then 1\n", + " else 0\n", + " end ) as RRT_start\n", + "-- Below indicates that the current instance of CRRT has ended\n", + ", max(\n", + " case\n", + " -- System Integrity\n", + " when ce.itemid = 224146 and value in ('Discontinued','Recirculating')\n", + " then 1\n", + " when ce.itemid = 225956\n", + " then 1\n", + " else 0\n", + " end ) as RRT_end\n", + "from chartevents ce\n", + "where ce.itemid in\n", + "(\n", + " -- MetaVision ITEMIDs\n", + " -- Below require special handling\n", + " 224146, -- System Integrity\n", + " 225956, -- Reason for CRRT Filter Change\n", + "\n", + " -- Below are settings which indicate CRRT is started/continuing\n", + " 224149, -- Access Pressure\n", + " 224144, -- Blood Flow (ml/min)\n", + " 228004, -- Citrate (ACD-A)\n", + " 225183, -- Current Goal\n", + " 225977, -- Dialysate Fluid\n", + " 224154, -- Dialysate Rate\n", + " 224151, -- Effluent Pressure\n", + " 224150, -- Filter Pressure\n", + " 225958, -- Heparin Concentration (units/mL)\n", + " 224145, -- Heparin Dose (per hour)\n", + " 224191, -- Hourly Patient Fluid Removal\n", + " 228005, -- PBP (Prefilter) Replacement Rate\n", + " 228006, -- Post Filter Replacement Rate\n", + " 225976, -- Replacement Fluid\n", + " 224153, -- Replacement Rate\n", + " 224152, -- Return Pressure\n", + " 226457 -- Ultrafiltrate Output\n", + ")\n", + "and ce.value is not null\n", + "\"\"\" + query_where_clause + \"\"\"\n", + "group by icustay_id, charttime\n", + ")\n", + "\n", + "-- create the durations for each CRRT instance\n", + "select icustay_id\n", + " , ROW_NUMBER() over (partition by icustay_id order by num) as num\n", + " , min(charttime) as starttime\n", + " , max(charttime) as endtime\n", + "from\n", + "(\n", + " select vd1.*\n", + " -- create a cumulative sum of the instances of new CRRT\n", + " -- this results in a monotonically increasing integer assigned to each CRRT\n", + " , case when RRT_start = 1 or RRT=1 or RRT_end = 1 then\n", + " SUM( NewCRRT )\n", + " OVER ( partition by icustay_id order by charttime )\n", + " else null end\n", + " as num\n", + " --- now we convert CHARTTIME of CRRT settings into durations\n", + " from ( -- vd1\n", + " select\n", + " icustay_id\n", + " -- this carries over the previous charttime\n", + " , case\n", + " when RRT=1 then\n", + " LAG(CHARTTIME, 1) OVER (partition by icustay_id, RRT order by charttime)\n", + " else null\n", + " end as charttime_lag\n", + " , charttime\n", + " , RRT\n", + " , RRT_start\n", + " , RRT_end\n", + " -- calculate the time since the last event\n", + " , case\n", + " -- non-null iff the current observation indicates settings are present\n", + " when RRT=1 then\n", + " CHARTTIME -\n", + " (\n", + " LAG(CHARTTIME, 1) OVER\n", + " (\n", + " partition by icustay_id, RRT\n", + " order by charttime\n", + " )\n", + " )\n", + " else null\n", + " end as CRRT_duration\n", + "\n", + " -- now we determine if the current event is a new instantiation\n", + " , case\n", + " when RRT_start = 1\n", + " then 1\n", + " -- if there is an end flag, we mark any subsequent event as new\n", + " when RRT_end = 1\n", + " -- note the end is *not* a new event, the *subsequent* row is\n", + " -- so here we output 0\n", + " then 0\n", + " when\n", + " LAG(RRT_end,1)\n", + " OVER\n", + " (\n", + " partition by icustay_id, case when RRT=1 or RRT_end=1 then 1 else 0 end\n", + " order by charttime\n", + " ) = 1\n", + " then 1\n", + " -- if there is less than 2 hours between CRRT settings, we do not treat this as a new CRRT event\n", + " when (CHARTTIME - (LAG(CHARTTIME, 1)\n", + " OVER\n", + " (\n", + " partition by icustay_id, case when RRT=1 or RRT_end=1 then 1 else 0 end\n", + " order by charttime\n", + " ))) <= interval '2' hour\n", + " then 0\n", + " else 1\n", + " end as NewCRRT\n", + " -- use the temp table with only settings from chartevents\n", + " FROM crrt_settings\n", + " ) AS vd1\n", + " -- now we can isolate to just rows with settings\n", + " -- (before we had rows with start/end flags)\n", + " -- this removes any null values for NewCRRT\n", + " where\n", + " RRT_start = 1 or RRT = 1 or RRT_end = 1\n", + ") AS vd2\n", + "group by icustay_id, num\n", + "having min(charttime) != max(charttime)\n", + "order by icustay_id, num\n", + "\"\"\"\n", + "ce = pd.read_sql_query(query,con)\n", + "\n", + "display_df(ce)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "# happy with the above query - repeat it without the isolation to a single ICUSTAY_ID\n", + "query_chartevents = 'SET search_path to ' + schema_name + \"\"\";\n", + "with crrt_settings as\n", + "(\n", + "select ce.icustay_id, ce.charttime\n", + ", max(\n", + " case\n", + " when ce.itemid in\n", + " (\n", + " 224149, -- Access Pressure\n", + " 224144, -- Blood Flow (ml/min)\n", + " 228004, -- Citrate (ACD-A)\n", + " 225183, -- Current Goal\n", + " 225977, -- Dialysate Fluid\n", + " 224154, -- Dialysate Rate\n", + " 224151, -- Effluent Pressure\n", + " 224150, -- Filter Pressure\n", + " 225958, -- Heparin Concentration (units/mL)\n", + " 224145, -- Heparin Dose (per hour)\n", + " 224191, -- Hourly Patient Fluid Removal\n", + " 228005, -- PBP (Prefilter) Replacement Rate\n", + " 228006, -- Post Filter Replacement Rate\n", + " 225976, -- Replacement Fluid\n", + " 224153, -- Replacement Rate\n", + " 224152, -- Return Pressure\n", + " 226457 -- Ultrafiltrate Output\n", + " ) then 1\n", + " else 0 end)\n", + " as RRT\n", + "-- Below indicates that a new instance of CRRT has started\n", + ", max(\n", + " case\n", + " -- System Integrity\n", + " when ce.itemid = 224146 and value in ('New Filter','Reinitiated')\n", + " then 1\n", + " else 0\n", + " end ) as RRT_start\n", + "-- Below indicates that the current instance of CRRT has ended\n", + ", max(\n", + " case\n", + " -- System Integrity\n", + " when ce.itemid = 224146 and value in ('Discontinued','Recirculating')\n", + " then 1\n", + " when ce.itemid = 225956\n", + " then 1\n", + " else 0\n", + " end ) as RRT_end\n", + "from chartevents ce\n", + "where ce.itemid in\n", + "(\n", + " -- MetaVision ITEMIDs\n", + " -- Below require special handling\n", + " 224146, -- System Integrity\n", + " 225956, -- Reason for CRRT Filter Change\n", + "\n", + " -- Below are settings which indicate CRRT is started/continuing\n", + " 224149, -- Access Pressure\n", + " 224144, -- Blood Flow (ml/min)\n", + " 228004, -- Citrate (ACD-A)\n", + " 225183, -- Current Goal\n", + " 225977, -- Dialysate Fluid\n", + " 224154, -- Dialysate Rate\n", + " 224151, -- Effluent Pressure\n", + " 224150, -- Filter Pressure\n", + " 225958, -- Heparin Concentration (units/mL)\n", + " 224145, -- Heparin Dose (per hour)\n", + " 224191, -- Hourly Patient Fluid Removal\n", + " 228005, -- PBP (Prefilter) Replacement Rate\n", + " 228006, -- Post Filter Replacement Rate\n", + " 225976, -- Replacement Fluid\n", + " 224153, -- Replacement Rate\n", + " 224152, -- Return Pressure\n", + " 226457 -- Ultrafiltrate Output\n", + ")\n", + "and ce.value is not null\n", + "group by icustay_id, charttime\n", + ")\n", + "\n", + "-- create the durations for each CRRT instance\n", + "select icustay_id\n", + " , ROW_NUMBER() over (partition by icustay_id order by num) as num\n", + " , min(charttime) as starttime\n", + " , max(charttime) as endtime\n", + "from\n", + "(\n", + " select vd1.*\n", + " -- create a cumulative sum of the instances of new CRRT\n", + " -- this results in a monotonically increasing integer assigned to each CRRT\n", + " , case when RRT_start = 1 or RRT=1 or RRT_end = 1 then\n", + " SUM( NewCRRT )\n", + " OVER ( partition by icustay_id order by charttime )\n", + " else null end\n", + " as num\n", + " --- now we convert CHARTTIME of CRRT settings into durations\n", + " from ( -- vd1\n", + " select\n", + " icustay_id\n", + " -- this carries over the previous charttime\n", + " , case\n", + " when RRT=1 then\n", + " LAG(CHARTTIME, 1) OVER (partition by icustay_id, RRT order by charttime)\n", + " else null\n", + " end as charttime_lag\n", + " , charttime\n", + " , RRT\n", + " , RRT_start\n", + " , RRT_end\n", + " -- calculate the time since the last event\n", + " , case\n", + " -- non-null iff the current observation indicates settings are present\n", + " when RRT=1 then\n", + " CHARTTIME -\n", + " (\n", + " LAG(CHARTTIME, 1) OVER\n", + " (\n", + " partition by icustay_id, RRT\n", + " order by charttime\n", + " )\n", + " )\n", + " else null\n", + " end as CRRT_duration\n", + "\n", + " -- now we determine if the current event is a new instantiation\n", + " , case\n", + " when RRT_start = 1\n", + " then 1\n", + " -- if there is an end flag, we mark any subsequent event as new\n", + " when RRT_end = 1\n", + " -- note the end is *not* a new event, the *subsequent* row is\n", + " -- so here we output 0\n", + " then 0\n", + " when\n", + " LAG(RRT_end,1)\n", + " OVER\n", + " (\n", + " partition by icustay_id, case when RRT=1 or RRT_end=1 then 1 else 0 end\n", + " order by charttime\n", + " ) = 1\n", + " then 1\n", + " -- if there is less than 2 hours between CRRT settings, we do not treat this as a new CRRT event\n", + " when (CHARTTIME - (LAG(CHARTTIME, 1)\n", + " OVER\n", + " (\n", + " partition by icustay_id, case when RRT=1 or RRT_end=1 then 1 else 0 end\n", + " order by charttime\n", + " ))) <= interval '2' hour\n", + " then 0\n", + " else 1\n", + " end as NewCRRT\n", + " -- use the temp table with only settings from chartevents\n", + " FROM crrt_settings\n", + " ) AS vd1\n", + " -- now we can isolate to just rows with settings\n", + " -- (before we had rows with start/end flags)\n", + " -- this removes any null values for NewCRRT\n", + " where\n", + " RRT_start = 1 or RRT = 1 or RRT_end = 1\n", + ") AS vd2\n", + "group by icustay_id, num\n", + "having min(charttime) != max(charttime)\n", + "order by icustay_id, num\n", + "\"\"\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Extract durations from PROCEDUREEVENTS_MV\n", + "\n", + "PROCEDUREEVENTS_MV contains entries for dialysis. As a reminder from the above, we picked the following `itemid`:\n", + "\n", + "* 225802 -- Dialysis - CRRT\n", + "* 225803 -- Dialysis - CVVHD\n", + "* 225809 -- Dialysis - CVVHDF\n", + "* 225955 -- Dialysis - SCUF\n", + "\n", + "Extracting data for these entries is straightforward. Each instance of CRRT is documented with a single `starttime` and a single `stoptime`, with no need to merge together different rows." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false, + "scrolled": false + }, + "outputs": [], + "source": [ + "# extract the durations from PROCEDUREEVENTS_MV\n", + "# NOTE: we only look at a single patient as an exemplar\n", + "print(\"Durations from PROCEDUREEVENTS_MV...\")\n", + "query = 'SET search_path to ' + schema_name + \"\"\";\n", + "select icustay_id\n", + " , ROW_NUMBER() over (partition by icustay_id order by starttime, endtime) as num\n", + " , starttime, endtime\n", + "from procedureevents_mv\n", + "where itemid in\n", + "(\n", + " 225802 -- Dialysis - CRRT\n", + " , 225803 -- Dialysis - CVVHD\n", + " , 225809 -- Dialysis - CVVHDF\n", + " , 225955 -- Dialysis - SCUF\n", + ")\n", + "\"\"\" + query_where_clause + \"\"\"\n", + "order by icustay_id, num\n", + "\"\"\"\n", + "pe = pd.read_sql_query(query,con)\n", + "\n", + "display_df(pe)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Note that the above documentation is quite dilligent: the entry pauses between the first and second row for 1 hour representing an actual pause in the administration of CRRT." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "# happy with above query\n", + "query_procedureevents = 'SET search_path to ' + schema_name + \"\"\";\n", + "select icustay_id\n", + " , ROW_NUMBER() over (partition by icustay_id order by starttime, endtime) as num\n", + " , starttime, endtime\n", + "from procedureevents_mv\n", + "where itemid in\n", + "(\n", + " 225802 -- Dialysis - CRRT\n", + " , 225803 -- Dialysis - CVVHD\n", + " , 225809 -- Dialysis - CVVHDF\n", + " , 225955 -- Dialysis - SCUF\n", + ")\n", + "order by icustay_id, num\n", + "\"\"\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Roundup: data from INPUTEVENTS_MV, CHARTEVENTS, and PROCEDUREEVENTS_MV" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "print(\"Durations from INPUTEVENTS...\")\n", + "ie = pd.read_sql_query(query_inputevents,con)\n", + "display_df(ie)\n", + "\n", + "print(\"Durations from CHARTEVENTS...\")\n", + "ce = pd.read_sql_query(query_chartevents,con)\n", + "display_df(ce)\n", + "\n", + "print(\"Durations from PROCEDUREEVENTS...\")\n", + "pe = pd.read_sql_query(query_procedureevents,con)\n", + "display_df(pe)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Compare durations\n", + "\n", + "We now need to merge together the above durations into a single, master set of CRRT administrations." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "# how many PROCEDUREEVENTS_MV dialysis events encapsulate CHARTEVENTS/INPUTEVENTS_MV?\n", + "# vice-versa?\n", + "iid = 205508\n", + "idxDisplay = df['icustay_id'] == iid\n", + "\n", + "# compare the above durations\n", + "ce['source'] = 'chartevents'\n", + "ie['source'] = 'inputevents_kcl'\n", + "ie.loc[ie['label']=='Calcium','source'] = 'inputevents_ca' \n", + "pe['source'] = 'procedureevents'\n", + "df = pd.concat([ie[['icustay_id','num','starttime','endtime','source']], ce, pe])\n", + "\n", + "display_df(df)\n", + "\n", + "# 2) how many have no overlap whatsoever?\n", + "\n", + "\n", + "col_dict = {'chartevents': [247,129,191],\n", + " 'inputevents_kcl': [255,127,0],\n", + " 'inputevents_ca': [228,26,28],\n", + " 'procedureevents': [55,126,184]}\n", + "\n", + "for c in col_dict:\n", + " col_dict[c] = [x/256.0 for x in col_dict[c]]\n", + "\n", + "\n", + "fig, ax = plt.subplots(figsize=[16,10])\n", + "m = 0.\n", + "M = np.sum(idxDisplay)\n", + "\n", + "# dummy plots for legend\n", + "legend_handle = list()\n", + "for c in col_dict:\n", + " legend_handle.append(mlines.Line2D([], [], color=col_dict[c], marker='o',\n", + " markersize=15, label=c))\n", + "\n", + "for row in df.loc[idxDisplay,:].iterrows():\n", + " # row is a tuple: [index, actual_data], so we use row[1]\n", + " plt.plot([row[1]['starttime'].to_pydatetime(), row[1]['endtime'].to_pydatetime()], [0+m/M,0+m/M],\n", + " 'o-',color=col_dict[row[1]['source']],\n", + " markersize=15, linewidth=2)\n", + " m=m+1\n", + " \n", + "ax.xaxis.set_minor_locator(dates.HourLocator(byhour=[0,12],interval=1))\n", + "ax.xaxis.set_minor_formatter(dates.DateFormatter('%H:%M'))\n", + "ax.xaxis.grid(True, which=\"minor\")\n", + "ax.xaxis.set_major_locator(dates.DayLocator(interval=1))\n", + "ax.xaxis.set_major_formatter(dates.DateFormatter('\\n%d\\n%a'))\n", + "\n", + "ax.set_ylim([-0.1,1.0])\n", + "\n", + "plt.legend(handles=legend_handle,loc='best')\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "# print out the above for 10 examples\n", + "for iid in np.sort(df.icustay_id.unique()[0:10]):\n", + " iid = int(iid)\n", + " # how many PROCEDUREEVENTS_MV dialysis events encapsulate CHARTEVENTS/INPUTEVENTS_MV?\n", + " # vice-versa?\n", + " idxDisplay = df['icustay_id'] == iid\n", + "\n", + " # compare the above durations\n", + " ce['source'] = 'chartevents'\n", + " ie['source'] = 'inputevents_kcl'\n", + " ie.loc[ie['label']=='Calcium','source'] = 'inputevents_ca' \n", + " pe['source'] = 'procedureevents'\n", + " df = pd.concat([ie[['icustay_id','num','starttime','endtime','source']], ce, pe])\n", + "\n", + " display_df(df)\n", + " \n", + " # 2) how many have no overlap whatsoever?\n", + " col_dict = {'chartevents': [247,129,191],\n", + " 'inputevents_kcl': [255,127,0],\n", + " 'inputevents_ca': [228,26,28],\n", + " 'procedureevents': [55,126,184]}\n", + "\n", + " for c in col_dict:\n", + " col_dict[c] = [x/256.0 for x in col_dict[c]]\n", + "\n", + "\n", + " fig, ax = plt.subplots(figsize=[16,10])\n", + " m = 0.\n", + " M = np.sum(idxDisplay)\n", + "\n", + " # dummy plots for legend\n", + " legend_handle = list()\n", + " for c in col_dict:\n", + " legend_handle.append(mlines.Line2D([], [], color=col_dict[c], marker='o',\n", + " markersize=15, label=c))\n", + "\n", + " for row in df.loc[idxDisplay,:].iterrows():\n", + " # row is a tuple: [index, actual_data], so we use row[1]\n", + " plt.plot([row[1]['starttime'].to_pydatetime(), row[1]['endtime'].to_pydatetime()], [0+m/M,0+m/M],\n", + " 'o-',color=col_dict[row[1]['source']],\n", + " markersize=15, linewidth=2)\n", + " m=m+1\n", + "\n", + " ax.xaxis.set_minor_locator(dates.HourLocator(byhour=[0,6,12,18],interval=1))\n", + " ax.xaxis.set_minor_formatter(dates.DateFormatter('%H:%M'))\n", + " ax.xaxis.grid(True, which=\"minor\")\n", + " ax.xaxis.set_major_locator(dates.DayLocator(interval=1))\n", + " ax.xaxis.set_major_formatter(dates.DateFormatter('\\n%d-%m-%Y'))\n", + "\n", + " ax.set_ylim([-0.1,1.0])\n", + "\n", + " plt.legend(handles=legend_handle,loc='best')\n", + " plt.savefig('crrt_' + str(iid) + '.png')" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 2", + "language": "python", + "name": "python2" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 2 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython2", + "version": "2.7.12" + } + }, + "nbformat": 4, + "nbformat_minor": 1 +}