# A Quick Look at the MIMIC III Data Set



In [None]:
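import warnings
warnings.simplefilter("ignore")
from cdsutils.mutils import *
from cdsutils.sqlite import *
import matplotlib.pyplot as plt
# Explicit imports for names used later (pd, ipw, fixed, HTML, display); the star imports may also provide them
import pandas as pd
import ipywidgets as ipw
from ipywidgets import fixed
from IPython.display import HTML, display
from dminteract.creator.utils import *
%matplotlib inline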

In [None]:
!cp /home/shared/mimic3.db .

MIMIC III data are stored in a relational database. This is not an exploration of relational database theory or data modeling, but here is a quick description from a novice's point of view.

* Relational databases seek to represent data accurately by eliminating (or at least reducing) data redundancy, and with it the opportunities for inconsistent data.

This is achieved by splitting data across **tables** and then **joining** the data back together when required.
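To make the idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module and an in-memory database. The two tiny tables borrow MIMIC-style column names, but their rows are made up for illustration: each patient's `gender` is stored once in `patients`, each admission refers back to its patient through `subject_id`, and a `JOIN` reassembles the combined view when it is needed.

```python
import sqlite3

# In-memory database with two small, made-up tables (illustrative values only)
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE patients (subject_id INTEGER PRIMARY KEY, gender TEXT);
CREATE TABLE admissions (
    hadm_id INTEGER PRIMARY KEY,
    subject_id INTEGER REFERENCES patients(subject_id),
    admission_type TEXT
);
INSERT INTO patients VALUES (1, 'F'), (2, 'M');
INSERT INTO admissions VALUES (100, 1, 'EMERGENCY'), (101, 1, 'ELECTIVE'), (102, 2, 'EMERGENCY');
""")

# gender is stored once per patient rather than on every admission row;
# the JOIN puts the pieces back together on subject_id
rows = db.execute("""
SELECT a.hadm_id, a.subject_id, p.gender, a.admission_type
FROM admissions AS a
JOIN patients AS p ON a.subject_id = p.subject_id
ORDER BY a.hadm_id
""").fetchall()

for row in rows:
    print(row)
```

If patient 1's `gender` ever had to be corrected, only one row in `patients` would change, and every joined admission would pick up the fix.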

### First we need to generate a connection to the MIMIC database

As you work through this notebook, you might occasionally get an error that looks something like this (although much longer):

```
OperationalError: (psycopg2.OperationalError) server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.

```

This just means that the connection with the database has timed out. All you need to do is come back up here and rerun the code below to get a new database connection.

In [None]:
conn = get_mimic_connection()
conn.list_tables()
schema=None

## Let's take a look at the tables

>Before you can do anything, you have to understand tables. If you don't have a table, you have nothing to work on. The table is the standard unit of information in a relational database. Everything revolves around tables. Tables are composed of rows and columns. And while that sounds simple, the sad truth is that tables are not simple. (*The Definitive Guide to SQLite*, p. 80 [owens2006definitive])


Since data are split across tables, let's look at the tables in the MIMIC III demo database.

### Take a look at the tables in the database

In [None]:
HTML(dlist(conn.list_tables(schema=schema), ncols=7, sort=True))
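Because the notebook's data live in a SQLite file (`mimic3.db`), the same list can presumably also be read straight from SQLite's built-in `sqlite_master` catalog, without going through Ibis. A minimal sketch with Python's `sqlite3` module; it runs on a throwaway in-memory database so it works anywhere, and the helper name `list_tables` is hypothetical (it is not the Ibis method of the same name).

```python
import sqlite3

def list_tables(db):
    """Hypothetical helper: return the sorted names of user tables in a SQLite database."""
    rows = db.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "  # skip SQLite's internal tables
        "ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]

# Demo on an in-memory database; sqlite3.connect("mimic3.db") would target the copied file instead
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE patients (subject_id INTEGER)")
db.execute("CREATE TABLE icustays (icustay_id INTEGER)")
print(list_tables(db))  # ['icustays', 'patients']
```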

## MIMIC III is well documented

- You can read about each table [here](https://mimic.physionet.org/mimictables/).

- As an example, we can look at [microbiologyevents](https://mimic.physionet.org/mimictables/microbiologyevents/)

## What is in the tables?
### Ibis provides two ways to see the definitions of each table

1. `info()`
1. `schema()`

#### `info()`

In [None]:
t = conn.table("icustays")
t.info()


This output is fairly ugly, but it tells us quite a bit about the table:

- `Column`: This is the column name
- `Type`: This provides two pieces of information
    - The data type used to represent the data (e.g., `int32`, a 32-bit integer)
    - Whether the value is `nullable` (can be missing)
        - Example: `row_id` is represented with a 32-bit integer and CANNOT be missing
        - Example: `outtime` is represented with a `TimeStamp` and CAN be missing
- `Non-NULL #`: The number of rows in the table with non-NULL values for that column
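In SQL terms, a number like `Non-NULL #` can be reproduced with `COUNT(column)`, which skips NULLs, whereas `COUNT(*)` counts every row; "cannot be missing" corresponds to a `NOT NULL` constraint. A small sketch with `sqlite3` on a made-up three-row table (the rows are invented; only the column names come from `icustays`):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE icustays (row_id INTEGER NOT NULL, outtime TEXT)")
# Made-up rows: one stay has no outtime recorded (NULL)
db.executemany(
    "INSERT INTO icustays VALUES (?, ?)",
    [(1, "2101-10-20 19:10:11"), (2, None), (3, "2191-03-20 12:00:00")],
)

# COUNT(*) counts rows; COUNT(outtime) counts only rows where outtime is not NULL
total, non_null_outtime = db.execute(
    "SELECT COUNT(*), COUNT(outtime) FROM icustays"
).fetchone()
print(total, non_null_outtime)  # 3 2

# A NOT NULL column rejects missing values
try:
    db.execute("INSERT INTO icustays VALUES (NULL, '2101-01-01 00:00:00')")
    rejected = False
except sqlite3.IntegrityError as err:
    rejected = True
    print("rejected:", err)
```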

#### `schema()`

`schema()` returns a dictionary-like object that maps each column name to its data type, but it does not tell you whether the value can be missing.
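For comparison, SQLite's `PRAGMA table_info` reports each column's declared type and, in a separate `notnull` field, whether the column carries a NOT NULL constraint. A minimal sketch that builds a `schema()`-style name-to-type mapping from it; the table is a made-up example, and the mapping holds SQLite's declared types rather than Ibis data types.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE icustays (row_id INTEGER NOT NULL, icustay_id INTEGER, los REAL, outtime TEXT)"
)

# Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk)
info = db.execute("PRAGMA table_info(icustays)").fetchall()

# schema()-like view: column name -> declared type, with no nullability
schema = {name: col_type for _, name, col_type, _, _, _ in info}
# Nullability lives in its own field
not_null = [name for _, name, _, notnull, _, _ in info if notnull]

print(schema)    # {'row_id': 'INTEGER', 'icustay_id': 'INTEGER', 'los': 'REAL', 'outtime': 'TEXT'}
print(not_null)  # ['row_id']
```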

In [None]:
itview(conn.table("labevents"))

In [None]:
itview(t.projection(["subject_id", "icustay_id", "los"]))

In [None]:
t.los.execute().describe()

In [None]:
t.projection(["subject_id", "icustay_id", "los"]).filter(t.los > 35).execute()

In [None]:
view_dict(t.schema())

## Look at [`diagnoses_icd`](https://mimic.physionet.org/mimictables/diagnoses_icd/)

- You might want to use a search engine like [this one](https://www.findacode.com/search/search.php) to look up ICD-9-CM codes.

In [None]:
itview(conn.table("diagnoses_icd"))

## Things to notice

- `icd9_code` values are ICD-9 codes stored without the decimal point (e.g., `4019` for 401.9). They carry no description on their own; they are references (foreign keys) to the definitions in `d_icd_diagnoses`.
- `seq_num` ranks the codes: it "provides the order in which the ICD diagnoses relate to the patient".
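Turning codes into readable diagnoses is therefore a join on `icd9_code`. A sketch of that logic in pandas on a few made-up rows; the column names follow MIMIC III (`subject_id`, `hadm_id`, `seq_num`, `icd9_code`, `long_title`), but the patient, admission, and ordering are invented, not taken from the database.

```python
import pandas as pd

# Made-up rows shaped like diagnoses_icd (codes stored without the decimal point)
diagnoses = pd.DataFrame({
    "subject_id": [7, 7, 7],
    "hadm_id": [500, 500, 500],
    "seq_num": [2, 1, 3],
    "icd9_code": ["4019", "4280", "42731"],
})

# Matching definitions shaped like d_icd_diagnoses
definitions = pd.DataFrame({
    "icd9_code": ["4019", "4280", "42731"],
    "long_title": [
        "Unspecified essential hypertension",
        "Congestive heart failure, unspecified",
        "Atrial fibrillation",
    ],
})

# Look up each code's description (each code should appear once in the definitions),
# then order the diagnoses by seq_num within the admission
labeled = (
    diagnoses.merge(definitions, on="icd9_code", how="left", validate="many_to_one")
    .sort_values(["hadm_id", "seq_num"])
    .reset_index(drop=True)
)
print(labeled[["seq_num", "icd9_code", "long_title"]])
```

`how="left"` keeps every diagnosis row even if a code has no matching definition; its `long_title` would then be `NaN`.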


### Take a look at [`patients`](https://mimic.physionet.org/mimictables/patients/)



In [None]:
itview(conn.table("patients"))

The [documentation](https://mimic.physionet.org/mimictables/patients/) tells us that this table links to `admissions` and `icustays` via the `subject_id` value.

There are three different date-of-death columns (`dod`, `dod_hosp`, and `dod_ssn`). You can read about the differences and decide which one you would want to use.

- `NaT` represents a __missing time__.
- `gender`: "GENDER is the genotypical sex of the patient"
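`NaT` ("not a time") is pandas' missing-value marker for datetime columns, and it behaves much like `NaN`: `isna()` detects it and aggregations such as `min()` skip it. A small sketch on invented dates (only the column name `dod` is borrowed from `patients`):

```python
import pandas as pd

# Made-up date-of-death values; None becomes NaT in a datetime column
pat = pd.DataFrame({
    "subject_id": [1, 2, 3],
    "dod": pd.to_datetime(["2150-05-01", None, "2130-01-15"]),
})

print(pat["dod"].isna().sum())    # number of patients with no recorded date of death
print(pat["dod"].notna().mean())  # fraction of patients with a recorded date
print(pat["dod"].min())           # NaT is skipped when taking the minimum
```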

According to the WHO:

>Humans are born with 46 chromosomes in 23 pairs. The X and Y chromosomes determine a person’s sex. Most women are 46XX and most men are 46XY. Research suggests, however, that in a few births per thousand some individuals will be born with a single sex chromosome (45X or 45Y) (sex monosomies) and some with three or more sex chromosomes (47XXX, 47XYY or 47XXY, etc.) (sex polysomies). In addition, some males are born 46XX due to the translocation of a tiny section of the sex determining region of the Y chromosome. Similarly some females are also born 46XY due to mutations in the Y chromosome. Clearly, there are not only females who are XX and males who are XY, but rather, there is a range of chromosome complements, hormone balances, and phenotypic variations that determine sex. (["Gender and Genetics"](https://www.who.int/genomics/gender/en/index1.html#:~:text=The%20X%20and%20Y%20chromosomes,47XYY%20or%2047XXY%2C%20etc.))

So how many different genders are in the database?

We can use the `distinct` method to get the unique values in a column:

In [None]:
t_pat = conn.table("patients")
t_pat['gender'].distinct().execute(limit=None)
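`distinct()` corresponds to SQL's `SELECT DISTINCT`. If you also want to know how many rows fall under each value, a `GROUP BY` with `COUNT(*)` does that. A sketch with `sqlite3` on made-up rows:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE patients (subject_id INTEGER, gender TEXT)")
db.executemany("INSERT INTO patients VALUES (?, ?)", [(1, "F"), (2, "M"), (3, "F")])

# Unique values, analogous to t_pat['gender'].distinct()
distinct = [g for (g,) in db.execute("SELECT DISTINCT gender FROM patients ORDER BY gender")]

# Unique values together with how often each occurs
counts = db.execute(
    "SELECT gender, COUNT(*) FROM patients GROUP BY gender ORDER BY gender"
).fetchall()

print(distinct)  # ['F', 'M']
print(counts)    # [('F', 2), ('M', 1)]
```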

## How many total patients are there?

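- `count()` counts the number of rows in the table.
- A note about `execute()`: Ibis expressions are lazy, so nothing is sent to the database until you call `execute()`. Its `limit` argument caps the number of rows returned; `limit=None` asks for all of them.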

In [None]:
t_pat.count().execute(limit=None)
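In SQL terms, `count()` is `SELECT COUNT(*)`, and a row limit on `execute()` plays the role of a `LIMIT` clause. A sketch with `sqlite3` on a made-up 25-row table:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE patients (subject_id INTEGER)")
db.executemany("INSERT INTO patients VALUES (?)", [(i,) for i in range(1, 26)])

# Total number of rows, analogous to t_pat.count()
(n_rows,) = db.execute("SELECT COUNT(*) FROM patients").fetchone()

# Only the first rows come back, analogous to execute(limit=10)
first_ten = db.execute(
    "SELECT subject_id FROM patients ORDER BY subject_id LIMIT 10"
).fetchall()

print(n_rows)          # 25
print(len(first_ten))  # 10
```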

### Look at [`admissions`](https://mimic.physionet.org/mimictables/admissions/) 

Because this table is wider than our display, you might want to __right-click__ the cell below and
select "Create New View For Output". This creates a new embedded window with horizontal scrolling.

In [None]:
itview(conn.table("admissions"))

In addition to admission and discharge information, this table also contains demographic information.

### Examine [`prescriptions`](https://mimic.physionet.org/mimictables/prescriptions/)



In [None]:
t_pre = conn.table("prescriptions")
t_pre.info()

In [None]:
tp = conn.table("prescriptions")
display(view_dict(tp.schema()))
itview(tp)

### Examine [`chartevents`](https://mimic.physionet.org/mimictables/chartevents/)

According to the documentation, `chartevents` links to:

- PATIENTS on SUBJECT_ID
- ADMISSIONS on HADM_ID
- ICUSTAYS on ICUSTAY_ID
- D_ITEMS on ITEMID
- CAREGIVERS on CGID
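Each of these links is many-to-one: many `chartevents` rows can point to the same `d_items` row, for example. Once tables are pulled into pandas, `DataFrame.merge` can check that assumption with `validate` and flag unmatched keys with `indicator`. A sketch on made-up rows (the `itemid` values and labels are invented):

```python
import pandas as pd

# Made-up rows shaped like chartevents and d_items
chart = pd.DataFrame({
    "subject_id": [1, 1, 2, 2],
    "itemid": [10, 20, 10, 99],  # 99 has no entry in items below
    "valuenum": [80.0, 120.0, 95.0, 1.0],
})
items = pd.DataFrame({"itemid": [10, 20], "label": ["Heart Rate", "Systolic BP"]})

# validate raises MergeError if an itemid appears more than once in items;
# indicator adds a _merge column recording whether each row found a match
merged = chart.merge(items, on="itemid", how="left", validate="many_to_one", indicator=True)
unmatched = merged[merged["_merge"] == "left_only"]
print(unmatched[["subject_id", "itemid"]])
```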

In [None]:
t_chart = conn.table("chartevents")
display(view_dict(t_chart.schema()))
itview(t_chart)

### Examine [`noteevents`](https://mimic.physionet.org/mimictables/noteevents/)

- Empty in the demo database

In [None]:
itview(conn.table("noteevents"))

In [None]:
@ipw.interact(conn = fixed(conn), table = conn.list_tables())
def display_table(table, conn):
    t = conn.table(table)
    itview(t)

In [None]:
tch = conn.table("chartevents")
tcg = conn.table("caregivers")
tdi = conn.table("d_items")
tch.join(tcg, tch.cgid==tcg.cgid).projection([tch.subject_id, tch.hadm_id, tch.icustay_id, 
                                              tch.itemid, tch.charttime, tcg.cgid, tcg.label, 
                                              tcg.description]).execute(limit=10)

In [None]:
tch = conn.table("chartevents")
tcg = conn.table("caregivers")
tdi = conn.table("d_items")
tch\
.join(tcg, tch.cgid==tcg.cgid)\
.join(tdi, tch.itemid==tdi.itemid)\
.projection([tch.subject_id, tch.hadm_id, tch.icustay_id, 
             tcg.cgid, tcg.label.name("cg_label"),tcg.description, tch.charttime,
             tdi.label, tch.value, tch.valuenum, tch.valueuom,
             tch.warning, tch.error, tch.resultstatus]).execute(limit=10)

In [None]:
tnew=tch\
.join(tcg, tch.cgid==tcg.cgid)\
.join(tdi, tch.itemid==tdi.itemid)\
.projection([tch.subject_id, tch.hadm_id, tch.icustay_id, 
             tcg.cgid, tcg.label.name("cg_label"),tcg.description, tch.charttime,
             tdi.label, tch.value, tch.valuenum, tch.valueuom,
             tch.warning, tch.error, tch.resultstatus])
itview(tnew.filter([tnew.label.like("%blood%")]))

## Note: SQLite has no native date type

- Dates are therefore not recognized as dates and come back as strings (pandas dtype `object`)

In [None]:
tch.execute(limit=10).dtypes

- I've written a function (`cast_df_times`) that converts every column whose name contains `time` to a datetime type
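The code for `cast_df_times` lives in the course utilities and is not shown here. A hypothetical stand-in, `to_datetimes`, that follows the description above might look like this with `pd.to_datetime`; treat it as a sketch of the idea, not the actual implementation. The readings are made up.

```python
import pandas as pd

def to_datetimes(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for cast_df_times: parse every column whose name contains 'time'."""
    out = df.copy()
    for col in out.columns:
        if "time" in str(col).lower():
            # errors="coerce" turns unparseable strings into NaT instead of raising
            out[col] = pd.to_datetime(out[col], errors="coerce")
    return out

# Made-up readings with the timestamp stored as text, as SQLite returns it
raw = pd.DataFrame({
    "subject_id": [1, 1],
    "charttime": ["2137-01-01 10:00:00", "2137-01-01 11:00:00"],
    "valuenum": [65.0, 70.0],
})
converted = to_datetimes(raw)
print(raw["charttime"].dtype)        # strings before conversion
print(converted["charttime"].dtype)  # a datetime dtype after conversion
```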

### How could we select BP measurements for a given patient?

In [None]:
bp = cast_df_times(tnew.filter([tnew.subject_id==40310, (tnew.label=="Arterial Blood Pressure diastolic")|
                                     (tnew.label=="Arterial Blood Pressure systolic")])\
.projection([tnew.subject_id,tnew.charttime, tnew.label, tnew.valuenum, tnew.valueuom]).execute(limit=10))
bp

### How would we analyze this?

- Column `valuenum` contains both systolic and diastolic values
- If you were really good at Ibis/SQL, which I am not, you could work this out with your query
- We'll do it after the fact using Python
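One way to do this reshaping in Python is a pivot: turn each distinct `label` into its own column, keyed by `subject_id` and `charttime`. A sketch on made-up readings using `DataFrame.pivot_table`; it is an alternative to the separate-queries approach in the following cells, not what this notebook does.

```python
import pandas as pd

# Made-up long-format readings: systolic and diastolic share the valuenum column
bp_long = pd.DataFrame({
    "subject_id": [1, 1, 1, 1],
    "charttime": pd.to_datetime(["2137-01-01 10:00", "2137-01-01 10:00",
                                 "2137-01-01 11:00", "2137-01-01 11:00"]),
    "label": ["Arterial Blood Pressure systolic", "Arterial Blood Pressure diastolic"] * 2,
    "valuenum": [120.0, 70.0, 118.0, 72.0],
})

# One row per (subject_id, charttime), one column per label;
# aggfunc="mean" averages any duplicate readings at the same time
bp_wide = bp_long.pivot_table(
    index=["subject_id", "charttime"], columns="label", values="valuenum", aggfunc="mean"
).reset_index()
bp_wide.columns.name = None  # drop the leftover "label" axis name
print(bp_wide)
```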

In [None]:
diastolic =tnew.filter([tnew.subject_id==40310, tnew.label=="Arterial Blood Pressure diastolic"])\
.projection([tnew.subject_id,tnew.charttime, tnew.label, tnew.valuenum, tnew.valueuom]).execute()
systolic =tnew.filter([tnew.subject_id==40310, tnew.label=="Arterial Blood Pressure systolic"])\
.projection([tnew.subject_id,tnew.charttime, tnew.label, tnew.valuenum, tnew.valueuom]).execute()

In [None]:
diastolic = cast_df_times(diastolic)
systolic = cast_df_times(systolic)

In [None]:
systolic.plot.line(x="charttime", y="valuenum")
diastolic.plot.line(x="charttime", y="valuenum")

In [None]:
fig1, ax1 = plt.subplots(1)
systolic.plot.line(x="charttime", y="valuenum", ax=ax1)
diastolic.plot.line(x="charttime", y="valuenum", ax=ax1)

In [None]:
diastolic = cast_df_times(tnew.filter([tnew.subject_id==40310, tnew.label=="Arterial Blood Pressure diastolic"])\
.projection([tnew.subject_id,tnew.charttime, tnew.label, tnew.valuenum.name("diastolic_BP"), tnew.valueuom]).execute())
systolic = cast_df_times(tnew.filter([tnew.subject_id==40310, tnew.label=="Arterial Blood Pressure systolic"])\
.projection([tnew.subject_id,tnew.charttime, tnew.label, tnew.valuenum.name("systolic_BP") , tnew.valueuom]).execute())

In [None]:
diastolic.head()

In [None]:
fig2, ax2 = plt.subplots(1)
systolic.plot.line(x="charttime", y="systolic_BP", ax=ax2)
diastolic.plot.line(x="charttime", y="diastolic_BP", ax=ax2)

In [None]:
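# Align on timestamps (not row position) so each systolic value pairs with the diastolic value taken at the same time
pd.merge(systolic[["subject_id", "charttime", "systolic_BP"]], diastolic[["subject_id", "charttime", "diastolic_BP"]], on=["subject_id", "charttime"], how="outer")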

## Exercises

In [None]:
for w in create_question_bank("day1_bec.yaml", tag="mimic,mf"):
    display(w)