# A Quick Look at the MIMIC III Data Set



In [6]:
import warnings
warnings.simplefilter("ignore")
from cdsutils.mutils import *
from cdsutils.mysql import *
import numpy as np
import numpy.random as ra
import ipywidgets as ipw
from getpass import getpass
schema=None

MIMIC III data are stored in a relational database. This is not an exploration of relational database theory or data modeling, but here is a quick, novice-level description.

* Relational databases seek to achieve accurate data representation by eliminating (or at least reducing) data redundancies, and with them the opportunities for data inconsistencies.

This is achieved by splitting data across **tables** and then **joining** the data back together when required.
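As a rough illustration of splitting and joining, here is a minimal sketch using Python's built-in `sqlite3` module and an in-memory toy database. The table names, columns, and values are made up for the example and are not taken from MIMIC; the point is only that a patient's sex is stored once and recovered for each admission through a join on the shared key.

```python
import sqlite3

# Toy in-memory database; table names, columns, and values are illustrative only.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE patients (subject_id INTEGER PRIMARY KEY, sex TEXT);
    CREATE TABLE admissions (
        hadm_id INTEGER PRIMARY KEY,
        subject_id INTEGER REFERENCES patients(subject_id),
        admit_dt TEXT
    );
    INSERT INTO patients VALUES (1, 'F'), (2, 'M');
    INSERT INTO admissions VALUES
        (10, 1, '2101-01-05'),
        (11, 1, '2101-03-17'),
        (12, 2, '2102-07-30');
""")

# Each patient's sex is stored once, however many admissions they have.
# A join on the shared key puts the pieces back together when needed.
joined_rows = con.execute("""
    SELECT a.hadm_id, a.subject_id, p.sex, a.admit_dt
    FROM admissions AS a
    JOIN patients AS p ON p.subject_id = a.subject_id
    ORDER BY a.hadm_id
""").fetchall()

for row in joined_rows:
    print(row)
```

If a patient's sex needed correcting, only one row in `patients` would change, and every joined admission row would reflect it.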

### First we need to generate a connection to the MIMIC database

As you work through this notebook, you might occasionally get an error that looks something like this (although much longer):

```Python
OperationalError: (psycopg2.OperationalError) server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.

```

This just means that the connection with the database has timed out. All you need to do is come back up here and rerun the code below to get a new database connection.

In [3]:
conn = get_mimic_connection()
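If rerunning the cell by hand gets tedious, the reconnect step could be wrapped in a small helper. The sketch below is only one possible approach: `run_with_reconnect`, `FlakyConnection`, and `connect` are hypothetical names invented for this example, and the fake connection stands in for the real database so the code runs anywhere. In the notebook, `connect` would presumably be `get_mimic_connection`, and `retry_on` would name whichever exception class the timeout actually raises.

```python
def run_with_reconnect(connect, run_query, retry_on=(Exception,), retries=1):
    """Call run_query(conn); on a listed error, reconnect and try again.

    `connect` and `run_query` are callables supplied by the caller
    (hypothetical helper, not part of cdsutils or Ibis).
    """
    conn = connect()
    for attempt in range(retries + 1):
        try:
            return run_query(conn)
        except retry_on:
            if attempt == retries:
                raise  # out of retries: let the caller see the error
            conn = connect()  # get a fresh connection and try again


class FlakyConnection:
    """Stand-in for a database connection that may have timed out."""

    def __init__(self, healthy):
        self.healthy = healthy

    def count_rows(self):
        if not self.healthy:
            raise ConnectionError("server closed the connection unexpectedly")
        return 5844


connections_made = []


def connect():
    # The first connection is stale; later ones are healthy.
    conn = FlakyConnection(healthy=len(connections_made) > 0)
    connections_made.append(conn)
    return conn


result = run_with_reconnect(
    connect, lambda c: c.count_rows(), retry_on=(ConnectionError,)
)
print(result, len(connections_made))
```

Limiting `retry_on` to connection-type errors keeps genuine query mistakes from being silently retried.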

## Let's take a look at the tables

>Before you can do anything, you have to understand tables. If you don't have a table, you have nothing to work on. The table is the standard unit of information in a relational database. Everything revolves around tables. Tables are composed of rows and columns. And while that sounds simple, the sad truth is that tables are not simple. (*The Definitive Guide to SQLite*, p. 80 [owens2006definitive])


Since I said data are split across tables, let's look at the tables in the MIMIC II demo database.

### Take a look at the Tables in the Database

In [4]:
HTML(dlist(conn.list_tables(), ncols=7, sort=True))

0,1,2,3,4,5,6
a_chartdurations,a_iodurations,a_meddurations,additives,admissions,censusevents,chartevents
comorbidity_scores,d_caregivers,d_careunits,d_chartitems,d_chartitems_detail,d_codeditems,d_demographicitems
d_ioitems,d_labitems,d_meditems,d_parammap_items,d_patients,db_schema,deliveries
demographic_detail,demographicevents,drgevents,icd9,icustay_days,icustay_detail,icustayevents
ioevents,labevents,medevents,microbiologyevents,noteevents,parameter_mapping,poe_med
poe_order,procedureevents,totalbalevents,,,,


In [8]:
ta = conn.table("admissions", schema=schema)
ta.execute()

Unnamed: 0,hadm_id,subject_id,admit_dt,disch_dt
0,2,24807,3033-07-08 00:00:00,3033-07-17 00:00:00
1,3,7675,3388-05-16 00:00:00,3388-05-21 00:00:00
2,6,23547,3381-04-03 00:00:00,3381-04-22 00:00:00
3,10,14884,3015-08-28 00:00:00,3015-09-05 00:00:00
4,12,8652,3125-09-11 00:00:00,3125-09-22 00:00:00
...,...,...,...,...
5069,36005,29309,2541-12-11 00:00:00,2541-12-20 00:00:00
5070,36069,32711,3143-05-20 00:00:00,3143-05-22 00:00:00
5071,36071,32667,2866-02-18 00:00:00,2866-02-27 00:00:00
5072,36077,31134,2724-04-24 00:00:00,2724-04-27 00:00:00


## What is in the tables?
### Ibis provides two ways to see the definition of each table

1. `info()`
1. `schema()`

#### `info()`

In [11]:
t = conn.table("icustayevents", schema=schema)
t.info()


Table rows: 5844

Column          Type                                      Non-null #
------          ----                                      ----------
icustay_id      int32[non-nullable]                       5844
subject_id      int32[non-nullable]                       5844
intime          Timestamp(timezone=None, nullable=False)  5844
outtime         Timestamp(timezone=None, nullable=False)  5844
los             float64[non-nullable]                     5844
first_careunit  int32                                     5844
last_careunit   int32                                     5844


This output is fairly ugly, but it tells us quite a bit about the table:

- `Column`: The column name
- `Type`: This provides two pieces of information
    - The data type used to represent the data (e.g. `int32`, a 32-bit integer)
    - Whether the value is nullable (can be missing)
        - Example: `icustay_id` is represented with a 32-bit integer and CANNOT be missing
        - Example: `first_careunit` is represented with a 32-bit integer and CAN be missing
- `Non-null #`: The number of rows in the table with non-NULL values for that column
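To make the nullable versus non-nullable distinction concrete, here is a small sketch using Python's built-in `sqlite3` module rather than the MIMIC database. The `stays` table and its values are invented for the example; only the two column names are borrowed from the output above. It shows that counting a column skips missing values and that a non-nullable column refuses them.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE stays (
        icustay_id INTEGER NOT NULL,   -- non-nullable
        first_careunit INTEGER         -- nullable
    )
""")
con.executemany("INSERT INTO stays VALUES (?, ?)", [(1, 7), (2, None), (3, 4)])

# COUNT(*) counts rows; COUNT(column) counts only the non-NULL values.
total_rows, non_null_units = con.execute(
    "SELECT COUNT(*), COUNT(first_careunit) FROM stays"
).fetchone()
print(total_rows, non_null_units)

# A missing value in a non-nullable column is rejected by the database.
try:
    con.execute("INSERT INTO stays VALUES (NULL, 1)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```

When the non-null count equals the row count, as it does for every column of `icustayevents` above, the column has no missing values even if the type would allow them.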

#### `schema()`

`schema()` returns a dictionary-like object that provides the column names and the data type for each column, but does not provide any information about whether the value can be missing or not.

In [12]:
view_dict(t.schema())

icustay_id,subject_id,intime,outtime,los,first_careunit,last_careunit
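For comparison, a name-to-type mapping can also be built outside Ibis. The sketch below uses Python's built-in `sqlite3` module and a made-up, cut-down version of the table. It is a loose analogue of the idea, not what `schema()` does internally.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Illustrative table definition; not the real MIMIC schema.
con.execute("""
    CREATE TABLE icustayevents (
        icustay_id INTEGER NOT NULL,
        intime TIMESTAMP NOT NULL,
        los REAL NOT NULL,
        first_careunit INTEGER
    )
""")

# PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
column_types = {
    name: declared_type
    for _cid, name, declared_type, *_rest in con.execute(
        "PRAGMA table_info(icustayevents)"
    )
}
print(column_types)
```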


In [None]:
view_table("diagnoses_icd", conn)

### Take a look at `d_patients`

This is the table that defines the individuals in the rest of the database. Each individual is characterized by a unique identifier (`subject_id`), their sex described by a single character, a date of birth, a date of death, and a single-character flag indicating whether the patient died in the hospital.

In [13]:
view_table("d_patients", conn)


The [documentation](https://mimic.physionet.org/mimictables/patients/) tells us that this table links to `admissions` and `icustays` via the `subject_id` value.

There are three different date of death columns. You can read about the differences and decide which value you would want to use.

- `NaT` represents a __missing time__.
- `gender`: according to the documentation, "GENDER is the genotypical sex of the patient".

According to the WHO:

>Humans are born with 46 chromosomes in 23 pairs. The X and Y chromosomes determine a person’s sex. Most women are 46XX and most men are 46XY. Research suggests, however, that in a few births per thousand some individuals will be born with a single sex chromosome (45X or 45Y) (sex monosomies) and some with three or more sex chromosomes (47XXX, 47XYY or 47XXY, etc.) (sex polysomies). In addition, some males are born 46XX due to the translocation of a tiny section of the sex determining region of the Y chromosome. Similarly some females are also born 46XY due to mutations in the Y chromosome. Clearly, there are not only females who are XX and males who are XY, but rather, there is a range of chromosome complements, hormone balances, and phenotypic variations that determine sex. (["Gender and Genetics"](https://www.who.int/genomics/gender/en/index1.html#:~:text=The%20X%20and%20Y%20chromosomes,47XYY%20or%2047XXY%2C%20etc.)))

So how many different genders are in the database?

We can use the `distinct` method to get the unique values in a column:

In [None]:
t_pat = conn.table("patients", schema=schema)
t_pat['gender'].distinct().execute(limit=None)

In [None]:
t_pat.filter([t_pat.gender=='M']).count().execute(limit=None)
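For readers who think in SQL, the two Ibis expressions above correspond roughly to a `SELECT DISTINCT` and a filtered `COUNT(*)`. The sketch below runs both against a toy in-memory table using Python's built-in `sqlite3` module. The rows are invented for illustration, and the SQL that Ibis actually generates may differ in its details.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE patients (subject_id INTEGER, gender TEXT)")
con.executemany(
    "INSERT INTO patients VALUES (?, ?)",
    [(1, "F"), (2, "M"), (3, "M"), (4, "F"), (5, "M")],  # made-up rows
)

# Unique values in a column, analogous to t_pat['gender'].distinct()
distinct_genders = [
    g for (g,) in con.execute("SELECT DISTINCT gender FROM patients ORDER BY gender")
]

# Number of rows matching a condition, analogous to filter(...).count()
male_count = con.execute(
    "SELECT COUNT(*) FROM patients WHERE gender = 'M'"
).fetchone()[0]

print(distinct_genders, male_count)
```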

## How many total patients are there?

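- `count()` counts the number of rows in the table
- A note about `execute()`: Ibis expressions are lazy, so no query is sent to the database until `execute()` is called. Passing `limit=None` asks for the full result rather than a row-limited one.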

In [None]:
t_pat.count().execute(limit=None)

### Look at [`admissions`](https://mimic.physionet.org/mimictables/admissions/) 

In [None]:
view_table("admissions", conn)

In addition to the admission and discharge information, this table also contains demographic information.

### Examine [`prescriptions`](https://mimic.physionet.org/mimictables/prescriptions/)

For a patient being given medication (a medication event), we would want to know things like who received the medicine, who gave it, what the medicine was, when it was given, and so on.

Examining the `prescriptions` table, we can see the nature of a relational database.


In [None]:
t_pre = conn.table("prescriptions", schema=schema)
t_pre.info()

In [None]:
display(view_dict(conn.table("prescriptions", schema=schema).schema()))
view_table("prescriptions", conn)