## NOTE : This notebook was originally created by Dr. Brian Chapman and others.  It has been modified slightly for our 2017 course.

# Identifying Patient Cohorts in [MIMIC-II](http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3124312/)


[MIMIC-II](https://physionet.org/mimic2/mimic2_clinical_overview.shtml) is a freely available database of ICU patients. To access the full database (now migrated to [MIMIC-III](https://www.nature.com/articles/sdata201635.pdf)), you must sign a data use agreement (DUA). However, there is a [demo data set](https://physionet.org/mimic2/demo/) based on 4000 deceased patients that can be used without signing a DUA.

## How to Use the MIMIC-II Database
* [MIMIC-II Cookbook](https://physionet.org/mimic2/demo/MIMICIICookBook_v1.pdf)
* [MIMIC Data Dictionaries](http://physionet.incor.usp.br/physiobank/database/dictionaries/)


## The Varieties of...Data
The data set covers many data types, which makes it a good resource for exploring the varieties of clinical data.

![MIMIC Paper](./images/mimic_paper_header.jpg)
(Sources : https://mimic.physionet.org/)

The data includes free-text notes (nursing, radiology, discharge summaries, etc.), input/output events, test results, procedure codes, diagnosis codes, etc.

# Very Short FAQ : 
* Q : What is the difference between MIMIC-II and MIMIC-III?
* A : MIMIC-II spans the time period of 2001 to 2008.  MIMIC-III spans 2001 to 2012, so it contains more data.  In addition, some data structures have been improved to make MIMIC-III easier to work with, and some data quality issues have been resolved.


* Q : How can I get access to MIMIC-III for my own research?
* A : You'll need to do CITI training and then some other steps.  Start here: https://mimic.physionet.org/gettingstarted/access/

In [1]:
%matplotlib inline

In [2]:
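import getpass

import pandas as pd
import pymysql
import seaborn as sns
# display() is used in later cells; importing it explicitly also works outside Jupyter
from IPython.display import display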

In [4]:
conn = pymysql.connect(host="mysql",
                       port=3306,user="jovyan",
                       passwd=getpass.getpass("Enter MySQL passwd for jovyan"),db='mimic2')
cursor = conn.cursor()

Enter MySQL passwd for jovyan········


## Example Query: Identifying ICD9 Codes for Patients

In [5]:
icd9_codes = pd.read_sql('SELECT subject_id, code, description from icd9',conn)


In [6]:
# normalize=True returns each description's share of all rows (a fraction), not a raw count
icd9_counts = icd9_codes["description"].value_counts(normalize=True).to_frame(name="ICD9 Counts")
icd9_counts.head(10)

Unnamed: 0,ICD9 Counts
UNSPECIFIED ESSENTIAL HYPERTENSION,0.027176
CONGESTIVE HEART FAILURE UNSPECIFIED,0.025083
ATRIAL FIBRILLATION,0.022574
ACUTE RESPIRATORY FAILURE,0.016785
ACUTE RENAL FAILURE UNSPECIFIED,0.016068
CORONARY ATHEROSCLEROSIS OF NATIVE CORONARY ARTERY,0.015389
DIABETES MELLITUS WITHOUT COMPLICATION TYPE II OR,0.012843
URINARY TRACT INFECTION SITE NOT SPECIFIED,0.011429
CONGESTIVE HEART FAILURE \r,0.009712
PNEUMONIA ORGANISM UNSPECIFIED,0.009184
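
The values in the table above are fractions of all diagnosis rows, not raw counts. As a minimal sketch of that calculation, the block below computes the same kind of proportion by hand with the standard library. The list `descriptions` is made-up toy data, not MIMIC content, and `Counter` is used here only for illustration in place of pandas.

```python
from collections import Counter

# Hypothetical diagnosis descriptions, one entry per coded diagnosis row.
descriptions = [
    "PNEUMONIA",
    "HYPERTENSION",
    "HYPERTENSION",
    "ATRIAL FIBRILLATION",
]

counts = Counter(descriptions)
total = sum(counts.values())

# Share of all rows carrying each description, most frequent first.
fractions = {name: n / total for name, n in counts.most_common()}
print(fractions)
```

Because every row contributes to exactly one description, the fractions sum to 1.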


# Selecting Cohorts

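Let's find some pneumonia patients by selecting the admissions that carry an ICD9 code starting with 486 (pneumonia, organism unspecified).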

In [11]:
display(pd.read_sql_query("SELECT d.HADM_ID FROM  icd9 d    WHERE  (code like '486%%')  GROUP BY d.HADM_ID",conn)[:5])

Unnamed: 0,HADM_ID
0,3
1,15
2,130
3,167
4,202


## Select all patients, with or without a pneumonia ICD9 code
### [Codes obtained from CDC](http://www.icd9data.com/2012/Volume1/460-519/480-488/486/486.htm)

Get a list of admission IDs, each labeled 1 if the admission has a pneumonia diagnosis code and 0 otherwise.

In [40]:
pneumonia_query = """
    SELECT 
a.subject_id
,a.hadm_id
,a.admit_dt
,(CASE WHEN pneu.HADM_ID IS NOT NULL THEN 1 ELSE 0 END) as Encounter_Pneumonia_Diagnosis
FROM admissions a
LEFT JOIN 
(
    SELECT
    d.HADM_ID
    FROM  icd9 d
    WHERE 
        (code like '486%%')
    GROUP BY d.HADM_ID
) pneu
ON a.HADM_ID = pneu.HADM_ID LIMIT 10
"""

adm_data = pd.read_sql(pneumonia_query, conn)
# Show the admissions that were just loaded (the query itself is limited to 10 rows)
adm_data.head(20)

Unnamed: 0,subject_id,hadm_id,admit_dt,Encounter_Pneumonia_Diagnosis
0,56,28766,2644-01-17 00:00:00,0
1,3,2075,2682-09-07 00:00:00,0
2,21,20666,3138-10-29 00:00:00,0
3,21,20882,3139-03-19 00:00:00,0
4,12,12532,2875-09-26 00:00:00,0
5,26,15067,3079-03-03 00:00:00,0
6,37,18052,3264-08-14 00:00:00,1
7,31,15325,2678-08-21 00:00:00,1
8,61,7149,3352-06-23 00:00:00,0
9,61,5712,3353-01-10 00:00:00,0
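
The labeling pattern used in the query above (a `LEFT JOIN` against a de-duplicated subquery, plus a `CASE` expression that turns a match into 1 and a miss into 0) can be tried without database access. The sketch below is only an illustration under stated assumptions: it uses an in-memory SQLite database instead of the course's MySQL server, and the tables, rows, and the column name `pneumonia_flag` are all made up.

```python
import sqlite3

# Toy stand-ins for the admissions and icd9 tables (all rows are invented).
conn_demo = sqlite3.connect(":memory:")
conn_demo.executescript("""
    CREATE TABLE admissions (subject_id INTEGER, hadm_id INTEGER);
    CREATE TABLE icd9 (subject_id INTEGER, hadm_id INTEGER, code TEXT);
    INSERT INTO admissions VALUES (1, 100), (2, 200), (3, 300);
    INSERT INTO icd9 VALUES
        (1, 100, '401.9'),
        (2, 200, '486'),
        (2, 200, '486'),   -- same code recorded twice for one admission
        (3, 300, '428.0');
""")

flag_query = """
SELECT a.subject_id,
       a.hadm_id,
       CASE WHEN pneu.hadm_id IS NOT NULL THEN 1 ELSE 0 END AS pneumonia_flag
FROM admissions a
LEFT JOIN (
    -- GROUP BY keeps one row per admission, so the join cannot duplicate admissions
    SELECT hadm_id FROM icd9 WHERE code LIKE '486%' GROUP BY hadm_id
) pneu ON a.hadm_id = pneu.hadm_id
ORDER BY a.hadm_id
"""
rows = conn_demo.execute(flag_query).fetchall()
print(rows)
```

Every admission appears exactly once in the result. Only the admission with a matching code gets a flag of 1, and admissions with no match are kept with a flag of 0 because the join is a `LEFT JOIN`.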


## Pull radiology notes

In [37]:
chest_xray_note_query = """
SELECT
subject_id
,hadm_id
,LTRIM(RTRIM(text)) as text
FROM noteevents
WHERE category = 'RADIOLOGY_REPORT'
    AND (text like '%%CHEST (PORTABLE AP)%%' OR text like '%%CHEST (PA & LAT)%%')
    AND subject_id is not NULL
    AND hadm_id is not NULL
GROUP BY subject_id, hadm_id, text LIMIT 5
"""
chest_xray_note_df = pd.read_sql_query(chest_xray_note_query, conn)
display(chest_xray_note_df)

Unnamed: 0,subject_id,hadm_id,text
0,3,2075,\n\n\n DATE: [**2682-9-10**] 5:22 AM\n ...
1,3,2075,\n\n\n DATE: [**2682-9-11**] 4:06 PM\n ...
2,3,2075,\n\n\n DATE: [**2682-9-11**] 8:05 AM\n ...
3,3,2075,\n\n\n DATE: [**2682-9-12**] 6:32 AM\n ...
4,3,2075,\n\n\n DATE: [**2682-9-13**] 6:01 AM\n ...
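
The filtering logic in the radiology query (restrict by category, match the exam name with `LIKE`, drop rows with missing identifiers, trim surrounding whitespace) can be checked on toy data. This is a minimal sketch, assuming an in-memory SQLite database rather than the course's MySQL server. The rows are invented, and a single `%` wildcard is used because no string formatting is applied to the query here.

```python
import sqlite3

conn_notes = sqlite3.connect(":memory:")
conn_notes.execute(
    "CREATE TABLE noteevents "
    "(subject_id INTEGER, hadm_id INTEGER, category TEXT, text TEXT)"
)
# Invented rows: only the first one should pass every filter.
conn_notes.executemany(
    "INSERT INTO noteevents VALUES (?, ?, ?, ?)",
    [
        (1, 100, "RADIOLOGY_REPORT", "  CHEST (PORTABLE AP): no infiltrate  "),
        (1, 100, "RADIOLOGY_REPORT", "CT HEAD W/O CONTRAST"),          # wrong exam
        (2, None, "RADIOLOGY_REPORT", "CHEST (PA & LAT): clear"),      # no hadm_id
        (3, 300, "NURSING", "CHEST (PORTABLE AP) noted by nursing"),   # wrong category
    ],
)

toy_note_query = """
SELECT subject_id, hadm_id, LTRIM(RTRIM(text)) AS text
FROM noteevents
WHERE category = 'RADIOLOGY_REPORT'
  AND (text LIKE '%CHEST (PORTABLE AP)%' OR text LIKE '%CHEST (PA & LAT)%')
  AND subject_id IS NOT NULL
  AND hadm_id IS NOT NULL
"""
notes = conn_notes.execute(toy_note_query).fetchall()
print(notes)
```

Each excluded row fails exactly one condition, which makes it easy to see what each clause in the `WHERE` contributes.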


## Select notes from patients with or without a pneumonia ICD9 code

In [48]:
notes_query='''
YOU NEED TO WRITE THIS SQL'''

In [47]:
pneumonia_note_df = \
pd.read_sql(notes_query,conn)
pneumonia_note_df

Unnamed: 0,subject_id,hadm_id,text,Encounter_Pneumonia_Diagnosis
0,3,2075,\n\n\n DATE: [**2682-9-10**] 5:22 AM\n ...,0
1,3,2075,\n\n\n DATE: [**2682-9-11**] 4:06 PM\n ...,0
2,3,2075,\n\n\n DATE: [**2682-9-11**] 8:05 AM\n ...,0
3,3,2075,\n\n\n DATE: [**2682-9-12**] 6:32 AM\n ...,0
4,3,2075,\n\n\n DATE: [**2682-9-13**] 6:01 AM\n ...,0
5,3,2075,\n\n\n DATE: [**2682-9-14**] 11:41 PM\n ...,0
6,3,2075,\n\n\n DATE: [**2682-9-7**] 10:23 PM\n ...,0
7,3,2075,\n\n\n DATE: [**2682-9-7**] 6:16 PM\n ...,0
8,3,2075,\n\n\n DATE: [**2682-9-8**] 1:00 AM\n ...,0
9,3,2075,\n\n\n DATE: [**2682-9-8**] 4:43 PM\n ...,0


<br/><br/>This material was presented as part of the DeCART Data Science for the Health Science Summer Program at the University of Utah in 2019.<br/>
Presenters : Dr. Wendy Chapman, Kelly Peterson, Alec Chapman, Jianlin Shi <br> Acknowledgement: Many thanks to Olga Patterson, because parts of these materials are adapted from Olga Patterson's previous work.