# Children with Medical Complex Conditions

## Data 
MIMIC-III (Medical Information Mart for Intensive Care III) is an open access hospital database. It contains de-identified data from over 40,000 patients who were admitted to Beth Israel Deaconess Medical Center in Boston, Massachusetts, from 2001 to 2012.
To gain authorization to access the data, go to <https://mimic.physionet.org/gettingstarted/access/>.
The data has been downloaded, however due to xx

In this project, we will make use of the following MIMIC tables:

- PATIENTS - a table containing information about each patient, one row per patient (links with ADMISSIONS on SUBJECT_ID)
- ADMISSIONS - a table containing admission and discharge dates (has a unique identifier HADM_ID for each admission and links with PATIENTS on SUBJECT_ID)
- NOTEEVENTS - contains all notes for each hospitalization (links with ADMISSIONS on HADM_ID and with PATIENTS on SUBJECT_ID)
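
The linking keys listed above can be sketched with small toy tables. This is only an illustration under stated assumptions: the three DataFrames below are made-up stand-ins (not real MIMIC rows), and the variable names `adm_pat` and `notes_full` are hypothetical.

```python
import pandas as pd

# Toy stand-ins for the three tables; all values are made up for illustration.
patients = pd.DataFrame({"SUBJECT_ID": [1, 2], "GENDER": ["F", "M"]})
admissions = pd.DataFrame({"SUBJECT_ID": [1, 1, 2], "HADM_ID": [100, 101, 200]})
notes = pd.DataFrame({
    "SUBJECT_ID": [1, 1, 2],
    "HADM_ID": [100, 100, 200],
    "TEXT": ["note a", "note b", "note c"],
})

# One row per admission, with the patient's attributes attached via SUBJECT_ID.
# validate= raises an error if SUBJECT_ID is unexpectedly duplicated in patients.
adm_pat = admissions.merge(patients, on="SUBJECT_ID", how="left",
                           validate="many_to_one")

# One row per note, linked to its admission via HADM_ID (and SUBJECT_ID).
notes_full = notes.merge(adm_pat, on=["SUBJECT_ID", "HADM_ID"], how="left")
print(notes_full)
```

A left join keeps every note even if no matching admission is found, which makes missing links visible as NaN rather than silently dropping rows.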







In [1]:
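# decompress the gzipped CSV files, writing each .csv next to its .csv.gz
import gzip
import shutil

for filename in ["data/PATIENTS.csv.gz", "data/ADMISSIONS.csv.gz", "data/NOTEEVENTS.csv.gz"]:
    # stream in chunks so the large NOTEEVENTS file is never held in memory at once
    with gzip.open(filename, 'rb') as f_in, open(filename[:-3], 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)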
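
As an alternative to decompressing on disk, pandas can read a gzip-compressed CSV directly, inferring the compression from the `.gz` extension. The sketch below writes a tiny toy file to a temporary directory so that it runs on its own; the file name `PATIENTS_toy.csv.gz` and the variable `toy_df` are hypothetical, and the real files would be read the same way from `data/`.

```python
import gzip
import os
import tempfile

import pandas as pd

# Write a tiny gzip-compressed CSV so the example is self-contained.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "PATIENTS_toy.csv.gz")
with gzip.open(path, "wt") as f:
    f.write("ROW_ID,SUBJECT_ID,GENDER\n234,249,F\n235,250,F\n")

# compression="infer" is the default, so the .gz suffix is enough.
toy_df = pd.read_csv(path)
print(toy_df)
```

This avoids keeping a second, uncompressed copy of each table on disk, at the cost of decompressing again on every read.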

In [10]:
# import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# read patients table
patient_df = pd.read_csv('data/PATIENTS.csv')

patient_df.head()

|   | ROW_ID | SUBJECT_ID | GENDER | DOB | DOD | DOD_HOSP | DOD_SSN | EXPIRE_FLAG |
|---|---|---|---|---|---|---|---|---|
| 0 | 234 | 249 | F | 2075-03-13 00:00:00 |  |  |  | 0 |
| 1 | 235 | 250 | F | 2164-12-27 00:00:00 | 2188-11-22 00:00:00 | 2188-11-22 00:00:00 |  | 1 |
| 2 | 236 | 251 | M | 2090-03-15 00:00:00 |  |  |  | 0 |
| 3 | 237 | 252 | M | 2078-03-06 00:00:00 |  |  |  | 0 |
| 4 | 238 | 253 | F | 2089-11-26 00:00:00 |  |  |  | 0 |


In [4]:
patient_df.shape

(46520, 8)

In [5]:
# full list of column names (the data categories) as a NumPy array
categories = patient_df.columns.to_numpy()

In [6]:
categories

array(['ROW_ID', 'SUBJECT_ID', 'GENDER', 'DOB', 'DOD', 'DOD_HOSP',
       'DOD_SSN', 'EXPIRE_FLAG'], dtype=object)

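A note about dates, from the MIMIC website:

All dates in the database have been shifted to protect patient confidentiality. Dates are internally consistent for the same patient, but randomly distributed in the future.

For patients older than 89 at any time in the database, DOB has additionally been shifted to obscure their true age, so these patients appear to be about 300 years old at their first admission.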
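
Because dates are only consistent within a patient, ages have to be derived from the difference between two of that patient's dates rather than read off the calendar. The sketch below is one way to do this, with assumptions: the rows are made-up toy values, `ADMITTIME` is taken to be the admission timestamp from ADMISSIONS, and `age_in_years` is a hypothetical helper. It uses calendar arithmetic instead of subtracting timestamps, since a gap of roughly 300 years (as for patients over 89) can exceed the range of a pandas nanosecond timedelta.

```python
import pandas as pd

# Toy rows with made-up values (not real MIMIC records).
df = pd.DataFrame({
    "SUBJECT_ID": [1, 2],
    "DOB": pd.to_datetime(["2100-05-10", "1850-01-01"]),
    "ADMITTIME": pd.to_datetime(["2110-05-09", "2150-01-01"]),
})

def age_in_years(dob, admit):
    """Whole years between two datetime Series, computed from calendar fields."""
    years = admit.dt.year - dob.dt.year
    # Subtract one year if the birthday has not yet occurred in the admission year.
    before_birthday = (admit.dt.month < dob.dt.month) | (
        (admit.dt.month == dob.dt.month) & (admit.dt.day < dob.dt.day)
    )
    return years - before_birthday.astype(int)

df["AGE"] = age_in_years(df["DOB"], df["ADMITTIME"])
print(df)
```

With such a column, a pediatric cohort could be selected by filtering on age, and implausibly large ages flag the patients whose DOB was shifted to hide an age above 89.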

In [9]:
# convert the date columns from strings to datetimes (unparseable values become NaT)
date_cols = ['DOB', 'DOD', 'DOD_HOSP', 'DOD_SSN']
for col in date_cols:
    patient_df[col] = pd.to_datetime(patient_df[col], format='%Y-%m-%d %H:%M:%S', errors='coerce')


In [11]:
patient_df.head()

|   | ROW_ID | SUBJECT_ID | GENDER | DOB | DOD | DOD_HOSP | DOD_SSN | EXPIRE_FLAG |
|---|---|---|---|---|---|---|---|---|
| 0 | 234 | 249 | F | 2075-03-13 00:00:00 |  |  |  | 0 |
| 1 | 235 | 250 | F | 2164-12-27 00:00:00 | 2188-11-22 00:00:00 | 2188-11-22 00:00:00 |  | 1 |
| 2 | 236 | 251 | M | 2090-03-15 00:00:00 |  |  |  | 0 |
| 3 | 237 | 252 | M | 2078-03-06 00:00:00 |  |  |  | 0 |
| 4 | 238 | 253 | F | 2089-11-26 00:00:00 |  |  |  | 0 |
