# Children with Complex Medical Conditions

Motivation:
The motivation behind this project is to understand the framework in which children with complex chronic medical conditions and their families operate. Caring for children with several conditions is challenging for families, not only because it requires coordinating different specialties but also because care must be coordinated with school and other activities. These children grow up with multiple medical visits, multiple things to track, and multiple responsibilities at an age when their friends are just being kids.

This project first analyzes the clinical presentation of young adults to the emergency room using the MIMIC-III dataset, and then the clinical presentation of children to the emergency room using the PIC dataset.

#### Datasets: MIMIC & PIC

Few public datasets are openly available for this kind of analysis. Here, we use two publicly available datasets: the Medical Information Mart for Intensive Care III (MIMIC-III) and the Paediatric Intensive Care (PIC) database.

#### MIMIC
MIMIC-III is an open access hospital database that contains de-identified data from over 40,000 patients who were admitted to Beth Israel Deaconess Medical Center in Boston, Massachusetts, from 2001 to 2012.
To gain access to the data, go to https://mimic.physionet.org/gettingstarted/access/
MIMIC does not contain data from paediatric patients.
The data has been downloaded, however due to xx

#### PIC
PIC (Pediatric Intensive Care) is a large pediatric-specific single-center bilingual database comprising information relating to children admitted to critical care units at a large children’s hospital in China. Data includes vital signs, medications, laboratory measurements, fluid balance, diagnostic codes, hospital length of stay, survival data, and more. 
To gain access to the data, go to http://pic.nbscn.org/

-----------------------

"The databases are released under the Health Insurance Portability and Accountability Act (HIPAA) safe harbor provision."



## Part 1: Young Adults 



## 1.1. Data 

In this project, we will make use of the following MIMIC tables:

- PATIENTS - a table containing information about each patient (links with ADMISSIONS on SUBJECT_ID)
- ADMISSIONS - a table containing admission and discharge dates (has a unique identifier HADM_ID for each admission and links with PATIENTS on SUBJECT_ID)
- NOTEEVENTS - contains all notes for each hospitalization (links with ADMISSIONS on HADM_ID and with PATIENTS on SUBJECT_ID)
- ICUSTAYS - a table containing ICU stays (links with ADMISSIONS on HADM_ID and with PATIENTS on SUBJECT_ID)
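
As a minimal sketch of how these keys link the tables, the toy frames below (made-up values, not MIMIC data) join admissions to patients on SUBJECT_ID and ICU stays to admissions on HADM_ID. The frame names `patients`, `admissions`, `icustays`, `adm_pat` and `stays` are used only for this illustration.

```python
import pandas as pd

# Toy stand-ins for the MIMIC tables; all values are made up for illustration.
patients = pd.DataFrame({"SUBJECT_ID": [1, 2], "GENDER": ["F", "M"]})
admissions = pd.DataFrame({"SUBJECT_ID": [1, 1, 2], "HADM_ID": [100, 101, 200]})
icustays = pd.DataFrame({"HADM_ID": [100, 101, 101, 200],
                         "ICUSTAY_ID": [9000, 9001, 9002, 9003]})

# One row per admission, with patient attributes attached via SUBJECT_ID.
adm_pat = admissions.merge(patients, on="SUBJECT_ID", how="left")

# One row per ICU stay, with admission and patient attributes attached via HADM_ID.
stays = icustays.merge(adm_pat, on="HADM_ID", how="left")

print(stays)
```

Joining ICU stays on HADM_ID keeps one row per ICU stay, whereas joining them on SUBJECT_ID alone would pair every admission of a patient with every ICU stay of that patient.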




In [None]:
# decompress the gzip-compressed CSV files
import gzip
import shutil

for filename in ["data/PATIENTS.csv.gz", "data/ADMISSIONS.csv.gz", "data/NOTEEVENTS.csv.gz", "data/ICUSTAYS.csv.gz"]:
    # stream the data in chunks so that large files such as NOTEEVENTS are not held in memory
    with gzip.open(filename, 'rb') as f_in, open(filename[:-3], 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)
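
As an alternative to writing decompressed copies to disk, pandas can read gzip-compressed CSV files directly. The sketch below shows this on a tiny temporary file with made-up content; the file name `TOY.csv.gz` and the variable names are hypothetical and not part of the project.

```python
import gzip
import os
import tempfile

import pandas as pd

# Write a tiny gzip-compressed CSV to a temporary folder (toy content, not MIMIC data).
tmp_dir = tempfile.mkdtemp()
gz_path = os.path.join(tmp_dir, "TOY.csv.gz")
with gzip.open(gz_path, "wt") as f:
    f.write("SUBJECT_ID,GENDER\n1,F\n2,M\n")

# pandas infers gzip compression from the ".gz" extension.
toy_df = pd.read_csv(gz_path)
print(toy_df.shape)
```

Under this approach, a call such as `pd.read_csv('data/PATIENTS.csv.gz')` would replace the decompression step, at the cost of decompressing the file on every read.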

In [None]:
# import libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# import datetime 

# read patients table
patient_df = pd.read_csv('data/PATIENTS.csv')

patient_df.head()

In [None]:
# show dimensions
patient_df.shape

In [None]:
patient_df.info()

In [None]:
# full list of columns containing the data categories 
categories = patient_df.columns.to_numpy()
print(categories)


In [None]:
# read admissions table
adm_df = pd.read_csv('data/ADMISSIONS.csv')

adm_df.head()

In [None]:
# show dimensions
adm_df.shape

In [None]:
# full list of columns containing the data categories 
categories = adm_df.columns.to_numpy()
print(categories)

In [None]:
adm_df.info()

In [None]:
# read ICU stays table
icustays_df = pd.read_csv('data/ICUSTAYS.csv')

icustays_df.head()

In [None]:
# show dimensions
icustays_df.shape

In [None]:
icustays_df.info()

In [None]:
# full list of columns containing the data categories 
categories = icustays_df.columns.to_numpy()
print(categories)

In [None]:
# convert to dates (MIMIC-III timestamps have the form 'YYYY-MM-DD HH:MM:SS')
for col in ['DOB', 'DOD', 'DOD_HOSP', 'DOD_SSN']:
    patient_df[col] = pd.to_datetime(patient_df[col], format = '%Y-%m-%d %H:%M:%S', errors = 'coerce')

In [None]:
# convert to dates (MIMIC-III timestamps have the form 'YYYY-MM-DD HH:MM:SS')
for col in ['ADMITTIME', 'DISCHTIME', 'DEATHTIME']:
    adm_df[col] = pd.to_datetime(adm_df[col], format = '%Y-%m-%d %H:%M:%S', errors = 'coerce')

In [None]:
# convert to dates (a format of '%Y%m%d' does not match the timestamps and would coerce every value to NaT)
for col in ['INTIME', 'OUTTIME']:
    icustays_df[col] = pd.to_datetime(icustays_df[col], format = '%Y-%m-%d %H:%M:%S', errors = 'coerce')
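
Because `errors = 'coerce'` silently turns anything that does not match the format into `NaT`, it is worth counting the missing values after each conversion. The sketch below illustrates this on made-up strings, assuming the timestamps follow the 'YYYY-MM-DD HH:MM:SS' layout; the names `raw` and `parsed` are used only for this example.

```python
import pandas as pd

# Toy timestamps in the 'YYYY-MM-DD HH:MM:SS' layout; the last entry is deliberately malformed.
raw = pd.Series(["2101-10-20 19:08:00", "2191-03-16 00:28:00", "not a date"])

# Values that do not match the format become NaT instead of raising an error.
parsed = pd.to_datetime(raw, format="%Y-%m-%d %H:%M:%S", errors="coerce")

print(parsed)
print("unparsed values:", parsed.isna().sum())
```

A column that comes back entirely `NaT` usually points to a format string that does not match the data rather than to genuinely missing dates.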


In [None]:
# merge datasets
df_adm_pat = pd.merge(adm_df[['SUBJECT_ID','HADM_ID','ADMITTIME']],
                        patient_df[['SUBJECT_ID', 'DOB','EXPIRE_FLAG']], 
                        on = ['SUBJECT_ID'],
                        how = 'left')

assert len(adm_df) == len(df_adm_pat), 'Number of rows increased'
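
The row-count assertion above guards against duplicated keys in the patients table. A possible alternative, sketched here on toy data with made-up values, is the `validate` argument of `merge`, which raises an error as soon as the expected relationship between the tables does not hold.

```python
import pandas as pd

# Toy frames; values are made up for illustration.
patients = pd.DataFrame({"SUBJECT_ID": [1, 2], "EXPIRE_FLAG": [0, 1]})
admissions = pd.DataFrame({"SUBJECT_ID": [1, 1, 2], "HADM_ID": [100, 101, 200]})

# many_to_one: each admission may match at most one patient row.
merged = admissions.merge(patients, on="SUBJECT_ID", how="left", validate="many_to_one")
print(len(merged) == len(admissions))

# A duplicated patient row violates that assumption and raises MergeError.
patients_dup = pd.concat([patients, patients.iloc[[0]]], ignore_index=True)
try:
    admissions.merge(patients_dup, on="SUBJECT_ID", how="left", validate="many_to_one")
    duplicate_detected = False
except pd.errors.MergeError:
    duplicate_detected = True
print(duplicate_detected)
```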


In [None]:
df = pd.merge(patient_df[['SUBJECT_ID', 'DOB','EXPIRE_FLAG']],
              icustays_df[['SUBJECT_ID', 'ICUSTAY_ID','INTIME',"OUTTIME"]],
              on = ['SUBJECT_ID'], how = 'left')


df.head()

In [None]:
df_a_p_i = pd.merge(adm_df[['SUBJECT_ID','HADM_ID','ADMITTIME']],
                    pd.merge(patient_df[['SUBJECT_ID', 'DOB','EXPIRE_FLAG']],
                             icustays_df[['SUBJECT_ID', 'ICUSTAY_ID','INTIME',"OUTTIME"]],
                             on = ['SUBJECT_ID'], how = 'left'),
                    on = ['SUBJECT_ID'],
                    how = 'left')


In [None]:
df_a_p_i.head(10)

In [None]:
# sort by subject_ID and admission date

df_a_p_i = df_a_p_i.sort_values(['SUBJECT_ID','ADMITTIME','INTIME']).reset_index(drop = True)

# the cells below refer to the sorted table as df_adm_pat1
df_adm_pat1 = df_a_p_i


In [None]:
# verify that it did what we wanted

df_a_p_i.loc[df_a_p_i.SUBJECT_ID == 124,['SUBJECT_ID','ADMITTIME','INTIME']]

In [None]:
df_adm_pat1['first_admitance']= df_adm_pat1.groupby(['SUBJECT_ID'])['ADMITTIME'].transform('min')



In [None]:
# verify that it did what we wanted

df_adm_pat1.loc[df_adm_pat1.SUBJECT_ID == 124,['SUBJECT_ID','ADMITTIME','INTIME','first_admitance']]

#### Calculate the age at the time of first admission

A note about dates from MIMIC website:

DOB has been shifted further only for patients older than 89, to obscure their true age.

All dates in the database have been shifted to protect patient confidentiality. Dates will be internally consistent for the same patient, but randomly distributed in the future.
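
To illustrate why shifted dates remain usable for analysis, the sketch below applies one constant offset per patient to made-up admission and discharge dates and checks that the length of stay is unchanged. The offsets and the shifting procedure are assumptions chosen for illustration and are not the procedure used to build MIMIC.

```python
import pandas as pd

# Toy admissions with made-up dates.
stays = pd.DataFrame({
    "SUBJECT_ID": [1, 1, 2],
    "ADMITTIME": pd.to_datetime(["2012-01-05", "2012-03-01", "2010-07-20"]),
    "DISCHTIME": pd.to_datetime(["2012-01-09", "2012-03-04", "2010-07-22"]),
})

# Hypothetical per-patient offsets in days: the same offset is used for every date of a patient.
offsets = {1: 40000, 2: 52000}
shift = pd.to_timedelta(stays["SUBJECT_ID"].map(offsets), unit="D")

shifted = stays.assign(ADMITTIME=stays["ADMITTIME"] + shift,
                       DISCHTIME=stays["DISCHTIME"] + shift)

# Intervals within a patient are preserved even though the absolute dates move.
los_before = (stays["DISCHTIME"] - stays["ADMITTIME"]).dt.days
los_after = (shifted["DISCHTIME"] - shifted["ADMITTIME"]).dt.days
print(los_before.equals(los_after))
```

Quantities that depend only on differences between dates of the same patient, such as age at admission or length of stay, are therefore unaffected, while comparisons of absolute dates across patients are not meaningful.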

To determine the mortality rate, we must first select the proper age group. I call them young adults: patients between 16 and 26 years old, who are most likely still covered by their parents' health insurance, if they have any.

These young adults are between 16 and 26 years old at the date of their first admission. To select them, we use patient admission dates and dates of birth.
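
The selection described above can be sketched on toy data as follows. The dates are made up, and the column names `first_admission` and `young_adult` are introduced only for this example.

```python
import pandas as pd

# Toy data: two patients, the first with two admissions; all dates are made up.
toy = pd.DataFrame({
    "SUBJECT_ID": [1, 1, 2],
    "ADMITTIME": pd.to_datetime(["2150-03-01", "2151-06-15", "2130-01-10"]),
    "DOB": pd.to_datetime(["2130-02-01", "2130-02-01", "2100-05-20"]),
})

# Earliest admission per patient, repeated on each of the patient's rows.
toy["first_admission"] = toy.groupby("SUBJECT_ID")["ADMITTIME"].transform("min")

# Age in whole years at first admission (365-day years, as an approximation).
toy["age_first_adm"] = (toy["first_admission"] - toy["DOB"]).dt.days // 365

# Flag patients aged 16 to 26 inclusive at first admission.
toy["young_adult"] = toy["age_first_adm"].between(16, 26)

print(toy[["SUBJECT_ID", "age_first_adm", "young_adult"]])
```

Using `transform('min')` keeps the result aligned with the original rows, so the age can be stored next to every admission of the same patient.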

In [None]:
# first admission date per patient, and date of birth as a datetime

df_adm_pat['first_admitance'] = df_adm_pat.groupby('SUBJECT_ID')['ADMITTIME'].transform('min')

df_adm_pat['DOB'] = pd.to_datetime(df_adm_pat['DOB'], errors='coerce')




In [None]:
df_adm_pat.head()


In [None]:
# calculate age at first admission

df_adm_pat['age_first_adm'] = (df_adm_pat['first_admitance'] - df_adm_pat['DOB']).dt.days // 365 


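The dates of birth of patients older than 89 have been shifted to protect their identity, so their true age cannot be computed. 
The code below treats a negative computed age as the marker for these patients and assigns them an age of 90.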

In [None]:
# age reassignment

df_adm_pat['age_first_adm'] = [90 if x < 0 else x for x in df_adm_pat['age_first_adm']]
        

In [None]:
# age distribution

fig, axes = plt.subplots(1, 2, figsize=(10, 4))


df_adm_pat.hist('age_first_adm', bins = 50,ax=axes[0],color='orange', grid = False)
df_adm_pat[(df_adm_pat['age_first_adm']>= 16) & (df_adm_pat['age_first_adm'] <= 26)].hist('age_first_adm', bins=10,ax=axes[1],color='orange', grid = False)

axes[0].set(title='Age distribution all patients',xlabel='age (years)', ylabel='frequency' )
axes[1].set(title='Age distribution young adults',xlabel='age (years)', ylabel='frequency')

plt.show()

In [None]:
# choose patients between 16-26 years old at the time of their first admission


df_young = df_adm_pat[(df_adm_pat['age_first_adm']>= 16) & (df_adm_pat['age_first_adm'] <= 26)]


In [None]:
df_young.info()

Check how many ICU admissions each patient had 
