# HDAT Capstone Project

## Research Question - Mortality prediction in the ICU

#### Task - Build a predictive algorithm using the techniques learned in this course
#### Objective - To assess the role of machine learning algorithms in predicting mortality using the MIMIC-II dataset
#### Question - Is it possible to accurately predict mortality based on data from the first 24 hours in the ICU?
#### Study population - MIMIC-II dataset

Notes about the datasets:

1. Incorrect values - MIMIC-II was not collected for research and combines data from two different electronic medical record (EMR) systems (CareVue and Metavision). This increases the likelihood of inaccuracies in data entry and extraction.

2. Missing data/sparseness - the information recorded varies between patients because the EMR was used differently over time (e.g. a separate system was used for recording lab results or medications) and because the data were collected for clinical relevance rather than research.

All patients have a unique identifying ID (subject_id), a hospital stay ID (hadm_id) and an ICU stay ID (icustay_id). These IDs can be used to identify readmissions to hospital and ICU.
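A minimal sketch of how the three IDs can flag readmissions, using a handful of hypothetical rows rather than the real tables: a patient with more than one distinct `hadm_id` was readmitted to hospital, and a hospital stay with more than one distinct `icustay_id` contains an ICU readmission. The `toy_` names and values are illustrative only.

```python
import pandas as pd

# Hypothetical rows mimicking the three identifier columns.
toy_stays = pd.DataFrame({
    "subject_id": [1, 1, 1, 2, 3, 3],
    "hadm_id":    [10, 10, 11, 20, 30, 31],
    "icustay_id": [100, 101, 102, 200, 300, 301],
})

# Hospital readmission: more than one distinct hadm_id per patient.
hadm_per_patient = toy_stays.groupby("subject_id")["hadm_id"].nunique()
readmitted_patients = hadm_per_patient[hadm_per_patient > 1].index.tolist()

# ICU readmission within one hospital stay: more than one icustay_id per hadm_id.
icu_per_hadm = toy_stays.groupby("hadm_id")["icustay_id"].nunique()
icu_readmit_hadms = icu_per_hadm[icu_per_hadm > 1].index.tolist()

print(readmitted_patients)  # [1, 3]
print(icu_readmit_hadms)    # [10]
```

The same two `groupby` calls should apply directly to the `icu_stay` table loaded below, which carries all three ID columns.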

## Imports

In [1]:
```python
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

sns.set_style("darkgrid")
```


## Load in datasets

In [2]:
```python
patients = pd.read_csv('mimic_data/patients.csv') # https://mimic.physionet.org/mimictables/patients/
# Table purpose: Defines each SUBJECT_ID in the database, i.e. defines a single patient
# Links to: ADMISSIONS on SUBJECT_ID, ICUSTAYS on SUBJECT_ID
patients.head()
```

| Unnamed: 0 | row_id | subject_id | gender | dob | dod | dod_hosp | dod_ssn | expire_flag |
|---|---|---|---|---|---|---|---|---|
| 0 | 234 | 249 | F | 2075-03-13 00:00:00 |  |  |  | 0 |
| 1 | 235 | 250 | F | 2164-12-27 00:00:00 | 2188-11-22 00:00:00 | 2188-11-22 00:00:00 |  | 1 |
| 2 | 236 | 251 | M | 2090-03-15 00:00:00 |  |  |  | 0 |
| 3 | 237 | 252 | M | 2078-03-06 00:00:00 |  |  |  | 0 |
| 4 | 238 | 253 | F | 2089-11-26 00:00:00 |  |  |  | 0 |
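In the five rows above, `expire_flag` is 1 only for the patient with a recorded `dod`. The sketch below checks that relationship on those displayed rows, assuming (the notebook does not state it) that `expire_flag` simply means "a date of death is recorded". Because `dod` appears to merge the in-hospital (`dod_hosp`) and Social Security (`dod_ssn`) dates, `expire_flag` may also count deaths after discharge, so it is not by itself an ICU mortality label.

```python
import numpy as np
import pandas as pd

# The five rows displayed above, reduced to the columns needed for the check.
toy_patients_head = pd.DataFrame({
    "subject_id": [249, 250, 251, 252, 253],
    "dod": [np.nan, "2188-11-22 00:00:00", np.nan, np.nan, np.nan],
    "expire_flag": [0, 1, 0, 0, 0],
})

# Assumption: expire_flag == 1 exactly when a date of death (dod) is present.
has_dod = toy_patients_head["dod"].notna().astype(int)
consistent = (toy_patients_head["expire_flag"] == has_dod).all()
print(consistent)  # True
```

On the full table, the rows breaking the assumption (if any) would be `patients[patients['expire_flag'] != patients['dod'].notna().astype(int)]`.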


In [3]:
```python
patients['expire_flag'].value_counts()
```

```
0    30761
1    15759
Name: expire_flag, dtype: int64
```

In [4]:
```python
admissions = pd.read_csv('mimic_data/admissions.csv') # https://mimic.physionet.org/mimictables/admissions/
# Table purpose: Define a patient’s hospital admission, HADM_ID.
# Links to PATIENTS on SUBJECT_ID
admissions.head()
```

| Unnamed: 0 | row_id | subject_id | hadm_id | admittime | dischtime | deathtime | admission_type | admission_location | discharge_location | insurance | language | religion | marital_status | ethnicity | edregtime | edouttime | diagnosis | hospital_expire_flag | has_chartevents_data |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 21 | 22 | 165315 | 2196-04-09 12:26:00 | 2196-04-10 15:54:00 |  | EMERGENCY | EMERGENCY ROOM ADMIT | DISC-TRAN CANCER/CHLDRN H | Private |  | UNOBTAINABLE | MARRIED | WHITE | 2196-04-09 10:06:00 | 2196-04-09 13:24:00 | BENZODIAZEPINE OVERDOSE | 0 | 1 |
| 1 | 22 | 23 | 152223 | 2153-09-03 07:15:00 | 2153-09-08 19:10:00 |  | ELECTIVE | PHYS REFERRAL/NORMAL DELI | HOME HEALTH CARE | Medicare |  | CATHOLIC | MARRIED | WHITE |  |  | CORONARY ARTERY DISEASE\CORONARY ARTERY BYPASS... | 0 | 1 |
| 2 | 23 | 23 | 124321 | 2157-10-18 19:34:00 | 2157-10-25 14:00:00 |  | EMERGENCY | TRANSFER FROM HOSP/EXTRAM | HOME HEALTH CARE | Medicare | ENGL | CATHOLIC | MARRIED | WHITE |  |  | BRAIN MASS | 0 | 1 |
| 3 | 24 | 24 | 161859 | 2139-06-06 16:14:00 | 2139-06-09 12:48:00 |  | EMERGENCY | TRANSFER FROM HOSP/EXTRAM | HOME | Private |  | PROTESTANT QUAKER | SINGLE | WHITE |  |  | INTERIOR MYOCARDIAL INFARCTION | 0 | 1 |
| 4 | 25 | 25 | 129635 | 2160-11-02 02:06:00 | 2160-11-05 14:55:00 |  | EMERGENCY | EMERGENCY ROOM ADMIT | HOME | Private |  | UNOBTAINABLE | MARRIED | WHITE | 2160-11-02 01:01:00 | 2160-11-02 04:27:00 | ACUTE CORONARY SYNDROME | 0 | 1 |
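The "Links to" comments describe how the tables join. A minimal sketch of that join on hypothetical miniature tables (the `toy_` frames are illustrative, not the real data): each ICU stay belongs to one admission and each admission to one patient, so both merges are many-to-one, and pandas' `validate` argument raises a `MergeError` if a key turns out to be duplicated on the "one" side.

```python
import pandas as pd

# Hypothetical miniature tables: key columns plus one field each.
toy_patients = pd.DataFrame({"subject_id": [1, 2], "gender": ["F", "M"]})
toy_admissions = pd.DataFrame({
    "subject_id": [1, 1, 2],
    "hadm_id": [10, 11, 20],
    "admission_type": ["EMERGENCY", "ELECTIVE", "EMERGENCY"],
})
toy_icu = pd.DataFrame({
    "subject_id": [1, 1, 2],
    "hadm_id": [10, 11, 20],
    "icustay_id": [100, 101, 200],
    "los": [3.2, 1.1, 2.0],
})

# ICU stay -> admission -> patient; validate guards against duplicated keys.
cohort = (
    toy_icu
    .merge(toy_admissions, on=["subject_id", "hadm_id"], how="left", validate="many_to_one")
    .merge(toy_patients, on="subject_id", how="left", validate="many_to_one")
)
print(cohort.shape)  # (3, 6)
```

Starting from the ICU-stay table keeps one row per `icustay_id`, which matches the unit of prediction in the research question.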


In [5]:
```python
icu_stay = pd.read_csv('mimic_data/icustays.csv') # https://mimic.physionet.org/mimictables/icustays/
# Table purpose: Defines each ICUSTAY_ID in the database, i.e. defines a single ICU stay.
# Links to: PATIENTS on SUBJECT_ID, ADMISSIONS on HADM_ID
icu_stay.head()
```

| Unnamed: 0 | row_id | subject_id | hadm_id | icustay_id | dbsource | first_careunit | last_careunit | first_wardid | last_wardid | intime | outtime | los |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 365 | 268 | 110404 | 280836 | carevue | MICU | MICU | 52 | 52 | 2198-02-14 23:27:38 | 2198-02-18 05:26:11 | 3.249 |
| 1 | 366 | 269 | 106296 | 206613 | carevue | MICU | MICU | 52 | 52 | 2170-11-05 11:05:29 | 2170-11-08 17:46:57 | 3.2788 |
| 2 | 367 | 270 | 188028 | 220345 | carevue | CCU | CCU | 57 | 57 | 2128-06-24 15:05:20 | 2128-06-27 12:32:29 | 2.8939 |
| 3 | 368 | 271 | 173727 | 249196 | carevue | MICU | SICU | 52 | 23 | 2120-08-07 23:12:42 | 2120-08-10 00:39:04 | 2.06 |
| 4 | 369 | 272 | 164716 | 210407 | carevue | CCU | CCU | 57 | 57 | 2186-12-25 21:08:04 | 2186-12-27 12:01:13 | 1.6202 |
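The `los` column is not labelled with a unit. For the first two rows above it matches `outtime - intime` expressed in fractional days and rounded to four decimals; the sketch below reproduces that on those two rows only, so treating it as the rule for the whole table is an assumption.

```python
import pandas as pd

# intime/outtime/los for the first two rows displayed above.
toy_icu_times = pd.DataFrame({
    "intime":  ["2198-02-14 23:27:38", "2170-11-05 11:05:29"],
    "outtime": ["2198-02-18 05:26:11", "2170-11-08 17:46:57"],
    "los":     [3.249, 3.2788],
})
for col in ("intime", "outtime"):
    toy_icu_times[col] = pd.to_datetime(toy_icu_times[col])

# Length of stay in fractional days (timedelta divided by one day), rounded to 4 dp.
elapsed = toy_icu_times["outtime"] - toy_icu_times["intime"]
toy_icu_times["los_check"] = (elapsed / pd.Timedelta(days=1)).round(4)
print(toy_icu_times[["los", "los_check"]])
```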


In [6]:
```python
pt_icu_outcome = pd.read_csv('mimic_data/pt_icu_outcome.csv')
pt_icu_outcome.head()
```

| Unnamed: 0 | row_id | subject_id | dob | hadm_id | admittime | dischtime | icustay_id | age_years | intime | outtime | los | hosp_deathtime | icu_expire_flag | hospital_expire_flag | dod | expire_flag | ttd_days |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2 | 2138-07-17 00:00:00 | 163353 | 2138-07-17 19:04:00 | 2138-07-21 15:48:00 | 243653 | 0.0 | 2138-07-17 21:20:07 | 2138-07-17 23:32:21 | 0.0918 |  | 0 | 0.0 |  | 0 |  |
| 1 | 2 | 3 | 2025-04-11 00:00:00 | 145834 | 2101-10-20 19:08:00 | 2101-10-31 13:58:00 | 211552 | 76.0 | 2101-10-20 19:10:11 | 2101-10-26 20:43:09 | 6.0646 |  | 0 | 0.0 | 2102-06-14 00:00:00 | 1 | 236.0 |
| 2 | 3 | 4 | 2143-05-12 00:00:00 | 185777 | 2191-03-16 00:28:00 | 2191-03-23 18:41:00 | 294638 | 47.0 | 2191-03-16 00:29:31 | 2191-03-17 16:46:31 | 1.6785 |  | 0 | 0.0 |  | 0 |  |
| 3 | 4 | 5 | 2103-02-02 00:00:00 | 178980 | 2103-02-02 04:31:00 | 2103-02-04 12:15:00 | 214757 | 0.0 | 2103-02-02 06:04:24 | 2103-02-02 08:06:00 | 0.0844 |  | 0 | 0.0 |  | 0 |  |
| 4 | 5 | 6 | 2109-06-21 00:00:00 | 107064 | 2175-05-30 07:15:00 | 2175-06-15 16:00:00 | 228232 | 65.0 | 2175-05-30 21:30:54 | 2175-06-03 13:39:54 | 3.6729 |  | 0 | 0.0 |  | 0 |  |
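The notebook does not show how `age_years` was derived. On the five rows above it agrees with the completed years between `dob` and the ICU `intime`; the sketch below checks that, approximating a year as 365.25 days and rounding down. Both the formula and the approximation are assumptions, useful mainly as a sanity check if age is later recomputed.

```python
import numpy as np
import pandas as pd

# dob, intime and age_years for the five rows displayed above.
toy_outcome = pd.DataFrame({
    "dob": ["2138-07-17", "2025-04-11", "2143-05-12", "2103-02-02", "2109-06-21"],
    "intime": ["2138-07-17 21:20:07", "2101-10-20 19:10:11", "2191-03-16 00:29:31",
               "2103-02-02 06:04:24", "2175-05-30 21:30:54"],
    "age_years": [0.0, 76.0, 47.0, 0.0, 65.0],
})
dob = pd.to_datetime(toy_outcome["dob"])
intime = pd.to_datetime(toy_outcome["intime"])

# Assumed definition: completed years at ICU admission (elapsed days / 365.25, floored).
age_check = np.floor((intime - dob).dt.days / 365.25)
print((age_check == toy_outcome["age_years"]).all())  # True
```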


In [7]:
```python
pt_icu_outcome['expire_flag'].value_counts(normalize=True)
```

```
0    0.606845
1    0.393155
Name: expire_flag, dtype: float64
```
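Given this split, a "model" that always predicts survival (`expire_flag = 0`) would already be correct about 60.7% of the time, so any accuracy figure should be compared against that majority-class baseline. A small helper (a hypothetical name, shown on a toy label vector) makes the comparison explicit; calling it on `pt_icu_outcome['expire_flag']` should return roughly 0.607, and it can equally be applied to `icu_expire_flag` or `hospital_expire_flag` if either is chosen as the outcome.

```python
import pandas as pd

def majority_class_baseline(labels: pd.Series) -> float:
    """Accuracy obtained by always predicting the most frequent label."""
    return float(labels.value_counts(normalize=True).max())

# Hypothetical label vector (1 = died, 0 = survived): 7 survivors out of 10.
toy_labels = pd.Series([0, 0, 0, 1, 1, 0, 0, 1, 0, 0])
print(majority_class_baseline(toy_labels))  # 0.7
```

Because the baseline depends on class balance, metrics that do not, such as the area under the ROC curve, are commonly reported alongside accuracy.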

In [8]:
```python
transfers = pd.read_csv('mimic_data/transfers.csv')
transfers['eventtype'].value_counts()
```

```
transfer     144045
discharge     58919
admit         58909
Name: eventtype, dtype: int64
```
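The totals above show 10 more `discharge` than `admit` events, and the `transfer` count alone does not say how many moves a typical admission involves. Cross-tabulating events per admission answers both; the sketch uses hypothetical rows and assumes the transfers table carries an `hadm_id` column (its columns are not displayed here).

```python
import pandas as pd

# Hypothetical rows; the real table is assumed to have an hadm_id column.
toy_transfers = pd.DataFrame({
    "hadm_id":   [10, 10, 10, 10, 20, 20],
    "eventtype": ["admit", "transfer", "transfer", "discharge", "admit", "discharge"],
})

# One row per admission, one column per event type, cells are event counts.
events_per_hadm = pd.crosstab(toy_transfers["hadm_id"], toy_transfers["eventtype"])

# Admissions whose admit and discharge counts disagree (none in this toy data).
mismatched = events_per_hadm[events_per_hadm["admit"] != events_per_hadm["discharge"]]
print(events_per_hadm)
```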

In [10]:
```python
pt_stay_hr = pd.read_csv('mimic_data/pt_stay_hr.csv')
pt_stay_hr.head()
```

| Unnamed: 0 | icustay_id | hadm_id | subject_id | intime | outtime | starttime | endtime | hr | dy |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 200001 | 152234 | 55973 | 2181-11-25 19:06:12 | 2181-11-28 20:59:25 | 2181-11-24 19:06:12 | 2181-11-24 20:06:12 | -24.0 | 0.0 |
| 1 | 200001 | 152234 | 55973 | 2181-11-25 19:06:12 | 2181-11-28 20:59:25 | 2181-11-24 20:06:12 | 2181-11-24 21:06:12 | -23.0 | 0.0 |
| 2 | 200001 | 152234 | 55973 | 2181-11-25 19:06:12 | 2181-11-28 20:59:25 | 2181-11-24 21:06:12 | 2181-11-24 22:06:12 | -22.0 | 0.0 |
| 3 | 200001 | 152234 | 55973 | 2181-11-25 19:06:12 | 2181-11-28 20:59:25 | 2181-11-24 22:06:12 | 2181-11-24 23:06:12 | -21.0 | 0.0 |
| 4 | 200001 | 152234 | 55973 | 2181-11-25 19:06:12 | 2181-11-28 20:59:25 | 2181-11-24 23:06:12 | 2181-11-25 00:06:12 | -20.0 | 0.0 |
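In the rows above, `starttime` equals `intime` plus `hr` hours and each row spans one hour, so `hr = -24` is a full day before ICU admission. Assuming that pattern holds throughout the table, the first 24 hours in the ICU (the window named in the research question) are the rows with `0 <= hr < 24`. The sketch rebuilds a hypothetical hourly grid for one stay and applies that filter.

```python
import pandas as pd

# Hypothetical hourly grid for one ICU stay: hr runs from -24 to 47.
intime = pd.Timestamp("2181-11-25 19:06:12")
toy_hr = pd.DataFrame({"icustay_id": 200001, "hr": [float(h) for h in range(-24, 48)]})
toy_hr["starttime"] = intime + pd.to_timedelta(toy_hr["hr"], unit="h")
toy_hr["endtime"] = toy_hr["starttime"] + pd.Timedelta(hours=1)

# Keep only the hours inside the first 24 hours after ICU admission.
first_day = toy_hr[(toy_hr["hr"] >= 0) & (toy_hr["hr"] < 24)]
print(len(first_day), first_day["starttime"].min(), first_day["endtime"].max())
```

The same boolean filter on `pt_stay_hr['hr']` would give the hourly rows for each stay's first day, which could then be joined to charted measurements before aggregating features.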
