# Prediction of Hospital Readmissions

This notebook aims to reproduce the claims of Zebin and Chaussalet's paper, 'Design and implementation of a deep recurrent model for prediction of readmission in urgent care using electronic health records'.



## Claims

When predicting ICU readmissions:

1. LSTM+CNN produced higher accuracy than logistic regression, random forest, and SVM.
2. LSTM+CNN produced higher precision than logistic regression, random forest, and SVM.
3. LSTM+CNN produced higher recall than logistic regression and SVM.

In [1]:
import numpy as np
import pandas as pd

In [2]:
patients = pd.read_csv('./mimic-iii/PATIENTS.csv')
patients.head(1)

Unnamed: 0,ROW_ID,SUBJECT_ID,GENDER,DOB,DOD,DOD_HOSP,DOD_SSN,EXPIRE_FLAG
0,234,249,F,2075-03-13 00:00:00,,,,0


In [3]:
admissions = pd.read_csv('./mimic-iii/ADMISSIONS.csv')
print(admissions.shape)
admissions.head(1)

(58976, 19)


Unnamed: 0,ROW_ID,SUBJECT_ID,HADM_ID,ADMITTIME,DISCHTIME,DEATHTIME,ADMISSION_TYPE,ADMISSION_LOCATION,DISCHARGE_LOCATION,INSURANCE,LANGUAGE,RELIGION,MARITAL_STATUS,ETHNICITY,EDREGTIME,EDOUTTIME,DIAGNOSIS,HOSPITAL_EXPIRE_FLAG,HAS_CHARTEVENTS_DATA
0,21,22,165315,2196-04-09 12:26:00,2196-04-10 15:54:00,,EMERGENCY,EMERGENCY ROOM ADMIT,DISC-TRAN CANCER/CHLDRN H,Private,,UNOBTAINABLE,MARRIED,WHITE,2196-04-09 10:06:00,2196-04-09 13:24:00,BENZODIAZEPINE OVERDOSE,0,1


In [4]:
transfers = pd.read_csv('./mimic-iii/TRANSFERS.csv')
transfers.head(1)

Unnamed: 0,ROW_ID,SUBJECT_ID,HADM_ID,ICUSTAY_ID,DBSOURCE,EVENTTYPE,PREV_CAREUNIT,CURR_CAREUNIT,PREV_WARDID,CURR_WARDID,INTIME,OUTTIME,LOS
0,657,111,192123,254245.0,carevue,transfer,CCU,MICU,7.0,23.0,2142-04-29 15:27:11,2142-05-04 20:38:33,125.19


In [5]:
icustays = pd.read_csv('./mimic-iii/ICUSTAYS.csv')
icustays.head(1)


Unnamed: 0,ROW_ID,SUBJECT_ID,HADM_ID,ICUSTAY_ID,DBSOURCE,FIRST_CAREUNIT,LAST_CAREUNIT,FIRST_WARDID,LAST_WARDID,INTIME,OUTTIME,LOS
0,365,268,110404,280836,carevue,MICU,MICU,52,52,2198-02-14 23:27:38,2198-02-18 05:26:11,3.249


In [6]:
# print(len(icustays.groupby(['HADM_ID'])))
# test = (icustays.groupby(['HADM_ID']).size() > 1).reset_index()
# test[test[0]==True]

## NOTE: Organ Donors

In some cases a patient has multiple, inconsistent death times across duplicated records: either the patient has two different death times, or the admission and discharge times do not align with the death time.
It appears that organ donor collection is processed under a separate HADM_ID, and those entries are inconsistent.

In [7]:
# admissions where the patient died in hospital
organ = admissions[admissions['HOSPITAL_EXPIRE_FLAG']==1].reset_index(drop=True)

organ[organ.duplicated('SUBJECT_ID',keep=False)].sort_values('SUBJECT_ID').head(2)

Unnamed: 0,ROW_ID,SUBJECT_ID,HADM_ID,ADMITTIME,DISCHTIME,DEATHTIME,ADMISSION_TYPE,ADMISSION_LOCATION,DISCHARGE_LOCATION,INSURANCE,LANGUAGE,RELIGION,MARITAL_STATUS,ETHNICITY,EDREGTIME,EDOUTTIME,DIAGNOSIS,HOSPITAL_EXPIRE_FLAG,HAS_CHARTEVENTS_DATA
43,533,417,178013,2177-03-22 22:24:00,2177-03-23 07:20:00,2177-03-23 07:20:00,EMERGENCY,EMERGENCY ROOM ADMIT,DEAD/EXPIRED,Private,,UNOBTAINABLE,MARRIED,WHITE,2177-03-22 22:01:00,2177-03-23 00:20:00,SUBARACHNOID HEMORRHAGE,1,1
44,534,417,102633,2177-03-23 16:17:00,2177-03-23 07:20:00,2177-03-23 07:20:00,URGENT,PHYS REFERRAL/NORMAL DELI,DEAD/EXPIRED,Private,,UNOBTAINABLE,MARRIED,WHITE,,,ORGAN DONOR ACCOUNT,1,1
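
One way to surface such inconsistent records is to flag admissions that start after the recorded death time. The sketch below runs on a two-row toy frame copied from the output above; the frame name `toy` and the column `ADMIT_AFTER_DEATH` are illustrative and not part of the notebook.

```python
import pandas as pd

# Toy copy of the two admissions shown above for SUBJECT_ID 417.
toy = pd.DataFrame({
    'SUBJECT_ID': [417, 417],
    'HADM_ID': [178013, 102633],
    'ADMITTIME': ['2177-03-22 22:24:00', '2177-03-23 16:17:00'],
    'DEATHTIME': ['2177-03-23 07:20:00', '2177-03-23 07:20:00'],
    'DIAGNOSIS': ['SUBARACHNOID HEMORRHAGE', 'ORGAN DONOR ACCOUNT'],
})
for col in ['ADMITTIME', 'DEATHTIME']:
    toy[col] = pd.to_datetime(toy[col], errors='coerce')

# An admission that begins after the recorded death time is inconsistent.
toy['ADMIT_AFTER_DEATH'] = toy['ADMITTIME'] > toy['DEATHTIME']
print(toy.loc[toy['ADMIT_AFTER_DEATH'], ['HADM_ID', 'DIAGNOSIS']])
```

Here only the `ORGAN DONOR ACCOUNT` admission is flagged, which matches the observation that the donor entry is the inconsistent one.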


## Example

Patient with `SUBJECT_ID = 250`

In [8]:
# patients[patients['SUBJECT_ID'] == 250]

In [9]:
# admissions[admissions['SUBJECT_ID'] == 250]

In [10]:
# transfers[transfers['SUBJECT_ID'] == 250]

## Create Dataset

Example SUBJECT_IDs: 291, 283, 250

In [11]:
patients = patients.drop('ROW_ID', axis=1)
admissions = admissions.drop('ROW_ID', axis=1)
icustays = icustays.drop('ROW_ID', axis=1)

patients_columns = ['SUBJECT_ID', 'GENDER', 'DOB', 'DOD', 'EXPIRE_FLAG']
admissions_columns = ['SUBJECT_ID', 'HADM_ID', 'ADMITTIME', 'DISCHTIME', 'DEATHTIME',
                    'INSURANCE', 'LANGUAGE', 'RELIGION', 'MARITAL_STATUS', 'ETHNICITY',
                    'DIAGNOSIS', 'HOSPITAL_EXPIRE_FLAG', 'HAS_CHARTEVENTS_DATA']
icustays_columns = ['SUBJECT_ID','HADM_ID','ICUSTAY_ID','FIRST_CAREUNIT',
                    'LAST_CAREUNIT','FIRST_WARDID','LAST_WARDID','INTIME','OUTTIME','LOS']

In [12]:
# One row per ICU stay: attach admission details, then patient details.
dataset = pd.merge(icustays[icustays_columns], admissions[admissions_columns],
                   how='left', on=['SUBJECT_ID', 'HADM_ID'])
dataset = dataset.merge(patients[patients_columns], how='left', on=['SUBJECT_ID'])
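
A left merge keeps one row per ICU stay only if the right-hand table has unique keys. As a hedged sketch of how that could be checked, the toy example below uses the `validate` and `indicator` options of `DataFrame.merge`; the frames `icu_toy` and `adm_toy` are made up for illustration.

```python
import pandas as pd

# Toy stand-ins for ICUSTAYS and ADMISSIONS (values are invented).
icu_toy = pd.DataFrame({'SUBJECT_ID': [1, 1, 2],
                        'HADM_ID': [10, 10, 20],
                        'ICUSTAY_ID': [100, 101, 200]})
adm_toy = pd.DataFrame({'SUBJECT_ID': [1, 2],
                        'HADM_ID': [10, 20],
                        'DIAGNOSIS': ['SEPSIS', 'DYSPNEA']})

# validate='many_to_one' raises if the admission keys are not unique;
# indicator=True adds a '_merge' column showing where each row matched.
merged = icu_toy.merge(adm_toy, how='left', on=['SUBJECT_ID', 'HADM_ID'],
                       validate='many_to_one', indicator=True)

# A left merge against unique keys must not change the number of ICU stays.
assert len(merged) == len(icu_toy)
print(merged['_merge'].value_counts())
```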

In [13]:
dataset.columns

Index(['SUBJECT_ID', 'HADM_ID', 'ICUSTAY_ID', 'FIRST_CAREUNIT',
       'LAST_CAREUNIT', 'FIRST_WARDID', 'LAST_WARDID', 'INTIME', 'OUTTIME',
       'LOS', 'ADMITTIME', 'DISCHTIME', 'DEATHTIME', 'INSURANCE', 'LANGUAGE',
       'RELIGION', 'MARITAL_STATUS', 'ETHNICITY', 'DIAGNOSIS',
       'HOSPITAL_EXPIRE_FLAG', 'HAS_CHARTEVENTS_DATA', 'GENDER', 'DOB', 'DOD',
       'EXPIRE_FLAG'],
      dtype='object')

In [14]:
print(dataset.shape)
dataset.head(1)

(61532, 25)


Unnamed: 0,SUBJECT_ID,HADM_ID,ICUSTAY_ID,FIRST_CAREUNIT,LAST_CAREUNIT,FIRST_WARDID,LAST_WARDID,INTIME,OUTTIME,LOS,...,RELIGION,MARITAL_STATUS,ETHNICITY,DIAGNOSIS,HOSPITAL_EXPIRE_FLAG,HAS_CHARTEVENTS_DATA,GENDER,DOB,DOD,EXPIRE_FLAG
0,268,110404,280836,MICU,MICU,52,52,2198-02-14 23:27:38,2198-02-18 05:26:11,3.249,...,CATHOLIC,SEPARATED,HISPANIC OR LATINO,DYSPNEA,1,1,F,2132-02-21 00:00:00,2198-02-18 00:00:00,1


In [15]:
dataset['DOB'] = pd.to_datetime(dataset['DOB'], errors='coerce')
dataset['DOD'] = pd.to_datetime(dataset['DOD'], errors='coerce')
dataset['ADMITTIME'] = pd.to_datetime(dataset['ADMITTIME'], errors='coerce')
dataset['DISCHTIME'] = pd.to_datetime(dataset['DISCHTIME'], errors='coerce')
dataset['DEATHTIME'] = pd.to_datetime(dataset['DEATHTIME'], errors='coerce')
dataset['INTIME'] = pd.to_datetime(dataset['INTIME'], errors='coerce')
dataset['OUTTIME'] = pd.to_datetime(dataset['OUTTIME'], errors='coerce')
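
Because `errors='coerce'` silently turns unparseable values into `NaT`, it can help to count the missing values per column after conversion. The sketch below shows the idea on a toy frame with one deliberately bad value; the names `toy` and `time_columns` are illustrative.

```python
import pandas as pd

# Toy frame with one unparseable string and one missing value.
toy = pd.DataFrame({'INTIME': ['2198-02-14 23:27:38', 'not a date'],
                    'OUTTIME': ['2198-02-18 05:26:11', None]})

time_columns = ['INTIME', 'OUTTIME']
for col in time_columns:
    # Unparseable or missing entries become NaT instead of raising.
    toy[col] = pd.to_datetime(toy[col], errors='coerce')

# Number of NaT values per converted column.
print(toy[time_columns].isna().sum())
```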


In [16]:
dataset_columns = ['SUBJECT_ID', 'HADM_ID', 'ICUSTAY_ID',  'ADMITTIME', 'DISCHTIME', 'DEATHTIME', 
                   'INTIME', 'OUTTIME','DOB', 'DOD', 'EXPIRE_FLAG', 'HOSPITAL_EXPIRE_FLAG', 
                   'HAS_CHARTEVENTS_DATA', 'FIRST_CAREUNIT', 'LAST_CAREUNIT', 
                   'FIRST_WARDID', 'LAST_WARDID', 'LOS', 'INSURANCE', 'LANGUAGE',
                   'RELIGION', 'MARITAL_STATUS', 'ETHNICITY', 'DIAGNOSIS', 'GENDER']

In [17]:
dataset = dataset[dataset_columns]

In [18]:
dataset = dataset[(dataset['DIAGNOSIS'] != 'ORGAN DONOR ACCOUNT') & (dataset['DIAGNOSIS'] != 'DONOR ACCOUNT') & 
                  (dataset['DIAGNOSIS'] != 'ORGAN DONOR') & (dataset['DIAGNOSIS'].notnull())].reset_index(drop=True)
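
The same filter can be written with `Series.isin`, which may be easier to extend if more donor labels turn up. This is a sketch on toy data; `donor_labels` is an illustrative name.

```python
import pandas as pd

toy = pd.DataFrame({'DIAGNOSIS': ['DYSPNEA', 'ORGAN DONOR ACCOUNT', None,
                                  'DONOR ACCOUNT', 'ORGAN DONOR']})

# The three donor labels excluded by the cell above.
donor_labels = ['ORGAN DONOR ACCOUNT', 'DONOR ACCOUNT', 'ORGAN DONOR']

# Keep rows with a non-null diagnosis that is not a donor label.
keep = toy['DIAGNOSIS'].notnull() & ~toy['DIAGNOSIS'].isin(donor_labels)
filtered = toy[keep].reset_index(drop=True)
print(filtered)
```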

## Goal: Find cases of readmission

1. Transferred & returned               = 3555
2. Transferred & died                   = 1974
3. Discharged & returned < 30 days      = 3205
4. Discharged & died < 30 days          = 2556


### 1. Transferred & Returned

```python
dataset['TRANSFERRED_RETURNED'] = icustays.duplicated(subset=['HADM_ID']).astype(int)
```

In [19]:
dataset = dataset.sort_values(['HADM_ID','ADMITTIME','INTIME']).reset_index(drop=True)
# same HADM_ID, but diff ICUSTAY_ID
dataset['ICU_VISIT_PER_ADMIT'] = dataset.groupby('HADM_ID')['ICUSTAY_ID'].cumcount()
dataset['TRANSFERRED_RETURNED'] = 0
dataset.loc[dataset['ICU_VISIT_PER_ADMIT'] > 0, 'TRANSFERRED_RETURNED'] = 1

In [20]:
np.sum(dataset['TRANSFERRED_RETURNED']).astype(int)

3746
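
To make the labelling concrete, here is the same `cumcount` logic on a toy frame (values invented): one admission with three ICU stays and one with a single stay.

```python
import pandas as pd

toy = pd.DataFrame({
    'HADM_ID': [10, 10, 20, 10],
    'ICUSTAY_ID': [100, 101, 200, 102],
    'INTIME': pd.to_datetime(['2150-01-01', '2150-01-05',
                              '2150-02-01', '2150-01-09']),
})
toy = toy.sort_values(['HADM_ID', 'INTIME']).reset_index(drop=True)

# 0 for the first ICU stay of an admission, 1 for the second, and so on.
toy['ICU_VISIT_PER_ADMIT'] = toy.groupby('HADM_ID').cumcount()

# Flags the second and later ICU stays within the same admission.
toy['TRANSFERRED_RETURNED'] = (toy['ICU_VISIT_PER_ADMIT'] > 0).astype(int)
print(toy[['HADM_ID', 'ICUSTAY_ID', 'TRANSFERRED_RETURNED']])
```

On this toy data the flags are `0, 1, 1` for admission 10 and `0` for admission 20.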

### 2. Transferred & Died

```python
dataset['TRANSFERRED_DEATH'] = ((dataset['HOSPITAL_EXPIRE_FLAG'] == 1) & (dataset['DEATH_IN_ICU'] == 0)).astype(int)
```

### Define death

```python
patients['EXPIRE_FLAG'] == 1
```

### Define death in hospital

```python
admissions['HOSPITAL_EXPIRE_FLAG'] == 1
```

### Define death in ICU (merge admissions & icustays)

```python
dataset['DEATH_IN_ICU'] = ((dataset['DEATHTIME'] > dataset['INTIME']) & (dataset['DEATHTIME'] < dataset['OUTTIME'])).astype(int)
```

In [21]:
dataset['TRANSFERRED_DEATH'] = ((dataset['DEATHTIME'].notnull()) & 
                                (dataset['HOSPITAL_EXPIRE_FLAG'] == 1) & 
                                ((dataset['DEATHTIME'] < dataset['INTIME']) | 
                                 (dataset['DEATHTIME'] > dataset['OUTTIME']))).astype(int)

In [22]:
np.sum(dataset['TRANSFERRED_DEATH']).astype(int)

2093
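
A small worked example of the rule used above, on invented rows: death inside the ICU window, death in hospital after leaving the ICU, and no death.

```python
import pandas as pd

toy = pd.DataFrame({
    'INTIME': pd.to_datetime(['2150-01-01', '2150-01-01', '2150-01-01']),
    'OUTTIME': pd.to_datetime(['2150-01-05', '2150-01-05', '2150-01-05']),
    'DEATHTIME': pd.to_datetime(['2150-01-03', '2150-01-08', None]),
    'HOSPITAL_EXPIRE_FLAG': [1, 1, 0],
})

# Death recorded in hospital, but outside the ICU stay window.
outside_icu = (toy['DEATHTIME'] < toy['INTIME']) | (toy['DEATHTIME'] > toy['OUTTIME'])
toy['TRANSFERRED_DEATH'] = (toy['DEATHTIME'].notnull() &
                            (toy['HOSPITAL_EXPIRE_FLAG'] == 1) &
                            outside_icu).astype(int)
print(toy['TRANSFERRED_DEATH'].tolist())
```

Only the second row is flagged. Comparisons against `NaT` evaluate to `False`, so the row without a death time stays at 0 even without the explicit `notnull()` check.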

### 3. Discharged & Returned < 30

This first attempt is wrong: it labels the return visit instead of the discharged visit.

```python
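# Sort so that each patient's admissions are consecutive and chronological.
admissions = admissions.sort_values(['SUBJECT_ID', 'ADMITTIME']).reset_index(drop=True)
admissions['ADMITTIME'] = pd.to_datetime(admissions['ADMITTIME'], errors='coerce')
shifted = admissions.shift(1)  # previous admission, so the flag lands on the return visit
shifted['DISCHTIME'] = pd.to_datetime(shifted['DISCHTIME'], errors='coerce')
admissions['DISCHARGED_RETURNED'] = ((shifted['SUBJECT_ID'] == admissions['SUBJECT_ID']) &
                                     (admissions['ADMITTIME'] - shifted['DISCHTIME'] <= np.timedelta64(30, 'D')))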
```

In [23]:
dataset.sort_values(by=['SUBJECT_ID', 'DISCHTIME'], ignore_index=True, inplace=True)
shifted = dataset.shift(-1)
dataset['DISCHARGED_RETURNED'] = ((shifted['SUBJECT_ID'] == dataset['SUBJECT_ID']) & 
                                  (shifted['HADM_ID'] != dataset['HADM_ID']) & 
                                  (shifted['INTIME'] - dataset['DISCHTIME'] < np.timedelta64(30, 'D'))).astype(int)

In [24]:
np.sum(dataset['DISCHARGED_RETURNED']).astype(int)

3001
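
The forward-looking `shift(-1)` version can be checked on a toy frame (values invented): patient 1 returns 10 days after the first discharge and again months later, while patient 2 has a single stay.

```python
import pandas as pd

toy = pd.DataFrame({
    'SUBJECT_ID': [1, 1, 1, 2],
    'HADM_ID': [10, 11, 12, 20],
    'INTIME': pd.to_datetime(['2150-01-01', '2150-01-20',
                              '2150-06-01', '2150-01-15']),
    'DISCHTIME': pd.to_datetime(['2150-01-10', '2150-01-25',
                                 '2150-06-05', '2150-01-18']),
})
toy = toy.sort_values(['SUBJECT_ID', 'DISCHTIME'], ignore_index=True)

# shift(-1) lines each row up with the NEXT row, so the flag lands on
# the discharged visit rather than on the return visit.
nxt = toy.shift(-1)
toy['DISCHARGED_RETURNED'] = ((nxt['SUBJECT_ID'] == toy['SUBJECT_ID']) &
                              (nxt['HADM_ID'] != toy['HADM_ID']) &
                              (nxt['INTIME'] - toy['DISCHTIME'] < pd.Timedelta(days=30))
                              ).astype(int)
print(toy[['SUBJECT_ID', 'HADM_ID', 'DISCHARGED_RETURNED']])
```

Only admission 10 is flagged: it is followed by a different admission of the same patient within 30 days.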

### 4. Discharged & Died < 30 (merge patients, admissions)

This label needs the HADM_ID of the patient's last discharged visit.

```python
dataset['DISCHARGED_DEATH'] = ((dataset['EXPIRE_FLAG'] == 1) & (dataset['HOSPITAL_EXPIRE_FLAG'] == 0) & (dataset['DOD'] - dataset['DISCHTIME'] <= np.timedelta64(30, 'D'))).astype(int)
```

In [25]:
# dataset.sort_values(by=['SUBJECT_ID', 'DISCHTIME'], ignore_index=True, inplace=True)
dataset['DISCHARGED_DEATH'] = ((dataset['EXPIRE_FLAG']==1) & (dataset['HOSPITAL_EXPIRE_FLAG']==0) & 
                               (dataset['DOD'].notnull()) & 
                               (dataset['DOD'] - dataset['DISCHTIME'] <= np.timedelta64(30, 'D')) ).astype(int)

In [26]:
np.sum(dataset['DISCHARGED_DEATH']).astype(int)

2476
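
As a sanity check of the condition above, a toy frame (values invented) with four cases: died 10 days after discharge, died 69 days after discharge, died in hospital, and still alive.

```python
import pandas as pd

toy = pd.DataFrame({
    'EXPIRE_FLAG': [1, 1, 1, 0],
    'HOSPITAL_EXPIRE_FLAG': [0, 0, 1, 0],
    'DISCHTIME': pd.to_datetime(['2150-01-10'] * 4),
    'DOD': pd.to_datetime(['2150-01-20', '2150-03-20', '2150-01-10', None]),
})

# Died, but not in hospital, and within 30 days of discharge.
toy['DISCHARGED_DEATH'] = ((toy['EXPIRE_FLAG'] == 1) &
                           (toy['HOSPITAL_EXPIRE_FLAG'] == 0) &
                           toy['DOD'].notnull() &
                           (toy['DOD'] - toy['DISCHTIME'] <= pd.Timedelta(days=30))
                           ).astype(int)
print(toy['DISCHARGED_DEATH'].tolist())
```

Only the first row satisfies all conditions.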

## Age

In [27]:
dataset['AGE'] = ((pd.to_datetime(dataset['ADMITTIME']).dt.date - pd.to_datetime(dataset['DOB']).dt.date) 
                  / np.timedelta64(1, 'Y')).astype(int)

# Keep every positively labelled stay regardless of age,
# plus all stays of adults (AGE >= 18).
dataset = dataset[(dataset['TRANSFERRED_RETURNED'] == 1) | 
                  (dataset['TRANSFERRED_DEATH'] == 1) | 
                  (dataset['DISCHARGED_RETURNED'] == 1) | 
                  (dataset['DISCHARGED_DEATH'] == 1) |
                  (dataset['AGE'] >= 18)]
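
An alternative way to compute age, shown here only as a sketch and not as the notebook's method, is plain calendar arithmetic on the year, month, and day components. It avoids dividing by a year-sized timedelta and never builds a timedelta longer than `datetime64[ns]` can hold, which matters because MIMIC-III shifts the date of birth of patients older than 89 so that their apparent age is around 300 years. The toy values below are invented.

```python
import pandas as pd

toy = pd.DataFrame({
    'DOB': pd.to_datetime(['2132-02-21', '1880-05-01', '2190-01-01']),
    'ADMITTIME': pd.to_datetime(['2198-02-14', '2180-06-01', '2198-02-14']),
})
admit, dob = toy['ADMITTIME'], toy['DOB']

# True when the birthday has not yet occurred in the admission year.
before_birthday = ((admit.dt.month < dob.dt.month) |
                   ((admit.dt.month == dob.dt.month) &
                    (admit.dt.day < dob.dt.day)))

# Completed years between DOB and ADMITTIME.
toy['AGE'] = admit.dt.year - dob.dt.year - before_birthday.astype(int)
print(toy['AGE'].tolist())
```

The second row illustrates a shifted date of birth: the computed age is 300, so such patients pass an `AGE >= 18` filter but their true age is unknown.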

## Goal: Find cases of readmission

1. Transferred & returned               = 3555
2. Transferred & died                   = 1974
3. Discharged & returned < 30 days      = 3205
4. Discharged & died < 30 days          = 2556

In [28]:
print(np.sum(dataset['TRANSFERRED_RETURNED']==1))
print(np.sum(dataset['TRANSFERRED_DEATH']==1))
print(np.sum(dataset['DISCHARGED_RETURNED']==1))
print(np.sum(dataset['DISCHARGED_DEATH']==1))

3746
2093
3001
2476
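
To compare the obtained counts with the target counts listed under the goal heading, the two sets of numbers can be placed side by side. The sketch below simply hard-codes the figures shown above; the frame name `comparison` and its column names are illustrative.

```python
import pandas as pd

comparison = pd.DataFrame(
    {'target': [3555, 1974, 3205, 2556],
     'obtained': [3746, 2093, 3001, 2476]},
    index=['TRANSFERRED_RETURNED', 'TRANSFERRED_DEATH',
           'DISCHARGED_RETURNED', 'DISCHARGED_DEATH'],
)

# Absolute and relative gap between obtained and target counts.
comparison['diff'] = comparison['obtained'] - comparison['target']
comparison['diff_pct'] = (100 * comparison['diff'] / comparison['target']).round(1)
print(comparison)
```

The two transfer-based labels come out above their targets (by 191 and 119), and the two discharge-based labels come out below (by 204 and 80).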
