# Prediction of Hospital Readmissions

This notebook aims to reproduce the claims from Zebin and Chaussalet's paper, 'Design and implementation of a deep recurrent model for prediction of readmission in urgent care using electronic health records'.



## Claims

When predicting ICU readmissions:

1. LSTM+CNN produced higher accuracy than logistic regression, random forest, and SVM.
2. LSTM+CNN produced higher precision than logistic regression, random forest, and SVM.
3. LSTM+CNN produced higher recall than logistic regression and SVM.
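
For reference, the three metrics named in the claims can be computed from confusion-matrix counts. The sketch below uses the standard definitions; the function name and the counts are hypothetical and chosen only to show the arithmetic, not taken from the paper.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (accuracy, precision, recall) from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Guard against empty denominators when no positives are predicted or present
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical counts, for illustration only
accuracy, precision, recall = classification_metrics(tp=30, fp=10, tn=50, fn=10)
print(accuracy, precision, recall)
```

Here a "positive" would be a readmission case, so recall measures how many true readmissions the model finds.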

In [1]:
import numpy as np
import pandas as pd

In [2]:
patients = pd.read_csv('./mimic-iii/PATIENTS.csv')
patients.head(1)

Unnamed: 0,ROW_ID,SUBJECT_ID,GENDER,DOB,DOD,DOD_HOSP,DOD_SSN,EXPIRE_FLAG
0,234,249,F,2075-03-13 00:00:00,,,,0


In [3]:
admissions = pd.read_csv('./mimic-iii/ADMISSIONS.csv')
admissions.head(1)

Unnamed: 0,ROW_ID,SUBJECT_ID,HADM_ID,ADMITTIME,DISCHTIME,DEATHTIME,ADMISSION_TYPE,ADMISSION_LOCATION,DISCHARGE_LOCATION,INSURANCE,LANGUAGE,RELIGION,MARITAL_STATUS,ETHNICITY,EDREGTIME,EDOUTTIME,DIAGNOSIS,HOSPITAL_EXPIRE_FLAG,HAS_CHARTEVENTS_DATA
0,21,22,165315,2196-04-09 12:26:00,2196-04-10 15:54:00,,EMERGENCY,EMERGENCY ROOM ADMIT,DISC-TRAN CANCER/CHLDRN H,Private,,UNOBTAINABLE,MARRIED,WHITE,2196-04-09 10:06:00,2196-04-09 13:24:00,BENZODIAZEPINE OVERDOSE,0,1


In [4]:
transfers = pd.read_csv('./mimic-iii/TRANSFERS.csv')
transfers.head(1)

Unnamed: 0,ROW_ID,SUBJECT_ID,HADM_ID,ICUSTAY_ID,DBSOURCE,EVENTTYPE,PREV_CAREUNIT,CURR_CAREUNIT,PREV_WARDID,CURR_WARDID,INTIME,OUTTIME,LOS
0,657,111,192123,254245.0,carevue,transfer,CCU,MICU,7.0,23.0,2142-04-29 15:27:11,2142-05-04 20:38:33,125.19


In [5]:
icustays = pd.read_csv('./mimic-iii/ICUSTAYS.csv')
icustays.head(1)


Unnamed: 0,ROW_ID,SUBJECT_ID,HADM_ID,ICUSTAY_ID,DBSOURCE,FIRST_CAREUNIT,LAST_CAREUNIT,FIRST_WARDID,LAST_WARDID,INTIME,OUTTIME,LOS
0,365,268,110404,280836,carevue,MICU,MICU,52,52,2198-02-14 23:27:38,2198-02-18 05:26:11,3.249


In [6]:
# print(len(icustays.groupby(['HADM_ID'])))
# test = (icustays.groupby(['HADM_ID']).size() > 1).reset_index()
# test[test[0]==True]

## Example

Patient with `SUBJECT_ID = 250`

In [7]:
# patients[patients['SUBJECT_ID'] == 250]

In [8]:
# admissions[admissions['SUBJECT_ID'] == 250]

In [9]:
# transfers[transfers['SUBJECT_ID'] == 250]

## Create Dataset

Example SUBJECT_IDs: 291, 283, 250

In [10]:
patients['DOB'] = pd.to_datetime(patients['DOB'], errors='coerce')
patients['DOD'] = pd.to_datetime(patients['DOD'], errors='coerce')
patients['DOD_HOSP'] = pd.to_datetime(patients['DOD_HOSP'], errors='coerce')
patients['DOD_SSN'] = pd.to_datetime(patients['DOD_SSN'], errors='coerce')
patients = patients.drop('ROW_ID', axis=1)

In [11]:
admissions['ADMITTIME'] = pd.to_datetime(admissions['ADMITTIME'], errors='coerce')
admissions['DISCHTIME'] = pd.to_datetime(admissions['DISCHTIME'], errors='coerce')
admissions['DEATHTIME'] = pd.to_datetime(admissions['DEATHTIME'], errors='coerce')
admissions = admissions.drop('ROW_ID', axis=1)

In [12]:
transfers['INTIME'] = pd.to_datetime(transfers['INTIME'], errors='coerce')
transfers['OUTTIME'] = pd.to_datetime(transfers['OUTTIME'], errors='coerce')
transfers = transfers.drop('ROW_ID', axis=1)

In [13]:
icustays['INTIME'] = pd.to_datetime(icustays['INTIME'], errors='coerce')
icustays['OUTTIME'] = pd.to_datetime(icustays['OUTTIME'], errors='coerce')
icustays = icustays.drop('ROW_ID', axis=1)
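
The four cells above repeat the same pattern for each table. A minimal sketch of that pattern as a helper, assuming a hypothetical function name `parse_datetime_columns` and a toy frame (neither is part of the notebook):

```python
import pandas as pd


def parse_datetime_columns(frame, columns):
    """Return a copy of `frame` with `columns` parsed as datetimes.

    Values that cannot be parsed become NaT because of errors='coerce'.
    """
    parsed = frame.copy()
    for column in columns:
        parsed[column] = pd.to_datetime(parsed[column], errors='coerce')
    return parsed


# Toy data: one valid row and one row with unusable timestamps
toy_stays = pd.DataFrame({
    'ROW_ID': [1, 2],
    'INTIME': ['2198-02-14 23:27:38', 'not a date'],
    'OUTTIME': ['2198-02-18 05:26:11', None],
})
toy_stays = parse_datetime_columns(toy_stays, ['INTIME', 'OUTTIME']).drop(columns='ROW_ID')
print(toy_stays)
```

Because unparseable values silently become `NaT`, it can be worth checking `isna().sum()` on each parsed column afterwards.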

## Goal: Find cases of readmission

1. Transferred & returned               = 3555
2. Transferred & died                   = 1974
3. Discharged & returned < 30 days      = 3205
4. Discharged & died < 30 days          = 2556


In [14]:
admissions_type = admissions['ADMISSION_TYPE'].unique()
admissions_type

array(['EMERGENCY', 'ELECTIVE', 'NEWBORN', 'URGENT'], dtype=object)

### Define death

```python
patients['EXPIRE_FLAG'] == 1
```

### Define death in hospital

```python
admissions['HOSPITAL_EXPIRE_FLAG'] == 1
```

### Define death in ICU (merge admissions & icustays)

```python
dataset['DEATH_IN_ICU'] = ((dataset['DEATHTIME'] > dataset['INTIME']) & (dataset['DEATHTIME'] < dataset['OUTTIME'])).astype(int)
```

In [15]:
# There will be duplicate SUBJECT_ID
dataset = pd.merge(admissions[['SUBJECT_ID', 'HADM_ID', 'ADMITTIME', 'DISCHTIME', 'DEATHTIME','HOSPITAL_EXPIRE_FLAG']], icustays[['SUBJECT_ID', 'HADM_ID', 'INTIME', 'OUTTIME']], 'inner',on=['SUBJECT_ID', 'HADM_ID'])
# HADM_ID where patient died in ICU
dataset['DEATH_IN_ICU'] = ((dataset['DEATHTIME'] > dataset['INTIME']) & (dataset['DEATHTIME'] < dataset['OUTTIME'])).astype(int)
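
To make the flag's behaviour concrete, here is a small sketch on made-up rows (the timestamps and the `toy` frame are hypothetical). It shows the three situations that matter: death during the ICU stay, death after leaving the ICU, and no recorded death.

```python
import pandas as pd

toy = pd.DataFrame({
    'HADM_ID': [1, 2, 3],
    'DEATHTIME': pd.to_datetime(['2150-01-05 10:00', '2150-02-10 08:00', None]),
    'INTIME': pd.to_datetime(['2150-01-01 09:00', '2150-02-01 09:00', '2150-03-01 09:00']),
    'OUTTIME': pd.to_datetime(['2150-01-06 09:00', '2150-02-03 09:00', '2150-03-02 09:00']),
})

# Both comparisons are parenthesised before the cast, so astype(int) applies to the combined mask
died_in_icu = (toy['DEATHTIME'] > toy['INTIME']) & (toy['DEATHTIME'] < toy['OUTTIME'])
toy['DEATH_IN_ICU'] = died_in_icu.astype(int)
print(toy[['HADM_ID', 'DEATH_IN_ICU']])
```

Comparisons against `NaT` evaluate to `False`, so admissions without a `DEATHTIME` get 0 without any extra null handling.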

In [16]:
np.sum(icustays.duplicated(subset=['HADM_ID']).astype(int))

3746

### 1. Transferred & Returned

```python
dataset['TRANSFERRED_RETURNED'] = icustays.duplicated(subset=['HADM_ID']).astype(int)
```

In [17]:
# HADM_ID where patient has multiple ICUSTAY_ID
tnr = icustays[['SUBJECT_ID', 'HADM_ID', 'ICUSTAY_ID', 'INTIME']]
tnr = tnr.sort_values(['HADM_ID','INTIME'])
tnr['TRANSFERRED_RETURNED'] = (tnr.duplicated(subset=['HADM_ID'])).astype(int)
print(tnr[tnr.duplicated(subset=['HADM_ID'],keep=False)].head(10))
tnr = tnr[tnr['TRANSFERRED_RETURNED'] == 1]


       SUBJECT_ID  HADM_ID  ICUSTAY_ID              INTIME  TRANSFERRED_RETURNED
37898       29971   100021      252772 2109-08-21 20:02:48                     0
37899       29971   100021      220579 2109-08-28 10:21:05                     1
48197       58947   100037      270105 2183-03-23 18:22:04                     0
48198       58947   100037      221136 2183-04-20 13:16:43                     1
1423         1549   100055      215944 2150-07-06 12:43:34                     0
1424         1549   100055      245659 2150-07-08 12:49:43                     1
54080       73946   100104      254601 2201-06-21 19:08:30                     0
54378       73946   100104      240176 2201-06-26 23:55:21                     1
4561         3278   100130      295574 2109-07-21 00:47:00                     0
4562         3278   100130      284185 2109-07-28 16:35:00                     1
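
The sort-then-`duplicated` step can be checked on a few rows. This is a sketch on a toy frame: the first three rows reuse values printed above, while `HADM_ID` 100099 and `ICUSTAY_ID` 200001 are made up to represent an admission with a single ICU stay.

```python
import pandas as pd

toy_icustays = pd.DataFrame({
    'HADM_ID': [100021, 100055, 100021, 100099],
    'ICUSTAY_ID': [220579, 215944, 252772, 200001],
    'INTIME': pd.to_datetime([
        '2109-08-28 10:21:05', '2150-07-06 12:43:34',
        '2109-08-21 20:02:48', '2120-01-01 00:00:00',
    ]),
})

# Sorting by INTIME within each HADM_ID makes the earliest ICU stay the "first" occurrence
ordered = toy_icustays.sort_values(['HADM_ID', 'INTIME'])
# duplicated() keeps the first stay of an admission at 0 and flags every later stay with 1
ordered['TRANSFERRED_RETURNED'] = ordered.duplicated(subset=['HADM_ID']).astype(int)
returns_per_admission = ordered.groupby('HADM_ID')['TRANSFERRED_RETURNED'].sum()
print(ordered)
```

Without the sort, which stay gets flagged depends on the row order of the CSV rather than on time.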

### 2. Transferred & Died

```python
dataset['TRANSFERRED_DEATH'] = ((dataset['HOSPITAL_EXPIRE_FLAG'] == 1) & (dataset['DEATH_IN_ICU'] == 0)).astype(int)
```

In [18]:
# HADM_ID where patient died outside ICU, but still in hospital
tnd = dataset.copy()
# Explicit comparisons avoid relying on bitwise ~ applied to 0/1 integers
tnd['TRANSFERRED_DEATH'] = ((tnd['HOSPITAL_EXPIRE_FLAG'] == 1) & (tnd['DEATH_IN_ICU'] == 0)).astype(int)

### 3. Discharged & Returned < 30

This first attempt is the wrong way: it labels the return visit instead of the discharged visit.

```python
admissions = admissions.sort_values(['SUBJECT_ID', 'ADMITTIME'], ignore_index=True)
shifted = admissions.shift(1)
shifted['DISCHTIME'] = pd.to_datetime(shifted['DISCHTIME'], errors='coerce')
dataset['DISCHARGED_RETURNED'] = (shifted['SUBJECT_ID'] == admissions['SUBJECT_ID']) & (admissions['ADMITTIME'] - shifted['DISCHTIME'] <= np.timedelta64(30, 'D')) 
```

In [19]:
dnr = admissions[['SUBJECT_ID', 'HADM_ID', 'ADMITTIME', 'DISCHTIME']].copy()
dnr.sort_values(by=['SUBJECT_ID', 'DISCHTIME'], ignore_index=True, inplace=True)
shifted = (dnr.copy()).shift(-1)
dnr['ADMITTIME'] = pd.to_datetime(dnr['ADMITTIME'], errors='coerce')
dnr['DISCHTIME'] = pd.to_datetime(dnr['DISCHTIME'], errors='coerce')
shifted['ADMITTIME'] = pd.to_datetime(shifted['ADMITTIME'], errors='coerce')
shifted['DISCHTIME'] = pd.to_datetime(shifted['DISCHTIME'], errors='coerce')
dnr['DISCHARGED_RETURNED'] = ((shifted['SUBJECT_ID'] == dnr['SUBJECT_ID']) & (shifted['HADM_ID'] != dnr['HADM_ID']) & (shifted['ADMITTIME'] - dnr['DISCHTIME'] <= np.timedelta64(30, 'D'))).astype(int)
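
The same idea, labelling the discharged visit by looking at the patient's next admission, can be sketched with a per-patient shift. This is an illustration on a toy frame, not the notebook's method: the first three rows reuse values shown in this notebook, subject 99 is made up, and the lower bound `gap > 0` is an added assumption.

```python
import pandas as pd

toy_adm = pd.DataFrame({
    'SUBJECT_ID': [36, 36, 68, 99],
    'HADM_ID': [182104, 122659, 170467, 111111],
    'ADMITTIME': pd.to_datetime(['2131-04-30 07:15', '2131-05-12 19:49',
                                 '2173-12-15 16:16', '2173-12-20 00:00']),
    'DISCHTIME': pd.to_datetime(['2131-05-08 14:00', '2131-05-25 13:30',
                                 '2174-01-03 18:30', '2173-12-22 00:00']),
})
toy_adm = toy_adm.sort_values(['SUBJECT_ID', 'ADMITTIME'], ignore_index=True)

# Shifting inside each patient group never pulls in another patient's admission,
# so no separate SUBJECT_ID equality check is needed
next_admit = toy_adm.groupby('SUBJECT_ID')['ADMITTIME'].shift(-1)
gap = next_admit - toy_adm['DISCHTIME']
toy_adm['DISCHARGED_RETURNED'] = (
    (gap > pd.Timedelta(0)) & (gap <= pd.Timedelta(days=30))
).astype(int)
print(toy_adm[['SUBJECT_ID', 'HADM_ID', 'DISCHARGED_RETURNED']])
```

The last admission of each patient has no next admission, so its gap is `NaT` and the flag stays 0.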

In [20]:
dnr[dnr['DISCHARGED_RETURNED']==1].head(10)

Unnamed: 0,SUBJECT_ID,HADM_ID,ADMITTIME,DISCHTIME,DISCHARGED_RETURNED
35,36,182104,2131-04-30 07:15:00,2131-05-08 14:00:00,1
68,68,170467,2173-12-15 16:16:00,2174-01-03 18:30:00,1
105,103,130744,2144-08-12 17:37:00,2144-08-20 11:15:00,1
108,105,161160,2189-01-28 16:57:00,2189-02-02 16:40:00,1
117,109,164029,2140-01-19 13:25:00,2140-01-21 13:25:00,1
119,109,193281,2140-04-07 19:51:00,2140-05-02 16:30:00,1
123,109,170149,2141-05-24 14:47:00,2141-06-06 19:55:00,1
125,109,131345,2141-09-05 20:04:00,2141-09-08 18:30:00,1
126,109,139061,2141-09-11 10:12:00,2141-09-14 20:00:00,1
127,109,172335,2141-09-18 10:32:00,2141-09-24 13:53:00,1


In [21]:
shifted[dnr['DISCHARGED_RETURNED']==1].head(10)

Unnamed: 0,SUBJECT_ID,HADM_ID,ADMITTIME,DISCHTIME
35,36.0,122659.0,2131-05-12 19:49:00,2131-05-25 13:30:00
68,68.0,108329.0,2174-01-04 22:21:00,2174-01-19 11:30:00
105,103.0,133550.0,2144-08-30 23:09:00,2144-09-01 14:28:00
108,105.0,128744.0,2189-02-21 01:45:00,2189-02-25 10:05:00
117,109.0,108375.0,2140-02-02 02:13:00,2140-02-02 16:25:00
119,109.0,175347.0,2140-05-17 14:27:00,2140-05-20 19:50:00
123,109.0,147469.0,2141-06-11 10:17:00,2141-06-17 16:29:00
125,109.0,139061.0,2141-09-11 10:12:00,2141-09-14 20:00:00
126,109.0,172335.0,2141-09-18 10:32:00,2141-09-24 13:53:00
127,109.0,126055.0,2141-10-13 23:10:00,2141-11-03 18:45:00


### 4. Discharged & Died < 30 (merge patients, admissions)

This needs the HADM_ID of the patient's last discharged visit.

```python
dataset['DISCHARGED_DEATH'] = ((patients['EXPIRE_FLAG'] == 1) & (admissions['HOSPITAL_EXPIRE_FLAG'] == 0) & (patients['DOD'] - admissions['DISCHTIME'] <= np.timedelta64(30, 'D'))).astype(int)
```

In [22]:
dnd = pd.merge(patients[['SUBJECT_ID', 'DOD', 'EXPIRE_FLAG']],admissions[['SUBJECT_ID', 'HADM_ID','DISCHTIME','HOSPITAL_EXPIRE_FLAG']],'inner',on='SUBJECT_ID')
dnd['DISCHARGED_DEATH'] = ((dnd['EXPIRE_FLAG']==1) & (dnd['HOSPITAL_EXPIRE_FLAG']==0) & (dnd['DOD'] - dnd['DISCHTIME'] <= np.timedelta64(30, 'D')) ).astype(int)
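
A small sketch of the same three-part condition on toy rows. The first two rows reuse dates that appear elsewhere in this notebook (subjects 68 and 41); the other two rows and the `HADM_ID` values are made up.

```python
import pandas as pd

toy_dnd = pd.DataFrame({
    'HADM_ID': [1, 2, 3, 4],
    'EXPIRE_FLAG': [1, 1, 1, 0],
    'HOSPITAL_EXPIRE_FLAG': [0, 0, 1, 0],
    'DOD': pd.to_datetime(['2174-02-11', '2133-09-30', '2150-01-05', None]),
    'DISCHTIME': pd.to_datetime(['2174-01-19 11:30', '2133-01-27 15:45',
                                 '2150-01-05 10:00', '2160-06-01 12:00']),
})

days_to_death = toy_dnd['DOD'] - toy_dnd['DISCHTIME']
toy_dnd['DISCHARGED_DEATH'] = (
    (toy_dnd['EXPIRE_FLAG'] == 1)              # the patient died at some point
    & (toy_dnd['HOSPITAL_EXPIRE_FLAG'] == 0)   # but not during this admission
    & (days_to_death <= pd.Timedelta(days=30)) # and within 30 days of discharge
).astype(int)
print(toy_dnd[['HADM_ID', 'DISCHARGED_DEATH']])
```

Only the first row qualifies: the second death is months after discharge, the third is an in-hospital death, and the fourth patient has no date of death.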

In [23]:
admissions.head(1)

Unnamed: 0,SUBJECT_ID,HADM_ID,ADMITTIME,DISCHTIME,DEATHTIME,ADMISSION_TYPE,ADMISSION_LOCATION,DISCHARGE_LOCATION,INSURANCE,LANGUAGE,RELIGION,MARITAL_STATUS,ETHNICITY,EDREGTIME,EDOUTTIME,DIAGNOSIS,HOSPITAL_EXPIRE_FLAG,HAS_CHARTEVENTS_DATA
0,22,165315,2196-04-09 12:26:00,2196-04-10 15:54:00,NaT,EMERGENCY,EMERGENCY ROOM ADMIT,DISC-TRAN CANCER/CHLDRN H,Private,,UNOBTAINABLE,MARRIED,WHITE,2196-04-09 10:06:00,2196-04-09 13:24:00,BENZODIAZEPINE OVERDOSE,0,1


## Merge

In [24]:
dataset4 = admissions[['SUBJECT_ID', 'HADM_ID']].merge(tnr[['SUBJECT_ID', 'HADM_ID', 'ICUSTAY_ID', 'TRANSFERRED_RETURNED']], how='left', on=['SUBJECT_ID', 'HADM_ID'])
dataset4 = dataset4.merge(tnd[['SUBJECT_ID', 'HADM_ID','TRANSFERRED_DEATH']], how='left', on=['SUBJECT_ID', 'HADM_ID'])
dataset4 = dataset4.merge(dnr[['SUBJECT_ID', 'HADM_ID','DISCHARGED_RETURNED']], how='left', on=['SUBJECT_ID', 'HADM_ID'])
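# NOTE: dnd is joined on SUBJECT_ID only, so each admission is paired with the DISCHARGED_DEATH value of every admission of that patient
dataset4 = dataset4.merge(dnd[['SUBJECT_ID', 'DISCHARGED_DEATH']], how='left', on=['SUBJECT_ID'])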
dataset4 = dataset4[(dataset4['TRANSFERRED_RETURNED'] == 1) | (dataset4['TRANSFERRED_DEATH'] == 1) | (dataset4['DISCHARGED_RETURNED'] == 1) | (dataset4['DISCHARGED_DEATH'] == 1)]
dataset4 = dataset4.drop_duplicates(subset='HADM_ID')
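
The choice of merge keys changes how many rows come back and which label each admission receives. A sketch on toy frames, assuming per-admission labels for subject 68 (the frame names and the `validate` argument are additions for illustration):

```python
import pandas as pd

toy_admissions = pd.DataFrame({'SUBJECT_ID': [68, 68], 'HADM_ID': [170467, 108329]})
toy_labels = pd.DataFrame({
    'SUBJECT_ID': [68, 68],
    'HADM_ID': [170467, 108329],
    'DISCHARGED_DEATH': [0, 1],
})

# Joining on SUBJECT_ID alone pairs every admission with every label row of the patient
by_subject = toy_admissions.merge(
    toy_labels[['SUBJECT_ID', 'DISCHARGED_DEATH']], how='left', on='SUBJECT_ID')

# Joining on both keys keeps one label per admission; validate raises if keys are not unique
by_admission = toy_admissions.merge(
    toy_labels, how='left', on=['SUBJECT_ID', 'HADM_ID'], validate='one_to_one')

print(len(by_subject), len(by_admission))
```

After a subject-only join, a later `drop_duplicates(subset='HADM_ID')` keeps whichever paired row comes first, which may carry the label of a different admission.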

In [25]:
for column in ['TRANSFERRED_RETURNED', 'TRANSFERRED_DEATH']:
    # Admissions without a match in the left merges have NaN here; treat them as 0
    dataset4[column] = dataset4[column].fillna(0).astype(int)
dataset4.head()

Unnamed: 0,SUBJECT_ID,HADM_ID,ICUSTAY_ID,TRANSFERRED_RETURNED,TRANSFERRED_DEATH,DISCHARGED_RETURNED,DISCHARGED_DEATH
19,36,182104,,0,0,1,0
31,41,101757,237024.0,1,0,0,0
38,357,145674,,0,0,1,0
58,358,110872,244658.0,1,0,0,0
64,361,148959,257948.0,1,0,0,0


## Age

In [26]:
remove_age = pd.merge(admissions[['SUBJECT_ID', 'HADM_ID', 'ADMITTIME']], patients[['SUBJECT_ID', 'DOB']], how='inner', on='SUBJECT_ID')

In [27]:
remove_age['AGE'] = ((pd.to_datetime(remove_age['ADMITTIME']).dt.date - pd.to_datetime(remove_age['DOB']).dt.date) / np.timedelta64(1, 'Y')).astype(int)

In [28]:
remove_age['UNDER_18'] = (remove_age['AGE'] < 18).astype(int)
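
Dividing by `np.timedelta64(1, 'Y')` may not be accepted by every pandas/NumPy version, so here is an alternative sketch that derives whole years from calendar fields instead. The helper name `age_in_years` is hypothetical; the three rows reuse ADMITTIME and DOB values printed later in this notebook. Results can differ from the division-based age for admissions that fall very close to a birthday.

```python
import pandas as pd


def age_in_years(admit, dob):
    """Whole years between two datetime Series, computed from year/month/day fields."""
    had_birthday = (admit.dt.month > dob.dt.month) | (
        (admit.dt.month == dob.dt.month) & (admit.dt.day >= dob.dt.day)
    )
    # Subtract one year when the birthday has not yet occurred in the admission year
    return admit.dt.year - dob.dt.year - (~had_birthday).astype(int)


toy_age = pd.DataFrame({
    'ADMITTIME': pd.to_datetime(['2101-10-20 19:08:00', '2191-03-16 00:28:00',
                                 '2175-05-30 07:15:00']),
    'DOB': pd.to_datetime(['2025-04-11', '2143-05-12', '2109-06-21']),
})
toy_age['AGE'] = age_in_years(toy_age['ADMITTIME'], toy_age['DOB'])
toy_age['UNDER_18'] = (toy_age['AGE'] < 18).astype(int)
print(toy_age)
```

Working on calendar fields also avoids subtracting two timestamps directly, which matters when the two dates are very far apart.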


In [29]:
dataset4 = pd.merge(dataset4, remove_age[['HADM_ID', 'AGE']], 'inner',on='HADM_ID')

In [30]:
dataset4 = dataset4[dataset4['AGE'] >= 18]

## Goal: Find cases of readmission

1. Transferred & returned               = 3555
2. Transferred & died                   = 1974
3. Discharged & returned < 30 days      = 3205
4. Discharged & died < 30 days          = 2556

In [31]:
print(np.sum(dataset4['TRANSFERRED_RETURNED']))
print(np.sum(dataset4['TRANSFERRED_DEATH']))
print(np.sum(dataset4['DISCHARGED_RETURNED']))
print(np.sum(dataset4['DISCHARGED_DEATH']))

3151
2024
3159
3171


In [32]:
admissions_subset = admissions[['SUBJECT_ID', 'HADM_ID', 'ADMITTIME', 'DISCHTIME', 'DEATHTIME']]
patients_subset = patients[['SUBJECT_ID','GENDER','DOB','DOD','EXPIRE_FLAG']]
dataset = pd.merge(admissions_subset, patients_subset, how='left', on='SUBJECT_ID')

In [33]:
# Remove under 18
dataset['AGE'] = ((pd.to_datetime(dataset['ADMITTIME']).dt.date - pd.to_datetime(dataset['DOB']).dt.date) / np.timedelta64(1, 'Y')).astype(int)
dataset = dataset[dataset['AGE'] >= 18]

In [34]:
icustays_subset = icustays[['SUBJECT_ID', 'HADM_ID', 'ICUSTAY_ID', 'INTIME', 'OUTTIME', 'LOS']]
dataset = pd.merge(dataset, icustays_subset, how='left', on=['SUBJECT_ID','HADM_ID'])
dataset = dataset.sort_values(by=['SUBJECT_ID', 'DISCHTIME'], ignore_index=True)
#dataset = dataset.reindex()

In [35]:
dataset.head(3)

Unnamed: 0,SUBJECT_ID,HADM_ID,ADMITTIME,DISCHTIME,DEATHTIME,GENDER,DOB,DOD,EXPIRE_FLAG,AGE,ICUSTAY_ID,INTIME,OUTTIME,LOS
0,3,145834,2101-10-20 19:08:00,2101-10-31 13:58:00,NaT,M,2025-04-11,2102-06-14,1,76,211552.0,2101-10-20 19:10:11,2101-10-26 20:43:09,6.0646
1,4,185777,2191-03-16 00:28:00,2191-03-23 18:41:00,NaT,F,2143-05-12,NaT,0,47,294638.0,2191-03-16 00:29:31,2191-03-17 16:46:31,1.6785
2,6,107064,2175-05-30 07:15:00,2175-06-15 16:00:00,NaT,F,2109-06-21,NaT,0,65,228232.0,2175-05-30 21:30:54,2175-06-03 13:39:54,3.6729


In [36]:
# cases of returning to ICU
dataset['RETURNED_AFTER_TRANSFER'] = dataset.duplicated(subset=['HADM_ID']).astype(int)

In [37]:
shifted = dataset.shift(1)
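# NOTE: shift(1) compares each row with the previous row, so this flags the later visit and measures the current DISCHTIME minus the previous ADMITTIME
dataset['RETURNED_AFTER_DISCHARGE'] = ((shifted['SUBJECT_ID'] == dataset['SUBJECT_ID']) & (dataset['DISCHTIME'] - shifted['ADMITTIME'] <= np.timedelta64(30, 'D'))).astype(int)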

In [38]:
dataset['DEATH_AFTER_TRANSFER'] = ((~dataset['DEATHTIME'].isnull())  & ((dataset['DEATHTIME'] > dataset['OUTTIME']) | (dataset['DEATHTIME'] < dataset['INTIME']))).astype(int)


In [39]:
dataset['DEATH_AFTER_DISCHARGE'] = ((~dataset['DOD'].isnull())  & (dataset['DOD'] - dataset['DISCHTIME'] <= np.timedelta64(30, 'D')) & (dataset['DOD'] - dataset['DISCHTIME'] > np.timedelta64(0, 'D'))).astype(int)


In [40]:
print('Returned after transfer:\t', np.sum(dataset['RETURNED_AFTER_TRANSFER']))
print('Returned after discharge:\t', np.sum(dataset['RETURNED_AFTER_DISCHARGE']))
print('Died after transfer:\t\t', np.sum(dataset['DEATH_AFTER_TRANSFER']))
print('Died after discharge:\t\t', np.sum(dataset['DEATH_AFTER_DISCHARGE']))


Returned after transfer:	 3637
Returned after discharge:	 4020
Died after transfer:		 2090
Died after discharge:		 2436


In [41]:
print('Admissions:\t', len(dataset['HADM_ID'].unique()))
print('Patients:\t', len(dataset['SUBJECT_ID'].unique()))

Admissions:	 50766
Patients:	 38552


In [42]:
# Keep rows with at least one of these labels set (DEATH_AFTER_TRANSFER is not included here)
dataset = dataset[(dataset['RETURNED_AFTER_TRANSFER']==1)|(dataset['RETURNED_AFTER_DISCHARGE']==1)|(dataset['DEATH_AFTER_DISCHARGE']==1)]
dataset.head()


Unnamed: 0,SUBJECT_ID,HADM_ID,ADMITTIME,DISCHTIME,DEATHTIME,GENDER,DOB,DOD,EXPIRE_FLAG,AGE,ICUSTAY_ID,INTIME,OUTTIME,LOS,RETURNED_AFTER_TRANSFER,RETURNED_AFTER_DISCHARGE,DEATH_AFTER_TRANSFER,DEATH_AFTER_DISCHARGE
29,36,122659,2131-05-12 19:49:00,2131-05-25 13:30:00,NaT,M,2061-08-17,NaT,0,69,211200.0,2131-05-16 23:18:26,2131-05-23 19:56:11,6.8595,0,1,0,0
34,41,101757,2132-12-31 10:30:00,2133-01-27 15:45:00,NaT,M,2076-05-13,2133-09-30,1,56,237024.0,2133-01-09 12:18:30,2133-01-12 15:51:03,3.1476,1,1,0,0
55,68,170467,2173-12-15 16:16:00,2174-01-03 18:30:00,NaT,F,2132-02-29,2174-02-11,1,41,225771.0,2173-12-31 01:52:46,2173-12-31 21:33:34,0.82,1,1,0,0
56,68,108329,2174-01-04 22:21:00,2174-01-19 11:30:00,NaT,F,2132-02-29,2174-02-11,1,41,272667.0,2174-01-08 13:12:06,2174-01-14 22:45:42,6.3983,0,0,0,1
64,81,175016,2192-01-09 18:50:00,2192-01-11 13:00:00,NaT,M,2106-12-20,2192-01-12,1,85,222874.0,2192-01-09 18:50:47,2192-01-11 07:10:34,1.5137,0,0,0,1
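
Long chains of `|` conditions are easy to get wrong when columns are copied and pasted. A sketch of an alternative that drives the filter from a list of label columns, shown on a toy frame. Unlike the filter in the cell above, this list includes `DEATH_AFTER_TRANSFER`, which is an assumption about the intended selection.

```python
import pandas as pd

label_columns = [
    'RETURNED_AFTER_TRANSFER',
    'RETURNED_AFTER_DISCHARGE',
    'DEATH_AFTER_TRANSFER',
    'DEATH_AFTER_DISCHARGE',
]

# Toy rows: one with no label set, and one for each of three different labels
toy_labels_frame = pd.DataFrame(
    [[0, 0, 0, 0], [1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]],
    columns=label_columns,
)

# True for rows where at least one label column equals 1
is_case = toy_labels_frame[label_columns].eq(1).any(axis=1)
cases = toy_labels_frame[is_case]
print(len(cases))
```

Adding or removing a label then only requires editing `label_columns`.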
