# Prediction of Hospital Readmissions

This notebook attempts to reproduce the claims from Zebin and Chaussalet's paper, 'Design and implementation of a deep recurrent model for prediction of readmission in urgent care using electronic health records'.



## Claims

When predicting ICU readmissions:

1. LSTM+CNN produced higher accuracy than logistic regression, random forest, and SVM.
2. LSTM+CNN produced higher precision than logistic regression, random forest, and SVM.
3. LSTM+CNN produced higher recall than logistic regression and SVM.

In [1]:
import numpy as np
import pandas as pd

In [2]:
patients = pd.read_csv('./mimic-iii/PATIENTS.csv')
patients.head(1)

Unnamed: 0,ROW_ID,SUBJECT_ID,GENDER,DOB,DOD,DOD_HOSP,DOD_SSN,EXPIRE_FLAG
0,234,249,F,2075-03-13 00:00:00,,,,0


In [3]:
admissions = pd.read_csv('./mimic-iii/ADMISSIONS.csv')
print(admissions.shape)
admissions.head(1)

(58976, 19)


Unnamed: 0,ROW_ID,SUBJECT_ID,HADM_ID,ADMITTIME,DISCHTIME,DEATHTIME,ADMISSION_TYPE,ADMISSION_LOCATION,DISCHARGE_LOCATION,INSURANCE,LANGUAGE,RELIGION,MARITAL_STATUS,ETHNICITY,EDREGTIME,EDOUTTIME,DIAGNOSIS,HOSPITAL_EXPIRE_FLAG,HAS_CHARTEVENTS_DATA
0,21,22,165315,2196-04-09 12:26:00,2196-04-10 15:54:00,,EMERGENCY,EMERGENCY ROOM ADMIT,DISC-TRAN CANCER/CHLDRN H,Private,,UNOBTAINABLE,MARRIED,WHITE,2196-04-09 10:06:00,2196-04-09 13:24:00,BENZODIAZEPINE OVERDOSE,0,1


In [4]:
transfers = pd.read_csv('./mimic-iii/TRANSFERS.csv')
transfers.head(1)

Unnamed: 0,ROW_ID,SUBJECT_ID,HADM_ID,ICUSTAY_ID,DBSOURCE,EVENTTYPE,PREV_CAREUNIT,CURR_CAREUNIT,PREV_WARDID,CURR_WARDID,INTIME,OUTTIME,LOS
0,657,111,192123,254245.0,carevue,transfer,CCU,MICU,7.0,23.0,2142-04-29 15:27:11,2142-05-04 20:38:33,125.19


In [5]:
icustays = pd.read_csv('./mimic-iii/ICUSTAYS.csv')
icustays.head(1)


Unnamed: 0,ROW_ID,SUBJECT_ID,HADM_ID,ICUSTAY_ID,DBSOURCE,FIRST_CAREUNIT,LAST_CAREUNIT,FIRST_WARDID,LAST_WARDID,INTIME,OUTTIME,LOS
0,365,268,110404,280836,carevue,MICU,MICU,52,52,2198-02-14 23:27:38,2198-02-18 05:26:11,3.249


In [6]:
# print(len(icustays.groupby(['HADM_ID'])))
# test = (icustays.groupby(['HADM_ID']).size() > 1).reset_index()
# test[test[0]==True]

## NOTE: Organ Donors

In some cases a patient has multiple, inconsistent death times across duplicated records: either the patient has two different death times, or the admit and discharge times do not align with the death time.
It appears that organ donor collection is processed under a separate HADM_ID, and those entries are inconsistent.
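
The inconsistency described above can be checked mechanically. The following is a minimal sketch on toy rows, not the notebook's actual filtering: `flag_suspect_admissions`, `DONOR_LABELS`, and the toy values are assumptions introduced for illustration, while the column names follow `ADMISSIONS.csv`.

```python
import pandas as pd

# Toy admissions (illustrative values, not real MIMIC-III rows).
toy_admissions = pd.DataFrame({
    'SUBJECT_ID': [417, 417, 500],
    'HADM_ID': [178013, 102633, 100001],
    'ADMITTIME': pd.to_datetime(['2177-03-22 22:24', '2177-03-23 16:17', '2150-01-01 08:00']),
    'DISCHTIME': pd.to_datetime(['2177-03-23 07:20', '2177-03-23 07:20', '2150-01-05 12:00']),
    'DIAGNOSIS': ['SUBARACHNOID HEMORRHAGE', 'ORGAN DONOR ACCOUNT', 'PNEUMONIA'],
})

# Hypothetical set of diagnosis strings treated as organ donor bookkeeping records.
DONOR_LABELS = {'ORGAN DONOR ACCOUNT', 'DONOR ACCOUNT', 'ORGAN DONOR'}

def flag_suspect_admissions(adm):
    """Return a copy with two boolean columns marking suspicious admissions."""
    out = adm.copy()
    # Diagnosis text marks the row as a donor account rather than a clinical stay.
    out['IS_DONOR_RECORD'] = out['DIAGNOSIS'].isin(DONOR_LABELS)
    # An admission that starts after it ends cannot be a real hospital stay.
    out['ADMIT_AFTER_DISCHARGE'] = out['ADMITTIME'] > out['DISCHTIME']
    return out

flagged = flag_suspect_admissions(toy_admissions)
print(flagged[['HADM_ID', 'IS_DONOR_RECORD', 'ADMIT_AFTER_DISCHARGE']])
```

Flagging rather than dropping keeps the decision about what to exclude visible in later cells.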

In [7]:
# admissions where the patient died in hospital
organ = admissions[admissions['HOSPITAL_EXPIRE_FLAG']==1].reset_index(drop=True)

organ[organ.duplicated('SUBJECT_ID',keep=False)].sort_values('SUBJECT_ID').head(2)

Unnamed: 0,ROW_ID,SUBJECT_ID,HADM_ID,ADMITTIME,DISCHTIME,DEATHTIME,ADMISSION_TYPE,ADMISSION_LOCATION,DISCHARGE_LOCATION,INSURANCE,LANGUAGE,RELIGION,MARITAL_STATUS,ETHNICITY,EDREGTIME,EDOUTTIME,DIAGNOSIS,HOSPITAL_EXPIRE_FLAG,HAS_CHARTEVENTS_DATA
43,533,417,178013,2177-03-22 22:24:00,2177-03-23 07:20:00,2177-03-23 07:20:00,EMERGENCY,EMERGENCY ROOM ADMIT,DEAD/EXPIRED,Private,,UNOBTAINABLE,MARRIED,WHITE,2177-03-22 22:01:00,2177-03-23 00:20:00,SUBARACHNOID HEMORRHAGE,1,1
44,534,417,102633,2177-03-23 16:17:00,2177-03-23 07:20:00,2177-03-23 07:20:00,URGENT,PHYS REFERRAL/NORMAL DELI,DEAD/EXPIRED,Private,,UNOBTAINABLE,MARRIED,WHITE,,,ORGAN DONOR ACCOUNT,1,1


## Example

Patient with `SUBJECT_ID = 250`

In [8]:
# patients[patients['SUBJECT_ID'] == 250]

In [9]:
# admissions[admissions['SUBJECT_ID'] == 250]

In [10]:
# transfers[transfers['SUBJECT_ID'] == 250]

## Create Dataset

SUBJECT_ID examples: 291, 283, 250

In [11]:
patients['DOB'] = pd.to_datetime(patients['DOB'], errors='coerce')
patients['DOD'] = pd.to_datetime(patients['DOD'], errors='coerce')
patients['DOD_HOSP'] = pd.to_datetime(patients['DOD_HOSP'], errors='coerce')
patients['DOD_SSN'] = pd.to_datetime(patients['DOD_SSN'], errors='coerce')
patients = patients.drop('ROW_ID', axis=1)

In [12]:
admissions['ADMITTIME'] = pd.to_datetime(admissions['ADMITTIME'], errors='coerce')
admissions['DISCHTIME'] = pd.to_datetime(admissions['DISCHTIME'], errors='coerce')
admissions['DEATHTIME'] = pd.to_datetime(admissions['DEATHTIME'], errors='coerce')
admissions = admissions.drop('ROW_ID', axis=1)

In [13]:
transfers['INTIME'] = pd.to_datetime(transfers['INTIME'], errors='coerce')
transfers['OUTTIME'] = pd.to_datetime(transfers['OUTTIME'], errors='coerce')
transfers = transfers.drop('ROW_ID', axis=1)

In [14]:
icustays['INTIME'] = pd.to_datetime(icustays['INTIME'], errors='coerce')
icustays['OUTTIME'] = pd.to_datetime(icustays['OUTTIME'], errors='coerce')
icustays = icustays.drop('ROW_ID', axis=1)

## Goal: Find cases of readmission

1. Transferred & returned               = 3555
2. Transferred & died                   = 1974
3. Discharged & returned < 30 days      = 3205
4. Discharged & died < 30 days          = 2556


In [15]:
admissions_type = admissions['ADMISSION_TYPE'].unique()
admissions_type

array(['EMERGENCY', 'ELECTIVE', 'NEWBORN', 'URGENT'], dtype=object)

### 1. Transferred & Returned

```python
dataset['TRANSFERRED_RETURNED'] = icustays.duplicated(subset=['HADM_ID']).astype(int)
```
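
The choice of `keep` in `duplicated` decides which ICU stay carries the label. A minimal sketch on toy data (the frame `toy_icustays` and the two variable names are assumptions for illustration): `keep='first'` marks the return stays themselves, while `keep='last'` marks the stays that are followed by a return. Both give the same total, so the count alone does not reveal which rows were labelled.

```python
import pandas as pd

# Toy ICU stays: admission 1 has three stays, admission 2 has one.
toy_icustays = pd.DataFrame({
    'HADM_ID': [1, 1, 1, 2],
    'ICUSTAY_ID': [10, 11, 12, 20],
    'INTIME': pd.to_datetime(['2150-01-01', '2150-01-05', '2150-01-09', '2150-02-01']),
}).sort_values(['HADM_ID', 'INTIME']).reset_index(drop=True)

# keep='first': every stay after the first within an admission is flagged,
# i.e. the label lands on the return visit.
is_return_stay = toy_icustays.duplicated(subset=['HADM_ID'], keep='first')

# keep='last': every stay except the last within an admission is flagged,
# i.e. the label lands on the stay that precedes a return.
precedes_return = toy_icustays.duplicated(subset=['HADM_ID'], keep='last')

print(toy_icustays.assign(IS_RETURN=is_return_stay, PRECEDES_RETURN=precedes_return))
```

For a model that predicts readmission at the end of a stay, the `keep='last'` labelling is the one that attaches the outcome to the stay being scored.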

In [16]:
np.sum(icustays.duplicated(subset=['HADM_ID']).astype(int))

3746

Paper's code

In [17]:
qwer = icustays.groupby(['HADM_ID']).size().reset_index(name='COUNTS')
qwer = qwer[qwer['COUNTS'] > 1]
np.sum(qwer['COUNTS'] ) - qwer.shape[0]

3746

My code

In [18]:
# HADM_ID where patient has multiple ICUSTAY_ID
tnr = icustays[['SUBJECT_ID', 'HADM_ID', 'ICUSTAY_ID', 'INTIME']]
tnr = tnr.sort_values(['HADM_ID','INTIME'])
# tnr['TRANSFERRED'] = (tnr.duplicated(subset=['HADM_ID'], keep=False)).astype(int)
# tnr = tnr[tnr['TRANSFERRED']==1]

#print(tnr[tnr.duplicated(subset=['HADM_ID'],keep=False)].head(10))
tnr['TRANSFERRED_RETURNED'] = tnr.duplicated(subset=['HADM_ID'], keep='last').astype(int)
tnr = tnr[tnr['TRANSFERRED_RETURNED'] == 1].reset_index(drop=True)
print(tnr.shape)
tnr.head()

(3746, 5)


Unnamed: 0,SUBJECT_ID,HADM_ID,ICUSTAY_ID,INTIME,TRANSFERRED_RETURNED
0,29971,100021,252772,2109-08-21 20:02:48,1
1,58947,100037,270105,2183-03-23 18:22:04,1
2,1549,100055,215944,2150-07-06 12:43:34,1
3,73946,100104,254601,2201-06-21 19:08:30,1
4,3278,100130,295574,2109-07-21 00:47:00,1


### 2. Transferred & Died

```python
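# Pseudocode: assumes the rows of admissions and dataset are aligned on HADM_ID
dataset['TRANSFERRED_DEATH'] = ((admissions['HOSPITAL_EXPIRE_FLAG'] == 1) & (dataset['DEATH_IN_ICU'] == 0)).astype(int)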
```

### Define death

```python
patients['EXPIRE_FLAG'] == 1
```

### Define death in hospital

```python
admissions['HOSPITAL_EXPIRE_FLAG'] == 1
```

### Define death in ICU (merge admissions & icustays)

```python
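# Pseudocode: assumes admissions and icustays have been merged so their rows align.
# The outer parentheses matter: without them .astype(int) binds only to the second comparison.
dataset['DEATH_IN_ICU'] = ((admissions['DEATHTIME'] > icustays['INTIME']) & (admissions['DEATHTIME'] < icustays['OUTTIME'])).astype(int)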
```
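
The three definitions above can be combined on a merged frame. This is a minimal sketch on toy data, assuming one ICU stay per admission; `toy_adm`, `toy_icu`, and `merged` are illustrative names, not objects from the notebook.

```python
import pandas as pd

# Toy admissions: 1 dies inside the ICU stay, 2 dies after leaving the ICU, 3 survives.
toy_adm = pd.DataFrame({
    'HADM_ID': [1, 2, 3],
    'DEATHTIME': pd.to_datetime(['2150-01-03 10:00', '2150-02-10 09:00', None]),
    'HOSPITAL_EXPIRE_FLAG': [1, 1, 0],
})
toy_icu = pd.DataFrame({
    'HADM_ID': [1, 2, 3],
    'INTIME': pd.to_datetime(['2150-01-01', '2150-02-01', '2150-03-01']),
    'OUTTIME': pd.to_datetime(['2150-01-04', '2150-02-05', '2150-03-02']),
})

# Merge first so that every comparison happens within a single row.
merged = toy_adm.merge(toy_icu, on='HADM_ID', how='inner')

# Death in ICU: DEATHTIME falls strictly between INTIME and OUTTIME.
# Comparisons against NaT evaluate to False, so survivors get 0.
merged['DEATH_IN_ICU'] = (
    (merged['DEATHTIME'] > merged['INTIME']) & (merged['DEATHTIME'] < merged['OUTTIME'])
).astype(int)

# Transferred death: died in hospital, but not during the ICU stay.
merged['TRANSFERRED_DEATH'] = (
    (merged['HOSPITAL_EXPIRE_FLAG'] == 1) & (merged['DEATH_IN_ICU'] == 0)
).astype(int)

print(merged[['HADM_ID', 'DEATH_IN_ICU', 'TRANSFERRED_DEATH']])
```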

In [19]:
# (tnr.duplicated(subset=['HADM_ID'], keep='last')).astype(int)
icustays.shape

(61532, 11)

In [20]:
icustays.sort_values('INTIME').drop_duplicates('HADM_ID',keep='last').shape

(57786, 11)

In [21]:
# admissions where the patient died in hospital
temp = admissions[admissions['HOSPITAL_EXPIRE_FLAG']==1].reset_index(drop=True)
temp = temp[(temp['DIAGNOSIS'] != 'ORGAN DONOR ACCOUNT') & (temp['DIAGNOSIS'] != 'DONOR ACCOUNT') & (temp['DIAGNOSIS'] != 'ORGAN DONOR') & (temp['DIAGNOSIS'].notnull())].reset_index(drop=True)
#temp[temp['DIAGNOSIS'] == 'ORGAN DONOR ACCOUNT'].shape
#temp[temp['DIAGNOSIS'].isnull()] # 1 of these
#temp[temp['DIAGNOSIS'] != 'ORGAN DONOR ACCOUNT'].shape
# 'DONOR ACCOUNT'	
# 'ORGAN DONOR ACCOUNT'
# 'ORGAN DONOR'

# a = temp[temp.duplicated('SUBJECT_ID',keep=False)].sort_values('SUBJECT_ID')
# a[(a['DIAGNOSIS'] != 'ORGAN DONOR ACCOUNT')]
temp.shape

(5813, 18)

This should give the transferred-death count, but my number doesn't match the paper's. It looks like the paper's code may be creating duplicate rows, although I'm not sure.

In [22]:
# tnd = pd.merge(temp, icustays.sort_values('INTIME').drop_duplicates('HADM_ID',keep='last'), 'left' ,on=['SUBJECT_ID', 'HADM_ID'])
tnd = pd.merge(temp, icustays.sort_values('INTIME'), 'left' ,on=['SUBJECT_ID', 'HADM_ID'])
tnd['TRANSFERRED_DEATH'] = ((tnd['DEATHTIME'].notnull()) & ((tnd['DEATHTIME'] < tnd['INTIME']) | (tnd['DEATHTIME'] > tnd['OUTTIME']))).astype(int)
tnd = tnd[['SUBJECT_ID', 'HADM_ID', 'ICUSTAY_ID', 'ADMITTIME', 'DISCHTIME', 'DEATHTIME','HOSPITAL_EXPIRE_FLAG', 'INTIME', 'OUTTIME', 'TRANSFERRED_DEATH']]
np.sum(tnd['TRANSFERRED_DEATH'])


2093

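Check that HOSPITAL_EXPIRE_FLAG agrees with DEATHTIME: look for admissions flagged as survived (flag 0) whose DEATHTIME nevertheless falls between ADMITTIME and DISCHTIME. The empty result below means there are none.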

In [23]:
admissions[ (admissions['HOSPITAL_EXPIRE_FLAG']==0) & (admissions['DEATHTIME'].notnull()) & (admissions['DEATHTIME'] >= admissions['ADMITTIME']) & (admissions['DEATHTIME'] <= admissions['DISCHTIME'])]

Unnamed: 0,SUBJECT_ID,HADM_ID,ADMITTIME,DISCHTIME,DEATHTIME,ADMISSION_TYPE,ADMISSION_LOCATION,DISCHARGE_LOCATION,INSURANCE,LANGUAGE,RELIGION,MARITAL_STATUS,ETHNICITY,EDREGTIME,EDOUTTIME,DIAGNOSIS,HOSPITAL_EXPIRE_FLAG,HAS_CHARTEVENTS_DATA


This is the code the paper used. I think it has some issues, but I need to keep moving forward.

In [24]:
asdf = pd.merge(admissions, icustays, 'inner' ,on=['SUBJECT_ID', 'HADM_ID'])
asdf['TRANSFERRED_DEATH'] = ((asdf['DEATHTIME'].notnull()) & ((asdf['DEATHTIME'] < asdf['INTIME']) | (asdf['DEATHTIME'] > asdf['OUTTIME']))).astype(int)
asdf = asdf[['SUBJECT_ID', 'HADM_ID', 'ADMITTIME', 'DISCHTIME', 'DEATHTIME','HOSPITAL_EXPIRE_FLAG', 'INTIME', 'OUTTIME', 'TRANSFERRED_DEATH']]
np.sum(asdf['TRANSFERRED_DEATH'])

2096

In [25]:
dups = asdf[asdf['TRANSFERRED_DEATH']==1].sort_values('SUBJECT_ID')
dups[dups.duplicated('HADM_ID',keep=False)].head(20)

Unnamed: 0,SUBJECT_ID,HADM_ID,ADMITTIME,DISCHTIME,DEATHTIME,HOSPITAL_EXPIRE_FLAG,INTIME,OUTTIME,TRANSFERRED_DEATH
485,188,132401,2161-11-01 17:48:00,2162-01-17 05:50:00,2162-01-17 05:50:00,1,2161-12-09 17:03:04,2161-12-17 21:04:31,1
486,188,132401,2161-11-01 17:48:00,2162-01-17 05:50:00,2162-01-17 05:50:00,1,2162-01-10 16:36:40,2162-01-16 18:35:07,1
383,377,139824,2168-03-04 13:56:00,2168-04-04 10:47:00,2168-04-04 10:47:00,1,2168-03-04 13:57:00,2168-03-15 16:33:00,1
384,377,139824,2168-03-04 13:56:00,2168-04-04 10:47:00,2168-04-04 10:47:00,1,2168-03-21 13:13:00,2168-03-23 16:27:00,1
385,377,139824,2168-03-04 13:56:00,2168-04-04 10:47:00,2168-04-04 10:47:00,1,2168-03-26 08:03:00,2168-04-03 15:03:00,1
427,408,173910,2188-10-27 21:25:00,2189-01-11 20:19:00,2189-01-11 20:19:00,1,2188-10-27 21:27:14,2188-10-30 19:54:32,1
428,408,173910,2188-10-27 21:25:00,2189-01-11 20:19:00,2189-01-11 20:19:00,1,2188-11-19 12:30:24,2188-11-26 11:26:04,1
1257,766,183370,2178-03-03 23:18:00,2178-03-27 04:41:00,2178-03-27 04:41:00,1,2178-03-03 23:19:45,2178-03-05 11:40:21,1
1258,766,183370,2178-03-03 23:18:00,2178-03-27 04:41:00,2178-03-27 04:41:00,1,2178-03-07 14:17:44,2178-03-12 13:48:25,1
966,834,153730,2166-06-17 03:00:00,2166-09-12 11:30:00,2166-09-12 11:30:00,1,2166-08-19 06:00:00,2166-08-31 18:55:00,1
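
The repeated HADM_ID rows above suggest why a row-level count can exceed an admission-level one: an admission with several ICU stays contributes one flagged row per stay. A minimal sketch on toy data, assuming that "transferred death" should be counted once per admission and only when the death falls outside every ICU stay (the names `rows`, `row_level_count`, and `admission_level_count` are illustrative):

```python
import pandas as pd

# Admission 1: two ICU stays, death after both. Admission 2: death inside its only stay.
toy_adm = pd.DataFrame({
    'HADM_ID': [1, 2],
    'DEATHTIME': pd.to_datetime(['2150-01-20', '2150-02-03']),
})
toy_icu = pd.DataFrame({
    'HADM_ID': [1, 1, 2],
    'INTIME': pd.to_datetime(['2150-01-01', '2150-01-10', '2150-02-01']),
    'OUTTIME': pd.to_datetime(['2150-01-05', '2150-01-15', '2150-02-05']),
})

# One-to-many merge: one output row per (admission, ICU stay) pair.
rows = toy_adm.merge(toy_icu, on='HADM_ID', how='inner')
rows['DIED_OUTSIDE_THIS_STAY'] = (
    rows['DEATHTIME'].notnull()
    & ((rows['DEATHTIME'] < rows['INTIME']) | (rows['DEATHTIME'] > rows['OUTTIME']))
)

# Row-level sum counts admission 1 twice.
row_level_count = int(rows['DIED_OUTSIDE_THIS_STAY'].sum())

# Admission-level: the death must be outside all ICU stays of that admission.
per_admission = rows.groupby('HADM_ID')['DIED_OUTSIDE_THIS_STAY'].all()
admission_level_count = int(per_admission.sum())

print(row_level_count, admission_level_count)
```

Whether the paper counts rows or admissions is not stated here, so both numbers are worth comparing against its reported totals.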


Old approach (kept commented out for reference)

In [26]:
# There will be duplicate SUBJECT_ID
# tnd = pd.merge(admissions[['SUBJECT_ID', 'HADM_ID', 'ADMITTIME', 'DISCHTIME', 'DEATHTIME','HOSPITAL_EXPIRE_FLAG']], icustays[['SUBJECT_ID', 'HADM_ID', 'INTIME', 'OUTTIME']], 'inner',on=['SUBJECT_ID', 'HADM_ID'])
# # HADM_ID where patient died in ICU
# tnd['DEATH_IN_ICU'] = ((tnd['DEATHTIME'].notnull()) & (tnd['DEATHTIME'] >= tnd['INTIME']) & (tnd['DEATHTIME'] <= tnd['OUTTIME'])).astype(int)
# tnd.head()

In [27]:
# HADM_ID where patient died outside ICU, but still in hospital

# tnd['TRANSFERRED_DEATH'] = ((tnd['HOSPITAL_EXPIRE_FLAG']==1) & (tnd['DEATH_IN_ICU']==0)).astype(int)
# tnd = (tnd[tnd['TRANSFERRED_DEATH'] == 1]).reset_index(drop=True)
# tnd.head()

### 3. Discharged & Returned < 30

This approach is wrong: it labels the return visit instead of the visit the patient was discharged from.

```python
admissions = admissions.sort_values(['SUBJECT_ID', 'ADMITTIME']).reset_index(drop=True)
shifted = admissions.shift(1)
shifted['DISCHTIME'] = pd.to_datetime(shifted['DISCHTIME'], errors='coerce')
dataset['DISCHARGED_RETURNED'] = (shifted['SUBJECT_ID'] == admissions['SUBJECT_ID']) & (admissions['ADMITTIME'] - shifted['DISCHTIME'] <= np.timedelta64(30, 'D')) 
```
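
To put the label on the discharged visit instead, each admission can be compared with the same patient's next admission. The following is a minimal sketch on toy data using a per-patient `shift(-1)`; it is an alternative formulation under the assumption that admissions are ordered by ADMITTIME within each patient, and `toy` and `next_admit` are illustrative names.

```python
import pandas as pd

# Patient 1 returns 10 days after the first discharge, then again months later.
toy = pd.DataFrame({
    'SUBJECT_ID': [1, 1, 1, 2],
    'HADM_ID': [11, 12, 13, 21],
    'ADMITTIME': pd.to_datetime(['2150-01-01', '2150-01-20', '2150-06-01', '2150-01-02']),
    'DISCHTIME': pd.to_datetime(['2150-01-10', '2150-01-25', '2150-06-05', '2150-01-04']),
}).sort_values(['SUBJECT_ID', 'ADMITTIME']).reset_index(drop=True)

# Next admission time for the same patient; NaT for each patient's last admission,
# so no comparison ever crosses a patient boundary.
next_admit = toy.groupby('SUBJECT_ID')['ADMITTIME'].shift(-1)

# Label the visit that was followed by a return within 30 days.
# Comparisons involving NaT are False, so last admissions get 0.
toy['DISCHARGED_RETURNED'] = (
    (next_admit - toy['DISCHTIME']) < pd.Timedelta(days=30)
).astype(int)

print(toy[['SUBJECT_ID', 'HADM_ID', 'DISCHARGED_RETURNED']])
```

Grouping before shifting removes the need to drop the final row or to compare SUBJECT_ID values explicitly.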

In [28]:
dnr = admissions[['SUBJECT_ID', 'HADM_ID', 'ADMITTIME', 'DISCHTIME']].copy()
dnr.sort_values(by=['SUBJECT_ID', 'DISCHTIME'], ignore_index=True, inplace=True)
dnr.head()

Unnamed: 0,SUBJECT_ID,HADM_ID,ADMITTIME,DISCHTIME
0,2,163353,2138-07-17 19:04:00,2138-07-21 15:48:00
1,3,145834,2101-10-20 19:08:00,2101-10-31 13:58:00
2,4,185777,2191-03-16 00:28:00,2191-03-23 18:41:00
3,5,178980,2103-02-02 04:31:00,2103-02-04 12:15:00
4,6,107064,2175-05-30 07:15:00,2175-06-15 16:00:00


In [29]:

shifted = dnr.shift(-1).drop(dnr.shape[0]-1)
shifted = shifted.astype({'SUBJECT_ID':'int64','HADM_ID':'int64','ADMITTIME':'datetime64[ns]','DISCHTIME':'datetime64[ns]'})
shifted.head()

Unnamed: 0,SUBJECT_ID,HADM_ID,ADMITTIME,DISCHTIME
0,3,145834,2101-10-20 19:08:00,2101-10-31 13:58:00
1,4,185777,2191-03-16 00:28:00,2191-03-23 18:41:00
2,5,178980,2103-02-02 04:31:00,2103-02-04 12:15:00
3,6,107064,2175-05-30 07:15:00,2175-06-15 16:00:00
4,7,118037,2121-05-23 15:05:00,2121-05-27 11:57:00


In [30]:

# dnr['ADMITTIME'] = pd.to_datetime(dnr['ADMITTIME'], errors='coerce')
# dnr['DISCHTIME'] = pd.to_datetime(dnr['DISCHTIME'], errors='coerce')
# shifted['ADMITTIME'] = pd.to_datetime(shifted['ADMITTIME'], errors='coerce')
# shifted['DISCHTIME'] = pd.to_datetime(shifted['DISCHTIME'], errors='coerce')
dnr = dnr.drop(dnr.shape[0]-1)
dnr['DISCHARGED_RETURNED'] = ((shifted['SUBJECT_ID'] == dnr['SUBJECT_ID']) & (shifted['HADM_ID'] != dnr['HADM_ID']) & (shifted['ADMITTIME'] - dnr['DISCHTIME'] < np.timedelta64(30, 'D'))).astype(int)
dnr = dnr[dnr['DISCHARGED_RETURNED'] == 1].reset_index(drop=True)
print(dnr.shape)
dnr.head()

(3390, 5)


Unnamed: 0,SUBJECT_ID,HADM_ID,ADMITTIME,DISCHTIME,DISCHARGED_RETURNED
0,36,182104,2131-04-30 07:15:00,2131-05-08 14:00:00,1
1,68,170467,2173-12-15 16:16:00,2174-01-03 18:30:00,1
2,103,130744,2144-08-12 17:37:00,2144-08-20 11:15:00,1
3,105,161160,2189-01-28 16:57:00,2189-02-02 16:40:00,1
4,109,164029,2140-01-19 13:25:00,2140-01-21 13:25:00,1


### 4. Discharged & Died < 30 (merge patients, admissions)

This needs the HADM_ID of the patient's last discharged visit.

```python
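# Pseudocode: assumes patients and admissions have been merged on SUBJECT_ID so their rows align
dataset['DISCHARGED_DEATH'] = (patients['EXPIRE_FLAG'] == 1) & (admissions['HOSPITAL_EXPIRE_FLAG'] == 0) & (patients['DOD'] - admissions['DISCHTIME'] <= np.timedelta64(30, 'D'))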
```
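
A patient with two discharges inside the 30-day window before death would be flagged twice by the rule above. A minimal sketch on toy data of restricting the label to the last discharge per patient, which is one reading of the note about needing the last discharged visit; `toy_patients`, `toy_adm`, `all_flagged`, and `is_last` are illustrative names.

```python
import pandas as pd

toy_patients = pd.DataFrame({
    'SUBJECT_ID': [1, 2],
    'DOD': pd.to_datetime(['2150-03-10', None]),
    'EXPIRE_FLAG': [1, 0],
})
# Patient 1 was discharged twice within 30 days of death.
toy_adm = pd.DataFrame({
    'SUBJECT_ID': [1, 1, 2],
    'HADM_ID': [11, 12, 21],
    'DISCHTIME': pd.to_datetime(['2150-02-20', '2150-03-01', '2150-01-04']),
    'HOSPITAL_EXPIRE_FLAG': [0, 0, 0],
})

merged = toy_adm.merge(toy_patients, on='SUBJECT_ID', how='inner')

# Died within 30 days of discharge, and not during the admission itself.
within_30 = (merged['DOD'] - merged['DISCHTIME']) <= pd.Timedelta(days=30)
died_after_discharge = (
    (merged['EXPIRE_FLAG'] == 1) & (merged['HOSPITAL_EXPIRE_FLAG'] == 0) & within_30
)

# Without further filtering, both admissions of patient 1 are flagged.
all_flagged = merged.loc[died_after_discharge, 'HADM_ID'].tolist()

# Keep the label only on each patient's latest discharge.
is_last = merged['DISCHTIME'] == merged.groupby('SUBJECT_ID')['DISCHTIME'].transform('max')
merged['DISCHARGED_DEATH'] = (died_after_discharge & is_last).astype(int)

print(merged[['SUBJECT_ID', 'HADM_ID', 'DISCHARGED_DEATH']])
```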

In [31]:


dnd = pd.merge(patients[['SUBJECT_ID', 'DOD', 'EXPIRE_FLAG']],admissions[['SUBJECT_ID', 'HADM_ID','DISCHTIME','HOSPITAL_EXPIRE_FLAG']],'inner',on='SUBJECT_ID')
dnd['DISCHARGED_DEATH'] = ((dnd['EXPIRE_FLAG']==1) & (dnd['HOSPITAL_EXPIRE_FLAG']==0) & (dnd['DOD'] - dnd['DISCHTIME'] <= np.timedelta64(30, 'D')) ).astype(int)
dnd = dnd[dnd['DISCHARGED_DEATH'] == 1].reset_index(drop=True)
print(dnd.shape)
dnd.head()

(2246, 7)


Unnamed: 0,SUBJECT_ID,DOD,EXPIRE_FLAG,HADM_ID,DISCHTIME,HOSPITAL_EXPIRE_FLAG,DISCHARGED_DEATH
0,668,2183-07-10,1,166245,2183-07-07 16:30:00,0,1
1,670,2161-02-15,1,176690,2161-02-13 15:45:00,0,1
2,700,2115-05-04,1,199309,2115-05-02 16:30:00,0,1
3,711,2185-05-26,1,158767,2185-05-16 17:10:00,0,1
4,735,2128-07-04,1,140547,2128-06-10 17:30:00,0,1


In [32]:
admissions.head(1)

Unnamed: 0,SUBJECT_ID,HADM_ID,ADMITTIME,DISCHTIME,DEATHTIME,ADMISSION_TYPE,ADMISSION_LOCATION,DISCHARGE_LOCATION,INSURANCE,LANGUAGE,RELIGION,MARITAL_STATUS,ETHNICITY,EDREGTIME,EDOUTTIME,DIAGNOSIS,HOSPITAL_EXPIRE_FLAG,HAS_CHARTEVENTS_DATA
0,22,165315,2196-04-09 12:26:00,2196-04-10 15:54:00,NaT,EMERGENCY,EMERGENCY ROOM ADMIT,DISC-TRAN CANCER/CHLDRN H,Private,,UNOBTAINABLE,MARRIED,WHITE,2196-04-09 10:06:00,2196-04-09 13:24:00,BENZODIAZEPINE OVERDOSE,0,1


## Merge

In [33]:
left = pd.DataFrame(
    {
        "key": ["K0", "K1", "K2", "K3"],
        "A": ["A0", "A1", "A2", "A3"],
        "B": ["B0", "B1", "B2", "B3"],
    }
)

In [34]:
right = pd.DataFrame(
    {
        "key": ["K0", "K1", "K2"],
        "C": ["C0", "C1", "C2"],
        "D": ["D0", "D1", "D2"],
    }
)

In [35]:
result = pd.merge(left, right, on="key",how='left')

In [36]:
result

Unnamed: 0,key,A,B,C,D
0,K0,A0,B0,C0,D0
1,K1,A1,B1,C1,D1
2,K2,A2,B2,C2,D2
3,K3,A3,B3,,


In [37]:
tnr.dtypes

SUBJECT_ID                       int64
HADM_ID                          int64
ICUSTAY_ID                       int64
INTIME                  datetime64[ns]
TRANSFERRED_RETURNED             int32
dtype: object

In [38]:
admissions.dtypes

SUBJECT_ID                       int64
HADM_ID                          int64
ADMITTIME               datetime64[ns]
DISCHTIME               datetime64[ns]
DEATHTIME               datetime64[ns]
ADMISSION_TYPE                  object
ADMISSION_LOCATION              object
DISCHARGE_LOCATION              object
INSURANCE                       object
LANGUAGE                        object
RELIGION                        object
MARITAL_STATUS                  object
ETHNICITY                       object
EDREGTIME                       object
EDOUTTIME                       object
DIAGNOSIS                       object
HOSPITAL_EXPIRE_FLAG             int64
HAS_CHARTEVENTS_DATA             int64
dtype: object

In [39]:
print(icustays.shape)
print(tnr.shape)
# dataset4 = admissions[['SUBJECT_ID', 'HADM_ID']].merge(tnr[['HADM_ID', 'ICUSTAY_ID', 'TRANSFERRED_RETURNED']], how='left', on= 'HADM_ID')
dataset4 = pd.merge(icustays[['SUBJECT_ID','ICUSTAY_ID', 'HADM_ID']],tnr[['HADM_ID', 'ICUSTAY_ID', 'TRANSFERRED_RETURNED']], how='left', on= ['HADM_ID', 'ICUSTAY_ID'])
print(dataset4.shape)
print(np.sum(dataset4['TRANSFERRED_RETURNED']))
dataset4.head(1)

(61532, 11)
(3746, 5)
(61532, 4)
3746.0


Unnamed: 0,SUBJECT_ID,ICUSTAY_ID,HADM_ID,TRANSFERRED_RETURNED
0,268,280836,110404,


In [40]:
print(np.sum(tnd['TRANSFERRED_DEATH']))
dataset5 = dataset4.merge(tnd[['HADM_ID','ICUSTAY_ID','TRANSFERRED_DEATH']], how='left', on=['HADM_ID', 'ICUSTAY_ID'])
print(np.sum(dataset5['TRANSFERRED_RETURNED']))
print(np.sum(dataset5['TRANSFERRED_DEATH']))
dataset5.head(1)

2093
3746.0
2093.0


Unnamed: 0,SUBJECT_ID,ICUSTAY_ID,HADM_ID,TRANSFERRED_RETURNED,TRANSFERRED_DEATH
0,268,280836,110404,,0.0


In [44]:
dataset5.sort_values(['TRANSFERRED_RETURNED', 'TRANSFERRED_DEATH','HADM_ID'],ascending=False).head(6)

Unnamed: 0,SUBJECT_ID,ICUSTAY_ID,HADM_ID,TRANSFERRED_RETURNED,TRANSFERRED_DEATH
23774,14514,254344,199646,1.0,1.0
37403,29636,226404,199547,1.0,1.0
38432,26637,291329,199511,1.0,1.0
38433,26637,248684,199511,1.0,1.0
36045,26849,221342,199270,1.0,1.0
48275,55204,289262,198883,1.0,1.0


In [45]:
admissions[admissions['HADM_ID']==199511]

Unnamed: 0,SUBJECT_ID,HADM_ID,ADMITTIME,DISCHTIME,DEATHTIME,ADMISSION_TYPE,ADMISSION_LOCATION,DISCHARGE_LOCATION,INSURANCE,LANGUAGE,RELIGION,MARITAL_STATUS,ETHNICITY,EDREGTIME,EDOUTTIME,DIAGNOSIS,HOSPITAL_EXPIRE_FLAG,HAS_CHARTEVENTS_DATA
32654,26637,199511,2179-08-20 20:34:00,2179-09-20 06:18:00,2179-09-20 06:18:00,EMERGENCY,EMERGENCY ROOM ADMIT,DEAD/EXPIRED,Medicare,,OTHER,SINGLE,BLACK/AFRICAN AMERICAN,2179-08-20 12:14:00,2179-08-20 22:45:00,HYPOTENSION,1,1


In [46]:
icustays[icustays['HADM_ID']==199511]

Unnamed: 0,SUBJECT_ID,HADM_ID,ICUSTAY_ID,DBSOURCE,FIRST_CAREUNIT,LAST_CAREUNIT,FIRST_WARDID,LAST_WARDID,INTIME,OUTTIME,LOS
38432,26637,199511,291329,carevue,MICU,MICU,52,52,2179-08-20 20:35:00,2179-08-24 17:44:00,3.8813
38433,26637,199511,248684,carevue,CCU,CCU,57,57,2179-08-27 02:49:00,2179-08-30 23:47:00,3.8736
38434,26637,199511,200307,carevue,MICU,MICU,52,52,2179-09-09 08:17:00,2179-09-14 17:33:00,5.3861


In [42]:
dnr.head(1)

Unnamed: 0,SUBJECT_ID,HADM_ID,ADMITTIME,DISCHTIME,DISCHARGED_RETURNED
0,36,182104,2131-04-30 07:15:00,2131-05-08 14:00:00,1


In [43]:
np.sum(dataset5.duplicated('HADM_ID'))

3746

In [52]:
dataset5.head(1)

Unnamed: 0,SUBJECT_ID,ICUSTAY_ID,HADM_ID,TRANSFERRED_RETURNED,TRANSFERRED_DEATH
0,268,280836,110404,,0.0


In [49]:
print(np.sum(dnr['DISCHARGED_RETURNED']))
dataset6 = dataset5.merge(dnr[['HADM_ID','DISCHARGED_RETURNED']], how='left', on=['HADM_ID'])
print(np.sum(dataset6['TRANSFERRED_RETURNED']))
print(np.sum(dataset6['TRANSFERRED_DEATH']))
print(np.sum(dataset6['DISCHARGED_RETURNED']))

3390
3746.0
2093.0
3493.0


In [41]:
# dataset4 = admissions[['SUBJECT_ID', 'HADM_ID']].merge(tnr[['SUBJECT_ID', 'HADM_ID', 'ICUSTAY_ID', 'TRANSFERRED_RETURNED']], how='left', on=['SUBJECT_ID', 'HADM_ID'])

# dataset4 = dataset4.merge(tnd[['SUBJECT_ID', 'HADM_ID','TRANSFERRED_DEATH']], how='left', on=['SUBJECT_ID', 'HADM_ID'])
# dataset4 = dataset4.merge(dnr[['SUBJECT_ID', 'HADM_ID','DISCHARGED_RETURNED']], how='left', on=['SUBJECT_ID', 'HADM_ID'])
# dataset4 = dataset4.merge(dnd[['SUBJECT_ID', 'DISCHARGED_DEATH']], how='left', on=['SUBJECT_ID'])
# dataset4 = dataset4[(dataset4['TRANSFERRED_RETURNED'] == 1) | (dataset4['TRANSFERRED_DEATH'] == 1) | (dataset4['DISCHARGED_RETURNED'] == 1) | (dataset4['DISCHARGED_DEATH'] == 1)]
# # dataset4 = dataset4.drop_duplicates(subset='HADM_ID')

In [42]:
# dataset4 = pd.merge(tnr[['SUBJECT_ID', 'HADM_ID', 'ICUSTAY_ID', 'TRANSFERRED_RETURNED']], tnd[['SUBJECT_ID', 'HADM_ID','TRANSFERRED_DEATH']], how='outer',on='HADM_ID')
# dataset4 = dataset4.merge(dnr[['SUBJECT_ID', 'HADM_ID','DISCHARGED_RETURNED']], how='outer', on=['HADM_ID'])
# dataset4 = dataset4.merge(dnd[['SUBJECT_ID', 'HADM_ID', 'DISCHARGED_DEATH']], how='outer', on=['HADM_ID', 'SUBJECT_ID'])

In [43]:
# dataset4 = dataset4.drop_duplicates(subset='HADM_ID')
dataset4.loc[dataset4['TRANSFERRED_RETURNED'].isnull(), 'TRANSFERRED_RETURNED'] = 0
dataset4['TRANSFERRED_RETURNED'] = dataset4['TRANSFERRED_RETURNED'].astype(int)
dataset4.loc[dataset4['TRANSFERRED_DEATH'].isnull(), 'TRANSFERRED_DEATH'] = 0
dataset4['TRANSFERRED_DEATH'] = dataset4['TRANSFERRED_DEATH'].astype(int)
dataset4.loc[dataset4['DISCHARGED_RETURNED'].isnull(), 'DISCHARGED_RETURNED'] = 0
dataset4['DISCHARGED_RETURNED'] = dataset4['DISCHARGED_RETURNED'].astype(int)
dataset4.loc[dataset4['DISCHARGED_DEATH'].isnull(), 'DISCHARGED_DEATH'] = 0
dataset4['DISCHARGED_DEATH'] = dataset4['DISCHARGED_DEATH'].astype(int)
dataset4.head()

KeyError: 'TRANSFERRED_DEATH'
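
In the cells shown, `dataset4` carries only TRANSFERRED_RETURNED; TRANSFERRED_DEATH and DISCHARGED_RETURNED were merged into `dataset5` and `dataset6`, and DISCHARGED_DEATH has not been merged at all, which would explain the KeyError. A minimal sketch of a guard that reports missing label columns before zero-filling them; `finalize_labels` and `LABELS` are hypothetical helpers, not part of the notebook.

```python
import numpy as np
import pandas as pd

LABELS = ['TRANSFERRED_RETURNED', 'TRANSFERRED_DEATH',
          'DISCHARGED_RETURNED', 'DISCHARGED_DEATH']

def finalize_labels(frame, labels=LABELS):
    """Fill missing labels with 0 and cast to int, failing early with a clear message."""
    missing = [col for col in labels if col not in frame.columns]
    if missing:
        raise KeyError(f'label columns not merged yet: {missing}')
    out = frame.copy()
    for col in labels:
        # Left merges leave NaN where no label row matched; treat those as negatives.
        out[col] = out[col].fillna(0).astype(int)
    return out

# Toy frame that already has all four label columns after left merges.
toy = pd.DataFrame({
    'HADM_ID': [1, 2],
    'TRANSFERRED_RETURNED': [1.0, np.nan],
    'TRANSFERRED_DEATH': [np.nan, 0.0],
    'DISCHARGED_RETURNED': [np.nan, 1.0],
    'DISCHARGED_DEATH': [np.nan, np.nan],
})
clean = finalize_labels(toy)
print(clean)
```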

In [None]:
print(np.sum(dataset4['TRANSFERRED_RETURNED']==1))
print(np.sum(dataset4['TRANSFERRED_DEATH']==1))
print(np.sum(dataset4['DISCHARGED_RETURNED']==1))
print(np.sum(dataset4['DISCHARGED_DEATH']==1))

4904
2491
3435
2251


## Age

In [None]:
remove_age = pd.merge(admissions[['SUBJECT_ID', 'HADM_ID', 'ADMITTIME']], patients[['SUBJECT_ID', 'DOB']], how='inner', on='SUBJECT_ID')

In [None]:
remove_age['AGE'] = ((pd.to_datetime(remove_age['ADMITTIME']).dt.date - pd.to_datetime(remove_age['DOB']).dt.date) / np.timedelta64(1, 'Y')).astype(int)

In [None]:
remove_age['UNDER_18'] = (remove_age['AGE'] < 18).astype(int)


In [None]:
dataset4 = pd.merge(dataset4, remove_age[['HADM_ID', 'AGE']], 'inner',on='HADM_ID')

In [None]:
dataset4 = dataset4[dataset4['AGE'] >= 18]
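
The age calculation above divides a date difference by a one-year timedelta. A minimal alternative sketch that counts completed years from calendar components instead, so it needs no year-unit timedelta and does not depend on the size of the time difference; `age_in_years` and the toy dates are assumptions for illustration. The third toy row mimics the very early birth dates that MIMIC-III, as far as I know, assigns to patients older than 89.

```python
import pandas as pd

def age_in_years(admit, dob):
    """Completed years between dob and admit, computed from year/month/day parts."""
    admit = pd.to_datetime(admit)
    dob = pd.to_datetime(dob)
    # True when the birthday has already occurred in the admission year.
    had_birthday = (admit.dt.month > dob.dt.month) | (
        (admit.dt.month == dob.dt.month) & (admit.dt.day >= dob.dt.day)
    )
    return admit.dt.year - dob.dt.year - (~had_birthday).astype(int)

toy = pd.DataFrame({
    'ADMITTIME': ['2150-06-15', '2150-06-15', '2150-06-15'],
    'DOB': ['2100-06-16', '2100-06-15', '1850-01-01'],
})
toy['AGE'] = age_in_years(toy['ADMITTIME'], toy['DOB'])
adults = toy[toy['AGE'] >= 18]
print(toy)
```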

## Goal: Find cases of readmission

1. Transferred & returned               = 3555
2. Transferred & died                   = 1974
3. Discharged & returned < 30 days      = 3205
4. Discharged & died < 30 days          = 2556

In [None]:
print(np.sum(dataset4['TRANSFERRED_RETURNED']==1))
print(np.sum(dataset4['TRANSFERRED_DEATH']==1))
print(np.sum(dataset4['DISCHARGED_RETURNED']==1))
print(np.sum(dataset4['DISCHARGED_DEATH']==1))

4795
2485
3204
2249
