# Prediction of Hospital Readmissions

This notebook's goal is to reproduce the claims from Zebin and Chaussalet's paper, 'Design and implementation of a deep recurrent model for prediction of readmission in urgent care using electronic health records'.



## Claims

When predicting ICU readmissions:

1. LSTM+CNN produced higher accuracy than logistic regression, random forest, and SVM.
2. LSTM+CNN produced higher precision than logistic regression, random forest, and SVM.
3. LSTM+CNN produced higher recall than logistic regression and SVM.
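The three claims compare accuracy, precision, and recall. For reference, here is a minimal NumPy sketch of how these metrics follow from the confusion-matrix counts; the helper `binary_metrics` and the labels are hypothetical toy values, not results from the paper or from this notebook.

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, and recall for 0/1 labels (positive class = 1)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))  # readmissions correctly predicted
    fp = np.sum((y_true == 0) & (y_pred == 1))  # false alarms
    fn = np.sum((y_true == 1) & (y_pred == 0))  # missed readmissions
    accuracy = np.mean(y_true == y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return float(accuracy), float(precision), float(recall)

# Hypothetical toy labels: 1 = readmitted, 0 = not readmitted
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 1, 1, 0]
acc, prec, rec = binary_metrics(y_true, y_pred)
print(acc, prec, rec)  # 0.625 0.6 0.75
```

A model can win on precision but lose on recall (or vice versa), which is why the claims list each metric separately.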

In [1]:
import numpy as np
import pandas as pd

In [2]:
patients = pd.read_csv('./mimic-iii/PATIENTS.csv')
patients.head(1)

Unnamed: 0,ROW_ID,SUBJECT_ID,GENDER,DOB,DOD,DOD_HOSP,DOD_SSN,EXPIRE_FLAG
0,234,249,F,2075-03-13 00:00:00,,,,0


In [3]:
admissions = pd.read_csv('./mimic-iii/ADMISSIONS.csv')
admissions.head(1)

Unnamed: 0,ROW_ID,SUBJECT_ID,HADM_ID,ADMITTIME,DISCHTIME,DEATHTIME,ADMISSION_TYPE,ADMISSION_LOCATION,DISCHARGE_LOCATION,INSURANCE,LANGUAGE,RELIGION,MARITAL_STATUS,ETHNICITY,EDREGTIME,EDOUTTIME,DIAGNOSIS,HOSPITAL_EXPIRE_FLAG,HAS_CHARTEVENTS_DATA
0,21,22,165315,2196-04-09 12:26:00,2196-04-10 15:54:00,,EMERGENCY,EMERGENCY ROOM ADMIT,DISC-TRAN CANCER/CHLDRN H,Private,,UNOBTAINABLE,MARRIED,WHITE,2196-04-09 10:06:00,2196-04-09 13:24:00,BENZODIAZEPINE OVERDOSE,0,1


In [4]:
transfers = pd.read_csv('./mimic-iii/TRANSFERS.csv')
transfers.head(1)

Unnamed: 0,ROW_ID,SUBJECT_ID,HADM_ID,ICUSTAY_ID,DBSOURCE,EVENTTYPE,PREV_CAREUNIT,CURR_CAREUNIT,PREV_WARDID,CURR_WARDID,INTIME,OUTTIME,LOS
0,657,111,192123,254245.0,carevue,transfer,CCU,MICU,7.0,23.0,2142-04-29 15:27:11,2142-05-04 20:38:33,125.19


In [5]:
icustays = pd.read_csv('./mimic-iii/ICUSTAYS.csv')
icustays.head(1)


Unnamed: 0,ROW_ID,SUBJECT_ID,HADM_ID,ICUSTAY_ID,DBSOURCE,FIRST_CAREUNIT,LAST_CAREUNIT,FIRST_WARDID,LAST_WARDID,INTIME,OUTTIME,LOS
0,365,268,110404,280836,carevue,MICU,MICU,52,52,2198-02-14 23:27:38,2198-02-18 05:26:11,3.249


In [6]:
# print(len(icustays.groupby(['HADM_ID'])))
# test = (icustays.groupby(['HADM_ID']).size() > 1).reset_index()
# test[test[0]==True]

## NOTE: Organ Donors

In some cases a patient has inconsistent death times across duplicated records: either the records give two different death times, or the admit and discharge times do not align with the death time (for example, the discharge is recorded before the admission).
The organ-donor procurement appears to be recorded under a separate HADM_ID, and those entries are inconsistent.

In [7]:
# Admissions where the patient died in hospital
organ = admissions[admissions['HOSPITAL_EXPIRE_FLAG']==1].reset_index(drop=True)

organ[organ.duplicated('SUBJECT_ID',keep=False)].sort_values('SUBJECT_ID').head(2)

Unnamed: 0,ROW_ID,SUBJECT_ID,HADM_ID,ADMITTIME,DISCHTIME,DEATHTIME,ADMISSION_TYPE,ADMISSION_LOCATION,DISCHARGE_LOCATION,INSURANCE,LANGUAGE,RELIGION,MARITAL_STATUS,ETHNICITY,EDREGTIME,EDOUTTIME,DIAGNOSIS,HOSPITAL_EXPIRE_FLAG,HAS_CHARTEVENTS_DATA
43,533,417,178013,2177-03-22 22:24:00,2177-03-23 07:20:00,2177-03-23 07:20:00,EMERGENCY,EMERGENCY ROOM ADMIT,DEAD/EXPIRED,Private,,UNOBTAINABLE,MARRIED,WHITE,2177-03-22 22:01:00,2177-03-23 00:20:00,SUBARACHNOID HEMORRHAGE,1,1
44,534,417,102633,2177-03-23 16:17:00,2177-03-23 07:20:00,2177-03-23 07:20:00,URGENT,PHYS REFERRAL/NORMAL DELI,DEAD/EXPIRED,Private,,UNOBTAINABLE,MARRIED,WHITE,,,ORGAN DONOR ACCOUNT,1,1
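In the output above, the organ-donor row (HADM_ID 102633) has a `DISCHTIME` earlier than its `ADMITTIME`, and the subject appears twice among in-hospital deaths. A minimal sketch of flagging such records, assuming these two rules are enough to catch the inconsistent cases (they are heuristics, not a MIMIC-III convention); the toy frame copies the two rows for SUBJECT_ID 417.

```python
import pandas as pd

# Toy copy of the two ADMISSIONS rows shown above for SUBJECT_ID 417
adm = pd.DataFrame({
    'SUBJECT_ID': [417, 417],
    'HADM_ID': [178013, 102633],
    'ADMITTIME': pd.to_datetime(['2177-03-22 22:24:00', '2177-03-23 16:17:00']),
    'DISCHTIME': pd.to_datetime(['2177-03-23 07:20:00', '2177-03-23 07:20:00']),
    'DIAGNOSIS': ['SUBARACHNOID HEMORRHAGE', 'ORGAN DONOR ACCOUNT'],
    'HOSPITAL_EXPIRE_FLAG': [1, 1],
})

deaths = adm[adm['HOSPITAL_EXPIRE_FLAG'] == 1]
# Heuristic 1: discharge (= death) recorded before the admission started
time_reversed = deaths['DISCHTIME'] < deaths['ADMITTIME']
# Heuristic 2: the same subject "dies in hospital" under more than one HADM_ID
repeated_death = deaths.duplicated('SUBJECT_ID', keep=False)

suspect = deaths[time_reversed | repeated_death]
print(suspect[['HADM_ID', 'DIAGNOSIS']])
print(deaths.loc[time_reversed, 'HADM_ID'].tolist())  # [102633]
```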


## Example

Patient with `SUBJECT_ID = 250`

In [8]:
# patients[patients['SUBJECT_ID'] == 250]

In [9]:
# admissions[admissions['SUBJECT_ID'] == 250]

In [10]:
# transfers[transfers['SUBJECT_ID'] == 250]

## Create Dataset

Example SUBJECT_IDs: 291, 283, 250

In [11]:
patients['DOB'] = pd.to_datetime(patients['DOB'], errors='coerce')
patients['DOD'] = pd.to_datetime(patients['DOD'], errors='coerce')
patients['DOD_HOSP'] = pd.to_datetime(patients['DOD_HOSP'], errors='coerce')
patients['DOD_SSN'] = pd.to_datetime(patients['DOD_SSN'], errors='coerce')
patients = patients.drop('ROW_ID', axis=1)

In [12]:
admissions['ADMITTIME'] = pd.to_datetime(admissions['ADMITTIME'], errors='coerce')
admissions['DISCHTIME'] = pd.to_datetime(admissions['DISCHTIME'], errors='coerce')
admissions['DEATHTIME'] = pd.to_datetime(admissions['DEATHTIME'], errors='coerce')
admissions = admissions.drop('ROW_ID', axis=1)

In [13]:
transfers['INTIME'] = pd.to_datetime(transfers['INTIME'], errors='coerce')
transfers['OUTTIME'] = pd.to_datetime(transfers['OUTTIME'], errors='coerce')
transfers = transfers.drop('ROW_ID', axis=1)

In [14]:
icustays['INTIME'] = pd.to_datetime(icustays['INTIME'], errors='coerce')
icustays['OUTTIME'] = pd.to_datetime(icustays['OUTTIME'], errors='coerce')
icustays = icustays.drop('ROW_ID', axis=1)

## Goal: Find cases of readmission

The paper reports the following counts for each readmission category:

1. Transferred & returned               = 3555
2. Transferred & died                   = 1974
3. Discharged & returned < 30 days      = 3205
4. Discharged & died < 30 days          = 2556


In [15]:
admissions_type = admissions['ADMISSION_TYPE'].unique()
admissions_type

array(['EMERGENCY', 'ELECTIVE', 'NEWBORN', 'URGENT'], dtype=object)

### 1. Transferred & Returned

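```python
# 1 for every ICU stay after the first within the same hospital admission (HADM_ID)
icustays['TRANSFERRED_RETURNED'] = icustays.sort_values(['HADM_ID', 'INTIME']).duplicated(subset=['HADM_ID']).astype(int)
```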
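On a toy ICUSTAYS frame (hypothetical IDs and times), sorting by `INTIME` within each `HADM_ID` and calling `duplicated` flags every ICU stay after the first one in the same hospital admission.

```python
import pandas as pd

# Hypothetical ICU stays: admission 1 has three ICU stays, admission 2 has one
stays = pd.DataFrame({
    'HADM_ID': [1, 1, 2, 1],
    'ICUSTAY_ID': [12, 11, 21, 13],
    'INTIME': pd.to_datetime(['2100-01-05', '2100-01-01', '2100-02-01', '2100-01-09']),
})

# Chronological order within each admission, so the first stay is the true first
stays = stays.sort_values(['HADM_ID', 'INTIME']).reset_index(drop=True)
# duplicated() marks every row after the first occurrence of each HADM_ID
stays['TRANSFERRED_RETURNED'] = stays.duplicated(subset=['HADM_ID']).astype(int)
print(stays)
```

Because each extra stay is flagged, an admission with three ICU stays contributes 2 to the total, so this counts ICU returns rather than admissions.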

In [16]:
np.sum(icustays.duplicated(subset=['HADM_ID']).astype(int))

3746

In [17]:
# HADM_ID where patient has multiple ICUSTAY_ID
tnr = icustays[['SUBJECT_ID', 'HADM_ID', 'ICUSTAY_ID', 'INTIME']]
tnr = tnr.sort_values(['HADM_ID','INTIME'])
tnr['TRANSFERRED_RETURNED'] = (tnr.duplicated(subset=['HADM_ID'])).astype(int)
#print(tnr[tnr.duplicated(subset=['HADM_ID'],keep=False)].head(10))
tnr = tnr[tnr['TRANSFERRED_RETURNED'] == 1].reset_index(drop=True)
print(tnr.shape)
tnr.head()

(3746, 5)


Unnamed: 0,SUBJECT_ID,HADM_ID,ICUSTAY_ID,INTIME,TRANSFERRED_RETURNED
0,29971,100021,220579,2109-08-28 10:21:05,1
1,58947,100037,221136,2183-04-20 13:16:43,1
2,1549,100055,245659,2150-07-08 12:49:43,1
3,73946,100104,240176,2201-06-26 23:55:21,1
4,3278,100130,284185,2109-07-28 16:35:00,1


### 2. Transferred & Died

```python
# dataset: admissions merged with icustays (DEATH_IN_ICU is defined below)
dataset['TRANSFERRED_DEATH'] = ((dataset['HOSPITAL_EXPIRE_FLAG'] == 1) & (dataset['DEATH_IN_ICU'] == 0)).astype(int)
```

### Define death

```python
patients['EXPIRE_FLAG'] == 1
```

### Define death in hospital

```python
admissions['HOSPITAL_EXPIRE_FLAG'] == 1
```

### Define death in ICU (merge admissions & icustays)

```python
# dataset = pd.merge(admissions, icustays, how='inner', on=['SUBJECT_ID', 'HADM_ID'])
dataset['DEATH_IN_ICU'] = ((dataset['DEATHTIME'] >= dataset['INTIME']) & (dataset['DEATHTIME'] <= dataset['OUTTIME'])).astype(int)
```
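To make these definitions concrete, here is a toy merged admissions/ICU-stay frame (hypothetical values, one row per ICU stay, as an inner merge on `SUBJECT_ID` and `HADM_ID` would produce). A death inside the ICU window counts as death in ICU; an in-hospital death outside it counts as transferred & died.

```python
import pandas as pd

# Hypothetical result of merging admissions with icustays (one row per ICU stay)
merged = pd.DataFrame({
    'HADM_ID': [1, 2, 3],
    'HOSPITAL_EXPIRE_FLAG': [1, 1, 0],
    'DEATHTIME': pd.to_datetime(['2100-01-03', '2100-02-20', None]),
    'INTIME': pd.to_datetime(['2100-01-01', '2100-02-01', '2100-03-01']),
    'OUTTIME': pd.to_datetime(['2100-01-04', '2100-02-05', '2100-03-02']),
})

# Died while in the ICU: DEATHTIME within [INTIME, OUTTIME]; comparisons with NaT are False
merged['DEATH_IN_ICU'] = ((merged['DEATHTIME'] >= merged['INTIME'])
                          & (merged['DEATHTIME'] <= merged['OUTTIME'])).astype(int)
# Died in hospital but not in the ICU (element-wise & and ==; `and`/`not` fail on a Series)
merged['TRANSFERRED_DEATH'] = ((merged['HOSPITAL_EXPIRE_FLAG'] == 1)
                               & (merged['DEATH_IN_ICU'] == 0)).astype(int)
print(merged[['HADM_ID', 'DEATH_IN_ICU', 'TRANSFERRED_DEATH']])
```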

In [18]:
# (tnr.duplicated(subset=['HADM_ID'], keep='last')).astype(int)
icustays.shape

(61532, 11)

In [19]:
icustays.sort_values('INTIME').drop_duplicates('HADM_ID',keep='last').shape

(57786, 11)

In [20]:
# Admissions where the patient died in hospital, excluding organ-donor accounts
temp = admissions[admissions['HOSPITAL_EXPIRE_FLAG']==1].reset_index(drop=True)
donor_diagnoses = ['ORGAN DONOR ACCOUNT', 'DONOR ACCOUNT', 'ORGAN DONOR']
# One in-hospital death has a null DIAGNOSIS; it is dropped as well
temp = temp[~temp['DIAGNOSIS'].isin(donor_diagnoses) & temp['DIAGNOSIS'].notnull()].reset_index(drop=True)
temp.shape

(5813, 18)

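These in-hospital deaths (excluding organ-donor accounts) are the candidates for transferred & died, but the resulting count below (1298) does not match the paper's 1974. The paper's count appears to include duplicates.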

In [25]:
tnd = pd.merge(temp, icustays.sort_values('INTIME').drop_duplicates('HADM_ID',keep='last'), 'left' ,on=['SUBJECT_ID', 'HADM_ID'])
tnd['TRANSFERRED_DEATH'] = ((tnd['DEATHTIME'].notnull()) & ((tnd['DEATHTIME'] < tnd['INTIME']) | (tnd['DEATHTIME'] > tnd['OUTTIME']))).astype(int)
tnd = tnd[['SUBJECT_ID', 'HADM_ID', 'ADMITTIME', 'DISCHTIME', 'DEATHTIME','HOSPITAL_EXPIRE_FLAG', 'INTIME', 'OUTTIME', 'TRANSFERRED_DEATH']]
np.sum(tnd['TRANSFERRED_DEATH'])


1298

In [23]:
5813-4510

1303

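Sanity check: `HOSPITAL_EXPIRE_FLAG` agrees with `DEATHTIME`. No admission has `HOSPITAL_EXPIRE_FLAG == 0` while its `DEATHTIME` falls between `ADMITTIME` and `DISCHTIME`.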

In [85]:
admissions[ (admissions['HOSPITAL_EXPIRE_FLAG']==0) & (admissions['DEATHTIME'].notnull()) & (admissions['DEATHTIME'] >= admissions['ADMITTIME']) & (admissions['DEATHTIME'] <= admissions['DISCHTIME'])]

(0, 18)

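This is the code that the paper used, and it creates duplicates. The inner merge of `admissions` with `icustays` yields one row per ICU stay, so when an admission has several ICU stays and the death falls outside all of them, the same `HADM_ID` is flagged once per stay. The duplicated `HADM_ID`s are listed after the next cell.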

In [36]:
asdf = pd.merge(admissions, icustays, 'inner' ,on=['SUBJECT_ID', 'HADM_ID'])
asdf['TRANSFERRED_DEATH'] = ((asdf['DEATHTIME'].notnull()) & ((asdf['DEATHTIME'] < asdf['INTIME']) | (asdf['DEATHTIME'] > asdf['OUTTIME']))).astype(int)
asdf = asdf[['SUBJECT_ID', 'HADM_ID', 'ADMITTIME', 'DISCHTIME', 'DEATHTIME','HOSPITAL_EXPIRE_FLAG', 'INTIME', 'OUTTIME', 'TRANSFERRED_DEATH']]
np.sum(asdf['TRANSFERRED_DEATH'])

2096

In [37]:
dups = asdf[asdf['TRANSFERRED_DEATH']==1].sort_values('SUBJECT_ID')
dups[dups.duplicated('HADM_ID',keep=False)].head(20)

Unnamed: 0,SUBJECT_ID,HADM_ID,ADMITTIME,DISCHTIME,DEATHTIME,HOSPITAL_EXPIRE_FLAG,INTIME,OUTTIME,TRANSFERRED_DEATH
485,188,132401,2161-11-01 17:48:00,2162-01-17 05:50:00,2162-01-17 05:50:00,1,2161-12-09 17:03:04,2161-12-17 21:04:31,1
486,188,132401,2161-11-01 17:48:00,2162-01-17 05:50:00,2162-01-17 05:50:00,1,2162-01-10 16:36:40,2162-01-16 18:35:07,1
383,377,139824,2168-03-04 13:56:00,2168-04-04 10:47:00,2168-04-04 10:47:00,1,2168-03-04 13:57:00,2168-03-15 16:33:00,1
384,377,139824,2168-03-04 13:56:00,2168-04-04 10:47:00,2168-04-04 10:47:00,1,2168-03-21 13:13:00,2168-03-23 16:27:00,1
385,377,139824,2168-03-04 13:56:00,2168-04-04 10:47:00,2168-04-04 10:47:00,1,2168-03-26 08:03:00,2168-04-03 15:03:00,1
427,408,173910,2188-10-27 21:25:00,2189-01-11 20:19:00,2189-01-11 20:19:00,1,2188-10-27 21:27:14,2188-10-30 19:54:32,1
428,408,173910,2188-10-27 21:25:00,2189-01-11 20:19:00,2189-01-11 20:19:00,1,2188-11-19 12:30:24,2188-11-26 11:26:04,1
1257,766,183370,2178-03-03 23:18:00,2178-03-27 04:41:00,2178-03-27 04:41:00,1,2178-03-03 23:19:45,2178-03-05 11:40:21,1
1258,766,183370,2178-03-03 23:18:00,2178-03-27 04:41:00,2178-03-27 04:41:00,1,2178-03-07 14:17:44,2178-03-12 13:48:25,1
966,834,153730,2166-06-17 03:00:00,2166-09-12 11:30:00,2166-09-12 11:30:00,1,2166-08-19 06:00:00,2166-08-31 18:55:00,1
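The duplication can be reproduced on a toy frame (hypothetical values): one admission with two ICU stays and a death after both yields two merged rows, each flagged. Collapsing to one row per `HADM_ID` first, here by asking whether the death fell inside any of that admission's ICU stays, counts the admission once. This groupby-based collapse is one possible fix, not the paper's code.

```python
import pandas as pd

admissions_toy = pd.DataFrame({
    'SUBJECT_ID': [7], 'HADM_ID': [100],
    'DEATHTIME': pd.to_datetime(['2100-01-20']),
})
icustays_toy = pd.DataFrame({
    'SUBJECT_ID': [7, 7], 'HADM_ID': [100, 100],
    'INTIME': pd.to_datetime(['2100-01-01', '2100-01-10']),
    'OUTTIME': pd.to_datetime(['2100-01-03', '2100-01-15']),
})

# Paper-style: one merged row per ICU stay, flagged independently
merged = pd.merge(admissions_toy, icustays_toy, how='inner', on=['SUBJECT_ID', 'HADM_ID'])
outside = merged['DEATHTIME'].notnull() & ((merged['DEATHTIME'] < merged['INTIME'])
                                           | (merged['DEATHTIME'] > merged['OUTTIME']))
print(int(outside.sum()))  # 2: one admission counted once per ICU stay

# One row per admission: the death is "in ICU" if it falls inside any stay
merged['IN_STAY'] = (merged['DEATHTIME'] >= merged['INTIME']) & (merged['DEATHTIME'] <= merged['OUTTIME'])
per_adm = merged.groupby('HADM_ID').agg(DEATHTIME=('DEATHTIME', 'first'),
                                        DEATH_IN_ICU=('IN_STAY', 'any'))
per_adm['TRANSFERRED_DEATH'] = (per_adm['DEATHTIME'].notnull() & ~per_adm['DEATH_IN_ICU']).astype(int)
print(int(per_adm['TRANSFERRED_DEATH'].sum()))  # 1
```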


In [15]:
# Inner merge gives one row per ICU stay, so SUBJECT_ID and HADM_ID can repeat
tnd = pd.merge(admissions[['SUBJECT_ID', 'HADM_ID', 'ADMITTIME', 'DISCHTIME', 'DEATHTIME','HOSPITAL_EXPIRE_FLAG']], icustays[['SUBJECT_ID', 'HADM_ID', 'INTIME', 'OUTTIME']], 'inner',on=['SUBJECT_ID', 'HADM_ID'])
# HADM_ID where patient died in ICU
tnd['DEATH_IN_ICU'] = ((tnd['DEATHTIME'].notnull()) & (tnd['DEATHTIME'] >= tnd['INTIME']) & (tnd['DEATHTIME'] <= tnd['OUTTIME'])).astype(int)
tnd.head()

Unnamed: 0,SUBJECT_ID,HADM_ID,ADMITTIME,DISCHTIME,DEATHTIME,HOSPITAL_EXPIRE_FLAG,INTIME,OUTTIME,DEATH_IN_ICU
0,22,165315,2196-04-09 12:26:00,2196-04-10 15:54:00,NaT,0,2196-04-09 12:27:00,2196-04-10 15:54:00,0
1,23,152223,2153-09-03 07:15:00,2153-09-08 19:10:00,NaT,0,2153-09-03 09:38:55,2153-09-04 15:59:11,0
2,23,124321,2157-10-18 19:34:00,2157-10-25 14:00:00,NaT,0,2157-10-21 11:40:38,2157-10-22 16:08:48,0
3,24,161859,2139-06-06 16:14:00,2139-06-09 12:48:00,NaT,0,2139-06-06 16:15:36,2139-06-07 04:33:25,0
4,25,129635,2160-11-02 02:06:00,2160-11-05 14:55:00,NaT,0,2160-11-02 03:16:23,2160-11-05 16:23:27,0


In [18]:
# HADM_ID where patient died outside ICU, but still in hospital
tnd['TRANSFERRED_DEATH'] = ((tnd['HOSPITAL_EXPIRE_FLAG']==1) & (tnd['DEATH_IN_ICU']==0)).astype(int)
tnd = (tnd[tnd['TRANSFERRED_DEATH'] == 1]).reset_index(drop=True)
tnd.head()

Unnamed: 0,SUBJECT_ID,HADM_ID,ADMITTIME,DISCHTIME,DEATHTIME,HOSPITAL_EXPIRE_FLAG,INTIME,OUTTIME,DEATH_IN_ICU,TRANSFERRED_DEATH
0,56,181711,2104-01-02 02:01:00,2104-01-08 10:30:00,2104-01-08 10:30:00,1,2104-01-02 02:02:39,2104-01-03 22:25:29,0,1
1,61,189535,2119-01-04 18:12:00,2119-02-03 01:35:00,2119-02-03 01:35:00,1,2119-01-20 15:58:00,2119-01-22 16:11:10,0,1
2,91,121205,2177-04-23 00:08:00,2177-05-10 15:16:00,2177-05-10 15:16:00,1,2177-04-27 02:08:00,2177-04-27 14:03:00,0,1
3,101,175533,2196-09-26 18:36:00,2196-10-12 13:17:00,2196-10-12 13:17:00,1,2196-09-26 18:37:40,2196-10-06 16:01:56,0,1
4,103,133550,2144-08-30 23:09:00,2144-09-01 14:28:00,2144-09-01 14:28:00,1,2144-08-30 23:12:13,2144-08-31 17:32:16,0,1


### 3. Discharged & Returned < 30

This is the wrong way: it labels the return visit instead of the discharged visit that preceded it.

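```python
# Sort each patient's admissions chronologically
admissions = admissions.sort_values(['SUBJECT_ID', 'ADMITTIME']).reset_index(drop=True)
# Previous admission in row order (shift down by one row)
shifted = admissions.shift(1)
# WRONG: the flag lands on the return admission, not on the discharge that preceded it
discharged_returned_wrong = ((shifted['SUBJECT_ID'] == admissions['SUBJECT_ID']) & (admissions['ADMITTIME'] - shifted['DISCHTIME'] <= np.timedelta64(30, 'D'))).astype(int)
```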
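The difference between the two shift directions is easiest to see on a toy patient with two admissions ten days apart (hypothetical dates). `shift(1)` compares each row with the previous admission, so the flag lands on the return visit; `shift(-1)` compares each row with the next admission, so the flag lands on the discharge that was followed by a readmission, which is the approach the next code cell takes.

```python
import numpy as np
import pandas as pd

adm = pd.DataFrame({
    'SUBJECT_ID': [5, 5],
    'HADM_ID': [1, 2],
    'ADMITTIME': pd.to_datetime(['2100-01-01', '2100-01-20']),
    'DISCHTIME': pd.to_datetime(['2100-01-10', '2100-01-25']),
}).sort_values(['SUBJECT_ID', 'ADMITTIME']).reset_index(drop=True)

window = np.timedelta64(30, 'D')

prev = adm.shift(1)   # previous admission (NaN/NaT in the first row)
flag_return_visit = ((prev['SUBJECT_ID'] == adm['SUBJECT_ID'])
                     & (adm['ADMITTIME'] - prev['DISCHTIME'] <= window)).astype(int)

nxt = adm.shift(-1)   # next admission (NaN/NaT in the last row)
flag_discharge_visit = ((nxt['SUBJECT_ID'] == adm['SUBJECT_ID'])
                        & (nxt['ADMITTIME'] - adm['DISCHTIME'] <= window)).astype(int)

print(flag_return_visit.tolist())     # [0, 1] -> labels HADM_ID 2 (the return visit)
print(flag_discharge_visit.tolist())  # [1, 0] -> labels HADM_ID 1 (the discharged visit)
```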

In [19]:
dnr = admissions[['SUBJECT_ID', 'HADM_ID', 'ADMITTIME', 'DISCHTIME']].copy()
dnr.sort_values(by=['SUBJECT_ID', 'DISCHTIME'], ignore_index=True, inplace=True)
shifted = (dnr.copy()).shift(-1)
dnr['ADMITTIME'] = pd.to_datetime(dnr['ADMITTIME'], errors='coerce')
dnr['DISCHTIME'] = pd.to_datetime(dnr['DISCHTIME'], errors='coerce')
shifted['ADMITTIME'] = pd.to_datetime(shifted['ADMITTIME'], errors='coerce')
shifted['DISCHTIME'] = pd.to_datetime(shifted['DISCHTIME'], errors='coerce')
dnr['DISCHARGED_RETURNED'] = ((shifted['SUBJECT_ID'] == dnr['SUBJECT_ID']) & (shifted['HADM_ID'] != dnr['HADM_ID']) & (shifted['ADMITTIME'] - dnr['DISCHTIME'] <= np.timedelta64(30, 'D'))).astype(int)
dnr = dnr[dnr['DISCHARGED_RETURNED'] == 1].reset_index(drop=True)
dnr.head()

Unnamed: 0,SUBJECT_ID,HADM_ID,ADMITTIME,DISCHTIME,DISCHARGED_RETURNED
0,36,182104,2131-04-30 07:15:00,2131-05-08 14:00:00,1
1,68,170467,2173-12-15 16:16:00,2174-01-03 18:30:00,1
2,103,130744,2144-08-12 17:37:00,2144-08-20 11:15:00,1
3,105,161160,2189-01-28 16:57:00,2189-02-02 16:40:00,1
4,109,164029,2140-01-19 13:25:00,2140-01-21 13:25:00,1


### 4. Discharged & Died < 30 (merge patients, admissions)

The label belongs on the patient's last discharged visit before death, so the HADM_ID of that visit is needed.

```python
# dataset = pd.merge(patients, admissions, how='inner', on='SUBJECT_ID')
dataset['DISCHARGED_DEATH'] = ((dataset['EXPIRE_FLAG'] == 1) & (dataset['HOSPITAL_EXPIRE_FLAG'] == 0) & (dataset['DOD'] - dataset['DISCHTIME'] <= np.timedelta64(30, 'D'))).astype(int)
```
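A minimal sketch of keeping only the last discharge before death, assuming a merged patients/admissions frame with the columns used above (toy values here). In this hypothetical patient both discharges fall within 30 days of death, so the unfiltered rule flags both; sorting by `DISCHTIME` and keeping the last admission per `SUBJECT_ID` leaves one. Using `drop_duplicates(keep='last')` to pick that visit is an assumption, not the paper's documented method.

```python
import numpy as np
import pandas as pd

# Hypothetical patient who died on 2100-03-05 after two hospital stays
merged = pd.DataFrame({
    'SUBJECT_ID': [9, 9],
    'HADM_ID': [1, 2],
    'DOD': pd.to_datetime(['2100-03-05', '2100-03-05']),
    'EXPIRE_FLAG': [1, 1],
    'DISCHTIME': pd.to_datetime(['2100-02-10', '2100-03-01']),
    'HOSPITAL_EXPIRE_FLAG': [0, 0],
})
window = np.timedelta64(30, 'D')

# Without filtering, every discharge within 30 days of death is flagged
unfiltered = int((merged['DOD'] - merged['DISCHTIME'] <= window).sum())

# Keep only each subject's most recent discharge, then apply the 30-day rule
last = (merged.sort_values(['SUBJECT_ID', 'DISCHTIME'])
              .drop_duplicates('SUBJECT_ID', keep='last')
              .reset_index(drop=True))
last['DISCHARGED_DEATH'] = ((last['EXPIRE_FLAG'] == 1)
                            & (last['HOSPITAL_EXPIRE_FLAG'] == 0)
                            & (last['DOD'] - last['DISCHTIME'] <= window)).astype(int)
print(unfiltered)  # 2
print(last[['SUBJECT_ID', 'HADM_ID', 'DISCHARGED_DEATH']])
```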

In [20]:
dnd = pd.merge(patients[['SUBJECT_ID', 'DOD', 'EXPIRE_FLAG']],admissions[['SUBJECT_ID', 'HADM_ID','DISCHTIME','HOSPITAL_EXPIRE_FLAG']],'inner',on='SUBJECT_ID')
dnd['DISCHARGED_DEATH'] = ((dnd['EXPIRE_FLAG']==1) & (dnd['HOSPITAL_EXPIRE_FLAG']==0) & (dnd['DOD'] - dnd['DISCHTIME'] <= np.timedelta64(30, 'D')) ).astype(int)
dnd = dnd[dnd['DISCHARGED_DEATH'] == 1].reset_index(drop=True)
dnd.head()

Unnamed: 0,SUBJECT_ID,DOD,EXPIRE_FLAG,HADM_ID,DISCHTIME,HOSPITAL_EXPIRE_FLAG,DISCHARGED_DEATH
0,668,2183-07-10,1,166245,2183-07-07 16:30:00,0,1
1,670,2161-02-15,1,176690,2161-02-13 15:45:00,0,1
2,700,2115-05-04,1,199309,2115-05-02 16:30:00,0,1
3,711,2185-05-26,1,158767,2185-05-16 17:10:00,0,1
4,735,2128-07-04,1,140547,2128-06-10 17:30:00,0,1


In [21]:
admissions.head(1)

Unnamed: 0,SUBJECT_ID,HADM_ID,ADMITTIME,DISCHTIME,DEATHTIME,ADMISSION_TYPE,ADMISSION_LOCATION,DISCHARGE_LOCATION,INSURANCE,LANGUAGE,RELIGION,MARITAL_STATUS,ETHNICITY,EDREGTIME,EDOUTTIME,DIAGNOSIS,HOSPITAL_EXPIRE_FLAG,HAS_CHARTEVENTS_DATA
0,22,165315,2196-04-09 12:26:00,2196-04-10 15:54:00,NaT,EMERGENCY,EMERGENCY ROOM ADMIT,DISC-TRAN CANCER/CHLDRN H,Private,,UNOBTAINABLE,MARRIED,WHITE,2196-04-09 10:06:00,2196-04-09 13:24:00,BENZODIAZEPINE OVERDOSE,0,1


## Merge

In [22]:
dataset4 = admissions[['SUBJECT_ID', 'HADM_ID']].merge(tnr[['SUBJECT_ID', 'HADM_ID', 'ICUSTAY_ID', 'TRANSFERRED_RETURNED']], how='left', on=['SUBJECT_ID', 'HADM_ID'])

dataset4 = dataset4.merge(tnd[['SUBJECT_ID', 'HADM_ID','TRANSFERRED_DEATH']], how='left', on=['SUBJECT_ID', 'HADM_ID'])
dataset4 = dataset4.merge(dnr[['SUBJECT_ID', 'HADM_ID','DISCHARGED_RETURNED']], how='left', on=['SUBJECT_ID', 'HADM_ID'])
# Note: merging on SUBJECT_ID alone copies DISCHARGED_DEATH onto every admission of that patient
dataset4 = dataset4.merge(dnd[['SUBJECT_ID', 'DISCHARGED_DEATH']], how='left', on=['SUBJECT_ID'])
dataset4 = dataset4[(dataset4['TRANSFERRED_RETURNED'] == 1) | (dataset4['TRANSFERRED_DEATH'] == 1) | (dataset4['DISCHARGED_RETURNED'] == 1) | (dataset4['DISCHARGED_DEATH'] == 1)]
# dataset4 = dataset4.drop_duplicates(subset='HADM_ID')

In [23]:
# dataset4 = dataset4.drop_duplicates(subset='HADM_ID')
flag_cols = ['TRANSFERRED_RETURNED', 'TRANSFERRED_DEATH', 'DISCHARGED_RETURNED', 'DISCHARGED_DEATH']
# A missing flag (no match in the left merges) means the event did not occur
dataset4[flag_cols] = dataset4[flag_cols].fillna(0).astype(int)
dataset4.head()

Unnamed: 0,SUBJECT_ID,HADM_ID,ICUSTAY_ID,TRANSFERRED_RETURNED,TRANSFERRED_DEATH,DISCHARGED_RETURNED,DISCHARGED_DEATH
15,36,182104,,0,0,1,0
21,41,101757,237024.0,1,0,0,0
23,357,145674,,0,0,1,0
27,358,110872,244658.0,1,0,0,0
31,361,148959,257948.0,1,0,0,0
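The merge key matters for these per-admission flags. A toy sketch (hypothetical IDs): a left merge on `SUBJECT_ID` alone copies a flag from one admission onto every admission of that patient, whereas merging on `SUBJECT_ID` and `HADM_ID` keeps it on the admission it belongs to.

```python
import pandas as pd

# Hypothetical: subject 3 has two admissions; only HADM_ID 31 carries the flag
base = pd.DataFrame({'SUBJECT_ID': [3, 3], 'HADM_ID': [30, 31]})
flags = pd.DataFrame({'SUBJECT_ID': [3], 'HADM_ID': [31], 'DISCHARGED_DEATH': [1]})

by_subject = base.merge(flags[['SUBJECT_ID', 'DISCHARGED_DEATH']], how='left', on='SUBJECT_ID')
by_admission = base.merge(flags, how='left', on=['SUBJECT_ID', 'HADM_ID'])

# fillna(0) turns "no match" into 0, as done for the flag columns above
print(by_subject['DISCHARGED_DEATH'].fillna(0).astype(int).tolist())    # [1, 1]
print(by_admission['DISCHARGED_DEATH'].fillna(0).astype(int).tolist())  # [0, 1]
```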


## Age

In [24]:
remove_age = pd.merge(admissions[['SUBJECT_ID', 'HADM_ID', 'ADMITTIME']], patients[['SUBJECT_ID', 'DOB']], how='inner', on='SUBJECT_ID')

In [25]:
remove_age['AGE'] = ((pd.to_datetime(remove_age['ADMITTIME']).dt.date - pd.to_datetime(remove_age['DOB']).dt.date) / np.timedelta64(1, 'Y')).astype(int)

In [26]:
remove_age['UNDER_18'] = (remove_age['AGE'] < 18).astype(int)


In [27]:
dataset4 = pd.merge(dataset4, remove_age[['HADM_ID', 'AGE']], 'inner',on='HADM_ID')

In [28]:
dataset4 = dataset4[dataset4['AGE'] >= 18]
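An alternative sketch computes age from calendar fields instead of dividing a timedelta by a year unit. This avoids both the ambiguous length of a "year" and subtracting timestamps centuries apart, which can overflow pandas' nanosecond-resolution timedeltas. In MIMIC-III, patients older than 89 have their date of birth shifted to 300 years before their first admission, so their apparent age is around 300; the age-18 filter is unaffected, but those ages should not be read literally. The helper name `age_in_years` is hypothetical.

```python
import pandas as pd

def age_in_years(admit: pd.Series, dob: pd.Series) -> pd.Series:
    """Whole years between dob and admit, counted by calendar birthdays."""
    admit = pd.to_datetime(admit)
    dob = pd.to_datetime(dob)
    years = admit.dt.year - dob.dt.year
    # Subtract one year if the birthday has not yet occurred in the admission year
    before_birthday = (admit.dt.month < dob.dt.month) | (
        (admit.dt.month == dob.dt.month) & (admit.dt.day < dob.dt.day))
    return (years - before_birthday.astype(int)).astype(int)

# Hypothetical rows: one day before and on the 18th birthday, plus a shifted (>89) DOB
admit = pd.Series(['2150-06-14', '2150-06-15', '2150-06-15'])
dob = pd.Series(['2132-06-15', '2132-06-15', '1850-01-01'])
ages = age_in_years(admit, dob)
print(ages.tolist())  # [17, 18, 300]
```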

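## Results: readmission counts

Counts reported in the paper; the next cell prints this notebook's counts in the same order (adults only):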

1. Transferred & returned               = 3555
2. Transferred & died                   = 1974
3. Discharged & returned < 30 days      = 3205
4. Discharged & died < 30 days          = 2556

In [29]:
print(np.sum(dataset4['TRANSFERRED_RETURNED']))
print(np.sum(dataset4['TRANSFERRED_DEATH']))
print(np.sum(dataset4['DISCHARGED_RETURNED']))
print(np.sum(dataset4['DISCHARGED_DEATH']))

4227
2498
3366
4253
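Putting the printed counts next to the paper's numbers makes the gaps explicit. This sketch only restates the eight figures above; it does not explain the differences.

```python
# Counts from the paper vs. this notebook (copied from the list and output above)
paper = {'Transferred & returned': 3555, 'Transferred & died': 1974,
         'Discharged & returned < 30 days': 3205, 'Discharged & died < 30 days': 2556}
ours = {'Transferred & returned': 4227, 'Transferred & died': 2498,
        'Discharged & returned < 30 days': 3366, 'Discharged & died < 30 days': 4253}

diffs = {name: ours[name] - paper[name] for name in paper}
for name, expected in paper.items():
    diff = diffs[name]
    print(f"{name:32s} paper={expected:5d} ours={ours[name]:5d} "
          f"diff={diff:+5d} ({diff / expected:+.1%})")
```

The largest relative gap is in discharged & died (about +66%), the category whose flag is merged on `SUBJECT_ID` only.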
