# 24 hour most recent labels

This notebook generates labels based on the most recent level of care a patient receives before the 24-hour mark after admission.

A patient has label
* 1 if their most recent ADT event within 24 hours of inpatient admission places them in critical care
* 0 otherwise
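
The rule above can be sketched on toy data. This is only an illustration: the column names mirror those used later in this notebook, while `toy_adt`, its values, and the "Acute Care" level are made up for the example.

```python
import pandas as pd

# Hypothetical toy ADT events for two encounters (all values are made up)
toy_adt = pd.DataFrame({
    "pat_enc_csn_id_coded": [1, 1, 1, 2, 2],
    "time_since_admit": pd.to_timedelta([1, 10, 30, 2, 20], unit="h"),
    "pat_lv_of_care": ["Acute Care", "Critical Care", "Acute Care",
                       "Critical Care", "Acute Care"],
})

# keep only events between admit and the 24-hour mark (inclusive)
in_window = toy_adt[
    (toy_adt.time_since_admit >= pd.Timedelta(hours=0))
    & (toy_adt.time_since_admit <= pd.Timedelta(hours=24))
]

# take the last event per encounter inside the window
toy_last = (
    in_window.sort_values(["pat_enc_csn_id_coded", "time_since_admit"])
    .groupby("pat_enc_csn_id_coded")
    .tail(1)
    .copy()
)

# label is 1 only when that last event is in critical care
toy_last["24hr_recent_label"] = (toy_last.pat_lv_of_care == "Critical Care").astype(int)
toy_labels = toy_last.set_index("pat_enc_csn_id_coded")["24hr_recent_label"].to_dict()
print(toy_labels)
```

Encounter 1 is labeled 1 because its 30-hour event falls outside the window, leaving the 10-hour critical care event as the most recent one. Encounter 2 is labeled 0 because it has left critical care by its last event in the window.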



In [None]:
import pandas as pd
import numpy as np
from datetime import datetime
from datetime import timedelta
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
%load_ext google.cloud.bigquery

### Download adjusted_cohort table from BQ

In [None]:
%%bigquery triage_cohort_adjusted
select *
from traige_TE.triage_cohort_adjusted

In [None]:
triage_cohort_adjusted.to_csv("triage_cohort_adjusted.csv", index=False)

### Get the labels for different windows

We need to use the ADT table to get the labels for the different CSNs since we need to know the trajectories.
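
As a rough illustration of what a trajectory means here, the sketch below orders toy ADT events within each CSN and collects the levels of care into a list. The data and the "Acute Care" level are made up; only the column names are taken from this notebook.

```python
import pandas as pd

# Hypothetical toy ADT rows, deliberately out of order
toy_events = pd.DataFrame({
    "pat_enc_csn_id_coded": [2, 1, 1, 2],
    "seq_num_in_enc": [2, 2, 1, 1],
    "pat_lv_of_care": ["Critical Care", "Acute Care", "Critical Care", "Acute Care"],
})

# order events within each encounter, then collect the levels of care in sequence
trajectories = (
    toy_events.sort_values(["pat_enc_csn_id_coded", "seq_num_in_enc"])
    .groupby("pat_enc_csn_id_coded")["pat_lv_of_care"]
    .agg(list)
    .to_dict()
)
print(trajectories)
```

The cohort table has one row per encounter, so it cannot show this ordering on its own.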

In [None]:
triage_cohort_adjusted = pd.read_csv("triage_cohort_adjusted.csv")

In [None]:
triage_cohort_adjusted

In [None]:
# load the ADT events for the adjusted cohort

# read from the CSV saved on a previous run
adj_cohort_adt_file = "adjusted_cohort_adt.csv"
adjusted_cohort_adt = pd.read_csv(adj_cohort_adt_file)

# change the effective time to datetime since read in from csv
adjusted_cohort_adt.effective_time_jittered_utc = pd.to_datetime(adjusted_cohort_adt.effective_time_jittered_utc)

adjusted_cohort_adt.sort_values(['pat_enc_csn_id_coded', 'seq_num_in_enc'], inplace=True)

# use this to hide ID columns from view
hidecols = ['jc_uid', 'pat_enc_csn_id_coded']

# join the adt table with the adjusted cohort table
joined_adjusted_cohort_adt = triage_cohort_adjusted.merge(adjusted_cohort_adt, 
                                                          on = ['jc_uid', 'pat_enc_csn_id_coded'])
joined_adjusted_cohort_adt.admit_time = pd.to_datetime(joined_adjusted_cohort_adt.admit_time)
joined_adjusted_cohort_adt.head()


In [None]:
# check that the join kept every encounter: the two counts should match
print(adjusted_cohort_adt.pat_enc_csn_id_coded.nunique())
print(joined_adjusted_cohort_adt.pat_enc_csn_id_coded.nunique())
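
If the two counts differ, one possible way to find the missing encounters is a left merge with pandas' `indicator=True` option, which adds a `_merge` column marking rows that have no match. This is a sketch on made-up data; `toy_cohort` and `toy_adt_events` are hypothetical stand-ins for the real tables.

```python
import pandas as pd

# Hypothetical stand-ins for the cohort and ADT tables
toy_cohort = pd.DataFrame({"jc_uid": ["a", "b"], "pat_enc_csn_id_coded": [1, 2]})
toy_adt_events = pd.DataFrame({
    "jc_uid": ["a", "a", "b", "c"],
    "pat_enc_csn_id_coded": [1, 1, 2, 3],
    "seq_num_in_enc": [1, 2, 1, 1],
})

# left merge from the ADT side; "_merge" is "left_only" where the cohort has no match
checked = toy_adt_events.merge(
    toy_cohort, on=["jc_uid", "pat_enc_csn_id_coded"], how="left", indicator=True
)
unmatched_csns = (
    checked.loc[checked["_merge"] == "left_only", "pat_enc_csn_id_coded"]
    .unique()
    .tolist()
)
print(unmatched_csns)
```

Swapping the two frames in the merge would instead list cohort encounters that have no ADT events.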

In [None]:
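# compute the time elapsed between admission and each ADT event
# (vectorized subtraction gives the same result as a row-wise apply, but faster)
joined_adjusted_cohort_adt['time_since_admit'] = (
    joined_adjusted_cohort_adt.effective_time_jittered_utc - joined_adjusted_cohort_adt.admit_time)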
joined_adjusted_cohort_adt.head()

In [None]:
# remove all events that occur before admit or more than 24 hours after admit
keep_adt = joined_adjusted_cohort_adt[(joined_adjusted_cohort_adt.time_since_admit 
                                       <= timedelta(hours=24))
                                     & (joined_adjusted_cohort_adt.time_since_admit 
                                       >= timedelta(hours=0))]
print(keep_adt.time_since_admit.describe())
keep_adt.head()

In [None]:
# sort the dataframe
sorted_adt = keep_adt.sort_values(by = ['jc_uid', 'pat_enc_csn_id_coded', 'time_since_admit'])
sorted_adt

# group by encounter and keep only the last time
last_adt = sorted_adt.groupby('pat_enc_csn_id_coded').tail(1)
last_adt.head()

## Look here

One thing to watch out for: when we take the last event that occurs before the 24-hour mark, the minimum time since admission is 5 minutes. This patient probably went straight to critical care from the ER and had no further events in the window. That may not reflect what we are trying to capture with this label.

The next few code blocks look into this 5-minute encounter.
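
A possible way to spot such cases in bulk, sketched on made-up data, is to summarize each encounter's events inside the window and flag those whose last event falls within the first hour. The one-hour cutoff and the `toy_window` frame are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical events already restricted to the 0-24 hour window
toy_window = pd.DataFrame({
    "pat_enc_csn_id_coded": [1, 2, 2, 3, 3, 3],
    "time_since_admit": pd.to_timedelta([5, 30, 90, 10, 200, 600], unit="min"),
})

# per encounter: number of events in the window and time of the last one
summary = toy_window.groupby("pat_enc_csn_id_coded")["time_since_admit"].agg(
    n_events="count", last_event="max"
)

# flag encounters whose last in-window event is less than one hour after admit
summary["early_last_event"] = summary["last_event"] < pd.Timedelta(hours=1)
early_csns = summary.index[summary["early_last_event"]].tolist()
print(summary)
```

Encounters flagged this way have labels that reflect only the first moments of the stay, so they may deserve a manual look.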

In [None]:
# we now have one row per encounter: the last ADT event before the 24-hour mark
print(last_adt.pat_enc_csn_id_coded.nunique())
print(last_adt.shape)

last_adt.time_since_admit.describe()

In [None]:
last_adt[last_adt.time_since_admit < timedelta(hours=1)].sort_values('time_since_admit')

In [None]:
adjusted_cohort_adt[adjusted_cohort_adt.pat_enc_csn_id_coded == 131110146103]

## Continue with labels

In [None]:
# look at the level of care assignments across all individuals
last_adt.pat_lv_of_care.value_counts()

In [None]:
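# copy first so the new column is not written to a slice of sorted_adt (avoids SettingWithCopyWarning)
last_adt = last_adt.copy()
last_adt['24hr_recent_label'] = (last_adt.pat_lv_of_care == 'Critical Care').astype(int)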
last_adt.head()

In [None]:
# grab relevant columns
labels = last_adt[['jc_uid', 'pat_enc_csn_id_coded', 'inpatient_data_id_coded', 'admit_time', '24hr_recent_label']]
labels

In [None]:
# save the data
labels.to_csv("adjusted_cohort_24hr_recent_labels.csv", index=False)