# Decide how to compute valid prediction times.
The current approach has holes, so it needs rethinking. The best approach is probably to build prediction times from the stays, which are derived from the daily census and the transfer logs. The point is to have an explicit model of when predictions are made, so we can be careful about constructing our problem.

* First, assume the daily census records are produced by an automated process that runs just before midnight each day. This seems to be consistent with the transfer logs (it is harder to say what is going on at the other end).
* Second, specify an explicit <code>time_of_day</code> at which predictions are run each day.
* Valid prediction times are those for which two conditions hold: the patient was in the census from the prior night, and the patient was not transferred out before <code>time_of_day</code>.
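
As a minimal sketch, the two conditions can be written as a predicate on a single candidate day. The function name, its arguments, and the use of plain `datetime` objects are assumptions made for illustration; none of this is the code in `edge.utils`. It also assumes `transfer_out` is the transfer that ends the patient's current stay, if there is one.

```python
from datetime import date, datetime, time, timedelta
from typing import Optional, Set


def is_valid_prediction_time(
    day: date,
    time_of_day: time,
    census_dates: Set[date],
    transfer_out: Optional[datetime],
) -> bool:
    """Hypothetical check of the two validity conditions for one day."""
    # Condition 1: the patient appears in the census taken the prior night.
    in_prior_census = (day - timedelta(days=1)) in census_dates
    # Condition 2: the patient has not been transferred out before the
    # prediction is run at time_of_day on this day.
    prediction_ts = datetime.combine(day, time_of_day)
    left_early = transfer_out is not None and transfer_out < prediction_ts
    return in_prior_census and not left_early
```

For a patient in the census on 2019-11-23 and 2019-11-24 and transferred out at 01:36 on 2019-11-25, a 07:00 prediction is valid on 2019-11-24 but not on 2019-11-25.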

This suggests that the easiest way to build prediction times is from our stays: 

* As before, use <code>patient_stays.get_patient_stays</code> to get patient stays.  This just splits continuous runs of daily census records using transfer log events.  
* Throw out the first day of each stay, since the patient was not in the census the night before.
* If there is a transfer timestamp for the end of the stay, check whether it is before <code>time_of_day</code>. If so, the last day is not a valid prediction time, so throw it out. Otherwise, include it.

This is best done constructively, from stays, rather than by filtering proposed prediction times built from the census, as we were doing previously.
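
The constructive steps above could look roughly like the following for a single stay. This is a sketch under stated assumptions, not the implementation of `utils.get_prediction_timestamps`: the helper name and signature are hypothetical, the arguments stand in for the `StartDate`, `EndDate`, and `DateOfTransfer` columns of a stay row, and the transfer is assumed to fall on the last day of the stay.

```python
from datetime import date, datetime, time, timedelta
from typing import List, Optional


def prediction_times_for_stay(
    start_date: date,
    end_date: date,
    transfer_ts: Optional[datetime],
    time_of_day: time,
) -> List[datetime]:
    """Hypothetical helper: daily prediction timestamps for one stay."""
    times: List[datetime] = []
    # Skip the first day of the stay: start from the day after start_date.
    day = start_date + timedelta(days=1)
    while day <= end_date:
        times.append(datetime.combine(day, time_of_day))
        day += timedelta(days=1)
    # Drop the last day if the patient was transferred out before the
    # prediction would have been run on that day.
    if times and transfer_ts is not None and transfer_ts < times[-1]:
        times.pop()
    return times
```

With `time_of_day` set to 07:00, a stay from 2019-11-22 to 2019-11-25 that ends with a transfer at 01:36 yields prediction times on 2019-11-23 and 2019-11-24 only. A stay that starts and ends on the same day yields none.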

In [1]:
import os
import sys
import datetime
import pandas as pd
import pickle as pkl
import numpy as np
import scipy

sys.path.append('/code')
from edge import data
from edge import patient_stays
from edge import diagnosis
from edge import meds
from edge import vitals
from edge import utils

%load_ext autoreload
%autoreload 2

In [2]:
data_dict = data.load_raw_data_from_files('/data/raw', prefix='infinity')

Loading demographics data from /data/raw...
Loading census data from /data/raw...
Loading transfers data from /data/raw...
Loading diagnoses data from /data/raw...
Loading meds data from /data/raw...
Loading vitals data from /data/raw...
Loading orders data from /data/raw...
Loading lab_results data from /data/raw...
Loading alerts data from /data/raw...
Loading progress_notes data from /data/raw...
Loading stays data from /data/raw...


In [3]:
census = data_dict['census']
census = census.sort_values(by=['MasterPatientID', 'CensusDate'])
census = utils.deduper(census, unique_keys=['MasterPatientID', 'FacilityID', 'CensusDate'])
census.shape

(10919704, 10)

In [5]:
stays = data_dict['stays']
stays.head()

Unnamed: 0,MasterPatientID,PatientID,FacilityID,StartDate,EndDate,DateOfTransfer,PurposeOfStay,TransferredTo,Outcome,OrderedByID,TransferReason,OtherReasonForTransfer,Planned,TransferredWithin30DaysOfAdmission,LengthOfStay,HospitalDischargeDate,PrimaryPhysicianID,Client,TransferDate
0,infinity-infinity_100,170,1,2017-01-01,2020-02-28,NaT,,,,,,,,,1153,NaT,,,
1,infinity-infinity_100059,2416559,1,2019-06-03,2019-11-25,2019-11-25 01:36:00,Chronic Long-Term,CHRIST HOSPITAL,"Admitted, Inpatient",113555.0,"Abnormal Vital Signs (low/high BP, high respir...",,No,0.0,175,NaT,113555.0,infinity-infinity,2019-11-25
2,infinity-infinity_100059,2416559,1,2020-01-10,2020-02-28,NaT,,,,,,,,,49,NaT,,,
3,infinity-infinity_100067,614952,1,2017-04-13,2017-04-13,NaT,,,,,,,,,0,NaT,,,
4,infinity-infinity_100112,523912,1,2017-01-01,2017-03-05,NaT,,,,,,,,,63,NaT,,,


In [83]:
ptimes = utils.get_prediction_timestamps(stays, "07:00:00")
ptimes.shape

Constructing jobs data
Launching jobs
Concatenating data frames


(10880004, 4)

In [84]:
ptimes.head()

Unnamed: 0,MasterPatientID,FacilityID,PredictionTimestamp,StayRowIndex
0,infinity-infinity_100,1.0,2017-01-02 07:00:00,0.0
1,infinity-infinity_100,1.0,2017-01-03 07:00:00,0.0
2,infinity-infinity_100,1.0,2017-01-04 07:00:00,0.0
3,infinity-infinity_100,1.0,2017-01-05 07:00:00,0.0
4,infinity-infinity_100,1.0,2017-01-06 07:00:00,0.0
