#### Workflow:
Each 'hash_patient_id' is processed separately to make the process efficient.

1. Create observations with 'create_data_points':
 - An observation is a data point for which all three signals, 'peep', 'fio2' and 'po2',
 are measured within the same hour. The mean of the measurements within that hour is taken
 per signal, and the timestamp is rounded up to the next full hour.
 - Example: for a single 'hash_patient_id', 'peep' is measured at 12:50, 'fio2' at 12:10
 and 'po2' at 12:20, 12:40 and 13:00. These measurements create one data point at 13:00.
 - Second example: if 'fio2' had not been measured between 12:01 and 13:00, all
 measurements taken between 12:01 and 13:00 would be discarded for all signals.
 - The notebook works with a single patient, hence '_create_data_points_batch', to make
 testing faster. Consider renaming the function to 'create_observations'.

2. (To be implemented) Function 'select_cohort' selects the cohort by checking the
inclusion and exclusion criteria. (The first step is to plot all values as a histogram,
possibly without discarding the unmeasured ones.)

3. Create a table with the treatment. Function 'get_proning_table' creates a table of
proning sessions. This table is used to split the observations into control, treated,
outcome control and outcome treated.
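
The observation rule from step 1 can be sketched as follows. This is a minimal illustration, not the implementation of 'create_data_points': the helper name `build_observations`, the long-format layout and the column names `signal`, `timestamp` and `value` are assumptions made for the example. It reproduces the two examples above: the 12:01 to 13:00 window yields one data point at 13:00, and the following hour is dropped because 'fio2' is missing.

```python
import pandas as pd

SIGNALS = ["peep", "fio2", "po2"]


def build_observations(measurements: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: long-format measurements -> one row per patient and hour."""
    df = measurements.copy()
    # Round every timestamp up to the next full hour (13:00 itself stays 13:00).
    df["hour"] = df["timestamp"].dt.ceil("h")
    # Mean per patient, hour and signal, with one column per signal.
    hourly = (
        df.groupby(["hash_patient_id", "hour", "signal"])["value"]
        .mean()
        .unstack("signal")
        .reindex(columns=SIGNALS)
    )
    # Keep only the hours in which all three signals were measured.
    return hourly.dropna(subset=SIGNALS).reset_index()


# Made-up values; only the timestamps follow the examples in the text.
measurements = pd.DataFrame(
    {
        "hash_patient_id": ["a"] * 7,
        "signal": ["peep", "fio2", "po2", "po2", "po2", "peep", "po2"],
        "timestamp": pd.to_datetime(
            [
                "2020-04-20 12:50",
                "2020-04-20 12:10",
                "2020-04-20 12:20",
                "2020-04-20 12:40",
                "2020-04-20 13:00",
                "2020-04-20 13:30",  # next hour: no fio2, so it is discarded
                "2020-04-20 13:40",
            ]
        ),
        "value": [8.0, 40.0, 70.0, 80.0, 90.0, 10.0, 75.0],
    }
)
obs = build_observations(measurements)
print(obs)
```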

load blood gas points (create_data_points)
 -> drop ids that are not eligible (check_inclusion(drop=True))
 -> split points into (control - not proned and not proned afterwards
                       measurement_control - not proned and not proned in the past, may be proned in the future
                       treated - not proned, will be proned afterwards
                       measurement_treated - proned at the moment)
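
All four groups above are defined by whether a patient is proned at, before or after the time of an observation. The sketch below computes these three flags from a sessions table and does not assign the groups themselves, since their exact boundaries are not fixed here. It is not the logic of 'add_treatment': the helper name `proning_flags` and the observation column `timestamp` are assumptions, while the session columns follow the output of 'get_proning_table'.

```python
import pandas as pd


def proning_flags(observations: pd.DataFrame, sessions: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: flag each observation relative to prone sessions."""
    prone = sessions.loc[
        sessions["effective_value"] == "prone",
        ["hash_patient_id", "start_timestamp", "end_timestamp"],
    ]
    # Pair every observation with every prone session of the same patient.
    merged = observations.merge(prone, on="hash_patient_id", how="left")
    t = merged["timestamp"]
    # Comparisons against NaT (patients without prone sessions) evaluate to False.
    merged["proned_now"] = (merged["start_timestamp"] <= t) & (t < merged["end_timestamp"])
    merged["proned_before"] = merged["end_timestamp"] <= t
    merged["proned_after"] = merged["start_timestamp"] > t
    # Collapse back to one row per observation (assumes unique patient/timestamp pairs).
    flag_columns = ["proned_now", "proned_before", "proned_after"]
    return (
        merged.groupby(["hash_patient_id", "timestamp"])[flag_columns]
        .any()
        .reset_index()
    )


# Made-up sessions and observations for two patients.
sessions = pd.DataFrame(
    {
        "hash_patient_id": ["a", "b"],
        "effective_value": ["prone", "supine"],
        "start_timestamp": pd.to_datetime(["2020-04-20 20:00", "2020-04-20 08:00"]),
        "end_timestamp": pd.to_datetime(["2020-04-21 12:00", "2020-04-20 18:00"]),
    }
)
observations = pd.DataFrame(
    {
        "hash_patient_id": ["a", "a", "a", "b"],
        "timestamp": pd.to_datetime(
            ["2020-04-20 10:00", "2020-04-21 00:00", "2020-04-22 00:00", "2020-04-20 10:00"]
        ),
    }
)
flags = proning_flags(observations, sessions)
print(flags)
```

Patient "b" has no prone session, so all three flags are False for that observation.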


In [4]:
%reset

In [13]:
import os, sys, random

import pandas as pd
import numpy as np
import swifter

import pacmagic
import pacmagic_icu

from importlib import reload
from data_warehouse_utils.dataloader import DataLoader

os.chdir('/home/adam/files/causal_inference')
os.getcwd()

from causal_inference.experiment_generator.create_observations import create_data_points
from causal_inference.experiment_generator.create_treatment import get_proning_table
from causal_inference.experiment_generator.create_treatment import add_treatment
from causal_inference.experiment_generator.create_inclusion_criteria import get_inclusion_data

In [14]:
# Reload the project modules so that code changes are picked up without restarting the kernel
os.chdir('/home/adam/files/causal_inference')
os.getcwd()

reload(sys.modules['causal_inference.experiment_generator.create_observations'])
reload(sys.modules['causal_inference.experiment_generator.create_treatment'])
reload(sys.modules['causal_inference.experiment_generator.create_inclusion_criteria'])


from causal_inference.experiment_generator.create_observations import create_data_points
from causal_inference.experiment_generator.create_treatment import get_proning_table
from causal_inference.experiment_generator.create_treatment import add_treatment
from causal_inference.experiment_generator.create_inclusion_criteria import get_inclusion_data

In [15]:
dl = DataLoader()

In [None]:
df_measurements = create_data_points(dl)

In [None]:
df_measurements.head()

In [None]:
df_measurements.info()

In [None]:
os.chdir('/home/adam/files/data')
os.getcwd()

In [None]:
df_measurements.to_csv('blood_gas_measurements.csv', index=False)

In [16]:
df_treatment = get_proning_table(dl)

In [17]:
df_treatment.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6866 entries, 0 to 6865
Data columns (total 8 columns):
 #   Column              Non-Null Count  Dtype         
---  ------              --------------  -----         
 0   hash_patient_id     6866 non-null   object        
 1   effective_value     6866 non-null   object        
 2   session_id          6866 non-null   int64         
 3   is_correct_unit_yn  6866 non-null   bool          
 4   proning_canceled    6866 non-null   bool          
 5   start_timestamp     6866 non-null   datetime64[ns]
 6   end_timestamp       6866 non-null   datetime64[ns]
 7   duration_hours      6866 non-null   int64         
dtypes: bool(2), datetime64[ns](2), int64(2), object(2)
memory usage: 335.4+ KB


In [18]:
df_treatment.head()

Unnamed: 0,hash_patient_id,effective_value,session_id,is_correct_unit_yn,proning_canceled,start_timestamp,end_timestamp,duration_hours
0,0056C30A94364E6D71E41EF2F4611DE0FEDF1D86755991...,prone,1,True,False,2020-04-20 20:11:35,2020-04-27 15:29:45,163
1,0056C30A94364E6D71E41EF2F4611DE0FEDF1D86755991...,supine,0,True,False,2020-04-17 18:47:58,2020-04-20 20:11:35,73
2,0056C30A94364E6D71E41EF2F4611DE0FEDF1D86755991...,supine,2,True,False,2020-04-27 15:29:45,2020-05-06 20:20:29,220
3,0062A4D1F904E04A4B1FA417D87F71181AEB285660274A...,supine,0,False,False,2020-03-20 15:27:00,2020-03-20 22:18:00,6
4,0070A04E30F2A5F394E0EED71AE0C186DEAB514BD21D27...,Bed naar links,1,True,False,2020-05-13 04:00:00,2020-05-13 12:00:00,8


In [25]:
os.chdir('/home/adam/files/data')
os.getcwd()

'/home/adam/files/data'

In [26]:
df_treatment.to_csv('prone_sessions.csv', index=False)
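
A CSV file stores timestamps as text, so 'start_timestamp' and 'end_timestamp' do not come back as datetime64 columns unless they are parsed on load. The sketch below shows the round trip with an in-memory buffer in place of 'prone_sessions.csv'; the two-row table is a cut-down stand-in for 'df_treatment', not the real data.

```python
import io

import pandas as pd

# Cut-down stand-in for df_treatment with the two timestamp columns.
sessions = pd.DataFrame(
    {
        "session_id": [0, 1],
        "start_timestamp": pd.to_datetime(["2020-04-17 18:47:58", "2020-04-20 20:11:35"]),
        "end_timestamp": pd.to_datetime(["2020-04-20 20:11:35", "2020-04-27 15:29:45"]),
    }
)

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
sessions.to_csv(buffer, index=False)

# Without parse_dates the timestamp columns are read back as text.
buffer.seek(0)
plain = pd.read_csv(buffer)

# With parse_dates they are restored as datetime64 columns.
buffer.seek(0)
parsed = pd.read_csv(buffer, parse_dates=["start_timestamp", "end_timestamp"])

print(plain.dtypes)
print(parsed.dtypes)
```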

In [None]:
df_observations = add_treatment(df_treatment)

In [None]:
os.chdir('/home/adam/files/data')
df_measurements = pd.read_csv('blood_gas_measurements.csv')
df_measurements.head()

In [None]:
df_observations = get_inclusion_data(df_observations, df_measurements)

In [None]:
df_observations.head()

In [20]:
df_treatment.describe()

Unnamed: 0,session_id,duration_hours
count,6866.0,6866.0
mean,6.113749,68.185698
std,7.336826,170.149423
min,0.0,0.0
25%,1.0,6.0
50%,4.0,18.0
75%,8.0,47.0
max,47.0,3470.0


In [22]:
df_treatment[df_treatment.effective_value == 'prone'].describe()

Unnamed: 0,session_id,duration_hours
count,2706.0,2706.0
mean,6.257945,36.512195
std,7.090593,64.739061
min,0.0,0.0
25%,1.0,14.0
50%,4.0,19.0
75%,9.0,33.0
max,47.0,865.0


In [24]:
# Combine both conditions into one mask; chained boolean indexing reindexes the second mask and raises a warning.
mask = (df_treatment.effective_value == 'prone') & (df_treatment.duration_hours <= 96)
df_treatment[mask].describe()

Unnamed: 0,session_id,duration_hours
count,2531.0,2531.0
mean,6.547215,23.360332
std,7.179327,19.070468
min,0.0,0.0
25%,1.0,13.0
50%,4.0,19.0
75%,9.0,25.0
max,47.0,96.0


In [27]:

df_treatment.dtypes

hash_patient_id               object
effective_value               object
session_id                     int64
is_correct_unit_yn              bool
proning_canceled                bool
start_timestamp       datetime64[ns]
end_timestamp         datetime64[ns]
duration_hours                 int64
dtype: object