#### Workflow:
1. Create observations with 'create_data_points'. An observation is a data point in which
all signals 'peep', 'fio2', 'po2' are measured. All data points are rounded down to the
nearest hour, so each data point is an hour during which all signals were measured.
This notebook works with a single patient and therefore uses '_create_data_points_batch' to make
testing faster. Consider renaming the function to 'create_observations'.

2. (to be implemented) Function 'select_cohort' selects the cohort by checking the
inclusion and exclusion criteria.

3. Create a table with the treatment. Function 'get_proning_table' creates a table with
proning sessions. This table will be used to check whether an observation was treated.
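
Step 1 can be illustrated with a minimal sketch. It is not the implementation of 'create_data_points'; the long-format table, its column names ('timestamp', 'signal', 'value') and the choice to keep the last value within an hour are assumptions made only for this example.

```python
import pandas as pd

# Hypothetical long-format measurements for one patient (illustrative values only).
measurements = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2020-03-01 10:05", "2020-03-01 10:40", "2020-03-01 10:59",
        "2020-03-01 11:15", "2020-03-01 11:30",
    ]),
    "signal": ["peep", "fio2", "po2", "peep", "fio2"],
    "value": [10.0, 0.6, 75.0, 12.0, 0.5],
})

# Round every timestamp down to the start of its hour.
measurements["hour"] = measurements["timestamp"].dt.floor("h")

# One row per hour, one column per signal (assumption: keep the last value in the hour).
wide = measurements.pivot_table(index="hour", columns="signal",
                                values="value", aggfunc="last")

# Keep only the hours in which all three signals were measured.
required = ["peep", "fio2", "po2"]
observations = wide.reindex(columns=required).dropna().reset_index()
```

In this toy data only the 10:00 hour has all three signals, so it is the only observation that remains; the 11:00 hour is dropped because 'po2' is missing.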


Idea: a function that checks the preconditions for each point.

patients -> treated -> proning sessions -> included

patients -> never treated -> look at points with blood gas -> if the inclusion criteria are met, keep as data point

(update)

load blood gas points (create_data_points)
 -> drop ids that are not eligible (check_inclusion(drop = True))
 -> split points into (control - not proned and not proned after
                       measurement_control - not proned and wasn't proned in the past, may be proned in the future
                       treated - not proned, will be proned after
                       measurement_treated - proned at the moment)
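
The split above depends on where an observation lies relative to the proning sessions. A minimal sketch of the three underlying flags (proned now, proned before, proned after) is given below. The function name 'proning_flags' and the toy tables are hypothetical, and the mapping from flags to the four groups is left open on purpose.

```python
import pandas as pd

# Hypothetical proning sessions and observations for one patient.
sessions = pd.DataFrame({
    "start_timestamp": pd.to_datetime(["2020-03-02 08:00"]),
    "end_timestamp": pd.to_datetime(["2020-03-02 20:00"]),
})
observations = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2020-03-01 12:00", "2020-03-02 10:00", "2020-03-03 09:00"]
    )
})

def proning_flags(ts, sessions):
    """Describe one timestamp relative to all proning sessions."""
    starts, ends = sessions["start_timestamp"], sessions["end_timestamp"]
    return pd.Series({
        "proned_now": bool(((starts <= ts) & (ts < ends)).any()),
        "proned_before": bool((ends <= ts).any()),
        "proned_after": bool((starts > ts).any()),
    })

flags = observations["timestamp"].apply(proning_flags, sessions=sessions)
observations = pd.concat([observations, flags], axis=1)
```

Each group can then be selected with a boolean mask on these columns, for example rows where 'proned_now' is True for the points measured during a session.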


In [None]:
%reset

In [None]:
import os, sys, random

import pandas as pd
import numpy as np
import swifter

import pacmagic
import pacmagic_icu

from importlib import reload
from data_warehouse_utils.dataloader import DataLoader

os.chdir('/home/adam/files/causal_inference')
os.getcwd()

from causal_inference.experiment_generator.create_observations import create_data_points
from causal_inference.experiment_generator.create_observations import _get_hash_patient_id
from causal_inference.experiment_generator.create_observations import hour_rounder
from causal_inference.experiment_generator.create_observations import _create_data_points_batch
from causal_inference.experiment_generator.create_treatment import get_proning_table
from causal_inference.experiment_generator.create_treatment import proning_table_to_intervals


In [None]:
# Reloads packages

reload(sys.modules['causal_inference'])
reload(sys.modules['causal_inference.experiment_generator'])
reload(sys.modules['causal_inference.experiment_generator.create_observations'])
reload(sys.modules['causal_inference.experiment_generator.create_treatment'])

In [None]:
dl = DataLoader()

patient_id_all = _get_hash_patient_id(dl)

In [None]:
ID = random.choice(patient_id_all)
ID

In [None]:
df = _create_data_points_batch(dl = dl,
                               patient_id = ID,
                               compress = False, # doesn't have an effect yet
                               nearest = False) # We need to round down to be consistent

In [None]:
df.head()

In [None]:
# Remark: some IDs have no data; we need to account for this and drop such patients

df_treatment = get_proning_table(dl, ID)

df_treatment.head()

In [None]:
# For each patient we could do the following:
# 1. Aggregate, for each patient, the intervals during which the patient was proned.
# 1a. We consider only proning sessions that were at least 1 hour long, so if we check all
#     full hours between timestamp and timestamp + x then we are fine.
# 2. For each patient and each data point, check that the patient wasn't proned
#    during the next x (x is an argument).
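
The check in point 2 can be sketched as an interval-overlap test. This is only an illustration: the function name 'proned_within', the assumption that x is given in hours, and the toy session table are not part of the package.

```python
import pandas as pd

# Hypothetical table of proning sessions for one patient.
sessions = pd.DataFrame({
    "start_timestamp": pd.to_datetime(["2020-03-02 08:00"]),
    "end_timestamp": pd.to_datetime(["2020-03-02 20:00"]),
})

def proned_within(ts, sessions, hours):
    """Return True if any session overlaps the window [ts, ts + hours)."""
    window_end = ts + pd.Timedelta(hours=hours)
    overlaps = (sessions["start_timestamp"] < window_end) & (sessions["end_timestamp"] > ts)
    return bool(overlaps.any())

# Observation far from any session: not proned during the next 8 hours.
not_proned_soon = not proned_within(pd.Timestamp("2020-03-01 12:00"), sessions, hours=8)

# Observation 6 hours before a session starts: proned during the next 8 hours.
proned_soon = proned_within(pd.Timestamp("2020-03-02 02:00"), sessions, hours=8)
```

Comparing interval bounds directly avoids enumerating every full hour in the window, and it also catches sessions shorter than one hour.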

In [None]:
df_position = dl.get_range_measurements(patients= [ID],
                                            parameters= ['position'],
                                            sub_parameters=['position_body'],
                                            columns=['hash_patient_id',
                                                     'start_timestamp',
                                                     'end_timestamp',
                                                     'effective_value',
                                                     'is_correct_unit_yn']
                                            )

df_position.sort_values(by = ['hash_patient_id', 'start_timestamp'],
                            ascending = True,
                            inplace = True)

df_position.reset_index(drop=True, inplace=True)

df_position.head()

In [None]:
# Aggregate into sessions: get_proning_table

df_position['effective_timestamp'] = df_position['start_timestamp']
df_position['effective_timestamp_next'] = df_position['effective_timestamp'].shift(-1)

df_position['effective_value_next'] = df_position['effective_value'].shift(-1)
df_position['session_id'] = 0
df_position['proning_canceled'] = False
session_id = 0

In [None]:
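for idx, row in df_position.iterrows():

    # Rows keep the current session id until the position value changes.
    df_position.loc[idx, 'session_id'] = session_id
    if row.effective_value != row.effective_value_next:
        session_id += 1
        # The last row of a session ends when the next position starts.
        df_position.loc[idx, 'effective_timestamp'] = row.effective_timestamp_next

    # Flag a prone position that is directly followed by 'canceled'.
    if (row.effective_value == 'prone') and (row.effective_value_next == 'canceled'):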
        df_position.loc[idx, 'proning_canceled'] = True
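
The row-wise loop above could also be written without 'iterrows'. The sketch below shows one vectorized way to derive the same session ids and cancellation flag on a toy frame; the toy values are made up, and only the column names follow the notebook.

```python
import pandas as pd

# Toy position log, already sorted by time (illustrative values only).
toy = pd.DataFrame({
    "effective_value": ["supine", "supine", "prone", "prone", "canceled", "supine"],
})

# A new session starts whenever the value differs from the previous row.
changed = toy["effective_value"] != toy["effective_value"].shift()
toy["session_id"] = changed.cumsum() - 1

# A prone row directly followed by 'canceled' marks a canceled proning.
next_value = toy["effective_value"].shift(-1)
toy["proning_canceled"] = (toy["effective_value"] == "prone") & (next_value == "canceled")
```

On larger position tables this avoids one '.loc' assignment per row, which tends to dominate the runtime of the loop version.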

In [None]:
df_position.head()

In [None]:
# Group by session: groupby_proning_table
df_groupby_start = df_position.groupby(['hash_patient_id', 'effective_value', 'session_id'],
                               as_index=False)['start_timestamp'].min()

df_groupby_start = df_groupby_start.drop(columns = ['hash_patient_id', 'effective_value'])

# df_groupby_start = df_groupby_start.rename(columns = {'effective_timestamp':'start_timestamp'})

df_groupby_end = df_position.groupby(['hash_patient_id', 'effective_value', 'session_id'],
                               as_index=False)['effective_timestamp'].max()

df_groupby_end = df_groupby_end.drop(columns = ['hash_patient_id', 'effective_value'])

df_groupby_end = df_groupby_end.rename(columns = {'effective_timestamp':'end_timestamp'})

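# Select several columns with a list; indexing a groupby with a bare tuple of names fails in pandas >= 2.0.
df_groupby = df_position.groupby(['hash_patient_id', 'effective_value', 'session_id'],
                               as_index=False)[['is_correct_unit_yn',
                                                'proning_canceled']].last()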

df_groupby = pd.merge(df_groupby, df_groupby_start, how='left', on='session_id')
df_groupby = pd.merge(df_groupby, df_groupby_end, how='left', on='session_id')

# Calculate the session duration (a Timedelta; the conversion to full hours below is disabled)

df_groupby['duration_hours'] = df_groupby['end_timestamp'] - df_groupby['start_timestamp']
#df_groupby['duration_hours'] = df_groupby['duration_hours'].astype('timedelta64[h]').astype('int')

df_groupby.head()
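
The commented-out conversion to full hours relies on a cast that recent pandas versions no longer accept. A minimal alternative, shown on a hypothetical two-row table, is floor division by a one-hour Timedelta.

```python
import pandas as pd

# Hypothetical sessions (illustrative values only).
toy_sessions = pd.DataFrame({
    "start_timestamp": pd.to_datetime(["2020-03-02 08:00", "2020-03-02 12:00"]),
    "end_timestamp": pd.to_datetime(["2020-03-02 10:45", "2020-03-02 12:30"]),
})

duration = toy_sessions["end_timestamp"] - toy_sessions["start_timestamp"]

# Floor division by one hour keeps only the completed hours.
toy_sessions["duration_hours"] = duration // pd.Timedelta(hours=1)
```

Sessions shorter than one hour get a duration of 0, which makes it easy to filter for sessions of at least one full hour.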

In [None]:
# TODO: what happens if the last session is a proning session?