# 2. Problem Definition

This notebook reflects the adaptivity use case in the Cardea [paper](https://arxiv.org/abs/2010.00509). It takes the entityset loaded from MIMIC data, specifies the meta information and, where needed, designs the labeling function required in the feature engineering phase of the framework.

In [1]:
import pickle

In [2]:
# load the entityset from the previous step

with open('./mimic_entityset.pkl', 'rb') as file:
    entityset = pickle.load(file)

Using this dataset, we will consider three prediction tasks:
* predicting patient mortality
* predicting patient readmission
* predicting patient length of stay

For each problem, we define `label_times`, which holds the label and the cutoff time for each instance (unique id). For example, for the `mortality` problem, we use `hospital_expire_flag` as the label and `admittime` as the cutoff time.

In certain situations, we also set a secondary time index to mark columns whose values only become available after the primary time index (for example, at discharge rather than at admission). To read more about cutoff times, see the featuretools [documentation](https://docs.featuretools.com/en/v0.12.0/automated_feature_engineering/handling_time.html).
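
To make the idea of a cutoff time concrete, here is a minimal sketch on toy data using plain pandas. The tables and values are invented for illustration and are not MIMIC records, and the `events` table with its `charttime` column is a hypothetical stand-in for any time-stamped table. It shows that, for each instance, only rows time-stamped at or before the cutoff should feed the features.

```python
import pandas as pd

# Toy stand-in for the admissions table (invented values).
admissions = pd.DataFrame({
    "hadm_id": [101, 102],
    "admittime": pd.to_datetime(["2130-03-01", "2130-01-15"]),
    "hospital_expire_flag": [0, 1],
})

# One row per instance: id, cutoff time, label.
label_times = admissions[["hadm_id", "admittime", "hospital_expire_flag"]].copy()
label_times.columns = ["instance_id", "time", "label"]
label_times = label_times.sort_values("time").reset_index(drop=True)

# Hypothetical time-stamped events linked to admissions.
events = pd.DataFrame({
    "hadm_id": [101, 101, 102],
    "charttime": pd.to_datetime(["2130-02-28", "2130-03-02", "2130-01-16"]),
})

# Keep only events recorded at or before each instance's cutoff time.
merged = events.merge(label_times, left_on="hadm_id", right_on="instance_id")
usable = merged[merged["charttime"] <= merged["time"]]
print(usable[["hadm_id", "charttime", "time"]])
```

In this toy case only the event dated before the admission of `hadm_id` 101 survives the filter; the other two happen after their cutoff and would leak future information.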

In [3]:
problem = 'mortality'  # should be one of ['los', 'mortality', 'readmission']

if problem == 'mortality':
    label_column = 'hospital_expire_flag'
    time_column = 'admittime'
    secondary_time_column = 'dischtime'
    column_id = 'hadm_id'
    entity = 'admissions'
    secondary_columns = ['deathtime', 'discharge_location', 'hospital_expire_flag']

    entity_set_df = entityset[entity].df
    entityset = entityset.entity_from_dataframe(entity_id=entity,
                                                dataframe=entity_set_df,
                                                time_index=time_column,
                                                index=column_id,
                                                secondary_time_index={secondary_time_column: secondary_columns})

elif problem == 'los':
    label_column = 'los'
    time_column = 'intime'
    column_id = 'icustay_id'
    entity = 'icustays'
    secondary_time_column = 'outtime'
    secondary_columns = ['last_wardid', 'last_careunit', 'los']

    entity_set_df = entityset[entity].df
    entityset = entityset.entity_from_dataframe(entity_id=entity,
                                                dataframe=entity_set_df,
                                                time_index=time_column,
                                                index=column_id,
                                                secondary_time_index={secondary_time_column: secondary_columns})

elif problem == 'readmission':
    label_column = 'readmission'
    time_column = 'dischtime'
    column_id = 'hadm_id'
    entity = 'admissions'
    
else:
    raise ValueError("problem not found: {!r}".format(problem))

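# one row per instance: id, cutoff time, label
# note: for 'readmission', this requires a `readmission` column to exist in the entity already
label_times = entityset[entity].df[[column_id, time_column, label_column]].copy()
label_times.columns = ['instance_id', 'time', 'label']
label_times = label_times.sort_values('time')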
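
The `readmission` branch selects a `readmission` column, so a labeling function has to create that column first if the data does not provide it. The following is a minimal sketch of one possible labeling function, not the one used in the paper: the helper name `label_readmission` and the 30-day window are assumptions made for illustration, and it relies only on the `subject_id`, `hadm_id`, `admittime` and `dischtime` columns of the admissions table.

```python
import pandas as pd

def label_readmission(admissions, window_days=30):
    """Hypothetical helper: flag admissions that are followed by another
    admission of the same patient within `window_days` of discharge."""
    df = admissions.sort_values(["subject_id", "admittime"]).copy()
    # Admission time of the same patient's next stay (NaT for the last stay).
    next_admit = df.groupby("subject_id")["admittime"].shift(-1)
    gap = next_admit - df["dischtime"]
    # Comparisons against NaT evaluate to False, so last stays get label 0.
    df["readmission"] = (gap <= pd.Timedelta(days=window_days)).astype(int)
    return df

# Toy data with invented values, for illustration only.
toy = pd.DataFrame({
    "subject_id": [1, 1, 2, 2],
    "hadm_id": [1, 2, 3, 4],
    "admittime": pd.to_datetime(["2130-01-01", "2130-01-20", "2130-02-01", "2130-06-01"]),
    "dischtime": pd.to_datetime(["2130-01-05", "2130-01-25", "2130-02-03", "2130-06-04"]),
})
labeled = label_readmission(toy)
print(labeled[["hadm_id", "readmission"]])
```

Using `dischtime` as the cutoff matches the `time_column` chosen for this problem: the label describes what happens after discharge, so everything recorded up to discharge can be used as input.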

We now have the `label_times` variable ready! We store it, together with the entityset and the target entity, in a dictionary and pickle it so that the next step, automated feature engineering, can load it.

In [4]:
parameters = {
    "es": entityset, 
    "entity": entity, 
    "label_times": label_times
}

with open('./{}_parameters.pkl'.format(problem), 'wb') as file:
    pickle.dump(parameters, file)
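
As a quick sanity check, the pickled dictionary can be read back the same way the next step would load it. The sketch below is self-contained: it writes a toy dictionary to a temporary directory instead of the real `parameters`, so the values shown are placeholders rather than the actual entityset or label times.

```python
import os
import pickle
import tempfile

problem = "mortality"
# Toy stand-in for the real parameters dictionary (placeholder values).
parameters = {
    "entity": "admissions",
    "label_times": [(101, "2130-03-01", 0)],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "{}_parameters.pkl".format(problem))

    # Write the dictionary, then read it back.
    with open(path, "wb") as file:
        pickle.dump(parameters, file)
    with open(path, "rb") as file:
        restored = pickle.load(file)

# The round trip should preserve keys and values.
assert restored == parameters
print(sorted(restored.keys()))
```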