In this notebook, we construct the feature space for all encounters. The features include demographic information (age, continuous; sex, binary; race, binary indicator for Black), comorbidities (pre-selected, binary), and features from the observation window: medications (pre-selected, binary), procedures (pre-selected, binary), and lab test results (pre-selected, continuous values), as well as the baseline SCr level (continuous).

For comorbidities and medications, we also add upper-level ontology terms to enrich the features.

1. Literature Reviews for AKI-related Comorbidities: diabetes, cancer, HIV/AIDS, CKD (stages 1-5), hypertension, chronic liver diseases, heart failure, gastrointestinal diseases
2. Literature Reviews for AKI-related Medications: chemotherapy
3. Literature Reviews for AKI-related Procedures: cardiac surgery
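
As a rough illustration of what adding upper-level ontology terms to binary comorbidity features could look like, the sketch below rolls each diagnosis code up to a parent category and builds one binary column per code and per parent. It is not the notebook's actual implementation: the toy records, the `ENCOUNTERID` and `DX` column names, and the assumption that diagnoses are ICD-10-CM codes (whose first three characters name the category) are all hypothetical.

```python
import pandas as pd

# Toy diagnosis records; IDs, codes, and column names are made up for illustration.
dx = pd.DataFrame({
    "ENCOUNTERID": ["e1", "e1", "e2", "e3"],
    "DX": ["E11.9", "I10", "E11.65", "N18.3"],
})

# Assumption: codes are ICD-10-CM, so the first three characters give the parent category.
dx["DX_PARENT"] = dx["DX"].str[:3]

# One row per encounter, one binary column per leaf code and per parent category.
leaf = pd.crosstab(dx["ENCOUNTERID"], dx["DX"]).clip(upper=1)
parent = pd.crosstab(dx["ENCOUNTERID"], dx["DX_PARENT"]).clip(upper=1)

# Prefix the parent columns so they cannot collide with leaf codes such as "I10".
features = leaf.join(parent.add_prefix("PARENT_"))
print(features)
```

With the parent columns in place, encounters `e1` and `e2` share the `PARENT_E11` flag even though their leaf codes differ, which is the kind of shared signal the upper-level terms are meant to provide.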

In [1]:
import pandas as pd
import numpy as np
import sys
import os
sys.path.append(os.path.abspath("/home/lideyi/AKI_GNN/notebooks/utils"))
from common_var import raw_path, ct_names, pat_id_cols

# Read Patient ID DataFrame

In [12]:
# read patient id dataframe
onset_df = pd.read_csv('/blue/yonghui.wu/lideyi/AKI_GNN/raw_data/onset_df_cleaned.csv')

# format columns
onset_df[pat_id_cols + ['SEX', 'RACE']] = onset_df[pat_id_cols + ['SEX', 'RACE']].astype(str)
date_cols = ['ADMIT_DATE', 'DISCHARGE_DATE', 'OBSERVATION_WINDOW_START', 'PREDICTION_POINT']
for col in date_cols:
    onset_df[col] = pd.to_datetime(onset_df[col]).dt.date
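
If the CSV has columns with mixed types, or IDs whose leading zeros matter, one option is to declare the string and date columns when reading the file instead of converting them afterwards. The sketch below shows this on a small in-memory CSV. The ID column names and the toy rows are hypothetical stand-ins, since the contents of `pat_id_cols` are not shown here.

```python
import io
import pandas as pd

# Toy CSV standing in for onset_df_cleaned.csv; values are made up.
csv_text = (
    "PATID,ENCOUNTERID,SEX,RACE,ADMIT_DATE,DISCHARGE_DATE\n"
    "001,9001,M,Black,2015-01-03,2015-01-09\n"
    "002,9002,F,White,2016-07-21,2016-07-25\n"
)

id_cols = ["PATID", "ENCOUNTERID"]  # hypothetical stand-in for pat_id_cols
date_cols = ["ADMIT_DATE", "DISCHARGE_DATE"]

# Read IDs and categorical columns as strings, and parse dates up front.
toy_df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={col: str for col in id_cols + ["SEX", "RACE"]},
    parse_dates=date_cols,
)

# Keep only the calendar date, matching the conversion used in the cell above.
for col in date_cols:
    toy_df[col] = toy_df[col].dt.date
```

Reading the IDs as strings keeps values such as `001` intact, whereas converting with `astype(str)` after the read would see them as the integer `1`.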


In [13]:
# onset_df already contains the demographic information,
# so we can translate it directly into features
# Convert SEX to binary: 'M' -> 1, 'F' -> 0.
# Any other value is assigned 0 or 1 at random; the draw is unseeded,
# so those rows can differ between runs.
onset_df['SEX'] = onset_df['SEX'].apply(lambda x: 1 if x == 'M' else (0 if x == 'F' else np.random.randint(0, 2)))
# For RACE, label Black as 1 and all other values as 0
onset_df['RACE'] = onset_df['RACE'].apply(lambda x: 1 if x == 'Black' else 0)
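
Because the random fallback for unknown `SEX` values draws from NumPy's global random state, rerunning the cell can produce a different feature matrix. A minimal sketch of a reproducible variant is shown below on toy data. It swaps in a seeded `numpy.random.default_rng` generator and vectorized operations, so it is an alternative to the cell above, not the same code. The seed value and the toy rows are arbitrary assumptions.

```python
import numpy as np
import pandas as pd

# Toy data; 'nan' mimics a missing SEX value after astype(str).
toy = pd.DataFrame({
    "SEX": ["M", "F", "nan", "M"],
    "RACE": ["Black", "White", "Asian", "Black"],
})

rng = np.random.default_rng(42)  # arbitrary seed, chosen only for reproducibility

# Map known values; anything else becomes NaN and is filled with a seeded random 0/1.
sex = toy["SEX"].map({"M": 1, "F": 0})
missing = sex.isna()
sex.loc[missing] = rng.integers(0, 2, size=int(missing.sum()))
toy["SEX"] = sex.astype(int)

# Binary indicator: 1 for Black, 0 for every other value.
toy["RACE"] = (toy["RACE"] == "Black").astype(int)
```

Fixing the seed makes the imputed values stable across runs. Whether random assignment is the right way to handle unknown sex is a separate modeling decision that this sketch does not address.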