# Definition

deterioration definition and analysis

Deterioration includes any of the following, alone or in combination:
* Death
* New onset medical complication - 'cat1' complication (see vars in raw ODK data with cat1_ prefix)
** cat1 complications include lack of appetite (anorexia)
** Note: cat1 complications prompt ITP referral in current ODK setup, though sometimes the caregiver or family members refuse referral
* Poor weight gain
** Weight at week 3 is lower than weight at admission
** Weight loss for 3 consecutive weeks (not related to loss of oedema)
** Static weight or weight loss for 4 consecutive weeks
** Poor weight gain (lt 5 g/kg/day) for 4 consecutive weeks
* Failure to lose oedema
** New appearance of oedema (onset of oedema when previously absent)
** Worsening/increase of oedema (grade 1-->2, 2-->3)
** Oedema not disappearing/reducing at 3rd week after initial appearance (static grade of oedema)
* Poor MUAC gain
** Static MUAC or MUAC loss for 2 consecutive weeks
* Discharge as not responding to treatment (status == ‘nonresponse’)

Note that deterioration in WHZ/WLZ (increased wasting), WAZ (increased underweight), HAZ (increased stunting) are NOT included -- my guess is that this is because they are difficult for workers to calculate in a typical OTP setup??? But it would be reasonably easy to include in this analysis if we set a threshold.

In general, complications requiring hospitalization (cat1) are pretty well defined in our program + data. However, poor weight gain, failure to lose oedema, and poor MUAC gain are less well defined (i.e., definitions aren't already built into the health worker ODK).


# Setup

In [1]:
!pip install import_ipynb --quiet


[?25l   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/1.6 MB[0m [31m?[0m eta [36m-:--:--[0m[2K   [91m━━━━━━━━━━━[0m[91m╸[0m[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.5/1.6 MB[0m [31m13.5 MB/s[0m eta [36m0:00:01[0m[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[91m╸[0m [32m1.6/1.6 MB[0m [31m27.0 MB/s[0m eta [36m0:00:01[0m[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.6/1.6 MB[0m [31m17.2 MB/s[0m eta [36m0:00:00[0m
[?25h

In [3]:
import pandas as pd
import pickle
import statsmodels.formula.api as smf
import numpy as np
from tqdm import tqdm
import import_ipynb
from warnings import simplefilter
import os


# prompt: read google shared drive file

from google.colab import drive
drive.mount('/content/drive')

dir = "/content/drive/My Drive/[PBA] Data/"

os.chdir("/content/drive/My Drive/[PBA] Code")

from util import regress,convert_bool_to_int,infer_phq_score
os.chdir("/content")


Mounted at /content/drive


In [4]:
simplefilter(action="ignore", category=pd.errors.PerformanceWarning)
simplefilter(action="ignore", category=pd.errors.SettingWithCopyWarning)
simplefilter(action="ignore", category=FutureWarning)



In [5]:

# Load the pickle file
with open(dir + 'analysis/admit_weekly.pkl', 'rb') as f:
  admit_weekly = pickle.load(f)
with open(dir + 'analysis/admit_processed_raw.pkl', 'rb') as f:
  admit_raw = pickle.load(f)


In [6]:
numeric_cols = admit_weekly.select_dtypes(include=np.number).columns
numeric_cat1_cols = [col for col in admit_weekly.columns if col.startswith('cat1_')]
numeric_cat2_cols = [col for col in admit_weekly.columns if col.startswith('cat2_')]


# Discharge as not responding to treatment

status == 'nonresponse'

In [7]:
# prompt: find rows with current_status == 'nonresponse'

# Assuming admit_weekly DataFrame is already loaded as in the provided code.

# Find rows where 'current_status' is 'nonresponse'
nonresponse_rows = admit_weekly[admit_weekly['status'] == 'nonresponse']

admit_weekly['nonresponse'] = admit_weekly['status'] == 'nonresponse'

# Display or further process the nonresponse rows
#print(nonresponse_rows)

# Get the unique PIDs of patients with 'current_status' as 'nonresponse'
pids_nonresponse = nonresponse_rows['pid'].unique()


# Death

In [8]:
dead_rows = admit_weekly[admit_weekly['status_dead'] == True]
# Get the unique PIDs of patients with 'current_status' as 'dead'
pids_dead = dead_rows['pid'].unique()
print(len(pids_dead))
#admit_weekly.loc[admit_weekly['status_dead'] == True , 'status_dead_date'] = admit_weekly.loc[admit_weekly['status_dead'] == True, 'status_date']

#admit_weekly['status_dead_date'].notnull().sum()

#first_date_series = get_first_detn_date(admit_weekly,'status_dead',date_col='status_date')


253


# only data with visits filter

In [9]:
# prompt: get admit_weekly where calcdate_weekly is null

# Assuming admit_weekly is already loaded as in the provided code
print(admit_weekly['pid'].nunique(),admit_weekly.shape)
# Filter for rows where 'calcdate_weekly' is null
admit_weekly_no_weekly = admit_weekly[admit_weekly['calcdate_weekly'].isnull()].copy()
print(admit_weekly_no_weekly['pid'].nunique(),admit_weekly_no_weekly.shape)
print(admit_weekly['pid'].nunique() - admit_weekly_no_weekly['pid'].nunique())

pids_with_visits = list(set(admit_weekly['pid'].unique())  - set(admit_weekly_no_weekly['pid'].unique()))


10322 (66693, 1519)
575 (575, 1519)
9747


In [10]:
# prompt: drop admit_weekly rows where calcdate_weekly is null as we're only interested in visit time sequences
admit_weekly_all = admit_weekly.copy() # save all as death and nonresponse happen at admission, too

# Drop rows where 'calcdate_weekly' is null
admit_weekly.dropna(subset=['calcdate_weekly'], inplace=True)

In [11]:
# prompt: get max sequence_num by pid

# Assuming admit_weekly DataFrame is already loaded and processed as in the provided code.

# Group by 'pid' and get the maximum 'sequence_num' for each 'pid'
max_sequence_num_by_pid = admit_weekly.groupby('pid')['sequence_num'].max()

In [12]:
# prompt: left join admit_weekly to max_sequence_num_by_pid on pid

# Merge the DataFrames
admit_weekly = pd.merge(admit_weekly, max_sequence_num_by_pid.rename('max_sequence_num'), left_on='pid', right_index=True, how='left')

# Poor Weight Gain

In [13]:
# prompt: get admit_weekly where sequence_num ==3


# Filter for sequence_num == 3
admit_weekly_seq3 = admit_weekly[admit_weekly['sequence_num'] == 3]


In [14]:
# prompt: find pid with the most rows

# Find the 'pid' with the most rows
pid_counts = admit_weekly['pid'].value_counts()
pid_with_most_rows = pid_counts.index[0]
print(f"The 'pid' with the most rows is: {pid_with_most_rows}")

The 'pid' with the most rows is: 24-3193


* Weight at week 3 is lower than weight at admission
* Weight loss for 3 consecutive weeks (not related to loss of oedema)
* Static weight or weight loss for 4 consecutive weeks
* Poor weight gain (<5 g/kg/day) for 4 consecutive weeks


In [15]:
# get prior weight, lag_1
admit_weekly[f'weight_weekly_lag_1'] = admit_weekly.groupby('pid')['weight_weekly'].shift(1)


In [16]:
# prompt: Static weight or weight loss for 4 consecutive weeks

# Static weight or weight loss for 4 consecutive weeks
def static_or_weight_loss_4_weeks(df):
    # Create a boolean Series indicating whether the weight is static or decreased compared to 4 weeks prior
    df['static_weight_loss_1w'] = (df['weight_weekly'] <= df['weight_weekly_lag_1'])

    # Group by 'pid' and check for 4 consecutive True values in 'static_or_loss_4w'
    # Using rolling window to check consecutive values
    static_or_loss_4w_consecutive =df.groupby('pid')['static_weight_loss_1w'].rolling(window=4, min_periods=4).apply(lambda x: all(x), raw=True)

    # Instead of direct assignment, use reset_index to align the index:
    df['static_or_weight_loss_4_weeks'] = static_or_loss_4w_consecutive.reset_index(level=0, drop=True).fillna(False)
    pd.set_option('future.no_silent_downcasting', True)
    # prompt: convert 0 to False and 1 to True in admit_weekly['static_or_loss_4w_consecutive']
    df['static_or_weight_loss_4_weeks'] = df['static_or_weight_loss_4_weeks'].replace({0: False, 1: True})

    return df

admit_weekly = static_or_weight_loss_4_weeks(admit_weekly)

In [17]:
# prompt: Weight at week 3 is lower than weight at admission

# Assuming admit_weekly DataFrame is already loaded as in the provided code.

# Filter for sequence_num == 3
admit_weekly_seq3 = admit_weekly[admit_weekly['sequence_num'] == 3]

# Compare weight at week 3 to weight at admission
admit_weekly_seq3['weight_at_week3_lower_than_admission'] = admit_weekly_seq3['weight_weekly'] < admit_weekly_seq3['weight_admit_current']



# prompt: join admit_weekly_seq3 to admit_weekly on ['pid', 'calcdate_weekly']
# propagates future into the past however, so this is a data leak
admit_weekly = pd.merge(admit_weekly, admit_weekly_seq3[['pid', 'calcdate_weekly','weight_at_week3_lower_than_admission']], on=['pid','calcdate_weekly'], how='left')


In [18]:
# prompt: Poor weight gain (<5 g/kg/day) for 4 consecutive weeks

# Assuming admit_weekly DataFrame is already loaded as in the provided code.

def poor_weight_gain_4_weeks(df):
    # Calculate weight gain per day
    #df['weight_gain_per_day'] = df.groupby('pid')['weight_weekly'].diff() / 7  # Assuming weekly measurements
    df['weight_gain_per_day'] = df['weight_diff_weekly'] * 1000 / df['weight_weekly'] / df['calcdate_diff_weekly']

    # Check for poor weight gain (<5 g/kg/day) for 4 consecutive weeks
    df['poor_weight_gain'] = df['weight_gain_per_day'] < 5  # Adjust 5 based on your requirement

    # Group by 'pid' and check for 4 consecutive True values in 'poor_weight_gain'
    poor_weight_gain_4w_consecutive = df.groupby('pid')['poor_weight_gain'].rolling(window=4, min_periods=4).apply(lambda x: all(x), raw=True)

    # Assign the result back to the DataFrame, handling potential index mismatches
    df['poor_weight_gain_4_weeks'] = poor_weight_gain_4w_consecutive.reset_index(level=0, drop=True).fillna(False)
    df['poor_weight_gain_4_weeks'] = df['poor_weight_gain_4_weeks'].replace({0: False, 1: True})

    return df

admit_weekly = poor_weight_gain_4_weeks(admit_weekly)

In [19]:
# prompt: Weight loss for 3 consecutive weeks (not related to loss of oedema)

# Weight loss for 3 consecutive weeks (not related to loss of oedema)
def weight_loss_3_consecutive_weeks(df):
    # Check for weight loss in three consecutive weeks

    # Create a boolean Series indicating whether the weight decreased compared to prior
    df['strict_weight_loss_1w'] = (df['weight_weekly'] < df['weight_weekly_lag_1'])

    weight_loss_3_weeks =df.groupby('pid')['strict_weight_loss_1w'].rolling(window=3, min_periods=3).apply(lambda x: all(x), raw=True)

    # Instead of direct assignment, use reset_index to align the index:
    df['weight_loss_3_weeks'] = weight_loss_3_weeks.reset_index(level=0, drop=True).fillna(False)
    pd.set_option('future.no_silent_downcasting', True)
    # prompt: convert 0 to False and 1 to True in admit_weekly['weight_loss_3_weeks']
    df['weight_loss_3_weeks'] = df['weight_loss_3_weeks'].replace({0: False, 1: True})
    # not related to loss of oedema
    df['weight_loss_3_weeks'] = ((df['weight_loss_3_weeks']) & (df['cat2_oedema_weekly'] == False))


    return df

admit_weekly = weight_loss_3_consecutive_weeks(admit_weekly)

In [20]:
# prompt: get the row with max calcdate_weekly for a pid and then select those where the max row admit_weekly['static_or_weight_loss_4_weeks'] == True

# Assuming admit_weekly DataFrame is already loaded as in the provided code.

# Group by 'pid' and get the row with the maximum 'calcdate_weekly' for each 'pid'
max_calcdate_rows = admit_weekly.loc[admit_weekly.groupby('pid')['calcdate_weekly'].idxmax()]

# Filter the rows where 'static_or_weight_loss_4_weeks' is True in the max 'calcdate_weekly' rows
pids_static_or_weight_loss_4_weeks_latest = max_calcdate_rows[max_calcdate_rows['static_or_weight_loss_4_weeks'] == True]['pid'].unique()
pids_poor_weight_gain_4_weeks_latest = max_calcdate_rows[max_calcdate_rows['poor_weight_gain_4_weeks'] == True]['pid'].unique()
pids_weight_loss_3_weeks_latest = max_calcdate_rows[max_calcdate_rows['weight_loss_3_weeks'] == True]['pid'].unique()

pids_static_or_weight_loss_4_weeks = admit_weekly[admit_weekly['static_or_weight_loss_4_weeks'] == True]['pid'].unique()
pids_poor_weight_gain_4_weeks = admit_weekly[admit_weekly['poor_weight_gain_4_weeks'] == True]['pid'].unique()
pids_weight_loss_3_weeks = admit_weekly[admit_weekly['weight_loss_3_weeks'] == True]['pid'].unique()
pids_weight_at_week3_lower_than_admission = admit_weekly[admit_weekly['weight_at_week3_lower_than_admission'] == True]['pid'].unique()




In [21]:
# prompt: admit_weekly[weight_loss_ever] = (static_or_weight_loss_4_weeks | poor_weight_gain_4_weeks | weight_loss_3_weeks| weight_at_week3_lower_than_admission)

admit_weekly['detn_weight_loss_ever'] = (admit_weekly['static_or_weight_loss_4_weeks'] | admit_weekly['poor_weight_gain_4_weeks'] | admit_weekly['weight_loss_3_weeks'] | admit_weekly['weight_at_week3_lower_than_admission'])

In [22]:
# prompt: admit_weekly[detn_weight_loss_latest] = (static_or_weight_loss_4_weeks | poor_weight_gain_4_weeks | weight_loss_3_weeks| weight_at_week3_lower_than_admission) & sequence_num == max_sequence_num

admit_weekly['detn_weight_loss_latest'] = ((admit_weekly['static_or_weight_loss_4_weeks'] | admit_weekly['poor_weight_gain_4_weeks'] | admit_weekly['weight_loss_3_weeks'] | admit_weekly['weight_at_week3_lower_than_admission']) & (admit_weekly['sequence_num'] == admit_weekly['max_sequence_num']))

In [23]:
# prompt: pids_weight_loss_latest = admit_weekly[detn_weight_loss_latest] == True

pids_weight_loss_latest = admit_weekly[admit_weekly['detn_weight_loss_latest'] == True]['pid'].unique()

# prompt: pids_weight_loss_latest =  set of pids_static_or_weight_loss_4_weeks_latest, pids_poor_weight_gain_4_weeks_latest, pids_weight_loss_3_weeks_latest
# TODO why are 8 less when doing this way?
#pids_weight_loss_latest = list(set(list(pids_static_or_weight_loss_4_weeks_latest) + list(pids_poor_weight_gain_4_weeks_latest) + list(pids_weight_loss_3_weeks_latest)))

In [24]:
pids_weight_loss_ever = admit_weekly[admit_weekly['detn_weight_loss_ever'] == True]['pid'].unique()
#pids_weight_loss_ever = list(set(list(pids_static_or_weight_loss_4_weeks) + list(pids_poor_weight_gain_4_weeks) + list(pids_weight_loss_3_weeks) + list(pids_weight_at_week3_lower_than_admission)))

# cat1 complications

## count cat1, cat2 occurrences

In [25]:
# get the column names
# Filter columns that contain 'cat1' and end with '_weekly'
cat1_weekly_cols = [col for col in admit_weekly.columns if 'cat1' in col and col.endswith('_weekly')]
cat2_weekly_cols = [col for col in admit_weekly.columns if 'cat2' in col and col.endswith('_weekly')]

cat1_weekly_cols = admit_weekly[cat1_weekly_cols].select_dtypes(include=['bool']).columns
cat2_weekly_cols = admit_weekly[cat2_weekly_cols].select_dtypes(include=['bool']).columns

# Filter admit_raw columns that contain 'cat1' and can be summed
cat1_cols = admit_raw[[col for col in admit_raw.columns if 'cat1' in col]].select_dtypes(include=['bool']).columns
cat2_cols = admit_raw[[col for col in admit_raw.columns if 'cat2' in col]].select_dtypes(include=['bool']).columns


In [26]:
# prompt: get columns that contain cat1 and end in _weekly from admit_weekly
# prompt: sum cat1_cols by pid

# Group by 'pid' and sum the 'cat1' columns
cat1_sum_by_pid = admit_raw.groupby('pid')[cat1_cols].sum()
# Calculate the sum of each row in cat1_sum_by_pid
cat1_sum_by_pid = cat1_sum_by_pid.sum(axis=1)
cat1_sum_by_pid.name = 'admit_cat1_complications'
# Group by 'pid' and sum the 'cat2' columns
cat2_sum_by_pid = admit_raw.groupby('pid')[cat2_cols].sum()
# Calculate the sum of each row in cat2_sum_by_pid
cat2_sum_by_pid = cat2_sum_by_pid.sum(axis=1)
cat2_sum_by_pid.name = 'admit_cat2_complications'




In [27]:
def count_cat1_cat2(admit_weekly, cat1_weekly_cols, cat2_weekly_cols):
  # Group by 'pid' and sum the 'cat1' columns
  cat1_sum_by_pid_weekly = admit_weekly.groupby('pid')[cat1_weekly_cols].sum()
  # Calculate the sum of each row in cat1_sum_by_pid
  cat1_sum_by_pid_weekly = cat1_sum_by_pid_weekly.sum(axis=1)
  # Group by 'pid' and sum the 'cat2' columns
  cat2_sum_by_pid_weekly = admit_weekly.groupby('pid')[cat2_weekly_cols].sum()
  # Calculate the sum of each row in cat2_sum_by_pid
  cat2_sum_by_pid_weekly = cat2_sum_by_pid_weekly.sum(axis=1)
  cat1_sum_by_pid_weekly.name = 'cat1_complications_weekly'
  cat2_sum_by_pid_weekly.name = 'cat2_complications_weekly'
  return cat1_sum_by_pid_weekly, cat2_sum_by_pid_weekly


New onset medical complication - 'cat1' complication (see vars in raw ODK data with cat1_ prefix)

In [28]:
# prompt: append _weekly to weekly_raw cat1 columns


cat1_weekly = [col + '_weekly' for col in ['cat1_fever',
 'cat1_hypothermia',
 'cat1_measles',
 'cat1_breath',
 'cat1_vomiting',
 'cat1_bloodstool',
 'cat1_dehyd',
 'cat1_fissures',
 'cat1_orash',
 'cat1_ears',
 'cat1_noeat',
 'cat1_notests',
 'cat1_anemia',
 'cat1_overall']]

In [29]:
# prompt: lag each column in cat1_weekly

# Assuming 'admit_weekly' DataFrame and 'cat1_weekly' list are already defined as in the previous code.

for col in cat1_weekly:
  admit_weekly[f'{col}_lag_1'] = admit_weekly.groupby('pid')[col].shift(1)

In [30]:
# prompt: create a map of selected_columns to cat1_weekly
# Create a dictionary to map selected_columns to their corresponding weekly columns
col_map = {'cat1_fever_admit_current': 'cat1_fever_weekly',
 'cat1_hypothermia_admit_current': 'cat1_hypothermia_weekly',
 'cat1_measles_admit_current': 'cat1_measles_weekly',
 'cat1_breath_admit_current': 'cat1_breath_weekly',
 'cat1_vomiting_admit_current': 'cat1_vomiting_weekly',
 'cat1_bloodstool_admit_current': 'cat1_bloodstool_weekly',
 'cat1_dehyd_admit_current': 'cat1_dehyd_weekly',
 'cat1_fissures_admit_current': 'cat1_fissures_weekly',
 'cat1_orash_admit_current': 'cat1_orash_weekly',
 'cat1_ears_admit_current': 'cat1_ears_weekly',
 'cat1_noeat_admit_current': 'cat1_noeat_weekly',
 'cat1_notests_admit_current': 'cat1_notests_weekly',
 'cat1_anemia_admit_current': 'cat1_anemia_weekly',
 'cat1_overall_admit_current': 'cat1_overall_weekly'}

In [31]:
filtered_admit_weekly = admit_weekly[admit_weekly['sequence_num'] == 1 ]

for key, value in col_map.items():
    # Find rows where the lag of the current cat1 column is False and differs from the current value
    filtered_admit_weekly[f'{key}_diff_from_first_visit_and_admit_is_false'] = (
        (filtered_admit_weekly[key] != filtered_admit_weekly[f'{value}']) & (filtered_admit_weekly[f'{key}'] == False)
    )

rows_meeting_first_criteria = filtered_admit_weekly[filtered_admit_weekly[[f'{col}_diff_from_first_visit_and_admit_is_false' for col in col_map.keys()]].any(axis=1)].copy()

rows_meeting_first_criteria_pids = rows_meeting_first_criteria['pid'].unique()

rows_meeting_first_criteria['new_onset_medical_complication'] = True



In [32]:
# prompt: find rows where lag of cat1_weekly is false and differs from current value

# Assuming admit_weekly DataFrame and cat1_weekly list are already defined.

for col in cat1_weekly:
    # Find rows where the lag of the current cat1 column is False and differs from the current value
    admit_weekly[f'{col}_diff_from_lag_and_lag_is_false'] = (
        (admit_weekly[col] != admit_weekly[f'{col}_lag_1']) & (admit_weekly[f'{col}_lag_1'] == False)
    )

# Example: Display rows where any of the cat1 columns meet the criteria
rows_meeting_criteria = admit_weekly[admit_weekly[[f'{col}_diff_from_lag_and_lag_is_false' for col in cat1_weekly]].any(axis=1)]

rows_meeting_criteria = rows_meeting_criteria[~rows_meeting_criteria['pid'].isin(rows_meeting_first_criteria_pids)].copy()

rows_meeting_criteria['new_onset_medical_complication'] = True

# prompt: concatenate rows_meeting_criteria and rows_meeting_first_criteria

# Concatenate the two DataFrames
concatenated_rows = pd.concat([rows_meeting_criteria, rows_meeting_first_criteria])

admit_weekly = pd.merge(admit_weekly, concatenated_rows[['pid', 'calcdate_weekly','new_onset_medical_complication']], on=['pid','calcdate_weekly'], how='left')


pids_with_new_onset_medical_complication = concatenated_rows['pid'].unique()


In [33]:
# prompt: set admit_weekly['new_onset_medical_complication_latest']  = (new_onset_medical_complication ==True & sequence_num == max_sequence_num)

# Assuming admit_weekly DataFrame, max_sequence_num column, and new_onset_medical_complication column are already defined.

admit_weekly['new_onset_medical_complication_latest'] = (admit_weekly['new_onset_medical_complication'] == True) & (admit_weekly['sequence_num'] == admit_weekly['max_sequence_num'])

pids_with_new_onset_medical_complication_latest = admit_weekly[admit_weekly['new_onset_medical_complication_latest'] == True]['pid'].unique()


In [34]:
# Find column names containing 'diff_from_lag_and_lag_is_false'
columns_with_diff_from_lag = [col for col in admit_weekly.columns if 'diff_from_lag_and_lag_is_false' in col]


# prompt: for col in admit_weekly columns_with_diff_from_lag remove _weekly_diff_from_lag_and_lag_is_false and prepend y_

# Assuming admit_weekly DataFrame and columns_with_diff_from_lag list are already defined.

for col in columns_with_diff_from_lag:
    new_col_name = 'y_' + col.replace('_weekly_diff_from_lag_and_lag_is_false', '')
    admit_weekly = admit_weekly.rename(columns={col: new_col_name})


In [35]:
y_cat1 = columns_with_diff_from_lag.copy()
y_cat1 = ['y_' + col.replace('_weekly_diff_from_lag_and_lag_is_false', '') for col in y_cat1]


In [36]:
cols_admit_current_diff_from_first_visit_and_admit_is_false = [col for col in filtered_admit_weekly.columns if 'diff_from_first_visit_and_admit_is_false' in col]

for col in cols_admit_current_diff_from_first_visit_and_admit_is_false:
    new_col_name = 'temp_' + col.replace('_admit_current_diff_from_first_visit_and_admit_is_false', '')
    filtered_admit_weekly = filtered_admit_weekly.rename(columns={col: new_col_name})


In [37]:
temp_cat1 = cols_admit_current_diff_from_first_visit_and_admit_is_false.copy()
temp_cat1 = ['temp_' + col.replace('_admit_current_diff_from_first_visit_and_admit_is_false', '') for col in temp_cat1]


In [38]:
admit_weekly = admit_weekly.join(filtered_admit_weekly[temp_cat1],how='left')
for col in temp_cat1:
    admit_weekly.loc[(admit_weekly['sequence_num'] == 1), col.replace('temp_', 'y_')] = admit_weekly[col]

admit_weekly.drop(columns=temp_cat1, inplace=True)



In [39]:
[col for col in admit_weekly.columns if 'diarrhea' in col]

['c_diarrhea_admit_current',
 'cat1_diarrhea_admit_current',
 'cat2_diarrhea_admit_current',
 'info_sevdiarrhea',
 'has_c2diarrhea',
 'c_diarrhea_weekly',
 'cat1_diarrhea_weekly',
 'cat2_diarrhea_weekly']

In [40]:
admit_weekly.groupby('cat1_diarrhea_admit_current')[cat1_weekly_cols].sum().T

cat1_diarrhea_admit_current,False,True
cat1_fever_weekly,25,228
cat1_hypothermia_weekly,2,9
cat1_measles_weekly,3,7
cat1_resp_weekly,1,1
cat1_breath_weekly,5,10
cat1_vomiting_weekly,39,335
cat1_bloodstool_weekly,28,153
cat1_diarrhea_weekly,5,22
cat1_dehyd_weekly,58,310
cat1_fissures_weekly,2,24


In [41]:
# prompt: find cat1 columns in admit_weekly

# Assuming admit_weekly DataFrame is already loaded as in the provided code.

# Filter columns that contain 'cat1' and end with '_weekly'
cat1_weekly_cols = [col for col in admit_weekly.columns if 'cat1' in col]

# Print the column names
cat1_weekly_cols


['cat1_fever_admit_current',
 'cat1_hypothermia_admit_current',
 'cat1_measles_admit_current',
 'cat1_resp_admit_current',
 'cat1_breath_admit_current',
 'cat1_vomiting_admit_current',
 'cat1_bloodstool_admit_current',
 'cat1_diarrhea_admit_current',
 'cat1_dehyd_admit_current',
 'cat1_fissures_admit_current',
 'cat1_orash_admit_current',
 'cat1_eyes_admit_current',
 'cat1_ears_admit_current',
 'cat1_noeat_admit_current',
 'cat1_notests_admit_current',
 'cat1_anemia_admit_current',
 'cat1_overall_admit_current',
 'cat1_fever_weekly',
 'cat1_hypothermia_weekly',
 'cat1_measles_weekly',
 'cat1_resp_weekly',
 'cat1_breath_weekly',
 'cat1_vomiting_weekly',
 'cat1_bloodstool_weekly',
 'cat1_diarrhea_weekly',
 'cat1_dehyd_weekly',
 'cat1_fissures_weekly',
 'cat1_orash_weekly',
 'cat1_eyes_weekly',
 'cat1_ears_weekly',
 'cat1_unresolvedmalaria',
 'cat1_noeat_weekly',
 'cat1_notests_weekly',
 'cat1_anemia_weekly',
 'cat1_overall_weekly',
 'cat1_fever_weekly_lag_1',
 'cat1_hypothermia_weekly_la

# Failure to lose oedema

* New appearance of oedema (onset of oedema when previously absent)
* Worsening/increase of oedema (grade 1-->2, 2-->3)


In [42]:
# prompt: set value of None to False for admit_weekly['cat2_oedema_weekly']


admit_weekly['oedema_status_weekly'] = admit_weekly['oedema_status_weekly'].fillna('healthy')

In [43]:
# prompt: lag c_oedema_weekly,cat2_oedema_weekly

# Assuming admit_weekly DataFrame is already loaded as in the provided code.

# Create lagged columns for 'c_oedema_weekly' and 'cat2_oedema_weekly'
admit_weekly['c_oedema_weekly_lag_1'] = admit_weekly.groupby('pid')['c_oedema_weekly'].shift(1)
admit_weekly['cat2_oedema_weekly_lag_1'] = admit_weekly.groupby('pid')['cat2_oedema_weekly'].shift(1)

In [44]:
# prompt: find rows where lag of cat2_oedema_weekly_lag_1 is false and differs from current value or c_oedema_weekly is greater than lagged value

# Find rows where the lag of 'cat2_oedema_weekly_lag_1' is False and differs from the current value
# Or where 'c_oedema_weekly' is greater than the lagged value

# Assuming admit_weekly DataFrame is already loaded and processed as in the provided code.

# Identify rows meeting the specified criteria
admit_weekly['oedema_criteria_met'] = (
    (admit_weekly['cat2_oedema_weekly'] != admit_weekly['cat2_oedema_weekly_lag_1']) & (admit_weekly['cat2_oedema_weekly_lag_1'] == False) |
    (admit_weekly['c_oedema_weekly'] > admit_weekly['c_oedema_weekly_lag_1']) |
    (admit_weekly['cat2_oedema_weekly'] == True) & (admit_weekly['cat2_oedema_admit_current'] == False)
)

# Display or further process the rows where the criteria is met
oedema_rows = admit_weekly[admit_weekly['oedema_criteria_met']]
pids_oedema_criteria_met = oedema_rows['pid'].unique()


* Oedema not disappearing/reducing at 3rd week after initial appearance (static grade of oedema)

In [45]:
admit_weekly['oedema_initial_appearance'] = (
    (admit_weekly['cat2_oedema_weekly'] != admit_weekly['cat2_oedema_weekly_lag_1']) & (admit_weekly['cat2_oedema_weekly_lag_1'] == False) |
    (admit_weekly['cat2_oedema_weekly'] == True) & (admit_weekly['cat2_oedema_admit_current'] == False)
)

# Display or further process the rows where the criteria is met
oedema_appearance_rows = admit_weekly[admit_weekly['oedema_initial_appearance']]



In [46]:
# prompt: get those rows with pid, sequence_num+3 in oedema_appearance_rows

# Assuming oedema_appearance_rows DataFrame is already created as in the provided code.

# Create a new DataFrame with 'pid' and 'sequence_num+3'
new_oedema_appearance_rows = oedema_appearance_rows[['pid', 'sequence_num','cat2_oedema_weekly']].copy()
new_oedema_appearance_rows['sequence_num'] = new_oedema_appearance_rows['sequence_num'] + 3



In [47]:
# prompt: join admit_weekly,new_oedema_appearance_rows on pid,sequence_num

# Assuming admit_weekly and new_oedema_appearance_rows DataFrames are already defined.

# Perform the merge operation
admit_weekly = pd.merge(admit_weekly, new_oedema_appearance_rows, on=['pid', 'sequence_num'], how='left', suffixes=('', '_3rd_week'))

# Now 'merged_df' contains the joined data.  You can further process or analyze it as needed.

In [48]:
# prompt: find rows where cat2_oedema_weekly is True and cat2_oedema_weekly_3rd_week is True

# Assuming admit_weekly DataFrame is already loaded and processed as in the provided code.

# Find rows where both 'cat2_oedema_weekly' and 'cat2_oedema_weekly_3rd_week' are True
filtered_rows = admit_weekly[(admit_weekly['cat2_oedema_weekly'] == True) & (admit_weekly['cat2_oedema_weekly_3rd_week'].isnull())]

admit_weekly['oedema_not_disappearing'] = (admit_weekly['cat2_oedema_weekly'] == True) & (admit_weekly['cat2_oedema_weekly_3rd_week'].isnull())



# Display or further process the filtered rows
pids_oedema_not_disappearing= filtered_rows['pid'].unique()



In [49]:
# prompt: admit_weekly['failure_to_lose_oedema_latest'] = (oedema_not_disappearing | oedema_criteria_met) & sequence_num == max_sequence_num

# Assuming admit_weekly, oedema_not_disappearing, oedema_criteria_met, and max_sequence_num are defined.

admit_weekly['failure_to_lose_oedema_latest'] = ((admit_weekly['oedema_not_disappearing'] == True) | (admit_weekly['oedema_criteria_met'] == True)) & (admit_weekly['sequence_num'] == admit_weekly['max_sequence_num'])

In [50]:
pids_failure_to_lose_oedema_latest = admit_weekly[admit_weekly['failure_to_lose_oedema_latest'] == True]['pid'].unique()


In [51]:
pids_failure_to_lose_oedema = list(set(list(pids_oedema_not_disappearing) + list(pids_oedema_criteria_met) ))

# Poor MUAC gain

Static MUAC or MUAC loss for 2 consecutive weeks

In [52]:
# prompt: find 'muac_weekly' <= prior 'muac_weekly' for 2 consecutive rows

# Assuming admit_weekly DataFrame is already loaded and processed as in the provided code.

# Create a lagged column for 'muac_weekly'
admit_weekly['muac_weekly_lag_1'] = admit_weekly.groupby('pid')['muac_weekly'].shift(1)

# Check for static or MUAC loss for 2 consecutive weeks
admit_weekly['muac_loss_2_weeks'] = (admit_weekly['muac_weekly'] <= admit_weekly['muac_weekly_lag_1'])

# Group by 'pid' and check for 2 consecutive True values in 'muac_loss_2_weeks'
muac_loss_2_weeks_consecutive = admit_weekly.groupby('pid')['muac_loss_2_weeks'].rolling(window=2, min_periods=2).apply(lambda x: all(x), raw=True)

# Assign the result back to the DataFrame, handling potential index mismatches
admit_weekly['muac_loss_2_weeks_consecutive'] = muac_loss_2_weeks_consecutive.reset_index(level=0, drop=True).fillna(False)
# Now 'admit_weekly' contains a new column 'muac_loss_2_weeks_consecutive' indicating whether MUAC has been static or decreased for two consecutive weeks for each patient.

# Convert to boolean
admit_weekly['muac_loss_2_weeks_consecutive'] = admit_weekly['muac_loss_2_weeks_consecutive'].astype(bool)

# get the pids
pids_muac_loss = admit_weekly[admit_weekly['muac_loss_2_weeks_consecutive'] == True]['pid'].unique()

In [53]:
# prompt: admit_weekly[muac_loss_2_weeks_consecutive_latest] = muac_loss_2_weeks_consecutive & sequence_num = max_sequence_num

# Assuming admit_weekly and relevant columns are already defined as in the provided code.

# Create 'muac_loss_2_weeks_consecutive_latest' based on the condition
admit_weekly['muac_loss_2_weeks_consecutive_latest'] = (admit_weekly['muac_loss_2_weeks_consecutive'] == True) & (admit_weekly['sequence_num'] == admit_weekly['max_sequence_num'])

In [54]:
# get the pids
pids_muac_loss_latest = admit_weekly[admit_weekly['muac_loss_2_weeks_consecutive_latest'] == True]['pid'].unique()

In [55]:
#23-0811
#admit_weekly[admit_weekly['pid'] == '24-2250'][['pid','sequence_num','calcdate_weekly','muac_weekly','muac_loss_2_weeks_consecutive','muac_loss_2_weeks_consecutive_latest','interpolated']]

In [56]:
current = pd.read_csv("/content/drive/My Drive/[PBA] Full datasets/"+"FULL_pba_current_processed_2024-11-15.csv")

In [57]:
print(admit_weekly['pid'].nunique())

9747


# Consolidate all deterioration types

Of the 4928 patients in training data that have 1 or more visit, 1524 (30.8%) have one or more deterioration types:

1. poor weight gain (ever), 1021
2. new onset medical complications, 516
2. failur to lose oedema, 29
3. poor MUAC gain, 469
4. nonresponse to treatment, 224
5. dead, 42

total is 2283 due to patients being in multiple deterioration categories

These deteriorated at some point in their history.

701 are currently deteriorated, 14.2%



In [58]:
pids_deterioration = list(set(list(pids_weight_loss_ever) +
                              list(pids_with_new_onset_medical_complication) +
                              list(pids_failure_to_lose_oedema) +
                              list(pids_muac_loss) +
                              list(pids_nonresponse) +
                              list(pids_dead)))
print(len(pids_deterioration))

3527


these had deterioration at the latest weekly visit:

In [59]:
pids_deterioration_latest = list(set(list(pids_weight_loss_latest) +
                              list(pids_with_new_onset_medical_complication_latest) +
                              list(pids_failure_to_lose_oedema_latest) +
                              list(pids_muac_loss_latest) +
                              list(pids_nonresponse) +
                              list(pids_dead)))

print(len(pids_deterioration_latest))

1840


In [60]:
# prompt: list columns in admit_weekly starting with loc 980

first_added_col = admit_weekly.columns.get_loc('max_sequence_num')


# convert boolean to 1/0

In [61]:
# Find boolean columns
boolean_columns = admit_weekly.select_dtypes(include=['bool']).columns
print("Boolean columns:")
boolean_columns

Boolean columns:


Index(['b_referred_emergency_admit_current', 'b_presented_emergency',
       'b_prevenr', 'b_knowsbday', 'b_heightcheck', 'b_reachcheck',
       'ts_assessed_needitp', 'b_hastwin', 'b_twinalive', 'b_twinattended',
       ...
       'detn_weight_loss_ever', 'detn_weight_loss_latest',
       'new_onset_medical_complication_latest', 'oedema_criteria_met',
       'oedema_initial_appearance', 'oedema_not_disappearing',
       'failure_to_lose_oedema_latest', 'muac_loss_2_weeks',
       'muac_loss_2_weeks_consecutive',
       'muac_loss_2_weeks_consecutive_latest'],
      dtype='object', length=480)

In [62]:
# prompt: find columns that are single value and nonnull, then drop them

single_value_cols = [col for col in admit_weekly.columns if admit_weekly[col].nunique() == 1 and admit_weekly[col].notna().all()]
print("Single value columns:")
print(single_value_cols)
admit_weekly.drop(columns=single_value_cols, inplace=True)


Single value columns:
['b_referred_emergency_admit_current', 'state', 'imci_emergency_otp_admit_current', 'b_fl_nasi_admit_current', 'site_type_weekly']


In [63]:
single_value_cols = [col for col in admit_weekly_all.columns if admit_weekly_all[col].nunique() == 1 and admit_weekly_all[col].notna().all()]
print("Single value columns:")
print(single_value_cols)
admit_weekly_all.drop(columns=single_value_cols, inplace=True)



Single value columns:
[]


In [64]:
pd.set_option('future.no_silent_downcasting', True)
def convert_to_bool(df):
  # Identify columns that are True/False and convert them to boolean
  for col in df.columns:
    if pd.api.types.is_bool_dtype(df[col]):
        continue
    elif all(x in [True, False, 1, 0] for x in df[col].unique()):
        df[col] = df[col].astype(bool)
    elif all(x in [True, False, 1, 0, None] for x in df[col].unique()):
        df[col] = df[col].replace({None: False}).astype(bool)

In [65]:
# Identify columns with unique values [True, nan, False] and print null count
def find_3val_bool(df):
  for col in df.columns:
    if len(df[col].unique()) == 3:
        unique_vals = df[col].unique()
        if all(val in [True, False] or pd.isna(val) for val in unique_vals):
            null_ct = df[col].isnull().sum()
            size = df[col].size
            sum = df[col].sum()
            if null_ct > 0:
                print(f"Found 3-val bool column '{col}' with null count: {null_ct} {null_ct/size*100:.1f}% sum:{sum}")
            else:
              print(f"Found 3-val bool column '{col}' with null count: {df[col].isnull().sum()} sum:{sum}")

In [66]:
# prompt: convert detn columns with unique values [True nan False] to boolean

# Identify columns with unique values [True, nan, False] and convert them to boolean
def convert_3val_bool(df, threshold):
  for col in df.columns:
    if "lag" in col.lower():
      continue
    if len(df[col].unique()) == 3:
        unique_vals = df[col].unique()
        if all(val in [True, False] or pd.isna(val) for val in unique_vals):
            null_ct = df[col].isnull().sum()
            if null_ct < threshold:
              # print(f"Converting 3-val bool column '{col}' with null count: {null_ct}")
              df[col] = df[col].fillna(False).astype(bool)

In [67]:
find_3val_bool(admit_weekly)

Found 3-val bool column 'weight_at_week3_lower_than_admission' with null count: 56854 86.0% sum:486
Found 3-val bool column 'cat1_fever_weekly_lag_1' with null count: 9747 14.7% sum:229
Found 3-val bool column 'cat1_hypothermia_weekly_lag_1' with null count: 9747 14.7% sum:11
Found 3-val bool column 'cat1_measles_weekly_lag_1' with null count: 9747 14.7% sum:8
Found 3-val bool column 'cat1_breath_weekly_lag_1' with null count: 9747 14.7% sum:11
Found 3-val bool column 'cat1_vomiting_weekly_lag_1' with null count: 9747 14.7% sum:345
Found 3-val bool column 'cat1_bloodstool_weekly_lag_1' with null count: 9747 14.7% sum:166
Found 3-val bool column 'cat1_dehyd_weekly_lag_1' with null count: 9747 14.7% sum:329
Found 3-val bool column 'cat1_fissures_weekly_lag_1' with null count: 9747 14.7% sum:24
Found 3-val bool column 'cat1_orash_weekly_lag_1' with null count: 9747 14.7% sum:32
Found 3-val bool column 'cat1_ears_weekly_lag_1' with null count: 9747 14.7% sum:7
Found 3-val bool column 'cat1

In [68]:
convert_3val_bool(admit_weekly, len(admit_weekly))
convert_3val_bool(admit_weekly_all, len(admit_weekly_all))

In [69]:
find_3val_bool(admit_weekly)

Found 3-val bool column 'cat1_fever_weekly_lag_1' with null count: 9747 14.7% sum:229
Found 3-val bool column 'cat1_hypothermia_weekly_lag_1' with null count: 9747 14.7% sum:11
Found 3-val bool column 'cat1_measles_weekly_lag_1' with null count: 9747 14.7% sum:8
Found 3-val bool column 'cat1_breath_weekly_lag_1' with null count: 9747 14.7% sum:11
Found 3-val bool column 'cat1_vomiting_weekly_lag_1' with null count: 9747 14.7% sum:345
Found 3-val bool column 'cat1_bloodstool_weekly_lag_1' with null count: 9747 14.7% sum:166
Found 3-val bool column 'cat1_dehyd_weekly_lag_1' with null count: 9747 14.7% sum:329
Found 3-val bool column 'cat1_fissures_weekly_lag_1' with null count: 9747 14.7% sum:24
Found 3-val bool column 'cat1_orash_weekly_lag_1' with null count: 9747 14.7% sum:32
Found 3-val bool column 'cat1_ears_weekly_lag_1' with null count: 9747 14.7% sum:7
Found 3-val bool column 'cat1_noeat_weekly_lag_1' with null count: 9747 14.7% sum:267
Found 3-val bool column 'cat1_notests_weekl

In [70]:
convert_to_bool(admit_weekly)
convert_to_bool(admit_weekly_all)
# prompt: get boolean columns in det

# Assuming 'detn' DataFrame is already loaded as in the provided code.

boolean_columns = admit_weekly.select_dtypes(include=['bool']).columns
print("Boolean columns:")
print(boolean_columns.tolist())

# Convert boolean columns to numeric
for col in boolean_columns:
    admit_weekly[col] = admit_weekly[col].astype(int)

Boolean columns:
['b_presented_emergency', 'b_prevenr', 'b_knowsbday', 'b_heightcheck', 'b_reachcheck', 'ts_assessed_needitp', 'b_hastwin', 'b_twinalive', 'b_twinattended', 'ref_g6u4kg', 'ref_tsref', 'ref_u6sam', 'ref_oedg3', 'ref_oedsam', 'b_needsitp', 'b_wasreferred_admit_current', 'gave_al_act_admit_current', 'b_cgishoh', 'b_isvaxed', 'ses_b_foodsecurity', 'b_movenextvisit_admit_current', 'b_motheralive', 'b_fatheralive', 'b_hadbirthvax_admit', 'b_had6wvax_admit', 'b_rota1diff_admit', 'b_ipv1diff_admit', 'b_had10wvax_admit', 'b_rota2diff_admit', 'b_had14wvax_admit', 'b_rota3diff_admit', 'b_ipv2diff_admit', 'dayssincevita', 'b_has_phone_number_admit', 'b_wast_admit', 'b_muac_waz_admit', 'b_muac_wfh_admit', 'lean_season_admit', 'rainy_season_admit', 'b_outreach_admit_current', 'imci_emergency_itp', 'b_havepid', 'b_hasbc', 'pull_prev_study_recamox', 'b_correct_prevstatus', 'age_height_check', 'age_reach_check', 'b_stand_check_admit_current', 'b_phys_req_itp_admit_current', 'b_twin', 'b

# prepare training data

In [71]:
detn_cols = ['detn_weight_loss_ever','new_onset_medical_complication','muac_loss_2_weeks_consecutive','oedema_not_disappearing','nonresponse','status_dead']

detn_weight_loss_cols = ['static_or_weight_loss_4_weeks','poor_weight_gain_4_weeks','weight_loss_3_weeks','weight_at_week3_lower_than_admission']

admit_weekly[detn_cols].sum()

Unnamed: 0,0
detn_weight_loss_ever,4302
new_onset_medical_complication,1284
muac_loss_2_weeks_consecutive,1986
oedema_not_disappearing,40
nonresponse,7129
status_dead,395


In [74]:
# add last weekly row to admit_row
start_col = admit_weekly.columns.get_loc('calcdate_weekly')
end_col = admit_weekly.columns.get_loc('sequence_num')

weekly_columns = admit_weekly.columns[start_col:end_col +1]

# Add 'pid' to weekly_columns
weekly_columns = weekly_columns.tolist()  # Convert Index to list for mutability

weekly_columns.insert(0, 'weight')
weekly_columns.insert(0, 'muac')
weekly_columns.remove('weight_weekly')
weekly_columns.remove('muac_weekly')

weekly_columns.insert(0, 'wfh')
weekly_columns.insert(0, 'hfa')
weekly_columns.remove('wfh_weekly')
weekly_columns.remove('hfa_weekly')
weekly_columns.insert(0, 'wfa')
weekly_columns.remove('wfa_weekly')



if 'pid' not in weekly_columns:
    weekly_columns.insert(0, 'pid') #add pid to the beginning of the list

In [75]:
def trend(detn_prior,admit_weekly,admit,detn_col):
  # concatenate admit to admit_weekly['pid','calcdate_weekly','weight_weekly']
  # Concatenate admit to admit_weekly

  anthros = pd.concat([detn_prior[['pid','calcdate_weekly','weight','muac','hl','wfh', 'hfa', 'wfa']], admit], ignore_index=True)
  # prompt: sort anthros by pid, calcdate_weekly

  # Sort the 'anthros' DataFrame by 'pid' and then 'calcdate_weekly' so admittance row is first for each pid
  anthros = anthros.sort_values(by=['pid', 'calcdate_weekly'])
  # prompt: group anthros by pid, diff calcdate_weekly cumulative days from the first row in that group

  # Group by 'pid' and calculate the cumulative difference in days from the first 'calcdate_weekly'
  anthros['calcdate_weekly'] = pd.to_datetime(anthros['calcdate_weekly'])
  anthros['days_since_first'] = anthros.groupby('pid')['calcdate_weekly'].diff().dt.days
  # cumulative days is the regressor column
  anthros['cumulative_days'] = anthros.groupby('pid')['days_since_first'].cumsum().fillna(0)
  anthros.drop(columns=['days_since_first'], inplace=True)

  # prompt: for each pid in admit call weight_regress and add the first return value as 'weight_trend" and second as weight-rsquared columns in admit
  trend_df = pd.DataFrame(columns=['pid'])
  positive_pids = admit_weekly.loc[admit_weekly[detn_col] == True, 'pid'].unique()
  #prompt: for each anthro_col in
  # prompt: for each anthro_col in 'weight_weekly','muac_weekly','hl_weekly','wfhz_weekly', 'hfaz_weekly', 'wfaz_weekly':
  for anthro_col in ['wfh', 'hfa', 'wfa','weight','muac','hl']:
    print(anthro_col)
    # prompt: for each pid in admit call regress and add the first return value as f'{anthro_col}_trend'" and second as f'{anthro_col}_rsquared columns in admit

    # Apply the function to each unique 'pid' and create new columns
    results = []
    # only recalculate the trends for the partial weeklies for the pids with the deterioriation
    for pid in tqdm(positive_pids):
        trend, r_squared = regress(anthros, pid,anthro_col)
        results.append({'pid': pid, f'{anthro_col}_trend': trend, f'{anthro_col}_rsquared': r_squared})

    # Convert the list of dictionaries to a DataFrame
    results_df = pd.DataFrame(results)

    # Merge the results back into the 'admit' DataFrame
    trend_df = pd.merge(trend_df, results_df, on='pid', how='right')
    print(trend_df.shape)

  # np.-inf breaks downstream models
  rsquared_columns = [col for col in trend_df.columns if col.endswith('_rsquared')]
  trend_df[rsquared_columns] = trend_df[rsquared_columns].replace(-np.inf, 0)
  # just re-use the full weekly for the negative pids, to save time
  # Filter admit_weekly for rows where pid is NOT in positive_pids and sequence_num is 1
  filtered_admit_weekly = admit_weekly[~admit_weekly['pid'].isin(positive_pids) & (admit_weekly['sequence_num'] == 1)].copy()
  filtered_admit_weekly.rename(columns={'weight_weekly': 'weight', 'muac_weekly': 'muac'},inplace=True)
  filtered_admit_weekly.rename(columns={'wfa_weekly': 'wfa', 'wfh_weekly': 'wfh','hfa_weekly': 'hfa' },inplace=True)
  trend_df = pd.concat([filtered_admit_weekly[['pid', 'wfh_trend', 'wfh_rsquared', 'hfa_trend', 'hfa_rsquared',
       'wfa_trend', 'wfa_rsquared', 'weight_trend', 'weight_rsquared',
       'muac_trend', 'muac_rsquared', 'hl_trend', 'hl_rsquared']],
                        trend_df],
                       ignore_index=True)
  # prompt: get row count by pid in admit_weekly and append that column to admit
  # Group by 'pid' and count the number of rows for each 'pid'
  row_counts_by_pid = detn_prior.groupby('pid')['pid'].count()

  # Rename the 'pid' column to 'row_count'
  row_counts_by_pid = row_counts_by_pid.rename('row_count')

  # Merge the row counts back into the 'admit' DataFrame
  trend_df = pd.merge(trend_df, row_counts_by_pid, left_on='pid', right_index=True, how='left')

  return trend_df

# Export

only export where pid in pids_with_visits
as deterioration by definition requires us to look at a change since admission


In [76]:
def convert_recent_weeklies_to_series(detn_prior, num_of_visits = 2,weekly_columns=weekly_columns):
  # Group by 'pid' and assign rank within each group based on 'sequence_num'
  detn_prior['reverse_sequence_num'] = detn_prior.groupby('pid')['sequence_num'].rank(method='dense', ascending=False)
  latest_visits = detn_prior[detn_prior['reverse_sequence_num'].isin(np.arange(1, num_of_visits+1))][weekly_columns]
  latest_visits.loc[((latest_visits['final_numweeksback'] == 0) | ((latest_visits['final_numweeksback'] > 1) & (latest_visits['final_numweeksback'] < 2))), 'final_numweeksback'] = 1
  latest_visits['final_numweeksback'] = latest_visits['final_numweeksback'].fillna(1)
  # Replace NaN values with 1 as values are only 1 and 2
  latest_visits.sort_values(by=['pid', 'sequence_num'], ascending=[True,False],inplace=True)
  # make wk1 the most recent week
  visit_series = (latest_visits.assign(col=latest_visits.groupby('pid').cumcount()+1)
   .set_index(['pid','col'])
   .unstack('col')
   .sort_index(level=(1,0), axis=1)
  )
  visit_series.columns = [f"wk{y}_{x}" for x,y in visit_series.columns]
  # prompt: make visit_series.index a column named 'pid'
  visit_series = visit_series.reset_index()
  return visit_series


In [77]:
def remove_active_most_recent_weekly(admit_weekly):
  # prompt: get admit_weekly unique pids with status=='active'
  recent_pids = admit_weekly[(admit_weekly['status'] == 'active')]['pid'].unique()

  # prompt: delete the most recent calcdate_weekly from admit_weekly where pid in recent_pids
  # Group by pid and find the maximum calcdate_weekly for each pid in recent_pids
  max_calcdate_weekly = admit_weekly[admit_weekly['pid'].isin(recent_pids)].groupby('pid')['calcdate_weekly'].max()

  # Merge the maximum calcdate_weekly back into the original dataframe
  admit_weekly = admit_weekly.merge(max_calcdate_weekly.rename('max_calcdate_weekly'), left_on='pid', right_index=True, how='left')

  # Filter out rows with calcdate_weekly equal to the maximum for each pid in recent_pids
  rows_to_delete = admit_weekly[(admit_weekly['pid'].isin(recent_pids)) & (admit_weekly['calcdate_weekly'] == admit_weekly['max_calcdate_weekly'])]

  # Delete the rows
  admit_weekly = admit_weekly.drop(rows_to_delete.index)

  # Drop the temporary 'max_calcdate_weekly' column
  admit_weekly = admit_weekly.drop('max_calcdate_weekly', axis=1)
  return admit_weekly

In [78]:
# get the admittance date, weight and muac
admit = admit_weekly[['pid','calcdate_admit_current','weight_admit_current','muac_admit_current','hl_admit','wfh_admit_current','hfa_admit_current','wfa_admit_current']].drop_duplicates(subset=['pid'], keep='last')

# make the admit columns look like the weekly ones
admit.rename(columns={'calcdate_admit_current': 'calcdate_weekly',
                      'weight_admit_current': 'weight',
                      'muac_admit_current': 'muac',
                      'hl_admit': 'hl',
                      'wfa_admit_current' : 'wfa',
                      'hfa_admit_current' : 'hfa',
                      'wfa_admit_current' : 'wfa'
                      }, inplace=True)


In [79]:
# prompt: keep the most recent num_recent-most calcdate_weekly from admit_weekly groupby('pid') and pid in recent_pids

def remove_recent_weeklies(admit_weekly, recent_pids, num_recent=4):
    """Removes the most recent weekly entries for each pid.

    Args:
        admit_weekly: DataFrame.
        num_recent: The number of recent entries to remove.
        recent_pids: List of PIDs for which to remove recent entries.

    Returns:
        DataFrame: Modified DataFrame with recent entries removed.
    """

    # Group by 'pid' and rank the rows by 'calcdate_weekly' in descending order.
    admit_weekly['rank'] = admit_weekly[admit_weekly['pid'].isin(recent_pids)].groupby('pid')['calcdate_weekly'].rank(method='dense', ascending=False)

    # Identify rows to delete (most recent num_recent entries)
    rows_to_delete = admit_weekly[admit_weekly['rank'] <= num_recent]

    # Drop the identified rows
    admit_weekly = admit_weekly.drop(rows_to_delete.index)
    admit_weekly.drop(columns=['rank'], inplace=True)
    return admit_weekly
# Example usage (assuming recent_pids is defined):
# admit_weekly = remove_recent_weeklies(admit_weekly, recent_pids)

In [80]:
 def weekly_agg(detn_prior,admit):
  anthros = pd.concat([detn_prior[['pid','calcdate_weekly','weight','muac','hl','wfh', 'hfa', 'wfa']], admit], ignore_index=True)
  # prompt: sort anthros by pid, calcdate_weekly

  # Sort the 'anthros' DataFrame by 'pid' and then 'calcdate_weekly' so admittance row is first for each pid
  anthros = anthros.sort_values(by=['pid', 'calcdate_weekly'])


  weekly_agg = anthros.groupby('pid').agg(
    weekly_row_count=('pid', 'count'),
    weekly_first_admit=('calcdate_weekly', 'first'),
    weekly_last_admit=('calcdate_weekly', 'last'),
    weekly_last_muac=('muac', 'last'),
    weekly_first_muac=('muac', 'first'),
    weekly_avg_muac=('muac', 'mean'),
    weekly_first_weight=('weight', 'first'),
    weekly_last_weight=('weight', 'last'),
    weekly_avg_weight=('weight', 'mean'),
    weekly_first_hl =('hl', 'first'),
    weekly_last_hl=('hl', 'last'),
    weekly_min_hl=('hl', 'min'),
    weekly_max_hl=('hl', 'max'),
    weekly_avg_hl=('hl', 'mean'),
    weekly_first_wfh=('wfh', 'first'),
    weekly_last_wfh=('wfh', 'last'),
    weekly_min_wfh=('wfh', 'min'),
    weekly_max_wfh=('wfh', 'max'),
    weekly_avg_wfh=('wfh', 'mean'),
    weekly_first_hfa=('hfa', 'first'),
    weekly_last_hfa=('hfa', 'last'),
    weekly_min_hfa=('hfa', 'min'),
    weekly_max_hfa=('hfa', 'max'),
    weekly_avg_hfa=('hfa', 'mean'),
    weekly_first_wfa=('wfa', 'first'),
    weekly_last_wfa=('wfa', 'last'),
    weekly_min_wfa=('wfa', 'min'),
    weekly_max_wfa=('wfa', 'max'),
    weekly_avg_wfa=('wfa', 'mean'),
  )

  weekly_agg['muac_diff'] = weekly_agg['weekly_last_muac'] - weekly_agg['weekly_first_muac']
  weekly_agg['weight_diff'] = weekly_agg['weekly_last_weight'] - weekly_agg['weekly_first_weight']
  weekly_agg['calcdate_diff'] = weekly_agg['weekly_last_admit'] - weekly_agg['weekly_first_admit']
  weekly_agg['calcdate_diff'] = weekly_agg['calcdate_diff'].dt.total_seconds() / (24 * 60 * 60)
  weekly_agg['hl_diff'] = weekly_agg['weekly_last_hl'] - weekly_agg['weekly_first_hl']
  weekly_agg['wfh_diff'] = weekly_agg['weekly_last_wfh'] - weekly_agg['weekly_first_wfh']
  weekly_agg['hfa_diff'] = weekly_agg['weekly_last_hfa'] - weekly_agg['weekly_first_hfa']
  weekly_agg['wfa_diff'] = weekly_agg['weekly_last_wfa'] - weekly_agg['weekly_first_wfa']


  weekly_agg['weight_diff_ratio'] = weekly_agg['weight_diff']/weekly_agg['weekly_first_weight']
  weekly_agg['weight_diff_ratio_rate'] = weekly_agg['weight_diff_ratio']/weekly_agg['calcdate_diff']
  weekly_agg['muac_diff_ratio'] = weekly_agg['muac_diff']/weekly_agg['weekly_first_weight']
  weekly_agg['muac_diff_ratio_rate'] = weekly_agg['muac_diff_ratio']/weekly_agg['calcdate_diff']

  weekly_agg['hl_diff_ratio'] = weekly_agg['hl_diff']/weekly_agg['weekly_first_hl']
  weekly_agg['hl_diff_ratio_rate'] = weekly_agg['hl_diff_ratio']/weekly_agg['calcdate_diff']
  weekly_agg['wfh_diff_ratio'] = weekly_agg['wfh_diff']/weekly_agg['weekly_first_wfh']
  weekly_agg['wfh_diff_ratio_rate'] = weekly_agg['wfh_diff_ratio']/weekly_agg['calcdate_diff']

  weekly_agg['hfa_diff_ratio'] = weekly_agg['hfa_diff']/weekly_agg['weekly_first_hfa']
  weekly_agg['hfa_diff_ratio_rate'] = weekly_agg['hfa_diff_ratio']/weekly_agg['calcdate_diff']
  weekly_agg['wfa_diff_ratio'] = weekly_agg['wfa_diff']/weekly_agg['weekly_first_wfa']
  weekly_agg['wfa_diff_ratio_rate'] = weekly_agg['wfa_diff_ratio']/weekly_agg['calcdate_diff']

  return weekly_agg


In [82]:
# Load the mental health
with open(dir + 'analysis/admit_current_mh.pkl', 'rb') as f:
  admit_current_mh = pickle.load(f)

with open(dir + 'analysis/admit_current.pkl', 'rb') as f:
  admit_current = pickle.load(f)

admit_current_mh = convert_bool_to_int(admit_current_mh)

In [141]:
def only_rows_before_detn(detn, detn_col):
  # Get unique PIDs where 'detn_col' is True in the detn DataFrame
  detn_ever_pids = detn.loc[detn[detn_col] == True, 'pid'].unique()
  # Find PIDs in 'detn' that are NOT in 'detn_ever_pids'
  pids_not_in_ever_pids = detn.loc[~detn['pid'].isin(detn_ever_pids), 'pid'].unique()

  # prompt: remove rows with sequence_number >= first detn_ever group by pid

  # Group by 'pid' and find the first occurrence of 'detn_ever' == True
  if detn_col == 'nonresponse':
    first_detn_ever = detn.loc[detn[detn_col] == True].groupby('pid')['sequence_num'].max().reset_index()
  else:
    first_detn_ever = detn.loc[detn[detn_col] == True].groupby('pid')['sequence_num'].min().reset_index()

  # prompt: get admit_weekly[y_cat1] for max sequence_number by pid

  # Get admit_weekly[y_cat1] for max sequence_number by pid
  y_cat1_copy = y_cat1.copy()
  y_cat1_copy.insert(0, 'pid')

  # Rename the 'sequence_number' column to 'first_detn_seq' for clarity
  first_detn_ever = first_detn_ever.rename(columns={'sequence_num': 'first_detn_seq'})



  #detn.drop(columns=['first_detn_seq'], inplace=True)
  # Merge the 'first_detn_seq' back into the original DataFrame
  detn = pd.merge(detn, first_detn_ever, on='pid', how='left')

  y_detn_cat1 = detn[detn['sequence_num']==detn['first_detn_seq'] ][y_cat1_copy].copy()

  # Filter out rows where 'sequence_number' is greater than or equal to 'first_detn_seq'
  #max_sequence_rows = detn.loc[detn.groupby('pid')['sequence_num'].idxmax()]

  # Filter out rows where 'sequence_number' is greater than or equal to 'first_detn_seq'
  #print(detn[detn['pid']=='23-0107'][['pid','sequence_num','first_detn_seq']])
  seq_ct = 0
  if detn_col == 'nonresponse':
    # for nonresponse, discard the few rows before the event happened to discourage the model from keying on los or duration
    seq_ct = 3

  detn_prior = detn[((detn['sequence_num'] + seq_ct) < detn['first_detn_seq']) & (detn['pid'].isin(detn_ever_pids)) ].copy()

  #print(detn_prior[detn_prior['pid']=='23-0107'][['pid','sequence_num','first_detn_seq']])

  #print(detn.loc[detn['detn_ever'] == False].shape)

  # detn_prior contains only rows before the first deterioration for each patient plus all non-deteriorated patients with all their rows
  # Concatenate detn_prior and detn.loc[detn['detn_ever'] == False]
  detn_prior = pd.concat([detn_prior, detn[~detn['pid'].isin(detn_ever_pids)]])
  # clean up the working column
  detn.drop(columns=['first_detn_seq'], inplace=True)

  # Create a pandas Series where the index is detn_ever_pids and the value is True
  detn_ever_pids_series = pd.Series(index=detn_ever_pids, data=True)

  pids_not_in_ever_pids = pd.Series(index=pids_not_in_ever_pids, data=False)
  # Concatenate the two Series
  y_detn = pd.concat([detn_ever_pids_series,pids_not_in_ever_pids])
  # Rename the Series
  y_detn.name = detn_col
  #print('a',y_detn_cat1.shape)

  y_detn_cat1 = pd.merge(y_detn_cat1, y_detn, how='right', left_on='pid',right_on=y_detn.index)
  #print('b',y_detn_cat1.shape,y_detn.shape)
  y_detn_cat1.fillna(False, inplace=True)
  for col in y_cat1:
    y_detn_cat1[col] = y_detn_cat1[col].astype(int)
  return detn_prior, y_detn, y_detn_cat1


In [138]:
def prepare_export(detn_col='new_onset_medical_complication'):
  # get rows prior to the deterioration
  y_detn_cat1 = pd.DataFrame()

  recent_pids = admit_weekly[(admit_weekly['status'] == 'active')]['pid'].unique()
  if detn_col in ['new_onset_medical_complication']:
    # remove the row that may have the deterioration we want to predict, if pid is currently active
    detn_prior,y_detn,y_detn_cat1 = only_rows_before_detn(remove_recent_weeklies(admit_weekly, recent_pids, num_recent=1), detn_col)
  elif detn_col in ['oedema_not_disappearing']:
    # remove the row that may have the deterioration we want to predict, if pid is currently active
    detn_prior,y_detn,_ = only_rows_before_detn(remove_recent_weeklies(admit_weekly, recent_pids, num_recent=1), detn_col)
  elif detn_col == 'muac_loss_2_weeks_consecutive':
    # remove the 2 rows that may have the deterioration we want to predict, if pid is currently active
    detn_prior,y_detn,_ = only_rows_before_detn(remove_recent_weeklies(admit_weekly, recent_pids, num_recent=2), detn_col)
  elif detn_col == 'detn_weight_loss_ever':
    # remove the 4 rows that may have the deterioration we want to predict, if pid is currently active
    detn_prior,y_detn,_ = only_rows_before_detn(remove_recent_weeklies(admit_weekly, recent_pids, num_recent=4), detn_col)
  elif detn_col == 'status_dead':
    y_detn = pd.Series(index=admit_weekly_all['pid'].unique(), dtype=bool)
    y_detn[:] = 0  # Initialize all values to False
    y_detn[pids_dead] = 1
    y_detn.rename(detn_col,inplace=True)
    # remove no rows as death status is set from current status
    detn_prior = admit_weekly_all.copy()
  elif detn_col == 'nonresponse':
    #y_detn = pd.Series(index=admit_weekly_all['pid'].unique(), dtype=bool)
    #y_detn[:] = 0  # Initialize all values to False
    #y_detn[pids_nonresponse] = 1
    #y_detn.rename(detn_col,inplace=True)
    # remove no rows as nonresponse is set from current status
    #detn_prior = admit_weekly_all.copy()
    detn_prior,y_detn,_ = only_rows_before_detn(remove_recent_weeklies(admit_weekly, recent_pids, num_recent=4), detn_col)

  else:
    detn_prior,y_detn,_ = only_rows_before_detn(admit_weekly, detn_col)

  # get weekly aggregate stats
  detn_prior.rename(columns={'weight_weekly': 'weight', 'muac_weekly': 'muac'},inplace=True)
  detn_prior.rename(columns={'wfa_weekly': 'wfa', 'wfh_weekly': 'wfh','hfa_weekly': 'hfa' },inplace=True)
  detn_prior.sort_values(by=['pid', 'calcdate_weekly'], inplace=True)
  weekly_agg_stats = weekly_agg(detn_prior,admit)
  #print(weekly_agg_stats[weekly_agg_stats['pid']== '24-3335'])
  detn_prior.rename(columns={'weight_weekly': 'weight', 'muac_weekly': 'muac'},inplace=True)
  detn_prior.rename(columns={'wfa_weekly': 'wfa', 'wfh_weekly': 'wfh','hfa_weekly': 'hfa' },inplace=True)
  # get trend for those rows
  trend_stats = trend(detn_prior,admit_weekly,admit,detn_col)
  visit_series = convert_recent_weeklies_to_series(detn_prior, num_of_visits = 3,weekly_columns=weekly_columns)
  export = pd.merge(admit_raw, visit_series, on='pid', how='left') # no overlap so suffix isn't used
  # add weekly stats columns to admit_raw
  export = pd.merge(export, weekly_agg_stats, on='pid', how='left')
  # add trends to admit_raw
  export = pd.merge(export, trend_stats, on='pid', how='left')

  # Merge with admit_raw cat2_sum_by_pid based on the 'pid' column
  export = pd.merge(export, cat1_sum_by_pid, on='pid', how='left')
  export = pd.merge(export, cat2_sum_by_pid, on='pid', how='left')

  # get weekly cat1, cat2 counts up to deterioration
  numeric_cols = detn_prior.select_dtypes(include=['number', 'bool']).columns
  numeric_cat1_cols = [col for col in numeric_cols if col.startswith('cat1_')]
  numeric_cat2_cols = [col for col in numeric_cols if col.startswith('cat2_')]


  cat1_sum_by_pid_weekly, cat2_sum_by_pid_weekly = count_cat1_cat2(detn_prior, numeric_cat1_cols, numeric_cat2_cols)
  #cat1_sum_by_pid_weekly, cat2_sum_by_pid_weekly = count_cat1_cat2(detn_prior, cat1_weekly_cols, cat2_weekly_cols)

  export = pd.merge(export, cat1_sum_by_pid_weekly, on='pid', how='left')
  export = pd.merge(export, cat2_sum_by_pid_weekly, on='pid', how='left')

  # prompt: filter export where pid in pids_with_visits
  # as deterioration by definition requires us to look at a change since admission
  #export = export[export['pid'].isin(pids_with_visits)]


  # prompt: find columns that are single value and nonnull, then drop them

  single_value_cols = [col for col in export.columns if export[col].nunique() == 1 and export[col].notna().all()]

  export.drop(columns=single_value_cols, inplace=True)
  convert_3val_bool(export, len(export))
  convert_to_bool(export)
  boolean_columns = export.select_dtypes(include=['bool']).columns
  # Convert boolean columns to numeric
  for col in boolean_columns:
    export[col] = export[col].astype(int)
  export = infer_phq_score(admit_current_mh,admit_current,export)


  return export,y_detn,y_detn_cat1



In [84]:
# prompt: select number, int and boolean columns from admit_weekly

# Assuming 'admit_weekly' DataFrame is already defined and loaded.
# Example usage:

# Select number, integer and boolean columns
numeric_cols = admit_weekly.select_dtypes(include=['number', 'bool']).columns
selected_columns = admit_weekly[numeric_cols]

#print(selected_columns.head())


In [85]:
# prompt: find columns in admit_weekly that are numeric and start with 'cat1_'

# Assuming 'admit_weekly' is your DataFrame.
numeric_cat1_cols = admit_weekly.select_dtypes(include=np.number).columns
result = [col for col in numeric_cat1_cols if col.startswith('cat1_')]
#result


In [86]:
deterioration_types = ['detn_weight_loss_ever','new_onset_medical_complication','muac_loss_2_weeks_consecutive','oedema_not_disappearing','nonresponse','status_dead']

In [87]:
def get_first_detn_date(admit_weekly,variable,date_col='calcdate_weekly'):
  # Group by 'pid' and filter for 'new_onset_medical_complication' == True
  filtered_df = admit_weekly[admit_weekly[variable] == True].groupby('pid')
  # 'status_date' for nonresponse variable
  # Get the minimum 'calcdate_weekly' for each group
  min_calcdate = filtered_df[date_col].min()

  min_calcdate.rename(f'{variable}_date', inplace=True)
  min_calcdate = min_calcdate.reset_index()

  return min_calcdate

In [142]:
for col in deterioration_types:
  print(col)
  export,y_detn,y_detn_cat1 = prepare_export(detn_col=col)


  # get date of when deterioration first occurred and set it (for hazard analysis)
  if col in ['nonresponse','status_dead']:
    first_detn_date= get_first_detn_date(admit_weekly_all,col,'status_date')
  else:
    first_detn_date= get_first_detn_date(admit_weekly,col,'calcdate_weekly')

  #first_detn_date= get_first_detn_date(admit_weekly,col,date_col)
  export = pd.merge(export, first_detn_date, on='pid', how='left')
  # prompt: add series y_detn_ever as column to admit_raw
  # do inner join which discards patients w/no visit as their deterioration status needs to be decided still,
  # TODO probably could include death cases
  #current = pd.read_csv(dir+"train_pba_current_processed_2024-11-02.csv")

  # just include all pids
  detn_ever_pids = admit_weekly.loc[admit_weekly[col] == True, 'pid'].unique()
  detn_ever_pids_series = pd.Series(index=detn_ever_pids, data=True)
  pids_not_in_ever_pids = admit_weekly.loc[~admit_weekly['pid'].isin(detn_ever_pids), 'pid'].unique()
  pids_not_in_ever_pids_series = pd.Series(index=pids_not_in_ever_pids, data=False)
  # Concatenate the two Series
  y_detn_all = pd.concat([pids_not_in_ever_pids_series,detn_ever_pids_series])
  # Rename the Series
  y_detn_all.name = col
  print(y_detn_all.sum())
  if col == 'new_onset_medical_complication':
    export = export.merge(y_detn_cat1, on='pid', how='left')
  elif col in ['nonresponse','status_dead']:
      y_detn.name = col
      export = export.merge(y_detn, left_on='pid', right_index=True, how='left')
  else:
    export = export.merge(y_detn_all, left_on='pid', right_index=True, how='left')
  export = export.replace(-np.inf, 0)
  export[col].fillna(False, inplace=True)
  export[col] = export[col].astype(int)
  export['row_count'].fillna(0,inplace=True)
  export['weekly_row_count'].fillna(0,inplace=True)
  print('c',export.shape,export['pid'].nunique(),export[col].sum())
  with open(dir + f"analysis/{col}.pkl", "wb") as f:
    pickle.dump(export, f)


nonresponse
pid      sequence_num
25-1505  11.0            1
23-0107  11.0            1
23-0160  11.0            1
23-0165  11.0            1
23-0227  12.0            1
                        ..
23-0651  14.0            1
23-0475  11.0            1
23-0365  15.0            1
23-0362  11.0            1
23-0351  11.0            1
Name: count, Length: 584, dtype: int64
        pid  sequence_num  first_detn_seq
45  23-0107           1.0            11.0
46  23-0107           2.0            11.0
47  23-0107           3.0            11.0
48  23-0107           4.0            11.0
49  23-0107           5.0            11.0
50  23-0107           6.0            11.0
51  23-0107           7.0            11.0
52  23-0107           8.0            11.0
53  23-0107           9.0            11.0
54  23-0107          10.0            11.0
55  23-0107          11.0            11.0
        pid  sequence_num  first_detn_seq
45  23-0107           1.0            11.0
46  23-0107           2.0            11.0


100%|██████████| 584/584 [00:08<00:00, 65.54it/s]


(584, 3)
hfa


100%|██████████| 584/584 [00:08<00:00, 71.98it/s]


(584, 5)
wfa


100%|██████████| 584/584 [00:07<00:00, 76.08it/s]


(584, 7)
weight


100%|██████████| 584/584 [00:08<00:00, 66.40it/s]


(584, 9)
muac


100%|██████████| 584/584 [00:07<00:00, 81.70it/s]


(584, 11)
hl


100%|██████████| 584/584 [00:08<00:00, 67.55it/s]


(584, 13)
584
c (10322, 2331) 10322 584
