In [3]:
import pandas as pd
import os

**STATHIST_KIPA Cleaning**

Dropped features

- CAN_INIT_INACT_STAT_DT: 27% missing – a missing value likely indicates that the candidate is still active, which is already captured implicitly, so an explicit “first inactive” date is redundant.

- CAN_LAST_ACT_STAT_DT: 5% missing – largely duplicates CANHX_BEGIN_DT_TM (the start of each status period), so it is unnecessary.

- CAN_LAST_INACT_STAT_DT: 27% missing – same rationale as CAN_INIT_INACT_STAT_DT: the missingness itself conveys “not yet inactive,” so an explicit end date is not needed.
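
The missing-value percentages quoted above can be checked with a short pandas sketch. The toy frame below is made-up illustration data, not the real STATHIST_KIPA extract; only the column names come from the list above, and `missing_share` is a hypothetical helper.

```python
import pandas as pd


def missing_share(df, columns):
    """Return the fraction of missing values for each requested column."""
    return df[columns].isna().mean()


# Toy data (assumption): None becomes NaT, i.e. a date that was never recorded.
toy = pd.DataFrame({
    "CAN_INIT_INACT_STAT_DT": pd.to_datetime(
        ["2015-01-10", None, None, "2016-03-02"]
    ),
    "CAN_LAST_ACT_STAT_DT": pd.to_datetime(
        ["2015-02-01", "2015-06-15", "2016-01-20", None]
    ),
})

shares = missing_share(toy, ["CAN_INIT_INACT_STAT_DT", "CAN_LAST_ACT_STAT_DT"])
print(shares)
```

On the real table, the same call applied to the three dropped columns should give values near the percentages listed above if they were computed this way.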

Kept features

- PX_ID (Patient ID) – primary key for merges.

- WL_ORG (Organ type) – identifies kidney vs. other listings.

- CANHX_BEGIN_DT_TM (WL status period begin) – defines when each active/inactive interval starts.

- CANHX_END_DT (WL status period end date) – endpoint of waiting-list status period.

- CANHX_END_DT_TM (WL status period end datetime) – higher-precision end timestamp.

- CAN_REM_CD (Removal reason) – captures why the candidate left the list (transplant, death, other).

- CANHX_STAT_CD (WL status code) – indicates active vs. inactive listing during each interval.

- CAN_LISTING_DT (Listing date) – baseline date on which the candidate was first added to the waiting list.

- CAN_INIT_ACT_STAT_DT (Date first active) – records the initial activation date, even if later status periods follow.
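
As one illustration of how the kept columns fit together, the sketch below computes the length of each status period from CANHX_BEGIN_DT_TM and CANHX_END_DT. The rows are invented, the date formats are an assumption about the extract, and `period_days` is a hypothetical column name, so this is a sketch rather than the analysis pipeline itself.

```python
import pandas as pd

# Toy status-history rows (assumption): two periods for patient 1,
# one still-open period for patient 2 (no end date yet).
toy = pd.DataFrame({
    "PX_ID": [1, 1, 2],
    "CANHX_BEGIN_DT_TM": [
        "2015-01-01 00:00:00",
        "2015-03-01 00:00:00",
        "2016-05-10 00:00:00",
    ],
    "CANHX_END_DT": ["2015-03-01", "2015-06-29", None],
})

# Parse the date columns; unparseable or missing values become NaT.
toy["CANHX_BEGIN_DT_TM"] = pd.to_datetime(toy["CANHX_BEGIN_DT_TM"], errors="coerce")
toy["CANHX_END_DT"] = pd.to_datetime(toy["CANHX_END_DT"], errors="coerce")

# Whole days between the start and end of each status period.
# Open periods (missing end date) are left as NaN here.
toy["period_days"] = (toy["CANHX_END_DT"] - toy["CANHX_BEGIN_DT_TM"]).dt.days
print(toy[["PX_ID", "period_days"]])
```

How open periods should be handled (for example, censoring at a study end date) is a modelling decision that this sketch leaves open.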



In [None]:
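# Input and output folders (absolute paths on the author's machine).
SUBSET_FOLDER = "/Users/chanyoungwoo/Thesis/Data_Extraction/extracted_subsets"
OUTPUT_FOLDER = "/Users/chanyoungwoo/Thesis/Data_Extraction/clean_subsets_ver1"
os.makedirs(OUTPUT_FOLDER, exist_ok=True)

# Load the extracted STATHIST_KIPA subset.
stathist = pd.read_csv(os.path.join(SUBSET_FOLDER, "stathist_kipa_subset.csv"))

# Status-date columns judged redundant (see "Dropped features" above).
to_drop = [
    "CAN_INIT_INACT_STAT_DT",
    "CAN_LAST_ACT_STAT_DT",
    "CAN_LAST_INACT_STAT_DT",
]
# drop() raises a KeyError if any listed column is absent from the file.
stathist_clean = stathist.drop(columns=to_drop)

# Save the cleaned table without writing the pandas index as a column.
out_path = os.path.join(OUTPUT_FOLDER, "stathist_kipa_clean.csv")
stathist_clean.to_csv(out_path, index=False)

print(f"Cleaned STATHIST_KIPA saved to {out_path}")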

Cleaned STATHIST_KIPA saved to /Users/chanyoungwoo/Thesis/Data_Extraction/clean_subsets_ver1/stathist_kipa_clean.csv
