# WOMAC NaN Distribution

Here, `moaks_shared_womac_df` (the intersection on `ID`, i.e. unique patient, of V00_moaks, V01_moaks, V01_womac and V03_womac) is analysed for the number of rows with and without a missing value in at least one MOAKS/WOMAC variable. This dataframe contains 3077 unique patients, drawn from four visit-level datasets that span three timepoints (V00, V01, V03) and two tests (MOAKS and WOMAC).

```python
# -- import the shared moaks-womac dataframe -- #
from config import moaks_shared_womac_df

# -- calculate the total number of rows -- #
num_rows = len(moaks_shared_womac_df)
# -- calculate the sum of rows that include at least one NaN value -- #
num_nan_rows = moaks_shared_womac_df.isna().any(axis=1).sum()
# -- calculate the difference -- #
num_non_nan_rows = num_rows - num_nan_rows

print(f'Number of unique knees in the shared dataframe: {num_rows}')
print(f'Number of unique knees with at least one missing value for a variable: {num_nan_rows} ({num_nan_rows / num_rows * 100:.2f}%)')
print(f'Number of unique knees with no missing value for a variable: {num_non_nan_rows} ({num_non_nan_rows / num_rows * 100:.2f}%)')
```

```text
Number of unique knees in the shared dataframe: 3077
Number of unique knees with at least one missing value for a variable: 773 (25.12%)
Number of unique knees with no missing value for a variable: 2304 (74.88%)
```

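The row count above relies on `isna().any(axis=1)`, which flags a row as incomplete if any of its columns is NaN. A minimal sketch on a hypothetical three-row toy dataframe (column names reused for illustration only; the values are invented) shows the mechanics, plus a per-column count that can help locate which variables drive the missingness:

```python
import numpy as np
import pandas as pd

# Hypothetical toy stand-in for moaks_shared_womac_df (values are invented)
toy_df = pd.DataFrame({
    "V01WOMTSR": [4.0, np.nan, 2.0],
    "V00MBMNTMA": [0.0, 1.0, np.nan],
})

# isna() gives a boolean mask; any(axis=1) is True for rows with at least one NaN
row_has_nan = toy_df.isna().any(axis=1)

num_rows = len(toy_df)
num_nan_rows = int(row_has_nan.sum())
num_complete_rows = num_rows - num_nan_rows

# Per-column view: number of NaNs in each variable
per_column_nan = toy_df.isna().sum()

print(row_has_nan.tolist())
print(num_nan_rows, num_complete_rows)
print(per_column_nan)
```

Here the first row is complete and the other two each miss one value, so the complete-row count equals `len(toy_df.dropna())`.
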

Since 773 knees (25.12%) have at least one missing MOAKS/WOMAC value, we can either drop these rows so as not to complicate the analysis, or fill the missing values with an imputation method.

```python
from missing_imputation import moaks_shared_womac_drop_df, moaks_shared_womac_median_df

print(f'Shape of dataframe with dropping NaN values: {moaks_shared_womac_drop_df.shape}')
print(f'Shape of dataframe with median imputation of NaN values: {moaks_shared_womac_median_df.shape}')
```

```text
Shape of dataframe with dropping NaN values: (2304, 109)
Shape of dataframe with median imputation of NaN values: (3077, 109)
```
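
The source does not show how `missing_imputation` builds these two dataframes. Assuming it uses listwise deletion (`dropna`) and column-wise median filling (`fillna` with the column medians), which the shapes above are consistent with, a minimal sketch on hypothetical toy data looks like this:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names are illustrative only
toy_df = pd.DataFrame({
    "V01WOMTSR": [4.0, np.nan, 2.0, 6.0],
    "V00MBMNTMA": [0.0, 1.0, np.nan, 0.0],
})

# Option 1 (assumed): listwise deletion, keep only rows with no missing values
drop_df = toy_df.dropna()

# Option 2 (assumed): median imputation, replace each NaN with its column's median
# (median() skips NaN by default, so V01WOMTSR -> 4.0 and V00MBMNTMA -> 0.0)
column_medians = toy_df.median(numeric_only=True)
median_df = toy_df.fillna(column_medians)

print(drop_df.shape, median_df.shape)
```

Deletion reduces the row count while imputation keeps every row; both leave the column count unchanged, matching the 109 columns in both shapes above. The median is less sensitive to outliers than the mean, which suits skewed, ordinal scores such as WOMAC subscales and MOAKS grades.
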


```python
from IPython.display import display
from missing_imputation import moaks_shared_womac_median_table

display(moaks_shared_womac_median_table)
```

| Index | Variable | Median |
|---|---|---|
| 103 | V03WOMTSR | 5.0 |
| 95 | V01WOMTSR | 4.266667 |
| 90 | V01WOMTSL | 3.1875 |
| 98 | V03WOMTSL | 3.125 |
| 102 | V03WOMADLR | 2.125 |
| 105 | V03WOMADLL | 2.0 |
| 97 | V01WOMADLL | 2.0 |
| 94 | V01WOMADLR | 2.0 |
| 99 | V03WOMSTFL | 1.0 |
| 100 | V03WOMSTFR | 1.0 |
| ... | ... | ... |
| 37 | V00MBMNTMA | 0.0 |
| 36 | V00MBMNSS | 0.0 |
| 35 | V00MBMNFLP | 0.0 |
| 34 | V00MBMNFMP | 0.0 |
| 33 | V00MBMNFLC | 0.0 |
| 32 | V00MBMNFMC | 0.0 |
| 31 | V00MBMNFLA | 0.0 |
| 30 | V00MBMNFMA | 0.0 |
| 29 | V00MBMPPL | 0.0 |
| 53 | V01MBMSTLA | 0.0 |
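
The table lists one median per variable, sorted in descending order, with the middle rows elided in the output. As a hypothetical sketch (not necessarily how `moaks_shared_womac_median_table` is built), a table with the same Variable/Median layout can be produced from per-column medians of toy data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names reused for illustration only
toy_df = pd.DataFrame({
    "V03WOMTSR": [5.0, np.nan, 7.0],
    "V01WOMTSL": [3.0, 2.0, np.nan],
    "V00MBMNTMA": [0.0, np.nan, 0.0],
})

# Median of each column (NaNs skipped by default), largest first
medians = toy_df.median(numeric_only=True).sort_values(ascending=False)

# One row per variable, with columns "Variable" and "Median"
median_table = medians.rename_axis("Variable").reset_index(name="Median")
print(median_table)
```
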
