# K-L NaN Distribution

Here, `moaks_shared_kl_df` (the intersection on (ID, SIDE), i.e. the unique knees common to V00_moaks, V01_moaks, V01_kl and V03_kl) is analysed by counting the rows with and without a missing value in at least one MOAKS/K-L variable. This dataframe contains 5739 unique knees across four timepoints and two tests.
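
The construction of the shared dataframe is not shown in this notebook. As a rough sketch of what an intersection on (ID, SIDE) could look like, the toy example below chains inner merges over four small frames. The frames, their values, and the names `KEYS` and `shared` are invented for illustration; only the ID/SIDE keys and the variable names come from this notebook, and the real `config` module may build the dataframe differently.

```python
from functools import reduce

import pandas as pd

KEYS = ["ID", "SIDE"]  # a knee is identified by participant ID plus side

# Toy stand-ins for the four source tables (values are made up).
v00_moaks = pd.DataFrame({"ID": [1, 1, 2, 3], "SIDE": [1, 2, 1, 2],
                          "V00MBMNFMA": [0.0, 1.0, None, 0.0]})
v01_moaks = pd.DataFrame({"ID": [1, 2, 3], "SIDE": [1, 1, 2],
                          "V01MBMSFLA": [0.0, 0.0, 2.0]})
v01_kl = pd.DataFrame({"ID": [1, 1, 3], "SIDE": [1, 2, 2],
                       "V01XRKL": [1.0, 2.0, None]})
v03_kl = pd.DataFrame({"ID": [1, 3, 4], "SIDE": [1, 2, 1],
                       "V03XRKL": [1.0, 3.0, 0.0]})

# Inner merges keep only knees present in every table.
shared = reduce(
    lambda left, right: left.merge(right, on=KEYS, how="inner"),
    [v00_moaks, v01_moaks, v01_kl, v03_kl],
)

print(shared)
```

Note that an inner merge keeps a knee even when one of its variables is NaN, which is why missing values still have to be counted afterwards.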

In [9]:
# -- import the shared moaks-kl dataframe -- #
from config import moaks_shared_kl_df

# -- calculate the total number of rows -- #
num_rows = len(moaks_shared_kl_df)
# -- calculate the sum of rows that include at least one NaN value -- #
num_nan_rows = moaks_shared_kl_df.isna().any(axis=1).sum()
# -- calculate the difference -- #
num_non_nan_rows = num_rows - num_nan_rows

print(f'Number of unique knees in the shared dataframe: {num_rows}')
print(f'Number of unique knees with at least one missing value for a variable: {num_nan_rows} ({num_nan_rows / num_rows * 100:.2f}%)')
print(f'Number of unique knees with no missing value for a variable: {num_non_nan_rows} ({num_non_nan_rows / num_rows * 100:.2f}%)')

Number of unique knees in the shared dataframe: 5739
Number of unique knees with at least one missing value for a variable: 2860 (49.83%)
Number of unique knees with no missing value for a variable: 2879 (50.17%)


Since only 2879 unique knees (50.17%) have no missing value in any MOAKS/K-L variable, we can either drop the rows containing NaN values to keep the analysis simple, or impute the missing values.

In [11]:
from missing_imputation import moaks_shared_kl_drop_df, moaks_shared_kl_median_df

print(f'Shape of dataframe with dropping NaN values: {moaks_shared_kl_drop_df.shape}')
print(f'Shape of dataframe with median imputation of NaN values: {moaks_shared_kl_median_df.shape}')

Shape of dataframe with dropping NaN values: (2879, 94)
Shape of dataframe with median imputation of NaN values: (5739, 94)
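
The `missing_imputation` module itself is not shown here. A minimal sketch of how a dropped and a median-imputed dataframe could be derived with pandas follows, assuming the imputation uses per-column medians and leaves the ID/SIDE keys untouched. The toy data and the names `toy`, `value_cols`, `drop_df` and `median_df` are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy dataframe with two key columns and two score columns (values made up).
toy = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "SIDE": [1, 2, 1, 2],
    "V01XRKL": [1.0, np.nan, 2.0, 1.0],
    "V00MBMNFMA": [0.0, 0.0, np.nan, 1.0],
})
value_cols = [c for c in toy.columns if c not in ("ID", "SIDE")]

# Option 1: drop every knee with at least one missing score.
drop_df = toy.dropna(subset=value_cols).reset_index(drop=True)

# Option 2: fill each missing score with the median of its own column.
medians = toy[value_cols].median()
median_df = toy.copy()
median_df[value_cols] = toy[value_cols].fillna(medians)

print(f"Shape after dropping NaN rows: {drop_df.shape}")
print(f"Shape after median imputation: {median_df.shape}")
```

Dropping shrinks the row count while imputation preserves it, which matches the pattern in the shapes reported above (2879 versus 5739 rows, 94 columns in both).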


In [8]:
from IPython.display import display
from missing_imputation import moaks_shared_kl_median_table

display(moaks_shared_kl_median_table)

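| Unnamed: 0 | 91 | 90 | 57 | 66 | 65 | 64 | 63 | 62 | 61 | 60 | ... | 33 | 32 | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 46 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Variable | V03XRKL | V01XRKL | V01MBMSTLP | V01MBMPSS | V01MBMPFLP | V01MBMPFMP | V01MBMPFLC | V01MBMPFMC | V01MBMPFLA | V01MBMPFMA | ... | V00MBMNFLC | V00MBMNFMC | V00MBMNFLA | V00MBMNFMA | V00MBMPPL | V00MBMPPM | V00MBMPTLP | V00MBMPTMP | V00MBMPTLC | V01MBMSFLA |
| Median | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |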
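
How `moaks_shared_kl_median_table` is built is not shown. One way a Variable/Median table of this shape could be produced is sketched below, assuming it simply lists the per-column medians (which skip NaN by default) and is transposed for display. The toy values and the name `median_table` are illustrative only.

```python
import numpy as np
import pandas as pd

# Toy score columns with some missing entries (values made up).
toy = pd.DataFrame({
    "V03XRKL": [1.0, np.nan, 2.0, 1.0],
    "V01XRKL": [1.0, 1.0, np.nan, 0.0],
    "V00MBMNFMA": [0.0, 0.0, np.nan, 1.0],
})

# Per-column medians, reshaped into a two-row Variable/Median table.
median_table = (
    toy.median()
    .rename_axis("Variable")
    .reset_index(name="Median")
    .T
)

print(median_table)
```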
