# Step 4: WOMAC Variable Analysis

```python
from IPython.display import display, Markdown
from modules.plot_data import plot_pca_3d
```

## Objective
- Reveal underlying patterns in the WOMAC variables
- Reduce the dimensionality of the WOMAC score matrices

## Statistical Tests
- Principal Component Analysis
- Linear Regression (with p-value analysis)

## Inputs and Outputs
- 3 left-knee WOMAC variables (disability, pain, stiffness) at 12-month follow-up from full dataset (Input)
- 3 left-knee WOMAC variables at 24-month follow-up from full dataset (Input)
- 3 right-knee WOMAC variables at 12-month follow-up from full dataset (Input)
- 3 right-knee WOMAC variables at 24-month follow-up from full dataset (Input)
- Principal components of left WOMAC variables (Output)
- Principal components of right WOMAC variables (Output)
- p-value matrices of WOMAC variables (Output)
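The tables in this step suggest a regular naming pattern for the twelve inputs listed above. Each name is a visit prefix (V01 for the 12-month follow-up, V03 for 24 months), then `WOM`, then a subscale code (ADL for disability, KP for pain, STF for stiffness), then a knee suffix (L or R). The helper below rebuilds the column names for one visit and knee. It is hypothetical: the pattern is inferred from the tables, not taken from the project's scripts.

```python
# Hypothetical helper; the naming pattern is inferred from the tables in this step.
VISIT_PREFIX = {12: "V01", 24: "V03"}  # follow-up month -> visit prefix
SUBSCALE_CODE = {"disability": "ADL", "pain": "KP", "stiffness": "STF"}
KNEE_SUFFIX = {"left": "L", "right": "R"}


def womac_columns(month: int, knee: str) -> list[str]:
    """Return the three WOMAC column names for one follow-up month and one knee."""
    prefix, suffix = VISIT_PREFIX[month], KNEE_SUFFIX[knee]
    return [f"{prefix}WOM{code}{suffix}" for code in SUBSCALE_CODE.values()]


# One matrix per visit and knee: 4 matrices x 3 subscales = 12 input variables.
for month in (12, 24):
    for knee in ("left", "right"):
        print(f"{month}-month {knee}:", womac_columns(month, knee))
```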

## 4.1 WOMAC Variable PCA at 12-Month Follow-Up
At 12 months there are 6 WOMAC variables: left-knee disability, left-knee pain, left-knee stiffness, right-knee disability, right-knee pain, and right-knee stiffness. Each knee's scores are grouped into a separate matrix (left and right), and each matrix is run through PCA.
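Below is a minimal sketch of the PCA step for one knee's matrix. It assumes the subscales are standardized before the decomposition. The code in `scripts.womac.womac_relationships` is not shown, so the function name `womac_pca` and every step here are illustrative. The sketch eigendecomposes the covariance of the standardized scores and orders the components by variance. It reports loadings as eigenvectors scaled by the square root of each eigenvalue.

The variances in the tables sum to about 3.0013 rather than exactly 3. That total is consistent with, though it does not prove, standardizing with the population standard deviation and then estimating variance with an n - 1 denominator, as this sketch does.

```python
import numpy as np


def womac_pca(X):
    """Sketch of PCA on one knee's (n_subjects x 3) WOMAC matrix.

    Returns per-PC variances, proportions of total variance, loadings
    (eigenvectors scaled by sqrt(eigenvalue)), and PC scores.
    """
    X = np.asarray(X, dtype=float)
    # Standardize each subscale (population std, ddof=0) so different score ranges are comparable.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Sample covariance (ddof=1) of the standardized data.
    cov = np.cov(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]       # sort PCs by descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Eigenvector signs are arbitrary; flip each so its entries sum to a non-negative value.
    signs = np.where(eigvecs.sum(axis=0) < 0, -1.0, 1.0)
    eigvecs = eigvecs * signs
    proportion = eigvals / eigvals.sum()
    loadings = eigvecs * np.sqrt(eigvals)   # column j scaled by sqrt(eigenvalue j)
    scores = Z @ eigvecs
    return eigvals, proportion, loadings, scores


# Synthetic example (not study data): three subscales driven by one shared latent severity.
rng = np.random.default_rng(0)
n = 500
severity = rng.normal(size=n)
X = np.column_stack([severity + 0.5 * rng.normal(size=n) for _ in range(3)])

eigvals, proportion, loadings, scores = womac_pca(X)
print("variance:", np.round(eigvals, 4))
print("proportion:", np.round(proportion, 4))
```

With three strongly correlated subscales, PC1 absorbs most of the variance, similar to the pattern in the tables below. The data here are synthetic.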

```python
from scripts.womac.womac_relationships import v01_womac_left_pc_df, v01_womac_left_pc_loadings_df, v01_womac_left_drop_reduced_df
from scripts.womac.womac_relationships import v01_womac_right_pc_df, v01_womac_right_pc_loadings_df, v01_womac_right_drop_reduced_df

left_knee_womac_pc_scatter = plot_pca_3d(v01_womac_left_drop_reduced_df, title='Left-Knee WOMAC PC Plot')
left_knee_womac_pc_scatter.show()
```

```python
display(Markdown('### 4.1.1 Left-Knee WOMAC Variable Principal Components'))
display(v01_womac_left_pc_df)
display(Markdown('### 4.1.2 Left-Knee WOMAC Variable Principal Components Loadings'))
display(v01_womac_left_pc_loadings_df.sort_values(by='PC1'))
```

### 4.1.1 Left-Knee WOMAC Variable Principal Components

| PC | Variance | Proportion of Total |
|---|---|---|
| PC1 | 2.611029 | 0.869965 |
| PC2 | 0.271454 | 0.090445 |
| PC3 | 0.118819 | 0.039589 |


### 4.1.2 Left-Knee WOMAC Variable Principal Components Loadings

| Variable | PC1 | PC2 | PC3 |
|---|---|---|---|
| V01WOMSTFL | 0.903895 | 0.425429 | -0.049165 |
| V01WOMKPL | 0.939256 | -0.265242 | -0.218812 |
| V01WOMADLL | 0.954882 | -0.141811 | 0.26177 |


```python
right_knee_womac_pc_scatter = plot_pca_3d(v01_womac_right_drop_reduced_df, title='Right-Knee WOMAC PC Plot')
right_knee_womac_pc_scatter.show()
```

```python
display(Markdown('### 4.1.3 Right-Knee WOMAC Variable Principal Components'))
display(v01_womac_right_pc_df)
display(Markdown('### 4.1.4 Right-Knee WOMAC Variable Principal Components Loadings'))
display(v01_womac_right_pc_loadings_df.sort_values(by='PC1'))
```

### 4.1.3 Right-Knee WOMAC Variable Principal Components

| PC | Variance | Proportion of Total |
|---|---|---|
| PC1 | 2.504212 | 0.834375 |
| PC2 | 0.346919 | 0.115589 |
| PC3 | 0.150173 | 0.050036 |


### 4.1.4 Right-Knee WOMAC Variable Principal Components Loadings

| Variable | PC1 | PC2 | PC3 |
|---|---|---|---|
| V01WOMSTFR | 0.874287 | 0.481937 | -0.061591 |
| V01WOMKPR | 0.921374 | -0.303504 | -0.2437 |
| V01WOMADLR | 0.943877 | -0.150136 | 0.29494 |


## Results
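PC1 explains 87% of the variance in the left knee and 83% in the right knee. PC2 explains 9% (left) and 12% (right) and loads mainly on stiffness. Pain and disability have similar PC1 loadings and same-sign PC2 loadings. They diverge mainly on PC3, which explains only 4% to 5% of the variance. Pain is therefore redundant with disability and can be dropped for dimensionality reduction.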
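Two arithmetic checks tie the 12-month tables together. First, each Proportion of Total entry is that PC's variance divided by the summed variance. Second, the squared PC1 loadings sum to the PC1 variance (0.903895² + 0.939256² + 0.954882² ≈ 2.611), which suggests the loadings are eigenvectors scaled by the square root of their eigenvalues. A short Python check using the left-knee values:

```python
# Values copied from tables 4.1.1 and 4.1.2 (12-month follow-up, left knee).
variance = [2.611029, 0.271454, 0.118819]
pc1_loadings = {"V01WOMSTFL": 0.903895, "V01WOMKPL": 0.939256, "V01WOMADLL": 0.954882}

# Proportion of Total = each PC's variance divided by the summed variance.
total = sum(variance)
proportion = [v / total for v in variance]

# Sum of squared PC1 loadings; close to the PC1 variance if loadings are scaled by sqrt(variance).
pc1_sq_sum = sum(value ** 2 for value in pc1_loadings.values())

print([round(p, 6) for p in proportion])
print(round(pc1_sq_sum, 6))
```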

## 4.2 WOMAC Variable PCA at 24-Month Follow-Up
At 24 months there are 6 WOMAC variables: left-knee disability, left-knee pain, left-knee stiffness, right-knee disability, right-knee pain, and right-knee stiffness. Each knee's scores are grouped into a separate matrix (left and right), and each matrix is run through PCA.

```python
from scripts.womac.womac_relationships import v03_womac_left_pc_df, v03_womac_left_pc_loadings_df, v03_womac_left_drop_reduced_df
from scripts.womac.womac_relationships import v03_womac_right_pc_df, v03_womac_right_pc_loadings_df, v03_womac_right_drop_reduced_df

left_knee_womac_pc_scatter = plot_pca_3d(v03_womac_left_drop_reduced_df, title='Left-Knee WOMAC PC Plot')
left_knee_womac_pc_scatter.show()
```

```python
display(Markdown('### 4.2.1 Left-Knee WOMAC Variable Principal Components'))
display(v03_womac_left_pc_df)
display(Markdown('### 4.2.2 Left-Knee WOMAC Variable Principal Components Loadings'))
display(v03_womac_left_pc_loadings_df.sort_values(by='PC1'))
```

### 4.2.1 Left-Knee WOMAC Variable Principal Components

| PC | Variance | Proportion of Total |
|---|---|---|
| PC1 | 2.583156 | 0.860678 |
| PC2 | 0.29654 | 0.098804 |
| PC3 | 0.121606 | 0.040518 |


### 4.2.2 Left-Knee WOMAC Variable Principal Components Loadings

| Variable | PC1 | PC2 | PC3 |
|---|---|---|---|
| V03WOMSTFL | 0.894156 | 0.445254 | -0.051656 |
| V03WOMKPL | 0.935088 | -0.278031 | -0.220777 |
| V03WOMADLL | 0.953547 | -0.144872 | 0.264943 |


```python
right_knee_womac_pc_scatter = plot_pca_3d(v03_womac_right_drop_reduced_df, title='Right-Knee WOMAC PC Plot')
right_knee_womac_pc_scatter.show()
```

```python
display(Markdown('### 4.2.3 Right-Knee WOMAC Variable Principal Components'))
display(v03_womac_right_pc_df)
display(Markdown('### 4.2.4 Right-Knee WOMAC Variable Principal Components Loadings'))
display(v03_womac_right_pc_loadings_df.sort_values(by='PC1'))
```

### 4.2.3 Right-Knee WOMAC Variable Principal Components

| PC | Variance | Proportion of Total |
|---|---|---|
| PC1 | 2.542652 | 0.847183 |
| PC2 | 0.306955 | 0.102274 |
| PC3 | 0.151696 | 0.050543 |


### 4.2.4 Right-Knee WOMAC Variable Principal Components Loadings

| Variable | PC1 | PC2 | PC3 |
|---|---|---|---|
| V03WOMSTFR | 0.891529 | 0.447594 | -0.072587 |
| V03WOMKPR | 0.924207 | -0.300056 | -0.237154 |
| V03WOMADLR | 0.94534 | -0.128768 | 0.300308 |


## Results
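PC1 explains 86% of the variance in the left knee and 85% in the right knee. PC2 explains about 10% in each knee and again loads mainly on stiffness. As at 12 months, pain and disability have similar PC1 loadings and same-sign PC2 loadings, and they diverge mainly on PC3 (4% to 5% of the variance). Pain is therefore redundant with disability and can be dropped for dimensionality reduction.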
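Each Proportion of Total entry should equal that PC's variance divided by the sum of the three variances. The short check below recomputes that column from the Variance values of the two 24-month tables. It is a quick way to catch a proportions column that has drifted out of sync with its variances.

```python
# Variance columns copied from tables 4.2.1 (left knee) and 4.2.3 (right knee), 24-month follow-up.
variances = {
    "left": [2.583156, 0.29654, 0.121606],
    "right": [2.542652, 0.306955, 0.151696],
}

# Proportion of Total for each PC = variance / sum of the three variances.
proportions = {
    knee: [v / sum(values) for v in values] for knee, values in variances.items()
}

for knee, props in proportions.items():
    print(knee, [round(p, 6) for p in props])
```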

## 4.3 WOMAC Variables Linear Regression at 12-Month Follow-Up
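At 12 months there are 6 WOMAC variables: left-knee disability, left-knee pain, left-knee stiffness, right-knee disability, right-knee pain, and right-knee stiffness. Each knee's subscales are grouped into separate left and right matrices. Within each knee, every subscale is regressed on the other two using simple linear and multiple linear regression with non-linear (quadratic and cubic) and interaction terms. In the p-value matrices below, each column is a response variable and each row is a model term. Blank cells mark terms that are not in that response's model.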
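Row labels such as `I(V01WOMADLL ** 2)` and `V01WOMADLL:I(V01WOMKPL ** 2)` use patsy formula syntax: `I(...)` wraps an arithmetic transform and `:` marks an interaction. The regression scripts themselves are not shown. The sketch below is a library-light stand-in using NumPy and SciPy. It fits ordinary least squares with an intercept and returns a two-sided t-test p-value for each term. The helper name `ols_pvalues`, the synthetic data, and the short term names (ADL, KP) are all hypothetical.

```python
import numpy as np
from scipy import stats


def ols_pvalues(y, terms):
    """Fit OLS with an intercept and return {term name: two-sided p-value}."""
    names = list(terms)
    # Rescale each predictor by its standard deviation: t-statistics and p-values are
    # unchanged, but the polynomial design matrix becomes far better conditioned.
    cols = [np.asarray(terms[name], dtype=float) for name in names]
    cols = [c / c.std() for c in cols]
    X = np.column_stack([np.ones(len(y))] + cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof
    # diag((X'X)^-1) equals the squared row norms of the pseudo-inverse X^+.
    se = np.sqrt(sigma2 * np.sum(np.linalg.pinv(X) ** 2, axis=1))
    p = 2 * stats.t.sf(np.abs(beta / se), dof)
    return dict(zip(names, p[1:]))  # drop the intercept


# Synthetic scores clipped to WOMAC Likert ranges (not study data).
rng = np.random.default_rng(42)
n = 500
pain = rng.uniform(0, 20, n)
disability = np.clip(3.0 * pain + rng.normal(0, 5, n), 0, 68)
stiffness = np.clip(0.25 * pain + 0.02 * disability + rng.normal(0, 1, n), 0, 8)

# Stiffness model: linear, quadratic, cubic and one interaction term, labelled patsy-style.
terms = {
    "ADL": disability,
    "KP": pain,
    "I(ADL ** 2)": disability ** 2,
    "I(ADL ** 3)": disability ** 3,
    "I(KP ** 2)": pain ** 2,
    "I(KP ** 3)": pain ** 3,
    "ADL:I(KP ** 2)": disability * pain ** 2,
}
p_values = ols_pvalues(stiffness, terms)
for name, p in p_values.items():
    print(f"{name:>16}: {p:.3g}")
```

Each dictionary returned for one response variable corresponds to one column of a p-value matrix. Terms outside that response's model have no entry, which matches the blank cells in the tables.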

```python
from scripts.womac.womac_relationships import v01_womac_left_drop_p_matrix, v01_womac_right_drop_p_matrix

display(Markdown('### 4.3.1 Left-Knee WOMAC Variable p-value Matrix'))
display(v01_womac_left_drop_p_matrix)
display(Markdown('### 4.3.2 Right-Knee WOMAC Variable p-value Matrix'))
display(v01_womac_right_drop_p_matrix)
```

### 4.3.1 Left-Knee WOMAC Variable p-value Matrix

| Term | V01WOMADLL | V01WOMKPL | V01WOMSTFL |
|---|---|---|---|
| V01WOMADLL |  | 0.0 | 1.257015e-109 |
| V01WOMKPL | 0.0 |  | 0.01162093 |
| V01WOMSTFL | 0.02030359 | 3.784599e-05 |  |
| I(V01WOMADLL ** 2) |  | 0.1509497 | 8.07054e-18 |
| I(V01WOMADLL ** 3) |  | 0.1788804 | 7.837199e-10 |
| I(V01WOMKPL ** 2) | 1.709466e-12 |  | 0.165286 |
| I(V01WOMKPL ** 3) | 2.793003e-12 |  | 0.02361813 |
| I(V01WOMSTFL ** 2) | 7.887697e-38 | 4.050554e-11 |  |
| I(V01WOMSTFL ** 3) | 3.929314e-05 | 0.0005986794 |  |
| V01WOMADLL:I(V01WOMKPL ** 2) |  |  | 1.033468e-09 |


### 4.3.2 Right-Knee WOMAC Variable p-value Matrix

| Term | V01WOMADLR | V01WOMKPR | V01WOMSTFR |
|---|---|---|---|
| V01WOMADLR |  | 1.23467e-319 | 3.533681e-75 |
| V01WOMKPR | 3.465695e-284 |  | 0.0003961904 |
| V01WOMSTFR | 8.80983e-09 | 0.1617407 |  |
| I(V01WOMADLR ** 2) |  | 0.1249166 | 8.956959e-15 |
| I(V01WOMADLR ** 3) |  | 0.4154476 | 3.515083e-08 |
| I(V01WOMKPR ** 2) | 7.432552e-23 |  | 0.3935854 |
| I(V01WOMKPR ** 3) | 1.484923e-11 |  | 0.006288256 |
| I(V01WOMSTFR ** 2) | 1.225492e-52 | 1.018332e-06 |  |
| I(V01WOMSTFR ** 3) | 7.567904e-05 | 0.009697689 |  |
| V01WOMADLR:I(V01WOMKPR ** 2) |  |  | 1.991904e-11 |
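Several p-values in these matrices are reported as exactly 0.0. Another entry, 1.23467e-319 (V01WOMADLR row, V01WOMKPR column), sits just above zero. IEEE 754 double precision cannot represent positive numbers below about 4.9e-324, and it stores values below about 2.2e-308 as subnormals with reduced precision. A reported 0.0 is therefore best read as a p-value too small to represent, not a literal zero. A quick Python illustration:

```python
import sys

SMALLEST_NORMAL = sys.float_info.min  # about 2.2e-308
SMALLEST_SUBNORMAL = 5e-324           # 2**-1074, the smallest positive double

# The table entry lies in the subnormal range: positive but below the smallest normal double.
p_subnormal = 1.23467e-319
is_subnormal = 0.0 < p_subnormal < SMALLEST_NORMAL

# Halving the smallest subnormal rounds to zero (round-half-to-even), so anything smaller prints as 0.0.
underflows = SMALLEST_SUBNORMAL / 2 == 0.0

print(is_subnormal, underflows)
```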


## 4.4 WOMAC Variables Linear Regression at 24-Month Follow-Up
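At 24 months there are 6 WOMAC variables: left-knee disability, left-knee pain, left-knee stiffness, right-knee disability, right-knee pain, and right-knee stiffness. Each knee's subscales are grouped into separate left and right matrices. Within each knee, every subscale is regressed on the other two using simple linear and multiple linear regression with non-linear (quadratic and cubic) and interaction terms. As at 12 months, each column of the p-value matrices is a response variable and each row is a model term. Blank cells mark terms that are not in that response's model.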

```python
from scripts.womac.womac_relationships import v03_womac_left_drop_p_matrix, v03_womac_right_drop_p_matrix

display(Markdown('### 4.4.1 Left-Knee WOMAC Variable p-value Matrix'))
display(v03_womac_left_drop_p_matrix)
display(Markdown('### 4.4.2 Right-Knee WOMAC Variable p-value Matrix'))
display(v03_womac_right_drop_p_matrix)
```

### 4.4.1 Left-Knee WOMAC Variable p-value Matrix

| Term | V03WOMADLL | V03WOMKPL | V03WOMSTFL |
|---|---|---|---|
| V03WOMADLL |  | 0.0 | 7.621712e-71 |
| V03WOMKPL | 0.0 |  | 0.0005166371 |
| V03WOMSTFL | 3.139768e-06 | 0.03787274 |  |
| I(V03WOMADLL ** 2) |  | 0.4604554 | 1.164919e-07 |
| I(V03WOMADLL ** 3) |  | 0.486935 | 0.00011521 |
| I(V03WOMKPL ** 2) | 2.95761e-05 |  | 0.2531768 |
| I(V03WOMKPL ** 3) | 2.34492e-11 |  | 0.03403511 |
| I(V03WOMSTFL ** 2) | 5.476119e-37 | 6.173481e-07 |  |
| I(V03WOMSTFL ** 3) | 7.039132e-06 | 0.6416016 |  |
| V03WOMADLL:I(V03WOMKPL ** 2) |  |  | 5.016061e-08 |


### 4.4.2 Right-Knee WOMAC Variable p-value Matrix

| Term | V03WOMADLR | V03WOMKPR | V03WOMSTFR |
|---|---|---|---|
| V03WOMADLR |  | 0.0 | 7.4439e-81 |
| V03WOMKPR | 0.0 |  | 1.605953e-06 |
| V03WOMSTFR | 2.541737e-11 | 0.1540533 |  |
| I(V03WOMADLR ** 2) |  | 4.487692e-07 | 8.022663e-14 |
| I(V03WOMADLR ** 3) |  | 0.002713508 | 1.464256e-07 |
| I(V03WOMKPR ** 2) | 1.007133e-20 |  | 0.03640963 |
| I(V03WOMKPR ** 3) | 5.420874e-11 |  | 0.005401047 |
| I(V03WOMSTFR ** 2) | 1.669125e-57 | 9.826124e-08 |  |
| I(V03WOMSTFR ** 3) | 4.919806e-07 | 0.05955228 |  |
| V03WOMADLR:I(V03WOMKPR ** 2) |  |  | 1.479915e-10 |


## Results
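Disability and pain show extremely significant linear dependence in both knees at both follow-ups. Their linear terms have p-values reported as 0.0 (too small for double precision to represent) in every model except the right knee at 12 months, where they are 1.23467e-319 and 3.465695e-284. This supports the PCA finding that pain is redundant with disability.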
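The conclusion above rests on reading individual entries of the p-value matrices. Below is a sketch of that reading for one column. It treats the column as the response variable of its model, as the table layout suggests. It assumes a 5% significance level, since the analysis does not state a threshold.

```python
# p-values copied from the V03WOMKPR column of table 4.4.2 (24-month, right knee, pain as response).
pain_model_pvalues = {
    "V03WOMADLR": 0.0,
    "V03WOMSTFR": 0.1540533,
    "I(V03WOMADLR ** 2)": 4.487692e-07,
    "I(V03WOMADLR ** 3)": 0.002713508,
    "I(V03WOMSTFR ** 2)": 9.826124e-08,
    "I(V03WOMSTFR ** 3)": 0.05955228,
}

ALPHA = 0.05  # assumed significance level; the analysis does not state one

significant = [term for term, p in pain_model_pvalues.items() if p < ALPHA]
not_significant = [term for term, p in pain_model_pvalues.items() if p >= ALPHA]
print("significant:", significant)
print("not significant:", not_significant)
```

At this threshold, every disability term stays significant for pain. The linear and cubic stiffness terms do not.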