# Step 6: WOMAC Regression and ML Regression Models

In [1]:
import pandas as pd
from IPython.display import display, Markdown, JSON

### Objective
- Evaluate the extent to which BML scores relate to WOMAC severity
- Explore how well BML scores predict WOMAC scores at later follow-ups

### Statistical Tests/Models
- Linear regression (LR)
- L1 Regularisation (Lasso)
- XGBoost Regressor

### Inputs and Outputs
- 45 BML variables at baseline and 12-month follow-up from full dataset, ungrouped (Input)
- WOMAC disability, pain, and stiffness scores at 12-month and 24-month follow-ups (Input)
- R-squared and root-mean-squared error (RMSE) for each WOMAC target from LR and Lasso (Output)
- R-squared, mean-squared error (MSE), and mean absolute error (MAE) for each WOMAC target from the XGBoost Regressor (Output)

## 6.1 WOMAC Linear Regression
Multi-output multiple linear regression of WOMAC scores on BML variables was run for the left and right knees across all time periods. Only the models relating baseline BML variables to 12-month WOMAC scores are shown here, for each knee, because the other time periods give similar evaluation metrics.
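
The multi-output setup described above can be sketched as follows. This is a minimal illustration on synthetic data, not the study pipeline: the sample size, the 0 to 3 predictor range, the held-out split, and the names `X`, `Y`, and `target_names` are all assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-ins (NOT study data): 45 ordinal "BML-like" predictors
# and two "WOMAC-like" targets with a weak signal buried in noise.
n_samples, n_features = 400, 45
X = rng.integers(0, 4, size=(n_samples, n_features)).astype(float)
Y = np.column_stack([
    5.0 + 0.5 * X[:, 0] + rng.normal(0.0, 9.0, n_samples),  # disability-like
    1.0 + 0.1 * X[:, 1] + rng.normal(0.0, 1.5, n_samples),  # stiffness-like
])
target_names = ["disability", "stiffness"]

# Assumption: metrics are computed on a held-out split.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=0
)

# LinearRegression accepts a 2-D target and fits one coefficient row per column.
model = LinearRegression().fit(X_train, Y_train)
Y_pred = model.predict(X_test)

# Report each target separately so a poor fit on one is not averaged away.
r2_per_target = r2_score(Y_test, Y_pred, multioutput="raw_values")
rmse_per_target = np.sqrt(
    mean_squared_error(Y_test, Y_pred, multioutput="raw_values")
)

for name, r2, rmse in zip(target_names, r2_per_target, rmse_per_target):
    print(f"{name}: R2={r2:.3f}, RMSE={rmse:.3f}")
```

Here `model.coef_` has one row per target, which mirrors the per-target layout of the evaluator tables in this section.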

In [4]:
from scripts.mbm.womac_oriented.womac_mbm_relationships import v00_v01_womac_left_p_matrix, v00_v01_womac_left_drop_results

display(Markdown('### 6.1.1 Baseline BML variables\' coefficients and p-values against left-knee WOMAC scores at 12-month follow-up'))
display(v00_v01_womac_left_p_matrix.sort_values(by='Coefficient', ascending=False).head(10))
display(Markdown('### 6.1.2 Baseline BML variables\' evaluators against left-knee WOMAC scores at 12-month follow-up'))
display(pd.DataFrame.from_dict(v00_v01_womac_left_drop_results, orient='index'))

### 6.1.1 Baseline BML variables' coefficients and p-values against left-knee WOMAC scores at 12-month follow-up

Unnamed: 0,Feature,Coefficient,p_value,Target
41,V00MBMNTLC,8.37642,0.001646413,V01WOMADLL
39,V00MBMNTLA,7.17615,0.5602934,V01WOMADLL
0,Intercept,5.14002,1.506498e-28,V01WOMADLL
35,V00MBMNFMP,3.90713,0.02041177,V01WOMADLL
43,V00MBMNTLP,3.803725,0.1966037,V01WOMADLL
10,V00MBMSTMC,2.689776,0.03720856,V01WOMADLL
15,V00MBMSPL,2.297608,0.02048129,V01WOMADLL
8,V00MBMSTMA,2.169071,0.2953689,V01WOMADLL
32,V00MBMNFLA,2.029553,0.04237201,V01WOMADLL
13,V00MBMSTLP,1.358327,0.9492029,V01WOMADLL


### 6.1.2 Baseline BML variables' evaluators against left-knee WOMAC scores at 12-month follow-up

Unnamed: 0,V01WOMADLL,V01WOMSTFL
R2,0.010274,-0.097519
RMSE,9.26712,1.473497
Intercept,5.14002,1.104567


In [5]:
from scripts.mbm.womac_oriented.womac_mbm_relationships import v00_v01_womac_right_p_matrix, v00_v01_womac_right_drop_results

display(Markdown('### 6.1.3 Baseline BML variables\' coefficients and p-values against right-knee WOMAC scores at 12-month follow-up'))
display(v00_v01_womac_right_p_matrix.sort_values(by='Coefficient', ascending=False).head(10))
display(Markdown('### 6.1.4 Baseline BML variables\' evaluators against right-knee WOMAC scores at 12-month follow-up'))
display(pd.DataFrame.from_dict(v00_v01_womac_right_drop_results, orient='index'))

### 6.1.3 Baseline BML variables' coefficients and p-values against right-knee WOMAC scores at 12-month follow-up

Unnamed: 0,Feature,Coefficient,p_value,Target
39,V00MBMNTLA,24.74194,0.7453921,V01WOMADLR
85,V00MBMNTLA,8.667266,0.2863867,V01WOMSTFR
36,V00MBMNFLP,6.348355,0.9842979,V01WOMADLR
43,V00MBMNTLP,5.410579,0.05590512,V01WOMADLR
0,Intercept,4.698063,1.515918e-26,V01WOMADLR
6,V00MBMSFLP,4.662465,0.5967478,V01WOMADLR
23,V00MBMPTMA,3.629631,0.008012196,V01WOMADLR
4,V00MBMSFLC,3.587747,0.7557643,V01WOMADLR
7,V00MBMSSS,2.362747,0.0384632,V01WOMADLR
35,V00MBMNFMP,2.285084,0.0847,V01WOMADLR


### 6.1.4 Baseline BML variables' evaluators against right-knee WOMAC scores at 12-month follow-up

Unnamed: 0,V01WOMADLR,V01WOMSTFR
R2,-0.005264,-0.062506
RMSE,10.568931,1.668552
Intercept,4.698063,1.166518
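
Several R2 values in these tables are negative. As a reading aid, the sketch below computes R2 and RMSE from their standard definitions on toy numbers (not study data) and shows that R2 is 0 for a model that always predicts the observed mean and drops below 0 for a model that does worse than that. The helper name `r2_and_rmse` is hypothetical.

```python
import math


def r2_and_rmse(y_true, y_pred):
    """Return (R^2, RMSE) for two equal-length sequences of numbers."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / n)


# Toy observed values with mean 3.0.
y_true = [0.0, 2.0, 4.0, 6.0]

# Baseline: always predict the mean of the observed values.
r2_mean, rmse_mean = r2_and_rmse(y_true, [3.0] * 4)

# A model whose predictions are further from the truth than the mean is.
r2_bad, rmse_bad = r2_and_rmse(y_true, [6.0, 4.0, 2.0, 0.0])

print(r2_mean, rmse_mean)
print(r2_bad, rmse_bad)
```

On this reading, an R2 near or below zero means the fitted model explains essentially none of the variance in the target beyond what the mean alone provides.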


## 6.2 WOMAC L1 Regularisation
L1-regularised (Lasso) regression of WOMAC scores on BML variables was run for the left and right knees across all time periods. Only the models relating baseline BML variables to 12-month WOMAC scores are shown here, for each knee, because the other time periods give similar evaluation metrics.
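
To illustrate what the L1 penalty does, the sketch below fits ordinary least squares and Lasso to the same synthetic data (not study data) and counts how many coefficients each sets exactly to zero. The penalty strength `alpha=0.2` and the data-generating process are arbitrary assumptions chosen for the example.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)

# Synthetic data: 45 predictors, of which only the first two carry signal.
n_samples, n_features = 400, 45
X = rng.normal(size=(n_samples, n_features))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(0.0, 1.0, n_samples)

ols = LinearRegression().fit(X, y)
# alpha is the L1 penalty strength; larger values zero out more coefficients.
lasso = Lasso(alpha=0.2).fit(X, y)

n_zero_ols = int(np.sum(ols.coef_ == 0.0))
n_zero_lasso = int(np.sum(lasso.coef_ == 0.0))
print(f"coefficients exactly zero: OLS={n_zero_ols}, Lasso={n_zero_lasso}")
```

Because the penalty removes uninformative predictors, a Lasso coefficient table typically lists far fewer non-zero features than the corresponding unpenalised fit.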

In [9]:
from scripts.mbm.womac_oriented.womac_mbm_relationships import v00_v01_womac_left_lasso_coef_matrix, v00_v01_womac_left_drop_lasso_results

display(Markdown('### 6.2.1 Baseline BML variables\' coefficients against left-knee WOMAC scores at 12-month follow-up'))
display(v00_v01_womac_left_lasso_coef_matrix.sort_values(by='Coefficient', ascending=False).head(10))
display(Markdown('### 6.2.2 Baseline BML variables\' evaluators against left-knee WOMAC scores at 12-month follow-up'))
display(pd.DataFrame.from_dict(v00_v01_womac_left_drop_lasso_results, orient='index'))

### 6.2.1 Baseline BML variables' coefficients against left-knee WOMAC scores at 12-month follow-up

Unnamed: 0,Feature,Target,Coefficient
0,Intercept,V01WOMADLL,6.828096
1,Intercept,V01WOMSTFL,1.272727
16,V00MBMSPL,V01WOMADLL,1.259452
33,V00MBMNFLA,V01WOMADLL,0.980824
9,V00MBMSTMA,V01WOMADLL,0.611233
2,V00MBMSFMA,V01WOMADLL,0.468948
42,V00MBMNTLC,V01WOMADLL,0.429271
17,V00MBMPFMA,V01WOMADLL,0.310602
14,V00MBMSTLP,V01WOMADLL,0.195591
3,V00MBMSFLA,V01WOMADLL,0.159597


### 6.2.2 Baseline BML variables' evaluators against left-knee WOMAC scores at 12-month follow-up

Unnamed: 0,V01WOMADLL,V01WOMSTFL
R2,0.005425,-0.049173
RMSE,9.289796,1.440678


In [19]:
from scripts.mbm.womac_oriented.womac_mbm_relationships import v00_v01_womac_right_lasso_coef_matrix, v00_v01_womac_right_drop_lasso_results

display(Markdown('### 6.2.3 Baseline BML variables\' coefficients against right-knee WOMAC scores at 12-month follow-up'))
display(v00_v01_womac_right_lasso_coef_matrix.sort_values(by='Coefficient', ascending=False).head(10))
display(Markdown('### 6.2.4 Baseline BML variables\' evaluators against right-knee WOMAC scores at 12-month follow-up'))
display(pd.DataFrame.from_dict(v00_v01_womac_right_drop_lasso_results, orient='index'))

### 6.2.3 Baseline BML variables' coefficients against right-knee WOMAC scores at 12-month follow-up

Unnamed: 0,Feature,Target,Coefficient
0,Intercept,V01WOMADLR,6.802332
1,Intercept,V01WOMSTFR,1.488426
3,V00MBMSFLA,V01WOMADLR,0.749821
24,V00MBMPTMA,V01WOMADLR,0.639894
16,V00MBMSPL,V01WOMADLR,0.350895
36,V00MBMNFMP,V01WOMADLR,0.287337
48,V00MBMSFLA,V01WOMSTFR,0.110864
33,V00MBMNFLA,V01WOMADLR,0.08647
69,V00MBMPTMA,V01WOMSTFR,0.080144
81,V00MBMNFMP,V01WOMSTFR,0.037033


### 6.2.4 Baseline BML variables' evaluators against right-knee WOMAC scores at 12-month follow-up

Unnamed: 0,V01WOMADLR,V01WOMSTFR
R2,0.035222,0.023763
RMSE,10.353917,1.599381


## 6.3 WOMAC XGBoost Regressor
The XGBoost Regressor is based on gradient-boosted decision trees. It was configured with a moderate tree depth (6), a learning rate of 0.1, 200 estimators, and row subsampling and feature sampling both set to 85%.

In [20]:
from scripts.mbm.womac_oriented.womac_mbm_xgBoost import v00_v01_womac_left_lasso_coef_matrix, v00_v01_moaks_womac_left_metrics

display(Markdown('### 6.3.1 Baseline BML variables\' coefficients against left-knee WOMAC scores at 12-month follow-up'))
display(v00_v01_womac_left_lasso_coef_matrix.sort_values(by='Coefficient', ascending=False).head(10))
display(Markdown('### 6.3.2 Baseline BML variables\' evaluators against left-knee WOMAC scores at 12-month follow-up'))
display(pd.DataFrame.from_dict(v00_v01_moaks_womac_left_metrics, orient='index'))

### 6.3.1 Baseline BML variables' coefficients against left-knee WOMAC scores at 12-month follow-up

Unnamed: 0,Feature,Target,Coefficient
0,Intercept,V01WOMADLL,6.828096
1,Intercept,V01WOMSTFL,1.272727
16,V00MBMSPL,V01WOMADLL,1.259452
33,V00MBMNFLA,V01WOMADLL,0.980824
9,V00MBMSTMA,V01WOMADLL,0.611233
2,V00MBMSFMA,V01WOMADLL,0.468948
42,V00MBMNTLC,V01WOMADLL,0.429271
17,V00MBMPFMA,V01WOMADLL,0.310602
14,V00MBMSTLP,V01WOMADLL,0.195591
3,V00MBMSFLA,V01WOMADLL,0.159597


### 6.3.2 Baseline BML variables' evaluators against left-knee WOMAC scores at 12-month follow-up

Unnamed: 0,r2,mse,mae
V01WOMADLL,-0.288167,111.775538,7.550649
V01WOMSTFL,-0.429905,2.828747,1.358085


In [18]:
from scripts.mbm.womac_oriented.womac_mbm_xgBoost import v00_v01_womac_right_lasso_coef_matrix, v00_v01_moaks_womac_right_metrics

display(Markdown('### 6.3.3 Baseline BML variables\' coefficients against right-knee WOMAC scores at 12-month follow-up'))
display(v00_v01_womac_right_lasso_coef_matrix.sort_values(by='Coefficient', ascending=False).head(10))
display(Markdown('### 6.3.4 Baseline BML variables\' evaluators against right-knee WOMAC scores at 12-month follow-up'))
display(pd.DataFrame.from_dict(v00_v01_moaks_womac_right_metrics, orient='index'))

### 6.3.3 Baseline BML variables' coefficients against right-knee WOMAC scores at 12-month follow-up

Unnamed: 0,Feature,Target,Coefficient
0,Intercept,V01WOMADLR,6.802332
1,Intercept,V01WOMSTFR,1.488426
3,V00MBMSFLA,V01WOMADLR,0.749821
24,V00MBMPTMA,V01WOMADLR,0.639894
16,V00MBMSPL,V01WOMADLR,0.350895
36,V00MBMNFMP,V01WOMADLR,0.287337
48,V00MBMSFLA,V01WOMSTFR,0.110864
33,V00MBMNFLA,V01WOMADLR,0.08647
69,V00MBMPTMA,V01WOMSTFR,0.080144
81,V00MBMNFMP,V01WOMSTFR,0.037033


### 6.3.4 Baseline BML variables' evaluators against right-knee WOMAC scores at 12-month follow-up

Unnamed: 0,r2,mse,mae
V01WOMADLR,0.002039,110.890744,7.102927
V01WOMSTFR,-0.095223,2.869797,1.338437


## Results
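Baseline BML scores show no meaningful predictive relationship with 12-month WOMAC scores. Across linear regression, Lasso, and XGBoost, R-squared was at most 0.035 for every target shown and was negative in 7 of the 12 fits, so no model clearly outperformed simply predicting the mean WOMAC score.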