# ⭐️ Automatic Bias Correction Modelling

The mechanism of the Automatic Bias Correction modelling approach is described in Section 5.5.

Import all the required packages.

In [1]:
import sys

sys.path.append("./preprocessing_utils")
sys.path.append("./feature_selection_utils")
sys.path.append("./visual_utils")
sys.path.append("./experiment_utils")

In [2]:
import pandas as pd
from sklearn.model_selection import train_test_split

import experiments_utils
import feature_selection
import preprocessing

# configure pandas settings for data display
pd.options.mode.chained_assignment = None
pd.set_option("display.max_rows", None)
pd.set_option("display.max_columns", None)

## 📂 Prepare Datasets

📌 Features used for prediction

In [3]:
selected_features = [
    "pelvic_pain_frequency_between_periods",
    "deep_vaginal_pain_during_intercourse",
    "painful_bowel_movements",
    "unable_to_cope_with_pain",
    "experienced_infertility",
    "family_history_endometriosis_prediction",
    "pelvic_pain_worst",
    "takes_hormones_for_pain",
    "takes_presc_painkillers",
]

Import real and synthetic data for the selected subset of features.

In [4]:
df_real = pd.read_csv(experiments_utils.ENDO_DATA_PREDICTION_PATH)
df_synth_tvae = pd.read_csv(
    "./synthetic_data/tvae_selected_features_with_treatments_exp_10000_synthetic_data.csv"
)

In [6]:
X_real = df_real[selected_features]
y_real = df_real["has_endometriosis"]

X_synth_tvae = df_synth_tvae[selected_features]
y_synth_tvae = df_synth_tvae["has_endometriosis"]
print(
    f"Synthetic data contains {y_synth_tvae.sum()} data points with positive endometriosis label."
)

Synthetic data contains 3932 data points with positive endometriosis label.


Define the values of the estimated treatment effects.

The average treatment effects (ATEs) were estimated in [causality_ates_endo.ipynb](https://colab.research.google.com/drive/1SSy3NmiqabCy_9D8wFIC4ct_xh1B5k5c#scrollTo=MJq75w1wIwm8).

In [7]:
ate_features = ["takes_hormones_for_pain", "takes_presc_painkillers"]

In [8]:
takes_hormones_for_pain_effect = 0.25709203848847795
takes_presc_painkillers = 0.20126422297752325

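Replace each treatment feature with its estimated effect (the feature value multiplied by the corresponding ATE), then split the data into training and test sets.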

In [9]:
X_datasets = [X_real, X_synth_tvae]
effects = {
    "takes_hormones_for_pain": takes_hormones_for_pain_effect,
    "takes_presc_painkillers": takes_presc_painkillers,
}

for X_dataset in X_datasets:
    for feature, effect in effects.items():
        # Create a new column with the effect of the treatment
        X_dataset[f"{feature}_effect"] = X_dataset[feature] * effect
        X_dataset.drop(columns=[feature], inplace=True)
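
To make the transformation concrete, here is a minimal sketch that applies the same scaling to a toy DataFrame. The toy rows are made up for illustration, and the sketch assumes the treatment columns are 0/1 indicators; only the column names and effect sizes come from the cells above. Each treatment column is replaced by a `<feature>_effect` column that equals the ATE where the treatment is taken and 0 elsewhere.

```python
import pandas as pd

# Toy data (made up): 1 = takes the treatment, 0 = does not.
toy = pd.DataFrame(
    {
        "pelvic_pain_worst": [7, 3, 9],
        "takes_hormones_for_pain": [1, 0, 1],
        "takes_presc_painkillers": [0, 0, 1],
    }
)

# Effect sizes copied from the cell above.
effects = {
    "takes_hormones_for_pain": 0.25709203848847795,
    "takes_presc_painkillers": 0.20126422297752325,
}

# Same transformation as the loop above, written without in-place mutation.
scaled = toy.copy()
for feature, effect in effects.items():
    scaled[f"{feature}_effect"] = scaled[feature] * effect
scaled = scaled.drop(columns=list(effects))

print(scaled)
```

Working on a copy, as in this sketch, avoids modifying a slice of the original DataFrame, which is the situation the `chained_assignment` setting at the top of the notebook silences.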

In [10]:
# Create training and test subsets.
# The test set is used to evaluate the real-data-trained model and also serves
# as an external test set for the synthetic-data-trained model.
X_train_real, X_test_external_dataset, y_train_real, y_test_external_dataset = (
    train_test_split(X_real, y_real, test_size=0.3, random_state=42)
)
X_train_real, X_test_external_dataset = preprocessing.impute_features(
    X_train_real, X_test_external_dataset
)

In [11]:
# entire dataset for testing (for synthetic-data-trained algorithm)
X_test_entire_dataset = X_real
y_test_entire_dataset = y_real

X_train_synth_tvae = X_synth_tvae
y_train_synth_tvae = y_synth_tvae
X_train_synth_tvae, _ = preprocessing.impute_features(
    X_train_synth_tvae, X_test_entire_dataset
)

## 🌍 Debiased Modelling with Real Data Only

The results of the code cells below are presented and discussed in Section 6.3.3.

### Logistic Regression

In [11]:
lr_model_double_debiased, lr_val_folds_double_debiased = (
    feature_selection.run_logistic_regression(X_train_real, y_train_real, disp=True)
)

Best Hyperparameters: {'C': 1, 'penalty': 'l2', 'solver': 'liblinear'}
Avg F1 Score: 0.7520504721934145
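
The printed output suggests that each `run_*` helper performs a cross-validated hyperparameter search and reports the mean validation F1 score of the best configuration. The helper's implementation is not shown in this notebook, so the following is only a sketch of that pattern using scikit-learn's `GridSearchCV`. The stand-in data, the parameter grid, and the 5-fold setup are all assumptions, not the values used in `feature_selection`.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Stand-in data (assumption); the notebook uses X_train_real / y_train_real.
X, y = make_classification(n_samples=300, n_features=9, random_state=42)

# Hypothetical grid; the actual grid is defined inside feature_selection.
param_grid = {"C": [0.1, 1, 10], "solver": ["liblinear", "saga"]}

search = GridSearchCV(
    LogisticRegression(max_iter=5000),
    param_grid,
    scoring="f1",  # select hyperparameters by F1 score
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=42),
)
search.fit(X, y)

# best_score_ is the mean F1 across validation folds for the best parameters.
print("Best Hyperparameters:", search.best_params_)
print("Avg F1 Score:", search.best_score_)
```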


### Random Forest

In [12]:
rf_model_double_debiased, rf_val_folds_double_debiased = feature_selection.run_rf(
    X_train_real, y_train_real, disp=True
)

Best Hyperparameters: {'max_depth': 10, 'min_samples_leaf': 4, 'min_samples_split': 2, 'n_estimators': 300}
Avg F1 Score: 0.7733762332405768


### XGBoost

In [13]:
xgboost_model_double_debiased, xgb_val_folds_double_debiased = (
    feature_selection.run_xgb(X_train_real, y_train_real, disp=True)
)

Best Hyperparameters: {'colsample_bytree': 1, 'gamma': 0.1, 'learning_rate': 0.05, 'n_estimators': 100, 'subsample': 0.5}
Avg F1 Score: 0.7733963970395754


### AdaBoost

In [14]:
ada_model_double_debiased, ada_val_folds_double_debiased = feature_selection.run_ada(
    X_train_real, y_train_real, disp=True
)

Best Hyperparameters: {'algorithm': 'SAMME', 'learning_rate': 0.2, 'n_estimators': 200}
Avg F1 Score: 0.7571338794748065


### MLP

In [15]:
mlp_model_double_debiased, mlp_val_folds_double_debiased = feature_selection.run_mlp(
    X_train_real, y_train_real, disp=True
)

Best Hyperparameters: {'mlp__activation': 'tanh', 'mlp__alpha': 0.1, 'mlp__batch_size': 32, 'mlp__beta_1': 0.9, 'mlp__beta_2': 0.999, 'mlp__early_stopping': True, 'mlp__hidden_layer_sizes': (50, 50), 'mlp__learning_rate_init': 0.1, 'mlp__max_iter': 500, 'mlp__solver': 'adam'}
Avg F1 Score: 0.7998267049843949


### TabPFN

Please note that TabPFN is run in [TabPFN_endometriosis_experiment.ipynb](https://colab.research.google.com/drive/1S9i1o-kvCWtUDNY7kDj0AAR88KAaJCEo#scrollTo=FXdTtXVeqzgD) (section *Automatic Bias Correction Modelling*).

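It achieves an average F1 score of 75.09%.

### Evaluation on held-out test set

The MLP, which achieved the highest average F1 score among the models trained above, is evaluated on the held-out real test set.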

In [16]:
y_test_pred_real = feature_selection.evaluate_model_performance(
    mlp_model_double_debiased, X_test_external_dataset, y_test_external_dataset
)

Confusion Matrix:
[[71 31]
 [ 6 58]]
Accuracy: 0.7771
Recall: 0.9062
Specificity: 0.6961
F1-Score: 0.7582
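
The reported metrics follow directly from the confusion matrix. The sketch below recomputes them from the four counts, reading the matrix in scikit-learn's layout `[[TN, FP], [FN, TP]]`. The function `metrics_from_confusion` is a hypothetical helper written for this illustration; it is not part of `feature_selection`.

```python
def metrics_from_confusion(tn, fp, fn, tp):
    """Derive the reported metrics from the four confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "recall": tp / (tp + fn),          # sensitivity: positives correctly found
        "specificity": tn / (tn + fp),     # negatives correctly rejected
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# Counts from the matrix above, read as [[TN, FP], [FN, TP]].
m = metrics_from_confusion(tn=71, fp=31, fn=6, tp=58)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

For example, recall is 58 / (58 + 6) and specificity is 71 / (71 + 31), which agree with the values printed above. The same function can be applied to the confusion matrices in the synthetic-data section.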


## 🤖 Debiased Modelling with Synthetic Data

The results of the code cells below are presented and discussed in Section 6.4.4.

### Logistic Regression

In [12]:
lr_model_synth_double_debiased, lr_val_folds_synth_double_debiased = (
    feature_selection.run_logistic_regression(
        X_train_synth_tvae, y_train_synth_tvae, disp=True
    )
)

Best Hyperparameters: {'C': 1, 'penalty': 'l2', 'solver': 'saga'}
Avg F1 Score: 0.8681251384054148


### Random Forest

In [13]:
rf_model_double_debiased_synth, rf_val_folds_double_debiased_synth = (
    feature_selection.run_rf(X_train_synth_tvae, y_train_synth_tvae, disp=True)
)

Best Hyperparameters: {'max_depth': 10, 'min_samples_leaf': 4, 'min_samples_split': 2, 'n_estimators': 300}
Avg F1 Score: 0.8652360141768168


### XGBoost

In [14]:
xgboost_model_double_debiased_synth, xgb_val_folds_double_debiased_synth = (
    feature_selection.run_xgb(X_train_synth_tvae, y_train_synth_tvae, disp=True)
)

Best Hyperparameters: {'colsample_bytree': 0.7, 'gamma': 0.05, 'learning_rate': 0.05, 'n_estimators': 100, 'subsample': 1}
Avg F1 Score: 0.8670300658681287


### AdaBoost

In [15]:
ada_model_double_debiased_synth, ada_val_folds_double_debiased_synth = (
    feature_selection.run_ada(X_train_synth_tvae, y_train_synth_tvae, disp=True)
)

Best Hyperparameters: {'algorithm': 'SAMME', 'learning_rate': 0.05, 'n_estimators': 50}
Avg F1 Score: 0.8584820310203846


### MLP

In [16]:
mlp_model_double_debiased_synth, mlp_val_folds_double_debiased_synth = (
    feature_selection.run_mlp(X_train_synth_tvae, y_train_synth_tvae, disp=True)
)

Best Hyperparameters: {'mlp__activation': 'relu', 'mlp__alpha': 0.001, 'mlp__batch_size': 32, 'mlp__beta_1': 0.9, 'mlp__beta_2': 0.9999, 'mlp__early_stopping': True, 'mlp__hidden_layer_sizes': (100, 50), 'mlp__learning_rate_init': 0.01, 'mlp__max_iter': 500, 'mlp__solver': 'adam'}
Avg F1 Score: 0.8672853756228148


### Evaluation on external dataset

In [19]:
y_test_pred_synth_real_subset = feature_selection.evaluate_model_performance(
    mlp_model_double_debiased_synth,
    X_test_external_dataset,
    y_test_external_dataset,
)

Confusion Matrix:
[[75 27]
 [11 53]]
Accuracy: 0.7711
Recall: 0.8281
Specificity: 0.7353
F1-Score: 0.7361


### Evaluation on entire dataset

In [20]:
y_test_pred_synth = feature_selection.evaluate_model_performance(
    mlp_model_double_debiased_synth, X_test_entire_dataset, y_test_entire_dataset
)

Confusion Matrix:
[[246  77]
 [ 53 176]]
Accuracy: 0.7645
Recall: 0.7686
Specificity: 0.7616
F1-Score: 0.7303
