# Bias Amplification by Inductive Transfer Learning

This notebook walks through bias amplification by inductive transfer learning. The demonstration uses chest X-ray images from the Medical Imaging & Data Resource Center (MIDRC) Open-A1 data set, and shows how to train and deploy the models and how to visualize the amplified bias.
The inductive transfer learning approach uses a two-step transfer learning process to amplify bias. In the first step, the AI model is trained to classify patients according to a subgroup attribute (e.g., patient sex). In the second step, the model is fine-tuned on the target clinical task. Additional control over the degree to which bias is promoted can be obtained by altering the number of layers frozen during the second training step.

## Data Download and Conversion

(**Please skip this step if you have already completed it.**) The example uses the MIDRC Open-A1 chest X-ray data set ([MIDRC official website](https://data.midrc.org/)), which can be accessed and downloaded by following the [download instructions](https://data.midrc.org/dashboard/Public/documentation/Gen3_MIDRC_GetStarted.pdf). Several *.tsv* files containing study case, patient demographic, and image information can also be downloaded from the website (for your convenience, they are already included in this repository). After the data are downloaded, the script below generates a *.json* data summary file and converts the DICOM files to *.png* files.

In [None]:
# convert dicom to png file
png_save_dir = "/gpfs_projects/yuhang.zhang/OUT/2022_CXR/open_a1_jpeg"
%run ../src/utils/data_conversion.py \
    --save_dir "{png_save_dir}" \
    --input_file "20221010_summary_table__open_A1.json"

## Data Partition

After the data are converted to *.png* files, they need to be partitioned into training, validation, and testing sets. In this experiment, every set is equally stratified by patient sex (male and female), race (White and Black), and COVID status (positive and negative). Only one image is selected per patient, and only CR images are used. To speed up the experiment, only 25% of the image data is used in this example.

In [2]:
# data partition
%run ../src/utils/data_partitions.py \
    --input_list "20221010_summary_table__open_A1.json" \
    --conversion_file "/gpfs_projects/yuhang.zhang/OUT/2022_CXR/open_a1_jpeg/conversion_table.json" \
    --test_size 0.3 \
    --validation_size 0.2 \
    --partition_name "juypter_indirect_test" \
    --save_dir "/gpfs_projects/yuhang.zhang/OUT/2022_CXR/" \
    --max_img_per_patient 1 \
    --tasks 'M' 'F' 'White' 'Black' 'Yes' 'No' \
    --patient_img_selection_mode "random" \
    --random_seed 1 \
    --subsample_rate 0.25

Beginning bootstrapping

Number of patients/subgroup in input summary:
subgroup
F-Black-No-CR      678
F-Black-Yes-CR    1913
F-White-No-CR     2004
F-White-Yes-CR    1250
M-Black-No-CR      658
M-Black-Yes-CR    1684
M-White-No-CR     1864
M-White-Yes-CR    1333
Name: patient_id, dtype: int64

By patient summary of data partition

split           independent_test  train  validation   All
subgroup                                                 
F-Black-No-CR                 49     82          33   164
F-Black-Yes-CR                49     82          33   164
F-White-No-CR                 49     82          33   164
F-White-Yes-CR                49     82          33   164
M-Black-No-CR                 49     82          33   164
M-Black-Yes-CR                49     82          33   164
M-White-No-CR                 49     82          33   164
M-White-Yes-CR                49     82          33   164
All                          392    656         264  1312

DONE
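
The per-subgroup counts in the partition summary are consistent with a simple calculation: the smallest subgroup (M-Black-No-CR, 658 patients) caps every subgroup, the 25% subsample rate reduces that cap to 164, and the test and validation fractions split the result. The sketch below reproduces the table under that assumption. The flooring and rounding rules are guesses that happen to match the output; they are not taken from `data_partitions.py`.

```python
import math

# Patients per subgroup, copied from the input summary above.
subgroup_counts = {
    "F-Black-No-CR": 678, "F-Black-Yes-CR": 1913,
    "F-White-No-CR": 2004, "F-White-Yes-CR": 1250,
    "M-Black-No-CR": 658, "M-Black-Yes-CR": 1684,
    "M-White-No-CR": 1864, "M-White-Yes-CR": 1333,
}
subsample_rate, test_size, validation_size = 0.25, 0.3, 0.2

# Assumption: equal stratification caps every subgroup at the smallest one.
per_subgroup = math.floor(min(subgroup_counts.values()) * subsample_rate)
n_test = round(per_subgroup * test_size)
n_validation = round(per_subgroup * validation_size)
n_train = per_subgroup - n_test - n_validation

print(per_subgroup, n_test, n_train, n_validation)  # 164 49 82 33
print("total patients:", per_subgroup * len(subgroup_counts))  # 1312
```

The training set therefore receives the remaining 50% of each subgroup, which matches the 82 patients per subgroup in the table.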



## Model Training

Once the data are partitioned, this section shows how to train the models to amplify the bias between subgroups. The demonstration uses patient sex as the subgroup attribute.

### First step training

To amplify bias by inductive transfer learning, the first step is to train the model to classify the subgroup attribute (patient sex). Run the following cell to train two separate sex classification models:
 - A model where "M" (male) corresponds to a model classification of "1"
 - A model where "F" (female) corresponds to a model classification of "1"

This demonstration uses *ResNet-18* as the example network architecture, with pre-trained weights obtained from a contrastive self-supervised learning (CSL) approach on the CheXpert data set. The weight file can be found under the */example/* directory.

In [1]:
# first step training
main_dir = "/gpfs_projects/yuhang.zhang/OUT/2022_CXR/juypter_indirect_test/RAND_1"
task_list = ["M", "F"]
for task in task_list:
    %run ../src/utils/model_train.py -i "{main_dir}/train.csv" \
                                     -v "{main_dir}/validation.csv" \
                                     -o "{main_dir}/{task}" \
                                     -l "{main_dir}/{task}/run_log.log" \
                                     -c "checkpoint_csl.pth.tar" \
                                     -p "adam" \
                                     -g 0 \
                                     --random_state 0 \
                                     --pretrained_weights True \
                                     --train_task "{task}"

Start experiment...
Full fine tuning selected
Training for task: M
EPOCH	TR-AVG-LOSS	VD-AUC
> 0	0.69090		0.70070
> 1	0.66199		0.79454
> 2	0.53337		0.83947
> 3	0.44307		0.86771
> 4	0.41053		0.88539
> 5	0.38826		0.89199
> 6	0.36295		0.89560
> 7	0.34964		0.89773
> 8	0.35653		0.89853
> 9	0.34976		0.89905
Final epoch model saved to: /gpfs_projects/yuhang.zhang/OUT/2022_CXR/juypter_indirect_test/RAND_1/M/pytorch_last_epoch_model.onnx
END.
Start experiment...
Full fine tuning selected
Training for task: F
EPOCH	TR-AVG-LOSS	VD-AUC
> 0	0.68879		0.62839
> 1	0.65459		0.73921
> 2	0.53719		0.81847
> 3	0.44722		0.86263
> 4	0.41363		0.88355
> 5	0.38718		0.89268
> 6	0.35694		0.89710
> 7	0.34621		0.89956
> 8	0.35292		0.90152
> 9	0.34506		0.90186
Final epoch model saved to: /gpfs_projects/yuhang.zhang/OUT/2022_CXR/juypter_indirect_test/RAND_1/F/pytorch_last_epoch_model.onnx
END.
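
The two first-step models differ only in which sex is treated as class "1". A minimal sketch of that label encoding follows, assuming each training row carries a sex value of "M" or "F". The helper name and the example rows are hypothetical and are not taken from `model_train.py`.

```python
def binary_labels(sex_values, train_task):
    """Return 1 where the patient's sex equals the task label, else 0."""
    return [1 if sex == train_task else 0 for sex in sex_values]

sex_column = ["M", "F", "F", "M"]  # made-up example rows
labels_m = binary_labels(sex_column, "M")
labels_f = binary_labels(sex_column, "F")
print(labels_m)  # [1, 0, 0, 1]
print(labels_f)  # [0, 1, 1, 0]
```

The two label vectors are complements of each other, which is consistent with the similar validation AUCs in the two training logs above.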


### Second step training

The second step fine-tunes the models from step 1 to perform the target clinical task, using the same training and validation sets. Running the following cell fine-tunes the two resulting models to predict COVID status with different numbers of layers frozen. In this example, two settings (1 and 9 frozen layers) are shown for each pre-trained model.

In [3]:
# second step training
frozen_layers = [1, 9]
for task in task_list:
    for n in frozen_layers:
        %run ../src/utils/model_train.py -i "{main_dir}/train.csv" \
                                         -v "{main_dir}/validation.csv" \
                                         -o "{main_dir}/{task}_{n}_frozen_layer/" \
                                         -l "{main_dir}/{task}_{n}_frozen_layer/run_log.log" \
                                         -c "{main_dir}/{task}/checkpoint__last.pth.tar" \
                                         -f "partial" \
                                         -p "adam" \
                                         -g 0 \
                                         --random_state 0 \
                                         --pretrained_weights True \
                                         --freeze_up_to {n}

Start experiment...
Fine tuning with first 1 layers frozen
Training for task: Yes
EPOCH	TR-AVG-LOSS	VD-AUC
> 0	0.83723		0.64107
> 1	0.64429		0.65852
> 2	0.60812		0.66730
> 3	0.57995		0.66695
> 4	0.57941		0.66954
> 5	0.56721		0.67051
> 6	0.56085		0.67051
> 7	0.56177		0.67034
> 8	0.55529		0.67045
> 9	0.55262		0.67068
Final epoch model saved to: /gpfs_projects/yuhang.zhang/OUT/2022_CXR/juypter_indirect_test/RAND_1/M_1_frozen_layer/pytorch_last_epoch_model.onnx
END.
Start experiment...
Fine tuning with first 9 layers frozen
Training for task: Yes
EPOCH	TR-AVG-LOSS	VD-AUC
> 0	0.85241		0.62150
> 1	0.66258		0.64836
> 2	0.62356		0.66523
> 3	0.59704		0.66621
> 4	0.59804		0.66936
> 5	0.58780		0.67097
> 6	0.58370		0.67103
> 7	0.58482		0.67086
> 8	0.57917		0.67114
> 9	0.57739		0.67109
Final epoch model saved to: /gpfs_projects/yuhang.zhang/OUT/2022_CXR/juypter_indirect_test/RAND_1/M_9_frozen_layer/pytorch_last_epoch_model.onnx
END.
Start experiment...
Fine tuning with first 1 layers frozen
Trainin
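
Freezing the first n layers means their weights from the first step stay fixed while the remaining layers are updated on the COVID task. The framework-free toy below only illustrates that bookkeeping. The `Layer` class, the layer names, and the `freeze_up_to` function are hypothetical stand-ins that mimic PyTorch's `requires_grad` flag; how `model_train.py` actually counts layers for `--freeze_up_to` is not shown in this notebook.

```python
from dataclasses import dataclass

@dataclass
class Layer:
    name: str
    requires_grad: bool = True  # True means the layer is updated during training

def freeze_up_to(layers, n):
    """Mark the first n layers as frozen and return the names still trainable."""
    for index, layer in enumerate(layers):
        layer.requires_grad = index >= n
    return [layer.name for layer in layers if layer.requires_grad]

# Hypothetical 12-layer stack; real layer names depend on the architecture.
for n in (1, 9):
    model = [Layer(f"layer_{i}") for i in range(12)]
    trainable = freeze_up_to(model, n)
    print(f"freeze_up_to={n}: {len(trainable)} trainable layers")
```

With more layers frozen, more of the first-step (sex classification) weights carry over unchanged into the second-step model.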

### Baseline model training

To show how much bias this approach amplifies, a baseline model is needed as a reference. Run the following cell to train the baseline. For a fair comparison, the baseline uses the same model architecture and pre-trained weights (from the CSL approach), as well as the same training and validation sets. However, it skips the first training step and is trained directly to predict COVID status.

In [4]:
# baseline training
%run ../src/utils/model_train.py     -i "{main_dir}/train.csv" \
                                     -v "{main_dir}/validation.csv" \
                                     -o "{main_dir}/baseline" \
                                     -l "{main_dir}/baseline/run_log.log" \
                                     -c "checkpoint_csl.pth.tar" \
                                     -p "adam" \
                                     -g 0 \
                                     --random_state 0 \
                                     --pretrained_weights True

Start experiment...
Full fine tuning selected
Training for task: Yes
EPOCH	TR-AVG-LOSS	VD-AUC
> 0	0.69312		0.65461
> 1	0.67727		0.66460
> 2	0.63984		0.67378
> 3	0.59041		0.68084
> 4	0.58493		0.68274
> 5	0.56972		0.68285
> 6	0.55712		0.68331
> 7	0.56109		0.68279
> 8	0.55123		0.68262
> 9	0.55209		0.68279
Final epoch model saved to: /gpfs_projects/yuhang.zhang/OUT/2022_CXR/juypter_indirect_test/RAND_1/baseline/pytorch_last_epoch_model.onnx
END.


## Model Inference

After model training is done, you can deploy the models on the independent testing set by running the following cell. The inference code saves the prediction scores as *results__.tsv* files in the same directory as each model.

In [5]:
# experiment model inference
for task in task_list:
    for n in frozen_layers:
        %run ../src/utils/model_inference.py \
            -i "{main_dir}/independent_test.csv" \
            -w "{main_dir}/{task}_{n}_frozen_layer/pytorch_last_epoch_model.onnx" \
            -g 0 \
            -l "{main_dir}/{task}_{n}_frozen_layer/inference_log.log"
# baseline model inference
%run ../src/utils/model_inference.py \
            -i "{main_dir}/independent_test.csv" \
            -w "{main_dir}/baseline/pytorch_last_epoch_model.onnx" \
            -g 0 \
            -l "{main_dir}/baseline/inference_log.log"

Start inference...
Inferencing now ...
 There are 392 test samples in the list
 AUROC = 0.708116
 Time taken:  52.96520672100087
END.
Start inference...
Inferencing now ...
 There are 392 test samples in the list
 AUROC = 0.702676
 Time taken:  27.046851437997248
END.
Start inference...
Inferencing now ...
 There are 392 test samples in the list
 AUROC = 0.706320
 Time taken:  25.56717879700227
END.
Start inference...
Inferencing now ...
 There are 392 test samples in the list
 AUROC = 0.712464
 Time taken:  24.98355841100056
END.
Start inference...
Inferencing now ...
 There are 392 test samples in the list
 AUROC = 0.707622
 Time taken:  25.18999387599979
END.
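
The AUROC values printed above summarize how well the prediction scores rank positive cases above negative ones. As a reference for the metric itself (not the code in `model_inference.py`), AUROC can be computed as the fraction of positive-negative pairs in which the positive case scores higher, with ties counted as half. The labels and scores below are made up.

```python
def auroc(labels, scores):
    """Pairwise (Mann-Whitney) AUROC; labels are 0/1, ties count as 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]
print(f"AUROC = {auroc(labels, scores):.6f}")  # AUROC = 0.777778
```

The quadratic loop is adequate for a test set of 392 samples; a rank-based implementation would be preferable for much larger sets.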


## Bias Visualization

After inference, you can analyze the model bias by running the following code. The analysis code calculates the subgroup **predicted prevalence** and **AUROC**, and plots these measurements with respect to training disease prevalence differences between two subgroups.

In [7]:
# bias measurements and visualization
%run ../src/utils/bias_analysis.py \
    -d "{main_dir}" \
    -e "baseline" "M_1_frozen_layer" "M_9_frozen_layer" "F_1_frozen_layer" "F_9_frozen_layer" \
    -a "inductive transfer learning" \
    -r "results__.tsv" \
    -i "{main_dir}/independent_test.csv" \
    -s "sex" 


Start subgroup bias measurements

Done
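
Predicted prevalence is the fraction of a subgroup that the model labels positive. The sketch below computes it for each sex from made-up records. The 0.5 decision threshold, the record layout, and the function name are assumptions for illustration and are not taken from `bias_analysis.py`.

```python
from collections import defaultdict

def predicted_prevalence(records, threshold=0.5):
    """Fraction of each subgroup whose score is at or above the threshold."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for subgroup, score in records:
        totals[subgroup] += 1
        if score >= threshold:
            positives[subgroup] += 1
    return {group: positives[group] / totals[group] for group in totals}

# (sex, prediction score) pairs; values are invented for the example.
records = [("M", 0.81), ("M", 0.64), ("M", 0.35), ("M", 0.58),
           ("F", 0.22), ("F", 0.71), ("F", 0.40), ("F", 0.18)]
prevalence = predicted_prevalence(records)
print(prevalence)  # {'M': 0.75, 'F': 0.25}
print("gap (M - F):", prevalence["M"] - prevalence["F"])  # gap (M - F): 0.5
```

Comparing this gap between a fine-tuned model and the baseline is one way to read whether the bias between subgroups has been amplified.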

