# Step 1: Data Preprocessing
Steps:
1. Import and process the data
2. Find the common sample ids
3. Save the common sample ids
4. Save the labels
5. Save the features
6. Train test split
7. Save the train and test data
8. Differential analysis
9. Filter omic features using the differential analysis results
10. Save the filtered differential analysis results
11. Split the omic data into train and test
12. Normalize the omic data with respect to the train data
13. Transform the test data with respect to the train data
14. Save the train and test data
15. Stack all the omics data for CML
16. Save the train and test data for CML
17. Save the common features for CML
18. Save the labels for CML
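
The normalization steps in the list above (fit on the train data, then transform the test data with the train statistics) can be sketched as follows. This is a minimal illustration that assumes z-score scaling; the source does not say which normalization `preprocessing.py` applies, and the helper names `fit_scaler` and `transform` are hypothetical.

```python
from statistics import mean, pstdev


def fit_scaler(train_rows):
    """Compute per-feature mean and standard deviation from the training rows only."""
    columns = list(zip(*train_rows))
    means = [mean(col) for col in columns]
    # A constant feature has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def transform(rows, means, stds):
    """Scale rows with statistics that were estimated elsewhere (here: on the train set)."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


# Toy data (illustrative values, not taken from the dataset).
train = [[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]]
test = [[3.0, 12.0]]

means, stds = fit_scaler(train)                # statistics come from the train split only
train_scaled = transform(train, means, stds)
test_scaled = transform(test, means, stds)     # the test split reuses the train statistics
print(test_scaled)
```

Reusing the train statistics for the test split keeps information about the test samples from leaking into the preprocessing.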

In [14]:
!python3 preprocessing.py --data_folder "0_new_data_0.05" \
                        --stratify \
                        --strat_col_list '["Trimester", "fetal_near_miss"]' \
                        --test_size 0.2 \
                        --omic_normalize_dict '{"1": true, "2": true}' \
                        --da_omics_dict '{"1": true, "2": true}' \
                        --filter_wrt_dict '{"1": "p-value(f-test)", "2": "p-value(f-test)"}' \
                        --method_dict '{"1": "bonferroni", "2": "bonferroni"}' \
                        --da_threshold_dict '{"1": 0.05, "2": 0.2}' \
                        --CV \
                        --n_splits 5

Preprocessing with the following parameters:
Data Folder: 0_new_data_0.05
Stratify: True
Stratification Columns: ['Trimester', 'fetal_near_miss']
Test Size: 0.2
Omic Normalize Dict: {1: True, 2: True}
DA Omics Dict: {1: True, 2: True}
Filter With Respect Dict: {1: 'p-value(f-test)', 2: 'p-value(f-test)'}
Method Dict: {1: 'bonferroni', 2: 'bonferroni'}
DA Threshold Dict: {1: 0.05, 2: 0.2}
Cross Validation: True
Number of Splits: 5
Sample ids saved as pickle file at:  /Users/anand/Documents/projects/mogonet_cml/data/0_new_data_0.05/samples
Filtering omic data for omic: 1
Filtering omic data for omic: 2
Cross validation
Splitting the id data into train and test
Split:  0
For cv split 0: Splitting the omic data into train and test saving them plus saving the respective features
Split:  1
For cv split 1: Splitting the omic data into train and test saving them plus saving the respective features
Split:  2
For cv split 2: Splitting the omic data into train and test saving them plus saving the
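
The command above selects `bonferroni` as the correction method and uses thresholds of 0.05 (omic 1) and 0.2 (omic 2). As a rough sketch of what such a filter could look like, the snippet below multiplies each p-value by the number of tests and keeps features whose adjusted value falls below the threshold. The function name, the strict `<` comparison, and the p-values are assumptions for illustration and are not taken from `preprocessing.py`.

```python
def bonferroni_filter(p_values, threshold):
    """Return the feature names whose Bonferroni-adjusted p-value is below threshold."""
    n_tests = len(p_values)
    kept = []
    for name, p in p_values.items():
        adjusted = min(1.0, p * n_tests)  # Bonferroni: scale by the number of tests, cap at 1
        if adjusted < threshold:
            kept.append(name)
    return kept


# Hypothetical p-values for four features (not real results).
p_values = {"feat_a": 0.001, "feat_b": 0.02, "feat_c": 0.04, "feat_d": 0.6}

print(bonferroni_filter(p_values, 0.05))  # threshold used for omic 1
print(bonferroni_filter(p_values, 0.2))   # threshold used for omic 2
```

With these toy values the stricter threshold keeps one feature and the looser one keeps three, which shows how the per-omic threshold controls the number of features passed on to training.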

# Step 2.1: MOGONET Training and Testing

In [21]:
!python3 mogonet.py --data_folder "0_new_data_0.05"\
    --stratify \
    --CV \
    --n_splits 5 \
    --num_epoch_pretrain 400 \
    --num_epoch 1200 \
    --test_interval 50 \
    --lr_e_pretrain 1e-3 \
    --lr_e 5e-3 \
    --lr_c 1e-3 \
    --num_class 2 \
    --adj_parameter 2 \
    --dim_he_list '[200, 200, 100]'

Running MOGONET
Data folder:  0_new_data_0.05
Stratify:  True
CV:  True
Number of splits:  5
Number of pretrain epochs:  400
Number of epochs:  1200
Test interval:  50
Learning rate for pretraining:  0.001
Learning rate for encoder:  0.005
Learning rate for classifier:  0.001
Number of classes:  2
Adjacency parameter:  2
Dimension list for hidden layers:  [200, 200, 100]

Pretrain GCNs...
Preptarin epoch 0
Preptarin epoch 50
Preptarin epoch 100
Preptarin epoch 150
Preptarin epoch 200
Preptarin epoch 250
Preptarin epoch 300
Preptarin epoch 350

Training...

Test: Epoch 0
Test ACC: 0.241
Test F1: 0.214
Test AUC: 0.564

Test: Epoch 50
Test ACC: 0.897
Test F1: 0.000
Test AUC: 0.692

Test: Epoch 100
Test ACC: 0.897
Test F1: 0.000
Test AUC: 0.692

Test: Epoch 150
Test ACC: 0.897
Test F1: 0.000
Test AUC: 0.692

Test: Epoch 200
Test ACC: 0.897
Test F1: 0.000
Test AUC: 0.641

Test: Epoch 250
Test ACC: 0.897
Test F1: 0.000
Test AUC: 0.667

Test: Epoch 300
Test ACC: 0.897
Test F1: 0.000
Test AUC:
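
An accuracy of 0.897 together with an F1 of 0.000 is the pattern produced when a classifier predicts the majority class for every sample. The check below assumes a test fold of 26 negative and 3 positive samples, the class counts shown in the CML confusion matrix in Step 2.2; whether MOGONET was evaluated on exactly that fold is an assumption.

```python
def accuracy_and_f1(y_true, y_pred):
    """Compute accuracy and the F1 score of the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    acc = correct / len(y_true)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return acc, f1


y_true = [0] * 26 + [1] * 3   # assumed class counts: 26 negatives, 3 positives
y_pred = [0] * 29             # always predict the majority (negative) class

acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"ACC = {acc:.3f}, F1 = {f1:.3f}")  # ACC = 0.897, F1 = 0.000
```

Under that assumption, the flat accuracy from epoch 50 onward reflects class imbalance rather than learned discrimination, so F1 and AUC are the more informative columns to watch.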

# Step 2.2: CML Training and Testing

In [2]:
!python3 cml.py --data_folder "0_new_data_0.05" \
            --cml_models '["XGBC", "DTC", "RFC", "SVC", "LRC", "ElasticNet"]' \
            --n_splits 5 \
            --CV \
            --stratify \
            --no_test_mode

Running CML model training
Data folder: 0_new_data_0.05
CML model: XGBC
Number of splits: 5
CV: True
Stratify: True
Test mode: False
exp. name =XGBC_gridsearch_20250224-164050
Cross validation
/Users/anand/Documents/projects/mogonet_cml/data/0_new_data_0.05/strat/CV/CV_0

Training the model for the trial 0 :
Performing a gridsearch
 - best model parameters:{'model__learning_rate': 0.1, 'model__max_depth': 3, 'model__n_estimators': 150}
Predicting and evaluating...
test scores: ACC = 0.931, F1 = 0.500, AUC = 0.667, (n:29)
Confusion Matrix: 
 [[26  0]
 [ 2  1]]
Saved figure: /Users/anand/Documents/projects/mogonet_cml/data/0_new_data_0.05/strat/CV/models/CML/XGBC_gridsearch_20250224-164050/CV_0/XGBC_gridsearch_0_gain_lift.png

Saved the options, scores and pipeline for trial 0 in: /Users/anand/Documents/projects/mogonet_cml/data/0_new_data_0.05/strat/CV/models/CML/XGBC_gridsearch_20250224-164050/CV_0
/Users/anand/Documents/projects/mogonet_cml/data/0_new_data_0.05/strat/CV/CV_1

Training
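
The trial 0 scores above can be reproduced from the confusion matrix. The sketch assumes the scikit-learn layout `[[TN, FP], [FN, TP]]` (rows are true labels, columns are predictions); the function name is hypothetical.

```python
def scores_from_confusion(tn, fp, fn, tp):
    """Derive accuracy, F1 and balanced accuracy from binary confusion-matrix counts."""
    total = tn + fp + fn + tp
    acc = (tn + tp) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    balanced_acc = (recall + specificity) / 2
    return acc, f1, balanced_acc


# Counts read from the trial 0 confusion matrix [[26, 0], [2, 1]].
acc, f1, bal = scores_from_confusion(tn=26, fp=0, fn=2, tp=1)
print(f"ACC = {acc:.3f}, F1 = {f1:.3f}, balanced accuracy = {bal:.3f}")
# ACC = 0.931, F1 = 0.500, balanced accuracy = 0.667
```

The balanced accuracy equals the reported AUC of 0.667. ROC AUC reduces to balanced accuracy when it is computed from hard 0/1 predictions instead of probabilities, so the match is consistent with that, although the source does not state how `cml.py` computes AUC.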

# Step 3: Results aggregation

In [10]:
!python3 result_aggregator.py --data_folder "0_new_data_0.05" \
                            --CV \
                            --stratify \
                            --num_epoch_pretrain 400 \
                            --num_epoch 1200

Aggregating results
Data folder: 0_new_data_0.05
CV: True
Stratify: True
Number of pretrain epochs: 400
Number of epochs: 1200
Final scores saved at:  /Users/anand/Documents/projects/mogonet_cml/data/0_new_data_0.05/results/final_scores_strat_CV.csv
Saved figure at:  /Users/anand/Documents/projects/mogonet_cml/data/0_new_data_0.05/results/new_data_orig_0.05strat_CV.png
Figure(800x600)


# Step 4: Biomarker identification

In [20]:
!python3 biomarker.py --data_folder "0_new_data_0.05" \
                    --stratify \
                    --CV \
                    --n_splits 5 \
                    --num_epoch_pretrain 400 \
                    --num_epoch 1200 \
                    --num_class 2 \
                    --adj_parameter 2 \
                    --dim_he_list "[200, 200, 100]"

Calculating feature importance
Data folder: 0_new_data_0.05
Stratify: True
CV: True
Number of splits: 5
Number of pretrain epochs: 400
Number of epochs: 1200
Model folder: /Users/anand/Documents/projects/mogonet_cml/data/0_new_data_0.05/strat/CV/models/400_1200
['20250224-160935']
Module E1 loaded!
Module C1 loaded!
Module E2 loaded!
Module C2 loaded!
Module C loaded!
^C
KeyboardInterrupt
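
The run above was interrupted before any importance scores were printed. For orientation, the original MOGONET work ranks features by how much performance drops when a feature is set to zero. The sketch below illustrates that ablation idea on a toy classifier; it assumes `biomarker.py` follows a similar approach, which the source does not confirm, and every name in it is hypothetical.

```python
def f1_score(y_true, y_pred):
    """F1 score of the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def ablation_importance(predict, X, y):
    """Importance of feature j = baseline F1 minus F1 with feature j zeroed out."""
    baseline = f1_score(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        X_ablated = [row[:j] + [0.0] + row[j + 1:] for row in X]
        importances.append(baseline - f1_score(y, predict(X_ablated)))
    return importances


def toy_predict(X):
    """Stand-in model: predicts 1 when the first feature exceeds 0.5, ignores the second."""
    return [1 if row[0] > 0.5 else 0 for row in X]


# Toy data (illustrative values only).
X = [[0.9, 0.3], [0.8, 0.7], [0.1, 0.4], [0.2, 0.9]]
y = [1, 1, 0, 0]

print(ablation_importance(toy_predict, X, y))  # [1.0, 0.0]
```

The feature the toy model relies on gets the full F1 drop, while the ignored feature scores zero.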


# Run everything in the following order:
### 1. Data Preprocessing
### 2. MOGONET Training and Testing
### 3. CML Training and Testing
### 4. Results aggregation

In [None]:
!python3 main.py --data_folder "0_new_data" \
                --stratify \
                --CV \
                --n_splits 5 \
                --num_epoch_pretrain 400 \
                --num_epoch 1200 \
                --test_interval 50 \
                --lr_e_pretrain 1e-3 \
                --lr_e 5e-3 \
                --lr_c 1e-3 \
                --num_class 2 \
                --adj_parameter 2 \
                --dim_he_list "[200, 200, 100]"
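
The list- and dict-valued flags used throughout this notebook (for example `--dim_he_list` and `--omic_normalize_dict`) are passed as JSON strings, and the preprocessing output echoes the dictionaries with integer keys. One way a script could parse them is sketched below with `argparse` and `json`. This is an assumption about the approach, not the actual code of `main.py` or `preprocessing.py`, and `json_int_key_dict` is a hypothetical helper.

```python
import argparse
import json


def json_int_key_dict(text):
    """Parse a JSON object and convert its string keys to integers."""
    return {int(key): value for key, value in json.loads(text).items()}


parser = argparse.ArgumentParser()
parser.add_argument("--dim_he_list", type=json.loads, default=[200, 200, 100])
parser.add_argument("--omic_normalize_dict", type=json_int_key_dict, default={})
parser.add_argument("--stratify", action="store_true")

# Parse an explicit argument list so the example runs without a command line.
args = parser.parse_args([
    "--dim_he_list", "[200, 200, 100]",
    "--omic_normalize_dict", '{"1": true, "2": true}',
    "--stratify",
])

print(args.dim_he_list)          # [200, 200, 100]
print(args.omic_normalize_dict)  # {1: True, 2: True}
```

JSON requires lowercase `true`/`false` and double-quoted keys, which is why the shell commands wrap the dictionaries in single quotes.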