# Introduction

A recent study confirmed an association between the gut microbiota and shifts in host health status: the gut microbiota is involved in colorectal cancer (CRC) progression [REF]. It is therefore worth investigating whether CRC progression can be monitored through the gut microbiota. We assessed the applicability of EXPERT to this task using the cancer data from Zeller, G. et al. Following that study, we considered five stages of CRC progression: 0 (healthy control), I, II, III, and IV. We first found that the compositional shifts of the human gut microbiota across these stages are invisible to some traditional methods, exemplified by Principal Coordinates Analysis (PCoA) using either weighted UniFrac or Jensen-Shannon divergence as the distance metric. The cross-validation results, however, accord with our hypothesis: the progression stage of CRC can be accurately monitored, and clearly better performance (ROC-AUC over 0.95 for stages I to IV) was achieved by the model built from the Disease Model. This supports the superior applicability of EXPERT as a method for early detection of cancer and suggests considerable room for optimizing such monitoring systems through knowledge transfer.
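
For readers unfamiliar with the ordination baseline mentioned above, the following is a minimal sketch of PCoA on a Jensen-Shannon distance matrix. It is not the code used in the paper: the abundance values are hypothetical toy data, the function names are illustrative, and the square root of the divergence is used as the distance (an assumption; the notebook does not state which form was used).

```python
import numpy as np

def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two abundance vectors."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()          # normalise to relative abundances
    q = q / q.sum()
    m = 0.5 * (p + q)        # mixture distribution

    def kl(a, b):
        mask = a > 0         # terms with a == 0 contribute 0
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def pcoa(dist, n_axes=2):
    """Classical PCoA: double-centre squared distances, then eigendecompose."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_axes]   # largest eigenvalues first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Negative eigenvalues (non-Euclidean residue) are clipped to zero.
    return eigvecs * np.sqrt(np.clip(eigvals, 0, None))

# Hypothetical toy data: 4 samples (rows) x 3 taxa (columns).
abundance = np.array([
    [0.70, 0.20, 0.10],
    [0.65, 0.25, 0.10],
    [0.10, 0.30, 0.60],
    [0.15, 0.25, 0.60],
])

n_samples = abundance.shape[0]
dist = np.zeros((n_samples, n_samples))
for i in range(n_samples):
    for j in range(i + 1, n_samples):
        d = np.sqrt(jensen_shannon_divergence(abundance[i], abundance[j]))
        dist[i, j] = dist[j, i] = d

coords = pcoa(dist, n_axes=2)
print(coords)
```

Each row of `coords` is one sample's position on the first two principal coordinates; plotting these points coloured by stage gives the kind of ordination the paragraph above refers to.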

# Reproducibility statement

- EXPERT supports fully reproducible optimization and inference.
- Processed data are provided for reproducing the results; the original data can be found under `dataFiles/`.
- Rerunning the entire notebook with the configuration below should yield results **completely consistent** with those reported in our paper.
- Session information
    - EXPERT (version 0.3)
    - Python (version 3.8.2)
    - TensorFlow (version 2.3.1)
    - Pandas (version 1.1.3)
    - NumPy (version 1.18.5)
    - ETE3 (version 3.1.2)
    - NCBI taxonomy database (released [2020-09-01](https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump_archive/))

## Process
The following sections reproduce the results reported in our paper. For detailed configuration and interpretation of the results, please read the paper first.

### Optimization
- `--finetune`: enable fine-tuning for further optimization.
- `--update-statistics`: update statistics for Z-score standardization.
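
To illustrate what updating the Z-score statistics means, here is a minimal sketch that recomputes per-feature mean and standard deviation on new data and standardizes with them. It is not EXPERT's implementation: the function names and toy values are hypothetical, and both the population standard deviation and the handling of zero-variance features are assumptions.

```python
import numpy as np

def fit_statistics(abundance):
    """Per-feature mean and standard deviation (columns are features)."""
    return abundance.mean(axis=0), abundance.std(axis=0)

def standardize(abundance, mean, std):
    """Z-score each feature; zero-variance features stay at 0."""
    # Assumption: replace a zero std with 1 to avoid division by zero.
    safe_std = np.where(std > 0, std, 1.0)
    return (abundance - mean) / safe_std

# Hypothetical toy data: 2 samples x 2 features; the first feature is absent
# in every sample, like the all-zero rows in the statistics table below.
target = np.array([
    [0.0, 0.01],
    [0.0, 0.03],
])

mean, std = fit_statistics(target)   # statistics refreshed on the new data
z = standardize(target, mean, std)
print(z)
```

Refreshing the statistics on the target data, rather than reusing those of the source model, keeps the standardized inputs centred for the new cohort.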

In [1]:
%%bash
for i in {0,1,2,3,4}; do
    expert transfer -i experiments/exp_$i/SourceCM.h5 -t ontology.pkl \
        -l experiments/exp_$i/SourceLabels.h5 -o experiments/exp_$i/Transfer_DM \
        -m ../Disease-diagnosis/experiments/exp_0/Independent/ --finetune --update-statistics;
done

Reordering labels and samples...
Total matched samples: 166
Total correct samples: 166?166
           mean       std
0      0.000000  0.000000
1      0.000000  0.000000
2      0.000000  0.000000
3      0.007968  0.025971
4      0.007889  0.025974
...         ...       ...
18013  0.000123  0.000331
18014  0.000000  0.000000
18015  0.000281  0.001178
18016  0.000010  0.000052
18017  0.000000  0.000000

[18018 rows x 2 columns]
Training using optimizer with lr=0.001...
Epoch 1/300
Epoch 2/300
Epoch 3/300
Epoch 4/300
Epoch 5/300
Epoch 6/300
Epoch 00006: ReduceLROnPlateau reducing learning rate to 0.00010000000474974513.
Epoch 7/300
Epoch 8/300
Epoch 9/300
Epoch 10/300
Epoch 11/300
Epoch 00011: ReduceLROnPlateau reducing learning rate to 1.0000000474974514e-05.
Epoch 12/300
Epoch 13/300
Epoch 14/300
Epoch 15/300
Epoch 16/300
Epoch 00016: ReduceLROnPlateau reducing learning rate to 1e-05.
Restoring model weights from the end of the best epoch.
Epoch 00016: early stopping
Model: "functional_5

2021-01-14 14:36:40.984698: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-01-14 14:36:40.999543: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2499850000 Hz
2021-01-14 14:36:41.006244: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55a9d62ee620 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-01-14 14:36:41.006277: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.
2021-01-14 14:36:54.911966: W tensorflow/python/util/util.cc:348] Sets are not currently considered sequ

### Quantifying source contributions

- `--measure-unknown`: measure the contribution from unknown source(s).

In [3]:
%%bash
for i in {0,1,2,3,4}; do
    expert search -i experiments/exp_$i/QueryCM.h5 -m experiments/exp_$i/Transfer_DM -o experiments/exp_$i/Search_Transfer_DM;
done



### Evaluating performances
- `-S`: set the threshold for evaluation.

In [4]:
%%bash
for i in {0,1,2,3,4}; do
    expert evaluate -i experiments/exp_$i/Search_Transfer_DM -l experiments/exp_$i/QueryLabels.h5 -o experiments/exp_$i/Eval_Transfer_DM -S 0;
done

Reordering labels and prediction result
Reordering labels and prediction result for samples
Running evaluation...
Evaluating biome source: root:D
      TN  FP  FN  TP     Acc   Sn  ...  FPR   Rc      Pr      F1  ROC-AUC   F-max
t                                  ...                                           
0.00   0  11   0  24  0.6857  1.0  ...  1.0  1.0  0.6857  0.8136   0.7674  0.8727
0.01   0  11   0  24  0.6857  1.0  ...  1.0  1.0  0.6857  0.8136   0.7674  0.8727
0.02   0  11   0  24  0.6857  1.0  ...  1.0  1.0  0.6857  0.8136   0.7674  0.8727
0.03   0  11   0  24  0.6857  1.0  ...  1.0  1.0  0.6857  0.8136   0.7674  0.8727
0.04   0  11   0  24  0.6857  1.0  ...  1.0  1.0  0.6857  0.8136   0.7674  0.8727
...   ..  ..  ..  ..     ...  ...  ...  ...  ...     ...     ...      ...     ...
0.97  11   0  24   0  0.3143  0.0  ...  0.0  0.0  0.0000     NaN   0.7674  0.8727
0.98  11   0  24   0  0.3143  0.0  ...  0.0  0.0  0.0000     NaN   0.7674  0.8727
0.99  11   0  24   0  0.3143  0.0 
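
The columns of the evaluation table can be related to the confusion counts with a short sketch. The function below is illustrative and not part of EXPERT; the conventions for empty denominators (precision reported as 0, F1 as NaN) are assumptions chosen to mirror the rows printed above. Applied to the counts of the first row (TN=0, FP=11, FN=0, TP=24), it reproduces the Acc, Pr, and F1 values shown there.

```python
import math

def confusion_metrics(tn, fp, fn, tp):
    """Derive threshold-level metrics from confusion-matrix counts."""
    total = tn + fp + fn + tp
    acc = (tp + tn) / total
    # Sensitivity (Sn) is the same quantity as recall (Rc).
    sn = tp / (tp + fn) if (tp + fn) else math.nan
    fpr = fp / (fp + tn) if (fp + tn) else math.nan
    # Assumption: precision is reported as 0 when nothing is predicted positive.
    pr = tp / (tp + fp) if (tp + fp) else 0.0
    # Assumption: F1 is undefined (NaN) when precision + recall is 0.
    f1 = 2 * pr * sn / (pr + sn) if (pr + sn) > 0 else math.nan
    return {"Acc": acc, "Sn": sn, "FPR": fpr, "Pr": pr, "F1": f1}

# Counts taken from the t = 0.00 and t = 0.97 rows of the table above.
low_threshold = confusion_metrics(tn=0, fp=11, fn=0, tp=24)
high_threshold = confusion_metrics(tn=11, fp=0, fn=24, tp=0)

for name, metrics in [("t=0.00", low_threshold), ("t=0.97", high_threshold)]:
    print(name, {k: round(v, 4) for k, v in metrics.items()})
```

At the lowest threshold every sample is called positive, so sensitivity is 1.0 while accuracy equals the positive fraction; at the highest thresholds nothing is called positive and the picture inverts.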



## Support
For help reproducing the results, please email huichong.me@gmail.com.