# Introduction

A recent study confirmed the association between the gut microbiota and shifts in host health status: the gut microbiota is involved in colorectal cancer (CRC) progression [REF]. This raises the question of whether CRC progression can be monitored through the gut microbiota. We assessed the applicability of EXPERT to this task using the cancer data from Zeller, G. et al. Following that study, we considered five stages in CRC progression: 0 (healthy control), I, II, III, and IV. We first found that the compositional shifts of the human gut microbiota along this progression are not visible to some traditional methods, exemplified by Principal Coordinates Analysis (PCoA) using either weighted UniFrac or Jensen-Shannon divergence as the distance metric. In contrast, the cross-validation results accord with our hypothesis: the progression stage of CRC can be accurately monitored, and clearly better performance (ROC-AUC over 0.95 for stages I to IV) was achieved by the model built from the Disease Model. This demonstrates the applicability of EXPERT as a method for early detection of cancer and suggests considerable room for optimization through knowledge transfer in such monitoring systems.
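
For readers unfamiliar with the per-stage ROC-AUC quoted above, the following is a minimal sketch of a one-vs-rest ROC-AUC computed with NumPy through the rank-sum (Mann-Whitney) identity. It is not EXPERT's evaluation code; the function names `roc_auc` and `one_vs_rest_auc` and the toy scores are hypothetical and only illustrate the metric.

```python
import numpy as np


def roc_auc(labels, scores):
    """ROC-AUC via the rank-sum identity; ties receive average ranks."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    n_pos = int(labels.sum())
    n_neg = labels.size - n_pos
    if n_pos == 0 or n_neg == 0:
        # The metric is undefined when one class is absent.
        return float("nan")
    # Average 1-based rank of every distinct score value.
    _, inverse, counts = np.unique(scores, return_inverse=True, return_counts=True)
    avg_rank = np.cumsum(counts) - (counts - 1) / 2.0
    ranks = avg_rank[inverse]
    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0
    return float(u / (n_pos * n_neg))


def one_vs_rest_auc(true_stage, stage_scores, stages):
    """One ROC-AUC per stage: that stage is positive, all others negative."""
    true_stage = np.asarray(true_stage)
    stage_scores = np.asarray(stage_scores, dtype=float)
    return {
        stage: roc_auc(true_stage == stage, stage_scores[:, k])
        for k, stage in enumerate(stages)
    }


# Toy data (made up for illustration): 4 samples, 3 stages.
stages = ["0", "I", "II"]
true_stage = ["0", "I", "II", "I"]
scores = [
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
    [0.3, 0.6, 0.1],
]
aucs = one_vs_rest_auc(true_stage, scores, stages)
```

Note that a stage with no positive samples in a given fold has no defined ROC-AUC, which this sketch reports as NaN.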

# Reproducibility statement

- EXPERT supports completely reproducible optimization & inference.
- Processed data are provided for reproducing the results; the original data can be found under `dataFiles/`.
- Rerunning the entire notebook with the configuration below should yield **completely consistent** results (compared to those reported in our paper).
- Session information
    - EXPERT (version 0.3)
    - Python (version 3.8.2)
    - TensorFlow (version 2.3.1)
    - Pandas (version 1.1.3)
    - NumPy (version 1.18.5)
    - ETE3 (version 3.1.2)
    - NCBI taxonomy database (released [2020-09-01](https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump_archive/))
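
Before rerunning the notebook, it may help to confirm that the installed packages match the versions listed above. The sketch below uses the standard-library `importlib.metadata` module (available from Python 3.8). The helper name `find_mismatches` is hypothetical, and EXPERT itself is left out of the table because its distribution name on PyPI is not stated here.

```python
from importlib import metadata

# Pinned versions copied from the session information above.
# Keys are assumed to be the PyPI distribution names.
PINNED = {
    "tensorflow": "2.3.1",
    "pandas": "1.1.3",
    "numpy": "1.18.5",
    "ete3": "3.1.2",
}


def find_mismatches(pinned, get_version=metadata.version):
    """Return {name: (expected, installed)} for every package that differs.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {installed}")
```

An empty result means the Python packages match the list; the Python interpreter and NCBI taxonomy release still need to be checked separately.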

## Process
The following sections reproduce the results reported in our paper. For detailed configuration and interpretation of the results, please read the paper first.

### Optimization
- `--finetune`: enable fine-tuning for further optimization.
- `--update-statistics`: update statistics for Z-score standardization.
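
As an illustration of what updating the standardization statistics can mean, the sketch below contrasts reusing a base model's mean and standard deviation with recomputing them on the new data. This is an assumption about the intent of `--update-statistics`, not a description of EXPERT's internals; the function names and the toy abundance values are hypothetical.

```python
import numpy as np


def fit_statistics(x):
    """Per-feature mean and standard deviation; constant features get std 1."""
    mean = x.mean(axis=0)
    std = x.std(axis=0)
    std = np.where(std == 0, 1.0, std)
    return mean, std


def standardize(x, mean, std):
    """Z-score standardization with the given statistics."""
    return (x - mean) / std


# Toy abundance tables (made up): rows are samples, columns are features.
source = np.array([[0.2, 0.8], [0.4, 0.6], [0.6, 0.4]])  # data behind the base model
target = np.array([[0.7, 0.1], [0.9, 0.3]])              # new data used for transfer

# Reusing the base model's statistics leaves the new data off-center.
src_mean, src_std = fit_statistics(source)
reused = standardize(target, src_mean, src_std)

# Updating the statistics re-centers and re-scales on the new data.
tgt_mean, tgt_std = fit_statistics(target)
updated = standardize(target, tgt_mean, tgt_std)
```

With updated statistics, each feature of the new data has zero mean and unit variance, which is the usual motivation for refreshing them when the new cohort differs from the one behind the base model.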

```bash
%%bash
for i in {0,1,2,3,4}; do
    # Train an independent model (no transfer) on this experiment's source data.
    expert train -i experiments/exp_$i/SourceCM.h5 -t ontology.pkl -l experiments/exp_$i/SourceLabels.h5 -o experiments/exp_$i/Independent;
    # Transfer from the model trained in ../Human-assessment (output: Transfer_HM).
    expert transfer -i experiments/exp_$i/SourceCM.h5 -t ontology.pkl \
        -l experiments/exp_$i/SourceLabels.h5 -o experiments/exp_$i/Transfer_HM \
        -m ../Human-assessment/experiments/exp_1/Independent/ \
        --finetune --update-statistics;
    # Transfer from the model trained in ../Disease-diagnosis (output: Transfer_DM).
    expert transfer -i experiments/exp_$i/SourceCM.h5 -t ontology.pkl \
        -l experiments/exp_$i/SourceLabels.h5 -o experiments/exp_$i/Transfer_DM \
        -m ../Disease-diagnosis/experiments/exp_0/Independent/ \
        --finetune --update-statistics;
done
```

### Quantifying source contributions

- `--measure-unknown`: measure the contribution from unknown source(s).

```bash
%%bash
for i in {0,1,2,3,4}; do
    # Query each of the three models with this experiment's query samples.
    expert search -i experiments/exp_$i/QueryCM.h5 -m experiments/exp_$i/Independent -o experiments/exp_$i/Search_Independent;
    expert search -i experiments/exp_$i/QueryCM.h5 -m experiments/exp_$i/Transfer_HM -o experiments/exp_$i/Search_Transfer_HM;
    expert search -i experiments/exp_$i/QueryCM.h5 -m experiments/exp_$i/Transfer_DM -o experiments/exp_$i/Search_Transfer_DM;
done
```

### Evaluating performances
- `-S`: set the threshold for evaluation.

```bash
%%bash
for i in {0,1,2,3,4}; do
    # Evaluate each set of search results against the query labels.
    expert evaluate -i experiments/exp_$i/Search_Independent -l experiments/exp_$i/QueryLabels.h5 -o experiments/exp_$i/Eval_Independent -S 0;
    expert evaluate -i experiments/exp_$i/Search_Transfer_HM -l experiments/exp_$i/QueryLabels.h5 -o experiments/exp_$i/Eval_Transfer_HM -S 0;
    expert evaluate -i experiments/exp_$i/Search_Transfer_DM -l experiments/exp_$i/QueryLabels.h5 -o experiments/exp_$i/Eval_Transfer_DM -S 0;
done
```

```
Reordering labels and prediction result
Reordering labels and prediction result for samples
Running evaluation...
Evaluating biome source: root:CRC (stage 0)
      TN  FP  FN  TP  Acc   Sn   Sp  TPR  FPR   Rc   Pr  F1  ROC-AUC  F-max
t
0.00   0  62   0   0  0.0  0.0  0.0  0.0  1.0  0.0  0.0 NaN      0.0    NaN
0.01   0  62   0   0  0.0  0.0  0.0  0.0  1.0  0.0  0.0 NaN      0.0    NaN
0.02   0  62   0   0  0.0  0.0  0.0  0.0  1.0  0.0  0.0 NaN      0.0    NaN
0.03   0  62   0   0  0.0  0.0  0.0  0.0  1.0  0.0  0.0 NaN      0.0    NaN
0.04   0  62   0   0  0.0  0.0  0.0  0.0  1.0  0.0  0.0 NaN      0.0    NaN
...   ..  ..  ..  ..  ...  ...  ...  ...  ...  ...  ...  ..      ...    ...
0.97  62   0   0   0  1.0  0.0  1.0  0.0  0.0  0.0  0.0 NaN      0.0    NaN
0.98  62   0   0   0  1.0  0.0  1.0  0.0  0.0  0.0  0.0 NaN      0.0    NaN
0.99  62   0   0   0  1.0  0.0  1.0  0.0  0.0  0.0  0.0 NaN      0.0    NaN
1.00
```
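
To help read the columns of the evaluation table, the sketch below computes the confusion counts and a few derived metrics at a single threshold `t`. It is not EXPERT's implementation: the function name `metrics_at_threshold` is hypothetical, the rule "predict positive when the score is at least `t`" is an assumption, and the toy scores are made up to mimic a fold with 62 negative samples and no positives.

```python
import numpy as np


def metrics_at_threshold(labels, scores, t):
    """Confusion counts and basic metrics, predicting positive when score >= t."""
    labels = np.asarray(labels, dtype=bool)
    predicted = np.asarray(scores, dtype=float) >= t
    tp = int(np.sum(predicted & labels))
    fp = int(np.sum(predicted & ~labels))
    fn = int(np.sum(~predicted & labels))
    tn = int(np.sum(~predicted & ~labels))

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN here.
        return num / den if den > 0 else float("nan")

    sn = ratio(tp, tp + fn)  # sensitivity, recall, TPR
    sp = ratio(tn, tn + fp)  # specificity
    pr = ratio(tp, tp + fp)  # precision
    f1 = 2 * pr * sn / (pr + sn) if pr + sn > 0 else float("nan")
    return {
        "TN": tn, "FP": fp, "FN": fn, "TP": tp,
        "Acc": ratio(tp + tn, labels.size),
        "Sn": sn, "Sp": sp, "FPR": ratio(fp, fp + tn),
        "Pr": pr, "F1": f1,
    }


# Toy fold: 62 negative samples, no positives, scores between 0.05 and 0.9.
labels = np.zeros(62, dtype=bool)
scores = np.linspace(0.05, 0.9, 62)
low = metrics_at_threshold(labels, scores, 0.0)    # everything predicted positive
high = metrics_at_threshold(labels, scores, 0.97)  # everything predicted negative
```

In a fold with no positive samples, every prediction at a low threshold is a false positive and every prediction at a high threshold is a true negative, which is the pattern seen in the table. The table prints 0.0 in the Sn column for such a fold, whereas this sketch returns NaN for ratios with an empty denominator.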

## Support
For help reproducing the results, please email huichong.me@gmail.com.