# Introduction

A recent study confirmed an association between the gut microbiota and shifts in host health status: the gut microbiota is involved in the progression of colorectal cancer (CRC) [REF]. It is therefore worth investigating whether CRC progression can be monitored through the gut microbiota. We assessed the applicability of EXPERT to this task using the cancer data from Zeller, G. et al. Following that study, we considered five stages of CRC progression: 0 (healthy control), I, II, III, and IV. We first found that the compositional shifts of the human gut microbiota across these stages are not visible to some traditional methods, exemplified by Principal Coordinates Analysis (PCoA) using either weighted UniFrac or Jensen-Shannon divergence as the distance metric. The cross-validation results, however, accord with our hypothesis: the progression stage of CRC can be accurately monitored, and markedly better performance (ROC-AUC over 0.95 for stages I to IV) was achieved by the model built from the Disease Model. This demonstrates the applicability of EXPERT as a method for early detection of cancer and suggests considerable room for optimizing such monitoring systems through knowledge transfer.
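
For readers unfamiliar with the baseline mentioned above, the following is a minimal sketch of PCoA on Jensen-Shannon distances, written with NumPy only. It is not the code used in the paper: the function names `jensen_shannon_distance` and `pcoa` and the toy abundance table are illustrative assumptions, and the square root of the base-2 divergence is used as the distance.

```python
import numpy as np

def jensen_shannon_distance(p, q):
    """Square root of the base-2 Jensen-Shannon divergence between two distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute nothing
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    jsd = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return np.sqrt(max(jsd, 0.0))

def pcoa(dist, n_axes=2):
    """Classical PCoA: double-centre the squared distances, then eigendecompose."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1]  # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    positive = np.clip(eigvals, 0.0, None)  # discard tiny negative eigenvalues
    coords = eigvecs[:, :n_axes] * np.sqrt(positive[:n_axes])
    explained = positive[:n_axes] / positive.sum()
    return coords, explained

# Toy relative-abundance table (rows: samples, columns: taxa); values are made up.
abundance = np.array([
    [0.70, 0.20, 0.10, 0.00],
    [0.65, 0.25, 0.05, 0.05],
    [0.10, 0.10, 0.40, 0.40],
    [0.05, 0.15, 0.45, 0.35],
])
n_samples = abundance.shape[0]
distances = np.zeros((n_samples, n_samples))
for i in range(n_samples):
    for j in range(i + 1, n_samples):
        d = jensen_shannon_distance(abundance[i], abundance[j])
        distances[i, j] = distances[j, i] = d

coords, explained = pcoa(distances)
print(coords.shape, explained.round(3))
```

Plotting the two columns of `coords`, coloured by stage, gives the kind of ordination in which the stage-wise shifts were reported to be hard to see.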

# Reproducibility statement

- EXPERT supports fully reproducible optimization and inference.
- Processed data are provided for reproducing the results; the original data can be found under `dataFiles/`.
- Rerunning the entire notebook with the configuration below should yield results **completely consistent** with those reported in our paper.
- Session information
    - EXPERT (version 0.3)
    - Python (version 3.8.2)
    - TensorFlow (version 2.3.1)
    - Pandas (version 1.1.3)
    - NumPy (version 1.18.5)
    - ETE3 (version 3.1.2)
    - NCBI taxonomy database (released [2020-09-01](https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump_archive/))

## Process
The following sections reproduce the results reported in our paper. For detailed configuration and interpretation of the results, please read our original paper first.
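
The three steps in this section (transfer, search, evaluate) are run once per experiment fold. As a rough sketch, the same command lines could be assembled from Python so that a failure in one step stops the run immediately. The helper names `build_commands` and `run_fold` are hypothetical; every path and flag is copied from the notebook cells in this section, and nothing here is part of EXPERT itself.

```python
import subprocess

def build_commands(fold, root="experiments"):
    """Return the three EXPERT command lines for one fold, mirroring the notebook cells."""
    exp = f"{root}/exp_{fold}"
    transfer = [
        "expert", "transfer",
        "-i", f"{exp}/SourceCM.h5", "-t", "ontology.pkl",
        "-l", f"{exp}/SourceLabels.h5", "-o", f"{exp}/Transfer_DM",
        "-m", "../Disease-diagnosis/experiments/exp_0/Independent/",
        "--finetune", "--update-statistics",
    ]
    search = [
        "expert", "search",
        "-i", f"{exp}/QueryCM.h5", "-m", f"{exp}/Transfer_DM",
        "-o", f"{exp}/Search_Transfer_DM",
    ]
    evaluate = [
        "expert", "evaluate",
        "-i", f"{exp}/Search_Transfer_DM", "-l", f"{exp}/QueryLabels.h5",
        "-o", f"{exp}/Eval_Transfer_DM", "-S", "0",
    ]
    return [transfer, search, evaluate]

def run_fold(fold):
    """Run the three steps in order; raises CalledProcessError on a non-zero exit status."""
    for command in build_commands(fold):
        subprocess.run(command, check=True)

# Print the commands for the first fold without executing anything.
for command in build_commands(1):
    print(" ".join(command))
```

Calling `run_fold(i)` for each `i` in `range(1, 11)` would then cover the ten folds, stopping at the first command that fails.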

### Optimization
- `--finetune`: enable fine-tuning for further optimization.
- `--update-statistics`: update the statistics (per-feature mean and standard deviation) used for Z-score standardization.
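
To make the Z-score standardization behind `--update-statistics` concrete, here is a minimal NumPy sketch of the idea, not EXPERT's actual implementation. The function names `fit_statistics` and `standardize` are hypothetical, and two choices are assumptions: the population standard deviation (`ddof=0`) is used, and features with zero variance are mapped to 0 rather than divided by zero.

```python
import numpy as np

def fit_statistics(features):
    """Per-feature mean and standard deviation (population std, ddof=0; an assumption)."""
    features = np.asarray(features, dtype=float)
    return features.mean(axis=0), features.std(axis=0)

def standardize(features, mean, std):
    """Z-score each column; constant columns (std == 0) are mapped to 0 (an assumption)."""
    features = np.asarray(features, dtype=float)
    safe_std = np.where(std > 0, std, 1.0)  # avoid division by zero
    return (features - mean) / safe_std

# Toy abundance table (rows: samples, columns: features); values are made up.
source = np.array([
    [0.0, 0.2, 0.8],
    [0.0, 0.4, 0.6],
    [0.0, 0.6, 0.4],
])
mean, std = fit_statistics(source)
z_scores = standardize(source, mean, std)
print(z_scores)
```

Updating the statistics on the new source data matters because the mean and standard deviation of each feature can differ from those of the data the original model was trained on.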

In [1]:
%%bash
for i in {1,2,3,4,5,6,7,8,9,10}; do
    expert transfer -i experiments/exp_$i/SourceCM.h5 -t ontology.pkl \
        -l experiments/exp_$i/SourceLabels.h5 -o experiments/exp_$i/Transfer_DM \
        -m ../Disease-diagnosis/experiments/exp_0/Independent/ \
        --finetune --update-statistics;
done

Reordering labels and samples...
Total matched samples: 249
Total correct samples: 249?249
           mean       std
0      0.000000  0.000000
1      0.000000  0.000000
2      0.000000  0.000000
3      0.005865  0.034154
4      0.003519  0.020492
...         ...       ...
18013  0.000008  0.000028
18014  0.000000  0.000000
18015  0.000012  0.000055
18016  0.000003  0.000044
18017  0.000000  0.000000

[18018 rows x 2 columns]
Training using optimizer with lr=0.001...
Epoch 1/300
...
Epoch 41/300
...


### Quantifying source contributions

- `--measure-unknown`: measure the contribution from unknown source(s). This flag is optional and is not passed in the command below.

In [2]:
%%bash
for i in {1,2,3,4,5,6,7,8,9,10}; do
    expert search -i experiments/exp_$i/QueryCM.h5 -m experiments/exp_$i/Transfer_DM -o experiments/exp_$i/Search_Transfer_DM;
done



### Evaluating performances
- `-S`: set the threshold for evaluation.
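
The evaluation output reports confusion-matrix counts (TN, FP, FN, TP) and derived metrics at each threshold `t`. The sketch below shows how such metrics follow from the counts. It is an illustration rather than EXPERT's evaluator: the names `confusion_at_threshold` and `threshold_metrics` are hypothetical, a score is treated as positive when it is greater than or equal to the threshold (an assumption), and undefined ratios are returned as NaN, which may differ from how EXPERT reports them.

```python
import numpy as np

def confusion_at_threshold(y_true, y_score, threshold):
    """Count TN, FP, FN, TP when scores >= threshold are called positive (an assumption)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_score, dtype=float) >= threshold
    tp = int(np.sum(y_pred & y_true))
    fp = int(np.sum(y_pred & ~y_true))
    fn = int(np.sum(~y_pred & y_true))
    tn = int(np.sum(~y_pred & ~y_true))
    return tn, fp, fn, tp

def threshold_metrics(tn, fp, fn, tp):
    """Accuracy, sensitivity, false-positive rate, precision, and F1 from the counts."""
    def ratio(num, den):
        return num / den if den else float("nan")  # undefined ratios become NaN

    sn = ratio(tp, tp + fn)
    pr = ratio(tp, tp + fp)
    return {
        "Acc": ratio(tp + tn, tn + fp + fn + tp),
        "Sn": sn,
        "FPR": ratio(fp, fp + tn),
        "Pr": pr,
        "F1": ratio(2 * pr * sn, pr + sn),
    }

# One positive and eight negatives; the scores are made up for illustration.
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 0]
y_score = [0.6, 0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.4, 0.2]
counts = confusion_at_threshold(y_true, y_score, threshold=0.0)
print(counts)  # (0, 8, 0, 1)
print({k: round(v, 4) for k, v in threshold_metrics(*counts).items()})
```

With a threshold of 0, every sample is called positive, so accuracy and precision both equal the fraction of true positives (1/9 in this toy case) while sensitivity is 1.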

In [3]:
%%bash
for i in {1,2,3,4,5,6,7,8,9,10}; do
    expert evaluate -i experiments/exp_$i/Search_Transfer_DM -l experiments/exp_$i/QueryLabels.h5 -o experiments/exp_$i/Eval_Transfer_DM -S 0;
done

Reordering labels and prediction result
Reordering labels and prediction result for samples
Running evaluation...
Evaluating biome source: root:China
      TN  FP  FN  TP     Acc   Sn  ...  FPR   Rc      Pr   F1  ROC-AUC   F-max
t                                  ...                                        
0.00   0   8   0   1  0.1111  1.0  ...  1.0  1.0  0.1111  0.2   0.9286  0.6667
0.01   0   8   0   1  0.1111  1.0  ...  1.0  1.0  0.1111  0.2   0.9286  0.6667
0.02   0   8   0   1  0.1111  1.0  ...  1.0  1.0  0.1111  0.2   0.9286  0.6667
0.03   0   8   0   1  0.1111  1.0  ...  1.0  1.0  0.1111  0.2   0.9286  0.6667
0.04   0   8   0   1  0.1111  1.0  ...  1.0  1.0  0.1111  0.2   0.9286  0.6667
...   ..  ..  ..  ..     ...  ...  ...  ...  ...     ...  ...      ...     ...
0.97   8   0   1   0  0.8889  0.0  ...  0.0  0.0  0.0000  NaN   0.9286  0.6667
0.98   8   0   1   0  0.8889  0.0  ...  0.0  0.0  0.0000  NaN   0.9286  0.6667
0.99   8   0   1   0  0.8889  0.0  ...  0.0  0.0  0.0000  Na

100%|██████████| 2/2 [00:00<00:00, 864.00it/s]
100%|██████████| 2/2 [00:00<00:00, 527.95it/s]
 50%|█████     | 1/2 [00:00<00:00,  3.87it/s]
Traceback (most recent call last):
  File "/home/chonghui/envs/miniconda3/envs/expert/bin/expert", line 8, in <module>
    sys.exit(main())
  File "/home/chonghui/envs/miniconda3/envs/expert/lib/python3.8/site-packages/expert/CLI/main.py", line 48, in main
    evaluate(cfg, args)
  File "/home/chonghui/envs/miniconda3/envs/expert/lib/python3.8/site-packages/expert/CLI/main_evaluate.py", line 34, in evaluate
    metrics_layers, avg_metrics_layers, overall_metrics = evaltr.eval()
  File "/home/chonghui/envs/miniconda3/envs/expert/lib/python3.8/site-packages/expert/src/evaluator.py", line 43, in eval
    predictions = self.predictions_multilayer[layer]
IndexError: list index out of range
100%|██████████| 2/2 [00:00<00:00, 1088.44it/s]
100%|██████████| 2/2 [00:00<00:00, 564.55it/s]
 50%|█████     | 1/2 [00:00<00:00,  3.89it/s]
Traceback (most recent ca

CalledProcessError: Command 'b'for i in {1,2,3,4,5,6,7,8,9,10}; do\n    expert evaluate -i experiments/exp_$i/Search_Transfer_DM -l experiments/exp_$i/QueryLabels.h5 -o experiments/exp_$i/Eval_Transfer_DM -S 0;\ndone\n'' returned non-zero exit status 1.

## Support
For help reproducing the results, please email huichong.me@gmail.com.