# Introduction

We verified the utility of our transfer scheme (features, model configuration, hyper-parameters, and the knowledge-transfer process) by validating its generalization performance under two application scenarios: basic quantification of source contributions in a general setting, and quantification of human-associated source contributions in a context-dependent setting. To assess performance systematically, we introduced two datasets: a combined dataset consisting of 125,827 samples collected from 124 biomes, and a human dataset consisting of 53,553 samples collected from 27 human-associated biomes. Through random cross-validation (Supplementary material), we found that the EXPERT model was able to quantify source contributions for communities and to qualitatively identify their biome sources (AUROC of 0.xxx and 0.xxx, Fig. 2a-d), confirming the utility of our transfer scheme in such context-dependent applications.

# Reproducibility statement

- EXPERT supports completely reproducible optimization and inference.
- Processed data are provided for reproducing the results; the original data can be found under `dataFiles/`.
- Rerunning the entire notebook with the configuration below should yield results **completely consistent** with those reported in our paper.
- Session information
    - EXPERT (version 0.3)
    - Python (version 3.8.2)
    - TensorFlow (version 2.3.1)
    - Pandas (version 1.1.3)
    - NumPy (version 1.18.5)
    - ETE3 (version 3.1.2)
    - NCBI taxonomy database (released [2020-09-01](https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump_archive/))

## Process
The following sections reproduce the results reported in our paper. For detailed configuration and interpretation of the results, please read the original paper first.
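Each experiment directory goes through the same three stages (train, search, evaluate). As a minimal sketch, the per-experiment commands can be assembled in Python instead of a shell loop. The helper name `build_commands` and the `root` parameter are hypothetical and not part of EXPERT; the subcommands, flags, and file names are copied from the notebook cells in this document, and the sketch assumes the `expert` CLI is on the `PATH`.

```python
def build_commands(exp_id, root="experiments"):
    """Return the train, search and evaluate commands for one experiment.

    Hypothetical helper: the flags mirror the notebook cells in this README.
    """
    exp = f"{root}/exp_{exp_id}"
    return [
        ["expert", "train", "-i", f"{exp}/SourceCM.h5", "-t", "ontology.pkl",
         "-l", f"{exp}/SourceLabels.h5", "-o", f"{exp}/Independent"],
        ["expert", "search", "-i", f"{exp}/QueryCM.h5",
         "-m", f"{exp}/Independent", "-o", f"{exp}/Search_Independent"],
        ["expert", "evaluate", "-i", f"{exp}/Search_Independent",
         "-l", f"{exp}/QueryLabels.h5", "-o", f"{exp}/Eval_Independent",
         "-S", "0", "-p", "40"],
    ]

# Dry run: collect and print the commands for experiments 0..7 without executing them.
all_commands = [cmd for i in range(8) for cmd in build_commands(i)]
for cmd in all_commands:
    print(" ".join(cmd))
```

To actually execute a command, each list could be passed to `subprocess.run(cmd, check=True)`. Note that this groups the stages by experiment, whereas the notebook cells run one stage across all experiments before moving to the next.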

### Optimization
- `--finetune`: enable fine-tuning for further optimization.
- `--update-statistics`: update statistics for Z-score standardization.
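For readers unfamiliar with Z-score standardization, the following is a generic NumPy sketch of what updating the statistics means. It is not EXPERT's implementation: the function names and the random data are assumptions made for illustration only. Each feature is centered by its mean and scaled by its standard deviation, and "updating" amounts to recomputing both from the new data rather than reusing the values fitted on earlier data.

```python
import numpy as np

def fit_statistics(X):
    """Per-feature mean and standard deviation (illustrative helper)."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    return mean, std

def standardize(X, mean, std):
    """Apply Z-score standardization with the given statistics."""
    return (X - mean) / std

rng = np.random.default_rng(0)
source = rng.random((100, 5))          # stand-in for the original training data
target = rng.random((50, 5)) * 3 + 2   # stand-in for new data with a shifted distribution

mean_src, std_src = fit_statistics(source)
mean_tgt, std_tgt = fit_statistics(target)

z_stale = standardize(target, mean_src, std_src)    # reuses the old statistics
z_updated = standardize(target, mean_tgt, std_tgt)  # statistics recomputed on the new data

print(z_stale.mean(axis=0).round(2))
print(z_updated.mean(axis=0).round(2))
```

With recomputed statistics the new data have zero mean and unit variance per feature, whereas the stale statistics leave them off-center.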

In [1]:
!for i in {0,1,2,3,4,5,6,7}; do expert train -i experiments/exp_$i/SourceCM.h5 -t ontology.pkl -l experiments/exp_$i/SourceLabels.h5 -o experiments/exp_$i/Independent; done;

Reordering labels and samples...
Total matched samples: 103767
2021-01-08 09:39:15.210049: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-01-08 09:39:15.218055: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2499960000 Hz
2021-01-08 09:39:15.220702: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x563c8dd72880 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-01-08 09:39:15.220765: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2021-01-08 09:39:18.262929: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 4985796816 exceeds 10% of free system memory.



### Quantifying source contributions

- `--measure-unknown`: measure the contribution from unknown source(s).

In [2]:
!for i in {0,1,2,3,4,5,6,7}; do expert search -i experiments/exp_$i/QueryCM.h5 -m experiments/exp_$i/Independent -o experiments/exp_$i/Search_Independent; done;



To change all layers to have dtype float64 by default, call `tf.keras.backend.set_floatx('float64')`. To change just this layer, pass dtype='float64' to the layer constructor. If you are the author of this layer, you can disable autocasting by passing autocast=False to the base Layer constructor.




### Evaluating performances
- `-S`: set the threshold for evaluation.

In [3]:
!for i in {0,1,2,3,4,5,6,7}; do expert evaluate -i experiments/exp_$i/Search_Independent -l experiments/exp_$i/QueryLabels.h5 -o experiments/exp_$i/Eval_Independent -S 0 -p 40; done;

Reordering labels and prediction result
Reordering labels and prediction result for samples
Running evaluation...
100%|████████████████████████████████████████████| 5/5 [00:00<00:00, 510.26it/s]
100%|████████████████████████████████████████████| 5/5 [00:00<00:00, 445.30it/s]
  0%|                                                     | 0/5 [00:00<?, ?it/s]
Evaluating biome source: root:Host-associated
Evaluating biome source: root:Engineered
Evaluating biome source: root:Environmental
         TN     FP    FN    TP     Acc  ...      Rc      Pr      F1  ROC-AUC   F-max
t                                       ...                                         
0.00      0  10464     0  4358  0.2940  ...  1.0000  0.2940  0.4544   0.9938  0.9788
0.01   9979    484    34  4323  0.9650  ...  0.9922  0.8993  0.9435   0.9938  0.9788
0.02  10045    418    43  4314  0.9689  ...  0.9901  0.9117  0.9493   0.9938  0.9788
0.03  10138    325    52  4305  0.9746  ...  0.9881  0.9298  0.9581   0.9938  0.9788
0.0
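The derived columns in the table above can be checked against the confusion counts. The sketch below assumes that `Acc`, `Rc`, `Pr`, and `F1` stand for accuracy, recall, precision, and F1 score under their standard definitions; the function name is hypothetical and this is not EXPERT's own evaluation code. The counts are taken from the row for threshold `t = 0.01`.

```python
def binary_metrics(tn, fp, fn, tp):
    """Standard binary-classification metrics from confusion counts."""
    total = tn + fp + fn + tp
    acc = (tp + tn) / total
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return {"Acc": acc, "Rc": recall, "Pr": precision, "F1": f1}

# Counts from the t = 0.01 row of the evaluation output.
row_001 = binary_metrics(tn=9979, fp=484, fn=34, tp=4323)
print({name: round(value, 4) for name, value in row_001.items()})
```

Under these definitions the computed values agree with the `Acc`, `Rc`, `Pr`, and `F1` entries printed for that row to four decimal places, which supports the assumed reading of the column names.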

## Support
For help reproducing the results, please email huichong.me@gmail.com.