Advanced usage

AdeBC edited this page Feb 3, 2021 · 1 revision

Ontology construction

Construct a biome ontology from microbiomes.txt.

expert construct -i microbiomes.txt -o ontology.pkl
# Also equivalent to
expert construct --input microbiomes.txt --output ontology.pkl
  • Input: microbiomes.txt, which contains the path from the "root" node to each leaf node of the biome ontology, one path per line.
root:Environmental:Terrestrial:Soil
root:Environmental:Terrestrial:Soil:Agricultural
root:Environmental:Terrestrial:Soil:Boreal_forest
root:Environmental:Terrestrial:Soil:Contaminated
root:Environmental:Terrestrial:Soil:Crop
root:Environmental:Terrestrial:Soil:Crop:Agricultural_land
root:Environmental:Terrestrial:Soil:Desert
root:Environmental:Terrestrial:Soil:Forest_soil
root:Environmental:Terrestrial:Soil:Grasslands
root:Environmental:Terrestrial:Soil:Loam:Agricultural
root:Environmental:Terrestrial:Soil:Permafrost
root:Environmental:Terrestrial:Soil:Sand
root:Environmental:Terrestrial:Soil:Tropical_rainforest
root:Environmental:Terrestrial:Soil:Uranium_contaminated
root:Environmental:Terrestrial:Soil:Wetlands
root:Host-associated:Plants:Rhizosphere:Soil
  • Output: constructed biome ontology (pickle format, non-human-readable).
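
To make the input format concrete, the following sketch nests colon-separated paths into a tree and groups the nodes by depth. It only illustrates the path format: it is not EXPERT's implementation and it does not produce ontology.pkl. The function names build_tree and nodes_per_layer are hypothetical, and the layer numbering (layer 1 is "root") is an assumption inferred from the layer-N.csv file names shown further down this page.

```python
from collections import defaultdict


def build_tree(paths):
    """Nest colon-separated biome paths into a dict-of-dicts tree."""
    tree = {}
    for path in paths:
        if not path.strip():
            continue  # ignore blank lines
        node = tree
        for name in path.strip().split(":"):
            node = node.setdefault(name, {})
    return tree


def nodes_per_layer(paths):
    """Group the distinct path prefixes by depth (assumed: layer 1 is "root")."""
    layers = defaultdict(set)
    for path in paths:
        if not path.strip():
            continue
        parts = path.strip().split(":")
        for depth in range(1, len(parts) + 1):
            layers[depth].add(":".join(parts[:depth]))
    return {depth: sorted(names) for depth, names in sorted(layers.items())}


# A few lines taken from the microbiomes.txt example above.
paths = [
    "root:Environmental:Terrestrial:Soil",
    "root:Environmental:Terrestrial:Soil:Crop",
    "root:Environmental:Terrestrial:Soil:Crop:Agricultural_land",
    "root:Host-associated:Plants:Rhizosphere:Soil",
]
tree = build_tree(paths)
layers = nodes_per_layer(paths)
print(layers[2])
```

Intermediate nodes such as root:Environmental never need their own line in this sketch, because every prefix of a listed path is added to the tree.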

Source mapping

Map the source environments of samples to the biome ontology.

expert map --to-otlg -t ontology.pkl -i mapper.csv -o labels.h5
# Also equivalent to
expert map --to-otlg --otlg ontology.pkl --input mapper.csv --output labels.h5
  • Input: the mapper file, which contains the source biome of each sample.
Env SampleID
0 root:Engineered:Wastewater ERR2260442
1 root:Engineered:Wastewater SRR980322
2 root:Engineered:Wastewater ERR2985272
3 root:Engineered:Wastewater ERR2814648
4 root:Engineered:Wastewater ERR2985275
  • Output: the labels for samples in each layer of the biome ontology (HDF format, non-human-readable).
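
As a rough picture of what "labels in each layer" means, the sketch below expands one Env value into one label per layer by truncating the path. This is an illustration, not EXPERT's mapping code, and it does not write labels.h5. The function name layer_labels is hypothetical, the layer range 2 to 6 is taken from the layer-2.csv through layer-6.csv files in the search output, and the "Unknown" placeholder for layers deeper than the sample's annotation is an assumption.

```python
def layer_labels(env, first_layer=2, last_layer=6, unknown="Unknown"):
    """Return {layer: label} for one sample's Env path.

    Layer N is the path truncated to its first N components.
    Layers deeper than the path get a placeholder (an assumption,
    not documented EXPERT behaviour).
    """
    parts = env.split(":")
    labels = {}
    for layer in range(first_layer, last_layer + 1):
        if len(parts) >= layer:
            labels[layer] = ":".join(parts[:layer])
        else:
            labels[layer] = unknown
    return labels


# Env value of sample ERR2260442 from the mapper example above.
labels = layer_labels("root:Engineered:Wastewater")
for layer, label in labels.items():
    print(layer, label)
```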

Data conversion & cleaning

Convert input data to a count matrix at the genus level.

expert convert -i countMatrices.txt -o countMatrix.h5 --in-cm
# Also equivalent to
expert convert --input countMatrices.txt --output countMatrix.h5 --in-cm
  • Input: a text file that lists the paths to the input count matrix files / abundance tables, one per line.
datasets/soil_dataset/root:Host-associated:Plants:Rhizosphere:Soil/MGYS00005146-ERR1690680.tsv
datasets/soil_dataset/root:Host-associated:Plants:Rhizosphere:Soil/MGYS00005146-ERR1689675.tsv
datasets/soil_dataset/root:Host-associated:Plants:Rhizosphere:Soil/MGYS00000513-ERR986792.tsv
datasets/soil_dataset/root:Host-associated:Plants:Rhizosphere:Soil/MGYS00005146-ERR1691198.tsv
datasets/soil_dataset/root:Host-associated:Plants:Rhizosphere:Soil/MGYS00001704-ERR1905845.tsv
datasets/soil_dataset/root:Host-associated:Plants:Rhizosphere:Soil/MGYS00005146-ERR1689214.tsv
datasets/soil_dataset/root:Host-associated:Plants:Rhizosphere:Soil/MGYS00005146-ERR1689910.tsv
  • Output: the converted count matrix at the genus level (HDF format, non-human-readable).
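
The path list can be written by hand, but for a large dataset it may be easier to generate it. A minimal sketch, assuming the abundance tables are .tsv files somewhere under one dataset directory; the helper name write_path_list is hypothetical, and the demo runs on a throwaway directory rather than real data.

```python
import tempfile
from pathlib import Path


def write_path_list(dataset_dir, list_file, pattern="*.tsv"):
    """Write every file under dataset_dir that matches pattern to list_file,
    one path per line, and return the sorted list of paths."""
    paths = sorted(str(p) for p in Path(dataset_dir).rglob(pattern))
    Path(list_file).write_text("".join(p + "\n" for p in paths))
    return paths


# Demo on a temporary directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    biome_dir = Path(tmp) / "soil_dataset" / "Rhizosphere_Soil"
    biome_dir.mkdir(parents=True)
    for name in ("MGYS00005146-ERR1690680.tsv", "MGYS00000513-ERR986792.tsv"):
        (biome_dir / name).write_text("")
    list_file = Path(tmp) / "countMatrices.txt"
    listed = write_path_list(Path(tmp) / "soil_dataset", list_file)
    written = list_file.read_text().splitlines()

print([Path(p).name for p in listed])
```

With real data, the call would be along the lines of write_path_list("datasets/soil_dataset", "countMatrices.txt"), and the resulting file is what expert convert takes as its input.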

Ab initio training

Build an EXPERT model from scratch and train it.

expert train -i countMatrix.h5 -l labels.h5 -t ontology.pkl -o model
# Also equivalent to
expert train --input countMatrix.h5 --labels labels.h5 --otlg ontology.pkl --output model
  • Input: the biome ontology, the converted count matrix at the genus level, and the labels for the samples in that count matrix.
  • Output: trained model.

Fast adaptation

expert transfer -i countMatrix.h5 -l labels.h5 -t ontology.pkl -o model
# Also equivalent to
expert transfer --input countMatrix.h5 --labels labels.h5 --otlg ontology.pkl --output model
  • Input: the biome ontology, the converted count matrix at the genus level, and the labels for the samples in that count matrix.
  • Output: trained model.

Source tracking

expert search -i countMatrix.h5 -o searchResult -m model
# Also equivalent to
expert search --input countMatrix.h5 --output searchResult --model model
  • Input: the converted count matrix at the genus level.
  • Output: search result (multi-layer), one CSV file per ontology layer.
searchResult
├── layer-2.csv
├── layer-3.csv
├── layer-4.csv
├── layer-5.csv
└── layer-6.csv

Take layer-2.csv as an example.

root:Engineered root:Environmental root:Host-associated root:Mixed Unknown
ERR2278752 0.0041427016 0.26372418 0.68632126 0.00040003657 0.045411825
ERR2278753 0.002841179 0.07928896 0.91735524 0.00051463145 0.0
ERR2666855 0.0006751048 0.0021803565 0.9970531 9.1493675e-05 0.0
ERR2666860 0.0005227786 0.013902989 0.98542625 0.00014803928 0.0
ERR2666881 0.0009569057 0.0023957777 0.9965403 0.00010694566 0.0
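
A layer file can be post-processed with a few lines of Python, for example to report the source with the largest contribution for each sample. The sketch below assumes the file is comma-separated with sample IDs in the first column and one column per biome; the exact on-disk layout is not stated on this page, so check a real layer-2.csv first. The function name top_sources is hypothetical, and the inline text reuses two rows from the example above.

```python
import csv
import io


def top_sources(csv_text):
    """Map each sample ID to (biome, contribution) for its largest contribution.

    Assumes a header row of biome names and sample IDs in the first column.
    """
    reader = csv.reader(io.StringIO(csv_text))
    biomes = next(reader)[1:]  # skip the header cell above the sample IDs
    result = {}
    for row in reader:
        if not row:
            continue  # ignore blank lines
        values = [float(v) for v in row[1:]]
        best = max(range(len(values)), key=values.__getitem__)
        result[row[0]] = (biomes[best], values[best])
    return result


example = """\
,root:Engineered,root:Environmental,root:Host-associated,root:Mixed,Unknown
ERR2278752,0.0041427016,0.26372418,0.68632126,0.00040003657,0.045411825
ERR2666855,0.0006751048,0.0021803565,0.9970531,9.1493675e-05,0.0
"""
top = top_sources(example)
for sample, (biome, value) in top.items():
    print(sample, biome, value)
```

To use it on a real result, pass the file contents instead of the inline text, for example top_sources(open("searchResult/layer-2.csv").read()).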

Evaluation

expert evaluate -i searchResultFolder -l labels.h5 -o EvaluationReport -p NUMProcesses
# Also equivalent to
expert evaluate --input searchResultFolder --labels labels.h5 --output EvaluationReport --processors NUMProcesses
  • Input: multi-layer labels and search result (source contribution) for samples.
  • Output: label-based evaluation report.
EvaluationReport
├── layer-2
│   └── root:Host-associated.csv
├── layer-2.csv
├── layer-3
│   └── root:Host-associated:Human.csv
├── layer-3.csv
├── layer-4
│   ├── root:Host-associated:Human:Circulatory_system.csv 
│   ├── root:Host-associated:Human:Digestive_system.csv
│   ├── root:Host-associated:Human:Lympathic_system.csv
│   ├── root:Host-associated:Human:Reproductive_system.csv
│   ├── root:Host-associated:Human:Respiratory_system.csv
│   └── root:Host-associated:Human:Skin.csv
├── layer-4.csv
├── layer-5
│   ├── root:Host-associated:Human:Circulatory_system:Blood.csv
│   ├── ...
│   └── root:Host-associated:Human:Respiratory_system:Pulmonary_system.csv
├── layer-5.csv
├── layer-6
│   ├── root:Host-associated:Human:Digestive_system:Large_intestine:Fecal.csv
│   ├── ...
│   └── root:Host-associated:Human:Respiratory_system:Pulmonary_system:Sputum.csv
└── layer-6.csv

Take layer-4/root:Host-associated:Human:Skin.csv as an example.

t TN FP FN TP Acc Sn Sp TPR FPR Rc Pr F1 ROC-AUC F-max
0.0 0 47688 0 4847 0.0923 1.0 0.0 1.0 1.0 1.0 0.0923 0.1689 0.9951 0.9374
0.01 44794 2893 30 4816 0.9444 0.9938 0.9393 0.9938 0.0607 0.9938 0.6247 0.7672 0.9951 0.9374
0.02 45545 2142 44 4802 0.9584 0.9909 0.9551 0.9909 0.0449 0.9909 0.6915 0.8146 0.9951 0.9374
0.03 45934 1753 59 4787 0.9655 0.9878 0.9632 0.9878 0.0368 0.9878 0.732 0.8409 0.9951 0.9374
0.04 46228 1459 73 4773 0.9708 0.9849 0.9694 0.9849 0.0306 0.9849 0.7659 0.8617 0.9951 0.9374
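
Most columns in this report follow from the four confusion-matrix counts at threshold t. The sketch below applies the standard definitions, which reproduce the t = 0.01 row above when rounded to four decimals; it is a reading aid, not the code EXPERT uses. ROC-AUC and F-max are the same in every row, so they summarize all thresholds together and are not recomputed here. The function name threshold_metrics is hypothetical.

```python
def threshold_metrics(tn, fp, fn, tp):
    """Compute the per-threshold report columns from confusion-matrix counts."""
    sn = tp / (tp + fn) if tp + fn else 0.0   # sensitivity = TPR = recall
    sp = tn / (tn + fp) if tn + fp else 0.0   # specificity
    fpr = fp / (fp + tn) if fp + tn else 0.0  # false positive rate = 1 - Sp
    pr = tp / (tp + fp) if tp + fp else 0.0   # precision
    f1 = 2 * pr * sn / (pr + sn) if pr + sn else 0.0
    acc = (tp + tn) / (tn + fp + fn + tp)
    return {"Acc": acc, "Sn": sn, "Sp": sp, "TPR": sn, "FPR": fpr,
            "Rc": sn, "Pr": pr, "F1": f1}


# Counts from the t = 0.01 row of the example report.
row = threshold_metrics(tn=44794, fp=2893, fn=30, tp=4816)
rounded = {name: round(value, 4) for name, value in row.items()}
print(rounded)
```

Sn, TPR and Rc are three names for the same quantity, which is why those columns are identical in the report.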

Run the program with the -h option to see a detailed description of the work modes and options.
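
The work modes on this page can be chained into one script. The sketch below only assembles the command lines shown above, using the same file names, and prints them by default; nothing is executed unless dry_run is set to False and the expert program is installed. The function names are hypothetical. Passing the searchResult folder to expert evaluate is an assumption (the page uses the placeholder searchResultFolder), and in practice the matrix given to expert search would normally hold query samples rather than the training data.

```python
import shlex
import subprocess


def pipeline_commands(processes=1):
    """Return the commands from this page, in order, as argument lists."""
    return [
        ["expert", "construct", "--input", "microbiomes.txt",
         "--output", "ontology.pkl"],
        ["expert", "map", "--to-otlg", "--otlg", "ontology.pkl",
         "--input", "mapper.csv", "--output", "labels.h5"],
        ["expert", "convert", "--input", "countMatrices.txt",
         "--output", "countMatrix.h5", "--in-cm"],
        ["expert", "train", "--input", "countMatrix.h5", "--labels", "labels.h5",
         "--otlg", "ontology.pkl", "--output", "model"],
        ["expert", "search", "--input", "countMatrix.h5",
         "--output", "searchResult", "--model", "model"],
        # Assumption: the evaluate input is the folder written by the search step.
        ["expert", "evaluate", "--input", "searchResult", "--labels", "labels.h5",
         "--output", "EvaluationReport", "--processors", str(processes)],
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it too when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # stop at the first failing step


commands = pipeline_commands(processes=4)
run_pipeline(commands)  # dry run: prints the six commands only
```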