OPAL: Open-community Profiling Assessment tooL

Taxonomic metagenome profilers predict the presence and relative abundance of microorganisms from shotgun sequence samples of DNA isolated directly from a microbial community. Over the past years, there has been an explosive growth of software and algorithms for this task, resulting in a need for more systematic comparisons of these methods based on relevant performance criteria. OPAL implements commonly used performance metrics, including those of the first challenge of the Initiative for the Critical Assessment of Metagenome Interpretation (CAMI), together with convenient visualizations.

Computed metrics

UniFrac error
L1 norm error
True positives, false positives, false negatives
Precision
Recall
F1 score
Jaccard index
Shannon diversity and equitability indices
Bray–Curtis distance
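As a rough illustration of what several of these metrics measure, the following minimal sketch computes them for a single rank from two toy profiles. It is not OPAL's implementation: the function names and the toy abundances are hypothetical, the equitability is assumed to be the Shannon index divided by the natural log of the number of taxa, and the UniFrac error is left out because it additionally requires the taxonomy tree.

```python
import math

def binary_metrics(truth: set, predicted: set) -> dict:
    """Presence/absence metrics for one rank (hypothetical helper)."""
    tp = len(truth & predicted)   # taxa in both profiles
    fp = len(predicted - truth)   # predicted but not in the gold standard
    fn = len(truth - predicted)   # in the gold standard but not predicted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    union = len(truth | predicted)
    jaccard = tp / union if union else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision,
            "recall": recall, "f1": f1, "jaccard": jaccard}

def l1_norm(truth: dict, predicted: dict) -> float:
    """Sum of absolute abundance differences over all taxa."""
    taxa = truth.keys() | predicted.keys()
    return sum(abs(truth.get(t, 0.0) - predicted.get(t, 0.0)) for t in taxa)

def bray_curtis(truth: dict, predicted: dict) -> float:
    """L1 difference divided by the total abundance of both profiles."""
    total = sum(truth.values()) + sum(predicted.values())
    return l1_norm(truth, predicted) / total if total else 0.0

def shannon(abundances: dict) -> tuple:
    """Shannon diversity and equitability (assumed: H / ln(number of taxa))."""
    total = sum(abundances.values())
    props = [v / total for v in abundances.values() if v > 0]
    h = -sum(p * math.log(p) for p in props)
    equitability = h / math.log(len(props)) if len(props) > 1 else 0.0
    return h, equitability

# Toy relative abundances, made up for illustration only.
truth = {"A": 0.5, "B": 0.3, "C": 0.2}
predicted = {"A": 0.6, "B": 0.2, "D": 0.2}

print(binary_metrics(set(truth), set(predicted)))
print(l1_norm(truth, predicted), bray_curtis(truth, predicted))
print(shannon(truth))
```

In this toy case two taxa are shared, one is predicted spuriously, and one is missed, so precision and recall coincide.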

Example pages produced by OPAL

See also

Assessments of profiling submissions to the 1st CAMI Challenge

User Guide

Installation
Inputs
Running opal.py
Running opal.py using Docker
Running tsv2biom.py
Measuring runtime and maximum main memory usage
More examples

Installation

Requirements

OPAL 1.0.12 has been tested with Python 3.10 and 3.11.

See requirements.txt for all dependencies.

Steps

You can install OPAL using Docker, Bioconda, or as follows.

Install pip if not already installed (tested on Linux Ubuntu 18.04):

sudo apt install python3-pip

Should you receive the message Unable to locate package python3-pip, enter the following commands and repeat the previous step.

sudo add-apt-repository universe
sudo apt update

Then run:

pip3 install cami-opal

Make sure to add OPAL to your PATH:

echo 'PATH=$PATH:${HOME}/.local/bin' >> ~/.bashrc
source ~/.bashrc

Inputs

Note: Support for the BIOM format was temporarily dropped in OPAL 1.0.4 because of an incompatibility with Python 3.7.*.

OPAL uses at least two files:

A gold standard taxonomic profile
One or more taxonomic profiles to be assessed

Files must be in the CAMI profiling Bioboxes format or in the BIOM (Biological Observation Matrix) format. The tsv2biom.py program converts profiles from the former format to the latter.

The BIOM format

The BIOM format used by OPAL is a sparse matrix stored in a JSON or HDF5 file, with a column per sample and a row per taxonomy ID, storing the corresponding abundances. RANK, TAXPATH, and TAXPATHSN are stored as metadata of each row and have the same meaning as in the CAMI profiling Bioboxes format:

RANK: taxonomic rank
TAXPATH and TAXPATHSN: path from the root of the taxonomy to the current taxon (inclusive), with the taxa on the path separated by |. TAXPATH contains the identifiers and TAXPATHSN the plain names of the taxa on the path. For more details and examples, see CAMI profiling Bioboxes format.
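To make the relationship between TAXPATH and TAXPATHSN concrete, here is a minimal sketch that pairs the identifiers with the names along one path. The helper name parse_taxpath is hypothetical and the example path is illustrative (it uses NCBI taxonomy IDs); it is not taken from OPAL's code or data.

```python
def parse_taxpath(taxpath: str, taxpathsn: str) -> list:
    """Pair each taxonomy ID in TAXPATH with its name in TAXPATHSN."""
    ids = taxpath.split("|")
    names = taxpathsn.split("|")
    if len(ids) != len(names):
        raise ValueError("TAXPATH and TAXPATHSN must have the same number of entries")
    return list(zip(ids, names))

# Illustrative path from the root down to a class-level taxon.
path = parse_taxpath("2|1224|1236", "Bacteria|Proteobacteria|Gammaproteobacteria")
for tax_id, name in path:
    print(tax_id, name)

# The last entry is the current taxon itself.
current_id, current_name = path[-1]
```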

Running opal.py

usage: opal.py -g GOLD_STANDARD_FILE -o OUTPUT_DIR [-n] [-f FILTER] [-p] [-l LABELS] [-t TIME] [-m MEMORY] [-d DESC] [-r RANKS] [--metrics_plot_rel METRICS_PLOT_REL]
               [--metrics_plot_abs METRICS_PLOT_ABS] [--silent] [-v] [-h] [-b BRANCH_LENGTH_FUNCTION] [--normalized_unifrac]
               profiles_files [profiles_files ...]

OPAL: Open-community Profiling Assessment tooL

required arguments:
  profiles_files        Files of profiles
  -g GOLD_STANDARD_FILE, --gold_standard_file GOLD_STANDARD_FILE
                        Gold standard file
  -o OUTPUT_DIR, --output_dir OUTPUT_DIR
                        Directory to write the results to

optional arguments:
  -n, --normalize       Normalize samples
  -f FILTER, --filter FILTER
                        Filter out the predictions with the smallest relative abundances summing up to [FILTER]% within a rank
  -p, --plot_abundances
                        Plot abundances in the gold standard (can take some minutes)
  -l LABELS, --labels LABELS
                        Comma-separated profiles names
  -t TIME, --time TIME  Comma-separated runtimes in hours
  -m MEMORY, --memory MEMORY
                        Comma-separated memory usages in gigabytes
  -d DESC, --desc DESC  Description for HTML page
  -r RANKS, --ranks RANKS
                        Highest and lowest taxonomic ranks to consider in performance rankings, comma-separated. Valid ranks: superkingdom, phylum, class, order, family, genus, species,
                        strain (default:superkingdom,species)
  --metrics_plot_rel METRICS_PLOT_REL
                        Metrics for spider plot of relative performances, first character, comma-separated. Valid metrics: w:weighted Unifrac, l:L1 norm, c:completeness, p:purity, f:false
                        positives, t:true positives (default: w,l,c,p,f)
  --metrics_plot_abs METRICS_PLOT_ABS
                        Metrics for spider plot of absolute performances, first character, comma-separated. Valid metrics: c:completeness, p:purity, b:Bray-Curtis (default: c,p)
  --silent              Silent mode
  -v, --version         show program's version number and exit
  -h, --help            Show this help message and exit

UniFrac arguments:
  -b BRANCH_LENGTH_FUNCTION, --branch_length_function BRANCH_LENGTH_FUNCTION
                        UniFrac tree branch length function (default: "lambda x: 1/x", where x=tree depth)
  --normalized_unifrac  Compute normalized version of weighted UniFrac by dividing by the theoretical max unweighted UniFrac
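The --filter option can be pictured with the following minimal sketch, which removes the lowest-abundance predictions of one rank until their cumulative abundance would exceed FILTER percent of the total. This is an interpretation of the help text, not OPAL's code: the function name filter_tail is hypothetical, and the handling of ties and of the exact boundary (inclusive here) is an assumption.

```python
def filter_tail(abundances: dict, filter_percent: float) -> dict:
    """Drop the smallest predictions summing up to filter_percent of the total."""
    total = sum(abundances.values())
    threshold = total * filter_percent / 100.0
    kept = dict(abundances)
    cumulative = 0.0
    # Walk through the taxa from the smallest to the largest abundance.
    for taxon, value in sorted(abundances.items(), key=lambda kv: kv[1]):
        if cumulative + value > threshold:
            break  # removing this taxon would exceed the allowed tail mass
        cumulative += value
        del kept[taxon]
    return kept

# Toy abundances (in percent) for a single rank, made up for illustration.
profile = {"A": 60.0, "B": 30.0, "C": 9.0, "D": 1.0}
print(sorted(filter_tail(profile, 1)))
print(sorted(filter_tail(profile, 10)))
```

With a filter of 1, only the smallest taxon D is removed in this sketch; with 10, both C and D are removed.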

Example: To run the example, please download the files given in the data directory.

./opal.py -g data/goldstandard_low_1.bin \
data/cranky_wozniak_13 \
data/grave_wright_13 \
data/furious_elion_13 \
data/focused_archimedes_13 \
data/evil_darwin_13 \
data/agitated_blackwell_7 \
data/jolly_pasteur_3 \
-l "TIPP, Quikr, MP2.0, MetaPhyler, mOTU, CLARK, FOCUS" \
-o output_dir

Running opal.py using Docker

Download or git-clone OPAL from GitHub. In OPAL's directory, build the Docker image with the command:

docker build -t opal:latest .

opal.py can then be run with the docker run command. Example:

docker run -v $(pwd):/host opal \
opal.py -g /host/data/goldstandard_low_1.bin \
/host/data/cranky_wozniak_13 \
/host/data/grave_wright_13 \
/host/data/furious_elion_13 \
/host/data/focused_archimedes_13 \
/host/data/evil_darwin_13 \
/host/data/agitated_blackwell_7 \
/host/data/jolly_pasteur_3 \
-l "TIPP, Quikr, MP2.0, MetaPhyler, mOTU, CLARK, FOCUS" \
-o /host/output_dir

Running tsv2biom.py

usage: tsv2biom.py [-h] -o OUTPUT_FILE [-j] files [files ...]

Convert profile in the CAMI Bioboxes format to BIOM

positional arguments:
  files                 Input file(s), one file per sample

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT_FILE, --output_file OUTPUT_FILE
                        Output file
  -j, --json            Output in json (default: hdf5)

Example:

python3 tsv2biom.py data/cranky_wozniak_13 -o output_dir/cranky_wozniak_13.biom

Measuring runtime and maximum main memory usage

To measure the runtime and maximum main memory usage of a taxonomic profiler using OPAL, the profiler must be packaged as a Biobox Docker image. Several Bioboxes are already available on Docker Hub (see Examples page).

To build your own Biobox, general instructions are available at http://bioboxes.org/. Most importantly, the Biobox of a profiler must satisfy specific input and output formats (see section Inputs above). Helpful examples of scripts and Dockerfiles are available at https://github.com/CAMI-challenge/docker_profiling_tools.

OPAL's tools to measure runtime and maximum main memory usage are:

opal_stats.py: Runs the Biobox of a taxonomic profiler and tracks its runtime and main memory usage.
opal_workflow.py: Runs the Bioboxes of one or more taxonomic profilers, tracks their runtimes and main memory usages using opal_stats.py, and automatically assesses their results with opal.py.

See example usage of these tools on the Examples page.

Runtimes and memory usages can also be manually provided to opal.py using options --time and --memory. They will then be incorporated in the results files and the HTML report.
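If a profiler is run outside of a Biobox, values for --time and --memory could be gathered by hand. The following is a minimal sketch under stated assumptions, not how opal_stats.py works: it times a locally run command, reads the peak resident memory of child processes through Python's resource module (Unix only; ru_maxrss is assumed to be in kilobytes, as on Linux), and formats the results in hours and gigabytes as comma-separated lists. The helper names measure and opal_resource_args are hypothetical, and the measured command is a stand-in for a real profiler call.

```python
import resource
import subprocess
import sys
import time

def measure(command: list) -> tuple:
    """Run a command; return (runtime in hours, peak memory in gigabytes)."""
    start = time.perf_counter()
    subprocess.run(command, check=True)
    hours = (time.perf_counter() - start) / 3600.0
    # Peak RSS over all child processes waited for so far, in KB on Linux.
    # When measuring several commands in one script, this is a running maximum.
    max_rss_kb = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return hours, max_rss_kb / 1024 ** 2

def opal_resource_args(measurements: list) -> list:
    """Format (hours, gigabytes) pairs as opal.py --time and --memory options."""
    times = ",".join(f"{hours:.4f}" for hours, _ in measurements)
    memories = ",".join(f"{gigabytes:.4f}" for _, gigabytes in measurements)
    return ["--time", times, "--memory", memories]

# Stand-in for a profiler invocation: a Python process that does nothing.
measurement = measure([sys.executable, "-c", "pass"])
print(opal_resource_args([measurement]))
```

The values must be listed in the same order as the profile files passed to opal.py.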

More examples

See Examples page.

Developer Guide

We are using tox for project automation.

Tests

To run the tests, enter the following in the project's root directory:

tox

Citation

Please cite:

Meyer, F., Bremges, A., Belmann, P., Janssen, S., McHardy, A.C., and Koslicki, D. Assessing taxonomic metagenome profilers with OPAL. Genome Biology, 20, 51 (2019). https://doi.org/10.1186/s13059-019-1646-y

Part of OPAL's functionality was described in the CAMI manuscript. Thus please also cite:

Sczyrba, A., Hofmann, P., Belmann, P. et al. Critical Assessment of Metagenome Interpretation—a benchmark of metagenomics software. Nat Methods 14, 1063–1071 (2017). https://doi.org/10.1038/nmeth.4458

or

Meyer, F., Fritz, A., Deng, ZL. et al. Critical Assessment of Metagenome Interpretation: the second round of challenges. Nat Methods 19, 429–440 (2022). https://doi.org/10.1038/s41592-022-01431-4
