OPAL: Assessing taxonomic profilers for metagenomes



OPAL - Profiling Assessment

Example pages produced by OPAL:

Requirements

See default.txt for all dependencies.

User Guide

Installation

Install pip first (tested on Linux Ubuntu 16.04):

sudo apt install python3-pip

Then run:

pip3 install numpy
pip3 install cami-opal

Make sure to add OPAL to your PATH:

echo 'PATH=$PATH:${HOME}/.local/bin' >> ~/.bashrc
source ~/.bashrc

Input

OPAL takes at least two files as input:

  1. A gold standard taxonomic profile
  2. One or more taxonomic profiles to be assessed

Files must be in the CAMI profiling Bioboxes format or in the BIOM (Biological Observation Matrix) format. The program tsv2biom.py converts profiles from the former format to the latter.
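For orientation, here is a minimal Python sketch of reading a profile in the CAMI profiling Bioboxes format. It assumes the usual layout of that format: header lines such as @SampleID: and @Version:, a @@TAXID RANK TAXPATH TAXPATHSN PERCENTAGE column header, and tab-separated data rows. The function name parse_cami_profile is hypothetical and is not part of OPAL, and the example rows are toy values.

```python
from collections import defaultdict


def parse_cami_profile(lines):
    """Parse CAMI profiling Bioboxes lines into {sample_id: [row, ...]}.

    Hypothetical helper, not OPAL's parser. Assumes '@SampleID:' opens each
    sample, the '@@' line names the tab-separated columns, and lines starting
    with '#' are comments.
    """
    samples = defaultdict(list)
    sample_id = None
    columns = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("@@"):
            # Column header, e.g. @@TAXID<TAB>RANK<TAB>TAXPATH<TAB>TAXPATHSN<TAB>PERCENTAGE
            columns = line[2:].split("\t")
        elif line.startswith("@"):
            # Sample header such as @SampleID:..., @Version:..., @Ranks:...
            key, _, value = line[1:].partition(":")
            if key.strip().upper() == "SAMPLEID":
                sample_id = value.strip()
        else:
            # Data row: map column names to fields, abundance as a float
            row = dict(zip(columns, line.split("\t")))
            row["PERCENTAGE"] = float(row.get("PERCENTAGE", 0) or 0)
            samples[sample_id].append(row)
    return dict(samples)


# Toy input, for illustration only
example = [
    "@SampleID:sample_0",
    "@Version:0.9.1",
    "@Ranks:superkingdom|phylum",
    "@@TAXID\tRANK\tTAXPATH\tTAXPATHSN\tPERCENTAGE",
    "2\tsuperkingdom\t2\tBacteria\t98.8",
    "1224\tphylum\t2|1224\tBacteria|Proteobacteria\t60.0",
]
profile = parse_cami_profile(example)
print(profile["sample_0"][1]["TAXPATHSN"])  # Bacteria|Proteobacteria
```

Because the function accepts any iterable of lines, an open file handle (for example, one opened on data/cranky_wozniak_13) can be passed directly.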

The BIOM format

The BIOM format used by OPAL is a sparse matrix stored in a JSON or HDF5 file, with a column per sample and a row per taxonomy ID, storing the corresponding abundances. RANK, TAXPATH, and TAXPATHSN are stored as metadata of each row and have the same meaning as in the CAMI profiling Bioboxes format:

  • RANK: taxonomic rank
  • TAXPATH and TAXPATHSN: path from the root of the taxonomy to the current taxon, including the current taxon, with entries separated by a |. TAXPATH contains the taxonomic identifiers and TAXPATHSN the corresponding plain names of the taxa. For more details and examples, see the CAMI profiling Bioboxes format.
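Since TAXPATH and TAXPATHSN run in parallel from the root to the current taxon, pairing them entry by entry recovers the lineage of each row. The sketch below assumes both fields contain the same number of |-separated entries; the helper name lineage is hypothetical.

```python
def lineage(taxpath, taxpathsn):
    """Pair each identifier in TAXPATH with its name in TAXPATHSN.

    Hypothetical helper. Both fields run from the root to the current taxon,
    so the last pair describes the taxon of the row itself.
    """
    ids = taxpath.split("|")
    names = taxpathsn.split("|")
    if len(ids) != len(names):
        raise ValueError("TAXPATH and TAXPATHSN have different lengths")
    return list(zip(ids, names))


path = lineage("2|1224|28211", "Bacteria|Proteobacteria|Alphaproteobacteria")
print(path[-1])  # ('28211', 'Alphaproteobacteria'): the row's own taxon
```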

Computed metrics

  • UniFrac error
  • L1 norm error
  • True positives, false positives, false negatives
  • Precision
  • Recall
  • F1 score
  • Jaccard index
  • Shannon diversity and equitability indices
  • Bray–Curtis distance
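To make most of the metrics above concrete, the following Python sketch applies their textbook definitions to one sample at one taxonomic rank, with each profile given as a dict mapping taxon to relative abundance. It is not OPAL's implementation: details such as how zero abundances, unnormalized input, or single-taxon profiles are handled are assumptions here (equitability is set to 0.0 when fewer than two taxa are present). UniFrac is omitted because it needs the taxonomic tree. All function names are hypothetical.

```python
import math


def normalize(profile):
    """Rescale abundances to sum to 1 (OPAL normalizes samples unless -n is given)."""
    total = sum(profile.values())
    return {t: v / total for t, v in profile.items()} if total else dict(profile)


def binary_metrics(gold, pred):
    """TP, FP, FN, precision, recall, F1 and Jaccard index on taxon presence/absence."""
    gold_set = {t for t, v in gold.items() if v > 0}
    pred_set = {t for t, v in pred.items() if v > 0}
    tp = len(gold_set & pred_set)   # taxa present in both
    fp = len(pred_set - gold_set)   # predicted but absent from the gold standard
    fn = len(gold_set - pred_set)   # in the gold standard but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    union = gold_set | pred_set
    jaccard = tp / len(union) if union else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision,
            "recall": recall, "f1": f1, "jaccard": jaccard}


def l1_norm_error(gold, pred):
    """Sum of absolute abundance differences (between 0 and 2 for normalized profiles)."""
    taxa = set(gold) | set(pred)
    return sum(abs(gold.get(t, 0.0) - pred.get(t, 0.0)) for t in taxa)


def bray_curtis(gold, pred):
    """Bray-Curtis distance: sum |x - y| divided by sum (x + y)."""
    taxa = set(gold) | set(pred)
    denom = sum(gold.get(t, 0.0) + pred.get(t, 0.0) for t in taxa)
    return l1_norm_error(gold, pred) / denom if denom else 0.0


def shannon(profile):
    """Shannon diversity H = -sum(p ln p) and equitability H / ln(n)."""
    p = [v for v in normalize(profile).values() if v > 0]
    h = -sum(v * math.log(v) for v in p)
    equitability = h / math.log(len(p)) if len(p) > 1 else 0.0
    return h, equitability


# Toy profiles at a single rank, for illustration only
gold = {"A": 0.5, "B": 0.3, "C": 0.2}
pred = {"A": 0.6, "B": 0.2, "D": 0.2}
print(binary_metrics(gold, pred))  # tp 2, fp 1, fn 1, precision = recall = F1 = 2/3, Jaccard 0.5
print(round(l1_norm_error(gold, pred), 6), round(bray_curtis(gold, pred), 6))  # 0.6 0.3
```

For two normalized profiles the Bray-Curtis distance equals half the L1 norm error, which the toy example reflects.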

Running opal.py

usage: opal.py [-h] -g GOLD_STANDARD_FILE [-n] [-p] [-l LABELS] -o OUTPUT_DIR
               profiles_files [profiles_files ...]

Compute all metrics for one or more taxonomic profiles

positional arguments:
  profiles_files        Files of profiles

optional arguments:
  -h, --help            show this help message and exit
  -g GOLD_STANDARD_FILE, --gold_standard_file GOLD_STANDARD_FILE
                        Gold standard file
  -n, --no_normalization
                        Do not normalize samples
  -p, --plot_abundances
                        Plot abundances in the gold standard (can take some
                        minutes)
  -l LABELS, --labels LABELS
                        Comma-separated profiles names
  -o OUTPUT_DIR, --output_dir OUTPUT_DIR
                        Directory to write the results to

Example: to run the example, download the files in the data directory.

python3 opal.py -g data/goldstandard_low_1.bin \
data/cranky_wozniak_13 \
data/grave_wright_13 \
data/furious_elion_13 \
data/focused_archimedes_13 \
data/evil_darwin_13 \
data/agitated_blackwell_7 \
data/jolly_pasteur_3 \
-l "TIPP, Quikr, MP2.0, MetaPhyler, mOTU, CLARK, FOCUS" \
-o output_dir
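When OPAL is driven from a script, the invocation above can be assembled programmatically. This Python sketch builds the argument list from the options in the usage text and checks that there is one label per profile, in the same order as the profile files (an assumption based on the example). It assumes opal.py is on your PATH, as set up in Installation; build_opal_command and run_opal are hypothetical names.

```python
import subprocess


def build_opal_command(gold_standard, profiles, labels, output_dir,
                       normalize=True, plot_abundances=False):
    """Build an opal.py argument list from the documented options.

    Hypothetical helper. Assumes one label per profile, in the same order.
    """
    if labels and len(labels) != len(profiles):
        raise ValueError("expected one label per profile")
    cmd = ["opal.py", "-g", gold_standard]
    if labels:
        cmd += ["-l", ",".join(labels)]  # -l takes comma-separated names
    if not normalize:
        cmd.append("-n")                 # --no_normalization
    if plot_abundances:
        cmd.append("-p")                 # --plot_abundances (can be slow)
    cmd += ["-o", output_dir]
    cmd += list(profiles)                # positional profiles_files
    return cmd


def run_opal(cmd):
    """Run OPAL, raising CalledProcessError if it exits with an error."""
    subprocess.run(cmd, check=True)


cmd = build_opal_command(
    "data/goldstandard_low_1.bin",
    ["data/cranky_wozniak_13", "data/grave_wright_13", "data/furious_elion_13"],
    ["TIPP", "Quikr", "MP2.0"],
    "output_dir",
)
print(" ".join(cmd))
```

Calling run_opal(cmd) then executes the command; the label check catches a mismatched -l list before OPAL starts.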

Output: Directory output_dir will contain:

  • results.html
  • results.tsv
  • subdirectory per_rank with a .tsv file per taxonomic rank
  • subdirectory per_tool with a .tsv file per tool (CLARK.tsv, FOCUS.tsv, MetaPhyler.tsv, mOTU.tsv, MP2.0.tsv, Quikr.tsv, and TIPP.tsv)
  • spider_plot.pdf
  • spider_plot_recall_precision.pdf
  • plot_shannon.pdf

Note: spider plots will only be generated if at least 3 profiles are provided, so that the plots can form a triangle.
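After a run, a script can check that the expected files were written. This Python sketch derives the file list from the output description above, including the rule that spider plots appear only for three or more profiles. It assumes per-tool files are named after the labels passed with -l, and it skips per_rank because those file names depend on the ranks present. The helper names are hypothetical.

```python
from pathlib import Path


def expected_outputs(output_dir, labels):
    """Files OPAL is documented to write, based on the list above.

    Hypothetical helper; per_rank/ is skipped because its file names depend
    on the ranks present in the data.
    """
    out = Path(output_dir)
    files = [out / "results.html", out / "results.tsv", out / "plot_shannon.pdf"]
    files += [out / "per_tool" / f"{label}.tsv" for label in labels]
    if len(labels) >= 3:  # spider plots need at least three profiles
        files += [out / "spider_plot.pdf", out / "spider_plot_recall_precision.pdf"]
    return files


def missing_outputs(output_dir, labels):
    """Return the expected files that do not exist yet."""
    return [f for f in expected_outputs(output_dir, labels) if not f.exists()]


labels = ["TIPP", "Quikr", "MP2.0", "MetaPhyler", "mOTU", "CLARK", "FOCUS"]
for path in missing_outputs("output_dir", labels):
    print("missing:", path)
```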

Running tsv2biom.py

usage: tsv2biom.py [-h] -o OUTPUT_FILE [-j] files [files ...]

Convert profile in the CAMI Bioboxes format to BIOM

positional arguments:
  files                 Input file(s), one file per sample

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT_FILE, --output_file OUTPUT_FILE
                        Output file
  -j, --json            Output in json (default: hdf5)

Example:

python3 tsv2biom.py data/cranky_wozniak_13 -o output_dir/cranky_wozniak_13.biom
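To illustrate the sparse layout described in the BIOM format section (a column per sample, a row per taxonomy ID, with RANK, TAXPATH, and TAXPATHSN as row metadata), here is a stdlib-only Python sketch that builds a BIOM 1.0 style JSON table. It is not tsv2biom.py's code: tsv2biom.py writes HDF5 by default and JSON only with -j, and the values chosen for type and generated_by are assumptions. The input is assumed to be a dict mapping each sample ID to a list of row dicts keyed by the CAMI column names; the function names are hypothetical.

```python
import json
from datetime import datetime


def rows_to_biom_json(samples, table_id="profiles"):
    """Build a BIOM 1.0 style sparse JSON table from parsed profile rows.

    Hypothetical helper. One column per sample, one row per TAXID, and the
    CAMI RANK/TAXPATH/TAXPATHSN fields stored as row metadata.
    """
    row_index = {}  # TAXID -> row number
    rows, data = [], []
    sample_ids = list(samples)
    for col, sample_id in enumerate(sample_ids):
        for entry in samples[sample_id]:
            taxid = entry["TAXID"]
            if taxid not in row_index:
                row_index[taxid] = len(rows)
                rows.append({
                    "id": taxid,
                    "metadata": {key: entry.get(key, "")
                                 for key in ("RANK", "TAXPATH", "TAXPATHSN")},
                })
            abundance = float(entry["PERCENTAGE"])
            if abundance:  # sparse matrix: store only non-zero cells
                data.append([row_index[taxid], col, abundance])
    return {
        "id": table_id,
        "format": "Biological Observation Matrix 1.0.0",
        "format_url": "http://biom-format.org",
        "type": "Taxon table",
        "generated_by": "hypothetical tsv2biom sketch",
        "date": datetime.now().isoformat(),
        "matrix_type": "sparse",
        "matrix_element_type": "float",
        "shape": [len(rows), len(sample_ids)],
        "rows": rows,
        "columns": [{"id": s, "metadata": None} for s in sample_ids],
        "data": data,
    }


def write_biom_json(samples, path):
    """Serialize the table to a JSON file."""
    with open(path, "w") as handle:
        json.dump(rows_to_biom_json(samples), handle, indent=2)
```

Storing only non-zero cells as [row, column, value] triples is what keeps the table sparse: a taxon absent from a sample simply has no entry for that column.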

Developer Guide

We use tox for project automation.

Tests

To run the tests, type the following in the project's root directory:

tox

Citation

Please cite:

  • Fernando Meyer, Andreas Bremges, Peter Belmann, Stefan Janssen, Alice Carolyn McHardy, and David Koslicki (2018). Assessing taxonomic metagenome profilers with OPAL. bioRxiv. doi:10.1101/372680

Part of OPAL's functionality was described in the CAMI manuscript, so please also cite:

  • Alexander Sczyrba, Peter Hofmann, Peter Belmann, et al. (2017). Critical Assessment of Metagenome Interpretation—a benchmark of metagenomics software. Nature Methods, 14, 11:1063–1071. doi:10.1038/nmeth.4458