The CAMI (Taxonomic) Binning Output Format
1. Outline
CAMI stands for Critical Assessment of Metagenome Interpretation. The CAMI binning format was originally specified for this contest. Because it is intended to serve as a standard format for binning methods, it is now maintained at https://github.com/bioboxes/rfc and all binning submissions for CAMI must comply with the Bioboxes Binning Format version 0.9.
2. Description of header tags
#CAMI Submission for Binning
This line is an optional commment line describing the file content.
@Version:0.9.0
This line gives information about the format version.
@SampleID:SAMPLEID
SAMPLEID is a unique identifier for a sequence sample. Do not give a tool name or your name here, we want to process submissions anonymously. If you want to submit results for a dataset, just refer to its name as the sample id (e.g.:1stCAMIChallenge_medium_GoldStandardAssembly). If you want to submit to a specific sample just use the name of the sample (e.g.:CAMI_MED_S001).
@@SEQUENCEID TAXID BINID
The last line is always the header for the output section and begins with '@@'. It defines the content of the output columns in a tab-separated format. The tags are described in the output section below.
3. Description of output section
The TAXID is a taxonomic assignment for your binned sequences (according to the provided taxonomy version from the download site) and will be used for evaluation of your predictions with taxonomy-based measures.
The BINID entries can be arbitrary alphanumeric identifiers for the bins. No taxonomy-based evaluation will be performed using the entries in this column.
There are three different scenarios for binning tools.
The first case, example A below: If you create taxonomic bins as output without further resolution, you do not need to include the BINID colummn, but only the TAXID column, in your output.
The second case, example B below: If you create bins that do not include taxonomic assignments you do not need to include the TAXID column, but only the BINID column, in your output.
The third case, example C below, is if you perform taxonomic binning and additionally resolve bins below existing taxonomic IDs, e.g. to define bins representing novel strains. In this case, you add both the TAXID. It will be easiest to check for consistency for us, if you in this case use for the BINID the TAXID entry and a numeric identifier appended (e.g. 562.2).
If you want to specify extra columns in the output, prefix each column name with _YourProgramName_
and only append columns to the right of the standard ones.
4. EXAMPLES
A
#CAMI Format for Binning
@Version:0.9.0
@@SEQUENCEID TAXID
read1201 123
read1202 123
read1203 131564
read1204 562
read1205 562
B
#CAMI Format for Binning
@Version:0.9.0
@@SEQUENCEID BINID
read1201 12346BIN
read1202 ANOTHERBIN
read1203 BIN6
read1204 BIN5
read1205 BIN5
C
#CAMI Format for Binning
@Version:0.9.0
@@SEQUENCEID TAXID BINID
read1201 123 123
read1202 123 123
read1203 131564 131564
read1204 562 562.1
read1205 562 562.2