The CAMI (Taxonomic) Binning Output Format

1. Outline

CAMI stands for Critical Assessment of Metagenome Interpretation. The CAMI binning format was originally specified for this contest. Because it is intended to serve as a standard format for binning methods, it is now maintained at https://github.com/bioboxes/rfc. All binning submissions for CAMI must comply with the Bioboxes Binning Format version 0.9.

2. Description of header tags

   #CAMI Submission for Binning

This line is an optional comment line describing the file content.

   @Version:0.9.0

This line specifies the format version.

   @SampleID:SAMPLEID

SAMPLEID is a unique identifier for a sequence sample. Do not give a tool name or your own name here, because we want to process submissions anonymously. To submit results for a dataset, use its name as the sample ID (e.g. 1stCAMIChallenge_medium_GoldStandardAssembly). To submit results for a specific sample, use the name of that sample (e.g. CAMI_MED_S001).

   @@SEQUENCEID	TAXID	BINID	

The last line of the header is always the column header for the output section and begins with '@@'. It defines the content of the output columns in a tab-separated format. The tags are described in the output section below.
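
The header layout described above can be read with a few lines of Python. This is only a minimal sketch, not a reference parser: the function name parse_header is hypothetical, and it assumes that tags have the form '@Key:value', that lines starting with '#' are comments, and that empty column names caused by stray tabs can be ignored.

```python
def parse_header(lines):
    """Collect '@Key:value' tags and the '@@' column names from header lines."""
    tags = {}
    columns = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and optional comment lines
        if line.startswith("@@"):
            # Column header: tab-separated names; drop empties left by stray tabs.
            columns = [c.strip() for c in line[2:].split("\t") if c.strip()]
            break  # the '@@' line is the last header line
        if line.startswith("@"):
            key, _, value = line[1:].partition(":")
            tags[key.strip()] = value.strip()
    if columns is None:
        raise ValueError("missing '@@' column header line")
    return tags, columns


header = [
    "#CAMI Submission for Binning",
    "@Version:0.9.0",
    "@SampleID:CAMI_MED_S001",
    "@@SEQUENCEID\tTAXID\tBINID\t",
]
tags, columns = parse_header(header)
print(tags)     # {'Version': '0.9.0', 'SampleID': 'CAMI_MED_S001'}
print(columns)  # ['SEQUENCEID', 'TAXID', 'BINID']
```

The returned column list tells a reader which of the optional columns (TAXID, BINID) a given file actually contains.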

3. Description of output section

The TAXID is a taxonomic assignment for your binned sequences (according to the provided taxonomy version from the download site) and will be used for evaluation of your predictions with taxonomy-based measures.

The BINID entries can be arbitrary alphanumeric identifiers for the bins. No taxonomy-based evaluation will be performed using the entries in this column.

There are three different scenarios for binning tools.

The first case, example A below: If you create taxonomic bins as output without further resolution, you need to include only the TAXID column in your output; the BINID column can be omitted.

The second case, example B below: If you create bins that do not include taxonomic assignments, you need to include only the BINID column in your output; the TAXID column can be omitted.

The third case, example C below: If you perform taxonomic binning and additionally resolve bins below existing taxonomic IDs, e.g. to define bins representing novel strains, you include both the TAXID and the BINID columns. Consistency is easiest for us to check if, in this case, you use the TAXID entry with a numeric identifier appended as the BINID (e.g. 562.2).
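
The BINID naming convention suggested for the third case can be checked before submitting. The sketch below is an illustration only: the function name binid_matches_taxid is hypothetical, and it assumes that a BINID identical to the TAXID (as in the first rows of example C) is also acceptable.

```python
def binid_matches_taxid(taxid, binid):
    """Return True if binid is the taxid itself or the taxid plus '.<number>'."""
    if binid == taxid:
        return True  # bin not resolved below the taxonomic ID
    prefix, sep, suffix = binid.rpartition(".")
    # Require an actual '.', the same TAXID before it, and digits after it.
    return sep == "." and prefix == taxid and suffix.isdigit()


print(binid_matches_taxid("562", "562.2"))  # True
print(binid_matches_taxid("123", "123"))    # True
print(binid_matches_taxid("562", "123.1"))  # False
```

Note that this follows a recommendation, not a requirement: BINID entries may be arbitrary alphanumeric identifiers.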

If you want to specify extra columns in the output, prefix each column name with _YourProgramName_ and only append columns to the right of the standard ones.

4. Examples

A

#CAMI Format for Binning
@Version:0.9.0
@@SEQUENCEID	TAXID	

read1201	123	
read1202	123	
read1203	131564	
read1204	562	
read1205	562	

B

#CAMI Format for Binning
@Version:0.9.0
@@SEQUENCEID	BINID	

read1201	12346BIN
read1202	ANOTHERBIN	
read1203	BIN6	
read1204	BIN5	
read1205	BIN5	

C

#CAMI Format for Binning
@Version:0.9.0
@@SEQUENCEID	TAXID	BINID

read1201	123	123
read1202	123	123
read1203	131564	131564
read1204	562	562.1
read1205	562	562.2
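
A file like example C can be produced programmatically. The following is a minimal sketch under stated assumptions: the function name write_binning and its parameters are hypothetical, the comment line and the blank line after the column header simply mirror the examples above, and the @SampleID line is added because section 2 describes it (the examples themselves omit it).

```python
import io


def write_binning(out, sample_id, rows, columns=("SEQUENCEID", "TAXID", "BINID")):
    """Write header lines and tab-separated rows to a text stream."""
    out.write("#CAMI Format for Binning\n")
    out.write("@Version:0.9.0\n")
    out.write(f"@SampleID:{sample_id}\n")
    out.write("@@" + "\t".join(columns) + "\n")
    out.write("\n")  # blank line after the column header, as in the examples
    for row in rows:
        if len(row) != len(columns):
            raise ValueError(f"expected {len(columns)} fields, got {len(row)}: {row!r}")
        out.write("\t".join(str(value) for value in row) + "\n")


rows = [
    ("read1201", 123, "123"),
    ("read1204", 562, "562.1"),
    ("read1205", 562, "562.2"),
]
buffer = io.StringIO()
write_binning(buffer, "CAMI_MED_S001", rows)
print(buffer.getvalue(), end="")
```

For examples A or B, pass columns=("SEQUENCEID", "TAXID") or columns=("SEQUENCEID", "BINID") together with two-field rows.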