Martin Asser Hansen edited this page Oct 2, 2015 · 6 revisions

Biopiece: classify_taxonomy

Description

Warning: classify_taxonomy is under active development and testing.

classify_taxonomy parses the taxonomy string from the Q_ID of each record in the stream. For each Q_ID a taxonomy tree is created with a node for each level (kingdom, phylum, class, etc.), and each node holds the taxonomic name at that level together with the count and the mean identity score. Using the -l switch trims the taxonomic trees so that only the lowest common ancestor is output. Using the -s switch takes into account the cluster size parsed from the Q_ID, which may be suffixed with _<cluster count>.

classify_taxonomy only works on headers of the GreenGenes format where the sequence name contains a taxonomy string of the format:

k__Archaea; p__Euryarchaeota; c__Methanococci; o__Methanococcales [...]
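
To make the string format concrete, here is a minimal Python sketch of how such a taxonomy string could be split into (level, name) pairs. It is not the classify_taxonomy implementation. The function name parse_taxonomy and the LEVELS table are hypothetical, and stopping at the first empty name (such as a bare g__) is an assumption.

```python
# Map GreenGenes rank prefixes to level names.
LEVELS = {
    "k": "kingdom", "p": "phylum", "c": "class", "o": "order",
    "f": "family", "g": "genus", "s": "species",
}


def parse_taxonomy(tax_string):
    """Split 'k__Archaea; p__Euryarchaeota; ...' into (level, name) pairs."""
    lineage = []
    for field in tax_string.split(";"):
        prefix, sep, name = field.strip().partition("__")
        if not sep or prefix not in LEVELS:
            continue  # not a '<prefix>__<name>' field, skip it
        if not name:
            break  # assumption: an empty name means the rank is unassigned
        lineage.append((LEVELS[prefix], name))
    return lineage


tax = "k__Archaea; p__Euryarchaeota; c__Methanococci; o__Methanococcales"
for level, name in parse_taxonomy(tax):
    print(level, name)
```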

The output records look like this:

REC_TYPE: Classification
LEVEL: phylum
NAME: SM2F11
COUNT: 3
SCORE: 0.65
---
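
The tree with per-node count and mean score, and the trimming to a lowest common ancestor, can be sketched in Python as follows. This is an illustration under stated assumptions, not the tool's actual code: the Node class, build_tree and lowest_common_ancestor are hypothetical names, the two hits and their scores are made-up data, and the lowest common ancestor is taken to be the deepest node that every hit passes through.

```python
class Node:
    """One taxonomic level; accumulates a hit count and a score sum."""

    def __init__(self, level, name):
        self.level = level
        self.name = name
        self.count = 0
        self.score_sum = 0.0
        self.children = {}

    @property
    def mean_score(self):
        return self.score_sum / self.count if self.count else 0.0


def build_tree(hits):
    """hits: iterable of (lineage, score); lineage is a list of (level, name)."""
    root = Node("root", "root")
    for lineage, score in hits:
        root.count += 1
        root.score_sum += score
        node = root
        for level, name in lineage:
            node = node.children.setdefault(name, Node(level, name))
            node.count += 1
            node.score_sum += score
    return root


def lowest_common_ancestor(root):
    """Descend while a single child carries every hit of its parent."""
    node = root
    while len(node.children) == 1:
        child = next(iter(node.children.values()))
        if child.count != node.count:
            break  # some hits end at this node, so it is the deepest shared one
        node = child
    return node


# Made-up hits that share kingdom and phylum but differ at class level.
hits = [
    ([("kingdom", "Archaea"), ("phylum", "Euryarchaeota"), ("class", "Methanococci")], 1.0),
    ([("kingdom", "Archaea"), ("phylum", "Euryarchaeota"), ("class", "Thermoplasmata")], 0.5),
]
lca = lowest_common_ancestor(build_tree(hits))
print("REC_TYPE: Classification")
print("LEVEL:", lca.level)
print("NAME:", lca.name)
print("COUNT:", lca.count)
print("SCORE: %.2f" % lca.mean_score)
print("---")
```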

Usage

... | classify_taxonomy [options]

Options

[-?         | --help]               # Print full usage description.
[-m <uint>  | --min_count=<uint>]   # Debranch nodes where count <= min_count.
[-l         | --LCA]                # Output lowest common ancestor.
[-s <uint>  | --size=<uint>]        # Parse cluster size from Q_IDs.
[-I <file!> | --stream_in=<file!>]  # Read input from stream file     -  Default=STDIN
[-O <file>  | --stream_out=<file>]  # Write output to stream file     -  Default=STDOUT
[-v         | --verbose]            # Verbose output.
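
As an illustration of the --size and --min_count semantics described above, here is a small Python sketch. It is not taken from the Biopieces source: the function names cluster_size and debranch, the Q_ID read42_15 and the counts are hypothetical, and treating a Q_ID without a numeric suffix as a cluster of size 1 is an assumption.

```python
import re


def cluster_size(q_id):
    """Return the trailing _<cluster count> of a Q_ID as an int.

    Assumption: a Q_ID without a numeric suffix counts as a cluster of 1.
    """
    match = re.search(r"_(\d+)$", q_id)
    return int(match.group(1)) if match else 1


def debranch(counts, min_count):
    """Drop entries where count <= min_count, keeping the rest."""
    return {name: n for name, n in counts.items() if n > min_count}


print(cluster_size("read42_15"))
print(debranch({"Euryarchaeota": 3, "SM2F11": 1}, min_count=1))
```

Note that a Q_ID that merely ends in an underscore followed by digits would also be read as carrying a cluster size, so this pattern is only safe when the suffix convention is used consistently.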

Examples

Here is an example of a complete taxonomic pipeline:

read_sff -ci data.sff |
extract_seq -l 500 |
trim_seq -l 10 |
grab -e 'SEQ_LEN >= 50' |
denoise_seq -vi 1 -r 0.6 |
denoise_seq -vi 0.98 -c 2 |
findsim_seq -vSQd sequences_16S_all_gg_2011_1_unaligned.fasta.gz |
grab -e 'REC_TYPE eq findsim' |
classify_taxonomy -ls |
grab -e 'REC_TYPE eq Classification' |
write_tab -ck COUNT,SCORE,LEVEL,NAME -o result.tab -x

See also

read_fasta

findsim_seq

Author

Martin Asser Hansen - Copyright (C) - All rights reserved.

mail@maasha.dk

October 2012

License

GNU General Public License version 2

http://www.gnu.org/copyleft/gpl.html

Help

classify_taxonomy is part of the Biopieces framework.

http://www.biopieces.org
