Martin Asser Hansen edited this page Oct 2, 2015 · 6 revisions

Biopiece: read_sff

Description

read_sff reads sequence entries from SFF files. Quality scores are converted to Phred scores encoded with an ASCII offset of 64 (base 64, like Illumina). The resulting records look like this:

SEQ_NAME: FQIBXOY01DRIMT
SEQ: TCAGTCATATTTTT...
SEQ_LEN: 279
CLIP_QUAL_LEFT: 4
CLIP_QUAL_RIGHT: 277
CLIP_ADAPTOR_LEFT: -1
CLIP_ADAPTOR_RIGHT: -1
SCORES: aaa`[[[_[NNNNNN...
X_POS: 1426
Y_POS: 1923
---

Negative values for any of the CLIP_ADAPTOR_ keys indicate that no adaptor was found.

read_sff does not work on gzipped files.
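
For downstream scripting, the record layout above (one KEY: VALUE pair per line, with --- closing each record) can be parsed in a few lines. The following is a minimal Python sketch, not part of Biopieces; the function names parse_records and has_adaptor are hypothetical, and the sample text is an abbreviated copy of the record shown above.

```python
def parse_records(text):
    """Yield one dict per record from a KEY: VALUE stream delimited by '---'."""
    record = {}
    for line in text.splitlines():
        line = line.rstrip()
        if line == "---":            # record delimiter
            if record:
                yield record
            record = {}
        elif line:
            key, _, value = line.partition(": ")
            record[key] = value
    if record:                       # tolerate a missing final delimiter
        yield record


def has_adaptor(record):
    """Return True only if neither CLIP_ADAPTOR_ value is negative."""
    return (int(record["CLIP_ADAPTOR_LEFT"]) >= 0
            and int(record["CLIP_ADAPTOR_RIGHT"]) >= 0)


# Abbreviated copy of the example record (SEQ and SCORES left out).
sample = """SEQ_NAME: FQIBXOY01DRIMT
SEQ_LEN: 279
CLIP_QUAL_LEFT: 4
CLIP_QUAL_RIGHT: 277
CLIP_ADAPTOR_LEFT: -1
CLIP_ADAPTOR_RIGHT: -1
---
"""

records = list(parse_records(sample))
print(records[0]["SEQ_NAME"], has_adaptor(records[0]))
```

All values are kept as strings by the parser; numeric keys such as SEQ_LEN are converted only where they are used.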

For more about the SFF format:

http://www.ncbi.nlm.nih.gov/Traces/trace.cgi?cmd=show&f=formats&m=doc&s=format#sff

Usage

read_sff [options] -i <SFF file(s)>

Options

[-?          | --help]               #  Print full usage description.
[-i <files!> | --data_in=<files!>]   #  Comma separated list of files or glob expression to read.
[-n <uint>   | --num=<uint>]         #  Limit number of records to read.
[-m          | --mask]               #  Mask sequence according to clipping information.
[-c          | --clip]               #  Clip sequence according to clipping information.
[-I <file!>  | --stream_in=<file!>]  #  Read input stream from file  -  Default=STDIN
[-O <file>   | --stream_out=<file>]  #  Write output stream to file  -  Default=STDOUT
[-v          | --verbose]            #  Verbose output.

Examples

To read 1 entry from an SFF file, use read_sff with the -n switch (SEQ and SCORES truncated for brevity):

read_sff -n 1 -i test.sff

SEQ_NAME: FQIBXOY01DRIMT
SEQ: TCAGTCATATTTTT...
SEQ_LEN: 279
CLIP_QUAL_LEFT: 4
CLIP_QUAL_RIGHT: 277
CLIP_ADAPTOR_LEFT: -1
CLIP_ADAPTOR_RIGHT: -1
SCORES: aaa`[[[_[NNNNNN...
X_POS: 1426
Y_POS: 1923
---
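
The SCORES string in the output above can be turned back into numeric Phred values by subtracting the offset of 64 from each character's ASCII code. The following Python sketch illustrates this on the first ten visible characters of the example; decode_scores is a hypothetical helper name, not a Biopieces function.

```python
def decode_scores(scores, offset=64):
    """Convert an ASCII-encoded quality string to a list of integer scores."""
    return [ord(char) - offset for char in scores]


# First ten characters of the truncated SCORES value shown above.
print(decode_scores("aaa`[[[_[N"))
# -> [33, 33, 33, 32, 27, 27, 27, 31, 27, 14]
```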

Use the -m switch to soft mask the sequences according to the CLIP_QUAL information:

read_sff -n 1 -i test.sff -m

SEQ_NAME: FQIBXOY01DRIMT
SEQ: tcagTCATATTTTT...
SEQ_LEN: 279
CLIP_QUAL_LEFT: 4
CLIP_QUAL_RIGHT: 277
CLIP_ADAPTOR_LEFT: -1
CLIP_ADAPTOR_RIGHT: -1
SCORES: aaa`[[[_[NNNNNN...
X_POS: 1426
Y_POS: 1923
---

Or use the -c switch to clip the sequences according to the CLIP_QUAL information:

read_sff -n 1 -i test.sff -c

SEQ_NAME: FQIBXOY01DRIMT
SEQ: TCATATTTTT...
SEQ_LEN: 275
CLIP_QUAL_LEFT: 4
CLIP_QUAL_RIGHT: 277
CLIP_ADAPTOR_LEFT: -1
CLIP_ADAPTOR_RIGHT: -1
SCORES: [[[_[NNNNNN...
X_POS: 1426
Y_POS: 1923
---
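
The effect of the -m and -c switches can be sketched in Python on a toy sequence. This is an illustration under stated assumptions, not the read_sff implementation: the left boundary is read as the 0-based index of the first retained base, which agrees with the four lower-cased bases in the masked output above. This page does not state the convention for the right boundary, so treating it as the 0-based index of the last retained base is an assumption and may be off by one relative to read_sff. The names soft_mask and clip are hypothetical.

```python
def soft_mask(seq, left, right):
    """Lower-case bases outside the retained window.

    Assumption: left and right are 0-based indices of the first and
    last retained base (right is inclusive).
    """
    return seq[:left].lower() + seq[left:right + 1] + seq[right + 1:].lower()


def clip(seq, left, right):
    """Return only the retained window, under the same assumption."""
    return seq[left:right + 1]


# Toy 16-base sequence, not taken from test.sff.
toy = "TCAGTCATATTTTTGG"
print(soft_mask(toy, 4, 13))  # -> tcagTCATATTTTTgg
print(clip(toy, 4, 13))       # -> TCATATTTTT
```

When clipping this way, the SCORES string would need the same slice applied so that scores stay aligned with bases, as the shortened SCORES value in the -c output above suggests.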

See also

scores_to_dec

read_454

write_454

Author

Martin Asser Hansen - Copyright (C) - All rights reserved.

mail@maasha.dk

February 2011

License

GNU General Public License version 2

http://www.gnu.org/copyleft/gpl.html

Help

read_sff is part of the Biopieces framework.

http://www.biopieces.org
