Martin Asser Hansen edited this page Oct 2, 2015 · 6 revisions

Biopiece: intersect_records

Description

intersect_records intersects records in the stream based on overlapping intervals. The stream is split in two: all records that contain a specific key are intersected with all records that lack that key. Two records intersect when their intervals, given by the S_BEG and S_END positions, overlap, they have the same S_ID and, optionally, the same STRAND. If an overlap is found, the record without the specific key is emitted to the stream, unless the --inverse switch is set, in which case the non-intersecting records are emitted instead.
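The logic described above can be sketched in Python. This is a minimal illustration, not the Biopieces implementation. It assumes that records are plain dictionaries, that S_BEG and S_END are integers forming closed intervals (both ends inclusive), and that keyed records themselves are never emitted. The function names and the sample data are hypothetical.

```python
def overlaps(a, b, same_strand=False):
    """Return True if a and b share an S_ID and their closed intervals overlap."""
    if a["S_ID"] != b["S_ID"]:
        return False
    if same_strand and a.get("STRAND") != b.get("STRAND"):
        return False
    # Assumption: S_BEG and S_END are both inclusive.
    return a["S_BEG"] <= b["S_END"] and b["S_BEG"] <= a["S_END"]


def intersect_records(records, key, strand=False, inverse=False):
    """Yield records lacking `key` that overlap (or, if inverse, do not
    overlap) at least one record containing `key`."""
    records = list(records)
    keyed = [r for r in records if key in r]  # the "with key" half of the split
    for rec in records:
        if key in rec:
            continue  # assumption: keyed records are not emitted
        hit = any(overlaps(rec, k, strand) for k in keyed)
        if hit != inverse:
            yield rec


# Hypothetical sample stream: two keyed records followed by two unkeyed ones.
foo = [
    {"S_ID": "ID000001", "S_BEG": 100, "S_END": 400, "STRAND": "+", "INTERSECT": 0},
    {"S_ID": "ID000001", "S_BEG": 500, "S_END": 800, "STRAND": "+", "INTERSECT": 1},
]
bar = [
    {"S_ID": "ID000001", "S_BEG": 200, "S_END": 300, "STRAND": "+"},
    {"S_ID": "ID000001", "S_BEG": 600, "S_END": 700, "STRAND": "-"},
]
stream = foo + bar

plain = list(intersect_records(stream, "INTERSECT"))
stranded = list(intersect_records(stream, "INTERSECT", strand=True))
inverted = list(intersect_records(stream, "INTERSECT", strand=True, inverse=True))
```

In this sketch, plain holds both unkeyed records, stranded holds only the record on the + strand, and inverted holds only the record on the - strand. The sketch compares every unkeyed record against every keyed record, which is simple but quadratic; a sorted or indexed lookup would be the usual choice for large streams.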

Usage

... | intersect_records <-k key> [options]

Options

[-?          | --help]               #  Print full usage description.
[-k <string> | --key=<string>]       #  Key used for intersection.
[-s          | --strand]             #  Only intervals on the same strand are intersected.
[-i          | --inverse]            #  Only non-intersecting records are emitted.
[-I <file!>  | --stream_in=<file!>]  #  Read input from stream file  -  Default=STDIN
[-O <file>   | --stream_out=<file>]  #  Write output to stream file  -  Default=STDOUT
[-v          | --verbose]            #  Verbose output.

Examples

Consider the following two test files, foo.tab:

#S_ID   S_BEG   S_END   STRAND
ID000001        100     400     +
ID000001        500     800     +

and bar.tab:

#S_ID   S_BEG   S_END   STRAND
ID000001        200     300     +
ID000001        600     700     -

To intersect foo.tab with bar.tab so that records from bar.tab are emitted if they overlap records in foo.tab, read both files with read_tab and tag the foo.tab records with add_ident before the bar.tab records enter the stream:

read_tab -i foo.tab | add_ident -k INTERSECT | read_tab -i bar.tab | intersect_records -k INTERSECT

STRAND: +
S_ID: ID000001
S_BEG: 200
S_END: 300
---
STRAND: -
S_ID: ID000001
S_BEG: 600
S_END: 700
---

To intersect in a strand-dependent manner, use the -s switch:

read_tab -i foo.tab | add_ident -k INTERSECT | read_tab -i bar.tab | intersect_records -k INTERSECT -s

STRAND: +
S_ID: ID000001
S_BEG: 200
S_END: 300
---

To invert the result so that only non-intersecting records are emitted, add the -i switch (here combined with -s as -si):

read_tab -i foo.tab | add_ident -k INTERSECT | read_tab -i bar.tab | intersect_records -k INTERSECT -si

STRAND: -
S_ID: ID000001
S_BEG: 600
S_END: 700
---

See also

read_tab

add_ident

Author

Martin Asser Hansen - Copyright (C) - All rights reserved.

mail@maasha.dk

December 2009

License

GNU General Public License version 2

http://www.gnu.org/copyleft/gpl.html

Help

intersect_records is part of the Biopieces framework.

http://www.biopieces.org
