[sambamba view] Filter expression syntax

Artem Tarasov edited this page Jun 8, 2016 · 7 revisions

sambamba view supports custom filtering of alignment records. This wiki page describes the syntax of filter expressions, which the user supplies with the --filter (-F) command-line option. Fields and flags are described in the SAM specification.

Syntax overview

A filter expression consists of one or more basic conditions combined with the logical operators and, or, and not, with parentheses for grouping where needed.

A basic condition tests a single record field, tag, or flag.

The comparison operators ==, !=, >, <, >=, and <= work on both integers and strings. Strings can also be matched against a regular expression with =~, where the pattern is written between slashes, for example /^ERR/.

Strings are delimited by single quotes. To include a single quote inside a string, escape it with a backslash (\').
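
When the expression is typed in a shell, it has to reach sambamba as a single argument. A minimal sketch, assuming a POSIX shell and a hypothetical input file example.bam: wrapping the expression in double quotes keeps the inner single quotes and the \' escape intact.

```shell
# Hypothetical input file name; substitute your own BAM file.
BAM=example.bam

# Inside double quotes the shell leaves single quotes and \' untouched,
# so sambamba receives the expression exactly as written here.
FILTER="read_name == 'abc\'def'"

# Show the expression as it will be passed to --filter.
printf '%s\n' "$FILTER"

# Run the filter only if sambamba and the input file are available.
if command -v sambamba >/dev/null 2>&1 && [ -f "$BAM" ]; then
    sambamba view -F "$FILTER" "$BAM"
fi
```

Wrapping the whole expression in single quotes instead would end the shell string at the first inner quote, which is why double quotes are the more convenient choice here.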

Usage

Reduce a BAM file to one containing only the reads on chr2, the second reference sequence listed in the SAM header. Reference IDs are zero-based, so the second sequence has ref_id 1.

sambamba view -F "ref_id==1" -f bam HG01375.mapped.ILLUMINA.bwa.CLM.low_coverage.20120522.bam > HG01375.mapped.ILLUMINA.bwa.CLM.low_coverage.20120522_chr2.bam
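
The ref_id of a reference can be looked up from the header, since it is the zero-based position of that reference among the @SQ lines. The helper ref_id_of below is a hypothetical convenience function written for this page, not part of sambamba, and the header it runs on is made up for illustration:

```shell
# Hypothetical helper: read SAM header text on stdin and print the
# zero-based index of the @SQ line whose SN field equals $1.
ref_id_of() {
    awk -F '\t' -v name="$1" '
        BEGIN { n = 0 }
        $1 == "@SQ" {
            for (i = 2; i <= NF; i++)
                if ($i == "SN:" name) { print n; exit }
            n++
        }'
}

# Made-up header with two reference sequences.
HEADER=$(printf '@HD\tVN:1.6\n@SQ\tSN:chr1\tLN:1000\n@SQ\tSN:chr2\tLN:2000\n')

# Look up chr2 and print the matching basic condition.
REF_ID=$(printf '%s\n' "$HEADER" | ref_id_of chr2)
echo "ref_id == $REF_ID"
```

With a real file, the header text would come from sambamba view -H, piped into ref_id_of in the same way.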

Show all reads whose names start with ERR

sambamba view -F "read_name =~ /^ERR/" HG01375.mapped.ILLUMINA.bwa.CLM.low_coverage.20120522_chr1.bam

More examples of filter expressions

    mapping_quality >= 30 and ([RG] =~ /^abcd/ or [NM] == 7)
    read_name == 'abc\'def'

Basic conditions for flags

The following flag names are recognized:

  • paired
  • proper_pair
  • unmapped
  • mate_is_unmapped
  • reverse_strand
  • mate_is_reverse_strand
  • first_of_pair
  • second_of_pair
  • secondary_alignment
  • failed_quality_control
  • duplicate
  • supplementary
  • chimeric

Flag example

    not (unmapped or mate_is_unmapped) and first_of_pair
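
The flag names used here correspond to bits of the FLAG field defined in the SAM specification. The following sketch is plain shell arithmetic, not anything sambamba itself runs, and evaluates the flag example by hand for a few FLAG values. It assumes that not applies to the parenthesized group before and is evaluated; matches_flag_example is a hypothetical helper name.

```shell
# Bit values of the FLAG field, taken from the SAM specification.
UNMAPPED=4            # 0x4
MATE_IS_UNMAPPED=8    # 0x8
FIRST_OF_PAIR=64      # 0x40

# Hypothetical helper: succeeds if the FLAG value in $1 satisfies
#   not (unmapped or mate_is_unmapped) and first_of_pair
matches_flag_example() {
    flag=$1
    [ $(( $flag & ($UNMAPPED | $MATE_IS_UNMAPPED) )) -eq 0 ] &&
        [ $(( $flag & $FIRST_OF_PAIR )) -ne 0 ]
}

# 99 = 1 + 2 + 32 + 64: paired, proper pair, mate on reverse strand, first of pair.
matches_flag_example 99 && echo "FLAG 99 passes"

# 147 = 1 + 2 + 16 + 128: second of pair, so first_of_pair is false.
matches_flag_example 147 || echo "FLAG 147 is filtered out"

# 73 = 1 + 8 + 64: first of pair, but the mate is unmapped.
matches_flag_example 73 || echo "FLAG 73 is filtered out"
```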

Basic conditions for fields

Conditions for integer and string fields are supported.

List of integer fields:

  • ref_id
  • position
  • mapping_quality
  • sequence_length
  • mate_ref_id
  • mate_position
  • template_length

List of string fields:

  • read_name
  • sequence
  • cigar
  • strand ('+'/'-')
  • ref_name
  • mate_ref_name

Example

    ref_id == 3 and mapping_quality >= 50 and sequence_length >= 80
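
Numeric thresholds like these are often easier to maintain as shell variables. A sketch, assuming a POSIX shell; the variable names and the file name example.bam are placeholders, and the -c option of sambamba view is used to print only the number of matching records:

```shell
# Placeholder values; adjust to your data.
BAM=example.bam
REF_ID=3
MIN_MAPQ=50
MIN_LEN=80

# Variables expand inside double quotes, producing the same
# expression as the example above.
FILTER="ref_id == $REF_ID and mapping_quality >= $MIN_MAPQ and sequence_length >= $MIN_LEN"
printf '%s\n' "$FILTER"

# Count matching records; skipped if sambamba or the file is missing.
if command -v sambamba >/dev/null 2>&1 && [ -f "$BAM" ]; then
    sambamba view -c -F "$FILTER" "$BAM"
fi
```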

Basic conditions for tags

Tags are denoted by their names in square brackets, for instance [RG] or [Q2]. Conditions on both integer and string values are supported; the tag must hold a value of the corresponding type.

To filter on the presence of a particular tag, compare it with the special value null.

Example

    [RG] != null and [AM] == 37
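
Tag conditions combine naturally with a shell loop, for example to write one BAM file per read group. In this sketch the read group IDs lane1 and lane2, the file name example.bam, and the helper rg_filter are all hypothetical:

```shell
BAM=example.bam   # hypothetical input file

# Hypothetical helper: build a condition matching one read group ID.
rg_filter() {
    printf "[RG] == '%s'" "$1"
}

for rg in lane1 lane2; do
    filter=$(rg_filter "$rg")
    printf '%s\n' "$filter"
    # Skipped unless sambamba and the input file are available.
    if command -v sambamba >/dev/null 2>&1 && [ -f "$BAM" ]; then
        sambamba view -F "$filter" -f bam "$BAM" > "${BAM%.bam}_$rg.bam"
    fi
done
```

Keeping the expression in double quotes also stops the shell from treating the square brackets as a glob pattern.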