Filter BAM file examples

Buys de Barbanson edited this page Jan 11, 2021 · 6 revisions

A subset of reads can be extracted with bamFilter.py. The filter is a Python expression in which r refers to the current read, a pysam.AlignedSegment; see https://pysam.readthedocs.io/en/latest/api.html#pysam.AlignedSegment for all available attributes. Reads for which the expression evaluates to true are written to the output file.
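To make the role of r concrete, here is a minimal sketch of how such an expression can be evaluated per read. It is an illustration under stated assumptions, not the actual bamFilter.py code: FakeRead is a hypothetical stand-in that mimics only has_tag, get_tag and is_duplicate of pysam.AlignedSegment, and filter_reads is a hypothetical helper name.

```python
# Hypothetical stand-in for pysam.AlignedSegment so the sketch runs without pysam.
# Only has_tag, get_tag and is_duplicate are mimicked.
class FakeRead:
    def __init__(self, tags, is_duplicate=False):
        self.tags = dict(tags)
        self.is_duplicate = is_duplicate

    def has_tag(self, tag):
        return tag in self.tags

    def get_tag(self, tag):
        # Like pysam, raise KeyError when the tag is absent.
        return self.tags[tag]


def filter_reads(reads, expression):
    """Yield the reads for which the expression is truthy, with r bound to the read."""
    code = compile(expression, "<filter>", "eval")
    for r in reads:
        if eval(code, {}, {"r": r}):
            yield r


reads = [
    FakeRead({"SM": "cell_1", "dt": "RNA"}),
    FakeRead({"SM": "cell_2", "dt": "DNA", "DS": 1200}),
    FakeRead({"SM": "cell_2", "dt": "DNA", "DS": 1200}, is_duplicate=True),
]

# Same expression as the "unique reads with an assigned cut site" example.
kept = list(filter_reads(reads, "r.has_tag('DS') and not r.is_duplicate"))
print(len(kept))
```

Because get_tag raises KeyError for a missing tag, guarding it with has_tag first, as the examples on this page do, keeps the expression from failing on reads that lack the tag.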

The following line extracts the reads of the cells with sample names (SM tag) my_experiment_cell_315 and my_experiment_cell_314, and writes them to cell_315_and_314.bam.

bamFilter.py input.bam "r.get_tag('SM') in ['my_experiment_cell_315','my_experiment_cell_314']" -o cell_315_and_314.bam 
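For longer lists of cells, typing the expression by hand becomes error-prone. A small sketch of generating it from a list of sample names follows; sample_filter is a hypothetical helper, not part of bamFilter.py, and the added has_tag guard is an assumption for files in which some reads lack an SM tag.

```python
def sample_filter(sample_names):
    """Return a filter expression selecting reads whose SM tag is in sample_names."""
    # repr() of a list of strings is valid Python source, so it can be embedded as-is.
    # It uses single quotes, so the result can be wrapped in double quotes on the shell.
    return "r.has_tag('SM') and r.get_tag('SM') in " + repr(sorted(sample_names))


expression = sample_filter(["my_experiment_cell_315", "my_experiment_cell_314"])
print(expression)
```

The printed string can be passed as the second argument to bamFilter.py in place of the hand-written expression.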

Extract all transcriptome reads from a mixed type file:

bamFilter.py input.bam "r.has_tag('dt') and r.get_tag('dt')=='RNA'" -o transcriptome.bam

Extract all DNA reads from a mixed type file:

bamFilter.py input.bam "r.has_tag('dt') and r.get_tag('dt')=='DNA'" -o genome.bam

Extract all reads with an assigned cut site:

bamFilter.py input.bam "r.has_tag('DS')" -o with_cut_site.bam 

Extract all unique (not flagged as duplicate) reads with an assigned cut site:

bamFilter.py input.bam "r.has_tag('DS') and not r.is_duplicate" -o with_cut_site_unique.bam 

Extract all unmethylated RNA reads (no MC tag, or an MC tag equal to 0) from a mixed type file:

bamFilter.py input.bam "r.has_tag('dt') and r.get_tag('dt')=='RNA' and (not r.has_tag('MC') or r.get_tag('MC')==0)" -o transcriptome_without_methylation.bam
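In Python, and binds more tightly than or, so a filter that mixes the two benefits from explicit parentheses around the or part. The sketch below compares a grouped and an ungrouped variant of the methylation filter on stand-in reads; FakeRead is a hypothetical class mimicking only has_tag and get_tag of pysam.AlignedSegment, and the tag values are made up for illustration.

```python
# Hypothetical stand-in for pysam.AlignedSegment (has_tag and get_tag only).
class FakeRead:
    def __init__(self, tags):
        self.tags = dict(tags)

    def has_tag(self, tag):
        return tag in self.tags

    def get_tag(self, tag):
        # Like pysam, raise KeyError when the tag is absent.
        return self.tags[tag]


# Without parentheses this parses as (A and B and not C) or D.
ungrouped = "r.has_tag('dt') and r.get_tag('dt')=='RNA' and not r.has_tag('MC') or r.get_tag('MC')==0"
# With parentheses it parses as A and B and (not C or D).
grouped = "r.has_tag('dt') and r.get_tag('dt')=='RNA' and (not r.has_tag('MC') or r.get_tag('MC')==0)"

dna_with_mc0 = FakeRead({"dt": "DNA", "MC": 0})
print(eval(ungrouped, {}, {"r": dna_with_mc0}))  # True: the DNA read slips through
print(eval(grouped, {}, {"r": dna_with_mc0}))    # False: the DNA read is rejected

dna_without_mc = FakeRead({"dt": "DNA"})
try:
    eval(ungrouped, {}, {"r": dna_without_mc})
    ungrouped_failed = False
except KeyError:
    # The trailing get_tag('MC') is reached even though the tag is missing.
    ungrouped_failed = True
print(ungrouped_failed)
print(eval(grouped, {}, {"r": dna_without_mc}))
```

The grouped form only reaches the MC check for RNA reads, and only calls get_tag('MC') when the tag is present.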