IARCbioinfo/mpileup2readcounts

 
 

Synopsis

Get the read counts at each locus by piping samtools mpileup output into mpileup2readcounts. The mpileup can contain one or several samples. This program has been tested with samtools v1.3.1.

Install samtools.

Compile mpileup2readcounts:

g++ -std=c++11 -O3 mpileup2readcounts.cc -o mpileup2readcounts

Usage

samtools mpileup -f ref.fa -l regions.bed BAM/*.bam | sed 's/		/	*	*/g' | ./mpileup2readcounts 0 -5 false 3 0

Samtools arguments:

  • -f : reference FASTA file
  • -l : BED file listing the regions to pile up
  • BAM files : several samples can be parsed
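The sed step in the usage command replaces every pair of consecutive tabs with `*` placeholders. A plausible reading is that a sample with no reads at a position leaves its base and quality columns empty, and the placeholders keep the number of columns constant for the parser. The sketch below shows the same substitution in Python; the function name is hypothetical and not part of mpileup2readcounts.

```python
def fill_empty_pileup_fields(line: str) -> str:
    """Mimic sed 's/\\t\\t/\\t*\\t*/g': put '*' into empty tab-separated fields.

    Assumption: an empty bases/qualities pair appears as two consecutive tabs.
    """
    # str.replace scans left to right without overlapping matches, like sed's g flag.
    return line.replace("\t\t", "\t*\t*")


# Example: the first sample has depth 0 and empty bases/qualities columns.
example = "17\t100\tC\t0\t\t\t5\tAAAAA\tIIIII\n"
filled = fill_empty_pileup_fields(example)
```

After the substitution each sample again contributes three columns (depth, bases, qualities), so a downstream parser can split on tabs safely.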

Five positional arguments for mpileup2readcounts, in this order:

  • sample : 0 to parse all samples; otherwise the number of the sample to parse (for example 1 for the first sample)
  • BQcut : base quality score cutoff; only bases with a quality strictly larger than the cutoff are counted. To disable the filter, set BQcut to -5
  • ignore_indels : true to ignore indels, false to report them
  • min_ao : minimum number of non-ref reads in at least one sample to consider a site
  • min_af : minimum allelic fraction in at least one sample to consider a site
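To make the BQcut, min_ao and min_af descriptions concrete, here is a minimal sketch of how such filters could behave. It is not taken from mpileup2readcounts.cc. It assumes Phred+33 quality encoding, assumes both site thresholds must be met by the same sample, and leaves open how the tool counts non-ref reads (for example whether indels are included). All function names are hypothetical.

```python
from typing import List, Tuple


def passes_base_quality(quality_char: str, bq_cut: int) -> bool:
    """Keep a base only if its Phred score is strictly larger than bq_cut."""
    phred = ord(quality_char) - 33  # assumption: Phred+33 encoding
    return phred > bq_cut


def site_passes(samples: List[Tuple[int, int]], min_ao: int, min_af: float) -> bool:
    """samples holds one (depth, non_ref_reads) pair per sample.

    Assumption: a site is kept when at least one sample meets both thresholds.
    """
    for depth, non_ref in samples:
        fraction = non_ref / depth if depth > 0 else 0.0
        if non_ref >= min_ao and fraction >= min_af:
            return True
    return False
```

Under these assumptions a BQcut of -5 keeps every base, because Phred scores are never negative, and `min_ao = 0` with `min_af = 0` keeps every site.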

Example output

chr loc ref depth A T C G a t c g Insertion Deletion depth A T C G a t c g Insertion Deletion
17 7572814 C 28 0 3 23 0 0 0 2 0 NA NA 8 0 0 8 0 0 0 0 0 NA NA
17 7572817 C 32 2 0 26 0 2 0 2 0 NA NA 8 0 0 8 0 0 0 0 0 NA NA
17 7579643 C 48 0 0 9 0 0 0 39 0 NA 4:ccccagccctccaggt|2:CCCCAGCCCTCCAGGT 9 0 0 6 0 0 0 3 0 NA NA

Line content

Common information for all samples:

  • chromosome
  • position on the chromosome
  • reference base

For each sample :

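  • depth
  • A, T, C, G counts (uppercase, forward strand) and a, t, c, g counts (lowercase, reverse strand)
  • insertions : NA if none, otherwise count:sequence entries separated by |
  • deletions : same format as insertions. In the example, the first sample has 6 deletions (4 on the reverse strand, 2 on the forward strand) starting from position 7579643 + 1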
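The line layout described above can be read back with a few lines of Python. This is a sketch, not a tool shipped with the repository. It assumes whitespace-separated columns, 11 columns per sample as in the example header, and an indel field of the form `count:sequence|count:sequence` as inferred from the example output. The function names are hypothetical.

```python
from typing import List, Tuple

BASES = ["A", "T", "C", "G", "a", "t", "c", "g"]
FIELDS_PER_SAMPLE = 11  # depth + 8 base counts + Insertion + Deletion


def parse_indels(field: str) -> List[Tuple[int, str]]:
    """Turn '4:cca|2:CCA' into [(4, 'cca'), (2, 'CCA')]; 'NA' gives []."""
    if field == "NA":
        return []
    events = []
    for item in field.split("|"):
        count, sequence = item.split(":", 1)
        events.append((int(count), sequence))
    return events


def parse_line(line: str) -> dict:
    """Split one output line into the common fields and one record per sample."""
    fields = line.split()
    samples = []
    for start in range(3, len(fields), FIELDS_PER_SAMPLE):
        block = fields[start:start + FIELDS_PER_SAMPLE]
        samples.append({
            "depth": int(block[0]),
            "counts": dict(zip(BASES, map(int, block[1:9]))),
            "insertions": parse_indels(block[9]),
            "deletions": parse_indels(block[10]),
        })
    return {"chr": fields[0], "loc": int(fields[1]), "ref": fields[2],
            "samples": samples}


# Third line of the example output above.
row = parse_line(
    "17 7579643 C 48 0 0 9 0 0 0 39 0 NA "
    "4:ccccagccctccaggt|2:CCCCAGCCCTCCAGGT 9 0 0 6 0 0 0 3 0 NA NA"
)
deleted_reads = sum(count for count, _ in row["samples"][0]["deletions"])
```

For the example line, `deleted_reads` adds the counts of the two deletion entries of the first sample, which matches the 6 deletions mentioned in the list above.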

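Using GNU parallel

We first remove overlaps in the BED file using bedtools, then run one samtools job per merged region. Note that headers are removed here: grep -v '^track' drops the BED track line, and tail -n +2 drops the header line of each mpileup2readcounts output: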

grep -v '^track' regions.bed | sort -k1,1 -k2,2n | bedtools merge -i stdin | awk '{print $1"\t"$2"\t"$3}' | sed 's/[[:space:]]/:/' | sed 's/[[:space:]]/-/' | parallel --keep-order "samtools mpileup -f ref.fa --region {} test.bam | sed 's/		/	*	*/g' | mpileup2readcounts 0 -5 false 0 | tail -n +2"
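The region strings handed to parallel have the form `chr:start-end`. The sketch below reproduces the grep, sort, merge and sed steps in Python to show what those strings look like. It is an illustration only, with a hypothetical function name. It assumes the default bedtools merge behaviour of joining overlapping and book-ended intervals, and like the pipeline it copies the BED coordinates unchanged.

```python
from typing import Iterable, List


def bed_to_regions(bed_lines: Iterable[str]) -> List[str]:
    """Drop track lines, sort, merge overlapping intervals, format as chr:start-end."""
    intervals = []
    for line in bed_lines:
        if not line.strip() or line.startswith("track"):
            continue
        chrom, start, end = line.split()[:3]
        intervals.append((chrom, int(start), int(end)))
    intervals.sort()  # chromosome name as text, then numeric start

    merged: List[list] = []
    for chrom, start, end in intervals:
        same_chrom = bool(merged) and merged[-1][0] == chrom
        if same_chrom and start <= merged[-1][2]:
            # Overlapping or book-ended: extend the previous interval.
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])
    return [f"{chrom}:{start}-{end}" for chrom, start, end in merged]


regions = bed_to_regions([
    "track name=example",
    "17\t150\t300",
    "17\t100\t200",
    "17\t400\t500",
    "2\t10\t20",
])
```

Each returned string corresponds to one `{}` substitution in the parallel command, so the number of regions is also the number of samtools jobs.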
