CosteaPaul/qaTools

A couple of useful qa tools for sequencing data.

I. Setup:

   Set the $SAMTOOLS path in the Makefile to your samtools installation path.
   If you don't have samtools, you can find it here:
      http://samtools.sourceforge.net/
   or
      git clone git://github.com/samtools/samtools.git
      cd samtools
      make

   Set $HTSLIB in the Makefile to the installation path of htslib.
   If you don't have it, you can get it like this:
      git clone https://github.com/samtools/htslib.git 
      cd htslib
      make

   Just run make and you should be done!

II. Tools:

1. qaCompute
   Computes normal and span coverage from a bam/sam file.
   Also counts unmapped and sub-par quality reads.
   Parameters:
   m        -   Compute median coverage for each contig/chromosome.
                This makes the run a bit slower. Off by default.
   
   q [INT]  -   Quality threshold. Any read with a mapping quality under
                INT will be ignored when computing the coverage.

                NOTE: bwa outputs mapping quality 0 for reads that map with
                equal quality in multiple places. If you want to consider
                these reads, set q to 0.
   
   d        -   Print coverage histogram over each individual contig/chromosome.
                These details will be printed to the file <output>.detail
  
   p [INT]  -   Print coverage profile to bed file, averaged over given window size.  
   
   i        -   Silent run. Will not print running info to stdout.

   s [INT]  -   Compute span coverage. (Use for mate pair libs)
                Instead of actual read coverage, this option treats the
                entire span of the insert as a read, if the insert size is
                lower than INT.
                For an accurate estimation of span coverage, I recommend
                setting the insert size limit INT to around 3*std_dev of your
                lib's insert size distribution.

   c [INT]  -   Maximum X coverage to consider in histogram.

   h [STR]  -   Use a different header.
                Because mappers sometimes break the headers or simply don't output them,
                this is provided as a non-kosher way around it. Use with care!

   For more info on the parameters try ./qaCompute
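
To make the difference between read coverage and span coverage concrete, here is a small Python sketch. It is not qaCompute's implementation. The tuple layout (start, end, mapq, tlen), the function names, and the choice to skip pairs whose insert is at or above the limit are all assumptions made for illustration.

```python
from statistics import median

def depth(intervals, contig_len):
    """Per-base depth from half-open [start, end) intervals (difference array)."""
    diff = [0] * (contig_len + 1)
    for start, end in intervals:
        start, end = max(start, 0), min(end, contig_len)
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    out, running = [], 0
    for d in diff[:contig_len]:
        running += d
        out.append(running)
    return out

def read_intervals(reads, min_mapq):
    """Normal coverage: every read at or above the quality threshold counts."""
    return [(s, e) for s, e, mapq, _tlen in reads if mapq >= min_mapq]

def span_intervals(reads, min_mapq, max_insert):
    """Span coverage: the whole insert counts once per pair (leftmost mate only),
    and only if the insert is shorter than max_insert. Skipping larger inserts
    is an assumption of this sketch."""
    return [(s, s + tlen) for s, _e, mapq, tlen in reads
            if mapq >= min_mapq and 0 < tlen < max_insert]

# Hypothetical reads on a 12 bp contig: (start, end, mapq, signed insert size)
reads = [
    (0, 4, 60, 10),    # leftmost mate, insert covers [0, 10)
    (6, 10, 60, -10),  # its mate
    (2, 6, 0, 0),      # read with mapping quality 0
    (1, 5, 30, 50),    # leftmost mate of a pair with a large insert
]
contig_len = 12

read_cov = depth(read_intervals(reads, min_mapq=1), contig_len)
span_cov = depth(span_intervals(reads, min_mapq=1, max_insert=20), contig_len)
all_cov = depth(read_intervals(reads, min_mapq=0), contig_len)

print("read coverage:", read_cov, "median:", median(read_cov))
print("span coverage:", span_cov)
print("with q = 0:   ", all_cov)
```

With a threshold of 1 the mapping quality 0 read is ignored, while a threshold of 0 counts it, which mirrors the NOTE on the q parameter.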

2. removeUnmapped
   Remove unmapped and sub-par quality reads from a bam/sam file.
   For more info on the parameters try ./removeUnmapped
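
As a rough illustration of what this kind of filtering involves, the Python sketch below drops records from SAM text lines. It relies only on the SAM format itself: FLAG is column 2, bit 0x4 marks an unmapped segment, and MAPQ is column 5. The function names, the example threshold, and the sample records are hypothetical, and this is not the tool's actual code.

```python
UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped

def keep_record(sam_line, min_mapq):
    """True if the record is mapped and its MAPQ is at least min_mapq."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    mapq = int(fields[4])
    return not (flag & UNMAPPED) and mapq >= min_mapq

def filter_sam(lines, min_mapq=1):
    """Yield header lines unchanged and only the records that pass the filter."""
    for line in lines:
        if line.startswith("@") or keep_record(line, min_mapq):
            yield line

def record(name, flag, mapq):
    # Minimal made-up 11-column SAM record for the example
    return "\t".join([name, str(flag), "chr1", "1", str(mapq),
                      "4M", "*", "0", "0", "ACGT", "IIII"])

sam = [
    "@SQ\tSN:chr1\tLN:100",
    record("good", 0, 60),
    record("unmapped", 4, 0),
    record("lowq", 0, 5),
]

kept = list(filter_sam(sam, min_mapq=10))
kept_names = [l.split("\t")[0] for l in kept if not l.startswith("@")]
print(kept_names)
```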

3. computeInsertSizeHistogram
   Compute the insert size distribution from a bam/sam file.
   For more info on the parameters try ./computeInsertSizeHistogram 
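
For readers who want to see the idea in code, the following Python sketch builds an insert size histogram from SAM text lines. It assumes the insert size is taken from the TLEN column (column 9) and that each pair is counted once, via the mate with positive TLEN. Whether computeInsertSizeHistogram works this way is not stated here, so treat these as assumptions. The sample records are made up.

```python
from collections import Counter
from statistics import mean, pstdev

def insert_size_histogram(sam_lines):
    """Count insert sizes, one entry per pair (mate with TLEN > 0)."""
    hist = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        flag, tlen = int(fields[1]), int(fields[8])
        if flag & 0x4:
            continue  # skip unmapped reads
        if tlen > 0:
            hist[tlen] += 1
    return hist

def record(name, pos, tlen):
    # Minimal made-up 11-column SAM record for the example
    return "\t".join([name, "0", "chr1", str(pos), "60",
                      "4M", "=", "1", str(tlen), "ACGT", "IIII"])

sam = [
    "@SQ\tSN:chr1\tLN:1000",
    record("p1", 1, 300), record("p1", 297, -300),
    record("p2", 10, 310), record("p2", 316, -310),
    record("p3", 20, 300), record("p3", 316, -300),
]

hist = insert_size_histogram(sam)
sizes = list(hist.elements())
for size in sorted(hist):
    print(size, hist[size])
print("mean:", mean(sizes), "std_dev:", pstdev(sizes))
```

The standard deviation printed at the end is the std_dev that the s parameter of qaCompute refers to when choosing an insert size limit.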
