CosteaPaul/qaTools
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
master
Could not load branches
Nothing to show
Could not load tags
Nothing to show
{{ refName }}
default
Name already in use
A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?
Code
-
Clone
Use Git or checkout with SVN using the web URL.
Work fast with our official CLI. Learn more.
- Open with GitHub Desktop
- Download ZIP
Sign In Required
Please sign in to use Codespaces.
Launching GitHub Desktop
If nothing happens, download GitHub Desktop and try again.
Launching GitHub Desktop
If nothing happens, download GitHub Desktop and try again.
Launching Xcode
If nothing happens, download Xcode and try again.
Launching Visual Studio Code
Your codespace will open once ready.
There was a problem preparing your codespace, please try again.
Latest commit
Git stats
Files
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
A couple of useful qa tools for sequencing data. I. Setup: Change Makefile $SAMTOOLS path to your samtools instalation path. If you don't have samtools, you can find it here: http://samtools.sourceforge.net/ or git clone git://github.com/samtools/samtools.git cd samtools make Change $HTSLIB to installation path of htslib. If you don't have it, you can get it like this: git clone https://github.com/samtools/htslib.git cd htslib make Just run make and you should be done! II. Tools: 1. qaCompute Computes normal and span coverage from a bam/sam file. Also counts unmapped and sub-par quality reads. Parameters: m - Compute median coverage for each contig/chromosome. Will make running a bit slower. Off by default. q [INT] - Quality threshold. Any read with a mapping quality under INT will be ignored when computing the coverage. NOTE: bwa outputs mapping quality 0 for reads that map with equal quality in multiple places. If you want to condier this, set q to 0. d - Print coverage histrogram over each individual contig/chromosome. These details will be printed in file <output>.detail p [INT] - Print coverage profile to bed file, averaged over given window size. i - Silent run. Will not print running info to stdout. s [INT] - Compute span coverage. (Use for mate pair libs) Instead of actual read coverage, using the options will consider the entire span of the insert as a read, if insert size is lower than INT. For an accurate estimation of span coverage, I recommend setting an insert size limit INT around 3*std_dev of your lib's insert size distribution. c [INT] - Maximum X coverage to consider in histogram. h [STR] - Use different header. Because mappers sometimes break the headers or simply don't output them, this is provieded as a non-kosher way around it. Use with care! For more info on the parameteres try ./qaCompute 2. removeUnmapped Remove unmapped and sub-par quality reads from a bam/sam file. For more info on the parameters try ./removeUnmapped 3. computeInsertSizeHistogram Compute the insert size distribution from a bam/sam file. For more info on the parameters try ./computeInsertSizeHistogram