
HTseqQA

Quality control for data generated by Illumina Sequencing.

Why?

We needed a scalable local solution for generating quality control reports in the 100k Genomes Project. We started with these measurements and visualizations and will likely add more later.

Installation/Dependencies:

Building the program requires a C++11-compatible compiler. To render the graphs you'll need R and ggplot2.

Usage:

./HTseqQA <options> -i <fastq or gzip'd fastq>

Other options

-o <int> , Manually set the offset

-r , Only print out the Rscript for figure generation

-g , Create a greyscale version of the graphs

If you need to process many files, I suggest using GNU parallel:

parallel HTseqQA -i {} ::: /path/to/all/*.fastq

On a standard computer we're able to process 4000+ files for bacterial genomes overnight.
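
Where GNU parallel is not available, the same fan-out can be approximated with a small Python wrapper. This is only a sketch: the helper names `build_command` and `run_all`, the default glob pattern, and the worker count are illustrative assumptions, and it assumes `HTseqQA` is on your `PATH`. The only flag it uses is `-i`, as documented above.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_command(fastq, binary="HTseqQA"):
    """Return the argument list for one run on a single input file."""
    # Only the documented -i flag is used; add -o/-g yourself if needed.
    return [binary, "-i", str(fastq)]


def run_all(directory, pattern="*.fastq", workers=4, binary="HTseqQA"):
    """Run the tool on every matching file; map each path to its exit code."""
    files = sorted(Path(directory).glob(pattern))

    def run_one(path):
        return subprocess.run(build_command(path, binary)).returncode

    # Threads are enough here: each worker only waits on an external process.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(files, pool.map(run_one, files)))
```

Calling `run_all("/path/to/all")` returns a dictionary of exit codes, so any file with a non-zero code can be re-run on its own.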

Read Counts

Simple text file that tells you the number of reads seen.
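
To sanity-check a reported count independently, a read count can be reproduced with a few lines of Python. This is a minimal sketch rather than HTseqQA's own code: it assumes the common four-lines-per-record FASTQ layout and decides on gzip by file extension. The function names are hypothetical.

```python
import gzip


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file as text (by extension)."""
    path = str(path)
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def count_reads(path):
    """Count records, assuming exactly four lines per read."""
    with open_fastq(path) as handle:
        lines = sum(1 for _ in handle)
    if lines % 4 != 0:
        # Multi-line FASTQ or a truncated file would break the assumption.
        raise ValueError(f"{path}: {lines} lines is not a multiple of 4")
    return lines // 4
```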

Cumulative Quality Scores

[Figure: Cumulative quality scores graph]

Nucleotide Proportions

[Figure: Nucleotide proportions]

Quality Distributions by Position

[Figure: Quality distributions by position]

Passing Reads Filter

[Figure: Passing reads filter]

Sequence Uniqueness

[Figure: Sequence uniqueness]

Read GC Content Distribution

[Figure: GC content distribution]
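
For readers unfamiliar with the metric, per-read GC content is the fraction of G and C bases in each read, and the distribution is a histogram of those fractions. The sketch below illustrates the idea only. How HTseqQA itself bins reads or treats ambiguous bases is not stated here, so excluding `N` from the denominator and the default bin count are assumptions, and the function names are hypothetical.

```python
def gc_fraction(seq):
    """Fraction of called bases (A, C, G, T) in a read that are G or C."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        # Read made up entirely of ambiguous bases: report 0.0 by convention.
        return 0.0
    return (seq.count("G") + seq.count("C")) / called


def gc_histogram(seqs, bins=10):
    """Count reads per GC bin; the last bin also includes a fraction of 1.0."""
    hist = [0] * bins
    for seq in seqs:
        index = min(int(gc_fraction(seq) * bins), bins - 1)
        hist[index] += 1
    return hist
```

For example, `gc_histogram(["AAAA", "GGCC", "ATGC"], bins=2)` places the first read in the lower bin and the other two in the upper bin.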