dylanbstorey/HTseqQA

HTseqQA

Quality control for data generated by Illumina Sequencing.

Why?

We needed a scalable local solution for generating quality control reports in the 100k Genomes Project. We started with these measurements and visualizations and will likely add more later.

Installation/Dependencies:

Building the program requires a C++11-compatible compiler. To visualize the graphs you'll need R and ggplot2.

Usage:

./HTseqQA <options> -i <fastq or gzip'd fastq>

Other options

-o <int> , manually set the offset

-r , Only print out the Rscript for figure generation

-g , Create a greyscale version of graphs
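The README does not say which offset `-o` sets. Assuming it is the ASCII offset of the FASTQ quality encoding (33 or 64), the sketch below shows one common heuristic for guessing that offset from the quality strings. The function name `guess_phred_offset` is hypothetical and this is not taken from HTseqQA's source.

```python
def guess_phred_offset(quality_strings, default=33):
    """Guess the FASTQ quality offset (33 or 64) from quality strings.

    Heuristic only: characters below ';' (ASCII 59) cannot occur in the
    older +64 encodings, so seeing one implies an offset of 33.
    """
    codes = [ord(ch) for q in quality_strings for ch in q]
    if not codes:
        # Nothing to inspect, so fall back to the caller's default.
        return default
    return 33 if min(codes) < 59 else 64
```

A file containing only very high-quality +33 reads would be misclassified as +64 by this heuristic, which is one reason a manual override is useful.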

If you need to process many files, I suggest using GNU parallel:

parallel HTseqQA -i {} ::: /path/to/all/*.fastqs

On a standard computer we're able to process 4000+ files for bacterial genomes overnight.

Read Counts

A simple text file that reports the number of reads seen.
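As an independent cross-check of the reported count, a minimal sketch like the following counts reads in a plain or gzip'd FASTQ file. It assumes strict four-line records (no wrapped sequence lines). The helper names `open_fastq` and `count_reads` are hypothetical and are not part of HTseqQA.

```python
import gzip
from pathlib import Path


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "rt")


def count_reads(path):
    """Count reads, assuming each record spans exactly four lines."""
    with open_fastq(path) as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4
```

If the two counts disagree, the file may be truncated or may use multi-line records, which this sketch does not handle.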

Cumulative Quality Scores

[Figure: cumulative quality scores graph]

Nucleotide Proportions

[Figure: nucleotide proportions]

Quality Distributions by Position

[Figure: quality distributions by position]

Passing Reads Filter

[Figure: passing reads filter]

Sequence Uniqueness

[Figure: sequence uniqueness]

Read GC Content Distribution

[Figure: GC content distribution]
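To make the quantity behind this plot concrete, here is a minimal sketch of a per-read GC content distribution. It assumes GC content is the fraction of G and C among called bases (A, C, G, T) and that reads are grouped into fixed-width percentage bins. The function names and the 5% bin width are illustrative choices, not HTseqQA's actual method.

```python
from collections import Counter


def gc_fraction(seq):
    """Return the G+C fraction of called bases, or None if none are called."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return None
    return (seq.count("G") + seq.count("C")) / called


def gc_histogram(seqs, bin_width=5):
    """Count reads per GC-percent bin, keyed by the bin's lower edge."""
    hist = Counter()
    for seq in seqs:
        frac = gc_fraction(seq)
        if frac is None:
            continue  # skip reads made up entirely of uncalled bases
        pct = frac * 100
        # Fold 100% into the last bin so every bin has the same width.
        bin_start = min(int(pct // bin_width) * bin_width, 100 - bin_width)
        hist[bin_start] += 1
    return dict(sorted(hist.items()))
```

For example, `gc_histogram(["ATGC", "GGCC", "AAAA"])` places one read each in the 0, 50, and 95 bins.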
