Subsample FASTQ by sampling connected components of a de Bruijn graph

blahah/graphsample

graphsample

Subsample FASTQ by sampling connected components of a de Bruijn graph

Naively subsampling reads from a FASTQ file can leave some regions poorly covered, so the subsample's informational properties are not representative of the full read set.

graphsample addresses this problem by building a de Bruijn graph from the reads, identifying all of its connected components, and randomly sampling those components. It outputs every read that belongs to a sampled component, so each sampled region keeps its full coverage.
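The idea can be sketched in a few lines of Python. This is an illustrative toy, not graphsample's actual implementation: the function name `sample_reads_by_component`, the `k`, `fraction` and `seed` parameters, and the use of a union-find structure are all assumptions made for the example, and it ignores reverse complements and FASTQ parsing entirely.

```python
import random


def kmers(seq, k):
    """Return all k-mers of seq in order (empty if seq is shorter than k)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


class UnionFind:
    """Disjoint-set structure used to track connected components of k-mers."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def sample_reads_by_component(reads, k, fraction, seed=None):
    """Keep all reads from a random subset of graph components.

    Hypothetical helper: k-mers are graph nodes, and consecutive k-mers
    within a read are joined by an edge, so reads that share a k-mer
    end up in the same component.
    """
    uf = UnionFind()
    for seq in reads:
        ks = kmers(seq, k)
        for a, b in zip(ks, ks[1:]):
            uf.union(a, b)

    # Group read indices by the component of their first k-mer.
    components = {}
    for idx, seq in enumerate(reads):
        if len(seq) < k:
            continue  # too short to contribute a k-mer
        components.setdefault(uf.find(seq[:k]), []).append(idx)

    roots = sorted(components)
    if not roots:
        return []

    # Sample whole components, never individual reads.
    rng = random.Random(seed)
    n_pick = max(1, round(fraction * len(roots)))
    chosen = rng.sample(roots, n_pick)
    kept = sorted(i for root in chosen for i in components[root])
    return [reads[i] for i in kept]
```

Because whole components are kept or dropped together, a read is never separated from the reads it overlaps with, which is what preserves local coverage in the subsample.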

What's the point? Glad you asked! graphsample allows you to take a small subsample from a large set of reads, and use the subsample to optimise the parameters of any tools and algorithms you want to run on the full set.

Compiling

$ git clone --recursive https://github.com/Blahah/graphsample.git
$ cd graphsample
$ make

Running

$ bin/graphsample --help
