carrascomj/ripkmer

ripkmer

There are two ways of viewing this project:

  • Some k-mer algorithms using Rust-Bio [1].
  • My first project in Rust, written to get confident with the language.

Features

The initial goal is to reproduce KmerFinder [2] (implemented in Python, and also in JavaScript) in Rust.

  • K-mer count on FASTQ.
  • Filter by prefix.
  • Make it work for FASTA and BED files.
  • Compare k-mer distribution of two inputs.
  • Move towards a KMA implementation.
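The first two features can be illustrated with a minimal, standard-library-only sketch. It is not ripkmer's actual code (which builds on Rust-Bio); the function names `fastq_sequences` and `count_kmers` are hypothetical, and the parser assumes the common four-line-per-record FASTQ layout.

```rust
use std::collections::HashMap;

/// Return the sequence line of every record, assuming the common
/// four-line FASTQ layout (header, sequence, '+', qualities).
fn fastq_sequences(text: &str) -> Vec<&str> {
    text.lines().skip(1).step_by(4).collect()
}

/// Add to `counts` every k-mer of `seq` that starts with `prefix`.
/// Hypothetical helper for illustration, not ripkmer's API.
fn count_kmers(seq: &[u8], k: usize, prefix: &[u8], counts: &mut HashMap<Vec<u8>, usize>) {
    if k == 0 {
        return; // `windows(0)` would panic
    }
    // `windows(k)` yields every overlapping k-length slice of the read;
    // reads shorter than k simply yield nothing.
    for kmer in seq.windows(k) {
        if kmer.starts_with(prefix) {
            *counts.entry(kmer.to_vec()).or_insert(0) += 1;
        }
    }
}
```

With such a map, the number of keys is the count of unique k-mers and the sum of the values is the count with repetitions.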

CLI Example

For this example, the first two FASTQ files of SRR396636 were downloaded. They correspond to reads from Pseudomonas aeruginosa MPAO1/P1, with 1909263 sequences of ~100 bp each.

Having ripkmer installed and in the $PATH:

ripkmer SRR396636.sra_1.fastq SRR396636.sra_2.fastq

where k and the prefix are left at their defaults, which is equivalent to:

ripkmer SRR396636.sra_1.fastq SRR396636.sra_2.fastq 16 ATGAC

The output is written in tabular format to standard output, so it can be redirected to a file. The run should not take much more than 4 s.

(16-mers)	Unique	Redundant	Intersection_unique	Intersection
SRR396636.sra_1.fastq	23196	97871	34.19%	58.81%
SRR396636.sra_2.fastq	30698	89107	25.83%	64.59%

where

  • Unique is the number of distinct k-mers found in the file;
  • Redundant is the total number of k-mers found (counting repetitions);
  • Intersection_unique is the share of the file's unique k-mers that are also found in the other file, as a percentage of Unique;
  • and Intersection is the share of the file's total k-mers that are common to both files, as a percentage of Redundant.
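As a rough illustration of how these four columns relate, the sketch below derives them from two k-mer count maps. It is an assumption about the computation, not ripkmer's actual implementation: the names `Summary`, `percent` and `summarize` are hypothetical, and the total intersection is taken here to be the number of occurrences in one file of k-mers that also appear in the other.

```rust
use std::collections::HashMap;

/// One output row; field names mirror the table columns (hypothetical type).
struct Summary {
    unique: usize,
    redundant: usize,
    intersection_unique_pct: f64,
    intersection_pct: f64,
}

/// Percentage of `part` in `whole`, defined as 0 for an empty whole.
fn percent(part: usize, whole: usize) -> f64 {
    if whole == 0 {
        0.0
    } else {
        100.0 * part as f64 / whole as f64
    }
}

/// Summarize `own` against `other`; both map each k-mer to its count.
fn summarize(own: &HashMap<Vec<u8>, usize>, other: &HashMap<Vec<u8>, usize>) -> Summary {
    let unique = own.len();
    let redundant: usize = own.values().sum();
    let mut shared_unique: usize = 0;
    let mut shared_total: usize = 0;
    for (kmer, n) in own {
        if other.contains_key(kmer) {
            shared_unique += 1; // distinct k-mer present in both inputs
            shared_total += *n; // its occurrences in this input (assumed definition)
        }
    }
    Summary {
        unique,
        redundant,
        intersection_unique_pct: percent(shared_unique, unique),
        intersection_pct: percent(shared_total, redundant),
    }
}
```

Calling `summarize(&a, &b)` and then `summarize(&b, &a)` would give one row per input, which is why the percentages differ between the two files even though they share the same set of common k-mers.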

References

[1] Köster, J. (2016). Rust-Bio: a fast and safe bioinformatics library. Bioinformatics, 32(3), 444-446.
[2] Larsen, M. V., Cosentino, S., Lukjancenko, O., Saputra, D., Rasmussen, S., Hasman, H., Sicheritz-Pontén, T., Aarestrup, F. M., Ussery, D. W., & Lund, O. (2014). Benchmarking of methods for genomic taxonomy. J Clin Microbiol, 2014 Feb 26.

About

Playing around with kmers very fast!
