Description
I was wondering how fast this package is, so I made a small comparison with Rust's fastq library. On v1.6, Rust is about 3-4× faster for plain fastq files, while for fastq.gz it's only about 1.6× faster (I guess decompression becomes more of a bottleneck). I think I've used every trick to make the Julia code as fast as possible; have I missed something?
While I think the performance is pretty good (1.6× doesn't matter that much), improvements are always welcome. Julia's version still allocates a lot of memory (1.07 M allocations: 225.188 MiB, 0.11% gc time, for a 100 MB fastq.gz file). As far as I can tell, the Rust library isn't really validating or converting anything: it just loads the data into memory and goes through it. Having nicely converted records is nice, but maybe a more "bare-bones" API could be provided for when performance is really crucial (see the sketch at the end of this issue). What do you think?
My Rust code (it ends up being simpler than the Julia code):
extern crate fastq;
use fastq::{parse_path, Record};

fn main() {
    let filename = String::from("test.fastq.gz");
    let path = Some(filename);
    let mut total: u64 = 0;
    let mut tot_qual: u64 = 0;
    parse_path(path, |parser| {
        parser.each(|record| {
            let s = record.seq();
            let q = record.qual();
            for (is, iq) in s.iter().zip(q.iter()) {
                if *is == b'A' {
                    total += 1;
                    tot_qual += (*iq - 0x21) as u64; // undo the Phred+33 offset
                }
            }
            true // keep parsing
        }).expect("Invalid fastq file");
    }).expect("Invalid compression");
    println!("{:.15}", tot_qual as f64 / total as f64);
}

And the Julia code:
using FASTX, CodecZlib, BioSymbols, BioSequences

function count_reads(fasta)
    # reader = FASTQ.Reader(open(fasta))  # variant for uncompressed fastq
    reader = FASTQ.Reader(GzipDecompressorStream(open(fasta)))
    record = FASTQ.Record()
    N, tot_qual = 0, 0
    seq = LongDNASeq(200)  # reusable buffer for the decoded sequence
    while !eof(reader)
        read!(reader, record)
        q = FASTQ.quality(record)
        copyto!(seq, record)
        @inbounds for i in 1:length(q)
            if seq[i] == DNA_A
                N += 1
                tot_qual += q[i]
            end
        end
    end
    close(reader)
    tot_qual / N
end

fasta = "test.fastq.gz"
@time count_reads(fasta)
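
To illustrate what I mean by "bare-bones", here is a minimal sketch that skips the BioSequences conversion entirely and works on raw bytes, like the Rust version does. It assumes the record keeps its bytes in record.data with record.sequence / record.quality index ranges; those are FASTX internals rather than public API, so this is only meant to show the idea, not supported usage:

using FASTX, CodecZlib

function count_reads_raw(path)
    reader = FASTQ.Reader(GzipDecompressorStream(open(path)))
    record = FASTQ.Record()
    N, tot_qual = 0, 0
    while !eof(reader)
        read!(reader, record)
        data = record.data  # raw bytes of the whole record (internal field)
        @inbounds for (is, iq) in zip(record.sequence, record.quality)
            if data[is] == UInt8('A')      # compare bytes, no DNA conversion
                N += 1
                tot_qual += data[iq] - 0x21  # undo the Phred+33 offset
            end
        end
    end
    close(reader)
    tot_qual / N
end

Something along these lines, exposed through a supported API, is what I'd imagine for the performance-critical case.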