Thank you for doing this benchmark #1

wraetz · 2021-07-28T13:31:34Z

It would be even more interesting if you took the kind of accession (flat table vs. cSRA, aka aligned) into consideration.
fasterq-dump was written to speed up the FASTQ conversion of cSRA files; that is where the speed-up is highest.
Flat tables are processed by fasterq-dump too, but the speed-up compared to fastq-dump is not that big.

You can check what kind it is by running 'vdb-dump SRRXXXXXX --info'.
If it has only 1 table (the SEQUENCE table), it is a flat table, aka unaligned data.
If it has a SEQUENCE, a REFERENCE and at least a PRIMARY_ALIGNMENT table, it is a cSRA, aka aligned data.
(cSRA stands for "compressed against the reference".)
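
As a rough illustration of the rule above, here is a minimal Python sketch. It assumes you have already collected the table names for an accession yourself (for example by reading the 'vdb-dump SRRXXXXXX --info' output by hand); it does not parse that output. The function name `classify_accession` and the labels "flat", "csra" and "unknown" are made up for this example.

```python
def classify_accession(table_names):
    """Classify an accession from the names of its tables.

    Hypothetical helper: the caller supplies the table names; nothing
    here runs or parses vdb-dump.
    """
    tables = {name.strip().upper() for name in table_names}

    # Only the SEQUENCE table: flat table, i.e. unaligned data.
    if tables == {"SEQUENCE"}:
        return "flat"

    # SEQUENCE, REFERENCE and at least PRIMARY_ALIGNMENT: cSRA, i.e. aligned data.
    if {"SEQUENCE", "REFERENCE", "PRIMARY_ALIGNMENT"} <= tables:
        return "csra"

    # Anything else is not covered by the rule described above.
    return "unknown"
```

Calling `classify_accession(["SEQUENCE"])` gives "flat", while a list that also contains REFERENCE and PRIMARY_ALIGNMENT gives "csra".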

My guess is that the outcome will be:

for flat tables, use parallel fastq-dump
for cSRA, use prefetch + fasterq-dump

However, it would be interesting if you could prove that.
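
To make that guess concrete for a benchmark script, a sketch like the following could pick the commands per accession kind. This is only the untested guess from above written down, not a measured result. It assumes that "parallel fastq-dump" means the parallel-fastq-dump Python wrapper with its `--sra-id` and `--threads` options, and that fasterq-dump accepts `--threads`; please check the flags against your installed versions. The function name `suggested_commands` and the "flat"/"csra" labels are made up for this example.

```python
def suggested_commands(kind, accession, threads=4):
    """Return the argv lists to run for one accession.

    Hypothetical helper encoding an unproven guess:
    flat table -> parallel-fastq-dump, cSRA -> prefetch + fasterq-dump.
    """
    if kind == "flat":
        # Assumption: the parallel-fastq-dump Python wrapper is installed.
        return [
            ["parallel-fastq-dump", "--sra-id", accession, "--threads", str(threads)],
        ]
    if kind == "csra":
        # Download first, then convert the local copy.
        return [
            ["prefetch", accession],
            ["fasterq-dump", accession, "--threads", str(threads)],
        ]
    raise ValueError(f"unknown accession kind: {kind!r}")
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` and timed separately, so the two strategies can be compared on both kinds of accession.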

Midnighter · 2021-07-28T14:04:44Z

Cheers, thank you for the input. With parallel-fastq-dump I was specifically talking about this Python wrapper. Please also see the ongoing discussion in the corresponding issue.

Would you happen to have some relevant run accessions for me to test this? I'm not sure right now how I would find them.

Midnighter · 2021-07-30T23:57:24Z

@wraetz I still don't know how to find sequencing information stored either in a flat format or the cSRA format.

But I did run another comparison that only compares extraction after prefetch and compression.

Midnighter mentioned this issue Jul 31, 2021

Benchmark comparison rvalieris/parallel-fastq-dump#41

Open
