Thank you for doing this benchmark #1

Open
wraetz opened this issue Jul 28, 2021 · 2 comments
Comments

@wraetz

wraetz commented Jul 28, 2021

It would be even more interesting if you took the kind of accession (flat table vs. cSRA, aka aligned) into consideration.
fasterq-dump was written to speed up the FASTQ conversion of cSRA files, so that is where its speed-up is highest.
Flat tables are processed by fasterq-dump too, but the speed-up compared to fastq-dump is not as big.

You can check what kind it is by running 'vdb-dump SRRXXXXXX --info'.
If it has only one table (the SEQUENCE table), it is a flat table, aka unaligned data.
If it has a SEQUENCE, a REFERENCE, and at least a PRIMARY_ALIGNMENT table, it is cSRA, aka aligned data.
(cSRA stands for compressed against the reference.)
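
The rule above could be sketched in Python roughly as follows. This is only an illustration: the function name `classify_accession` is hypothetical, and it assumes you have already collected the table names for a run yourself. How to extract them from the vdb-dump output is deliberately left out, since that output format is not shown in this thread.

```python
# Table names as given in the comment above.
FLAT_TABLES = {"SEQUENCE"}
CSRA_REQUIRED = {"SEQUENCE", "REFERENCE", "PRIMARY_ALIGNMENT"}


def classify_accession(table_names):
    """Classify a run as 'flat', 'cSRA', or 'unknown' from its table names.

    Hypothetical helper: `table_names` is any iterable of strings that the
    caller has obtained for one accession.
    """
    tables = {name.strip().upper() for name in table_names}
    # cSRA: SEQUENCE, REFERENCE and at least PRIMARY_ALIGNMENT are present;
    # additional tables are allowed.
    if CSRA_REQUIRED <= tables:
        return "cSRA"
    # Flat table: the SEQUENCE table is the only one.
    if tables == FLAT_TABLES:
        return "flat"
    # Anything else is not covered by the rule described above.
    return "unknown"
```

For example, `classify_accession(["SEQUENCE"])` gives `"flat"`, while a run that also has REFERENCE and PRIMARY_ALIGNMENT tables gives `"cSRA"`.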

My guess is that the outcome will be:

  • for flat tables, use parallel fastq-dump
  • for cSRA, use prefetch + fasterq-dump

However, it would be interesting if you could prove that.

@Midnighter
Member

Cheers, thank you for the input. With parallel-fastq-dump I was specifically talking about this Python wrapper. Please also see the ongoing discussion in the corresponding issue.

Would you happen to have some relevant run accessions for me to test this? I'm not sure right now how I would find them.

@Midnighter
Member

@wraetz I still don't know how to find sequencing information stored either in a flat format or the cSRA format.

But I did run another comparison that only compares extraction after prefetch and compression.
