fasta in https://cog-uk.s3.climb.ac.uk/phylogenetics/latest/ -- xz compression? #180

Closed
AngieHinrichs opened this issue Jan 7, 2022 · 4 comments

Comments

AngieHinrichs commented Jan 7, 2022

Recently my fetches of https://cog-uk.s3.climb.ac.uk/phylogenetics/latest/cog_all.fasta have been repeatedly failing. Even with multiple "curl -C -" commands to continue at the offset where the previous fetch failed, sometimes 5 attempts is not enough to get the whole file.
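For illustration, the resume-at-offset approach that `curl -C -` takes can be sketched in Python with an HTTP `Range` request. This is a minimal sketch, not the commands actually used above: the function names, the five-attempt default, and the `opener` parameter are assumptions, and it assumes the server honours `Range` by answering with status 206.

```python
import http.client
import os
import urllib.error
import urllib.request


def range_request(url, offset):
    """Build a GET request that asks the server to start at byte `offset`."""
    req = urllib.request.Request(url)
    if offset > 0:
        req.add_header("Range", f"bytes={offset}-")
    return req


def fetch_with_resume(url, dest, max_attempts=5, chunk_size=1 << 20,
                      opener=urllib.request.urlopen):
    """Download `url` to `dest`, resuming from the bytes already on disk.

    Returns True if a response was read to its end, False if every attempt
    failed. A clean end of stream is treated as completion; this sketch
    does not verify the final size or a checksum.
    """
    for attempt in range(1, max_attempts + 1):
        offset = os.path.getsize(dest) if os.path.exists(dest) else 0
        try:
            with opener(range_request(url, offset)) as resp:
                # 206: the server honoured Range, so append to the partial file.
                # Anything else (typically 200): it sent the whole file, so
                # start over from byte 0.
                mode = "ab" if resp.status == 206 else "wb"
                with open(dest, mode) as out:
                    while True:
                        chunk = resp.read(chunk_size)
                        if not chunk:
                            break
                        out.write(chunk)
            return True
        except urllib.error.HTTPError as exc:
            # 416: the requested offset is at or past the end of the file,
            # so assume the local copy is already complete.
            if exc.code == 416:
                return True
            print(f"attempt {attempt} failed: HTTP {exc.code}")
        except (OSError, http.client.HTTPException) as exc:
            # Dropped connections, timeouts, and truncated reads land here;
            # the next attempt resumes from whatever was written so far.
            print(f"attempt {attempt} failed: {exc!r}")
    return False
```

A call would look like `fetch_with_resume("https://cog-uk.s3.climb.ac.uk/phylogenetics/latest/cog_all.fasta", "cog_all.fasta")`. Each retry still has to move the remaining uncompressed bytes, which is why a smaller file helps more than extra attempts.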

The uncompressed cog_all.fasta is >54GB now (!), but when compressed with xz (which can run multi-threaded), it's much smaller, ~111MB.

Would it be possible to xz-compress cog_all.fasta (and possibly other download files as well)? I hope that would help the network transfers, and if you're using cloud storage, should save costs there as well.
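As a rough sketch of what producing and consuming an xz-compressed FASTA could look like, here is a version using Python's standard `lzma` module. The function names and the preset value are assumptions for illustration. Note that `lzma` in the standard library compresses on a single thread; the multi-threaded mode mentioned above belongs to the `xz` command-line tool (`xz -T0` uses all available cores).

```python
import lzma
import shutil


def xz_compress(src, dest, preset=6):
    """Stream `src` into an .xz file at `dest` without loading it into memory."""
    with open(src, "rb") as fin, lzma.open(dest, "wb", preset=preset) as fout:
        # Copy in 1 MiB blocks so a multi-gigabyte FASTA never sits in RAM.
        shutil.copyfileobj(fin, fout, length=1 << 20)


def xz_read_lines(path):
    """Yield text lines from an .xz file, decompressing on the fly."""
    with lzma.open(path, "rt") as f:
        for line in f:
            yield line
```

Downstream tools that read line by line can iterate over `xz_read_lines("cog_all.fasta.xz")` directly, so the uncompressed file never needs to be written to disk.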

rmcolq commented Jan 7, 2022 via email

@AngieHinrichs
Author

Oh, well, that's kind of embarrassing, I should have just tried cog_all.fasta.gz! 😆 https://cog-uk.s3.climb.ac.uk/phylogenetics/latest/ doesn't offer a listing but, um, yeah. Looks like cog_all.fasta.gz is 7GB -- better than 54GB, but still, could be 0.1GB with xz. :) Thanks!

@SamStudio8
Member

@AngieHinrichs, you might find https://data.covid19.climb.ac.uk/changelog a useful log to check in on occasionally as we try our best to put up notifications of any changes to the data set and our processes -- like when compressed outputs were added.

@AngieHinrichs
Author

Noted, thanks @SamStudio8. It would be great to get https://www.cogconsortium.uk/tools-analysis/public-data-analysis-2/ updated to point to the compressed versions too -- I will email the address on that page, contact@cogconsortium.uk.


3 participants