fasta in https://cog-uk.s3.climb.ac.uk/phylogenetics/latest/ -- xz compression? #180
Comments
I think compressed versions of all the files except the newick (which is small anyway) should be available at the same file paths with .gz on the end. However, I was told not to remove the uncompressed ones, to avoid breaking people's existing pipelines.
-------- Original message --------
From: Angie Hinrichs ***@***.***>
Date: 07/01/2022 22:50 (GMT+00:00)
To: COG-UK/dipi-group ***@***.***>
Cc: Subscribed ***@***.***>
Subject: [COG-UK/dipi-group] fasta in https://cog-uk.s3.climb.ac.uk/phylogenetics/latest/ -- xz compression? (Issue #180)
Recently my fetches of https://cog-uk.s3.climb.ac.uk/phylogenetics/latest/cog_all.fasta have been repeatedly failing. Even with multiple "curl -C -" commands to continue at the offset where the previous fetch failed, sometimes 5 attempts are not enough to get the whole file.
The uncompressed cog_all.fasta is >54GB now (!), but when compressed with xz (which can run multi-threaded), it's much smaller, ~111MB.
Would it be possible to xz-compress cog_all.fasta (and possibly other download files as well)? I hope that would help the network transfers, and if you're using cloud storage, it should save costs there as well.
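For anyone scripting the retries rather than re-running "curl -C -" by hand, here is a minimal Python sketch of the same resume-at-offset idea. It is an illustration only, not what anyone in this thread uses: the function name `resume_download`, its parameters, and the retry and backoff policy are all hypothetical, and it assumes the server honours HTTP `Range` requests and reports a `Content-Length`.

```python
import http.client
import os
import time
import urllib.error
import urllib.request


def resume_download(url, dest, max_attempts=5, chunk_size=1 << 20,
                    backoff_seconds=2.0, opener=urllib.request.urlopen):
    """Download url to dest, resuming from the bytes already on disk after a failure.

    Hypothetical helper: `opener` is injectable only so the logic can be tried offline.
    """
    for attempt in range(1, max_attempts + 1):
        offset = os.path.getsize(dest) if os.path.exists(dest) else 0
        request = urllib.request.Request(url)
        if offset:
            # Ask only for the missing tail, which is what `curl -C -` does.
            request.add_header("Range", f"bytes={offset}-")
        try:
            with opener(request) as response:
                # 206 means the range was honoured; otherwise start again from byte 0.
                resumed = offset > 0 and response.status == 206
                with open(dest, "ab" if resumed else "wb") as out:
                    while True:
                        chunk = response.read(chunk_size)
                        if not chunk:
                            break
                        out.write(chunk)
            return os.path.getsize(dest)
        except urllib.error.HTTPError as exc:
            if exc.code == 416 and offset:
                # Assumption: "range not satisfiable" means the file is already complete.
                return offset
            if exc.code < 500:
                raise  # other client errors will not improve with a retry
        except (OSError, http.client.HTTPException):
            pass  # dropped connection or truncated body: retry from the new offset
        if attempt < max_attempts:
            time.sleep(backoff_seconds * attempt)
    raise RuntimeError(f"gave up on {url} after {max_attempts} attempts")
```

A call would look like `resume_download("https://cog-uk.s3.climb.ac.uk/phylogenetics/latest/cog_all.fasta", "cog_all.fasta")`. The sketch retries on any `OSError`, so a local problem such as a full disk is retried too; a real script would probably want to tell those cases apart.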
Oh, well, that's kind of embarrassing, I should have just tried cog_all.fasta.gz! 😆 https://cog-uk.s3.climb.ac.uk/phylogenetics/latest/ doesn't offer a listing but, um, yeah. Looks like cog_all.fasta.gz is 7GB -- better than 54GB, but still, could be 0.1GB with xz. :) Thanks!
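To get a feel for the gzip-versus-xz gap on this kind of data, here is a small Python sketch that compresses a synthetic FASTA of near-identical sequences both ways using the standard-library `gzip` and `lzma` modules. Everything in it is made up for illustration (the sequence length, mutation count, record names and function names), so the ratios will not match the real cog_all.fasta figures quoted above. One plausible reason for the gap, offered as an assumption rather than something measured here, is that xz's much larger dictionary can reuse matches across many similar genomes, whereas gzip's DEFLATE window is limited to 32 KiB.

```python
import gzip
import lzma
import random


def synthetic_fasta(n_records=100, genome_length=29_903, n_mutations=5, seed=0):
    """Build a FASTA-like byte string of near-identical sequences (made-up data)."""
    rng = random.Random(seed)
    reference = [rng.choice("ACGT") for _ in range(genome_length)]
    records = []
    for i in range(n_records):
        genome = reference.copy()
        # A handful of substitutions keeps records similar but not identical.
        for pos in rng.sample(range(genome_length), n_mutations):
            genome[pos] = rng.choice("ACGT")
        records.append(f">sample_{i}\n{''.join(genome)}\n")
    return "".join(records).encode("ascii")


def compressed_sizes(data):
    """Return the byte counts of data raw, gzip-compressed and xz-compressed."""
    return {
        "raw": len(data),
        "gzip": len(gzip.compress(data, compresslevel=9)),
        "xz": len(lzma.compress(data, preset=6)),
    }


if __name__ == "__main__":
    for name, size in compressed_sizes(synthetic_fasta()).items():
        print(f"{name:>4}: {size:,} bytes")
```

Python's `lzma` module compresses on a single thread; the multi-threaded mode mentioned in the original request is a feature of the `xz` command-line tool (its `-T` / `--threads` option).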
@AngieHinrichs, you might find https://data.covid19.climb.ac.uk/changelog a useful log to check in on occasionally as we try our best to put up notifications of any changes to the data set and our processes -- like when compressed outputs were added.
Noted, thanks @SamStudio8. It would be great to get https://www.cogconsortium.uk/tools-analysis/public-data-analysis-2/ updated to point to the compressed versions too -- I will email the address on that page, contact@cogconsortium.uk.