Add --compress=best #22

Open
bgilbert opened this issue Jun 23, 2022 · 3 comments
Labels: enhancement (New feature or request), good first issue (Good for newcomers), triaged (This issue was evaluated, no more information is needed)

Comments

@bgilbert
Contributor

The tar compression flags invoke gzip and zstd with their default compression levels, which are not especially tight. Since we're creating long-term archives, we should probably compress as tightly as reasonably possible. That does require that we invoke the compressor separately.
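A minimal sketch of what invoking the compressor separately could look like, assuming bash, a tar that accepts `-cf -`, and the standard gzip/zstd command-line tools. tar writes an uncompressed archive to stdout and the compressor reads it at an explicitly chosen level. The `make_tarball` helper and its arguments are hypothetical, not part of this project.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: archive SRC_DIR into OUT by piping an uncompressed tar
# stream through an explicitly chosen compressor command (remaining arguments).
make_tarball() {
    local src_dir=$1 out=$2
    shift 2
    # -cf - writes the archive to stdout; -C keeps member paths relative.
    # With pipefail set, a failure in either tar or the compressor fails the pipeline.
    tar -cf - -C "$(dirname "$src_dir")" "$(basename "$src_dir")" | "$@" > "$out"
}

# Illustrative invocations (directory and file names are examples only):
#   make_tarball ./vendor vendor.tar.gz  gzip -9 -n
#   make_tarball ./vendor vendor.tar.zst zstd -19 -q
```

The pipe, rather than a built-in tar flag such as GNU tar's `-z` or `--zstd`, is what lets the level be set explicitly. `gzip -n` leaves the original name and timestamp out of the gzip header, which helps keep the output reproducible.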

@cgwalters
Member

I was curious, so I tried a few levels:


$ time zstd  packaging/rpm-ostree-2022.10.50.g7bb3e30b.tar
packaging/rpm-ostree-2022.10.50.g7bb3e30b.tar : 13.90%   (   125 MiB =>   17.3 MiB, packaging/rpm-ostree-2022.10.50.g7bb3e30b.tar.zst) 

________________________________________________________
Executed in    1.02 secs      fish           external
   usr time  292.11 millis    0.00 micros  292.11 millis
   sys time   45.13 millis  830.00 micros   44.30 millis
$ time zstd -11 packaging/rpm-ostree-2022.10.50.g7bb3e30b.tar
packaging/rpm-ostree-2022.10.50.g7bb3e30b.tar : 11.43%   (   125 MiB =>   14.3 MiB, packaging/rpm-ostree-2022.10.50.g7bb3e30b.tar.zst) 

________________________________________________________
Executed in    4.38 secs    fish           external
   usr time    2.35 secs    0.00 micros    2.35 secs
   sys time    0.07 secs  827.00 micros    0.07 secs
$ time zstd -19 packaging/rpm-ostree-2022.10.50.g7bb3e30b.tar
packaging/rpm-ostree-2022.10.50.g7bb3e30b.tar : 10.12%   (   125 MiB =>   12.6 MiB, packaging/rpm-ostree-2022.10.50.g7bb3e30b.tar.zst) 

________________________________________________________
Executed in   30.42 secs    fish           external
   usr time   29.86 secs    0.00 micros   29.86 secs
   sys time    0.08 secs  838.00 micros    0.08 secs

It's not really clear to me that even -11 is worth it: ~10x the CPU time for output that is only ~2% (of the input size) smaller. And -19 taking ~100x the CPU time to save < 4% seems even worse.
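For reference, the ratios can be worked out from the user CPU times and output sizes in the runs above, taking zstd's default level (3) as the baseline. Wall-clock ratios are smaller (4.38 / 1.02 ≈ 4.3 and 30.42 / 1.02 ≈ 30) because the default run spent most of its 1.02 s outside user CPU time.

$$
\begin{aligned}
\text{level 11:}\quad & \frac{2.35\,\mathrm{s}}{0.292\,\mathrm{s}} \approx 8.0\times\ \text{user time}, \quad 13.90\% - 11.43\% = 2.47\ \text{points}, \quad \frac{14.3\,\mathrm{MiB}}{17.3\,\mathrm{MiB}} \approx 0.83 \\
\text{level 19:}\quad & \frac{29.86\,\mathrm{s}}{0.292\,\mathrm{s}} \approx 102\times\ \text{user time}, \quad 13.90\% - 10.12\% = 3.78\ \text{points}, \quad \frac{12.6\,\mathrm{MiB}}{17.3\,\mathrm{MiB}} \approx 0.73
\end{aligned}
$$

Measured against the default output rather than the 125 MiB input, -11 is about 17% smaller and -19 about 27% smaller.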

The biggest wins for zstd come when a pre-trained dictionary is provided, but that makes everything stateful and hence harms reproducibility.

(On my test case, at default levels, zstd compresses ~10x faster than gzip while reaching about the same compression ratio.)

@bgilbert
Contributor Author

Hmm, I'm not sure I agree. Releases are done rarely, their artifacts live forever, and 3 more seconds (or 30) isn't that costly. (gzip's relative performance isn't directly relevant here, since the reason to use gzip is compatibility rather than performance.)

This is mitigated by the fact that vendor tarballs are probably not used very often, so the worldwide storage/transfer cost is limited.

@cgwalters
Member

I have absolutely no objections to adding --compress=slower or something, or maybe just --compress=best, matching gzip's --best. Or, if someone wants to make that the default and add --compress=fast instead, that's OK by me.

I'll admit my perspective on this is a bit tainted by the fact that right now I am running this program frequently for local interactive testing, so speed matters more to me than compression 😄

In the end, though, when it comes to storage: the worldwide growth in high-definition video, big data in general, etc. has driven immense gains in storage technologies, such as SMR hard drives. My kids just yesterday recorded multiple 5-10 minute videos of themselves singing, and I am sure they have no idea how much storage that took on my phone. The vendor tarballs are tiny peanuts compared to this stuff.

cgwalters changed the title from "Compress more tightly" to "Add --compress=best" on Apr 30, 2023
cgwalters added the enhancement, triaged, and good first issue labels on Apr 30, 2023
Projects
None yet
Development

No branches or pull requests

2 participants