Switch from zlib to zstd for backup compression #8211
Comments
I don't know much about compression, but the Wikipedia articles for zlib and zstd seem to indicate that the former refers to both a library and an algorithm, whereas the latter refers to just an algorithm (the compressed files of which typically use the |
Yes, for performance reasons. Zstd achieves a better compression ratio and compresses faster.
Performance is not the only consideration. Such a change would have significant implications for the ability to recover data from Qubes backups in emergency scenarios. Since gzip is ubiquitous, while zstd is comparatively new, users will have to store some kind of zstd binary with their backups or risk their data being unrecoverable in such scenarios. |
zstd is available on pretty much every Linux distribution IIUC.
Only in recent years, according to this: https://en.wikipedia.org/wiki/Zstd#Usage It seems like it's still somewhat experimental and in the process of being rolled out. Also, "available on" does not necessarily mean "preinstalled by default," which is a safe assumption for gzip. In many emergency scenarios, the user may only have access to an older computer or an older installation medium (e.g., a Linux ISO on a USB drive or disc that's a few years old). |
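To make the "available" versus "preinstalled" distinction concrete, here is a minimal sketch of how one might check which decompression tools a given rescue system actually has on its `PATH`. The function name `available_decompressors` and the list of tool names are illustrative assumptions, not part of Qubes or its backup tooling.

```python
import shutil


def available_decompressors(tools=("gzip", "bzip2", "xz", "zstd")):
    """Map each tool name to True if an executable of that name is on PATH.

    The default tool list is an assumption chosen for illustration.
    """
    return {tool: shutil.which(tool) is not None for tool in tools}


if __name__ == "__main__":
    # Print one line per tool so a missing decompressor is easy to spot.
    for tool, present in available_decompressors().items():
        print(f"{tool}: {'found' if present else 'missing'}")
```

Running this on the oldest installation medium you expect to rely on would show whether a zstd-compressed backup could be unpacked there without bringing an extra binary along.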
I understand the concerns about emergency recovery with zstd, but on the other hand the performance benefits of zstd over gzip (in both compression speed and ratio) are pretty impressive, and many users would probably like to benefit from them. Would it be possible to just give users the choice between gzip and zstd? That way, users concerned about emergency recovery with an old Linux ISO can still use gzip, while users more concerned about performance can switch to zstd. For zstd there should probably also be an option to change the compression level. Based on the benchmarks (and the "Compression Speed vs Ratio" diagram) from http://facebook.github.io/zstd/, different users may want different tradeoffs between speed and compression ratio. One more implementation note: zstd readily supports multi-threaded compression, so it is probably a good idea to enable this (e.g. by passing the |
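The speed-versus-ratio tradeoff behind a configurable compression level can be sketched roughly as follows. This uses `zlib` from the Python standard library as a stand-in, since zstd bindings are not assumed to be installed; the helper name `compare_levels` and the sample data are hypothetical, and the numbers it prints say nothing about zstd itself.

```python
import time
import zlib


def compare_levels(data, levels=(1, 6, 9)):
    """Compress data at each zlib level; return (level, size, seconds) tuples."""
    results = []
    for level in levels:
        start = time.perf_counter()
        compressed = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        # Round-trip check: the compressed output must decompress to the input.
        if zlib.decompress(compressed) != data:
            raise ValueError(f"round trip failed at level {level}")
        results.append((level, len(compressed), elapsed))
    return results


if __name__ == "__main__":
    # Highly repetitive sample data (an assumption); real backups compress differently.
    sample = b"Qubes backup sample data. " * 200_000
    for level, size, seconds in compare_levels(sample):
        print(f"level {level}: {size} bytes in {seconds:.3f} s")
```

Higher levels generally spend more CPU time for a smaller output, which is why a single fixed level is unlikely to suit every user.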
Isn't the option already available? For example, you can already do
It might already be possible to pass sub-arguments when using the |
Many users do their backups through the GUI, which offers no choice of compression algorithm at all: you can only enable or disable gzip compression. It would be great to have a choice there to use zstd with a configurable compression level.
I haven't tried it either, but in any case the restore operation currently doesn't support zstd (even if the header indicates zstd compression) since it is not listed in |
That should be a separate feature request, since it would presumably allow for specifying any supported compression filter (and perhaps a compression level for that compression filter, if applicable), not just zstd. I thought we already had a separate issue for this, but I wasn't able to find one just now. Please feel free to open one, if you still wish to. (Found a somewhat-related issue while searching: #3865)
Ah, I see. Thank you for pointing that out. |
That's only partially true. |
The problem you're addressing (if any)
zlib compression is slow and is often the bottleneck during backup generation (as per top(1)).
The solution you'd like
Use zstd compression instead, which is significantly faster and can natively use multiple CPU cores.
The value to a user, and who that user might be
All users will benefit from faster backups.