Switch from zlib to zstd for backup compression #8211

Open
DemiMarie opened this issue May 13, 2023 · 10 comments
Labels: C: core, P: default, T: enhancement

Comments

@DemiMarie


The problem you're addressing (if any)

zlib compression is slow and is often the bottleneck during backup generation (as per top(1)).

The solution you'd like

Use zstd compression instead, which is significantly faster and can natively use multiple CPU cores.

The value to a user, and who that user might be

All users will benefit from faster backups.

DemiMarie added the T: enhancement and P: default labels on May 13, 2023
DemiMarie added this to the Release 4.2 milestone on May 13, 2023
DemiMarie self-assigned this on May 13, 2023
@andrewdavidwong
Member

I don't know much about compression, but the Wikipedia articles for zlib and zstd seem to indicate that the former refers to both a library and an algorithm, whereas the latter refers to just an algorithm (the compressed files of which typically use the .zst extension). Is this accurate? Are you proposing changing the default compression algorithm used by qvm-backup?

@DemiMarie
Author

> Are you proposing changing the default compression algorithm used by qvm-backup?

Yes, for performance reasons. Zstd has better compression and compresses faster.

@andrewdavidwong
Member

> > Are you proposing changing the default compression algorithm used by qvm-backup?
>
> Yes, for performance reasons. Zstd has better compression and compresses faster.

Performance is not the only consideration. Such a change would have significant implications for the ability to recover data from Qubes backups in emergency scenarios. Since gzip is ubiquitous, while zstd is comparatively new, users will have to store some kind of zstd binary with their backups or risk their data being unrecoverable in such scenarios.

andrewdavidwong modified the milestones: Release 4.2, Release TBD on May 14, 2023
@DemiMarie
Author

> > > Are you proposing changing the default compression algorithm used by qvm-backup?
> >
> > Yes, for performance reasons. Zstd has better compression and compresses faster.
>
> Performance is not the only consideration. Such a change would have significant implications for the ability to recover data from Qubes backups in emergency scenarios. Since gzip is ubiquitous, while zstd is comparatively new, users will have to store some kind of zstd binary with their backups or risk their data being unrecoverable in such scenarios.

zstd is available on pretty much every Linux distribution IIUC.

@andrewdavidwong
Member

> zstd is available on pretty much every Linux distribution IIUC.

Only in recent years, according to this: https://en.wikipedia.org/wiki/Zstd#Usage

It seems like it's still somewhat experimental and in the process of being rolled out.

Also, "available on" does not necessarily mean "preinstalled by default," which is a safe assumption for gzip.

In many emergency scenarios, the user may only have access to an older computer or an older installation medium (e.g., a Linux ISO on a USB drive or disc that's a few years old).
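
Whether a given recovery environment actually ships the needed decompressor can be checked before relying on it. The following is a minimal sketch using only the Python standard library; the helper name `missing_tools` is hypothetical and not part of any Qubes tool.

```python
import shutil


def missing_tools(tools):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    # gzip and zstd are the two decompressors discussed in this thread.
    # An empty list means both are installed on this machine.
    print(missing_tools(["gzip", "zstd"]))
```

Running this on an older live ISO would show whether a zstd-compressed backup could be unpacked there without installing anything.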

@jakoblell

I understand the concerns about emergency recovery with zstd, but the performance benefits of zstd over gzip (in both compression speed and ratio) are pretty impressive, and many users would probably like to benefit from them. Would it be possible to just give users the choice between gzip and zstd? That way, users concerned about emergency recovery with an old Linux ISO can still use gzip, while users more concerned about performance can switch to zstd.

Also, for zstd there should probably be an option to change the compression level. Based on the benchmarks (and the "Compression Speed vs Ratio" diagram) at http://facebook.github.io/zstd/, different users may want different tradeoffs between speed and compression ratio.

And one more implementation note: zstd readily supports multi-threaded compression, so it is probably a good idea to enable it (e.g., by passing the -T0 parameter) when adding zstd support.
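
To make the two knobs concrete, here is a minimal sketch of how a zstd command line with a configurable level and thread count could be assembled. The function name `build_zstd_argv` and its defaults are assumptions for illustration; only the zstd flags themselves (`-#` for the level, `-T#` for threads, `-c` for stdout) come from the zstd CLI.

```python
def build_zstd_argv(level=3, threads=0):
    """Build a zstd command line that compresses stdin to stdout.

    level:   zstd compression level; 1-19 are the standard levels.
    threads: worker count passed to -T; 0 lets zstd detect the core count.
    """
    if not 1 <= level <= 19:
        raise ValueError("level must be between 1 and 19")
    if threads < 0:
        raise ValueError("threads must be >= 0")
    # -c writes the compressed stream to stdout so it can be piped onward.
    return ["zstd", f"-{level}", f"-T{threads}", "-c"]


if __name__ == "__main__":
    print(build_zstd_argv())            # default level, all cores
    print(build_zstd_argv(level=10, threads=4))
```

A lower level favors speed and a higher level favors ratio, which is the tradeoff the benchmark diagram illustrates.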

@andrewdavidwong
Member

> Would it be possible to just give users the choice between gzip and zstd?

Isn't the option already available? For example, you can already do qvm-backup --compress-filter bzip2 to use bzip2 instead of gzip. I've been using this for years, and it works great. I haven't tried zstd, but I was under the impression that this could be used for any compression filter available in dom0.

> Also, for zstd there should probably be an option to change the compression level [...]
>
> And one more implementation note: zstd readily supports multi-threaded compression, so it is probably a good idea to enable it (e.g., by passing the -T0 parameter) when adding zstd support.

It might already be possible to pass sub-arguments when using the --compress-filter option, but I'm not certain. I vaguely recall experimenting with this many years ago and being able to do it.
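
Whether qvm-backup accepts sub-arguments this way is not confirmed in this thread. Purely as an illustration, if a filter string such as "zstd -T0 -3" were accepted, it could be split into a program and its arguments along these lines; `split_filter` is a hypothetical helper, not a Qubes function.

```python
import shlex


def split_filter(filter_spec):
    """Split a filter string into (program, extra_args).

    shlex.split follows shell quoting rules, so quoted arguments
    containing spaces stay together.
    """
    parts = shlex.split(filter_spec)
    if not parts:
        raise ValueError("empty compression filter")
    return parts[0], parts[1:]


if __name__ == "__main__":
    print(split_filter("bzip2"))
    print(split_filter("zstd -T0 -3"))
```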

@jakoblell

> Isn't the option already available? For example, you can already do qvm-backup --compress-filter bzip2 to use bzip2 instead of gzip. I've been using this for years, and it works great.

Many users do their backups through the GUI, which offers no choice of compression algorithm: you can only enable or disable gzip compression. It would be great to be able to choose zstd there, with a configurable compression level.

> I haven't tried zstd, but I was under the impression that this could be used for any compression filter available in dom0.

I haven't tried it either, but in any case the restore operation currently doesn't support zstd (even if the header indicates zstd compression), since it is not listed in KNOWN_COMPRESSION_FILTERS here: https://github.com/QubesOS/qubes-core-admin-client/blob/ba9b24db90c1b09826b6fcff61f98941565a2824/qubesadmin/backup/restore.py#L68

@andrewdavidwong
Member

andrewdavidwong commented May 20, 2023

> Many users do their backups through the GUI, which offers no choice of compression algorithm: you can only enable or disable gzip compression. It would be great to be able to choose zstd there, with a configurable compression level.

That should be a separate feature request, since it would presumably allow for specifying any supported compression filter (and perhaps a compression level for that compression filter, if applicable), not just zstd. I thought we already had a separate issue for this, but I wasn't able to find one just now. Please feel free to open one, if you still wish to.

(Found a somewhat-related issue while searching: #3865)

> I haven't tried it either, but in any case the restore operation currently doesn't support zstd (even if the header indicates zstd compression), since it is not listed in KNOWN_COMPRESSION_FILTERS here: https://github.com/QubesOS/qubes-core-admin-client/blob/ba9b24db90c1b09826b6fcff61f98941565a2824/qubesadmin/backup/restore.py#L68

Ah, I see. Thank you for pointing that out.

@marmarek
Member

> I haven't tried it either, but in any case the restore operation currently doesn't support zstd (even if the header indicates zstd compression), since it is not listed in KNOWN_COMPRESSION_FILTERS here: https://github.com/QubesOS/qubes-core-admin-client/blob/ba9b24db90c1b09826b6fcff61f98941565a2824/qubesadmin/backup/restore.py#L68

That's only partially true. zstd will not be automatically accepted, but it will work if you use --compress-filter zstd during restore too.
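
The behavior described here can be sketched roughly as follows. This is an illustration and not the code in restore.py: the function name `resolve_filter` is hypothetical, and the contents of the tuple are an assumption rather than a copy of the linked file.

```python
# Assumed contents for illustration; the real tuple is defined in
# qubesadmin/backup/restore.py and may differ.
KNOWN_COMPRESSION_FILTERS = ("gzip", "bzip2", "xz")


def resolve_filter(header_filter, cli_filter=None):
    """Choose the decompression program for a restore.

    header_filter: filter name read from the backup header.
    cli_filter:    filter given explicitly by the user, if any.
    """
    if cli_filter is not None:
        # An explicit user choice is used even if it is not on the list.
        return cli_filter
    if header_filter in KNOWN_COMPRESSION_FILTERS:
        # Filters on the list are accepted automatically.
        return header_filter
    raise ValueError(
        f"unknown compression filter {header_filter!r}; "
        "pass it explicitly to use it"
    )


if __name__ == "__main__":
    print(resolve_filter("gzip"))
    print(resolve_filter("zstd", cli_filter="zstd"))
```

Under this reading, adding zstd to the list would only change whether it is accepted without the explicit option.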

andrewdavidwong removed this from the Release TBD milestone on Aug 13, 2023
DemiMarie removed their assignment on Mar 6, 2024

4 participants