-
-
Notifications
You must be signed in to change notification settings - Fork 734
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add zstd compression #3411
add zstd compression #3411
Conversation
src/borg/compress.pyx
Outdated
# This is a NOT THREAD SAFE implementation. | ||
# Only ONE python context must to be created at a time. | ||
# It should work flawlessly as long as borg will call ONLY ONE compression job at time. | ||
ID = b'\x10\x00' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
taken "as is" from @willyvmm's PR.
next free one would be \x03\x00, so maybe we rather use that.
src/borg/compress.pyx
Outdated
dctx = zstd.ZstdDecompressor() | ||
data = super().decompress(data) | ||
try: | ||
return dctx.decompress(bytes(data)) # not sure about taht typecast but it works. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and check whether that bytes() is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see zstandard "NEWS" @ 0.9.0.
src/borg/compress.pyx
Outdated
@@ -346,6 +382,16 @@ class CompressionSpec: | |||
else: | |||
raise ValueError | |||
self.level = level | |||
elif self.name in ('zstd', ): | |||
if count < 2: | |||
level = 5 # default compression level for zstd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
taken "as is", but it seems like the default level in python-zstandard is rather 3.
so, take 3 or use a different level better for typical borg usecase?
src/borg/testsuite/compress.py
Outdated
params_list += [ | ||
dict(name='zstd', level=1), | ||
dict(name='zstd', level=5), | ||
# also avoiding high zstd levels, memory needs unclear |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
research this.
The amount and kind of changes in this PR seem relatively safe, so maybe we can consider this even for 1.1-maint (discuss it with package maintainers first). But even if some dist does not / can not have python-zstandard for some reason, this is on the same level somehow as not having lzma in python. In both cases, borg will just say it's missing, but everything else will work. |
https://github.com/indygreg/python-zstandard/blob/master/NEWS.rst some changes ahead in 0.9. Maybe we should just vendor zstandard (>= 0.9 when it is ready) as we can't expect that to be in dists (except some rolling release dists). |
What's the rationale for using (thus, requiring) a Python wrapper if you're calling it from C[ython] anyway? |
"Getting it working quickly" :-D I am aware that we interfaced to the other stuff more directly from cython. OTOH, we could get rid of the python wrapper requirement and its not-yet-fully-stable api. |
Hmm, in fact it is only more direct for lz4. ZLIB and LZMA call the python wrapper in python stdlib. |
Codecov Report
@@ Coverage Diff @@
## master #3411 +/- ##
==========================================
+ Coverage 86.05% 86.58% +0.52%
==========================================
Files 36 36
Lines 9071 9346 +275
Branches 1501 1534 +33
==========================================
+ Hits 7806 8092 +286
+ Misses 856 849 -7
+ Partials 409 405 -4
Continue to review full report at Codecov.
|
OK, did some fix ups. I'ld work on the low-level stuff after discussing whether / how it can be adopted best into 1.1-maint and linux (and other) distributions. Like @enkore pointed out, adapting to zstd C code via cython would remove the If zstd support can go into 1.1-maint, we would have to vendor (bundle) zstd code as we can not expect currently stable linux (or other) distributions to have it / have it as a recent enough library. Also the package maintainer would not need to change the external package requirements. Additionally (maybe later), we could offer building against a external zstd also. |
This? |
@LocutusOfBorg: yes (current state of this PR), but this likely will change to the Cython based wrapper. so zstd either comes from external libzstd or from bundled zstd code. the cython based wrapper will be part of borg, that's just a few lines of Cython. the code changes are mostly separate from existing stuff, so little potential for regressions (i mean for stable dists which package 1.1, if any). problem is a bit that zstd tends to be a bit outdated in released dists, but I'ld rather want something recent, like 1.3.2. so guess I'll need to bundle the zstd C code with borg and offer an option so it can optionally be built against external libzstd also, if the version requirement can be met (like it is already done for libb2 / blake2 code). (updated from IRC discussion, to have stuff in 1 place) |
I pushed the code so it can use external libzstd and does not need python-zstandard any more. No builtin zstd right now, so build will fail if there is no external AND recent libzstd(-dev). |
nice, thanks! less bindings and wrappers is nice |
Just curious, why didn't you bundle 1.3.2 rather than 1.3.1? |
2 reasons:
I'll look a bit more at the differences between 1.3.1 and 1.3.2 later, maybe I'll update the bundled stuff to 1.3.2. |
|
I can sort of see the point here, though it should be considered that older distros that won't have zstd would seem unlikely to adapt borg or upgrade their packages to 1.1 from 1.0. I would suggest bundling this quite large library only if really necessary and not on a "if-maybe" premise (I don't know how far you looked into this). Though I would say that it should be avoided to add a new, mandatory dependency in 1.1.x. Maybe only bundle it there? |
@enkore I don't expect stable/released dists upgrading from 1.0 to 1.1, nor would I suggest/recommend that. So this is only for stable/released dists which already include some 1.1.x version. I agree that we shouldn't keep the bundled zstd in borg long-term (nor any other bigger stuff that can be easily provided by an external library), but currently 1.3.x is too new, so we should keep it for now (so developers and testers can run master code without stumbling over zstd). When we are approaching 1.2 RC we should evaluate zstd 1.3.x shared lib availability and remove it when it is widely available (same for blake2). |
Some users will also build 1.1.x from source on a dist that does not have zstd-1.3.x yet. |
I'll do some practical tests of the code now and will merge the stuff soon if the outcome is positive. |
Practical tests successful (pics, .tox, a VM). 2 small new changesets with other stuff I found. If noone is holding me back, will collapse & merge this later today. |
based on willyvmm's work in PR borgbackup#3116, but some changes: - removed any mulithreading changes - add zstandard in setup.py install_requires - tests - fix: minimum compression level is 1 (not 0) - use 3 for the default compression level - use ID 03 00 for zstd - only convert to bytes if we don't have bytes yet - move zstd code so that code blocks are ordered by ID - other cosmetic fixes
currently requires an externally available libzstd >= 1.3.0, no bundled zstd yet.
only .c and .h files + license
setup_zstd.py modified so it is just amending the Extension() kwargs, but the Extension is initialized by the caller. this way, amending can happend multiple times (e.g. for multiple compression algorithms). also: - move include/library dirs processing for system-library case - move system zstd prefix detection to setup_zstd module - cosmetic: setup.py whitespace fixes - prefer system zstd option, document zstd min. requirement
collapsed & rebased. |
based on willyvmm's work in PR #3116, but some changes: