Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add zstd compression #3411

Merged
merged 5 commits into from
Dec 15, 2017
Merged

add zstd compression #3411

merged 5 commits into from
Dec 15, 2017

Conversation

ThomasWaldmann
Copy link
Member

@ThomasWaldmann ThomasWaldmann commented Dec 2, 2017

based on willyvmm's work in PR #3116, but some changes:

  • removed all multithreading changes, multithreading will be addressed in borg later and differently.
  • tests
  • fix: minimum level is 1 (not 0)
  • default compression level = 3
  • only-if-needed bytes() conversion
  • other cosmetic fixes
  • removed python-zstandard external dependency
  • using built-in cython-based wrapper (like we do for lz4 also)
  • bundling zstd code into borg, so no new external dependency here either
  • if there is a system shared libzstd, it will build against this instead of the bundled zstd code

# This is a NOT THREAD SAFE implementation.
# Only ONE python context must to be created at a time.
# It should work flawlessly as long as borg will call ONLY ONE compression job at time.
ID = b'\x10\x00'
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

taken "as is" from @willyvmm's PR.

next free one would be \x03\x00, so maybe we rather use that.

dctx = zstd.ZstdDecompressor()
data = super().decompress(data)
try:
return dctx.decompress(bytes(data)) # not sure about taht typecast but it works.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and check whether that bytes() is needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see zstandard "NEWS" @ 0.9.0.

@@ -346,6 +382,16 @@ class CompressionSpec:
else:
raise ValueError
self.level = level
elif self.name in ('zstd', ):
if count < 2:
level = 5 # default compression level for zstd
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

taken "as is", but it seems like the default level in python-zstandard is rather 3.

so, take 3 or use a different level better for typical borg usecase?

params_list += [
dict(name='zstd', level=1),
dict(name='zstd', level=5),
# also avoiding high zstd levels, memory needs unclear
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

research this.

@ThomasWaldmann
Copy link
Member Author

The amount and kind of changes in this PR seem relatively safe, so maybe we can consider this even for 1.1-maint (discuss it with package maintainers first).

But even if some dist does not / can not have python-zstandard for some reason, this is on the same level somehow as not having lzma in python. In both cases, borg will just say it's missing, but everything else will work.

@ThomasWaldmann
Copy link
Member Author

https://github.com/indygreg/python-zstandard/blob/master/NEWS.rst some changes ahead in 0.9.

Maybe we should just vendor zstandard (>= 0.9 when it is ready) as we can't expect that to be in dists (except some rolling release dists).

@enkore
Copy link
Contributor

enkore commented Dec 2, 2017

What's the rationale for using (thus, requiring) a Python wrapper if you're calling it from C[ython] anyway?

@ThomasWaldmann
Copy link
Member Author

ThomasWaldmann commented Dec 2, 2017

"Getting it working quickly" :-D

I am aware that we interfaced to the other stuff more directly from cython.
We can also do it for this, but it'll require some more work.

OTOH, we could get rid of the python wrapper requirement and its not-yet-fully-stable api.

@ThomasWaldmann
Copy link
Member Author

Hmm, in fact it is only more direct for lz4. ZLIB and LZMA call the python wrapper in python stdlib.

@codecov-io
Copy link

codecov-io commented Dec 2, 2017

Codecov Report

Merging #3411 into master will increase coverage by 0.52%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3411      +/-   ##
==========================================
+ Coverage   86.05%   86.58%   +0.52%     
==========================================
  Files          36       36              
  Lines        9071     9346     +275     
  Branches     1501     1534      +33     
==========================================
+ Hits         7806     8092     +286     
+ Misses        856      849       -7     
+ Partials      409      405       -4
Impacted Files Coverage Δ
src/borg/helpers/checks.py 39.13% <ø> (ø) ⬆️
src/borg/helpers/parseformat.py 88.84% <0%> (+0.17%) ⬆️
src/borg/archiver.py 89.94% <0%> (+2.23%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 029d9ac...34b92ff. Read the comment docs.

@ThomasWaldmann
Copy link
Member Author

ThomasWaldmann commented Dec 3, 2017

OK, did some fix ups.

I'ld work on the low-level stuff after discussing whether / how it can be adopted best into 1.1-maint and linux (and other) distributions.

Like @enkore pointed out, adapting to zstd C code via cython would remove the zstandard python wrapper requirement.

If zstd support can go into 1.1-maint, we would have to vendor (bundle) zstd code as we can not expect currently stable linux (or other) distributions to have it / have it as a recent enough library. Also the package maintainer would not need to change the external package requirements.

Additionally (maybe later), we could offer building against a external zstd also.

@LocutusOfBorg
Copy link
Contributor

@ThomasWaldmann
Copy link
Member Author

@LocutusOfBorg: yes (current state of this PR), but this likely will change to the Cython based wrapper.

so zstd either comes from external libzstd or from bundled zstd code. the cython based wrapper will be part of borg, that's just a few lines of Cython.

the code changes are mostly separate from existing stuff, so little potential for regressions (i mean for stable dists which package 1.1, if any).

problem is a bit that zstd tends to be a bit outdated in released dists, but I'ld rather want something recent, like 1.3.2. so guess I'll need to bundle the zstd C code with borg and offer an option so it can optionally be built against external libzstd also, if the version requirement can be met (like it is already done for libb2 / blake2 code).

(updated from IRC discussion, to have stuff in 1 place)

@ThomasWaldmann
Copy link
Member Author

ThomasWaldmann commented Dec 4, 2017

I pushed the code so it can use external libzstd and does not need python-zstandard any more.

No builtin zstd right now, so build will fail if there is no external AND recent libzstd(-dev).

@LocutusOfBorg
Copy link
Contributor

nice, thanks! less bindings and wrappers is nice

@ThomasWaldmann ThomasWaldmann added this to the 1.1.4 milestone Dec 5, 2017
@ThomasWaldmann ThomasWaldmann self-assigned this Dec 5, 2017
@ThomasWaldmann ThomasWaldmann changed the title add zstd compression WIP: add zstd compression Dec 5, 2017
@FabioPedretti
Copy link
Contributor

Just curious, why didn't you bundle 1.3.2 rather than 1.3.1?

@ThomasWaldmann
Copy link
Member Author

2 reasons:

  • python-zstandard (where i took the initial version of setup_zstd.py from) bundles 1.3.1 (and iirc the .c/.h files are others/more in 1.3.2)
  • ubuntu 17.10 has 1.3.1 in the distribution, so maybe it is good if we test it with that.

I'll look a bit more at the differences between 1.3.1 and 1.3.2 later, maybe I'll update the bundled stuff to 1.3.2.

@ThomasWaldmann
Copy link
Member Author

  • rebased on current master
  • added zstd license
  • collapsed some changesets

@enkore
Copy link
Contributor

enkore commented Dec 7, 2017

If zstd support can go into 1.1-maint, we would have to vendor (bundle) zstd code as we can not expect currently stable linux (or other) distributions to have it / have it as a recent enough library.

I can sort of see the point here, though it should be considered that older distros that won't have zstd would seem unlikely to adapt borg or upgrade their packages to 1.1 from 1.0. I would suggest bundling this quite large library only if really necessary and not on a "if-maybe" premise (I don't know how far you looked into this).

Though I would say that it should be avoided to add a new, mandatory dependency in 1.1.x. Maybe only bundle it there?

@ThomasWaldmann
Copy link
Member Author

@enkore I don't expect stable/released dists upgrading from 1.0 to 1.1, nor would I suggest/recommend that.

So this is only for stable/released dists which already include some 1.1.x version.

I agree that we shouldn't keep the bundled zstd in borg long-term (nor any other bigger stuff that can be easily provided by an external library), but currently 1.3.x is too new, so we should keep it for now (so developers and testers can run master code without stumbling over zstd).

When we are approaching 1.2 RC we should evaluate zstd 1.3.x shared lib availability and remove it when it is widely available (same for blake2).

@ThomasWaldmann
Copy link
Member Author

Some users will also build 1.1.x from source on a dist that does not have zstd-1.3.x yet.
So bundling it with 1.1 will just let them do that without the complication of having to first provide a locally/custom install of a recent libzstd.

@ThomasWaldmann ThomasWaldmann changed the title WIP: add zstd compression add zstd compression Dec 8, 2017
@ThomasWaldmann
Copy link
Member Author

I'll do some practical tests of the code now and will merge the stuff soon if the outcome is positive.

@ThomasWaldmann
Copy link
Member Author

Practical tests successful (pics, .tox, a VM). 2 small new changesets with other stuff I found.

If noone is holding me back, will collapse & merge this later today.

based on willyvmm's work in PR borgbackup#3116, but some changes:

- removed any mulithreading changes
- add zstandard in setup.py install_requires
- tests
- fix: minimum compression level is 1 (not 0)
- use 3 for the default compression level
- use ID 03 00 for zstd
- only convert to bytes if we don't have bytes yet
- move zstd code so that code blocks are ordered by ID
- other cosmetic fixes
currently requires an externally available libzstd >= 1.3.0,
no bundled zstd yet.
only .c and .h files + license
setup_zstd.py modified so it is just amending the Extension() kwargs,
but the Extension is initialized by the caller.

this way, amending can happend multiple times (e.g. for multiple
compression algorithms).

also:
- move include/library dirs processing for system-library case
- move system zstd prefix detection to setup_zstd module
- cosmetic: setup.py whitespace fixes
- prefer system zstd option, document zstd min. requirement
@ThomasWaldmann
Copy link
Member Author

collapsed & rebased.

@ThomasWaldmann ThomasWaldmann merged commit e203b87 into borgbackup:master Dec 15, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants