-
-
Notifications
You must be signed in to change notification settings - Fork 734
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added zstandard compression support with mutithreading support. #3116
Conversation
Thanks for the PR, but did you read our ticket / policy about adding compression algorithms? |
Hmm, seems like this can work as a optional compression, so it is not required to have the additional python / binary dependencies on every platform / distribution, so it might be compatible with the policy. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some nitpicks
also, please PR against master, see the development workflow docs.
@@ -22,6 +22,21 @@ try: | |||
except ImportError: | |||
lzma = None | |||
|
|||
import multiprocessing as mp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
without "as mp" please. just search for "mp" to see why.
https://github.com/indygreg/python-zstandard | ||
pip3 install zstandard | ||
There is also python-zstd implementation. It will not work as this offer only basic support | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs to go into setup.py as a optional requirement.
@@ -126,7 +141,8 @@ class LZ4(CompressorBase): | |||
osize = LZ4_compressBound(isize) | |||
buf = buffer.get(osize) | |||
dest = <char *> buf | |||
osize = LZ4_compress_limitedOutput(source, dest, isize, osize) | |||
with nogil: | |||
osize = LZ4_compress_limitedOutput(source, dest, isize, osize) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why did you change this?
@@ -149,7 +165,8 @@ class LZ4(CompressorBase): | |||
except MemoryError: | |||
raise DecompressionError('MemoryError') | |||
dest = <char *> buf | |||
rsize = LZ4_decompress_safe(source, dest, isize, osize) | |||
with nogil: | |||
rsize = LZ4_decompress_safe(source, dest, isize, osize) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why did you change this?
0: maximum cpu count | ||
-1..-99: maximum cpu count minus 1..99 | ||
1..99: defined cpu usage | ||
default value is 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we'll have to discuss about multithreading after the basic issues are fixed.
|
||
def compress(self, data): | ||
cctx = zstd.ZstdCompressor(level=self.level, threads=self.threads, write_content_size=True) | ||
data = cctx.compress(bytes(data)) # not sure about that typecast but it works. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pep8: 2 blanks to the left of # and one to the right.
threads = int(values[2]) | ||
|
||
if threads == 0: | ||
threads=maxcpu # use maximum avaliable cpu's |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pep8: blanks around operators, so it is " = ". also fix typos: "available" and "CPUs"
if threads == 0: | ||
threads=maxcpu # use maximum avaliable cpu's | ||
elif threads < 0: | ||
threads = maxcpu + threads #threads is NEGATIVE ... remember !! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pep8
threads=maxcpu # use maximum avaliable cpu's | ||
elif threads < 0: | ||
threads = maxcpu + threads #threads is NEGATIVE ... remember !! | ||
if threads < 1: # too less cpu's |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not enough CPUs
if threads < 1: # too less cpu's | ||
raise ValueError | ||
else: | ||
if threads > maxcpu: # too many cpu's |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CPUs
Codecov Report
@@ Coverage Diff @@
## 1.1-maint #3116 +/- ##
=============================================
+ Coverage 85.86% 86.02% +0.15%
=============================================
Files 23 23
Lines 8979 8979
Branches 1515 1515
=============================================
+ Hits 7710 7724 +14
+ Misses 857 848 -9
+ Partials 412 407 -5
Continue to review full report at Codecov.
|
Multithreaded compressors generally only make sense for larger chunks of data as the compression ratio tends to suffer and little speedup is attained for small chunks of data and multiple threads. |
Multithreaded compression is related to this ticket: #3115 but i can see it will not work as i imagined. I did PR to 1.1 fork because i need it to work :) Thanks for all suggestions, i will rewrite the code and take all suggestions, and will remove multithreading support. I think this ticket can be closed for now. Best Regards |
@willyvmm thanks! Next PR against master, we'll discuss then whether we could adopt/backport that to 1.1.x. |
@willyvmm any news? :) |
@willyvmm did you do more work on this? now as zstd made it even into the linux kernel, I guess we also could take it. :) it would be perfect if it could be an optional dependency (so borg can still work, even if there is no libzstd [or however it is called] is available). |
Hmm, I just started on this. Based on @willyvmm's stuff, addressing my own comments from above. PR will come soon. |
based on willyvmm's work in PR borgbackup#3116, but some changes: - removed any mulithreading changes - add zstandard in setup.py install_requires - tests - fix: minimum level is 1 (not 0) - other cosmetic fixes multithreading in borg will be addressed in borg later and differently.
based on willyvmm's work in PR borgbackup#3116, but some changes: - removed any mulithreading changes - add zstandard in setup.py install_requires - tests - fix: minimum compression level is 1 (not 0) - use 3 for the default compression level - use ID 03 00 for zstd - only convert to bytes if we don't have bytes yet - move zstd code so that code blocks are ordered by ID - other cosmetic fixes
based on willyvmm's work in PR borgbackup#3116, but some changes: - removed any mulithreading changes - add zstandard in setup.py install_requires - tests - fix: minimum compression level is 1 (not 0) - use 3 for the default compression level - use ID 03 00 for zstd - only convert to bytes if we don't have bytes yet - move zstd code so that code blocks are ordered by ID - other cosmetic fixes
based on willyvmm's work in PR borgbackup#3116, but some changes: - removed any mulithreading changes - add zstandard in setup.py install_requires - tests - fix: minimum compression level is 1 (not 0) - use 3 for the default compression level - use ID 03 00 for zstd - only convert to bytes if we don't have bytes yet - move zstd code so that code blocks are ordered by ID - other cosmetic fixes (cherry picked from commit 11b2311)
Added support for Zstandard - Fast real-time compression algorithm http://www.zstd.net
https://github.com/facebook/zstd
prerequisites:
python-zstandard library https://github.com/indygreg/python-zstandard
pip3 install zstandard
There is also python-zstd implementation. It will not work, as this offer only basic support
--
I'am not too familiar with Python/git, so please review my code. Thanks.