Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added zstandard compression support with mutithreading support. #3116

Closed
wants to merge 2 commits into from

Conversation

willyvmm
Copy link

@willyvmm willyvmm commented Oct 9, 2017

Added support for Zstandard - Fast real-time compression algorithm http://www.zstd.net
https://github.com/facebook/zstd

prerequisites:
python-zstandard library https://github.com/indygreg/python-zstandard
pip3 install zstandard

There is also python-zstd implementation. It will not work, as this offer only basic support

Features:
     - compression levels from 1 to 22 supported
        default value is 5

     - multithreaded compression support
        0: maximum cpu count
        -1..-99: maximum cpu count minus 1..99
        1..99: defined cpu usage
        default value is 1

 Usage: borg create -C zstd,level,threads ...

Warning: This is NOT THREAD SAFE IMPLEMENTATION
  It allow only ONE python context to be created.
  It should work flawless as long as borg will call ONLY ONE compression job at time.

--
I'am not too familiar with Python/git, so please review my code. Thanks.

@ThomasWaldmann
Copy link
Member

Thanks for the PR, but did you read our ticket / policy about adding compression algorithms?

@ThomasWaldmann
Copy link
Member

Hmm, seems like this can work as a optional compression, so it is not required to have the additional python / binary dependencies on every platform / distribution, so it might be compatible with the policy.

Copy link
Member

@ThomasWaldmann ThomasWaldmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some nitpicks

also, please PR against master, see the development workflow docs.

@@ -22,6 +22,21 @@ try:
except ImportError:
lzma = None

import multiprocessing as mp
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

without "as mp" please. just search for "mp" to see why.

https://github.com/indygreg/python-zstandard
pip3 install zstandard
There is also python-zstd implementation. It will not work as this offer only basic support
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs to go into setup.py as a optional requirement.

@@ -126,7 +141,8 @@ class LZ4(CompressorBase):
osize = LZ4_compressBound(isize)
buf = buffer.get(osize)
dest = <char *> buf
osize = LZ4_compress_limitedOutput(source, dest, isize, osize)
with nogil:
osize = LZ4_compress_limitedOutput(source, dest, isize, osize)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did you change this?

@@ -149,7 +165,8 @@ class LZ4(CompressorBase):
except MemoryError:
raise DecompressionError('MemoryError')
dest = <char *> buf
rsize = LZ4_decompress_safe(source, dest, isize, osize)
with nogil:
rsize = LZ4_decompress_safe(source, dest, isize, osize)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did you change this?

0: maximum cpu count
-1..-99: maximum cpu count minus 1..99
1..99: defined cpu usage
default value is 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we'll have to discuss about multithreading after the basic issues are fixed.


def compress(self, data):
cctx = zstd.ZstdCompressor(level=self.level, threads=self.threads, write_content_size=True)
data = cctx.compress(bytes(data)) # not sure about that typecast but it works.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pep8: 2 blanks to the left of # and one to the right.

threads = int(values[2])

if threads == 0:
threads=maxcpu # use maximum avaliable cpu's
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pep8: blanks around operators, so it is " = ". also fix typos: "available" and "CPUs"

if threads == 0:
threads=maxcpu # use maximum avaliable cpu's
elif threads < 0:
threads = maxcpu + threads #threads is NEGATIVE ... remember !!
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pep8

threads=maxcpu # use maximum avaliable cpu's
elif threads < 0:
threads = maxcpu + threads #threads is NEGATIVE ... remember !!
if threads < 1: # too less cpu's
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not enough CPUs

if threads < 1: # too less cpu's
raise ValueError
else:
if threads > maxcpu: # too many cpu's
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CPUs

@codecov-io
Copy link

codecov-io commented Oct 9, 2017

Codecov Report

Merging #3116 into 1.1-maint will increase coverage by 0.15%.
The diff coverage is n/a.

Impacted file tree graph

@@              Coverage Diff              @@
##           1.1-maint    #3116      +/-   ##
=============================================
+ Coverage      85.86%   86.02%   +0.15%     
=============================================
  Files             23       23              
  Lines           8979     8979              
  Branches        1515     1515              
=============================================
+ Hits            7710     7724      +14     
+ Misses           857      848       -9     
+ Partials         412      407       -5
Impacted Files Coverage Δ
src/borg/archiver.py 87.17% <0%> (+0.11%) ⬆️
src/borg/archive.py 83.41% <0%> (+0.95%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update aae2b97...34233dd. Read the comment docs.

@enkore
Copy link
Contributor

enkore commented Oct 11, 2017

Multithreaded compressors generally only make sense for larger chunks of data as the compression ratio tends to suffer and little speedup is attained for small chunks of data and multiple threads.

@willyvmm
Copy link
Author

Multithreaded compression is related to this ticket: #3115 but i can see it will not work as i imagined.

I did PR to 1.1 fork because i need it to work :)

Thanks for all suggestions, i will rewrite the code and take all suggestions, and will remove multithreading support.
I did some tests, and it seems that multiple threads was never spawned, zstd logic is using much bigger window and do not spawn more threads when it is working with standard size chunks.

I think this ticket can be closed for now.

Best Regards
Willy.

@ThomasWaldmann
Copy link
Member

@willyvmm thanks!

Next PR against master, we'll discuss then whether we could adopt/backport that to 1.1.x.

@enkore enkore closed this Oct 11, 2017
@FabioPedretti
Copy link
Contributor

@willyvmm any news? :)

@ThomasWaldmann
Copy link
Member

@willyvmm did you do more work on this?

now as zstd made it even into the linux kernel, I guess we also could take it. :)

it would be perfect if it could be an optional dependency (so borg can still work, even if there is no libzstd [or however it is called] is available).

@ThomasWaldmann
Copy link
Member

Hmm, I just started on this. Based on @willyvmm's stuff, addressing my own comments from above.

PR will come soon.

ThomasWaldmann added a commit to ThomasWaldmann/borg that referenced this pull request Dec 2, 2017
based on willyvmm's work in PR borgbackup#3116, but some changes:

- removed any mulithreading changes
- add zstandard in setup.py install_requires
- tests
- fix: minimum level is 1 (not 0)
- other cosmetic fixes

multithreading in borg will be addressed in borg later and differently.
ThomasWaldmann added a commit to ThomasWaldmann/borg that referenced this pull request Dec 7, 2017
based on willyvmm's work in PR borgbackup#3116, but some changes:

- removed any mulithreading changes
- add zstandard in setup.py install_requires
- tests
- fix: minimum compression level is 1 (not 0)
- use 3 for the default compression level
- use ID 03 00 for zstd
- only convert to bytes if we don't have bytes yet
- move zstd code so that code blocks are ordered by ID
- other cosmetic fixes
ThomasWaldmann added a commit to ThomasWaldmann/borg that referenced this pull request Dec 14, 2017
based on willyvmm's work in PR borgbackup#3116, but some changes:

- removed any mulithreading changes
- add zstandard in setup.py install_requires
- tests
- fix: minimum compression level is 1 (not 0)
- use 3 for the default compression level
- use ID 03 00 for zstd
- only convert to bytes if we don't have bytes yet
- move zstd code so that code blocks are ordered by ID
- other cosmetic fixes
ThomasWaldmann added a commit to ThomasWaldmann/borg that referenced this pull request Dec 15, 2017
based on willyvmm's work in PR borgbackup#3116, but some changes:

- removed any mulithreading changes
- add zstandard in setup.py install_requires
- tests
- fix: minimum compression level is 1 (not 0)
- use 3 for the default compression level
- use ID 03 00 for zstd
- only convert to bytes if we don't have bytes yet
- move zstd code so that code blocks are ordered by ID
- other cosmetic fixes

(cherry picked from commit 11b2311)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants