Added zstandard compression support with multithreading support. #3116

willyvmm · 2017-10-09T20:11:45Z

Added support for Zstandard - Fast real-time compression algorithm http://www.zstd.net
https://github.com/facebook/zstd

prerequisites:
python-zstandard library https://github.com/indygreg/python-zstandard
pip3 install zstandard

There is also a python-zstd implementation. It will not work, as it offers only basic support.

Features:
     - compression levels from 1 to 22 supported
        default value is 5

     - multithreaded compression support
        0: maximum cpu count
        -1..-99: maximum cpu count minus 1..99
        1..99: defined cpu usage
        default value is 1

 Usage: borg create -C zstd,level,threads ...
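
The thread-count rules listed above can be sketched roughly as follows. This is only an illustration, not the code from the PR: the name `resolve_threads` and the `max_cpus` parameter are hypothetical, and raising `ValueError` when more threads than CPUs are requested is an assumption (the PR text only shows an error for the "not enough CPUs" case).

```python
import os


def resolve_threads(requested, max_cpus=None):
    """Map the user-supplied thread setting to a concrete thread count."""
    if max_cpus is None:
        # os.cpu_count() may return None, so fall back to a single CPU.
        max_cpus = os.cpu_count() or 1
    if requested == 0:
        # 0 means: use every available CPU.
        return max_cpus
    if requested < 0:
        # Negative values mean: all CPUs minus abs(requested).
        threads = max_cpus + requested
        if threads < 1:
            raise ValueError("not enough CPUs")
        return threads
    if requested > max_cpus:
        # Assumption: asking for more threads than CPUs is rejected.
        raise ValueError("too many CPUs")
    return requested
```

For example, on an 8-CPU machine `resolve_threads(-2, max_cpus=8)` would give 6, while `resolve_threads(-8, max_cpus=8)` would raise.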

Warning: This is NOT A THREAD-SAFE IMPLEMENTATION
  It allows only ONE python context to be created.
  It should work flawlessly as long as borg calls ONLY ONE compression job at a time.

--
I'm not too familiar with Python/git, so please review my code. Thanks.

ThomasWaldmann · 2017-10-09T20:17:56Z

Thanks for the PR, but did you read our ticket / policy about adding compression algorithms?

ThomasWaldmann · 2017-10-09T20:24:55Z

Hmm, seems like this can work as an optional compression, so the additional python / binary dependencies would not be required on every platform / distribution, so it might be compatible with the policy.

ThomasWaldmann

some nitpicks

also, please PR against master, see the development workflow docs.

ThomasWaldmann · 2017-10-09T20:25:42Z

src/borg/compress.pyx

@@ -22,6 +22,21 @@ try:
 except ImportError:
    lzma = None

+import multiprocessing as mp


without "as mp" please. just search for "mp" to see why.

ThomasWaldmann · 2017-10-09T20:26:42Z

src/borg/compress.pyx

+https://github.com/indygreg/python-zstandard
+pip3 install zstandard
+There is also python-zstd implementation. It will not work as this offer only basic support
+"""


this needs to go into setup.py as an optional requirement.
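
One way an optional requirement might be declared with setuptools is through `extras_require`. The fragment below is a hedged sketch only: the extra name `zstd` is an assumption, and the follow-up commit message quoted later in this thread mentions `install_requires` instead.

```python
# Hypothetical fragment for setup.py; the extra name 'zstd' is an assumption.
extras_require = {
    # Installing the extra pulls in the python-zstandard package from PyPI.
    'zstd': ['zstandard'],
}

# It would be handed to setuptools roughly like this:
# setup(..., extras_require=extras_require)
```

With such a declaration, users who want zstd would request the extra at install time, and everyone else would get no additional dependency.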

ThomasWaldmann · 2017-10-09T20:27:18Z

src/borg/compress.pyx

@@ -126,7 +141,8 @@ class LZ4(CompressorBase):
        osize = LZ4_compressBound(isize)
        buf = buffer.get(osize)
        dest = <char *> buf
-        osize = LZ4_compress_limitedOutput(source, dest, isize, osize)
+        with nogil:
+            osize = LZ4_compress_limitedOutput(source, dest, isize, osize)


why did you change this?

ThomasWaldmann · 2017-10-09T20:27:28Z

src/borg/compress.pyx

@@ -149,7 +165,8 @@ class LZ4(CompressorBase):
            except MemoryError:
                raise DecompressionError('MemoryError')
            dest = <char *> buf
-            rsize = LZ4_decompress_safe(source, dest, isize, osize)
+            with nogil:
+                rsize = LZ4_decompress_safe(source, dest, isize, osize)


why did you change this?

ThomasWaldmann · 2017-10-09T20:28:40Z

src/borg/compress.pyx

+            0: maximum cpu count
+            -1..-99: maximum cpu count minus 1..99
+            1..99: defined cpu usage
+            default value is 1


we'll have to discuss multithreading after the basic issues are fixed.

ThomasWaldmann · 2017-10-09T20:30:13Z

src/borg/compress.pyx

+
+    def compress(self, data):
+        cctx = zstd.ZstdCompressor(level=self.level, threads=self.threads, write_content_size=True)
+        data = cctx.compress(bytes(data)) # not sure about that typecast but it works.


pep8: 2 blanks to the left of # and one to the right.
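
On the `bytes(data)` typecast that the diff comment is unsure about: a possible approach, sketched here under the assumption that `data` may arrive as `bytes`, `bytearray` or `memoryview`, is to convert only when the input is not already `bytes`. The helper name `ensure_bytes` is hypothetical.

```python
def ensure_bytes(data):
    """Return data unchanged if it is already bytes, else copy it into bytes."""
    if isinstance(data, bytes):
        # No copy needed; hand the same object back.
        return data
    # bytearray, memoryview and other buffer objects are copied once here.
    return bytes(data)
```

This avoids an extra copy in the common case where the caller already passes `bytes`.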

ThomasWaldmann · 2017-10-09T20:32:11Z

src/borg/compress.pyx

+                threads = int(values[2])
+
+                if threads == 0:
+                    threads=maxcpu # use maximum avaliable cpu's


pep8: blanks around operators, so it is " = ". also fix typos: "available" and "CPUs"

ThomasWaldmann · 2017-10-09T20:32:28Z

src/borg/compress.pyx

+                if threads == 0:
+                    threads=maxcpu # use maximum avaliable cpu's
+                elif threads < 0:
+                    threads = maxcpu + threads #threads is NEGATIVE ... remember !!


ThomasWaldmann · 2017-10-09T20:33:03Z

src/borg/compress.pyx

+                    threads=maxcpu # use maximum avaliable cpu's
+                elif threads < 0:
+                    threads = maxcpu + threads #threads is NEGATIVE ... remember !!
+                    if threads < 1: # too less cpu's


not enough CPUs

ThomasWaldmann · 2017-10-09T20:33:32Z

src/borg/compress.pyx

+                    if threads < 1: # too less cpu's
+                        raise ValueError
+                else:
+                    if threads > maxcpu: # too many cpu's


codecov-io · 2017-10-09T21:21:34Z

Codecov Report

Merging #3116 into 1.1-maint will increase coverage by 0.15%.
The diff coverage is n/a.

@@              Coverage Diff              @@
##           1.1-maint    #3116      +/-   ##
=============================================
+ Coverage      85.86%   86.02%   +0.15%     
=============================================
  Files             23       23              
  Lines           8979     8979              
  Branches        1515     1515              
=============================================
+ Hits            7710     7724      +14     
+ Misses           857      848       -9     
+ Partials         412      407       -5

Impacted Files	Coverage Δ
src/borg/archiver.py	`87.17% <0%> (+0.11%)`	⬆️
src/borg/archive.py	`83.41% <0%> (+0.95%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

enkore · 2017-10-11T01:11:29Z

Multithreaded compressors generally only make sense for larger chunks of data: with small chunks and multiple threads, the compression ratio tends to suffer and little speedup is attained.
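
The ratio cost of compressing small pieces independently can be illustrated with a rough sketch. It uses `zlib` from the standard library purely as a stand-in (it is not zstd, and zstd's multithreaded mode is more sophisticated than a plain split); the sample data and function names are made up for the illustration.

```python
import zlib

# Deterministic, repetitive sample data (log-like lines).
data = b"".join(
    b"line %d: some repetitive log text\n" % (i % 100) for i in range(20000)
)


def compressed_size_whole(buf, level=6):
    """Size after compressing the whole buffer as one stream."""
    return len(zlib.compress(buf, level))


def compressed_size_split(buf, pieces, level=6):
    """Total size after compressing each piece as an independent stream."""
    step = -(-len(buf) // pieces)  # ceiling division
    return sum(
        len(zlib.compress(buf[i:i + step], level))
        for i in range(0, len(buf), step)
    )


whole = compressed_size_whole(data)
split = compressed_size_split(data, 64)
print(whole, split)
```

Each independent stream has to rebuild its own history and carries its own framing overhead, so the split total comes out larger than the single-stream size for data like this.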

willyvmm · 2017-10-11T15:37:09Z

Multithreaded compression is related to this ticket: #3115, but I can see it will not work as I imagined.

I made the PR against 1.1-maint because I need it to work :)

Thanks for all the suggestions. I will rewrite the code, take them all into account, and remove multithreading support.
I did some tests, and it seems that multiple threads were never spawned: the zstd logic uses a much bigger window and does not spawn more threads when it is working with standard-size chunks.

I think this ticket can be closed for now.

Best Regards
Willy.

ThomasWaldmann · 2017-10-11T16:22:49Z

@willyvmm thanks!

Next PR against master, we'll discuss then whether we could adopt/backport that to 1.1.x.

FabioPedretti · 2017-10-20T20:22:24Z

@willyvmm any news? :)

ThomasWaldmann · 2017-12-02T19:05:41Z

@willyvmm did you do more work on this?

now that zstd has even made it into the linux kernel, I guess we could take it too. :)

it would be perfect if it could be an optional dependency (so borg can still work even if no libzstd [or whatever it is called] is available).
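
An optional dependency of this kind is commonly handled with a guarded import, mirroring the `lzma = None` fallback visible in the diff context earlier in this thread. The sketch below is an assumption about how it could look, not borg's actual code; `available_compressors` and the names it returns are hypothetical.

```python
try:
    import zstandard as zstd  # python-zstandard; may not be installed
except ImportError:
    zstd = None


def available_compressors():
    """List the compressor names usable in this environment."""
    names = ['none', 'zlib']  # illustrative always-available entries
    if zstd is not None:
        # Only offer zstd when the optional package could be imported.
        names.append('zstd')
    return names
```

Code that selects a compressor could then raise a clear error when `zstd` is requested but missing, instead of failing at import time.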

ThomasWaldmann · 2017-12-02T20:20:36Z

Hmm, I just started on this. Based on @willyvmm's stuff, addressing my own comments from above.

PR will come soon.

based on willyvmm's work in PR borgbackup#3116, but some changes:
- removed any multithreading changes
- add zstandard in setup.py install_requires
- tests
- fix: minimum level is 1 (not 0)
- other cosmetic fixes

multithreading in borg will be addressed in borg later and differently.

based on willyvmm's work in PR borgbackup#3116, but some changes:
- removed any multithreading changes
- add zstandard in setup.py install_requires
- tests
- fix: minimum compression level is 1 (not 0)
- use 3 for the default compression level
- use ID 03 00 for zstd
- only convert to bytes if we don't have bytes yet
- move zstd code so that code blocks are ordered by ID
- other cosmetic fixes

based on willyvmm's work in PR borgbackup#3116, but some changes:
- removed any multithreading changes
- add zstandard in setup.py install_requires
- tests
- fix: minimum compression level is 1 (not 0)
- use 3 for the default compression level
- use ID 03 00 for zstd
- only convert to bytes if we don't have bytes yet
- move zstd code so that code blocks are ordered by ID
- other cosmetic fixes

(cherry picked from commit 11b2311)

willyvmm added 2 commits October 9, 2017 21:34

Added zstd compression support with multithreading support.

c125ce0

some small cleanups

34233dd

ThomasWaldmann reviewed Oct 9, 2017

willyvmm mentioned this pull request Oct 11, 2017

can't use --chunker-params to achieve bigger chunks #3115

Closed

enkore closed this Oct 11, 2017

ThomasWaldmann mentioned this pull request Dec 2, 2017

interesting compression algorithms + policy #1633

Open

ThomasWaldmann mentioned this pull request Dec 2, 2017

add zstd compression #3411

Merged
