Support for parallel processing #40

kanryu · 2019-01-09T03:25:30Z

Do you plan to parallelize compression and decompression?

Currently I am interested in decompression.

Decompress_template switches between UNCOMPRESSED and Huffman-coded blocks, and each could run on a separate thread. Each UNCOMPRESSED block could be divided across further threads if certain conditions are satisfied.

In addition, C++11 adds multithreading, so you would not need to deal with the pthread problem.

ebiggers · 2019-01-09T03:34:42Z

DEFLATE (and zlib and gzip) streams aren't suitable for parallel decompression.

However, if you aren't locked into a data format that uses a single stream, you can easily parallelize at the application layer by dividing the data into chunks before compression, then compressing and/or decompressing the chunks in parallel. libdeflate already works fine for this; just make sure to allocate a separate libdeflate_compressor or libdeflate_decompressor for each concurrent thread.
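The chunking approach described above can be sketched roughly as follows. This is only an illustration, not libdeflate code: Python's standard `zlib` module stands in for libdeflate, and `CHUNK_SIZE`, `split`, `compress_parallel` and `decompress_parallel` are hypothetical names. The point it mirrors is that every task creates its own compressor or decompressor object, so no codec state is shared between threads.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 64 * 1024  # hypothetical chunk size; tune for the workload


def split(data, size=CHUNK_SIZE):
    """Cut the input into independent chunks before compression."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def compress_chunk(chunk):
    # A fresh compressor per task, so no state is shared across threads.
    comp = zlib.compressobj(6)
    return comp.compress(chunk) + comp.flush()


def decompress_chunk(blob):
    # Likewise, a fresh decompressor per task.
    decomp = zlib.decompressobj()
    return decomp.decompress(blob) + decomp.flush()


def compress_parallel(data, workers=4):
    """Return one independent zlib stream per chunk, in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(compress_chunk, split(data)))


def decompress_parallel(blobs, workers=4):
    """Decompress the chunks concurrently and join them in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(decompress_chunk, blobs))
```

The result is a list of separate streams rather than one DEFLATE stream, so the application has to store the chunk boundaries itself, and matches never cross a chunk boundary, which usually costs some compression ratio.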

Piezoid · 2019-05-20T15:55:11Z

First, thank you for libdeflate.

We used your code as a base for experimenting with parallel decompression and found a way to achieve just that: https://github.com/Piezoid/pugz/

It's not yet production ready and we removed lots of features (compression, multiarch: only linux/x86 with SSE3.1 is currently supported). This is a rather contrived implementation, and I think it should be kept as a specialized library. Notably, only ASCII files are currently supported.

The asynchronous API is not yet stabilized. Any input on usage patterns would be appreciated.

kanryu · 2019-05-21T05:29:26Z

@Piezoid It is an interesting project. For gzip, my understanding is that it achieves parallel processing by using the fact that one gz file contains multiple zlib chunks. Is that actually how it works?

What I was asking about is whether decoding can be accelerated by running Huffman decoding and LZ decoding in parallel within a single zlib chunk, but parallelizing gzip in that way is worthwhile in itself.

Piezoid · 2019-05-21T12:22:26Z

What you describe, if I'm not mistaken, is similar to the bgzip file format. It uses the fact that a gzip file can contain multiple concatenated gzip "parts". This breaks the LZ77 dependency between two successive segments, which allows random access and parallelization. It is quite ubiquitous for compressing bioinformatics text file formats. It is retro-compatible with gzip tools but requires recompression.
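A rough sketch of the concatenated-parts idea, using only Python's standard `gzip` module. It is not the real bgzip format (as far as I know, bgzip additionally records each block's compressed size in a gzip extra field so readers can locate blocks without decompressing); `compress_as_members` and `MEMBER_SIZE` are hypothetical names chosen for the illustration.

```python
import gzip

MEMBER_SIZE = 4096  # hypothetical uncompressed size of each gzip member


def compress_as_members(data, member_size=MEMBER_SIZE):
    """Compress each slice as a complete, independent gzip member."""
    return [
        gzip.compress(data[i:i + member_size])
        for i in range(0, len(data), member_size)
    ]


data = b"ACGTTGCA" * 5000
members = compress_as_members(data)

# Concatenated members form a valid gzip file for ordinary gzip tools.
archive = b"".join(members)
whole = gzip.decompress(archive)

# Each member has no LZ77 dependency on the previous one, so any member
# can be decoded on its own (random access, or one member per thread).
third_only = gzip.decompress(members[2])
```

Because the members must be produced at compression time, an existing single-member gzip file has to be recompressed to gain this property.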

Pugz aims at decompressing vanilla gzip files, with a single header/part/footer. In a gzip stream, there are multiple deflate blocks, but they only reset the Huffman tables. The LZ77 sliding window is not reset, and dependencies (what we call back-references) are carried from one block to the next.
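The carried-over window can be observed with Python's standard `zlib` module. This is a sketch under assumptions, not how pugz locates blocks: it uses zlib flush modes to force a byte-aligned block boundary (ordinary gzip files give no such alignment), and `PART` and `second_part_decodes_alone` are hypothetical names. `Z_SYNC_FLUSH` ends the current block but keeps the sliding window, whereas `Z_FULL_FLUSH` also resets the compressor's history.

```python
import zlib

# Repetitive sample text, well under the 32 KiB window.
PART = b"".join(b"record %d: parallel inflate\n" % i for i in range(200))


def second_part_decodes_alone(flush_mode):
    """Compress PART twice in one raw DEFLATE stream, then try to decode
    only the bytes produced after the flush point, with an empty window."""
    comp = zlib.compressobj(6, zlib.DEFLATED, -15)  # -15: raw deflate
    first = comp.compress(PART) + comp.flush(flush_mode)
    second = comp.compress(PART) + comp.flush(zlib.Z_FINISH)

    # The complete stream is always decodable from the start.
    assert zlib.decompress(first + second, -15) == PART + PART

    try:
        return zlib.decompressobj(-15).decompress(second) == PART
    except zlib.error:
        # Back-references point into history this decoder never saw.
        return False


keeps_window = second_part_decodes_alone(zlib.Z_SYNC_FLUSH)
resets_window = second_part_decodes_alone(zlib.Z_FULL_FLUSH)
```

With the window kept, the second part refers back into the first and cannot be decoded in isolation; with the history reset, it can. A plain gzip file behaves like the first case at every block boundary.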

Pugz solves this problem by doing a first pass that records the origins of back-references in the initial unknown sliding window. Then, after thread synchronization, the back-references are "translated" back to the correct characters using the end of the decompressed chunk coming from another thread.
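A toy model of that two-pass idea, for illustration only. It is not pugz's code and does not parse real DEFLATE: the input is assumed to be an already Huffman-decoded token list where an `int` is a literal byte and a `(distance, length)` tuple is a back-reference, and all function names are hypothetical.

```python
def first_pass(tokens, window_size):
    """Decode a chunk without knowing the previous window.

    The unknown window is filled with placeholders ("win", i), meaning
    "byte i of the previous chunk's final window". Back-references copy
    placeholders just like ordinary bytes, which records their origin.
    """
    out = [("win", i) for i in range(window_size)]
    for tok in tokens:
        if isinstance(tok, int):
            out.append(tok)              # literal byte
        else:
            dist, length = tok
            for _ in range(length):      # byte-wise copy handles overlap
                out.append(out[-dist])
    return out[window_size:]


def resolve(symbols, window):
    """Second pass: once the previous chunk is known, translate each
    placeholder into the real byte from that chunk's final window."""
    return bytes(s if isinstance(s, int) else window[s[1]] for s in symbols)


def decode_sequential(tokens, window):
    """Reference decoder that knows the window up front."""
    out = list(window)
    for tok in tokens:
        if isinstance(tok, int):
            out.append(tok)
        else:
            dist, length = tok
            for _ in range(length):
                out.append(out[-dist])
    return bytes(out[len(window):])


# Tail of the previous chunk, only known after thread synchronization.
previous_window = b"abcdefgh"
tokens = [(8, 3), ord("X"), (4, 4)]

symbols = first_pass(tokens, len(previous_window))   # independent of window
chunk = resolve(symbols, previous_window)            # after synchronization
```

The first pass can run in any thread without waiting for its predecessor; only the cheap translation step depends on the neighbouring chunk's output. Storing placeholders needs symbols wider than one byte, which is one way to see the extra memory cost of such a scheme.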

kanryu · 2019-05-22T02:57:02Z

@Piezoid Is that applicable to a deflate block, such as a PNG image?

Piezoid · 2019-05-27T11:42:36Z

Yes, but we don't support binary data atm. It could be done in theory, but at a higher overhead (memory bandwidth).
Unless you have a few very large PNGs, I'm not sure this would bring performance gains.
You are welcome to open an issue on the pugz repository if you want to discuss the matter further.

ebiggers · 2019-11-28T19:36:47Z

Closing since support for parallel processing is currently out of scope for libdeflate itself.

dangerousplay mentioned this issue Nov 21, 2019

Performance and Compression level support bzikarsky/gelf-rust#17

Open

ebiggers added enhancement wontfix labels Nov 28, 2019

ebiggers closed this as completed Nov 28, 2019

sisong mentioned this issue Jan 3, 2024

I added stream & multi-thread support for libdeflate #335

Closed
