Commit
* Added support for LZ4_RAW compression. (#1604)

* This adds the implementation of the LZ4_RAW codec by using the LZ4 block compression algorithm. (#1604)

* This commit uses the formula from https://stackoverflow.com/questions/25740471/lz4-library-decompressed-data-upper-bound-size-estimation to estimate the uncompressed size. As noted in the thread, this formula over-estimates the size, but it is probably the best we can get with the current decompress API, because the size of an Arrow LZ4_RAW block is not prepended to the block.

* The other option would be to take the C++ approach and bypass the API (https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/compression_lz4.cc#L343). This approach consists of relying on the output_buffer capacity to guess the uncompressed_size. This works because `serialized_reader.rs` already knows the uncompressed_size, as it reads it from the page header, and allocates the output_buffer with a capacity equal to the uncompressed_size (https://github.com/marioloko/arrow-rs/blob/master/parquet/src/file/serialized_reader.rs#L417). I did not follow this approach because:
  1. It is too hacky.
  2. It would limit the use cases of the `decompress` API, as the caller would need to know to allocate the right uncompressed_size.
  3. It is not compatible with the current set of tests. However, new tests can be created.

* Clippy

* Add integration test

Co-authored-by: Adrián Gallego Castellanos <kugoad@gmail.com>
Co-authored-by: Raphael Taylor-Davies <r.taylordavies@googlemail.com>
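
The size-estimation approach described in the commit message can be sketched roughly as follows. This is an illustration and not the code from the commit: `max_uncompressed_size`, `decompress_with_estimate`, and the `decompress_block` callback are hypothetical names, and the bound used here (255 output bytes per compressed byte) is an assumed conservative worst case for the LZ4 block format rather than the exact formula from the linked thread. The idea is to grow the output buffer by the over-estimate, let the block decompressor report how many bytes it actually wrote, and then shrink the buffer back to the real size.

```rust
/// Hypothetical helper: conservative upper bound on the decompressed size of
/// an LZ4 block. Assumes a worst-case expansion of 255 output bytes per
/// compressed byte (an assumption for this sketch, not the commit's formula).
fn max_uncompressed_size(compressed_size: usize) -> usize {
    compressed_size.saturating_mul(255)
}

/// Appends the decompressed form of `input` to `output`.
///
/// `decompress_block` is a stand-in for a real LZ4 block decompressor: it
/// receives the compressed bytes and a scratch slice, and returns the number
/// of bytes it wrote into that slice.
fn decompress_with_estimate<F>(
    input: &[u8],
    output: &mut Vec<u8>,
    decompress_block: F,
) -> Result<usize, String>
where
    F: FnOnce(&[u8], &mut [u8]) -> Result<usize, String>,
{
    // Existing contents of `output` are preserved; new data goes after them.
    let offset = output.len();

    // The real size is unknown (it is not prepended to the block), so reserve
    // the over-estimate.
    let bound = max_uncompressed_size(input.len());
    output.resize(offset + bound, 0);

    let result = decompress_block(input, &mut output[offset..]);
    match result {
        Ok(written) if written <= bound => {
            // Drop the unused tail of the over-allocated region.
            output.truncate(offset + written);
            Ok(written)
        }
        Ok(written) => {
            output.truncate(offset);
            Err(format!("decompressor reported {written} bytes, bound was {bound}"))
        }
        Err(e) => {
            // Restore the buffer to its original length on failure.
            output.truncate(offset);
            Err(e)
        }
    }
}
```

The trade-off matches the one in the message: the caller does not need to know the uncompressed size up front, at the cost of a temporary over-allocation for every block.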
1 parent 880c4d9, commit 4e1247e
Showing 3 changed files with 99 additions and 0 deletions.