PARQUET-515: Add "SetData" to LevelDecoder #51
majetideepak wants to merge 3 commits into apache:master
@wesm feedback please :)
  bit_offset_ = 0;
  int num_bytes = std::min(8, max_bytes_ - byte_offset_);
  memcpy(&buffered_values_, buffer_ + byte_offset_, num_bytes);
}
@wesm This is most likely a bug in the Impala code as well since we pulled this code from there.
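For context, the refill in the snippet above can be sketched as a standalone reader. This is only a minimal illustration, assuming a little-endian host and LSB-first packing; the class name TinyBitReader and the methods Reset, Refill and GetValue are hypothetical, and only the member names come from the diff.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the reader under review. Only the member names
// (bit_offset_, byte_offset_, max_bytes_, buffer_, buffered_values_) come
// from the diff; everything else is an assumption for illustration.
class TinyBitReader {
 public:
  // Point the reader at a new buffer and immediately refill the 64-bit
  // cache, so the first read does not see stale data.
  void Reset(const uint8_t* buffer, int num_bytes) {
    buffer_ = buffer;
    max_bytes_ = num_bytes;
    byte_offset_ = 0;
    bit_offset_ = 0;
    Refill();
  }

  // Read num_bits (1..32), LSB-first. Returns false once data is exhausted.
  bool GetValue(int num_bits, uint32_t* out) {
    if (byte_offset_ * 8 + bit_offset_ + num_bits > max_bytes_ * 8) return false;
    uint64_t v = TrailingBits(buffered_values_, bit_offset_ + num_bits) >> bit_offset_;
    bit_offset_ += num_bits;
    if (bit_offset_ >= 64) {
      // The value ended at or crossed the cached word: load the next word
      // and merge in the bits that spilled into it.
      byte_offset_ += 8;
      bit_offset_ -= 64;
      Refill();
      v |= TrailingBits(buffered_values_, bit_offset_) << (num_bits - bit_offset_);
    }
    *out = static_cast<uint32_t>(v);
    return true;
  }

 private:
  // Keep only the lowest n bits of v.
  static uint64_t TrailingBits(uint64_t v, int n) {
    if (n <= 0) return 0;
    if (n >= 64) return v;
    return v & ((uint64_t{1} << n) - 1);
  }

  // Copy up to 8 bytes into the cache, as in the snippet under review.
  void Refill() {
    buffered_values_ = 0;
    int num_bytes = std::min(8, max_bytes_ - byte_offset_);
    if (num_bytes > 0) std::memcpy(&buffered_values_, buffer_ + byte_offset_, num_bytes);
  }

  const uint8_t* buffer_ = nullptr;
  int max_bytes_ = 0;
  int byte_offset_ = 0;
  int bit_offset_ = 0;
  uint64_t buffered_values_ = 0;
};
```

Without the Refill() call in Reset, the first GetValue after a reset would read whatever the cache held from the previous buffer.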
Good catch, I agree that it looks like a bug. I will look at reporting on the Impala JIRA. I looked through the RLE encoding code, and the reason this bug was never hit in real code is that other parts of the RleDecoder code path (in BitReader::GetAligned) trigger data to be copied into buffered_values_.
Separately, I looked at parquet-format more carefully and it appears that BIT_PACKED uses MSB ordering while the RLE encoding uses LSB bit-packing for its bit-packed literal runs. This means we would be unable to correctly read files generated using the correct MSB-bit-packing.
But since BIT_PACKED is deprecated this all suggests that it never saw much production use. Let's leave this as is for now and we can investigate further.
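To make the ordering difference concrete, here is a small sketch that packs the same values both ways. The function names PackLsbFirst and PackMsbFirst are hypothetical helpers, not part of parquet-cpp; the input (values 0 through 7 at bit width 3) mirrors the example used in the parquet-format encoding documentation.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// LSB-first packing, as used for bit-packed runs in the RLE hybrid encoding:
// bit j of each value goes to the next free stream bit, and stream bit p is
// stored as bit (p % 8) of byte p / 8.
std::vector<uint8_t> PackLsbFirst(const std::vector<uint32_t>& values, int width) {
  std::vector<uint8_t> out((values.size() * width + 7) / 8, 0);
  std::size_t pos = 0;
  for (uint32_t v : values) {
    for (int j = 0; j < width; ++j, ++pos) {
      if ((v >> j) & 1u) out[pos / 8] |= static_cast<uint8_t>(1u << (pos % 8));
    }
  }
  return out;
}

// MSB-first packing, as described for the deprecated BIT_PACKED encoding:
// the most significant bit of each value is written first, and stream bit p
// is stored as bit (7 - p % 8) of byte p / 8.
std::vector<uint8_t> PackMsbFirst(const std::vector<uint32_t>& values, int width) {
  std::vector<uint8_t> out((values.size() * width + 7) / 8, 0);
  std::size_t pos = 0;
  for (uint32_t v : values) {
    for (int j = width - 1; j >= 0; --j, ++pos) {
      if ((v >> j) & 1u) out[pos / 8] |= static_cast<uint8_t>(1u << (7 - pos % 8));
    }
  }
  return out;
}
```

For values 0 through 7 at width 3, the LSB-first bytes are 0x88 0xC6 0xFA and the MSB-first bytes are 0x05 0x39 0x77, so a decoder that assumes one ordering will misread data written with the other.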
reported this in https://issues.cloudera.org/browse/IMPALA-3000.
Yea, let's investigate this later.
Does this conflict with #49?
Yes, it will conflict in the levels part. I will resolve the conflicts. My plan is to get your feedback meanwhile and keep it ready.
src/parquet/column/levels-test.cc
Outdated
// BIT_PACKED requires a sequence of at least 8
if (encoding == parquet::Encoding::BIT_PACKED) min_repeat_factor = 3;

int num_levels_per_width = ((2 << max_repeat_factor) - (1 << min_repeat_factor));
Can we remove this slightly esoteric variable altogether and the num_levels variable too? Instead, use push_back to append generated levels to input_levels.
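A rough sketch of what that suggestion could look like, assuming the test appends 2^r copies of a level for each repeat factor r. The helper name AppendLevels and the choice of level value are hypothetical; the point is only that the vector's size falls out of push_back rather than being precomputed.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: for each repeat factor r in [min, max], append 2^r
// copies of one level value. The level choice (r modulo max_level + 1) is
// an arbitrary assumption for illustration.
void AppendLevels(int min_repeat_factor, int max_repeat_factor, int16_t max_level,
                  std::vector<int16_t>* input_levels) {
  for (int r = min_repeat_factor; r <= max_repeat_factor; ++r) {
    const int16_t level = static_cast<int16_t>(r % (max_level + 1));
    for (int i = 0; i < (1 << r); ++i) {
      input_levels->push_back(level);
    }
  }
}
```

The total appended is the sum of 2^r over the range, which equals (2 << max_repeat_factor) - (1 << min_repeat_factor), the same count the removed variable computed by hand.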
Let me know when this is rebased on #49 -- up to you whether you wait until that lands in master.
@wesm I am merging my commits to simplify the merge.
bb3b4a0 to 4f82b67
4f82b67 to c26db08
@julienledem this is good to go. Thank you!
src/parquet/column/levels.h
Outdated
#include "parquet/exception.h"
#include "parquet/types.h"
#include "parquet/util/rle-encoding.h"
#include <algorithm>
Missed this before. System includes go before library includes.
+1 (outside the minor include ordering issue)
The include order is fixed.
This PR implements a SetData interface for the LevelDecoder class similar to existing value decoders.
This PR also adds a test for PARQUET-523
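As a rough illustration of the SetData idea, the sketch below shows a decoder that is re-pointed at new data instead of being rebuilt for each page. ToyLevelDecoder, its method signatures, and the one-byte-per-level layout are all assumptions made to keep the example short; the real LevelDecoder reads RLE or bit-packed data and its signature may differ.

```cpp
#include <algorithm>
#include <cstdint>

// Toy decoder: one byte per level. This layout is an assumption for the
// sketch only; it is not how Parquet stores levels.
class ToyLevelDecoder {
 public:
  // Re-point the decoder at a new page's level data and reset the read
  // position, so a single decoder object can be reused across pages.
  void SetData(const uint8_t* data, int num_levels) {
    data_ = data;
    num_levels_ = num_levels;
    pos_ = 0;
  }

  // Decode up to batch_size levels into out; returns how many were decoded.
  int Decode(int batch_size, int16_t* out) {
    const int n = std::max(0, std::min(batch_size, num_levels_ - pos_));
    for (int i = 0; i < n; ++i) {
      out[i] = static_cast<int16_t>(data_[pos_ + i]);
    }
    pos_ += n;
    return n;
  }

 private:
  const uint8_t* data_ = nullptr;
  int num_levels_ = 0;
  int pos_ = 0;
};
```

A caller would invoke SetData once per page and then call Decode in batches until it returns fewer levels than requested.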