Skip to content
This repository was archived by the owner on May 10, 2024. It is now read-only.

PARQUET-515: Add "SetData" to LevelDecoder#51

Closed
majetideepak wants to merge 3 commits intoapache:masterfrom
majetideepak:PARQUET-515
Closed

PARQUET-515: Add "SetData" to LevelDecoder#51
majetideepak wants to merge 3 commits intoapache:masterfrom
majetideepak:PARQUET-515

Conversation

@majetideepak
Copy link

This PR implements a SetData interface for the LevelDecoder class similar to existing value decoders.
This PR also adds a test for PARQUET-523

@majetideepak
Copy link
Author

@wesm feedback please :)

bit_offset_ = 0;
int num_bytes = std::min(8, max_bytes_ - byte_offset_);
memcpy(&buffered_values_, buffer_ + byte_offset_, num_bytes);
}
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wesm This is most likely a bug in the Impala code as well since we pulled this code from there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I agree that it looks like a bug. I will look at reporting on the Impala JIRA. I looked through the RLE encoding code and the reason that this bug was never hit in in real code was that other parts of the RleDecoder code path (in BitReader::GetAligned) trigger the buffered_values_ value to get data copied into it.

Separately, I looked at parquet-format more carefully and it appears that BIT_PACKED uses MSB ordering while the RLE encoding uses LSB bit-packing for its bit-packed literal runs. This means we would be unable to correctly read files generated using the correct MSB-bit-packing.

But since BIT_PACKED is deprecated this all suggests that it never saw much production use. Let's leave this as is for now and we can investigate further.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, let's investigate this later.

@wesm
Copy link
Member

wesm commented Feb 14, 2016

Does this conflict with #49?

@majetideepak
Copy link
Author

Yes, it will conflict in the levels part. I will resolve the conflicts. My plan is to get your feedback meanwhile and keep it ready.

// BIT_PACKED requires a sequence of atleast 8
if (encoding == parquet::Encoding::BIT_PACKED) min_repeat_factor = 3;

int num_levels_per_width = ((2 << max_repeat_factor) - (1 << min_repeat_factor));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove this slightly esoteric variable altogether and the num_levels variable too? Instead, use push_back to append generated levels to input_levels

@majetideepak majetideepak changed the title PARQUET-515: Add "Reset" to LevelDecoder PARQUET-515: Add "SetData" to LevelDecoder Feb 15, 2016
@wesm
Copy link
Member

wesm commented Feb 15, 2016

Let me know when this is rebased on #49 -- up to you whether you wait til that lands in master.

@majetideepak
Copy link
Author

@wesm I am merging my commits to simply the merge

@majetideepak
Copy link
Author

@julienledem this is good to go. Thank you!

#include "parquet/exception.h"
#include "parquet/types.h"
#include "parquet/util/rle-encoding.h"
#include <algorithm>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missed this before. System includes go before library includes

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done!

@wesm
Copy link
Member

wesm commented Feb 16, 2016

+1 (outside the minor include ordering issue)

@majetideepak
Copy link
Author

the include order is fixed

@asfgit asfgit closed this in 6df3836 Feb 16, 2016
@majetideepak majetideepak deleted the PARQUET-515 branch February 22, 2016 20:33
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants