[C++][Parquet] Support RLE Encoder for Boolean type #15107
Comments
Hey, I found a problem when trying to implement RLE for boolean.
So, should I:
Any ideas here? (I can write a version that buffers all input first.)
That would mean using an upper-bound buffer size for boolean and slicing off the unused part after encoding? Sounds like a good idea. Is this what other implementations do?
Maybe we're not able to guess an "upper bound buffer" in the encoder. I'd like to buffer values in The parquet-mr uses I didn't find other implementations; it seems that maybe people prefer PLAIN encoding? In Rust, parquet2 is not hybrid; it seems to implement only bit-packing when encoding. Arrow-rs just uses a
@pitrou @sfc-gh-nthimmegowda would you mind taking a look? Any change is OK with me. Maybe I can use
@mapleFU Take a look at what? Did you post a PR?
@pitrou Sorry, I'd like to post a PR but ran into a problem, and I want to clear it up before I start writing the code. It's mentioned at #15107 (comment). I wonder whether I should:
If I've got something wrong, please tell me.
@mapleFU I'm sorry, but I'm afraid you'll need to do more research yourself and decide on a reasonable solution. That said, buffering all values before encoding doesn't sound very optimal...
…4526)
Create RLE Encoder for Boolean.
### Rationale for this change
Boolean currently can only use the PLAIN encoder; this change adds RLE support.
### What changes are included in this PR?
Create the encoder.
### Are these changes tested?
Yes
### Are there any user-facing changes?
Yes, users can use a new kind of encoding.
* Closes: #15107
Authored-by: mwish <maplewish117@gmail.com>
Signed-off-by: Will Jones <willjones127@gmail.com>
Describe the enhancement requested
According to the Parquet spec, boolean values in a data page can be RLE-encoded, and parquet-mr, parquet-go, and other implementations all support it.
Thanks to #14147, we already have an RLE boolean decoder, so for testing, local tests are sufficient.
I'd like to submit an implementation and tests this weekend.
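For readers unfamiliar with the encoding requested above, here is a minimal sketch of what RLE runs for boolean values (bit width 1) look like in Parquet's RLE/bit-packing hybrid format. It is not the Arrow implementation: `AppendVarint` and `EncodeBooleansAsRleRuns` are hypothetical names, the sketch emits only RLE runs and never bit-packed runs, and it returns an in-memory `std::vector` rather than writing into a pre-sized buffer. As I understand the format, an RLE-encoded boolean data page also carries a 4-byte little-endian length prefix, which is omitted here.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Append an unsigned LEB128 varint, the format used for run headers.
inline void AppendVarint(uint64_t v, std::vector<uint8_t>* out) {
  while (v >= 0x80) {
    out->push_back(static_cast<uint8_t>(v | 0x80));  // low 7 bits + continuation bit
    v >>= 7;
  }
  out->push_back(static_cast<uint8_t>(v));
}

// Hypothetical helper (not an Arrow API): encode booleans using only RLE runs.
// Each run is written as varint(run_length << 1) followed by the repeated
// value, which occupies one byte when the bit width is 1. The zero low bit of
// the header marks the run as RLE rather than bit-packed.
inline std::vector<uint8_t> EncodeBooleansAsRleRuns(const std::vector<bool>& values) {
  std::vector<uint8_t> out;
  std::size_t i = 0;
  while (i < values.size()) {
    const bool v = values[i];
    std::size_t run = 1;
    // Extend the run while the next value matches the current one.
    while (i + run < values.size() && values[i + run] == v) ++run;
    AppendVarint(static_cast<uint64_t>(run) << 1, &out);
    out.push_back(v ? 1 : 0);
    i += run;
  }
  return out;
}
```

With this sketch, five `true` values followed by three `false` values encode to four bytes: `0x0A 0x01 0x06 0x00`. Alternating input costs two bytes per value, which is why a real encoder would also emit bit-packed runs for short runs.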
Component(s)
C++, Parquet