[C++][Parquet] Getting only 0 when reading DELTA_BINARY_PACKED #15052
Comments
Interesting! Are you building from source?
Hi. Anyway, I submitted a fix for the encoder earlier: #14959. I cannot reproduce this bug now. My code is running on master, and this is how I ran it:
index 09af32289..19a96dc40 100644
--- a/cpp/examples/parquet/low_level_api/reader_writer.cc
+++ b/cpp/examples/parquet/low_level_api/reader_writer.cc
@@ -64,6 +64,8 @@ int main(int argc, char** argv) {
   // Add writer properties
   parquet::WriterProperties::Builder builder;
   builder.compression(parquet::Compression::SNAPPY);
+  builder.disable_dictionary();
+  builder.encoding("int32_field", parquet::Encoding::DELTA_BINARY_PACKED);
+  builder.encoding("int64_field", parquet::Encoding::DELTA_BINARY_PACKED);
   std::shared_ptr<parquet::WriterProperties> props = builder.build();
Can you pull the latest code to check whether the problem is still there? If it is still a bug, can you provide minimal code to reproduce it?
I was able to reproduce. I believe the problem is arrow/cpp/src/parquet/encoding.cc Line 2482 in 463cdcc
when requesting a single value at a time,
It seems that my previous attempt failed because I use
… only one value (#15124)
This patch tries to fix #15052. The problem is described here: #15052 (comment)
When reading 1 value, DeltaBitPackDecoder does not call `InitBlock`, causing it to always read `last_value_`.
The problem seems to have been introduced in #10627 and amol-@d982bed.
I will add some tests tonight.
* Closes: #15052
Lead-authored-by: mwish <maplewish117@gmail.com>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Co-authored-by: mwish <1506118561@qq.com>
Co-authored-by: Rok Mihevc <rok@mihevc.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
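For readers following along, here is a minimal sketch of the failure mode described in the commit message. It is a toy model written for this thread, not Arrow's `DeltaBitPackDecoder`: the names `ToyDeltaDecoder`, `ReadAll`, and `init_before_return` are hypothetical, and the bit packing, mini-blocks, and min-delta handling of the real encoding are left out. It only assumes what the commit message states: the first value is emitted from `last_value_`, and a request for exactly one value returns before the block is initialized, so the next request emits `last_value_` again.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Toy model only: one header value plus plain deltas. No bit packing,
// mini-blocks, or min-delta handling, and not Arrow's DeltaBitPackDecoder.
class ToyDeltaDecoder {
 public:
  // init_before_return = false models the bug; true models the fixed behaviour.
  ToyDeltaDecoder(int64_t first_value, std::vector<int64_t> deltas,
                  bool init_before_return)
      : last_value_(first_value),
        deltas_(std::move(deltas)),
        init_before_return_(init_before_return) {}

  // Writes up to max_values decoded values to out; returns the count written.
  int Get(int64_t* out, int max_values) {
    int i = 0;
    while (i < max_values) {
      if (!block_initialized_) {
        out[i++] = last_value_;  // the first value comes from the header
        if (i == max_values && !init_before_return_) {
          break;  // modelled bug: return before the block is initialized
        }
        block_initialized_ = true;  // stands in for InitBlock()
        continue;
      }
      if (next_delta_ == deltas_.size()) break;  // stream exhausted
      last_value_ += deltas_[next_delta_++];
      out[i++] = last_value_;
    }
    return i;
  }

 private:
  int64_t last_value_;
  std::vector<int64_t> deltas_;
  bool init_before_return_;
  bool block_initialized_ = false;
  std::size_t next_delta_ = 0;
};

// Reads `count` values from a fresh decoder in batches of at most `batch`.
std::vector<int64_t> ReadAll(bool init_before_return, int batch, int count) {
  assert(batch > 0);
  // First value 0 with deltas 1, 2, 3 encodes the sequence 0, 1, 3, 6.
  ToyDeltaDecoder decoder(0, {1, 2, 3}, init_before_return);
  std::vector<int64_t> result;
  std::vector<int64_t> buffer(batch);
  while (static_cast<int>(result.size()) < count) {
    int remaining = count - static_cast<int>(result.size());
    int n = decoder.Get(buffer.data(), remaining < batch ? remaining : batch);
    if (n == 0) break;  // nothing left to decode
    result.insert(result.end(), buffer.begin(), buffer.begin() + n);
  }
  return result;
}
```

In this model, `ReadAll(false, 1, 4)` yields 0, 0, 0, 0, while `ReadAll(false, 2, 4)` and `ReadAll(true, 1, 4)` both yield 0, 1, 3, 6. That matches the report: a batch size of 1 keeps returning the first value, and larger batches decode correctly.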
Describe the bug, including details regarding any error messages, version, and platform.
I have encountered a possible bug: when values are read one at a time with the low-level API of the Parquet reader
(as in the example /cpp/examples/parquet/low_level_api/reader_writer.cc)
in a row with DELTA_BINARY_PACKED encoding, the results are all 0 regardless of the file content.
That is, this call
rows_read = int64_reader->ReadBatch(1, &definition_level, &repetition_level, &value, &values_read);
gives a zero value, but this call
rows_read = int64_reader->ReadBatch(2, nullptr, nullptr, values, &values_read);
gives the correct values. So the problem does not seem to occur when reading with larger batch sizes (>1).
The problem might be somewhere within
DeltaBitPackDecoder<DType>::GetInternal
Component(s)
Parquet