-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-17515: [C++][Archery] JSON integration testing with RLE #14179
Conversation
the scalar version visits all types that can exist as a Scalar. Currently this true for all types we have. This will change once we add run-length encoding, which is an array encoding.
- use int32 - calculate physical offset based on buffer size, instead of incorrectly using the physical size
works in archery round trip so far
Status Visit(const RunLengthEncodedArray& array) { | ||
--max_recursion_depth_; | ||
RETURN_NOT_OK(VisitArray(*array.run_ends_array())); | ||
RETURN_NOT_OK(VisitArray(*array.values_array())); | ||
++max_recursion_depth_; | ||
return Status::OK(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How are we handling the logical offset with the IPC?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nvm, i see the usage of logical_run_ends
and logical_values
, but shouldn't this be using those instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, you're right
run_ends = self.run_ends_field.generate_column(size) | ||
if name is None: | ||
name = self.name | ||
return RunLengthEncodedColumn(name, size, run_ends, values) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't the size
here be larger than the size
used in generate_column
since it would be the "logical" length for the parent?
#include "arrow/array/array_base.h" // IWYU pragma: keep | ||
#include "arrow/array/array_binary.h" // IWYU pragma: keep | ||
#include "arrow/array/array_decimal.h" // IWYU pragma: keep | ||
#include "arrow/array/array_dict.h" // IWYU pragma: keep | ||
#include "arrow/array/array_encoded.h" // IWYU pragma: keep |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe array_rle
or array_ree
would be a better name for RunEndEncoded arrays
DictionaryArrays are also a form of encoding so calling this array_encoded
seems a bit vague
…che#34550) * Closes apache#14340 * Closes apache#32773 * Closes apache#14179 * Closes: apache#32338 Lead-authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com> Co-authored-by: Tobias Zagorni <tobias@zagorni.eu> Signed-off-by: Matt Topol <zotthewizard@gmail.com>
No description provided.