ARROW-17515: [C++][Archery] JSON integration testing with RLE #14179

Closed
zagto wants to merge 142 commits

zagto
Contributor

@zagto zagto commented Sep 20, 2022

No description provided.

zagto added 30 commits July 29, 2022 23:07
the scalar version visits all types that can exist as a Scalar.
Currently this is true for all types we have. This will change once we add
run-length encoding, which is an array encoding.
- use int32
- calculate physical offset based on buffer size, instead of incorrectly
  using the physical size

Comment on lines +529 to +535
  Status Visit(const RunLengthEncodedArray& array) {
    --max_recursion_depth_;
    RETURN_NOT_OK(VisitArray(*array.run_ends_array()));
    RETURN_NOT_OK(VisitArray(*array.values_array()));
    ++max_recursion_depth_;
    return Status::OK();
  }
Member

How are we handling the logical offset with the IPC?

Member

nvm, I see the usage of logical_run_ends and logical_values, but shouldn't this be using those instead?

Contributor Author

yes, you're right
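
To make the logical-offset question above concrete, here is a minimal pure-Python sketch of how a sliced run-length encoded array could be re-based so that its run ends and values describe only the logical slice. This is an illustration, not the Arrow C++ implementation: the function names `slice_run_ends` and `decode` are hypothetical, and the exact behavior of `logical_run_ends` and `logical_values` in the PR is assumed rather than confirmed.

```python
from bisect import bisect_right


def decode(run_ends, values):
    """Expand run ends and values into the plain list they represent."""
    out = []
    start = 0
    for end, value in zip(run_ends, values):
        out.extend([value] * (end - start))
        start = end
    return out


def slice_run_ends(run_ends, values, offset, length):
    """Re-base run ends and values to the logical slice [offset, offset + length).

    Hypothetical helper: assumes run_ends is strictly increasing and that
    offset + length does not exceed the last run end.
    """
    if length == 0:
        return [], []
    # Index of the run containing the first logical element.
    first = bisect_right(run_ends, offset)
    # Index of the run containing the last logical element.
    last = bisect_right(run_ends, offset + length - 1)
    # Shift run ends by the offset and clamp the final run to the slice length.
    sliced_ends = [min(end - offset, length) for end in run_ends[first:last + 1]]
    return sliced_ends, values[first:last + 1]
```

Writing only the physical `run_ends_array()` and `values_array()` would drop the offset, whereas the re-based pair decodes to exactly the sliced data, which appears to be the concern raised in this thread.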

        run_ends = self.run_ends_field.generate_column(size)
        if name is None:
            name = self.name
        return RunLengthEncodedColumn(name, size, run_ends, values)
Member

shouldn't the size here be larger than the size used in generate_column since it would be the "logical" length for the parent?
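
The distinction raised here can be sketched in a few lines of Python: the parent column's logical length is the last run end, while the run ends and values children hold one entry per run, which is usually fewer. The generator below is a hypothetical stand-in, not Archery's actual `generate_column`; the name `generate_run_ends` and the run-length range of 1 to 4 are assumptions made for illustration.

```python
import random


def generate_run_ends(logical_length, seed=0):
    """Generate strictly increasing run ends whose last entry is logical_length.

    Hypothetical helper: each run covers between 1 and 4 logical elements,
    and the final run is clamped so it ends exactly at logical_length.
    """
    rng = random.Random(seed)
    run_ends = []
    position = 0
    while position < logical_length:
        position += rng.randint(1, 4)
        run_ends.append(min(position, logical_length))
    return run_ends


run_ends = generate_run_ends(10)
physical_length = len(run_ends)                # size of the run ends and values children
logical_length = run_ends[-1] if run_ends else 0  # size reported by the parent column
```

With this shape, the children are generated with the physical length and the parent reports the logical length, so the two sizes only coincide when every run has length one.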

#include "arrow/array/array_base.h" // IWYU pragma: keep
#include "arrow/array/array_binary.h" // IWYU pragma: keep
#include "arrow/array/array_decimal.h" // IWYU pragma: keep
#include "arrow/array/array_dict.h" // IWYU pragma: keep
#include "arrow/array/array_encoded.h" // IWYU pragma: keep
Contributor

Maybe array_rle or array_ree would be a better name for RunEndEncoded arrays.

DictionaryArrays are also a form of encoding, so calling this array_encoded seems a bit vague.

zeroshade pushed a commit that referenced this pull request Mar 15, 2023
 * Closes #14340
 * Closes #32773
 * Closes #14179
 
* Closes: #32338

Lead-authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
Co-authored-by: Tobias Zagorni <tobias@zagorni.eu>
Signed-off-by: Matt Topol <zotthewizard@gmail.com>
rtpsw pushed a commit to rtpsw/arrow that referenced this pull request Mar 27, 2023
…che#34550)

 * Closes apache#14340
 * Closes apache#32773
 * Closes apache#14179
 
* Closes: apache#32338

Lead-authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
Co-authored-by: Tobias Zagorni <tobias@zagorni.eu>
Signed-off-by: Matt Topol <zotthewizard@gmail.com>