Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(data): map type ID to child index before indexing a union child array #4598

Merged
merged 2 commits into from
Jul 31, 2023

Conversation

kawadakk
Copy link
Contributor

@kawadakk kawadakk commented Jul 31, 2023

Which issue does this PR close?

Closes #4578.

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the arrow Changes to the arrow crate label Jul 31, 2023
@kawadakk kawadakk changed the title fix(data): map type ID to child index before indexing union child array fix(data): map type ID to child index before indexing a union child array Jul 31, 2023
@kawadakk kawadakk force-pushed the patches/fix-extend-union-dense branch 3 times, most recently from abb5612 to 00edcba Compare July 31, 2023 05:02
@kawadakk kawadakk marked this pull request as ready for review July 31, 2023 05:02
Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, looks good to me. Just minor suggestion that you can feel free to ignore

@@ -48,14 +51,18 @@ pub(super) fn build_extend_dense(array: &ArrayData) -> Extend {
.extend_from_slice(&type_ids[start..start + len]);

(start..start + len).for_each(|i| {
let type_id = type_ids[i] as usize;
let type_id = type_ids[i];
let child_index = src_fields
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we could follow the approach used by UnionArray and compute this mapping once

https://github.com/apache/arrow-rs/blob/master/arrow-array/src/array/union_array.rs#L345

Copy link
Contributor Author

@kawadakk kawadakk Jul 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I implemented that optimization in an earlier version, but then realized that it might be counterproductive and excessively memory-intensive if the number of rows per input array is very low, and the number of input arrays is large (as in my use case). equal_dense is implemented in the naïve way as well.

@kawadakk kawadakk force-pushed the patches/fix-extend-union-dense branch from 00edcba to ab724b1 Compare July 31, 2023 09:11
@kawadakk
Copy link
Contributor Author

kawadakk commented Jul 31, 2023

Force-pushed a minor change to sate clippy

@tustvold tustvold merged commit c663d88 into apache:master Jul 31, 2023
25 checks passed
@kawadakk kawadakk deleted the patches/fix-extend-union-dense branch August 1, 2023 09:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

arrow::compute::concat panics for dense union arrays with non-trivial type IDs
2 participants