Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Dense/Sparse annotation to DataType::Union #814

Closed
alamb opened this issue Oct 6, 2021 · 4 comments · Fixed by #885
Closed

Add Dense/Sparse annotation to DataType::Union #814

alamb opened this issue Oct 6, 2021 · 4 comments · Fixed by #885
Labels
api-change Changes to the arrow API arrow Changes to the arrow crate enhancement Any new improvement worthy of a entry in the changelog

Comments

@alamb
Copy link
Contributor

alamb commented Oct 6, 2021

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
While working on validation for ArrayData (#810), I was trying to validate that the number of Buffers passed to a UnionArray were accurate.

However, since number of Buffers for a UnionArray differs based on if it has a "sparse" or "dense" layout (1 or 2, respectively), the sparseness for the UnionArray is not encoded in the typeDataType::UnionArray; This means we can't easily validate that the UnionArray has the correct number of buffers for its type

It also means that the UnionArray can not be used without first checking to see how many buffers it has (what type it is)

Describe the solution you'd like
I propose changing

enum DataType {
..
  Union(Vec<Field>),
..
}

To something like

enum UnionMode {
  Sparse,
  Dense
}

enum DataType {
..
  Union(UnionMode, Vec<Field>),
..
}

which is both consistent with the C++ implementation as well as allows UnionArray to be statically typechecked: https://github.com/apache/arrow/blob/661c7d749150905a63dd3b52e0a04dac39030d95/cpp/src/arrow/type.h#L1028

Describe alternatives you've considered
Can keep the sparse/dense

Additional context
Add any other context or screenshots about the feature request here.

@alamb alamb added enhancement Any new improvement worthy of a entry in the changelog api-change Changes to the arrow API arrow Changes to the arrow crate labels Oct 6, 2021
@alamb
Copy link
Contributor Author

alamb commented Oct 6, 2021

@nevi-me and @paddyhoran since it looks like you involved in the initial implementation of UnionArray I wonder if you have thoughts.

the arrow2 implementation appears to use a bool to encode sparseness: https://github.com/jorgecarleitao/arrow2/blob/55c3f9cf31f8fd0bc9c49f48d6265f07e2367ec8/src/datatypes/mod.rs#L95-L96

cc @jorgecarleitao

It may be the case that we simply "wait for arrow2" and not do this ticket

@alamb
Copy link
Contributor Author

alamb commented Oct 24, 2021

Maybe related to #85

@paddyhoran
Copy link
Contributor

Sorry for taking so long. I modeled the original impl on the C++ implementation, it's possible that the C++ impl added this after I wrote the initial implementation. I agree with the change.

It may be the case that we simply "wait for arrow2" and not do this ticket

I've been out of the loop, is there a plan for adopting arrow2?

@alamb
Copy link
Contributor Author

alamb commented Oct 26, 2021

I've been out of the loop, is there a plan for adopting arrow2?

I would say there are no plans -- lots of hope and ideas though :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-change Changes to the arrow API arrow Changes to the arrow crate enhancement Any new improvement worthy of a entry in the changelog
Projects
None yet
2 participants