ARROW-15244: [Format] Clarify that offsets are monotonic for binary like arrays #12019
I also started a mailing list thread on this topic: https://lists.apache.org/thread/fx8k250nn1d9b86sfo9t2gcl1v11mn4f
Co-authored-by: Jorge Leitao <jorgecarleitao@gmail.com>
Co-authored-by: Matthijs Brobbel <m1brobbel@gmail.com>
Benchmark runs are scheduled for baseline = 31a07be and contender = e7dc8f5. e7dc8f5 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Rationale
The question "what are the values of the offsets for non-valid entries in arrays?" came up in arrow-rs (apache/arrow-rs#1071), and the existing format docs seem somewhat vague on this point.
I looked at three implementations of Arrow, and they all seem to either assume or validate that the offsets are monotonic:
https://github.com/jorgecarleitao/arrow2/blob/37a9c758826a92d98dc91e992b2a49ce9724095d/src/array/specification.rs#L102-L119
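To make the property concrete, here is a minimal Python sketch of the kind of check such implementations perform. It is not taken from any of them: the function name `offsets_are_monotonic` and the exact set of conditions (at least one offset, a non-negative first offset, a last offset within the values buffer) are assumptions chosen for illustration.

```python
from typing import Sequence


def offsets_are_monotonic(offsets: Sequence[int], values_len: int) -> bool:
    """Hypothetical check for the offsets buffer of a binary-like array.

    Assumes an array of n slots carries n + 1 offsets, so at least one
    offset is always present.
    """
    if len(offsets) == 0 or offsets[0] < 0:
        return False
    # Every offset must be >= the previous one, including for null slots.
    for prev, cur in zip(offsets, offsets[1:]):
        if cur < prev:
            return False
    # With monotonic offsets, bounding the last one bounds all of them.
    return offsets[-1] <= values_len


# ["ab", null, "cde"] stored as values b"abcde": the null slot is the
# zero-length range [2, 2), so the offsets stay monotonic.
print(offsets_are_monotonic([0, 2, 2, 5], 5))
# A null slot that points backwards breaks the property.
print(offsets_are_monotonic([0, 2, 1, 5], 5))
```

Note that the check is non-strict (`>=`), so empty strings and zero-length null slots pass.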
Changes
Thus I propose updating the format docs to make the monotonic offsets explicit.
Background
I think @jorgecarleitao's description in apache/arrow-rs#1071 (comment) explains why having monotonic offsets is a good idea.
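As an illustration of one practical consequence (a sketch, not a restatement of the linked comment): when offsets are monotonic, the bytes belonging to any contiguous run of slots form a single contiguous range of the values buffer, so a slice needs only two offset lookups. The helper name `slice_values` is hypothetical.

```python
from typing import Sequence


def slice_values(values: bytes, offsets: Sequence[int], start: int, length: int) -> bytes:
    """Return the bytes covered by slots [start, start + length).

    Relies on the offsets being monotonic: the first and last offsets of the
    run then bound every slot in between, so no per-slot scan is needed.
    """
    begin = offsets[start]
    end = offsets[start + length]
    return values[begin:end]


values = b"abcde"          # ["ab", null, "cde"]
offsets = [0, 2, 2, 5]

# Slots 1 and 2 (the null and "cde") cover the single range [2, 5).
print(slice_values(values, offsets, 1, 2))
```

If a null slot were allowed to carry an arbitrary offset, `offsets[start]` could not be trusted as the lower bound of the run, and each slot would have to be inspected individually.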