ARROW-15837: [C++][Python] Clarify documentation for ListArray::offsets() #12557

Closed


pitrou (Member) commented Mar 3, 2022

No description provided.

pitrou requested a review from westonpace March 3, 2022 17:41
github-actions bot commented Mar 3, 2022

lhoestq commented Mar 3, 2022

If there is a way to reconstruct a ListArray from the offsets and values, maybe it would be worth mentioning it as well?

EDIT: from the discussion on JIRA, it doesn't seem possible yet, so the doc looks good to me. Thanks!

pitrou (Member, Author) commented Mar 3, 2022

There seem to be conda-related issues; I'm going to force-push.

pitrou force-pushed the ARROW-15837-clarify-list-offsets branch from db64ce5 to 97f6f91 March 3, 2022 19:04
pitrou force-pushed the ARROW-15837-clarify-list-offsets branch from 97f6f91 to 7ce7280 March 7, 2022 16:19
westonpace (Member) left a comment


I don't think this wording really hurts anything, and the example is always helpful, but it didn't make the issue any clearer for me.

A list array has three pieces: values, offsets, and validity. It isn't clear why the offsets would be expected to contain the validity. I would think someone is just as likely to assume that the values contain the validity.

Once we fix ARROW-15839 by adding a mask, I think the original issue would be clearer.

jorisvandenbossche (Member) commented

> A list array has three pieces: values, offsets, and validity. It isn't clear why the offsets would be expected to contain the validity. I would think someone is just as likely to assume that the values contain the validity.

While "a list array has three pieces: values, offsets, and validity" is of course correct, I think many people will think about (or explain) a list array as consisting of two pieces: values and offsets. Those are also the two "child" arrays that ListArray exposes as properties, and the two arrays from which ListArray.from_arrays creates a new ListArray. So I don't think the confusion reported in ARROW-15837 is that uncommon, and the clarification here seems helpful IMO.

Since the values array doesn't have a 1:1 relationship with the list slots (and can have nulls of its own, independent from nulls at the list level), I would find it less natural to expect the values to contain the list validity.

ARROW-15839 will indeed help, but there is also the question of whether we want to make it easier to get the "mask" / validity bitmap of an existing ListArray (although that's not specific to ListArray).

pitrou closed this in c70426f Mar 8, 2022
pitrou deleted the ARROW-15837-clarify-list-offsets branch March 8, 2022 11:04
ursabot commented Mar 8, 2022

Benchmark runs are scheduled for baseline = a6e51a0 and contender = c70426f. c70426f is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Finished ⬇️0.29% ⬆️0.0%] test-mac-arm
[Finished ⬇️0.0% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.26% ⬆️0.0%] ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

5 participants