Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-36053: [C++] summarizing a variable results in NA at random, while there is no NA in the subset of data #36368

Merged

Conversation

westonpace
Copy link
Member

@westonpace westonpace commented Jun 28, 2023

Rationale for this change

When merging two aggregate states we were failing to use the correct no_nulls field. This field tells us whether we should return null if skip_nulls=False (if no_nulls is false then we return null).

Since we were reading the wrong field we would sometimes emit null even when a column didn't actually have any nulls.

What changes are included in this PR?

Fixed the bug.

Are these changes tested?

Yes, I added a new unit test that reproduced this failure quite reliably.

Are there any user-facing changes?

No.

@westonpace westonpace added the Critical Fix Bugfixes for security vulnerabilities, crashes, or invalid data. label Jun 28, 2023
@github-actions
Copy link

⚠️ GitHub issue #36053 has been automatically assigned in GitHub to PR creator.

Copy link
Member

@bkietz bkietz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting committer review Awaiting committer review labels Jun 29, 2023
@thisisnic thisisnic added this to the 13.0.0 milestone Jun 29, 2023
@paleolimbot
Copy link
Member

Just a note that I ran the original R reprex against Arrow C++ built from this branch to confirm that it fixes the issue (it does! Thank you!).

@bkietz bkietz merged commit 0cea12f into apache:main Jun 29, 2023
29 of 34 checks passed
@bkietz bkietz removed the awaiting merge Awaiting merge label Jun 29, 2023
@conbench-apache-arrow
Copy link

Conbench analyzed the 6 benchmark runs on commit 0cea12ff.

There were 6 benchmark results indicating a performance regression:

The full Conbench report has more details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: C++ Critical Fix Bugfixes for security vulnerabilities, crashes, or invalid data.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[C++][R] summarizing a variable results in NA at random, while there is no NA in the subset of data.
4 participants