Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-15441: [C++][Compute] Fix incorrect result of hash_count a null type column #12251

Closed
wants to merge 2 commits into from

Conversation

Crystrix
Copy link
Contributor

The result of hash_count such array is incorrect.
For a table like this:

argument key
NULL 1
NULL 1

The result is :

CountOptions Expected Actual
ONLY_VALID 0 2
ONLY_NULL 2 0

This PR handles null type with different count options.

@github-actions
Copy link

@Crystrix Crystrix changed the title ARROW-15441: [C++] Fix incorrect result of hash_count a null type column ARROW-15441: [C++][Compute] Fix incorrect result of hash_count a null type column Jan 25, 2022
Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, NullType has no buffers but most things assume no validity buffer == all valid. Thanks for catching this.

}
}

TEST(GroupBy, CountWithNullTypeEmptyTable) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was there an issue with empty tables specifically that this is checking?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it's just to make sure this corner case can be correctly processed.

@lidavidm lidavidm closed this in 0f820ef Jan 26, 2022
@lidavidm
Copy link
Member

Thanks for the fix!

@ursabot
Copy link

ursabot commented Jan 26, 2022

Benchmark runs are scheduled for baseline = c4eb8dc and contender = 0f820ef. 0f820ef is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed ⬇️2.03% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.04% ⬆️0.17%] ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants