ARROW-15441: [C++][Compute] Fix incorrect result of hash_count a null type column #12251

Crystrix · 2022-01-25T08:14:57Z

hash_count returns an incorrect result when the argument is a null type column.
For a table like this:

argument	key
NULL	1
NULL	1

The result is:

CountOptions	Expected	Actual
ONLY_VALID	0	2
ONLY_NULL	2	0

This PR makes hash_count handle a null type argument correctly for each count option.
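
The "Expected" column above can be reproduced with a small standalone model. This is only a sketch of the intended semantics, not the Arrow kernel: `CountMode`, `GroupedCount`, and the use of `std::optional<int>` to stand in for a nullable argument are hypothetical names chosen for illustration. The `ALL` mode is included because CountOptions offers it as well.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <vector>

// Hypothetical stand-in for the count modes discussed in this PR.
enum class CountMode { ONLY_VALID, ONLY_NULL, ALL };

// Counts rows per key. A std::nullopt argument models a null value,
// so a null type column is simply a column where every entry is nullopt.
std::map<int, int64_t> GroupedCount(const std::vector<std::optional<int>>& args,
                                    const std::vector<int>& keys,
                                    CountMode mode) {
  std::map<int, int64_t> counts;
  for (std::size_t i = 0; i < args.size(); ++i) {
    const bool is_null = !args[i].has_value();
    const bool counted = mode == CountMode::ALL ||
                         (mode == CountMode::ONLY_NULL && is_null) ||
                         (mode == CountMode::ONLY_VALID && !is_null);
    // operator[] creates the group with count 0 if it does not exist yet.
    counts[keys[i]] += counted ? 1 : 0;
  }
  return counts;
}
```

With two null arguments that share key 1, this model yields 0 for ONLY_VALID and 2 for ONLY_NULL, matching the expected values in the table.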

github-actions · 2022-01-25T08:15:20Z

https://issues.apache.org/jira/browse/ARROW-15441

lidavidm

Ah, NullType has no buffers but most things assume no validity buffer == all valid. Thanks for catching this.
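
To illustrate the assumption described above, here is a minimal sketch. It is not Arrow's actual ArrayData layout or the code changed in this PR: `ColumnSketch`, `NaiveNullCount`, and `NullTypeAwareNullCount` are hypothetical names, and the validity bitmap is modeled as an optional `std::vector<bool>`.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical, simplified column: an optional validity bitmap plus a flag
// marking the null type. This is not Arrow's ArrayData layout.
struct ColumnSketch {
  int64_t length;
  bool is_null_type;
  std::optional<std::vector<bool>> validity;  // absent means "no buffer"
};

// The common shortcut: a missing validity buffer is read as "all valid".
int64_t NaiveNullCount(const ColumnSketch& col) {
  if (!col.validity) return 0;
  int64_t nulls = 0;
  for (bool valid : *col.validity) nulls += valid ? 0 : 1;
  return nulls;
}

// Checking for the null type first: every slot is null by definition,
// even though there is no validity buffer to say so.
int64_t NullTypeAwareNullCount(const ColumnSketch& col) {
  if (col.is_null_type) return col.length;
  return NaiveNullCount(col);
}
```

For a null type column of length 2 with no validity buffer, the naive version reports 0 nulls while the null-type-aware version reports 2, which is the same swap seen in the Actual and Expected columns of the PR description.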

lidavidm · 2022-01-25T13:34:23Z

cpp/src/arrow/compute/kernels/hash_aggregate_test.cc

+  }
+}
+
+TEST(GroupBy, CountWithNullTypeEmptyTable) {


Was there an issue with empty tables specifically that this is checking?

No, it's just there to make sure this corner case is handled correctly.

lidavidm · 2022-01-26T19:18:47Z

Thanks for the fix!

ursabot · 2022-01-26T19:21:36Z

Benchmark runs are scheduled for baseline = c4eb8dc and contender = 0f820ef. 0f820ef is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed ⬇️2.03% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.04% ⬆️0.17%] ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Fix incorrect result of hash_count a null type column

6c97863

github-actions bot added the Component: C++ label Jan 25, 2022

Crystrix changed the title ~~ARROW-15441: [C++] Fix incorrect result of hash_count a null type column~~ ARROW-15441: [C++][Compute] Fix incorrect result of hash_count a null type column Jan 25, 2022

lidavidm approved these changes Jan 25, 2022

Remove an unnecessary variable

d499b45

lidavidm closed this in 0f820ef Jan 26, 2022
