-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-9873: [C++][Compute] Optimize mode kernel for integers in small value range #8091
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think polymorphism is a good idea for performance. Instead, you should probably use templated code.
Also, when null_count == length, the implementation can be trivial.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to template. Did get better performance. Thanks.
|
Latest benchmark result after re-implementation. |
|
"RTools 35" CI failure is not related |
|
I get similar speedups on AMD Ryzen. |
… value range For int16/32/64 arrays with reasonable length, scan the array to find min/max values first. If (max-min) is within some threshold, instead of general hashmap, using a value indexed array can improve performance significantly. To be compatible with chunked array, value count array is transferred to hashmap before merging with others. This is an overhead for short array. Finding min/max may also introduce performance penalty in some cases. Please note it's hard to benefit all use cases. By applying this patch: - about 2x performance uplift for integers in small value range - no obvious performance drop for normal cases - non-trivial performance drop in some cases * 40% drop for short int8 array (8k length) * 10% drop for sparse array (few distinct values, big value gap)
pitrou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, thank you @cyb70289
For int16/32/64 arrays with reasonable length, scan the array to find
min/max values first. If (max-min) is within some threshold, instead
of general hashmap, using a value indexed array can improve performance
significantly.
To be compatible with chunked array, value count array is transferred to
hashmap before merging with others. This is an overhead for short array.
Finding min/max may also introduce performance penalty in some cases.
Please note it's hard to benefit all use cases. By applying this patch: