perf: Optimize array_min, array_max for arrays of primitive types #21101
neilconway wants to merge 5 commits into apache:main
We could add a similar fastpath for arrays of strings, although maybe not worth it because
On an M4 Max, it looks like the crossover point between direct iteration and using the Arrow kernel is 32-40 list elements, so I lowered the iteration -> kernel switchover threshold to 32.
These are great numbers, @neilconway! Could we perhaps also remove the if conditions and see if that helps? Example:
@coderfender Thanks for the feedback! I quickly checked 1 and 3 and they don't yield any improvement; I'd suspect the compiler will hoist loop-invariant branches like this out of the loop. The threshold check should be similar: it should be branch-predicted effectively. Lmk if you disagree!
Which issue does this PR close?
`array_min`, `array_max` for primitive types #21100.
Rationale for this change
In the current implementation, we construct a `PrimitiveArray` for each row, feed it to the Arrow `min`/`max` kernel, and then collect the resulting `ScalarValue`s in a `Vec`. We then construct a final `PrimitiveArray` for the result via `ScalarValue::iter_to_array` of the `Vec`.
We can do better for `ListArray`s of primitive types. First, we can iterate directly over the flat values buffer of the `ListArray` for the batch and compute the min/max from each row's slice directly. Second, Arrow's `min`/`max` kernels have a reasonable amount of per-call overhead; for small arrays, it is more efficient to compute the min/max ourselves via direct iteration.
Benchmarks (8192 rows, arrays of int64 values, M4 Max):
What changes are included in this PR?
`array_max`
Are these changes tested?
Yes.
Are there any user-facing changes?
No.
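For readers who want a concrete picture of the approach described in the rationale, here is a minimal, standard-library-only sketch. It is not the code from this PR and does not use the Arrow API. A list column is modeled as plain slices (`offsets`, `values`, `validity`), offsets are `usize` rather than Arrow's integer offset types, element-level nulls are ignored, and `min_bulk` is a hypothetical stand-in for the kernel path. The names `list_min`, `min_direct`, `min_bulk`, and `DIRECT_ITERATION_THRESHOLD` are all illustrative assumptions; only the value 32 comes from the discussion above.

```rust
/// Rows shorter than this use the scalar loop. The value 32 mirrors the
/// threshold mentioned in the discussion; the constant name is hypothetical.
const DIRECT_ITERATION_THRESHOLD: usize = 32;

/// Minimum of a slice via a plain scalar loop (the "direct iteration" path).
fn min_direct(values: &[i64]) -> Option<i64> {
    let mut iter = values.iter().copied();
    let first = iter.next()?; // empty row -> None
    Some(iter.fold(first, |acc, v| if v < acc { v } else { acc }))
}

/// Stand-in for a bulk kernel (NOT Arrow's implementation): reduce in
/// 8 independent lanes, then combine the lanes and the leftover tail.
fn min_bulk(values: &[i64]) -> Option<i64> {
    if values.is_empty() {
        return None;
    }
    let mut lanes = [i64::MAX; 8];
    let chunks = values.chunks_exact(8);
    let tail = chunks.remainder();
    for chunk in chunks {
        for (lane, v) in lanes.iter_mut().zip(chunk) {
            *lane = (*lane).min(*v);
        }
    }
    let mut best = lanes.iter().copied().min()?;
    for &v in tail {
        best = best.min(v);
    }
    Some(best)
}

/// Per-row minimum over a list column modeled as offsets into one flat
/// values buffer. Row `i` covers `values[offsets[i]..offsets[i + 1]]`.
/// Null rows and empty rows both produce `None` in this sketch.
fn list_min(offsets: &[usize], values: &[i64], validity: &[bool]) -> Vec<Option<i64>> {
    offsets
        .windows(2)
        .zip(validity)
        .map(|(w, &valid)| {
            if !valid {
                return None;
            }
            // Slice the shared buffer directly; no per-row array is built.
            let row = &values[w[0]..w[1]];
            if row.len() < DIRECT_ITERATION_THRESHOLD {
                min_direct(row)
            } else {
                min_bulk(row)
            }
        })
        .collect()
}
```

With offsets `[0, 3, 3, 6]` over values `[5, 3, 9, 7, -2, 4]` and all rows valid, `list_min` returns `[Some(3), None, Some(-2)]`. The point of the sketch is that each row is just a slice of one shared buffer, so the per-row cost is a bounds computation plus the reduction itself.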