I noticed this behavior while working on support for DictionaryArrays and wanted to capture it in a ticket in case someone has time to work on it.
To evaluate an equality predicate against a constant, such as d1 = 'three', DataFusion effectively creates an array with the same value 'three' repeated for every row and then uses the array equality compute kernel. This is ... suboptimal.
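To make the difference concrete, here is a minimal sketch in plain Rust that models the two strategies on string slices. It does not use the Arrow or DataFusion APIs; the function names `eq_via_expanded_array` and `eq_scalar` are hypothetical and chosen only for this illustration.

```rust
/// Models the behavior described above: expand the constant into a
/// full-length array, then run an array-vs-array equality comparison.
/// (Hypothetical helper, not an Arrow or DataFusion function.)
fn eq_via_expanded_array(column: &[&str], constant: &str) -> Vec<bool> {
    // Allocates one copy of the constant per row before comparing anything.
    let expanded: Vec<&str> = vec![constant; column.len()];
    column
        .iter()
        .zip(expanded.iter())
        .map(|(value, repeated)| value == repeated)
        .collect()
}

/// Compares every value against the constant directly, with no
/// intermediate array. (Hypothetical helper as well.)
fn eq_scalar(column: &[&str], constant: &str) -> Vec<bool> {
    column.iter().map(|value| *value == constant).collect()
}
```

Both functions return the same boolean mask for a column such as `["one", "three", "two", "three"]` and the constant `"three"`; the second one simply skips building the temporary array.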
Yordan Pavlov / @yordan-pavlov:
I have an initial implementation of direct comparison operations against scalar values in DataFusion. For the simple query used in the benchmark ("select f32, f64 from t where f32 >= 250 and f64 > 250"), it shows an approximately 10x performance improvement:
before:
filter_scalar time: [35.733 ms 36.613 ms 37.924 ms]
after:
filter_scalar time: [3.5938 ms 3.6450 ms 3.7035 ms]
change: [-90.048% -89.846% -89.625%] (p = 0.00 < 0.05)
I have also added a benchmark to measure the change in performance when comparing two arrays (using the query "select f32, f64 from t where f32 >= f64"); the change is negligible:
before:
filter_array time: [11.601 ms 11.656 ms 11.718 ms]
after:
filter_array time: [11.854 ms 11.957 ms 12.070 ms]
change: [+1.8032% +3.6391% +5.5671%] (p = 0.00 < 0.05)
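The two benchmarks exercise two different right-hand sides: a scalar (f32 >= 250) and another column (f32 >= f64). One way to picture this, as a sketch only and not the actual implementation, is a comparison that dispatches on the kind of right-hand operand. The `Rhs` enum and `gt_eq` function below are hypothetical names in plain Rust, and both sides are assumed to be f32 for simplicity.

```rust
/// Right-hand side of a comparison: either a single constant or a
/// full column. (Hypothetical type used only for this illustration.)
enum Rhs<'a> {
    Scalar(f32),
    Array(&'a [f32]),
}

/// Evaluates `left >= right` row by row and returns a boolean mask.
fn gt_eq(left: &[f32], right: Rhs<'_>) -> Vec<bool> {
    match right {
        // Scalar path: compare each row against the constant directly,
        // without expanding the constant into an array first.
        Rhs::Scalar(constant) => left.iter().map(|l| *l >= constant).collect(),
        // Array path: unchanged element-wise comparison of two columns.
        Rhs::Array(values) => {
            assert_eq!(left.len(), values.len(), "columns must have equal length");
            left.iter().zip(values).map(|(l, r)| l >= r).collect()
        }
    }
}
```

With this shape, only the scalar case takes a new code path, which is consistent with the array benchmark staying roughly where it was.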
Reporter: Andrew Lamb / @alamb
Assignee: Yordan Pavlov / @yordan-pavlov
Note: This issue was originally created as ARROW-10173. Please see the migration documentation for further details.