After looking at the ExecPlan output of some queries, it jumped out at me how we translate int_field == 5 in R as cast(int_field, float64) == 5 because 5 is a double in R.
This extra work has a noticeable performance impact. Here's a simple query on the taxi dataset, filtering down to 54 out of 1.5 billion rows and selecting a single column. The idea was to write a query that does little other than evaluate the filter.
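For reference, the literal typing behind this can be checked in base R alone. This is only a small sketch (the variable names `lit_double` and `lit_int` are made up for illustration); it shows that a bare `5` is a double while `5L` is an integer. Writing the filter with `5L` would presumably keep the comparison in integer types, though that is an assumption and not something verified against the ExecPlan here.

```r
# A bare numeric literal in R is a double; the L suffix makes it an integer.
lit_double <- 5
lit_int <- 5L

typeof(lit_double)  # "double"
typeof(lit_int)     # "integer"

# The two values compare equal, but they are not the same type.
lit_double == lit_int           # TRUE
identical(lit_double, lit_int)  # FALSE
```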
You can see the difference in the query plans too:
Ideally Acero would do this more intelligently (cf. ARROW-11402), but we should also be able to do smarter things when assembling the Expression in R.
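One way to picture "smarter things" on the R side is to narrow the literal instead of widening the field. The sketch below is base R only and purely illustrative: `narrow_literal()` and its `field_is_integer` argument are hypothetical names, not part of the arrow package, and the check assumes a 32-bit integer range.

```r
# Hypothetical helper, not the arrow package's implementation: if the field
# is an integer type and the double literal is a whole number that fits in
# R's integer range, return it as an integer so the field needs no cast.
narrow_literal <- function(value, field_is_integer) {
  fits <- is.double(value) && length(value) == 1L && !is.na(value) &&
    value == trunc(value) && abs(value) <= .Machine$integer.max
  if (field_is_integer && fits) as.integer(value) else value
}

typeof(narrow_literal(5, field_is_integer = TRUE))    # "integer"
typeof(narrow_literal(5.5, field_is_integer = TRUE))  # "double"
typeof(narrow_literal(5, field_is_integer = FALSE))   # "double"
```

A literal like `5.5` is left as a double, so in that case the field would still have to be cast for the comparison to be correct.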
Reporter: Neal Richardson / @nealrichardson
Assignee: Neal Richardson / @nealrichardson
Related issues:
mutate(x2=ifelse(x=='',NA,x)) Error: Function 'if_else' has no kernel matching input types (fixes)
PRs and other links:
Note: This issue was originally created as ARROW-17462. Please see the migration documentation for further details.