There are currently a few places where we call R code from C++. After ARROW-16444 and ARROW-16703 there will be more, including some where the overhead of calling into R might be greater than the time it takes to actually evaluate the function, or where the function will be called in a tight loop.
The current approach uses cpp11::function. This is totally fine and safe, but it generates some ugly backtraces on error and is potentially slower than the lean-and-mean approach of purrr (whose entire job is to call R functions in a loop and which has been heavily optimized). The purrr approach is to construct the call() and calling environment in advance and then just run Rf_eval(call, env) in the loop. This is both faster (fewer R API calls) and generates better backtraces (e.g., Error in fun(arg1, arg2) instead of Error in (function(a, b) { ...the whole content of the function ... })(every, deparsed, argument)).
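A minimal sketch of the prebuilt-call idea, not taken from purrr or from the arrow package: the entry point call_in_loop and the binding names .f and .x are hypothetical, and the code assumes it is compiled against R's headers (for example inside an R package) with the evaluation environment supplied by the caller.

```cpp
// Sketch only: requires R's headers, so it builds inside an R package
// rather than as a standalone program.
#include <Rinternals.h>

// Hypothetical .Call entry point: evaluates fun(x) n times in `env`
// and returns the result of the last evaluation.
extern "C" SEXP call_in_loop(SEXP fun, SEXP x, SEXP n_sexp, SEXP env) {
  int n = Rf_asInteger(n_sexp);

  // Bind the function and its argument to short names in the calling
  // environment once, outside the loop. The names are illustrative.
  SEXP f_sym = Rf_install(".f");
  SEXP x_sym = Rf_install(".x");
  Rf_defineVar(f_sym, fun, env);
  Rf_defineVar(x_sym, x, env);

  // Build the call `.f(.x)` once. It refers to the symbols only, so the
  // function body and the argument are never deparsed into an error message.
  SEXP call = PROTECT(Rf_lang2(f_sym, x_sym));

  SEXP result = R_NilValue;
  for (int i = 0; i < n; i++) {
    // One R API call per iteration.
    result = Rf_eval(call, env);
  }

  UNPROTECT(1);
  return result;
}
```

Once registered, this could be invoked from R as something like .Call("call_in_loop", fun, x, 1000L, new.env()), and an error raised by fun should then be reported against the short call .f(.x). Note that Rf_eval() performs a longjmp on error, so in C++ code that owns objects with destructors the evaluation would likely need to be wrapped, for example with cpp11::unwind_protect().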
Before optimizing that heavily we should of course benchmark to see exactly how much that matters!
Reporter: Dewey Dunnington / @paleolimbot
Note: This issue was originally created as ARROW-17148. Please see the migration documentation for further details.