[C++] Add support for registering tricky functions with the Substrait consumer (or add a bunch of substrait meta functions) #31046
Comments
Weston Pace / @westonpace: At a minimum one would think this mapping object would be a simple bidirectional string:string map which goes from Arrow function name to Substrait function name and back. Unfortunately, as the ticket describes, I do not think this is possible today. The worst case scenario is that we require two functions for every entry in the mapping: one that goes from a Substrait "call" to an Arrow "call" and one for the reverse. I think, as a first attempt, we should tackle this with a very manual mapping, probably with some kind of convenience option for the functions that are simple aliases, and then we can look at how we improve from there. A Substrait "call" is a name (string), a vector of arguments (expressions), and a vector of options (literal expressions). An Arrow "call" is a name (string), a vector of arguments (expressions), and an options object (POCO). So my suggestion for the mapping would be something like...
The add function is an interesting example (some pseudo-code / imaginary helper functions for brevity):
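As a rough illustration of the two-converters-per-entry idea, here is a minimal sketch. Every name in it (`Expr`, `SubstraitCall`, `ArrowCall`, `FunctionMapping`, `MakeExampleMapping`) is hypothetical and not part of Arrow or Substrait. It also assumes that the Substrait `add` call carries an overflow option whose literal `ERROR` selects `add_checked`, and it uses `lt` and `less` only as an illustrative alias pair.

```cpp
#include <algorithm>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// All types below are hypothetical stand-ins, not real Arrow or Substrait APIs.
struct Expr { std::string repr; };

struct SubstraitCall {
  std::string name;
  std::vector<Expr> args;
  std::vector<std::string> options;  // literal options, simplified to strings
};

struct ArrowCall {
  std::string name;
  std::vector<Expr> args;
  std::map<std::string, std::string> options;  // stands in for an options object
};

using ToArrowFn = std::function<ArrowCall(const SubstraitCall&)>;
using ToSubstraitFn = std::function<SubstraitCall(const ArrowCall&)>;

class FunctionMapping {
 public:
  // General case: one converter per direction, keyed by the source-side name.
  void RegisterToArrow(const std::string& substrait_name, ToArrowFn fn) {
    to_arrow_[substrait_name] = std::move(fn);
  }
  void RegisterToSubstrait(const std::string& arrow_name, ToSubstraitFn fn) {
    to_substrait_[arrow_name] = std::move(fn);
  }
  // Convenience for simple aliases: only the name changes.
  void RegisterAlias(const std::string& substrait_name,
                     const std::string& arrow_name) {
    RegisterToArrow(substrait_name, [arrow_name](const SubstraitCall& c) {
      return ArrowCall{arrow_name, c.args, {}};
    });
    RegisterToSubstrait(arrow_name, [substrait_name](const ArrowCall& c) {
      return SubstraitCall{substrait_name, c.args, {}};
    });
  }
  // Both lookups return nullopt when no converter is registered.
  std::optional<ArrowCall> ToArrow(const SubstraitCall& call) const {
    auto it = to_arrow_.find(call.name);
    if (it == to_arrow_.end()) return std::nullopt;
    return it->second(call);
  }
  std::optional<SubstraitCall> ToSubstrait(const ArrowCall& call) const {
    auto it = to_substrait_.find(call.name);
    if (it == to_substrait_.end()) return std::nullopt;
    return it->second(call);
  }

 private:
  std::map<std::string, ToArrowFn> to_arrow_;
  std::map<std::string, ToSubstraitFn> to_substrait_;
};

FunctionMapping MakeExampleMapping() {
  FunctionMapping m;
  m.RegisterAlias("lt", "less");  // illustrative alias pair
  // One Substrait name, two Arrow names: the (assumed) overflow option decides.
  m.RegisterToArrow("add", [](const SubstraitCall& c) {
    bool checked =
        std::find(c.options.begin(), c.options.end(), "ERROR") != c.options.end();
    return ArrowCall{checked ? "add_checked" : "add", c.args, {}};
  });
  m.RegisterToSubstrait("add", [](const ArrowCall& c) {
    return SubstraitCall{"add", c.args, {"SILENT"}};
  });
  m.RegisterToSubstrait("add_checked", [](const ArrowCall& c) {
    return SubstraitCall{"add", c.args, {"ERROR"}};
  });
  return m;
}
```

Keying the two directions separately is what lets one Substrait name fan out to several Arrow names while each Arrow name still maps back to a single Substrait call.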
|
Weston Pace / @westonpace: Also, if there were conformance testing then I think one could prioritize / only implement the Substrait -> Arrow path as the Arrow -> Substrait path is currently only used for testing (although one can imagine non-testing applications). |
Weston Pace / @westonpace:
|
Weston Pace / @westonpace: Tagging for discussion. |
David Li / @lidavidm: |
Sanjiban Sengupta / @sanjibansg: https://docs.google.com/spreadsheets/d/1Jm7vt-sTxsmB7HlLsdWPk6LFINcOGzZjU9SL4ZtiYoY/edit?usp=sharing |
Yaron Gvili / @rtpsw: In the context of the general discussion, Substrait also has a ternary-function "clip" that does not currently appear in the list. Some possible solutions for it are:
|
Weston Pace / @westonpace: I think #1 is something that will happen a lot at some point but I feel like it lives in the realm of the query planner/optimizer. So I'd almost want to say "Arrow doesn't support that function" before we get into the realm of "equivalent but not identical plans". Having something like #3 in Substrait would possibly enable something like #1 to happen in a query planner. One could then imagine the following conversation between planner and consumer:
|
Weston Pace / @westonpace: |
Sometimes one Substrait function will map to multiple Arrow functions. For example, the Substrait add function might be referring to Arrow's add or add_checked. We need to figure out how to register this correctly (e.g. one possible approach would be a substrait_add meta function).
Other times a substrait function will encode something Arrow considers an "option" as a function argument. For example, the is_in Arrow function is unary with an option for the lookup set. The substrait function is binary but the second argument must be constant and be the lookup set. Neither of which is to be confused with a truly binary is_in function which takes in a different set at every row.
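To make the is_in case concrete, here is a minimal sketch of a converter that moves a constant second argument into an options object. All of the types and the `ConvertIsIn` helper are hypothetical, and the sketch assumes the Substrait side also names the function `is_in` and that lookup sets hold integers.

```cpp
#include <optional>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-ins; not real Arrow or Substrait types.
struct FieldRef { std::string name; };           // a per-row column reference
struct SetLiteral { std::vector<int> values; };  // a constant lookup set
using Expr = std::variant<FieldRef, SetLiteral>;

struct SubstraitCall {
  std::string name;
  std::vector<Expr> args;
};

struct IsInOptions { std::vector<int> value_set; };  // plays the "option" role

struct ArrowIsInCall {
  Expr value;           // the single (unary) argument
  IsInOptions options;  // lookup set carried as an option
};

// Converts a binary Substrait-style is_in into a unary Arrow-style call.
// Returns nullopt if the call has the wrong shape or if the second argument
// is not a constant set, i.e. the "truly binary" case is rejected.
std::optional<ArrowIsInCall> ConvertIsIn(const SubstraitCall& call) {
  if (call.name != "is_in" || call.args.size() != 2) return std::nullopt;
  const SetLiteral* set = std::get_if<SetLiteral>(&call.args[1]);
  if (set == nullptr) return std::nullopt;
  return ArrowIsInCall{call.args[0], IsInOptions{set->values}};
}
```

A converter like this could live either inside a substrait_ meta function or in a registry extension; the sketch does not take a side on that question.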
It's possible there is no work to do here other than adding a bunch of substrait_ meta functions in Arrow. In that case all the work will be done in other JIRAs. Or, it is possible that there is some kind of extension we can make to the function registry that bypasses the need for the meta functions. I'm leaving this JIRA open so future contributors can consider this second option.
Reporter: Weston Pace / @westonpace
Assignee: Weston Pace / @westonpace
PRs and other links:
Note: This issue was originally created as ARROW-15582. Please see the migration documentation for further details.