The streaming execution engine supports sorting (I believe as a sink node option?), but the Substrait consumer does not currently consume sort relations. Could we add support for this?
Here's the example code/plan I tested with (in R, using the in-development substrait package):
library(dplyr)
library(substrait)
# create a basic table and order it
out <- tibble::tibble(a = 1, b = 2) %>%
  arrow_substrait_compiler() %>%
  arrange(a)
# take a look at the plan created
out$plan()
#> message of type 'substrait.Plan' with 2 fields set
#> extension_uris {
#>   extension_uri_anchor: 1
#> }
#> relations {
#>   root {
#>     input {
#>       sort {
#>         input {
#>           read {
#>             base_schema {
#>               names: "a"
#>               names: "b"
#>               struct_ {
#>                 types {
#>                   fp64 {
#>                   }
#>                 }
#>                 types {
#>                   fp64 {
#>                   }
#>                 }
#>               }
#>             }
#>             named_table {
#>               names: "named_table_1"
#>             }
#>           }
#>         }
#>         sorts {
#>           expr {
#>             selection {
#>               direct_reference {
#>                 struct_field {
#>                 }
#>               }
#>             }
#>           }
#>           direction: SORT_DIRECTION_ASC_NULLS_LAST
#>         }
#>       }
#>     }
#>     names: "a"
#>     names: "b"
#>   }
#> }
# try to run the plan
collect(out)
#> Error: NotImplemented: conversion to arrow::compute::Declaration from Substrait relation sort {
...
#> /home/nic2/arrow/cpp/src/arrow/engine/substrait/serde.cc:73 FromProto(plan_rel.rel(), ext_set)
Reporter: Nicola Crane / @thisisnic
Note: This issue was originally created as ARROW-16649. Please see the migration documentation for further details.