[C++] Add support for sorting to the Substrait consumer #31998

@asfimport

The streaming execution engine supports sorting (I believe as a sink node option), but the Substrait consumer does not currently consume sort relations, so a plan containing one fails with a NotImplemented error. Could support for this be added?

Here's the example code/plan I tested with (in R, using the in-development substrait package):

 

library(dplyr)
library(substrait)

# create a basic table and order it
out <- tibble::tibble(a = 1, b = 2) %>%
  arrow_substrait_compiler() %>%
  arrange(a)

# take a look at the plan created
out$plan()
#> message of type 'substrait.Plan' with 2 fields set
#> extension_uris {
#>   extension_uri_anchor: 1
#> }
#> relations {
#>   root {
#>     input {
#>       sort {
#>         input {
#>           read {
#>             base_schema {
#>               names: "a"
#>               names: "b"
#>               struct_ {
#>                 types {
#>                   fp64 {
#>                   }
#>                 }
#>                 types {
#>                   fp64 {
#>                   }
#>                 }
#>               }
#>             }
#>             named_table {
#>               names: "named_table_1"
#>             }
#>           }
#>         }
#>         sorts {
#>           expr {
#>             selection {
#>               direct_reference {
#>                 struct_field {
#>                 }
#>               }
#>             }
#>           }
#>           direction: SORT_DIRECTION_ASC_NULLS_LAST
#>         }
#>       }
#>     }
#>     names: "a"
#>     names: "b"
#>   }
#> }

# try to run the plan
collect(out)
#> Error: NotImplemented: conversion to arrow::compute::Declaration from Substrait relation sort {
...
#> /home/nic2/arrow/cpp/src/arrow/engine/substrait/serde.cc:73  FromProto(plan_rel.rel(), ext_set)
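
For whoever picks this up: the `sorts` entry in the plan above has an empty `struct_field`, which in protobuf text format means the default field index 0 (column `a`), and its direction is `SORT_DIRECTION_ASC_NULLS_LAST`. Below is a minimal sketch of the row order a consumer would be expected to produce for such a plan. It is plain Python for illustration only, not Arrow's implementation; `sort_asc_nulls_last` and the list-of-dicts row layout are hypothetical names introduced here.

```python
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]  # hypothetical row layout: column name -> value, None = null


def sort_asc_nulls_last(rows: Sequence[Row], field_index: int,
                        names: Sequence[str]) -> List[Row]:
    """Order rows ascending on one column, placing null values last."""
    column = names[field_index]  # struct_field index -> column name

    def key(row: Row):
        value = row[column]
        # The first tuple element is False for non-null values, so they
        # sort ahead of nulls; the second orders the non-null values.
        return (value is None, 0 if value is None else value)

    # sorted() is stable, so rows with equal keys keep their input order.
    return sorted(rows, key=key)


rows = [
    {"a": 3.0, "b": 1.0},
    {"a": None, "b": 2.0},
    {"a": 1.0, "b": 3.0},
]
result = sort_asc_nulls_last(rows, field_index=0, names=["a", "b"])
```

Applied to the one-row table in the example above, the output would simply be that row unchanged; the three-row input here is only there to make the null placement visible.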

Reporter: Nicola Crane / @thisisnic

Note: This issue was originally created as ARROW-16649. Please see the migration documentation for further details.
