[C++] Unify approaches to attach schemas on record batches exiting Acero #32235

@asfimport

Internally, Acero uses ExecBatch everywhere, without schemas. Originally, the various exit nodes would simply attach a boring schema based on the output data types and an inference of field names.

However, as part of Substrait integration and other improvements, the various sink nodes are being amended to support:

  • Custom field names

  • Custom metadata

However, the current implementation is somewhat inconsistent.

SinkNode:

  • Does not support custom field names or metadata

ConsumingSinkNode:

  • Supports custom names but not custom metadata

WriteNode:

  • Supports custom metadata but not custom names

We should create a SinkNodeOptions base class that supports custom names and custom metadata, and we should have a single place with utility methods for attaching a schema to an outgoing exec batch. Then all of our sink nodes should use this single tool for modifying outgoing batches.
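
To make the proposal concrete, here is a rough sketch of what the shared helper could look like. It deliberately uses small stand-in types rather than the real Arrow classes, so `Field`, `Schema`, `SinkSchemaOptions`, and `MakeOutputSchema` are all hypothetical names for illustration only, and the `f0`, `f1`, ... fallback naming is an assumption standing in for whatever name inference the sink nodes do today.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for Arrow's field and schema types (not the Arrow API).
struct Field {
  std::string name;
  std::string type;  // e.g. "int32", "utf8"
};

struct Schema {
  std::vector<Field> fields;
  std::map<std::string, std::string> metadata;
};

// Hypothetical options shared by every sink node: custom names and metadata.
struct SinkSchemaOptions {
  std::vector<std::string> names;  // empty means "generate default names"
  std::map<std::string, std::string> metadata;
};

// The single place that turns output types plus options into a schema.
inline Schema MakeOutputSchema(const std::vector<std::string>& types,
                               const SinkSchemaOptions& options) {
  if (!options.names.empty() && options.names.size() != types.size()) {
    throw std::invalid_argument("expected one name per output column");
  }
  Schema schema;
  schema.fields.reserve(types.size());
  for (std::size_t i = 0; i < types.size(); ++i) {
    // Assumed fallback: positional names when no custom names are given.
    std::string name =
        options.names.empty() ? "f" + std::to_string(i) : options.names[i];
    schema.fields.push_back({std::move(name), types[i]});
  }
  schema.metadata = options.metadata;
  return schema;
}
```

With something along these lines, SinkNode, ConsumingSinkNode, and WriteNode would each call the same function with their options instead of building schemas in three different ways, so a feature added for one sink is available to all of them.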

Reporter: Weston Pace / @westonpace

Note: This issue was originally created as ARROW-16915. Please see the migration documentation for further details.
