[VL] Port AttachDistributedSequenceExec to Velox backend #12187

@baibaichen

Description

Spark's AttachDistributedSequenceExec prepends a contiguous, globally increasing Long id column to its child's output. It is used by pandas-on-Spark's distributed-sequence default index and by DataFrame.zipWithIndex. Gluten currently falls back to vanilla Spark for this operator. The fallback forces a columnar → row transition, which dominates runtime for wide or nested-typed inputs.
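Producing contiguous, globally increasing ids across partitions takes two passes, similar to the scheme Spark's RDD.zipWithIndex uses. The first pass counts rows per partition. Each partition's starting offset is then the exclusive prefix sum of those counts, and the second pass numbers rows within each batch from that offset. The Python below is a minimal sketch of that idea under stated assumptions, not Gluten's or Spark's implementation. Modelling partitions as lists of batches and batches as lists of row tuples is a hypothetical simplification, as are the names `partition_offsets` and `attach_sequence`.

```python
from itertools import accumulate
from typing import List, Sequence

# Hypothetical model: a batch is a list of row tuples, a partition is a list of batches.
Batch = List[tuple]


def partition_offsets(partitions: Sequence[Sequence[Batch]]) -> List[int]:
    """Pass 1: count rows per partition and return each partition's starting id."""
    counts = [sum(len(batch) for batch in part) for part in partitions]
    # Exclusive prefix sum: partition k starts after all rows of partitions 0..k-1.
    return list(accumulate(counts, initial=0))[:-1]


def attach_sequence(partitions: Sequence[Sequence[Batch]]) -> List[List[Batch]]:
    """Pass 2: prepend a contiguous id to every row, batch by batch."""
    offsets = partition_offsets(partitions)
    result = []
    for base, part in zip(offsets, partitions):
        out_part = []
        next_id = base
        for batch in part:
            # Ids for one batch are a contiguous range starting at next_id.
            ids = range(next_id, next_id + len(batch))
            out_part.append([(i, *row) for i, row in zip(ids, batch)])
            next_id += len(batch)
        result.append(out_part)
    return result
```

Because each batch's ids are simply `next_id + 0 .. n-1`, a columnar implementation could generate the id column as one contiguous range per batch without converting to rows. Only the per-partition row counts require the extra pass.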

This issue tracks porting the operator to the Velox backend so that it runs end-to-end on columnar batches.

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests