Description
Spark's `AttachDistributedSequenceExec` prepends a contiguous, globally increasing `Long` id column to its child output. It is used by pandas-on-Spark's `distributed-sequence` default index and by `DataFrame.zipWithIndex`. Today Gluten falls back to vanilla Spark for this operator, which forces a columnar → row transition that dominates runtime for wide or nested-typed inputs.
This issue tracks porting the operator to the Velox backend so that it runs end-to-end on columnar batches.
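For readers unfamiliar with the operator, here is a minimal pure-Python sketch of the semantics described above. It assumes a count-then-offset scheme similar to `RDD.zipWithIndex` (count rows per partition, take an exclusive prefix sum, then number rows locally) and ids starting at 0. The function name `attach_distributed_sequence` and the list-of-lists partition model are illustrative only; this is not Spark's or Gluten's actual code.

```python
from itertools import accumulate


def attach_distributed_sequence(partitions):
    """Prepend a contiguous, globally increasing id to every row.

    `partitions` is a list of partitions, each a list of row tuples.
    Hypothetical helper for illustration; not a Spark or Gluten API.
    """
    # Pass 1: count the rows in each partition.
    counts = [len(part) for part in partitions]
    # An exclusive prefix sum gives each partition its starting id
    # (assumed to begin at 0).
    offsets = [0] + list(accumulate(counts))[:-1]
    # Pass 2: each partition numbers its own rows from its offset,
    # so no cross-partition coordination is needed at this stage.
    return [
        [(offset + i, *row) for i, row in enumerate(part)]
        for offset, part in zip(offsets, partitions)
    ]
```

Under this scheme the second pass only prepends one id column per batch and never needs to inspect the other columns, which is why avoiding the row conversion matters most for wide or nested inputs.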