[VL] Add LocalTableScanExec support to Velox backend #12080

Open
minni31 wants to merge 1 commit into apache:main from minni31:oss/velox-local-table-scan-support

minni31 commented May 12, 2026

CONTEXT

LocalTableScanExec is a Spark physical operator that scans data already materialized in memory (e.g., a local Seq converted with toDF(), or constant relations produced by the Catalyst optimizer). Gluten currently does not offload this operator, so its output rows stay in Spark's internal row format and need a separate row-to-columnar conversion step before downstream Velox operators can consume them.

This is the companion PR to #12077 (RDDScanExec support); both follow the same design pattern.

WHAT

Adds a VeloxLocalTableScanTransformer that intercepts LocalTableScanExec in the offload rules and performs row-to-columnar conversion using Velox's native RowToColumnarConverter. The implementation:

  • Introduces a LocalTableScanTransformer base trait in gluten-substrait with the backend-agnostic contract (output attributes, row data, schema validation).
  • Adds the Velox-specific VeloxLocalTableScanTransformer, which delegates schema validation to VeloxValidatorApi.validateSchema, the same canonical validator used by all other Velox operators, so that recursive complex-type validation, TimestampNTZ handling, and variant struct detection behave consistently.
  • Wires up the offload rule in OffloadSingleNodeRules and the backend factory method in VeloxSparkPlanExecApi.
  • Skips offloading for streaming sources (the operator is offloaded only when plan.getStream.isEmpty holds), since those follow a different execution path.
  • Propagates SQLMetrics (numInputRows, numOutputBatches, convertTime) so conversion costs are visible in the Spark UI.
  • Uses the 7-parameter toColumnarBatchIterator overload to pass plan-level metrics to the native converter.
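
The conversion and metric bookkeeping described in the list above can be pictured with a small, self-contained sketch. It is not the PR's code and does not use Gluten or Velox APIs. Every class and method name below (RowToColumnarSketch, Metrics, Batch, shouldOffload, convert) is hypothetical, and the fixed batch size is an assumption made for illustration. The sketch only shows the general idea: buffer incoming rows, pivot each full buffer into column arrays, and update counters analogous to numInputRows, numOutputBatches, and convertTime.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Conceptual sketch only: none of these types exist in Gluten or Velox. */
final class RowToColumnarSketch {

    /** Hypothetical stand-ins for the SQLMetrics named in the PR description. */
    static final class Metrics {
        long numInputRows;
        long numOutputBatches;
        long convertTimeNanos;
    }

    /** A toy columnar batch: columns[c][r] is the value of column c in row r. */
    static final class Batch {
        final Object[][] columns;
        final int numRows;

        Batch(Object[][] columns, int numRows) {
            this.columns = columns;
            this.numRows = numRows;
        }
    }

    /** Mirrors the streaming guard in spirit: offload only when no stream is attached. */
    static boolean shouldOffload(boolean hasStream) {
        return !hasStream;
    }

    /** Groups rows into batches of at most batchSize rows and pivots each batch to columns. */
    static List<Batch> convert(Iterator<Object[]> rows, int numColumns, int batchSize, Metrics metrics) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<Batch> out = new ArrayList<>();
        List<Object[]> buffer = new ArrayList<>();
        while (rows.hasNext()) {
            buffer.add(rows.next());
            metrics.numInputRows++;
            if (buffer.size() == batchSize) {
                out.add(pivot(buffer, numColumns, metrics));
                buffer.clear();
            }
        }
        // Flush the final, possibly partial, batch. Empty input yields no batches.
        if (!buffer.isEmpty()) {
            out.add(pivot(buffer, numColumns, metrics));
        }
        return out;
    }

    /** Transposes buffered rows into column arrays and records the time spent. */
    private static Batch pivot(List<Object[]> buffer, int numColumns, Metrics metrics) {
        long start = System.nanoTime();
        int n = buffer.size();
        Object[][] columns = new Object[numColumns][n];
        for (int r = 0; r < n; r++) {
            Object[] row = buffer.get(r);
            for (int c = 0; c < numColumns; c++) {
                columns[c][r] = row[c]; // null values are carried over unchanged
            }
        }
        metrics.numOutputBatches++;
        metrics.convertTimeNanos += System.nanoTime() - start;
        return new Batch(columns, n);
    }
}
```

In the actual PR the pivot step is performed natively by Velox rather than on the JVM; the sketch only illustrates why the three metrics are natural quantities to report for such an operator.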

Tests

| Suite | Tests | Status |
|---|---|---|
| VeloxLocalTableScanSuite | 7 tests covering primitive types, nullable columns, empty datasets, multi-partition, string columns, filter pushdown, multiple column types | Local pass |

github-actions bot added the CORE (works for Gluten Core) and VELOX labels May 12, 2026

github-actions bot commented

Run Gluten Clickhouse CI on x86

minni31 force-pushed the oss/velox-local-table-scan-support branch from 4e33e13 to d6fcdf8 on May 13, 2026 05:28