[VL][Delta] Add JVM Delta DV scan handoff #12198

Draft
malinjawi wants to merge 2 commits into apache:main from malinjawi:split/delta-dv-java-scan-handoff-pr-clean

@malinjawi
Contributor

What changes are proposed in this pull request?

This PR is the next split in the Delta deletion-vector scan stack and is stacked on #12197.

It adds the JVM/Substrait/Velox handoff that consumes the essential Delta DV scan info extracted by #12197, materializes serialized DV payloads on the JVM side, and passes them to native scan execution.

Main changes:

  • add a Delta DV preprocessing rule for the Velox Delta component without replacing Delta's PrepareDeltaScan
  • reuse DeltaDeletionVectorScanInfo from #12197 ([VL][Delta] Add DV scan info extraction utility) to extract per-file DV metadata and serialized DV bytes from Delta-prepared scan files
  • add Delta local files Substrait nodes/builders carrying DeltaReadOptions
  • embed the serialized DV payload in DeltaReadOptions, instead of passing essential DV data through generic metadata columns
  • add a native DeltaSplitInfo path for Delta-specific split metadata
  • wire the handoff through VeloxIteratorApi, VeloxPlanConverter, WholeStageResultIterator, and SubstraitToVeloxPlan
  • strip Spark's synthetic DV predicate/internal columns only after the native split has the payload, so Velox applies the DV filter natively and we avoid double filtering
  • add Spark 3.5 and Spark 4.0 focused handoff coverage
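
The ordering constraint in the list above (attach the serialized DV payload to each native split first, and only then drop the JVM-side DV filter) can be illustrated with a small, self-contained Java sketch. Every name in it (`DvHandoffSketch`, `ScanFile`, `SplitReadOptions`, `Handoff`, `plan`) is hypothetical and is not taken from the Gluten, Delta, or Velox code in this PR. The rule that a missing payload keeps the JVM-side filter in place is an assumption made for illustration, not a description of the actual implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative only: all types here are hypothetical stand-ins, not Gluten/Delta/Velox APIs.
final class DvHandoffSketch {

  /** One Delta-prepared scan file: a path, whether it has a DV, and the serialized DV bytes. */
  static final class ScanFile {
    final String path;
    final boolean hasDv;
    final byte[] serializedDv; // null when no payload was materialized

    ScanFile(String path, boolean hasDv, byte[] serializedDv) {
      this.path = path;
      this.hasDv = hasDv;
      this.serializedDv = serializedDv;
    }
  }

  /** Stand-in for per-split read options that carry the DV payload directly. */
  static final class SplitReadOptions {
    final String path;
    final byte[] dvPayload; // empty array means "no deleted rows recorded for this file"

    SplitReadOptions(String path, byte[] dvPayload) {
      this.path = path;
      this.dvPayload = dvPayload;
    }
  }

  /** The splits handed to native execution plus the decision about the JVM-side DV filter. */
  static final class Handoff {
    final List<SplitReadOptions> splits;
    final boolean stripJvmDvFilter;

    Handoff(List<SplitReadOptions> splits, boolean stripJvmDvFilter) {
      this.splits = splits;
      this.stripJvmDvFilter = stripJvmDvFilter;
    }
  }

  /**
   * Builds one split per file and embeds the DV bytes in its read options.
   * The JVM-side filter is marked strippable only if every file that has a DV
   * also had its payload attached (assumed rule, to avoid both double filtering
   * and silently skipping the filter).
   */
  static Handoff plan(List<ScanFile> files) {
    List<SplitReadOptions> splits = new ArrayList<>();
    boolean allPayloadsAttached = true;
    for (ScanFile f : files) {
      if (f.hasDv && f.serializedDv == null) {
        // Payload missing: keep the JVM-side filter so deleted rows are still removed.
        allPayloadsAttached = false;
        splits.add(new SplitReadOptions(f.path, new byte[0]));
      } else {
        byte[] payload = f.hasDv ? f.serializedDv.clone() : new byte[0];
        splits.add(new SplitReadOptions(f.path, payload));
      }
    }
    return new Handoff(Collections.unmodifiableList(splits), allPayloadsAttached);
  }

  public static void main(String[] args) {
    List<ScanFile> files = new ArrayList<>();
    files.add(new ScanFile("part-0.parquet", true, new byte[] {1, 2, 3}));
    files.add(new ScanFile("part-1.parquet", false, null));
    Handoff h = plan(files);
    System.out.println(h.splits.size() + " splits, strip JVM DV filter: " + h.stripJvmDvFilter);
  }
}
```

Carrying the payload inside the per-split options, rather than in a generic metadata column, keeps the decision to strip the filter tied to a check that each split actually holds its bytes.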

This PR is intentionally handoff-only:

Issue: #11901

How was this patch tested?

Validation run locally:

  • JAVA_HOME=$(/usr/libexec/java_home -v 17) ./build/mvn test-compile -pl backends-velox -am -Pjava-17,spark-3.5,backends-velox,hadoop-3.3,spark-ut,delta -DskipTests
  • JAVA_HOME=$(/usr/libexec/java_home -v 17) ./build/mvn test-compile -pl backends-velox -am -Pjava-17,spark-4.0,scala-2.13,backends-velox,hadoop-3.3,spark-ut,delta -DskipTests
  • git diff --check
  • clang-format over touched C++ files with /opt/homebrew/opt/llvm@15/bin/clang-format

Was this patch authored or co-authored using generative AI tooling?

Generated-by: IBM BOB

@github-actions (bot) added the CORE, VELOX, and DATA_LAKE labels on May 31, 2026
@github-actions

Run Gluten Clickhouse CI on x86
