
[VL][Delta] Add persistent DV DELETE correctness path#12216

Draft
malinjawi wants to merge 11 commits into
apache:mainfrom
malinjawi:split/delta-dv-delete-correctness

Contributor

@malinjawi malinjawi commented Jun 1, 2026

What changes

This is the next stacked Delta DV MoR slice after #12215. It adds the minimal persistent-DV DELETE correctness path for Velox Delta native execution while keeping Delta OSS semantics for action generation, stats, and transaction behavior.

Stack order:

  1. #12197 - [VL][Delta] Add DV scan info extraction utility
  2. #12198 - [VL][Delta] Add JVM Delta DV scan handoff
  3. #12215 - [VL][Delta] Guard DV DML row-index scans (DML row-index scan safety)
  4. This PR - persistent DV DELETE correctness path

This PR should remain a draft until the earlier scan-safety PR (#12215) has reviewer confidence and this branch has a native CI signal.

Scope

  • Adds GlutenDeleteCommand for persistent-DV row-condition DELETEs.
  • Routes only eligible persistent-DV DELETE commands through the Gluten Delta command wrapper.
  • Uses Delta's existing DML deletion-vector helpers for touched-file discovery and action generation.
  • Keeps ordinary DELETE, metadata-only DELETE, and full-table DELETE on the existing path.
  • Adds Delta 3.3 and Delta 4.0 coverage.
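
The routing rule in the list above can be sketched as a small predicate. This is only an illustration of the stated scope, not the code in this PR: the class, enum, and parameter names are hypothetical, and the three boolean inputs stand in for whatever checks the real rule performs on the table and the DELETE plan. "Metadata-only" is assumed here to mean a DELETE that can be resolved without scanning rows, for example a predicate that only touches partition columns.

```java
/** Hypothetical sketch of the DELETE routing described in the Scope list. */
final class DeleteRoutingSketch {

    /** Where a DELETE ends up. Names are illustrative, not Gluten identifiers. */
    enum Route { GLUTEN_DV_DELETE, EXISTING_DELTA_PATH }

    /**
     * @param persistentDvEnabled assumed flag: the table writes persistent deletion vectors
     * @param hasCondition        the DELETE has a WHERE clause (false means full-table DELETE)
     * @param metadataOnly        assumed flag: the condition can be answered from metadata alone
     */
    static Route route(boolean persistentDvEnabled, boolean hasCondition, boolean metadataOnly) {
        // Only a row-level condition needs per-row deletion vectors.
        boolean rowConditionDelete = hasCondition && !metadataOnly;
        return (persistentDvEnabled && rowConditionDelete)
                ? Route.GLUTEN_DV_DELETE
                : Route.EXISTING_DELTA_PATH;
    }
}
```

Under these assumptions, `route(true, true, false)` is the only combination that selects the Gluten command; ordinary, metadata-only, and full-table DELETEs all fall through to the existing path.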

Intentionally deferred

  • Native bitmap aggregation as the default DELETE bitmap construction path.
  • Plain Parquet target-scan optimization.
  • Checksum shortcuts or stats rewrites beyond Delta's existing behavior.
  • DELETE diagnostics/benchmark suite, which stays in the follow-up branch.

Validation

Local validation after rebasing onto origin/split/delta-dv-dml-scan-safety at f3135560c7a3d7a0dfedc1f64267c9a529e7a970:

  • git diff --check origin/split/delta-dv-dml-scan-safety...HEAD
  • env JAVA_HOME=/opt/homebrew/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home PATH=/opt/homebrew/opt/openjdk@17/bin:$PATH ./build/mvn -q test-compile -pl gluten-delta -am -Pjava-17,spark-3.4,backends-velox,hadoop-3.3,spark-ut,delta -DskipTests (run after restacking onto the Spark 3.3/3.4 compatibility fix in #12215)
  • env JAVA_HOME=/opt/homebrew/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home PATH=/opt/homebrew/opt/openjdk@17/bin:$PATH ./build/mvn -q test-compile -pl backends-velox -am -Pjava-17,spark-3.5,backends-velox,hadoop-3.3,spark-ut,delta -DskipTests
  • env JAVA_HOME=/opt/homebrew/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home PATH=/opt/homebrew/opt/openjdk@17/bin:$PATH ./build/mvn -q test-compile -pl backends-velox -am -Pjava-17,spark-4.0,scala-2.13,backends-velox,hadoop-3.3,spark-ut,delta -DskipTests

Focused local ScalaTest execution was attempted for DeleteSQLWithDeletionVectorsSuite, but this Mac checkout cannot start the Velox backend because darwin/aarch64/libgluten.dylib is not available. The run reached Spark startup and aborted with FileNotFoundException: darwin/aarch64/libgluten.dylib before executing any tests. Treat native CI as the runtime correctness gate for this draft.
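
A quick pre-check along the following lines could confirm whether a checkout can start the backend before launching a suite. It is a sketch that assumes the native library is looked up as a classpath resource, which the relative path in the exception suggests but this PR does not state; `NativeLibProbe` and `onClasspath` are hypothetical names, not Gluten APIs.

```java
/** Hypothetical probe for a native library packaged as a classpath resource. */
final class NativeLibProbe {

    /** Returns true if the given resource path resolves on the current classpath. */
    static boolean onClasspath(String resourcePath) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            // Some threads have no context class loader; fall back to the system one.
            loader = ClassLoader.getSystemClassLoader();
        }
        return loader.getResource(resourcePath) != null;
    }

    public static void main(String[] args) {
        // Path taken from the FileNotFoundException reported above.
        String path = "darwin/aarch64/libgluten.dylib";
        System.out.println(path + " available: " + onClasspath(path));
    }
}
```

If the probe reports the library as missing, the ScalaTest run would be expected to abort at startup in the same way, so only the compile-level checks above are meaningful on that machine.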

Mohammad Linjawi and others added 5 commits May 31, 2026 12:06
Keep Delta DV DML row-index target scans on Spark unless native DML row-index scanning and native write are explicitly enabled. Preserve the Spark Project/Filter subtree above the fallback scan and add Delta 3.3/4.0 plan-shape coverage for metadata row-index on and off.

Validation: JAVA_HOME=/opt/homebrew/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home ./build/mvn test-compile -pl backends-velox -am -Pjava-17,spark-3.5,backends-velox,hadoop-3.3,spark-ut,delta -DskipTests

Validation: JAVA_HOME=/opt/homebrew/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home ./build/mvn test-compile -pl backends-velox -am -Pjava-17,spark-4.0,scala-2.13,backends-velox,hadoop-3.3,spark-ut,delta -DskipTests

Validation: git diff --cached --check
github-actions Bot added the CORE (works for Gluten Core), VELOX, and DATA_LAKE labels Jun 1, 2026

github-actions Bot commented Jun 1, 2026

Run Gluten Clickhouse CI on x86

@malinjawi malinjawi force-pushed the split/delta-dv-delete-correctness branch from c43e4fa to bbb971f Compare June 1, 2026 13:54

malinjawi added 3 commits June 1, 2026 17:48
Route Delta DELETE commands with persistent deletion vectors through the Gluten-specific command while leaving metadata-only, full-table, and non-DV cases on the existing Delta path.

Add Delta 3.3 and Delta 4.0 coverage for persistent DV DELETE routing and repeated deletion-vector updates.

Validation: git diff --cached --check; mvn test-compile -pl backends-velox -am -Pjava-17,spark-3.5,backends-velox,hadoop-3.3,spark-ut,delta -DskipTests; mvn test-compile -pl backends-velox -am -Pjava-17,spark-4.0,scala-2.13,backends-velox,hadoop-3.3,spark-ut,delta -DskipTests.
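
The "repeated deletion-vector updates" case can be pictured with a toy model. The sketch below is not Delta's or Gluten's implementation: Delta stores deletion vectors as Roaring bitmaps, whereas this stand-in uses `java.util.BitSet`, and `DeletionVectorSketch` with its methods is a hypothetical name. It only shows the invariant such a test would care about: a second DELETE on the same file yields a vector that is the union of the old and new row indexes, and the earlier vector is left unchanged.

```java
import java.util.BitSet;

/** Toy stand-in for a per-file deletion vector (real DVs use Roaring bitmaps). */
final class DeletionVectorSketch {
    private final BitSet deleted = new BitSet();

    /** Returns a new vector containing the existing deletions plus the given row indexes. */
    DeletionVectorSketch withDeleted(int... rowIndexes) {
        DeletionVectorSketch next = new DeletionVectorSketch();
        next.deleted.or(this.deleted);      // carry over earlier deletions
        for (int rowIndex : rowIndexes) {
            next.deleted.set(rowIndex);     // setting an already-deleted row is a no-op
        }
        return next;
    }

    /** True if a reader should skip this row of the data file. */
    boolean isDeleted(int rowIndex) {
        return deleted.get(rowIndex);
    }

    /** Number of rows marked as deleted. */
    int deletedCount() {
        return deleted.cardinality();
    }
}
```

For example, deleting rows 1 and 4 and then rows 4 and 7 produces a vector with three deleted rows (1, 4, 7), while the vector from the first DELETE still reports two.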
@malinjawi malinjawi force-pushed the split/delta-dv-delete-correctness branch from e705eb8 to 1ba990c Compare June 1, 2026 14:50