Goal
Track Delta-only work to improve native Gluten/Velox support for Delta writer and table optimization paths.
This issue is organized by feature-sized work areas. Each top-level task should map to one reviewable PR or a small stack of tightly related PRs.
Related Work
Feature Tracks
OPTIMIZE Compaction
Native support for Delta OPTIMIZE compaction/bin-packing command paths.
Scope:
offload OPTIMIZE command transactions through GlutenOptimisticTransaction
cover path-based and table-name OPTIMIZE forms
cover OPTIMIZE ... WHERE partition predicates
keep OPTIMIZE read/shuffle/write native where supported
validate returned OPTIMIZE metrics and file statistics
benchmark compaction on small-file Delta tables
Related PR:
Expected coverage:
path-based OPTIMIZE
table-name OPTIMIZE
OPTIMIZE ... WHERE partition_predicate
native-write-disabled fallback
data correctness before and after compaction
Delta log add/remove-file metadata correctness
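The Delta log check in the coverage list above could be sketched as follows. This is a minimal illustration, not the project's test helper: `summarize_commit` is a hypothetical name, and it assumes the caller has already read the lines of a single commit file from `_delta_log`. It relies only on the documented Delta action layout (one JSON object per line, with `add` and `remove` actions and a JSON-encoded `stats` string).

```python
import json


def summarize_commit(commit_lines):
    """Summarize add/remove actions from the lines of one Delta commit JSON file."""
    summary = {
        "added_paths": set(),
        "removed_paths": set(),
        "added_records": 0,
        "data_change_paths": [],
    }
    for line in commit_lines:
        line = line.strip()
        if not line:
            continue
        action = json.loads(line)
        if "add" in action:
            add = action["add"]
            summary["added_paths"].add(add["path"])
            # "stats" is itself a JSON-encoded string and may be absent.
            stats = json.loads(add["stats"]) if add.get("stats") else {}
            summary["added_records"] += stats.get("numRecords", 0)
            # A missing flag is treated as a data change so it gets flagged.
            if add.get("dataChange", True):
                summary["data_change_paths"].append(add["path"])
        elif "remove" in action:
            remove = action["remove"]
            summary["removed_paths"].add(remove["path"])
            if remove.get("dataChange", True):
                summary["data_change_paths"].append(remove["path"])
    return summary
```

A compaction test could then assert that the removed paths are the small input files, that the total of `added_records` equals the row count before compaction, and that `data_change_paths` is empty, since a compaction commit is expected to rearrange data without changing it.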
Optimized Write
Native support and correctness hardening for Delta optimized write paths.
Scope:
verify native behavior when the table property delta.autoOptimize.optimizeWrite is enabled
verify native behavior when the SQL conf spark.databricks.delta.optimizeWrite.enabled is set to true
verify DataFrameWriter option optimizeWrite behavior
cover non-partitioned optimized writes
cover partitioned optimized writes
validate output file sizing and partition layout metadata
reduce unnecessary columnar-to-row transitions in write, stats, and commit paths
Related PRs:
Expected coverage:
non-partitioned append and overwrite
partitioned append and overwrite
optimized-write table property, SQL conf, and writer option
partition values in add-file metadata
min/max/nullCount stats in add-file metadata
native and fallback plan assertions
OPTIMIZE ZORDER
Native support for Delta ZORDER layout operations.
Scope:
add native support for Delta ZORDER expressions such as InterleaveBits
add native support for RangePartitionId
keep ZORDER read/shuffle/sort/write native where supported
improve fallback diagnostics when ZORDER cannot stay native
validate ZORDER output correctness and Delta log metadata
benchmark ZORDER on larger Delta layout workloads
Expected coverage:
OPTIMIZE ... ZORDER BY (...)
OPTIMIZE ... WHERE ... ZORDER BY (...)
single-column and multi-column ZORDER
native expression coverage for ZORDER planning
data correctness and Delta log metadata after ZORDER
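For readers unfamiliar with what a bit-interleaving expression such as InterleaveBits computes, the general Z-order (Morton) technique can be sketched as below. This is an illustration of the idea only, not the Delta or Velox implementation: `interleave_bits` is a hypothetical helper, it assumes non-negative integer inputs (for example per-column range ids), and it returns a Python integer rather than a binary value.

```python
def interleave_bits(values, bits=32):
    """Interleave the low `bits` bits of each value, most significant bit first.

    The result takes one bit from each input in turn, so rows that are close
    in every input dimension tend to get nearby keys.
    """
    result = 0
    for bit in range(bits - 1, -1, -1):
        for value in values:
            # Append the current bit of this value to the output key.
            result = (result << 1) | ((value >> bit) & 1)
    return result


# Two 2-bit inputs: 0b11 and 0b00 interleave to 0b1010.
key = interleave_bits([0b11, 0b00], bits=2)
```

Sorting rows by such a key is what clusters multi-column ZORDER data. A native implementation would need to produce the same ordering as the JVM expression for output files to be comparable.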
Data Skipping Stats
Correctness coverage for Delta data-skipping metadata generated by native write and optimization paths.
Scope:
verify native Delta writes preserve min/max/nullCount stats
verify stats behavior with partition columns
verify stats behavior with delta.dataSkippingNumIndexedCols
verify stats remain usable after native writes, optimized writes, and OPTIMIZE
Expected coverage:
stats in Delta add-file JSON
partitioned and non-partitioned tables
columns inside and outside the indexed stats range
queries that rely on data skipping after native writes
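A stats check against add-file JSON might look like the sketch below. `missing_stats` is a hypothetical helper, and the list of expected columns is supplied by the caller, who is assumed to have already worked out which columns fall inside the indexed range for the table under test. It assumes a flat schema. Note that a missing min/max entry is not always a bug: it can be legitimate for all-null columns or for types without min/max support, so a test should choose its expected columns accordingly.

```python
import json

STAT_FIELDS = ("minValues", "maxValues", "nullCount")


def missing_stats(add_action, expected_columns):
    """Return (column, field) pairs absent from an add action's stats."""
    raw = add_action.get("stats")
    if not raw:
        # No stats at all: every expected entry is missing.
        return [(col, field) for col in expected_columns for field in STAT_FIELDS]
    stats = json.loads(raw)  # "stats" is a JSON-encoded string
    missing = []
    for col in expected_columns:
        for field in STAT_FIELDS:
            if col not in stats.get(field, {}):
                missing.append((col, field))
    return missing
```

Running the same check on files written natively and on files written with native write disabled gives a direct comparison of the two paths.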
Auto Compaction
Native behavior and correctness coverage for Delta auto compaction after successful writes.
Scope:
investigate whether post-commit auto compaction runs through native write paths
cover table property delta.autoOptimize.autoCompact
cover session config spark.databricks.delta.autoCompact.enabled
validate partition selection, file stats, and commit metadata after auto compaction
Expected coverage:
auto compaction on non-partitioned tables
auto compaction on partitioned tables
minimum-file threshold behavior
native/fallback diagnostics for post-commit compaction work
Delta Checkpoints And Log Compaction
Evaluate whether there is meaningful Gluten/Velox execution work in Delta checkpoint and log compaction paths.
Scope:
evaluate Delta multi-part checkpoint write paths
evaluate Delta log compaction paths
only open implementation PRs if there is execution work beyond Delta log metadata handling
Expected coverage:
a written investigation result stating whether either path contains native execution work
follow-up issue or PR only if native execution can add value
Performance And Diagnostics
Benchmark and explain remaining overhead after native execution improvements.
Scope:
profile stage time for native execution versus Delta planning/log/listing/commit overhead
benchmark non-partitioned Delta writes
benchmark partitioned Delta writes
benchmark Delta optimized writes
benchmark Delta OPTIMIZE compaction
benchmark Delta OPTIMIZE ZORDER after native ZORDER expression support lands
use larger Delta datasets where write volume dominates fixed planning and commit overhead
Expected coverage:
before/after numbers for each feature track
stage-level breakdown when native speedup is hidden by fixed overhead
clear fallback diagnostics for unsupported pieces
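The effect of fixed overhead on measured speedup can be made concrete with a small Amdahl-style calculation. This is only a sketch for reasoning about benchmark numbers: `end_to_end_speedup` is a hypothetical helper, and the split between execution time and fixed time (planning, log handling, listing, commit) is assumed to come from a stage-level profile.

```python
def end_to_end_speedup(execution_seconds, fixed_seconds, native_speedup):
    """End-to-end speedup when only the execution part gets faster."""
    before = execution_seconds + fixed_seconds
    after = execution_seconds / native_speedup + fixed_seconds
    return before / after


# 60 s of execution, 40 s of fixed overhead, execution made 3x faster.
speedup = end_to_end_speedup(60, 40, 3)
```

In this example a 3x execution speedup shows up as roughly 1.67x end to end, which is why larger datasets and a stage-level breakdown are needed to see what native execution actually contributes.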
Boundaries
Keep each PR reviewable and focused
Prefer correctness tests before benchmark-only changes
Split command offload, native expression support, metadata correctness, and benchmark work into separate patches where practical