Delta pipeline fix tests by felipepessoto · Pull Request #12386 · apache/gluten

felipepessoto · 2026-06-27T09:10:40Z

What changes are proposed in this pull request?

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?

…es baseline

Run delta-io/delta's `spark` ScalaTest suite against a Gluten Velox bundle in CI and gate the results against a committed baseline, so the many expected Delta-on-Gluten failures stay manageable and can be fixed incrementally without letting currently-passing tests silently regress.

What it adds (.github/workflows/util/delta-spark-ut/):

- delta_spark_ut.yml: builds the native lib + Gluten bundle, then runs the Delta spark suite sharded by suite into 4 shards x 4 forked test JVMs (~16-way), and gates each shard against the baseline.
- compare-test-results.py: the gate. Per shard, regressions (failed tests not in the baseline) fail the build; newly-passing baselined tests are flagged so the baseline can be tightened. Also supports seed/aggregate modes.
- known-failures.txt: the committed baseline of expected failures.
- setup-delta.sh: clones Delta, injects the Gluten bundle, patches DeltaSQLCommandTest, and force-fails the two DeletionVectorsSuite 2B-row tests whose native row-index materialization OOM-kills the runner and hangs the shard.
- README.md: how the pipeline, gating and baseline-refresh work.

The workflow also carries a hang watchdog that thread-dumps and kills a wedged fork, and tunes the per-fork heap (2G) and off-heap (2G) to fit the ~16G runner.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
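The per-shard gate described above reduces to two set operations. The following is a minimal sketch of that idea, not the actual compare-test-results.py: the names `GateResult` and `gate_shard`, the `"Suite: test"` key format, and the exit code 1 for regressions are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateResult:
    regressions: frozenset   # failed now, not in the baseline -> fail the build
    now_passing: frozenset   # in the baseline, ran, and passed -> tighten baseline

    @property
    def exit_code(self) -> int:
        # Assumption: any regression makes the shard exit non-zero.
        return 1 if self.regressions else 0


def gate_shard(ran, failed, baseline) -> GateResult:
    """Classify one shard's results against the committed baseline."""
    ran, failed, baseline = frozenset(ran), frozenset(failed), frozenset(baseline)
    regressions = failed - baseline
    # Only tests that actually ran in this shard can be reported as now-passing.
    now_passing = (baseline & ran) - failed
    return GateResult(regressions, now_passing)


# Hypothetical test names, for illustration only.
result = gate_shard(
    ran={"SuiteA: t1", "SuiteA: t2", "SuiteB: t3"},
    failed={"SuiteA: t2", "SuiteB: t3"},
    baseline={"SuiteA: t1", "SuiteA: t2"},
)
print(sorted(result.regressions))   # the unexpected failure
print(sorted(result.now_passing))   # the baselined test that now passes
print(result.exit_code)
```

Restricting `now_passing` to tests that ran in the shard matters because each shard sees only a slice of the suite; a baselined test that belongs to another shard is neither passing nor failing here.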

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

…line

Delta's data-skipping, limit-push-down, column-pruning and scan-metric tests collect file-source scans by matching the concrete `FileSourceScanExec` case class. Under the Gluten Velox bundle the scan is offloaded to DeltaScanTransformer, a sibling that implements the same `FileSourceScanLike` interface but is not FileSourceScanExec, so the match misses and the scan looks absent.

This surfaced as `scala.MatchError: List()` (~56 DataSkipping*/DeltaLimitPushDown* tests), empty generated-column partition filters (~45 OptimizeGeneratedColumnSuite tests) and broken column-pruning / scan-metric checks across the Delete, Update, Merge, DeletionVectors and RowId suites and the TestsStatistics helper. Gluten copies `partitionFilters` and the other accessors these tests read verbatim onto the offloaded scan, so results are identical to vanilla -- only the test's `case` match breaks.

Fix it by cherry-picking the two merged upstream Delta commits that widen these matches to the shared `FileSourceScanLike` interface (behavior-preserving for vanilla, which also implements it):

* delta-io/delta#7104 -- ScanReportHelper.collectScans
* delta-io/delta#7105 -- the remaining 9 test sources, its follow-up

Both are merged on Delta master but land after the ref this workflow builds against (v4.2.0), so setup-delta.sh cherry-picks them onto the shallow checkout. Each cherry-pick fetches the fix commit at depth 2 (commit + parent) so git can compute the parent->fix diff, and uses `cherry-pick -n` so no committer identity is required. Once the pinned DELTA_REF advances to include a commit, its cherry-pick becomes a clean no-op and that block can be removed.

The cherry-picks run before the DeletionVectorsSuite 2B-row force-fail step: that step sed-injects fail() into DeletionVectorsSuite.scala, which delta-io/delta#7105 also edits, and git cherry-pick refuses to apply onto a working tree with uncommitted changes to a file it touches (exit 128).

Refresh known-failures.txt from run 28299900971 (the delta-spark-aggregate job output), which ran all 19073 tests across 16 shards: removes 187 now-passing tests with 0 regressions, 963 -> 776. ~147 come from the fixes above (DataSkipping*, DeltaLimitPushDown*, OptimizeGeneratedColumnSuite, MergeInto*, RowIdSuite); the remaining ~40 are other suites that now pass (e.g. HiveConvertToDeltaSuite, BitmapAggregatorE2ESuite). Verified against the per-shard ran/failed lists: every baseline entry was observed this run (0 stale), so nothing was dropped due to a crashed or incomplete shard.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
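The staleness check mentioned in the refresh paragraph can be sketched as follows. This is an illustration under assumptions, not the aggregate mode of compare-test-results.py: the function name `refresh_baseline`, its return shape, and the rule that the refreshed baseline is simply the union of observed failures are all hypothetical.

```python
def refresh_baseline(baseline, shard_ran, shard_failed):
    """Merge per-shard ran/failed lists and report how the baseline would change.

    baseline     -- set of currently baselined test names
    shard_ran    -- list of sets, one per shard, of tests that executed
    shard_failed -- list of sets, one per shard, of tests that failed
    """
    ran = set().union(*shard_ran)
    failed = set().union(*shard_failed)
    return {
        # Baselined tests that no shard observed: a crashed or incomplete shard
        # would show up here, and dropping these entries would be unsafe.
        "stale": baseline - ran,
        "now_passing": (baseline & ran) - failed,
        "regressions": failed - baseline,
        # Assumption: the regenerated baseline is exactly what failed this run.
        "refreshed": failed,
    }


# Hypothetical names; two shards, one baselined test now passes.
report = refresh_baseline(
    baseline={"a", "b", "c"},
    shard_ran=[{"a", "b"}, {"c", "d"}],
    shard_failed=[{"a"}, {"c"}],
)
print(len(report["stale"]), sorted(report["now_passing"]), sorted(report["refreshed"]))
```

With zero stale entries and zero regressions, the refreshed size is the old size minus the now-passing count, which is the same arithmetic as 963 - 187 = 776 in the run above.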

Make delta_spark_ut.yml a reusable workflow (on: workflow_call) and call it from velox_backend_x86.yml so the Delta tests reuse the native lib + arrow jars that workflow already builds, instead of duplicating the build-native-lib-centos-7 job. GitHub artifacts cannot be shared across workflows, so the only way to reuse the artifact is to run the Delta jobs in the same workflow run.

delta_spark_ut.yml keeps a workflow_dispatch trigger for standalone manual runs (its build-native-lib-centos-7 job is gated to that case and skipped when called); the pull_request trigger is removed so the suite no longer double-runs.

velox_backend_x86.yml gains an arrow-jars upload on its native build and a delta-spark-ut job that calls the reusable workflow. That job runs on every velox trigger like the other spark-test jobs, since core/velox/substrait/cpp changes can affect Delta query offload.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Address PR review feedback:

- setup-delta.sh: replace the shallow-clone + full-clone fallback (which ran a destructive `rm -rf "$DELTA_DIR"`) with a single `git init` + shallow `fetch --depth 1 origin "$DELTA_REF"` + `checkout FETCH_HEAD`. This resolves a tag, branch, or commit SHA uniformly (`git clone --branch` rejects SHAs), drops the dead fallback branch, and removes the unguarded recursive delete.
- compare-test-results.py: in enforce mode, a missing/typoed --known-failures path made load_entries() return an empty set, silently degrading to seed mode and passing the gate without enforcing regressions. Treat a missing baseline file as a configuration error (exit 2); an existing-but-empty file is still allowed and legitimately seeds.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
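The distinction between a missing baseline and an empty one can be illustrated with a short sketch. Only the name `load_entries` and the exit code 2 come from the commit message; the function bodies, the `enforce` wrapper, the exit code 1 for regressions, and the one-entry-per-line file format are assumptions.

```python
import sys
import tempfile
from pathlib import Path

EXIT_OK = 0
EXIT_REGRESSION = 1      # assumption: the real script's value is not shown
EXIT_CONFIG_ERROR = 2    # stated in the commit message


def load_entries(path):
    """Return the non-blank lines of an existing file as a set."""
    text = Path(path).read_text(encoding="utf-8")
    return {line.strip() for line in text.splitlines() if line.strip()}


def enforce(known_failures_path, failed):
    """Gate `failed` against the baseline, refusing to run without a baseline file."""
    baseline_path = Path(known_failures_path)
    if not baseline_path.is_file():
        # A typo'd path must not look like an empty baseline.
        print(f"error: baseline not found: {baseline_path}", file=sys.stderr)
        return EXIT_CONFIG_ERROR
    baseline = load_entries(baseline_path)
    if not baseline:
        return EXIT_OK  # existing-but-empty file: legitimate seed run
    return EXIT_REGRESSION if set(failed) - baseline else EXIT_OK


with tempfile.TemporaryDirectory() as tmp:
    empty = Path(tmp) / "known-failures.txt"
    empty.write_text("", encoding="utf-8")
    missing = Path(tmp) / "known-failurse.txt"  # deliberately misspelled path
    seed_code = enforce(empty, {"SuiteA: t1"})
    missing_code = enforce(missing, {"SuiteA: t1"})

print(seed_code, missing_code)
```

Checking for the file before loading keeps the two cases apart at the call site, so the loader never has to encode "file absent" as an empty set.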

Address PR review feedback with four robustness fixes:

- compare-test-results.py (enforce/seed): raise NoReportsError and exit 2 when no JUnit <testsuite> elements are parsed, instead of warning and returning empty sets. Otherwise a misconfiguration (wrong reports dir, broken reporter, suites crashing before writing XML) yields zero failures -> zero regressions -> a silent green gate.
- compare-test-results.py (aggregate): exit 2 before writing baseline-out when no per-shard failures-*.txt / ran-*.txt inputs are found. The gate-list download is continue-on-error and aggregate runs with if: always(), so missing artifacts would otherwise produce an empty baseline that could be committed, wiping known-failures.txt.
- setup-delta.sh: pass the Delta ref after `--` in git fetch so a ref starting with `-` can't be misread as a git option (the script is workflow_dispatch-runnable with a user-supplied ref).
- velox_backend_x86.yml: drop secrets: inherit from the reusable Delta UT call. delta_spark_ut.yml references no secrets, so inheriting them needlessly forwards all caller secrets to a workflow that clones and runs external code.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
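The first fix in the list above, failing loudly when no reports are parsed, might look roughly like this. `NoReportsError` is named in the commit message; the function `collect_results`, the `"classname: name"` key format, and the decision to skip unparseable files are assumptions for this sketch.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


class NoReportsError(Exception):
    """No JUnit <testsuite> element was found under the reports directory."""


def collect_results(reports_dir):
    """Return (ran, failed) test-name sets parsed from JUnit XML reports."""
    ran, failed = set(), set()
    suites_seen = 0
    for xml_path in sorted(Path(reports_dir).rglob("*.xml")):
        try:
            root = ET.parse(str(xml_path)).getroot()
        except ET.ParseError:
            continue  # assumption: a truncated report is skipped, not fatal
        for suite in root.iter("testsuite"):  # also matches a <testsuite> root
            suites_seen += 1
            for case in suite.iter("testcase"):
                name = f"{case.get('classname')}: {case.get('name')}"
                ran.add(name)
                if case.find("failure") is not None or case.find("error") is not None:
                    failed.add(name)
    if suites_seen == 0:
        # Zero suites means zero failures, which would otherwise pass the gate.
        raise NoReportsError(f"no <testsuite> elements under {reports_dir}")
    return ran, failed


SAMPLE = """<testsuite name="SuiteA" tests="2">
  <testcase classname="SuiteA" name="t1"/>
  <testcase classname="SuiteA" name="t2"><failure message="boom"/></testcase>
</testsuite>"""

with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "TEST-SuiteA.xml").write_text(SAMPLE, encoding="utf-8")
    ran, failed = collect_results(tmp)

with tempfile.TemporaryDirectory() as empty_dir:
    try:
        collect_results(empty_dir)
        empty_outcome = "green"
    except NoReportsError:
        empty_outcome = "config error"  # a caller would map this to exit 2

print(sorted(ran), sorted(failed), empty_outcome)
```

Raising an exception rather than returning empty sets forces every caller to decide what an empty run means instead of inheriting a passing result by default.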

Some Delta-on-Gluten MERGE tests that write deletion vectors fail non-deterministically: the same bundle passes them on one CI run and fails them on the next (native RoaringBitmapArray addSafe aborting on an invalid Long.MAX_VALUE row index). Such tests cannot live in known-failures.txt -- baselining them reds the gate on every run where they pass, and leaving them out reds it on every run where they fail.

Add a flaky-tests.txt quarantine list read by the gate. A quarantined test is neutral: it never counts as a regression when it fails nor as now-passing when it passes, and is excluded from the regenerated baseline (aggregate mode). The suite portion of each entry is an fnmatch glob so one line covers a root-cause family across generated suite variants (e.g. *DVs*Suite); the test name is matched exactly.

Seed the list with the DV-merge family behind the native row-index bug. This is an interim measure -- entries should be removed once that bug is fixed in the native backend so the tests are enforced again.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
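The quarantine matching rule (glob on the suite, exact match on the test name) can be sketched with the standard-library `fnmatch` module. The on-disk format of flaky-tests.txt is not shown here, so this sketch takes entries as already-parsed `(suite_glob, test_name)` tuples; the function names and all suite and test names below are hypothetical.

```python
from fnmatch import fnmatchcase


def is_quarantined(suite, test, flaky_entries):
    """True if (suite, test) matches any (suite_glob, exact_test_name) entry."""
    return any(
        fnmatchcase(suite, suite_glob) and test == test_name
        for suite_glob, test_name in flaky_entries
    )


def classify(results, baseline, flaky_entries):
    """results maps (suite, test) -> True if the test passed."""
    regressions, now_passing = set(), set()
    for (suite, test), passed in results.items():
        if is_quarantined(suite, test, flaky_entries):
            continue  # neutral: neither a regression nor now-passing
        key = (suite, test)
        if not passed and key not in baseline:
            regressions.add(key)
        elif passed and key in baseline:
            now_passing.add(key)
    return regressions, now_passing


flaky = [("*DVs*Suite", "merge with DV")]
baseline = {
    ("MergeIntoDVsSuite", "other test"),
    ("MergeIntoDVsCDCSuite", "merge with DV"),
}
results = {
    ("MergeIntoDVsSuite", "merge with DV"): False,    # quarantined failure
    ("MergeIntoDVsCDCSuite", "merge with DV"): True,  # quarantined pass
    ("MergeIntoSuite", "merge with DV"): False,       # suite glob does not match
    ("MergeIntoDVsSuite", "other test"): True,        # test name does not match
}
regressions, now_passing = classify(results, baseline, flaky)
print(sorted(regressions), sorted(now_passing))
```

`fnmatchcase` is used instead of `fnmatch.fnmatch` so matching stays case-sensitive on every platform; whether the real gate makes the same choice is not stated in the commit message.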

Velox has no Arrow representation for VariantType, so the native columnar write path -- which converts the incoming rows to Velox batches via RowToVeloxColumnarExec.toArrowSchema -- throws `UnsupportedOperationException: Unsupported data type: variant` at runtime. This broke every Delta write whose schema contains a variant column (INSERT, UPDATE, MERGE, OPTIMIZE/auto-compact, checkpoint-driven rewrites), since GlutenOptimisticTransaction.writeFiles always offloaded the write to the native writer (the now-removed code path built the Velox plan unconditionally).

Guard GlutenOptimisticTransaction.writeFiles: if the input schema contains a variant at any nesting level, delegate to super.writeFiles (the vanilla Delta write path) instead of offloading. Non-variant writes are unaffected. The check matches by type name so it stays source-compatible across Spark versions.

Adds GlutenDeltaVariantWriteSuite covering top-level, struct-nested, and UPDATE variant writes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
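The "variant at any nesting level" check is a recursive walk over the schema. The real guard lives in Scala inside GlutenOptimisticTransaction; the sketch below is only a Python analogue that walks a schema in the nested-dict shape of Spark's JSON schema format, and it assumes the variant type serializes under the name `"variant"`. The function names and the `"vanilla"`/`"native"` labels are hypothetical.

```python
def contains_variant(data_type):
    """True if the type, or anything nested inside it, is named 'variant'."""
    if isinstance(data_type, str):          # atomic types are plain strings
        return data_type == "variant"       # match by type name, not by class
    kind = data_type.get("type")
    if kind == "struct":
        return any(contains_variant(f["type"]) for f in data_type["fields"])
    if kind == "array":
        return contains_variant(data_type["elementType"])
    if kind == "map":
        return (contains_variant(data_type["keyType"])
                or contains_variant(data_type["valueType"]))
    return False


def choose_write_path(schema):
    """Fall back to the vanilla writer when any column holds a variant."""
    return "vanilla" if contains_variant(schema) else "native"


flat = {"type": "struct", "fields": [
    {"name": "id", "type": "long", "nullable": False, "metadata": {}},
]}
nested = {"type": "struct", "fields": [
    {"name": "id", "type": "long", "nullable": False, "metadata": {}},
    {"name": "payload", "nullable": True, "metadata": {}, "type": {
        "type": "array", "containsNull": True, "elementType": {
            "type": "struct", "fields": [
                {"name": "v", "type": "variant", "nullable": True, "metadata": {}},
            ]}}},
]}
print(choose_write_path(flat), choose_write_path(nested))
```

Matching on the type's name rather than importing its class is what lets the same check work against Spark versions that do not define a variant type at all.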

github-actions Bot added VELOX INFRA DOCS labels Jun 27, 2026

felipepessoto force-pushed the delta_pipeline_fix_tests branch from d39550f to 95ce39c Compare June 27, 2026 09:37

github-actions Bot removed the VELOX label Jun 27, 2026

felipepessoto force-pushed the delta_pipeline_fix_tests branch 4 times, most recently from b1fe046 to 2e09921 Compare July 3, 2026 22:09

github-actions Bot added the VELOX label Jul 3, 2026

felipepessoto and others added 7 commits July 3, 2026 22:10

Change order of steps

eb052c4

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

felipepessoto force-pushed the delta_pipeline_fix_tests branch 2 times, most recently from b2b5176 to 5fe2f12 Compare July 4, 2026 00:44

felipepessoto force-pushed the delta_pipeline_fix_tests branch from 5fe2f12 to 0381143 Compare July 4, 2026 00:49


felipepessoto wants to merge 8 commits into apache:main from felipepessoto:delta_pipeline_fix_tests


1 participant
