Delta pipeline fix tests#12386

Draft
felipepessoto wants to merge 8 commits into apache:main from felipepessoto:delta_pipeline_fix_tests

@felipepessoto

Contributor

What changes are proposed in this pull request?

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?

@felipepessoto felipepessoto force-pushed the delta_pipeline_fix_tests branch from d39550f to 95ce39c Compare June 27, 2026 09:37
@github-actions github-actions Bot removed the VELOX label Jun 27, 2026
@felipepessoto felipepessoto force-pushed the delta_pipeline_fix_tests branch 4 times, most recently from b1fe046 to 2e09921 Compare July 3, 2026 22:09
@github-actions github-actions Bot added the VELOX label Jul 3, 2026
felipepessoto and others added 7 commits July 3, 2026 22:10
…es baseline

Run delta-io/delta's `spark` ScalaTest suite against a Gluten Velox bundle in CI
and gate the results against a committed baseline so the many expected Delta-on-
Gluten failures stay manageable and can be fixed incrementally without letting
currently-passing tests silently regress.

What it adds (.github/workflows/util/delta-spark-ut/):
- delta_spark_ut.yml: builds the native lib + Gluten bundle, then runs the Delta
  spark suite sharded by suite into 4 shards x 4 forked test JVMs (~16-way), and
  gates each shard against the baseline.
- compare-test-results.py: the gate. Per shard, regressions (failed not in the
  baseline) fail the build; newly-passing baselined tests are flagged so the
  baseline can be tightened. Also supports seed/aggregate modes.
- known-failures.txt: the committed baseline of expected failures.
- setup-delta.sh: clones Delta, injects the Gluten bundle, patches
  DeltaSQLCommandTest, and force-fails the two DeletionVectorsSuite 2B-row tests
  whose native row-index materialization OOM-kills the runner and hangs the shard.
- README.md: how the pipeline, gating and baseline-refresh work.
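
The per-shard gate described above comes down to two set differences. The following is a minimal sketch of that comparison, not the actual compare-test-results.py: the `Suite::test` identifier format, the function names, and the exit code of 1 are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateResult:
    regressions: frozenset    # failed this run but absent from the baseline
    newly_passing: frozenset  # baselined, ran on this shard, and did not fail


def gate_shard(ran, failed, baseline):
    """Compare one shard's results against the committed baseline."""
    ran, failed, baseline = frozenset(ran), frozenset(failed), frozenset(baseline)
    regressions = failed - baseline
    # Only tests that actually ran on this shard can be called newly passing;
    # baselined tests owned by another shard are simply not observed here.
    newly_passing = (baseline & ran) - failed
    return GateResult(regressions, newly_passing)


def exit_code(result):
    """Regressions fail the build; newly passing tests are only reported."""
    return 1 if result.regressions else 0
```

Restricting `newly_passing` to tests that ran keeps a shard from flagging baseline entries that belong to another shard.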

The workflow also carries a hang watchdog that thread-dumps and kills a wedged
fork, and tunes the per-fork heap (2G) and off-heap (2G) to fit the ~16G runner.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
…line

Delta's data-skipping, limit-push-down, column-pruning and scan-metric tests
collect file-source scans by matching the concrete `FileSourceScanExec` case
class. Under the Gluten Velox bundle the scan is offloaded to
DeltaScanTransformer, a sibling that implements the same `FileSourceScanLike`
interface but is not FileSourceScanExec, so the match misses and the scan
looks absent. This surfaced as `scala.MatchError: List()` (~56
DataSkipping*/DeltaLimitPushDown* tests), empty generated-column partition
filters (~45 OptimizeGeneratedColumnSuite tests) and broken column-pruning /
scan-metric checks across the Delete, Update, Merge, DeletionVectors and
RowId suites and the TestsStatistics helper.

Gluten copies `partitionFilters` and the other accessors these tests read
verbatim onto the offloaded scan, so results are identical to vanilla -- only
the test's `case` match breaks. Fix it by cherry-picking the two merged
upstream Delta commits that widen these matches to the shared
`FileSourceScanLike` interface (behavior-preserving for vanilla, which also
implements it):

  * delta-io/delta#7104 -- ScanReportHelper.collectScans
  * delta-io/delta#7105 -- the remaining 9 test sources, its follow-up

Both are merged on Delta master but land after the ref this workflow builds
against (v4.2.0), so setup-delta.sh cherry-picks them onto the shallow
checkout. Each fetches the fix commit at depth 2 (commit + parent) so
cherry-pick can compute the parent->fix diff, and uses `cherry-pick -n` so no
committer identity is required. Once the pinned DELTA_REF advances to include
a commit, its cherry-pick becomes a clean no-op and that block can be removed.
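
As an illustration of the fetch-then-apply sequence, the sketch below builds the git commands for a list of fix commits. setup-delta.sh is a shell script, so this Python helper is hypothetical, as are the `origin` remote name, the use of `FETCH_HEAD` as the cherry-pick target, and the placeholder SHAs a caller would pass in.

```python
import subprocess


def cherry_pick_commands(fix_shas, remote="origin"):
    """Return the git argv lists that apply each fix commit without committing."""
    commands = []
    for sha in fix_shas:
        # Depth 2 brings in the fix commit plus its parent, which cherry-pick
        # needs in order to compute the parent->fix diff on a shallow checkout.
        commands.append(["git", "fetch", "--depth", "2", remote, sha])
        # -n (--no-commit) applies the change to the index and working tree
        # only, so no committer identity has to be configured.
        commands.append(["git", "cherry-pick", "-n", "FETCH_HEAD"])
    return commands


def run_all(commands, cwd):
    """Run the commands in order and stop at the first failure."""
    for argv in commands:
        subprocess.run(argv, cwd=cwd, check=True)
```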

The cherry-picks run before the DeletionVectorsSuite 2B-row force-fail step:
that step sed-injects fail() into DeletionVectorsSuite.scala, which
delta-io/delta#7105 also edits, and git cherry-pick refuses to apply onto a
working tree with uncommitted changes to a file it touches (exit 128).

Refresh known-failures.txt from run 28299900971 (the delta-spark-aggregate job
output), which ran all 19073 tests across 16 shards. This removes 187
now-passing tests with 0 regressions, shrinking the baseline from 963 to 776
entries. ~147 come from the fixes above
(DataSkipping*, DeltaLimitPushDown*, OptimizeGeneratedColumnSuite, MergeInto*,
RowIdSuite); the remaining ~40 are other suites that now pass (e.g.
HiveConvertToDeltaSuite, BitmapAggregatorE2ESuite). Verified against the
per-shard ran/failed lists: every baseline entry was observed this run (0
stale), so nothing was dropped due to a crashed or incomplete shard.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Make delta_spark_ut.yml a reusable workflow (on: workflow_call) and call it from
velox_backend_x86.yml so the Delta tests reuse the native lib + arrow jars that
workflow already builds, instead of duplicating the build-native-lib-centos-7
job. GitHub artifacts are scoped by default to the workflow run that uploaded
them, so the simplest way to reuse the artifact is to run the Delta jobs in the
same workflow run.

delta_spark_ut.yml keeps a workflow_dispatch trigger for standalone manual runs
(its build-native-lib-centos-7 job is gated to that case and skipped when
called); the pull_request trigger is removed so the suite no longer double-runs.
velox_backend_x86.yml gains an arrow-jars upload on its native build and a
delta-spark-ut job that calls the reusable workflow. That job runs on every
velox trigger like the other spark-test jobs, since core/velox/substrait/cpp
changes can affect Delta query offload.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Address PR review feedback:

- setup-delta.sh: replace the shallow-clone + full-clone fallback (which ran a
  destructive `rm -rf "$DELTA_DIR"`) with a single `git init` + shallow
  `fetch --depth 1 origin "$DELTA_REF"` + `checkout FETCH_HEAD`. This resolves a
  tag, branch, or commit SHA uniformly (`git clone --branch` rejects SHAs),
  drops the dead fallback branch, and removes the unguarded recursive delete.

- compare-test-results.py: in enforce mode, a missing/typoed --known-failures
  path made load_entries() return an empty set, silently degrading to seed mode
  and passing the gate without enforcing regressions. Treat a missing baseline
  file as a configuration error (exit 2); an existing-but-empty file is still
  allowed and legitimately seeds.
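
The distinction between a missing baseline and an empty one can be sketched as follows. This is an assumed shape rather than the script's actual code: `load_entries` is the function named above, but its body, the `#` comment convention, `MissingBaselineError`, and `main` are hypothetical.

```python
import os
import sys


class MissingBaselineError(Exception):
    """The --known-failures path does not point at a file."""


def load_entries(path):
    """Read one entry per line, skipping blank lines and '#' comments."""
    with open(path, encoding="utf-8") as handle:
        lines = (line.strip() for line in handle)
        return {line for line in lines if line and not line.startswith("#")}


def load_baseline_for_enforce(path):
    # A missing or mistyped path is a configuration error, not an empty baseline.
    if not os.path.isfile(path):
        raise MissingBaselineError(f"known-failures file not found: {path}")
    # An existing but empty file is allowed and yields an empty set.
    return load_entries(path)


def main(path):
    try:
        load_baseline_for_enforce(path)
    except MissingBaselineError as err:
        print(f"error: {err}", file=sys.stderr)
        return 2
    return 0
```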

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Address PR review feedback with four robustness fixes:

- compare-test-results.py (enforce/seed): raise NoReportsError and exit 2 when no
  JUnit <testsuite> elements are parsed, instead of warning and returning empty
  sets. Otherwise a misconfiguration (wrong reports dir, broken reporter, suites
  crashing before writing XML) yields zero failures -> zero regressions -> a
  silent green gate.
- compare-test-results.py (aggregate): exit 2 before writing baseline-out when no
  per-shard failures-*.txt / ran-*.txt inputs are found. The gate-list download is
  continue-on-error and aggregate runs with if: always(), so missing artifacts
  would otherwise produce an empty baseline that could be committed, wiping
  known-failures.txt.
- setup-delta.sh: pass the Delta ref after `--` in git fetch so a ref starting
  with `-` can't be misread as a git option (the script is workflow_dispatch-
  runnable with a user-supplied ref).
- velox_backend_x86.yml: drop secrets: inherit from the reusable Delta UT call.
  delta_spark_ut.yml references no secrets, so inheriting them needlessly forwards
  all caller secrets to a workflow that clones and runs external code.
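
To illustrate the first fix, here is a minimal sketch of a report parser that refuses to return empty results when nothing was parsed. `NoReportsError` is the name used above; the `*.xml` file pattern, the `classname::name` identifier format, and the function name are assumptions.

```python
import glob
import os
import xml.etree.ElementTree as ET


class NoReportsError(Exception):
    """Raised when a reports directory yields no <testsuite> elements."""


def collect_results(reports_dir):
    """Return (ran, failed) sets of test ids parsed from JUnit XML files."""
    ran, failed = set(), set()
    suites_seen = 0
    for path in sorted(glob.glob(os.path.join(reports_dir, "*.xml"))):
        root = ET.parse(path).getroot()
        # iter() includes the root itself, so this handles both a bare
        # <testsuite> root and a <testsuites> wrapper.
        for suite in root.iter("testsuite"):
            suites_seen += 1
            for case in suite.iter("testcase"):
                test_id = f"{case.get('classname')}::{case.get('name')}"
                ran.add(test_id)
                if case.find("failure") is not None or case.find("error") is not None:
                    failed.add(test_id)
    if suites_seen == 0:
        # Zero parsed suites means a broken setup, not a clean run.
        raise NoReportsError(f"no JUnit <testsuite> elements under {reports_dir}")
    return ran, failed
```

A caller would catch `NoReportsError` and exit 2, so that an empty reports directory cannot pass as zero failures.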

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Some Delta-on-Gluten MERGE tests that write deletion vectors fail
non-deterministically: the same bundle passes them on one CI run and
fails them on the next (native RoaringBitmapArray addSafe aborting on an
invalid Long.MAX_VALUE row index). Such tests cannot live in
known-failures.txt -- baselining them turns the gate red on every run where
they pass, and leaving them out turns it red on every run where they fail.

Add a flaky-tests.txt quarantine list read by the gate. A quarantined
test is neutral: it never counts as a regression when it fails nor as
now-passing when it passes, and is excluded from the regenerated
baseline (aggregate mode). The suite portion of each entry is an fnmatch
glob so one line covers a root-cause family across generated suite
variants (e.g. *DVs*Suite); the test name is matched exactly.
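
The matching rule (glob on the suite, exact match on the test name) can be sketched with the standard `fnmatch` module. The `::` separator between suite and test name and the function names are assumptions for illustration; the real entry format in flaky-tests.txt may differ.

```python
from fnmatch import fnmatchcase


def split_entry(entry, sep="::"):
    """Split 'suite::test name' at the first separator."""
    suite, _, test = entry.partition(sep)
    return suite, test


def is_quarantined(test_id, flaky_entries):
    """True if the suite matches an entry's glob and the test name is identical."""
    suite, test = split_entry(test_id)
    for entry in flaky_entries:
        suite_glob, flaky_test = split_entry(entry)
        # fnmatchcase keeps the comparison case-sensitive on every platform.
        if fnmatchcase(suite, suite_glob) and test == flaky_test:
            return True
    return False


def apply_quarantine(test_ids, flaky_entries):
    """Drop quarantined tests so they count as neither regressions nor now-passing."""
    return {t for t in test_ids if not is_quarantined(t, flaky_entries)}
```

Filtering both the failed set and the baseline through `apply_quarantine` before comparing them makes a quarantined test neutral in either direction.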

Seed the list with the DV-merge family behind the native row-index bug.
This is an interim measure -- entries should be removed once that bug is
fixed in the native backend so the tests are enforced again.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@felipepessoto felipepessoto force-pushed the delta_pipeline_fix_tests branch 2 times, most recently from b2b5176 to 5fe2f12 Compare July 4, 2026 00:44
Velox has no Arrow representation for VariantType, so the native columnar write
path -- which converts the incoming rows to Velox batches via
RowToVeloxColumnarExec.toArrowSchema -- throws
`UnsupportedOperationException: Unsupported data type: variant` at runtime. This
broke every Delta write whose schema contains a variant column (INSERT, UPDATE,
MERGE, OPTIMIZE/auto-compact, checkpoint-driven rewrites), since
GlutenOptimisticTransaction.writeFiles always offloaded the write to the native
writer (the now-removed code path built the Velox plan unconditionally).

Guard GlutenOptimisticTransaction.writeFiles: if the input schema contains a
variant at any nesting level, delegate to super.writeFiles (the vanilla Delta
write path) instead of offloading. Non-variant writes are unaffected. The check
matches by type name so it stays source-compatible across Spark versions.
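
The nesting-aware check is easiest to see as a recursion over the schema. The actual guard lives in Scala inside GlutenOptimisticTransaction; the sketch below is only an illustration in Python, walking the JSON form of a Spark schema and assuming the variant type appears there under the type name `variant`. Both function names are hypothetical.

```python
def contains_variant(data_type):
    """Return True if a Spark schema JSON node holds a variant at any depth."""
    if isinstance(data_type, str):
        # Primitive types are plain strings; match the variant by name.
        return data_type == "variant"
    kind = data_type.get("type")
    if kind == "struct":
        return any(contains_variant(field["type"]) for field in data_type["fields"])
    if kind == "array":
        return contains_variant(data_type["elementType"])
    if kind == "map":
        return (contains_variant(data_type["keyType"])
                or contains_variant(data_type["valueType"]))
    return False


def choose_write_path(schema):
    """Fall back to the vanilla writer when the native one cannot handle the schema."""
    return "vanilla" if contains_variant(schema) else "native"
```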

Adds GlutenDeltaVariantWriteSuite covering top-level, struct-nested, and UPDATE
variant writes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@felipepessoto felipepessoto force-pushed the delta_pipeline_fix_tests branch from 5fe2f12 to 0381143 Compare July 4, 2026 00:49