Delta pipeline fix tests #12386
Draft
felipepessoto wants to merge 8 commits into
…es baseline

Run delta-io/delta's `spark` ScalaTest suite against a Gluten Velox bundle in CI and gate the results against a committed baseline so the many expected Delta-on-Gluten failures stay manageable and can be fixed incrementally without letting currently-passing tests silently regress.

What it adds (.github/workflows/util/delta-spark-ut/):
- delta_spark_ut.yml: builds the native lib + Gluten bundle, then runs the Delta spark suite sharded by suite into 4 shards x 4 forked test JVMs (~16-way), and gates each shard against the baseline.
- compare-test-results.py: the gate. Per shard, regressions (failed not in the baseline) fail the build; newly-passing baselined tests are flagged so the baseline can be tightened. Also supports seed/aggregate modes.
- known-failures.txt: the committed baseline of expected failures.
- setup-delta.sh: clones Delta, injects the Gluten bundle, patches DeltaSQLCommandTest, and force-fails the two DeletionVectorsSuite 2B-row tests whose native row-index materialization OOM-kills the runner and hangs the shard.
- README.md: how the pipeline, gating and baseline-refresh work.

The workflow also carries a hang watchdog that thread-dumps and kills a wedged fork, and tunes the per-fork heap (2G) and off-heap (2G) to fit the ~16G runner.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
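The gating rule described for compare-test-results.py can be sketched as plain set arithmetic. This is a minimal illustration, not the script itself: the function names, the exit code 1 for regressions, and the choice to restrict "newly passing" to tests that actually ran in the shard are assumptions.

```python
def gate(ran, failed, baseline):
    """Classify one shard's results against the committed baseline.

    ran, failed and baseline are sets of test identifiers (any hashable key).
    """
    # A regression is a failure the baseline does not already expect.
    regressions = sorted(failed - baseline)
    # Assumption: only baselined tests that ran in this shard can be reported
    # as newly passing; entries that belong to other shards are ignored.
    now_passing = sorted((baseline & ran) - failed)
    return regressions, now_passing


def exit_code(regressions):
    """Regressions fail the build; newly-passing tests are only flagged.

    The value 1 is a hypothetical choice for this sketch.
    """
    return 1 if regressions else 0
```

With this shape, a shard stays green while every failure is in the baseline, and the newly-passing list is purely advisory input for tightening known-failures.txt.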
…line

Delta's data-skipping, limit-push-down, column-pruning and scan-metric tests collect file-source scans by matching the concrete `FileSourceScanExec` case class. Under the Gluten Velox bundle the scan is offloaded to DeltaScanTransformer, a sibling that implements the same `FileSourceScanLike` interface but is not FileSourceScanExec, so the match misses and the scan looks absent. This surfaced as `scala.MatchError: List()` (~56 DataSkipping*/DeltaLimitPushDown* tests), empty generated-column partition filters (~45 OptimizeGeneratedColumnSuite tests) and broken column-pruning / scan-metric checks across the Delete, Update, Merge, DeletionVectors and RowId suites and the TestsStatistics helper.

Gluten copies `partitionFilters` and the other accessors these tests read verbatim onto the offloaded scan, so results are identical to vanilla -- only the test's `case` match breaks. Fix it by cherry-picking the two merged upstream Delta commits that widen these matches to the shared `FileSourceScanLike` interface (behavior-preserving for vanilla, which also implements it):
* delta-io/delta#7104 -- ScanReportHelper.collectScans
* delta-io/delta#7105 -- the remaining 9 test sources, its follow-up

Both are merged on Delta master but land after the ref this workflow builds against (v4.2.0), so setup-delta.sh cherry-picks them onto the shallow checkout. Each fetches the fix commit at depth 2 (commit + parent) so cherry-pick can compute the parent->fix diff, and uses `cherry-pick -n` so no committer identity is required. Once the pinned DELTA_REF advances to include a commit its cherry-pick becomes a clean no-op and that block can be removed.

The cherry-picks run before the DeletionVectorsSuite 2B-row force-fail step: that step sed-injects fail() into DeletionVectorsSuite.scala, which delta-io/delta#7105 also edits, and git cherry-pick refuses to apply onto a working tree with uncommitted changes to a file it touches (exit 128).

Refresh known-failures.txt from run 28299900971 (the delta-spark-aggregate job output), which ran all 19073 tests across 16 shards: removes 187 now-passing tests with 0 regressions, 963 -> 776. ~147 come from the fixes above (DataSkipping*, DeltaLimitPushDown*, OptimizeGeneratedColumnSuite, MergeInto*, RowIdSuite); the remaining ~40 are other suites that now pass (e.g. HiveConvertToDeltaSuite, BitmapAggregatorE2ESuite). Verified against the per-shard ran/failed lists: every baseline entry was observed this run (0 stale), so nothing was dropped due to a crashed or incomplete shard.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
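The matching problem fixed by the cherry-picks is a type-test issue, so it can be illustrated with a toy model. The Python classes below are stand-ins that only borrow the names from the commit message; they are not the Spark, Delta, or Gluten types, and `collect_scans` is a hypothetical helper.

```python
class FileSourceScanLike:
    """Stand-in for the shared interface; carries what the tests read."""

    def __init__(self, partition_filters):
        self.partition_filters = partition_filters


class FileSourceScanExec(FileSourceScanLike):
    """Stand-in for the concrete vanilla scan node."""


class DeltaScanTransformer(FileSourceScanLike):
    """Stand-in for the offloaded scan: a sibling, not a subclass."""


def collect_scans(plan_nodes, node_type):
    """Return the plan nodes that are instances of node_type."""
    return [node for node in plan_nodes if isinstance(node, node_type)]


offloaded_plan = [DeltaScanTransformer(["part = 1"])]

# Matching the concrete class misses the offloaded scan entirely.
narrow = collect_scans(offloaded_plan, FileSourceScanExec)
# Matching the shared interface finds it, with the same accessor values.
wide = collect_scans(offloaded_plan, FileSourceScanLike)
```

In this model the narrow match yields an empty list while the wide match yields the offloaded scan, and a vanilla `FileSourceScanExec` node satisfies both, which is why widening the match preserves behavior for vanilla runs.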
Make delta_spark_ut.yml a reusable workflow (on: workflow_call) and call it from velox_backend_x86.yml so the Delta tests reuse the native lib + arrow jars that workflow already builds, instead of duplicating the build-native-lib-centos-7 job. GitHub artifacts cannot be shared across workflows, so the only way to reuse the artifact is to run the Delta jobs in the same workflow run.

delta_spark_ut.yml keeps a workflow_dispatch trigger for standalone manual runs (its build-native-lib-centos-7 job is gated to that case and skipped when called); the pull_request trigger is removed so the suite no longer double-runs.

velox_backend_x86.yml gains an arrow-jars upload on its native build and a delta-spark-ut job that calls the reusable workflow. That job runs on every velox trigger like the other spark-test jobs, since core/velox/substrait/cpp changes can affect Delta query offload.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Address PR review feedback:
- setup-delta.sh: replace the shallow-clone + full-clone fallback (which ran a destructive `rm -rf "$DELTA_DIR"`) with a single `git init` + shallow `fetch --depth 1 origin "$DELTA_REF"` + `checkout FETCH_HEAD`. This resolves a tag, branch, or commit SHA uniformly (`git clone --branch` rejects SHAs), drops the dead fallback branch, and removes the unguarded recursive delete.
- compare-test-results.py: in enforce mode, a missing/typoed --known-failures path made load_entries() return an empty set, silently degrading to seed mode and passing the gate without enforcing regressions. Treat a missing baseline file as a configuration error (exit 2); an existing-but-empty file is still allowed and legitimately seeds.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
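The missing-baseline rule can be sketched as follows. This is an illustration under assumptions: `load_baseline`, `MissingBaselineError`, and `main` are hypothetical names (the commit only names `load_entries()` and the exit code 2), and one entry per non-blank line is an assumed file layout.

```python
import os
import sys

EXIT_CONFIG_ERROR = 2  # exit code named in the commit message


class MissingBaselineError(Exception):
    """Raised when enforce mode is pointed at a baseline that does not exist."""


def load_baseline(path, enforce):
    """Load baseline entries, one per non-blank line (assumed layout)."""
    if not os.path.isfile(path):
        if enforce:
            # A missing or typoed path must not degrade silently to seeding.
            raise MissingBaselineError(f"baseline file not found: {path}")
        return set()
    with open(path, encoding="utf-8") as fh:
        # An existing-but-empty file is valid and yields an empty baseline.
        return {line.strip() for line in fh if line.strip()}


def main(path, enforce=True):
    try:
        baseline = load_baseline(path, enforce)
    except MissingBaselineError as exc:
        print(f"error: {exc}", file=sys.stderr)
        return EXIT_CONFIG_ERROR
    print(f"loaded {len(baseline)} baseline entries")
    return 0
```

The key distinction is between "file absent" (a configuration error) and "file present but empty" (a legitimate seed), which an empty-set return value alone cannot express.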
Address PR review feedback with four robustness fixes:
- compare-test-results.py (enforce/seed): raise NoReportsError and exit 2 when no JUnit <testsuite> elements are parsed, instead of warning and returning empty sets. Otherwise a misconfiguration (wrong reports dir, broken reporter, suites crashing before writing XML) yields zero failures -> zero regressions -> a silent green gate.
- compare-test-results.py (aggregate): exit 2 before writing baseline-out when no per-shard failures-*.txt / ran-*.txt inputs are found. The gate-list download is continue-on-error and aggregate runs with if: always(), so missing artifacts would otherwise produce an empty baseline that could be committed, wiping known-failures.txt.
- setup-delta.sh: pass the Delta ref after `--` in git fetch so a ref starting with `-` can't be misread as a git option (the script is workflow_dispatch-runnable with a user-supplied ref).
- velox_backend_x86.yml: drop secrets: inherit from the reusable Delta UT call. delta_spark_ut.yml references no secrets, so inheriting them needlessly forwards all caller secrets to a workflow that clones and runs external code.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
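The "no reports parsed" guard from the first bullet can be sketched with the standard library XML parser. `NoReportsError` is the name from the commit message; `collect_results`, the recursive `*.xml` search, and the `(classname, name)` test key are assumptions made for this sketch.

```python
import glob
import os
import xml.etree.ElementTree as ET


class NoReportsError(Exception):
    """No JUnit <testsuite> elements were found under the reports directory."""


def collect_results(reports_dir):
    """Return (ran, failed) sets of (classname, name) keys from JUnit XML."""
    ran, failed = set(), set()
    suites_seen = 0
    pattern = os.path.join(reports_dir, "**", "*.xml")
    for path in glob.glob(pattern, recursive=True):
        root = ET.parse(path).getroot()
        # iter() also yields the root itself when it is a <testsuite>.
        for suite in root.iter("testsuite"):
            suites_seen += 1
            for case in suite.iter("testcase"):
                key = (case.get("classname"), case.get("name"))
                ran.add(key)
                if case.find("failure") is not None or case.find("error") is not None:
                    failed.add(key)
    if suites_seen == 0:
        # Zero suites means misconfiguration, not a clean run: refuse to gate.
        raise NoReportsError(f"no <testsuite> elements under {reports_dir}")
    return ran, failed
```

A caller would map `NoReportsError` to exit code 2, so an empty reports directory can never be mistaken for a run with zero failures.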
Some Delta-on-Gluten MERGE tests that write deletion vectors fail non-deterministically: the same bundle passes them on one CI run and fails them on the next (native RoaringBitmapArray addSafe aborting on an invalid Long.MAX_VALUE row index). Such tests cannot live in known-failures.txt -- baselining them reds the gate on every run where they pass, and leaving them out reds it on every run where they fail.

Add a flaky-tests.txt quarantine list read by the gate. A quarantined test is neutral: it never counts as a regression when it fails nor as now-passing when it passes, and is excluded from the regenerated baseline (aggregate mode). The suite portion of each entry is an fnmatch glob so one line covers a root-cause family across generated suite variants (e.g. *DVs*Suite); the test name is matched exactly.

Seed the list with the DV-merge family behind the native row-index bug. This is an interim measure -- entries should be removed once that bug is fixed in the native backend so the tests are enforced again.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
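The quarantine semantics can be sketched as a filter applied before the gate's set arithmetic. The function names, the `(suite, test)` tuple representation, and the sample suite and test names are hypothetical; `fnmatchcase` is used here as the case-sensitive member of Python's `fnmatch` module, which is an assumption about how the real gate matches.

```python
from fnmatch import fnmatchcase


def is_quarantined(suite, test, flaky):
    """True if (suite, test) matches any (suite_glob, exact_test_name) entry."""
    return any(
        fnmatchcase(suite, pattern) and test == name
        for pattern, name in flaky
    )


def gate_with_quarantine(ran, failed, baseline, flaky):
    """Gate on (suite, test) pairs, treating quarantined tests as neutral."""

    def enforced(tests):
        return {(s, t) for s, t in tests if not is_quarantined(s, t, flaky)}

    # Dropping quarantined tests from every set makes them invisible to both
    # the regression check and the newly-passing check.
    ran, failed, baseline = enforced(ran), enforced(failed), enforced(baseline)
    regressions = sorted(failed - baseline)
    now_passing = sorted((baseline & ran) - failed)
    return regressions, now_passing
```

Filtering the inputs rather than the outputs keeps the neutrality rule in one place, and the same `enforced` filter could be reused when regenerating the baseline in aggregate mode.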
Velox has no Arrow representation for VariantType, so the native columnar write path -- which converts the incoming rows to Velox batches via RowToVeloxColumnarExec.toArrowSchema -- throws `UnsupportedOperationException: Unsupported data type: variant` at runtime. This broke every Delta write whose schema contains a variant column (INSERT, UPDATE, MERGE, OPTIMIZE/auto-compact, checkpoint-driven rewrites), since GlutenOptimisticTransaction.writeFiles always offloaded the write to the native writer (the now-removed code path built the Velox plan unconditionally).

Guard GlutenOptimisticTransaction.writeFiles: if the input schema contains a variant at any nesting level, delegate to super.writeFiles (the vanilla Delta write path) instead of offloading. Non-variant writes are unaffected. The check matches by type name so it stays source-compatible across Spark versions.

Adds GlutenDeltaVariantWriteSuite covering top-level, struct-nested, and UPDATE variant writes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
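The "variant at any nesting level" check is a recursive walk over the schema. The sketch below uses a toy `DataType` class invented for illustration, with a `type_name` string and a flat `children` list standing in for struct fields, array elements, and map keys/values; it is not the Spark type hierarchy, and `choose_write_path` is a hypothetical stand-in for the guard in writeFiles.

```python
from dataclasses import dataclass, field


@dataclass
class DataType:
    """Toy schema node: a type name plus any nested child types."""

    type_name: str
    children: list = field(default_factory=list)


def contains_variant(dt):
    """True if dt or any nested child has the type name 'variant'."""
    # Comparing by name mirrors the commit's "matches by type name" approach.
    if dt.type_name == "variant":
        return True
    return any(contains_variant(child) for child in dt.children)


def choose_write_path(schema):
    """Fall back to the vanilla path when the native writer cannot cope."""
    return "vanilla" if contains_variant(schema) else "native"


flat = DataType("struct", [DataType("integer"), DataType("string")])
nested = DataType(
    "struct",
    [DataType("integer"), DataType("struct", [DataType("variant")])],
)
```

Here `choose_write_path(flat)` selects the native path and `choose_write_path(nested)` selects the vanilla one, matching the top-level versus struct-nested cases the new suite is described as covering.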
What changes are proposed in this pull request?
How was this patch tested?
Was this patch authored or co-authored using generative AI tooling?