Migrate to pyspark testing utils by ghanse · Pull Request #1107 · databrickslabs/dqx

ghanse · 2026-04-08T16:37:53Z

Changes

Replace chispa.assert_df_equality with pyspark.testing.utils.assertDataFrameEqual across all test files
Remove chispa from pyproject.toml test dependencies and uv.lock
Update the centralized assert_df_equality_ignore_fingerprints wrapper in tests/integration/conftest.py to translate chispa-style kwargs (ignore_nullable, ignore_column_order, ignore_row_order) to PySpark equivalents (checkRowOrder)
Handle chispa-specific transforms parameter in e2e PII notebook by applying transforms before assertion
Remove chispa==0.10.1 %pip install from e2e Databricks notebooks
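
The kwarg translation and the `transforms` handling described above can be sketched roughly as follows. This is a minimal illustration, not the wrapper from `tests/integration/conftest.py`: the names `translate_chispa_kwargs`, `apply_transforms`, and `assert_df_equal_chispa_style` are hypothetical, and the sketch assumes that `ignore_nullable` can simply be dropped and that column order can be aligned with a `select` before the assertion.

```python
from __future__ import annotations

from functools import reduce
from typing import Any, Callable, Iterable


def translate_chispa_kwargs(**chispa_kwargs: Any) -> dict[str, Any]:
    """Translate chispa-style keyword arguments into assertDataFrameEqual keywords.

    Hypothetical helper for illustration only.
    """
    remaining = dict(chispa_kwargs)
    # chispa checks row order unless ignore_row_order=True;
    # assertDataFrameEqual expresses the same choice as checkRowOrder.
    ignore_row_order = remaining.pop("ignore_row_order", False)
    # Column order is handled by the caller below; nullability is dropped
    # (an assumption of this sketch).
    remaining.pop("ignore_column_order", None)
    remaining.pop("ignore_nullable", None)
    if remaining:
        raise TypeError(f"unsupported chispa-style arguments: {sorted(remaining)}")
    return {"checkRowOrder": not ignore_row_order}


def apply_transforms(df: Any, transforms: Iterable[Callable[[Any], Any]] | None) -> Any:
    """Apply each transform in order, mirroring chispa's `transforms` parameter."""
    return reduce(lambda acc, transform: transform(acc), transforms or [], df)


def assert_df_equal_chispa_style(actual, expected, transforms=None, **chispa_kwargs):
    """Compare two DataFrames with chispa-style arguments (hypothetical wrapper)."""
    # Imported lazily so the helpers above work without a Spark installation.
    from pyspark.testing.utils import assertDataFrameEqual

    # Transforms are applied to both sides before the assertion.
    actual = apply_transforms(actual, transforms)
    expected = apply_transforms(expected, transforms)
    if chispa_kwargs.get("ignore_column_order", False):
        # Align column order explicitly instead of relying on a keyword argument.
        actual = actual.select(*expected.columns)
    assertDataFrameEqual(actual, expected, **translate_chispa_kwargs(**chispa_kwargs))
```

A call such as `assert_df_equal_chispa_style(actual_df, expected_df, ignore_row_order=True)` would then forward `checkRowOrder=False` to `assertDataFrameEqual`. Rejecting unknown keyword arguments is a design choice of this sketch, so that untranslated chispa options fail loudly instead of being ignored.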

Linked issues

Tests

…atible with older Spark versions (#995)

* Improved Agent guidelines
* Make `has_valid_schema` check compatible with older Spark versions (< 4)
* Fix issue with subqueries in sql `expression` check in Serverless v5 when check name is not provided (auto-derived)
* Added guidelines on how to configure DQX with LDP/DLT (Lakeflow Declarative Pipelines) to enable incrementalization for Materialized Views (MVs)
* Improved validation of required check function arguments
* Fix CI issues

Resolves: #1053

- [x] manually tested
- [x] added unit tests
- [x] added integration tests
- [ ] added end-to-end tests
- [ ] added performance tests

---------

Co-authored-by: Marcin Wojtyczka <marcin.wojtyczka@databricks.com>
Co-authored-by: mwojtyczka <mwojtyczka@users.noreply.github.com>

github-actions · 2026-04-20T07:00:16Z

✅ 20/20 passed, 1 skipped, 1h32m48s total

Running from acceptance #4406

github-actions · 2026-04-20T07:24:44Z

✅ 176/176 passed, 1 flaky, 9h21m27s total

Flaky tests:

🤪 test_ensemble_scoring_distributed_path (1m1.084s)

Running from anomaly #519

codecov · 2026-04-20T08:48:17Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 55.62%. Comparing base (aa31ae6) to head (9da1356).

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #1107   +/-   ##
=======================================
  Coverage   55.62%   55.62%           
=======================================
  Files         101      101           
  Lines        9418     9418           
=======================================
  Hits         5239     5239           
  Misses       4179     4179

Flag	Coverage Δ
unit	`55.62% <ø> (ø)`

mwojtyczka and others added 19 commits March 27, 2026 14:10

tighten up deps

e9d33ff

updated blueprint

424a883

updated comments

f9a5649

updated hatchling version

b172546

refactored workflows to avoid azure login, move perf tests to nightly job

29bc0a3

updated nightly

319c0c2

Merge branch 'main' into refactor_workflows

90291bd

pin databricks cli installation to a sha

26dbe28

improved bug template

9dba8e9

disabled release gh workflow

b280285

Add committed python version file

cd787fe

Migrate project from hatch to uv toolchain

2c14a56

Update actions to use uv with limited permissions

83b94ba

Pin yq

873d5a7

Update lock file

f96f575

Add dbt dependencies to test group

db4c4a6

Add build constraints file

2a599c3

Fix github actions

aca5e2f

feat: migrate from chispa to pyspark testing utils

e92ae0e

Replace assert_df_equality with assertDataFrameEqual from pyspark.testing.utils. Remove chispa dependency. Closes #1105 Co-authored-by: Isaac

ghanse requested a review from a team as a code owner April 8, 2026 16:37

ghanse removed the request for review from a team April 8, 2026 16:37

ghanse added the needs-review label (Ready for re-review) Apr 8, 2026

ghanse requested a review from gergo-databricks April 8, 2026 16:37

ghanse added the needs-changes label (Changes required after review) and removed the needs-review label (Ready for re-review) Apr 8, 2026

docs: update testing docs to use pyspark assertDataFrameEqual

e5759e8

Replace chispa references with pyspark.testing.utils. Part of #1105 Co-authored-by: Isaac

ghanse added the needs-review (Ready for re-review) and needs-cleanup (Review passed, minor cleanup before merge) labels and removed the needs-changes (Changes required after review) and needs-review (Ready for re-review) labels Apr 8, 2026

mwojtyczka added the Approved to Merge label (When PR is reviewed and approved. To be merged once all tests pass) and removed the under-review (This PR is currently being reviewed by one of DQX maintainers.) and needs-approval (Reviewed, and awaiting maintainer approval) labels Apr 15, 2026

ghanse mentioned this pull request Apr 16, 2026

Remove unused chispa test dependency databrickslabs/dbldatagen#408

Merged

4 tasks

ghanse and others added 2 commits April 20, 2026 08:44

Merge branch 'main' into ghanse/issue-1105-migrate-pyspark-testing

f32fc0a

fmt

5ab27f7

mwojtyczka temporarily deployed to tool April 20, 2026 07:47 — with GitHub Actions Inactive

mwojtyczka temporarily deployed to tool April 20, 2026 08:48 — with GitHub Actions Inactive

mwojtyczka had a problem deploying to tool April 20, 2026 08:48 — with GitHub Actions Failure

mwojtyczka temporarily deployed to tool April 20, 2026 08:48 — with GitHub Actions Inactive

Merge branch 'main' into ghanse/issue-1105-migrate-pyspark-testing

1d837fe

mwojtyczka temporarily deployed to tool April 21, 2026 11:46 — with GitHub Actions Inactive

mwojtyczka had a problem deploying to tool April 21, 2026 11:46 — with GitHub Actions Failure

mwojtyczka temporarily deployed to tool April 21, 2026 11:46 — with GitHub Actions Inactive

mwojtyczka temporarily deployed to tool April 21, 2026 11:48 — with GitHub Actions Inactive

mwojtyczka had a problem deploying to tool April 21, 2026 11:48 — with GitHub Actions Failure

fmt

9da1356

mwojtyczka requested a deployment to tool April 21, 2026 12:51 — with GitHub Actions In progress

mwojtyczka deployed to tool April 21, 2026 12:51 — with GitHub Actions Active

mwojtyczka requested a deployment to tool April 21, 2026 12:52 — with GitHub Actions In progress

ghanse wants to merge 29 commits into main from ghanse/issue-1105-migrate-pyspark-testing

2 participants
