Migrate to pyspark testing utils #1107

Open
ghanse wants to merge 29 commits into main from ghanse/issue-1105-migrate-pyspark-testing

Conversation

@ghanse
Collaborator

@ghanse ghanse commented Apr 8, 2026

Changes

  • Replace chispa.assert_df_equality with pyspark.testing.utils.assertDataFrameEqual across all test files
  • Remove chispa from pyproject.toml test dependencies and uv.lock
  • Update the centralized assert_df_equality_ignore_fingerprints wrapper in tests/integration/conftest.py to translate chispa-style kwargs (ignore_nullable, ignore_column_order, ignore_row_order) to PySpark equivalents (checkRowOrder)
  • Handle chispa-specific transforms parameter in e2e PII notebook by applying transforms before assertion
  • Remove chispa==0.10.1 %pip install from e2e Databricks notebooks
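
The keyword translation and the transform handling listed above could be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `translate_chispa_kwargs` and `apply_transforms` are hypothetical names, and only `checkRowOrder` is forwarded because it is the only PySpark argument the description names. How the real wrapper treats `ignore_nullable` and `ignore_column_order` is not shown here.

```python
from functools import reduce
from typing import Any, Callable, Dict, Iterable


def translate_chispa_kwargs(
    ignore_row_order: bool = False,
    ignore_column_order: bool = False,
    ignore_nullable: bool = False,
) -> Dict[str, Any]:
    """Map chispa-style flags to keyword arguments for assertDataFrameEqual.

    Hypothetical helper. ignore_column_order and ignore_nullable are accepted
    so existing call sites keep working, but this sketch does not forward them.
    """
    # chispa checks row order unless told to ignore it, while
    # assertDataFrameEqual ignores row order unless checkRowOrder=True,
    # so the flag has to be inverted to keep the old behaviour.
    return {"checkRowOrder": not ignore_row_order}


def apply_transforms(df: Any, transforms: Iterable[Callable[[Any], Any]] = ()) -> Any:
    """Apply each transform to df in order, replacing chispa's `transforms` argument."""
    return reduce(lambda acc, fn: fn(acc), transforms, df)
```

With PySpark 3.5 or later installed, a call site might then read `assertDataFrameEqual(apply_transforms(actual, ts), apply_transforms(expected, ts), **translate_chispa_kwargs(ignore_row_order=True))`, where `ts` is the list of functions previously passed as `transforms`.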

Linked issues

Closes #1105

Tests

  • manually tested
  • added unit tests
  • added integration tests
  • added end-to-end tests
  • added performance tests

@ghanse requested a review from a team as a code owner April 8, 2026 16:37
@ghanse removed the request for review from a team April 8, 2026 16:37
@ghanse added the needs-review (Ready for re-review) label Apr 8, 2026
@ghanse requested a review from gergo-databricks April 8, 2026 16:37
@ghanse added the needs-changes (Changes required after review) label and removed the needs-review (Ready for re-review) label Apr 8, 2026
Replace chispa references with pyspark.testing.utils.

Part of #1105

Co-authored-by: Isaac
@ghanse added the needs-review (Ready for re-review) and needs-cleanup (Review passed, minor cleanup before merge) labels and removed the needs-changes (Changes required after review) and needs-review (Ready for re-review) labels Apr 8, 2026
@mwojtyczka added the Approved to Merge (When PR is reviewed and approved. To be merged once all tests pass) label and removed the under-review (This PR is currently being reviewed by one of DQX maintainers.) and needs-approval (Reviewed, and awaiting maintainer approval) labels Apr 15, 2026
ghanse and others added 2 commits April 20, 2026 08:44
…atible with older Spark versions (#995)

* Improved Agent guidelines
* Make `has_valid_schema` check compatible with older Spark versions (< 4)
* Fix issue with subqueries in SQL `expression` check in Serverless v5 when the check name is not provided (auto-derived)
* Added guidelines on how to configure DQX with LDP/DLT (Lakeflow
Declarative Pipelines) to enable incrementalization for Materialized
Views (MVs)
* Improved validation of required check function arguments
* Fix CI issues

Resolves: #1053

- [x] manually tested
- [x] added unit tests
- [x] added integration tests
- [ ] added end-to-end tests
- [ ] added performance tests

---------

Co-authored-by: Marcin Wojtyczka <marcin.wojtyczka@databricks.com>
Co-authored-by: mwojtyczka <mwojtyczka@users.noreply.github.com>
@github-actions

github-actions bot commented Apr 20, 2026

✅ 20/20 passed, 1 skipped, 1h32m48s total

Running from acceptance #4406

@github-actions

github-actions bot commented Apr 20, 2026

✅ 176/176 passed, 1 flaky, 9h21m27s total

Flaky tests:

  • 🤪 test_ensemble_scoring_distributed_path (1m1.084s)

Running from anomaly #519

@codecov

codecov bot commented Apr 20, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 55.62%. Comparing base (aa31ae6) to head (9da1356).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1107   +/-   ##
=======================================
  Coverage   55.62%   55.62%           
=======================================
  Files         101      101           
  Lines        9418     9418           
=======================================
  Hits         5239     5239           
  Misses       4179     4179           
Flag Coverage Δ
unit 55.62% <ø> (ø)
