fix: restore column-vs-literal comparison in isGreaterThan/isLessThan family (#227) by nikolauspschuetz · Pull Request #273 · awslabs/python-deequ

nikolauspschuetz · 2026-06-26T18:56:49Z

Problem

isGreaterThanOrEqualTo (and the rest of the comparator family) regressed for column-vs-literal comparisons. This used to work:

check.isGreaterThanOrEqualTo("cluster_size", "1", hint="Cluster should have at least one element")

but now fails with the constraint message "Input data does not include column 1!" because the second operand is interpreted strictly as a column name. Reported in #227.

Root cause

This rode in with the bundled Deequ jar upgrade to 2.0.x (PyDeequ is pinned to com.amazon.deequ:deequ:2.0.8-... via pydeequ/configs.py). In Deequ 1.2.x the comparators built a plain SQL predicate ("<colA> >= <colB>") and passed no columns list, so a literal second operand worked. In Deequ 2.0.x the comparators pass columns = List(columnA, columnB), and Deequ validates that every entry is a real dataframe column — so literals fail. The PyDeequ wrappers themselves never changed; they just forward to the regressed Scala methods.
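To make the difference concrete, here is a toy model in plain Python of the up-front column check described above. It is not Deequ's code: the function name missing_column_messages and the sample schema are hypothetical, and only the message text is taken from the failure reported in #227.

```python
from typing import List, Sequence


def missing_column_messages(declared: Sequence[str], schema: Sequence[str]) -> List[str]:
    """Toy stand-in for the validation step: one message per declared
    column that is not present in the dataframe schema."""
    return [
        f"Input data does not include column {name}!"
        for name in declared
        if name not in schema
    ]


# Hypothetical dataframe schema, for illustration only.
schema = ["cluster_id", "cluster_size"]

# 2.0.x-style call: both operands are declared as columns, so "1" is rejected.
print(missing_column_messages(["cluster_size", "1"], schema))

# 1.2.x-style call: no columns are declared, so nothing is rejected up front.
print(missing_column_messages([], schema))
```

Under this model, passing an empty columns list is what lets a literal second operand through, which is the behavior the fix restores.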

Fix

Route the whole comparator family (isLessThan, isLessThanOrEqualTo, isGreaterThan, isGreaterThanOrEqualTo) through Deequ's public satisfies(...) with an empty columns list — exactly the pre-2.0 behavior. Column-vs-column comparisons are unchanged; column-vs-literal/expression works again. Applied across the entire family for consistency.

Note: as with Deequ's own satisfies, columnB is treated as a Spark SQL expression, so string literals must be quoted by the caller (e.g. "'foo'"). This matches the original 1.2.x semantics.
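A minimal sketch of the predicate construction and the caller-side quoting contract. The helper name build_comparison_predicate is hypothetical (the real wrapper lives in pydeequ/checks.py and may differ), and the backtick quoting of column A follows the predicate shape described later in this thread.

```python
ALLOWED_OPERATORS = {"<", "<=", ">", ">="}


def build_comparison_predicate(column_a: str, operator: str, column_b: str) -> str:
    """Build a SQL predicate string for a comparator check.

    column_a is backtick-quoted as an identifier; column_b is inserted
    verbatim, so it may be a column name, a literal, or an expression.
    """
    if operator not in ALLOWED_OPERATORS:
        raise ValueError(f"unsupported operator: {operator!r}")
    return f"`{column_a}` {operator} {column_b}"


# Numeric literal: passed through as written.
print(build_comparison_predicate("cluster_size", ">=", "1"))

# String literal: the caller supplies the SQL quotes.
print(build_comparison_predicate("name", "<", "'foo'"))

# Without the quotes, foo would be read as a column reference.
print(build_comparison_predicate("name", "<", "foo"))
```

The last call shows why the quoting has to stay with the caller: the wrapper cannot tell whether an unquoted foo was meant as a column or as a string.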

Tests

Added to tests/test_checks.py:

test_comparator_against_literal — column-vs-literal for all four comparators, expecting Success.
test_fail_comparator_against_literal — a failing literal comparison, expecting Failure.

Validated against real Spark 3.5 / Deequ 2.0.8: the 2 new tests plus all 8 existing column-vs-column comparator tests pass (10 passed). CI will exercise the full pyspark 3.1/3.2/3.3/3.5 matrix.

Closes #227

github-actions

Generated by AI (model: us.anthropic.claude-opus-4-6-v1, prompt: 416310f3) — may not be fully accurate. Reply if this doesn't help.

nikolauspschuetz · 2026-06-27T17:30:38Z

Ready for review. Restores column-vs-literal comparison across the isGreaterThan*/isLessThan* family — a regression that rode in with the Deequ 2.0.x jar (it now routes through satisfies with an empty columns list, the pre-2.0 behavior). Validated locally on Spark 3.5: full test_checks.py → 88 passed. cc @sudsali @chenliu0831 — would appreciate your review. Closes #227.

…ly (awslabs#227)

Deequ 2.0.x's Check.isLessThan/isLessThanOrEqualTo/isGreaterThan/isGreaterThanOrEqualTo forward columns = List(columnA, columnB) to satisfies, which makes Deequ require both operands to be existing columns. This regressed the long-supported column-vs-literal usage (e.g. isGreaterThanOrEqualTo("cluster_size", "1")), failing with 'Input data does not include column 1!' (issue awslabs#227).

Route the comparator family through Deequ's satisfies with an empty columns list (the pre-2.0 behaviour), building the SQL predicate in the wrapper. columnB may now be a column name or a SQL literal/expression. Column-vs-column comparisons are unchanged.

Adds regression tests for column-vs-literal comparisons.

nikolauspschuetz · 2026-06-29T17:17:34Z

Thanks for the automated review pass. The current revision addresses these findings:

columnA quoting: the generated predicate backtick-quotes column A (`{columnA}` {operator} {columnB}), so column-A names with spaces, special characters, or SQL reserved words stay valid.
columnB left raw (intentional): restoring the column-vs-literal usage from #227 (Regression in behavior of check comparator function isGreaterThanOrEqualTo) is the whole point. columnB may be a column name, a SQL literal, or a SQL expression, so quoting is the caller's responsibility, the same contract as Deequ's own satisfies. This is documented in the _column_comparison docstring.
Default assertion: now routed through satisfies$default$3 (_ == 1.0), which is the correct default now that the comparator family genuinely goes through satisfies. It matches each comparator's own Deequ default.
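As an illustration of what the default assertion (_ == 1.0) means for these predicates, the sketch below computes the fraction of rows that satisfy a predicate and compares it to 1.0. It uses Python's built-in sqlite3 as a stand-in for Spark SQL, so dialect details differ; the table, the column names, and the compliance helper are all hypothetical.

```python
import sqlite3
from typing import Iterable, Tuple


def compliance(rows: Iterable[Tuple[int, int]], predicate: str) -> float:
    """Return the fraction of rows for which the SQL predicate holds."""
    conn = sqlite3.connect(":memory:")
    try:
        # One column name contains a space to show why identifiers get backticks.
        conn.execute("CREATE TABLE data (cluster_size INTEGER, `max size` INTEGER)")
        conn.executemany("INSERT INTO data VALUES (?, ?)", rows)
        query = f"SELECT AVG(CASE WHEN {predicate} THEN 1.0 ELSE 0.0 END) FROM data"
        (fraction,) = conn.execute(query).fetchone()
        return fraction
    finally:
        conn.close()


rows = [(1, 5), (3, 5), (2, 5)]

# Column vs literal: every row satisfies it, so the default assertion passes.
print(compliance(rows, "`cluster_size` >= 1") == 1.0)

# Column vs literal that one row violates: the default assertion fails.
print(compliance(rows, "`cluster_size` >= 2") == 1.0)

# Column vs column, with the second operand quoted by the caller.
print(compliance(rows, "`cluster_size` <= `max size`") == 1.0)
```

With a default of exactly 1.0, a single violating row is enough to fail the check, which is the behavior the failing-literal regression test relies on.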

Resolving the threads accordingly.

github-actions Bot reviewed Jun 26, 2026

Comment thread pydeequ/checks.py Outdated

Comment thread pydeequ/checks.py Outdated

Comment thread pydeequ/checks.py

nikolauspschuetz marked this pull request as ready for review June 27, 2026 17:30

nikolauspschuetz force-pushed the fix/issue-227-comparator-literal-regression branch from 50fc96c to 83c28e2 Compare June 27, 2026 17:41

github-actions Bot reviewed Jun 27, 2026

Comment thread pydeequ/checks.py

nikolauspschuetz wants to merge 1 commit into awslabs:master from nikolauspschuetz:fix/issue-227-comparator-literal-regression
