feat: Tolerances for inner lists and arrays by MariusMerkleQC · Pull Request #21 · Quantco/diffly

Marius Merkle (MariusMerkleQC) · 2026-03-27T08:13:40Z

Motivation

Follow-up to #19; this fully resolves #8. Until now, we fell back to naive comparison for list-vs-list comparisons whenever the lists were nested inside another data structure (an array of lists, a struct of structs of lists, etc.). Element-wise comparison that accounts for tolerances is now applied instead: whenever two columns contain a list anywhere in their data type "tree", we compute the maximum list length, where the maximum is taken over both
(1) left and right data frame
(2) on any level in the data type tree

In list-vs-list comparisons, we then traverse all elements up to max_list_length and handle out-of-bounds indices by returning None. This doesn't yield false-positive matches because we combine the element-wise check with a list-length check.
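
The padding-plus-length-check idea can be illustrated outside of polars. The following is a plain-Python sketch under stated assumptions, not diffly's implementation: the names `get_or_none` and `lists_equal_with_tolerance` and the `abs_tol` parameter are hypothetical, and `None` stands in for the null returned on out-of-bounds access.

```python
import math


def get_or_none(values, index):
    """Return values[index], or None when the index is out of bounds."""
    return values[index] if index < len(values) else None


def lists_equal_with_tolerance(left, right, max_list_length, abs_tol=1e-6):
    """Compare two flat lists of floats element-wise up to max_list_length."""
    # Length check: padded positions alone must never produce a match.
    if len(left) != len(right):
        return False
    for i in range(max_list_length):
        a, b = get_or_none(left, i), get_or_none(right, i)
        if a is None and b is None:
            continue  # both out of bounds (or both null): nothing to compare
        if a is None or b is None:
            return False
        if not math.isclose(a, b, rel_tol=0.0, abs_tol=abs_tol):
            return False
    return True
```

Because every row is traversed up to the same `max_list_length`, shorter lists are padded with `None`; the explicit length check is what keeps a short list from matching a longer one.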

Changes

Adjusted _max_list_lengths_by_column to consider all data type levels
Adjusted expected outcome in test_condition_equal_columns_nested_list_array_with_tolerance
Added a new test test_condition_equal_columns_lists_only_inner where lists are not an outer but inner data type

codecov · 2026-03-27T08:13:59Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (2a3010b) to head (3d5467a).

Additional details and impacted files

@@            Coverage Diff            @@
##              main       #21   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           10        10           
  Lines          742       759   +17     
=========================================
+ Hits           742       759   +17

Marius Merkle (MariusMerkleQC) · 2026-03-27T08:47:59Z

tests/test_conditions.py

-        assert actual.to_list() == [True, False, False]
-    else:
-        assert actual.to_list() == [True, True, False]
+    assert actual.to_list() == [True, True, False]


We now get the desired output 🥳

Marius Merkle (MariusMerkleQC) · 2026-03-27T08:49:16Z

diffly/comparison.py

+    """Collect max-list-length scalar expressions for every List level in the type
+    tree."""
+    if isinstance(dtype, pl.List):
+        return [expr.list.len().max(), *_list_length_exprs(expr.explode(), dtype.inner)]


The aliases clash, but it doesn't matter because we take max_horizontal immediately after.
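
As a plain-Python analogue of the recursion above (a sketch only: it walks nested values rather than polars expressions, dicts stand in for structs, and `max_list_length` is a hypothetical function name), the maximum is taken over every list level and then over both frames:

```python
def max_list_length(value):
    """Return the longest list length found at any nesting level of value."""
    if isinstance(value, list):
        # This level's length, plus whatever is nested inside its elements.
        return max([len(value), *(max_list_length(v) for v in value)])
    if isinstance(value, dict):
        # Dicts stand in for structs: recurse into each field.
        return max((max_list_length(v) for v in value.values()), default=0)
    return 0  # scalars (and None) contain no lists


# Each column is modelled as a list of rows.
left_column = [{"a": [[1.0, 2.0], [3.0]]}, {"a": []}]
right_column = [{"a": [[1.0, 2.0, 3.0, 4.0]]}, {"a": None}]

# Maximum over both data frames and over every level of the type tree.
overall = max(
    max(max_list_length(row) for row in left_column),
    max(max_list_length(row) for row in right_column),
)
```

Only the single largest value is kept, which mirrors why clashing aliases on the per-level expressions are harmless when a horizontal maximum is taken right away.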

Marius Merkle (MariusMerkleQC) · 2026-03-27T08:50:42Z

tests/test_conditions.py

        schema={"pk": pl.Int64, "a_right": rhs_type},
    )

+    max_list_length: int | None = None


This was incorrect in #19. However, I suggest an alternative approach to the tests in #23 to ensure we use the correct max_list_length values.

Copilot

Pull request overview

Extends tolerance-based element-wise comparisons to apply to inner list levels nested inside other types (e.g., structs/arrays containing lists) by computing a per-column maximum list length across the full dtype “tree”.

Changes:

Update _max_list_lengths_by_column to detect lists at any nesting level and compute the maximum list length across both frames and all list levels.
Adjust list-vs-list sequence comparison to always unroll element-wise using the computed max_list_length (and propagate it into recursive comparisons).
Update/extend tests to cover nested inner-list tolerance behavior.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| `diffly/comparison.py` | Computes max list lengths for columns with nested lists anywhere in their dtype tree. |
| `diffly/_conditions.py` | Removes nested-list fallback equality and requires `max_list_length` for List-vs-List unrolling; propagates length to recursive comparisons. |
| `tests/test_conditions.py` | Updates expected results and adds a new test for inner-list-only nesting (struct containing list). |


Oliver Borchert (borchero)

I actually didn't consider this previously but comparing all list elements independently will cause severe performance regressions when running the comparison on (large) lists with non-floats. Can we benchmark this and use all of this complicated logic only if there is a float somewhere in the "list hierarchy" (-> separate PR)?

Oliver Borchert (borchero) · 2026-03-27T18:02:06Z

diffly/_conditions.py

+        if max_list_length is None:
+            raise ValueError(
+                "max_list_length must be provided for List-vs-List comparisons "
+                "in _compare_sequence_columns()."
+            )
        n_elements = max_list_length


Triggering this branch in tests seems artificial. Tbh, I would just assume that this is set correctly (after all, it's a parameter that is exclusively set internally):

Suggested change

-        if max_list_length is None:
-            raise ValueError(
-                "max_list_length must be provided for List-vs-List comparisons "
-                "in _compare_sequence_columns()."
-            )
-        n_elements = max_list_length
+        n_elements = cast(int, max_list_length)
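
One trade-off worth noting for this suggestion: `typing.cast` is only a hint for the static type checker and performs no runtime check, so a `None` would pass through unchanged rather than raise. A minimal illustration (the helper name `n_elements_from` is hypothetical and not part of diffly):

```python
from __future__ import annotations

from typing import cast


def n_elements_from(max_list_length: int | None) -> int:
    # cast() returns its argument unchanged; it only narrows the static type.
    return cast(int, max_list_length)


value = n_elements_from(None)  # no exception is raised here
```

This is acceptable when the parameter is set exclusively by internal callers, as argued in the comment, since the failure mode would then only arise from a bug elsewhere in the library.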

feat: Tolerances for inner lists and arrays

2ad8877

Marius Merkle (MariusMerkleQC) self-assigned this Mar 27, 2026

github-actions bot added the enhancement New feature or request label Mar 27, 2026

Marius Merkle (MariusMerkleQC) added 2 commits March 27, 2026 09:17

fix

ab746e9

remove _max_or_zero

abb8709

Marius Merkle (MariusMerkleQC) commented Mar 27, 2026

Marius Merkle (MariusMerkleQC) mentioned this pull request Mar 27, 2026

test: Combine tests with _max_list_lenghts_by_column #23

Open

Marius Merkle (MariusMerkleQC) requested a review from Copilot March 27, 2026 08:54

Copilot started reviewing on behalf of Marius Merkle (MariusMerkleQC) March 27, 2026 08:55

Copilot AI reviewed Mar 27, 2026

Marius Merkle (MariusMerkleQC) added 2 commits March 27, 2026 10:04

feedback copilot

8afddd2

fix test coverage

3d5467a

Marius Merkle (MariusMerkleQC) marked this pull request as ready for review March 27, 2026 09:10

Marius Merkle (MariusMerkleQC) requested review from EgeKaraismailogluQC and Oliver Borchert (borchero) as code owners March 27, 2026 09:10

Marius Merkle (MariusMerkleQC) linked an issue Mar 27, 2026 that may be closed by this pull request

Properly perform floating point comparisons for structs and lists #8

Open

Oliver Borchert (borchero) approved these changes Mar 27, 2026

Marius Merkle (MariusMerkleQC) wants to merge 5 commits into main from nested_list_comparison

3 participants
