Fix nondeterminism in splitting #482

Merged
robknight merged 2 commits into main from splitting-nondeterminism
Feb 16, 2026

@robknight commented Feb 16, 2026

This addresses two nondeterminism cases:

The first issue came during predicate splitting: we were storing candidate statement indices in a HashSet, and then picking the "best" one with max_by_key.

Most of the time that’s fine, because the scoring function picks a clear winner. But in tie cases, where two statements have exactly the same primary score and secondary tie-breaker metrics, max_by_key returns the last of the equally maximal elements, so the result depends on iterator order. With a HashSet, iteration order is unspecified, and under the standard library's default randomly seeded hasher it can differ from run to run.
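To make the tie behavior concrete, here is a minimal sketch rather than the code from this PR. The function name `pick_last_on_tie` and the `scores` table are hypothetical. The visiting order is passed in as a slice so that the effect of iteration order is visible without relying on a HashSet.

```rust
/// Hypothetical helper: visits candidate indices in the given `order` and
/// returns the one `max_by_key` selects, using `scores[idx]` as the key.
/// When several candidates share the maximum key, `max_by_key` returns the
/// last one visited, so the answer depends on `order`.
fn pick_last_on_tie(order: &[usize], scores: &[u32]) -> Option<usize> {
    order.iter().copied().max_by_key(|&idx| scores[idx])
}
```

With scores of `[5, 9, 9]`, visiting the indices as `[0, 1, 2]` selects index 2, while visiting them as `[0, 2, 1]` selects index 1, even though the candidates and scores are the same.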

That meant the chosen "best next statement" could vary from run to run in those tie situations. Once that first difference happens, downstream ordering, split boundaries, and promoted wildcard sets can also differ, even though the input predicate is identical.

The fix was to make tie resolution explicit and deterministic. The selection key is now (primary_score, tie_breakers, Reverse(idx)). Since max_by_key chooses the maximum key, Reverse(idx) makes smaller indices rank higher in otherwise-equal cases. Because each index is unique, no two keys compare equal, so the result no longer depends on iteration order.
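The key shape described above can be sketched as follows. This is an illustration under assumptions, not the PR's actual code: the function name `pick_best_deterministic` is hypothetical, and a precomputed `scores` table of `(primary, tie_breaker)` pairs stands in for the real scoring function.

```rust
use std::cmp::Reverse;
use std::collections::HashSet;

/// Hypothetical sketch: selects the best candidate index from a HashSet.
/// `scores[idx]` holds an assumed (primary_score, tie_breaker) pair.
/// Tuples compare lexicographically, so `Reverse(idx)` only matters when
/// the first two components are equal, and it then favors the smaller index.
fn pick_best_deterministic(
    candidates: &HashSet<usize>,
    scores: &[(u32, u32)],
) -> Option<usize> {
    candidates.iter().copied().max_by_key(|&idx| {
        let (primary, tie_breaker) = scores[idx];
        (primary, tie_breaker, Reverse(idx))
    })
}
```

If indices 1 and 2 both score `(3, 2)`, this returns index 1 regardless of the order in which the HashSet yields its elements.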

The second issue affected only diagnostic messages, and has been resolved by ensuring a sort order on the message elements.
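One common way to get stable diagnostic text is to sort the elements before formatting them. The sketch below assumes the elements are names held in a HashSet; the function name `format_diagnostic` and the message wording are hypothetical and are not taken from the PR.

```rust
use std::collections::HashSet;

/// Hypothetical sketch: builds a diagnostic string from a set of names.
/// Sorting before joining makes the output independent of the HashSet's
/// iteration order.
fn format_diagnostic(names: &HashSet<String>) -> String {
    let mut sorted: Vec<&str> = names.iter().map(String::as_str).collect();
    sorted.sort_unstable();
    format!("unresolved wildcards: {}", sorted.join(", "))
}
```

An alternative with the same effect would be to store the elements in an ordered collection such as a BTreeSet, so that iteration is already sorted.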

@robknight requested a review from ed255 on February 16, 2026, 13:54

@ed255 left a comment

LGTM! Thanks for the fix :D

@robknight merged commit e950661 into main on Feb 16, 2026
6 checks passed