Skip to content

Conversation

@jcoglan
Copy link
Contributor

@jcoglan jcoglan commented Jan 21, 2026

Overview

This PR represents work so far on a version of mango_selector:match/2 than can return a representation of how the input fails to match the selector, rather than just a boolean. There are a few commits of developing this behaviour incrementally before everything "snaps into place" to get us code paths that can produce failure descriptions, and retain the original boolean behaviour that avoids creating a lot of ephemeral failure lists, and can short-circuit on compound operators.

The rough steps here are as follows; I'm retaining all these commits for now in case we want to compare different designs for code complexity and performance.

  • Add a lot of unit tests to mango_selector that check every match operator and its negation. This is necessary since various compound operators like $or and $allMatch have surprising edge case behaviour on empty lists and we need to make sure this is not broken. Mostly these tests check that if selector S returns true on an input then { "$not": S } returns false and vice versa. There are some exceptions to this due to how $or works and how $and and $or are normalised.
  • Replace the existing implementation with one where every operator returns a possibly-empty list of failures, instead of a boolean. match_int then converts this to a bool on the way out.
  • Replace the Cmp argument to everything with a ctx record that contains cmp, as well as other things needed for failure generation, e.g. the path to the current value, whether matching is currently negated, etc.
  • Try to fix $allMatch and $elemMatch normalisation to avoid the complexity that comes from not being able to apply DeMorgan to them, due to $allMatch being defined to return false for empty lists. This is a breaking change that is reverted later since we decided to retain existing behaviour above all.
  • Implement negation handling, where the presence of a $not has to be communicated to nodes lower down the tree in order to produce good failure messages. Not all negation can be normalised out of the tree and so all operators need to handle being negated during evaluation.
  • Finally, collapse all the complexity into an implementation that supports both the old and new behaviours.

The idea in the design I've ended up with that tries to minimise both complexity and runtime cost is:

  • Add #ctx.verbose which indicates whether a detailed failure description is wanted.
  • Keep the original implementations of all operators as the response when passed #ctx{verbose=false}, i.e. when only a boolean result is needed.
  • For all leaf operators, the #ctx{verbose=true} case can be implemented by calling the #ctx{verbose=false} case, and creating a #failure record if this returns false.
  • #ctx{verbose=false} cases do not need to deal with #ctx{negate=true}; they continue to return their original result and let $not invert it. We only need special handling of #ctx{negate=true} in verbose mode, where the $not operator passes its effect down via the #ctx. This reduces the number of cases each operator has to deal with to basically: non-verbose mode, and positive and negative verbose cases.
  • For compound operators, special code is needed to gather up the failures from internal selectors and deal with edge cases in a way that's consistent with the original implementation.
  • #ctx.path is only updated in #ctx{verbose=true} code paths so this expense is avoided in non-verbose mode. Path items are added to the front of this list as that's cheaper than doing Path ++ [Item]; we would reverse these before returning to a client.
  • #failure records retain a #ctx from where they can access the path and negation state, in order to generate a good human-readable error message later on.
  • The tests are updated to make sure that both verbose modes return consistent results, i.e. if verbose=false returns true, then verbose=true returns [], and if the former returns false, the latter gives a non-empty list. These are all passing.

Testing recommendations

We should benchmark this in its current version, and both verbose modes of this version, against a substantial indexing workload to look for performance regressions. Or, if performance is equivalent in both verbose modes, we can remove a lot of redundancy by removing the verbose flag entirely.

Related Issues or Pull Requests

Checklist

  • Code is written and works correctly
  • Changes are covered by tests
  • Any new configurable parameters are documented in rel/overlay/etc/default.ini
  • Documentation changes were made in the src/docs folder
  • Documentation changes were backported (separated PR) to affected branches

@nickva
Copy link
Contributor

nickva commented Jan 21, 2026

That's a nice approach using a context record in place of the Cmp arg.

All the extra eunit tests are awesome. If you want, could even put them in a separate PR and we'd merge them right away. It would make it easier to review subsequent PRs because we can obviously see all the existing tests pass.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants