evaluator: disjunction performance improvements #2002

Open
myitcv opened this issue Oct 14, 2022 · 1 comment
myitcv commented Oct 14, 2022

Creating a separate umbrella issue for disjunction performance improvements, off the back of #804

The following optimizations are possible for disjunctions:

Early termination upon ambiguity

v0.2 deployed a strategy of first evaluating defaults and then non-defaults. This allowed for early termination:

- as soon as all default disjuncts were processed, evaluation could stop if at least one of them remained valid;
- otherwise, evaluation could stop as soon as two valid answers were obtained, as this already makes the result ambiguous.

The v0.2 implementation had other issues that were addressed in v0.3, and this optimization was dropped in the process. It remains valid for v0.3, however.
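As a sketch of both termination conditions in plain CUE (the values and constraints are illustrative only):

a: *1 | 2 | 3
b: a & >=1
// The default 1 is valid, so b resolves to 1 as soon as the default
// disjuncts have been processed; 2 and 3 need not be evaluated.

c: (1 | 2 | 3) & >=2
// There is no default, and both 2 and 3 are valid answers, so
// evaluation can stop after the second one: c stays the ambiguous,
// non-concrete disjunction 2 | 3 regardless of any further disjuncts.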

Postpone disjuncts with exclusive fields

For disjunctions of the form

#closed: {
    {a: int} | {b: int}
    {c: int} | {d: int}
}

where each top-level field in a disjunct is unique, it suffices to postpone evaluation of the disjunctions until all other conjuncts have been merged. As the fields are mutually exclusive per disjunction, it is then enough to determine which of these fields occur in the other conjuncts and to select the corresponding disjuncts. This would completely bypass disjunction expansion.
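For instance, reusing the #closed definition above (the concrete values are arbitrary):

x: #closed & {
    a: 1
    d: 2
}
// a occurs only in {a: int} and d only in {d: int}, so those two
// disjuncts can be selected directly instead of expanding and
// testing all four combinations of disjuncts.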

Mutually exclusive disjuncts / discriminator fields

Disjunctions whose disjuncts do not unify with one another can be handled more efficiently in the general case.

A simple example is the following:

a: 1 | 2 | 3
b: a
b: 1

Evaluating b is akin to a map lookup. Essentially, for each disjunct it takes constant effort to determine whether there is a match.

The same trick can be applied to structs that are mutually exclusive. Consider the following:

{kind: "Service", name: string} | {kind: "Deployment", name: string}

As long as such a disjunction is matched against a struct that specifies a concrete kind field, only one struct can match. In this case, we call kind a discriminator field. Just as with the trick for exclusive fields, the non-disjunction conjuncts can be unified beforehand to determine the presence and value of the discriminator field, after which disjuncts can be eliminated before fully unifying them. These mutually exclusive fields can be discovered during compilation or once when evaluating an arc.
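For example (the #Workload definition is hypothetical):

#Workload: {kind: "Service", name: string} | {kind: "Deployment", name: string}

w: #Workload & {kind: "Service", name: "frontend"}
// kind is concrete in the non-disjunction conjunct, so the
// "Deployment" disjunct can be discarded by comparing only the
// discriminator field, without unifying name or any other field.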

A more general optimization along these lines is called the QuickCheck (used in LinGo's graph unification, for instance), where it can lead to considerable speedups. (Even though these implementations of graph unification are O(n), they are often used in a setting where many combinations of graphs are attempted, not unlike what CUE accomplishes with disjunctions within a single unification.) This encompasses detecting "hot" fields within a struct that are likely to fail unification. A required-field indication (the foo!: bar annotation proposed elsewhere could be used for that), or fields that are determined to be mutually exclusive by anti-unification of the disjuncts, can be used for this purpose.

References:
https://profs.info.uaic.ro/~ciortuz/PAPERS/ALL/blue.pdf
https://www.cambridge.org/core/journals/natural-language-engineering/article/abs/pet-a-platform-for-experimentation-with-efficient-hpsg-processing-techniques/676B9162641E3DD8ABB846C0F5B105CF
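As a sketch of the required-field hint, assuming the proposed foo!: bar annotation (the #Resource definition is made up for illustration):

#Resource: {kind!: "Service", port: int} | {kind!: "Deployment", replicas: int}

r: #Resource & {kind: "Deployment", replicas: 3}
// kind is marked required in every disjunct, making it a natural
// "hot" field for a quick check: the "Service" disjunct fails on
// kind alone and never needs to be unified in full.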

Structure sharing

Currently, CUE injects the original conjuncts into a new context, triggering the recomputation of these values. When these values are not unified with any new conjuncts, however, there is really no need to trigger this recomputation. The arcs could then be shared between the two nodes. The result is a potentially significant reduction in recomputation.

This technique can make CUE immune to the "billion laughs" attack and guarantee linear evaluation (disregarding comprehensions and complex disjunctions).
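A sketch of the kind of configuration this protects against (the field names are arbitrary):

a0: {x: 1}
a1: {left: a0, right: a0}
a2: {left: a1, right: a1}
a3: {left: a2, right: a2}
// Without sharing, each reference re-evaluates its target, so the
// work doubles at every level; with structure sharing, the arcs of
// a0, a1, and a2 are computed once and reused, keeping evaluation
// linear in the size of the configuration.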

One requirement of this technique is that there are no upward references. The current CUE implementation does have upward references, but they are used only in cases that do not impair the use of structure sharing:
- to retrieve closedness constraints from the parent
- to keep track of the path descended into with the API

Only the first directly affects evaluation. This is not an issue, however: not unifying with new conjuncts also means there is no new closedness information, so the arc has already passed its closedness checks.

The second case should be fairly straightforward to track. The API could detect when an arc is shared and copy the arc structure, keeping a back reference to its original. That would only have to be done one node at a time. Obviously, users of the API would still need to avoid blindly traversing all "virtual" nodes of a structure to benefit from structure sharing. The asymmetry in up references can be used to flag a node as "borrowed".

Structure sharing can also help make the API concurrent.

References:
https://www.researchgate.net/publication/2415772_Quasi-Destructive_Graph_Unification
https://www.aclweb.org/anthology/P00-1045.pdf

myitcv commented Oct 14, 2022

Good test cases

Here is a necessarily incomplete list of good test cases that stress disjunctions:
