evaluator: disjunction performance improvements #2002

Open
myitcv opened this issue Oct 14, 2022 · 1 comment
myitcv commented Oct 14, 2022

Creating a separate umbrella issue for disjunction performance improvements, off the back of #804

The following optimizations are possible for disjunctions:

Early termination upon ambiguity

v0.2 deployed a strategy of first evaluating defaults and then non-defaults. This allowed for early termination:

- as soon as all default disjuncts were processed, evaluation could stop if at least one of them remained valid;
- otherwise, evaluation could stop as soon as two valid answers were obtained, as this already makes the result ambiguous.

The v0.2 implementation had other issues that were addressed in v0.3, and this optimization was dropped in the process. It remains valid for v0.3, however.
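As a sketch of both termination conditions in plain CUE (the values and constraints are illustrative only):

a: *1 | 2 | 3
b: a & >=1
// The default 1 is valid, so b resolves to 1 as soon as the default
// disjuncts have been processed; 2 and 3 need not be evaluated.

c: (1 | 2 | 3) & >=2
// There is no default, and both 2 and 3 are valid answers, so
// evaluation can stop after the second one: c stays the ambiguous,
// non-concrete disjunction 2 | 3 regardless of any further disjuncts.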

Postpone disjuncts with exclusive fields

For disjunctions of the form

#closed: {
    {a: int} | {b: int}
    {c: int} | {d: int}
}

where each top-level field in a disjunct is unique, it suffices to postpone evaluation of the disjunctions until all other conjuncts have been merged. As the fields are mutually exclusive per disjunction, it is then enough to determine which of these fields occur in the other conjuncts and to select the corresponding disjuncts. This would completely bypass disjunction expansion.
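For instance, reusing the #closed definition above (the concrete values are arbitrary):

x: #closed & {
    a: 1
    d: 2
}
// a occurs only in {a: int} and d only in {d: int}, so those two
// disjuncts can be selected directly instead of expanding and
// testing all four combinations of disjuncts.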

Mutually exclusive disjuncts / discriminator fields

Disjunctions whose disjuncts do not unify with one another can be handled more efficiently in the general case.

A simple example is the following:

a: 1 | 2 | 3
b: a
b: 1

Evaluating b is akin to a map lookup. Essentially, for each disjunct it takes constant effort to determine whether there is a match.

The same trick can be applied to structs that are mutually exclusive. Consider the following:

{kind: "Service", name: string} | {kind: "Deployment", name: string}

As long as such a disjunction is matched against a struct that specifies a concrete kind field, only one struct can match. In this case, we call kind a discriminator field. Just as with the trick for exclusive fields, the non-disjunction conjuncts can be unified beforehand to determine the presence and value of the discriminator field, after which disjuncts can be eliminated before fully unifying them. These mutually exclusive fields can be discovered during compilation or once when evaluating an arc.
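For example (the #Workload definition is hypothetical):

#Workload: {kind: "Service", name: string} | {kind: "Deployment", name: string}

w: #Workload & {kind: "Service", name: "frontend"}
// kind is concrete in the non-disjunction conjunct, so the
// "Deployment" disjunct can be discarded by comparing only the
// discriminator field, without unifying name or any other field.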

A more general optimization along these lines is called the QuickCheck (used in LinGo's graph unification, for instance), where it can lead to considerable speedups. (Even though these implementations of graph unification are O(n), they are often used in a setting where many combinations of graphs are attempted, not unlike what CUE accomplishes with disjunctions within a single unification.) This encompasses detecting "hot" fields within a struct that are likely to fail unification. A required-field indication (the foo!: bar annotation proposed elsewhere could be used for that), or fields that are determined to be mutually exclusive by anti-unification of the disjuncts, can be used for this purpose.

References:
https://profs.info.uaic.ro/~ciortuz/PAPERS/ALL/blue.pdf
https://www.cambridge.org/core/journals/natural-language-engineering/article/abs/pet-a-platform-for-experimentation-with-efficient-hpsg-processing-techniques/676B9162641E3DD8ABB846C0F5B105CF
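As a sketch of the required-field hint, assuming the proposed foo!: bar annotation (the #Resource definition is made up for illustration):

#Resource: {kind!: "Service", port: int} | {kind!: "Deployment", replicas: int}

r: #Resource & {kind: "Deployment", replicas: 3}
// kind is marked required in every disjunct, making it a natural
// "hot" field for a quick check: the "Service" disjunct fails on
// kind alone and never needs to be unified in full.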

Structure sharing

Currently, CUE injects the original conjuncts into a new context, triggering the recomputation of these values. When these values are not unified with any new conjuncts, however, there is really no need to trigger this recomputation. The arcs could then be shared between the two nodes. The result is a potentially significant reduction in recomputation.

This technique can make CUE immune to the "billion laughs" attack and guarantee linear evaluation (disregarding comprehensions and complex disjunctions).
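A sketch of the kind of configuration this protects against (the field names are arbitrary):

a0: {x: 1}
a1: {left: a0, right: a0}
a2: {left: a1, right: a1}
a3: {left: a2, right: a2}
// Without sharing, each reference re-evaluates its target, so the
// work doubles at every level; with structure sharing, the arcs of
// a0, a1, and a2 are computed once and reused, keeping evaluation
// linear in the size of the configuration.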

One requirement of this technique is that there are no upward references. The current CUE implementation does have upward references, but they are used only in cases that do not impair the use of structure sharing:
- to retrieve closedness constraints from the parent
- to keep track of the path descended into with the API

Only the first directly affects evaluation. This is not an issue, however: not unifying with new conjuncts also means there is no new closedness information, so the arc has already passed its closedness checks.

The second case should be fairly straightforward to track. The API could detect when an arc is shared and copy the arc structure, keeping a back reference to its original. That would only have to be done one node at a time. Obviously, users of the API would still need to avoid blindly traversing all "virtual" nodes of a structure to benefit from structure sharing. The asymmetry in up references can be used to flag a node as "borrowed".

Structure sharing can also help make the API concurrent.

References:
https://www.researchgate.net/publication/2415772_Quasi-Destructive_Graph_Unification
https://www.aclweb.org/anthology/P00-1045.pdf

myitcv commented Oct 14, 2022

Good test cases

Here is a necessarily incomplete list of good test cases that stress disjunctions:
