Understanding the algorithm

David Nolen edited this page Dec 12, 2014 · 35 revisions

core.match employs the algorithm described in Luc Maranget's paper Compiling Pattern Matching to Good Decision Trees. What follows is a slower-paced description of the algorithm.

Necessity and Specialization

Consider the following pattern match:

(let [x true
      y true
      z true]
  (match [x y z]
    [_ false true] 1
    [false true _] 2
    [_ _ false] 3
    [_ _ true] 4
    :else 5))

It's clear that the above expression will return 4, as the values match the fourth clause.

As a visual aid we will show the pattern matrix as it goes through a series of transformations in the following form:

Fig. 1

 x y z
-------
[_ f t] 1
[f t _] 2
[_ _ f] 3
[_ _ t] 4
[_ _ _] 5

For brevity and alignment we've replaced true with t and false with f. The final :else line is represented as a row of wildcards. Finally, we label the columns with the variables (in this context we may also use the word "occurrence") and number the rows by clause.

Maranget's algorithm is based on a surprisingly simple insight. The semantics of pattern matching dictate that we evaluate the clauses one at a time from top to bottom, and each clause itself left to right. Even so, wildcards give us a surprising amount of leeway.

Observe that for the first row to succeed we never need to consider x. This is the key observation: Maranget's approach produces compact decision trees by finding the columns that must be tested (the necessary columns) and testing them first. This is done by scoring columns to determine what to test next.

So how do we score a column? Wildcards obviously shouldn't add anything to a column's score. Neither should real patterns that appear below a wildcard in the same column. Given these simple rules the scored matrix looks like this:

Fig. 2

 x y z
-------
[0 1 1] 1
[0 1 0] 2
[0 0 0] 3
[0 0 0] 4
[0 0 0] 5
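As a sketch of this scoring rule (Python for illustration only; `column_score` and `pick_column` are our own names, not core.match's):

```python
# "_" stands for a wildcard pattern.
WILD = "_"

def column_score(column):
    """Score = number of patterns above the first wildcard, scanning
    top-down; wildcards and everything below them contribute nothing."""
    score = 0
    for pat in column:
        if pat == WILD:
            break
        score += 1
    return score

def pick_column(matrix):
    """Index of the highest-scoring column (ties go to the leftmost)."""
    ncols = len(matrix[0])
    return max(range(ncols),
               key=lambda c: column_score([row[c] for row in matrix]))

# Fig. 1 as a row-major matrix
matrix = [["_", "f", "t"],
          ["f", "t", "_"],
          ["_", "_", "f"],
          ["_", "_", "t"],
          ["_", "_", "_"]]
```

Here `pick_column(matrix)` selects column 1 (the y column), matching the sums of the scores shown in Fig. 2.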

It's now immediately clear that the y column should be tested first. Maranget's algorithm modifies the pattern matrix by swapping the column to be tested into the first position, giving us:

Fig. 3

 y x z
-------
[f _ t] 1
[t f _] 2
[_ _ f] 3
[_ _ t] 4
[_ _ _] 5

At this point we can generate the first switch of the decision tree. From this step we compute two new pattern matrices: the specialized matrix, which remains after we've made a successful match, and the default matrix, which applies if we don't find a match.

The specialized matrix will look like the following:

Fig. 4

 x z
-----
[_ t] 1

We've dropped the y column as we assume we successfully matched f. The default matrix will look like the following:

Fig. 5

 y x z
-------
[t f _] 2
[_ _ f] 3
[_ _ t] 4
[_ _ _] 5

At this point it should hopefully be clear that we can now recursively apply the above process of selecting the necessary column, and once again splitting between the specialized matrix and the default matrix.
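As a sketch of this specialization step (Python for illustration; the row encoding and the `split` helper are our own, not core.match's):

```python
# Rows are (patterns, action) pairs; "_" is a wildcard.
WILD = "_"

def split(rows, col, value):
    """Specialize on `value` in `col`: matching rows drop the tested
    column; all other rows form the default matrix. Wildcard rows are
    NOT carried into the specialized matrix -- core.match backtracks
    to the default instead (see the note on this divergence below)."""
    spec, default = [], []
    for pats, action in rows:
        if pats[col] == value:
            spec.append((pats[:col] + pats[col + 1:], action))
        else:
            default.append((pats, action))
    return spec, default

# Fig. 3 (columns y, x, z), specialized on f in the y column
rows = [(["f", "_", "t"], 1),
        (["t", "f", "_"], 2),
        (["_", "_", "f"], 3),
        (["_", "_", "t"], 4),
        (["_", "_", "_"], 5)]
spec, default = split(rows, 0, "f")
# spec corresponds to Fig. 4; default to Fig. 5
```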

For completeness we illustrate the remaining steps for Fig. 4. We pick the necessary column.

Fig. 6

 z x
-----
[t _] 1

When we specialize this matrix on t we'll be left with:

Fig. 7

 x
---
[_] 1

We are left with a pattern matrix that only has one row and it is a row of wildcards - we have arrived at a terminal node of the decision tree and should return 1.

Note: A close reading of Maranget's paper (p. 3) will show that we've diverged a bit from the original approach: in Maranget's paper, wildcards are carried forward during specialization. This is a code-size issue, particularly on the JVM, where there are method size limits. core.match on the JVM backtracks via preallocated exceptions, as these get optimized into long jumps. This allows defaults to be shared without carrying them forward.

The Decision Tree

Every time we specialize a pattern matrix we produce a SwitchNode. This node will eventually be compiled down to a test that dispatches either to the result of the specialized matrix or to that of the default matrix. If we reach a pattern matrix containing only a row of wildcards, as illustrated in Fig. 7, we produce a LeafNode. We also produce a LeafNode when we have a pattern matrix whose rows are empty (no columns).

It is possible that we arrive at a matrix which has no rows at all. This means no match exists, and in this case we emit a FailNode, which is compiled to a backtrack.

The decision tree is a directed acyclic graph produced by the described algorithm and it will be composed of SwitchNode, LeafNode, and FailNode instances.

As Maranget shows in his paper, the simple idea of picking the necessary column produces compact decision trees.
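Putting the scoring, swapping, and splitting steps together, a miniature end-to-end sketch for literal-only matrices might look like this (Python for illustration; the node tags mirror SwitchNode, LeafNode, and FailNode, but everything else is our own, and we signal failure with an ordinary error rather than core.match's preallocated exceptions):

```python
WILD = "_"

def leading_score(rows, col):
    """Patterns above the first wildcard in the column, top-down."""
    score = 0
    for pats, _ in rows:
        if pats[col] == WILD:
            break
        score += 1
    return score

def compile_matrix(rows, occs):
    """Build a decision tree of ("switch", ...), ("leaf", ...) and
    ("fail",) nodes from rows of (patterns, action) pairs."""
    if not rows:
        return ("fail",)                      # no rows: backtrack
    pats, action = rows[0]
    if all(p == WILD for p in pats):
        return ("leaf", action)               # row of wildcards: done
    # pick the necessary column and the constructor to test
    col = max(range(len(occs)), key=lambda c: leading_score(rows, c))
    value = next(p[col] for p, _ in rows if p[col] != WILD)
    spec = [(p[:col] + p[col + 1:], a) for p, a in rows if p[col] == value]
    default = [(p, a) for p, a in rows if p[col] != value]
    return ("switch", occs[col], value,
            compile_matrix(spec, occs[:col] + occs[col + 1:]),
            compile_matrix(default, occs))

def run(tree, env):
    """Walk the tree against an occurrence->value environment."""
    if tree[0] == "leaf":
        return tree[1]
    if tree[0] == "fail":
        raise ValueError("no match")
    _, occ, value, spec, default = tree
    return run(spec if env[occ] == value else default, env)

# Fig. 1 again
rows = [(["_", "f", "t"], 1),
        (["f", "t", "_"], 2),
        (["_", "_", "f"], 3),
        (["_", "_", "t"], 4),
        (["_", "_", "_"], 5)]
tree = compile_matrix(rows, ["x", "y", "z"])
```

The root of `tree` switches on y against f, exactly as Fig. 3 suggests, and running it against x = y = z = true yields 4, the same result as the original match expression.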

Constructors

Thus far we have only demonstrated matching on literals. Pattern matching derives most of its power and benefit when applied to data structures. Consider the following match expression where we match on a Clojure vector:

(let [v [1 2 4]
      x true]
  (match [v x]
    [[_ 2 2] true] 1
    [[_ 2 3] false] 2
    [[1 2 4] _] 3
    :else 4))

This can be represented with the following matrix:

Fig. 8

    v    x
-----------
[[_ 2 2] t] 1
[[_ 2 3] f] 2
[[1 2 4] _] 3
[   _    _] 4

How can we match the contents of the vector? We simply need to transform the matrix into a form where Maranget's algorithm can work. When we specialize in this case, we are not specializing on a particular value but on whether the value to be tested is a vector of a certain size.

v in this case is already the necessary column, so no swapping is required. The specialized matrix will look like Fig. 9. Notice that we've introduced new occurrences v_0, v_1, and v_2 for each element of the vector pattern. These new occurrences are associated with binding expressions so that they will be properly bound to the correct values in the vector. For example, v_0 will be bound via (nth v 0).

Fig. 9

 v_0 v_1 v_2 x
---------------
[ _   2   2  t] 1
[ _   2   3  f] 2
[ 1   2   4  _] 3

And the default matrix will look like:

Fig. 10

 v x
-----
[_ _] 4

So we can see that for complex patterns we simply try to convert the pattern matrix into a form such that Maranget's approach again applies. A quick glance at Fig. 9 shows that v_1 will be the next necessary column.
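The vector specialization step can be sketched as follows (Python for illustration; `specialize_vector` is our own name, and we model vector patterns as Python lists):

```python
WILD = "_"

def specialize_vector(rows, col, size):
    """Rows whose pattern in `col` is a vector of `size` get that
    pattern spliced into `size` fresh element columns; all other rows
    go to the default matrix."""
    spec, default = [], []
    for pats, action in rows:
        p = pats[col]
        if isinstance(p, list) and len(p) == size:
            spec.append((pats[:col] + p + pats[col + 1:], action))
        else:
            default.append((pats, action))
    return spec, default

# Fig. 8: column v holds vector patterns, column x holds literals
rows = [([["_", 2, 2], "t"], 1),
        ([["_", 2, 3], "f"], 2),
        ([[1, 2, 4], "_"], 3),
        ([WILD, "_"], 4)]
spec, default = specialize_vector(rows, 0, 3)
# spec has columns v_0 v_1 v_2 x (Fig. 9); default is Fig. 10
```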

Maranget illustrates this point with ML-style data types, but it should be clear that the idea works just as well for Clojure vectors, as they are positional in the same way that ML-style data type constructors are.

Map patterns

Maranget's paper does not go into the problem of matching maps (dictionaries). However it's not too difficult to see how the approach can be extended to optimize matching maps.

Consider the following match expression:

(match [x]
  [{:a _ :b 2}] 1
  [{:a 1 :b 1}] 2
  [{:c 3 :d _ :e 4}] 3
  :else 4)

We can make map pattern matching conform to Maranget's approach - we consider all keys. The above translates to the following pattern matrix:

Fig. 11

        x
------------------
[{:a _ :b 2}     ] 1
[{:a 1 :b 1}     ] 2
[{:c 3 :d _ :e 4}] 3
[ _              ] 4

If we specialize this matrix on the map pattern we get Fig. 12. As with vector patterns, we introduce new occurrences which will be bound to the corresponding values in the original occurrence. Maps are unordered, so we sort the keys to fix the column order.

Fig. 12

 x_a x_b x_c x_d x_e
----------------------
[(_)  2   _   _   _  ] 1
[ 1   1   _   _   _  ] 2
[ _   _   3  (_)  4  ] 3

The wildcard patterns specified by the user are not real wildcards; they must still be tested. The user has specified that the key must exist regardless of its value, so we write (_) to distinguish these from the real wildcards: the implied entries that truly do not need to be tested.

We now have a pattern matrix that can be optimized by Maranget's procedure. In core.match the existential key-check patterns score lower than real constructor patterns, so x_b will be the next necessary column.
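A sketch of this key expansion (Python for illustration; the helper name and the "(_)" marker encoding are our own, with map patterns modeled as Python dicts):

```python
WILD = "_"      # implied wildcard: no test needed
GUARD = "(_)"   # user wrote _ for a key: the key must still exist

def specialize_maps(rows, col):
    """Expand map patterns in `col` into one column per key (keys
    sorted, since maps are unordered). A key a row does not mention
    becomes a true wildcard; a key the user matched with _ becomes a
    guarded wildcard. Non-map rows fall through to the default."""
    keys = sorted({k for pats, _ in rows
                   if isinstance(pats[col], dict)
                   for k in pats[col]})
    spec = []
    for pats, action in rows:
        p = pats[col]
        if not isinstance(p, dict):
            continue
        expanded = [GUARD if p.get(k) == WILD else p.get(k, WILD)
                    for k in keys]
        spec.append((pats[:col] + expanded + pats[col + 1:], action))
    return keys, spec

# Fig. 11's map patterns
rows = [([{"a": "_", "b": 2}], 1),
        ([{"a": 1, "b": 1}], 2),
        ([{"c": 3, "d": "_", "e": 4}], 3),
        (["_"], 4)]
keys, spec = specialize_maps(rows, 0)
# keys == ["a", "b", "c", "d", "e"]; spec reproduces Fig. 12
```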

Pseudo-patterns

As alluded to in Section 4 of the original paper, or-patterns are not really patterns; they simply introduce rows. Prior to real pattern matrix specialization, the top-left pattern is tested to see whether it is a pseudo-pattern. If it is, the matrix is specialized by it. However, pseudo-patterns are generally erased, so no actual source will ever be generated for them.

or-patterns

Consider the following pattern matrix:

Fig. 13

     x         y
---------------------
[_         (:or 1 2)] 1
[(:or 3 4) _        ] 2

y will be chosen and the matrix reordered:

Fig. 14

     y         x
---------------------
[(:or 1 2)     _    ] 1
[    _     (:or 3 4)] 2

Before actual specialization we expand the pseudo-patterns:

Fig. 15

     y         x
---------------------
[    1         _    ] 1
[    2         _    ] 1
[    _     (:or 3 4)] 2

Note that we did not attempt to expand all the or-patterns at once. This is important as or-patterns may appear in other pseudo-patterns.
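This one-row expansion can be sketched as follows (Python for illustration; we encode or-patterns as (":or", ...) tuples, an encoding of our own choosing):

```python
def expand_or(rows, col):
    """If the top row's pattern in `col` is an or-pattern, replace that
    row with one copy per alternative. Only the top row is expanded --
    or-patterns lower down surface later in the recursion."""
    (pats, action), rest = rows[0], rows[1:]
    p = pats[col]
    if isinstance(p, tuple) and p[0] == ":or":
        return [(pats[:col] + [alt] + pats[col + 1:], action)
                for alt in p[1:]] + rest
    return rows

# Fig. 14, after y has been swapped into the first column
rows = [([(":or", 1, 2), "_"], 1),
        (["_", (":or", 3, 4)], 2)]
expanded = expand_or(rows, 0)
# expanded reproduces Fig. 15; the second row's or-pattern is untouched
```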

function application patterns

It may not be immediately apparent, but function application patterns are also best implemented as pseudo-patterns. A function application pattern works by taking an occurrence, applying a function to it, and pattern matching on the result of that application. The insight is that the function application introduces a new occurrence.

Consider the following pattern matrix:

Fig. 16

       n
---------------
[ (1 :<< inc) ] 1
[ (2 :<< dec) ] 2

The first pattern will increment n and pattern match the result. The second will decrement and do the same.

Since we must introduce a new occurrence in order to match on the result, we add a column for it and replace the original function application pattern with a wildcard:

Fig. 17

  app_n (inc n)       n
-----------------------------
[      1              _     ] 1
[      _        (2 :<< dec) ] 2

Maranget's algorithm may now proceed as usual.
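This last step can be sketched as follows (Python for illustration; we encode (pat :<< f) as a (pat, f) tuple and, purely for readability, name the fresh occurrence after its binding expression rather than app_n):

```python
WILD = "_"

def expand_app(rows, occs, col):
    """If the top row's pattern in `col` is a function application
    pattern (pat, fn), introduce a fresh occurrence bound to fn applied
    to the old one. The top row matches pat against the new occurrence
    and its old pattern becomes a wildcard; other rows simply gain a
    wildcard in the new column."""
    pats, action = rows[0]
    p = pats[col]
    if not (isinstance(p, tuple) and len(p) == 2):
        return rows, occs
    pat, fn = p
    new_occs = [f"({fn} {occs[col]})"] + occs
    new_rows = [([pat] + pats[:col] + [WILD] + pats[col + 1:], action)]
    for q, a in rows[1:]:
        new_rows.append(([WILD] + q, a))
    return new_rows, new_occs

# Fig. 16's matrix
rows = [([(1, "inc")], 1),
        ([(2, "dec")], 2)]
new_rows, new_occs = expand_app(rows, ["n"], 0)
# new_occs pairs each column with its binding; new_rows reproduces Fig. 17
```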