Skip to content

Commit

Permalink
[blog] Deeply Nested Pattern Matching in samlang (#1495)
Browse files Browse the repository at this point in the history
  • Loading branch information
SamChou19815 committed Feb 25, 2024
1 parent 26f63e5 commit 06afbc9
Show file tree
Hide file tree
Showing 2 changed files with 112 additions and 0 deletions.
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Original file line number Diff line number Diff line change
@@ -0,0 +1,112 @@
export const title = "Deeply Nested Pattern Matching in samlang";
import blogPostPageMetadata from '../../../../../../../lib/blog-post-page-metadata';
export const metadata = blogPostPageMetadata(title, "/blog/2024-02-25-samlang-nested-pattern-matching/pattern-matching-not-exhaustive.png");

![Pattern matching with error](/blog/2024-02-25-samlang-nested-pattern-matching/pattern-matching-not-exhaustive.png)

Yesterday, I cut a [release](https://github.com/SamChou19815/samlang/releases/tag/v0.10.0) of [samlang](https://samlang.io) which contains first-class support for deeply nested pattern matching. In this blog post, I will document some of the rationales and implementation challenges.

## Rationale

Late last year, I was making a big push to make samlang's standard library more complete. More specifically, I want to add long-missing essentials like `Map` and `Set`.

In a functional programming language that favors immutable data structures, `Map` and `Set` are usually implemented using an auto-balancing binary tree. The implementation code often needs to [perform pattern matching on a tuple](https://github.com/facebook/flow/blob/48b30f112cc4ab2cbb5633d0b6202da828772a2c/src/hack_forked/utils/collections/third-party/flow_map.ml#L238-L275), and sometimes even deeper structures. samlang has supported single-level pattern matching since day one, but using single-level pattern matching is just too awkward for this kind of code.

In addition, it turns out that supporting deeply nested pattern matching is very difficult. While exhaustiveness checking in single-level pattern matching is like a walk in the park, it is an NP-hard problem for deeply nested pattern matching.

## Difficulty

Consider the example in the screenshot. How can you systematically check and decide that we are missing the `None, None` case?

Intuitively, you might think that you can generate all possible combinations of patterns, and then remove all the ones that are matched by the code. Obviously, this is an exponential algorithm at best. However, even if you don't care about performance, it is still not practical, because types can be recursive! Consider how you would pattern match on a functional list:

```samlang
class List<T>(Nil, Cons(T, List<T>)) {}
```

If you try to enumerate all combinations of `List<int>`, you will get:

```text
Nil
Cons(int, Nil)
Cons(int, Cons(int, Nil))
Cons(int, Cons(int, Cons(int, Nil)))
...
```

Recursive types make the set of inhabitants of that type infinite, so the naive solution clearly does not work.

## Theory

I was not able to come up with the type-checking algorithm by myself, but fortunately, there are already a few languages that implement deeply nested pattern matching, and there are [papers](http://moscova.inria.fr/~maranget/papers/ml05e-maranget.pdf) that describes the exact strategy. In samlang's implementation, I mostly followed the OCaml paper linked before.

The paper presents a quite powerful idea: the problem of exhaustiveness checking can be reduced to the problem of redundancy checking: if we add a final wildcard pattern `_` that is decided to be not redundant, then we know that the user-provided pattern matching is not exhaustive. In reality, we do need to implement two variants of the base algorithm, so that during exhausiveness checking we can also produce a counterexample in addition to the yes/no answer. Nevertheless, we only need to understand the idea once to solve both problems.

I will not go into details of the exact algorithm, since the paper already covers it and provides pseudocode. However, the elegance of the core abstraction is worth the extra elaboration. The paper developed a general-purposed abstract representation of patterns that are not specific to OCaml. There are only 3 kinds of patterns:

```rust
enum Pattern<C> {
Constructor(C, Vec<Pattern<C>>),
Wildcard,
Or(Vec<Pattern<C>>)
}
```

The constructor variant can be used to model patterns like `Some(_), None`, or just any nominal struct like `Student { name, age }`. The wildcard pattern covers both identifier patterns and actual wildcard patterns, since they are equivalent for exhaustiveness checking purposes. Finally, the OR pattern can be used to model either nested OR patterns or multiple patterns that have the same handler. For example:

```rust
fn atLeastLengthTwo<T>(l: List<T>) -> bool {
match l {
List::Nil | List::Cons(_, List::Nil) -> false,
List::Cons(_, List::Cons(_, _)) -> true,
}
}
```

The only language-specific part of the algorithm is a function `is_complete` that tells whether a set of constructors `C` of a type `T` is exactly the set of all possible constructors of `T`. For example, the function always return true for simple structs, while `is_complete({Some, None}, Option) == true` and `is_complete({Nil}, List) == false`. samlang's implementation of this function is almost the same as OCaml's implementation.

It's worth noting that the idea is powerful enough to not only handle nominally-typed languages, but also handle structurally-typed languages with best effort. In JavaScript, there is a common pattern of defining a disjoint union type, and switching on the discriminator field:

```typescript
type List<T> =
| { type: 'nil' }
| { type: 'cons', v: T, next: List<T> };

function isEmpty<T>(list: List<T>): boolean {
// Hypothetical pattern-matching syntax in JS
match (list) {
case {type: 'nil', ...}:
return true;
case {type: 'cons', ...}:
return false;
}
}
```

We could guess with the best effort from the type structure that the field `type` is the discriminator and thus constructor-like, so the patterns above can be modeled as:

```rust
vec![
Pattern::Constructor("nil", vec![]),
Pattern::Constructor("cons", vec![
Pattern::Wildcard,
Pattern::Wildcard,
]),
]
```

Note that we do need to do extra work to convert the implicit wildcard patterns into explicit ones.

## Implementation in samlang

Despite the mathematical complexity of the core algorithm, it is remarkably easy to implement: the core logic only takes less than 400 lines. However, it does take a while to connect the rest of the system to the core logic.

The algorithm is quite flexible. I have extended it a little bit to gracefully handle invalid patterns. During redundancy checking, invalid patterns are assumed to match nothing and omitted; during exhaustiveness checking, they are assumed to match everything. This strategy ensures that we don't emit false-positive errors. In addition, it can also be used to support `if let <pattern> = <expr> {...} else {...}` syntax: we will error if the single pattern is exhaustive, so that you don't write some useless code like `if let Student(_) = Student("Foo") {} else {}`. It can also be used to unify the implementation of destructuring with pattern matching. Intuitively, destructuring has the opposite behavior of `if let`, the single pattern must be exhaustive.

Therefore, with the support of this core algorithm, samlang also gains the `if let` syntax, and the implementation of `if let`, destructuring, and pattern matching is unified by the same infra.

## Future Work

Due to my laziness, there are still some missing pieces, including the lack of `if <expr>` support for patterns and multiple patterns going into the same branch. However, it is good enough that enables me to finish the implementation of immutable `Map` and `Set` in the samlang standard library.

In the next few months, I will mostly focus on switching the reference-counting GC I hacked together for samlang into the native GC support from WASM.

0 comments on commit 06afbc9

Please sign in to comment.