Skip to content
Regular languages in types
OCaml Other
  1. OCaml 98.6%
  2. Other 1.4%
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
test
.gitignore
.travis-ci.sh
.travis.yml
LICENSE.md
Makefile
Readme.md
jbuild
jbuild-workspace.dev
pumping.ml
pumping.opam

Readme.md

Pumping

Pumping is a library to enrich the OCaml type system in order to conveniently check regular languages.

The goal of type systems is to check that some expression respect some properties. More powerful type systems enrich the kind of properties that can be expressed. On this regard, the OCaml type system is quite powerful. Indeed, jacques Garrigue showed that arbitrary Turing machines can be encoded in GADTs. These methods allow to check very expressive properties, but are quite inconvenient to use. The pumping library allows to use properties that covers regular languages by leveraging a well known tool: regular expressions.

Regular expressions in types

Pumping introduces a new ppx annotation re (also available as pumping.re) that allows to write regular expressions in types:

type t = [%re? Star (`a | `b)]

We can then check if a word is in this regular language. Words here are sequence of polymorphic variants terminated by `End.

let x : t = `a (`a (`b `End))

Types produced by pumping are structural: we don't even need a type declaration!

let x : [%re? `a, (`b | `c)] = `a (`c `End)

Most of the usual regular expression operators are available, such as star and ranges.

let x
 : [%re? Star ('a'..'z'), (`Foo | `Bar)]
 = `f (`o (`o (`Bar `End)))

Intersection is also available:

(`B `End : [%re? (`A | `B) :: (`B | `C)])

Convenient notation for regular expressions

Unfortunately, PPXs are limited by the OCaml syntax. The usual syntax for regular expressions is short and well known by most people, but not easily encoded in the OCaml syntax. In order to be able to express regular properties in an even more terse manner, pumping also allows to use the POSIX notation for regular expressions:

let x : [%re"foo|bar"] = `foo `End

Most of the POSIX syntaxes are supported, including charsets, ranges and bounded repetition:

let x : [%re"[a-z]{2,6}"] = `p (`o (`t (`a (`t (`o `End)))))

A new operator, &, is introduced for the intersection:

let x : [%re"[ab]&[bc]"] = `b `End

Equivalence of regular expressions

Since types defined by pumping are structural, it is possible to use the typechecker to test the equivalence of regular expressions:

let f : [%re"aa*a"] -> [%re"a{2,}"] = fun x -> x

We can also leverage subtyping to test the inclusion of regular expressions:

let f x = (x : [%re"a*"] :> [%re"[ab]*"])

Implementation

The implementation is a fairly simple adaption of Brzozowski derivative. In particular, it follows the technique described in the article "Regular-expression derivatives reexamined". Recursive types with (polymorphic) variants naturally describes a graph, thanks to unification type variables and the as construction.

Indeed, [%re"(OC|oc|c)(a)*ml"] is simply:

[   `oc of 'v2
  | `c of 'v2
  | `OC of [ `ml of [ `End ] as 'v3  | `a of 'v2 ] as 'v2
] as 'v1

The typechecker then does a very good job at simplifying the types. Unfortunately, it also unfolds the sub-parts of non-recursive types, which sometimes causes an exponential blowup.

Going further

Several features are yet missing in this implementation. Indeed, the expressivity of regular languages is well known, but limited.

  • In the style of PERL named regular expressions, it would be interesting to introduce annotation on types, in order to describe mutually dependent regular expressions:
    type%re t1 = [%re? Star `Foo, t2]
    and t2 = [%re? (`Bar | `Baz | t2)]
  • It is unclear how to handle grouping, backreferences and lookaround constructions.
  • Deterministic Pushdown automatons are fairly easy to encode (using a "counter" type), but a new convenient syntax would need to be proposed.
You can’t perform that action at this time.