MaPLe (MPL) is an extension of the MLton compiler for Standard ML which implements support for nested (fork-join) parallelism. MPL generates executables with excellent multicore performance, utilizing a novel approach to memory management based on the theory of disentanglement [1,2,3,4,5].
MPL is research software and is being actively developed.
If you are interested in using MPL, consider checking
out the tutorial.
You might also be interested in exploring
(a library for MPL) and the
Parallel ML benchmark suite.
Try out MPL with Docker:
$ docker pull shwestrick/mpl
$ docker run -it shwestrick/mpl /bin/bash
...# examples/bin/primes @mpl procs 4 --
If you want to try out MPL by writing and compiling your own code, we recommend
mounting a local directory inside the container. For example, here's how you
can use MPL to compile and run your own
main.mlb in the current directory.
(To mount some other directory, replace
$(pwd -P) with a different path.)
$ ls
main.mlb
$ docker run -it -v $(pwd -P):/root/mycode shwestrick/mpl /bin/bash
...# cd /root/mycode
...# mpl main.mlb
...# ./main @mpl procs 4 --
Build and Install (from source)
MPL has only been tested on Linux with x86-64. The following software is required.
- GMP (GNU Multiple Precision arithmetic library)
- GNU Make, GNU Bash
- binutils (
- miscellaneous Unix utilities (
- Standard ML compiler and tools:
- Recommended: MLton (
mlyacc). Pre-built binary packages for MLton can be installed via an OS package manager or (for select platforms) obtained from http://mlton.org.
- Supported but not recommended: SML/NJ (
The following builds the compiler at
$ make all
After building, MPL can then be installed to
$ make install
or to a custom directory with the PREFIX option:
$ make PREFIX=/opt/mpl install
Parallel and Concurrent Extensions
MPL extends SML with a number of primitives for parallelism and concurrency.
Take a look at
examples/ to see these primitives in action.
Note: Before writing any of your own code, make sure to read the section "Disentanglement" below.
val par: (unit -> 'a) * (unit -> 'b) -> 'a * 'b
val parfor: int -> (int * int) -> (int -> unit) -> unit
val alloc: int -> 'a array
The par primitive takes two functions to execute in parallel and
returns their results.
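As a minimal sketch of how par might be used (assuming the file is compiled with MPL and the ForkJoin structure is loaded through the .mlb file), the following evaluates the two recursive calls of a Fibonacci function in parallel. The names seqFib and fib and the sequential cutoff of 20 are illustrative choices, not part of MPL.

```sml
(* Sequential baseline, used below the cutoff to avoid creating tiny tasks. *)
fun seqFib (n: int) : int =
  if n < 2 then n else seqFib (n - 1) + seqFib (n - 2)

(* Parallel version: the two recursive calls are passed to ForkJoin.par,
 * which returns both results as a pair. The cutoff 20 is an arbitrary
 * illustrative threshold. *)
fun fib (n: int) : int =
  if n < 20 then
    seqFib n
  else
    let
      val (a, b) = ForkJoin.par (fn () => fib (n - 1), fn () => fib (n - 2))
    in
      a + b
    end

(* Printing happens sequentially, after all parallel work has finished. *)
val () = print (Int.toString (fib 30) ^ "\n")
```

The code is purely functional, so it falls under the simplest of the disentanglement options described later in this document.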
The parfor primitive is a "parallel for loop". It takes a grain-size g, a
range (i, j), and a function f, and executes f(k) in parallel for each
i <= k < j. The grain-size g is for manual granularity control:
parfor splits the input range into approximately (j-i)/g subranges,
each of size at most g, and each subrange is processed sequentially. The
grain-size must be at least 1; with a grain-size of exactly 1, the loop is
"fully parallel".
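The following sketch shows one way parfor could be called, assuming the ForkJoin structure is available through the .mlb file. It squares every element of an int array in place. The names n, data, and grain and the grain-size of 100 are illustrative; a good grain-size depends on how expensive each iteration is.

```sml
val n = 1000
val data = Array.tabulate (n, fn i => i)

(* Illustrative grain size: with n = 1000 this yields roughly 10 subranges,
 * each processed sequentially. *)
val grain = 100

(* Each index k in [0, n) is read and written by exactly one iteration,
 * so the iterations do not interfere with one another. *)
val () =
  ForkJoin.parfor grain (0, n) (fn k =>
    let
      val x = Array.sub (data, k)
    in
      Array.update (data, k, x * x)
    end)
```

Because the array holds plain integers, this also stays within the non-pointer mutable data guideline given in the disentanglement section.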
The alloc primitive takes a length and returns a fresh, uninitialized array
of that size. Warning: To guarantee no errors, the programmer must be
careful to initialize the array before reading from it.
alloc is intended to
be used as a low-level primitive in the efficient implementation of
high-performance libraries. It is integrated with the scheduler and memory
management system to perform allocation in parallel and be safe-for-GC.
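To illustrate the intended low-level use, here is a hedged sketch of a parallel tabulate function built from alloc and parfor. The function name tabulate and the grain-size of 1000 are hypothetical choices for this example, and the code assumes the ForkJoin structure is loaded through the .mlb file.

```sml
(* Build an array of length n whose i-th element is f i.
 * The array comes back from ForkJoin.alloc uninitialized, so every index
 * must be written before the array is returned to the caller. *)
fun tabulate (n: int, f: int -> 'a) : 'a array =
  let
    val result = ForkJoin.alloc n
    (* 1000 is an arbitrary illustrative grain size. *)
    val () =
      ForkJoin.parfor 1000 (0, n) (fn i => Array.update (result, i, f i))
  in
    result
  end

val squares = tabulate (10000, fn i => i * i)
```

The parfor covers the whole range [0, n), which is what makes it safe to read from the array afterwards.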
val compareAndSwap: 'a ref -> ('a * 'a) -> 'a
val arrayCompareAndSwap: ('a array * int) -> ('a * 'a) -> 'a
compareAndSwap r (x, y) performs an atomic CAS (compare-and-swap),
which attempts to atomically swap the contents of r from x to y,
returning the original value stored in
r before the CAS.
Polymorphic equality is determined
in the same way as MLton.eq, which is a
standard equality check for simple types (
word, etc.) and
a pointer equality check for other types (
string, tuples, datatypes,
etc.). The semantics are a bit murky.
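A common pattern with CAS is a retry loop. The sketch below implements an atomic add on an int ref, assuming compareAndSwap is accessed as MLton.Parallel.compareAndSwap and that ForkJoin is loaded through the .mlb file. The names atomicAdd and counter are illustrative. The example sticks to int, a simple type, so the comparison is an ordinary equality check.

```sml
(* Atomically add d to the contents of r.
 * The CAS succeeds only if r still holds the value we read; the returned
 * value tells us what was actually stored, so we retry on a mismatch. *)
fun atomicAdd (r: int ref, d: int) : unit =
  let
    val old = !r
    val seen = MLton.Parallel.compareAndSwap r (old, old + d)
  in
    if seen = old then () else atomicAdd (r, d)
  end

val counter = ref 0

(* 1000 parallel increments of a shared counter. *)
val () = ForkJoin.parfor 1 (0, 1000) (fn _ => atomicAdd (counter, 1))
```

After the loop, counter holds 1000 regardless of how the increments interleave.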
arrayCompareAndSwap (a, i) (x, y) behaves the same as compareAndSwap, but
on arrays instead of references. This performs a CAS at index
i of array
a, and does not read or write at any other locations of the array.
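As a small sketch of arrayCompareAndSwap, the following uses an int array of flags as a set of slots that parallel iterations try to claim. It assumes the primitive is accessed as MLton.Parallel.arrayCompareAndSwap. The names tryClaim, flags, and wins are hypothetical.

```sml
(* Try to claim slot i by changing it from 0 to 1.
 * The CAS returns the previous value, so exactly one caller per slot
 * observes 0 and wins. *)
fun tryClaim (flags: int array, i: int) : bool =
  MLton.Parallel.arrayCompareAndSwap (flags, i) (0, 1) = 0

val flags = Array.array (8, 0)
val wins = Array.array (100, 0)

(* 100 iterations compete for 8 slots; each iteration records in its own
 * cell of wins whether it succeeded. *)
val () =
  ForkJoin.parfor 1 (0, 100) (fn k =>
    if tryClaim (flags, k mod 8) then Array.update (wins, k, 1) else ())

val numWinners = Array.foldl op+ 0 wins
```

Which iteration wins each slot depends on scheduling, but the number of winners is always 8.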
MPL uses .mlb files (ML Basis) to describe
source files for compilation. A typical
.mlb file for MPL is shown
below. The first three lines of this file respectively load:
- The SML Basis Library
- The ForkJoin structure, as described above
- The MLton structure, which includes the MPL extension MLton.Parallel as described above, as well as various MLton-specific features. Not all MLton features are supported (see "Unsupported MLton Features" below).
(* libraries *)
$(SML_LIB)/basis/basis.mlb
$(SML_LIB)/basis/fork-join.mlb
$(SML_LIB)/basis/mlton.mlb

(* your source files... *)
A.sml
B.sml
Compiling a Program
The command to compile a
.mlb is as follows. By default, MPL
produces an executable with the same base name as the source file, i.e.
this would create an executable named foo.
$ mpl [compile-time options...] foo.mlb
MPL has a number of compile-time options derived from MLton, which are documented here. Note that MPL only supports C codegen and does not support profiling.
Some useful compile-time options are
- -output <NAME>: Give a specific name to the produced executable.
- -default-type int64 -default-type word64: Use 64-bit integers and words by default.
- -debug true -debug-runtime true -keep g: For debugging, keeps the generated C files and uses the debug version of the runtime (with assertions enabled). The resulting executable is somewhat perusable with debugging tools.
- -detect-entanglement true: Enables the dynamic entanglement detector. See below for more information.
$ mpl -default-type int64 -output foo sources.mlb
Running a Program
MPL executables can take options at the command line that control the run-time system. The syntax is
$ <program> [@mpl [run-time options...] --] [program args...]
The run-time arguments must begin with @mpl and end with --, and they are
not visible to the program as command-line arguments.
Some useful run-time options are
- procs <N>: Use N worker threads to run the program.
- set-affinity: Pin worker threads to processors. Can be used in combination with affinity-stride <S> to pin thread i to processor number B + S*i, where B is the base processor number.
- block-size <X>: Set the heap block size to X bytes. This can be written with suffixes K, M, and G, e.g. 64K is 64 kilobytes. The block-size must be a multiple of the system page size (typically 4K). By default it is set to one page.
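The pinning formula B + S*i can be worked through with a few lines of arithmetic. The sketch below is plain SML and does not call into MPL; the function name pinnedProcessor and the values B = 0, S = 2 are purely illustrative.

```sml
(* Processor number for worker thread i, given base B and stride S. *)
fun pinnedProcessor (base: int, stride: int) (i: int) : int =
  base + stride * i

(* With 4 worker threads, base 0, and stride 2, the threads would be
 * pinned to processors 0, 2, 4, and 6. *)
val processors = List.tabulate (4, pinnedProcessor (0, 2))
```

A stride greater than 1 spreads threads across every S-th processor, which can be useful when adjacent processor numbers share hardware resources.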
For example, the following runs a program
foo with a single command-line
argument bar using 4 pinned processors.
$ foo @mpl procs 4 set-affinity -- bar
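To see the separation between run-time options and program arguments, consider a program that simply echoes its arguments. This sketch assumes the program reads its arguments with the SML Basis function CommandLine.arguments. Invoked as in the command above, it would be expected to print only bar, since everything between @mpl and -- is consumed by the run-time system.

```sml
(* Echo each program argument on its own line.
 * Run-time options between @mpl and -- are not expected to appear here. *)
val args = CommandLine.arguments ()
val () = List.app (fn a => print (a ^ "\n")) args
```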
Disentanglement
Currently, MPL only supports programs that are disentangled, which (roughly speaking) is the property that concurrent threads remain oblivious to each other's allocations.
Here are a number of different ways to guarantee that your code is disentangled.
- (Option 1) Use only purely functional data (no refs or arrays). This is the simplest but most restrictive approach.
- (Option 2) If using mutable data, use only non-pointer data. MPL guarantees that simple types (real, etc.) are never indirected through a pointer, so for example it is safe to use int array. Other types such as int list array and int array array should be avoided. This approach is very easy to check and is surprisingly general. Data races are fine!
- (Option 3) Make sure that your program is race-free. This can be tricky to check but allows you to use any type of data. Many of our example programs are race-free.
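As an illustration of Option 2, the sketch below contains a deliberate data race on an int array: several iterations may write the same cell concurrently. Because the array holds only non-pointer data, this kind of race stays within the guideline above. The names xs and found are hypothetical, and the code assumes ForkJoin is loaded through the .mlb file.

```sml
val xs = Array.tabulate (1000, fn i => i)

(* A one-element int array used as a shared flag (0 = not found, 1 = found). *)
val found = Array.array (1, 0)

(* Several iterations may write found[0] at the same time. Every writer
 * stores the same value, and the cell holds a plain int, so no thread
 * ever obtains a pointer to another thread's allocation. *)
val () =
  ForkJoin.parfor 64 (0, 1000) (fn k =>
    if Array.sub (xs, k) mod 97 = 0
    then Array.update (found, 0, 1)
    else ())
```

Replacing the element type with something like int list would put pointers into the shared array, which is exactly the situation Option 2 advises against.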
Whenever a thread acquires a reference to an object allocated concurrently by some other thread, then we say that the two threads are entangled. This is a violation of disentanglement, which MPL currently does not allow.
MPL has a built-in dynamic entanglement detector which is enabled by default. The entanglement detector monitors individual reads and writes during execution; if entanglement is found, the program will terminate with an error message.
The entanglement detector is both "sound" and "complete": there are neither false negatives nor false positives. In other words, the detector always raises an alarm when entanglement occurs, and never raises an alarm otherwise. Note however that entanglement (and therefore also entanglement detection) can be execution-dependent: if your program is non-deterministic (e.g. racy), then entanglement may or may not occur depending on the outcome of a race condition. Similarly, entanglement could be input-dependent.
Entanglement detection is highly optimized, and typically has negligible
overhead (see [5]). It can be disabled at compile-time by passing
-detect-entanglement false; however, we recommend leaving it enabled at all
times, because MPL relies on entanglement detection to ensure memory safety.
Bugs and Known Issues
In general, the basis library has not yet been thoroughly scrubbed, and many functions may not be safe for parallelism (#41). Some known issues:
- Int.toString is racy when called in parallel.
- Real.fromString may throw an error when called in parallel.
- (#115) The GC is currently disabled at the "top level" (outside any calls to ForkJoin.par). For highly parallel programs, this has generally not been a problem so far, but it can cause a memory explosion for programs that are mostly (or entirely) sequential.
Unsupported MLton Features
Many MLton-specific features are unsupported, including (but not limited to):
- Thread (partially supported but not documented)
- Cont (partially supported but not documented)
[1] Hierarchical Memory Management for Parallel Programs. Ram Raghunathan, Stefan K. Muller, Umut A. Acar, and Guy Blelloch. ICFP 2016.
[2] Hierarchical Memory Management for Mutable State. Adrien Guatto, Sam Westrick, Ram Raghunathan, Umut A. Acar, and Matthew Fluet. PPoPP 2018.
[3] Disentanglement in Nested-Parallel Programs. Sam Westrick, Rohan Yadav, Matthew Fluet, and Umut A. Acar. POPL 2020.
[4] Provably Space-Efficient Parallel Functional Programming. Jatin Arora, Sam Westrick, and Umut A. Acar. POPL 2021.
[5] Entanglement Detection with Near-Zero Cost. Sam Westrick, Jatin Arora, and Umut A. Acar. ICFP 2022.