Shared Memory Multithreading for Dart

This proposal tries to address two long standing gaps which separate Dart from other more low-level languages:

Dart developers should be able to utilize multicore capabilities without hitting limitations of the isolate model.
Interoperability between Dart and native platforms (C/C++, Objective-C/Swift, Java/Kotlin) requires aligning concurrency models between Dart world and native world. The misalignment is not an issue when invoking simple native synchronous APIs from Dart, but it becomes a blocker when:
- working with APIs pinned to a specific thread (e.g. @MainActor APIs in Swift or UI thread APIs on Android)
- dealing with APIs which want to call back from native into Dart on an arbitrary thread.
- using Dart as a container for shared business logic - which can be invoked on an arbitrary thread by surrounding native application.
Native code does not understand the concept of isolates.

See What issues are we trying to solve? for concrete code examples.

Note

Improving Dart's interoperability with native code and its multicore capabilities not only benefits Dart developers but also unlocks improvements in the Dart SDK. For example, we will be able to move dart:io implementation from C++ into Dart and later split it into a package.

The core of this proposal are two new concepts:

Shareable Data introduces the relaxation of isolate model allowing controlled sharing of mutable data and static state between isolates within an isolate group. This allows developer to write Dart code which can access shared mutable state concurrently.
Shared Isolates introduces a concept of shared isolate, an isolate which only has access to a state shared between all isolates of the group. This concept allows to bridge interoperability gap with native code. Shared isolate becomes an answer to the previously unresolved question "what if native code wants to call back into Dart from an arbitrary thread, which isolate does the Dart code run in?".

In addition to introducing these new concepts the proposal tries to suggest a number of API changes to various core libraries which are essentially to making Dart a good multithreaded language. Some of this proposals, like adding atomics, are fairly straightforward and non-controversial, others, like coroutines, are included to show the extent of possible changes and to provoke thought.

The Prototyping Roadmap section of the proposal tries to suggest a possible way forward with validating some of the possible benefits of the proposal without committing to specific major changes in the Dart language.

What issues are we trying to solve?

Parallelizing work-loads

When using isolates it is relatively straightforward to parallelize two types of workloads:

The output is a function of input without any significant dependency on other state and the input is cheap to send to another isolate. In this case developer can use Isolate.run to off load the computation to another isolate without paying significant costs for the transfer of data.
The computation that is self contained and runs in background producing outputs that are cheap to send to another isolate. In this case a persistent isolate can be spawned and stream data to the spawner.

Anything else hits the problem that transferring data between isolates is asynchronous and incurs copying costs which are linear in the size of transferred data.

Consider for example a front-end for Dart language which tries to parse a large Dart program. It is possible to parallelize parsing of strongly connected components in the import graph, however you can't fully avoid serialization costs - because resulting ASTs can't be directly shared between isolates. Similar example is parallelizing loading of ASTs from a large Kernel binary.

Note

Users can create data structures outside of the Dart heap using dart:ffi to allocate native memory and view it as typed arrays and structs. However, adopting such data representation in an existing Dart program is costly and comes with memory management challenges characteristic of low-level programming languages like C. That's why we would like to enable users to share data without requiring them to manually manage lifetime of complicated object graphs.

It is worth highlighting shared memory multithreading does not necessarily imply simultaneous access to mutable data. Developers can still structure their parallel code using isolates and message passing - but they can avoid the cost of copying the data by sending the message which can be directly shared with the receiver rather than copied.

Interoperability

Consider the following C code using a miniaudio library:

static void DataCallback(
   ma_device* device, void* out, const void* in, ma_uint32 frame_count) {
  // Synchronously process up to |frame_count| frames.
}

ma_device_config config = ma_device_config_init(ma_device_type_playback);
// This function will be called when miniaudio needs more data.
// The call will happend on a backend specific thread dedicated to
// audio playback.
config.dataCallback      = &DataCallback;
// ...

Porting this code to Dart using dart:ffi is currently impossible, as FFI only supports two specific callback types:

NativeCallable.isolateLocal: native caller must have an exclusive access to an isolate in which callback was created. This type of callback works if Dart calls C and C calls back into Dart synchronously. It also works if caller uses VM C API for entering isolates (e.g.Dart_EnterIsolate/Dart_ExitIsolate).
NativeCallable.listener: native caller effectively sends a message to the isolate which created the callback and does not synchronously wait for the response.

Neither of these work for this use-case where native caller wants to perform a synchronous invocation. There are obvious ways to address this:

Create a variation of NativeCallable.isolateLocal which enters (and after call leaves) the target isolate if the native caller is not already in the target isolate.
Create NativeCallable.onTemporaryIsolate which spawns (and after call destroys) a temporary isolate to handle the call.

Neither of these are truly satisfactory:

Allowing isolateLocal to enter target isolate means that it can block the caller if the target isolate is busy doing something (e.g. processing some events) in parallel. This is unacceptable in situations when the caller is latency sensitive e.g. audio thread or even just the main UI thread of an application.
Using temporary isolate comes with a bunch of ergonomic problems as every invocation is handled using a freshly created environment and no static state is carried between invocations - which might surprise the developer. Sharing mutable data between invocations requires storing it outside of Dart heap using dart:ffi.

Note

This particular example might not look entirely convincing because it can be reasonably well solved within confines of isolate model.

When you look at an isolate as a bag of state guarded by a mutex, you eventually realize that this bag is simply way too big - it encompasses the static state of the whole program - and this is what makes isolates unhandy to use. The rule of thumb is that coarse locks lead to scalability problems.

How do you solve it?

Spawn an isolate that does one specific small task. In the context of this example:

One isolate (consumer) is spawned to do nothing but synchronously handle DataCallback calls (using an extension of isolateLocal which enters and leaves isolate is required).
Another isolate (producer) is responsible for generating the data which is fed to the audio-library. The data is allocated in the native heap and directly shared with consumer.

However, isolates don't facilitate this style of programming. They are too coarse - so it is easy to make a mistake, touch a static state you are not supposed to touch, call a dependency which schedules an asynchronous task, etc. Furthermore, you do still need a low overhead communication channel between isolates. The shared memory is still part of the solution here, even though in this particular example we can manage with what dart:ffi allows us. And that is, in my opinion, a pretty strong signal in favor of more shared memory support in the language.

Another variation of this problem occurs when trying to use Dart for sharing business logic and creating shared libraries. Imagine that dart:ffi provided a way to export static functions as C symbols:

// foo.dart
import 'dart:ffi' as ffi;

// See https://dartbug.com/51383 for discussion of [ffi.Export] feature.
@ffi.Export()
void foo() {

}

Compiling this produces a shared library exporting a symbol with C signature:

// foo.h

extern "C" void foo();

The native code loads shared library and calls this symbol to invoke Dart code. Would not this be great?

Unfortunately currently there is no satisfactory way to define what happens when native code calls this exported symbol as the execution of foo is only meaningful within a specific isolate. Should foo create an isolate lazily? Should there be a single isolate or multiple isolates? What happens if foo is called concurrently from different threads? When should this isolate be destroyed?

These are all questions without satisfactory answers due to misalignment in execution modes between the native caller and Dart.

Finally, the variation of the interop problem exists in an opposite direction: invoking a native API from Dart on a specific thread. Consider the following code for displaying a file open dialog on Mac OS X:

NSOpenPanel* panel = [NSOpenPanel openPanel];

// Open the panel and return. When user selects a file
// the passed block will be invoked.
[panel beginWithCompletionHandler: ^(NSInteger result){
   // Handle the result.
}];

Trying to port this code to Dart hits the following issue: you can only use this API on the UI thread and Dart's main isolate is not running on the UI thread. Workarounds similar to discussed before can be applied here as well. You wrap a piece of Dart code you want to call on a specific thread into a function and then:

Send Dart isolateLocal callback to be executed on the specific thread, but make it enter (and leave) the target isolate.
Create an isolate specific to the target thread (e.g. special platform isolate for running on main platform thread) and have callbacks to be run in that isolate.

However the issues described above equally apply here: you either hit a problem with stalling the caller by waiting to acquire an exclusive access to an isolate or you hit a problem with ergonomics around the lack of shared state.

See go/dart-interop-native-threading and go/dart-platform-thread for more details around the challenge of crossing isolate-to-thread chasm and why all different solutions fall short.

Map of the Territory

Before we discuss our proposal for Dart it is worth look at what other popular and niche languages do around share memory multithreading. If you feel familiar with the space feel free to skip to Shareable Data section.

C/C++, Java, Scala, Kotlin, C# all have what I would call an unrestricted shared memory multithreading:

objects can be accessed (read and written to) from multiple threads at once
static state is shared between all threads
you can spawn new threads using core APIs, which are part of the language
you can execute code on any thread, even when the thread was spawned externally: thread spawned by Java can execute C code and the other way around thread spawned by C can execute Java code (after a little dance of attaching Java VM to a thread).

Python and Ruby, despite being scripting languages, both provide similar capabilities around multithreading as well (see Thread in Ruby and threading library in Python). The following Ruby program will spawn 10 threads which all update the same global variable:

count = 0
threads = 5.times.map do |i|
  puts "T#{i}: starting"
  Thread.new do
    count += 1
    puts "T#{i}: done"
  end
end

threads.each { |t| t.join }
puts "Counted to #{count}"

$ ruby test.rb
T0: starting
T1: starting
T2: starting
T3: starting
T4: starting
T1: done
T4: done
T0: done
T2: done
T3: done
Counted to 5

Concurrency in both languages is severely limited by a global lock which protects interpreter's integrity. This lock is known Global Interpreter Lock (GIL) in Python and Global VM Lock (GVL) in Ruby. GIL/GVL ensures that an interpreter is only running on one thread at a time. Scheduling mechanisms built into the interpreter allow it to switch between threads giving each a chance to run concurrently. This means executions of Python/Ruby code on different threads are interleaved, but serialized. You can observe non-atomic behaviors and data races (the VM will not crash though), but you can't utilize multicore capabilities. CPython developers are actively exploring the possibility to remove the GIL see PEP 703.

Erlang is a functional programming language for creating highly concurrent distributed systems which represents another extreme: no shared memory multithreading or low-level threading primitives at all. Design principles behind Erlang are summarized in Joe Armstrong's PhD thesis Making reliable distributed systems in the presence of software errors. Isolation between lightweight processes which form a running application is the idea at the very core of Erlang's design, to quote section 2.4.3 of the thesis:

The notion of isolation is central to understanding COP, and to the construction of fault-tolerant software. Two processes operating on the same machine must be as independent as if they ran on physically separated machines

...

Isolation has several consequences:

Processes have “share nothing” semantics. This is obvious since they are imagined to run on physically separated machines.

Message passing is the only way to pass data between processes. Again since nothing is shared this is the only means possible to exchange data.

Isolation implies that message passing is asynchronous. If process communication is synchronous then a software error in the receiver of a message could indefinitely block the sender of the message destroying the property of isolation.

Since nothing is shared, everything necessary to perform a distributed computation must be copied. Since nothing is shared, and the only way to communicate between processes is by message passing, then we will never know if our messages arrive (remember we said that message passing is inherently unreliable.) The only way to know if a message has been correctly sent is to send a confirmation message back.

JavaScript is a variation of communicating event-loops model and its capabilities clearly both inspired and defined capabilities of Dart's own isolate model. An isolated JavaScript environment allows only for a single thread of execution, but multiple such environments (workers) can be spawned in parallel. These workers share no object state and communicate via message passing which copies the data sent with an exception of transferrable objects . Recent versions of JavaScript poked a hole in the isolation boundary by introducing SharedArrayBuffer: allowing developers to share unstructured blobs of memory between workers.

Go concurrency model can be seen as an implementation of communicating sequential processes formalism proposed by Hoare. Go applications are collections of communicating goroutines, lightweight threads managed by Go runtime. These are somewhat similar to Erlang processes, but are not isolated from each other and instead execute inside a shared memory space. Goroutines communicate using message passing through channels. Go does not prevent developer from employing shared memory and provides a number of classical synchronization primitives like Mutex, but heavily discourages this. Effective Go contains the following slogan:

Do not communicate by sharing memory; instead, share memory by communicating.

Note

It is worth pointing out that managed languages which try to hide shared memory (isolated environments of JavaScript) and languages which try to hide threading (Go, JavaScript, Erlang) are bound to have difficulties communicating with languages which don't hide these things. These differences create an impedance mismatch between native caller and managed callee or the other way around. This is similar to what Dart is experiencing.

Rust gives developers access to shared memory multithreading, but leans onto ownership expressed through its type system to avoid common programming pitfalls. See Fearless Concurrency for an overview. Rust provides developers with tools to structure their code both using shared-state concurrency and message passing concurrency. Rust type system makes it possible to express ownership transfer associated with message passing, which means the message does not need to be copied to avoid accidental sharing.

Rust is not alone in using its type system to eliminate data races. Another example is Pony and its reference capabilities.

Swift does not fully hide threads and shared memory multi-threading, but it provides high-level concurrency abstractions tasks and actors on top of low-level mechanisms (see Swift concurrency for details). Swift actors provide a built-in mechanism to serialize access to mutable state: each actor comes with an executor which runs tasks accessing actor's state. External code must use asynchronous calls to access the actor:

actor A {
    var data: [Int]

    func add(value: Int) -> Int {
      // Code inside an actor is fully synchronous because
      // it has exclusive access to the actor.
      data.append(value)
      return data.count
    }
}

let actor : A

func updateActor() async {
  // Code outside of the actor is asynchronous. The actor
  // might be busy so we might need to suspend and wait
  // for the reply.
  let count = await actor.add(10)
}

Swift is moving towards enforcing "isolation" between concurrency domains: a value can only be shared across the boundary (e.g. from one actor to another) if it conforms to a Sendable protocol. Compiler is capable of validating obvious cases: a deeply immutable type or a value type which will be copied across the boundary are both obviously Sendable. For more complicated cases, which can't be checked automatically, developers have an escape hatch of simply declaring their type as conformant by writing @unchecked Sendable which disables compiler enforcement. Hence my choice of putting isolation in quotes.

OCaml is multi-paradigm programming language from the ML family of programming languages. Prior to version 5.0 OCaml relied on a global runtime lock which serialized access to the runtime from different threads meaning that only a single thread could run OCaml code at a time - putting OCaml into the same category as Python/Ruby with their GIL/GVL. However in 2022 after 8 years of work OCaml 5.0 brought multicore capabilities to OCaml. The OCaml Multicore project was seemingly focused on two things:

Modernizing runtime system and GC in particular to support multiple threads of execution using runtime in parallel (see Retrofitting Parallelism onto OCaml).
Incorporating effect handlers into the OCaml runtime system as a generic mechanism on top of which more concrete concurrency mechanisms (e.g. lightweight threads, coroutines, async/await etc) could be implemented (see Retrofitting Effect Handlers onto OCaml).

Unit of parallelism in OCaml is a domain - it's an OS thread plus some associated runtime structures (e.g. thread local allocation buffer). Domains are not isolated from each other: they allocate objects in a global heap, which is shared between all domains and can access and mutate shared global state.

Shareable Data

I propose to extend Dart with a shared memory multithreading but provide a clear type-based delineation between two concurrency worlds. Only instances of classes implementing Shareable interface can be concurrently mutated by another thread. We restrict the kind of data that a shareable class can contain: fields of shareable classes can only contain references to instances of shareable classes. This requirement is enforced at compile time by requiring that declared types of all fields are shareable.

// dart:core

/// [Shareable] instances can be shared between isolates in
/// a group and mutated concurrently.
///
/// A class implementing [Shareable] can only declare fields
/// which have a declared type that is a subtype of [Shareable].
abstract interface class Shareable {
}

class S implements Shareable {
  // ...
}

Note

I choose marker interface rather than dedicated syntax (e.g. shareable class) because marker interface comes handy when declaring type bounds for generics and allows to perform runtime checking if needed.

References to shareable instances can be passed between isolates within an isolate group directly.

class S implements Shareable {
  int v = 0;
}

void main() async {
  final s = S();

  await Isolate.run(() {
    // Even though the function is running in a different
    // isolate it has access to the original `s` and
    // mutations of `s.v` performed in this isolate will be
    // visible to other isolates.
    //
    // (Subject to memory model constraints).
    s.v = 10;
  });

  expect(equals(10), s.v);
}

To make state sharing between isolates simpler we also allow to declare static fields which are shared between all isolates in the isolate group. shared global fields are required to have shareable declared type.

// All isolates within an isolate group share this variable.
shared int v = 0;

Shareable classes are not required to be immutable. Field reads and writes are atomic, but other than that there are no implicit synchronization, locking or strong memory barriers associated with fields. Possible executions in terms of observed values will be specified by the Dart's memory model - which I propose to model after JavaScript's and Go's: program which is free of data races will execute in a sequentially consistent manner.

Caution

Atomicity of field access means considerable overhead on 32-bit platforms for reads/writes into unboxed int and double fields - because these fields are 64-bits wide. We might want to eschew atomicity guarantees and allow load tearing for primitive fields - though it makes semantics somewhat unpredictable and architecture dependent. The same concern applies to unboxed SIMD values inside Shareable objects.

Shareable Types

Type is shareable if and only if one of the following applies:

It is Null or Never.

It is an interface type which is subtype of Shareable.

It is a record type where all field types are shareable.

It is a nullable type T? where T is shareable type.

Why type based opt-in?

There are two main reasons for choosing explicit opt-in into shareability:

Aligning with emerging JS and Wasm capabilities. Current proposals for shared memory multithreading on the Web (see details below) propose partitioning shareable objects from non-shareable. We would like to make sure that Dart's semantics is possible to translate to both JS and Wasm - allowing us to fully implement our concurrency story on the Web.
Enabling graceful adoption of shared memory multithreading. Shared memory multithreading is complicated enough by itself, so I feel that it would complicate things more if we were to forcefully bring all existing libraries (not written with shared memory in mind) into the world where any data can be accessed from multiple threads at once. Existing code will continue to run as is within isolates. Newly written code can choose to opt-in into shareability where it matters for performance or interoperability reasons. Marking class as Shareable gives a strong signal that the developer has considered implications of sharing instances of these class across threads and took measure to ensure that it is safe.

I would like to introduce shared memory multithreading into Dart in a way that avoids subtle breakages in the existing code. Consider for a moment a library which maintains a global cache internally and is written under the assumption that Dart is a single threaded:

int _nextId = 0;

int allocateId() => _nextId++;

This was a valid way to structure this code in single-threaded Dart, but the same code becomes thread unsafe in the presence of shared memory multithreading.

Generics

When declaring a generic type the developer will have to use type parameter bounds to ensure that resulting class conforms to restrictions imposed by Shareable:

class X<T> implements Shareable {
  T v;  // compile time error.
}

class Y<T extends Shareable> implements Shareable {
  T v;  // ok.
}

Note

We could really benefit from the ability to use intersection types here (see #2709 and #1152, which would allow to specify complicated bounds like T extends Shareable & I. In the absence of intersections types developers would be forced to declare intermediate interfaces which implement all required interfaces (e.g. abstract interface ShareableI implements Shareable, I {}) and require users to implement those by specifying T extends ShareableI.

Functions

Shareability of a function depends on the values that it captures. We could define that any function is shareable iff it captures only variables of shareable declared type. Incorporating this property into the type system naturally leads to the desire to use intersection types to express the property that some value is both a function of a specific type and shareable:

class A implements Shareable {
  void Function() f;  // compile time error
  void Function() & Shareable f;  // ok
}

Introducing intersection types into type system might be a huge undertaking. For the purposes of developing an MVP we can choose one of the two approaches:

Ignore functions entirely: consider functions un-shareable. It becomes a compile time error to have a function type field inside a shareable class.
Allow function type fields inside a shareable class, but enforce shareability in runtime on assignment to the field.

Shareable core types

The following types will become shareable:

num (int and double), String, bool, Null, BigInt
Enum and consequently all user defined enums
RegExp, DateTime, Uri
TypedData - which makes all typed data types shareable.
Pointer
Type and Symbol
StackTrace

Collections

dart:core will provide shareable variants of all collection classes:

abstract interface class ShareableList<E extends Shareable?>
    implements List<E>, Shareable {
}

abstract interface class ShareableSet<E extends Shareable?>
    implements Set<E>, Shareable {
}

abstract interface class ShareableMap<K extends Shareable?,
                                  V extends Shareable?>
    implements Map<K, V>, Shareable {
}

Default implementations of List, Set and Map will be changed to be shareable if their element type is shareable

<Shareable>[] is ShareableList<Shareable> // => true

We will also provide methods to convert collections to their shareable counterparts:

extension ToShareableList<E extends Shareable?> on List<E> {
  /// Converts the list to [ShareableList] if it is not already shareable.
  ShareableList<E> toShareable() =>
    switch (this) {
      final ShareableList<Shareable?> shareable => shareable,
      _ => ShareableList<E>.from(this),
    };
}

extension ToShareableSet<E extends Shareable?> on Set<E> {
  ShareableSet<E> toShareable() => /* ... */
}

extension ToShareableMap<K extends Shareable?,
                         V extends Shareable?> on Map<K, V> {
  ShareableMap<K, V> toShareable() => /* ... */
}

`SendPort` semantics

SendPort is extended with a new method to which allows sending Shareable values by reference without copying:

abstract interface class SendPort {
    /// Sends an asynchronous [message] through this send port, to its
    /// corresponding [ReceivePort].
    ///
    /// The message is passed by reference to the receiver without
    /// copying.
    ///
    /// If sender and receiver do not share the same code then
    /// an [IllegalArgument] exception is thrown.
    void share(Shareable message);
}

`ShareableBox<T>`

In some situations we might need to put a non-shareable value inside a shareable type. This is okay as long as we can guarantee that this value will only be accessed within the isolate it originally belonged to.

abstract interface class ShareableBox<T> implements Shareable {
    factory ShareableBox(T value);
    T get value;
}

abstract interface class MutableShareableBox<T> implements ShareableBox<T> {
    factory MutableShareableBox(T value);
    set value(T newValue);
}

Note

It is possible to implement ShareableBox<T> on top of Expando but for efficiency reasons we might want to provide a built-in implementation

class _MutableShareableBoxImpl<T> implements MutableShareableBox<T> {
    final _lock = Lock();
    var _token = _AccessToken();
    static final _values = Expando<AccessToken, (T,)>();

    T get value => _lock.runLocked(() =>
       switch (_values[_token]) {
         (final v,) => v,
         _ => throw StateError("not owned by current isolate"),
       };
    });

    set value(T v) {
       _lock.runLocked(() {
         // If we don't own this box then reset access token so that
         // expandos in the other isolate can get cleared.
         if (_values[_token] == null) {
     	    _token = _AccessToken();
         }
         _values[_token] = v;
       });
    }
}

final class _AccessToken implements Shareable {}

Shared Isolates

Lets take another look at the following example:

int global = 0;

void main() async {
  global = 42;
  await Isolate.run(() {
    print(global);  // => 0
    global = 24;
  });
  print(global);  // => 42
}

Stripped to the bare minimum the example does not seem to behave in a confusing way: it seems obvious that each isolate has its own version of global variable and mutations of global are not visible across isolates. However, in the real world code such behavior might be hidden deep inside a third party dependency and thus much harder to detect and understand. This behavior also makes interoperability with native code more awkward than it ought to be: calling Dart requires an isolate, something that native code does not really know or care about. Consider for example the following code:

int global;

@pragma('vm:entry-point')
int foo() => global++;

The result of calling foo from the native side depends on which isolate the call occurs in.

shared global variables allow developers to tackle this problem - but hidden dependency on global state might introduce hard to diagnose and debug bugs.

I propose to tackle this problem by introducing the concept of shared isolate: code running in a shared isolate can only access shared state and not any of isolated state, an attempt to access isolated state results in a dynamic IsolationError.

// dart:isolate

class Isolate {
  /// Run the given function [f] in the _shared isolate_.
  ///
  /// Shared isolate contains a copy of the
  /// global `shared` state of the current isolate and does not have any
  /// non-`shared` state of its own. An attempt to access non-`shared` static variable throws [IsolationError].
  ///
  /// If [task] is not [Shareable] then [ArgumentError] is thrown.
  static Future<S> runShared<S extends Shared>(S Function() task);
}

int global = 0;

shared int sharedGlobal = 0;

void main() async {
  global = 42;
  sharedGlobal = 42;
  await Isolate.runShared(() {
    print(global);  // IsolationError: Can't access 'global' when running in shared isolate
    global = 24;  // IsolationError: Can't access 'global' when running in shared isolate

    print(sharedGlobal);  // => 42
    sharedGlobal = 24;
  });
  print(global);  // => 42
  print(sharedGlobal);  // => 24
}

Why not compile time isolation?

It is tempting to try introducing a compile time separation between functions which only access shared state and functions which can access isolated state. However an attempt to fit such separation into the existing language quickly breaks down.

One obvious approach is to introduce a modifier (e.g. shared) which can be applied to function declarations and impose a number of restrictions that shared functions have to satisfy. These restrictions should guarantee that shared functions can only access shared state.

shared void foo() {
  // ...
}

You can't override non-shared method with shared method.
Within a shared function
- If f(...) is a static function invocation then f must be a shared function.
- If o.m(...) is an instance method invocation, then o must be a subtype of Shareable and m must be shared method.
- If C(...) is a constructor invocation then C must be a shareable class.
- If g is a reference to a global variable then g must be shared.

Note

You can pass non-Shareable values to shared methods but you can't do anything useful with them because you can't touch their state. In other words non-Shareable types become opaque within shared methods.

This approach seems promising on the surface, but quickly hits issues:

It's unclear how to treat Object members like toString, operator == and get hashCode. These can't be marked as shared but should be accessible to both shared and non-shared code.
It's unclear how to treat function expression invocations:
- Function types don't encode necessary separation between shared and non-shared functions.
- Methods like List<T>.forEach pose challenge because they should be usable in both shared and non-shared contexts.

This makes us think that language changes required to achieve sound compile time delineation between shared and isolate worlds are too complicated to be worth it.

Upgrading `dart:ffi`

Introduction of shared isolate allows to finally address the problem of native code invoking Dart callbacks from arbitrary threads. NativeCallable can be extended with the corresponding constructor:

class NativeCallable<T extends Function> {
  /// Constructs a [NativeCallable] that can be invoked from any thread.
  ///
  /// When the native code invokes the function [nativeFunction], the corresponding
  /// [callback] will be synchronously executed on the same thread within a
  /// shared isolate corresponding to the current isolate group.
  ///
  /// [callback] must be [Shareable] that is: all variables it captures must
  /// have shareable declared type.
  external factory NativeCallable.shared(
    @DartRepresentationOf("T") Function callback,
    {Object? exceptionalReturn});
}

The function pointer returned by NativeCallable.shared(...).nativeFunction will be bound to an isolate group which produced it using the same trampoline mechanism FFI uses to create function pointers from closures. Returned function pointer can be called by native code from any thread. It does not require exclusive access to a specific isolate and thus avoids interoperability pitfalls associated with that:

No need to block native caller and wait for the target isolate to become available.
Clear semantics of globals:
- shared global state is accessible and independent from the current thread;
- accessing non-shared state will throw an IsolationError.

Linking to Dart code from native code

An introduction of shared isolate allows us to adjust our deployment story and make it simpler for native code to link, either statically or dynamically, to Dart code.

Note

Below when I say native library I mean a static library or shared object produced from Dart code using an AOT compiler. Such native library can be linked with the rest of the native code in the application either statically at build time or dynamically at runtime using appropriate native linkers provided by the native toolchain or the OS. The goal here is that using Dart from a native application becomes indistinguishable from using a simple C library.

Consider for example previously given in the Interoperability section:

// foo.dart

import 'dart:ffi' as ffi;

// See https://dartbug.com/51383 for discussion of [ffi.Export] feature.
@ffi.Export()
void foo() {

}

which produces a native library exporting a C symbol:

// foo.h

extern "C" void foo();

Shared isolates give us a tool to define what happens when foo is invoked by a native caller:

There is a 1-1 correspondence between the native library and an isolate group corresponding to this native library (e.g. there is a static variable somewhere in the library containing a pointer to the corresponding isolate group).
When an exported symbol is invoked the call happens in the shared isolate of that isolate group.

Note

Precise mechanism managing isolate group's lifetime does not matter for the purposes of the document and belongs to the separate discussion.

Core Library Changes

Upgrading `dart:async` capabilities

Shareable `Future` and `Stream` instances

Future and Stream should receive the same treatment as List: dart:async should be extended with shareable versions of these and a way to convert an existing object to a shareable one.

class ShareableFuture<T implements Shareable?>
    implements Future<T>, Shareable {
}

class ShareableStream<T implements Shareable?>
    implements Stream<T>, Shareable {
}

extension ToShareableFuture<E extends Shareable?> on Future<E> {
  /// Converts the [Future] to [ShareableFuture] if it is not already shareable.
  ShareableFuture<E> toShareable() => /* ... */;
}

extension ToShareableStream<E extends Shareable?> on Stream<E> {
  /// Converts the [Future] to [ShareableFuture] if it is not already shareable.
  ShareableStream<E> toShareable() => /* ... */;
}

The reason for providing these is to allow developers to structure their code using well understood primitives futures and streams instead of devising new primitives which reimplement some of the same functionality and is compatible with threading.

Executors of async callbacks

Consider the following code:

Future<void> foo() async {
  await something();
  print(1);
}

void main() async {
  await Isolate.runShared(() async {
    await foo();
  });
}

What happens when Future completes in a shared isolate? Who drives event loop of that isolate? Which thread will callbacks run on?

I propose to introduce another concept similar to Zone: Executor. Executors encapsulate the notion of the event loop and control how tasks are executed.

abstract interface class Executor implements Shareable {
    /// Returns the current executor.
    static Executor get current;

    Isolate get owner;

    /// Schedules the given task to run in the given executor.
    ///
    /// If the current isolate is not the [owner] of this executor
    /// the behavior depends on [copy]:   ///
    ///   * When [copy] is `false` (which is default) [task] is
    ///     expected to be [Shareable] and is transfered to the
    ///     target isolate directly. If it is not [Shareable] then
    ///     `ArgumentError` will be thrown.
    ///   * If [copy] is `true` then [task] is copied to the
    ///     target isolate using the same algorithm [SendPort.send]
    ///     uses.
    ///
    void schedule(void Function() task,
                  {bool copy = false});
}

How a particular executor runs scheduled tasks depends on the executor itself: e.g. an executor can have a pool of threads or notify an embedder that it has tasks to run and let embedder run these tasks.

All built-in asynchronous primitives will make the following API guarantee: a callback passed to Future or Stream APIs will be invoked using executor which was running the code which registered the callback.

Note

We need to be careful here to prevent crossing shareable and non-shareable domains. If you have a Future<T> where T is not a subtype of Shareable we should not allow registering multiple callbacks on it in a shared isolate because these callbacks end up running concurrently.

Consider for example the following code:

final executor = ThreadPool(concurrency: 2);

executor.schedule(() {
  final Future<List<int>> list = Future.value(<int>[]);
  list..then(cb1)..then(cb2);
});

In other words fut1 = fut.then(cb) is equivalent to:

final result = ShareableBox(Completer<R>());
final callback = ShareableBox(Zone.current.bind(cb));
final executor = Executor.current;
fut.then((v) {
  executor.schedule(() {
    try {
      final r = callback.value(v);
      result.value.complete(r);
    } catch (e, st) {
      result.value.completeError(e, st);
    }
  });
});
final fut1 = result.value.future;

Note

There is a clear parallel between Executor and Zone: asynchronous callbacks attached toStream and Future are bound to the current Zone. Original design suggested to treat Zone as an executor - but this obscured the core of the proposal, so current version splits this into a clear separate concept of Executor.

Structured Concurrency

Structured concurrency is a way of structuring concurrent code where lifecycle of concurrent tasks has clear relationship to the control-flow structure of the code which spawned those tasks. One of the most important properties of structured concurrency is an ability to cancel pending subtasks and propagate the cancellation recursively.

Consider for example the following code:

Future<Result> doSomething() async {
    final (a, b) = await (requestA(), computeB()).wait;
    return combineIntoResult(a, b);
}

If Dart supported structured concurrency, then the following would be guaranteed:

If either requestA or computeB fails, then the other is canceled.
doSomething computation can be canceled by the holder of the Future<Result> and this cancellation will be propagated into requestA and computeB.
If computeB throws an error before requestA is awaited then requestA still gets properly canceled.

Upgrading dart:async capabilities in the wake of shared-memory multithreading is also a good time to introduce some of the structured concurrency concepts into the language. See Cancellable Future proposal for the details of how this could work in Dart.

`dart:concurrent`

dart:concurrent will serve as a library hosting low-level concurrency primitives.

`Thread` and `ThreadPool`

Isolates are not threads even though they are often confused with ones. A code running within an isolate might be executing on a dedicated OS thread or it might running on a dedicated pool of threads. When expanding Dart's multithreading capabilities it seems reasonable to introduce more explicit ways to control threads.

// dart:concurrent

abstract class Thread implements Shareable {
  /// Runs the given function in a new thread.
  ///
  /// The function is run in a shared isolate, meaning that
  /// it will not have access to the non-shared state.
  ///
  /// The function will be run in a `Zone` which uses the
  /// spawned thread as an executor for all callbacks: this
  /// means the thread will remain alive as long as there is
  /// a callback referencing it.
  external static Thread start<T>(FutureOr<T> Function() main);

  /// Returns the current thread on which the execution occurs.
  ///
  /// Note: Dart code is only guaranteed to be pinned to a specific OS thread during a synchronous execution.
  external static Thread get current;

  Future<void> join();

  void interrupt();

  external set priority(ThreadPriority value);
  ThreadPriority get priority;
}

/// An [Executor] backed by a fixed size thread
/// pool and owned by the shared isolate of the
/// current isolate (see [Isolate.runShared]).
abstract class ThreadPool implements Executor {
  external factory ThreadPool({required int concurrency});
}

Additionally I think providing a way to synchronously execute code in a specific isolate on the current thread might be useful:

class Isolate {
    T runSync<T>(T Function() cb);
}

Asynchronous code and threads

Consider the following example:

Thread.start(() async {
  var v = await foo();
  var u = await bar();
});

Connecting execution / scheduling behavior to Zone allows us to give a clear semantics to this code: this code will run on a specific (newly spawned) thread and will not change threads between suspending and resumptions.

Atomic operations

AtomicRef<T> is a wrapper around a value of type T which can be updated atomically. It can only be used with true reference types - an attempt to create an AtomicRef<int>, AtomicRef<double> , AtomicRef<(T1, ..., Tn)> will throw.

Note

AtomicRef uses method based load / store API instead of simple getter/setter API (i.e. abstract T value) for two reasons:

We want to align this API with that of extensions like Int32ListAtomics, which use atomicLoad/atomicStore naming
We want to keep a possibility to later extend these methods, e.g. add a named parameter which specifies particular memory ordering.

// dart:concurrent

class AtomicRef<T extends Shareable> implements Shareable {
  /// Atomically updates the current value to [desired].
  ///
  /// The store has release memory order semantics.
  void store(T desired);

  /// Atomically reads the current value.
  ///
  /// The load has acquire memory order semantics.
  T load();

  /// Atomically compares whether the current value is identical to
  /// [expected] and if it is sets it to [desired] and returns
  /// `(true, expected)`.
  ///
  /// Otherwise the value is not changed and `(false, currentValue)` is
  /// returned.
  (bool, T) compareAndSwap(T expected, T desired);
}

class AtomicInt32 implements Shareable {
  void store(int value);
  int load();
  (bool, int) compareAndSwap(int expected, int desired);

  int fetchAdd(int v);
  int fetchSub(int v);
  int fetchAnd(int v);
  int fetchOr(int v);
  int fetchXor(int v);
}

extension Int32ListAtomics on Int32List {
  void atomicStore(int index, int value);
  int atomicLoad(int index);
  (bool, int) compareAndSwap(int index, int expected, int desired);
  int fetchAdd(int index, int v);
  int fetchSub(int index, int v);
  int fetchAnd(int index, int v);
  int fetchOr(int index, int v);
  int fetchXor(int index, int v);
}

// These extension methods will only work on fixed-length builtin
// List<T> type and will throw an error otherwise.
extension RefListAtomics<T extends Shareable> on List<T> {
  void atomicStore(int index, T value);
  T atomicLoad(int index);
  (bool, T) compareAndSwap(T expected, T desired);
}

Locks and conditions

At the bare minimum libraries should provide a non-reentrant Lock and a Condition. However we might want to provide more complicated synchronization primitives like re-entrant or reader-writer locks.

// dart:concurrent

// Non-reentrant Lock.
class Lock implements Shareable {
  void acquireSync();
  bool tryAcquireSync({Duration? timeout});

  void release();

  Future<void> acquire();
  Future<bool> tryAcquire({Duration? timeout});
}

class Condition implements Shareable {
  bool waitSync(Lock lock, {Duration? timeout});
  Future<bool> wait(Lock lock, {Duration? timeout});

  void notify();
  void notifyAll();
}

Note

Java has a number of features around synchronization:

It allows any object to be used for synchronization purposes.
It has convenient syntax for grabbing a monitor associated with an object: synchronized (obj) { /* block */ }.
It allows marking methods with synchronized keyword - which is more or less equivalent to wrapping method's body into synchronized block.

I don't think we want these features in Dart:

Supporting synchronization on any object comes with severe implementation complexity.
A closure based API R withLock<R>(lock, R Function() body) should provide a good enough alternative to special syntactic forms like synchronized.
An explicit locking in the body of the method is clearer than implicit locking introduced by an attribute.

Coroutines

Given that we are adding support for OS threads we should consider if we want to add support for coroutines (also known as fibers, or lightweight threads) as well.

abstract interface class Coroutine {
  /// Return currently running coroutine if any.
  static Coroutine? get current;

  /// Create a suspended coroutine which will execute the given
  /// [body] when resumed.
  static Coroutine create(void Function() body);

  /// Suspends the given currently running coroutine.
  ///
  /// This makes `resume` return with
  /// Expects resumer to pass back a value of type [R].
  static void suspend();

  /// Resumes previously suspended coroutine.
  ///
  /// If there is a coroutine currently running the suspends it
  /// first.
  void resume();

  /// Resumes previously suspended coroutine with exception.
  void resumeWithException(Object error, [StackTrace? st]);
}

Coroutines is a very powerful abstraction which allows to write straight-line code which depends on asynchronous values.

Future<String> request(String uri);

extension FutureSuspend<T> on Future<T> {
  T get value {
    final cor = Coroutine.current ?? throw 'Not on a coroutine';
    late final T value;
    this.then((v) {
      value = v;
      cor.resume();
    }, onError: cor.resumeWithException);
    cor.suspend();
    return value;
  }
}

List<String> requestAll(List<String> uris) =>
  Future.wait(uris.map(request)).value;

SomeResult processUris(List<String> uris) {
	final data = requestAll(uris);
  // some processing of [data]
  // ...
}

void main() {
  final uris = [...];
  Coroutine.create(() {
    final result = processUris(uris);
    print(result);
  }).resume();
}

Blocking operations

It might be useful to augment existing types like Future, Stream and ReceivePort with blocking APIs which would only be usable in shared isolate and under condition that it is not going to block the executor's event loop.

`dart:ffi` updates

dart:ffi should expose atomic reads and writes for native memory.

extension Int32PointerAtomics on Pointer<Int32> {
  void atomicStore(int value);
  int atomicLoad();
  (bool, int) compareAndSwap(int expected, int desired);
  int fetchAdd(int v);
  int fetchSub(int v);
  int fetchAnd(int v);
  int fetchOr(int v);
  int fetchXor(int v);
}

extension IntPtrPointerAtomics on Pointer<IntPtr> {
  void atomicStore(int value);
  int atomicLoad();
  (bool, int) compareAndSwap(int expected, int desired);
  int fetchAdd(int v);
  int fetchSub(int v);
  int fetchAnd(int v);
  int fetchOr(int v);
  int fetchXor(int v);
}

extension PointerPointerAtomics<T> on Pointer<Pointer<T>> {
  void atomicStore(Pointer<T> value);
  Pointer<T> atomicLoad();
  (bool, Pointer<T>) compareAndSwap(Pointer<T> expected, Pointer<T> desired);
}

For convenience reasons we might also consider making the following work:

final class MyStruct extends Struct {
  @Int32()
  external final AtomicInt value;
}

The user is expected to use a.value.store(...) and a.value.load(... to access the value.

Caution

Support for AtomicInt in FFI structs is meant to enable atomic access to fields without requiring developers to go through Pointer based atomic APIs. It is not meant as a way to interoperate with structs that contain std::atomic<int32_t> (C++) or _Atomic int32_t (C11) because these types don't have a defined ABI.

Prototyping Roadmap

The change of this impact has to be carefully evaluated. I suggest we start with bare minimum needed to validate share memory concurrency in a realistic setting:

Implement shared global fields using a VM specific pragma @pragma('vm:shared').
Hiding under an experimental flag and in a separate library (e.g. dart:concurrent):

Introduce Shareable interface and enforce suggested restrictions using a Kernel transformation instead of incorporating them into CFE.
Introduce minimum amount of core library changes (e.g. ShareableList).

With these changes we can try prototyping a multicore based optimization in either CFE or analyzer and assess the usability and the impact of the change.

Next we can add support for shared isolates and use that to prototype and evaluate benefits for the interop (e.g. write some code which uses thread pinned native API).

With feedback from these experiments we can update the proposal and formulate concrete plans on how we should proceed.

Appendix

Memory Models

Memory model describes the range of possible behaviors of multi-threaded programs which read and write shared memory. Programmer looks at the memory model to understand how their program will behave. Compiler engineer looks at the memory model to figure out which code transformations and optimization are valid. The table below provides an overview of memory models for some widely used languages.

Language	Memory Model
C#	Language specification itself (ECMA-334) does not describe any memory mode. Instead the memory model is given in Common Language Infrastructure (ECMA-335, section I.12.6 Memory model and optimizations). ECMA-335 memory model is relatively weak and CLR provides stronger guarantees documented here. See dotnet/runtime#63474 and dotnet/runtime#75790 for some additional context.
JavaScript	Memory model is documented in ECMA-262 section 13.0 Memory Model. This memory model is fairly straightforward: it guarantees sequential consistency for atomic operations, while leaving other operations unordered.
Java	Given in Java Language Specification (JLS) section 17.4
Kotlin	Not documented. Kotlin/JVM effectively inherits Java's memory model. Kotlin/Native - does not have a specified memory model, but likely follows JVM's one as well.
C++	Given in the section Multi-threaded executions and data races of the standard (since C++11). Notably very fine grained
Rust	No official memory model according to reference, however it is documented to "blatantly inherit C++20 memory model"
Swift	Defined to be consistent with C/C++. See SE-0282
Go	Given here: race free programs have sequential consistency, programs with races still have some non-deterministic but well-defined behavior.

Platform capabilities

When expanding Dart's capabilities we need to consider if this semantic can be implemented across the platforms that Dart runs on.

Native

No blockers to implement any multithreading behavior. VM already has a concept of isolate groups: multiple isolates concurrently sharing the same heap and runtime infrastructure (GC, program structure, debugger, etc).

Web (JS and Wasm)

No shared memory multithreading currently (beyond unstructured binary data shared via SharedArrayBuffer). However there is a Stage 1 TC-39 proposal JavaScript Structs: Fixed Layout Objects and Some Synchronization Primitives which introduces the concept of struct - fixed shape mutable object which can be shared between different workers. Structs are very similar to Shareable objects I propose, however they can't have any methods associated with them. This makes structs unsuitable for representing arbitrary Dart classes - which usually have methods associated with them.

Wasm GC does not have a well defined concurrency story, but a shared-everything-threads proposal is under design. This proposal seems expressive enough for us to be able to implement proposed semantics on top of it.

Note

Despite shared memory Wasm proposal has an issue which makes it challenging for Dart to adopt it:

It prohibits sharable and non-shareable structs to be subtypes of each other.
It prohibits externref inside shareable structs

Dart has Object as a base class for both shareable and non-shareable classes. If a program contains Shareable type - such type would need to be represented as a shared struct which means we have to mark Object struct as shared as well. But this means Dart objects can no longer directly contain externrefs inside them.

Assuming that Wasm is going to move forward with type based partitioning, we can still resolve this conundrum by employing a ShareableBox-like wrapper, which can be implemented on top of TLS storage and WeakMap.

An alternative could be to tweak semantics of Wasm's externref a bit: tag externref with their origin and dynamically checking origin match when externref is passed back from Wasm to the host environment (see Dynamic sharedness checks as an escape hatch issue).

`dart:*` race safety

When implementing dart:* libraries we should keep racy access in mind. Consider for example the following code:

class _GrowableList<T> implements List<T> {
  int length;
  final _Array storage;

  T operator[](int index) {
    RangeError.checkValidIndex(index, this, "index", this.length);
    return unsafeCast(storage[index]);  // (*)
  }

  T removeLast() {
    final result = this[length];
    storage[length--] = null;
    return result;
  }
}

This code is absolutely correct in single-threaded environment, but can cause heap-safety violation in a racy program: if operator[] races with removeLast then storage[index] might return null even though checkValidIndex succeeded.

Acknowledgements

The author thanks @aam @brianquinlan @dcharkes @kevmoo @liamappelbe @loic-sharma @lrhn @yjbanov for providing feedback on the proposal.

Files

proposal.md

Latest commit

History

proposal.md

File metadata and controls

Shared Memory Multithreading for Dart

What issues are we trying to solve?

Parallelizing work-loads

Interoperability

Map of the Territory

Shareable Data

Shareable Types

Why type based opt-in?

Generics

Functions

Shareable core types

Collections

SendPort semantics

ShareableBox<T>

Shared Isolates

Why not compile time isolation?

Upgrading dart:ffi

Linking to Dart code from native code

Core Library Changes

Upgrading dart:async capabilities

Shareable Future and Stream instances

Executors of async callbacks

Structured Concurrency

dart:concurrent

Thread and ThreadPool

Asynchronous code and threads

Atomic operations

Locks and conditions

Coroutines

Blocking operations

dart:ffi updates

Prototyping Roadmap

Appendix

Memory Models

Platform capabilities

Native

Web (JS and Wasm)

dart:* race safety

Acknowledgements

`SendPort` semantics

`ShareableBox<T>`

Upgrading `dart:ffi`

Upgrading `dart:async` capabilities

Shareable `Future` and `Stream` instances

`dart:concurrent`

`Thread` and `ThreadPool`

`dart:ffi` updates

`dart:*` race safety