Coroutines in LLVM

Status

This document describes a set of experimental extensions to LLVM. Use with caution. Because the intrinsics have experimental status, compatibility across LLVM releases is not guaranteed. These intrinsics are added to support C++ Coroutines (P0057), though they are general enough to be used to implement coroutines in other languages as well.

Overview

LLVM coroutines are functions that have one or more suspend points. When a suspend point is reached, the execution of a coroutine is suspended. A suspended coroutine can be resumed to continue execution from the last suspend point or be destroyed.

In the following example, function f (which may or may not be a coroutine itself) returns a handle to a suspended coroutine (coroutine handle) that is used by main to resume the coroutine twice and then destroy it:

define i32 @main() {
entry:
  %hdl = call i8* @f(i32 4)
  call void @llvm.experimental.coro.resume(i8* %hdl)
  call void @llvm.experimental.coro.resume(i8* %hdl)
  call void @llvm.experimental.coro.destroy(i8* %hdl)
  ret i32 0
}
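
As a rough analogy only (Python generators are not LLVM coroutines), the resume-twice-then-destroy pattern above can be mimicked in Python. The body of f and the out list are assumptions made for illustration: priming the generator plays the role of the initial call that runs to the first suspend point, next stands in for coro.resume, and close stands in for coro.destroy.

```python
def f(n, out):
    """Generator analog of a coroutine that never finishes on its own."""
    try:
        while True:
            out.append(n)   # work done before each suspend point
            n += 1
            yield           # suspend point
    finally:
        out.append("destroyed")  # cleanup runs when the generator is closed


def main():
    out = []
    hdl = f(4, out)
    next(hdl)    # initial invocation: run to the first suspend point
    next(hdl)    # analog of coro.resume
    next(hdl)    # analog of coro.resume
    hdl.close()  # analog of coro.destroy
    return out
```

Calling main() records one value per entry into the coroutine, followed by the cleanup marker.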

In addition to the function stack frame, which exists while a coroutine is executing, there is an additional region of storage that holds the objects that keep the coroutine state while the coroutine is suspended. This region of storage is called the coroutine frame. It is created when a coroutine is called and destroyed when the coroutine runs to completion or is destroyed by a call to the coro.destroy intrinsic.

An LLVM coroutine is represented as an LLVM function that has calls to coroutine intrinsics defining the structure of the coroutine. After the mandatory CoroSplit pass, a coroutine is split into several functions that represent three different ways in which control can enter the coroutine:

  1. a ramp function, which represents an initial invocation of the coroutine that creates the coroutine frame and executes the coroutine code until it encounters a suspend point or reaches the end of the function;
  2. a coroutine resume function that contains the code to be executed once the coroutine is resumed;
  3. a coroutine destroy function that is invoked when the coroutine is destroyed.

Coroutines by Example

Coroutine Representation

Let's look at an example of an LLVM coroutine with the behavior sketched by the following pseudo-code.

void *f(int n) {
   for (;;) {
     yield(n++);
     <suspend> // returns a coroutine handle on first suspend
   }
}

This coroutine calls some function yield with value n as an argument and suspends execution. Every time it resumes, it calls yield again with an argument one bigger than the last time. This coroutine never completes by itself and must be destroyed explicitly. If we use this coroutine with the main shown in the previous section, it will call yield with values 4, 5 and 6, after which the coroutine will be destroyed.

We will look at individual parts of the LLVM coroutine matching the pseudo-code above starting with coroutine frame creation and destruction:

define i8* @f(i32 %n) {
entry:
  %frame.size = call i32 @llvm.experimental.coro.size()
  %alloc = call i8* @malloc(i32 %frame.size)
  %frame = call i8* @llvm.experimental.coro.init(i8* %alloc, i32 0, i8* null, i8* null)
  %first.return = call i1 @llvm.experimental.coro.fork()
  br i1 %first.return, label %coro.return, label %coro.start

coro.start:
  ; ...
resume:
  ; ...

cleanup:
  %mem = call i8* @llvm.experimental.coro.delete(i8* %frame)
  call void @free(i8* %mem)
  call void @llvm.experimental.coro.resume.end()  
  br label %coro.return

coro.return:
  ret i8* %frame
}

The first three lines of the entry block establish the coroutine frame. The coro.size intrinsic is lowered to a constant representing the size required for the coroutine frame. The coro.init intrinsic returns the address to be used as a coroutine frame pointer (which could be at an offset relative to the allocated block of memory).

The coro.delete intrinsic, given the coroutine frame pointer, returns a pointer to the memory block to be freed.

Two other intrinsics seen in this fragment are used to mark up the control flow during the initial and subsequent invocations of the coroutine. The true branch of the conditional branch instruction consuming the result of the coro.fork intrinsic indicates the block where control should transfer on the first suspension of the coroutine. The coro.resume.end intrinsic marks the point where the coroutine needs to return control back to the caller if it is not an initial invocation of the coroutine. (During the initial coroutine invocation this intrinsic is a no-op.)

This function returns a pointer to a coroutine frame which acts as a coroutine handle expected by coro.resume and coro.destroy intrinsics.

The rest of the coroutine code in blocks coro.start and resume is straightforward:

coro.start:
  %n.val = phi i32 [ %n, %entry ], [ %inc, %resume ]
  call void @yield(i32 %n.val)
  %suspend = call i1 @llvm.experimental.coro.suspend(token none, i1 false)
  br i1 %suspend, label %resume, label %cleanup

resume:
  %inc = add i32 %n.val, 1
  br label %coro.start

When control reaches the coro.suspend intrinsic, the coroutine is suspended and returns control back to the caller. The conditional branch following the coro.suspend intrinsic indicates two alternative continuations for the coroutine: one for normal resume, another for destroy. The boolean parameter to coro.suspend indicates whether a suspend point represents a final suspend or not.

Coroutine Transformation

One of the steps in coroutine transformation is to figure out which objects can reside on the normal function stack frame or in registers and which need to go into the coroutine frame.

In the coroutine shown in the previous section, the use of virtual register %n.val is separated from its definition by a suspend point. It cannot reside on the stack frame, since the stack frame goes away once the coroutine is suspended and returns control back to the caller. It therefore needs to be a part of the coroutine frame.

Other members of the coroutine frame are addresses of resume and destroy functions representing the coroutine behavior for when a coroutine is resumed and destroyed respectively.

%f.frame = type { void (%f.frame*)*, void (%f.frame*)*, i32 }

After coroutine transformation, function f is responsible for creation and initialization of the coroutine frame and execution of the coroutine code until a suspend point is reached or control reaches the end of the function. It will look like:

define i8* @f(i32 %n) {
entry:
  %alloc = call noalias i8* @malloc(i32 24)
  %0 = call nonnull i8* @llvm.experimental.coro.init(i8* %alloc, i32 0, i8* null, i8* null)
  %frame = bitcast i8* %0 to %f.frame*
  %1 = getelementptr %f.frame, %f.frame* %frame, i32 0, i32 0
  store void (%f.frame*)* @f.resume, void (%f.frame*)** %1
  %2 = getelementptr %f.frame, %f.frame* %frame, i32 0, i32 1
  store void (%f.frame*)* @f.destroy, void (%f.frame*)** %2

  %n.val.addr = getelementptr %f.frame, %f.frame* %frame, i32 0, i32 2
  store i32 %n, i32* %n.val.addr
  call void @yield(i32 %n)

  ret i8* %0
}

The part of the original coroutine f that is responsible for executing code after resume will be extracted into the f.resume function:

define internal fastcc void @f.resume(%f.frame* %frame.ptr.resume) {
entry:
  %n.val.addr = getelementptr %f.frame, %f.frame* %frame.ptr.resume, i64 0, i32 2
  %n.val = load i32, i32* %n.val.addr, align 4
  %inc = add i32 %n.val, 1
  store i32 %inc, i32* %n.val.addr, align 4
  tail call void @yield(i32 %inc)
  ret void
}

The f.destroy function, in contrast, ends up simply calling the free function:

define internal fastcc void @f.destroy(%f.frame* %frame.ptr.destroy) {
entry:
  %0 = bitcast %f.frame* %frame.ptr.destroy to i8*
  tail call void @free(i8* %0)
  ret void
}
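
To make the shape of the split coroutine easier to see, here is a toy Python model of the three functions above. It is a sketch under stated assumptions, not what CoroSplit emits: the Frame class mirrors %f.frame, yield_ is a stand-in for the external yield function, coro_resume and coro_destroy model the indirect calls through the function pointers stored in the frame, and the freed list merely records that the destroy part ran.

```python
from dataclasses import dataclass
from typing import Callable, List

yielded: List[int] = []   # values passed to the stand-in yield function
freed: List[int] = []     # ids of frames released by the destroy part


def yield_(value: int) -> None:
    yielded.append(value)


@dataclass
class Frame:
    """Mirrors %f.frame: resume pointer, destroy pointer, spilled n.val."""
    resume_fn: Callable[["Frame"], None]
    destroy_fn: Callable[["Frame"], None]
    n_val: int


def f_resume(frame: Frame) -> None:
    frame.n_val += 1          # reload, increment and spill n.val
    yield_(frame.n_val)


def f_destroy(frame: Frame) -> None:
    freed.append(id(frame))   # stands in for the call to free


def f(n: int) -> Frame:
    """Ramp function: build the frame, run to the first suspend point."""
    frame = Frame(f_resume, f_destroy, n)
    yield_(n)
    return frame


def coro_resume(hdl: Frame) -> None:
    hdl.resume_fn(hdl)        # indirect call through the frame


def coro_destroy(hdl: Frame) -> None:
    hdl.destroy_fn(hdl)


def run_main() -> List[int]:
    yielded.clear()
    freed.clear()
    hdl = f(4)
    coro_resume(hdl)
    coro_resume(hdl)
    coro_destroy(hdl)
    return list(yielded)
```

Only n_val lives in the frame because it is the only value whose use is separated from its definition by a suspend point.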

Avoiding Heap Allocations

A particular coroutine usage pattern, illustrated by the main function in the overview section, is one where a coroutine is created, manipulated and destroyed by the same calling function. This pattern is common for generator coroutines and is suitable for the allocation elision optimization, which avoids dynamic allocation by storing the coroutine frame on the caller's frame.

To enable this optimization, we need to mark frame allocation and deallocation calls to allow bypassing them if not needed.

In the entry block, we will call the coro.elide intrinsic, which returns the address of a coroutine frame on the caller's frame when possible and null otherwise:

entry:
  %elide = call i8* @llvm.experimental.coro.elide()
  %0 = icmp ne i8* %elide, null
  br i1 %0, label %coro.init, label %coro.alloc

coro.alloc:
  %frame.size = call i32 @llvm.experimental.coro.size()
  %alloc = call i8* @malloc(i32 %frame.size)
  br label %coro.init

coro.init:
  %phi = phi i8* [ %elide, %entry ], [ %alloc, %coro.alloc ]
  %frame = call i8* @llvm.experimental.coro.init(i8* %phi, i32 0, i8* null, i8* null)

In the cleanup block, we will make freeing the coroutine frame conditional on the result of the coro.delete intrinsic. If the allocation is elided, coro.delete returns null, so the deallocation code is skipped:

cleanup:
  %mem = call i8* @llvm.experimental.coro.delete(i8* %frame)
  %tobool = icmp ne i8* %mem, null
  br i1 %tobool, label %if.then, label %if.end

if.then:
  call void @free(i8* %mem)
  br label %if.end

if.end:
  call void @llvm.experimental.coro.resume.end()
  br label %coro.return

With allocations and deallocations described as above, after inlining and heap allocation elision optimization, the resulting main will end up looking like:

define i32 @main() {
entry:
  call void @yield(i32 4)
  call void @yield(i32 5)
  call void @yield(i32 6)
  ret i32 0
}
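
The conditional allocation and deallocation protocol from this section can be modeled with a small Python sketch. Everything here is hypothetical scaffolding: Heap is a toy allocator, the elide argument stands in for the value coro.elide would produce, and the frame is a plain dictionary. The point is only that cleanup frees memory exactly when the ramp allocated it.

```python
class Heap:
    """Toy allocator that tracks which blocks are still live."""

    def __init__(self):
        self.live = set()
        self._next = 0

    def malloc(self):
        self._next += 1
        self.live.add(self._next)
        return self._next

    def free(self, block):
        self.live.remove(block)


def ramp(heap, elide):
    """Frame setup; `elide` stands in for the result of coro.elide."""
    if elide is not None:
        mem, on_heap = elide, False          # storage on the caller's frame
    else:
        mem, on_heap = heap.malloc(), True   # fall back to the heap
    return {"mem": mem, "on_heap": on_heap}


def coro_delete(frame):
    """Return the block to free, or None when the allocation was elided."""
    return frame["mem"] if frame["on_heap"] else None


def cleanup(heap, frame):
    mem = coro_delete(frame)
    if mem is not None:
        heap.free(mem)
```

With elide set to None the frame is allocated and later freed; with a caller-provided slot, neither malloc nor free is ever called.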

Multiple Suspend Points

Let's consider the coroutine that has more than one suspend point:

void *f(int n) {
   for (;;) {
     yield(n++);
     <suspend>
     yield(-n);
     <suspend>
   }
}

Matching LLVM code would look like (with the rest of the code remaining the same as the code in the previous section):

coro.start:
  %n.val = phi i32 [ %n, %coro.init ], [ %inc, %resume ]
  call void @yield(i32 %n.val)
  %suspend1 = call i1 @llvm.experimental.coro.suspend(token none, i1 false)
  br i1 %suspend1, label %resume, label %cleanup

resume:
  %inc = add i32 %n.val, 1
  %sub = sub nsw i32 0, %inc
  call void @yield(i32 %sub)
  %suspend2 = call i1 @llvm.experimental.coro.suspend(token none, i1 false)
  br i1 %suspend2, label %coro.start, label %cleanup

In this case, the coroutine frame would include a suspend index that will indicate at which suspend point the coroutine needs to resume. The resume function will use an index to jump to an appropriate basic block and will look as follows:

define internal fastcc void @f.resume(%f.frame* nocapture nonnull %frame.ptr.resume) {
entry:
  %index.addr = getelementptr %f.frame, %f.frame* %frame.ptr.resume, i64 0, i32 2
  %index = load i32, i32* %index.addr, align 4
  %switch = icmp eq i32 %index, 0
  br i1 %switch, label %resume, label %coro.start

coro.start:
  ...
  br label %exit

resume:
  ...
  br label %exit

exit:
  %storemerge = phi i32 [ 1, %resume ], [ 0, %coro.start ]
  store i32 %storemerge, i32* %index.addr, align 4
  ret void
}

If different cleanup code needs to get executed for different suspend points, a similar switch will be in the f.destroy function.
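
The role of the suspend index can be illustrated with a toy Python state machine. This is a sketch, not generated code: the Frame fields, the out list that records values passed to yield, and the function names are assumptions chosen to mirror the IR above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Frame:
    n_val: int
    index: int = 0                       # suspend point we stopped at
    out: List[int] = field(default_factory=list)


def f(n: int) -> Frame:
    """Ramp: run to the first suspend point."""
    frame = Frame(n_val=n)
    frame.out.append(frame.n_val)        # yield(n)
    frame.index = 0                      # stopped at the first suspend point
    return frame


def f_resume(frame: Frame) -> None:
    """Dispatch on the stored index, like the switch in f.resume."""
    if frame.index == 0:
        # 'resume' block: continue after the first suspend point
        frame.n_val += 1
        frame.out.append(-frame.n_val)   # yield(-n)
        frame.index = 1                  # stopped at the second suspend point
    else:
        # 'coro.start' block: continue after the second suspend point
        frame.out.append(frame.n_val)    # yield(n)
        frame.index = 0
```

Each call to f_resume runs exactly one segment of the loop body and then records which suspend point it stopped at, so the next call knows where to continue.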

Distinct Save and Suspend

In the previous example, setting a resume index (or some other state change that needs to happen to prepare a coroutine for resumption) happens at the same time as the suspension of the coroutine. However, in certain cases, it is necessary to control separately when a coroutine is prepared for resumption and when it is suspended.

In the following example, a coroutine represents some activity that is driven by completions of asynchronous operations async_op1 and async_op2, which get a coroutine handle as a parameter and resume the coroutine once the asynchronous operation is finished.

void g() {
   for (;;) {
     if (cond()) {
        async_op1(<coroutine-handle>); // will resume once async_op1 completes
        <suspend>
        do_one();
     }
     else {
        async_op2(<coroutine-handle>); // will resume once async_op2 completes
        <suspend>
        do_two();
     }
   }
}

In this case, the coroutine should be ready for resumption prior to the calls to async_op1 and async_op2. The coro.save intrinsic is used to indicate the point at which the coroutine should be ready for resumption:

if.true:
  %save1 = call token @llvm.experimental.coro.save()
  call void @async_op1(i8* %frame)
  %suspend1 = call i1 @llvm.experimental.coro.suspend(token %save1, i1 false)
  br i1 %suspend1, label %resume1, label %cleanup

if.false:
  %save2 = call token @llvm.experimental.coro.save()
  call void @async_op2(i8* %frame)
  %suspend2 = call i1 @llvm.experimental.coro.suspend(token %save2, i1 false)
  br i1 %suspend2, label %resume2, label %cleanup
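
Why the save must precede the call can be shown with a small Python sketch. All names are hypothetical: Frame holds only a resume index, resume records which index it dispatched on, and async_op_completing_inline models the worst case of an operation that finishes and resumes the coroutine before it returns.

```python
class Frame:
    def __init__(self):
        self.index = 0         # resume index currently stored in the frame
        self.resumed_at = []   # indices that resume actually dispatched on


def resume(frame):
    # Dispatch on whatever index is stored in the frame right now.
    frame.resumed_at.append(frame.index)


def async_op_completing_inline(frame):
    # Hypothetical operation that completes, and resumes, before returning.
    resume(frame)


def suspend_point(frame, point, save_before_call):
    if save_before_call:
        frame.index = point                 # analog of coro.save
        async_op_completing_inline(frame)
    else:
        async_op_completing_inline(frame)   # resumes with a stale index
        frame.index = point                 # state updated too late
```

When the index is stored first, the inline resumption continues from the intended suspend point; otherwise it dispatches on whatever stale index the frame still holds.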

Coroutine Promise

A coroutine author or a frontend may designate a distinguished alloca that can be used to communicate with the coroutine. This distinguished alloca is called the coroutine promise and is provided as the third parameter to the coro.init intrinsic.

The following coroutine designates a 32-bit integer promise and uses it to store the current value produced by the coroutine.

define i8* @f(i32 %n) {
entry:
  %promise = alloca i32
  %pv = bitcast i32* %promise to i8*
  %frame.size = call i32 @llvm.experimental.coro.size()
  %alloc = call noalias i8* @malloc(i32 %frame.size)
  %frame = call i8* @llvm.experimental.coro.init(i8* %alloc, i32 0, i8* %pv, i8* null)
  %first.return = call i1 @llvm.experimental.coro.fork()
  br i1 %first.return, label %coro.return, label %coro.start

coro.start:
  %n.val = phi i32 [ %n, %entry ], [ %inc, %resume ]
  store i32 %n.val, i32* %promise
  %suspend = call i1 @llvm.experimental.coro.suspend(token none, i1 false)
  br i1 %suspend, label %resume, label %cleanup

resume:
  %inc = add i32 %n.val, 1
  br label %coro.start

cleanup:
  %mem = call i8* @llvm.experimental.coro.delete(i8* %frame)
  call void @free(i8* %mem)
  br label %coro.return

coro.return:
  ret i8* %frame
}

A coroutine consumer can rely on the coro.promise intrinsic to access the coroutine promise.

define i32 @main() {
entry:
  %hdl = call i8* @f(i32 4)
  %promise.addr = call i32* @llvm.experimental.coro.promise.p0i32(i8* %hdl)
  %val0 = load i32, i32* %promise.addr
  call void @yield(i32 %val0)
  call void @llvm.experimental.coro.resume(i8* %hdl)
  %val1 = load i32, i32* %promise.addr
  call void @yield(i32 %val1)
  call void @llvm.experimental.coro.resume(i8* %hdl)
  %val2 = load i32, i32* %promise.addr
  call void @yield(i32 %val2)
  call void @llvm.experimental.coro.destroy(i8* %hdl)
  ret i32 0
}
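
The producer and consumer roles around the promise can be mimicked with a toy Python class. This is only an analogy: Coro, its promise attribute, and the resume and destroy methods are illustrative names, with a generator standing in for the coroutine body.

```python
class Coro:
    """Toy coroutine whose frame exposes a promise slot."""

    def __init__(self, n):
        self.promise = None
        self._gen = self._body(n)
        next(self._gen)            # run to the first suspend point

    def _body(self, n):
        while True:
            self.promise = n       # store the current value in the promise
            yield                  # suspend point
            n += 1

    def resume(self):
        next(self._gen)

    def destroy(self):
        self._gen.close()


def main():
    seen = []
    hdl = Coro(4)
    seen.append(hdl.promise)       # value produced by the initial call
    hdl.resume()
    seen.append(hdl.promise)
    hdl.resume()
    seen.append(hdl.promise)
    hdl.destroy()
    return seen
```

The consumer never calls into the coroutine to fetch a value; it reads the promise slot between resumptions, just as main above loads from %promise.addr.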

There is also an intrinsic, coro.from.promise, that performs the reverse operation. Given the address of a coroutine promise, it obtains a coroutine handle. This intrinsic is the only mechanism for user code outside of the coroutine to get access to the coroutine handle.

Final Suspend

A coroutine author or a frontend may designate a particular suspend to be final, by setting the second argument of the coro.suspend intrinsic to true. Such a suspend point has two properties:

  • it is possible to check whether a suspended coroutine is at the final suspend point via coro.done intrinsic;
  • a resumption of a coroutine stopped at the final suspend point leads to undefined behavior. The only possible action for a coroutine at a final suspend point is destroying it via coro.destroy intrinsic.

From the user perspective, the final suspend point represents the idea of a coroutine reaching the end. From the compiler perspective, it is an optimization opportunity for reducing the number of resume points (and therefore switch cases) in the resume function.

The following is an example of a function that keeps resuming the coroutine until the final suspend point is reached after which point the coroutine is destroyed:

define i32 @main() {
entry:
  %coro = call i8* @g()
  br label %while.cond
while.cond:
  %done = call i1 @llvm.experimental.coro.done(i8* %coro)
  br i1 %done, label %while.end, label %while.body
while.body:
  call void @llvm.experimental.coro.resume(i8* %coro)
  br label %while.cond
while.end:
  call void @llvm.experimental.coro.destroy(i8* %coro)
  ret i32 0
}
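
The driver loop above has a direct analog in the following Python sketch. Coro is a hypothetical stand-in that reaches its final suspend point after a fixed number of resumptions (assumed to be at least one). Raising an exception on a resume at the final suspend point is a modeling choice: in LLVM that case is undefined behavior, not a trapped error.

```python
class Coro:
    def __init__(self, steps):
        self.done = False          # analog of the coro.done result
        self.resumed = 0
        self.destroyed = False
        self._remaining = steps    # resumptions left before the final suspend

    def resume(self):
        if self.done:
            # Modeling choice only; the real behavior is undefined.
            raise RuntimeError("resumed at the final suspend point")
        self.resumed += 1
        self._remaining -= 1
        if self._remaining <= 0:
            self.done = True


def drive(coro):
    """Resume until the final suspend point is reached, then destroy."""
    while not coro.done:
        coro.resume()
    coro.destroyed = True          # analog of coro.destroy
    return coro.resumed
```

The check of done before every resume is what keeps the driver from ever resuming a coroutine that is parked at its final suspend point.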

Intrinsics

Coroutine Manipulation Intrinsics

Intrinsics described in this section are used to manipulate an existing coroutine. They can be used in any function which happens to have a pointer to a coroutine frame or a pointer to a coroutine promise.

'llvm.experimental.coro.destroy' Intrinsic

Syntax:
declare void @llvm.experimental.coro.destroy(i8* <handle>)
Overview:

The 'llvm.experimental.coro.destroy' intrinsic destroys a suspended coroutine.

Arguments:

The argument is a coroutine handle to a suspended coroutine.

Semantics:

When possible, the coro.destroy intrinsic is replaced with a direct call to the coroutine destroy function. Otherwise it is replaced with an indirect call based on the function pointer for the destroy function stored in the coroutine frame. Destroying a coroutine that is not suspended leads to undefined behavior.

'llvm.experimental.coro.resume' Intrinsic

Syntax:
declare void @llvm.experimental.coro.resume(i8* <handle>)
Overview:

The 'llvm.experimental.coro.resume' intrinsic resumes a suspended coroutine.

Arguments:

The argument is a handle to a suspended coroutine.

Semantics:

When possible, the coro.resume intrinsic is replaced with a direct call to the coroutine resume function. Otherwise it is replaced with an indirect call based on the function pointer for the resume function stored in the coroutine frame. Resuming a coroutine that is not suspended leads to undefined behavior.

'llvm.experimental.coro.done' Intrinsic

Syntax:
declare i1 @llvm.experimental.coro.done(i8* <handle>)
Overview:

The 'llvm.experimental.coro.done' intrinsic checks whether a suspended coroutine is at the final suspend point or not.

Arguments:

The argument is a handle to a suspended coroutine.

Semantics:

Using this intrinsic on a coroutine that does not have a final suspend point or on a coroutine that is not suspended leads to undefined behavior.

'llvm.experimental.coro.promise' Intrinsic

Syntax:
declare <type>* @llvm.experimental.coro.promise.p0<type>(i8* <handle>)
Overview:

The 'llvm.experimental.coro.promise' intrinsic returns a pointer to a coroutine promise.

Arguments:

The argument is a handle to a coroutine.

Semantics:

Using this intrinsic on a coroutine that does not have a coroutine promise leads to undefined behavior. It is possible to read and modify the coroutine promise of a coroutine that is currently executing. The coroutine author and the coroutine user are responsible for making sure there are no data races.

'llvm.experimental.coro.from.promise' Intrinsic

Syntax:
declare i8* @llvm.experimental.coro.from.promise.p0<type>(<type>* <promise>)
Overview:

The 'llvm.experimental.coro.from.promise' intrinsic returns a coroutine handle given the coroutine promise.

Arguments:

An address of a coroutine promise.

Semantics:

Using this intrinsic on a coroutine that does not have a coroutine promise results in undefined behavior.

Coroutine Structure Intrinsics

Intrinsics described in this section are used within a coroutine to describe the coroutine structure. They should not be used outside of a coroutine.

'llvm.experimental.coro.size' Intrinsic

Syntax:
declare i32 @llvm.experimental.coro.size()
declare i64 @llvm.experimental.coro.size()
Overview:

The 'llvm.experimental.coro.size' intrinsic returns the number of bytes required to store a coroutine frame.

Arguments:

None.

Semantics:

The coro.size intrinsic is lowered to a constant representing the size of the coroutine frame.

'llvm.experimental.coro.init' Intrinsic

Syntax:
declare i8* @llvm.experimental.coro.init(i8* %mem, i32 %align, i8* %promise, i8* %fnaddr)
Overview:

The 'llvm.experimental.coro.init' intrinsic returns an address of the coroutine frame.

Arguments:

The first argument is a pointer to a block of memory in which the coroutine frame will reside. This could be the result of an allocation function or the result of a call to the coro.elide intrinsic, representing storage that can be used on the frame of the calling function.

The second argument provides information on alignment of the memory returned by the allocation function and given to coro.init by the first parameter. If this argument is 0, the memory is assumed to be aligned to 2 * sizeof(i8*). This argument only accepts constants.

The third argument, if not null, designates a particular alloca instruction to be a coroutine promise.

The fourth argument is a function pointer to the coroutine itself. If this argument is null, the CoroEarly pass will replace it with the address of the enclosing function.

Note

Since the coro.init intrinsic is not lowered until late optimizer passes, the fnaddr argument can be used to distinguish between a coro.init that describes the structure of a pre-split coroutine and a coro.init belonging to a post-split coroutine that was inlined into a different function.

Semantics:

Depending on the alignment requirements of the objects in the coroutine frame and/or on codegen compactness considerations, the pointer returned from coro.init may be at an offset from the %mem argument. (This could be beneficial if instructions that express relative access to data can be more compactly encoded with small positive and negative offsets.)

A frontend should emit exactly one coro.init intrinsic per coroutine. It should appear prior to the coro.fork intrinsic.

'llvm.experimental.coro.delete' Intrinsic

Syntax:
declare i8* @llvm.experimental.coro.delete(i8* %frame)
Overview:

The 'llvm.experimental.coro.delete' intrinsic returns a pointer to the block of memory where the coroutine frame is stored, or null if the allocation of the coroutine frame was elided.

Arguments:

A pointer to the coroutine frame. This should be the same pointer that was returned by a prior coro.init call.

Example (allow heap allocation elision):
cleanup:
  %mem = call i8* @llvm.experimental.coro.delete(i8* %frame)
  %tobool = icmp ne i8* %mem, null
  br i1 %tobool, label %if.then, label %if.end

if.then:
  call void @free(i8* %mem)
  br label %if.end

if.end:
  ret void
Example (no heap allocation elision):
cleanup:
  %mem = call i8* @llvm.experimental.coro.delete(i8* %frame)
  call void @free(i8* %mem)
  ret void

'llvm.experimental.coro.elide' Intrinsic

Syntax:
declare i8* @llvm.experimental.coro.elide()
Overview:

The 'llvm.experimental.coro.elide' intrinsic returns the address of memory on the caller's frame where the coroutine frame of this coroutine can be placed, or null if no such memory is available.

Arguments:

None

Semantics:

If the coroutine is eligible for heap elision and the ramp function is inlined in its caller, this intrinsic is lowered to an alloca storing the coroutine frame. Otherwise, it is lowered to constant null.

Example:
entry:
  %elide = call i8* @llvm.experimental.coro.elide()
  %0 = icmp ne i8* %elide, null
  br i1 %0, label %coro.init, label %coro.alloc

coro.alloc:
  %frame.size = call i32 @llvm.experimental.coro.size()
  %alloc = call i8* @malloc(i32 %frame.size)
  br label %coro.init

coro.init:
  %phi = phi i8* [ %elide, %entry ], [ %alloc, %coro.alloc ]
  %frame = call i8* @llvm.experimental.coro.init(i8* %phi, i32 0, i8* null, i8* null)

'llvm.experimental.coro.frame' Intrinsic

Syntax:
declare i8* @llvm.experimental.coro.frame()
Overview:

The 'llvm.experimental.coro.frame' intrinsic returns an address of the coroutine frame.

Arguments:

None

Semantics:

This intrinsic is lowered to refer to the coro.init instruction. This is a frontend convenience intrinsic that makes it easier to refer to the coroutine frame. This intrinsic is not necessary for the LLVM coroutine model and can be removed.

'llvm.experimental.coro.fork' Intrinsic

Syntax:
declare i1 @llvm.experimental.coro.fork()
Overview:

The 'llvm.experimental.coro.fork' intrinsic is used to indicate where the control should transfer on the first suspension of the coroutine.

Arguments:

None

Semantics:

The true branch of the conditional branch consuming the boolean value returned from this intrinsic indicates where control should transfer on the first suspension of the coroutine. In the ramp function, when suspend points are lowered, every coro.suspend is replaced with a jump to the basic block designated by the true branch.

The coro.fork intrinsic itself is always lowered to constant false.

'llvm.experimental.coro.resume.end' Intrinsic

Syntax:
declare void @llvm.experimental.coro.resume.end()
Overview:

The 'llvm.experimental.coro.resume.end' intrinsic marks the point where execution of the resume part of the coroutine should end and control returns back to the caller.

Arguments:

None

Semantics:

The coro.resume.end intrinsic is a no-op during an initial invocation of the coroutine. When the coroutine resumes, the intrinsic marks the point where the coroutine needs to return control back to the caller.

This intrinsic is removed by the CoroSplit pass when a coroutine is split into the start, resume and destroy parts. In the start part, the intrinsic is simply removed. In the resume and destroy parts, it is replaced with a ret void instruction and the rest of the block containing the coro.resume.end instruction is discarded.

'llvm.experimental.coro.suspend' Intrinsic

Syntax:
declare i1 @llvm.experimental.coro.suspend(token %save, i1 %final)
Overview:

The 'llvm.experimental.coro.suspend' intrinsic marks the point where execution of the coroutine needs to be suspended and control returned back to the caller. The conditional branch consuming the result of this intrinsic marks the basic blocks where the coroutine should proceed when resumed via the coro.resume and coro.destroy intrinsics, if the coroutine is suspended at this particular suspend point.

Arguments:

The first argument refers to a token of the coro.save intrinsic that marks the point when the coroutine state is prepared for suspension. If token none is passed, the intrinsic behaves as if there were a coro.save immediately preceding the coro.suspend intrinsic.

The second argument indicates whether this suspension point is final. The second argument only accepts constants. If more than one suspend point is designated as final, the resume and destroy branches should lead to the same basic blocks.

Semantics:

If a coroutine that was suspended at the suspend point marked by this intrinsic is resumed via coro.resume, control will transfer to the basic block marked by the true branch of the conditional branch consuming the result of the coro.suspend. If it is resumed via coro.destroy, it will proceed to the basic block indicated by the false branch.

If the suspend intrinsic is marked as final, the optimizer can consider the true branch unreachable and can perform optimizations that take advantage of that fact.

'llvm.experimental.coro.save' Intrinsic

Syntax:
declare token @llvm.experimental.coro.save()
Overview:

The 'llvm.experimental.coro.save' intrinsic marks the point where a coroutine is considered suspended (and thus eligible for resumption). Its return value should be consumed by exactly one coro.suspend intrinsic.

Arguments:

None

Semantics:

Whatever coroutine state changes are required to enable resumption of the coroutine from the corresponding suspend point should be done at the point of coro.save intrinsic.

Example:

Separate save and suspend points are necessary when a coroutine is used to represent an asynchronous control flow driven by callbacks representing completions of asynchronous operations.

In such a case, a coroutine should be ready for resumption prior to a call to the async_op function, which may trigger resumption of the coroutine from the same or a different thread, possibly before the async_op call returns control back to the coroutine:

%save = call token @llvm.experimental.coro.save()
call void @async_op(i8* %frame)
%suspend = call i1 @llvm.experimental.coro.suspend(token %save, i1 false)
br i1 %suspend, label %resume, label %cleanup

Coroutine Transformation Passes

CoroEarly

The CoroEarly pass lowers coroutine intrinsics that hide the details of the structure of the coroutine frame but do not otherwise need to be preserved to help later coroutine passes. This pass lowers the coro.frame, coro.done, coro.promise and coro.from.promise intrinsics.

CoroInline

Since coroutine transformation needs to be done in IPO order and inlining a pre-split coroutine is undesirable, the CoroInline pass wraps the inliner pass to execute coroutine and inliner passes in the following order:

  1. Call sites in the function F are inlined as appropriate.
  2. The CoroElide pass is run on the function F to see if any coroutines were inlined and are eligible for the coroutine frame elision optimization.
  3. If function F is a coroutine, resume and destroy parts are extracted into F.resume and F.destroy functions by the CoroSplit pass.

CoroSplit

The pass CoroSplit extracts resume and destroy parts into separate functions.

CoroElide

The CoroElide pass examines whether the inlined coroutine is eligible for the heap allocation elision optimization. If so, it replaces the coro.elide intrinsic with the address of a coroutine frame placed on its caller's frame and replaces coro.delete intrinsics with null to remove the deallocation code. This pass also replaces coro.resume and coro.destroy intrinsics with direct calls to the resume and destroy functions for a particular coroutine where possible.

CoroCleanup

This pass runs late to lower all coroutine related intrinsics not replaced by earlier passes.

Areas Requiring Attention

  1. Debug information is not supported at the moment.
  2. A coroutine frame is bigger than it could be. Adding optimizations similar to stack packing and stack coloring on the coroutine frame will result in tighter coroutine frames.
  3. The CoroElide optimization pass relies on the coroutine ramp function being inlined. It would be beneficial to split the ramp function further to increase the chance that it will get inlined into its caller.
  4. Design a convention that would make it possible to apply coroutine heap elision optimization across ABI boundaries.
  5. Coroutines with inalloca parameters (used on x86 Windows) cannot be handled.
  6. Alignment is ignored by coro.init and coro.delete intrinsics.