Skip to content

[RuntimeAsync] Merge from labs, all except JIT #114675

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 30 commits into from
Apr 28, 2025
Merged

Conversation

VSadov
Copy link
Member

@VSadov VSadov commented Apr 15, 2025

This PR includes everything from the runtimelab implementation of Runtime Async feature, except: JIT changes and tests.
Some changes in between VM and JIT are hard to separate, so included here as well.

The main effect of the VM changes is classification of methods by MethodReturnKind. We recognize methods that return Task/ValueTask/Task<T>/ValueTask<T> as special.

A task-returning method definition will now get two methoddescs in the type system:

  • one has the original Task-returning signature
  • another is T-returning, but called with a calling convention that uses a continuation object to handle cases of incomplete result leading to suspending/resuming the chain of calls.

We call the pair "variants". For most purposes the methods are identical and parallel - i.e. if generic, both have exactly same set of type parameters, if task-returning variant overrides another one in the base class, the async variant overrides the async variant in the base, etc...
Both variants can signal that the result is not ready yet - via incomplete Task or by returning a Continuation object.

The purpose of this duality is that async variants can be much more efficient when suspension does not happen, leading to substantial savings when a chain of awaits may be turned into a chain of calls in "async space". Here the JIT part that optimizes Await pattern into async calls.

From the public/external point of view the async variants are an implementation detail that "does not exist", and can change in future releases. Async variants should not show up in reflection, for example.

In terms of implementation logic one variant contains the user code/IL and another is a thunk that forwards to it. Which one contains the user code depends on presence of MethodImpl.Async. If there is no MethodImpl.Async the user code belongs to the task-returning variant, otherwise to the async variant.

Note that for the most part Runtime Async feature is pay-for-play. JIT features will only engage if the user code is compiled with a compiler that is enlightened to use the new API for async/await.
However, the creation of variant pairs for task-returning methods must happen unconditionally prior to possible use. Since these are just entry points in the type system, the impact should not be high. We will, of course measure and see if some action is needed on that.

In this PR the generation of variants will require FEATURE_RUNTIME_ASYNC, which will not be defined by default, so by default none of the above will happen.

@ghost
Copy link

ghost commented Apr 15, 2025

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

@ghost
Copy link

ghost commented Apr 15, 2025

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

Copy link
Contributor

Tagging subscribers to this area: @mangod9
See info in area-owners.md if you want to be subscribed.

@VSadov
Copy link
Member Author

VSadov commented Apr 15, 2025

CC @jakobbotsch

Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces support for runtime async variants by adding new callbacks, updated async structures, and modifications to the runtime’s internal async continuation and await handling logic. Key changes include:

  • New async callbacks in JIT interface headers and corresponding method implementations.
  • Addition of the CORINFO_ASYNC_INFO structure, async continuation logic, and related flag updates in the runtime and EE headers.
  • Updates in CoreLib to support async continuations and improved error/patchpoint handling.

Reviewed Changes

Copilot reviewed 99 out of 100 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/coreclr/tools/aot/jitinterface/jitinterface_generated.h Added new async callbacks for obtaining async info and resumption stub.
src/coreclr/tools/Common/JitInterface/ThunkGenerator/ThunkInput.txt Updated thunk definitions to include async info.
src/coreclr/tools/Common/JitInterface/CorInfoTypes.cs Introduced the CORINFO_ASYNC_INFO struct.
src/coreclr/tools/Common/JitInterface/CorInfoImpl*.{cs,generated.cs} Added async methods with placeholder implementations for getAsyncInfo and getAsyncResumptionStub.
src/coreclr/inc/* Extended various headers with new async tokens, flags, and helper functions.
src/coreclr/System.Private.CoreLib/* Integrated async continuation allocation and dispatch logic within CoreLib.
Files not reviewed (1)
  • src/coreclr/clrdefinitions.cmake: Language not supported
Comments suppressed due to low confidence (1)

src/coreclr/System.Private.CoreLib/src/System/Runtime/CompilerServices/RuntimeHelpers.CoreCLR.cs:829

  • [nitpick] The continuation dispatch logic in DispatchContinuations is complex and central to async resumption. Ensure that this new logic is thoroughly exercised by unit/integration tests to verify proper handling of both normal and exceptional control flows.
private static unsafe Continuation? DispatchContinuations(Continuation? continuation)

@VSadov
Copy link
Member Author

VSadov commented Apr 16, 2025

I recommend to review this with dot . Most interesting changes are under VM

#define METHOD_TOKEN_REMAINDER_MASK ((1 << METHOD_TOKEN_REMAINDER_BIT_COUNT) - 1)
#define METHOD_TOKEN_RANGE_BIT_COUNT (24 - METHOD_TOKEN_REMAINDER_BIT_COUNT)
#define METHOD_TOKEN_RANGE_MASK ((1 << METHOD_TOKEN_RANGE_BIT_COUNT) - 1)

enum class AsyncMethodKind
{
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments in AsyncMethodKind may be a good start with the async method classification

@@ -230,6 +230,11 @@ I4ARRAYREF SetUpWrapperInfo(MethodDesc *pMD)

GCX_PREEMP();

if (pMD->IsAsyncMethod())
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are quite a few changes like this where we do not expect async methods to participate in some scenarios (like COM interop here).
Sometimes it is less clear if scenario may actually need to support async methods somehow.

We prefer throw rather than asserts in this and similar cases - so it is more apparent if some scenario makes it here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, many of these can be hit. For example, this throws NotSupportedException in the runtimelab async branch and prints 7 in main:

using System;
using System.Runtime.InteropServices;
using System.Threading.Tasks;

Console.WriteLine(Marshal.GetEndComSlot(typeof(IFoo)));

public interface IFoo
{
    Task M();
}

Before we turn on runtime async support in main, we need to go over all places where you are adding these NotSupportedException, validate whether they can be hit, and fix things up as appropriate to avoid breaking changes. We know that the test coverage for built-in COM interop is very spotty, but customers depend on every corner case behavior of built-in COM interop. cc @AaronRobinsonMSFT

Could you please tag all these places that need review with async TODO so that they can be found easily and none of them falls through the cracks?

// For {Task-returning, Async} variants of the same definition
// we associate the methoddef with the Task-returning variant since it
// matches the methadata signature.
if (!pMD->IsUnboxingStub() && !pMD->IsAsyncVariantMethod())
{
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be interesting. Both variants have the same memberDef, so it is 2->1 relationship.
However when going from memberDef to methoddesc is 1->1 map, and starting with memberDef will produce the non-async methoddesc - the one that matches the metadata signature. (but it may not be the one that owns the IL!)

@@ -203,10 +203,11 @@ NESTED_ENTRY OnHijackTripThread, _TEXT
push rax ; make room for the real return address (Rip)
push rdx
PUSH_CALLEE_SAVED_REGISTERS
push_vol_reg rcx
Copy link
Member Author

@VSadov VSadov Apr 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some asm changes in Hijacking. The purpose will become more clear when JIT changes are brought in. In short - we are adding a possible extra return (the continuation, which is a managed object), thus we need to add an extra calee-trash register to the captured contexts.

On amd64, if a call ends up being async call, the continuation return register is RCX.

@@ -1851,6 +1851,14 @@ FCIMPL1(MethodTable*, MethodTableNative::InstantiationArg0, MethodTable* mt);
}
FCIMPLEND

FCIMPL1(OBJECTHANDLE, MethodTableNative::GetLoaderAllocatorHandle, MethodTable *mt)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When suspending, the continuation holds the state of the call. The state may need to include loader allocator reference as the method frame could be the only thing holding it alive. This helper is used in related logic.

@@ -1357,7 +1357,8 @@ void HijackFrame::GcScanRoots_Impl(promote_func *fn, ScanContext* sc)
{
LIMITED_METHOD_CONTRACT;

ReturnKind returnKind = m_Thread->GetHijackReturnKind();
bool hasAsyncRet;
ReturnKind returnKind = m_Thread->GetHijackReturnKind(&hasAsyncRet);
Copy link
Member Author

@VSadov VSadov Apr 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reporting the continuation to GC relies on liveness info at call site, just like for other returns.

As usual x86 is special. As its GC info does not cover volatile registers, we need some extra code to figure if continuation needs to be reported. (this is somewhat similar to ReturnKind, which also has special case on x86)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

@VSadov VSadov Apr 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do. It is not very complicated though.

  • There is an extra optional continuation return. If it is not NULL, means the result is not ready and the continuation has captured the state.
    The format of continuation is opaque, but changing the format will likely be R2R breaking.

  • There is also an extra continuation parameter (position depends on platform, x86 is different, as usual).
    If the continuation argument is not NULL, means this is not the first time the method runs, and it needs to unpack the continuation and continue running.
    The method may return a continuation again, if there is another await that is not ready with results. That is why we do not resume the entire continuation stack - there could be an await in a loop.

This is all there is from the ABI point, I think.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a section on Async calling convention in the ABI document

@@ -2448,6 +2448,7 @@ public TaskAwaiter GetAwaiter()
/// true to attempt to marshal the continuation back to the original context captured; otherwise, false.
/// </param>
/// <returns>An object used to await this task.</returns>
[Intrinsic]
public ConfiguredTaskAwaitable ConfigureAwait(bool continueOnCapturedContext)
Copy link
Member Author

@VSadov VSadov Apr 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ConfigureAwait can participate in JIT optimizations (JIT changes are not in this PR), thus needs to be matched by name.

@VSadov
Copy link
Member Author

VSadov commented Apr 27, 2025

Could you please tag all these places that need review with async TODO so that they can be found easily and none of them falls through the cracks?

Added // TODO: (async) revisit and examine if this needs to be supported and a tracking issue to revisit/remove the TODOs: #115094

@VSadov
Copy link
Member Author

VSadov commented Apr 27, 2025

In RuntimeMethodHandle::GetMethodDef there is a call to GetCallingConventionInfo. we should check what impact that has.

I think, if starting with methoddef, we will always get the Task-returning variant with a regular call convention.

There is a bit of a corner case for the non-Task returning methods that are explicitly MethodImpl.Async - like AsyncHelpers.Await(). Need to think how that should be represented. Perhaps just a regular calling convention. These methods should not be callable via reflection anyways (need to make sure, test).

I've logged a general issue to follow up with Reflection scenarios: #115099

// "BypassReadyToRun" is until AOT/R2R typesystem has support for MethodImpl.Async
// Must be NoInlining because we use AsyncSuspend to manufacture an explicit suspension point.
// It will not capture/restore any local state that is live across it.
[BypassReadyToRun]
Copy link
Member Author

@VSadov VSadov Apr 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing [BypassReadyToRun] on non-Task Async helpers breaks some scenarios. I had to put [BypassReadyToRun] back until R2R typesystem at least understands MethodImpl.Async and its impact on call convention.

The followup issue for R2R is here: #115098

@VSadov
Copy link
Member Author

VSadov commented Apr 27, 2025

I think I have addressed all suggestions/concerns, but there were many, so I may have missed some.
@jkotas @davidwrighton - please take a look again if there is something more needed. Thanks!

Copy link
Member

@davidwrighton davidwrighton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, but please get an approval from Jan too.

Copy link
Member

@jkotas jkotas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM as well!

@jkotas jkotas merged commit dea4f97 into dotnet:main Apr 28, 2025
148 checks passed
@VSadov VSadov deleted the asyncMove2 branch April 28, 2025 18:16
@VSadov
Copy link
Member Author

VSadov commented Apr 28, 2025

Thanks!!!

jakobbotsch added a commit that referenced this pull request Apr 30, 2025
Add the JIT parts of runtime async. This introduces a new transformation that
runs before lowering and that transforms the function into a state machine, with
states set up to allow suspension and resumption at async calls (the suspension
points).

Suspension is indicated by the callee returning a continuation object using a
new calling convention. When suspension happens, the caller similarly suspends
by capturing all live state and saving it in a continuation object. These
continuation objects are then linked together and continue to be propagated back
to the callers until we finally get to a caller that is not async anymore (i.e.
that expects to see a `Task`/`Task<T>`). The VM synthesizes a `Task`/`Task<T>`
wrapper for the async functions that hide away the management the continuation
objects. See #114675 for more details
around this and around the calling convention.

The continuation objects can later be used to resume the function at the point
where suspension happened. This is accomplished by async functions also taking a
continuation object as a parameter. When such a parameter is passed (i.e. it is
non-null), the JIT will restore all live state from the continuation and resume
from the correct location. The continuations store a state number so that
resumption can know where to resume.

The full resumption process also involves an IL stub called the async resumption
stub. This stub is responsible for calling the async function with the right
stack frame setup for arguments and simultaneously passing a non-null
continuation. The stack frame setup done by the IL async resumption stub is
important as the JIT uses this space to restore live parameters from the
continuation.

Continuations similarly support propagation of the return values from the callee
and of potential exceptions thrown by the callee. Return values are stored in a
known location in the continuation object, and the async resumption stubs are
responsible for propagating these values into the next continuation when
suspension/resumption has happened. The JIT's resumption code will fetch the
return value from the known location and copy it to the right place in the
caller. Similarly, exceptions are kept in a known place and are handled by being
rethrown from the right location when present.

OSR functions come with complications. Since they rely on frame setup done by
the corresponding tier-0 method the resumption of these methods needs to happen
through the tier 0 method. When OSR methods suspend they store their IL offset
in the continuation object, while tier 0 methods with patchpoints will store -1
in the continuation object. The VM then always resumes these methods in the tier
0 method, which knows to use the IL offset to determine whether resumption
should happen in the tier 0 method or whether control needs to continue into an
OSR method.

pRD->volatileCurrContextPointers.X0 = &m_Args->X0;
pRD->volatileCurrContextPointers.X1 = &m_Args->X1;
pRD->volatileCurrContextPointers.X1 = &m_Args->X2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this intended to overwrite the value written above? It looks like this should be:

Suggested change
pRD->volatileCurrContextPointers.X1 = &m_Args->X2;
pRD->volatileCurrContextPointers.X2 = &m_Args->X2;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Looks like a typo. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants