Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Shrink AsyncValueTaskMethodBuilder #24431

Closed
wants to merge 1 commit into from
Closed

Conversation

benaadams
Copy link
Member

@benaadams benaadams commented May 6, 2019

Remove the struct wrappings over Task in the AsyncMethodBuilders by moving the implementations to static helper members on AsyncMethodBuilderCore.

Shrink AsyncValueTaskMethodBuilder to a single pointer size.

Remove bool _useBuilder field from AsyncValueTaskMethodBuilder<TResult> as we can check if m_task is set instead.

/cc @stephentoub

#endif

/// <summary>The generic builder object to which this non-generic instance delegates.</summary>
private AsyncTaskMethodBuilder<VoidTaskResult> m_builder; // mutable struct: must not be readonly. Debugger depends on the exact name of this field.
Copy link
Member Author

@benaadams benaadams May 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is commented Debugger depends on the exact name of this field.

Not sure if removing the wrapping and changing it to m_task will cause issue?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if removing the wrapping and changing it to m_task will cause issue?

It might. You would need to test it. Does stepping out of an async method that yielded continue to work correctly? Does stepping over a yielding await continue to work correctly?

@benaadams
Copy link
Member Author

benaadams commented May 7, 2019

As the Task is now unwrapped and we test and can set it directly, using the utility methods, rather than going via the wrapped builder; Stream.ReadAsync's local function: async ValueTask<int> FinishReadAsync benefits from changes as follows

 G_M40602_IG27:
        mov      rax, bword ptr [rbp+10H]
        mov      dword ptr [rax+16], -2
        mov      rdi, bword ptr [rbp+10H]
        add      rdi, 24       
-       cmp      byte  ptr [rdi+5], 0
-       je       SHORT G_M40602_IG31
-       add      rdi, 8
-       cmp      gword ptr [rdi], 0
-       jne      SHORT G_M40602_IG30
-       mov      ebx, esi
-       cmp      ebx, 9
-       jge      SHORT G_M40602_IG28
-       cmp      esi, -1
-       jl       SHORT G_M40602_IG28
-       call     [CORINFO_HELP_READYTORUN_STATIC_BASE]
-       mov      rax, gword ptr [rax+0950H]
-       lea      edx, [rbx+1]
-       cmp      edx, dword ptr [rax+8]
-       jae      SHORT G_M40602_IG33
-       inc      ebx
-       movsxd   rdx, ebx
-       mov      rdx, gword ptr [rax+8*rdx+16]
+       mov      rdx, gword ptr [rcx]
+       test     rdx, rdx
+       jne      SHORT G_M40595_IG28
+       mov      dword ptr [rcx+8], esi
+       mov      byte  ptr [rcx+12], 1
        jmp      SHORT G_M40602_IG29
 
-G_M40602_IG28:
-       call     [CORINFO_HELP_READYTORUN_NEW]
-       mov      rdx, rax
-       mov      dword ptr [rdx+52], 0xD1FFAB1E
-       mov      dword ptr [rdx+56], esi
-
-G_M40602_IG29:
-       mov      rcx, rdi
-       call     [CORINFO_HELP_CHECKED_ASSIGN_REF]
-       jmp      SHORT G_M40602_IG32
 
-G_M40602_IG30:
+G_M40595_IG28:
-       mov      rcx, rdi
+       mov      rcx, gword ptr [rcx]
        mov      edx, esi
-       call     [AsyncTaskMethodBuilder`1:SetExistingTaskResult(int):this]
+       call     [AsyncMethodBuilderCore:SetExistingTaskResult(ref,int)]
-       jmp      SHORT G_M40602_IG32
 
-G_M40602_IG31:
+G_M40595_IG29:
-        mov      dword ptr [rdi], esi
-        mov      byte  ptr [rdi+4], 1
+        nop      
 
-G_M40602_IG32:
+G_M40595_IG30:
         lea      rsp, [rbp-38H]
         pop      rbx

/// <summary>The builder this void builder wraps.</summary>
private AsyncTaskMethodBuilder _builder; // mutable struct: must not be readonly
/// <summary>The lazily-initialized built task.</summary>
private Task<VoidTaskResult> m_task; // lazily-initialized: must not be readonly. Debugger depends on the exact name of this field.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you verified that stepping over yielding awaits in the debugger in VS continues to work correctly?

@@ -29,24 +29,24 @@ public struct AsyncVoidMethodBuilder
{
/// <summary>The synchronization context associated with this operation.</summary>
private SynchronizationContext? _synchronizationContext;
/// <summary>The builder this void builder wraps.</summary>
private AsyncTaskMethodBuilder _builder; // mutable struct: must not be readonly
/// <summary>The lazily-initialized built task.</summary>
Copy link
Member

@stephentoub stephentoub May 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are all these changes to AsyncVoidMethodBuilder really necessary? Devs that care about perf are using "async void"? And the changes result in a measurable and meaningful throughput improvement?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The methods are there now, so its just swapping from forwarding to AsyncTaskMethodBuilder to AsyncTaskMethodBuilderCore

@stephentoub
Copy link
Member

Thanks, @benaadams, for working on this. I'm not sure how much benefit there is to shrinking the size of AsyncValueTaskMethodBuilder, but if we can do so without introducing more bloat elsewhere, cool. I'm a little concerned with the amount of churn here, especially since the benefit of most of it isn't clear to me. Can you elaborate?

@benaadams
Copy link
Member Author

I'm a little concerned with the amount of churn here, especially since the benefit of most of it isn't clear to me. Can you elaborate?

In order to drop the two fields _haveResult and _useBuilder from AsyncValueTaskMethodBuilder and turn it into a pointer sized struct wrapper and change AsyncValueTaskMethodBuilder<TResult> into pointer+sizeof(TResult) sized; they need direct access to the Task.

For AsyncValueTaskMethodBuilder this problematic as it is a wrapper on AsyncTaskMethodBuilder which is then a wrapper on AsyncTaskMethodBuilder<VoidTaskResult>, which then contains the Task; similarly on the other.

The cleanest approach is to remove the extra 2 layers of struct wrapping and put the Task directly into the AsyncValueTaskMethodBuilder. However, that's problematic as they then no longer have access to the instance methods from AsyncTaskMethodBuilder<TResult>.

So the next step is to move the logic from AsyncTaskMethodBuilder<TResult> to AsyncMethodBuilderCore so they can both have access to it.

At that point, none of the MethodBuilders need to do any wrapping, can have the Task themselves and can just call into AsyncMethodBuilderCore directly; which is where its ended up.

So for AsyncIteratorMethodBuilder it wraps, AsyncTaskMethodBuilder which wraps AsyncTaskMethodBuilder<VoidTaskResult>

So the call chain is

AsyncIteratorMethodBuilder.Complete()
=> AsyncTaskMethodBuilder.SetResult()
=> AsyncTaskMethodBuilder<VoidTaskResult>.SetResult()

And that's repeated for the various types (AsyncVoidMethodBuilder, AsyncValueTaskMethodBuilder, AsyncVoidMethodBuilder<TResult>)

This changes the 3 levels of struct wrapping over a Task; to everything becoming a single struct over a Task and going to the same direct methods in AsyncMethodBuilderCore with AsyncTaskMethodBuilder<VoidTaskResult> also becoming a pass-through to AsyncMethodBuilderCore; with most methods now being direct pass-throughs e.g.

AsyncIteratorMethodBuilder.Complete() => AsyncMethodBuilderCore.SetResult()

I understand the concern on the churn; but hopefully its easier to follow as all the calls are now direct to their centralised implementation rather than going through multiple layers? And it may be easier for the Jit (or might not make much difference)

@benaadams
Copy link
Member Author

Total bytes of diff: -795 (-0.02% of base)
    diff is an improvement.

Total byte diff includes 831 bytes from reconciling methods
        Base had   13 unique methods,     2562 unique bytes
        Diff had   18 unique methods,     3393 unique bytes

Top file improvements by size (bytes):
        -795 : System.Private.CoreLib.dasm (-0.02% of base)

1 total files with size differences (1 improved, 0 regressed), 0 unchanged.

Will look into the specifics tomorrow

@benaadams
Copy link
Member Author

benaadams commented May 30, 2019

Total bytes of diff: -679 (-0.02% of base)
    diff is an improvement.

Total byte diff includes 1108 bytes from reconciling methods
        Base had   13 unique methods,     2562 unique bytes
        Diff had   20 unique methods,     3670 unique bytes

Top file improvements by size (bytes):
        -679 : System.Private.CoreLib.dasm (-0.02% of base)

Ignoring the churn in the method builder files; the other impacted methods in coreclr are:

1 total files with size differences (1 improved, 0 regressed), 0 unchanged.

Top method regressions by size (bytes):
          11 (10.48% of base) : System.Private.CoreLib.dasm - Stream:<ReadAsync>g__FinishReadAsync|47_0(ref,ref,struct):struct
          11 ( 9.65% of base) : System.Private.CoreLib.dasm - StreamReader:ReadAsyncInternal(struct,struct):struct:this
          11 (12.36% of base) : System.Private.CoreLib.dasm - StreamReader:ReadBufferAsync():struct:this
          11 (10.19% of base) : System.Private.CoreLib.dasm - TextReader:ReadBlockAsyncInternal(struct,struct):struct:this
           1 ( 0.18% of base) : System.Private.CoreLib.dasm - <AsyncHelper>d__3`1:MoveNext():this (2 methods)

Top method improvements by size (bytes):
        -108 (-6.25% of base) : System.Private.CoreLib.dasm - <ReadBufferAsync>d__69:MoveNext():this
        -107 (-3.10% of base) : System.Private.CoreLib.dasm - <ReadAsyncInternal>d__66:MoveNext():this
         -94 (-10.80% of base) : System.Private.CoreLib.dasm - <ReadBlockAsyncInternal>d__20:MoveNext():this
         -86 (-9.36% of base) : System.Private.CoreLib.dasm - <<ReadAsync>g__FinishReadAsync|47_0>d:MoveNext():this
         -77 (-13.10% of base) : System.Private.CoreLib.dasm - <DisposeAsyncCore>d__33:MoveNext():this
         -72 (-10.29% of base) : System.Private.CoreLib.dasm - <DisposeAsyncCore>d__99:MoveNext():this
          -5 (-5.62% of base) : System.Private.CoreLib.dasm - FileStream:DisposeAsyncCore():struct:this
          -5 (-5.62% of base) : System.Private.CoreLib.dasm - StreamWriter:DisposeAsyncCore():struct:this
          -4 (-7.27% of base) : System.Private.CoreLib.dasm - <<ReadAsync>g__FinishReadAsync|47_0>d:SetStateMachine(ref):this
          -4 (-7.27% of base) : System.Private.CoreLib.dasm - <DisposeAsyncCore>d__99:SetStateMachine(ref):this
          -4 (-7.27% of base) : System.Private.CoreLib.dasm - <ReadAsyncInternal>d__66:SetStateMachine(ref):this
          -4 (-7.27% of base) : System.Private.CoreLib.dasm - <ReadBufferAsync>d__69:SetStateMachine(ref):this
          -4 (-7.27% of base) : System.Private.CoreLib.dasm - <DisposeAsyncCore>d__33:SetStateMachine(ref):this
          -4 (-7.27% of base) : System.Private.CoreLib.dasm - <ReadBlockAsyncInternal>d__20:SetStateMachine(ref):this
          -2 (-1.82% of base) : System.Private.CoreLib.dasm - <AsyncHelper>d__3`1:SetStateMachine(ref):this (2 methods)
          -2 (-0.25% of base) : System.Private.CoreLib.dasm - <AsyncHelper2>d__4`1:MoveNext():this (2 methods)
          -1 (-0.29% of base) : System.Private.CoreLib.dasm - <AsyncHelper3>d__5:MoveNext():this
          -1 (-0.19% of base) : System.Private.CoreLib.dasm - <FinishWriteAsync>d__60:MoveNext():this

90 total methods with size differences (62 improved, 28 regressed), 18323 unchanged.

The Jit does does unnecessary work with the 3 wrappers; as for AsyncIteratorMethodBuilder:Create() it clears the same address twice and then loads the null from the address to return it; peeling away the wrappers makes it simpler (might be the generic type in the base struct?):

 ; Assembly listing for method AsyncIteratorMethodBuilder:Create():struct
-; Lcl frame size = 8
+; Lcl frame size = 0
 
 G_M16816_IG01:
-       push     rax
-       xor      rax, rax
-       mov      qword ptr [rsp], rax
+       nop    
 
 G_M16816_IG02:
        xor      rax, rax
-       mov      gword ptr [rsp], rax
-       mov      rax, gword ptr [rsp]
 
 G_M16816_IG03:
-       add      rsp, 8
        ret      
 
-; Total bytes of code 22, prolog size 7 for method AsyncIteratorMethodBuilder:Create():struct
+; Total bytes of code 8, prolog size 5 for method AsyncIteratorMethodBuilder:Create():struct

@benaadams
Copy link
Member Author

Improvements from <ReadBufferAsync>d__69:MoveNext():this

image

image

Regression from Stream:<ReadAsync>g__FinishReadAsync|47_0(ref,ref,struct):struct is from the struct being decomposed rather than treated as a xmm var

image

Same regression in StreamReader:ReadAsyncInternal(struct,struct):struct:this

image

@benaadams
Copy link
Member Author

<DisposeAsyncCore>d__33:MoveNext():this has some good gains

image

However, it then also has a gain from SetResult() not inlining

image

So not sure if the AggressiveInline it or leave as is?

@benaadams
Copy link
Member Author

From the Jit side of things for <ReadBufferAsync>d__69:MoveNext():this its tracking 9 less variables

image

Which I assume is a good thing :)

@stephentoub
Copy link
Member

Thanks. Have you run any perf tests?

@benaadams
Copy link
Member Author

Rebased

Have you run any perf tests?

Will do and test the debugger.

Though I think the bigger gain will be from the copy improvements @AndyAyersMS and @mikedn are looking at in #24915 (comment)

private Task Task => _builder.Task;
public Task Task
{
[MethodImpl(MethodImplOptions.AggressiveInlining)]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is repeated in each variant of the builder, is there a better way to express it as a short-circuit inside AsyncMethodBuilderCore.InitializeTaskAsPromise with aggressive inlining there?

I'm probably too shallow in the code here, I'm honestly not a fan of gettors that have side-effects either.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is AggressivelyInlined but InitializeTaskAsPromise is NoInlined

not a fan of gettors that have side-effects either.

Its a Lazy initialize, rather than full side effect

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to have InitializeTaskAsPromise inlined too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No; as isn't always called; the backing task can be set elsewhere (and I think usually is)

// ProjectN's AsyncTaskMethodBuilder<VoidTaskResult>.Create() currently does additional debugger-related
// work, so we need to delegate to it.
new AsyncTaskMethodBuilder { m_builder = AsyncTaskMethodBuilder<VoidTaskResult>.Create() };
var result = new AsyncTaskMethodBuilder();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we move this InitializeTaskIfDebugging call into the (possibly new internal) constructor for AsyncTaskMethodBuilder now to ensure it doesn't get in future code-creep?

Also, since this behavior is needed for each variant of the *Builder, how can we ensure this constructor behavior unless we have a abstract base class?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unless we have a abstract base class

No inheritence as they are all structs which do not support it

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, duh. Sorry for that... wondering still about the fragility of this duplicate code.


/// <summary>Gets the <see cref="System.Threading.Tasks.Task"/> for this builder.</summary>
/// <returns>The <see cref="System.Threading.Tasks.Task"/> representing the builder's asynchronous operation.</returns>
/// <exception cref="System.InvalidOperationException">The builder is not initialized.</exception>
public Task Task
{
[MethodImpl(MethodImplOptions.AggressiveInlining)]
get => m_builder.Task;
get => m_task ?? AsyncMethodBuilderCore.InitializeTaskAsPromise(ref m_task!); // TODO-NULLABLE: Remove ! when nullable attributes are respected

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

result.InitializeTaskAsStateMachineBox();
}
return result;
return AsyncMethodBuilderCore.InitalizeTaskIfDebugging(ref result, ref result.m_task!); // TODO-NULLABLE: Remove ! when nullable attributes are respected

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parallel comment to the AsyncMethodBuilder.Create above.

CreateDebugFinalizableAsyncStateMachineBox<IAsyncStateMachine>() :
new AsyncStateMachineBox<IAsyncStateMachine>());
#endif
get => m_task ?? AsyncMethodBuilderCore.InitializeTaskAsPromise(ref m_task!); // TODO-NULLABLE: Remove ! when nullable attributes are respected

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// ProjectN's AsyncTaskMethodBuilder.Create() currently does additional debugger-related
// work, so we need to delegate to it.
new AsyncValueTaskMethodBuilder() { _methodBuilder = AsyncTaskMethodBuilder.Create() };
var result = new AsyncValueTaskMethodBuilder();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}
}

[MethodImpl(MethodImplOptions.NoInlining)]
private static ValueTask CreateValueTask(ref Task<VoidTaskResult> task) => new ValueTask(task ?? AsyncMethodBuilderCore.InitializeTaskAsPromise(ref task!)); // TODO-NULLABLE: Remove ! when nullable attributes are respected

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is subtly different from code in https://github.com/dotnet/coreclr/pull/24431/files#diff-290cb2024350a19884ed781417721d54R170 in that it wraps the m_task by getting it by ref in it's only caller. We should likely follow this pattern or the other one as best we can.

// work, so we need to delegate to it.
new AsyncValueTaskMethodBuilder<TResult>() { _methodBuilder = AsyncTaskMethodBuilder<TResult>.Create() };
var result = new AsyncValueTaskMethodBuilder<TResult>();
return AsyncMethodBuilderCore.InitalizeTaskIfDebugging(ref result, ref result.m_task!); // TODO-NULLABLE: Remove ! when nullable attributes are respected

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}
}

[MethodImpl(MethodImplOptions.NoInlining)]
private static ValueTask<TResult> CreateValueTask(ref Task<TResult> task) => new ValueTask<TResult>(task ?? AsyncMethodBuilderCore.InitializeTaskAsPromise(ref task!)); // TODO-NULLABLE: Remove ! when nullable attributes are respected

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benaadams
Copy link
Member Author

Lot of bit rot here.

@benaadams benaadams closed this Oct 17, 2019
@benaadams benaadams deleted the ATMB branch October 17, 2019 22:10
@stephentoub
Copy link
Member

I'm experimenting with your approach in my pooling PR as well. I'm hopeful as a next step it'll let me actually make the pooling opt-in via a feature flag.

@benaadams
Copy link
Member Author

I'm experimenting with your approach in my pooling PR as well.

😄

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
3 participants