Improve ExecutionContext fast-paths #36538
@stephentoub issues resolved?
Functionally it looks ok to me. Does it have a measurable perf impact?
It's peanut butter; at least 0.5% of ThreadPool.Dispatch for Json. Though it does also fix an existing issue with EC restore for
Does that mean we'd see this change improve Json TE throughput by 10K RPS? Can we run the benchmarks with it?
I'm not seeing this comment anymore, but the delegate would call the method through a precode, and the precode target would be updated to point to the Tier1 code once it's ready. Currently, when tiering is enabled, direct calls also go through a precode. When a method is jitted before its callee is jitted, the call goes through a precode (when tiering is disabled too). If the callee is already Tier1 it may be possible to avoid calling through the precode; I'll have to look into this. The same would apply to delegates as well.
Just curious... Are the precodes jmp blocks (like a trampoline or tailcall, as it were) or does it end up being a double call?
So they are always called this way, rather than direct calls going straight to the method, which gives inlining the additional benefit of eliminating the indirection (aside from the other advantages of inlining)?
Yea they are jmp blocks
Possibly. I had done a bit of prototyping on this before and didn't see much improvement in steady-state from fixing direct calls, but some noticeable startup improvement, as there would be more calls involved. It may also depend on arch/processor, as prefetching helps to avoid the overhead. Delegate calls may be more expensive though; curious to see.
If I were to go all in, it would be something like this: https://github.com/dotnet/runtime/compare/master...benaadams:EC-all-in?expand=1 That turns all the Default context => Default context runs into direct calls rather than the triple call => delegate => call, so it would also pick up all of these in, for example, the Fortunes benchmark. Since they are chains of async, only the first couple are picked up by this change, as they run directly on the ThreadPool; the follow-on inlines don't get picked up.
Hmm, might be able to do that, but in a better way
@adamsitnik can you help here?
Sure! Is there any chance that you could send me a
@benaadams I've run them twice ( You can find the trace files in
@benaadams, seems like this should be closed, since for whatever reason we see things getting worse with it?
Would like to revisit when EH write through is on by default
Sounds good. Let's close it until then. Thanks.
Improve some of the ThreadPool fast-paths and reduce indirect calls for AsyncValueTaskMethodBuilder<T>, AsyncTaskMethodBuilder<T> and Timer by calling .MoveNext() directly rather than via the indirect chain EC.Run -> delegate -> .MoveNext(), which should help with the CPU's instruction cache and indirect branch prediction (Intel only tries to predict one indirect call, not chains?).

Split merged methods that switched on an if into their respective callers: AsyncValueTaskMethodBuilder<T> and AsyncTaskMethodBuilder<T> for MoveNext() and IThreadPoolWorkItem.Execute() (since they are top-level virtual/interface methods and their callee is too big to inline into them).

Optimize invocations from ThreadPoolWorkQueue.Dispatch:
- AsyncValueTaskMethodBuilder<T>, AsyncTaskMethodBuilder<T> and Timer can call .MoveNext() directly if on the Default context and called from ThreadPoolWorkQueue.Dispatch.
- AsyncValueTaskMethodBuilder<T>, AsyncTaskMethodBuilder<T> and Timer don't need to touch the ThreadStatic CurrentThread when called from ThreadPoolWorkQueue.Dispatch.
- AsyncValueTaskMethodBuilder<T>, AsyncTaskMethodBuilder<T> and Timer can use a simpler EC run when on a non-Default context and called from ThreadPoolWorkQueue.Dispatch.

Optimize invocations for Default context:
- AsyncValueTaskMethodBuilder<T>, AsyncTaskMethodBuilder<T> and Timer can call .MoveNext() directly if the captured context is Default and the thread is currently on the Default context.

Json callers (on Default context; which become extra fast-pathed)
Fortunes callers (on Default context; which become extra fast-pathed)
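
For readers unfamiliar with the call chain being discussed, here is a minimal sketch of the idea, not the PR's actual code. It uses only public APIs (ExecutionContext.Run, ContextCallback, IAsyncStateMachine). The type StateMachineBoxSketch and the convention that a null Context stands in for the Default context are assumptions made for illustration; the runtime's own box types and Default-context checks are internal and differ in detail.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;

// Hypothetical stand-in for the runtime's state machine box (illustration only).
public sealed class StateMachineBoxSketch<TStateMachine>
    where TStateMachine : IAsyncStateMachine
{
    // Cached delegate so the slow path does not allocate on every call.
    private static readonly ContextCallback s_callback =
        state => ((StateMachineBoxSketch<TStateMachine>)state).StateMachine.MoveNext();

    public TStateMachine StateMachine;

    // Assumption for this sketch: null stands in for "Default context".
    public ExecutionContext Context;

    public StateMachineBoxSketch(TStateMachine stateMachine, ExecutionContext context)
    {
        StateMachine = stateMachine;
        Context = context;
    }

    public void MoveNext()
    {
        ExecutionContext context = Context;
        if (context == null)
        {
            // Fast path: a single direct call, no delegate in between.
            StateMachine.MoveNext();
        }
        else
        {
            // Slow path: the chain EC.Run -> delegate -> MoveNext.
            ExecutionContext.Run(context, s_callback, this);
        }
    }
}

// Trivial state machine that only counts how often it was advanced.
public struct CountingStateMachine : IAsyncStateMachine
{
    public int Calls;

    public void MoveNext() => Calls++;

    public void SetStateMachine(IAsyncStateMachine stateMachine) { }
}
```

Constructing the box with a null context exercises the direct call, while passing ExecutionContext.Capture() goes through the delegate. Both end up invoking the same MoveNext(); only the number of indirections differs.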