This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

[RFC] Reusable ThreadPool workitems #20936

Closed

Conversation

benaadams
Member

For the global threadpool queue, workitems can be reused.

This introduces a threadpool thread local pool (64 items) and a global pool (256 items).

There are two types of pools: one for WaitCallback and one for Action&lt;TState&gt;.

Items are returned to the threadpool-local pool of the thread they execute on; when that is full, they are returned to the global pool; when that is also full, they are discarded.

Queueing from a threadpool thread first attempts to get an item from the thread's local pool, then the global pool; otherwise a new item is created.

Queueing from a non-threadpool thread first attempts to get an item from the global pool; otherwise a new item is created.

Items queued to the thread-local work queues, rather than the global queue, do not use reusable items.

Generic Action&lt;TState&gt; items are only pooled when TState is a reference type; they are stored as object and pass through a type conversion to execute, which would box value types.
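The rent/return policy above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the type and member names (`WorkItemPool`, `Rent`, `Return`) are invented, and the capacity checks are deliberately simple (and racy on the global pool, which a sketch can tolerate since over- or under-filling a pool is harmless).

```csharp
using System;
using System.Collections.Concurrent;

// Illustrative two-level pool: thread-local first, then global, then allocate.
public static class WorkItemPool<T> where T : class, new()
{
    private const int LocalCapacity = 64;    // per-thread pool size from the description above
    private const int GlobalCapacity = 256;  // global pool size from the description above

    [ThreadStatic]
    private static T[] t_local;              // thread-local pool, used as a stack
    [ThreadStatic]
    private static int t_count;

    private static readonly ConcurrentQueue<T> s_global = new ConcurrentQueue<T>();

    public static T Rent()
    {
        // 1. Try the current thread's local pool first (no contention).
        T[] local = t_local;
        if (local != null && t_count > 0)
        {
            T item = local[--t_count];
            local[t_count] = null;
            return item;
        }
        // 2. Fall back to the shared global pool.
        if (s_global.TryDequeue(out T pooled))
            return pooled;
        // 3. Nothing pooled: allocate a new item.
        return new T();
    }

    public static void Return(T item)
    {
        // 1. Return to the local pool of the thread the item executed on.
        T[] local = t_local ?? (t_local = new T[LocalCapacity]);
        if (t_count < LocalCapacity)
        {
            local[t_count++] = item;
            return;
        }
        // 2. Local pool full: spill to the global pool (capacity check is approximate).
        if (s_global.Count < GlobalCapacity)
            s_global.Enqueue(item);
        // 3. Both full: drop the item and let the GC collect it.
    }
}
```

A non-threadpool thread in the PR skips step 1 of Rent and goes straight to the global pool; the sketch collapses the two cases for brevity.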

Before
image

After
image

/cc @stephentoub @davidfowl

@stephentoub
Member

Can you share a macro benchmark (e.g. some real-ish asp.net workload) that demonstrates this is actually a net win on throughput/load/etc.? I get nervous any time we introduce a global pool like this. It's only going to help with the first N items queued to execute concurrently (where N is however many you're willing to cache), it introduces contention, these are otherwise relatively small objects that are inexpensive to create, etc.

@stephentoub
Member

cc: @kouvel, @jkotas

@benaadams
Member Author

I get nervous any time we introduce a global pool like this.

Me too; that's why I marked it as RFC 😄

Can you share a macro benchmark (e.g. some real-ish asp.net workload) that demonstrates this is actually a net win on throughout/load/etc.?

Working on it. Current allocations for 816,628 requests in 2.2 for the aspnet/KestrelHttpServer/.../benchmarkapps/PlatformBenchmarks

image

@benaadams
Member Author

In 3.0 with this change for 948,734 requests

image

Though I need to compare 3.0 before and after for perf.

@benaadams
Member Author

Hmm, looking at the allocations; wasn't this ValueTaskAwaiter&lt;TResult&gt; interface-cast boxing elided by the JIT?

```csharp
else if ((null != (object)default(TAwaiter)) && (awaiter is IStateMachineBoxAwareAwaiter))
{
    try
    {
        ((IStateMachineBoxAwareAwaiter)awaiter).AwaitUnsafeOnCompleted(box);
        // ...
```

image
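For context on why that cast can be free: when the generic method is specialized for a value-type awaiter, both the `(object)default(TAwaiter) != null` test and the interface type check become compile-time constants, so optimized codegen can devirtualize the call and elide the box. A minimal sketch of the same pattern, with invented names (`IFastPath`, `FastAwaiter`, `TryFastPath`) rather than the runtime's actual types:

```csharp
using System;

interface IFastPath { void Run(); }

// A value-type "awaiter" implementing the fast-path interface.
struct FastAwaiter : IFastPath { public void Run() { } }

static class Demo
{
    public static bool TryFastPath<T>(T awaiter)
    {
        // For a value-type T, the JIT specializes this method per T:
        // `(object)default(T) != null` is a constant `true`, and the
        // `is` check resolves statically, so tier-1 code can make the
        // interface call without boxing `awaiter`.
        if ((null != (object)default(T)) && (awaiter is IFastPath))
        {
            ((IFastPath)awaiter).Run();
            return true;
        }
        return false;
    }
}
```

In unoptimized (tier-0) code the box may still happen, which is one reason tiering shows up in allocation traces like the one above.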

/cc @AndyAyersMS @stephentoub

@jkotas
Member

jkotas commented Nov 11, 2018

it introduces contention, these are otherwise relatively small objects that are inexpensive to create, etc.

Just to add one more, less obvious, item to the list: more cards with Gen2 -> Gen0 pointers that make Gen0 pauses longer.

Can you share a macro benchmark

Agreed, the allocation rate does not really matter for these kinds of optimizations. What matters is throughput, length of GC pauses, peak memory consumption, etc.

We had a number of these custom allocators to solve various problems, and they pretty much always turned out to be a bad idea in the end, even after being fine-tuned to work well under different conditions (example: #18360). It is hard to beat the GC for small objects.

@benaadams
Member Author

benaadams commented Nov 11, 2018

Added an issue for the allocations of ValueTaskAwaiter&lt;T&gt; and TaskAwaiter: https://github.com/dotnet/coreclr/issues/20937

@benaadams
Member Author

Those allocations might be from the tiered JIT; but alas, a Windows update (insider build) broke my WSL test bed after reporting, so I can't re-test in the same way 😢

@AndyAyersMS
Member

Yes, tiering is a possible explanation for the allocations, but I expect to switch to tier 1 code quickly enough that in realistic tests it wouldn't be noticeable.

@benaadams
Member Author

benaadams commented Nov 11, 2018

Yes, tiering is a possible explanation for the allocations, but I expect to switch to tier 1 code quickly enough that in realistic tests it wouldn't be noticeable.

Manually triggering 20-ish requests produces the improved asm. The thousands of allocations from starting with the load test are probably more a demonstration of the benefits of tier 0, in that it was able to handle the load much earlier (as it was running a checked build).

@benaadams
Member Author

benaadams commented Nov 11, 2018

Allocations after a few warm-up rounds, for 1,254,606 requests against the PR (149.4k allocated objects, 12.67MB; 144.4k collected objects, 11.13MB)

image

GC Heap: 366 new objects (17.3kB); 1.6k dead objects (89.1kB); 4.6k surviving objects (1.53MB)

image

@benaadams
Member Author

benaadams commented Nov 12, 2018

Not worth it; Kestrel's SocketTransport can eliminate almost all of its QueueUserWorkItem callbacks by implementing IThreadPoolWorkItem: aspnet/KestrelHttpServer#3095

image

Pipes can equally do the same (when scheduler == ThreadPoolScheduler), though the Pipe may need 4 sub-objects so it can implement the interface 4 times, or a shared state-carrier type.

image
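The IThreadPoolWorkItem approach referenced above avoids per-dispatch allocations by making a long-lived object *be* the work item: `System.Threading.IThreadPoolWorkItem` (public in .NET Core 3.0) has a single `Execute` method, and such an instance can be queued via `ThreadPool.UnsafeQueueUserWorkItem(IThreadPoolWorkItem, bool)` with no delegate or boxed-state allocation. A minimal sketch; `SocketReceiver` is an illustrative name, not Kestrel's actual type:

```csharp
using System;
using System.Threading;

sealed class SocketReceiver : IThreadPoolWorkItem
{
    private int _executions;
    public int Executions => Volatile.Read(ref _executions);

    // Called by the thread pool directly; nothing is allocated per dispatch
    // because this long-lived instance is itself the queued work item.
    public void Execute() => Interlocked.Increment(ref _executions);

    // Re-queue this same instance each time work (e.g. a socket receive)
    // needs to run on the thread pool.
    public void Schedule() =>
        ThreadPool.UnsafeQueueUserWorkItem(this, preferLocal: false);
}
```

Compare with `ThreadPool.QueueUserWorkItem(state => ...)`, which allocates a work-item wrapper (and possibly a closure) on every call; that per-call allocation is exactly what this PR's pooling tried to recycle, and what the interface approach removes outright.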
