Conversation
Can you share a macro benchmark (e.g. some real-ish ASP.NET workload) that demonstrates this is actually a net win on throughput/load/etc.? I get nervous any time we introduce a global pool like this. It's only going to help with the first N items queued to execute concurrently (where N is however many you're willing to cache), it introduces contention, these are otherwise relatively small objects that are inexpensive to create, etc.
Me too; that's why I marked it as RFC 😄
Working on it. Current allocations for 816,628 requests on 2.2, for the aspnet/KestrelHttpServer/.../benchmarkapps/PlatformBenchmarks app:
Hmm, looking at the allocations; wasn't this coreclr/src/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncMethodBuilder.cs lines 392 to 396 (at 463ba88)?
Just to add one more, less obvious, cost to the list: more cards with Gen2 -> Gen0 pointers, which make Gen0 pauses longer.
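To illustrate the concern (a hypothetical sketch, not this PR's code; `CachedWorkItem` and `CardDemo` are made-up names): a pooled item lives long enough to reach Gen2, yet on every reuse its state field is re-pointed at a freshly allocated (Gen0) object. Each such store goes through the GC write barrier and dirties a card, so ephemeral collections must scan these old objects as potential roots.

```csharp
using System;

// Illustrative only: a long-lived (eventually Gen2) pooled item whose
// field is re-pointed at a fresh (Gen0) object on every reuse. Each
// assignment dirties a card in the card table, adding work to Gen0 GCs.
public sealed class CachedWorkItem
{
    public object State; // becomes a Gen2 -> Gen0 pointer after reuse
}

public static class CardDemo
{
    // Static, so it survives collections and ages into Gen2.
    public static readonly CachedWorkItem Cached = new CachedWorkItem();

    public static void Reuse(object freshState)
    {
        // Cross-generation store: old object now references a young one.
        Cached.State = freshState;
    }
}
```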
Agreed, the allocation rate does not really matter for this kind of optimization. What matters is throughput, the length of GC pauses, peak memory consumption, etc. We have had a number of these custom allocators to solve various problems, and after being fine-tuned to work well under different conditions they pretty much always turned out to be a bad idea in the end (example: #18360). It is hard to beat the GC for small objects.
Add issue for the allocations of
Those allocations might be tiered JIT; but alas, a Windows update (insider build) broke my WSL test bed after reporting, so I can't re-test in the same way 😢
Yes, tiering is a possible explanation for the allocations, but I expect it to switch to tier 1 code quickly enough that in realistic tests it wouldn't be noticeable.
Manually triggering ~20 requests produces the improved asm. The thousands of allocations from starting cold with the load test are probably more a demonstration of the benefits of tier 0, in that it was able to handle the load much earlier (and this was running a checked build).
Not worth it; Kestrel's SocketTransport can eliminate almost all of its QUWI callbacks by implementing it directly; Pipes can equally do the same (when scheduler ==
For the global threadpool queue, work items can be reused.

This introduces a per-threadpool-thread local pool (64 items) and a global pool (256 items). There are two types of pools: one for `WaitCallback` and one for `Action<TState>`.

Items are returned to the local pool of the threadpool thread they execute on; when that is full they are returned to the global pool; when that is also full they are discarded.

Items queued from a threadpool thread first attempt to get an item from their local pool, then the global pool, then they create a new item. Items queued from a non-threadpool thread first attempt to get an item from the global pool, then they create a new item.

Items queued to the thread-local work queues, rather than the global queue, do not use reusable items.
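The rent/return policy above can be sketched roughly as follows (a minimal illustration, not the PR's actual code; `PooledWorkItem`, `WorkItemPool`, and the method names are mine):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical sketch of the two-level pooling policy described above.
public sealed class PooledWorkItem
{
    public Action<object> Callback;
    public object State;
    public void Clear() { Callback = null; State = null; }
}

public static class WorkItemPool
{
    private const int LocalCapacity = 64;    // per threadpool thread
    private const int GlobalCapacity = 256;  // shared across threads

    [ThreadStatic] private static PooledWorkItem[] t_localPool;
    [ThreadStatic] private static int t_localCount;

    private static readonly ConcurrentQueue<PooledWorkItem> s_globalPool =
        new ConcurrentQueue<PooledWorkItem>();
    private static int s_globalCount;

    // Rent: local pool first (threadpool threads only), then global, then allocate.
    public static PooledWorkItem Rent(bool onThreadPoolThread)
    {
        if (onThreadPoolThread && t_localCount > 0)
            return t_localPool[--t_localCount];

        if (s_globalPool.TryDequeue(out PooledWorkItem item))
        {
            Interlocked.Decrement(ref s_globalCount);
            return item;
        }
        return new PooledWorkItem();
    }

    // Return: local pool first; overflow to global; discard when both are full.
    public static void Return(PooledWorkItem item)
    {
        item.Clear();
        t_localPool ??= new PooledWorkItem[LocalCapacity];
        if (t_localCount < LocalCapacity)
        {
            t_localPool[t_localCount++] = item;
        }
        else if (Interlocked.Increment(ref s_globalCount) <= GlobalCapacity)
        {
            s_globalPool.Enqueue(item);
        }
        else
        {
            // Both pools full: undo the count and let the GC collect the item.
            Interlocked.Decrement(ref s_globalCount);
        }
    }
}
```

The thread-local fast path avoids contention for threadpool threads, while the bounded global queue caps how much memory the pool can pin.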
Generic `Action<TState>` items are only pooled if `TState` is a reference type; they are stored as `object` and pass through a type converter to execute, which would box value types.

Before
![image](https://user-images.githubusercontent.com/1142958/48306616-b8200880-e533-11e8-9ecb-ba5fc99453aa.png)
After
![image](https://user-images.githubusercontent.com/1142958/48307596-2f5f9780-e548-11e8-94e8-6d265d5e650d.png)
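To illustrate the boxing point from the description (a hypothetical sketch; `PooledActionItem` and its members are made-up names): the pooled generic item stores its state as `object` and converts back on execute, so a value-type `TState` would be boxed on every queue, allocating anyway and defeating the pool.

```csharp
using System;

// Hypothetical sketch of why pooling is limited to reference-type TState:
// the pooled item holds state as 'object', so storing a struct boxes it.
public sealed class PooledActionItem
{
    private Action<object> _converted;
    private object _state; // value types are boxed when stored here

    public void Set<TState>(Action<TState> action, TState state)
    {
        _state = state;                       // boxing allocation if TState is a struct
        _converted = s => action((TState)s);  // cast/unbox back on execute
    }

    public void Execute() => _converted(_state);
}
```

(The converter lambda itself also allocates per call here; a real implementation would cache it, but that is elided for brevity.)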
/cc @stephentoub @davidfowl