Swift Actor/Tasks concurrency on Linux - Lock contention in rand() #760
Comments
Would it be possible to use the Windows optimization on Linux as well, to work around this? |
CC @rokhinip |
Is there any plan to solve this issue? It's approaching its third year since it was reported :( |
Silence :( |
Hi everyone, Sadly this is no small feat and a large project in itself. We don't currently have more details to share on the long term. We'd certainly welcome any PRs that could help remove the short-term pain, but medium-to-long term we think replacing the executors backing Swift Concurrency may be a preferable direction here. |
What does this mean for me if I use libdispatch directly in a C++ application as a lock-free thread pool implementation? Is this variant of libdispatch officially discontinued? |
Would such a PR be ok?
|
No. The Swift project is just considering whether it should continue to use libdispatch as the standard implementation of our thread pool on non-Darwin platforms.
Without any changes to make the thread pools coordinate, they'd both create their own threads. Out of the box, that means you could end up with |
Both? Perhaps there's a misunderstanding: I don't use Swift, I just use libdispatch. |
That seems abstractly right; could you adjust the comment and make a real PR? |
Then you would not be affected by Swift's use of a different thread pool implementation. |
Well, my question was about what it means for libdispatch bugs priority & maintenance in general :) |
@ilya-fedin thanks for the PR, we'll try to rerun the original workload test (@freef4ll could you please try to give it a spin with a benchmark perhaps so we can get before/after numbers and add that to #804 as further input).
@ktoso and @rjmccall - thanks for clarifying your thoughts on the possible future direction of the shared concurrency pool for Swift, I think the above quoted sentence would be pragmatic. I think what you suggest is the right call (I remember when we spent some time getting libdispatch to work on Solaris back in the day - the mach/kqueue bits make it a bit challenging to keep the codebase in sync, as has been seen - and Windows doesn't make it easier). I also know that the needs of various users on different platforms can differ quite a bit (e.g. back in the day we wanted to prioritise latency over energy efficiency and had a spin thread for picking up new work as a configurable default; for data center usage with many cores available, it was a great improvement for the stuff we did). One could envision a few different variations there even on the same OS platform. It would be nice to structure the code base of such an API so that not only are multiple platforms easy to support, but one could also have a couple of variants per platform. Also, just to mention a need for your thought process - it's desirable to be able to pin executors to a given thread, and it'd be nice if a future API made that fairly straightforward (not sure what the interaction between the concurrent pool and additional such threads would look like, but it'd be nice if it could be managed by the same code base...) - this is especially interesting for I/O threads, where one might pin the thread to a specific core that is designated to handle the interrupts from the network card (with a user-land networking stack, this allows low-latency processing of inbound packets with good cache behaviour...). |
#804 helps the CPU usage of our stress workload: it drops from 100% utilisation of 30 CPU cores to ~18 cores on a 32-core system. Sadly, the throughput numbers of the workload do not change. The lock contention in rand() is now replaced by 20% of the time being spent in _dispatch_root_queue_mediator_is_gone(). When a larger number of Actors is present, apple/swift#68299 reproduces. |
Thanks for verifying on your end @freef4ll Merged and should be part of 5.9.3 https://forums.swift.org/t/development-open-for-swift-5-9-3-for-linux-and-windows/69020 |
We have a workload pipeline which chains several thousand Actors to each other via an AsyncStream processing pipeline.
There is a multiplication effect: a single event at the start of the processing pipeline is amplified, as the event is delivered to several Tasks processing the events concurrently. The processing time of each wakeup is currently quite small, in the range of several microseconds.
Under Linux, stressing this processing pipeline shows ~45% of the stacks in __DISPATCH_ROOT_QUEUE_CONTENDED_WAIT__(), which leads to lock contention in glibc rand(), as ~60 threads are created and they all contend here:
This occurs on every entry to __DISPATCH_ROOT_QUEUE_CONTENDED_WAIT__(), which uses the macro _dispatch_contention_wait_until(), which in turn uses _dispatch_contention_spins(). This is where the rand() call comes in, and the macro produces just these 4 values: 31, 63, 95 and 127 for how many pause/yield instructions to execute.
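To illustrate why only four spin counts can come out of the macro, here is a minimal C sketch. It assumes the expression has the shape `(r & 127) | 31`, which is consistent with the four values reported above but is not copied from the libdispatch source. The names `SPINS_MIN`, `SPINS_MAX`, `spins_from_random`, `tls_xorshift32` and `contention_spins_tls` are hypothetical. The second half sketches one possible way to get the same scatter without a shared lock, using per-thread xorshift state; it is not the fix that was eventually merged.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed constants, picked to match the reported values 31..127. */
#define SPINS_MAX (128 - 1) /* 127: keep the low 7 bits */
#define SPINS_MIN (32 - 1)  /* 31: force the low 5 bits on */

/* Assumed shape of the spin-count expression. After masking and OR-ing,
 * only bits 5 and 6 of the random input are free, so there are exactly
 * four possible results: 31, 63, 95 and 127. */
unsigned spins_from_random(unsigned r) {
    return (r & SPINS_MAX) | SPINS_MIN;
}

/* Self-check: every 8-bit input maps onto one of the four values. */
void check_spin_values(void) {
    for (unsigned r = 0; r < 256; r++) {
        unsigned s = spins_from_random(r);
        assert(s == 31 || s == 63 || s == 95 || s == 127);
    }
}

/* Hypothetical alternative: each thread owns its generator state, so
 * concurrent callers never touch a shared lock. */
static _Thread_local uint32_t tls_state = 0;

uint32_t tls_xorshift32(void) {
    uint32_t x = tls_state;
    if (x == 0) {
        /* Lazy seed from the thread-local's address; "| 1" keeps it
         * nonzero, which xorshift requires. */
        x = (uint32_t)(uintptr_t)&tls_state | 1u;
    }
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    tls_state = x;
    return x;
}

/* Same four-value scatter as above, but without calling rand(). */
unsigned contention_spins_tls(void) {
    return spins_from_random(tls_xorshift32());
}
```

Since only two random bits are consumed per call, the quality of the generator matters far less than the cost of obtaining those bits; with ~60 threads entering the wait path, a lock-protected global generator is a poor fit for that job.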
The following example reproduces the issue: when sampling, ~28% of the time is spent in the code path mentioned.
The example creates 5000 tasks which work between 1μs and 3μs and then sleep for random 6-10 milliseconds. The point of the test is to create the contention and to illustrate the issue with rand():
When run on a Ryzen 5950X system, 18-19 HT cores are spent processing the workload, while on an M1 Pro it takes just ~4.