HashedWheelTimerScheduler waiting overhead fix #4032
Conversation
Actually, one more thing that would be nice to implement here is using …
Good idea - not sure why the Cluster.Sharding specs all failed the first time around. Might be that these changes to the timer affected how long certain things take - i.e. tasks that used to take ~n seconds now take < n.
Good idea regarding …
@IgorFedchenko looks like we're having issues with these sharding specs again, but only on Windows - doesn't look like that issue came up at all on Linux. Would you mind looking into it?
@Aaronontheweb Sure
Looks like the multi-node specs don't start properly or hang with these changes too... Weird.
@Aaronontheweb Sharding tests pass well on my local machine. And all of them are failing due to timeouts, which makes me think that they can be racy too. Let me set a larger timeout for a few of them, just as an experiment.
@Aaronontheweb Seems like the Sharding tests are pretty heavy... Should we group them into an xUnit test collection to run them one by one?
@IgorFedchenko yeah, we can disable parallelization there by using collections or by doing what we did here: … That will use the …
Awesome
@@ -1,5 +1,12 @@
<Project Sdk="Microsoft.NET.Sdk">
  <Import Project="..\..\..\common.props" />

  <!-- use the xunit JSON settings file to disable parallel tests running-->
Could not just reference <Import Project="..\..\xunitSettings.props" /> here, because this props file references the ..\..\xunit.runner.json relative path, which is not valid here (need one more ..\). Possibly there is some cleaner way to do that.
Ah... I'm sure there is a way using some MSBuild variables, but good point. May as well just use collection fixtures then.
It helped pretty well - most of the failing tests passed. But still, a few tests are failing due to timings, and they pass locally. I will try to increase the timeouts...
@Aaronontheweb All right, it seems like the best result I can get is occasionally having all tests pass a single time before some of them hit timeouts again. Some tests are racy, and there is nothing I can do about this (quickly). One time they pass, and the next time they do not. I will roll back the timeout changes, since they are not very helpful.
Why does the MNTR not run for this PR?
@Zetanova So, you think that there is almost nothing to be done about this sleeping overhead? We have added a custom scheduler that will use a single thread but allow the use of … The existing scheduling algorithm that was implemented here before my PR already looks good to me - and it is organized the same way as in Scala, so even if some optimizations are possible I do not think it can be any kind of bottleneck. After some local testing of the latest version, I have also noticed much higher CPU load. So I replaced …
So now it is working mostly as before, and I do not expect much throughput improvement... But the feeling that something is not right is still there :) We can also make use of the custom scheduler, but let it allocate 2 (or another fixed number of) threads for that.
You are running into the usual wrong assumptions:
If I have some time, I will try my luck with it next week.
@IgorFedchenko @Zetanova so here's what I'd propose:
For instance,
We can't worry about being super-precise with the … SpinWait might be advantageous in scenarios where we can predict that the next bucket is coming up very soon - i.e. when the scheduler is using either a very short interval for a recurring task or there's a large number of scheduled tasks queued. In scenarios where the workload is more infrequent, sleeping is probably OK. I'd stick to working within the structure we have and focus on making it so we waste less of the CPU's time when the scheduler is idle.
Well, not exactly... Of course 1 ms is a huge amount of time for almost any modern CPU, and there is no way to ask the CPU to wait (only to ask the OS to release resources for a while). The initial idea was to do some useless work on the current thread to get more accurate pauses between wheel ticks, and see if this would improve overall performance - the scheduler is a critical place, and maybe giving it "some" more CPU would be worth it. @Aaronontheweb So, indeed, it does not seem like … I will revert the changes, and will add a small optimization to use …
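The imprecision being discussed here is easy to observe directly. The following is a small throwaway sketch (not part of this PR) that measures how long `Thread.Sleep(1)` actually takes versus a spin-based wait to the same 1 ms deadline; on Windows, the default OS timer resolution typically makes the sleep run far past its target:

```csharp
using System;
using System.Diagnostics;
using System.Threading;

class SleepAccuracyDemo
{
    static void Main()
    {
        // How long does "sleep 1 ms" really take? On Windows the coarse
        // OS timer granularity typically makes this take much longer.
        var sw = Stopwatch.StartNew();
        Thread.Sleep(1);
        Console.WriteLine($"Thread.Sleep(1): {sw.Elapsed.TotalMilliseconds:F2} ms");

        // A spin-based wait hits the 1 ms target much more closely,
        // at the cost of keeping the CPU busy the whole time.
        long deadline = Stopwatch.GetTimestamp() + Stopwatch.Frequency / 1000;
        sw.Restart();
        while (Stopwatch.GetTimestamp() < deadline)
            Thread.SpinWait(10); // hint to the CPU that we're busy-waiting

        Console.WriteLine($"Spin wait 1 ms:  {sw.Elapsed.TotalMilliseconds:F2} ms");
    }
}
```

This is the tradeoff the thread is weighing: the spin wait wins on accuracy, the sleep wins on CPU usage.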
Left some comments - but do you have any additional benchmark data for how these changes may have impacted CPU?
var stopWatch = Stopwatch.StartNew();
Instead of starting a new Stopwatch here, if you use HighResMonotonicClock.Ticks that will give you the current Stopwatch.Elapsed value since Akka.NET started, and you can acquire a second comparison value where you have Thread.Sleep instead - just to save on allocations.
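For illustration, the allocation-free pattern suggested here can be sketched with the framework's static `Stopwatch.GetTimestamp()` (standing in for the monotonic-clock helper mentioned above - both yield a raw tick count without allocating a `Stopwatch` instance). The helper name below is illustrative, not Akka.NET code:

```csharp
using System;
using System.Diagnostics;

static class MonotonicTiming
{
    // Measures an action's duration without allocating a Stopwatch
    // instance: take two readings of the process-wide monotonic
    // timestamp and convert the difference using Stopwatch.Frequency.
    public static double MeasureMilliseconds(Action action)
    {
        long start = Stopwatch.GetTimestamp();
        action();
        long elapsedTicks = Stopwatch.GetTimestamp() - start;
        return elapsedTicks * 1000.0 / Stopwatch.Frequency;
    }
}
```

Since no object is created per measurement, this can sit on a hot path like the scheduler tick loop without adding GC pressure.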
Ah, I actually had to move this stopwatch to a private field instantiated once - but missed this one. And sure, your suggestion is even better.
Used the same thing for sleep-time tracking when calling Thread.SpinWait, instead of using a separate Stopwatch instance.
/// API for internal usage
/// </summary>
[InternalApi]
public static readonly AtomicCounter TotalTicksRequiredToWaitStrict = new AtomicCounter(0);
What are these being used for exactly? Testing?
Indeed, some more comments are required here... Yes, for testing - I will update the summary now.
@IgorFedchenko looks like we need an API approval here.
LGTM - need to see updated benchmark figures though @IgorFedchenko
@Aaronontheweb Benchmark results (compare to this one) - I was just waiting for it to finish:
Looks almost the same - sometimes a little bit faster, sometimes slower.
So, about sleep times. I started one sample from the benchmark, which sends 10_000 messages (10 bytes each) in request-reply manner from 3 concurrent clients (that is ~3333 messages per client) to a single server via … There were … Well, here is the file (initially created for me, without formatting, but I will share it with you, guys). Sleep numbers are in ascending order. So it is obvious that my … So here are two questions: …
@IgorFedchenko so, it sounds like this PR won't add any value to the existing scheduler performance then?
@Aaronontheweb If in most use cases the scheduler does not have much work to do (as in my benchmark) - then yes, there is nothing to optimize. Well, I wish we could make the scheduler sleep for a more accurate interval, but it looks like the only way to do this (and keep things simple enough) is using …
@Aaronontheweb Should we close it?
I looked at the current implementation and it seems to be well done. The only improvement that I can see is: …
Close #4031
As stated in the original issue, the scheduler is waiting more than it should.
For this PR, I have added a simple test based on one of the Akka.IO benchmarks - it performs some communication via Akka.IO.Tcp and uses some counters in HashedWheelTimerScheduler to track how many ticks it should wait each time before performing scheduled actions, and how long it actually waits. Then, there are a few simple assertions: … This allows us to keep track of how much overhead we have.
The original implementation was ceiling the required ticks to the closest milliseconds value and calling Thread.Sleep on it. This was giving a lot of overhead in cases where the sleep is going to be short (less than ~10-15 ms on Windows). After some experiments, I ended up with a pretty simple solution - using an empty loop with a Stopwatch.ElapsedTicks condition gives much more accurate results.
This is what I was getting in the test output with the initial implementation: …
And this is what I am getting with the updated one (the test passes, of course): …
The time loss is much smaller.
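The "empty loop with a Stopwatch.ElapsedTicks condition" mentioned above can be sketched roughly like this. This is an illustrative sketch only - the type and member names are hypothetical, not the actual PR code:

```csharp
using System.Diagnostics;

internal sealed class BusyWaitTimer
{
    // Long-lived stopwatch, started once (e.g. when the scheduler starts),
    // so no Stopwatch is allocated per wait.
    private readonly Stopwatch _clock = Stopwatch.StartNew();

    /// <summary>
    /// Blocks until the clock reaches the given deadline, measured in
    /// Stopwatch ticks. Empty-loops instead of sleeping, trading CPU
    /// time for wake-up accuracy.
    /// </summary>
    public void WaitUntil(long deadlineTicks)
    {
        while (_clock.ElapsedTicks < deadlineTicks)
        {
            // intentionally empty: spin until the deadline
        }
    }
}
```

Because the loop condition reads the high-resolution clock directly, the wake-up error is on the order of a single loop iteration rather than the OS timer quantum.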
P.S. With that said, Sleep is still good when we need to wait more than 10-15 ms. Not sure how often this will be the case, but still, I made the Sleep function smart enough to use Thread.Sleep instead of looping when the required delay is big enough (Scala's implementation calls 1 ms "big enough" under Linux and 10 ms under Windows).
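A minimal sketch of that "smart" sleep, with the platform-dependent cutoff described above (1 ms on Linux, 10 ms on Windows). The names and structure here are illustrative assumptions, not the actual PR implementation:

```csharp
using System;
using System.Diagnostics;
using System.Runtime.InteropServices;
using System.Threading;

internal static class SmartSleep
{
    // Per the description above: Thread.Sleep is accurate enough for
    // delays above ~1 ms on Linux and ~10 ms on Windows; below that,
    // busy-spinning is more precise. Cutoff is in Stopwatch ticks.
    private static readonly long SpinCutoffTicks =
        RuntimeInformation.IsOSPlatform(OSPlatform.Windows)
            ? Stopwatch.Frequency / 100    // 10 ms
            : Stopwatch.Frequency / 1000;  // 1 ms

    public static void Sleep(TimeSpan delay)
    {
        long deadline = Stopwatch.GetTimestamp()
                        + (long)(delay.TotalSeconds * Stopwatch.Frequency);

        // Coarse phase: while far from the deadline, let the OS run
        // other work instead of burning CPU.
        while (deadline - Stopwatch.GetTimestamp() > SpinCutoffTicks)
            Thread.Sleep(1);

        // Fine phase: busy-wait on the high-resolution clock for the
        // final stretch to land close to the deadline.
        while (Stopwatch.GetTimestamp() < deadline)
            Thread.SpinWait(10);
    }
}
```

For example, `SmartSleep.Sleep(TimeSpan.FromMilliseconds(50))` sleeps away most of the 50 ms and only spins for the last cutoff-sized slice, keeping idle CPU cost low while preserving wake-up accuracy.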