
HashedWheelTimerScheduler waiting overhead fix #4032

Closed
Wants to merge 43 commits.

Conversation

IgorFedchenko
Contributor

Close #4031

As stated in the original issue, the scheduler is waiting longer than it should.

For this PR, I have added a simple test based on one of the Akka.IO benchmarks. It performs some communication via Akka.IO.Tcp and uses counters in HashedWheelTimerScheduler to track how many ticks the scheduler should be waiting each time before performing scheduled actions, and how long it is actually waiting.

Then there are a few simple assertions:

msActual.ShouldBeLessThan(msRequired * 1.1M, "We absolutely do not want scheduler to have more than 10% sleep overhead");
msActual.ShouldBeLessThan(msRequired * 1.05M, "We do not want scheduler to have more than 5% sleep overhead");
msActual.ShouldBeLessThan(msRequired * 1.01M, "Would be really nice for scheduler to have less than 1% sleep overhead");

This lets us keep track of how much overhead we have.

The original implementation was ceiling the required ticks to the closest millisecond value and calling Thread.Sleep on it. This produced a lot of overhead when the sleep was going to be short (less than ~10-15 ms on Windows).

After some experiments, I ended up with a pretty simple solution: an empty loop with a Stopwatch.ElapsedTicks condition gives much more accurate results.
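For reference, a minimal sketch of that idea (not the exact code from this PR; the WaitUntil name and parameters are illustrative):

// Busy-wait until a deadline expressed in Stopwatch ticks has passed.
// Much more accurate than Thread.Sleep for sub-15 ms waits, but the
// scheduler thread keeps spinning on the CPU for the whole interval.
private static void WaitUntil(System.Diagnostics.Stopwatch clock, long deadlineStopwatchTicks)
{
    while (clock.ElapsedTicks < deadlineStopwatchTicks)
    {
        // intentionally empty - just re-check the elapsed Stopwatch ticks
    }
}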
This is what I was getting in the test output with the initial implementation:

We do not want scheduler to have more than 5% sleep overhead
Required wait time is 19,0089815ms, and actual is 20,5070609ms

We absolutely do not want scheduler to have more than 10% sleep overhead
Required wait time is 18,2111832ms, and actual is 20,6246174ms

And this is what I am getting with the updated one (the test passes, of course):

Required wait time is 21,7370645ms, and actual is 21,7728415ms

Required wait time is 21,1151001ms, and actual is 21,1228956ms

Required wait time is 22,147157ms, and actual is 22,1707158ms

Required wait time is 22,0152161ms, and actual is 22,0428943ms

Time loss is much smaller.

P.S. With that said, Sleep is still a good fit when we need to wait more than 10-15 ms. Not sure how often that is the case, but still, I made the Sleep function smart enough to use Thread.Sleep instead of looping when the required delay is big enough (Scala's implementation considers 1 ms "big enough" under Linux and 10 ms under Windows).
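A rough sketch of that hybrid approach (the 15 ms threshold and the names here are illustrative assumptions, not the PR's actual constants):

// Let the OS sleep away the coarse part of a long wait, then spin out the
// short remainder to hit the deadline accurately. The threshold would be
// platform-dependent in practice (cf. 1 ms on Linux vs ~10-15 ms on Windows).
private static readonly TimeSpan SleepThreshold = TimeSpan.FromMilliseconds(15);

private static void SmartSleep(System.Diagnostics.Stopwatch clock, TimeSpan deadline)
{
    var remaining = deadline - clock.Elapsed;
    if (remaining > SleepThreshold)
        System.Threading.Thread.Sleep(remaining - SleepThreshold); // cheap, but only ~15 ms accurate

    while (clock.Elapsed < deadline)
    {
        // spin the last few milliseconds for accuracy
    }
}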

@IgorFedchenko
Contributor Author

Actually, one more thing that would be nice to implement here is using Task instead of Thread. I do not see any issues with making the Run function async and using Task.Delay for long waits instead of just blocking the thread. Will do.
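As a sketch only (RunAsync, NextTickDelay and ExecuteExpiredBuckets are hypothetical names, not the scheduler's real members):

// Idea: await Task.Delay for long waits instead of blocking the thread;
// short waits could still fall back to spinning for accuracy.
private async Task RunAsync(CancellationToken token)
{
    while (!token.IsCancellationRequested)
    {
        var wait = NextTickDelay(); // hypothetical: time left until the next wheel tick
        if (wait > TimeSpan.FromMilliseconds(15))
            await Task.Delay(wait, token); // releases the thread while waiting

        ExecuteExpiredBuckets(); // hypothetical: run the tasks due at this tick
    }
}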

@Aaronontheweb
Member

Good idea - not sure why the Cluster.Sharding specs all failed the first time around. It might be that these timer changes affected how long certain things take - i.e. tasks that used to take ~n seconds now take < n.

@Aaronontheweb
Member

Good idea regarding async / await - we need to get into the habit of not blocking threads when we're waiting for work.

@Aaronontheweb
Member

@IgorFedchenko looks like we're having issues with these sharding specs again, but only on Windows - doesn't look like that issue came up at all on Linux. Would you mind looking into it?

@IgorFedchenko
Contributor Author

Would you mind looking into it?

@Aaronontheweb Sure

@Aaronontheweb
Member

Looks like the multi-node specs don't start properly or hang with these changes too.... Weird.

@IgorFedchenko
Contributor Author

@Aaronontheweb Sharding tests pass well on my local machine. And all of them are failing due to timeouts, which makes me think that they may be racy too.

Let me set a larger timeout for a few of them, just as an experiment.

@IgorFedchenko
Contributor Author

IgorFedchenko commented Nov 15, 2019

@Aaronontheweb Seems like the Sharding tests are pretty heavy... Should we group them into an xunit test collection to run them one by one?

@Aaronontheweb
Member

@IgorFedchenko yeah, we can disable parallelization there by using collections or by doing what we did here:

https://github.com/akkadotnet/akka.net/blob/dev/src/core/Akka.Remote.Tests/Akka.Remote.Tests.csproj#L3

That will use the xunit.runner.json settings to disable parallelization for the entire DLL.
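For reference, the xunit.runner.json settings involved look roughly like this (assumed contents, not copied from the repo):

{
  "parallelizeAssembly": false,
  "parallelizeTestCollections": false
}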

@IgorFedchenko
Contributor Author

That will use the xunit.runner.json settings to disable parallelization for the entire DLL.

Awesome

@@ -1,5 +1,12 @@
<Project Sdk="Microsoft.NET.Sdk">
<Import Project="..\..\..\common.props" />

<!-- use the xunit JSON settings file to disable parallel tests running-->
Contributor Author

I could not just reference <Import Project="..\..\xunitSettings.props" /> here, because that props file references the ..\..\xunit.runner.json relative path, which is not valid from this location (it needs one more ..\). Possibly there is a cleaner way to do that.

Member

Ah... I'm sure there is a way using some MSBuild variables, but good point. May as well just use collection fixtures then.

@IgorFedchenko
Contributor Author

It helped pretty well - most of the failing tests passed. But still, a few tests are failing due to timings, and they pass locally. I will try to increase the timeouts...

@IgorFedchenko
Contributor Author

@Aaronontheweb All right, it seems the best result I can get is occasionally having all tests pass a single time before some of them hit timeouts again. Some tests are racy, and there is nothing I can do about this (quickly). One time they pass, and the next time they do not.

I will roll back the timeout changes, since they are not very helpful.

This reverts commit 004c09b.

Revert "Increased other timeouts"

This reverts commit 72331b7.

Revert "Increased timeouts for failing tests"

This reverts commit 659e1e5.
@Aaronontheweb
Member

Why does the MNTR not run for this PR?

@IgorFedchenko
Contributor Author

@Zetanova So, you think there is almost nothing we can do about this sleeping overhead? We have added a custom scheduler that uses a single thread but allows using Task.Delay instead of Thread.Sleep to save some resources (and maybe this will also reduce sleep time), but that is all we can do here? Maybe you are right, in the sense that there will always be a trade-off between having the scheduler sleep less and CPU load.

The existing scheduling algorithm that was already implemented before my PR looks good to me - and it is organized the same way as in Scala, so even if some optimizations are possible, I do not think it is any kind of bottleneck.

After some local testing of the latest version, I have also noticed much higher CPU load. So I replaced Thread.SpinWait with Task.Delay under a new custom TaskScheduler that executes it on a single thread. If this helps with the MNTR failures, I will run the Akka.IO benchmark to see whether async/await usage on a single thread gives any improvements. But I am not sure about that, since we are releasing a thread that is not used by anyone else.
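For the record, a minimal sketch of such a single-threaded TaskScheduler (not the actual class from this PR):

using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Runs every queued task on one dedicated background thread, so a timer loop
// started on this scheduler resumes on the same thread after awaiting Task.Delay.
sealed class SingleThreadTaskScheduler : TaskScheduler
{
    private readonly BlockingCollection<Task> _tasks = new BlockingCollection<Task>();
    private readonly Thread _thread;

    public SingleThreadTaskScheduler()
    {
        _thread = new Thread(() =>
        {
            foreach (var task in _tasks.GetConsumingEnumerable())
                TryExecuteTask(task);
        }) { IsBackground = true };
        _thread.Start();
    }

    public override int MaximumConcurrencyLevel => 1;

    protected override void QueueTask(Task task) => _tasks.Add(task);

    protected override bool TryExecuteTaskInline(Task task, bool taskWasPreviouslyQueued)
        => false; // never inline - everything goes through the dedicated thread

    protected override IEnumerable<Task> GetScheduledTasks() => _tasks.ToArray();
}

The timer's run loop would then be started on this scheduler (e.g. via Task.Factory.StartNew with the scheduler argument, unwrapping the inner task) so that await continuations stay on that thread.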

@IgorFedchenko
Contributor Author

IgorFedchenko commented Nov 26, 2019

So now it is working mostly as before, and I do not expect much throughput improvement... But the feeling that something is not right is still there :)
We want to wait 10 ms or less between each execution of the bucket registrations, yet each time we wait for at least ~15 ms. Every time. My sleep overhead test now shows 28 seconds of sleep instead of the requested 15 seconds (I have commented out this check for now, but may bring it back).
Yes, we cannot afford to spend so many CPU cycles on Thread.SpinWait, but I'll try some more ideas to change this...
If one thread must either sleep for 15+ ms or not sleep at all, it might be worth starting two threads with some initial time shift so that both try to handle bucket scheduling, i.e. the one that wakes up closer to the wheel tick executes the bucket tasks and sleeps again. Sometimes this may give us more accurate pauses between wheel ticks, at the cost of two threads.
Also, we could make the number of threads configurable, because obviously the more threads are trying to catch the tick moment, the more accurately it will be scheduled.
A kind of crazy idea, but it might work? I.e. two threads with two loops, each waiting for a tick; once awoken, the first of them marks the tick as handled and executes the bucket tasks, and the second immediately starts sleeping for the next tick (and therefore has a better chance of waking up earlier next time).
Is it worth trying? Or am I missing something?

We could also make use of the custom scheduler, but let it allocate 2 (or another fixed number of) threads for that.

@Zetanova
Contributor

Zetanova commented Nov 26, 2019

You are running into the usual wrong assumptions:

  1. "1 ms is a very short time."
    => It is not for the CPU; SpinWaits are meant for a few thousand iterations (less than 10k).
    How long is that on a 3.4 GHz CPU?

  2. "A process can tell the CPU to wait for a time."
    => It cannot; it can either do work or tell the OS that it does not need the thread anymore.

If I have some time, I will try my luck with it next week.

@Aaronontheweb
Member

@IgorFedchenko @Zetanova so here's what I'd propose:

  • Let's start with ditching the TaskScheduler and stick with a dedicated thread in order to keep things simple
  • Let's focus on improving the CPU utilization of that thread, when at all possible.

For instance, Thread.Sleep may not be that big of a performance hit in the grand scheme of things:

This function causes a thread to relinquish the remainder of its time slice and become unrunnable for an interval based on the value of dwMilliseconds. The system clock "ticks" at a constant rate. If dwMilliseconds is less than the resolution of the system clock, the thread may sleep for less than the specified length of time. If dwMilliseconds is greater than one tick but less than two, the wait can be anywhere between one and two ticks, and so on. To increase the accuracy of the sleep interval, call the timeGetDevCaps function to determine the supported minimum timer resolution and the timeBeginPeriod function to set the timer resolution to its minimum.

We can't worry about being super-precise with the Scheduler - the underlying operating system isn't. We need something that is good enough, and that's what the hashed wheel timer mechanism does - it predicts the number of revolutions around the wheel that are needed before the task is due, based on a constant "tick" rate. What we can probably do, however, is make the way we wait for those ticks to elapse more efficient - I'm open to suggestions on that, but I'd stick with running things inside a dedicated thread and using threading primitives like SpinWait and Sleep.

SpinWait might be advantageous in scenarios where we can predict that the next bucket will be coming up very soon - i.e. when the scheduler is using either a very short interval for a recurring task or there's a large number of scheduled tasks queued. In scenarios where the workload is more infrequent, sleeping is probably ok. I'd stick to working within the structure we have and focus on making it so we waste less of the CPU's time when the scheduler is idle.

@IgorFedchenko
Contributor Author

@Zetanova

You are running into the usual wrong assumptions

Well, not exactly... Of course 1 ms is a huge amount of time for almost any modern CPU, and there is no way to ask the CPU to wait (only to ask the OS to release resources for a while).

The initial idea was to do some useless work on the current thread in order to get more accurate pauses between wheel ticks, and see whether this would improve overall performance, since the scheduler is a critical place and giving it "some" more CPU might be worth it.
But as the tests show, the cost is too high when we are running multiple ActorSystems in parallel. Several MNTR tests (each starting several child processes with their own schedulers) load the CPU too much, so we have to drop the initial idea.

@Aaronontheweb So, indeed, it does not seem like async/await will give us much improvement when using a TaskScheduler over a single thread - maybe Thread.Sleep will do the same job for us.

I will revert the changes and add a small optimization to use Thread.SpinWait when there are only a few hundred ticks left to wait (something close to what @Zetanova said about SpinWait usage).

Member

@Aaronontheweb left a comment

Left some comments - but do you have any additional benchmark data for how these changes may have impacted CPU?

}

var stopWatch = Stopwatch.StartNew();
Member

Instead of starting a new Stopwatch here, if you use HighRestMonotonicClock.Ticks that will give you the current Stopwatch.Elapsed value since Akka.NET started and you can acquire a second comparison value where you have Stopwatch.Sleep instead - just to save on allocations.

Contributor Author

Ah, actually I had meant to move this stopwatch to a private field instantiated once, but missed this. And sure, your suggestion is even better.

Contributor Author

Used the same thing for sleep-time tracking when doing Thread.SpinWait, instead of using a separate Stopwatch instance.

/// API for internal usage
/// </summary>
[InternalApi]
public static readonly AtomicCounter TotalTicksRequiredToWaitStrict = new AtomicCounter(0);
Member

What are these being used for exactly? Testing?

Contributor Author

Indeed, some more comments are required here... Yes, for testing - I will update the summary now.

@Aaronontheweb
Member

@IgorFedchenko looks like we need an API approval here.

Member

@Aaronontheweb left a comment

LGTM - need to see updated benchmark figures though @IgorFedchenko

@IgorFedchenko
Contributor Author

IgorFedchenko commented Dec 2, 2019

@Aaronontheweb Benchmark results (compare to this one) - I was just waiting for it to finish:

BenchmarkDotNet=v0.12.0, OS=Windows 10.0.18362
Intel Core i7-4770 CPU 3.40GHz (Haswell), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.0.100
  [Host]     : .NET Core 2.1.11 (CoreCLR 4.6.27617.04, CoreFX 4.6.27617.02), X64 RyuJIT
  Job-ESBJQR : .NET Core 2.1.11 (CoreCLR 4.6.27617.04, CoreFX 4.6.27617.02), X64 RyuJIT

InvocationCount=1  IterationCount=100  LaunchCount=1  
RunStrategy=Monitoring  UnrollFactor=1  WarmupCount=1  

|                    Method | MessageCount | MessageLength | ClientsCount |        Mean |     Error |     StdDev |      Median |     Gen 0 |     Gen 1 | Gen 2 | Allocated |
|-------------------------- |------------- |-------------- |------------- |------------:|----------:|-----------:|------------:|----------:|----------:|------:|----------:|
| ClientServerCommunication |          100 |            10 |            1 |    57.73 ms |  2.044 ms |   6.026 ms |    57.76 ms |         - |         - |     - |   1.58 KB |
| ClientServerCommunication |          100 |            10 |            3 |    29.01 ms |  1.106 ms |   3.262 ms |    28.57 ms |         - |         - |     - |   1.58 KB |
| ClientServerCommunication |          100 |            10 |            5 |    23.37 ms |  0.742 ms |   2.188 ms |    22.69 ms |         - |         - |     - |   1.58 KB |
| ClientServerCommunication |          100 |            10 |            7 |    26.15 ms |  1.859 ms |   5.480 ms |    24.66 ms |         - |         - |     - |   1.58 KB |
| ClientServerCommunication |          100 |            10 |           10 |    24.30 ms |  0.561 ms |   1.653 ms |    23.67 ms |         - |         - |     - |   1.58 KB |
| ClientServerCommunication |          100 |            10 |           20 |    34.56 ms |  0.319 ms |   0.940 ms |    34.32 ms |         - |         - |     - |   1.58 KB |
| ClientServerCommunication |          100 |            10 |           30 |    48.56 ms |  0.599 ms |   1.767 ms |    48.45 ms |         - |         - |     - |   1.58 KB |
| ClientServerCommunication |          100 |            10 |           40 |    60.99 ms |  0.968 ms |   2.854 ms |    60.02 ms | 1000.0000 |         - |     - |   1.58 KB |
| ClientServerCommunication |          100 |           100 |            1 |    43.49 ms |  0.578 ms |   1.706 ms |    43.08 ms |         - |         - |     - |   1.58 KB |
| ClientServerCommunication |          100 |           100 |            3 |    22.63 ms |  0.531 ms |   1.564 ms |    22.14 ms |         - |         - |     - |   1.58 KB |
| ClientServerCommunication |          100 |           100 |            5 |    21.61 ms |  0.462 ms |   1.363 ms |    21.17 ms |         - |         - |     - |   1.58 KB |
| ClientServerCommunication |          100 |           100 |            7 |    22.17 ms |  0.535 ms |   1.577 ms |    21.54 ms |         - |         - |     - |   1.58 KB |
| ClientServerCommunication |          100 |           100 |           10 |    23.79 ms |  0.394 ms |   1.161 ms |    23.54 ms |         - |         - |     - |   1.58 KB |
| ClientServerCommunication |          100 |           100 |           20 |    35.11 ms |  0.446 ms |   1.315 ms |    34.92 ms |         - |         - |     - |   1.58 KB |
| ClientServerCommunication |          100 |           100 |           30 |    47.21 ms |  0.674 ms |   1.989 ms |    46.43 ms |         - |         - |     - |   1.58 KB |
| ClientServerCommunication |          100 |           100 |           40 |    61.06 ms |  0.859 ms |   2.531 ms |    60.52 ms | 1000.0000 |         - |     - |   1.58 KB |
| ClientServerCommunication |         1000 |            10 |            1 |   403.88 ms |  5.701 ms |  16.808 ms |   399.22 ms |         - |         - |     - |   1.58 KB |
| ClientServerCommunication |         1000 |            10 |            3 |   179.15 ms |  5.153 ms |  15.195 ms |   173.13 ms |         - |         - |     - |   1.58 KB |
| ClientServerCommunication |         1000 |            10 |            5 |   146.48 ms |  3.080 ms |   9.081 ms |   141.66 ms |         - |         - |     - |   1.58 KB |
| ClientServerCommunication |         1000 |            10 |            7 |   130.54 ms |  1.972 ms |   5.813 ms |   128.32 ms |         - |         - |     - |   1.58 KB |
| ClientServerCommunication |         1000 |            10 |           10 |   127.79 ms |  2.292 ms |   6.759 ms |   125.80 ms | 1000.0000 |         - |     - |   1.58 KB |
| ClientServerCommunication |         1000 |            10 |           20 |   136.60 ms |  3.864 ms |  11.392 ms |   134.19 ms |         - |         - |     - |   1.58 KB |
| ClientServerCommunication |         1000 |            10 |           30 |   136.68 ms |  2.551 ms |   7.521 ms |   134.19 ms | 1000.0000 |         - |     - |   1.58 KB |
| ClientServerCommunication |         1000 |            10 |           40 |   144.96 ms |  2.349 ms |   6.925 ms |   143.74 ms | 1000.0000 |         - |     - |   1.58 KB |
| ClientServerCommunication |         1000 |           100 |            1 |   402.41 ms |  3.585 ms |  10.570 ms |   399.46 ms |         - |         - |     - |   1.58 KB |
| ClientServerCommunication |         1000 |           100 |            3 |   174.00 ms |  4.007 ms |  11.813 ms |   167.71 ms |         - |         - |     - |   1.58 KB |
| ClientServerCommunication |         1000 |           100 |            5 |   142.37 ms |  2.376 ms |   7.006 ms |   139.87 ms |         - |         - |     - |   1.58 KB |
| ClientServerCommunication |         1000 |           100 |            7 |   139.31 ms |  3.900 ms |  11.498 ms |   137.67 ms |         - |         - |     - |   1.58 KB |
| ClientServerCommunication |         1000 |           100 |           10 |   129.31 ms |  2.730 ms |   8.050 ms |   126.67 ms | 1000.0000 |         - |     - |   1.58 KB |
| ClientServerCommunication |         1000 |           100 |           20 |   129.00 ms |  1.351 ms |   3.983 ms |   127.82 ms | 1000.0000 |         - |     - |   1.58 KB |
| ClientServerCommunication |         1000 |           100 |           30 |   136.91 ms |  2.293 ms |   6.760 ms |   134.40 ms | 1000.0000 |         - |     - |   1.58 KB |
| ClientServerCommunication |         1000 |           100 |           40 |   150.63 ms |  3.591 ms |  10.588 ms |   147.31 ms | 1000.0000 |         - |     - |   1.58 KB |
| ClientServerCommunication |        10000 |            10 |            1 | 3,943.60 ms | 12.900 ms |  38.036 ms | 3,934.15 ms | 7000.0000 |         - |     - |   1.58 KB |
| ClientServerCommunication |        10000 |            10 |            3 | 1,648.77 ms | 20.125 ms |  59.340 ms | 1,627.00 ms | 7000.0000 |         - |     - |   1.58 KB |
| ClientServerCommunication |        10000 |            10 |            5 | 1,355.34 ms | 17.621 ms |  51.957 ms | 1,332.90 ms | 7000.0000 |         - |     - |   1.58 KB |
| ClientServerCommunication |        10000 |            10 |            7 | 1,244.47 ms | 20.884 ms |  61.578 ms | 1,218.32 ms | 7000.0000 |         - |     - |   1.58 KB |
| ClientServerCommunication |        10000 |            10 |           10 | 1,182.45 ms | 21.427 ms |  63.179 ms | 1,154.13 ms | 7000.0000 |         - |     - |   1.58 KB |
| ClientServerCommunication |        10000 |            10 |           20 | 1,069.44 ms | 12.701 ms |  37.450 ms | 1,062.56 ms | 7000.0000 | 1000.0000 |     - |   1.58 KB |
| ClientServerCommunication |        10000 |            10 |           30 | 1,099.14 ms | 13.493 ms |  39.783 ms | 1,085.21 ms | 8000.0000 | 1000.0000 |     - |   1.58 KB |
| ClientServerCommunication |        10000 |            10 |           40 | 1,091.91 ms | 15.335 ms |  45.217 ms | 1,078.84 ms | 8000.0000 | 1000.0000 |     - |   1.58 KB |
| ClientServerCommunication |        10000 |           100 |            1 | 3,945.08 ms | 14.754 ms |  43.501 ms | 3,938.04 ms | 8000.0000 |         - |     - |   1.58 KB |
| ClientServerCommunication |        10000 |           100 |            3 | 1,658.57 ms | 23.353 ms |  68.858 ms | 1,625.38 ms | 8000.0000 |         - |     - |   1.58 KB |
| ClientServerCommunication |        10000 |           100 |            5 | 1,348.68 ms | 14.412 ms |  42.493 ms | 1,331.29 ms | 8000.0000 |         - |     - |   1.58 KB |
| ClientServerCommunication |        10000 |           100 |            7 | 1,238.80 ms | 21.984 ms |  64.820 ms | 1,212.24 ms | 8000.0000 |         - |     - |   1.58 KB |
| ClientServerCommunication |        10000 |           100 |           10 | 1,289.90 ms | 40.027 ms | 118.019 ms | 1,270.24 ms | 8000.0000 |         - |     - |   1.58 KB |
| ClientServerCommunication |        10000 |           100 |           20 | 1,179.72 ms | 29.509 ms |  87.009 ms | 1,164.57 ms | 8000.0000 | 1000.0000 |     - |   1.58 KB |
| ClientServerCommunication |        10000 |           100 |           30 | 1,155.36 ms | 27.313 ms |  80.533 ms | 1,136.48 ms | 9000.0000 | 1000.0000 |     - |   1.58 KB |
| ClientServerCommunication |        10000 |           100 |           40 | 1,236.05 ms | 32.779 ms |  96.649 ms | 1,230.72 ms | 9000.0000 | 1000.0000 |     - |   1.58 KB |

Looks almost the same - sometimes a little bit faster, sometimes slower.
We may need to play with the TicksToUseThreadSpinWait value in the scheduler - right now it is set to 100 ticks, which means that when we have 101 ticks to wait we will sleep for at least 10,000 * 10 ticks (10 ms). Or even better - get the list of sleep times during some simple benchmark and try to set this ticks threshold to a less "random" value.

@IgorFedchenko
Contributor Author

So, about sleep times.

I ran one sample from the benchmark, which sends 10,000 messages (10 bytes each) in a request-reply manner from 3 concurrent clients (that is ~3,333 messages per client) to a single server via Akka.IO.Tcp (this is what my benchmark from the previous comment is doing), and did this for 100 iterations (passing 10k messages through these 3 clients 100 times). Each client sends the next message once the server has replied to the previous message sent by that client.

There were 19,284 sleeps. 18,950 of them were in the [80,000, 90,000] ticks range (10k ticks is 1 ms, as you probably know). 157 sleeps were below 80K, and overall it looks similar to a normal distribution.

Well, here is the file (initially created for myself, without formatting, but I will share it with you guys). Sleep numbers are in ascending order.
waitings.txt

So it is obvious that my 100-tick threshold has nothing to do with this. And it actually does not seem that we can successfully use Thread.SpinWait here, because the delays are for the most part near the wheel tick value of 100K ticks, and none of them are below 20K - which is a pretty big value for SpinWait, if I understand right.

So here are two questions:

  1. Is this too low a load on the scheduler to draw any conclusions? I mean, how can I (and is it even possible to) make the sleep times closer to 1K ticks or less? If so, I could do a sample run with such a scenario and see what threshold might be used for Thread.SpinWait in that case.

  2. If performing all scheduled actions within 1-2 ms is fine and typical for the scheduler, then we just cannot optimize it with Thread.SpinWait and have to wait for 15+ ms each time instead of the 8-9 ms requested.

@Aaronontheweb
Member

@IgorFedchenko so, it sounds like this PR won't add any value to the existing scheduler performance then?

@IgorFedchenko
Contributor Author

@Aaronontheweb If in most use cases the scheduler does not have much work to do (as in my benchmark) - then yes, there is nothing to optimize. Well, I wish we could make the scheduler sleep for a more accurate interval, but it looks like the only way to do this (and keep things simple enough) is to use Thread.SpinWait, which requires too much CPU for the timings we have.

@IgorFedchenko
Contributor Author

@Aaronontheweb Should we close it?

@Zetanova
Contributor

I looked at the current implementation and it seems to be well done.
My test was always on point with the tick schedule, less than one ms off (sometimes).

The only improvements I can see are:

  1. Maybe use a readonly value type for the arrays in the buckets. This will speed up iteration over them, but the effect will be minimal and only noticeable for very large entry counts (>5k).
  2. Maybe sleep longer when it is known that the next ticks will be empty.


Successfully merging this pull request may close these issues.

HashedWheelTimerScheduler is spending 35% of execution time in sleep
3 participants