
HashedWheelTimerScheduler waiting overhead fix #4032

Closed
Wants to merge 43 commits.

Conversation

IgorFedchenko
Contributor

Close #4031

As stated in the original issue, the scheduler is waiting longer than it should.

For this PR, I have added a simple test based on one of the Akka.IO benchmarks. It performs some communication via Akka.IO.Tcp and uses counters in HashedWheelTimerScheduler to track how many ticks the scheduler should be waiting each time before performing scheduled actions, and how long it is actually waiting.

Then there are a few simple assertions:

msActual.ShouldBeLessThan(msRequired * 1.1M, "We absolutely do not want scheduler to have more than 10% sleep overhead");
msActual.ShouldBeLessThan(msRequired * 1.05M, "We do not want scheduler to have more than 5% sleep overhead");
msActual.ShouldBeLessThan(msRequired * 1.01M, "Would be really nice for scheduler to have less than 1% sleep overhead");

This lets us keep track of how much overhead we have.

The original implementation was ceiling the required ticks to the closest millisecond value and calling Thread.Sleep on it. This produced a lot of overhead when the sleep was going to be short (less than ~10-15 ms on Windows).

After some experiments, I ended up with a pretty simple solution: an empty loop with a Stopwatch.ElapsedTicks condition gives much more accurate results.
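For reference, a minimal sketch of that idea (not the exact code from this PR; the WaitUntil name and parameters are illustrative):

// Busy-wait until a deadline expressed in Stopwatch ticks has passed.
// Much more accurate than Thread.Sleep for sub-15 ms waits, but the
// scheduler thread keeps spinning on the CPU for the whole interval.
private static void WaitUntil(System.Diagnostics.Stopwatch clock, long deadlineStopwatchTicks)
{
    while (clock.ElapsedTicks < deadlineStopwatchTicks)
    {
        // intentionally empty - just re-check the elapsed Stopwatch ticks
    }
}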
This is what I was getting in the test output with the initial implementation:

We do not want scheduler to have more than 5% sleep overhead
Required wait time is 19,0089815ms, and actual is 20,5070609ms

We absolutely do not want scheduler to have more than 10% sleep overhead
Required wait time is 18,2111832ms, and actual is 20,6246174ms

And this is what I am getting with the updated one (the test passes, of course):

Required wait time is 21,7370645ms, and actual is 21,7728415ms

Required wait time is 21,1151001ms, and actual is 21,1228956ms

Required wait time is 22,147157ms, and actual is 22,1707158ms

Required wait time is 22,0152161ms, and actual is 22,0428943ms

Time loss is much smaller.

P.S. With that said, Sleep is still a good fit when we need to wait more than 10-15 ms. Not sure how often that is the case, but still, I made the Sleep function smart enough to use Thread.Sleep instead of looping when the required delay is big enough (Scala's implementation considers 1 ms "big enough" under Linux and 10 ms under Windows).
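A rough sketch of that hybrid approach (the 15 ms threshold and the names here are illustrative assumptions, not the PR's actual constants):

// Let the OS sleep away the coarse part of a long wait, then spin out the
// short remainder to hit the deadline accurately. The threshold would be
// platform-dependent in practice (cf. 1 ms on Linux vs ~10-15 ms on Windows).
private static readonly TimeSpan SleepThreshold = TimeSpan.FromMilliseconds(15);

private static void SmartSleep(System.Diagnostics.Stopwatch clock, TimeSpan deadline)
{
    var remaining = deadline - clock.Elapsed;
    if (remaining > SleepThreshold)
        System.Threading.Thread.Sleep(remaining - SleepThreshold); // cheap, but only ~15 ms accurate

    while (clock.Elapsed < deadline)
    {
        // spin the last few milliseconds for accuracy
    }
}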

@IgorFedchenko
Contributor Author

Actually, one more thing that would be nice to implement here is using Task instead of Thread. I do not see any issues with making the Run function async and using Task.Delay for long waits instead of just blocking the thread. Will do.
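As a sketch only (RunAsync, NextTickDelay and ExecuteExpiredBuckets are hypothetical names, not the scheduler's real members):

// Idea: await Task.Delay for long waits instead of blocking the thread;
// short waits could still fall back to spinning for accuracy.
private async Task RunAsync(CancellationToken token)
{
    while (!token.IsCancellationRequested)
    {
        var wait = NextTickDelay(); // hypothetical: time left until the next wheel tick
        if (wait > TimeSpan.FromMilliseconds(15))
            await Task.Delay(wait, token); // releases the thread while waiting

        ExecuteExpiredBuckets(); // hypothetical: run the tasks due at this tick
    }
}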

@Aaronontheweb
Member

Good idea - not sure why the Cluster.Sharding specs all failed the first time around. It might be that these timer changes affected how long certain things take - i.e. tasks that used to take ~n seconds now take < n.

@Aaronontheweb
Member

Good idea regarding async / await - we need to get into the habit of not blocking threads when we're waiting for work.

@Aaronontheweb
Member

@IgorFedchenko looks like we're having issues with these sharding specs again, but only on Windows - doesn't look like that issue came up at all on Linux. Would you mind looking into it?

@IgorFedchenko
Contributor Author

Would you mind looking into it?

@Aaronontheweb Sure

@Aaronontheweb
Member

Looks like the multi-node specs don't start properly or hang with these changes too.... Weird.

@IgorFedchenko
Contributor Author

@Aaronontheweb Sharding tests pass well on my local machine. And all of them are failing due to timeouts, which makes me think that they may be racy too.

Let me set a larger timeout for a few of them, just as an experiment.

@IgorFedchenko
Contributor Author

IgorFedchenko commented Nov 15, 2019

@Aaronontheweb Seems like the Sharding tests are pretty heavy... Should we group them into an xunit test collection to run them one by one?

@Aaronontheweb
Member

@IgorFedchenko yeah, we can disable parallelization there by using collections or by doing what we did here:

https://github.com/akkadotnet/akka.net/blob/dev/src/core/Akka.Remote.Tests/Akka.Remote.Tests.csproj#L3

That will use the xunit.runner.json settings to disable parallelization for the entire DLL.
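For reference, the xunit.runner.json settings involved look roughly like this (assumed contents, not copied from the repo):

{
  "parallelizeAssembly": false,
  "parallelizeTestCollections": false
}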

@IgorFedchenko
Contributor Author

That will use the xunit.runner.json settings to disable parallelization for the entire DLL.

Awesome

@@ -1,5 +1,12 @@
<Project Sdk="Microsoft.NET.Sdk">
<Import Project="..\..\..\common.props" />

<!-- use the xunit JSON settings file to disable parallel tests running-->
Contributor Author

I could not just reference <Import Project="..\..\xunitSettings.props" /> here, because that props file references the ..\..\xunit.runner.json relative path, which is not valid from this location (it needs one more ..\). Possibly there is a cleaner way to do that.

Member

Ah... I'm sure there is a way using some MSBuild variables, but good point. May as well just use collection fixtures then.

@IgorFedchenko
Contributor Author

It helped pretty well - most of the failing tests passed. But still, a few tests are failing due to timings, and they pass locally. I will try to increase the timeouts...

@IgorFedchenko
Contributor Author

@Aaronontheweb All right, it seems the best result I can get is occasionally having all tests pass a single time before some of them hit timeouts again. Some tests are racy, and there is nothing I can do about this (quickly). One time they pass, and the next time they do not.

I will roll back the timeout changes, since they are not very helpful.

This reverts commit 004c09b.

Revert "Increased other timeouts"

This reverts commit 72331b7.

Revert "Increased timeouts for failing tests"

This reverts commit 659e1e5.
@Aaronontheweb
Member

Why does the MNTR not run for this PR?

@IgorFedchenko
Contributor Author

@Zetanova So, you think there is almost nothing we can do about this sleeping overhead? We have added a custom scheduler that uses a single thread but allows using Task.Delay instead of Thread.Sleep to save some resources (and maybe this will also reduce sleep time), but that is all we can do here? Maybe you are right, in the sense that there will always be a trade-off between having the scheduler sleep less and CPU load.

The existing scheduling algorithm that was already implemented before my PR looks good to me - and it is organized the same way as in Scala, so even if some optimizations are possible, I do not think it is any kind of bottleneck.

After some local testing of the latest version, I have also noticed much higher CPU load. So I replaced Thread.SpinWait with Task.Delay under a new custom TaskScheduler that executes it on a single thread. If this helps with the MNTR failures, I will run the Akka.IO benchmark to see whether async/await usage on a single thread gives any improvements. But I am not sure about that, since we are releasing a thread that is not used by anyone else.
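For the record, a minimal sketch of such a single-threaded TaskScheduler (not the actual class from this PR):

using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Runs every queued task on one dedicated background thread, so a timer loop
// started on this scheduler resumes on the same thread after awaiting Task.Delay.
sealed class SingleThreadTaskScheduler : TaskScheduler
{
    private readonly BlockingCollection<Task> _tasks = new BlockingCollection<Task>();
    private readonly Thread _thread;

    public SingleThreadTaskScheduler()
    {
        _thread = new Thread(() =>
        {
            foreach (var task in _tasks.GetConsumingEnumerable())
                TryExecuteTask(task);
        }) { IsBackground = true };
        _thread.Start();
    }

    public override int MaximumConcurrencyLevel => 1;

    protected override void QueueTask(Task task) => _tasks.Add(task);

    protected override bool TryExecuteTaskInline(Task task, bool taskWasPreviouslyQueued)
        => false; // never inline - everything goes through the dedicated thread

    protected override IEnumerable<Task> GetScheduledTasks() => _tasks.ToArray();
}

The timer's run loop would then be started on this scheduler (e.g. via Task.Factory.StartNew with the scheduler argument, unwrapping the inner task) so that await continuations stay on that thread.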

@IgorFedchenko
Contributor Author

IgorFedchenko commented Nov 26, 2019

So now it is working mostly as before, and I do not expect much throughput improvement... But the feeling that something is not right is still there :)
We want to wait 10 ms or less between each execution of the bucket registrations, yet each time we wait for at least ~15 ms. Every time. My sleep overhead test now shows 28 seconds of sleep instead of the requested 15 seconds (I have commented out this check for now, but may bring it back).
Yes, we cannot afford to spend so many CPU cycles on Thread.SpinWait, but I'll try some more ideas to change this...
If one thread must either sleep for 15+ ms or not sleep at all, it might be worth starting two threads with some initial time shift so that both try to handle bucket scheduling, i.e. the one that wakes up closer to the wheel tick executes the bucket tasks and sleeps again. Sometimes this may give us more accurate pauses between wheel ticks, at the cost of two threads.
Also, we could make the number of threads configurable, because obviously the more threads are trying to catch the tick moment, the more accurately it will be scheduled.
A kind of crazy idea, but it might work? I.e. two threads with two loops, each waiting for a tick; once awoken, the first of them marks the tick as handled and executes the bucket tasks, and the second immediately starts sleeping for the next tick (and therefore has a better chance of waking up earlier next time).
Is it worth trying? Or am I missing something?

We could also make use of the custom scheduler, but let it allocate 2 (or another fixed number of) threads for that.

@Zetanova
Contributor

Zetanova commented Nov 26, 2019

You are running into the usual wrong assumptions:

  1. "1 ms is a very short time."
    => It is not for the CPU; SpinWaits are meant for a few thousand iterations (less than 10k).
    How long is that on a 3.4 GHz CPU?

  2. "A process can tell the CPU to wait for a time."
    => It cannot; it can either do work or tell the OS that it does not need the thread anymore.

If I have some time, I will try my luck with it next week.

@Aaronontheweb
Member

@IgorFedchenko @Zetanova so here's what I'd propose:

  • Let's start with ditching the TaskScheduler and stick with a dedicated thread in order to keep things simple
  • Let's focus on improving the CPU utilization of that thread, when at all possible.

For instance, Thread.Sleep may not be that big of a performance hit in the grand scheme of things:

This function causes a thread to relinquish the remainder of its time slice and become unrunnable for an interval based on the value of dwMilliseconds. The system clock "ticks" at a constant rate. If dwMilliseconds is less than the resolution of the system clock, the thread may sleep for less than the specified length of time. If dwMilliseconds is greater than one tick but less than two, the wait can be anywhere between one and two ticks, and so on. To increase the accuracy of the sleep interval, call the timeGetDevCaps function to determine the supported minimum timer resolution and the timeBeginPeriod function to set the timer resolution to its minimum.

We can't worry about being super-precise with the Scheduler - the underlying operating system isn't. We need something that is good enough, and that's what the hashed wheel timer mechanism does - it predicts the number of revolutions around the wheel that are needed before the task is due, based on a constant "tick" rate. What we can probably do, however, is make the way we wait for those ticks to elapse more efficient - I'm open to suggestions on that, but I'd stick with running things inside a dedicated thread and using threading primitives like SpinWait and Sleep.

SpinWait might be advantageous in scenarios where we can predict that the next bucket will be coming up very soon - i.e. when the scheduler is using either a very short interval for a recurring task or there's a large number of scheduled tasks queued. In scenarios where the workload is more infrequent, sleeping is probably ok. I'd stick to working within the structure we have and focus on making it so we waste less of the CPU's time when the scheduler is idle.

@IgorFedchenko
Contributor Author

@Zetanova

You are running into the usual wrong assumptions

Well, not exactly... Of course 1 ms is a huge amount of time for almost any modern CPU, and there is no way to ask the CPU to wait (only to ask the OS to release resources for a while).

The initial idea was to do some useless work on the current thread in order to get more accurate pauses between wheel ticks, and see whether this would improve overall performance, since the scheduler is a critical place and giving it "some" more CPU might be worth it.
But as the tests show, the cost is too high when we are running multiple ActorSystems in parallel. Several MNTR tests (each starting several child processes with their own schedulers) load the CPU too much, so we have to drop the initial idea.

@Aaronontheweb So, indeed, it does not seem like async/await will give us much improvement when using a TaskScheduler over a single thread - maybe Thread.Sleep will do the same job for us.

I will revert the changes and add a small optimization to use Thread.SpinWait when there are only a few hundred ticks left to wait (something close to what @Zetanova said about SpinWait usage).

Member

@Aaronontheweb left a comment

Left some comments - but do you have any additional benchmark data for how these changes may have impacted CPU?

}

var stopWatch = Stopwatch.StartNew();
Member

Instead of starting a new Stopwatch here, if you use HighRestMonotonicClock.Ticks that will give you the current Stopwatch.Elapsed value since Akka.NET started and you can acquire a second comparison value where you have Stopwatch.Sleep instead - just to save on allocations.

Contributor Author

Ah, actually I had meant to move this stopwatch to a private field instantiated once, but missed this. And sure, your suggestion is even better.

Contributor Author

Used the same thing for sleep-time tracking when doing Thread.SpinWait, instead of using a separate Stopwatch instance.

/// API for internal usage
/// </summary>
[InternalApi]
public static readonly AtomicCounter TotalTicksRequiredToWaitStrict = new AtomicCounter(0);
Member

What are these being used for exactly? Testing?

Contributor Author

Indeed, some more comments are required here... Yes, for testing - I will update the summary now.

@Aaronontheweb
Member

@IgorFedchenko looks like we need an API approval here.

Member

@Aaronontheweb left a comment

LGTM - need to see updated benchmark figures though @IgorFedchenko

@IgorFedchenko
Contributor Author

IgorFedchenko commented Dec 2, 2019

@Aaronontheweb Benchmark results (compare to this one) - I was just waiting for it to finish:

BenchmarkDotNet=v0.12.0, OS=Windows 10.0.18362
Intel Core i7-4770 CPU 3.40GHz (Haswell), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.0.100
  [Host]     : .NET Core 2.1.11 (CoreCLR 4.6.27617.04, CoreFX 4.6.27617.02), X64 RyuJIT
  Job-ESBJQR : .NET Core 2.1.11 (CoreCLR 4.6.27617.04, CoreFX 4.6.27617.02), X64 RyuJIT

InvocationCount=1  IterationCount=100  LaunchCount=1  
RunStrategy=Monitoring  UnrollFactor=1  WarmupCount=1  

|                    Method | MessageCount | MessageLength | ClientsCount |        Mean |     Error |     StdDev |      Median |     Gen 0 |     Gen 1 | Gen 2 | Allocated |
|-------------------------- |------------- |-------------- |------------- |------------:|----------:|-----------:|------------:|----------:|----------:|------:|----------:|
| ClientServerCommunication |          100 |            10 |            1 |    57.73 ms |  2.044 ms |   6.026 ms |    57.76 ms |         - |         - |     - |   1.58 KB |
| ClientServerCommunication |          100 |            10 |            3 |    29.01 ms |  1.106 ms |   3.262 ms |    28.57 ms |         - |         - |     - |   1.58 KB |
| ClientServerCommunication |          100 |            10 |            5 |    23.37 ms |  0.742 ms |   2.188 ms |    22.69 ms |         - |         - |     - |   1.58 KB |
| ClientServerCommunication |          100 |            10 |            7 |    26.15 ms |  1.859 ms |   5.480 ms |    24.66 ms |         - |         - |     - |   1.58 KB |
| ClientServerCommunication |          100 |            10 |           10 |    24.30 ms |  0.561 ms |   1.653 ms |    23.67 ms |         - |         - |     - |   1.58 KB |
| ClientServerCommunication |          100 |            10 |           20 |    34.56 ms |  0.319 ms |   0.940 ms |    34.32 ms |         - |         - |     - |   1.58 KB |
| ClientServerCommunication |          100 |            10 |           30 |    48.56 ms |  0.599 ms |   1.767 ms |    48.45 ms |         - |         - |     - |   1.58 KB |
| ClientServerCommunication |          100 |            10 |           40 |    60.99 ms |  0.968 ms |   2.854 ms |    60.02 ms | 1000.0000 |         - |     - |   1.58 KB |
| ClientServerCommunication |          100 |           100 |            1 |    43.49 ms |  0.578 ms |   1.706 ms |    43.08 ms |         - |         - |     - |   1.58 KB |
| ClientServerCommunication |          100 |           100 |            3 |    22.63 ms |  0.531 ms |   1.564 ms |    22.14 ms |         - |         - |     - |   1.58 KB |
| ClientServerCommunication |          100 |           100 |            5 |    21.61 ms |  0.462 ms |   1.363 ms |    21.17 ms |         - |         - |     - |   1.58 KB |
| ClientServerCommunication |          100 |           100 |            7 |    22.17 ms |  0.535 ms |   1.577 ms |    21.54 ms |         - |         - |     - |   1.58 KB |
| ClientServerCommunication |          100 |           100 |           10 |    23.79 ms |  0.394 ms |   1.161 ms |    23.54 ms |         - |         - |     - |   1.58 KB |
| ClientServerCommunication |          100 |           100 |           20 |    35.11 ms |  0.446 ms |   1.315 ms |    34.92 ms |         - |         - |     - |   1.58 KB |
| ClientServerCommunication |          100 |           100 |           30 |    47.21 ms |  0.674 ms |   1.989 ms |    46.43 ms |         - |         - |     - |   1.58 KB |
| ClientServerCommunication |          100 |           100 |           40 |    61.06 ms |  0.859 ms |   2.531 ms |    60.52 ms | 1000.0000 |         - |     - |   1.58 KB |
| ClientServerCommunication |         1000 |            10 |            1 |   403.88 ms |  5.701 ms |  16.808 ms |   399.22 ms |         - |         - |     - |   1.58 KB |
| ClientServerCommunication |         1000 |            10 |            3 |   179.15 ms |  5.153 ms |  15.195 ms |   173.13 ms |         - |         - |     - |   1.58 KB |
| ClientServerCommunication |         1000 |            10 |            5 |   146.48 ms |  3.080 ms |   9.081 ms |   141.66 ms |         - |         - |     - |   1.58 KB |
| ClientServerCommunication |         1000 |            10 |            7 |   130.54 ms |  1.972 ms |   5.813 ms |   128.32 ms |         - |         - |     - |   1.58 KB |
| ClientServerCommunication |         1000 |            10 |           10 |   127.79 ms |  2.292 ms |   6.759 ms |   125.80 ms | 1000.0000 |         - |     - |   1.58 KB |
| ClientServerCommunication |         1000 |            10 |           20 |   136.60 ms |  3.864 ms |  11.392 ms |   134.19 ms |         - |         - |     - |   1.58 KB |
| ClientServerCommunication |         1000 |            10 |           30 |   136.68 ms |  2.551 ms |   7.521 ms |   134.19 ms | 1000.0000 |         - |     - |   1.58 KB |
| ClientServerCommunication |         1000 |            10 |           40 |   144.96 ms |  2.349 ms |   6.925 ms |   143.74 ms | 1000.0000 |         - |     - |   1.58 KB |
| ClientServerCommunication |         1000 |           100 |            1 |   402.41 ms |  3.585 ms |  10.570 ms |   399.46 ms |         - |         - |     - |   1.58 KB |
| ClientServerCommunication |         1000 |           100 |            3 |   174.00 ms |  4.007 ms |  11.813 ms |   167.71 ms |         - |         - |     - |   1.58 KB |
| ClientServerCommunication |         1000 |           100 |            5 |   142.37 ms |  2.376 ms |   7.006 ms |   139.87 ms |         - |         - |     - |   1.58 KB |
| ClientServerCommunication |         1000 |           100 |            7 |   139.31 ms |  3.900 ms |  11.498 ms |   137.67 ms |         - |         - |     - |   1.58 KB |
| ClientServerCommunication |         1000 |           100 |           10 |   129.31 ms |  2.730 ms |   8.050 ms |   126.67 ms | 1000.0000 |         - |     - |   1.58 KB |
| ClientServerCommunication |         1000 |           100 |           20 |   129.00 ms |  1.351 ms |   3.983 ms |   127.82 ms | 1000.0000 |         - |     - |   1.58 KB |
| ClientServerCommunication |         1000 |           100 |           30 |   136.91 ms |  2.293 ms |   6.760 ms |   134.40 ms | 1000.0000 |         - |     - |   1.58 KB |
| ClientServerCommunication |         1000 |           100 |           40 |   150.63 ms |  3.591 ms |  10.588 ms |   147.31 ms | 1000.0000 |         - |     - |   1.58 KB |
| ClientServerCommunication |        10000 |            10 |            1 | 3,943.60 ms | 12.900 ms |  38.036 ms | 3,934.15 ms | 7000.0000 |         - |     - |   1.58 KB |
| ClientServerCommunication |        10000 |            10 |            3 | 1,648.77 ms | 20.125 ms |  59.340 ms | 1,627.00 ms | 7000.0000 |         - |     - |   1.58 KB |
| ClientServerCommunication |        10000 |            10 |            5 | 1,355.34 ms | 17.621 ms |  51.957 ms | 1,332.90 ms | 7000.0000 |         - |     - |   1.58 KB |
| ClientServerCommunication |        10000 |            10 |            7 | 1,244.47 ms | 20.884 ms |  61.578 ms | 1,218.32 ms | 7000.0000 |         - |     - |   1.58 KB |
| ClientServerCommunication |        10000 |            10 |           10 | 1,182.45 ms | 21.427 ms |  63.179 ms | 1,154.13 ms | 7000.0000 |         - |     - |   1.58 KB |
| ClientServerCommunication |        10000 |            10 |           20 | 1,069.44 ms | 12.701 ms |  37.450 ms | 1,062.56 ms | 7000.0000 | 1000.0000 |     - |   1.58 KB |
| ClientServerCommunication |        10000 |            10 |           30 | 1,099.14 ms | 13.493 ms |  39.783 ms | 1,085.21 ms | 8000.0000 | 1000.0000 |     - |   1.58 KB |
| ClientServerCommunication |        10000 |            10 |           40 | 1,091.91 ms | 15.335 ms |  45.217 ms | 1,078.84 ms | 8000.0000 | 1000.0000 |     - |   1.58 KB |
| ClientServerCommunication |        10000 |           100 |            1 | 3,945.08 ms | 14.754 ms |  43.501 ms | 3,938.04 ms | 8000.0000 |         - |     - |   1.58 KB |
| ClientServerCommunication |        10000 |           100 |            3 | 1,658.57 ms | 23.353 ms |  68.858 ms | 1,625.38 ms | 8000.0000 |         - |     - |   1.58 KB |
| ClientServerCommunication |        10000 |           100 |            5 | 1,348.68 ms | 14.412 ms |  42.493 ms | 1,331.29 ms | 8000.0000 |         - |     - |   1.58 KB |
| ClientServerCommunication |        10000 |           100 |            7 | 1,238.80 ms | 21.984 ms |  64.820 ms | 1,212.24 ms | 8000.0000 |         - |     - |   1.58 KB |
| ClientServerCommunication |        10000 |           100 |           10 | 1,289.90 ms | 40.027 ms | 118.019 ms | 1,270.24 ms | 8000.0000 |         - |     - |   1.58 KB |
| ClientServerCommunication |        10000 |           100 |           20 | 1,179.72 ms | 29.509 ms |  87.009 ms | 1,164.57 ms | 8000.0000 | 1000.0000 |     - |   1.58 KB |
| ClientServerCommunication |        10000 |           100 |           30 | 1,155.36 ms | 27.313 ms |  80.533 ms | 1,136.48 ms | 9000.0000 | 1000.0000 |     - |   1.58 KB |
| ClientServerCommunication |        10000 |           100 |           40 | 1,236.05 ms | 32.779 ms |  96.649 ms | 1,230.72 ms | 9000.0000 | 1000.0000 |     - |   1.58 KB |

Looks almost the same - sometimes a little bit faster, sometimes slower.
We may need to play with the TicksToUseThreadSpinWait value in the scheduler - right now it is set to 100 ticks, which means that when we have 101 ticks to wait we will sleep for at least 10,000 * 10 ticks (10 ms). Or even better - get the list of sleep times during some simple benchmark and try to set this ticks threshold to a less "random" value.

@IgorFedchenko
Contributor Author

So, about sleep times.

I ran one sample from the benchmark, which sends 10,000 messages (10 bytes each) in a request-reply manner from 3 concurrent clients (that is ~3,333 messages per client) to a single server via Akka.IO.Tcp (this is what my benchmark from the previous comment is doing), and did this for 100 iterations (passing 10k messages through these 3 clients 100 times). Each client sends the next message once the server has replied to the previous message sent by that client.

There were 19,284 sleeps. 18,950 of them were in the [80,000, 90,000] ticks range (10k ticks is 1 ms, as you probably know). 157 sleeps were below 80K, and overall it looks similar to a normal distribution.

Well, here is the file (initially created for myself, without formatting, but I will share it with you guys). Sleep numbers are in ascending order.
waitings.txt

So it is obvious that my 100-tick threshold has nothing to do with this. And it actually does not seem that we can successfully use Thread.SpinWait here, because the delays are for the most part near the wheel tick value of 100K ticks, and none of them are below 20K - which is a pretty big value for SpinWait, if I understand right.

So here are two questions:

  1. Is this too low a load on the scheduler to draw any conclusions? I mean, how can I (and is it even possible to) make the sleep times closer to 1K ticks or less? If so, I could do a sample run with such a scenario and see what threshold might be used for Thread.SpinWait in that case.

  2. If performing all scheduled actions within 1-2 ms is fine and typical for the scheduler, then we just cannot optimize it with Thread.SpinWait and have to wait for 15+ ms each time instead of the 8-9 ms requested.

@Aaronontheweb
Member

@IgorFedchenko so, it sounds like this PR won't add any value to the existing scheduler performance then?

@IgorFedchenko
Contributor Author

@Aaronontheweb If in most use cases the scheduler does not have much work to do (as in my benchmark) - then yes, there is nothing to optimize. Well, I wish we could make the scheduler sleep for a more accurate interval, but it looks like the only way to do this (and keep things simple enough) is to use Thread.SpinWait, which requires too much CPU for the timings we have.

@IgorFedchenko
Contributor Author

@Aaronontheweb Should we close it?

@Zetanova
Contributor

I looked at the current implementation and it seems to be well done.
My test was always on point with the tick schedule, less than one ms off (sometimes).

The only improvements I can see are:

  1. Maybe use a readonly value type for the arrays in the buckets. This will speed up iteration over them, but the effect will be minimal and only noticeable for very large entry counts (>5k).
  2. Maybe sleep longer when it is known that the next ticks will be empty.


Successfully merging this pull request may close these issues.

HashedWheelTimerScheduler is spending 35% of execution time in sleep
3 participants