0.10.4: Timing Oddities #433

Closed
Mike-EEE opened this Issue Apr 22, 2017 · 7 comments

Mike-EEE commented Apr 22, 2017

I have gone down another timing rabbit hole, I am afraid. I am beginning to think that my machine is fubared and a reinstall will do me good. What makes me say this is that I recently created a performance VM, which has 6 of my 8 cores assigned to it, and it seems to deliver far better and more consistent results.

Upon upgrading to 0.10.4, my experience on my (possibly-fubar'd) machine seems much more chunky and slow. The results themselves are now slower (but probably more accurate).

However, I am struggling to understand this, which is a concise example of the pain I have been enduring of late:

MainWarmup  1: 524288 op, 525385681.75 ns, 1.0021 us/op
MainWarmup  2: 524288 op, 526700831.41 ns, 1.0046 us/op
MainWarmup  3: 524288 op, 526987551.74 ns, 1.0051 us/op
MainWarmup  4: 524288 op, 528491726.49 ns, 1.0080 us/op
MainWarmup  5: 524288 op, 526077020.94 ns, 1.0034 us/op

MainTarget  1: 524288 op, 543544267.43 ns, 1.0367 us/op
MainTarget  2: 524288 op, 541075317.50 ns, 1.0320 us/op
MainTarget  3: 524288 op, 543599618.85 ns, 1.0368 us/op
MainTarget  4: 524288 op, 543177841.06 ns, 1.0360 us/op
MainTarget  5: 524288 op, 543132452.90 ns, 1.0359 us/op
...

Notice that the main target jumps ~0.03 us/op from the warmup. This is surprisingly (and maddeningly) consistent -- and by consistent I mean the amount, not the direction. Here is another example:

MainWarmup  1: 524288 op, 563842738.88 ns, 1.0754 us/op
MainWarmup  2: 524288 op, 561412811.69 ns, 1.0708 us/op
MainWarmup  3: 524288 op, 563789878.27 ns, 1.0753 us/op
MainWarmup  4: 524288 op, 563330461.52 ns, 1.0745 us/op
MainWarmup  5: 524288 op, 563496239.01 ns, 1.0748 us/op

MainTarget  1: 524288 op, 547626711.15 ns, 1.0445 us/op
MainTarget  2: 524288 op, 548645177.22 ns, 1.0465 us/op
MainTarget  3: 524288 op, 545946518.91 ns, 1.0413 us/op
MainTarget  4: 524288 op, 549033467.40 ns, 1.0472 us/op
MainTarget  5: 524288 op, 546995705.01 ns, 1.0433 us/op

There is the 0.03us again, but in the opposite direction.
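For anyone skimming the logs: the us/op column is just the total elapsed nanoseconds divided by the op count, so the size of the jump can be computed directly from the log lines. A quick sketch of that arithmetic (values copied from the MainWarmup 1 / MainTarget 1 lines of the first example):

```java
public class PerOpDelta {
    public static void main(String[] args) {
        long ops = 524288;                 // op count from the log lines
        double warmupNs = 525385681.75;    // MainWarmup 1 total, first example
        double targetNs = 543544267.43;    // MainTarget 1 total, first example

        // us/op = total ns / ops / 1000
        double warmup = warmupNs / ops / 1000.0;
        double target = targetNs / ops / 1000.0;

        System.out.printf("warmup: %.4f us/op%n", warmup);           // 1.0021
        System.out.printf("target: %.4f us/op%n", target);           // 1.0367
        System.out.printf("delta:  %.4f us/op%n", target - warmup);  // 0.0346
    }
}
```

So the "~0.03 us" jump is about a 3.5% shift in per-op time between warmup and target iterations.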

Now take an example from my newly installed VM (that is indeed/ironically hosted on my possibly horked machine that produces the dubious results above):

MainWarmup  1: 524288 op, 591198200.00 ns, 1.1276 us/op
MainWarmup  2: 524288 op, 597866500.00 ns, 1.1403 us/op
MainWarmup  3: 524288 op, 588773000.00 ns, 1.1230 us/op
MainWarmup  4: 524288 op, 604942200.00 ns, 1.1538 us/op
MainWarmup  5: 524288 op, 589829200.00 ns, 1.1250 us/op

MainTarget  1: 524288 op, 618755700.00 ns, 1.1802 us/op
MainTarget  2: 524288 op, 578465000.00 ns, 1.1033 us/op
MainTarget  3: 524288 op, 615863200.00 ns, 1.1747 us/op
MainTarget  4: 524288 op, 584967800.00 ns, 1.1157 us/op
MainTarget  5: 524288 op, 595187800.00 ns, 1.1352 us/op

This seems more appropriate and expected to me.

So I am beginning to wonder if there is a setting/configuration somewhere that I do not have set properly on my primary host, or perhaps a bad install. For reference, here are the stats from my host:

BenchmarkDotNet=v0.10.4, OS=Windows 10.0.15063
Processor=Intel Core i7-4820K CPU 3.70GHz (Haswell), ProcessorCount=8
Frequency=3613277 Hz, Resolution=276.7571 ns, Timer=TSC
  [Host]     : Clr 4.0.30319.42000, 64bit RyuJIT-v4.7.2046.0
  Job-NGUNOA : Clr 4.0.30319.42000, 64bit RyuJIT-v4.7.2046.0

As you can see, I just upgraded to Creator Update which installs .NET 4.7 and the latest RyuJIT.

Here's the stats from my VM:

BenchmarkDotNet=v0.10.4, OS=Windows 10.0.15063
Processor=Intel Core i7-4820K CPU 3.70GHz (Haswell), ProcessorCount=6
Frequency=10000000 Hz, Resolution=100.0000 ns, Timer=UNKNOWN
  [Host]     : Clr 4.0.30319.42000, 64bit RyuJIT-v4.7.2046.0
  Job-NGUNOA : Clr 4.0.30319.42000, 64bit RyuJIT-v4.7.2046.0

I noticed that the Timer is UNKNOWN. Could this be contributing in some way? (FWIW I tried playing around with the different clocks without much luck).
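On the Timer=UNKNOWN point: the Resolution line in both summaries is just 1e9 / Frequency, and the guest's suspiciously round 10 MHz frequency suggests a virtualized OS timer rather than direct TSC access (an assumption on my part -- the summary only says UNKNOWN). A small check of both resolutions, with frequencies copied from the summaries above:

```java
public class TimerResolution {
    public static void main(String[] args) {
        // ns per tick = 1e9 / frequency in Hz
        double hostFreqHz = 3_613_277;     // host summary: Timer=TSC
        double vmFreqHz   = 10_000_000;    // VM summary: Timer=UNKNOWN

        System.out.printf("host: %.4f ns/tick%n", 1e9 / hostFreqHz);  // 276.7571
        System.out.printf("vm:   %.4f ns/tick%n", 1e9 / vmFreqHz);    // 100.0000
    }
}
```

Note that the VM's coarser 100 ns ticks would also explain why every total in its log ends in round hundreds of nanoseconds (…00.00): the measurements are quantized to the tick, on top of whatever jitter virtualization adds.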

Finally, I will also state that I still encounter wild discrepancies on my host machine. For instance, from the results above I've noticed that it starts out at around ~1.10us and then, after successive tries (about 10 or so), works its way down to ~1.00us (or even 986ns at one point). As you can imagine, determining whether a change I made had an impact is nearly impossible with this process (or at the very least extremely time-consuming).

Conversely, the VM holds steady at around 1.10-1.12us each and every time.

So, is it possible I have a bad install/configuration somewhere? One thought that occurs to me is that I do own a Rampage Extreme IV and there is a bunch of overclocking on it that I (read: a friend 😆) did when I first got it (a few years ago). Could it be that a setting somewhere is causing all my angst? Even if that were the case it would seem to cascade to the VM.

So, still lost here, unfortunately. Any insight/assistance would be greatly appreciated.

Mike-EEE commented Apr 27, 2017

Wanted to post an update here and report that indeed this appears to be related to two underlying issues of my own making:

  1. Use of overclocked non-QVL'd memory for my Rampage Extreme IV
  2. A really terrible Windows install with extensive disk issues that were not resolved until I could chkdsk it from another install altogether. I have had this install since Windows 8.0, with all sorts of installations along the way, so it's no wonder I was experiencing grief.

So I have basically spent the week engaged in some applied spring cleaning, whether I liked it or not. 😆

Now I am on a totally brand new environment, with QVL'd memory, and things run so much better. Additionally, I was pleasantly surprised to find that I could run BDN from Safe Mode. 😈 So I now have a super-great, extremely consistent environment that runs the way I expect. Well, mostly (see below).

That's the good news. The bad news is that I still see some weird behavior with my executions.

I didn't want to get into this earlier (or even now) with a lot of detail, but the oddity that I am seeing appears to stem from GC. It seems that running code that involves a params call (array allocation) introduces a variance between two builds. When I comment out the call to the method that makes use of the params, the build produces slower results -- even in Safe Mode -- as well as incredible variance in the results. That doesn't make sense at all, because fewer allocations should be faster.

Anyways, I am going to attempt to make a very simple project that hopefully outlines and captures this for you. I will update here once I do.

adamsitnik commented Apr 29, 2017

@Mike-EEE

  1. @AndreyAkinshin could you tell Mike why it is not recommended to run benchmarks on a virtual machine?
  2. I suspect Roslyn optimizations here. Roslyn can optimize your params method call and a few other things. It can invoke an overload of the given method that accepts, for example, (int i1, int i2, int i3) instead of params int[] ints. The best way to verify is to apply the [KeepBenchmarkFiles] attribute to the class that contains your benchmarks, run the benchmarks, copy the output executable file path, find your dll in that folder, and open it with ILSpy or ILDasm to see what IL was produced. You can make it even simpler: just build your project, go to its output, and see what IL is inside.
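The overload-preference behavior described above isn't specific to C#: Java resolves varargs the same way, preferring a fixed-arity overload whenever one is applicable, and only the varargs path allocates an array per call. A minimal sketch of that resolution rule (method names are made up for illustration):

```java
public class VarargsOverload {
    // Fixed-arity overload: no array is allocated for the call.
    static String sum(int a, int b, int c) { return "fixed-arity"; }

    // Varargs overload: each call through this path allocates an int[].
    static String sum(int... xs) { return "varargs"; }

    public static void main(String[] args) {
        System.out.println(sum(1, 2, 3));     // fixed-arity overload is preferred
        System.out.println(sum(1, 2, 3, 4));  // only this arity falls through to varargs
    }
}
```

This is exactly why inspecting the produced IL (or bytecode) matters: two builds that look identical in source can differ in whether a per-call array allocation exists at all.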
Mike-EEE commented Apr 30, 2017

Ahhh to clarify here @adamsitnik I was running benchmarks in:

  • My VM
  • My VM host
  • Surface Pro 3

The VM and SP3 were showing vastly different results and behavior than my host. I have since refreshed my entire environment and everything seems so much better now, especially with the expected timings, where the params generation is (now) a smidge slower than without. That part seems to be working as expected now. I do still get oddities with "jumps" between warmup and main, and I hope to highlight them with a simple example project sometime this next week.

FWIW I am running on Windows Server 2016 as a Hyper-V host. I switched to Server when I tried going into Safe Mode with Windows 10 (explicitly for benchmarking) and noticed that even there, Cortana and other services were still enabled. Didn't seem so "safe" to me and figured Server would be much more streamlined (and was right).

With that said, one final thought (for now): with all of this work of setting up services and a new machine (well, a SUITE of machines), I was thinking that this project could totally innovate this space (more than it already has!) by offering it as a service: Benchmarking-as-a-Service (like a B-a-a-S! Hehe).

I have seen more and more articles and blogs using Benchmark.NET as their authoritative benchmark reference (and rightfully so). However, all the numbers are relative to the developer's machine. If anything, all of my pain here shows that benchmarking is the antithesis of "works on my machine", as there are an incredible number of variables at play that can impact results. How great would it be to have a service we could simply send our code to, which would crunch it and spit out an authoritative, 99.9999-percentile consistent/accurate number that can be referenced and known/understood by others?

Additionally, being able to scale it out and run it concurrently on numerous/multiple machines and instances would be invaluable in scenarios such as this to track down WTF is going on and/or if oddities are showing up elsewhere, too. :)

If I am not mistaken @AndreyAkinshin you work for JetBrains, correct? This would be something they should be all over, IMO. Additionally, it would be priceless to be able to integrate something like this with dotMemory/dotTrace to really get at the root of what is going on by having it also generate associated files, regardless.

Mike-EEE commented Apr 30, 2017

Oh I've gone and done it now. :)
https://youtrack.jetbrains.com/issue/RSRP-464596

(Please vote!)

adamsitnik commented May 7, 2017

@Mike-EEE I like the idea of Benchmarking-as-a-Service, but I am afraid that the .NET community is not ready for that. Devs want everything for free; as soon as they have to pay for any tool, they start searching for something that is free.

Mike-EEE commented May 17, 2017

@adamsitnik yeah it was just a thought and I am in agreement with your assessment. I am sure it will get built one way or another, in any case.

An update on this issue: my effort to refresh my digital life has spilled into my real life, so that's set me back a few weeks here. I have this issue flagged in my TODO list once I return and will aim to get you a simple, sample project when I can.

Mike-EEE commented Jul 2, 2017

ALRIGHT. More than two months later... 😛

Really could not have anticipated the sheer amount of offline time this would take. So much accomplished, however. I have completely refreshed my environment (both virtual and real life haha) and now things run much better and (more importantly) as expected.

I am not sure what exactly happened, and I am apt not to care at this point, only that it forced me to commit to the significant undertaking that has just transpired. Now I am on the latest Windows 10 install along with Visual Studio 2017 Preview 3, and I am seeing the results that I expect with the same code.

So my best guess on what happened was the tooling along with some wonky OS glitchery (yes that's a word). I was probably running up against something lurking within the VS2015 .NET Core 1.0 tooling.

Anyways, everything is running MUCH better now. Even building projects is blazing fast compared to '15. Night and day difference. I will be closing this issue now as this problem has fixed itself. Thanks again all for your awesome assistance and (more importantly) patience!
