Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider re-compling CoreCLR's unmanaged code to mitigate JCC erratum (Nov. 2019) #13794

Closed
damageboy opened this issue Nov 14, 2019 · 6 comments
Assignees
Labels
area-Meta tenet-performance Performance related issue untriaged New issue has not been triaged by the area owner

Comments

@damageboy
Copy link
Contributor

damageboy commented Nov 14, 2019

Intel has recently released a microcode update that adversely affects the performance of... all jumps (!):

https://www.intel.com/content/dam/support/us/en/documents/processors/mitigations-jump-conditional-code-erratum.pdf

I've personally had the "pleasure" of encountering this whilst benchmarking CoreCLR 3.0 Array.Sort() on clearlinux which seems to have pushed this microcode update earlier than others:

https://twitter.com/damageboy/status/1194751035136450560?s=20

For context, Here's the before/after perf hit on Array.Sort():
Before:
image

After:
image

Very specifically, I was hit with a 20% drop in Array.Sort(), and a smaller (by sheer luck I presume?) hit with my own code. Intel claims that the average drop in performance should be in the %0-%4 range.

The erratum claims that the microcode update adversely effects:

In this context, Jump Instructions include all jump types: conditional jump (Jcc), macro-
fused op-Jcc (where op is one of cmp, test, add, sub, and, inc, or dec), direct
unconditional jump, indirect jump, direct/indirect call, and return.

The recommended fix, per the erratum, is to recompile the code with (Section 3.1.1):
-mbranches-within-32B-boundaries

In theory, this could end with a relatively minor(?) CoreCLR runtime release. Or incorporated into the 3.1 release cycle?

In addition, I think we should have a discussion about possibly applying similar (conditional) fixes to the JIT itself w.r.t the generated code on Intel processors, as it doesn't seem this issue is about to simply go away on its own.

I will open a separate issue for the JIT related discussion.

@jkotas
Copy link
Member

jkotas commented Nov 14, 2019

Thanks for opening the issue on this. We have been looking into this.

@stephentoub
Copy link
Member

stephentoub commented Nov 14, 2019

Separately, since you used Array.Sort as an example, it's no longer implemented in C; as of last week the implementation is all C#. I'm curious if that affects your stated impact at all, or if you see a similar affect on Array.Sort with the latest master?

@damageboy
Copy link
Contributor Author

damageboy commented Nov 14, 2019

@stephentoub That's a very good question.
I haven't tested with master yet, and I am aware of the C++ -> C# transition for that specific piece of code.

My code itself (in this case the Scalar + Unmanaged variations of IntroSort) is definitely
affected (as seen in the same screenshot), with very wild outcomes:

  • Scalar, for example, suffers a 13% perf drop.
  • Unmanaged is hit with a very minor 1.4% drop.

Given that It was pretty much a copy-paste of IntroSort from CoreCLR to begin with, I think this gives you a "preview" of where this might go.

I assume that the effects are random in nature, depending on code generation/allocation address randomness.
But in general, the more branchy your code is, you'll probably get hit harder, which can also be seen in my two examples: these are practically the same functions, except that only one of them has extra bounds-checking that is not eliminated normally (nor did I attempt to optimize it away) and the other doesn't...

My other benchmarks actually end up doing substantially fewer branches (as in orders of
magnitudes fewer branches), so I suspect I won't see a lot of movement on that front...

@damageboy
Copy link
Contributor Author

damageboy commented Nov 15, 2019

Just wanted to circle back and to say that I managed to revert the microcode update in a very localized way on clearlinux (or I assume any Linux):

  1. Change the kernel boot parameters to avoid early loading of microcode updates (remove the initrd=/EFI/org.clearlinux/freestanding-00-intel-ucode.cpio from the boot cmd line)
  2. Revert to an older package of microcode from Intel to make sure that late loading of microcode does not sneak the update back in, by updating /lib/firmware/intel-ucode/ to the contents of the 2019-09-18 ucode release

After doing those two steps and (naturally) rebooting, I'm back to older microcode (0xb4) for my kaby-lake processor:

$ grep 'stepping\|model\|microcode' /proc/cpuinfo | head -4
model		: 158
model name	: Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
stepping	: 9
microcode	: 0xb4

Rerunning the same benchmarks now yields the good old results I've been seeing in the past months:

BenchmarkDotNet=v0.12.0, OS=clear-linux-os 31620
Intel Core i7-7700HQ CPU 2.80GHz (Kaby Lake), 1 CPU, 4 logical and 4 physical cores
.NET Core SDK=3.0.100
  [Host]     : .NET Core 3.0.0 (CoreCLR 4.700.19.46205, CoreFX 4.700.19.46214), X64 RyuJIT
  Job-BMTZZL : .NET Core 3.0.0 (CoreCLR 4.700.19.46205, CoreFX 4.700.19.46214), X64 RyuJIT

InvocationCount=3  IterationCount=15  LaunchCount=2  
UnrollFactor=1  WarmupCount=10  
Method N Mean [us] Time / N [ns]
ArraySort 100 2.795 27.9545
Scalar 100 4.511 45.1068
Unmanaged 100 4.154 41.5399
ArraySort 1000 35.758 35.7583
Scalar 1000 58.767 58.7666
Unmanaged 1000 42.754 42.7538
ArraySort 10000 507.538 50.7538
Scalar 10000 680.009 68.0009
Unmanaged 10000 516.547 51.6547
ArraySort 100000 5,920.847 59.2085
Scalar 100000 6,678.108 66.7811
Unmanaged 100000 5,585.576 55.8558
ArraySort 1000000 69,018.648 69.0186
Scalar 1000000 76,603.616 76.6036
Unmanaged 1000000 65,503.492 65.5035
ArraySort 10000000 794,042.304 79.4042
Scalar 10000000 882,212.364 88.2212
Unmanaged 10000000 743,297.864 74.3298

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@maryamariyan maryamariyan added the untriaged New issue has not been triaged by the area owner label Feb 23, 2020
@danmoseley
Copy link
Member

danmoseley commented Feb 28, 2020

@brianrob is work planned here? Can you provide guidance on what to do with this and #13795

@brianrob
Copy link
Member

brianrob commented Feb 28, 2020

I hope to have all of the data in a format for sharing shortly. The short answer is that I don't think there is any immediate work that needs to be done here, but I will provide more context on this once I have all of the data compiled.

@msftbot msftbot bot locked as resolved and limited conversation to collaborators Dec 11, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-Meta tenet-performance Performance related issue untriaged New issue has not been triaged by the area owner
Projects
None yet
Development

No branches or pull requests

6 participants