Performance anomalies in comparison to Mono #13425

Open
nxrighthere opened this issue Sep 16, 2019 · 26 comments
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI tenet-performance Performance related issue
@nxrighthere

nxrighthere commented Sep 16, 2019

Recently I put together a set of benchmarks to test Unity's Burst compiler against native compilers. Out of curiosity, I also included Mono and CoreCLR; the code is available here in the .NET folder. I noticed strange results in two tests (Sieve of Eratosthenes and Particle Kinematics), where CoreCLR performs much slower than Mono for some reason. I think this calls for in-depth analysis by the .NET Core developers.

I'm happy to provide any additional info or assistance if required.

category:cq
theme:needs-triage
skill-level:expert
cost:medium

@EgorBo
Member

EgorBo commented Sep 16, 2019

@nxrighthere have you tested Mono-LLVM-JIT (.NET 5 runtime), by the way? 🙂 With recent changes it supports "fast-math", and all Math(F) methods are @llvm.intrinsics (let me know if you need any help setting it up).

@benaadams
Member

Looking at the tests, you may need to do 30+ iterations of the methods for .NET Core 3.0 prior to doing the measurement to allow tiered compilation to kick in.
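
A minimal sketch of the warm-up idea, assuming a hand-rolled Stopwatch harness rather than the harness used in the linked repository. The names `WarmupSketch`, `CountPrimes`, and `Measure` are hypothetical, and the default of 30 calls simply mirrors the number suggested above. Tier promotion happens in the background, so warming up makes optimized code likely for the timed call but does not guarantee it.

```csharp
using System;
using System.Diagnostics;

public static class WarmupSketch
{
    // Hypothetical stand-in workload: counts primes <= limit
    // with a Sieve of Eratosthenes.
    public static int CountPrimes(int limit)
    {
        if (limit < 2) return 0;
        var composite = new bool[limit + 1];
        int count = 0;
        for (int i = 2; i <= limit; i++)
        {
            if (composite[i]) continue;
            count++;
            // Mark multiples starting at i*i; long avoids overflow of i*i.
            for (long j = (long)i * i; j <= limit; j += i)
                composite[j] = true;
        }
        return count;
    }

    // Calls the workload warmupCalls times without timing it,
    // then times one further call.
    public static (int Result, TimeSpan Elapsed) Measure(
        Func<int> workload, int warmupCalls = 30)
    {
        int result = 0;
        for (int i = 0; i < warmupCalls; i++)
            result = workload(); // unmeasured warm-up call

        var stopwatch = Stopwatch.StartNew();
        result = workload();
        stopwatch.Stop();
        return (result, stopwatch.Elapsed);
    }
}
```

It could be called as `var (primes, elapsed) = WarmupSketch.Measure(() => WarmupSketch.CountPrimes(1_000_000));`.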

@nxrighthere
Author

@EgorBo Thanks for the suggestion, going to try to play with it for sure. 👍
@benaadams Yea, I was thinking about it too, going to install .NET Core 3.0 then. Thanks.

@EgorBo
Member

EgorBo commented Sep 16, 2019

Dry run:
[image: benchmark results]

(CoreCLR 3.0 vs .NET 5 (Mono-LLVM-JIT runtime); our LLVM backend currently uses only a few optimization passes)

Ubuntu 18
Core i7 4930K (Ivy Bridge)

@EgorBo
Member

EgorBo commented Sep 16, 2019

Also, it seems you use stackalloc a lot, so it probably makes sense to clear InitLocals for the whole project (then the memory will not have to be cleared every time you allocate).
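
As a rough illustration, the sketch below shows a stackalloc buffer that is fully written before it is read, which is the pattern that stays correct once zero-initialization is skipped. It assumes .NET 5 or later, where `System.Runtime.CompilerServices.SkipLocalsInitAttribute` is available; that attribute did not exist when this comment was written, so it is not necessarily the mechanism meant here. The attribute line is left commented out because enabling it requires `<AllowUnsafeBlocks>true</AllowUnsafeBlocks>` in the project file. `StackallocSketch` and `SumOfSquares` are hypothetical names.

```csharp
using System;

public static class StackallocSketch
{
    // With .NET 5+ and AllowUnsafeBlocks enabled, uncommenting the next
    // line asks the compiler not to emit the "init locals" flag for this
    // method, so the stackalloc buffer is not zeroed on every call:
    // [System.Runtime.CompilerServices.SkipLocalsInit]
    public static long SumOfSquares(int count)
    {
        if (count <= 0 || count > 1024)
            throw new ArgumentOutOfRangeException(nameof(count));

        // Stack-allocated scratch buffer; no heap allocation.
        Span<int> scratch = stackalloc int[count];

        // Every element is written before it is read, so the result
        // does not depend on the buffer having been zeroed.
        for (int i = 0; i < scratch.Length; i++)
            scratch[i] = i * i;

        long sum = 0;
        foreach (int value in scratch)
            sum += value;
        return sum;
    }
}
```

Code that reads a stackalloc buffer before writing it relies on the zeroing and would see arbitrary data once initialization is skipped.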

@nxrighthere
Author

nxrighthere commented Sep 16, 2019

@benaadams I've tried .NET Core 3.0.100-preview9 and engaged tiered compilation through heavy iterations, but the results are almost the same, unfortunately. 😢

@EgorBo Interesting, I've never heard of InitLocals before. I see many articles about reflection-related stuff, but I'm not sure how to use it properly.

@jeffschwMSFT
Member

cc @BruceForstall @sergiy-k

@BruceForstall
Member

@dotnet/jit-contrib

@BruceForstall
Member

@nxrighthere Have you tried disabling tiered compilation (set COMPlus_TieredCompilation=0) to force tier 1 compilation from the start?

Have you seen https://github.com/dotnet/performance? Maybe you should consider contributing the benchmarks to that set, to be run regularly on .NET?

@nxrighthere
Author

@BruceForstall Indeed, set COMPlus_TieredCompilation=0 solved this; here's the diff with Tiered Compilation disabled.

> Have you seen https://github.com/dotnet/performance? Maybe you should consider contributing the benchmarks to that set, to be run regularly on .NET?

I was not aware of this repository, will consider contributing directly into it, thank you.

Should I close this issue or keep it open?

@BruceForstall
Member

@nxrighthere Your linked repo mentions ".NET Core 2.2.402". Have you tried with the latest .NET Core 3.0 build to see if there is any difference?

We're always looking for good benchmarks to use for performance comparison. It looks like you've found some where there are perf gaps between RyuJIT and other options that could be investigated.

> Should I close this issue or keep it open?

Seems reasonable to keep it open for now.

@nxrighthere
Author

@BruceForstall Here's the diff with results for 3.0.100-rc1. There's only one noticeable difference: recursive Fibonacci is 22% slower with the new version; all other tests show nearly the same numbers.

@nxrighthere
Author

@EgorBo I'm a bit lost with Mono's LLVM. Is the 6.0.0.334 version on the website able to compile the code with --aot=llvm,llvmllc="-mcpu=* -fp-contract=fast"? Also, what should the -mcpu parameter be set to for AMD FX (Vishera)? Thanks.

@EgorBo
Member

EgorBo commented Sep 18, 2019

@nxrighthere --aot=llvm,mcpu=native --ffast-math. But it will be slower than what I tested (mono-netcore-runtime, LLVM JIT), since you are going to benchmark "legacy" Mono with LLVM AOT (which has some limitations).
It's a bit difficult to set up mono-netcore for now (netcore/./build.sh --llvm -c Release).

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@nxrighthere
Author

@EgorBo Hey Egor, is it possible to build the runtime with LLVM JIT from master on Windows right now?

@AndyAyersMS
Member

Looks like we never drilled in to understand why Core is slower -- seems like we ought to do so, there may be one or two things there we can address without needing entire new classes of optimization.

@nxrighthere
Author

nxrighthere commented Feb 22, 2020

Well, in general, it's all fine right now except places where floating-point arithmetic is involved, since as far as I know there's no equivalent to -ffast-math / /fp:fast in .NET Core.
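
To illustrate the kind of transformation a fast-math mode permits and strict IEEE 754 evaluation forbids, here is a small sketch of floating-point addition giving different results under different groupings. `ReassociationSketch` and its methods are hypothetical names used only for this example; they are not part of any benchmark in the thread.

```csharp
using System;

public static class ReassociationSketch
{
    // Evaluates the sum in source order: (a + b) + c.
    public static double LeftToRight(double a, double b, double c)
        => (a + b) + c;

    // Evaluates the regrouped sum a + (b + c), the kind of rewrite a
    // compiler may only perform when fast-math style rules are in effect.
    public static double Regrouped(double a, double b, double c)
        => a + (b + c);
}
```

With a = 1e17, b = -1e17, and c = 1.0, the first grouping cancels the large terms before adding 1.0, while in the second the 1.0 is lost to rounding when it is added to -1e17. A compiler that must preserve results cannot treat the two as interchangeable.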

@AndyAyersMS
Member

Is there some writeup you can point me at with more details?

@nxrighthere
Author

Related issue #12753

@AndyAyersMS
Member

Thanks. I was actually looking for analysis showing that fast fp is the root cause of the perf differences in Core vs Mono-LLVM. I suspect there's more going on than just that...

@danmoseley
Member

Cc @tannergooding

@EgorBo
Member

EgorBo commented Feb 22, 2020

@AndyAyersMS I think one of the low-hanging fruits is recognizing a*b+c as an fma.
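
For context, a small sketch of the two forms, assuming .NET Core 3.0 or later, where `Math.FusedMultiplyAdd` is available. `FmaSketch` is a hypothetical name. The fused form rounds once instead of twice, so the two can return different values; that difference is why a compiler cannot silently substitute one for the other under strict floating-point rules.

```csharp
using System;

public static class FmaSketch
{
    // Separate multiply and add: the product is rounded to double
    // before the addition, and the sum is rounded again.
    public static double Unfused(double a, double b, double c)
        => a * b + c;

    // Fused multiply-add: a * b + c is computed with a single rounding.
    public static double Fused(double a, double b, double c)
        => Math.FusedMultiplyAdd(a, b, c);
}
```

For inputs whose product is exactly representable, such as `(2.0, 3.0, 1.0)`, both methods return the same value. For inputs such as `(1 + 2^-27, 1 - 2^-27, -1.0)`, the unfused product rounds to 1.0 and the sum becomes zero, while the fused version keeps the small negative remainder.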

@EgorBo
Member

EgorBo commented Feb 22, 2020

@nxrighthere we are still moving things here and there but it's already possible for macOS and Linux:

./build.sh -c Release /p:MonoEnableLLVM=true

then cd src/mono/netcore
and run

make run-sample

After that you should see a .dotnet-mono folder in the repo root (make sure MONO_ENV_OPTIONS=--llvm is set as an environment variable when you use it to run benchmarks).

@nxrighthere
Author

After upgrading to .NET 5 Preview 8, I noticed a significant regression in this recursive Fibonacci test. Execution is slower by 40% vs .NET Core 3.1.101 while in other tests .NET 5 shows better results.

@AndyAyersMS
Member

If you're talking about a regression in CoreCLR perf, it's likely because of #35020.

@BruceForstall BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020
@BruceForstall BruceForstall removed the JitUntriaged CLR JIT issues needing additional triage label Nov 26, 2020