Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable R2R compilation/inlining of PInvoke stubs where no marshalling is required #22560

Merged
merged 10 commits into from Apr 1, 2019

Conversation

@fadimounir
Copy link
Member

commented Feb 13, 2019

These changes enable the inlining of some PInvokes that do not require any marshalling. With inlined pinvokes, R2R performance should become slightly better, since we'll avoid jitting some of the pinvoke IL stubs that we jit today for S.P.CoreLib. Performance gains not yet measured.

Added JIT_PInvokeBegin/End helpers for all architectures. Linux stubs not yet implemented
Add INLINE_GETTHREAD for arm/arm64
Set CORJIT_FLAG_USE_PINVOKE_HELPERS jit flag for ReadyToRun compilations

src/vm/arm/PInvokeStubs.asm Outdated Show resolved Hide resolved
src/vm/dllimport.cpp Outdated Show resolved Hide resolved
src/vm/dllimport.cpp Outdated Show resolved Hide resolved
src/jit/compiler.h Outdated Show resolved Hide resolved
@fadimounir fadimounir force-pushed the fadimounir:enable_some_pinvokes branch from 8e27394 to 776cf97 Feb 27, 2019
@fadimounir fadimounir changed the title [WIP] Enable some pinvokes - Do not merge Enable R2R compilation/inlining of PInvoke stubs where no marshalling is required Feb 27, 2019
@fadimounir

This comment has been minimized.

Copy link
Member Author

commented Feb 27, 2019

@jkotas PTAL. I'm still going to run the P0 tests with crossgen enabled, for verification, and will get some perf measurements.

src/jit/lower.cpp Outdated Show resolved Hide resolved
src/jit/compiler.hpp Outdated Show resolved Hide resolved
src/vm/amd64/PInvokeStubs.asm Outdated Show resolved Hide resolved
src/vm/amd64/PInvokeStubs.asm Outdated Show resolved Hide resolved
src/vm/jithelpers.cpp Outdated Show resolved Hide resolved
@fadimounir

This comment has been minimized.

Copy link
Member Author

commented Mar 11, 2019

@dotnet-bot test Windows_NT x64 Checked CoreFX Tests

@fadimounir

This comment has been minimized.

Copy link
Member Author

commented Mar 11, 2019

@jkotas PTAL at the new changes I submitted

@jkotas

This comment has been minimized.

Copy link
Member

commented Mar 11, 2019

The delta looks reasonable to me. Have you done any R2R specific testing on this?

@fadimounir

This comment has been minimized.

Copy link
Member Author

commented Mar 11, 2019

I have run the P0 tests using the 'crossgen' command as described in this doc: https://github.com/dotnet/coreclr/blob/master/Documentation/building/windows-test-instructions.md
Results were clean.

@jkotas

This comment has been minimized.

Copy link
Member

commented Mar 11, 2019

These helpers have tight interaction with the GC. I would also do some crossgen+GC stress testing (with tiered compilation disabled).

@fadimounir

This comment has been minimized.

Copy link
Member Author

commented Mar 11, 2019

Sounds good. I'll look into it

@fadimounir

This comment has been minimized.

Copy link
Member Author

commented Mar 13, 2019

@jkotas crossgen testing with and without gc stress was clean with regards to these changes (x64 only). For the other architectures, my targeted pinvoke test case had some form of GC stress enabled, and was passing.

@fadimounir

This comment has been minimized.

Copy link
Member Author

commented Mar 13, 2019

I'm still waiting on the perf job to complete to see what the impact of the changes are.

@fadimounir

This comment has been minimized.

Copy link
Member Author

commented Mar 13, 2019

/cc @sergiy-k

src/vm/amd64/PInvokeStubs.asm Outdated Show resolved Hide resolved
src/vm/amd64/PInvokeStubs.asm Show resolved Hide resolved
@fadimounir fadimounir force-pushed the fadimounir:enable_some_pinvokes branch from cdc492b to 76a9115 Mar 19, 2019
@fadimounir

This comment has been minimized.

Copy link
Member Author

commented Mar 21, 2019

Hmm... Doesn't look like we're getting noticable startup perf wins I expected: http://benchview/compare?jobid=158586&comparejobids=[158569]&testid=61944&
There are some scenarios that actually seem slower now. I'll need to dig in further.. I do see some good wins under the "Inlining" category here, with 11% faster execution. CscBench is also 1% faster. There are just some tests mainly under BenchmarksGame and Benchstone that seem slightly slower. Could it be noise?

@AndyAyersMS, @jkotas what do you guys think?

/cc @brianrob

@jkotas

This comment has been minimized.

Copy link
Member

commented Mar 28, 2019

I tried one local experiment with one of the Linq tests (loop, counting items), and here are the numbers:
This is an average of 5.6% perf gains due to the inlined pinvokes in System.Private.CoreLib.dll.

Where does Linq use PInvokes to explain this gain?

89 jitted methods. Total JIT time = 21.8 ms
43 jitted methods. Total JIT time = 15.1 ms

I think that this is the most meaningful measurement. How many of these methods are PInvoke stubs? It would be useful to get the list and see how many of them are easy to convert to blittable PInvokes as follow up.

src/inc/readytorun.h Outdated Show resolved Hide resolved
@jkotas

This comment has been minimized.

Copy link
Member

commented Mar 28, 2019

This comment is closed, but I do not see a response to it. Just want to make sure you have seen it:

Another option is to move the popping of the frame on the slow path into the C helper. If you do that the need for this macro will disapper and you will have bit less of assembly code to maintain which is always goodness.

@fadimounir

This comment has been minimized.

Copy link
Member Author

commented Mar 28, 2019

Where does Linq use PInvokes to explain this gain?

Without tiered compilation, the previous lab results were showing a 20% regression for some weird reason, even though Linq shouldn't really be impacted by pinvokes. I just wanted to dig deeper into that regression, and make sure it was bogus.

@fadimounir

This comment has been minimized.

Copy link
Member Author

commented Mar 28, 2019

Another option is to move the popping of the frame on the slow path into the C helper

Can this be done in the same JIT_RareDisableHelper method or should I add a wrapper for it? I don't know what else uses this helper, and if popping the frame from the thread at that location would have other side effects.

@fadimounir

This comment has been minimized.

Copy link
Member Author

commented Mar 28, 2019

How many of these methods are PInvoke stubs? It would be useful to get the list and see how many of them are easy to convert to blittable PInvokes as follow up.

After a second look, I just realized that the baseline measurement may have also been a partial R2R image, that's why it has more jitting. However, for a helloworld scenario, i can confirm by debugging that there are about 5 or 6 pinvokes getting inlined and invoked (JIT_PInvokeBegin/End called)

@jkotas

This comment has been minimized.

Copy link
Member

commented Mar 28, 2019

Can this be done in the same JIT_RareDisableHelper method

It should be separate method. I would copy&paste the code for JIT_RareDisableHelper and added the extra piece to it.

src/inc/corinfo.h Outdated Show resolved Hide resolved
src/vm/amd64/PInvokeStubs.asm Outdated Show resolved Hide resolved
src/vm/jithelpers.cpp Outdated Show resolved Hide resolved
src/vm/jithelpers.cpp Outdated Show resolved Hide resolved
fadimounir added 10 commits Dec 13, 2018
…e any marshalling. With inlined pinvokes, R2R performance should become slightly better, since we'll avoid jitting some of the pinvoke IL stubs that we jit today for S.P.CoreLib. Performance gains not yet measured.

Added JIT_PInvokeBegin/End helpers for all architectures. Linux stubs not yet implemented
Add INLINE_GETTHREAD for arm/arm64
Set CORJIT_FLAG_USE_PINVOKE_HELPERS jit flag for ReadyToRun compilations
Increase size reserve for InlineCallFrame
Small adjustment to the arm/arm64 INLINE_GET_THREAD macros
@fadimounir fadimounir force-pushed the fadimounir:enable_some_pinvokes branch from 857a3f0 to b62dcbe Apr 1, 2019
@jkotas
jkotas approved these changes Apr 1, 2019
@fadimounir fadimounir merged commit bc9248c into dotnet:master Apr 1, 2019
57 of 60 checks passed
57 of 60 checks passed
coreclr-ci Build #20190401.77 failed
Details
coreclr-ci (Test pri0 Linux arm64 checked) Test pri0 Linux arm64 checked failed
Details
coreclr-ci (Build Linux arm debug)
Details
Ubuntu arm Cross Checked crossgen_comparison Build and Test Build finished.
Details
Ubuntu arm Cross Release crossgen_comparison Build and Test Build finished.
Details
Ubuntu x64 Checked CoreFX Tests Build finished.
Details
Ubuntu x64 Formatting Build finished.
Details
WIP Ready for review
Details
Windows_NT x64 Checked CoreFX Tests Build finished.
Details
Windows_NT x64 Formatting Build finished.
Details
Windows_NT x64 Release CoreFX Tests Build finished.
Details
Windows_NT x64 full_opt ryujit CoreCLR Perf Tests Correctness Build finished.
Details
Windows_NT x64 min_opt ryujit CoreCLR Perf Tests Correctness Build finished.
Details
Windows_NT x86 full_opt ryujit CoreCLR Perf Tests Correctness Build finished.
Details
Windows_NT x86 min_opt ryujit CoreCLR Perf Tests Correctness Build finished.
Details
coreclr-ci (Build Linux arm checked) Build Linux arm checked succeeded
Details
coreclr-ci (Build Linux arm debug) Build Linux arm debug succeeded
Details
coreclr-ci (Build Linux arm release) Build Linux arm release succeeded
Details
coreclr-ci (Build Linux arm64 checked) Build Linux arm64 checked succeeded
Details
coreclr-ci (Build Linux arm64 debug) Build Linux arm64 debug succeeded
Details
coreclr-ci (Build Linux arm64 release) Build Linux arm64 release succeeded
Details
coreclr-ci (Build Linux_musl arm64 checked) Build Linux_musl arm64 checked succeeded
Details
coreclr-ci (Build Linux_musl arm64 debug) Build Linux_musl arm64 debug succeeded
Details
coreclr-ci (Build Linux_musl arm64 release) Build Linux_musl arm64 release succeeded
Details
coreclr-ci (Build Linux_musl x64 checked) Build Linux_musl x64 checked succeeded
Details
coreclr-ci (Build Linux_musl x64 debug) Build Linux_musl x64 debug succeeded
Details
coreclr-ci (Build Linux_musl x64 release) Build Linux_musl x64 release succeeded
Details
coreclr-ci (Build Linux_rhel6 x64 checked) Build Linux_rhel6 x64 checked succeeded
Details
coreclr-ci (Build Linux_rhel6 x64 debug) Build Linux_rhel6 x64 debug succeeded
Details
coreclr-ci (Build Linux_rhel6 x64 release) Build Linux_rhel6 x64 release succeeded
Details
coreclr-ci (Build Windows_NT arm Checked) Build Windows_NT arm Checked succeeded
Details
coreclr-ci (Build Windows_NT arm Debug) Build Windows_NT arm Debug succeeded
Details
coreclr-ci (Build Windows_NT arm64 Checked) Build Windows_NT arm64 Checked succeeded
Details
coreclr-ci (Build Windows_NT arm64 Debug) Build Windows_NT arm64 Debug succeeded
Details
coreclr-ci (Build Windows_NT x64 Checked) Build Windows_NT x64 Checked succeeded
Details
coreclr-ci (Build Windows_NT x64 Debug) Build Windows_NT x64 Debug succeeded
Details
coreclr-ci (Build Windows_NT x86 Checked) Build Windows_NT x86 Checked succeeded
Details
coreclr-ci (Build Windows_NT x86 Debug) Build Windows_NT x86 Debug succeeded
Details
coreclr-ci (Test Pri0 Linux x64 checked) Test Pri0 Linux x64 checked succeeded
Details
coreclr-ci (Test Pri0 Linux_musl arm64 checked) Test Pri0 Linux_musl arm64 checked succeeded
Details
coreclr-ci (Test Pri0 Linux_musl x64 release) Test Pri0 Linux_musl x64 release succeeded
Details
coreclr-ci (Test Pri0 OSX x64 checked) Test Pri0 OSX x64 checked succeeded
Details
coreclr-ci (Test Pri0 Windows_NT arm checked) Test Pri0 Windows_NT arm checked succeeded
Details
coreclr-ci (Test Pri0 Windows_NT arm64 checked) Test Pri0 Windows_NT arm64 checked succeeded
Details
coreclr-ci (Test Pri0 Windows_NT x64 checked) Test Pri0 Windows_NT x64 checked succeeded
Details
coreclr-ci (Test Pri0 Windows_NT x86 checked) Test Pri0 Windows_NT x86 checked succeeded
Details
coreclr-ci (Test pri0 Linux arm checked) Test pri0 Linux arm checked succeeded
Details
coreclr-ci (Test pri0 Linux_musl x64 checked) Test pri0 Linux_musl x64 checked succeeded
Details
coreclr-ci (Test pri0 Linux_rhel6 x64 checked) Test pri0 Linux_rhel6 x64 checked succeeded
Details
coreclr-ci (build Linux x64 Checked) build Linux x64 Checked succeeded
Details
coreclr-ci (build Linux x64 Debug) build Linux x64 Debug succeeded
Details
coreclr-ci (build Linux x64 Release) build Linux x64 Release succeeded
Details
coreclr-ci (build OSX x64 Checked) build OSX x64 Checked succeeded
Details
coreclr-ci (build OSX x64 Debug) build OSX x64 Debug succeeded
Details
coreclr-ci (build OSX x64 Release) build OSX x64 Release succeeded
Details
coreclr-ci (build Windows_NT arm Release) build Windows_NT arm Release succeeded
Details
coreclr-ci (build Windows_NT arm64 Release) build Windows_NT arm64 Release succeeded
Details
coreclr-ci (build Windows_NT x64 Release) build Windows_NT x64 Release succeeded
Details
coreclr-ci (build Windows_NT x86 Release) build Windows_NT x86 Release succeeded
Details
license/cla All CLA requirements met.
Details
@AndyAyersMS

This comment has been minimized.

Copy link
Member

commented Apr 1, 2019

Hmm... Doesn't look like we're getting noticable startup perf wins I expected

Not too surprising; the jit-focused CoreCLR perf tests do not measure startup (or jit time, for the most part). Using ETW to look at jit time and jit requests (or using scenario startup metrics) is a better way to assess this.

Is there a follow-up plan to enable this for non-windows platforms?

@fadimounir

This comment has been minimized.

Copy link
Member Author

commented Apr 1, 2019

Is there a follow-up plan to enable this for non-windows platforms?

Yes. I'm currently working on it and will create a separate PR

@fadimounir fadimounir deleted the fadimounir:enable_some_pinvokes branch Jul 1, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants
You can’t perform that action at this time.