This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Enable R2R compilation/inlining of PInvoke stubs where no marshalling is required #22560

Merged
merged 10 commits into from Apr 1, 2019


fadimounir

@fadimounir fadimounir commented Feb 13, 2019

These changes enable the inlining of some PInvokes that do not require any marshalling. With inlined pinvokes, R2R performance should become slightly better, since we'll avoid jitting some of the pinvoke IL stubs that we jit today for S.P.CoreLib. Performance gains not yet measured.

Added JIT_PInvokeBegin/End helpers for all architectures. Linux stubs not yet implemented
Add INLINE_GETTHREAD for arm/arm64
Set CORJIT_FLAG_USE_PINVOKE_HELPERS jit flag for ReadyToRun compilations
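
For readers unfamiliar with the helper pair, here is a rough sketch of what bracketing a no-marshalling native call with begin/end helpers might look like. Everything in it is an assumption made for illustration: `Thread`, `Frame`, `PInvokeBegin`, `PInvokeEnd`, `RareDisable`, and `NativeAdd` are simplified, hypothetical stand-ins and not the actual CoreCLR types or the `JIT_PInvokeBegin`/`JIT_PInvokeEnd` implementations from this PR.

```cpp
#include <atomic>

// Hypothetical, simplified stand-ins. These are NOT the CoreCLR definitions.
struct Frame {
    Frame* next = nullptr;  // previous transition frame on this thread
};

struct Thread {
    Frame* topFrame = nullptr;                  // chain of transition frames
    bool preemptiveGC = false;                  // true while native code runs
    std::atomic<bool> suspendRequested{false};  // set when a GC wants to suspend
    int rareDisableCalls = 0;                   // counts slow-path entries
};

thread_local Thread g_thread;

// Begin: link the frame into the thread's chain, then allow the GC to
// proceed without waiting for this thread (preemptive mode).
void PInvokeBegin(Frame* frame) {
    Thread& t = g_thread;
    frame->next = t.topFrame;
    t.topFrame = frame;
    t.preemptiveGC = true;
}

// Slow path stand-in: a real runtime would block here until the GC is done.
void RareDisable(Thread& t) { ++t.rareDisableCalls; }

// End: return to cooperative mode, take the slow path if a suspension is
// pending, then unlink the frame.
void PInvokeEnd(Frame* frame) {
    Thread& t = g_thread;
    t.preemptiveGC = false;
    if (t.suspendRequested.load()) {
        RareDisable(t);
    }
    t.topFrame = frame->next;
}

// Stand-in for an unmanaged target whose arguments need no marshalling.
int NativeAdd(int a, int b) { return a + b; }

// What an inlined call site conceptually does: no separate IL stub,
// just the two helper calls around the direct native call.
int CallNativeAddInlined(int a, int b) {
    Frame frame;
    PInvokeBegin(&frame);
    int result = NativeAdd(a, b);
    PInvokeEnd(&frame);
    return result;
}
```

The point of the sketch is only the shape of the call site: when the arguments are blittable, the caller can pass them through unchanged, so the work left over is the frame bookkeeping and the GC mode switch.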

@fadimounir fadimounir added the * NO MERGE * label Feb 13, 2019
@fadimounir fadimounir removed the * NO MERGE * label Feb 27, 2019
@fadimounir fadimounir changed the title from "[WIP] Enable some pinvokes - Do not merge" to "Enable R2R compilation/inlining of PInvoke stubs where no marshalling is required" Feb 27, 2019
@fadimounir
Author

@jkotas PTAL. I'm still going to run the P0 tests with crossgen enabled, for verification, and will get some perf measurements.

@fadimounir
Author

@dotnet-bot test Windows_NT x64 Checked CoreFX Tests

@fadimounir
Author

@jkotas PTAL at the new changes I submitted.

@jkotas
Member

jkotas commented Mar 11, 2019

The delta looks reasonable to me. Have you done any R2R specific testing on this?

@fadimounir
Author

I have run the P0 tests using the 'crossgen' command as described in this doc: https://github.com/dotnet/coreclr/blob/master/Documentation/building/windows-test-instructions.md
Results were clean.

@jkotas
Member

jkotas commented Mar 11, 2019

These helpers have tight interaction with the GC. I would also do some crossgen+GC stress testing (with tiered compilation disabled).

@fadimounir
Author

Sounds good. I'll look into it.

@fadimounir
Author

@jkotas crossgen testing with and without GC stress was clean with regard to these changes (x64 only). For the other architectures, my targeted pinvoke test case had some form of GC stress enabled and was passing.

@fadimounir
Author

I'm still waiting on the perf job to complete to see what the impact of the changes is.

@fadimounir
Author

/cc @sergiy-k

@fadimounir
Author

fadimounir commented Mar 21, 2019

Hmm... It doesn't look like we're getting the noticeable startup perf wins I expected: http://benchview/compare?jobid=158586&comparejobids=[158569]&testid=61944&
Some scenarios actually seem slower now, so I'll need to dig in further. I do see some good wins under the "Inlining" category here, with 11% faster execution. CscBench is also 1% faster. There are just some tests, mainly under BenchmarksGame and Benchstone, that seem slightly slower. Could it be noise?

@AndyAyersMS, @jkotas what do you guys think?

/cc @brianrob

@jkotas
Member

jkotas commented Mar 28, 2019

This comment is closed, but I do not see a response to it. Just want to make sure you have seen it:

> Another option is to move the popping of the frame on the slow path into the C helper. If you do that, the need for this macro will disappear and you will have a bit less assembly code to maintain, which is always goodness.

@fadimounir
Author

fadimounir commented Mar 28, 2019

> Where does Linq use PInvokes to explain this gain?

Without tiered compilation, the previous lab results were showing a 20% regression for some weird reason, even though Linq shouldn't really be impacted by pinvokes. I just wanted to dig deeper into that regression, and make sure it was bogus.

@fadimounir
Author

> Another option is to move the popping of the frame on the slow path into the C helper

Can this be done in the same JIT_RareDisableHelper method or should I add a wrapper for it? I don't know what else uses this helper, and if popping the frame from the thread at that location would have other side effects.

@fadimounir
Author

> How many of these methods are PInvoke stubs? It would be useful to get the list and see how many of them are easy to convert to blittable PInvokes as follow up.

After a second look, I realized that the baseline measurement may also have been a partial R2R image, which would explain why it shows more jitting. However, for a helloworld scenario, I can confirm by debugging that about 5 or 6 pinvokes are getting inlined and invoked (JIT_PInvokeBegin/End called).

@jkotas
Member

jkotas commented Mar 28, 2019

> Can this be done in the same JIT_RareDisableHelper method

It should be a separate method. I would copy and paste the code for JIT_RareDisableHelper and add the extra piece to it.
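
As a rough illustration of the suggestion above, the sketch below shows a separate slow-path helper that repeats the work of an existing rare-disable helper and additionally pops the frame, so the calling stub does not need its own frame-popping code on the slow path. All names here (`Thread`, `Frame`, `WaitForSuspensionToFinish`, `RareDisableHelper`, `PInvokeEndRareHelper`, `PInvokeEnd`) are hypothetical and simplified; this is not the code that was merged.

```cpp
// Hypothetical, simplified stand-ins. These are NOT the CoreCLR definitions.
struct Frame {
    Frame* next;  // previous transition frame on this thread
};

struct Thread {
    Frame* topFrame;        // chain of transition frames
    bool suspendRequested;  // set when a GC wants this thread to stop
    int waits;              // counts how often the slow path ran
};

// Stand-in for blocking until the pending suspension has completed.
void WaitForSuspensionToFinish(Thread* t) {
    ++t->waits;
    t->suspendRequested = false;
}

// Existing-style helper, shared with other callers: it only handles the
// suspension and must not touch the frame chain.
void RareDisableHelper(Thread* t) {
    WaitForSuspensionToFinish(t);
}

// Separate helper for the PInvoke end path: same suspension handling,
// plus the extra piece of popping the frame.
void PInvokeEndRareHelper(Thread* t, Frame* frame) {
    WaitForSuspensionToFinish(t);
    t->topFrame = frame->next;
}

// The stub: the fast path pops the frame inline; the slow path delegates
// everything, including the pop, to the helper.
void PInvokeEnd(Thread* t, Frame* frame) {
    if (t->suspendRequested) {
        PInvokeEndRareHelper(t, frame);
        return;
    }
    t->topFrame = frame->next;
}
```

Keeping the new helper separate means the shared helper's behaviour is unchanged for any other callers, which is the concern raised in the question.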

fadimounir added 10 commits April 1, 2019 08:40
Increase size reserve for InlineCallFrame
Small adjustment to the arm/arm64 INLINE_GET_THREAD macros
@fadimounir fadimounir merged commit bc9248c into dotnet:master Apr 1, 2019
@AndyAyersMS
Member

> Hmm... Doesn't look like we're getting noticeable startup perf wins I expected

Not too surprising; the jit-focused CoreCLR perf tests do not measure startup (or jit time, for the most part). Using ETW to look at jit time and jit requests (or using scenario startup metrics) is a better way to assess this.

Is there a follow-up plan to enable this for non-windows platforms?

@fadimounir
Author

> Is there a follow-up plan to enable this for non-windows platforms?

Yes. I'm currently working on it and will create a separate PR.

@fadimounir fadimounir deleted the enable_some_pinvokes branch July 1, 2019 19:46
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
… is required (dotnet/coreclr#22560)

* These changes enable the inlining of some PInvokes that do not require any marshalling. With inlined pinvokes, R2R performance should become slightly better, since we'll avoid jitting some of the pinvoke IL stubs that we jit today for S.P.CoreLib. Performance gains not yet measured.

* Added JIT_PInvokeBegin/End helpers for all architectures. Linux stubs not yet implemented
* Add INLINE_GETTHREAD for arm/arm64
* Set CORJIT_FLAG_USE_PINVOKE_HELPERS jit flag for ReadyToRun compilations
* Updating R2RDump tool to handle pinvokes


Commit migrated from dotnet/coreclr@bc9248c