Enable R2R compilation/inlining of PInvoke stubs where no marshalling is required #22560
Force-pushed from 8e27394 to 776cf97.
@jkotas PTAL. I'm still going to run the P0 tests with crossgen enabled, for verification, and will get some perf measurements.
@dotnet-bot test Windows_NT x64 Checked CoreFX Tests
@jkotas PTAL at the new changes I submitted.
The delta looks reasonable to me. Have you done any R2R-specific testing on this?
I have run the P0 tests using the 'crossgen' command as described in this doc: https://github.com/dotnet/coreclr/blob/master/Documentation/building/windows-test-instructions.md
These helpers have tight interaction with the GC. I would also do some crossgen+GC stress testing (with tiered compilation disabled).
Sounds good. I'll look into it.
@jkotas crossgen testing with and without GC stress was clean with regard to these changes (x64 only). For the other architectures, my targeted pinvoke test case had some form of GC stress enabled, and was passing.
I'm still waiting on the perf job to complete to see what the impact of the changes is.
/cc @sergiy-k
Force-pushed from cdc492b to 76a9115.
Hmm... It doesn't look like we're getting the noticeable startup perf wins I expected: http://benchview/compare?jobid=158586&comparejobids=[158569]&testid=61944& @AndyAyersMS, @jkotas what do you guys think? /cc @brianrob
Force-pushed from 76a9115 to 457d79c.
This comment is closed, but I do not see a response to it. Just want to make sure you have seen it:
Without tiered compilation, the previous lab results were showing a 20% regression for some weird reason, even though Linq shouldn't really be impacted by pinvokes. I just wanted to dig deeper into that regression, and make sure it was bogus.
Can this be done in the same JIT_RareDisableHelper method or should I add a wrapper for it? I don't know what else uses this helper, or whether popping the frame from the thread at that location would have other side effects.
After a second look, I just realized that the baseline measurement may have also been a partial R2R image, which would explain why it has more jitting. However, for a helloworld scenario, I can confirm by debugging that there are about 5 or 6 pinvokes getting inlined and invoked (JIT_PInvokeBegin/End called).
It should be a separate method. I would copy&paste the code for
…e any marshalling. With inlined pinvokes, R2R performance should become slightly better, since we'll avoid jitting some of the pinvoke IL stubs that we jit today for S.P.CoreLib. Performance gains not yet measured.
Added JIT_PInvokeBegin/End helpers for all architectures. Linux stubs not yet implemented
Add INLINE_GETTHREAD for arm/arm64
Set CORJIT_FLAG_USE_PINVOKE_HELPERS jit flag for ReadyToRun compilations
Increase size reserve for InlineCallFrame
Small adjustment to the arm/arm64 INLINE_GET_THREAD macros
Force-pushed from 857a3f0 to b62dcbe.
Not too surprising; the jit-focused CoreCLR perf tests do not measure startup (or jit time, for the most part). Using ETW to look at jit time and jit requests (or using scenario startup metrics) is a better way to assess this. Is there a follow-up plan to enable this for non-Windows platforms?
Yes. I'm currently working on it and will create a separate PR.
… is required (dotnet/coreclr#22560)
* These changes enable the inlining of some PInvokes that do not require any marshalling. With inlined pinvokes, R2R performance should become slightly better, since we'll avoid jitting some of the pinvoke IL stubs that we jit today for S.P.CoreLib. Performance gains not yet measured.
* Added JIT_PInvokeBegin/End helpers for all architectures. Linux stubs not yet implemented
* Add INLINE_GETTHREAD for arm/arm64
* Set CORJIT_FLAG_USE_PINVOKE_HELPERS jit flag for ReadyToRun compilations
* Updating R2RDump tool to handle pinvokes
Commit migrated from dotnet/coreclr@bc9248c
These changes enable the inlining of some PInvokes that do not require any marshalling. With inlined pinvokes, R2R performance should become slightly better, since we'll avoid jitting some of the pinvoke IL stubs that we jit today for S.P.CoreLib. Performance gains not yet measured.
Added JIT_PInvokeBegin/End helpers for all architectures. Linux stubs not yet implemented
Add INLINE_GETTHREAD for arm/arm64
Set CORJIT_FLAG_USE_PINVOKE_HELPERS jit flag for ReadyToRun compilations
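
For readers unfamiliar with what an inlined PInvoke transition involves, here is a rough, self-contained C++ model of the begin/end pattern described above. It is only a sketch under stated assumptions, not the CoreCLR implementation: `Frame`, `Thread`, `pinvoke_begin`, `pinvoke_end`, `rare_disable`, `g_gc_suspend_pending`, and `native_add` are all hypothetical stand-ins, and the real JIT_PInvokeBegin/End helpers are architecture-specific and track considerably more state.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical, heavily simplified model. None of these types match the
// runtime's real definitions; they only illustrate the shape of the transition.
struct Frame {
    Frame* next = nullptr;               // link to the previously pushed frame
};

struct Thread {
    Frame* frame_chain = nullptr;        // chain of transition frames for stack walks
    bool preemptive_gc_enabled = false;  // false = cooperative mode (running managed code)
    int rare_path_hits = 0;              // counts slow-path entries, for illustration only
};

// Assumption: a global flag the GC would set when it wants threads to stop.
std::atomic<bool> g_gc_suspend_pending{false};

thread_local Thread t_thread;
Thread* get_thread() { return &t_thread; }

// Model of the "begin" helper: publish the frame, then allow the GC to run
// while this thread is in native code.
void pinvoke_begin(Frame* frame) {
    Thread* t = get_thread();
    frame->next = t->frame_chain;
    t->frame_chain = frame;
    t->preemptive_gc_enabled = true;
}

// Model of the slow path. A real runtime would block here until the GC
// finishes; this sketch only records that the path was taken.
void rare_disable(Thread* t) { ++t->rare_path_hits; }

// Model of the "end" helper: return to cooperative mode, take the slow path
// if a suspension is pending, then pop the frame.
void pinvoke_end(Frame* frame) {
    Thread* t = get_thread();
    t->preemptive_gc_enabled = false;
    if (g_gc_suspend_pending.load()) {
        rare_disable(t);
    }
    t->frame_chain = frame->next;
}

// Stand-in for a native function whose arguments need no marshalling.
extern "C" int native_add(int a, int b) { return a + b; }

// What an inlined call site looks like in this model: no separate stub,
// just begin / direct call / end around the native target.
int call_native_add_inlined(int a, int b) {
    Frame frame;
    pinvoke_begin(&frame);
    int result = native_add(a, b);
    pinvoke_end(&frame);
    return result;
}
```

In this model, calling `call_native_add_inlined(2, 3)` returns 5 and leaves the thread's frame chain empty and the thread back in cooperative mode. The slow path is only entered when `g_gc_suspend_pending` is set, which is the kind of GC interaction that the crossgen+GC stress testing requested above is meant to exercise.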