-
Notifications
You must be signed in to change notification settings - Fork 5k
Create a single copy of stub templates #114462
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Tagging subscribers to this area: @mangod9 |
We can classify them based on binary format main...am11:runtime:binfmt and use: |
commit 40f2a6ac4c7393294d794f55aa9206f8eb815b5c Author: David Wrighton <davidwr@microsoft.com> Date: Tue Apr 1 15:16:13 2025 -0700 Build out ExecutableAllocator api for template mapping commit dd22e4213563461948e702f035c0558784c3f767 Author: David Wrighton <davidwr@microsoft.com> Date: Tue Apr 1 13:07:42 2025 -0700 X64 template code gen is correct commit 89fb8b6 Author: David Wrighton <davidwr@microsoft.com> Date: Mon Mar 31 14:00:23 2025 -0700 Add support for Amd64 thunk pages
…oreclr template in ways that are unsafe.
9d37418
to
53b3ce9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot reviewed 12 out of 15 changed files in this pull request and generated no comments.
Files not reviewed (3)
- src/coreclr/clrdefinitions.cmake: Language not supported
- src/coreclr/vm/amd64/thunktemplates.S: Language not supported
- src/coreclr/vm/arm64/thunktemplates.S: Language not supported
Comments suppressed due to low confidence (2)
src/coreclr/vm/precode.cpp:527
- [nitpick] Consider replacing the magic constant 0x4000 with a named constant (e.g. kExpectedTemplatePageSize) to improve code clarity and maintainability.
if (StubPrecodeCodeTemplate != NULL && pageSize != 0x4000)
src/coreclr/vm/precode.cpp:674
- [nitpick] Consider replacing the magic constant 0x4000 with a named constant to enhance clarity and maintainability in the template usage check.
if (FixupPrecodeCodeTemplate != NULL && pageSize != 0x4000)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW I checked the Apple specific code paths and they look reasonable.
I skimmed over the Linux parts too; they are similar in principle to the code used in NativeAOT before the switch to vm_remap
. I didn't review them in detail though.
There's probably no point of doing this on Windows. However, for additional context there's a closed PR that implements it on NativeAOT - #113160. The problem is that if dynamic code is prohibited by a process-wide policy this stops working due to the remapped thunks not being valid control-flow guard targets (and |
@filipnavara well, there are actually some practical reasons to want to support it on Windows. Notably, typical L2/L3 caches on processors are pretty much always PIPT, so by using a single copy of the stub templates, applications with many methods in the hot loop may use less of their precious cache space on these stubs. It's not likely this is super important, but it's not a meaningless improvement. However, incompatibility with CFG makes this impractical, since that is a baseline requirement for many, many important .NET customers, in particular, most of the customers with the really large applications that would benefit most. |
I didn't mean that there are no reasons to do it. I just meant that it's not feasible/practical given the restrictions imposed by the system. We tried in the past and failed (#113114 (comment)). It's possible that we missed something but it also feels that the deadend we hit is a deliberate system restriction. I noticed that you had a branch attempting to do it so I thought it's reasonable to warn you before you hit the same trap. We made the mapping part working on a system that's not locked down but that's going to be increasingly less useful scenario. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks good, I just have a couple of nits / questions.
{ | ||
locals.data.fdImage = fd; | ||
locals.data.offsetInFileOfStartOfSection = 0; | ||
locals.data.addrOfStartOfSection = (void*)0x10000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure why the start addr of the section in the case we use the shared memory file is 0x10000.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a comment. Let me know if you're ok with it.
vm_prot_t prot, max_prot; | ||
kern_return_t ret; | ||
|
||
// Allocate two contiguous ranges of memory: the first range will contain the trampolines |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A nit - trampolines?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed the working to stubs
{ | ||
if (IsDoubleMappingEnabled() && VMToOSInterface::AllocateThunksFromTemplateRespectsStartAddress()) | ||
{ | ||
CRITSEC_Holder csh(m_CriticalSection); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we take the critical section here when the ReleaseWorker does it too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh. Weird. I see no reason I did that. critical section removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you!
…net#115665) This reverts commit f7fc178.
- This logic always assumed that the lowest address of a mapped file was its "base address", but in the presence of dotnet/runtime#114462 that is no longer correct. We may map stubs at lower addresses. - Fix by checking the start of the mapped region for the magic value for the start of a MachO binary
Instead of generating stub templates by running the code page generation function for each stub code region that is generated, map a single file into memory in multiple locations instead. This logic is architected so that it can fall back to the previous model of using the double mapping logic to generate the stubs.
We implement 3 schemes for finding the to use as mapping data.
dladdr
anddl_iterate_phdr
functionality to find the file name of the coreclr binary and the section that contains it, and then we can use mmap to map from the file directly into memory. This mode is disabled by default as it doesn't support all scenarios, but it does allow for testing of the majority of the logic used on Apple platforms on a Linux machine.