Skip to content

Create a single copy of stub templates #114462

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
May 7, 2025

Conversation

davidwrighton
Copy link
Member

@davidwrighton davidwrighton commented Apr 9, 2025

Instead of generating stub templates by running the code page generation function for each stub code region that is generated, map a single file into memory in multiple locations instead. This logic is architected so that it can fall back to the previous model of using the double mapping logic to generate the stubs.

We implement 3 schemes for finding the to use as mapping data.

  1. On apple platforms, we compile the stubs into the coreclr Mach-O file, and use the virtual memory manager support for mapping memory from the coreclr binary into various locations on the heap.
  2. On Linux platforms, we can compile the stubs into the coreclr ELF file, and then use the dladdr and dl_iterate_phdr functionality to find the file name of the coreclr binary and the section that contains it, and then we can use mmap to map from the file directly into memory. This mode is disabled by default as it doesn't support all scenarios, but it does allow for testing of the majority of the logic used on Apple platforms on a Linux machine.
  3. On Unix platforms that are not Linux or Linux when mode 2 is not enabled, we use an anonymous in memory file just like our normal executable allocation path.
  4. On Windows platforms, we currently do not yet support this mode.

Copy link
Contributor

Tagging subscribers to this area: @mangod9
See info in area-owners.md if you want to be subscribed.

@am11
Copy link
Member

am11 commented Apr 10, 2025

3. On Unix platforms that are not Linux,

We can classify them based on binary format main...am11:runtime:binfmt and use: #ifdef MINIPAL_TARGET_ELF to treat all ELF platforms alike.

commit 40f2a6ac4c7393294d794f55aa9206f8eb815b5c
Author: David Wrighton <davidwr@microsoft.com>
Date:   Tue Apr 1 15:16:13 2025 -0700

    Build out ExecutableAllocator api for template mapping

commit dd22e4213563461948e702f035c0558784c3f767
Author: David Wrighton <davidwr@microsoft.com>
Date:   Tue Apr 1 13:07:42 2025 -0700

    X64 template code gen is correct

commit 89fb8b6
Author: David Wrighton <davidwr@microsoft.com>
Date:   Mon Mar 31 14:00:23 2025 -0700

    Add support for Amd64 thunk pages
@davidwrighton davidwrighton marked this pull request as ready for review April 16, 2025 17:26
@Copilot Copilot AI review requested due to automatic review settings April 16, 2025 17:26
@davidwrighton davidwrighton changed the title [DRAFT] Create a single copy of stub templates Create a single copy of stub templates Apr 16, 2025
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 12 out of 15 changed files in this pull request and generated no comments.

Files not reviewed (3)
  • src/coreclr/clrdefinitions.cmake: Language not supported
  • src/coreclr/vm/amd64/thunktemplates.S: Language not supported
  • src/coreclr/vm/arm64/thunktemplates.S: Language not supported
Comments suppressed due to low confidence (2)

src/coreclr/vm/precode.cpp:527

  • [nitpick] Consider replacing the magic constant 0x4000 with a named constant (e.g. kExpectedTemplatePageSize) to improve code clarity and maintainability.
if (StubPrecodeCodeTemplate != NULL && pageSize != 0x4000)

src/coreclr/vm/precode.cpp:674

  • [nitpick] Consider replacing the magic constant 0x4000 with a named constant to enhance clarity and maintainability in the template usage check.
if (FixupPrecodeCodeTemplate != NULL && pageSize != 0x4000)

Copy link
Member

@filipnavara filipnavara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I checked the Apple specific code paths and they look reasonable.

I skimmed over the Linux parts too; they are similar in principle to the code used in NativeAOT before the switch to vm_remap. I didn't review them in detail though.

@filipnavara
Copy link
Member

4. On Windows platforms, we currently do not yet support this mode.

There's probably no point of doing this on Windows. However, for additional context there's a closed PR that implements it on NativeAOT - #113160. The problem is that if dynamic code is prohibited by a process-wide policy this stops working due to the remapped thunks not being valid control-flow guard targets (and SetProcessValidCallTargets is also prohibited in this mode).

@davidwrighton
Copy link
Member Author

@filipnavara well, there are actually some practical reasons to want to support it on Windows. Notably, typical L2/L3 caches on processors are pretty much always PIPT, so by using a single copy of the stub templates, applications with many methods in the hot loop may use less of their precious cache space on these stubs. It's not likely this is super important, but it's not a meaningless improvement. However, incompatibility with CFG makes this impractical, since that is a baseline requirement for many, many important .NET customers, in particular, most of the customers with the really large applications that would benefit most.

@filipnavara
Copy link
Member

filipnavara commented Apr 28, 2025

there are actually some practical reasons to want to support it on Windows.

I didn't mean that there are no reasons to do it. I just meant that it's not feasible/practical given the restrictions imposed by the system. We tried in the past and failed (#113114 (comment)). It's possible that we missed something but it also feels that the deadend we hit is a deliberate system restriction.

I noticed that you had a branch attempting to do it so I thought it's reasonable to warn you before you hit the same trap. We made the mapping part working on a system that's not locked down but that's going to be increasingly less useful scenario.

Copy link
Member

@janvorli janvorli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good, I just have a couple of nits / questions.

{
locals.data.fdImage = fd;
locals.data.offsetInFileOfStartOfSection = 0;
locals.data.addrOfStartOfSection = (void*)0x10000;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure why the start addr of the section in the case we use the shared memory file is 0x10000.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a comment. Let me know if you're ok with it.

vm_prot_t prot, max_prot;
kern_return_t ret;

// Allocate two contiguous ranges of memory: the first range will contain the trampolines
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A nit - trampolines?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed the working to stubs

{
if (IsDoubleMappingEnabled() && VMToOSInterface::AllocateThunksFromTemplateRespectsStartAddress())
{
CRITSEC_Holder csh(m_CriticalSection);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we take the critical section here when the ReleaseWorker does it too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh. Weird. I see no reason I did that. critical section removed.

Copy link
Member

@janvorli janvorli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!

@davidwrighton davidwrighton merged commit ac760bf into dotnet:main May 7, 2025
95 checks passed
thaystg added a commit that referenced this pull request May 16, 2025
hoyosjs pushed a commit that referenced this pull request May 17, 2025
davidwrighton added a commit to davidwrighton/runtime that referenced this pull request May 19, 2025
davidwrighton added a commit that referenced this pull request May 20, 2025
* Reapply "Create a single copy of stub templates (#114462)" (#115665)

This reverts commit f7fc178.

* Disable feature on Apple platforms for now

Co-authored-by: Tom McDonald <tommcdon@microsoft.com>
hoyosjs pushed a commit to dotnet/diagnostics that referenced this pull request May 22, 2025
- This logic always assumed that the lowest address of a mapped file was
its "base address", but in the presence of dotnet/runtime#114462 that is
no longer correct. We may map stubs at lower addresses.
- Fix by checking the start of the mapped region for the magic value for
the start of a MachO binary
SimaTian pushed a commit that referenced this pull request May 27, 2025
* Reapply "Create a single copy of stub templates (#114462)" (#115665)

This reverts commit f7fc178.

* Disable feature on Apple platforms for now

Co-authored-by: Tom McDonald <tommcdon@microsoft.com>
@github-actions github-actions bot locked and limited conversation to collaborators Jun 7, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants