Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create a managed implementation of assembly binder #91400

Draft
wants to merge 143 commits into
base: main
Choose a base branch
from

Conversation

huoyaoyuan
Copy link
Member

#85558 (comment)

It should be possible to move the whole assembly loader to C#. We would need a special unmanaged path to load CoreLib to bootstrap, but the rest can be managed. It is too big to do it all at once. I think a good place to start with this refactoring would be BINDER_SPACE::*.

I've ported the types under BINDER_SPACE::, and a large portion of binding logic in assemblybindercommon.cpp. I also created a disabled feature switch to guard the code.

The logic is ported closely from C++ and aims line-to-line match, to keep the behavior. Initially I want to convert all the HRESULTs to exceptions, but it's not practical to use exception in control logic of binder, so I kept HRESULTs in AssemblyBinderCommon.

/cc @jkotas

Is this in the desired direction? I haven't do anything around managed/unmanaged boundary. I think code should be ported to managed, until something we don't want.

@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Aug 31, 2023
@ghost
Copy link

ghost commented Aug 31, 2023

Tagging subscribers to this area: @vitek-karas, @agocke, @VSadov
See info in area-owners.md if you want to be subscribed.

Issue Details

#85558 (comment)

It should be possible to move the whole assembly loader to C#. We would need a special unmanaged path to load CoreLib to bootstrap, but the rest can be managed. It is too big to do it all at once. I think a good place to start with this refactoring would be BINDER_SPACE::*.

I've ported the types under BINDER_SPACE::, and a large portion of binding logic in assemblybindercommon.cpp. I also created a disabled feature switch to guard the code.

The logic is ported closely from C++ and aims line-to-line match, to keep the behavior. Initially I want to convert all the HRESULTs to exceptions, but it's not practical to use exception in control logic of binder, so I kept HRESULTs in AssemblyBinderCommon.

/cc @jkotas

Is this in the desired direction? I haven't do anything around managed/unmanaged boundary. I think code should be ported to managed, until something we don't want.

Author: huoyaoyuan
Assignees: -
Labels:

area-AssemblyLoader-coreclr

Milestone: -

@jkotas
Copy link
Member

jkotas commented Aug 31, 2023

This is good raw material. Creating a clean managed/unmanaged boundary is probably going to be the harder part. Some thoughts:

  • The unmanaged binder calls out to managed to log tracing messages and to fire resolve events. This back and forth between managed and unmanaged should go away, the unmanaged code should just do a single call to managed to do everything.
  • The binding using TPA list should be folded into DefaultAssemblyLoadContext, no need for it to live in separate type.
  • The managed equivalent of TextualIdentityParser exists as System.Reflection.AssemblyNameFormatter. We should just use the managed AssemblyNameFormatter instead of duplicating it as TextualIdentifyParser.
  • TextualIdentifyParser may be used by debugger or in places that are not able to run managed code. If you run into situations like this, leave it alone in C++ for now.
  • We avoid C# record in core libraries. C# record generates too much hidden code bloat that the trimmer is not able to delete.
  • I am not sure what is the purpose of the binding failure HRESULT error cache. The VM caches the binding failures with the exception details (look for AppDomain::AddExceptionToCache). I would think that the binding failure HRESULT error cache should be unnecessary.

int* dwPAFlags = stackalloc int[2];
using IMdInternalImport pIMetaDataAssemblyImport = BinderAcquireImport(pPEImage, dwPAFlags);

Architecture = AssemblyBinderCommon.TranslatePEToArchitectureType(dwPAFlags);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The managed binder should not need to deal with image architectures. The unmanaged runtime should validate very early whether the image is good for current architecture (and reject it if it is not). By the time we get to the managed binder, we should know that the image architecture is good.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started with focusing the data structures, and did not look at the caller. Leaving the cleanup later should be fine, since cleaning managed code is easy.

@vitek-karas
Copy link
Member

@elinor-fung FYI

@huoyaoyuan
Copy link
Member Author

Good points for the behaviors that we don't need in managed binder. I'm aiming for a working POC first, and not tweaking yet. Records are just simplification since I don't want to write GetHashCode etc. first.

Rough thought of steps:

  • Explore the scope of managed binder, create shape for things to port, and use placeholder for boundary and complex functions.
  • Fulfill implementation and boundary incrementally, to get the POC working.
  • Create feature switch guard for unmanaged code replaced, and filter out things we can't remove, for example which used by debugger.
  • Tweak boundary and optimise managed code.
  • Turn on the feature switch eventually.

Questions:

  • Should I create a tracking issue for this? This is likely not a single-PR work.
  • Should we get some code merged with feature switch disabled first? How complete should it be?
  • Will this affect bootstrapping on new platform/architecture?

@lambdageek
Copy link
Member

@jkotas More for my own education, rather than as a comment on the details of this PR: how does a managed assembly binder interact with class loading/initialization in coreclr? In Mono we have scenarios where constructing a class triggers assembly loading while we're holding the global loader lock - which can run managed loader events which can lead to deadlocks (#72796) . My understanding is that CoreCLR class loading is less atomic - relying on load levels and individual class locks. But I was under the impression that there are still scenarios where managed callbacks fire while a native lock is held. Does moving the assembly binder to managed code affect those scenarios?

Also what happens if you trigger class initialization from the managed assembly binder? One way I could confuse mono was by doing a Console.WriteLine in the AppDomain.AssemblyLoad event handler before calling WriteLine (#51881) - which triggers loading of System.Console.dll which triggers a recursive class initialization of Console

It's possible that the interaction between the assembly binder and class initialization is more well behaved in CoreCLR, but I'm curious if porting the binder to managed has any impact.

@jkotas
Copy link
Member

jkotas commented Jan 22, 2024

Right, assembly loader needs to be able to call AssemblyLoadContext callbacks and AppDomain events that are managed code. Rewriting more of the assembly loader in managed code means that there is going to be more managed code running, but it does not change any fundamental invariants.

I was under the impression that there are still scenarios where managed callbacks fire while a native lock is held. Does moving the assembly binder to managed code affect those scenarios?

CoreCLR class loading/initialization has lock per type. (The implementation of this lock is being rewritten in #96653 to be more scalable and performant.) Triggering assembly loading or more type loading from type loading works fine, as long as there is no recursive loading of the same type.

class initialization

CoreCLR does not trigger class initializers during type or assembly loader. The class initializers are triggered only once the code runs. We are out of the assembly and type loader by that time.

@agocke agocke marked this pull request as draft February 27, 2024 18:55
@agocke
Copy link
Member

agocke commented Feb 27, 2024

Moved to draft while we figure out if we can either fix the perf regressions, or accept them.

@huoyaoyuan
Copy link
Member Author

huoyaoyuan commented Mar 10, 2024

I've done some measurement about working set:

It's about 3.5MB of total regression comparing to main for a given run. Break down by vmmap:
Heap (unmanaged?): +0.8MB
Image: +1.5MB. The working set of CoreLib, coreclr, clrjit and icu.dll has all increased.
Managed: vmmap can't show this category of PR build, but successes for main branch build. It's about 1.5MB for main.
Mapped file: +0.6MB for icudtl.dat
Page table: -0.3MB
Private data: +276KB, a block of R/W data
Sharable: +2MB. Not sure what it represents.

So one obvious conclusion is that some globalization related code are touched unintentionally.

Setting globalization invariant mode doesn't change the regression though.

Copy link
Contributor

@ANahr ANahr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are multiple cases in the code that might trigger globalization code

Copy link
Contributor

Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it.

@huoyaoyuan huoyaoyuan reopened this Apr 12, 2024
@huoyaoyuan
Copy link
Member Author

Keep open since it's functionally complete.

Copy link
Contributor

Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it.

@huoyaoyuan huoyaoyuan reopened this May 12, 2024
@huoyaoyuan
Copy link
Member Author

Status report:

Working Set:

About 15.2 MB -> 16.3 MB under default configuration.
The difference is mainly +0.6MB in mapped image of CoreLib, and +0.3MB of managed heap, +0.2MB of sharable page.
There is reduction of mapped image of coreclr.dll, but the difference is much smaller.

Start time:

Measured with Measure-Command in PowerShell as an external source. The startup time range shifts from 23.6 ~ 24.8ms to 25.3 ~ 26.7ms. The regression is clearly measurable.

File size:

-39KB in coreclr.dll, +92KB in CoreLib (R2R).

There are still some space of dead code for optimization, but I kept this PR as close to native code as possible to help reviewing.
Cleanup around managed->native calls may also improve performance in the future.

Comment on lines +171 to +176
if (outPath.EndsWith(".ni.dll", StringComparison.OrdinalIgnoreCase)
|| outPath.EndsWith(".ni.exe", StringComparison.OrdinalIgnoreCase))
{
simpleName = outPath[iSimpleNameStart..^7];
isNativeImage = true;
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is ni.dll still a thing? I remember it's output of NGen.

Even if it's still a thing, we should probably remove this too because the shared framework (TPA) does not contain any ni files.
The performance difference is negligible though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-AssemblyLoader-coreclr community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants