Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AssemblyLoadContext: requiring full cooperation to stay "inside" is fundamentally flawed #45285

Closed
jazzdelightsme opened this issue Nov 27, 2020 · 20 comments

Comments

@jazzdelightsme
Copy link
Contributor

Description

My overall scenario is that there is a native app, and I want to use .NET 5 C# to write a plugin for that app (see also: #1633). I have tried porting to .NET Core before, but each time I've run into significant problems. I just made another attempt, and I was excited to try to use the ability of AssemblyLoadContext to be unloaded, so that my plugin would have the same behavior as it already does in full .NET, where it can be unloaded (although I think it will require use of a shim DLL that does always stay loaded, because I can't specify an ALC to start in from the native side). But I ran into a significant problem even getting my code to work correctly, and I realize this will also completely preclude me from using collectible ALCs in the future as well.

I followed the hostfxr sample to write code to boot up the CLR and then use load_assembly_and_get_function_pointer_fn to get a pointer I can call to run my C# code. I learned that will load my code in an isolated context (“Calling this function will load the specified assembly in isolation”).

And what's wrong with that? Being in an IsolatedComponentLoadContext doesn't seem like such a bad thing, right?

But my code was malfunctioning, in a baffling way, until @jkotas helped me realize that my code was being loaded multiple times, in different ALCs. My plugin hosts the PowerShell runtime (7.1, built on .NET 5), and in PowerShell, when you load a [binary] module (Import-Module foo.dll), it just uses Assembly.LoadFrom("foo.dll") to load it... and Assembly.LoadFrom uses AssemblyLoadContext.Default to load into. So without trying, my code accidentally "escaped" the "isolated" ALC it had been originally loaded into (and hilarity ensued).

So: the current design of .NET/ALCs is that if you want a body of code to run in a separate, non-default ALC, then all that code must be purpose-written to stay in that ALC. Staying "inside" the "current" ALC is not something that happens by default; you have to do something special to not accidentally "escape" into the default ALC.

This is a deep, fundamental flaw.

Because as the amount of code that you want to run in a separate ALC grows, the chance of depending on some code that accidentally falls out of the ALC approaches 1. This is a serious crack in the foundation. As you put more code on top of it, the foundation will fail. Please change the design of loading/ALCs to make "staying in the current" ALC the default!

See also:

#41625: AssemblyLoadContext does not provide proper isolation of types
#598: Add support to specify the ALC in which to generate a dynamic assembly.

#13472: There's no way to call native library resolution on a specific AssemblyLoadContext (comment):

"However, isolation & dynamic loading is (almost) useless if it relies on libraries being written in a special way to make them "isolation-friendly" or "dynamic-load-friendly": Even if you write an assembly specifically to be isolated, chances are that you'll have at least one NuGet dependency that hasn't been written specially to be isolated."

#29842: WIP API Proposal - AssemblyBuilder.DefineDynamicAssembly for AssemblyLoadContext

#32851: Direct assemblies away from IsolatedComponentLoadContext towards a single AssemblyLoadContext (in which someone realized they had to do something special to get everything in the same ALC)

(and there are probably more, but I got tired of looking through all the Issues)

Regression?

Yes; I have code on full .NET Framework that uses AppDomains and nothing ever gets accidentally loaded in some other AppDomain.

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-AssemblyLoader-coreclr untriaged New issue has not been triaged by the area owner labels Nov 27, 2020
@ghost
Copy link

ghost commented Nov 27, 2020

Tagging subscribers to this area: @vitek-karas, @agocke, @CoffeeFlux
See info in area-owners.md if you want to be subscribed.

Issue Details

Description

My overall scenario is that there is a native app, and I want to use .NET 5 C# to write a plugin for that app (see also: #1633). I have tried porting to .NET Core before, but each time I've run into significant problems. I just made another attempt, and I was excited to try to use the ability of AssemblyLoadContext to be unloaded, so that my plugin would have the same behavior as it already does in full .NET, where it can be unloaded (although I think it will require use of a shim DLL that does always stay loaded, because I can't specify an ALC to start in from the native side). But I ran into a significant problem even getting my code to work correctly, and I realize this will also completely preclude me from using collectible ALCs in the future as well.

I followed the hostfxr sample to write code to boot up the CLR and then use load_assembly_and_get_function_pointer_fn to get a pointer I can call to run my C# code. I learned that will load my code in an isolated context (“Calling this function will load the specified assembly in isolation”).

And what's wrong with that? Being in an IsolatedComponentLoadContext doesn't seem like such a bad thing, right?

But my code was malfunctioning, in a baffling way, until @jkotas helped me realize that my code was being loaded multiple times, in different ALCs. My plugin hosts the PowerShell runtime (7.1, built on .NET 5), and in PowerShell, when you load a [binary] module (Import-Module foo.dll), it just uses Assembly.LoadFrom("foo.dll") to load it... and Assembly.LoadFrom uses AssemblyLoadContext.Default to load into. So without trying, my code accidentally "escaped" the "isolated" ALC it had been originally loaded into (and hilarity ensued).

So: the current design of .NET/ALCs is that if you want a body of code to run in a separate, non-default ALC, then all that code must be purpose-written to stay in that ALC. Staying "inside" the "current" ALC is not something that happens by default; you have to do something special to not accidentally "escape" into the default ALC.

This is a deep, fundamental flaw.

Because as the amount of code that you want to run in a separate ALC grows, the chance of depending on some code that accidentally falls out of the ALC approaches 1. This is a serious crack in the foundation. As you put more code on top of it, the foundation will fail. Please change the design of loading/ALCs to make "staying in the current" ALC the default!

See also:

#41625: AssemblyLoadContext does not provide proper isolation of types
#598: Add support to specify the ALC in which to generate a dynamic assembly.

#13472: There's no way to call native library resolution on a specific AssemblyLoadContext (comment):

"However, isolation & dynamic loading is (almost) useless if it relies on libraries being written in a special way to make them "isolation-friendly" or "dynamic-load-friendly": Even if you write an assembly specifically to be isolated, chances are that you'll have at least one NuGet dependency that hasn't been written specially to be isolated."

#29842: WIP API Proposal - AssemblyBuilder.DefineDynamicAssembly for AssemblyLoadContext

#32851: Direct assemblies away from IsolatedComponentLoadContext towards a single AssemblyLoadContext (in which someone realized they had to do something special to get everything in the same ALC)

(and there are probably more, but I got tired of looking through all the Issues)

Regression?

Yes; I have code on full .NET Framework that uses AppDomains and nothing ever gets accidentally loaded in some other AppDomain.

Author: jazzdelightsme
Assignees: -
Labels:

area-AssemblyLoader-coreclr, untriaged

Milestone: -

@vitek-karas
Copy link
Member

I do agree that there's no bullet proof solution to this. The closest one can get right now is using "contextual reflection" - for example EnterContextualReflection that works on almost all APIs - Assembly.LoadFrom is an exception unfortunately.

The main reason is that LoadFrom is defined to load into the default load context. It's something we might be able to change though - either with another "contextual flag" or just outright as a "breaking change" in another major release.

Assembly.LoadFile is similar and can probably be changed similarly to LoadFrom as described above.

Basically the problem is that in .NET Framework AppDomain was a concept where the runtime provided a "current AppDomain" context for all running code. And had mechanisms to switch between app domains. In .NET Core there's no such "current context" maintained by the runtime. We kind of introduced one with the "contextual reflection" mentioned above. It might be possible to extend/improve it to work in more cases correctly.

The detailed design doc for this feature is here: https://github.com/dotnet/runtime/blob/master/docs/design/features/AssemblyLoadContext.ContextualReflection.md

Please note though that it is not going to be bullet proof - possibly ever with just AssemblyLoadContext. Any code has the ability to call AssemblyLoadContext.Default.LoadFromAssemblyPath which would (and should) always load into the default context (regardless of the contextual reflection settings).

There's also the question of unloadability. Even if the "plugin" can be fully isolated in its own load context, it still doesn't guarantee that it will be possible to unload - in .NET Core unloadability is "cooperative" in that all references to the load context (and everything in it) must be released in order to unload. Currently there's quite a few places even in the core framework which will hold onto "random" referenced for long time (or forever).

We have a user story issue to track basically this work here: #43544
I think adding the requirement to solve the Assembly.LoadFrom and Assembly.LoadFile issues to that story would make a lot of sense.

@vitek-karas
Copy link
Member

Let me add one more thing. .NET Framework had remoting - or more precisely - transparent remoting proxies. A way to make an object instance of type A to look like type B for a different app domain (given that A and B where very similar). The proxies where then used to facilitate automatic switching between app domains. .NET Core doesn't have transparent proxies (they relied on remoting which was dropped in .NET Core for several reasons). It is possible to build "proxies" like this in .NET Core if the types passed around are interfaces. We don't have this productized (we had a prototype) yet.

Currently maintaining the "context" is explicit - for the scenario in this issue that's not a big problem as the boundary is clearly defined in the native/managed split. But for full solution in all cases having some proxy solution would probably be needed.

@jazzdelightsme
Copy link
Contributor Author

Thanks Vitek!

In .NET Core there's no such "current context" maintained by the runtime.

Yes, that is definitely a problem.

Please note though that it is not going to be bullet proof - possibly ever with just AssemblyLoadContext. Any code has the ability to call AssemblyLoadContext.Default.LoadFromAssemblyPath which would (and should) always load into the default context (regardless of the contextual reflection settings).

Yes, I agree that "not bullet proof" is the correct design. Just as with AppDomains, nobody should be trying to forcibly prevent code from using a different AssemblyLoadContext; you should always be able to say AssemblyLoadContext.Default.Blah. But doing so should always be an explicit action that you take (e.g. accessing AssemblyLoadContext.Default). IOW there should be a door that you are required to open to do that. Not locked--not even a bathroom lock! But a door, so that you (or one of your thirteen nuget package houseguests) don't just accidentally wander over what currently amounts to a chalk line on the floor.

If I don't take some explicit action (such as by directly accessing AssemblyLoadContext.Default or some other ALC), then my expectation is that I will never just accidentally somehow end up loading stuff in the default ALC.

@govert
Copy link
Contributor

govert commented Dec 1, 2020

I develop a framework for making Excel add-ins with .NET. We currently use AppDomains to isolate different add-ins which might be independently developed but are loaded into the same Excel process.

Even if we are able to put in place .NET 5 support, and try to deal with the ALC limitations, next year we have to deal with .NET 6 and the current status that different versions of .NET core are not supported in the same process. Something that might be attractive and help the current issue would be to load completely separate .NET Core universes into a process. This would mean separate GC heaps, assembly resolution spaces etc.

One of the .NET Core value propositions was that you can isolate your app distribution and 'bring your own .NET along'. But for the in-process add-in case, it's still very hard to see how to benefit from the great feature and performance work being done.

@vitek-karas
Copy link
Member

@govert thanks a lot for describing your scenario. We've been hearing the request for in-proc SxS support quite a few times and discussed this internally several times (why, what and so on). The current status is that the product is not designed for it, and thus adding it is not exactly cheap. Also as far as I know this is not going to happen for .NET 6 either (things may change though). In any case what would be interesting for me is what kind of limitations you would be willing to accept with such support (this is also partially the list of big challenges of doing this):

  • Debugger support - currently all debuggers "assume" there's only one runtime in the process. The closest debuggers got to SxS runtime support is that VS supports .NET Framework 2., .NET Framework 4. and .NET Core SxS in one process with some (rather rough) way to pick which one to use. Since .NET Core is SxS a lot - specifically if you want to use self-contained deployments. This would mean that debuggers would have to be able to pick a specific runtime instance. In your scenario, how important is it to be able to debugging one specific runtime (plugin) out of potentially many?
  • Tracing - basically a similar story to debuggers. AFAIK the tracing (EventPipe) is no designed currently to support multiple runtimes in one process. Again -how important is it for your scenario to be able to support this for multiple runtimes in one process.
  • Interop - interop between the runtimes. Is this something your scenario would need (or it would be really nice to have)? Anything on top of the basic "they can talk to each other the same way any other native component would".

@govert
Copy link
Contributor

govert commented Dec 1, 2020

@vitek-karas Thank you for your reply.

I think what would be most helpful, both for the in-proc SxS / .NET 6 runtime problem, and for the ALC issue that is raised in this issue, is some guidance from the dotnet team on what scenarios are supported now and in future. We can then consider behaviour that does not conform to the plan (e.g. libraries and APIs that jump out of the ALC) as bugs to be fixed, and limitations like the ones you mention as things to document or work around.

If I publish a component today (e.g. an COM add-in for an Office app, an MMC snap-in) that will run in a process where unknown independent components will also be installed in future, can I reasonably use .NET 5? Not having some kind of AppDomain / ALC isolation is one problem. The danger of another component targeting .NET 6 being installed next year and hence breaking mine would be another. Perhaps some of the AOT options being looked at can help here too, as long as multiple GC / JIT worlds can live together in a process.

About the limitations you mention, I would say for my Excel scenario none are showstoppers:

  • Debugging - I guess the bulk of debugging happens in a development environment where you can limit the number of clashing components loaded into your process. But it means that problems due to multiple add-ins clashing are then very hard to debug.
  • Tracing - would multiple add-ins that target different runtimes both want to use tracing? Maybe. If it is not supported, it would help if there were some way to warn a developer that this and similar APIs might stop working when other components load into the process.
  • Interop - this is the smallest problem in the Excel (native host process) scenario. One could have some kind of (native) library to deal with this for add-ins that want to cooperate. I guess this is a much bigger problem when the host itself is a managed application that wants to interact with add-ins in a rich way. For the Excel case it's a combination of a native C API and COM.

As you say, .NET Core / .NET 5 is not designed for these scenarios, and I have to give some guidance to my users on how to make the tough call of staying on a stable but deprecated .NET Framework for the foreseeable future vs. dealing with the stability / isolation problems that are the trade-off when going to .NET 5+. What should I say?

@vitek-karas
Copy link
Member

I think it's fair to say that right now using .NET Core to implement plugins to arbitrary host comes with quite a few hurdles. You summarized it pretty well above. The only thing I would add is that if the component wants it can declare it's dependency on .NET Core version using the rollForward feature. If all components specify the rollForward=LatestMajor then it should work going forward in all cases (regardless which component will be loaded first, they will all load the latest runtime version available on the machine). The downside is that there might be changes between major versions which could potentially break older components. That said NuGet packages behave similarly to components with rollForward=LatestMajor already, so the chances of things working are pretty high.

@jkotas
Copy link
Member

jkotas commented Dec 1, 2020

Out-of-proc hosting for plugins is an alternative solution for the versioning and SxS problems discussed here. In addition, out-of-proc solves reliability and performance issues, but also creates new one, for example protocol to use for communication. I expect that out-of-proc hosting for plugins is going to be more common going forward. Visual Studio is heading in this direction: https://devblogs.microsoft.com/visualstudio/the-future-of-visual-studio-extensions/

@govert
Copy link
Contributor

govert commented Dec 1, 2020

Out-of-process is often an option to get isolation. But in the Excel context, the add-ins expose functions that are called directly from the worksheet during calculation. In that setting performance is critical, and performance improvements would be a large reason for considering .NET 5+.
The new Office Javascript add-in engine had to be moved back into the Excel process for the same performance reason, which I'm sure causes them comparable problems.

@govert
Copy link
Contributor

govert commented Jan 23, 2021

To further emphasize the initial message and request of this issue, I'll provide a bit more background and details of how it affects my use case.

For NativeApp I develop a mixed native and managed AddInManager to allow third-party managed AddIns to be developed with .NET.
I am adding support for .NET 5+ add-ins (only on Windows).

NativeApp interacts with AddIns in various ways.

  • NativeApp calls AddInManager, which loads AddIn (and dependencies).
  • NativeApp calls C-style exports (reverse P/Invoke) from AddIn (actually stubs generated and exported at runtime via AddInManager).
  • NativeApp exposes a COM object model which AddIn references and uses, including calls to COM events that AddIn subscribes to.
  • NativeApp calls via IDispatch into various COM objects exposed by AddIns.
  • NativeApp has a native scripting engine that calls into late-bound IDispatch and early-bound COM objects exposed by AddIn.

AddIns are independently developed and multiple add-ins from different developers need to share the single NativeApp process. AddIns may depend on various further managed and native libraries.

AddInManager will try to isolate AddIns by loading them into individual AssemblyLoadContexts.

  • During initial load of the AddIns, AddInManager can wrap calls in a ContextualReflectionScope, so that further loads during this phase happen in the isolated ALC. This looks OK apart from loads with the problematic APIs mentioned here, Assembly.LoadFrom and Assembly.LoadFile which can make a mess as described earlier.

  • Next, NativeApp will call the C-style exported functions in AddIn. Each such function may in turn cause further assemblies to be loaded. AddInManager can try to have the exported stub function also wrap the internal call in a ContextualReflectionScope. Performance here here is critical but the overhead seems small, so that's a possibility.

  • There there are all the various COM calls from NativeApp -> AddIn. For example a callback called via IDispatch on a class implemented in AddIn, or a COM event that is raised by NativeApp, or even a true COM interface exported by AddIn called from the NativeApp scripting engine. Such a callback or event can cause code to run in AddIn that needs to be isolated in a ContextualReflectionScope.

It seems the options for .NET 5 are

  • either the AddInManager must shim everything COM-related - the events from the NativeApp object model and all exported interfaces from the AddIn, or
  • the burden of isolation falls on the AddIn developer, and they need to learn to make the extra EnterContextualReflection call in every entry-point method. Or they must be very sure which further dependencies might be loaded. However, since the timing of the assembly loads is not deterministic (it might depend on the sequence of UI or scripting interactions coming from the user via NativeApp), there will be ongoing discovery of problems at runtime.

Maybe what is needed to provide effective isolation for the AddIns (besides addressing the Assembly.LoadFrom and Assembly.LoadFile issue) is to have a mechanism that effectively provides an automatic enrollment into the ContextualReflectionScope at all these times where native -> managed transitions happen. So the native exports and COM exports all need to be aware of the ALC under which they are set up. It need not be a security barrier or 'bullet-proof' against hostile AddIns but should be such that 'plain' AddIn code has safe default behaviour without having to add extra plumbing calls to every method and COM event handler.

This is exactly what AppDomains provided in the .NET Framework, and to me it looks likely some future version of .NET will need to provide again.
Tongue-in-cheek I look forward to 'Project Reunion.NET' coming in .NET 7 ;-)

/cc @vitek-karas @jkotas @AaronRobinsonMSFT @elinor-fung

@vitek-karas
Copy link
Member

I absolutely agree that this is a problem - no question there. I'm just curious about these scenarios.

The problem only exists if the code does dynamic loading of assemblies - so basically calls to things like Type.GetType, Assembly.Load and similar. Direct assembly references in compiled code should never have a problem (and don't need "contextual reflection" workaround).
Is it really that common in your code to load code dynamically?

@jazzdelightsme
Copy link
Contributor Author

jazzdelightsme commented Jan 26, 2021

@vitek-karas My scenario involves hosting the PowerShell runtime, so yes: every "module import" is an Assembly.Load, and that is not going to change. And in fact, even in my own code, I have an Assembly.Load--basically my "plugin" itself can have "plugins".

Another debugger plugin I am aware of ("MEX") also uses Assembly.Load internally (and are also stuck on old desktop .NET).

"Is it that common?" Not every program does dynamic loading, and so not every plugin does dynamic loading... but it's just as important and powerful a tool for plugins as it is for regular programs.

@vitek-karas
Copy link
Member

Thanks for the scenario descriptions. I don't think AppDomains are coming back, but maybe there's some other way to make this work at least most of the time. The most worrying case for me is the COM - that's hard - it would basically have to be a feature at the interop layer... which I don't like very much.

@vitek-karas vitek-karas added this to the Future milestone Jul 9, 2021
@vitek-karas vitek-karas removed the untriaged New issue has not been triaged by the area owner label Jul 9, 2021
@marcin-krystianc
Copy link
Contributor

I want to ask about a scenario that causes some problems for us. We have ManagedApp that loads multiple managed plugins, e.g.:

  • MyManagedPluginFoo which loads NativeDependencyA1 via DllImport, but NativeDependencyA1 loads itself a NativeDependencyB1 via LoadLibrary or dllimport.
  • MyManagedPluginBar which loads NativeDependencyA2 via DllImport, but NativeDependencyA2 loads itself a NativeDependencyB2 via LoadLibrary or dllimport.
  • Managed plugins are implemented by us, but native dependencies are not (so we don't control how they work )
            ->  MyManagedPluginFoo -> NativeDependencyA1 -> NativeDependencyB1
ManagedApp
            ->  MyManagedPluginBar -> NativeDependencyA2 -> NativeDependencyB2

The problem is that the first level of native dependencies is resolved correctly and we can use both versions of A (A1 and A2) but the second level of native dependencies is beyond any control so we end up using only a single version of B (either B1 or B2, whichever is loaded first).
Is there anything in the current (.NET 7) ALCs implementation that would give us any control over native->native dependency resolution? Is it even possible for a managed application to influence the dependency resolution happening in native code?

@jkotas
Copy link
Member

jkotas commented Mar 18, 2023

give us any control over native->native dependency resolution? Is it even possible for a managed application to influence the dependency resolution happening in native code?

You can partially control this by PInvoking to platform-specific native library loading method instead of the NativeLibrary.Load API. For example, For example, passing LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR flag to LoadLibraryEx on Windows may give you what you are looking for when NativeDependencyA dependency on NativeDependencyB is static.

@marcin-krystianc
Copy link
Contributor

give us any control over native->native dependency resolution? Is it even possible for a managed application to influence the dependency resolution happening in native code?

You can partially control this by PInvoking to platform-specific native library loading method instead of the NativeLibrary.Load API. For example, For example, passing LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR flag to LoadLibraryEx on Windows may give you what you are looking for when NativeDependencyA dependency on NativeDependencyB is static.

In our case, we do not have access to the source code of NativeDependencyA so we cannot modify the code which loads the NativeDependencyB.
I think I didn't say it clearly enough in my previous comment - our biggest problem is that only one version of NativeDependencyB is loaded into memory, but different versions of NativeDependencyA expect a different version of NativeDependencyB due to the fact that different versions of NativeDependencyB are not binary compatible.
It is another case of the imperfect isolation problem, and since the native code is not ALC aware, my understanding is that it cannot even "cooperate" to stay within the current ALC.

@jkotas
Copy link
Member

jkotas commented Mar 20, 2023

In our case, we do not have access to the source code of NativeDependencyA so we cannot modify the code which loads the NativeDependencyB.

Right, LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR gives you some level of control how NativeDependencyA loads NativeDependencyB, even if you are not able to modify NativeDependencyA.

It is another case of the imperfect isolation problem

I agree. FWIW, AppDomains on .NET Framework had the same problem with isolation of native dependencies.

@agocke
Copy link
Member

agocke commented Apr 25, 2024

As mentioned in the other issue, we do not expect to ever support “non-cooperative” assembly unloading.

I’m closing this issue to signal the appropriate guidance.

In .NET core, all plugins and their dependencies must be safe for unloading if they want to support unloading.

@agocke agocke closed this as not planned Won't fix, can't repro, duplicate, stale Apr 25, 2024
@jazzdelightsme
Copy link
Contributor Author

For reference: as noted in the other Issue, this Issue has absolutely nothing to do with non-cooperative assembly unloading. We do not want that (and plus it is not possible 😛). JKotas gave a more sensible explanation as to the lack of intention to do anything in this area on the other Issue, here.

@github-actions github-actions bot locked and limited conversation to collaborators May 27, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
Archived in project
Development

No branches or pull requests

7 participants