Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AssemblyLoadContext.Unload silently fails to unload Assemblies, leaking filehandles #44679

Closed
dave-yotta opened this issue Nov 12, 2020 · 23 comments
Labels
area-AssemblyLoader-coreclr untriaged New issue has not been triaged by the area owner

Comments

@dave-yotta
Copy link

Issue Title

The Unload method does not free any assemblies returned from LoadFromStream that are still referenced somewhere.

General

netcoreapp3.1

        public class TestAssemblyLoader : AssemblyLoadContext, IAssemblyLoader
        {
            public TestAssemblyLoader() : base(true)
            {
                LoadedAssemblies = new Dictionary<string, Assembly>();

                Resolving += LoadContext_Resolving;
            }

            private Assembly LoadContext_Resolving(AssemblyLoadContext arg1, AssemblyName arg2)
            {
                return LoadedAssemblies[arg2.FullName];
            }

            private Dictionary<string, Assembly> LoadedAssemblies { get; }

            Assembly IAssemblyLoader.LoadFromStream(Stream stream)
            {
                var assembly = LoadFromStream(stream);
                return LoadedAssemblies[assembly.FullName] = assembly;
            }

            protected sealed override Assembly Load(AssemblyName assemblyName)
            {
                return null;
            }

            public void Dispose()
            {
                LoadedAssemblies.Clear(); // Without this call, the assemblies in this dictionary don't get unloaded.
                Unload();
            }
        }

Symptom is visible in unreleased file handles - example repro here: https://github.com/dave-yotta/roslyn-assemblyunload

It might make sense for this to be expected behaviour, but I certainly didn't expect it. (nor did our production machines 😢)

At least can there be an argument to unload like bool throwIfAnyUnloadFailed = false so that it's self-documenting to callers somehow?

Related issue I'm coming from here: dotnet/roslyn#49282

@PathogenDavid
Copy link
Contributor

Unloading an AssemblyLoadContext is more of a request for future unload than a command for immediate unload. The runtime won't actually unload the assemblies if they're still in use. (And having them be accessible via your LoadedAssemblies constitutes as "in use".)

As noted on the documentation, unloading doesn't happen right away and calling the method only initiates the process:

There's one noteworthy difference between the unloading using AssemblyLoadContext and using AppDomains. With AppDomains, the unloading is forced. At unload time, all threads running in the target AppDomain are aborted, managed COM objects created in the target AppDomain are destroyed, and so on. With AssemblyLoadContext, the unload is "cooperative". Calling the AssemblyLoadContext.Unload method just initiates the unloading.

The documentation lists some examples of things that prevent unloading and implies that keeping references to assemblies in the AssemblyLoadContext implementation is OK, but it's worth noting that it's "OK" only if you also discard all references to the AssemblyLoadContext after initiating unload. (Which you're supposed to do: "Unloading will not occur while there are references to the AssemblyLoadContext.")

At least can there be an argument to unload like bool throwIfAnyUnloadFailed = false so that it's self-documenting to callers somehow?

I don't claim to be an expert on the internals of AssemblyLoadContext's implementation, but based on my understanding I don't think this is really possible without doing an immediate full garbage collection and wouldn't be possible if you're using the pattern of discarding the AssemblyLoadContext reference its self since it couldn't be discarded while you're calling Unload.

(Also dotnet/runtime would've been a more appropriate repo for this issue.)

@dave-yotta
Copy link
Author

dave-yotta commented Nov 13, 2020

Ok - should I move this issue across?

The documentation does imply that Assembly references within the AssemblyLoadContext will not stop the unload, however that's not what I see here. I don't see any drop in file handles and have tested for a minute, stacking up 2000 file handles.

This could just be waiting for a GC; I could test by calling the GC after the AssemblyLoadContext is out of scope, perhaps this will trigger the unload - perhaps file handles are not tracked by the GC? 2000 is a lot, and in production it simply hits 100% and dies.

I suppose one could add an way to do an unload that makes the caller aware of any assemblies that are not yet up for collection, but the use case for that might be limited - an update to the documentation might be more appropriate - however leaks are scary so it would be nice to somehow get that self documented while coding...

@mairaw mairaw transferred this issue from dotnet/core Nov 14, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-AssemblyLoader-coreclr untriaged New issue has not been triaged by the area owner labels Nov 14, 2020
@ghost
Copy link

ghost commented Nov 14, 2020

Tagging @vitek-karas, @agocke, @CoffeeFlux as subscribed to this area.
See info in area-owners.md if you want to be subscribed.

Issue Details
Description:

Issue Title

The Unload method does not free any assemblies returned from LoadFromStream that are still referenced somewhere.

General

netcoreapp3.1

        public class TestAssemblyLoader : AssemblyLoadContext, IAssemblyLoader
        {
            public TestAssemblyLoader() : base(true)
            {
                LoadedAssemblies = new Dictionary<string, Assembly>();

                Resolving += LoadContext_Resolving;
            }

            private Assembly LoadContext_Resolving(AssemblyLoadContext arg1, AssemblyName arg2)
            {
                return LoadedAssemblies[arg2.FullName];
            }

            private Dictionary<string, Assembly> LoadedAssemblies { get; }

            Assembly IAssemblyLoader.LoadFromStream(Stream stream)
            {
                var assembly = LoadFromStream(stream);
                return LoadedAssemblies[assembly.FullName] = assembly;
            }

            protected sealed override Assembly Load(AssemblyName assemblyName)
            {
                return null;
            }

            public void Dispose()
            {
                LoadedAssemblies.Clear(); // Without this call, the assemblies in this dictionary don't get unloaded.
                Unload();
            }
        }

Symptom is visible in unreleased file handles - example repro here: https://github.com/dave-yotta/roslyn-assemblyunload

It might make sense for this to be expected behaviour, but I certainly didn't expect it. (nor did our production machines 😢)

At least can there be an argument to unload like bool throwIfAnyUnloadFailed = false so that it's self-documenting to callers somehow?

Related issue I'm coming from here: dotnet/roslyn#49282

Author: dave-yotta
Assignees: -
Labels:

area-AssemblyLoader-coreclr, untriaged

Milestone: -

@janvorli
Copy link
Member

This is actually by design, although I agree that it is not obvious and can be confusing.
The main point is that an AssemblyLoadContext instance is just a representation of the assembly load context. Your TestAssemblyLoader instance that's in the default context holds the assemblies loaded into the context represented by the TestAssemblyLoader. Unload can complete only if all references from outside of the context to assemblies loaded into an unloadable AssemblyLoadContext, to types from those assemblies and instances of those types are gone.
It may seem that this should still be fine, since once the reference to the TestAssemblyLoader in your code is gone, GC should be able to collect it and thus remove references to the LoadedAssemblies and after that to the instances of Assembly inside of it. The problem though is that the unload process in the runtime needs to keep the TestAssemblyLoader alive by itself until the last phase of the unload when we detect that there are no more outside references to assemblies loaded into the context, types from those assemblies and their instances. So when unloading is initiated, we create a strong GC handle that keeps the TestAssemblyLoader alive. And then the Assemblies stored in the TestAssemblyLoader are a problem, as they prevent the unload process from entering the phase when we destroy that strong GC handle.
That means that clearing the dictionary you have in your example is necessary. Alternatively, you can store weak references to the Assemblies in your dictionary instead. That way, no clearing would be necessary.

@dave-yotta
Copy link
Author

dave-yotta commented Nov 23, 2020

Ok thanks - makes sense, this isn't a bug.

I wasn't aware the AssemblyLoadContext was under a keepalive there, maybe an extra bullet point could be added to the top of https://docs.microsoft.com/en-us/dotnet/standard/assembly/unloadability#use-collectible-assemblyloadcontext?

It would be really nice to have something in the unload signature that indicates the cooperative nature, for example Task Unload() taking the signal from event Unloading which apparently is fired when unloading is completed. Or a clearer method name like RequestUnload/BeginUnload. I understand these things are difficult to change.

I don't want to take from reading docs and careful design - but this one is so easy to miss.

@vitek-karas
Copy link
Member

Can you please file an issue for the docs (there's a link at the bottom of the page) - it's better if it comes from you because you know what is confusing/missing for you specifically?

As for the Unload I just wanted to mention that even the event is not the actual unload. Unloading fires when the runtime decides to actually do something about the unload, it's meant as a notification to interested parties that unload is about to happen and that they should "cleanup". At that point the load context is still alive including all the assemblies in it. I assume it would be technically possible to fire a different event when the unload is actually finished, but the tricky bit is what would be passed as a parameter. The load context itself should be collected at that point and thus not available. If we fire a bit earlier and pass the load context (even if it's empty), what would prevent the code to reintroduce references to it or actually load assemblies into it... it just gets rather complicated. Maybe the Task based interface would work nicer (as it doesn't need any context parameter).

@dave-yotta
Copy link
Author

Yep no problem. Not sure how you might do this - typically unload needs to be called from the dispose method - I guess IAsyncDisposable can be used and would be compatible with a Task based interface. I'll leave it up to you to decide if there's anything that can be done here.

@vitek-karas
Copy link
Member

The problem with just blind wait on unload is that it can take a really long time - calling Unload itself doesn't trigger GC, and so the actual unload would wait for GC to collect and more importantly finalize some of the objects. So a better approach would be to trigger GC manually, but that's not something a normal API should do, it's probably best to leave to the app to do that.
And even with all that, there's no guarantee the unload will happen, if something somewhere holds on to a reference, it will not unload.

The way I look at it - if unloading in the app is mainly about freeing resources, then there's no reason to actually wait for it (GC should handle that mostly). If the app needs the code to be unloaded (typical case is it wants to update the "plugin" and load it again, in-place), then it will need to manually trigger GC and implement some kind of timeout with error handling if the unload can't happen.

I know this is not an ideal behavior of the runtime for certain scenarios. The opposite approach was implemented in .NET Framework, where runtime basically guaranteed the unload would happen within some reasonable time frame, but that came with many problems, reliability being one (rudely unloading random code could and did lead to state corruption).

@jkotas
Copy link
Member

jkotas commented Nov 24, 2020

The opposite approach was implemented in .NET Framework, where runtime basically guaranteed the unload would happen within some reasonable time frame

FWIW, AppDomain unloading was not guaranteed to succeed. It could timeout out and fail too, e.g. when the AppDomain was stuck in unmanaged code.

@dave-yotta
Copy link
Author

dave-yotta commented Nov 25, 2020

Yeah I see the problems here. FYI we're compiling code fragments associated with actions in our workflow system (user defined stuff), so easily in the thousands of compilations which is what made it such a problem for us, not that this is your problem :).

More broadly, maybe what's desired here is the information required to gracefully identify and deal with failing unloads. An additional event UnloadPending(UnloadPendingEventArgs args) which can in some way list the refs it's waiting on in those args and maybe maybe UnloadFailed, UnloadCompleted events too could be provided. The presence of these events will speak to the programmer about the non-immediate and non-guaranteed nature of the unload, and can be hooked into to provide some kind of circuit breaker.

On the other hand, one does not start using the collectable assembly load context without already thinking about leaks. Therefore tests/investigations will be underway to confirm the assemblies are being released. It either works at this stage - in which case graceful handling is not necessary, or it does not work at this stage - in which case further investigations are done. It either works or it doesn't. In our investigations, memory usage flattened out but we did not think to check file handles or to look at the heap more closely.

@janvorli
Copy link
Member

@dave-yotta are you aware of the doc on using and debugging unloadability I have written in the past?
https://docs.microsoft.com/en-us/dotnet/standard/assembly/unloadability

@vitek-karas
Copy link
Member

FYI we're compiling code fragments associated with actions in our workflow system (user defined stuff), so easily in the thousands of compilations which is what made it such a problem for us, not that this is your problem :).

Thanks for the scenario, it's great to learn what the feature is used for. And I think that at least partially it is our problem, if the feature we built doesn't help solve real-world scenarios, it's no good.

One other option how to improve both debuggability and potentially usability might be to use tracing instead of "callbacks". We could add tracing events around unloading (we really should do that, regardless). When debugging it would be probably much easier to consume the events than to rely on modifying the code. And it's possible to get the events from the code as well (although not as easy as direct callbacks). The other advantage is that tracing is VERY cheap if turned off (likely to be by far the most common case).

@dave-yotta
Copy link
Author

@janvorli thanks, yes I read this at least a couple of times, once when initially developing the unloadability (which also involved some tricky work figuring out how to get roslyn to give assembly bytecode with zero residual allocations/loads of assemblies), and once when debugging the file handle leak observed in production.

Unfortunately since memory usage flattened out compared to the eventual OOM we were getting previously, it seemed fixed and we didn't use sos to check more deeply. This might have had more to do with changes to the other code changes e.g. using CSharpCompilation.Create instead of CSharpScript.Create in our roslyn stuff. I also believe the file handle leak is only happening on linux (which is not the platform we develop on) - I'd have to confirm all that though.

I've linked further up here a ticket for the docs page to add a note that AssemblyLoadContext has a strong ref and wont be up for collection when you might normally expect - perhaps this is stated further on in the document and I've not read carefully enough - but it does feel worth mentioning at the top there.

@dave-yotta
Copy link
Author

Tracing might be helpful. The reason I suggest adding something to the code is just to imply something about the nature of the unload to the developer. I agree the way unloading is done is correct - a corrupt stack is a very dangerous thing.

Ideally this stuff shouldn't reach prod - we should try to have tests which assert that resources are freed, which could be done from a load testing point of view just by checking memory and file handle usage under load. However we don't really know what kind of resources the AssemblyLoadContext is going to allocate, so a way, just from a unit testing perspective, to ask the runtime if an AssemblyLoadContext is available to be collected so that we can fail the test if it is not would be really helpful. However it's not something you'd want to advertise in non testing scenarios, since it will likely interact with the GC and other native handles...

@vitek-karas
Copy link
Member

The way to test this in a controlled environment is to hold a weak reference to the AssemblyLoadContext and then in a loop trigger GC and test if the load context was collected. The sample code for that is in the doc @janvorli mentioned: https://docs.microsoft.com/en-us/dotnet/standard/assembly/unloadability#use-a-custom-collectible-assemblyloadcontext

If the weak ref points to nowhere, the load context was collected and I think it's reasonable then to trust that runtime freed all of the resources associated with it. That said if there's custom code loaded into the context which may have native resource there's nothing the runtime can do to make sure those are released as well. So that would probably require some custom testing after the load context is gone.

@janvorli
Copy link
Member

You can also use weak refs on the assemblies loaded into an unloadable context and use them to indicate whether a specific assembly was unloaded or not.

@dave-yotta
Copy link
Author

Yep you're both absolutely correct - I need to read that document more carefully. I can write a test for this now :)

The code fragments are in a whitelisted sandbox of sorts and thus very limited and simple so I do hope the day never comes that they can allocate native resources - but that's a very good point.

If you want to keep this open to provide tracing of unload events, please do so, otherwise feel free to close this ticket. Thanks to all for taking the time to help me with this!

@vitek-karas
Copy link
Member

For the tracing support - if you can think of things you would find useful, could you please list those. It's always better to have real users tell us what they need than for us to try to guess.

@dave-yotta
Copy link
Author

dave-yotta commented Nov 25, 2020

I would suggest for the AssemblyLoadContext per assembly if that level of detail is available:

  • Assembly: loaded/unload initiated/unload failed/unload completed
  • AssemblyLoadContext: unload initiated/unload completed

And perhaps at a more verbose trace level, if possible:

  • Assembly: not yet unloadable (each time it is checked following unload initiated)

@janvorli
Copy link
Member

Please note that there is nothing that we can qualify as unload failure. There is no deadline after which we would consider the unloading as failed in the runtime. So if I take it to the extreme, you can call Unload and hold a reference to something inside of the context. If you release that reference a week later, the unload will complete at that point. The initiated / completed event should not be problematic to add.
Regarding the "not yet unloadable" - there is no periodic checking happening. It is all GC driven. So when the last reference to an assembly is gone and GC collects it, it results in removing a reference to the LoaderAllocator managed instance (that reference is stored in the Assembly's SyncRoot). So there is no explicit walking of a list of assemblies in the AssemblyLoadContext and checking whether they can be unloaded.
Btw, if you are interested in the low level behavior, there is an image of the dependency references that keep AssemblyLoadContext here: https://github.com/dotnet/runtime/blob/master/docs/design/features/unloadability.md#assemblyloadcontext-unloading-process

@dave-yotta
Copy link
Author

Those two trace events for initiate/complete it is then :)

@vitek-karas
Copy link
Member

I created #45264 to track the tracing support for unload. (It's cleaner to have it as a separate issue).

@dave-yotta
Copy link
Author

I guess this can be closed then! :)

@ghost ghost locked as resolved and limited conversation to collaborators Jan 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-AssemblyLoader-coreclr untriaged New issue has not been triaged by the area owner
Projects
None yet
Development

No branches or pull requests

6 participants