Fix leaks after unsuccessful assembly load #68203

VSadov · 2022-04-19T04:50:09Z

This is about regular (non-collectible) assembly loads. An unsuccessful load leaves state behind, and that state can accumulate quickly if a client, such as a serializer, opportunistically tries to load various nonexistent assemblies.
The leak is a regression compared to .NET Framework.

When the load fails, we leak both the managed and the native parts of the context into which nothing was loaded.

Fixes #58093
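To make the accumulation concrete, here is a toy C++ model (C++ being the language of the runtime's native side). It is not the runtime's code: `NativeContext`, `ContextRegistry`, `LoadIntoNewContext`, and `LiveContexts` are hypothetical names, and the `fileLoads` flag simply stands in for the loader's outcome. The point it illustrates is that each call creates a fresh non-collectible context before the outcome is known, and because such contexts are never destroyed, every failure leaves one behind.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the native half of a non-collectible load context.
struct NativeContext {
    std::string name;
};

// Non-collectible contexts are never destroyed, so this process-lifetime
// registry only ever grows.
std::vector<std::unique_ptr<NativeContext>>& ContextRegistry() {
    static std::vector<std::unique_ptr<NativeContext>> registry;
    return registry;
}

// Models a LoadFile-style call: a new context is created before the outcome
// is known. 'fileLoads' stands in for whether the real load would succeed.
bool LoadIntoNewContext(const std::string& path, bool fileLoads) {
    assert(!path.empty());
    ContextRegistry().push_back(
        std::make_unique<NativeContext>(NativeContext{"LoadFile(" + path + ")"}));
    // On failure nothing was loaded, yet the context stays registered: the leak.
    return fileLoads;
}

std::size_t LiveContexts() { return ContextRegistry().size(); }
```

In this model, 100 failed loads of missing paths leave 100 live contexts, which is the pattern a probing serializer would run into.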

dotnet-issue-labeler · 2022-04-19T04:50:13Z

I couldn't figure out the best area label to add to this PR. If you have write-permissions please help me learn by adding exactly one area label.

ghost · 2022-04-19T04:50:26Z

Tagging subscribers to this area: @vitek-karas, @agocke, @VSadov
See info in area-owners.md if you want to be subscribed.

Issue Details

In progress.


Author:	VSadov
Assignees:	VSadov
Labels:	`area-AssemblyLoader-coreclr`
Milestone:	-

vitek-karas · 2022-04-19T07:45:14Z

src/libraries/System.Private.CoreLib/src/System/Runtime/Loader/AssemblyLoadContext.cs

```diff
+        }
+
+        internal void RemoveFromAllContexts()
+        {
             Dictionary<long, WeakReference<AssemblyLoadContext>> allContexts = AllContexts;
```

This seems to be the only functional change here - that we now remove the ALC from this dictionary. But that dictionary only has a weak ref to the ALC - so how come it caused a memory leak?

This is in progress. There will be changes on the native side as well. By itself, the managed side only leaks dictionary entries and weak references (and the weak handles behind them).

vitek-karas · 2022-04-19T07:47:58Z

I don't mind taking a targeted fix for the serialization scenario, but I would like to see a discussion of fixing this more broadly.
I thought that one can get a leak by simply calling new AssemblyLoadContext (though I didn't look into it in detail), since that creates the native backing object, which can't be freed if the ALC is non-collectible.

VSadov · 2022-04-19T15:34:48Z

one can get a leak by simply calling new AssemblyLoadContext

I did not think about that as it seems more intentional than Assembly.LoadFile leaking on failures. Failing Assembly.LoadFile is not a new scenario and we used to not leak, so trying to load was a reasonable pattern.

I will think about this. It may be possible to fix this more generally. Perhaps we could do the cleanup of unused contexts in the finalizer.

jkotas · 2022-04-19T15:37:23Z

The non-collectible ALCs are not designed to be destroyable. I do not think we should be adding overhead to non-collectible ALCs by trying to make them destroyable or collectible.

Would it be better to fix the LoadFile problem by caching the failed LoadFile ALCs for a given path? I.e., change the s_loadfile collection to map a path to an ALC, and keep retrying the load in that one ALC.
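A rough sketch of this suggestion, modeled in C++ with hypothetical names (`PerPathContextCache`, `LoadContext`, `ContextFor`, `TryLoad`). The real `s_loadfile` collection is managed code and may be shaped differently; the sketch only shows the idea that the load is still attempted on every call while the context is reused per path.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical model of one load context; 'attempts' counts load tries in it.
struct LoadContext {
    std::string path;
    int attempts = 0;
};

// Hypothetical model of the suggestion: one context per path, reused on retry.
class PerPathContextCache {
public:
    // Returns the context for 'path', creating it only on first use.
    LoadContext& ContextFor(const std::string& path) {
        assert(!path.empty());
        auto it = contexts_.find(path);
        if (it == contexts_.end()) {
            it = contexts_.emplace(path, std::make_unique<LoadContext>(LoadContext{path})).first;
        }
        return *it->second;
    }

    // The load is still attempted on every call; only the context is reused.
    // 'fileLoads' stands in for the real loader's outcome.
    bool TryLoad(const std::string& path, bool fileLoads) {
        ++ContextFor(path).attempts;
        return fileLoads;
    }

    std::size_t ContextCount() const { return contexts_.size(); }

private:
    std::unordered_map<std::string, std::unique_ptr<LoadContext>> contexts_;
};
```

The key here is the raw path string, which is exactly the identity question raised later in the thread: a caller that uses a new path on every call still gets a new context per path.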

jkotas · 2022-04-19T15:41:19Z

Assembly.LoadFile leaking on failures

I believe that .NET Framework also leaks on Assembly.LoadFile failures in the general case. It may work fine for the trivial case of a nonexistent file; we can add an early file-existence check to match that behavior.
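A minimal sketch of the early existence check, assuming C++17 `std::filesystem`. `LoadFileWithProbe` and the `contextsCreated` counter are hypothetical stand-ins for allocating a load context, and `std::runtime_error` stands in for the `FileNotFoundException` a managed caller would see. It uses `is_regular_file`, so a directory also fails the probe.

```cpp
#include <cassert>
#include <filesystem>
#include <stdexcept>
#include <string>
#include <system_error>

// Hypothetical LoadFile-style entry point. 'contextsCreated' stands in for
// the native load contexts that a real load would allocate.
void LoadFileWithProbe(const std::filesystem::path& path, int& contextsCreated) {
    assert(!path.empty());

    // Probe first: a missing file (or a directory) fails before any context
    // exists, so nothing is left behind.
    std::error_code ec;
    if (!std::filesystem::is_regular_file(path, ec)) {
        throw std::runtime_error("Could not load file: " + path.string());
    }

    // Only now is a new context allocated. The load itself could still fail
    // later for other reasons (e.g. a malformed image); that is not modeled.
    ++contextsCreated;
}
```

The probe only covers the missing-file case; a file that exists but fails later in loading still goes through context creation.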

VSadov · 2022-04-19T15:48:29Z

If LoadFile uses a different path every time, we would still leak. Besides, there could be scenarios where loading a second time from the same path should deterministically work (e.g., the file appears in the meantime). Remembering a failure might not be a good solution.

Empty non-collectible contexts are relatively simple things since they do not have allocators. I think disposing of them when they did not load anything could be doable.

vitek-karas · 2022-04-19T15:49:01Z

I was thinking that we could fix it such that implementing the same pattern as LoadFile using public APIs would also not leak. But I guess it's not that important.

I think we should at least understand (and document) what the current behavior is in terms of leaking memory/resources.
Fixing LoadFile the way @jkotas suggests sounds reasonable as well. It's comparable to what happens if AssemblyLoadContext.LoadFromAssemblyName fails, we also cache the failure in some cases.

vitek-karas · 2022-04-19T15:50:02Z

Remembering a failure might not be a good solution.

I think the idea was to just reuse the same ALC for the exact same path. We would still try the load every time.

VSadov · 2022-04-19T15:50:30Z

Pre-probing for file existence in LoadFile seems like a good idea regardless.

VSadov · 2022-04-19T15:55:04Z

I think the idea was to just reuse the same ALC for the exact same path. We would still try the load every time.

Do we even need to remember the path? Can we use the last ALC that failed?

jkotas · 2022-04-19T16:07:20Z

Can we use the last ALC that failed?

I do not think you want to be reusing failed ALCs for different paths. The failed assembly can already be stuck in the ALC, where it can interact in odd ways with anything new that you load into it.

VSadov · 2022-04-19T16:10:07Z

Hmm. If the ALC remembers the failure, can we safely reuse it at all, for the same path or a different one?

vitek-karas · 2022-04-19T16:49:42Z

I think (this has always been a mystery to me) that the failure cache is only for "lookup by name" failures. Not for loading from a file path.

I agree that we should not reuse ALC from a different path. For example, what if we find and load the assembly, but then it fails because it has something weird in its header. Does the runtime actually free this fully? Ideally it should, but I would not trust it right now.

jkotas · 2022-04-19T16:52:53Z

If the failure happens early during binding, the runtime will free the assembly fully. Once we bind to a file and start actually loading it, i.e. going through the load levels in runtime/src/coreclr/vm/domainassembly.h (lines 36 to 49 in d603356), there is no way back:

```cpp
FILE_LOAD_CREATE,
FILE_LOAD_BEGIN,
FILE_LOAD_FIND_NATIVE_IMAGE,
FILE_LOAD_VERIFY_NATIVE_IMAGE_DEPENDENCIES,
FILE_LOAD_ALLOCATE,
FILE_LOAD_ADD_DEPENDENCIES,
FILE_LOAD_PRE_LOADLIBRARY,
FILE_LOAD_LOADLIBRARY,
FILE_LOAD_POST_LOADLIBRARY,
FILE_LOAD_EAGER_FIXUPS,
FILE_LOAD_DELIVER_EVENTS,
FILE_LOAD_VTABLE_FIXUPS,
FILE_LOADED,                    // Loaded by not yet active
FILE_ACTIVE                     // Fully active (constructors run & security checked)
```

VSadov · 2022-04-19T17:37:30Z

@jkotas @vitek-karas I will think more about this and get back to you
What we have now:

- Pre-validating file existence seems uncontroversial and might handle most cases for LoadFile. We may end up with just that.
- Anything that is based on sameness of the path between loads feels like inviting trouble. There could deterministically be a different file at the same location. Besides, what is the "same" path: the canonical form? With symlinks resolved? Treating a path as identity often gets us in trouble.
- In the load sequence there must be a point before which a failure leaves no traces. For managed assemblies (i.e. not IJW) we can get fairly far before it is irreversible (FILE_LOADED maybe?). We could identify such a state and leak only if we reached it. I am not sure if this is feasible or worth the trouble. On one hand, this would allow new AssemblyLoadContext to not automatically leak. On the other, the solution with pre-probing of the path is a lot cheaper and might be enough.
- There is also the Assembly.Load(byte[]) scenario. File pre-probing will not apply, but it might be more acceptable to leak in that case.
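One way to express the "point of no return" idea is a hypothetical C++ policy over the load levels jkotas listed from domainassembly.h. The enum below only mirrors their order; `CanReleaseContextAfterFailure` is an illustrative name, not a runtime function. Which level is irreversible is an open question in the thread (jkotas: as soon as the levels start; VSadov: maybe FILE_LOADED for non-IJW assemblies), so it is a parameter. Failures during binding, before any level is reached, are outside this model.

```cpp
#include <cassert>

// Mirrors the order of the load levels listed from domainassembly.h.
enum FileLoadLevel {
    FILE_LOAD_CREATE,
    FILE_LOAD_BEGIN,
    FILE_LOAD_FIND_NATIVE_IMAGE,
    FILE_LOAD_VERIFY_NATIVE_IMAGE_DEPENDENCIES,
    FILE_LOAD_ALLOCATE,
    FILE_LOAD_ADD_DEPENDENCIES,
    FILE_LOAD_PRE_LOADLIBRARY,
    FILE_LOAD_LOADLIBRARY,
    FILE_LOAD_POST_LOADLIBRARY,
    FILE_LOAD_EAGER_FIXUPS,
    FILE_LOAD_DELIVER_EVENTS,
    FILE_LOAD_VTABLE_FIXUPS,
    FILE_LOADED,
    FILE_ACTIVE
};

// Hypothetical policy: after a failed load, the (still empty) context may be
// released only if the load never reached 'irreversibleLevel'.
bool CanReleaseContextAfterFailure(FileLoadLevel highestReached,
                                   FileLoadLevel irreversibleLevel) {
    assert(highestReached <= FILE_ACTIVE && irreversibleLevel <= FILE_ACTIVE);
    return highestReached < irreversibleLevel;
}
```

With the threshold at FILE_LOAD_CREATE (jkotas's description), no failure past binding is ever releasable; with FILE_LOADED (VSadov's guess), failures up to FILE_LOAD_VTABLE_FIXUPS would be.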

jkotas · 2022-04-19T17:44:51Z

pre-validating the file existence seems uncontroversial and might handle most cases for LoadFile

Agree.

anything that is based on sameness of the path between loads feels like inviting trouble.

Assembly.LoadFile caching has been always based on sameness of the path. I agree it is problematic, but it is what .NET Framework did and this API exists for .NET Framework compatibility.

VSadov · 2022-04-22T15:00:07Z

While it may be possible in some cases to unload/destroy the unused load context, it appears to be complicated.
It would also depend on how far we got in the loader when the failure occurred.

Based on the cost/risk vs. benefit, I think we should go with just probing for the file presence for now.

This would not preclude more involved fixes in the future - if we find that desirable.

VSadov · 2022-04-23T20:30:01Z

Thanks!!

jkotas · 2022-04-25T02:46:10Z

This was merged with red CI. The CI failure (#68477) was introduced by this change, and it is hitting all CI jobs. I am going to revert the change to stabilize CI.

This reverts commit 4fe6359.

VSadov · 2022-04-25T16:52:03Z

The Wasm browser tests did not look like tests that could be affected by the change (and they were the only ones failing). Turns out they could. :-\

jkotas · 2022-04-25T17:04:01Z

If you are merging on red, it is a best practice to always link to issues that track the failures and open a new issue if the failure is not tracked. It will make you think twice about whether the failure is really unrelated to your changes.

VSadov · 2022-04-25T17:55:46Z

I agree. I should have checked for existing issues.

The change is in shared code, though. I am still not sure how it could fail, and specifically only on Wasm.

VSadov · 2022-04-25T17:57:32Z

Somehow it is possible to load files that do not exist?

jkotas · 2022-04-25T17:59:37Z

wasm browser is a weird hybrid between single file and regular layout. Assemblies are accessible via fake file paths, but the files do not actually exist.

VSadov · 2022-04-25T18:11:46Z

I think it would fail for single-file too. That is unfortunate. Since we virtualize assembly files, we can't rely on a physical file check.

jkotas · 2022-04-25T18:14:48Z

I think it would fail for single file too

I do not think we are virtualizing files for regular single file.

jkotas · 2022-04-25T18:16:02Z

The easiest way to deal with this problem may be to ifdef out the file existence check for browser.

VSadov · 2022-04-25T18:58:45Z

I do not think we are virtualizing files for regular single file.

Right. We virtualize when loading by assembly name, but load by file name just opens a PEImage using its path. We could search in the bundle, but we do not.

I also noticed that the Mono implementation of InternalLoadFromPath replaces \ in the file name with the correct directory separator. I think we will have to do that before checking for file presence, and also under an ifdef.
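A sketch of the two adjustments discussed above, in C++ form: skip the physical probe on browser builds, and normalize `\` before probing elsewhere. `TARGET_BROWSER` is assumed to be the define for browser/wasm builds, and `NormalizeSeparators` and `PassesExistenceProbe` are hypothetical names; the follow-up attempt (#68502) lives in the runtime's own code and may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <filesystem>
#include <string>
#include <system_error>

// Hypothetical helper: replace '\' with the platform's preferred separator
// before probing, similar to what the thread describes Mono doing in
// InternalLoadFromPath.
std::string NormalizeSeparators(std::string path) {
    const char sep = static_cast<char>(std::filesystem::path::preferred_separator);
    std::replace(path.begin(), path.end(), '\\', sep);
    return path;
}

// Returns true if a LoadFile-style call may proceed to create a context.
// TARGET_BROWSER is assumed to be defined for browser/wasm builds, where
// assemblies sit behind virtual paths that do not exist on disk.
bool PassesExistenceProbe(const std::string& rawPath) {
    assert(!rawPath.empty());
#ifdef TARGET_BROWSER
    (void)rawPath;  // no physical file to check; let the loader decide
    return true;
#else
    std::error_code ec;
    return std::filesystem::is_regular_file(NormalizeSeparators(rawPath), ec);
#endif
}
```

Keeping the guard at compile time means non-browser builds always pay only for a single existence probe, while browser builds keep relying on the loader to resolve virtual paths.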

VSadov · 2022-04-25T19:37:12Z

The new attempt at this change is at #68502

ghost assigned VSadov Apr 19, 2022

VSadov added the area-AssemblyLoader-coreclr label Apr 19, 2022

vitek-karas reviewed Apr 19, 2022


simpler fix

891fde2

VSadov force-pushed the alcLeak branch from 3c6ee31 to 891fde2 April 22, 2022 05:00

vitek-karas approved these changes Apr 22, 2022


VSadov marked this pull request as ready for review April 22, 2022 15:00

runfoapp bot mentioned this pull request Apr 22, 2022

jit.1 work item failing on mono #67888

Closed

VSadov merged commit 4fe6359 into dotnet:main Apr 23, 2022

VSadov deleted the alcLeak branch April 23, 2022 20:30

VSadov mentioned this pull request Apr 23, 2022

Failed Assembly.Load and Assembly.LoadFile leaks memory #58093

Closed

jkotas mentioned this pull request Apr 25, 2022

WasmTestOnBrowser-System.CodeDom.Tests failing in CI #68477

Closed

jkotas added a commit that referenced this pull request Apr 25, 2022

Revert "simpler fix (#68203)"

dd26991

This reverts commit 4fe6359.

jkotas mentioned this pull request Apr 25, 2022

Revert "Fix leaks after unsuccessful assembly load" #68478

Merged

jkotas added a commit that referenced this pull request Apr 25, 2022

Revert "simpler fix (#68203)" (#68478)

47d9c43

This reverts commit 4fe6359.

VSadov restored the alcLeak branch April 25, 2022 16:56

VSadov mentioned this pull request Apr 25, 2022

Fix leaks after unsuccessful assembly load (another try) #68502

Merged

ghost locked as resolved and limited conversation to collaborators May 26, 2022
