Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix leaks after unsuccessful assembly load #68203

Merged
merged 1 commit into from
Apr 23, 2022
Merged

Conversation

VSadov
Copy link
Member

@VSadov VSadov commented Apr 19, 2022

This is about regular (not collectible) assembly loads. Unsuccessful load leaves stuff behind that can accumulate rather quickly if the client, such as serializer, opportunistically loads various non-existant things.
The leak is a regression compared to .net FX.

As the load fails we leak both the managed and the native parts of the context into which we loaded nothing.

Fixes:#58093

@dotnet-issue-labeler
Copy link

I couldn't figure out the best area label to add to this PR. If you have write-permissions please help me learn by adding exactly one area label.

@ghost
Copy link

ghost commented Apr 19, 2022

Tagging subscribers to this area: @vitek-karas, @agocke, @VSadov
See info in area-owners.md if you want to be subscribed.

Issue Details

In progress.

This is about regular (not collectible) assembly loads. Unsuccessful load laves stuff behind that can accumulate rather quickly if the client, such as serializer, opportunistically loads various non-existant things.
The leak is a regression compared to .net FX.

Fixes:#58093

Author: VSadov
Assignees: VSadov
Labels:

area-AssemblyLoader-coreclr

Milestone: -

}

internal void RemoveFromAllContexts()
{
Dictionary<long, WeakReference<AssemblyLoadContext>> allContexts = AllContexts;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be the only functional change here - that we now remove the ALC from this dictionary. But that dictionary only has a weak ref to the ALC - so how come it caused a memory leak?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is In Progress. There will be changes on the native side as well. The managed side by itself only leaks dictionary entries and weak references (and weak handles with them).

@vitek-karas
Copy link
Member

I don't mind taking a targeted fix for the serialization scenario, but I would like to see a discussion of fixing this more broadly?
I though that one can get a leak by simply calling new AssemblyLoadContext (but I didn't really look into it in detail) - since that creates the native backing object which can't be freed (if the ALC is marked as non-collectible).

@VSadov
Copy link
Member Author

VSadov commented Apr 19, 2022

one can get a leak by simply calling new AssemblyLoadContext

I did not think about that as it seems more intentional than Assembly.LoadFile leaking on failures. Failing Assembly.LoadFile is not a new scenario and we used to not leak, so trying to load was a reasonable pattern.

I will think about this. It may be possible to fix this more generally. Perhaps we could do the cleanup of unused contexts in the finalizer.

@jkotas
Copy link
Member

jkotas commented Apr 19, 2022

The non-collectible ALCs are not designed to be destroyable. I do not think we should be adding overhead to non-collectible ALCs by trying to make them destroyable or collectible.

Would it be better to fix the LoadFile problem by caching the failed LoadFile ALCs for given path? Ie change s_loadfile collection to map a path to ALC, and keep retrying the load on the one ALC.

@jkotas
Copy link
Member

jkotas commented Apr 19, 2022

Assembly.LoadFile leaking on failures

I believe that .NET Framework also leaks on Assembly.LoadFile failures in the general case. It may work fine for the trivial case of non-existent file - we can add early file existence check to match that behavior.

@VSadov
Copy link
Member Author

VSadov commented Apr 19, 2022

If LoadFile uses different path every time we would still leak. Besides there could be scenarios when loading the second time from the same path should deterministically work (i.e. file appears). Remembering a failure might not be a good solution.

Empty not collectible contexts are relatively simple things since they do not have allocators. I think disposing them when they did not load anything could be doable.

@vitek-karas
Copy link
Member

I was thinking that we could fix it such that implementing the same pattern as LoadFile using public APIs would also not leak. But I guess it's not that important.

I think we should at least understand (and document) what the current behavior is in terms of leaking memory/resources.
Fixing LoadFile the way @jkotas suggests sounds reasonable as well. It's comparable to what happens if AssemblyLoadContext.LoadFromAssemblyName fails, we also cache the failure in some cases.

@vitek-karas
Copy link
Member

Remembering a failure might not be a good solution.

I think the idea was to just reuse the same ALC for the exact same path. We would still try the load every time.

@VSadov
Copy link
Member Author

VSadov commented Apr 19, 2022

Pre-probing for file existence in LoadFile seems like a good idea regardless.

@VSadov
Copy link
Member Author

VSadov commented Apr 19, 2022

I think the idea was to just reuse the same ALC for the exact same path. We would still try the load every time.

Do we even need to remember the path? Can we use the last ALC that failed?

@jkotas
Copy link
Member

jkotas commented Apr 19, 2022

Can we use the last ALC that failed?

I do not think you want to be reusing failed ALCs for different paths. The failed assembly can be stuck in the ALC already that interact in odd ways with anything new that you would load into it.

@VSadov
Copy link
Member Author

VSadov commented Apr 19, 2022

Hmm. If ALC remembers the failure, then can we safely reuse it at all ? - for the same path or different.

@vitek-karas
Copy link
Member

I think (this has always been a mystery to me) that the failure cache is only for "lookup by name" failures. Not for loading from a file path.

I agree that we should not reuse ALC from a different path. For example, what if we find and load the assembly, but then it fails because it has something weird in its header. Does the runtime actually free this fully? Ideally it should, but I would not trust it right now.

@jkotas
Copy link
Member

jkotas commented Apr 19, 2022

If the failure happens early during the binding, the runtime will free the assembly fully. Once we bind to a file and start actually loading it (ie going through

FILE_LOAD_CREATE,
FILE_LOAD_BEGIN,
FILE_LOAD_FIND_NATIVE_IMAGE,
FILE_LOAD_VERIFY_NATIVE_IMAGE_DEPENDENCIES,
FILE_LOAD_ALLOCATE,
FILE_LOAD_ADD_DEPENDENCIES,
FILE_LOAD_PRE_LOADLIBRARY,
FILE_LOAD_LOADLIBRARY,
FILE_LOAD_POST_LOADLIBRARY,
FILE_LOAD_EAGER_FIXUPS,
FILE_LOAD_DELIVER_EVENTS,
FILE_LOAD_VTABLE_FIXUPS,
FILE_LOADED, // Loaded by not yet active
FILE_ACTIVE // Fully active (constructors run & security checked)
), there is no way back.

@VSadov
Copy link
Member Author

VSadov commented Apr 19, 2022

@jkotas @vitek-karas I will think more about this and get back to you
What we have now:

  • pre-validating the file existence seems uncontroversial and might handle most cases for LoadFile. We may end up with just that
  • anything that is based on sameness of the path between loads feels like inviting trouble.
    There could be a different file at the same location deterministically. Besides what is the "same" path? - canonical form?, with symlinks? Treating path as identity often gets us in trouble.
  • In the load sequence there must be a point before which a failure leaves no traces. For managed assemblies (i.e. not IJW) we can get fairly far before it is irreversible (FILE_LOADED maybe?). We could identify such state and leak only if we reached it.
    I am not sure if this is feasible or worth the trouble. On one hand this would allow new AssemblyLoadContext to not automatically leak. On the other the solution with pre-probing of the path is a lot cheaper and might be enough.
  • There is also Assembly.Load(byte[]) scenario. File pre-probing will not apply, but it might be more acceptable to leak in that case.

@jkotas
Copy link
Member

jkotas commented Apr 19, 2022

pre-validating the file existence seems uncontroversial and might handle most cases for LoadFile

Agree.

anything that is based on sameness of the path between loads feels like inviting trouble.

Assembly.LoadFile caching has been always based on sameness of the path. I agree it is problematic, but it is what .NET Framework did and this API exists for .NET Framework compatibility.

@VSadov
Copy link
Member Author

VSadov commented Apr 22, 2022

While it may be possible in some cases to unload/destroy the unused load context, it appears to be complicated.
It would also depend on how far we got in the loader when the failure occured.

Based on the cost/risk vs. benefit, I think we should go with just probing for the file presence for now.

This would not preclude more involved fixes in the future - if we find that desirable.

@VSadov VSadov marked this pull request as ready for review April 22, 2022 15:00
@VSadov
Copy link
Member Author

VSadov commented Apr 23, 2022

Thanks!!

@VSadov VSadov merged commit 4fe6359 into dotnet:main Apr 23, 2022
@VSadov VSadov deleted the alcLeak branch April 23, 2022 20:30
@jkotas
Copy link
Member

jkotas commented Apr 25, 2022

This was merge with red CI. The CI failure (#68477) was introduced by this change. The failure is hitting all CI jobs. I am going to revert the change to stabilize the CI.

jkotas added a commit that referenced this pull request Apr 25, 2022
jkotas added a commit that referenced this pull request Apr 25, 2022
@VSadov
Copy link
Member Author

VSadov commented Apr 25, 2022

The Wasm browser tests did not look like ones that could be affected by the change (and being only ones affected). Turns out they could. :-\

@jkotas
Copy link
Member

jkotas commented Apr 25, 2022

If you are merging on red, it is a best practice to always link to issues that track the failures and open a new issue if the failure is not tracked. It will make you think twice about whether the failure is really unrelated to your changes.

@VSadov
Copy link
Member Author

VSadov commented Apr 25, 2022

I agree. I should have checked for existing issues.

The change is in the shared code though. I am still not sure how it could fail and specifically on Wasm.

@VSadov
Copy link
Member Author

VSadov commented Apr 25, 2022

Somehow it is possible to load files that do not exist?

@jkotas
Copy link
Member

jkotas commented Apr 25, 2022

wasm browser is a weird hybrid between single file and regular layout. Assemblies are accessible via fake file paths, but the files do not actually exist.

@VSadov
Copy link
Member Author

VSadov commented Apr 25, 2022

I think it would fail for single file too. That is unfortunate. Since we virtualize assembly files, we cant rely on physical file check.

@jkotas
Copy link
Member

jkotas commented Apr 25, 2022

I think it would fail for single file too

I do not think we are virtualizing files for regular single file.

@jkotas
Copy link
Member

jkotas commented Apr 25, 2022

The easiest way to deal with this problem may be to ifdef out the file existence check for browser.

@VSadov
Copy link
Member Author

VSadov commented Apr 25, 2022

I do not think we are virtualizing files for regular single file.

Right. We virtualize when loading by assembly name, but load by file name just opens a PEImage using its path. We could search in the bundle, but we do not.

I also noticed that mono implementation of InternalLoadFromPath patches \ in the file name to be a correct file separator. I think we will have to do that before checking for the file presence. And also under idef.

@VSadov
Copy link
Member Author

VSadov commented Apr 25, 2022

The new attempt at this change is at #68502

@ghost ghost locked as resolved and limited conversation to collaborators May 26, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants