Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.
/ corefx Public archive

MetadataLoadContext feature #33201

Merged
merged 21 commits into from
Nov 8, 2018
Merged

MetadataLoadContext feature #33201

merged 21 commits into from
Nov 8, 2018

Conversation

steveharter
Copy link
Member

@steveharter steveharter commented Nov 1, 2018

This feature adds support to perform read-only reflection over assemblies without loading them into or interfering with the runtime\AppDomain context. The assemblies can be of a different architecture and may target a different runtime version. It is a replacement (or alternative to) Assembly.ReflectionOnlyLoad since that does not exist in .NET Core.

Fixes #2800

The assembly will ship OOB targeting .NET Core. A netstandard package for use with .NET Framework will be available as well.

Public API:

namespace System.Reflection
{
    public abstract class MetadataAssemblyResolver
    {
        public abstract Assembly Resolve(MetadataLoadContext context, AssemblyName assemblyName = null);
    }

    public class PathAssemblyResolver : MetadataAssemblyResolver
    {
        public PathAssemblyResolver(IEnumerable<string> assemblyPaths);
        public override Assembly Resolve(MetadataLoadContext context, AssemblyName assemblyName);
    }

    public sealed partial class MetadataLoadContext : System.IDisposable
    {
        public MetadataLoadContext(MetadataAssemblyResolver resolver, string coreAssemblyName = null);
        public Assembly CoreAssembly { get; }
        public void Dispose();
        public System.Collections.Generic.IEnumerable<Assembly> GetAssemblies();
        public Assembly LoadFromAssemblyName(AssemblyName assemblyName);
        public Assembly LoadFromAssemblyName(string assemblyName);
        public Assembly LoadFromAssemblyPath(string assemblyPath);
        public Assembly LoadFromByteArray(byte[] assembly);
        public Assembly LoadFromStream(System.IO.Stream assembly);
    }
}

API notes:

  • MetadataLoadContext will attempt to find the core assembly if coreAssemblyName is not passed into the constructor. It uses "mscorlib", "System.Runtime" and "netstandard". If the core assembly is not found an exception is raised.

Background:

  • Code originally pulled from corefxlab repo (authored by Atsushi Kanamori). Basically since then the code has changed the public API changed, tests added, misc clean-up, and made compatible with corefx build systems.
  • The API documentation at is currently being updated to reflect the latest API and semantics and will also moved to corefx.

ToDos based on feedback:

@steveharter steveharter added the * NO MERGE * The PR is not ready for merge yet (see discussion for detailed reasons) label Nov 1, 2018
@steveharter steveharter self-assigned this Nov 1, 2018
netcoreapp
</PackageConfigurations>
<BuildConfigurations>
netstandard;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the review we said we'd make a netstandard ref assembly but implement net4x and netcoreapp with netstandard implementation throwing platformnotsupported. This would eliminate the type delegator hack.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That said, if the hack works, I don't really have an issue with it.

Copy link
Member Author

@steveharter steveharter Nov 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hack does work, although not as efficient from directly deriving from TypeInfo because of an extra (unused) field in TypeDelegator.

IMO keeping it in netstandard would be simpler. If we change netstandard in the future to expose a protected ctor on TypeInfo that would also address this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ericstj is a netstandard assembly preferred (in general) when we only have scenarios for using it with netfx, or does it make sense just to target netfx instead?

In this case, targeting netfx makes a case for removing a hack that is present in netstandard, but not sure what we would be giving up by not also supporting netstandard or making that a PNS assembly. For example, a third party assembly or tooling like Xamarin may only want to target netstandard for simplicity. I suppose we could support all three (netcoreapp, standard and netfx) if we had to.

foreach (string path in assemblyPaths)
{
string file = Path.GetFileNameWithoutExtension(path);
_fileToPaths.Add(file, path);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this handle same file name, different strong name/version?

Copy link
Contributor

@nguerrera nguerrera Nov 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is more than one possible binding policy for a set of assembly paths.

We may want to give some control to the user on it with knobs or virtual.

With or without knobs, a good default would be to study what csc does and mimic it as much as possible. A common use case for this API is to use the same references as csc and it would be desirable for this and csc to bind them the same way.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nguerrera for mainstream scenarios, I don't see the same assembly by name being loaded more than once for inspection purposes, or care to distinguish strong names. .NET Core, for example, works well with the current approach as both take a simple list of paths to assemblies.

Do you have any (mainstream-possible) scenarios?

Options:

  1. Do nothing; someone could add their own AssemblyNameAndDirectoryResolver that takes a list of AssemblyNames and a corresponding list of directories.
  2. Add a new constructor such as:
    public PathAssemblyResolver(IEnumerable<string> assemblyNames, IEnumerable<string> directories)
    where each assembly name can be simple (just name) or strong (including version, publickeytoken, culture).

However, for incoming calls to Resolve() we'd have to add policy to determine whether to key off the exact version + publickeytoken + culture, or be flexible and accept >= version, simple name only, etc.

Copy link
Contributor

@nguerrera nguerrera Nov 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

someone could add their own AssemblyNameAndDirectoryResolver that takes a list of AssemblyNames and a corresponding list of directories.
add a new constructor such as: public PathAssemblyResolver(IEnumerable<string> assemblyNames, IEnumerable<string> directories)

I don't understand why the inputs to the resolver would have to change to support distinct versions / strong names.

we'd have to add policy to determine whether to key off the exact version + publickeytoken + culture, or be flexible and accept >= version, simple name only, etc

This is the hard part for sure. It is exactly what I meant with possible "knobs".

I don't think there's much value in this resolver if we're just going to ignore the edge cases and write the same naive code that anyone could write themselves.

At a minimum, I think:

  1. We should document the restriction.
  2. We should handle the unsupported case more deliberately than letting the dictionary explode when two files have the same simple name.

But I'd prefer that we had parity with csc in how we bind and also in identifying the core assembly from the set even if it does not have one of the well-known names. My hope was that for build scenarios, you could just pass @(ReferencePath) in msbuild and get that.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just want to chime in here, that a more sophisticated default resolver would be greatly useful for anyone doing anything but the simplest of assembly loading. Mono.Cecil's resolver is quite simple, and only handles a few default cases … so it's been quite a task to write one from scratch for my project. I would implore that @nguerrera's thoughts here be considered 😄 👍

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why the inputs to the resolver would have to change to support distinct versions / strong names.

They wouldn't have to change - just an example of a very explicit model from the caller without the resolver having to inspect each assembly...

Closer to what you were thinking would be to keep the same constructor, but use a list not a hashtable. The Resolve() method would then use any "knobs" to control what it needs to do (like version >=, pick closest culture, etc) and will have to load\inspect each assembly (with same name) to determine best fit.

In either case though, yes I will add your two minimum asks (documentation, better "duplicate assembly" exception).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joelmartinez are you referring to matching run-time binding semantics or having better inspection-only\reflection-only binding semantics?

Since the runtime assembly binding logic in CoreClr (binding redirects, default search locations) is quite differnet from NetFx (explicit paths determined by parsing deps.json files), yes it is not easy to write a general-purpose binder that matches runtime semantics. Here's one request to expose the internal functionality.

//
internal abstract class LeveledTypeInfo :
#if netstandard
TypeDelegator
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the hack I was referring to.

@marek-safar
Copy link
Contributor

Why does MetadataLoadContext implement System.IDisposable? This seems to be redundant.

I'd also suggest moving the corlib resolution to MetadataAssemblyResolver possibly as a new virtual method e.g. ResolveCoreAssembly. Current design hardcodes a few predefines core assemblies but does not allow to override the logic. For example what if someone wants to load them in a different order or each of them from a different location, etc

@nguerrera
Copy link
Contributor

IDisposable is used because it holds locks on files. They are memory mapped and data is pulled from them lazily.

I like the virtual resolve core assembly. (Aside: Default does allow you to find each if them in arbitrarily different places, but not to probe in a different order or for a different name.)

The path resolver could avoid probing for names that are not its set. It could also fall back to scanning entire set for referenceless assembly with System.Object typedef as csc does.

@marek-safar
Copy link
Contributor

IDisposable is used because it holds locks on files. They are memory mapped and data is pulled from them lazily.

Where is that? I found the IDisposable interface to be present only in reference code not in the actual implementation

pkg/descriptions.json Outdated Show resolved Hide resolved
//
public abstract bool ImplicitlyOverrides(M baseMember, M derivedMember);

//
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally these would be <summary> as well. I think it's fine to just open an issue and perhaps someone will clean these up later with a big regex.

<Compile Include="System\Reflection\TypeLoading\Types\RoType.TypeClassification.cs" />
<Compile Include="System\Reflection\TypeLoading\Types\RoWrappedType.cs" />
</ItemGroup>
<ItemGroup>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a condition for netcoreapp only:

<ItemGroup Condition="'$(TargetGroup)' == 'netcoreapp'"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have property for this: <ItemGroup Condition="$(TargetsNetCoreApp)"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joperezr could this cause compilation errors for netstandard if some of the assemblies referenced are not part of netstandard? The latest build did get some errors but at this point in time not sure what caused those. What does this condition do except remove unnecessary references to assemblies in the netstandard case?

/// The assembly name does not need to include the assembly file extension (.dll). If an extension exists
/// it is removed.
/// </remarks>
/// <exception cref="System.ArgumentException">Thrown when the assembly simple name already exists.</exception>
Copy link
Contributor

@nguerrera nguerrera Nov 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be on the constructor, not the class.

Also, it should say that the assemblyPaths must not have two files with the same simple name.

Finally, I'd prefer if the exception was thrown explicitly instead of letting dictionary throw, which would look like a bug in the call stack. And assemblyPaths should probably be the paramName.

(I am still nervous about this restriction. It will break valid (not common, but still valid) projects to insist that all assemblies on @(ReferencePath) have a unique file-name-without-extension so I'm not sure this API addresses the build scenario correctly and I thought that was the purpose of it. I'm almost tempted to say let's not have it if we're just going to do the trivial case.)

Copy link
Member Author

@steveharter steveharter Nov 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will clarify the comments.

An explicit throw here will cause a tiny perf hit since it must call Contains - it would be nice if there's a TryAdd() method...

What do you suggest to fix the restriction? If we want to emulate the CoreClr host (to a 99% level) then we need to load each assembly (that has the same simple name) and pick the highest assemblyversion+fileversion. The simplest policy would be to just use the first assembly encountered. To add more flexibility we could also place the default policy in a protected virtual method ResolveDuplicate that could be overridden

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to emulate the CoreClr host

This API is geared towards compilers/tools/ It should be taking Roslyn as the gold standard, not CoreClr host. What does Roslyn do for duplicates?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is what I've been saying:

#33201 (comment)

a good default would be to study what csc does and mimic it as much as possible. A common use case for this API is to use the same references as csc and it would be desirable for this and csc to bind them the same way.

How about this: we cut PathAssemblyResolver from the initial check-in and open an issue to track adding it. I would prefer shipping v1 without it than picking this policy without exploring csc parity. This policy is trivial to implement yourself so we're not holding much value back. I think it's probably enough work to match csc to treat as a distinct feature/review.

This looks like it might be the Roslyn code:

http://source.roslyn.io/#Microsoft.CodeAnalysis/ReferenceManager/CommonReferenceManager.Resolution.cs,99d93dd36939a602,references

Note that the warning it mentions, is usually disabled, and I've suggested removing it from the compiler:

dotnet/roslyn#19640

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding @jaredpar and @tmat: Is that the right source of truth for how the compiler binds assembly references?

Copy link
Member Author

@steveharter steveharter Nov 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK with removing it for now. Anecdotal evendence shows that the compiler will pick the last assembly reference given same version. And it only bases conflicts on the actual assembly name internally not the physical file name.

However it is more complicated and msbuild may interact in addition to Roslyn\csc:
CS1703:"Multiple assemblies with equivalent identity have been imported"
CS1704 C# An assembly with the same simple name has already been imported. Try removing one of the references (e.g.) or sign them to enable side-by-side.
warning MSB3243: No way to resolve conflict between FOO and BAR. Choosing FOO arbitrarily

Copy link
Contributor

@nguerrera nguerrera Nov 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And it only bases conflicts on the actual assembly name internally not the physical file name.

Then we should do the same.

msbuild may interact in addition to Roslyn\csc:
warning MSB3243: No way to resolve conflict between FOO and BAR. Choosing FOO arbitrarily

We should ignore anything higher than csc in the build for the specification here. The rest of the msbuild targets resolve references down to paths. Once you have a set of paths as here, csc is the gold standard. The immediate use case we have (and other build tools will have) for this API is to pass the same @(ReferencePath) that csc gets. We should interpret it the same way.

CS1703:"Multiple assemblies with equivalent identity have been imported"

This is for the case where the full strong name is duplicated. We can error for that too.

CS1704 C# An assembly with the same simple name has already been imported. Try removing one of the references (e.g.) or sign them to enable side-by-side.

We can match this too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will cut PathAssemblyResolver for now until we can properly add requirements.

@steveharter
Copy link
Member Author

@jkotas @nguerrera Any other feedback before approval? There are two Todos linked in the description (dynamic\runtime assembly support and PathAssemblyResolver) - I'll start on the PathAssemblyResolver ASAP

Copy link
Member

@ahsonkhan ahsonkhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for the late feedback, but some of the changes in #33734 brought my attention to this PR, so I thought I would review it here.

<Import Project="..\Directory.Build.props" />
<PropertyGroup>
<AssemblyVersion>4.0.0.0</AssemblyVersion>
<AssemblyKey>OPEN</AssemblyKey>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be <StrongNameKeyId>Open</StrongNameKeyId>?

We don't use AssemblyKey anywhere else in the repo.

cc @ericstj

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears AssemblyKey was renamed to StrongNameKeyId about 5 weeks ago, so I will change it.

<Import Project="$([MSBuild]::GetDirectoryNameOfFileAbove($(MSBuildThisFileDirectory), Directory.Build.props))\Directory.Build.props" />
<ItemGroup>
<ProjectReference Include="..\ref\System.Reflection.MetadataLoadContext.csproj">
<SupportedFramework>uap10.0.16299;net461;netcoreapp2.0;$(AllXamarinFrameworks)</SupportedFramework>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this particular version of uap? Should it instead of UAPvNextTFM?

cc @joperezr

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the literal and now using the variable

</ProjectReference>
<ProjectReference Include="..\src\System.Reflection.MetadataLoadContext.csproj" />
</ItemGroup>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove extra line

<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<ProjectGuid>{EC84D608-71BC-4FE4-86C8-CA30F3B0CC7D}</ProjectGuid>
<AllowUnsafeBlocks>true</AllowUnsafeBlocks>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is unsafe about the reference assembly? I don't think we need this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed it

<PropertyGroup>
<ProjectGuid>{EC84D608-71BC-4FE4-86C8-CA30F3B0CC7D}</ProjectGuid>
<AllowUnsafeBlocks>true</AllowUnsafeBlocks>
<CLSCompliant>false</CLSCompliant>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it not compliant? What API(s) required this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed it

// base types. This is used when the binding flags passed to a Get() api limit the search to the immediate type only in order to avoid triggering
// unnecessary assembly resolving and a lot of unnecessary ParameterInfo creation and comparison checks.
//
internal sealed class QueriedMemberList<M> where M : MemberInfo
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: TMemberInfo instead of M?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes changing that would better adhere, but this code is also duplicated on corect, so I'd prefer to keep it consistent for now for easier diffing.

return false;
}

public M Current
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is AggressiveInlining necessary here?

public M Current => _queriedMembers[_index];

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, a corert idiom. I'll note these for any possible future sync\sharing.

return Array.Empty<M>();

M[] newArray = new M[count];
CopyTo(newArray, 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be wrong but there seems to be some redundant/unnecessary work going on here.

We declare an array of size Count and to find count we are looping through UnfilteredCount _queriedMembers. We then call copy which does the exact same loop again. Can this be written more optimally?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does appear at least the calls to .Matches are redundant and methods could be factored to remove that. I'll add this to corert future.

}
else if (_capacity == _count)
{
int newCapacity = 2 * _capacity;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible for this to overflow?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not in the current usages; no list will come close to 2 billion (2* _capacity=1 billion); also in corert

<ProjectGuid>{443B7659-B3FA-4B32-88BA-3A0517E21018}</ProjectGuid>
<AllowUnsafeBlocks>true</AllowUnsafeBlocks>
<Configurations>netcoreapp-Debug;netcoreapp-Release;netstandard-Debug;netstandard-Release</Configurations>
<LangVersion>7.2</LangVersion>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be necessary. We already point to latest:

<LangVersion>latest</LangVersion> <!-- default to allowing all language features -->

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was necessary at the time it was added when building+running the tests from VS since it didn't target the latest yet. However, it is now no longer necessary so I removed it.

<value>It is illegal to request the default value on a ParameterInfo loaded by a MetadataLoadContext. Use RawDefaultValue instead.</value>
</data>
<data name="BadImageFormat_TypeRefBadScopeType" xml:space="preserve">
<value>Assembly '{0}' contains a type reference 0x{1} that contains an invalid scope token.</value>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Consider removing 0x from the exception text and instead format it in place.

https://github.com/dotnet/corefx/pull/33201/files#diff-8807fb1ace13b51b12fb6b921d056d27R123

throw new BadImageFormatException(SR.Format(SR.BadImageFormat_TypeRefBadScopeType, module.Assembly.FullName, handle.GetToken().ToString("x8")));

Then becomes:

throw new BadImageFormatException(SR.Format(SR.BadImageFormat_TypeRefBadScopeType, module.Assembly.FullName, $"0x{handle.GetToken():x8}"));

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thanks

@steveharter
Copy link
Member Author

@ahsonkhan I will address your feedback the second week of December. Thanks

picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
10 participants