Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get InvocationList of MulticastDelegate without any allocation #41849

Closed
RamType0 opened this issue Sep 4, 2020 · 63 comments · Fixed by #97683
Closed

Get InvocationList of MulticastDelegate without any allocation #41849

RamType0 opened this issue Sep 4, 2020 · 63 comments · Fixed by #97683
Labels
api-approved API was approved in API review, it can be implemented area-System.Runtime trimming-for-aot `EnableAggressiveTrimming=true` used for running tests with AOT
Milestone

Comments

@RamType0
Copy link

RamType0 commented Sep 4, 2020

Background and Motivation

Currently,we have an API to invocation list of MulticastDelegate,it is MulticastDelegate.GetInvocationList().

It allocates array every time we call it.
And maybe that's why we have internal HasSingleTarget property in it.

We need handsome,and performant API.

Proposed API

public class Delegate
{
    // Existing API
    // public virtual System.Delegate[] GetInvocationList();

    // Returns whether the delegate has a single target. It is a property to highlight that it is guaranteed 
    // to be a very cheap operation that is important for set of scenarios addressed by this proposal.
    // Alternative names: HasSingleTarget, IsTrueMulticast (inverted condition), HasSingleElementInvocationList
    public bool IsSingle { get; }

    // Returns length of the invocation list. It is a method to highlight that it may not be a very cheap operation.
    // This method is optional part of the proposal for completeness. It is fine to omit it from the approved shape.
    public int GetInvocationListLength();

    // Returns invocation list enumerator
    public static InvocationListEnumerator<TDelegate> EnumerateInvocationList<TDelegate>(TDelegate d) where TDelegate : Delegate;

    // The shape of this enumerator matches existing StringBuilder.ChunkEnumerator and Activity.Enumerator
    // Neither of these existing enumerators implement IEnumerable and IEnumerator. This can be changed. 
    // Q: How are we deciding whether enumerators like this should implement IEnumerable and IEnumerator?
    // Note that implementing these interfaces makes each instantiation of the type more expensive, so it is a tradeoff.
    public struct InvocationListEnumerator<TDelegate> where TDelegate : Delegate
    {
        public TDelegate Current { get; }
        public bool MoveNext();

        // EditorBrowsable to match existing StringBuilder.ChunkEnumerator and Activity.Enumerator
        [EditorBrowsable(EditorBrowsableState.Never)] // Only here to make foreach work
        public InvocationListEnumerator<TDelegate> GetEnumerator() => this;
    }
}

Original rejected proposal

public ReadOnlySpan<Delegate> InvocationList => delegates == null ? MemoryMarshal.CreateSpan(this,1) : delegates;
@RamType0 RamType0 added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Sep 4, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Sep 4, 2020
@Dotnet-GitSync-Bot
Copy link
Collaborator

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@GrabYourPitchforks
Copy link
Member

We need handsome,and performant API.

This doesn't really describe the motivation, just the desired solution. Where do you see this API being used? Do you anticipate those scenarios seeing a measurable perf benefit? That would help justify the work needed for this API.

Thanks!

@RamType0
Copy link
Author

RamType0 commented Sep 4, 2020

I have used this API to handle exception from each invoked delegate.

@GrabYourPitchforks
Copy link
Member

Would syntax like the following work for you instead? I don't think there's a reliable way to create a one-element ROS<Delegate> without allocating or introducing a new field to MulticastDelegate, neither of which would be desirable.

SomeDelegate myDelegate = GetDelegate();
foreach (Delegate innerDelegate in myDelegate)
{
    var castDelegate = (SomeDelegate)innerDelegate;
    // use 'castDelegate' here
}

You should also spend some time prototyping this to make sure that the performance is what you expect. In your own application, make sure that the cost of this allocation really is measurable. You can use FieldInfo.GetValue via reflection to access the private _invocationList instance field for the purpose of testing.

@RamType0
Copy link
Author

RamType0 commented Sep 4, 2020

Ah,I just mistaken that we need ref Delegate for CreateSpan,but not Delegate.
But there are still available solution.
alter type of MuticastDelegate.delegates to object,and just assign it self to it if it is not multicasting.
Then, implementation would become

public ReadOnlySpan<Delegate> InvocationList => delegates == this ? MemoryMarshal.CreateSpan<Delegate>(ref delegates,1) : Unsafe.As<Delegate[]>( delegates);

@GrabYourPitchforks
Copy link
Member

just assign it self to it if it is not multicasting.

That's probably not a viable solution. There's a lot of code in the runtime - both managed and unmanaged - that has knowledge about the exact layout of delegate instances and what all the different instance fields represent. It would be very risky (and possibly also a negative perf hit) to update all of those call sites. That's not a good tradeoff for a niche API.

@RamType0
Copy link
Author

RamType0 commented Sep 4, 2020

Aren't you confusing it with field of Delegate?
I saw many comment which is similar to what you say in Delegate.
But I have not seen such a comment in MulticastDelegate.

@GrabYourPitchforks
Copy link
Member

But I have not seen such a comment in MulticastDelegate.

See here for some examples. You can follow the call graphs backward and see that there's a decent amount of code which relies on this field containing a restricted set of possible values.

@RamType0
Copy link
Author

RamType0 commented Sep 5, 2020

Oh,maybe I was seeing mono's one.

@ericstj ericstj removed the untriaged New issue has not been triaged by the area owner label Sep 11, 2020
@ericstj ericstj added this to the 6.0.0 milestone Sep 11, 2020
@GrabYourPitchforks GrabYourPitchforks added the api-needs-work API needs work before it is approved, it is NOT ready for implementation label Nov 9, 2020
@bollhals
Copy link
Contributor

SomeDelegate myDelegate = GetDelegate();
foreach (Delegate innerDelegate in myDelegate)
{
    var castDelegate = (SomeDelegate)innerDelegate;
    // use 'castDelegate' here
}

Such syntax (as long as it is not allocating) would actually be very nice for when you'd have to invoke each delegate within a try catch block to ensure each gets called. So e.g.

SomeDelegate myDelegate = GetDelegate();
foreach (Delegate innerDelegate in myDelegate)
{
    try
    {
        // invoke single delegate target
        ((SomeDelegate)innerDelegate).Invoke()
    }
    catch (Exception exception)
    {
        //Do something with that exception
    }
}

As of right now we (Example) just call GetInvocationList() which always allocates. (It shows up already in measurements, although not yet high up in the list. Still working on reducing others first)

@RamType0
Copy link
Author

RamType0 commented Nov 11, 2020

I have just created benchmark project.
In this project,we have these variants of GetInvocationlist.

  • CoreCLRGetInvocationList
    Just a copy from current CoreCLR's implementation.

  • MyGetInvocationList
    My optimized implementation of GetInvocationList without altering or adding any API.

  • UnsafeGetInvocationList
    My optimized allocation less new API.
    This is unsafe because the result will become wrong if we access the result after altering the delegate.
    But if we use this correctly, it is very fast and safe.

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19041.508 (2004/?/20H1)
Intel Core i9-9900K CPU 3.60GHz (Coffee Lake), 1 CPU, 16 logical and 8 physical cores
.NET Core SDK=5.0.100
  [Host]     : .NET Core 5.0.0 (CoreCLR 5.0.20.51904, CoreFX 5.0.20.51904), X64 RyuJIT
  DefaultJob : .NET Core 5.0.0 (CoreCLR 5.0.20.51904, CoreFX 5.0.20.51904), X64 RyuJIT

Method N Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
CoreCLRGetInvocationList 1 10.774 ns 0.1214 ns 0.1077 ns 0.0038 - - 32 B
MyGetInvocationList 1 9.302 ns 0.1656 ns 0.1549 ns 0.0038 - - 32 B
UnsafeGetInvocationList 1 2.917 ns 0.0370 ns 0.0346 ns - - - -
CoreCLRGetInvocationList 10 82.291 ns 0.7804 ns 0.7299 ns 0.0124 - - 104 B
MyGetInvocationList 10 23.457 ns 0.4810 ns 0.4499 ns 0.0124 - - 104 B
UnsafeGetInvocationList 10 7.849 ns 0.0776 ns 0.0726 ns - - - -
CoreCLRGetInvocationList 100 732.493 ns 10.2674 ns 9.6041 ns 0.0982 - - 824 B
MyGetInvocationList 100 154.543 ns 3.0153 ns 4.2270 ns 0.0985 - - 824 B
UnsafeGetInvocationList 100 53.075 ns 0.5351 ns 0.4743 ns - - - -
CoreCLRGetInvocationList 1000 7,922.301 ns 100.5212 ns 89.1095 ns 0.9460 - - 8024 B
MyGetInvocationList 1000 1,405.915 ns 9.4829 ns 8.8703 ns 0.9575 - - 8024 B
UnsafeGetInvocationList 1000 676.750 ns 6.3342 ns 5.6151 ns - - - -

@GrabYourPitchforks
Copy link
Member

@RamType0 The benchmark code makes incorrect assumptions about the layout of that field, and it makes incorrect assumptions about how the C# language treats reachability of objects passed as in parameters. This could lead to reliability problems at runtime.

I'll re-up my comment from #41849 (comment). We could probably add an enumerator to allow this scenario if it's really that important, but creating a ROS<T> is likely a non-starter. For now, a workaround could be to call GetInvocationList once and cache the result.

@RamType0
Copy link
Author

RamType0 commented Nov 11, 2020

@RamType0 The benchmark code makes incorrect assumptions about the layout of that field, and it makes incorrect assumptions about how the C# language treats reachability of objects passed as in parameters. This could lead to reliability problems at runtime.

This caused just by MemoryMarshal.CreateReadOnlySpan.
Actually, C# compiler correctly handles this.

public static ref readonly T UnsafeGetInvocationListByRefAndCount<T>(in T d,out int count)
            where T : MyMulticastDelegate
        {
            var _invocationList = d._invocationList;
            if (_invocationList != null)
            {
                var invocationList = Unsafe.As<object[]>(_invocationList);
                count = (int)d._invocationCount;
                return ref Unsafe.As<object, T>(ref MemoryMarshal.GetArrayDataReference(invocationList));
            }
            else
            {
                count = 1;
                return ref d;
            }
        }

So, it is not a UnsafeGetInvocationList specific problem.
And also, this problem has been forgiven for MemoryMarshal.CreateReadOnlySpan, which is already approved public API.

I'll re-up my comment from #41849 (comment). We could probably add an enumerator to allow this scenario if it's really that important, but creating a ROS<T> is likely a non-starter. For now, a workaround could be to call GetInvocationList once and cache the result.

It is better than current API.
But we have thought that List.Enumerator is slow, and we have finally created CollectionsMarshal.AsSpan.
So I proposed API for creating ReadOnlySpan.

@bollhals
Copy link
Contributor

I'll re-up my comment from #41849 (comment). We could probably add an enumerator to allow this scenario if it's really that important, but creating a ROS<T> is likely a non-starter. For now, a workaround could be to call GetInvocationList once and cache the result.

In which use case (where you'd use this API) would a ROS actually be more beneficial than an enumerator?

@GrabYourPitchforks In your example you've set the type of the loop variable to Delegate, I assume it wouldn't be possible and/or safe to type that to the actual delegate type (e.g. foreach (Action<int> innerDelegate in myDelegate)?

@GrabYourPitchforks
Copy link
Member

@bollhals If it's an instance method, the type would be Delegate, and the caller would need to cast. (These casts should be very cheap.) If it's an extension method, it can probably be typed appropriately.

@Joe4evr
Copy link
Contributor

Joe4evr commented Nov 12, 2020

But we have thought that List.Enumerator is slow

Exactly what is slow about it?

@RamType0
Copy link
Author

But we have thought that List.Enumerator is slow

Exactly what is slow about it?

using foreach for List<T> is slower than using foreach for T[].
Then, we have finally created CollectionsMarshal.AsSpan.
This unsafe API was created just for performance.

@RamType0
Copy link
Author

@GrabYourPitchforks BTW, MyGetInvocationList is seems to be better than CoreCLRGetInvocationList without any API changes.
If I created PR for it, is it beneficial?

@BreyerW
Copy link

BreyerW commented Nov 12, 2020

If im not mistaken real reason was performance of list with large structs. Eg imagine a list with million Matrix4x4 structs. Each such matrix holds 16 floats making it pretty bulky and each such struct had to be copied every time a value was retrieved via foreach in our case meaning potentially 1 milion copies of relatively large struct. This can definitely tank performance. Class based cases are largely unaffected unless u wanted to swap references itself during iteration

@RamType0
Copy link
Author

If im not mistaken real reason was performance of list with large structs. Eg imagine a list with million Matrix4x4 structs. Each such matrix holds 16 floats making it pretty bulky and each such struct had to be copied every time a value was retrieved via foreach in our case meaning potentially 1 milion copies of relatively large struct. This can definitely tank performance. Class based cases are largely unaffected unless u wanted to swap references itself during iteration

At least, the posted benchmark uses int for element, which is smaller than "reference".

@Joe4evr
Copy link
Contributor

Joe4evr commented Nov 12, 2020

using foreach for List<T> is slower than using foreach for T[].

Being skeptical of a post made more than 10 years ago, I went and did a proper benchmark.


BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19041.572 (2004/?/20H1)
Intel Core i7-10610U CPU 1.80GHz, 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=5.0.100
  [Host]        : .NET Core 3.1.9 (CoreCLR 4.700.20.47201, CoreFX 4.700.20.47203), X64 RyuJIT
  .NET Core 3.1 : .NET Core 3.1.9 (CoreCLR 4.700.20.47201, CoreFX 4.700.20.47203), X64 RyuJIT
  .NET Core 5.0 : .NET Core 5.0.0 (CoreCLR 5.0.20.51904, CoreFX 5.0.20.51904), X64 RyuJIT
Method Job Runtime Mean Error StdDev Median
LargeStructList .NET Core 3.1 .NET Core 3.1 78.698 ns 0.8484 ns 0.7084 ns 78.545 ns
LargeStructArray .NET Core 3.1 .NET Core 3.1 28.893 ns 0.2721 ns 0.2546 ns 28.873 ns
ClassList .NET Core 3.1 .NET Core 3.1 73.004 ns 1.4792 ns 3.0217 ns 74.361 ns
ClassArray .NET Core 3.1 .NET Core 3.1 8.114 ns 0.0707 ns 0.0661 ns 8.104 ns
IntList .NET Core 3.1 .NET Core 3.1 42.759 ns 0.2340 ns 0.1954 ns 42.711 ns
IntArray .NET Core 3.1 .NET Core 3.1 7.497 ns 0.1566 ns 0.1465 ns 7.479 ns
LargeStructList .NET Core 5.0 .NET Core 5.0 70.169 ns 0.9476 ns 0.8400 ns 69.955 ns
LargeStructArray .NET Core 5.0 .NET Core 5.0 24.355 ns 0.5135 ns 0.9893 ns 24.142 ns
ClassList .NET Core 5.0 .NET Core 5.0 67.844 ns 0.6781 ns 0.6343 ns 67.873 ns
ClassArray .NET Core 5.0 .NET Core 5.0 6.542 ns 0.0661 ns 0.0586 ns 6.543 ns
IntList .NET Core 5.0 .NET Core 5.0 44.817 ns 0.7760 ns 0.7258 ns 44.625 ns
IntArray .NET Core 5.0 .NET Core 5.0 7.864 ns 0.2957 ns 0.8718 ns 7.717 ns

So I concede that lists are slower than a raw array, however I also don't think this difference is the end of the world, provided you're not literally on fire on a burning hot path. Now I'm very curious on where this difference comes from, since I was assuming it could get much closer to array performance than this.

@stephentoub
Copy link
Member

stephentoub commented Nov 12, 2020

Now I'm very curious on where this difference comes from

foreach on an array is desugared by the C# compiler into simple for loop. foreach on a list needs to go through its struct Enumerator, does version checks to make sure the list hasn't changed between iterations, etc.

e.g.
https://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA0AXEBDAzgWwB8ABAJgEYBYAKGIAYACY8lAbhpuIGYnSGBhBgG8aDMUx4BLAHYYGAQShRsATwAUMjAG0Aug0kYY+XAEpR4kdXHX9shrgCu+BgF4GddlZtiAZtBjYYAAWDBp2krb6hsYm9k4MANRukp7eYsQA7HH4qeIAvuZihRK2cgAykrgYahVVADyaAHxRRqbFlmml2a7uud5+sIEhYXIRMi0x3Un6fTaZ2bMF1HlAA===

@RamType0
Copy link
Author

RamType0 commented Nov 12, 2020

@Joe4evr
Thanks for posting the benchmark! I couldn't find benchmark in recent environment.

So I concede that lists are slower than a raw array, however I also don't think this difference is the end of the world, provided you're not literally on fire on a burning hot path. Now I'm very curious on where this difference comes from, since I was assuming it could get much closer to array performance than this.

"foreach" for span or array is converted to "for".
And also, the generated pattern of "for" eliminates bounds check to throwing IndexOutOfRangeException.
Both of these optimization is specific for array or span.

It is also seems that IntList is significant faster than ClassList.
Maybe this is because of array's covariant check.

@GrabYourPitchforks
Copy link
Member

The CollectionsMarshal class was added to allow efficient bulk operations over the list, such as bulk moving data or bulk modification of data. It wasn't added because of any perceived deficiency in List's enumerator.

We're getting very off-track here. If the request is specifically for projection of a MulticastDelegate as a ROS over its inner targets, then that is not feasible (as previously mentioned) and this issue should be closed. This issue provided several alternative designs, but if we don't have a consumer for those alternative APIs then this work will never rise above the cut line for any release.

@jkotas
Copy link
Member

jkotas commented Dec 13, 2023

However, if there is a value with exposing a property or method for GetInvocationList().Length that would be fine

@terrajobst What would be a good API name for that?

As @weltkante pointed out, special casing the single-cast delegate fast path is fairly common. It would be nice to introduce proper API to replace the solutions based on private reflection:

@terrajobst
Copy link
Member

terrajobst commented Dec 13, 2023

@jkotas

@terrajobst What would be a good API name for that?

The objection to exposing the proposed properties was raised by @GrabYourPitchforks, so I let him speak to the details here.

My understanding was that InvocationCount would return the number of elements the iterator would produce (equivalent to GetInvocationList().Length) and that the Boolean IsSingleInvocation would be equivalent to InvocationCount == 0, so those names seemed reasonable.

In order for me to propose better names is dependent upon understanding the desired semantics.

@jkotas
Copy link
Member

jkotas commented Dec 13, 2023

My understanding was that InvocationCount would return the number of elements the iterator would produce

Yes, that's correct.

Exposing InvocationCount with assumption that it is a fast O(1) operation locks us into the current internal delegate implementation. This assumption would be broken if we were to change the internal delegate implementation to a different data structure that does not allow fast O(1) length operation. (The internal data structure used by delegates was changed at least once in .NET runtime history.) This lock-in is not a showstopper, I am just pointing out the downside of exposing full InvocationCount when the API use cases only need to check for == 1.

@terrajobst terrajobst added api-needs-work API needs work before it is approved, it is NOT ready for implementation and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Dec 13, 2023
@terrajobst
Copy link
Member

Marking the API as I forgot to do that in the meeting 🤦

@halter73
Copy link
Member

I was concerned that Delegate.InvocationCount might be misconstrued to mean the number of times the Delegate has been invoked.

@jkotas
Copy link
Member

jkotas commented Dec 14, 2023

I was concerned that Delegate.InvocationCount might be misconstrued

Renamed to GetInvocationListLength() for clarity.

@jkotas
Copy link
Member

jkotas commented Dec 14, 2023

I have updated the proposal at the top with the feedback so far. Any comments?

@weltkante
Copy link
Contributor

I have updated the proposal at the top with the feedback so far. Any comments?

IsSingle is probably meant to be a boolean?

@jkotas
Copy link
Member

jkotas commented Dec 18, 2023

I think this is ready for next round of API review discussion.

@jkotas jkotas added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-needs-work API needs work before it is approved, it is NOT ready for implementation labels Dec 18, 2023
@bartonjs
Copy link
Member

bartonjs commented Jan 16, 2024

Video

  • We discussed whether the new enumerator needs to declare IEnumerable and/or IEnumerator, and we feel that's not necessary given that GetInvocationList() is already a Linq.Enumerable-friendly version of the data.
  • EnumerateInvocationList being static was questioned, and it was answered that it was to make the enumerated values be typed as the input type, rather than going back to just System.Delegate.
  • IsSingle got changed to the first alternative proposal of HasSingleTarget
  • GetInvocationListLength was removed, pending a more concrete need in the future.
public class Delegate
{
    // Existing API
    // public virtual System.Delegate[] GetInvocationList();

    // Returns whether the delegate has a single target. It is a property to highlight that it is guaranteed 
    // to be a very cheap operation that is important for set of scenarios addressed by this proposal.
    public bool HasSingleTarget { get; }

    // Returns invocation list enumerator
    public static InvocationListEnumerator<TDelegate> EnumerateInvocationList<TDelegate>(TDelegate d) where TDelegate : Delegate;

    // The shape of this enumerator matches existing StringBuilder.ChunkEnumerator and Activity.Enumerator
    // Neither of these existing enumerators implement IEnumerable and IEnumerator. This can be changed. 
    // Q: How are we deciding whether enumerators like this should implement IEnumerable and IEnumerator?
    // Note that implementing these interfaces makes each instantiation of the type more expensive, so it is a tradeoff.
    public struct InvocationListEnumerator<TDelegate> where TDelegate : Delegate
    {
        public TDelegate Current { get; }
        public bool MoveNext();

        // EditorBrowsable to match existing StringBuilder.ChunkEnumerator and Activity.Enumerator
        [EditorBrowsable(EditorBrowsableState.Never)] // Only here to make foreach work
        public InvocationListEnumerator<TDelegate> GetEnumerator() => this;
    }
}

@bartonjs bartonjs added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Jan 16, 2024
jkotas added a commit to jkotas/runtime that referenced this issue Jan 30, 2024
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Jan 30, 2024
jkotas added a commit that referenced this issue Jan 31, 2024
)

* Add APIs for allocation-free delegate invocation list inspection

Fixes #41849

* Apply suggestions from code review

Co-authored-by: Stephen Toub <stoub@microsoft.com>

* Update src/libraries/System.Private.CoreLib/src/System/Delegate.cs

* Return empty enumerator for null

* Perf tweaks

---------

Co-authored-by: Stephen Toub <stoub@microsoft.com>
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jan 31, 2024
@weltkante
Copy link
Contributor

I've wanted to try out the new APIs but they don't seem to appear in the daily builds, am I missing something? Sorry if there's known delays between PRs and daily builds, didn't see anything being mentioned.

@jkotas
Copy link
Member

jkotas commented Feb 9, 2024

Yes, .NET SDK runs behind dotnet/runtime. You can check https://github.com/dotnet/installer/blob/main/eng/Version.Details.xml#L30 to see the dotnet/runtime commit that is in the SDK.

@jkotas
Copy link
Member

jkotas commented Feb 9, 2024

Also https://github.com/dotnet/runtime/blob/main/docs/project/dogfooding.md#advanced-scenario---using-a-daily-build-of-microsoftnetcoreapp has instructions for how to get daily builds of dotnet/runtime without waiting for the bits to flow to .NET SDK.

@github-actions github-actions bot locked and limited conversation to collaborators Mar 12, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Runtime trimming-for-aot `EnableAggressiveTrimming=true` used for running tests with AOT
Projects
None yet
Development

Successfully merging a pull request may close this issue.