Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow to inline MethodGroup/Lamda directly passed as an argument #10049

Open
dmitriyse opened this issue Mar 27, 2018 · 16 comments
Open

Allow to inline MethodGroup/Lamda directly passed as an argument #10049

dmitriyse opened this issue Mar 27, 2018 · 16 comments
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI optimization
Milestone

Comments

@dmitriyse
Copy link

This is an example where inlining logically possible, but currently not supported by JIT:

using System;
using System.Runtime.CompilerServices;

public class C {

    public void InlineTest() {
 		// This method should not call anything
        // Everything can be inlined
        DoWithGuardTwice(PrintHello);
        
        //DoWithGuardTwice(()=>{Console.WriteLine("bye");});
    }
    
    public void PrintHello()
    {
        Console.WriteLine("Hello");
    }
    
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public void DoWithGuardTwice(/*Probably with some hint here*/ Action a)
    {
        lock(this)
        {
            a();
            a();
        }
    }    
}

https://sharplab.io/#v2:C4LghgzgtgPgAgBgARwIwG4CwAoRLUB0ASgK4B2wAllAKYEDCA9lAA6UA2NATgMrcBulAMY0IWbDjgBmFACYk9JAG8cOJOpQy4AFiQBJMu0pkaAFVHAAFAEplagJD2A9E6SmAFpQhJawd4wATJAh/EnYgskZgJCEwdnYkMDIATz9jAHM1DWyXJABRfm5UzzJ0mKSkACMaJGMjEwCs7PUAEUYAdUo/AHESMC4A0wB3YRpLAAUuY2AACRp4xmtxZo0m5pc2zp6+geHRyxsAXgA+JTQATksAIkrkmiulgF8ltce1tekUXUnpuYWbNYqbArdQXa5/diMB7LDRvYGreHqADaAFkaH5AnpWOxLGiMQEsSx2AB5FhURhkCAEACC6XSXFEEEohQM9Qy1gAuh8tLpNl13L1+oMRiJLE4AFSTRiVMCVdjJJAjPzBZg1ErRdzcGji1xoeRgayAtbZSFCADWljSEENiOaQJBzTANhhDvUTpetuycK92RwcKAA===

This is a complex type of inlining. As an alternative this type of inlining was proposed to be implemented at the C# compiler level: dotnet/csharplang#1413

category:cq
theme:inlining
skill-level:expert
cost:medium

@AndyAyersMS
Copy link
Member

This is similar to #7948 and we lack compelling examples.

@mikedn
Copy link
Contributor

mikedn commented Mar 28, 2018

@AndyAyersMS As far as I can tell this is slightly different in that in requires to inline the 2 delegate calls as well. #7948 is basically a prerequisite.

@dmitriyse
Copy link
Author

Yes, this issue is not about "lock".

using System;
using System.Runtime.CompilerServices;

public class C {

    public void InlineTest() {
 		// This method should not call anything
        // Everything can be inlined
        DoTwice(PrintHello);
        
        //DoTwice(()=>{Console.WriteLine("bye");});
    }
    
    public void PrintHello()
    {
        Console.WriteLine("Hello");
    }
    
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public void DoTwice(/*Probably with some hint here*/ Action a)
    {
            a();
            a();
    }    
}

And also not about double call. Double call just simplifies to see inlining in the x86 ASM code.

@mikernet
Copy link
Contributor

mikernet commented Oct 2, 2019

I think this could be a big win in some situations. The inlining process already combines constants passed as parameters that are involved in arithmetic with other constants or other parameters passed as constants into a single constant value and that delegate is essentially passed as a constant method token so it could be treated similarly to eliminate the delegate invoke call.

@mikernet
Copy link
Contributor

mikernet commented Oct 2, 2019

As far as I know, methods with delegate calls don't get inlined at all though, so I guess that would need to be addressed first. I can't find info on why that is though. Is that still the case?

@AndyAyersMS
Copy link
Member

methods with delegate calls don't get inlined at all though

Yes, it seems odd. Something I'm looking into...

@AndyAyersMS
Copy link
Member

I've experimentally unblocked inlining of methods with delegate invokes in the jit. Not surprisingly this leads to more inlining. I am not yet sure if this leads to correctness issues, but setting that aside for the moment, we can at least assess the likely code size impact:

PMI Diffs for System.Private.CoreLib.dll, framework assemblies for x64 default jit
Summary:
(Lower is better)
Total bytes of diff: 28926 (0.07% of base)
    diff is a regression.
Top file regressions by size (bytes):
       13873 : CommandLine.dasm (3.12% of base)
        3433 : Microsoft.CodeAnalysis.dasm (0.20% of base)
        2486 : System.Private.Xml.dasm (0.07% of base)
        1801 : Microsoft.Diagnostics.Tracing.TraceEvent.dasm (0.06% of base)
        1300 : Microsoft.CodeAnalysis.VisualBasic.dasm (0.02% of base)
26 total files with size differences (0 improved, 26 regressed), 103 unchanged.
Top method regressions by size (bytes):
        2742 (79.04% of base) : CommandLine.dasm - Trial:Collect(ref):ref (7 methods)
        2742 (94.52% of base) : CommandLine.dasm - ResultExtensions:Collect(ref):ref (7 methods)
        2682 (50.15% of base) : CommandLine.dasm - ResultExtensions:Flatten(ref):ref (7 methods)
         762 ( 8.51% of base) : CommandLine.dasm - HelpText:AutoBuild(ref,ref,ref,bool,int):ref (7 methods)
         540 (21.98% of base) : CommandLine.dasm - Trial:Flatten(ref):ref (7 methods)
Top method improvements by size (bytes):
        -479 (-50.00% of base) : System.Private.CoreLib.dasm - StreamWriter:WriteLine(ref):this (2 methods)
         -60 (-18.52% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - TraceLog:CreateFromLttngTextDataFile(ref,ref,ref):ref
         -37 (-1.45% of base) : System.Net.Http.dasm - HttpKnownHeaderNames:TryGetHeaderName(ref,int,int,ref,ref,byref):bool
          -2 (-0.43% of base) : xunit.performance.execution.dasm - BenchmarkEventSource:BenchmarkIterationStop(ref,ref,int,long):this
Top method regressions by size (percentage):
          75 (1,500.00% of base) : NuGet.Configuration.dasm - PackageSourceProvider:<.ctor>b__9_0(ref,ref):this
          64 (914.29% of base) : System.Private.CoreLib.dasm - _ThreadPoolWaitOrTimerCallback:WaitOrTimerCallback_Context_f(ref)
          64 (640.00% of base) : System.Private.CoreLib.dasm - _ThreadPoolWaitOrTimerCallback:WaitOrTimerCallback_Context_t(ref)
          42 (150.00% of base) : System.Runtime.Serialization.Formatters.dasm - SerializationEvents:InvokeOnSerializing(ref,struct):this
          42 (150.00% of base) : System.Runtime.Serialization.Formatters.dasm - SerializationEvents:InvokeOnDeserializing(ref,struct):this
Top method improvements by size (percentage):
        -479 (-50.00% of base) : System.Private.CoreLib.dasm - StreamWriter:WriteLine(ref):this (2 methods)
         -60 (-18.52% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - TraceLog:CreateFromLttngTextDataFile(ref,ref,ref):ref
         -37 (-1.45% of base) : System.Net.Http.dasm - HttpKnownHeaderNames:TryGetHeaderName(ref,int,int,ref,ref,byref):bool
          -2 (-0.43% of base) : xunit.performance.execution.dasm - BenchmarkEventSource:BenchmarkIterationStop(ref,ref,int,long):this
337 total methods with size differences (4 improved, 333 regressed), 202607 unchanged.

The jit is not accurately modelling the size expansion that comes with delegate invocation and so we're probably a bit too aggressive here. But there aren't many opportunities so perhaps it doesn't matter, except where stylistic issues lead to heavy use of delegates as seems to be happening in the CommandLine assembly.

It appears from the above that the inlining of methods that invoke delegates rarely leads to knock-on optimizations. So at this point I still don't see great opportunity here.

While it's true that locally constructed and consumed delegates should be optimizable, there are a number of other issues we need to sort through first before we can tackle that.

@mikernet
Copy link
Contributor

mikernet commented Oct 3, 2019

I doubt there are many BCL opportunities here for optimization but for some library authors it could be very useful, even if only applied when aggressive inlining is requested. There are several places in my DB / entity library codebase that would benefit with this, particularly if/when the delegate call can be eliminated for a direct method call when passed as a constant method token argument.

@mikernet
Copy link
Contributor

mikernet commented Oct 4, 2019

Another good use case is a parameter validation library we've standardized on that can take delegates for exception message generation, i.e:

_count = count.ThrowIfOutOfRange(0, 10, nameof(count), GetLocalizedOutOfRangeMessage);

Even if the delegate call isn't eliminated in the "exception" case that's fine as that should be rare, but we use these throw helpers for all validated parameters in all our methods which adds up so we would like them inlined to eliminate method call overhead. We currently can't use the delegate versions in performance-sensitive areas.

Once again, this could be enabled only for methods marked for aggressive inlining as we do this proactively for such methods.

@AndyAyersMS
Copy link
Member

Can you point me at an example...? I agree that jit-diff might not see enough stylistic variety.

@mikernet
Copy link
Contributor

mikernet commented Oct 5, 2019

Sorry I'm not sure I understand, what would you like an example of? The parameter validator extension methods like the one above are probably the simplest and most obvious examples I can give. The other cases follow similar patterns, in the sense that they will also be inlinable if the example above is inlinable.

Are you just asking me to expand the above example to a complete code example or do you want something else?

@AndyAyersMS
Copy link
Member

Are you just asking me to expand the above example to a complete code example or do you want something else?

Sorry for the confusion. I'd be grateful if you could point me at a complete code example.

@mikernet
Copy link
Contributor

mikernet commented Oct 20, 2019

Hey sorry for the delay, got busy handling a few other priorities so I put this off.

For this particular case, we implemented a simple workaround by refactoring the code that calls the delegate into a separate method. That works here as the delegate call is not on a hot path, though I haven't had a chance to fully analyze some of the other places where this might be desirable yet. I imagine the benefit of inlining a method that has a delegate call on a hot path is going to be much smaller, possibly negligible. It's not too painful to refactor out the delegate call (especially with local methods) in the very few places where this level of micro-optimization makes sense so this is less of an issue for me at this point.

That said, a full example before refactoring would look something like this:

public static class ThrowExtensions
{
    [DebuggerHidden]
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static string ThrowIfNullOrWhiteSpace(this string value, string paramName, Func<string> whitespaceMessage)
    {
        if (value == null)
            throw new ArgumentNullException(paramName);

        if (string.IsNullOrWhiteSpace(value))
            throw new ArgumentException(whitespaceMessage?.Invoke(), paramName);

        return value;
    }

    [DebuggerHidden]
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static int ThrowIfOutOfRange(this int value, int min, int max, string paramName, Func<int, int, string> message)
    {
        if (value < min || value > max)
            throw new ArgumentOutOfRangeException(paramName, message?.Invoke(min, max));

        return value;
    }
}

public class SeniorCitizen
{
    public static int MinAllowedAge { get; set; } = 60;

    public static int MaxAllowedAge { get; set; } = 120;

    private string _name;
    private int _age;

    public string Name
    {
        get => _name;
        set => _name = value.ThrowIfNullOrWhiteSpace(nameof(value), GetNameRequiredMessage);
    }

    public int Age 
    {
        get => _age;
        set => _age = value.ThrowIfOutOfRange(MinAllowedAge, MaxAllowedAge, nameof(value), GetAgeOutOfRangeMessage);
    }

    // logic to pull messages out of resource strings or whatever

    private static string GetNameRequiredMessage();

    private static string GetAgeOutOfRangeMessage(int min, int max);
}

This is made up to simplify the example obviously. The performance for a case like this wouldn't need to be micro-optimized, but we have performance-sensitive code in our DB engine core that uses similar patterns.

Refactored code looks like:

    [DebuggerHidden]
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static int ThrowIfOutOfRange(this int value, int min, int max, string paramName, Func<int, int, string> message)
    {
        if (value < min || value > max)
            ThrowException(min, max, paramName, message);

        return value;

        // Note: workaround for delegate invoke preventing inlining and also reduces the size 
        // of code that is inlined, so leave this as a separate method.
        static void ThrowException(int min, int max, string paramName, Func<int, int, string> message) 
            => throw new ArgumentOutOfRangeException(paramName, message?.Invoke(min, max));
    }

@mikernet
Copy link
Contributor

mikernet commented Oct 20, 2019

Random side note: a neat optimization I noticed in the JIT assembly is that exception throwing code always seems to get shuffled to the bottom of methods so that non-exceptional execution flow doesn't need to jump...how cool is that?

@mikernet
Copy link
Contributor

mikernet commented Oct 20, 2019

FYI: In microbenchmarks on .NET Core 3, the simplest throw extension methods (i.e. ThrowIfNull) run ~3x faster when inlined, while the most complex methods (i.e. ThrowIfOutOfRange<T> which accepts an IComparer<T> parameter) run about 30% faster. Most fall somewhere in the middle in terms of performance benefit. We also run macro benchmarks on core DB code to test differences between various inlining strategies across the codebase and we see gains with inlining applied on these methods.

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@BruceForstall BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020
@ZacharyPatten
Copy link

ZacharyPatten commented Jun 24, 2021

It is already possible to achieve functional inlining via JIT reification with generics+structs, but it is rather verbose code.

// this top-level program compiles in .NET 5
C c = new();
c.DoWithGuardTwice<ConsoleWriteHelloWorld>();

interface IAction
{
	void Invoke();
}

class C
{
	public void DoWithGuardTwice<TAction>(TAction action = default)
		where TAction : struct, IAction
	{
		lock (this)
		{
			action.Invoke();
			action.Invoke();
		}
	}
}

struct ConsoleWriteHelloWorld : IAction
{
	public void Invoke() => System.Console.WriteLine("hello world");
}

Instead of using a delegate you use a pass in a struct that implements an IAction<T...>/IFunc<T...> to a generic parameter. Again... kinda verbose/ugly, but it works.

But what about passing the delegate into a constructor? I have use cases where I would want that to be inlined too, but I don't know if that would be possible.

For example, I have code that looks something like this:

public class HeapArray<T, TCompare>
	where TCompare : struct, IFunc<T, T, CompareResult>
{
	internal readonly TCompare _compare;
	internal T[] _array;
	internal int _count;

	public HeapArray(TCompare compare = default)
	{
		_compare = compare;
		_array = new T[1];
		_count = 0;
	}

	// code that eventually calls: _compare.Invoke
}

and the TCompare represents the comparison delegate/function, which is faster than using IEqualityComparer<T> like Dictionary<K, T> because the TCompare can be inlined during the JIT reification of HeapArray<T, TCompare>. But if I used a delegate on the constructor instead of a generic type like this:

public class HeapArray<T>
{
	internal readonly Func<T, T, int> _compare;
	internal T[] _array;
	internal int _count;

	public HeapArray(Func<T, T, int> compare)
	{
		_compare = compare;
		_array = new T[1];
		_count = 0;
	}

	// code that eventually calls: _compare.Invoke
}

I don't know if it would be possible for _compare.Invoke to be inlined by the JIT. If that is not possible, then the optimization of this issue is still a good optimization, but it is limited on what all it can inline.

I created this issue/discussion on csharplang which is an explicit syntax for this topic: dotnet/csharplang#2904

Would it be possible to inline delegate invocations that are passed into constructors and stored as readonly fields?

Even if it was possible would that even be allowed? Because if it was inlined then doing stuff like using reflection to alter the readonly field would not effect code that was inlined.

We need a clean syntax for inlining delegates/functions on types too (as opposed to just methods as this issue demonstrates in its example).

@BruceForstall BruceForstall removed the JitUntriaged CLR JIT issues needing additional triage label Jan 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI optimization
Projects
None yet
Development

No branches or pull requests

7 participants