Pipeline optimizations #6237

Merged
merged 8 commits into from
Feb 3, 2022

@danielmarbach danielmarbach commented Jan 27, 2022

TLDR:

  • Up to 5 times faster pipeline execution
  • All closure allocations for the next calls during pipeline execution are eliminated (= zero Gen0 garbage)
  • Up to 30% faster when the pipeline throws exceptions
  • All closure allocations for the next calls during pipeline exceptions are eliminated (= zero Gen0 garbage)
  • Switches over to the FastExpressionCompiler source code package

Today, even a simple pipeline like

    var behavior1 = new MyBehavior();
    var behavior2 = new MyBehavior();
    var behavior3 = new MyBehavior();
         
    var context = new MyBehaviorContext();
         
    await behavior1.Invoke(context, ctx1 => behavior2.Invoke(ctx1, ctx2 => behavior3.Invoke(ctx2, ctx3 => Task.CompletedTask)));

is lowered by the compiler to the following code

    <>c__DisplayClass0_0 <>c__DisplayClass0_ = new <>c__DisplayClass0_0();
    MyBehavior myBehavior = new MyBehavior();
    <>c__DisplayClass0_.behavior2 = new MyBehavior();
    <>c__DisplayClass0_.behavior3 = new MyBehavior();
    MyBehaviorContext context = new MyBehaviorContext();
    awaiter = myBehavior.Invoke(context, new Func<IBehaviorContext, Task>(<>c__DisplayClass0_.<Invoke>b__0)).GetAwaiter();

    [CompilerGenerated]
    private sealed class <>c__DisplayClass0_0
    {
        public MyBehavior behavior2;

        public MyBehavior behavior3;

        public Func<IBehaviorContext, Task> <>9__1;

        internal Task <Invoke>b__0(IBehaviorContext ctx1)
        {
            return behavior2.Invoke(ctx1, <>9__1 ?? (<>9__1 = new Func<IBehaviorContext, Task>(<Invoke>b__1)));
        }

        internal Task <Invoke>b__1(IBehaviorContext ctx2)
        {
            return behavior3.Invoke(ctx2, <>c.<>9__0_2 ?? (<>c.<>9__0_2 = new Func<IBehaviorContext, Task>(<>c.<>9.<Invoke>b__0_2)));
        }
    }

Notice the display class and Func allocations throughout the pipeline; they happen on every pipeline invocation. This PR introduces an internal backing property on the context bag that holds all the behaviors for a given pipeline stage, or for the parent stage when the current stage has no pipeline of its own. The pipeline then extracts the behaviors from the context instead of capturing them in closures. In essence, the PR turns the above pseudocode from

await behavior1.Invoke(context, ctx1 => behavior2.Invoke(ctx1, ctx2 => behavior3.Invoke(ctx2, ctx3 => Task.CompletedTask)));

into

await behavior1.Invoke(context, ctx1 => ((MyBehavior)ctx1.Extensions.Behaviors[1]).Invoke(ctx1, ctx2 => ((MyBehavior)ctx2.Extensions.Behaviors[2]).Invoke(ctx2, ctx3 => Task.CompletedTask)));

which then leads to the following code being compiled

    MyBehavior myBehavior = new MyBehavior();
    MyBehavior myBehavior2 = new MyBehavior();
    MyBehavior myBehavior3 = new MyBehavior();
    MyBehaviorContext myBehaviorContext = new MyBehaviorContext();
    ContextBag extensions = myBehaviorContext.Extensions;
    IBehavior[] array = new IBehavior[3];
    array[0] = myBehavior;
    array[1] = myBehavior2;
    array[2] = myBehavior3;
    extensions.Behaviors = array;
    awaiter = myBehavior.Invoke(myBehaviorContext, <>c.<>9__0_0 ?? (<>c.<>9__0_0 = new Func<IBehaviorContext, Task>(<>c.<>9.<Invoke>b__0_0))).GetAwaiter();

    private sealed class <>c
    {
        public static readonly <>c <>9 = new <>c();

        public static Func<IBehaviorContext, Task> <>9__0_2;

        public static Func<IBehaviorContext, Task> <>9__0_1;

        public static Func<IBehaviorContext, Task> <>9__0_0;

        internal Task <Invoke>b__0_0(IBehaviorContext ctx1)
        {
            return ((MyBehavior)ctx1.Extensions.Behaviors[1]).Invoke(ctx1, <>9__0_1 ?? (<>9__0_1 = new Func<IBehaviorContext, Task>(<>9.<Invoke>b__0_1)));
        }

        internal Task <Invoke>b__0_1(IBehaviorContext ctx2)
        {
            return ((MyBehavior)ctx2.Extensions.Behaviors[2]).Invoke(ctx2, <>9__0_2 ?? (<>9__0_2 = new Func<IBehaviorContext, Task>(<>9.<Invoke>b__0_2)));
        }

        internal Task <Invoke>b__0_2(IBehaviorContext ctx3)
        {
            return Task.CompletedTask;
        }
    }

Notice that the display class and Func allocations are gone. The changes introduced here are safe because the order of the behaviors is calculated once and the behaviors are then stored in that order in an array. Both the position of a behavior in the array and the type of the behavior are known at "baking" time of the expression tree, so the lookup can be expressed by a constant integer index into the array.
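To make the "baking" step more concrete, here is a minimal sketch of how such a chain could be built with expression trees. It is not the code from this PR: SketchContext, LoggingBehavior and PipelineBaker are hypothetical stand-ins, the behaviors array is exposed as a plain public field, and the standard Compile() is used instead of CompileFast() to keep the sketch dependency-free.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;
using System.Threading.Tasks;

// Hypothetical stand-ins for the real pipeline types; names are illustrative only.
public interface IBehavior { }

public sealed class SketchContext
{
    // Assumed shape: the ordered behaviors for the current pipeline.
    public IBehavior[] Behaviors = Array.Empty<IBehavior>();
}

public sealed class LoggingBehavior : IBehavior
{
    public int Calls;

    public Task Invoke(SketchContext context, Func<SketchContext, Task> next)
    {
        Calls++;
        return next(context);
    }
}

public static class PipelineBaker
{
    // Builds: ctx => ((TBehavior)ctx.Behaviors[index]).Invoke(ctx, next)
    // The index and the behavior type are baked into the tree as constants,
    // so the compiled delegate does not capture any behavior instance.
    public static Func<SketchContext, Task> Bake(Type behaviorType, int index, Func<SketchContext, Task> next)
    {
        MethodInfo invoke = behaviorType.GetMethod("Invoke")
            ?? throw new ArgumentException("Behavior type needs an Invoke method.", nameof(behaviorType));

        ParameterExpression ctx = Expression.Parameter(typeof(SketchContext), "ctx");
        Expression behaviors = Expression.Field(ctx, nameof(SketchContext.Behaviors));
        Expression element = Expression.ArrayIndex(behaviors, Expression.Constant(index));
        Expression cast = Expression.Convert(element, behaviorType);
        Expression call = Expression.Call(cast, invoke, ctx, Expression.Constant(next));
        return Expression.Lambda<Func<SketchContext, Task>>(call, ctx).Compile();
    }

    // Bakes the whole chain back to front, once, from the behavior types alone.
    public static Func<SketchContext, Task> BakeAll(Type[] behaviorTypes)
    {
        Func<SketchContext, Task> next = _ => Task.CompletedTask;
        for (int i = behaviorTypes.Length - 1; i >= 0; i--)
        {
            next = Bake(behaviorTypes[i], i, next);
        }
        return next;
    }
}
```

In this sketch BakeAll would be called once per pipeline, and the resulting delegate is then invoked with a fresh SketchContext whose Behaviors array holds the instances in the same order as the types passed to BakeAll.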

Every pipeline execution creates fresh context instances, so the behaviors array has to be assigned to the context again on each execution to make that state available while the pipeline runs. Parent walking is necessary for pipeline stages that are part of a parent pipeline.
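The following is a rough sketch of the reassignment and parent walking, under assumptions: SketchContextBag and SketchPipeline are hypothetical simplifications, the fallback is done lazily in the property getter (the real implementation may resolve the parent differently), and the property is public here only so the sketch can be exercised, whereas the PR describes it as internal.

```csharp
using System;

public interface IBehavior { }

public sealed class NoopBehavior : IBehavior { }

// Hypothetical simplification of a context bag with a parent chain.
public class SketchContextBag
{
    readonly SketchContextBag parent;
    IBehavior[] behaviors;

    public SketchContextBag(SketchContextBag parent = null) => this.parent = parent;

    // Returns the behaviors of this stage, or walks up to the nearest parent
    // that has behaviors assigned when this stage has none of its own.
    public IBehavior[] Behaviors
    {
        get => behaviors ?? parent?.Behaviors ?? Array.Empty<IBehavior>();
        set => behaviors = value;
    }
}

public sealed class SketchPipeline
{
    // Computed once, already in execution order, and shared by all executions.
    readonly IBehavior[] orderedBehaviors;

    public SketchPipeline(IBehavior[] orderedBehaviors) => this.orderedBehaviors = orderedBehaviors;

    // Each execution gets a fresh bag, so the shared array is assigned again.
    // Only a reference is copied; no new array is allocated per execution.
    public SketchContextBag Prepare(SketchContextBag parent = null)
    {
        var bag = new SketchContextBag(parent);
        bag.Behaviors = orderedBehaviors;
        return bag;
    }
}
```

A child bag created with new SketchContextBag(parentBag) and no behaviors of its own would resolve Behaviors to the parent's array in this sketch.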

Alternatives to exposing the state

The extension bag cannot be used directly because the access to the extension dictionary and the associated allocations would undermine the performance improvements.

Further tweaks

Faster casting

It would be possible to shave off some of the casting cost by using Unsafe.As, for example by replacing

((MyBehavior)ctx2.Extensions.Behaviors[2])

with

Unsafe.As<MyBehavior>(ctx2.Extensions.Behaviors[2])

image

which could lead to further speed improvements as shown below

image

Yet, as the highlighted changes show, depending on the depth of the pipeline this optimization could be slower than the direct cast, so I figured I'll leave it out for now. For future reference, the expression tree code would look like

    static Delegate CreateBehaviorCallDelegate(MethodInfo methodInfo, ParameterExpression outerContextParam, Type behaviorType, Delegate previous, int i, List<Expression> expressions = null)
    {
        PropertyInfo extensionProperty = typeof(IExtendable).GetProperty("Extensions");
        Expression extensionPropertyExpression = Expression.Property(outerContextParam, extensionProperty);
        PropertyInfo behaviorsProperty = typeof(ContextBag).GetProperty("Behaviors", BindingFlags.Instance | BindingFlags.NonPublic);
        Expression behaviorsPropertyExpression = Expression.Property(extensionPropertyExpression, behaviorsProperty);
        Expression indexerPropertyExpression = Expression.ArrayIndex(behaviorsPropertyExpression, Expression.Constant(i));
        MethodInfo unsafeAsMethodInfo = typeof(Unsafe).GetMethod("As", new[] { typeof(object) }).MakeGenericMethod(behaviorType);
        Expression castToBehavior = Expression.Call(null, unsafeAsMethodInfo, indexerPropertyExpression);
        Expression body = Expression.Call(castToBehavior, methodInfo, outerContextParam, Expression.Constant(previous));
        var lambdaExpression = Expression.Lambda(body, outerContextParam);
        expressions?.Add(lambdaExpression);
        return lambdaExpression.CompileFast();
    }

Remove the bounds checks with .NET 6

After talking to @Scooletz, we figured it would be possible to remove the bounds checks around the array access. Combined with Unsafe.As, this would look like the following

Unsafe.As<MyBehavior>(Unsafe.Add(ref MemoryMarshal.GetArrayDataReference(ctx2.Extensions.Behaviors), 2))

the expression tree code looks something like

    MethodInfo memoryMarshalArrayDataRefMethodInfo = typeof(MemoryMarshal).GetMethods(BindingFlags.Public | BindingFlags.Static)
        .Single(x => x.Name == "GetArrayDataReference" && x.GetGenericArguments().Length == 1)
        .MakeGenericMethod(typeof(IBehavior));
    Expression marshalExpression =
        Expression.Call(null, memoryMarshalArrayDataRefMethodInfo, behaviorsPropertyExpression);
    MethodInfo unsafeAddMethodInfo = typeof(Unsafe).GetMethods(BindingFlags.Public | BindingFlags.Static)
        .Single(x => x.Name == "Add" && x.GetGenericArguments().Length == 1 && x.GetParameters().Length == 2 && x.GetParameters()[0].ParameterType.IsByRef && x.GetParameters()[1].ParameterType == typeof(int))
        .MakeGenericMethod(typeof(IBehavior));
    Expression unsafeAddExpression =
        Expression.Call(null, unsafeAddMethodInfo, marshalExpression, Expression.Constant(i));
    Expression castToBehavior = Expression.Call(null, unsafeAsMethodInfo, unsafeAddExpression);

or by calling a helper

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static TBehavior GetBehavior<TBehavior>(IExtendable context, int index) where TBehavior : class, IBehavior =>
        Unsafe.As<TBehavior>(
            Unsafe.Add(ref MemoryMarshal.GetArrayDataReference(context.Extensions.Behaviors), index));

This can lead to 3-5% additional gains but only works on .NET 6, which we don't target yet in core.
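One hypothetical way to keep such a helper usable across targets would be conditional compilation, falling back to the checked cast where the unchecked path is not wanted. This is only a sketch and not part of the PR: BehaviorAccess and SampleBehavior are made-up names, the helper takes the array directly instead of the context, and the NET6_0_OR_GREATER symbol is assumed to be defined by the SDK for the matching target frameworks.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

public interface IBehavior { }

public sealed class SampleBehavior : IBehavior { }

public static class BehaviorAccess
{
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static TBehavior GetBehavior<TBehavior>(IBehavior[] behaviors, int index)
        where TBehavior : class, IBehavior
    {
#if NET6_0_OR_GREATER
        // No bounds check and no type check happen here. The caller must
        // guarantee that index is in range and that the element is a TBehavior.
        return Unsafe.As<TBehavior>(
            Unsafe.Add(ref MemoryMarshal.GetArrayDataReference(behaviors), index));
#else
        // Checked fallback: regular array access plus a regular cast.
        return (TBehavior)behaviors[index];
#endif
    }
}
```

Both branches return the same instance for valid input. The unchecked branch trades the safety of the bounds and type checks for speed, which is only sound because the index and behavior type are fixed when the pipeline is baked.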

Benchmarks

image
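As a rough way to observe the allocation difference without a benchmark harness, the two pipeline shapes can be compared with GC.GetAllocatedBytesForCurrentThread. This is a hypothetical sketch and not the benchmark code behind the numbers in this PR: AllocationProbe, PassThroughBehavior and SketchContext are illustrative names, and the exact byte counts depend on the runtime and build configuration.

```csharp
using System;
using System.Threading.Tasks;

public interface IBehavior { }

public sealed class SketchContext
{
    public IBehavior[] Behaviors = Array.Empty<IBehavior>();
}

public sealed class PassThroughBehavior : IBehavior
{
    public Task Invoke(SketchContext context, Func<SketchContext, Task> next) => next(context);
}

public static class AllocationProbe
{
    // The lambdas capture b2 and b3, so each call needs a display class and delegates.
    static Task RunWithClosures(PassThroughBehavior b1, PassThroughBehavior b2, PassThroughBehavior b3, SketchContext ctx) =>
        b1.Invoke(ctx, c1 => b2.Invoke(c1, c2 => b3.Invoke(c2, c3 => Task.CompletedTask)));

    // The lambdas capture nothing: behaviors are looked up by constant index on
    // the context, so the compiler can cache the delegates in static fields.
    static Task RunWithArray(SketchContext ctx) =>
        ((PassThroughBehavior)ctx.Behaviors[0]).Invoke(ctx,
            c1 => ((PassThroughBehavior)c1.Behaviors[1]).Invoke(c1,
                c2 => ((PassThroughBehavior)c2.Behaviors[2]).Invoke(c2, c3 => Task.CompletedTask)));

    static long Measure(Func<Task> run, int iterations)
    {
        run().GetAwaiter().GetResult(); // warm up so one-time delegate caching is excluded
        long before = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i < iterations; i++)
        {
            run().GetAwaiter().GetResult();
        }
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    // Returns the bytes allocated on the current thread by each variant.
    public static (long Closures, long Indexed) Compare(int iterations = 1000)
    {
        var b1 = new PassThroughBehavior();
        var b2 = new PassThroughBehavior();
        var b3 = new PassThroughBehavior();
        var closureContext = new SketchContext();
        var indexedContext = new SketchContext { Behaviors = new IBehavior[] { b1, b2, b3 } };

        long closures = Measure(() => RunWithClosures(b1, b2, b3, closureContext), iterations);
        long indexed = Measure(() => RunWithArray(indexedContext), iterations);
        return (closures, indexed);
    }
}
```

Calling AllocationProbe.Compare() would be expected to report noticeably more allocated bytes for the closure variant than for the indexed one, in line with the display class and Func allocations discussed in the description.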

Pipeline Execution

image

Pipeline Exception

image

Pipeline Warmup

image

@danielmarbach danielmarbach changed the title Pipeline runtime optimizations Pipeline optimizations Jan 28, 2022
@danielmarbach

This is good to go now

@danielmarbach danielmarbach merged commit c9301c0 into master Feb 3, 2022
@danielmarbach danielmarbach deleted the pipelineplayground branch February 3, 2022 10:58
Scooletz commented Feb 3, 2022

I'm aware that this comment comes after the merge. I want to mention, though, that the array accessor for .NET Core could be ported from
https://github.com/CommunityToolkit/WindowsCommunityToolkit/blob/059cf83f1fb02a4fbb4ce24249ea6e38f504983b/Microsoft.Toolkit.HighPerformance/Extensions/ArrayExtensions.cs#L55-L106

so it could be included in the .NET Core build (AFAIK, NSB targets 3.1). I'm not sure, though, that it would be worthwhile for the 3-5% gains that @danielmarbach described.

@danielmarbach

I have done another round on this code as promised. It is now possible because we dropped the .NET Core 3.1 target.

#6394
