Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Async statemachine for methods with loop generate irreducible CFGs #39814

Open
benaadams opened this issue Nov 14, 2019 · 3 comments
Open

Async statemachine for methods with loop generate irreducible CFGs #39814

benaadams opened this issue Nov 14, 2019 · 3 comments
Assignees
Labels
Area-Compilers Code Gen Quality Room for improvement in the quality of the compiler's generated code
Milestone

Comments

@benaadams
Copy link
Member

benaadams commented Nov 14, 2019

An irreducible CFG isn't possible to generate in C# other than via the statemachine generator (or IL Emit) as while C# allows gotos; it won't let you jump into a loop from outside.

However, consider the method

public async Task CompletedTask()
{
    for (int i = 0; i < 100; i++)
        await Task.CompletedTask;
}

This will generate a loop in the MoveNext() method with 3 entry points and an irreducible control flow:

int num = <>1__state;
try
{
    TaskAwaiter awaiter;
    if (num == 0)
    {
        awaiter = <>u__1;
        <>u__1 = default(TaskAwaiter);
        num = (<>1__state = -1);
        goto IL_006c;>──┐
    }                   │
    <i>5__1 = 0;        │
    goto IL_0084;>──────┼─┐
IL_006c: <──────────────┘ │
  ^ awaiter.GetResult();  │
  │ <i>5__1++;            │
  │ goto IL_0084;>┐       │
IL│0084: <────────┘<──────┘
  │ if (<i>5__1 < 100)
  │ {
  │     awaiter = Task.CompletedTask.GetAwaiter();
  │     if (!awaiter.IsCompleted)
  │     {
  │         num = (<>1__state = 0);
  │         <>u__1 = awaiter;
  │         <CompletedTask>d__0 stateMachine = this;
  │         <>t__builder.AwaitUnsafeOnCompleted(ref awaiter, ref stateMachine);
  │         return;
  │     }
  └────< goto IL_006c;
    }
}

It would be easier for analysis and optimizations if it was a reducible control flow with only one loop entry point.

For example in the above this could be achieved if the IL_006c block (awaiter.GetResult(); <i>5__1++;) was repeated in the num/state == 0 and before the last goto removing that block entirely, so becoming (better approaches probably also available):

int num = <>1__state;
try
{
    TaskAwaiter awaiter;
    if (num == 0)
    {
        awaiter = <>u__1;
        <>u__1 = default(TaskAwaiter);
        num = (<>1__state = -1);
+       awaiter.GetResult();
+       <i>5__1++;   
        goto IL_0084;>───┐
    }                    │
    <i>5__1 = 0;         │
    goto IL_0084;>──┐    │
IL_0084: <──────────┘────┘
  ^ if (<i>5__1 < 100)
  │ {
  │     awaiter = Task.CompletedTask.GetAwaiter();
  │     if (!awaiter.IsCompleted)
  │     {
  │         num = (<>1__state = 0);
  │         <>u__1 = awaiter;
  │         <CompletedTask>d__0 stateMachine = this;
  │         <>t__builder.AwaitUnsafeOnCompleted(ref awaiter, ref stateMachine);
  │         return;
  │     }
+ │     awaiter.GetResult();
+ │     <i>5__1++;   
  └───< goto IL_0084;
    }
}

The irreducible flow control in IL (as its not exactly the same as the decompiled C# sharplab.io:

IL_0000: ldarg.0
IL_0001: ldfld int32 C/'<CompletedTask>d__0'::'<>1__state'
IL_0006: stloc.0
.try
{
        // sequence point: hidden
        IL_0007: ldloc.0
        IL_0008: brfalse.s IL_000c
        
        IL_000a: br.s IL_000e
        
    ┌─< IL_000c: br.s IL_0050
    │   
    │   IL_000e: nop
    │   IL_000f: ldarg.0
    │   IL_0010: ldc.i4.0
    │   IL_0011: stfld int32 C/'<CompletedTask>d__0'::'<i>5__1'
    │   // sequence point: hidden
  ┌─┼─< IL_0016: br.s IL_0084
  │ │   
┌─┼─┼─> IL_0018: call class [System.Private.CoreLib]System.Threading.Tasks.Task [System.Private.CoreLib]System.Threading.Tasks.Task::get_CompletedTask()
│ │ │   IL_001d: callvirt instance valuetype [System.Private.CoreLib]System.Runtime.CompilerServices.TaskAwaiter [System.Private.CoreLib]System.Threading.Tasks.Task::GetAwaiter()
│ │ │   IL_0022: stloc.1
│ │ J   // sequence point: hidden
│ │ U   IL_0023: ldloca.s 1
│ J M   IL_0025: call instance bool [System.Private.CoreLib]System.Runtime.CompilerServices.TaskAwaiter::get_IsCompleted()
│ U P   IL_002a: brtrue.s IL_006c
│ M │   
│ P I   IL_002c: ldarg.0
│ │ N   IL_002d: ldc.i4.0
│ I T   IL_002e: dup
│ N O   IL_002f: stloc.0
│ T │   IL_0030: stfld int32 C/'<CompletedTask>d__0'::'<>1__state'
│ O L   IL_0035: ldarg.0
│ │ O   IL_0036: ldloc.1
│ L O   IL_0037: stfld valuetype [System.Private.CoreLib]System.Runtime.CompilerServices.TaskAwaiter C/'<CompletedTask>d__0'::'<>u__1'
L O P   IL_003c: ldarg.0
O O │   IL_003d: stloc.2
O P │   IL_003e: ldarg.0
P │ │   IL_003f: ldflda valuetype [System.Private.CoreLib]System.Runtime.CompilerServices.AsyncTaskMethodBuilder C/'<CompletedTask>d__0'::'<>t__builder'
│ │ │   IL_0044: ldloca.s 1
B │ │   IL_0046: ldloca.s 2
A │ │   IL_0048: call instance void [System.Private.CoreLib]System.Runtime.CompilerServices.AsyncTaskMethodBuilder::AwaitUnsafeOnCompleted<valuetype [System.Private.CoreLib]System.Runtime.CompilerServices.TaskAwaiter, class C/'<CompletedTask>d__0'>(!!0&, !!1&)
C │ │   IL_004d: nop
K │ │   IL_004e: leave.s IL_00c4
│ │ │   
│ │ └─> IL_0050: ldarg.0
│ │     IL_0051: ldfld valuetype [System.Private.CoreLib]System.Runtime.CompilerServices.TaskAwaiter C/'<CompletedTask>d__0'::'<>u__1'
│ │     IL_0056: stloc.1
│ │     IL_0057: ldarg.0
│ │     IL_0058: ldflda valuetype [System.Private.CoreLib]System.Runtime.CompilerServices.TaskAwaiter C/'<CompletedTask>d__0'::'<>u__1'
│ │     IL_005d: initobj [System.Private.CoreLib]System.Runtime.CompilerServices.TaskAwaiter
│ │     IL_0063: ldarg.0
│ │     IL_0064: ldc.i4.m1
│ │     IL_0065: dup
│ │     IL_0066: stloc.0
│ │     IL_0067: stfld int32 C/'<CompletedTask>d__0'::'<>1__state'
│ │     
│ │     IL_006c: ldloca.s 1
│ │     IL_006e: call instance void [System.Private.CoreLib]System.Runtime.CompilerServices.TaskAwaiter::GetResult()
│ │     IL_0073: nop
│ │     IL_0074: ldarg.0
│ │     IL_0075: ldfld int32 C/'<CompletedTask>d__0'::'<i>5__1'
│ │     IL_007a: stloc.3
│ │     IL_007b: ldarg.0
│ │     IL_007c: ldloc.3
│ │     IL_007d: ldc.i4.1
│ │     IL_007e: add
│ │     IL_007f: stfld int32 C/'<CompletedTask>d__0'::'<i>5__1'
│ │     
│ └───> IL_0084: ldarg.0
│       IL_0085: ldfld int32 C/'<CompletedTask>d__0'::'<i>5__1'
│       IL_008a: ldc.i4.s 100
│       IL_008c: clt
│       IL_008e: stloc.s 4
│       // sequence point: hidden
│       IL_0090: ldloc.s 4
└─────< IL_0092: brtrue.s IL_0018

    IL_0094: leave.s IL_00b0
} // end .try
@CarolEidt
Copy link

@benaadams - thanks for creating this issue; although the JIT is not very aggressive with loop opts the register allocator doesn't always deal well with complex control flow, and changing this may also make it easier for a higher-tier or static compiler to optimize.

@jcouv
Copy link
Member

jcouv commented Apr 27, 2023

As part of discussion with @AndyAyersMS in issue #67774, we've proposed a possible Roslyn codegen change for async lowering.
We could use codegen for loop with awaits that follows the one we use for try blocks with awaits: one dispatch at the start of the method brings the execution to the entry of the try and another dispatch inside the try handles the next step.

This should not be too hard to prototype so that we can do some measurements. It's not yet clear how much gain we could expect from this and whether the trade-off with increased IL size is worthwhile.

@jcouv jcouv self-assigned this Apr 27, 2023
@jcouv jcouv added Code Gen Quality Room for improvement in the quality of the compiler's generated code and removed Bug labels Apr 27, 2023
@jcouv jcouv added this to the Compiler.Next milestone Aug 29, 2023
@jcouv
Copy link
Member

jcouv commented Aug 29, 2023

@AndyAyersMS Did you have any more thoughts on whether this is something worth pursuing (for .NET 9)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Compilers Code Gen Quality Room for improvement in the quality of the compiler's generated code
Projects
None yet
Development

No branches or pull requests

5 participants