Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: Fix liveness for partial defs of parent locals #85654

Merged

Conversation

jakobbotsch
Copy link
Member

@jakobbotsch jakobbotsch commented May 2, 2023

Partial defs in liveness are modelled as full uses of all fields and then a full def of the entire local. The logic that handled fields directly got that right, but the logic that handled parent locals did not.

For example, for IR like

------------ BB18 [045..046), preds={BB16} succs={BB19}

***** BB18
STMT00096 ( INL10 @ 0x01F[E-] ... ??? ) <- INL04 @ ??? <- INLRT @ 0x045[E-]
N003 (  5,  4) [000403] -A------R--ASG       byref
N002 (  3,  2) [000402] D------N---                         ├──▌  LCL_VAR   byref  V73 tmp45
N001 (  1,  1) [000401] -----------                         └──▌  LCL_VAR   long   V43 tmp15

***** BB18
STMT00097 ( INL10 @ 0x026[E-] ... ??? ) <- INL04 @ ??? <- INLRT @ 0x045[E-]
N003 (  5,  4) [000407] -A------R--ASG       int
N002 (  3,  2) [000406] D------N---                         ├──▌  LCL_VAR   int    V74 tmp46
N001 (  1,  1) [000405] -----------                         └──▌  LCL_VAR   int    V42 tmp14

***** BB18
STMT00072 ( INL04 @ 0x073[--] ... ??? ) <- INLRT @ 0x045[E-]
N007 ( 14, 14) [000627] -A---------COMMA     void
N003 (  7,  7) [000623] -A------R--                         ├──▌  ASG       byref
N002 (  3,  4) [000621] U------N---                         │  ├──▌  LCL_FLD   byref (P) V12 loc3         [+16]
                                                            │  ├──▌    ref    field V12._managedArray (fldOffset=0x0) -> V57 tmp29
                                                            │  ├──▌    long   field V12._allocatedMemory (fldOffset=0x8) -> V58 tmp30
                                                            │  ├──▌    byref  field V12._reference (fldOffset=0x10) -> V59 tmp31
                                                            │  ├──▌    int    field V12._length (fldOffset=0x18) -> V60 tmp32
N001 (  3,  2) [000622] -----------                         │  └──▌  LCL_VAR   byref  V73 tmp45
N006 (  7,  7) [000626] -A------R--                         └──▌  ASG       int
N005 (  3,  4) [000624] U------N---                            ├──▌  LCL_FLD   int   (P) V12 loc3         [+24]
                                                               ├──▌    ref    field V12._managedArray (fldOffset=0x0) -> V57 tmp29
                                                               ├──▌    long   field V12._allocatedMemory (fldOffset=0x8) -> V58 tmp30
                                                               ├──▌    byref  field V12._reference (fldOffset=0x10) -> V59 tmp31
                                                               ├──▌    int    field V12._length (fldOffset=0x18) -> V60 tmp32
N004 (  3,  2) [000625] -----------                            └──▌  LCL_VAR   int    V74 tmp46

we would see

BB18 USE(6)={V58 V57 V59 V60 V42 V43        }
     DEF(2)={                        V73 V74}

which is obviously incorrect as V57-V60 are all defined under this model (and V59-V60 are defined under any reasonable model). This would lead to an assert in SSA since SSA did treat this as a def.

Fixes an issue I saw in #85569.

(Side note; the LCL_FLDs produced here by block morphing are obviously suboptimal, but that's a different issue.)

Partial defs in liveness are modelled as full uses of all fields and
then a full def of the entire local. The logic that handled fields
directly got that right, but the logic that handled parent locals did
not.

For example, for IR like

```
------------ BB18 [045..046), preds={BB16} succs={BB19}

***** BB18
STMT00096 ( INL10 @ 0x01F[E-] ... ??? ) <- INL04 @ ??? <- INLRT @ 0x045[E-]
N003 (  5,  4) [000403] -A------R--                         ▌  ASG       byref
N002 (  3,  2) [000402] D------N---                         ├──▌  LCL_VAR   byref  V73 tmp45
N001 (  1,  1) [000401] -----------                         └──▌  LCL_VAR   long   V43 tmp15

***** BB18
STMT00097 ( INL10 @ 0x026[E-] ... ??? ) <- INL04 @ ??? <- INLRT @ 0x045[E-]
N003 (  5,  4) [000407] -A------R--                         ▌  ASG       int
N002 (  3,  2) [000406] D------N---                         ├──▌  LCL_VAR   int    V74 tmp46
N001 (  1,  1) [000405] -----------                         └──▌  LCL_VAR   int    V42 tmp14

***** BB18
STMT00072 ( INL04 @ 0x073[--] ... ??? ) <- INLRT @ 0x045[E-]
N007 ( 14, 14) [000627] -A---------                         ▌  COMMA     void
N003 (  7,  7) [000623] -A------R--                         ├──▌  ASG       byref
N002 (  3,  4) [000621] U------N---                         │  ├──▌  LCL_FLD   byref (P) V12 loc3         [+16]
                                                            │  ├──▌    ref    field V12._managedArray (fldOffset=0x0) -> V57 tmp29
                                                            │  ├──▌    long   field V12._allocatedMemory (fldOffset=0x8) -> V58 tmp30
                                                            │  ├──▌    byref  field V12._reference (fldOffset=0x10) -> V59 tmp31
                                                            │  ├──▌    int    field V12._length (fldOffset=0x18) -> V60 tmp32
N001 (  3,  2) [000622] -----------                         │  └──▌  LCL_VAR   byref  V73 tmp45
N006 (  7,  7) [000626] -A------R--                         └──▌  ASG       int
N005 (  3,  4) [000624] U------N---                            ├──▌  LCL_FLD   int   (P) V12 loc3         [+24]
                                                               ├──▌    ref    field V12._managedArray (fldOffset=0x0) -> V57 tmp29
                                                               ├──▌    long   field V12._allocatedMemory (fldOffset=0x8) -> V58 tmp30
                                                               ├──▌    byref  field V12._reference (fldOffset=0x10) -> V59 tmp31
                                                               ├──▌    int    field V12._length (fldOffset=0x18) -> V60 tmp32
N004 (  3,  2) [000625] -----------                            └──▌  LCL_VAR   int    V74 tmp46
```

we would see

```
BB18 USE(6)={V58 V57 V59 V60 V42 V43        }
     DEF(2)={                        V73 V74}
```

which is obviously incorrect as V57-V60 are all defined under this
model. This would lead to an assert in SSA since SSA did treat this as a
def.
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 2, 2023
@ghost ghost assigned jakobbotsch May 2, 2023
@ghost
Copy link

ghost commented May 2, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Partial defs in liveness are modelled as full uses of all fields and then a full def of the entire local. The logic that handled fields directly got that right, but the logic that handled parent locals did not.

For example, for IR like

------------ BB18 [045..046), preds={BB16} succs={BB19}

***** BB18
STMT00096 ( INL10 @ 0x01F[E-] ... ??? ) <- INL04 @ ??? <- INLRT @ 0x045[E-]
N003 (  5,  4) [000403] -A------R--ASG       byref
N002 (  3,  2) [000402] D------N---                         ├──▌  LCL_VAR   byref  V73 tmp45
N001 (  1,  1) [000401] -----------                         └──▌  LCL_VAR   long   V43 tmp15

***** BB18
STMT00097 ( INL10 @ 0x026[E-] ... ??? ) <- INL04 @ ??? <- INLRT @ 0x045[E-]
N003 (  5,  4) [000407] -A------R--ASG       int
N002 (  3,  2) [000406] D------N---                         ├──▌  LCL_VAR   int    V74 tmp46
N001 (  1,  1) [000405] -----------                         └──▌  LCL_VAR   int    V42 tmp14

***** BB18
STMT00072 ( INL04 @ 0x073[--] ... ??? ) <- INLRT @ 0x045[E-]
N007 ( 14, 14) [000627] -A---------COMMA     void
N003 (  7,  7) [000623] -A------R--                         ├──▌  ASG       byref
N002 (  3,  4) [000621] U------N---                         │  ├──▌  LCL_FLD   byref (P) V12 loc3         [+16]
                                                            │  ├──▌    ref    field V12._managedArray (fldOffset=0x0) -> V57 tmp29
                                                            │  ├──▌    long   field V12._allocatedMemory (fldOffset=0x8) -> V58 tmp30
                                                            │  ├──▌    byref  field V12._reference (fldOffset=0x10) -> V59 tmp31
                                                            │  ├──▌    int    field V12._length (fldOffset=0x18) -> V60 tmp32
N001 (  3,  2) [000622] -----------                         │  └──▌  LCL_VAR   byref  V73 tmp45
N006 (  7,  7) [000626] -A------R--                         └──▌  ASG       int
N005 (  3,  4) [000624] U------N---                            ├──▌  LCL_FLD   int   (P) V12 loc3         [+24]
                                                               ├──▌    ref    field V12._managedArray (fldOffset=0x0) -> V57 tmp29
                                                               ├──▌    long   field V12._allocatedMemory (fldOffset=0x8) -> V58 tmp30
                                                               ├──▌    byref  field V12._reference (fldOffset=0x10) -> V59 tmp31
                                                               ├──▌    int    field V12._length (fldOffset=0x18) -> V60 tmp32
N004 (  3,  2) [000625] -----------                            └──▌  LCL_VAR   int    V74 tmp46

we would see

BB18 USE(6)={V58 V57 V59 V60 V42 V43        }
     DEF(2)={                        V73 V74}

which is obviously incorrect as V57-V60 are all defined under this model. This would lead to an assert in SSA since SSA did treat this as a def.

Fixes and issue I saw in #85569.

(Side note; the LCL_FLDs produced here by block morphing are obviously suboptimal, but that's a different issue.)

Author: jakobbotsch
Assignees: jakobbotsch
Labels:

area-CodeGen-coreclr

Milestone: -

@jakobbotsch
Copy link
Member Author

/azp run Fuzzlyn

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@jakobbotsch
Copy link
Member Author

Opened #85672 for the Fuzzlyn failure. The runtime failure looks preexisting.

cc @dotnet/jit-contrib PTAL @AndyAyersMS @SingleAccretion

Copy link
Contributor

@SingleAccretion SingleAccretion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the nature of the diffs?

Side note: this could potentially be more precise like SSA/VN are (though I would not expect it to be worth it, the latter were written this way to not depend on liveness being conservative).

@@ -97,24 +97,21 @@ void Compiler::fgMarkUseDef(GenTreeLclVarCommon* tree)

for (unsigned i = varDsc->lvFieldLclStart; i < varDsc->lvFieldLclStart + varDsc->lvFieldCnt; ++i)
{
noway_assert(lvaTable[i].lvIsStructField);
if (lvaTable[i].lvTracked)
if (!lvaTable[i].lvTracked)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like bitMask above is now dead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

@AndyAyersMS
Copy link
Member

TP impact looks interesting.

@jakobbotsch
Copy link
Member Author

jakobbotsch commented May 3, 2023

What is the nature of the diffs?

More DCE due to the more precise (/correct) liveness. For example:

------------ BB08 [022..031) (return), preds={BB04,BB07} succs={}

***** BB08
STMT00015 ( INL06 @ 0x00E[E-] ... ??? ) <- INLRT @ 0x022[E-]
               [000054] -A---------ASG       int   
               [000053] D------N---                         ├──▌  LCL_VAR   int    V06 tmp4         
               [000052] -----------                         └──▌  LCL_VAR   int    V05 tmp3         

***** BB08
STMT00007 ( 0x02F[E-] ... ??? )
               [000020] -----------RETURN    struct
               [000019] -----------                         └──▌  LCL_VAR   struct<System.Text.Json.JsonReaderOptions, 8>(P) V01 loc0         
                                                            └──▌    int    V01.<unknown class>:_maxDepth (offs=0x00) -> V06 tmp4         
                                                            └──▌    ubyte  V01.<unknown class>:_commentHandling (offs=0x04) -> V07 tmp5         
                                                            └──▌    bool   V01.<unknown class>:<AllowTrailingCommas>k__BackingField (offs=0x05) -> V08 tmp6         

Previously we would see that V06 was defined, but then get to the use of V01 and since V07/V08 weren't defined, we would mark V06 as used-before-def unnecessarily:

-BB08 USE(4)={V05 V06 V07 V08}
+BB08 USE(3)={V05     V07 V08}
      DEF(1)={    V06        }

Side note: this could potentially be more precise like SSA/VN are (though I would not expect it to be worth it, the latter were written this way to not depend on liveness being conservative).

Yeah, no additional opportunities on win-x64 from doing this. I assume we don't really come by the LCL_FLDs where it would matter. Maybe worth it to retry after #85569.

@jakobbotsch
Copy link
Member Author

jakobbotsch commented May 3, 2023

TP impact looks interesting.

In the benchmarks.run_pgo collection the more precise liveness (which now agrees with fgComputeLifeUntrackedLocal) leads to a bunch of cases where we now need one less iteration to reach the fixpoint. One case is
System.Reflection.CustomAttribute:AddCustomAttributes(byref,System.Reflection.RuntimeModule,int,System.RuntimeType,bool,System.RuntimeType+ListBuilder`1[System.Object])

which sees about 3.5% reduction in instructions executed. This context is duplicated 102 (!) times in the collection:
image

We should look into this, cc @EgorBo also if you have any ideas.

@jakobbotsch
Copy link
Member Author

This context is duplicated 102 (!) times in the collection:

Or perhaps it's expected and simply an artifact of how the benchmarks are run/merged?

@EgorBo
Copy link
Member

EgorBo commented May 3, 2023

This context is duplicated 102 (!) times in the collection:

Or perhaps it's expected and simply an artifact of how the benchmarks are run/merged?

Looks weird that we only have two contexts for "Instrumented tier" - I'm also unaware how they're merged (cc @AndyAyersMS) but it looks like a method for some reason was promoted several times (It might happen, but not 100 times)

@jakobbotsch
Copy link
Member Author

Failures are known. SPMI failures are due to JIT-EE GUID change.

@jakobbotsch jakobbotsch merged commit edd6d63 into dotnet:main May 3, 2023
@jakobbotsch jakobbotsch deleted the fix-liveness-partial-defs-promoted-uses branch May 3, 2023 13:06
@AndyAyersMS
Copy link
Member

This context is duplicated 102 (!) times in the collection:

Or perhaps it's expected and simply an artifact of how the benchmarks are run/merged?

Looks weird that we only have two contexts for "Instrumented tier" - I'm also unaware how they're merged (cc @AndyAyersMS) but it looks like a method for some reason was promoted several times (It might happen, but not 100 times)

The issue is likely how SPMI compares contexts that access PGO data --currently it is sensitive to small changes in counter values (see MethodContext::dumpMethodIdentityInfoToBuffer).

@BruceForstall
Copy link
Member

@AndyAyersMS Presumably SPMI should be "less sensitive" to counter data, and we should assume that PGO stress modes should cover dealing with different actual numbers?

@AndyAyersMS
Copy link
Member

@AndyAyersMS Presumably SPMI should be "less sensitive" to counter data, and we should assume that PGO stress modes should cover dealing with different actual numbers?

Perhaps, but I am not sure yet how to do the most appropriate approximate comparison. Also, the current comparison doesn't take class or method profiles into account. So it is arguably both too sensitive and not sensitive enough.

It somewhat hinges on what implications we think we should be able to draw from aggregate SPMI data.

Here's a half-baked idea: extend the surviving context to include a weight factor which is the number of merged contexts it represents, and then use the weight as a factor when aggregating, so say for diffs, if a given method context appeared 1000 times when collecting then its diffs would be multiplied by 1000. That way the overall "score" reflects the impact across the full underlying collection and not the strength or weakness of the merging algorithm.

@BruceForstall
Copy link
Member

MCH files currently don't have any notion of "importance" of any particular MC so when we see diffs, we treat them all the same. I like the idea of introducing some concept of importance. PGO data makes sense as one determiner. In "run" collections, everything collected was actually run at least once, so when we merge contexts, maybe we should bump up a count on the merged context. On the other hand, for PMI collections, we probably create too many generic instantiations; maybe the count for each should be artificially lowered.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants