-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: Fix liveness for partial defs of parent locals #85654
JIT: Fix liveness for partial defs of parent locals #85654
Conversation
Partial defs in liveness are modelled as full uses of all fields and then a full def of the entire local. The logic that handled fields directly got that right, but the logic that handled parent locals did not. For example, for IR like ``` ------------ BB18 [045..046), preds={BB16} succs={BB19} ***** BB18 STMT00096 ( INL10 @ 0x01F[E-] ... ??? ) <- INL04 @ ??? <- INLRT @ 0x045[E-] N003 ( 5, 4) [000403] -A------R-- ▌ ASG byref N002 ( 3, 2) [000402] D------N--- ├──▌ LCL_VAR byref V73 tmp45 N001 ( 1, 1) [000401] ----------- └──▌ LCL_VAR long V43 tmp15 ***** BB18 STMT00097 ( INL10 @ 0x026[E-] ... ??? ) <- INL04 @ ??? <- INLRT @ 0x045[E-] N003 ( 5, 4) [000407] -A------R-- ▌ ASG int N002 ( 3, 2) [000406] D------N--- ├──▌ LCL_VAR int V74 tmp46 N001 ( 1, 1) [000405] ----------- └──▌ LCL_VAR int V42 tmp14 ***** BB18 STMT00072 ( INL04 @ 0x073[--] ... ??? ) <- INLRT @ 0x045[E-] N007 ( 14, 14) [000627] -A--------- ▌ COMMA void N003 ( 7, 7) [000623] -A------R-- ├──▌ ASG byref N002 ( 3, 4) [000621] U------N--- │ ├──▌ LCL_FLD byref (P) V12 loc3 [+16] │ ├──▌ ref field V12._managedArray (fldOffset=0x0) -> V57 tmp29 │ ├──▌ long field V12._allocatedMemory (fldOffset=0x8) -> V58 tmp30 │ ├──▌ byref field V12._reference (fldOffset=0x10) -> V59 tmp31 │ ├──▌ int field V12._length (fldOffset=0x18) -> V60 tmp32 N001 ( 3, 2) [000622] ----------- │ └──▌ LCL_VAR byref V73 tmp45 N006 ( 7, 7) [000626] -A------R-- └──▌ ASG int N005 ( 3, 4) [000624] U------N--- ├──▌ LCL_FLD int (P) V12 loc3 [+24] ├──▌ ref field V12._managedArray (fldOffset=0x0) -> V57 tmp29 ├──▌ long field V12._allocatedMemory (fldOffset=0x8) -> V58 tmp30 ├──▌ byref field V12._reference (fldOffset=0x10) -> V59 tmp31 ├──▌ int field V12._length (fldOffset=0x18) -> V60 tmp32 N004 ( 3, 2) [000625] ----------- └──▌ LCL_VAR int V74 tmp46 ``` we would see ``` BB18 USE(6)={V58 V57 V59 V60 V42 V43 } DEF(2)={ V73 V74} ``` which is obviously incorrect as V57-V60 are all defined under this model. This would lead to an assert in SSA since SSA did treat this as a def.
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue DetailsPartial defs in liveness are modelled as full uses of all fields and then a full def of the entire local. The logic that handled fields directly got that right, but the logic that handled parent locals did not. For example, for IR like ------------ BB18 [045..046), preds={BB16} succs={BB19}
***** BB18
STMT00096 ( INL10 @ 0x01F[E-] ... ??? ) <- INL04 @ ??? <- INLRT @ 0x045[E-]
N003 ( 5, 4) [000403] -A------R-- ▌ ASG byref
N002 ( 3, 2) [000402] D------N--- ├──▌ LCL_VAR byref V73 tmp45
N001 ( 1, 1) [000401] ----------- └──▌ LCL_VAR long V43 tmp15
***** BB18
STMT00097 ( INL10 @ 0x026[E-] ... ??? ) <- INL04 @ ??? <- INLRT @ 0x045[E-]
N003 ( 5, 4) [000407] -A------R-- ▌ ASG int
N002 ( 3, 2) [000406] D------N--- ├──▌ LCL_VAR int V74 tmp46
N001 ( 1, 1) [000405] ----------- └──▌ LCL_VAR int V42 tmp14
***** BB18
STMT00072 ( INL04 @ 0x073[--] ... ??? ) <- INLRT @ 0x045[E-]
N007 ( 14, 14) [000627] -A--------- ▌ COMMA void
N003 ( 7, 7) [000623] -A------R-- ├──▌ ASG byref
N002 ( 3, 4) [000621] U------N--- │ ├──▌ LCL_FLD byref (P) V12 loc3 [+16]
│ ├──▌ ref field V12._managedArray (fldOffset=0x0) -> V57 tmp29
│ ├──▌ long field V12._allocatedMemory (fldOffset=0x8) -> V58 tmp30
│ ├──▌ byref field V12._reference (fldOffset=0x10) -> V59 tmp31
│ ├──▌ int field V12._length (fldOffset=0x18) -> V60 tmp32
N001 ( 3, 2) [000622] ----------- │ └──▌ LCL_VAR byref V73 tmp45
N006 ( 7, 7) [000626] -A------R-- └──▌ ASG int
N005 ( 3, 4) [000624] U------N--- ├──▌ LCL_FLD int (P) V12 loc3 [+24]
├──▌ ref field V12._managedArray (fldOffset=0x0) -> V57 tmp29
├──▌ long field V12._allocatedMemory (fldOffset=0x8) -> V58 tmp30
├──▌ byref field V12._reference (fldOffset=0x10) -> V59 tmp31
├──▌ int field V12._length (fldOffset=0x18) -> V60 tmp32
N004 ( 3, 2) [000625] ----------- └──▌ LCL_VAR int V74 tmp46 we would see BB18 USE(6)={V58 V57 V59 V60 V42 V43 }
DEF(2)={ V73 V74} which is obviously incorrect as V57-V60 are all defined under this model. This would lead to an assert in SSA since SSA did treat this as a def. Fixes and issue I saw in #85569. (Side note; the LCL_FLDs produced here by block morphing are obviously suboptimal, but that's a different issue.)
|
/azp run Fuzzlyn |
Azure Pipelines successfully started running 1 pipeline(s). |
Opened #85672 for the Fuzzlyn failure. The runtime failure looks preexisting. cc @dotnet/jit-contrib PTAL @AndyAyersMS @SingleAccretion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the nature of the diffs?
Side note: this could potentially be more precise like SSA/VN are (though I would not expect it to be worth it, the latter were written this way to not depend on liveness being conservative).
@@ -97,24 +97,21 @@ void Compiler::fgMarkUseDef(GenTreeLclVarCommon* tree) | |||
|
|||
for (unsigned i = varDsc->lvFieldLclStart; i < varDsc->lvFieldLclStart + varDsc->lvFieldCnt; ++i) | |||
{ | |||
noway_assert(lvaTable[i].lvIsStructField); | |||
if (lvaTable[i].lvTracked) | |||
if (!lvaTable[i].lvTracked) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like bitMask
above is now dead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
TP impact looks interesting. |
More DCE due to the more precise (/correct) liveness. For example: ------------ BB08 [022..031) (return), preds={BB04,BB07} succs={}
***** BB08
STMT00015 ( INL06 @ 0x00E[E-] ... ??? ) <- INLRT @ 0x022[E-]
[000054] -A--------- ▌ ASG int
[000053] D------N--- ├──▌ LCL_VAR int V06 tmp4
[000052] ----------- └──▌ LCL_VAR int V05 tmp3
***** BB08
STMT00007 ( 0x02F[E-] ... ??? )
[000020] ----------- ▌ RETURN struct
[000019] ----------- └──▌ LCL_VAR struct<System.Text.Json.JsonReaderOptions, 8>(P) V01 loc0
└──▌ int V01.<unknown class>:_maxDepth (offs=0x00) -> V06 tmp4
└──▌ ubyte V01.<unknown class>:_commentHandling (offs=0x04) -> V07 tmp5
└──▌ bool V01.<unknown class>:<AllowTrailingCommas>k__BackingField (offs=0x05) -> V08 tmp6 Previously we would see that -BB08 USE(4)={V05 V06 V07 V08}
+BB08 USE(3)={V05 V07 V08}
DEF(1)={ V06 }
Yeah, no additional opportunities on win-x64 from doing this. I assume we don't really come by the LCL_FLDs where it would matter. Maybe worth it to retry after #85569. |
In the benchmarks.run_pgo collection the more precise liveness (which now agrees with which sees about 3.5% reduction in instructions executed. This context is duplicated 102 (!) times in the collection: We should look into this, cc @EgorBo also if you have any ideas. |
Or perhaps it's expected and simply an artifact of how the benchmarks are run/merged? |
Looks weird that we only have two contexts for "Instrumented tier" - I'm also unaware how they're merged (cc @AndyAyersMS) but it looks like a method for some reason was promoted several times (It might happen, but not 100 times) |
Failures are known. SPMI failures are due to JIT-EE GUID change. |
The issue is likely how SPMI compares contexts that access PGO data --currently it is sensitive to small changes in counter values (see |
@AndyAyersMS Presumably SPMI should be "less sensitive" to counter data, and we should assume that PGO stress modes should cover dealing with different actual numbers? |
Perhaps, but I am not sure yet how to do the most appropriate approximate comparison. Also, the current comparison doesn't take class or method profiles into account. So it is arguably both too sensitive and not sensitive enough. It somewhat hinges on what implications we think we should be able to draw from aggregate SPMI data. Here's a half-baked idea: extend the surviving context to include a weight factor which is the number of merged contexts it represents, and then use the weight as a factor when aggregating, so say for diffs, if a given method context appeared 1000 times when collecting then its diffs would be multiplied by 1000. That way the overall "score" reflects the impact across the full underlying collection and not the strength or weakness of the merging algorithm. |
MCH files currently don't have any notion of "importance" of any particular MC so when we see diffs, we treat them all the same. I like the idea of introducing some concept of importance. PGO data makes sense as one determiner. In "run" collections, everything collected was actually run at least once, so when we merge contexts, maybe we should bump up a count on the merged context. On the other hand, for PMI collections, we probably create too many generic instantiations; maybe the count for each should be artificially lowered. |
Partial defs in liveness are modelled as full uses of all fields and then a full def of the entire local. The logic that handled fields directly got that right, but the logic that handled parent locals did not.
For example, for IR like
we would see
which is obviously incorrect as V57-V60 are all defined under this model (and
V59-V60
are defined under any reasonable model). This would lead to an assert in SSA since SSA did treat this as a def.Fixes an issue I saw in #85569.
(Side note; the LCL_FLDs produced here by block morphing are obviously suboptimal, but that's a different issue.)