-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: Establish loop invariant base case based on IR #97182
Conversation
Avoid having a cross-phase dependency on loop inversion here. Instead, validate that the condition is an actual zero-trip test.
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue DetailsAvoid having a cross-phase dependency on loop inversion here. Instead, validate that the condition is an actual zero-trip test. A few diffs expected due to the removal of the quirk in loop cloning; those are cases where we prove the loop invariant is trivially true in the base case.
|
Diff results for #97182Assembly diffsAssembly diffs for linux/arm64 ran on windows/x64Diffs are based on 2,259,627 contexts (1,008,044 MinOpts, 1,251,583 FullOpts). MISSED contexts: base: 1 (0.00%), diff: 2 (0.00%) Overall (-2,772 bytes)
FullOpts (-2,772 bytes)
Assembly diffs for linux/x64 ran on windows/x64Diffs are based on 2,249,836 contexts (981,298 MinOpts, 1,268,538 FullOpts). MISSED contexts: base: 0 (0.00%), diff: 1 (0.00%) Overall (-1,606 bytes)
FullOpts (-1,606 bytes)
Assembly diffs for osx/arm64 ran on windows/x64Diffs are based on 2,029,494 contexts (927,368 MinOpts, 1,102,126 FullOpts). MISSED contexts: base: 0 (0.00%), diff: 1 (0.00%) Overall (-2,852 bytes)
FullOpts (-2,852 bytes)
Assembly diffs for windows/arm64 ran on windows/x64Diffs are based on 2,070,984 contexts (937,853 MinOpts, 1,133,131 FullOpts). MISSED contexts: base: 1 (0.00%), diff: 5 (0.00%) Overall (-2,836 bytes)
FullOpts (-2,836 bytes)
Assembly diffs for windows/x64 ran on windows/x64Diffs are based on 2,098,661 contexts (926,221 MinOpts, 1,172,440 FullOpts). MISSED contexts: base: 1 (0.00%), diff: 3 (0.00%) Overall (-1,662 bytes)
FullOpts (-1,662 bytes)
Details here Assembly diffs for linux/arm ran on linux/x86Diffs are based on 2,053,638 contexts (830,101 MinOpts, 1,223,537 FullOpts). MISSED contexts: base: 71,236 (3.35%), diff: 71,237 (3.35%) Overall (+226 bytes)
FullOpts (+226 bytes)
Assembly diffs for windows/x86 ran on linux/x86Diffs are based on 2,291,556 contexts (838,165 MinOpts, 1,453,391 FullOpts). MISSED contexts: base: 0 (0.00%), diff: 7 (0.00%) Overall (+280 bytes)
FullOpts (+280 bytes)
Details here Throughput diffsThroughput diffs for linux/arm64 ran on windows/x64Overall (-0.01% to +0.00%)
MinOpts (-0.00% to +0.01%)
FullOpts (-0.02% to +0.00%)
Throughput diffs for linux/x64 ran on windows/x64Overall (-0.01% to +0.00%)
FullOpts (-0.02% to +0.00%)
Throughput diffs for osx/arm64 ran on windows/x64Overall (-0.01% to +0.00%)
FullOpts (-0.03% to +0.00%)
Throughput diffs for windows/arm64 ran on windows/x64Overall (-0.01% to +0.00%)
MinOpts (-0.01% to +0.00%)
FullOpts (-0.02% to +0.00%)
Throughput diffs for windows/x64 ran on windows/x64Overall (-0.01% to +0.00%)
FullOpts (-0.02% to +0.00%)
Details here Throughput diffs for linux/arm64 ran on linux/x64Overall (-0.01% to +0.00%)
FullOpts (-0.02% to +0.00%)
Throughput diffs for linux/x64 ran on linux/x64Overall (-0.01% to +0.00%)
FullOpts (-0.02% to +0.00%)
Details here |
cc @dotnet/jit-contrib PTAL @AndyAyersMS since Bruce is out. This passed CI but I misclicked the "Update branch" button instead of "Ready for review", so hopefully CI will pass again... Diffs, as mentioned above due to the removal of the quirk in loop cloning. We can prove some new IV structures legal with |
Diff results for #97182Assembly diffsAssembly diffs for linux/arm ran on windows/x86Diffs are based on 2,053,638 contexts (830,101 MinOpts, 1,223,537 FullOpts). MISSED contexts: base: 71,236 (3.35%), diff: 71,237 (3.35%) Overall (+226 bytes)
FullOpts (+226 bytes)
Assembly diffs for windows/x86 ran on windows/x86Diffs are based on 2,291,556 contexts (838,165 MinOpts, 1,453,391 FullOpts). MISSED contexts: base: 0 (0.00%), diff: 7 (0.00%) Overall (+280 bytes)
FullOpts (+280 bytes)
Details here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, feel free to address comments subsequently.
return true; | ||
} | ||
|
||
for (FlowEdge* enterEdge : EntryEdges()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is handling a preheader, can you add a comment? Can do this in a follow-up.
Also we'd expect that entering
is BBJ_ALWAYS
The check in optExtractInitTestIncr
is still using bbFallsThrough
so I wonder if we're missing some cases from that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is handling a preheader, can you add a comment? Can do this in a follow-up.
Also we'd expect that
entering
isBBJ_ALWAYS
The check in
optExtractInitTestIncr
is still usingbbFallsThrough
so I wonder if we're missing some cases from that.
Currently FlowGraphNaturalLoop::AnalyzeIteration
is general enough to be used before preheaders have been created. We don't actually use it during that, but still I kept this similarly general (and matching optExtractInitTestIncr
).
We could probably clean up these two methods at the same time. I agree we should also generalize optExtractInitTestIncr
, in particular to handle loops we did not invert. We could also consider using dominators to try harder to prove the "loop invariant" to be true when initially entered, since I think we usually have them available in the places that call FlowGraphNaturalLoop::AnalyzeIteration
, so we could do something similar to RBO here.
I'll add a comment. I also just noticed that I forgot the equality check on the limits, so need to add that too.
src/coreclr/jit/flowgraph.cpp
Outdated
} | ||
|
||
//------------------------------------------------------------------------ | ||
// CondInitBlockEnterSide: Determine whether a BBJ_COND init block enters the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Naming nit -- from the name I thought initially this was checking for side entries into loops.
Maybe something like InitBlockEntersLoopOnTrue
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed it.
Diff results for #97182Assembly diffsAssembly diffs for linux/arm64 ran on windows/x64Diffs are based on 2,259,627 contexts (1,008,044 MinOpts, 1,251,583 FullOpts). MISSED contexts: base: 1 (0.00%), diff: 2 (0.00%) Overall (-2,772 bytes)
FullOpts (-2,772 bytes)
Assembly diffs for linux/x64 ran on windows/x64Diffs are based on 2,249,836 contexts (981,298 MinOpts, 1,268,538 FullOpts). MISSED contexts: base: 0 (0.00%), diff: 1 (0.00%) Overall (-1,606 bytes)
FullOpts (-1,606 bytes)
Assembly diffs for osx/arm64 ran on windows/x64Diffs are based on 2,029,494 contexts (927,368 MinOpts, 1,102,126 FullOpts). MISSED contexts: base: 0 (0.00%), diff: 1 (0.00%) Overall (-2,852 bytes)
FullOpts (-2,852 bytes)
Assembly diffs for windows/arm64 ran on windows/x64Diffs are based on 2,070,984 contexts (937,853 MinOpts, 1,133,131 FullOpts). MISSED contexts: base: 1 (0.00%), diff: 5 (0.00%) Overall (-2,836 bytes)
FullOpts (-2,836 bytes)
Assembly diffs for windows/x64 ran on windows/x64Diffs are based on 2,098,661 contexts (926,221 MinOpts, 1,172,440 FullOpts). MISSED contexts: base: 1 (0.00%), diff: 3 (0.00%) Overall (-1,662 bytes)
FullOpts (-1,662 bytes)
Details here Assembly diffs for linux/arm ran on linux/x86Diffs are based on 2,053,638 contexts (830,101 MinOpts, 1,223,537 FullOpts). MISSED contexts: base: 71,236 (3.35%), diff: 71,237 (3.35%) Overall (+226 bytes)
FullOpts (+226 bytes)
Assembly diffs for windows/x86 ran on linux/x86Diffs are based on 2,291,532 contexts (838,165 MinOpts, 1,453,367 FullOpts). MISSED contexts: base: 24 (0.00%), diff: 31 (0.00%) Overall (+280 bytes)
FullOpts (+280 bytes)
Details here Throughput diffsThroughput diffs for linux/arm64 ran on windows/x64Overall (-0.01% to +0.00%)
FullOpts (-0.02% to +0.00%)
Throughput diffs for linux/x64 ran on windows/x64Overall (-0.01% to +0.00%)
FullOpts (-0.02% to +0.00%)
Throughput diffs for osx/arm64 ran on windows/x64Overall (-0.01% to +0.00%)
FullOpts (-0.03% to +0.00%)
Throughput diffs for windows/arm64 ran on windows/x64Overall (-0.01% to +0.00%)
FullOpts (-0.02% to +0.00%)
Throughput diffs for windows/x64 ran on windows/x64Overall (-0.01% to +0.00%)
FullOpts (-0.02% to +0.00%)
Details here Throughput diffs for linux/arm64 ran on linux/x64Overall (-0.01% to +0.00%)
FullOpts (-0.02% to +0.00%)
Throughput diffs for linux/x64 ran on linux/x64Overall (-0.01% to +0.00%)
FullOpts (-0.02% to +0.00%)
Details here |
Diff results for #97182Assembly diffsAssembly diffs for linux/arm ran on windows/x86Diffs are based on 2,053,638 contexts (830,101 MinOpts, 1,223,537 FullOpts). MISSED contexts: base: 71,236 (3.35%), diff: 71,237 (3.35%) Overall (+226 bytes)
FullOpts (+226 bytes)
Assembly diffs for windows/x86 ran on windows/x86Diffs are based on 2,291,532 contexts (838,165 MinOpts, 1,453,367 FullOpts). MISSED contexts: base: 24 (0.00%), diff: 31 (0.00%) Overall (+280 bytes)
FullOpts (+280 bytes)
Details here |
Looks like I have some failures to investigate (probably some kind of conflict with #97488). |
Hmm, well the issue seems introduced by #97488. |
Diff results for #97182Assembly diffsAssembly diffs for linux/arm64 ran on windows/x64Diffs are based on 2,259,462 contexts (1,008,044 MinOpts, 1,251,418 FullOpts). MISSED contexts: base: 159 (0.01%), diff: 160 (0.01%) Overall (+42,496 bytes)
FullOpts (+42,496 bytes)
Assembly diffs for linux/x64 ran on windows/x64Diffs are based on 2,249,694 contexts (981,298 MinOpts, 1,268,396 FullOpts). MISSED contexts: base: 134 (0.01%), diff: 135 (0.01%) Overall (+36,024 bytes)
FullOpts (+36,024 bytes)
Assembly diffs for osx/arm64 ran on windows/x64Diffs are based on 2,029,378 contexts (927,368 MinOpts, 1,102,010 FullOpts). MISSED contexts: base: 109 (0.01%), diff: 110 (0.01%) Overall (+28,984 bytes)
FullOpts (+28,984 bytes)
Assembly diffs for windows/arm64 ran on windows/x64Diffs are based on 2,070,840 contexts (937,853 MinOpts, 1,132,987 FullOpts). MISSED contexts: base: 139 (0.01%), diff: 143 (0.01%) Overall (+36,936 bytes)
FullOpts (+36,936 bytes)
Assembly diffs for windows/x64 ran on windows/x64Diffs are based on 2,098,518 contexts (926,221 MinOpts, 1,172,297 FullOpts). MISSED contexts: base: 138 (0.01%), diff: 140 (0.01%) Overall (+33,203 bytes)
FullOpts (+33,203 bytes)
Details here Assembly diffs for linux/arm ran on windows/x86Diffs are based on 2,053,494 contexts (830,101 MinOpts, 1,223,393 FullOpts). MISSED contexts: base: 71,368 (3.36%), diff: 71,369 (3.36%) Overall (+34,456 bytes)
FullOpts (+34,456 bytes)
Assembly diffs for windows/x86 ran on windows/x86Diffs are based on 2,290,734 contexts (838,165 MinOpts, 1,452,569 FullOpts). MISSED contexts: base: 808 (0.04%), diff: 815 (0.04%) Overall (+28,904 bytes)
FullOpts (+28,904 bytes)
Details here Throughput diffsThroughput diffs for linux/arm64 ran on windows/x64Overall (+0.00% to +0.02%)
MinOpts (-0.00% to +0.01%)
FullOpts (+0.00% to +0.03%)
Throughput diffs for linux/x64 ran on windows/x64Overall (+0.00% to +0.03%)
FullOpts (+0.00% to +0.03%)
Throughput diffs for osx/arm64 ran on windows/x64Overall (+0.00% to +0.02%)
FullOpts (+0.00% to +0.03%)
Throughput diffs for windows/arm64 ran on windows/x64Overall (+0.00% to +0.03%)
MinOpts (-0.01% to +0.00%)
FullOpts (+0.00% to +0.04%)
Throughput diffs for windows/x64 ran on windows/x64Overall (-0.00% to +0.03%)
FullOpts (-0.00% to +0.04%)
Details here Throughput diffs for linux/arm ran on windows/x86Overall (+0.00% to +0.03%)
FullOpts (+0.00% to +0.04%)
Throughput diffs for windows/x86 ran on windows/x86Overall (+0.00% to +0.04%)
FullOpts (+0.00% to +0.05%)
Details here Throughput diffs for linux/arm64 ran on linux/x64Overall (+0.00% to +0.02%)
FullOpts (+0.00% to +0.03%)
Throughput diffs for linux/x64 ran on linux/x64Overall (+0.00% to +0.03%)
FullOpts (+0.00% to +0.03%)
Details here |
Diff results for #97182Throughput diffsThroughput diffs for linux/arm64 ran on windows/x64Overall (-0.00% to +0.02%)
FullOpts (-0.01% to +0.03%)
Throughput diffs for linux/x64 ran on windows/x64Overall (-0.01% to +0.03%)
FullOpts (-0.01% to +0.04%)
Throughput diffs for osx/arm64 ran on windows/x64Overall (-0.00% to +0.02%)
FullOpts (-0.01% to +0.03%)
Throughput diffs for windows/arm64 ran on windows/x64Overall (-0.01% to +0.03%)
FullOpts (-0.01% to +0.04%)
Throughput diffs for windows/x64 ran on windows/x64Overall (-0.00% to +0.03%)
FullOpts (-0.00% to +0.04%)
Details here |
Diff results for #97182Assembly diffsAssembly diffs for linux/arm64 ran on windows/x64Diffs are based on 2,507,309 contexts (1,007,092 MinOpts, 1,500,217 FullOpts). MISSED contexts: base: 8 (0.00%), diff: 1 (0.00%) Overall (+40,840 bytes)
FullOpts (+40,840 bytes)
Assembly diffs for linux/x64 ran on windows/x64Diffs are based on 2,517,900 contexts (991,070 MinOpts, 1,526,830 FullOpts). MISSED contexts: base: 8 (0.00%), diff: 1 (0.00%) Overall (+36,557 bytes)
FullOpts (+36,557 bytes)
Assembly diffs for osx/arm64 ran on windows/x64Diffs are based on 2,270,860 contexts (932,669 MinOpts, 1,338,191 FullOpts). MISSED contexts: base: 9 (0.00%), diff: 2 (0.00%) Overall (+28,752 bytes)
FullOpts (+28,752 bytes)
Assembly diffs for windows/arm64 ran on windows/x64Diffs are based on 2,341,100 contexts (938,449 MinOpts, 1,402,651 FullOpts). MISSED contexts: base: 8 (0.00%), diff: 9 (0.00%) Overall (+36,536 bytes)
FullOpts (+36,536 bytes)
Assembly diffs for windows/x64 ran on windows/x64Diffs are based on 2,512,201 contexts (997,391 MinOpts, 1,514,810 FullOpts). MISSED contexts: base: 8 (0.00%), diff: 3 (0.00%) Overall (+34,541 bytes)
FullOpts (+34,541 bytes)
Details here Assembly diffs for linux/arm ran on windows/x86Diffs are based on 2,239,390 contexts (829,328 MinOpts, 1,410,062 FullOpts). MISSED contexts: base: 71,273 (3.08%), diff: 71,274 (3.08%) Overall (+33,226 bytes)
FullOpts (+33,226 bytes)
Assembly diffs for windows/x86 ran on windows/x86Diffs are based on 2,293,488 contexts (839,658 MinOpts, 1,453,830 FullOpts). MISSED contexts: base: 1 (0.00%), diff: 8 (0.00%) Overall (+29,276 bytes)
FullOpts (+29,276 bytes)
Details here Throughput diffsThroughput diffs for linux/arm64 ran on windows/x64Overall (-0.00% to +0.02%)
FullOpts (-0.01% to +0.03%)
Throughput diffs for linux/x64 ran on windows/x64Overall (-0.01% to +0.03%)
FullOpts (-0.01% to +0.04%)
Throughput diffs for osx/arm64 ran on windows/x64Overall (-0.00% to +0.02%)
FullOpts (-0.01% to +0.03%)
Throughput diffs for windows/arm64 ran on windows/x64Overall (-0.01% to +0.03%)
FullOpts (-0.01% to +0.04%)
Throughput diffs for windows/x64 ran on windows/x64Overall (-0.00% to +0.03%)
FullOpts (-0.00% to +0.04%)
Details here Throughput diffs for linux/arm ran on windows/x86Overall (+0.00% to +0.03%)
FullOpts (+0.00% to +0.04%)
Throughput diffs for windows/x86 ran on windows/x86Overall (+0.00% to +0.04%)
FullOpts (+0.00% to +0.05%)
Details here Throughput diffs for linux/arm64 ran on linux/x64Overall (-0.01% to +0.02%)
FullOpts (-0.01% to +0.02%)
Throughput diffs for linux/x64 ran on linux/x64Overall (-0.01% to +0.02%)
FullOpts (-0.01% to +0.03%)
Details here |
We stop loop cloning in some more OSR cases with this change. For example, in Considering loop L02 to clone for optimizations.
Analyzing iteration for L02 with header BB04
Preheader = BB02
Checking exiting block BB08
Init = [000111], test = [000005], incr = [000047]
- Condition is established before entry at [000020]
- IterVar = V09
- Test is [000004] (invariant local limit )
-Checking loop L02 for optimization candidates (array bounds)
-Found ArrIndex at BB06 STMT00009 tree [000189] which is equivalent to: V01[V10], bounds check nodes: [000189]
-Induction V09 is not used as index on dim 0
-Found ArrIndex at BB06 STMT00010 tree [000202] which is equivalent to: V06[V10], bounds check nodes: [000202]
-V06 is not loop invariant
-Found ArrIndex at BB03 STMT00002 tree [000214] which is equivalent to: V02[V09], bounds check nodes: [000214]
-Loop L02 can be cloned for ArrIndex V02[V09] on dim 0
-Found ArrIndex at BB03 STMT00003 tree [000227] which is equivalent to: V03[V09], bounds check nodes: [000227]
-Loop L02 can be cloned for ArrIndex V03[V09] on dim 0
+ Init block BB14 enters the loop when condition [000020] evaluates to false
+ Relop does not involve iteration variable
+ Loop condition may not be true on the first iteration
+Loop cloning: rejecting loop L02. Could not analyze iteration. The test is ------------ BB08 [0010] [052..05E) -> BB03,BB09 (cond), preds={BB04,BB07} succs={BB09,BB03}
***** BB08 [0010]
STMT00012 ( 0x052[E-] ... 0x056 )
[000047] DA---+----- ▌ STORE_LCL_VAR int V09 loc4
[000046] -----+----- └──▌ ADD int
[000044] -----+----- ├──▌ LCL_VAR int V09 loc4
[000045] -----+----- └──▌ CNS_INT int 1
***** BB08 [0010]
STMT00001 ( 0x058[E-] ... 0x05C )
( 9, 7) [000005] ----------- ▌ JTRUE void
( 7, 5) [000004] J------N--- └──▌ LT int
( 3, 2) [000002] ----------- ├──▌ LCL_VAR int V09 loc4
( 3, 2) [000003] ----------- └──▌ LCL_VAR int V04 arg4 while ------------ BB14 [0005] [02D..039) -> BB10,BB02 (cond), preds={BB11,BB13} succs={BB02,BB10}
***** BB14 [0005]
STMT00029 ( 0x02D[E-] ... 0x031 )
[000111] DA---+----- ▌ STORE_LCL_VAR int V10 loc5
[000110] -----+----- └──▌ ADD int
[000108] -----+----- ├──▌ LCL_VAR int V10 loc5
[000109] -----+----- └──▌ CNS_INT int 1
***** BB14 [0005]
STMT00005 ( 0x033[E-] ... 0x037 )
( 7, 6) [000021] ----------- ▌ JTRUE void
( 5, 4) [000020] J------N--- └──▌ LT int
( 3, 2) [000018] ----------- ├──▌ LCL_VAR int V10 loc5
( 1, 1) [000019] ----------- └──▌ CNS_INT int 101 Clearly these are unrelated compares and In both base and diff we fail to clone the inner loop ***** BB06 [0008]
STMT00011 ( 0x046[E-] ... 0x04A )
[000043] DA---+----- ▌ STORE_LCL_VAR int V10 loc5
[000042] -----+----- └──▌ ADD int
[000040] -----+----- ├──▌ LCL_VAR int V10 loc5
[000041] -----+----- └──▌ CNS_INT int 1
------------ BB07 [0009] [04C..052) -> BB06,BB08 (cond), preds={BB06} succs={BB08,BB06}
***** BB07 [0009]
STMT00008 ( 0x04C[E-] ... 0x050 )
( 7, 6) [000029] ----------- ▌ JTRUE void
( 5, 4) [000028] J------N--- └──▌ LT int
( 3, 2) [000026] ----------- ├──▌ LCL_VAR int V10 loc5
( 1, 1) [000027] ----------- └──▌ CNS_INT int 101 I think we should be able to generalize |
A large part of the diff is a repeated one: [11:33:08] 112 ( 3.03% of base) : 565304.dasm - System.Text.StringBuilder:AppendFormatHelper(System.IFormatProvider,System.String,System.ReadOnlySpan`1[System.Object]):System.Text.StringBuilder:this (Instrumented Tier1)
[11:33:08] 112 ( 2.97% of base) : 482762.dasm - System.Text.StringBuilder:AppendFormatHelper(System.IFormatProvider,System.String,System.ReadOnlySpan`1[System.Object]):System.Text.StringBuilder:this (Instrumented Tier1)
[11:33:08] 112 ( 2.61% of base) : 541847.dasm - System.Text.ValueStringBuilder:AppendFormatHelper(System.IFormatProvider,System.String,System.ReadOnlySpan`1[System.Object]):this (Instrumented Tier1)
[11:33:08] 112 ( 2.61% of base) : 322125.dasm - System.Text.ValueStringBuilder:AppendFormatHelper(System.IFormatProvider,System.String,System.ReadOnlySpan`1[System.Object]):this (Instrumented Tier1)
[11:33:08] 112 ( 2.61% of base) : 560908.dasm - System.Text.ValueStringBuilder:AppendFormatHelper(System.IFormatProvider,System.String,System.ReadOnlySpan`1[System.Object]):this (Instrumented Tier1)
[11:33:08] 112 ( 2.61% of base) : 425423.dasm - System.Text.ValueStringBuilder:AppendFormatHelper(System.IFormatProvider,System.String,System.ReadOnlySpan`1[System.Object]):this (Instrumented Tier1)
[11:33:08] 112 ( 2.61% of base) : 541534.dasm - System.Text.ValueStringBuilder:AppendFormatHelper(System.IFormatProvider,System.String,System.ReadOnlySpan`1[System.Object]):this (Instrumented Tier1)
[11:33:08] 112 ( 2.61% of base) : 494106.dasm - System.Text.ValueStringBuilder:AppendFormatHelper(System.IFormatProvider,System.String,System.ReadOnlySpan`1[System.Object]):this (Instrumented Tier1)
[11:33:08] 112 ( 2.61% of base) : 439150.dasm - System.Text.ValueStringBuilder:AppendFormatHelper(System.IFormatProvider,System.String,System.ReadOnlySpan`1[System.Object]):this (Instrumented Tier1)
[11:33:08] 112 ( 2.61% of base) : 520847.dasm - System.Text.ValueStringBuilder:AppendFormatHelper(System.IFormatProvider,System.String,System.ReadOnlySpan`1[System.Object]):this (Instrumented Tier1)
[11:33:08] 112 ( 2.61% of base) : 541843.dasm - System.Text.ValueStringBuilder:AppendFormatHelper(System.IFormatProvider,System.String,System.ReadOnlySpan`1[System.Object]):this (Instrumented Tier1)
[11:33:08] 112 ( 2.61% of base) : 541839.dasm - System.Text.ValueStringBuilder:AppendFormatHelper(System.IFormatProvider,System.String,System.ReadOnlySpan`1[System.Object]):this (Instrumented Tier1)
[11:33:08] 110 ( 2.55% of base) : 306927.dasm - System.Text.ValueStringBuilder:AppendFormatHelper(System.IFormatProvider,System.String,System.ReadOnlySpan`1[System.Object]):this (Instrumented Tier1)
[11:33:08] 110 ( 2.55% of base) : 453727.dasm - System.Text.ValueStringBuilder:AppendFormatHelper(System.IFormatProvider,System.String,System.ReadOnlySpan`1[System.Object]):this (Instrumented Tier1)
[11:33:08] 110 ( 2.55% of base) : 311282.dasm - System.Text.ValueStringBuilder:AppendFormatHelper(System.IFormatProvider,System.String,System.ReadOnlySpan`1[System.Object]):this (Instrumented Tier1)
[11:33:08] 110 ( 2.55% of base) : 460376.dasm - System.Text.ValueStringBuilder:AppendFormatHelper(System.IFormatProvider,System.String,System.ReadOnlySpan`1[System.Object]):this (Instrumented Tier1)
[11:33:08] 110 ( 2.55% of base) : 571540.dasm - System.Text.ValueStringBuilder:AppendFormatHelper(System.IFormatProvider,System.String,System.ReadOnlySpan`1[System.Object]):this (Instrumented Tier1)
[11:33:08] 110 ( 2.55% of base) : 452934.dasm - System.Text.ValueStringBuilder:AppendFormatHelper(System.IFormatProvider,System.String,System.ReadOnlySpan`1[System.Object]):this (Instrumented Tier1)
[11:33:08] 110 ( 2.55% of base) : 495294.dasm - System.Text.ValueStringBuilder:AppendFormatHelper(System.IFormatProvider,System.String,System.ReadOnlySpan`1[System.Object]):this (Instrumented Tier1)
[11:33:08] 110 ( 2.55% of base) : 558635.dasm - System.Text.ValueStringBuilder:AppendFormatHelper(System.IFormatProvider,System.String,System.ReadOnlySpan`1[System.Object]):this (Instrumented Tier1) It looks like that's because of new cloning with this change. The flowgraph looks like this, and the diff with this change is: Considering loop L04 to clone for optimizations.
Analyzing iteration for L04 with header BB56
Preheader = BB54
Checking exiting block BB56
Could not extract an IV
Checking exiting block BB57
Could not extract an IV
Checking exiting block BB55
Init = [001280], test = [001306], incr = [001301]
- Loop condition is not always true
-Loop cloning: rejecting loop L04. Could not analyze iteration.
+ Init block BB53 enters the loop when condition [001284] evaluates to true
+ op1 is the iteration variable
+ Condition is established before entry at [001284]
+ IterVar = V05
+ Test is [001305] (array length limit )
+Checking loop L04 for optimization candidates (array bounds)
+Found ArrIndex at BB56 STMT00110 tree [002273] which is equivalent to: V02[V05], bounds check nodes: [002273]
+Loop L04 can be cloned for ArrIndex V02[V05] on dim 0 So it looks like we get a loop shape where we didn't do loop inversion, but we can now still prove that the loop invariant is true everywhere. The test is: ***** BB55 [0029]
STMT00305 ( INL55 @ 0x000[E-] ... ??? ) <- INLRT @ 0x177[E-]
[001301] DA--G+----- ▌ STORE_LCL_VAR int V05 loc1
[001300] ----G+----- └──▌ ADD int
[001298] -----+----- ├──▌ LCL_VAR int V05 loc1
[001299] -----+----- └──▌ CNS_INT int 1
***** BB55 [0029]
STMT00306 ( INL55 @ 0x006[E-] ... ??? ) <- INLRT @ 0x177[E-]
[001306] ---XG+----- ▌ JTRUE void
[001305] N--XG+-N-U- └──▌ GE int
[001303] -----+----- ├──▌ LCL_VAR int V05 loc1
[001304] ---X-+----- └──▌ ARR_LENGTH int
[000451] -----+----- └──▌ LCL_VAR ref V02 arg2 The dominating compare outside the loop is: ------------ BB53 [0028] [162..163) -> BB54,BB149 (cond), preds={BB52} succs={BB149,BB54}
***** BB53 [0028]
STMT00302 ( INL54 @ 0x000[E-] ... ??? ) <- INLRT @ 0x162[E-]
[001280] DA--G+----- ▌ STORE_LCL_VAR int V05 loc1
[001279] ----G+----- └──▌ ADD int
[001277] -----+----- ├──▌ LCL_VAR int V05 loc1
[001278] -----+----- └──▌ CNS_INT int 1
***** BB53 [0028]
STMT00303 ( INL54 @ 0x006[E-] ... ??? ) <- INLRT @ 0x162[E-]
[001285] ---XG+----- ▌ JTRUE void
[001284] N--XG+-N-U- └──▌ LT int
[001282] -----+----- ├──▌ LCL_VAR int V05 loc1
[001283] ---X-+----- └──▌ ARR_LENGTH int
[000419] -----+----- └──▌ LCL_VAR ref V02 arg2 If I had to guess, the loop we're cloning now is the following, after inlining runtime/src/libraries/System.Private.CoreLib/src/System/Text/ValueStringBuilder.AppendFormat.cs Lines 139 to 144 in 9429e43
|
Avoid having a cross-phase dependency on loop inversion here. Instead, validate that the condition is an actual zero-trip test.
A few diffs expected due to the removal of the quirk in loop cloning; those are cases where we prove the loop invariant is trivially true in the base case.