JIT: Move backward jumps to before their successors after RPO-based layout #102461

amanasifkhalid · 2024-05-20T15:49:03Z

Part of #93020. In #102343, we noticed the RPO-based layout sometimes makes suboptimal decisions in terms of placing a block's hottest predecessor before it -- in particular, this affects loops that aren't entered at the top (comment). To address this, after establishing a baseline RPO layout, fgMoveBackwardJumpsToSuccessors will try to move backward unconditional jumps to right behind their targets to create fallthrough, if the predecessor block is sufficiently hot.

Since the new layout isn't enabled in CI yet, here are the diffs with the baseline using the new RPO layout, on Windows x64: gist. This change is a net size regression, though the PerfScore diffs look promising, and I see many instances of the inverted loop shape fixed. TP impact is <=0.03% over doing just the RPO-based layout.

cc @dotnet/jit-contrib

dotnet-policy-service · 2024-05-20T15:49:39Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

amanasifkhalid · 2024-05-20T15:55:26Z

New codegen for the example linked in this comment:

; Assembly listing for method Program:Sum(int[]):int (FullOpts)
; Emitting BLENDED_CODE for X64 with AVX - Windows
; FullOpts code
; optimized code
; rsp based frame
; fully interruptible
; No PGO data
; Final local variable assignments
;
;  V00 arg0         [V00,T02] (  4,  7   )     ref  ->  rcx         class-hnd single-def <int[]>
;* V01 loc0         [V01,T05] (  0,  0   )     int  ->  zero-ref
;  V02 loc1         [V02,T04] (  4,  6   )     int  ->  rax
;  V03 OutArgs      [V03    ] (  1,  1   )  struct (32) [rsp+0x00]  do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
;  V04 cse0         [V04,T01] (  3, 10   )     int  ->  r10         "CSE #04: aggressive"
;  V05 cse1         [V05,T03] (  2,  9   )     int  ->  rdx         hoist "CSE #01: aggressive"
;  V06 rat0         [V06,T00] (  5, 17   )    long  ->   r8         "Widened IV V01"
;
; Lcl frame size = 40

G_M53154_IG01:  ;; offset=0x0000
       sub      rsp, 40
                                                ;; size=4 bbWeight=1 PerfScore 0.25
G_M53154_IG02:  ;; offset=0x0004
       xor      eax, eax
       mov      edx, dword ptr [rcx+0x08]
       xor      r8d, r8d
       jmp      SHORT G_M53154_IG04
                                                ;; size=10 bbWeight=1 PerfScore 4.50
G_M53154_IG03:  ;; offset=0x000E
       add      eax, r10d
       inc      r8d
                                                ;; size=6 bbWeight=2 PerfScore 1.00
G_M53154_IG04:  ;; offset=0x0014
       cmp      edx, r8d
       jle      SHORT G_M53154_IG06
                                                ;; size=5 bbWeight=8 PerfScore 10.00
G_M53154_IG05:  ;; offset=0x0019
       mov      r10d, dword ptr [rcx+4*r8+0x10]
       test     r10d, r10d
       jne      SHORT G_M53154_IG03
                                                ;; size=10 bbWeight=4 PerfScore 13.00
G_M53154_IG06:  ;; offset=0x0023
       add      rsp, 40
       ret
                                                ;; size=5 bbWeight=1 PerfScore 1.25

; Total bytes of code 40, prolog size 4, PerfScore 30.00, instruction count 14, allocated bytes for code 40 (MethodHash=477a305d) for method Program:Sum(int[]):int (FullOpts)
; ============================================================

amanasifkhalid · 2024-05-20T16:33:26Z

I added a condition to fgMoveBlocksToHottestSuccessors to ensure loop heads are moved to the top of the loop, and updated the diffs link above. Adding this condition worsened size/PerfScore regressions, though I imagine we want to take this on principle? Here's the new layout for the System.Collections.Tests.Perf_BitArray.BitArrayNot(Size: 512) benchmark (notice BB11 now precedes BB12):

---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight        IBC [IL range]   [jump]                            [EH region]        [flags]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BB01 [0000]  1                             1      897152 [000..039)-> BB23(0.001),BB08(0),BB07(0),BB06(0),BB05(0),BB04(0),BB03(0),BB02(0),BB09(0.999)[def] (switch)                     i IBC
BB09 [0009]  1       BB01                  1.00   896255 [071..0D4)-> BB13(0),BB10(1)         ( cond )                     i IBC nullcheck
BB10 [0033]  1       BB09                  1.00   896255 [0D6..???)-> BB12(1)                 (always)                     IBC internal
BB11 [0018]  1       BB12                 15.66 14051500 [0D6..0F7)-> BB12(1)                 (always)                     i IBC loophead bwd bwd-target
BB12 [0019]  2       BB10,BB11            16.65 14939683 [0F7..107)-> BB11(0.941),BB21(0.0595)  ( cond )                     i IBC bwd bwd-src
BB21 [0028]  4       BB12,BB13,BB16,BB20   1.00   896255 [15A..15E)-> BB20(0),BB23(1)         ( cond )                     i IBC bwd bwd-src
BB23 [0029]  3       BB01,BB08,BB21        1.00   897152 [15E..16E)                           (return)                     i IBC
BB13 [0021]  1       BB09                  0           0 [109..11A)-> BB21(0.48),BB16(0.52)   ( cond )                     i IBC rare
BB16 [0025]  2       BB13,BB15             0           0 [13D..14D)-> BB15(0.9),BB21(0.1)     ( cond )                     i IBC rare bwd bwd-src
BB15 [0024]  1       BB16                  0           0 [11C..13D)-> BB16(1)                 (always)                     i IBC rare loophead bwd bwd-target
BB20 [0027]  1       BB21                  0           0 [14F..15A)-> BB21(1)                 (always)                     i IBC rare loophead idxlen bwd bwd-target
BB02 [0002]  1       BB01                  0           0 [03B..042)-> BB03(1)                 (always)                     i IBC rare idxlen
BB03 [0003]  2       BB01,BB02             0           0 [042..049)-> BB04(1)                 (always)                     i IBC rare idxlen
BB04 [0004]  2       BB01,BB03             0           0 [049..050)-> BB05(1)                 (always)                     i IBC rare idxlen
BB05 [0005]  2       BB01,BB04             0           0 [050..057)-> BB06(1)                 (always)                     i IBC rare idxlen
BB06 [0006]  2       BB01,BB05             0           0 [057..05E)-> BB07(1)                 (always)                     i IBC rare idxlen
BB07 [0007]  2       BB01,BB06             0           0 [05E..065)-> BB08(1)                 (always)                     i IBC rare idxlen
BB08 [0008]  2       BB01,BB07             0           0 [065..071)-> BB23(1)                 (always)                     i IBC rare idxlen
BB24 [0038]  0                             0             [???..???)                           (throw )                     i rare keep internal
---------------------------------------------------------------------------------------------------------------------------------------------------------------------

Edit: Looking at the diffs again, I don't think this exception is a good idea. Reverted.

This reverts commit f9d81a9.

AndyAyersMS · 2024-05-20T17:34:51Z

It's probably worth considering only BBJ_ALWAYS and BBJ_COND here, which makes finding the preferred successor simpler.

In the phoenix version the similar bit of code keyed off of RPO back edges. If a back edge source was not a BBJ_COND, and the target of the back edge was not following its most likely pred, we'd move the back edge source block up. Another way of saying this is that we'd really like the last block in the loop body to be an exit block.

amanasifkhalid · 2024-05-20T17:57:30Z

In the phoenix version the similar bit of code keyed off of RPO back edges. If a back edge source was not a BBJ_COND, and the target of the back edge was not following its most likely pred, we'd move the back edge source block up. Another way of saying this is that we'd really like the last block in the loop body to be an exit block.

I see, I think I can simplify this PR quite a bit then. I'm guessing Phoenix wouldn't do this for BBJ_COND blocks with hot back-edges because those could be loop exits? It might be too narrow in scope, but maybe fgMoveBlocksToHottestSuccessors should only consider BBJ_ALWAYS back-edges to fix any mis-rotated loops?

(also, sorry to loop you in while you're OOF)

amanasifkhalid · 2024-05-20T21:55:00Z

I've simplified my approach to only consider backward, unconditional jumps, and this seems to solve all the mis-rotated loop examples we looked at in #102343 while keeping churn and TP cost pretty low. This implementation is pretty narrow in the flowgraph shapes it's trying to fix, but since this work was motivated by the misshapen loop regressions, I think we can keep this conservative until we find a need for other passes to tweak the RPO-based order.

@EgorBo do you have any time to look at this today? If not, no worries; I think @jakobbotsch will be online tomorrow. Thanks!

EgorBo · 2024-05-20T21:59:43Z

LGTM as far as I can tell, I presume the newer commits you've pushed should still be zero diffs since it's RPO is not enabled, right?

amanasifkhalid · 2024-05-20T22:02:42Z

I presume the newer commits you've pushed should still be zero diffs since it's RPO is not enabled, right?

Yes, that's right. I re-ran SPMI locally and updated the gist. TP impact is now <= 0.03%.

jakobbotsch

LGTM as well. I think a simple targeted fix like this is just fine, but of course it would be nice with something a little more general in the future.

I saw you commented on #9304, but I wasn't sure what the final result ended up being; does this PR help there as well?

amanasifkhalid · 2024-05-21T14:46:20Z

Thank you for the reviews! No diffs.

I saw you commented on #9304, but I wasn't sure what the final result ended up being; does this PR help there as well?

It does fix the loop rotation issue there as well, though the final layout isn't any better than what we started with. I'll elaborate more over there.

Follow-up to #102461, and part of #9304. Compacting blocks after establishing an RPO-based layout, but before moving backward jumps to fall into their successors, can enable more opportunities for branch removal; see #9304 for one such example.

Follow-up to dotnet#102461, and part of dotnet#9304. Compacting blocks after establishing an RPO-based layout, but before moving backward jumps to fall into their successors, can enable more opportunities for branch removal; see dotnet#9304 for one such example.

…ayout (dotnet#102461) Part of dotnet#93020. In dotnet#102343, we noticed the RPO-based layout sometimes makes suboptimal decisions in terms of placing a block's hottest predecessor before it -- in particular, this affects loops that aren't entered at the top. To address this, after establishing a baseline RPO layout, fgMoveBackwardJumpsToSuccessors will try to move backward unconditional jumps to right behind their targets to create fallthrough, if the predecessor block is sufficiently hot.

Follow-up to dotnet#102461, and part of dotnet#9304. Compacting blocks after establishing an RPO-based layout, but before moving backward jumps to fall into their successors, can enable more opportunities for branch removal; see dotnet#9304 for one such example.

amanasifkhalid added 3 commits May 20, 2024 10:25

wip

893c9a4

Add fgMoveBlocksToHottestSuccessors

3d31c8b

Add comment

3ae278f

dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 20, 2024

dotnet-policy-service bot assigned amanasifkhalid May 20, 2024

amanasifkhalid mentioned this pull request May 20, 2024

JIT: Enable RPO-based block layout by default #102343

Merged

Add loop head exception

f9d81a9

JulieLeeMSFT requested review from EgorBo and jakobbotsch May 20, 2024 16:52

JulieLeeMSFT added the Priority:2 Work that is important, but not critical for the release label May 20, 2024

Revert "Add loop head exception"

a5b9c21

This reverts commit f9d81a9.

Only move unconditional backward jumps

608f1bd

amanasifkhalid changed the title ~~JIT: Move blocks up to their hottest successors after RPO-based layout~~ JIT: Move backward jumps to before their successors after RPO-based layout May 20, 2024

EgorBo approved these changes May 20, 2024

View reviewed changes

jakobbotsch approved these changes May 21, 2024

View reviewed changes

amanasifkhalid merged commit 68f3edc into dotnet:main May 21, 2024
104 of 107 checks passed

amanasifkhalid deleted the move-hot-blocks branch May 21, 2024 14:47

amanasifkhalid mentioned this pull request May 21, 2024

JIT: poor CQ for list enumeration constructs #9304

Open

amanasifkhalid mentioned this pull request May 21, 2024

JIT: Compact blocks in fgMoveBackwardJumpsToSuccessors #102512

Merged

amanasifkhalid mentioned this pull request May 28, 2024

Widespread perf regressions due to RPO layout #102763

Open

github-actions bot locked and limited conversation to collaborators Jun 21, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

JIT: Move backward jumps to before their successors after RPO-based layout #102461

JIT: Move backward jumps to before their successors after RPO-based layout #102461

amanasifkhalid commented May 20, 2024 •

edited

Loading

dotnet-policy-service bot commented May 20, 2024

amanasifkhalid commented May 20, 2024 •

edited

Loading

amanasifkhalid commented May 20, 2024 •

edited

Loading

AndyAyersMS commented May 20, 2024

amanasifkhalid commented May 20, 2024 •

edited

Loading

amanasifkhalid commented May 20, 2024

EgorBo commented May 20, 2024

amanasifkhalid commented May 20, 2024

jakobbotsch left a comment

amanasifkhalid commented May 21, 2024

JIT: Move backward jumps to before their successors after RPO-based layout #102461

JIT: Move backward jumps to before their successors after RPO-based layout #102461

Conversation

amanasifkhalid commented May 20, 2024 • edited Loading

dotnet-policy-service bot commented May 20, 2024

amanasifkhalid commented May 20, 2024 • edited Loading

amanasifkhalid commented May 20, 2024 • edited Loading

AndyAyersMS commented May 20, 2024

amanasifkhalid commented May 20, 2024 • edited Loading

amanasifkhalid commented May 20, 2024

EgorBo commented May 20, 2024

amanasifkhalid commented May 20, 2024

jakobbotsch left a comment

Choose a reason for hiding this comment

amanasifkhalid commented May 21, 2024

amanasifkhalid commented May 20, 2024 •

edited

Loading

amanasifkhalid commented May 20, 2024 •

edited

Loading

amanasifkhalid commented May 20, 2024 •

edited

Loading

amanasifkhalid commented May 20, 2024 •

edited

Loading