Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: checked/release asm diff in GitHub_17777 #64793

Closed
BruceForstall opened this issue Feb 4, 2022 · 11 comments
Closed

JIT: checked/release asm diff in GitHub_17777 #64793

BruceForstall opened this issue Feb 4, 2022 · 11 comments
Assignees
Labels
arch-arm64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Milestone

Comments

@BruceForstall
Copy link
Member

BruceForstall commented Feb 4, 2022

From #61335 (comment):

The pipeline caught two asm diffs - Link.

  • unix-arm64
    -- ISSUE: <ASM_DIFF> main method 246505 of size 2418 differs
  • win-arm64
    -- ISSUE: <ASM_DIFF> main method 249554 of size 2418 differs

It's the same test in each case:

Repro.Program:Test(int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int):int (MethodHash=8ad14d6d)

windows: ccb0c159-04b3-47f6-993e-79114c9cbef8.windows.arm64\coreclr_tests.pmi.windows.arm64.checked.mch
Linux: ccb0c159-04b3-47f6-993e-79114c9cbef8.Linux.arm64\coreclr_tests.pmi.Linux.arm64.checked.mch

Maybe related to #64162?

One difference is the Checked build saves and restores all the callee-saved registers x19-x28 (and allocates arguments/locals to them, used as w19-w28), whereas Release doesn't save/restore or use them.

Unfortunately, even though it's a small test, it generates almost 300,000 arm64 instructions.

@BruceForstall BruceForstall added arch-arm64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI labels Feb 4, 2022
@BruceForstall BruceForstall added this to the 7.0.0 milestone Feb 4, 2022
@BruceForstall BruceForstall self-assigned this Feb 4, 2022
@ghost
Copy link

ghost commented Feb 4, 2022

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

From #61335 (comment):

The pipeline caught two asm diffs - Link.

  • unix-arm64
    -- ISSUE: <ASM_DIFF> main method 246505 of size 2418 differs
  • win-arm64
    -- ISSUE: <ASM_DIFF> main method 249554 of size 2418 differs

It's the same test in each case:

Repro.Program:Test(int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int):int (MethodHash=8ad14d6d)

windows: ccb0c159-04b3-47f6-993e-79114c9cbef8.Linux.arm64\coreclr_tests.pmi.Linux.arm64.checked.mch
Linux: ccb0c159-04b3-47f6-993e-79114c9cbef8.Linux.arm64\coreclr_tests.pmi.Linux.arm64.checked.mch

Maybe related to #64162?

One difference is the Checked build saves and restores all the callee-saved registers x19-x28 (but never uses them), whereas Release doesn't save/restore or use them.

Unfortunately, even though it's a small test, it generates almost 300,000 arm64 instructions.

Author: BruceForstall
Assignees: BruceForstall
Labels:

arch-arm64, area-CodeGen-coreclr

Milestone: 7.0.0

@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Feb 4, 2022
@BruceForstall BruceForstall removed the untriaged New issue has not been triaged by the area owner label Feb 4, 2022
@BruceForstall
Copy link
Member Author

The Release build is hitting this NO_WAY assert and dropping back to MinOpts:

        /* Are we overflowing? */
        if (ig->igNext && (ig->igNum + 1 != ig->igNext->igNum))
        {
            NO_WAY("Too many instruction groups");
        }

with:

		ig->igNum	1	unsigned int
		ig->igNext->igNum	4237	unsigned int

the igNum is an unsigned int, so we're not overflowing that. So maybe there's some corruption somewhere?

@BruceForstall
Copy link
Member Author

Actually, this does look like #64162, where we're overflowing the single allowed prolog IG.

We just need to increase this:

#ifdef TARGET_ARMARCH
// The only place where this limited instruction group size is a problem is
// in the prolog, where we only support a single instruction group. We should really fix that.
// ARM32 and ARM64 both can require a bigger prolog instruction group. One scenario is where
// a function uses all the incoming integer and single-precision floating-point arguments,
// and must store them all to the frame on entry. If the frame is very large, we generate
// ugly code like "movw r10, 0x488; add r10, sp; vstr s0, [r10]" for each store, which
// eats up our insGroup buffer.
#define SC_IG_BUFFER_SIZE (100 * sizeof(emitter::instrDesc) + 14 * SMALL_IDSC_SIZE)
#else // !TARGET_ARMARCH
#define SC_IG_BUFFER_SIZE (50 * sizeof(emitter::instrDesc) + 14 * SMALL_IDSC_SIZE)
#endif // !TARGET_ARMARCH

Perhaps some of the additional instructions generated recently for zero initialization, stack probing, etc., are eating up the static size that was available.

This number doesn't ensure the same number of instructions fit it both DEBUG and non-DEBUG builds, which is why DEBUG might not be failing.

@AndyAyersMS
Copy link
Member

we only support a single instruction group. We should really fix that.

This also impacts OSR on this test ... a fix would be nice.

@BruceForstall
Copy link
Member Author

This also impacts OSR on this test ... a fix would be nice.

Do you have any guidance on how many additional instructions OSR might require in the prolog, in the worst case?

@BruceForstall
Copy link
Member Author

Current IG buffer stats for ARM64:

DEBUG build:

SC_IG_BUFFER_SIZE             = 2624
SMALL_IDSC_SIZE per IG buffer = 164
instrDesc per IG buffer       = 109
instrDescJmp per IG buffer    = 46
instrDescCns per IG buffer    = 82
instrDescDsp per IG buffer    = 82
instrDescCnsDsp per IG buffer = 65
instrDescCGCA per IG buffer   = 41

Release build:

SC_IG_BUFFER_SIZE             = 1712
SMALL_IDSC_SIZE per IG buffer = 214
instrDesc per IG buffer       = 107
instrDescJmp per IG buffer    = 35
instrDescCns per IG buffer    = 71
instrDescDsp per IG buffer    = 71
instrDescCnsDsp per IG buffer = 53
instrDescCGCA per IG buffer   = 30

@BruceForstall
Copy link
Member Author

I propose to basically double the arm32/arm64 insGroup buffer, to:

#define SC_IG_BUFFER_SIZE (200 * sizeof(emitter::instrDesc))

Which gives (for this test case, for the per-test stats):

DEBUG build:

SC_IG_BUFFER_SIZE             = 4800
SMALL_IDSC_SIZE per IG buffer = 300
instrDesc per IG buffer       = 200
instrDescJmp per IG buffer    = 85
instrDescCns per IG buffer    = 150
instrDescDsp per IG buffer    = 150
instrDescCnsDsp per IG buffer = 120
instrDescCGCA per IG buffer   = 75
...
Max prolog instrDesc count:       83
Max prolog insGroup size  :     2448

Release Build:

SC_IG_BUFFER_SIZE             = 3200
SMALL_IDSC_SIZE per IG buffer = 400
instrDesc per IG buffer       = 200
instrDescJmp per IG buffer    = 66
instrDescCns per IG buffer    = 133
instrDescDsp per IG buffer    = 133
instrDescCnsDsp per IG buffer = 100
instrDescCGCA per IG buffer   = 57
...
Max prolog instrDesc count:       83
Max prolog insGroup size  :     1784

There is nothing scientific about this number, other than it fixes this problem, and leaves significant additional headroom for OSR or other scenarios.

@AndyAyersMS
Copy link
Member

What's the downside of doing this? Is it that we need a new IG per "superblock" (join-free span of blocks) during regular emission, so we may have a lot of wasted IG space in very branchy methods?

@BruceForstall
Copy link
Member Author

The downside is the single, global IG buffer is larger. So in the example above, we allocate 3200 bytes instead of 1712. If we only ever use 2000 bytes, we "waste" 1200 bytes.

All individual IGs are precisely sized when they are "saved" (when we reach a label, typically).

On the plus side, we will have fewer "extension" / "overflow" groups, because fewer blocks will reach the global buffer size.

@AndyAyersMS
Copy link
Member

Doesn't sound too bad then.

BruceForstall added a commit to BruceForstall/runtime that referenced this issue Feb 10, 2022
We require that the maximum number of prolog instructions all fit in one
instruction group. Recent changes appear to have increased the number of
instructions we are generating the prolog, leading to NOWAY assert on
Release builds and test failure on linux-arm64.

Bump up the number to avoid this problem, and leave some headroom for
possible additional needs.

Fixes dotnet#64162, dotnet#64793.
BruceForstall added a commit that referenced this issue Feb 11, 2022
We require that the maximum number of prolog instructions all fit in one
instruction group. Recent changes appear to have increased the number of
instructions we are generating the prolog, leading to NOWAY assert on
Release builds and test failure on linux-arm64.

Bump up the number to avoid this problem, and leave some headroom for
possible additional needs.

Fixes #64162, #64793.
@BruceForstall
Copy link
Member Author

Fixed by #65153

@ghost ghost locked as resolved and limited conversation to collaborators Mar 31, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
arch-arm64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

No branches or pull requests

2 participants