Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

arm64 jitstress failing with encoding_found assert #61944

Closed
AndyAyersMS opened this issue Nov 22, 2021 · 13 comments · Fixed by #62106
Closed

arm64 jitstress failing with encoding_found assert #61944

AndyAyersMS opened this issue Nov 22, 2021 · 13 comments · Fixed by #62106
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Milestone

Comments

@AndyAyersMS
Copy link
Member

https://dev.azure.com/dnceng/public/_build/results?buildId=1475284&view=ms.vss-test-web.build-test-results-tab&runId=42280206&resultId=106406&paneView=debug

We regressed somewhere in this range. Likely suspect in bold.

8275c37 Update DNNE version (#61738)
95e3144 Create runtime clone to manually kick off full test runs (#61641)
581d4d2 Hide 'align' instruction behind jmp (#60787)
7f874ee Remove race condition from DllImportGenerator build (#61695)
0d25969 [wasm][debugger] Tie sdb agent lifetime to the ExecutionContext and simplify the api (#61392)
66b31ca Upload dasm files as artifacts for "asmdiffs pipeline" (#61700)
30550d6 Remove DisableImplicitNamespaceImports_DotNet (#61656)
9962c10 src/tests tree test xunit-based source generated runner (#60846)
f9e3e28 Update docker image (#61217)
16300e0 Skip HardwareIntrinsics/x86base/Pause tests on Mono (#61743)
899bf97 Merge System.Security.Cryptography.Algorithms to System.Security.Cryptography
13024af HostFactoryResolver - Increase default timeout to thirty seconds (#61621)
8cf0b19 A few follow up changes to LookupTypeKey change (#61718)
25237fa Factor out and improve the vectorization of RegexInterpreter.FindFirstChar (#61490)
a508fb5 [DllImportGenerator] Enable on projects without System.Memory and System.Runtime.CompilerServices.Unsafe (#61704)
de883e3 Remove XUnit reference metadata from tests I switched over last week (#61691)
c03f4ec refactoring (#60074)
5331f21 Ignore missing data result from superpmi-replay run (#61699)
83661ff Remove some unnecessary slicing from generated Regex code (#61701)
44d28bf Extend RegexCharClass.Canonicalize range inversion optimization (#61562)
d1b3816 Minor File.ReadAllBytes* improvements (#61519)
10e107d [wasm][debugger] Fix evaluation of a static class attribute; using current namespace for evaluation (#61252)
0666ebc [mono][aot] Fix the inclusion of generic instances when doing profiled AOT. (#61627)
da0e0f7 [wasm] renames and cleanup before modularization (#61596)

Typical failure

Assert failure(PID 107 [0x0000006b], Thread: 107 [0x006b]): Assertion failed 'encoding_found' in 'Span.SpanBench:TestSpanFill(System.Byte[],int)' during 'Emit code' (IL size 37)

File: /__w/1/s/src/coreclr/jit/emitarm64.cpp Line: 2203

cc @dotnet/jit-contrib

@dotnet-issue-labeler dotnet-issue-labeler bot added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI untriaged New issue has not been triaged by the area owner labels Nov 22, 2021
@ghost
Copy link

ghost commented Nov 22, 2021

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

https://dev.azure.com/dnceng/public/_build/results?buildId=1475284&view=ms.vss-test-web.build-test-results-tab&runId=42280206&resultId=106406&paneView=debug

We regressed somewhere in this range. Likely suspect in bold.

8275c37 Update DNNE version (#61738)
95e3144 Create runtime clone to manually kick off full test runs (#61641)
581d4d2 Hide 'align' instruction behind jmp (#60787)
7f874ee Remove race condition from DllImportGenerator build (#61695)
0d25969 [wasm][debugger] Tie sdb agent lifetime to the ExecutionContext and simplify the api (#61392)
66b31ca Upload dasm files as artifacts for "asmdiffs pipeline" (#61700)
30550d6 Remove DisableImplicitNamespaceImports_DotNet (#61656)
9962c10 src/tests tree test xunit-based source generated runner (#60846)
f9e3e28 Update docker image (#61217)
16300e0 Skip HardwareIntrinsics/x86base/Pause tests on Mono (#61743)
899bf97 Merge System.Security.Cryptography.Algorithms to System.Security.Cryptography
13024af HostFactoryResolver - Increase default timeout to thirty seconds (#61621)
8cf0b19 A few follow up changes to LookupTypeKey change (#61718)
25237fa Factor out and improve the vectorization of RegexInterpreter.FindFirstChar (#61490)
a508fb5 [DllImportGenerator] Enable on projects without System.Memory and System.Runtime.CompilerServices.Unsafe (#61704)
de883e3 Remove XUnit reference metadata from tests I switched over last week (#61691)
c03f4ec refactoring (#60074)
5331f21 Ignore missing data result from superpmi-replay run (#61699)
83661ff Remove some unnecessary slicing from generated Regex code (#61701)
44d28bf Extend RegexCharClass.Canonicalize range inversion optimization (#61562)
d1b3816 Minor File.ReadAllBytes* improvements (#61519)
10e107d [wasm][debugger] Fix evaluation of a static class attribute; using current namespace for evaluation (#61252)
0666ebc [mono][aot] Fix the inclusion of generic instances when doing profiled AOT. (#61627)
da0e0f7 [wasm] renames and cleanup before modularization (#61596)

Typical failure

Assert failure(PID 107 [0x0000006b], Thread: 107 [0x006b]): Assertion failed 'encoding_found' in 'Span.SpanBench:TestSpanFill(System.Byte[],int)' during 'Emit code' (IL size 37)

File: /__w/1/s/src/coreclr/jit/emitarm64.cpp Line: 2203

cc @dotnet/jit-contrib

Author: AndyAyersMS
Assignees: -
Labels:

area-CodeGen-coreclr, untriaged

Milestone: -

@JulieLeeMSFT JulieLeeMSFT removed the untriaged New issue has not been triaged by the area owner label Nov 23, 2021
@JulieLeeMSFT JulieLeeMSFT added this to the 7.0.0 milestone Nov 23, 2021
@AndyAyersMS
Copy link
Member Author

Does indeed seem to be alignment related -- something about the interaction of alignment padding and STRESS_EMIT. Details soon.

@AndyAyersMS
Copy link
Member Author

When we switch to using breakpoints instead of nops under stress, we end up with the wrong form.

  • brk is IF_SI_0A
  • nop is IF_SN_0A

Think the fix may be simple...

* thread #1, name = 'corerun', stop reason = signal SIGTRAP
  * frame #0: 0x0000ffffef67c7bc libclrjit.so`DBG_DebugBreak at debugbreak.S:7
    frame #1: 0x0000ffffef61d194 libclrjit.so`::DebugBreak() at debug.cpp:406:9
    frame #2: 0x0000ffffef3af4d8 libclrjit.so`::assertAbort(why=0x0000ffffef70e1c9, file=0x0000ffffef70d32e, line=2203) at error.cpp:303:9
    frame #3: 0x0000ffffef5b003c libclrjit.so`emitter::emitInsCode(this=<unavailable>, ins=INS_brk, fmt=IF_SN_0A) at emitarm64.cpp:2203:5
    frame #4: 0x0000ffffef5cae78 libclrjit.so`emitter::emitOutputInstr(this=0x0000aaaaaac741f0, ig=0x0000aaaaaac90468, id=0x0000aaaaaac906f4, dp=0x0000ffffffffb398) at emitarm64.cpp:11478:24
    frame #5: 0x0000ffffef3a6280 libclrjit.so`emitter::emitIssue1Instr(this=0x0000aaaaaac741f0, ig=0x0000aaaaaac90468, id=0x0000aaaaaac906f4, dp=0x0000ffffffffb398) at emit.cpp:3835:10

@AndyAyersMS
Copy link
Member Author

Problem is only on non-windows, this is the likely culprit:

inline const instruction INS_BREAKPOINT_osHelper()
{
return TargetOS::IsUnix ? INS_brk : INS_bkpt;
}
#define INS_BREAKPOINT INS_BREAKPOINT_osHelper()

Fix is either to use INS_bkpt explicitly in this code

#ifdef DEBUG
// Under STRESS_EMITTER, if this is the 'align' before the 'jmp' instruction,
// then add "bkpt" instruction.
instrDescAlign* alignInstr = (instrDescAlign*)id;
if (emitComp->compStressCompile(Compiler::STRESS_EMITTER, 50) &&
(alignInstr->idaIG != alignInstr->idaLoopHeadPredIG) && !skipIns)
{
// There is no good way to squeeze in "bkpt" as well as display it
// in the disassembly because there is no corresponding instrDesc for
// it. As such, leave it as is, the "0xD43E0000" bytecode will be seen
// next to the nop instruction in disasm.
// e.g. D43E0000 align [4 bytes for IG07]
ins = INS_BREAKPOINT;
}
#endif

or give up on this bit of stress until we can find another way to go about it.

@AndyAyersMS
Copy link
Member Author

Fixing the instruction is not enough. We also seem to emit these in places where they can be executed. For example:

(lldb) clru 0000ffff7ea342e0
Normal JIT generated code
<PrivateImplementationDetails>.ComputeStringHash(System.String)
ilAddr is 0000FFFFF412F4C0 pImport is 000000003AF77060
Begin 0000FFFF7EA342E0, size 70
>>> 0000ffff7ea342e0 fd7bbea9             stp     x29, x30, [sp, #-0x20]!
0000ffff7ea342e4 fd030091             mov     x29, sp
0000ffff7ea342e8 a11f40b9             ldr     w1, [x29, #0x1c]
0000ffff7ea342ec 800200b4             cbz     x0, 0xffff7ea3433c
0000ffff7ea342f0 a1b89352             mov     w1, #0x9dc5
0000ffff7ea342f4 8123b072             movk    w1, #0x811c, lsl #16
0000ffff7ea342f8 e2031f2a             mov     w2, wzr
0000ffff7ea342fc 030840b9             ldr     w3, [x0, #0x8]
0000ffff7ea34300 7f000071             cmp     w3, #0x0
0000ffff7ea34304 cd010054             b.le    0xffff7ea3433c
0000ffff7ea34308 a11f00b9             str     w1, [x29, #0x1c]
0000ffff7ea3430c 00300091             add     x0, x0, #0xc
0000ffff7ea34310 64328052             mov     w4, #0x193
0000ffff7ea34314 0420a072             movk    w4, #0x100, lsl #16
0000ffff7ea34318 1f2003d5             nop     
0000ffff7ea3431c 00003ed4             brk     #0xf000                 // this gets hit at runtime
0000ffff7ea34320 05586278             ldrh    w5, [x0, w2, uxtw #1]
0000ffff7ea34324 a11f40b9             ldr     w1, [x29, #0x1c]
0000ffff7ea34328 2100054a             eor     w1, w1, w5
0000ffff7ea3432c 817c011b             mul     w1, w4, w1
0000ffff7ea34330 42040011             add     w2, w2, #0x1
0000ffff7ea34334 5f00036b             cmp     w2, w3
0000ffff7ea34338 8b000054             b.lt    0xffff7ea34348
0000ffff7ea3433c e003012a             mov     w0, w1
0000ffff7ea34340 fd7bc2a8             ldp     x29, x30, [sp], #0x20
0000ffff7ea34344 c0035fd6             ret     
0000ffff7ea34348 a11f00b9             str     w1, [x29, #0x1c]
0000ffff7ea3434c f5ffff17             b       0xffff7ea34320

So suggest for now we disable this bit of code.

@kunalspathak
Copy link
Member

I agree. Let's disable the emitting of bkpt/int3 on xarch/arm64 under stress mode. I will enable it once I get back.

@AndyAyersMS
Copy link
Member Author

I don't see this failing for xarch, so just disabled arm64 for now.

kunalspathak pushed a commit that referenced this issue Nov 23, 2021
… stress (#61951)

* Disable converting nop padding to breakpoints for arm64 under jit stress.

Workaround for issue #61944.

* disable differently

* fix
@BruceForstall
Copy link
Member

This is a dup of #61824

@AndyAyersMS
Copy link
Member Author

Hopefully worked around for now via #61951.

We possibly still may see failures from #61939.

@kunalspathak
Copy link
Member

From Armv8 manual:

D1.17.20 Software Breakpoint Instruction exception

These are the exception syndromes with the following EC values:
• 0x38, BKPT instruction executed in AArch32 state.
• 0x3C, BRK instruction executed in AArch64 state.

So it seems that we should really check for Arm vs. Arm64 to decide whether to use BKPT vs. BRK.

@TamarChristinaArm - can you please confirm?

@kunalspathak
Copy link
Member

We already set BKPT for Arm32, but for Arm64, we use BRK in Unix and BKPT otherwise. Not sure why it was written like that.

@TamarChristinaArm
Copy link
Contributor

TamarChristinaArm commented Nov 29, 2021

So it seems that we should really check for Arm vs. Arm64 to decide whether to use BKPT vs. BRK.

@TamarChristinaArm - can you please confirm?

Indeed, Arm64 should use BRK, the encoding for BKPT is undefined on Arm64.

We already set BKPT for Arm32, but for Arm64, we use BRK in Unix and BKPT otherwise. Not sure why it was written like that.

From what I remember, this was because BKPT was used on Arm64 but with the encoding of BRK. This was corrected in #892 (comment) But I guess this usage was missed. Perhaps it's easier to use INS_BREAKPOINT here? Which already seems to do platform abstractions.

Note that the immediate also matters, from what I remember, GDB and Window need different immediates. Windows wants 0xf000 and GDB will enter a signal loop for anything aside from 0x0.

@kunalspathak
Copy link
Member

Thanks @TamarChristinaArm .

I think below portion of your #892 (comment) was never done. I will do it in my upcoming PR.

I would suggest renaming bkpt in instrsarm64.h to something else to avoid this confusion.. My suggestion would be brk_windows and brk_linux.

@ghost ghost locked as resolved and limited conversation to collaborators Dec 31, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants