[Arm32] Invalid memory access (possibly JIT issue) #21117

Closed
aviviadi opened this Issue Nov 20, 2018 · 16 comments

@aviviadi

aviviadi commented Nov 20, 2018

I have been chasing down an issue that occasionally crashes our system on 32-bit ARM machines.
The error is a SIGSEGV or SIGABRT on memory that we are certain we own.

We are able to reproduce this fairly consistently, but only by throwing a lot of work at the machine; I don't have a simple reproduction.

The error always occurs on this line of code: Unsafe.CopyBlockUnaligned()

We have been able to capture this in lldb and have the following information:

(lldb) Process 18813 stopped
* thread #29: tid = 0x49a2, 0x7664d55e, name = 'Raven.Server', stop reason = signal SIGSEGV: address access protected (fault address: 0x520d5000)
    frame #0: 0x7664d55e
->  0x7664d55e: stmdavs r11, {r0, r1, r11, sp, lr}
    0x7664d562: stcllt p6, c15, [sp, #-772]!
    0x7664d566: .long  0xe92d0000                ; unknown opcode
    0x7664d56a: svcge  #0x34ff0

The fault address is: 0x520d5000

Looking at smaps, we can confirm that this is indeed an address that we shouldn't access:

520c5000-520d5000 rw-s 05b90000 08:01 1310760    /mnt/external/TmpDataDir/Databases/zz/Temp/scratch.0000000002.buffers
Size:                 64 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                  64 kB
Pss:                  64 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:        64 kB
Private_Dirty:         0 kB
Referenced:           64 kB
Anonymous:             0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
VmFlags: rd wr sh mr mw me ms
520d5000-520d6000 ---p 00000000 00:00 0
Size:                  4 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
Shared_Hugetlb:        0 kB

However, note that we own the memory just before this bit.
We have added additional tracing to the code, and we believe that the actual failure happens when we call:

Unsafe.CopyBlockUnaligned(0x520D4FFD,0x4DE6F1D4,2);

The 0x4DE6F1D4 source address is allocated on the stack and is used just before the failure with:

Unsafe.CopyBlockUnaligned(0x520D4FF1,0x4DE6F1D4,8);
Unsafe.CopyBlockUnaligned(0x520D4FF9,0x4DE6F1D4,4);

// dies here
Unsafe.CopyBlockUnaligned(0x520D4FFD,0x4DE6F1D4,2);

We have a stackalloc ulong[1] variable that is used as a buffer to copy to the destination.

We are writing toward the end of the page that we own, but that is expected and should be fine because we aren't going beyond the boundary of the page.
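The claim that these writes stay inside the mapping can be checked with a little address arithmetic. The following is only a sketch: the mapping bounds are copied from the smaps entry above, the helper name `last_byte` is made up for illustration, and it assumes each call writes exactly the requested number of bytes.

```python
# Bounds of the scratch buffer mapping, taken from the smaps entry above.
# MAP_END is exclusive: 0x520d5000 is the first byte of the guard page.
MAP_START, MAP_END = 0x520C5000, 0x520D5000

# (destination, byte count) for the three traced calls
copies = [(0x520D4FF1, 8), (0x520D4FF9, 4), (0x520D4FFD, 2)]


def last_byte(dest, count):
    """Address of the last byte written by a copy of `count` bytes to `dest`."""
    return dest + count - 1


for dest, count in copies:
    end = last_byte(dest, count)
    inside = MAP_START <= dest and end < MAP_END
    print(f"dest={dest:#x} count={count} last={end:#x} inside_mapping={inside}")
```

Under that assumption, the last byte of the final copy is 0x520d4ffe, two bytes short of the guard page, so the copy itself should not fault.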

Here is the disassembly at the time of the crash

(lldb) d
->  0x7664d55e: stmdavs r11, {r0, r1, r11, sp, lr}
    0x7664d562: stcllt p6, c15, [sp, #-772]!
    0x7664d566: .long  0xe92d0000                ; unknown opcode
    0x7664d56a: svcge  #0x34ff0
    0x7664d56e: .long  0xf8c3b081                ; unknown opcode
    0x7664d572: bgt    0x7828157a
    0x7664d576: andeq  pc, r4, #-2147483648
    0x7664d57a: svceq  #0xe8b2

And here are the registers at the crash

(lldb) register read
General Purpose Registers:
        r0 = 0x520d4ffd
        r1 = 0x4de6f1d4
        r2 = 0x00000002
        r3 = 0x00000000
        r4 = 0x520d4ffd
        r5 = 0x4de6f1d4
        r6 = 0x00000002
        r7 = 0x00000000
        r8 = 0x5c48b99c
        r9 = 0x5c48b9ac
       r10 = 0x4de6fba4
       r11 = 0x4de6f1a0
       r12 = 0x7664d559
        sp = 0x4de6f170
        lr = 0x50ea66c5
        pc = 0x7664d55e
      cpsr = 0x20000030

I'm not an expert on ARM assembly, but it looks like the STM instruction is writing to r11, and while r6 appears to contain the size, I don't see it actually being used here.

Here is the full disassembly from around the location of the crash:

(lldb) di -s 0x7664d500 -e 0x7664d600
    0x7664d500: .long  0xf04f462b                ; unknown opcode
    0x7664d504: .long  0x94000403                ; unknown opcode
    0x7664d508: blx    0x7560968a
    0x7664d50c: stmdals r10, {r3, r5, r8, r11, r12, sp, pc}
    0x7664d510: .long  0xe8bdb001                ; unknown opcode
    0x7664d514: .long  0xb0044ff0                ; unknown opcode
    0x7664d518: .long  0x46844770                ; unknown opcode
    0x7664d51c: .long  0xe8bdb001                ; unknown opcode
    0x7664d520: .long  0xbc0f4ff0                ; unknown opcode
    0x7664d524: push   {r5, r6, r8, r9, r10, lr}
    0x7664d528: stc    p15, c4, [sp, #-964]!
    0x7664d52c: strlt  r0, [r2], #-2824
    0x7664d530: stmdage r10, {r0, r7, r12, sp, pc}
    0x7664d534: mrc2   p7, #0x5, apsr_nzcv, c10, c11, #0x6
    0x7664d538: .long  0xbc02b001                ; unknown opcode
    0x7664d53c: bleq   0x76888838
    0x7664d540: svchi  #0xf1e8bd
    0x7664d544: strtvc sp, [r4], r0, asr #20
    0x7664d548: strtvc sp, [r4], r4, lsl #22
    0x7664d54c: svclt  #0x82a00
    0x7664d550: stmdavs r3, {r4, r5, r6, r8, r9, r10, lr}
    0x7664d554: blt    0x7758b060
    0x7664d558: svclt  #0x82a00
    0x7664d55c: stmdavs r3, {r4, r5, r6, r8, r9, r10, lr}
    0x7664d560: .long  0xf6c1680b                ; unknown opcode
    0x7664d564: .long  0x0000bd6d                ; unknown opcode
    0x7664d568: svcmi  #0xf0e92d
    0x7664d56c: addlt  r10, r1, r3, lsl #30
    0x7664d570: andle  pc, r0, r3, asr #17
    0x7664d574: .long  0xf102ca70                ; unknown opcode
    0x7664d578: .long  0xe8b20204                ; unknown opcode
    0x7664d57c: strmi  r0, [r8, r0, lsl #30]
    0x7664d580: .long  0xe8bdb001                ; unknown opcode
    0x7664d584: strlt  r8, [r0, #0xff0]
    0x7664d588: .long  0xf8c3466f                ; unknown opcode
    0x7664d58c: ldrmi  sp, [r0, r0]
    0x7664d590: andeq  r11, r0, r0, lsl #27
    0x7664d594: andle  r2, r6, r0, lsl #20
    0x7664d598: .long  0x466fb580                ; unknown opcode
    0x7664d59c: stmdavc r11, {r0, r1, r11, r12, sp, lr}
    0x7664d5a0: ldcl   p6, c15, [r0, #-772]
    0x7664d5a4: stmdami r3, {r7, r8, r10, r11, r12, sp, pc}
    0x7664d5a8: stmdahs r0, {r11, sp, lr}
    0x7664d5ac: .long  0xf7a4bf18                ; unknown opcode
    0x7664d5b0: .long  0x4770bebf                ; unknown opcode
    0x7664d5b4: strtvc sp, [r4], r0, asr #20
    0x7664d5b8: andeq  r0, r0, r0
    0x7664d5bc: andeq  r0, r0, r0
    0x7664d5c0: svclt  #0x4770
    0x7664d5c4: svclt  #0xbf00
    0x7664d5c8: svclt  #0xbf00
    0x7664d5cc: svclt  #0xbf00
    0x7664d5d0: svchi  #0x5ff3bf
    0x7664d5d4: .long  0xf2406001                ; unknown opcode
    0x7664d5d8: .long  0xf2c00301                ; unknown opcode
    0x7664d5dc: addsmi r0, r9, #0, #6
    0x7664d5e0: .long  0xf641d30a                ; unknown opcode
    0x7664d5e4: .long  0xf2c74324                ; unknown opcode
    0x7664d5e8: bl     0x7671a324
    0x7664d5ec: ldmdavc r8, {r4, r7, r8, r9, sp}
    0x7664d5f0: svclt  #0x1c28ff
    0x7664d5f4: .long  0x701820ff                ; unknown opcode
    0x7664d5f8: andeq  r4, r0, r0, ror r7
    0x7664d5fc: andeq  r0, r0, r0
@janvorli

Member

janvorli commented Nov 20, 2018

@aviviadi unfortunately, the disassembly is garbage. Either it is in some random piece of memory, or lldb thinks it is ARM code while it is in fact THUMB2, or the processor erroneously jumped to an even address.
I actually wonder how you made lldb work on arm32 at all, since I've tried many versions in the past (on different Linux distros) and none of them worked. They either weren't able to start a process at all, or they could start it but could not hit any breakpoints. What distro and lldb version are you using? And what version of dotnet are you using?

You can try to disassemble from an address higher by one (ARM processors use the lowest address bit to distinguish between ARM and THUMB2 modes). That may make lldb produce the right disassembly.

Could you also try to get a stack trace at the time of failure using the "bt" command?
And finally, the LR register contains the return address. Can you please try to disassemble the function at that address? disass -a 0x50ea66c5 or, if the code is managed code, you'll need to disassemble using a range of addresses. I would try something like disass -s 0x50ea66a5 -e 0x50ea66d5.
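As a rough illustration of the low-bit convention mentioned above, the register values from the first dump can be inspected like this. The helper names are hypothetical; the only facts relied on are that bit 0 of a return address marks Thumb code and that bit 5 of CPSR is the T (Thumb state) flag.

```python
THUMB_BIT = 1      # bit 0 of a branch target or return address
CPSR_T = 1 << 5    # T flag in CPSR: set while executing Thumb code


def split_code_address(addr):
    """Return (instruction address, is_thumb) for a branch/return address."""
    return addr & ~THUMB_BIT, bool(addr & THUMB_BIT)


def cpsr_is_thumb(cpsr):
    """True if the T flag is set in the given CPSR value."""
    return bool(cpsr & CPSR_T)


lr = 0x50EA66C5      # lr from the first register dump
cpsr = 0x20000030    # cpsr from the same dump

target, is_thumb = split_code_address(lr)
print(f"lr={lr:#x} -> instruction address {target:#x}, thumb={is_thumb}")
print(f"cpsr={cpsr:#x} -> thumb state={cpsr_is_thumb(cpsr)}")
```

Both values point the same way: the caller and the faulting code appear to be Thumb code, which would explain why a 32-bit ARM decode of the same bytes looks like noise.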

@benaadams

Collaborator

benaadams commented Nov 21, 2018

We have a stackalloc ulong[1] variable that is used as a buffer to copy to the destination.

What's the C# for this line? (e.g. are you using an array initializer for the values? If so, that was a C# issue that has since been fixed: dotnet/roslyn#29092)

@ayende

Contributor

ayende commented Nov 21, 2018

@benaadams We initially had a ulong value and took the address of that to use in Unsafe.CopyBlockUnaligned.
We changed that to stackalloc ulong[1] with no init to see if it would help.

I'm fairly certain that the actual problem is with the code for Unsafe, not with the stackalloc.

@aviviadi

aviviadi commented Nov 21, 2018

@janvorli - lldb-3.9, Raspbian Stretch (RPi3), runtime 2.1.6 (happens also on earlier runtime versions)
Using ssh debugging (Visual Studio) and lldb (also with in-memory logging of function entries and exits), we see the SIGSEGV while in Unsafe.CopyBlockUnaligned(), always with address 0x*****FFD and when writing 2 bytes, where we are allowed to write up to 3 bytes ahead.

As for lldb-3.9, I am attaching it to the running published -r linux-arm process.

I reproduced it again with the disassembly you requested; it is below. I am going to leave this instance up, so if you wish to see or disassemble more, it can be done on this SIGSEGV reproduction.

(Do you think gdb is better for this investigation?)

Architecture set to: armv6-unknown-unknown.
(lldb) process handle -s false -n false -p false SIGTRAP SIGPIPE
NAME         PASS   STOP   NOTIFY
===========  =====  =====  ======
SIGTRAP      false  false  false
SIGPIPE      false  false  false
(lldb) continue
Process 22200 resuming
Process 22200 stopped
* thread #36: tid = 0x56f1, 0x7664155e, name = 'Raven.Server', stop reason = signal SIGSEGV: address access protected (fault address: 0x4bba3000)
    frame #0: 0x7664155e
->  0x7664155e: stmdavs r11, {r0, r1, r11, sp, lr}
    0x76641562: stcllt p6, c15, [sp, #-772]!
    0x76641566: .long  0xe92d0000                ; unknown opcode
    0x7664156a: svcge  #0x34ff0

(lldb) bt
* thread #36: tid = 0x56f1, 0x7664155e, name = 'Raven.Server', stop reason = signal SIGSEGV: address access protected (fault address: 0x4bba3000)
  * frame #0: 0x7664155e
    frame #1: 0x50eecacf
    frame #2: 0x4f53da7d
    frame #3: 0x4b90e047
    frame #4: 0x4b90bf89
    frame #5: 0x4b90b327
    frame #6: 0x4b90afbb
    frame #7: 0x4b90a5e9
    frame #8: 0x4b364a2b
    frame #9: 0x4b3647d7
    frame #10: 0x4faaef11
    frame #11: 0x5751d7f7
    frame #12: 0x59da8d2f
    frame #13: 0x59d953cf
(lldb) register read
General Purpose Registers:
        r0 = 0x4bba2ffe
        r1 = 0x4f6c31d4
        r2 = 0x00000002
        r3 = 0x00000000
        r4 = 0x4bba2ffe
        r5 = 0x4f6c31d4
        r6 = 0x00000002
        r7 = 0x5b781dfc
        r8 = 0x5b781e0c
        r9 = 0x5b780c70
       r10 = 0x4f6c3ba4
       r11 = 0x4f6c31a0
       r12 = 0x76641559
        sp = 0x4f6c3170
        lr = 0x50ebc14b
        pc = 0x7664155e
      cpsr = 0x20000030

(lldb) disass -a 0x50ebc14b
->  0x7664155e: stmdavs r11, {r0, r1, r11, sp, lr}
    0x76641562: stcllt p6, c15, [sp, #-772]!
    0x76641566: .long  0xe92d0000                ; unknown opcode
    0x7664156a: svcge  #0x34ff0
    0x7664156e: .long  0xf8c3b081                ; unknown opcode
    0x76641572: bgt    0x7827557a
    0x76641576: andeq  pc, r4, #-2147483648
    0x7664157a: svceq  #0xe8b2
(lldb) disass -s 0x50ebc11b -e 0x50ebc15b
    0x50ebc11b: .long  0x00e014f8                ; unknown opcode
    0x50ebc11f: ldrbteq r12, [r8], #3472
    0x50ebc123: .long  0x442000e0                ; unknown opcode
    0x50ebc127: strbgt r4, [lr, #-0x5f6]!
    0x50ebc12b: .long  0xf01ec5f6                ; unknown opcode
    0x50ebc12f: .long  0x61f64847                ; unknown opcode
    0x50ebc133: .long  0x13f2c543                ; unknown opcode
    0x50ebc137: subhs  r9, r7, r3, asr r8
    0x50ebc13b: sublo  r2, r6, #1146880
    0x50ebc13f: ldmibpl r2!, {r1, r2, r6, r8, lr} ^
    0x50ebc143: ldrbtvs r12, [r2], #1884
    0x50ebc147: strbmi lr, [r7, #-0x6c]
    0x50ebc14b: smmlsrgt r0, r2, r4, r2
    0x50ebc14f: mrcmi  p2, #0x2, r1, c0, c2, #0x7
    0x50ebc153: .long  0xc76391f6                ; unknown opcode
    0x50ebc157: stmdals r3!, {r1, r4, r5, r6, r7, r8, r10, r11, r12, lr} ^
(lldb) 

And the /proc/pid/smaps around the relevant address:

4bb93000-4bba3000 rw-s 01dc0000 08:01 1310749    /mnt/external/TmpDataDir/Databases/db/Temp/scratch.0000000000.buffers
Size:                 64 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                  64 kB
Pss:                  64 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:        64 kB
Private_Dirty:         0 kB
Referenced:           64 kB
Anonymous:             0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
VmFlags: rd wr sh mr mw me ms
4bba3000-4bba4000 ---p 00000000 00:00 0
Size:                  4 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
VmFlags: mr mw me ac
@aviviadi

aviviadi commented Nov 21, 2018

P.S. Using p/invoke memcpy instead of Unsafe.CopyBlockUnaligned "solves" the issue (and dramatically slows our app on the little RPi, by 50-60%, as we have a lot of blocks to copy).

@aviviadi

aviviadi commented Nov 21, 2018

Here is the disassembly with the raw bytes.
Note that 0x7664155e is the location that d points to as the faulting instruction:

(lldb) d -s 0x76641500 -e 0x76641600 -b
    0x76641500: 0xf04f462b   .long  0xf04f462b                ; unknown opcode
    0x76641504: 0x94000403   .long  0x94000403                ; unknown opcode
    0x76641508: 0xfbbef05e   blx    0x755fd68a
    0x7664150c: 0x980ab928   stmdals r10, {r3, r5, r8, r11, r12, sp, pc}
    0x76641510: 0xe8bdb001   .long  0xe8bdb001                ; unknown opcode
    0x76641514: 0xb0044ff0   .long  0xb0044ff0                ; unknown opcode
    0x76641518: 0x46844770   .long  0x46844770                ; unknown opcode
    0x7664151c: 0xe8bdb001   .long  0xe8bdb001                ; unknown opcode
    0x76641520: 0xbc0f4ff0   .long  0xbc0f4ff0                ; unknown opcode
    0x76641524: 0xe92d4760   push   {r5, r6, r8, r9, r10, lr}
    0x76641528: 0xed2d4ff1   stc    p15, c4, [sp, #-964]!
    0x7664152c: 0xb4020b08   strlt  r0, [r2], #-2824
    0x76641530: 0xa80ab081   stmdage r10, {r0, r7, r12, sp, pc}
    0x76641534: 0xfebaf7db   mrc2   p7, #0x5, apsr_nzcv, c10, c11, #0x6
    0x76641538: 0xbc02b001   .long  0xbc02b001                ; unknown opcode
    0x7664153c: 0x0b08ecbd   bleq   0x7687c838
    0x76641540: 0x8ff1e8bd   svchi  #0xf1e8bd
    0x76641544: 0x76a41a40   strtvc r1, [r4], r0, asr #20
    0x76641548: 0x76a41b04   strtvc r1, [r4], r4, lsl #22
    0x7664154c: 0xbf082a00   svclt  #0x82a00
    0x76641550: 0x68034770   stmdavs r3, {r4, r5, r6, r8, r9, r10, lr}
    0x76641554: 0xba3cf6c1   blt    0x7757f060
    0x76641558: 0xbf082a00   svclt  #0x82a00
    0x7664155c: 0x68034770   stmdavs r3, {r4, r5, r6, r8, r9, r10, lr}
    0x76641560: 0xf6c1680b   .long  0xf6c1680b                ; unknown opcode
    0x76641564: 0x0000bd6d   .long  0x0000bd6d                ; unknown opcode
    0x76641568: 0x4ff0e92d   svcmi  #0xf0e92d
    0x7664156c: 0xb081af03   addlt  r10, r1, r3, lsl #30
    0x76641570: 0xd000f8c3   andle  pc, r0, r3, asr #17
    0x76641574: 0xf102ca70   .long  0xf102ca70                ; unknown opcode
    0x76641578: 0xe8b20204   .long  0xe8b20204                ; unknown opcode
    0x7664157c: 0x47880f00   strmi  r0, [r8, r0, lsl #30]
    0x76641580: 0xe8bdb001   .long  0xe8bdb001                ; unknown opcode
    0x76641584: 0xb5808ff0   strlt  r8, [r0, #0xff0]
    0x76641588: 0xf8c3466f   .long  0xf8c3466f                ; unknown opcode
    0x7664158c: 0x4790d000   ldrmi  sp, [r0, r0]
    0x76641590: 0x0000bd80   andeq  r11, r0, r0, lsl #27
    0x76641594: 0xd0062a00   andle  r2, r6, r0, lsl #20
    0x76641598: 0x466fb580   .long  0x466fb580                ; unknown opcode
    0x7664159c: 0x780b7803   stmdavc r11, {r0, r1, r11, r12, sp, lr}
    0x766415a0: 0xed50f6c1   ldcl   p6, c15, [r0, #-772]
    0x766415a4: 0x4803bd80   stmdami r3, {r7, r8, r10, r11, r12, sp, pc}
    0x766415a8: 0x28006800   stmdahs r0, {r11, sp, lr}
    0x766415ac: 0xf7a4bf18   .long  0xf7a4bf18                ; unknown opcode
    0x766415b0: 0x4770bebf   .long  0x4770bebf                ; unknown opcode
    0x766415b4: 0x76a41a40   strtvc r1, [r4], r0, asr #20
    0x766415b8: 0x00000000   andeq  r0, r0, r0
    0x766415bc: 0x00000000   andeq  r0, r0, r0
    0x766415c0: 0xbf004770   svclt  #0x4770
    0x766415c4: 0xbf00bf00   svclt  #0xbf00
    0x766415c8: 0xbf00bf00   svclt  #0xbf00
    0x766415cc: 0xbf00bf00   svclt  #0xbf00
    0x766415d0: 0x8f5ff3bf   svchi  #0x5ff3bf
    0x766415d4: 0xf2406001   .long  0xf2406001                ; unknown opcode
    0x766415d8: 0xf2c00301   .long  0xf2c00301                ; unknown opcode
    0x766415dc: 0x42990300   addsmi r0, r9, #0, #6
    0x766415e0: 0xf641d30a   .long  0xf641d30a                ; unknown opcode
    0x766415e4: 0xf2c74324   .long  0xf2c74324                ; unknown opcode
    0x766415e8: 0xeb03334d   bl     0x7670e324
    0x766415ec: 0x78182390   ldmdavc r8, {r4, r7, r8, r9, sp}
    0x766415f0: 0xbf1c28ff   svclt  #0x1c28ff
    0x766415f4: 0x701820ff   .long  0x701820ff                ; unknown opcode
    0x766415f8: 0x00004770   andeq  r4, r0, r0, ror r7
    0x766415fc: 0x00000000   andeq  r0, r0, r0

Here are the results with raw bytes from lr:

(lldb) disass -s 0x50ebc11b -e 0x50ebc15b -b
    0x50ebc11b: 0x00e014f8   .long  0x00e014f8                ; unknown opcode
    0x50ebc11f: 0x04f8cd90   ldrbteq r12, [r8], #3472
    0x50ebc123: 0x442000e0   .long  0x442000e0                ; unknown opcode
    0x50ebc127: 0xc56e45f6   strbgt r4, [lr, #-0x5f6]!
    0x50ebc12b: 0xf01ec5f6   .long  0xf01ec5f6                ; unknown opcode
    0x50ebc12f: 0x61f64847   .long  0x61f64847                ; unknown opcode
    0x50ebc133: 0x13f2c543   .long  0x13f2c543                ; unknown opcode
    0x50ebc137: 0x20479853   subhs  r9, r7, r3, asr r8
    0x50ebc13b: 0x32462946   sublo  r2, r6, #1146880
    0x50ebc13f: 0x59f24146   ldmibpl r2!, {r1, r2, r6, r8, lr} ^
    0x50ebc143: 0x64f2c75c   ldrbtvs r12, [r2], #1884
    0x50ebc147: 0x4547e06c   strbmi lr, [r7, #-0x6c]
    0x50ebc14b: 0xc75024f2   smmlsrgt r0, r2, r4, r2
    0x50ebc14f: 0x4e5012f2   mrcmi  p2, #0x2, r1, c0, c2, #0x7
    0x50ebc153: 0xc76391f6   .long  0xc76391f6                ; unknown opcode
    0x50ebc157: 0x98635df2   stmdals r3!, {r1, r4, r5, r6, r7, r8, r10, r11, r12, lr} ^
@janvorli

Member

janvorli commented Nov 21, 2018

Hmm, lldb's disassembly is really broken. Can you please get me

x/64bx 0x50ebc110

and

x/256bx 0x76641500

It would be easier to put the bytes printed into an online arm disassembler to see what they are.

But given the fact that lldb is broken like this, I would recommend reproducing the issue under gdb, which should work fine including the disassembly.

@aviviadi

aviviadi commented Nov 21, 2018

(lldb) x/64bx 0x50ebc110
0x50ebc110: 0x90 0x60 0xd3 0x60 0x02 0x9a 0x03 0x9b
0x50ebc118: 0x04 0x98 0xdd 0xf8 0x14 0xe0 0x00 0x90
0x50ebc120: 0xcd 0xf8 0x04 0xe0 0x00 0x20 0x44 0xf6
0x50ebc128: 0x45 0x6e 0xc5 0xf6 0xc5 0x1e 0xf0 0x47
0x50ebc130: 0x48 0xf6 0x61 0x43 0xc5 0xf2 0x13 0x53
0x50ebc138: 0x98 0x47 0x20 0x46 0x29 0x46 0x32 0x46
0x50ebc140: 0x41 0xf2 0x59 0x5c 0xc7 0xf2 0x64 0x6c
0x50ebc148: 0xe0 0x47 0x45 0xf2 0x24 0x50 0xc7 0xf2
(lldb) x/256bx 0x76641500
0x76641500: 0x2b 0x46 0x4f 0xf0 0x03 0x04 0x00 0x94
0x76641508: 0x5e 0xf0 0xbe 0xfb 0x28 0xb9 0x0a 0x98
0x76641510: 0x01 0xb0 0xbd 0xe8 0xf0 0x4f 0x04 0xb0
0x76641518: 0x70 0x47 0x84 0x46 0x01 0xb0 0xbd 0xe8
0x76641520: 0xf0 0x4f 0x0f 0xbc 0x60 0x47 0x2d 0xe9
0x76641528: 0xf1 0x4f 0x2d 0xed 0x08 0x0b 0x02 0xb4
0x76641530: 0x81 0xb0 0x0a 0xa8 0xdb 0xf7 0xba 0xfe
0x76641538: 0x01 0xb0 0x02 0xbc 0xbd 0xec 0x08 0x0b
0x76641540: 0xbd 0xe8 0xf1 0x8f 0x40 0x1a 0xa4 0x76
0x76641548: 0x04 0x1b 0xa4 0x76 0x00 0x2a 0x08 0xbf
0x76641550: 0x70 0x47 0x03 0x68 0xc1 0xf6 0x3c 0xba
0x76641558: 0x00 0x2a 0x08 0xbf 0x70 0x47 0x03 0x68
0x76641560: 0x0b 0x68 0xc1 0xf6 0x6d 0xbd 0x00 0x00
0x76641568: 0x2d 0xe9 0xf0 0x4f 0x03 0xaf 0x81 0xb0
0x76641570: 0xc3 0xf8 0x00 0xd0 0x70 0xca 0x02 0xf1
0x76641578: 0x04 0x02 0xb2 0xe8 0x00 0x0f 0x88 0x47
0x76641580: 0x01 0xb0 0xbd 0xe8 0xf0 0x8f 0x80 0xb5
0x76641588: 0x6f 0x46 0xc3 0xf8 0x00 0xd0 0x90 0x47
0x76641590: 0x80 0xbd 0x00 0x00 0x00 0x2a 0x06 0xd0
0x76641598: 0x80 0xb5 0x6f 0x46 0x03 0x78 0x0b 0x78
0x766415a0: 0xc1 0xf6 0x50 0xed 0x80 0xbd 0x03 0x48
0x766415a8: 0x00 0x68 0x00 0x28 0x18 0xbf 0xa4 0xf7
0x766415b0: 0xbf 0xbe 0x70 0x47 0x40 0x1a 0xa4 0x76
0x766415b8: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x766415c0: 0x70 0x47 0x00 0xbf 0x00 0xbf 0x00 0xbf
0x766415c8: 0x00 0xbf 0x00 0xbf 0x00 0xbf 0x00 0xbf
0x766415d0: 0xbf 0xf3 0x5f 0x8f 0x01 0x60 0x40 0xf2
0x766415d8: 0x01 0x03 0xc0 0xf2 0x00 0x03 0x99 0x42
0x766415e0: 0x0a 0xd3 0x41 0xf6 0x24 0x43 0xc7 0xf2
0x766415e8: 0x4d 0x33 0x03 0xeb 0x90 0x23 0x18 0x78
0x766415f0: 0xff 0x28 0x1c 0xbf 0xff 0x20 0x18 0x70
0x766415f8: 0x70 0x47 0x00 0x00 0x00 0x00 0x00 0x00

I will repro this on another machine with gdb right away.

@janvorli

Member

janvorli commented Nov 21, 2018

The disass even in thumb2 mode doesn't make sense at the point of failure. Maybe it is just another lldb issue. Let's see what we'll get in gdb.

@aviviadi

aviviadi commented Nov 21, 2018

Reproduced with gdb:

Thread 30 "Raven.Server" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x4c1c4450 (LWP 1096)]
0x7668555e in JIT_MemCpy () from /mnt/ext-lab/RavenDB.regular/libcoreclr.so
(gdb) bt
#0  0x7668555e in JIT_MemCpy () from /mnt/ext-lab/RavenDB.regular/libcoreclr.so
#1  0x50a4bb54 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) 
@aviviadi

aviviadi commented Nov 21, 2018

(gdb) info registers
r0 0x523b3ffd 1379614717
r1 0x4c1c31d8 1276916184
r2 0x2 2
r3 0x0 0
r4 0x6362dcbc 1667423420
r5 0x523b3ffd 1379614717
r6 0x7 7
r7 0x1c5e1ea5 475930277
r8 0x0 0
r9 0x6448bbf0 1682488304
r10 0x4c1c3ba4 1276918692
r11 0x4c1c31f0 1276916208
r12 0x76685559 1986549081
sp 0x4c1c31b0 0x4c1c31b0
lr 0x50a4bb55 1352973141
pc 0x7668555e 0x7668555e <JIT_MemCpy+6>
cpsr 0x20000030 536870960
(gdb) disassemble 0x50a4bb55
No function contains specified address.
(gdb) disassemble 0x7668555e
Dump of assembler code for function JIT_MemCpy:
0x76685558 <+0>: cmp r2, #0
0x7668555a <+2>: it eq
0x7668555c <+4>: bxeq lr
=> 0x7668555e <+6>: ldr r3, [r0, #0]
0x76685560 <+8>: ldr r3, [r1, #0]
0x76685562 <+10>: b.w 0x76547040 memcpy@plt
End of assembler dump.
(gdb)
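For comparison with the earlier lldb output, the raw bytes from the x/256bx dump can be regrouped as little-endian 16-bit Thumb halfwords. This is a sketch rather than a real decoder: the mnemonic table is simply copied from the gdb listing above, and it assumes the code at 0x76641558 in the lldb session is the same JIT_MemCpy helper (r12 held 0x76641559 there and 0x76685559 here).

```python
import struct

BASE = 0x76641558  # start of the helper in the lldb session (assumed to be JIT_MemCpy)

# Bytes copied from the x/256bx dump, 0x76641558 through 0x76641565
raw = bytes([0x00, 0x2A, 0x08, 0xBF, 0x70, 0x47, 0x03, 0x68,
             0x0B, 0x68, 0xC1, 0xF6, 0x6D, 0xBD])

# Mnemonics taken from the gdb listing of JIT_MemCpy; this is a lookup, not a decoder
KNOWN = {
    0x2A00: "cmp r2, #0",
    0xBF08: "it eq",
    0x4770: "bx lr (bxeq inside the IT block)",
    0x6803: "ldr r3, [r0, #0]",
    0x680B: "ldr r3, [r1, #0]",
}


def halfwords(data):
    """Split bytes into little-endian 16-bit values, as Thumb code is laid out."""
    return list(struct.unpack(f"<{len(data) // 2}H", data))


def arm_word(data, offset):
    """Read one little-endian 32-bit value, as an ARM-mode decoder would."""
    return struct.unpack_from("<I", data, offset)[0]


for i, hw in enumerate(halfwords(raw)):
    note = KNOWN.get(hw, "(half of a 32-bit Thumb-2 instruction)")
    print(f"{BASE + 2 * i:#x}: {hw:#06x}  {note}")

# What an ARM-mode decode sees at the faulting pc (offset 6 from BASE)
print(f"ARM-mode word at {BASE + 6:#x}: {arm_word(raw, 6):#010x}")
```

Read this way, the halfword at 0x7664155e is 0x6803, which matches the `ldr r3, [r0, #0]` that gdb reports at +6, while an ARM-mode decode glues two Thumb instructions into the single word 0x680b6803 that lldb printed as `stmdavs`.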

@aviviadi

aviviadi commented Nov 21, 2018

So..

(gdb) p/x $r0
$3 = 0x523b3ffd
(gdb) p $_siginfo._sifields._sigfault.si_addr
$1 = (void *) 0x523b4000

And SEGV is on:

=> 0x7668555e <+6>: ldr r3, [r0, #0]

Could it be that it reads 4 bytes (the register size) starting at $r0, i.e. the range from 0x523b3ffd to 0x523b3ffd + 4, which ends after the mapped page and so causes the segfault, even though we asked Unsafe.CopyBlock to copy only 2 bytes?

@janvorli

Member

janvorli commented Nov 21, 2018

Yes, based on the register values, that is what it was doing. And I can see it is a bug in the asm JIT_MemCpy helper. It uses the read to check that the address is valid before it jumps to memcpy. However, reading 4 bytes is obviously wrong; it should use just a byte read instead. Based on the comment in the function code, it seems that there used to be a requirement that this function be called only for 4-byte-aligned addresses, but looking at the Windows version of this helper, the code doesn't require it.
https://github.com/dotnet/coreclr/blob/master/src/vm/arm/crthelpers.S#L44-L58
I'll create a PR with a fix.
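The difference between a 4-byte and a 1-byte probe can be modelled with plain arithmetic on the r0 values captured in this thread. This is only a model of the behaviour described above, not the helper's code: the function names are invented for illustration, and the 4 kB page size is taken from the KernelPageSize line in the smaps output.

```python
PAGE_SIZE = 0x1000  # 4 kB, per KernelPageSize in the smaps output


def next_page_boundary(addr):
    """First address of the page following the one that contains `addr`."""
    return (addr // PAGE_SIZE + 1) * PAGE_SIZE


def probe_crosses_page(addr, width):
    """True if reading `width` bytes at `addr` touches the following page."""
    return addr + width - 1 >= next_page_boundary(addr)


# r0 values from the three captured crashes
for r0 in (0x520D4FFD, 0x4BBA2FFE, 0x523B3FFD):
    print(
        f"r0={r0:#x} boundary={next_page_boundary(r0):#x} "
        f"word_probe_crosses={probe_crosses_page(r0, 4)} "
        f"byte_probe_crosses={probe_crosses_page(r0, 1)}"
    )
```

In each captured crash the computed boundary equals the reported fault address, and only the 4-byte probe reaches it, which is consistent with a byte read being enough to validate the pointer without touching the guard page.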

@aviviadi

aviviadi commented Nov 21, 2018

Great! Thanks.

@aviviadi

aviviadi commented Nov 21, 2018

Just to make sure, does this fix both Unsafe.CopyBlock and Unsafe.CopyBlockUnaligned?

@janvorli

Member

janvorli commented Nov 22, 2018

Looking at the JIT source, there is only a single place that invokes JIT_MemCpy, and that is where the cpblk IL instruction is compiled. Both Unsafe.CopyBlock and Unsafe.CopyBlockUnaligned use cpblk, as you can see here:
https://github.com/dotnet/corefx/blob/64c6d9fe5409be14bdc3609d73ffb3fea1f35797/src/System.Runtime.CompilerServices.Unsafe/src/System.Runtime.CompilerServices.Unsafe.il#L162-L206
