ARM: `drutil_insert_get_mem_addr` no longer works correctly on AArch64 #5498

petrochenkov · 2022-05-24T14:59:00Z

#5283 introduced a change to decoding of load instructions, as a result of that change load index registers can be decoded as w registers (as opposed to x registers as previously).

Not all places in DR are prepared to encounter a w register in a load index context.
For example, this logic in drutil_insert_get_mem_addr_arm

        reg_id_t index = opnd_get_index(memref);

        // ...

        reg_id_t stolen = dr_get_stolen_reg();

        // ...

        } else if (index == stolen) {
            index = replace_stolen_reg(drcontext, bb, where, memref, dst, scratch,
                                       scratch_used);
        }

now doesn't recognize w index registers as stolen, because dr_get_stolen_reg() typically returns an x register.

This results in instrumentation being silently incorrect and drutil_insert_get_mem_addr_arm producing incorrect addresses.

Some other places besides drutil_insert_get_mem_addr_arm may also be affected by this change.

The text was updated successfully, but these errors were encountered:

derekbruening · 2022-05-24T19:08:23Z

This may have resulted in incorrect addresses in AArch64 traces gathered since Jan 18.
Is one way to check whether any instruction matches ldr.*w28 or str.*w28?

derekbruening · 2022-05-24T19:11:13Z

This is rather disturbing. What can we do to avoid such regressions in the future?

Add more comprehensive regression tests for drutil_insert_get_mem_addr covering stolen register corner cases
Add checks for unmapped addresses to the invariant_checker trace analyzer: but this requires having metadata on the valid mapped regions, which are not gathered by the Github drmemtrace code today. This is no guarantee as the computed incorrect address could happen to land in a mapped region.

derekbruening · 2022-05-25T04:50:55Z

Is one way to check whether any instruction matches ldr.*w28 or str.*w28?

The w28 must be an index, so it would be ldr.*, w28 or str.*, w28 for arm-style disassembly -- right? Planning to search through objdump output as an initial check.

AssadHashmi · 2022-05-25T16:01:47Z

This is rather disturbing. What can we do to avoid such regressions in the future?

It is! Adding more comprehensive regression tests for drutil_insert_get_mem_addr() covering stolen register corner cases is definitely something we should do.

In general, are there areas of DR core and API code which rely heavily on the correctness of generated instructions which should also have more comprehensive test coverage? Can we identify a set of INSTR_CREATE_ macros and DR functions which need better test coverage.

Can we identify these code paths and improve error trapping with e.g. ASSERTs, to fail loudly rather than silently go undetected?

Does #2443 cover enough of this?

derekbruening · 2022-05-25T16:24:43Z

So the #5283 fix itself was addressing incorrect addresses which I don't remember realizing when that fix went in: so older traces pre-Jan 18 also had a source of inaccuracies.

derekbruening · 2022-05-25T22:43:52Z

So the #5283 fix itself was addressing incorrect addresses which I don't remember realizing when that fix went in: so older traces pre-Jan 18 also had a source of inaccuracies.

Could this be quantified? If we want to search for load/store instructions whose addresses were computed incorrectly prior to the #5283 fix what would we search for? Is it that any scaled index register was treated as not scaled at all?

joshua-warburton · 2022-05-26T10:43:58Z

Yes, it's correlated with the extend- if the extend is UXTW or SXTW then it should be a W register , if it is SXTX or LSL/UXTX, it should be an X register

derekbruening · 2022-05-26T13:50:48Z

Yes, it's correlated with the extend- if the extend is UXTW or SXTW then it should be a W register , if it is SXTX or LSL/UXTX, it should be an X register

But the extend scale is also incorrect, right? In the diff for PR #5283 I see things like this:
At dis-a64.txt:13145 it was:

str    %x1 -> (%x2,%x3,uxtx #3)[8byte]

And now it's:

str    %x1 -> (%x2,%x3,lsl #3)[8byte]

So a trace gathered prior to that PR that had a store with a scale would have incorrectly ignored the scale and computed the wrong address? (I think uxtx #3 ignores the #3, i.e., it only extends and doesn't scale/shift?)

joshua-warburton · 2022-05-26T15:08:37Z

Indeed, it is incorrect, but it as at least consistently incorrect. UXTX only appears where it should be LSL, and the correct registers can be derived from that if we want to find where the decode errors occurred pre #5283.

I'm not sure on the difference between uxtx #3 and lsl #3. I'll have to defer to @AssadHashmi on that

AssadHashmi · 2022-05-26T17:11:40Z

I'm not sure on the difference between uxtx #3 and lsl #3. I'll have to defer to @AssadHashmi on that

My understanding is that UXTX shouldn't appear in X reg offsets because zero extending a 64 bit index value to 64 bits is meaningless, a no-op extension. decode_opnd_memreg_size() in #5283 just converted UXTX to the preferred alias LSL and opnd_base_disp_scale_disassemble() fixed it in disassembler.

So a trace gathered prior to that PR that had a store with a scale would have incorrectly ignored the scale and computed the wrong address? (I think uxtx #3 ignores the #3, i.e., it only extends and doesn't scale/shift?)

I don't think that's the case. The values used to create memref with opnd_create_base_disp_aarch64() are the same including scaled and extend_type but after #5283, extend_type is disassembled as LSL rather than UXTX.

derekbruening · 2022-05-26T19:54:04Z

So a trace gathered prior to that PR that had a store with a scale would have incorrectly ignored the scale and computed the wrong address? (I think uxtx #3 ignores the #3, i.e., it only extends and doesn't scale/shift?)

I don't think that's the case. The values used to create memref with opnd_create_base_disp_aarch64() are the same including scaled and extend_type but after #5283, extend_type is disassembled as LSL rather than UXTX.

OK, so a trace prior to PR #5283 would only have an incorrect address related to that PR's changes if it had an X register as an index that really should have been a W and it had a non-zero value in the upper bits? The scaling and extension changes from that PR are purely cosmetic.

The PR description of The "extension" LSL #0 was being decoded as uxtx #0, combined with the diff showing changes that weren't with just #0, made it seem like things were being decoded incorrectly, not just being disassembled incorrectly.

I did a sanity test:

After PR #5283:

derek@dynamorio:~/dr/build$ rm -rf drmemtrace.*.dir; setarch `uname -m` -R bin64/drrun -stderr_mask 0 -t drcachesim -offline -- ~/dr/test/pr5283 && bin64/drrun -stderr_mask 0 -t drcachesim -indir drmemtrace.*.dir -simulator_type view
All done
...
       11: T1738749   0x0000000000400154  910003e2   add    %sp $0x0000 lsl $0x00 -> %x2
       12: T1738749   0x0000000000400158  d2800523   movz   $0x0029 lsl $0x00 -> %x3
       13: T1738749   0x000000000040015c  f8237841   str    %x1 -> (%x2,%x3,lsl #3)[8byte]
       14: T1738749     write 8 byte(s) @ 0xfffffffff188

Going back before the PR:

derek@dynamorio:~/dr/build$ rm -rf drmemtrace.*.dir; setarch `uname -m` -R bin64/drrun -stderr_mask 0 -t drcachesim -offline -- ~/dr/test/pr5283 && bin64/drrun -stderr_mask 0 -t drcachesim -indir drmemtrace.*.dir -simulator_type view
All done
...
       11: T1731912   0x0000000000400154  910003e2   add    %sp $0x0000 lsl $0x00 -> %x2
       12: T1731912   0x0000000000400158  d2800523   movz   $0x0029 lsl $0x00 -> %x3
       13: T1731912   0x000000000040015c  f8237841   str    %x1 -> (%x2,%x3,uxtx #3)[8byte]
       14: T1731912     write 8 byte(s) @ 0xfffffffff188

Assuming my ASLR-disabling test will get the same stack address (sure seems like it; too lazy to create a different constant address instead of using SP in a tiny asm test; maybe should have used a .text address and a load) it does seem that the lsl #3 vs uxtx #3 disassembly difference made no impact on the traced address.

- Add client test case using W28 index register in memory instr. - Add FIXME comment and an 'if (index == DR_REG_W28)' check in drutil_insert_get_mem_addr_arm(). This avoids silent errors. There will be a following PR for the actual fix. In the meantime, the test has been added to the known failures list in runsuite_wrapper.pl's %ignore_failures_64. Issue: #5498

- Handle stolen register in drutil_insert_get_mem_addr_arm() if it's used as a 32 bit index register. - Add client test case using W28 as index register in memory instructions. Issue: #5498

derekbruening · 2022-09-09T21:05:02Z

Through unrelated validation of virtual-to-physical address gathering, we discovered a source of incorrect addresses in our traces: instructions with w28 as the index register. It seems that the fix in #5511 did not actually result in correct addresses. (Incidentally, was that PR meant to close this issue? Was there other work that left it open?)

Creating a test case we can see that the addresses are wrong:

        mov      x28, sp
        mov      x27, sp
        add      x5, x27, w28, sxtw #3
        ldr      x4, [x27]
        ldr      x4, [x27, w28, sxtw #3]

I ran it without ASLR which just happens to not crash on that memref:

derek@dynamorio:~/dr/build$ pushd ~/dr/test; as -mcpu=cortex-a55 -o allasm_aarch64.o allasm_aarch64.s && gcc -static -o allasm_aarch64 allasm_aarch64.o -nostartfiles; popd
derek@dynamorio:~/dr/build$ rm -rf drmemtrace.*.dir; setarch `uname -m` -R bin64/drrun -t drcachesim -offline -- ~/dr/test/allasm_aarch64
derek@dynamorio:~/dr/build$ bin64/drrun -t drcachesim -indir drmemtrace.*.dir -simulator_type view 2>&1 | head -40

       14: T820058 ifetch       4 byte(s) @ 0x0000000000400158 910003fc   add    %sp $0x0000 lsl $0x00 -> %x28
       15: T820058 ifetch       4 byte(s) @ 0x000000000040015c 910003fb   add    %sp $0x0000 lsl $0x00 -> %x27
       16: T820058 ifetch       4 byte(s) @ 0x0000000000400160 8b3ccf65   add    %x27 %w28 sxtw $0x03 -> %x5
       17: T820058 ifetch       4 byte(s) @ 0x0000000000400164 f9400364   ldr    (%x27)[8byte] -> %x4
       18: T820058 read         8 byte(s) @ 0x0000fffffffff040 by PC 0x0000000000400164
       19: T820058 ifetch       4 byte(s) @ 0x0000000000400168 f87cdb64   ldr    (%x27,%w28,sxtw #3)[8byte] -> %x4
       20: T820058 read         8 byte(s) @ 0x0000ffffc0187040 by PC 0x0000000000400168

That recorded address 0x0000ffffc0187040 sure looks wrong. It should be:

(gdb) p/x 0x0000fffffffff040 + (((int) 0x0000fffffffff040)<<3)
$9 = 0xffffffff7240

Instru:

after instrumentation:
TAG  0x0000000000400158
 +0    m4 @0x0000fffdf804abb0  f940d784   ldr    +0x01a8(%x28)[8byte] -> %x4
 +4    m4 @0x0000fffdf804aa68  d2802b05   movz   $0x0158 lsl $0x00 -> %x5
 +8    m4 @0x0000fffdf804a9e8  f2e40185   movk   %x5 $0x200c lsl $0x30 -> %x5
 +12   m4 @0x0000fffdf804a968  f9000085   str    %x5 -> (%x4)[8byte]
 +16   m4 @0x0000fffdf804ab30  f9000085   <label note=0x0000000000000000>
 +16   m4 @0x0000fffdf804a8e8  91002084   add    %x4 $0x0008 lsl $0x0000000000000000 -> %x4
 +20   m4 @0x0000fffdf804a868  f900d784   str    %x4 -> +0x01a8(%x28)[8byte]
 +24   m4 @0x0000fffdf8049948  f900d784   <label note=0x0000000000000000>
 +24   L3 @0x0000fffdf8049160  910003fc   add    %sp $0x0000 lsl $0x00 -> %x28
 +28   L3 @0x0000fffdf804ac30  910003fb   add    %sp $0x0000 lsl $0x00 -> %x27
 +32   L3 @0x0000fffdf8049690  8b3ccf65   add    %x27 %w28 sxtw $0x03 -> %x5
 +36   m4 @0x0000fffdf804a720  f940d784   ldr    +0x01a8(%x28)[8byte] -> %x4
 +40   m4 @0x0000fffdf804a5d8  f900009b   str    %x27 -> (%x4)[8byte]
 +44   m4 @0x0000fffdf804a6a0  f900009b   <label note=0x0000000000000000>
 +44   m4 @0x0000fffdf804a510  91002084   add    %x4 $0x0008 lsl $0x0000000000000000 -> %x4
 +48   m4 @0x0000fffdf804a490  f900d784   str    %x4 -> +0x01a8(%x28)[8byte]
 +52   m4 @0x0000fffdf804a7a0  f900d784   <label note=0x0000000000000000>
 +52   L3 @0x0000fffdf80497e0  f9400364   ldr    (%x27)[8byte] -> %x4
 +56   m4 @0x0000fffdf804a300  f940d784   ldr    +0x01a8(%x28)[8byte] -> %x4
 +60   m4 @0x0000fffdf804a200  f900ab80   str    %x0 -> +0x0150(%x28)[8byte]
 +64   m4 @0x0000fffdf804a138  f9401b80   ldr    +0x30(%x28)[8byte] -> %x0
 +68   m4 @0x0000fffdf804a070  8b3ccf60   add    %x27 %w28 sxtw $0x0000000000000003 -> %x0
 +72   m4 @0x0000fffdf8049ff0  f9000080   str    %x0 -> (%x4)[8byte]
 +76   m4 @0x0000fffdf804a280  f9000080   <label note=0x0000000000000000>
 +76   m4 @0x0000fffdf8049f28  91002084   add    %x4 $0x0008 lsl $0x0000000000000000 -> %x4
 +80   m4 @0x0000fffdf8049ea8  f900d784   str    %x4 -> +0x01a8(%x28)[8byte]
 +84   m4 @0x0000fffdf804a3c8  f900d784   <label note=0x0000000000000000>
 +84   L3 @0x0000fffdf8049778  f87cdb64   ldr    (%x27,%w28,sxtw #3)[8byte] -> %x4

It's not restoring the app's x28 value, so the w28 it uses is DR's.

So all aarch64 traces have had incorrect addresses for w28 index registers, all this time.

The fix in PR #5511 for drutil_insert_get_mem_addr() on a w28 index register failed to use the application value in a temp register and resulted in an incorrect recorded address. The test added in that PR failed to actually check the value and so failed to catch the error. Here we augment the test to check the value by passing the recorded address and the operand to a clean call. This new test fails for the w28 case. With the included fix, it now passes. Fixes #5498

petrochenkov mentioned this issue May 24, 2022

i#2626: AArch64 v8 decode: Fix incorrect decode of ldrb #5283

Merged

AssadHashmi self-assigned this May 24, 2022

bete0 added the OpSys-AArch64 label May 24, 2022

derekbruening added Priority-High Google-Affecting labels May 24, 2022

AssadHashmi mentioned this issue May 31, 2022

i#5498 Fix and test for indexed memory instrs using W28 #5511

Merged

derekbruening mentioned this issue Sep 13, 2022

i#5498: Fix drutil_insert_get_mem_addr for w28 index #5649

Merged

derekbruening closed this as completed in 6ea013e Sep 13, 2022

derekbruening closed this as completed in #5649 Sep 13, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ARM: `drutil_insert_get_mem_addr` no longer works correctly on AArch64 #5498

ARM: `drutil_insert_get_mem_addr` no longer works correctly on AArch64 #5498

petrochenkov commented May 24, 2022

derekbruening commented May 24, 2022

derekbruening commented May 24, 2022

derekbruening commented May 25, 2022

AssadHashmi commented May 25, 2022

derekbruening commented May 25, 2022

derekbruening commented May 25, 2022

joshua-warburton commented May 26, 2022

derekbruening commented May 26, 2022

joshua-warburton commented May 26, 2022

AssadHashmi commented May 26, 2022

derekbruening commented May 26, 2022

derekbruening commented Sep 9, 2022

ARM: drutil_insert_get_mem_addr no longer works correctly on AArch64 #5498

ARM: drutil_insert_get_mem_addr no longer works correctly on AArch64 #5498

Comments

petrochenkov commented May 24, 2022

derekbruening commented May 24, 2022

derekbruening commented May 24, 2022

derekbruening commented May 25, 2022

AssadHashmi commented May 25, 2022

derekbruening commented May 25, 2022

derekbruening commented May 25, 2022

joshua-warburton commented May 26, 2022

derekbruening commented May 26, 2022

joshua-warburton commented May 26, 2022

AssadHashmi commented May 26, 2022

derekbruening commented May 26, 2022

derekbruening commented Sep 9, 2022

ARM: `drutil_insert_get_mem_addr` no longer works correctly on AArch64 #5498

ARM: `drutil_insert_get_mem_addr` no longer works correctly on AArch64 #5498