Support ARM/Thumb interworking #35

MrAnno · 2022-01-02T02:29:40Z

My understanding is that we have to support all possible scenarios, where both the original main() and new_main() can either be Thumb or ARM.

Note: I could not find a way to make the Thumb code generation optional, but it seems Thumb is a pretty old thing.

TODO:

Retest on real armv6
Retest on real armv8
"Provide your own main" tests: arm-calls-arm, thumb-calls-thumb, arm-calls-thumb, thumb-calls-arm

This fixes the BXF_ARCH_ARM and BXF_ARCH_ARM64 macros as well.

MrAnno · 2022-01-02T02:56:26Z

Hm, I'm not sure what happened with the timeout.c sample, it crashes somewhere in libc, before calling our patched main.

Snaipe · 2022-01-02T13:05:51Z

I'm not even seeing the failure logs on Cirrus, maybe the ARM runner is broken. In any case, if I'd had to guess, I'd assume it crashes trying to call main if the trampoline is not compatible?

MrAnno · 2022-01-02T18:50:10Z

Oh I think I have identified the exact problem.

ldr     r3, addr_data_alt
bx      r3

addr_data_alt:
.fill 4, 1, 0

The assembler generates the following from the above:

(gdb) x/3i opcodes
   0xfffef608:  ldr     r3, [pc, #0]    ; (0xfffef60c)
   0xfffef60a:  bx      r3
   0xfffef60c:  address of new_main

pc is 2 instructions ahead, which is 4 bytes in Thumb mode.

But when PC is used as a base register for addressing operations, it uses a word-aligned value, which means the offset is variable, so this can't be used reliably for patching. timeout.c is failing because its main function's address is not word-aligned (it's halfword-aligned), so the trampoline jumps to the wrong location:

(gdb) x/3i main
   0x4012e6:    ldr     r3, [pc, #0]    ; (0x4012e8)
   0x4012e8:    bx      r3
   0x4012ea:    address of new_main

MrAnno · 2022-01-02T19:59:45Z

I'm testing an idea that could make this truly position independent.

BX exchanges the instruction set if necessary.

MrAnno · 2022-01-03T01:39:33Z

Well, I'm not proud of the end result, but it finally works.

src/asm/trampoline-arm.S

Snaipe · 2022-01-03T13:10:34Z

src/exe-elf.c

@@ -179,7 +187,8 @@ int bxfi_exe_patch_main(bxfi_exe_fn *new_main)
    size_t len = align2_up(offset + sizeof (opcodes), PAGE_SIZE);

    mprotect(base, len, PROT_READ | PROT_WRITE | PROT_EXEC);
-    memcpy(nonstd (void *) addr, opcodes, sizeof (opcodes));
+    size_t prelude_len = bxfi_exe_inject_prelude(addr);


I think this can be simplified; I think the non-alignment is only an issue for the thumb trampoline, right? i.e. everything other than arm thumb is already aligned, and it's just the thumb trampoline that's misaligned.

If so, I think we should just preface the thumb trampoline with a 2-byte nop, and then align2_up(opcodes, 4) so that the nop gets ignored during the copy.

I tried implementing this, but I haven't been able to succeed.

My experience was that I can't really insert nops into the original trampoline, because it interferes with how pseudo-instructions are generated by the assembler. For example, adding a nop just before the ldr instruction would generate a different pc-relative jump, which then I would have to patch somehow in the C code when injecting it to a word-aligned Thumb address.

Just for the record, when I tried the above, timeout.c began to work (the halfword-aligned one), but all word-aligned tests failed:

if (bxfi_exe_is_arm_func_word_aligned(func_to_patch)) *trampoline = (void*) ((uintptr_t) &bxfi_trampoline_thumb + 2);

1/4 nested.c FAIL 0.01s killed by signal 6 SIGABRT >>> MALLOC_PERTURB_=72 /tmp/cirrus-ci-build/build/sample/nested.c.bin 2/4 callback.c FAIL 0.01s killed by signal 6 SIGABRT >>> MALLOC_PERTURB_=179 /tmp/cirrus-ci-build/build/sample/callback.c.bin 3/4 context.c FAIL 0.01s killed by signal 6 SIGABRT >>> MALLOC_PERTURB_=195 /tmp/cirrus-ci-build/build/sample/context.c.bin 4/4 timeout.c OK 2.00s

I see. It's a bit disappointing that we have to maintain an extra pair of symbols just to delimit a nop, but I can live with it.

Alternatively, I wonder if we could hardcode the nop opcode as a constant for ARM and pre-fill the opcodes buffer with a nop slide, and then copying the actual opcodes at an aligned index. I don't prefer one over the other, so your call.

We could inject the actual opcode from C, but I thought that would be even more terrible.

I'm not sure which way to go, my problem with the hard-coded opcode was portability.
ARM docs state that NOP may be a pseudo-instruction, and the assembler might replace it with different instructions for different processors.

For example, MOV r8, r8 or MOV r0, r0.

Yeah, my idea was essentially to do this:

char opcodes[4096]; bxfi_nop_fill(opcodes, sizeof(opcodes)); uintptr_t offset = (uintptr_t) base - (uintptr_t) align2((uintptr_t) base, arch_word_size); memcpy(opcodes + offset, trampoline, sizeof (opcodes) - offset);

and have bxfi_nop_fill fill the buffer with nops depending on the actual architecture, so that we ensure that regardless of the architecture, the trampoline start is word aligned. That said it assumes that the base can't be misaligned by an offset smaller than the size of the smallest nop, but I think the current PR makes the same assumption anyway (with the thumb nop being something like mov r8, r8 and being 2-bytes-long).

This sounds good to me.
I'm a bit afraid of the implementation of bxfi_nop_fill, so I'd leave this PR as is for now, if you don't mind. :)
When adding the WIndows/iOS support, I'm going to try implementing this as you suggested.

Honestly I'm fine with the current code. I'm always happier the better the code is, so if it looks like my suggestion could end up being overall better, of course I'd prefer it, but this is perfectly fine as-is too. And better yet, it works.

Thumb is an ARM instruction set encoding, it uses 16-bit instructions instead of 32. When patching such a function, we should use a Thumb-encoded trampoline.

bxfi_exe_trampoline_fixup() can be used to make small modifications on the trampoline code or on the function being patched.

https://developer.arm.com/documentation/dui0471/m/interworking-arm-and-thumb/pointers-to-functions-in-thumb-state Pointers to Thumb functions have their least significant bit set to ensure that interworking works correctly. That bit is just an indicator, we need to get rid of it before patching the function with our trampoline.

Thumb is a halfword-aligned instruction set encoding, but our trampoline code uses a PC-relative jump, which needs to be positioned on a word aligned address.

src/exe-elf-arm-fixup.h

Snaipe

LGTM on my end; is this OK to merge?

MrAnno · 2022-01-03T17:05:54Z

Thanks for the thorough review.

I think it's ready now.

Snaipe · 2022-01-03T17:10:32Z

Great, thanks for the awesome work!

Snaipe · 2022-01-03T17:11:14Z

Had to rebase on top of master, so the commit hash will need an update in the Criterion PR

MrAnno added 2 commits January 1, 2022 15:55

ci: add native arm builds

92bc14f

build: use uppercase BXF_ARCH_* macros

d446389

This fixes the BXF_ARCH_ARM and BXF_ARCH_ARM64 macros as well.

MrAnno mentioned this pull request Jan 2, 2022

Fix UB, support ARM/Thumb interworking, and run ARM tests in CI Snaipe/Criterion#404

Merged

MrAnno force-pushed the fix-arm32v7-trampoline branch 6 times, most recently from 85c6adc to 05216c9 Compare January 3, 2022 00:43

trampoline-arm: support jumping to Thumb function from ARM mode

3e2e549

BX exchanges the instruction set if necessary.

MrAnno force-pushed the fix-arm32v7-trampoline branch 2 times, most recently from 61fe8af to b688d20 Compare January 3, 2022 01:37

Snaipe reviewed Jan 3, 2022

View reviewed changes

MrAnno added 6 commits January 3, 2022 15:56

trampoline-arm: add trampoline for patching a Thumb function

687af0d

Thumb is an ARM instruction set encoding, it uses 16-bit instructions instead of 32. When patching such a function, we should use a Thumb-encoded trampoline.

exe-elf: add possibility to fixup trampoline code

a0ef4ad

bxfi_exe_trampoline_fixup() can be used to make small modifications on the trampoline code or on the function being patched.

exe-elf: add possibility to inject a prelude code into main()

79912ca

exe-elf: detect ARM Thumb-mode main() function, patch accordingly

2301e95

exe-elf: word-align trampoline code

f35ffa7

Thumb is a halfword-aligned instruction set encoding, but our trampoline code uses a PC-relative jump, which needs to be positioned on a word aligned address.

MrAnno force-pushed the fix-arm32v7-trampoline branch from b688d20 to f35ffa7 Compare January 3, 2022 14:58

Snaipe reviewed Jan 3, 2022

View reviewed changes

src/exe-elf-arm-fixup.h Outdated Show resolved Hide resolved

exe: move trampoline fixup to a separate unit

f4f83a7

MrAnno force-pushed the fix-arm32v7-trampoline branch from c1cf9cc to f4f83a7 Compare January 3, 2022 15:43

Snaipe approved these changes Jan 3, 2022

View reviewed changes

Snaipe merged commit 46cfaeb into Snaipe:master Jan 3, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support ARM/Thumb interworking #35

Support ARM/Thumb interworking #35

MrAnno commented Jan 2, 2022 •

edited

MrAnno commented Jan 2, 2022

Snaipe commented Jan 2, 2022

MrAnno commented Jan 2, 2022 •

edited

MrAnno commented Jan 2, 2022

MrAnno commented Jan 3, 2022

Snaipe Jan 3, 2022

MrAnno Jan 3, 2022 •

edited

Snaipe Jan 3, 2022

Snaipe Jan 3, 2022

MrAnno Jan 3, 2022

MrAnno Jan 3, 2022 •

edited

Snaipe Jan 3, 2022

MrAnno Jan 3, 2022

Snaipe Jan 3, 2022

Snaipe left a comment

MrAnno commented Jan 3, 2022

Snaipe commented Jan 3, 2022

Snaipe commented Jan 3, 2022

Support ARM/Thumb interworking #35

Support ARM/Thumb interworking #35

Conversation

MrAnno commented Jan 2, 2022 • edited

MrAnno commented Jan 2, 2022

Snaipe commented Jan 2, 2022

MrAnno commented Jan 2, 2022 • edited

MrAnno commented Jan 2, 2022

MrAnno commented Jan 3, 2022

Snaipe Jan 3, 2022

Choose a reason for hiding this comment

MrAnno Jan 3, 2022 • edited

Choose a reason for hiding this comment

Snaipe Jan 3, 2022

Choose a reason for hiding this comment

Snaipe Jan 3, 2022

Choose a reason for hiding this comment

MrAnno Jan 3, 2022

Choose a reason for hiding this comment

MrAnno Jan 3, 2022 • edited

Choose a reason for hiding this comment

Snaipe Jan 3, 2022

Choose a reason for hiding this comment

MrAnno Jan 3, 2022

Choose a reason for hiding this comment

Snaipe Jan 3, 2022

Choose a reason for hiding this comment

Snaipe left a comment

Choose a reason for hiding this comment

MrAnno commented Jan 3, 2022

Snaipe commented Jan 3, 2022

Snaipe commented Jan 3, 2022

MrAnno commented Jan 2, 2022 •

edited

MrAnno commented Jan 2, 2022 •

edited

MrAnno Jan 3, 2022 •

edited

MrAnno Jan 3, 2022 •

edited