Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add cross-arch support to drdecodelib #1684

Open
derekbruening opened this issue Apr 14, 2015 · 26 comments
Open

add cross-arch support to drdecodelib #1684

derekbruening opened this issue Apr 14, 2015 · 26 comments

Comments

@derekbruening
Copy link
Contributor

Today we have ifdefs selecting whether the decoder and encoder support
either ARM or x86. For a nice standalone decoder we should have support
for both in the same build.

@derekbruening
Copy link
Contributor Author

Alternatively, supporting building one or the other on either host architecture would also go a long way and be a nice mostly-there solution.

@derekbruening
Copy link
Contributor Author

For #4318 we'd want cross-arch support in full DR and not just drdecodelib. I think it's not much extra work to do so as the bulk of the work would have to be done for drdecodelib anyway.

@derekbruening
Copy link
Contributor Author

The goal is to compile the decodelib sources for all architectures into
every DR library build: so e.g. the x86 build has code to generate, decode,
encode, and disassemble ARM and AArch64 instructions.

The main challenge is duplicate symbols:

  • Functions: various top-level routines like decode() and various per-arch
    routines like instr_is_call_arch().
    We can change these w/o any interface changes by having a new top-level
    decode() check the ISA mode. Perhaps we can short-circuit that extra
    overhead for bb building decode+gen+encode.

  • Opcodes and INSTR_CREATE_ and OPND_ macros.
    Changing these to have arch prefixes or sthg is going to be a major
    compatibility break and we'd probably have to go to 9.0?

    So we'd have sthg like:
    OP_add => OP_X86_add, OP_ARM_add, OP_AARCH64_add?
    INSTR_CREATE_add => INSTR_CREATE_X86_add, INSTR_CREATE_ARM_add, INSTR_CREATE_AARCH64_add?

    We'd add the "_X86" to every single one and not just the ones whose
    names clash I assume.

    Would we add a DR_ prefix while at it for better namespace separation?

    The symbol lengths start to get long is one downside. Having some
    assembly-feel layer on top to generate code w/o the clunky and long macro
    and operand creation names (xref: err can't find it...thought we had it
    filed somewhere) would help.

    To shrink the long INSTR_CREATE_AARCH64_add and add DR_:
    DR_CREATE_X86_add, DR_CREATE_A32_add, DR_CREATE_A64_add?
    And XINST_CREATE_add becomes DR_XCREATE_add??

@derekbruening
Copy link
Contributor Author

@johnfxgalea , @AssadHashmi : looking for some feedback on the proposal in the prior comment re: interface changes and names.

@johnfxgalea
Copy link
Contributor

Pretty nice idea. It will also help tools like drdisas to disassemble cross-arch instructions.

I just had a quick look at your comment, and best I sleep on it before giving further feedback. However, my initial concern with the change is that it is going to break quite a lot of code including Dr Memory. I think it is best we break it down into smaller steps and do the change incrementally. Maybe start off with one arch first, particularly with regards to the opcode name changes?

Also:

Alternatively, supporting building one or the other on either host architecture would also go a long way and be a nice mostly-there solution.

I take it that the above is no longer under consideration? That will avoid such a major change but I understand it would be better to have support for both archs in the same build.

@johnfxgalea
Copy link
Contributor

  • We'd add the "_X86" to every single one and not just the ones whose
    names clash I assume.

Yes, I think that is best for consistency.

@derekbruening
Copy link
Contributor Author

Pretty nice idea. It will also help tools like drdisas to disassemble cross-arch instructions.

I just had a quick look at your comment, and best I sleep on it before giving further feedback. However, my initial concern with the change is that it is going to break quite a lot of code including Dr Memory. I think it is best we break it down into smaller steps and do the change incrementally. Maybe start off with one arch first, particularly with regards to the opcode name changes?

What if there were a compatibility layer in place which kept all the old names for the primary architecture as aliases?

Alternatively, supporting building one or the other on either host architecture would also go a long way and be a nice mostly-there solution.

I take it that the above is no longer under consideration? That will avoid such a major change but I understand it would be better to have support for both archs in the same build.

This might work for our use case. We're driven by #4318 and could conceivably swap to a different binary, though it complicates particular deployment patterns. Maybe we should look further into that to start.

@johnfxgalea
Copy link
Contributor

though it complicates particular deployment patterns

Hmm, what do you mean here exactly?

@derekbruening
Copy link
Contributor Author

though it complicates particular deployment patterns

Hmm, what do you mean here exactly?

For the way we run our automated jobs it is easier to have a single binary with everything statically linked together. A job is typically one single binary. Now we'll need two separate deployed binaries plus some kind of selector to figure out which one to launch which may end up requiring a separate job stage in the automation pipeline. Still easier than adding all arches in the same DR build though, even if not as appealing.

@derekbruening
Copy link
Contributor Author

It seems like the first step for either the cross-target or multi-arch is to isolate the decoding files from the rest of the core, so we can set a different target arch for them without messing up header includes, inline asm, and other execution-oriented code and build attributes where we want the host arch. This would be a continuation of the cleanup of building drdecode separately, but I would want to eventually get the core to use the drdecode lib itself, instead of building those same files separately.

Because of the include file issue, I think we have to move the files to physically separate the headers so our include dirs using the host arch can be separate from those for the target arch. This will be a version history pain point but long-term should be cleanest in other respects as well.

I'm proposing to make a new directory core/ir and move the core/arch/ hierarchy over to core/ir/ for the drdecode sources.

I.e., we'd have core/ir/aarch64/decode.c and core/arch/aarch64/mangle.c.

Unless there are competing ideas either for the approach or the directory name and location for ir?

@johnfxgalea
Copy link
Contributor

johnfxgalea commented Jun 11, 2020

so we can set a different target arch for them without messing up header includes, inline asm,

Okay, but moving source out of core/arch is going to "mess" some (okay, a few) header includes mainly in the Perl script.

grep -rn . -e "core/arch"
./CMakeLists.txt:71:  "${PROJECT_SOURCE_DIR}/core/arch/asm_defines.asm"
./lib/genapi.pl:183:     "$core/arch/instrlist.h",
./lib/genapi.pl:187:     "$core/arch/arch_exports.h", # encode routines
./lib/genapi.pl:188:     "$core/arch/proc.h",
./lib/genapi.pl:192:     "$core/arch/x86/opcode.h",
./lib/genapi.pl:193:     "$core/arch/arm/opcode.h",
./lib/genapi.pl:194:     "$core/arch/opnd.h",
./lib/genapi.pl:195:     "$core/arch/instr.h",
./lib/genapi.pl:196:     "$core/arch/instr_inline.h",
./lib/genapi.pl:197:     "$core/arch/instr_create_shared.h",
./lib/genapi.pl:198:     "$core/arch/x86/instr_create.h",
./lib/genapi.pl:199:     "$core/arch/aarch64/instr_create.h",
./lib/genapi.pl:200:     "$core/arch/arm/instr_create.h",
./lib/genapi.pl:201:     "$core/arch/decode.h",       # OPSZ_ consts, decode routines <----
./lib/genapi.pl:202:     "$core/arch/decode_fast.h",  # decode routines <----
./lib/genapi.pl:203:     "$core/arch/disassemble.h",  # disassemble routines

I think the separation is good. Not sure if "decode" is a better name but "ir" also sounds good to me.

@derekbruening
Copy link
Contributor Author

Right, all references, including genapi.pl, would be updated.

I picked "ir" over "decode" b/c it's instruction generation, instruction list manipulation, encoding, etc., not just decoding, and "decode" makes it sound too limited IMHO.

derekbruening added a commit that referenced this issue Jun 12, 2020
Moves all of the IR-related files (instruction generation, encoding,
decoding, disassembly, instructions, operands, instruction lists) from
core/arch to core/ir, mirroring the arch-specific subdirectories under
core/ir.  This is a code cleanup step toward properly isolating the
drdecode library, as well as moving us toward the ability to build for
a separate target architecture from the host architecture and
eventually perhaps building in multiple target architectures in the
same binary for decoding and IR manipulation.

Also renames mangle_utils.c to ir_utils.c to better fit its purpose
and location.

Issue: #1684
derekbruening added a commit that referenced this issue Jun 12, 2020
Moves all of the IR-related files (instruction generation, encoding,
decoding, disassembly, instructions, operands, instruction lists) from
core/arch to core/ir, mirroring the arch-specific subdirectories under
core/ir.  This is a code cleanup step toward properly isolating the
drdecode library, as well as moving us toward the ability to build for
a separate target architecture from the host architecture and
eventually perhaps building in multiple target architectures in the
same binary for decoding and IR manipulation.

Also renames mangle_utils.c to ir_utils.c to better fit its purpose
and location.

Issue: #1684
derekbruening added a commit that referenced this issue Jun 12, 2020
Moves the handful of functions from mangle.c used by drdecode (things
that are difficult to untangle from the IR functions) into ir_utils.c
Renames the top-level ir_utils.c to ir_utils_shared.c.  Adds
ir_utils.h.

Issue: #1684
derekbruening added a commit that referenced this issue Jun 12, 2020
Moves the handful of functions from mangle.c used by drdecode (things
that are difficult to untangle from the IR functions) into ir_utils.c
Renames the top-level ir_utils.c to ir_utils_shared.c.  Adds
ir_utils.h.

Issue: #1684
derekbruening added a commit that referenced this issue Jun 12, 2020
Adds drlibc_unix.h to better separate drlibc and prevent it including
os_private.h and thus pulling in tls.h and other headers related to DR
execution, which incorrectly mixes target and host architectures for
drdecode and drdisas.

Issue: #1684
derekbruening added a commit that referenced this issue Jun 12, 2020
Adds drlibc_unix.h to better separate drlibc and prevent it including
os_private.h and thus pulling in tls.h and other headers related to DR
execution, which incorrectly mixes target and host architectures for
drdecode and drdisas.

Issue: #1684
@derekbruening
Copy link
Contributor Author

So I've been struggling with the right approach here to split the host from the target for the multi-binary solution. I have two different branches and I keep waffling between them, thinking the other must be better.

Approach A: Add an IR_ARCH define for the target, change all the ifdefs (and IF_XX_ELSE, etc.) in core/ir/ to use the new IR_X86 instead of X86, and then go fix up enough IR-using ifdefs in the rest of the code to get what we want to build. This is nice because it's a stepping-stone to multi-arch-in-one-build: we then replace all the IR_ARCH defines with runtime conditionals and leave all the other ifdefs alone as host ifdefs. But there are many cases outside of core/ir/ that require changing and some that are complex and mix host and target-IR, like TLS using SEG_FS: so if host is x86 and target is aarch64, it fails b/c SEG_FS is x86-only.

Approach B: Add a HOST_ARCH define for the host and change only the asm code to use it, leaving everything else as the target. This produces a crippled core that is never going to work for anything but standalone mode: but A essentially does that as well, it's just not quite as crippled.

Both require a lot of work going through and trying to separate host from system in places like TLS handling where they are intertwined. Both would benefit from avoiding the core altogether and only supporting building drdecode, drdisas, rawtrace, and trace analyzers: the things we want to run vs other arches. However, there are several problems there:

  • It is not easy to downgrade things like raw2trace from DR standalone mode to drdecode+drlibc: they have many components each using things like dr_map_executable_file or drcontainers. There would be a lot of work and complexity trying to get file i/o into drlibc; trying to have the core use the same drlibc and drdecode in a static link as another use of drdecode (a current use case of ours; maybe we should disallow it) where weak syms can't be used, making it hard to have differing behavior in these libs for core vs noncore. Long-term separating these things would be a good thing IMHO but it is not simple.
  • We'd have some ugliness in the build system going and disabling half the targets b/c they don't build.

Approach B is less work, it turns out. I pushed it through w/ a relatively small diff (much smaller than in-progress A), and now I have a fully-building x86-host aarch64-target where I can run drdisas, rawtrace, and opcode_mix on an x86_64 machine targeting aarch64 traces and ISA. But I feel bad b/c it does not leverage the refactoring into core/ir. Still, that refactoring is a good cleanup and a step toward future more-modular code, even if we don't push on it now.

@AssadHashmi
Copy link
Contributor

@johnfxgalea , @AssadHashmi : looking for some feedback on the proposal in the prior comment re: interface changes and names.

The interface and name changes are quite intrusive and the later suggestion is desirable:

What if there were a compatibility layer in place which kept all the old names for the primary architecture as aliases?

Regarding the different approaches:

But I feel bad b/c it does not leverage the refactoring into core/ir. Still, that refactoring is a good cleanup and a step toward future more-modular code, even if we don't push on it now.

Agreed. Based on my limited knowledge, B seems like a safer option ATM and prepares the ground for future work.

Just to be clear, is this work primarily driven by #4318 ?

@derekbruening
Copy link
Contributor Author

Just to be clear, is this work primarily driven by #4318 ?

Yes, that is our immediate goal, postprocessing and analyzing traces gathered on another platform, such as aarch64, on an x86 machine. Ideally we wouldn't need a different build to do it but we'll leave that for future work and go with the separate build solution.

The separate build has a libdynamorio that works in standalone mode, but fails of course if used to run anything. I have most of the tests disabled; trying to figure out the least disruptive way to do that though w/o a big if around everything. Ideally there I could get everything to build like on Mac and use a -L tag, but the building is the hard part with all the asm code and low-level things in the tests. Worst-case I'll do the big if().

@derekbruening
Copy link
Contributor Author

Building the aarch64 code on Windows with VS2017 is resulting in a bunch of warnings. While we won't easily get aarch64 drcachesim post-processing on Windows x86 due to module mapping complexity, we can at least get drdecode and drdisas. Here is a sampling of some of the warnings. I'm wondering if I should split some of these out instead of putting into the same PR #4325 -- although maybe it won't be all that many LOC to fix these.

D:\derek\dr\git\src\core\arch\arch.h(412): error C2220: warning treated as error - no 'object' file generated
D:\derek\dr\git\src\core\arch\arch.h(412): warning C4200: nonstandard extension used: zero-sized array in struct/union
D:\derek\dr\git\src\core\arch\arch.h(413): error C2229: struct '_clean_call_info_t' has an illegal zero-sized array
D:\derek\dr\git\src\core\arch\arch.h(1460): warning C4200: nonstandard extension used: zero-sized array in struct/union
D:\derek\dr\git\src\core\arch\arch.h(1461): error C2229: struct '_callee_info_t' has an illegal zero-sized array
D:\derek\dr\git\src\core\ir\aarch64\codec.c(174): warning C4334: '<<': result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?)
D:\derek\dr\git\src\core\ir\aarch64\codec.c(176): warning C4244: '=': conversion from 'ptr_int_t' to 'uint', possible loss of data
D:\derek\dr\git\src\core\ir\aarch64\codec.c(194): warning C4244: '=': conversion from 'ptr_uint_t' to 'uint', possible loss of data
D:\derek\dr\git\src\core\ir\aarch64\codec.c(216): warning C4244: 'function': conversion from 'reg_t' to 'reg_id_t', possible loss of data
D:\derek\dr\git\src\core\ir\aarch64\codec.c(251): warning C4244: 'return': conversion from 'uint' to 'reg_id_t', possible loss of data
D:\derek\dr\git\src\core\ir\aarch64\codec.c(288): warning C4244: '=': conversion from 'uint' to 'reg_id_t', possible loss of data
D:\derek\dr\git\src\core\ir\aarch64\codec.c(289): warning C4244: '=': conversion from 'uint' to 'reg_id_t', possible loss of data
D:\derek\dr\git\src\core\ir\aarch64\codec.c(290): warning C4244: '=': conversion from 'uint' to 'reg_id_t', possible loss of data
D:\derek\dr\git\src\core\ir\aarch64\codec.c(291): warning C4244: '=': conversion from 'uint' to 'reg_id_t', possible loss of data
D:\derek\dr\git\src\core\ir\aarch64\codec.c(292): warning C4244: '=': conversion from 'uint' to 'reg_id_t', possible loss of data
D:\derek\dr\git\src\core\ir\aarch64\codec.c(336): warning C4244: 'function': conversion from 'ptr_uint_t' to 'uint', possible loss of data
D:\derek\dr\git\src\core\ir\aarch64\codec.c(360): warning C4244: 'return': conversion from 'ptr_uint_t' to 'int', possible loss of data
D:\derek\dr\git\src\core\ir\aarch64\codec.c(394): warning C4244: 'initializing': conversion from 'ptr_uint_t' to 'int', possible loss of data
D:\derek\dr\git\src\core\ir\aarch64\codec.c(397): warning C4244: 'initializing': conversion from 'ptr_uint_t' to 'int', possible loss of data
D:\derek\dr\git\src\core\ir\aarch64\codec.c(509): warning C4244: '=': conversion from 'ptr_int_t' to 'uint', possible loss of data
D:\derek\dr\git\src\core\ir\aarch64\codec.c(587): warning C4244: 'function': conversion from 'ptr_uint_t' to 'int', possible loss of data
D:\derek\dr\git\src\core\ir\aarch64\codec.c(612): warning C4334: '<<': result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?)
D:\derek\dr\git\src\core\ir\aarch64\codec.c(612): warning C4244: 'function': conversion from 'ptr_int_t' to 'int', possible loss of data
D:\derek\dr\git\src\core\ir\aarch64\codec.c(632): warning C4018: '>=': signed/unsigned mismatch
D:\derek\dr\git\src\core\ir\aarch64\codec.c(643): warning C4244: 'function': conversion from 'ptr_int_t' to 'int', possible loss of data
D:\derek\dr\git\src\core\ir\aarch64\codec.c(721): warning C4244: 'function': conversion from 'ptr_uint_t' to 'uint', possible loss of data
D:\derek\dr\git\src\core\ir\aarch64\codec.c(741): warning C4244: 'function': conversion from 'ptr_uint_t' to 'uint', possible loss of data
D:\derek\dr\git\src\core\ir\aarch64\codec.c(780): warning C4244: '=': conversion from 'reg_t' to 'uint', possible loss of data
D:\derek\dr\git\src\core\ir\aarch64\codec.c(854): warning C4244: '=': conversion from 'uint' to 'reg_id_t', possible loss of data
D:\derek\dr\git\src\core\ir\aarch64\codec.c(856): warning C4244: '=': conversion from 'uint' to 'reg_id_t', possible loss of data
D:\derek\dr\git\src\core\ir\aarch64\codec.c(857): warning C4244: '=': conversion from 'uint' to 'reg_id_t', possible loss of data
D:\derek\dr\git\src\core\ir\aarch64\codec.c(866): warning C4244: 'function': conversion from 'ptr_uint_t' to 'uint', possible loss of data
D:\derek\dr\git\src\core\ir\aarch64\codec.c(1162): warning C4244: 'function': conversion from 'ptr_uint_t' to 'reg_id_t', possible loss of data
D:\derek\dr\git\src\core\ir\aarch64\codec.c(1331): warning C4244: 'function': conversion from 'ptr_uint_t' to 'reg_id_t', possible loss of data
D:\derek\dr\git\src\core\ir\aarch64\codec.c(1409): warning C4244: 'function': conversion from 'ptr_uint_t' to 'reg_id_t', possible loss of data
D:\derek\dr\git\src\core\ir\aarch64\codec.c(1550): warning C4244: 'function': conversion from 'ptr_uint_t' to 'uint', possible loss of data
D:\derek\dr\git\src\core\ir\aarch64\codec.c(1709): warning C4244: 'function': conversion from 'ptr_uint_t' to 'reg_id_t', possible loss of data
D:\derek\dr\git\src\core\ir\aarch64\codec.c(1861): warning C4244: 'function': conversion from 'int' to 'reg_id_t', possible loss of data
D:\derek\dr\git\src\core\ir\aarch64\codec.c(1859): warning C4244: 'initializing': conversion from 'ptr_uint_t' to 'int', possible loss of data
D:\derek\dr\git\src\core\ir\aarch64\codec.c(1895): warning C4244: 'function': conversion from 'int' to 'reg_id_t', possible loss of data
D:\derek\dr\git\src\core\ir\aarch64\codec.c(1893): warning C4244: 'initializing': conversion from 'ptr_uint_t' to 'int', possible loss of data
D:\derek\dr\git\src\core\ir\aarch64\codec.c(2088): warning C4244: '=': conversion from 'ptr_int_t' to 'uint', possible loss of data
D:\derek\dr\git\src\core\ir\aarch64\codec.c(2110): warning C4244: '=': conversion from 'ptr_int_t' to 'uint', possible loss of data
D:\derek\dr\git\src\core\ir\aarch64\codec.c(2132): warning C4244: '=': conversion from 'ptr_int_t' to 'uint', possible loss of data
D:\derek\dr\git\src\core\ir\aarch64\codec.c(2152): warning C4244: '=': conversion from 'ptr_int_t' to 'uint', possible loss of data
D:\derek\dr\git\src\core\ir\aarch64\codec.c(2174): warning C4244: '=': conversion from 'ptr_int_t' to 'uint', possible loss of data
D:\derek\dr\git\src\core\ir\aarch64\codec.c(2266): warning C4244: 'function': conversion from 'ptr_uint_t' to 'int', possible loss of data
D:\derek\dr\git\src\core\ir\aarch64\codec.c(2272): warning C4244: 'function': conversion from 'ptr_uint_t' to 'int', possible loss of data
D:\derek\dr\git\src\core\ir\aarch64\codec.c(2282): warning C4244: 'function': conversion from 'int' to 'reg_id_t', possible loss of data
D:\derek\dr\git\src\core\ir\aarch64\codec.c(2280): warning C4244: 'initializing': conversion from 'ptr_uint_t' to 'int', possible loss of data
D:\derek\dr\git\src\core\ir\aarch64\codec.c(2524): warning C4244: 'function': conversion from 'ptr_uint_t' to 'reg_id_t', possible loss of data
D:\derek\dr\git\src\core\ir\aarch64\codec.c(2629): warning C4244: 'function': conversion from 'ptr_uint_t' to 'int', possible loss of data
D:\derek\dr\git\src\core\ir\aarch64\codec.c(2635): warning C4244: 'function': conversion from 'ptr_uint_t' to 'int', possible loss of data
D:\derek\dr\git\src\core\ir\aarch64\codec.c(2687): warning C4244: 'function': conversion from 'ptr_uint_t' to 'int', possible loss of data
D:\derek\dr\git\src\core\ir\aarch64\codec.c(2693): warning C4244: 'function': conversion from 'ptr_uint_t' to 'int', possible loss of data
D:\derek\dr\git\src\core\ir\aarch64\codec.c(2860): warning C4244: 'function': conversion from 'ptr_uint_t' to 'uint', possible loss of data
D:\derek\dr\git\src\core\ir\aarch64\codec.c(2939): warning C4244: 'function': conversion from 'ptr_uint_t' to 'uint', possible loss of data
D:\derek\dr\git\src\core\ir\aarch64\codec.c(2967): warning C4389: '==': signed/unsigned mismatch
D:\derek\dr\git\src\core\ir\aarch64\codec.c(2976): warning C4244: 'function': conversion from 'ptr_uint_t' to 'uint', possible loss of data
D:\derek\dr\git\src\core\ir\aarch64\codec.c(2978): warning C4244: 'function': conversion from 'ptr_uint_t' to 'uint', possible loss of data
D:\derek\dr\git\src\core\ir\aarch64\codec.c(2965): warning C4244: 'initializing': conversion from 'ptr_uint_t' to 'uint', possible loss of data
D:\derek\dr\git\src\core\ir\aarch64\codec.c(3012): warning C4244: 'function': conversion from 'ptr_int_t' to 'uint', possible loss of data
D:\derek\dr\git\src\core\ir\aarch64\codec.c(3031): warning C4244: 'function': conversion from 'ptr_uint_t' to 'uint', possible loss of data
D:\derek\dr\git\src\core\ir\aarch64\codec.c(3041): warning C4244: 'function': conversion from 'ptr_uint_t' to 'uint', possible loss of data
D:\derek\dr\git\src\core\ir\aarch64\codec.c(3071): warning C4244: 'function': conversion from 'ptr_uint_t' to 'uint', possible loss of data

derekbruening added a commit that referenced this issue Jun 26, 2020
Adds support for building with different host and target
architectures.  The goal is to support running drdisas, drdecode,
drraw2trace, and drcachesim trace analyzers on one host machine while
targeting recorded bytes or traces from a different type of machine:
e.g., processing aarch64 traces on an x86_64 machine.  The goal is
*not* to turn DR into a cross-ISA binary translator: we only support
standalone mode here, not full DR managed mode.  Long-term it would be
nicer to split out the DR standalone mode interfaces into separate
libraries but that is beyond the scope of the current work.

Adds a top-level CMake option variable "TARGET_ARCH" which defaults to
CMAKE_SYSTEM_PROCESSOR but can be set to a different value.  The
regular arch variables/defines X86, AARCH64, and ARM are set from
TARGET_ARCH, while new values DR_HOST_{X86,AARCH64,ARM} are set for
the host.

Uses a target-centric approach, where the bulk of the code is built
targeting the TARGET_ARCH, with only assembly code and other limited code
using the DR_HOST_* arch.  This limits the scope of the code changes as
compared to the opposite host-centric approach.  A new define
DR_HOST_NOT_TARGET is used to simply disable cases where the host and
target are intertwined and difficult to easily separate, such as TLS
handling and assembly code used for application execution (such as
icache_op_isb_asm).

The key code changed to use DR_HOST_* includes:
+ *.asm files
+ arch_exports.h inline asm for atomics and other key utilities
+ system call defines (we don't want to have our raw syscall wrappers running
  incorrect syscall numbers: we'll end up with fork bombs or other madness)
+ injection code
+ data section setup
+ drsyms libraries

Code using built-in __*__ arch defines is changed to use our defines:
+ sigcontext.h, which is also augmented for some missing AArch64 types
+ x86_code_test.c

Code reading ELF headers is relaxed to allow *any* arch of the same
bitwdith (i#1345 prevents different bitwidth), since we need the host for
standalone init code but the target for raw2trace module mapping.

Nearly all tests are disabled not just from running but also from
building for simplicity, to reduce code changes here.  Some api.* decoder
tests, drdisas, and the altbin test for drcachesim post-processing and
analysis are enbabled in host!=target builds.
Tests are disabled by using a macro to set up the tests and shifting all
the setup code prior to the tests themselves, allowing a return() to skip
over the bulk of the tests to avoid a giant if().

Only Linux host-x86_64-target-aarch64 is explicitly tested and enabled
here.  Adding the inverse, and adding host-i386-target-arm32, should not be
much work, mostly with compiler flags, but will be done separately.  Adding
Windows support is also separated out as it requires a number of changes
for compiler problems with aarch64 code.

Adds a new host-x86_64-target-aarch64 build to runsuite.cmake, allocated to
a new Travis job to avoid slowing down the current jobs.

Issue: #1684, #4318, #1345
derekbruening added a commit that referenced this issue Jul 6, 2020
Fixes an undefined symbol in drwrap when host!=target.
Only some compilers complain about it.

Issue: #1684
derekbruening added a commit that referenced this issue Jul 6, 2020
Fixes an undefined symbol in drwrap when host!=target.
Only some compilers complain about it.

Issue: #1684
derekbruening added a commit that referenced this issue Dec 8, 2020
Adds X86 ifdefs around the AVX-512 opmask fields in the clean call
optimization structures and the code that references them, to avoid
zero-size-array compiler errors with gcc10 (and with Visual Studio).

Issue: #1684, #4594
Fixes #4594
derekbruening added a commit that referenced this issue Dec 9, 2020
Adds X86 ifdefs around the AVX-512 opmask fields in the clean call
optimization structures and the code that references them, to avoid
zero-size-array compiler errors with gcc10 (and with Visual Studio).

Issue: #1684, #4594
Fixes #4594
derekbruening added a commit that referenced this issue Jan 4, 2021
Adds a long-missing feature: following into a Windows child process of
a different bitwidth.

Switches injection from DR and from drinjectlib (including drrun and
drinject) to use -early_inject_map.  This was most easily done by
turning on -early_inject by default as well.  However, the
-early_inject_location default is INJECT_LOCATION_ImageEntry, which is
the same late takeover point as with thread injection.  Switching all
injection over to map-from-the-parent simplifies cross-arch following,
as well as making it easier to shift the takeover point to an earlier
spot in the future.  This is a step toward #607 by switching
drinjectlib to use map injection; the takeover point, as mentioned, is
still the image entry.

Adds an -inject_x64 option to inject a 64-bit DR lib into a 32-bit
child from a 64-bit parent, but this option is only sketched out and
is not fully supported yet: #49 covers adding tests and official
support.

Adds library swapping code to find the other-bitwidth library, which
assumes a parallel directory structure.  Add a new fatal error if the
library for a child is not found.

To support generating code for all 3 child-parent cases (same-same,
32-64, and 64-32), and in particular for 32-64, switches the small
gencode sequence for -early_inject_map from using IR to using raw
bytes.  A multi-arch encoder (#1684) would help but we would need
cross-bitwidth support there, which is not on the horizon.  Fixes what
look like bugs in the original gencode generation along the way
(s/pc/cur_local_pos/ and s/local_code_buf/remote_code_buf/): it's not
clear how it worked before.

Adds support for several system calls from a 32-bit parent to a 64-bit
child where the desired NtWow64* system call does not exist.  We use
switch_modes_and_call() for NtProtectVirtualMemory and
NtQueryVirtualMemory.

Changes all types in the injection code to handle 64-bit addresses in
32-bit code.  Adds UNICODE_STRING_32 and
RTL_USER_PROCESS_PARAMETERS_32 for handling 32-bit structures from
64-bit parents.  Similarly, adds RTL_USER_PROCESS_PARAMETERS_64 and
PROCESS_BASIC_INFORMATION64.

Adds get_process_imgname_cmdline() capability for 64-bit remote from 32-bit.

Adds get_remote_proc_address() and uses it to look up
dynamorio_earliest_init_takeover() in a child DR.

Finds the remote ntdll base via a remote query memory walk plus remote
image header parsing.  This requires adding a switch_modes_and_call()
version of NtQueryVirtualMemory (also mentioned above), which needs
64-bit args: so we refactor switch_modes_and_call() to take in a
struct of all 64-bit fields for the args.

Fixes a few bugs in other routines to properly get the image name and
image entry for 32-bit children of 64-bit parents.

Updates environment variable propagation code to handle a 32-bit
parent and a 64-bit child.  Updates a 64-bit parent and 32-bit child
to insert the variables into the 32-bit PEB (64-bit does no good),
which requires finding the 32-bit PEB.  This is done via the 32-bit
TEB, using a hack due to what seems like a kernel bug where it has the
TebBaseAddress 0x2000 too low.

Makes environment variable propagation failures fatal and visible,
unlike previously where errors would just result in silently letting
the child run natively.  Turns some other prior soft errors into fatal
errors on child takeover.

Moves environment variable propagation to post-CreateUserProcess
instead of waiting for ResumeThread, which avoids having to get the
thread context (for which we have no other-bitwidth support) to figure
out whether it's the first thread in the process or not.  We bail on
propagation for pre-Vista where we'd have to wait for ResumeThred.

Generalizes the other-bitwidth Visual Studio toolchain environment
variable setting for use in a new build-and-test other-bitwidth test
which builds dynamorio and the large_options client (to ensure options
are propagated to children; and it has convenient init and exit time
prints) for the other bitwidth, arranges parallel lib dirs, and runs
the other client

Issue: #803, #147, #607, #49
Fixes #803
derekbruening added a commit that referenced this issue Jan 5, 2021
Adds a long-missing feature: following into a Windows child process of
a different bitwidth.

Switches injection from DR and from drinjectlib (including drrun and
drinject) to use -early_inject_map.  This was most easily done by turning
on -early_inject by default as well.  However, the -early_inject_location
default is INJECT_LOCATION_ThreadStart, a new "early" injection location
which is the same late takeover point as with thread injection (we could
also use _ImageEntry, which is only very slightly later, but that fails for
.NET and other applications).  Switching all injection over to
map-from-the-parent simplifies cross-arch following, as well as making it
easier to shift the takeover point to an earlier spot in the future.  This
is a step toward #607 by switching drinjectlib to use map injection; the
takeover point, as mentioned, is still the thread start.

Placing a hook at the thread start causes some stability issues, so instead
of the usual hook for -early_inject_map, for INJECT_LOCATION_ThreadStart we
set the thread context, like thread injection does.  The gencode still
restores the hook as a nop, for simplicity.  For parent64 child32, we can't
easily locate the thread start, so we assume it's
ntdll32!RtlUserThreadStart (which is also a fallback if anything fails in
other cases; the final fallback is a hook at the image entry, which works
nearly everywhere but not for .NET where the image entry is not reached).

Adds an -inject_x64 option to inject a 64-bit DR lib into a 32-bit
child from a 64-bit parent, but this option is only sketched out and
is not fully supported yet: #49 covers adding tests and official
support.

Adds library swapping code to find the other-bitwidth library, which
assumes a parallel directory structure.  Add a new fatal error if the
library for a child is not found.

To support generating code for all 3 child-parent cases (same-same,
32-64, and 64-32), and in particular for 32-64, switches the small
gencode sequence for -early_inject_map from using IR to using raw
bytes.  A multi-arch encoder (#1684) would help but we would need
cross-bitwidth support there, which is not on the horizon.  Fixes what
look like bugs in the original gencode generation along the way
(s/pc/cur_local_pos/ and s/local_code_buf/remote_code_buf/): it's not
clear how it worked before.

Adds support for several system calls from a 32-bit parent to a 64-bit
child where the desired NtWow64* system call does not exist.  We use
switch_modes_and_call() for NtProtectVirtualMemory and
NtQueryVirtualMemory.

Changes all types in the injection code to handle 64-bit addresses in
32-bit code.  Adds UNICODE_STRING_32 and
RTL_USER_PROCESS_PARAMETERS_32 for handling 32-bit structures from
64-bit parents.  Similarly, adds RTL_USER_PROCESS_PARAMETERS_64 and
PROCESS_BASIC_INFORMATION64.

Adds get_process_imgname_cmdline() capability for 64-bit remote from 32-bit.

Adds get_remote_proc_address() and uses it to look up
dynamorio_earliest_init_takeover() in a child DR.

Finds the remote ntdll base via a remote query memory walk plus remote
image header parsing.  This requires adding a switch_modes_and_call()
version of NtQueryVirtualMemory (also mentioned above), which needs
64-bit args: so we refactor switch_modes_and_call() to take in a
struct of all 64-bit fields for the args.

Fixes a few bugs in other routines to properly get the image name and
image entry for 32-bit children of 64-bit parents.

Updates environment variable propagation code to handle a 32-bit
parent and a 64-bit child.  Updates a 64-bit parent and 32-bit child
to insert the variables into the 32-bit PEB (64-bit does no good),
which requires finding the 32-bit PEB.  This is done via the 32-bit
TEB, using a hack due to what seems like a kernel bug where it has the
TebBaseAddress 0x2000 too low.

Makes environment variable propagation failures fatal and visible,
unlike previously where errors would just result in silently letting
the child run natively.  Turns some other prior soft errors into fatal
errors on child takeover.

Moves environment variable propagation to post-CreateUserProcess
instead of waiting for ResumeThread, which avoids having to get the
thread context (for which we have no other-bitwidth support) to figure
out whether it's the first thread in the process or not.  We bail on
propagation for pre-Vista where we'd have to wait for ResumeThred.

Generalizes the other-bitwidth Visual Studio toolchain environment
variable setting for use in a new build-and-test other-bitwidth test
which builds dynamorio and the large_options client (to ensure options
are propagated to children; and it has convenient init and exit time
prints) for the other bitwidth, arranges parallel lib dirs, and runs
the other client.

Issue: #803, #147, #607, #49
Fixes #803
@derekbruening
Copy link
Contributor Author

Pasting my notes for ARM on i386:

 $ rm -rf *; CFLAGS=-m32 CXXFLAGS=-m32 cmake -GNinja -DTARGET_ARCH=arm ../src && ninja
 cc: error: unrecognized command line option ‘-mthumb’

And x86-64 on aarch64:

 derek@tx1:~/dr/build_a64_tgt_x64$ rm -rf *; cmake -DTARGET_ARCH=x86_64 -DDEBUG=ON -DBUILD_TESTS=ON ../src && make -j7
 g++: error: unrecognized command line option ‘-m64’

It's not clear whether there's a small or large amount of work needed beyond these initial config issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants