Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rename all global symbols to avoid conflicts when used as a static library #3348

Open
derekbruening opened this issue Jan 17, 2019 · 7 comments

Comments

@derekbruening
Copy link
Contributor

Xref #975. We're using DR as a static library more and more these days, and the current solution for hiding its global symbols, running objcopy --localize-hidden, is a fragile post-link step that conflicts with other build steps such as lld's ICF and which can break when used in complex build configurations.

This issue is a proposal to rename all of DR's global symbols (i.e., everything not file-static) to dramatically reduce the risk of symbol conflicts when statically linked into large applications.

We would leave alone all of the symbols exported as part of the API. Unfortunately these have a variety of prefixes but at least they have prefixes:

$ nm lib64/release/libdynamorio_static.a | grep ' T ' | awk '{print $3}' | perl -p -e 's/^([a-zA-Z0-9]+)_.*/\1/' | sort | uniq -c
     10 decode
      5 disassemble
    311 dr
      3 dynamorio
      3 get
    184 instr
     35 instrlist
     93 opnd
     21 proc
     26 reg
      1 set
      1 __wrap_calloc
      1 __wrap_free
      1 __wrap_malloc
      1 __wrap_realloc
$ nm lib64/release/libdynamorio_static.a | grep ' T ' | awk '{print $3}' | grep '^[gs]et'
get_application_short_name
get_register_name
get_x86_mode
set_x86_mode

Moving on to the non-API routines, there are thousands of them, including functions and variables.
Some thoughts on renaming rules:

"dr_": a simple prefix, except today it means "exported for API" and we have
several instances of "dr_foo" exported and "foo" internally: e.g., internal "safe_read" and interface "dr_safe_read". So how do we rename the internal one?

"dri_": where "i" means internal.

Sthg specific to the module? "drinterp_xxx", "drvm_xxx", "drxl8_xxx", etc.

"dr__" or "_dr": these violate the C/C++ standards which say no double underscores anywhere, no leading single underscore plus capital letter anywhere, and no leading underscore in global symbols.

Could do a single underscore suffix: arch_init_(), clean_call_opt_exit_(),
add_dynamo_vm_area_(), executable_areas_. More of a minimal change.
It would mean "global but not in API". Could combine it with a "dr_" prefix.

@derekbruening
Copy link
Contributor Author

We're de-prioritizing any large step of renaming all symbols, in favor of incremental tackling of likely-to-conflict symbols: mem{set,cpy,move} (xref #3315), perhaps a few others. For mem*, since we can't easily rename the implicit ones, it may be safe to have a statically-linked DR use compiler-provided versions. These should be re-entrant, and shared DR mostly just wants to avoid imports, rather than having a worry about transparency from non-re-entrant functions.

@derekbruening
Copy link
Contributor Author

As mentioned for #3315 which removed memcpy and memset from static DR: there is a downside to having static DR use libc functions. They can be intercepted, such as by sanitizer tools. There are other interoperability issues with such tools, though, so we live with the potential problem of
the app overriding DR's memory function as the lesser of two evils, assuming symbol conflicts are more prevalent and serious. We can also try to rename DR's internal uses of functions like these to avoid interception.

@derekbruening
Copy link
Contributor Author

If we remove the objcopy --localize-hidden step, we hit duplicate symbol errors:

[3/20] Linking CXX executable clients/bin64/opcode_mix_launcher
FAILED: clients/bin64/opcode_mix_launcher 
: && /usr/bin/c++   -Wl,--hash-style=both  -rdynamic -Wl,--as-needed  clients/drcachesim/CMakeFiles/opcode_mix_launcher.dir/tools/opcode_mix_launcher.cpp.o  -o clients/bin64/opcode_mix_launcher  clients/lib64/debug/libdrmemtrace_analyzer.a clients/lib64/debug/libdrmemtrace_opcode_mix.a clients/lib64/debug/libdrmemtrace_raw2trace.a ext/lib64/debug/libdrcovlib_static.a lib64/libdrfrontendlib.a lib64/debug/libdynamorio_static.a -lz clients/lib64/debug/libdirectory_iterator.a lib64/libdrfrontendlib.a lib64/libdrmemfuncs.a -lpthread ext/lib64/debug/libdrutil_static.a ext/lib64/debug/libdrx_static.a ext/lib64/debug/libdrreg_static.a ext/lib64/debug/libdrcontainers.a ext/lib64/debug/libdrmgr_static.a lib64/debug/libdynamorio.so lib64/libdrhelper.a -ldl && /usr/bin/objcopy --only-keep-debug clients/bin64/opcode_mix_launcher clients/bin64/opcode_mix_launcher.debug && /usr/bin/objcopy --add-gnu-debuglink=clients/bin64/opcode_mix_launcher.debug clients/bin64/opcode_mix_launcher && /usr/bin/strip -g -x clients/bin64/opcode_mix_launcher && :
lib64/debug/libdynamorio_static.a(libdynamorio_static.o): In function `open_syscall':
/home/bruening/dr/git/src/core/unix/os.c:4129: multiple definition of `open_syscall'
lib64/libdrfrontendlib.a(os.c.o):/home/bruening/dr/git/src/core/unix/os.c:4129: first defined here
lib64/debug/libdynamorio_static.a(libdynamorio_static.o): In function `os_open':
/home/bruening/dr/git/src/core/unix/os.c:4188: multiple definition of `os_open'
lib64/libdrfrontendlib.a(os.c.o):/home/bruening/dr/git/src/core/unix/os.c:4188: first defined here

So it looks like we may need to solve #1409 and separate out a library with module_get_platform() and os_read().

@derekbruening
Copy link
Contributor Author

For renames: the plan is to rename only single-word or likely-to-conflict double-word symbols (where underscores divide words), and as mentioned above to not attempt to rename the world.

"d_r_" is another candidate prefix for renames.

@derekbruening
Copy link
Contributor Author

The one-word global symbol names in the core library (the ones most likely to conflict) after removing memcpy and memset (this is with a local change to create a _nohide version for testing purposes):

$ nm lib64/debug/libdynamorio_static_nohide.a | egrep ' [A-TV-Z] [^_\.]+$'
000000000011f88d T crc32
000000000000fce0 D debugRegister
00000000001f3dff T decode
0000000000202099 T disassemble
0000000000082ba4 T dispatch
0000000000227852 T emulate
000000000000b9a0 D extensions
0000000000000008 C initstack
00000000001e1860 T loginst
00000000001e1a0a T logopnd
00000000001e1bbd T logtrace
000000000022ba6d T mangle
000000000011fb17 T MD5Final
000000000011f8f0 T MD5Init
000000000011f932 T MD5Update
00000000001b7f67 T memcmp
00000000001b7dab T memmove
000000000009f6d7 T notify
0000000000068ef8 R regparms
0000000000293136 T stackdump
0000000000001830 B stats
00000000001b8077 T strcasecmp
00000000001b7c26 T strchr
00000000001b7e2d T strcmp
00000000001b7bbf T strlen
00000000001b7d1f T strncat
00000000001b7ec1 T strncmp
00000000001b7ca4 T strncpy
00000000001b7c61 T strrchr
00000000001b7ff5 T strstr
00000000001b8115 T strtoul
00000000001b8057 T tolower
00000000001b7bf1 T wcslen

For the string routines, we have two choices: one is to mirror memcpy and make a "strfuncs" library that others can omit. The other is to mirror snprintf which is renamed to our_snprintf. The latter seems the better choice. It may be better long-term to explicitly call the our_str* (or _d_r_str*, however named) or we can minimize code churn by renamed via macro as is done for snprintf (partly b/c of Windows names complicating things).

As part of that we'll have to remove <string.h> from all core files, to avoid macros and intrinsics from bypassing our calls.

@derekbruening
Copy link
Contributor Author

Another complication: what about __strncpy_chk and the other _chk versions,
which gcc emits calls to? Presumably if DR never calls the plain strncpy
gcc will never emit those _chk calls.

@derekbruening
Copy link
Contributor Author

I'm going with d_r_ as the prefix for "global internal". See other ideas above, summarized here on initstack:

d_r_initstack
dri_initstack
dy_initstack
d_r_initstack_
__dr__initstack (technically reserved b/c contains double underscore)
_dr_initstack (technically reserved as global b/c starts w/ underscore)
dr__initstack (technically reserved b/c contains double underscore)
initstack_dr_

derekbruening added a commit that referenced this issue Mar 1, 2019
Rename our_*snprintf* to d_r_*snprintf* per the new global variable naming policy.
Rename our_*sscanf to d_r_*sscanf.

Rename DR's internal string routines to use a d_r_ prefix to reduce
name conflicts with static linking.  Because on Windows we still want
to use the standard names (because we import from ntdll), we resort to
macro renames within core/ just like we're doing with snprintf.

Remove includes of string.h in core/ to avoid macros and inlined
functions pulling in libc functions, now that our versions use
different names.

Issue: #3348
derekbruening added a commit that referenced this issue Mar 1, 2019
Rename our_*snprintf* to d_r_*snprintf* per the new global variable naming policy.
Rename our_*sscanf to d_r_*sscanf.

Rename DR's internal string routines to use a d_r_ prefix to reduce
name conflicts with static linking.  Because on Windows we still want
to use the standard names (because we import from ntdll), we resort to
macro renames within core/ just like we're doing with snprintf.

Remove includes of string.h in core/ to avoid macros and inlined
functions pulling in libc functions, now that our versions use
different names.

Links string.c and io.c into drdecode.

Issue: #3348
derekbruening added a commit that referenced this issue Mar 1, 2019
Adds a new dynamorio_static_nohide library which is only built when
tests are built.  It is used for sanity checks on global symbol names
for better interoperability with toolchains where objcopy
--localize-hidden is a pain point.

Renames core/CMake_globalize_pic_thunks.cmake to
core/CMake_finalize_static_lib.cmake to better reflect the multiple
steps it is taking beyond just the thunks.

Adds a new test api.static_symbols of linking dynamorio_static_nohide.

Adds a new post-build step that runs a new script
CMake_symbol_check.cmake which looks for likely-to-conflict symbols in
dynamorio_static_nohide.

Issue: #3348
derekbruening added a commit that referenced this issue Mar 2, 2019
)

Adds a new dynamorio_static_nohide library which is only built when
tests are built.  It is used for sanity checks on global symbol names
for better interoperability with toolchains where objcopy
--localize-hidden is a pain point.

Renames core/CMake_globalize_pic_thunks.cmake to
core/CMake_finalize_static_lib.cmake to better reflect the multiple
steps it is taking beyond just the thunks.

Adds a new test api.static_symbols of linking dynamorio_static_nohide.

Adds a new post-build step that runs a new script
CMake_symbol_check.cmake which looks for likely-to-conflict symbols in
dynamorio_static_nohide.

Issue: #3348
derekbruening added a commit that referenced this issue Mar 2, 2019
Adds a weak d_r_strncmp in drlibc to fix a Mac link error in non-core
uses of drlibc.

Issue: #3348
derekbruening added a commit that referenced this issue Mar 2, 2019
Adds a weak d_r_strncmp in drlibc to fix a Mac link error in non-core
uses of drlibc.

Issue: #3348
derekbruening added a commit that referenced this issue Mar 2, 2019
Renames several single-word global symbols to help reduce the chance
of name conflicts:

+ s/debugRegister/d_r_debug_register/
+ s/crc32/d_r_crc32/
+ s/dispatch/d_r_dispatch/
+ s/emulate/d_r_emulate/
+ s/extensions/base_extensions/

Issue: #3348
derekbruening added a commit that referenced this issue Mar 2, 2019
Renames several single-word global symbols to help reduce the chance
of name conflicts:

+ s/debugRegister/d_r_debug_register/
+ s/crc32/d_r_crc32/
+ s/dispatch/d_r_dispatch/
+ s/emulate/d_r_emulate/
+ s/extensions/base_extensions/

Issue: #3348
derekbruening added a commit that referenced this issue Mar 2, 2019
Renames more single-word global symbols to help reduce the chance
of name conflicts:

+ s/initstack/d_r_initstack/
+ s/loginst/d_r_loginst/
+ s/logopnd/d_r_logopnd/
+ s/logtrace/d_r_logtrace/
+ s/mangle/d_r_mangle/
+ s/mangle/d_r_mangle/
+ s/MD5Init/d_r_md5_init/
+ s/MD5Final/d_r_md5_final/
+ s/MD5Update/d_r_md5_update/
+ s/notify/d_r_notify/
+ s/stackdump/d_r_stackdump/
+ s/stats/d_r_stats/

Issue: #3348
derekbruening added a commit that referenced this issue Mar 3, 2019
Renames more single-word global symbols to help reduce the chance
of name conflicts:

+ s/initstack/d_r_initstack/
+ s/loginst/d_r_loginst/
+ s/logopnd/d_r_logopnd/
+ s/logtrace/d_r_logtrace/
+ s/mangle/d_r_mangle/
+ s/mangle/d_r_mangle/
+ s/MD5Init/d_r_md5_init/
+ s/MD5Final/d_r_md5_final/
+ s/MD5Update/d_r_md5_update/
+ s/notify/d_r_notify/
+ s/stackdump/d_r_stackdump/
+ s/stats/d_r_stats/

Issue: #3348
derekbruening added a commit that referenced this issue Mar 8, 2019
Renames multi-word global symbols with popular names in other software
to help reduce the chance of name conflicts:

s/config_{in,ex}it/d_r_config_{in,ex}it/
s/mutex_lock/d_r_mutex_lock/
s/mutex_unlock/d_r_mutex_unlock/
s/mutex_trylock/d_r_mutex_trylock/
s/mutex_delete/d_r_mutex_delete/
s/mutex_lock_app/d_r_mutex_lock_app/
s/mutex_mark_as_app/d_r_mutex_mark_as_app/
s/mutex_fork_reset/d_r_mutex_fork_reset/
s/read_lock/d_r_read_lock/
s/write_lock/d_r_write_lock/
s/write_unlock/d_r_write_unlock/
s/read_unlock/d_r_read_unlock/
s/write_trylock/d_r_write_trylock/
s/parse_word/d_r_parse_word/
s/parse_int/d_r_parse_int/
s/print_timestamp/d_r_print_timestamp/
s/get_thread_id/d_r_get_thread_id/
s/internal_error/d_r_internal_error/
s/safe_read/d_r_safe_read/

Issue: #3348
derekbruening added a commit that referenced this issue Mar 8, 2019
Renames multi-word global symbols with popular names in other software
to help reduce the chance of name conflicts:

s/config_{in,ex}it/d_r_config_{in,ex}it/
s/mutex_lock/d_r_mutex_lock/
s/mutex_unlock/d_r_mutex_unlock/
s/mutex_trylock/d_r_mutex_trylock/
s/mutex_delete/d_r_mutex_delete/
s/mutex_lock_app/d_r_mutex_lock_app/
s/mutex_mark_as_app/d_r_mutex_mark_as_app/
s/mutex_fork_reset/d_r_mutex_fork_reset/
s/read_lock/d_r_read_lock/
s/write_lock/d_r_write_lock/
s/write_unlock/d_r_write_unlock/
s/read_unlock/d_r_read_unlock/
s/write_trylock/d_r_write_trylock/
s/parse_word/d_r_parse_word/
s/parse_int/d_r_parse_int/
s/print_timestamp/d_r_print_timestamp/
s/get_thread_id/d_r_get_thread_id/
s/internal_error/d_r_internal_error/
s/safe_read/d_r_safe_read/

Issue: #3348
derekbruening added a commit that referenced this issue Mar 11, 2019
Renames more global variables whose names are popular in other
third-party C libraries:

+ s/get_parameter/d_r_get_parameter/
+ s/get_proc_address/d_r_get_proc_address/
+ s/print_log/d_r_print_log/
+ s/map_file/d_r_map_file/
+ s/unmap_file/d_r_unmap_file/
+ s/get_tls/d_r_get_tls/
+ s/set_tls/d_r_set_tls/
+ s/get_num_threads/d_r_get_num_threads/
+ s/get_random_seed/d_r_get_random_seed/
+ s/set_random_seed/d_r_set_random_seed/
+ s/heap_init/d_r_heap_init/
+ s/heap_exit/d_r_heap_exit/
+ s/decode_init/d_r_decode_init/
+ s/monitor_init/d_r_monitor_init/
+ s/monitor_exit/d_r_monitor_exit/
+ s/os_init/d_r_os_init/
+ s/os_exit/d_r_os_exit/
+ s/signal_init/d_r_signal_init/
+ s/signal_exit/d_r_signal_exit/
+ s/link_init/d_r_link_init/
+ s/link_exit/d_r_link_exit/
+ s/arch_init/d_r_arch_init/
+ s/arch_exit/d_r_arch_exit/

Issue: #3348
derekbruening added a commit that referenced this issue Mar 11, 2019
Renames more global variables whose names are popular in other
third-party C libraries:

+ s/get_parameter/d_r_get_parameter/
+ s/get_proc_address/d_r_get_proc_address/
+ s/print_log/d_r_print_log/
+ s/map_file/d_r_map_file/
+ s/unmap_file/d_r_unmap_file/
+ s/get_tls/d_r_get_tls/
+ s/set_tls/d_r_set_tls/
+ s/get_num_threads/d_r_get_num_threads/
+ s/get_random_seed/d_r_get_random_seed/
+ s/set_random_seed/d_r_set_random_seed/
+ s/heap_init/d_r_heap_init/
+ s/heap_exit/d_r_heap_exit/
+ s/decode_init/d_r_decode_init/
+ s/monitor_init/d_r_monitor_init/
+ s/monitor_exit/d_r_monitor_exit/
+ s/os_init/d_r_os_init/
+ s/os_exit/d_r_os_exit/
+ s/signal_init/d_r_signal_init/
+ s/signal_exit/d_r_signal_exit/
+ s/link_init/d_r_link_init/
+ s/link_exit/d_r_link_exit/
+ s/arch_init/d_r_arch_init/
+ s/arch_exit/d_r_arch_exit/

Issue: #3348
derekbruening added a commit that referenced this issue Jun 2, 2020
Provides isolation of mem{cpy,set,move} in the DR static library on
AArchXX.  Moves the assembly mem{cpy,set} into a separate memfuncs.asm
for arm and aarch64, and links in memmove.c.  Marks the
__mem{cpy,set}_chk aliases as weak.  This brings AArchXX into parity
with x86 on library symbol isolation.

Removes the exceptions for AArchXX in the CMake_symbol_check test.
Confirms the build-time test fails without the code changes here:

  CMake Error at xxx/core/CMake_symbol_check.cmake:98 (message):
    *** Error:
    xxx/build_a64_dbg_tests/lib64/debug/libdynamorio_static_nohide.a
    contains a likely-to-conflict symbol: 4279: 00000000002cea3c 0 FUNC GLOBAL
    HIDDEN 248 memcpy

Issue: #3315, #3348
derekbruening added a commit that referenced this issue Jun 3, 2020
Provides isolation of mem{cpy,set,move} in the DR static library on
AArchXX.  Moves the assembly mem{cpy,set} into a separate memfuncs.asm
for arm and aarch64, and links in memmove.c.  Marks the
__mem{cpy,set}_chk aliases as weak.  This brings AArchXX into parity
with x86 on library symbol isolation.

Removes the exceptions for AArchXX in the CMake_symbol_check test.
Confirms the build-time test fails without the code changes here:

  CMake Error at xxx/core/CMake_symbol_check.cmake:98 (message):
    *** Error:
    xxx/build_a64_dbg_tests/lib64/debug/libdynamorio_static_nohide.a
    contains a likely-to-conflict symbol: 4279: 00000000002cea3c 0 FUNC GLOBAL
    HIDDEN 248 memcpy

Issue: #3315, #3348
derekbruening added a commit that referenced this issue Jan 11, 2022
Fixes a symbol conflict with a python package PyTables that is hit
when statically linking with certain toolchains where non-exported
symbol hiding is not available.  The fix is to rename set_cache_size()
to proc_set_cache_size().  This also matches the proc_ prefix for
related functions, so we do not resort to the d_r_ prefix.

Issue: #3348
derekbruening added a commit that referenced this issue Jan 11, 2022
…5276)

Fixes a symbol conflict with a python package PyTables that is hit
when statically linking with certain toolchains where non-exported
symbol hiding is not available.  The fix is to rename set_cache_size()
to proc_set_cache_size().  This also matches the proc_ prefix for
related functions, so we do not resort to the d_r_ prefix.

Issue: #3348
abhinav92003 added a commit that referenced this issue Jul 28, 2022
Renames dynamorio_static_nohide to dynamorio_static and deprecates
the version where we hide DR symbols using hide_symbols.

Removes tobuild_static_nohide_api, since now all static libs use
the nohide variant.

Documents hide_symbols as unsafe because it may lead to
confusing linking behavior.

Issue: #5570, #3348
abhinav92003 added a commit that referenced this issue Jul 29, 2022
Renames dynamorio_static_nohide to dynamorio_static and deprecates
the version where we hide DR symbols using hide_symbols.

Removes tobuild_static_nohide_api, since now all static libs use
the nohide variant.

Documents hide_symbols as unsafe because it may lead to
confusing linking behavior.

Issue: #5570, #3348
dolanzhao pushed a commit that referenced this issue Aug 1, 2022
Renames dynamorio_static_nohide to dynamorio_static and deprecates
the version where we hide DR symbols using hide_symbols.

Removes tobuild_static_nohide_api, since now all static libs use
the nohide variant.

Documents hide_symbols as unsafe because it may lead to
confusing linking behavior.

Issue: #5570, #3348
derekbruening added a commit that referenced this issue Nov 14, 2023
To avoid symbol conflicts with IN, OUT, and INOUT, which are defined
in our exported headers, we add DR_PARAM_ to the front.

Issue: #3348
derekbruening added a commit that referenced this issue Nov 14, 2023
To avoid symbol conflicts with IN, OUT, and INOUT, which are defined in
our exported headers, we add DR_PARAM_ to the front.

The aarch64 codec.c was left as the shorter names since the
aarch64_check_codec_order.py assumes single-line signatures.

Issue: #3348
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant