ByteHook v1.0.8

@caikelun caikelun released this 16 May 06:18
· 12 commits to main since this release
v1.0.8
4d07fb8

Incompatible changes

1. Operation records are no longer enabled by default.

An API and an initialization parameter have been added to enable and disable operation records.

Writing an operation record on every hook and unhook has some performance cost. We recommend enabling operation records selectively or on a sampled basis, according to actual needs.
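As an illustration of the sampling idea, here is a minimal sketch that turns records on for a fixed percentage of installations. The helper names (fnv1a, should_record) and the installation-id input are hypothetical and are not part of ByteHook; the resulting boolean would be passed to whichever enable/disable switch your build of ByteHook exposes.

```c
#include <stdbool.h>
#include <stdint.h>

/* FNV-1a hash of a NUL-terminated string (hypothetical helper). */
static uint32_t fnv1a(const char *s) {
    uint32_t h = 2166136261u;
    for (; *s != '\0'; s++) {
        h ^= (uint8_t)*s;
        h *= 16777619u;
    }
    return h;
}

/*
 * Decide whether this installation should write operation records.
 * The decision is deterministic per id, so a given installation stays
 * in or out of the sample across restarts.
 */
static bool should_record(const char *install_id, unsigned percent) {
    if (percent >= 100) return true;
    return fnv1a(install_id) % 100u < percent;
}
```

Hashing a stable id rather than drawing a random number keeps the sample consistent, which makes it easier to correlate records from the same installation over time.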

Bugs fixed

1. Fixed a crash caused by concurrency.

PR: #70. Thanks: @cmzy

2. Fixed an ANR problem caused by concurrency.

The root cause is that the dlopen and dlclose proxy functions inside ByteHook need to know whether they are currently running inside the linker's global mutex in order to flush the ELF cache correctly. Previously this was tracked with a counter in the dlopen and dlclose proxy functions. However, dlopen and dlclose have to be hooked separately, so the two hooks are not installed atomically, and the counter's initial value could be wrong.

ByteHook now inspects bionic's internal pthread_mutex_internal_t data structure to determine whether it is currently inside the linker's global mutex. After extensive testing and verification in production environments, we have found this method to be safe and reliable.

3. Avoided an ANR problem caused by mmap being inline-hooked.

The mmap function calls in ByteHook's internal logic have been replaced with direct system calls. This avoids any interaction with an external mmap inline-hook proxy function, which could cause ANR problems.
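The general technique can be sketched as follows. This is not ByteHook's actual code; raw_mmap_anon is a hypothetical helper that requests an anonymous mapping through syscall() so that the libc mmap symbol, which an inline hook would patch, is never executed.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Map `len` bytes of anonymous read/write memory via a raw system call.
 * Returns MAP_FAILED on error, like mmap(): syscall() returns -1 and
 * sets errno when the kernel reports a failure.
 */
static void *raw_mmap_anon(size_t len) {
#if defined(SYS_mmap2)
    /* 32-bit ABIs: mmap2 takes the offset in 4096-byte units (0 here). */
    long r = syscall(SYS_mmap2, NULL, len, (long)(PROT_READ | PROT_WRITE),
                     (long)(MAP_PRIVATE | MAP_ANONYMOUS), -1L, 0L);
#else
    long r = syscall(SYS_mmap, NULL, len, (long)(PROT_READ | PROT_WRITE),
                     (long)(MAP_PRIVATE | MAP_ANONYMOUS), -1L, 0L);
#endif
    return (void *)r;
}
```

The mapping can be released with munmap() as usual. Whether munmap and related calls also need the same treatment depends on which functions the external hook patches.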

Improvements

1. Operation records no longer try to get the time zone via localtime_r().

localtime_r() calls getenv() to access the global environ. If setenv() is called concurrently, a crash may occur, because bionic does not protect access to environ with a lock.

What we can do for now is avoid calling getenv() and setenv() as far as possible.
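One way to timestamp a record without touching the time zone database or environ is to format UTC directly from the epoch seconds. The sketch below is an assumption about how that could be done, not ByteHook's implementation; format_utc is a hypothetical helper that uses the well-known days-to-civil-date arithmetic for the proleptic Gregorian calendar.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format epoch seconds as "YYYY-MM-DD hh:mm:ss" in UTC, with no tz lookup. */
static void format_utc(int64_t epoch_sec, char *buf, size_t buf_len) {
    int64_t days = epoch_sec / 86400;
    int64_t rem = epoch_sec % 86400;
    if (rem < 0) { rem += 86400; days -= 1; }   /* floor for negative input */

    /* Shift the epoch to 0000-03-01 so leap days fall at the end of a year. */
    int64_t z = days + 719468;
    int64_t era = (z >= 0 ? z : z - 146096) / 146097;          /* 400-year block */
    int64_t doe = z - era * 146097;                            /* day of era */
    int64_t yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;
    int64_t doy = doe - (365 * yoe + yoe / 4 - yoe / 100);     /* day of year */
    int64_t mp = (5 * doy + 2) / 153;                          /* March-based month */
    int64_t day = doy - (153 * mp + 2) / 5 + 1;
    int64_t month = mp < 10 ? mp + 3 : mp - 9;
    int64_t year = yoe + era * 400 + (month <= 2 ? 1 : 0);

    snprintf(buf, buf_len, "%04lld-%02lld-%02lld %02lld:%02lld:%02lld",
             (long long)year, (long long)month, (long long)day,
             (long long)(rem / 3600), (long long)(rem % 3600 / 60),
             (long long)(rem % 60));
}
```

The trade-off is that records carry UTC rather than local time, so any conversion to a local time zone has to happen when the records are read.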

2. In manual mode, using a proxy function written for automatic mode now triggers an abort().

This makes such misuse easier to detect, and detects it earlier.

3. TLS key usage reduced from 3 to 1.

In most system versions and situations, ByteHook now uses only 1 TLS key.

The number of TLS keys available to each process is limited.
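A common way to reduce TLS key usage is to gather all per-thread values into one struct and store a pointer to it under a single key. The sketch below illustrates that pattern with POSIX threads; the struct fields and function names are hypothetical and do not reflect ByteHook's internal layout.

```c
#include <pthread.h>
#include <stdlib.h>

/* All per-thread state lives in one struct (fields are illustrative). */
typedef struct {
    int   depth;   /* e.g. current nesting depth */
    void *stack;   /* e.g. pointer to a per-thread stack */
    int   flags;   /* e.g. miscellaneous per-thread flags */
} thread_state_t;

static pthread_key_t  g_key;
static pthread_once_t g_once = PTHREAD_ONCE_INIT;

/* Create the single key once; free() releases the struct at thread exit. */
static void make_key(void) {
    pthread_key_create(&g_key, free);
}

/* Return the calling thread's state, allocating it on first use. */
static thread_state_t *thread_state(void) {
    pthread_once(&g_once, make_key);
    thread_state_t *st = pthread_getspecific(g_key);
    if (st == NULL) {
        st = calloc(1, sizeof(*st));
        if (st != NULL) pthread_setspecific(g_key, st);
    }
    return st;
}
```

Each call to thread_state() on the same thread returns the same zero-initialized struct, so three values that previously needed three keys share one.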

New features

1. Added an API to get the ByteHook version number.

In addition, when only the ELF file is available, the ByteHook version number can be determined as follows:

llvm-strings libbytehook.so | grep "bytehook version"
