Recurring JVM crash on `macos-14/Spark 4.1, JDK 17, Scala 2.13 [parquet]` (and occasionally other macOS PR-build jobs) after the single test in `ParquetReadFromFakeHadoopFsSuite` completes. Reproduced on at least PR #4197 and in earlier runs.
Same failure shape as closed #2354 (`hdfsThreadDestructor` on linux amd64), but here on macOS aarch64 the offending frame is anonymous.
`hs_err` summary
```
SIGBUS (0xa) at pc=0x000000012e828e00
siginfo: si_signo: 10 (SIGBUS), si_code: 1 (BUS_ADRALN), si_addr: 0x000000012e828e00
Current thread is native thread
Native frames:
C 0x000000012e828e00 ← unmapped/stripped
C [libsystem_pthread.dylib+0x4818] _pthread_tsd_cleanup+0x1e8
C [libsystem_pthread.dylib+0x762c] _pthread_exit+0x54
C [libsystem_pthread.dylib+0x6f48] _pthread_start+0x94
Registers (selected):
pc=0x000000012e828e00 x8=0x000000012e828e00 ← callee == pc
```
Root cause (suspected)
This looks like the classic pattern of a `pthread_key_create` TSD destructor being called after its code has been dlclose'd:
1. libcomet (or a library it pulls in: `hdfs-opendal` / libhdfs) calls `pthread_key_create(&key, destructor_fn)` for cleanup on thread exit.
2. The one test finishes (`931 ms` in the latest run); HDFS background threads finish their work and call `_pthread_exit`.
3. `_pthread_tsd_cleanup` walks the TSD key table and jumps to `destructor_fn`.
4. By this point the page holding `destructor_fn` has been unmapped or the library has been unloaded, so the instruction fetch at `pc` raises `BUS_ADRALN`.
The tell is the stack `_pthread_start → _pthread_exit → _pthread_tsd_cleanup →` anonymous frame, plus `pc == x8` (the TSD cleanup loop stores the destructor in `x8` before `blr x8` on arm64).
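The signature described above (anonymous top frame, called from `_pthread_tsd_cleanup`, with `pc == x8`) can be checked mechanically when triaging other `hs_err` files. The following is a minimal sketch, assuming the log has the same shape as the summary in this issue; the function name `looks_like_stale_tsd_destructor` and the regular expressions are illustrative and not part of any existing tooling.

```python
import re

# Trimmed copy of the hs_err summary from this issue.
SAMPLE = """\
SIGBUS (0xa) at pc=0x000000012e828e00
siginfo: si_signo: 10 (SIGBUS), si_code: 1 (BUS_ADRALN), si_addr: 0x000000012e828e00
Current thread is native thread
Native frames:
C 0x000000012e828e00
C [libsystem_pthread.dylib+0x4818] _pthread_tsd_cleanup+0x1e8
C [libsystem_pthread.dylib+0x762c] _pthread_exit+0x54
C [libsystem_pthread.dylib+0x6f48] _pthread_start+0x94
Registers (selected):
pc=0x000000012e828e00 x8=0x000000012e828e00
"""


def looks_like_stale_tsd_destructor(hs_err: str) -> bool:
    """Heuristic: does this log match the stale-TSD-destructor signature?"""
    # Faulting pc, taken from the first "pc=0x..." occurrence (the signal line).
    pc = re.search(r"\bpc=(0x[0-9a-fA-F]+)", hs_err)
    # x8 from the register dump; in this issue it holds the same address as pc.
    x8 = re.search(r"\bx8=(0x[0-9a-fA-F]+)", hs_err)
    if not pc or not x8:
        return False

    # Native frames are the lines that start with "C ".
    frames = [ln.strip() for ln in hs_err.splitlines() if ln.strip().startswith("C ")]
    if len(frames) < 2:
        return False

    # Top frame is a bare address with no "[lib+offset] symbol" part.
    top_is_anonymous = re.match(r"C\s+0x[0-9a-fA-F]+\b", frames[0]) is not None
    # Its caller is the TSD cleanup loop.
    called_from_tsd_cleanup = "_pthread_tsd_cleanup" in frames[1]

    return (
        top_is_anonymous
        and called_from_tsd_cleanup
        and int(pc.group(1), 16) == int(x8.group(1), 16)
    )


print(looks_like_stale_tsd_destructor(SAMPLE))
```

A match only says the crash has the same shape as this report; it does not identify which library registered the key.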
Where the stale destructor comes from
The suite depends on the `hdfs-opendal` feature (`assume(isFeatureEnabled("hdfs-opendal"))`). On macOS aarch64 CI that feature is enabled, so every run exercises the JNI bridge to Hadoop native libs. Those libs are the most likely registrars of the TSD key (cf. the original #2354 crash that pointed at `hdfsThreadDestructor+0x61`).
Mitigations to consider
- Skip `ParquetReadFromFakeHadoopFsSuite` on macOS aarch64 until the root cause is fixed.
- Unregister TSD keys at library-unload time, or avoid dlclose-like paths when TSD destructors are registered.
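To make the second mitigation concrete, here is a toy Python model of the key table, not real pthread code. It assumes every exiting thread holds a non-NULL value for every key, and all names (`ToyProcess`, `run`, `destructor_fn` as a string) are hypothetical. It only illustrates why deleting the key before the code is unmapped removes the stale jump.

```python
class ToyProcess:
    """Toy model of a process-wide TSD key table (illustration only)."""

    def __init__(self):
        self.mapped = set()    # destructor names whose code is still mapped
        self.key_table = {}    # key id -> destructor name
        self._next_key = 0

    def load_library(self, destructor_names):
        self.mapped.update(destructor_names)

    def key_create(self, destructor_name):
        # Stands in for pthread_key_create(&key, destructor_fn).
        key = self._next_key
        self._next_key += 1
        self.key_table[key] = destructor_name
        return key

    def key_delete(self, key):
        # Stands in for pthread_key_delete(key).
        self.key_table.pop(key, None)

    def unload_library(self, destructor_names, keys_to_delete=()):
        # Mitigation: drop the library's keys before its code goes away.
        for key in keys_to_delete:
            self.key_delete(key)
        self.mapped.difference_update(destructor_names)

    def thread_exit(self):
        """Walk the key table as a thread exits; return destructors called."""
        called = []
        for key, dtor in self.key_table.items():
            if dtor not in self.mapped:
                # Models the jump to an unmapped address seen in the crash.
                raise RuntimeError(f"jump to unmapped destructor {dtor!r} (key {key})")
            called.append(dtor)
        return called


def run(delete_keys_on_unload):
    proc = ToyProcess()
    proc.load_library({"destructor_fn"})
    key = proc.key_create("destructor_fn")
    keys = [key] if delete_keys_on_unload else []
    proc.unload_library({"destructor_fn"}, keys_to_delete=keys)
    return proc.thread_exit()


print(run(delete_keys_on_unload=True))
```

With `delete_keys_on_unload=False` the model raises at `thread_exit`, mirroring the crash. One caveat for a real fix: POSIX `pthread_key_delete` does not invoke destructors for values still stored under the key, so any per-thread data would have to be freed some other way.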
Linking PR #4197 where this most recently surfaced.