Describe the bug
binding_java / ubuntu-latest / hf / hf_bucket is currently crashing the forked JVM with SIGSEGV instead of returning a normal test failure:
Process Exit Code: 139
org.apache.opendal.test.behavior.BlockingWriteTest
This is not caused by calling hf-xet's blocking API directly.
The Java binding builds a blocking operator by wrapping OpenDAL's async operator, and the crash happens when that blocking wrapper executes an HF/XET async write future on Linux x86_64.
What we know so far
The relevant execution model is:
- Java behavior tests create a blocking operator via AsyncOperator.blocking().
- The Java blocking binding then calls Rust blocking::Operator.
- OpenDAL's blocking operator drives the underlying async write via Handle::block_on(...).
- HF write goes through the XET async upload path.
- That async upload path internally spawns Tokio async tasks and blocking tasks.
So the failing shape is:
Java sync API -> Rust blocking::Operator::block_on(async HF/XET write future) -> HF/XET async upload graph
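The call shape above can be sketched in plain Java as an analogy. This is only an illustration of "sync facade blocks on an async future that fans out into spawned work"; it does not reproduce the crash, and every name in it (BlockingFacadeSketch, writeAsync, writeBlocking, the two pools) is hypothetical rather than an OpenDAL, Tokio, or hf-xet API.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Analogy only: hypothetical names, not OpenDAL or hf-xet code.
final class BlockingFacadeSketch {
    // Stands in for the async runtime's worker threads.
    private final ExecutorService asyncPool = Executors.newFixedThreadPool(2);
    // Stands in for the runtime's pool of blocking tasks.
    private final ExecutorService blockingPool = Executors.newFixedThreadPool(2);

    // "Async write": fans out into one async task and one blocking task,
    // then combines their results into a single future.
    CompletableFuture<Integer> writeAsync(byte[] data) {
        CompletableFuture<Integer> prepared =
                CompletableFuture.supplyAsync(() -> data.length, asyncPool);
        CompletableFuture<Integer> uploaded =
                CompletableFuture.supplyAsync(() -> {
                    pause(10); // simulate blocking I/O
                    return data.length;
                }, blockingPool);
        return prepared.thenCombine(uploaded, Math::min);
    }

    // "Blocking write": the sync facade parks the caller until the
    // whole async graph has completed.
    int writeBlocking(byte[] data) {
        return writeAsync(data).join();
    }

    void shutdown() {
        asyncPool.shutdown();
        blockingPool.shutdown();
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        BlockingFacadeSketch sketch = new BlockingFacadeSketch();
        try {
            System.out.println("bytes written: " + sketch.writeBlocking(new byte[5]));
        } finally {
            sketch.shutdown();
        }
    }
}
```

In the real stack the caller is a JVM thread entering native code, and the fan-out happens inside the Rust runtime, which is where this analogy stops being informative.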
Evidence
The following repro results were observed with real HF bucket credentials:
- macOS + Java HF blocking write: passes
- Ubuntu 24.04 x86_64 container + Java HF async write: passes
- Ubuntu 24.04 x86_64 container + AsyncOperator.blocking() construction only: passes
- Ubuntu 24.04 x86_64 container + first blocking write: JVM crashes with SIGSEGV
- Running the minimal repro directly via java -cp ... also crashes, so this is not caused by Surefire/JUnit
This rules out several earlier hypotheses:
- not HF credentials
- not Java test code itself
- not HfCore session initialization timing
- not using XET's own blocking API
- not Maven Surefire
A temporary experiment also replaced the Java blocking runtime with a plain Tokio runtime without the JNI thread hooks, and Linux still crashed. So the primary issue does not appear to be the Java executor's attach/detach hooks either.
Likely root cause
The most likely root cause is an incompatibility between:
- Java's native blocking binding path using Rust Handle::block_on(...), and
- the HF/XET async upload implementation that internally fans out into spawned async/blocking Tokio work.
In other words, the failure is at the boundary between the Java blocking wrapper and the HF/XET async execution graph, not in ordinary HF business logic.
Minimal repro direction
A reduced repro can be built around:
- Construct AsyncOperator.of("hf", config)
- Convert it with .blocking()
- Perform a single write() against an HF bucket on Linux x86_64
That is sufficient to trigger the crash in the affected environment.
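When bisecting, it can help to run the repro in a child JVM so that a native crash is reported as a result instead of taking down the harness. The following is a minimal sketch using only the JDK; ChildJvmProbe and its methods are hypothetical names, and the command to launch is whatever repro invocation is in use. It relies on the usual Unix convention that a process killed by a signal is reported as exit code 128 + signal number, so a SIGSEGV (signal 11 on Linux x86_64) shows up as 139.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: runs a command in a child process and classifies the outcome.
final class ChildJvmProbe {
    private static final int SIGSEGV = 11; // Linux x86_64 signal number

    // Maps an exit code to a short description, treating 129..255 as "killed by signal".
    static String classify(int exitCode) {
        if (exitCode == 0) {
            return "passed";
        }
        if (exitCode > 128 && exitCode < 256) {
            int signal = exitCode - 128;
            return "killed by signal " + signal + (signal == SIGSEGV ? " (SIGSEGV)" : "");
        }
        return "failed normally with exit code " + exitCode;
    }

    // Launches the command, forwards its output, and waits up to the timeout.
    static String run(List<String> command, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process child = new ProcessBuilder(command).inheritIO().start();
        if (!child.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            child.destroyForcibly();
            return "timed out";
        }
        return classify(child.exitValue());
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: ChildJvmProbe <command> [args...]");
            return;
        }
        System.out.println(run(List.of(args), 300));
    }
}
```

Passing the existing java -cp ... repro invocation as the arguments should print "killed by signal 11 (SIGSEGV)" on the affected environment and "passed" or a normal failure elsewhere.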
Expected behavior
HF behavior tests should either pass or fail with a normal OpenDAL/Java exception. They must not terminate the JVM with SIGSEGV.
Temporary mitigation
Until the Java blocking path is redesigned or HF/XET becomes compatible with this execution model, the practical mitigation is to disable the Java HF behavior case in CI.