Summary
When experimental_semantic_search=true, the OpenCode plugin can spawn the AFT binary before ONNX Runtime initialization finishes. In that window, the child process starts without the resolved ONNX dylib directory, semantic index initialization tries to load libonnxruntime.dylib, and the Rust binary aborts with SIGABRT.
This reproduces on macOS with the OpenCode plugin and explains the repeated restart loops that log:
Failed to load ONNX Runtime dylib
dlopen failed
Process exited: code=null, signal=SIGABRT
Max restarts (3) reached
Environment
- Platform: macOS darwin/arm64
- AFT binary: v0.11.1
- OpenCode: 1.4.3
- Config: experimental_search_index=true, experimental_semantic_search=true
Root Cause
In packages/opencode-plugin/src/index.ts, ensureOnnxRuntime() was previously called asynchronously without being awaited. BridgePool could therefore be constructed first, and the AFT child process could be spawned before _ort_dylib_dir was populated.
As a result, the child process environment did not reliably include the ONNX dylib path when semantic index initialization started.
Because the Rust release profile uses panic = "abort", a panic on ONNX loading failure terminates the process with SIGABRT instead of surfacing as a recoverable error.
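The race can be illustrated in isolation with a small simulation. This is a minimal sketch, not the plugin's code: `ensureOnnxRuntime`, `configOverrides`, and `_ort_dylib_dir` are names taken from this issue, while their bodies, the `spawnChild` helper, the 10 ms delay, and the `onnxruntime/lib` path are hypothetical. The point it shows is that the spawn snapshots the configuration synchronously, while the un-awaited promise only populates it later.

```typescript
// Simulation of the startup race. Names follow the issue; the bodies, the delay,
// spawnChild, and the "onnxruntime/lib" path are hypothetical.
interface ConfigOverrides {
  _ort_dylib_dir?: string;
}

const configOverrides: ConfigOverrides = {};
// Each entry is the configuration a child process was spawned with.
const spawnedConfigs: ConfigOverrides[] = [];

// Stand-in for ONNX Runtime resolution: it finishes only after an async delay.
async function ensureOnnxRuntime(storageDir: string): Promise<void> {
  await new Promise<void>((resolve) => setTimeout(resolve, 10));
  configOverrides._ort_dylib_dir = `${storageDir}/onnxruntime/lib`;
}

// Stand-in for BridgePool spawning the AFT binary: copies the config as it is right now.
function spawnChild(): void {
  spawnedConfigs.push({ ...configOverrides });
}

// Pre-fix ordering: the promise is started but not awaited before spawning.
function startPluginBuggy(storageDir: string): Promise<void> {
  const pending = ensureOnnxRuntime(storageDir); // fire-and-forget
  spawnChild(); // runs before the timer inside ensureOnnxRuntime fires
  return pending;
}
```

Calling `startPluginBuggy("/tmp/aft")` records a spawn whose `_ort_dylib_dir` is `undefined`, even though `configOverrides._ort_dylib_dir` is populated once the returned promise settles. The already-spawned child never sees that later update.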
Reproduction
- Enable experimental_semantic_search=true
- Start OpenCode with the AFT plugin
- Trigger an AFT-backed session quickly on a project that causes semantic indexing to initialize early
- Observe repeated crashes in aft-plugin.log
Observed log sequence:
ONNX Runtime found at ...
Spawning binary: .../v0.11.1/aft
Failed to load ONNX Runtime dylib: ... dlopen failed
Process exited: code=null, signal=SIGABRT
Auto-restart #1/#2/#3
Max restarts (3) reached
Evidence
I verified this locally from logs and runtime behavior.
Before the fix, the hwmon project was a reliable repro case and repeatedly crashed during semantic search startup.
After changing plugin startup order so ONNX initialization is awaited before bridge creation, the same setup successfully reached:
started
pre-warmed symbol cache
built semantic index
semantic index persisted
Example successful post-fix log sequence:
Spawning binary: /Users/.../.cache/aft/bin/v0.11.1/aft (cwd: /Users/.../hwmon)
Binary version: 0.11.1
pre-warmed symbol cache: 63 files
built semantic index: 76 files, 799 entries
semantic index persisted: 799 entries, 1707.0 KB
Proposed Fix
In the OpenCode plugin startup path:
- await ensureOnnxRuntime(storageDir) before constructing BridgePool
- set configOverrides._ort_dylib_dir before any child process spawn
- keep the fast path unchanged when experimental_semantic_search=false
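The corrected ordering can be sketched roughly as follows. This is a simplified, hypothetical model rather than the plugin's actual code: `BridgePool` is reduced to a class that records the overrides its child would receive, and this `ensureOnnxRuntime` returns the resolved directory instead of mutating shared state, an assumption made only to keep the sketch small.

```typescript
// Sketch of the fixed startup order. BridgePool, ensureOnnxRuntime, and the
// config shapes are simplified, hypothetical stand-ins.
interface PluginConfig {
  experimental_semantic_search: boolean;
}

interface ConfigOverrides {
  _ort_dylib_dir?: string;
}

// Hypothetical: resolves ONNX Runtime asynchronously and returns its directory.
async function ensureOnnxRuntime(storageDir: string): Promise<string> {
  await new Promise<void>((resolve) => setTimeout(resolve, 10));
  return `${storageDir}/onnxruntime/lib`;
}

// Hypothetical pool: captures the overrides a spawned child would be given.
class BridgePool {
  readonly childOverrides: ConfigOverrides;

  constructor(overrides: ConfigOverrides) {
    // Copy, so later mutations cannot change what an existing child saw.
    this.childOverrides = { ...overrides };
  }
}

async function startPlugin(config: PluginConfig, storageDir: string): Promise<BridgePool> {
  const configOverrides: ConfigOverrides = {};
  if (config.experimental_semantic_search) {
    // Await first: the pool (and any child it spawns) is created only after
    // _ort_dylib_dir has been populated.
    configOverrides._ort_dylib_dir = await ensureOnnxRuntime(storageDir);
  }
  // Fast path: with semantic search disabled, no ONNX work is done at all.
  return new BridgePool(configOverrides);
}
```

With `experimental_semantic_search: true`, the returned pool's `childOverrides._ort_dylib_dir` is always set; with `false`, `startPlugin` constructs the pool without touching ONNX Runtime. The cost is that startup waits for ONNX resolution when semantic search is enabled, which is what guarantees the child never starts without the dylib path.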
Regression Coverage
I added two focused tests locally:
- ensureOnnxRuntime must resolve before BridgePool construction
- when semantic search is disabled, ensureOnnxRuntime must not be called
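The two regression checks can be approximated without a real binary by injecting the startup dependencies. This is a sketch under assumptions: the `StartupDeps` interface, `createBridgePool`, and this injectable `startPlugin` signature are hypothetical, and the test framework and mocking approach used by the actual tests are not shown in this issue. Each function below returns its observations so a caller can assert on them.

```typescript
// Hypothetical dependency-injected startup, so ordering can be checked without
// spawning the AFT binary. Names mirror the issue; signatures are assumptions.
interface Overrides {
  _ort_dylib_dir?: string;
}

interface StartupDeps {
  ensureOnnxRuntime: (storageDir: string) => Promise<string>;
  createBridgePool: (overrides: Overrides) => void;
}

async function startPlugin(semanticSearch: boolean, storageDir: string, deps: StartupDeps): Promise<void> {
  const overrides: Overrides = {};
  if (semanticSearch) {
    overrides._ort_dylib_dir = await deps.ensureOnnxRuntime(storageDir);
  }
  deps.createBridgePool(overrides);
}

// Check 1: ensureOnnxRuntime must resolve before BridgePool construction.
async function testOnnxResolvesBeforePool(): Promise<string[]> {
  const events: string[] = [];
  await startPlugin(true, "/tmp/aft", {
    ensureOnnxRuntime: async () => {
      await new Promise<void>((resolve) => setTimeout(resolve, 5));
      events.push("onnx-resolved");
      return "/tmp/aft/ort";
    },
    createBridgePool: (overrides) => {
      events.push(`pool:${overrides._ort_dylib_dir ?? "missing"}`);
    },
  });
  return events;
}

// Check 2: with semantic search disabled, ensureOnnxRuntime must not be called.
async function testDisabledSkipsOnnx(): Promise<number> {
  let onnxCalls = 0;
  await startPlugin(false, "/tmp/aft", {
    ensureOnnxRuntime: async () => {
      onnxCalls += 1;
      return "/tmp/aft/ort";
    },
    createBridgePool: () => {},
  });
  return onnxCalls;
}
```

If the `await` in `startPlugin` were dropped (for example, by assigning the directory inside a `.then(...)` callback without waiting for it), the first check would record `pool:missing` before `onnx-resolved` and fail.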
Why This Matters
This is not just a corrupted local ONNX cache case. The race can happen even when the dylib exists and is loadable on disk, because the child process may start before the plugin has injected the runtime directory into process configuration.