
fix: prevent abort in local inference #7633

Merged
jh-block merged 1 commit into block:main from KubeCat:fix/local-inference-cpp-exception on Mar 4, 2026
Conversation


@KubeCat KubeCat commented Mar 4, 2026

Fix an abort ("Rust cannot catch foreign exceptions") that occurs when using Local Inference with models that have long Jinja2 chat templates (e.g. Qwen3 GGUF models).

The problem

  1. apply_chat_template() calls llama_chat_apply_template() which internally does a C++ map .at() lookup on the template string.
  2. When the string is a full Jinja2 template (not a short name like "chatml"), this throws std::out_of_range.
  3. The exception is normally caught in C++, but when goose's other native dependencies are linked into the binary, the C++ exception-handling ABI breaks and the exception propagates across the Rust FFI boundary, causing a process abort.

The solution

Switch from apply_chat_template() to apply_chat_template_with_tools_oaicompat() in inference_emulated_tools.rs.

Why?

This function has its own C++ try-catch wrapper. When called with tools=None, it produces the same prompt. This aligns the emulated tools path with the native tools path, which already uses the oaicompat variant.

Testing

A few integration tests were improved. The local-inference integration tests had been abandoned and marked with #[ignore], which made it hard to see how to actually verify these changes. The testing changes are as follows:

  1. Fixed a broken model ID in local_inference_integration.rs, added a TEST_MODEL env var override, and removed a perf assertion that could never pass because the model was never truly in a cold-start state.
  2. Separated the cold vs. warm perf benchmarks in local_inference_perf.rs, with TEST_MODEL env var support.

Type of Change

  • Feature
  • Bug fix
  • Refactor / Code quality
  • Performance improvement
  • Documentation
  • Tests
  • Security fix
  • Build / Release
  • Other (specify below)

AI Assistance

  • This PR was created or reviewed with AI assistance
  • 56 unit tests pass (parsing, engine, hf_models)
  • Integration tests pass with Llama 1B and Qwen3 32B, on both CPU and CUDA
  • Qwen3 32B, the previously crashing model, now works

Testing

  • Tested end-to-end with bartowski/Qwen_Qwen3-32B-GGUF:Q4_K_M via test_provider_configuration;
    previously aborted, now passes.
  • Verified the crash is specific to the goose binary (standalone llama-cpp-2 tests pass because the
    C++ ABI isn't disturbed by additional native deps).
  • Verified that short template names ("chatml") never triggered the crash — only full Jinja2
    template strings.
  • Existing unit tests for emulated tools parsing, tool parsing, and inference engine all continue to
    pass (they test downstream of the template step).

Related Issues

  • Related to bug(windows): MSVC link failure in goosed when v8 and llama-cpp are linked together (LNK2038/LNK2005/LNK1169) #7410 same root cause (C++ exception-handling ABI clash between llama-cpp and other native deps). That issue manifests as a Windows MSVC link failure; this one manifests as a Linux runtime abort.
  • Follow-up to Fix Windows MSVC linking issues #7511 which fixed the Windows MSVC link-time side of the v8/llama-cpp C++ ABI conflict (/FORCE:MULTIPLE for duplicate std::exception_ptr symbols). This PR addresses the Linux runtime side where the exception links but doesn't propagate correctly across the Rust FFI boundary.
  • Discovered a separate issue where CUDA crashes when context_size is null in the registry and falls through to n_ctx_train, even though estimate_max_context_for_memory should cap it.

Screenshots/Demos (for UX changes)

[image]

After: Model loads and generates normally.
[image]

With CUDA:
[Screenshot_20260304_023801]

@KubeCat force-pushed the fix/local-inference-cpp-exception branch from ca43bd8 to d198761 on March 4, 2026 01:18
@KubeCat force-pushed the fix/local-inference-cpp-exception branch from d198761 to d39c9a2 on March 4, 2026 01:19
@KubeCat KubeCat marked this pull request as ready for review March 4, 2026 01:45

@jh-block jh-block left a comment


Thank you for this!

@jh-block jh-block added this pull request to the merge queue Mar 4, 2026
Merged via the queue into block:main with commit dafc4db Mar 4, 2026
20 checks passed
lifeizhou-ap added a commit that referenced this pull request Mar 4, 2026
craigwalkeruk pushed a commit to craigwalkeruk/custom-goose that referenced this pull request Mar 5, 2026
tlongwell-block added a commit that referenced this pull request Mar 5, 2026
Abhijay007 pushed a commit to Abhijay007/goose that referenced this pull request Mar 6, 2026