
fix: prevent abort in local inference #7633

Merged
jh-block merged 1 commit into block:main from KubeCat:fix/local-inference-cpp-exception on Mar 4, 2026
Conversation


@KubeCat KubeCat commented Mar 4, 2026

Fix an abort ("Rust cannot catch foreign exceptions") that occurs when using Local Inference with models that have long Jinja2 chat templates (e.g. Qwen3 GGUF models).

The problem

  1. apply_chat_template() calls llama_chat_apply_template() which internally does a C++ map .at() lookup on the template string.
  2. When the string is a full Jinja2 template (not a short name like "chatml"), this throws std::out_of_range.
  3. The exception is normally caught in C++, but when goose's other native dependencies are linked into the binary, the C++ exception-handling ABI breaks and the exception propagates across the Rust FFI boundary, causing a process abort.

The solution

Switch from apply_chat_template() to apply_chat_template_with_tools_oaicompat() in inference_emulated_tools.rs.

Why?

This function has its own C++ try-catch wrapper. When called with tools=None, it produces the same prompt. This aligns the emulated tools path with the native tools path, which already uses the oaicompat variant.

Testing

A few integration tests were improved. The local-inference integration tests had been abandoned and marked with #[ignore], which made it hard to see how to actually verify these changes. The testing changes are as follows:

  1. Fixed a broken model ID in local_inference_integration.rs, added a TEST_MODEL env var override, and removed a perf assertion that could never pass because the model was never truly in a cold-start state.
  2. Separated the cold vs. warm perf benchmarks in local_inference_perf.rs, with TEST_MODEL env var support.

Type of Change

  • Feature
  • Bug fix
  • Refactor / Code quality
  • Performance improvement
  • Documentation
  • Tests
  • Security fix
  • Build / Release
  • Other (specify below)

AI Assistance

  • This PR was created or reviewed with AI assistance
  • 56 unit tests pass (parsing, engine, hf_models)
  • Integration tests pass with Llama 1B and Qwen3 32B, on both CPU and CUDA
  • Qwen3 32B, the previously crashing model, now works

Testing

  • Tested end-to-end with bartowski/Qwen_Qwen3-32B-GGUF:Q4_K_M via test_provider_configuration;
    previously aborted, now passes.
  • Verified the crash is specific to the goose binary (standalone llama-cpp-2 tests pass because the
    C++ ABI isn't disturbed by additional native deps).
  • Verified that short template names ("chatml") never triggered the crash — only full Jinja2
    template strings.
  • Existing unit tests for emulated tools parsing, tool parsing, and inference engine all continue to
    pass (they test downstream of the template step).

Related Issues

  • Related to bug(windows): MSVC link failure in goosed when v8 and llama-cpp are linked together (LNK2038/LNK2005/LNK1169) #7410 same root cause (C++ exception-handling ABI clash between llama-cpp and other native deps). That issue manifests as a Windows MSVC link failure; this one manifests as a Linux runtime abort.
  • Follow-up to Fix Windows MSVC linking issues #7511 which fixed the Windows MSVC link-time side of the v8/llama-cpp C++ ABI conflict (/FORCE:MULTIPLE for duplicate std::exception_ptr symbols). This PR addresses the Linux runtime side where the exception links but doesn't propagate correctly across the Rust FFI boundary.
  • Discovered a separate issue where CUDA crashes when context_size is null in the registry and falls through to n_ctx_train, even though estimate_max_context_for_memory should cap it.

Screenshots/Demos (for UX changes)

[image]

After: Model loads and generates normally.
[image]

With CUDA:
[Screenshot_20260304_023801]

@KubeCat force-pushed the fix/local-inference-cpp-exception branch from ca43bd8 to d198761 on March 4, 2026 01:18
@KubeCat force-pushed the fix/local-inference-cpp-exception branch from d198761 to d39c9a2 on March 4, 2026 01:19
@KubeCat KubeCat marked this pull request as ready for review March 4, 2026 01:45

@jh-block jh-block left a comment


Thank you for this!

@jh-block jh-block added this pull request to the merge queue Mar 4, 2026
Merged via the queue into block:main with commit dafc4db Mar 4, 2026
20 checks passed
lifeizhou-ap added a commit that referenced this pull request Mar 4, 2026
craigwalkeruk pushed a commit to craigwalkeruk/custom-goose that referenced this pull request Mar 5, 2026
tlongwell-block added a commit that referenced this pull request Mar 5, 2026
Abhijay007 pushed a commit to Abhijay007/goose that referenced this pull request Mar 6, 2026