Skip to content

fix(tui): wrap startup embed calls with tokio::time::timeout (#2879)#2880

Merged
bug-ops merged 1 commit intomainfrom
2879-tui-ollama-embed-timeout
Apr 11, 2026
Merged

fix(tui): wrap startup embed calls with tokio::time::timeout (#2879)#2880
bug-ops merged 1 commit intomainfrom
2879-tui-ollama-embed-timeout

Conversation

@bug-ops
Copy link
Copy Markdown
Owner

@bug-ops bug-ops commented Apr 11, 2026

Summary

  • Wraps 5 embed call sites on the agent startup path with tokio::time::timeout(config.timeouts.embedding_seconds) to prevent TUI from hanging indefinitely when an Ollama embedding provider is slow
  • On timeout: warn! logged, LlmError::Timeout returned — each caller degrades gracefully, startup continues
  • Adds two unit tests for resolve_rl_embed_dim (timeout-fallback path and fast-provider path)
  • Extends MockProvider with with_embedding() and with_embed_delay() builders

Sites fixed

File Function
src/runner.rs resolve_rl_embed_dim (new embedding_timeout_secs param)
src/daemon.rs call site updated for new param
crates/zeph-core/src/bootstrap/mcp.rs create_mcp_registry embed_fn
crates/zeph-core/src/bootstrap/skills.rs create_skill_matcher embed_fn
crates/zeph-core/src/agent/mcp.rs sync_mcp_registry + rebuild_semantic_index
crates/zeph-core/src/agent/mod.rs rebuild_skill_matcher embed_fn

Reference pattern: SkillMatcher::new in crates/zeph-skills/src/matcher.rs already used tokio::time::timeout correctly; this PR applies the same pattern to all remaining startup sites.

Test plan

  • cargo +nightly fmt --check — clean
  • cargo clippy --workspace --features full -- -D warnings — clean
  • cargo nextest run --workspace --features full --lib --bins — 8149 passed, 25 skipped
  • resolve_rl_embed_dim_timeout_uses_fallback — mock delay 1100ms, timeout 1s → returns FALLBACK 1536
  • resolve_rl_embed_dim_fast_provider_returns_dim — fast mock returns vec[768] → returns 768

Closes #2879

@github-actions github-actions Bot added llm zeph-llm crate (Ollama, Claude) rust Rust code changes core zeph-core crate bug Something isn't working size/M Medium PR (51-200 lines) labels Apr 11, 2026
Five embed call sites on the agent startup path had no timeout, causing
the TUI to hang indefinitely when an Ollama embedding provider was slow
to respond (e.g. model still loading or queued requests).

Sites fixed:
- src/runner.rs: resolve_rl_embed_dim (new embedding_timeout_secs param)
- src/daemon.rs: resolve_rl_embed_dim call site updated
- crates/zeph-core/src/bootstrap/mcp.rs: create_mcp_registry embed_fn
- crates/zeph-core/src/bootstrap/skills.rs: create_skill_matcher embed_fn
- crates/zeph-core/src/agent/mcp.rs: sync_mcp_registry + rebuild_semantic_index
- crates/zeph-core/src/agent/mod.rs: rebuild_skill_matcher embed_fn

All sites now use config.timeouts.embedding_seconds (default 30 s).
On timeout a warn! is logged and LlmError::Timeout returned; each caller
already handles this gracefully so startup continues without abort.

Adds two unit tests for resolve_rl_embed_dim covering the timeout-fallback
and fast-provider paths. Extends MockProvider in zeph-llm with
with_embedding() and with_embed_delay() builders to support these tests.

Closes #2879
@bug-ops bug-ops force-pushed the 2879-tui-ollama-embed-timeout branch from a48ebbd to 3462841 Compare April 11, 2026 11:43
@bug-ops bug-ops enabled auto-merge (squash) April 11, 2026 11:43
@bug-ops bug-ops merged commit 034d45b into main Apr 11, 2026
30 checks passed
@bug-ops bug-ops deleted the 2879-tui-ollama-embed-timeout branch April 11, 2026 11:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working core zeph-core crate llm zeph-llm crate (Ollama, Claude) rust Rust code changes size/M Medium PR (51-200 lines)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

tui: startup hangs when ollama embed provider is slow (missing timeouts on embed calls)

1 participant