feat(inference): allow setting custom inference timeout#672

Open
pentschev wants to merge 3 commits into NVIDIA:main from pentschev:inference-timeout

@pentschev

Summary

Makes the inference routing timeout configurable via openshell inference set --timeout <secs> and openshell inference update --timeout <secs>, replacing the hardcoded 60-second default. Timeout changes propagate dynamically to running sandboxes within the route refresh interval (~5 seconds) without requiring sandbox recreation.

The 60-second timeout was hit while running OpenCode on a complex build task on a DGX Spark serving nemotron-3-super:120b via Ollama. This feature allows such longer-running tasks to succeed.
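
To illustrate the fallback behavior, here is a minimal sketch of how a stored `timeout_secs` value could be turned into the `Duration` used per request. It assumes that `0` means "not set" and falls back to the 60-second default, as the Changes list suggests. The helper name `resolve_timeout` and the `u64` field type are assumptions made for this example, not taken from the PR.

```rust
use std::time::Duration;

/// Default applied when no custom timeout is configured (60 seconds, per this PR).
const DEFAULT_ROUTE_TIMEOUT: Duration = Duration::from_secs(60);

/// Hypothetical helper: converts a proto-level `timeout_secs` value into the
/// `Duration` stored on a resolved route. Zero is treated as "unset".
fn resolve_timeout(timeout_secs: u64) -> Duration {
    if timeout_secs == 0 {
        DEFAULT_ROUTE_TIMEOUT
    } else {
        Duration::from_secs(timeout_secs)
    }
}

fn main() {
    // Unset timeout falls back to the default.
    println!("unset  -> {:?}", resolve_timeout(0));
    // A custom value, e.g. five minutes for a long-running generation.
    println!("custom -> {:?}", resolve_timeout(300));
}
```

Treating zero as "unset" keeps older stored configurations, which have no timeout field, on the previous 60-second behavior.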

Related Issue

Closes #641

Changes

  • Add timeout_secs field to ClusterInferenceConfig, SetClusterInferenceRequest, SetClusterInferenceResponse, GetClusterInferenceResponse, and ResolvedRoute proto messages
  • Add timeout field (Duration) to the router's ResolvedRoute struct with a DEFAULT_ROUTE_TIMEOUT of 60 seconds
  • Remove the global reqwest::Client timeout; apply per-request .timeout(route.timeout) in backend.rs
  • Thread timeout_secs through server persistence (upsert_cluster_inference_route, build_cluster_inference_config, bundle resolution)
  • Map proto timeout_secs to router ResolvedRoute.timeout in the sandbox's bundle_to_resolved_routes()
  • Include timeout_secs in the bundle revision hash so timeout changes trigger route cache refreshes in running sandboxes
  • Add a --timeout CLI flag to inference set (defaults to 0, which selects the 60-second default) and inference update (optional)
  • Update docs/inference/configure.md with timeout usage and hot-reload behavior
  • Update architecture/inference-routing.md with per-request timeout semantics, proto field additions, and CLI surface
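
The hot-reload item above (including `timeout_secs` in the bundle revision hash) can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: `RouteSketch`, `bundle_revision`, and `needs_refresh` are hypothetical names, the route fields are simplified, and the standard library's `DefaultHasher` stands in for whatever hash the project actually uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Simplified, hypothetical stand-in for a resolved route in the bundle.
#[derive(Hash)]
struct RouteSketch {
    endpoint: String,
    model: String,
    timeout_secs: u64,
}

/// Hashes every route, including its timeout, into one revision value.
fn bundle_revision(routes: &[RouteSketch]) -> u64 {
    let mut hasher = DefaultHasher::new();
    for route in routes {
        route.hash(&mut hasher);
    }
    hasher.finish()
}

/// A sandbox would refresh its cached routes when the revision differs.
fn needs_refresh(cached_revision: u64, latest_revision: u64) -> bool {
    cached_revision != latest_revision
}

fn main() {
    // Placeholder endpoint and model values, used only for the demonstration.
    let route = |timeout_secs| RouteSketch {
        endpoint: "http://localhost:11434".to_string(),
        model: "example-model".to_string(),
        timeout_secs,
    };

    let cached = bundle_revision(&[route(60)]);
    let latest = bundle_revision(&[route(300)]);

    // Only the timeout changed, yet the revision differs, so a refresh is due.
    println!("refresh needed: {}", needs_refresh(cached, latest));
}
```

If the timeout were left out of the hash, a change to it alone would produce the same revision, and running sandboxes would keep the old value until some other route field changed.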

Testing

  • mise run pre-commit passes
  • Unit tests added/updated
  • E2E tests added/updated (if applicable)

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)
  • Architecture docs updated (if applicable)

@pentschev pentschev requested a review from a team as a code owner March 30, 2026 07:13
@github-actions

github-actions bot commented Mar 30, 2026

All contributors have signed the DCO ✍️ ✅
Posted by the DCO Assistant Lite bot.

@pentschev
Author

I have read the DCO document and I hereby sign the DCO.

@pentschev
Author

recheck

Development

Successfully merging this pull request may close these issues.

bug(proxy): 60s reqwest total timeout kills streaming inference responses mid-generation