fix(core): add context-prep timeout and NoProviders backoff to agent loop #3373
Merged
…loop

When all LLM providers failed, the agent stalled inside advance_context_lifecycle for 14+ seconds (1006 embed calls against rate-limited backends) and then immediately retried the same expensive path on every subsequent turn.

- Wrap advance_context_lifecycle with tokio::time::timeout via the new advance_context_lifecycle_guarded helper; configurable via [timeouts] context_prep_timeout_secs (default 30 s)
- After a NoProviders error, record the failure timestamp and sleep no_providers_backoff_secs (default 2 s); skip context prep on the next turn while still within the backoff window
- Add AgentError::is_no_providers() predicate used by the backoff guard
- Remove TaskSupervisor BlockingSpawner from CodeIndexer in agent_setup to prevent 971 concurrent chunk tasks from flooding the async worker pool during active agent turns

Closes #3357
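The timeout guard described above can be illustrated with a minimal, std-only sketch. This is not the PR's code: the PR wraps an async call with tokio::time::timeout, whereas this sketch swaps in a helper thread plus `mpsc::Receiver::recv_timeout` so that it runs without an async runtime. The names `Guarded` and `run_guarded` are hypothetical.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Outcome of a guarded call (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
pub enum Guarded<T> {
    Completed(T),
    TimedOut,
}

/// Run `work` on a helper thread and stop waiting after `limit`.
///
/// Assumption: this is a thread-based analog of a timeout wrapper, not the
/// tokio-based helper from the PR. Unlike dropping a future, the helper
/// thread is not cancelled; it keeps running after the caller gives up.
pub fn run_guarded<T, F>(limit: Duration, work: F) -> Guarded<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be gone if the caller timed out,
        // so a failed send is deliberately ignored.
        let _ = tx.send(work());
    });
    match rx.recv_timeout(limit) {
        Ok(value) => Guarded::Completed(value),
        // Either the limit elapsed or the worker panicked and dropped the
        // sender; this sketch treats both as "no result in time".
        Err(_) => Guarded::TimedOut,
    }
}
```

A caller would pass the configured context-prep limit as `limit` and treat `Guarded::TimedOut` as a signal to continue the turn without freshly prepared context.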
This was referenced Apr 24, 2026
fix(config): context_prep_timeout_secs and no_providers_backoff_secs missing from default.toml #3377 (Closed)
bug-ops added a commit that referenced this pull request Apr 24, 2026
Add llm_request_timeout_secs (600 s), context_prep_timeout_secs (30 s), and no_providers_backoff_secs (2 s) to the [timeouts] section with descriptive comments. These fields were added in #3373 but omitted from the reference config, making them invisible to migrate-config --diff and to users reading the config file. Closes #3377
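The defaults listed in that commit message could be mirrored on the Rust side roughly as follows. This is a sketch under assumptions: the three field names and default values come from the text above, while the struct layout, the derives, and the two `Duration` helper methods are hypothetical and may not match the real `TimeoutConfig`.

```rust
use std::time::Duration;

/// Hypothetical mirror of the [timeouts] config section.
/// Field names follow the config keys named in the commit message;
/// everything else about this type is an assumption.
#[derive(Debug, Clone, PartialEq)]
pub struct TimeoutConfig {
    pub llm_request_timeout_secs: u64,
    pub context_prep_timeout_secs: u64,
    pub no_providers_backoff_secs: u64,
}

impl Default for TimeoutConfig {
    fn default() -> Self {
        Self {
            llm_request_timeout_secs: 600, // 600 s
            context_prep_timeout_secs: 30, // 30 s
            no_providers_backoff_secs: 2,  // 2 s
        }
    }
}

impl TimeoutConfig {
    /// Upper bound for one context-preparation pass (hypothetical helper).
    pub fn context_prep_timeout(&self) -> Duration {
        Duration::from_secs(self.context_prep_timeout_secs)
    }

    /// Pause applied after a NoProviders failure (hypothetical helper).
    pub fn no_providers_backoff(&self) -> Duration {
        Duration::from_secs(self.no_providers_backoff_secs)
    }
}
```

Keeping the raw values in whole seconds matches the `_secs` key names, and converting to `Duration` in one place avoids unit mix-ups at call sites.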
Summary
- advance_context_lifecycle_guarded: wraps context preparation with tokio::time::timeout (default 30 s, configurable via [timeouts] context_prep_timeout_secs) to prevent 14+ second stalls when embed backends are rate-limited or unavailable
- After a NoProviders error, record the failure timestamp, sleep no_providers_backoff_secs (default 2 s), and skip context prep on the next turn while still within the backoff window
- Add AgentError::is_no_providers() predicate used by the backoff guard
- Remove the TaskSupervisor BlockingSpawner attachment from CodeIndexer in agent_setup.rs to prevent flooding the async worker pool with 971+ concurrent chunk tasks during active agent turns

Root Cause (from investigation)

The "150+/s agent.turn" and "128-call prepare_context burst" reported in #3357 were async tracing artifacts (#[tracing::instrument] emits B/E events at every tokio poll boundary). The real issue was a single turn stalling for 14 seconds in advance_context_lifecycle due to 1006 embed calls against rate-limited/unavailable providers, compounded by 971 concurrent background indexer tasks saturating the tokio worker pool.

Test plan

- cargo nextest run --config-file .github/nextest.toml --workspace --lib --bins
- TimeoutConfig defaults/deserialization, LifecycleState backoff gate logic, is_no_providers() predicate
- cargo +nightly fmt --check clean
- cargo clippy --workspace -- -D warnings clean

Closes #3357
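The backoff gate and the is_no_providers() predicate named in the test plan might look roughly like this. It is a sketch, not the merged code: the `NoProviders` variant name is inferred from the PR text, and the `Other` variant, the `last_no_providers_at` field, and the `record_failure` and `should_skip_context_prep` methods are hypothetical. Time is passed in explicitly so the gate can be checked without sleeping.

```rust
use std::time::{Duration, Instant};

/// Simplified stand-in for the agent error type (assumption: the real enum
/// has more variants and may carry data).
#[derive(Debug)]
pub enum AgentError {
    NoProviders,
    Other(String),
}

impl AgentError {
    /// True only for the "all providers failed" case.
    pub fn is_no_providers(&self) -> bool {
        matches!(self, AgentError::NoProviders)
    }
}

/// Hypothetical slice of the lifecycle state that tracks the backoff window.
#[derive(Debug, Default)]
pub struct LifecycleState {
    last_no_providers_at: Option<Instant>,
}

impl LifecycleState {
    /// Remember when a NoProviders failure happened; ignore other errors.
    pub fn record_failure(&mut self, err: &AgentError, now: Instant) {
        if err.is_no_providers() {
            self.last_no_providers_at = Some(now);
        }
    }

    /// Skip context prep while `now` is still inside the backoff window.
    pub fn should_skip_context_prep(&self, now: Instant, backoff: Duration) -> bool {
        match self.last_no_providers_at {
            Some(at) => now.saturating_duration_since(at) < backoff,
            None => false,
        }
    }
}
```

With the 2 s default, a turn that starts 1 s after the failure would skip context prep, while a turn that starts 2 s or more after it would run context prep again.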