feat: enforce lifecycle.startup_timeout for MCP/LSP toolset startup #3373
Merged
docker-agent (Contributor) reviewed on Jul 1, 2026 and left a comment:
Assessment: 🟡 NEEDS ATTENTION
This PR correctly implements the startup-timeout enforcement mechanism for MCP/LSP toolset startup with a well-thought-out single-inflight-connect guarantee. Two findings worth the author's attention are filed as inline comments.
Race Connect against a timer so a server that hangs mid-handshake is bounded by startup_timeout instead of blocking indefinitely. The strict profile defaults to 30s; other profiles default to 0 (no timeout).
Assisted-By: docker-agent
…imeout Keep at most one connector.Connect in flight via inflightConnect/pendingConnect. A timed-out Connect is adopted by the next Start or reaped by Stop, preventing a late session from clobbering/closing a newer shared session.
Assisted-By: Claude <noreply@anthropic.com>
Addresses PR review feedback:
- Launch the timed-out Connect goroutine with context.WithoutCancel so that a later adopting Start does not receive a stale context.Canceled when the first caller's context is cancelled (this matters for ctx-respecting connectors such as LSP).
- Defensively close the adopted session if the connector returns both a session and an error, since Start discards the session on error.
Branch updated from b07e435 to 328b837.
Author (Member): Addressed both review findings, rebased on
Sayt-0 approved these changes on Jul 1, 2026.
What
Enforces `lifecycle.startup_timeout` for MCP/LSP toolset startup. Until now this config field (and its `strict`-profile 30s default) was parsed and validated but never acted on: `EffectiveStartupTimeout()` had no callers, so a server that hung mid-`initialize` would block startup indefinitely.
Why
The MCP `clientConnector.Connect` deliberately detaches its context with `context.WithoutCancel` (the session must outlive the request that triggered it). As a consequence, the `initialize` handshake has no deadline of its own: a server that accepts the connection but never completes `initialize` wedges the startup attempt forever, and `startup_timeout` could not rescue it because nothing consumed the value.
How
- Add `StartupTimeout` to `lifecycle.Policy` and wire `EffectiveStartupTimeout()` into `PolicyFromConfig`.
- Enforce the timeout in `Supervisor.connect()` by racing `Connect` against a timer rather than a `ctx` deadline (a deadline would be stripped by `WithoutCancel`). On expiry, `Start` returns `ErrInitTimeout` and the toolset stays `Stopped`, so the runtime retries on the next turn.
- Ensure at most one `Connect` is ever in flight: a timed-out `Connect` goroutine is left running and recorded in `inflightConnect`; the next `Start` adopts it and `Stop` reaps it. This is required because the MCP transport shares one underlying client across `Connect` calls, so two overlapping connects would race on the shared session (a late one could clobber or close a newer one). An `adopted` flag under the mutex ensures the resulting session is adopted or closed exactly once.
Scope
- `startup_timeout` bounds only the initial `Start`, not background watcher reconnects (which already have their own restart/backoff budget).
- `strict` → 30s default; other profiles → 0 (no timeout); an explicit value always wins.
Tests
New supervisor tests (all pass under `-race`):
- a timeout returns `ErrInitTimeout` with a single `Connect`;
- the next `Start` adopts a late-completing connect;
- `Stop` reaps and closes a late connect.
A new `PolicyFromConfig` test covers the nil/resilient/strict/explicit cases.