
fix(lsp): Add crash detection and exponential backoff retry #9142

Open
sauerdaniel wants to merge 3 commits into anomalyco:dev from sauerdaniel:pr/lsp-resilience

Contributor

@sauerdaniel sauerdaniel commented Jan 17, 2026

Summary

Add exponential backoff retry logic for LSP servers that crash or fail to spawn, preventing permanent disabling of LSP functionality.

Fixes #13785

Problem

When an LSP server fails to spawn or crashes, it gets added to a broken set and is never retried during the session. This means if an LSP server temporarily fails (e.g., due to resource constraints), the user loses LSP features for the entire session.

Solution

Replace the simple broken: Set<string> with a Map that tracks:

  • failTime: When the failure occurred
  • attemptCount: Number of consecutive failures

Use exponential backoff (2s, 4s, 8s, ... up to 30s max) to retry failed servers instead of permanently disabling them.

Changes

  • packages/opencode/src/lsp/index.ts:
    • Change broken from Set<string> to Map<string, {failTime, attemptCount}>
    • Add isBrokenWithBackoff() function with exponential backoff calculation
    • Update failure tracking to increment attempt count
    • Reset attempt count on successful spawn
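The tracking described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the constant names, `backoffDelay`, `markBroken`, `markHealthy`, and the `now` parameter are hypothetical, and the signature given to `isBrokenWithBackoff` is an assumption. The constants are chosen only to reproduce the 2s, 4s, 8s, ... 30s schedule stated in the description.

```typescript
// Hypothetical constants, picked to match the 2s, 4s, 8s, ... 30s schedule.
const BASE_DELAY_MS = 2_000
const MAX_DELAY_MS = 30_000

interface BrokenEntry {
  failTime: number // epoch milliseconds of the most recent failure
  attemptCount: number // consecutive failures so far
}

// Replaces a plain Set<string>: the value records when and how often a server failed.
const broken = new Map<string, BrokenEntry>()

// Delay to wait after `attemptCount` consecutive failures, doubling up to the cap.
function backoffDelay(attemptCount: number): number {
  return Math.min(BASE_DELAY_MS * 2 ** (attemptCount - 1), MAX_DELAY_MS)
}

// Record a failure and increment the consecutive attempt count.
function markBroken(key: string, now: number = Date.now()): void {
  const previous = broken.get(key)
  broken.set(key, { failTime: now, attemptCount: (previous?.attemptCount ?? 0) + 1 })
}

// Clear the entry after a successful spawn so the next failure starts at 2s again.
function markHealthy(key: string): void {
  broken.delete(key)
}

// True while the server is still inside its backoff window and should be skipped.
function isBrokenWithBackoff(key: string, now: number = Date.now()): boolean {
  const entry = broken.get(key)
  if (!entry) return false
  return now - entry.failTime < backoffDelay(entry.attemptCount)
}
```

Passing `now` explicitly is a choice made for this sketch so the window logic can be exercised without real timers. A caller would skip spawning while `isBrokenWithBackoff(key)` returns true and try again once it returns false.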

Testing

  • TypeScript compilation passes (bun turbo typecheck)
  • Unit tests pass (725 tests, 0 failures)
  • LSP client tests pass (3 tests)

Note: Manual LSP crash testing (killing LSP server and verifying retry) was not performed.

@github-actions
Contributor

The following comment was made by an LLM, it may be inaccurate:

No duplicate PRs found

@sauerdaniel
Contributor Author

Status update: implementation is complete and linked issue coverage remains intact (Fixes #13785).

Current blocker is CI only:

  • e2e (windows) fails with repeated NotFoundError
  • test (linux) then fails on the upstream gate because e2e is red

No new code changes pending on this PR right now.

When an LSP process crashes or exits unexpectedly, the client
remains in the s.clients array forever, leaking memory and
preventing reconnection attempts.

Changes:
- Add exit event listener on LSP process handle
- Remove dead client from s.clients array on exit
- Mark server as broken for retry mechanism
- Log crash with exit code and signal
- Publish update event for state sync
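
As a rough illustration of the exit handling listed above, the sketch below removes a dead client and records the failure. Everything here is an assumption made for the example: the `Client` and `State` shapes, the `ProcessHandle` interface (modeled on the `exit` event of a Node child process, which passes the exit code and signal), the `brokenKey` helper, and the `onUpdate` callback standing in for the published update event.

```typescript
// Listener shape modeled on the `exit` event of a Node child process.
type ExitListener = (code: number | null, signal: string | null) => void

// Hypothetical minimal view of the LSP process handle.
interface ProcessHandle {
  once(event: "exit", listener: ExitListener): unknown
}

// Hypothetical client and state shapes for this sketch.
interface Client {
  serverID: string
  root: string
}

interface State {
  clients: Client[]
  broken: Map<string, { failTime: number; attemptCount: number }>
}

// Hypothetical key format; the real module may derive its key differently.
function brokenKey(client: Client): string {
  return `${client.root}:${client.serverID}`
}

function watchForCrash(s: State, client: Client, handle: ProcessHandle, onUpdate: () => void): void {
  handle.once("exit", (code, signal) => {
    const index = s.clients.indexOf(client)
    if (index === -1) return // already removed, for example by a normal shutdown

    // Drop the dead client so it is not reused and can be garbage collected.
    s.clients.splice(index, 1)

    // Mark the server as broken so the retry mechanism can pick it up later.
    const key = brokenKey(client)
    const attemptCount = (s.broken.get(key)?.attemptCount ?? 0) + 1
    s.broken.set(key, { failTime: Date.now(), attemptCount })

    // Log the crash with exit code and signal, then notify listeners.
    console.error(`LSP server ${client.serverID} exited`, { code, signal })
    onUpdate()
  })
}
```

The early return when the client is no longer in `s.clients` is a design choice of this sketch: it keeps an intentional shutdown, which removes the client first, from being counted as a crash.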

Instead of permanently marking LSP servers as broken, use exponential
backoff to retry them. This allows recovery from transient failures
like resource exhaustion or temporary network issues.

Changes:
- Convert broken from Set<string> to Map with failTime and attemptCount
- Add isBrokenWithBackoff() to calculate retry delay using SessionRetry.delay()
- Clear broken state on successful connection
- Track attempt count across retries for proper backoff

Inline the backoff constants and calculation directly in the LSP module
to avoid a circular dependency chain:
lsp/index.ts -> session/retry.ts -> session/message-v2.ts -> lsp/index.ts

This was causing LSP.Range to be undefined when message-v2.ts tried to
reference it, breaking all tests.

Development

Successfully merging this pull request may close these issues.

LSP server crashes permanently disable language features for the session