generated from MetaMask/metamask-module-template
-
Notifications
You must be signed in to change notification settings - Fork 6
feat(ocap-kernel): add permanent failure detection for reconnection #789
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
sirtimid
wants to merge
11
commits into
main
Choose a base branch
from
sirtimid/permanent-failure-detection
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add error pattern tracking to ReconnectionManager to detect when a peer is permanently unreachable. When the same error code (ECONNREFUSED, EHOSTUNREACH, ENOTFOUND, or ENETUNREACH) occurs consecutively N times (default 5), the peer is marked as permanently failed and reconnection attempts stop. Changes: - Add error history tracking per peer in ReconnectionManager - Add isPermanentlyFailed() and clearPermanentFailure() methods - Add getNetworkErrorCode() helper to extract error codes - Integrate error recording into reconnection lifecycle - Check permanent failure status before attempting reconnection Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add comprehensive unit tests for: - getNetworkErrorCode helper function - Error tracking in ReconnectionManager (recordError, getErrorHistory) - Permanent failure detection (isPermanentlyFailed) - Clearing permanent failure state (clearPermanentFailure) - Custom consecutive error threshold - Integration with startReconnection, clearPeer, and clear Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add integration tests for permanent failure detection in the reconnection lifecycle: - Gives up when peer is permanently failed at start of loop - Records errors after failed dial attempts - Gives up when error triggers permanent failure - Continues retrying when error does not trigger failure - handleConnectionLoss skips reconnection for permanently failed peers Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Update existing tests to work with permanent failure detection changes: - Add getNetworkErrorCode export to index test - Add getNetworkErrorCode mock to transport tests - Update startReconnection mocks to return true - Add isPermanentlyFailed and recordError mocks to ReconnectionManager Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Contributor
Coverage Report
File Coverage
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Fix issues identified in code review: - Add bounds validation for consecutiveErrorThreshold (must be >= 1) - Cap error history to prevent unbounded memory growth The error history is now limited to the threshold size since we only need the last N errors for pattern detection. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When a user explicitly calls reconnectPeer, clear the permanent failure status so the reconnection can proceed. Previously, permanently failed peers could not be manually reconnected because startReconnection would return false without attempting any connection. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Export both getNetworkErrorCode and isResourceLimitError from kernel-errors - Handle rate limit and connection limit errors before permanent failure check - Add both mocks to transport tests Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…rogress Move isReconnecting() check before clearPermanentFailure() in reconnectPeer to prevent resetting error history during active reconnection attempts. Previously, calling reconnectPeer while reconnection was in progress would clear the accumulated error history, defeating the purpose of permanent failure detection by resetting progress toward the failure threshold. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Include ENOTFOUND (DNS lookup failed) in isRetryableNetworkError to enable permanent failure detection for this error code. Previously, ENOTFOUND was in PERMANENT_FAILURE_ERROR_CODES but not retryable, causing immediate give-up after the first error without allowing errors to accumulate toward the permanent failure threshold. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
resetBackoff() and resetAllBackoffs() now clear error history in addition to resetting attempt counts. This prevents stale errors from accumulating and triggering false permanent failure detection. Previously, if a peer had 4 ECONNREFUSED errors, then successfully connected, then had 1 more ECONNREFUSED error, it would be marked as permanently failed (4+1=5). Now the error history is cleared on successful communication, so only consecutive errors without intervening successes trigger permanent failure. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Closes #688
Summary
ReconnectionManagerto detect permanently unreachable peersChanges
kernel-errors:
getNetworkErrorCode()helper to extract error codes from errorsocap-kernel:
ReconnectionStatewitherrorHistoryandpermanentlyFailedfieldsrecordError(),isPermanentlyFailed(),clearPermanentFailure()methods toReconnectionManagerstartReconnection()to returnfalsefor permanently failed peers and reset error historyTest plan
getNetworkErrorCodehelperReconnectionManager🤖 Generated with Claude Code
Note
Medium Risk
Changes core reconnection control flow by adding stateful error-pattern tracking and an early-exit path that can stop retries; misclassification or integration bugs could cause peers to be marked failed and never reconnect until manually cleared.
Overview
Adds permanent-failure detection to remote reconnection:
ReconnectionManagernow tracks per-peererrorHistory(capped) and marks peerspermanentlyFailedafter N consecutive identical errors from a configured set, preventing further automatic reconnection.Integrates this into
reconnection-lifecycleby extracting an error code via newkernel-errorshelpergetNetworkErrorCode, recording it on failures, and giving up immediately when a peer becomes permanently failed (including whenstartReconnectionnow returnsfalse). ManualreconnectPeernow clears permanent-failure state before retrying, andisRetryableNetworkErroraddsENOTFOUNDto the retryable set; tests are expanded/updated acrosskernel-errors, lifecycle, transport, and reconnection manager.Written by Cursor Bugbot for commit 7c82f74. This will update automatically on new commits. Configure here.