fix: peer disconnection #296
Merged
Improve resilience across IPC workers, libp2p streams, and P2P broadcasts:
- RPCSubWorkerManager: Retry sending to a healthy worker, handle IPC send errors (ERR_IPC_CHANNEL_CLOSED/ERR_IPC_DISCONNECTED), log when all workers are unavailable, and respawn workers on exit via replaceWorker.
- RPCSubWorker: Exit on parent IPC disconnect to avoid zombie children, and guard/catch process.send failures to drop responses when the channel is closed.
- P2PManager: Add a per-peer mempool broadcast timeout (8s) and wrap broadcasts in sendToPeerWithTimeout so that slow or blocked peers cannot stall broadcasts, and so that failures and timeouts are logged.
- ReusableStream: Add a 5s drain timeout (DRAIN_TIMEOUT_MS) and race onDrain() against the timeout; close the stream and surface an error on timeout to avoid permanently stuck writers.
- ReusableStreamManager: Add a 5s dial timeout when opening outbound streams using AbortSignal.timeout.

These changes prevent hanging operations and resource leaks, and improve logging and automatic recovery in the face of slow peers or broken IPC channels.
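For reviewers, the drain-timeout idea described above can be sketched roughly as follows. This is a minimal illustration, not the code in ReusableStream: `withTimeout`, `TimeoutError`, `DrainableStream` and `waitForDrain` are hypothetical names, and only `DRAIN_TIMEOUT_MS` (5s) and the `onDrain()` race come from the commit message.

```typescript
// Hypothetical sketch: helper and type names are illustrative, not from the PR.
const DRAIN_TIMEOUT_MS = 5_000; // value stated in the commit message

class TimeoutError extends Error {
    constructor(label: string, ms: number) {
        super(`${label} timed out after ${ms}ms`);
        this.name = 'TimeoutError';
    }
}

// Race an operation against a timer. The timer is always cleared so it
// cannot keep the process alive after the operation settles.
function withTimeout<T>(op: Promise<T>, ms: number, label: string): Promise<T> {
    let timer: ReturnType<typeof setTimeout> | undefined;
    const timeout = new Promise<never>((_, reject) => {
        timer = setTimeout(() => reject(new TimeoutError(label, ms)), ms);
    });
    return Promise.race([op, timeout]).finally(() => {
        if (timer !== undefined) clearTimeout(timer);
    });
}

// Assumed minimal shape of a stream that exposes backpressure.
interface DrainableStream {
    onDrain(): Promise<void>;
    close(): Promise<void>;
}

// Wait for the stream to drain. If that fails or times out, close the
// stream (ignoring close errors) and rethrow so the writer sees the failure.
async function waitForDrain(
    stream: DrainableStream,
    ms: number = DRAIN_TIMEOUT_MS,
): Promise<void> {
    try {
        await withTimeout(stream.onDrain(), ms, 'drain');
    } catch (err) {
        await stream.close().catch(() => undefined);
        throw err;
    }
}
```

The same `withTimeout` shape would also fit the per-peer broadcast timeout; the drain case is shown because closing the stream on timeout is what prevents a permanently stuck writer.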
Add fatal error logging and shutdown handlers: introduce writeFatal to serialize error stacks to 'uncaught-exception.log' and write to stderr, and wire it into process.on handlers for 'uncaughtException' and 'unhandledRejection'. Replace the previous console.log usage with writeFatal and call process.exit(1) to ensure the process exits after recording the failure for post-mortem debugging.
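A rough sketch of the fatal-handler wiring described in this commit, for illustration only. The `writeFatal` name and the 'uncaught-exception.log' file name come from the message; `formatFatal`, `installFatalHandlers`, the timestamp format and the choice of `appendFileSync` are assumptions.

```typescript
import { appendFileSync } from 'node:fs';

const FATAL_LOG = 'uncaught-exception.log'; // file name from the commit message

// Hypothetical helper: turn any thrown value into one log entry.
function formatFatal(kind: string, err: unknown): string {
    const detail = err instanceof Error ? (err.stack ?? err.message) : String(err);
    return `[${new Date().toISOString()}] ${kind}: ${detail}\n`;
}

// Record the failure in the log file and on stderr. File errors are
// swallowed so that logging itself can never throw inside a fatal handler.
function writeFatal(kind: string, err: unknown, logPath: string = FATAL_LOG): void {
    const entry = formatFatal(kind, err);
    try {
        appendFileSync(logPath, entry);
    } catch {
        // Nothing sensible to do if the disk write fails; stderr still gets it.
    }
    process.stderr.write(entry);
}

// Assumed wiring: record the failure, then exit non-zero.
function installFatalHandlers(): void {
    process.on('uncaughtException', (err) => {
        writeFatal('uncaughtException', err);
        process.exit(1);
    });
    process.on('unhandledRejection', (reason) => {
        writeFatal('unhandledRejection', reason);
        process.exit(1);
    });
}
```

Synchronous file I/O is used here on the assumption that the process is about to exit, so an asynchronous write might not be flushed in time.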
Refactor peer fan-out and witness validation; add process fatal handlers.
- PoC: Rework onBlockProcessed to update consensus height synchronously and chain witness fan-out behind a blockProcessedLock to preserve ordering while isolating failures.
- P2PManager: Introduce runPeerOp to wrap per-peer operations with a hard timeout, unified error logging and isolation; replace direct peer calls with runPeerOp for mempool broadcast, witness requests and witness broadcasts.
- BlockWitnessManager: Make block header signature validation async, add an RPC timeout when requesting block data, and yield between witness verifications to avoid blocking the event loop.
- Globals: Install guarded process-wide uncaughtException/unhandledRejection handlers that append stack traces to a log file and write to stderr before exiting.
- RustContract: Remove duplicated fatal handler/fs import now provided by Globals.

These changes reduce the risk of slow or misbehaving peers blocking batch operations, improve responsiveness during witness validation, and ensure fatal errors are captured for post-mortem debugging.
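The "yield between witness verifications" point can be illustrated with a small sketch. `verifyAll` and its parameters are hypothetical and not the BlockWitnessManager API; the only assumption about the runtime is Node's promise-based `setImmediate` from `node:timers/promises`.

```typescript
import { setImmediate as yieldToEventLoop } from 'node:timers/promises';

// Hypothetical helper: verify items one at a time and hand control back to
// the event loop after each one, so pending I/O and timers are not starved
// by a long batch.
async function verifyAll<W>(
    witnesses: readonly W[],
    verify: (witness: W) => Promise<boolean>,
): Promise<W[]> {
    const valid: W[] = [];
    for (const witness of witnesses) {
        if (await verify(witness)) {
            valid.push(witness);
        }
        await yieldToEventLoop(); // let queued I/O callbacks and timers run
    }
    return valid;
}
```

Yielding after every item trades a little throughput for responsiveness; yielding every N items would be a reasonable variation if verification is cheap.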
Remove unnecessary await when delegating to a synchronous onBlockProcessed handler in PoC.ts and delete related explanatory comments about background chaining. Also cast StreamMessageEvent.data to Uint8Array in ReusableStream.ts to satisfy the handler's expected type. These are primarily cleanup/typing fixes with no behavioral change.
Add interface-datastore@^10.0.1 to package.json and adjust P2PManager.ts imports to bring Datastore in as a runtime import instead of a type-only import. This ensures the Datastore symbol is available where needed at runtime and avoids type-only import issues.
Make onBlockProcessed async and use blockProcessedLock to serialize height broadcasts and surface errors. Replaces the previous IIFE chaining with an explicit promise sequence: send WITNESS_HEIGHT_UPDATE via sendMessageToAllThreads (assigned to blockProcessedLock) and await it with error logging, then asynchronously dispatch WITNESS_BLOCK_PROCESSED to the witness thread with its own error handling. Also move p2p.updateConsensusHeight after the broadcast steps and simplify control flow.
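The serialization via a promise "lock" can be sketched as below. This is a simplified illustration under stated assumptions, not the PoC.ts implementation: `HeightBroadcaster`, `send` and `onError` are hypothetical stand-ins for the real thread messaging, and only the `blockProcessedLock` name and the ordering/error-isolation goals come from the commit messages.

```typescript
// Hypothetical sketch of serializing broadcasts behind a promise chain.
class HeightBroadcaster {
    private blockProcessedLock: Promise<void> = Promise.resolve();
    private readonly send: (height: number) => Promise<void>;
    private readonly onError: (err: unknown) => void;

    constructor(
        send: (height: number) => Promise<void>,
        onError: (err: unknown) => void = (err) => console.error(err),
    ) {
        this.send = send;
        this.onError = onError;
    }

    // Each call chains onto the previous broadcast, so heights go out in
    // call order. A failed broadcast is reported through onError and the
    // chain continues, so one failure does not block later heights.
    public onBlockProcessed(height: number): Promise<void> {
        this.blockProcessedLock = this.blockProcessedLock
            .then(() => this.send(height))
            .catch((err: unknown) => this.onError(err));
        return this.blockProcessedLock;
    }
}
```

Callers can await the returned promise when they need to know the broadcast for their height has been attempted, or ignore it when ordering alone is enough.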
Description
Type of Change
Checklist
Build & Tests
- npm install completes without errors
- npm run build completes without errors
- npm test passes all tests
Code Quality
Documentation
Security
OP_NET Node Specific
Testing
Consensus Impact
Related Issues
By submitting this PR, I confirm that my contribution is made under the terms of the project's license.