
fix: peer disconnection #296

Merged: BlobMaster41 merged 9 commits into main from fix/p2p-crash on May 12, 2026


@BlobMaster41 (Contributor) commented May 10, 2026

Description

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Performance improvement
  • Consensus change (changes that affect state calculation or validation)
  • Refactoring (no functional changes)
  • Documentation update
  • CI/CD changes
  • Dependencies update

Checklist

Build & Tests

  • npm install completes without errors
  • npm run build completes without errors
  • npm test passes all tests

Code Quality

  • Code follows the project's coding standards
  • No new compiler warnings introduced
  • Error handling is appropriate
  • Logging is appropriate for debugging and monitoring

Documentation

  • Code comments added for complex logic
  • Public APIs are documented
  • README updated (if applicable)

Security

  • No sensitive data (keys, credentials) committed
  • No new security vulnerabilities introduced
  • RPC endpoints properly authenticated
  • Input validation in place for external data

OP_NET Node Specific

  • Changes are compatible with existing network state
  • Consensus logic changes are documented and tested
  • State transitions are deterministic
  • WASM VM execution is reproducible across nodes
  • P2P protocol changes are backward-compatible (or migration planned)
  • Database schema changes include migration path
  • Epoch finality and PoC/PoW logic unchanged (or documented if changed)

Testing

Consensus Impact

Related Issues


By submitting this PR, I confirm that my contribution is made under the terms of the project's license.

Improve resilience across IPC workers, libp2p streams, and P2P broadcasts:

- RPCSubWorkerManager: Retry sending to a healthy worker, handle IPC send errors (ERR_IPC_CHANNEL_CLOSED/ERR_IPC_DISCONNECTED), log when all workers are unavailable, and respawn workers on exit via replaceWorker.
- RPCSubWorker: Exit on parent IPC disconnect to avoid zombie children, and guard/catch process.send failures to drop responses when the channel is closed.
- P2PManager: Add a per-peer mempool broadcast timeout (8s) and wrap broadcasts in sendToPeerWithTimeout to avoid slow/blocked peers stalling broadcasts and to log failures/timeouts.
- ReusableStream: Add a 5s drain timeout (DRAIN_TIMEOUT_MS) and race onDrain() against the timeout; close the stream and surface an error on timeout to avoid permanently stuck writers.
- ReusableStreamManager: Add a 5s dial timeout when opening outbound streams using AbortSignal.timeout.
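The RPCSubWorkerManager retry behavior in the first bullet can be sketched as follows. This is a minimal sketch, not the PR's code: `IpcWorker`, `WorkerPool`, and `replace` are hypothetical names, and the worker interface is only loosely modelled on Node's `ChildProcess` IPC surface (`connected`, `send(message, callback)`). The pool skips disconnected workers, moves on when a send fails with one of the two IPC error codes the PR lists, rethrows anything else, and logs when no worker is left. In Node, a send on a closed channel is typically reported through the callback (or an `'error'` event) rather than thrown, so a real implementation would need to handle both paths.

```typescript
// Hypothetical worker shape, loosely modelled on Node's ChildProcess IPC API.
interface IpcWorker {
    readonly id: number;
    readonly connected: boolean;
    send(message: unknown, callback: (err: Error | null) => void): boolean;
}

// Error codes the PR treats as "channel is gone".
const IPC_GONE = new Set(['ERR_IPC_CHANNEL_CLOSED', 'ERR_IPC_DISCONNECTED']);

function isIpcGone(err: unknown): boolean {
    return typeof err === 'object' && err !== null && IPC_GONE.has(String((err as { code?: unknown }).code));
}

class WorkerPool {
    private next = 0;

    constructor(
        private readonly workers: IpcWorker[],
        private readonly log: (msg: string) => void = console.warn,
    ) {}

    /** Try each worker at most once, starting at the round-robin cursor. */
    send(message: unknown): IpcWorker | null {
        for (let attempt = 0; attempt < this.workers.length; attempt++) {
            const worker = this.workers[(this.next + attempt) % this.workers.length];
            if (!worker || !worker.connected) continue; // skip dead workers
            try {
                worker.send(message, (err) => {
                    // Asynchronous send failures arrive here; log instead of crashing.
                    if (err) this.log(`worker ${worker.id} send failed: ${err.message}`);
                });
                this.next = (this.next + attempt + 1) % this.workers.length;
                return worker;
            } catch (err) {
                if (!isIpcGone(err)) throw err; // unexpected errors are not swallowed
                // The channel closed between the check and the send: try the next worker.
            }
        }
        this.log('all RPC sub-workers are unavailable; dropping message');
        return null;
    }

    /** Swap in a freshly spawned worker, e.g. from a child 'exit' handler. */
    replace(oldId: number, fresh: IpcWorker): void {
        const i = this.workers.findIndex((w) => w.id === oldId);
        if (i >= 0) this.workers[i] = fresh;
    }
}
```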

These changes prevent hanging operations and resource leaks, and improve logging and automatic recovery in the face of slow peers or broken IPC channels.
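The ReusableStream drain timeout amounts to racing `onDrain()` against a timer. A sketch under stated assumptions: `DrainableStream` is a hypothetical minimal interface (the real stream type is not shown in the PR), `withTimeout`, `waitForDrain`, and `TimeoutError` are illustrative helpers, and only the 5 s `DRAIN_TIMEOUT_MS` value comes from the commit message. The timer is cleared in `finally`, so a drain that completes in time leaves nothing pending.

```typescript
// Value taken from the commit message (5 s); the constant name mirrors the PR.
const DRAIN_TIMEOUT_MS = 5_000;

class TimeoutError extends Error {
    constructor(label: string, ms: number) {
        super(`${label} timed out after ${ms} ms`);
        this.name = 'TimeoutError';
    }
}

/** Race `work` against a timer; the timer is always cleared afterwards. */
async function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
    let timer: ReturnType<typeof setTimeout> | undefined;
    const timeout = new Promise<never>((_, reject) => {
        timer = setTimeout(() => reject(new TimeoutError(label, ms)), ms);
    });
    try {
        return await Promise.race([work, timeout]);
    } finally {
        clearTimeout(timer);
    }
}

// Hypothetical stream shape: onDrain() resolves once the write buffer empties.
interface DrainableStream {
    onDrain(): Promise<void>;
    close(err: Error): void;
}

async function waitForDrain(stream: DrainableStream, ms: number = DRAIN_TIMEOUT_MS): Promise<void> {
    try {
        await withTimeout(stream.onDrain(), ms, 'stream drain');
    } catch (err) {
        // Close the stream so the writer is released, then surface the error.
        stream.close(err as Error);
        throw err;
    }
}
```

The timeout does not cancel `onDrain()` itself; closing the stream is what releases the stuck writer, which is why the sketch closes before rethrowing.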
Add fatal error logging and shutdown handlers: introduce writeFatal to serialize error stacks to 'uncaught-exception.log' and write to stderr, and wire it into process.on handlers for 'uncaughtException' and 'unhandledRejection'. Replace the previous console.log usage with writeFatal and call process.exit(1) to ensure the process exits after recording the failure for post-mortem debugging.
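One possible shape for `writeFatal`, assuming the log-file and stderr writes are passed in through `FatalSinks`, a hypothetical interface used here so the sketch performs no I/O itself; `formatFatal` and `describe` are likewise illustrative. The formatter also copes with `unhandledRejection` reasons that are not `Error` objects, since a promise can be rejected with any value.

```typescript
// Hypothetical sinks; in Node they would wrap a log-file append and a stderr write.
interface FatalSinks {
    appendLog(text: string): void;
    writeStderr(text: string): void;
}

/** Turn anything thrown or rejected into text without ever throwing itself. */
function describe(reason: unknown): string {
    if (reason instanceof Error) return reason.stack ?? `${reason.name}: ${reason.message}`;
    try {
        return String(reason);
    } catch {
        // e.g. objects with a null prototype cannot be converted to a primitive
        return Object.prototype.toString.call(reason);
    }
}

function formatFatal(origin: string, reason: unknown, now: Date = new Date()): string {
    return `[${now.toISOString()}] ${origin}\n${describe(reason)}\n`;
}

function writeFatal(origin: string, reason: unknown, sinks: FatalSinks): void {
    const text = formatFatal(origin, reason);
    // Each sink is isolated: a failing log file must not prevent the stderr write.
    try {
        sinks.appendLog(text);
    } catch {
        /* best effort */
    }
    try {
        sinks.writeStderr(text);
    } catch {
        /* best effort */
    }
}
```

In a Node entry point, the sinks would wrap `fs.appendFileSync('uncaught-exception.log', text)` and `process.stderr.write(text)`, and the handlers registered with `process.on('uncaughtException', handler)` and `process.on('unhandledRejection', handler)` would call `writeFatal` and then `process.exit(1)`, as the commit describes.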
@BlobMaster41 added the bug (Something isn't working) label on May 10, 2026
Refactor peer fan-out and witness validation; add process fatal handlers.

- PoC: Rework onBlockProcessed to update consensus height synchronously and chain witness fan-out behind a blockProcessedLock to preserve ordering while isolating failures.
- P2PManager: Introduce runPeerOp to wrap per-peer operations with a hard timeout, unified error logging and isolation; replace direct peer calls with runPeerOp for mempool broadcast, witness requests and witness broadcasts.
- BlockWitnessManager: Make block header signature validation async, add an RPC timeout when requesting block data, and yield between witness verifications to avoid blocking the event loop.
- Globals: Install guarded process-wide uncaughtException/unhandledRejection handlers that append stack traces to a log file and write to stderr before exiting.
- RustContract: Remove duplicated fatal handler/fs import now provided by Globals.
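`runPeerOp` is described only at the level of behavior (a hard timeout, unified error logging, and per-peer isolation), so the following is a minimal sketch of one way to get those properties. The signature, the `broadcast` helper, and the default timeout (borrowed from the 8 s mempool broadcast timeout mentioned in the first commit) are assumptions. Each peer operation resolves to `true` or `false` and never rejects, so `Promise.all` over the fan-out waits for every peer without one failure short-circuiting the rest.

```typescript
// Assumed default, reusing the 8 s mempool broadcast timeout from the PR.
const PEER_OP_TIMEOUT_MS = 8_000;

type Logger = (msg: string) => void;

/** Run one per-peer operation with a hard timeout; failures become `false`. */
async function runPeerOp(
    peerId: string,
    label: string,
    op: () => Promise<void>,
    log: Logger = console.warn,
    timeoutMs: number = PEER_OP_TIMEOUT_MS,
): Promise<boolean> {
    let timer: ReturnType<typeof setTimeout> | undefined;
    const timeout = new Promise<never>((_, reject) => {
        timer = setTimeout(() => reject(new Error(`timed out after ${timeoutMs} ms`)), timeoutMs);
    });
    try {
        await Promise.race([op(), timeout]);
        return true;
    } catch (err) {
        // One log format for errors and timeouts alike.
        log(`${label} to peer ${peerId} failed: ${err instanceof Error ? err.message : String(err)}`);
        return false;
    } finally {
        clearTimeout(timer);
    }
}

/** Fan out to all peers concurrently; resolves with the number of successes. */
async function broadcast(
    peers: string[],
    label: string,
    send: (peerId: string) => Promise<void>,
    log?: Logger,
    timeoutMs?: number,
): Promise<number> {
    const results = await Promise.all(peers.map((p) => runPeerOp(p, label, () => send(p), log, timeoutMs)));
    return results.filter(Boolean).length;
}
```

A timed-out operation keeps running in the background; the timeout only stops the batch from waiting on it. If the underlying call accepts an `AbortSignal`, passing one as well would let it actually stop.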

These changes reduce the risk of slow or misbehaving peers blocking batch operations, improve responsiveness during witness validation, and ensure fatal errors are captured for post-mortem debugging.
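Yielding between witness verifications, as the BlockWitnessManager change describes, can be done by awaiting a macrotask between iterations. In this sketch `Witness`, `verify`, and `countValidWitnesses` are hypothetical names. Node code would typically use `setImmediate`; a 0 ms `setTimeout` is used here only so the example does not depend on Node typings.

```typescript
/** Resolve on a later macrotask so timers and I/O callbacks get a turn. */
function yieldToEventLoop(): Promise<void> {
    return new Promise<void>((resolve) => setTimeout(() => resolve(), 0));
}

// Hypothetical witness shape; the real type is not shown in the PR.
interface Witness {
    signature: string;
}

async function countValidWitnesses(
    witnesses: Witness[],
    verify: (w: Witness) => Promise<boolean>,
): Promise<number> {
    let valid = 0;
    for (const w of witnesses) {
        if (await verify(w)) valid++;
        // Hop to a macrotask so pending timers and socket I/O can run between witnesses.
        await yieldToEventLoop();
    }
    return valid;
}
```

Awaiting `verify()` on its own is not enough: continuations after an `await` run as microtasks, which all execute before the event loop returns to timers and I/O.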
Remove unnecessary await when delegating to a synchronous onBlockProcessed handler in PoC.ts and delete related explanatory comments about background chaining. Also cast StreamMessageEvent.data to Uint8Array in ReusableStream.ts to satisfy the handler's expected type. These are primarily cleanup/typing fixes with no behavioral change.
Add interface-datastore@^10.0.1 to package.json and change the Datastore import in P2PManager.ts from a type-only import to a runtime import. This ensures the Datastore symbol is available at runtime, since a type-only import is erased during compilation.
Make onBlockProcessed async and use blockProcessedLock to serialize height broadcasts and surface errors. Replaces the previous IIFE chaining with an explicit promise sequence: send WITNESS_HEIGHT_UPDATE via sendMessageToAllThreads (assigned to blockProcessedLock) and await it with error logging, then asynchronously dispatch WITNESS_BLOCK_PROCESSED to the witness thread with its own error handling. Also move p2p.updateConsensusHeight after the broadcast steps and simplify control flow.
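The serialization in this last commit can be modelled as a promise chain that each new block extends. A sketch under assumptions: `SerialLock` is a hypothetical generalization of `blockProcessedLock`, and the callbacks passed to this `onBlockProcessed` stand in for `sendMessageToAllThreads`, the witness-thread dispatch, and `p2p.updateConsensusHeight`, whose real signatures are not shown. The height broadcast is awaited in order with its error logged, the witness notification is dispatched without awaiting and has its own handler, and the consensus height is updated last.

```typescript
/** Each task starts only after every previously queued task has settled. */
class SerialLock {
    private tail: Promise<void> = Promise.resolve();

    run<T>(task: () => Promise<T>): Promise<T> {
        const result = this.tail.then(task);
        // The chain ignores the outcome; the caller still sees `result` reject.
        this.tail = result.then(
            () => undefined,
            () => undefined,
        );
        return result;
    }
}

// Hypothetical stand-ins for the PR's thread messaging and P2P calls.
async function onBlockProcessed(
    height: number,
    lock: SerialLock,
    broadcastHeight: (h: number) => Promise<void>,
    notifyWitnessThread: (h: number) => Promise<void>,
    updateConsensusHeight: (h: number) => void,
    log: (msg: string) => void = console.error,
): Promise<void> {
    try {
        // Stand-in for WITNESS_HEIGHT_UPDATE: queued behind earlier blocks.
        await lock.run(() => broadcastHeight(height));
    } catch (err) {
        log(`height broadcast for block ${height} failed: ${String(err)}`);
    }
    // Stand-in for WITNESS_BLOCK_PROCESSED: not awaited, with its own error handling.
    notifyWitnessThread(height).catch((err: unknown) =>
        log(`witness notification for block ${height} failed: ${String(err)}`),
    );
    updateConsensusHeight(height);
}
```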
@BlobMaster41 merged commit b5d5a69 into main on May 12, 2026
6 checks passed
@BlobMaster41 deleted the fix/p2p-crash branch on May 12, 2026 at 08:05

Labels: bug (Something isn't working)

Projects: None yet

1 participant