refactor: async Handshake network support #434

agaffney · 2025-11-21T23:23:22Z

Summary by cubic

Refactors the peer protocol to use an async receive loop with per-message channels, enabling concurrent requests with clear timeouts and better error propagation. Improves connection lifecycle safety and prevents peer reuse after disconnect.

Refactors
- Introduced recvLoop and per-message channels (handshake, headers, block, addr, proof); replaced blocking receiveMessage with a dispatcher.
- Added setupConnection, mutex, and hasConnected guard; disallow reuse after disconnect.
- Exposed ErrorChan for async errors; added timeouts (1s for handshake, 5s for requests); synchronized sendMessage.
- Close now clears the connection; the error channel remains open.

^{Written for commit b77012e. Summary will update automatically on new commits.}

Summary by CodeRabbit

Bug Fixes
- Improved connection state handling, error propagation, timeouts, and cleanup for more reliable peer interactions.
Refactor
- Switched peer communication to an asynchronous, event-driven model with per-message routing and synchronized connection lifecycle for better responsiveness and stability.
New Features
- Added a surfaced error channel and explicit connection setup/teardown for safer reuse and clearer failure signals.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

coderabbitai · 2025-11-21T23:23:32Z

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Walkthrough

Replaces the synchronous receive flow in internal/handshake/protocol/peer.go with an asynchronous, channel-driven model. Adds setupConnection that initializes per-message channels (handshakeCh, headersCh, blockCh, addrCh, proofCh), errorCh, doneCh, and starts recvLoop. Introduces recvLoop and handleMessage to decode and route incoming messages to the appropriate channels or errorCh. Adds mutexes (mu, sendMu), hasConnected flag, and a serialized sendMessage path. Reworks handshake, GetPeers, GetHeaders, GetProof, and GetBlock to wait on channels with timeouts and propagate errors via errorCh. Adds ErrorChan() accessor and a close path that cleans up channels/state.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45–60 minutes

Verify recvLoop decoding and handleMessage routing: message-type matching, non-blocking sends to per-type channels, and proper handling when channels are closed.
Audit synchronization: mu, sendMu, hasConnected usage across setupConnection, sendMessage, Close, and potential double-close or reuse races.
Check timeouts and error propagation: handshake and Get* methods must correctly select on their channels, errorCh, and doneCh to avoid hangs or lost errors.
Ensure channels and goroutines are cleaned up exactly once and no goroutine leaks occur on error paths.

Possibly related PRs

feat: support for fetching Handshake peers #423: Modifies MsgGetAddr/MsgAddr and GetPeers handling in the same peer message flow; strongly related to the new per-message channel routing.
feat: initial handshake network peer support #422: Prior refactor of Peer handshake/message handling in internal/handshake/protocol/peer.go that this change builds upon.

Pre-merge checks and finishing touches

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately reflects the primary change: refactoring the peer protocol from synchronous to asynchronous message handling with improved connection lifecycle management.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch refactor/handshake-async

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 5

🧹 Nitpick comments (4)

internal/handshake/protocol/peer.go (4)
54-54: Remove redundant assignment.

Line 54 assigns p.conn = conn, but conn is already assigned to p.conn during struct initialization on line 49.

Apply this diff:
-	p.conn = conn
 	p.address = conn.RemoteAddr().String()
148-189: Consider adding graceful shutdown mechanism for recvLoop.

recvLoop runs indefinitely until an error occurs, with no way to gracefully stop the goroutine without closing the underlying connection. Consider adding a context or done channel to allow controlled shutdown, which would prevent goroutine leaks and enable cleaner testing.

Example approach using a done channel:
 type Peer struct {
 	address      string
 	conn         net.Conn
 	networkMagic uint32
 	mu           sync.Mutex
 	hasConnected bool
+	doneCh       chan struct{}
 	errorCh      chan error
 func (p *Peer) recvLoop() {
 	err := func() error {
 		for {
+			select {
+			case <-p.doneCh:
+				return nil
+			default:
+			}
 			headerBuf := make([]byte, messageHeaderLength)
 func (p *Peer) Close() error {
 	p.mu.Lock()
 	defer p.mu.Unlock()
 	if p.conn == nil {
 		return errors.New("connection is not established")
 	}
+	close(p.doneCh)
 	if err := p.conn.Close(); err != nil {
 		return err
 	}
51-51: Document rationale for channel buffer sizes.

The buffer sizes for errorCh (5) and message channels (10) appear arbitrary. Consider documenting the rationale for these sizes or making them configurable if they need to be tuned for different workloads.

Also applies to: 114-118

148-189: Close message channels when recvLoop exits.

The message channels (handshakeCh, headersCh, blockCh, addrCh, proofCh) created in setupConnection are never closed. While they'll be garbage collected, it's better practice to close them when recvLoop exits to signal consumers that no more messages will arrive.

Apply this diff:
 func (p *Peer) recvLoop() {
+	defer func() {
+		close(p.handshakeCh)
+		close(p.headersCh)
+		close(p.blockCh)
+		close(p.addrCh)
+		close(p.proofCh)
+	}()
 	err := func() error {
Note: Consumers would need to be updated to handle closed channels gracefully.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 764f347 and 69d895f.

📒 Files selected for processing (1)

internal/handshake/protocol/peer.go (10 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

internal/handshake/protocol/peer.go (3)

internal/handshake/protocol/messages.go (10)

Message (51-54)

UnsupportedMessageTypeError (43-45)

MsgVersion (183-192)

MsgVerack (291-291)

MsgVerack (293-296)

MsgVerack (298-301)

MsgAddr (319-321)

MsgHeaders (422-424)

MsgBlock (460-462)

MsgProof (499-503)

internal/handshake/spv.go (1)

Proof (63-74)

internal/handshake/block.go (1)

Block (22-25)

🔇 Additional comments (1)

internal/handshake/protocol/peer.go (1)

238-263: Good error handling in handshake.

The refactored handshake properly checks errorCh in both select statements (lines 243, 254), enabling fast failure when recvLoop encounters errors. This is the correct pattern that should be applied to the request methods as well.

internal/handshake/protocol/peer.go

agaffney · 2025-11-21T23:42:45Z

@coderabbitai please review this

coderabbitai · 2025-11-21T23:43:02Z

@agaffney I'll review the changes in this PR. This looks like a significant refactoring to introduce asynchronous message handling with proper concurrency control and timeout management for the Handshake network protocol.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

agaffney · 2025-11-21T23:44:30Z

@cubic-dev-ai please review this

cubic-dev-ai · 2025-11-21T23:44:48Z

@cubic-dev-ai please review this

@agaffney I've started the AI code review. It'll take a few minutes to complete.

cubic-dev-ai

2 issues found across 1 file

Prompt for AI agents (all 2 issues)


Understand the root cause of the following 2 issues and fix them.


<file name="internal/handshake/protocol/peer.go">

<violation number="1" location="internal/handshake/protocol/peer.go:102">
Close now nils `p.conn` while the recvLoop goroutine still dereferences it, so closing a peer mid-stream can panic when the loop tries to read from a nil connection.</violation>

<violation number="2" location="internal/handshake/protocol/peer.go:207">
Unhandled but valid protocol messages (e.g., MsgPing) now trigger `handleMessage`’s default error, causing recvLoop to close the connection whenever a peer sends a keepalive or other unlisted message.</violation>
</file>

_{Reply to cubic to teach it or ask questions. Re-run a review with @cubic-dev-ai review this PR}

internal/handshake/protocol/peer.go

cubic-dev-ai · 2025-11-21T23:57:42Z

internal/handshake/protocol/peer.go

 	if err := p.conn.Close(); err != nil {
 		return err
 	}
+	p.conn = nil


Close now nils p.conn while the recvLoop goroutine still dereferences it, so closing a peer mid-stream can panic when the loop tries to read from a nil connection.

Prompt for AI agents

Address the following comment on internal/handshake/protocol/peer.go at line 102: <comment>Close now nils `p.conn` while the recvLoop goroutine still dereferences it, so closing a peer mid-stream can panic when the loop tries to read from a nil connection.</comment> <file context> @@ -51,34 +67,69 @@ func NewPeer(conn net.Conn, networkMagic uint32) (*Peer, error) { if err := p.conn.Close(); err != nil { return err } + p.conn = nil + close(p.doneCh) + return nil </file context>

Signed-off-by: Aurora Gaffney <aurora@blinklabs.io>

coderabbitai

Actionable comments posted: 0

♻️ Duplicate comments (2)

internal/handshake/protocol/peer.go (2)
196-212: Major: Unhandled protocol messages cause disconnect.

The default case on line 209 returns an error for any message type not explicitly handled. Since recvLoop closes the connection on any handleMessage error (line 192), valid protocol messages like MsgPing (keepalive) will disconnect the peer.

Apply this diff to ignore unhandled messages instead of disconnecting:
 func (p *Peer) handleMessage(msg Message) error {
 	switch msg.(type) {
 	case *MsgVersion, *MsgVerack:
 		p.handshakeCh <- msg
 	case *MsgAddr:
 		p.addrCh <- msg
 	case *MsgHeaders:
 		p.headersCh <- msg
 	case *MsgBlock:
 		p.blockCh <- msg
 	case *MsgProof:
 		p.proofCh <- msg
 	default:
-		return fmt.Errorf("unknown message type: %T", msg)
+		// Ignore unhandled message types (e.g., MsgPing keepalives)
+		return nil
 	}
 	return nil
 }
Based on learnings.

151-156: Critical: Data race on p.conn access.

Line 155 reads p.conn without holding p.mu, but Close() writes p.conn = nil on line 102 while holding the mutex. This is a data race that the Go race detector will flag.

The local variable assignment prevents a nil pointer dereference but does not prevent the race itself.

Consider one of these solutions:

Solution 1: Read connection under mutex protection
 func (p *Peer) recvLoop() {
 	err := func() error {
 		// Assign to local var to avoid nil deref panic on shutdown
-		conn := p.conn
+		p.mu.Lock()
+		conn := p.conn
+		p.mu.Unlock()
+		if conn == nil {
+			return errors.New("connection is nil")
+		}
 		for {
Solution 2: Use atomic.Value for p.conn
Store p.conn in an atomic.Value and use atomic loads/stores throughout.

Solution 3: Rely on doneCh for clean shutdown
 func (p *Peer) recvLoop() {
 	err := func() error {
-		// Assign to local var to avoid nil deref panic on shutdown
-		conn := p.conn
 		for {
+			select {
+			case <-p.doneCh:
+				return nil
+			default:
+			}
+			p.mu.Lock()
+			conn := p.conn
+			p.mu.Unlock()
+			if conn == nil {
+				return errors.New("connection closed")
+			}
 			headerBuf := make([]byte, messageHeaderLength)
(Though this adds overhead to every iteration.)

🧹 Nitpick comments (2)

internal/handshake/protocol/peer.go (2)
241-250: Recommend checking doneCh for consistency.

The first select statement waits for Verack but doesn't check doneCh, while the second select (lines 252-263) does. If the connection closes during the first wait, the handshake won't detect it until timeout (1 second).

Apply this diff for consistent shutdown detection:
 	// Wait for Verack response
 	select {
 	case msg := <-p.handshakeCh:
 		if _, ok := msg.(*MsgVerack); !ok {
 			return fmt.Errorf("unexpected message: %T", msg)
 		}
 	case err := <-p.errorCh:
 		return fmt.Errorf("handshake failed: %w", err)
+	case <-p.doneCh:
+		return errors.New("connection has shut down")
 	case <-time.After(1 * time.Second):
 		return errors.New("handshake timed out")
 	}
277-288: Recommend checking errorCh for more informative error messages.

When recvLoop encounters an error, it sends to errorCh then calls Close() (which closes doneCh). This method checks doneCh and returns "connection has shut down", but doesn't check errorCh, so the caller doesn't get the underlying error (e.g., decode failure, network error) without separately monitoring ErrorChan().

Apply this diff to provide more informative errors:
 	// Wait for Addr response
 	select {
 	case msg := <-p.addrCh:
 		msgAddr, ok := msg.(*MsgAddr)
 		if !ok {
 			return nil, fmt.Errorf("unexpected message: %T", msg)
 		}
 		return msgAddr.Peers, nil
+	case err := <-p.errorCh:
+		return nil, fmt.Errorf("receive error: %w", err)
 	case <-p.doneCh:
 		return nil, errors.New("connection has shut down")
 	case <-time.After(5 * time.Second):
 		return nil, errors.New("timed out")
 	}
Apply similar changes to GetHeaders (lines 301-312), GetProof (lines 326-337), and GetBlock (lines 354-365).

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 69d895f and b77012e.

📒 Files selected for processing (1)

internal/handshake/protocol/peer.go (10 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

internal/handshake/protocol/peer.go (1)

internal/handshake/protocol/messages.go (10)

Message (51-54)

UnsupportedMessageTypeError (43-45)

MsgVersion (183-192)

MsgVerack (291-291)

MsgVerack (293-296)

MsgVerack (298-301)

MsgAddr (319-321)

MsgHeaders (422-424)

MsgBlock (460-462)

MsgProof (499-503)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Analyze (go)

🔇 Additional comments (5)

internal/handshake/protocol/peer.go (5)

35-44: LGTM: Well-designed concurrency primitives.

The separation of mu (state) and sendMu (send serialization), lifecycle channels (doneCh, errorCh), and per-message-type channels provides a solid foundation for asynchronous message handling.

53-54: LGTM: Appropriate channel initialization.

Unbuffered doneCh for synchronization and buffered errorCh (size 5) for async error reporting are correctly sized for this use case.

112-127: LGTM: Correct initialization order.

Channels are created before recvLoop starts, preventing nil channel panics. The handshake correctly waits on channels populated by the receive loop.

93-105: LGTM: Proper close semantics with shutdown signaling.

The close logic correctly:

Guards against double-close via the p.conn == nil check

Signals shutdown via doneCh to unblock waiting operations

Keeps errorCh open for the lifetime of the peer (per maintainer design)

Note: The assignment p.conn = nil on line 102 creates a race with recvLoop reading p.conn on line 155, which is flagged separately.

130-149: LGTM: Serialized send path.

Using a dedicated sendMu to serialize concurrent sends is a good practice that avoids holding the main state mutex during I/O operations.

agaffney · 2025-11-22T20:30:50Z

@cubic-dev-ai please review this

cubic-dev-ai · 2025-11-22T20:31:06Z

@cubic-dev-ai please review this

@agaffney I've started the AI code review. It'll take a few minutes to complete.

cubic-dev-ai

1 issue found across 1 file

Prompt for AI agents (all 1 issues)


Understand the root cause of the following 1 issues and fix them.


<file name="internal/handshake/protocol/peer.go">

<violation number="1" location="internal/handshake/protocol/peer.go:209">
Receiving valid request messages (e.g., getaddr/getheaders/getdata/getproof) now triggers handleMessage’s default error, and recvLoop closes the connection as soon as it sees those messages. This causes unnecessary disconnects from compliant peers instead of ignoring or responding to the requests.</violation>
</file>

_{Reply to cubic to teach it or ask questions. Re-run a review with @cubic-dev-ai review this PR}

cubic-dev-ai · 2025-11-22T20:45:47Z

internal/handshake/protocol/peer.go

+	case *MsgProof:
+		p.proofCh <- msg
+	default:
+		return fmt.Errorf("unknown message type: %T", msg)


Receiving valid request messages (e.g., getaddr/getheaders/getdata/getproof) now triggers handleMessage’s default error, and recvLoop closes the connection as soon as it sees those messages. This causes unnecessary disconnects from compliant peers instead of ignoring or responding to the requests.

Prompt for AI agents

Address the following comment on internal/handshake/protocol/peer.go at line 209: <comment>Receiving valid request messages (e.g., getaddr/getheaders/getdata/getproof) now triggers handleMessage’s default error, and recvLoop closes the connection as soon as it sees those messages. This causes unnecessary disconnects from compliant peers instead of ignoring or responding to the requests.</comment> <file context> @@ -97,37 +148,67 @@ func (p *Peer) sendMessage(msgType uint8, msgPayload Message) error { + case *MsgProof: + p.proofCh <- msg + default: + return fmt.Errorf("unknown message type: %T", msg) + } + return nil </file context>

agaffney requested a review from a team as a code owner November 21, 2025 23:23

agaffney force-pushed the refactor/handshake-async branch from d2c4316 to 69d895f Compare November 21, 2025 23:25

coderabbitai bot reviewed Nov 21, 2025

View reviewed changes

agaffney force-pushed the refactor/handshake-async branch 3 times, most recently from ddcf0e7 to e1bc253 Compare November 21, 2025 23:39

cubic-dev-ai bot reviewed Nov 21, 2025

View reviewed changes

refactor: async Handshake network support

b77012e

Signed-off-by: Aurora Gaffney <aurora@blinklabs.io>

agaffney force-pushed the refactor/handshake-async branch from e1bc253 to b77012e Compare November 22, 2025 20:18

coderabbitai bot reviewed Nov 22, 2025

View reviewed changes

cubic-dev-ai bot reviewed Nov 22, 2025

View reviewed changes

wolf31o2 approved these changes Nov 23, 2025

View reviewed changes

agaffney merged commit ed6a7ec into main Nov 23, 2025
12 checks passed

agaffney deleted the refactor/handshake-async branch November 23, 2025 02:57

agaffney linked an issue Nov 23, 2025 that may be closed by this pull request

Implement block synchronization of Handshake data #260

Closed

coderabbitai bot mentioned this pull request Nov 26, 2025

feat: initial Handshake indexer support #447

Merged

refactor: async Handshake network support #434

refactor: async Handshake network support #434

Uh oh!

Conversation

agaffney commented Nov 21, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by cubic

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Nov 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Other AI code review bot(s) detected

Walkthrough

Estimated code review effort

Possibly related PRs

Pre-merge checks and finishing touches

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

agaffney commented Nov 21, 2025

Uh oh!

coderabbitai bot commented Nov 21, 2025

Uh oh!

agaffney commented Nov 21, 2025

Uh oh!

cubic-dev-ai bot commented Nov 21, 2025

Uh oh!

cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

cubic-dev-ai bot Nov 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

agaffney commented Nov 22, 2025

Uh oh!

cubic-dev-ai bot commented Nov 22, 2025

Uh oh!

cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai bot Nov 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

agaffney commented Nov 21, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Nov 21, 2025 •

edited

Loading

cubic-dev-ai bot Nov 21, 2025 •

edited

Loading

cubic-dev-ai bot Nov 22, 2025 •

edited

Loading