
Conversation

@wolf31o2 wolf31o2 commented Nov 13, 2025

Summary by CodeRabbit

  • New Features

    • Added explicit lifecycle control methods to multiple protocol clients and servers, enabling graceful shutdown and cleanup.
    • Improved concurrent access handling with enhanced state synchronization across protocol implementations.
  • Bug Fixes

    • Enhanced robustness of protocol shutdown sequences to prevent race conditions and resource leaks.
    • Resolved issues with concurrent start/stop operations that could cause deadlocks.
  • Tests

    • Added comprehensive shutdown behavior tests across protocol implementations.
    • Introduced concurrency tests to verify safe handling of simultaneous operations.

@wolf31o2 wolf31o2 requested a review from a team as a code owner November 13, 2025 17:31
coderabbitai bot commented Nov 13, 2025

📝 Walkthrough

This pull request introduces comprehensive lifecycle management and graceful shutdown synchronization across multiple protocol implementations in the Ouroboros network library. The changes add explicit Stop() methods, lifecycle state tracking (started/stopped flags), and shutdown-aware channel operations to client and server implementations for blockfetch, chainsync, leiosnotify, localtxmonitor, localtxsubmission, txsubmission, keepalive, leiosfetch, peersharing, and localstatequery protocols. The core protocol.go Stop() signature is updated to return an error, and all result-sending pathways are retrofitted with non-blocking selects that respect protocol shutdown signals. Supporting test files add shutdown and concurrency test cases to verify lifecycle correctness.
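
The shutdown-aware send pattern at the heart of these changes is small enough to capture in a self-contained sketch. This is a minimal illustration with made-up names (client, deliverResult, resultChan), not code taken from the PR itself:

```go
package main

import (
	"errors"
	"fmt"
)

var errShuttingDown = errors.New("protocol is shutting down")

type client struct {
	done       chan struct{} // closed on shutdown; stands in for DoneChan()
	resultChan chan string   // result-delivery channel
}

// deliverResult is a shutdown-aware send: it either hands the result to a
// waiting caller or observes the closed done channel and returns an error,
// instead of blocking forever on a channel nobody will read again.
func (c *client) deliverResult(r string) error {
	select {
	case <-c.done:
		return errShuttingDown
	case c.resultChan <- r:
		return nil
	}
}

func main() {
	c := &client{done: make(chan struct{}), resultChan: make(chan string)}
	close(c.done)                          // simulate shutdown with no reader on resultChan
	fmt.Println(c.deliverResult("result")) // prints the shutdown error
}
```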

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Areas requiring extra attention:

  • Synchronization logic: New mutex and atomic boolean usage across multiple files; verify no race conditions in started/stopped state transitions and channel closure timing
  • Shutdown coordination: Deferred channel closures and shutdown-aware sends in handleXxx methods; ensure DoneChan signals are properly checked before all channel operations
  • Error propagation in Stop(): Protocol.Stop() now returns error; review all call sites to ensure errors are appropriately handled or logged
  • Lifecycle state machines: Changes to handleDone in server implementations (blockfetch, chainsync, leiosfetch, leiosnotify, peersharing) that coordinate protocol restart; verify TOCTOU (time-of-check-time-of-use) safety
  • Test coverage: New concurrency tests (concurrent start/stop, stop-before-start scenarios); verify they exercise edge cases and don't introduce new goroutine leaks
  • TxSubmission server changes: Significant restructuring with restart flow, atomic ackCount, and guarded result channels; requires careful inspection of the RequestTxIds/RequestTxs paths and done signaling


Pre-merge checks

❌ Failed checks (1 inconclusive)
  • Title check (❓ Inconclusive): The PR title 'perf(protocols): improve performance across all protocols' is vague and generic, failing to convey meaningful information about the actual changes. The title uses overly broad language ('all protocols') without describing the specific improvement. Resolution: review the changes and provide a more specific title that clarifies what lifecycle management or synchronization improvements were made.
✅ Passed checks (1 passed)
  • Description Check (✅ Passed): Check skipped - CodeRabbit’s high-level summary is enabled.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
protocol/txsubmission/server.go (1)

252-257: Add synchronization to prevent race condition when restarting protocol.

The concern is valid. RequestTxIds() (line 143) and RequestTxs() (line 179) both read from channels without any state checks or synchronization. When handleDone() calls Stop() at line 252, it closes these channels at lines 93-94. Between closing and recreation at lines 254-255, any call to these methods would read from a closed channel.

A mutex protecting the Stop/reinit/Start sequence and guarding the RequestTxIds/RequestTxs channel reads is necessary. Alternatively, ensure the protocol state machine prevents these methods from being called during the transition window.
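
For illustration, a minimal sketch of the suggested mutex guard, with hypothetical names (mu, requestTxIdsResult, restart) standing in for the real fields:

```go
package sketch

import "sync"

type server struct {
	mu                 sync.Mutex // serializes Stop/reinit/Start and channel access
	requestTxIdsResult chan error
}

// restart models the Stop -> reinit -> Start window: the old channel is
// closed and replaced while the mutex is held, so no reader can observe
// the half-reinitialized state.
func (s *server) restart() {
	s.mu.Lock()
	defer s.mu.Unlock()
	close(s.requestTxIdsResult)             // unblock any old waiters
	s.requestTxIdsResult = make(chan error) // fresh channel for the new session
}

// requestTxIds snapshots the channel under the lock; if a restart closes the
// old channel mid-wait, the read reports ok == false rather than silently
// returning a bogus zero-value result.
func (s *server) requestTxIds() (error, bool) {
	s.mu.Lock()
	ch := s.requestTxIdsResult
	s.mu.Unlock()
	err, ok := <-ch
	return err, ok
}
```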

🧹 Nitpick comments (1)
protocol/localtxmonitor/client_test.go (1)

300-352: LGTM! Shutdown test validates explicit cleanup.

The test properly verifies the client lifecycle with goroutine leak detection and timeout handling. It's consistent with shutdown tests added to other protocol implementations in this PR.

Minor observation: This test duplicates some setup logic from the existing runTest helper (lines 50-106). However, maintaining consistency with other shutdown tests in this PR is more valuable than eliminating this duplication.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b93f5f3 and bb59f79.

📒 Files selected for processing (12)
  • protocol/blockfetch/blockfetch.go (1 hunks)
  • protocol/blockfetch/client.go (1 hunks)
  • protocol/blockfetch/client_test.go (1 hunks)
  • protocol/chainsync/chainsync.go (1 hunks)
  • protocol/chainsync/client.go (1 hunks)
  • protocol/chainsync/client_test.go (2 hunks)
  • protocol/leiosnotify/client.go (1 hunks)
  • protocol/localtxmonitor/client.go (1 hunks)
  • protocol/localtxmonitor/client_test.go (1 hunks)
  • protocol/localtxsubmission/client.go (1 hunks)
  • protocol/localtxsubmission/client_test.go (1 hunks)
  • protocol/txsubmission/server.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
protocol/localtxsubmission/client_test.go (7)
protocol/blockfetch/client_test.go (1)
  • TestClientShutdown (211-264)
protocol/chainsync/client_test.go (1)
  • TestClientShutdown (283-336)
protocol/localtxmonitor/client_test.go (1)
  • TestClientShutdown (300-352)
connection.go (3)
  • NewConnection (107-130)
  • Connection (59-103)
  • New (133-135)
connection_options.go (2)
  • WithConnection (36-40)
  • WithNetworkMagic (50-54)
protocol/localtxsubmission/localtxsubmission.go (1)
  • LocalTxSubmission (67-70)
protocol/localtxsubmission/client.go (1)
  • Client (26-34)
protocol/blockfetch/client_test.go (7)
protocol/chainsync/client_test.go (1)
  • TestClientShutdown (283-336)
protocol/localtxmonitor/client_test.go (1)
  • TestClientShutdown (300-352)
protocol/localtxsubmission/client_test.go (1)
  • TestClientShutdown (167-219)
connection.go (3)
  • NewConnection (107-130)
  • Connection (59-103)
  • New (133-135)
protocol/blockfetch/blockfetch.go (2)
  • New (156-162)
  • BlockFetch (102-105)
connection_options.go (3)
  • WithConnection (36-40)
  • WithNetworkMagic (50-54)
  • WithNodeToNode (78-82)
protocol/blockfetch/client.go (1)
  • Client (29-39)
protocol/localtxmonitor/client_test.go (6)
protocol/blockfetch/client_test.go (1)
  • TestClientShutdown (211-264)
protocol/chainsync/client_test.go (1)
  • TestClientShutdown (283-336)
protocol/localtxsubmission/client_test.go (1)
  • TestClientShutdown (167-219)
connection.go (3)
  • NewConnection (107-130)
  • Connection (59-103)
  • New (133-135)
connection_options.go (2)
  • WithConnection (36-40)
  • WithNetworkMagic (50-54)
protocol/localtxmonitor/client.go (1)
  • Client (25-38)
protocol/txsubmission/server.go (3)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/txsubmission/txsubmission.go (1)
  • ProtocolName (27-27)
connection/id.go (1)
  • ConnectionId (22-25)
protocol/chainsync/client_test.go (4)
protocol/chainsync/chainsync.go (2)
  • ChainSync (204-207)
  • New (259-267)
protocol/chainsync/client.go (1)
  • Client (29-45)
protocol/blockfetch/client_test.go (1)
  • TestClientShutdown (211-264)
connection.go (3)
  • NewConnection (107-130)
  • Connection (59-103)
  • New (133-135)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Analyze (go)
🔇 Additional comments (12)
protocol/blockfetch/blockfetch.go (1)

122-122: LGTM: Default queue size increase improves buffering capacity.

The increase from 256 to 384 aligns with the PR objective of improving performance through better buffering, while remaining well within the maximum limit of 512.

protocol/localtxsubmission/client_test.go (1)

167-219: LGTM: Shutdown test follows established patterns.

The test properly verifies client lifecycle management and shutdown behavior, consistent with similar tests in other protocol clients.

protocol/chainsync/chainsync.go (1)

226-227: LGTM: Default increases improve pipeline capacity and buffering.

Increasing both DefaultPipelineLimit and DefaultRecvQueueSize from 50 to 75 enhances throughput while staying within protocol limits.

protocol/chainsync/client.go (1)

146-146: LGTM: Explicit channel cleanup in Stop() simplifies lifecycle.

Moving channel closure from an implicit shutdown goroutine to the explicit Stop() method makes resource cleanup deterministic and reduces goroutine overhead.

protocol/leiosnotify/client.go (1)

93-93: LGTM: Explicit cleanup consistent with lifecycle pattern.

Closing requestNextChan in Stop() ensures blocked RequestNext() calls terminate gracefully with ErrProtocolShuttingDown.
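
A minimal sketch of what that termination looks like from the caller's side (names illustrative): a receive from a channel closed by Stop() returns immediately with ok == false, which the client maps to a shutdown error rather than a zero-value message:

```go
package sketch

import "errors"

var errShuttingDown = errors.New("protocol is shutting down")

type client struct {
	requestNextChan chan any // closed by Stop()
}

// requestNext blocks until a notification arrives or Stop() closes the
// channel; the ok == false branch turns the close into a clean error.
func (c *client) requestNext() (any, error) {
	msg, ok := <-c.requestNextChan
	if !ok {
		return nil, errShuttingDown
	}
	return msg, nil
}
```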

protocol/blockfetch/client.go (1)

112-113: LGTM: Dual channel cleanup ensures complete shutdown.

Explicitly closing both blockChan and startBatchResultChan in Stop() ensures any blocked GetBlock() or GetBlockRange() calls terminate properly.

protocol/chainsync/client_test.go (2)

83-86: LGTM: Cleanup prevents goroutine leaks in tests.

Adding explicit client Stop() in runTest ensures proper cleanup after each test, preventing potential goroutine leaks.


283-336: LGTM: Comprehensive shutdown test validates lifecycle.

The test properly verifies the client can be started and stopped without errors or goroutine leaks, consistent with similar tests across other protocols.

protocol/localtxsubmission/client.go (1)

102-102: LGTM: Explicit cleanup completes the lifecycle pattern.

Closing submitResultChan in Stop() ensures blocked SubmitTx() calls terminate gracefully, completing the consistent cleanup pattern across all protocol clients.

protocol/localtxmonitor/client.go (1)

113-116: LGTM! Explicit channel cleanup in Stop().

The explicit channel closures in Stop() are well-synchronized via busyMutex, which also protects all channel readers throughout the file. This eliminates the need for a background goroutine and provides more predictable cleanup behavior.

protocol/blockfetch/client_test.go (1)

211-264: LGTM! Comprehensive shutdown test.

The test properly validates the client lifecycle with goroutine leak detection, timeout handling, and error propagation. The structure is consistent with shutdown tests in other protocol implementations.

protocol/txsubmission/server.go (1)

85-96: LGTM! Explicit shutdown for server protocol.

The Stop() method follows the same explicit cleanup pattern adopted across other protocol implementations in this PR, closing result channels before stopping the underlying protocol.

@wolf31o2 wolf31o2 force-pushed the perf/protocols-improvements branch from bb59f79 to beec783 on November 13, 2025 18:23

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
protocol/txsubmission/server.go (2)

18-35: Guard Stop() against double-close on done

handleDone() now calls s.Stop(), and the connection teardown path is likely to do the same. As written, the second caller panics on close(s.done). We need an idempotent guard (e.g., sync.Once) and to reset it when we create a fresh channel during restart.
Apply this diff:

@@
-import (
-	"errors"
-	"fmt"
+import (
+	"errors"
+	"fmt"
+	"sync"
@@
 type Server struct {
 	*protocol.Protocol
@@
-	done                   chan struct{}
+	done                   chan struct{}
+	stopOnce               sync.Once
@@
 func (s *Server) Stop() {
 	s.Protocol.Logger().
 		Debug("stopping server protocol",
 			"component", "network",
 			"protocol", ProtocolName,
 			"connection_id", s.callbackContext.ConnectionId.String(),
 		)
-	close(s.done)
+	s.stopOnce.Do(func() {
+		close(s.done)
+	})
 	s.Protocol.Stop()
 }

Also applies to: 88-96


247-263: Avoid deadlocking on requestTxIdsResultChan when handling Done

When the peer sends Done while no RequestTxIds call is outstanding, the current send blocks forever on the unbuffered channel, preventing restart. Switching to a non-blocking send (and resetting the new stopOnce) keeps shutdown smooth.
Apply this diff:

@@
-	s.requestTxIdsResultChan <- requestTxIdsResult{
-		err: ErrStopServerProcess,
-	}
+	select {
+	case s.requestTxIdsResultChan <- requestTxIdsResult{err: ErrStopServerProcess}:
+	default:
+	}
@@
-	s.requestTxIdsResultChan = make(chan requestTxIdsResult)
-	s.requestTxsResultChan = make(chan []TxBody)
-	s.done = make(chan struct{})
+	s.requestTxIdsResultChan = make(chan requestTxIdsResult)
+	s.requestTxsResultChan = make(chan []TxBody)
+	s.done = make(chan struct{})
+	s.stopOnce = sync.Once{}
 	s.ackCount = 0
 	s.Start()
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bb59f79 and beec783.

📒 Files selected for processing (12)
  • protocol/blockfetch/blockfetch.go (1 hunks)
  • protocol/blockfetch/client.go (1 hunks)
  • protocol/blockfetch/client_test.go (1 hunks)
  • protocol/chainsync/chainsync.go (1 hunks)
  • protocol/chainsync/client.go (1 hunks)
  • protocol/chainsync/client_test.go (2 hunks)
  • protocol/leiosnotify/client.go (1 hunks)
  • protocol/localtxmonitor/client.go (1 hunks)
  • protocol/localtxmonitor/client_test.go (1 hunks)
  • protocol/localtxsubmission/client.go (1 hunks)
  • protocol/localtxsubmission/client_test.go (1 hunks)
  • protocol/txsubmission/server.go (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • protocol/chainsync/chainsync.go
  • protocol/leiosnotify/client.go
  • protocol/chainsync/client_test.go
  • protocol/localtxsubmission/client_test.go
🧰 Additional context used
🧬 Code graph analysis (3)
protocol/localtxmonitor/client_test.go (7)
protocol/blockfetch/client_test.go (1)
  • TestClientShutdown (211-264)
protocol/chainsync/client_test.go (1)
  • TestClientShutdown (283-336)
protocol/localtxsubmission/client_test.go (1)
  • TestClientShutdown (167-219)
connection.go (3)
  • NewConnection (107-130)
  • Connection (59-103)
  • New (133-135)
connection_options.go (2)
  • WithConnection (36-40)
  • WithNetworkMagic (50-54)
protocol/localtxmonitor/localtxmonitor.go (1)
  • LocalTxMonitor (112-115)
protocol/localtxmonitor/client.go (1)
  • Client (25-38)
protocol/blockfetch/client_test.go (6)
protocol/chainsync/client_test.go (1)
  • TestClientShutdown (283-336)
connection.go (3)
  • NewConnection (107-130)
  • Connection (59-103)
  • New (133-135)
protocol/protocol.go (2)
  • ProtocolRoleClient (94-94)
  • New (122-133)
protocol/blockfetch/blockfetch.go (2)
  • New (156-162)
  • BlockFetch (102-105)
connection_options.go (3)
  • WithConnection (36-40)
  • WithNetworkMagic (50-54)
  • WithNodeToNode (78-82)
protocol/blockfetch/client.go (1)
  • Client (29-39)
protocol/txsubmission/server.go (2)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Analyze (go)
🔇 Additional comments (5)
protocol/blockfetch/blockfetch.go (1)

122-122: Recv queue bump looks safe

The new default of 384 still respects MaxRecvQueueSize (512) and helps accommodate the larger buffers targeted by the PR. No issues here.

protocol/localtxmonitor/client.go (1)

113-116: Orderly shutdown looks good

Holding busyMutex before closing the result channels ensures no in-flight request is still writing, so the deterministic shutdown here reads well.

protocol/localtxmonitor/client_test.go (1)

300-352: Shutdown test covers the new Stop path

Thanks for adding parity shutdown coverage—the mock handshake plus async error watcher matches the other protocol suites and should catch future regressions.

protocol/localtxsubmission/client.go (1)

102-102: Channel closure under mutex looks correct

Locking busyMutex before closing submitResultChan prevents the handler goroutines from racing the close; the shutdown flow remains safe.

protocol/blockfetch/client_test.go (1)

211-264: Shutdown regression test appreciated

This mirrors the other protocol shutdown suites and will flag future lifecycle regressions quickly.


@agaffney agaffney left a comment


The bot seems to think that it's a bad idea to move the channel close into the Stop() function on some of these protocols, and I'd tend to agree. The way it is now allows for cleaner async shutdown.

@wolf31o2 wolf31o2 force-pushed the perf/protocols-improvements branch from beec783 to fe904df on November 14, 2025 19:04

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
protocol/txsubmission/server.go (1)

250-273: Fix the silent error loss and thread-safety issues in the restart sequence.

The non-blocking send to the unbuffered requestTxIdsResultChan (lines 251-257) silently discards the error if no goroutine is actively blocked in the select statement at line 149. Any subsequent RequestTxIds call after restart will receive from the newly created channel (line 267) and will not be aware that a Done message was processed. This violates the protocol's contract of notifying callers about the done state.

Additionally, the restart sequence (lines 265-272) creates a thread-safety issue: reinitializing onceStop = sync.Once{} (line 270) allows Stop() to execute again after being called at line 265, potentially triggering duplicate stop logic or partial restarts if Stop() is called concurrently during the window between lines 270-272. The channel reassignments (lines 267-269) are also not synchronized, risking data races if goroutines are blocked on sends/receives during restart.

Recommended fixes:

  • Use a buffered channel for requestTxIdsResultChan or implement a callback-based notification mechanism to ensure no error loss (see the sketch after this list)
  • Synchronize the restart sequence with a lock to prevent concurrent Stop() calls and unsafe channel reassignments
  • Add test coverage for restart scenarios, particularly with concurrent RequestTxIds calls and Stop() invocations
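
A runnable sketch of the buffered-channel option from the first bullet (channel name illustrative): with capacity 1, the non-blocking send parks the error so a later caller still observes it:

```go
package main

import (
	"errors"
	"fmt"
)

var errStopServerProcess = errors.New("stop server process")

func main() {
	// Unbuffered: a send inside select/default is silently dropped when no
	// receiver is blocked. Capacity 1 lets the send succeed regardless, and
	// the next RequestTxIds-style call drains the parked error.
	results := make(chan error, 1)

	select {
	case results <- errStopServerProcess: // succeeds even with no receiver
	default:
	}

	fmt.Println(<-results) // the late caller still sees: stop server process
}
```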
♻️ Duplicate comments (1)
protocol/chainsync/client.go (1)

146-150: Acknowledge existing race condition flagged in previous review.

The critical race condition between message handlers writing to readyForNextBlockChan and Stop() closing it has already been thoroughly documented in the previous review comment. This remains unresolved in the current changes.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between beec783 and fe904df.

📒 Files selected for processing (12)
  • protocol/blockfetch/blockfetch.go (1 hunks)
  • protocol/blockfetch/client.go (1 hunks)
  • protocol/blockfetch/client_test.go (1 hunks)
  • protocol/chainsync/chainsync.go (1 hunks)
  • protocol/chainsync/client.go (1 hunks)
  • protocol/chainsync/client_test.go (2 hunks)
  • protocol/leiosnotify/client.go (1 hunks)
  • protocol/localtxmonitor/client.go (1 hunks)
  • protocol/localtxmonitor/client_test.go (1 hunks)
  • protocol/localtxsubmission/client.go (1 hunks)
  • protocol/localtxsubmission/client_test.go (1 hunks)
  • protocol/txsubmission/server.go (8 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
  • protocol/localtxsubmission/client_test.go
  • protocol/blockfetch/blockfetch.go
  • protocol/blockfetch/client_test.go
  • protocol/chainsync/client_test.go
  • protocol/blockfetch/client.go
  • protocol/localtxmonitor/client_test.go
🧰 Additional context used
🧬 Code graph analysis (1)
protocol/txsubmission/server.go (3)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/txsubmission/error.go (1)
  • ErrStopServerProcess (21-21)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Analyze (go)
🔇 Additional comments (3)
protocol/chainsync/chainsync.go (1)

226-227: LGTM: Reasonable increase in default buffer sizes for better throughput.

Increasing DefaultPipelineLimit and DefaultRecvQueueSize from 50 to 75 aligns with the PR objectives to improve buffering and performance. The new values remain comfortably within protocol limits (max=100).

protocol/txsubmission/server.go (2)

89-101: LGTM: Proper shutdown implementation with idempotent Stop().

The Stop() method correctly uses onceStop to ensure idempotent shutdown, closes the done channel to signal waiting operations, and stops the underlying protocol. The logging is appropriate for debugging.


148-161: LGTM: Proper shutdown handling in RequestTxIds.

The updated RequestTxIds correctly uses a select statement to wait on both the result channel and the done channel, ensuring the function returns promptly on shutdown. The validation of result.err and the update of ackCount are correct.

@wolf31o2 wolf31o2 force-pushed the perf/protocols-improvements branch from fe904df to bc29c22 on November 14, 2025 19:36

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (3)
protocol/localtxmonitor/client.go (1)

113-120: Critical race condition: message handlers can write to closed channels.

The cleanup goroutine closes result channels after DoneChan() signals, but message handlers (handleAcquired at line 266, handleReplyHasTx at line 279, handleReplyNextTx at line 292, handleReplyGetSizes at line 305) write to these channels without checking DoneChan(). Between when MsgDone is sent (line 109) and when the protocol fully shuts down, handlers can still execute. Once the cleanup goroutine closes the channels, any in-flight handler writes will panic.

Apply this fix to add DoneChan() checks in handlers:

 func (c *Client) handleAcquired(msg protocol.Message) error {
 	// ... logging ...
 	msgAcquired := msg.(*MsgAcquired)
 	c.acquired = true
 	c.acquiredSlot = msgAcquired.SlotNo
+	select {
+	case <-c.DoneChan():
+		return protocol.ErrProtocolShuttingDown
+	case c.acquireResultChan <- true:
+	}
-	c.acquireResultChan <- true
 	return nil
 }

 func (c *Client) handleReplyHasTx(msg protocol.Message) error {
 	// ... logging ...
 	msgReplyHasTx := msg.(*MsgReplyHasTx)
+	select {
+	case <-c.DoneChan():
+		return protocol.ErrProtocolShuttingDown
+	case c.hasTxResultChan <- msgReplyHasTx.Result:
+	}
-	c.hasTxResultChan <- msgReplyHasTx.Result
 	return nil
 }

 func (c *Client) handleReplyNextTx(msg protocol.Message) error {
 	// ... logging ...
 	msgReplyNextTx := msg.(*MsgReplyNextTx)
+	select {
+	case <-c.DoneChan():
+		return protocol.ErrProtocolShuttingDown
+	case c.nextTxResultChan <- msgReplyNextTx.Transaction.Tx:
+	}
-	c.nextTxResultChan <- msgReplyNextTx.Transaction.Tx
 	return nil
 }

 func (c *Client) handleReplyGetSizes(msg protocol.Message) error {
 	// ... logging ...
 	msgReplyGetSizes := msg.(*MsgReplyGetSizes)
+	select {
+	case <-c.DoneChan():
+		return protocol.ErrProtocolShuttingDown
+	case c.getSizesResultChan <- msgReplyGetSizes.Result:
+	}
-	c.getSizesResultChan <- msgReplyGetSizes.Result
 	return nil
 }
protocol/chainsync/client.go (1)

146-150: Critical: Handlers write to channel without synchronization.

Message handlers handleRollForward (lines 721, 728) and handleRollBackward (lines 752, 760) write to readyForNextBlockChan without checking DoneChan(). Between when Stop() sends MsgDone (line 142) and when the protocol shuts down, handlers can still execute. Once the cleanup goroutine closes the channel, handler writes will panic.

Apply this fix to add DoneChan() checks before channel writes:

 func (c *Client) handleRollForward(msgGeneric protocol.Message) error {
 	// ... existing logic ...
 	if callbackErr != nil {
 		if errors.Is(callbackErr, ErrStopSyncProcess) {
-			c.readyForNextBlockChan <- false
+			select {
+			case <-c.DoneChan():
+			case c.readyForNextBlockChan <- false:
+			}
 			return nil
 		} else {
 			return callbackErr
 		}
 	}
-	c.readyForNextBlockChan <- true
+	select {
+	case <-c.DoneChan():
+		return protocol.ErrProtocolShuttingDown
+	case c.readyForNextBlockChan <- true:
+	}
 	return nil
 }

 func (c *Client) handleRollBackward(msgGeneric protocol.Message) error {
 	// ... existing logic ...
 	if callbackErr := c.config.RollBackwardFunc(c.callbackContext, msgRollBackward.Point, msgRollBackward.Tip); callbackErr != nil {
 		if errors.Is(callbackErr, ErrStopSyncProcess) {
-			c.readyForNextBlockChan <- false
+			select {
+			case <-c.DoneChan():
+			case c.readyForNextBlockChan <- false:
+			}
 			return nil
 		} else {
 			return callbackErr
 		}
 	}
-	c.readyForNextBlockChan <- true
+	select {
+	case <-c.DoneChan():
+		return protocol.ErrProtocolShuttingDown
+	case c.readyForNextBlockChan <- true:
+	}
 	return nil
 }
protocol/leiosnotify/client.go (1)

93-97: Critical race condition: message handlers can write to closed channel.

The cleanup goroutine closes requestNextChan after DoneChan(), but message handlers (handleBlockAnnouncement at line 137, handleBlockOffer at line 142, handleBlockTxsOffer at line 147, handleVotesOffer at line 152) write to this channel without checking for shutdown. Between when MsgDone is sent (line 91) and when the protocol fully shuts down, handlers can still execute and write to the channel. Once the channel is closed, any handler write will panic.

Apply this fix to add DoneChan() checks in all handlers:

 func (c *Client) handleBlockAnnouncement(msg protocol.Message) error {
+	select {
+	case <-c.DoneChan():
+		return protocol.ErrProtocolShuttingDown
+	case c.requestNextChan <- msg:
+	}
-	c.requestNextChan <- msg
 	return nil
 }

 func (c *Client) handleBlockOffer(msg protocol.Message) error {
+	select {
+	case <-c.DoneChan():
+		return protocol.ErrProtocolShuttingDown
+	case c.requestNextChan <- msg:
+	}
-	c.requestNextChan <- msg
 	return nil
 }

 func (c *Client) handleBlockTxsOffer(msg protocol.Message) error {
+	select {
+	case <-c.DoneChan():
+		return protocol.ErrProtocolShuttingDown
+	case c.requestNextChan <- msg:
+	}
-	c.requestNextChan <- msg
 	return nil
 }

 func (c *Client) handleVotesOffer(msg protocol.Message) error {
+	select {
+	case <-c.DoneChan():
+		return protocol.ErrProtocolShuttingDown
+	case c.requestNextChan <- msg:
+	}
-	c.requestNextChan <- msg
 	return nil
 }
🧹 Nitpick comments (4)
protocol/localtxsubmission/client_test.go (1)

167-219: Good baseline shutdown test.

The test validates basic Start/Stop lifecycle and goroutine cleanup. Consider adding a follow-up test that verifies shutdown behavior during active operations (e.g., pending SubmitTx call) to ensure graceful handling.

protocol/chainsync/client_test.go (1)

283-336: Baseline shutdown test is sound.

The test validates Start/Stop lifecycle and goroutine cleanup. Consider adding follow-up tests that verify shutdown behavior during active sync operations to ensure graceful handling of in-flight requests.

protocol/localtxmonitor/client_test.go (1)

300-352: Shutdown test follows established pattern.

The test validates basic lifecycle and leak detection. Consider adding tests that verify shutdown during active operations (e.g., pending HasTx, NextTx, or GetSizes calls) to ensure graceful handling.

protocol/txsubmission/server.go (1)

149-162: Consider select case ordering during shutdown.

The select statement doesn't guarantee priority between cases. If both requestTxIdsResultChan and done are ready, Go randomly chooses one. This means a valid result might be discarded in favor of returning ErrProtocolShuttingDown.

This behavior is likely acceptable—once shutdown is initiated, returning an error is reasonable. However, if you want to prioritize draining pending results before acknowledging shutdown, you could check the result channel first with a non-blocking receive, then fall back to the blocking select.

Current behavior is reasonable for most use cases, so this is optional.
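
A sketch of that optional drain-first ordering (channel and type names illustrative):

```go
package sketch

import "errors"

var errShuttingDown = errors.New("protocol is shutting down")

type requestTxIdsResult struct{ err error }

// awaitResult prefers a pending result even if shutdown has already been
// signaled: a non-blocking receive drains the result channel first, and only
// then does the call fall back to the racing two-way select.
func awaitResult(results chan requestTxIdsResult, done chan struct{}) (requestTxIdsResult, error) {
	select {
	case r := <-results: // drain a ready result first
		return r, nil
	default:
	}
	select {
	case r := <-results:
		return r, nil
	case <-done:
		return requestTxIdsResult{}, errShuttingDown
	}
}
```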

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fe904df and bc29c22.

📒 Files selected for processing (12)
  • protocol/blockfetch/blockfetch.go (1 hunks)
  • protocol/blockfetch/client.go (1 hunks)
  • protocol/blockfetch/client_test.go (1 hunks)
  • protocol/chainsync/chainsync.go (1 hunks)
  • protocol/chainsync/client.go (1 hunks)
  • protocol/chainsync/client_test.go (2 hunks)
  • protocol/leiosnotify/client.go (1 hunks)
  • protocol/localtxmonitor/client.go (1 hunks)
  • protocol/localtxmonitor/client_test.go (1 hunks)
  • protocol/localtxsubmission/client.go (1 hunks)
  • protocol/localtxsubmission/client_test.go (1 hunks)
  • protocol/txsubmission/server.go (8 hunks)
✅ Files skipped from review due to trivial changes (1)
  • protocol/blockfetch/blockfetch.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • protocol/localtxsubmission/client.go
  • protocol/blockfetch/client_test.go
🧰 Additional context used
🧬 Code graph analysis (4)
protocol/chainsync/client_test.go (4)
protocol/chainsync/chainsync.go (3)
  • ChainSync (204-207)
  • New (259-267)
  • NewConfig (273-295)
protocol/chainsync/client.go (1)
  • Client (29-45)
protocol/blockfetch/client_test.go (1)
  • TestClientShutdown (211-264)
connection.go (3)
  • NewConnection (107-130)
  • Connection (59-103)
  • New (133-135)
protocol/txsubmission/server.go (3)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/txsubmission/error.go (1)
  • ErrStopServerProcess (21-21)
protocol/localtxmonitor/client_test.go (6)
protocol/blockfetch/client_test.go (1)
  • TestClientShutdown (211-264)
protocol/chainsync/client_test.go (1)
  • TestClientShutdown (283-336)
protocol/localtxsubmission/client_test.go (1)
  • TestClientShutdown (167-219)
connection.go (3)
  • NewConnection (107-130)
  • Connection (59-103)
  • New (133-135)
connection_options.go (2)
  • WithConnection (36-40)
  • WithNetworkMagic (50-54)
protocol/localtxmonitor/client.go (1)
  • Client (25-38)
protocol/localtxsubmission/client_test.go (3)
connection.go (3)
  • NewConnection (107-130)
  • Connection (59-103)
  • New (133-135)
connection_options.go (2)
  • WithConnection (36-40)
  • WithNetworkMagic (50-54)
protocol/localtxsubmission/client.go (1)
  • Client (26-34)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Analyze (go)
🔇 Additional comments (9)
protocol/blockfetch/client.go (1)

112-117: Shutdown pattern correctly implemented.

The deferred channel cleanup is safe because all message handlers (handleStartBatch at lines 226-230, handleNoBlocks at lines 244-249, and handleBlock at lines 281-286) check DoneChan() before writing to channels, preventing writes to closed channels.

protocol/chainsync/client_test.go (1)

83-86: Good cleanup addition.

Adding client.Stop() in the test teardown ensures proper cleanup of goroutines and prevents resource leaks in tests.

protocol/chainsync/chainsync.go (1)

226-227: Performance tuning looks reasonable.

Increasing default queue sizes from 50 to 75 (still within protocol max of 100) aligns with the PR's performance improvement objectives. The values remain compliant with protocol limits and validation logic.

protocol/txsubmission/server.go (6)

35-37: LGTM! Appropriate synchronization primitives for shutdown and restart.

The new fields provide the necessary synchronization for graceful shutdown and restart:

  • done signals shutdown to waiting operations
  • onceStop ensures idempotent Stop() behavior
  • restartMutex protects the restart sequence

51-53: Good performance improvement with buffered channels.

Buffering the result channels with capacity 1 decouples producers and consumers, allowing message handlers to send results without blocking. This aligns with the PR's performance optimization objectives.


80-88: LGTM! Cleanup goroutine elimination aligns with PR objectives.

Moving cleanup from Start() to Stop() simplifies the code and eliminates an unnecessary goroutine, as noted in the PR description.


187-194: LGTM! Proper shutdown handling for RequestTxs.

The shutdown handling correctly checks channel closure and returns ErrProtocolShuttingDown appropriately.


254-261: Good defensive pattern with non-blocking send.

The non-blocking send to requestTxIdsResultChan properly prevents deadlock during restart. If RequestTxIds is waiting, it receives the stop signal; if not, the restart proceeds without blocking.


281-285: LGTM! Complete state reinitialization for restart.

The restart sequence properly reinitializes all stateful fields:

  • Result channels recreated with appropriate buffer sizes
  • Fresh done channel for the new protocol instance
  • onceStop reset allows the restarted server to be stopped
  • ackCount reset is correct for a new session

@wolf31o2 wolf31o2 force-pushed the perf/protocols-improvements branch from bc29c22 to 41df945 on November 15, 2025 00:44
@wolf31o2

@coderabbitai review

coderabbitai bot commented Nov 15, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
protocol/txsubmission/server.go (1)

90-102: Critical race condition: concurrent close of done channel (duplicate).

Both Stop() (line 99) and handleDone() (line 278) attempt to close s.done without coordination:

  • Stop() uses onceStop but doesn't acquire restartMutex
  • handleDone() acquires restartMutex but closes done unconditionally
  • If called concurrently, both may attempt to close s.done, causing a panic

The suggested fix from the previous review is still applicable:

  1. In Stop(): Acquire restartMutex before closing done:
 func (s *Server) Stop() {
 	s.onceStop.Do(func() {
+		s.restartMutex.Lock()
+		defer s.restartMutex.Unlock()
 		s.Protocol.Logger().
 			Debug("stopping server protocol",
 				"component", "network",
 				"protocol", ProtocolName,
 				"connection_id", s.callbackContext.ConnectionId.String(),
 			)
 		close(s.done)
 		s.Protocol.Stop()
 	})
 }
  2. In handleDone() (line 278): Guard the close:
-	close(s.done)
+	select {
+	case <-s.done:
+		// Already closed by Stop()
+	default:
+		close(s.done)
+	}
🧹 Nitpick comments (4)
protocol/peersharing/client_test.go (2)

72-76: Consider validating the value received from ErrorChan().

The test discards the value received from ErrorChan() without checking if an error occurred during shutdown. Consider capturing and validating the received value to ensure clean shutdown.

Apply this diff to validate the shutdown:

 	// Wait for connection shutdown
 	select {
-	case <-oConn.ErrorChan():
+	case err := <-oConn.ErrorChan():
+		if err != nil {
+			t.Errorf("unexpected error during shutdown: %s", err)
+		}
 	case <-time.After(10 * time.Second):
 		t.Errorf("did not shutdown within timeout")
 	}

56-66: Consider testing shutdown after protocol operations.

While testing basic shutdown is valuable, performing actual PeerSharing protocol operations before shutdown would better validate the cleanup improvements mentioned in the PR (deferred cleanup, eliminated goroutines). This would ensure that active protocol state is properly cleaned up.

protocol/txsubmission/server_test.go (1)

28-30: Track the mock server limitation as a follow-up issue.

The test is skipped due to mock server issues with the NtN protocol, which means the server shutdown behavior isn't currently being validated. Consider creating a GitHub issue to track fixing the mock server so this test can be enabled.

Would you like me to help draft an issue description for tracking the mock server limitation?

protocol/localstatequery/client_test.go (1)

357-409: Consider adding Done message to mock conversation for completeness.

The test verifies that Stop() can be called without error, but the mock conversation (lines 361-364) doesn't include an expected Done message from the client. Since Stop() sends MsgDone (as implemented in client.go line 127), consider adding this to the mock conversation:

mockConn := ouroboros_mock.NewConnection(
    ouroboros_mock.ProtocolRoleClient,
    []ouroboros_mock.ConversationEntry{
        ouroboros_mock.ConversationEntryHandshakeRequestGeneric,
        ouroboros_mock.ConversationEntryHandshakeNtCResponse,
        ouroboros_mock.ConversationEntryInput{
            ProtocolId:  localstatequery.ProtocolId,
            MessageType: localstatequery.MessageTypeDone,
        },
    },
)

This would make the test more explicit about what's expected during shutdown and catch any regressions where the Done message isn't sent.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bc29c22 and 41df945.

📒 Files selected for processing (19)
  • protocol/blockfetch/blockfetch.go (1 hunks)
  • protocol/blockfetch/client.go (4 hunks)
  • protocol/blockfetch/client_test.go (1 hunks)
  • protocol/chainsync/chainsync.go (1 hunks)
  • protocol/chainsync/client.go (3 hunks)
  • protocol/chainsync/client_test.go (2 hunks)
  • protocol/keepalive/client_test.go (1 hunks)
  • protocol/leiosnotify/client.go (2 hunks)
  • protocol/leiosnotify/client_test.go (1 hunks)
  • protocol/localstatequery/client.go (5 hunks)
  • protocol/localstatequery/client_test.go (1 hunks)
  • protocol/localtxmonitor/client.go (5 hunks)
  • protocol/localtxmonitor/client_test.go (1 hunks)
  • protocol/localtxsubmission/client.go (3 hunks)
  • protocol/localtxsubmission/client_test.go (1 hunks)
  • protocol/peersharing/client_test.go (1 hunks)
  • protocol/peersharing/server_test.go (1 hunks)
  • protocol/txsubmission/server.go (8 hunks)
  • protocol/txsubmission/server_test.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
  • protocol/localtxmonitor/client_test.go
  • protocol/chainsync/client.go
  • protocol/chainsync/client_test.go
  • protocol/blockfetch/blockfetch.go
  • protocol/blockfetch/client_test.go
  • protocol/localtxsubmission/client_test.go
🔇 Additional comments (25)
protocol/peersharing/client_test.go (1)

28-29: Good use of goleak for leak detection.

The defer goleak.VerifyNone(t) ensures no goroutine leaks occur during the test, which aligns well with the PR's objective of eliminating unnecessary goroutines and improving cleanup.

protocol/txsubmission/server_test.go (1)

31-80: Well-structured shutdown test.

The test correctly validates the shutdown sequence with proper error handling, timeouts, and cleanup. The pattern of monitoring async errors, starting/stopping the server, and verifying orderly shutdown is sound.

protocol/keepalive/client_test.go (1)

241-292: LGTM! Client shutdown test is well-structured.

The test follows the same robust pattern as the server tests, with proper async error monitoring, timeout handling, and cleanup. Unlike the server tests, this one isn't skipped, so it will actively validate the client shutdown behavior.

protocol/chainsync/chainsync.go (1)

226-227: Appropriate default increases for better buffering.

The increase from 50 to 75 for both DefaultPipelineLimit and DefaultRecvQueueSize aligns with the PR's performance objectives. The new defaults at 75% of the maximum limits (100) provide better buffering while maintaining a reasonable margin.

protocol/leiosnotify/client.go (2)

93-97: Proper deferred channel cleanup.

The change to defer closing requestNextChan until after DoneChan() fires correctly addresses the race condition mentioned in past reviews. This ensures handlers complete before the channel is closed.


137-169: Shutdown-aware message handling is correctly implemented.

All four message handlers (handleBlockAnnouncement, handleBlockOffer, handleBlockTxsOffer, handleVotesOffer) consistently use the select pattern to check DoneChan() before writing to requestNextChan, preventing panics from writes to closed channels during shutdown.

protocol/localtxmonitor/client.go (2)

113-120: Correct deferred cleanup for all result channels.

The goroutine properly waits for DoneChan() before closing all four result channels (acquireResultChan, hasTxResultChan, nextTxResultChan, getSizesResultChan), ensuring handlers complete before cleanup.


266-321: All handlers correctly implement shutdown guards.

All four message handlers consistently use the select pattern with DoneChan() to prevent writes to result channels during shutdown, addressing the race condition flagged in previous reviews.

protocol/blockfetch/client.go (3)

112-117: Deferred channel cleanup correctly implemented.

The change to defer closing blockChan and startBatchResultChan until after DoneChan() fires addresses the race condition from the previous review, preventing panics when responses arrive after Stop() is called.


225-230: Shutdown-aware batch start handling.

The select statement properly guards the send to startBatchResultChan, returning ErrProtocolShuttingDown if the protocol is shutting down.


242-301: Proper shutdown handling in block and error paths.

Both handleNoBlocks and handleBlock correctly implement shutdown checks:

  • handleNoBlocks creates the error before the select, then guards the send
  • handleBlock checks for shutdown before callback processing (lines 278-282) and uses select for channel sends in the non-callback path (lines 297-301)
protocol/localtxsubmission/client.go (2)

102-106: Deferred channel cleanup prevents race condition.

The goroutine correctly waits for DoneChan() before closing submitResultChan, ensuring that handlers complete before the channel is closed.


158-187: Handlers correctly use select to eliminate TOCTOU race.

Both handleAcceptTx and handleRejectTx use the select pattern with DoneChan(), which atomically checks for shutdown while attempting to send to submitResultChan. This eliminates the time-of-check-time-of-use race condition mentioned in previous reviews.

protocol/peersharing/server_test.go (2)

28-30: Track mock server limitation alongside TxSubmission test.

This test is skipped for the same reason as protocol/txsubmission/server_test.go. These should be tracked together in a single issue for fixing the mock server's NtN protocol support.


31-78: Consistent and well-structured server shutdown test.

The test follows the same robust pattern as other shutdown tests in this PR, with proper async error monitoring, timeout handling, and cleanup. The consistent pattern across the codebase makes the tests easier to understand and maintain.

protocol/localstatequery/client.go (5)

99-115: LGTM! Deferred cleanup improves shutdown reliability.

The goroutine that closes result channels on protocol shutdown ensures that callers waiting on these channels will unblock when the protocol stops. This prevents resource leaks and aligns with the PR's goal of improving shutdown handling.


117-131: LGTM! Idempotent Stop() with proper error handling.

The Stop() method correctly uses onceStop to ensure it only executes once, even if called multiple times. Returning the error from SendMessage allows callers to handle cases where the message cannot be sent (e.g., if the protocol is already shutting down).


887-903: LGTM! Shutdown-aware sending prevents blocking.

Setting acquired = true before the select statement ensures state is updated consistently. The select statement properly checks for shutdown before sending to acquireResultChan, returning protocol.ErrProtocolShuttingDown if the protocol is shutting down.


905-931: LGTM! Consistent shutdown handling in failure cases.

Both failure cases now use shutdown-aware sending via select statements, preventing blocked sends during protocol shutdown. This ensures that failure handling respects the protocol lifecycle.


933-948: LGTM! Optimized result extraction with shutdown awareness.

Extracting the MsgResult before the select statement is a minor optimization that improves readability. The shutdown-aware send pattern is consistent with the other handlers.

protocol/txsubmission/server.go (5)

27-38: LGTM! New fields support shutdown and restart coordination.

The added fields (done, onceStop, restartMutex) provide the necessary infrastructure for graceful shutdown and protocol restart. This follows established patterns for lifecycle management.


46-61: LGTM! Buffered result channels improve shutdown handling.

The buffered result channels (capacity 1) allow handlers to send results without blocking, which is particularly useful during shutdown. The unbuffered done channel is appropriate for signaling.


104-163: LGTM! Shutdown-aware request handling with proper state updates.

The select statement properly handles both results and shutdown signals. The ackCount update at line 158 correctly occurs after successfully receiving results, ensuring consistency.


165-195: LGTM! Performance optimization and shutdown handling.

The pre-allocation of txString (lines 167-172) is a nice performance optimization for logging. The shutdown-aware select statement properly handles both the done signal and channel closure.


246-288: LGTM! Comprehensive restart logic with proper state reinitialization.

The non-blocking send (lines 254-261) ensures RequestTxIds is signaled without blocking. The restart sequence properly:

  • Acquires restartMutex for coordination
  • Stops the current protocol
  • Recreates channels with appropriate buffering
  • Resets onceStop to allow future Stop() calls
  • Reinitializes all state

Note: The race condition with close(s.done) at line 278 was flagged in the Stop() method review.

@wolf31o2 wolf31o2 force-pushed the perf/protocols-improvements branch from 41df945 to af11acf on November 15, 2025 15:05

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
protocol/localstatequery/client.go (1)

887-903: Potential data race on c.acquired between handler and callers

c.acquired is written here in handleAcquired() (called from the protocol’s message-handling goroutine) and read in acquire()/runQuery() without any shared synchronization. That gives you a likely data race under the Go race detector, and in the worst case could cause a caller to take the wrong Acquire vs ReAcquire path under contention.

Since busyMutex is held by the caller while acquire() waits on c.acquireResultChan, you can’t safely reuse busyMutex here without risking deadlock. Consider making acquired an atomic value or guarding it with a small dedicated mutex (read+write on both the call path and in handleAcquired()/release()).
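
A minimal sketch of the atomic-flag alternative (receiver and field names illustrative):

```go
package sketch

import "sync/atomic"

type client struct {
	acquired atomic.Bool // shared by the handler goroutine and callers
}

// The message-handling goroutine flips the flag without a data race...
func (c *client) handleAcquired() { c.acquired.Store(true) }
func (c *client) release()        { c.acquired.Store(false) }

// ...and callers read it to pick the Acquire vs ReAcquire path safely.
func (c *client) alreadyAcquired() bool { return c.acquired.Load() }
```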

🧹 Nitpick comments (2)
protocol/localstatequery/client.go (1)

43-44: Stop() lifecycle guard and deferred cleanup look solid

The new onceStop guard plus Stop() that sends MsgDone and then closes queryResultChan/acquireResultChan after <-c.DoneChan() matches the shutdown pattern used in other protocol clients and avoids the earlier eager cleanup in Start(). This should reduce goroutine churn and races around channel closing.

One small nit: if DoneChan() never closes (e.g., protocol was never started or gets wedged), the goroutine spawned in Stop() will park forever. Probably acceptable in practice, but if you want to be extra defensive you could gate the goroutine on onceStart having run or add a timeout around the wait.

Also applies to: 99-131
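
A minimal sketch of the "gate on start" option mentioned in the nit above, assuming a hypothetical started flag set by Start() (the real client uses onceStart, which cannot be queried directly):

```go
package sketch

import (
	"sync"
	"sync/atomic"
)

type client struct {
	started  atomic.Bool
	onceStop sync.Once
	done     chan struct{} // stands in for DoneChan()
	results  chan int
}

func (c *client) Start() {
	c.started.Store(true)
	// ... start the underlying protocol ...
}

// Stop only parks a cleanup goroutine on the done channel when the protocol
// actually ran, so a never-started client cannot leak a goroutine waiting on
// a channel that will never close.
func (c *client) Stop() {
	c.onceStop.Do(func() {
		if !c.started.Load() {
			return // nothing started, so no channels are in use
		}
		go func() {
			<-c.done // wait for protocol shutdown
			close(c.results)
		}()
	})
}
```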

protocol/txsubmission/server.go (1)

256-263: handleDone signaling and restart flow are well‑structured

The non‑blocking send of ErrStopServerProcess into requestTxIdsResultChan plus invoking DoneFunc and then restarting under restartMutex (including guarded close of done, Protocol.Stop(), initProtocol(), re‑creating channels, resetting done and onceStop, zeroing ackCount, and Start()) gives a coherent restart story and avoids the previous concurrent done close issue.

One minor consideration: s.Start() currently runs while restartMutex is held. If you ever want Stop() calls not to block behind a potentially slow restart, you could move the Start() call outside the locked section, once all fields are safely reinitialized. Not critical, but might improve responsiveness under heavy load.

Also applies to: 271-292

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 41df945 and af11acf.

📒 Files selected for processing (20)
  • protocol/blockfetch/blockfetch.go (1 hunks)
  • protocol/blockfetch/client.go (4 hunks)
  • protocol/blockfetch/client_test.go (1 hunks)
  • protocol/chainsync/chainsync.go (1 hunks)
  • protocol/chainsync/client.go (3 hunks)
  • protocol/chainsync/client_test.go (2 hunks)
  • protocol/keepalive/client_test.go (1 hunks)
  • protocol/leiosnotify/client.go (2 hunks)
  • protocol/leiosnotify/client_test.go (1 hunks)
  • protocol/localstatequery/client.go (5 hunks)
  • protocol/localstatequery/client_test.go (1 hunks)
  • protocol/localtxmonitor/client.go (5 hunks)
  • protocol/localtxmonitor/client_test.go (1 hunks)
  • protocol/localtxsubmission/client.go (3 hunks)
  • protocol/localtxsubmission/client_test.go (1 hunks)
  • protocol/peersharing/client.go (4 hunks)
  • protocol/peersharing/client_test.go (1 hunks)
  • protocol/peersharing/server_test.go (1 hunks)
  • protocol/txsubmission/server.go (8 hunks)
  • protocol/txsubmission/server_test.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (9)
  • protocol/chainsync/chainsync.go
  • protocol/localtxsubmission/client.go
  • protocol/peersharing/server_test.go
  • protocol/leiosnotify/client.go
  • protocol/peersharing/client_test.go
  • protocol/chainsync/client_test.go
  • protocol/chainsync/client.go
  • protocol/blockfetch/client_test.go
  • protocol/txsubmission/server_test.go
🧰 Additional context used
🧬 Code graph analysis (10)
protocol/localtxmonitor/client_test.go (4)
protocol/blockfetch/client_test.go (1)
  • TestClientShutdown (211-264)
connection.go (3)
  • NewConnection (107-130)
  • Connection (59-103)
  • New (133-135)
connection_options.go (2)
  • WithConnection (36-40)
  • WithNetworkMagic (50-54)
protocol/localtxmonitor/client.go (1)
  • Client (25-38)
protocol/keepalive/client_test.go (5)
protocol/blockfetch/client_test.go (1)
  • TestClientShutdown (211-264)
protocol/peersharing/client_test.go (1)
  • TestClientShutdown (28-81)
connection.go (3)
  • NewConnection (107-130)
  • Connection (59-103)
  • New (133-135)
connection_options.go (3)
  • WithConnection (36-40)
  • WithNetworkMagic (50-54)
  • WithNodeToNode (78-82)
protocol/keepalive/keepalive.go (1)
  • KeepAlive (85-88)
protocol/localstatequery/client.go (5)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/localstatequery/localstatequery.go (1)
  • ProtocolName (28-28)
protocol/localstatequery/messages.go (3)
  • NewMsgDone (245-252)
  • AcquireFailurePointNotOnChain (44-44)
  • MsgResult (172-175)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/localstatequery/error.go (2)
  • ErrAcquireFailurePointTooOld (20-20)
  • ErrAcquireFailurePointNotOnChain (23-25)
protocol/localtxsubmission/client_test.go (4)
protocol/blockfetch/client_test.go (1)
  • TestClientShutdown (211-264)
connection.go (3)
  • NewConnection (107-130)
  • Connection (59-103)
  • New (133-135)
connection_options.go (2)
  • WithConnection (36-40)
  • WithNetworkMagic (50-54)
protocol/localtxsubmission/client.go (1)
  • Client (26-34)
protocol/blockfetch/client.go (4)
protocol/blockfetch/blockfetch.go (1)
  • New (156-162)
protocol/protocol.go (1)
  • New (122-133)
muxer/muxer.go (1)
  • New (90-117)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/txsubmission/server.go (3)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/txsubmission/error.go (1)
  • ErrStopServerProcess (21-21)
protocol/peersharing/client.go (3)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/peersharing/peersharing.go (1)
  • ProtocolName (27-27)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/localtxmonitor/client.go (1)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/leiosnotify/client_test.go (4)
protocol/versiondata.go (6)
  • VersionData (40-46)
  • VersionDataNtN13andUp (143-145)
  • VersionDataNtN11to12 (116-122)
  • DiffusionModeInitiatorOnly (21-21)
  • PeerSharingModeNoPeerSharing (27-27)
  • QueryModeDisabled (36-36)
protocol/handshake/messages.go (1)
  • NewMsgAcceptVersion (88-102)
connection_options.go (3)
  • WithConnection (36-40)
  • WithNetworkMagic (50-54)
  • WithNodeToNode (78-82)
protocol/leiosnotify/client.go (1)
  • Client (24-31)
protocol/localstatequery/client_test.go (4)
protocol/localtxmonitor/client_test.go (1)
  • TestClientShutdown (300-352)
connection.go (3)
  • NewConnection (107-130)
  • Connection (59-103)
  • New (133-135)
connection_options.go (2)
  • WithConnection (36-40)
  • WithNetworkMagic (50-54)
protocol/localstatequery/client.go (1)
  • Client (30-44)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Analyze (go)
🔇 Additional comments (14)
protocol/peersharing/client.go (3)

72-83: LGTM: Clean idempotent start implementation.

The Start() method correctly uses sync.Once to ensure single initialization, includes appropriate logging for observability, and delegates to the underlying Protocol.Start(). The implementation follows the lifecycle pattern described in the PR objectives.


147-151: LGTM: Shutdown-aware channel send prevents blocking.

The select statement correctly checks DoneChan before sending to sharePeersChan, returning protocol.ErrProtocolShuttingDown during shutdown instead of blocking. This integrates well with the deferred channel closure in Stop() and aligns with the PR's goal of consistent shutdown handling across protocols.


85-102: Code pattern is sound; error handling concern was based on misreading Protocol.Stop() signature.

The shutdown flow is properly implemented:

  1. Protocol.Stop() has no error return, so the concern about error propagation is incorrect.
  2. The goroutine at lines 95-98 will reliably exit when Protocol.Stop() → Muxer.UnregisterProtocol() closes the receive channel, causing the Protocol's loops to exit and the sentinel goroutine (Protocol.Start lines 161-165) to close DoneChan.
  3. No indefinite goroutine leak: the cleanup chain closes recvChan → protocol loops exit → sentinel closes doneChan → Client goroutine unblocks and closes sharePeersChan.

The async shutdown design is acceptable; Client.Stop() correctly returns nil since the actual cleanup happens asynchronously.

protocol/localtxmonitor/client_test.go (1)

300-352: LGTM!

The shutdown test follows the established pattern and properly validates error handling, timeout behavior, and goroutine cleanup.
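
The overall shape of these shutdown tests can be sketched as below; fakeClient is a hypothetical stand-in for the real protocol clients, and only the goleak.VerifyNone call and the timeout-guarded wait mirror the actual tests:

package example

import (
	"testing"
	"time"

	"go.uber.org/goleak"
)

// fakeClient stands in for a protocol client with a Stop/DoneChan lifecycle.
type fakeClient struct{ done chan struct{} }

func (c *fakeClient) Stop() error               { close(c.done); return nil }
func (c *fakeClient) DoneChan() <-chan struct{} { return c.done }

func TestClientShutdown(t *testing.T) {
	defer goleak.VerifyNone(t) // fail if any goroutine outlives the test

	client := &fakeClient{done: make(chan struct{})}
	if err := client.Stop(); err != nil {
		t.Fatalf("unexpected error when stopping client: %s", err)
	}
	select {
	case <-client.DoneChan():
		// clean shutdown
	case <-time.After(2 * time.Second):
		t.Fatal("timed out waiting for client shutdown")
	}
}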

protocol/localtxmonitor/client.go (1)

113-120: Deferred cleanup correctly addresses race condition.

The goroutine now waits for DoneChan() before closing result channels, ensuring no handlers attempt to send after closure. Combined with the shutdown-aware selects in handlers (lines 266-270, 283-287, 300-304, 317-321), this eliminates the panic risk flagged in past reviews.
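
A condensed sketch of the deferred-close pattern (doneChan here is closed directly by Stop as a stand-in for the real protocol teardown that eventually closes DoneChan()):

package main

import (
	"fmt"
	"sync"
)

type client struct {
	onceStop   sync.Once
	doneChan   chan struct{}
	resultChan chan int
}

// Stop arranges for resultChan to be closed only after doneChan signals
// that the protocol has fully shut down, so in-flight handlers never
// write to a closed channel.
func (c *client) Stop() error {
	c.onceStop.Do(func() {
		go func() {
			<-c.doneChan
			close(c.resultChan)
		}()
		close(c.doneChan) // stand-in for the real protocol shutdown
	})
	return nil
}

func main() {
	c := &client{doneChan: make(chan struct{}), resultChan: make(chan int)}
	_ = c.Stop()
	_, ok := <-c.resultChan
	fmt.Println(ok) // false: closed only after shutdown completed
}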

protocol/leiosnotify/client_test.go (1)

56-106: Good fix: test is no longer skipped.

The protocol initialization issues mentioned in past reviews have been resolved. The test now runs and properly validates LeiosNotify client shutdown behavior with appropriate handshake configuration (NtN version 15).

protocol/blockfetch/blockfetch.go (1)

122-122: LGTM!

Increasing DefaultRecvQueueSize from 256 to 384 aligns with the PR's performance objectives by providing better buffering capacity while remaining well within the MaxRecvQueueSize limit of 512.

protocol/localtxsubmission/client_test.go (1)

167-219: LGTM!

The shutdown test properly validates error handling, mock connection teardown, and timeout behavior consistent with the established testing pattern across other protocols.

protocol/localstatequery/client_test.go (1)

357-409: LGTM!

The shutdown test follows the established pattern with proper error handling and timeout-based validation of orderly client shutdown.

protocol/blockfetch/client.go (1)

112-117: Deferred cleanup correctly prevents panics.

The goroutine now waits for DoneChan() before closing channels, ensuring message handlers (lines 225-229, 242-247, 297-301) cannot send to closed channels. This addresses the race condition flagged in past reviews.

protocol/localstatequery/client.go (2)

905-931: Shutdown‑aware failure handling looks correct

Wrapping the sends to c.acquireResultChan in a select on c.DoneChan() means the handler now cleanly returns protocol.ErrProtocolShuttingDown instead of risking a send on a channel that will soon be closed. This aligns with the rest of the PR’s shutdown handling and looks good.


933-947: Result handling now safely coordinates with shutdown

Decoding MsgResult up front and then using a select on c.DoneChan() vs c.queryResultChan <- msgResult.Result ensures callers either see the result or a consistent protocol.ErrProtocolShuttingDown when the protocol is stopping, avoiding sends into a channel closed by Stop(). This change is straightforward and sound.

protocol/txsubmission/server.go (2)

20-21: Server lifecycle fields and Stop() coordination look good

Introducing done, onceStop, and restartMutex on Server, initializing them in NewServer, and having Stop() acquire restartMutex before closing done and calling Protocol.Stop() gives a clear, idempotent shutdown path and avoids the earlier double‑close race on done. This setup aligns well with the restart logic in handleDone() and with the rest of the PR’s shutdown model.

Also applies to: 26-38, 45-54, 80-104
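
A minimal sketch of that coordination (the server type and fields are illustrative; the real Server also stops the underlying Protocol inside the same critical section):

package main

import (
	"fmt"
	"sync"
)

type server struct {
	onceStop     sync.Once
	restartMutex sync.Mutex
	done         chan struct{}
}

// Stop is idempotent and holds restartMutex so the close of done cannot
// race with the restart path, which also closes and reinitializes done.
func (s *server) Stop() error {
	s.onceStop.Do(func() {
		s.restartMutex.Lock()
		defer s.restartMutex.Unlock()
		close(s.done) // unblock any waiting RequestTxIds/RequestTxs callers
	})
	return nil
}

func main() {
	s := &server{done: make(chan struct{})}
	_ = s.Stop()
	_ = s.Stop() // safe: sync.Once runs the close at most once
	fmt.Println("stopped once")
}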


188-196: RequestTxs: shutdown‑aware select is correct

The updated select on s.done vs s.requestTxsResultChan ensures callers either see protocol.ErrProtocolShuttingDown or the Tx bodies, and gracefully handles a closed result channel. This matches the shutdown strategy used elsewhere and looks correct.
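
The receive side of that pattern, sketched with illustrative names (await stands in for RequestTxs; the ok check covers the closed-result-channel case):

package main

import (
	"errors"
	"fmt"
)

var errShuttingDown = errors.New("protocol is shutting down")

func await(done <-chan struct{}, results <-chan []byte) ([]byte, error) {
	select {
	case <-done:
		return nil, errShuttingDown
	case r, ok := <-results:
		if !ok { // result channel closed during shutdown
			return nil, errShuttingDown
		}
		return r, nil
	}
}

func main() {
	done := make(chan struct{})
	results := make(chan []byte, 1)
	results <- []byte("tx")
	r, err := await(done, results)
	fmt.Println(string(r), err) // tx <nil>
	close(done)
	_, err = await(done, results)
	fmt.Println(err) // protocol is shutting down
}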

@wolf31o2 force-pushed the perf/protocols-improvements branch from af11acf to ddbf172 on November 15, 2025 at 16:29
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
protocol/keepalive/client_test.go (1)

241-294: LGTM! Shutdown test properly validates client lifecycle.

The test correctly:

  • Creates a mock connection and wires async error monitoring
  • Starts and stops the KeepAlive client
  • Checks the Stop() error (addressing the past review comment)
  • Verifies shutdown within timeouts
  • Uses goleak.VerifyNone() to detect goroutine leaks
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between af11acf and ddbf172.

📒 Files selected for processing (23)
  • protocol/blockfetch/blockfetch.go (1 hunks)
  • protocol/blockfetch/client.go (6 hunks)
  • protocol/blockfetch/client_test.go (1 hunks)
  • protocol/chainsync/chainsync.go (1 hunks)
  • protocol/chainsync/client.go (5 hunks)
  • protocol/chainsync/client_test.go (2 hunks)
  • protocol/keepalive/client.go (2 hunks)
  • protocol/keepalive/client_test.go (1 hunks)
  • protocol/leiosfetch/client.go (2 hunks)
  • protocol/leiosnotify/client.go (4 hunks)
  • protocol/leiosnotify/client_test.go (1 hunks)
  • protocol/localstatequery/client.go (6 hunks)
  • protocol/localstatequery/client_test.go (1 hunks)
  • protocol/localtxmonitor/client.go (7 hunks)
  • protocol/localtxmonitor/client_test.go (1 hunks)
  • protocol/localtxsubmission/client.go (5 hunks)
  • protocol/localtxsubmission/client_test.go (1 hunks)
  • protocol/peersharing/client.go (4 hunks)
  • protocol/peersharing/client_test.go (1 hunks)
  • protocol/peersharing/server.go (1 hunks)
  • protocol/peersharing/server_test.go (1 hunks)
  • protocol/txsubmission/server.go (8 hunks)
  • protocol/txsubmission/server_test.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (7)
  • protocol/chainsync/chainsync.go
  • protocol/peersharing/server_test.go
  • protocol/localstatequery/client_test.go
  • protocol/txsubmission/server_test.go
  • protocol/chainsync/client_test.go
  • protocol/blockfetch/blockfetch.go
  • protocol/leiosnotify/client_test.go
🧰 Additional context used
🧬 Code graph analysis (15)
protocol/peersharing/server.go (1)
protocol/message.go (1)
  • Message (18-22)
protocol/localtxsubmission/client.go (1)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/localtxsubmission/client_test.go (8)
protocol/blockfetch/client_test.go (1)
  • TestClientShutdown (211-264)
protocol/chainsync/client_test.go (1)
  • TestClientShutdown (283-336)
protocol/localstatequery/client_test.go (1)
  • TestClientShutdown (357-409)
protocol/localtxmonitor/client_test.go (1)
  • TestClientShutdown (300-352)
connection.go (3)
  • NewConnection (107-130)
  • Connection (59-103)
  • New (133-135)
connection_options.go (2)
  • WithConnection (36-40)
  • WithNetworkMagic (50-54)
protocol/localtxsubmission/localtxsubmission.go (1)
  • LocalTxSubmission (67-70)
protocol/localtxsubmission/client.go (1)
  • Client (26-35)
protocol/localtxmonitor/client.go (2)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
ledger/tx.go (1)
  • Transaction (26-26)
protocol/blockfetch/client_test.go (5)
protocol/keepalive/client_test.go (1)
  • TestClientShutdown (241-294)
connection.go (3)
  • NewConnection (107-130)
  • Connection (59-103)
  • New (133-135)
protocol/blockfetch/blockfetch.go (2)
  • New (156-162)
  • BlockFetch (102-105)
connection_options.go (3)
  • WithConnection (36-40)
  • WithNetworkMagic (50-54)
  • WithNodeToNode (78-82)
protocol/blockfetch/client.go (1)
  • Client (29-40)
protocol/leiosnotify/client.go (2)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/blockfetch/client.go (1)
  • Client (29-40)
protocol/keepalive/client.go (4)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/keepalive/keepalive.go (1)
  • ProtocolName (27-27)
connection/id.go (1)
  • ConnectionId (22-25)
protocol/keepalive/messages.go (1)
  • NewMsgDone (94-101)
protocol/chainsync/client.go (1)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/blockfetch/client.go (3)
protocol/blockfetch/blockfetch.go (1)
  • New (156-162)
protocol/protocol.go (1)
  • New (122-133)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/peersharing/client_test.go (4)
connection.go (2)
  • NewConnection (107-130)
  • Connection (59-103)
protocol/protocol.go (1)
  • ProtocolRoleClient (94-94)
connection_options.go (3)
  • WithConnection (36-40)
  • WithNetworkMagic (50-54)
  • WithNodeToNode (78-82)
protocol/peersharing/client.go (1)
  • Client (25-33)
protocol/localstatequery/client.go (5)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/localtxmonitor/client.go (1)
  • Client (25-39)
protocol/localstatequery/messages.go (4)
  • NewMsgDone (245-252)
  • AcquireFailurePointNotOnChain (44-44)
  • MsgResult (172-175)
  • NewMsgQuery (160-170)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/localstatequery/error.go (2)
  • ErrAcquireFailurePointTooOld (20-20)
  • ErrAcquireFailurePointNotOnChain (23-25)
protocol/localtxmonitor/client_test.go (6)
protocol/localstatequery/client_test.go (1)
  • TestClientShutdown (357-409)
connection.go (3)
  • NewConnection (107-130)
  • Connection (59-103)
  • New (133-135)
protocol/protocol.go (2)
  • ProtocolRoleClient (94-94)
  • New (122-133)
connection_options.go (2)
  • WithConnection (36-40)
  • WithNetworkMagic (50-54)
protocol/localtxmonitor/localtxmonitor.go (1)
  • LocalTxMonitor (112-115)
protocol/localtxmonitor/client.go (1)
  • Client (25-39)
protocol/peersharing/client.go (4)
protocol/blockfetch/client.go (1)
  • Client (29-40)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/peersharing/peersharing.go (1)
  • ProtocolName (27-27)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/keepalive/client_test.go (5)
connection.go (3)
  • NewConnection (107-130)
  • Connection (59-103)
  • New (133-135)
protocol/protocol.go (2)
  • ProtocolRoleClient (94-94)
  • New (122-133)
connection_options.go (3)
  • WithConnection (36-40)
  • WithNetworkMagic (50-54)
  • WithNodeToNode (78-82)
protocol/keepalive/keepalive.go (1)
  • KeepAlive (85-88)
protocol/keepalive/client.go (1)
  • Client (26-34)
protocol/txsubmission/server.go (5)
protocol/txsubmission/messages.go (2)
  • TxBody (197-201)
  • NewMsgRequestTxIds (68-82)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/txsubmission/txsubmission.go (2)
  • ProtocolName (27-27)
  • MaxAckCount (143-143)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/txsubmission/error.go (1)
  • ErrStopServerProcess (21-21)
🔇 Additional comments (26)
protocol/leiosfetch/client.go (1)

101-107: LGTM! Proper deferred cleanup on shutdown.

The goroutine correctly waits for protocol shutdown via DoneChan() before closing result channels, preventing writes to closed channels during the shutdown window.

protocol/keepalive/client.go (1)

97-111: LGTM! Clean Stop() implementation.

The new Stop() method properly uses onceStop to ensure idempotency, logs the shutdown, and sends the MsgDone message. The timer cleanup is correctly handled by the goroutine in Start() (lines 84-92), which waits for DoneChan() before stopping the timer.
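
That timer-cleanup arrangement can be sketched as follows (field names follow the review's description, not the actual source; the sleep exists only to let the example's goroutine run):

package main

import (
	"fmt"
	"sync"
	"time"
)

type client struct {
	timerMutex sync.Mutex
	timer      *time.Timer
	done       chan struct{}
}

// startTimerCleanup mirrors the goroutine described above: wait for
// shutdown, then stop the timer under the same mutex used to (re)arm it.
func (c *client) startTimerCleanup() {
	go func() {
		<-c.done
		c.timerMutex.Lock()
		defer c.timerMutex.Unlock()
		if c.timer != nil {
			c.timer.Stop()
		}
	}()
}

func main() {
	c := &client{timer: time.NewTimer(time.Hour), done: make(chan struct{})}
	c.startTimerCleanup()
	close(c.done) // simulate protocol shutdown
	time.Sleep(10 * time.Millisecond)
	fmt.Println("keepalive timer stopped")
}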

protocol/localtxmonitor/client_test.go (1)

300-352: LGTM! Well-structured shutdown test.

The test follows the established pattern across protocol clients, properly validating:

  • Client availability check
  • Start/Stop lifecycle with error handling
  • Async error monitoring with timeouts
  • Goroutine leak detection via goleak
protocol/leiosnotify/client.go (2)

79-79: LGTM! Proper lifecycle management with started flag.

The started flag is correctly used to determine channel closure timing in Stop():

  • If started, defers requestNextChan closure until protocol shutdown (lines 96-100)
  • If never started, closes immediately (lines 102-103)

This prevents closing channels that may still receive writes during shutdown, addressing the race condition noted in past reviews.

Also applies to: 95-104


143-177: LGTM! Shutdown-aware channel sends prevent panics.

All message handlers now use select with DoneChan() to avoid writing to requestNextChan after it's closed. This eliminates the TOCTOU race condition where handlers could check shutdown state and then write to a closed channel.

protocol/localtxsubmission/client.go (2)

83-83: LGTM! Consistent lifecycle management pattern.

The started flag and conditional channel closure in Stop() follow the same pattern as other protocol clients in this PR, ensuring submitResultChan is closed at the appropriate time based on whether the protocol was started.

Also applies to: 104-113


165-169: LGTM! Handlers properly guard against shutdown races.

Both handleAcceptTx and handleRejectTx now use select to race the channel send against DoneChan(), returning ErrProtocolShuttingDown if shutdown occurs first. This addresses the TOCTOU race noted in past reviews.

Also applies to: 190-194

protocol/localtxsubmission/client_test.go (1)

167-219: LGTM! Comprehensive shutdown test.

The test properly validates the LocalTxSubmission client lifecycle:

  • Checks client availability
  • Starts and stops with error handling
  • Uses async error monitoring with timeouts
  • Verifies clean shutdown and goroutine leak detection
protocol/blockfetch/client.go (2)

97-97: LGTM! Proper deferred channel closure prevents panics.

The started flag and conditional closure logic ensure that blockChan and startBatchResultChan are closed only after the protocol fully shuts down (via DoneChan()), preventing panics from in-flight responses attempting to write to closed channels. This addresses the critical issue noted in past reviews.

Also applies to: 114-125


233-238: LGTM! All handlers are shutdown-aware.

The message handlers now properly use select to check DoneChan() before sending to result channels, ensuring no writes occur after shutdown begins. This eliminates race conditions between shutdown and message handling.

Also applies to: 250-256, 305-309

protocol/blockfetch/client_test.go (1)

211-264: LGTM! Thorough shutdown lifecycle test.

The test properly validates the new Start/Stop lifecycle by:

  • Verifying the client can be started and stopped cleanly
  • Using goroutine leak detection to ensure no resources are leaked
  • Handling async errors from the mock connection appropriately
  • Enforcing reasonable timeouts for shutdown sequences

This follows the established testing pattern used across other protocol clients in this PR.

protocol/chainsync/client.go (3)

37-37: Good addition of lifecycle tracking.

The started flag enables conditional channel cleanup in Stop(), ensuring channels are only closed after the protocol fully shuts down when the client was actually started.


148-157: Proper shutdown sequencing prevents resource leaks.

The conditional deferred channel closure ensures:

  • If started, channels close only after the protocol signals shutdown via DoneChan()
  • If never started, channels close immediately to prevent leaks

This addresses the race condition flagged in previous reviews.


728-733: Shutdown-aware channel writes prevent panics.

The select statements properly handle shutdown by:

  • Checking DoneChan() before writing to readyForNextBlockChan
  • Returning protocol.ErrProtocolShuttingDown if the protocol is shutting down
  • Preventing writes to potentially closed channels

This resolves the race condition between message handlers and Stop() noted in past reviews.

Also applies to: 739-743

protocol/localtxmonitor/client.go (2)

115-130: Excellent shutdown coordination for multiple result channels.

The conditional deferred closure handles all four result channels (acquireResultChan, hasTxResultChan, nextTxResultChan, getSizesResultChan) appropriately:

  • Waits for protocol shutdown if started
  • Closes immediately if never started

This prevents resource leaks and addresses the race condition noted in previous reviews.


276-281: Consistent shutdown-aware pattern across all handlers.

All message handlers now use the same safe pattern:

  • Select on DoneChan() to detect shutdown
  • Return protocol.ErrProtocolShuttingDown if shutting down
  • Otherwise, write to the result channel

This eliminates the race condition where handlers could write to channels closed by Stop().

Also applies to: 293-297, 310-314, 327-331

protocol/localstatequery/client.go (3)

904-911: Good synchronization for acquired state.

Using acquiredMutex to protect the acquired flag before signaling prevents race conditions with concurrent acquire/release operations.
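
Sketched minimally (illustrative names; the real client guards the same flag around its acquire, release, and query paths):

package main

import (
	"fmt"
	"sync"
)

type client struct {
	acquiredMutex sync.Mutex
	acquired      bool
}

// Every read and write of acquired happens under acquiredMutex, so
// concurrent acquire/release/query paths cannot race on the flag.
func (c *client) markAcquired() {
	c.acquiredMutex.Lock()
	defer c.acquiredMutex.Unlock()
	c.acquired = true
}

func (c *client) isAcquired() bool {
	c.acquiredMutex.Lock()
	defer c.acquiredMutex.Unlock()
	return c.acquired
}

func main() {
	c := &client{}
	c.markAcquired()
	fmt.Println(c.isAcquired()) // true
}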


927-937: Shutdown-aware error and result handling.

The handlers properly check for shutdown before writing to channels, returning protocol.ErrProtocolShuttingDown when appropriate. This prevents writes to closed channels.

Also applies to: 952-957


114-140: Resource leak on Stop() error path.

If SendMessage(msg) fails at line 125, the function returns the error at line 139 without:

  1. Closing the channels (queryResultChan, acquireResultChan)
  2. Calling Protocol.Stop()

This causes resource leaks when the Done message fails to send.

Apply this fix to ensure cleanup even on error:

 func (c *Client) Stop() error {
-	var err error
 	c.onceStop.Do(func() {
 		c.Protocol.Logger().
 			Debug("stopping client protocol",
 				"component", "network",
 				"protocol", ProtocolName,
 				"connection_id", c.callbackContext.ConnectionId.String(),
 			)
 		msg := NewMsgDone()
-		err = c.SendMessage(msg)
+		if sendErr := c.SendMessage(msg); sendErr != nil {
+			// Log the error but continue with cleanup
+			c.Protocol.Logger().
+				Error("failed to send Done message during stop",
+					"error", sendErr,
+					"component", "network",
+					"protocol", ProtocolName,
+					"connection_id", c.callbackContext.ConnectionId.String(),
+				)
+		}
 		// Defer closing channels until protocol fully shuts down (only if started)
 		if c.started {
 			go func() {
 				<-c.DoneChan()
 				close(c.queryResultChan)
 				close(c.acquireResultChan)
 			}()
 		} else {
 			// If protocol was never started, close channels immediately
 			close(c.queryResultChan)
 			close(c.acquireResultChan)
 		}
+		c.Protocol.Stop()
 	})
-	return err
+	return nil
 }

Alternatively, if you want to preserve the error, call Protocol.Stop() unconditionally before returning the error.

Likely an incorrect or invalid review comment.

protocol/txsubmission/server.go (3)

91-105: Proper shutdown synchronization with restart mechanism.

The Stop() method correctly:

  • Uses onceStop for idempotent shutdown
  • Holds restartMutex to coordinate with handleDone()'s restart logic
  • Closes done channel to signal shutdown
  • Calls Protocol.Stop() to clean up the underlying protocol

This addresses the race condition flagged in previous reviews.


131-163: Atomic operations fix ackCount data race.

The conversion to int32 with atomic operations (atomic.LoadInt32 and atomic.StoreInt32) properly synchronizes ackCount access between:

  • RequestTxIds() reading and writing ackCount
  • handleDone() resetting ackCount during restart

This resolves the data race noted in previous reviews.
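
In sketch form (takeAcks is a hypothetical helper; note that a Load followed by a Store is two atomic steps, not one, so atomic.SwapInt32 would be the choice if the read-and-reset had to be a single operation):

package main

import (
	"fmt"
	"sync/atomic"
)

type server struct {
	ackCount int32
}

// takeAcks reads the current ack count and resets it, using atomic ops
// so readers and writers on other goroutines never race on the field.
func (s *server) takeAcks() int32 {
	n := atomic.LoadInt32(&s.ackCount)
	atomic.StoreInt32(&s.ackCount, 0)
	return n
}

func main() {
	s := &server{}
	atomic.StoreInt32(&s.ackCount, 3)
	fmt.Println(s.takeAcks()) // 3
	fmt.Println(s.takeAcks()) // 0
}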


282-287: Safe channel close coordination.

The select statement prevents closing an already-closed done channel by checking if Stop() already closed it. Combined with restartMutex held by both Stop() and handleDone(), this eliminates the race condition.
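
The guard itself is a few lines; this sketch works because done is signal-only (never sent on), so a receive can only succeed once the channel is closed, and callers are assumed to hold the coordinating mutex:

package main

import "fmt"

// closeOnce closes done unless Stop() already closed it.
func closeOnce(done chan struct{}) {
	select {
	case <-done:
		// already closed; nothing to do
	default:
		close(done)
	}
}

func main() {
	done := make(chan struct{})
	closeOnce(done)
	closeOnce(done) // safe: the guard sees the closed channel
	fmt.Println("done closed exactly once")
}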

protocol/peersharing/server.go (1)

109-109: LGTM! Cosmetic improvement to eliminate unused parameter.

Renaming the unused msg parameter to _ properly indicates it's intentionally unused while maintaining the function signature for the message handler interface.

protocol/peersharing/client_test.go (1)

28-81: LGTM! Comprehensive shutdown test for PeerSharing client.

The test validates:

  • Client can be started and stopped cleanly
  • No goroutine leaks occur (via goleak.VerifyNone)
  • Proper error handling from mock connection
  • Shutdown completes within reasonable timeouts

Consistent with shutdown tests for other protocols in this PR.

protocol/peersharing/client.go (2)

73-109: Excellent lifecycle management implementation.

The Start/Stop methods provide proper lifecycle control:

  • Start() is idempotent via onceStart and sets the started flag
  • Stop() is idempotent via onceStop and conditionally defers channel closure
  • If started, channel closes only after protocol shutdown via DoneChan()
  • If never started, channel closes immediately to prevent leaks

This mirrors the pattern used consistently across other protocol clients in this PR.


154-158: Shutdown-aware channel write prevents panics.

The select statement properly handles shutdown by checking DoneChan() before writing to sharePeersChan, returning protocol.ErrProtocolShuttingDown if the protocol is shutting down. This prevents writes to a closed channel.

@wolf31o2 force-pushed the perf/protocols-improvements branch from ddbf172 to edd9441 on November 15, 2025 at 16:41
@wolf31o2
Member Author

@coderabbitai review

@coderabbitai
Contributor

coderabbitai bot commented Nov 15, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@wolf31o2
Member Author

@coderabbitai review

@coderabbitai
Contributor

coderabbitai bot commented Nov 15, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
protocol/leiosfetch/client.go (1)

230-253: Critical race condition: message handlers can write to closed channels.

Unlike the other protocol clients in this PR (leiosnotify, localtxsubmission, blockfetch), the message handlers here perform unconditional sends to result channels (lines 231, 236, 241, 246, 251) without checking DoneChan(). When Stop() spawns the cleanup goroutine (lines 116-122), there's a window where handlers can write to channels that are being closed, causing a panic.

Apply shutdown-aware sends to all handlers:

 func (c *Client) handleBlock(msg protocol.Message) error {
-	c.blockResultChan <- msg
+	select {
+	case <-c.DoneChan():
+		return protocol.ErrProtocolShuttingDown
+	case c.blockResultChan <- msg:
+	}
 	return nil
 }
 
 func (c *Client) handleBlockTxs(msg protocol.Message) error {
-	c.blockTxsResultChan <- msg
+	select {
+	case <-c.DoneChan():
+		return protocol.ErrProtocolShuttingDown
+	case c.blockTxsResultChan <- msg:
+	}
 	return nil
 }
 
 func (c *Client) handleVotes(msg protocol.Message) error {
-	c.votesResultChan <- msg
+	select {
+	case <-c.DoneChan():
+		return protocol.ErrProtocolShuttingDown
+	case c.votesResultChan <- msg:
+	}
 	return nil
 }
 
 func (c *Client) handleNextBlockAndTxsInRange(msg protocol.Message) error {
-	c.blockRangeResultChan <- msg
+	select {
+	case <-c.DoneChan():
+		return protocol.ErrProtocolShuttingDown
+	case c.blockRangeResultChan <- msg:
+	}
 	return nil
 }
 
 func (c *Client) handleLastBlockAndTxsInRange(msg protocol.Message) error {
-	c.blockRangeResultChan <- msg
+	select {
+	case <-c.DoneChan():
+		return protocol.ErrProtocolShuttingDown
+	case c.blockRangeResultChan <- msg:
+	}
 	return nil
 }
protocol/peersharing/client.go (1)

24-33: Data race on started between Start and Stop

started is written in Start() and read in Stop() but is not protected by any shared synchronization primitive. Because onceStart and onceStop are independent sync.Once instances, calling Start() and Stop() from different goroutines can trigger a race on started, and the Stop path (immediate vs deferred channel close) becomes non-deterministic under the race detector.

A minimal way to make this race-free is to guard started (and the Stop cleanup decision) with a mutex:

 type Client struct {
 	*protocol.Protocol
 	config          *Config
 	callbackContext CallbackContext
 	sharePeersChan  chan []PeerAddress
 	onceStart       sync.Once
 	onceStop        sync.Once
+	stateMutex      sync.Mutex
 	started         bool
 }

 // Start starts the PeerSharing client protocol
 func (c *Client) Start() {
 	c.onceStart.Do(func() {
-		c.Protocol.Logger().
+		c.stateMutex.Lock()
+		defer c.stateMutex.Unlock()
+
+		c.Protocol.Logger().
 			Debug("starting client protocol",
 				"component", "network",
 				"protocol", ProtocolName,
 				"connection_id", c.callbackContext.ConnectionId.String(),
 			)
 		c.started = true
 		c.Protocol.Start()
 	})
 }

 // Stop stops the PeerSharing client protocol
 func (c *Client) Stop() error {
 	c.onceStop.Do(func() {
-		c.Protocol.Logger().
+		c.stateMutex.Lock()
+		defer c.stateMutex.Unlock()
+
+		c.Protocol.Logger().
 			Debug("stopping client protocol",
 				"component", "network",
 				"protocol", ProtocolName,
 				"connection_id", c.callbackContext.ConnectionId.String(),
 			)
 		// Defer closing channel until protocol fully shuts down (only if started)
 		if c.started {
 			go func() {
 				<-c.DoneChan()
 				close(c.sharePeersChan)
 			}()
 		} else {
 			// If protocol was never started, close channel immediately
 			close(c.sharePeersChan)
 		}
 		c.Protocol.Stop()
 	})
 	return nil
 }

Alternatively, an atomic.Bool would also work if you’re comfortable taking a dependency on sync/atomic for this.

Also applies to: 73-109

protocol/localtxmonitor/client.go (1)

24-39: Unsynchronized started flag between Start and Stop

As in the PeerSharing client, started is written in Start() and read in Stop() without shared synchronization. Because onceStart and onceStop are distinct sync.Once instances, concurrent calls to Start() and Stop() can race on started, which in turn controls whether channels are closed immediately or only after DoneChan() completes. This is a classic data race that will be flagged by -race.

You can make this race-free by guarding started and the Stop cleanup decision with a mutex dedicated to lifecycle state:

 type Client struct {
 	*protocol.Protocol
 	config             *Config
 	callbackContext    CallbackContext
 	busyMutex          sync.Mutex
 	acquired           bool
 	acquiredSlot       uint64
 	acquireResultChan  chan bool
 	hasTxResultChan    chan bool
 	nextTxResultChan   chan []byte
 	getSizesResultChan chan MsgReplyGetSizesResult
 	onceStart          sync.Once
 	onceStop           sync.Once
+	stateMutex         sync.Mutex
 	started            bool
 }

 func (c *Client) Start() {
 	c.onceStart.Do(func() {
-		c.Protocol.Logger().
+		c.stateMutex.Lock()
+		defer c.stateMutex.Unlock()
+
+		c.Protocol.Logger().
 			Debug("starting client protocol",
 				"component", "network",
 				"protocol", ProtocolName,
 				"connection_id", c.callbackContext.ConnectionId.String(),
 			)
 		c.started = true
 		c.Protocol.Start()
 	})
 }

 // Stop transitions the protocol to the Done state. No more operations will be possible
 func (c *Client) Stop() error {
 	var err error
 	c.onceStop.Do(func() {
-		c.Protocol.Logger().
+		c.stateMutex.Lock()
+		defer c.stateMutex.Unlock()
+
+		c.Protocol.Logger().
 			Debug("stopping client protocol",
 				"component", "network",
 				"protocol", ProtocolName,
 				"connection_id", c.callbackContext.ConnectionId.String(),
 			)
 		c.busyMutex.Lock()
 		defer c.busyMutex.Unlock()
 		msg := NewMsgDone()
 		if err = c.SendMessage(msg); err != nil {
 			return
 		}
 		// Defer closing channels until protocol fully shuts down (only if started)
 		if c.started {
 			go func() {
 				<-c.DoneChan()
 				close(c.acquireResultChan)
 				close(c.hasTxResultChan)
 				close(c.nextTxResultChan)
 				close(c.getSizesResultChan)
 			}()
 		} else {
 			// If protocol was never started, close channels immediately
 			close(c.acquireResultChan)
 			close(c.hasTxResultChan)
 			close(c.nextTxResultChan)
 			close(c.getSizesResultChan)
 		}
 	})
 	return err
 }

An atomic.Bool would also work if you prefer that pattern over an extra mutex.

Also applies to: 86-97, 99-133

protocol/chainsync/client.go (1)

341-346: Check channel closure in GetAvailableBlockRange.

The receive from readyForNextBlockChan at line 341 doesn't check the ok return value. If the channel is closed during GetAvailableBlockRange (e.g., concurrent Stop()), the receive will succeed repeatedly with a zero value, causing the loop to spin and send spurious RequestNext messages until DoneChan() is checked on the next iteration.

Apply this diff to handle channel closure:

-		case <-c.readyForNextBlockChan:
+		case ready, ok := <-c.readyForNextBlockChan:
+			if !ok {
+				// Channel closed, protocol shutting down
+				return start, end, protocol.ErrProtocolShuttingDown
+			}
+			// Only proceed if ready is true
+			if !ready {
+				return start, end, errors.New("sync cancelled")
+			}
 			// Request the next block
 			msg := NewMsgRequestNext()
 			if err := c.SendMessage(msg); err != nil {
🧹 Nitpick comments (1)
protocol/keepalive/client.go (1)

33-34: Consider preserving the first Stop() error across subsequent calls

Right now Stop() returns the SendMessage error only on the first invocation; later calls always return nil because onceStop.Do short-circuits and err stays zero-valued. If callers may legitimately call Stop() more than once, consider storing the first error in the Client and always returning that, e.g. a stopErr field populated inside the onceStop closure.

Also applies to: 97-111
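
A sketch of the suggested stopErr approach (sendDone is a stand-in for SendMessage(NewMsgDone()); reading stopErr after onceStop.Do is safe because Do guarantees the closure has completed before any call to Do returns):

package main

import (
	"errors"
	"fmt"
	"sync"
)

type client struct {
	onceStop sync.Once
	stopErr  error
}

// Stop records the first error and reports the same outcome on every call.
func (c *client) Stop() error {
	c.onceStop.Do(func() {
		c.stopErr = c.sendDone()
	})
	return c.stopErr
}

func (c *client) sendDone() error {
	return errors.New("send failed") // stand-in for the Done message send
}

func main() {
	c := &client{}
	fmt.Println(c.Stop()) // send failed
	fmt.Println(c.Stop()) // send failed (preserved, not nil)
}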

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ddbf172 and edd9441.

📒 Files selected for processing (23)
  • protocol/blockfetch/blockfetch.go (1 hunks)
  • protocol/blockfetch/client.go (6 hunks)
  • protocol/blockfetch/client_test.go (1 hunks)
  • protocol/chainsync/chainsync.go (1 hunks)
  • protocol/chainsync/client.go (5 hunks)
  • protocol/chainsync/client_test.go (2 hunks)
  • protocol/keepalive/client.go (2 hunks)
  • protocol/keepalive/client_test.go (1 hunks)
  • protocol/leiosfetch/client.go (3 hunks)
  • protocol/leiosnotify/client.go (4 hunks)
  • protocol/leiosnotify/client_test.go (1 hunks)
  • protocol/localstatequery/client.go (6 hunks)
  • protocol/localstatequery/client_test.go (1 hunks)
  • protocol/localtxmonitor/client.go (7 hunks)
  • protocol/localtxmonitor/client_test.go (1 hunks)
  • protocol/localtxsubmission/client.go (5 hunks)
  • protocol/localtxsubmission/client_test.go (1 hunks)
  • protocol/peersharing/client.go (4 hunks)
  • protocol/peersharing/client_test.go (1 hunks)
  • protocol/peersharing/server.go (1 hunks)
  • protocol/peersharing/server_test.go (1 hunks)
  • protocol/txsubmission/server.go (8 hunks)
  • protocol/txsubmission/server_test.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (7)
  • protocol/blockfetch/blockfetch.go
  • protocol/leiosnotify/client_test.go
  • protocol/blockfetch/client_test.go
  • protocol/peersharing/server_test.go
  • protocol/localtxsubmission/client_test.go
  • protocol/chainsync/chainsync.go
  • protocol/peersharing/server.go
🔇 Additional comments (38)
protocol/leiosnotify/client.go (2)

31-31: LGTM! Proper lifecycle tracking and conditional channel closure.

The started flag correctly tracks whether the protocol has been started, and Stop() uses this to defer channel cleanup until the protocol fully shuts down (via DoneChan()). This prevents races where channels are closed prematurely.

Also applies to: 79-79, 95-104


143-176: LGTM! Shutdown-aware sends prevent write-to-closed-channel panics.

All message handlers properly use select to check DoneChan() before sending to requestNextChan, ensuring no sends occur after shutdown begins.

protocol/localtxsubmission/client.go (2)

34-34: LGTM! Consistent lifecycle management.

The started flag and conditional channel closure in Stop() follow the same correct pattern as other protocol clients in this PR.

Also applies to: 83-83, 104-113


157-171: LGTM! Handlers properly check for shutdown.

Both handleAcceptTx and handleRejectTx use select to race the send against DoneChan(), eliminating the TOCTOU race mentioned in past review comments.

Also applies to: 173-196

protocol/blockfetch/client.go (2)

39-39: LGTM! Proper deferred channel cleanup.

The started flag ensures channels are only closed after the protocol fully shuts down (via DoneChan()), preventing panics from in-flight responses.

Also applies to: 97-97, 114-125


224-239: LGTM! All message handlers are shutdown-aware.

All three handlers (handleStartBatch, handleNoBlocks, handleBlock) properly use select with DoneChan() before sending to result channels, addressing the critical race condition noted in past reviews.

Also applies to: 241-257, 259-312

protocol/chainsync/client_test.go (2)

83-86: LGTM! Proper cleanup prevents goroutine leaks.

Adding explicit client shutdown after each test ensures proper resource cleanup, which is validated by the goleak.VerifyNone(t) check.


283-336: LGTM! Standard shutdown lifecycle test.

The test properly validates the Start/Stop lifecycle with timeout checks and error handling, consistent with shutdown tests in other protocol clients.

protocol/txsubmission/server_test.go (1)

28-80: LGTM! Test structure is sound, skip is documented.

The test follows the same shutdown validation pattern as other protocol tests. The skip reason clearly documents a known mock infrastructure limitation. Once mock server issues are resolved, this test will provide coverage for the server shutdown path.

protocol/keepalive/client_test.go (1)

241-294: LGTM! Consistent shutdown test with proper error handling.

The test properly validates the Stop() error return (lines 272-274), addressing the inconsistency noted in past reviews. The structure matches other protocol shutdown tests.

protocol/localtxmonitor/client_test.go (1)

300-352: LGTM! Consistent shutdown lifecycle validation.

The test follows the established pattern for shutdown tests across protocol clients, with proper error handling and timeout validation.

protocol/peersharing/client_test.go (1)

28-80: Shutdown test pattern looks solid

The test exercises start/stop, validates both mock and Ouroboros shutdown, and leverages goleak to catch leaks. This nicely mirrors the other protocol shutdown tests.

protocol/localstatequery/client_test.go (1)

357-409: Consistent shutdown coverage for LocalStateQuery

This mirrors the other protocol shutdown tests: verifies client Start/Stop, checks mock and Ouroboros shutdown, and uses goleak for leak detection. Good addition to guard the new lifecycle behavior.

protocol/peersharing/client.go (1)

145-159: Shutdown-aware handleSharePeers looks correct

Using a select on DoneChan() before sending to sharePeersChan aligns with the deferred-close logic in Stop() and avoids writes to a closed channel while giving callers a clear ErrProtocolShuttingDown signal via GetPeers.

protocol/localtxmonitor/client.go (1)

265-333: Handler select on DoneChan cleanly avoids send-on-closed-channel panics

The updated handlers (handleAcquired, handleReplyHasTx, handleReplyNextTx, handleReplyGetSizes) now select on c.DoneChan() before sending into their result channels and propagate ErrProtocolShuttingDown when shutting down. Combined with deferring channel closure until after DoneChan in Stop(), this resolves the prior race between handler writes and channel closes.

protocol/chainsync/client.go (5)

37-37: LGTM: started field tracks lifecycle state.

The started boolean tracks whether Start() has been invoked, enabling conditional cleanup in Stop(). Access is safe because it's written once under onceStart and read once under onceStop.


119-130: LGTM: Start() sets lifecycle flag.

Setting started = true enables the conditional cleanup logic in Stop(). Protected by onceStart.


148-157: LGTM: Shutdown sequencing addresses the race condition.

The goroutine waits for protocol shutdown (DoneChan()) before closing readyForNextBlockChan, ensuring message handlers complete before the channel closes. This addresses the race condition flagged in the previous review.


728-743: LGTM: Shutdown-aware channel sends prevent write-after-close.

Wrapping the readyForNextBlockChan sends in select statements with DoneChan() checks prevents writes to a closed channel during shutdown. This coordinates correctly with Stop(), which waits for protocol shutdown before closing the channel.


767-783: LGTM: Consistent shutdown-aware channel sends.

The handleRollBackward method applies the same shutdown-aware pattern as handleRollForward, preventing write-after-close panics.

protocol/localstatequery/client.go (10)

38-38: LGTM: Mutex added to protect acquired state.

The acquiredMutex synchronizes access to the acquired boolean, preventing data races between concurrent operations like acquire(), release(), runQuery(), and message handlers.


44-45: LGTM: Lifecycle management fields added.

The onceStop and started fields enable idempotent shutdown and conditional cleanup, consistent with other protocols in this PR.


101-112: LGTM: Start() sets lifecycle flag and removes immediate cleanup goroutine.

Setting started = true and removing the cleanup goroutine aligns with the PR's approach of deferring channel cleanup until Stop() ensures protocol shutdown.


114-140: LGTM: Stop() defers channel cleanup until protocol shutdown.

The Stop() method sends MsgDone and waits for full protocol shutdown before closing queryResultChan and acquireResultChan, preventing write-after-close panics in message handlers. The conditional logic handles both started and non-started cases correctly.


904-911: LGTM: handleAcquired() protects state and uses shutdown-aware send.

The acquiredMutex protects the acquired boolean, and the select statement prevents writing to a closed channel during shutdown.


927-937: LGTM: handleFailure() uses shutdown-aware sends.

Both failure cases wrap acquireResultChan sends in select statements with DoneChan() checks, preventing writes to closed channels during shutdown.


952-957: LGTM: handleResult() uses shutdown-aware send.

Extracting msgResult before the select and checking DoneChan() prevents writes to a closed channel during shutdown.


961-997: LGTM: acquire() reads acquired state under mutex.

The acquiredMutex protects the read of the acquired boolean (lines 962-964), ensuring thread-safe access when determining whether to send Acquire or ReAcquire messages.


1004-1006: LGTM: release() protects state update.

The acquiredMutex protects the write to acquired, maintaining consistent synchronization with other accesses.


1011-1020: LGTM: runQuery() reads acquired state under mutex.

The acquiredMutex protects the read of acquired (lines 1013-1015), ensuring thread-safe determination of whether to acquire the volatile tip before running the query.

protocol/txsubmission/server.go (8)

20-21: LGTM: Imports added for atomic operations and synchronization.

The sync and sync/atomic imports support the new atomic ackCount operations and restart coordination via restartMutex.


33-33: LGTM: ackCount changed to int32 for atomic operations.

The type change from int to int32 enables atomic operations, addressing the data race flagged in the previous review.


36-38: LGTM: Lifecycle and coordination fields added.

The done, onceStop, and restartMutex fields enable proper shutdown signaling and coordinate the restart logic in handleDone() with explicit Stop() calls.


52-54: LGTM: Buffered result channels and done signal.

Buffering the result channels (capacity 1) allows handlers to send results without blocking, and the done channel enables shutdown signaling to blocking operations.


91-105: LGTM: Stop() coordinates with restart via restartMutex.

The Stop() method acquires restartMutex before closing done, coordinating with handleDone()'s restart logic to prevent concurrent closes of the done channel. This addresses the race condition flagged in the previous review.


131-167: LGTM: Atomic operations protect ackCount and shutdown is handled.

Using atomic.LoadInt32 (line 131) and atomic.StoreInt32 (line 163) eliminates the data race on ackCount flagged in the previous review. The select statement (lines 153-167) handles shutdown gracefully by checking both the result channel and done.


192-199: LGTM: RequestTxs() handles shutdown gracefully.

The select statement checks both done and requestTxsResultChan, returning protocol.ErrProtocolShuttingDown if the server is shutting down.


259-298: LGTM: Restart logic coordinates properly with Stop().

The handleDone() method:

  1. Signals RequestTxIds() to stop waiting (non-blocking, lines 260-266)
  2. Acquires restartMutex to coordinate with Stop() (line 274)
  3. Guards against double-close of done with a select (lines 282-287)
  4. Reinitializes all state including atomic ackCount, channels, and onceStop (lines 289-294)
  5. Unlocks before calling Start() to avoid holding the mutex during protocol startup (line 295)

This restart flow addresses the race conditions flagged in the previous review.
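
In outline (illustrative types; the real handleDone also resets its sync.Once guard and result channels, and calls Start() after unlocking):

package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

type server struct {
	restartMutex sync.Mutex
	done         chan struct{}
	ackCount     int32
}

// handleDone guards the close of done, reinitializes per-session state
// under restartMutex, and leaves the restart itself outside the lock.
func (s *server) handleDone() {
	s.restartMutex.Lock()
	select {
	case <-s.done:
		// already closed by Stop(); nothing to do
	default:
		close(s.done)
	}
	atomic.StoreInt32(&s.ackCount, 0)
	s.done = make(chan struct{})
	s.restartMutex.Unlock()
	// Start() would be called here, outside the critical section
}

func main() {
	s := &server{done: make(chan struct{})}
	s.handleDone()
	fmt.Println("server state reset for restart")
}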

@wolf31o2 force-pushed the perf/protocols-improvements branch from edd9441 to 8e3b9ab on November 15, 2025 at 19:46
@wolf31o2
Member Author

@coderabbitai review

@coderabbitai
Contributor

coderabbitai bot commented Nov 15, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
protocol/leiosnotify/client.go (1)

24-32: Synchronize access to started to avoid a Start/Stop data race

started is written in Start() (line 79) and read in Stop() (line 96) without mutex protection. This creates a data race despite onceStart/onceStop usage—those synchronize call-once semantics, not field access across goroutines.

The reference implementation in protocol/localtxmonitor/client.go already uses stateMutex to guard the started field. Apply the same pattern here:

 type Client struct {
 	*protocol.Protocol
 	config          *Config
 	callbackContext CallbackContext
 	requestNextChan chan protocol.Message
 	onceStart       sync.Once
 	onceStop        sync.Once
+	stateMutex      sync.Mutex
 	started         bool
 }

 func (c *Client) Start() {
 	c.onceStart.Do(func() {
+		c.stateMutex.Lock()
+		defer c.stateMutex.Unlock()
 		c.Protocol.Logger().
 			Debug("starting client protocol",
 				"component", "network",
 				"protocol", ProtocolName,
 				"connection_id", c.callbackContext.ConnectionId.String(),
 			)
 		c.started = true
 		c.Protocol.Start()
 	})
 }

 func (c *Client) Stop() error {
 	var err error
 	c.onceStop.Do(func() {
+		c.stateMutex.Lock()
+		defer c.stateMutex.Unlock()
 		c.Protocol.Logger().
 			Debug("stopping client protocol",
 				"component", "network",
 				"protocol", ProtocolName,
 				"connection_id", c.callbackContext.ConnectionId.String(),
 			)
 		msg := NewMsgDone()
 		err = c.SendMessage(msg)
 		// Defer closing channel until protocol fully shuts down (only if started)
 		if c.started {
 			go func() {
 				<-c.DoneChan()
 				close(c.requestNextChan)
 			}()
 		} else {
 			// If protocol was never started, close channel immediately
 			close(c.requestNextChan)
 		}
 	})
 	return err
 }
protocol/localtxsubmission/client.go (1)

26-35: Confirmed: Data race on started field requires synchronization with dedicated mutex

The review correctly identifies a critical race condition. The field started is written in Start() at line 83 and read in Stop() at line 105 without synchronization. Although onceStart and onceStop prevent multiple invocations of each closure independently, they do not synchronize between the two methods—concurrent calls to Start() and Stop() can race on the started field across goroutines.

The reference implementation in localtxmonitor/client.go confirms the correct pattern: it protects started with a dedicated stateMutex, locking it in both Start() and Stop(). The suggested diff accurately mirrors this proven approach and preserves the existing "never started" fast-close logic while eliminating the data race.

protocol/chainsync/client.go (1)

29-38: Synchronize started flag between Start and Stop to avoid races

Data race confirmed: started is a plain bool written in Start() (line 127) and read in Stop() (line 149) with no synchronization. The independent sync.Once guards on onceStart and onceStop do not synchronize against each other. Concurrent Start()/Stop() calls can cause Stop() to read started as false before Start() writes true, closing readyForNextBlockChan immediately while handlers at lines 739, 750, 778, and 790 still attempt to send on it, resulting in a panic.

The fix is correct: replace started bool with started atomic.Bool in the struct (line 37), update line 127 to c.started.Store(true), and line 149 to if c.started.Load(). The sync/atomic import already exists.

@@ type Client struct {
-	started                  bool
+	started                  atomic.Bool
@@ func (c *Client) Start() {
-		c.started = true
+		c.started.Store(true)
@@ func (c *Client) Stop() error {
-		if c.started {
+		if c.started.Load() {
protocol/blockfetch/client.go (1)

29-40: Prevent race and send-after-close hazards around started in Start/Stop

started is a plain bool written in Start() (line 97) and read in Stop() (line 115) without synchronization. Although both methods use sync.Once guards, onceStart and onceStop are separate instances that do not synchronize with each other. If Start() and Stop() run concurrently, Stop() can observe started == false while Start() is still in its Do() block before the assignment completes, causing Stop() to close blockChan and startBatchResultChan immediately. Later message handlers will panic when sending to closed channels.

A minimal fix is to make started atomic and use Store/Load:

@@
-import (
-	"errors"
-	"fmt"
-	"sync"
+import (
+	"errors"
+	"fmt"
+	"sync"
+	"sync/atomic"
@@
 	blockUseCallback     bool              // Whether to use callback for blocks
 	onceStart            sync.Once         // Ensures Start is only called once
 	onceStop             sync.Once         // Ensures Stop is only called once
-	started              bool              // Whether the protocol has been started
+	started              atomic.Bool       // Whether the protocol has been started
@@ func (c *Client) Start() {
-		c.started = true
+		c.started.Store(true)
@@ func (c *Client) Stop() error {
-		if c.started {
+		if c.started.Load() {

This eliminates the data race between Start/Stop and the remaining panic risk. If the intended contract forbids calling Start() after Stop(), document that to avoid unsupported lifecycle patterns.

♻️ Duplicate comments (1)
protocol/txsubmission/server_test.go (1)

28-80: Check and assert the error from Server.Stop() in the shutdown test

TestServerShutdown currently calls oConn.TxSubmission().Server.Stop() (Line 60) without checking the returned error. This is the same pattern that previously hid shutdown issues in other protocol tests.

Consider aligning with the other tests by asserting that Stop() succeeds:

@@ func TestServerShutdown(t *testing.T) {
-	// Start the server
-	oConn.TxSubmission().Server.Start()
-	// Stop the server
-	oConn.TxSubmission().Server.Stop()
+	// Start the server
+	oConn.TxSubmission().Server.Start()
+	// Stop the server
+	if err := oConn.TxSubmission().Server.Stop(); err != nil {
+		t.Fatalf("unexpected error when stopping server: %s", err)
+	}

This will surface any shutdown failures instead of silently ignoring them, even once the test is un-skipped.

🧹 Nitpick comments (7)
protocol/localtxsubmission/client_test.go (1)

167-219: TestClientShutdown gives good coverage of client lifecycle

The new TestClientShutdown exercises LocalTxSubmission().Client.Start() and Stop() against the mock connection, waits for the mock to shut down, and checks for leaks via goleak and timeouts. This is a useful regression test for the new lifecycle behavior. If you want to reduce duplication, you could factor this through the existing runTest helper, but that’s optional.

protocol/chainsync/client.go (1)

241-365: Improved handling of readyForNextBlockChan in GetAvailableBlockRange

The new (ready, ok := <-c.readyForNextBlockChan) logic (Lines 341–349) correctly distinguishes:

  • closed channel → treat as protocol shutdown and return protocol.ErrProtocolShuttingDown
  • ready == false → treat as sync cancellation

This aligns the client-facing API with the new semantics introduced in handleRollForward/handleRollBackward and avoids panics on closed channels.

If you find yourself reusing the "sync cancelled" error elsewhere, consider defining it as a package-level sentinel (var ErrSyncCancelled = errors.New("sync cancelled")) to make comparisons easier and reduce duplication.
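
For instance, a sentinel makes the condition matchable with errors.Is even when wrapped (sketch; getBlockRange is a hypothetical caller, not the real method):

package main

import (
	"errors"
	"fmt"
)

// ErrSyncCancelled is the package-level sentinel suggested above.
var ErrSyncCancelled = errors.New("sync cancelled")

func getBlockRange(cancelled bool) error {
	if cancelled {
		return fmt.Errorf("chainsync: %w", ErrSyncCancelled)
	}
	return nil
}

func main() {
	err := getBlockRange(true)
	fmt.Println(errors.Is(err, ErrSyncCancelled)) // true
}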

protocol/peersharing/client_test.go (1)

28-80: Client shutdown test is solid; consider mirroring the common error-watcher pattern

The shutdown flow (Start → Stop → wait on mock error channel → Close Ouroboros → wait on oConn.ErrorChan) plus goleak.VerifyNone looks correct and should catch leaks. If you want this to match the stricter pattern used in other protocol tests (e.g., runTest in localstatequery), you could also add a short-lived goroutine watching oConn.ErrorChan() during the active phase to fail fast on unexpected Ouroboros errors instead of relying solely on goleak/timeouts.

protocol/peersharing/server_test.go (1)

28-78: Skipped server shutdown test: document intent and future re‑enablement

Because t.Skip is the first statement, the rest of the function (including goleak.VerifyNone) never runs, so there is currently no server-side shutdown coverage. That’s fine if this is a temporary workaround, but it’s worth either:

  • Adding a TODO / issue reference into the skip message so it’s easy to track when NtN mock issues are fixed, or
  • Moving the skip behind a condition (or down in the body) if you want to keep goleak.VerifyNone active once the test is re-enabled.
protocol/localstatequery/client_test.go (1)

25-34: LocalStateQuery shutdown test is correct; consider reusing runTest harness

The new alias import and TestClientShutdown correctly wire a mock connection, start/stop the LocalStateQuery client, and wait for both the mock and Ouroboros connection to shut down with goleak.VerifyNone. To reduce duplication and keep behavior consistent with the other localstatequery tests, you could:

  • Build the handshake-only conversation slice, and
  • Implement TestClientShutdown as a thin wrapper around runTest, with an innerFunc that just asserts LocalStateQuery() != nil, calls Client.Start(), and then Client.Stop().

This would automatically reuse the existing async oConn.ErrorChan watcher and shutdown sequencing.

Also applies to: 357-409

protocol/keepalive/client.go (1)

26-35: Make KeepAlive Stop tie more explicitly into protocol/timer shutdown

The onceStop/stopErr pattern and logging look good, but Stop() currently only sends NewMsgDone and relies on the rest of the stack to fully tear down the protocol and stop the timer. If SendMessage fails (e.g., peer already shutting down) or the remote misbehaves, you may end up with the keepalive timer/goroutine still active even though Stop() returned.

Consider tightening this up by, for example:

  • Calling c.Protocol.Stop() after sending the Done message so DoneChan() is guaranteed to close, and/or
  • Stopping the timer directly in Stop() (under timerMutex) as a best-effort local cleanup, similar to the goroutine in Start that reacts to DoneChan().

This would make KeepAlive’s lifecycle more robust and closer to other protocol clients.

Also applies to: 98-111

protocol/peersharing/client.go (1)

25-35: PeerSharing lifecycle is well structured; guard the Stop‑before‑Start edge case

The added onceStart/onceStop, stateMutex, and started flag give you a clean, idempotent start/stop model, and deferring sharePeersChan closure until after DoneChan() closes (when started) is a nice way to let GetPeers either receive peers or cleanly return ErrProtocolShuttingDown. The shutdown-aware handleSharePeers select is also a good improvement over an unconditional send.

One edge case to consider: if Stop() is called before Start(), you immediately close(c.sharePeersChan) and call c.Protocol.Stop(). A later Start() will still succeed (different sync.Once), but sharePeersChan remains closed; any incoming SharePeers message would cause handleSharePeers to send on a closed channel and panic.

If you want to defend against that misuse, you could, for example:

  • Treat Stop() as a no-op when !c.started, or
  • Track a separate stopped flag and have Start() become a no-op (or return an error) once the client has been stopped.

Either approach would make the API more robust in the face of accidental Stop-before-Start usage without changing the normal Start→GetPeers→Stop flow.

Also applies to: 74-116, 152-165

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between edd9441 and 8e3b9ab.

📒 Files selected for processing (23)
  • protocol/blockfetch/blockfetch.go (1 hunks)
  • protocol/blockfetch/client.go (6 hunks)
  • protocol/blockfetch/client_test.go (1 hunks)
  • protocol/chainsync/chainsync.go (1 hunks)
  • protocol/chainsync/client.go (6 hunks)
  • protocol/chainsync/client_test.go (2 hunks)
  • protocol/keepalive/client.go (2 hunks)
  • protocol/keepalive/client_test.go (1 hunks)
  • protocol/leiosfetch/client.go (4 hunks)
  • protocol/leiosnotify/client.go (4 hunks)
  • protocol/leiosnotify/client_test.go (1 hunks)
  • protocol/localstatequery/client.go (6 hunks)
  • protocol/localstatequery/client_test.go (2 hunks)
  • protocol/localtxmonitor/client.go (7 hunks)
  • protocol/localtxmonitor/client_test.go (1 hunks)
  • protocol/localtxsubmission/client.go (5 hunks)
  • protocol/localtxsubmission/client_test.go (1 hunks)
  • protocol/peersharing/client.go (4 hunks)
  • protocol/peersharing/client_test.go (1 hunks)
  • protocol/peersharing/server.go (1 hunks)
  • protocol/peersharing/server_test.go (1 hunks)
  • protocol/txsubmission/server.go (8 hunks)
  • protocol/txsubmission/server_test.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • protocol/chainsync/chainsync.go
  • protocol/blockfetch/client_test.go
  • protocol/leiosnotify/client_test.go
  • protocol/localtxmonitor/client_test.go
  • protocol/chainsync/client_test.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-04T15:54:22.683Z
Learnt from: wolf31o2
Repo: blinklabs-io/gouroboros PR: 1093
File: ledger/mary/pparams.go:143-150
Timestamp: 2025-11-04T15:54:22.683Z
Learning: In the blinklabs-io/gouroboros repository, the design goal for CBOR round-trip tests is to achieve byte-identical encoding WITHOUT using stored CBOR (cbor.DecodeStoreCbor). Instead, the approach uses proper field types (pointers for optional fields) and relies on the cbor package's deterministic encoding (SortCoreDeterministic) to ensure reproducible output. The stored CBOR pattern should not be suggested as a solution for round-trip fidelity in this codebase.

Applied to files:

  • protocol/localstatequery/client_test.go
🔇 Additional comments (25)
protocol/leiosfetch/client.go (1)

230-273: LGTM! Shutdown-aware channel sends prevent deadlocks.

All handler methods now use select with DoneChan() to make channel sends non-blocking during shutdown. This correctly prevents deadlocks if Stop() is called while a handler is blocked waiting to send on a result channel, and consistently returns protocol.ErrProtocolShuttingDown when the protocol is shutting down.
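
The pattern being praised here reduces to a small generic helper; this sketch is illustrative only (sendOrShutdown and the error value are stand-ins, not part of the gouroboros API):

package main

import (
	"errors"
	"fmt"
)

// Stand-in for protocol.ErrProtocolShuttingDown
var errShuttingDown = errors.New("protocol is shutting down")

// sendOrShutdown distills the handler pattern: deliver v on ch unless done
// is closed (or closes while the send is blocked).
func sendOrShutdown[T any](done <-chan struct{}, ch chan<- T, v T) error {
	select {
	case <-done:
		return errShuttingDown
	case ch <- v:
		return nil
	}
}

func main() {
	done := make(chan struct{})
	results := make(chan int, 1)
	fmt.Println(sendOrShutdown(done, results, 1)) // <nil>: buffered send succeeds
	close(done)
	// Buffer is full and done is closed, so the shutdown case fires
	fmt.Println(sendOrShutdown(done, results, 2))
}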

protocol/blockfetch/blockfetch.go (1)

121-123: DefaultRecvQueueSize bump is safe and within validation bounds

The new default of 384 stays below MaxRecvQueueSize (512) and continues to flow through NewConfig/validate() unchanged, so this is a straightforward tuning change with no correctness risk.

protocol/leiosnotify/client.go (1)

143-176: Shutdown‑aware handler sends look correct

Wrapping the handler sends in a select on DoneChan() vs requestNextChan cleanly prevents writes to a channel that’s being closed during shutdown and surfaces protocol.ErrProtocolShuttingDown to the caller in a predictable way. This aligns well with the Stop/DoneChan–based cleanup path.

protocol/localtxsubmission/client.go (1)

157-171: Handler select pattern resolves previous TOCTOU risk

The updated handleAcceptTx and handleRejectTx use select { case <-c.DoneChan(): …; case c.submitResultChan <- … }, which ensures a handler blocked on a pending result send observes shutdown via DoneChan() instead of blocking indefinitely, and prevents writes to a channel that Stop() is about to close. This is the right pattern for avoiding the earlier TOCTOU send‑to‑closed‑channel issue.

Also applies to: 173-195

protocol/localtxmonitor/client.go (1)

24-40: Lifecycle tracking and shutdown‑aware handlers in LocalTxMonitor look solid

Adding stateMutex/started and using the mutex in Start()/Stop() cleanly serializes lifecycle transitions and avoids races on the state flag. Deferring channel closes until after DoneChan() when started, and falling back to immediate close otherwise, matches the intended semantics and prevents goroutine leaks. The handler changes that wrap sends in a select on DoneChan() vs the result channels mirror the pattern used elsewhere in this PR and should eliminate send‑to‑closed‑channel panics in this client.

Also applies to: 87-101, 103-140, 272-339

protocol/blockfetch/client.go (1)

224-239: Shutdown-aware selects on internal channels look correct

The new select blocks in handleStartBatch (Lines 233–237), handleNoBlocks (Lines 250–255), and the non-callback path of handleBlock (Lines 305–309) properly gate sends on DoneChan() and return protocol.ErrProtocolShuttingDown on shutdown instead of blindly sending. Together with the deferred channel closure in Stop(), this removes the previous send-on-closed-channel panic risk and makes shutdown semantics much safer.

Also applies to: 241-257, 285-310

protocol/chainsync/client.go (1)

606-753: Shutdown-aware writes to readyForNextBlockChan look safe

The new select blocks in handleRollForward (Lines 736–741 and 747–751) and handleRollBackward (Lines 775–779 and 787–791) ensure that writes to readyForNextBlockChan are skipped once DoneChan() is closed and instead return protocol.ErrProtocolShuttingDown. This fixes the previous write-after-close race from earlier reviews while preserving the “ready vs cancelled” signaling semantics.

Also applies to: 755-793

protocol/keepalive/client_test.go (1)

241-294: Shutdown test for KeepAlive client looks solid

TestClientShutdown follows the common pattern used elsewhere: async error monitoring on the mock connection, explicit Start()/Stop() calls with error checking, time-bounded waits, and final oConn.Close()/shutdown wait. This should give good coverage of the new KeepAlive client lifecycle behavior.

protocol/peersharing/server.go (1)

109-122: Unused-parameter cleanup in handleDone is fine

Switching the handleDone parameter to _ protocol.Message correctly reflects that the message is not used and keeps the existing restart logic unchanged.

protocol/localstatequery/client.go (9)

38-38: LGTM: Lifecycle control fields added.

The new acquiredMutex, onceStop, and started fields provide proper synchronization for acquisition state and idempotent shutdown, aligning with the lifecycle improvements described in the PR objectives.

Also applies to: 44-45


101-112: LGTM: Start lifecycle tracked.

Setting started = true before calling Protocol.Start() correctly tracks whether the protocol was initialized, which is later used in Stop() to determine channel cleanup behavior.


114-140: LGTM: Stop method with deferred cleanup.

The Stop() implementation correctly:

  • Uses onceStop for idempotency
  • Defers channel closure until protocol shutdown when started
  • Closes channels immediately if never started

The goroutine at line 128 correctly waits for DoneChan() before closing channels, preventing "send on closed channel" panics during graceful shutdown.
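
A standalone sketch of that deferred-close sequencing, with stand-in names (done plays the role of Protocol.DoneChan(), resultChan the role of the real result channels):

package main

import (
	"fmt"
	"sync"
)

// Stand-in client demonstrating "close result channels only after the
// protocol signals done", so a late handler send can't hit a closed channel.
type client struct {
	onceStop   sync.Once
	started    bool
	done       chan struct{}
	resultChan chan int
}

func (c *client) Stop() {
	c.onceStop.Do(func() {
		if c.started {
			// Defer closure until the protocol signals shutdown
			go func() {
				<-c.done
				close(c.resultChan)
			}()
		} else {
			close(c.resultChan) // never started: close immediately
		}
	})
}

func main() {
	c := &client{started: true, done: make(chan struct{}), resultChan: make(chan int)}
	c.Stop()
	close(c.done) // protocol finishes shutting down
	if _, ok := <-c.resultChan; !ok {
		fmt.Println("result channel closed only after done fired")
	}
}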


896-914: LGTM: Acquisition state properly synchronized.

The mutex protection (lines 904-906) ensures thread-safe updates to acquired, and the shutdown-aware select (lines 907-911) prevents blocking on a send during shutdown.


916-942: LGTM: Failure handling respects shutdown.

Both failure paths (lines 927-931, 933-937) now use shutdown-aware selects, preventing blocked sends when the protocol is shutting down.


944-959: LGTM: Result handling with shutdown awareness.

Extracting the result before the select (line 952) and using a shutdown-aware select (lines 953-957) ensures clean shutdown semantics.


961-997: LGTM: Acquire logic synchronized.

Reading acquired under mutex protection (lines 962-964) and using it to choose between Acquire and ReAcquire messages (line 966) ensures correct state-based protocol transitions.


999-1009: LGTM: Release clears acquisition state safely.

Clearing acquired under mutex protection (lines 1004-1006) ensures thread-safe state transitions.


1011-1032: LGTM: Query auto-acquires when needed.

Reading acquired under mutex protection (lines 1013-1015) and auto-acquiring if necessary (lines 1016-1020) provides convenient API behavior while maintaining thread safety.

protocol/txsubmission/server.go (7)

20-21: LGTM: Required imports for concurrency.

The sync and sync/atomic imports support the mutex-based restart coordination and atomic ackCount operations.


33-33: LGTM: Struct fields support safe concurrency.

  • ackCount as int32 enables atomic operations
  • done, onceStop, and restartMutex provide proper coordination between Stop() and the restart path in handleDone()

These changes address the race conditions flagged in previous reviews.

Also applies to: 36-38


52-54: LGTM: Buffered channels and done initialization.

Buffering the result channels (lines 52-53) with capacity 1 enables non-blocking sends in the message handlers, and initializing done (line 54) supports the shutdown signaling mechanism.
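
To see why capacity 1 matters: with a buffered channel the handler can complete its hand-off even before the requester reaches its receive. A tiny illustrative sketch (not the server's actual code):

package main

import "fmt"

// A 1-buffered result channel lets the sender complete without waiting for
// the receiver; a full buffer signals an unconsumed previous result rather
// than blocking the handler.
func main() {
	resultChan := make(chan string, 1)
	select {
	case resultChan <- "tx ids": // succeeds immediately: buffer was empty
	default:
		fmt.Println("previous result still pending")
	}
	fmt.Println(<-resultChan)
}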


91-105: LGTM: Stop coordinates with restart path.

The Stop() method correctly:

  • Uses onceStop for idempotency
  • Acquires restartMutex (lines 94-95) to coordinate with handleDone()
  • Closes done under mutex protection (line 102)

This implementation addresses the critical race condition identified in the previous review.


107-168: LGTM: Atomic ackCount operations and shutdown handling.

The function correctly:

  • Uses atomic.LoadInt32 (line 131) and atomic.StoreInt32 (line 163) for race-free access to ackCount
  • Validates bounds before casting to uint16 (lines 132-141)
  • Handles shutdown via select with done channel (lines 153-167)

The bounds annotation at line 162 correctly justifies the safe conversion. This addresses the data race identified in the previous review.
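
A condensed sketch of the load/validate/narrow sequence (maxAckCount is a stand-in constant; the real bound is txsubmission.MaxAckCount):

package main

import (
	"fmt"
	"sync/atomic"
)

// maxAckCount is a stand-in; the real bound is txsubmission.MaxAckCount.
const maxAckCount = 65535

// loadAck shows the race-free read plus bounds check that justifies the
// int32 -> uint16 narrowing.
func loadAck(ackCount *int32) (uint16, error) {
	ack := atomic.LoadInt32(ackCount)
	if ack < 0 || ack > maxAckCount {
		return 0, fmt.Errorf("ack count %d outside uint16 range", ack)
	}
	return uint16(ack), nil // safe: bounds checked above
}

func main() {
	var ackCount int32 = 42
	ack, err := loadAck(&ackCount)
	fmt.Println(ack, err)
	atomic.StoreInt32(&ackCount, 0) // reset after the request, as the server does
}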


170-200: LGTM: Shutdown-aware request handling.

The select statement (lines 191-199) correctly handles both shutdown via done channel and closed result channels, ensuring clean termination semantics.


251-299: LGTM: Restart logic coordinates with Stop.

The handleDone() restart sequence correctly:

  • Non-blocking signal to RequestTxIds (lines 259-266)
  • Acquires restartMutex to coordinate with Stop() (line 274)
  • Uses select to prevent double-close of done (lines 282-287)
  • Reinitializes all per-restart state (lines 290-294), including atomic reset of ackCount
  • Calls Start() outside the lock (line 297) for better concurrency

This implementation addresses the critical race condition from the previous review regarding concurrent close of the done channel.
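
The double-close guard reduces to the following shape; this is a stand-in reconstruction, not the exact server code:

package main

import "sync"

// Stand-in reconstruction of the close-once-then-reinitialize dance.
type server struct {
	restartMutex sync.Mutex
	done         chan struct{}
}

func (s *server) restart() {
	s.restartMutex.Lock()
	select {
	case <-s.done:
		// Stop() already closed it; closing again would panic
	default:
		close(s.done)
	}
	s.done = make(chan struct{}) // fresh lifetime for the new instance
	s.restartMutex.Unlock()
	// Start() would run here, outside the lock
}

func main() {
	s := &server{done: make(chan struct{})}
	s.restart()
	s.restart() // safe to repeat: the select guards the close
}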

@wolf31o2 wolf31o2 force-pushed the perf/protocols-improvements branch from 8e3b9ab to e999895 on November 15, 2025 at 21:45
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
protocol/txsubmission/server_test.go (1)

25-31: Defer goleak before early exits when you re‑enable this test

Right now t.Skip(...) is called before defer goleak.VerifyNone(t), so the defer is never registered. That’s fine while the test is skipped, but once you remove the t.Skip you’ll probably want goleak.VerifyNone to be the first defer so leaks are checked even if the test bails out early.
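
Concretely, once the skip is removed the test would start like this (package name illustrative; goleak is the existing go.uber.org/goleak dependency):

// Package name illustrative.
package txsubmission_test

import (
	"testing"

	"go.uber.org/goleak"
)

func TestServerShutdown(t *testing.T) {
	defer goleak.VerifyNone(t) // first defer: runs even if the test bails out early
	// ... mock connection setup, Start/Stop, shutdown waits ...
}

Note that defers registered before t.Skip still run (t.Skip exits via runtime.Goexit), so putting the defer first is harmless even while the skip remains.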

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8e3b9ab and e999895.

📒 Files selected for processing (24)
  • protocol/blockfetch/blockfetch.go (1 hunks)
  • protocol/blockfetch/client.go (7 hunks)
  • protocol/blockfetch/client_test.go (1 hunks)
  • protocol/chainsync/chainsync.go (1 hunks)
  • protocol/chainsync/client.go (6 hunks)
  • protocol/chainsync/client_test.go (2 hunks)
  • protocol/chainsync/error.go (1 hunks)
  • protocol/keepalive/client.go (2 hunks)
  • protocol/keepalive/client_test.go (1 hunks)
  • protocol/leiosfetch/client.go (5 hunks)
  • protocol/leiosnotify/client.go (4 hunks)
  • protocol/leiosnotify/client_test.go (1 hunks)
  • protocol/localstatequery/client.go (6 hunks)
  • protocol/localstatequery/client_test.go (2 hunks)
  • protocol/localtxmonitor/client.go (7 hunks)
  • protocol/localtxmonitor/client_test.go (1 hunks)
  • protocol/localtxsubmission/client.go (5 hunks)
  • protocol/localtxsubmission/client_test.go (1 hunks)
  • protocol/peersharing/client.go (4 hunks)
  • protocol/peersharing/client_test.go (1 hunks)
  • protocol/peersharing/server.go (1 hunks)
  • protocol/peersharing/server_test.go (1 hunks)
  • protocol/txsubmission/server.go (8 hunks)
  • protocol/txsubmission/server_test.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (7)
  • protocol/peersharing/server_test.go
  • protocol/chainsync/client.go
  • protocol/keepalive/client.go
  • protocol/keepalive/client_test.go
  • protocol/peersharing/client_test.go
  • protocol/chainsync/chainsync.go
  • protocol/blockfetch/client_test.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-04T15:54:22.683Z
Learnt from: wolf31o2
Repo: blinklabs-io/gouroboros PR: 1093
File: ledger/mary/pparams.go:143-150
Timestamp: 2025-11-04T15:54:22.683Z
Learning: In the blinklabs-io/gouroboros repository, the design goal for CBOR round-trip tests is to achieve byte-identical encoding WITHOUT using stored CBOR (cbor.DecodeStoreCbor). Instead, the approach uses proper field types (pointers for optional fields) and relies on the cbor package's deterministic encoding (SortCoreDeterministic) to ensure reproducible output. The stored CBOR pattern should not be suggested as a solution for round-trip fidelity in this codebase.

Applied to files:

  • protocol/localstatequery/client_test.go
🧬 Code graph analysis (16)
protocol/chainsync/error.go (1)
protocol/chainsync/chainsync.go (1)
  • New (259-267)
protocol/localtxmonitor/client.go (7)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/blockfetch/client.go (1)
  • Client (30-41)
protocol/leiosnotify/client.go (1)
  • Client (24-33)
protocol/localstatequery/client.go (1)
  • Client (30-46)
protocol/localtxsubmission/client.go (1)
  • Client (26-36)
protocol/peersharing/client.go (1)
  • Client (25-35)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/localtxsubmission/client_test.go (4)
protocol/localstatequery/client_test.go (1)
  • TestClientShutdown (357-376)
connection.go (1)
  • Connection (59-103)
protocol/localtxsubmission/localtxsubmission.go (1)
  • LocalTxSubmission (67-70)
protocol/localtxsubmission/client.go (1)
  • Client (26-36)
protocol/leiosnotify/client.go (5)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/leiosfetch/client.go (1)
  • Client (26-37)
protocol/localtxsubmission/client.go (1)
  • Client (26-36)
protocol/peersharing/client.go (1)
  • Client (25-35)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/localtxmonitor/client_test.go (12)
protocol/blockfetch/client_test.go (1)
  • TestClientShutdown (211-264)
protocol/chainsync/client_test.go (1)
  • TestClientShutdown (283-336)
protocol/keepalive/client_test.go (1)
  • TestClientShutdown (241-294)
protocol/leiosnotify/client_test.go (1)
  • TestClientShutdown (56-106)
protocol/localstatequery/client_test.go (1)
  • TestClientShutdown (357-376)
protocol/localtxsubmission/client_test.go (1)
  • TestClientShutdown (167-186)
protocol/peersharing/client_test.go (1)
  • TestClientShutdown (28-90)
connection.go (3)
  • NewConnection (107-130)
  • Connection (59-103)
  • New (133-135)
protocol/protocol.go (2)
  • ProtocolRoleClient (94-94)
  • New (122-133)
connection_options.go (2)
  • WithConnection (36-40)
  • WithNetworkMagic (50-54)
protocol/localtxmonitor/localtxmonitor.go (1)
  • LocalTxMonitor (112-115)
protocol/localtxmonitor/client.go (1)
  • Client (25-40)
protocol/leiosfetch/client.go (3)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/blockfetch/client.go (1)
  • Client (30-41)
protocol/chainsync/client.go (1)
  • Client (29-46)
protocol/peersharing/client.go (3)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/peersharing/peersharing.go (1)
  • ProtocolName (27-27)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/txsubmission/server.go (5)
protocol/txsubmission/messages.go (1)
  • NewMsgRequestTxIds (68-82)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/txsubmission/txsubmission.go (2)
  • ProtocolName (27-27)
  • MaxAckCount (143-143)
protocol/error.go (2)
  • ErrProtocolViolationRequestExceeded (29-31)
  • ErrProtocolShuttingDown (19-19)
protocol/txsubmission/error.go (1)
  • ErrStopServerProcess (21-21)
protocol/blockfetch/client.go (3)
protocol/protocol.go (1)
  • New (122-133)
muxer/muxer.go (1)
  • New (90-117)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/localtxsubmission/client.go (8)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/blockfetch/client.go (1)
  • Client (30-41)
protocol/chainsync/client.go (1)
  • Client (29-46)
protocol/leiosnotify/client.go (1)
  • Client (24-33)
protocol/localstatequery/client.go (1)
  • Client (30-46)
protocol/localtxmonitor/client.go (1)
  • Client (25-40)
protocol/peersharing/client.go (1)
  • Client (25-35)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/txsubmission/server_test.go (8)
protocol/peersharing/server_test.go (1)
  • TestServerShutdown (28-78)
connection.go (2)
  • NewConnection (107-130)
  • Connection (59-103)
protocol/protocol.go (1)
  • ProtocolRoleServer (95-95)
protocol/blockfetch/blockfetch.go (1)
  • New (156-162)
protocol/chainsync/chainsync.go (1)
  • New (259-267)
connection_options.go (3)
  • WithConnection (36-40)
  • WithNetworkMagic (50-54)
  • WithNodeToNode (78-82)
protocol/txsubmission/txsubmission.go (1)
  • TxSubmission (126-129)
protocol/txsubmission/server.go (1)
  • Server (28-39)
protocol/localstatequery/client.go (5)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/localstatequery/messages.go (4)
  • NewMsgDone (245-252)
  • AcquireFailurePointNotOnChain (44-44)
  • MsgResult (172-175)
  • NewMsgQuery (160-170)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/localstatequery/error.go (2)
  • ErrAcquireFailurePointTooOld (20-20)
  • ErrAcquireFailurePointNotOnChain (23-25)
protocol/message.go (1)
  • Message (18-22)
protocol/localstatequery/client_test.go (3)
protocol/localtxsubmission/client_test.go (1)
  • TestClientShutdown (167-186)
connection.go (1)
  • Connection (59-103)
protocol/localstatequery/client.go (1)
  • Client (30-46)
protocol/leiosnotify/client_test.go (5)
protocol/versiondata.go (6)
  • VersionData (40-46)
  • VersionDataNtN13andUp (143-145)
  • VersionDataNtN11to12 (116-122)
  • DiffusionModeInitiatorOnly (21-21)
  • PeerSharingModeNoPeerSharing (27-27)
  • QueryModeDisabled (36-36)
protocol/handshake/messages.go (1)
  • NewMsgAcceptVersion (88-102)
connection.go (2)
  • NewConnection (107-130)
  • Connection (59-103)
connection_options.go (3)
  • WithConnection (36-40)
  • WithNetworkMagic (50-54)
  • WithNodeToNode (78-82)
protocol/leiosnotify/client.go (1)
  • Client (24-33)
protocol/peersharing/server.go (1)
protocol/message.go (1)
  • Message (18-22)
protocol/chainsync/client_test.go (5)
protocol/chainsync/chainsync.go (3)
  • ChainSync (204-207)
  • New (259-267)
  • NewConfig (273-295)
protocol/chainsync/client.go (1)
  • Client (29-46)
protocol/blockfetch/client_test.go (1)
  • TestClientShutdown (211-264)
connection.go (3)
  • NewConnection (107-130)
  • Connection (59-103)
  • New (133-135)
connection_options.go (3)
  • WithConnection (36-40)
  • WithNetworkMagic (50-54)
  • WithChainSyncConfig (131-135)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Analyze (go)
🔇 Additional comments (27)
protocol/localtxsubmission/client_test.go (1)

166-186: Shutdown test matches existing pattern and looks correct

The new TestClientShutdown reuses the shared runTest helper, asserts the client is non-nil, and exercises Start()/Stop() without extra goroutines or leak risks. It’s consistent with the LocalStateQuery shutdown test and should give good coverage of the new lifecycle behavior.

protocol/localtxmonitor/client.go (1)

38-40: Lifecycle and shutdown handling are materially safer now

  • Adding stateMutex + started and taking the mutex in both Start() and Stop() removes the prior race on lifecycle state and keeps Start/Stop transitions serialized.
  • Deferring channel closure until <-c.DoneChan() when started is true, and closing immediately only in the “never started” path, aligns channel lifetime with the protocol’s own shutdown and avoids premature closes.
  • Updating all handlers (handleAcquired, handleReplyHasTx, handleReplyNextTx, handleReplyGetSizes) to select on c.DoneChan() and return protocol.ErrProtocolShuttingDown makes writes shutdown-aware and prevents handlers from blindly sending into channels on teardown.
  • Callers already treat a closed result channel as ErrProtocolShuttingDown, so the behavior is consistent end‑to‑end.

This is a solid fix to the earlier “write to closed channel” risk while keeping the API behavior predictable during shutdown.

If you haven’t already, it’s worth running the local tx monitor tests (and, ideally, go test -race for this package) to confirm there are no remaining Start/Stop races under concurrent use.

Also applies to: 87-101, 103-140, 272-288, 291-306, 308-323, 325-340

protocol/localtxmonitor/client_test.go (1)

300-352: Shutdown test is consistent and well-scoped

TestClientShutdown mirrors the shutdown tests used in other protocols (mock connection, async error watcher, timeouts, goleak). It validates Start()/Stop() for LocalTxMonitor without over-constraining behavior and should catch regressions in lifecycle handling.

protocol/leiosfetch/client.go (2)

17-21: Atomic started and Stop() semantics resolve prior race while aligning with other clients

Switching started to atomic.Bool and using Store/Load in Start() / Stop() removes the earlier Start/Stop data race on this flag. The Stop logic that:

  • always sends MsgDone, and
  • closes the result channels only after <-c.DoneChan() when started.Load() is true (or immediately if it was never started),

matches the pattern used in other protocols and keeps channel lifetime tied to protocol shutdown. Resetting started to false at the end makes the lifecycle state internally consistent even though onceStart/onceStop prevent reuse.

Please re‑run the leiosfetch tests (and go test -race for this package) to confirm there are no remaining races around Start/Stop and channel closure under concurrent usage.

Also applies to: 26-37, 91-102, 104-134


136-206: Result-channel consumers and handlers now behave correctly during shutdown

  • All request methods (BlockRequest, BlockTxsRequest, VotesRequest, BlockRangeRequest) now treat a closed result channel as protocol.ErrProtocolShuttingDown, which is a clear and consistent signal to callers.
  • The handler functions now use select { case <-c.DoneChan(): ...; case chan <- msg: }, so once the protocol is shutting down they stop enqueueing messages and instead return ErrProtocolShuttingDown.

Together, this prevents late handler sends into closing/closed channels and gives callers a well-defined error path when shutdown races with in-flight requests.

Consider adding or extending a TestClientShutdown for LeiosFetch similar to the other protocols, if not already present in this PR, to validate this behavior end‑to‑end with the mock connection.

Also applies to: 231-274

protocol/leiosnotify/client.go (1)

24-33: LeiosNotify client lifecycle and handler behavior are now shutdown-safe

  • Adding stateMutex and started and taking the mutex in both Start() and Stop() gives a clear, race‑free lifecycle state for the client.
  • Stop() now ties closure of requestNextChan to protocol shutdown (<-c.DoneChan()) when started, while still handling the “never started” case by closing immediately.
  • Updating all handler functions to select on c.DoneChan() and to return protocol.ErrProtocolShuttingDown instead of blindly sending into requestNextChan closes the hole where handlers could write into a closed channel.

This brings LeiosNotify in line with the other protocols’ shutdown model and removes the previously flagged “write to closed channel” risk.

It would be good to confirm via the existing TestClientShutdown (and optionally go test -race ./protocol/leiosnotify) that no Start/Stop races or late handler sends remain under concurrent conditions.

Also applies to: 72-86, 88-114, 150-184

protocol/leiosnotify/client_test.go (1)

56-106: LGTM! Test follows established shutdown patterns.

The test properly validates the LeiosNotify client shutdown behavior using mock connections, asynchronous error handling, and goroutine leak detection. Good to see this test enabled after addressing the previous protocol initialization issues.

protocol/localtxsubmission/client.go (5)

34-35: LGTM! Lifecycle tracking fields added.

The stateMutex and started fields follow the established pattern for lifecycle management across protocol clients, enabling proper coordination between Start and Stop operations.


76-90: LGTM! Start() properly tracks lifecycle state.

The method correctly acquires the mutex, sets the started flag, and starts the underlying protocol. The synchronization ensures safe concurrent access to the lifecycle state.


92-123: LGTM! Stop() safely coordinates channel lifecycle.

The conditional channel closure based on the started flag prevents closing channels before the protocol has fully shut down, eliminating potential race conditions. The deferred closure after DoneChan() fires ensures orderly cleanup.


164-178: LGTM! Shutdown-aware channel send.

The select statement prevents sending on a closed channel by checking DoneChan() first, returning ErrProtocolShuttingDown if the protocol is shutting down.


180-203: LGTM! Consistent shutdown handling.

The reject handler follows the same shutdown-aware pattern as handleAcceptTx, ensuring consistent behavior across message handlers.

protocol/localstatequery/client.go (7)

38-38: LGTM! Proper synchronization primitives added.

The acquiredMutex protects the acquired state, while onceStop and started enable safe lifecycle management. This follows the established pattern across protocol clients.

Also applies to: 44-45


101-112: LGTM! Start() tracks lifecycle state.

Setting started = true enables proper coordination with the Stop() method for conditional channel cleanup.


114-140: LGTM! Stop() properly manages channel lifecycle.

The conditional channel closure based on started prevents closing channels when the protocol hasn't been started, and defers closure until after DoneChan() fires to ensure orderly shutdown.


896-914: LGTM! Thread-safe state update with shutdown awareness.

The method properly protects the acquired flag with acquiredMutex and uses a select statement to avoid sending on channels during shutdown.


916-942: LGTM! Failure handling with shutdown awareness.

Both failure cases properly check for shutdown before sending errors to acquireResultChan, preventing sends on closed channels.


944-959: LGTM! Result handling with shutdown awareness.

The select statement ensures results are only sent when the protocol is not shutting down.


961-997: LGTM! Consistent mutex protection for acquired state.

The acquiredMutex is properly used to protect reads and writes to the acquired flag across acquire(), release(), and runQuery() methods, preventing race conditions.

Also applies to: 999-1009, 1011-1032

protocol/localstatequery/client_test.go (1)

357-376: LGTM! Clean shutdown test.

The test properly validates the LocalStateQuery client shutdown behavior, following the established pattern used across other protocol client tests.

protocol/chainsync/error.go (1)

24-26: LGTM! Clear sentinel error for sync cancellation.

The new ErrSyncCancelled error provides an explicit signal for cancelled sync operations, complementing the existing error types and improving error handling clarity.

protocol/blockfetch/blockfetch.go (1)

122-122: LGTM! Increased buffer size aligns with PR objectives.

The increase from 256 to 384 provides better buffering capacity while remaining well below the maximum of 512, consistent with the PR's goal to improve performance through larger default queue sizes.

protocol/peersharing/server.go (1)

109-109: LGTM! Idiomatic unused parameter naming.

Using _ for the unused message parameter is the standard Go idiom, making it explicit that the message content is intentionally not used by this handler.

protocol/chainsync/client_test.go (2)

83-86: LGTM! Proper test cleanup prevents goroutine leaks.

Adding the client stop call in runTest() ensures proper cleanup after each test, preventing goroutine leaks that could affect test reliability.


283-336: LGTM! Comprehensive shutdown test.

The test properly validates ChainSync client shutdown behavior using the established pattern: mock connection setup, asynchronous error handling, goroutine leak detection, and appropriate timeouts.

protocol/peersharing/client.go (1)

19-35: PeerSharing client lifecycle and shutdown logic look solid

The Start/Stop implementation (with onceStart/onceStop, stateMutex, and the started/stopped flags) plus the DoneChan‑aware select in handleSharePeers gives you clean, idempotent startup and safe shutdown without send‑on‑closed panics. The Stop‑before‑Start path is also handled sensibly by closing sharePeersChan immediately.

Also applies to: 75-123, 168-172

protocol/blockfetch/client.go (1)

40-41: BlockFetch client shutdown semantics correctly avoid send‑on‑closed panics

Using started plus deferring channel closure until DoneChan() fires (when started) and adding DoneChan‑aware selects in handleStartBatch, handleNoBlocks, and the non‑callback handleBlock path makes the shutdown path safe even with in‑flight responses. The GetBlock/GetBlockRange callers also get a clean ErrProtocolShuttingDown when channels are closed.

Also applies to: 89-101, 103-129, 225-238, 243-258, 286-311

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
protocol/localtxmonitor/client_test.go (1)

300-352: Refactor to use existing runTest helper.

This test duplicates the scaffolding logic already provided by the runTest helper (lines 50-106). For consistency and maintainability, use the helper pattern as demonstrated in other protocol tests (e.g., protocol/localstatequery/client_test.go:357-376).

Apply this refactor:

 func TestClientShutdown(t *testing.T) {
-	defer goleak.VerifyNone(t)
-	mockConn := ouroboros_mock.NewConnection(
-		ouroboros_mock.ProtocolRoleClient,
+	runTest(
+		t,
 		[]ouroboros_mock.ConversationEntry{
 			ouroboros_mock.ConversationEntryHandshakeRequestGeneric,
 			ouroboros_mock.ConversationEntryHandshakeNtCResponse,
 		},
-	)
-	asyncErrChan := make(chan error, 1)
-	go func() {
-		err := <-mockConn.(*ouroboros_mock.Connection).ErrorChan()
-		if err != nil {
-			asyncErrChan <- fmt.Errorf("received unexpected error: %w", err)
-		}
-		close(asyncErrChan)
-	}()
-	oConn, err := ouroboros.New(
-		ouroboros.WithConnection(mockConn),
-		ouroboros.WithNetworkMagic(ouroboros_mock.MockNetworkMagic),
-	)
-	if err != nil {
-		t.Fatalf("unexpected error when creating Ouroboros object: %s", err)
-	}
-	if oConn.LocalTxMonitor() == nil {
-		t.Fatalf("LocalTxMonitor client is nil")
-	}
-	// Start the client
-	oConn.LocalTxMonitor().Client.Start()
-	// Stop the client
-	if err := oConn.LocalTxMonitor().Client.Stop(); err != nil {
-		t.Fatalf("unexpected error when stopping client: %s", err)
-	}
-	// Wait for mock connection shutdown
-	select {
-	case err, ok := <-asyncErrChan:
-		if ok {
-			t.Fatal(err.Error())
-		}
-	case <-time.After(2 * time.Second):
-		t.Fatalf("did not complete within timeout")
-	}
-	// Close Ouroboros connection
-	if err := oConn.Close(); err != nil {
-		t.Fatalf("unexpected error when closing Ouroboros object: %s", err)
-	}
-	// Wait for connection shutdown
-	select {
-	case <-oConn.ErrorChan():
-	case <-time.After(10 * time.Second):
-		t.Errorf("did not shutdown within timeout")
-	}
+		func(t *testing.T, oConn *ouroboros.Connection) {
+			if oConn.LocalTxMonitor() == nil {
+				t.Fatalf("LocalTxMonitor client is nil")
+			}
+			// Start the client
+			oConn.LocalTxMonitor().Client.Start()
+			// Stop the client
+			if err := oConn.LocalTxMonitor().Client.Stop(); err != nil {
+				t.Fatalf("unexpected error when stopping client: %s", err)
+			}
+		},
+	)
 }
protocol/txsubmission/server.go (1)

299-330: Consider holding restartMutex until after Start() to prevent TOCTOU race.

There's a narrow window between unlocking restartMutex (line 328) and calling Start() (line 330) where Stop() could execute for the first time, setting stopped=true and closing the newly created done channel. This would cause the new protocol instance to start with an already-closed done channel, making it immediately "shut down."

While safe (operations will return shutdown errors), it's wasteful to start a protocol instance that's immediately non-functional.

Options to fix:

  1. Keep restartMutex locked until after Start() completes (Start() doesn't acquire restartMutex, so no deadlock risk)
  2. Re-check stopped flag after unlocking and before Start(), returning early if true
  3. Accept this as a benign race (current approach per line 329 comment)

Given the comment at line 329 indicates the unlock is intentional for responsiveness, you may prefer option 2 or 3.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fce6cc3 and c2066d1.

📒 Files selected for processing (26)
  • protocol/blockfetch/blockfetch.go (1 hunks)
  • protocol/blockfetch/client.go (7 hunks)
  • protocol/blockfetch/client_test.go (1 hunks)
  • protocol/chainsync/chainsync.go (1 hunks)
  • protocol/chainsync/client.go (6 hunks)
  • protocol/chainsync/client_concurrency_test.go (1 hunks)
  • protocol/chainsync/client_test.go (2 hunks)
  • protocol/chainsync/error.go (1 hunks)
  • protocol/keepalive/client.go (2 hunks)
  • protocol/keepalive/client_test.go (1 hunks)
  • protocol/leiosfetch/client.go (5 hunks)
  • protocol/leiosnotify/client.go (4 hunks)
  • protocol/leiosnotify/client_test.go (1 hunks)
  • protocol/localstatequery/client.go (6 hunks)
  • protocol/localstatequery/client_test.go (2 hunks)
  • protocol/localtxmonitor/client.go (7 hunks)
  • protocol/localtxmonitor/client_test.go (1 hunks)
  • protocol/localtxsubmission/client.go (5 hunks)
  • protocol/localtxsubmission/client_test.go (1 hunks)
  • protocol/peersharing/client.go (4 hunks)
  • protocol/peersharing/client_test.go (1 hunks)
  • protocol/peersharing/server.go (3 hunks)
  • protocol/peersharing/server_test.go (1 hunks)
  • protocol/txsubmission/server.go (8 hunks)
  • protocol/txsubmission/server_concurrency_test.go (1 hunks)
  • protocol/txsubmission/server_test.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (9)
  • protocol/keepalive/client_test.go
  • protocol/peersharing/server.go
  • protocol/blockfetch/blockfetch.go
  • protocol/blockfetch/client_test.go
  • protocol/localtxsubmission/client_test.go
  • protocol/chainsync/client_test.go
  • protocol/chainsync/error.go
  • protocol/leiosnotify/client.go
  • protocol/txsubmission/server_concurrency_test.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-04T15:54:22.683Z
Learnt from: wolf31o2
Repo: blinklabs-io/gouroboros PR: 1093
File: ledger/mary/pparams.go:143-150
Timestamp: 2025-11-04T15:54:22.683Z
Learning: In the blinklabs-io/gouroboros repository, the design goal for CBOR round-trip tests is to achieve byte-identical encoding WITHOUT using stored CBOR (cbor.DecodeStoreCbor). Instead, the approach uses proper field types (pointers for optional fields) and relies on the cbor package's deterministic encoding (SortCoreDeterministic) to ensure reproducible output. The stored CBOR pattern should not be suggested as a solution for round-trip fidelity in this codebase.

Applied to files:

  • protocol/localstatequery/client_test.go
🧬 Code graph analysis (16)
protocol/blockfetch/client.go (4)
protocol/protocol.go (2)
  • Protocol (39-60)
  • New (122-133)
protocol/blockfetch/blockfetch.go (1)
  • New (156-162)
muxer/muxer.go (1)
  • New (90-117)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/chainsync/client_concurrency_test.go (4)
connection.go (1)
  • NewConnection (107-130)
protocol/chainsync/chainsync.go (3)
  • New (259-267)
  • NewConfig (273-295)
  • ChainSync (204-207)
connection_options.go (3)
  • WithConnection (36-40)
  • WithNetworkMagic (50-54)
  • WithChainSyncConfig (131-135)
protocol/chainsync/client.go (1)
  • Client (29-46)
protocol/leiosnotify/client_test.go (5)
protocol/versiondata.go (6)
  • VersionData (40-46)
  • VersionDataNtN13andUp (143-145)
  • VersionDataNtN11to12 (116-122)
  • DiffusionModeInitiatorOnly (21-21)
  • PeerSharingModeNoPeerSharing (27-27)
  • QueryModeDisabled (36-36)
protocol/handshake/messages.go (1)
  • NewMsgAcceptVersion (88-102)
connection.go (1)
  • Connection (59-103)
connection_options.go (3)
  • WithConnection (36-40)
  • WithNetworkMagic (50-54)
  • WithNodeToNode (78-82)
protocol/leiosnotify/client.go (1)
  • Client (24-33)
protocol/leiosfetch/client.go (4)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/blockfetch/client.go (1)
  • Client (30-41)
protocol/chainsync/client.go (1)
  • Client (29-46)
protocol/localtxmonitor/client_test.go (5)
protocol/chainsync/client_test.go (1)
  • TestClientShutdown (283-336)
connection.go (3)
  • NewConnection (107-130)
  • Connection (59-103)
  • New (133-135)
protocol/protocol.go (2)
  • ProtocolRoleClient (94-94)
  • New (122-133)
connection_options.go (2)
  • WithConnection (36-40)
  • WithNetworkMagic (50-54)
protocol/localtxmonitor/client.go (1)
  • Client (25-40)
protocol/chainsync/client.go (3)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/chainsync/error.go (1)
  • ErrSyncCancelled (26-26)
protocol/peersharing/server_test.go (5)
protocol/txsubmission/server_test.go (1)
  • TestServerShutdown (28-82)
connection.go (2)
  • NewConnection (107-130)
  • Connection (59-103)
connection_options.go (3)
  • WithConnection (36-40)
  • WithNetworkMagic (50-54)
  • WithNodeToNode (78-82)
protocol/peersharing/peersharing.go (1)
  • PeerSharing (67-70)
protocol/peersharing/server.go (1)
  • Server (25-30)
protocol/peersharing/client_test.go (5)
connection.go (2)
  • NewConnection (107-130)
  • Connection (59-103)
protocol/protocol.go (1)
  • ProtocolRoleClient (94-94)
connection_options.go (3)
  • WithConnection (36-40)
  • WithNetworkMagic (50-54)
  • WithNodeToNode (78-82)
protocol/peersharing/peersharing.go (1)
  • PeerSharing (67-70)
protocol/peersharing/client.go (1)
  • Client (25-35)
protocol/localstatequery/client_test.go (4)
protocol/localtxsubmission/client_test.go (1)
  • TestClientShutdown (167-186)
connection.go (1)
  • Connection (59-103)
protocol/localstatequery/localstatequery.go (1)
  • LocalStateQuery (116-119)
protocol/localstatequery/client.go (1)
  • Client (30-47)
protocol/localtxsubmission/client.go (5)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/blockfetch/client.go (1)
  • Client (30-41)
protocol/leiosnotify/client.go (1)
  • Client (24-33)
protocol/localstatequery/client.go (1)
  • Client (30-47)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/txsubmission/server_test.go (6)
protocol/peersharing/server_test.go (1)
  • TestServerShutdown (28-82)
connection.go (2)
  • NewConnection (107-130)
  • Connection (59-103)
protocol/protocol.go (1)
  • ProtocolRoleServer (95-95)
connection_options.go (3)
  • WithConnection (36-40)
  • WithNetworkMagic (50-54)
  • WithNodeToNode (78-82)
protocol/txsubmission/txsubmission.go (1)
  • TxSubmission (126-129)
protocol/txsubmission/server.go (1)
  • Server (28-41)
protocol/localstatequery/client.go (7)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/leiosfetch/client.go (1)
  • Client (26-38)
protocol/leiosnotify/client.go (1)
  • Client (24-33)
protocol/localtxmonitor/client.go (1)
  • Client (25-40)
protocol/localtxsubmission/client.go (1)
  • Client (26-36)
protocol/localstatequery/messages.go (1)
  • NewMsgDone (245-252)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/peersharing/client.go (4)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/peersharing/peersharing.go (1)
  • ProtocolName (27-27)
connection/id.go (1)
  • ConnectionId (22-25)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/localtxmonitor/client.go (6)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/leiosnotify/client.go (1)
  • Client (24-33)
protocol/localstatequery/client.go (1)
  • Client (30-47)
protocol/localtxsubmission/client.go (1)
  • Client (26-36)
protocol/peersharing/client.go (1)
  • Client (25-35)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/keepalive/client.go (5)
protocol/blockfetch/client.go (1)
  • Client (30-41)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/keepalive/keepalive.go (1)
  • ProtocolName (27-27)
connection/id.go (1)
  • ConnectionId (22-25)
protocol/keepalive/messages.go (1)
  • NewMsgDone (94-101)
protocol/txsubmission/server.go (5)
protocol/txsubmission/messages.go (2)
  • TxBody (197-201)
  • NewMsgRequestTxIds (68-82)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/txsubmission/txsubmission.go (2)
  • ProtocolName (27-27)
  • MaxAckCount (143-143)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/txsubmission/error.go (1)
  • ErrStopServerProcess (21-21)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Analyze (go)
🔇 Additional comments (42)
protocol/localstatequery/client_test.go (1)

357-376: LGTM!

The test validates the basic shutdown flow for the LocalStateQuery client, following the established pattern used across other protocol tests in this PR.

protocol/keepalive/client.go (1)

98-113: LGTM!

The Stop() method follows the established pattern across protocol clients: sends a Done message, ensures proper protocol shutdown by calling Protocol.Stop(), and provides idempotent behavior via sync.Once.

protocol/leiosnotify/client_test.go (1)

56-109: LGTM!

The test validates LeiosNotify client shutdown behavior with proper NtN protocol setup. The previously reported initialization issues (per past review comments) have been resolved.

protocol/leiosfetch/client.go (4)

32-33: LGTM!

The atomic lifecycle flags prevent data races between Start() and Stop() calls, addressing previous review concerns. The stopped flag ensures Start() cannot be called after Stop().


92-107: LGTM!

The Start() method correctly prevents re-starting after shutdown by checking the stopped flag and uses atomic operations to set the started state.


109-140: LGTM!

The Stop() method correctly handles channel lifecycle: deferring closure until protocol shutdown when started (avoiding panics from in-flight messages), or closing immediately when never started (preventing goroutine leaks).


237-279: LGTM!

All message handlers correctly use non-blocking sends with shutdown detection via select on DoneChan(), preventing deadlocks during protocol shutdown.

protocol/blockfetch/client.go (3)

40-40: LGTM!

The atomic started flag correctly tracks protocol lifecycle state and prevents data races between Start() and Stop().

Also applies to: 98-98


104-132: LGTM!

The Stop() method correctly calls Protocol.Stop() (addressing previous deadlock concerns) and conditionally defers channel closure based on whether the protocol was started, preventing both panics from in-flight messages and goroutine leaks.


237-241: LGTM!

Message handlers correctly use non-blocking sends with shutdown detection, preventing deadlocks when Stop() is called while messages are in flight.

Also applies to: 254-259, 309-313

protocol/localstatequery/client.go (5)

38-38: LGTM!

The new synchronization fields correctly address data race concerns identified in past reviews: stateMutex protects the started flag and acquiredMutex guards the acquired state.

Also applies to: 44-46


102-115: LGTM!

The Start() method correctly guards the started flag with stateMutex, preventing data races with Stop().


117-146: LGTM!

The Stop() method correctly reads the started flag under stateMutex and conditionally manages channel lifecycle: deferring closure until shutdown if started, or closing immediately if never started.


910-912: LGTM!

All accesses to the acquired state are correctly guarded by acquiredMutex, preventing data races across concurrent operations.

Also applies to: 968-970, 1010-1012, 1019-1021


933-943: LGTM!

Message handlers correctly use non-blocking sends with shutdown detection via select on DoneChan(), preventing deadlocks during protocol shutdown.

Also applies to: 958-963

protocol/chainsync/chainsync.go (1)

226-227: LGTM!

The 50% increase in default queue sizes (50 → 75) provides better buffering for improved throughput while maintaining a safe margin below the protocol-specified maximums (100). This aligns with the PR's performance improvement objectives.

protocol/peersharing/client_test.go (1)

28-90: LGTM! Well-structured shutdown test.

The test properly validates the PeerSharing client shutdown behavior with appropriate error handling, timeout guards, and goroutine leak detection.

protocol/txsubmission/server_test.go (1)

28-82: LGTM! Test scaffolding ready for when mock is fixed.

The test is properly skipped with a clear reason. The structure mirrors other shutdown tests and will provide coverage once the mock server issues are resolved.

protocol/peersharing/server_test.go (1)

60-62: Error handling properly implemented.

The Stop() error is now correctly checked, addressing the previous review feedback. The pattern matches other shutdown tests.

protocol/localtxmonitor/client.go (4)

38-39: LGTM! Lifecycle state tracking added.

The stateMutex and started flag provide proper synchronization for the client lifecycle, consistent with the broader refactoring pattern.


87-100: LGTM! Proper startup synchronization.

The Start() method correctly uses the mutex to protect the started flag and ensures thread-safe initialization.


104-140: LGTM! Shutdown properly synchronized.

The Stop() method correctly handles two scenarios:

  • If started: defers channel closing until protocol shutdown completes
  • If never started: closes channels immediately

This eliminates the race condition where handlers could write to closed channels.


283-287: LGTM! Handlers properly synchronized with shutdown.

All message handlers now use select statements that check DoneChan() before writing to result channels, preventing panics from writing to closed channels during shutdown.

Also applies to: 300-304, 317-321, 334-338

protocol/peersharing/client.go (3)

75-95: LGTM! Start() properly prevents restart after stop.

The guard against starting a stopped client (lines 81-84) ensures clean lifecycle management and prevents unexpected behavior.


97-123: LGTM! Stop() properly coordinates channel lifecycle.

The conditional channel closing based on the started flag ensures channels are closed only after protocol shutdown completes, preventing handler panics.


168-173: LGTM! Handler respects shutdown signal.

The handleSharePeers method properly checks DoneChan() before sending to sharePeersChan, preventing writes to closed channels.

protocol/chainsync/client_concurrency_test.go (2)

30-103: LGTM! Thorough concurrency testing.

The test validates that concurrent Start()/Stop() operations don't cause deadlocks or races, with appropriate timeout guards and leak detection.


106-148: LGTM! Important edge case coverage.

The test validates that calling Stop() before Start() doesn't cause panics or deadlocks, ensuring robust lifecycle handling.

protocol/localtxsubmission/client.go (3)

76-90: LGTM! Consistent startup pattern.

The Start() method follows the established pattern with proper mutex protection for the lifecycle state.


93-124: LGTM! Proper shutdown sequencing.

The Stop() method correctly:

  1. Sends MsgDone to signal completion
  2. Calls Protocol.Stop() to initiate shutdown
  3. Defers channel closing until protocol fully shuts down (if started)

This sequence prevents handlers from writing to closed channels.


173-178: LGTM! Handlers properly synchronized.

Both handleAcceptTx and handleRejectTx now use select statements to check for shutdown before writing to submitResultChan, eliminating the TOCTOU race condition.

Also applies to: 198-202

protocol/chainsync/client.go (5)

37-37: LGTM! Appropriate use of atomic for lifecycle flag.

Using atomic.Bool for the started flag provides lock-free synchronization for simple boolean state tracking.


119-130: LGTM! Proper atomic flag update.

The Start() method correctly uses started.Store(true) to atomically set the lifecycle state.


133-160: LGTM! Shutdown properly handles channel lifecycle.

The Stop() method uses started.Load() to atomically check state and conditionally defers channel closing until protocol shutdown completes, preventing handler panics.


342-350: LGTM! Proper handling of channel closure and cancellation.

The updated receive operation correctly handles:

  • Closed channel (!ok) → protocol shutdown
  • False value (!ready) → sync cancellation

This prevents unexpected behavior when the channel closes during operation.
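
The two-signal receive can be sketched standalone like this (error values are stand-ins for protocol.ErrProtocolShuttingDown and chainsync.ErrSyncCancelled):

package main

import (
	"errors"
	"fmt"
)

// Stand-ins for protocol.ErrProtocolShuttingDown and chainsync.ErrSyncCancelled.
var (
	errShuttingDown  = errors.New("protocol is shutting down")
	errSyncCancelled = errors.New("sync cancelled")
)

// waitForReady: a closed channel means shutdown; a false value means the
// sync was cancelled; true means proceed to the next block.
func waitForReady(readyForNextBlockChan <-chan bool) error {
	ready, ok := <-readyForNextBlockChan
	if !ok {
		return errShuttingDown
	}
	if !ready {
		return errSyncCancelled
	}
	return nil
}

func main() {
	ch := make(chan bool, 1)
	ch <- false
	fmt.Println(waitForReady(ch)) // sync cancelled
}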


737-742: LGTM! Handlers properly synchronized with shutdown.

All writes to readyForNextBlockChan in handleRollForward and handleRollBackward now use select statements that check DoneChan() first, eliminating the race condition where handlers could write to a closed channel.

Also applies to: 748-752, 776-780, 788-792

protocol/txsubmission/server.go (6)

33-41: LGTM: Well-designed concurrency primitives for lifecycle management.

The introduction of ackCount as int32, done channel with doneMutex, onceStop, restartMutex, and the stopped flag provides a solid foundation for coordinating shutdown, restart, and preventing data races. This addresses the critical issues raised in previous reviews.


49-64: LGTM: Proper initialization of lifecycle channels.

The buffered result channels (capacity 1) enable non-blocking handoff during restart, and the done channel is properly initialized.


94-116: LGTM: Robust shutdown with proper synchronization.

The Stop() method correctly uses onceStop for idempotent execution, acquires restartMutex for coordination with restart, and protects the done channel close with doneMutex and a select guard to prevent double-close panics.


118-130: LGTM: Safe accessors with proper locking.

The doneChan() and IsStopped() helpers provide thread-safe access to the done channel and stopped flag respectively, with appropriate mutex protection.


133-193: LGTM: Shutdown-aware request handling with atomic ackCount.

The method properly uses atomic operations for ackCount access, implements shutdown awareness via doneChan(), and gracefully handles closed channels. The validation and bounds checking are appropriate.


196-225: LGTM: Proper shutdown handling in RequestTxs.

The method correctly uses doneChan() for shutdown awareness and handles closed channels appropriately.

@wolf31o2 wolf31o2 force-pushed the perf/protocols-improvements branch 2 times, most recently from 91e04e4 to 1c271c7 on November 16, 2025 at 17:07
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
protocol/leiosfetch/client.go (1)

109-140: Stop() should also call Protocol.Stop() to ensure shutdown and channel closure complete

Stop() sends NewMsgDone() and defers closing result channels until <-c.DoneChan() when started. However, it never calls c.Protocol.Stop(), so DoneChan() may never close if the remote side misbehaves or doesn’t drive the protocol to completion. In that case:

  • The goroutine spawned in Stop() can block indefinitely on <-c.DoneChan().
  • Callers waiting on the result channels may hang instead of seeing ErrProtocolShuttingDown.

Align this with other clients by stopping the underlying protocol after a successful Done send:

 func (c *Client) Stop() error {
 	var err error
 	c.onceStop.Do(func() {
 		c.Protocol.Logger().
 			Debug("stopping client protocol",
 				"component", "network",
 				"protocol", ProtocolName,
 				"connection_id", c.callbackContext.ConnectionId.String(),
 			)
 		msg := NewMsgDone()
-		err = c.SendMessage(msg)
+		if err = c.SendMessage(msg); err != nil {
+			return
+		}
+		c.Protocol.Stop()
 		// Defer closing channels until protocol fully shuts down (only if started)
 		if c.started.Load() {
 			go func() {
 				<-c.DoneChan()
 				close(c.blockResultChan)
 				close(c.blockTxsResultChan)
 				close(c.votesResultChan)
 				close(c.blockRangeResultChan)
 			}()
 		} else {
 			// If protocol was never started, close channels immediately
 			close(c.blockResultChan)
 			close(c.blockTxsResultChan)
 			close(c.votesResultChan)
 			close(c.blockRangeResultChan)
 		}
 		c.started.Store(false)
 		c.stopped.Store(true)
 	})
 	return err
 }
protocol/txsubmission/server.go (1)

276-335: Fix remaining data race on s.stopped in handleDone’s second check

The restart path in handleDone correctly uses restartMutex to protect most accesses to s.stopped, but the second check:

s.restartMutex.Unlock()
// Check again if permanent stop has been requested (TOCTOU protection)
if s.stopped {
    return nil
}

reads s.stopped without holding restartMutex, racing with Stop() which writes s.stopped under the lock. This is a real data race even if it’s only used for TOCTOU protection.

Use the existing IsStopped() helper (which takes restartMutex) or keep the check under the lock. For example:

-    s.restartMutex.Unlock()
-    // Check again if permanent stop has been requested (TOCTOU protection)
-    if s.stopped {
-        return nil
-    }
-    // Start the new protocol outside the lock for better responsiveness
-    s.Start()
+    s.restartMutex.Unlock()
+    // Check again if permanent stop has been requested (TOCTOU protection)
+    if s.IsStopped() {
+        return nil
+    }
+    // Start the new protocol outside the lock for better responsiveness
+    s.Start()

This keeps the extra check while eliminating the unsynchronized access.

🧹 Nitpick comments (5)
protocol/keepalive/client.go (1)

33-35: KeepAlive.Stop implementation is straightforward and idempotent

The new Stop method cleanly wraps shutdown: it’s guarded by onceStop, logs, sends MsgDone, and calls the underlying Protocol.Stop(), with the error remembered in stopErr for subsequent calls. That matches the lifecycle used elsewhere in the repo.

If you want to be extra defensive, you could also stop the keep‑alive timer inside Stop() (under timerMutex) before sending MsgDone, to avoid any last scheduled send between Stop and protocol shutdown, but the existing DoneChan‑driven cleanup already prevents leaks.

Also applies to: 98-113

protocol/localtxmonitor/client.go (1)

38-40: Shutdown handling is now much safer for LocalTxMonitor

The new stateMutex/started gating plus the revised Start/Stop logic and handler select blocks address the earlier race where handlers could write to closed channels. Channels are now only closed after DoneChan() fires (or immediately if never started), and handlers bail out with ErrProtocolShuttingDown instead of sending once shutdown is in progress. The use of busyMutex around Stop’s SendMessage also prevents overlap with in‑flight client calls like HasTx/NextTx/GetSizes.

If you ever want to harden the API against misuse, you could consider early‑rejecting client calls (e.g., HasTx) when Stop() has already run, rather than relying solely on the channel‑close path, but the current behavior is functionally correct.

Also applies to: 87-101, 103-140, 283-287, 300-304, 317-321, 334-338

protocol/txsubmission/server_concurrency_test.go (1)

137-148: Weak verification: test doesn't exercise restart prevention logic.

The test only verifies the stopped flag is set but doesn't trigger the actual restart path (via handleDone) to confirm prevention works. Lines 146-148 repeat the same check without any intervening action that could trigger a restart, making the second assertion redundant.

Consider deferring this test until the mock infrastructure supports triggering handleDone, or restructure to simulate a Done message that would normally trigger restart.

protocol/peersharing/client_test.go (2)

69-76: Consider increasing the timeout for CI stability.

The 2-second timeout for mock connection shutdown might be too aggressive for slow CI environments or heavily loaded systems, potentially causing test flakiness. Consider increasing it to 5 seconds to align better with the later 10-second timeout.

Apply this diff to increase the timeout:

-	case <-time.After(2 * time.Second):
+	case <-time.After(5 * time.Second):
 		t.Fatalf("did not complete within timeout")

57-65: Consider using error channel instead of panic for better test failure reporting.

While the panic approach is documented, it can make test failures harder to debug. Consider capturing the error in a channel and checking it in the main test goroutine for cleaner test output.

Here's an alternative pattern:

+	oConnErrChan := make(chan error, 1)
 	// Async error handler
 	go func() {
 		err, ok := <-oConn.ErrorChan()
 		if !ok {
 			return
 		}
-		// We can't call t.Fatalf() from a different Goroutine, so we panic instead
-		panic(fmt.Sprintf("unexpected Ouroboros error: %s", err))
+		oConnErrChan <- fmt.Errorf("unexpected Ouroboros error: %w", err)
 	}()
 	// Run test inner function
 	innerFunc(t, oConn)
+	// Check for errors from Ouroboros connection
+	select {
+	case err := <-oConnErrChan:
+		t.Fatal(err.Error())
+	default:
+	}
 	// Wait for mock connection shutdown
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c2066d1 and 1c271c7.

📒 Files selected for processing (27)
  • protocol/blockfetch/blockfetch.go (1 hunks)
  • protocol/blockfetch/client.go (7 hunks)
  • protocol/blockfetch/client_test.go (1 hunks)
  • protocol/chainsync/chainsync.go (1 hunks)
  • protocol/chainsync/client.go (6 hunks)
  • protocol/chainsync/client_concurrency_test.go (1 hunks)
  • protocol/chainsync/client_test.go (2 hunks)
  • protocol/chainsync/error.go (1 hunks)
  • protocol/keepalive/client.go (2 hunks)
  • protocol/keepalive/client_test.go (1 hunks)
  • protocol/leiosfetch/client.go (5 hunks)
  • protocol/leiosnotify/client.go (4 hunks)
  • protocol/leiosnotify/client_test.go (1 hunks)
  • protocol/localstatequery/client.go (6 hunks)
  • protocol/localstatequery/client_test.go (2 hunks)
  • protocol/localtxmonitor/client.go (7 hunks)
  • protocol/localtxmonitor/client_test.go (1 hunks)
  • protocol/localtxsubmission/client.go (5 hunks)
  • protocol/localtxsubmission/client_test.go (1 hunks)
  • protocol/localtxsubmission/localtxsubmission.go (2 hunks)
  • protocol/peersharing/client.go (4 hunks)
  • protocol/peersharing/client_test.go (1 hunks)
  • protocol/peersharing/server.go (3 hunks)
  • protocol/peersharing/server_test.go (1 hunks)
  • protocol/txsubmission/server.go (8 hunks)
  • protocol/txsubmission/server_concurrency_test.go (1 hunks)
  • protocol/txsubmission/server_test.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
  • protocol/chainsync/client_concurrency_test.go
  • protocol/blockfetch/client.go
  • protocol/chainsync/chainsync.go
  • protocol/peersharing/client.go
  • protocol/chainsync/client_test.go
  • protocol/chainsync/error.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-04T15:54:22.683Z
Learnt from: wolf31o2
Repo: blinklabs-io/gouroboros PR: 1093
File: ledger/mary/pparams.go:143-150
Timestamp: 2025-11-04T15:54:22.683Z
Learning: In the blinklabs-io/gouroboros repository, the design goal for CBOR round-trip tests is to achieve byte-identical encoding WITHOUT using stored CBOR (cbor.DecodeStoreCbor). Instead, the approach uses proper field types (pointers for optional fields) and relies on the cbor package's deterministic encoding (SortCoreDeterministic) to ensure reproducible output. The stored CBOR pattern should not be suggested as a solution for round-trip fidelity in this codebase.

Applied to files:

  • protocol/localstatequery/client_test.go
🧬 Code graph analysis (20)
protocol/localtxsubmission/localtxsubmission.go (1)
protocol/localtxsubmission/messages.go (1)
  • MessageTypeDone (29-29)
protocol/localtxmonitor/client_test.go (3)
protocol/chainsync/client_test.go (1)
  • TestClientShutdown (283-302)
connection.go (1)
  • Connection (59-103)
protocol/localtxmonitor/client.go (1)
  • Client (25-40)
protocol/localtxsubmission/client.go (8)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/blockfetch/client.go (1)
  • Client (30-41)
protocol/chainsync/client.go (1)
  • Client (29-46)
protocol/leiosnotify/client.go (1)
  • Client (24-33)
protocol/localstatequery/client.go (1)
  • Client (30-47)
protocol/localtxmonitor/client.go (1)
  • Client (25-40)
protocol/peersharing/client.go (1)
  • Client (25-35)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/localstatequery/client_test.go (4)
protocol/blockfetch/client_test.go (1)
  • TestClientShutdown (211-230)
protocol/chainsync/client_test.go (1)
  • TestClientShutdown (283-302)
connection.go (1)
  • Connection (59-103)
protocol/localstatequery/client.go (1)
  • Client (30-47)
protocol/peersharing/server.go (1)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/localtxsubmission/client_test.go (10)
protocol/blockfetch/client_test.go (1)
  • TestClientShutdown (211-230)
protocol/chainsync/client_test.go (1)
  • TestClientShutdown (283-302)
protocol/keepalive/client_test.go (1)
  • TestClientShutdown (303-322)
protocol/leiosnotify/client_test.go (1)
  • TestClientShutdown (117-136)
protocol/localstatequery/client_test.go (1)
  • TestClientShutdown (357-376)
protocol/localtxmonitor/client_test.go (1)
  • TestClientShutdown (300-319)
protocol/peersharing/client_test.go (1)
  • TestClientShutdown (89-108)
connection.go (1)
  • Connection (59-103)
protocol/localtxsubmission/localtxsubmission.go (1)
  • LocalTxSubmission (75-78)
protocol/localtxsubmission/client.go (1)
  • Client (26-36)
protocol/localtxmonitor/client.go (3)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/localstatequery/client.go (1)
  • Client (30-47)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/chainsync/client.go (3)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/chainsync/error.go (1)
  • ErrSyncCancelled (26-26)
protocol/txsubmission/server_concurrency_test.go (5)
connection.go (1)
  • NewConnection (107-130)
protocol/protocol.go (1)
  • ProtocolRoleServer (95-95)
connection_options.go (3)
  • WithConnection (36-40)
  • WithNetworkMagic (50-54)
  • WithNodeToNode (78-82)
protocol/txsubmission/txsubmission.go (1)
  • TxSubmission (126-129)
protocol/txsubmission/server.go (1)
  • Server (28-41)
protocol/txsubmission/server_test.go (6)
protocol/peersharing/server_test.go (1)
  • TestServerShutdown (28-82)
connection.go (2)
  • NewConnection (107-130)
  • Connection (59-103)
protocol/protocol.go (1)
  • ProtocolRoleServer (95-95)
connection_options.go (3)
  • WithConnection (36-40)
  • WithNetworkMagic (50-54)
  • WithNodeToNode (78-82)
protocol/txsubmission/txsubmission.go (1)
  • TxSubmission (126-129)
protocol/txsubmission/server.go (1)
  • Server (28-41)
protocol/peersharing/server_test.go (6)
protocol/txsubmission/server_test.go (1)
  • TestServerShutdown (28-82)
connection.go (2)
  • NewConnection (107-130)
  • Connection (59-103)
protocol/protocol.go (1)
  • ProtocolRoleServer (95-95)
connection_options.go (3)
  • WithConnection (36-40)
  • WithNetworkMagic (50-54)
  • WithNodeToNode (78-82)
protocol/peersharing/peersharing.go (1)
  • PeerSharing (67-70)
protocol/peersharing/server.go (1)
  • Server (25-30)
protocol/keepalive/client_test.go (5)
connection.go (3)
  • Connection (59-103)
  • NewConnection (107-130)
  • New (133-135)
connection_options.go (3)
  • WithConnection (36-40)
  • WithNetworkMagic (50-54)
  • WithNodeToNode (78-82)
protocol/blockfetch/client_test.go (1)
  • TestClientShutdown (211-230)
protocol/keepalive/keepalive.go (1)
  • KeepAlive (85-88)
protocol/keepalive/client.go (1)
  • Client (26-35)
protocol/leiosnotify/client_test.go (11)
protocol/versiondata.go (6)
  • VersionData (40-46)
  • VersionDataNtN13andUp (143-145)
  • VersionDataNtN11to12 (116-122)
  • DiffusionModeInitiatorOnly (21-21)
  • PeerSharingModeNoPeerSharing (27-27)
  • QueryModeDisabled (36-36)
protocol/handshake/messages.go (1)
  • NewMsgAcceptVersion (88-102)
connection.go (2)
  • Connection (59-103)
  • NewConnection (107-130)
connection_options.go (3)
  • WithConnection (36-40)
  • WithNetworkMagic (50-54)
  • WithNodeToNode (78-82)
protocol/blockfetch/client_test.go (1)
  • TestClientShutdown (211-230)
protocol/keepalive/client_test.go (1)
  • TestClientShutdown (303-322)
protocol/localstatequery/client_test.go (1)
  • TestClientShutdown (357-376)
protocol/localtxsubmission/client_test.go (1)
  • TestClientShutdown (167-186)
protocol/peersharing/client_test.go (1)
  • TestClientShutdown (89-108)
protocol/leiosnotify/leiosnotify.go (1)
  • LeiosNotify (75-78)
protocol/leiosnotify/client.go (1)
  • Client (24-33)
protocol/blockfetch/client_test.go (2)
connection.go (1)
  • Connection (59-103)
protocol/blockfetch/client.go (1)
  • Client (30-41)
protocol/localstatequery/client.go (5)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/localstatequery/localstatequery.go (1)
  • AcquireTarget (131-133)
protocol/localstatequery/messages.go (4)
  • NewMsgDone (245-252)
  • AcquireFailurePointNotOnChain (44-44)
  • MsgResult (172-175)
  • NewMsgQuery (160-170)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/localstatequery/error.go (2)
  • ErrAcquireFailurePointTooOld (20-20)
  • ErrAcquireFailurePointNotOnChain (23-25)
protocol/txsubmission/server.go (5)
protocol/txsubmission/messages.go (2)
  • TxBody (197-201)
  • NewMsgRequestTxIds (68-82)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/txsubmission/txsubmission.go (2)
  • ProtocolName (27-27)
  • MaxAckCount (143-143)
protocol/error.go (2)
  • ErrProtocolViolationRequestExceeded (29-31)
  • ErrProtocolShuttingDown (19-19)
protocol/txsubmission/error.go (1)
  • ErrStopServerProcess (21-21)
protocol/leiosfetch/client.go (3)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/blockfetch/client.go (1)
  • Client (30-41)
protocol/keepalive/client.go (4)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/keepalive/keepalive.go (1)
  • ProtocolName (27-27)
connection/id.go (1)
  • ConnectionId (22-25)
protocol/keepalive/messages.go (1)
  • NewMsgDone (94-101)
protocol/peersharing/client_test.go (5)
connection.go (2)
  • Connection (59-103)
  • NewConnection (107-130)
protocol/protocol.go (1)
  • ProtocolRoleClient (94-94)
connection_options.go (3)
  • WithConnection (36-40)
  • WithNetworkMagic (50-54)
  • WithNodeToNode (78-82)
protocol/peersharing/peersharing.go (1)
  • PeerSharing (67-70)
protocol/peersharing/client.go (1)
  • Client (25-35)
protocol/leiosnotify/client.go (4)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/localstatequery/client.go (1)
  • Client (30-47)
protocol/localtxsubmission/client.go (1)
  • Client (26-36)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Analyze (go)
🔇 Additional comments (31)
protocol/localtxmonitor/client_test.go (1)

299-319: TestClientShutdown pattern looks consistent and useful

The shutdown test mirrors the existing runTest harness and other protocol shutdown tests, and correctly asserts that Stop returns no error after Start.

protocol/localtxsubmission/localtxsubmission.go (1)

39-50: Confirm Done transition semantics from Busy state

Adding MessageTypeDone -> stateDone from both Idle and Busy states aligns with enabling shutdown from more places, but with stateBusy marked AgencyServer it’s worth double‑checking that the side expected to send MessageTypeDone in Busy is consistent with the underlying protocol/state-machine implementation.

Also applies to: 52-67

protocol/localstatequery/client_test.go (1)

25-25: Alias cleanup and shutdown test look good

Using the ouroboros alias keeps this file consistent with other tests, and TestClientShutdown correctly exercises LocalStateQuery().Client.Start() followed by Stop() with proper error checking.

Also applies to: 357-376

protocol/keepalive/client_test.go (1)

242-301: Verify mock conversation vs KeepAlive.Start behavior

The new runTest helper and TestClientShutdown follow the shared shutdown-testing pattern and look structurally sound. One thing to double‑check: KeepAlive().Client.Start() immediately calls sendKeepAlive, so if ouroboros_mock expects every keep‑alive exchange to be scripted, a handshake‑only conversation may surface as an unexpected-message error. If the mock ignores unscripted keep‑alive traffic, this is fine; otherwise you may want to add a minimal keep‑alive request/response pair to the conversation or disable the initial send for this test.

Also applies to: 303-322

protocol/leiosnotify/client.go (1)

31-33: LeiosNotify client shutdown logic is now consistent and race‑free

The introduction of stateMutex/started, the revised Stop that only closes requestNextChan after DoneChan() (or immediately if never started), and the handlers’ select on DoneChan() vs requestNextChan together remove the previous risk of writes to a closed channel and align this client’s lifecycle with the rest of the protocols. RequestNext correctly surfaces ErrProtocolShuttingDown once the channel is closed.

Also applies to: 72-86, 88-114, 150-155, 159-164, 168-173, 177-182
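
The handler-side pattern described here looks roughly like the following (the handler name is illustrative; requestNextChan is the channel named above):

func (c *Client) handleNotification(msg protocol.Message) error {
	select {
	case <-c.DoneChan():
		// Shutting down; never send on a possibly closed requestNextChan
		return protocol.ErrProtocolShuttingDown
	case c.requestNextChan <- msg:
		return nil
	}
}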

protocol/leiosnotify/client_test.go (1)

30-136: LeiosNotify shutdown test scaffolding looks solid and consistent with other protocols

The handshake mock, runTest helper, and TestClientShutdown follow the same pattern used across other protocol tests (mock connection, async error watcher, goleak check, Start/Stop, and bounded shutdown waits). I don’t see correctness or concurrency issues here; this gives good coverage of the LeiosNotify client lifecycle.

protocol/chainsync/client.go (3)

28-37: Lifecycle tracking and deferred channel close resolve the previous readyForNextBlockChan race

Using started atomic.Bool plus onceStart/onceStop, and deferring close(readyForNextBlockChan) until <-c.DoneChan() when started, ensures the channel is only closed after the protocol has fully shut down. This removes the prior risk of handlers sending on a closed channel while keeping Stop idempotent.

Also applies to: 119-161
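
A condensed sketch of the deferred-close pattern, using the field names described in the review (not the verbatim source):

func (c *Client) Stop() error {
	var err error
	c.onceStop.Do(func() {
		err = c.SendMessage(NewMsgDone())
		_ = c.Protocol.Stop()
		if c.started.Load() {
			// Close only after the protocol has fully shut down, so
			// handlers can never send on a closed channel
			go func() {
				<-c.DoneChan()
				close(c.readyForNextBlockChan)
			}()
		} else {
			// Never started: no handlers running, safe to close now
			close(c.readyForNextBlockChan)
		}
	})
	return err
}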


322-356: GetAvailableBlockRange: robust handling of readyForNextBlockChan and cancellations

Switching to ready, ok := <-c.readyForNextBlockChan with:

  • !ok → ErrProtocolShuttingDown
  • ready == false → ErrSyncCancelled

This mapping gives clear semantics for shutdown vs. cancellation and avoids undefined behavior on closed channels. The subsequent RequestNext send remains guarded by DoneChan, so this looks correct.
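
In caller terms the receive now behaves like this (a sketch; the surrounding function body and return values are simplified):

ready, ok := <-c.readyForNextBlockChan
if !ok {
	// Channel closed: the protocol is shutting down
	return protocol.ErrProtocolShuttingDown
}
if !ready {
	// A false value means the sync was cancelled rather than shut down
	return ErrSyncCancelled
}
// ... safe to issue the next RequestNext ...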

573-754: RollForward/RollBackward: DoneChan‑aware signaling to readyForNextBlockChan

Wrapping all sends to readyForNextBlockChan (both true and false cases) in a select on c.DoneChan() prevents sends after shutdown and cleanly propagates ErrProtocolShuttingDown when needed. Combined with the deferred close in Stop(), this eliminates the send‑after‑close panic risk while preserving the sync and cancellation behavior.

Also applies to: 756-793

protocol/leiosfetch/client.go (2)

26-38: Lifecycle flags and Start guard are reasonable

Using started/stopped as atomic.Bool with onceStart/onceStop gives a clear “start once, stop once, never restart” contract. The stopped.Load() check in Start() prevents accidental restarts after Stop(), which matches expectations for this client.

Also applies to: 92-107


214-280: DoneChan‑aware handlers for result channels look correct

Each handler (handleBlock, handleBlockTxs, handleVotes, handleNextBlockAndTxsInRange, handleLastBlockAndTxsInRange) now sends on the corresponding result channel via a select on c.DoneChan(). This prevents blocking during shutdown and returns ErrProtocolShuttingDown when the protocol is stopping, which is exactly what callers of the request methods expect.

protocol/localstatequery/client.go (2)

29-47: Lifecycle and state mutexes remove the previous data race on started

Adding stateMutex and updating Start() to set c.started under this mutex makes accesses to started in Stop() race‑free. The onceStart/onceStop pairing around these operations is consistent with other protocol clients and avoids double‑start/stop issues.

Also applies to: 102-115


902-965: Acquire/result handling now correctly synchronizes acquired and is shutdown-safe

The introduction of acquiredMutex and its use in handleAcquired, acquire, release, and runQuery removes the prior data race on c.acquired. Combined with DoneChan‑aware sends to acquireResultChan and queryResultChan, this makes acquire/release and query flows behave correctly on shutdown (returning ErrProtocolShuttingDown when channels are closed) without risking send‑after‑close panics.

Also applies to: 967-1038
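
An illustrative shape of the synchronized flag plus shutdown-aware result delivery, assuming acquireResultChan carries error values (names per the review, not verbatim source):

func (c *Client) handleAcquired(msg protocol.Message) error {
	c.acquiredMutex.Lock()
	c.acquired = true
	c.acquiredMutex.Unlock()
	select {
	case <-c.DoneChan():
		return protocol.ErrProtocolShuttingDown
	case c.acquireResultChan <- nil:
		return nil
	}
}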

protocol/txsubmission/server.go (4)

27-41: Server struct and constructor updates look correct

Switching ackCount to int32 for atomic access, adding done with doneMutex, and introducing onceStop/restartMutex/stopped gives the server a clear lifecycle model. Initializing requestTxIdsResultChan and requestTxsResultChan as buffered channels in NewServer is appropriate for the single‑reply pattern.

Also applies to: 48-57


83-130: Stop() and doneChan() provide a clean, coordinated shutdown API

Stop() now:

  • is idempotent via onceStop,
  • coordinates with restarts using restartMutex,
  • safely closes done under doneMutex, and
  • calls s.Protocol.Stop().

Combined with doneChan() and IsStopped(), this gives RequestTxIds/RequestTxs and tests a clear way to observe shutdown. This addresses the prior concurrent close and restart issues around the done channel.
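
Taken together, the shutdown path has roughly this shape (a sketch reconstructed from the description above, not the exact source):

func (s *Server) Stop() error {
	var err error
	s.onceStop.Do(func() {
		// Serialize with any in-flight restart from handleDone
		s.restartMutex.Lock()
		defer s.restartMutex.Unlock()
		s.stopped.Store(true)
		// Close done under the same lock doneChan() uses to read it
		s.doneMutex.Lock()
		close(s.done)
		s.doneMutex.Unlock()
		err = s.Protocol.Stop()
	})
	return err
}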


132-193: Ack count atomics and RequestTxIds shutdown behavior look correct

RequestTxIds now:

  • validates reqCount,
  • uses atomic.LoadInt32 / StoreInt32 on ackCount within bounds, and
  • returns ErrProtocolShuttingDown when either requestTxIdsResultChan is closed or doneChan() fires.

The conversion to uint16 is guarded by the range checks, and updating ackCount to len(result.txIds) after each successful call is consistent with the protocol. This path looks solid.
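
For reference, a sketch of the request/ack flow as described (signatures and bounds are approximate; result.txIds follows the field name mentioned above):

func (s *Server) RequestTxIds(blocking bool, reqCount int) ([]TxIdAndSize, error) {
	// Bounds shown are approximate; the real code validates against the
	// protocol limits before converting to uint16
	if reqCount < 0 || reqCount > math.MaxUint16 {
		return nil, protocol.ErrProtocolViolationRequestExceeded
	}
	ackCount := atomic.LoadInt32(&s.ackCount)
	msg := NewMsgRequestTxIds(blocking, uint16(ackCount), uint16(reqCount))
	if err := s.SendMessage(msg); err != nil {
		return nil, err
	}
	select {
	case <-s.doneChan():
		return nil, protocol.ErrProtocolShuttingDown
	case result, ok := <-s.requestTxIdsResultChan:
		if !ok {
			return nil, protocol.ErrProtocolShuttingDown
		}
		// The IDs just received are acknowledged on the next request
		atomic.StoreInt32(&s.ackCount, int32(len(result.txIds)))
		return result.txIds, nil
	}
}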


195-225: RequestTxs now shuts down cleanly via doneChan and ok checks

Using select { case <-s.doneChan(): ...; case txs, ok := <-s.requestTxsResultChan: ... } ensures that callers get ErrProtocolShuttingDown if the server is stopping, and the extra ok check on the result channel avoids treating the zero value from a closed channel as a real result if it’s ever closed in future refactors. This is a good improvement in shutdown semantics.

protocol/txsubmission/server_concurrency_test.go (1)

28-95: LGTM: Well-structured concurrent Stop test.

The test properly exercises concurrent Stop calls with appropriate timeout protection and validates idempotent shutdown behavior. The timeout-based deadlock detection pattern is robust.

protocol/txsubmission/server_test.go (1)

28-82: LGTM: Clean shutdown test structure.

The test follows established patterns with proper async error monitoring, timeout guards, and error checking on Stop. The two-stage timeout approach (2s for mock, 10s for full shutdown) is appropriate.

protocol/blockfetch/blockfetch.go (1)

122-122: LGTM: Queue size increase aligns with performance objectives.

The 50% increase (256→384) in default receive queue size provides better buffering while remaining within the allowed maximum (512). This change supports the PR's performance improvement goals.

protocol/localtxsubmission/client_test.go (1)

167-186: LGTM: Consistent shutdown test pattern.

The test follows the established pattern used across other protocol clients (blockfetch, chainsync, etc.) and properly validates error-free shutdown behavior.

protocol/peersharing/server.go (1)

114-129: LGTM: Clean restart flow with explicit lifecycle.

The updated restart logic properly stops the protocol before reinitializing, making the lifecycle explicit and easier to reason about.

protocol/blockfetch/client_test.go (1)

211-230: LGTM: Consistent client shutdown test.

The test follows the same pattern established across other protocol clients and properly validates the Start/Stop lifecycle.

protocol/peersharing/server_test.go (1)

28-82: LGTM: Clean server shutdown test.

The test properly validates server shutdown with appropriate error handling, async monitoring, and timeout guards. The error check on Stop (lines 60-62) correctly validates shutdown behavior.

protocol/localtxsubmission/client.go (5)

34-35: LGTM: Standard lifecycle tracking fields.

The stateMutex and started fields follow the established pattern used across other protocol clients for tracking lifecycle state.


76-90: LGTM: Proper lifecycle initialization.

The Start method correctly guards the started flag with stateMutex and uses onceStart for idempotency. The mutex scope includes the Protocol.Start() call, which is consistent with other protocol clients.


92-124: LGTM: Proper shutdown sequencing with deferred channel close.

The Stop method correctly:

  • Guards state with stateMutex and prevents concurrent access with busyMutex
  • Sends the Done message before stopping the protocol
  • Defers channel closure until after protocol shutdown (when started==true) to prevent handler races
  • Handles the case where Stop is called before Start

The goroutine (lines 114-117) is safe because Protocol.Stop() guarantees DoneChan will close.


165-179: LGTM: Eliminates TOCTOU race with shutdown-aware send.

The select statement properly addresses the previous TOCTOU race condition by racing the channel send against DoneChan. If shutdown occurs, ErrProtocolShuttingDown is returned; otherwise, the result is delivered safely.


181-204: LGTM: Consistent shutdown-aware error delivery.

The select statement mirrors the pattern in handleAcceptTx, ensuring the error is delivered safely or shutdown is acknowledged. This eliminates the TOCTOU race condition.

protocol/peersharing/client_test.go (2)

30-56: Good test setup with goroutine leak detection.

The test infrastructure is well-structured with proper use of goleak for detecting leaked goroutines and appropriate mock connection setup. The Ouroboros connection is configured correctly with the required options for testing.


89-108: Original review comment is based on false assumptions and should be disregarded.

The Client struct fields started and stopped are private (lowercase) with no corresponding public accessor methods like IsStopped() or IsStarted(). The test correctly follows Go practices by testing only the public API (Start/Stop methods), verifying no errors occur, and relying on goleak for goroutine leak detection. Accessing private fields from tests would violate encapsulation and is unnecessary here.

Likely an incorrect or invalid review comment.

@wolf31o2 wolf31o2 force-pushed the perf/protocols-improvements branch from 1c271c7 to 5c93443 Compare November 16, 2025 18:43

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
protocol/peersharing/server.go (1)

49-54: Error from Protocol.Stop() is silently discarded.

The Stop() method calls s.Protocol.Stop() but does not capture or propagate its error. This means any shutdown failures are lost, which contradicts the method's error return signature and the pattern expected by callers.

Apply this diff to properly propagate shutdown errors:

 func (s *Server) Stop() error {
+	var err error
 	s.onceStop.Do(func() {
-		s.Protocol.Stop()
+		err = s.Protocol.Stop()
 	})
-	return nil
+	return err
 }
🧹 Nitpick comments (2)
protocol/localtxsubmission/client.go (1)

34-36: Shutdown behavior improvements look correct; consider tightening Stop() error path

The changes here look good overall:

  • stateMutex + started cleanly protect the start/stop state.
  • Delaying close(c.submitResultChan) until <-c.DoneChan() when started, and using select { case <-c.DoneChan(): ...; case c.submitResultChan <- ... } in both handlers, removes the TOCTOU window that could previously cause a send on a closed channel.
  • The submitResultChan close-on-Done pattern is consistent with other protocols and should be safe with the new select usage.

One potential follow-up improvement: in Stop() (Lines 93–123), if SendMessage(NewMsgDone()) fails, the function returns early and never calls c.Protocol.Stop() or closes submitResultChan. That keeps the previous behavior but can leave the client in a partially-stopped state on an error path. You might still want to attempt c.Protocol.Stop() and schedule the channel close even when sending MsgDone fails, while propagating the original error to the caller.

Also applies to: 76-89, 93-123, 165-177, 181-203

protocol/leiosnotify/client_test.go (1)

30-136: LeiosNotify shutdown test harness is well-structured

The NtN version-data helpers, conversationEntryNtNResponseV15, and runTest harness give you a clean, reusable way to exercise LeiosNotify client startup/shutdown with goleak verification and strict error handling. TestClientShutdown itself is straightforward and aligns with the other protocol shutdown tests. If you ever touch mockNtNVersionDataV11 again, you could drop the interface return type and avoid the runtime type assertion, but as-is it's perfectly acceptable for test code.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1c271c7 and 5c93443.

📒 Files selected for processing (27)
  • protocol/blockfetch/blockfetch.go (1 hunks)
  • protocol/blockfetch/client.go (7 hunks)
  • protocol/blockfetch/client_test.go (1 hunks)
  • protocol/chainsync/chainsync.go (1 hunks)
  • protocol/chainsync/client.go (6 hunks)
  • protocol/chainsync/client_concurrency_test.go (1 hunks)
  • protocol/chainsync/client_test.go (2 hunks)
  • protocol/chainsync/error.go (1 hunks)
  • protocol/keepalive/client.go (2 hunks)
  • protocol/keepalive/client_test.go (1 hunks)
  • protocol/leiosfetch/client.go (5 hunks)
  • protocol/leiosnotify/client.go (4 hunks)
  • protocol/leiosnotify/client_test.go (1 hunks)
  • protocol/localstatequery/client.go (6 hunks)
  • protocol/localstatequery/client_test.go (2 hunks)
  • protocol/localtxmonitor/client.go (10 hunks)
  • protocol/localtxmonitor/client_test.go (1 hunks)
  • protocol/localtxsubmission/client.go (5 hunks)
  • protocol/localtxsubmission/client_test.go (1 hunks)
  • protocol/localtxsubmission/localtxsubmission.go (2 hunks)
  • protocol/peersharing/client.go (4 hunks)
  • protocol/peersharing/client_test.go (1 hunks)
  • protocol/peersharing/server.go (5 hunks)
  • protocol/peersharing/server_test.go (1 hunks)
  • protocol/txsubmission/server.go (8 hunks)
  • protocol/txsubmission/server_concurrency_test.go (1 hunks)
  • protocol/txsubmission/server_test.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
  • protocol/localtxmonitor/client_test.go
  • protocol/localtxsubmission/localtxsubmission.go
  • protocol/leiosnotify/client.go
  • protocol/chainsync/client_test.go
  • protocol/peersharing/client_test.go
  • protocol/localstatequery/client_test.go
🧰 Additional context used
🧬 Code graph analysis (18)
protocol/keepalive/client_test.go (5)
connection.go (3)
  • Connection (59-103)
  • NewConnection (107-130)
  • New (133-135)
connection_options.go (3)
  • WithConnection (36-40)
  • WithNetworkMagic (50-54)
  • WithNodeToNode (78-82)
protocol/blockfetch/client_test.go (1)
  • TestClientShutdown (211-230)
protocol/keepalive/keepalive.go (1)
  • KeepAlive (85-88)
protocol/keepalive/client.go (1)
  • Client (26-35)
protocol/blockfetch/client_test.go (6)
protocol/chainsync/client_test.go (1)
  • TestClientShutdown (283-302)
protocol/keepalive/client_test.go (1)
  • TestClientShutdown (303-322)
protocol/peersharing/client_test.go (1)
  • TestClientShutdown (95-114)
connection.go (1)
  • Connection (59-103)
protocol/blockfetch/blockfetch.go (1)
  • BlockFetch (102-105)
protocol/blockfetch/client.go (1)
  • Client (30-41)
protocol/peersharing/client.go (3)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/peersharing/peersharing.go (1)
  • ProtocolName (27-27)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/txsubmission/server_test.go (6)
protocol/peersharing/server_test.go (1)
  • TestServerShutdown (28-82)
connection.go (2)
  • NewConnection (107-130)
  • Connection (59-103)
protocol/protocol.go (1)
  • ProtocolRoleServer (95-95)
connection_options.go (3)
  • WithConnection (36-40)
  • WithNetworkMagic (50-54)
  • WithNodeToNode (78-82)
protocol/txsubmission/txsubmission.go (1)
  • TxSubmission (126-129)
protocol/txsubmission/server.go (1)
  • Server (28-41)
protocol/peersharing/server.go (2)
protocol/txsubmission/server.go (1)
  • Server (28-41)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/leiosnotify/client_test.go (6)
protocol/versiondata.go (6)
  • VersionData (40-46)
  • VersionDataNtN13andUp (143-145)
  • VersionDataNtN11to12 (116-122)
  • DiffusionModeInitiatorOnly (21-21)
  • PeerSharingModeNoPeerSharing (27-27)
  • QueryModeDisabled (36-36)
protocol/handshake/messages.go (1)
  • NewMsgAcceptVersion (88-102)
connection.go (2)
  • Connection (59-103)
  • NewConnection (107-130)
connection_options.go (3)
  • WithConnection (36-40)
  • WithNetworkMagic (50-54)
  • WithNodeToNode (78-82)
protocol/leiosnotify/leiosnotify.go (1)
  • LeiosNotify (75-78)
protocol/leiosnotify/client.go (1)
  • Client (24-33)
protocol/localstatequery/client.go (5)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/localtxmonitor/client.go (1)
  • Client (25-41)
protocol/localstatequery/messages.go (4)
  • NewMsgDone (245-252)
  • AcquireFailurePointNotOnChain (44-44)
  • MsgResult (172-175)
  • NewMsgQuery (160-170)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/localstatequery/error.go (2)
  • ErrAcquireFailurePointTooOld (20-20)
  • ErrAcquireFailurePointNotOnChain (23-25)
protocol/localtxsubmission/client.go (7)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/blockfetch/client.go (1)
  • Client (30-41)
protocol/leiosnotify/client.go (1)
  • Client (24-33)
protocol/localstatequery/client.go (1)
  • Client (30-47)
protocol/localtxmonitor/client.go (1)
  • Client (25-41)
protocol/peersharing/client.go (1)
  • Client (25-35)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/localtxsubmission/client_test.go (6)
protocol/blockfetch/client_test.go (1)
  • TestClientShutdown (211-230)
protocol/chainsync/client_test.go (1)
  • TestClientShutdown (283-302)
protocol/localtxmonitor/client_test.go (1)
  • TestClientShutdown (300-319)
connection.go (1)
  • Connection (59-103)
protocol/localtxsubmission/localtxsubmission.go (1)
  • LocalTxSubmission (75-78)
protocol/localtxsubmission/client.go (1)
  • Client (26-36)
protocol/peersharing/server_test.go (5)
protocol/txsubmission/server_test.go (1)
  • TestServerShutdown (28-82)
connection.go (2)
  • NewConnection (107-130)
  • Connection (59-103)
connection_options.go (3)
  • WithConnection (36-40)
  • WithNetworkMagic (50-54)
  • WithNodeToNode (78-82)
protocol/peersharing/peersharing.go (1)
  • PeerSharing (67-70)
protocol/peersharing/server.go (1)
  • Server (26-32)
protocol/chainsync/client.go (3)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/chainsync/error.go (1)
  • ErrSyncCancelled (26-26)
protocol/leiosfetch/client.go (4)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/blockfetch/client.go (1)
  • Client (30-41)
protocol/chainsync/client.go (1)
  • Client (29-46)
protocol/blockfetch/client.go (2)
protocol/protocol.go (2)
  • Protocol (39-60)
  • New (122-133)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/localtxmonitor/client.go (5)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/localstatequery/client.go (1)
  • Client (30-47)
protocol/peersharing/client.go (1)
  • Client (25-35)
protocol/localtxmonitor/messages.go (1)
  • NewMsgDone (79-86)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/txsubmission/server_concurrency_test.go (5)
connection.go (1)
  • NewConnection (107-130)
protocol/protocol.go (1)
  • ProtocolRoleServer (95-95)
connection_options.go (3)
  • WithConnection (36-40)
  • WithNetworkMagic (50-54)
  • WithNodeToNode (78-82)
protocol/txsubmission/txsubmission.go (1)
  • TxSubmission (126-129)
protocol/txsubmission/server.go (1)
  • Server (28-41)
protocol/txsubmission/server.go (5)
protocol/txsubmission/messages.go (2)
  • TxBody (197-201)
  • NewMsgRequestTxIds (68-82)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/txsubmission/txsubmission.go (1)
  • MaxAckCount (143-143)
protocol/error.go (2)
  • ErrProtocolViolationRequestExceeded (29-31)
  • ErrProtocolShuttingDown (19-19)
protocol/txsubmission/error.go (1)
  • ErrStopServerProcess (21-21)
protocol/chainsync/client_concurrency_test.go (3)
protocol/chainsync/chainsync.go (3)
  • New (259-267)
  • NewConfig (273-295)
  • ChainSync (204-207)
connection_options.go (1)
  • WithChainSyncConfig (131-135)
protocol/chainsync/client.go (1)
  • Client (29-46)
protocol/keepalive/client.go (4)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/keepalive/keepalive.go (1)
  • ProtocolName (27-27)
connection/id.go (1)
  • ConnectionId (22-25)
protocol/keepalive/messages.go (1)
  • NewMsgDone (94-101)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Analyze (go)
🔇 Additional comments (11)
protocol/localtxsubmission/client_test.go (1)

167-186: LGTM: Consistent shutdown test implementation.

The test follows the established pattern seen across other protocol clients and properly validates the Start/Stop lifecycle with appropriate error handling.

protocol/blockfetch/blockfetch.go (1)

122-122: Performance tuning approved.

The increase from 256 to 384 provides better buffering capacity while remaining well within the MaxRecvQueueSize limit of 512, consistent with the PR's performance improvement objectives.

protocol/chainsync/error.go (1)

25-26: LGTM: Well-defined sentinel error.

The new ErrSyncCancelled error follows established patterns in this file and provides a clear signal for cancellation scenarios within the sync lifecycle.

protocol/blockfetch/client_test.go (1)

211-230: LGTM: Proper shutdown test coverage.

The test validates the BlockFetch client lifecycle with appropriate nil checks and error handling, consistent with shutdown tests across other protocols.

protocol/chainsync/chainsync.go (1)

226-227: Performance improvements approved.

Increasing both DefaultPipelineLimit and DefaultRecvQueueSize from 50 to 75 allows for better throughput via increased pipelining while remaining safely within protocol maximums (100).

protocol/keepalive/client_test.go (1)

242-322: LGTM: Comprehensive shutdown test with proper error handling.

The test infrastructure and TestClientShutdown implementation follow the established pattern across protocol tests. Error handling from Stop() is correctly validated (lines 317-319), addressing previous review feedback.

protocol/peersharing/server.go (1)

118-132: Proper restart sequence with cleanup.

The updated handleDone correctly calls Stop() with error checking before reinitializing the protocol, ensuring a clean restart cycle.

protocol/keepalive/client.go (1)

98-120: LGTM: Well-structured Stop() implementation.

The method provides proper lifecycle management with:

  • Single-execution guarantee via onceStop
  • Thread-safe timer cleanup under mutex
  • Error capture from message sending
  • Complete protocol shutdown
protocol/peersharing/server_test.go (1)

28-81: Server shutdown test wiring looks solid

The shutdown test mirrors the TxSubmission pattern: it validates that PeerSharing().Server exists, checks the Stop() error, and waits for both the mock connection and Ouroboros connection to terminate with bounded timeouts. This will be a good guard for future lifecycle regressions once the skip is removed.

protocol/txsubmission/server_concurrency_test.go (1)

28-95: Concurrency tests effectively exercise Stop() idempotence and state

TestServerConcurrentStop and TestServerStopSetsStoppedFlag are structured well: they start the server, coordinate concurrent Stop() calls with a WaitGroup and timeout, and then assert the IsStopped() flag. This should catch deadlocks or mis-set state once the mock issues are resolved and the skips are removed.

Also applies to: 97-143

protocol/txsubmission/server_test.go (1)

28-82: TxSubmission server shutdown test matches the shared lifecycle pattern

This test cleanly validates TxSubmission server startup/shutdown, including checking the Stop() error and waiting for both the mock and Ouroboros connections to terminate within timeouts. It aligns with the peersharing test and provides good coverage for server lifecycle behavior once the skip is removed.

@wolf31o2 wolf31o2 force-pushed the perf/protocols-improvements branch from 5c93443 to eff3a32 Compare November 16, 2025 18:58

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
protocol/leiosfetch/client.go (1)

109-143: Fix Stop() early-return on SendMessage error (incomplete shutdown, no retry possible)

If SendMessage(NewMsgDone) fails, the closure in onceStop.Do returns immediately: Protocol.Stop() is never called, result channels are never closed, and the started/stopped flags are not updated. Because onceStop has now run, subsequent Stop() calls are no-ops. This can leave the underlying protocol running, leak goroutines waiting on DoneChan (line 126), and leave callers blocked forever on the result channels.

You should always attempt to stop the protocol and clean up channels, while still propagating the SendMessage error. For example:

 func (c *Client) Stop() error {
 	var err error
 	c.onceStop.Do(func() {
 		c.Protocol.Logger().
 			Debug("stopping client protocol",
 				"component", "network",
 				"protocol", ProtocolName,
 				"connection_id", c.callbackContext.ConnectionId.String(),
 			)
-		msg := NewMsgDone()
-		if err = c.SendMessage(msg); err != nil {
-			return
-		}
-		_ = c.Protocol.Stop() // Error ignored - method returns SendMessage error
+		msg := NewMsgDone()
+		if sendErr := c.SendMessage(msg); sendErr != nil {
+			// Preserve the SendMessage error but still shut down the protocol.
+			err = sendErr
+		}
+		// Always attempt to stop the protocol so DoneChan and muxer shutdown complete.
+		_ = c.Protocol.Stop() // Stop error ignored; err already reflects SendMessage failure if any
 		// Defer closing channels until protocol fully shuts down (only if started)
 		if c.started.Load() {
 			go func() {
 				<-c.DoneChan()
 				close(c.blockResultChan)
 				close(c.blockTxsResultChan)
 				close(c.votesResultChan)
 				close(c.blockRangeResultChan)
 			}()
 		} else {
 			// If protocol was never started, close channels immediately
 			close(c.blockResultChan)
 			close(c.blockTxsResultChan)
 			close(c.votesResultChan)
 			close(c.blockRangeResultChan)
 		}
-		c.started.Store(false)
-		c.stopped.Store(true)
+		c.started.Store(false)
+		c.stopped.Store(true)
 	})
 	return err
 }

This matches the pattern used in localtxsubmission.Client.Stop() (protocol/localtxsubmission/client.go:93–120) and avoids leaving the client in a non-recoverable half-stopped state.

protocol/blockfetch/client.go (1)

103-132: Fix Stop() to always call Protocol.Stop() to prevent goroutine leaks on send errors

The current code only calls c.Protocol.Stop() when SendMessage succeeds. If SendMessage fails, Protocol.Stop() is skipped, which prevents Muxer.UnregisterProtocol() from being called. This means muxerDoneChan is never signaled, so recvLoop blocks indefinitely (line 495 of protocol/protocol.go) and never closes recvDoneChan. Without recvDoneChan closing, sendLoop never exits, leaving sendDoneChan unclosed. The goroutine at protocol.go lines 162-166 waits for both, so doneChan never closes, permanently blocking the cleanup goroutine and leaking channels.

The localtxsubmission client (protocol/localtxsubmission/client.go:109-111) correctly handles this by always calling Protocol.Stop() regardless of SendMessage result. Apply the same pattern:

 func (c *Client) Stop() error {
 	var err error
 	c.onceStop.Do(func() {
 		c.Protocol.Logger().
 			Debug("stopping client protocol",
 				"component", "network",
 				"protocol", ProtocolName,
 				"connection_id", c.callbackContext.ConnectionId.String(),
 			)
 		msg := NewMsgClientDone()
-		err = c.SendMessage(msg)
-		if err == nil {
-			_ = c.Protocol.Stop() // Error ignored - method returns SendMessage error
-		}
+		if sendErr := c.SendMessage(msg); sendErr != nil {
+			err = sendErr
+		}
+		_ = c.Protocol.Stop() // Always stop to signal muxerDoneChan
 		// Defer closing channels until protocol fully shuts down (only if started)
 		if c.started.Load() {
 			go func() {
 				<-c.DoneChan()
 				close(c.blockChan)
 				close(c.startBatchResultChan)
 			}()
 		} else {
 			// If protocol was never started, close channels immediately
 			close(c.blockChan)
 			close(c.startBatchResultChan)
 		}
 	})
 	return err
 }
protocol/localtxmonitor/client.go (1)

204-208: Guard c.acquired with a mutex to avoid a data race between handlers and callers.

c.acquired is written in the handler (handleAcquired) and release() and read in HasTx, NextTx, and GetSizes (inside the busyMutex critical section), but the handler goroutine does not hold busyMutex. That gives you an unsynchronized read/write on acquired, which go test -race will flag.

Given you already introduced acquiredMutex for the localstatequery client in this PR, I’d suggest mirroring that pattern here:

  • Add acquiredMutex sync.Mutex to Client.
  • In handleAcquired and release, set c.acquired under acquiredMutex.
  • In HasTx, NextTx, and GetSizes, read c.acquired under acquiredMutex (then decide whether to call c.acquire()).

Conceptually:

 type Client struct {
@@
-  busyMutex          sync.Mutex
-  acquired           bool
+  busyMutex          sync.Mutex
+  acquiredMutex      sync.Mutex
+  acquired           bool
@@
 func (c *Client) handleAcquired(msg protocol.Message) error {
@@
-  c.acquired = true
+  c.acquiredMutex.Lock()
+  c.acquired = true
+  c.acquiredMutex.Unlock()
@@
 func (c *Client) HasTx(txId []byte) (bool, error) {
@@
-  if !c.acquired {
+  c.acquiredMutex.Lock()
+  acquired := c.acquired
+  c.acquiredMutex.Unlock()
+  if !acquired {
@@
 func (c *Client) release() error {
@@
-  c.acquired = false
+  c.acquiredMutex.Lock()
+  c.acquired = false
+  c.acquiredMutex.Unlock()

And similarly for NextTx and GetSizes. This keeps the handler and client sides race‑free while preserving existing behavior.

Also applies to: 238-242, 272-276, 318-319, 397-398

🧹 Nitpick comments (2)
protocol/txsubmission/server_concurrency_test.go (2)

29-95: Skipped test provides no value until mock issues are resolved.

The test structure is sound and properly checks for concurrent Stop() deadlocks, but it's currently skipped. Consider either:

  1. Fixing the mock server issues with NtN protocol to enable this test
  2. Removing this test until the infrastructure is ready
  3. Adding a tracking issue reference in the skip message

Do you want me to search for existing issues related to the "mock server issues with NtN protocol" or help create a tracking issue?


100-143: Incomplete test with TODO comment.

The test is both skipped and has a TODO comment (line 98-99) indicating planned future enhancements. This suggests the test is a work-in-progress.

Consider either:

  • Completing the test as indicated in the TODO before merging
  • Creating a separate issue to track the enhancement
  • Adding a more specific skip message referencing the TODO
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5c93443 and eff3a32.

📒 Files selected for processing (32)
  • protocol/blockfetch/blockfetch.go (1 hunks)
  • protocol/blockfetch/client.go (7 hunks)
  • protocol/blockfetch/client_test.go (1 hunks)
  • protocol/blockfetch/server.go (1 hunks)
  • protocol/chainsync/chainsync.go (1 hunks)
  • protocol/chainsync/client.go (6 hunks)
  • protocol/chainsync/client_concurrency_test.go (1 hunks)
  • protocol/chainsync/client_test.go (2 hunks)
  • protocol/chainsync/error.go (1 hunks)
  • protocol/chainsync/server.go (1 hunks)
  • protocol/keepalive/client.go (2 hunks)
  • protocol/keepalive/client_test.go (1 hunks)
  • protocol/leiosfetch/client.go (5 hunks)
  • protocol/leiosfetch/server.go (1 hunks)
  • protocol/leiosnotify/client.go (4 hunks)
  • protocol/leiosnotify/client_test.go (1 hunks)
  • protocol/leiosnotify/server.go (1 hunks)
  • protocol/localstatequery/client.go (6 hunks)
  • protocol/localstatequery/client_test.go (2 hunks)
  • protocol/localtxmonitor/client.go (10 hunks)
  • protocol/localtxmonitor/client_test.go (1 hunks)
  • protocol/localtxsubmission/client.go (5 hunks)
  • protocol/localtxsubmission/client_test.go (1 hunks)
  • protocol/localtxsubmission/localtxsubmission.go (2 hunks)
  • protocol/peersharing/client.go (4 hunks)
  • protocol/peersharing/client_test.go (1 hunks)
  • protocol/peersharing/server.go (5 hunks)
  • protocol/peersharing/server_test.go (1 hunks)
  • protocol/protocol.go (4 hunks)
  • protocol/txsubmission/server.go (8 hunks)
  • protocol/txsubmission/server_concurrency_test.go (1 hunks)
  • protocol/txsubmission/server_test.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • protocol/txsubmission/server_test.go
  • protocol/blockfetch/blockfetch.go
  • protocol/localtxmonitor/client_test.go
  • protocol/peersharing/server_test.go
  • protocol/keepalive/client_test.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-04T15:54:22.683Z
Learnt from: wolf31o2
Repo: blinklabs-io/gouroboros PR: 1093
File: ledger/mary/pparams.go:143-150
Timestamp: 2025-11-04T15:54:22.683Z
Learning: In the blinklabs-io/gouroboros repository, the design goal for CBOR round-trip tests is to achieve byte-identical encoding WITHOUT using stored CBOR (cbor.DecodeStoreCbor). Instead, the approach uses proper field types (pointers for optional fields) and relies on the cbor package's deterministic encoding (SortCoreDeterministic) to ensure reproducible output. The stored CBOR pattern should not be suggested as a solution for round-trip fidelity in this codebase.

Applied to files:

  • protocol/localstatequery/client_test.go
🧬 Code graph analysis (20)
protocol/chainsync/client_test.go (6)
protocol/chainsync/chainsync.go (1)
  • ChainSync (204-207)
protocol/chainsync/client.go (1)
  • Client (29-46)
protocol/blockfetch/client_test.go (1)
  • TestClientShutdown (211-230)
protocol/keepalive/client_test.go (1)
  • TestClientShutdown (303-322)
protocol/localstatequery/client_test.go (1)
  • TestClientShutdown (357-376)
connection.go (1)
  • Connection (59-103)
protocol/leiosnotify/client_test.go (6)
protocol/versiondata.go (6)
  • VersionData (40-46)
  • VersionDataNtN13andUp (143-145)
  • VersionDataNtN11to12 (116-122)
  • DiffusionModeInitiatorOnly (21-21)
  • PeerSharingModeNoPeerSharing (27-27)
  • QueryModeDisabled (36-36)
protocol/handshake/messages.go (1)
  • NewMsgAcceptVersion (88-102)
connection.go (2)
  • Connection (59-103)
  • NewConnection (107-130)
connection_options.go (3)
  • WithConnection (36-40)
  • WithNetworkMagic (50-54)
  • WithNodeToNode (78-82)
protocol/leiosnotify/leiosnotify.go (1)
  • LeiosNotify (75-78)
protocol/leiosnotify/client.go (1)
  • Client (24-33)
protocol/blockfetch/client_test.go (5)
protocol/chainsync/client_test.go (1)
  • TestClientShutdown (283-302)
protocol/keepalive/client_test.go (1)
  • TestClientShutdown (303-322)
connection.go (1)
  • Connection (59-103)
protocol/blockfetch/blockfetch.go (1)
  • BlockFetch (102-105)
protocol/blockfetch/client.go (1)
  • Client (30-41)
protocol/peersharing/client.go (8)
protocol/leiosfetch/client.go (1)
  • Client (26-38)
protocol/leiosnotify/client.go (1)
  • Client (24-33)
protocol/localstatequery/client.go (1)
  • Client (30-47)
protocol/localtxmonitor/client.go (1)
  • Client (25-41)
protocol/localtxsubmission/client.go (1)
  • Client (26-36)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/peersharing/peersharing.go (1)
  • ProtocolName (27-27)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/keepalive/client.go (4)
protocol/blockfetch/client.go (1)
  • Client (30-41)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/keepalive/keepalive.go (1)
  • ProtocolName (27-27)
protocol/keepalive/messages.go (1)
  • NewMsgDone (94-101)
protocol/blockfetch/client.go (2)
protocol/protocol.go (2)
  • Protocol (39-60)
  • New (122-133)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/localtxsubmission/localtxsubmission.go (1)
protocol/localtxsubmission/messages.go (1)
  • MessageTypeDone (29-29)
protocol/leiosfetch/client.go (7)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/blockfetch/client.go (1)
  • Client (30-41)
protocol/chainsync/client.go (1)
  • Client (29-46)
protocol/leiosnotify/client.go (1)
  • Client (24-33)
protocol/localtxmonitor/client.go (1)
  • Client (25-41)
protocol/message.go (1)
  • Message (18-22)
protocol/leiosnotify/client.go (8)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/leiosfetch/client.go (1)
  • Client (26-38)
protocol/localstatequery/client.go (1)
  • Client (30-47)
protocol/localtxmonitor/client.go (1)
  • Client (25-41)
protocol/localtxsubmission/client.go (1)
  • Client (26-36)
protocol/peersharing/client.go (1)
  • Client (25-35)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/message.go (1)
  • Message (18-22)
protocol/localtxsubmission/client.go (6)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/leiosnotify/client.go (1)
  • Client (24-33)
protocol/localstatequery/client.go (1)
  • Client (30-47)
protocol/localtxmonitor/client.go (1)
  • Client (25-41)
protocol/peersharing/client.go (1)
  • Client (25-35)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/chainsync/client.go (4)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/chainsync/chainsync.go (1)
  • ProtocolName (30-30)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/chainsync/error.go (1)
  • ErrSyncCancelled (26-26)
protocol/peersharing/client_test.go (4)
connection.go (2)
  • Connection (59-103)
  • NewConnection (107-130)
connection_options.go (3)
  • WithConnection (36-40)
  • WithNetworkMagic (50-54)
  • WithNodeToNode (78-82)
protocol/peersharing/peersharing.go (1)
  • PeerSharing (67-70)
protocol/peersharing/client.go (1)
  • Client (25-35)
protocol/localstatequery/client_test.go (6)
protocol/blockfetch/client_test.go (1)
  • TestClientShutdown (211-230)
protocol/chainsync/client_test.go (1)
  • TestClientShutdown (283-302)
protocol/keepalive/client_test.go (1)
  • TestClientShutdown (303-322)
connection.go (1)
  • Connection (59-103)
protocol/localstatequery/localstatequery.go (1)
  • LocalStateQuery (116-119)
protocol/localstatequery/client.go (1)
  • Client (30-47)
protocol/localstatequery/client.go (6)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/localstatequery/localstatequery.go (1)
  • AcquireTarget (131-133)
protocol/localstatequery/messages.go (3)
  • NewMsgDone (245-252)
  • AcquireFailurePointNotOnChain (44-44)
  • MsgResult (172-175)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/localstatequery/error.go (2)
  • ErrAcquireFailurePointTooOld (20-20)
  • ErrAcquireFailurePointNotOnChain (23-25)
protocol/message.go (1)
  • Message (18-22)
protocol/localtxmonitor/client.go (4)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/peersharing/client.go (1)
  • Client (25-35)
protocol/localtxmonitor/messages.go (1)
  • NewMsgDone (79-86)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/peersharing/server.go (2)
protocol/txsubmission/server.go (1)
  • Server (28-41)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/chainsync/client_concurrency_test.go (3)
protocol/chainsync/chainsync.go (3)
  • New (259-267)
  • NewConfig (273-295)
  • ChainSync (204-207)
connection_options.go (1)
  • WithChainSyncConfig (131-135)
protocol/chainsync/client.go (1)
  • Client (29-46)
protocol/txsubmission/server_concurrency_test.go (5)
connection.go (1)
  • NewConnection (107-130)
protocol/protocol.go (1)
  • New (122-133)
connection_options.go (3)
  • WithConnection (36-40)
  • WithNetworkMagic (50-54)
  • WithNodeToNode (78-82)
protocol/txsubmission/txsubmission.go (1)
  • TxSubmission (126-129)
protocol/txsubmission/server.go (1)
  • Server (28-41)
protocol/txsubmission/server.go (5)
protocol/txsubmission/messages.go (1)
  • NewMsgRequestTxIds (68-82)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/txsubmission/txsubmission.go (2)
  • ProtocolName (27-27)
  • MaxAckCount (143-143)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/txsubmission/error.go (1)
  • ErrStopServerProcess (21-21)
protocol/localtxsubmission/client_test.go (2)
connection.go (1)
  • Connection (59-103)
protocol/localtxsubmission/client.go (1)
  • Client (26-36)
🔇 Additional comments (39)
protocol/localtxsubmission/localtxsubmission.go (1)

46-49: LGTM! Client-initiated graceful shutdown.

Allowing the client to send Done from the Idle state enables graceful protocol termination, consistent with the shutdown lifecycle improvements mentioned in the PR.

protocol/blockfetch/client_test.go (1)

211-230: LGTM! Shutdown test follows established pattern.

The test properly validates that the BlockFetch client can be started and stopped without errors, matching the pattern used across other protocol clients (ChainSync, KeepAlive, etc.). The test structure includes proper nil checks and error handling.

protocol/localtxsubmission/client_test.go (1)

167-186: LGTM! Shutdown test follows established pattern.

The test properly validates shutdown behavior for the LocalTxSubmission client. Correctly uses the NtC (node-to-client) protocol handshake, as LocalTxSubmission is a client-only protocol.

protocol/leiosfetch/server.go (1)

203-205: LGTM! Proper error propagation prevents invalid restart.

The change ensures that if Stop() fails during the protocol restart sequence, the error is propagated instead of blindly proceeding with reinitialization. This aligns with similar error handling improvements in other server components (chainsync, blockfetch).

protocol/chainsync/server.go (1)

245-247: LGTM! Consistent error handling across server restarts.

Proper error propagation added to the restart sequence, matching the pattern implemented in other server components. This prevents attempting to reinitialize and restart when Stop() encounters an error.

protocol/chainsync/client_test.go (1)

283-302: LGTM! Shutdown test follows established pattern.

The test properly validates that the ChainSync client can be started and stopped without errors, consistent with similar tests in other protocol packages.

protocol/blockfetch/server.go (1)

179-181: LGTM! Consistent error handling in restart flow.

Proper error propagation ensures that if Stop() fails, the error is returned instead of proceeding with reinitialization. This matches the pattern implemented across other server components in this PR.

protocol/chainsync/chainsync.go (1)

226-227: Verify the performance impact and rationale for the 50→75 increase—commit message lacks supporting metrics.

The commit message states "Increase default queue sizes for better buffering" but provides no benchmarking data, memory impact analysis, or performance test results. While the 75 value remains within protocol limits (max 100), please ensure:

  1. The increase is validated with actual performance metrics or production data
  2. Memory impact of the 50% buffer increase is acceptable for all client deployments
  3. This change was tested against expected workload patterns
protocol/leiosnotify/server.go (1)

118-120: LGTM! Stop error propagation prevents restart on failure.

The error handling ensures that if Stop() fails, the protocol won't attempt to reinitialize and restart, which is the correct behavior.

protocol/localstatequery/client_test.go (2)

25-25: LGTM! Import alias improves readability.

The alias distinguishes the main package from test utilities.


357-376: LGTM! Shutdown test validates clean teardown.

The test follows the established pattern across other protocol tests and uses goleak to ensure no goroutine leaks.

protocol/peersharing/server.go (3)

49-55: LGTM! Idempotent Stop() with error propagation.

The sync.Once guard ensures Stop() is idempotent, and the method correctly propagates errors from Protocol.Stop().


119-119: LGTM! Unused parameter correctly ignored.

The blank identifier indicates the message parameter is intentionally unused in handleDone.


128-130: LGTM! Stop error handling prevents unsafe restart.

Checking the Stop() error ensures the protocol won't reinitialize if shutdown fails.

protocol/chainsync/error.go (1)

25-26: LGTM! Well-defined sentinel error.

The new exported error follows Go conventions and provides a clear signal for cancelled sync operations.

protocol/protocol.go (3)

176-189: LGTM! Stop() signature change enables error-aware shutdown.

The signature change from func (p *Protocol) Stop() to func (p *Protocol) Stop() error establishes infrastructure for error propagation during shutdown. Currently returns nil, but the pattern is correctly adopted by all callers in the PR.

Note: This is a breaking API change for external consumers.
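External consumers that previously called Stop() as a bare statement will now need to handle (or explicitly discard) the result, for example:

if err := p.Stop(); err != nil {
    log.Printf("protocol stop failed: %v", err)
}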


247-247: LGTM! Appropriate to ignore error in error path.

The comment correctly indicates this is already an error path where the primary error (ErrProtocolViolationQueueExceeded) takes precedence.


453-453: LGTM! Consistent error handling in cleanup.

Same pattern as line 247—error is appropriately ignored in the error path.

protocol/keepalive/client.go (2)

98-120: LGTM! Comprehensive shutdown handling.

The Stop() method correctly:

  • Uses sync.Once for idempotency
  • Stops the timer under mutex protection
  • Sends the Done message and captures its error
  • Calls Protocol.Stop() (appropriately ignoring its error since SendMessage error is returned)

The design ensures clean shutdown of both the timer and the underlying protocol.


122-142: LGTM! Cleaner timer management.

Extracting timer logic into startTimer() improves code organization and makes the timer lifecycle explicit.
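A rough sketch of the re-arming helper this describes; the field names, config field, and message constructor arguments are assumptions rather than the exact client code:

func (c *Client) startTimer() {
    c.timerMutex.Lock()
    defer c.timerMutex.Unlock()
    c.timer = time.AfterFunc(c.config.Period, func() {
        _ = c.SendMessage(NewMsgKeepAlive(c.cookie)) // send the next keep-alive
        c.startTimer()                               // re-arm for the next interval
    })
}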

protocol/leiosnotify/client_test.go (1)

1-136: LGTM! Comprehensive shutdown test with leak detection.

The test properly validates LeiosNotify client shutdown behavior:

  • Uses goleak to detect goroutine leaks
  • Employs proper NtN v15 handshake sequence
  • Validates Start/Stop lifecycle
  • Includes timeout protection and error handling

The test structure aligns with the shutdown testing pattern used across other protocol components.

protocol/leiosnotify/client.go (5)

31-32: LGTM! Lifecycle state tracking added.

The stateMutex and started flag enable safe coordination between Start/Stop and message handlers.


74-76: LGTM! Start() properly sets started flag.

The mutex ensures thread-safe access to the started flag.

Also applies to: 83-83


103-111: LGTM! Channel closure deferred until protocol shutdown.

The conditional logic correctly handles two scenarios:

  • If started: closes channel after DoneChan signals (prevents handlers from writing to closed channel)
  • If never started: closes immediately (safe since no handlers are running)

This pattern resolves the race condition noted in past review comments.
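In sketch form (channel name illustrative):

if c.started {
    // Handlers may still be running; close only after shutdown completes.
    go func() {
        <-c.DoneChan()
        close(c.notifyChan)
    }()
} else {
    // Never started, so no handler can be mid-send.
    close(c.notifyChan)
}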


151-155: LGTM! Shutdown-aware channel send.

The select statement prevents blocking on channel send when the protocol is shutting down, returning ErrProtocolShuttingDown instead.
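The pattern in question, with an illustrative result channel name:

select {
case <-c.DoneChan():
    return protocol.ErrProtocolShuttingDown
case c.resultChan <- result:
}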


160-164: LGTM! Consistent shutdown handling across handlers.

All message handlers follow the same shutdown-aware pattern, preventing writes to closed channels.

Also applies to: 169-173, 178-182

protocol/chainsync/client_concurrency_test.go (2)

105-148: Stop-before-Start coverage looks good

The Stop-before-Start test nicely validates that calling Stop on an unstarted ChainSync client is safe (no panic, no deadlock) and that a subsequent Start/Stop sequence still behaves correctly under goleak. This aligns well with the intended lifecycle contract.


29-103: Concurrent Start/Stop test correctly stresses once-semantics and race safety

This test is valuable for catching races and deadlocks in the ChainSync client's sync.Once-based Start/Stop handling. Because Start() and Stop() each use sync.Once internally, only the first call to each actually executes the underlying logic; later calls from other goroutines are no-ops. If you need to exercise repeated full start/stop cycles, you'll have to create fresh clients per cycle rather than looping on the same instance.
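If repeated full cycles are ever needed, the loop would look roughly like this (the constructor arguments are assumptions):

for i := 0; i < cycles; i++ {
    client := chainsync.NewClient(nil, protoOptions, cfg) // hypothetical args
    client.Start()
    if err := client.Stop(); err != nil {
        t.Fatalf("cycle %d: Stop failed: %v", i, err)
    }
}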

protocol/peersharing/client.go (2)

159-173: Shutdown-aware send in handleSharePeers is correct

The switch to a select on c.DoneChan() vs c.sharePeersChan in handleSharePeers cleanly avoids blocking indefinitely during shutdown and returns ErrProtocolShuttingDown in a predictable way. This aligns with the patterns used in other mini-protocol clients.


24-35: Code is correct and already well-commented; no changes needed

The lifecycle pattern with started and stopped flags is correct and necessary—not redundant. Both flags determine channel closure behavior: if started is true, the channel close is deferred until DoneChan; otherwise it closes immediately. This prevents blocking if Stop is called before Start.

The code already contains clear explanatory comments (lines 110 and 117) that document this conditional logic. Setting started = false at the end of Stop is unnecessary and would be misleading—onceStop prevents Stop from executing again, so started is never read after the first invocation. The flag's semantic meaning is "was the protocol ever started," not "is it currently started."

The implementation is consistent with other clients like leiosfetch and correctly handles the Start-after-Stop prevention pattern.

protocol/leiosfetch/client.go (1)

240-283: Shutdown-aware handlers for leiosfetch look solid

The updated handlers (handleBlock, handleBlockTxs, handleVotes, handleNextBlockAndTxsInRange, handleLastBlockAndTxsInRange) now send on their result channels using select with c.DoneChan(). This ensures graceful behavior during shutdown (returning ErrProtocolShuttingDown instead of blocking or panicking) and aligns with the patterns adopted in blockfetch/chainsync.

protocol/localtxsubmission/client.go (2)

166-205: Handlers correctly gate result delivery on shutdown

Both handleAcceptTx and handleRejectTx now use a select on c.DoneChan() vs c.submitResultChan, returning ErrProtocolShuttingDown when appropriate. Combined with closing submitResultChan after DoneChan fires, this avoids sends to closed channels and ensures SubmitTx callers are reliably unblocked on shutdown.


34-36: Start/Stop lifecycle and channel cleanup are now robust

The introduction of stateMutex + started around Start/Stop, along with always calling c.Protocol.Stop() and deferring submitResultChan closure until DoneChan when started, resolves prior shutdown/TOCTOU issues while keeping Stop idempotent. Lock ordering (stateMutex → busyMutex in Stop only; SubmitTx and handlers take no conflicting locks) eliminates deadlock risk. Handler implementations safely use select over DoneChan first, preventing sends on closed channels.

protocol/blockfetch/client.go (1)

228-243: Handler changes correctly respect shutdown state

The updated handleStartBatch, handleNoBlocks, and non-callback path in handleBlock now send into startBatchResultChan/blockChan via a select that races against c.DoneChan(), returning ErrProtocolShuttingDown if the protocol is shutting down. Combined with deferring channel closure to after DoneChan fires, this protects against panics on send and makes shutdown behavior well-defined for in-flight GetBlock/GetBlockRange operations.

Also applies to: 245-261, 263-316

protocol/localtxmonitor/client.go (1)

24-41: Lifecycle and shutdown mechanics now look robust and race‑free.

The added stateMutex/started/stopped state, the Stop() logic that defers channel closing until <-c.DoneChan() (or closes immediately if never started), and the Done‑aware selects in the handlers collectively fix the prior send‑after‑close risk and give callers consistent ErrProtocolShuttingDown semantics on shutdown. This is aligned with the patterns used in the other protocol clients and looks solid.

Also applies to: 88-102, 105-142, 309-377

protocol/peersharing/client_test.go (1)

30-93: PeerSharing shutdown test harness looks correct and leak‑safe.

runTest’s use of the mock connection, async error channels, goleak.VerifyNone, and the explicit Close + timeout waits gives good coverage of the PeerSharing client Start/Stop lifecycle. TestClientShutdown exercises exactly the public surface you care about, and the structure is consistent with the other protocol tests.

Also applies to: 95-114

protocol/chainsync/client.go (1)

37-38: readyForNextBlockChan shutdown is now correctly synchronized with protocol Done.

The combination of:

  • the started flag and Stop()’s go func(){ <-c.DoneChan(); close(c.readyForNextBlockChan) },
  • the ready, ok := <-c.readyForNextBlockChan handling in GetAvailableBlockRange and syncLoop, and
  • the DoneChan‑aware selects before every send in handleRollForward/handleRollBackward

eliminates the earlier send‑on‑closed‑channel race and gives clear, predictable shutdown/cancellation semantics (ErrProtocolShuttingDown vs ErrSyncCancelled). This is a solid fix.

Also applies to: 119-169, 330-359, 458-487, 742-760, 782-800

protocol/localstatequery/client.go (1)

37-47: Lifecycle and acquire/query synchronization improvements look correct and consistent.

The added acquiredMutex, onceStop, stateMutex, and started flag, together with:

  • Start() marking started under stateMutex,
  • Stop() sending MsgDone, invoking c.Protocol.Stop(), and deferring channel closes until <-c.DoneChan() when started, and
  • the DoneChan‑aware selects in handleAcquired, handleFailure, handleResult, plus the closed‑channel checks in acquire()/runQuery()

give the LocalStateQuery client a clean, race‑free shutdown story and predictable ErrProtocolShuttingDown behavior. This now matches the patterns used in the other protocol clients and addresses the earlier lifecycle races.

Also applies to: 102-115, 117-149, 905-968, 970-1006, 1008-1041

protocol/txsubmission/server.go (1)

17-41: Stop/restart lifecycle and shutdown signaling are now well‑synchronized.

The txsubmission server’s new design looks solid:

  • ackCount is now int32 and fully managed via atomic.LoadInt32/StoreInt32, including a reset on restart.
  • done is guarded by doneMutex in both Stop() and handleDone(), with an atomic check‑and‑close, so there’s no longer a risk of double‑closing or racing with readers.
  • Stop() is idempotent via onceStop, marks a permanent stopped flag under restartMutex, closes done, and calls Protocol.Stop().
  • RequestTxIds/RequestTxs use a select on their result channels vs doneChan(), correctly handling closed channels and propagating ErrProtocolShuttingDown.
  • handleDone() non‑blockingly wakes any in‑flight RequestTxIds, runs the user callback, then performs a serialized restart under restartMutex (including reinitializing channels, done, and ackCount), with a TOCTOU IsStopped() check to avoid restarting after a permanent Stop.

This addresses the earlier concurrency issues around ackCount, done, and restart, and the behavior now looks race‑free and predictable.

Also applies to: 49-57, 93-130, 132-193, 195-225, 276-338
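The guarded check-and-close on done, in sketch form:

s.doneMutex.Lock()
select {
case <-s.done:
    // already closed by the other path
default:
    close(s.done)
}
s.doneMutex.Unlock()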

@wolf31o2 wolf31o2 force-pushed the perf/protocols-improvements branch from eff3a32 to 5ddaf9a Compare November 16, 2025 20:28
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
protocol/localtxsubmission/client.go (1)

92-125: Double‑close risk on submitResultChan between Stop() and handleDone()

Stop() (when started is true) schedules:

go func() {
    <-c.DoneChan()
    close(c.submitResultChan)
}()

while handleDone() also calls close(c.submitResultChan) directly. If a server‑initiated Done message arrives during or after a client‑initiated Stop, both paths can execute, and the second close will panic.

Writers are already DoneChan‑aware, so the fix is to make channel closure idempotent rather than done in multiple unguarded places.

A minimal, contained fix is to guard the close with sync.Once:

 type Client struct {
 	*protocol.Protocol
 	config           *Config
 	callbackContext  CallbackContext
 	busyMutex        sync.Mutex
 	submitResultChan chan error
 	onceStart        sync.Once
 	onceStop         sync.Once
 	stateMutex       sync.Mutex
 	started          bool
+	closeSubmitResultOnce sync.Once
 }
@@
 func (c *Client) Stop() error {
@@
-		// Defer closing channel until protocol fully shuts down (only if started)
-		if c.started {
-			go func() {
-				<-c.DoneChan()
-				close(c.submitResultChan)
-			}()
-		} else {
-			// If protocol was never started, close channel immediately
-			close(c.submitResultChan)
-		}
+		// Defer closing channel until protocol fully shuts down (only if started)
+		if c.started {
+			go func() {
+				<-c.DoneChan()
+				c.closeSubmitResultChan()
+			}()
+		} else {
+			// If protocol was never started, close channel immediately
+			c.closeSubmitResultChan()
+		}
@@
 func (c *Client) handleDone() error {
@@
-	// Server is shutting down, close the result channel to unblock any waiting operations
-	close(c.submitResultChan)
+	// Server is shutting down, close the result channel to unblock any waiting operations
+	c.closeSubmitResultChan()
 	return nil
 }
+
+func (c *Client) closeSubmitResultChan() {
+	c.closeSubmitResultOnce.Do(func() {
+		close(c.submitResultChan)
+	})
+}

This keeps existing shutdown semantics but removes the panic risk.

🧹 Nitpick comments (5)
protocol/peersharing/client_test.go (1)

74-82: Consider using consistent timeout values across protocol tests.

This test uses a 5-second timeout for mock connection shutdown, while protocol/localstatequery/client_test.go (line 112) uses a 2-second timeout for the same operation. Consider aligning these values for consistency.

protocol/chainsync/chainsync.go (1)

226-227: Document the rationale for the 50% increase in default buffer sizes.

The defaults have been increased from 50 to 75 (a 50% increase), which could have memory implications for deployments with many concurrent connections. Consider:

  1. Adding a comment explaining the performance benefit and why 75 was chosen specifically
  2. Providing guidance on when users might want to adjust these values

Were these values validated through performance testing? If so, it would be valuable to document the findings (e.g., "testing showed X% throughput improvement with Y% memory increase").

protocol/protocol.go (1)

176-189: Add documentation comment explaining why Protocol.Stop() returns error despite always returning nil.

Verification found no interface requiring Stop() error signature, invalidating the interface compatibility hypothesis. However, call site comments (e.g., "Error ignored - method returns nil by design" in txsubmission/server.go:113) confirm this is an intentional design choice. To prevent API confusion, add a comment to the Protocol.Stop() method explaining whether this is a placeholder for future error conditions or part of a consistent protocol API pattern. The method's current comment lacks this context.

protocol/keepalive/client.go (1)

33-35: Consider propagating Protocol.Stop() errors alongside SendMessage errors

Stop() is now the primary shutdown API and other protocols propagate Protocol.Stop() errors, but here they’re discarded in favor of only the SendMessage error. This can hide teardown failures in the underlying protocol while callers see nil.

You can still keep idempotent behavior and prefer the SendMessage error by combining both results, e.g.:

 func (c *Client) Stop() error {
 	c.onceStop.Do(func() {
@@
-		msg := NewMsgDone()
-		c.stopErr = c.SendMessage(msg)
-		// Ensure protocol shuts down completely
-		_ = c.Protocol.Stop() // Error ignored - method returns SendMessage error
+		msg := NewMsgDone()
+		sendErr := c.SendMessage(msg)
+		stopErr := c.Protocol.Stop()
+
+		// Prefer the send error if present, otherwise return the stop error
+		if sendErr != nil {
+			c.stopErr = sendErr
+		} else {
+			c.stopErr = stopErr
+		}
 	})
 	return c.stopErr
 }

This preserves the current contract while exposing protocol-level shutdown problems to callers.

Also applies to: 98-120, 122-142

protocol/peersharing/client.go (1)

75-123: Start/Stop sequencing is safe; consider surfacing protocol Stop errors

The Start/Stop implementation correctly:

  • Serializes lifecycle with sync.Once and stateMutex
  • Prevents Start after Stop
  • Defers sharePeersChan closure until DoneChan() when started, avoiding send‑after‑close.

Stop() currently ignores any error from c.Protocol.Stop(). If protocol shutdown can fail in meaningful ways, consider returning that error when SendMessage isn’t involved, to keep behavior consistent with other clients and improve observability.
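One way to do that while preserving idempotency (a sketch; the stopErr field is an assumption mirroring the keepalive client):

c.onceStop.Do(func() {
    // ... existing shutdown sequencing ...
    if err := c.Protocol.Stop(); err != nil && c.stopErr == nil {
        c.stopErr = err // surface protocol-level stop failures
    }
})
return c.stopErr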

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between eff3a32 and 5ddaf9a.

📒 Files selected for processing (32)
  • protocol/blockfetch/blockfetch.go (1 hunks)
  • protocol/blockfetch/client.go (7 hunks)
  • protocol/blockfetch/client_test.go (1 hunks)
  • protocol/blockfetch/server.go (1 hunks)
  • protocol/chainsync/chainsync.go (1 hunks)
  • protocol/chainsync/client.go (6 hunks)
  • protocol/chainsync/client_concurrency_test.go (1 hunks)
  • protocol/chainsync/client_test.go (2 hunks)
  • protocol/chainsync/error.go (1 hunks)
  • protocol/chainsync/server.go (1 hunks)
  • protocol/keepalive/client.go (2 hunks)
  • protocol/keepalive/client_test.go (1 hunks)
  • protocol/leiosfetch/client.go (5 hunks)
  • protocol/leiosfetch/server.go (1 hunks)
  • protocol/leiosnotify/client.go (4 hunks)
  • protocol/leiosnotify/client_test.go (1 hunks)
  • protocol/leiosnotify/server.go (1 hunks)
  • protocol/localstatequery/client.go (6 hunks)
  • protocol/localstatequery/client_test.go (2 hunks)
  • protocol/localtxmonitor/client.go (15 hunks)
  • protocol/localtxmonitor/client_test.go (1 hunks)
  • protocol/localtxsubmission/client.go (6 hunks)
  • protocol/localtxsubmission/client_test.go (1 hunks)
  • protocol/localtxsubmission/localtxsubmission.go (2 hunks)
  • protocol/peersharing/client.go (4 hunks)
  • protocol/peersharing/client_test.go (1 hunks)
  • protocol/peersharing/server.go (5 hunks)
  • protocol/peersharing/server_test.go (1 hunks)
  • protocol/protocol.go (4 hunks)
  • protocol/txsubmission/server.go (8 hunks)
  • protocol/txsubmission/server_concurrency_test.go (1 hunks)
  • protocol/txsubmission/server_test.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (10)
  • protocol/keepalive/client_test.go
  • protocol/blockfetch/client_test.go
  • protocol/chainsync/client_test.go
  • protocol/blockfetch/blockfetch.go
  • protocol/txsubmission/server_test.go
  • protocol/localtxmonitor/client_test.go
  • protocol/localtxsubmission/client_test.go
  • protocol/leiosfetch/server.go
  • protocol/txsubmission/server_concurrency_test.go
  • protocol/peersharing/server_test.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-04T15:54:22.683Z
Learnt from: wolf31o2
Repo: blinklabs-io/gouroboros PR: 1093
File: ledger/mary/pparams.go:143-150
Timestamp: 2025-11-04T15:54:22.683Z
Learning: In the blinklabs-io/gouroboros repository, the design goal for CBOR round-trip tests is to achieve byte-identical encoding WITHOUT using stored CBOR (cbor.DecodeStoreCbor). Instead, the approach uses proper field types (pointers for optional fields) and relies on the cbor package's deterministic encoding (SortCoreDeterministic) to ensure reproducible output. The stored CBOR pattern should not be suggested as a solution for round-trip fidelity in this codebase.

Applied to files:

  • protocol/localstatequery/client_test.go
🧬 Code graph analysis (16)
protocol/keepalive/client.go (4)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/keepalive/keepalive.go (1)
  • ProtocolName (27-27)
connection/id.go (1)
  • ConnectionId (22-25)
protocol/keepalive/messages.go (1)
  • NewMsgDone (94-101)
protocol/peersharing/client.go (5)
protocol/localtxsubmission/client.go (1)
  • Client (26-36)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/peersharing/peersharing.go (1)
  • ProtocolName (27-27)
connection/id.go (1)
  • ConnectionId (22-25)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/leiosfetch/client.go (4)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/blockfetch/client.go (1)
  • Client (30-41)
protocol/chainsync/client.go (1)
  • Client (29-46)
protocol/leiosnotify/client_test.go (7)
protocol/versiondata.go (6)
  • VersionData (40-46)
  • VersionDataNtN13andUp (143-145)
  • VersionDataNtN11to12 (116-122)
  • DiffusionModeInitiatorOnly (21-21)
  • PeerSharingModeNoPeerSharing (27-27)
  • QueryModeDisabled (36-36)
protocol/handshake/messages.go (1)
  • NewMsgAcceptVersion (88-102)
connection.go (2)
  • Connection (59-103)
  • NewConnection (107-130)
connection_options.go (3)
  • WithConnection (36-40)
  • WithNetworkMagic (50-54)
  • WithNodeToNode (78-82)
protocol/blockfetch/client_test.go (1)
  • TestClientShutdown (211-230)
protocol/blockfetch/client.go (1)
  • Client (30-41)
protocol/leiosnotify/client.go (1)
  • Client (24-33)
protocol/localtxsubmission/client.go (7)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/leiosnotify/client.go (1)
  • Client (24-33)
protocol/localstatequery/client.go (1)
  • Client (30-47)
protocol/localtxmonitor/client.go (1)
  • Client (25-42)
protocol/peersharing/client.go (1)
  • Client (25-35)
protocol/localtxsubmission/messages.go (1)
  • MessageTypeDone (29-29)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/blockfetch/client.go (2)
protocol/protocol.go (2)
  • Protocol (39-60)
  • New (122-133)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/chainsync/client.go (4)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/chainsync/chainsync.go (1)
  • ProtocolName (30-30)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/chainsync/error.go (1)
  • ErrSyncCancelled (26-26)
protocol/localstatequery/client.go (5)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/localtxmonitor/client.go (1)
  • Client (25-42)
protocol/localstatequery/localstatequery.go (1)
  • AcquireTarget (131-133)
protocol/localstatequery/messages.go (4)
  • NewMsgDone (245-252)
  • AcquireFailurePointNotOnChain (44-44)
  • MsgResult (172-175)
  • NewMsgQuery (160-170)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/localtxmonitor/client.go (4)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/localstatequery/client.go (1)
  • Client (30-47)
protocol/localtxmonitor/messages.go (1)
  • NewMsgDone (79-86)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/chainsync/client_concurrency_test.go (3)
protocol/chainsync/chainsync.go (3)
  • New (259-267)
  • NewConfig (273-295)
  • ChainSync (204-207)
connection_options.go (1)
  • WithChainSyncConfig (131-135)
protocol/chainsync/client.go (1)
  • Client (29-46)
protocol/leiosnotify/client.go (5)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/localstatequery/client.go (1)
  • Client (30-47)
protocol/localtxsubmission/client.go (1)
  • Client (26-36)
protocol/peersharing/client.go (1)
  • Client (25-35)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/peersharing/client_test.go (5)
connection.go (2)
  • Connection (59-103)
  • NewConnection (107-130)
protocol/protocol.go (2)
  • ProtocolRoleClient (94-94)
  • New (122-133)
connection_options.go (3)
  • WithConnection (36-40)
  • WithNetworkMagic (50-54)
  • WithNodeToNode (78-82)
protocol/peersharing/peersharing.go (1)
  • PeerSharing (67-70)
protocol/peersharing/client.go (1)
  • Client (25-35)
protocol/localtxsubmission/localtxsubmission.go (1)
protocol/localtxsubmission/messages.go (1)
  • MessageTypeDone (29-29)
protocol/txsubmission/server.go (5)
protocol/txsubmission/messages.go (2)
  • TxBody (197-201)
  • NewMsgRequestTxIds (68-82)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/txsubmission/txsubmission.go (1)
  • MaxAckCount (143-143)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/txsubmission/error.go (1)
  • ErrStopServerProcess (21-21)
protocol/localstatequery/client_test.go (3)
protocol/blockfetch/client_test.go (1)
  • TestClientShutdown (211-230)
connection.go (1)
  • Connection (59-103)
protocol/localstatequery/client.go (1)
  • Client (30-47)
protocol/peersharing/server.go (2)
protocol/txsubmission/server.go (1)
  • Server (28-41)
protocol/protocol.go (1)
  • Protocol (39-60)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Analyze (go)
🔇 Additional comments (47)
protocol/localtxsubmission/localtxsubmission.go (2)

46-49: LGTM: Client-initiated Done from Idle state.

This transition allows the client to gracefully terminate the protocol session when idle, which is semantically correct and expected behavior.


63-66: Issue resolved—client properly handles MessageTypeDone.

Verification confirms the client's messageHandler now includes a case for MessageTypeDone (line 156) that calls handleDone() (line 157). The handler closes the submitResultChan to unblock waiting operations, properly addressing the protocol violation flagged in the previous review. The server can now send Done during the Busy state without leaving the transaction status undefined.

protocol/chainsync/error.go (1)

24-26: LGTM!

The new error variable follows the existing pattern and provides a clear, explicit signal for sync cancellation.

protocol/peersharing/client_test.go (1)

95-114: LGTM!

The shutdown test properly exercises the Start/Stop lifecycle and follows the established pattern used in other protocol tests.

protocol/chainsync/server.go (1)

245-247: LGTM!

The error handling for Stop() prevents the protocol from attempting reinitialization if shutdown fails, which is the correct behavior.

protocol/blockfetch/server.go (1)

179-181: LGTM!

Consistent error handling pattern that prevents protocol reinitialization if Stop() fails.

protocol/leiosnotify/server.go (1)

118-120: LGTM!

Consistent with the error handling pattern applied across other protocol servers.

protocol/localstatequery/client_test.go (2)

25-25: LGTM!

The import alias is needed to support the new shutdown test.


357-376: LGTM!

The shutdown test follows the established pattern and properly exercises the Start/Stop lifecycle of the LocalStateQuery client.

protocol/protocol.go (2)

247-248: LGTM!

Intentionally ignoring the Stop() error in an error path is appropriate, as the protocol is already handling a queue overflow violation.


453-454: LGTM!

Intentionally ignoring the Stop() error in an error path is appropriate, as the protocol is already handling a queue overflow violation.

protocol/leiosnotify/client_test.go (1)

30-115: LeiosNotify client shutdown test helper and e2e Start/Stop look solid

The runTest helper, handshake scaffolding, and TestClientShutdown follow the same pattern as other mini‑protocol tests (mock connection, ouroboros.New, goleak.VerifyNone, bounded timeouts). This gives good confidence that LeiosNotify's Start/Stop path tears down cleanly without goroutine leaks.

Once the Stop‑before‑Start semantics for the client are finalized, you might consider a small additional test mirroring chainsync.TestStopBeforeStart, but the current test is a good baseline.

Also applies to: 117-136

protocol/chainsync/client_concurrency_test.go (1)

29-103: Good concurrency coverage for ChainSync client Start/Stop semantics

TestConcurrentStartStop and TestStopBeforeStart effectively stress the ChainSync client’s lifecycle: many interleaved Start/Stop calls plus the Stop‑before‑Start scenario, all under goleak and timeouts. This should catch deadlocks and shutdown regressions around the new lifecycle logic.

No issues spotted in the test structure.

Also applies to: 105-148

protocol/peersharing/client.go (2)

24-35: Lifecycle state fields are well-scoped and guarded

onceStart/onceStop plus stateMutex, started, and stopped give clear, race‑free lifecycle control and correctly prevent starting after a Stop. This matches patterns in other clients and looks solid.


159-173: DoneChan‑aware send to sharePeersChan avoids send‑on‑closed panics

Wrapping the send in:

select {
case <-c.DoneChan():
    return protocol.ErrProtocolShuttingDown
case c.sharePeersChan <- msgSharePeers.PeerAddresses:
}

is exactly what’s needed with the new deferred close in Stop(). This ensures handlers bail out cleanly once shutdown begins instead of risking a panic on a closed channel.

protocol/blockfetch/client.go (4)

29-41: Atomic started flag correctly hardens lifecycle against races

Switching started to atomic.Bool and using it only to decide shutdown behavior is appropriate here. With onceStart/onceStop this removes the Start/Stop race on the flag itself without complicating the API.


89-132: Stop() shutdown flow is correctly ordered and channel‑safe

Stop() now:

  • Logs, sends MsgClientDone, and always calls c.Protocol.Stop() to drive muxer shutdown
  • Defers closing blockChan/startBatchResultChan until <-c.DoneChan() when started, otherwise closes immediately.

This sequencing avoids deadlocks and send‑after‑close panics while still unblocking waiters with ErrProtocolShuttingDown.


228-261: Start batch / no‑blocks handlers are now shutdown‑aware

Using:

select {
case <-c.DoneChan():
    return protocol.ErrProtocolShuttingDown
case c.startBatchResultChan <- <nil or err>:
}

ensures outstanding GetBlock/GetBlockRange calls fail cleanly when shutting down instead of panicking on a closed result channel.


263-315: Block handler’s DoneChan checks correctly guard the data path

The two‑stage DoneChan check around decoding and the final select on c.blockChan protect both work and sends from racing with shutdown. This aligns nicely with the deferred channel closure in Stop() and should prevent the in‑flight response races you previously hit.

protocol/localtxsubmission/client.go (2)

25-36: Lifecycle mutex + started flag are fine, but channel closure needs consolidation

The addition of stateMutex and started to guard Start/Stop state is reasonable and race‑free given you only read/write started under that mutex. The main concern is how submitResultChan is now closed (see below).


149-207: DoneChan‑aware result signaling looks good once closure is deduplicated

The updated handleAcceptTx and handleRejectTx use:

select {
case <-c.DoneChan():
    return protocol.ErrProtocolShuttingDown
case c.submitResultChan <- <err or nil>:
}

which is exactly what you want with deferred channel closing: writers bail out cleanly on shutdown instead of racing a close. Once the closure is routed through a single sync.Once helper, this path should be robust.

protocol/leiosfetch/client.go (3)

26-38: Atomic lifecycle flags give clear, race‑free Start/Stop semantics

Introducing started and stopped as atomic.Bool and using them to prevent Start after Stop is appropriate here. This makes lifecycle intent explicit and safe under concurrent Start/Stop calls.


92-145: Stop() shutdown path is well‑sequenced with deferred channel closure

Stop() now:

  • Logs, sends MsgDone, preserves any SendMessage error, and always attempts c.Protocol.Stop()
  • Closes all result channels only after DoneChan() when started is true, otherwise immediately
  • Marks started=false and stopped=true atomically.

This is consistent with other protocols in the PR and should avoid send‑after‑close panics while keeping the API simple.


242-285: DoneChan‑aware sends on result channels match the new Stop semantics

All handlers (handleBlock, handleBlockTxs, handleVotes, handleNextBlockAndTxsInRange, handleLastBlockAndTxsInRange) now gate sends with:

select {
case <-c.DoneChan():
    return protocol.ErrProtocolShuttingDown
case <chan> <- msg:
}

Combined with deferred channel closure in Stop(), this gives a clean shutdown story with no send‑after‑close risk.

protocol/chainsync/client.go (4)

28-38: Started flag is a simple, effective lifecycle indicator

Adding started atomic.Bool and setting it in Start() gives Stop() a reliable way to distinguish “never started” from “active” when deciding how to close readyForNextBlockChan. This is a lightweight, race‑free solution.


132-169: Stop() now shuts down the protocol and channel safely

Stop():

  • Holds busyMutex while sending MsgDone, preserving existing mutual exclusion
  • Calls c.Protocol.Stop() and logs (but doesn’t bubble) any error
  • Defers closing readyForNextBlockChan until <-c.DoneChan() when started is true, otherwise closes immediately.

This removes the previous write‑after‑close race on readyForNextBlockChan while still unblocking consumers cleanly.


330-364: GetAvailableBlockRange now handles shutdown and cancellation explicitly

Handling:

case <-c.DoneChan():
    return start, end, protocol.ErrProtocolShuttingDown
case ready, ok := <-c.readyForNextBlockChan:
    if !ok { return start, end, protocol.ErrProtocolShuttingDown }
    if !ready { return start, end, ErrSyncCancelled }
    // else send another RequestNext

means callers won’t hang if the sync is cancelled or the protocol shuts down while they’re waiting on the range; they get a clear, typed error instead.


742-800: DoneChan‑aware signaling on readyForNextBlockChan fixes the prior race

Both handleRollForward and handleRollBackward now signal readiness/cancellation via:

select {
case <-c.DoneChan():
    return protocol.ErrProtocolShuttingDown
case c.readyForNextBlockChan <- <true/false>:
}

With readyForNextBlockChan only closed after DoneChan() in Stop(), this removes the send‑after‑close panic previously called out while still giving syncLoop and range callers precise readiness/cancellation signals.

protocol/localtxmonitor/client.go (9)

30-42: LGTM: Clean lifecycle and acquired-state synchronization.

The addition of acquiredMutex and stateMutex provides proper synchronization for the acquired state and lifecycle flags (started, stopped), respectively. This separation of concerns is clean and consistent with other protocol clients.


89-103: LGTM: Start() properly synchronizes lifecycle state.

The stateMutex correctly guards the write to started, ensuring no data race with readers in Stop() and the channel cleanup logic.


106-148: LGTM: Stop() correctly implements shutdown with deferred cleanup.

The implementation properly:

  • Sets stopped under stateMutex for synchronized access
  • Calls Protocol.Stop() after releasing busyMutex to avoid potential deadlocks
  • Defers channel closure until DoneChan() closes, preventing handler goroutines from writing to closed channels

This addresses the critical issues raised in past reviews.


151-178: LGTM: Shutdown-aware public methods.

Acquire() and Release() correctly check the stopped flag under stateMutex and return protocol.ErrProtocolShuttingDown when shutting down, preventing operations after shutdown has begun.


193-227: LGTM: HasTx() properly guards state access.

The function checks stopped under stateMutex and reads acquired under acquiredMutex, ensuring synchronized access to both lifecycle and acquired state.


230-264: LGTM: NextTx() properly guards state access.

Synchronized checks for stopped and acquired flags prevent races and ensure clean shutdown behavior.


267-301: LGTM: GetSizes() properly guards state access.

Synchronized checks for stopped and acquired flags are consistent with other public methods.


324-394: LGTM: Message handlers properly guard channel sends.

All handlers use non-blocking select with DoneChan() to avoid writing to result channels during shutdown. This pattern prevents the race condition where handlers write to channels after Stop() closes them.


409-418: LGTM: release() properly synchronizes acquired state.

The acquiredMutex correctly guards the write to acquired, ensuring no race with readers in HasTx(), NextTx(), and GetSizes().

protocol/localstatequery/client.go (5)

38-47: LGTM: Lifecycle synchronization fields added.

The addition of acquiredMutex, stateMutex, and started flag provides proper synchronization for state access, consistent with other protocol clients.


102-115: LGTM: Start() properly synchronizes lifecycle state.

The stateMutex correctly guards the write to started, resolving the data race identified in past reviews where Start() and Stop() accessed started without synchronization.


117-149: LGTM: Stop() correctly implements shutdown with Protocol.Stop().

The implementation properly:

  • Calls Protocol.Stop() to drive protocol shutdown (addressing past review feedback)
  • Reads started under stateMutex for race-free access
  • Defers channel closure until DoneChan() closes if the client was started

This resolves the issues raised in past reviews.


905-968: LGTM: Message handlers properly guard channel sends.

All handlers (handleAcquired, handleFailure, handleResult) use non-blocking select with DoneChan() to avoid writing to result channels during shutdown, preventing panics on closed channels.


970-1041: LGTM: Helper functions properly synchronize acquired state.

All helpers (acquire, release, runQuery) correctly use acquiredMutex to guard reads and writes to the acquired flag, ensuring race-free access.

protocol/txsubmission/server.go (5)

33-64: LGTM: Proper initialization for restart-capable server.

The changes provide solid foundations for safe restart:

  • ackCount as int32 enables atomic operations
  • Buffered result channels prevent blocking on non-blocking sends
  • done, doneMutex, restartMutex, and stopped support coordinated shutdown and restart

93-130: LGTM: Stop() properly coordinates with restart logic.

The implementation correctly:

  • Uses restartMutex to coordinate with handleDone() restart path
  • Guards the select-and-close of done with doneMutex to prevent double-close panics
  • Provides doneChan() accessor for safe concurrent reads
  • Offers IsStopped() to check permanent stop state

This addresses all critical race conditions flagged in past reviews.


132-193: LGTM: RequestTxIds() safely uses atomic operations for ackCount.

The implementation properly:

  • Uses atomic.LoadInt32 and atomic.StoreInt32 for race-free access to ackCount
  • Validates before type conversion to uint16
  • Uses select with doneChan() for shutdown-aware blocking

This resolves the data race on ackCount flagged in past reviews.


195-225: LGTM: RequestTxs() properly handles shutdown.

The select with doneChan() and channel-closed checks provide clean shutdown semantics, consistent with RequestTxIds().


276-338: LGTM: handleDone() restart logic is properly synchronized.

The restart implementation correctly:

  • Uses restartMutex to coordinate with Stop()
  • Guards the select-and-close of done with doneMutex (matching Stop()'s pattern)
  • Does not reset onceStop, preserving idempotent stop semantics
  • Atomically resets ackCount
  • Includes TOCTOU protection via IsStopped() check
  • Starts the new protocol outside the lock for better responsiveness

This resolves all critical race conditions identified in past reviews.

@wolf31o2 wolf31o2 force-pushed the perf/protocols-improvements branch from 5ddaf9a to eb18d3d Compare November 16, 2025 21:24
@wolf31o2
Member Author

@coderabbitai review

@coderabbitai
Contributor

coderabbitai bot commented Nov 16, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
protocol/txsubmission/server.go (1)

27-41: Data race confirmed: concurrent channel field accesses without synchronization

Analysis confirms the critical issue. The code exhibits a clear data race:

  • Unsynchronized readers: RequestTxIds() (line ~175) and RequestTxs() (line 220) read requestTxIdsResultChan and requestTxsResultChan without holding restartMutex
  • Unsynchronized senders: handleReplyTxIds() (line 257), handleReplyTxs() (line 272), and handleDone() (line 286) write to these channel fields without holding restartMutex
  • Synchronized reassigner: handleDone() (lines 319-320) reassigns both channel fields while holding restartMutex

This violates Go's memory model: concurrent accesses to the same struct fields without synchronization constitute a data race. The pattern can cause panics (nil channel, closed channel) or undefined behavior.

The proposed fix is sound: guard all accesses with restartMutex and snapshot the channel before any blocking operation, ensuring no concurrent field read/write during reassignment.
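The snapshot idea, in sketch form:

// Take a stable reference under restartMutex, then block on the copy;
// handleDone() can reassign the struct field without racing this read.
s.restartMutex.Lock()
resultChan := s.requestTxsResultChan
s.restartMutex.Unlock()

select {
case <-s.doneChan():
    return nil, protocol.ErrProtocolShuttingDown
case r, ok := <-resultChan:
    if !ok {
        return nil, protocol.ErrProtocolShuttingDown
    }
    return r, nil
}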

🧹 Nitpick comments (2)
protocol/localtxsubmission/localtxsubmission.go (1)

63-66: Test coverage for Done during active transaction is missing; add test case.

The code implementation is sound—the Done transition from Busy state is properly defined, and the client's handleDone() handler (client.go:210-220) correctly closes the result channel when the server initiates shutdown.

However, current tests do not verify this scenario:

  • TestClientShutdown only tests client lifecycle without an active transaction
  • TestSubmitTxAccept and TestSubmitTxReject test the normal response paths with AcceptTx and RejectTx, not Done
  • No test mock sends MessageTypeDone while the client is in Busy state

Recommended action: Add a test case (e.g., TestSubmitTxServerShutdown) that sends SubmitTx and then receives Done from the server to verify the client correctly handles forced shutdown during transaction processing.
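A hedged sketch of the core of such a test; the connection accessor and SubmitTx arguments are assumptions, and the mock conversation plumbing is omitted:

// After the mock server replies Done while the client is Busy,
// SubmitTx should unblock with the shutdown error rather than hang.
err := oConn.LocalTxSubmission().Client.SubmitTx(txEraId, txBytes)
if !errors.Is(err, protocol.ErrProtocolShuttingDown) {
    t.Fatalf("expected ErrProtocolShuttingDown, got %v", err)
}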

protocol/localstatequery/client.go (1)

905-923: acquiredMutex correctly protects acquired across acquire/release/query

All reads/writes of c.acquired now go through acquiredMutex (handleAcquired, acquire, release, runQuery), eliminating the prior race on this flag while preserving the expected behavior (reacquire vs acquire, lazy acquire in runQuery). You might optionally convert the lock/unlock pairs to defer style for readability, but functionally this looks correct.

Also applies to: 970-1006, 1008-1018, 1020-1041

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5ddaf9a and eb18d3d.

📒 Files selected for processing (33)
  • protocol/blockfetch/blockfetch.go (1 hunks)
  • protocol/blockfetch/client.go (7 hunks)
  • protocol/blockfetch/client_test.go (1 hunks)
  • protocol/blockfetch/server.go (1 hunks)
  • protocol/chainsync/chainsync.go (1 hunks)
  • protocol/chainsync/client.go (6 hunks)
  • protocol/chainsync/client_concurrency_test.go (1 hunks)
  • protocol/chainsync/client_test.go (2 hunks)
  • protocol/chainsync/error.go (1 hunks)
  • protocol/chainsync/server.go (1 hunks)
  • protocol/keepalive/client.go (2 hunks)
  • protocol/keepalive/client_test.go (1 hunks)
  • protocol/leiosfetch/client.go (5 hunks)
  • protocol/leiosfetch/server.go (1 hunks)
  • protocol/leiosnotify/client.go (3 hunks)
  • protocol/leiosnotify/client_concurrency_test.go (1 hunks)
  • protocol/leiosnotify/client_test.go (1 hunks)
  • protocol/leiosnotify/server.go (1 hunks)
  • protocol/localstatequery/client.go (6 hunks)
  • protocol/localstatequery/client_test.go (3 hunks)
  • protocol/localtxmonitor/client.go (15 hunks)
  • protocol/localtxmonitor/client_test.go (2 hunks)
  • protocol/localtxsubmission/client.go (6 hunks)
  • protocol/localtxsubmission/client_test.go (2 hunks)
  • protocol/localtxsubmission/localtxsubmission.go (2 hunks)
  • protocol/peersharing/client.go (4 hunks)
  • protocol/peersharing/client_test.go (1 hunks)
  • protocol/peersharing/server.go (4 hunks)
  • protocol/peersharing/server_test.go (1 hunks)
  • protocol/protocol.go (4 hunks)
  • protocol/txsubmission/server.go (8 hunks)
  • protocol/txsubmission/server_concurrency_test.go (1 hunks)
  • protocol/txsubmission/server_test.go (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • protocol/leiosnotify/client_concurrency_test.go
🚧 Files skipped from review as they are similar to previous changes (12)
  • protocol/chainsync/error.go
  • protocol/leiosfetch/server.go
  • protocol/txsubmission/server_test.go
  • protocol/peersharing/server_test.go
  • protocol/protocol.go
  • protocol/txsubmission/server_concurrency_test.go
  • protocol/peersharing/client_test.go
  • protocol/leiosnotify/server.go
  • protocol/blockfetch/client_test.go
  • protocol/keepalive/client_test.go
  • protocol/chainsync/chainsync.go
  • protocol/leiosnotify/client_test.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-04T15:54:22.683Z
Learnt from: wolf31o2
Repo: blinklabs-io/gouroboros PR: 1093
File: ledger/mary/pparams.go:143-150
Timestamp: 2025-11-04T15:54:22.683Z
Learning: In the blinklabs-io/gouroboros repository, the design goal for CBOR round-trip tests is to achieve byte-identical encoding WITHOUT using stored CBOR (cbor.DecodeStoreCbor). Instead, the approach uses proper field types (pointers for optional fields) and relies on the cbor package's deterministic encoding (SortCoreDeterministic) to ensure reproducible output. The stored CBOR pattern should not be suggested as a solution for round-trip fidelity in this codebase.

Applied to files:

  • protocol/localstatequery/client_test.go
🔇 Additional comments (48)
protocol/localtxsubmission/localtxsubmission.go (1)

46-49: LGTM! Client-initiated Done transition enables graceful shutdown.

Allowing the client to send Done from the Idle state is a clean way to support graceful, client-initiated shutdown when no transaction is pending. This aligns well with the PR's goal of improving lifecycle management across protocols.

protocol/blockfetch/blockfetch.go (1)

122-122: LGTM! Consistent performance tuning across protocols.

The 50% increase in DefaultRecvQueueSize (256→384) aligns with similar adjustments in other protocols (e.g., chainsync 50→75). This improves throughput for block-fetch operations.

protocol/blockfetch/server.go (1)

179-181: LGTM! Error-aware restart prevents continuing on Stop() failure.

The handleClientDone method now properly checks Stop() for errors before reinitializing and restarting the protocol, preventing restart attempts when shutdown fails.

protocol/localtxmonitor/client_test.go (2)

93-93: LGTM! Timeout extension accommodates async shutdown.

Extending the timeout from 2 to 5 seconds provides sufficient margin for the shutdown sequences introduced across protocol clients.


300-319: LGTM! Shutdown test validates lifecycle management.

The new TestClientShutdown properly exercises the Start/Stop lifecycle, ensuring the LocalTxMonitor client can cleanly shut down without errors.

protocol/chainsync/server.go (1)

245-247: LGTM! Consistent error-aware restart pattern.

The handleDone method now mirrors the error propagation pattern in blockfetch/server.go, ensuring Stop() errors are surfaced before attempting restart.

protocol/chainsync/client_test.go (3)

83-88: LGTM! Proper cleanup with error logging.

The test cleanup now stops the ChainSync client and logs any Stop() errors via t.Logf, surfacing shutdown issues without failing tests. This addresses the previously flagged concern.


95-95: LGTM! Consistent timeout adjustment.

The 5-second timeout aligns with similar adjustments across other protocol tests to accommodate extended shutdown sequences.


285-304: LGTM! Comprehensive shutdown test.

TestClientShutdown properly validates the Start/Stop lifecycle for the ChainSync client, ensuring clean shutdown behavior.

protocol/localtxsubmission/client_test.go (2)

86-86: LGTM! Consistent timeout adjustment.

The extended timeout accommodates async shutdown flows across protocol tests.


167-186: LGTM! Standard shutdown test pattern.

TestClientShutdown validates the LocalTxSubmission client lifecycle, consistent with shutdown tests across other protocol implementations.

protocol/localstatequery/client_test.go (3)

25-25: LGTM! Import alias improves clarity.

Adding the ouroboros alias for the main package import improves readability and is consistent with Go conventions for avoiding package name conflicts.


112-112: LGTM! Consistent timeout extension.

The 5-second timeout aligns with other protocol test adjustments for async shutdown handling.


357-376: LGTM! Standard shutdown validation.

TestClientShutdown properly exercises the LocalStateQuery client Start/Stop lifecycle.

protocol/keepalive/client.go (3)

33-34: LGTM! Lifecycle fields support idempotent shutdown.

The onceStop and stopErr fields enable thread-safe, idempotent Stop() behavior, consistent with lifecycle patterns across other protocol clients.


98-126: LGTM! Well-structured Stop() with proper coordination.

The Stop() implementation correctly:

  • Uses onceStop for idempotency
  • Stops and clears the timer under mutex protection before sending MsgDone
  • Prioritizes send errors over stop errors
  • Coordinates with the Start() goroutine's defensive cleanup on DoneChan

The mutex synchronization and timer lifecycle management prevent races between Stop() and timer callbacks.


128-148: LGTM! Timer management properly refactored.

Extracting startTimer() as a separate method improves clarity and ensures consistent timer lifecycle management with proper mutex protection.

protocol/peersharing/server.go (2)

49-55: LGTM! Idempotent shutdown with proper error propagation.

The Stop() method correctly uses sync.Once to ensure single execution and propagates errors from Protocol.Stop().


127-132: LGTM! Correct restart sequence avoiding onceStop limitation.

The restart logic properly stops the current Protocol instance directly via s.Protocol.Stop() before reinitializing, ensuring each protocol incarnation can be stopped independently.

protocol/peersharing/client.go (3)

76-95: LGTM! Safe lifecycle management preventing Start after Stop.

The Start() method correctly checks the stopped flag under mutex protection and returns early if the client has been stopped, preventing invalid state transitions.


98-124: LGTM! Proper channel cleanup synchronized with protocol shutdown.

The conditional channel closure ensures:

  • If started, channels close only after protocol fully shuts down (via DoneChan())
  • If never started, channels close immediately

This prevents send-on-closed-channel panics.


169-173: LGTM! Shutdown-aware message handling.

The non-blocking select with DoneChan() ensures the handler returns gracefully during shutdown instead of attempting to send on a closed channel.

protocol/chainsync/client_concurrency_test.go (2)

30-103: LGTM! Comprehensive concurrency test.

This test validates that concurrent Start() and Stop() operations don't cause deadlocks or races, with appropriate timeout detection and leak verification.


106-148: LGTM! Critical edge case coverage.

Testing Stop() before Start() ensures the lifecycle guards handle out-of-order operations gracefully without panics or deadlocks.

protocol/leiosnotify/client.go (3)

73-91: LGTM! Prevents Start after Stop.

The mutex-protected stopped check ensures Start() is a no-op if Stop() has already been called, preventing invalid state transitions.


93-120: LGTM! Proper shutdown coordination.

The Stop() method correctly:

  • Marks the client as stopped
  • Attempts to stop the protocol
  • Defers channel closure until protocol shutdown completes (if started)

156-190: LGTM! All message handlers are shutdown-aware.

Each handler uses a non-blocking select with DoneChan() to gracefully handle shutdown, preventing panics from sending on closed channels.

protocol/blockfetch/client.go (4)

90-101: LGTM! Clean startup with atomic flag.

Using atomic.Bool for the started flag ensures race-free tracking of the protocol lifecycle.


104-132: LGTM! Proper shutdown sequence with channel cleanup.

The Stop() method correctly:

  • Sends the Done message
  • Calls Protocol.Stop() to unregister from muxer
  • Defers channel closure until protocol fully shuts down (preventing send-on-closed panics)

237-241: LGTM! Shutdown-aware batch start handling.

The select statement ensures graceful shutdown by checking DoneChan() before sending to the result channel.


289-314: LGTM! Comprehensive shutdown handling in block delivery.

The handler checks for shutdown before processing and uses a non-blocking select when sending blocks via channel, ensuring graceful termination.

protocol/leiosfetch/client.go (3)

92-107: LGTM! Race-free lifecycle coordination using atomic operations.

The use of atomic.Bool for both stopped and started flags ensures thread-safe lifecycle management without data races.


109-145: LGTM! Comprehensive shutdown with atomic flag coordination.

The Stop() method properly:

  • Marks the client as stopped atomically
  • Always attempts protocol shutdown
  • Defers result channel closure until protocol completes
  • Updates lifecycle flags atomically

242-285: LGTM! Consistent shutdown-aware message handling across all handlers.

All message handlers use the same pattern: non-blocking select with DoneChan() to gracefully handle shutdown and prevent send-on-closed-channel panics.

protocol/chainsync/client.go (5)

119-130: LGTM! Clean startup with atomic lifecycle tracking.

The started atomic flag is set before starting the protocol, enabling proper coordination in Stop().


133-169: LGTM! Proper shutdown sequence preventing channel races.

The Stop() method correctly:

  • Sends Done while holding busyMutex
  • Calls Protocol.Stop() to ensure muxer unregistration
  • Defers channel closure until protocol fully shuts down (preventing handler panics)

350-358: LGTM! Proper channel closure handling.

Checking the ok return value and distinguishing between channel closure (shutdown) and false readiness (cancellation) provides robust error handling.


743-760: LGTM! Shutdown-aware channel signaling in RollForward.

Both cancellation and readiness signals now use non-blocking selects with DoneChan(), preventing send-on-closed-channel panics during shutdown.


782-800: LGTM! Consistent shutdown-aware signaling in RollBackward.

The handler mirrors the handleRollForward pattern, using non-blocking selects for both cancellation and readiness signals.

protocol/localtxmonitor/client.go (5)

89-103: LGTM! Clean startup with mutex-protected flag.

The started flag is set under stateMutex protection, ensuring coordination with Stop().


106-148: LGTM! Careful lock ordering prevents deadlock.

The Stop() method correctly:

  • Releases locks before calling Protocol.Stop() (preventing deadlock)
  • Defers channel closure until protocol shutdown completes
  • Marks the client as stopped under proper synchronization

152-273: LGTM! Consistent stopped-state checks across all operations.

All public operations check the stopped flag under stateMutex and return ErrProtocolShuttingDown, preventing operations on a stopped client.


210-212: LGTM! Fine-grained locking for acquired state.

The separate acquiredMutex protects the acquired flag independently from operation-level locking, preventing races without holding busyMutex unnecessarily.

Also applies to: 247-249, 284-286, 333-336, 414-416


337-341: LGTM! All message handlers are shutdown-aware.

Each handler uses a non-blocking select with DoneChan() to prevent send-on-closed-channel panics during shutdown.

Also applies to: 354-358, 371-375, 388-392

protocol/localstatequery/client.go (2)

29-47: Lifecycle synchronization and channel shutdown pattern look solid

stateMutex + started make Start/Stop coordination race‑free, and deferring result‑channel closure until <-c.DoneChan() when started avoids send‑on‑closed panics while still handling the “never started” case correctly. This aligns well with the patterns used in other protocol clients.

Also applies to: 102-115, 117-149


886-903: Shutdown‑aware handlers correctly avoid send‑on‑closed races

The updated handleAcquired, handleFailure, and handleResult now all select on c.DoneChan() vs their respective result channels, returning ErrProtocolShuttingDown when the protocol is stopping and only sending while the protocol is still alive. Combined with closing the result channels after DoneChan in Stop(), this removes the TOCTOU send‑on‑closed risk that existed in other clients.

Also applies to: 925-951, 953-968

protocol/localtxsubmission/client.go (1)

25-37: Client lifecycle, shutdown, and result‑channel handling are consistent and safe

The combination of stateMutex/started, onceStart/onceStop, and closeSubmitResultOnce gives you race‑free Start/Stop semantics and idempotent, panic‑free closure of submitResultChan. Using DoneChan in the accept/reject handlers and treating a closed channel as ErrProtocolShuttingDown in SubmitTx matches the shutdown pattern used elsewhere and avoids TOCTOU send‑on‑closed issues.

Also applies to: 77-91, 93-126, 150-227

protocol/txsubmission/server.go (1)

33-41: ackCount, done, and Stop()/restart coordination now look race‑free

Switching ackCount to int32 with atomic.LoadInt32/StoreInt32, introducing done plus doneMutex, and using onceStop + restartMutex + stopped give the server well‑defined permanent‑stop vs restart semantics without the earlier double‑close and TOCTOU issues on done. The RequestTxIds/RequestTxs select on doneChan() correctly surface shutdown as ErrProtocolShuttingDown, and the restart path in handleDone() cleanly resets per‑restart state (channels, done, ackCount) under restartMutex.

Also applies to: 49-57, 93-116, 118-130, 156-171, 178-192, 217-222, 285-289, 299-329
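A sketch of how shutdown surfaces to callers of RequestTxIds; requestTxIdsResultChan is a hypothetical stand-in name for the per-restart result channel:

// Capture the channel under restartMutex, then race the receive
// against shutdown.
s.restartMutex.Lock()
resultChan := s.requestTxIdsResultChan // hypothetical field name
s.restartMutex.Unlock()
select {
case <-s.doneChan():
    return nil, protocol.ErrProtocolShuttingDown
case result, ok := <-resultChan:
    if !ok {
        // Channel closed during restart/stop
        return nil, protocol.ErrProtocolShuttingDown
    }
    return result, nil
}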

@wolf31o2 wolf31o2 force-pushed the perf/protocols-improvements branch from eb18d3d to 2228478 Compare November 16, 2025 23:30
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
protocol/chainsync/client.go (1)

28-46: Stop-before-Start and Start-after-Stop can lead to send-on-closed-channel panics and potential hangs

Verified against actual implementation in protocol/chainsync/client.go. The concerns are confirmed critical:

  1. No stopped flag in Client struct (line 29-46): Unlike leiosnotify and leiosfetch clients, which have a stopped flag and guard Start() with if c.stopped { return }, chainsync.Client lacks this guard.

  2. Stop-before-Start closes readyForNextBlockChan without preventing a later Start() (lines 133-169):

    • Stop() checks c.started.Load() at line 158; if false, immediately closes readyForNextBlockChan at line 165.
    • Start() (lines 119-130) has no guard against being called after Stop(), so the channel is never recreated.
    • When handlers like handleRollBackward later execute their select { case <-c.DoneChan(): ... case c.readyForNextBlockChan <- false: } (lines 784-788), they will panic sending on the closed channel.
  3. Unconditional SendMessage in Stop() (line 145): If Stop() is called before the protocol was ever started (no prior Protocol.Start()), the underlying sendQueueChan is nil, and SendMessage blocks indefinitely. The new TestStopBeforeStart (line 106) does not exercise subsequent operations that would trigger this hang.

Suggested fix (align with other clients):
Introduce a stopped lifecycle flag in the struct and gate both Start() and Stop() to match the pattern in leiosnotify.Client and leiosfetch.Client:

  • In the struct:
 type Client struct {
     *protocol.Protocol
@@
     onceStart                sync.Once
     onceStop                 sync.Once
     started                  atomic.Bool
+    stopped                  atomic.Bool // prevents Start() after Stop()
  • In Start() (gate execution after stop):
 func (c *Client) Start() {
     c.onceStart.Do(func() {
+        if c.stopped.Load() {
+            return
+        }
         c.Protocol.Logger().
             Debug("starting client protocol",
                 "component", "network",
                 "protocol", ProtocolName,
                 "connection_id", c.callbackContext.ConnectionId.String(),
             )
         c.started.Store(true)
         c.Protocol.Start()
     })
 }
  • In Stop() (guard SendMessage when protocol not started):
 func (c *Client) Stop() error {
     var err error
     c.onceStop.Do(func() {
+        c.stopped.Store(true)
         c.Protocol.Logger().
             Debug("stopping client protocol",
                 "component", "network",
                 "protocol", ProtocolName,
                 "connection_id", c.callbackContext.ConnectionId.String(),
             )
         c.busyMutex.Lock()
         defer c.busyMutex.Unlock()
         msg := NewMsgDone()
-        if err = c.SendMessage(msg); err != nil {
-            return
+        if c.started.Load() {
+            if sendErr := c.SendMessage(msg); sendErr != nil {
+                err = sendErr
+                // Still proceed to stopping the protocol
+            }
         }
         if stopErr := c.Protocol.Stop(); stopErr != nil {
             c.Protocol.Logger().
                 Error("error stopping protocol",
                     "component", "network",
                     "protocol", ProtocolName,
                     "connection_id", c.callbackContext.ConnectionId.String(),
                     "error", stopErr,
                 )
         }
         // Defer closing channel until protocol fully shuts down (only if started)
         if c.started.Load() {
             go func() {
                 <-c.DoneChan()
                 close(c.readyForNextBlockChan)
             }()
         } else {
             // If protocol was never started, close channel immediately
             close(c.readyForNextBlockChan)
         }
     })
     return err
 }

Also add a test that calls Stop() before Start(), then Start(), and immediately attempts Sync() or GetAvailableBlockRange() to verify no send-on-closed-channel panic occurs.

Also applies to: 119-130, 132-169, 350-359, 458-487, 615-762, 764-801

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between eb18d3d and 2228478.

📒 Files selected for processing (33)
  • protocol/blockfetch/blockfetch.go (1 hunks)
  • protocol/blockfetch/client.go (7 hunks)
  • protocol/blockfetch/client_test.go (1 hunks)
  • protocol/blockfetch/server.go (1 hunks)
  • protocol/chainsync/chainsync.go (1 hunks)
  • protocol/chainsync/client.go (6 hunks)
  • protocol/chainsync/client_concurrency_test.go (1 hunks)
  • protocol/chainsync/client_test.go (2 hunks)
  • protocol/chainsync/error.go (1 hunks)
  • protocol/chainsync/server.go (1 hunks)
  • protocol/keepalive/client.go (2 hunks)
  • protocol/keepalive/client_test.go (1 hunks)
  • protocol/leiosfetch/client.go (5 hunks)
  • protocol/leiosfetch/server.go (1 hunks)
  • protocol/leiosnotify/client.go (3 hunks)
  • protocol/leiosnotify/client_concurrency_test.go (1 hunks)
  • protocol/leiosnotify/client_test.go (1 hunks)
  • protocol/leiosnotify/server.go (1 hunks)
  • protocol/localstatequery/client.go (6 hunks)
  • protocol/localstatequery/client_test.go (3 hunks)
  • protocol/localtxmonitor/client.go (15 hunks)
  • protocol/localtxmonitor/client_test.go (2 hunks)
  • protocol/localtxsubmission/client.go (6 hunks)
  • protocol/localtxsubmission/client_test.go (2 hunks)
  • protocol/localtxsubmission/localtxsubmission.go (2 hunks)
  • protocol/peersharing/client.go (4 hunks)
  • protocol/peersharing/client_test.go (1 hunks)
  • protocol/peersharing/server.go (4 hunks)
  • protocol/peersharing/server_test.go (1 hunks)
  • protocol/protocol.go (4 hunks)
  • protocol/txsubmission/server.go (10 hunks)
  • protocol/txsubmission/server_concurrency_test.go (1 hunks)
  • protocol/txsubmission/server_test.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (12)
  • protocol/chainsync/error.go
  • protocol/leiosnotify/client_concurrency_test.go
  • protocol/chainsync/chainsync.go
  • protocol/keepalive/client_test.go
  • protocol/blockfetch/blockfetch.go
  • protocol/leiosfetch/server.go
  • protocol/leiosnotify/client_test.go
  • protocol/peersharing/client.go
  • protocol/txsubmission/server_test.go
  • protocol/localtxsubmission/localtxsubmission.go
  • protocol/peersharing/server_test.go
  • protocol/chainsync/server.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-04T15:54:22.683Z
Learnt from: wolf31o2
Repo: blinklabs-io/gouroboros PR: 1093
File: ledger/mary/pparams.go:143-150
Timestamp: 2025-11-04T15:54:22.683Z
Learning: In the blinklabs-io/gouroboros repository, the design goal for CBOR round-trip tests is to achieve byte-identical encoding WITHOUT using stored CBOR (cbor.DecodeStoreCbor). Instead, the approach uses proper field types (pointers for optional fields) and relies on the cbor package's deterministic encoding (SortCoreDeterministic) to ensure reproducible output. The stored CBOR pattern should not be suggested as a solution for round-trip fidelity in this codebase.

Applied to files:

  • protocol/localstatequery/client_test.go
🧬 Code graph analysis (18)
protocol/chainsync/client_test.go (4)
protocol/chainsync/chainsync.go (1)
  • ChainSync (204-207)
protocol/chainsync/client.go (1)
  • Client (29-46)
protocol/blockfetch/client_test.go (1)
  • TestClientShutdown (211-230)
connection.go (1)
  • Connection (59-103)
protocol/chainsync/client.go (5)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/chainsync/chainsync.go (1)
  • ProtocolName (30-30)
connection/id.go (1)
  • ConnectionId (22-25)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/chainsync/error.go (1)
  • ErrSyncCancelled (26-26)
protocol/leiosfetch/client.go (5)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/blockfetch/client.go (1)
  • Client (30-41)
protocol/chainsync/client.go (1)
  • Client (29-46)
protocol/leiosnotify/client.go (1)
  • Client (24-34)
protocol/localstatequery/client_test.go (10)
protocol/blockfetch/client_test.go (1)
  • TestClientShutdown (211-230)
protocol/chainsync/client_test.go (1)
  • TestClientShutdown (285-304)
protocol/keepalive/client_test.go (1)
  • TestClientShutdown (303-322)
protocol/leiosnotify/client_test.go (1)
  • TestClientShutdown (117-136)
protocol/localtxmonitor/client_test.go (1)
  • TestClientShutdown (300-319)
protocol/localtxsubmission/client_test.go (1)
  • TestClientShutdown (167-186)
protocol/peersharing/client_test.go (1)
  • TestClientShutdown (95-114)
connection.go (1)
  • Connection (59-103)
protocol/localstatequery/localstatequery.go (1)
  • LocalStateQuery (116-119)
protocol/localstatequery/client.go (1)
  • Client (30-47)
protocol/keepalive/client.go (3)
protocol/protocol.go (1)
  • Protocol (39-60)
connection/id.go (1)
  • ConnectionId (22-25)
protocol/keepalive/messages.go (1)
  • NewMsgDone (94-101)
protocol/localstatequery/client.go (5)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/localtxmonitor/client.go (1)
  • Client (25-42)
protocol/keepalive/messages.go (1)
  • NewMsgDone (94-101)
protocol/localstatequery/messages.go (4)
  • NewMsgDone (245-252)
  • AcquireFailurePointNotOnChain (44-44)
  • MsgResult (172-175)
  • NewMsgQuery (160-170)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/peersharing/client_test.go (5)
connection.go (2)
  • Connection (59-103)
  • NewConnection (107-130)
connection_options.go (3)
  • WithConnection (36-40)
  • WithNetworkMagic (50-54)
  • WithNodeToNode (78-82)
protocol/blockfetch/client_test.go (1)
  • TestClientShutdown (211-230)
protocol/peersharing/peersharing.go (1)
  • PeerSharing (67-70)
protocol/peersharing/client.go (1)
  • Client (25-35)
protocol/peersharing/server.go (2)
protocol/txsubmission/server.go (1)
  • Server (28-41)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/localtxmonitor/client.go (8)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/leiosnotify/client.go (1)
  • Client (24-34)
protocol/localstatequery/client.go (1)
  • Client (30-47)
protocol/localtxsubmission/client.go (1)
  • Client (26-37)
protocol/peersharing/client.go (1)
  • Client (25-35)
protocol/keepalive/messages.go (1)
  • NewMsgDone (94-101)
protocol/localtxmonitor/messages.go (1)
  • NewMsgDone (79-86)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/chainsync/client_concurrency_test.go (4)
protocol/chainsync/chainsync.go (3)
  • New (259-267)
  • NewConfig (273-295)
  • ChainSync (204-207)
connection_options.go (3)
  • WithConnection (36-40)
  • WithNetworkMagic (50-54)
  • WithChainSyncConfig (131-135)
protocol/chainsync/client.go (1)
  • Client (29-46)
protocol/leiosnotify/client_concurrency_test.go (1)
  • TestStopBeforeStart (27-69)
protocol/localtxsubmission/client.go (4)
protocol/localtxsubmission/localtxsubmission.go (3)
  • Config (81-84)
  • CallbackContext (87-91)
  • ProtocolName (27-27)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/localtxsubmission/messages.go (1)
  • MessageTypeDone (29-29)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/txsubmission/server.go (5)
protocol/txsubmission/messages.go (2)
  • TxBody (197-201)
  • NewMsgRequestTxIds (68-82)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/txsubmission/txsubmission.go (2)
  • ProtocolName (27-27)
  • MaxAckCount (143-143)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/txsubmission/error.go (1)
  • ErrStopServerProcess (21-21)
protocol/txsubmission/server_concurrency_test.go (2)
connection_options.go (3)
  • WithConnection (36-40)
  • WithNetworkMagic (50-54)
  • WithNodeToNode (78-82)
protocol/txsubmission/server.go (1)
  • Server (28-41)
protocol/blockfetch/client.go (2)
protocol/protocol.go (2)
  • Protocol (39-60)
  • New (122-133)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/leiosnotify/client.go (5)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/leiosfetch/client.go (1)
  • Client (26-38)
protocol/peersharing/client.go (1)
  • Client (25-35)
protocol/leiosnotify/messages.go (1)
  • NewMsgDone (149-156)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/blockfetch/client_test.go (4)
protocol/keepalive/client_test.go (1)
  • TestClientShutdown (303-322)
connection.go (1)
  • Connection (59-103)
protocol/blockfetch/blockfetch.go (1)
  • BlockFetch (102-105)
protocol/blockfetch/client.go (1)
  • Client (30-41)
protocol/localtxmonitor/client_test.go (2)
connection.go (1)
  • Connection (59-103)
protocol/localtxmonitor/client.go (1)
  • Client (25-42)
protocol/localtxsubmission/client_test.go (6)
connection.go (1)
  • Connection (59-103)
protocol/localtxsubmission/localtxsubmission.go (2)
  • LocalTxSubmission (75-78)
  • ProtocolId (28-28)
protocol/localtxsubmission/client.go (1)
  • Client (26-37)
internal/test/helpers.go (1)
  • DecodeHexString (14-22)
ledger/babbage.go (1)
  • TxTypeBabbage (39-39)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Analyze (go)
🔇 Additional comments (45)
protocol/keepalive/client.go (1)

33-34: LGTM: Lifecycle control fields added.

The onceStop and stopErr fields properly support idempotent shutdown semantics, consistent with the pattern used in the underlying Protocol struct.

protocol/protocol.go (3)

175-193: Stop() API change and implementation look consistent and safe

Using onceStop with muxer unregistration and a nil error return is fine for now and matches the new callers that ignore the error in error paths. The method is idempotent and safe to call from the violation paths in SendMessage and readLoop.
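A sketch of the idempotent shape described here; the muxer and config field names are assumptions, only the UnregisterProtocol call is taken from the surrounding discussion:

// Stop is safe to call multiple times and from violation paths.
func (p *Protocol) Stop() error {
    p.onceStop.Do(func() {
        // Unregister from the muxer (assumed field names); this
        // ultimately closes the channels driving loop shutdown.
        p.muxer.UnregisterProtocol(p.config.ProtocolId)
    })
    return nil // reserved for future failure modes
}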


247-252: Using Stop() on queue overflow is reasonable

On send-queue byte-limit violation, you emit ErrProtocolViolationQueueExceeded, call Stop(), and return the same error. This is a sensible fail-fast path; ignoring the Stop() error is appropriate given the current implementation always returns nil.


453-458: Recv-side queue overflow shutdown path is consistent with send-side

The recv-byte-limit violation mirrors the send path: emit ErrProtocolViolationQueueExceeded, call Stop(), and abort. This keeps shutdown behavior symmetric and centralizes muxer cleanup in Stop().

protocol/chainsync/client_concurrency_test.go (2)

29-103: Concurrent Start/Stop test is well-structured

The test exercises repeated concurrent Start/Stop with a clear timeout and leak detection, which is appropriate for the new lifecycle semantics.


105-148: Stop-before-Start scenario is clearly captured

This test codifies the expectation that Stop() on an unstarted client and a subsequent Start()/Stop() should neither panic nor deadlock, which is useful to guard the lifecycle behavior going forward.
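The overall shape of these tests, sketched with a hypothetical newTestClient setup helper; assumes the usual sync, time, testing, and go.uber.org/goleak imports:

func TestConcurrentStartStop(t *testing.T) {
    defer goleak.VerifyNone(t)
    client := newTestClient(t) // hypothetical setup helper
    var wg sync.WaitGroup
    for i := 0; i < 10; i++ {
        wg.Add(2)
        go func() { defer wg.Done(); client.Start() }()
        go func() { defer wg.Done(); _ = client.Stop() }()
    }
    done := make(chan struct{})
    go func() { wg.Wait(); close(done) }()
    select {
    case <-done:
    case <-time.After(5 * time.Second):
        t.Fatal("timed out waiting for concurrent Start/Stop")
    }
}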

protocol/leiosnotify/client.go (1)

31-34: Lifecycle flags and shutdown-aware handlers resolve prior channel race

The addition of stateMutex, started, and stopped plus the updated Start/Stop logic makes the client’s lifecycle well-defined and prevents Start() after Stop() from reusing a closed requestNextChan. Deferring the close of requestNextChan until <-c.DoneChan() when started, combined with the handlers’ select on c.DoneChan() vs requestNextChan <- msg, avoids send-on-closed-channel panics while still surfacing ErrProtocolShuttingDown to callers.

You may want to double‑check, via tests similar to TestStopBeforeStart, that calling Stop() before Start() does not attempt to send MsgDone via a protocol that was never started, i.e., that the underlying Protocol is either started elsewhere or explicitly documented as supporting pre‑Start SendMessage calls.

Also applies to: 73-90, 93-120, 156-189

protocol/leiosfetch/client.go (1)

17-24: Atomic lifecycle management and shutdown-aware handlers look solid

Using atomic.Bool for started/stopped plus the updated Start/Stop semantics gives a race-free lifecycle: Start() refuses to run after Stop(), and Stop() always attempts to stop the underlying protocol while deferring channel closures until <-c.DoneChan() where appropriate. The handler select blocks on c.DoneChan() and the result channels, correctly surfacing ErrProtocolShuttingDown when the client is shutting down.

It would be worth confirming via tests (similar to the chainsync and leiosnotify concurrency/StopBeforeStart tests) that calling Stop() before Start() either behaves as intended (no deadlock, returns promptly) or is explicitly disallowed by API contract, since Stop() still sends a MsgDone regardless of whether the protocol was started.

Also applies to: 26-38, 92-107, 109-145, 242-285

protocol/txsubmission/server_concurrency_test.go (1)

1-143: Well-structured concurrency tests, appropriately skipped.

Both tests follow best practices:

  • Goroutine leak detection via goleak.VerifyNone(t) (lines 31, 102)
  • Proper resource cleanup with deferred Close (lines 49-53, 120-124)
  • Timeout protection for concurrent operations (lines 89-94)
  • Clear test intent with descriptive names and comments

The skip reasons are documented (lines 30, 101) and reasonable. Once mock infrastructure supports the NtN protocol interactions needed to exercise server lifecycle, these tests will validate that:

  1. Concurrent Stop() calls don't deadlock (test uses 5 goroutines with 5-second timeout)
  2. IsStopped() correctly reflects server state after Stop()
protocol/blockfetch/server.go (1)

169-185: Good improvement: Stop() error now propagated.

The restart path now checks and returns Stop() errors (lines 179-181) instead of ignoring them. This prevents the server from attempting to reinitialize and restart if the underlying protocol fails to stop cleanly, avoiding potential resource leaks or inconsistent state.

This aligns with the broader pattern across protocol servers in this PR (e.g., protocol/leiosnotify/server.go).

protocol/blockfetch/client_test.go (1)

211-230: Consistent shutdown test addition.

TestClientShutdown follows the established pattern seen across protocol tests in this PR (e.g., protocol/keepalive/client_test.go, protocol/chainsync/client_test.go). The test correctly:

  • Uses the runTest helper for mock setup and cleanup
  • Verifies the client is non-nil
  • Starts the client before stopping
  • Asserts Stop() completes without error
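For reference, the shared shape of these tests is roughly the following; the runTest signature and conversation value are assumptions based on the test helpers referenced in this review:

func TestClientShutdown(t *testing.T) {
    runTest(t, conversation, func(t *testing.T, oConn *ouroboros.Connection) {
        client := oConn.BlockFetch().Client
        if client == nil {
            t.Fatal("client is nil")
        }
        client.Start()
        if err := client.Stop(); err != nil {
            t.Fatalf("unexpected error stopping client: %s", err)
        }
    })
}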
protocol/localtxmonitor/client_test.go (2)

93-93: Reasonable timeout increase.

Extending the mock connection shutdown timeout from 2 to 5 seconds aligns with similar increases across protocol tests in this PR (e.g., protocol/chainsync/client_test.go line 95). This accommodates the slightly longer shutdown sequences introduced by the lifecycle improvements.


300-319: Consistent shutdown test addition.

TestClientShutdown follows the same pattern as other protocol client shutdown tests added in this PR. The test structure is identical to those in protocol/blockfetch/client_test.go, protocol/keepalive/client_test.go, and others, ensuring consistent validation of the Start/Stop lifecycle.

protocol/leiosnotify/server.go (1)

109-124: Good improvement: Stop() error now propagated.

The restart path in handleDone now checks and returns Stop() errors (lines 118-120), consistent with the pattern in protocol/blockfetch/server.go and other servers in this PR. This prevents restart attempts after a failed stop, avoiding potential resource leaks.

protocol/chainsync/client_test.go (3)

83-88: Good improvement: Stop() error now logged in cleanup.

The test cleanup now checks the Stop() error and logs it (lines 85-87), ensuring shutdown failures are surfaced during test runs rather than silently ignored. This addresses the past review comment and improves test observability.


95-95: Reasonable timeout increase.

Extending the timeout from 2 to 5 seconds is consistent with similar adjustments across protocol tests in this PR (e.g., protocol/localtxmonitor/client_test.go), accommodating the lifecycle improvements.


285-304: Consistent shutdown test addition.

TestClientShutdown follows the established pattern for shutdown tests across protocols in this PR. The test correctly verifies that the client can be started and stopped without error, validating the lifecycle improvements.

protocol/localstatequery/client_test.go (3)

25-25: Import alias added for clarity.

Adding the ouroboros alias is consistent with the import style in other test files in this PR (e.g., protocol/blockfetch/client_test.go line 22) and improves readability by distinguishing between the main package and the mock package.


112-112: Reasonable timeout increase.

The timeout extension to 5 seconds matches similar changes across protocol tests (e.g., protocol/chainsync/client_test.go line 95, protocol/localtxmonitor/client_test.go line 93), accommodating the lifecycle improvements.


357-376: Consistent shutdown test addition.

TestClientShutdown follows the same pattern as shutdown tests in other protocol clients (e.g., protocol/blockfetch/client_test.go, protocol/keepalive/client_test.go), ensuring uniform validation of the Start/Stop lifecycle across the codebase.

protocol/txsubmission/server.go (1)

28-351: LGTM! Lifecycle and concurrency improvements are well-implemented.

The server implementation correctly addresses all previously identified critical concurrency issues:

  • Done channel management: Safe close operations using doneMutex with select-before-close pattern prevent double-close panics
  • ackCount synchronization: Atomic operations (LoadInt32/StoreInt32) eliminate data races
  • Restart coordination: restartMutex properly serializes restart and stop operations, with TOCTOU checks preventing restart after permanent stop
  • Idempotent shutdown: onceStop ensures Stop() executes once, and the stopped flag correctly prevents restarts
  • Channel access: Result channels are captured under restartMutex before blocking operations, avoiding races during restart
  • Send protection: Reply handlers wrap channel sends with restartMutex

The shutdown semantics are clean: RequestTxIds/RequestTxs return ErrProtocolShuttingDown when shutdown is signaled, and handleDone restarts the protocol only if permanent stop hasn't been requested.
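The select-before-close guard mentioned above, sketched:

// Close done at most once, even with concurrent Stop()/handleDone().
s.doneMutex.Lock()
select {
case <-s.done:
    // already closed
default:
    close(s.done)
}
s.doneMutex.Unlock()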

protocol/peersharing/server.go (2)

49-55: LGTM: Stop() now properly propagates Protocol.Stop() errors.

The idempotent Stop implementation using sync.Once correctly captures and returns errors from the underlying Protocol, addressing the previous review concern.


127-132: LGTM: Restart logic correctly calls Protocol.Stop() directly.

The restart flow now calls s.Protocol.Stop() directly instead of s.Stop(), which correctly addresses the previous issue where onceStop would prevent stopping restarted protocol instances. The nil check is a good defensive measure.

protocol/blockfetch/client.go (4)

90-101: LGTM: Start() correctly tracks lifecycle with atomic flag.

The implementation properly sets the started flag using atomic.Bool.Store before starting the protocol, ensuring thread-safe lifecycle tracking.


104-132: LGTM: Stop() correctly handles shutdown and deferred channel closure.

The implementation properly:

  • Sends the Done message with error handling
  • Calls Protocol.Stop() to ensure muxer unregistration (addressing past review)
  • Defers channel closure until after DoneChan() fires if started, preventing panics from in-flight responses
  • Closes channels immediately if never started

237-241: LGTM: handleStartBatch uses shutdown-aware channel send.

The select statement correctly checks DoneChan() before sending to startBatchResultChan, preventing send-on-closed panics during shutdown.


254-259: LGTM: Message handlers use shutdown-aware channel operations.

Both handleNoBlocks and handleBlock correctly use select statements with DoneChan() to avoid sending to closed channels during shutdown.

Also applies to: 309-313

protocol/localtxsubmission/client_test.go (2)

86-86: LGTM: Timeout increase accommodates shutdown verification.

Extending the timeout from 2 to 5 seconds provides sufficient time for shutdown sequences to complete, consistent with similar changes across other protocol tests.


167-216: LGTM: Shutdown tests validate lifecycle and error handling.

The new tests properly verify:

  • TestClientShutdown: Basic start/stop lifecycle completes without error
  • TestSubmitTxServerShutdown: Client correctly returns ErrProtocolShuttingDown when server sends Done

These tests align with the broader PR pattern of adding shutdown verification across all protocols.

protocol/peersharing/client_test.go (2)

30-93: LGTM: Test helper follows established patterns.

The runTest helper correctly implements:

  • Goroutine leak detection with goleak.VerifyNone
  • Async error handling for both mock and Ouroboros connections
  • Proper timeouts for completion (5s) and shutdown (10s)
  • Clean connection lifecycle management

95-114: LGTM: TestClientShutdown validates PeerSharing lifecycle.

The test properly verifies that the PeerSharing client can be started and stopped without error, ensuring proper shutdown semantics consistent with other protocol clients in this PR.

protocol/localtxmonitor/client.go (5)

89-103: LGTM: Start() properly synchronizes lifecycle state.

The implementation correctly guards the started flag with stateMutex, preventing data races between Start and Stop as noted in past reviews.


105-148: LGTM: Stop() correctly releases locks before calling Protocol.Stop().

The implementation properly:

  • Sets stopped flag under lock to prevent new operations
  • Releases all locks before calling Protocol.Stop(), avoiding potential deadlocks (as addressed in past review)
  • Defers channel closure until DoneChan() fires if started, preventing handler panics

152-157: LGTM: Operations correctly pre-check stopped state.

All public operations properly check the stopped flag under stateMutex and return ErrProtocolShuttingDown early, preventing new operations after shutdown is initiated.

Also applies to: 173-178, 194-199, 231-236, 268-273


337-341: LGTM: Message handlers use shutdown-aware channel operations.

All message handlers correctly use select statements with DoneChan() to avoid sending to closed channels during shutdown, addressing the critical race condition noted in past reviews.

Also applies to: 354-358, 371-375, 388-392


333-336: LGTM: acquired flag properly synchronized with dedicated mutex.

The acquiredMutex consistently guards all reads and writes to the acquired flag, preventing data races across concurrent operations.

Also applies to: 414-416

protocol/localstatequery/client.go (4)

102-115: LGTM: Start() eliminates data race on started flag.

The implementation correctly guards the started flag with stateMutex, addressing the data race noted in past reviews where Start and Stop could concurrently access started.


117-149: LGTM: Stop() ensures protocol termination and safe channel closure.

The implementation properly:

  • Sends the Done message with error handling
  • Calls Protocol.Stop() to drive protocol shutdown (addressing past review)
  • Defers channel closure until DoneChan() fires if started, preventing handler panics

913-920: LGTM: Message handlers properly synchronized and shutdown-aware.

All handlers correctly:

  • Guard acquired flag updates with acquiredMutex
  • Use select statements with DoneChan() to avoid send-on-closed panics during shutdown

Also applies to: 936-946, 961-966


971-973: LGTM: acquired flag consistently synchronized with dedicated mutex.

All reads and writes to the acquired flag are properly guarded by acquiredMutex, ensuring race-free state tracking.

Also applies to: 1013-1014, 1022-1024

protocol/localtxsubmission/client.go (5)

77-91: LGTM: Start() properly synchronizes started flag.

The implementation correctly guards the started flag with stateMutex, ensuring thread-safe lifecycle tracking.


93-126: LGTM: Stop() ensures protocol termination and safe channel closure.

The implementation properly:

  • Sends the Done message with error handling
  • Calls Protocol.Stop() to drive protocol shutdown
  • Defers channel closure via closeSubmitResultChan() helper until DoneChan() fires if started
  • Uses sync.Once in the helper to prevent multiple closes

177-182: LGTM: Message handlers eliminate TOCTOU race with shutdown-aware selects.

Both handlers correctly use select statements with DoneChan() to race the channel send against shutdown, addressing the TOCTOU race condition noted in past reviews.

Also applies to: 202-207


210-221: LGTM: handleDone gracefully handles server-initiated shutdown.

The handler correctly closes the result channel when the server sends Done, unblocking any operations waiting on submitResultChan and allowing them to return ErrProtocolShuttingDown.


223-227: LGTM: closeSubmitResultChan helper prevents double-close panics.

The helper correctly uses sync.Once to ensure the submitResultChan is closed exactly once, even if called from multiple paths (Stop, handleDone).
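The helper reduces to a few lines:

// Safe to call from both Stop() and handleDone().
func (c *Client) closeSubmitResultChan() {
    c.closeSubmitResultOnce.Do(func() {
        close(c.submitResultChan)
    })
}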

@wolf31o2 wolf31o2 force-pushed the perf/protocols-improvements branch from 2228478 to 0371e62 Compare November 17, 2025 00:58
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2228478 and 0371e62.

📒 Files selected for processing (33)
  • protocol/blockfetch/blockfetch.go (1 hunks)
  • protocol/blockfetch/client.go (7 hunks)
  • protocol/blockfetch/client_test.go (1 hunks)
  • protocol/blockfetch/server.go (1 hunks)
  • protocol/chainsync/chainsync.go (1 hunks)
  • protocol/chainsync/client.go (6 hunks)
  • protocol/chainsync/client_concurrency_test.go (1 hunks)
  • protocol/chainsync/client_test.go (2 hunks)
  • protocol/chainsync/error.go (1 hunks)
  • protocol/chainsync/server.go (1 hunks)
  • protocol/keepalive/client.go (2 hunks)
  • protocol/keepalive/client_test.go (1 hunks)
  • protocol/leiosfetch/client.go (5 hunks)
  • protocol/leiosfetch/server.go (1 hunks)
  • protocol/leiosnotify/client.go (3 hunks)
  • protocol/leiosnotify/client_concurrency_test.go (1 hunks)
  • protocol/leiosnotify/client_test.go (1 hunks)
  • protocol/leiosnotify/server.go (1 hunks)
  • protocol/localstatequery/client.go (6 hunks)
  • protocol/localstatequery/client_test.go (3 hunks)
  • protocol/localtxmonitor/client.go (15 hunks)
  • protocol/localtxmonitor/client_test.go (2 hunks)
  • protocol/localtxsubmission/client.go (6 hunks)
  • protocol/localtxsubmission/client_test.go (2 hunks)
  • protocol/localtxsubmission/localtxsubmission.go (2 hunks)
  • protocol/peersharing/client.go (4 hunks)
  • protocol/peersharing/client_test.go (1 hunks)
  • protocol/peersharing/server.go (4 hunks)
  • protocol/peersharing/server_test.go (1 hunks)
  • protocol/protocol.go (6 hunks)
  • protocol/txsubmission/server.go (10 hunks)
  • protocol/txsubmission/server_concurrency_test.go (1 hunks)
  • protocol/txsubmission/server_test.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (12)
  • protocol/blockfetch/client_test.go
  • protocol/localstatequery/client_test.go
  • protocol/localtxsubmission/client_test.go
  • protocol/chainsync/error.go
  • protocol/peersharing/server.go
  • protocol/chainsync/chainsync.go
  • protocol/leiosnotify/client_concurrency_test.go
  • protocol/leiosnotify/server.go
  • protocol/chainsync/server.go
  • protocol/keepalive/client_test.go
  • protocol/txsubmission/server_test.go
  • protocol/localtxsubmission/client.go
🧰 Additional context used
🧬 Code graph analysis (17)
protocol/localtxmonitor/client_test.go (9)
protocol/blockfetch/client_test.go (1)
  • TestClientShutdown (211-230)
protocol/chainsync/client_test.go (1)
  • TestClientShutdown (285-304)
protocol/keepalive/client_test.go (1)
  • TestClientShutdown (303-322)
protocol/localstatequery/client_test.go (1)
  • TestClientShutdown (357-376)
protocol/localtxsubmission/client_test.go (1)
  • TestClientShutdown (167-186)
protocol/peersharing/client_test.go (1)
  • TestClientShutdown (95-114)
connection.go (1)
  • Connection (59-103)
protocol/localtxmonitor/localtxmonitor.go (1)
  • LocalTxMonitor (112-115)
protocol/localtxmonitor/client.go (1)
  • Client (25-42)
protocol/chainsync/client_test.go (10)
protocol/chainsync/chainsync.go (1)
  • ChainSync (204-207)
protocol/chainsync/client.go (1)
  • Client (29-47)
protocol/blockfetch/client_test.go (1)
  • TestClientShutdown (211-230)
protocol/keepalive/client_test.go (1)
  • TestClientShutdown (303-322)
protocol/leiosnotify/client_test.go (1)
  • TestClientShutdown (117-136)
protocol/localstatequery/client_test.go (1)
  • TestClientShutdown (357-376)
protocol/localtxmonitor/client_test.go (1)
  • TestClientShutdown (300-319)
protocol/localtxsubmission/client_test.go (1)
  • TestClientShutdown (167-186)
protocol/peersharing/client_test.go (1)
  • TestClientShutdown (95-114)
connection.go (1)
  • Connection (59-103)
protocol/keepalive/client.go (4)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/keepalive/keepalive.go (1)
  • ProtocolName (27-27)
connection/id.go (1)
  • ConnectionId (22-25)
protocol/keepalive/messages.go (1)
  • NewMsgDone (94-101)
protocol/peersharing/client.go (4)
protocol/leiosnotify/client.go (1)
  • Client (24-34)
protocol/localtxsubmission/client.go (1)
  • Client (26-37)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/leiosfetch/client.go (3)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/chainsync/client.go (1)
  • Client (29-47)
protocol/blockfetch/client.go (2)
protocol/protocol.go (2)
  • Protocol (39-60)
  • New (122-133)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/localtxsubmission/localtxsubmission.go (1)
protocol/localtxsubmission/messages.go (1)
  • MessageTypeDone (29-29)
protocol/localtxmonitor/client.go (7)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/leiosfetch/client.go (1)
  • Client (26-38)
protocol/leiosnotify/client.go (1)
  • Client (24-34)
protocol/localstatequery/client.go (1)
  • Client (30-47)
protocol/keepalive/messages.go (1)
  • NewMsgDone (94-101)
protocol/localtxmonitor/messages.go (1)
  • NewMsgDone (79-86)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/txsubmission/server_concurrency_test.go (5)
connection.go (1)
  • NewConnection (107-130)
protocol/protocol.go (2)
  • ProtocolRoleServer (95-95)
  • New (122-133)
connection_options.go (3)
  • WithConnection (36-40)
  • WithNetworkMagic (50-54)
  • WithNodeToNode (78-82)
protocol/txsubmission/txsubmission.go (1)
  • TxSubmission (126-129)
protocol/txsubmission/server.go (1)
  • Server (28-41)
protocol/peersharing/client_test.go (5)
connection.go (2)
  • Connection (59-103)
  • NewConnection (107-130)
connection_options.go (3)
  • WithConnection (36-40)
  • WithNetworkMagic (50-54)
  • WithNodeToNode (78-82)
protocol/blockfetch/client_test.go (1)
  • TestClientShutdown (211-230)
protocol/peersharing/peersharing.go (1)
  • PeerSharing (67-70)
protocol/peersharing/client.go (1)
  • Client (25-35)
protocol/leiosnotify/client_test.go (6)
protocol/versiondata.go (6)
  • VersionData (40-46)
  • VersionDataNtN13andUp (143-145)
  • VersionDataNtN11to12 (116-122)
  • DiffusionModeInitiatorOnly (21-21)
  • PeerSharingModeNoPeerSharing (27-27)
  • QueryModeDisabled (36-36)
protocol/handshake/messages.go (1)
  • NewMsgAcceptVersion (88-102)
connection.go (2)
  • Connection (59-103)
  • NewConnection (107-130)
connection_options.go (3)
  • WithConnection (36-40)
  • WithNetworkMagic (50-54)
  • WithNodeToNode (78-82)
protocol/blockfetch/client_test.go (1)
  • TestClientShutdown (211-230)
protocol/leiosnotify/client.go (1)
  • Client (24-34)
protocol/leiosnotify/client.go (5)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/localtxsubmission/client.go (1)
  • Client (26-37)
protocol/peersharing/client.go (1)
  • Client (25-35)
protocol/leiosnotify/messages.go (1)
  • NewMsgDone (149-156)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/localstatequery/client.go (6)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/localtxmonitor/client.go (1)
  • Client (25-42)
protocol/localstatequery/localstatequery.go (1)
  • AcquireTarget (131-133)
protocol/localstatequery/messages.go (3)
  • NewMsgDone (245-252)
  • AcquireFailurePointNotOnChain (44-44)
  • MsgResult (172-175)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/localstatequery/error.go (1)
  • ErrAcquireFailurePointNotOnChain (23-25)
protocol/chainsync/client.go (4)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/leiosfetch/client.go (1)
  • Client (26-38)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/chainsync/error.go (1)
  • ErrSyncCancelled (26-26)
protocol/chainsync/client_concurrency_test.go (6)
protocol/chainsync/chainsync.go (3)
  • New (259-267)
  • NewConfig (273-295)
  • ChainSync (204-207)
connection_options.go (3)
  • WithConnection (36-40)
  • WithNetworkMagic (50-54)
  • WithChainSyncConfig (131-135)
protocol/blockfetch/client.go (1)
  • Client (30-41)
protocol/chainsync/client.go (1)
  • Client (29-47)
protocol/keepalive/client.go (1)
  • Client (26-35)
protocol/leiosnotify/client_concurrency_test.go (1)
  • TestStopBeforeStart (27-69)
protocol/txsubmission/server.go (4)
protocol/txsubmission/messages.go (2)
  • TxBody (197-201)
  • NewMsgRequestTxIds (68-82)
protocol/txsubmission/txsubmission.go (1)
  • MaxAckCount (143-143)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/txsubmission/error.go (1)
  • ErrStopServerProcess (21-21)
protocol/peersharing/server_test.go (5)
protocol/txsubmission/server_test.go (1)
  • TestServerShutdown (28-82)
connection.go (2)
  • NewConnection (107-130)
  • Connection (59-103)
connection_options.go (3)
  • WithConnection (36-40)
  • WithNetworkMagic (50-54)
  • WithNodeToNode (78-82)
protocol/peersharing/peersharing.go (1)
  • PeerSharing (67-70)
protocol/peersharing/server.go (1)
  • Server (26-32)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Analyze (go)
🔇 Additional comments (35)
protocol/localtxsubmission/localtxsubmission.go (2)

46-49: Client-initiated termination looks good.

Adding the MessageTypeDone transition from stateIdle enables the client to gracefully terminate the protocol when it has agency, which is appropriate for coordinated shutdown handling.


63-66: Verify this state transition change aligns with Ouroboros specification.

The verification confirms the previous review's requested changes are present:

  • ✓ Client properly handles MessageTypeDone in Busy state (line 157-158 of client.go)
  • ✓ Transaction status is communicated when server closes during submission (via channel closure returning protocol.ErrProtocolShuttingDown)
  • ✓ No data loss or undefined behavior

However, the semantic design question remains unverified: the state machine now allows the server to send Done and transition from Busy to Done (lines 63-66), which terminates a transaction submission in progress. While this is now properly handled on the client side, please confirm:

  1. This state transition aligns with the Ouroboros local-tx-submission protocol specification
  2. The intended behavior when the server closes the connection mid-submission is to report ErrProtocolShuttingDown rather than a transaction accept/reject result
protocol/leiosfetch/server.go (1)

203-205: LGTM!

The error handling for Stop() is correctly implemented. If Stop() fails, the error is returned immediately, preventing the restart sequence from proceeding with a partially stopped protocol.

protocol/blockfetch/blockfetch.go (1)

122-122: Verify the performance impact of the increased queue size.

The default receive queue size has been increased by 50% (256→384). While this may improve throughput by reducing backpressure, it also increases memory usage per connection.

Please ensure this change has been tested under realistic load conditions to confirm:

  1. The throughput improvement justifies the increased memory footprint
  2. No adverse effects on GC pressure or connection scaling
protocol/localtxmonitor/client_test.go (2)

93-93: LGTM!

The timeout increase from 2 to 5 seconds accommodates the more coordinated shutdown sequences introduced across the codebase.


300-319: LGTM!

The shutdown test follows the established pattern used across other protocol tests (blockfetch, chainsync, keepalive, etc.) and validates that Start()/Stop() can be called without errors.

protocol/leiosfetch/client.go (3)

92-107: LGTM!

The lifecycle guards prevent starting a stopped client and properly track the started state using atomic operations for race-free access.


109-145: LGTM!

The shutdown sequence is well-coordinated:

  • SendMessage error is preserved
  • Protocol.Stop() is always called to complete muxer shutdown
  • Channels are closed after DoneChan signals if started, or immediately if never started
  • Lifecycle flags are updated atomically

This prevents send-on-closed-channel panics and ensures clean shutdown.


242-285: LGTM!

All message handlers now use shutdown-aware selects that check DoneChan() before sending to result channels, preventing panics during concurrent shutdown and ensuring ErrProtocolShuttingDown is returned appropriately.

protocol/blockfetch/client.go (5)

90-101: LGTM!

The started flag is properly initialized using atomic operations, ensuring race-free lifecycle tracking.


104-132: LGTM!

The shutdown sequence correctly:

  • Preserves SendMessage errors
  • Always calls Protocol.Stop() to signal muxer shutdown
  • Defers channel closure until protocol completes if started
  • Closes channels immediately if never started

This addresses the previous critical issues regarding premature channel closure and missing Protocol.Stop() call.


228-243: LGTM!

The shutdown-aware select prevents sending to startBatchResultChan during shutdown, returning ErrProtocolShuttingDown instead.


245-261: LGTM!

The error handling and shutdown-aware delivery of the "block(s) not found" error is correctly implemented.


309-314: LGTM!

For the non-callback path (single block requests), the shutdown-aware select ensures blocks are only delivered when the protocol is still active.

protocol/blockfetch/server.go (1)

179-181: LGTM!

The error handling for Stop() prevents restarting the protocol if shutdown fails, ensuring consistent state management.

protocol/localtxmonitor/client.go (5)

89-103: LGTM!

The Start() method correctly sets the started flag under the stateMutex, ensuring thread-safe lifecycle tracking.


105-148: LGTM!

The shutdown sequence is comprehensive and safe:

  • stopped flag prevents new operations
  • Locks are released before calling Protocol.Stop() to avoid deadlocks
  • Channels are closed after protocol shutdown if started, or immediately if never started
  • Previous critical issues about race conditions and missing Protocol.Stop() call have been addressed

150-301: LGTM!

All public methods (Acquire, Release, HasTx, NextTx, GetSizes) correctly:

  • Pre-check the stopped flag under stateMutex
  • Check the acquired state under acquiredMutex where needed
  • Return ErrProtocolShuttingDown when stopped

This prevents operations from proceeding after shutdown has been initiated.


324-394: LGTM!

All message handlers now:

  • Update state under the appropriate mutex (acquiredMutex for handleAcquired)
  • Use shutdown-aware selects that check DoneChan() before sending to result channels
  • Return ErrProtocolShuttingDown when the protocol is shutting down

This prevents send-on-closed-channel panics and ensures consistent error handling during shutdown.


409-418: LGTM!

The release() method correctly clears the acquired flag under the acquiredMutex, maintaining thread-safe state management.

protocol/protocol.go (2)

272-394: LGTM! Excellent shutdown-aware sendLoop implementation.

The graceful shutdown logic in sendLoop is well-designed:

  • The shuttingDown flag ensures clean message draining before exit
  • Messages already queued are processed even during shutdown, preventing data loss
  • Multiple exit points properly handle the recvDoneChan signal
  • The nested select at lines 303-309 correctly transitions to shutdown mode while processing the queue

This pattern prevents abrupt termination and ensures protocol compliance during shutdown.
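A sketch of the drain logic being described, with writeMessage standing in as a hypothetical name for the actual send path:

shuttingDown := false
for {
    if shuttingDown && len(p.sendQueueChan) == 0 {
        return // queue drained; exit
    }
    select {
    case <-p.recvDoneChan:
        shuttingDown = true // stop accepting new work, keep draining
    case msg := <-p.sendQueueChan:
        _ = p.writeMessage(msg) // hypothetical write helper
    }
}

Note that the len() check makes the drain best-effort; a later comment in this thread questions whether it is a safe synchronization point.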


175-193: LGTM! Well-documented API design for Stop() method.

The Stop() signature change to return error is good API design:

  • Provides consistency with other Stop() methods across the codebase
  • Comprehensive documentation explains the current nil return and future extensibility
  • Allows callers to prepare for potential future failure modes
  • Clear guidance for error checking
protocol/txsubmission/server_concurrency_test.go (1)

28-95: LGTM! Thorough concurrent shutdown test.

The test correctly validates:

  • Idempotent Stop() calls from 5 concurrent goroutines
  • Proper use of sync.WaitGroup for coordination
  • Timeout detection (5s) to catch deadlocks
  • Error handling for each goroutine

The skip reason is documented and acceptable. The test will provide valuable coverage once the mock infrastructure issues are resolved.

protocol/chainsync/client.go (2)

136-177: LGTM! Race condition properly resolved.

The Stop() implementation correctly addresses the race condition identified in the past review:

  • MsgDone is sent only if the client was started
  • Protocol.Stop() is called to trigger shutdown
  • Critical fix: Channel close is deferred via goroutine that waits for DoneChan(), ensuring all message handlers complete before closing readyForNextBlockChan
  • If never started, channels are closed immediately (no protocol to wait for)

This prevents the panic that could occur if message handlers were still writing to the channel during close. The fix properly sequences shutdown operations.


753-768: LGTM! Shutdown-aware message handler pattern.

Both handleRollForward and handleRollBackward correctly implement shutdown coordination:

  • All writes to readyForNextBlockChan are wrapped in select statements
  • Detect protocol shutdown via DoneChan() before sending
  • Return protocol.ErrProtocolShuttingDown on shutdown detection
  • Handle both cancellation (false) and readiness (true) signals

This pattern ensures no writes occur to closed channels and provides clean error propagation during shutdown.

Also applies to: 792-808

protocol/localstatequery/client.go (2)

102-115: LGTM! Data race properly resolved.

The Start() method correctly addresses the data race identified in the past review:

  • stateMutex protects the started flag
  • Lock is acquired before setting started = true
  • Lock is released before calling Protocol.Start() to avoid holding mutex during potentially blocking operation

This matches the pattern used in other protocol clients and ensures thread-safe access to the started flag.


117-149: LGTM! Complete Stop() implementation with proper lifecycle handling.

The Stop() method correctly implements shutdown:

  • Sends MsgDone message with error handling
  • Calls Protocol.Stop() to trigger underlying protocol shutdown
  • Defers channel closure based on started state:
    • If started: goroutine waits for DoneChan() before closing channels (prevents race with in-flight handlers)
    • If never started: closes channels immediately

This addresses the past review feedback and ensures proper synchronization during shutdown.

protocol/txsubmission/server.go (3)

93-116: LGTM! All critical race conditions properly resolved.

The Stop() implementation addresses all issues from past reviews:

  • restartMutex coordination: acquired before onceStop.Do(), preventing race with handleDone()
  • stopped flag: marks permanent stop under mutex protection
  • doneMutex guards channel close: select statement prevents double-close panic
  • Protocol.Stop() called to trigger underlying shutdown

This layered synchronization correctly handles concurrent Stop() calls, ongoing restarts, and channel lifecycle.


132-196: LGTM! Atomic operations and shutdown coordination implemented correctly.

RequestTxIds properly addresses the concurrency issues:

  • Atomic operations: atomic.LoadInt32 and atomic.StoreInt32 for ackCount eliminate the data race identified in past reviews
  • restartMutex protection: safely captures result channel reference before blocking select
  • Shutdown awareness: doneChan() check allows clean exit during protocol shutdown
  • Closed channel handling: ok check returns ErrProtocolShuttingDown on closed channel

The validation logic (negative/overflow checks) remains before unsafe conversions, maintaining safety.


312-351: LGTM! Complex restart logic with proper TOCTOU protection.

The handleDone() restart sequence is well-structured:

  • Non-blocking signal: notifies RequestTxIds waiter without blocking
  • First stopped check (line 314): early exit if Stop() was called
  • doneMutex + select: prevents double-close race with Stop()
  • Protocol reinitialization: fresh channels, atomic ackCount reset
  • Second stopped check (line 345): TOCTOU protection before restart
  • Restart outside lock: Start() called after releasing restartMutex for better responsiveness

This correctly handles the complex interplay between restart, stop, and ongoing requests.
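Sketched, the restart path's double-check looks like this (stopped, restartMutex, and Start mirror the server fields and methods under review; reinitialization details are elided):

s.restartMutex.Lock()
if s.stopped { // Stop() already requested a permanent stop
    s.restartMutex.Unlock()
    return nil
}
// ... reinitialize result channels, reset ackCount via atomic.StoreInt32 ...
canRestart := !s.stopped // re-check before leaving the critical section
s.restartMutex.Unlock()
if canRestart {
    s.Start() // restart outside the lock
}
return nil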

protocol/chainsync/client_test.go (2)

83-88: LGTM! Cleanup improved per past review feedback.

The test cleanup now properly:

  • Checks for client existence before calling Stop()
  • Captures and logs Stop() errors (not fatal in cleanup)
  • Addresses the missing error check identified in the past review

Using t.Logf() is appropriate for cleanup as it informs without failing the test during teardown.


285-304: LGTM! Consistent shutdown test pattern.

TestClientShutdown follows the established pattern:

  • Uses runTest helper for consistent setup/teardown
  • Minimal handshake conversation (sufficient for lifecycle test)
  • Validates client existence, starts, then stops
  • Checks Stop() error explicitly
  • Leverages runTest's goleak verification and timeout handling

This provides solid coverage of the client shutdown behavior introduced in this PR.

protocol/peersharing/server_test.go (1)

28-82: LGTM! Server shutdown test with proper error handling.

The test correctly implements:

  • Documented skip reason for mock infrastructure limitation
  • Goleak verification for goroutine leak detection
  • NtN handshake conversation setup
  • Async error monitoring for both mock and Ouroboros connections
  • Stop() error checking (lines 60-62) per past review feedback
  • Reasonable timeouts (2s for mock, 10s for connection)

Pattern is consistent with other server shutdown tests (e.g., txsubmission/server_test.go).

protocol/leiosnotify/client_test.go (2)

30-54: LGTM! Proper NtN version 15+ handshake setup.

The custom handshake configuration is correct:

  • mockNtNVersionData() returns VersionDataNtN13andUp (required for v13+)
  • mockNtNVersionDataV11() sets appropriate fields (network magic, diffusion mode, peer sharing, query)
  • conversationEntryNtNResponseV15 constructs AcceptVersion with version 15 and proper version data
  • This addresses the "protocol initialization issues" mentioned in past review

The LeiosNotify protocol requires NtN v15+ per protocol/leiosnotify documentation, and this setup satisfies that requirement.


117-136: LGTM! Shutdown test with protocol-specific handshake.

TestClientShutdown correctly:

  • Uses the custom NtN v15 handshake (required for LeiosNotify)
  • Follows the established test pattern (runTest helper)
  • Validates client existence before starting
  • Checks Stop() error explicitly
  • Leverages runTest's goleak and timeout protections

This provides proper coverage of the LeiosNotify client shutdown behavior, resolving the past review concern about skipped tests.

Comment on lines 98 to 132
// Stop stops the KeepAlive client protocol
func (c *Client) Stop() error {
    c.onceStop.Do(func() {
        c.Protocol.Logger().
            Debug("stopping client protocol",
                "component", "network",
                "protocol", ProtocolName,
                "connection_id", c.callbackContext.ConnectionId.String(),
            )
        // Stop the keep-alive timer to prevent any further sends
        c.timerMutex.Lock()
        if c.timer != nil {
            c.timer.Stop()
            c.timer = nil
        }
        c.timerMutex.Unlock()
        msg := NewMsgDone()
        sendErr := c.SendMessage(msg)
        stopErr := c.Protocol.Stop()

        // Prefer the send error if present, otherwise return the stop error
        if sendErr != nil {
            c.stopErr = sendErr
        } else {
            c.stopErr = stopErr
        }
    })
    return c.stopErr
}
Copy link
Contributor


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify that the MsgDone race condition is resolved.

The previous critical review identified that SendMessage(msg) followed immediately by Protocol.Stop() can race: Protocol.Stop() triggers sendLoop to exit via recvDoneChan before processing queued messages in sendQueueChan, potentially losing the MsgDone message.

The past review was marked as "✅ Addressed" but the code pattern remains unchanged. Please confirm that either:

  1. Protocol.Stop() was modified to drain sendQueueChan before exiting, or
  2. This race is acceptable because the peer will detect connection closure

Run this script to verify the current sendLoop behavior:


🏁 Script executed:

#!/bin/bash
# Check if sendLoop drains sendQueueChan before exiting when recvDoneChan closes
rg -n "func.*sendLoop" protocol/protocol.go -A 80 | head -100

Length of output: 2268


The output was truncated; let me retrieve the complete sendLoop implementation to verify the drain logic:


🏁 Script executed:

#!/bin/bash
# Get complete sendLoop function
rg -n "func.*sendLoop" protocol/protocol.go -A 150 | head -200

Length of output: 4472


Now I need to examine the Protocol.Stop() implementation to understand the complete shutdown sequence and how recvDoneChan is closed:


🏁 Script executed:

#!/bin/bash
# Get Protocol.Stop() implementation
rg -n "func.*\bStop\(\)" protocol/protocol.go -A 50 | head -80

Length of output: 1523


I need to find where recvDoneChan is actually closed and get the complete SendMessage implementation:


🏁 Script executed:

#!/bin/bash
# Find where recvDoneChan is closed
rg -n "recvDoneChan" protocol/protocol.go -B2 -A2

Length of output: 927


I need to see the recvLoop implementation to understand what causes it to exit and close recvDoneChan:


🏁 Script executed:

#!/bin/bash
# Get recvLoop implementation
rg -n "func.*recvLoop" protocol/protocol.go -A 100 | head -150

Length of output: 2638


Let me verify that sendQueueChan is never closed (which would be the proper way to signal shutdown):


🏁 Script executed:

#!/bin/bash
# Check if sendQueueChan is ever closed
rg -n "close.*sendQueueChan|sendQueueChan.*close" protocol/

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Get complete SendMessage implementation
rg -n "func.*SendMessage" protocol/protocol.go -A 40

Length of output: 1351


Race condition persists: sendLoop uses unsafe len() checks to decide when to shut down rather than reliably draining the queue.

The review's concern is valid. While the code appears to implement a drain-before-exit pattern, it relies entirely on unsafe channel length checks for synchronization.

Critical issue:

  • Client.Stop() calls SendMessage(MsgDone) which asynchronously queues the message (line 256 in protocol.go), then immediately calls Protocol.Stop()
  • Protocol.Stop() triggers UnregisterProtocol(), which closes muxerDoneChan, causing recvLoop to exit and close recvDoneChan
  • sendLoop detects recvDoneChan close, sets shuttingDown=true, then checks if shuttingDown && len(p.sendQueueChan) == 0 { return } (line 292)
  • Race window: len() is not atomic with respect to concurrent sends; MsgDone can be added to the queue after the length check but before the return statement, causing it to be lost

The len(channel) pattern for shutdown synchronization is unsafe in Go: there are no acquire/release semantics, only a relaxed read. sendQueueChan is never closed, so the !ok exit path never triggers. All shutdown paths depend on these unsafe len() checks at lines 292, 307, and 362.

To fix this properly (a minimal sketch of option 1 follows the list), either:

  1. Close sendQueueChan when shutdown begins, allowing sendLoop to exit reliably via the !ok path, or
  2. Implement proper synchronization (mutex, atomic flag with acquire semantics, or sync.Cond) to safely coordinate the drain
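
A minimal sketch of option 1, assuming a single mutex guards both enqueue and close; the sender type, field names, and write callback are illustrative stand-ins, not the repo's actual API:

package sendqueue

import (
	"errors"
	"sync"
)

var errShuttingDown = errors.New("protocol is shutting down")

type sender struct {
	mu     sync.Mutex
	closed bool
	queue  chan any // buffered send queue, drained by sendLoop
}

// SendMessage enqueues under the same mutex that guards close(), so a
// message is either fully enqueued before shutdown or rejected with an
// error; it can never race with the close.
func (s *sender) SendMessage(msg any) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.closed {
		return errShuttingDown
	}
	s.queue <- msg // sendLoop keeps consuming, so this cannot block forever
	return nil
}

// shutdown closes the queue exactly once; sendLoop then drains whatever
// is already buffered and exits via the closed-channel path, so no
// queued MsgDone can be lost.
func (s *sender) shutdown() {
	s.mu.Lock()
	defer s.mu.Unlock()
	if !s.closed {
		s.closed = true
		close(s.queue)
	}
}

// sendLoop ranges over the queue; range terminates only after the
// channel is both closed and empty.
func (s *sender) sendLoop(write func(any)) {
	for msg := range s.queue {
		write(msg)
	}
}

Closing the queue makes the drain a property of the channel itself rather than of a len() heuristic.
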
🤖 Prompt for AI Agents
In protocol/keepalive/client.go around lines 98-126, Stop() currently sends
MsgDone and calls Protocol.Stop() but relies on unsafe len(sendQueueChan) checks
in sendLoop; to fix, implement a proper shutdown by marking shutdown under a
mutex/once and closing the sendQueueChan so sendLoop can reliably range
over/drain the channel and exit via the closed-channel path. Specifically:
ensure Stop sets a shutting-down flag and closes sendQueueChan exactly once (use
c.onceStop or a dedicated once for channel close), have SendMessage return an
error if the client is shutting down (to avoid sending on closed channel), and
keep Stop waiting for sendLoop to finish (e.g., wait on the existing stop/wait
channel) before calling Protocol.Stop and returning the combined errors.

@wolf31o2 wolf31o2 force-pushed the perf/protocols-improvements branch 2 times, most recently from 13c60d2 to 20d440a Compare November 17, 2025 03:16

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
protocol/leiosfetch/client.go (1)

92-145: Guard Stop()’s SendMessage for never-started clients to avoid potential deadlock

Stop() currently sends MsgDone regardless of whether the client was ever started:

msg := NewMsgDone()
if sendErr := c.SendMessage(msg); sendErr != nil {
    // Preserve the SendMessage error but still shut down the protocol.
    err = sendErr
}

If Start() was never called, the underlying protocol send loop may not be running, so SendMessage can block indefinitely. This is the same pattern that previously caused hangs in leiosnotify.Client.Stop() before it was changed to send MsgDone only when the client was started.

Given you already track lifecycle via started and stopped, it would be safer to gate the send:

 func (c *Client) Stop() error {
     var err error
     c.onceStop.Do(func() {
         c.Protocol.Logger().
             Debug("stopping client protocol",
                 "component", "network",
                 "protocol", ProtocolName,
                 "connection_id", c.callbackContext.ConnectionId.String(),
             )
-        msg := NewMsgDone()
-        if sendErr := c.SendMessage(msg); sendErr != nil {
-            // Preserve the SendMessage error but still shut down the protocol.
-            err = sendErr
-        }
+        msg := NewMsgDone()
+        // Only send MsgDone if the protocol was actually started; otherwise
+        // avoid blocking on a send queue that is not being drained.
+        if c.started.Load() {
+            if sendErr := c.SendMessage(msg); sendErr != nil {
+                // Preserve the SendMessage error but still shut down the protocol.
+                err = sendErr
+            }
+        }
         // Always attempt to stop the protocol so DoneChan and muxer shutdown complete.
         _ = c.Protocol.Stop()
         // Defer closing channels until protocol fully shuts down (only if started)
         if c.started.Load() {
             go func() {
                 <-c.DoneChan()
                 close(c.blockResultChan)
                 close(c.blockTxsResultChan)
                 close(c.votesResultChan)
                 close(c.blockRangeResultChan)
             }()
         } else {
             // If protocol was never started, close channels immediately
             close(c.blockResultChan)
             close(c.blockTxsResultChan)
             close(c.votesResultChan)
             close(c.blockRangeResultChan)
         }
         c.started.Store(false)
         c.stopped.Store(true)
     })
     return err
 }

This keeps the new lifecycle semantics but avoids hangs when Stop() is invoked on a never-started client.

🧹 Nitpick comments (2)
protocol/protocol.go (1)

420-440: sendMessage helper is shutdown‑only; pendingBytes mismatch is acceptable

Using sendMessage only during shutdown to push any remaining messages directly to the muxer simplifies the drain path. It doesn’t adjust pendingSendBytes or transition state, but since it runs only after shutdown has been initiated, this inconsistency can’t affect normal operation. If you ever expose Stop as a recoverable operation, consider aligning this path with the main send accounting.

protocol/txsubmission/server.go (1)

286-351: handleDone restart sequence is well‑structured; minor future‑proofing nit

handleDone now:

  • Non‑blockingly notifies any in‑flight RequestTxIds via ErrStopServerProcess,
  • Invokes the user DoneFunc,
  • Under restartMutex, closes the current done channel (with doneMutex), stops the current protocol, re‑creates the protocol and per‑request channels, resets ackCount, and
  • Skips restart entirely if stopped is (or becomes) true.

This addresses earlier races on onceStop, done, and ackCount. The only minor nit is that if s.Protocol.Stop() ever returns a non‑nil error in the future, this path will return while still holding restartMutex; if you later make Stop() fallible, consider capturing the error, unlocking, and then returning.
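
If Stop() does become fallible, a defer keeps every return path honest; a minimal sketch with stand-in types (proto models the embedded Protocol, and the channel re-creation is elided to a comment):

package txsketch

import "sync"

type proto struct{}

func (p *proto) Stop() error { return nil }

type server struct {
	restartMutex sync.Mutex
	protocol     *proto
}

// handleDone defers the unlock so every return path, including a failed
// Stop, releases restartMutex before returning to the caller.
func (s *server) handleDone() error {
	s.restartMutex.Lock()
	defer s.restartMutex.Unlock()
	if err := s.protocol.Stop(); err != nil {
		return err
	}
	// ... re-create the protocol and per-request channels here ...
	return nil
}

If the restart work must stay outside the lock, the alternative is to capture the error, unlock explicitly, and then return it.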

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0371e62 and 20d440a.

📒 Files selected for processing (34)
  • protocol/blockfetch/blockfetch.go (1 hunks)
  • protocol/blockfetch/client.go (7 hunks)
  • protocol/blockfetch/client_test.go (1 hunks)
  • protocol/blockfetch/server.go (1 hunks)
  • protocol/chainsync/chainsync.go (1 hunks)
  • protocol/chainsync/client.go (6 hunks)
  • protocol/chainsync/client_concurrency_test.go (1 hunks)
  • protocol/chainsync/client_test.go (2 hunks)
  • protocol/chainsync/error.go (1 hunks)
  • protocol/chainsync/server.go (1 hunks)
  • protocol/keepalive/client.go (2 hunks)
  • protocol/keepalive/client_test.go (1 hunks)
  • protocol/keepalive/keepalive.go (1 hunks)
  • protocol/leiosfetch/client.go (5 hunks)
  • protocol/leiosfetch/server.go (1 hunks)
  • protocol/leiosnotify/client.go (3 hunks)
  • protocol/leiosnotify/client_concurrency_test.go (1 hunks)
  • protocol/leiosnotify/client_test.go (1 hunks)
  • protocol/leiosnotify/server.go (1 hunks)
  • protocol/localstatequery/client.go (6 hunks)
  • protocol/localstatequery/client_test.go (3 hunks)
  • protocol/localtxmonitor/client.go (15 hunks)
  • protocol/localtxmonitor/client_test.go (2 hunks)
  • protocol/localtxsubmission/client.go (6 hunks)
  • protocol/localtxsubmission/client_test.go (2 hunks)
  • protocol/localtxsubmission/localtxsubmission.go (2 hunks)
  • protocol/peersharing/client.go (4 hunks)
  • protocol/peersharing/client_test.go (1 hunks)
  • protocol/peersharing/server.go (4 hunks)
  • protocol/peersharing/server_test.go (1 hunks)
  • protocol/protocol.go (8 hunks)
  • protocol/txsubmission/server.go (10 hunks)
  • protocol/txsubmission/server_concurrency_test.go (1 hunks)
  • protocol/txsubmission/server_test.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (14)
  • protocol/blockfetch/blockfetch.go
  • protocol/blockfetch/server.go
  • protocol/txsubmission/server_concurrency_test.go
  • protocol/chainsync/chainsync.go
  • protocol/leiosnotify/client_test.go
  • protocol/leiosnotify/client_concurrency_test.go
  • protocol/localtxsubmission/client_test.go
  • protocol/peersharing/client.go
  • protocol/keepalive/client_test.go
  • protocol/leiosfetch/server.go
  • protocol/peersharing/client_test.go
  • protocol/localtxsubmission/localtxsubmission.go
  • protocol/blockfetch/client_test.go
  • protocol/chainsync/client_concurrency_test.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-04T15:54:22.683Z
Learnt from: wolf31o2
Repo: blinklabs-io/gouroboros PR: 1093
File: ledger/mary/pparams.go:143-150
Timestamp: 2025-11-04T15:54:22.683Z
Learning: In the blinklabs-io/gouroboros repository, the design goal for CBOR round-trip tests is to achieve byte-identical encoding WITHOUT using stored CBOR (cbor.DecodeStoreCbor). Instead, the approach uses proper field types (pointers for optional fields) and relies on the cbor package's deterministic encoding (SortCoreDeterministic) to ensure reproducible output. The stored CBOR pattern should not be suggested as a solution for round-trip fidelity in this codebase.

Applied to files:

  • protocol/localstatequery/client_test.go
🧬 Code graph analysis (16)
protocol/protocol.go (3)
protocol/message.go (1)
  • Message (18-22)
cbor/encode.go (1)
  • Encode (27-40)
muxer/segment.go (1)
  • NewSegment (48-69)
protocol/leiosfetch/client.go (5)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/leiosfetch/leiosfetch.go (1)
  • ProtocolName (26-26)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/blockfetch/client.go (1)
  • Client (30-41)
protocol/chainsync/client.go (1)
  • Client (29-47)
protocol/localstatequery/client_test.go (10)
protocol/blockfetch/client_test.go (1)
  • TestClientShutdown (211-230)
protocol/chainsync/client_test.go (1)
  • TestClientShutdown (285-304)
protocol/keepalive/client_test.go (1)
  • TestClientShutdown (303-322)
protocol/leiosnotify/client_test.go (1)
  • TestClientShutdown (117-136)
protocol/localtxmonitor/client_test.go (1)
  • TestClientShutdown (300-319)
protocol/localtxsubmission/client_test.go (1)
  • TestClientShutdown (167-186)
protocol/peersharing/client_test.go (1)
  • TestClientShutdown (95-114)
connection.go (1)
  • Connection (59-103)
protocol/localstatequery/localstatequery.go (1)
  • LocalStateQuery (116-119)
protocol/localstatequery/client.go (1)
  • Client (30-47)
protocol/leiosnotify/client.go (9)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/chainsync/client.go (1)
  • Client (29-47)
protocol/leiosfetch/client.go (1)
  • Client (26-38)
protocol/localstatequery/client.go (1)
  • Client (30-47)
protocol/localtxmonitor/client.go (1)
  • Client (25-42)
protocol/localtxsubmission/client.go (1)
  • Client (26-37)
protocol/peersharing/client.go (1)
  • Client (25-35)
protocol/leiosnotify/messages.go (1)
  • NewMsgDone (149-156)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/localtxsubmission/client.go (5)
protocol/localtxsubmission/localtxsubmission.go (2)
  • Config (81-84)
  • CallbackContext (87-91)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/localstatequery/client.go (1)
  • Client (30-47)
protocol/localtxsubmission/messages.go (1)
  • MessageTypeDone (29-29)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/chainsync/client_test.go (4)
protocol/chainsync/chainsync.go (1)
  • ChainSync (204-207)
protocol/chainsync/client.go (1)
  • Client (29-47)
protocol/blockfetch/client_test.go (1)
  • TestClientShutdown (211-230)
connection.go (1)
  • Connection (59-103)
protocol/peersharing/server_test.go (5)
protocol/txsubmission/server_test.go (1)
  • TestServerShutdown (28-82)
connection.go (2)
  • NewConnection (107-130)
  • Connection (59-103)
connection_options.go (3)
  • WithConnection (36-40)
  • WithNetworkMagic (50-54)
  • WithNodeToNode (78-82)
protocol/peersharing/peersharing.go (1)
  • PeerSharing (67-70)
protocol/peersharing/server.go (1)
  • Server (26-32)
protocol/keepalive/client.go (3)
protocol/blockfetch/client.go (1)
  • Client (30-41)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/keepalive/messages.go (1)
  • NewMsgDone (94-101)
protocol/blockfetch/client.go (3)
protocol/protocol.go (2)
  • Protocol (39-60)
  • New (122-133)
protocol/blockfetch/blockfetch.go (1)
  • New (156-162)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/peersharing/server.go (3)
protocol/txsubmission/server.go (1)
  • Server (28-41)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/peersharing/peersharing.go (1)
  • ProtocolName (27-27)
protocol/txsubmission/server_test.go (5)
protocol/peersharing/server_test.go (1)
  • TestServerShutdown (28-82)
connection.go (2)
  • NewConnection (107-130)
  • Connection (59-103)
connection_options.go (3)
  • WithConnection (36-40)
  • WithNetworkMagic (50-54)
  • WithNodeToNode (78-82)
protocol/txsubmission/txsubmission.go (1)
  • TxSubmission (126-129)
protocol/txsubmission/server.go (1)
  • Server (28-41)
protocol/chainsync/client.go (10)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/blockfetch/client.go (1)
  • Client (30-41)
protocol/leiosfetch/client.go (1)
  • Client (26-38)
protocol/leiosnotify/client.go (1)
  • Client (24-34)
protocol/localstatequery/client.go (1)
  • Client (30-47)
protocol/localtxmonitor/client.go (1)
  • Client (25-42)
protocol/localtxsubmission/client.go (1)
  • Client (26-37)
protocol/peersharing/client.go (1)
  • Client (25-35)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/chainsync/error.go (1)
  • ErrSyncCancelled (26-26)
protocol/localstatequery/client.go (5)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/localtxmonitor/client.go (1)
  • Client (25-42)
protocol/localstatequery/messages.go (4)
  • NewMsgDone (245-252)
  • AcquireFailurePointNotOnChain (44-44)
  • MsgResult (172-175)
  • NewMsgQuery (160-170)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/localstatequery/error.go (2)
  • ErrAcquireFailurePointTooOld (20-20)
  • ErrAcquireFailurePointNotOnChain (23-25)
protocol/txsubmission/server.go (5)
protocol/txsubmission/messages.go (2)
  • TxBody (197-201)
  • NewMsgRequestTxIds (68-82)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/txsubmission/txsubmission.go (2)
  • ProtocolName (27-27)
  • MaxAckCount (143-143)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/txsubmission/error.go (1)
  • ErrStopServerProcess (21-21)
protocol/keepalive/keepalive.go (3)
protocol/keepalive/messages.go (1)
  • MessageTypeDone (30-30)
protocol/state.go (1)
  • NewState (38-43)
protocol/blockfetch/blockfetch.go (1)
  • StateDone (46-46)
protocol/localtxmonitor/client_test.go (3)
connection.go (1)
  • Connection (59-103)
protocol/localtxmonitor/localtxmonitor.go (1)
  • LocalTxMonitor (112-115)
protocol/localtxmonitor/client.go (1)
  • Client (25-42)
🔇 Additional comments (43)
protocol/chainsync/error.go (1)

25-26: New ErrSyncCancelled constant looks good

The added error and comment are consistent with the existing error style in this package and give a clear sentinel for cancelled sync operations. No changes needed.

protocol/localstatequery/client_test.go (1)

25-25: LocalStateQuery shutdown test and timeout adjustments are consistent

  • Using the ouroboros alias aligns this file with other protocol tests.
  • Bumping the runTest timeout to 5s should help avoid flakiness while still catching hangs.
  • TestClientShutdown follows the same Start/Stop and error-check pattern used by other protocol clients and gives good coverage of the new lifecycle behavior.

No issues from this side.

Also applies to: 112-112, 357-376

protocol/peersharing/server_test.go (1)

28-81: PeerSharing server shutdown test matches established pattern

The new TestServerShutdown mirrors the txsubmission server test pattern (mock NtN connection, Start/Stop, async error monitoring, shutdown timeouts, goleak). Error from Server.Stop() is now checked, which aligns with prior feedback and avoids masking shutdown failures. Skipping the test while NtN mock issues are unresolved is reasonable.

Looks solid.

protocol/leiosfetch/client.go (1)

242-285: Shutdown-aware handlers look correct

The updated handlers that select on c.DoneChan() vs sending to the result channels are well-structured:

  • They avoid blocking indefinitely during shutdown.
  • They prevent send-on-closed-channel panics because channels are only closed after DoneChan fires.
  • Returning protocol.ErrProtocolShuttingDown aligns with the caller-side logic that interprets closed channels/shutdown consistently.

No further changes needed here.
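
For readers skimming the diff, the pattern reduces to a small kernel; a sketch with illustrative names (done mirrors DoneChan(), deliver stands in for the handlers):

package leiossketch

import "errors"

var errShuttingDown = errors.New("protocol is shutting down")

type client struct {
	done       chan struct{} // closed when the protocol shuts down
	resultChan chan []byte   // closed only after done fires
}

// deliver either hands the result to the waiting caller or observes
// shutdown; because resultChan is closed strictly after done, the send
// can never hit a closed channel.
func (c *client) deliver(res []byte) error {
	select {
	case <-c.done:
		return errShuttingDown
	case c.resultChan <- res:
		return nil
	}
}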

protocol/leiosnotify/client.go (1)

31-33: LeiosNotify client lifecycle and shutdown logic now looks robust

The revised client lifecycle handling addresses the earlier race conditions:

  • stateMutex + started/stopped ensure:
    • Stop-before-Start is safe and prevents later Start.
    • Start() and Stop() don’t race on the state fields.
  • Stop() sends MsgDone only when started is true and defers closing requestNextChan until DoneChan when the protocol was running, which avoids send-on-closed panics while still unblocking RequestNext with ErrProtocolShuttingDown.
  • The handlers’ select on DoneChan() vs requestNextChan mirrors the shutdown-aware pattern used in other protocols and correctly short-circuits when the protocol is shutting down.

This implementation looks correct and aligned with the broader lifecycle pattern in the PR.

Also applies to: 73-90, 93-123, 160-194

protocol/blockfetch/client.go (3)

21-21: LGTM: Lifecycle tracking with atomic.Bool is thread-safe.

The addition of started atomic.Bool and its use in Start() correctly tracks whether the protocol was started, enabling proper conditional cleanup in Stop(). Using atomic.Bool avoids data races without needing additional mutex synchronization.

Also applies to: 40-40, 98-98
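
A condensed sketch of this lifecycle guard, with illustrative names and the protocol plumbing elided to comments:

package bfsketch

import (
	"sync"
	"sync/atomic"
)

type client struct {
	onceStart sync.Once
	onceStop  sync.Once
	started   atomic.Bool
	results   chan []byte
}

func (c *client) Start() {
	c.onceStart.Do(func() {
		c.started.Store(true)
		// ... start the underlying protocol here ...
	})
}

// Stop consults started to choose between deferred and immediate
// cleanup: a running protocol gets its channels closed only after
// shutdown completes; a never-started one is cleaned up inline.
func (c *client) Stop() {
	c.onceStop.Do(func() {
		if c.started.Load() {
			// send Done, stop the protocol, then close c.results
			// once the protocol's done channel fires
		} else {
			close(c.results) // nothing is running; safe to close now
		}
	})
}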


114-129: LGTM: Shutdown sequencing is correct and race-free.

The Stop() implementation properly:

  • Handles SendMessage errors without losing them
  • Always calls Protocol.Stop() to signal muxer shutdown
  • Defers channel closure until protocol fully stops (after DoneChan signals), avoiding send-on-closed panics
  • Closes channels immediately if never started, preventing goroutine leaks

This resolves the previously identified deadlock and race conditions.


237-241: LGTM: Shutdown-aware channel operations prevent panics.

All message handlers correctly use select with DoneChan() to avoid sending to closed channels during shutdown. This pattern ensures handlers return ErrProtocolShuttingDown instead of panicking when the protocol is stopping.

Also applies to: 254-259, 309-313

protocol/localtxsubmission/client.go (4)

34-36: LGTM: Proper lifecycle state tracking with mutex protection.

The addition of stateMutex, started, and closeSubmitResultOnce fields provides thread-safe lifecycle management. Start() correctly guards the started flag with stateMutex, preventing data races between Start/Stop called from different goroutines.

Also applies to: 79-88


97-123: LGTM: Stop() properly sequences shutdown and prevents channel closure races.

Stop() correctly:

  • Guards state changes with stateMutex
  • Handles SendMessage errors gracefully
  • Always calls Protocol.Stop() to ensure muxer cleanup
  • Defers channel closure until protocol shutdown (only if started)
  • Uses closeSubmitResultChan() helper to ensure exactly-once closure

This prevents the previously identified TOCTOU race between channel checks and closure.


157-158: LGTM: Server-initiated shutdown properly handled.

The addition of MessageTypeDone handling and handleDone() allows the server to cleanly shut down the client by closing the result channel. This complements the client-initiated Stop() path.

Also applies to: 210-221


177-181: LGTM: Message handlers use shutdown-aware channel operations.

Both handleAcceptTx() and handleRejectTx() correctly use select with DoneChan() to avoid sending to closed channels during shutdown, preventing panics and ensuring clean shutdown behavior.

Also applies to: 202-206

protocol/localtxmonitor/client.go (4)

30-30: LGTM: Thread-safe lifecycle management with multiple mutexes.

The addition of acquiredMutex, stateMutex, started, and stopped fields provides proper synchronization for lifecycle state. Start() correctly guards the started flag, preventing races between concurrent Start/Stop calls.

Also applies to: 39-41, 91-101


109-145: LGTM: Stop() avoids deadlocks with careful lock ordering.

Stop() correctly:

  • Marks stopped under stateMutex first
  • Acquires busyMutex only to serialize with SendMessage
  • Releases busyMutex before calling Protocol.Stop() - crucial to avoid potential deadlocks
  • Defers channel closure until protocol shutdown completes

The lock release before Protocol.Stop() is especially important and properly implemented.
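
The ordering is worth spelling out; a minimal sketch with stand-in types (proto models the embedded Protocol, and the MsgDone send is elided to a comment):

package ltmsketch

import "sync"

type proto struct{}

func (p *proto) Stop() error { return nil }

type client struct {
	stateMutex sync.Mutex
	busyMutex  sync.Mutex
	stopped    bool
	protocol   *proto
}

// Stop marks the client stopped, briefly serializes with any in-flight
// operation via busyMutex, then calls protocol.Stop with no client
// locks held, so shutdown can never deadlock against a handler that
// needs one of those locks.
func (c *client) Stop() error {
	c.stateMutex.Lock()
	c.stopped = true
	c.stateMutex.Unlock()

	c.busyMutex.Lock()
	// ... send MsgDone while holding busyMutex ...
	c.busyMutex.Unlock()

	return c.protocol.Stop() // no client locks held here
}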


152-157: LGTM: Operations guard against post-shutdown invocation.

All public operations (Acquire, Release, HasTx, NextTx, GetSizes) correctly check the stopped flag under stateMutex and return ErrProtocolShuttingDown early, preventing wasted work and ensuring consistent error reporting after Stop().

Also applies to: 173-178, 194-199, 231-236, 268-273


210-213: LGTM: Acquired state properly synchronized.

All reads and writes to the acquired boolean are correctly guarded by acquiredMutex, and message handlers use select with DoneChan() for shutdown-aware channel operations. This prevents races on the acquired state and ensures clean shutdown.

Also applies to: 247-250, 284-287, 333-341, 414-416

protocol/localtxmonitor/client_test.go (2)

93-93: LGTM: Timeout increase accommodates shutdown complexity.

Increasing the mock connection timeout from 2 to 5 seconds is reasonable given the added shutdown synchronization across the PR, which may require more time in test environments.


300-319: LGTM: New test validates client lifecycle.

TestClientShutdown properly exercises the Start/Stop lifecycle with a minimal handshake conversation, ensuring the client can be started and stopped cleanly without leaking goroutines (verified by goleak).

protocol/localstatequery/client.go (5)

38-38: LGTM: Proper synchronization for lifecycle state.

The addition of acquiredMutex, onceStop, stateMutex, and started provides thread-safe lifecycle management. Start() correctly guards the started flag with stateMutex, matching the pattern used in other protocol clients.

Also applies to: 44-46, 110-114


117-149: LGTM: Stop() properly introduced with correct sequencing.

The new Stop() method correctly:

  • Sends Done message with error handling
  • Calls Protocol.Stop() to ensure muxer cleanup
  • Defers channel closure until protocol shutdown (only if started)
  • Returns any SendMessage error to the caller

This resolves the previously identified issue where Stop() was missing and DoneChan would never close.


152-175: LGTM: Nil routing and explicit acquire methods improve API usability.

Acquire() now accepts nil and routes to AcquireVolatileTip(), providing a sensible default. The addition of explicit AcquireVolatileTip() and AcquireImmutableTip() methods gives users more control over acquisition targets while maintaining backward compatibility.

Also applies to: 177-205
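
A short usage sketch of the resulting API; the module path and error-returning signatures are assumed from context here, not verified:

package lsqsketch

import (
	lsq "github.com/blinklabs-io/gouroboros/protocol/localstatequery"
)

// acquireVariants shows the three entry points; Acquire(nil) now
// behaves like AcquireVolatileTip, and re-acquiring moves the
// acquired point.
func acquireVariants(c *lsq.Client) error {
	if err := c.Acquire(nil); err != nil { // defaults to the volatile tip
		return err
	}
	if err := c.AcquireVolatileTip(); err != nil { // explicit volatile tip
		return err
	}
	return c.AcquireImmutableTip() // pin finalized state instead
}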


913-922: LGTM: Message handlers use shutdown-aware operations.

All handlers (handleAcquired, handleFailure, handleResult) correctly:

  • Guard shared state (acquired) with acquiredMutex
  • Use select with DoneChan() to avoid sending to closed channels during shutdown

This ensures clean shutdown behavior and prevents panics.

Also applies to: 936-946, 961-967


971-1006: LGTM: Internal helpers properly synchronized.

The acquire(), release(), and runQuery() helpers correctly read and write the acquired state under acquiredMutex, ensuring thread-safe coordination of the acquisition lifecycle.

Also applies to: 1013-1017, 1022-1029

protocol/chainsync/server.go (1)

245-247: LGTM: Propagating Stop() error makes restart failures observable.

The change to check and return the error from Stop() before reinitializing and restarting the protocol ensures that shutdown failures are properly surfaced to callers rather than silently ignored. This aligns with the broader error-propagation improvements across the PR.

protocol/keepalive/keepalive.go (1)

77-80: LGTM: State machine updated to handle server-side Done transition.

Adding the MessageTypeDone transition from StateServer to StateDone provides an explicit completion path for server-initiated shutdown. This complements the existing client-initiated Done path and aligns with the broader shutdown handling improvements across the PR.

protocol/chainsync/client_test.go (3)

83-88: LGTM: Test cleanup prevents goroutine leaks.

Adding client.Stop() to the runTest cleanup ensures the ChainSync client is properly stopped after each test, preventing goroutine leaks. Logging Stop() errors rather than failing the test is appropriate for cleanup code, as tests may have already completed their assertions.


95-95: LGTM: Timeout increase accommodates shutdown complexity.

Increasing the timeout from 2 to 5 seconds is consistent with similar changes across other protocol tests and accommodates the additional shutdown synchronization introduced in this PR.


285-304: LGTM: New test validates client shutdown behavior.

TestClientShutdown properly exercises the Start/Stop lifecycle with a minimal handshake conversation, ensuring the ChainSync client can be started and stopped cleanly. The test correctly asserts that Stop() succeeds without error.

protocol/leiosnotify/server.go (1)

109-123: handleDone now correctly propagates Stop() errors

Calling s.Stop() and returning its error before reinitializing ensures any future Stop failures abort the restart sequence instead of being silently ignored. Given Protocol.Stop() is currently infallible, this is a safe, forward‑compatible change.

protocol/txsubmission/server_test.go (1)

28-82: Server shutdown test scaffold looks good

The shutdown test wiring (mock connection, async error funnel, time‑bounded waits, goleak gate) matches the existing pattern used for other protocols and should be effective once the NtN mock issues are resolved and the skip is removed.

protocol/peersharing/server.go (2)

17-32: Server.Stop is now idempotent and propagates protocol errors

Adding onceStop and returning the result of s.Protocol.Stop() gives you an idempotent server‑level Stop with proper error propagation, which aligns with the other mini‑protocols.

Also applies to: 49-55


119-135: handleDone restart path avoids onceStop interaction

Switching handleDone to call s.Protocol.Stop() directly (with a nil‑check) before re‑initializing and starting the protocol fixes the earlier mismatch between Server.onceStop and per‑restart shutdown, while still surfacing Stop errors if they ever occur.

protocol/chainsync/client.go (3)

28-47: Start/Stop lifecycle flags correctly prevent unsafe reuse

Using atomic started/stopped to:

  • Block Start() after Stop(), and
  • Only send MsgDone/close readyForNextBlockChan when the protocol has actually started

gives you a clean, idempotent lifecycle and removes the prior risk of closing readyForNextBlockChan while handlers were still active.

Also applies to: 120-177


356-366: GetAvailableBlockRange now differentiates shutdown vs. cancellation

Handling:

  • ok == false as protocol.ErrProtocolShuttingDown, and
  • ready == false as ErrSyncCancelled

makes the caller’s error semantics much clearer. Just ensure upstream users treat ErrSyncCancelled as a “local abort” and ErrProtocolShuttingDown as a global shutdown signal.
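
A caller-side sketch of that guidance, assuming the module path github.com/blinklabs-io/gouroboros:

package cssketch

import (
	"errors"

	"github.com/blinklabs-io/gouroboros/protocol"
	"github.com/blinklabs-io/gouroboros/protocol/chainsync"
)

// classify separates a local cancellation, which only aborts this sync
// request, from a global shutdown, after which the connection's
// protocols should not be reused.
func classify(err error) string {
	switch {
	case errors.Is(err, chainsync.ErrSyncCancelled):
		return "sync cancelled: local abort, caller may retry"
	case errors.Is(err, protocol.ErrProtocolShuttingDown):
		return "protocol shutting down: release the connection"
	default:
		return "other error"
	}
}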


750-769: Done-aware sends to readyForNextBlockChan fix send-on-closed races

Wrapping writes to readyForNextBlockChan in:

select {
case <-c.DoneChan():
    return protocol.ErrProtocolShuttingDown
case c.readyForNextBlockChan <- {true|false}:
}

ensures handlers either respect shutdown or safely signal readiness/cancellation, eliminating the earlier race where Stop() could close the channel while message handlers were still sending.

Also applies to: 792-808

protocol/protocol.go (4)

175-193: Stop() API now future‑proof while remaining effectively no‑op

Changing Stop() to return error but keeping it infallible for now gives you room to introduce failure modes later without another API break. Callers that either ignore or log the error remain correct under the current implementation.


225-257: Error path Stop() invocation in SendMessage is consistent

On pending‑byte limit violations, calling p.SendError(ErrProtocolViolationQueueExceeded) and then _ = p.Stop() is consistent with treating this as a protocol‑fatal condition while keeping the Stop error non‑disruptive for callers.


272-307: sendLoop shutdown/drain behavior resolves prior MsgDone race

The new shuttingDown path that drains sendQueueChan whenever recvDoneChan is closed ensures all queued messages (including MsgDone) are flushed before exiting, instead of relying on racy len() checks to decide when to stop. The remaining len(p.sendQueueChan) use is now only for batching and no longer affects correctness.

Also applies to: 309-388


510-515: readLoop’s Stop() on recv‑queue violations matches send side

On receive‑queue limit violations, emitting ErrProtocolViolationQueueExceeded and then calling _ = p.Stop() mirrors the send side behavior and keeps the protocol shutdown semantics consistent.

protocol/txsubmission/server.go (4)

27-41: Server lifecycle fields and Stop()/IsStopped() coordination look solid

The combination of:

  • ackCount as int32 with atomics,
  • a dedicated done channel protected by doneMutex,
  • onceStop for idempotent Stop, and
  • restartMutex + stopped flag

gives the server a clear lifecycle model. Stop() now safely coordinates with restarts (via restartMutex) and avoids the previous double‑close race on done.

Also applies to: 49-57, 93-130


156-171: RequestTxIds now has race‑free ackCount and restart‑aware waits

Using atomic.LoadInt32/StoreInt32 for ackCount removes the earlier data race, and taking a snapshot of requestTxIdsResultChan under restartMutex before waiting ensures calls either see a result for the current protocol instance or cleanly bail out via the done channel on shutdown.

Also applies to: 178-195
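
Reduced to its essentials, the snapshot-then-select shape looks like this (illustrative fields, not the exact server struct):

package txssketch

import (
	"errors"
	"sync"
)

var errShuttingDown = errors.New("protocol is shutting down")

type server struct {
	restartMutex sync.Mutex
	resultChan   chan []byte // swapped for a fresh channel on restart

	doneMutex sync.Mutex
	done      chan struct{}
}

// doneChan returns the current done channel under its own mutex, since
// restarts replace it.
func (s *server) doneChan() chan struct{} {
	s.doneMutex.Lock()
	defer s.doneMutex.Unlock()
	return s.done
}

// request snapshots resultChan under restartMutex so the wait below is
// tied to the protocol instance that existed when the request began,
// even if a concurrent restart swaps the field afterwards.
func (s *server) request() ([]byte, error) {
	s.restartMutex.Lock()
	resultChan := s.resultChan
	s.restartMutex.Unlock()

	select {
	case <-s.doneChan():
		return nil, errShuttingDown
	case res := <-resultChan:
		return res, nil
	}
}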


219-230: RequestTxs adopts the same guarded wait semantics

Mirroring the RequestTxIds pattern—snapshotting requestTxsResultChan under restartMutex and racing it against doneChan()—makes RequestTxs restart/shutdown‑aware in the same way, which is good for consistency.


263-268: Reply handlers synchronize with restarts via restartMutex

Locking restartMutex while sending to requestTxIdsResultChan / requestTxsResultChan ensures these sends are coordinated with channel reallocation in handleDone, removing the prior risk of writing into a channel that has just been swapped during a restart.

Also applies to: 280-283

@wolf31o2 wolf31o2 force-pushed the perf/protocols-improvements branch from 20d440a to 45dcdee Compare November 17, 2025 04:04

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
protocol/localstatequery/client.go (1)

29-47: Stop() can block forever if called before Start(); guard it on started.

The new stateMutex/started state fixes the prior data race, but Stop() still assumes the underlying protocol has been started:

  • Start() initializes internal protocol state and sets started = true under stateMutex.
  • Stop() always does msg := NewMsgDone(); err = c.SendMessage(msg) before checking started (and always calls c.Protocol.Stop()).

If Stop() is called before Start(), SendMessage will send on a nil sendQueueChan and block indefinitely, since Protocol.Start() was never called to create the send queue or run sendLoop. That’s the same class of issue that was addressed in other clients via started/atomic.Bool guards.

To make Stop safe and idempotent regardless of call order, gate the Done send and Protocol.Stop on started, for example:

func (c *Client) Stop() error {
    var err error
    c.onceStop.Do(func() {
        c.Protocol.Logger().
            Debug("stopping client protocol",
                "component", "network",
                "protocol", ProtocolName,
                "connection_id", c.callbackContext.ConnectionId.String(),
            )

-       msg := NewMsgDone()
-       if err = c.SendMessage(msg); err != nil {
-           return
-       }
-       _ = c.Protocol.Stop() // Error ignored - method returns SendMessage error
-       // Defer closing channels until protocol fully shuts down (only if started)
-       c.stateMutex.Lock()
-       started := c.started
-       c.stateMutex.Unlock()
-       if started {
+       c.stateMutex.Lock()
+       started := c.started
+       c.stateMutex.Unlock()
+       if started {
+           msg := NewMsgDone()
+           if err = c.SendMessage(msg); err != nil {
+               return
+           }
+           _ = c.Protocol.Stop()
            go func() {
                <-c.DoneChan()
                close(c.queryResultChan)
                close(c.acquireResultChan)
            }()
        } else {
            // If protocol was never started, close channels immediately
            close(c.queryResultChan)
            close(c.acquireResultChan)
        }
    })
    return err
}

This keeps current behaviour for normal Start→Stop flows, and makes Stop safe to call even if Start was never invoked or failed early.

Also applies to: 102-115, 117-149

protocol/localtxmonitor/client.go (1)

24-42: Fix data race on started and Stop‑before‑Start behavior in LocalTxMonitor client

There are two tightly related issues in the LocalTxMonitor client:

  1. Data race on started

    • Start() writes c.started under stateMutex.
    • Stop() reads c.started in if c.started { ... } and in the channel‑closure branch without any lock.
      This is a classic Go data race (unsynchronized read/write of the same variable across goroutines), and it can also cause Stop() to see a stale value and make the wrong decision about sending MsgDone or deferring channel closure.
  2. Stop‑before‑Start still sends MsgDone, risking the same hang fixed elsewhere
    If Stop() is called before Start(), c.started is actually still false, but due to the race it may be read as either value. When Stop() sends MsgDone while the protocol has never been started, it can block on an undrained send queue; this is the situation other clients in this PR now avoid by checking a started flag before sending Done.

You can address both issues and align this client with the patterns used in e.g. leiosfetch by:

  • Reading started under stateMutex into a local variable in Stop(), and using that local for both the MsgDone send and the channel‑closure decision.
  • Only sending MsgDone when started is true.
  • Adding a stopped check in Start() so Start-after-Stop is a no-op instead of starting a protocol whose result channels may already be closed.

Concrete diff:

 func (c *Client) Start() {
 	c.onceStart.Do(func() {
-		c.stateMutex.Lock()
-		defer c.stateMutex.Unlock()
-
-		c.Protocol.Logger().
-			Debug("starting client protocol",
-				"component", "network",
-				"protocol", ProtocolName,
-				"connection_id", c.callbackContext.ConnectionId.String(),
-			)
-		c.started = true
-		c.Protocol.Start()
+		c.stateMutex.Lock()
+		if c.stopped {
+			// Do not start a client that has already been stopped
+			c.stateMutex.Unlock()
+			return
+		}
+		c.started = true
+		c.stateMutex.Unlock()
+
+		c.Protocol.Logger().
+			Debug("starting client protocol",
+				"component", "network",
+				"protocol", ProtocolName,
+				"connection_id", c.callbackContext.ConnectionId.String(),
+			)
+		c.Protocol.Start()
 	})
 }
 
 // Stop transitions the protocol to the Done state. No more operations will be possible
 func (c *Client) Stop() error {
 	var err error
 	c.onceStop.Do(func() {
-		c.stateMutex.Lock()
-		c.stopped = true
-		c.stateMutex.Unlock()
+		c.stateMutex.Lock()
+		started := c.started
+		c.stopped = true
+		c.stateMutex.Unlock()
 
 		c.Protocol.Logger().
 			Debug("stopping client protocol",
 				"component", "network",
 				"protocol", ProtocolName,
 				"connection_id", c.callbackContext.ConnectionId.String(),
 			)
-		c.busyMutex.Lock()
-		msg := NewMsgDone()
-		if err = c.SendMessage(msg); err != nil {
-			c.busyMutex.Unlock()
-			return
-		}
-		c.busyMutex.Unlock()
+		if started {
+			c.busyMutex.Lock()
+			msg := NewMsgDone()
+			if err = c.SendMessage(msg); err != nil {
+				c.busyMutex.Unlock()
+				return
+			}
+			c.busyMutex.Unlock()
+		}
 
-		// Call Protocol.Stop() after releasing locks to avoid potential deadlocks
-		_ = c.Protocol.Stop()
+		// Call Protocol.Stop() after releasing locks to avoid potential deadlocks
+		_ = c.Protocol.Stop()
 
-		// Defer closing channels until protocol fully shuts down (only if started)
-		if c.started {
+		// Defer closing channels until protocol fully shuts down (only if started)
+		if started {
 			go func() {
 				<-c.DoneChan()
 				close(c.acquireResultChan)
 				close(c.hasTxResultChan)
 				close(c.nextTxResultChan)
 				close(c.getSizesResultChan)
 			}()
 		} else {
 			// If protocol was never started, close channels immediately
 			close(c.acquireResultChan)
 			close(c.hasTxResultChan)
 			close(c.nextTxResultChan)
 			close(c.getSizesResultChan)
 		}
 	})
 	return err
 }

With this change:

  • All accesses to started are synchronized via stateMutex, eliminating the race.
  • Stop() skips MsgDone when the client was never started, preventing the Stop-before-Start send-blocking issue.
  • Start() becomes a no-op after Stop(), matching the semantics in other updated clients and avoiding surprising restarts.

Also applies to: 89-103, 105-148, 150-301, 324-343, 345-394, 396-418

♻️ Duplicate comments (1)
protocol/localtxsubmission/client.go (1)

150-167: Done handling and result channel shutdown fix the prior TOCTOU risk

Adding MessageTypeDone to messageHandler, implementing handleDone to close submitResultChan once, and using select on <-c.DoneChan() vs c.submitResultChan <- … in both handleAcceptTx and handleRejectTx resolves the earlier send‑on‑closed‑channel race: sends now either complete or return ErrProtocolShuttingDown, and the channel is only closed via closeSubmitResultChan()’s sync.Once guard after shutdown. This is consistent with the updated state machine and the new SubmitTx server‑shutdown test.

Also applies to: 169-208, 210-227
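
Isolated, the exactly-once close is just this (illustrative struct; the real client also tracks lifecycle state):

package ltssketch

import "sync"

type client struct {
	closeSubmitResultOnce sync.Once
	submitResultChan      chan error
}

// closeSubmitResultChan may be reached from both the client-initiated
// Stop path and the server-initiated Done path; sync.Once guarantees
// the channel is closed exactly once regardless of which path wins.
func (c *client) closeSubmitResultChan() {
	c.closeSubmitResultOnce.Do(func() {
		close(c.submitResultChan)
	})
}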

🧹 Nitpick comments (7)
protocol/localtxsubmission/client.go (1)

25-37: Start/Stop lifecycle and locking look sound; Stop waits for in‑flight submits

The added onceStart/onceStop plus stateMutex/started tracking and closeSubmitResultOnce give a clear, single‑shot lifecycle with a safely closed submitResultChan. Holding busyMutex in Stop means it will wait for any in‑flight SubmitTx to finish rather than cancel it mid‑flight, which is reasonable but means Stop can block if the server is stuck in Busy. If you ever need Stop to actively cancel a hung submit, you may want a separate cancellation path that doesn’t serialize on busyMutex.

Also applies to: 77-126

protocol/keepalive/client.go (1)

27-37: Lifecycle flags and Stop() implementation mostly look solid; consider guarding Protocol.Stop() on started.

Using onceStart/onceStop plus started as an atomic.Bool gives you safe, idempotent Start/Stop and prevents SendMessage on a nil sendQueueChan. Timer cleanup on both DoneChan() and explicit Stop() is also correct.

The one edge case left is calling Stop() before Start(): you correctly skip MsgDone when c.started.Load() is false, but you still call c.Protocol.Stop(), which unconditionally unregisters from the muxer. If a caller accidentally calls Stop() without ever starting the protocol, this relies on Muxer.UnregisterProtocol being a no-op for unregistered protocols.

If you want Stop to be completely safe regardless of call order (and symmetric with other clients), consider also gating Protocol.Stop() on started, e.g.:

-        if c.started.Load() {
-            msg := NewMsgDone()
-            sendErr := c.SendMessage(msg)
-            if sendErr != nil {
-                c.stopErr = sendErr
-            }
-        }
-        stopErr := c.Protocol.Stop()
+        if c.started.Load() {
+            msg := NewMsgDone()
+            sendErr := c.SendMessage(msg)
+            if sendErr != nil {
+                c.stopErr = sendErr
+            }
+            stopErr := c.Protocol.Stop()
+            if c.stopErr == nil {
+                c.stopErr = stopErr
+            }
+        }

[significant behavior change only if Stop-before-Start is used; otherwise semantics stay the same]

Also applies to: 76-99, 101-133

protocol/protocol.go (1)

175-193: sendLoop shutdown/drain behaviour is much safer now; consider optionally rejecting sends after shutdown.

The changes here are a clear improvement:

  • Stop() is idempotent (via onceStop) and centralizes muxer unregistration.
  • The new shuttingDown flag plus drain loops triggered by recvDoneChan ensure messages queued before shutdown are flushed using sendMessage, without relying on len() during shutdown.
  • Queue-limit violations on send/recv now trigger Stop() on the protocol, which is consistent with the error path.

One remaining semantic edge case is SendMessage calls after shutdown has begun: since sendQueueChan is never closed and SendMessage doesn’t consult doneChan, such calls can still succeed (enqueue) even though sendLoop has already drained and exited, so those messages will never be sent and no error is propagated.

If you want stronger guarantees, consider (in a follow-up, not necessarily this PR):

  • Guarding SendMessage with a fast check on doneChan / a shutdown flag (returning ErrProtocolShuttingDown), or
  • Closing sendQueueChan at shutdown start, so that any further sends panic and are caught early in tests.

As-is, the refactor is sound and materially improves shutdown behaviour; the above would just tighten post-shutdown semantics.

Also applies to: 225-258, 272-418, 420-443, 516-517

protocol/txsubmission/server_concurrency_test.go (2)

28-95: Concurrent Stop() test structure looks solid

The Start/Stop concurrency test is well‑structured: bounded goroutine count, WaitGroup, and a timeout guard to catch deadlocks. Once the NtN mock issues are resolved, this should give good coverage for idempotent, concurrent Server.Stop() behavior.


97-143: Stopped flag assertion is straightforward and matches the API

The Stop/IsStopped() test cleanly exercises the lifecycle flag and matches the intended semantics of a permanent server stop. With the test currently skipped due to mock limitations, it’s a good candidate to enable once the NtN mock behavior is fixed.

protocol/localtxmonitor/client_test.go (1)

300-319: Client shutdown test aligns with new lifecycle semantics

The new TestClientShutdown cleanly verifies that the LocalTxMonitor client can be started and stopped without error using the standard handshake conversation. This matches the shutdown coverage added for other protocols and should catch basic lifecycle regressions.

protocol/peersharing/client.go (1)

30-35: Lifecycle and shutdown wiring for PeerSharing client looks sound; consider a small consistency tweak

The Start/Stop pattern plus handleSharePeers's select on DoneChan() correctly prevents send-on-closed panics and avoids Start-after-Stop; the started/stopped flags are only touched under stateMutex, so there's no obvious race here.

If you want to match the patterns used in other clients (e.g., localtxmonitor, leiosfetch) and further reduce coupling between client locks and the underlying Protocol, you could release stateMutex before calling c.Protocol.Stop() (there’s no need to hold it across that call). Behavior wouldn’t change, but it slightly simplifies the locking story and keeps all Protocol.Stop() calls outside client-specific mutexes.

Also applies to: 75-95, 97-124, 160-175

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 20d440a and 45dcdee.

📒 Files selected for processing (34)
  • protocol/blockfetch/blockfetch.go (1 hunks)
  • protocol/blockfetch/client.go (7 hunks)
  • protocol/blockfetch/client_test.go (1 hunks)
  • protocol/blockfetch/server.go (1 hunks)
  • protocol/chainsync/chainsync.go (1 hunks)
  • protocol/chainsync/client.go (6 hunks)
  • protocol/chainsync/client_concurrency_test.go (1 hunks)
  • protocol/chainsync/client_test.go (2 hunks)
  • protocol/chainsync/error.go (1 hunks)
  • protocol/chainsync/server.go (1 hunks)
  • protocol/keepalive/client.go (4 hunks)
  • protocol/keepalive/client_test.go (1 hunks)
  • protocol/keepalive/keepalive.go (1 hunks)
  • protocol/leiosfetch/client.go (5 hunks)
  • protocol/leiosfetch/server.go (1 hunks)
  • protocol/leiosnotify/client.go (3 hunks)
  • protocol/leiosnotify/client_concurrency_test.go (1 hunks)
  • protocol/leiosnotify/client_test.go (1 hunks)
  • protocol/leiosnotify/server.go (1 hunks)
  • protocol/localstatequery/client.go (6 hunks)
  • protocol/localstatequery/client_test.go (3 hunks)
  • protocol/localtxmonitor/client.go (15 hunks)
  • protocol/localtxmonitor/client_test.go (2 hunks)
  • protocol/localtxsubmission/client.go (6 hunks)
  • protocol/localtxsubmission/client_test.go (2 hunks)
  • protocol/localtxsubmission/localtxsubmission.go (2 hunks)
  • protocol/peersharing/client.go (4 hunks)
  • protocol/peersharing/client_test.go (1 hunks)
  • protocol/peersharing/server.go (4 hunks)
  • protocol/peersharing/server_test.go (1 hunks)
  • protocol/protocol.go (8 hunks)
  • protocol/txsubmission/server.go (10 hunks)
  • protocol/txsubmission/server_concurrency_test.go (1 hunks)
  • protocol/txsubmission/server_test.go (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • protocol/peersharing/server_test.go
🚧 Files skipped from review as they are similar to previous changes (13)
  • protocol/blockfetch/server.go
  • protocol/chainsync/client_test.go
  • protocol/keepalive/keepalive.go
  • protocol/chainsync/client_concurrency_test.go
  • protocol/txsubmission/server_test.go
  • protocol/keepalive/client_test.go
  • protocol/chainsync/error.go
  • protocol/blockfetch/blockfetch.go
  • protocol/blockfetch/client.go
  • protocol/leiosnotify/client_concurrency_test.go
  • protocol/localstatequery/client_test.go
  • protocol/chainsync/chainsync.go
  • protocol/leiosnotify/client_test.go
🧰 Additional context used
🧬 Code graph analysis (17)
protocol/blockfetch/client_test.go (5)
protocol/chainsync/client_test.go (1)
  • TestClientShutdown (285-304)
protocol/keepalive/client_test.go (1)
  • TestClientShutdown (303-322)
connection.go (1)
  • Connection (59-103)
protocol/blockfetch/blockfetch.go (1)
  • BlockFetch (102-105)
protocol/blockfetch/client.go (1)
  • Client (30-41)
protocol/localtxmonitor/client_test.go (2)
connection.go (1)
  • Connection (59-103)
protocol/localtxmonitor/client.go (1)
  • Client (25-42)
protocol/peersharing/client.go (8)
protocol/keepalive/client.go (1)
  • Client (27-37)
protocol/leiosnotify/client.go (1)
  • Client (24-34)
protocol/localstatequery/client.go (1)
  • Client (30-47)
protocol/localtxmonitor/client.go (1)
  • Client (25-42)
protocol/localtxsubmission/client.go (1)
  • Client (26-37)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/peersharing/peersharing.go (1)
  • ProtocolName (27-27)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/leiosnotify/client.go (9)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/chainsync/client.go (1)
  • Client (29-47)
protocol/leiosfetch/client.go (1)
  • Client (26-38)
protocol/localstatequery/client.go (1)
  • Client (30-47)
protocol/localtxmonitor/client.go (1)
  • Client (25-42)
protocol/localtxsubmission/client.go (1)
  • Client (26-37)
protocol/peersharing/client.go (1)
  • Client (25-35)
protocol/leiosnotify/messages.go (1)
  • NewMsgDone (149-156)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/localtxsubmission/client_test.go (13)
protocol/blockfetch/client_test.go (1)
  • TestClientShutdown (211-230)
protocol/chainsync/client_test.go (1)
  • TestClientShutdown (285-304)
protocol/keepalive/client_test.go (1)
  • TestClientShutdown (303-322)
protocol/leiosnotify/client_test.go (1)
  • TestClientShutdown (117-136)
protocol/localstatequery/client_test.go (1)
  • TestClientShutdown (357-376)
protocol/localtxmonitor/client_test.go (1)
  • TestClientShutdown (300-319)
protocol/peersharing/client_test.go (1)
  • TestClientShutdown (95-114)
connection.go (1)
  • Connection (59-103)
protocol/localtxsubmission/localtxsubmission.go (1)
  • LocalTxSubmission (75-78)
protocol/localtxsubmission/client.go (1)
  • Client (26-37)
internal/test/helpers.go (1)
  • DecodeHexString (14-22)
ledger/babbage.go (1)
  • TxTypeBabbage (39-39)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/localtxsubmission/localtxsubmission.go (1)
protocol/localtxsubmission/messages.go (1)
  • MessageTypeDone (29-29)
protocol/localstatequery/client.go (5)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/localstatequery/localstatequery.go (1)
  • AcquireTarget (131-133)
protocol/localstatequery/messages.go (3)
  • NewMsgDone (245-252)
  • AcquireFailurePointNotOnChain (44-44)
  • MsgResult (172-175)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/localstatequery/error.go (2)
  • ErrAcquireFailurePointTooOld (20-20)
  • ErrAcquireFailurePointNotOnChain (23-25)
protocol/peersharing/client_test.go (5)
connection.go (2)
  • Connection (59-103)
  • NewConnection (107-130)
protocol/protocol.go (2)
  • ProtocolRoleClient (94-94)
  • New (122-133)
connection_options.go (3)
  • WithConnection (36-40)
  • WithNetworkMagic (50-54)
  • WithNodeToNode (78-82)
protocol/peersharing/peersharing.go (1)
  • PeerSharing (67-70)
protocol/peersharing/client.go (1)
  • Client (25-35)
protocol/chainsync/client.go (4)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/blockfetch/client.go (1)
  • Client (30-41)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/chainsync/error.go (1)
  • ErrSyncCancelled (26-26)
protocol/protocol.go (3)
protocol/message.go (1)
  • Message (18-22)
cbor/encode.go (1)
  • Encode (27-40)
muxer/segment.go (1)
  • NewSegment (48-69)
protocol/peersharing/server.go (2)
protocol/txsubmission/server.go (1)
  • Server (28-41)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/localtxsubmission/client.go (4)
protocol/localtxsubmission/localtxsubmission.go (3)
  • Config (81-84)
  • CallbackContext (87-91)
  • ProtocolName (27-27)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/localtxsubmission/messages.go (1)
  • MessageTypeDone (29-29)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/leiosfetch/client.go (4)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/blockfetch/client.go (1)
  • Client (30-41)
protocol/chainsync/client.go (1)
  • Client (29-47)
protocol/localtxmonitor/client.go (4)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/localstatequery/client.go (1)
  • Client (30-47)
protocol/localtxmonitor/messages.go (1)
  • NewMsgDone (79-86)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/txsubmission/server.go (5)
protocol/txsubmission/messages.go (1)
  • NewMsgRequestTxIds (68-82)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/txsubmission/txsubmission.go (1)
  • MaxAckCount (143-143)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/txsubmission/error.go (1)
  • ErrStopServerProcess (21-21)
protocol/txsubmission/server_concurrency_test.go (3)
connection_options.go (3)
  • WithConnection (36-40)
  • WithNetworkMagic (50-54)
  • WithNodeToNode (78-82)
protocol/txsubmission/txsubmission.go (1)
  • TxSubmission (126-129)
protocol/txsubmission/server.go (1)
  • Server (28-41)
protocol/keepalive/client.go (3)
protocol/blockfetch/client.go (1)
  • Client (30-41)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/keepalive/messages.go (1)
  • NewMsgDone (94-101)
🔇 Additional comments (14)
protocol/chainsync/server.go (1)

234-250: Propagating Stop error from handleDone looks correct

Returning the s.Stop() error instead of blindly restarting improves failure visibility and avoids reinitializing a protocol that failed to stop cleanly. The restart path is only taken on a clean Stop, which is a sensible guard.

protocol/localtxsubmission/localtxsubmission.go (1)

39-68: Done transitions in StateMap align with client/server shutdown semantics

Adding MessageTypeDone transitions from stateIdle (client) and stateBusy (server) into stateDone matches the new client MessageTypeDone handler and shutdown tests, giving the protocol an explicit completion path on both sides.

protocol/blockfetch/client_test.go (1)

210-230: New TestClientShutdown provides useful lifecycle coverage

The shutdown test mirrors the pattern used by other mini‑protocol tests and ensures Client.Start()/Client.Stop() work without additional traffic, which is a good smoke test for the new lifecycle logic.

protocol/localtxsubmission/client_test.go (2)

81-88: Extending the mock shutdown timeout to 5s is reasonable

Bumping the wait from 2s to 5s in the harness should make these tests less flaky under load while still keeping failures bounded in time.


167-216: New client lifecycle and server‑shutdown tests nicely cover the Done path

TestClientShutdown confirms the basic Start/Stop lifecycle for LocalTxSubmission, and TestSubmitTxServerShutdown validates that a server‑initiated Done causes SubmitTx to return ErrProtocolShuttingDown as intended. Together they exercise the new Done handling and result‑channel closure behavior end‑to‑end.

protocol/chainsync/client.go (1)

35-39: Shutdown-aware lifecycle and readyForNextBlockChan handling look correct and fix the prior race.

The combination of:

  • started/stopped as atomic.Bool with onceStart/onceStop,
  • Stop() sending MsgDone only when started, then closing readyForNextBlockChan after DoneChan() when started (or immediately when never started),
  • and the new Done-guarded select sends/receives on readyForNextBlockChan (in GetAvailableBlockRange, handleRollForward, and handleRollBackward),

gives you a clean, race-free shutdown path and clear error signalling (ErrProtocolShuttingDown vs ErrSyncCancelled). This resolves the previous write-after-close panic risk on readyForNextBlockChan without introducing obvious new concurrency hazards.

Also applies to: 120-177, 258-382, 623-769, 772-809

protocol/localstatequery/client.go (1)

37-42: acquired state and shutdown-aware handlers are correctly synchronized.

The introduction of acquiredMutex and its use in:

  • handleAcquired (setting c.acquired = true before signalling acquireResultChan),
  • acquire() / release() (reading and updating acquired),
  • and runQuery() (auto-acquiring a volatile tip when not acquired),

gives you race-free management of the acquired state.

The Done-aware selects in handleAcquired, handleFailure, and handleResult, combined with closing queryResultChan / acquireResultChan after DoneChan() in Stop, ensure:

  • No sends occur after channels are closed.
  • Callers of acquire()/runQuery() see ErrProtocolShuttingDown consistently when the client is stopping.

Overall this is a solid concurrency cleanup for LocalStateQuery.

Also applies to: 905-923, 925-951, 970-1006, 1008-1017, 1020-1041
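
A minimal standalone sketch of the mutex-guarded acquired handling described above, using illustrative names rather than the real client's fields:

package main

import (
    "fmt"
    "sync"
)

// Simplified stand-in for the LocalStateQuery client's acquired tracking.
type client struct {
    acquiredMutex sync.Mutex
    acquired      bool
}

func (c *client) handleAcquired() {
    c.acquiredMutex.Lock()
    c.acquired = true // set before signalling acquireResultChan in the real code
    c.acquiredMutex.Unlock()
}

// runQuery auto-acquires first when no state has been acquired yet,
// mirroring the behavior the review describes.
func (c *client) runQuery() {
    c.acquiredMutex.Lock()
    needAcquire := !c.acquired
    c.acquiredMutex.Unlock()
    if needAcquire {
        fmt.Println("acquiring volatile tip before query")
        c.handleAcquired()
    }
    fmt.Println("running query against acquired state")
}

func main() {
    c := &client{}
    c.runQuery() // acquires, then queries
    c.runQuery() // already acquired, queries directly
}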

protocol/txsubmission/server.go (1)

27-41: Server lifecycle, restart, and ackCount handling look correct and fix earlier races.

Key points that look good here:

  • ackCount is now an int32 with all access via sync/atomic, eliminating the previous data race between RequestTxIds() and the restart path.
  • done is protected by doneMutex for both closing and re-assignment, and all waiters go through doneChan(), which removes prior unsynchronized read/write and double-close risks.
  • Stop() is idempotent (onceStop), coordinates with handleDone() via restartMutex, sets a stopped flag, and safely closes done before calling Protocol.Stop().
  • RequestTxIds/RequestTxs snapshot their result channels under restartMutex and then select on that vs doneChan(), giving clean shutdown semantics (ErrProtocolShuttingDown when stopping).
  • handleDone() now:
    • Non-blockingly signals any in-flight RequestTxIds,
    • Skips restart when stopped is true,
    • Otherwise stops the current protocol, reinitializes Protocol and the result/done channels, resets ackCount, and restarts the protocol outside the lock, with an extra IsStopped() check for TOCTOU.

Overall, this is a solid concurrency and lifecycle cleanup for TxSubmission.Server, with no obvious remaining races in the Stop/restart paths.

Also applies to: 48-57, 83-116, 118-130, 132-195, 198-231, 254-283, 286-349
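
The atomic ackCount idiom, sketched in isolation. Whether RequestTxIds consumes the counter exactly this way is an assumption; the point is only that every access goes through sync/atomic:

package main

import (
    "fmt"
    "sync/atomic"
)

// server sketches only the counter; the real Server carries channels,
// mutexes, and protocol state alongside it.
type server struct {
    ackCount int32
}

// consumeAcks atomically reads and resets the pending acknowledgement
// count, so it cannot race with the restart path resetting it.
func (s *server) consumeAcks() int32 {
    return atomic.SwapInt32(&s.ackCount, 0)
}

func (s *server) ackTxs(n int32) {
    atomic.AddInt32(&s.ackCount, n)
}

// resetForRestart mirrors what handleDone() does when reinitializing.
func (s *server) resetForRestart() {
    atomic.StoreInt32(&s.ackCount, 0)
}

func main() {
    s := &server{}
    s.ackTxs(3)
    fmt.Println(s.consumeAcks()) // 3
    s.ackTxs(2)
    s.resetForRestart()
    fmt.Println(s.consumeAcks()) // 0
}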

protocol/leiosfetch/server.go (1)

194-208: Properly propagating Stop() errors before restart

Routing the Done path through s.Stop() and returning its error before reinitializing the protocol is the right move; it prevents silent shutdown failures and aligns this server with the lifecycle pattern used elsewhere in the codebase.

protocol/localtxmonitor/client_test.go (1)

88-95: Increased mock shutdown timeout reduces flakiness

Bumping the mock connection shutdown timeout to 5 seconds is reasonable and should help reduce sporadic test failures on slower CI runs, while still keeping hung tests bounded.

protocol/leiosnotify/server.go (1)

109-123: Done handler now correctly respects Stop() failures

Having handleDone call s.Stop() and propagate any error before reinitializing and starting the protocol ensures that restart only proceeds from a clean shutdown, improving observability and consistency with the broader protocol lifecycle pattern.

protocol/peersharing/server.go (1)

17-32: Server Stop() and Done restart flow now correctly separate per‑server and per‑protocol shutdown

Introducing onceStop on Server with a Stop() method that delegates to s.Protocol.Stop() (and returns its error) gives you an idempotent, externally callable shutdown hook. Updating handleDone to call s.Protocol.Stop() directly before reinitializing and restarting cleanly separates the per‑Server “final Stop” from the per‑Protocol restart-on-Done behavior, avoiding the earlier once/Restart interaction issues.

Also applies to: 49-55, 119-135
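
A minimal sketch of that separation, with proto standing in for the embedded protocol.Protocol (illustrative names, not the real implementation):

package main

import (
    "fmt"
    "sync"
)

type proto struct{ name string }

func (p *proto) Stop() error { fmt.Println("stopping", p.name); return nil }
func (p *proto) Start()      { fmt.Println("starting", p.name) }

type server struct {
    proto    *proto
    onceStop sync.Once
}

// Stop is the external, idempotent, per-Server shutdown hook.
func (s *server) Stop() error {
    var err error
    s.onceStop.Do(func() {
        err = s.proto.Stop()
    })
    return err
}

// handleDone restarts the current protocol instance directly, bypassing
// the once-guarded Server.Stop so restarted instances stay stoppable.
func (s *server) handleDone() error {
    if s.proto != nil {
        if err := s.proto.Stop(); err != nil {
            return err
        }
    }
    s.proto = &proto{name: "peersharing (restarted)"}
    s.proto.Start()
    return nil
}

func main() {
    s := &server{proto: &proto{name: "peersharing"}}
    _ = s.handleDone() // restart on Done
    _ = s.Stop()       // final shutdown still works
}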

protocol/leiosfetch/client.go (1)

20-37: LeiosFetch client lifecycle and shutdown handling looks correct

The atomic started/stopped flags, conditional MsgDone send, unconditional Protocol.Stop(), and DoneChan‑aware handlers together give a clean Start/Stop story without obvious races or send‑on‑closed risks. The Stop‑before‑Start and Start‑after‑Stop cases are both handled in a way consistent with the other clients.

Also applies to: 92-107, 109-149, 151-221, 246-289

protocol/peersharing/client_test.go (1)

28-93: PeerSharing client shutdown test harness looks solid

The runTest helper plus TestClientShutdown give good coverage of client Start/Stop behavior under a mock NtN handshake, with goleak and explicit timeouts guarding against hangs or leaks. The async error channels also ensure unexpected protocol errors fail the test promptly.

Also applies to: 95-114

@wolf31o2 wolf31o2 force-pushed the perf/protocols-improvements branch from 45dcdee to 06fa23d on November 17, 2025 at 20:03
- Increase default queue sizes for better buffering
- Eliminate unnecessary goroutines for channel cleanup
- Apply optimizations consistently across all protocols

Signed-off-by: Chris Gianelloni <wolf31o2@blinklabs.io>
@wolf31o2 wolf31o2 force-pushed the perf/protocols-improvements branch from 06fa23d to 1d4e741 on November 17, 2025 at 22:11
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
protocol/blockfetch/client.go (1)

30-41: Guard SendMessage(MsgClientDone) with started flag to prevent Send on uninitialized queue

The blockfetch Stop() method unconditionally sends MsgClientDone and calls SendMessage, even if Start() was never invoked. This diverges from the established pattern in keepalive/client.go and leiosnotify/client.go, which both guard SendMessage with a started check. If Stop() is called before Start(), the send will occur on an uninitialized sendQueueChan, causing a hang or panic.

Fix at lines 103–132: Guard the MsgClientDone send and Protocol.Stop() call with c.started.Load():

func (c *Client) Stop() error {
	var err error
	c.onceStop.Do(func() {
		c.Protocol.Logger().
			Debug("stopping client protocol",
				"component", "network",
				"protocol", ProtocolName,
				"connection_id", c.callbackContext.ConnectionId.String(),
			)
-		msg := NewMsgClientDone()
-		if sendErr := c.SendMessage(msg); sendErr != nil {
-			err = sendErr
-		}
-		_ = c.Protocol.Stop()
+		if c.started.Load() {
+			msg := NewMsgClientDone()
+			if sendErr := c.SendMessage(msg); sendErr != nil {
+				err = sendErr
+			}
+			_ = c.Protocol.Stop()
+		}
		// Defer closing channels until protocol fully shuts down (only if started)
		if c.started.Load() {
			go func() {
				<-c.DoneChan()
				close(c.blockChan)
				close(c.startBatchResultChan)
			}()
		} else {
			// If protocol was never started, close channels immediately
			close(c.blockChan)
			close(c.startBatchResultChan)
		}
	})
	return err
}
protocol/leiosnotify/client.go (1)

24-35: Stop() should call Protocol.Stop() to ensure shutdown progress regardless of peer behavior

The lifecycle flags and stateMutex ordering in Start()/Stop() are safe, and the requestNextChan closure logic is correct. However, Stop() never calls c.Protocol.Stop(), which means:

  • The recvLoop() waits indefinitely on muxerDoneChan, which only signals when UnregisterProtocol() is called
  • Since Stop() never calls Protocol.Stop(), UnregisterProtocol() is never invoked
  • If the peer disconnects or misbehaves, muxerDoneChan may never signal, leaving recvDoneChan open
  • This prevents p.doneChan from ever closing (it awaits both recvDoneChan and sendDoneChan)
  • The goroutine at line 115 blocks indefinitely on <-c.DoneChan()

Other clients in this PR already call Protocol.Stop() (blockfetch, chainsync, peersharing, keepalive, localtxsubmission, localtxmonitor, leiosfetch) to guarantee DoneChan() closes even on peer misbehavior. Since Protocol.Stop() is idempotent and safe to call regardless of protocol state, update Stop() to call it.

The suggested diff pattern is sound: capture state flags under lock, release early, and call Protocol.Stop() after releasing the lock to avoid deadlock.
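
A standalone sketch of that ordering, assuming a hypothetical protoStop in place of c.Protocol.Stop():

package main

import (
    "fmt"
    "sync"
)

type client struct {
    stateMutex sync.Mutex
    started    bool
    stopped    bool
}

func protoStop() error {
    fmt.Println("Protocol.Stop(): unregistering from muxer, closing DoneChan")
    return nil
}

func (c *client) Stop() error {
    c.stateMutex.Lock()
    wasStarted := c.started
    c.stopped = true
    c.stateMutex.Unlock() // release before Stop to avoid lock-ordering deadlocks

    if wasStarted {
        // send MsgDone here in the real client
    }
    // Always drive the protocol down so DoneChan() closes even if the
    // peer never responds.
    return protoStop()
}

func main() {
    c := &client{started: true}
    if err := c.Stop(); err != nil {
        fmt.Println("stop failed:", err)
    }
}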

🧹 Nitpick comments (7)
protocol/leiosfetch/server.go (1)

194-208: Decouple restart path from any future Server.Stop guards

Propagating the Stop() error here is good. However, calling s.Stop() for the restart path couples handleDone() to whatever semantics Server.Stop() may grow in the future (e.g., a sync.Once guard, as seen in peersharing.Server). That can reintroduce the “can’t stop restarted protocols” issue if Stop() becomes a one-shot shutdown.

Consider calling the embedded protocol directly instead:

-	// Restart protocol
-	if err := s.Stop(); err != nil {
+	// Restart protocol: stop only the current Protocol instance
+	if err := s.Protocol.Stop(); err != nil {
 		return err
 	}

This keeps handleDone() focused on per-instance teardown while allowing Server.Stop() (if added or changed later) to represent permanent server shutdown.

protocol/leiosfetch/client.go (1)

17-21: Tighten Stop() error handling and clarify Start/Stop concurrency expectations

The lifecycle changes (atomic started/stopped, onceStart/onceStop) significantly improve shutdown behavior and allow Stop()-before-Start() cases to be handled cleanly. A couple of refinements would make this even more robust:

  1. Preserve Protocol.Stop() errors as well as SendMessage errors

Stop() currently only propagates SendMessage failures; any error from c.Protocol.Stop() is silently dropped:

if c.started.Load() {
    if sendErr := c.SendMessage(msg); sendErr != nil {
        err = sendErr
    }
}
// Always attempt to stop the protocol...
_ = c.Protocol.Stop() // Stop error ignored

If Protocol.Stop() can fail (e.g., muxer shutdown problems), callers have no visibility. You can preserve both while still prioritizing the SendMessage error:

 func (c *Client) Stop() error {
 	var err error
 	c.onceStop.Do(func() {
@@
-		if c.started.Load() {
-			if sendErr := c.SendMessage(msg); sendErr != nil {
-				// Preserve the SendMessage error but still shut down the protocol.
-				err = sendErr
-			}
-		}
-		// Always attempt to stop the protocol so DoneChan and muxer shutdown complete.
-		_ = c.Protocol.Stop() // Stop error ignored; err already reflects SendMessage failure if any
+		if c.started.Load() {
+			if sendErr := c.SendMessage(msg); sendErr != nil {
+				// Preserve the SendMessage error but still shut down the protocol.
+				err = sendErr
+			}
+		}
+		// Always attempt to stop the protocol so DoneChan and muxer shutdown complete.
+		if stopErr := c.Protocol.Stop(); stopErr != nil && err == nil {
+			// Only surface Stop() error if we don't already have a SendMessage error.
+			err = stopErr
+		}
@@
-		c.started.Store(false)
-		c.stopped.Store(true)
+		c.started.Store(false)
+		c.stopped.Store(true)
 	})
 	return err
 }
  2. Document or guard against concurrent Start/Stop from different goroutines

Atomics eliminate data races on started/stopped, but there’s still no higher-level coordination between Start() and Stop() calls from different goroutines. For example, if Stop() runs early while started is still false, it may close the result channels immediately, and a concurrent Start() that passes the stopped.Load() check (due to timing) can still call c.Protocol.Start() with already-closed result channels.

If your usage guarantees Start()/Stop() are serialized by the caller, consider adding a brief comment to that effect. If concurrent Start/Stop is intended to be supported, adding a small state mutex (similar to leiosnotify.Client and peersharing.Client) around the high-level state changes would make those guarantees explicit.

Also applies to: 26-38, 92-149

protocol/localtxsubmission/client.go (1)

77-91: Consider aligning Start/Stop semantics with other clients

Here Start() unconditionally marks started = true and calls Protocol.Start(), and Stop() always sends MsgDone (even if Start() was never called), while only using started to decide when to wait on DoneChan before closing submitResultChan. Other clients (e.g. LocalTxMonitor) track a stopped flag and skip sending Done when never started, and prevent Start() from running after a prior Stop().

Not a bug, but you may want to:

  • Add a stopped flag and short‑circuit Start() when stopped is true, and
  • Only send MsgDone in Stop() when started is true,

to keep lifecycle semantics consistent across protocols and avoid odd edge‑cases like Stop() being called before Start().

Also applies to: 93-126
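
A minimal sketch of the suggested lifecycle tightening (names are illustrative; the real client would also close its result channels as described above):

package main

import (
    "fmt"
    "sync"
    "sync/atomic"
)

type client struct {
    onceStart sync.Once
    onceStop  sync.Once
    started   atomic.Bool
    stopped   atomic.Bool
}

func (c *client) Start() {
    c.onceStart.Do(func() {
        if c.stopped.Load() {
            return // Start() after Stop() becomes a no-op
        }
        c.started.Store(true)
        fmt.Println("protocol started")
    })
}

func (c *client) Stop() {
    c.onceStop.Do(func() {
        c.stopped.Store(true)
        if c.started.Load() {
            fmt.Println("sending MsgDone") // only when Start() actually ran
        }
        fmt.Println("protocol stopped")
    })
}

func main() {
    c := &client{}
    c.Stop()  // Stop-before-Start: no MsgDone sent
    c.Start() // no-op after Stop
}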

protocol/keepalive/client.go (1)

26-37: KeepAlive Start/Stop mostly look good; consider tightening the started semantics for concurrency

The new lifecycle wiring is an improvement:

  • Stop() is idempotent via onceStop,
  • It skips MsgDone/Protocol.Stop() when the client was never started, avoiding the earlier “Stop-before-Start” bug, and
  • Timer cleanup is handled both on Stop() and on protocol shutdown via the DoneChan() watcher.

One subtle corner case: Start() sets c.started to true before calling c.Protocol.Start(), while Stop() uses c.started.Load() to decide whether to send MsgDone and call Protocol.Stop(). If Start() and Stop() can be invoked from different goroutines, there is a small window where Stop() may observe started=true even though the underlying protocol hasn’t actually finished starting yet (e.g., sendQueueChan not initialized), which is exactly what started was intended to guard against.

If you expect concurrent Start/Stop, it would be safer to:

  • Move c.started.Store(true) after c.Protocol.Start(), and
  • Optionally document that Stop-before-Start and Start-after-Stop are supported only in the non‑concurrent sense.

Example tweak:

func (c *Client) Start() {
	c.onceStart.Do(func() {
-		c.started.Store(true)
 		c.Protocol.Logger().
 			Debug("starting client protocol", ...)
 		c.Protocol.Start()
+		c.started.Store(true)
 		// DoneChan cleanup goroutine...
 		go func() { ... }()
 		c.sendKeepAlive()
	})
}

If your usage never calls Start()/Stop() concurrently, this remains mostly a robustness improvement rather than a correctness fix, but it’s cheap insurance.

Also applies to: 76-99, 101-132

protocol/chainsync/client.go (1)

28-47: ChainSync lifecycle tracking is solid; watch started ordering for concurrent Start/Stop

The new lifecycle additions improve robustness:

  • stopped prevents Start() from running after Stop().
  • started is used to decide when it’s valid to send MsgDone and to defer closing readyForNextBlockChan until DoneChan() has fired.
  • Stop() always calls c.Protocol.Stop(), so muxer/protocol shutdown is not left to the peer.

Sequential Start/Stop and Stop-before-Start flows look correct. The only subtlety is the same as in keepalive:

  • Start() does c.started.Store(true) before c.Protocol.Start(), while
  • Stop() uses c.started.Load() to decide whether to send MsgDone.

If Start() and Stop() can be invoked from different goroutines, there’s a small window where Stop() may see started=true but the underlying protocol isn’t fully started yet (e.g., send loop not initialised), and SendMessage may again run “too early”.

If concurrent Start/Stop is part of your contract (and client_concurrency_test.go suggests it might be), consider:

func (c *Client) Start() {
	c.onceStart.Do(func() {
		if c.stopped.Load() {
			return
		}
		c.Protocol.Logger().Debug("starting client protocol", ...)
-		c.started.Store(true)
-		c.Protocol.Start()
+		c.Protocol.Start()
+		c.started.Store(true)
	})
}

so started more accurately reflects “protocol ready for messages”.

Also applies to: 120-134, 136-177

protocol/protocol.go (1)

281-330: Consider extracting duplicate draining logic.

The draining loops at lines 292-307 and 316-330 are nearly identical. Consider extracting this pattern into a helper method to reduce duplication:

func (p *Protocol) drainSendQueue() {
    for {
        select {
        case msg, ok := <-p.sendQueueChan:
            if !ok {
                return
            }
            p.sendMessage(msg)
        default:
            return
        }
    }
}

Then replace both loops with p.drainSendQueue().

protocol/localstatequery/client.go (1)

925-951: Consider simplifying duplicate error mapping.

The two failure cases have identical channel-send logic. Consider reducing duplication:

 func (c *Client) handleFailure(msg protocol.Message) error {
 	...
 	msgFailure := msg.(*MsgFailure)
+	var err error
 	switch msgFailure.Failure {
 	case AcquireFailurePointTooOld:
-		select {
-		case <-c.DoneChan():
-			return protocol.ErrProtocolShuttingDown
-		case c.acquireResultChan <- ErrAcquireFailurePointTooOld:
-		}
+		err = ErrAcquireFailurePointTooOld
 	case AcquireFailurePointNotOnChain:
-		select {
-		case <-c.DoneChan():
-			return protocol.ErrProtocolShuttingDown
-		case c.acquireResultChan <- ErrAcquireFailurePointNotOnChain:
-		}
+		err = ErrAcquireFailurePointNotOnChain
 	default:
 		return fmt.Errorf("unknown failure type: %d", msgFailure.Failure)
 	}
+	select {
+	case <-c.DoneChan():
+		return protocol.ErrProtocolShuttingDown
+	case c.acquireResultChan <- err:
+	}
 	return nil
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 45dcdee and 1d4e741.

📒 Files selected for processing (32)
  • protocol/blockfetch/client.go (7 hunks)
  • protocol/blockfetch/client_test.go (1 hunks)
  • protocol/blockfetch/server.go (1 hunks)
  • protocol/chainsync/client.go (6 hunks)
  • protocol/chainsync/client_concurrency_test.go (1 hunks)
  • protocol/chainsync/client_test.go (2 hunks)
  • protocol/chainsync/error.go (1 hunks)
  • protocol/chainsync/server.go (1 hunks)
  • protocol/keepalive/client.go (4 hunks)
  • protocol/keepalive/client_test.go (1 hunks)
  • protocol/keepalive/keepalive.go (1 hunks)
  • protocol/leiosfetch/client.go (5 hunks)
  • protocol/leiosfetch/server.go (1 hunks)
  • protocol/leiosnotify/client.go (3 hunks)
  • protocol/leiosnotify/client_concurrency_test.go (1 hunks)
  • protocol/leiosnotify/client_test.go (1 hunks)
  • protocol/leiosnotify/server.go (1 hunks)
  • protocol/localstatequery/client.go (6 hunks)
  • protocol/localstatequery/client_test.go (3 hunks)
  • protocol/localtxmonitor/client.go (15 hunks)
  • protocol/localtxmonitor/client_test.go (2 hunks)
  • protocol/localtxsubmission/client.go (6 hunks)
  • protocol/localtxsubmission/client_test.go (2 hunks)
  • protocol/localtxsubmission/localtxsubmission.go (2 hunks)
  • protocol/peersharing/client.go (4 hunks)
  • protocol/peersharing/client_test.go (1 hunks)
  • protocol/peersharing/server.go (4 hunks)
  • protocol/peersharing/server_test.go (1 hunks)
  • protocol/protocol.go (8 hunks)
  • protocol/txsubmission/server.go (10 hunks)
  • protocol/txsubmission/server_concurrency_test.go (1 hunks)
  • protocol/txsubmission/server_test.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (12)
  • protocol/localtxsubmission/localtxsubmission.go
  • protocol/keepalive/client_test.go
  • protocol/peersharing/client_test.go
  • protocol/chainsync/client_concurrency_test.go
  • protocol/localtxsubmission/client_test.go
  • protocol/chainsync/server.go
  • protocol/blockfetch/server.go
  • protocol/peersharing/server_test.go
  • protocol/txsubmission/server_concurrency_test.go
  • protocol/chainsync/client_test.go
  • protocol/localstatequery/client_test.go
  • protocol/txsubmission/server_test.go
🧰 Additional context used
🧬 Code graph analysis (17)
protocol/leiosfetch/client.go (8)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/blockfetch/client.go (1)
  • Client (30-41)
protocol/chainsync/client.go (1)
  • Client (29-47)
protocol/keepalive/client.go (1)
  • Client (27-37)
protocol/leiosnotify/client.go (1)
  • Client (24-34)
protocol/peersharing/client.go (1)
  • Client (25-35)
protocol/message.go (1)
  • Message (18-22)
protocol/leiosnotify/client_concurrency_test.go (2)
protocol/chainsync/client_concurrency_test.go (1)
  • TestStopBeforeStart (106-148)
protocol/leiosnotify/client.go (1)
  • Client (24-34)
protocol/blockfetch/client.go (2)
protocol/protocol.go (2)
  • Protocol (39-60)
  • New (122-133)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/peersharing/client.go (5)
protocol/leiosnotify/client.go (1)
  • Client (24-34)
protocol/localstatequery/client.go (1)
  • Client (30-47)
protocol/localtxsubmission/client.go (1)
  • Client (26-37)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/peersharing/server.go (2)
protocol/txsubmission/server.go (1)
  • Server (28-41)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/keepalive/keepalive.go (3)
protocol/keepalive/messages.go (1)
  • MessageTypeDone (30-30)
protocol/state.go (1)
  • NewState (38-43)
protocol/blockfetch/blockfetch.go (1)
  • StateDone (46-46)
protocol/leiosnotify/client_test.go (6)
protocol/versiondata.go (6)
  • VersionData (40-46)
  • VersionDataNtN13andUp (143-145)
  • VersionDataNtN11to12 (116-122)
  • DiffusionModeInitiatorOnly (21-21)
  • PeerSharingModeNoPeerSharing (27-27)
  • QueryModeDisabled (36-36)
protocol/handshake/messages.go (1)
  • NewMsgAcceptVersion (88-102)
connection.go (2)
  • Connection (59-103)
  • NewConnection (107-130)
connection_options.go (3)
  • WithConnection (36-40)
  • WithNetworkMagic (50-54)
  • WithNodeToNode (78-82)
protocol/leiosnotify/leiosnotify.go (1)
  • LeiosNotify (75-78)
protocol/leiosnotify/client.go (1)
  • Client (24-34)
protocol/leiosnotify/client.go (3)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/leiosnotify/messages.go (1)
  • NewMsgDone (149-156)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/blockfetch/client_test.go (3)
connection.go (1)
  • Connection (59-103)
protocol/blockfetch/blockfetch.go (1)
  • BlockFetch (102-105)
protocol/blockfetch/client.go (1)
  • Client (30-41)
protocol/protocol.go (3)
protocol/message.go (1)
  • Message (18-22)
cbor/encode.go (1)
  • Encode (27-40)
muxer/segment.go (1)
  • NewSegment (48-69)
protocol/localtxmonitor/client.go (4)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/localstatequery/client.go (1)
  • Client (30-47)
protocol/localtxmonitor/messages.go (1)
  • NewMsgDone (79-86)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/localtxsubmission/client.go (4)
protocol/localtxsubmission/localtxsubmission.go (2)
  • Config (81-84)
  • CallbackContext (87-91)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/localtxsubmission/messages.go (1)
  • MessageTypeDone (29-29)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/chainsync/client.go (3)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/chainsync/error.go (1)
  • ErrSyncCancelled (26-26)
protocol/txsubmission/server.go (5)
protocol/txsubmission/messages.go (2)
  • TxBody (197-201)
  • NewMsgRequestTxIds (68-82)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/txsubmission/txsubmission.go (1)
  • MaxAckCount (143-143)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/txsubmission/error.go (1)
  • ErrStopServerProcess (21-21)
protocol/localtxmonitor/client_test.go (10)
protocol/blockfetch/client_test.go (1)
  • TestClientShutdown (211-230)
protocol/chainsync/client_test.go (1)
  • TestClientShutdown (285-304)
protocol/keepalive/client_test.go (1)
  • TestClientShutdown (303-322)
protocol/leiosnotify/client_test.go (1)
  • TestClientShutdown (117-136)
protocol/localstatequery/client_test.go (1)
  • TestClientShutdown (357-376)
protocol/localtxsubmission/client_test.go (1)
  • TestClientShutdown (167-186)
protocol/peersharing/client_test.go (1)
  • TestClientShutdown (95-114)
connection.go (1)
  • Connection (59-103)
protocol/localtxmonitor/localtxmonitor.go (1)
  • LocalTxMonitor (112-115)
protocol/localtxmonitor/client.go (1)
  • Client (25-42)
protocol/keepalive/client.go (2)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/keepalive/messages.go (1)
  • NewMsgDone (94-101)
protocol/localstatequery/client.go (5)
protocol/protocol.go (1)
  • Protocol (39-60)
protocol/localtxmonitor/client.go (1)
  • Client (25-42)
protocol/localstatequery/messages.go (4)
  • NewMsgDone (245-252)
  • AcquireFailurePointNotOnChain (44-44)
  • MsgResult (172-175)
  • NewMsgQuery (160-170)
protocol/error.go (1)
  • ErrProtocolShuttingDown (19-19)
protocol/localstatequery/error.go (2)
  • ErrAcquireFailurePointTooOld (20-20)
  • ErrAcquireFailurePointNotOnChain (23-25)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Analyze (go)
🔇 Additional comments (38)
protocol/keepalive/keepalive.go (1)

69-81: Done transition from server state looks correct

Adding MessageTypeDone -> StateDone for StateServer cleanly mirrors the client-side transition and gives the keep-alive state machine a proper completion path from both agencies. No issues from a protocol/state-machine perspective.
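
For readers unfamiliar with the state map, a generic sketch of the kind of transition table being extended; this uses plain Go maps rather than the library's actual StateMap types:

package main

import "fmt"

type state string
type msgType int

const (
    stateClient state = "Client"
    stateServer state = "Server"
    stateDone   state = "Done"
)

const (
    msgKeepAlive msgType = iota
    msgKeepAliveResponse
    msgDone
)

var transitions = map[state]map[msgType]state{
    stateClient: {msgKeepAlive: stateServer, msgDone: stateDone},
    // The change under review: Done is now reachable from the server
    // state too, mirroring the client-side transition.
    stateServer: {msgKeepAliveResponse: stateClient, msgDone: stateDone},
}

func main() {
    fmt.Println(transitions[stateServer][msgDone]) // Done
}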

protocol/leiosnotify/client_concurrency_test.go (1)

30-55: Verify LeiosNotify is actually initialized and the handshake fixture is available

This test assumes:

  1. conversationEntryNtNResponseV15 is declared in another _test file of the same leiosnotify_test package, and
  2. ouroboros.New(..., ouroboros.WithNodeToNode(true)) is sufficient to initialize the LeiosNotify protocol such that oConn.LeiosNotify().Client is non-nil.

If either assumption is false, you’ll hit a compile-time undefined symbol error (for the fixture) or a runtime nil-pointer when accessing LeiosNotify().Client.

Please double-check:

  • That conversationEntryNtNResponseV15 is declared in another _test file in this package, and
  • That LeiosNotify is enabled by default for NtN connections in ouroboros.New (or add an explicit WithLeiosNotifyConfig(...) if required).

protocol/peersharing/server.go (1)

17-32: Server Stop semantics and Done-handling look correct now

The new onceStop-guarded Stop() correctly propagates Protocol.Stop() errors, and handleDone() now calls s.Protocol.Stop() directly (with a nil check) before reinitializing and restarting the protocol. This cleanly separates:

  • Per-Server permanent shutdown via Server.Stop(), and
  • Per-conversation restart behavior via handleDone().

This addresses the earlier concern where using s.Stop() inside handleDone() made restarted protocol instances un-stoppable once onceStop had fired.

Also applies to: 49-55, 119-135

protocol/leiosfetch/client.go (1)

246-251: Non-blocking result handlers correctly respect shutdown

The updated handlers that select on c.DoneChan() versus sending to the result channels are a solid improvement:

select {
case <-c.DoneChan():
    return protocol.ErrProtocolShuttingDown
case c.blockResultChan <- msg:
}

This pattern ensures:

  • No sends to closed channels (since channel closure is gated on DoneChan()), and
  • Handlers bail out promptly with ErrProtocolShuttingDown once shutdown is in progress.

The same logic applied across all result channels looks consistent and safe.

Also applies to: 255-260, 264-269, 273-278, 282-287

protocol/localtxsubmission/client.go (2)

28-37: Shutdown sequencing and channel close logic look sound

The added lifecycle state (stateMutex, started, onceStart/onceStop) plus closeSubmitResultOnce and the DoneChan-gated Stop() logic form a coherent shutdown story: submitResultChan is only closed once, after the protocol is fully done (or immediately when never started), and Stop() always attempts to drive the underlying protocol to completion. This resolves the previous “send to closed channel” risk around submitResultChan while keeping shutdown behavior predictable for callers (SubmitTx sees ErrProtocolShuttingDown on a closed channel).

Also applies to: 77-91, 93-126


150-167: Handler changes correctly respect shutdown and avoid races

The updated handlers for AcceptTx, RejectTx, and Done now:

  • Use select { case <-c.DoneChan(): ... case c.submitResultChan <- ... } to avoid sending on result channels once shutdown is active, and
  • Centralize channel closing in handleDone()/closeSubmitResultChan() using sync.Once.

This is the right pattern to avoid TOCTOU panics on shutdown and ensures any in‑flight SubmitTx calls either receive a result or observe ErrProtocolShuttingDown via a closed channel.

Also applies to: 169-208, 210-227
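
The sync.Once close idiom in isolation, with illustrative names:

package main

import (
    "fmt"
    "sync"
)

// The channel is closed in exactly one place, guarded by sync.Once, so
// both Stop() and handleDone() can request closure without risking a
// double-close panic.
type client struct {
    closeOnce        sync.Once
    submitResultChan chan error
}

func (c *client) closeSubmitResultChan() {
    c.closeOnce.Do(func() { close(c.submitResultChan) })
}

func main() {
    c := &client{submitResultChan: make(chan error)}
    c.closeSubmitResultChan()
    c.closeSubmitResultChan() // safe: the Once makes this a no-op
    _, ok := <-c.submitResultChan
    fmt.Println(ok) // false: callers observe the closed channel as shutdown
}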

protocol/localtxmonitor/client.go (2)

24-42: Lifecycle guards and Stop() behavior are robust

The added stateMutex with started/stopped flags, plus the reworked Start() and Stop():

  • Prevent starting a client after it’s been stopped,
  • Only send MsgDone when the protocol actually ran, and
  • Call Protocol.Stop() after releasing all locks to avoid deadlocks,

while closing all result channels only after DoneChan() when started (or immediately otherwise). This nicely fixes the earlier “Stop() never stops Protocol / DoneChan never closes” concern and gives callers a clear ErrProtocolShuttingDown signal.

Also applies to: 89-156


158-235: Channel interactions are now shutdown-aware and race-free

The changes to Acquire/Release/HasTx/NextTx/GetSizes and the corresponding handlers:

  • Gate public operations on stopped and return protocol.ErrProtocolShuttingDown when appropriate,
  • Use acquiredMutex to safely manage acquired/acquiredSlot, and
  • Route all handler results through select statements that race sends against DoneChan().

This prevents writes to closed channels and ensures that in-flight operations either complete normally or observe a clean shutdown error once Stop() runs. The implementation looks correct and consistent with other protocols.

Also applies to: 237-309, 332-351, 353-402, 404-425

protocol/localtxmonitor/client_test.go (1)

50-106: Extended mock-connection timeout is reasonable

Bumping the runTest completion timeout to 5s aligns with other protocol tests and gives a safer margin for the additional shutdown sequencing without weakening the leak/timeout checks. No issues here.

protocol/blockfetch/client_test.go (1)

211-230: Shutdown test adds useful coverage

TestClientShutdown follows the existing harness pattern (NtN handshake, start then stop client) and gives explicit coverage for the new BlockFetch lifecycle behavior. Looks correct and consistent with other protocol tests.

protocol/leiosnotify/server.go (1)

109-123: Guarding restart on Stop() error is correct

Updating handleDone() to check and return s.Stop()’s error before reinitializing the protocol avoids restarting on a failed shutdown and cleanly propagates problems to the caller. This matches the tightened lifecycle semantics elsewhere in the PR.

protocol/chainsync/error.go (1)

19-26: New ErrSyncCancelled error is well-defined

ErrSyncCancelled is named and documented consistently with the existing chainsync errors and provides a clear sentinel for cancellation cases. No issues here.

protocol/leiosnotify/client_test.go (1)

30-55: LeiosNotify shutdown test and handshake helpers look good

The mocked NtN v15 handshake (mockNtNVersionData/mockNtNVersionDataV11) plus TestClientShutdown follow the same pattern as other protocol shutdown tests, with realistic version data and strict leak/timeout checks. This gives solid coverage for the new LeiosNotify client lifecycle behavior.

Also applies to: 58-115, 117-136

protocol/peersharing/client.go (2)

30-35: Lifecycle guards for Start/Stop look sound and avoid send-on-closed races

The onceStart/onceStop plus stateMutex/started/stopped pattern makes Start() idempotent and ensures Start() becomes a no-op after Stop(), while Stop() safely defers sharePeersChan closure until DoneChan() for started clients and closes immediately when never started. This avoids the earlier “Stop-before-Start” and send-on-closed issues and aligns well with the other protocol clients.

If you want extra safety, you could add a small TestStopBeforeStart/TestStartAfterStop in protocol/peersharing/client_test.go mirroring the chainsync tests to lock in these semantics.

Also applies to: 75-125


161-175: Shutdown-aware send in handleSharePeers is correct

The select on DoneChan() vs c.sharePeersChan <- … ensures handlers return ErrProtocolShuttingDown instead of ever sending to a channel that will be (or has been) closed during shutdown, eliminating the prior panic window.

protocol/leiosnotify/client.go (1)

160-194: Handler select patterns correctly avoid send-on-closed panics

The four handlers now route notifications via:

select {
case <-c.DoneChan():
    return protocol.ErrProtocolShuttingDown
case c.requestNextChan <- msg:
}

This is the right pattern to avoid writing to requestNextChan after Stop() has closed it, while still letting in-flight requests complete when the protocol is healthy.

protocol/blockfetch/client.go (1)

228-243: Shutdown-aware selects in handlers correctly protect result channels

The updated handlers:

  • handleStartBatch and handleNoBlocks now do a select on DoneChan() vs c.startBatchResultChan <- …, and
  • handleBlock checks DoneChan() both before heavy work and again before sending into blockChan,

so handlers now return ErrProtocolShuttingDown instead of writing into channels that may be closed by Stop(). This matches the pattern used in other protocols and fixes the prior send-on-closed race while keeping behavior for successful cases unchanged.

Also applies to: 245-261, 263-316

protocol/chainsync/client.go (2)

256-382: readyForNextBlockChan handling in GetAvailableBlockRange and syncLoop is now shutdown-safe

The new case ready, ok := <-c.readyForNextBlockChan in GetAvailableBlockRange:

  • Returns ErrProtocolShuttingDown when the channel is closed (ok==false), and
  • Returns ErrSyncCancelled when ready is false, matching the semantics introduced for ErrStopSyncProcess.

In syncLoop, reading readyForNextBlockChan with <-c.readyForNextBlockChan and checking ok/ready similarly ensures the loop exits cleanly on shutdown or cancellation.

Combined with Stop() deferring channel closure until after DoneChan() (for started clients), this ensures:

  • No send-on-closed panic,
  • Block-range computation and sync loop both terminate promptly on shutdown, and
  • Cancellation is distinguishable from shutdown via ErrSyncCancelled.

Also applies to: 466-496
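
A standalone sketch of the two-valued receive that distinguishes shutdown from cancellation (error values and names are illustrative):

package main

import (
    "errors"
    "fmt"
)

var (
    errShuttingDown  = errors.New("protocol is shutting down")
    errSyncCancelled = errors.New("sync cancelled")
)

// waitForNext shows the two-valued receive: a closed channel means
// shutdown, a false value means the user's callback cancelled the sync.
func waitForNext(readyCh chan bool) error {
    ready, ok := <-readyCh
    if !ok {
        return errShuttingDown // Stop() closed the channel
    }
    if !ready {
        return errSyncCancelled // a handler sent false on ErrStopSyncProcess
    }
    return nil // proceed to the next block
}

func main() {
    ch := make(chan bool, 1)
    ch <- false
    fmt.Println(waitForNext(ch)) // sync cancelled
    close(ch)
    fmt.Println(waitForNext(ch)) // protocol is shutting down
}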


623-770: Roll-forward/backward handlers now coordinate cleanly with shutdown and cancellation

In both handleRollForward and handleRollBackward:

  • When callbacks return ErrStopSyncProcess, the handlers signal cancellation via:
select {
case <-c.DoneChan():
    return protocol.ErrProtocolShuttingDown
case c.readyForNextBlockChan <- false:
}
  • On normal progress they signal readiness for the next block with a similar select sending true.

These changes:

  • Route all sync-loop control through readyForNextBlockChan,
  • Respect DoneChan() to avoid writes after Stop() closes the channel, and
  • Cleanly distinguish shutdown (ErrProtocolShuttingDown) from user-driven cancellation (ErrSyncCancelled).

This is a good, consistent pattern with the rest of the PR.

Also applies to: 772-810

protocol/protocol.go (3)

175-193: LGTM: Stop() signature change provides future extensibility.

The error return type allows future implementations to report failures without another breaking change. The current nil return and comprehensive documentation are appropriate.


420-443: LGTM: sendMessage() helper appropriately scoped for shutdown draining.

The decision to skip state accounting during shutdown is well-documented and correct—these operations are irrelevant once shutdown is initiated.


251-251: LGTM: Stop() error handling appropriate in error path.

Ignoring the Stop() error here is correct—the protocol violation error is the primary concern, and Stop() currently returns nil anyway. The inline comment clarifies the intent.

protocol/localstatequery/client.go (8)

38-46: LGTM: Lifecycle synchronization fields properly added.

The addition of stateMutex and onceStop correctly addresses the data race on the started field identified in previous reviews. The acquiredMutex provides proper synchronization for the acquired state.


102-115: LGTM: Start() properly synchronizes started flag.

The stateMutex correctly guards the started field write, preventing the data race with Stop().


117-149: Verify shutdown on SendMessage error.

If SendMessage(msg) fails at line 133, the function returns without calling c.Protocol.Stop() (line 136). This leaves the underlying protocol running. Consider calling Protocol.Stop() in the error path:

 		msg := NewMsgDone()
 		if err = c.SendMessage(msg); err != nil {
+			_ = c.Protocol.Stop()
 			return
 		}
 		_ = c.Protocol.Stop() // Error ignored - method returns SendMessage error

Alternatively, if leaving the protocol running on SendMessage failure is intentional (e.g., to allow retries), document this behavior.


913-922: LGTM: handleAcquired properly synchronizes acquired state.

The acquiredMutex correctly protects the acquired field, and the shutdown-aware channel send prevents blocking on a closed protocol.


953-968: LGTM: handleResult properly handles shutdown.

The shutdown-aware channel send prevents blocking during protocol shutdown.


970-1006: LGTM: acquire() correctly synchronizes acquired state.

The acquiredMutex properly protects the read of acquired, and the channel receive correctly handles shutdown scenarios.


1008-1018: LGTM: release() properly synchronizes acquired state.

The mutex correctly guards the acquired field update.


1020-1041: LGTM: runQuery() correctly checks acquired state and handles shutdown.

The acquiredMutex properly protects the read, and the shutdown handling via closed channel check is correct.

protocol/txsubmission/server.go (8)

33-40: LGTM: Concurrency fields properly added to address data races.

The int32 ackCount with atomic operations, doneMutex for channel synchronization, and restart coordination via restartMutex correctly address the race conditions identified in previous reviews.


93-116: LGTM: Stop() correctly prevents double-close with atomic select-close.

The acquisition of doneMutex before the select-and-close operation prevents the double-close panic identified in previous reviews. The locking order (restartMutex → doneMutex) is consistent with handleDone().
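
The select-close idiom sketched in isolation, with illustrative names:

package main

import (
    "fmt"
    "sync"
)

// closeDone checks whether done is already closed and closes it in one
// mutex-guarded step, so Stop() and handleDone() can both run without a
// double-close panic.
type server struct {
    doneMutex sync.Mutex
    done      chan struct{}
}

func (s *server) closeDone() {
    s.doneMutex.Lock()
    defer s.doneMutex.Unlock()
    select {
    case <-s.done:
        // already closed by the other path; nothing to do
    default:
        close(s.done)
    }
}

func main() {
    s := &server{done: make(chan struct{})}
    s.closeDone()
    s.closeDone() // idempotent
    fmt.Println("done channel closed exactly once")
}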


118-130: LGTM: Helper methods provide safe state access.

The doneChan() and IsStopped() methods correctly encapsulate mutex-protected access to shared state.


132-196: LGTM: RequestTxIds correctly uses atomic operations and restart-safe channel access.

The atomic operations on ackCount and the restart-protected capture of requestTxIdsResultChan correctly address the data races identified in previous reviews. The shutdown handling is proper.


198-231: LGTM: RequestTxs properly handles restart and shutdown.

The restart-protected capture of requestTxsResultChan and shutdown-aware channel receive are correct.


254-284: LGTM: Reply handlers properly synchronize channel sends with restart.

The restartMutex correctly ensures channel sends target the current generation of result channels, preventing sends to stale channels during restart.


286-349: LGTM: handleDone restart sequence properly synchronized.

The restart logic correctly:

  • Prevents concurrent Stop() via restartMutex
  • Avoids double-close via atomic select-close under doneMutex
  • Checks stopped flag twice for TOCTOU protection
  • Starts new protocol outside the lock

Note: Any in-flight reply messages during the restart window may be lost as result channels are recreated, but this is likely acceptable for a protocol restart scenario.


54-56: LGTM: Buffered result channels support non-blocking restart signaling.

The capacity-1 buffers allow handleDone() to signal RequestTxIds non-blockingly (lines 298-304), which is essential for the restart flow.
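
A minimal sketch of the non-blocking signal that a capacity-1 buffer enables (names are illustrative):

package main

import "fmt"

// signalWaiter hands a result to an in-flight waiter without blocking:
// the buffered send succeeds even if nobody is receiving yet, and the
// default case drops the signal rather than stalling the restart path.
func signalWaiter(resultChan chan string, result string) {
    select {
    case resultChan <- result:
    default:
        // buffer already holds a result; don't block
    }
}

func main() {
    resultChan := make(chan string, 1)
    signalWaiter(resultChan, "shutting down")
    signalWaiter(resultChan, "dropped") // buffer full: non-blocking no-op
    fmt.Println(<-resultChan)           // "shutting down"
}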

@wolf31o2
Copy link
Member Author

This code has gotten a little out of control compared to the original performance improvements, which I moved to another PR. Closing this one.

3 participants