perf(protocols): improve performance across all protocols #1244
Conversation
📝 Walkthrough

This pull request introduces comprehensive lifecycle management and graceful shutdown synchronization across multiple protocol implementations in the Ouroboros network library. The changes add explicit Stop() methods, lifecycle state tracking (started/stopped flags), and shutdown-aware channel operations to client and server implementations for blockfetch, chainsync, leiosnotify, localtxmonitor, localtxsubmission, txsubmission, keepalive, leiosfetch, peersharing, and localstatequery protocols. The core protocol.go Stop() signature is updated to return an error, and all result-sending pathways are retrofitted with non-blocking selects that respect protocol shutdown signals. Supporting test files add shutdown and concurrency test cases to verify lifecycle correctness.

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Areas requiring extra attention:
Possibly related PRs
Pre-merge checks and finishing touches
❌ Failed checks (1 inconclusive)
✅ Passed checks (1 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
protocol/txsubmission/server.go (1)
252-257: Add synchronization to prevent a race condition when restarting the protocol.

The concern is valid. `RequestTxIds()` (line 143) and `RequestTxs()` (line 179) both read from channels without any state checks or synchronization. When `handleDone()` calls `Stop()` at line 252, it closes these channels at lines 93-94. Between the close and the recreation at lines 254-255, any call to these methods would read from a closed channel.

A mutex protecting the Stop/reinit/Start sequence and guarding the RequestTxIds/RequestTxs channel reads is necessary. Alternatively, ensure the protocol state machine prevents these methods from being called during the transition window.
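For context on why that transition window is unsafe, here is a minimal, self-contained Go demonstration (not code from this repository) of what a reader observes on a closed channel:

```go
package main

import "fmt"

// A receive from a closed channel does not block or panic, it returns the
// zero value immediately, so a caller hitting the close/recreate window
// would see a bogus "result" instead of waiting for a real one.
func main() {
	ch := make(chan int)
	close(ch)
	v, ok := <-ch
	fmt.Println(v, ok) // prints "0 false" without blocking
}
```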
🧹 Nitpick comments (1)
protocol/localtxmonitor/client_test.go (1)
300-352: LGTM! Shutdown test validates explicit cleanup.
The test properly verifies the client lifecycle with goroutine leak detection and timeout handling. It's consistent with shutdown tests added to other protocol implementations in this PR.
Minor observation: this test duplicates some setup logic from the existing `runTest` helper (lines 50-106). However, maintaining consistency with other shutdown tests in this PR is more valuable than eliminating this duplication.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
- protocol/blockfetch/blockfetch.go (1 hunks)
- protocol/blockfetch/client.go (1 hunks)
- protocol/blockfetch/client_test.go (1 hunks)
- protocol/chainsync/chainsync.go (1 hunks)
- protocol/chainsync/client.go (1 hunks)
- protocol/chainsync/client_test.go (2 hunks)
- protocol/leiosnotify/client.go (1 hunks)
- protocol/localtxmonitor/client.go (1 hunks)
- protocol/localtxmonitor/client_test.go (1 hunks)
- protocol/localtxsubmission/client.go (1 hunks)
- protocol/localtxsubmission/client_test.go (1 hunks)
- protocol/txsubmission/server.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
protocol/localtxsubmission/client_test.go (7)
- protocol/blockfetch/client_test.go (1): TestClientShutdown (211-264)
- protocol/chainsync/client_test.go (1): TestClientShutdown (283-336)
- protocol/localtxmonitor/client_test.go (1): TestClientShutdown (300-352)
- connection.go (3): NewConnection (107-130), Connection (59-103), New (133-135)
- connection_options.go (2): WithConnection (36-40), WithNetworkMagic (50-54)
- protocol/localtxsubmission/localtxsubmission.go (1): LocalTxSubmission (67-70)
- protocol/localtxsubmission/client.go (1): Client (26-34)

protocol/blockfetch/client_test.go (7)
- protocol/chainsync/client_test.go (1): TestClientShutdown (283-336)
- protocol/localtxmonitor/client_test.go (1): TestClientShutdown (300-352)
- protocol/localtxsubmission/client_test.go (1): TestClientShutdown (167-219)
- connection.go (3): NewConnection (107-130), Connection (59-103), New (133-135)
- protocol/blockfetch/blockfetch.go (2): New (156-162), BlockFetch (102-105)
- connection_options.go (3): WithConnection (36-40), WithNetworkMagic (50-54), WithNodeToNode (78-82)
- protocol/blockfetch/client.go (1): Client (29-39)

protocol/localtxmonitor/client_test.go (6)
- protocol/blockfetch/client_test.go (1): TestClientShutdown (211-264)
- protocol/chainsync/client_test.go (1): TestClientShutdown (283-336)
- protocol/localtxsubmission/client_test.go (1): TestClientShutdown (167-219)
- connection.go (3): NewConnection (107-130), Connection (59-103), New (133-135)
- connection_options.go (2): WithConnection (36-40), WithNetworkMagic (50-54)
- protocol/localtxmonitor/client.go (1): Client (25-38)

protocol/txsubmission/server.go (3)
- protocol/protocol.go (1): Protocol (39-60)
- protocol/txsubmission/txsubmission.go (1): ProtocolName (27-27)
- connection/id.go (1): ConnectionId (22-25)

protocol/chainsync/client_test.go (4)
- protocol/chainsync/chainsync.go (2): ChainSync (204-207), New (259-267)
- protocol/chainsync/client.go (1): Client (29-45)
- protocol/blockfetch/client_test.go (1): TestClientShutdown (211-264)
- connection.go (3): NewConnection (107-130), Connection (59-103), New (133-135)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Analyze (go)
🔇 Additional comments (12)
protocol/blockfetch/blockfetch.go (1)
122-122: LGTM: Default queue size increase improves buffering capacity.
The increase from 256 to 384 aligns with the PR objective of improving performance through better buffering, while remaining well within the maximum limit of 512.
protocol/localtxsubmission/client_test.go (1)
167-219: LGTM: Shutdown test follows established patterns.
The test properly verifies client lifecycle management and shutdown behavior, consistent with similar tests in other protocol clients.
protocol/chainsync/chainsync.go (1)
226-227: LGTM: Default increases improve pipeline capacity and buffering.
Increasing both `DefaultPipelineLimit` and `DefaultRecvQueueSize` from 50 to 75 enhances throughput while staying within protocol limits.
protocol/chainsync/client.go (1)
146-146: LGTM: Explicit channel cleanup in Stop() simplifies lifecycle.
Moving channel closure from an implicit shutdown goroutine to the explicit Stop() method makes resource cleanup deterministic and reduces goroutine overhead.
protocol/leiosnotify/client.go (1)
93-93: LGTM: Explicit cleanup consistent with lifecycle pattern.
Closing `requestNextChan` in Stop() ensures blocked RequestNext() calls terminate gracefully with `ErrProtocolShuttingDown`.
protocol/blockfetch/client.go (1)
112-113: LGTM: Dual channel cleanup ensures complete shutdown.
Explicitly closing both `blockChan` and `startBatchResultChan` in Stop() ensures any blocked GetBlock() or GetBlockRange() calls terminate properly.
protocol/chainsync/client_test.go (2)
83-86: LGTM: Cleanup prevents goroutine leaks in tests.
Adding explicit client Stop() in `runTest` ensures proper cleanup after each test, preventing potential goroutine leaks.
283-336: LGTM: Comprehensive shutdown test validates lifecycle.
The test properly verifies the client can be started and stopped without errors or goroutine leaks, consistent with similar tests across other protocols.
protocol/localtxsubmission/client.go (1)
102-102: LGTM: Explicit cleanup completes the lifecycle pattern.
Closing `submitResultChan` in Stop() ensures blocked SubmitTx() calls terminate gracefully, completing the consistent cleanup pattern across all protocol clients.
protocol/localtxmonitor/client.go (1)
113-116: LGTM! Explicit channel cleanup in Stop().
The explicit channel closures in Stop() are well-synchronized via `busyMutex`, which also protects all channel readers throughout the file. This eliminates the need for a background goroutine and provides more predictable cleanup behavior.
protocol/blockfetch/client_test.go (1)
211-264: LGTM! Comprehensive shutdown test.
The test properly validates the client lifecycle with goroutine leak detection, timeout handling, and error propagation. The structure is consistent with shutdown tests in other protocol implementations.
protocol/txsubmission/server.go (1)
85-96: LGTM! Explicit shutdown for server protocol.
The Stop() method follows the same explicit cleanup pattern adopted across other protocol implementations in this PR, closing result channels before stopping the underlying protocol.
bb59f79 to beec783
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
protocol/txsubmission/server.go (2)
18-35: Guard `Stop()` against double-close on `done`

`handleDone()` now calls `s.Stop()`, and the connection teardown path is likely to do the same. As written, the second caller panics on `close(s.done)`. We need an idempotent guard (e.g., `sync.Once`) and to reset it when we create a fresh channel during restart.

Apply this diff:

```diff
@@
-import (
-	"errors"
-	"fmt"
+import (
+	"errors"
+	"fmt"
+	"sync"
@@
 type Server struct {
 	*protocol.Protocol
@@
-	done chan struct{}
+	done     chan struct{}
+	stopOnce sync.Once
@@
 func (s *Server) Stop() {
 	s.Protocol.Logger().
 		Debug("stopping server protocol",
 			"component", "network",
 			"protocol", ProtocolName,
 			"connection_id", s.callbackContext.ConnectionId.String(),
 		)
-	close(s.done)
+	s.stopOnce.Do(func() {
+		close(s.done)
+	})
 	s.Protocol.Stop()
 }
```

Also applies to: 88-96
247-263: Avoid deadlocking on `requestTxIdsResultChan` when handling Done

When the peer sends Done while no `RequestTxIds` call is outstanding, the current send blocks forever on the unbuffered channel, preventing restart. Switching to a non-blocking send (and resetting the new `stopOnce`) keeps shutdown smooth.

Apply this diff:

```diff
@@
-	s.requestTxIdsResultChan <- requestTxIdsResult{
-		err: ErrStopServerProcess,
-	}
+	select {
+	case s.requestTxIdsResultChan <- requestTxIdsResult{err: ErrStopServerProcess}:
+	default:
+	}
@@
-	s.requestTxIdsResultChan = make(chan requestTxIdsResult)
-	s.requestTxsResultChan = make(chan []TxBody)
-	s.done = make(chan struct{})
+	s.requestTxIdsResultChan = make(chan requestTxIdsResult)
+	s.requestTxsResultChan = make(chan []TxBody)
+	s.done = make(chan struct{})
+	s.stopOnce = sync.Once{}
 	s.ackCount = 0
 	s.Start()
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
- protocol/blockfetch/blockfetch.go (1 hunks)
- protocol/blockfetch/client.go (1 hunks)
- protocol/blockfetch/client_test.go (1 hunks)
- protocol/chainsync/chainsync.go (1 hunks)
- protocol/chainsync/client.go (1 hunks)
- protocol/chainsync/client_test.go (2 hunks)
- protocol/leiosnotify/client.go (1 hunks)
- protocol/localtxmonitor/client.go (1 hunks)
- protocol/localtxmonitor/client_test.go (1 hunks)
- protocol/localtxsubmission/client.go (1 hunks)
- protocol/localtxsubmission/client_test.go (1 hunks)
- protocol/txsubmission/server.go (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- protocol/chainsync/chainsync.go
- protocol/leiosnotify/client.go
- protocol/chainsync/client_test.go
- protocol/localtxsubmission/client_test.go
🧰 Additional context used
🧬 Code graph analysis (3)
protocol/localtxmonitor/client_test.go (7)
- protocol/blockfetch/client_test.go (1): TestClientShutdown (211-264)
- protocol/chainsync/client_test.go (1): TestClientShutdown (283-336)
- protocol/localtxsubmission/client_test.go (1): TestClientShutdown (167-219)
- connection.go (3): NewConnection (107-130), Connection (59-103), New (133-135)
- connection_options.go (2): WithConnection (36-40), WithNetworkMagic (50-54)
- protocol/localtxmonitor/localtxmonitor.go (1): LocalTxMonitor (112-115)
- protocol/localtxmonitor/client.go (1): Client (25-38)

protocol/blockfetch/client_test.go (6)
- protocol/chainsync/client_test.go (1): TestClientShutdown (283-336)
- connection.go (3): NewConnection (107-130), Connection (59-103), New (133-135)
- protocol/protocol.go (2): ProtocolRoleClient (94-94), New (122-133)
- protocol/blockfetch/blockfetch.go (2): New (156-162), BlockFetch (102-105)
- connection_options.go (3): WithConnection (36-40), WithNetworkMagic (50-54), WithNodeToNode (78-82)
- protocol/blockfetch/client.go (1): Client (29-39)

protocol/txsubmission/server.go (2)
- protocol/protocol.go (1): Protocol (39-60)
- protocol/error.go (1): ErrProtocolShuttingDown (19-19)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Analyze (go)
🔇 Additional comments (5)
protocol/blockfetch/blockfetch.go (1)
122-122: Recv queue bump looks safe
The new default of 384 still respects `MaxRecvQueueSize` (512) and helps accommodate the larger buffers targeted by the PR. No issues here.
protocol/localtxmonitor/client.go (1)
113-116: Orderly shutdown looks good
Holding `busyMutex` before closing the result channels ensures no in-flight request is still writing, so the deterministic shutdown here reads well.
protocol/localtxmonitor/client_test.go (1)
300-352: Shutdown test covers the new Stop path
Thanks for adding parity shutdown coverage—the mock handshake plus async error watcher matches the other protocol suites and should catch future regressions.
protocol/localtxsubmission/client.go (1)
102-102: Channel closure under mutex looks correct
Locking `busyMutex` before closing `submitResultChan` prevents the handler goroutines from racing the close; the shutdown flow remains safe.
protocol/blockfetch/client_test.go (1)
211-264: Shutdown regression test appreciated
This mirrors the other protocol shutdown suites and will flag future lifecycle regressions quickly.
agaffney
left a comment
The bot seems to think that it's a bad idea to move the channel close into the Stop() function on some of these protocols, and I'd tend to agree. The way it is now allows for cleaner async shutdown.
beec783 to fe904df
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
protocol/txsubmission/server.go (1)
250-273: Fix the silent error loss and thread-safety issues in the restart sequence.

The non-blocking send to the unbuffered `requestTxIdsResultChan` (lines 251-257) silently discards the error if no goroutine is actively blocked in the select statement at line 149. Any subsequent `RequestTxIds` call after restart will receive from the newly created channel (line 267) and will not be aware that a Done message was processed. This violates the protocol's contract of notifying callers about the done state.

Additionally, the restart sequence (lines 265-272) creates a thread-safety issue: reinitializing `onceStop = sync.Once{}` (line 270) allows `Stop()` to execute again after being called at line 265, potentially triggering duplicate stop logic or partial restarts if `Stop()` is called concurrently during the window between lines 270-272. The channel reassignments (lines 267-269) are also not synchronized, risking data races if goroutines are blocked on sends/receives during restart.

Recommended fixes:
- Use a buffered channel for `requestTxIdsResultChan` or implement a callback-based notification mechanism to ensure no error loss (a small sketch of the buffered-channel option follows this list)
- Synchronize the restart sequence with a lock to prevent concurrent `Stop()` calls and unsafe channel reassignments
- Add test coverage for restart scenarios, particularly with concurrent `RequestTxIds` calls and `Stop()` invocations
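A minimal, self-contained sketch of the buffered-channel option from the first bullet above; the type and channel names mirror the discussion but are stand-ins, not the library's actual definitions:

```go
package main

import (
	"errors"
	"fmt"
)

var ErrStopServerProcess = errors.New("stop server process")

type requestTxIdsResult struct {
	err error
}

func main() {
	// With a one-slot buffer, the done handler can park the stop signal even
	// when no RequestTxIds call is currently waiting on the channel...
	requestTxIdsResultChan := make(chan requestTxIdsResult, 1)
	requestTxIdsResultChan <- requestTxIdsResult{err: ErrStopServerProcess}

	// ...so a later caller still observes that Done was processed instead of
	// silently losing the error.
	result := <-requestTxIdsResultChan
	fmt.Println(errors.Is(result.err, ErrStopServerProcess)) // true
}
```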
♻️ Duplicate comments (1)
protocol/chainsync/client.go (1)
146-150: Acknowledge existing race condition flagged in previous review.
The critical race condition between message handlers writing to `readyForNextBlockChan` and `Stop()` closing it has already been thoroughly documented in the previous review comment. This remains unresolved in the current changes.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
- protocol/blockfetch/blockfetch.go (1 hunks)
- protocol/blockfetch/client.go (1 hunks)
- protocol/blockfetch/client_test.go (1 hunks)
- protocol/chainsync/chainsync.go (1 hunks)
- protocol/chainsync/client.go (1 hunks)
- protocol/chainsync/client_test.go (2 hunks)
- protocol/leiosnotify/client.go (1 hunks)
- protocol/localtxmonitor/client.go (1 hunks)
- protocol/localtxmonitor/client_test.go (1 hunks)
- protocol/localtxsubmission/client.go (1 hunks)
- protocol/localtxsubmission/client_test.go (1 hunks)
- protocol/txsubmission/server.go (8 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
- protocol/localtxsubmission/client_test.go
- protocol/blockfetch/blockfetch.go
- protocol/blockfetch/client_test.go
- protocol/chainsync/client_test.go
- protocol/blockfetch/client.go
- protocol/localtxmonitor/client_test.go
🧰 Additional context used
🧬 Code graph analysis (1)
protocol/txsubmission/server.go (3)
- protocol/protocol.go (1): Protocol (39-60)
- protocol/error.go (1): ErrProtocolShuttingDown (19-19)
- protocol/txsubmission/error.go (1): ErrStopServerProcess (21-21)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Analyze (go)
🔇 Additional comments (3)
protocol/chainsync/chainsync.go (1)
226-227: LGTM: Reasonable increase in default buffer sizes for better throughput.
Increasing `DefaultPipelineLimit` and `DefaultRecvQueueSize` from 50 to 75 aligns with the PR objectives to improve buffering and performance. The new values remain comfortably within protocol limits (max=100).
protocol/txsubmission/server.go (2)
89-101: LGTM: Proper shutdown implementation with idempotent Stop().
The Stop() method correctly uses `onceStop` to ensure idempotent shutdown, closes the `done` channel to signal waiting operations, and stops the underlying protocol. The logging is appropriate for debugging.
148-161: LGTM: Proper shutdown handling in RequestTxIds.
The updated `RequestTxIds` correctly uses a `select` statement to wait on both the result channel and the `done` channel, ensuring the function returns promptly on shutdown. The validation of `result.err` and the update of `ackCount` are correct.
fe904df to bc29c22
Actionable comments posted: 1
♻️ Duplicate comments (3)
protocol/localtxmonitor/client.go (1)
113-120: Critical race condition: message handlers can write to closed channels.
The cleanup goroutine closes result channels after `DoneChan()` signals, but message handlers (`handleAcquired` at line 266, `handleReplyHasTx` at line 279, `handleReplyNextTx` at line 292, `handleReplyGetSizes` at line 305) write to these channels without checking `DoneChan()`. Between when `MsgDone` is sent (line 109) and when the protocol fully shuts down, handlers can still execute. Once the cleanup goroutine closes the channels, any in-flight handler writes will panic.
Apply this fix to add DoneChan() checks in handlers:
```diff
 func (c *Client) handleAcquired(msg protocol.Message) error {
 	// ... logging ...
 	msgAcquired := msg.(*MsgAcquired)
 	c.acquired = true
 	c.acquiredSlot = msgAcquired.SlotNo
+	select {
+	case <-c.DoneChan():
+		return protocol.ErrProtocolShuttingDown
+	case c.acquireResultChan <- true:
+	}
-	c.acquireResultChan <- true
 	return nil
 }

 func (c *Client) handleReplyHasTx(msg protocol.Message) error {
 	// ... logging ...
 	msgReplyHasTx := msg.(*MsgReplyHasTx)
+	select {
+	case <-c.DoneChan():
+		return protocol.ErrProtocolShuttingDown
+	case c.hasTxResultChan <- msgReplyHasTx.Result:
+	}
-	c.hasTxResultChan <- msgReplyHasTx.Result
 	return nil
 }

 func (c *Client) handleReplyNextTx(msg protocol.Message) error {
 	// ... logging ...
 	msgReplyNextTx := msg.(*MsgReplyNextTx)
+	select {
+	case <-c.DoneChan():
+		return protocol.ErrProtocolShuttingDown
+	case c.nextTxResultChan <- msgReplyNextTx.Transaction.Tx:
+	}
-	c.nextTxResultChan <- msgReplyNextTx.Transaction.Tx
 	return nil
 }

 func (c *Client) handleReplyGetSizes(msg protocol.Message) error {
 	// ... logging ...
 	msgReplyGetSizes := msg.(*MsgReplyGetSizes)
+	select {
+	case <-c.DoneChan():
+		return protocol.ErrProtocolShuttingDown
+	case c.getSizesResultChan <- msgReplyGetSizes.Result:
+	}
-	c.getSizesResultChan <- msgReplyGetSizes.Result
 	return nil
 }
```
protocol/chainsync/client.go (1)
146-150: Critical: Handlers write to channel without synchronization.
Message handlers `handleRollForward` (lines 721, 728) and `handleRollBackward` (lines 752, 760) write to `readyForNextBlockChan` without checking `DoneChan()`. Between when `Stop()` sends `MsgDone` (line 142) and when the protocol shuts down, handlers can still execute. Once the cleanup goroutine closes the channel, handler writes will panic.
Apply this fix to add DoneChan() checks before channel writes:
```diff
 func (c *Client) handleRollForward(msgGeneric protocol.Message) error {
 	// ... existing logic ...
 	if callbackErr != nil {
 		if errors.Is(callbackErr, ErrStopSyncProcess) {
-			c.readyForNextBlockChan <- false
+			select {
+			case <-c.DoneChan():
+			case c.readyForNextBlockChan <- false:
+			}
 			return nil
 		} else {
 			return callbackErr
 		}
 	}
-	c.readyForNextBlockChan <- true
+	select {
+	case <-c.DoneChan():
+		return protocol.ErrProtocolShuttingDown
+	case c.readyForNextBlockChan <- true:
+	}
 	return nil
 }

 func (c *Client) handleRollBackward(msgGeneric protocol.Message) error {
 	// ... existing logic ...
 	if callbackErr := c.config.RollBackwardFunc(c.callbackContext, msgRollBackward.Point, msgRollBackward.Tip); callbackErr != nil {
 		if errors.Is(callbackErr, ErrStopSyncProcess) {
-			c.readyForNextBlockChan <- false
+			select {
+			case <-c.DoneChan():
+			case c.readyForNextBlockChan <- false:
+			}
 			return nil
 		} else {
 			return callbackErr
 		}
 	}
-	c.readyForNextBlockChan <- true
+	select {
+	case <-c.DoneChan():
+		return protocol.ErrProtocolShuttingDown
+	case c.readyForNextBlockChan <- true:
+	}
 	return nil
 }
```
protocol/leiosnotify/client.go (1)
93-97: Critical race condition: message handlers can write to closed channel.
The cleanup goroutine closes `requestNextChan` after `DoneChan()`, but message handlers (`handleBlockAnnouncement` at line 137, `handleBlockOffer` at line 142, `handleBlockTxsOffer` at line 147, `handleVotesOffer` at line 152) write to this channel without checking for shutdown. Between when `MsgDone` is sent (line 91) and when the protocol fully shuts down, handlers can still execute and write to the channel. Once the channel is closed, any handler write will panic.
Apply this fix to add DoneChan() checks in all handlers:
```diff
 func (c *Client) handleBlockAnnouncement(msg protocol.Message) error {
+	select {
+	case <-c.DoneChan():
+		return protocol.ErrProtocolShuttingDown
+	case c.requestNextChan <- msg:
+	}
-	c.requestNextChan <- msg
 	return nil
 }

 func (c *Client) handleBlockOffer(msg protocol.Message) error {
+	select {
+	case <-c.DoneChan():
+		return protocol.ErrProtocolShuttingDown
+	case c.requestNextChan <- msg:
+	}
-	c.requestNextChan <- msg
 	return nil
 }

 func (c *Client) handleBlockTxsOffer(msg protocol.Message) error {
+	select {
+	case <-c.DoneChan():
+		return protocol.ErrProtocolShuttingDown
+	case c.requestNextChan <- msg:
+	}
-	c.requestNextChan <- msg
 	return nil
 }

 func (c *Client) handleVotesOffer(msg protocol.Message) error {
+	select {
+	case <-c.DoneChan():
+		return protocol.ErrProtocolShuttingDown
+	case c.requestNextChan <- msg:
+	}
-	c.requestNextChan <- msg
 	return nil
 }
```
🧹 Nitpick comments (4)
protocol/localtxsubmission/client_test.go (1)
167-219: Good baseline shutdown test.
The test validates basic Start/Stop lifecycle and goroutine cleanup. Consider adding a follow-up test that verifies shutdown behavior during active operations (e.g., a pending SubmitTx call) to ensure graceful handling.
protocol/chainsync/client_test.go (1)
283-336: Baseline shutdown test is sound.
The test validates Start/Stop lifecycle and goroutine cleanup. Consider adding follow-up tests that verify shutdown behavior during active sync operations to ensure graceful handling of in-flight requests.
protocol/localtxmonitor/client_test.go (1)
300-352: Shutdown test follows established pattern.
The test validates basic lifecycle and leak detection. Consider adding tests that verify shutdown during active operations (e.g., pending `HasTx`, `NextTx`, or `GetSizes` calls) to ensure graceful handling.
protocol/txsubmission/server.go (1)
149-162: Consider select case ordering during shutdown.
The `select` statement doesn't guarantee priority between cases. If both `requestTxIdsResultChan` and `done` are ready, Go randomly chooses one. This means a valid result might be discarded in favor of returning `ErrProtocolShuttingDown`.
This behavior is likely acceptable—once shutdown is initiated, returning an error is reasonable. However, if you want to prioritize draining pending results before acknowledging shutdown, you could check the result channel first with a non-blocking receive, then fall back to the blocking select.
Current behavior is reasonable for most use cases, so this is optional.
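A small, self-contained sketch of that drain-first approach, using placeholder channel names rather than the server's actual fields:

```go
package main

import (
	"errors"
	"fmt"
)

var errShuttingDown = errors.New("protocol is shutting down")

// receive prefers an already-pending result over the shutdown signal by
// first doing a non-blocking drain, then falling back to the usual select
// (where Go picks among ready cases at random).
func receive(results chan int, done chan struct{}) (int, error) {
	select {
	case r := <-results:
		return r, nil
	default:
	}
	select {
	case r := <-results:
		return r, nil
	case <-done:
		return 0, errShuttingDown
	}
}

func main() {
	results := make(chan int, 1)
	done := make(chan struct{})
	results <- 42
	close(done) // even with shutdown signaled, the buffered result wins
	r, err := receive(results, done)
	fmt.Println(r, err) // 42 <nil>
}
```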
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
- protocol/blockfetch/blockfetch.go (1 hunks)
- protocol/blockfetch/client.go (1 hunks)
- protocol/blockfetch/client_test.go (1 hunks)
- protocol/chainsync/chainsync.go (1 hunks)
- protocol/chainsync/client.go (1 hunks)
- protocol/chainsync/client_test.go (2 hunks)
- protocol/leiosnotify/client.go (1 hunks)
- protocol/localtxmonitor/client.go (1 hunks)
- protocol/localtxmonitor/client_test.go (1 hunks)
- protocol/localtxsubmission/client.go (1 hunks)
- protocol/localtxsubmission/client_test.go (1 hunks)
- protocol/txsubmission/server.go (8 hunks)
✅ Files skipped from review due to trivial changes (1)
- protocol/blockfetch/blockfetch.go
🚧 Files skipped from review as they are similar to previous changes (2)
- protocol/localtxsubmission/client.go
- protocol/blockfetch/client_test.go
🧰 Additional context used
🧬 Code graph analysis (4)
protocol/chainsync/client_test.go (4)
- protocol/chainsync/chainsync.go (3): ChainSync (204-207), New (259-267), NewConfig (273-295)
- protocol/chainsync/client.go (1): Client (29-45)
- protocol/blockfetch/client_test.go (1): TestClientShutdown (211-264)
- connection.go (3): NewConnection (107-130), Connection (59-103), New (133-135)

protocol/txsubmission/server.go (3)
- protocol/protocol.go (1): Protocol (39-60)
- protocol/error.go (1): ErrProtocolShuttingDown (19-19)
- protocol/txsubmission/error.go (1): ErrStopServerProcess (21-21)

protocol/localtxmonitor/client_test.go (6)
- protocol/blockfetch/client_test.go (1): TestClientShutdown (211-264)
- protocol/chainsync/client_test.go (1): TestClientShutdown (283-336)
- protocol/localtxsubmission/client_test.go (1): TestClientShutdown (167-219)
- connection.go (3): NewConnection (107-130), Connection (59-103), New (133-135)
- connection_options.go (2): WithConnection (36-40), WithNetworkMagic (50-54)
- protocol/localtxmonitor/client.go (1): Client (25-38)

protocol/localtxsubmission/client_test.go (3)
- connection.go (3): NewConnection (107-130), Connection (59-103), New (133-135)
- connection_options.go (2): WithConnection (36-40), WithNetworkMagic (50-54)
- protocol/localtxsubmission/client.go (1): Client (26-34)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Analyze (go)
🔇 Additional comments (9)
protocol/blockfetch/client.go (1)
112-117: Shutdown pattern correctly implemented.
The deferred channel cleanup is safe because all message handlers (`handleStartBatch` at lines 226-230, `handleNoBlocks` at lines 244-249, and `handleBlock` at lines 281-286) check `DoneChan()` before writing to channels, preventing writes to closed channels.
protocol/chainsync/client_test.go (1)
83-86: Good cleanup addition.
Adding `client.Stop()` in the test teardown ensures proper cleanup of goroutines and prevents resource leaks in tests.
protocol/chainsync/chainsync.go (1)
226-227: Performance tuning looks reasonable.
Increasing default queue sizes from 50 to 75 (still within protocol max of 100) aligns with the PR's performance improvement objectives. The values remain compliant with protocol limits and validation logic.
protocol/txsubmission/server.go (6)
35-37: LGTM! Appropriate synchronization primitives for shutdown and restart.
The new fields provide the necessary synchronization for graceful shutdown and restart:
- `done` signals shutdown to waiting operations
- `onceStop` ensures idempotent `Stop()` behavior
- `restartMutex` protects the restart sequence
51-53: Good performance improvement with buffered channels.
Buffering the result channels with capacity 1 decouples producers and consumers, allowing message handlers to send results without blocking. This aligns with the PR's performance optimization objectives.
80-88: LGTM! Cleanup goroutine elimination aligns with PR objectives.
Moving cleanup from `Start()` to `Stop()` simplifies the code and eliminates an unnecessary goroutine, as noted in the PR description.
187-194: LGTM! Proper shutdown handling for RequestTxs.
The shutdown handling correctly checks channel closure and returns `ErrProtocolShuttingDown` appropriately.
254-261: Good defensive pattern with non-blocking send.
The non-blocking send to `requestTxIdsResultChan` properly prevents deadlock during restart. If `RequestTxIds` is waiting, it receives the stop signal; if not, the restart proceeds without blocking.
281-285: LGTM! Complete state reinitialization for restart.
The restart sequence properly reinitializes all stateful fields:
- Result channels recreated with appropriate buffer sizes
- Fresh `done` channel for the new protocol instance
- `onceStop` reset allows the restarted server to be stopped
- `ackCount` reset is correct for a new session
bc29c22 to 41df945
@coderabbitai review
✅ Actions performed
Review triggered.
Actionable comments posted: 1
♻️ Duplicate comments (1)
protocol/txsubmission/server.go (1)
90-102: Critical race condition: concurrent close ofdonechannel (duplicate).Both
Stop()(line 99) andhandleDone()(line 278) attempt to closes.donewithout coordination:
Stop()usesonceStopbut doesn't acquirerestartMutexhandleDone()acquiresrestartMutexbut closesdoneunconditionally- If called concurrently, both may attempt to close
s.done, causing a panicThe suggested fix from the previous review is still applicable:
- In Stop(): Acquire
restartMutexbefore closingdone:func (s *Server) Stop() { s.onceStop.Do(func() { + s.restartMutex.Lock() + defer s.restartMutex.Unlock() s.Protocol.Logger(). Debug("stopping server protocol", "component", "network", "protocol", ProtocolName, "connection_id", s.callbackContext.ConnectionId.String(), ) close(s.done) s.Protocol.Stop() }) }
- In handleDone() (line 278): Guard the close:
- close(s.done) + select { + case <-s.done: + // Already closed by Stop() + default: + close(s.done) + }
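As a standalone illustration of why the guard matters (this is a generic sketch, not code from this repository), the following shows a `sync.Once`-wrapped close staying safe under repeated Stop calls:

```go
package main

import (
	"fmt"
	"sync"
)

// stopper is a stand-in type showing only the two fields under discussion.
type stopper struct {
	done     chan struct{}
	onceStop sync.Once
}

// Stop is safe to call from any number of paths (Stop(), handleDone(),
// connection teardown): only the first call closes done, so later calls
// cannot hit a "close of closed channel" panic.
func (s *stopper) Stop() {
	s.onceStop.Do(func() {
		close(s.done)
	})
}

func main() {
	s := &stopper{done: make(chan struct{})}
	s.Stop()
	s.Stop() // no-op instead of a panic
	fmt.Println("closed exactly once")
}
```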
🧹 Nitpick comments (4)
protocol/peersharing/client_test.go (2)
72-76: Consider validating the value received from ErrorChan().The test discards the value received from
ErrorChan()without checking if an error occurred during shutdown. Consider capturing and validating the received value to ensure clean shutdown.Apply this diff to validate the shutdown:
// Wait for connection shutdown select { - case <-oConn.ErrorChan(): + case err := <-oConn.ErrorChan(): + if err != nil { + t.Errorf("unexpected error during shutdown: %s", err) + } case <-time.After(10 * time.Second): t.Errorf("did not shutdown within timeout") }
56-66: Consider testing shutdown after protocol operations.While testing basic shutdown is valuable, performing actual PeerSharing protocol operations before shutdown would better validate the cleanup improvements mentioned in the PR (deferred cleanup, eliminated goroutines). This would ensure that active protocol state is properly cleaned up.
protocol/txsubmission/server_test.go (1)
28-30: Track the mock server limitation as a follow-up issue.The test is skipped due to mock server issues with the NtN protocol, which means the server shutdown behavior isn't currently being validated. Consider creating a GitHub issue to track fixing the mock server so this test can be enabled.
Would you like me to help draft an issue description for tracking the mock server limitation?
protocol/localstatequery/client_test.go (1)
357-409: Consider adding Done message to mock conversation for completeness.The test verifies that
Stop()can be called without error, but the mock conversation (lines 361-364) doesn't include an expectedDonemessage from the client. SinceStop()sendsMsgDone(as implemented inclient.goline 127), consider adding this to the mock conversation:mockConn := ouroboros_mock.NewConnection( ouroboros_mock.ProtocolRoleClient, []ouroboros_mock.ConversationEntry{ ouroboros_mock.ConversationEntryHandshakeRequestGeneric, ouroboros_mock.ConversationEntryHandshakeNtCResponse, ouroboros_mock.ConversationEntryInput{ ProtocolId: localstatequery.ProtocolId, MessageType: localstatequery.MessageTypeDone, }, }, )This would make the test more explicit about what's expected during shutdown and catch any regressions where the Done message isn't sent.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (19)
protocol/blockfetch/blockfetch.go(1 hunks)protocol/blockfetch/client.go(4 hunks)protocol/blockfetch/client_test.go(1 hunks)protocol/chainsync/chainsync.go(1 hunks)protocol/chainsync/client.go(3 hunks)protocol/chainsync/client_test.go(2 hunks)protocol/keepalive/client_test.go(1 hunks)protocol/leiosnotify/client.go(2 hunks)protocol/leiosnotify/client_test.go(1 hunks)protocol/localstatequery/client.go(5 hunks)protocol/localstatequery/client_test.go(1 hunks)protocol/localtxmonitor/client.go(5 hunks)protocol/localtxmonitor/client_test.go(1 hunks)protocol/localtxsubmission/client.go(3 hunks)protocol/localtxsubmission/client_test.go(1 hunks)protocol/peersharing/client_test.go(1 hunks)protocol/peersharing/server_test.go(1 hunks)protocol/txsubmission/server.go(8 hunks)protocol/txsubmission/server_test.go(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
- protocol/localtxmonitor/client_test.go
- protocol/chainsync/client.go
- protocol/chainsync/client_test.go
- protocol/blockfetch/blockfetch.go
- protocol/blockfetch/client_test.go
- protocol/localtxsubmission/client_test.go
🔇 Additional comments (25)
protocol/peersharing/client_test.go (1)
28-29: Good use of goleak for leak detection.The
defer goleak.VerifyNone(t)ensures no goroutine leaks occur during the test, which aligns well with the PR's objective of eliminating unnecessary goroutines and improving cleanup.protocol/txsubmission/server_test.go (1)
31-80: Well-structured shutdown test.The test correctly validates the shutdown sequence with proper error handling, timeouts, and cleanup. The pattern of monitoring async errors, starting/stopping the server, and verifying orderly shutdown is sound.
protocol/keepalive/client_test.go (1)
241-292: LGTM! Client shutdown test is well-structured.The test follows the same robust pattern as the server tests, with proper async error monitoring, timeout handling, and cleanup. Unlike the server tests, this one isn't skipped, so it will actively validate the client shutdown behavior.
protocol/chainsync/chainsync.go (1)
226-227: Appropriate default increases for better buffering.The increase from 50 to 75 for both
DefaultPipelineLimitandDefaultRecvQueueSizealigns with the PR's performance objectives. The new defaults at 75% of the maximum limits (100) provide better buffering while maintaining a reasonable margin.protocol/leiosnotify/client.go (2)
93-97: Proper deferred channel cleanup.The change to defer closing
requestNextChanuntil afterDoneChan()fires correctly addresses the race condition mentioned in past reviews. This ensures handlers complete before the channel is closed.
137-169: Shutdown-aware message handling is correctly implemented.All four message handlers (
handleBlockAnnouncement,handleBlockOffer,handleBlockTxsOffer,handleVotesOffer) consistently use the select pattern to checkDoneChan()before writing torequestNextChan, preventing panics from writes to closed channels during shutdown.protocol/localtxmonitor/client.go (2)
113-120: Correct deferred cleanup for all result channels.The goroutine properly waits for
DoneChan()before closing all four result channels (acquireResultChan,hasTxResultChan,nextTxResultChan,getSizesResultChan), ensuring handlers complete before cleanup.
266-321: All handlers correctly implement shutdown guards.All four message handlers consistently use the select pattern with
DoneChan()to prevent writes to result channels during shutdown, addressing the race condition flagged in previous reviews.protocol/blockfetch/client.go (3)
112-117: Deferred channel cleanup correctly implemented.The change to defer closing
blockChanandstartBatchResultChanuntil afterDoneChan()fires addresses the race condition from the previous review, preventing panics when responses arrive afterStop()is called.
225-230: Shutdown-aware batch start handling.The select statement properly guards the send to
startBatchResultChan, returningErrProtocolShuttingDownif the protocol is shutting down.
242-301: Proper shutdown handling in block and error paths.Both
handleNoBlocksandhandleBlockcorrectly implement shutdown checks:
handleNoBlockscreates the error before the select, then guards the sendhandleBlockchecks for shutdown before callback processing (lines 278-282) and uses select for channel sends in the non-callback path (lines 297-301)protocol/localtxsubmission/client.go (2)
102-106: Deferred channel cleanup prevents race condition.The goroutine correctly waits for
DoneChan()before closingsubmitResultChan, ensuring that handlers complete before the channel is closed.
158-187: Handlers correctly use select to eliminate TOCTOU race.Both
handleAcceptTxandhandleRejectTxuse the select pattern withDoneChan(), which atomically checks for shutdown while attempting to send tosubmitResultChan. This eliminates the time-of-check-time-of-use race condition mentioned in previous reviews.protocol/peersharing/server_test.go (2)
28-30: Track mock server limitation alongside TxSubmission test.This test is skipped for the same reason as
protocol/txsubmission/server_test.go. These should be tracked together in a single issue for fixing the mock server's NtN protocol support.
31-78: Consistent and well-structured server shutdown test.The test follows the same robust pattern as other shutdown tests in this PR, with proper async error monitoring, timeout handling, and cleanup. The consistent pattern across the codebase makes the tests easier to understand and maintain.
protocol/localstatequery/client.go (5)
99-115: LGTM! Deferred cleanup improves shutdown reliability.The goroutine that closes result channels on protocol shutdown ensures that callers waiting on these channels will unblock when the protocol stops. This prevents resource leaks and aligns with the PR's goal of improving shutdown handling.
117-131: LGTM! Idempotent Stop() with proper error handling.The Stop() method correctly uses
onceStopto ensure it only executes once, even if called multiple times. Returning the error fromSendMessageallows callers to handle cases where the message cannot be sent (e.g., if the protocol is already shutting down).
887-903: LGTM! Shutdown-aware sending prevents blocking.Setting
acquired = truebefore the select statement ensures state is updated consistently. The select statement properly checks for shutdown before sending toacquireResultChan, returningprotocol.ErrProtocolShuttingDownif the protocol is shutting down.
905-931: LGTM! Consistent shutdown handling in failure cases.Both failure cases now use shutdown-aware sending via select statements, preventing blocked sends during protocol shutdown. This ensures that failure handling respects the protocol lifecycle.
933-948: LGTM! Optimized result extraction with shutdown awareness.Extracting the
MsgResultbefore the select statement is a minor optimization that improves readability. The shutdown-aware send pattern is consistent with the other handlers.protocol/txsubmission/server.go (5)
27-38: LGTM! New fields support shutdown and restart coordination.The added fields (
done,onceStop,restartMutex) provide the necessary infrastructure for graceful shutdown and protocol restart. This follows established patterns for lifecycle management.
46-61: LGTM! Buffered result channels improve shutdown handling.The buffered result channels (capacity 1) allow handlers to send results without blocking, which is particularly useful during shutdown. The unbuffered
donechannel is appropriate for signaling.
104-163: LGTM! Shutdown-aware request handling with proper state updates.The select statement properly handles both results and shutdown signals. The
ackCountupdate at line 158 correctly occurs after successfully receiving results, ensuring consistency.
165-195: LGTM! Performance optimization and shutdown handling.The pre-allocation of
txString(lines 167-172) is a nice performance optimization for logging. The shutdown-aware select statement properly handles both the done signal and channel closure.
246-288: LGTM! Comprehensive restart logic with proper state reinitialization.The non-blocking send (lines 254-261) ensures
RequestTxIdsis signaled without blocking. The restart sequence properly:
- Acquires
restartMutexfor coordination- Stops the current protocol
- Recreates channels with appropriate buffering
- Resets
onceStopto allow future Stop() calls- Reinitializes all state
Note: The race condition with
close(s.done)at line 278 was flagged in the Stop() method review.
41df945 to af11acf
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
protocol/localstatequery/client.go (1)
887-903: Potential data race on `c.acquired` between handler and callers
`c.acquired` is written here in `handleAcquired()` (called from the protocol's message-handling goroutine) and read in `acquire()`/`runQuery()` without any shared synchronization. That gives you a likely data race under the Go race detector, and in the worst case could cause a caller to take the wrong Acquire vs ReAcquire path under contention. Since
`busyMutex` is held by the caller while `acquire()` waits on `c.acquireResultChan`, you can't safely reuse `busyMutex` here without risking deadlock. Consider making `acquired` an atomic value or guarding it with a small dedicated mutex (read+write on both the call path and in `handleAcquired()`/`release()`).
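A minimal sketch of the atomic-flag option mentioned above, using `sync/atomic`'s `Bool` (assumes Go 1.19+); the type is a stand-in, not the library's actual client:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// client is a stand-in showing only the acquired flag; the real client has
// many more fields and methods.
type client struct {
	acquired atomic.Bool
}

// handleAcquired runs on the protocol's message-handling goroutine.
func (c *client) handleAcquired() {
	c.acquired.Store(true)
}

// nextAcquireMessage runs on the caller's goroutine; the atomic load makes
// the read race-free without touching busyMutex.
func (c *client) nextAcquireMessage() string {
	if c.acquired.Load() {
		return "ReAcquire"
	}
	return "Acquire"
}

func main() {
	c := &client{}
	fmt.Println(c.nextAcquireMessage()) // Acquire
	c.handleAcquired()
	fmt.Println(c.nextAcquireMessage()) // ReAcquire
}
```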
🧹 Nitpick comments (2)
protocol/localstatequery/client.go (1)
43-44: Stop() lifecycle guard and deferred cleanup look solidThe new
onceStopguard plusStop()that sendsMsgDoneand then closesqueryResultChan/acquireResultChanafter<-c.DoneChan()matches the shutdown pattern used in other protocol clients and avoids the earlier eager cleanup inStart(). This should reduce goroutine churn and races around channel closing.One small nit: if
DoneChan()never closes (e.g., protocol was never started or gets wedged), the goroutine spawned inStop()will park forever. Probably acceptable in practice, but if you want to be extra defensive you could gate the goroutine ononceStarthaving run or add a timeout around the wait.Also applies to: 99-131
protocol/txsubmission/server.go (1)
256-263: handleDone signaling and restart flow are well‑structuredThe non‑blocking send of
ErrStopServerProcessintorequestTxIdsResultChanplus invokingDoneFuncand then restarting underrestartMutex(including guarded close ofdone,Protocol.Stop(),initProtocol(), re‑creating channels, resettingdoneandonceStop, zeroingackCount, andStart()) gives a coherent restart story and avoids the previous concurrentdoneclose issue.One minor consideration:
s.Start()currently runs whilerestartMutexis held. If you ever wantStop()calls not to block behind a potentially slow restart, you could move theStart()call outside the locked section, once all fields are safely reinitialized. Not critical, but might improve responsiveness under heavy load.Also applies to: 271-292
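A rough, self-contained sketch of that suggestion (illustrative names only, not the library's actual fields): reinitialize state under the lock, then run the slow start step after releasing it, so a concurrent Stop() only waits for the short critical section.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type restarter struct {
	mu   sync.Mutex
	done chan struct{}
}

func (r *restarter) restart() {
	r.mu.Lock()
	r.done = make(chan struct{}) // state reinitialized while holding the lock
	r.mu.Unlock()
	r.start() // slow startup happens outside the critical section
}

func (r *restarter) start() {
	time.Sleep(100 * time.Millisecond) // stand-in for protocol startup
}

func (r *restarter) Stop() {
	r.mu.Lock()
	defer r.mu.Unlock()
	select {
	case <-r.done:
		// already closed
	default:
		close(r.done)
	}
}

func main() {
	r := &restarter{done: make(chan struct{})}
	go r.restart()
	time.Sleep(10 * time.Millisecond) // let restart() finish its locked section
	begin := time.Now()
	r.Stop()
	fmt.Printf("Stop returned after %v, without waiting for start()\n",
		time.Since(begin).Round(time.Millisecond))
}
```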
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (20)
protocol/blockfetch/blockfetch.go(1 hunks)protocol/blockfetch/client.go(4 hunks)protocol/blockfetch/client_test.go(1 hunks)protocol/chainsync/chainsync.go(1 hunks)protocol/chainsync/client.go(3 hunks)protocol/chainsync/client_test.go(2 hunks)protocol/keepalive/client_test.go(1 hunks)protocol/leiosnotify/client.go(2 hunks)protocol/leiosnotify/client_test.go(1 hunks)protocol/localstatequery/client.go(5 hunks)protocol/localstatequery/client_test.go(1 hunks)protocol/localtxmonitor/client.go(5 hunks)protocol/localtxmonitor/client_test.go(1 hunks)protocol/localtxsubmission/client.go(3 hunks)protocol/localtxsubmission/client_test.go(1 hunks)protocol/peersharing/client.go(4 hunks)protocol/peersharing/client_test.go(1 hunks)protocol/peersharing/server_test.go(1 hunks)protocol/txsubmission/server.go(8 hunks)protocol/txsubmission/server_test.go(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (9)
- protocol/chainsync/chainsync.go
- protocol/localtxsubmission/client.go
- protocol/peersharing/server_test.go
- protocol/leiosnotify/client.go
- protocol/peersharing/client_test.go
- protocol/chainsync/client_test.go
- protocol/chainsync/client.go
- protocol/blockfetch/client_test.go
- protocol/txsubmission/server_test.go
🧰 Additional context used
🧬 Code graph analysis (10)
protocol/localtxmonitor/client_test.go (4)
protocol/blockfetch/client_test.go (1)
TestClientShutdown(211-264)connection.go (3)
NewConnection(107-130)Connection(59-103)New(133-135)connection_options.go (2)
WithConnection(36-40)WithNetworkMagic(50-54)protocol/localtxmonitor/client.go (1)
Client(25-38)
protocol/keepalive/client_test.go (5)
protocol/blockfetch/client_test.go (1)
TestClientShutdown(211-264)protocol/peersharing/client_test.go (1)
TestClientShutdown(28-81)connection.go (3)
NewConnection(107-130)Connection(59-103)New(133-135)connection_options.go (3)
WithConnection(36-40)WithNetworkMagic(50-54)WithNodeToNode(78-82)protocol/keepalive/keepalive.go (1)
KeepAlive(85-88)
protocol/localstatequery/client.go (5)
protocol/protocol.go (1)
Protocol(39-60)protocol/localstatequery/localstatequery.go (1)
ProtocolName(28-28)protocol/localstatequery/messages.go (3)
NewMsgDone(245-252)AcquireFailurePointNotOnChain(44-44)MsgResult(172-175)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)protocol/localstatequery/error.go (2)
ErrAcquireFailurePointTooOld(20-20)ErrAcquireFailurePointNotOnChain(23-25)
protocol/localtxsubmission/client_test.go (4)
protocol/blockfetch/client_test.go (1)
TestClientShutdown(211-264)connection.go (3)
NewConnection(107-130)Connection(59-103)New(133-135)connection_options.go (2)
WithConnection(36-40)WithNetworkMagic(50-54)protocol/localtxsubmission/client.go (1)
Client(26-34)
protocol/blockfetch/client.go (4)
protocol/blockfetch/blockfetch.go (1)
New(156-162)protocol/protocol.go (1)
New(122-133)muxer/muxer.go (1)
New(90-117)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)
protocol/txsubmission/server.go (3)
protocol/protocol.go (1)
Protocol(39-60)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)protocol/txsubmission/error.go (1)
ErrStopServerProcess(21-21)
protocol/peersharing/client.go (3)
protocol/protocol.go (1)
Protocol(39-60)protocol/peersharing/peersharing.go (1)
ProtocolName(27-27)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)
protocol/localtxmonitor/client.go (1)
protocol/error.go (1)
ErrProtocolShuttingDown(19-19)
protocol/leiosnotify/client_test.go (4)
protocol/versiondata.go (6)
VersionData(40-46)VersionDataNtN13andUp(143-145)VersionDataNtN11to12(116-122)DiffusionModeInitiatorOnly(21-21)PeerSharingModeNoPeerSharing(27-27)QueryModeDisabled(36-36)protocol/handshake/messages.go (1)
NewMsgAcceptVersion(88-102)connection_options.go (3)
WithConnection(36-40)WithNetworkMagic(50-54)WithNodeToNode(78-82)protocol/leiosnotify/client.go (1)
Client(24-31)
protocol/localstatequery/client_test.go (4)
protocol/localtxmonitor/client_test.go (1)
TestClientShutdown(300-352)connection.go (3)
NewConnection(107-130)Connection(59-103)New(133-135)connection_options.go (2)
WithConnection(36-40)WithNetworkMagic(50-54)protocol/localstatequery/client.go (1)
Client(30-44)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Analyze (go)
🔇 Additional comments (14)
protocol/peersharing/client.go (3)
72-83: LGTM: Clean idempotent start implementation.The Start() method correctly uses sync.Once to ensure single initialization, includes appropriate logging for observability, and delegates to the underlying Protocol.Start(). The implementation follows the lifecycle pattern described in the PR objectives.
147-151: LGTM: Shutdown-aware channel send prevents blocking.The select statement correctly checks DoneChan before sending to sharePeersChan, returning
protocol.ErrProtocolShuttingDownduring shutdown instead of blocking. This integrates well with the deferred channel closure in Stop() and aligns with the PR's goal of consistent shutdown handling across protocols.
85-102: Code pattern is sound; error handling concern was based on misreading Protocol.Stop() signature.The shutdown flow is properly implemented:
- Protocol.Stop() returns void (not error), so the concern about error propagation is incorrect.
- The goroutine at lines 95-98 will reliably close when Protocol.Stop() → Muxer.UnregisterProtocol() closes the receive channel, causing the Protocol's loops to exit and the sentinel goroutine (Protocol.Start lines 161-165) to close DoneChan.
- No indefinite goroutine leak: the cleanup chain closes recvChan → protocol loops exit → sentinel closes doneChan → Client goroutine unblocks and closes sharePeersChan.
The async shutdown design is acceptable; Client.Stop() correctly returns nil since the actual cleanup happens asynchronously.
protocol/localtxmonitor/client_test.go (1)
300-352: LGTM!The shutdown test follows the established pattern and properly validates error handling, timeout behavior, and goroutine cleanup.
protocol/localtxmonitor/client.go (1)
113-120: Deferred cleanup correctly addresses race condition.The goroutine now waits for
DoneChan()before closing result channels, ensuring no handlers attempt to send after closure. Combined with the shutdown-aware selects in handlers (lines 266-270, 283-287, 300-304, 317-321), this eliminates the panic risk flagged in past reviews.protocol/leiosnotify/client_test.go (1)
56-106: Good fix: test is no longer skipped.The protocol initialization issues mentioned in past reviews have been resolved. The test now runs and properly validates LeiosNotify client shutdown behavior with appropriate handshake configuration (NtN version 15).
protocol/blockfetch/blockfetch.go (1)
122-122: LGTM!Increasing
DefaultRecvQueueSizefrom 256 to 384 aligns with the PR's performance objectives by providing better buffering capacity while remaining well within theMaxRecvQueueSizelimit of 512.protocol/localtxsubmission/client_test.go (1)
167-219: LGTM!The shutdown test properly validates error handling, mock connection teardown, and timeout behavior consistent with the established testing pattern across other protocols.
protocol/localstatequery/client_test.go (1)
357-409: LGTM!The shutdown test follows the established pattern with proper error handling and timeout-based validation of orderly client shutdown.
protocol/blockfetch/client.go (1)
112-117: Deferred cleanup correctly prevents panics.The goroutine now waits for
DoneChan()before closing channels, ensuring message handlers (lines 225-229, 242-247, 297-301) cannot send to closed channels. This addresses the race condition flagged in past reviews.protocol/localstatequery/client.go (2)
905-931: Shutdown‑aware failure handling looks correctWrapping the sends to
c.acquireResultChanin aselectonc.DoneChan()means the handler now cleanly returnsprotocol.ErrProtocolShuttingDowninstead of risking a send on a channel that will soon be closed. This aligns with the rest of the PR’s shutdown handling and looks good.
933-947: Result handling now safely coordinates with shutdownDecoding
MsgResultup front and then using aselectonc.DoneChan()vsc.queryResultChan <- msgResult.Resultensures callers either see the result or a consistentprotocol.ErrProtocolShuttingDownwhen the protocol is stopping, avoiding sends into a channel closed byStop(). This change is straightforward and sound.protocol/txsubmission/server.go (2)
20-21: Server lifecycle fields and Stop() coordination look goodIntroducing
done,onceStop, andrestartMutexonServer, initializing them inNewServer, and havingStop()acquirerestartMutexbefore closingdoneand callingProtocol.Stop()gives a clear, idempotent shutdown path and avoids the earlier double‑close race ondone. This setup aligns well with the restart logic inhandleDone()and with the rest of the PR’s shutdown model.Also applies to: 26-38, 45-54, 80-104
188-196: RequestTxs: shutdown‑aware select is correctThe updated
selectons.donevss.requestTxsResultChanensures callers either seeprotocol.ErrProtocolShuttingDownor the Tx bodies, and gracefully handles a closed result channel. This matches the shutdown strategy used elsewhere and looks correct.
af11acf to ddbf172
Actionable comments posted: 1
♻️ Duplicate comments (1)
protocol/keepalive/client_test.go (1)
241-294: LGTM! Shutdown test properly validates client lifecycle.
The test correctly:
- Creates a mock connection and wires async error monitoring
- Starts and stops the KeepAlive client
- Checks the Stop() error (addressing the past review comment)
- Verifies shutdown within timeouts
- Uses goleak.VerifyNone() to detect goroutine leaks (see the sketch below)
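For reference, a minimal example of the goleak pattern these shutdown tests rely on (`go.uber.org/goleak` is the real module; the package name and worker here are hypothetical):

```go
package keepalive_test

import (
	"testing"

	"go.uber.org/goleak"
)

// TestNoGoroutineLeaks shows the pattern: the deferred VerifyNone fails the
// test if unexpected goroutines are still running when the test ends.
func TestNoGoroutineLeaks(t *testing.T) {
	defer goleak.VerifyNone(t)

	done := make(chan struct{})
	go func() {
		<-done // simulated worker that is stopped before the test returns
	}()
	close(done)
}
```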
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (23)
protocol/blockfetch/blockfetch.go(1 hunks)protocol/blockfetch/client.go(6 hunks)protocol/blockfetch/client_test.go(1 hunks)protocol/chainsync/chainsync.go(1 hunks)protocol/chainsync/client.go(5 hunks)protocol/chainsync/client_test.go(2 hunks)protocol/keepalive/client.go(2 hunks)protocol/keepalive/client_test.go(1 hunks)protocol/leiosfetch/client.go(2 hunks)protocol/leiosnotify/client.go(4 hunks)protocol/leiosnotify/client_test.go(1 hunks)protocol/localstatequery/client.go(6 hunks)protocol/localstatequery/client_test.go(1 hunks)protocol/localtxmonitor/client.go(7 hunks)protocol/localtxmonitor/client_test.go(1 hunks)protocol/localtxsubmission/client.go(5 hunks)protocol/localtxsubmission/client_test.go(1 hunks)protocol/peersharing/client.go(4 hunks)protocol/peersharing/client_test.go(1 hunks)protocol/peersharing/server.go(1 hunks)protocol/peersharing/server_test.go(1 hunks)protocol/txsubmission/server.go(8 hunks)protocol/txsubmission/server_test.go(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (7)
- protocol/chainsync/chainsync.go
- protocol/peersharing/server_test.go
- protocol/localstatequery/client_test.go
- protocol/txsubmission/server_test.go
- protocol/chainsync/client_test.go
- protocol/blockfetch/blockfetch.go
- protocol/leiosnotify/client_test.go
🧰 Additional context used
🧬 Code graph analysis (15)
protocol/peersharing/server.go (1)
protocol/message.go (1)
Message(18-22)
protocol/localtxsubmission/client.go (1)
protocol/error.go (1)
ErrProtocolShuttingDown(19-19)
protocol/localtxsubmission/client_test.go (8)
protocol/blockfetch/client_test.go (1)
TestClientShutdown(211-264)protocol/chainsync/client_test.go (1)
TestClientShutdown(283-336)protocol/localstatequery/client_test.go (1)
TestClientShutdown(357-409)protocol/localtxmonitor/client_test.go (1)
TestClientShutdown(300-352)connection.go (3)
NewConnection(107-130)Connection(59-103)New(133-135)connection_options.go (2)
WithConnection(36-40)WithNetworkMagic(50-54)protocol/localtxsubmission/localtxsubmission.go (1)
LocalTxSubmission(67-70)protocol/localtxsubmission/client.go (1)
Client(26-35)
protocol/localtxmonitor/client.go (2)
protocol/error.go (1)
ErrProtocolShuttingDown(19-19)ledger/tx.go (1)
Transaction(26-26)
protocol/blockfetch/client_test.go (5)
protocol/keepalive/client_test.go (1)
TestClientShutdown(241-294)connection.go (3)
NewConnection(107-130)Connection(59-103)New(133-135)protocol/blockfetch/blockfetch.go (2)
New(156-162)BlockFetch(102-105)connection_options.go (3)
WithConnection(36-40)WithNetworkMagic(50-54)WithNodeToNode(78-82)protocol/blockfetch/client.go (1)
Client(29-40)
protocol/leiosnotify/client.go (2)
protocol/error.go (1)
ErrProtocolShuttingDown(19-19)protocol/blockfetch/client.go (1)
Client(29-40)
protocol/keepalive/client.go (4)
protocol/protocol.go (1)
Protocol(39-60)protocol/keepalive/keepalive.go (1)
ProtocolName(27-27)connection/id.go (1)
ConnectionId(22-25)protocol/keepalive/messages.go (1)
NewMsgDone(94-101)
protocol/chainsync/client.go (1)
protocol/error.go (1)
ErrProtocolShuttingDown(19-19)
protocol/blockfetch/client.go (3)
protocol/blockfetch/blockfetch.go (1)
New(156-162)protocol/protocol.go (1)
New(122-133)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)
protocol/peersharing/client_test.go (4)
connection.go (2)
NewConnection(107-130)Connection(59-103)protocol/protocol.go (1)
ProtocolRoleClient(94-94)connection_options.go (3)
WithConnection(36-40)WithNetworkMagic(50-54)WithNodeToNode(78-82)protocol/peersharing/client.go (1)
Client(25-33)
protocol/localstatequery/client.go (5)
protocol/protocol.go (1)
Protocol(39-60)protocol/localtxmonitor/client.go (1)
Client(25-39)protocol/localstatequery/messages.go (4)
NewMsgDone(245-252)AcquireFailurePointNotOnChain(44-44)MsgResult(172-175)NewMsgQuery(160-170)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)protocol/localstatequery/error.go (2)
ErrAcquireFailurePointTooOld(20-20)ErrAcquireFailurePointNotOnChain(23-25)
protocol/localtxmonitor/client_test.go (6)
protocol/localstatequery/client_test.go (1)
TestClientShutdown(357-409)connection.go (3)
NewConnection(107-130)Connection(59-103)New(133-135)protocol/protocol.go (2)
ProtocolRoleClient(94-94)New(122-133)connection_options.go (2)
WithConnection(36-40)WithNetworkMagic(50-54)protocol/localtxmonitor/localtxmonitor.go (1)
LocalTxMonitor(112-115)protocol/localtxmonitor/client.go (1)
Client(25-39)
protocol/peersharing/client.go (4)
protocol/blockfetch/client.go (1)
Client(29-40)protocol/protocol.go (1)
Protocol(39-60)protocol/peersharing/peersharing.go (1)
ProtocolName(27-27)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)
protocol/keepalive/client_test.go (5)
connection.go (3)
NewConnection(107-130)Connection(59-103)New(133-135)protocol/protocol.go (2)
ProtocolRoleClient(94-94)New(122-133)connection_options.go (3)
WithConnection(36-40)WithNetworkMagic(50-54)WithNodeToNode(78-82)protocol/keepalive/keepalive.go (1)
KeepAlive(85-88)protocol/keepalive/client.go (1)
Client(26-34)
protocol/txsubmission/server.go (5)
protocol/txsubmission/messages.go (2)
TxBody(197-201)NewMsgRequestTxIds(68-82)protocol/protocol.go (1)
Protocol(39-60)protocol/txsubmission/txsubmission.go (2)
ProtocolName(27-27)MaxAckCount(143-143)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)protocol/txsubmission/error.go (1)
ErrStopServerProcess(21-21)
🔇 Additional comments (26)
protocol/leiosfetch/client.go (1)
101-107: LGTM! Proper deferred cleanup on shutdown.
The goroutine correctly waits for protocol shutdown via `DoneChan()` before closing result channels, preventing writes to closed channels during the shutdown window.
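A hedged sketch of that deferred-cleanup idea with hypothetical names (the real client gates this on its `DoneChan()` accessor):

```go
package main

import "fmt"

type client struct {
	doneChan        chan struct{}
	blockResultChan chan string
}

// Stop defers closing the result channel until the protocol has fully shut
// down, so in-flight handlers never write to an already-closed channel.
func (c *client) Stop() {
	go func() {
		<-c.doneChan
		close(c.blockResultChan)
	}()
}

func main() {
	c := &client{
		doneChan:        make(chan struct{}),
		blockResultChan: make(chan string, 1),
	}
	c.Stop()
	c.blockResultChan <- "still open" // safe: doneChan has not been closed yet
	fmt.Println(<-c.blockResultChan)
	close(c.doneChan) // only now will the goroutine close blockResultChan
}
```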
protocol/keepalive/client.go (1)
97-111: LGTM! Clean Stop() implementation.
The new `Stop()` method properly uses `onceStop` to ensure idempotency, logs the shutdown, and sends the `MsgDone` message. The timer cleanup is correctly handled by the goroutine in `Start()` (lines 84-92), which waits for `DoneChan()` before stopping the timer.
protocol/localtxmonitor/client_test.go (1)
300-352: LGTM! Well-structured shutdown test.
The test follows the established pattern across protocol clients, properly validating:
- Client availability check
- Start/Stop lifecycle with error handling
- Async error monitoring with timeouts
- Goroutine leak detection via `goleak`
protocol/leiosnotify/client.go (2)
79-79: LGTM! Proper lifecycle management with `started` flag.
The `started` flag is correctly used to determine channel closure timing in `Stop()`:
- If started, defers `requestNextChan` closure until protocol shutdown (lines 96-100)
- If never started, closes immediately (lines 102-103)
This prevents closing channels that may still receive writes during shutdown, addressing the race condition noted in past reviews.
Also applies to: 95-104
143-177: LGTM! Shutdown-aware channel sends prevent panics.
All message handlers now use `select` with `DoneChan()` to avoid writing to `requestNextChan` after it's closed. This eliminates the TOCTOU race condition where handlers could check shutdown state and then write to a closed channel.
protocol/localtxsubmission/client.go (2)
83-83: LGTM! Consistent lifecycle management pattern.
The `started` flag and conditional channel closure in `Stop()` follow the same pattern as other protocol clients in this PR, ensuring `submitResultChan` is closed at the appropriate time based on whether the protocol was started.
Also applies to: 104-113
165-169: LGTM! Handlers properly guard against shutdown races.
Both `handleAcceptTx` and `handleRejectTx` now use `select` to race the channel send against `DoneChan()`, returning `ErrProtocolShuttingDown` if shutdown occurs first. This addresses the TOCTOU race noted in past reviews.
Also applies to: 190-194
protocol/localtxsubmission/client_test.go (1)
167-219: LGTM! Comprehensive shutdown test.
The test properly validates the LocalTxSubmission client lifecycle:
- Checks client availability
- Starts and stops with error handling
- Uses async error monitoring with timeouts
- Verifies clean shutdown and goroutine leak detection
protocol/blockfetch/client.go (2)
97-97: LGTM! Proper deferred channel closure prevents panics.
The `started` flag and conditional closure logic ensure that `blockChan` and `startBatchResultChan` are closed only after the protocol fully shuts down (via `DoneChan()`), preventing panics from in-flight responses attempting to write to closed channels. This addresses the critical issue noted in past reviews.
Also applies to: 114-125
233-238: LGTM! All handlers are shutdown-aware.
The message handlers now properly use `select` to check `DoneChan()` before sending to result channels, ensuring no writes occur after shutdown begins. This eliminates race conditions between shutdown and message handling.
Also applies to: 250-256, 305-309
protocol/blockfetch/client_test.go (1)
211-264: LGTM! Thorough shutdown lifecycle test.
The test properly validates the new Start/Stop lifecycle by:
- Verifying the client can be started and stopped cleanly
- Using goroutine leak detection to ensure no resources are leaked
- Handling async errors from the mock connection appropriately
- Enforcing reasonable timeouts for shutdown sequences
This follows the established testing pattern used across other protocol clients in this PR.
protocol/chainsync/client.go (3)
37-37: Good addition of lifecycle tracking.
The `started` flag enables conditional channel cleanup in `Stop()`, ensuring channels are only closed after the protocol fully shuts down when the client was actually started.
148-157: Proper shutdown sequencing prevents resource leaks.
The conditional deferred channel closure ensures:
- If started, channels close only after the protocol signals shutdown via `DoneChan()`
- If never started, channels close immediately to prevent leaks
This addresses the race condition flagged in previous reviews.
728-733: Shutdown-aware channel writes prevent panics.
The select statements properly handle shutdown by:
- Checking `DoneChan()` before writing to `readyForNextBlockChan`
- Returning `protocol.ErrProtocolShuttingDown` if the protocol is shutting down
- Preventing writes to potentially closed channels
This resolves the race condition between message handlers and `Stop()` noted in past reviews.
Also applies to: 739-743
protocol/localtxmonitor/client.go (2)
115-130: Excellent shutdown coordination for multiple result channels.
The conditional deferred closure handles all four result channels (`acquireResultChan`, `hasTxResultChan`, `nextTxResultChan`, `getSizesResultChan`) appropriately:
- Waits for protocol shutdown if started
- Closes immediately if never started
This prevents resource leaks and addresses the race condition noted in previous reviews.
276-281: Consistent shutdown-aware pattern across all handlers.
All message handlers now use the same safe pattern:
- Select on `DoneChan()` to detect shutdown
- Return `protocol.ErrProtocolShuttingDown` if shutting down
- Otherwise, write to the result channel
This eliminates the race condition where handlers could write to channels closed by `Stop()`.
Also applies to: 293-297, 310-314, 327-331
protocol/localstatequery/client.go (3)
904-911: Good synchronization for acquired state.
Using `acquiredMutex` to protect the `acquired` flag before signaling prevents race conditions with concurrent acquire/release operations.
927-937: Shutdown-aware error and result handling.
The handlers properly check for shutdown before writing to channels, returning `protocol.ErrProtocolShuttingDown` when appropriate. This prevents writes to closed channels.
Also applies to: 952-957
114-140: Resource leak on Stop() error path.
If `SendMessage(msg)` fails at line 125, the function returns the error at line 139 without:
- Closing the channels (`queryResultChan`, `acquireResultChan`)
- Calling `Protocol.Stop()`
Apply this fix to ensure cleanup even on error:
func (c *Client) Stop() error { var err error c.onceStop.Do(func() { c.Protocol.Logger(). Debug("stopping client protocol", "component", "network", "protocol", ProtocolName, "connection_id", c.callbackContext.ConnectionId.String(), ) msg := NewMsgDone() - err = c.SendMessage(msg) + if sendErr := c.SendMessage(msg); sendErr != nil { + // Log the error but continue with cleanup + c.Protocol.Logger(). + Error("failed to send Done message during stop", + "error", sendErr, + "component", "network", + "protocol", ProtocolName, + "connection_id", c.callbackContext.ConnectionId.String(), + ) + } // Defer closing channels until protocol fully shuts down (only if started) if c.started { go func() { <-c.DoneChan() close(c.queryResultChan) close(c.acquireResultChan) }() } else { // If protocol was never started, close channels immediately close(c.queryResultChan) close(c.acquireResultChan) } + c.Protocol.Stop() }) - return err + return nil }Alternatively, if you want to preserve the error, call
`Protocol.Stop()` unconditionally before returning the error.
Likely an incorrect or invalid review comment.
protocol/txsubmission/server.go (3)
91-105: Proper shutdown synchronization with restart mechanism.
The `Stop()` method correctly:
- Uses `onceStop` for idempotent shutdown
- Holds `restartMutex` to coordinate with `handleDone()`'s restart logic
- Closes the `done` channel to signal shutdown
- Calls `Protocol.Stop()` to clean up the underlying protocol
This addresses the race condition flagged in previous reviews.
131-163: Atomic operations fix ackCount data race.
The conversion to `int32` with atomic operations (`atomic.LoadInt32` and `atomic.StoreInt32`) properly synchronizes `ackCount` access between:
- `RequestTxIds()` reading and writing ackCount
- `handleDone()` resetting ackCount during restart
This resolves the data race noted in previous reviews; a small sketch of the pattern follows below.
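A hedged sketch of that atomic counter handling; the names are assumptions, and the real code additionally bounds-checks against the protocol's limits before converting to `uint16`:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

type server struct {
	ackCount int32
}

// requestTxIds reads the pending ack count atomically and resets it
// (illustrative; the real logic also enforces request/ack limits).
func (s *server) requestTxIds() uint16 {
	ack := atomic.LoadInt32(&s.ackCount)
	if ack < 0 || ack > 65535 {
		ack = 0 // out-of-range values must not reach the uint16 conversion
	}
	atomic.StoreInt32(&s.ackCount, 0)
	return uint16(ack)
}

// handleDone can reset the counter from another goroutine without a data race.
func (s *server) handleDone() {
	atomic.StoreInt32(&s.ackCount, 0)
}

func main() {
	s := &server{}
	atomic.StoreInt32(&s.ackCount, 3)
	fmt.Println(s.requestTxIds()) // 3
	s.handleDone()
}
```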
282-287: Safe channel close coordination.
The select statement prevents closing an already-closed `done` channel by checking if `Stop()` already closed it. Combined with `restartMutex` held by both `Stop()` and `handleDone()`, this eliminates the race condition.
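A hedged sketch of that close-once guard, with assumed field names; both the stop path and the restart path would funnel through a helper like this:

```go
package main

import (
	"fmt"
	"sync"
)

type server struct {
	mu   sync.Mutex
	done chan struct{}
}

// closeDone closes the done channel at most once, even if called from both
// Stop() and the restart path.
func (s *server) closeDone() {
	s.mu.Lock()
	defer s.mu.Unlock()
	select {
	case <-s.done:
		// already closed elsewhere; closing again would panic
	default:
		close(s.done)
	}
}

func main() {
	s := &server{done: make(chan struct{})}
	s.closeDone()
	s.closeDone() // safe: the guard sees the channel is already closed
	fmt.Println("done closed exactly once")
}
```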
protocol/peersharing/server.go (1)
109-109: LGTM! Cosmetic improvement to eliminate unused parameter.
Renaming the unused `msg` parameter to `_` properly indicates it's intentionally unused while maintaining the function signature for the message handler interface.
protocol/peersharing/client_test.go (1)
28-81: LGTM! Comprehensive shutdown test for PeerSharing client.
The test validates:
- Client can be started and stopped cleanly
- No goroutine leaks occur (via `goleak.VerifyNone`)
- Proper error handling from mock connection
- Shutdown completes within reasonable timeouts
Consistent with shutdown tests for other protocols in this PR.
protocol/peersharing/client.go (2)
73-109: Excellent lifecycle management implementation.
The Start/Stop methods provide proper lifecycle control:
- `Start()` is idempotent via `onceStart` and sets the `started` flag
- `Stop()` is idempotent via `onceStop` and conditionally defers channel closure
- If started, channel closes only after protocol shutdown via `DoneChan()`
- If never started, channel closes immediately to prevent leaks
This mirrors the pattern used consistently across other protocol clients in this PR.
154-158: Shutdown-aware channel write prevents panics.
The select statement properly handles shutdown by checking `DoneChan()` before writing to `sharePeersChan`, returning `protocol.ErrProtocolShuttingDown` if the protocol is shutting down. This prevents writes to a closed channel.
ddbf172 to edd9441 (Compare)
@coderabbitai review
✅ Actions performed: Review triggered.
@coderabbitai review
✅ Actions performed: Review triggered.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
protocol/leiosfetch/client.go (1)
230-253: Critical race condition: message handlers can write to closed channels.Unlike the other protocol clients in this PR (leiosnotify, localtxsubmission, blockfetch), the message handlers here perform unconditional sends to result channels (lines 231, 236, 241, 246, 251) without checking
DoneChan(). WhenStop()spawns the cleanup goroutine (lines 116-122), there's a window where handlers can write to channels that are being closed, causing a panic.Apply shutdown-aware sends to all handlers:
func (c *Client) handleBlock(msg protocol.Message) error { - c.blockResultChan <- msg + select { + case <-c.DoneChan(): + return protocol.ErrProtocolShuttingDown + case c.blockResultChan <- msg: + } return nil } func (c *Client) handleBlockTxs(msg protocol.Message) error { - c.blockTxsResultChan <- msg + select { + case <-c.DoneChan(): + return protocol.ErrProtocolShuttingDown + case c.blockTxsResultChan <- msg: + } return nil } func (c *Client) handleVotes(msg protocol.Message) error { - c.votesResultChan <- msg + select { + case <-c.DoneChan(): + return protocol.ErrProtocolShuttingDown + case c.votesResultChan <- msg: + } return nil } func (c *Client) handleNextBlockAndTxsInRange(msg protocol.Message) error { - c.blockRangeResultChan <- msg + select { + case <-c.DoneChan(): + return protocol.ErrProtocolShuttingDown + case c.blockRangeResultChan <- msg: + } return nil } func (c *Client) handleLastBlockAndTxsInRange(msg protocol.Message) error { - c.blockRangeResultChan <- msg + select { + case <-c.DoneChan(): + return protocol.ErrProtocolShuttingDown + case c.blockRangeResultChan <- msg: + } return nil }protocol/peersharing/client.go (1)
24-33: Data race onstartedbetween Start and Stop
startedis written inStart()and read inStop()but is not protected by any shared synchronization primitive. BecauseonceStartandonceStopare independentsync.Onceinstances, callingStart()andStop()from different goroutines can trigger a race onstarted, and the Stop path (immediate vs deferred channel close) becomes non-deterministic under the race detector.A minimal way to make this race-free is to guard
started(and the Stop cleanup decision) with a mutex:type Client struct { *protocol.Protocol config *Config callbackContext CallbackContext sharePeersChan chan []PeerAddress onceStart sync.Once onceStop sync.Once + stateMutex sync.Mutex started bool } // Start starts the PeerSharing client protocol func (c *Client) Start() { c.onceStart.Do(func() { - c.Protocol.Logger(). + c.stateMutex.Lock() + defer c.stateMutex.Unlock() + + c.Protocol.Logger(). Debug("starting client protocol", "component", "network", "protocol", ProtocolName, "connection_id", c.callbackContext.ConnectionId.String(), ) c.started = true c.Protocol.Start() }) } // Stop stops the PeerSharing client protocol func (c *Client) Stop() error { c.onceStop.Do(func() { - c.Protocol.Logger(). + c.stateMutex.Lock() + defer c.stateMutex.Unlock() + + c.Protocol.Logger(). Debug("stopping client protocol", "component", "network", "protocol", ProtocolName, "connection_id", c.callbackContext.ConnectionId.String(), ) // Defer closing channel until protocol fully shuts down (only if started) if c.started { go func() { <-c.DoneChan() close(c.sharePeersChan) }() } else { // If protocol was never started, close channel immediately close(c.sharePeersChan) } c.Protocol.Stop() }) return nil }Alternatively, an
atomic.Boolwould also work if you’re comfortable taking a dependency onsync/atomicfor this.Also applies to: 73-109
protocol/localtxmonitor/client.go (1)
24-39: Unsynchronizedstartedflag between Start and StopAs in the PeerSharing client,
startedis written inStart()and read inStop()without shared synchronization. BecauseonceStartandonceStopare distinctsync.Onceinstances, concurrent calls toStart()andStop()can race onstarted, which in turn controls whether channels are closed immediately or only afterDoneChan()completes. This is a classic data race that will be flagged by-race.You can make this race-free by guarding
startedand the Stop cleanup decision with a mutex dedicated to lifecycle state:type Client struct { *protocol.Protocol config *Config callbackContext CallbackContext busyMutex sync.Mutex acquired bool acquiredSlot uint64 acquireResultChan chan bool hasTxResultChan chan bool nextTxResultChan chan []byte getSizesResultChan chan MsgReplyGetSizesResult onceStart sync.Once onceStop sync.Once + stateMutex sync.Mutex started bool } func (c *Client) Start() { c.onceStart.Do(func() { - c.Protocol.Logger(). + c.stateMutex.Lock() + defer c.stateMutex.Unlock() + + c.Protocol.Logger(). Debug("starting client protocol", "component", "network", "protocol", ProtocolName, "connection_id", c.callbackContext.ConnectionId.String(), ) c.started = true c.Protocol.Start() }) } // Stop transitions the protocol to the Done state. No more operations will be possible func (c *Client) Stop() error { var err error c.onceStop.Do(func() { - c.Protocol.Logger(). + c.stateMutex.Lock() + defer c.stateMutex.Unlock() + + c.Protocol.Logger(). Debug("stopping client protocol", "component", "network", "protocol", ProtocolName, "connection_id", c.callbackContext.ConnectionId.String(), ) c.busyMutex.Lock() defer c.busyMutex.Unlock() msg := NewMsgDone() if err = c.SendMessage(msg); err != nil { return } // Defer closing channels until protocol fully shuts down (only if started) if c.started { go func() { <-c.DoneChan() close(c.acquireResultChan) close(c.hasTxResultChan) close(c.nextTxResultChan) close(c.getSizesResultChan) }() } else { // If protocol was never started, close channels immediately close(c.acquireResultChan) close(c.hasTxResultChan) close(c.nextTxResultChan) close(c.getSizesResultChan) } }) return err }An
atomic.Boolwould also work if you prefer that pattern over an extra mutex.Also applies to: 86-97, 99-133
protocol/chainsync/client.go (1)
341-346: Check channel closure in GetAvailableBlockRange.The receive from
readyForNextBlockChanat line 341 doesn't check theokreturn value. If the channel is closed duringGetAvailableBlockRange(e.g., concurrentStop()), the receive will succeed repeatedly with a zero value, causing the loop to spin and send spuriousRequestNextmessages untilDoneChan()is checked on the next iteration.Apply this diff to handle channel closure:
- case <-c.readyForNextBlockChan: + case ready, ok := <-c.readyForNextBlockChan: + if !ok { + // Channel closed, protocol shutting down + return start, end, protocol.ErrProtocolShuttingDown + } + // Only proceed if ready is true + if !ready { + return start, end, errors.New("sync cancelled") + } // Request the next block msg := NewMsgRequestNext() if err := c.SendMessage(msg); err != nil {
🧹 Nitpick comments (1)
protocol/keepalive/client.go (1)
33-34: Consider preserving the first Stop() error across subsequent calls
Right now `Stop()` returns the `SendMessage` error only on the first invocation; later calls always return `nil` because `onceStop.Do` short-circuits and `err` stays zero-valued. If callers may legitimately call `Stop()` more than once, consider storing the first error in the `Client` and always returning that, e.g. a `stopErr` field populated inside the `onceStop` closure.
Also applies to: 97-111
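A minimal sketch of that suggestion, assuming a hypothetical `stopErr` field; this is not the library's current behavior:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

type client struct {
	onceStop sync.Once
	stopErr  error
}

func (c *client) sendDone() error {
	return errors.New("connection already closed") // stand-in for SendMessage failing
}

// Stop records the first shutdown error inside the sync.Once closure and
// returns it on every subsequent call instead of silently returning nil.
func (c *client) Stop() error {
	c.onceStop.Do(func() {
		c.stopErr = c.sendDone()
	})
	return c.stopErr
}

func main() {
	c := &client{}
	fmt.Println(c.Stop()) // connection already closed
	fmt.Println(c.Stop()) // same error, not nil
}
```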
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (23)
protocol/blockfetch/blockfetch.go(1 hunks)protocol/blockfetch/client.go(6 hunks)protocol/blockfetch/client_test.go(1 hunks)protocol/chainsync/chainsync.go(1 hunks)protocol/chainsync/client.go(5 hunks)protocol/chainsync/client_test.go(2 hunks)protocol/keepalive/client.go(2 hunks)protocol/keepalive/client_test.go(1 hunks)protocol/leiosfetch/client.go(3 hunks)protocol/leiosnotify/client.go(4 hunks)protocol/leiosnotify/client_test.go(1 hunks)protocol/localstatequery/client.go(6 hunks)protocol/localstatequery/client_test.go(1 hunks)protocol/localtxmonitor/client.go(7 hunks)protocol/localtxmonitor/client_test.go(1 hunks)protocol/localtxsubmission/client.go(5 hunks)protocol/localtxsubmission/client_test.go(1 hunks)protocol/peersharing/client.go(4 hunks)protocol/peersharing/client_test.go(1 hunks)protocol/peersharing/server.go(1 hunks)protocol/peersharing/server_test.go(1 hunks)protocol/txsubmission/server.go(8 hunks)protocol/txsubmission/server_test.go(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (7)
- protocol/blockfetch/blockfetch.go
- protocol/leiosnotify/client_test.go
- protocol/blockfetch/client_test.go
- protocol/peersharing/server_test.go
- protocol/localtxsubmission/client_test.go
- protocol/chainsync/chainsync.go
- protocol/peersharing/server.go
🔇 Additional comments (38)
protocol/leiosnotify/client.go (2)
31-31: LGTM! Proper lifecycle tracking and conditional channel closure.The
startedflag correctly tracks whether the protocol has been started, andStop()uses this to defer channel cleanup until the protocol fully shuts down (viaDoneChan()). This prevents races where channels are closed prematurely.Also applies to: 79-79, 95-104
143-176: LGTM! Shutdown-aware sends prevent write-to-closed-channel panics.All message handlers properly use
selectto checkDoneChan()before sending torequestNextChan, ensuring no sends occur after shutdown begins.protocol/localtxsubmission/client.go (2)
34-34: LGTM! Consistent lifecycle management.The
startedflag and conditional channel closure inStop()follow the same correct pattern as other protocol clients in this PR.Also applies to: 83-83, 104-113
157-171: LGTM! Handlers properly check for shutdown.Both
handleAcceptTxandhandleRejectTxuseselectto race the send againstDoneChan(), eliminating the TOCTOU race mentioned in past review comments.Also applies to: 173-196
protocol/blockfetch/client.go (2)
39-39: LGTM! Proper deferred channel cleanup.The
startedflag ensures channels are only closed after the protocol fully shuts down (viaDoneChan()), preventing panics from in-flight responses.Also applies to: 97-97, 114-125
224-239: LGTM! All message handlers are shutdown-aware.All three handlers (
handleStartBatch,handleNoBlocks,handleBlock) properly useselectwithDoneChan()before sending to result channels, addressing the critical race condition noted in past reviews.Also applies to: 241-257, 259-312
protocol/chainsync/client_test.go (2)
83-86: LGTM! Proper cleanup prevents goroutine leaks.Adding explicit client shutdown after each test ensures proper resource cleanup, which is validated by the
goleak.VerifyNone(t)check.
283-336: LGTM! Standard shutdown lifecycle test.The test properly validates the Start/Stop lifecycle with timeout checks and error handling, consistent with shutdown tests in other protocol clients.
protocol/txsubmission/server_test.go (1)
28-80: LGTM! Test structure is sound, skip is documented.The test follows the same shutdown validation pattern as other protocol tests. The skip reason clearly documents a known mock infrastructure limitation. Once mock server issues are resolved, this test will provide coverage for the server shutdown path.
protocol/keepalive/client_test.go (1)
241-294: LGTM! Consistent shutdown test with proper error handling.The test properly validates the
Stop()error return (lines 272-274), addressing the inconsistency noted in past reviews. The structure matches other protocol shutdown tests.protocol/localtxmonitor/client_test.go (1)
300-352: LGTM! Consistent shutdown lifecycle validation.The test follows the established pattern for shutdown tests across protocol clients, with proper error handling and timeout validation.
protocol/peersharing/client_test.go (1)
28-80: Shutdown test pattern looks solidThe test exercises start/stop, validates both mock and Ouroboros shutdown, and leverages goleak to catch leaks. This nicely mirrors the other protocol shutdown tests.
protocol/localstatequery/client_test.go (1)
357-409: Consistent shutdown coverage for LocalStateQueryThis mirrors the other protocol shutdown tests: verifies client Start/Stop, checks mock and Ouroboros shutdown, and uses goleak for leak detection. Good addition to guard the new lifecycle behavior.
protocol/peersharing/client.go (1)
145-159: Shutdown-awarehandleSharePeerslooks correctUsing a
selectonDoneChan()before sending tosharePeersChanaligns with the deferred-close logic inStop()and avoids writes to a closed channel while giving callers a clearErrProtocolShuttingDownsignal viaGetPeers.protocol/localtxmonitor/client.go (1)
265-333: HandlerselectonDoneChancleanly avoids send-on-closed-channel panicsThe updated handlers (
handleAcquired,handleReplyHasTx,handleReplyNextTx,handleReplyGetSizes) now select onc.DoneChan()before sending into their result channels and propagateErrProtocolShuttingDownwhen shutting down. Combined with deferring channel closure until afterDoneChaninStop(), this resolves the prior race between handler writes and channel closes.protocol/chainsync/client.go (5)
37-37: LGTM:startedfield tracks lifecycle state.The
startedboolean tracks whetherStart()has been invoked, enabling conditional cleanup inStop(). Access is safe because it's written once underonceStartand read once underonceStop.
119-130: LGTM: Start() sets lifecycle flag.Setting
started = trueenables the conditional cleanup logic inStop(). Protected byonceStart.
148-157: LGTM: Shutdown sequencing addresses the race condition.The goroutine waits for protocol shutdown (
DoneChan()) before closingreadyForNextBlockChan, ensuring message handlers complete before the channel closes. This addresses the race condition flagged in the previous review.
728-743: LGTM: Shutdown-aware channel sends prevent write-after-close.Wrapping the
readyForNextBlockChansends inselectstatements withDoneChan()checks prevents writes to a closed channel during shutdown. This coordinates correctly withStop(), which waits for protocol shutdown before closing the channel.
767-783: LGTM: Consistent shutdown-aware channel sends.The
handleRollBackwardmethod applies the same shutdown-aware pattern ashandleRollForward, preventing write-after-close panics.protocol/localstatequery/client.go (10)
38-38: LGTM: Mutex added to protectacquiredstate.The
acquiredMutexsynchronizes access to theacquiredboolean, preventing data races between concurrent operations likeacquire(),release(),runQuery(), and message handlers.
44-45: LGTM: Lifecycle management fields added.The
onceStopandstartedfields enable idempotent shutdown and conditional cleanup, consistent with other protocols in this PR.
101-112: LGTM: Start() sets lifecycle flag and removes immediate cleanup goroutine.Setting
started = trueand removing the cleanup goroutine aligns with the PR's approach of deferring channel cleanup untilStop()ensures protocol shutdown.
114-140: LGTM: Stop() defers channel cleanup until protocol shutdown.The
Stop()method sendsMsgDoneand waits for full protocol shutdown before closingqueryResultChanandacquireResultChan, preventing write-after-close panics in message handlers. The conditional logic handles both started and non-started cases correctly.
904-911: LGTM: handleAcquired() protects state and uses shutdown-aware send.The
acquiredMutexprotects theacquiredboolean, and theselectstatement prevents writing to a closed channel during shutdown.
927-937: LGTM: handleFailure() uses shutdown-aware sends.Both failure cases wrap
acquireResultChansends inselectstatements withDoneChan()checks, preventing writes to closed channels during shutdown.
952-957: LGTM: handleResult() uses shutdown-aware send.Extracting
msgResultbefore theselectand checkingDoneChan()prevents writes to a closed channel during shutdown.
961-997: LGTM: acquire() readsacquiredstate under mutex.The
acquiredMutexprotects the read of theacquiredboolean (lines 962-964), ensuring thread-safe access when determining whether to sendAcquireorReAcquiremessages.
1004-1006: LGTM: release() protects state update.The
acquiredMutexprotects the write toacquired, maintaining consistent synchronization with other accesses.
1011-1020: LGTM: runQuery() readsacquiredstate under mutex.The
acquiredMutexprotects the read ofacquired(lines 1013-1015), ensuring thread-safe determination of whether to acquire the volatile tip before running the query.protocol/txsubmission/server.go (8)
20-21: LGTM: Imports added for atomic operations and synchronization.The
syncandsync/atomicimports support the new atomicackCountoperations and restart coordination viarestartMutex.
33-33: LGTM:ackCountchanged toint32for atomic operations.The type change from
inttoint32enables atomic operations, addressing the data race flagged in the previous review.
36-38: LGTM: Lifecycle and coordination fields added.The
done,onceStop, andrestartMutexfields enable proper shutdown signaling and coordinate the restart logic inhandleDone()with explicitStop()calls.
52-54: LGTM: Buffered result channels and done signal.Buffering the result channels (capacity 1) allows handlers to send results without blocking, and the
donechannel enables shutdown signaling to blocking operations.
91-105: LGTM: Stop() coordinates with restart via restartMutex.The
Stop()method acquiresrestartMutexbefore closingdone, coordinating withhandleDone()'s restart logic to prevent concurrent closes of thedonechannel. This addresses the race condition flagged in the previous review.
131-167: LGTM: Atomic operations protectackCountand shutdown is handled.Using
atomic.LoadInt32(line 131) andatomic.StoreInt32(line 163) eliminates the data race onackCountflagged in the previous review. Theselectstatement (lines 153-167) handles shutdown gracefully by checking both the result channel anddone.
192-199: LGTM: RequestTxs() handles shutdown gracefully.The
selectstatement checks bothdoneandrequestTxsResultChan, returningprotocol.ErrProtocolShuttingDownif the server is shutting down.
259-298: LGTM: Restart logic coordinates properly with Stop().The
handleDone()method:
- Signals
RequestTxIds()to stop waiting (non-blocking, lines 260-266)- Acquires
restartMutexto coordinate withStop()(line 274)- Guards against double-close of
donewith aselect(lines 282-287)- Reinitializes all state including atomic
ackCount, channels, andonceStop(lines 289-294)- Unlocks before calling
Start()to avoid holding the mutex during protocol startup (line 295)This restart flow addresses the race conditions flagged in the previous review.
edd9441 to 8e3b9ab (Compare)
@coderabbitai review
✅ Actions performed: Review triggered.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
protocol/leiosnotify/client.go (1)
24-32: Synchronize access tostartedto avoid a Start/Stop data race
startedis written inStart()(line 79) and read inStop()(line 96) without mutex protection. This creates a data race despiteonceStart/onceStopusage—those synchronize call-once semantics, not field access across goroutines.The reference implementation in
protocol/localtxmonitor/client.goalready usesstateMutexto guard thestartedfield. Apply the same pattern here:type Client struct { *protocol.Protocol config *Config callbackContext CallbackContext requestNextChan chan protocol.Message onceStart sync.Once onceStop sync.Once + stateMutex sync.Mutex started bool } func (c *Client) Start() { c.onceStart.Do(func() { + c.stateMutex.Lock() + defer c.stateMutex.Unlock() c.Protocol.Logger(). Debug("starting client protocol", "component", "network", "protocol", ProtocolName, "connection_id", c.callbackContext.ConnectionId.String(), ) c.started = true c.Protocol.Start() }) } func (c *Client) Stop() error { var err error c.onceStop.Do(func() { + c.stateMutex.Lock() + defer c.stateMutex.Unlock() c.Protocol.Logger(). Debug("stopping client protocol", "component", "network", "protocol", ProtocolName, "connection_id", c.callbackContext.ConnectionId.String(), ) msg := NewMsgDone() err = c.SendMessage(msg) // Defer closing channel until protocol fully shuts down (only if started) if c.started { go func() { <-c.DoneChan() close(c.requestNextChan) }() } else { // If protocol was never started, close channel immediately close(c.requestNextChan) } }) return err }protocol/localtxsubmission/client.go (1)
26-35: Confirmed: Data race onstartedfield requires synchronization with dedicated mutexThe review correctly identifies a critical race condition. The field
startedis written inStart()at line 83 and read inStop()at line 105 without synchronization. AlthoughonceStartandonceStopprevent multiple invocations of each closure independently, they do not synchronize between the two methods—concurrent calls toStart()andStop()can race on thestartedfield across goroutines.The reference implementation in
localtxmonitor/client.goconfirms the correct pattern: it protectsstartedwith a dedicatedstateMutex, locking it in bothStart()andStop(). The suggested diff accurately mirrors this proven approach and preserves the existing "never started" fast-close logic while eliminating the data race.protocol/chainsync/client.go (1)
29-38: Synchronizestartedflag between Start and Stop to avoid racesData race confirmed:
startedis a plain bool written inStart()(line 127) and read inStop()(line 149) with no synchronization. The independentsync.Onceguards ononceStartandonceStopdo not synchronize against each other. ConcurrentStart()/Stop()calls can causeStop()to readstartedas false beforeStart()writes true, closingreadyForNextBlockChanimmediately while handlers at lines 739, 750, 778, and 790 still attempt to send on it, resulting in a panic.The fix is correct: replace
started boolwithstarted atomic.Boolin the struct (line 37), update line 127 toc.started.Store(true), and line 149 toif c.started.Load(). Thesync/atomicimport already exists.@@ type Client struct { - started bool + started atomic.Bool @@ func (c *Client) Start() { - c.started = true + c.started.Store(true) @@ func (c *Client) Stop() error { - if c.started { + if c.started.Load() {protocol/blockfetch/client.go (1)
29-40: Prevent race and send-after-close hazards aroundstartedin Start/Stop
startedis a plain bool written inStart()(line 97) and read inStop()(line 115) without synchronization. Although both methods usesync.Onceguards,onceStartandonceStopare separate instances that do not synchronize with each other. IfStart()andStop()run concurrently,Stop()can observestarted == falsewhile Start() is still in its Do() block before the assignment completes, causingStop()to closeblockChanandstartBatchResultChanimmediately. Later message handlers will panic when sending to closed channels.A minimal fix is to make
startedatomic and useStore/Load:@@ -import ( - "errors" - "fmt" - "sync" +import ( + "errors" + "fmt" + "sync" + "sync/atomic" @@ blockUseCallback bool // Whether to use callback for blocks onceStart sync.Once // Ensures Start is only called once onceStop sync.Once // Ensures Stop is only called once - started bool // Whether the protocol has been started + started atomic.Bool // Whether the protocol has been started @@ func (c *Client) Start() { c.started = true + c.started.Store(true) @@ func (c *Client) Stop() error { - if c.started { + if c.started.Load() {This eliminates the data race between Start/Stop and the remaining panic risk. If the intended contract forbids calling
Start()afterStop(), document that to avoid unsupported lifecycle patterns.
♻️ Duplicate comments (1)
protocol/txsubmission/server_test.go (1)
28-80: Check and assert the error from `Server.Stop()` in the shutdown test
`TestServerShutdown` currently calls `oConn.TxSubmission().Server.Stop()` (Line 60) without checking the returned error. This is the same pattern that previously hid shutdown issues in other protocol tests.
Stop()succeeds:@@ func TestServerShutdown(t *testing.T) { - // Start the server - oConn.TxSubmission().Server.Start() - // Stop the server - oConn.TxSubmission().Server.Stop() + // Start the server + oConn.TxSubmission().Server.Start() + // Stop the server + if err := oConn.TxSubmission().Server.Stop(); err != nil { + t.Fatalf("unexpected error when stopping server: %s", err) + }This will surface any shutdown failures instead of silently ignoring them, even once the test is un-skipped.
🧹 Nitpick comments (7)
protocol/localtxsubmission/client_test.go (1)
167-219: TestClientShutdown gives good coverage of client lifecycle
The new `TestClientShutdown` exercises `LocalTxSubmission().Client.Start()` and `Stop()` against the mock connection, waits for the mock to shut down, and checks for leaks via goleak and timeouts. This is a useful regression test for the new lifecycle behavior. If you want to reduce duplication, you could factor this through the existing `runTest` helper, but that's optional.
protocol/chainsync/client.go (1)
241-365: Improved handling of `readyForNextBlockChan` in `GetAvailableBlockRange`
The new `ready, ok := <-c.readyForNextBlockChan` logic (Lines 341-349) correctly distinguishes:
- closed channel → treat as protocol shutdown and return `protocol.ErrProtocolShuttingDown`
- `ready == false` → treat as sync cancellation
This aligns the client-facing API with the new semantics introduced in `handleRollForward`/`handleRollBackward` and avoids panics on closed channels. If you find yourself reusing the "sync cancelled" error elsewhere, consider defining it as a package-level sentinel (`var ErrSyncCancelled = errors.New("sync cancelled")`) to make comparisons easier and reduce duplication; a sketch follows below.
28-80: Client shutdown test is solid; consider mirroring the common error-watcher patternThe shutdown flow (Start → Stop → wait on mock error channel → Close Ouroboros → wait on
oConn.ErrorChan) plusgoleak.VerifyNonelooks correct and should catch leaks. If you want this to match the stricter pattern used in other protocol tests (e.g.,runTestinlocalstatequery), you could also add a short-lived goroutine watchingoConn.ErrorChan()during the active phase to fail fast on unexpected Ouroboros errors instead of relying solely on goleak/timeouts.protocol/peersharing/server_test.go (1)
28-78: Skipped server shutdown test: document intent and future re‑enablementBecause
t.Skipis the first statement, the rest of the function (includinggoleak.VerifyNone) never runs, so there is currently no server-side shutdown coverage. That’s fine if this is a temporary workaround, but it’s worth either:
- Adding a TODO / issue reference into the skip message so it’s easy to track when NtN mock issues are fixed, or
- Moving the skip behind a condition (or down in the body) if you want to keep
goleak.VerifyNoneactive once the test is re-enabled.protocol/localstatequery/client_test.go (1)
25-34: LocalStateQuery shutdown test is correct; consider reusingrunTestharnessThe new alias import and
TestClientShutdowncorrectly wire a mock connection, start/stop the LocalStateQuery client, and wait for both the mock and Ouroboros connection to shut down withgoleak.VerifyNone. To reduce duplication and keep behavior consistent with the other localstatequery tests, you could:
- Build the handshake-only conversation slice, and
- Implement
TestClientShutdownas a thin wrapper aroundrunTest, with aninnerFuncthat just assertsLocalStateQuery() != nil, callsClient.Start(), and thenClient.Stop().This would automatically reuse the existing async
oConn.ErrorChanwatcher and shutdown sequencing.Also applies to: 357-409
protocol/keepalive/client.go (1)
26-35: Make KeepAliveStoptie more explicitly into protocol/timer shutdownThe
onceStop/stopErrpattern and logging look good, butStop()currently only sendsNewMsgDoneand relies on the rest of the stack to fully tear down the protocol and stop the timer. IfSendMessagefails (e.g., peer already shutting down) or the remote misbehaves, you may end up with the keepalive timer/goroutine still active even thoughStop()returned.Consider tightening this up by, for example:
- Calling
c.Protocol.Stop()after sending the Done message soDoneChan()is guaranteed to close, and/or- Stopping the timer directly in
Stop()(undertimerMutex) as a best-effort local cleanup, similar to the goroutine inStartthat reacts toDoneChan().This would make KeepAlive’s lifecycle more robust and closer to other protocol clients.
Also applies to: 98-111
protocol/peersharing/client.go (1)
25-35: PeerSharing lifecycle is well structured; guard the Stop‑before‑Start edge caseThe added
onceStart/onceStop,stateMutex, andstartedflag give you a clean, idempotent start/stop model, and deferringsharePeersChanclosure until afterDoneChan()closes (when started) is a nice way to letGetPeerseither receive peers or cleanly returnErrProtocolShuttingDown. The shutdown-awarehandleSharePeersselectis also a good improvement over an unconditional send.One edge case to consider: if
Stop()is called beforeStart(), you immediatelyclose(c.sharePeersChan)and callc.Protocol.Stop(). A laterStart()will still succeed (differentsync.Once), butsharePeersChanremains closed; any incoming SharePeers message would causehandleSharePeersto send on a closed channel and panic.If you want to defend against that misuse, you could, for example:
- Treat
Stop()as a no-op when!c.started, or- Track a separate
stoppedflag and haveStart()become a no-op (or return an error) once the client has been stopped.Either approach would make the API more robust in the face of accidental Stop-before-Start usage without changing the normal Start→GetPeers→Stop flow.
Also applies to: 74-116, 152-165
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (23)
protocol/blockfetch/blockfetch.go(1 hunks)protocol/blockfetch/client.go(6 hunks)protocol/blockfetch/client_test.go(1 hunks)protocol/chainsync/chainsync.go(1 hunks)protocol/chainsync/client.go(6 hunks)protocol/chainsync/client_test.go(2 hunks)protocol/keepalive/client.go(2 hunks)protocol/keepalive/client_test.go(1 hunks)protocol/leiosfetch/client.go(4 hunks)protocol/leiosnotify/client.go(4 hunks)protocol/leiosnotify/client_test.go(1 hunks)protocol/localstatequery/client.go(6 hunks)protocol/localstatequery/client_test.go(2 hunks)protocol/localtxmonitor/client.go(7 hunks)protocol/localtxmonitor/client_test.go(1 hunks)protocol/localtxsubmission/client.go(5 hunks)protocol/localtxsubmission/client_test.go(1 hunks)protocol/peersharing/client.go(4 hunks)protocol/peersharing/client_test.go(1 hunks)protocol/peersharing/server.go(1 hunks)protocol/peersharing/server_test.go(1 hunks)protocol/txsubmission/server.go(8 hunks)protocol/txsubmission/server_test.go(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
- protocol/chainsync/chainsync.go
- protocol/blockfetch/client_test.go
- protocol/leiosnotify/client_test.go
- protocol/localtxmonitor/client_test.go
- protocol/chainsync/client_test.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-04T15:54:22.683Z
Learnt from: wolf31o2
Repo: blinklabs-io/gouroboros PR: 1093
File: ledger/mary/pparams.go:143-150
Timestamp: 2025-11-04T15:54:22.683Z
Learning: In the blinklabs-io/gouroboros repository, the design goal for CBOR round-trip tests is to achieve byte-identical encoding WITHOUT using stored CBOR (cbor.DecodeStoreCbor). Instead, the approach uses proper field types (pointers for optional fields) and relies on the cbor package's deterministic encoding (SortCoreDeterministic) to ensure reproducible output. The stored CBOR pattern should not be suggested as a solution for round-trip fidelity in this codebase.
Applied to files:
protocol/localstatequery/client_test.go
🔇 Additional comments (25)
protocol/leiosfetch/client.go (1)
230-273: LGTM! Shutdown-aware channel sends prevent deadlocks.All handler methods now use
selectwithDoneChan()to make channel sends non-blocking during shutdown. This correctly prevents deadlocks ifStop()is called while a handler is blocked waiting to send on a result channel, and consistently returnsprotocol.ErrProtocolShuttingDownwhen the protocol is shutting down.protocol/blockfetch/blockfetch.go (1)
121-123: DefaultRecvQueueSize bump is safe and within validation boundsThe new default of 384 stays below
MaxRecvQueueSize(512) and continues to flow throughNewConfig/validate()unchanged, so this is a straightforward tuning change with no correctness risk.protocol/leiosnotify/client.go (1)
143-176: Shutdown‑aware handler sends look correctWrapping the handler sends in a
selectonDoneChan()vsrequestNextChancleanly prevents writes to a channel that’s being closed during shutdown and surfacesprotocol.ErrProtocolShuttingDownto the caller in a predictable way. This aligns well with the Stop/DoneChan–based cleanup path.protocol/localtxsubmission/client.go (1)
157-171: Handler select pattern resolves previous TOCTOU riskThe updated
handleAcceptTxandhandleRejectTxuseselect { case <-c.DoneChan(): …; case c.submitResultChan <- … }, which ensures that shutdown viaDoneChanwins over any pending result send and prevents writes to a channel that’s being closed byStop(). This is the right pattern for avoiding the earlier TOCTOU send‑to‑closed‑channel issue.Also applies to: 173-195
protocol/localtxmonitor/client.go (1)
24-40: Lifecycle tracking and shutdown‑aware handlers in LocalTxMonitor look solidAdding
stateMutex/startedand using the mutex inStart()/Stop()cleanly serializes lifecycle transitions and avoids races on the state flag. Deferring channel closes until afterDoneChan()when started, and falling back to immediate close otherwise, matches the intended semantics and prevents goroutine leaks. The handler changes that wrap sends in aselectonDoneChan()vs the result channels mirror the pattern used elsewhere in this PR and should eliminate send‑to‑closed‑channel panics in this client.Also applies to: 87-101, 103-140, 272-339
protocol/blockfetch/client.go (1)
224-239: Shutdown-aware selects on internal channels look correctThe new
selectblocks inhandleStartBatch(Lines 233–237),handleNoBlocks(Lines 250–255), and the non-callback path ofhandleBlock(Lines 305–309) properly gate sends onDoneChan()and returnprotocol.ErrProtocolShuttingDownon shutdown instead of blindly sending. Together with the deferred channel closure inStop(), this removes the previous send-on-closed-channel panic risk and makes shutdown semantics much safer.Also applies to: 241-257, 285-310
protocol/chainsync/client.go (1)
606-753: Shutdown-aware writes toreadyForNextBlockChanlook safeThe new
selectblocks inhandleRollForward(Lines 736–741 and 747–751) andhandleRollBackward(Lines 775–779 and 787–791) ensure that writes toreadyForNextBlockChanare skipped onceDoneChan()is closed and instead returnprotocol.ErrProtocolShuttingDown. This fixes the previous write-after-close race from earlier reviews while preserving the “ready vs cancelled” signaling semantics.Also applies to: 755-793
protocol/keepalive/client_test.go (1)
241-294: Shutdown test for KeepAlive client looks solid
TestClientShutdownfollows the common pattern used elsewhere: async error monitoring on the mock connection, explicitStart()/Stop()calls with error checking, time-bounded waits, and finaloConn.Close()/shutdown wait. This should give good coverage of the new KeepAlive client lifecycle behavior.protocol/peersharing/server.go (1)
109-122: Unused-parameter cleanup inhandleDoneis fineSwitching the
handleDoneparameter to_ protocol.Messagecorrectly reflects that the message is not used and keeps the existing restart logic unchanged.protocol/localstatequery/client.go (9)
38-38: LGTM: Lifecycle control fields added.The new
acquiredMutex,onceStop, andstartedfields provide proper synchronization for acquisition state and idempotent shutdown, aligning with the lifecycle improvements described in the PR objectives.Also applies to: 44-45
101-112: LGTM: Start lifecycle tracked.Setting
started = truebefore callingProtocol.Start()correctly tracks whether the protocol was initialized, which is later used inStop()to determine channel cleanup behavior.
114-140: LGTM: Stop method with deferred cleanup.The
Stop()implementation correctly:
- Uses
onceStopfor idempotency- Defers channel closure until protocol shutdown when started
- Closes channels immediately if never started
The goroutine at line 128 correctly waits for
DoneChan()before closing channels, preventing "send on closed channel" panics during graceful shutdown.
896-914: LGTM: Acquisition state properly synchronized.The mutex protection (lines 904-906) ensures thread-safe updates to
acquired, and the shutdown-aware select (lines 907-911) prevents blocking on a send during shutdown.
916-942: LGTM: Failure handling respects shutdown.Both failure paths (lines 927-931, 933-937) now use shutdown-aware selects, preventing blocked sends when the protocol is shutting down.
944-959: LGTM: Result handling with shutdown awareness.Extracting the result before the select (line 952) and using a shutdown-aware select (lines 953-957) ensures clean shutdown semantics.
961-997: LGTM: Acquire logic synchronized.Reading
acquiredunder mutex protection (lines 962-964) and using it to choose betweenAcquireandReAcquiremessages (line 966) ensures correct state-based protocol transitions.
999-1009: LGTM: Release clears acquisition state safely.Clearing
acquiredunder mutex protection (lines 1004-1006) ensures thread-safe state transitions.
1011-1032: LGTM: Query auto-acquires when needed.Reading
acquiredunder mutex protection (lines 1013-1015) and auto-acquiring if necessary (lines 1016-1020) provides convenient API behavior while maintaining thread safety.protocol/txsubmission/server.go (7)
20-21: LGTM: Required imports for concurrency.The
syncandsync/atomicimports support the mutex-based restart coordination and atomicackCountoperations.
33-33: LGTM: Struct fields support safe concurrency.
ackCountasint32enables atomic operationsdone,onceStop, andrestartMutexprovide proper coordination betweenStop()and the restart path inhandleDone()These changes address the race conditions flagged in previous reviews.
Also applies to: 36-38
52-54: LGTM: Buffered channels and done initialization.Buffering the result channels (lines 52-53) with capacity 1 enables non-blocking sends in the message handlers, and initializing
done(line 54) supports the shutdown signaling mechanism.
91-105: LGTM: Stop coordinates with restart path.The
Stop()method correctly:
- Uses
onceStopfor idempotency- Acquires
restartMutex(lines 94-95) to coordinate withhandleDone()- Closes
doneunder mutex protection (line 102)This implementation addresses the critical race condition identified in the previous review.
107-168: LGTM: Atomic ackCount operations and shutdown handling.The function correctly:
- Uses
atomic.LoadInt32(line 131) andatomic.StoreInt32(line 163) for race-free access toackCount- Validates bounds before casting to
uint16(lines 132-141)- Handles shutdown via select with
donechannel (lines 153-167)The bounds annotation at line 162 correctly justifies the safe conversion. This addresses the data race identified in the previous review.
170-200: LGTM: Shutdown-aware request handling.The select statement (lines 191-199) correctly handles both shutdown via
donechannel and closed result channels, ensuring clean termination semantics.
251-299: LGTM: Restart logic coordinates with Stop.The
handleDone()restart sequence correctly:
- Non-blocking signal to
RequestTxIds(lines 259-266)- Acquires
restartMutexto coordinate withStop()(line 274)- Uses select to prevent double-close of
done(lines 282-287)- Reinitializes all per-restart state (lines 290-294), including atomic reset of
ackCount- Calls
Start()outside the lock (line 297) for better concurrencyThis implementation addresses the critical race condition from the previous review regarding concurrent close of the
donechannel.
8e3b9ab to e999895 (Compare)
Actionable comments posted: 1
🧹 Nitpick comments (1)
protocol/txsubmission/server_test.go (1)
25-31: Defer goleak before early exits when you re-enable this test
Right now `t.Skip(...)` is called before `defer goleak.VerifyNone(t)`, so the defer is never registered. That's fine while the test is skipped, but once you remove the `t.Skip` you'll probably want `goleak.VerifyNone` to be the first defer so leaks are checked even if the test bails out early.
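A minimal sketch of the suggested ordering (the skip message is illustrative, not the test's actual text):

```go
package example

import (
	"testing"

	"go.uber.org/goleak"
)

func TestServerShutdown(t *testing.T) {
	defer goleak.VerifyNone(t) // registered first, so it runs even after t.Skip
	t.Skip("skipped until the mock NtN server supports this flow")
	// ... the rest of the shutdown test would go here ...
}
```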
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (24)
protocol/blockfetch/blockfetch.go(1 hunks)protocol/blockfetch/client.go(7 hunks)protocol/blockfetch/client_test.go(1 hunks)protocol/chainsync/chainsync.go(1 hunks)protocol/chainsync/client.go(6 hunks)protocol/chainsync/client_test.go(2 hunks)protocol/chainsync/error.go(1 hunks)protocol/keepalive/client.go(2 hunks)protocol/keepalive/client_test.go(1 hunks)protocol/leiosfetch/client.go(5 hunks)protocol/leiosnotify/client.go(4 hunks)protocol/leiosnotify/client_test.go(1 hunks)protocol/localstatequery/client.go(6 hunks)protocol/localstatequery/client_test.go(2 hunks)protocol/localtxmonitor/client.go(7 hunks)protocol/localtxmonitor/client_test.go(1 hunks)protocol/localtxsubmission/client.go(5 hunks)protocol/localtxsubmission/client_test.go(1 hunks)protocol/peersharing/client.go(4 hunks)protocol/peersharing/client_test.go(1 hunks)protocol/peersharing/server.go(1 hunks)protocol/peersharing/server_test.go(1 hunks)protocol/txsubmission/server.go(8 hunks)protocol/txsubmission/server_test.go(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (7)
- protocol/peersharing/server_test.go
- protocol/chainsync/client.go
- protocol/keepalive/client.go
- protocol/keepalive/client_test.go
- protocol/peersharing/client_test.go
- protocol/chainsync/chainsync.go
- protocol/blockfetch/client_test.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-04T15:54:22.683Z
Learnt from: wolf31o2
Repo: blinklabs-io/gouroboros PR: 1093
File: ledger/mary/pparams.go:143-150
Timestamp: 2025-11-04T15:54:22.683Z
Learning: In the blinklabs-io/gouroboros repository, the design goal for CBOR round-trip tests is to achieve byte-identical encoding WITHOUT using stored CBOR (cbor.DecodeStoreCbor). Instead, the approach uses proper field types (pointers for optional fields) and relies on the cbor package's deterministic encoding (SortCoreDeterministic) to ensure reproducible output. The stored CBOR pattern should not be suggested as a solution for round-trip fidelity in this codebase.
Applied to files:
protocol/localstatequery/client_test.go
🧬 Code graph analysis (16)
protocol/chainsync/error.go (1)
protocol/chainsync/chainsync.go (1)
New(259-267)
protocol/localtxmonitor/client.go (7)
protocol/protocol.go (1)
Protocol(39-60)protocol/blockfetch/client.go (1)
Client(30-41)protocol/leiosnotify/client.go (1)
Client(24-33)protocol/localstatequery/client.go (1)
Client(30-46)protocol/localtxsubmission/client.go (1)
Client(26-36)protocol/peersharing/client.go (1)
Client(25-35)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)
protocol/localtxsubmission/client_test.go (4)
protocol/localstatequery/client_test.go (1)
TestClientShutdown(357-376)connection.go (1)
Connection(59-103)protocol/localtxsubmission/localtxsubmission.go (1)
LocalTxSubmission(67-70)protocol/localtxsubmission/client.go (1)
Client(26-36)
protocol/leiosnotify/client.go (5)
protocol/protocol.go (1)
Protocol(39-60)protocol/leiosfetch/client.go (1)
Client(26-37)protocol/localtxsubmission/client.go (1)
Client(26-36)protocol/peersharing/client.go (1)
Client(25-35)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)
protocol/localtxmonitor/client_test.go (12)
protocol/blockfetch/client_test.go (1)
TestClientShutdown(211-264)protocol/chainsync/client_test.go (1)
TestClientShutdown(283-336)protocol/keepalive/client_test.go (1)
TestClientShutdown(241-294)protocol/leiosnotify/client_test.go (1)
TestClientShutdown(56-106)protocol/localstatequery/client_test.go (1)
TestClientShutdown(357-376)protocol/localtxsubmission/client_test.go (1)
TestClientShutdown(167-186)protocol/peersharing/client_test.go (1)
TestClientShutdown(28-90)connection.go (3)
NewConnection(107-130)Connection(59-103)New(133-135)protocol/protocol.go (2)
ProtocolRoleClient(94-94)New(122-133)connection_options.go (2)
WithConnection(36-40)WithNetworkMagic(50-54)protocol/localtxmonitor/localtxmonitor.go (1)
LocalTxMonitor(112-115)protocol/localtxmonitor/client.go (1)
Client(25-40)
protocol/leiosfetch/client.go (3)
protocol/error.go (1)
ErrProtocolShuttingDown(19-19)protocol/blockfetch/client.go (1)
Client(30-41)protocol/chainsync/client.go (1)
Client(29-46)
protocol/peersharing/client.go (3)
protocol/protocol.go (1)
Protocol(39-60)protocol/peersharing/peersharing.go (1)
ProtocolName(27-27)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)
protocol/txsubmission/server.go (5)
protocol/txsubmission/messages.go (1)
NewMsgRequestTxIds(68-82)protocol/protocol.go (1)
Protocol(39-60)protocol/txsubmission/txsubmission.go (2)
ProtocolName(27-27)MaxAckCount(143-143)protocol/error.go (2)
ErrProtocolViolationRequestExceeded(29-31)ErrProtocolShuttingDown(19-19)protocol/txsubmission/error.go (1)
ErrStopServerProcess(21-21)
protocol/blockfetch/client.go (3)
protocol/protocol.go (1)
New(122-133)muxer/muxer.go (1)
New(90-117)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)
protocol/localtxsubmission/client.go (8)
protocol/protocol.go (1)
Protocol(39-60)protocol/blockfetch/client.go (1)
Client(30-41)protocol/chainsync/client.go (1)
Client(29-46)protocol/leiosnotify/client.go (1)
Client(24-33)protocol/localstatequery/client.go (1)
Client(30-46)protocol/localtxmonitor/client.go (1)
Client(25-40)protocol/peersharing/client.go (1)
Client(25-35)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)
protocol/txsubmission/server_test.go (8)
protocol/peersharing/server_test.go (1)
TestServerShutdown(28-78)connection.go (2)
NewConnection(107-130)Connection(59-103)protocol/protocol.go (1)
ProtocolRoleServer(95-95)protocol/blockfetch/blockfetch.go (1)
New(156-162)protocol/chainsync/chainsync.go (1)
New(259-267)connection_options.go (3)
WithConnection(36-40)WithNetworkMagic(50-54)WithNodeToNode(78-82)protocol/txsubmission/txsubmission.go (1)
TxSubmission(126-129)protocol/txsubmission/server.go (1)
Server(28-39)
protocol/localstatequery/client.go (5)
protocol/protocol.go (1)
Protocol(39-60)protocol/localstatequery/messages.go (4)
NewMsgDone(245-252)AcquireFailurePointNotOnChain(44-44)MsgResult(172-175)NewMsgQuery(160-170)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)protocol/localstatequery/error.go (2)
ErrAcquireFailurePointTooOld(20-20)ErrAcquireFailurePointNotOnChain(23-25)protocol/message.go (1)
Message(18-22)
protocol/localstatequery/client_test.go (3)
protocol/localtxsubmission/client_test.go (1)
TestClientShutdown(167-186)connection.go (1)
Connection(59-103)protocol/localstatequery/client.go (1)
Client(30-46)
protocol/leiosnotify/client_test.go (5)
protocol/versiondata.go (6)
VersionData(40-46)VersionDataNtN13andUp(143-145)VersionDataNtN11to12(116-122)DiffusionModeInitiatorOnly(21-21)PeerSharingModeNoPeerSharing(27-27)QueryModeDisabled(36-36)protocol/handshake/messages.go (1)
NewMsgAcceptVersion(88-102)connection.go (2)
NewConnection(107-130)Connection(59-103)connection_options.go (3)
WithConnection(36-40)WithNetworkMagic(50-54)WithNodeToNode(78-82)protocol/leiosnotify/client.go (1)
Client(24-33)
protocol/peersharing/server.go (1)
protocol/message.go (1)
Message(18-22)
protocol/chainsync/client_test.go (5)
protocol/chainsync/chainsync.go (3)
ChainSync(204-207)New(259-267)NewConfig(273-295)protocol/chainsync/client.go (1)
Client(29-46)protocol/blockfetch/client_test.go (1)
TestClientShutdown(211-264)connection.go (3)
NewConnection(107-130)Connection(59-103)New(133-135)connection_options.go (3)
WithConnection(36-40)WithNetworkMagic(50-54)WithChainSyncConfig(131-135)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Analyze (go)
🔇 Additional comments (27)
protocol/localtxsubmission/client_test.go (1)
166-186: Shutdown test matches existing pattern and looks correct

The new `TestClientShutdown` reuses the shared `runTest` helper, asserts the client is non-nil, and exercises `Start()`/`Stop()` without extra goroutines or leak risks. It's consistent with the `LocalStateQuery` shutdown test and should give good coverage of the new lifecycle behavior.

protocol/localtxmonitor/client.go (1)
38-40: Lifecycle and shutdown handling are materially safer now

- Adding `stateMutex` + `started` and taking the mutex in both `Start()` and `Stop()` removes the prior race on lifecycle state and keeps Start/Stop transitions serialized.
- Deferring channel closure until `<-c.DoneChan()` when `started` is true, and closing immediately only in the "never started" path, aligns channel lifetime with the protocol's own shutdown and avoids premature closes.
- Updating all handlers (`handleAcquired`, `handleReplyHasTx`, `handleReplyNextTx`, `handleReplyGetSizes`) to select on `c.DoneChan()` and return `protocol.ErrProtocolShuttingDown` makes writes shutdown-aware and prevents handlers from blindly sending into channels on teardown.
- Callers already treat a closed result channel as `ErrProtocolShuttingDown`, so the behavior is consistent end-to-end.

This is a solid fix to the earlier "write to closed channel" risk while keeping the API behavior predictable during shutdown. If you haven't already, it's worth running the local tx monitor tests (and, ideally, `go test -race` for this package) to confirm there are no remaining Start/Stop races under concurrent use.

Also applies to: 87-101, 103-140, 272-288, 291-306, 308-323, 325-340
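The handler pattern described in the bullets above reduces to a select that prefers the shutdown signal over the result send. A standalone sketch with invented field names (not the actual gouroboros types):

```go
package example

import "errors"

// errProtocolShuttingDown stands in for protocol.ErrProtocolShuttingDown.
var errProtocolShuttingDown = errors.New("protocol is shutting down")

type client struct {
	doneChan   chan struct{} // closed once the protocol has fully shut down
	resultChan chan []byte   // carries handler results to the waiting caller
}

// handleReply forwards a reply unless shutdown is already in progress, in
// which case it backs off instead of risking a send on a closing channel.
func (c *client) handleReply(payload []byte) error {
	select {
	case <-c.doneChan:
		return errProtocolShuttingDown
	case c.resultChan <- payload:
		return nil
	}
}
```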
protocol/localtxmonitor/client_test.go (1)
300-352: Shutdown test is consistent and well-scoped
`TestClientShutdown` mirrors the shutdown tests used in other protocols (mock connection, async error watcher, timeouts, goleak). It validates `Start()`/`Stop()` for `LocalTxMonitor` without over-constraining behavior and should catch regressions in lifecycle handling.

protocol/leiosfetch/client.go (2)
17-21: Atomic `started` and Stop() semantics resolve prior race while aligning with other clients

Switching `started` to `atomic.Bool` and using `Store`/`Load` in `Start()`/`Stop()` removes the earlier Start/Stop data race on this flag. The Stop logic that:

- always sends `MsgDone`, and
- closes the result channels only after `<-c.DoneChan()` when `started.Load()` is true (or immediately if it was never started),

matches the pattern used in other protocols and keeps channel lifetime tied to protocol shutdown. Resetting `started` to false at the end makes the lifecycle state internally consistent even though `onceStart`/`onceStop` prevent reuse.

Please re-run the leiosfetch tests (and `go test -race` for this package) to confirm there are no remaining races around Start/Stop and channel closure under concurrent usage.

Also applies to: 26-37, 91-102, 104-134
136-206: Result-channel consumers and handlers now behave correctly during shutdown

- All request methods (`BlockRequest`, `BlockTxsRequest`, `VotesRequest`, `BlockRangeRequest`) now treat a closed result channel as `protocol.ErrProtocolShuttingDown`, which is a clear and consistent signal to callers.
- The handler functions now use `select { case <-c.DoneChan(): ...; case chan <- msg: }`, so once the protocol is shutting down they stop enqueueing messages and instead return `ErrProtocolShuttingDown`.

Together, this prevents late handler sends into closing/closed channels and gives callers a well-defined error path when shutdown races with in-flight requests.

Consider adding or extending a `TestClientShutdown` for `LeiosFetch` similar to the other protocols, if not already present in this PR, to validate this behavior end-to-end with the mock connection.

Also applies to: 231-274
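The caller-side convention that pairs with those handlers can be sketched as follows (hedged example with made-up names, not the real request methods):

```go
package example

import "errors"

var errProtocolShuttingDown = errors.New("protocol is shutting down")

type client struct {
	resultChan chan []byte // closed only after the protocol has shut down
}

// request blocks for a single reply; a closed result channel means the
// protocol shut down while the request was in flight, which is surfaced as
// a sentinel error rather than a zero-value result.
func (c *client) request() ([]byte, error) {
	reply, ok := <-c.resultChan
	if !ok {
		return nil, errProtocolShuttingDown
	}
	return reply, nil
}
```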
protocol/leiosnotify/client.go (1)
24-33: LeiosNotify client lifecycle and handler behavior are now shutdown-safe

- Adding `stateMutex` and `started` and taking the mutex in both `Start()` and `Stop()` gives a clear, race-free lifecycle state for the client. `Stop()` now ties closure of `requestNextChan` to protocol shutdown (`<-c.DoneChan()`) when started, while still handling the "never started" case by closing immediately.
- Updating all handler functions to select on `c.DoneChan()` and to return `protocol.ErrProtocolShuttingDown` instead of blindly sending into `requestNextChan` closes the hole where handlers could write into a closed channel.

This brings LeiosNotify in line with the other protocols' shutdown model and removes the previously flagged "write to closed channel" risk.

It would be good to confirm via the existing `TestClientShutdown` (and optionally `go test -race ./protocol/leiosnotify`) that no Start/Stop races or late handler sends remain under concurrent conditions.

Also applies to: 72-86, 88-114, 150-184
protocol/leiosnotify/client_test.go (1)
56-106: LGTM! Test follows established shutdown patterns.

The test properly validates the LeiosNotify client shutdown behavior using mock connections, asynchronous error handling, and goroutine leak detection. Good to see this test enabled after addressing the previous protocol initialization issues.
protocol/localtxsubmission/client.go (5)
34-35: LGTM! Lifecycle tracking fields added.

The `stateMutex` and `started` fields follow the established pattern for lifecycle management across protocol clients, enabling proper coordination between Start and Stop operations.

76-90: LGTM! Start() properly tracks lifecycle state.

The method correctly acquires the mutex, sets the `started` flag, and starts the underlying protocol. The synchronization ensures safe concurrent access to the lifecycle state.

92-123: LGTM! Stop() safely coordinates channel lifecycle.

The conditional channel closure based on the `started` flag prevents closing channels before the protocol has fully shut down, eliminating potential race conditions. The deferred closure after `DoneChan()` fires ensures orderly cleanup.

164-178: LGTM! Shutdown-aware channel send.

The select statement prevents sending on a closed channel by checking `DoneChan()` first, returning `ErrProtocolShuttingDown` if the protocol is shutting down.

180-203: LGTM! Consistent shutdown handling.

The reject handler follows the same shutdown-aware pattern as `handleAcceptTx`, ensuring consistent behavior across message handlers.

protocol/localstatequery/client.go (7)
38-38: LGTM! Proper synchronization primitives added.

The `acquiredMutex` protects the `acquired` state, while `onceStop` and `started` enable safe lifecycle management. This follows the established pattern across protocol clients.

Also applies to: 44-45

101-112: LGTM! Start() tracks lifecycle state.

Setting `started = true` enables proper coordination with the Stop() method for conditional channel cleanup.

114-140: LGTM! Stop() properly manages channel lifecycle.

The conditional channel closure based on `started` prevents closing channels when the protocol hasn't been started, and defers closure until after `DoneChan()` fires to ensure orderly shutdown.

896-914: LGTM! Thread-safe state update with shutdown awareness.

The method properly protects the `acquired` flag with `acquiredMutex` and uses a select statement to avoid sending on channels during shutdown.

916-942: LGTM! Failure handling with shutdown awareness.

Both failure cases properly check for shutdown before sending errors to `acquireResultChan`, preventing sends on closed channels.

944-959: LGTM! Result handling with shutdown awareness.

The select statement ensures results are only sent when the protocol is not shutting down.

961-997: LGTM! Consistent mutex protection for acquired state.

The `acquiredMutex` is properly used to protect reads and writes to the `acquired` flag across `acquire()`, `release()`, and `runQuery()` methods, preventing race conditions.

Also applies to: 999-1009, 1011-1032
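As an illustration of that guarded-flag idea only (hypothetical names, not the actual LocalStateQuery client):

```go
package example

import "sync"

type client struct {
	acquiredMutex sync.Mutex
	acquired      bool
}

// setAcquired records whether a ledger state is currently held; every read
// and write goes through the same mutex so concurrent acquire, release, and
// query paths never race on the flag.
func (c *client) setAcquired(v bool) {
	c.acquiredMutex.Lock()
	defer c.acquiredMutex.Unlock()
	c.acquired = v
}

func (c *client) isAcquired() bool {
	c.acquiredMutex.Lock()
	defer c.acquiredMutex.Unlock()
	return c.acquired
}
```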
protocol/localstatequery/client_test.go (1)
357-376: LGTM! Clean shutdown test.

The test properly validates the LocalStateQuery client shutdown behavior, following the established pattern used across other protocol client tests.
protocol/chainsync/error.go (1)
24-26: LGTM! Clear sentinel error for sync cancellation.

The new `ErrSyncCancelled` error provides an explicit signal for cancelled sync operations, complementing the existing error types and improving error handling clarity.

protocol/blockfetch/blockfetch.go (1)

122-122: LGTM! Increased buffer size aligns with PR objectives.

The increase from 256 to 384 provides better buffering capacity while remaining well below the maximum of 512, consistent with the PR's goal to improve performance through larger default queue sizes.
protocol/peersharing/server.go (1)
109-109: LGTM! Idiomatic unused parameter naming.

Using `_` for the unused message parameter is the standard Go idiom, making it explicit that the message content is intentionally not used by this handler.

protocol/chainsync/client_test.go (2)
83-86: LGTM! Proper test cleanup prevents goroutine leaks.

Adding the client stop call in `runTest()` ensures proper cleanup after each test, preventing goroutine leaks that could affect test reliability.

283-336: LGTM! Comprehensive shutdown test.

The test properly validates ChainSync client shutdown behavior using the established pattern: mock connection setup, asynchronous error handling, goroutine leak detection, and appropriate timeouts.
protocol/peersharing/client.go (1)
19-35: PeerSharing client lifecycle and shutdown logic look solid

The Start/Stop implementation (with `onceStart`/`onceStop`, `stateMutex`, and the `started`/`stopped` flags) plus the `DoneChan`-aware select in `handleSharePeers` gives you clean, idempotent startup and safe shutdown without send-on-closed panics. The Stop-before-Start path is also handled sensibly by closing `sharePeersChan` immediately.

Also applies to: 75-123, 168-172
protocol/blockfetch/client.go (1)
40-41: BlockFetch client shutdown semantics correctly avoid send-on-closed panics

Using `started` plus deferring channel closure until `DoneChan()` fires (when started) and adding `DoneChan`-aware selects in `handleStartBatch`, `handleNoBlocks`, and the non-callback `handleBlock` path makes the shutdown path safe even with in-flight responses. The GetBlock/GetBlockRange callers also get a clean `ErrProtocolShuttingDown` when channels are closed.

Also applies to: 89-101, 103-129, 225-238, 243-258, 286-311
Actionable comments posted: 1
🧹 Nitpick comments (2)
protocol/localtxmonitor/client_test.go (1)
300-352: Refactor to use existing `runTest` helper.

This test duplicates the scaffolding logic already provided by the `runTest` helper (lines 50-106). For consistency and maintainability, use the helper pattern as demonstrated in other protocol tests (e.g., `protocol/localstatequery/client_test.go:357-376`).

Apply this refactor:
func TestClientShutdown(t *testing.T) { - defer goleak.VerifyNone(t) - mockConn := ouroboros_mock.NewConnection( - ouroboros_mock.ProtocolRoleClient, + runTest( + t, []ouroboros_mock.ConversationEntry{ ouroboros_mock.ConversationEntryHandshakeRequestGeneric, ouroboros_mock.ConversationEntryHandshakeNtCResponse, }, - ) - asyncErrChan := make(chan error, 1) - go func() { - err := <-mockConn.(*ouroboros_mock.Connection).ErrorChan() - if err != nil { - asyncErrChan <- fmt.Errorf("received unexpected error: %w", err) - } - close(asyncErrChan) - }() - oConn, err := ouroboros.New( - ouroboros.WithConnection(mockConn), - ouroboros.WithNetworkMagic(ouroboros_mock.MockNetworkMagic), - ) - if err != nil { - t.Fatalf("unexpected error when creating Ouroboros object: %s", err) - } - if oConn.LocalTxMonitor() == nil { - t.Fatalf("LocalTxMonitor client is nil") - } - // Start the client - oConn.LocalTxMonitor().Client.Start() - // Stop the client - if err := oConn.LocalTxMonitor().Client.Stop(); err != nil { - t.Fatalf("unexpected error when stopping client: %s", err) - } - // Wait for mock connection shutdown - select { - case err, ok := <-asyncErrChan: - if ok { - t.Fatal(err.Error()) - } - case <-time.After(2 * time.Second): - t.Fatalf("did not complete within timeout") - } - // Close Ouroboros connection - if err := oConn.Close(); err != nil { - t.Fatalf("unexpected error when closing Ouroboros object: %s", err) - } - // Wait for connection shutdown - select { - case <-oConn.ErrorChan(): - case <-time.After(10 * time.Second): - t.Errorf("did not shutdown within timeout") - } + func(t *testing.T, oConn *ouroboros.Connection) { + if oConn.LocalTxMonitor() == nil { + t.Fatalf("LocalTxMonitor client is nil") + } + // Start the client + oConn.LocalTxMonitor().Client.Start() + // Stop the client + if err := oConn.LocalTxMonitor().Client.Stop(); err != nil { + t.Fatalf("unexpected error when stopping client: %s", err) + } + }, + ) }protocol/txsubmission/server.go (1)
299-330: Consider holding restartMutex until after Start() to prevent TOCTOU race.

There's a narrow window between unlocking `restartMutex` (line 328) and calling `Start()` (line 330) where `Stop()` could execute for the first time, setting `stopped=true` and closing the newly created `done` channel. This would cause the new protocol instance to start with an already-closed done channel, making it immediately "shut down." While safe (operations will return shutdown errors), it's wasteful to start a protocol instance that's immediately non-functional.

Options to fix:

- Keep `restartMutex` locked until after `Start()` completes (Start() doesn't acquire `restartMutex`, so no deadlock risk)
- Re-check the `stopped` flag after unlocking and before `Start()`, returning early if true
- Accept this as a benign race (current approach per line 329 comment)

Given the comment at line 329 indicates the unlock is intentional for responsiveness, you may prefer option 2 or 3.
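A rough sketch of option 2 from the list above, written as a standalone example rather than the actual server code (field and method names are assumed):

```go
package example

import "sync"

type server struct {
	restartMutex sync.Mutex
	stopped      bool
}

// isStopped reads the flag under the same mutex Stop() uses to write it.
func (s *server) isStopped() bool {
	s.restartMutex.Lock()
	defer s.restartMutex.Unlock()
	return s.stopped
}

func (s *server) start() { /* start a fresh protocol instance */ }

// restart reinitializes per-restart state under the lock, then re-checks the
// stopped flag through the guarded accessor before starting, so a Stop()
// that lands after Unlock() prevents a pointless restart.
func (s *server) restart() {
	s.restartMutex.Lock()
	if s.stopped {
		s.restartMutex.Unlock()
		return
	}
	// ... reset channels, counters, and other per-restart state here ...
	s.restartMutex.Unlock()

	if s.isStopped() { // TOCTOU re-check (option 2)
		return
	}
	s.start()
}
```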
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (26)
- `protocol/blockfetch/blockfetch.go` (1 hunks)
- `protocol/blockfetch/client.go` (7 hunks)
- `protocol/blockfetch/client_test.go` (1 hunks)
- `protocol/chainsync/chainsync.go` (1 hunks)
- `protocol/chainsync/client.go` (6 hunks)
- `protocol/chainsync/client_concurrency_test.go` (1 hunks)
- `protocol/chainsync/client_test.go` (2 hunks)
- `protocol/chainsync/error.go` (1 hunks)
- `protocol/keepalive/client.go` (2 hunks)
- `protocol/keepalive/client_test.go` (1 hunks)
- `protocol/leiosfetch/client.go` (5 hunks)
- `protocol/leiosnotify/client.go` (4 hunks)
- `protocol/leiosnotify/client_test.go` (1 hunks)
- `protocol/localstatequery/client.go` (6 hunks)
- `protocol/localstatequery/client_test.go` (2 hunks)
- `protocol/localtxmonitor/client.go` (7 hunks)
- `protocol/localtxmonitor/client_test.go` (1 hunks)
- `protocol/localtxsubmission/client.go` (5 hunks)
- `protocol/localtxsubmission/client_test.go` (1 hunks)
- `protocol/peersharing/client.go` (4 hunks)
- `protocol/peersharing/client_test.go` (1 hunks)
- `protocol/peersharing/server.go` (3 hunks)
- `protocol/peersharing/server_test.go` (1 hunks)
- `protocol/txsubmission/server.go` (8 hunks)
- `protocol/txsubmission/server_concurrency_test.go` (1 hunks)
- `protocol/txsubmission/server_test.go` (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (9)
- protocol/keepalive/client_test.go
- protocol/peersharing/server.go
- protocol/blockfetch/blockfetch.go
- protocol/blockfetch/client_test.go
- protocol/localtxsubmission/client_test.go
- protocol/chainsync/client_test.go
- protocol/chainsync/error.go
- protocol/leiosnotify/client.go
- protocol/txsubmission/server_concurrency_test.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-04T15:54:22.683Z
Learnt from: wolf31o2
Repo: blinklabs-io/gouroboros PR: 1093
File: ledger/mary/pparams.go:143-150
Timestamp: 2025-11-04T15:54:22.683Z
Learning: In the blinklabs-io/gouroboros repository, the design goal for CBOR round-trip tests is to achieve byte-identical encoding WITHOUT using stored CBOR (cbor.DecodeStoreCbor). Instead, the approach uses proper field types (pointers for optional fields) and relies on the cbor package's deterministic encoding (SortCoreDeterministic) to ensure reproducible output. The stored CBOR pattern should not be suggested as a solution for round-trip fidelity in this codebase.
Applied to files:
protocol/localstatequery/client_test.go
🧬 Code graph analysis (16)
protocol/blockfetch/client.go (4)
protocol/protocol.go (2)
Protocol(39-60)New(122-133)protocol/blockfetch/blockfetch.go (1)
New(156-162)muxer/muxer.go (1)
New(90-117)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)
protocol/chainsync/client_concurrency_test.go (4)
connection.go (1)
NewConnection(107-130)protocol/chainsync/chainsync.go (3)
New(259-267)NewConfig(273-295)ChainSync(204-207)connection_options.go (3)
WithConnection(36-40)WithNetworkMagic(50-54)WithChainSyncConfig(131-135)protocol/chainsync/client.go (1)
Client(29-46)
protocol/leiosnotify/client_test.go (5)
protocol/versiondata.go (6)
VersionData(40-46)VersionDataNtN13andUp(143-145)VersionDataNtN11to12(116-122)DiffusionModeInitiatorOnly(21-21)PeerSharingModeNoPeerSharing(27-27)QueryModeDisabled(36-36)protocol/handshake/messages.go (1)
NewMsgAcceptVersion(88-102)connection.go (1)
Connection(59-103)connection_options.go (3)
WithConnection(36-40)WithNetworkMagic(50-54)WithNodeToNode(78-82)protocol/leiosnotify/client.go (1)
Client(24-33)
protocol/leiosfetch/client.go (4)
protocol/protocol.go (1)
Protocol(39-60)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)protocol/blockfetch/client.go (1)
Client(30-41)protocol/chainsync/client.go (1)
Client(29-46)
protocol/localtxmonitor/client_test.go (5)
protocol/chainsync/client_test.go (1)
TestClientShutdown(283-336)connection.go (3)
NewConnection(107-130)Connection(59-103)New(133-135)protocol/protocol.go (2)
ProtocolRoleClient(94-94)New(122-133)connection_options.go (2)
WithConnection(36-40)WithNetworkMagic(50-54)protocol/localtxmonitor/client.go (1)
Client(25-40)
protocol/chainsync/client.go (3)
protocol/protocol.go (1)
Protocol(39-60)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)protocol/chainsync/error.go (1)
ErrSyncCancelled(26-26)
protocol/peersharing/server_test.go (5)
protocol/txsubmission/server_test.go (1)
TestServerShutdown(28-82)connection.go (2)
NewConnection(107-130)Connection(59-103)connection_options.go (3)
WithConnection(36-40)WithNetworkMagic(50-54)WithNodeToNode(78-82)protocol/peersharing/peersharing.go (1)
PeerSharing(67-70)protocol/peersharing/server.go (1)
Server(25-30)
protocol/peersharing/client_test.go (5)
connection.go (2)
NewConnection(107-130)Connection(59-103)protocol/protocol.go (1)
ProtocolRoleClient(94-94)connection_options.go (3)
WithConnection(36-40)WithNetworkMagic(50-54)WithNodeToNode(78-82)protocol/peersharing/peersharing.go (1)
PeerSharing(67-70)protocol/peersharing/client.go (1)
Client(25-35)
protocol/localstatequery/client_test.go (4)
protocol/localtxsubmission/client_test.go (1)
TestClientShutdown(167-186)connection.go (1)
Connection(59-103)protocol/localstatequery/localstatequery.go (1)
LocalStateQuery(116-119)protocol/localstatequery/client.go (1)
Client(30-47)
protocol/localtxsubmission/client.go (5)
protocol/protocol.go (1)
Protocol(39-60)protocol/blockfetch/client.go (1)
Client(30-41)protocol/leiosnotify/client.go (1)
Client(24-33)protocol/localstatequery/client.go (1)
Client(30-47)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)
protocol/txsubmission/server_test.go (6)
protocol/peersharing/server_test.go (1)
TestServerShutdown(28-82)connection.go (2)
NewConnection(107-130)Connection(59-103)protocol/protocol.go (1)
ProtocolRoleServer(95-95)connection_options.go (3)
WithConnection(36-40)WithNetworkMagic(50-54)WithNodeToNode(78-82)protocol/txsubmission/txsubmission.go (1)
TxSubmission(126-129)protocol/txsubmission/server.go (1)
Server(28-41)
protocol/localstatequery/client.go (7)
protocol/protocol.go (1)
Protocol(39-60)protocol/leiosfetch/client.go (1)
Client(26-38)protocol/leiosnotify/client.go (1)
Client(24-33)protocol/localtxmonitor/client.go (1)
Client(25-40)protocol/localtxsubmission/client.go (1)
Client(26-36)protocol/localstatequery/messages.go (1)
NewMsgDone(245-252)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)
protocol/peersharing/client.go (4)
protocol/protocol.go (1)
Protocol(39-60)protocol/peersharing/peersharing.go (1)
ProtocolName(27-27)connection/id.go (1)
ConnectionId(22-25)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)
protocol/localtxmonitor/client.go (6)
protocol/protocol.go (1)
Protocol(39-60)protocol/leiosnotify/client.go (1)
Client(24-33)protocol/localstatequery/client.go (1)
Client(30-47)protocol/localtxsubmission/client.go (1)
Client(26-36)protocol/peersharing/client.go (1)
Client(25-35)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)
protocol/keepalive/client.go (5)
protocol/blockfetch/client.go (1)
Client(30-41)protocol/protocol.go (1)
Protocol(39-60)protocol/keepalive/keepalive.go (1)
ProtocolName(27-27)connection/id.go (1)
ConnectionId(22-25)protocol/keepalive/messages.go (1)
NewMsgDone(94-101)
protocol/txsubmission/server.go (5)
protocol/txsubmission/messages.go (2)
TxBody(197-201)NewMsgRequestTxIds(68-82)protocol/protocol.go (1)
Protocol(39-60)protocol/txsubmission/txsubmission.go (2)
ProtocolName(27-27)MaxAckCount(143-143)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)protocol/txsubmission/error.go (1)
ErrStopServerProcess(21-21)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Analyze (go)
🔇 Additional comments (42)
protocol/localstatequery/client_test.go (1)
357-376: LGTM!The test validates the basic shutdown flow for the LocalStateQuery client, following the established pattern used across other protocol tests in this PR.
protocol/keepalive/client.go (1)
98-113: LGTM!The
Stop()method follows the established pattern across protocol clients: sends a Done message, ensures proper protocol shutdown by callingProtocol.Stop(), and provides idempotent behavior viasync.Once.protocol/leiosnotify/client_test.go (1)
56-109: LGTM!The test validates LeiosNotify client shutdown behavior with proper NtN protocol setup. The previously reported initialization issues (per past review comments) have been resolved.
protocol/leiosfetch/client.go (4)
32-33: LGTM!

The atomic lifecycle flags prevent data races between `Start()` and `Stop()` calls, addressing previous review concerns. The `stopped` flag ensures `Start()` cannot be called after `Stop()`.

92-107: LGTM!

The `Start()` method correctly prevents re-starting after shutdown by checking the `stopped` flag and uses atomic operations to set the `started` state.

109-140: LGTM!

The `Stop()` method correctly handles channel lifecycle: deferring closure until protocol shutdown when started (avoiding panics from in-flight messages), or closing immediately when never started (preventing goroutine leaks).

237-279: LGTM!

All message handlers correctly use non-blocking sends with shutdown detection via `select` on `DoneChan()`, preventing deadlocks during protocol shutdown.

protocol/blockfetch/client.go (3)
40-40: LGTM!

The atomic `started` flag correctly tracks protocol lifecycle state and prevents data races between `Start()` and `Stop()`.

Also applies to: 98-98

104-132: LGTM!

The `Stop()` method correctly calls `Protocol.Stop()` (addressing previous deadlock concerns) and conditionally defers channel closure based on whether the protocol was started, preventing both panics from in-flight messages and goroutine leaks.

237-241: LGTM!

Message handlers correctly use non-blocking sends with shutdown detection, preventing deadlocks when `Stop()` is called while messages are in flight.

Also applies to: 254-259, 309-313
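The conditional deferred-close idea recurs across these clients; in isolation it looks roughly like this (standalone sketch with invented names, not the BlockFetch client itself):

```go
package example

import (
	"sync"
	"sync/atomic"
)

type client struct {
	onceStop   sync.Once
	started    atomic.Bool
	doneChan   chan struct{} // closed by the protocol once it has fully shut down
	resultChan chan []byte
}

// Stop is idempotent; it only closes the result channel after the protocol
// signals completion, so in-flight handler sends never hit a closed channel.
func (c *client) Stop() error {
	c.onceStop.Do(func() {
		if c.started.Load() {
			go func() {
				<-c.doneChan // wait for protocol shutdown
				close(c.resultChan)
			}()
		} else {
			// Never started: nothing can be writing, so close immediately.
			close(c.resultChan)
		}
	})
	return nil
}
```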
protocol/localstatequery/client.go (5)
38-38: LGTM!

The new synchronization fields correctly address data race concerns identified in past reviews: `stateMutex` protects the `started` flag and `acquiredMutex` guards the `acquired` state.

Also applies to: 44-46

102-115: LGTM!

The `Start()` method correctly guards the `started` flag with `stateMutex`, preventing data races with `Stop()`.

117-146: LGTM!

The `Stop()` method correctly reads the `started` flag under `stateMutex` and conditionally manages channel lifecycle: deferring closure until shutdown if started, or closing immediately if never started.

910-912: LGTM!

All accesses to the `acquired` state are correctly guarded by `acquiredMutex`, preventing data races across concurrent operations.

Also applies to: 968-970, 1010-1012, 1019-1021

933-943: LGTM!

Message handlers correctly use non-blocking sends with shutdown detection via `select` on `DoneChan()`, preventing deadlocks during protocol shutdown.

Also applies to: 958-963
protocol/chainsync/chainsync.go (1)
226-227: LGTM!

The 50% increase in default queue sizes (50 → 75) provides better buffering for improved throughput while maintaining a safe margin below the protocol-specified maximums (100). This aligns with the PR's performance improvement objectives.
protocol/peersharing/client_test.go (1)
28-90: LGTM! Well-structured shutdown test.The test properly validates the PeerSharing client shutdown behavior with appropriate error handling, timeout guards, and goroutine leak detection.
protocol/txsubmission/server_test.go (1)
28-82: LGTM! Test scaffolding ready for when mock is fixed.The test is properly skipped with a clear reason. The structure mirrors other shutdown tests and will provide coverage once the mock server issues are resolved.
protocol/peersharing/server_test.go (1)
60-62: Error handling properly implemented.The
Stop()error is now correctly checked, addressing the previous review feedback. The pattern matches other shutdown tests.protocol/localtxmonitor/client.go (4)
38-39: LGTM! Lifecycle state tracking added.

The `stateMutex` and `started` flag provide proper synchronization for the client lifecycle, consistent with the broader refactoring pattern.

87-100: LGTM! Proper startup synchronization.

The `Start()` method correctly uses the mutex to protect the `started` flag and ensures thread-safe initialization.

104-140: LGTM! Shutdown properly synchronized.

The `Stop()` method correctly handles two scenarios:
- If started: defers channel closing until protocol shutdown completes
- If never started: closes channels immediately
This eliminates the race condition where handlers could write to closed channels.
283-287: LGTM! Handlers properly synchronized with shutdown.

All message handlers now use `select` statements that check `DoneChan()` before writing to result channels, preventing panics from writing to closed channels during shutdown.

Also applies to: 300-304, 317-321, 334-338
protocol/peersharing/client.go (3)
75-95: LGTM! Start() properly prevents restart after stop.

The guard against starting a stopped client (lines 81-84) ensures clean lifecycle management and prevents unexpected behavior.

97-123: LGTM! Stop() properly coordinates channel lifecycle.

The conditional channel closing based on the `started` flag ensures channels are closed only after protocol shutdown completes, preventing handler panics.

168-173: LGTM! Handler respects shutdown signal.

The `handleSharePeers` method properly checks `DoneChan()` before sending to `sharePeersChan`, preventing writes to closed channels.

protocol/chainsync/client_concurrency_test.go (2)
30-103: LGTM! Thorough concurrency testing.

The test validates that concurrent `Start()`/`Stop()` operations don't cause deadlocks or races, with appropriate timeout guards and leak detection.

106-148: LGTM! Important edge case coverage.

The test validates that calling `Stop()` before `Start()` doesn't cause panics or deadlocks, ensuring robust lifecycle handling.

protocol/localtxsubmission/client.go (3)
76-90: LGTM! Consistent startup pattern.

The `Start()` method follows the established pattern with proper mutex protection for the lifecycle state.

93-124: LGTM! Proper shutdown sequencing.

The `Stop()` method correctly:

- Sends `MsgDone` to signal completion
- Calls `Protocol.Stop()` to initiate shutdown
- Defers channel closing until protocol fully shuts down (if started)

This sequence prevents handlers from writing to closed channels.

173-178: LGTM! Handlers properly synchronized.

Both `handleAcceptTx` and `handleRejectTx` now use `select` statements to check for shutdown before writing to `submitResultChan`, eliminating the TOCTOU race condition.

Also applies to: 198-202
protocol/chainsync/client.go (5)
37-37: LGTM! Appropriate use of atomic for lifecycle flag.

Using `atomic.Bool` for the `started` flag provides lock-free synchronization for simple boolean state tracking.

119-130: LGTM! Proper atomic flag update.

The `Start()` method correctly uses `started.Store(true)` to atomically set the lifecycle state.

133-160: LGTM! Shutdown properly handles channel lifecycle.

The `Stop()` method uses `started.Load()` to atomically check state and conditionally defers channel closing until protocol shutdown completes, preventing handler panics.
342-350: LGTM! Proper handling of channel closure and cancellation.

The updated receive operation correctly handles:
- Closed channel (!ok) → protocol shutdown
- False value (!ready) → sync cancellation
This prevents unexpected behavior when the channel closes during operation.
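That two-way distinction can be sketched as follows (illustrative only; the real method also issues the follow-up request):

```go
package example

import "errors"

var (
	errProtocolShuttingDown = errors.New("protocol is shutting down")
	errSyncCancelled        = errors.New("sync cancelled")
)

type client struct {
	readyForNextBlockChan chan bool
}

// waitForNextBlock separates a closed channel (protocol shutdown) from an
// explicit false value (the in-flight sync was cancelled).
func (c *client) waitForNextBlock() error {
	ready, ok := <-c.readyForNextBlockChan
	if !ok {
		return errProtocolShuttingDown
	}
	if !ready {
		return errSyncCancelled
	}
	return nil
}
```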
737-742: LGTM! Handlers properly synchronized with shutdown.

All writes to `readyForNextBlockChan` in `handleRollForward` and `handleRollBackward` now use `select` statements that check `DoneChan()` first, eliminating the race condition where handlers could write to a closed channel.

Also applies to: 748-752, 776-780, 788-792
protocol/txsubmission/server.go (6)
33-41: LGTM: Well-designed concurrency primitives for lifecycle management.

The introduction of `ackCount` as `int32`, a `done` channel with `doneMutex`, `onceStop`, `restartMutex`, and the `stopped` flag provides a solid foundation for coordinating shutdown, restart, and preventing data races. This addresses the critical issues raised in previous reviews.

49-64: LGTM: Proper initialization of lifecycle channels.

The buffered result channels (capacity 1) enable non-blocking handoff during restart, and the done channel is properly initialized.

94-116: LGTM: Robust shutdown with proper synchronization.

The `Stop()` method correctly uses `onceStop` for idempotent execution, acquires `restartMutex` for coordination with restart, and protects the done channel close with `doneMutex` and a select guard to prevent double-close panics.
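The double-close guard mentioned here boils down to something like the following (a hypothetical standalone sketch, not the server itself):

```go
package example

import "sync"

type server struct {
	onceStop  sync.Once
	doneMutex sync.Mutex
	done      chan struct{}
}

// Stop may be called from several goroutines; sync.Once keeps it idempotent,
// and the select guard makes closing the done channel safe even if a restart
// path has already closed (and possibly replaced) it.
func (s *server) Stop() error {
	s.onceStop.Do(func() {
		s.doneMutex.Lock()
		select {
		case <-s.done:
			// already closed elsewhere; nothing to do
		default:
			close(s.done)
		}
		s.doneMutex.Unlock()
	})
	return nil
}
```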
118-130: LGTM: Safe accessors with proper locking.

The `doneChan()` and `IsStopped()` helpers provide thread-safe access to the done channel and stopped flag respectively, with appropriate mutex protection.

133-193: LGTM: Shutdown-aware request handling with atomic ackCount.

The method properly uses atomic operations for `ackCount` access, implements shutdown awareness via `doneChan()`, and gracefully handles closed channels. The validation and bounds checking are appropriate.

196-225: LGTM: Proper shutdown handling in RequestTxs.

The method correctly uses `doneChan()` for shutdown awareness and handles closed channels appropriately.
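The shape of that receive path, hedged as a sketch rather than the real `RequestTxs` body:

```go
package example

import "errors"

var errProtocolShuttingDown = errors.New("protocol is shutting down")

type server struct {
	done       chan struct{}
	resultChan chan [][]byte
}

// awaitResult returns the next batch of results, or a shutdown error if the
// server is stopping (done closed) or the result channel has been closed.
func (s *server) awaitResult() ([][]byte, error) {
	select {
	case <-s.done:
		return nil, errProtocolShuttingDown
	case txs, ok := <-s.resultChan:
		if !ok {
			return nil, errProtocolShuttingDown
		}
		return txs, nil
	}
}
```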
91e04e4 to 1c271c7 (Compare)
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
protocol/leiosfetch/client.go (1)
109-140: Stop() should also call Protocol.Stop() to ensure shutdown and channel closure complete

`Stop()` sends `NewMsgDone()` and defers closing result channels until `<-c.DoneChan()` when started. However, it never calls `c.Protocol.Stop()`, so `DoneChan()` may never close if the remote side misbehaves or doesn't drive the protocol to completion. In that case:

- The goroutine spawned in `Stop()` can block indefinitely on `<-c.DoneChan()`.
- Callers waiting on the result channels may hang instead of seeing `ErrProtocolShuttingDown`.

Align this with other clients by stopping the underlying protocol after a successful Done send:

```diff
 func (c *Client) Stop() error {
 	var err error
 	c.onceStop.Do(func() {
 		c.Protocol.Logger().
 			Debug("stopping client protocol",
 				"component", "network",
 				"protocol", ProtocolName,
 				"connection_id", c.callbackContext.ConnectionId.String(),
 			)
 		msg := NewMsgDone()
-		err = c.SendMessage(msg)
+		if err = c.SendMessage(msg); err != nil {
+			return
+		}
+		c.Protocol.Stop()
 		// Defer closing channels until protocol fully shuts down (only if started)
 		if c.started.Load() {
 			go func() {
 				<-c.DoneChan()
 				close(c.blockResultChan)
 				close(c.blockTxsResultChan)
 				close(c.votesResultChan)
 				close(c.blockRangeResultChan)
 			}()
 		} else {
 			// If protocol was never started, close channels immediately
 			close(c.blockResultChan)
 			close(c.blockTxsResultChan)
 			close(c.votesResultChan)
 			close(c.blockRangeResultChan)
 		}
 		c.started.Store(false)
 		c.stopped.Store(true)
 	})
 	return err
 }
```

protocol/txsubmission/server.go (1)
276-335: Fix remaining data race on `s.stopped` in handleDone's second check

The restart path in `handleDone` correctly uses `restartMutex` to protect most accesses to `s.stopped`, but the second check:

```go
s.restartMutex.Unlock()
// Check again if permanent stop has been requested (TOCTOU protection)
if s.stopped {
	return nil
}
```

reads `s.stopped` without holding `restartMutex`, racing with `Stop()` which writes `s.stopped` under the lock. This is a real data race even if it's only used for TOCTOU protection.

Use the existing `IsStopped()` helper (which takes `restartMutex`) or keep the check under the lock. For example:

```diff
-	s.restartMutex.Unlock()
-	// Check again if permanent stop has been requested (TOCTOU protection)
-	if s.stopped {
-		return nil
-	}
-	// Start the new protocol outside the lock for better responsiveness
-	s.Start()
+	s.restartMutex.Unlock()
+	// Check again if permanent stop has been requested (TOCTOU protection)
+	if s.IsStopped() {
+		return nil
+	}
+	// Start the new protocol outside the lock for better responsiveness
+	s.Start()
```

This keeps the extra check while eliminating the unsynchronized access.
🧹 Nitpick comments (5)
protocol/keepalive/client.go (1)
33-35: KeepAlive.Stop implementation is straightforward and idempotent

The new Stop method cleanly wraps shutdown: it's guarded by `onceStop`, logs, sends `MsgDone`, and calls the underlying `Protocol.Stop()`, with the error remembered in `stopErr` for subsequent calls. That matches the lifecycle used elsewhere in the repo.

If you want to be extra defensive, you could also stop the keep-alive timer inside `Stop()` (under `timerMutex`) before sending `MsgDone`, to avoid any last scheduled send between Stop and protocol shutdown, but the existing DoneChan-driven cleanup already prevents leaks.

Also applies to: 98-113
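If that defensive timer stop were added, it would look roughly like this (sketch only; the timer field names are assumed):

```go
package example

import (
	"sync"
	"time"
)

type client struct {
	timerMutex sync.Mutex
	timer      *time.Timer
}

// stopTimer cancels any pending keep-alive send under the same mutex the
// scheduling path uses, so nothing new can be queued once Stop has begun.
func (c *client) stopTimer() {
	c.timerMutex.Lock()
	defer c.timerMutex.Unlock()
	if c.timer != nil {
		c.timer.Stop()
		c.timer = nil
	}
}
```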
protocol/localtxmonitor/client.go (1)
38-40: Shutdown handling is now much safer for LocalTxMonitor

The new `stateMutex`/`started` gating plus the revised `Start`/`Stop` logic and handler `select` blocks address the earlier race where handlers could write to closed channels. Channels are now only closed after `DoneChan()` fires (or immediately if never started), and handlers bail out with `ErrProtocolShuttingDown` instead of sending once shutdown is in progress. The use of `busyMutex` around Stop's SendMessage also prevents overlap with in-flight client calls like `HasTx`/`NextTx`/`GetSizes`.

If you ever want to harden the API against misuse, you could consider early-rejecting client calls (e.g., `HasTx`) when `Stop()` has already run, rather than relying solely on the channel-close path, but the current behavior is functionally correct.

Also applies to: 87-101, 103-140, 283-287, 300-304, 317-321, 334-338
protocol/txsubmission/server_concurrency_test.go (1)
137-148: Weak verification: test doesn't exercise restart prevention logic.The test only verifies the
stoppedflag is set but doesn't trigger the actual restart path (viahandleDone) to confirm prevention works. Lines 146-148 repeat the same check without any intervening action that could trigger a restart, making the second assertion redundant.Consider deferring this test until the mock infrastructure supports triggering
handleDone, or restructure to simulate a Done message that would normally trigger restart.protocol/peersharing/client_test.go (2)
69-76: Consider increasing the timeout for CI stability.The 2-second timeout for mock connection shutdown might be too aggressive for slow CI environments or heavily loaded systems, potentially causing test flakiness. Consider increasing it to 5 seconds to align better with the later 10-second timeout.
Apply this diff to increase the timeout:
- case <-time.After(2 * time.Second): + case <-time.After(5 * time.Second): t.Fatalf("did not complete within timeout")
57-65: Consider using error channel instead of panic for better test failure reporting.While the panic approach is documented, it can make test failures harder to debug. Consider capturing the error in a channel and checking it in the main test goroutine for cleaner test output.
Here's an alternative pattern:
+ oConnErrChan := make(chan error, 1) // Async error handler go func() { err, ok := <-oConn.ErrorChan() if !ok { return } - // We can't call t.Fatalf() from a different Goroutine, so we panic instead - panic(fmt.Sprintf("unexpected Ouroboros error: %s", err)) + oConnErrChan <- fmt.Errorf("unexpected Ouroboros error: %w", err) }() // Run test inner function innerFunc(t, oConn) + // Check for errors from Ouroboros connection + select { + case err := <-oConnErrChan: + t.Fatal(err.Error()) + default: + } // Wait for mock connection shutdown
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (27)
protocol/blockfetch/blockfetch.go(1 hunks)protocol/blockfetch/client.go(7 hunks)protocol/blockfetch/client_test.go(1 hunks)protocol/chainsync/chainsync.go(1 hunks)protocol/chainsync/client.go(6 hunks)protocol/chainsync/client_concurrency_test.go(1 hunks)protocol/chainsync/client_test.go(2 hunks)protocol/chainsync/error.go(1 hunks)protocol/keepalive/client.go(2 hunks)protocol/keepalive/client_test.go(1 hunks)protocol/leiosfetch/client.go(5 hunks)protocol/leiosnotify/client.go(4 hunks)protocol/leiosnotify/client_test.go(1 hunks)protocol/localstatequery/client.go(6 hunks)protocol/localstatequery/client_test.go(2 hunks)protocol/localtxmonitor/client.go(7 hunks)protocol/localtxmonitor/client_test.go(1 hunks)protocol/localtxsubmission/client.go(5 hunks)protocol/localtxsubmission/client_test.go(1 hunks)protocol/localtxsubmission/localtxsubmission.go(2 hunks)protocol/peersharing/client.go(4 hunks)protocol/peersharing/client_test.go(1 hunks)protocol/peersharing/server.go(3 hunks)protocol/peersharing/server_test.go(1 hunks)protocol/txsubmission/server.go(8 hunks)protocol/txsubmission/server_concurrency_test.go(1 hunks)protocol/txsubmission/server_test.go(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
- protocol/chainsync/client_concurrency_test.go
- protocol/blockfetch/client.go
- protocol/chainsync/chainsync.go
- protocol/peersharing/client.go
- protocol/chainsync/client_test.go
- protocol/chainsync/error.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-04T15:54:22.683Z
Learnt from: wolf31o2
Repo: blinklabs-io/gouroboros PR: 1093
File: ledger/mary/pparams.go:143-150
Timestamp: 2025-11-04T15:54:22.683Z
Learning: In the blinklabs-io/gouroboros repository, the design goal for CBOR round-trip tests is to achieve byte-identical encoding WITHOUT using stored CBOR (cbor.DecodeStoreCbor). Instead, the approach uses proper field types (pointers for optional fields) and relies on the cbor package's deterministic encoding (SortCoreDeterministic) to ensure reproducible output. The stored CBOR pattern should not be suggested as a solution for round-trip fidelity in this codebase.
Applied to files:
protocol/localstatequery/client_test.go
🧬 Code graph analysis (20)
protocol/localtxsubmission/localtxsubmission.go (1)
protocol/localtxsubmission/messages.go (1)
MessageTypeDone(29-29)
protocol/localtxmonitor/client_test.go (3)
protocol/chainsync/client_test.go (1)
TestClientShutdown(283-302)connection.go (1)
Connection(59-103)protocol/localtxmonitor/client.go (1)
Client(25-40)
protocol/localtxsubmission/client.go (8)
protocol/protocol.go (1)
Protocol(39-60)protocol/blockfetch/client.go (1)
Client(30-41)protocol/chainsync/client.go (1)
Client(29-46)protocol/leiosnotify/client.go (1)
Client(24-33)protocol/localstatequery/client.go (1)
Client(30-47)protocol/localtxmonitor/client.go (1)
Client(25-40)protocol/peersharing/client.go (1)
Client(25-35)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)
protocol/localstatequery/client_test.go (4)
protocol/blockfetch/client_test.go (1)
TestClientShutdown(211-230)protocol/chainsync/client_test.go (1)
TestClientShutdown(283-302)connection.go (1)
Connection(59-103)protocol/localstatequery/client.go (1)
Client(30-47)
protocol/peersharing/server.go (1)
protocol/protocol.go (1)
Protocol(39-60)
protocol/localtxsubmission/client_test.go (10)
protocol/blockfetch/client_test.go (1)
TestClientShutdown(211-230)protocol/chainsync/client_test.go (1)
TestClientShutdown(283-302)protocol/keepalive/client_test.go (1)
TestClientShutdown(303-322)protocol/leiosnotify/client_test.go (1)
TestClientShutdown(117-136)protocol/localstatequery/client_test.go (1)
TestClientShutdown(357-376)protocol/localtxmonitor/client_test.go (1)
TestClientShutdown(300-319)protocol/peersharing/client_test.go (1)
TestClientShutdown(89-108)connection.go (1)
Connection(59-103)protocol/localtxsubmission/localtxsubmission.go (1)
LocalTxSubmission(75-78)protocol/localtxsubmission/client.go (1)
Client(26-36)
protocol/localtxmonitor/client.go (3)
protocol/protocol.go (1)
Protocol(39-60)protocol/localstatequery/client.go (1)
Client(30-47)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)
protocol/chainsync/client.go (3)
protocol/protocol.go (1)
Protocol(39-60)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)protocol/chainsync/error.go (1)
ErrSyncCancelled(26-26)
protocol/txsubmission/server_concurrency_test.go (5)
connection.go (1)
NewConnection(107-130)protocol/protocol.go (1)
ProtocolRoleServer(95-95)connection_options.go (3)
WithConnection(36-40)WithNetworkMagic(50-54)WithNodeToNode(78-82)protocol/txsubmission/txsubmission.go (1)
TxSubmission(126-129)protocol/txsubmission/server.go (1)
Server(28-41)
protocol/txsubmission/server_test.go (6)
protocol/peersharing/server_test.go (1)
TestServerShutdown(28-82)connection.go (2)
NewConnection(107-130)Connection(59-103)protocol/protocol.go (1)
ProtocolRoleServer(95-95)connection_options.go (3)
WithConnection(36-40)WithNetworkMagic(50-54)WithNodeToNode(78-82)protocol/txsubmission/txsubmission.go (1)
TxSubmission(126-129)protocol/txsubmission/server.go (1)
Server(28-41)
protocol/peersharing/server_test.go (6)
protocol/txsubmission/server_test.go (1)
TestServerShutdown(28-82)connection.go (2)
NewConnection(107-130)Connection(59-103)protocol/protocol.go (1)
ProtocolRoleServer(95-95)connection_options.go (3)
WithConnection(36-40)WithNetworkMagic(50-54)WithNodeToNode(78-82)protocol/peersharing/peersharing.go (1)
PeerSharing(67-70)protocol/peersharing/server.go (1)
Server(25-30)
protocol/keepalive/client_test.go (5)
connection.go (3)
Connection(59-103)NewConnection(107-130)New(133-135)connection_options.go (3)
WithConnection(36-40)WithNetworkMagic(50-54)WithNodeToNode(78-82)protocol/blockfetch/client_test.go (1)
TestClientShutdown(211-230)protocol/keepalive/keepalive.go (1)
KeepAlive(85-88)protocol/keepalive/client.go (1)
Client(26-35)
protocol/leiosnotify/client_test.go (11)
protocol/versiondata.go (6)
VersionData(40-46)VersionDataNtN13andUp(143-145)VersionDataNtN11to12(116-122)DiffusionModeInitiatorOnly(21-21)PeerSharingModeNoPeerSharing(27-27)QueryModeDisabled(36-36)protocol/handshake/messages.go (1)
NewMsgAcceptVersion(88-102)connection.go (2)
Connection(59-103)NewConnection(107-130)connection_options.go (3)
WithConnection(36-40)WithNetworkMagic(50-54)WithNodeToNode(78-82)protocol/blockfetch/client_test.go (1)
TestClientShutdown(211-230)protocol/keepalive/client_test.go (1)
TestClientShutdown(303-322)protocol/localstatequery/client_test.go (1)
TestClientShutdown(357-376)protocol/localtxsubmission/client_test.go (1)
TestClientShutdown(167-186)protocol/peersharing/client_test.go (1)
TestClientShutdown(89-108)protocol/leiosnotify/leiosnotify.go (1)
LeiosNotify(75-78)protocol/leiosnotify/client.go (1)
Client(24-33)
protocol/blockfetch/client_test.go (2)
connection.go (1)
Connection(59-103)protocol/blockfetch/client.go (1)
Client(30-41)
protocol/localstatequery/client.go (5)
protocol/protocol.go (1)
Protocol(39-60)protocol/localstatequery/localstatequery.go (1)
AcquireTarget(131-133)protocol/localstatequery/messages.go (4)
NewMsgDone(245-252)AcquireFailurePointNotOnChain(44-44)MsgResult(172-175)NewMsgQuery(160-170)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)protocol/localstatequery/error.go (2)
ErrAcquireFailurePointTooOld(20-20)ErrAcquireFailurePointNotOnChain(23-25)
protocol/txsubmission/server.go (5)
protocol/txsubmission/messages.go (2)
TxBody(197-201)NewMsgRequestTxIds(68-82)protocol/protocol.go (1)
Protocol(39-60)protocol/txsubmission/txsubmission.go (2)
ProtocolName(27-27)MaxAckCount(143-143)protocol/error.go (2)
ErrProtocolViolationRequestExceeded(29-31)ErrProtocolShuttingDown(19-19)protocol/txsubmission/error.go (1)
ErrStopServerProcess(21-21)
protocol/leiosfetch/client.go (3)
protocol/protocol.go (1)
Protocol(39-60)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)protocol/blockfetch/client.go (1)
Client(30-41)
protocol/keepalive/client.go (4)
protocol/protocol.go (1)
Protocol(39-60)protocol/keepalive/keepalive.go (1)
ProtocolName(27-27)connection/id.go (1)
ConnectionId(22-25)protocol/keepalive/messages.go (1)
NewMsgDone(94-101)
protocol/peersharing/client_test.go (5)
connection.go (2)
Connection(59-103)NewConnection(107-130)protocol/protocol.go (1)
ProtocolRoleClient(94-94)connection_options.go (3)
WithConnection(36-40)WithNetworkMagic(50-54)WithNodeToNode(78-82)protocol/peersharing/peersharing.go (1)
PeerSharing(67-70)protocol/peersharing/client.go (1)
Client(25-35)
protocol/leiosnotify/client.go (4)
protocol/protocol.go (1)
Protocol(39-60)protocol/localstatequery/client.go (1)
Client(30-47)protocol/localtxsubmission/client.go (1)
Client(26-36)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Analyze (go)
🔇 Additional comments (31)
protocol/localtxmonitor/client_test.go (1)
299-319: TestClientShutdown pattern looks consistent and useful

The shutdown test mirrors the existing runTest harness and other protocol shutdown tests, and correctly asserts that Stop returns no error after Start.
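Stripped of the mock-connection plumbing, the asserted lifecycle is simply Start followed by an error-checked Stop; a toy stand-in (not the real test or client) to make the shape explicit:

```go
package example

import (
	"testing"

	"go.uber.org/goleak"
)

type fakeClient struct{ started bool }

func (c *fakeClient) Start()      { c.started = true }
func (c *fakeClient) Stop() error { c.started = false; return nil }

// TestClientShutdown mirrors the pattern discussed in this review: register
// leak detection first, then Start, then assert Stop returns no error.
func TestClientShutdown(t *testing.T) {
	defer goleak.VerifyNone(t)
	c := &fakeClient{}
	c.Start()
	if err := c.Stop(); err != nil {
		t.Fatalf("unexpected error when stopping client: %s", err)
	}
}
```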
protocol/localtxsubmission/localtxsubmission.go (1)
39-50: Confirm Done transition semantics from Busy stateAdding
MessageTypeDone -> stateDonefrom both Idle and Busy states aligns with enabling shutdown from more places, but withstateBusymarkedAgencyServerit’s worth double‑checking that the side expected to sendMessageTypeDonein Busy is consistent with the underlying protocol/state-machine implementation.Also applies to: 52-67
protocol/localstatequery/client_test.go (1)
25-25: Alias cleanup and shutdown test look goodUsing the
ouroborosalias keeps this file consistent with other tests, andTestClientShutdowncorrectly exercisesLocalStateQuery().Client.Start()followed byStop()with proper error checking.Also applies to: 357-376
protocol/keepalive/client_test.go (1)
242-301: Verify mock conversation vs KeepAlive.Start behaviorThe new
runTesthelper andTestClientShutdownfollow the shared shutdown-testing pattern and look structurally sound. One thing to double‑check:KeepAlive().Client.Start()immediately callssendKeepAlive, so ifouroboros_mockexpects every keep‑alive exchange to be scripted, a handshake‑only conversation may surface as an unexpected-message error. If the mock ignores unscripted keep‑alive traffic, this is fine; otherwise you may want to add a minimal keep‑alive request/response pair to the conversation or disable the initial send for this test.Also applies to: 303-322
protocol/leiosnotify/client.go (1)
31-33: LeiosNotify client shutdown logic is now consistent and race‑freeThe introduction of
stateMutex/started, the revisedStopthat only closesrequestNextChanafterDoneChan()(or immediately if never started), and the handlers’selectonDoneChan()vsrequestNextChantogether remove the previous risk of writes to a closed channel and align this client’s lifecycle with the rest of the protocols.RequestNextcorrectly surfacesErrProtocolShuttingDownonce the channel is closed.Also applies to: 72-86, 88-114, 150-155, 159-164, 168-173, 177-182
protocol/leiosnotify/client_test.go (1)
30-136: LeiosNotify shutdown test scaffolding looks solid and consistent with other protocolsThe handshake mock,
runTesthelper, andTestClientShutdownfollow the same pattern used across other protocol tests (mock connection, async error watcher, goleak check, Start/Stop, and bounded shutdown waits). I don’t see correctness or concurrency issues here; this gives good coverage of the LeiosNotify client lifecycle.protocol/chainsync/client.go (3)
28-37: Lifecycle tracking and deferred channel close resolve the previous readyForNextBlockChan raceUsing
started atomic.BoolplusonceStart/onceStop, and deferringclose(readyForNextBlockChan)until<-c.DoneChan()when started, ensures the channel is only closed after the protocol has fully shut down. This removes the prior risk of handlers sending on a closed channel while keeping Stop idempotent.Also applies to: 119-161
322-356: GetAvailableBlockRange: robust handling of readyForNextBlockChan and cancellationsSwitching to
ready, ok := <-c.readyForNextBlockChanwith:
!ok→ErrProtocolShuttingDownready == false→ErrSyncCancelled
gives clear semantics for shutdown vs. cancellation and avoids undefined behavior on closed channels. The subsequent RequestNext send remains guarded by DoneChan, so this looks correct.
573-754: RollForward/RollBackward: DoneChan‑aware signaling to readyForNextBlockChanWrapping all sends to
readyForNextBlockChan(bothtrueandfalsecases) in aselectonc.DoneChan()prevents sends after shutdown and cleanly propagatesErrProtocolShuttingDownwhen needed. Combined with the deferred close inStop(), this eliminates the send‑after‑close panic risk while preserving the sync and cancellation behavior.Also applies to: 756-793
protocol/leiosfetch/client.go (2)
26-38: Lifecycle flags and Start guard are reasonableUsing
started/stoppedasatomic.BoolwithonceStart/onceStopgives a clear “start once, stop once, never restart” contract. Thestopped.Load()check inStart()prevents accidental restarts afterStop(), which matches expectations for this client.Also applies to: 92-107
214-280: DoneChan‑aware handlers for result channels look correctEach handler (
handleBlock,handleBlockTxs,handleVotes,handleNextBlockAndTxsInRange,handleLastBlockAndTxsInRange) now sends on the corresponding result channel via aselectonc.DoneChan(). This prevents blocking during shutdown and returnsErrProtocolShuttingDownwhen the protocol is stopping, which is exactly what callers of the request methods expect.protocol/localstatequery/client.go (2)
29-47: Lifecycle and state mutexes remove the previous data race on startedAdding
stateMutexand updatingStart()to setc.startedunder this mutex makes accesses tostartedinStop()race‑free. TheonceStart/onceStoppairing around these operations is consistent with other protocol clients and avoids double‑start/stop issues.Also applies to: 102-115
902-965: Acquire/result handling now correctly synchronizesacquiredand is shutdown-safeThe introduction of
acquiredMutexand its use inhandleAcquired,acquire,release, andrunQueryremoves the prior data race onc.acquired. Combined with DoneChan‑aware sends toacquireResultChanandqueryResultChan, this makes acquire/release and query flows behave correctly on shutdown (returningErrProtocolShuttingDownwhen channels are closed) without risking send‑after‑close panics.Also applies to: 967-1038
protocol/txsubmission/server.go (4)
27-41: Server struct and constructor updates look correct

Switching ackCount to int32 for atomic access, adding done with doneMutex, and introducing onceStop/restartMutex/stopped gives the server a clear lifecycle model. Initializing requestTxIdsResultChan and requestTxsResultChan as buffered channels in NewServer is appropriate for the single-reply pattern.

Also applies to: 48-57
83-130: Stop() and doneChan() provide a clean, coordinated shutdown API

Stop() now:

- is idempotent via onceStop,
- coordinates with restarts using restartMutex,
- safely closes done under doneMutex, and
- calls s.Protocol.Stop().

Combined with doneChan() and IsStopped(), this gives RequestTxIds/RequestTxs and tests a clear way to observe shutdown. This addresses the prior concurrent close and restart issues around the done channel.
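The core of that coordination can be sketched as follows (simplified stand-in types; the real server also handles restarts and the underlying Protocol, and done is assumed to be created at construction time):

```go
package sketch

import "sync"

type server struct {
	onceStop  sync.Once
	doneMutex sync.Mutex
	done      chan struct{} // created in the constructor
	stopped   bool
}

// Stop is safe to call from multiple goroutines: only the first call does
// any work, and the done channel is closed exactly once under doneMutex.
func (s *server) Stop() error {
	s.onceStop.Do(func() {
		s.doneMutex.Lock()
		s.stopped = true
		select {
		case <-s.done:
			// already closed by another shutdown path
		default:
			close(s.done)
		}
		s.doneMutex.Unlock()
	})
	return nil
}

// doneChan hands out the current done channel under the same mutex, so
// readers never race with a restart that swaps the channel.
func (s *server) doneChan() <-chan struct{} {
	s.doneMutex.Lock()
	defer s.doneMutex.Unlock()
	return s.done
}
```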
132-193: Ack count atomics and RequestTxIds shutdown behavior look correct

RequestTxIds now:

- validates reqCount,
- uses atomic.LoadInt32/StoreInt32 on ackCount within bounds, and
- returns ErrProtocolShuttingDown when either requestTxIdsResultChan is closed or doneChan() fires.

The conversion to uint16 is guarded by the range checks, and updating ackCount to len(result.txIds) after each successful call is consistent with the protocol. This path looks solid.
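Both this path and RequestTxs below reduce to the same shape: block on either shutdown or a reply, and treat a closed reply channel like shutdown. A generic sketch (awaitReply and errShuttingDown are illustrative names, not part of the library API):

```go
package sketch

import "errors"

var errShuttingDown = errors.New("protocol is shutting down") // stand-in for protocol.ErrProtocolShuttingDown

// awaitReply blocks until either the server starts shutting down or a reply
// arrives; a closed reply channel is treated the same as shutdown.
func awaitReply[T any](done <-chan struct{}, replies <-chan T) (T, error) {
	var zero T
	select {
	case <-done:
		return zero, errShuttingDown
	case reply, ok := <-replies:
		if !ok {
			return zero, errShuttingDown
		}
		return reply, nil
	}
}
```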
195-225: RequestTxs now shuts down cleanly via doneChan and ok checks

Using select { case <-s.doneChan(): ...; case txs, ok := <-s.requestTxsResultChan: ... } ensures that callers get ErrProtocolShuttingDown if the server is stopping, and the extra ok check on the result channel avoids panics if it's ever closed in future refactors. This is a good improvement in shutdown semantics.

protocol/txsubmission/server_concurrency_test.go (1)
28-95: LGTM: Well-structured concurrent Stop test.

The test properly exercises concurrent Stop calls with appropriate timeout protection and validates idempotent shutdown behavior. The timeout-based deadlock detection pattern is robust.
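The general shape of such a test, sketched against a hypothetical stopper interface rather than the real Server type:

```go
package sketch

import (
	"sync"
	"testing"
	"time"
)

// stopper is a hypothetical stand-in for any component with an idempotent Stop.
type stopper interface{ Stop() error }

// assertConcurrentStop fires several Stop calls at once and fails the test
// if they do not all return within the timeout, which would indicate a
// deadlock in the shutdown path.
func assertConcurrentStop(t *testing.T, s stopper, callers int, timeout time.Duration) {
	t.Helper()
	var wg sync.WaitGroup
	for i := 0; i < callers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if err := s.Stop(); err != nil {
				t.Errorf("unexpected error from Stop: %v", err)
			}
		}()
	}
	done := make(chan struct{})
	go func() {
		wg.Wait()
		close(done)
	}()
	select {
	case <-done:
	case <-time.After(timeout):
		t.Fatal("timed out waiting for concurrent Stop calls; possible deadlock")
	}
}
```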
protocol/txsubmission/server_test.go (1)
28-82: LGTM: Clean shutdown test structure.

The test follows established patterns with proper async error monitoring, timeout guards, and error checking on Stop. The two-stage timeout approach (2s for mock, 10s for full shutdown) is appropriate.
protocol/blockfetch/blockfetch.go (1)
122-122: LGTM: Queue size increase aligns with performance objectives.

The 50% increase (256→384) in default receive queue size provides better buffering while remaining within the allowed maximum (512). This change supports the PR's performance improvement goals.
protocol/localtxsubmission/client_test.go (1)
167-186: LGTM: Consistent shutdown test pattern.

The test follows the established pattern used across other protocol clients (blockfetch, chainsync, etc.) and properly validates error-free shutdown behavior.
protocol/peersharing/server.go (1)
114-129: LGTM: Clean restart flow with explicit lifecycle.

The updated restart logic properly stops the protocol before reinitializing, making the lifecycle explicit and easier to reason about.
protocol/blockfetch/client_test.go (1)
211-230: LGTM: Consistent client shutdown test.

The test follows the same pattern established across other protocol clients and properly validates the Start/Stop lifecycle.
protocol/peersharing/server_test.go (1)
28-82: LGTM: Clean server shutdown test.

The test properly validates server shutdown with appropriate error handling, async monitoring, and timeout guards. The error check on Stop (lines 60-62) correctly validates shutdown behavior.
protocol/localtxsubmission/client.go (5)
34-35: LGTM: Standard lifecycle tracking fields.

The stateMutex and started fields follow the established pattern used across other protocol clients for tracking lifecycle state.
76-90: LGTM: Proper lifecycle initialization.

The Start method correctly guards the started flag with stateMutex and uses onceStart for idempotency. The mutex scope includes the Protocol.Start() call, which is consistent with other protocol clients.
92-124: LGTM: Proper shutdown sequencing with deferred channel close.

The Stop method correctly:

- Guards state with stateMutex and prevents concurrent access with busyMutex
- Sends the Done message before stopping the protocol
- Defers channel closure until after protocol shutdown (when started == true) to prevent handler races
- Handles the case where Stop is called before Start

The goroutine (lines 114-117) is safe because Protocol.Stop() guarantees DoneChan will close.
165-179: LGTM: Eliminates TOCTOU race with shutdown-aware send.

The select statement properly addresses the previous TOCTOU race condition by racing the channel send against DoneChan. If shutdown occurs, ErrProtocolShuttingDown is returned; otherwise, the result is delivered safely.
181-204: LGTM: Consistent shutdown-aware error delivery.

The select statement mirrors the pattern in handleAcceptTx, ensuring the error is delivered safely or shutdown is acknowledged. This eliminates the TOCTOU race condition.

protocol/peersharing/client_test.go (2)
30-56: Good test setup with goroutine leak detection.

The test infrastructure is well-structured with proper use of goleak for detecting leaked goroutines and appropriate mock connection setup. The Ouroboros connection is configured correctly with the required options for testing.
89-108: Original review comment is based on false assumptions and should be disregarded.

The Client struct fields started and stopped are private (lowercase) with no corresponding public accessor methods like IsStopped() or IsStarted(). The test correctly follows Go practices by testing only the public API (Start/Stop methods), verifying no errors occur, and relying on goleak for goroutine leak detection. Accessing private fields from tests would violate encapsulation and is unnecessary here.

Likely an incorrect or invalid review comment.
1c271c7 to 5c93443
Actionable comments posted: 0
♻️ Duplicate comments (1)
protocol/peersharing/server.go (1)
49-54: Error from Protocol.Stop() is silently discarded.

The Stop() method calls s.Protocol.Stop() but does not capture or propagate its error. This means any shutdown failures are lost, which contradicts the method's error return signature and the pattern expected by callers.

Apply this diff to properly propagate shutdown errors:

func (s *Server) Stop() error {
+	var err error
	s.onceStop.Do(func() {
-		s.Protocol.Stop()
+		err = s.Protocol.Stop()
	})
-	return nil
+	return err
}
🧹 Nitpick comments (2)
protocol/localtxsubmission/client.go (1)
34-36: Shutdown behavior improvements look correct; consider tightening Stop() error path

The changes here look good overall:

- stateMutex + started cleanly protect the start/stop state.
- Delaying close(c.submitResultChan) until <-c.DoneChan() when started, and using select { case <-c.DoneChan(): ...; case c.submitResultChan <- ... } in both handlers, removes the TOCTOU window that could previously cause a send on a closed channel.
- The submitResultChan close-on-Done pattern is consistent with other protocols and should be safe with the new select usage.

One potential follow-up improvement: in Stop() (Lines 93–123), if SendMessage(NewMsgDone()) fails, the function returns early and never calls c.Protocol.Stop() or closes submitResultChan. That keeps the previous behavior but can leave the client in a partially-stopped state on an error path. You might want to still attempt c.Protocol.Stop() and schedule the channel close even when sending MsgDone fails, while propagating the original error to the caller.

Also applies to: 76-89, 93-123, 165-177, 181-203
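A sketch of that follow-up (placeholder types; sendDone and stop stand in for SendMessage(NewMsgDone()) and Protocol.Stop(), and the done field for DoneChan()):

```go
package sketch

import "sync"

// proto is a placeholder for the embedded protocol handle; sendDone and stop
// stand in for SendMessage(NewMsgDone()) and Protocol.Stop().
type proto struct {
	done chan struct{}
}

func (p *proto) sendDone() error { return nil }
func (p *proto) stop() error     { return nil }

type txClient struct {
	proto
	onceStop         sync.Once
	started          bool
	submitResultChan chan error
}

// Stop keeps the original send error but still tears the protocol down and
// schedules the channel close, so callers are never left blocked.
func (c *txClient) Stop() error {
	var err error
	c.onceStop.Do(func() {
		err = c.sendDone() // remember the send error, if any
		_ = c.stop()       // always attempt protocol shutdown
		if c.started {
			go func() {
				<-c.done // wait for full shutdown before closing
				close(c.submitResultChan)
			}()
		} else {
			close(c.submitResultChan)
		}
	})
	return err
}
```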
protocol/leiosnotify/client_test.go (1)
30-136: LeiosNotify shutdown test harness is well-structured

The NtN version-data helpers, conversationEntryNtNResponseV15, and runTest harness give you a clean, reusable way to exercise LeiosNotify client startup/shutdown with goleak verification and strict error handling. TestClientShutdown itself is straightforward and aligns with the other protocol shutdown tests. If you ever touch mockNtNVersionDataV11 again, you could drop the interface return type and avoid the runtime type assertion, but as-is it's perfectly acceptable for test code.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (27)
protocol/blockfetch/blockfetch.go(1 hunks)protocol/blockfetch/client.go(7 hunks)protocol/blockfetch/client_test.go(1 hunks)protocol/chainsync/chainsync.go(1 hunks)protocol/chainsync/client.go(6 hunks)protocol/chainsync/client_concurrency_test.go(1 hunks)protocol/chainsync/client_test.go(2 hunks)protocol/chainsync/error.go(1 hunks)protocol/keepalive/client.go(2 hunks)protocol/keepalive/client_test.go(1 hunks)protocol/leiosfetch/client.go(5 hunks)protocol/leiosnotify/client.go(4 hunks)protocol/leiosnotify/client_test.go(1 hunks)protocol/localstatequery/client.go(6 hunks)protocol/localstatequery/client_test.go(2 hunks)protocol/localtxmonitor/client.go(10 hunks)protocol/localtxmonitor/client_test.go(1 hunks)protocol/localtxsubmission/client.go(5 hunks)protocol/localtxsubmission/client_test.go(1 hunks)protocol/localtxsubmission/localtxsubmission.go(2 hunks)protocol/peersharing/client.go(4 hunks)protocol/peersharing/client_test.go(1 hunks)protocol/peersharing/server.go(5 hunks)protocol/peersharing/server_test.go(1 hunks)protocol/txsubmission/server.go(8 hunks)protocol/txsubmission/server_concurrency_test.go(1 hunks)protocol/txsubmission/server_test.go(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
- protocol/localtxmonitor/client_test.go
- protocol/localtxsubmission/localtxsubmission.go
- protocol/leiosnotify/client.go
- protocol/chainsync/client_test.go
- protocol/peersharing/client_test.go
- protocol/localstatequery/client_test.go
🧰 Additional context used
🧬 Code graph analysis (18)
protocol/keepalive/client_test.go (5)
connection.go (3)
Connection(59-103)NewConnection(107-130)New(133-135)connection_options.go (3)
WithConnection(36-40)WithNetworkMagic(50-54)WithNodeToNode(78-82)protocol/blockfetch/client_test.go (1)
TestClientShutdown(211-230)protocol/keepalive/keepalive.go (1)
KeepAlive(85-88)protocol/keepalive/client.go (1)
Client(26-35)
protocol/blockfetch/client_test.go (6)
protocol/chainsync/client_test.go (1)
TestClientShutdown(283-302)protocol/keepalive/client_test.go (1)
TestClientShutdown(303-322)protocol/peersharing/client_test.go (1)
TestClientShutdown(95-114)connection.go (1)
Connection(59-103)protocol/blockfetch/blockfetch.go (1)
BlockFetch(102-105)protocol/blockfetch/client.go (1)
Client(30-41)
protocol/peersharing/client.go (3)
protocol/protocol.go (1)
Protocol(39-60)protocol/peersharing/peersharing.go (1)
ProtocolName(27-27)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)
protocol/txsubmission/server_test.go (6)
protocol/peersharing/server_test.go (1)
TestServerShutdown(28-82)connection.go (2)
NewConnection(107-130)Connection(59-103)protocol/protocol.go (1)
ProtocolRoleServer(95-95)connection_options.go (3)
WithConnection(36-40)WithNetworkMagic(50-54)WithNodeToNode(78-82)protocol/txsubmission/txsubmission.go (1)
TxSubmission(126-129)protocol/txsubmission/server.go (1)
Server(28-41)
protocol/peersharing/server.go (2)
protocol/txsubmission/server.go (1)
Server(28-41)protocol/protocol.go (1)
Protocol(39-60)
protocol/leiosnotify/client_test.go (6)
protocol/versiondata.go (6)
VersionData(40-46)VersionDataNtN13andUp(143-145)VersionDataNtN11to12(116-122)DiffusionModeInitiatorOnly(21-21)PeerSharingModeNoPeerSharing(27-27)QueryModeDisabled(36-36)protocol/handshake/messages.go (1)
NewMsgAcceptVersion(88-102)connection.go (2)
Connection(59-103)NewConnection(107-130)connection_options.go (3)
WithConnection(36-40)WithNetworkMagic(50-54)WithNodeToNode(78-82)protocol/leiosnotify/leiosnotify.go (1)
LeiosNotify(75-78)protocol/leiosnotify/client.go (1)
Client(24-33)
protocol/localstatequery/client.go (5)
protocol/protocol.go (1)
Protocol(39-60)protocol/localtxmonitor/client.go (1)
Client(25-41)protocol/localstatequery/messages.go (4)
NewMsgDone(245-252)AcquireFailurePointNotOnChain(44-44)MsgResult(172-175)NewMsgQuery(160-170)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)protocol/localstatequery/error.go (2)
ErrAcquireFailurePointTooOld(20-20)ErrAcquireFailurePointNotOnChain(23-25)
protocol/localtxsubmission/client.go (7)
protocol/protocol.go (1)
Protocol(39-60)protocol/blockfetch/client.go (1)
Client(30-41)protocol/leiosnotify/client.go (1)
Client(24-33)protocol/localstatequery/client.go (1)
Client(30-47)protocol/localtxmonitor/client.go (1)
Client(25-41)protocol/peersharing/client.go (1)
Client(25-35)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)
protocol/localtxsubmission/client_test.go (6)
protocol/blockfetch/client_test.go (1)
TestClientShutdown(211-230)protocol/chainsync/client_test.go (1)
TestClientShutdown(283-302)protocol/localtxmonitor/client_test.go (1)
TestClientShutdown(300-319)connection.go (1)
Connection(59-103)protocol/localtxsubmission/localtxsubmission.go (1)
LocalTxSubmission(75-78)protocol/localtxsubmission/client.go (1)
Client(26-36)
protocol/peersharing/server_test.go (5)
protocol/txsubmission/server_test.go (1)
TestServerShutdown(28-82)connection.go (2)
NewConnection(107-130)Connection(59-103)connection_options.go (3)
WithConnection(36-40)WithNetworkMagic(50-54)WithNodeToNode(78-82)protocol/peersharing/peersharing.go (1)
PeerSharing(67-70)protocol/peersharing/server.go (1)
Server(26-32)
protocol/chainsync/client.go (3)
protocol/protocol.go (1)
Protocol(39-60)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)protocol/chainsync/error.go (1)
ErrSyncCancelled(26-26)
protocol/leiosfetch/client.go (4)
protocol/protocol.go (1)
Protocol(39-60)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)protocol/blockfetch/client.go (1)
Client(30-41)protocol/chainsync/client.go (1)
Client(29-46)
protocol/blockfetch/client.go (2)
protocol/protocol.go (2)
Protocol(39-60)New(122-133)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)
protocol/localtxmonitor/client.go (5)
protocol/protocol.go (1)
Protocol(39-60)protocol/localstatequery/client.go (1)
Client(30-47)protocol/peersharing/client.go (1)
Client(25-35)protocol/localtxmonitor/messages.go (1)
NewMsgDone(79-86)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)
protocol/txsubmission/server_concurrency_test.go (5)
connection.go (1)
NewConnection(107-130)protocol/protocol.go (1)
ProtocolRoleServer(95-95)connection_options.go (3)
WithConnection(36-40)WithNetworkMagic(50-54)WithNodeToNode(78-82)protocol/txsubmission/txsubmission.go (1)
TxSubmission(126-129)protocol/txsubmission/server.go (1)
Server(28-41)
protocol/txsubmission/server.go (5)
protocol/txsubmission/messages.go (2)
TxBody(197-201)NewMsgRequestTxIds(68-82)protocol/protocol.go (1)
Protocol(39-60)protocol/txsubmission/txsubmission.go (1)
MaxAckCount(143-143)protocol/error.go (2)
ErrProtocolViolationRequestExceeded(29-31)ErrProtocolShuttingDown(19-19)protocol/txsubmission/error.go (1)
ErrStopServerProcess(21-21)
protocol/chainsync/client_concurrency_test.go (3)
protocol/chainsync/chainsync.go (3)
New(259-267)NewConfig(273-295)ChainSync(204-207)connection_options.go (1)
WithChainSyncConfig(131-135)protocol/chainsync/client.go (1)
Client(29-46)
protocol/keepalive/client.go (4)
protocol/protocol.go (1)
Protocol(39-60)protocol/keepalive/keepalive.go (1)
ProtocolName(27-27)connection/id.go (1)
ConnectionId(22-25)protocol/keepalive/messages.go (1)
NewMsgDone(94-101)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Analyze (go)
🔇 Additional comments (11)
protocol/localtxsubmission/client_test.go (1)
167-186: LGTM: Consistent shutdown test implementation.

The test follows the established pattern seen across other protocol clients and properly validates the Start/Stop lifecycle with appropriate error handling.
protocol/blockfetch/blockfetch.go (1)
122-122: Performance tuning approved.

The increase from 256 to 384 provides better buffering capacity while remaining well within the MaxRecvQueueSize limit of 512, consistent with the PR's performance improvement objectives.

protocol/chainsync/error.go (1)
25-26: LGTM: Well-defined sentinel error.

The new ErrSyncCancelled error follows established patterns in this file and provides a clear signal for cancellation scenarios within the sync lifecycle.
211-230: LGTM: Proper shutdown test coverage.

The test validates the BlockFetch client lifecycle with appropriate nil checks and error handling, consistent with shutdown tests across other protocols.
protocol/chainsync/chainsync.go (1)
226-227: Performance improvements approved.

Increasing both DefaultPipelineLimit and DefaultRecvQueueSize from 50 to 75 allows for better throughput via increased pipelining while remaining safely within protocol maximums (100).
242-322: LGTM: Comprehensive shutdown test with proper error handling.

The test infrastructure and TestClientShutdown implementation follow the established pattern across protocol tests. Error handling from Stop() is correctly validated (lines 317-319), addressing previous review feedback.

protocol/peersharing/server.go (1)
118-132: Proper restart sequence with cleanup.

The updated handleDone correctly calls Stop() with error checking before reinitializing the protocol, ensuring a clean restart cycle.

protocol/keepalive/client.go (1)
98-120: LGTM: Well-structured Stop() implementation.

The method provides proper lifecycle management with:

- Single-execution guarantee via onceStop
- Thread-safe timer cleanup under mutex
- Error capture from message sending
- Complete protocol shutdown
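As a rough sketch of that shape (illustrative types; sendDone and stopProto are placeholders for SendMessage(NewMsgDone()) and Protocol.Stop()):

```go
package sketch

import (
	"sync"
	"time"
)

type keepAliveClient struct {
	onceStop   sync.Once
	timerMutex sync.Mutex
	timer      *time.Timer
	sendDone   func() error // placeholder for SendMessage(NewMsgDone())
	stopProto  func() error // placeholder for Protocol.Stop()
}

// Stop cancels the keep-alive timer under its mutex, sends Done, and shuts
// the protocol down; only the first call does any work, and the error from
// sending Done is the one surfaced to the caller.
func (c *keepAliveClient) Stop() error {
	var err error
	c.onceStop.Do(func() {
		c.timerMutex.Lock()
		if c.timer != nil {
			c.timer.Stop()
			c.timer = nil
		}
		c.timerMutex.Unlock()
		err = c.sendDone()
		_ = c.stopProto() // protocol Stop error intentionally not surfaced here
	})
	return err
}
```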
protocol/peersharing/server_test.go (1)
28-81: Server shutdown test wiring looks solid

The shutdown test mirrors the TxSubmission pattern: it validates that PeerSharing().Server exists, checks the Stop() error, and waits for both the mock connection and Ouroboros connection to terminate with bounded timeouts. This will be a good guard for future lifecycle regressions once the skip is removed.

protocol/txsubmission/server_concurrency_test.go (1)
28-95: Concurrency tests effectively exercise Stop() idempotence and state
TestServerConcurrentStop and TestServerStopSetsStoppedFlag are structured well: they start the server, coordinate concurrent Stop() calls with a WaitGroup and timeout, and then assert the IsStopped() flag. This should catch deadlocks or mis-set state once the mock issues are resolved and the skips are removed.

Also applies to: 97-143
protocol/txsubmission/server_test.go (1)
28-82: TxSubmission server shutdown test matches the shared lifecycle pattern

This test cleanly validates TxSubmission server startup/shutdown, including checking the Stop() error and waiting for both the mock and Ouroboros connections to terminate within timeouts. It aligns with the peersharing test and provides good coverage for server lifecycle behavior once un-skipped.
5c93443 to eff3a32
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
protocol/leiosfetch/client.go (1)
109-143: Fix Stop() early-return on SendMessage error (incomplete shutdown, no retry possible)

If SendMessage(NewMsgDone) fails, the closure in onceStop.Do returns immediately: Protocol.Stop() is never called, result channels are never closed, and started/stopped flags are not updated. Because onceStop has now run, subsequent Stop() calls are no-ops. This can leave the underlying protocol running, leak goroutines waiting on DoneChan (line 126), and cause callers blocked on result channels to never be unblocked.

You should always attempt to stop the protocol and clean up channels, while still propagating the SendMessage error. For example:
func (c *Client) Stop() error { var err error c.onceStop.Do(func() { c.Protocol.Logger(). Debug("stopping client protocol", "component", "network", "protocol", ProtocolName, "connection_id", c.callbackContext.ConnectionId.String(), ) - msg := NewMsgDone() - if err = c.SendMessage(msg); err != nil { - return - } - _ = c.Protocol.Stop() // Error ignored - method returns SendMessage error + msg := NewMsgDone() + if sendErr := c.SendMessage(msg); sendErr != nil { + // Preserve the SendMessage error but still shut down the protocol. + err = sendErr + } + // Always attempt to stop the protocol so DoneChan and muxer shutdown complete. + _ = c.Protocol.Stop() // Stop error ignored; err already reflects SendMessage failure if any // Defer closing channels until protocol fully shuts down (only if started) if c.started.Load() { go func() { <-c.DoneChan() close(c.blockResultChan) close(c.blockTxsResultChan) close(c.votesResultChan) close(c.blockRangeResultChan) }() } else { // If protocol was never started, close channels immediately close(c.blockResultChan) close(c.blockTxsResultChan) close(c.votesResultChan) close(c.blockRangeResultChan) } - c.started.Store(false) - c.stopped.Store(true) + c.started.Store(false) + c.stopped.Store(true) }) return err }This matches the pattern used in
localtxsubmission.Client.Stop() (protocol/localtxsubmission/client.go:93–120) and avoids leaving the client in a non-recoverable half-stopped state.

protocol/blockfetch/client.go (1)
103-132: Fix Stop() to always call Protocol.Stop() to prevent goroutine leaks on send errors

The current code only calls c.Protocol.Stop() when SendMessage succeeds. If SendMessage fails, Protocol.Stop() is skipped, which prevents Muxer.UnregisterProtocol() from being called. This leaves muxerDoneChan signaled, causing recvLoop to block indefinitely (line 495 of protocol/protocol.go), which never closes recvDoneChan. Without recvDoneChan closing, sendLoop never exits, leaving sendDoneChan unclosed. The goroutine at protocol.go lines 162-166 waits for both, so doneChan never closes, permanently blocking the cleanup goroutine and leaking channels. The
localtxsubmissionclient (protocol/localtxsubmission/client.go:109-111) correctly handles this by always callingProtocol.Stop()regardless ofSendMessageresult. Apply the same pattern:func (c *Client) Stop() error { var err error c.onceStop.Do(func() { c.Protocol.Logger(). Debug("stopping client protocol", "component", "network", "protocol", ProtocolName, "connection_id", c.callbackContext.ConnectionId.String(), ) msg := NewMsgClientDone() - err = c.SendMessage(msg) - if err == nil { - _ = c.Protocol.Stop() // Error ignored - method returns SendMessage error - } + if sendErr := c.SendMessage(msg); sendErr != nil { + err = sendErr + } + _ = c.Protocol.Stop() // Always stop to signal muxerDoneChan // Defer closing channels until protocol fully shuts down (only if started) if c.started.Load() { go func() { <-c.DoneChan() close(c.blockChan) close(c.startBatchResultChan) }() } else { // If protocol was never started, close channels immediately close(c.blockChan) close(c.startBatchResultChan) } }) return err }protocol/localtxmonitor/client.go (1)
204-208: Guard c.acquired with a mutex to avoid a data race between handlers and callers.

c.acquired is written in the handler (handleAcquired) and release(), and read in HasTx, NextTx, and GetSizes (inside the busyMutex critical section), but the handler goroutine does not hold busyMutex. That gives you an unsynchronized read/write on acquired, which go test -race will flag.

Given you already introduced acquiredMutex for the localstatequery client in this PR, I'd suggest mirroring that pattern here:

- Add acquiredMutex sync.Mutex to Client.
- In handleAcquired and release, set c.acquired under acquiredMutex.
- In HasTx, NextTx, and GetSizes, read c.acquired under acquiredMutex (then decide whether to call c.acquire()).

Conceptually:
type Client struct { @@ - busyMutex sync.Mutex - acquired bool + busyMutex sync.Mutex + acquiredMutex sync.Mutex + acquired bool @@ func (c *Client) handleAcquired(msg protocol.Message) error { @@ - c.acquired = true + c.acquiredMutex.Lock() + c.acquired = true + c.acquiredMutex.Unlock() @@ func (c *Client) HasTx(txId []byte) (bool, error) { @@ - if !c.acquired { + c.acquiredMutex.Lock() + acquired := c.acquired + c.acquiredMutex.Unlock() + if !acquired { @@ func (c *Client) release() error { @@ - c.acquired = false + c.acquiredMutex.Lock() + c.acquired = false + c.acquiredMutex.Unlock()And similarly for
NextTx and GetSizes. This keeps the handler and client sides race-free while preserving existing behavior.

Also applies to: 238-242, 272-276, 318-319, 397-398
🧹 Nitpick comments (2)
protocol/txsubmission/server_concurrency_test.go (2)
29-95: Skipped test provides no value until mock issues are resolved.

The test structure is sound and properly checks for concurrent Stop() deadlocks, but it's currently skipped. Consider either:
- Fixing the mock server issues with NtN protocol to enable this test
- Removing this test until the infrastructure is ready
- Adding a tracking issue reference in the skip message
Do you want me to search for existing issues related to the "mock server issues with NtN protocol" or help create a tracking issue?
100-143: Incomplete test with TODO comment.

The test is both skipped and has a TODO comment (line 98-99) indicating planned future enhancements. This suggests the test is a work-in-progress.
Consider either:
- Completing the test as indicated in the TODO before merging
- Creating a separate issue to track the enhancement
- Adding a more specific skip message referencing the TODO
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (32)
protocol/blockfetch/blockfetch.go(1 hunks)protocol/blockfetch/client.go(7 hunks)protocol/blockfetch/client_test.go(1 hunks)protocol/blockfetch/server.go(1 hunks)protocol/chainsync/chainsync.go(1 hunks)protocol/chainsync/client.go(6 hunks)protocol/chainsync/client_concurrency_test.go(1 hunks)protocol/chainsync/client_test.go(2 hunks)protocol/chainsync/error.go(1 hunks)protocol/chainsync/server.go(1 hunks)protocol/keepalive/client.go(2 hunks)protocol/keepalive/client_test.go(1 hunks)protocol/leiosfetch/client.go(5 hunks)protocol/leiosfetch/server.go(1 hunks)protocol/leiosnotify/client.go(4 hunks)protocol/leiosnotify/client_test.go(1 hunks)protocol/leiosnotify/server.go(1 hunks)protocol/localstatequery/client.go(6 hunks)protocol/localstatequery/client_test.go(2 hunks)protocol/localtxmonitor/client.go(10 hunks)protocol/localtxmonitor/client_test.go(1 hunks)protocol/localtxsubmission/client.go(5 hunks)protocol/localtxsubmission/client_test.go(1 hunks)protocol/localtxsubmission/localtxsubmission.go(2 hunks)protocol/peersharing/client.go(4 hunks)protocol/peersharing/client_test.go(1 hunks)protocol/peersharing/server.go(5 hunks)protocol/peersharing/server_test.go(1 hunks)protocol/protocol.go(4 hunks)protocol/txsubmission/server.go(8 hunks)protocol/txsubmission/server_concurrency_test.go(1 hunks)protocol/txsubmission/server_test.go(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
- protocol/txsubmission/server_test.go
- protocol/blockfetch/blockfetch.go
- protocol/localtxmonitor/client_test.go
- protocol/peersharing/server_test.go
- protocol/keepalive/client_test.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-04T15:54:22.683Z
Learnt from: wolf31o2
Repo: blinklabs-io/gouroboros PR: 1093
File: ledger/mary/pparams.go:143-150
Timestamp: 2025-11-04T15:54:22.683Z
Learning: In the blinklabs-io/gouroboros repository, the design goal for CBOR round-trip tests is to achieve byte-identical encoding WITHOUT using stored CBOR (cbor.DecodeStoreCbor). Instead, the approach uses proper field types (pointers for optional fields) and relies on the cbor package's deterministic encoding (SortCoreDeterministic) to ensure reproducible output. The stored CBOR pattern should not be suggested as a solution for round-trip fidelity in this codebase.
Applied to files:
protocol/localstatequery/client_test.go
🧬 Code graph analysis (20)
protocol/chainsync/client_test.go (6)
protocol/chainsync/chainsync.go (1)
ChainSync(204-207)protocol/chainsync/client.go (1)
Client(29-46)protocol/blockfetch/client_test.go (1)
TestClientShutdown(211-230)protocol/keepalive/client_test.go (1)
TestClientShutdown(303-322)protocol/localstatequery/client_test.go (1)
TestClientShutdown(357-376)connection.go (1)
Connection(59-103)
protocol/leiosnotify/client_test.go (6)
protocol/versiondata.go (6)
VersionData(40-46)VersionDataNtN13andUp(143-145)VersionDataNtN11to12(116-122)DiffusionModeInitiatorOnly(21-21)PeerSharingModeNoPeerSharing(27-27)QueryModeDisabled(36-36)protocol/handshake/messages.go (1)
NewMsgAcceptVersion(88-102)connection.go (2)
Connection(59-103)NewConnection(107-130)connection_options.go (3)
WithConnection(36-40)WithNetworkMagic(50-54)WithNodeToNode(78-82)protocol/leiosnotify/leiosnotify.go (1)
LeiosNotify(75-78)protocol/leiosnotify/client.go (1)
Client(24-33)
protocol/blockfetch/client_test.go (5)
protocol/chainsync/client_test.go (1)
TestClientShutdown(283-302)protocol/keepalive/client_test.go (1)
TestClientShutdown(303-322)connection.go (1)
Connection(59-103)protocol/blockfetch/blockfetch.go (1)
BlockFetch(102-105)protocol/blockfetch/client.go (1)
Client(30-41)
protocol/peersharing/client.go (8)
protocol/leiosfetch/client.go (1)
Client(26-38)protocol/leiosnotify/client.go (1)
Client(24-33)protocol/localstatequery/client.go (1)
Client(30-47)protocol/localtxmonitor/client.go (1)
Client(25-41)protocol/localtxsubmission/client.go (1)
Client(26-36)protocol/protocol.go (1)
Protocol(39-60)protocol/peersharing/peersharing.go (1)
ProtocolName(27-27)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)
protocol/keepalive/client.go (4)
protocol/blockfetch/client.go (1)
Client(30-41)protocol/protocol.go (1)
Protocol(39-60)protocol/keepalive/keepalive.go (1)
ProtocolName(27-27)protocol/keepalive/messages.go (1)
NewMsgDone(94-101)
protocol/blockfetch/client.go (2)
protocol/protocol.go (2)
Protocol(39-60)New(122-133)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)
protocol/localtxsubmission/localtxsubmission.go (1)
protocol/localtxsubmission/messages.go (1)
MessageTypeDone(29-29)
protocol/leiosfetch/client.go (7)
protocol/protocol.go (1)
Protocol(39-60)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)protocol/blockfetch/client.go (1)
Client(30-41)protocol/chainsync/client.go (1)
Client(29-46)protocol/leiosnotify/client.go (1)
Client(24-33)protocol/localtxmonitor/client.go (1)
Client(25-41)protocol/message.go (1)
Message(18-22)
protocol/leiosnotify/client.go (8)
protocol/protocol.go (1)
Protocol(39-60)protocol/leiosfetch/client.go (1)
Client(26-38)protocol/localstatequery/client.go (1)
Client(30-47)protocol/localtxmonitor/client.go (1)
Client(25-41)protocol/localtxsubmission/client.go (1)
Client(26-36)protocol/peersharing/client.go (1)
Client(25-35)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)protocol/message.go (1)
Message(18-22)
protocol/localtxsubmission/client.go (6)
protocol/protocol.go (1)
Protocol(39-60)protocol/leiosnotify/client.go (1)
Client(24-33)protocol/localstatequery/client.go (1)
Client(30-47)protocol/localtxmonitor/client.go (1)
Client(25-41)protocol/peersharing/client.go (1)
Client(25-35)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)
protocol/chainsync/client.go (4)
protocol/protocol.go (1)
Protocol(39-60)protocol/chainsync/chainsync.go (1)
ProtocolName(30-30)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)protocol/chainsync/error.go (1)
ErrSyncCancelled(26-26)
protocol/peersharing/client_test.go (4)
connection.go (2)
Connection(59-103)NewConnection(107-130)connection_options.go (3)
WithConnection(36-40)WithNetworkMagic(50-54)WithNodeToNode(78-82)protocol/peersharing/peersharing.go (1)
PeerSharing(67-70)protocol/peersharing/client.go (1)
Client(25-35)
protocol/localstatequery/client_test.go (6)
protocol/blockfetch/client_test.go (1)
TestClientShutdown(211-230)protocol/chainsync/client_test.go (1)
TestClientShutdown(283-302)protocol/keepalive/client_test.go (1)
TestClientShutdown(303-322)connection.go (1)
Connection(59-103)protocol/localstatequery/localstatequery.go (1)
LocalStateQuery(116-119)protocol/localstatequery/client.go (1)
Client(30-47)
protocol/localstatequery/client.go (6)
protocol/protocol.go (1)
Protocol(39-60)protocol/localstatequery/localstatequery.go (1)
AcquireTarget(131-133)protocol/localstatequery/messages.go (3)
NewMsgDone(245-252)AcquireFailurePointNotOnChain(44-44)MsgResult(172-175)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)protocol/localstatequery/error.go (2)
ErrAcquireFailurePointTooOld(20-20)ErrAcquireFailurePointNotOnChain(23-25)protocol/message.go (1)
Message(18-22)
protocol/localtxmonitor/client.go (4)
protocol/protocol.go (1)
Protocol(39-60)protocol/peersharing/client.go (1)
Client(25-35)protocol/localtxmonitor/messages.go (1)
NewMsgDone(79-86)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)
protocol/peersharing/server.go (2)
protocol/txsubmission/server.go (1)
Server(28-41)protocol/protocol.go (1)
Protocol(39-60)
protocol/chainsync/client_concurrency_test.go (3)
protocol/chainsync/chainsync.go (3)
New(259-267)NewConfig(273-295)ChainSync(204-207)connection_options.go (1)
WithChainSyncConfig(131-135)protocol/chainsync/client.go (1)
Client(29-46)
protocol/txsubmission/server_concurrency_test.go (5)
connection.go (1)
NewConnection(107-130)protocol/protocol.go (1)
New(122-133)connection_options.go (3)
WithConnection(36-40)WithNetworkMagic(50-54)WithNodeToNode(78-82)protocol/txsubmission/txsubmission.go (1)
TxSubmission(126-129)protocol/txsubmission/server.go (1)
Server(28-41)
protocol/txsubmission/server.go (5)
protocol/txsubmission/messages.go (1)
NewMsgRequestTxIds(68-82)protocol/protocol.go (1)
Protocol(39-60)protocol/txsubmission/txsubmission.go (2)
ProtocolName(27-27)MaxAckCount(143-143)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)protocol/txsubmission/error.go (1)
ErrStopServerProcess(21-21)
protocol/localtxsubmission/client_test.go (2)
connection.go (1)
Connection(59-103)protocol/localtxsubmission/client.go (1)
Client(26-36)
🔇 Additional comments (39)
protocol/localtxsubmission/localtxsubmission.go (1)
46-49: LGTM! Client-initiated graceful shutdown.

Allowing the client to send Done from the Idle state enables graceful protocol termination, consistent with the shutdown lifecycle improvements mentioned in the PR.

protocol/blockfetch/client_test.go (1)
211-230: LGTM! Shutdown test follows established pattern.The test properly validates that the BlockFetch client can be started and stopped without errors, matching the pattern used across other protocol clients (ChainSync, KeepAlive, etc.). The test structure includes proper nil checks and error handling.
protocol/localtxsubmission/client_test.go (1)
167-186: LGTM! Shutdown test follows established pattern.The test properly validates shutdown behavior for the LocalTxSubmission client. Correctly uses the NtC (node-to-client) protocol handshake, as LocalTxSubmission is a client-only protocol.
protocol/leiosfetch/server.go (1)
203-205: LGTM! Proper error propagation prevents invalid restart.The change ensures that if Stop() fails during the protocol restart sequence, the error is propagated instead of blindly proceeding with reinitialization. This aligns with similar error handling improvements in other server components (chainsync, blockfetch).
protocol/chainsync/server.go (1)
245-247: LGTM! Consistent error handling across server restarts.Proper error propagation added to the restart sequence, matching the pattern implemented in other server components. This prevents attempting to reinitialize and restart when Stop() encounters an error.
protocol/chainsync/client_test.go (1)
283-302: LGTM! Shutdown test follows established pattern.The test properly validates that the ChainSync client can be started and stopped without errors, consistent with similar tests in other protocol packages.
protocol/blockfetch/server.go (1)
179-181: LGTM! Consistent error handling in restart flow.Proper error propagation ensures that if Stop() fails, the error is returned instead of proceeding with reinitialization. This matches the pattern implemented across other server components in this PR.
protocol/chainsync/chainsync.go (1)
226-227: Verify the performance impact and rationale for the 50→75 increase—commit message lacks supporting metrics.The commit message states "Increase default queue sizes for better buffering" but provides no benchmarking data, memory impact analysis, or performance test results. While the 75 value remains within protocol limits (max 100), please ensure:
- The increase is validated with actual performance metrics or production data
- Memory impact of the 50% buffer increase is acceptable for all client deployments
- This change was tested against expected workload patterns
protocol/leiosnotify/server.go (1)
118-120: LGTM! Stop error propagation prevents restart on failure.The error handling ensures that if Stop() fails, the protocol won't attempt to reinitialize and restart, which is the correct behavior.
protocol/localstatequery/client_test.go (2)
25-25: LGTM! Import alias improves readability.The alias distinguishes the main package from test utilities.
357-376: LGTM! Shutdown test validates clean teardown.The test follows the established pattern across other protocol tests and uses goleak to ensure no goroutine leaks.
protocol/peersharing/server.go (3)
49-55: LGTM! Idempotent Stop() with error propagation.The sync.Once guard ensures Stop() is idempotent, and the method correctly propagates errors from Protocol.Stop().
119-119: LGTM! Unused parameter correctly ignored.The blank identifier indicates the message parameter is intentionally unused in handleDone.
128-130: LGTM! Stop error handling prevents unsafe restart.Checking the Stop() error ensures the protocol won't reinitialize if shutdown fails.
protocol/chainsync/error.go (1)
25-26: LGTM! Well-defined sentinel error.The new exported error follows Go conventions and provides a clear signal for cancelled sync operations.
protocol/protocol.go (3)
176-189: LGTM! Stop() signature change enables error-aware shutdown.

The signature change from func (p *Protocol) Stop() to func (p *Protocol) Stop() error establishes infrastructure for error propagation during shutdown. Currently returns nil, but the pattern is correctly adopted by all callers in the PR.

Note: This is a breaking API change for external consumers.
247-247: LGTM! Appropriate to ignore error in error path.The comment correctly indicates this is already an error path where the primary error (ErrProtocolViolationQueueExceeded) takes precedence.
453-453: LGTM! Consistent error handling in cleanup.Same pattern as line 247—error is appropriately ignored in the error path.
protocol/keepalive/client.go (2)
98-120: LGTM! Comprehensive shutdown handling.The Stop() method correctly:
- Uses sync.Once for idempotency
- Stops the timer under mutex protection
- Sends the Done message and captures its error
- Calls Protocol.Stop() (appropriately ignoring its error since SendMessage error is returned)
The design ensures clean shutdown of both the timer and the underlying protocol.
122-142: LGTM! Cleaner timer management.

Extracting timer logic into startTimer() improves code organization and makes the timer lifecycle explicit.

protocol/leiosnotify/client_test.go (1)
1-136: LGTM! Comprehensive shutdown test with leak detection.The test properly validates LeiosNotify client shutdown behavior:
- Uses goleak to detect goroutine leaks
- Employs proper NtN v15 handshake sequence
- Validates Start/Stop lifecycle
- Includes timeout protection and error handling
The test structure aligns with the shutdown testing pattern used across other protocol components.
protocol/leiosnotify/client.go (5)
31-32: LGTM! Lifecycle state tracking added.The stateMutex and started flag enable safe coordination between Start/Stop and message handlers.
74-76: LGTM! Start() properly sets started flag.The mutex ensures thread-safe access to the started flag.
Also applies to: 83-83
103-111: LGTM! Channel closure deferred until protocol shutdown.The conditional logic correctly handles two scenarios:
- If started: closes channel after DoneChan signals (prevents handlers from writing to closed channel)
- If never started: closes immediately (safe since no handlers are running)
This pattern resolves the race condition noted in past review comments.
151-155: LGTM! Shutdown-aware channel send.The select statement prevents blocking on channel send when the protocol is shutting down, returning ErrProtocolShuttingDown instead.
160-164: LGTM! Consistent shutdown handling across handlers.All message handlers follow the same shutdown-aware pattern, preventing writes to closed channels.
Also applies to: 169-173, 178-182
protocol/chainsync/client_concurrency_test.go (2)
105-148: Stop-before-Start coverage looks good

The Stop-before-Start test nicely validates that calling Stop on an unstarted ChainSync client is safe (no panic, no deadlock) and that a subsequent Start/Stop sequence still behaves correctly under goleak. This aligns well with the intended lifecycle contract.
29-103: Concurrent Start/Stop test correctly stresses once-semantics and race safety

This test is valuable for catching races and deadlocks in the ChainSync client's sync.Once-based Start/Stop handling. Because Start() and Stop() each use sync.Once internally, only the first call to each actually executes the underlying logic; later calls from other goroutines are no-ops. If you need to exercise repeated full start/stop cycles, you'll have to create fresh clients per cycle rather than looping on the same instance.
protocol/peersharing/client.go (2)
159-173: Shutdown-aware send in handleSharePeers is correct

The switch to a select on c.DoneChan() vs c.sharePeersChan in handleSharePeers cleanly avoids blocking indefinitely during shutdown and returns ErrProtocolShuttingDown in a predictable way. This aligns with the patterns used in other mini-protocol clients.
24-35: Code is correct and already well-commented; no changes needed

The lifecycle pattern with started and stopped flags is correct and necessary, not redundant. Both flags determine channel closure behavior: if started is true, the channel close is deferred until DoneChan; otherwise it closes immediately. This prevents blocking if Stop is called before Start.

The code already contains clear explanatory comments (lines 110 and 117) that document this conditional logic. Setting started = false at the end of Stop is unnecessary and would be misleading: onceStop prevents Stop from executing again, so started is never read after the first invocation. The flag's semantic meaning is "was the protocol ever started," not "is it currently started."

The implementation is consistent with other clients like leiosfetch and correctly handles the Start-after-Stop prevention pattern.

protocol/leiosfetch/client.go (1)
240-283: Shutdown-aware handlers for leiosfetch look solid

The updated handlers (handleBlock, handleBlockTxs, handleVotes, handleNextBlockAndTxsInRange, handleLastBlockAndTxsInRange) now send on their result channels using select with c.DoneChan(). This ensures graceful behavior during shutdown (returning ErrProtocolShuttingDown instead of blocking or panicking) and aligns with the patterns adopted in blockfetch/chainsync.

protocol/localtxsubmission/client.go (2)
166-205: Handlers correctly gate result delivery on shutdown

Both handleAcceptTx and handleRejectTx now use a select on c.DoneChan() vs c.submitResultChan, returning ErrProtocolShuttingDown when appropriate. Combined with closing submitResultChan after DoneChan fires, this avoids sends to closed channels and ensures SubmitTx callers are reliably unblocked on shutdown.
34-36: Start/Stop lifecycle and channel cleanup are now robust

The introduction of stateMutex + started around Start/Stop, along with always calling c.Protocol.Stop() and deferring submitResultChan closure until DoneChan when started, resolves prior shutdown/TOCTOU issues while keeping Stop idempotent. Lock ordering (stateMutex → busyMutex in Stop only; SubmitTx and handlers take no conflicting locks) eliminates deadlock risk. Handler implementations safely use select over DoneChan first, preventing sends on closed channels.

protocol/blockfetch/client.go (1)
228-243: Handler changes correctly respect shutdown state

The updated handleStartBatch, handleNoBlocks, and the non-callback path in handleBlock now send into startBatchResultChan/blockChan via a select that races against c.DoneChan(), returning ErrProtocolShuttingDown if the protocol is shutting down. Combined with deferring channel closure to after DoneChan fires, this protects against panics on send and makes shutdown behavior well-defined for in-flight GetBlock/GetBlockRange operations.

Also applies to: 245-261, 263-316
protocol/localtxmonitor/client.go (1)
24-41: Lifecycle and shutdown mechanics now look robust and race-free.

The added stateMutex/started/stopped state, the Stop() logic that defers channel closing until <-c.DoneChan() (or closes immediately if never started), and the Done-aware selects in the handlers collectively fix the prior send-after-close risk and give callers consistent ErrProtocolShuttingDown semantics on shutdown. This is aligned with the patterns used in the other protocol clients and looks solid.

Also applies to: 88-102, 105-142, 309-377
protocol/peersharing/client_test.go (1)
30-93: PeerSharing shutdown test harness looks correct and leak-safe.

runTest's use of the mock connection, async error channels, goleak.VerifyNone, and the explicit Close + timeout waits gives good coverage of the PeerSharing client Start/Stop lifecycle. TestClientShutdown exercises exactly the public surface you care about, and the structure is consistent with the other protocol tests.

Also applies to: 95-114
protocol/chainsync/client.go (1)
37-38: readyForNextBlockChan shutdown is now correctly synchronized with protocol Done.

The combination of:

- the started flag and Stop()'s go func(){ <-c.DoneChan(); close(c.readyForNextBlockChan) },
- the ready, ok := <-c.readyForNextBlockChan handling in GetAvailableBlockRange and syncLoop, and
- the DoneChan-aware selects before every send in handleRollForward/handleRollBackward

eliminates the earlier send-on-closed-channel race and gives clear, predictable shutdown/cancellation semantics (ErrProtocolShuttingDown vs ErrSyncCancelled). This is a solid fix.

Also applies to: 119-169, 330-359, 458-487, 742-760, 782-800
protocol/localstatequery/client.go (1)
29-47: Lifecycle and acquire/query synchronization improvements look correct and consistent.

The added acquiredMutex, onceStop, stateMutex, and started flag, together with:

- Start() marking started under stateMutex,
- Stop() sending MsgDone, invoking c.Protocol.Stop(), and deferring channel closes until <-c.DoneChan() when started, and
- the DoneChan-aware selects in handleAcquired, handleFailure, handleResult, plus the closed-channel checks in acquire()/runQuery()

give the LocalStateQuery client a clean, race-free shutdown story and predictable ErrProtocolShuttingDown behavior. This now matches the patterns used in the other protocol clients and addresses the earlier lifecycle races.

Also applies to: 102-115, 117-149, 905-968, 970-1006, 1008-1041
protocol/txsubmission/server.go (1)
17-41: Stop/restart lifecycle and shutdown signaling are now well-synchronized.

The txsubmission server's new design looks solid:

- ackCount is now int32 and fully managed via atomic.LoadInt32/StoreInt32, including a reset on restart.
- done is guarded by doneMutex in both Stop() and handleDone(), with an atomic check-and-close, so there's no longer a risk of double-closing or racing with readers.
- Stop() is idempotent via onceStop, marks a permanent stopped flag under restartMutex, closes done, and calls Protocol.Stop().
- RequestTxIds/RequestTxs use a select on their result channels vs doneChan(), correctly handling closed channels and propagating ErrProtocolShuttingDown.
- handleDone() non-blockingly wakes any in-flight RequestTxIds, runs the user callback, then performs a serialized restart under restartMutex (including reinitializing channels, done, and ackCount), with a TOCTOU IsStopped() check to avoid restarting after a permanent Stop.

This addresses the earlier concurrency issues around ackCount, done, and restart, and the behavior now looks race-free and predictable.

Also applies to: 49-57, 93-130, 132-193, 195-225, 276-338
eff3a32 to 5ddaf9a
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
protocol/localtxsubmission/client.go (1)
92-125: Double-close risk on submitResultChan between Stop() and handleDone()

Stop() (when started is true) schedules go func() { <-c.DoneChan(); close(c.submitResultChan) }(), while handleDone() also calls close(c.submitResultChan) directly. If a server-initiated Done message arrives during or after a client-initiated Stop, both paths can execute, and the second close will panic.

Writers are already DoneChan-aware, so the fix is to make channel closure idempotent rather than done in multiple unguarded places.
A minimal, contained fix is to guard the close with
sync.Once:type Client struct { *protocol.Protocol config *Config callbackContext CallbackContext busyMutex sync.Mutex submitResultChan chan error onceStart sync.Once onceStop sync.Once stateMutex sync.Mutex started bool + closeSubmitResultOnce sync.Once } @@ func (c *Client) Stop() error { @@ - // Defer closing channel until protocol fully shuts down (only if started) - if c.started { - go func() { - <-c.DoneChan() - close(c.submitResultChan) - }() - } else { - // If protocol was never started, close channel immediately - close(c.submitResultChan) - } + // Defer closing channel until protocol fully shuts down (only if started) + if c.started { + go func() { + <-c.DoneChan() + c.closeSubmitResultChan() + }() + } else { + // If protocol was never started, close channel immediately + c.closeSubmitResultChan() + } @@ func (c *Client) handleDone() error { @@ - // Server is shutting down, close the result channel to unblock any waiting operations - close(c.submitResultChan) + // Server is shutting down, close the result channel to unblock any waiting operations + c.closeSubmitResultChan() return nil } + +func (c *Client) closeSubmitResultChan() { + c.closeSubmitResultOnce.Do(func() { + close(c.submitResultChan) + }) +}This keeps existing shutdown semantics but removes the panic risk.
🧹 Nitpick comments (5)
protocol/peersharing/client_test.go (1)
74-82: Consider using consistent timeout values across protocol tests.

This test uses a 5-second timeout for mock connection shutdown, while protocol/localstatequery/client_test.go (line 112) uses a 2-second timeout for the same operation. Consider aligning these values for consistency.

protocol/chainsync/chainsync.go (1)
226-227: Document the rationale for the 50% increase in default buffer sizes.

The defaults have been increased from 50 to 75 (a 50% increase), which could have memory implications for deployments with many concurrent connections. Consider:
- Adding a comment explaining the performance benefit and why 75 was chosen specifically
- Providing guidance on when users might want to adjust these values
Were these values validated through performance testing? If so, it would be valuable to document the findings (e.g., "testing showed X% throughput improvement with Y% memory increase").
protocol/protocol.go (1)
176-189: Add documentation comment explaining why Protocol.Stop() returns error despite always returning nil.Verification found no interface requiring
Stop() errorsignature, invalidating the interface compatibility hypothesis. However, call site comments (e.g., "Error ignored - method returns nil by design" intxsubmission/server.go:113) confirm this is an intentional design choice. To prevent API confusion, add a comment to theProtocol.Stop()method explaining whether this is a placeholder for future error conditions or part of a consistent protocol API pattern. The method's current comment lacks this context.protocol/keepalive/client.go (1)
33-35: Consider propagatingProtocol.Stop()errors alongsideSendMessageerrors
Stop()is now the primary shutdown API and other protocols propagateProtocol.Stop()errors, but here they’re discarded in favor of only theSendMessageerror. This can hide teardown failures in the underlying protocol while callers seenil.You can still keep idempotent behavior and prefer the
SendMessageerror by combining both results, e.g.:func (c *Client) Stop() error { c.onceStop.Do(func() { @@ - msg := NewMsgDone() - c.stopErr = c.SendMessage(msg) - // Ensure protocol shuts down completely - _ = c.Protocol.Stop() // Error ignored - method returns SendMessage error + msg := NewMsgDone() + sendErr := c.SendMessage(msg) + stopErr := c.Protocol.Stop() + + // Prefer the send error if present, otherwise return the stop error + if sendErr != nil { + c.stopErr = sendErr + } else { + c.stopErr = stopErr + } }) return c.stopErr }This preserves the current contract while exposing protocol-level shutdown problems to callers.
Also applies to: 98-120, 122-142
protocol/peersharing/client.go (1)
75-123: Start/Stop sequencing is safe; consider surfacing protocol Stop errors

The Start/Stop implementation correctly:

- Serializes lifecycle with sync.Once and stateMutex
- Prevents Start after Stop
- Defers sharePeersChan closure until DoneChan() when started, avoiding send-after-close.

Stop() currently ignores any error from c.Protocol.Stop(). If protocol shutdown can fail in meaningful ways, consider returning that error when SendMessage isn't involved, to keep behavior consistent with other clients and improve observability.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (32)
protocol/blockfetch/blockfetch.go(1 hunks)protocol/blockfetch/client.go(7 hunks)protocol/blockfetch/client_test.go(1 hunks)protocol/blockfetch/server.go(1 hunks)protocol/chainsync/chainsync.go(1 hunks)protocol/chainsync/client.go(6 hunks)protocol/chainsync/client_concurrency_test.go(1 hunks)protocol/chainsync/client_test.go(2 hunks)protocol/chainsync/error.go(1 hunks)protocol/chainsync/server.go(1 hunks)protocol/keepalive/client.go(2 hunks)protocol/keepalive/client_test.go(1 hunks)protocol/leiosfetch/client.go(5 hunks)protocol/leiosfetch/server.go(1 hunks)protocol/leiosnotify/client.go(4 hunks)protocol/leiosnotify/client_test.go(1 hunks)protocol/leiosnotify/server.go(1 hunks)protocol/localstatequery/client.go(6 hunks)protocol/localstatequery/client_test.go(2 hunks)protocol/localtxmonitor/client.go(15 hunks)protocol/localtxmonitor/client_test.go(1 hunks)protocol/localtxsubmission/client.go(6 hunks)protocol/localtxsubmission/client_test.go(1 hunks)protocol/localtxsubmission/localtxsubmission.go(2 hunks)protocol/peersharing/client.go(4 hunks)protocol/peersharing/client_test.go(1 hunks)protocol/peersharing/server.go(5 hunks)protocol/peersharing/server_test.go(1 hunks)protocol/protocol.go(4 hunks)protocol/txsubmission/server.go(8 hunks)protocol/txsubmission/server_concurrency_test.go(1 hunks)protocol/txsubmission/server_test.go(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (10)
- protocol/keepalive/client_test.go
- protocol/blockfetch/client_test.go
- protocol/chainsync/client_test.go
- protocol/blockfetch/blockfetch.go
- protocol/txsubmission/server_test.go
- protocol/localtxmonitor/client_test.go
- protocol/localtxsubmission/client_test.go
- protocol/leiosfetch/server.go
- protocol/txsubmission/server_concurrency_test.go
- protocol/peersharing/server_test.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-04T15:54:22.683Z
Learnt from: wolf31o2
Repo: blinklabs-io/gouroboros PR: 1093
File: ledger/mary/pparams.go:143-150
Timestamp: 2025-11-04T15:54:22.683Z
Learning: In the blinklabs-io/gouroboros repository, the design goal for CBOR round-trip tests is to achieve byte-identical encoding WITHOUT using stored CBOR (cbor.DecodeStoreCbor). Instead, the approach uses proper field types (pointers for optional fields) and relies on the cbor package's deterministic encoding (SortCoreDeterministic) to ensure reproducible output. The stored CBOR pattern should not be suggested as a solution for round-trip fidelity in this codebase.
Applied to files:
protocol/localstatequery/client_test.go
🧬 Code graph analysis (16)
protocol/keepalive/client.go (4)
protocol/protocol.go (1)
Protocol(39-60)protocol/keepalive/keepalive.go (1)
ProtocolName(27-27)connection/id.go (1)
ConnectionId(22-25)protocol/keepalive/messages.go (1)
NewMsgDone(94-101)
protocol/peersharing/client.go (5)
protocol/localtxsubmission/client.go (1)
Client(26-36)protocol/protocol.go (1)
Protocol(39-60)protocol/peersharing/peersharing.go (1)
ProtocolName(27-27)connection/id.go (1)
ConnectionId(22-25)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)
protocol/leiosfetch/client.go (4)
protocol/protocol.go (1)
Protocol(39-60)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)protocol/blockfetch/client.go (1)
Client(30-41)protocol/chainsync/client.go (1)
Client(29-46)
protocol/leiosnotify/client_test.go (7)
protocol/versiondata.go (6)
VersionData(40-46)VersionDataNtN13andUp(143-145)VersionDataNtN11to12(116-122)DiffusionModeInitiatorOnly(21-21)PeerSharingModeNoPeerSharing(27-27)QueryModeDisabled(36-36)protocol/handshake/messages.go (1)
NewMsgAcceptVersion(88-102)connection.go (2)
Connection(59-103)NewConnection(107-130)connection_options.go (3)
WithConnection(36-40)WithNetworkMagic(50-54)WithNodeToNode(78-82)protocol/blockfetch/client_test.go (1)
TestClientShutdown(211-230)protocol/blockfetch/client.go (1)
Client(30-41)protocol/leiosnotify/client.go (1)
Client(24-33)
protocol/localtxsubmission/client.go (7)
protocol/protocol.go (1)
Protocol(39-60)protocol/leiosnotify/client.go (1)
Client(24-33)protocol/localstatequery/client.go (1)
Client(30-47)protocol/localtxmonitor/client.go (1)
Client(25-42)protocol/peersharing/client.go (1)
Client(25-35)protocol/localtxsubmission/messages.go (1)
MessageTypeDone(29-29)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)
protocol/blockfetch/client.go (2)
protocol/protocol.go (2)
Protocol(39-60)New(122-133)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)
protocol/chainsync/client.go (4)
protocol/protocol.go (1)
Protocol(39-60)protocol/chainsync/chainsync.go (1)
ProtocolName(30-30)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)protocol/chainsync/error.go (1)
ErrSyncCancelled(26-26)
protocol/localstatequery/client.go (5)
protocol/protocol.go (1)
Protocol(39-60)protocol/localtxmonitor/client.go (1)
Client(25-42)protocol/localstatequery/localstatequery.go (1)
AcquireTarget(131-133)protocol/localstatequery/messages.go (4)
NewMsgDone(245-252)AcquireFailurePointNotOnChain(44-44)MsgResult(172-175)NewMsgQuery(160-170)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)
protocol/localtxmonitor/client.go (4)
protocol/protocol.go (1)
Protocol(39-60)protocol/localstatequery/client.go (1)
Client(30-47)protocol/localtxmonitor/messages.go (1)
NewMsgDone(79-86)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)
protocol/chainsync/client_concurrency_test.go (3)
protocol/chainsync/chainsync.go (3)
New(259-267)NewConfig(273-295)ChainSync(204-207)connection_options.go (1)
WithChainSyncConfig(131-135)protocol/chainsync/client.go (1)
Client(29-46)
protocol/leiosnotify/client.go (5)
protocol/protocol.go (1)
Protocol(39-60)protocol/localstatequery/client.go (1)
Client(30-47)protocol/localtxsubmission/client.go (1)
Client(26-36)protocol/peersharing/client.go (1)
Client(25-35)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)
protocol/peersharing/client_test.go (5)
connection.go (2)
Connection(59-103)NewConnection(107-130)protocol/protocol.go (2)
ProtocolRoleClient(94-94)New(122-133)connection_options.go (3)
WithConnection(36-40)WithNetworkMagic(50-54)WithNodeToNode(78-82)protocol/peersharing/peersharing.go (1)
PeerSharing(67-70)protocol/peersharing/client.go (1)
Client(25-35)
protocol/localtxsubmission/localtxsubmission.go (1)
protocol/localtxsubmission/messages.go (1)
MessageTypeDone(29-29)
protocol/txsubmission/server.go (5)
protocol/txsubmission/messages.go (2)
TxBody(197-201)NewMsgRequestTxIds(68-82)protocol/protocol.go (1)
Protocol(39-60)protocol/txsubmission/txsubmission.go (1)
MaxAckCount(143-143)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)protocol/txsubmission/error.go (1)
ErrStopServerProcess(21-21)
protocol/localstatequery/client_test.go (3)
protocol/blockfetch/client_test.go (1)
TestClientShutdown(211-230)connection.go (1)
Connection(59-103)protocol/localstatequery/client.go (1)
Client(30-47)
protocol/peersharing/server.go (2)
protocol/txsubmission/server.go (1)
Server(28-41)protocol/protocol.go (1)
Protocol(39-60)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Analyze (go)
🔇 Additional comments (47)
protocol/localtxsubmission/localtxsubmission.go (2)
46-49: LGTM: Client-initiated Done from Idle state.This transition allows the client to gracefully terminate the protocol session when idle, which is semantically correct and expected behavior.
63-66: Issue resolved—client properly handlesMessageTypeDone.Verification confirms the client's
messageHandlernow includes a case forMessageTypeDone(line 156) that callshandleDone()(line 157). The handler closes thesubmitResultChanto unblock waiting operations, properly addressing the protocol violation flagged in the previous review. The server can now sendDoneduring theBusystate without leaving the transaction status undefined.protocol/chainsync/error.go (1)
24-26: LGTM!The new error variable follows the existing pattern and provides a clear, explicit signal for sync cancellation.
protocol/peersharing/client_test.go (1)
95-114: LGTM!The shutdown test properly exercises the Start/Stop lifecycle and follows the established pattern used in other protocol tests.
protocol/chainsync/server.go (1)
245-247: LGTM!The error handling for Stop() prevents the protocol from attempting reinitialization if shutdown fails, which is the correct behavior.
protocol/blockfetch/server.go (1)
179-181: LGTM!Consistent error handling pattern that prevents protocol reinitialization if Stop() fails.
protocol/leiosnotify/server.go (1)
118-120: LGTM!Consistent with the error handling pattern applied across other protocol servers.
protocol/localstatequery/client_test.go (2)
25-25: LGTM!The import alias is needed to support the new shutdown test.
357-376: LGTM!The shutdown test follows the established pattern and properly exercises the Start/Stop lifecycle of the LocalStateQuery client.
protocol/protocol.go (2)
247-248: LGTM!Intentionally ignoring the Stop() error in an error path is appropriate, as the protocol is already handling a queue overflow violation.
453-454: LGTM!Intentionally ignoring the Stop() error in an error path is appropriate, as the protocol is already handling a queue overflow violation.
protocol/leiosnotify/client_test.go (1)
30-115: LeiosNotify client shutdown test helper and e2e Start/Stop look solidThe
runTesthelper, handshake scaffolding, andTestClientShutdownfollow the same pattern as other mini‑protocol tests (mock connection, ouroboros.New, goleak.VerifyNone, bounded timeouts). This gives good coverage that LeiosNotify’s Start/Stop path tears down cleanly without goroutine leaks.Once the Stop‑before‑Start semantics for the client are finalized, you might consider a small additional test mirroring
chainsync.TestStopBeforeStart, but the current test is a good baseline.Also applies to: 117-136
protocol/chainsync/client_concurrency_test.go (1)
29-103: Good concurrency coverage for ChainSync client Start/Stop semantics
TestConcurrentStartStopandTestStopBeforeStarteffectively stress the ChainSync client’s lifecycle: many interleaved Start/Stop calls plus the Stop‑before‑Start scenario, all under goleak and timeouts. This should catch deadlocks and shutdown regressions around the new lifecycle logic.No issues spotted in the test structure.
Also applies to: 105-148
protocol/peersharing/client.go (2)
24-35: Lifecycle state fields are well-scoped and guarded
onceStart/onceStopplusstateMutex,started, andstoppedgive clear, race‑free lifecycle control and correctly prevent starting after a Stop. This matches patterns in other clients and looks solid.
159-173: DoneChan‑aware send tosharePeersChanavoids send‑on‑closed panicsWrapping the send in:
select { case <-c.DoneChan(): return protocol.ErrProtocolShuttingDown case c.sharePeersChan <- msgSharePeers.PeerAddresses: }is exactly what’s needed with the new deferred close in
Stop(). This ensures handlers bail out cleanly once shutdown begins instead of risking a panic on a closed channel.protocol/blockfetch/client.go (4)
29-41: Atomicstartedflag correctly hardens lifecycle against racesSwitching
startedtoatomic.Booland using it only to decide shutdown behavior is appropriate here. WithonceStart/onceStopthis removes the Start/Stop race on the flag itself without complicating the API.
89-132: Stop() shutdown flow is correctly ordered and channel‑safe
Stop()now:
- Logs, sends
MsgClientDone, and always callsc.Protocol.Stop()to drive muxer shutdown- Defers closing
blockChan/startBatchResultChanuntil<-c.DoneChan()when started, otherwise closes immediately.This sequencing avoids deadlocks and send‑after‑close panics while still unblocking waiters with
ErrProtocolShuttingDown.
228-261: Start batch / no‑blocks handlers are now shutdown‑awareUsing:
select { case <-c.DoneChan(): return protocol.ErrProtocolShuttingDown case c.startBatchResultChan <- <nil or err>: }ensures outstanding
GetBlock/GetBlockRangecalls fail cleanly when shutting down instead of panicking on a closed result channel.
263-315: Block handler’s DoneChan checks correctly guard the data pathThe two‑stage DoneChan check around decoding and the final select on
c.blockChanprotect both work and sends from racing with shutdown. This aligns nicely with the deferred channel closure inStop()and should prevent the in‑flight response races you previously hit.protocol/localtxsubmission/client.go (2)
25-36: Lifecycle mutex +startedflag are fine, but channel closure needs consolidationThe addition of
stateMutexandstartedto guard Start/Stop state is reasonable and race‑free given you only read/writestartedunder that mutex. The main concern is howsubmitResultChanis now closed (see below).
149-207: DoneChan‑aware result signaling looks good once closure is deduplicatedThe updated
handleAcceptTxandhandleRejectTxuse:select { case <-c.DoneChan(): return protocol.ErrProtocolShuttingDown case c.submitResultChan <- <err or nil>: }which is exactly what you want with deferred channel closing: writers bail out cleanly on shutdown instead of racing a close. Once the closure is routed through a single
sync.Oncehelper, this path should be robust.protocol/leiosfetch/client.go (3)
26-38: Atomic lifecycle flags give clear, race‑free Start/Stop semanticsIntroducing
startedandstoppedasatomic.Booland using them to prevent Start after Stop is appropriate here. This makes lifecycle intent explicit and safe under concurrent Start/Stop calls.
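As a rough, self-contained illustration of that flag pattern (a toy lifecycle type, not the leiosfetch client itself):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Sketch of the started/stopped flags described above, using a toy type
// rather than the real client struct.
type lifecycle struct {
	onceStart sync.Once
	onceStop  sync.Once
	started   atomic.Bool
	stopped   atomic.Bool
}

func (l *lifecycle) Start() {
	l.onceStart.Do(func() {
		if l.stopped.Load() {
			return // refuse to start after Stop()
		}
		l.started.Store(true)
	})
}

func (l *lifecycle) Stop() {
	l.onceStop.Do(func() {
		l.stopped.Store(true)
		l.started.Store(false)
	})
}

func main() {
	var l lifecycle
	l.Stop()
	l.Start() // no-op: Stop() already ran
	fmt.Println("started:", l.started.Load(), "stopped:", l.stopped.Load())
}
```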
92-145: Stop() shutdown path is well‑sequenced with deferred channel closure
Stop()now:
- Logs, sends
MsgDone, preserves anySendMessageerror, and always attemptsc.Protocol.Stop()- Closes all result channels only after
DoneChan()whenstartedis true, otherwise immediately- Marks
started=falseandstopped=trueatomically.This is consistent with other protocols in the PR and should avoid send‑after‑close panics while keeping the API simple.
242-285: DoneChan‑aware sends on result channels match the new Stop semanticsAll handlers (
handleBlock,handleBlockTxs,handleVotes,handleNextBlockAndTxsInRange,handleLastBlockAndTxsInRange) now gate sends with:select { case <-c.DoneChan(): return protocol.ErrProtocolShuttingDown case <chan> <- msg: }Combined with deferred channel closure in
Stop(), this gives a clean shutdown story with no send‑after‑close risk.protocol/chainsync/client.go (4)
28-38: Started flag is a simple, effective lifecycle indicatorAdding
started atomic.Booland setting it inStart()givesStop()a reliable way to distinguish “never started” from “active” when deciding how to closereadyForNextBlockChan. This is a lightweight, race‑free solution.
132-169: Stop() now shuts down the protocol and channel safely
Stop():
- Holds
busyMutexwhile sendingMsgDone, preserving existing mutual exclusion- Calls
c.Protocol.Stop()and logs (but doesn’t bubble) any error- Defers closing
readyForNextBlockChanuntil<-c.DoneChan()whenstartedis true, otherwise closes immediately.This removes the previous write‑after‑close race on
readyForNextBlockChanwhile still unblocking consumers cleanly.
330-364: GetAvailableBlockRange now handles shutdown and cancellation explicitlyHandling:
case <-c.DoneChan(): return start, end, protocol.ErrProtocolShuttingDown case ready, ok := <-c.readyForNextBlockChan: if !ok { return start, end, protocol.ErrProtocolShuttingDown } if !ready { return start, end, ErrSyncCancelled } // else send another RequestNextmeans callers won’t hang if the sync is cancelled or the protocol shuts down while they’re waiting on the range; they get a clear, typed error instead.
742-800: DoneChan‑aware signaling onreadyForNextBlockChanfixes the prior raceBoth
handleRollForwardandhandleRollBackwardnow signal readiness/cancellation via:select { case <-c.DoneChan(): return protocol.ErrProtocolShuttingDown case c.readyForNextBlockChan <- <true/false>: }With
readyForNextBlockChanonly closed afterDoneChan()inStop(), this removes the send‑after‑close panic previously called out while still givingsyncLoopand range callers precise readiness/cancellation signals.protocol/localtxmonitor/client.go (9)
30-42: LGTM: Clean lifecycle and acquired-state synchronization.The addition of
acquiredMutexandstateMutexprovides proper synchronization for theacquiredstate and lifecycle flags (started,stopped), respectively. This separation of concerns is clean and consistent with other protocol clients.
89-103: LGTM: Start() properly synchronizes lifecycle state.The
stateMutexcorrectly guards the write tostarted, ensuring no data race with readers inStop()and the channel cleanup logic.
106-148: LGTM: Stop() correctly implements shutdown with deferred cleanup.The implementation properly:
- Sets
stoppedunderstateMutexfor synchronized access- Calls
Protocol.Stop()after releasingbusyMutexto avoid potential deadlocks- Defers channel closure until
DoneChan()closes, preventing handler goroutines from writing to closed channelsThis addresses the critical issues raised in past reviews.
151-178: LGTM: Shutdown-aware public methods.
Acquire()andRelease()correctly check thestoppedflag understateMutexand returnprotocol.ErrProtocolShuttingDownwhen shutting down, preventing operations after shutdown has begun.
193-227: LGTM: HasTx() properly guards state access.The function checks
stoppedunderstateMutexand readsacquiredunderacquiredMutex, ensuring synchronized access to both lifecycle and acquired state.
230-264: LGTM: NextTx() properly guards state access.Synchronized checks for
stoppedandacquiredflags prevent races and ensure clean shutdown behavior.
267-301: LGTM: GetSizes() properly guards state access.Synchronized checks for
stoppedandacquiredflags are consistent with other public methods.
324-394: LGTM: Message handlers properly guard channel sends.All handlers use non-blocking
selectwithDoneChan()to avoid writing to result channels during shutdown. This pattern prevents the race condition where handlers write to channels afterStop()closes them.
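A standalone sketch of that handler-side guard; the deliver helper, channel names, and local error sentinel are invented for this example (the real code uses protocol.ErrProtocolShuttingDown and the client's result channels):

```go
package main

import (
	"errors"
	"fmt"
)

var errShuttingDown = errors.New("protocol is shutting down")

// deliver blocks until either the result is handed off or the protocol's
// done channel closes, so a handler never races a channel that Stop() is
// about to close.
func deliver(done <-chan struct{}, results chan<- string, value string) error {
	select {
	case <-done:
		return errShuttingDown
	case results <- value:
		return nil
	}
}

func main() {
	done := make(chan struct{})
	close(done) // simulate shutdown already in progress
	err := deliver(done, make(chan string), "tx-status")
	fmt.Println(err)
}
```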
409-418: LGTM: release() properly synchronizes acquired state.The
acquiredMutexcorrectly guards the write toacquired, ensuring no race with readers inHasTx(),NextTx(), andGetSizes().protocol/localstatequery/client.go (5)
38-47: LGTM: Lifecycle synchronization fields added.The addition of
acquiredMutex,stateMutex, andstartedflag provides proper synchronization for state access, consistent with other protocol clients.
102-115: LGTM: Start() properly synchronizes lifecycle state.The
stateMutexcorrectly guards the write tostarted, resolving the data race identified in past reviews whereStart()andStop()accessedstartedwithout synchronization.
117-149: LGTM: Stop() correctly implements shutdown with Protocol.Stop().The implementation properly:
- Calls
Protocol.Stop()to drive protocol shutdown (addressing past review feedback)- Reads
startedunderstateMutexfor race-free access- Defers channel closure until
DoneChan()closes if the client was startedThis resolves the issues raised in past reviews.
905-968: LGTM: Message handlers properly guard channel sends.All handlers (
handleAcquired,handleFailure,handleResult) use non-blockingselectwithDoneChan()to avoid writing to result channels during shutdown, preventing panics on closed channels.
970-1041: LGTM: Helper functions properly synchronize acquired state.All helpers (
acquire,release,runQuery) correctly useacquiredMutexto guard reads and writes to theacquiredflag, ensuring race-free access.protocol/txsubmission/server.go (5)
33-64: LGTM: Proper initialization for restart-capable server.The changes provide solid foundations for safe restart:
ackCountasint32enables atomic operations- Buffered result channels prevent blocking on non-blocking sends
done,doneMutex,restartMutex, andstoppedsupport coordinated shutdown and restart
93-130: LGTM: Stop() properly coordinates with restart logic.

The implementation correctly:
- Uses restartMutex to coordinate with the handleDone() restart path
- Guards the select-and-close of done with doneMutex to prevent double-close panics
- Provides a doneChan() accessor for safe concurrent reads
- Offers IsStopped() to check permanent stop state

This addresses all critical race conditions flagged in past reviews.
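For reference, a small self-contained sketch of the guarded select-and-close idea (toy server type; the real implementation additionally coordinates with restartMutex and onceStop):

```go
package main

import (
	"fmt"
	"sync"
)

// Both Stop() and the restart path can call signalDone without racing into a
// double close, because the check and the close happen under the same lock.
type server struct {
	doneMutex sync.Mutex
	done      chan struct{}
}

func (s *server) signalDone() {
	s.doneMutex.Lock()
	defer s.doneMutex.Unlock()
	select {
	case <-s.done:
		// already closed by the other path
	default:
		close(s.done)
	}
}

func main() {
	s := &server{done: make(chan struct{})}
	s.signalDone()
	s.signalDone() // safe: the second call observes the already-closed channel
	fmt.Println("done closed without panic")
}
```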
132-193: LGTM: RequestTxIds() safely uses atomic operations for ackCount.

The implementation properly:
- Uses atomic.LoadInt32 and atomic.StoreInt32 for race-free access to ackCount
- Validates before type conversion to uint16
- Uses select with doneChan() for shutdown-aware blocking

This resolves the data race on ackCount flagged in past reviews.
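A standalone sketch of the same load/validate/convert sequence on a plain int32 counter, purely illustrative:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

func main() {
	var ackCount int32

	atomic.StoreInt32(&ackCount, 42) // e.g. set from a reply handler

	count := atomic.LoadInt32(&ackCount)
	if count < 0 || count > 65535 {
		fmt.Println("ack count out of range for uint16:", count)
		return
	}
	ack := uint16(count) // safe: validated before narrowing
	fmt.Println("acknowledging", ack, "transactions")
}
```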
195-225: LGTM: RequestTxs() properly handles shutdown.The
selectwithdoneChan()and channel-closed checks provide clean shutdown semantics, consistent withRequestTxIds().
276-338: LGTM: handleDone() restart logic is properly synchronized.

The restart implementation correctly:
- Uses restartMutex to coordinate with Stop()
- Guards the select-and-close of done with doneMutex (matching Stop()'s pattern)
- Does not reset onceStop, preserving idempotent stop semantics
- Atomically resets ackCount
- Includes TOCTOU protection via the IsStopped() check
- Starts the new protocol outside the lock for better responsiveness

This resolves all critical race conditions identified in past reviews.
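A rough, self-contained sketch of that restart shape with toy types; the real server also rebuilds its result channels and starts a fresh protocol instance, which is omitted here:

```go
package main

import (
	"fmt"
	"sync"
)

// Per-session state is rebuilt under the restart lock, and the permanent-stop
// flag is re-checked so a concurrent Stop() wins the race.
type server struct {
	restartMutex sync.Mutex
	stopped      bool
	done         chan struct{}
}

func (s *server) IsStopped() bool {
	s.restartMutex.Lock()
	defer s.restartMutex.Unlock()
	return s.stopped
}

func (s *server) restart() {
	s.restartMutex.Lock()
	if s.stopped {
		s.restartMutex.Unlock()
		return // permanent stop requested; do not spin up a new session
	}
	s.done = make(chan struct{}) // fresh per-session shutdown signal
	s.restartMutex.Unlock()

	// start the new protocol instance outside the lock
	fmt.Println("restarted")
}

func main() {
	s := &server{done: make(chan struct{})}
	s.restart()
	fmt.Println("stopped:", s.IsStopped())
}
```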
5ddaf9a to eb18d3d (Compare)
@coderabbitai review
✅ Actions performed: Review triggered.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
protocol/txsubmission/server.go (1)
27-41: Data race confirmed: concurrent channel field accesses without synchronization

Analysis confirms the critical issue. The code exhibits a clear data race:
- Unsynchronized readers: RequestTxIds() (line ~175) and RequestTxs() (line 220) read requestTxIdsResultChan and requestTxsResultChan without holding restartMutex
- Unsynchronized senders: handleReplyTxIds() (line 257), handleReplyTxs() (line 272), and handleDone() (line 286) write to these channel fields without holding restartMutex
- Synchronized reassigner: handleDone() (lines 319-320) reassigns both channel fields while holding restartMutex

This violates Go's memory model: concurrent accesses to the same struct fields without synchronization constitute a data race. The pattern can cause panics (nil channel, closed channel) or undefined behavior.

The proposed fix is sound: guard all accesses with restartMutex and snapshot the channel before any blocking operation, ensuring no concurrent field read/write during reassignment.
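A minimal illustration of the snapshot-under-lock idea, using toy field names rather than the txsubmission server's actual ones:

```go
package main

import (
	"fmt"
	"sync"
)

// Take a local copy of the channel field while holding the lock, then block
// on the copy, so a concurrent restart can reassign the field without racing
// the reader.
type server struct {
	mu         sync.Mutex
	resultChan chan int
}

func (s *server) wait(done <-chan struct{}) (int, bool) {
	s.mu.Lock()
	ch := s.resultChan // snapshot before any blocking operation
	s.mu.Unlock()

	select {
	case <-done:
		return 0, false
	case v, ok := <-ch:
		return v, ok
	}
}

func main() {
	s := &server{resultChan: make(chan int, 1)}
	s.resultChan <- 7
	v, ok := s.wait(make(chan struct{}))
	fmt.Println(v, ok)
}
```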
🧹 Nitpick comments (2)
protocol/localtxsubmission/localtxsubmission.go (1)
63-66: Test coverage for Done during active transaction is missing; add test case.The code implementation is sound—the
Donetransition fromBusystate is properly defined, and the client'shandleDone()handler (client.go:210-220) correctly closes the result channel when the server initiates shutdown.However, current tests do not verify this scenario:
TestClientShutdownonly tests client lifecycle without an active transactionTestSubmitTxAcceptandTestSubmitTxRjecttest normal response paths withAcceptTxandRejectTx, notDone- No test mock sends
MessageTypeDonewhile the client is inBusystateRecommended action: Add a test case (e.g.,
TestSubmitTxServerShutdown) that sendsSubmitTxand then receivesDonefrom the server to verify the client correctly handles forced shutdown during transaction processing.protocol/localstatequery/client.go (1)
905-923: acquiredMutex correctly protectsacquiredacross acquire/release/queryAll reads/writes of
c.acquirednow go throughacquiredMutex(handleAcquired,acquire,release,runQuery), eliminating the prior race on this flag while preserving the expected behavior (reacquire vs acquire, lazy acquire inrunQuery). You might optionally convert the lock/unlock pairs todeferstyle for readability, but functionally this looks correct.Also applies to: 970-1006, 1008-1018, 1020-1041
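For example, a defer-style variant of such an accessor pair, sketched against a toy struct rather than the real client:

```go
package main

import (
	"fmt"
	"sync"
)

type client struct {
	acquiredMutex sync.Mutex
	acquired      bool
}

func (c *client) isAcquired() bool {
	c.acquiredMutex.Lock()
	defer c.acquiredMutex.Unlock() // unlocks on every return path
	return c.acquired
}

func (c *client) setAcquired(v bool) {
	c.acquiredMutex.Lock()
	defer c.acquiredMutex.Unlock()
	c.acquired = v
}

func main() {
	var c client
	c.setAcquired(true)
	fmt.Println(c.isAcquired())
}
```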
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (33)
protocol/blockfetch/blockfetch.go (1 hunks), protocol/blockfetch/client.go (7 hunks), protocol/blockfetch/client_test.go (1 hunks), protocol/blockfetch/server.go (1 hunks), protocol/chainsync/chainsync.go (1 hunks), protocol/chainsync/client.go (6 hunks), protocol/chainsync/client_concurrency_test.go (1 hunks), protocol/chainsync/client_test.go (2 hunks), protocol/chainsync/error.go (1 hunks), protocol/chainsync/server.go (1 hunks), protocol/keepalive/client.go (2 hunks), protocol/keepalive/client_test.go (1 hunks), protocol/leiosfetch/client.go (5 hunks), protocol/leiosfetch/server.go (1 hunks), protocol/leiosnotify/client.go (3 hunks), protocol/leiosnotify/client_concurrency_test.go (1 hunks), protocol/leiosnotify/client_test.go (1 hunks), protocol/leiosnotify/server.go (1 hunks), protocol/localstatequery/client.go (6 hunks), protocol/localstatequery/client_test.go (3 hunks), protocol/localtxmonitor/client.go (15 hunks), protocol/localtxmonitor/client_test.go (2 hunks), protocol/localtxsubmission/client.go (6 hunks), protocol/localtxsubmission/client_test.go (2 hunks), protocol/localtxsubmission/localtxsubmission.go (2 hunks), protocol/peersharing/client.go (4 hunks), protocol/peersharing/client_test.go (1 hunks), protocol/peersharing/server.go (4 hunks), protocol/peersharing/server_test.go (1 hunks), protocol/protocol.go (4 hunks), protocol/txsubmission/server.go (8 hunks), protocol/txsubmission/server_concurrency_test.go (1 hunks), protocol/txsubmission/server_test.go (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- protocol/leiosnotify/client_concurrency_test.go
🚧 Files skipped from review as they are similar to previous changes (12)
- protocol/chainsync/error.go
- protocol/leiosfetch/server.go
- protocol/txsubmission/server_test.go
- protocol/peersharing/server_test.go
- protocol/protocol.go
- protocol/txsubmission/server_concurrency_test.go
- protocol/peersharing/client_test.go
- protocol/leiosnotify/server.go
- protocol/blockfetch/client_test.go
- protocol/keepalive/client_test.go
- protocol/chainsync/chainsync.go
- protocol/leiosnotify/client_test.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-04T15:54:22.683Z
Learnt from: wolf31o2
Repo: blinklabs-io/gouroboros PR: 1093
File: ledger/mary/pparams.go:143-150
Timestamp: 2025-11-04T15:54:22.683Z
Learning: In the blinklabs-io/gouroboros repository, the design goal for CBOR round-trip tests is to achieve byte-identical encoding WITHOUT using stored CBOR (cbor.DecodeStoreCbor). Instead, the approach uses proper field types (pointers for optional fields) and relies on the cbor package's deterministic encoding (SortCoreDeterministic) to ensure reproducible output. The stored CBOR pattern should not be suggested as a solution for round-trip fidelity in this codebase.
Applied to files:
protocol/localstatequery/client_test.go
🔇 Additional comments (48)
protocol/localtxsubmission/localtxsubmission.go (1)
46-49: LGTM! Client-initiated Done transition enables graceful shutdown.Allowing the client to send
Donefrom theIdlestate is a clean way to support graceful, client-initiated shutdown when no transaction is pending. This aligns well with the PR's goal of improving lifecycle management across protocols.protocol/blockfetch/blockfetch.go (1)
122-122: LGTM! Consistent performance tuning across protocols.The 50% increase in DefaultRecvQueueSize (256→384) aligns with similar adjustments in other protocols (e.g., chainsync 50→75). This improves throughput for block-fetch operations.
protocol/blockfetch/server.go (1)
179-181: LGTM! Error-aware restart prevents continuing on Stop() failure.The handleClientDone method now properly checks Stop() for errors before reinitializing and restarting the protocol, preventing restart attempts when shutdown fails.
protocol/localtxmonitor/client_test.go (2)
93-93: LGTM! Timeout extension accommodates async shutdown.Extending the timeout from 2 to 5 seconds provides sufficient margin for the shutdown sequences introduced across protocol clients.
300-319: LGTM! Shutdown test validates lifecycle management.The new TestClientShutdown properly exercises the Start/Stop lifecycle, ensuring the LocalTxMonitor client can cleanly shut down without errors.
protocol/chainsync/server.go (1)
245-247: LGTM! Consistent error-aware restart pattern.The handleDone method now mirrors the error propagation pattern in blockfetch/server.go, ensuring Stop() errors are surfaced before attempting restart.
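The shape of that guard, reduced to a self-contained sketch; the stop and restart function fields are placeholders for s.Protocol.Stop() and the reinitialize-and-start step, not the server's real API:

```go
package main

import (
	"errors"
	"fmt"
)

// Bail out if the old protocol instance fails to stop, instead of
// reinitializing on top of it.
type server struct {
	stop    func() error // stands in for s.Protocol.Stop()
	restart func()       // stands in for reinitialize + Start()
}

func (s *server) handleDone() error {
	if err := s.stop(); err != nil {
		return err // do not restart after a failed shutdown
	}
	s.restart()
	return nil
}

func main() {
	s := &server{
		stop:    func() error { return errors.New("stop failed") },
		restart: func() { fmt.Println("restarted") },
	}
	fmt.Println(s.handleDone())
}
```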
protocol/chainsync/client_test.go (3)
83-88: LGTM! Proper cleanup with error logging.The test cleanup now stops the ChainSync client and logs any Stop() errors via t.Logf, surfacing shutdown issues without failing tests. This addresses the previously flagged concern.
95-95: LGTM! Consistent timeout adjustment.The 5-second timeout aligns with similar adjustments across other protocol tests to accommodate extended shutdown sequences.
285-304: LGTM! Comprehensive shutdown test.TestClientShutdown properly validates the Start/Stop lifecycle for the ChainSync client, ensuring clean shutdown behavior.
protocol/localtxsubmission/client_test.go (2)
86-86: LGTM! Consistent timeout adjustment.The extended timeout accommodates async shutdown flows across protocol tests.
167-186: LGTM! Standard shutdown test pattern.TestClientShutdown validates the LocalTxSubmission client lifecycle, consistent with shutdown tests across other protocol implementations.
protocol/localstatequery/client_test.go (3)
25-25: LGTM! Import alias improves clarity.Adding the
ouroborosalias for the main package import improves readability and is consistent with Go conventions for avoiding package name conflicts.
112-112: LGTM! Consistent timeout extension.The 5-second timeout aligns with other protocol test adjustments for async shutdown handling.
357-376: LGTM! Standard shutdown validation.TestClientShutdown properly exercises the LocalStateQuery client Start/Stop lifecycle.
protocol/keepalive/client.go (3)
33-34: LGTM! Lifecycle fields support idempotent shutdown.The onceStop and stopErr fields enable thread-safe, idempotent Stop() behavior, consistent with lifecycle patterns across other protocol clients.
98-126: LGTM! Well-structured Stop() with proper coordination.The Stop() implementation correctly:
- Uses onceStop for idempotency
- Stops and clears the timer under mutex protection before sending MsgDone
- Prioritizes send errors over stop errors
- Coordinates with the Start() goroutine's defensive cleanup on DoneChan
The mutex synchronization and timer lifecycle management prevent races between Stop() and timer callbacks.
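A small standalone sketch of the stop-timer-under-mutex ordering (toy client type; the real Stop() also sends MsgDone afterwards and prioritizes the send error):

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Cancel the timer under the mutex first so no further keep-alive fires,
// then perform the Done send.
type client struct {
	timerMutex sync.Mutex
	timer      *time.Timer
}

func (c *client) stopTimer() {
	c.timerMutex.Lock()
	defer c.timerMutex.Unlock()
	if c.timer != nil {
		c.timer.Stop() // stopping an already-fired timer is harmless
		c.timer = nil
	}
}

func main() {
	c := &client{timer: time.NewTimer(time.Minute)}
	c.stopTimer()
	// ...the Done message would be sent here, after the timer can no longer fire...
	fmt.Println("timer cleared:", c.timer == nil)
}
```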
128-148: LGTM! Timer management properly refactored.Extracting startTimer() as a separate method improves clarity and ensures consistent timer lifecycle management with proper mutex protection.
protocol/peersharing/server.go (2)
49-55: LGTM! Idempotent shutdown with proper error propagation.The
Stop()method correctly usessync.Onceto ensure single execution and propagates errors fromProtocol.Stop().
127-132: LGTM! Correct restart sequence avoidingonceStoplimitation.The restart logic properly stops the current Protocol instance directly via
s.Protocol.Stop()before reinitializing, ensuring each protocol incarnation can be stopped independently.protocol/peersharing/client.go (3)
76-95: LGTM! Safe lifecycle management preventing Start after Stop.The
Start()method correctly checks thestoppedflag under mutex protection and returns early if the client has been stopped, preventing invalid state transitions.
98-124: LGTM! Proper channel cleanup synchronized with protocol shutdown.The conditional channel closure ensures:
- If started, channels close only after protocol fully shuts down (via
DoneChan())- If never started, channels close immediately
This prevents send-on-closed-channel panics.
169-173: LGTM! Shutdown-aware message handling.The non-blocking select with
DoneChan()ensures the handler returns gracefully during shutdown instead of attempting to send on a closed channel.protocol/chainsync/client_concurrency_test.go (2)
30-103: LGTM! Comprehensive concurrency test.This test validates that concurrent
Start()andStop()operations don't cause deadlocks or races, with appropriate timeout detection and leak verification.
106-148: LGTM! Critical edge case coverage.Testing
Stop()beforeStart()ensures the lifecycle guards handle out-of-order operations gracefully without panics or deadlocks.protocol/leiosnotify/client.go (3)
73-91: LGTM! Prevents Start after Stop.The mutex-protected
stoppedcheck ensuresStart()is a no-op ifStop()has already been called, preventing invalid state transitions.
93-120: LGTM! Proper shutdown coordination.The
Stop()method correctly:
- Marks the client as stopped
- Attempts to stop the protocol
- Defers channel closure until protocol shutdown completes (if started)
156-190: LGTM! All message handlers are shutdown-aware.Each handler uses a non-blocking select with
DoneChan()to gracefully handle shutdown, preventing panics from sending on closed channels.protocol/blockfetch/client.go (4)
90-101: LGTM! Clean startup with atomic flag.Using
atomic.Boolfor thestartedflag ensures race-free tracking of the protocol lifecycle.
104-132: LGTM! Proper shutdown sequence with channel cleanup.The
Stop()method correctly:
- Sends the Done message
- Calls
Protocol.Stop()to unregister from muxer- Defers channel closure until protocol fully shuts down (preventing send-on-closed panics)
237-241: LGTM! Shutdown-aware batch start handling.The select statement ensures graceful shutdown by checking
DoneChan()before sending to the result channel.
289-314: LGTM! Comprehensive shutdown handling in block delivery.The handler checks for shutdown before processing and uses a non-blocking select when sending blocks via channel, ensuring graceful termination.
protocol/leiosfetch/client.go (3)
92-107: LGTM! Race-free lifecycle coordination using atomic operations.The use of
atomic.Boolfor bothstoppedandstartedflags ensures thread-safe lifecycle management without data races.
109-145: LGTM! Comprehensive shutdown with atomic flag coordination.The
Stop()method properly:
- Marks the client as stopped atomically
- Always attempts protocol shutdown
- Defers result channel closure until protocol completes
- Updates lifecycle flags atomically
242-285: LGTM! Consistent shutdown-aware message handling across all handlers.All message handlers use the same pattern: non-blocking select with
DoneChan()to gracefully handle shutdown and prevent send-on-closed-channel panics.protocol/chainsync/client.go (5)
119-130: LGTM! Clean startup with atomic lifecycle tracking.The
startedatomic flag is set before starting the protocol, enabling proper coordination inStop().
133-169: LGTM! Proper shutdown sequence preventing channel races.The
Stop()method correctly:
- Sends Done while holding
busyMutex- Calls
Protocol.Stop()to ensure muxer unregistration- Defers channel closure until protocol fully shuts down (preventing handler panics)
350-358: LGTM! Proper channel closure handling.Checking the
okreturn value and distinguishing between channel closure (shutdown) and false readiness (cancellation) provides robust error handling.
743-760: LGTM! Shutdown-aware channel signaling in RollForward.Both cancellation and readiness signals now use non-blocking selects with
DoneChan(), preventing send-on-closed-channel panics during shutdown.
782-800: LGTM! Consistent shutdown-aware signaling in RollBackward.The handler mirrors the
handleRollForwardpattern, using non-blocking selects for both cancellation and readiness signals.protocol/localtxmonitor/client.go (5)
89-103: LGTM! Clean startup with mutex-protected flag.The
startedflag is set understateMutexprotection, ensuring coordination withStop().
106-148: LGTM! Careful lock ordering prevents deadlock.The
Stop()method correctly:
- Releases locks before calling
Protocol.Stop()(preventing deadlock)- Defers channel closure until protocol shutdown completes
- Marks the client as stopped under proper synchronization
152-273: LGTM! Consistent stopped-state checks across all operations.All public operations check the
stoppedflag understateMutexand returnErrProtocolShuttingDown, preventing operations on a stopped client.
210-212: LGTM! Fine-grained locking for acquired state.The separate
acquiredMutexprotects theacquiredflag independently from operation-level locking, preventing races without holdingbusyMutexunnecessarily.Also applies to: 247-249, 284-286, 333-336, 414-416
337-341: LGTM! All message handlers are shutdown-aware.Each handler uses a non-blocking select with
DoneChan()to prevent send-on-closed-channel panics during shutdown.Also applies to: 354-358, 371-375, 388-392
protocol/localstatequery/client.go (2)
29-47: Lifecycle synchronization and channel shutdown pattern look solid
stateMutex+startedmake Start/Stop coordination race‑free, and deferring result‑channel closure until<-c.DoneChan()when started avoids send‑on‑closed panics while still handling the “never started” case correctly. This aligns well with the patterns used in other protocol clients.Also applies to: 102-115, 117-149
886-903: Shutdown‑aware handlers correctly avoid send‑on‑closed racesThe updated
handleAcquired,handleFailure, andhandleResultnow all select onc.DoneChan()vs their respective result channels, returningErrProtocolShuttingDownwhen the protocol is stopping and only sending while the protocol is still alive. Combined with closing the result channels afterDoneChaninStop(), this removes the TOCTOU send‑on‑closed risk that existed in other clients.Also applies to: 925-951, 953-968
protocol/localtxsubmission/client.go (1)
25-37: Client lifecycle, shutdown, and result‑channel handling are consistent and safeThe combination of
stateMutex/started,onceStart/onceStop, andcloseSubmitResultOncegives you race‑free Start/Stop semantics and idempotent, panic‑free closure ofsubmitResultChan. UsingDoneChanin the accept/reject handlers and treating a closed channel asErrProtocolShuttingDowninSubmitTxmatches the shutdown pattern used elsewhere and avoids TOCTOU send‑on‑closed issues.Also applies to: 77-91, 93-126, 150-227
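As a generic, self-contained illustration of an idempotent close helper in the spirit of closeSubmitResultOnce (toy type, not the client's actual field):

```go
package main

import (
	"fmt"
	"sync"
)

// closeOnce wraps a result channel so every shutdown path can "close" it
// without risking a double-close panic.
type closeOnce struct {
	once sync.Once
	ch   chan error
}

func (c *closeOnce) Close() {
	c.once.Do(func() { close(c.ch) })
}

func main() {
	res := &closeOnce{ch: make(chan error)}
	// Both the Done handler and Stop() can safely call Close.
	res.Close()
	res.Close() // no panic: the close body runs only once

	if _, ok := <-res.ch; !ok {
		fmt.Println("result channel closed exactly once")
	}
}
```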
protocol/txsubmission/server.go (1)
33-41: ackCount, done, and Stop()/restart coordination now look race‑freeSwitching
ackCounttoint32withatomic.LoadInt32/StoreInt32, introducingdoneplusdoneMutex, and usingonceStop+restartMutex+stoppedgive the server well‑defined permanent‑stop vs restart semantics without the earlier double‑close and TOCTOU issues ondone. The RequestTxIds/RequestTxs select ondoneChan()correctly surface shutdown asErrProtocolShuttingDown, and the restart path inhandleDone()cleanly resets per‑restart state (channels, done, ackCount) underrestartMutex.Also applies to: 49-57, 93-116, 118-130, 156-171, 178-192, 217-222, 285-289, 299-329
eb18d3d to 2228478 (Compare)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
protocol/chainsync/client.go (1)
28-46: Stop-before-Start and Start-after-Stop can lead to send-on-closed-channel panics and potential hangsVerified against actual implementation in
protocol/chainsync/client.go. The concerns are confirmed critical:
No
stoppedflag in Client struct (line 29-46): Unlikeleiosnotifyandleiosfetchclients, which have astoppedflag and guardStart()withif c.stopped { return },chainsync.Clientlacks this guard.Stop-before-Start closes
readyForNextBlockChanwithout prevention of later Start (lines 133-169):
Stop()checksc.started.Load()at line 158; if false, immediately closesreadyForNextBlockChanat line 165.Start()(lines 119-130) has no guard against being called afterStop(), so the channel is never recreated.- When handlers like
handleRollBackwardlater execute theirselect { case <-c.DoneChan(): ... case c.readyForNextBlockChan <- false: }(lines 784-788), they will panic on send-to-closed-channel.Unconditional
SendMessagein Stop() (line 145): IfStop()is called before the protocol was ever started (no priorProtocol.Start()), the underlyingsendQueueChanis nil, andSendMessageblocks indefinitely. The newTestStopBeforeStart(line 106) does not exercise subsequent operations that would trigger this hang.Suggested fix (align with other clients):
Introduce astoppedlifecycle flag in the struct and gate bothStart()andStop()to match the pattern inleiosnotify.Clientandleiosfetch.Client:
- In the struct:
type Client struct { *protocol.Protocol @@ onceStart sync.Once onceStop sync.Once started atomic.Bool + stopped atomic.Bool // prevents Start() after Stop()
- In
Start()(gate execution after stop):func (c *Client) Start() { c.onceStart.Do(func() { + if c.stopped.Load() { + return + } c.Protocol.Logger(). Debug("starting client protocol", "component", "network", "protocol", ProtocolName, "connection_id", c.callbackContext.ConnectionId.String(), ) c.started.Store(true) c.Protocol.Start() }) }
- In
Stop()(guardSendMessagewhen protocol not started):func (c *Client) Stop() error { var err error c.onceStop.Do(func() { + c.stopped.Store(true) c.Protocol.Logger(). Debug("stopping client protocol", "component", "network", "protocol", ProtocolName, "connection_id", c.callbackContext.ConnectionId.String(), ) c.busyMutex.Lock() defer c.busyMutex.Unlock() msg := NewMsgDone() - if err = c.SendMessage(msg); err != nil { - return + if c.started.Load() { + if sendErr := c.SendMessage(msg); sendErr != nil { + err = sendErr + // Still proceed to stopping the protocol + } } if stopErr := c.Protocol.Stop(); stopErr != nil { c.Protocol.Logger(). Error("error stopping protocol", "component", "network", "protocol", ProtocolName, "connection_id", c.callbackContext.ConnectionId.String(), "error", stopErr, ) } // Defer closing channel until protocol fully shuts down (only if started) if c.started.Load() { go func() { <-c.DoneChan() close(c.readyForNextBlockChan) }() } else { // If protocol was never started, close channel immediately close(c.readyForNextBlockChan) } }) return err }Also add a test that calls
Stop()beforeStart(), thenStart(), and immediately attemptsSync()orGetAvailableBlockRange()to verify no send-on-closed-channel panic occurs.Also applies to: 119-130, 132-169, 350-359, 458-487, 615-762, 764-801
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (33)
protocol/blockfetch/blockfetch.go (1 hunks), protocol/blockfetch/client.go (7 hunks), protocol/blockfetch/client_test.go (1 hunks), protocol/blockfetch/server.go (1 hunks), protocol/chainsync/chainsync.go (1 hunks), protocol/chainsync/client.go (6 hunks), protocol/chainsync/client_concurrency_test.go (1 hunks), protocol/chainsync/client_test.go (2 hunks), protocol/chainsync/error.go (1 hunks), protocol/chainsync/server.go (1 hunks), protocol/keepalive/client.go (2 hunks), protocol/keepalive/client_test.go (1 hunks), protocol/leiosfetch/client.go (5 hunks), protocol/leiosfetch/server.go (1 hunks), protocol/leiosnotify/client.go (3 hunks), protocol/leiosnotify/client_concurrency_test.go (1 hunks), protocol/leiosnotify/client_test.go (1 hunks), protocol/leiosnotify/server.go (1 hunks), protocol/localstatequery/client.go (6 hunks), protocol/localstatequery/client_test.go (3 hunks), protocol/localtxmonitor/client.go (15 hunks), protocol/localtxmonitor/client_test.go (2 hunks), protocol/localtxsubmission/client.go (6 hunks), protocol/localtxsubmission/client_test.go (2 hunks), protocol/localtxsubmission/localtxsubmission.go (2 hunks), protocol/peersharing/client.go (4 hunks), protocol/peersharing/client_test.go (1 hunks), protocol/peersharing/server.go (4 hunks), protocol/peersharing/server_test.go (1 hunks), protocol/protocol.go (4 hunks), protocol/txsubmission/server.go (10 hunks), protocol/txsubmission/server_concurrency_test.go (1 hunks), protocol/txsubmission/server_test.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (12)
- protocol/chainsync/error.go
- protocol/leiosnotify/client_concurrency_test.go
- protocol/chainsync/chainsync.go
- protocol/keepalive/client_test.go
- protocol/blockfetch/blockfetch.go
- protocol/leiosfetch/server.go
- protocol/leiosnotify/client_test.go
- protocol/peersharing/client.go
- protocol/txsubmission/server_test.go
- protocol/localtxsubmission/localtxsubmission.go
- protocol/peersharing/server_test.go
- protocol/chainsync/server.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-04T15:54:22.683Z
Learnt from: wolf31o2
Repo: blinklabs-io/gouroboros PR: 1093
File: ledger/mary/pparams.go:143-150
Timestamp: 2025-11-04T15:54:22.683Z
Learning: In the blinklabs-io/gouroboros repository, the design goal for CBOR round-trip tests is to achieve byte-identical encoding WITHOUT using stored CBOR (cbor.DecodeStoreCbor). Instead, the approach uses proper field types (pointers for optional fields) and relies on the cbor package's deterministic encoding (SortCoreDeterministic) to ensure reproducible output. The stored CBOR pattern should not be suggested as a solution for round-trip fidelity in this codebase.
Applied to files:
protocol/localstatequery/client_test.go
🧬 Code graph analysis (18)
protocol/chainsync/client_test.go (4)
protocol/chainsync/chainsync.go (1)
ChainSync(204-207)protocol/chainsync/client.go (1)
Client(29-46)protocol/blockfetch/client_test.go (1)
TestClientShutdown(211-230)connection.go (1)
Connection(59-103)
protocol/chainsync/client.go (5)
protocol/protocol.go (1)
Protocol(39-60)protocol/chainsync/chainsync.go (1)
ProtocolName(30-30)connection/id.go (1)
ConnectionId(22-25)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)protocol/chainsync/error.go (1)
ErrSyncCancelled(26-26)
protocol/leiosfetch/client.go (5)
protocol/protocol.go (1)
Protocol(39-60)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)protocol/blockfetch/client.go (1)
Client(30-41)protocol/chainsync/client.go (1)
Client(29-46)protocol/leiosnotify/client.go (1)
Client(24-34)
protocol/localstatequery/client_test.go (10)
protocol/blockfetch/client_test.go (1)
TestClientShutdown(211-230)protocol/chainsync/client_test.go (1)
TestClientShutdown(285-304)protocol/keepalive/client_test.go (1)
TestClientShutdown(303-322)protocol/leiosnotify/client_test.go (1)
TestClientShutdown(117-136)protocol/localtxmonitor/client_test.go (1)
TestClientShutdown(300-319)protocol/localtxsubmission/client_test.go (1)
TestClientShutdown(167-186)protocol/peersharing/client_test.go (1)
TestClientShutdown(95-114)connection.go (1)
Connection(59-103)protocol/localstatequery/localstatequery.go (1)
LocalStateQuery(116-119)protocol/localstatequery/client.go (1)
Client(30-47)
protocol/keepalive/client.go (3)
protocol/protocol.go (1)
Protocol(39-60)connection/id.go (1)
ConnectionId(22-25)protocol/keepalive/messages.go (1)
NewMsgDone(94-101)
protocol/localstatequery/client.go (5)
protocol/protocol.go (1)
Protocol(39-60)protocol/localtxmonitor/client.go (1)
Client(25-42)protocol/keepalive/messages.go (1)
NewMsgDone(94-101)protocol/localstatequery/messages.go (4)
NewMsgDone(245-252)AcquireFailurePointNotOnChain(44-44)MsgResult(172-175)NewMsgQuery(160-170)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)
protocol/peersharing/client_test.go (5)
connection.go (2)
Connection(59-103)NewConnection(107-130)connection_options.go (3)
WithConnection(36-40)WithNetworkMagic(50-54)WithNodeToNode(78-82)protocol/blockfetch/client_test.go (1)
TestClientShutdown(211-230)protocol/peersharing/peersharing.go (1)
PeerSharing(67-70)protocol/peersharing/client.go (1)
Client(25-35)
protocol/peersharing/server.go (2)
protocol/txsubmission/server.go (1)
Server(28-41)protocol/protocol.go (1)
Protocol(39-60)
protocol/localtxmonitor/client.go (8)
protocol/protocol.go (1)
Protocol(39-60)protocol/leiosnotify/client.go (1)
Client(24-34)protocol/localstatequery/client.go (1)
Client(30-47)protocol/localtxsubmission/client.go (1)
Client(26-37)protocol/peersharing/client.go (1)
Client(25-35)protocol/keepalive/messages.go (1)
NewMsgDone(94-101)protocol/localtxmonitor/messages.go (1)
NewMsgDone(79-86)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)
protocol/chainsync/client_concurrency_test.go (4)
protocol/chainsync/chainsync.go (3)
New(259-267)NewConfig(273-295)ChainSync(204-207)connection_options.go (3)
WithConnection(36-40)WithNetworkMagic(50-54)WithChainSyncConfig(131-135)protocol/chainsync/client.go (1)
Client(29-46)protocol/leiosnotify/client_concurrency_test.go (1)
TestStopBeforeStart(27-69)
protocol/localtxsubmission/client.go (4)
protocol/localtxsubmission/localtxsubmission.go (3)
Config(81-84)CallbackContext(87-91)ProtocolName(27-27)protocol/protocol.go (1)
Protocol(39-60)protocol/localtxsubmission/messages.go (1)
MessageTypeDone(29-29)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)
protocol/txsubmission/server.go (5)
protocol/txsubmission/messages.go (2)
TxBody(197-201)NewMsgRequestTxIds(68-82)protocol/protocol.go (1)
Protocol(39-60)protocol/txsubmission/txsubmission.go (2)
ProtocolName(27-27)MaxAckCount(143-143)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)protocol/txsubmission/error.go (1)
ErrStopServerProcess(21-21)
protocol/txsubmission/server_concurrency_test.go (2)
connection_options.go (3)
WithConnection(36-40)WithNetworkMagic(50-54)WithNodeToNode(78-82)protocol/txsubmission/server.go (1)
Server(28-41)
protocol/blockfetch/client.go (2)
protocol/protocol.go (2)
Protocol(39-60)New(122-133)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)
protocol/leiosnotify/client.go (5)
protocol/protocol.go (1)
Protocol(39-60)protocol/leiosfetch/client.go (1)
Client(26-38)protocol/peersharing/client.go (1)
Client(25-35)protocol/leiosnotify/messages.go (1)
NewMsgDone(149-156)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)
protocol/blockfetch/client_test.go (4)
protocol/keepalive/client_test.go (1)
TestClientShutdown(303-322)connection.go (1)
Connection(59-103)protocol/blockfetch/blockfetch.go (1)
BlockFetch(102-105)protocol/blockfetch/client.go (1)
Client(30-41)
protocol/localtxmonitor/client_test.go (2)
connection.go (1)
Connection(59-103)protocol/localtxmonitor/client.go (1)
Client(25-42)
protocol/localtxsubmission/client_test.go (6)
connection.go (1)
Connection(59-103)protocol/localtxsubmission/localtxsubmission.go (2)
LocalTxSubmission(75-78)ProtocolId(28-28)protocol/localtxsubmission/client.go (1)
Client(26-37)internal/test/helpers.go (1)
DecodeHexString(14-22)ledger/babbage.go (1)
TxTypeBabbage(39-39)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Analyze (go)
🔇 Additional comments (45)
protocol/keepalive/client.go (1)
33-34: LGTM: Lifecycle control fields added.The
onceStopandstopErrfields properly support idempotent shutdown semantics, consistent with the pattern used in the underlying Protocol struct.protocol/protocol.go (3)
175-193: Stop() API change and implementation look consistent and safeUsing
onceStopwith muxer unregistration and a nil error return is fine for now and matches the new callers that ignore the error in error paths. The method is idempotent and safe to call from the violation paths inSendMessageandreadLoop.
247-252: Using Stop() on queue overflow is reasonableOn send-queue byte-limit violation, you emit
ErrProtocolViolationQueueExceeded, callStop(), and return the same error. This is a sensible fail-fast path; ignoring theStop()error is appropriate given the current implementation always returns nil.
453-458: Recv-side queue overflow shutdown path is consistent with send-sideThe recv-byte-limit violation mirrors the send path: emit
ErrProtocolViolationQueueExceeded, callStop(), and abort. This keeps shutdown behavior symmetric and centralizes muxer cleanup inStop().protocol/chainsync/client_concurrency_test.go (2)
29-103: Concurrent Start/Stop test is well-structuredThe test exercises repeated concurrent
Start/Stopwith a clear timeout and leak detection, which is appropriate for the new lifecycle semantics.
105-148: Stop-before-Start scenario is clearly capturedThis test codifies the expectation that
Stop()on an unstarted client and a subsequentStart()/Stop()should neither panic nor deadlock, which is useful to guard the lifecycle behavior going forward.protocol/leiosnotify/client.go (1)
31-34: Lifecycle flags and shutdown-aware handlers resolve prior channel raceThe addition of
stateMutex,started, andstoppedplus the updatedStart/Stoplogic makes the client’s lifecycle well-defined and preventsStart()afterStop()from reusing a closedrequestNextChan. Deferring the close ofrequestNextChanuntil<-c.DoneChan()when started, combined with the handlers’selectonc.DoneChan()vsrequestNextChan <- msg, avoids send-on-closed-channel panics while still surfacingErrProtocolShuttingDownto callers.You may want to double‑check, via tests similar to
TestStopBeforeStart, that callingStop()beforeStart()does not attempt to sendMsgDonevia a protocol that was never started, i.e., that the underlyingProtocolis either started elsewhere or explicitly documented as supporting pre‑StartSendMessagecalls.Also applies to: 73-90, 93-120, 156-189
protocol/leiosfetch/client.go (1)
17-24: Atomic lifecycle management and shutdown-aware handlers look solidUsing
atomic.Boolforstarted/stoppedplus the updatedStart/Stopsemantics gives a race-free lifecycle:Start()refuses to run afterStop(), andStop()always attempts to stop the underlying protocol while deferring channel closures until<-c.DoneChan()where appropriate. The handlerselectblocks onc.DoneChan()and the result channels, correctly surfacingErrProtocolShuttingDownwhen the client is shutting down.It would be worth confirming via tests (similar to the chainsync and leiosnotify concurrency/StopBeforeStart tests) that calling
Stop()beforeStart()either behaves as intended (no deadlock, returns promptly) or is explicitly disallowed by API contract, sinceStop()still sends aMsgDoneregardless of whether the protocol was started.Also applies to: 26-38, 92-107, 109-145, 242-285
protocol/txsubmission/server_concurrency_test.go (1)
1-143: Well-structured concurrency tests, appropriately skipped.Both tests follow best practices:
- Goroutine leak detection via
goleak.VerifyNone(t)(lines 31, 102)- Proper resource cleanup with deferred Close (lines 49-53, 120-124)
- Timeout protection for concurrent operations (lines 89-94)
- Clear test intent with descriptive names and comments
The skip reasons are documented (lines 30, 101) and reasonable. Once mock infrastructure supports the NtN protocol interactions needed to exercise server lifecycle, these tests will validate that:
- Concurrent
Stop()calls don't deadlock (test uses 5 goroutines with 5-second timeout)IsStopped()correctly reflects server state afterStop()protocol/blockfetch/server.go (1)
169-185: Good improvement: Stop() error now propagated.The restart path now checks and returns
Stop()errors (lines 179-181) instead of ignoring them. This prevents the server from attempting to reinitialize and restart if the underlying protocol fails to stop cleanly, avoiding potential resource leaks or inconsistent state.This aligns with the broader pattern across protocol servers in this PR (e.g.,
protocol/leiosnotify/server.go).protocol/blockfetch/client_test.go (1)
211-230: Consistent shutdown test addition.
TestClientShutdownfollows the established pattern seen across protocol tests in this PR (e.g.,protocol/keepalive/client_test.go,protocol/chainsync/client_test.go). The test correctly:
- Uses the
runTesthelper for mock setup and cleanup- Verifies the client is non-nil
- Starts the client before stopping
- Asserts
Stop()completes without errorprotocol/localtxmonitor/client_test.go (2)
93-93: Reasonable timeout increase.Extending the mock connection shutdown timeout from 2 to 5 seconds aligns with similar increases across protocol tests in this PR (e.g.,
protocol/chainsync/client_test.goline 95). This accommodates the slightly longer shutdown sequences introduced by the lifecycle improvements.
300-319: Consistent shutdown test addition.
TestClientShutdownfollows the same pattern as other protocol client shutdown tests added in this PR. The test structure is identical to those inprotocol/blockfetch/client_test.go,protocol/keepalive/client_test.go, and others, ensuring consistent validation of the Start/Stop lifecycle.protocol/leiosnotify/server.go (1)
109-124: Good improvement: Stop() error now propagated.The restart path in
handleDonenow checks and returnsStop()errors (lines 118-120), consistent with the pattern inprotocol/blockfetch/server.goand other servers in this PR. This prevents restart attempts after a failed stop, avoiding potential resource leaks.protocol/chainsync/client_test.go (3)
83-88: Good improvement: Stop() error now logged in cleanup.The test cleanup now checks the
Stop()error and logs it (lines 85-87), ensuring shutdown failures are surfaced during test runs rather than silently ignored. This addresses the past review comment and improves test observability.
95-95: Reasonable timeout increase.Extending the timeout from 2 to 5 seconds is consistent with similar adjustments across protocol tests in this PR (e.g.,
protocol/localtxmonitor/client_test.go), accommodating the lifecycle improvements.
285-304: Consistent shutdown test addition.
TestClientShutdownfollows the established pattern for shutdown tests across protocols in this PR. The test correctly verifies that the client can be started and stopped without error, validating the lifecycle improvements.protocol/localstatequery/client_test.go (3)
25-25: Import alias added for clarity.Adding the
ouroborosalias is consistent with the import style in other test files in this PR (e.g.,protocol/blockfetch/client_test.goline 22) and improves readability by distinguishing between the main package and the mock package.
112-112: Reasonable timeout increase.The timeout extension to 5 seconds matches similar changes across protocol tests (e.g.,
protocol/chainsync/client_test.goline 95,protocol/localtxmonitor/client_test.goline 93), accommodating the lifecycle improvements.
357-376: Consistent shutdown test addition.
TestClientShutdownfollows the same pattern as shutdown tests in other protocol clients (e.g.,protocol/blockfetch/client_test.go,protocol/keepalive/client_test.go), ensuring uniform validation of the Start/Stop lifecycle across the codebase.protocol/txsubmission/server.go (1)
28-351: LGTM! Lifecycle and concurrency improvements are well-implemented.The server implementation correctly addresses all previously identified critical concurrency issues:
- Done channel management: Safe close operations using
doneMutexwith select-before-close pattern prevent double-close panics- ackCount synchronization: Atomic operations (
LoadInt32/StoreInt32) eliminate data races- Restart coordination:
restartMutexproperly serializes restart and stop operations, with TOCTOU checks preventing restart after permanent stop- Idempotent shutdown:
onceStopensuresStop()executes once, and thestoppedflag correctly prevents restarts- Channel access: Result channels are captured under
restartMutexbefore blocking operations, avoiding races during restart- Send protection: Reply handlers wrap channel sends with
restartMutexThe shutdown semantics are clean:
RequestTxIds/RequestTxsreturnErrProtocolShuttingDownwhen shutdown is signaled, andhandleDonerestarts the protocol only if permanent stop hasn't been requested.protocol/peersharing/server.go (2)
49-55: LGTM: Stop() now properly propagates Protocol.Stop() errors. The idempotent Stop implementation using sync.Once correctly captures and returns errors from the underlying Protocol, addressing the previous review concern.
127-132: LGTM: Restart logic correctly calls Protocol.Stop() directly. The restart flow now calls s.Protocol.Stop() directly instead of s.Stop(), which correctly addresses the previous issue where onceStop would prevent stopping restarted protocol instances. The nil check is a good defensive measure.
protocol/blockfetch/client.go (4)
90-101: LGTM: Start() correctly tracks lifecycle with atomic flag. The implementation properly sets the started flag using atomic.Bool.Store before starting the protocol, ensuring thread-safe lifecycle tracking.
104-132: LGTM: Stop() correctly handles shutdown and deferred channel closure. The implementation properly:
- Sends the Done message with error handling
- Calls Protocol.Stop() to ensure muxer unregistration (addressing past review)
- Defers channel closure until after DoneChan() fires if started, preventing panics from in-flight responses
- Closes channels immediately if never started
237-241: LGTM: handleStartBatch uses shutdown-aware channel send. The select statement correctly checks DoneChan() before sending to startBatchResultChan, preventing send-on-closed panics during shutdown.
254-259: LGTM: Message handlers use shutdown-aware channel operations. Both handleNoBlocks and handleBlock correctly use select statements with DoneChan() to avoid sending to closed channels during shutdown (a sketch of this pattern follows). Also applies to: 309-313
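For illustration only (not the library's actual code), a minimal sketch of this select-on-done send pattern; the names deliver, doneChan, and resultChan are hypothetical stand-ins for a handler, the client's DoneChan(), and its result channel:

```go
package main

import (
	"errors"
	"fmt"
)

// deliver sends payload on resultChan unless doneChan has been closed,
// mirroring the shutdown-aware select used by the message handlers.
func deliver(doneChan <-chan struct{}, resultChan chan<- string, payload string) error {
	select {
	case <-doneChan:
		// Shutdown was signaled; never send, since resultChan may be
		// closed once the done channel fires.
		return errors.New("protocol is shutting down")
	case resultChan <- payload:
		return nil
	}
}

func main() {
	done := make(chan struct{})
	results := make(chan string, 1)
	fmt.Println(deliver(done, results, "block")) // <nil>
	close(done)
	// Buffer is now full, so only the done case is ready.
	fmt.Println(deliver(done, results, "block")) // protocol is shutting down
}
```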
protocol/localtxsubmission/client_test.go (2)
86-86: LGTM: Timeout increase accommodates shutdown verification. Extending the timeout from 2 to 5 seconds provides sufficient time for shutdown sequences to complete, consistent with similar changes across other protocol tests.
167-216: LGTM: Shutdown tests validate lifecycle and error handling. The new tests properly verify:
- TestClientShutdown: basic start/stop lifecycle completes without error
- TestSubmitTxServerShutdown: client correctly returns ErrProtocolShuttingDown when the server sends Done
These tests align with the broader PR pattern of adding shutdown verification across all protocols.
protocol/peersharing/client_test.go (2)
30-93: LGTM: Test helper follows established patterns. The runTest helper correctly implements:
- Goroutine leak detection with goleak.VerifyNone
- Async error handling for both mock and Ouroboros connections
- Proper timeouts for completion (5s) and shutdown (10s)
- Clean connection lifecycle management
A sketch of this test shape follows.
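For illustration, a rough, hypothetical shape of such a lifecycle test; the startStopper interface and verifyLifecycle helper are placeholders, not the repository's concrete clients or helpers:

```go
package example

import (
	"testing"

	"go.uber.org/goleak"
)

// startStopper is a stand-in for any protocol client exposing the
// Start/Stop lifecycle discussed in this PR.
type startStopper interface {
	Start() error
	Stop() error
}

// verifyLifecycle starts and stops a client, failing the test on error.
// goleak.VerifyNone reports any goroutines still running at the end.
func verifyLifecycle(t *testing.T, client startStopper) {
	defer goleak.VerifyNone(t)
	if err := client.Start(); err != nil {
		t.Fatalf("unexpected error starting client: %s", err)
	}
	if err := client.Stop(); err != nil {
		t.Fatalf("unexpected error stopping client: %s", err)
	}
}
```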
95-114: LGTM: TestClientShutdown validates PeerSharing lifecycle.The test properly verifies that the PeerSharing client can be started and stopped without error, ensuring proper shutdown semantics consistent with other protocol clients in this PR.
protocol/localtxmonitor/client.go (5)
89-103: LGTM: Start() properly synchronizes lifecycle state. The implementation correctly guards the started flag with stateMutex, preventing data races between Start and Stop as noted in past reviews.
105-148: LGTM: Stop() correctly releases locks before calling Protocol.Stop(). The implementation properly:
- Sets the stopped flag under lock to prevent new operations
- Releases all locks before calling Protocol.Stop(), avoiding potential deadlocks (as addressed in past review)
- Defers channel closure until DoneChan() fires if started, preventing handler panics
152-157: LGTM: Operations correctly pre-check stopped state. All public operations properly check the stopped flag under stateMutex and return ErrProtocolShuttingDown early, preventing new operations after shutdown is initiated. Also applies to: 173-178, 194-199, 231-236, 268-273
337-341: LGTM: Message handlers use shutdown-aware channel operations. All message handlers correctly use select statements with DoneChan() to avoid sending to closed channels during shutdown, addressing the critical race condition noted in past reviews. Also applies to: 354-358, 371-375, 388-392
333-336: LGTM: acquired flag properly synchronized with dedicated mutex. The acquiredMutex consistently guards all reads and writes to the acquired flag, preventing data races across concurrent operations. Also applies to: 414-416
protocol/localstatequery/client.go (4)
102-115: LGTM: Start() eliminates data race on started flag.The implementation correctly guards the
startedflag withstateMutex, addressing the data race noted in past reviews where Start and Stop could concurrently accessstarted.
117-149: LGTM: Stop() ensures protocol termination and safe channel closure.The implementation properly:
- Sends the Done message with error handling
- Calls
Protocol.Stop()to drive protocol shutdown (addressing past review)- Defers channel closure until
DoneChan()fires if started, preventing handler panics
913-920: LGTM: Message handlers properly synchronized and shutdown-aware.All handlers correctly:
- Guard
acquiredflag updates withacquiredMutex- Use select statements with
DoneChan()to avoid send-on-closed panics during shutdownAlso applies to: 936-946, 961-966
971-973: LGTM: acquired flag consistently synchronized with dedicated mutex.All reads and writes to the
acquiredflag are properly guarded byacquiredMutex, ensuring race-free state tracking.Also applies to: 1013-1014, 1022-1024
protocol/localtxsubmission/client.go (5)
77-91: LGTM: Start() properly synchronizes started flag.The implementation correctly guards the
startedflag withstateMutex, ensuring thread-safe lifecycle tracking.
93-126: LGTM: Stop() ensures protocol termination and safe channel closure.The implementation properly:
- Sends the Done message with error handling
- Calls
Protocol.Stop()to drive protocol shutdown- Defers channel closure via
closeSubmitResultChan()helper untilDoneChan()fires if started- Uses
sync.Oncein the helper to prevent multiple closes
177-182: LGTM: Message handlers eliminate TOCTOU race with shutdown-aware selects.Both handlers correctly use select statements with
DoneChan()to race the channel send against shutdown, addressing the TOCTOU race condition noted in past reviews.Also applies to: 202-207
210-221: LGTM: handleDone gracefully handles server-initiated shutdown.The handler correctly closes the result channel when the server sends Done, unblocking any operations waiting on
submitResultChanand allowing them to returnErrProtocolShuttingDown.
223-227: LGTM: closeSubmitResultChan helper prevents double-close panics. The helper correctly uses sync.Once to ensure the submitResultChan is closed exactly once, even if called from multiple paths (Stop, handleDone). A sketch of the idiom follows.
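As a minimal, hypothetical sketch of that close-exactly-once idiom (the type and field names here are illustrative, not the package's actual fields):

```go
package example

import "sync"

// resultSink owns a result channel that may need to be closed from more
// than one shutdown path (e.g. a local Stop() and a server-sent Done).
type resultSink struct {
	resultChan chan string
	closeOnce  sync.Once
}

// closeResultChan is safe to call from multiple goroutines and multiple
// code paths; the channel is closed exactly once.
func (s *resultSink) closeResultChan() {
	s.closeOnce.Do(func() {
		close(s.resultChan)
	})
}
```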
2228478 to 0371e62
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (33)
protocol/blockfetch/blockfetch.go(1 hunks)protocol/blockfetch/client.go(7 hunks)protocol/blockfetch/client_test.go(1 hunks)protocol/blockfetch/server.go(1 hunks)protocol/chainsync/chainsync.go(1 hunks)protocol/chainsync/client.go(6 hunks)protocol/chainsync/client_concurrency_test.go(1 hunks)protocol/chainsync/client_test.go(2 hunks)protocol/chainsync/error.go(1 hunks)protocol/chainsync/server.go(1 hunks)protocol/keepalive/client.go(2 hunks)protocol/keepalive/client_test.go(1 hunks)protocol/leiosfetch/client.go(5 hunks)protocol/leiosfetch/server.go(1 hunks)protocol/leiosnotify/client.go(3 hunks)protocol/leiosnotify/client_concurrency_test.go(1 hunks)protocol/leiosnotify/client_test.go(1 hunks)protocol/leiosnotify/server.go(1 hunks)protocol/localstatequery/client.go(6 hunks)protocol/localstatequery/client_test.go(3 hunks)protocol/localtxmonitor/client.go(15 hunks)protocol/localtxmonitor/client_test.go(2 hunks)protocol/localtxsubmission/client.go(6 hunks)protocol/localtxsubmission/client_test.go(2 hunks)protocol/localtxsubmission/localtxsubmission.go(2 hunks)protocol/peersharing/client.go(4 hunks)protocol/peersharing/client_test.go(1 hunks)protocol/peersharing/server.go(4 hunks)protocol/peersharing/server_test.go(1 hunks)protocol/protocol.go(6 hunks)protocol/txsubmission/server.go(10 hunks)protocol/txsubmission/server_concurrency_test.go(1 hunks)protocol/txsubmission/server_test.go(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (12)
- protocol/blockfetch/client_test.go
- protocol/localstatequery/client_test.go
- protocol/localtxsubmission/client_test.go
- protocol/chainsync/error.go
- protocol/peersharing/server.go
- protocol/chainsync/chainsync.go
- protocol/leiosnotify/client_concurrency_test.go
- protocol/leiosnotify/server.go
- protocol/chainsync/server.go
- protocol/keepalive/client_test.go
- protocol/txsubmission/server_test.go
- protocol/localtxsubmission/client.go
🧰 Additional context used
🧬 Code graph analysis (17)
protocol/localtxmonitor/client_test.go (9)
protocol/blockfetch/client_test.go (1)
TestClientShutdown(211-230)protocol/chainsync/client_test.go (1)
TestClientShutdown(285-304)protocol/keepalive/client_test.go (1)
TestClientShutdown(303-322)protocol/localstatequery/client_test.go (1)
TestClientShutdown(357-376)protocol/localtxsubmission/client_test.go (1)
TestClientShutdown(167-186)protocol/peersharing/client_test.go (1)
TestClientShutdown(95-114)connection.go (1)
Connection(59-103)protocol/localtxmonitor/localtxmonitor.go (1)
LocalTxMonitor(112-115)protocol/localtxmonitor/client.go (1)
Client(25-42)
protocol/chainsync/client_test.go (10)
protocol/chainsync/chainsync.go (1)
ChainSync(204-207)protocol/chainsync/client.go (1)
Client(29-47)protocol/blockfetch/client_test.go (1)
TestClientShutdown(211-230)protocol/keepalive/client_test.go (1)
TestClientShutdown(303-322)protocol/leiosnotify/client_test.go (1)
TestClientShutdown(117-136)protocol/localstatequery/client_test.go (1)
TestClientShutdown(357-376)protocol/localtxmonitor/client_test.go (1)
TestClientShutdown(300-319)protocol/localtxsubmission/client_test.go (1)
TestClientShutdown(167-186)protocol/peersharing/client_test.go (1)
TestClientShutdown(95-114)connection.go (1)
Connection(59-103)
protocol/keepalive/client.go (4)
protocol/protocol.go (1)
Protocol(39-60)protocol/keepalive/keepalive.go (1)
ProtocolName(27-27)connection/id.go (1)
ConnectionId(22-25)protocol/keepalive/messages.go (1)
NewMsgDone(94-101)
protocol/peersharing/client.go (4)
protocol/leiosnotify/client.go (1)
Client(24-34)protocol/localtxsubmission/client.go (1)
Client(26-37)protocol/protocol.go (1)
Protocol(39-60)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)
protocol/leiosfetch/client.go (3)
protocol/protocol.go (1)
Protocol(39-60)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)protocol/chainsync/client.go (1)
Client(29-47)
protocol/blockfetch/client.go (2)
protocol/protocol.go (2)
Protocol(39-60)New(122-133)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)
protocol/localtxsubmission/localtxsubmission.go (1)
protocol/localtxsubmission/messages.go (1)
MessageTypeDone(29-29)
protocol/localtxmonitor/client.go (7)
protocol/protocol.go (1)
Protocol(39-60)protocol/leiosfetch/client.go (1)
Client(26-38)protocol/leiosnotify/client.go (1)
Client(24-34)protocol/localstatequery/client.go (1)
Client(30-47)protocol/keepalive/messages.go (1)
NewMsgDone(94-101)protocol/localtxmonitor/messages.go (1)
NewMsgDone(79-86)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)
protocol/txsubmission/server_concurrency_test.go (5)
connection.go (1)
NewConnection(107-130)protocol/protocol.go (2)
ProtocolRoleServer(95-95)New(122-133)connection_options.go (3)
WithConnection(36-40)WithNetworkMagic(50-54)WithNodeToNode(78-82)protocol/txsubmission/txsubmission.go (1)
TxSubmission(126-129)protocol/txsubmission/server.go (1)
Server(28-41)
protocol/peersharing/client_test.go (5)
connection.go (2)
Connection(59-103)NewConnection(107-130)connection_options.go (3)
WithConnection(36-40)WithNetworkMagic(50-54)WithNodeToNode(78-82)protocol/blockfetch/client_test.go (1)
TestClientShutdown(211-230)protocol/peersharing/peersharing.go (1)
PeerSharing(67-70)protocol/peersharing/client.go (1)
Client(25-35)
protocol/leiosnotify/client_test.go (6)
protocol/versiondata.go (6)
VersionData(40-46)VersionDataNtN13andUp(143-145)VersionDataNtN11to12(116-122)DiffusionModeInitiatorOnly(21-21)PeerSharingModeNoPeerSharing(27-27)QueryModeDisabled(36-36)protocol/handshake/messages.go (1)
NewMsgAcceptVersion(88-102)connection.go (2)
Connection(59-103)NewConnection(107-130)connection_options.go (3)
WithConnection(36-40)WithNetworkMagic(50-54)WithNodeToNode(78-82)protocol/blockfetch/client_test.go (1)
TestClientShutdown(211-230)protocol/leiosnotify/client.go (1)
Client(24-34)
protocol/leiosnotify/client.go (5)
protocol/protocol.go (1)
Protocol(39-60)protocol/localtxsubmission/client.go (1)
Client(26-37)protocol/peersharing/client.go (1)
Client(25-35)protocol/leiosnotify/messages.go (1)
NewMsgDone(149-156)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)
protocol/localstatequery/client.go (6)
protocol/protocol.go (1)
Protocol(39-60)protocol/localtxmonitor/client.go (1)
Client(25-42)protocol/localstatequery/localstatequery.go (1)
AcquireTarget(131-133)protocol/localstatequery/messages.go (3)
NewMsgDone(245-252)AcquireFailurePointNotOnChain(44-44)MsgResult(172-175)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)protocol/localstatequery/error.go (1)
ErrAcquireFailurePointNotOnChain(23-25)
protocol/chainsync/client.go (4)
protocol/protocol.go (1)
Protocol(39-60)protocol/leiosfetch/client.go (1)
Client(26-38)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)protocol/chainsync/error.go (1)
ErrSyncCancelled(26-26)
protocol/chainsync/client_concurrency_test.go (6)
protocol/chainsync/chainsync.go (3)
New(259-267)NewConfig(273-295)ChainSync(204-207)connection_options.go (3)
WithConnection(36-40)WithNetworkMagic(50-54)WithChainSyncConfig(131-135)protocol/blockfetch/client.go (1)
Client(30-41)protocol/chainsync/client.go (1)
Client(29-47)protocol/keepalive/client.go (1)
Client(26-35)protocol/leiosnotify/client_concurrency_test.go (1)
TestStopBeforeStart(27-69)
protocol/txsubmission/server.go (4)
protocol/txsubmission/messages.go (2)
TxBody(197-201)NewMsgRequestTxIds(68-82)protocol/txsubmission/txsubmission.go (1)
MaxAckCount(143-143)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)protocol/txsubmission/error.go (1)
ErrStopServerProcess(21-21)
protocol/peersharing/server_test.go (5)
protocol/txsubmission/server_test.go (1)
TestServerShutdown(28-82)connection.go (2)
NewConnection(107-130)Connection(59-103)connection_options.go (3)
WithConnection(36-40)WithNetworkMagic(50-54)WithNodeToNode(78-82)protocol/peersharing/peersharing.go (1)
PeerSharing(67-70)protocol/peersharing/server.go (1)
Server(26-32)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Analyze (go)
🔇 Additional comments (35)
protocol/localtxsubmission/localtxsubmission.go (2)
46-49: Client-initiated termination looks good.Adding the
MessageTypeDonetransition fromstateIdleenables the client to gracefully terminate the protocol when it has agency, which is appropriate for coordinated shutdown handling.
63-66: Verify this state transition change aligns with Ouroboros specification.The verification confirms the previous review's requested changes are present:
- ✓ Client properly handles
MessageTypeDoneinBusystate (line 157-158 of client.go)- ✓ Transaction status is communicated when server closes during submission (via channel closure returning
protocol.ErrProtocolShuttingDown)- ✓ No data loss or undefined behavior
However, the semantic design question remains unverified: the state machine now allows the server to send
Doneand transition fromBusytoDone(lines 63-66), which terminates a transaction submission in progress. While this is now properly handled on the client side, please confirm:
- This state transition aligns with the Ouroboros local-tx-submission protocol specification
- The intended behavior when the server closes the connection mid-submission is to report
ErrProtocolShuttingDownrather than a transaction accept/reject resultprotocol/leiosfetch/server.go (1)
203-205: LGTM!The error handling for
Stop()is correctly implemented. IfStop()fails, the error is returned immediately, preventing the restart sequence from proceeding with a partially stopped protocol.protocol/blockfetch/blockfetch.go (1)
122-122: Verify the performance impact of the increased queue size.The default receive queue size has been increased by 50% (256→384). While this may improve throughput by reducing backpressure, it also increases memory usage per connection.
Please ensure this change has been tested under realistic load conditions to confirm:
- The throughput improvement justifies the increased memory footprint
- No adverse effects on GC pressure or connection scaling
protocol/localtxmonitor/client_test.go (2)
93-93: LGTM!The timeout increase from 2 to 5 seconds accommodates the more coordinated shutdown sequences introduced across the codebase.
300-319: LGTM!The shutdown test follows the established pattern used across other protocol tests (blockfetch, chainsync, keepalive, etc.) and validates that
Start()/Stop()can be called without errors.protocol/leiosfetch/client.go (3)
92-107: LGTM!The lifecycle guards prevent starting a stopped client and properly track the started state using atomic operations for race-free access.
109-145: LGTM!The shutdown sequence is well-coordinated:
- SendMessage error is preserved
- Protocol.Stop() is always called to complete muxer shutdown
- Channels are closed after DoneChan signals if started, or immediately if never started
- Lifecycle flags are updated atomically
This prevents send-on-closed-channel panics and ensures clean shutdown.
242-285: LGTM!All message handlers now use shutdown-aware selects that check
DoneChan()before sending to result channels, preventing panics during concurrent shutdown and ensuringErrProtocolShuttingDownis returned appropriately.protocol/blockfetch/client.go (5)
90-101: LGTM!The
startedflag is properly initialized using atomic operations, ensuring race-free lifecycle tracking.
104-132: LGTM!The shutdown sequence correctly:
- Preserves SendMessage errors
- Always calls Protocol.Stop() to signal muxer shutdown
- Defers channel closure until protocol completes if started
- Closes channels immediately if never started
This addresses the previous critical issues regarding premature channel closure and missing Protocol.Stop() call.
228-243: LGTM!The shutdown-aware select prevents sending to
startBatchResultChanduring shutdown, returningErrProtocolShuttingDowninstead.
245-261: LGTM!The error handling and shutdown-aware delivery of the "block(s) not found" error is correctly implemented.
309-314: LGTM!For the non-callback path (single block requests), the shutdown-aware select ensures blocks are only delivered when the protocol is still active.
protocol/blockfetch/server.go (1)
179-181: LGTM!The error handling for
Stop()prevents restarting the protocol if shutdown fails, ensuring consistent state management.protocol/localtxmonitor/client.go (5)
89-103: LGTM!The
Start()method correctly sets thestartedflag under thestateMutex, ensuring thread-safe lifecycle tracking.
105-148: LGTM!The shutdown sequence is comprehensive and safe:
stoppedflag prevents new operations- Locks are released before calling
Protocol.Stop()to avoid deadlocks- Channels are closed after protocol shutdown if started, or immediately if never started
- Previous critical issues about race conditions and missing Protocol.Stop() call have been addressed
150-301: LGTM!All public methods (
Acquire,Release,HasTx,NextTx,GetSizes) correctly:
- Pre-check the
stoppedflag understateMutex- Check the
acquiredstate underacquiredMutexwhere needed- Return
ErrProtocolShuttingDownwhen stoppedThis prevents operations from proceeding after shutdown has been initiated.
324-394: LGTM!All message handlers now:
- Update state under the appropriate mutex (
acquiredMutexforhandleAcquired)- Use shutdown-aware selects that check
DoneChan()before sending to result channels- Return
ErrProtocolShuttingDownwhen the protocol is shutting downThis prevents send-on-closed-channel panics and ensures consistent error handling during shutdown.
409-418: LGTM!The
release()method correctly clears theacquiredflag under theacquiredMutex, maintaining thread-safe state management.protocol/protocol.go (2)
272-394: LGTM! Excellent shutdown-aware sendLoop implementation.The graceful shutdown logic in
sendLoopis well-designed:
- The
shuttingDownflag ensures clean message draining before exit- Messages already queued are processed even during shutdown, preventing data loss
- Multiple exit points properly handle the
recvDoneChansignal- The nested select at lines 303-309 correctly transitions to shutdown mode while processing the queue
This pattern prevents abrupt termination and ensures protocol compliance during shutdown.
175-193: LGTM! Well-documented API design for Stop() method.The
Stop()signature change to return error is good API design:
- Provides consistency with other Stop() methods across the codebase
- Comprehensive documentation explains the current nil return and future extensibility
- Allows callers to prepare for potential future failure modes
- Clear guidance for error checking
protocol/txsubmission/server_concurrency_test.go (1)
28-95: LGTM! Thorough concurrent shutdown test.The test correctly validates:
- Idempotent Stop() calls from 5 concurrent goroutines
- Proper use of sync.WaitGroup for coordination
- Timeout detection (5s) to catch deadlocks
- Error handling for each goroutine
The skip reason is documented and acceptable. The test will provide valuable coverage once the mock infrastructure issues are resolved.
protocol/chainsync/client.go (2)
136-177: LGTM! Race condition properly resolved.The Stop() implementation correctly addresses the race condition identified in the past review:
MsgDoneis sent only if the client was started- Protocol.Stop() is called to trigger shutdown
- Critical fix: Channel close is deferred via goroutine that waits for
DoneChan(), ensuring all message handlers complete before closingreadyForNextBlockChan- If never started, channels are closed immediately (no protocol to wait for)
This prevents the panic that could occur if message handlers were still writing to the channel during close. The fix properly sequences shutdown operations.
753-768: LGTM! Shutdown-aware message handler pattern.Both
handleRollForwardandhandleRollBackwardcorrectly implement shutdown coordination:
- All writes to
readyForNextBlockChanare wrapped inselectstatements- Detect protocol shutdown via
DoneChan()before sending- Return
protocol.ErrProtocolShuttingDownon shutdown detection- Handle both cancellation (false) and readiness (true) signals
This pattern ensures no writes occur to closed channels and provides clean error propagation during shutdown.
Also applies to: 792-808
protocol/localstatequery/client.go (2)
102-115: LGTM! Data race properly resolved.The Start() method correctly addresses the data race identified in the past review:
stateMutexprotects thestartedflag- Lock is acquired before setting
started = true- Lock is released before calling
Protocol.Start()to avoid holding mutex during potentially blocking operationThis matches the pattern used in other protocol clients and ensures thread-safe access to the started flag.
117-149: LGTM! Complete Stop() implementation with proper lifecycle handling.The Stop() method correctly implements shutdown:
- Sends
MsgDonemessage with error handling- Calls
Protocol.Stop()to trigger underlying protocol shutdown- Defers channel closure based on
startedstate:
- If started: goroutine waits for
DoneChan()before closing channels (prevents race with in-flight handlers)- If never started: closes channels immediately
This addresses the past review feedback and ensures proper synchronization during shutdown.
protocol/txsubmission/server.go (3)
93-116: LGTM! All critical race conditions properly resolved.The Stop() implementation addresses all issues from past reviews:
- restartMutex coordination: acquired before
onceStop.Do(), preventing race withhandleDone()- stopped flag: marks permanent stop under mutex protection
- doneMutex guards channel close: select statement prevents double-close panic
- Protocol.Stop() called to trigger underlying shutdown
This layered synchronization correctly handles concurrent Stop() calls, ongoing restarts, and channel lifecycle.
132-196: LGTM! Atomic operations and shutdown coordination implemented correctly.RequestTxIds properly addresses the concurrency issues:
- Atomic operations:
atomic.LoadInt32andatomic.StoreInt32for ackCount eliminate the data race identified in past reviews- restartMutex protection: safely captures result channel reference before blocking select
- Shutdown awareness: doneChan() check allows clean exit during protocol shutdown
- Closed channel handling: ok check returns
ErrProtocolShuttingDownon closed channelThe validation logic (negative/overflow checks) remains before unsafe conversions, maintaining safety.
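A rough sketch of that capture-then-wait shape, using hypothetical names rather than the server's exact code:

```go
package example

import (
	"errors"
	"sync"
)

type requester struct {
	mu         sync.Mutex
	resultChan chan int // replaced when the protocol restarts
	done       chan struct{}
}

// awaitResult snapshots the current result channel under the lock so a
// concurrent restart cannot swap it out from under the waiter, then waits
// for either a result or shutdown.
func (r *requester) awaitResult() (int, error) {
	r.mu.Lock()
	resultChan := r.resultChan
	done := r.done
	r.mu.Unlock()

	select {
	case <-done:
		return 0, errors.New("protocol is shutting down")
	case v, ok := <-resultChan:
		if !ok {
			return 0, errors.New("protocol is shutting down")
		}
		return v, nil
	}
}
```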
312-351: LGTM! Complex restart logic with proper TOCTOU protection.The handleDone() restart sequence is well-structured:
- Non-blocking signal: notifies RequestTxIds waiter without blocking
- First stopped check (line 314): early exit if Stop() was called
- doneMutex + select: prevents double-close race with Stop()
- Protocol reinitialization: fresh channels, atomic ackCount reset
- Second stopped check (line 345): TOCTOU protection before restart
- Restart outside lock: Start() called after releasing restartMutex for better responsiveness
This correctly handles the complex interplay between restart, stop, and ongoing requests.
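For illustration, a condensed sketch of the stop-flag re-check described above, under assumed names (not the server's actual restart code):

```go
package example

import "sync"

type server struct {
	mu      sync.Mutex
	stopped bool
}

// maybeRestart re-checks the stopped flag under the lock immediately
// before restarting, so a Stop() that raced with the restart wins.
func (s *server) maybeRestart(restart func() error) error {
	s.mu.Lock()
	if s.stopped {
		s.mu.Unlock()
		return nil
	}
	s.mu.Unlock()

	// ... reinitialize protocol state and channels here ...

	s.mu.Lock()
	stopped := s.stopped
	s.mu.Unlock()
	if stopped {
		// Stop() was called while we were reinitializing; do not restart.
		return nil
	}
	return restart()
}
```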
protocol/chainsync/client_test.go (2)
83-88: LGTM! Cleanup improved per past review feedback.The test cleanup now properly:
- Checks for client existence before calling Stop()
- Captures and logs Stop() errors (not fatal in cleanup)
- Addresses the missing error check identified in the past review
Using
t.Logf()is appropriate for cleanup as it informs without failing the test during teardown.
285-304: LGTM! Consistent shutdown test pattern.TestClientShutdown follows the established pattern:
- Uses runTest helper for consistent setup/teardown
- Minimal handshake conversation (sufficient for lifecycle test)
- Validates client existence, starts, then stops
- Checks Stop() error explicitly
- Leverages runTest's goleak verification and timeout handling
This provides solid coverage of the client shutdown behavior introduced in this PR.
protocol/peersharing/server_test.go (1)
28-82: LGTM! Server shutdown test with proper error handling.The test correctly implements:
- Documented skip reason for mock infrastructure limitation
- Goleak verification for goroutine leak detection
- NtN handshake conversation setup
- Async error monitoring for both mock and Ouroboros connections
- Stop() error checking (lines 60-62) per past review feedback
- Reasonable timeouts (2s for mock, 10s for connection)
Pattern is consistent with other server shutdown tests (e.g., txsubmission/server_test.go).
protocol/leiosnotify/client_test.go (2)
30-54: LGTM! Proper NtN version 15+ handshake setup.The custom handshake configuration is correct:
mockNtNVersionData()returnsVersionDataNtN13andUp(required for v13+)mockNtNVersionDataV11()sets appropriate fields (network magic, diffusion mode, peer sharing, query)conversationEntryNtNResponseV15constructs AcceptVersion with version 15 and proper version data- This addresses the "protocol initialization issues" mentioned in past review
The LeiosNotify protocol requires NtN v15+ per protocol/leiosnotify documentation, and this setup satisfies that requirement.
117-136: LGTM! Shutdown test with protocol-specific handshake.TestClientShutdown correctly:
- Uses the custom NtN v15 handshake (required for LeiosNotify)
- Follows the established test pattern (runTest helper)
- Validates client existence before starting
- Checks Stop() error explicitly
- Leverages runTest's goleak and timeout protections
This provides proper coverage of the LeiosNotify client shutdown behavior, resolving the past review concern about skipped tests.
```go
// Stop stops the KeepAlive client protocol
func (c *Client) Stop() error {
	c.onceStop.Do(func() {
		c.Protocol.Logger().
			Debug("stopping client protocol",
				"component", "network",
				"protocol", ProtocolName,
				"connection_id", c.callbackContext.ConnectionId.String(),
			)
		// Stop the keep-alive timer to prevent any further sends
		c.timerMutex.Lock()
		if c.timer != nil {
			c.timer.Stop()
			c.timer = nil
		}
		c.timerMutex.Unlock()
		msg := NewMsgDone()
		sendErr := c.SendMessage(msg)
		stopErr := c.Protocol.Stop()

		// Prefer the send error if present, otherwise return the stop error
		if sendErr != nil {
			c.stopErr = sendErr
		} else {
			c.stopErr = stopErr
		}
	})
	return c.stopErr
}
```
🧩 Analysis chain
Verify that the MsgDone race condition is resolved.
The previous critical review identified that SendMessage(msg) followed immediately by Protocol.Stop() can race: Protocol.Stop() triggers sendLoop to exit via recvDoneChan before processing queued messages in sendQueueChan, potentially losing the MsgDone message.
The past review was marked as "✅ Addressed" but the code pattern remains unchanged. Please confirm that either:
- Protocol.Stop() was modified to drain sendQueueChan before exiting, or
- This race is acceptable because the peer will detect connection closure
Run this script to verify the current sendLoop behavior:
🏁 Script executed:
#!/bin/bash
# Check if sendLoop drains sendQueueChan before exiting when recvDoneChan closes
rg -n "func.*sendLoop" protocol/protocol.go -A 80 | head -100Length of output: 2268
I can see the code was truncated. Let me retrieve the complete sendLoop implementation and then verify the shutdown flow:
I need to examine the complete sendLoop function and the Protocol.Stop() implementation to verify the drain logic:
I need to see the complete sendLoop function and the Protocol.Stop() implementation to verify the drain logic thoroughly:
🏁 Script executed:
#!/bin/bash
# Get complete sendLoop function
rg -n "func.*sendLoop" protocol/protocol.go -A 150 | head -200Length of output: 4472
Now I need to examine the Protocol.Stop() implementation to understand the complete shutdown sequence and how recvDoneChan is closed:
🏁 Script executed:
#!/bin/bash
# Get Protocol.Stop() implementation
rg -n "func.*\bStop\(\)" protocol/protocol.go -A 50 | head -80Length of output: 1523
I need to find where recvDoneChan is actually closed and get the complete SendMessage implementation:
🏁 Script executed:
#!/bin/bash
# Find where recvDoneChan is closed
rg -n "recvDoneChan" protocol/protocol.go -B2 -A2Length of output: 927
I need to see the recvLoop implementation to understand what causes it to exit and close recvDoneChan:
🏁 Script executed:
#!/bin/bash
# Get recvLoop implementation
rg -n "func.*recvLoop" protocol/protocol.go -A 100 | head -150Length of output: 2638
Let me verify that sendQueueChan is never closed (which would be the proper way to signal shutdown):
🏁 Script executed:
#!/bin/bash
# Check if sendQueueChan is ever closed
rg -n "close.*sendQueueChan|sendQueueChan.*close" protocol/Length of output: 0
🏁 Script executed:
#!/bin/bash
# Get complete SendMessage implementation
rg -n "func.*SendMessage" protocol/protocol.go -A 40Length of output: 1351
Race condition persists; sendLoop uses unsafe len() to decide shutdown, not draining the queue reliably.
The review's concern is valid. While the code appears to implement a drain-before-exit pattern, it relies entirely on unsafe channel length checks for synchronization.
Critical issue:
Client.Stop()callsSendMessage(MsgDone)which asynchronously queues the message (line 256 in protocol.go), then immediately callsProtocol.Stop()Protocol.Stop()triggersUnregisterProtocol(), which closesmuxerDoneChan, causingrecvLoopto exit and closerecvDoneChansendLoopdetectsrecvDoneChanclose, setsshuttingDown=true, then checksif shuttingDown && len(p.sendQueueChan) == 0 { return }(line 292)- Race window:
len()is not atomic with respect to concurrent sends;MsgDonecan be added to the queue after the length check but before the return statement, causing it to be lost
The len(channel) pattern for shutdown synchronization is unsafe in Go. There is no acquire/release semantics, only a relaxed read. sendQueueChan is never closed, so the !ok exit path never triggers. All shutdown paths depend on these unsafe len() checks at lines 292, 307, and 362.
To fix this properly, either:
- Close
sendQueueChanwhen shutdown begins, allowingsendLoopto exit reliably via the!okpath, or - Implement proper synchronization (mutex, atomic flag with acquire semantics, or sync.Cond) to safely coordinate the drain
🤖 Prompt for AI Agents
In protocol/keepalive/client.go around lines 98-126, Stop() currently sends
MsgDone and calls Protocol.Stop() but relies on unsafe len(sendQueueChan) checks
in sendLoop; to fix, implement a proper shutdown by marking shutdown under a
mutex/once and closing the sendQueueChan so sendLoop can reliably range
over/drain the channel and exit via the closed-channel path. Specifically:
ensure Stop sets a shutting-down flag and closes sendQueueChan exactly once (use
c.onceStop or a dedicated once for channel close), have SendMessage return an
error if the client is shutting down (to avoid sending on closed channel), and
keep Stop waiting for sendLoop to finish (e.g., wait on the existing stop/wait
channel) before calling Protocol.Stop and returning the combined errors.
13c60d2 to 20d440a
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
protocol/leiosfetch/client.go (1)
92-145: Guard Stop()'s SendMessage for never-started clients to avoid potential deadlock
Stop() currently sends MsgDone regardless of whether the client was ever started:
```go
msg := NewMsgDone()
if sendErr := c.SendMessage(msg); sendErr != nil {
	// Preserve the SendMessage error but still shut down the protocol.
	err = sendErr
}
```
If Start() was never called, the underlying protocol send loop may not be running, so SendMessage can block indefinitely. This is the same pattern that previously caused hangs in leiosnotify.Client.Stop() before it was changed to send MsgDone only when the client was started. Given you already track lifecycle via started and stopped, it would be safer to gate the send:
```diff
 func (c *Client) Stop() error {
 	var err error
 	c.onceStop.Do(func() {
 		c.Protocol.Logger().
 			Debug("stopping client protocol",
 				"component", "network",
 				"protocol", ProtocolName,
 				"connection_id", c.callbackContext.ConnectionId.String(),
 			)
-		msg := NewMsgDone()
-		if sendErr := c.SendMessage(msg); sendErr != nil {
-			// Preserve the SendMessage error but still shut down the protocol.
-			err = sendErr
-		}
+		msg := NewMsgDone()
+		// Only send MsgDone if the protocol was actually started; otherwise
+		// avoid blocking on a send queue that is not being drained.
+		if c.started.Load() {
+			if sendErr := c.SendMessage(msg); sendErr != nil {
+				// Preserve the SendMessage error but still shut down the protocol.
+				err = sendErr
+			}
+		}
 		// Always attempt to stop the protocol so DoneChan and muxer shutdown complete.
 		_ = c.Protocol.Stop()
 		// Defer closing channels until protocol fully shuts down (only if started)
 		if c.started.Load() {
 			go func() {
 				<-c.DoneChan()
 				close(c.blockResultChan)
 				close(c.blockTxsResultChan)
 				close(c.votesResultChan)
 				close(c.blockRangeResultChan)
 			}()
 		} else {
 			// If protocol was never started, close channels immediately
 			close(c.blockResultChan)
 			close(c.blockTxsResultChan)
 			close(c.votesResultChan)
 			close(c.blockRangeResultChan)
 		}
 		c.started.Store(false)
 		c.stopped.Store(true)
 	})
 	return err
 }
```
This keeps the new lifecycle semantics but avoids hangs when
Stop() is invoked on a never-started client.
🧹 Nitpick comments (2)
protocol/protocol.go (1)
420-440: sendMessage helper is shutdown‑only; pendingBytes mismatch is acceptableUsing
sendMessageonly during shutdown to push any remaining messages directly to the muxer simplifies the drain path. It doesn’t adjustpendingSendBytesor transition state, but since it runs only after shutdown has been initiated, this inconsistency can’t affect normal operation. If you ever expose Stop as a recoverable operation, consider aligning this path with the main send accounting.protocol/txsubmission/server.go (1)
286-351: handleDone restart sequence is well‑structured; minor future‑proofing nit
handleDonenow:
- Non‑blockingly notifies any in‑flight
RequestTxIdsviaErrStopServerProcess,- Invokes the user
DoneFunc,- Under
restartMutex, closes the currentdonechannel (withdoneMutex), stops the current protocol, re‑creates the protocol and per‑request channels, resetsackCount, and- Skips restart entirely if
stoppedis (or becomes) true.This addresses earlier races on
onceStop,done, andackCount. The only minor nit is that ifs.Protocol.Stop()ever returns a non‑nil error in the future, this path will return while still holdingrestartMutex; if you later makeStop()fallible, consider capturing the error, unlocking, and then returning.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (34)
protocol/blockfetch/blockfetch.go(1 hunks)protocol/blockfetch/client.go(7 hunks)protocol/blockfetch/client_test.go(1 hunks)protocol/blockfetch/server.go(1 hunks)protocol/chainsync/chainsync.go(1 hunks)protocol/chainsync/client.go(6 hunks)protocol/chainsync/client_concurrency_test.go(1 hunks)protocol/chainsync/client_test.go(2 hunks)protocol/chainsync/error.go(1 hunks)protocol/chainsync/server.go(1 hunks)protocol/keepalive/client.go(2 hunks)protocol/keepalive/client_test.go(1 hunks)protocol/keepalive/keepalive.go(1 hunks)protocol/leiosfetch/client.go(5 hunks)protocol/leiosfetch/server.go(1 hunks)protocol/leiosnotify/client.go(3 hunks)protocol/leiosnotify/client_concurrency_test.go(1 hunks)protocol/leiosnotify/client_test.go(1 hunks)protocol/leiosnotify/server.go(1 hunks)protocol/localstatequery/client.go(6 hunks)protocol/localstatequery/client_test.go(3 hunks)protocol/localtxmonitor/client.go(15 hunks)protocol/localtxmonitor/client_test.go(2 hunks)protocol/localtxsubmission/client.go(6 hunks)protocol/localtxsubmission/client_test.go(2 hunks)protocol/localtxsubmission/localtxsubmission.go(2 hunks)protocol/peersharing/client.go(4 hunks)protocol/peersharing/client_test.go(1 hunks)protocol/peersharing/server.go(4 hunks)protocol/peersharing/server_test.go(1 hunks)protocol/protocol.go(8 hunks)protocol/txsubmission/server.go(10 hunks)protocol/txsubmission/server_concurrency_test.go(1 hunks)protocol/txsubmission/server_test.go(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (14)
- protocol/blockfetch/blockfetch.go
- protocol/blockfetch/server.go
- protocol/txsubmission/server_concurrency_test.go
- protocol/chainsync/chainsync.go
- protocol/leiosnotify/client_test.go
- protocol/leiosnotify/client_concurrency_test.go
- protocol/localtxsubmission/client_test.go
- protocol/peersharing/client.go
- protocol/keepalive/client_test.go
- protocol/leiosfetch/server.go
- protocol/peersharing/client_test.go
- protocol/localtxsubmission/localtxsubmission.go
- protocol/blockfetch/client_test.go
- protocol/chainsync/client_concurrency_test.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-04T15:54:22.683Z
Learnt from: wolf31o2
Repo: blinklabs-io/gouroboros PR: 1093
File: ledger/mary/pparams.go:143-150
Timestamp: 2025-11-04T15:54:22.683Z
Learning: In the blinklabs-io/gouroboros repository, the design goal for CBOR round-trip tests is to achieve byte-identical encoding WITHOUT using stored CBOR (cbor.DecodeStoreCbor). Instead, the approach uses proper field types (pointers for optional fields) and relies on the cbor package's deterministic encoding (SortCoreDeterministic) to ensure reproducible output. The stored CBOR pattern should not be suggested as a solution for round-trip fidelity in this codebase.
Applied to files:
protocol/localstatequery/client_test.go
🧬 Code graph analysis (16)
protocol/protocol.go (3)
protocol/message.go (1)
Message(18-22)cbor/encode.go (1)
Encode(27-40)muxer/segment.go (1)
NewSegment(48-69)
protocol/leiosfetch/client.go (5)
protocol/protocol.go (1)
Protocol(39-60)protocol/leiosfetch/leiosfetch.go (1)
ProtocolName(26-26)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)protocol/blockfetch/client.go (1)
Client(30-41)protocol/chainsync/client.go (1)
Client(29-47)
protocol/localstatequery/client_test.go (10)
protocol/blockfetch/client_test.go (1)
TestClientShutdown(211-230)protocol/chainsync/client_test.go (1)
TestClientShutdown(285-304)protocol/keepalive/client_test.go (1)
TestClientShutdown(303-322)protocol/leiosnotify/client_test.go (1)
TestClientShutdown(117-136)protocol/localtxmonitor/client_test.go (1)
TestClientShutdown(300-319)protocol/localtxsubmission/client_test.go (1)
TestClientShutdown(167-186)protocol/peersharing/client_test.go (1)
TestClientShutdown(95-114)connection.go (1)
Connection(59-103)protocol/localstatequery/localstatequery.go (1)
LocalStateQuery(116-119)protocol/localstatequery/client.go (1)
Client(30-47)
protocol/leiosnotify/client.go (9)
protocol/protocol.go (1)
Protocol(39-60)protocol/chainsync/client.go (1)
Client(29-47)protocol/leiosfetch/client.go (1)
Client(26-38)protocol/localstatequery/client.go (1)
Client(30-47)protocol/localtxmonitor/client.go (1)
Client(25-42)protocol/localtxsubmission/client.go (1)
Client(26-37)protocol/peersharing/client.go (1)
Client(25-35)protocol/leiosnotify/messages.go (1)
NewMsgDone(149-156)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)
protocol/localtxsubmission/client.go (5)
protocol/localtxsubmission/localtxsubmission.go (2)
Config(81-84)CallbackContext(87-91)protocol/protocol.go (1)
Protocol(39-60)protocol/localstatequery/client.go (1)
Client(30-47)protocol/localtxsubmission/messages.go (1)
MessageTypeDone(29-29)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)
protocol/chainsync/client_test.go (4)
protocol/chainsync/chainsync.go (1)
ChainSync(204-207)protocol/chainsync/client.go (1)
Client(29-47)protocol/blockfetch/client_test.go (1)
TestClientShutdown(211-230)connection.go (1)
Connection(59-103)
protocol/peersharing/server_test.go (5)
protocol/txsubmission/server_test.go (1)
TestServerShutdown(28-82)connection.go (2)
NewConnection(107-130)Connection(59-103)connection_options.go (3)
WithConnection(36-40)WithNetworkMagic(50-54)WithNodeToNode(78-82)protocol/peersharing/peersharing.go (1)
PeerSharing(67-70)protocol/peersharing/server.go (1)
Server(26-32)
protocol/keepalive/client.go (3)
protocol/blockfetch/client.go (1)
Client(30-41)protocol/protocol.go (1)
Protocol(39-60)protocol/keepalive/messages.go (1)
NewMsgDone(94-101)
protocol/blockfetch/client.go (3)
protocol/protocol.go (2)
Protocol(39-60)New(122-133)protocol/blockfetch/blockfetch.go (1)
New(156-162)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)
protocol/peersharing/server.go (3)
protocol/txsubmission/server.go (1)
Server(28-41)protocol/protocol.go (1)
Protocol(39-60)protocol/peersharing/peersharing.go (1)
ProtocolName(27-27)
protocol/txsubmission/server_test.go (5)
protocol/peersharing/server_test.go (1)
TestServerShutdown(28-82)connection.go (2)
NewConnection(107-130)Connection(59-103)connection_options.go (3)
WithConnection(36-40)WithNetworkMagic(50-54)WithNodeToNode(78-82)protocol/txsubmission/txsubmission.go (1)
TxSubmission(126-129)protocol/txsubmission/server.go (1)
Server(28-41)
protocol/chainsync/client.go (10)
protocol/protocol.go (1)
Protocol(39-60)protocol/blockfetch/client.go (1)
Client(30-41)protocol/leiosfetch/client.go (1)
Client(26-38)protocol/leiosnotify/client.go (1)
Client(24-34)protocol/localstatequery/client.go (1)
Client(30-47)protocol/localtxmonitor/client.go (1)
Client(25-42)protocol/localtxsubmission/client.go (1)
Client(26-37)protocol/peersharing/client.go (1)
Client(25-35)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)protocol/chainsync/error.go (1)
ErrSyncCancelled(26-26)
protocol/localstatequery/client.go (5)
protocol/protocol.go (1)
Protocol(39-60)protocol/localtxmonitor/client.go (1)
Client(25-42)protocol/localstatequery/messages.go (4)
NewMsgDone(245-252)AcquireFailurePointNotOnChain(44-44)MsgResult(172-175)NewMsgQuery(160-170)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)protocol/localstatequery/error.go (2)
ErrAcquireFailurePointTooOld(20-20)ErrAcquireFailurePointNotOnChain(23-25)
protocol/txsubmission/server.go (5)
protocol/txsubmission/messages.go (2)
TxBody(197-201)NewMsgRequestTxIds(68-82)protocol/protocol.go (1)
Protocol(39-60)protocol/txsubmission/txsubmission.go (2)
ProtocolName(27-27)MaxAckCount(143-143)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)protocol/txsubmission/error.go (1)
ErrStopServerProcess(21-21)
protocol/keepalive/keepalive.go (3)
protocol/keepalive/messages.go (1)
MessageTypeDone(30-30)protocol/state.go (1)
NewState(38-43)protocol/blockfetch/blockfetch.go (1)
StateDone(46-46)
protocol/localtxmonitor/client_test.go (3)
connection.go (1)
Connection(59-103)protocol/localtxmonitor/localtxmonitor.go (1)
LocalTxMonitor(112-115)protocol/localtxmonitor/client.go (1)
Client(25-42)
🔇 Additional comments (43)
protocol/chainsync/error.go (1)
25-26: NewErrSyncCancelledconstant looks goodThe added error and comment are consistent with the existing error style in this package and give a clear sentinel for cancelled sync operations. No changes needed.
protocol/localstatequery/client_test.go (1)
25-25: LocalStateQuery shutdown test and timeout adjustments are consistent
- Using the
ouroborosalias aligns this file with other protocol tests.- Bumping the
runTesttimeout to 5s should help avoid flakiness while still catching hangs.TestClientShutdownfollows the same Start/Stop and error-check pattern used by other protocol clients and gives good coverage of the new lifecycle behavior.No issues from this side.
Also applies to: 112-112, 357-376
protocol/peersharing/server_test.go (1)
28-81: PeerSharing server shutdown test matches established patternThe new
TestServerShutdownmirrors the txsubmission server test pattern (mock NtN connection, Start/Stop, async error monitoring, shutdown timeouts, goleak). Error fromServer.Stop()is now checked, which aligns with prior feedback and avoids masking shutdown failures. Skipping the test while NtN mock issues are unresolved is reasonable.Looks solid.
protocol/leiosfetch/client.go (1)
242-285: Shutdown-aware handlers look correctThe updated handlers that
selectonc.DoneChan()vs sending to the result channels are well-structured:
- They avoid blocking indefinitely during shutdown.
- They prevent send-on-closed-channel panics because channels are only closed after
DoneChanfires.- Returning
protocol.ErrProtocolShuttingDownaligns with the caller-side logic that interprets closed channels/shutdown consistently.No further changes needed here.
protocol/leiosnotify/client.go (1)
31-33: LeiosNotify client lifecycle and shutdown logic now looks robustThe revised client lifecycle handling addresses the earlier race conditions:
stateMutex+started/stoppedensure:
- Stop-before-Start is safe and prevents later Start.
Start()andStop()don’t race on the state fields.Stop()sendsMsgDoneonly whenstartedis true and defers closingrequestNextChanuntilDoneChanwhen the protocol was running, which avoids send-on-closed panics while still unblockingRequestNextwithErrProtocolShuttingDown.- The handlers’
selectonDoneChan()vsrequestNextChanmirrors the shutdown-aware pattern used in other protocols and correctly short-circuits when the protocol is shutting down.This implementation looks correct and aligned with the broader lifecycle pattern in the PR.
Also applies to: 73-90, 93-123, 160-194
protocol/blockfetch/client.go (3)
21-21: LGTM: Lifecycle tracking with atomic.Bool is thread-safe.The addition of
started atomic.Booland its use inStart()correctly tracks whether the protocol was started, enabling proper conditional cleanup inStop(). Usingatomic.Boolavoids data races without needing additional mutex synchronization.Also applies to: 40-40, 98-98
114-129: LGTM: Shutdown sequencing is correct and race-free.The Stop() implementation properly:
- Handles SendMessage errors without losing them
- Always calls Protocol.Stop() to signal muxer shutdown
- Defers channel closure until protocol fully stops (after DoneChan signals), avoiding send-on-closed panics
- Closes channels immediately if never started, preventing goroutine leaks
This resolves the previously identified deadlock and race conditions.
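A short sketch of the atomic.Bool lifecycle guard under assumed names (illustrative, not the client's exact code):

```go
package example

import (
	"errors"
	"sync/atomic"
)

type client struct {
	started atomic.Bool
	stopped atomic.Bool
}

// Start refuses to run once Stop has been called and records that the
// protocol was started, so Stop can decide whether cleanup must wait.
func (c *client) Start() error {
	if c.stopped.Load() {
		return errors.New("cannot restart a stopped client")
	}
	c.started.Store(true)
	// ... start the underlying protocol here ...
	return nil
}

func (c *client) Stop() error {
	c.stopped.Store(true)
	if c.started.Load() {
		// ... defer channel cleanup until the protocol signals done ...
	}
	return nil
}
```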
237-241: LGTM: Shutdown-aware channel operations prevent panics.All message handlers correctly use
selectwithDoneChan()to avoid sending to closed channels during shutdown. This pattern ensures handlers returnErrProtocolShuttingDowninstead of panicking when the protocol is stopping.Also applies to: 254-259, 309-313
protocol/localtxsubmission/client.go (4)
34-36: LGTM: Proper lifecycle state tracking with mutex protection.The addition of
stateMutex,started, andcloseSubmitResultOncefields provides thread-safe lifecycle management. Start() correctly guards thestartedflag withstateMutex, preventing data races between Start/Stop called from different goroutines.Also applies to: 79-88
97-123: LGTM: Stop() properly sequences shutdown and prevents channel closure races.Stop() correctly:
- Guards state changes with
stateMutex- Handles SendMessage errors gracefully
- Always calls Protocol.Stop() to ensure muxer cleanup
- Defers channel closure until protocol shutdown (only if started)
- Uses
closeSubmitResultChan()helper to ensure exactly-once closureThis prevents the previously identified TOCTOU race between channel checks and closure.
157-158: LGTM: Server-initiated shutdown properly handled.The addition of
MessageTypeDonehandling andhandleDone()allows the server to cleanly shut down the client by closing the result channel. This complements the client-initiated Stop() path.Also applies to: 210-221
177-181: LGTM: Message handlers use shutdown-aware channel operations.Both
handleAcceptTx()andhandleRejectTx()correctly useselectwithDoneChan()to avoid sending to closed channels during shutdown, preventing panics and ensuring clean shutdown behavior.Also applies to: 202-206
protocol/localtxmonitor/client.go (4)
30-30: LGTM: Thread-safe lifecycle management with multiple mutexes.The addition of
acquiredMutex,stateMutex,started, andstoppedfields provides proper synchronization for lifecycle state. Start() correctly guards thestartedflag, preventing races between concurrent Start/Stop calls.Also applies to: 39-41, 91-101
109-145: LGTM: Stop() avoids deadlocks with careful lock ordering.Stop() correctly:
- Marks
stoppedunderstateMutexfirst- Acquires
busyMutexonly to serialize with SendMessage- Releases
busyMutexbefore calling Protocol.Stop() - crucial to avoid potential deadlocks- Defers channel closure until protocol shutdown completes
The lock release before Protocol.Stop() is especially important and properly implemented.
152-157: LGTM: Operations guard against post-shutdown invocation.All public operations (Acquire, Release, HasTx, NextTx, GetSizes) correctly check the
stoppedflag understateMutexand returnErrProtocolShuttingDownearly, preventing wasted work and ensuring consistent error reporting after Stop().Also applies to: 173-178, 194-199, 231-236, 268-273
210-213: LGTM: Acquired state properly synchronized.All reads and writes to the
acquiredboolean are correctly guarded byacquiredMutex, and message handlers useselectwithDoneChan()for shutdown-aware channel operations. This prevents races on the acquired state and ensures clean shutdown.Also applies to: 247-250, 284-287, 333-341, 414-416
protocol/localtxmonitor/client_test.go (2)
93-93: LGTM: Timeout increase accommodates shutdown complexity.Increasing the mock connection timeout from 2 to 5 seconds is reasonable given the added shutdown synchronization across the PR, which may require more time in test environments.
300-319: LGTM: New test validates client lifecycle.TestClientShutdown properly exercises the Start/Stop lifecycle with a minimal handshake conversation, ensuring the client can be started and stopped cleanly without leaking goroutines (verified by goleak).
protocol/localstatequery/client.go (5)
38-38: LGTM: Proper synchronization for lifecycle state.The addition of
acquiredMutex,onceStop,stateMutex, andstartedprovides thread-safe lifecycle management. Start() correctly guards thestartedflag withstateMutex, matching the pattern used in other protocol clients.Also applies to: 44-46, 110-114
117-149: LGTM: Stop() properly introduced with correct sequencing.The new Stop() method correctly:
- Sends Done message with error handling
- Calls Protocol.Stop() to ensure muxer cleanup
- Defers channel closure until protocol shutdown (only if started)
- Returns any SendMessage error to the caller
This resolves the previously identified issue where Stop() was missing and DoneChan would never close.
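A condensed sketch of the defer-close-until-done idea, with hypothetical names:

```go
package example

// stopAndClose waits in the background for the protocol's done channel to
// fire before closing the result channel, so in-flight handlers can never
// send on a closed channel. If the protocol was never started, no handlers
// can be running and the channel is closed immediately.
func stopAndClose(started bool, done <-chan struct{}, results chan int) {
	if started {
		go func() {
			<-done
			close(results)
		}()
		return
	}
	close(results)
}
```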
152-175: LGTM: Nil routing and explicit acquire methods improve API usability.Acquire() now accepts
niland routes toAcquireVolatileTip(), providing a sensible default. The addition of explicitAcquireVolatileTip()andAcquireImmutableTip()methods gives users more control over acquisition targets while maintaining backward compatibility.Also applies to: 177-205
913-922: LGTM: Message handlers use shutdown-aware operations.All handlers (
handleAcquired,handleFailure,handleResult) correctly:
- Guard shared state (
acquired) withacquiredMutex- Use
selectwithDoneChan()to avoid sending to closed channels during shutdownThis ensures clean shutdown behavior and prevents panics.
Also applies to: 936-946, 961-967
971-1006: LGTM: Internal helpers properly synchronized.The
acquire(),release(), andrunQuery()helpers correctly read and write theacquiredstate underacquiredMutex, ensuring thread-safe coordination of the acquisition lifecycle.Also applies to: 1013-1017, 1022-1029
protocol/chainsync/server.go (1)
245-247: LGTM: Propagating Stop() error makes restart failures observable.The change to check and return the error from
Stop()before reinitializing and restarting the protocol ensures that shutdown failures are properly surfaced to callers rather than silently ignored. This aligns with the broader error-propagation improvements across the PR.protocol/keepalive/keepalive.go (1)
77-80: LGTM: State machine updated to handle server-side Done transition.Adding the
MessageTypeDonetransition fromStateServertoStateDoneprovides an explicit completion path for server-initiated shutdown. This complements the existing client-initiated Done path and aligns with the broader shutdown handling improvements across the PR.protocol/chainsync/client_test.go (3)
83-88: LGTM: Test cleanup prevents goroutine leaks.Adding client.Stop() to the runTest cleanup ensures the ChainSync client is properly stopped after each test, preventing goroutine leaks. Logging Stop() errors rather than failing the test is appropriate for cleanup code, as tests may have already completed their assertions.
95-95: LGTM: Timeout increase accommodates shutdown complexity.Increasing the timeout from 2 to 5 seconds is consistent with similar changes across other protocol tests and accommodates the additional shutdown synchronization introduced in this PR.
285-304: LGTM: New test validates client shutdown behavior.TestClientShutdown properly exercises the Start/Stop lifecycle with a minimal handshake conversation, ensuring the ChainSync client can be started and stopped cleanly. The test correctly asserts that Stop() succeeds without error.
protocol/leiosnotify/server.go (1)
109-123: handleDone now correctly propagates Stop() errorsCalling
s.Stop()and returning its error before reinitializing ensures any future Stop failures abort the restart sequence instead of being silently ignored. GivenProtocol.Stop()is currently infallible, this is a safe, forward‑compatible change.protocol/txsubmission/server_test.go (1)
28-82: Server shutdown test scaffold looks goodThe shutdown test wiring (mock connection, async error funnel, time‑bounded waits, goleak gate) matches the existing pattern used for other protocols and should be effective once the NtN mock issues are resolved and the skip is removed.
protocol/peersharing/server.go (2)
17-32: Server.Stop is now idempotent and propagates protocol errorsAdding
onceStopand returning the result ofs.Protocol.Stop()gives you an idempotent server‑level Stop with proper error propagation, which aligns with the other mini‑protocols.Also applies to: 49-55
119-135: handleDone restart path avoids onceStop interaction
Switching `handleDone` to call `s.Protocol.Stop()` directly (with a nil check) before re-initializing and starting the protocol fixes the earlier mismatch between `Server.onceStop` and per-restart shutdown, while still surfacing Stop errors if they ever occur.
protocol/chainsync/client.go (3)
28-47: Start/Stop lifecycle flags correctly prevent unsafe reuse
Using atomic `started`/`stopped` to:
- Block `Start()` after `Stop()`, and
- Only send `MsgDone`/close `readyForNextBlockChan` when the protocol has actually started
gives you a clean, idempotent lifecycle and removes the prior risk of closing `readyForNextBlockChan` while handlers were still active.
Also applies to: 120-177
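As a rough illustration of that guard, a generic lifecycle type might look like the following; the flag and method names mirror the discussion above, but none of this is the actual chainsync client:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// lifecycle shows the started/stopped bookkeeping in isolation.
type lifecycle struct {
	onceStart sync.Once
	onceStop  sync.Once
	started   atomic.Bool
	stopped   atomic.Bool
}

func (l *lifecycle) Start() {
	l.onceStart.Do(func() {
		if l.stopped.Load() {
			return // Start() after Stop() is a no-op
		}
		l.started.Store(true)
		fmt.Println("protocol started")
	})
}

func (l *lifecycle) Stop() error {
	l.onceStop.Do(func() {
		l.stopped.Store(true)
		if l.started.Load() {
			// Only send Done / close result channels when the
			// protocol actually ran.
			fmt.Println("sending Done and deferring channel close")
		}
	})
	return nil
}

func main() {
	var l lifecycle
	l.Start()
	_ = l.Stop()
	_ = l.Stop() // idempotent: the second call does nothing
}
```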
356-366: GetAvailableBlockRange now differentiates shutdown vs. cancellation
Handling `ok == false` as `protocol.ErrProtocolShuttingDown`, and `ready == false` as `ErrSyncCancelled`, makes the caller's error semantics much clearer. Just ensure upstream users treat `ErrSyncCancelled` as a "local abort" and `ErrProtocolShuttingDown` as a global shutdown signal.
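A caller-side sketch of how the two sentinels might be distinguished, assuming both are plain sentinel errors that `errors.Is` can match (the surrounding call is elided, since only the error classification matters here):

```go
// err is whatever the chainsync call returned.
if err != nil {
	switch {
	case errors.Is(err, protocol.ErrProtocolShuttingDown):
		// Global shutdown: stop using this connection's consumers entirely.
	case errors.Is(err, chainsync.ErrSyncCancelled):
		// Local cancellation: this sync was aborted, but the connection may live on.
	default:
		// Any other, unexpected failure.
	}
}
```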
750-769: Done-aware sends to readyForNextBlockChan fix send-on-closed races
Wrapping writes to `readyForNextBlockChan` in:
```
select {
case <-c.DoneChan():
	return protocol.ErrProtocolShuttingDown
case c.readyForNextBlockChan <- {true|false}:
}
```
ensures handlers either respect shutdown or safely signal readiness/cancellation, eliminating the earlier race where `Stop()` could close the channel while message handlers were still sending.
Also applies to: 792-808
protocol/protocol.go (4)
175-193: Stop() API now future-proof while remaining effectively no-op
Changing `Stop()` to return `error` but keeping it infallible for now gives you room to introduce failure modes later without another API break. Callers that either ignore or log the error remain correct under the current implementation.
225-257: Error path Stop() invocation in SendMessage is consistent
On pending-byte limit violations, calling `p.SendError(ErrProtocolViolationQueueExceeded)` and then `_ = p.Stop()` is consistent with treating this as a protocol-fatal condition while keeping the Stop error non-disruptive for callers.
272-307: sendLoop shutdown/drain behavior resolves prior MsgDone race
The new `shuttingDown` path that drains `sendQueueChan` whenever `recvDoneChan` is closed ensures all queued messages (including `MsgDone`) are flushed before exiting, instead of relying on racy `len()` checks to decide when to stop. The remaining `len(p.sendQueueChan)` use is now only for batching and no longer affects correctness.
Also applies to: 309-388
510-515: readLoop's Stop() on recv-queue violations matches send side
On receive-queue limit violations, emitting `ErrProtocolViolationQueueExceeded` and then calling `_ = p.Stop()` mirrors the send-side behavior and keeps the protocol shutdown semantics consistent.
protocol/txsubmission/server.go (4)
27-41: Server lifecycle fields and Stop()/IsStopped() coordination look solid
The combination of:
- `ackCount` as `int32` with atomics,
- a dedicated `done` channel protected by `doneMutex`,
- `onceStop` for idempotent Stop, and
- `restartMutex` + `stopped` flag
gives the server a clear lifecycle model. `Stop()` now safely coordinates with restarts (via `restartMutex`) and avoids the previous double-close race on `done`.
Also applies to: 49-57, 93-130
156-171: RequestTxIds now has race-free ackCount and restart-aware waits
Using `atomic.LoadInt32`/`StoreInt32` for `ackCount` removes the earlier data race, and taking a snapshot of `requestTxIdsResultChan` under `restartMutex` before waiting ensures calls either see a result for the current protocol instance or cleanly bail out via the `done` channel on shutdown.
Also applies to: 178-195
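The "snapshot under the lock, wait outside it" idea can be shown in isolation; the types below are placeholders rather than the real txsubmission server:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errShuttingDown = errors.New("protocol is shutting down")

// server sketches the guarded-wait pattern: a concurrent restart may swap
// resultChan for a fresh channel, so callers must capture the current one
// under the restart lock before blocking on it.
type server struct {
	restartMutex sync.Mutex
	resultChan   chan string
	done         chan struct{}
}

func (s *server) request() (string, error) {
	// Snapshot the channel that belongs to the *current* protocol instance.
	s.restartMutex.Lock()
	resultChan := s.resultChan
	s.restartMutex.Unlock()

	// Wait outside the lock, racing the result against shutdown.
	select {
	case <-s.done:
		return "", errShuttingDown
	case res := <-resultChan:
		return res, nil
	}
}

func main() {
	s := &server{resultChan: make(chan string, 1), done: make(chan struct{})}
	s.resultChan <- "tx ids"
	fmt.Println(s.request()) // receives the queued result

	close(s.done)
	fmt.Println(s.request()) // now bails out with the shutdown error
}
```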
219-230: RequestTxs adopts the same guarded wait semantics
Mirroring the `RequestTxIds` pattern (snapshotting `requestTxsResultChan` under `restartMutex` and racing it against `doneChan()`) makes `RequestTxs` restart- and shutdown-aware in the same way, which is good for consistency.
263-268: Reply handlers synchronize with restarts via restartMutex
Locking `restartMutex` while sending to `requestTxIdsResultChan`/`requestTxsResultChan` ensures these sends are coordinated with channel reallocation in `handleDone`, removing the prior risk of writing into a channel that has just been swapped during a restart.
Also applies to: 280-283
Force-pushed from 20d440a to 45dcdee (Compare)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
protocol/localstatequery/client.go (1)
29-47: Stop() can block forever if called before Start(); guard it on `started`.
The new `stateMutex`/`started` state fixes the prior data race, but `Stop()` still assumes the underlying protocol has been started:
- `Start()` initializes internal protocol state and sets `started = true` under `stateMutex`.
- `Stop()` always does `msg := NewMsgDone(); err = c.SendMessage(msg)` before checking `started` (and always calls `c.Protocol.Stop()`).
If `Stop()` is called before `Start()`, `SendMessage` will send on a nil `sendQueueChan` and block indefinitely, since `Protocol.Start()` was never called to create the send queue or run `sendLoop`. That's the same class of issue that was addressed in other clients via `started`/`atomic.Bool` guards.
To make Stop safe and idempotent regardless of call order, gate the Done send and Protocol.Stop on
started, for example:func (c *Client) Stop() error { var err error c.onceStop.Do(func() { c.Protocol.Logger(). Debug("stopping client protocol", "component", "network", "protocol", ProtocolName, "connection_id", c.callbackContext.ConnectionId.String(), ) - msg := NewMsgDone() - if err = c.SendMessage(msg); err != nil { - return - } - _ = c.Protocol.Stop() // Error ignored - method returns SendMessage error - // Defer closing channels until protocol fully shuts down (only if started) - c.stateMutex.Lock() - started := c.started - c.stateMutex.Unlock() - if started { + c.stateMutex.Lock() + started := c.started + c.stateMutex.Unlock() + if started { + msg := NewMsgDone() + if err = c.SendMessage(msg); err != nil { + return + } + _ = c.Protocol.Stop() go func() { <-c.DoneChan() close(c.queryResultChan) close(c.acquireResultChan) }() } else { // If protocol was never started, close channels immediately close(c.queryResultChan) close(c.acquireResultChan) } }) return err }This keeps current behaviour for normal Start→Stop flows, and makes Stop safe to call even if Start was never invoked or failed early.
Also applies to: 102-115, 117-149
protocol/localtxmonitor/client.go (1)
24-42: Fix data race on `started` and Stop-before-Start behavior in LocalTxMonitor client
There are two tightly related issues in the LocalTxMonitor client:
1. Data race on `started`
`Start()` writes `c.started` under `stateMutex`; `Stop()` reads `c.started` in `if c.started { ... }` and in the channel-closure branch without any lock.
This is a classic Go data race (unsynchronized read/write of the same variable across goroutines), and it can also cause `Stop()` to see a stale value and make the wrong decision about sending `MsgDone` or deferring channel closure.
2. Stop-before-Start still sends `MsgDone`, risking the same hang fixed elsewhere
If `Stop()` is called before `Start()`, `c.started` is still `false` in actual state, but due to the race it may be read as either value. When `Stop()` sends `MsgDone` while the protocol has never been started, it can block on an undrained send queue, the situation other clients in this PR now avoid by checking a `started` flag before sending `Done`.
You can address both issues and align this client with the patterns used in e.g. `leiosfetch` by:
- Reading `started` under `stateMutex` into a local variable in `Stop()`, and using that local for both the `MsgDone` send and the channel-closure decision.
- Only sending `MsgDone` when `started` is true.
- Adding a
func (c *Client) Start() { c.onceStart.Do(func() { - c.stateMutex.Lock() - defer c.stateMutex.Unlock() - - c.Protocol.Logger(). - Debug("starting client protocol", - "component", "network", - "protocol", ProtocolName, - "connection_id", c.callbackContext.ConnectionId.String(), - ) - c.started = true - c.Protocol.Start() + c.stateMutex.Lock() + if c.stopped { + // Do not start a client that has already been stopped + c.stateMutex.Unlock() + return + } + c.started = true + c.stateMutex.Unlock() + + c.Protocol.Logger(). + Debug("starting client protocol", + "component", "network", + "protocol", ProtocolName, + "connection_id", c.callbackContext.ConnectionId.String(), + ) + c.Protocol.Start() }) } // Stop transitions the protocol to the Done state. No more operations will be possible func (c *Client) Stop() error { var err error c.onceStop.Do(func() { - c.stateMutex.Lock() - c.stopped = true - c.stateMutex.Unlock() + c.stateMutex.Lock() + started := c.started + c.stopped = true + c.stateMutex.Unlock() c.Protocol.Logger(). Debug("stopping client protocol", "component", "network", "protocol", ProtocolName, "connection_id", c.callbackContext.ConnectionId.String(), ) - c.busyMutex.Lock() - msg := NewMsgDone() - if err = c.SendMessage(msg); err != nil { - c.busyMutex.Unlock() - return - } - c.busyMutex.Unlock() + if started { + c.busyMutex.Lock() + msg := NewMsgDone() + if err = c.SendMessage(msg); err != nil { + c.busyMutex.Unlock() + return + } + c.busyMutex.Unlock() + } - // Call Protocol.Stop() after releasing locks to avoid potential deadlocks - _ = c.Protocol.Stop() + // Call Protocol.Stop() after releasing locks to avoid potential deadlocks + _ = c.Protocol.Stop() - // Defer closing channels until protocol fully shuts down (only if started) - if c.started { + // Defer closing channels until protocol fully shuts down (only if started) + if started { go func() { <-c.DoneChan() close(c.acquireResultChan) close(c.hasTxResultChan) close(c.nextTxResultChan) close(c.getSizesResultChan) }() } else { // If protocol was never started, close channels immediately close(c.acquireResultChan) close(c.hasTxResultChan) close(c.nextTxResultChan) close(c.getSizesResultChan) } }) return err }With this change:
- All accesses to `started` are synchronized via `stateMutex`, eliminating the race.
- `Stop()` skips `MsgDone` when the client was never started, preventing the Stop-before-Start send-blocking issue.
- `Start()` becomes a no-op after `Stop()`, matching the semantics in other updated clients and avoiding surprising restarts.
Also applies to: 89-103, 105-148, 150-301, 324-343, 345-394, 396-418
♻️ Duplicate comments (1)
protocol/localtxsubmission/client.go (1)
150-167: Done handling and result channel shutdown fix the prior TOCTOU risk
Adding `MessageTypeDone` to `messageHandler`, implementing `handleDone` to close `submitResultChan` once, and using `select` on `<-c.DoneChan()` vs `c.submitResultChan <- …` in both `handleAcceptTx` and `handleRejectTx` resolves the earlier send-on-closed-channel race: sends now either complete or return `ErrProtocolShuttingDown`, and the channel is only closed via `closeSubmitResultChan()`'s `sync.Once` guard after shutdown. This is consistent with the updated state machine and the new SubmitTx server-shutdown test.
Also applies to: 169-208, 210-227
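The close-once guard referenced here is easy to demonstrate in isolation; this sketch uses generic names rather than the real localtxsubmission fields:

```go
package main

import (
	"fmt"
	"sync"
)

// closer shows the sync.Once guard: several shutdown paths may race to
// close the same result channel, but only the first close takes effect.
type closer struct {
	once       sync.Once
	resultChan chan error
}

func (c *closer) closeResultChan() {
	c.once.Do(func() { close(c.resultChan) })
}

func main() {
	c := &closer{resultChan: make(chan error)}

	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.closeResultChan() // safe: no "close of closed channel" panic
		}()
	}
	wg.Wait()

	// A receive on the closed channel reports !ok, which callers can map
	// to a "protocol shutting down" error.
	if _, ok := <-c.resultChan; !ok {
		fmt.Println("channel closed cleanly exactly once")
	}
}
```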
🧹 Nitpick comments (7)
protocol/localtxsubmission/client.go (1)
25-37: Start/Stop lifecycle and locking look sound; Stop waits for in-flight submits
The added `onceStart`/`onceStop` plus `stateMutex`/`started` tracking and `closeSubmitResultOnce` give a clear, single-shot lifecycle with a safely closed `submitResultChan`. Holding `busyMutex` in `Stop` means it will wait for any in-flight `SubmitTx` to finish rather than cancel it mid-flight, which is reasonable but means `Stop` can block if the server is stuck in `Busy`. If you ever need `Stop` to actively cancel a hung submit, you may want a separate cancellation path that doesn't serialize on `busyMutex`.
Also applies to: 77-126
protocol/keepalive/client.go (1)
27-37: Lifecycle flags and Stop() implementation mostly look solid; consider guarding Protocol.Stop() on `started`.
Using `onceStart`/`onceStop` plus `started` as an `atomic.Bool` gives you safe, idempotent Start/Stop and prevents `SendMessage` on a nil `sendQueueChan`. Timer cleanup on both `DoneChan()` and explicit `Stop()` is also correct.
The one edge case left is calling `Stop()` before `Start()`: you correctly skip `MsgDone` when `c.started.Load()` is false, but you still call `c.Protocol.Stop()`, which unconditionally unregisters from the muxer. If a caller accidentally calls `Stop()` without ever starting the protocol, this relies on `Muxer.UnregisterProtocol` being a no-op for unregistered protocols.
If you want Stop to be completely safe regardless of call order (and symmetric with other clients), consider also gating
Protocol.Stop()onstarted, e.g.:- if c.started.Load() { - msg := NewMsgDone() - sendErr := c.SendMessage(msg) - if sendErr != nil { - c.stopErr = sendErr - } - } - stopErr := c.Protocol.Stop() + if c.started.Load() { + msg := NewMsgDone() + sendErr := c.SendMessage(msg) + if sendErr != nil { + c.stopErr = sendErr + } + stopErr := c.Protocol.Stop() + if c.stopErr == nil { + c.stopErr = stopErr + } + }[significant behavior change only if Stop-before-Start is used; otherwise semantics stay the same]
Also applies to: 76-99, 101-133
protocol/protocol.go (1)
175-193: sendLoop shutdown/drain behaviour is much safer now; consider optionally rejecting sends after shutdown.
The changes here are a clear improvement:
- `Stop()` is idempotent (via `onceStop`) and centralizes muxer unregistration.
- The new `shuttingDown` flag plus drain loops triggered by `recvDoneChan` ensure messages queued before shutdown are flushed using `sendMessage`, without relying on `len()` during shutdown.
- Queue-limit violations on send/recv now trigger `Stop()` on the protocol, which is consistent with the error path.
One remaining semantic edge case is `SendMessage` calls after shutdown has begun: since `sendQueueChan` is never closed and `SendMessage` doesn't consult `doneChan`, such calls can still succeed (enqueue) even though `sendLoop` has already drained and exited, so those messages will never be sent and no error is propagated.
If you want stronger guarantees, consider (in a follow-up, not necessarily this PR):
- Guarding `SendMessage` with a fast check on `doneChan`/a shutdown flag (returning `ErrProtocolShuttingDown`), or
- Closing `sendQueueChan` at shutdown start to make further sends panic in tests and be caught early.
A sketch of the first option follows this comment. As-is, the refactor is sound and materially improves shutdown behaviour; the above would just tighten post-shutdown semantics.
Also applies to: 225-258, 272-418, 420-443, 516-517
protocol/txsubmission/server_concurrency_test.go (2)
28-95: Concurrent Stop() test structure looks solid
The Start/Stop concurrency test is well-structured: bounded goroutine count, WaitGroup, and a timeout guard to catch deadlocks. Once the NtN mock issues are resolved, this should give good coverage for idempotent, concurrent `Server.Stop()` behavior; a condensed sketch of the shape is shown below.
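For reference, such a test usually boils down to something like the following sketch (the `newTestServer` helper stands in for the real NtN mock-connection setup and is not an existing function; imports assumed are `sync`, `testing`, and `time`):

```go
func TestServerStopConcurrent(t *testing.T) {
	server := newTestServer(t) // hypothetical helper wrapping the mock setup
	server.Start()

	// Hammer Stop() from several goroutines; it must be safe and idempotent.
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = server.Stop()
		}()
	}

	// Bound the wait so a deadlocked Stop() fails the test instead of hanging it.
	done := make(chan struct{})
	go func() { wg.Wait(); close(done) }()
	select {
	case <-done:
	case <-time.After(5 * time.Second):
		t.Fatal("timed out waiting for concurrent Stop() calls")
	}
}
```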
97-143: Stopped flag assertion is straightforward and matches the API
The Stop/IsStopped() test cleanly exercises the lifecycle flag and matches the intended semantics of a permanent server stop. With the test currently skipped due to mock limitations, it's a good candidate to enable once the NtN mock behavior is fixed.
protocol/localtxmonitor/client_test.go (1)
300-319: Client shutdown test aligns with new lifecycle semantics
The new `TestClientShutdown` cleanly verifies that the LocalTxMonitor client can be started and stopped without error using the standard handshake conversation. This matches the shutdown coverage added for other protocols and should catch basic lifecycle regressions.
protocol/peersharing/client.go (1)
30-35: Lifecycle and shutdown wiring for PeerSharing client looks sound; consider a small consistency tweak
The Start/Stop pattern plus `handleSharePeers`'s `select` on `DoneChan()` correctly prevents send-on-closed panics and avoids Start-after-Stop; the `started`/`stopped` flags are only touched under `stateMutex`, so there's no obvious race here.
If you want to match the patterns used in other clients (e.g., `localtxmonitor`, `leiosfetch`) and further reduce coupling between client locks and the underlying `Protocol`, you could release `stateMutex` before calling `c.Protocol.Stop()` (there's no need to hold it across that call). Behavior wouldn't change, but it slightly simplifies the locking story and keeps all `Protocol.Stop()` calls outside client-specific mutexes.
Also applies to: 75-95, 97-124, 160-175
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (34)
protocol/blockfetch/blockfetch.go(1 hunks)protocol/blockfetch/client.go(7 hunks)protocol/blockfetch/client_test.go(1 hunks)protocol/blockfetch/server.go(1 hunks)protocol/chainsync/chainsync.go(1 hunks)protocol/chainsync/client.go(6 hunks)protocol/chainsync/client_concurrency_test.go(1 hunks)protocol/chainsync/client_test.go(2 hunks)protocol/chainsync/error.go(1 hunks)protocol/chainsync/server.go(1 hunks)protocol/keepalive/client.go(4 hunks)protocol/keepalive/client_test.go(1 hunks)protocol/keepalive/keepalive.go(1 hunks)protocol/leiosfetch/client.go(5 hunks)protocol/leiosfetch/server.go(1 hunks)protocol/leiosnotify/client.go(3 hunks)protocol/leiosnotify/client_concurrency_test.go(1 hunks)protocol/leiosnotify/client_test.go(1 hunks)protocol/leiosnotify/server.go(1 hunks)protocol/localstatequery/client.go(6 hunks)protocol/localstatequery/client_test.go(3 hunks)protocol/localtxmonitor/client.go(15 hunks)protocol/localtxmonitor/client_test.go(2 hunks)protocol/localtxsubmission/client.go(6 hunks)protocol/localtxsubmission/client_test.go(2 hunks)protocol/localtxsubmission/localtxsubmission.go(2 hunks)protocol/peersharing/client.go(4 hunks)protocol/peersharing/client_test.go(1 hunks)protocol/peersharing/server.go(4 hunks)protocol/peersharing/server_test.go(1 hunks)protocol/protocol.go(8 hunks)protocol/txsubmission/server.go(10 hunks)protocol/txsubmission/server_concurrency_test.go(1 hunks)protocol/txsubmission/server_test.go(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- protocol/peersharing/server_test.go
🚧 Files skipped from review as they are similar to previous changes (13)
- protocol/blockfetch/server.go
- protocol/chainsync/client_test.go
- protocol/keepalive/keepalive.go
- protocol/chainsync/client_concurrency_test.go
- protocol/txsubmission/server_test.go
- protocol/keepalive/client_test.go
- protocol/chainsync/error.go
- protocol/blockfetch/blockfetch.go
- protocol/blockfetch/client.go
- protocol/leiosnotify/client_concurrency_test.go
- protocol/localstatequery/client_test.go
- protocol/chainsync/chainsync.go
- protocol/leiosnotify/client_test.go
🧰 Additional context used
🧬 Code graph analysis (17)
protocol/blockfetch/client_test.go (5)
protocol/chainsync/client_test.go (1)
TestClientShutdown(285-304)protocol/keepalive/client_test.go (1)
TestClientShutdown(303-322)connection.go (1)
Connection(59-103)protocol/blockfetch/blockfetch.go (1)
BlockFetch(102-105)protocol/blockfetch/client.go (1)
Client(30-41)
protocol/localtxmonitor/client_test.go (2)
connection.go (1)
Connection(59-103)protocol/localtxmonitor/client.go (1)
Client(25-42)
protocol/peersharing/client.go (8)
protocol/keepalive/client.go (1)
Client(27-37)protocol/leiosnotify/client.go (1)
Client(24-34)protocol/localstatequery/client.go (1)
Client(30-47)protocol/localtxmonitor/client.go (1)
Client(25-42)protocol/localtxsubmission/client.go (1)
Client(26-37)protocol/protocol.go (1)
Protocol(39-60)protocol/peersharing/peersharing.go (1)
ProtocolName(27-27)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)
protocol/leiosnotify/client.go (9)
protocol/protocol.go (1)
Protocol(39-60)protocol/chainsync/client.go (1)
Client(29-47)protocol/leiosfetch/client.go (1)
Client(26-38)protocol/localstatequery/client.go (1)
Client(30-47)protocol/localtxmonitor/client.go (1)
Client(25-42)protocol/localtxsubmission/client.go (1)
Client(26-37)protocol/peersharing/client.go (1)
Client(25-35)protocol/leiosnotify/messages.go (1)
NewMsgDone(149-156)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)
protocol/localtxsubmission/client_test.go (13)
protocol/blockfetch/client_test.go (1)
TestClientShutdown(211-230)protocol/chainsync/client_test.go (1)
TestClientShutdown(285-304)protocol/keepalive/client_test.go (1)
TestClientShutdown(303-322)protocol/leiosnotify/client_test.go (1)
TestClientShutdown(117-136)protocol/localstatequery/client_test.go (1)
TestClientShutdown(357-376)protocol/localtxmonitor/client_test.go (1)
TestClientShutdown(300-319)protocol/peersharing/client_test.go (1)
TestClientShutdown(95-114)connection.go (1)
Connection(59-103)protocol/localtxsubmission/localtxsubmission.go (1)
LocalTxSubmission(75-78)protocol/localtxsubmission/client.go (1)
Client(26-37)internal/test/helpers.go (1)
DecodeHexString(14-22)ledger/babbage.go (1)
TxTypeBabbage(39-39)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)
protocol/localtxsubmission/localtxsubmission.go (1)
protocol/localtxsubmission/messages.go (1)
MessageTypeDone(29-29)
protocol/localstatequery/client.go (5)
protocol/protocol.go (1)
Protocol(39-60)protocol/localstatequery/localstatequery.go (1)
AcquireTarget(131-133)protocol/localstatequery/messages.go (3)
NewMsgDone(245-252)AcquireFailurePointNotOnChain(44-44)MsgResult(172-175)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)protocol/localstatequery/error.go (2)
ErrAcquireFailurePointTooOld(20-20)ErrAcquireFailurePointNotOnChain(23-25)
protocol/peersharing/client_test.go (5)
connection.go (2)
Connection(59-103)NewConnection(107-130)protocol/protocol.go (2)
ProtocolRoleClient(94-94)New(122-133)connection_options.go (3)
WithConnection(36-40)WithNetworkMagic(50-54)WithNodeToNode(78-82)protocol/peersharing/peersharing.go (1)
PeerSharing(67-70)protocol/peersharing/client.go (1)
Client(25-35)
protocol/chainsync/client.go (4)
protocol/protocol.go (1)
Protocol(39-60)protocol/blockfetch/client.go (1)
Client(30-41)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)protocol/chainsync/error.go (1)
ErrSyncCancelled(26-26)
protocol/protocol.go (3)
protocol/message.go (1)
Message(18-22)cbor/encode.go (1)
Encode(27-40)muxer/segment.go (1)
NewSegment(48-69)
protocol/peersharing/server.go (2)
protocol/txsubmission/server.go (1)
Server(28-41)protocol/protocol.go (1)
Protocol(39-60)
protocol/localtxsubmission/client.go (4)
protocol/localtxsubmission/localtxsubmission.go (3)
Config(81-84)CallbackContext(87-91)ProtocolName(27-27)protocol/protocol.go (1)
Protocol(39-60)protocol/localtxsubmission/messages.go (1)
MessageTypeDone(29-29)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)
protocol/leiosfetch/client.go (4)
protocol/protocol.go (1)
Protocol(39-60)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)protocol/blockfetch/client.go (1)
Client(30-41)protocol/chainsync/client.go (1)
Client(29-47)
protocol/localtxmonitor/client.go (4)
protocol/protocol.go (1)
Protocol(39-60)protocol/localstatequery/client.go (1)
Client(30-47)protocol/localtxmonitor/messages.go (1)
NewMsgDone(79-86)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)
protocol/txsubmission/server.go (5)
protocol/txsubmission/messages.go (1)
NewMsgRequestTxIds(68-82)protocol/protocol.go (1)
Protocol(39-60)protocol/txsubmission/txsubmission.go (1)
MaxAckCount(143-143)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)protocol/txsubmission/error.go (1)
ErrStopServerProcess(21-21)
protocol/txsubmission/server_concurrency_test.go (3)
connection_options.go (3)
WithConnection(36-40)WithNetworkMagic(50-54)WithNodeToNode(78-82)protocol/txsubmission/txsubmission.go (1)
TxSubmission(126-129)protocol/txsubmission/server.go (1)
Server(28-41)
protocol/keepalive/client.go (3)
protocol/blockfetch/client.go (1)
Client(30-41)protocol/protocol.go (1)
Protocol(39-60)protocol/keepalive/messages.go (1)
NewMsgDone(94-101)
🔇 Additional comments (14)
protocol/chainsync/server.go (1)
234-250: Propagating Stop error from handleDone looks correct
Returning the `s.Stop()` error instead of blindly restarting improves failure visibility and avoids reinitializing a protocol that failed to stop cleanly. The restart path is only taken on a clean Stop, which is a sensible guard.
protocol/localtxsubmission/localtxsubmission.go (1)
39-68: Done transitions in StateMap align with client/server shutdown semantics
Adding `MessageTypeDone` transitions from `stateIdle` (client) and `stateBusy` (server) into `stateDone` matches the new client `MessageTypeDone` handler and shutdown tests, giving the protocol an explicit completion path on both sides.
protocol/blockfetch/client_test.go (1)
210-230: New TestClientShutdown provides useful lifecycle coverage
The shutdown test mirrors the pattern used by other mini-protocol tests and ensures `Client.Start()`/`Client.Stop()` work without additional traffic, which is a good smoke test for the new lifecycle logic.
protocol/localtxsubmission/client_test.go (2)
81-88: Extending the mock shutdown timeout to 5s is reasonable
Bumping the wait from 2s to 5s in the harness should make these tests less flaky under load while still keeping failures bounded in time.
167-216: New client lifecycle and server-shutdown tests nicely cover the Done path
`TestClientShutdown` confirms the basic Start/Stop lifecycle for LocalTxSubmission, and `TestSubmitTxServerShutdown` validates that a server-initiated `Done` causes `SubmitTx` to return `ErrProtocolShuttingDown` as intended. Together they exercise the new Done handling and result-channel closure behavior end-to-end.
protocol/chainsync/client.go (1)
35-39: Shutdown-aware lifecycle and readyForNextBlockChan handling look correct and fix the prior race.The combination of:
started/stoppedasatomic.BoolwithonceStart/onceStop,- Stop() sending
MsgDoneonly when started, then closingreadyForNextBlockChanafterDoneChan()when started (or immediately when never started),- and the new Done-guarded
selectsends/receives onreadyForNextBlockChan(inGetAvailableBlockRange,handleRollForward, andhandleRollBackward),gives you a clean, race-free shutdown path and clear error signalling (
ErrProtocolShuttingDownvsErrSyncCancelled). This resolves the previous write-after-close panic risk onreadyForNextBlockChanwithout introducing obvious new concurrency hazards.Also applies to: 120-177, 258-382, 623-769, 772-809
protocol/localstatequery/client.go (1)
37-42: acquired state and shutdown-aware handlers are correctly synchronized.The introduction of
acquiredMutexand its use in:
handleAcquired(settingc.acquired = truebefore signallingacquireResultChan),acquire()/release()(reading and updatingacquired),- and
runQuery()(auto-acquiring a volatile tip when not acquired),gives you race-free management of the acquired state.
The Done-aware selects in
handleAcquired,handleFailure, andhandleResult, combined with closingqueryResultChan/acquireResultChanafterDoneChan()in Stop, ensure:
- No sends occur after channels are closed.
- Callers of
acquire()/runQuery()seeErrProtocolShuttingDownconsistently when the client is stopping.Overall this is a solid concurrency cleanup for LocalStateQuery.
Also applies to: 905-923, 925-951, 970-1006, 1008-1017, 1020-1041
protocol/txsubmission/server.go (1)
27-41: Server lifecycle, restart, and ackCount handling look correct and fix earlier races.Key points that look good here:
ackCountis now anint32with all access viasync/atomic, eliminating the previous data race betweenRequestTxIds()and the restart path.doneis protected bydoneMutexfor both closing and re-assignment, and all waiters go throughdoneChan(), which removes prior unsynchronized read/write and double-close risks.Stop()is idempotent (onceStop), coordinates withhandleDone()viarestartMutex, sets astoppedflag, and safely closesdonebefore callingProtocol.Stop().RequestTxIds/RequestTxssnapshot their result channels underrestartMutexand then select on that vsdoneChan(), giving clean shutdown semantics (ErrProtocolShuttingDownwhen stopping).handleDone()now:
- Non-blockingly signals any in-flight
RequestTxIds,- Skips restart when
stoppedis true,- Otherwise stops the current protocol, reinitializes
Protocoland the result/done channels, resetsackCount, and restarts the protocol outside the lock, with an extraIsStopped()check for TOCTOU.Overall, this is a solid concurrency and lifecycle cleanup for TxSubmission.Server, with no obvious remaining races in the Stop/restart paths.
Also applies to: 48-57, 83-116, 118-130, 132-195, 198-231, 254-283, 286-349
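The `doneMutex`-guarded accessor described above can be reduced to a small sketch; the struct below is illustrative, not the actual txsubmission server (it assumes only the `sync` package):

```go
// server holds a done channel that handleDone() may replace on restart.
type server struct {
	doneMutex sync.Mutex
	done      chan struct{}
}

// doneChan returns the done channel for the current protocol instance;
// readers must go through the lock because restarts swap the channel.
func (s *server) doneChan() <-chan struct{} {
	s.doneMutex.Lock()
	defer s.doneMutex.Unlock()
	return s.done
}

// restart installs a fresh done channel for the next protocol instance.
func (s *server) restart() {
	s.doneMutex.Lock()
	s.done = make(chan struct{})
	s.doneMutex.Unlock()
}
```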
protocol/leiosfetch/server.go (1)
194-208: Properly propagating Stop() errors before restart
Routing the Done path through `s.Stop()` and returning its error before reinitializing the protocol is the right move; it prevents silent shutdown failures and aligns this server with the lifecycle pattern used elsewhere in the codebase.
protocol/localtxmonitor/client_test.go (1)
88-95: Increased mock shutdown timeout reduces flakiness
Bumping the mock connection shutdown timeout to 5 seconds is reasonable and should help reduce sporadic test failures on slower CI runs, while still keeping hung tests bounded.
protocol/leiosnotify/server.go (1)
109-123: Done handler now correctly respects Stop() failures
Having `handleDone` call `s.Stop()` and propagate any error before reinitializing and starting the protocol ensures that restart only proceeds from a clean shutdown, improving observability and consistency with the broader protocol lifecycle pattern.
protocol/peersharing/server.go (1)
17-32: Server Stop() and Done restart flow now correctly separate per-server and per-protocol shutdown
Introducing `onceStop` on `Server` with a Stop() method that delegates to `s.Protocol.Stop()` (and returns its error) gives you an idempotent, externally callable shutdown hook. Updating `handleDone` to call `s.Protocol.Stop()` directly before reinitializing and restarting cleanly separates the per-Server "final Stop" from the per-Protocol restart-on-Done behavior, avoiding the earlier once/Restart interaction issues.
Also applies to: 49-55, 119-135
protocol/leiosfetch/client.go (1)
20-37: LeiosFetch client lifecycle and shutdown handling looks correct
The atomic `started`/`stopped` flags, conditional `MsgDone` send, unconditional `Protocol.Stop()`, and DoneChan-aware handlers together give a clean Start/Stop story without obvious races or send-on-closed risks. The Stop-before-Start and Start-after-Stop cases are both handled in a way consistent with the other clients.
Also applies to: 92-107, 109-149, 151-221, 246-289
protocol/peersharing/client_test.go (1)
28-93: PeerSharing client shutdown test harness looks solid
The `runTest` helper plus `TestClientShutdown` give good coverage of client Start/Stop behavior under a mock NtN handshake, with goleak and explicit timeouts guarding against hangs or leaks. The async error channels also ensure unexpected protocol errors fail the test promptly.
Also applies to: 95-114
Force-pushed from 45dcdee to 06fa23d (Compare)
- Increase default queue sizes for better buffering
- Eliminate unnecessary goroutines for channel cleanup
- Apply optimizations consistently across all protocols
Signed-off-by: Chris Gianelloni <wolf31o2@blinklabs.io>
Force-pushed from 06fa23d to 1d4e741 (Compare)
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
protocol/blockfetch/client.go (1)
30-41: Guard SendMessage(MsgClientDone) with `started` flag to prevent send on uninitialized queue
The blockfetch `Stop()` method unconditionally sends `MsgClientDone` and calls `SendMessage`, even if `Start()` was never invoked. This diverges from the established pattern in `keepalive/client.go` and `leiosnotify/client.go`, which both guard `SendMessage` with a `started` check. If `Stop()` is called before `Start()`, the send will occur on an uninitialized `sendQueueChan`, causing a hang or panic.
Fix at lines 103–132: Guard the
MsgClientDonesend andProtocol.Stop()call withc.started.Load():func (c *Client) Stop() error { var err error c.onceStop.Do(func() { c.Protocol.Logger(). Debug("stopping client protocol", "component", "network", "protocol", ProtocolName, "connection_id", c.callbackContext.ConnectionId.String(), ) - msg := NewMsgClientDone() - if sendErr := c.SendMessage(msg); sendErr != nil { - err = sendErr - } - _ = c.Protocol.Stop() + if c.started.Load() { + msg := NewMsgClientDone() + if sendErr := c.SendMessage(msg); sendErr != nil { + err = sendErr + } + _ = c.Protocol.Stop() + } // Defer closing channels until protocol fully shuts down (only if started) if c.started.Load() { go func() { <-c.DoneChan() close(c.blockChan) close(c.startBatchResultChan) }() } else { // If protocol was never started, close channels immediately close(c.blockChan) close(c.startBatchResultChan) } }) return err }protocol/leiosnotify/client.go (1)
24-35: Stop() should call Protocol.Stop() to ensure shutdown progress regardless of peer behavior
The lifecycle flags and `stateMutex` ordering in `Start()`/`Stop()` are safe, and the `requestNextChan` closure logic is correct. However, `Stop()` never calls `c.Protocol.Stop()`, which means:
- The `recvLoop()` waits indefinitely on `muxerDoneChan`, which only signals when `UnregisterProtocol()` is called
- Since `Stop()` never calls `Protocol.Stop()`, `UnregisterProtocol()` is never invoked
- If the peer disconnects or misbehaves, `muxerDoneChan` may never signal, leaving `recvDoneChan` open
- This prevents `p.doneChan` from ever closing (it awaits both `recvDoneChan` and `sendDoneChan`)
- The goroutine at line 115 blocks indefinitely on `<-c.DoneChan()`
Other clients in this PR already call `Protocol.Stop()` (blockfetch, chainsync, peersharing, keepalive, localtxsubmission, localtxmonitor, leiosfetch) to guarantee `DoneChan()` closes even on peer misbehavior. Since `Protocol.Stop()` is idempotent and safe to call regardless of protocol state, update `Stop()` to call it.
The suggested diff pattern is sound: capture state flags under lock, release early, and call `Protocol.Stop()` after releasing the lock to avoid deadlock.
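A sketch of the suggested shape, reusing the field and constructor names from this discussion (not the actual leiosnotify implementation):

```go
func (c *Client) Stop() error {
	var err error
	c.onceStop.Do(func() {
		// Capture lifecycle state under the lock, then release it before
		// touching the underlying protocol to avoid lock-ordering issues.
		c.stateMutex.Lock()
		started := c.started
		c.stopped = true
		c.stateMutex.Unlock()

		if started {
			err = c.SendMessage(NewMsgDone())
		}
		// Always stop the underlying protocol so DoneChan() is guaranteed
		// to close even if the peer never completes the exchange.
		_ = c.Protocol.Stop()
	})
	return err
}
```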
🧹 Nitpick comments (7)
protocol/leiosfetch/server.go (1)
194-208: Decouple restart path from any future Server.Stop guards
Propagating the `Stop()` error here is good. However, calling `s.Stop()` for the restart path couples `handleDone()` to whatever semantics `Server.Stop()` may grow in the future (e.g., a `sync.Once` guard, as seen in `peersharing.Server`). That can reintroduce the "can't stop restarted protocols" issue if `Stop()` becomes a one-shot shutdown.
Consider calling the embedded protocol directly instead:
```diff
-	// Restart protocol
-	if err := s.Stop(); err != nil {
+	// Restart protocol: stop only the current Protocol instance
+	if err := s.Protocol.Stop(); err != nil {
 		return err
 	}
```
This keeps `handleDone()` focused on per-instance teardown while allowing `Server.Stop()` (if added or changed later) to represent permanent server shutdown.
protocol/leiosfetch/client.go (1)
17-21: Tighten Stop() error handling and clarify Start/Stop concurrency expectations
The lifecycle changes (atomic `started`/`stopped`, `onceStart`/`onceStop`) significantly improve shutdown behavior and allow Stop()-before-Start() cases to be handled cleanly. A couple of refinements would make this even more robust:
- Preserve `Protocol.Stop()` errors as well as SendMessage errors
Stop()currently only propagatesSendMessagefailures; any error fromc.Protocol.Stop()is silently dropped:if c.started.Load() { if sendErr := c.SendMessage(msg); sendErr != nil { err = sendErr } } // Always attempt to stop the protocol... _ = c.Protocol.Stop() // Stop error ignoredIf
Protocol.Stop()can fail (e.g., muxer shutdown problems), callers have no visibility. You can preserve both while still prioritizing the SendMessage error:func (c *Client) Stop() error { var err error c.onceStop.Do(func() { @@ - if c.started.Load() { - if sendErr := c.SendMessage(msg); sendErr != nil { - // Preserve the SendMessage error but still shut down the protocol. - err = sendErr - } - } - // Always attempt to stop the protocol so DoneChan and muxer shutdown complete. - _ = c.Protocol.Stop() // Stop error ignored; err already reflects SendMessage failure if any + if c.started.Load() { + if sendErr := c.SendMessage(msg); sendErr != nil { + // Preserve the SendMessage error but still shut down the protocol. + err = sendErr + } + } + // Always attempt to stop the protocol so DoneChan and muxer shutdown complete. + if stopErr := c.Protocol.Stop(); stopErr != nil && err == nil { + // Only surface Stop() error if we don't already have a SendMessage error. + err = stopErr + } @@ - c.started.Store(false) - c.stopped.Store(true) + c.started.Store(false) + c.stopped.Store(true) }) return err }
- Document or guard against concurrent Start/Stop from different goroutines
Atomics eliminate data races on
started/stopped, but there’s still no higher-level coordination betweenStart()andStop()calls from different goroutines. For example, ifStop()runs early whilestartedis stillfalse, it may close the result channels immediately, and a concurrentStart()that passes thestopped.Load()check (due to timing) can still callc.Protocol.Start()with already-closed result channels.If your usage guarantees
Start()/Stop()are serialized by the caller, consider adding a brief comment to that effect. If concurrent Start/Stop is intended to be supported, adding a small state mutex (similar toleiosnotify.Clientandpeersharing.Client) around the high-level state changes would make those guarantees explicit.Also applies to: 26-38, 92-149
protocol/localtxsubmission/client.go (1)
77-91: Consider aligning Start/Stop semantics with other clients
Here `Start()` unconditionally marks `started = true` and calls `Protocol.Start()`, and `Stop()` always sends `MsgDone` (even if `Start()` was never called), while only using `started` to decide when to wait on `DoneChan` before closing `submitResultChan`. Other clients (e.g. LocalTxMonitor) track a `stopped` flag and skip sending `Done` when never started, and prevent `Start()` from running after a prior `Stop()`.
Not a bug, but you may want to:
- Add a `stopped` flag and short-circuit `Start()` when `stopped` is true, and
- Only send `MsgDone` in `Stop()` when `started` is true,
to keep lifecycle semantics consistent across protocols and avoid odd edge-cases like `Stop()` being called before `Start()`.
Also applies to: 93-126
protocol/keepalive/client.go (1)
26-37: KeepAlive Start/Stop mostly look good; consider tightening the `started` semantics for concurrency
The new lifecycle wiring is an improvement:
- `Stop()` is idempotent via `onceStop`,
- It skips `MsgDone`/`Protocol.Stop()` when the client was never started, avoiding the earlier "Stop-before-Start" bug, and
- Timer cleanup is handled both on `Stop()` and on protocol shutdown via the `DoneChan()` watcher.
One subtle corner case: `Start()` sets `c.started` to `true` before calling `c.Protocol.Start()`, while `Stop()` uses `c.started.Load()` to decide whether to send `MsgDone` and call `Protocol.Stop()`. If `Start()` and `Stop()` can be invoked from different goroutines, there is a small window where `Stop()` may observe `started=true` even though the underlying protocol hasn't actually finished starting yet (e.g., `sendQueueChan` not initialized), which is exactly what `started` was intended to guard against.
If you expect concurrent Start/Stop, it would be safer to:
- Move `c.started.Store(true)` after `c.Protocol.Start()`, and
- Optionally document that Stop-before-Start and Start-after-Stop are supported only in the non-concurrent sense.
Example tweak:
```diff
func (c *Client) Start() {
	c.onceStart.Do(func() {
-		c.started.Store(true)
		c.Protocol.Logger().
			Debug("starting client protocol", ...)
		c.Protocol.Start()
+		c.started.Store(true)
		// DoneChan cleanup goroutine...
		go func() { ... }()
		c.sendKeepAlive()
	})
}
```
If your usage never calls `Start()`/`Stop()` concurrently, this remains mostly a robustness improvement rather than a correctness fix, but it's cheap insurance.
Also applies to: 76-99, 101-132
protocol/chainsync/client.go (1)
28-47: ChainSync lifecycle tracking is solid; watch `started` ordering for concurrent Start/Stop
The new lifecycle additions improve robustness:
- `stopped` prevents `Start()` from running after `Stop()`.
- `started` is used to decide when it's valid to send `MsgDone` and to defer closing `readyForNextBlockChan` until `DoneChan()` has fired.
- `Stop()` always calls `c.Protocol.Stop()`, so muxer/protocol shutdown is not left to the peer.
Sequential Start/Stop and Stop-before-Start flows look correct. The only subtlety is the same as in keepalive: `Start()` does `c.started.Store(true)` before `c.Protocol.Start()`, while `Stop()` uses `c.started.Load()` to decide whether to send `MsgDone`.
If `Start()` and `Stop()` can be invoked from different goroutines, there's a small window where `Stop()` may see `started=true` but the underlying protocol isn't fully started yet (e.g., send loop not initialised), and `SendMessage` may again run "too early".
If concurrent Start/Stop is part of your contract (and `client_concurrency_test.go` suggests it might be), consider:
client_concurrency_test.gosuggests it might be), consider:func (c *Client) Start() { c.onceStart.Do(func() { if c.stopped.Load() { return } c.Protocol.Logger().Debug("starting client protocol", ...) - c.started.Store(true) - c.Protocol.Start() + c.Protocol.Start() + c.started.Store(true) }) }so
startedmore accurately reflects “protocol ready for messages”.Also applies to: 120-134, 136-177
protocol/protocol.go (1)
281-330: Consider extracting duplicate draining logic.
The draining loops at lines 292-307 and 316-330 are nearly identical. Consider extracting this pattern into a helper method to reduce duplication:
```go
func (p *Protocol) drainSendQueue() {
	for {
		select {
		case msg, ok := <-p.sendQueueChan:
			if !ok {
				return
			}
			p.sendMessage(msg)
		default:
			return
		}
	}
}
```
Then replace both loops with
p.drainSendQueue().protocol/localstatequery/client.go (1)
925-951: Consider simplifying duplicate error mapping.The two failure cases have identical channel-send logic. Consider reducing duplication:
func (c *Client) handleFailure(msg protocol.Message) error { ... msgFailure := msg.(*MsgFailure) + var err error switch msgFailure.Failure { case AcquireFailurePointTooOld: - select { - case <-c.DoneChan(): - return protocol.ErrProtocolShuttingDown - case c.acquireResultChan <- ErrAcquireFailurePointTooOld: - } + err = ErrAcquireFailurePointTooOld case AcquireFailurePointNotOnChain: - select { - case <-c.DoneChan(): - return protocol.ErrProtocolShuttingDown - case c.acquireResultChan <- ErrAcquireFailurePointNotOnChain: - } + err = ErrAcquireFailurePointNotOnChain default: return fmt.Errorf("unknown failure type: %d", msgFailure.Failure) } + select { + case <-c.DoneChan(): + return protocol.ErrProtocolShuttingDown + case c.acquireResultChan <- err: + } return nil }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (32)
protocol/blockfetch/client.go(7 hunks)protocol/blockfetch/client_test.go(1 hunks)protocol/blockfetch/server.go(1 hunks)protocol/chainsync/client.go(6 hunks)protocol/chainsync/client_concurrency_test.go(1 hunks)protocol/chainsync/client_test.go(2 hunks)protocol/chainsync/error.go(1 hunks)protocol/chainsync/server.go(1 hunks)protocol/keepalive/client.go(4 hunks)protocol/keepalive/client_test.go(1 hunks)protocol/keepalive/keepalive.go(1 hunks)protocol/leiosfetch/client.go(5 hunks)protocol/leiosfetch/server.go(1 hunks)protocol/leiosnotify/client.go(3 hunks)protocol/leiosnotify/client_concurrency_test.go(1 hunks)protocol/leiosnotify/client_test.go(1 hunks)protocol/leiosnotify/server.go(1 hunks)protocol/localstatequery/client.go(6 hunks)protocol/localstatequery/client_test.go(3 hunks)protocol/localtxmonitor/client.go(15 hunks)protocol/localtxmonitor/client_test.go(2 hunks)protocol/localtxsubmission/client.go(6 hunks)protocol/localtxsubmission/client_test.go(2 hunks)protocol/localtxsubmission/localtxsubmission.go(2 hunks)protocol/peersharing/client.go(4 hunks)protocol/peersharing/client_test.go(1 hunks)protocol/peersharing/server.go(4 hunks)protocol/peersharing/server_test.go(1 hunks)protocol/protocol.go(8 hunks)protocol/txsubmission/server.go(10 hunks)protocol/txsubmission/server_concurrency_test.go(1 hunks)protocol/txsubmission/server_test.go(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (12)
- protocol/localtxsubmission/localtxsubmission.go
- protocol/keepalive/client_test.go
- protocol/peersharing/client_test.go
- protocol/chainsync/client_concurrency_test.go
- protocol/localtxsubmission/client_test.go
- protocol/chainsync/server.go
- protocol/blockfetch/server.go
- protocol/peersharing/server_test.go
- protocol/txsubmission/server_concurrency_test.go
- protocol/chainsync/client_test.go
- protocol/localstatequery/client_test.go
- protocol/txsubmission/server_test.go
🧰 Additional context used
🧬 Code graph analysis (17)
protocol/leiosfetch/client.go (8)
protocol/protocol.go (1)
Protocol(39-60)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)protocol/blockfetch/client.go (1)
Client(30-41)protocol/chainsync/client.go (1)
Client(29-47)protocol/keepalive/client.go (1)
Client(27-37)protocol/leiosnotify/client.go (1)
Client(24-34)protocol/peersharing/client.go (1)
Client(25-35)protocol/message.go (1)
Message(18-22)
protocol/leiosnotify/client_concurrency_test.go (2)
protocol/chainsync/client_concurrency_test.go (1)
TestStopBeforeStart(106-148)protocol/leiosnotify/client.go (1)
Client(24-34)
protocol/blockfetch/client.go (2)
protocol/protocol.go (2)
Protocol(39-60)New(122-133)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)
protocol/peersharing/client.go (5)
protocol/leiosnotify/client.go (1)
Client(24-34)protocol/localstatequery/client.go (1)
Client(30-47)protocol/localtxsubmission/client.go (1)
Client(26-37)protocol/protocol.go (1)
Protocol(39-60)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)
protocol/peersharing/server.go (2)
protocol/txsubmission/server.go (1)
Server(28-41)protocol/protocol.go (1)
Protocol(39-60)
protocol/keepalive/keepalive.go (3)
protocol/keepalive/messages.go (1)
MessageTypeDone(30-30)protocol/state.go (1)
NewState(38-43)protocol/blockfetch/blockfetch.go (1)
StateDone(46-46)
protocol/leiosnotify/client_test.go (6)
protocol/versiondata.go (6)
VersionData(40-46)VersionDataNtN13andUp(143-145)VersionDataNtN11to12(116-122)DiffusionModeInitiatorOnly(21-21)PeerSharingModeNoPeerSharing(27-27)QueryModeDisabled(36-36)protocol/handshake/messages.go (1)
NewMsgAcceptVersion(88-102)connection.go (2)
Connection(59-103)NewConnection(107-130)connection_options.go (3)
WithConnection(36-40)WithNetworkMagic(50-54)WithNodeToNode(78-82)protocol/leiosnotify/leiosnotify.go (1)
LeiosNotify(75-78)protocol/leiosnotify/client.go (1)
Client(24-34)
protocol/leiosnotify/client.go (3)
protocol/protocol.go (1)
Protocol(39-60)protocol/leiosnotify/messages.go (1)
NewMsgDone(149-156)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)
protocol/blockfetch/client_test.go (3)
connection.go (1)
Connection(59-103)protocol/blockfetch/blockfetch.go (1)
BlockFetch(102-105)protocol/blockfetch/client.go (1)
Client(30-41)
protocol/protocol.go (3)
protocol/message.go (1)
Message(18-22)cbor/encode.go (1)
Encode(27-40)muxer/segment.go (1)
NewSegment(48-69)
protocol/localtxmonitor/client.go (4)
protocol/protocol.go (1)
Protocol(39-60)protocol/localstatequery/client.go (1)
Client(30-47)protocol/localtxmonitor/messages.go (1)
NewMsgDone(79-86)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)
protocol/localtxsubmission/client.go (4)
protocol/localtxsubmission/localtxsubmission.go (2)
Config(81-84)CallbackContext(87-91)protocol/protocol.go (1)
Protocol(39-60)protocol/localtxsubmission/messages.go (1)
MessageTypeDone(29-29)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)
protocol/chainsync/client.go (3)
protocol/protocol.go (1)
Protocol(39-60)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)protocol/chainsync/error.go (1)
ErrSyncCancelled(26-26)
protocol/txsubmission/server.go (5)
protocol/txsubmission/messages.go (2)
TxBody(197-201)NewMsgRequestTxIds(68-82)protocol/protocol.go (1)
Protocol(39-60)protocol/txsubmission/txsubmission.go (1)
MaxAckCount(143-143)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)protocol/txsubmission/error.go (1)
ErrStopServerProcess(21-21)
protocol/localtxmonitor/client_test.go (10)
protocol/blockfetch/client_test.go (1)
TestClientShutdown(211-230)protocol/chainsync/client_test.go (1)
TestClientShutdown(285-304)protocol/keepalive/client_test.go (1)
TestClientShutdown(303-322)protocol/leiosnotify/client_test.go (1)
TestClientShutdown(117-136)protocol/localstatequery/client_test.go (1)
TestClientShutdown(357-376)protocol/localtxsubmission/client_test.go (1)
TestClientShutdown(167-186)protocol/peersharing/client_test.go (1)
TestClientShutdown(95-114)connection.go (1)
Connection(59-103)protocol/localtxmonitor/localtxmonitor.go (1)
LocalTxMonitor(112-115)protocol/localtxmonitor/client.go (1)
Client(25-42)
protocol/keepalive/client.go (2)
protocol/protocol.go (1)
Protocol(39-60)protocol/keepalive/messages.go (1)
NewMsgDone(94-101)
protocol/localstatequery/client.go (5)
protocol/protocol.go (1)
Protocol(39-60)protocol/localtxmonitor/client.go (1)
Client(25-42)protocol/localstatequery/messages.go (4)
NewMsgDone(245-252)AcquireFailurePointNotOnChain(44-44)MsgResult(172-175)NewMsgQuery(160-170)protocol/error.go (1)
ErrProtocolShuttingDown(19-19)protocol/localstatequery/error.go (2)
ErrAcquireFailurePointTooOld(20-20)ErrAcquireFailurePointNotOnChain(23-25)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Analyze (go)
🔇 Additional comments (38)
protocol/keepalive/keepalive.go (1)
69-81: Done transition from server state looks correctAdding
MessageTypeDone -> StateDoneforStateServercleanly mirrors the client-side transition and gives the keep-alive state machine a proper completion path from both agencies. No issues from a protocol/state-machine perspective.protocol/leiosnotify/client_concurrency_test.go (1)
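For readers unfamiliar with the state-map layout, the added transition would look roughly like this fragment; it is paraphrased from the described change rather than copied from keepalive.go, and the surrounding entry is elided:

```go
StateServer: protocol.StateMapEntry{
	Agency: protocol.AgencyServer,
	Transitions: []protocol.StateTransition{
		// ...existing keep-alive response transition(s)...
		{
			// New: the server-agency state can now complete directly.
			MsgType:  MessageTypeDone,
			NewState: StateDone,
		},
	},
},
```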
30-55: Verify LeiosNotify is actually initialized and the handshake fixture is availableThis test assumes:
conversationEntryNtNResponseV15is defined in theleiosnotify_testpackage (or imported via another file in the same package), andouroboros.New(..., ouroboros.WithNodeToNode(true))is sufficient to initialize the LeiosNotify protocol such thatoConn.LeiosNotify().Clientis non-nil.If either assumption is false, you’ll hit a compile-time undefined symbol error (for the fixture) or a runtime nil-pointer when accessing
LeiosNotify().Client.Please double-check:
- That
conversationEntryNtNResponseV15is declared in another_testfile in this package, and- That LeiosNotify is enabled by default for NtN connections in
ouroboros.New(or add an explicitWithLeiosNotifyConfig(...)if required).protocol/peersharing/server.go (1)
17-32: Server Stop semantics and Done-handling look correct nowThe new
onceStop-guardedStop()correctly propagatesProtocol.Stop()errors, andhandleDone()now callss.Protocol.Stop()directly (with a nil check) before reinitializing and restarting the protocol. This cleanly separates:
- Per-Server permanent shutdown via
Server.Stop(), and- Per-conversation restart behavior via
handleDone().This addresses the earlier concern where using
s.Stop()insidehandleDone()made restarted protocol instances un-stoppable onceonceStophad fired.Also applies to: 49-55, 119-135
protocol/leiosfetch/client.go (1)
246-251: Non-blocking result handlers correctly respect shutdownThe updated handlers that
selectonc.DoneChan()versus sending to the result channels are a solid improvement:select { case <-c.DoneChan(): return protocol.ErrProtocolShuttingDown case c.blockResultChan <- msg: }This pattern ensures:
- No sends to closed channels (since channel closure is gated on
DoneChan()), and- Handlers bail out promptly with
ErrProtocolShuttingDownonce shutdown is in progress.The same logic applied across all result channels looks consistent and safe.
Also applies to: 255-260, 264-269, 273-278, 282-287
protocol/localtxsubmission/client.go (2)
28-37: Shutdown sequencing and channel close logic look soundThe added lifecycle state (
stateMutex,started,onceStart/onceStop) pluscloseSubmitResultOnceand theDoneChan-gatedStop()logic form a coherent shutdown story:submitResultChanis only closed once, after the protocol is fully done (or immediately when never started), andStop()always attempts to drive the underlying protocol to completion. This resolves the previous “send to closed channel” risk aroundsubmitResultChanwhile keeping shutdown behavior predictable for callers (SubmitTxseesErrProtocolShuttingDownon a closed channel).Also applies to: 77-91, 93-126
150-167: Handler changes correctly respect shutdown and avoid racesThe updated handlers for
AcceptTx,RejectTx, andDonenow:
- Use
select { case <-c.DoneChan(): ... case c.submitResultChan <- ... }to avoid sending on result channels once shutdown is active, and- Centralize channel closing in
handleDone()/closeSubmitResultChan()usingsync.Once.This is the right pattern to avoid TOCTOU panics on shutdown and ensures any in‑flight
SubmitTxcalls either receive a result or observeErrProtocolShuttingDownvia a closed channel.Also applies to: 169-208, 210-227
protocol/localtxmonitor/client.go (2)
24-42: Lifecycle guards and Stop() behavior are robustThe added
stateMutexwithstarted/stoppedflags, plus the reworkedStart()andStop():
- Prevent starting a client after it’s been stopped,
- Only send
MsgDonewhen the protocol actually ran, and- Call
Protocol.Stop()after releasing all locks to avoid deadlocks,while closing all result channels only after
DoneChan()when started (or immediately otherwise). This nicely fixes the earlier “Stop() never stops Protocol / DoneChan never closes” concern and gives callers a clearErrProtocolShuttingDownsignal.Also applies to: 89-156
158-235: Channel interactions are now shutdown-aware and race-freeThe changes to
Acquire/Release/HasTx/NextTx/GetSizesand the corresponding handlers:
- Gate public operations on
stoppedand returnprotocol.ErrProtocolShuttingDownwhen appropriate,- Use
acquiredMutexto safely manageacquired/acquiredSlot, and- Route all handler results through
selectstatements that race sends againstDoneChan().This prevents writes to closed channels and ensures that in-flight operations either complete normally or observe a clean shutdown error once
Stop()runs. The implementation looks correct and consistent with other protocols.Also applies to: 237-309, 332-351, 353-402, 404-425
protocol/localtxmonitor/client_test.go (1)
50-106: Extended mock-connection timeout is reasonableBumping the
runTestcompletion timeout to 5s aligns with other protocol tests and gives a safer margin for the additional shutdown sequencing without weakening the leak/timeout checks. No issues here.protocol/blockfetch/client_test.go (1)
211-230: Shutdown test adds useful coverage
TestClientShutdownfollows the existing harness pattern (NtN handshake, start then stop client) and gives explicit coverage for the new BlockFetch lifecycle behavior. Looks correct and consistent with other protocol tests.protocol/leiosnotify/server.go (1)
109-123: Guarding restart on Stop() error is correctUpdating
handleDone()to check and returns.Stop()’s error before reinitializing the protocol avoids restarting on a failed shutdown and cleanly propagates problems to the caller. This matches the tightened lifecycle semantics elsewhere in the PR.protocol/chainsync/error.go (1)
19-26: New ErrSyncCancelled error is well-defined
ErrSyncCancelledis named and documented consistently with the existing chainsync errors and provides a clear sentinel for cancellation cases. No issues here.protocol/leiosnotify/client_test.go (1)
30-55: LeiosNotify shutdown test and handshake helpers look goodThe mocked NtN v15 handshake (
mockNtNVersionData/mockNtNVersionDataV11) plusTestClientShutdownfollow the same pattern as other protocol shutdown tests, with realistic version data and strict leak/timeout checks. This gives solid coverage for the new LeiosNotify client lifecycle behavior.Also applies to: 58-115, 117-136
protocol/peersharing/client.go (2)
30-35: Lifecycle guards for Start/Stop look sound and avoid send-on-closed racesThe
onceStart/onceStopplusstateMutex/started/stoppedpattern makesStart()idempotent and ensuresStart()becomes a no-op afterStop(), whileStop()safely deferssharePeersChanclosure untilDoneChan()for started clients and closes immediately when never started. This avoids the earlier “Stop-before-Start” and send-on-closed issues and aligns well with the other protocol clients.If you want extra safety, you could add a small
TestStopBeforeStart/TestStartAfterStopinprotocol/peersharing/client_test.gomirroring the chainsync tests to lock in these semantics.Also applies to: 75-125
161-175: Shutdown-aware send inhandleSharePeersis correctThe
selectonDoneChan()vsc.sharePeersChan <- …ensures handlers returnErrProtocolShuttingDowninstead of ever sending to a channel that will be (or has been) closed during shutdown, eliminating the prior panic window.protocol/leiosnotify/client.go (1)
160-194: Handler select patterns correctly avoid send-on-closed panics
The four handlers now route notifications via:

```go
select {
case <-c.DoneChan():
	return protocol.ErrProtocolShuttingDown
case c.requestNextChan <- msg:
}
```

This is the right pattern to avoid writing to requestNextChan after Stop() has closed it, while still letting in-flight requests complete when the protocol is healthy.
protocol/blockfetch/client.go (1)
228-243: Shutdown-aware selects in handlers correctly protect result channels
The updated handlers:
- handleStartBatch and handleNoBlocks now do a select on DoneChan() vs c.startBatchResultChan <- …, and
- handleBlock checks DoneChan() both before heavy work and again before sending into blockChan,
so handlers now return ErrProtocolShuttingDown instead of writing into channels that may be closed by Stop(). This matches the pattern used in other protocols and fixes the prior send-on-closed race while keeping behavior for successful cases unchanged.
Also applies to: 245-261, 263-316
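A hedged sketch of that double check in a handleBlock-style handler: bail out before the expensive decode, then guard the send. The types and the decode helper are simplified stand-ins, not the real blockfetch messages or CBOR decoding:

```go
package main

import (
	"errors"
	"fmt"
)

var errShuttingDown = errors.New("protocol is shutting down")

type client struct {
	doneChan  chan struct{}
	blockChan chan any
}

// handleBlock checks for shutdown twice: once before doing any expensive
// decoding work, and again when handing the decoded block to the caller.
func (c *client) handleBlock(rawBlock []byte) error {
	select {
	case <-c.doneChan:
		return errShuttingDown
	default:
	}
	block := decode(rawBlock) // expensive work happens only while still running
	select {
	case <-c.doneChan:
		return errShuttingDown
	case c.blockChan <- block:
		return nil
	}
}

func decode(raw []byte) any { return len(raw) } // placeholder for real decoding

func main() {
	c := &client{doneChan: make(chan struct{}), blockChan: make(chan any, 1)}
	fmt.Println(c.handleBlock([]byte{0x01, 0x02})) // <nil>
}
```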
protocol/chainsync/client.go (2)
256-382: readyForNextBlockChan handling in GetAvailableBlockRange and syncLoop is now shutdown-safe
The new case ready, ok := <-c.readyForNextBlockChan in GetAvailableBlockRange:
- Returns ErrProtocolShuttingDown when the channel is closed (ok == false), and
- Returns ErrSyncCancelled when ready is false, matching the semantics introduced for ErrStopSyncProcess.
In syncLoop, reading readyForNextBlockChan with <-c.readyForNextBlockChan and checking ok/ready similarly ensures the loop exits cleanly on shutdown or cancellation.
Combined with Stop() deferring channel closure until after DoneChan() (for started clients), this ensures:
- No send-on-closed panic,
- Block-range computation and sync loop both terminate promptly on shutdown, and
- Cancellation is distinguishable from shutdown via ErrSyncCancelled.
Also applies to: 466-496
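As a rough illustration of how the two-valued receive separates the three outcomes (proceed, cancelled, shutting down), assuming a bool channel shaped like readyForNextBlockChan and local stand-ins for the two sentinel errors:

```go
package main

import (
	"errors"
	"fmt"
)

var (
	errShuttingDown  = errors.New("protocol is shutting down")
	errSyncCancelled = errors.New("sync cancelled")
)

// waitForNextBlock shows the receive pattern: a closed channel means shutdown,
// a false value means the caller cancelled, and true means carry on.
func waitForNextBlock(readyForNextBlockChan <-chan bool) error {
	ready, ok := <-readyForNextBlockChan
	if !ok {
		return errShuttingDown
	}
	if !ready {
		return errSyncCancelled
	}
	return nil // request the next block
}

func main() {
	ch := make(chan bool, 1)
	ch <- true
	fmt.Println(waitForNextBlock(ch)) // <nil>
	close(ch)
	fmt.Println(waitForNextBlock(ch)) // protocol is shutting down
}
```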
623-770: Roll-forward/backward handlers now coordinate cleanly with shutdown and cancellation
In both handleRollForward and handleRollBackward:
- When callbacks return ErrStopSyncProcess, the handlers signal cancellation via:

```go
select {
case <-c.DoneChan():
	return protocol.ErrProtocolShuttingDown
case c.readyForNextBlockChan <- false:
}
```

- On normal progress they signal readiness for the next block with a similar select sending true.
These changes:
- Route all sync-loop control through readyForNextBlockChan,
- Respect DoneChan() to avoid writes after Stop() closes the channel, and
- Cleanly distinguish shutdown (ErrProtocolShuttingDown) from user-driven cancellation (ErrSyncCancelled).
This is a good, consistent pattern with the rest of the PR.
Also applies to: 772-810
protocol/protocol.go (3)
175-193: LGTM: Stop() signature change provides future extensibility.
The error return type allows future implementations to report failures without another breaking change. The current nil return and comprehensive documentation are appropriate.
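A minimal sketch of what the new signature enables for callers such as a server's handleDone(); the types and restart details here are placeholders, not the library's actual API:

```go
package main

import "fmt"

type protocol struct{}

// Stop now returns an error so future implementations can surface shutdown
// failures without another signature change; today it simply returns nil.
func (p *protocol) Stop() error {
	// ... signal shutdown, drain queues ...
	return nil
}

type server struct {
	proto *protocol
}

// handleDone only reinitializes the protocol when the previous instance shut
// down cleanly, propagating the error otherwise.
func (s *server) handleDone() error {
	if err := s.proto.Stop(); err != nil {
		return err
	}
	s.proto = &protocol{} // re-create and restart for the next session
	return nil
}

func main() {
	s := &server{proto: &protocol{}}
	fmt.Println(s.handleDone()) // <nil>
}
```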
420-443: LGTM: sendMessage() helper appropriately scoped for shutdown draining.
The decision to skip state accounting during shutdown is well-documented and correct; these operations are irrelevant once shutdown is initiated.
251-251: LGTM: Stop() error handling appropriate in error path.
Ignoring the Stop() error here is correct; the protocol violation error is the primary concern, and Stop() currently returns nil anyway. The inline comment clarifies the intent.
protocol/localstatequery/client.go (8)
38-46: LGTM: Lifecycle synchronization fields properly added.
The addition of stateMutex and onceStop correctly addresses the data race on the started field identified in previous reviews. The acquiredMutex provides proper synchronization for the acquired state.
102-115: LGTM: Start() properly synchronizes started flag.
The stateMutex correctly guards the started field write, preventing the data race with Stop().
117-149: Verify shutdown on SendMessage error.
If SendMessage(msg) fails at line 133, the function returns without calling c.Protocol.Stop() (line 136). This leaves the underlying protocol running. Consider calling Protocol.Stop() in the error path:

```diff
 msg := NewMsgDone()
 if err = c.SendMessage(msg); err != nil {
+	_ = c.Protocol.Stop()
 	return
 }
 _ = c.Protocol.Stop() // Error ignored - method returns SendMessage error
```

Alternatively, if leaving the protocol running on SendMessage failure is intentional (e.g., to allow retries), document this behavior.
913-922: LGTM: handleAcquired properly synchronizes acquired state.
The acquiredMutex correctly protects the acquired field, and the shutdown-aware channel send prevents blocking on a closed protocol.
953-968: LGTM: handleResult properly handles shutdown.
The shutdown-aware channel send prevents blocking during protocol shutdown.
970-1006: LGTM: acquire() correctly synchronizes acquired state.
The acquiredMutex properly protects the read of acquired, and the channel receive correctly handles shutdown scenarios.
1008-1018: LGTM: release() properly synchronizes acquired state.
The mutex correctly guards the acquired field update.
1020-1041: LGTM: runQuery() correctly checks acquired state and handles shutdown.
The acquiredMutex properly protects the read, and the shutdown handling via closed channel check is correct.
protocol/txsubmission/server.go (8)
33-40: LGTM: Concurrency fields properly added to address data races.
The int32 ackCount with atomic operations, doneMutex for channel synchronization, and restart coordination via restartMutex correctly address the race conditions identified in previous reviews.
93-116: LGTM: Stop() correctly prevents double-close with atomic select-close.
The acquisition of doneMutex before the select-and-close operation prevents the double-close panic identified in previous reviews. The locking order (restartMutex → doneMutex) is consistent with handleDone().
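Sketched in isolation, the select-close idiom being referred to; doneMutex and done are assumed names standing in for the server's fields:

```go
package main

import (
	"fmt"
	"sync"
)

type server struct {
	doneMutex sync.Mutex
	done      chan struct{}
}

// closeDone closes the done channel at most once, even when called from
// multiple goroutines: the select observes whether it is already closed,
// and the mutex makes the check-then-close step atomic.
func (s *server) closeDone() {
	s.doneMutex.Lock()
	defer s.doneMutex.Unlock()
	select {
	case <-s.done:
		// already closed by an earlier Stop()/handleDone()
	default:
		close(s.done)
	}
}

func main() {
	s := &server{done: make(chan struct{})}
	s.closeDone()
	s.closeDone() // safe: no double-close panic
	fmt.Println("closed exactly once")
}
```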
118-130: LGTM: Helper methods provide safe state access.
The doneChan() and IsStopped() methods correctly encapsulate mutex-protected access to shared state.
132-196: LGTM: RequestTxIds correctly uses atomic operations and restart-safe channel access.
The atomic operations on ackCount and the restart-protected capture of requestTxIdsResultChan correctly address the data races identified in previous reviews. The shutdown handling is proper.
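A condensed sketch of the two mechanisms called out here, with illustrative names and types; the real method also builds and sends the request message and tracks more state:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
)

var errShuttingDown = errors.New("protocol is shutting down")

type txIdAndSize struct {
	id   string
	size uint32
}

type server struct {
	ackCount               int32 // updated with atomic ops from multiple goroutines
	restartMutex           sync.Mutex
	requestTxIdsResultChan chan []txIdAndSize
	done                   chan struct{}
}

func (s *server) RequestTxIds(ack uint16) ([]txIdAndSize, error) {
	atomic.AddInt32(&s.ackCount, int32(ack))
	// Capture the current generation of the result channel under restartMutex,
	// so a concurrent restart in handleDone() cannot swap it out from under us.
	s.restartMutex.Lock()
	resultChan := s.requestTxIdsResultChan
	s.restartMutex.Unlock()
	// ... send the request message here ...
	select {
	case <-s.done:
		return nil, errShuttingDown
	case txIds := <-resultChan:
		return txIds, nil
	}
}

func main() {
	s := &server{
		requestTxIdsResultChan: make(chan []txIdAndSize, 1),
		done:                   make(chan struct{}),
	}
	s.requestTxIdsResultChan <- []txIdAndSize{{id: "deadbeef", size: 123}}
	ids, err := s.RequestTxIds(2)
	fmt.Println(len(ids), err, atomic.LoadInt32(&s.ackCount)) // 1 <nil> 2
}
```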
198-231: LGTM: RequestTxs properly handles restart and shutdown.
The restart-protected capture of requestTxsResultChan and shutdown-aware channel receive are correct.
254-284: LGTM: Reply handlers properly synchronize channel sends with restart.
The restartMutex correctly ensures channel sends target the current generation of result channels, preventing sends to stale channels during restart.
286-349: LGTM: handleDone restart sequence properly synchronized.
The restart logic correctly:
- Prevents concurrent Stop() via restartMutex
- Avoids double-close via atomic select-close under doneMutex
- Checks stopped flag twice for TOCTOU protection
- Starts new protocol outside the lock
Note: Any in-flight reply messages during the restart window may be lost as result channels are recreated, but this is likely acceptable for a protocol restart scenario.
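A rough outline of that restart sequence, under the assumption that the server re-checks stopped after shutting down the old instance and recreates the result channels under restartMutex; the protocol re-initialization itself is elided:

```go
package main

import (
	"fmt"
	"sync"
)

type server struct {
	restartMutex           sync.Mutex
	stopped                bool
	requestTxIdsResultChan chan []string
	requestTxsResultChan   chan []string
}

func (s *server) handleDone() error {
	// First check: skip the restart entirely if Stop() already ran.
	s.restartMutex.Lock()
	if s.stopped {
		s.restartMutex.Unlock()
		return nil
	}
	s.restartMutex.Unlock()

	// ... shut down the finished protocol instance (atomic select-close on done) ...

	s.restartMutex.Lock()
	if s.stopped {
		// Stop() raced in while the old instance was shutting down (TOCTOU guard).
		s.restartMutex.Unlock()
		return nil
	}
	// Recreate the per-session result channels for the next generation.
	s.requestTxIdsResultChan = make(chan []string, 1)
	s.requestTxsResultChan = make(chan []string, 1)
	s.restartMutex.Unlock()

	// Start the new protocol instance outside the lock so reply handlers and
	// Request* callers are not blocked behind the restart.
	// ... recreate and start the protocol instance here ...
	return nil
}

func main() {
	s := &server{
		requestTxIdsResultChan: make(chan []string, 1),
		requestTxsResultChan:   make(chan []string, 1),
	}
	fmt.Println(s.handleDone()) // <nil>
}
```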
54-56: LGTM: Buffered result channels support non-blocking restart signaling.
The capacity-1 buffers allow handleDone() to signal RequestTxIds without blocking (lines 298-304), which is essential for the restart flow.
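For illustration, the kind of non-blocking signal a capacity-1 buffer makes possible; the channel name and element type are placeholders:

```go
package main

import "fmt"

func main() {
	// A capacity-1 buffer lets handleDone() leave a wake-up value for a
	// RequestTxIds caller without blocking, even if no caller is waiting yet.
	requestTxIdsResultChan := make(chan []string, 1)
	select {
	case requestTxIdsResultChan <- nil: // signal delivered (or buffered)
	default: // buffer already full; a signal is already pending, nothing to do
	}
	fmt.Println(len(requestTxIdsResultChan)) // 1
}
```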
This code has gotten a little out of control compared to the original performance improvements, which I moved to another PR. Closing this one.
Summary by CodeRabbit
New Features
Bug Fixes
Tests