Storage estimation fixes#144

Merged
raymondjacobson merged 2 commits into main from rj-commitment-fixes
Mar 13, 2026

@raymondjacobson raymondjacobson commented Mar 13, 2026

  • Use eth_registered_endpoints instead of core_validators for node count in storage expectation calculation
  • Fix the upload scroller cursor being saved only when overwrites exist, which caused it to re-fetch the same page forever and never advance
  • Add dynamic peer updates to crudr so broadcast/sweep clients stay in sync with the eth registry

Note

Medium Risk
Moderate risk: this changes the peer replication client lifecycle and adds in-place peer list reconciliation, which could leave orphaned routines or disrupt cluster sync if host lists are wrong. The other changes are straightforward DB query and cursor fixes.

Overview
Fixes two long-running cluster maintenance issues and improves peer syncing. Storage expectation is recalculated using eth_registered_endpoints (node count) instead of core_validators, aligning the estimate with the current registry.

Crudr peer handling is made safer and dynamic: broadcast and ForceSweep now snapshot peerClients under lock, and a new UpdatePeers reconciles the peer list (starting clients for new peers and dropping removed peers). UpdatePeers is invoked during the periodic peer refresh.
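
The snapshot-then-reconcile pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `Crudr` and `PeerClient` types are trimmed-down stand-ins, `snapshotPeers` is a hypothetical helper name, and the return values of `UpdatePeers` are an assumption added to make the behavior easy to observe.

```go
package main

import (
	"fmt"
	"sync"
)

// PeerClient is a hypothetical stand-in for crudr's per-peer client.
type PeerClient struct {
	Host string
}

// Crudr is a hypothetical, trimmed-down holder of the peer list.
type Crudr struct {
	mu          sync.RWMutex
	peerClients []*PeerClient
}

// snapshotPeers copies the peer slice under the read lock so callers such as
// broadcast or sweep can iterate without racing against UpdatePeers.
func (c *Crudr) snapshotPeers() []*PeerClient {
	c.mu.RLock()
	defer c.mu.RUnlock()
	out := make([]*PeerClient, len(c.peerClients))
	copy(out, c.peerClients)
	return out
}

// UpdatePeers reconciles the current peer list against hosts: existing peers
// that are still wanted are kept, missing ones are added, and the rest are
// dropped. It reports which hosts were added and removed.
func (c *Crudr) UpdatePeers(hosts []string) (added, removed []string) {
	want := make(map[string]bool, len(hosts))
	for _, h := range hosts {
		want[h] = true
	}

	c.mu.Lock()
	defer c.mu.Unlock()

	have := make(map[string]bool, len(c.peerClients))
	kept := make([]*PeerClient, 0, len(hosts))
	for _, p := range c.peerClients {
		if want[p.Host] {
			kept = append(kept, p)
			have[p.Host] = true
		} else {
			removed = append(removed, p.Host)
		}
	}
	for _, h := range hosts {
		if !have[h] {
			// A real implementation would also start the new client here.
			kept = append(kept, &PeerClient{Host: h})
			have[h] = true
			added = append(added, h)
		}
	}
	c.peerClients = kept
	return added, removed
}

func main() {
	c := &Crudr{}
	c.UpdatePeers([]string{"a", "b"})
	added, removed := c.UpdatePeers([]string{"b", "c"})
	fmt.Println("added:", added, "removed:", removed, "peers:", len(c.snapshotPeers()))
}
```

Copying the slice before iterating keeps the lock hold time short, at the cost of callers possibly acting on a list that is one refresh out of date.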

Upload scrolling now always persists the cursor after processing a page (even when there are no overwrites), preventing the scroller from re-fetching the same page indefinitely; overwrites are still applied conditionally.
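
To illustrate why the cursor has to be saved unconditionally, here is a small sketch of a page scroller. It assumes an in-memory slice sorted by ascending ID in place of the real database; `Upload`, `fetchPage`, and `scroll` are hypothetical names, not identifiers from this repository.

```go
package main

import "fmt"

// Upload is a hypothetical row; Overwrite marks rows that need extra handling.
type Upload struct {
	ID        int
	Overwrite bool
}

// fetchPage returns up to limit uploads with ID greater than cursor.
// It assumes all is sorted by ascending ID, standing in for a DB query.
func fetchPage(all []Upload, cursor, limit int) []Upload {
	var page []Upload
	for _, u := range all {
		if u.ID > cursor && len(page) < limit {
			page = append(page, u)
		}
	}
	return page
}

// scroll walks every page and returns the final cursor, the number of
// overwrites applied, and the number of pages fetched.
func scroll(all []Upload, limit int) (cursor, overwrites, pages int) {
	for {
		page := fetchPage(all, cursor, limit)
		if len(page) == 0 {
			return
		}
		pages++
		for _, u := range page {
			if u.Overwrite {
				overwrites++ // overwrites are still applied conditionally
			}
		}
		// Persist the cursor after every page, even one with no overwrites.
		// Saving it only when overwrites exist would re-fetch this page forever.
		cursor = page[len(page)-1].ID
	}
}

func main() {
	all := []Upload{{ID: 1}, {ID: 2}, {ID: 3}, {ID: 4, Overwrite: true}, {ID: 5}}
	cursor, overwrites, pages := scroll(all, 2)
	fmt.Println("cursor:", cursor, "overwrites:", overwrites, "pages:", pages)
}
```

If the cursor assignment were moved inside the `if u.Overwrite` branch, the first page in this example would contain no overwrites and the loop would never terminate, which matches the failure mode described above.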

Written by Cursor Bugbot for commit c895c57.

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

c.logger.Info("removed crudr peer", zap.String("host", p.Host))
}
}
c.peerClients = kept
Removed peer clients leak goroutines indefinitely

Medium Severity

When UpdatePeers removes a peer from c.peerClients, the two goroutines started by p.Start(c.lc) (sender and sweeper) continue running indefinitely. PeerClient has no Stop method, and Lifecycle has no way to cancel individual managed routines. The sweeper goroutine for each removed peer continues making HTTP requests and applying ops every 10 minutes. Since refreshPeersAndSigners calls UpdatePeers every 10 minutes, leaked goroutines can accumulate over the process lifetime.

@raymondjacobson raymondjacobson merged commit 7d8f966 into main Mar 13, 2026
5 checks passed
@raymondjacobson raymondjacobson deleted the rj-commitment-fixes branch March 13, 2026 22:29
2 participants