fix: add retry to chain exchange in ForestStateCompute #6173
Walkthrough
Adds a retry mechanism for network `chain_exchange_messages` calls during full-tipset backfill in `ForestStateCompute`. Replaces the direct `.await` pattern with a bounded retry loop and refactors stream iteration to use `TryStreamExt`, propagating errors with the `?` (try) operator.
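The `TryStreamExt` refactor presumably follows the `while let Some(x) = stream.try_next().await? { … }` idiom, where `try_next` resolves to `Result<Option<Item>, Error>`. Because `futures` streams need an async runtime, the sketch below is a synchronous, `std`-only analogue in which `Iterator::next().transpose()?` plays the role of `try_next().await?`. The `Tipset` struct and `process_all` function are hypothetical and do not reflect Forest's actual types.

```rust
// Hypothetical stand-in for a tipset; Forest's real type differs.
#[derive(Debug)]
struct Tipset {
    epoch: i64,
}

/// Processes every tipset in order and stops at the first error.
/// `next().transpose()?` mirrors `stream.try_next().await?`:
/// `Ok(Some(_))` yields an item, `Ok(None)` ends the loop,
/// and `Err(_)` returns early through `?`.
fn process_all<I>(items: I) -> Result<Vec<i64>, String>
where
    I: IntoIterator<Item = Result<Tipset, String>>,
{
    let mut it = items.into_iter();
    let mut processed = Vec::new();
    while let Some(ts) = it.next().transpose()? {
        processed.push(ts.epoch);
    }
    Ok(processed)
}

fn main() {
    let ok: Vec<Result<Tipset, String>> = vec![Ok(Tipset { epoch: 1 }), Ok(Tipset { epoch: 2 })];
    println!("{:?}", process_all(ok)); // prints Ok([1, 2])

    // The third item is never reached: the error short-circuits the loop.
    let bad: Vec<Result<Tipset, String>> = vec![
        Ok(Tipset { epoch: 1 }),
        Err("boom".to_string()),
        Ok(Tipset { epoch: 3 }),
    ];
    println!("{:?}", process_all(bad)); // prints Err("boom")
}
```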
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant ForestStateCompute
    participant RetryLoop
    participant NetworkChainExchange
    ForestStateCompute->>RetryLoop: Begin tipset backfill
    loop Retry up to MAX_RETRIES times
        RetryLoop->>NetworkChainExchange: chain_exchange_messages request
        alt Success
            NetworkChainExchange-->>RetryLoop: tipset data
            RetryLoop-->>ForestStateCompute: yield tipset
        else Failure
            NetworkChainExchange-->>RetryLoop: error
            note over RetryLoop: Retry count < MAX_RETRIES?
            alt Retries remaining
                RetryLoop->>NetworkChainExchange: retry request
            else Max retries exceeded
                RetryLoop-->>ForestStateCompute: mapped error with epoch info
            end
        end
    end
    ForestStateCompute->>ForestStateCompute: Process all tipsets via try_next stream
```
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~20 minutes

The changes are confined to a single file and have a clear intent (adding retry logic and refactoring stream iteration), but they involve several coordinated changes to control flow and error handling that require careful reasoning about retry semantics and error propagation.
Pre-merge checks
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Files selected for processing (1); code graph analysis covers src/rpc/methods/state.rs.
```rust
    .await
    .map_err(|e| anyhow::anyhow!(e))?;
const MAX_RETRIES: usize = 5;
let fts = 'retry_loop: {
```
It's non-trivial to get the existing retry utils to compile here, so I'm implementing the retry loop manually.
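A minimal sketch of such a hand-written retry loop, assuming a synchronous stand-in for the network call. It uses a labeled block (`'retry_loop: { … }`, stable since Rust 1.65) so that `break 'retry_loop value` can leave the loop with a result, matching the `let fts = 'retry_loop: {` fragment in the diff. `MAX_RETRIES = 5` comes from the diff; `fetch_messages`, `download_with_retry`, and the `String` error type are hypothetical. The real code is async and uses `anyhow`, and the loop body itself is not shown in this excerpt.

```rust
const MAX_RETRIES: usize = 5;

/// Hypothetical stand-in for `chain_exchange_messages`: fails until
/// `attempt` reaches `succeed_on`.
fn fetch_messages(attempt: usize, succeed_on: usize) -> Result<Vec<u64>, String> {
    if attempt >= succeed_on {
        Ok(vec![1, 2, 3])
    } else {
        Err(format!("attempt {attempt} timed out"))
    }
}

fn download_with_retry(epoch: i64, succeed_on: usize) -> Result<Vec<u64>, String> {
    // The labeled block evaluates to whatever value is passed to `break 'retry_loop`.
    let fts: Result<Vec<u64>, String> = 'retry_loop: {
        for attempt in 1..=MAX_RETRIES {
            match fetch_messages(attempt, succeed_on) {
                Ok(v) => break 'retry_loop Ok(v),
                // Out of attempts: keep the last error.
                Err(e) if attempt == MAX_RETRIES => break 'retry_loop Err(e),
                // Otherwise retry immediately (no backoff in this sketch).
                Err(_) => continue,
            }
        }
        // Only reachable if MAX_RETRIES were 0.
        Err("unreachable chain exchange error".into())
    };
    // Attach the epoch to the final error, similar to the PR's `map_err`.
    fts.map_err(|e| format!("failed to download messages@{epoch}: {e}"))
}

fn main() {
    println!("{:?}", download_with_retry(100, 3)); // prints Ok([1, 2, 3])
    println!("{:?}", download_with_retry(100, 10));
    // prints Err("failed to download messages@100: attempt 5 timed out")
}
```

The trailing `Err(...)` after the `for` loop exists only because the compiler cannot prove the loop always breaks. This is the same role the "unreachable chain exchange error" fallback appears to play in the PR.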
```rust
    Err("unreachable chain exchange error in ForestStateCompute".into())
}
.map_err(|e| {
    anyhow::anyhow!("failed to download messages@{}: {e}", ts.epoch())
```
nit: some whitespace between `messages` and `@` would make this easier to read.
Summary of changes
This PR addresses a few issues @ADobrodey encountered while backfilling the archival snapshots.
Changes introduced in this pull request:
Reference issue to close (if applicable)
Closes
Other information and links
Change checklist
Summary by CodeRabbit
Bug Fixes
Refactor