Skip to content

Conversation

@hanabi1224
Copy link
Contributor

@hanabi1224 hanabi1224 commented Oct 20, 2025

Summary of changes

This PR addresses a few issues @ADobrodey encounted during backfilling the archival snapshots

Changes introduced in this pull request:

  • add retry to chain exchange in ForestStateCompute
  • fail right on chain exchange errors

Reference issue to close (if applicable)

Closes

Other information and links

Change checklist

  • I have performed a self-review of my own code,
  • I have made corresponding changes to the documentation. All new code adheres to the team's documentation standards,
  • I have added tests that prove my fix is effective or that my feature works (if possible),
  • I have made sure the CHANGELOG is up-to-date. All user-facing changes should be reflected in this document.

Summary by CodeRabbit

  • Bug Fixes

    • Implemented automatic retry logic for network requests during blockchain state synchronization to improve reliability and reduce transient failures.
    • Enhanced error messages to provide clearer feedback when state backfilling operations encounter issues.
  • Refactor

    • Optimized error handling and control flow for state computation operations.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 20, 2025

Walkthrough

Adds a retry mechanism for network chain_exchange_messages during full tipset backfill in ForestStateCompute. Replaces direct await patterns with a bounded retry loop and refactors stream iteration to use TryStreamExt with error propagation via the try operator.

Changes

Cohort / File(s) Summary
Retry logic and error handling
src/rpc/methods/state.rs
Introduces MAX_RETRIES-capped retry mechanism for chain_exchange_messages; replaces direct await + map_err with retry loop yielding mapped error including epoch info; changes stream iteration from while let Some(Ok(ts)) to while let Some(ts) with try_next().await?; adds imports for futures::StreamExt and futures::TryStreamExt; adjusts missing tipset/message handling with clearer error messaging

Sequence Diagram(s)

sequenceDiagram
    participant ForestStateCompute
    participant RetryLoop
    participant NetworkChainExchange
    
    ForestStateCompute->>RetryLoop: Begin tipset backfill
    
    loop Retry up to MAX_RETRIES times
        RetryLoop->>NetworkChainExchange: chain_exchange_messages request
        alt Success
            NetworkChainExchange-->>RetryLoop: tipset data
            RetryLoop-->>ForestStateCompute: yield tipset
        else Failure
            NetworkChainExchange-->>RetryLoop: error
            note over RetryLoop: Retry count < MAX_RETRIES?
            alt Retries remaining
                RetryLoop->>NetworkChainExchange: retry request
            else Max retries exceeded
                RetryLoop-->>ForestStateCompute: mapped error with epoch info
            end
        end
    end
    
    ForestStateCompute->>ForestStateCompute: Process all tipsets via try_next stream
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

The changes are focused to a single file with clear intentions (retry logic addition and stream iteration refactoring), but involve multiple coordinated modifications to control flow and error handling that require careful reasoning around retry semantics and proper error propagation patterns.

Suggested reviewers

  • LesnyRumcajs
  • akaladarshi

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The pull request title "fix: add retry to chain exchange in ForestStateCompute" directly and clearly summarizes the main change in the changeset. The raw summary confirms that the primary modification is the introduction of a retry mechanism for network chain_exchange_messages integrated into the backfill of full tipsets in ForestStateCompute. The title is concise, specific, and avoids generic terms, making it clear to teammates reviewing the history what the primary change accomplishes. The title is not overly broad or misleading—it accurately reflects the core feature being added without unnecessary noise.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch hm/retry-chain-exchange-in-state-compute

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2fb848a and 2dc6968.

📒 Files selected for processing (1)
  • src/rpc/methods/state.rs (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/rpc/methods/state.rs (2)
src/blocks/tipset.rs (2)
  • epoch (306-308)
  • epoch (543-545)
src/rpc/mod.rs (1)
  • chain_store (469-471)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: cargo-publish-dry-run
  • GitHub Check: Build Ubuntu
  • GitHub Check: Build MacOS
  • GitHub Check: All lint checks
  • GitHub Check: tests-release
  • GitHub Check: tests
  • GitHub Check: Build forest binaries on Linux AMD64
🔇 Additional comments (3)
src/rpc/methods/state.rs (3)

61-61: Good switch to TryStreamExt; import looks correct.

Required for try_next() usage later; keeps existing StreamExt usages intact.


1507-1507: Fail-fast via try_next() matches PR intent.

Switching to futures.try_next().await? short-circuits on any backfill error, aligning with “fail immediately on chain exchange errors.” Validate that callers expect an early failure of the whole batch.

Confirm intended behavior with larger ranges:

  • On a single-tipset failure in [from_epoch, to_epoch], should the entire request error out? If yes, this is correct.

1486-1499: Review comment is based on incorrect type analysis and should be disregarded.

The review assumes the error type at the labeled block is anyhow::Error, but the actual return type of chain_exchange_messages is Result<FullTipset, String>. The expression Err("unreachable chain exchange error in ForestStateCompute".into()) correctly coerces to Result<FullTipset, String> because String implements From<&str>. The code compiles without errors.

While the labeled block does reach unreachable code by control flow (the last retry iteration guarantees a break either way), this is not a type mismatch issue as the review claims. The existing code is valid Rust and will compile successfully.

Likely an incorrect or invalid review comment.


Comment @coderabbitai help to get the list of available commands and usage tips.

.await
.map_err(|e| anyhow::anyhow!(e))?;
const MAX_RETRIES: usize = 5;
let fts = 'retry_loop: {
Copy link
Contributor Author

@hanabi1224 hanabi1224 Oct 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's non-trival to get existing retry utils to compile here so implementing a retry loop manually

@hanabi1224 hanabi1224 marked this pull request as ready for review October 20, 2025 13:51
@hanabi1224 hanabi1224 requested a review from a team as a code owner October 20, 2025 13:51
@hanabi1224 hanabi1224 requested review from ADobrodey, LesnyRumcajs and elmattic and removed request for a team October 20, 2025 13:51
Err("unreachable chain exchange error in ForestStateCompute".into())
}
.map_err(|e| {
anyhow::anyhow!("failed to download messages@{}: {e}", ts.epoch())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: some whitespace between messages and @ would help with reading

@hanabi1224 hanabi1224 enabled auto-merge October 20, 2025 14:10
@hanabi1224 hanabi1224 added this pull request to the merge queue Oct 20, 2025
Merged via the queue into main with commit 87f9985 Oct 20, 2025
45 checks passed
@hanabi1224 hanabi1224 deleted the hm/retry-chain-exchange-in-state-compute branch October 20, 2025 14:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants