fix(sync): jump cursor to window_start_block on first 404 #23
v0.2.2 skipped pruned blocks one by one, advancing the cursor by 1 per 404. At ~1 skip/RTT and ~1.7 M pruned blocks, that meant ~21 days of network round-trips just to walk past the gap, for the same data outcome as a single jump.

This patch asks the chain where its retention window starts (`/chain/info`) and writes the cursor straight to `window_start_block - 1` so the next ingest fetches the first fully-available block. It falls back to single-block skip if the chain doesn't advertise a window (archive-mode or pre-feature nodes don't), and surfaces transport errors on `/chain/info` as transient so the orchestrator retries rather than burning cursor state.

Adds `ChainInfoResponse` + `RestClient::chain_info()` in indexer-chain for the lookup. Same warn-on-skip log style; the new line mentions the archive-node follow-up so operators know how to fill the historical gap later.
📝 Walkthrough
This PR extends the REST client with a new […]
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 4 passed | ❌ 1 failed (inconclusive)
Actionable comments posted: 1
📒 Files selected for processing (3)
- crates/chain/src/lib.rs
- crates/chain/src/rest.rs
- crates/sync/src/backfill.rs
    match rest.chain_info().await {
        Ok(info) if info.window_start_block.is_some_and(|w| w > h.0) => {
            let target = BlockHeight(info.window_start_block.unwrap() - 1);
            tracing::warn!(
                from = h.0,
                to = target.0 + 1,
                pruned = target.0 + 1 - h.0,
                "backfill: chain has pruned this block body (404); jumping cursor to \
                 window_start_block. Historical gap can be filled later by repointing \
                 INDEXER_NETWORK at an archive-mode node that retains all block bodies."
            );
            write_cursor(pool, target, 0).await?;
            return Ok(());
        }
        Ok(_) => {
            // Chain didn't advertise a retention window (archive-mode
            // or pre-feature); the 404 is genuinely a one-off gap.
            // Fall back to single-block skip + log.
            tracing::warn!(
                height = h.0,
                "backfill: 404 from chain but no window_start_block \
                 advertised; skipping single block."
            );
            write_cursor(pool, h, 0).await?;
            return Ok(());
        }
        Err(e) => {
            // chain_info failed; surface as transient error so the
            // orchestrator retries the whole iteration rather than
            // burning the cursor on a transient network blip.
            return Err(SyncError::Chain(e));
        }
    }
Critical: In-memory cursor is not updated after jump, defeating the optimization.
When the cursor jumps to `window_start_block - 1`, the new position is written to the database (line 110) but `ingest_one` returns `Ok(())` without communicating this to `run_backfill`. Back in the loop, `cursor = next` (line 62) sets the in-memory cursor to `h` (the 404'd block), not the jumped target.
The loop then continues from `h + 1`, which also 404s, calls `chain_info()` again, re-writes the same target to the DB, and so on. This results in one RTT per pruned block, which is exactly the problem this PR aims to fix.
Proposed fix: return the jumped cursor from ingest_one

Change the signature to return the effective cursor position:

     pub async fn ingest_one(
         pool: &PgPool,
         provider: &ChainProvider,
         rest: &RestClient,
         h: BlockHeight,
         backoff: BackoffConfig,
         analytics: Option<&AnalyticsHandle>,
    -) -> SyncResult<()> {
    +) -> SyncResult<BlockHeight> {

In the 404 handling branches, return the written cursor:

             write_cursor(pool, target, 0).await?;
    -        return Ok(());
    +        return Ok(target);
         }
         Ok(_) => {
             tracing::warn!(...);
             write_cursor(pool, h, 0).await?;
    -        return Ok(());
    +        return Ok(h);
         }

At the end of normal ingest (after write_block):

    -    .await
    +    .await?;
    +    Ok(h)
     }

Then in run_backfill:

    -ingest_one(pool, provider, rest, next, backoff, analytics).await?;
    -cursor = next;
    +cursor = ingest_one(pool, provider, rest, next, backoff, analytics).await?;

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/sync/src/backfill.rs` around lines 99 - 131, ingest_one currently
writes a new cursor to the DB when jumping over pruned blocks but still returns
Ok(()) so run_backfill's in-memory cursor isn't updated; change ingest_one's
signature to return Result<BlockHeight, SyncError> (the effective cursor after
the operation), update the 404-handling branches to return the written target
(for the window_start_block jump) or h (for the single-block skip), and make the
normal successful path return the next cursor (the value written after
write_block); then update run_backfill to assign cursor = ingest_one(...)? so
the in-memory cursor advances to the same position written to the DB while
preserving existing error returns like Err(SyncError::Chain(e)).
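To make the review finding concrete, here is a minimal synchronous sketch of the backfill loop, not the crate's actual code. `FakeChain`, its fields, and the `use_returned_cursor` flag are hypothetical names introduced only for illustration; heights are plain `u64` values, and the network and database are replaced by a counter of `chain_info` lookups.

```rust
/// Hypothetical in-memory stand-in for the chain: every block below
/// `window_start` is treated as pruned (the real node would answer 404).
pub struct FakeChain {
    pub window_start: u64,
    pub chain_info_calls: u64,
}

impl FakeChain {
    /// Simplified `ingest_one`: returns the effective cursor after handling `h`.
    pub fn ingest_one(&mut self, h: u64) -> u64 {
        if h < self.window_start {
            // 404 path: look up the window and jump to `window_start - 1`.
            // `h < window_start` implies `window_start >= 1`, so no underflow.
            self.chain_info_calls += 1;
            self.window_start - 1
        } else {
            // Normal path: the block was ingested, so the cursor is `h`.
            h
        }
    }
}

/// Runs the loop from `cursor` up to `tip` and returns how many `chain_info`
/// lookups were needed. `use_returned_cursor = false` models the behaviour the
/// review describes (the in-memory cursor ignores the jump).
pub fn run_backfill(mut cursor: u64, tip: u64, window_start: u64, use_returned_cursor: bool) -> u64 {
    let mut chain = FakeChain { window_start, chain_info_calls: 0 };
    while cursor < tip {
        let next = cursor + 1;
        let effective = chain.ingest_one(next);
        // With the proposed fix the loop adopts the returned cursor;
        // without it the loop advances by exactly one block.
        cursor = if use_returned_cursor { effective } else { next };
    }
    chain.chain_info_calls
}
```

With illustrative values (`window_start = 500`, `tip = 1000`, starting cursor `0`), the variant that ignores the return value performs one lookup per pruned block, while the variant that adopts it performs a single lookup.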
Problem
v0.2.2 unblocked the retry loop by skipping on 404, but the cursor only advances by 1 per skipped block. With ~1.7 M pruned blocks on a months-old chain and ~1 skip/RTT, walking the gap takes ~21 days and yields zero indexed rows.
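As a rough sanity check on the ~21 day figure, assume each skip costs about 1.07 s. That per-skip latency is back-derived from the two numbers above rather than measured, so treat it as an assumption:

```latex
T \approx N_{\text{pruned}} \cdot t_{\text{skip}}
  = 1.7 \times 10^{6} \times 1.07\,\text{s}
  \approx 1.82 \times 10^{6}\,\text{s}
  \approx \frac{1.82 \times 10^{6}\,\text{s}}{86\,400\,\text{s/day}}
  \approx 21\ \text{days}
```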
Fix
Same data outcome, one jump instead of 1.7 M walks. On 404 from `rest.block(h)`, hit `/chain/info` to read `window_start_block`. If `h` is before the window, write the cursor straight to `window_start_block - 1` so the next ingest lands on the first fully-available block.
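The decision described above can be sketched as a pure function, separate from the network and database calls. This is a simplified illustration under stated assumptions, not the code in `backfill.rs`: `SkipDecision` and `cursor_after_404` are hypothetical names, and `BlockHeight` is redefined here as a stand-in for the crate's newtype.

```rust
/// Hypothetical stand-in for the crate's `BlockHeight` newtype.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct BlockHeight(pub u64);

/// Where the cursor should be written after `rest.block(h)` returned 404.
#[derive(Debug, PartialEq, Eq)]
pub enum SkipDecision {
    /// `h` is before the retention window: write `window_start_block - 1`.
    JumpTo(BlockHeight),
    /// No window advertised, or `h` is not before it: skip only `h`.
    SkipOne(BlockHeight),
}

/// Assumed helper name; `window_start_block` mirrors the optional field
/// read from `/chain/info`.
pub fn cursor_after_404(h: BlockHeight, window_start_block: Option<u64>) -> SkipDecision {
    match window_start_block {
        // The guard `w > h.0` implies `w >= 1`, so `w - 1` cannot underflow.
        Some(w) if w > h.0 => SkipDecision::JumpTo(BlockHeight(w - 1)),
        _ => SkipDecision::SkipOne(h),
    }
}
```

For example, with made-up heights, `cursor_after_404(BlockHeight(33_000), Some(1_733_000))` yields `JumpTo(BlockHeight(1_732_999))`, while passing `None` for the window yields `SkipOne(BlockHeight(33_000))`. Keeping the decision pure makes the jump and fallback branches easy to unit-test without a live node.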
Wire shape
New `ChainInfoResponse` + `RestClient::chain_info()` in indexer-chain wraps the existing public `/chain/info` endpoint. `window_start_block` / `window_is_partial` are `Option<…>` so archive-mode nodes that don't advertise them deserialize cleanly.
Test plan
Follow-up
The historical gap (h=33k → h=window_start) can be filled later by repointing `INDEXER_NETWORK` at an archive-mode chain node. That's an operator-side infra decision, scoped separately.