
fix(sync): jump cursor to window_start_block on first 404 #23

Merged
satyakwok merged 1 commit into main from fix/jump-cursor-to-window-start
May 13, 2026
@satyakwok satyakwok commented May 13, 2026

Problem

v0.2.2 unblocked the retry loop by skipping on 404, but the cursor advances by only 1 per skipped block. With ~1.7 M pruned blocks on a months-old chain and ~1 skip/RTT, walking the gap takes ~21 days and indexes zero rows along the way.
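
As a rough check on the ~21-day figure, the arithmetic can be sketched as below. The per-skip latency of 1.07 s is an assumption chosen to match the estimate; the description only states "~1 skip/RTT", and `walk_days` is a hypothetical helper, not part of the indexer.

```rust
/// Days needed to walk `pruned_blocks` at one skip per round trip.
/// Hypothetical helper for illustration only.
fn walk_days(pruned_blocks: u64, secs_per_skip: f64) -> f64 {
    const SECS_PER_DAY: f64 = 86_400.0;
    pruned_blocks as f64 * secs_per_skip / SECS_PER_DAY
}

fn main() {
    let pruned = 1_700_000_u64;
    // Assumed round-trip latency; not stated in the PR.
    let days = walk_days(pruned, 1.07);
    println!("{days:.1} days to walk {pruned} pruned blocks");
}
```

At exactly 1 s per skip the same walk would still take a little under 20 days, so the order of magnitude does not depend on the assumed latency.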

Fix

Same data outcome, one jump instead of 1.7 M walks. On 404 from `rest.block(h)`, hit `/chain/info` to read `window_start_block`. If `h` is before the window, write the cursor straight to `window_start_block - 1` so the next ingest lands on the first fully-available block.

  • Falls back to single-block skip when the chain doesn't advertise a window (archive-mode nodes pre-feature).
  • Surfaces transport errors on `/chain/info` as transient so the orchestrator retries rather than burning cursor state on a network blip.
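
The three outcomes above (jump, single-block skip, transient error) can be sketched as a pure decision function. This is a minimal illustration, not the code in the PR: `CursorAction` and `on_pruned_block` are hypothetical names, heights are plain `u64` rather than `BlockHeight`, and the `/chain/info` lookup is modelled as an already-evaluated `Result`.

```rust
/// What the backfill does with the cursor after a 404 at height `h`.
/// Hypothetical type for illustration.
#[derive(Debug, PartialEq, Eq)]
enum CursorAction {
    /// Write the cursor here so the next ingest fetches `height + 1`.
    JumpTo(u64),
    /// No usable window: mark only `h` as handled.
    SkipOne(u64),
}

/// Decide the cursor action from the result of the `/chain/info` lookup.
/// `window_start` is `Ok(None)` when the node advertises no window.
fn on_pruned_block<E>(
    h: u64,
    window_start: Result<Option<u64>, E>,
) -> Result<CursorAction, E> {
    // A failed lookup is propagated untouched so the caller can retry;
    // no cursor write happens in that case.
    match window_start? {
        // `h` is before the retention window: land just before its start.
        Some(w) if w > h => Ok(CursorAction::JumpTo(w - 1)),
        // Window absent, or `h` already inside it: treat as a one-off gap.
        _ => Ok(CursorAction::SkipOne(h)),
    }
}

fn main() {
    println!("{:?}", on_pruned_block::<&str>(33_000, Ok(Some(1_730_000))));
    println!("{:?}", on_pruned_block::<&str>(33_000, Ok(None)));
    println!("{:?}", on_pruned_block(33_000, Err("timeout")));
}
```

Keeping the decision separate from the I/O makes the fallback and error paths easy to unit-test without a database or a chain node.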

Wire shape

New `ChainInfoResponse` + `RestClient::chain_info()` in indexer-chain wraps the existing public `/chain/info` endpoint. `window_start_block` / `window_is_partial` are `Option<…>` so archive-mode nodes that don't advertise them deserialize cleanly.

Test plan

  • `RUSTFLAGS="-D warnings" cargo check --workspace` clean
  • `cargo clippy --all-targets -- -D warnings` clean
  • `cargo fmt` clean
  • After merge + v0.2.3 image deploy: cursor jumps from current 33k stuck-region to ~1.73M (chain's window_start) on first iteration

Follow-up

The historical gap (h=33k → h=window_start) can be filled later by repointing `INDEXER_NETWORK` at an archive-mode chain node. That's an operator-side infra decision, scoped separately.

Summary by CodeRabbit

  • New Features

    • Added new chain information endpoint to retrieve current chain height and data retention/pruning window details.
  • Bug Fixes

    • Improved block synchronization to intelligently handle pruned blocks by consulting the retention window, allowing efficient cursor advancement instead of individual block skipping.


v0.2.2 skipped pruned blocks one-by-one + advanced cursor by 1 per
404, which at ~1 skip/RTT and ~1.7 M pruned blocks meant ~21 days
of network round-trips just to walk past the gap. Same data outcome
as a single jump.

This patch asks the chain where its retention window starts
(`/chain/info`) and writes the cursor straight to
`window_start_block - 1` so the next ingest fetches the first
fully-available block. Falls back to single-block skip if the chain
doesn't advertise a window (archive-mode nodes pre-feature don't);
surfaces transport errors on /chain/info as transient so the
orchestrator retries rather than burning cursor state.

Adds `ChainInfoResponse` + `RestClient::chain_info()` in
indexer-chain for the lookup. Same warn-on-skip log style; new line
mentions the archive-node followup so operators know how to fill
the historical gap later.

coderabbitai Bot commented May 13, 2026

Walkthrough

This PR extends the REST client with a new /chain/info endpoint that surfaces chain retention metadata, and updates the backfill sync logic to handle pruned blocks intelligently. When the backfill encounters a missing block (404), it now queries the retention window to determine how many blocks have been pruned, then advances the sync cursor to the start of the retention window in a single jump rather than incrementing one block at a time. If the endpoint is unavailable, the sync error is propagated to allow retry instead of skipping a potentially transient issue.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • Sentriscloud/indexer-rs#22: Both PRs modify crates/sync/src/backfill.rs's ingest_one logic for rest.block(h) returning None (pruned/404) so the backfill advances the cursor instead of stalling/retrying indefinitely—main PR enhances it by using rest.chain_info() to skip an entire retention window.
  • Sentriscloud/indexer-rs#20: The main PR's update to crates/sync/src/backfill.rs::ingest_one adds rest.chain_info()-based retention-window skipping when /chain/blocks/<n> returns None, which directly extends the native-REST backfill flow introduced in PR #20 by threading and using the same RestClient in ingest_one.
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ❓ Inconclusive | The description covers the problem, fix approach, wire shape, and test plan comprehensively. However, it does not include the required scope checkboxes or deploy impact assessment specified in the repository template. | Add the required Scope and Deploy impact sections with appropriate checkboxes to match the repository's PR template structure. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and specifically summarizes the main change: jumping the cursor to window_start_block when encountering a 404 (pruned blocks), which is the primary problem being solved. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/sync/src/backfill.rs`:
- Around line 99-131: ingest_one currently writes a new cursor to the DB when
jumping over pruned blocks but still returns Ok(()) so run_backfill's in-memory
cursor isn't updated; change ingest_one's signature to return
Result<BlockHeight, SyncError> (the effective cursor after the operation),
update the 404-handling branches to return the written target (for the
window_start_block jump) or h (for the single-block skip), and make the normal
successful path return the next cursor (the value written after write_block);
then update run_backfill to assign cursor = ingest_one(...)? so the in-memory
cursor advances to the same position written to the DB while preserving existing
error returns like Err(SyncError::Chain(e)).
📥 Commits

Reviewing files that changed from the base of the PR and between 4b18a46 and c6ac0de.

📒 Files selected for processing (3)
  • crates/chain/src/lib.rs
  • crates/chain/src/rest.rs
  • crates/sync/src/backfill.rs

Comment on lines +99 to +131
match rest.chain_info().await {
    Ok(info) if info.window_start_block.is_some_and(|w| w > h.0) => {
        let target = BlockHeight(info.window_start_block.unwrap() - 1);
        tracing::warn!(
            from = h.0,
            to = target.0 + 1,
            pruned = target.0 + 1 - h.0,
            "backfill: chain has pruned this block body (404); jumping cursor to \
             window_start_block. Historical gap can be filled later by repointing \
             INDEXER_NETWORK at an archive-mode node that retains all block bodies."
        );
        write_cursor(pool, target, 0).await?;
        return Ok(());
    }
    Ok(_) => {
        // Chain didn't advertise a retention window (archive-mode
        // or pre-feature); the 404 is genuinely a one-off gap.
        // Fall back to single-block skip + log.
        tracing::warn!(
            height = h.0,
            "backfill: 404 from chain but no window_start_block \
             advertised; skipping single block."
        );
        write_cursor(pool, h, 0).await?;
        return Ok(());
    }
    Err(e) => {
        // chain_info failed; surface as transient error so the
        // orchestrator retries the whole iteration rather than
        // burning the cursor on a transient network blip.
        return Err(SyncError::Chain(e));
    }
}

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Critical: In-memory cursor is not updated after jump, defeating the optimization.

When the cursor jumps to window_start_block - 1, the new position is written to the database (line 110) but ingest_one returns Ok(()) without communicating this to run_backfill. Back in the loop, cursor = next (line 62) sets the in-memory cursor to h (the 404'd block), not the jumped target.

The loop then continues from h+1, which also 404s, calls chain_info() again, re-writes the same target to the DB, and so on. This results in one RTT per pruned block—exactly the problem this PR aims to fix.

Proposed fix: return the jumped cursor from ingest_one

Change the signature to return the effective cursor position:

 pub async fn ingest_one(
     pool: &PgPool,
     provider: &ChainProvider,
     rest: &RestClient,
     h: BlockHeight,
     backoff: BackoffConfig,
     analytics: Option<&AnalyticsHandle>,
-) -> SyncResult<()> {
+) -> SyncResult<BlockHeight> {

In the 404 handling branches, return the written cursor:

             write_cursor(pool, target, 0).await?;
-            return Ok(());
+            return Ok(target);
         }
         Ok(_) => {
             tracing::warn!(...);
             write_cursor(pool, h, 0).await?;
-            return Ok(());
+            return Ok(h);
         }

At the end of normal ingest (after write_block):

-    .await
+    .await?;
+    Ok(h)
 }

Then in run_backfill:

-    ingest_one(pool, provider, rest, next, backoff, analytics).await?;
-    cursor = next;
+    cursor = ingest_one(pool, provider, rest, next, backoff, analytics).await?;
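
The effect of the stale in-memory cursor can be reproduced with a toy simulation. Everything here is a stand-in under stated assumptions: `FakeChain` is hypothetical, heights are plain `u64`, there is no database, and `ingest_one` / `run_backfill` only mirror the shape of the real functions. The `use_returned` flag toggles between the behaviour described in the finding and the proposed fix.

```rust
/// Toy chain: blocks below `window_start` are pruned (404).
/// Hypothetical stand-in for the real REST client.
struct FakeChain {
    window_start: u64,
    tip: u64,
    chain_info_calls: u64,
}

impl FakeChain {
    fn block_exists(&self, h: u64) -> bool {
        h >= self.window_start && h <= self.tip
    }

    /// Counts every lookup so the two loop variants can be compared.
    fn chain_info(&mut self) -> u64 {
        self.chain_info_calls += 1;
        self.window_start
    }
}

/// Stand-in for `ingest_one`: returns the effective cursor after handling `h`.
fn ingest_one(chain: &mut FakeChain, h: u64) -> u64 {
    if chain.block_exists(h) {
        return h; // normal ingest: cursor lands on `h`
    }
    let w = chain.chain_info();
    if w > h { w - 1 } else { h }
}

/// Runs the loop to the tip and returns how many `chain_info` calls it made.
/// `use_returned = false` models `cursor = next`; `true` models the fix.
fn run_backfill(chain: &mut FakeChain, start: u64, use_returned: bool) -> u64 {
    let mut cursor = start;
    while cursor < chain.tip {
        let next = cursor + 1;
        let effective = ingest_one(chain, next);
        cursor = if use_returned { effective } else { next };
    }
    chain.chain_info_calls
}

fn main() {
    let mk = || FakeChain { window_start: 1_000, tip: 1_010, chain_info_calls: 0 };
    println!("cursor = next:       {} chain_info calls", run_backfill(&mut mk(), 0, false));
    println!("cursor = ingest_one: {} chain_info calls", run_backfill(&mut mk(), 0, true));
}
```

In this model the unfixed loop issues one lookup per pruned height, while the fixed loop issues a single lookup and then ingests from the window start.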
@satyakwok satyakwok merged commit 57b7f97 into main May 13, 2026
4 checks passed