Skip to content

fix(sync): skip pruned blocks on 404 instead of retry-loop#22

Merged
satyakwok merged 1 commit into
mainfrom
fix/skip-pruned-blocks
May 13, 2026
Merged

fix(sync): skip pruned blocks on 404 instead of retry-loop#22
satyakwok merged 1 commit into
mainfrom
fix/skip-pruned-blocks

Conversation

@satyakwok
Copy link
Copy Markdown
Member

@satyakwok satyakwok commented May 13, 2026

Problem

After v0.2.1 deploy, backfill pinned at h=32,690 for 5+ hours. Sentrix prunes block bodies past its rolling history window, so `/chain/blocks/32690` returns 404. Pre-fix, `ingest_one` mapped that None to `SyncError::Invalid` → orchestrator treats as transient → retries same height forever. Backfill cursor never advances.

Fix

Treat 404 from `rest.block(h)` as "skip + advance cursor" rather than as an error. Warning logged so it's discoverable; pointer to the archive-mode follow-up in the same line.

```
backfill: chain has pruned this block body (404); skipping.
For a complete historical index, point INDEXER_NETWORK at an
archive-mode node that retains all block bodies.
```

Why not just retry harder

The block isn't coming back. Sentrix's pruner deletes block bodies once they fall outside `window_start_block` — that's a permanent state at this height, not a transient miss. A retry budget would just push the same stall onto a wall-clock timer.

Why this is the right shape for now

Operator's intent (per the deploy + flag conversation last night) is to keep the indexer running while a deeper archive-mode follow-up is scoped. This unblocks forward progress; the historical gap is documented in the warn message.

Test plan

  • `cargo check --workspace -D warnings` clean
  • `cargo clippy --all-targets -D warnings` clean
  • `cargo fmt` clean
  • After merge + v0.2.2 image, cursor advances past 32,690 and catches up to tip

Summary by CodeRabbit

  • Bug Fixes
    • Improved synchronization robustness by treating missing blocks as expected chain-pruning events rather than errors, allowing the sync process to continue forward instead of stalling or retrying indefinitely.

Review Change Stack

When Sentrix prunes block bodies past its rolling history window
(`window_start_block`), `/chain/blocks/<n>` returns 404 for old
heights. Pre-fix, `ingest_one` mapped that None to
`SyncError::Invalid`, which the orchestrator surfaces as a transient
error + retries the same height forever. After v0.2.1 deploy this
pinned the backfill cursor at h=32,690 indefinitely (5+ hours of
retry-loop logs).

The block existed and was canonical when produced; we just can't
fetch its txs anymore. For a complete historical index, an
archive-mode chain node (no body prune) is the right answer —
documented in the warn message + a future follow-up. Until then,
treat 404 as "skip + advance cursor" so the backfill keeps moving
forward and we eventually catch up to the live tip.

Same handler-level test pattern used elsewhere in the sync layer.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 13, 2026

📝 Walkthrough

Walkthrough

The backfill process in crates/sync/src/backfill.rs now handles cases where the REST client returns no block body for a given height. Instead of treating this as an error, the code now recognizes it as a chain-pruning/aged-out situation, logs a warning, writes the cursor to persist progress past the missing height, and returns success so the backfill loop continues forward. The cursor imports are consolidated into a single statement.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Possibly related PRs

  • Sentriscloud/indexer-rs#20: This PR's handling of missing block bodies from RestClient::block directly extends PR #20's changes to route block backfill through the native REST client.
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name Status Explanation Resolution
Description check ❓ Inconclusive The PR description comprehensively explains the problem, fix, and rationale, but does not follow the required template structure with explicit sections like Summary, Scope, Checks, Linked issue, and Deploy impact. Reformat the description to match the repository template: add Summary, Scope, Checks, Linked issue, and Deploy impact sections with appropriate selections and checkmarks.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'fix(sync): skip pruned blocks on 404 instead of retry-loop' is concise, specific, and directly summarizes the main change: handling 404 responses by skipping pruned blocks instead of retrying indefinitely.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/skip-pruned-blocks

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/sync/src/backfill.rs`:
- Around line 86-103: The current ingest_one flow treats RestClient::block
returning None as a permanent prune and advances the cursor; instead, modify the
logic so RestClient::block (or its caller) returns an explicit enum outcome
(e.g., BlockOutcome::Found(Block), BlockOutcome::PrunedPermanent,
BlockOutcome::TemporarilyUnavailable) and change ingest_one to branch on that
outcome: only call write_cursor(h, 0) and skip forward for PrunedPermanent, but
for TemporarilyUnavailable return an error or signal a retry so the tail loop in
tail.rs can retry post-reorg/catch-up; update any mapping in RestClient::block
to distinguish HTTP 404 with probe/metadata that proves permanent pruning vs
transient network/unavailable errors, and propagate the new enum through
ingest_one and its callers instead of treating all None as pruned.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 10b5629c-f048-4a2e-8e1e-b57f9b84c966

📥 Commits

Reviewing files that changed from the base of the PR and between 84f1573 and ad74853.

📒 Files selected for processing (1)
  • crates/sync/src/backfill.rs

Comment on lines +86 to +103
let Some(block) = block_opt else {
// Chain returned 404 — the block body has aged out of the
// node's rolling history window (Sentrix prunes block bodies
// outside `window_start_block`). The block existed and was
// canonical; we just can't fetch its txs anymore. Advance the
// cursor past it so backfill keeps moving forward rather than
// retry-looping on a height that will never come back. A
// dedicated archive-mode chain node (no body prune) would let
// us close this gap properly — until then, skip + log.
tracing::warn!(
height = h.0,
"backfill: chain has pruned this block body (404); skipping. \
For a complete historical index, point INDEXER_NETWORK at \
an archive-mode node that retains all block bodies."
);
write_cursor(pool, h, 0).await?;
return Ok(());
};
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial | 🏗️ Heavy lift

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 1) Find RestClient::block definition(s) and status mapping.
rg -n -C4 --type rust '\b(async\s+fn|fn)\s+block\s*\('
rg -n -C4 --type rust 'NOT_FOUND|404|Option<|None'

# 2) Find all ingest_one call sites.
rg -n -C3 --type rust '\bingest_one\s*\('

# 3) Find all rest.block(...) call sites and None handling.
rg -n -C4 --type rust '\brest\.block\s*\('

Repository: Sentriscloud/indexer-rs

Length of output: 50379


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 1) Get all ingest_one call sites with more context
echo "=== ingest_one call sites ==="
rg -n 'ingest_one' crates/sync/src/ --type rust -B2 -A2

# 2) Check the documentation at lines 71-77 in backfill.rs
echo ""
echo "=== backfill.rs file docs (lines 60-85) ==="
sed -n '60,85p' crates/sync/src/backfill.rs

# 3) Check if rest.block is called from anywhere else
echo ""
echo "=== All rest.block calls ==="
rg -n 'rest\.block\s*\(' crates/ --type rust

Repository: Sentriscloud/indexer-rs

Length of output: 2725


Clarify permanent vs. transient 404 responses before advancing cursor.

Line 86 treats every None (404 response) as irreversibly pruned and advances the persisted cursor. While RestClient::block maps 404 to None deterministically, a single 404 may be transient (network, temporary unavailability) rather than permanent node body pruning. Because ingest_one is reused by the tail loop (line 118 in tail.rs) for post-reorg catch-up, prematurely skipping a block that will momentarily become available creates a silent historical gap. Instead of treating all 404s identically, add explicit outcome types (e.g., Found | PrunedPermanent | TemporarilyUnavailable) so only confirmed permanent pruning triggers cursor advance.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/sync/src/backfill.rs` around lines 86 - 103, The current ingest_one
flow treats RestClient::block returning None as a permanent prune and advances
the cursor; instead, modify the logic so RestClient::block (or its caller)
returns an explicit enum outcome (e.g., BlockOutcome::Found(Block),
BlockOutcome::PrunedPermanent, BlockOutcome::TemporarilyUnavailable) and change
ingest_one to branch on that outcome: only call write_cursor(h, 0) and skip
forward for PrunedPermanent, but for TemporarilyUnavailable return an error or
signal a retry so the tail loop in tail.rs can retry post-reorg/catch-up; update
any mapping in RestClient::block to distinguish HTTP 404 with probe/metadata
that proves permanent pruning vs transient network/unavailable errors, and
propagate the new enum through ingest_one and its callers instead of treating
all None as pruned.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant