fix(sync): skip pruned blocks on 404 instead of retry-loop#22
Conversation
When Sentrix prunes block bodies past its rolling history window (`window_start_block`), `/chain/blocks/<n>` returns 404 for old heights. Pre-fix, `ingest_one` mapped that None to `SyncError::Invalid`, which the orchestrator surfaces as a transient error + retries the same height forever. After v0.2.1 deploy this pinned the backfill cursor at h=32,690 indefinitely (5+ hours of retry-loop logs). The block existed and was canonical when produced; we just can't fetch its txs anymore. For a complete historical index, an archive-mode chain node (no body prune) is the right answer — documented in the warn message + a future follow-up. Until then, treat 404 as "skip + advance cursor" so the backfill keeps moving forward and we eventually catch up to the live tip. Same handler-level test pattern used elsewhere in the sync layer.
📝 WalkthroughWalkthroughThe backfill process in Estimated code review effort🎯 2 (Simple) | ⏱️ ~8 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/sync/src/backfill.rs`:
- Around line 86-103: The current ingest_one flow treats RestClient::block
returning None as a permanent prune and advances the cursor; instead, modify the
logic so RestClient::block (or its caller) returns an explicit enum outcome
(e.g., BlockOutcome::Found(Block), BlockOutcome::PrunedPermanent,
BlockOutcome::TemporarilyUnavailable) and change ingest_one to branch on that
outcome: only call write_cursor(h, 0) and skip forward for PrunedPermanent, but
for TemporarilyUnavailable return an error or signal a retry so the tail loop in
tail.rs can retry post-reorg/catch-up; update any mapping in RestClient::block
to distinguish HTTP 404 with probe/metadata that proves permanent pruning vs
transient network/unavailable errors, and propagate the new enum through
ingest_one and its callers instead of treating all None as pruned.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: 10b5629c-f048-4a2e-8e1e-b57f9b84c966
📒 Files selected for processing (1)
crates/sync/src/backfill.rs
| let Some(block) = block_opt else { | ||
| // Chain returned 404 — the block body has aged out of the | ||
| // node's rolling history window (Sentrix prunes block bodies | ||
| // outside `window_start_block`). The block existed and was | ||
| // canonical; we just can't fetch its txs anymore. Advance the | ||
| // cursor past it so backfill keeps moving forward rather than | ||
| // retry-looping on a height that will never come back. A | ||
| // dedicated archive-mode chain node (no body prune) would let | ||
| // us close this gap properly — until then, skip + log. | ||
| tracing::warn!( | ||
| height = h.0, | ||
| "backfill: chain has pruned this block body (404); skipping. \ | ||
| For a complete historical index, point INDEXER_NETWORK at \ | ||
| an archive-mode node that retains all block bodies." | ||
| ); | ||
| write_cursor(pool, h, 0).await?; | ||
| return Ok(()); | ||
| }; |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial | 🏗️ Heavy lift
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# 1) Find RestClient::block definition(s) and status mapping.
rg -n -C4 --type rust '\b(async\s+fn|fn)\s+block\s*\('
rg -n -C4 --type rust 'NOT_FOUND|404|Option<|None'
# 2) Find all ingest_one call sites.
rg -n -C3 --type rust '\bingest_one\s*\('
# 3) Find all rest.block(...) call sites and None handling.
rg -n -C4 --type rust '\brest\.block\s*\('Repository: Sentriscloud/indexer-rs
Length of output: 50379
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# 1) Get all ingest_one call sites with more context
echo "=== ingest_one call sites ==="
rg -n 'ingest_one' crates/sync/src/ --type rust -B2 -A2
# 2) Check the documentation at lines 71-77 in backfill.rs
echo ""
echo "=== backfill.rs file docs (lines 60-85) ==="
sed -n '60,85p' crates/sync/src/backfill.rs
# 3) Check if rest.block is called from anywhere else
echo ""
echo "=== All rest.block calls ==="
rg -n 'rest\.block\s*\(' crates/ --type rustRepository: Sentriscloud/indexer-rs
Length of output: 2725
Clarify permanent vs. transient 404 responses before advancing cursor.
Line 86 treats every None (404 response) as irreversibly pruned and advances the persisted cursor. While RestClient::block maps 404 to None deterministically, a single 404 may be transient (network, temporary unavailability) rather than permanent node body pruning. Because ingest_one is reused by the tail loop (line 118 in tail.rs) for post-reorg catch-up, prematurely skipping a block that will momentarily become available creates a silent historical gap. Instead of treating all 404s identically, add explicit outcome types (e.g., Found | PrunedPermanent | TemporarilyUnavailable) so only confirmed permanent pruning triggers cursor advance.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/sync/src/backfill.rs` around lines 86 - 103, The current ingest_one
flow treats RestClient::block returning None as a permanent prune and advances
the cursor; instead, modify the logic so RestClient::block (or its caller)
returns an explicit enum outcome (e.g., BlockOutcome::Found(Block),
BlockOutcome::PrunedPermanent, BlockOutcome::TemporarilyUnavailable) and change
ingest_one to branch on that outcome: only call write_cursor(h, 0) and skip
forward for PrunedPermanent, but for TemporarilyUnavailable return an error or
signal a retry so the tail loop in tail.rs can retry post-reorg/catch-up; update
any mapping in RestClient::block to distinguish HTTP 404 with probe/metadata
that proves permanent pruning vs transient network/unavailable errors, and
propagate the new enum through ingest_one and its callers instead of treating
all None as pruned.
Problem
After v0.2.1 deploy, backfill pinned at h=32,690 for 5+ hours. Sentrix prunes block bodies past its rolling history window, so `/chain/blocks/32690` returns 404. Pre-fix, `ingest_one` mapped that None to `SyncError::Invalid` → orchestrator treats as transient → retries same height forever. Backfill cursor never advances.
Fix
Treat 404 from `rest.block(h)` as "skip + advance cursor" rather than as an error. Warning logged so it's discoverable; pointer to the archive-mode follow-up in the same line.
```
backfill: chain has pruned this block body (404); skipping.
For a complete historical index, point INDEXER_NETWORK at an
archive-mode node that retains all block bodies.
```
Why not just retry harder
The block isn't coming back. Sentrix's pruner deletes block bodies once they fall outside `window_start_block` — that's a permanent state at this height, not a transient miss. A retry budget would just push the same stall onto a wall-clock timer.
Why this is the right shape for now
Operator's intent (per the deploy + flag conversation last night) is to keep the indexer running while a deeper archive-mode follow-up is scoped. This unblocks forward progress; the historical gap is documented in the warn message.
Test plan
Summary by CodeRabbit