
revert(#35): bulk-COPY backfill crash-loops on dup-key after restart #36

Merged

satyakwok merged 1 commit into main from revert/35-bulk-copy-dup-key on May 14, 2026

Conversation

@satyakwok (Member) commented May 14, 2026

Revert #35.

## Why

The bulk-COPY path drops ON CONFLICT DO NOTHING (COPY can't express it). Cursor monotonicity holds within one backfill run but not across restarts: indexer restarts → reads cursor → resumes backfill from cursor+1 → the first ~100-block batch may overlap rows already inserted by a partially-flushed prior run → `duplicate key value violates unique constraint "blocks_pkey"` → the batch transaction rolls back → the backfill loop retries the same range forever.
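The idempotency gap between the two write paths can be sketched in SQL (table and column names here are illustrative, not the repo's actual schema):

```sql
-- Serial path: a replayed height is silently skipped, so restarting
-- mid-batch is safe.
INSERT INTO blocks (height, hash) VALUES ($1, $2)
ON CONFLICT (height) DO NOTHING;

-- Bulk path: COPY has no conflict clause. A single replayed height aborts
-- the whole batch transaction, and the loop re-runs the identical range.
COPY blocks (height, hash) FROM STDIN;
```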

Reproduced live on vps4 mainnet 2026-05-14 16:13Z immediately after deploy:

```
backfill: starting walk (concurrent fetch + bulk-COPY batched write) from=663715 to=1780822
backfill loop iteration failed: sqlx: duplicate key value violates unique constraint "blocks_pkey"
```

…repeating.

Reverting unblocks backfill at the PR #33 baseline (51 b/s, parallel fetch + serial INSERT with ON CONFLICT). Net cost: we forgo #35's intended ~5x speedup, but backfill works again.

## Follow-up

Reimplement bulk COPY with one of:
- COPY into temp table → INSERT … SELECT … ON CONFLICT DO NOTHING
- Pre-batch SELECT to filter already-indexed heights
- Use UNLOGGED temp + UPSERT pattern

Test with explicit restart-mid-batch repro before re-merging.
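A sketch of the first follow-up option, assuming Postgres and an illustrative staging-table name (not code from the repo):

```sql
-- 1. COPY the batch into a transaction-scoped staging table. It has no
--    unique index, so replayed heights cannot abort the COPY itself.
CREATE TEMP TABLE blocks_staging (LIKE blocks INCLUDING DEFAULTS) ON COMMIT DROP;
COPY blocks_staging FROM STDIN;

-- 2. Move rows into the real table, skipping heights that a prior
--    partially-flushed run already inserted. This restores the
--    idempotency the bare COPY dropped.
INSERT INTO blocks
SELECT * FROM blocks_staging
ON CONFLICT (height) DO NOTHING;
```

This keeps the bulk-COPY write amplification win while making the batch a no-op for already-indexed heights after a restart.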


## Summary by CodeRabbit

* **Changed**
  * Refactored block synchronization mechanism to use concurrent fetching with sequential writing instead of batched operations.
  * Improved handling of network gaps and cancellation scenarios during block synchronization.
  * Updated documentation to reflect the revised synchronization model.
  * Removed batch-related configuration hints from environment examples.


@coderabbitai Bot commented May 14, 2026

📝 Walkthrough

This PR refactors the backfill pipeline from a bulk-COPY batched-write strategy to a pipelined concurrent-fetch model with sequential per-height writes. The backfill now concurrently fetches blocks within a sliding window bounded by DEFAULT_BACKFILL_CONCURRENCY and writes each result immediately in order to preserve cursor integrity. Gap handling (404s) writes cursor-only entries per block instead of accumulating gaps into batch flushes. Cancellation returns immediately with the current cursor value. The block writer analytics behavior changes from a shared helper call to inline per-transaction construction and post-commit push, with height-conversion and flusher-closure error handling. Configuration hints for batch-size and concurrency are removed from the example environment file, and the changelog entry describing the prior batching strategy is deleted.
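The ordering invariant described above — fetch up to a window of heights ahead, but commit strictly in height order so the cursor never jumps past an unwritten block — can be sketched with a synchronous stand-in for the async pipeline. Function names, the window size, and modeling a 404 as `None` are all illustrative; the real code uses `DEFAULT_BACKFILL_CONCURRENCY` and concurrent async fetches:

```rust
use std::collections::VecDeque;

const CONCURRENCY: u64 = 4; // stand-in for DEFAULT_BACKFILL_CONCURRENCY

// A 404 (missing/pruned block) is modeled as None; height 6 is an
// arbitrary example gap.
fn fetch(height: u64) -> Option<String> {
    if height == 6 { None } else { Some(format!("block-{height}")) }
}

// Returns (final cursor, blocks written in order).
fn run_backfill(from: u64, to: u64) -> (u64, Vec<String>) {
    let mut written = Vec::new();
    let mut inflight: VecDeque<(u64, Option<String>)> = VecDeque::new();
    let mut next_fetch = from;
    let mut cursor = from.saturating_sub(1);
    while cursor < to {
        // Fill the sliding window up to CONCURRENCY fetches ahead.
        while next_fetch <= to && (inflight.len() as u64) < CONCURRENCY {
            inflight.push_back((next_fetch, fetch(next_fetch)));
            next_fetch += 1;
        }
        // Write the oldest result immediately, preserving height order.
        let (height, block) = inflight.pop_front().expect("window is non-empty");
        if let Some(b) = block {
            written.push(b);
        } // else: gap (404) — record cursor-only progress for this height
        cursor = height; // cursor advances exactly one height per write
    }
    (cursor, written)
}
```

Because the cursor only advances as each height is written, a crash leaves it pointing at the last durably written block, which is what makes restart-from-cursor safe in this model.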

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • Sentriscloud/indexer-rs#33: Modifies the same crates/sync/src/backfill.rs backfill loop to implement pipelined concurrent fetches with ordered serialized writes and updated cancellation/404 handling.
  • Sentriscloud/indexer-rs#25: Modifies ingest_one 404/damaged-block cursor handling in the same backfill flow to avoid jumping the cursor forward and instead record progress per missing height.
  • Sentriscloud/indexer-rs#20: Modifies the run_backfill/ingest_one backfill ingest flow in the same code path; one switches to native REST block fetching while the other refactors to pipelined concurrent fetching with serial per-height cursor semantics.
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 inconclusive

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ❓ Inconclusive | The description provides clear context (why the revert), a live reproduction case, performance implications, and follow-up recommendations, but does not follow the repository's required template structure. | Fill in the required template sections (Summary, Scope checkboxes, Checks, Deploy impact) to match repository standards and ensure reviewer clarity. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately describes the main change: reverting PR #35 due to bulk-COPY causing duplicate-key crash-loops on restart. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |



@coderabbitai Bot left a comment
Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/sync/src/block_writer.rs`:
- Around line 68-90: The unchecked casts for RawTxRow fields timestamp
(b.block.timestamp), gas_used (t.gas_used), and status (t.status) can silently
wrap on negative values; change them to safe, checked conversions (e.g., use
u64::try_from(b.block.timestamp).ok() /
u64::try_from(t.gas_used.unwrap_or(0)).ok() and u8::try_from(t.status).ok()) and
if any conversion fails emit a tracing::warn (including the offending values)
and return Ok(()) to skip the malformed row, mirroring the defensive pattern
used for block_height (which uses .as_u64()); alternatively, if you can
guarantee non-negativity, add a clear comment above RawTxRow construction
documenting that invariant and assert it with debug_assert!(...) so the intent
is explicit.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 7faa5246-5f09-427a-b0e3-58eaa7ca5084

📥 Commits

Reviewing files that changed from the base of the PR and between 37a536a and 808ffea.

📒 Files selected for processing (4)
  • CHANGELOG.md
  • compose.env.example
  • crates/sync/src/backfill.rs
  • crates/sync/src/block_writer.rs
💤 Files with no reviewable changes (2)
  • compose.env.example
  • CHANGELOG.md

Comment on lines +68 to +90
```rust
let block_height = match b.block.height.as_u64() {
    Some(h) => h,
    None => {
        tracing::warn!(
            height = ?b.block.height,
            "analytics: skipping row — block height not convertible to u64 \
             (cursor sentinel reached writer; this should not happen)"
        );
        return Ok(());
    }
};

for t in &b.txs {
    let row = RawTxRow {
        block_height,
        timestamp: b.block.timestamp as u64,
        tx_hash: t.hash.clone(),
        from_addr: t.from_addr.clone(),
        to_addr: t.to_addr.clone(),
        value_str: t.value.to_string(),
        fee_str: t.fee.to_string(),
        gas_used: t.gas_used.unwrap_or(0) as u64,
        status: t.status as u8,
        tx_type: t.tx_type.as_str().to_string(),
        // …

// …serialization of row fields into the COPY text buffer, from the same
// excerpt:
buf.push('\t');
write_text(&mut buf, &t.fee.to_string());
buf.push('\t');
write_int(&mut buf, t.nonce);
buf.push('\t');
match &t.data {
    Some(s) => write_text(&mut buf, s),
    None => buf.push_str("\\N"),
}
buf.push('\t');
write_int(&mut buf, t.status as i64);
buf.push('\t');
match &t.contract_address {
    Some(s) => write_text(&mut buf, s),
    None => buf.push_str("\\N"),
};
```

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
set -euo pipefail

# Inspect source field types on the domain models and sink field types on RawTxRow.
rg -n -C3 'pub struct Block\b|pub struct Transaction\b|pub struct RawTxRow\b|pub timestamp:|pub gas_used:|pub status:|block_height:' --glob '*.rs'
```

Repository: Sentriscloud/indexer-rs

Length of output: 20841


Confirm source and sink types in the unchecked analytics casts are safe, or apply checked conversions.

Verification confirms the concern:

  • timestamp casts i64 → u64 (unchecked, signed to unsigned)
  • gas_used casts i64 → u64 (unchecked, signed to unsigned)
  • status casts i16 → u8 (unchecked, signed to unsigned and narrowing)

Unlike block_height, which defensively uses .as_u64() returning Option, these three casts are bare as operators. They will silently wrap if any source value is negative. While domain semantics (Unix timestamps, gas amounts, success/failure flags) suggest these should always be non-negative, unchecked signed-to-unsigned conversions can corrupt analytics data. Either confirm the source fields are always non-negative with a comment, or switch to checked conversions (e.g., u64::try_from(…)) that skip malformed rows with a warning, matching the defensive pattern used for block_height.
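A minimal sketch of the checked-conversion pattern the review suggests, applied to standalone values. The function name and tuple return are illustrative, not the real `block_writer.rs` code; the point is that `try_from` rejects negatives instead of wrapping them the way `as` does:

```rust
// Convert the three reviewed fields with checked conversions. Returns
// None for any value that would have silently wrapped under `as`
// (e.g. (-1i64) as u64 == u64::MAX).
fn to_row_fields(timestamp: i64, gas_used: i64, status: i16) -> Option<(u64, u64, u8)> {
    let ts = u64::try_from(timestamp).ok()?;   // i64 -> u64, fails on negatives
    let gas = u64::try_from(gas_used).ok()?;   // i64 -> u64, fails on negatives
    let st = u8::try_from(status).ok()?;       // i16 -> u8, fails on negatives and > 255
    Some((ts, gas, st))
}
```

In the real writer, a `None` here would be the place to emit the `tracing::warn!` and skip the row, mirroring the `block_height` handling.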


@satyakwok satyakwok merged commit 3017eca into main May 14, 2026
7 of 8 checks passed
@satyakwok satyakwok deleted the revert/35-bulk-copy-dup-key branch May 14, 2026 17:10