feat(sync): decode ERC-20/ERC-721 transfers + drop orphan logs#46
Conversation
|
Caution Review failedPull request was closed or merged during review 📝 WalkthroughWalkthroughThis PR adds decoding of ERC-20/ERC-721 Transfer logs into TokenTransfer records, a DB batch-insert function for TokenTransfer rows, and integrates token_transfers into BlockBundle. Backfill and ingest paths filter logs to those whose tx_hash exists in decoded transactions, derive token_transfers from filtered logs, and both single-block and batched writers persist transfers atomically with blocks/transactions/logs. Sequence DiagramsequenceDiagram
participant Source as backfill.rs / ingest_one
participant Filter as Orphan Filter
participant Decoder as token_decode::decode_transfer
participant Writer as block_writer
participant DB as Database
Source->>Source: Fetch block logs and decoded transactions
Source->>Filter: Build tx_hash set from transactions
Source->>Filter: Filter logs by tx_hash presence
Filter->>Decoder: Filtered logs
loop Each log
Decoder->>Decoder: Decode to TokenTransfer (validate topic0, parse addresses/amount/token_id)
end
Decoder->>Source: token_transfers: Vec<TokenTransfer>
Source->>Writer: BlockBundle { blocks, txs, logs, token_transfers }
Writer->>DB: BEGIN TRANSACTION
Writer->>DB: INSERT blocks
Writer->>DB: INSERT transactions
Writer->>DB: INSERT logs
Writer->>DB: INSERT token_transfers (batch via token_transfers::insert_batch)
Writer->>DB: COMMIT
DB-->>Writer: Success
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/db/src/token_transfers.rs`:
- Around line 33-34: The doc comment above the batch insert claiming "Retry-safe
via `(tx_hash, log_index)` ON CONFLICT DO NOTHING" is incorrect; either
implement the promised dedup behavior by adding a unique constraint on (tx_hash,
log_index) and using ON CONFLICT DO NOTHING in the batch insert SQL, or update
the doc comment to remove the ON CONFLICT claim and clearly state that the table
has no unique constraint (matching the note at line 57). Locate the comment and
the batch-insert logic in token_transfers.rs (search for the doc string
mentioning tx_hash, log_index and the batch insert function) and make the
change: if you choose the DB change, add the unique constraint and adjust SQL to
use ON CONFLICT; otherwise, revise the comment to accurately reflect no
conflict-handling.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: 5b44eca0-8206-4f1a-80d2-5f9129bd4c74
📒 Files selected for processing (5)
crates/db/src/token_transfers.rscrates/sync/src/backfill.rscrates/sync/src/block_writer.rscrates/sync/src/lib.rscrates/sync/src/token_decode.rs
| /// Batch insert. Retry-safe via `(tx_hash, log_index)` ON CONFLICT DO NOTHING | ||
| /// — matches the dedup contract on the raw `logs` table. |
There was a problem hiding this comment.
Doc comment claims ON CONFLICT handling that doesn't exist.
The doc comment states "Retry-safe via (tx_hash, log_index) ON CONFLICT DO NOTHING" but line 57 explicitly notes "No ON CONFLICT — table has no unique constraint." This inconsistency could mislead future maintainers into assuming duplicate protection exists when it doesn't.
📝 Suggested fix
-/// Batch insert. Retry-safe via `(tx_hash, log_index)` ON CONFLICT DO NOTHING
-/// — matches the dedup contract on the raw `logs` table.
+/// Batch insert. Note: table lacks a unique constraint on `(tx_hash, log_index)`,
+/// so retry safety relies on the writer's atomic cursor advance preventing
+/// re-processing. Adding the unique index is a follow-up migration.📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| /// Batch insert. Retry-safe via `(tx_hash, log_index)` ON CONFLICT DO NOTHING | |
| /// — matches the dedup contract on the raw `logs` table. | |
| /// Batch insert. Note: table lacks a unique constraint on `(tx_hash, log_index)`, | |
| /// so retry safety relies on the writer's atomic cursor advance preventing | |
| /// re-processing. Adding the unique index is a follow-up migration. |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/db/src/token_transfers.rs` around lines 33 - 34, The doc comment above
the batch insert claiming "Retry-safe via `(tx_hash, log_index)` ON CONFLICT DO
NOTHING" is incorrect; either implement the promised dedup behavior by adding a
unique constraint on (tx_hash, log_index) and using ON CONFLICT DO NOTHING in
the batch insert SQL, or update the doc comment to remove the ON CONFLICT claim
and clearly state that the table has no unique constraint (matching the note at
line 57). Locate the comment and the batch-insert logic in token_transfers.rs
(search for the doc string mentioning tx_hash, log_index and the batch insert
function) and make the change: if you choose the DB change, add the unique
constraint and adjust SQL to use ON CONFLICT; otherwise, revise the comment to
accurately reflect no conflict-handling.
Two related fixes that surfaced together during testnet recovery on 2026-05-20. ## ERC-20 / ERC-721 Transfer decoder The `token_transfers` table existed but was never populated — the `indexer-handlers` crate is still a Phase-0 placeholder, so raw logs landed in `logs` and went nowhere. Scan UIs that asked for an address's token balances saw an empty list even though the underlying Transfer events were present. Inline decoder in `sync::token_decode`: - topic0 `0xddf252ad…` matches the canonical Transfer signature. - ERC-20: topics = [sig, from, to], 32-byte data carries `amount`. - ERC-721: topics = [sig, from, to, token_id], no data, amount = 1. - ERC-1155 is out of scope here (different topic0, richer encoding). Decoded transfers flow through `BlockBundle.token_transfers` and the existing block-writer transaction — they commit atomically with the block, txs, and logs the chain already wrote. False-positive risk: any contract emitting a Transfer-shaped event with three indexed args will decode as ERC-721 here. Acceptable for the visibility goal; precise registry-driven classification is the declarative-handler workstream's job. ## Orphan-log filter Sentrix's native `/chain/blocks/<n>` and `eth_getLogs` can disagree — a tx whose effects reverted gets stripped from the block tx vec but its log envelopes still come back from `eth_getLogs`. The `logs.tx_hash → transactions.hash` FK then blew the whole batch transaction and the backfill loop stalled at the first such block. Drop logs whose tx_hash isn't backed by a tx row in the same bundle (both fetch_one + ingest_one paths). Such logs are orphans on this chain by definition; preserving them would only buy us repeated FK-violation rollbacks. ## Schema note `token_transfers` has no unique constraint on (tx_hash, log_index), so the batch insert skips ON CONFLICT. The writer's atomic cursor advance prevents re-processing the same block in the steady state; reorg recovery deletes downstream rows before re-insert. Adding the unique index is a follow-up migration.
74a8eb9 to
60eaacd
Compare
…arn (#47) Two audit follow-ups against the ERC-20 handler PR (#46): ## M-3: token_transfers unique constraint `token_transfers` had only `PRIMARY KEY (id)` when shipped. On reorg recovery + replay, the same `(tx_hash, log_index)` rows would re-insert without a uniqueness gate, producing duplicate transfer rows that double-count balances downstream. The atomic cursor advance in the steady-state writer prevented this in practice, but reorg / forensic- rerun paths bypass that protection. - New migration `0003_token_transfers_unique.sql` adds `transfers_tx_log_unique_idx` after deduping any existing duplicates. - `insert_batch` re-enabled `ON CONFLICT (tx_hash, log_index) DO NOTHING` so replay is now idempotent at the DB layer in addition to the cursor-based writer guard. The DELETE in the migration is a no-op for chains that have only run the indexer post-#46 (no duplicates accumulated); kept in place to keep the migration replay-safe on chains that did. ## M-5: raise orphan-log filter to warn `fetch_one` / `ingest_one` drop logs whose tx_hash isn't backed by a tx row in the same bundle (added in #46 to work around a Sentrix-side data inconsistency where reverted-tx log envelopes still come back from `eth_getLogs`). Original commit logged at `debug!`, which is silent at default log level. Sustained high-volume drops here signal a real chain bug worth investigating, so promote to `warn!` to make the gap visible without changing logging config. Verified locally: no compile changes, existing tests pass. Co-authored-by: satyakwok <satyakwok@users.noreply.github.com>
Audit M-4 follow-up against #46. The Transfer selector `0xddf252ad…` is shared by: - ERC-20 `Transfer(address indexed, address indexed, uint256)` — three topics + 32-byte unindexed data carrying amount. - ERC-721 `Transfer(address indexed, address indexed, uint256 indexed)` — four topics, no unindexed data. Custom events with three indexed args + unindexed data (eg. `Transfer(address indexed, address indexed, address indexed, uint256)`) hit the same selector. The original decoder distinguished only on topic count, so any such custom event got recorded as an ERC-721 with a fabricated `amount = 1` and a token_id taken from whatever the third indexed argument happened to be. That polluted `token_transfers` with false NFT rows and broke balance aggregation on the indexer's `/accounts/{addr}/tokens` endpoint. Per ERC-721 spec the Transfer event has NO unindexed data. Add a data-must-be-empty gate to the ERC-721 path; return `None` if data is present so the row drops out of the index entirely. Real NFT contracts emit empty data and continue to decode correctly. Real custom events with three indexed args fall through silently — the cost is missing them from `token_transfers`, but they weren't meaningfully decoded before either. Adds a regression test (`skips_three_indexed_args_with_nonempty_data`). Co-authored-by: satyakwok <satyakwok@users.noreply.github.com>
Replaces #45 (which was based on
feat/multi-endpoint-batch-writesand conflicted with main). This branch carries the same single commit, cherry-picked onto current main.See #45 for the full PR description; nothing else changed.
Test plan
cargo test -p indexer-sync token_decode(4/4 passing)token_transfers(verified via Sentrix testnet sUSDC route, 5 mints + 1 burn)Summary by CodeRabbit
New Features
Tests