feat(indexer): is_contract auto-detection + faster testnet backfill #9

Merged
satyakwok merged 3 commits into main from feat/native-tx-shape-adapter on May 5, 2026

@satyakwok (Member)

Two follow-ups to the addresses-table fix from #8, plus the client helper the first one needs. Both surfaced during a deeper review of why /contracts/stats still returned empty even after the addresses table started populating.

1. Contract-detection worker (apps/indexer/src/contract-detect.ts)

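The hot tx-insertion path in sync.ts upserts addresses with is_contract=false + code_hash=NULL to keep tx insertion fast, since an eth_getCode call per address mid-batch would dominate runtime. This worker runs in the background, picks up rows with code_hash IS NULL, and flips the flag for addresses with non-empty bytecode.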

  • Cadence: 10 addresses / 4 seconds (~150 probes/min, contracts and EOAs alike)
  • Runs eth_getCode per address; non-empty → is_contract=true + code_hash=keccak256(code); empty → code_hash="0x" sentinel so we don't re-probe EOAs
  • Single-address failures don't block the batch (transient 502s retry on next tick)

Without this, addresses that came in via the sync.ts upsert kept is_contract=false forever, and /contracts/stats (which INNER JOINs on is_contract=true) stayed empty no matter how full the addresses table got.
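The worker's loop can be sketched as follows. This is a minimal illustration, not the code in contract-detect.ts: `AddressStore`, `CodeReader`, `detectTick`, and `startDetector` are hypothetical names, the database is hidden behind an interface, and the keccak256 function is injected rather than imported. It shows the three behaviours described above: the "0x" sentinel for EOAs, per-address error isolation, and a fixed 10-addresses-per-4-seconds cadence.

```typescript
// Hypothetical storage interface; the real worker issues Postgres queries.
interface AddressStore {
  // Rows with code_hash IS NULL (never probed), up to `limit`.
  fetchUnprobed(limit: number): Promise<string[]>;
  // is_contract = true, code_hash = keccak256(code)
  markContract(address: string, codeHash: string): Promise<void>;
  // code_hash = "0x" sentinel so the EOA is never probed again
  markEoa(address: string): Promise<void>;
}

// Same contract as SentrixClient.getCode: "0x" when there is no code.
interface CodeReader {
  getCode(address: string): Promise<string>;
}

// Injected so the sketch has no dependency; in practice e.g. viem's keccak256.
type Hasher = (bytecode: string) => string;

interface TickResult {
  contracts: number;
  eoas: number;
  failed: number;
}

// One tick: probe up to `batchSize` unprobed addresses, one at a time.
async function detectTick(
  store: AddressStore,
  client: CodeReader,
  keccak: Hasher,
  batchSize = 10,
): Promise<TickResult> {
  const result: TickResult = { contracts: 0, eoas: 0, failed: 0 };
  for (const address of await store.fetchUnprobed(batchSize)) {
    try {
      const code = await client.getCode(address);
      if (code === "0x") {
        await store.markEoa(address);
        result.eoas++;
      } else {
        await store.markContract(address, keccak(code));
        result.contracts++;
      }
    } catch {
      // Nothing was written, so code_hash stays NULL and the address is
      // picked up again next tick; the rest of the batch still runs.
      result.failed++;
    }
  }
  return result;
}

// Re-arm with setTimeout only after a tick finishes, so ticks never overlap.
function startDetector(
  store: AddressStore,
  client: CodeReader,
  keccak: Hasher,
  intervalMs = 4_000,
): () => void {
  let stopped = false;
  let timer: ReturnType<typeof setTimeout> | undefined;
  const loop = async (): Promise<void> => {
    try {
      await detectTick(store, client, keccak);
    } catch {
      // fetchUnprobed itself failed (e.g. a DB hiccup); retry next interval.
    }
    if (!stopped) timer = setTimeout(loop, intervalMs);
  };
  void loop();
  return () => {
    stopped = true;
    if (timer !== undefined) clearTimeout(timer);
  };
}
```

Re-arming with `setTimeout` after each tick (rather than `setInterval`) is one way to make sure a slow RPC response never lets two ticks probe the same unprobed rows at once.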

2. Backfill batch size 50 → 500 (testnet)

The testnet chain currently sits 2.5M blocks ahead of the indexer's main cursor; at INDEXER_BATCH_SIZE=50 the catch-up ETA was ~70h, and 500 brings it down to ~7h. The bump is in docker-compose.testnet.yml only; mainnet stays at the default. Each block fetch is independent and retry429() handles transient 429/502s, and we have not observed any increase in RPC pressure from the larger batches.
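A back-of-envelope check of the ETA, plus one way a batch size might be turned into block ranges. The helper names are hypothetical, and the linear ETA assumes a batch takes roughly the same wall time regardless of its size (i.e. its independent block fetches overlap). From the numbers above, 2.5M blocks / 50 per batch = 50,000 batches in ~70h, or about 5.04 s per batch; at 500 per batch the same per-batch time gives 5,000 batches, or ~7h.

```typescript
// Parse INDEXER_BATCH_SIZE (raw env string); fall back to the default of 50.
function parseBatchSize(raw: string | undefined, fallback = 50): number {
  const n = raw === undefined ? Number.NaN : Number.parseInt(raw, 10);
  return Number.isInteger(n) && n > 0 ? n : fallback;
}

// Split the inclusive block range [from, to] into consecutive batches.
function* planBatches(from: number, to: number, size: number): Generator<[number, number]> {
  for (let start = from; start <= to; start += size) {
    yield [start, Math.min(start + size - 1, to)];
  }
}

// Linear ETA. Assumption: one batch takes about the same wall time whatever
// its size, because its block fetches are independent and overlap.
function etaHours(blocksBehind: number, size: number, secondsPerBatch: number): number {
  return (Math.ceil(blocksBehind / size) * secondsPerBatch) / 3600;
}
```

With those assumptions, `etaHours(2_500_000, 50, 5.04)` comes out at about 70 and `etaHours(2_500_000, 500, 5.04)` at about 7, matching the figures quoted in the PR.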

3. SentrixClient.getCode(address)

Thin wrapper around viem's getBytecode with EOA-as-"0x" normalisation so the detector worker can use a string sentinel instead of special-casing undefined.
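A minimal sketch of the wrapper. To keep it runnable without viem installed, it depends on a one-method `BytecodeSource` interface modelled on viem's `getBytecode`, which resolves to `undefined` when an address has no code. A viem public client should satisfy that interface structurally, but treat the class shape here as illustrative rather than the actual SentrixClient.

```typescript
type Hex = `0x${string}`;

// The one viem PublicClient method this sketch relies on.
interface BytecodeSource {
  getBytecode(args: { address: Hex }): Promise<Hex | undefined>;
}

class SentrixClient {
  private readonly publicClient: BytecodeSource;

  constructor(publicClient: BytecodeSource) {
    this.publicClient = publicClient;
  }

  // Always resolves to a string: the deployed bytecode, or "0x" for any
  // address without code, so callers can compare against a sentinel.
  async getCode(address: Hex): Promise<Hex> {
    const code = await this.publicClient.getBytecode({ address });
    return code ?? "0x";
  }
}
```

Normalising at this one boundary means the detector only ever branches on `code === "0x"` and never needs an `undefined` check.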

satyakwok added 3 commits May 5, 2026 16:41
* ops(testnet): override Dockerfile healthcheck for relocated ports

Image `wget`s 127.0.0.1:8081/health (api) and 127.0.0.1:8082/health
(worker) by default — that's the mainnet stack's port layout. Testnet
relocates both via API_PORT=8083 + INDEXER_HEALTH_PORT=8084 to share
the host with the mainnet stack, but the bake-time healthcheck still
hit the old ports → exit 1 → docker reported unhealthy even though
both services were happily serving 200s on testnet-api.sentrixchain.com.

Add explicit `healthcheck:` blocks on each compose service that point
at the relocated ports. Same interval / timeout / retries as the
Dockerfile defaults so behaviour matches mainnet otherwise.

Verified: post-recreate, `docker inspect -f '{{.State.Health.Status}}'`
returns healthy on both `sentrix-indexer-testnet-{api,worker}` within
seconds of start_period elapsing.

* fix(indexer): populate addresses table from each tx

indexBlock was writing blocks/transactions/logs/token_transfers but never
upserting into the addresses table, so addresses sat empty even after
50K+ indexed txs. Any UI/API that lists "addresses we've seen" (e.g.
/contracts/stats, the scan recent-deployments feed) returned nothing.

Adds per-tx upsert of from + to (when non-null), tracking
first_seen_block / last_seen_block. Coinbase sentinel skipped on the from
side so the all-zero address doesn't claim a row from validator rewards.
is_contract stays false at insert time; a separate eth_getCode pass marks
it true for addresses with non-empty code (cheap, lazy, out of the hot
write path).

Surfaced by a PR #8266 reviewer asking why a deployed contract didn't
appear in any list: the contract is on-chain and readable via
eth_getCode, but our indexer's address-derived endpoints had no row to
return.
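The per-tx upsert might look roughly like this. The `IndexedTx` shape and function name are hypothetical, the column names come from this PR, and the `ON CONFLICT (address)` clause assumes a unique constraint on `addresses.address`. Using `LEAST`/`GREATEST` is a choice made for the sketch so that first/last seen stay correct if blocks are indexed out of order.

```typescript
interface IndexedTx {
  from: string;
  to: string | null; // null for contract-creation txs
}

interface AddressRow {
  address: string;
  firstSeenBlock: number;
  lastSeenBlock: number;
}

// Coinbase sentinel used as `from` on validator-reward txs.
const ZERO_ADDRESS = "0x0000000000000000000000000000000000000000";

// One row per distinct address touched in the block, lowercased.
function addressRowsForBlock(blockNumber: number, txs: IndexedTx[]): AddressRow[] {
  const seen = new Set<string>();
  for (const tx of txs) {
    const from = tx.from.toLowerCase();
    if (from !== ZERO_ADDRESS) seen.add(from); // skip coinbase on the from side only
    if (tx.to !== null) seen.add(tx.to.toLowerCase());
  }
  return Array.from(seen, (address) => ({
    address,
    firstSeenBlock: blockNumber,
    lastSeenBlock: blockNumber,
  }));
}

// is_contract / code_hash are left for the background detector to fill in.
const UPSERT_ADDRESS_SQL = `
INSERT INTO addresses (address, first_seen_block, last_seen_block, is_contract, code_hash)
VALUES ($1, $2, $3, false, NULL)
ON CONFLICT (address) DO UPDATE SET
  first_seen_block = LEAST(addresses.first_seen_block, EXCLUDED.first_seen_block),
  last_seen_block  = GREATEST(addresses.last_seen_block, EXCLUDED.last_seen_block)`;
```

Deduplicating within a block first keeps the number of upserts per block bounded by distinct addresses rather than by tx count.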

* feat(indexer): is_contract auto-detection + faster backfill

Two follow-ups to the addresses-table fix (PR #8):

1. **Contract detection worker** (`apps/indexer/src/contract-detect.ts`).
   The hot tx-insertion path in sync.ts upserts addresses with
   is_contract=false + code_hash=NULL because doing eth_getCode mid-batch
   would dominate runtime. This worker runs in the background, picks up
   addresses with code_hash IS NULL, and flips the flag based on whether
   the chain reports any deployed code. Slow cadence (10 addrs / 4s) so
   a fresh boot doesn't fire 1000+ getCode calls in one second.
   Uses a "0x" sentinel for code_hash on EOAs so we never re-probe them.

2. **Backfill batch size 50 -> 500** in `docker-compose.testnet.yml`.
   Testnet sits 2.5M blocks ahead of the indexer's main cursor; at
   50/batch the catch-up ETA was ~70h. 500/batch trims that to ~7h with
   no observed increase in RPC pressure (retry429 absorbs transient 429/502s).

3. `SentrixClient.getCode(address)` thin wrapper around viem's
   `getBytecode`. Returns "0x" when the address is an EOA so the
   detector worker can use a string sentinel rather than special-case
   undefined.

Surfaced by the PR #8266 audit: even after the addresses-table fix
(commit 037662d), `/contracts/stats` returned empty because every
new row had is_contract=false by default and only one address (manually
upserted) was flipped. With the auto-detector running, every contract
deployed across the chain will surface in addresses-table queries
within seconds of being indexed.
satyakwok merged commit 4f966d5 into main May 5, 2026
satyakwok added a commit that referenced this pull request May 5, 2026
Pairs with the contract-detect worker (PR #9): once an address gets
flipped to is_contract=true, this endpoint surfaces it ordered by
first_seen_block DESC. Doesn't depend on the transactions table —
unlike /contracts/stats which INNER JOINs on indexed call history and
lags the addresses table by hours during backfill catch-up.

Surfaced by PR #8266 reviewer feedback: a deployed contract should
appear in the explorer's contracts list immediately, not just via
direct address lookup.

Schema returns rank, address, first_seen_block, last_seen_block,
code_hash. limit clamped to MAX_PAGE (100).
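One possible shape for the query behind /contracts/recent, returned in the `{ text, values }` form node-postgres accepts. The default limit of 25 and the `address` tie-breaker are assumptions; the commit only specifies the returned columns, the `first_seen_block DESC` ordering, and the 100-row cap.

```typescript
const MAX_PAGE = 100;
const DEFAULT_LIMIT = 25; // hypothetical default; only the 100 cap is specified

// Parse ?limit=, falling back on missing/invalid input and capping at MAX_PAGE.
function clampLimit(raw: string | undefined): number {
  const n = raw === undefined ? DEFAULT_LIMIT : Number.parseInt(raw, 10);
  if (!Number.isFinite(n) || n < 1) return DEFAULT_LIMIT;
  return Math.min(n, MAX_PAGE);
}

interface Query {
  text: string;
  values: unknown[];
}

// Reads only the addresses table, so results don't wait on transactions backfill.
// ROW_NUMBER uses the same ordering as ORDER BY, so rank runs 1..limit.
function recentContractsQuery(rawLimit: string | undefined): Query {
  return {
    text: `
SELECT ROW_NUMBER() OVER (ORDER BY first_seen_block DESC, address) AS rank,
       address, first_seen_block, last_seen_block, code_hash
FROM addresses
WHERE is_contract = true
ORDER BY first_seen_block DESC, address
LIMIT $1`,
    values: [clampLimit(rawLimit)],
  };
}
```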
satyakwok added a commit that referenced this pull request May 5, 2026
* ops(testnet): override Dockerfile healthcheck for relocated ports

* fix(indexer): populate addresses table from each tx

* feat(indexer): is_contract auto-detection + faster backfill

* feat(api): /contracts/recent endpoint — addresses by deployment height

---------

Co-authored-by: satyakwok <satyakwok@users.noreply.github.com>
satyakwok added a commit that referenced this pull request May 7, 2026
…n insert (#11)

* ops(testnet): override Dockerfile healthcheck for relocated ports

* fix(indexer): populate addresses table from each tx

* feat(indexer): is_contract auto-detection + faster backfill

* feat(api): /contracts/recent endpoint — addresses by deployment height

* feat(api): /coinblast/whales — buys+sells above an SRX threshold

The CoinBlast /live frontend wants a "Whale Activity" strip that
surfaces large single trades alongside the regular feed. Pre-fix the
client would have to fetch /coinblast/trades?limit=200 and filter by
srx_amount in JS. That is wasteful (most rows aren't whale-sized) and
incomplete (any whale trade older than the fetched page is silently missed).

New endpoint:

  GET /coinblast/whales?threshold=<srx>&limit=<N>

threshold is decimal SRX (default 100). We multiply it by 1e18 as
Postgres numeric to compare against cb_trades.srx_amount (numeric(78,0))
without ever round-tripping through a JS Number: whale-sized amounts sit
far above 2^53 (100 SRX is already 1e20 base units). Graduations are
excluded server-side (they are one-shot supply migrations, not user trades).

Order: srx_amount desc tie-broken by block_number desc, so the panel
leads with the biggest single trade in the window and falls back to
recency for equal-size whales.
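A sketch of the whale query under stated assumptions: the `kind` column used to exclude graduations, the default limit of 20, and the 100-row cap are hypothetical. What it illustrates is that the threshold is validated as a decimal string, sent to Postgres as text, and scaled by 1e18 there, so no JS Number ever holds an 18-decimal base-unit amount. In Postgres a constant written with an exponent, such as `1e18`, is typed numeric, so the multiplication stays exact.

```typescript
const MAX_PAGE = 100;           // assumed to match the other list endpoints
const DEFAULT_WHALE_LIMIT = 20; // hypothetical

// Plain decimal SRX with at most 18 fractional digits (the smallest unit).
const DECIMAL_SRX = /^\d+(\.\d{1,18})?$/;

interface Query {
  text: string;
  values: (string | number)[];
}

function whalesQuery(rawThreshold: string | undefined, rawLimit: string | undefined): Query {
  const threshold = rawThreshold ?? "100"; // default 100 SRX
  if (!DECIMAL_SRX.test(threshold)) {
    throw new Error(`invalid threshold: ${threshold}`);
  }
  const parsed = rawLimit === undefined ? DEFAULT_WHALE_LIMIT : Number.parseInt(rawLimit, 10);
  const limit =
    Number.isFinite(parsed) && parsed >= 1 ? Math.min(parsed, MAX_PAGE) : DEFAULT_WHALE_LIMIT;
  return {
    // The threshold stays a string here and is scaled inside Postgres, so the
    // comparison (at or above the threshold) is exact numeric math.
    text: `
SELECT *
FROM cb_trades
WHERE kind <> 'graduation'
  AND srx_amount >= $1::numeric * 1e18
ORDER BY srx_amount DESC, block_number DESC
LIMIT $2`,
    values: [threshold, limit],
  };
}
```

Rejecting exponent forms like `1e3` up front keeps the accepted input to plain decimals, which is what the `::numeric` cast and the 18-digit precision check are written for.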

* feat(indexer): cb_tokens metadata fields + sig-gated POST endpoint

Adds image_url, description, twitter_url, telegram_url, website_url,
metadata_updated_at columns to cb_tokens (all nullable; NULL until the
owner posts metadata).

New endpoint POST /coinblast/metadata accepts { curve_address, stamp_ms,
signature, image_url, description, twitter_url, telegram_url,
website_url } and updates the row only when the EIP-191 signature
recovers to the indexed owner_address. Requests whose stamp_ms is more
than 5 minutes old are rejected (replay window).

Closes the gap where recordLocalLaunch only stored metadata in the
launching browser's localStorage; with the indexer holding it, metadata
is now visible from any browser.
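The signature gate could be structured like this. The exact message the client signs is not specified here, so `metadataMessage`'s JSON layout is hypothetical, as are the other names. Recovery is injected as a function; in production it could be viem's `recoverMessageAddress`, which applies the EIP-191 personal-message prefix. This sketch also rejects stamps more than 5 minutes in the future, an assumption beyond the 5-minute replay window described for this endpoint.

```typescript
// Injected so the sketch runs standalone; e.g. viem's recoverMessageAddress.
type RecoverFn = (message: string, signature: string) => Promise<string>;

interface MetadataPost {
  curve_address: string;
  stamp_ms: number;
  signature: string;
  image_url?: string | null;
  description?: string | null;
  twitter_url?: string | null;
  telegram_url?: string | null;
  website_url?: string | null;
}

const REPLAY_WINDOW_MS = 5 * 60 * 1000;

// Hypothetical signed payload: every field except the signature, in a fixed
// key order so client and server serialize identically.
function metadataMessage(p: MetadataPost): string {
  return JSON.stringify({
    curve_address: p.curve_address.toLowerCase(),
    stamp_ms: p.stamp_ms,
    image_url: p.image_url ?? null,
    description: p.description ?? null,
    twitter_url: p.twitter_url ?? null,
    telegram_url: p.telegram_url ?? null,
    website_url: p.website_url ?? null,
  });
}

type Verdict = { ok: true } | { ok: false; reason: string };

async function verifyMetadataPost(
  p: MetadataPost,
  ownerAddress: string, // cb_tokens.owner_address for p.curve_address
  recover: RecoverFn,
  nowMs: number,
): Promise<Verdict> {
  // Outside the window a captured signature can no longer be replayed.
  if (!Number.isFinite(p.stamp_ms) || Math.abs(nowMs - p.stamp_ms) > REPLAY_WINDOW_MS) {
    return { ok: false, reason: "stamp_ms outside replay window" };
  }
  let signer: string;
  try {
    signer = await recover(metadataMessage(p), p.signature);
  } catch {
    return { ok: false, reason: "malformed signature" };
  }
  if (signer.toLowerCase() !== ownerAddress.toLowerCase()) {
    return { ok: false, reason: "signer is not the token owner" };
  }
  return { ok: true };
}
```

Because the stamp is part of the signed message, changing `stamp_ms` to slip back into the window invalidates the signature.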

* fix(indexer): normalize log address + topics + tx hash to lowercase on insert

txs.fromAddr / txs.toAddr already store lowercase (sync.ts:112-114).
Downstream consumers (scan, faucet, indexer endpoints) query with
lowercase WHERE clauses + JOINs. But logs.address and
tokenTransfers.contract were inserted as-is from viem's getLogsRange
output, which depending on the RPC implementation can be EIP-55
checksum (mixed-case).

Result: address-history queries, token-event filters, and
tokenTransfers JOINs would silently miss events for any contract whose
address came back checksummed. Bug class: data-correctness, not crash.
Hard to detect without running queries against real production data.

Defense-in-depth fix: explicitly .toLowerCase() the log address, all
(up to four) topics (selector-prefix LIKE patterns work either way, but
lowercasing keeps them consistent), the tx hash, and the
tokenTransfers.contract column. logAddr is computed once per log and
reused at all three tokenTransfers insert sites (erc20, erc721, erc1155).

tsc clean. No schema migration needed — column type stays varchar(42)
with no case-sensitive constraint.
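A minimal sketch of the normalization. `RawLog`, `rowsForLog`, and the row shapes are hypothetical, and the token standard is assumed to be classified elsewhere and passed in. It lowercases the address, topics, and tx hash once per log, and the token-transfer row takes `contract` from that same normalized address, so the insert sites cannot drift apart.

```typescript
// Hypothetical shape of one log as returned by getLogsRange.
interface RawLog {
  address: string;
  topics: string[];
  transactionHash: string;
  logIndex: number;
  data: string;
}

type Standard = "erc20" | "erc721" | "erc1155";

interface TokenTransferRow {
  contract: string;
  standard: Standard;
  txHash: string;
  logIndex: number;
}

// Lowercase every hex identifier that downstream queries compare or JOIN on.
// `data` is left untouched: it is an opaque ABI-encoded payload, not a key.
function normalizeLog(log: RawLog): RawLog {
  return {
    ...log,
    address: log.address.toLowerCase(),
    topics: log.topics.map((t) => t.toLowerCase()),
    transactionHash: log.transactionHash.toLowerCase(),
  };
}

// Normalize once per log; the transfer row (if any) reuses the same address.
function rowsForLog(
  raw: RawLog,
  standard: Standard | null,
): { log: RawLog; transfer: TokenTransferRow | null } {
  const log = normalizeLog(raw);
  const transfer =
    standard === null
      ? null
      : { contract: log.address, standard, txHash: log.transactionHash, logIndex: log.logIndex };
  return { log, transfer };
}
```

Returning a new object instead of mutating the input keeps the raw RPC response available for debugging.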

---------

Co-authored-by: satyakwok <satyakwok@users.noreply.github.com>