perf(migrations): speed up reward disbursements backfill #829
Merged
The 0201 backfill is taking over an hour in prod. Three structural issues account for the slowdown:

1. The dedup `LEFT JOIN` on `(challenge_id, specifier)` has no index. `sol_reward_disbursements` is keyed by `(signature, instruction_index)` and only indexed on `recipient_eth_address` and `created_at`, so the join degenerates to a sequential scan per `challenge_disbursements` row.
2. The `LATERAL` subquery against `sol_claimable_accounts` re-runs an `ORDER BY slot DESC LIMIT 1` lookup per row, with no index on `(ethereum_address, mint)`.
3. The `on_sol_reward_disbursement` trigger fires on every insert, running three `SELECT`s and possibly an `INSERT` into `notification`. Across 29k rows that overhead is significant, and notifying users about months-old historical rewards is undesirable anyway.

Fixes:

- Add a `sol_reward_disbursements (challenge_id, specifier)` index. It is useful permanently, not just for this migration. It is built with `CREATE INDEX CONCURRENTLY` so the live indexer's writes aren't blocked, and it sits outside the `BEGIN/COMMIT` because `CONCURRENTLY` can't run inside an explicit transaction (psql runs each statement in its own implicit transaction when not wrapped).
- Add a `sol_claimable_accounts (ethereum_address, mint, slot DESC)` index, for the same reasons; the live indexer also benefits from this lookup shape for user_bank resolution.
- Replace the per-row `LATERAL` with a `MATERIALIZED` CTE that pre-computes `DISTINCT ON (ethereum_address)` once and then hash-joins: one indexed scan instead of N `LATERAL` invocations.
- `SET LOCAL session_replication_role = replica` inside the backfill transaction to suppress `on_sol_reward_disbursement`. The setting is per-session, so other connections (including the live indexer's) never see it and still fire the trigger normally; `LOCAL` additionally makes it revert at `COMMIT`, so it can't leak into later statements on the migration's own connection.

Both index creations use `IF NOT EXISTS`, so re-running is safe as long as no invalid index from a failed `CONCURRENTLY` build is left behind (that one must be dropped manually first); the backfill `INSERT` is already idempotent via `ON CONFLICT DO NOTHING`.
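The trigger-suppression fix can be sketched on throwaway tables. This is a minimal illustration, not the migration itself: `demo_disbursements`, `demo_notifications`, and `demo_notify()` are hypothetical stand-ins for `sol_reward_disbursements`, `notification`, and the `on_sol_reward_disbursement` trigger function. It assumes PostgreSQL 11+ (for `EXECUTE FUNCTION`) and a role allowed to set `session_replication_role` (a superuser, or on PostgreSQL 15+ a role granted `SET` on that parameter).

```sql
-- Hypothetical stand-ins; the real trigger also runs SELECTs and pg_notify.
DROP TABLE IF EXISTS demo_disbursements, demo_notifications;

CREATE TABLE demo_disbursements (
  signature         text,
  instruction_index int,
  challenge_id      text,
  specifier         text,
  PRIMARY KEY (signature, instruction_index)
);
CREATE TABLE demo_notifications (signature text);

-- One notification row per inserted disbursement.
CREATE OR REPLACE FUNCTION demo_notify() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
  INSERT INTO demo_notifications VALUES (NEW.signature);
  RETURN NEW;
END;
$$;

CREATE TRIGGER demo_on_disbursement
  AFTER INSERT ON demo_disbursements
  FOR EACH ROW EXECUTE FUNCTION demo_notify();

-- Backfill: triggers in the default ENABLE state do not fire while the session
-- is in replica mode. Caveat: foreign-key checks are implemented as triggers,
-- so they are skipped for these inserts as well.
BEGIN;
SET LOCAL session_replication_role = replica;
INSERT INTO demo_disbursements VALUES
  ('sig1', 0, 'c1', 'a'),
  ('sig2', 0, 'c1', 'b');
COMMIT;

-- SET LOCAL ended with the transaction, so this insert notifies as usual.
INSERT INTO demo_disbursements VALUES ('sig3', 0, 'c2', 'a');

SELECT signature FROM demo_notifications;  -- only 'sig3'
```

The backfilled rows land without notifications, while any insert after `COMMIT` behaves normally again.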
rickyrombo added a commit that referenced this pull request on May 19, 2026
## Summary
- Switches `0201_backfill_missing_reward_disbursements.sql` from `CREATE
INDEX CONCURRENTLY` to plain `CREATE INDEX` inside the migration's
`BEGIN/COMMIT`.
- Both indexes (`sol_reward_disbursements (challenge_id, specifier)` and
`sol_claimable_accounts (ethereum_address, mint, slot DESC)`) are now
atomic with the backfill INSERT — if anything fails, the schema rolls
back cleanly.
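A minimal sketch of why the in-transaction build rolls back cleanly: PostgreSQL DDL is transactional, so an index created after `BEGIN` disappears on `ROLLBACK` instead of lingering as an invalid index the way a failed `CONCURRENTLY` build does. `demo_claimable_accounts` and the index name are hypothetical, and the explicit `ROLLBACK` stands in for the migration runner aborting after a failed backfill `INSERT`.

```sql
DROP TABLE IF EXISTS demo_claimable_accounts;
CREATE TABLE demo_claimable_accounts (
  ethereum_address text,
  mint             text,
  slot             bigint
);

BEGIN;
CREATE INDEX IF NOT EXISTS demo_claimable_eth_mint_slot_idx
  ON demo_claimable_accounts (ethereum_address, mint, slot DESC);
-- ... the backfill INSERT would run here; suppose it fails ...
ROLLBACK;

-- The index was rolled back together with everything else in the transaction.
SELECT to_regclass('demo_claimable_eth_mint_slot_idx') AS idx;  -- NULL
```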
## Why
`CREATE INDEX CONCURRENTLY` waits on a `virtualxid` lock for
transactions open during its build phases. In the final phase that means
every transaction in the database holding a snapshot older than the
build's, not just transactions that touch the target table.
The legacy Python `index_rewards_manager` Celery task on
discovery-provider keeps ~3-minute transactions open against
`challenge_disbursements` continuously. As fast as one ends, another is
already open. So the CONCURRENTLY build keeps landing on one of these
long transactions and has to wait it out, and it did: 10+ minutes
blocked on `Lock/virtualxid` in tonight's deploy.
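When a concurrent build stalls like this, a few catalog views show what it is waiting for. A diagnostic sketch, assuming PostgreSQL 12+ (for `pg_stat_progress_create_index`); the 60-character query truncation and `LIMIT 10` are arbitrary choices.

```sql
-- 1. Backends waiting on another transaction's virtual xid, and which PIDs
--    hold the lock they are waiting for.
SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       now() - query_start   AS waiting_for,
       left(query, 60)       AS query
FROM pg_stat_activity
WHERE wait_event_type = 'Lock'
  AND wait_event = 'virtualxid';

-- 2. Oldest open transactions in this database; long-lived ones are what a
--    concurrent build has to outlast.
SELECT pid,
       now() - xact_start AS xact_age,
       state,
       left(query, 60)    AS query
FROM pg_stat_activity
WHERE datname = current_database()
  AND xact_start IS NOT NULL
  AND pid <> pg_backend_pid()
ORDER BY xact_start
LIMIT 10;

-- 3. Which phase a running index build is in (e.g. 'waiting for old snapshots')
--    and which locker it is currently waiting on.
SELECT pid,
       relid::regclass AS target_table,
       phase,
       lockers_done,
       lockers_total,
       current_locker_pid
FROM pg_stat_progress_create_index;
```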
Trade-off accepted: regular `CREATE INDEX` takes a `ShareLock` on the
target table, blocking writes, and because the build now runs inside the
migration's `BEGIN/COMMIT`, that lock is held until `COMMIT` (through the
backfill INSERT, not just the build). But both target tables are written
only by the Go indexer, and only on reward_manager `EvaluateAttestations`
and claimable token `Create` instructions, which are sparse on-chain. At
current row counts each build completes in seconds; the blocked writes
just queue on pgxpool and resume right after `COMMIT`.
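Locks taken inside a transaction are held until it ends, so a plain `CREATE INDEX` inside `BEGIN/COMMIT` keeps its table lock until `COMMIT`. A small sketch to confirm the lock mode, using a hypothetical `demo_reward_disbursements` table; the `pg_locks` rows are copied into the equally hypothetical `demo_observed_locks` only so they can be inspected after `COMMIT`.

```sql
DROP TABLE IF EXISTS demo_reward_disbursements, demo_observed_locks;
CREATE TABLE demo_reward_disbursements (challenge_id text, specifier text);
CREATE TABLE demo_observed_locks (mode text, granted boolean);

BEGIN;
CREATE INDEX demo_rd_challenge_specifier_idx
  ON demo_reward_disbursements (challenge_id, specifier);

-- Still inside the transaction: record the table-level locks this backend holds.
-- Until COMMIT, an INSERT from another session into demo_reward_disbursements
-- waits (ShareLock conflicts with RowExclusiveLock); plain SELECTs
-- (AccessShareLock) are not blocked.
INSERT INTO demo_observed_locks
SELECT mode, granted
FROM pg_locks
WHERE locktype = 'relation'
  AND relation = 'demo_reward_disbursements'::regclass
  AND pid = pg_backend_pid();
COMMIT;

SELECT * FROM demo_observed_locks;
```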
## Test plan
- [ ] Cancel any in-flight 0201 attempt and drop any invalid index it
left behind:
```sql
SELECT pg_cancel_backend(pid) FROM pg_stat_activity
WHERE query ILIKE '%CREATE INDEX CONCURRENTLY%'
  AND pid <> pg_backend_pid();  -- this query's own text also matches
DROP INDEX IF EXISTS sol_reward_disbursements_challenge_specifier_idx;
DROP INDEX IF EXISTS sol_claimable_accounts_eth_mint_slot_idx;
```
- [ ] Roll the new image; migration Job's `bridge migrate` should
complete in well under a minute.
- [ ] Verify both indexes exist as `indisvalid = true`:
```sql
SELECT indexrelid::regclass, indisvalid FROM pg_index
WHERE indexrelid::regclass::text IN (
'sol_reward_disbursements_challenge_specifier_idx',
'sol_claimable_accounts_eth_mint_slot_idx'
);
```
- [ ] Verify missing-row count drops as expected (per #829's test plan).
🤖 Generated with [Claude Code](https://claude.com/claude-code)
## Summary

- Add two `CREATE INDEX CONCURRENTLY` statements at the top of `0201_backfill_missing_reward_disbursements.sql` (outside the `BEGIN/COMMIT`):
  - `sol_reward_disbursements (challenge_id, specifier)` lets the dedup `LEFT JOIN` find existing rows by index instead of a per-row sequential scan.
  - `sol_claimable_accounts (ethereum_address, mint, slot DESC)` supports the "latest claimable account per wallet" lookup pattern (used by this migration and the live reward_manager indexer).
- Replace the per-row `LATERAL` subquery with a `WITH user_banks AS MATERIALIZED` CTE that pre-computes `DISTINCT ON (ethereum_address)` once and hash-joins against the result.
- `SET LOCAL session_replication_role = replica` inside the backfill transaction to suppress the `on_sol_reward_disbursement` trigger, which fires per row to create `challenge_reward` notifications + `pg_notify`. For a one-shot backfill of months-old historical rewards we don't want to spam users, and the trigger work was a meaningful chunk of the per-row cost.

## Why
The 0201 backfill is taking over an hour against prod. Diagnosis:
1. The dedup `LEFT JOIN` on `(challenge_id, specifier)` had no index: `sol_reward_disbursements` is keyed by `(signature, instruction_index)`, and the only other indexes (from 0198) are on `recipient_eth_address` and `created_at`.
2. The `LATERAL` against `sol_claimable_accounts` reran `ORDER BY slot DESC LIMIT 1` per row.

With the new index alone, the LEFT JOIN goes from O(n×m) to O(n log m), where n is the number of `challenge_disbursements` rows and m the number of `sol_reward_disbursements` rows. With the trigger off and the CTE substitution, the per-row work drops correspondingly. Expected runtime: well under a minute, vs >1h currently.
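A scaled-down sketch of the query shape described above: an index-backed anti-join for dedup plus a `MATERIALIZED` CTE that picks each wallet's latest claimable account once. All `demo_*` tables and their columns, the `'MINT'` literal, and `instruction_index = 0` are hypothetical simplifications, not the real 0201 schema; `MATERIALIZED` needs PostgreSQL 12+. In this sketch the inner join to `user_banks` skips rows whose wallet has no claimable account.

```sql
DROP TABLE IF EXISTS demo_cd, demo_rd, demo_ca;

-- Hypothetical, simplified stand-ins for the real tables.
CREATE TABLE demo_cd (  -- challenge_disbursements
  challenge_id text, specifier text, user_eth text, signature text, amount bigint
);
CREATE TABLE demo_rd (  -- sol_reward_disbursements
  signature text, instruction_index int,
  challenge_id text, specifier text, user_bank text, amount bigint,
  PRIMARY KEY (signature, instruction_index)
);
CREATE TABLE demo_ca (  -- sol_claimable_accounts
  ethereum_address text, mint text, account text, slot bigint
);

CREATE INDEX ON demo_rd (challenge_id, specifier);            -- dedup anti-join
CREATE INDEX ON demo_ca (ethereum_address, mint, slot DESC);  -- latest account per wallet

INSERT INTO demo_cd VALUES
  ('c1', 'a', '0xaa', 'sigA', 10),  -- already disbursed
  ('c1', 'b', '0xbb', 'sigB', 20),  -- missing; wallet has two accounts
  ('c2', 'c', '0xcc', 'sigC', 30);  -- missing; wallet has no claimable account
INSERT INTO demo_rd VALUES ('sigA', 0, 'c1', 'a', 'bankA', 10);
INSERT INTO demo_ca VALUES
  ('0xaa', 'MINT', 'bankA',     50),
  ('0xbb', 'MINT', 'bankB_old', 100),
  ('0xbb', 'MINT', 'bankB_new', 200);

-- Compute each wallet's latest account once instead of a per-row LATERAL
-- "ORDER BY slot DESC LIMIT 1"; MATERIALIZED makes it a one-time result
-- that the outer query can hash-join against.
WITH user_banks AS MATERIALIZED (
  SELECT DISTINCT ON (ethereum_address) ethereum_address, account
  FROM demo_ca
  WHERE mint = 'MINT'
  ORDER BY ethereum_address, slot DESC
)
INSERT INTO demo_rd (signature, instruction_index, challenge_id, specifier, user_bank, amount)
SELECT cd.signature, 0, cd.challenge_id, cd.specifier, ub.account, cd.amount
FROM demo_cd cd
JOIN user_banks ub ON ub.ethereum_address = cd.user_eth
LEFT JOIN demo_rd rd
  ON rd.challenge_id = cd.challenge_id AND rd.specifier = cd.specifier
WHERE rd.signature IS NULL  -- anti-join: keep only rows not yet disbursed
ON CONFLICT (signature, instruction_index) DO NOTHING;
```

Only `sigB` is inserted, pointing at `bankB_new`. Running the `WITH … INSERT` a second time inserts nothing: the anti-join no longer finds `sigB` missing, and `ON CONFLICT` would absorb a duplicate key anyway.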
## Migration idempotency

- `CREATE INDEX CONCURRENTLY IF NOT EXISTS`: safe to re-run. Existing valid indexes make it a no-op. An existing invalid index (from a previous failed `CONCURRENTLY` run) would also be skipped, since `IF NOT EXISTS` only checks the name, so it needs a manual `DROP INDEX` first.
- `INSERT … ON CONFLICT (signature, instruction_index) DO NOTHING`: unchanged; safe on re-run.

## Test plan
- [ ] Cancel any in-flight 0201 attempt (`pg_cancel_backend(<pid>)` on the stuck session).
- [ ] Check index validity with `SELECT c.relname AS indexname, x.indisvalid FROM pg_class c JOIN pg_index x ON x.indexrelid = c.oid WHERE c.relname IN ('sol_reward_disbursements_challenge_specifier_idx', 'sol_claimable_accounts_eth_mint_slot_idx');` and drop any invalid ones.
- [ ] `SELECT COUNT(*) FROM challenge_disbursements cd LEFT JOIN sol_reward_disbursements rd ON rd.challenge_id = cd.challenge_id AND rd.specifier = cd.specifier WHERE rd.signature IS NULL AND cd.slot > 355300886;` should drop from ~29k toward 0 (modulo the no-current-user bucket, which is intentionally not recoverable).

🤖 Generated with Claude Code