feat(core/db): drop redundant idx_core_transactions_tx_hash #205

Merged: raymondjacobson merged 3 commits into OpenAudio:main from RolfAris:feat/drop-redundant-tx-hash-index on May 11, 2026

@RolfAris RolfAris commented Apr 16, 2026

2026-05-02 update: This PR is now part of a sibling-PR campaign tracked at #218. This is step 1 of 2; the parallel drop of idx_core_tx_hash on core_tx_stats is at #217. The RFC has the consolidated zero-scan evidence across both tables. Either PR can land first.

The shipped tx-hash lookup (GetTx) uses lower(tx_hash) = lower($1) and rides idx_core_transactions_tx_hash_lower. The plain tx_hash index is not referenced by any query in pkg/core or pkg/etl; it is pure write-amp and disk overhead.

Drops online with CONCURRENTLY to avoid blocking block ingestion (a non-concurrent DROP INDEX would take ACCESS EXCLUSIVE on core_transactions). Rollback recreates the index online.
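
A minimal sketch of the statement pair described above, written as Go string builders. The helper names and the plain btree definition used for the rollback are assumptions for illustration; the migration file itself is not reproduced in this thread. One detail worth keeping in mind: PostgreSQL does not allow `DROP INDEX CONCURRENTLY` or `CREATE INDEX CONCURRENTLY` inside a transaction block, so a migration runner has to execute these statements outside one.

```go
package main

import "fmt"

// dropIndexConcurrently builds the online "up" statement. IF EXISTS makes a
// re-run a no-op on nodes where the index is already gone.
func dropIndexConcurrently(index string) string {
	return fmt.Sprintf("DROP INDEX CONCURRENTLY IF EXISTS %s;", index)
}

// createIndexConcurrently builds the online rollback statement.
// Assumption: the dropped index was a plain btree on a single column.
func createIndexConcurrently(index, table, column string) string {
	return fmt.Sprintf(
		"CREATE INDEX CONCURRENTLY IF NOT EXISTS %s ON %s (%s);",
		index, table, column,
	)
}
```

For this PR the calls would look like `dropIndexConcurrently("idx_core_transactions_tx_hash")` for the forward step and `createIndexConcurrently("idx_core_transactions_tx_hash", "core_transactions", "tx_hash")` for the rollback.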

Operators running ad-hoc WHERE tx_hash = '…' in psql should switch to WHERE lower(tx_hash) = lower('…') — the functional index has been the canonical shipped pattern for a while.

Fleet evidence and EXPLAIN plans in comment below.

@RolfAris commented (Contributor, Author)

Evidence

Code path audit (pkg/core/db/sql/reads.sql, main @ 2026-04-16)

Every core_transactions access:

| query | predicate | index used |
| --- | --- | --- |
| GetTx | lower(tx_hash) = lower($1) | idx_core_transactions_tx_hash_lower |
| GetBlockTransactions | block_id = $1 | idx_core_transactions_block_id |
| GetRecentTxs | order by created_at desc | idx_core_transactions_created_at |
| TotalTxResults | count(tx_hash) | any |

No query in pkg/core or pkg/etl does exact-case tx_hash = $1 against core_transactions. Other tx_hash = $1 lookups in the tree target different tables (core_etl_tx, core_rewards, core_ern, core_mead, core_pie) and are untouched.

Live evidence — 20 independent validator nodes, Postgres 15.16

Stats cumulative since pg_postmaster_start_time ≈ 2026-04-13 (~3.2 days).

Per node (the universal unit — operators run 1, 3, 30 nodes):

| metric (per node) | value |
| --- | --- |
| idx_core_transactions_tx_hash size | ~5.20 GiB |
| idx_core_transactions_tx_hash scans | 0 (on all 20/20) |
| idx_core_transactions_tx_hash_lower scans > 0 | 14/20 nodes |
| core_transactions rows | ~58.5–58.8M |
| core_transactions inserts / day | ~50–120K |
| redundant btree inserts / day eliminated | ~50–120K |

Per node: ~5.20 GiB reclaimed; one fewer btree touched per core_transactions insert — smaller insert-path WAL, fewer dirty buffers, faster block apply. The write-amp reduction scales with a node's share of chain throughput, not with operator size.
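
One way to turn per-node counters like these into a fleet-wide verdict is sketched below. It assumes each node's row was collected from `pg_stat_user_indexes` (the `idx_scan` column) together with `pg_relation_size(indexrelid)`; the `IndexStat` type and `unusedFleetWide` function are hypothetical names, not code from this PR.

```go
package main

// IndexStat is one node's observation of one index. Hypothetical type:
// Scans would come from pg_stat_user_indexes.idx_scan and SizeBytes from
// pg_relation_size(indexrelid).
type IndexStat struct {
	Node      string
	Index     string
	Scans     int64
	SizeBytes int64
}

// unusedFleetWide reports whether the named index has zero scans on every
// node that reported it, and how many bytes dropping it would reclaim in
// total. It returns false when no node reported the index at all.
func unusedFleetWide(stats []IndexStat, index string) (unused bool, reclaimBytes int64) {
	seen := false
	for _, s := range stats {
		if s.Index != index {
			continue
		}
		seen = true
		if s.Scans > 0 {
			// A single scan anywhere means some query still uses the index.
			return false, 0
		}
		reclaimBytes += s.SizeBytes
	}
	return seen, reclaimBytes
}
```

The all-or-nothing rule is deliberate in this sketch: a single node with a nonzero scan count is enough to keep the index, since it means some code path or operator habit still depends on it.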

EXPLAIN (ANALYZE, BUFFERS) — val001, val002, val020

```
Limit  (cost=0.69..8.71 rows=1 width=557) (actual time=0.6–1.1 ms)
  ->  Index Scan using idx_core_transactions_tx_hash_lower on core_transactions
        Index Cond: (lower(tx_hash) = lower($1))
```

Planner picks the functional index on every node tested. Sub-ms exec.

Our rollout plan

Canary one validator first, 24h soak, then staggered fleet rollout. Will report back if anything looks off.

@RolfAris commented (Contributor, Author)

Canary report — val001, T+24h

DROP INDEX CONCURRENTLY IF EXISTS idx_core_transactions_tx_hash applied to val001 at 2026-04-16T10:42Z. 24h observation complete, posted automatically from the canary host.

Index state

  • idx_core_transactions_tx_hash: GONE
  • idx_core_transactions_tx_hash_lower: present (5313 MB)
  • Planner still selects the functional index for GetTx (EXPLAIN match: 1)
  • _lower scans: 583 → 742 (+159 in the 24h window)

Chain state

  • live=true, height=23042673, total_tx=59478326
  • peers healthy: 71/71

Logs (last 24h)

  • Postgres / migration / core_transactions errors: 0

No regressions observed. Ready to stagger-roll the remaining 19 nodes once this PR merges.
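
The checks in this report could be expressed as a simple gate before the staggered rollout. This is only a sketch: the `CanaryReport` type and the pass criteria are assumptions drawn from the fields listed above, not the script that posted the report.

```go
package main

// CanaryReport mirrors the fields in the report above. Hypothetical type.
type CanaryReport struct {
	DroppedIndexPresent bool  // is the plain tx_hash index still there?
	LowerScansBefore    int64 // functional-index scans when the drop was applied
	LowerScansAfter     int64 // functional-index scans at the end of the soak
	Live                bool
	PeersHealthy        int
	PeersTotal          int
	ErrorCount          int // Postgres / migration / core_transactions errors
}

// lowerScanDelta is the number of lookups the functional index served
// during the soak window.
func lowerScanDelta(r CanaryReport) int64 {
	return r.LowerScansAfter - r.LowerScansBefore
}

// readyForRollout applies the assumed pass criteria: the index is gone, the
// node is live, every peer is healthy, and no errors were logged.
func readyForRollout(r CanaryReport) bool {
	return !r.DroppedIndexPresent &&
		r.Live &&
		r.PeersHealthy == r.PeersTotal &&
		r.ErrorCount == 0
}
```

With the val001 values (scans 583 to 742, peers 71/71, zero errors), `lowerScanDelta` gives 159 and `readyForRollout` returns true.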

@raymondjacobson raymondjacobson self-requested a review May 11, 2026 22:58
@raymondjacobson raymondjacobson merged commit 42aea5d into OpenAudio:main May 11, 2026
4 checks passed
rickyrombo added a commit that referenced this pull request May 12, 2026
main shipped a different 00033 (drop_redundant_tx_hash_index, #205)
while this branch was open. Bump ours to 00034 to keep migration
ordering unambiguous; content is unchanged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
rickyrombo added a commit that referenced this pull request May 12, 2026
* Reward authority rotation primitive (PRs 222 + 225 + 228 bundled)

Three logical chunks bundled onto a single branch off main:

PR1 — Schema (mjp-reward-pools-schema):
  - New core_reward_pools table keyed by Solana RM pubkey, with a
    text[] authorities column (gin-indexed for @> containment).
  - launchpad_authority_rm seed table mapping every known launchpad-
    derived per-mint claim authority → its Solana reward manager
    state account. Used by both the migration backfill and PR2's
    wire-compat replay logic.
  - core_rewards.rewards_manager_pubkey FK column; claim_authorities
    column dropped (reads now alias coalesce(p.authorities, '{}')
    via LEFT JOIN on core_reward_pools).
  - Backfill creates one pool per RM (per-RM authority union across
    all rewards referencing it via launchpad lookup). Rows whose
    authorities don't match any launchpad RM stay NULL — there are
    no synthetic mig_<md5> identifiers.
  - Live finalizeCreateReward (legacy proto shape, brief PR1-only
    window) does launchpad lookup → bind to existing pool only;
    never upserts. NULL fallback if no match or pool missing.

PR2 — CometBFT transactions (mjp-reward-pools-tx):
  - New body+signature envelope: Tx { TxBody body; signatures[] }.
    Reward and RewardPool messages move to the new shape.
  - CreateRewardPool / SetRewardPoolAuthorities txs gated by
    real-RM-shape pubkey + signer ∈ current pool authorities.
  - CreateReward proto reserves tags 4-6 (former claim_authorities,
    deadline, signature) and uses tag 7 for rewards_manager_pubkey.
    DeleteReward reserves tags 2-3.
  - Wire-compat layer (rewards_legacy.go): legacy bytes are
    REJECTED at CheckTx/ProcessProposal (no new legacy txs
    accepted) but ACCEPTED at FinalizeBlock for block-sync replay
    of historical chain state. Replay uses launchpad lookup to
    bind legacy rewards to the same RM the migration produced.
  - Defense-in-depth re-validation at finalize for both pool txs
    (block-sync replay skips ProcessProposal / CheckTx).

PR3 — Validator endpoint cutover (mjp-reward-pools-endpoints):
  - GetRewardAttestation restored from the #215 kill-switch. Auth
    check uses dbReward.ClaimAuthorities, which is sourced from
    coalesce(p.authorities, '{}') — so rotating an authority out
    via SetRewardPoolAuthorities immediately revokes attestation
    rights. RewardClaim.RewardAddress is intentionally NOT set
    (Solana reward manager program expects 2-piece RewardID:
    Specifier disbursement_id).
  - GetRewardSenderAttestation / GetDeleteRewardSenderAttestation
    dispatch by RM: pool-gated if pool exists, else fall back to
    the legacy validator/AAO trust set (AUDIO path).
  - AUDIO RM denylist on validateRewardsManagerPubkey: prevents an
    attacker from creating a pool for the AUDIO RM and inheriting
    AUDIO sender attestations. Per-env constants in
    pkg/core/config/rewards.go (dev/prod populated; stage left
    empty intentionally).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Tighten reward-pool gating: union replay, AUDIO-only fallback

Three review-driven fixes on the bundle branch:

1. Replay/migration apphash divergence (#1).
   UpsertSyntheticRewardPool was a hard overwrite, which produced
   pool.authorities = last-replayed-reward.authorities on a from-genesis
   block-sync — diverging from the migration backfill, which UNIONs
   authorities across every legacy reward referencing the RM. Production
   data has at most one authority per reward today, so the bug doesn't
   currently manifest, but it's cheap insurance against future drift
   (multi-authority rewards, debug keys, etc.). The DO UPDATE clause now
   unions existing pool authorities with the incoming set.

   Renamed the query to UpsertLegacyReplayRewardPool to reflect its
   actual (and only) caller — the mig_<md5> shape was already gone (#5).

2. senderGateForRM AUDIO-only fallback (#2).
   The legacy validator/AAO trust set used to be the fallback for ANY
   RM without a pool. That was a quietly-permissive seam — any caller
   could request validator-signed attestations for an arbitrary unknown
   RM. Now the fallback applies only when the requested RM equals the
   configured AUDIO RM; every other no-pool RM gets
   ErrSenderGateUnknownRM, which the handlers map to InvalidArgument.

3. Stale doc comment in rewards_legacy.go (#6) saying the file did
   "synthetic-pool fallback for create" — predates the mig_<md5>
   removal. Updated to describe the launchpad-lookup behavior.
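
To illustrate item 1, here is an in-memory model of the difference between overwrite and union semantics. The actual fix lives in the SQL DO UPDATE clause; `unionAuthorities` is a hypothetical stand-in, and the lowercasing and sorting are assumptions made here so that the result is independent of replay order.

```go
package main

import (
	"sort"
	"strings"
)

// unionAuthorities returns the de-duplicated union of the existing pool
// authorities and the incoming set. Lowercasing and sorting are assumptions
// of this sketch; they make the output independent of input order.
func unionAuthorities(existing, incoming []string) []string {
	set := make(map[string]struct{}, len(existing)+len(incoming))
	for _, a := range existing {
		set[strings.ToLower(a)] = struct{}{}
	}
	for _, a := range incoming {
		set[strings.ToLower(a)] = struct{}{}
	}
	out := make([]string, 0, len(set))
	for a := range set {
		out = append(out, a)
	}
	sort.Strings(out)
	return out
}
```

Replaying two legacy rewards for the same RM in either order now converges on the same authority set, whereas a hard overwrite would leave only the authorities of whichever reward was replayed last.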

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* CreateRewardPool: require ed25519 signature from RM keypair

Closes the pool-creation frontrunning vector. Today's
validateCreateRewardPool only requires signer ∈ initial_authorities,
which an attacker satisfies trivially by listing themselves. After a
new reward manager is initialized on Solana, an observer who watches
init events can race the legitimate launchpad operator's
CreateRewardPool and register a pool with attacker-chosen authorities;
the legitimate operator is then locked out (PK conflict on
rewards_manager_pubkey), and the attacker controls every reward and
sender attestation under the RM.

Defense rests on a property of the existing system: the Solana
rewardManagerState account is a deterministic ed25519 keypair, derived
by the launchpad relay as

  Keypair.fromSeed(sha256(launchpadDeterministicSecret ||
                          'audius-launchpad' ||
                          'reward-manager' ||
                          mint))

(see apps/.../solana-relay/.../launchpad/launch_coin.ts). The
launchpad has the secret and can re-derive the keypair at will; an
attacker who lacks the secret cannot. The 32-byte rewardManagerState
public key IS what cometbft has been carrying as
rewards_manager_pubkey — so we already have an ed25519 verification
key in hand at validate time.
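
A Go rendering of that derivation, as a sketch only. The real code is the TypeScript relay referenced above; here the mint is assumed to be concatenated as raw bytes, and the function names are hypothetical.

```go
package main

import (
	"crypto/ed25519"
	"crypto/sha256"
)

// deriveRewardManagerKey models
//   Keypair.fromSeed(sha256(secret || 'audius-launchpad' || 'reward-manager' || mint)).
// Assumption: mint is appended as raw bytes; the relay may encode it differently.
func deriveRewardManagerKey(secret, mint []byte) ed25519.PrivateKey {
	h := sha256.New()
	h.Write(secret)
	h.Write([]byte("audius-launchpad"))
	h.Write([]byte("reward-manager"))
	h.Write(mint)
	// The 32-byte digest is exactly ed25519.SeedSize.
	return ed25519.NewKeyFromSeed(h.Sum(nil))
}

// rewardManagerPubkey returns the 32-byte verification key, i.e. the value
// carried on chain as rewards_manager_pubkey (before base58 encoding).
func rewardManagerPubkey(priv ed25519.PrivateKey) ed25519.PublicKey {
	return priv.Public().(ed25519.PublicKey)
}
```

Because the derivation is deterministic, anyone holding the secret can reproduce the keypair for a given mint, while the public key alone reveals nothing that helps forge a signature.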

This commit:

  1. Adds CreateRewardPool.rm_owner_signature (proto tag 3, bytes).
  2. Defines a canonical signing payload in pkg/rewards:
       "audius:create-reward-pool:" + chain_id + ":" +
       rm_pubkey_b58 + ":" + sorted_lowercased_authorities.join(",")
     and a SignCreateRewardPool helper for client-side use.
  3. validateCreateRewardPool and finalizeCreateRewardPool each call
     verifyRewardPoolOwnerSignature, which decodes rm_pubkey from
     base58 and runs ed25519.Verify against the canonical payload.
     Defense-in-depth at finalize matches the existing pattern for
     replay-time invariants.
  4. Updates SDK example (examples/rewards/main.go) and integration
     tests to populate the signature. New unit tests cover positive
     verification, canonicalization invariance, foreign-keypair
     rejection, cross-chain replay, mismatched authorities, malformed
     signature length, and rm_pubkey shape errors.
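
A sketch of the canonical payload and the verification step from items 2 and 3. These are illustrative stand-ins, not the helpers in pkg/rewards: the function names are hypothetical, authorities are assumed to be lowercased and then sorted, and the verifier takes an already-decoded public key to avoid a base58 dependency.

```go
package main

import (
	"crypto/ed25519"
	"sort"
	"strings"
)

// canonicalPayload builds the domain-separated string from the commit message.
// Assumption: authorities are lowercased first, then sorted.
func canonicalPayload(chainID, rmPubkeyB58 string, authorities []string) []byte {
	norm := make([]string, len(authorities))
	for i, a := range authorities {
		norm[i] = strings.ToLower(a)
	}
	sort.Strings(norm)
	return []byte("audius:create-reward-pool:" + chainID + ":" +
		rmPubkeyB58 + ":" + strings.Join(norm, ","))
}

// signPayload is the client side: the RM keypair signs the canonical payload.
func signPayload(rmKey ed25519.PrivateKey, chainID, rmPubkeyB58 string, authorities []string) []byte {
	return ed25519.Sign(rmKey, canonicalPayload(chainID, rmPubkeyB58, authorities))
}

// verifyPayload is the validator side. The length guard avoids the panic
// ed25519.Verify raises on a malformed public key.
func verifyPayload(rmPub ed25519.PublicKey, sig []byte, chainID, rmPubkeyB58 string, authorities []string) bool {
	if len(rmPub) != ed25519.PublicKeySize {
		return false
	}
	return ed25519.Verify(rmPub, canonicalPayload(chainID, rmPubkeyB58, authorities), sig)
}

// demoKeypair derives a throwaway keypair from a repeated byte. Demo only.
func demoKeypair(fill byte) (ed25519.PublicKey, ed25519.PrivateKey) {
	seed := make([]byte, ed25519.SeedSize)
	for i := range seed {
		seed[i] = fill
	}
	priv := ed25519.NewKeyFromSeed(seed)
	return priv.Public().(ed25519.PublicKey), priv
}
```

Canonicalizing before signing is what makes the signature insensitive to the order and letter case of the authority list, while any change to the chain ID, RM pubkey, or authority set invalidates it.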

The existing signer ∈ initial_authorities check is retained alongside
the new ed25519 gate. They're independent: the ed25519 sig proves
control of the RM keypair (frontrunning defense); the membership
check enforces the existing "you can't create a pool you have no
membership in" property. Operationally, the launchpad relay holds
both the per-mint claim authority eth key (envelope signer + the only
initial authority) and the RM ed25519 keypair, so producing both
signatures is symmetric.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Move rm_owner_signature to envelope, sign body bytes

Restructuring on top of the previous commit: the ed25519
rm_owner_signature moves from CreateRewardPool.rm_owner_signature (tag
3, signing a custom canonical string) to
RewardPoolMessage.rm_owner_signature (envelope-level, signing the same
ProtoMarshal(body) bytes the secp256k1 envelope signature covers).

Why:

  - One encoding to maintain instead of two. Cross-language clients
    (the TS launchpad relay) now sign the same bytes for both
    signatures; no separate domain-separated string format to keep in
    sync.
  - Body bytes implicitly cover deadline_block_height + the action
    oneof discriminator. The earlier custom string didn't include
    deadline; stale-deadline replay was technically possible (though
    blocked by pool PK uniqueness).
  - Future fields added to RewardPoolBody / CreateRewardPool are
    automatically covered without revving the signing scheme.

Not included: chain_id in the body. Cross-chain replay isn't a
concrete threat — each environment's launchpad uses a different
deterministic secret, so the same rewards_manager_pubkey cannot be
derived on more than one chain. A captured CreateRewardPool replayed
on another chain refers to an RM that doesn't exist there.

Other changes:

  - pkg/common.ProtoSignableBytes (new): exports the deterministic-
    marshal helper so verifyRewardPoolOwnerSignature can hash the
    same bytes ProtoSign / ProtoRecover use.
  - SDK signAndSendRewardPool takes an rmOwnerSig parameter; the
    CreateRewardPool wrapper accepts an ed25519.PrivateKey and signs
    body bytes locally. SetRewardPoolAuthorities passes nil — rotation
    is gated by current pool authorities, no RM signature needed.
  - Removed pkg/rewards.SignCreateRewardPool /
    CanonicalCreateRewardPoolPayload / CreateRewardPoolOwnerSignatureDomain
    — replaced by the body-bytes signing path.
  - Updated unit tests, integration tests, and example to populate
    rmKey at the SDK call site rather than constructing a signed
    message struct.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Renumber reward-pools migration 00033 → 00034

main shipped a different 00033 (drop_redundant_tx_hash_index, #205)
while this branch was open. Bump ours to 00034 to keep migration
ordering unambiguous; content is unchanged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* Address PR #254 review feedback

Six Copilot-flagged items from the latest review:

1. Proto signature comments now spell out the actual pre-hashing:
   - RewardMessage.signature: secp256k1 over sha256(body bytes).
   - RewardPoolMessage.signature: same.
   - RewardPoolMessage.rm_owner_signature: ed25519 over body bytes
     directly (ed25519 hashes internally — do NOT pre-hash).
   Lets non-Go clients reproduce signatures without reading
   pkg/common/crypto.go.

2. Split validateRewardsManagerPubkey:
   - validateRewardsManagerPubkeyShape (new): non-empty, no whitespace,
     base58, 32 bytes. Pure shape. For read paths and rotation paths.
   - validateRewardsManagerPubkey (existing): shape + AUDIO denylist.
     Only for write paths (CreateRewardPool, CreateReward).

   Switched call sites:
   - validateSetRewardPoolAuthorities / finalizeSetRewardPoolAuthorities
     → Shape. SetAuthorities targets an existing pool; AUDIO has no
     pool by construction, so checkPoolAuthorization surfaces the case
     as "pool not found" rather than the misleading "is reserved".
   - GetRewardPool → Shape. Probing GetRewardPool(AudioRM) now returns
     a clean NotFound instead of InvalidArgument.
   - GetRewardSenderAttestation /
     GetDeleteRewardSenderAttestation → add Shape validation up front
     so malformed pubkeys return a clear InvalidArgument instead of
     falling through to ErrSenderGateUnknownRM (which is for valid-
     shape-but-unmapped RMs).

3. Removed the stale "chain_id is covered by signed body bytes"
   reference in validateCreateRewardPool's comment — the body
   doesn't carry chain_id, and that's intentional (cross-chain replay
   isn't a threat because per-env launchpad secrets prevent the same
   rewards_manager_pubkey from existing on more than one chain).

4. SDK CreateRewardPool now validates rmKey length up front and
   returns a typed error instead of panicking inside ed25519.Sign for
   callers that pass nil / hex-decode-wrong / public-key-by-mistake.

5. GetRewardAttestation now TrimSpaces eth_recipient_address,
   reward_address, and claim_authority at the boundary so
   surrounding whitespace returns a clean InvalidArgument here
   instead of a confusing hex-decode error deeper in
   RewardClaim.Compile.
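
The pre-hashing distinction in item 1 is easy to get wrong in a non-Go client, so here is a small sketch of the ed25519 half. It uses only the standard library; the function names are hypothetical and the body is treated as opaque bytes rather than a real marshaled proto.

```go
package main

import (
	"crypto/ed25519"
	"crypto/sha256"
)

// signBody signs the body bytes as-is, which is what rm_owner_signature
// expects: ed25519 hashes internally.
func signBody(priv ed25519.PrivateKey, body []byte) []byte {
	return ed25519.Sign(priv, body)
}

// signPrehashed shows the mistake the proto comments warn about: signing
// sha256(body) instead of body.
func signPrehashed(priv ed25519.PrivateKey, body []byte) []byte {
	digest := sha256.Sum256(body)
	return ed25519.Sign(priv, digest[:])
}

// verifyBody checks a signature against the raw body bytes.
func verifyBody(pub ed25519.PublicKey, body, sig []byte) bool {
	if len(pub) != ed25519.PublicKeySize {
		return false
	}
	return ed25519.Verify(pub, body, sig)
}

// demoKeypair derives a throwaway keypair from a repeated byte. Demo only.
func demoKeypair(fill byte) (ed25519.PublicKey, ed25519.PrivateKey) {
	seed := make([]byte, ed25519.SeedSize)
	for i := range seed {
		seed[i] = fill
	}
	priv := ed25519.NewKeyFromSeed(seed)
	return priv.Public().(ed25519.PublicKey), priv
}
```

A signature produced by `signPrehashed` is a perfectly valid ed25519 signature, just over the wrong message, so `verifyBody` rejects it.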

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Address PR #254 review feedback (round 2)

Six items from raymondjacobson:

1. examples/rewards/main.go: drop REWARDS_MANAGER_SECRET_HEX env var
   and generate a fresh ed25519 keypair inline. Strip the explainer
   comments — the simpler example is self-documenting.

2. pkg/core/config/rewards.go: remove the staging-specific AUDIO RM
   constant and the long comment about why staging is empty. The
   AudioRewardsManagerPubkey() switch no longer special-cases stage,
   so staging falls through to "" via the default branch, which the
   denylist treats as "no enforcement." The reward_pools_test save/
   restore no longer touches StageAudioRewardsManagerPubkey.

3. 00034_reward_pools.sql backfill comment: drop "leaked-key" framing,
   replace with neutral "additional entries."

4 + 6. Sweep PR1/PR2/PR3/PR #225 references out of all bundle code
   and comments — these labeled stacked-PR boundaries that no longer
   exist now that the work is bundled. Phrasing now describes what
   the code does, not which PR introduced it. Touched: connect.go,
   reward_pools.go, rewards.go, rewards_legacy.go, reads.sql,
   migration, proto, integration test.

5. reward_pools.go: drop the case-insensitive contains() helper and
   use slices.Contains across all call sites. Pool authorities are
   already canonicalized (lowercase) on write via
   CanonicalAuthorities, so callers just lowercase the needle. Removes
   ~10 lines and a custom helper in favor of stdlib.
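
The pattern in item 5 amounts to something like the following sketch. `isPoolAuthority` is a hypothetical name; it relies on the stated invariant that stored authorities are already lowercase, and on the standard-library `slices` package (Go 1.21 or later).

```go
package main

import (
	"slices"
	"strings"
)

// isPoolAuthority reports whether signer is in the pool's authority list.
// It assumes authorities were lowercased on write, so only the needle is
// normalized here.
func isPoolAuthority(authorities []string, signer string) bool {
	return slices.Contains(authorities, strings.ToLower(signer))
}
```

The trade-off is that correctness now depends on every write path canonicalizing authorities; a mixed-case entry in storage would never match.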

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
raymondjacobson added a commit that referenced this pull request May 12, 2026
The core_tx_stats table carries two btree indexes covering (tx_hash):

  "core_tx_stats_tx_hash_key" UNIQUE CONSTRAINT, btree (tx_hash)
  "idx_core_tx_hash"                              btree (tx_hash)

The UNIQUE constraint already provides a btree on tx_hash and dominates
the non-unique idx_core_tx_hash for every selectivity scenario.
pg_stat_user_indexes confirms idx_core_tx_hash accumulated zero scans
across all 20 OpenAudio fleet nodes after months of production traffic.

Each node holds ~5.20 GiB in this single redundant index. Total
dead-weight: ~104 GiB fleet-wide.

DROP CONCURRENTLY to avoid ACCESS EXCLUSIVE on core_tx_stats during
ingest. Rollback recreates online.

Sibling PR #205 covers the analogous drop on core_transactions
(idx_core_transactions_tx_hash, superseded by the functional lower()
index). The two are independent and can land in either order.

Co-authored-by: Ray Jacobson <ray@audius.co>