
Rates Engine v0.5.0-rc.108

Pre-release


github-actions released this 10 Jun 14:10
· 349 commits to main since this release

[v0.5.0-rc.108] — 2026-06-10

Tested against Stellar Protocol 23 (Whisk).

Operator notes:

  • The census + retention-scope completeness fixes take effect once the indexer
    (and the ratesengine-ops binary) are deployed via deploy.yml. Until then
    the live census still records both-zero no-ops.
  • The trades table must have NO retention policy (migration 0031: keep raw
    forever). If timescaledb_information.jobs shows a policy_retention job on
    trades, that is drift; remove it with SELECT remove_retention_policy('trades').

Added

  • GET /v1/assets/{asset_id}/supply + explorer supply panel (ADR-0034).
    Exposes the live decode-at-ingest supply: Σmint − Σburn − Σclawback from the
    supply_flows lake, current to the latest ledger with no rollup refresh.
    Resolves a Soroban contract id (C…) directly, a classic asset via the
    operator's SAC wrappers (404 if unmapped), and native/XLM from the ledger
    header total_coins (source=ledger_total_coins). Amounts are decimal
    strings (ADR-0003). The API server gains a pooled clickhouse.SupplyReader
    (nil when ClickHouse isn't configured → endpoint 503s; non-fatal at boot).
    The explorer's Supply tab now leads with a live "On-chain supply" section
    (total + mint/burn/clawback breakdown) for every token — not just the
    handful with an ADR-0011 asset_supply_history snapshot — degrading
    gracefully (section omitted) when the endpoint 404s/503s.
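
    The sum itself is simple arithmetic; what matters is that it stays exact
    across the whole i128 range and is emitted as a decimal string. A minimal
    Go sketch, doing in memory what the endpoint does in SQL and assuming a
    hypothetical row shape (flow and netSupply are illustrative names, not
    the API server's code):

```go
package main

import (
	"fmt"
	"math/big"
)

// flow is a hypothetical supply_flows row as read by the API: kind is
// "mint", "burn" or "clawback", and amount is the decoded i128 in base 10.
type flow struct {
	kind   string
	amount string
}

// netSupply returns Σmint − Σburn − Σclawback as a decimal string, using
// math/big so values anywhere in the i128 range cannot overflow.
func netSupply(flows []flow) (string, error) {
	total := new(big.Int)
	for _, f := range flows {
		v, ok := new(big.Int).SetString(f.amount, 10)
		if !ok {
			return "", fmt.Errorf("bad amount %q", f.amount)
		}
		switch f.kind {
		case "mint":
			total.Add(total, v)
		case "burn", "clawback":
			total.Sub(total, v)
		default:
			return "", fmt.Errorf("unknown flow kind %q", f.kind)
		}
	}
	return total.String(), nil
}

func main() {
	s, err := netSupply([]flow{
		{"mint", "1000"},
		{"burn", "250"},
		{"clawback", "30"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // 720
}
```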

  • Real-time per-token supply via decode-at-ingest (ADR-0034). Token supply
    is now a pure SQL sum over a new stellar.supply_flows table instead of a
    periodically-refreshed rollup. The blocker for real-time supply was that the
    amount lives in the event body as a raw i128 XDR scval that ClickHouse can't
    decode — so supply required a 16-min Go batch recompute (ch-supply), stale
    by up to the refresh interval. Now the indexer decodes the i128 amount at
    ingest (DecodeSupplyAmount) for every mint/burn/clawback event and writes
    a decoded row to supply_flows (ReplacingMergeTree, ORDER BY contract_id
    first for fast per-token reads; event-identity suffix → idempotent under the
    lake's drop→heal / re-backfill). The real-time dual-sink feeds it inline, so
    a token's supply (Σmint − Σburn − Σclawback, SupplyForContract) is always
    current with no refresh job and no read-time XDR decode. History is
    seeded once from the existing lake via scripts/ops/ch-supply-flows-seed.sh
    (windowed + resumable wrapper over ch-supply -seed-flows — a single-shot
    all-history seed exceeds the 1h CH read timeout and, lacking an ORDER BY,
    leaves scattered holes; windowing bounds each read); thereafter the dual-sink
    keeps it live. The decode logic is shared between ingest and the seed so both
    produce identical amounts.
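
    In Stellar's XDR an i128 value is carried as Int128Parts: a signed 64-bit
    hi word and an unsigned 64-bit lo word, so the amount is hi·2^64 + lo.
    The sketch below shows only that final reassembly step, assuming the
    ScVal has already been unmarshalled into its two parts; int128ToBig is a
    hypothetical helper, not the actual DecodeSupplyAmount.

```go
package main

import (
	"fmt"
	"math/big"
)

// int128ToBig reassembles a signed 128-bit integer from Int128Parts-style
// halves: a signed high word and an unsigned low word (value = hi·2^64 + lo).
func int128ToBig(hi int64, lo uint64) *big.Int {
	v := big.NewInt(hi)
	v.Lsh(v, 64) // big.Int shifts the magnitude and keeps the sign
	return v.Add(v, new(big.Int).SetUint64(lo))
}

func main() {
	fmt.Println(int128ToBig(0, 5))            // 5
	fmt.Println(int128ToBig(1, 0))            // 18446744073709551616
	fmt.Println(int128ToBig(-1, ^uint64(0)))  // -1
}
```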

  • ClickHouse Tier-1 raw lake (ADR-0034, migration in progress). New
    columnar storage tier for the OLAP-scale firehose (every ledger/tx/op/
    event), moving it off Postgres where billion-row bulk reprocessing was
    infeasible. Ships the Tier-1 schema (deploy/clickhouse/tier1_schema.sql),
    the internal/storage/clickhouse structural sink + LCM extractor (reuses
    the proven ingest/CensusLedger/sorobanevents.Capture walk; stores raw
    XDR, no SCVal decoding), and the ratesengine-ops ch-backfill command
    (-parallel N for concurrent range-walkers — the historic-backfill
    throughput unlock). The ratesengine-ops ch-gate command runs the §6 gates
    over a backfilled range: it census-walks galexie, asserts the extractor
    matches the decoder-independent census oracle, then reads the range back out
    of ClickHouse and asserts the stored + actual row counts both equal the
    census; it also reports compressed bytes/ledger + a full-history footprint
    projection. Gated: a 100k-ledger sample must pass throughput +
    completeness-vs-census before any full historic walk. See
    docs/architecture/clickhouse-migration-plan.md +
    docs/architecture/clickhouse-tier1-decoder.md +
    docs/architecture/clickhouse-phase4-decoder-adapter.md.

    • Fixed an extractor bug before any full walk: claimAtomCount decoded
      CreatePassiveSellOffer via the wrong OperationResultTr union arm
      (GetManageSellOfferResult, always ok=false for that op type) and
      silently undercounted classic_trade_effect_count vs the census on every
      crossing passive offer. Now uses GetCreatePassiveSellOfferResult,
      matching sdex.decode + dispatcher.census; covered by a new
      per-op-variant test.
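
    How ch-backfill -parallel N actually partitions work is not described
    here. One plausible scheme, sketched under that assumption, splits an
    inclusive ledger range into N contiguous, non-overlapping sub-ranges so
    each range-walker gets a near-equal share (splitRange is hypothetical):

```go
package main

import "fmt"

// splitRange divides the inclusive ledger range [from, to] into at most n
// contiguous, non-overlapping sub-ranges whose sizes differ by at most one.
func splitRange(from, to uint32, n int) [][2]uint32 {
	if to < from || n < 1 {
		return nil
	}
	total := uint64(to-from) + 1
	if uint64(n) > total {
		n = int(total) // never hand a walker an empty range
	}
	base, rem := total/uint64(n), total%uint64(n)
	out := make([][2]uint32, 0, n)
	start := uint64(from)
	for i := 0; i < n; i++ {
		size := base
		if uint64(i) < rem {
			size++ // spread the remainder over the first walkers
		}
		out = append(out, [2]uint32{uint32(start), uint32(start + size - 1)})
		start += size
	}
	return out
}

func main() {
	fmt.Println(splitRange(1, 10, 3)) // [[1 4] [5 7] [8 10]]
}
```
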
  • ADR-0033 — completeness verification model. Three independently
    provable claims (substrate continuity, recognition, projection
    reconciliation) replace threshold-based coverage as the
    100%-confidence signal. See docs/adr/0033-completeness-verification-model.md.

  • ledger_ingest_log substrate-continuity record (ADR-0033 Phase 2).
    Migration 0051. One row per fully-processed ledger, written
    post-persist by the live indexer, carrying the LCM-derived census
    (soroban_event_count, classic_trade_effect_count — counted
    decoder-independently from the LedgerCloseMeta) plus the header
    hash-chain anchors. New ratesengine-ops census-backfill -from -to
    populates history. Storage queries FindLedgerIngestGaps (contiguity)
    and VerifyLedgerHashChain (cryptographic linkage) are Claim 1 of the
    completeness model — both run over the narrow record, never a trades
    scan. Once a ledger is recorded with its census, "zero events for
    contract C here" is a proven quiet period, which is what lets the
    confidence signal stop guessing sparsity thresholds.
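
    FindLedgerIngestGaps and VerifyLedgerHashChain run as storage queries over
    ledger_ingest_log. As an in-memory illustration of the two halves of
    Claim 1, assuming each record carries its header hash and the header's
    previous-ledger hash (ingestRecord, findGaps and brokenLinks are
    hypothetical names):

```go
package main

import "fmt"

// ingestRecord is a hypothetical, trimmed ledger_ingest_log row.
type ingestRecord struct {
	seq      uint32
	hash     string // this ledger's header hash
	prevHash string // previous-ledger hash recorded in this header
}

// gap is an inclusive range of ledgers with no ingest record.
type gap struct{ from, to uint32 }

// findGaps reports missing sequences between consecutive records
// (contiguity). recs must be sorted by seq.
func findGaps(recs []ingestRecord) []gap {
	var gaps []gap
	for i := 1; i < len(recs); i++ {
		if recs[i].seq > recs[i-1].seq+1 {
			gaps = append(gaps, gap{recs[i-1].seq + 1, recs[i].seq - 1})
		}
	}
	return gaps
}

// brokenLinks returns ledgers whose prevHash does not match the preceding
// ledger's hash (cryptographic linkage). Pairs separated by a gap are
// skipped; findGaps already reports those.
func brokenLinks(recs []ingestRecord) []uint32 {
	var bad []uint32
	for i := 1; i < len(recs); i++ {
		if recs[i].seq == recs[i-1].seq+1 && recs[i].prevHash != recs[i-1].hash {
			bad = append(bad, recs[i].seq)
		}
	}
	return bad
}

func main() {
	recs := []ingestRecord{
		{100, "a", "z"}, {101, "b", "a"}, {102, "c", "x"}, {105, "f", "e"},
	}
	fmt.Println(findGaps(recs))    // [{103 104}]
	fmt.Println(brokenLinks(recs)) // [102]
}
```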

  • Recognition check (ADR-0033 Phase 3 / Claim 2a). New
    ratesengine-ops verify-recognition -from -to pulls every distinct
    (contract_id, topic_0_sym) shape from soroban_events and runs each
    through the production decoder chain's real Matches() (no
    hand-maintained topic list to drift). Any shape no decoder handles —
    e.g. a topic a WASM upgrade added that we'd silently drop — is listed
    and the command exits non-zero (cron/CI-gateable). Backed by
    dispatcher.Recognize (side-effect-free), Store.DistinctSorobanTopicSamples,
    and internal/completeness.AuditRecognition.
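
    The audit reduces to "for every observed shape, does any decoder's
    Matches() accept it?". A minimal sketch with a toy decoder standing in
    for the production chain (the decoder interface, topicDecoder,
    unrecognized, exitCode and the contract ids are all hypothetical; the
    real Matches signature may differ):

```go
package main

import "fmt"

// shape is one distinct (contract_id, topic_0_sym) pair from soroban_events.
type shape struct{ contractID, topic0 string }

// decoder is a hypothetical stand-in for the production decoder interface;
// only its side-effect-free Matches check is needed here.
type decoder interface {
	Matches(contractID, topic0 string) bool
}

// topicDecoder is a toy decoder matching fixed topic symbols on any contract.
type topicDecoder map[string]bool

func (d topicDecoder) Matches(_, topic0 string) bool { return d[topic0] }

// unrecognized returns every shape that no decoder in the chain matches.
func unrecognized(shapes []shape, chain []decoder) []shape {
	var out []shape
	for _, s := range shapes {
		matched := false
		for _, d := range chain {
			if d.Matches(s.contractID, s.topic0) {
				matched = true
				break
			}
		}
		if !matched {
			out = append(out, s)
		}
	}
	return out
}

// exitCode makes the audit gateable: any unrecognized shape is a failure.
func exitCode(missing []shape) int {
	if len(missing) > 0 {
		return 1
	}
	return 0
}

func main() {
	chain := []decoder{topicDecoder{"swap": true, "sync": true}}
	missing := unrecognized([]shape{{"CAAA", "swap"}, {"CBBB", "swap_v2"}}, chain)
	fmt.Println(missing, exitCode(missing)) // [{CBBB swap_v2}] 1
}
```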

  • Projection reconciliation (ADR-0033 Phase 4 / Claim 2b). New
    ratesengine-ops verify-reconciliation -from -to [-source S]
    re-derives, per ledger, how many trades rows the real decoder would
    emit from soroban_events (deterministic recomputation) and diffs
    that against the rows actually present — localizing any projector drop
    (or phantom row) to an exact ledger. Covers soroswap/aquarius/phoenix/
    comet (seeds soroswap pairs via RPC). Backed by
    completeness.ReDeriveOutputCounts / ReconcileCounts and
    Store.CountRowsByLedger. Correlation sources reconcile correctly
    because each logical record's events share one (ledger, tx, op).
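
    The diff step that localizes a drop or phantom row to one ledger can be
    sketched as below, assuming both sides have already been reduced to
    per-ledger counts (reconcile and delta are hypothetical; the project's
    ReconcileCounts may differ in shape):

```go
package main

import (
	"fmt"
	"sort"
)

// delta is one ledger where re-derived and stored row counts disagree:
// actual < expected is a projector drop, actual > expected a phantom row.
type delta struct {
	ledger           uint32
	expected, actual int
}

// reconcile diffs per-ledger expected counts (re-derived from
// soroban_events) against the rows actually present, sorted by ledger.
func reconcile(expected, actual map[uint32]int) []delta {
	ledgers := map[uint32]bool{}
	for l := range expected {
		ledgers[l] = true
	}
	for l := range actual {
		ledgers[l] = true
	}
	var out []delta
	for l := range ledgers {
		if expected[l] != actual[l] {
			out = append(out, delta{l, expected[l], actual[l]})
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].ledger < out[j].ledger })
	return out
}

func main() {
	exp := map[uint32]int{10: 2, 11: 5, 12: 1}
	got := map[uint32]int{10: 2, 11: 2, 13: 1}
	fmt.Println(reconcile(exp, got)) // [{11 5 2} {12 1 0} {13 0 1}]
}
```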

  • SDEX / classic reconciliation (ADR-0033 Phase 5 / Claim 2b classic).
    verify-reconciliation now also covers SDEX, which predates Soroban
    and has no soroban_events: its expected count comes from the
    LCM-derived classic_trade_effect_count census in ledger_ingest_log
    (one ClaimAtom = one trade), gated on the substrate record being
    continuous over the range (else it tells you to run census-backfill
    first). The existing hubble-check (per-ledger SDEX-vs-Hubble counts +
    amount cross-check) remains the external defense-in-depth anchor.

  • Completeness watermark verdict (ADR-0033 Phase 6 / headline).
    ratesengine-ops compute-completeness derives, per source, the
    completeness WATERMARK: the highest ledger up to which substrate
    continuity + hash chain (Claim 1) AND projection reconciliation
    (Claim 2b) both hold from genesis. It also derives a system recognition
    verdict (Claim 2a) and writes both to the new completeness_snapshots
    table (migration 0052). /v1/diagnostics/ingestion overlays
    completeness_pct / completeness_watermark / completeness_complete onto
    each source row, and the status page renders completeness_pct as the
    headline (falling back to gap-free coverage when it has not yet been
    computed). Unlike density/gap_free, this uses NO sparsity threshold: a
    single proven gap pins it, so it is the honest 100%-confidence signal.
    MinGapSizeOverride is now documented as affecting alerting cadence only,
    off the confidence path.
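
    One way to read "highest ledger where both claims hold from genesis" is
    a prefix scan: the watermark stops at the first failing ledger even if
    later ledgers pass, which is what lets a single proven gap pin it. A
    sketch under that reading, with a hypothetical percentage definition
    (watermark and completenessPct are illustrative names):

```go
package main

import "fmt"

// watermark returns the highest ledger L such that every ledger in
// [genesis, L] passed; ok is false if genesis itself fails.
// verified[i] is the combined Claim 1 && Claim 2b result for ledger genesis+i.
func watermark(genesis uint32, verified []bool) (uint32, bool) {
	n := 0
	for n < len(verified) && verified[n] {
		n++
	}
	if n == 0 {
		return 0, false
	}
	return genesis + uint32(n) - 1, true
}

// completenessPct expresses the watermark as a share of [genesis, tip].
func completenessPct(genesis, tip, wm uint32, ok bool) float64 {
	if !ok || tip < genesis {
		return 0
	}
	return 100 * float64(wm-genesis+1) / float64(tip-genesis+1)
}

func main() {
	v := []bool{true, true, true, false, true} // ledger 103 fails
	wm, ok := watermark(100, v)
	fmt.Println(wm, ok, completenessPct(100, 104, wm, ok)) // 102 true 60
}
```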

  • Projection reconciliation extended to all per-ledger sources +
    multi-output fix (ADR-0033 future work). verify-reconciliation and
    compute-completeness now drive off a shared catalogue covering every
    source that writes a per-ledger table: trades (soroswap/aquarius/
    phoenix/comet), oracles (reflector ×3 / redstone), cctp/rozo/defindex,
    and blend's four tables, plus sdex via the LCM census. The re-derive
    now buckets outputs by EventKind() (ReDeriveOutputCountsByKind +
    SumKinds) and reconciles each table against only the kinds that route
    to it. This fixes a latent overcount where multi-output sources
    (soroswap/phoenix/comet also emit skim/liquidity/stake events to other
    tables) were compared whole against trades alone. Recognition gaps
    are now attributed per source for contract-pinned sources (oracles),
    with a system recognition snapshot for gaps on unowned contracts.
    (sep41/band/soroswap-router remain out of scope, as documented in the
    catalogue.) Those queries are also chunk-pruned via
    SorobanEventsTimeBound.
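
    The multi-output fix amounts to comparing each table against only the
    kinds routed to it rather than against the source's total output. A
    sketch with hypothetical kind names, routing and helper (the real
    ReDeriveOutputCountsByKind / SumKinds may differ):

```go
package main

import "fmt"

// expectedForTable sums re-derived per-kind output counts over only the
// event kinds routed to the given table.
func expectedForTable(byKind map[string]int, routes map[string][]string, table string) int {
	total := 0
	for _, kind := range routes[table] {
		total += byKind[kind]
	}
	return total
}

func main() {
	// Hypothetical per-kind counts for one multi-output source in one ledger.
	byKind := map[string]int{"trade": 4, "skim": 1, "liquidity": 2}
	// Hypothetical routing of event kinds to tables.
	routes := map[string][]string{
		"trades":    {"trade"},
		"liquidity": {"liquidity", "skim"},
	}
	fmt.Println(expectedForTable(byKind, routes, "trades")) // 4
}
```

    Compared whole, as before the fix, trades would have been checked
    against 7 expected rows instead of 4.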

  • Incremental completeness verify + hourly timer (ADR-0033 standing guard).
    compute-completeness gains -from <ledger>: verify only [from, tip],
    trusting [genesis, from] as previously verified (substrate hash-chain,
    recognition shape scan, and projection reconcile all scoped to the window);
    the watermark still extends to tip when the window is clean.
    scripts/ops/completeness-incremental.sh computes from = min(watermark)
    over the prior snapshots, so each run re-checks only new ledgers (minutes,
    not the hours a full genesis→tip sweep takes). It is READ-ONLY on served data (recomputes
    completeness_snapshots only) and exits non-zero with the failing source +
    range if a source regresses; repair (ch-rebuild over the range) stays a
    deliberate action. Wired as ratesengine-completeness.{service,timer} (hourly,
    niced). This is the runtime data-driven guard that keeps "verified 100%" true
    as the tip advances; it complements the PR-time lint-pk-discriminators.
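
    The window selection (from = min(watermark) over the prior snapshots) can
    be sketched as follows; falling back to genesis when no snapshot exists
    is an assumption, as are the function name and sample values:

```go
package main

import "fmt"

// incrementalFrom picks the start of the next verify window: the lowest
// per-source watermark in the prior snapshots, or genesis when no snapshot
// exists yet (which forces a full genesis→tip sweep).
func incrementalFrom(prior map[string]uint32, genesis uint32) uint32 {
	if len(prior) == 0 {
		return genesis
	}
	from := ^uint32(0) // max uint32, lowered by the loop
	for _, wm := range prior {
		if wm < from {
			from = wm
		}
	}
	return from
}

func main() {
	prior := map[string]uint32{"soroswap": 62_900_000, "sdex": 62_850_000}
	fmt.Println(incrementalFrom(prior, 1)) // 62850000
}
```

    Taking the minimum means the slowest source sets the window, so every
    source's unverified ledgers are covered by a single run.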

  • lint-pk-discriminators CI guard. A new scripts/ci lint that parses
    per-source table PKs and fails the build if a table that can receive multiple
    same-key events per operation lacks a per-event discriminator (the coarse-PK
    data-loss class) — wired into verify.sh + ci.yml. Guards against
    reintroducing the silent-drop bug fixed below for trades/blend/defindex.
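
    The lint itself parses table DDL. Assuming the PK columns have already
    been extracted, the rule reduces to a check like the one below
    (tableSpec, lintPK and the table names are hypothetical, and the real
    guard presumably also recognizes other discriminator forms, such as the
    fanned-out op_index the trades table uses):

```go
package main

import "fmt"

// tableSpec is a hypothetical, already-parsed view of one per-source table.
type tableSpec struct {
	name       string
	pk         []string
	multiPerOp bool // can one operation emit several same-key events into it?
}

// discriminators are PK columns that tell events within one operation apart.
var discriminators = map[string]bool{"event_index": true}

// lintPK returns the tables that can receive several same-key events per
// operation but whose PK lacks a per-event discriminator.
func lintPK(tables []tableSpec) []string {
	var bad []string
	for _, t := range tables {
		if !t.multiPerOp {
			continue
		}
		hasDisc := false
		for _, col := range t.pk {
			if discriminators[col] {
				hasDisc = true
				break
			}
		}
		if !hasDisc {
			bad = append(bad, t.name)
		}
	}
	return bad
}

func main() {
	tables := []tableSpec{
		{"coarse_pk", []string{"ledger", "tx_hash", "op_index"}, true},
		{"fine_pk", []string{"ledger", "tx_hash", "op_index", "event_index"}, true},
	}
	fmt.Println(lintPK(tables)) // [coarse_pk]
}
```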

Changed

  • Sources panel shows "Entries 24h" instead of "Trades 24h". The
    old column came from a GROUP BY source scan over the trades
    hypertable whose error was swallowed — so any timeout under load
    silently rendered every source 0, and it was structurally 0 for the
    many registered sources that don't write trades (oracles, bridges,
    FX). It's replaced by a universal per-source trailing-24h event count
    sourced from increase(ratesengine_source_events_total[24h]) (the
    same counter that backs active_sources) via a new
    StatusBackend.SourceEntries24h — cheap, reliable, and non-zero for
    every active source whether on-chain or external. New entries_24h
    field on /v1/diagnostics/ingestion sources[]; the silent-VWAP
    highlight now keys off it too.

  • Status-page on-chain coverage is now honest about what it's
    measuring (ADR-0033).
    A source's coverage figure is only shown as a
    trustworthy bar once its completeness watermark (completeness_pct)
    has been computed — the substrate+projection-verified signal. Until
    then the page falls back to gap_free_pct, a liveness proxy ("no
    large interior gap detected") that reads ~100% for sources that are
    merely sparse or only partially indexed (e.g. phoenix-liquidity at
    18 of 11.3M ledgers). Those unverified figures are now rendered muted
    and tagged "unverified · N% gap-free" with an explanatory tooltip,
    instead of a green ~100% bar that overstated completeness. Because we
    cannot distinguish "sparse-but-complete" from "incomplete" without the
    watermark, we never dress an unverified figure up as verified coverage.
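
    The display rule reduces to a two-way choice. It is sketched in Go only
    for consistency with the other examples; the status page's actual
    implementation, names and number formatting are not described in these
    notes and are assumed here:

```go
package main

import "fmt"

// coverageLabel decides how one source's coverage is presented.
// completenessPct is nil until compute-completeness has run for the source;
// the bool reports whether the figure may be rendered as verified.
func coverageLabel(completenessPct *float64, gapFreePct float64) (string, bool) {
	if completenessPct != nil {
		return fmt.Sprintf("%.1f%%", *completenessPct), true
	}
	return fmt.Sprintf("unverified · %.0f%% gap-free", gapFreePct), false
}

func main() {
	fmt.Println(coverageLabel(nil, 99.6)) // unverified · 100% gap-free false
	p := 87.5
	fmt.Println(coverageLabel(&p, 99.6)) // 87.5% true
}
```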

Fixed

  • Real-time projector CH feed-switch no longer risks silent loss
    (ADR-0034 #10).
    The dual-sink (clickhouse.LiveSink) is best-effort:
    it drops whole ledgers under buffer pressure and a flush can partially
    fail, so the CH lake can have holes near the tip — and the prior
    ch-live-catchup only extended [CH_max+1, tip], which can never re-fill
    a hole the sink already wrote past (verified: 48 orphaned ledgers,
    [62939016,62939063]). Reading the projector forward from CH with the raw
    ledgerstream tip as its bound would skip such holes and lose their protocol
    events (the cursor advances unconditionally). Three changes make the
    feed-switch safe by construction: (1) Sink.Flush now writes
    stellar.ledgers last, making a ledgers row a per-ledger commit marker
    (present ⟹ all of that ledger's tables are already durable); (2) the
    projector clamps its CH-mode upper bound to ContiguousWatermark — the
    highest ledger with no hole below it — so an unhealed drop stalls the
    source at the hole instead of skipping it; (3) ch-live-catchup.sh
    gap-scans stellar.ledgers and back-fills holes below CH_max, not just
    the tip. Net: the lake self-heals and the projector never reads ahead of
    provably-complete CH.

    • Also: the no-contract-prefilter DEX/lending projector sources
      (soroswap/aquarius/phoenix/comet/blend/cctp/rozo/defindex) now exclude the
      CAP-67 classic-token firehose (transfer/mint/burn/clawback/
      approve/set_authorized — ~99.8% of all events under V4 meta) at the SQL
      layer on both read paths. A caught-up source reads a tiny window so it never
      mattered, but a far-behind source's 10k-ledger catch-up window was streaming
      ~5M firehose rows it only discarded via Decoder.Matches, blowing the 60s
      cycle budget and wedging the source (aquarius was stuck ~92k ledgers behind,
      deadlock-storming the trades table). Exclude-only and audited lossless —
      every one of the eight decoders was checked against the six symbols;
      set_admin is deliberately retained because blend dispatches on it.
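
    Fixes (2) and (3) both hinge on the contiguous watermark. A minimal
    sketch, assuming the sorted list of ledgers that have a stellar.ledgers
    commit marker is available (contiguousWatermark and upperBound are
    hypothetical names, not the projector's API):

```go
package main

import "fmt"

// contiguousWatermark returns the highest ledger h such that every ledger
// in [from, h] has a commit marker; present must be sorted ascending.
// ok is false when from itself is missing.
func contiguousWatermark(from uint32, present []uint32) (uint32, bool) {
	next := from
	for _, l := range present {
		if l < next {
			continue // below the window, or a duplicate
		}
		if l != next {
			break // first hole
		}
		next++
	}
	if next == from {
		return 0, false
	}
	return next - 1, true
}

// upperBound clamps the projector's read bound to the watermark, so an
// unhealed hole stalls the source instead of being skipped.
func upperBound(tip, wm uint32) uint32 {
	if wm < tip {
		return wm
	}
	return tip
}

func main() {
	present := []uint32{100, 101, 102, 105, 106} // 103-104 dropped by the sink
	wm, ok := contiguousWatermark(100, present)
	fmt.Println(wm, ok, upperBound(106, wm)) // 102 true 102
}
```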

  • trades no longer silently drops trades when one operation emits several
    (aquarius, comet). The ADR-0033 projection reconciliation found
    aquarius emitting 5 trade events in one operation (a multi-pool swap)
    but only 2 rows landing: the decoders keyed the row on the raw
    op_index, so every trade after the first in an op collided on the
    trades PK (source, ledger, tx_hash, op_index, ts) and was dropped
    by ON CONFLICT. They now fan out via
    canonical.FanoutOpIndex(op, event_index) (op in the high 16 bits, the
    Phase-1 event_index in the low 16), matching the stride pattern SDEX
    already used. Forward fix;
    historical collided ops need re-backfill (delete-then-replay) to
    recover. All four event-based trade sources are now fanned out:
    aquarius/comet by the event's own index, soroswap by the swap
    event's index (RawPair.Swap), phoenix by the swap's first-field
    event index (RawSwap.EventIndex). Phoenix's 8-field buffer
    emits-and-clears on completion, so router multi-hop segments into
    separate swaps correctly — it was the same op_index collision, not a
    merge (the old "multihops split on op_index naturally" assumption was
    wrong).
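
    The packing described above can be sketched as follows. The real
    canonical.FanoutOpIndex signature, integer type and overflow handling
    are not given here, so fanoutOpIndex is a hypothetical stand-in that
    rejects indexes wider than 16 bits:

```go
package main

import "fmt"

// fanoutOpIndex packs an operation index and an in-op event index into one
// op_index value: op in the high 16 bits, event index in the low 16 bits,
// so distinct events in the same operation get distinct keys.
func fanoutOpIndex(op, eventIndex uint32) (uint32, error) {
	if op > 0xFFFF || eventIndex > 0xFFFF {
		return 0, fmt.Errorf("index exceeds 16 bits: op=%d event=%d", op, eventIndex)
	}
	return op<<16 | eventIndex, nil
}

func main() {
	for ev := uint32(0); ev < 3; ev++ {
		v, _ := fanoutOpIndex(2, ev)
		fmt.Println(v) // 131072, then 131073, then 131074
	}
}
```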

  • soroban_events no longer silently drops events from multi-event
    operations.
    event_index was hardcoded to 0 at capture, so every
    contract event in one operation collided on the
    (ledger_close_time, ledger, tx_hash, op_index, event_index) PK and
    the writer's ON CONFLICT DO NOTHING kept only the first — Phoenix
    (8 events per swap in one op) was archiving 1 of 8. A real
    event_index is now threaded from the dispatcher's per-op event walk
    through events.Event into Capture/Reconstruct, and
    StreamSorobanEvents orders by it for deterministic replay. This is
    the precondition for using soroban_events as a completeness oracle
    (ADR-0033 Phase 1). Note: rows captured before this fix are missing
    the collided events; affected ranges need re-backfilling — the
    ADR-0033 reconciliation will surface exactly which.
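
    The failure mode is easy to reproduce in miniature: model ON CONFLICT DO
    NOTHING as "first key wins" and compare a hardcoded event_index with a
    threaded one (pk, insertDoNothing and opEvents are hypothetical, and
    ledger_close_time is omitted from the key for brevity):

```go
package main

import "fmt"

// pk mirrors the soroban_events key minus ledger_close_time.
type pk struct {
	ledger     uint32
	txHash     string
	opIndex    uint32
	eventIndex uint32
}

// insertDoNothing models INSERT ... ON CONFLICT DO NOTHING: the first row
// for each key is kept and later duplicates are dropped. It returns the
// number of rows that survive.
func insertDoNothing(keys []pk) int {
	stored := map[pk]struct{}{}
	for _, k := range keys {
		if _, dup := stored[k]; !dup {
			stored[k] = struct{}{}
		}
	}
	return len(stored)
}

// opEvents builds keys for n events emitted by a single operation, with
// event_index either hardcoded to 0 (the old capture) or threaded through.
func opEvents(n int, threaded bool) []pk {
	keys := make([]pk, n)
	for i := range keys {
		keys[i] = pk{ledger: 1, txHash: "tx", opIndex: 0}
		if threaded {
			keys[i].eventIndex = uint32(i)
		}
	}
	return keys
}

func main() {
	fmt.Println(insertDoNothing(opEvents(8, false))) // 1
	fmt.Println(insertDoNothing(opEvents(8, true)))  // 8
}
```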

  • /v1/markets no longer returns 500 on unparseable trades rows.
    A single stray row with base_asset='test' 500ed every markets
    request on 2026-06-01, tripping page-tier api_error_rate_critical +
    slo_availability_burn_fast until the row was hand-deleted. The scanner
    now skips rows whose base/quote fail canonical.ParseAsset, logs a WARN,
    and bumps the new ratesengine_markets_skipped_rows_total counter so
    operators can find and remove the offending row without serving 500s
    to every consumer.
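
    The skip-and-count pattern can be sketched as below. parseAsset is a
    hypothetical stand-in for canonical.ParseAsset (whose accepted formats
    are not listed here), and a plain counter stands in for the Prometheus
    metric:

```go
package main

import (
	"fmt"
	"log"
	"strings"
)

// parseAsset is a hypothetical stand-in for canonical.ParseAsset that
// accepts only "native" or a non-empty "CODE:ISSUER" pair.
func parseAsset(s string) error {
	if s == "native" {
		return nil
	}
	code, issuer, ok := strings.Cut(s, ":")
	if !ok || code == "" || issuer == "" {
		return fmt.Errorf("unparseable asset %q", s)
	}
	return nil
}

type row struct{ base, quote string }

// skippedRows stands in for ratesengine_markets_skipped_rows_total.
var skippedRows int

// scanMarkets keeps rows whose assets parse and skips the rest with a WARN,
// so one bad row no longer fails the whole response.
func scanMarkets(rows []row) []row {
	var out []row
	for _, r := range rows {
		err := parseAsset(r.base)
		if err == nil {
			err = parseAsset(r.quote)
		}
		if err != nil {
			log.Printf("WARN markets: skipping row %+v: %v", r, err)
			skippedRows++
			continue
		}
		out = append(out, r)
	}
	return out
}

func main() {
	kept := scanMarkets([]row{{"native", "USDC:GISSUER"}, {"test", "native"}})
	fmt.Println(len(kept), skippedRows) // 1 1
}
```
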
  • SDEX census counts real trades, not both-zero no-op crosses. The
    projection census (claimAtomCount) counted EVERY claim atom — including the
    both-zero no-op crosses stellar-core emits when an offer is touched in matching
    but both legs round to 0 (dust offers / integer-rounding artifacts; ~1–2% of
    SDEX claims). The decoder correctly drops those (one-side-zero KEPT), so the
    census over-counted vs COUNT(trades) — violating its own invariant and
    showing a spurious SDEX projection Δ. realTradeCount now mirrors the decoder
    exactly (skip both-zero), in both mirrored copies (dispatcher/census.go +
    clickhouse/extract.go). Going forward the live census equals the served trade
    count; the historical retention window re-records once to match.
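
    The counting rule, mirrored from the decoder, is sketched below with a
    hypothetical two-leg claimAtom type (the real census walks XDR
    ClaimAtom unions):

```go
package main

import "fmt"

// claimAtom is a hypothetical, trimmed view of a ClaimAtom's two legs.
type claimAtom struct{ amountSold, amountBought int64 }

// realTradeCount counts claim atoms the way the decoder keeps trades:
// both-zero no-op crosses are skipped, one-side-zero atoms are kept.
func realTradeCount(atoms []claimAtom) int {
	n := 0
	for _, a := range atoms {
		if a.amountSold == 0 && a.amountBought == 0 {
			continue
		}
		n++
	}
	return n
}

func main() {
	atoms := []claimAtom{{100, 7}, {0, 0}, {5, 0}}
	fmt.Println(realTradeCount(atoms)) // 2
}
```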

  • SDEX projection reconcile floors at the actual retained boundary. trades
    is drop_chunks-managed, and retentionStart = tip − 1.5M ledgers (~100d at
    the current ledger rate) sits ~10d / 150k ledgers below the oldest
    retained chunk. The reconcile compared census > 0 against served = 0 over
    that strip, manufacturing a spurious 100%/20% "gap" in the lowest windows
    for rows that retention had deliberately dropped.
    New store.MinLedger + retentionFloor scope the reconcile to where served
    data actually begins; full-history coverage rests on the substrate (ADR-0033).
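
    The floor logic can be sketched as taking the higher of the nominal
    retention start and the oldest ledger still served. The function name
    and sample tip are hypothetical; the 1.5M-ledger window and ~150k-ledger
    offset come from the note above:

```go
package main

import "fmt"

// reconcileFloor returns the lowest ledger the SDEX reconcile should cover:
// the nominal retention start (tip − retention), raised to the oldest ledger
// still served when drop_chunks has already removed rows above it.
func reconcileFloor(tip, retention, minServed uint32) uint32 {
	start := uint32(0)
	if tip > retention {
		start = tip - retention // avoid uint32 underflow on short histories
	}
	if minServed > start {
		return minServed
	}
	return start
}

func main() {
	tip := uint32(62_000_000)
	// Oldest retained chunk begins ~150k ledgers above the nominal start.
	fmt.Println(reconcileFloor(tip, 1_500_000, 60_650_000)) // 60650000
}
```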

  • blend_positions / blend_emissions / blend_admin / defindex_flows no
    longer silently drop multi-event-per-op rows.
    Same coarse-PK class as the
    trades fanout above, on the per-source entity tables: their PKs lacked a
    per-event discriminator, so a second same-kind event in one operation collided
    on ON CONFLICT and was dropped. Migrations 0053–0055 add event_index (and,
    for blend_positions, (asset, user_address)) to the PKs; the decoders +
    sinks thread the in-tx event_index through. Forward fix; collided historical
    rows recover via re-derive from the lake.