Releases: StellarIndex/stellar-index
Rates Engine v0.5.0-rc.108
[v0.5.0-rc.108] — 2026-06-10
Tested against Stellar Protocol 23 (Whisk).
Operator notes:
- The census + retention-scope completeness fixes take effect once the indexer
(and the ratesengine-ops binary) are deployed via deploy.yml. Until then
the live census still records both-zero no-ops.
- The trades table must have NO retention policy (migration 0031: keep raw
forever). If timescaledb_information.jobs shows a policy_retention on
trades, it's drift: remove_retention_policy('trades').
Added
- GET /v1/assets/{asset_id}/supply + explorer supply panel (ADR-0034).
Exposes the live decode-at-ingest supply: Σmint − Σburn − Σclawback from the
supply_flows lake, current to the latest ledger with no rollup refresh.
Resolves a Soroban contract id (C…) directly, a classic asset via the
operator's SAC wrappers (404 if unmapped), and native/XLM from the ledger
header total_coins (source=ledger_total_coins). Amounts are decimal
strings (ADR-0003). The API server gains a pooled clickhouse.SupplyReader
(nil when ClickHouse isn't configured → endpoint 503s; non-fatal at boot).
The explorer's Supply tab now leads with a live "On-chain supply" section
(total + mint/burn/clawback breakdown) for every token, not just the
handful with an ADR-0011 asset_supply_history snapshot, and degrades
gracefully (section omitted) when the endpoint 404s/503s.
- Real-time per-token supply via decode-at-ingest (ADR-0034). Token supply
is now a pure SQL sum over a new stellar.supply_flows table instead of a
periodically-refreshed rollup. The blocker for real-time supply was that the
amount lives in the event body as a raw i128 XDR scval that ClickHouse can't
decode, so supply required a 16-min Go batch recompute (ch-supply), stale
by up to the refresh interval. Now the indexer decodes the i128 amount at
ingest (DecodeSupplyAmount) for every mint/burn/clawback event and writes
a decoded row to supply_flows (ReplacingMergeTree, ORDER BY contract_id
first for fast per-token reads; event-identity suffix → idempotent under the
lake's drop→heal / re-backfill). The real-time dual-sink feeds it inline, so
a token's supply (Σmint − Σburn − Σclawback, SupplyForContract) is always
current with no refresh job and no read-time XDR decode. History is
seeded once from the existing lake via scripts/ops/ch-supply-flows-seed.sh
(a windowed + resumable wrapper over ch-supply -seed-flows: a single-shot
all-history seed exceeds the 1h CH read timeout and, lacking an ORDER BY,
leaves scattered holes; windowing bounds each read); thereafter the dual-sink
keeps it live. The decode logic is shared between ingest and the seed so both
produce identical amounts.
- ClickHouse Tier-1 raw lake (ADR-0034, migration in progress). New
columnar storage tier for the OLAP-scale firehose (every ledger/tx/op/event),
moving it off Postgres where billion-row bulk reprocessing was
infeasible. Ships the Tier-1 schema (deploy/clickhouse/tier1_schema.sql),
the internal/storage/clickhouse structural sink + LCM extractor (reuses
the proven ingest/CensusLedger/sorobanevents.Capture walk; stores raw
XDR, no SCVal decoding), and the ratesengine-ops ch-backfill command
(-parallel N for concurrent range-walkers, the historic-backfill
throughput unlock). The ratesengine-ops ch-gate command runs the §6 gates
over a backfilled range: it census-walks galexie, asserts the extractor
matches the decoder-independent census oracle, then reads the range back out
of ClickHouse and asserts the stored + actual row counts both equal the
census; it also reports compressed bytes/ledger + a full-history footprint
projection. Gated: a 100k-ledger sample must pass throughput +
completeness-vs-census before any full historic walk. See
docs/architecture/clickhouse-migration-plan.md +
docs/architecture/clickhouse-tier1-decoder.md +
docs/architecture/clickhouse-phase4-decoder-adapter.md.
  - Fixed an extractor bug before any full walk: claimAtomCount decoded
    CreatePassiveSellOffer via the wrong OperationResultTr union arm
    (GetManageSellOfferResult, always ok=false for that op type) and
    silently undercounted classic_trade_effect_count vs the census on every
    crossing passive offer. Now uses GetCreatePassiveSellOfferResult,
    matching sdex.decode + dispatcher.census; covered by a new
    per-op-variant test.
- ADR-0033: completeness verification model. Three independently
provable claims (substrate continuity, recognition, projection
reconciliation) replace threshold-based coverage as the
100%-confidence signal. See docs/adr/0033-completeness-verification-model.md.
- ledger_ingest_log substrate-continuity record (ADR-0033 Phase 2).
Migration 0051. One row per fully-processed ledger, written
post-persist by the live indexer, carrying the LCM-derived census
(soroban_event_count, classic_trade_effect_count, both counted
decoder-independently from the LedgerCloseMeta) plus the header
hash-chain anchors. New ratesengine-ops census-backfill -from -to
populates history. Storage queries FindLedgerIngestGaps (contiguity)
and VerifyLedgerHashChain (cryptographic linkage) are Claim 1 of the
completeness model; both run over the narrow record, never a trades
scan. Once a ledger is recorded with its census, "zero events for
contract C here" is a proven quiet period, which is what lets the
confidence signal stop guessing sparsity thresholds.
- Recognition check (ADR-0033 Phase 3 / Claim 2a). New
ratesengine-ops verify-recognition -from -to pulls every distinct
(contract_id, topic_0_sym) shape from soroban_events and runs each
through the production decoder chain's real Matches() (no
hand-maintained topic list to drift). Any shape no decoder handles
(e.g. a topic a WASM upgrade added that we'd silently drop) is listed
and the command exits non-zero (cron/CI-gateable). Backed by
dispatcher.Recognize (side-effect-free), Store.DistinctSorobanTopicSamples,
and internal/completeness.AuditRecognition.
- Projection reconciliation (ADR-0033 Phase 4 / Claim 2b). New
ratesengine-ops verify-reconciliation -from -to [-source S]
re-derives, per ledger, how many trades rows the real decoder would
emit from soroban_events (deterministic recomputation) and diffs
that against the rows actually present, localizing any projector drop
(or phantom row) to an exact ledger. Covers soroswap/aquarius/phoenix/comet
(seeds soroswap pairs via RPC). Backed by
completeness.ReDeriveOutputCounts/ReconcileCounts and
Store.CountRowsByLedger. Correlation sources reconcile correctly
because each logical record's events share one (ledger, tx, op).
- SDEX / classic reconciliation (ADR-0033 Phase 5 / Claim 2b classic).
verify-reconciliation now also covers SDEX, which predates Soroban
and has no soroban_events: its expected count comes from the
LCM-derived classic_trade_effect_count census in ledger_ingest_log
(one ClaimAtom = one trade), gated on the substrate record being
continuous over the range (else it tells you to run census-backfill
first). The existing hubble-check (per-ledger SDEX-vs-Hubble counts +
amount cross-check) remains the external defense-in-depth anchor.
- Completeness watermark verdict (ADR-0033 Phase 6 / headline).
ratesengine-ops compute-completeness derives the per-source
completeness WATERMARK, the highest ledger where substrate continuity +
hash chain (Claim 1) AND projection reconciliation (Claim 2b) both
hold from genesis, plus a system recognition verdict (Claim 2a), and
writes them to the new completeness_snapshots table (migration 0052).
/v1/diagnostics/ingestion overlays completeness_pct /
completeness_watermark / completeness_complete onto each source
row, and the status page renders completeness_pct as the headline
(falling back to gap-free coverage when not yet computed). Unlike
density/gap_free this uses NO sparsity threshold (a single proven gap
pins it), so it is the honest 100%-confidence signal. MinGapSizeOverride
is now documented as alerting-cadence only, off the confidence path.
- Projection reconciliation extended to all per-ledger sources +
multi-output fix (ADR-0033 future work). verify-reconciliation and
compute-completeness now drive off a shared catalogue covering every
source that writes a per-ledger table: trades (soroswap/aquarius/phoenix/comet),
oracles (reflector ×3 / redstone), cctp/rozo/defindex,
and blend's four tables, plus sdex via the LCM census. The re-derive
now buckets outputs by EventKind() (ReDeriveOutputCountsByKind +
SumKinds) and reconciles each table against only the kinds that
route to it, fixing a latent overcount where multi-output sources
(soroswap/phoenix/comet also emit skim/liquidity/stake events to other
tables) were compared whole against trades alone. Recognition gaps
are now attributed per-source for contract-pinned sources (oracles),
with a system recognition snapshot for gaps on unowned contracts.
(sep41/band/soroswap-router remain out of scope, as documented in the
catalogue.) Also chunk-prunes those queries via SorobanEventsTimeBound.
- Incremental completeness verify + hourly timer (ADR-0033 standing guard).
compute-completeness gains -from <ledger>: verify only [from, tip],
trusting [genesis, from] as previously verified (substrate hash-chain,
recognition shape scan, and projection reconcile all scoped to the window);
the watermark still extends to tip when the window is clean.
scripts/ops/completeness-incremental.sh computes from = min(watermark) from the prior
snapshots, so each run re-checks only new ledgers: minutes, not the hours a
full genesis→t...
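The incremental verify described in the last entry can be sketched roughly as follows. This is an illustration only, not the project's code: minWatermark, nextWatermark, the source names, and the ledger numbers are all hypothetical, and the rule that a failing ledger pins the watermark just below itself is an assumption inferred from "a single proven gap pins it".

```go
package main

import "fmt"

// minWatermark returns the lowest per-source watermark from the prior
// snapshots. ok is false when there are no snapshots yet, in which case a
// caller would presumably fall back to a full genesis-to-tip verify.
func minWatermark(watermarks map[string]uint32) (from uint32, ok bool) {
	for _, w := range watermarks {
		if !ok || w < from {
			from, ok = w, true
		}
	}
	return from, ok
}

// nextWatermark applies the result of verifying the window [from, tip].
// Assumption: a clean window extends the watermark to tip, and a failing
// ledger pins the watermark to the ledger just before it.
func nextWatermark(tip, firstBad uint32, clean bool) uint32 {
	if clean {
		return tip
	}
	if firstBad == 0 {
		return 0 // guard against underflow on a malformed result
	}
	return firstBad - 1
}

func main() {
	// Hypothetical prior snapshots.
	prior := map[string]uint32{"soroswap": 57_000_100, "sdex": 57_000_250}

	from, _ := minWatermark(prior)
	fmt.Println("verify from:", from)
	fmt.Println("clean window:", nextWatermark(57_000_400, 0, true))
	fmt.Println("gap found:", nextWatermark(57_000_400, 57_000_300, false))
}
```

Starting every run at the minimum watermark keeps the window small while still re-checking any source that lags behind the others.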
Rates Engine v0.5.0-rc.107
[v0.5.0-rc.107] — 2026-06-01
Tested against Stellar Protocol 23 (Whisk).
Pre-deploy operator note: aggregator restart picks up new oracle gap-detector targets. coverage_pct values populate after the first 30-min cycle.
Fixed
- Oracle sources (band, redstone, reflector-dex/cex/fx) now have
gap-detector targets sliced from the unified oracle_updates
hypertable. Pre-rc.107 these sources showed n/a on the
backfill_coverage listing because no per-source target existed.
Same shape as the rc.104 Soroban-DEX trade targets: shared
hypertable + per-source WhereFilter. Result: customer-facing
coverage_pct now populates for ALL Soroban sources with a
per-source hypertable. defindex + soroswap-router remain n/a
because they're log-only sinks (no per-ledger hypertable rows
to scan).
Rates Engine v0.5.0-rc.106
[v0.5.0-rc.106] — 2026-06-01
Tested against Stellar Protocol 23 (Whisk).
Pre-deploy operator note: api restart picks up the coverage_pct
semantic fix. No migrations.
Fixed
coverage_pct now reflects gap-free-ness, not event-density.
ADR-0031 Phase 2 deprecated the legacy cursor-derived
coverage_pct and the status page fell back to rendering
density_pct. density_pct = distinct_ledgers / expected_ledgers
over [genesis, tip]; for sparse sources (Soroban oracles
pushing once per hour, low-volume DEXes), density is naturally
<1% and the UI was reading that as "1% covered". User feedback
on r1 2026-06-01: that's a misleading metric.
Fix: coverage_pct = gap_free_pct = 1 - max_gap_ledgers / expected_ledgers.
1.0 means the indexer hasn't skipped any
ledger in this source's window, which is what "coverage" intuitively
means. Sparse sources hit 100% as long as ingest is healthy.
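To make the difference between the two metrics concrete, here is a small sketch that evaluates both formulas from the note above for a sparse source. The function names and the sample numbers are hypothetical; only the two formulas come from the release note.

```go
package main

import "fmt"

// densityPct is distinct_ledgers / expected_ledgers: the share of ledgers
// in the window that carry at least one row for the source.
func densityPct(distinctLedgers, expectedLedgers uint64) float64 {
	if expectedLedgers == 0 {
		return 0
	}
	return float64(distinctLedgers) / float64(expectedLedgers)
}

// gapFreePct is 1 - max_gap_ledgers / expected_ledgers, where
// max_gap_ledgers is whatever the gap detector reports for the window.
func gapFreePct(maxGapLedgers, expectedLedgers uint64) float64 {
	if expectedLedgers == 0 {
		return 0
	}
	return 1 - float64(maxGapLedgers)/float64(expectedLedgers)
}

func main() {
	// Hypothetical sparse oracle: a 1,000,000-ledger window, rows on only
	// 1,400 of those ledgers, and no gap reported by the detector.
	const expected = 1_000_000
	fmt.Printf("density_pct  = %.4f\n", densityPct(1_400, expected))
	fmt.Printf("coverage_pct = %.4f\n", gapFreePct(0, expected))
}
```

The same healthy source scores well under 1% on density but 100% on the gap-free definition, which is the mismatch the fix addresses.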
Rates Engine v0.5.0-rc.105
[v0.5.0-rc.105] — 2026-06-01
Tested against Stellar Protocol 23 (Whisk).
Pre-deploy operator note: indexer restart picks up the poller skip-doesn't-mean-stale fix.
Fixed
ratesengine_external_poller_stale falsely firing on
chainlink. Live-r1 incident 2026-06-01: chainlink poller
reports ~36 min stale shortly after every indexer restart,
even though it's polling correctly every 30s. Root cause:
the runner's "skipped" branch (when the poller returns
nil, nil, nil, by convention meaning "polled successfully
but no new feed data") did NOT update
ratesengine_external_poller_last_success_unix. Chainlink's
Ethereum feeds update at most every 1 hour, so the vast
majority of its 30-second polls naturally take the skip path.
The alert read this as "the poller hasn't successfully
reached upstream in 30+ min", which is wrong: the poller IS
reaching upstream, just finding nothing new.
Fix: bump LastSuccessUnix on the skipped path too. The
outcome="skipped" counter still distinguishes skip from
success, but the timestamp tracks "last time we polled at all",
not "last time we got an event."
Rates Engine v0.5.0-rc.104
[v0.5.0-rc.104] — 2026-06-01
Tested against Stellar Protocol 23 (Whisk).
Pre-deploy operator note: aggregator restart on r1 picks up the new per-source gap-detector targets. Migration 0048 (source_coverage_snapshots) was hand-applied during the incident and needs to be marked at version 48 in schema_migrations (already done on r1).
Fixed
- Coverage snapshot rows for Soroban-DEX sources.
ADR-0031 Phase 2 removed the cursor-derived density and
routed /v1/diagnostics/ingestion's coverage listing through
source_coverage_snapshots. The gap detector targets covered
SDEX (via source = 'sdex' WhereFilter on trades) but not the
Soroban-DEX sources (aquarius, soroswap, phoenix, comet) that
also land in the unified trades hypertable. Result on r1
2026-06-01: API reported 0% coverage for all four. Added the
matching per-source targets with appropriate genesis ledgers
and 100K-ledger sparsity overrides, matching the SDEX shape.
Rates Engine v0.5.0-rc.103
[v0.5.0-rc.103] — 2026-06-01
Tested against Stellar Protocol 23 (Whisk).
Pre-deploy operator note: indexer restart picks up 8 workers from 4. No migrations.
Fixed
- PersistWorkers bumped 4 → 8. rc.102 with 4 workers gave
~5 ledgers/min on r1 vs the ~10 ledgers/min network rate;
doubling the concurrent drain lifts processing throughput above
the network rate so the live cursor catches up and stays close
to the SLA-freshness threshold.
Rates Engine v0.5.0-rc.102
[v0.5.0-rc.102] — 2026-06-01
Tested against Stellar Protocol 23 (Whisk).
Pre-deploy operator note: indexer restart picks up the
4-worker parallel drain. No migrations.
Fixed
- PersistEvents parallel drain (4 workers). Live-r1 incident
2026-06-01: even after rc.101's batch-INSERT fix, the indexer
cursor advanced at ~1 ledger/min vs ~10/min network rate.
Root cause: the single-goroutine drain meant only one PG
roundtrip in flight at a time; the indexer's ProcessLedger
goroutine was blocked on events <- ev waiting for that one
worker to drain. With 4 worker goroutines sharing the same
channel (Go's channel semantics handle concurrent receive
safely), the events channel drains 4× faster; the existing
PG pool of 25 conns carries the concurrent INSERTs. Each worker
maintains its own 200-row trade batch + 200ms flush ticker.
Per-event ordering within a source is not preserved across
workers; the trades hypertable's PK
(source, ledger, tx_hash, op_index, ts) makes that irrelevant for correctness.
Rates Engine v0.5.0-rc.101
[v0.5.0-rc.101] — 2026-06-01
Tested against Stellar Protocol 23 (Whisk).
Pre-deploy operator note: indexer restart picks up the batched-INSERT trade path. No migrations.
Fixed
- Trade-insert throughput lifted ~40× via batch INSERT.
Live-r1 incident 2026-06-01: per-INSERT roundtrip cost capped
sustained trade throughput at ~5 trades/sec on the live indexer,
despite PostgreSQL handling 9000+ single-row INSERTs/sec in a raw
loop (verified). The bottleneck was the serial drain loop in
pipeline.PersistEvents: one event dequeue → one HandleEvent →
one InsertTrade roundtrip, no overlap. With ~300 events per
mainnet ledger, the cap meant ~1.8 ledgers/min processed vs the
~10 ledgers/min network rate, accumulating multi-hour lag.
New Store.BatchInsertTrades writes N rows in one statement
(INSERT … VALUES (…), (…), … ON CONFLICT DO NOTHING); same
idempotency, same per-source source_entry_counts UPSERT semantic,
same TradeInsertOutcomeTotal metrics. PersistEvents now
buffers trade events up to 200 rows OR 200 ms (whichever first),
flushes via the batch path, falls back per-row on a batch DB
error. Non-trade events (oracle updates, supply observations,
log-only events) stay on the single-row HandleEvent path.
Rates Engine v0.5.0-rc.100
[v0.5.0-rc.100] — 2026-06-01
Tested against Stellar Protocol 23 (Whisk).
Pre-deploy operator note: aggregator restart picks up the cadence-aware gap-detector. No migrations. After deploy, confirm pg_stat_activity shows no concurrent DISTINCT ledger FROM trades scans accumulating across cycles.
Fixed
- Gap-detector no longer pile-drives postgres on huge tables.
Live r1 incident 2026-05-29: three concurrent
SELECT DISTINCT ledger FROM trades WHERE source='sdex' scans accumulated over
successive gap-detector cycles because the Go-side ctx timeout
didn't propagate to PostgreSQL. The queries kept running and
starved trade-insert latency, lighting the slo_latency_burn
page. Two complementary fixes:
  - Per-target ScanCadence override. New
    GapDetectorTarget.ScanCadence lets huge-table targets opt
    into a longer scan cadence than the global 30-min interval.
    SDEX trades and soroban_events now scan every 6 hours; light
    targets keep the 30-min cadence for fast signal.
  - SQL SET LOCAL statement_timeout backstop.
    CountDistinctLedgers and FindPerSourceLedgerGaps now wrap
    their query in a transaction with a 5-min PG-side timeout.
    If Go-side cancellation fails (the F-0020-cousin failure mode
    we just observed), PostgreSQL itself aborts the query, so
    in-flight scans can no longer leak across cycles.
Rates Engine v0.5.0-rc.99
[v0.5.0-rc.99] — 2026-05-29
Tested against Stellar Protocol 23 (Whisk).
Pre-deploy operator note: api + ops binary restart. Indexer + aggregator unchanged (projector is opt-in and defaults to off). After api restart, run ratesengine-ops sep1-refresh -older-than 0 once to repopulate every issuer's sep1_payload JSONB column with the new Currencies shape; until then the per-asset overlay fields are empty.
Changed
- /v1/assets/{id} SEP-1 overlay reads from DB instead of live
HTTPS. Pre-rc.99 the asset-detail handler called
metadata.Cache.Resolve(home_domain) on every uncached request,
which dominated p95 (~4s long tail on cold issuers; this drove the
slo_latency_burn_medium page 2026-05-29 11:30). The handler now
reads the issuers.sep1_payload JSONB column populated by the
ratesengine-ops sep1-refresh cron, which is what /v1/issuers
already did. The sep1-refresh cron is extended to persist
Currencies (per-asset metadata) so the overlay's Name /
Description / Image / AnchorAsset fields stay populated on the
next cron run.
- ADR-0029, ADR-0031, ADR-0032 promoted to Accepted. Phase 6
of the projection-architecture rollout completes the
documentation contract: three ADRs now describe the single
writer per data domain (projector for Soroban-derived, direct
for trades), the single data-derived coverage signal, and the
raw soroban_events landing zone they share. CLAUDE.md gains
Invariant 7 ("One writer per data domain") summarising the
contract for future agents.
Added
- ADR-0032 Phase 5: projector-replay operator subcommand.
Single SQL cursor-rewind:
ratesengine-ops projector-replay -source <name> -from <ledger>.
The projector goroutine catches up on its next cycle (≤ 5 s)
and re-projects forward to the live tip. Replaces the family of
*-backfill subcommands deleted in this release. The new
projector-replay runbook captures the operator flow.
Removed
- ADR-0032 Phase 5: dead-code deletion. Removed eight
redundant ratesengine-ops subcommands (~1500 LoC):
cctp-backfill, rozo-backfill, soroswap-skim-backfill,
comet-liquidity-backfill, phoenix-backfill, blend-backfill,
sep41-transfers-backfill, drain-cascade-window. All replaced
by projector-replay + the projector goroutine. Also removed
the cascade-window-drain runbook (superseded by
projector-replay). Runbook + alert references updated.
Changed
- ADR-0032 Phase 4: projector becomes sole writer for Soroban-derived
events. New [ingestion.projector] persist_per_source
knob (default true = Phase 3 parallel mode); flipping to
false switches the dispatcher's events-goroutine to
pipeline.SinkModeSkipProjected so it stops writing the
Soroban-derived event subset. The projector becomes single
writer-of-record for trades, blend_*, phoenix_*,
comet_*, soroswap_skim, cctp_events, rozo_events,
sep41_*, oracle_updates (reflector + redstone). Non-projected
events (sdex, external CEX/FX, band, supply-observer
LedgerEntry observations) continue through the events-goroutine
unchanged. New pipeline.IsProjectedEvent is the dispatch
contract, pinned by a table-driven test.
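A dispatch contract of this kind might look roughly like the sketch below. It is a guess at the shape, not the real pipeline package: keying the predicate on a source name, the SinkModePersistAll constant, the shouldPersist helper, and the contents of the lookup table are all assumptions. Only SinkModeSkipProjected and IsProjectedEvent are names from the note, and the sample sources are taken from its projected and non-projected lists.

```go
package main

import "fmt"

// SinkMode selects what the events-goroutine writes.
type SinkMode int

const (
	SinkModePersistAll    SinkMode = iota // hypothetical name for Phase 3 parallel mode
	SinkModeSkipProjected                 // Phase 4: projector is the sole writer
)

// projectedSources is an illustrative subset of Soroban-derived sources.
var projectedSources = map[string]bool{
	"blend":     true,
	"cctp":      true,
	"rozo":      true,
	"reflector": true,
	"redstone":  true,
}

// isProjectedEvent reports whether the projector owns writes for a source.
func isProjectedEvent(source string) bool { return projectedSources[source] }

// shouldPersist is the events-goroutine's decision for one event.
func shouldPersist(mode SinkMode, source string) bool {
	return mode != SinkModeSkipProjected || !isProjectedEvent(source)
}

func main() {
	// Table-driven check in the spirit of the test that pins the contract.
	cases := []struct {
		source string
		want   bool // expected shouldPersist under SinkModeSkipProjected
	}{
		{"sdex", true},
		{"band", true},
		{"blend", false},
		{"redstone", false},
	}
	for _, c := range cases {
		got := shouldPersist(SinkModeSkipProjected, c.source)
		fmt.Printf("%-9s persist=%v ok=%v\n", c.source, got, got == c.want)
	}
}
```

Keeping the decision in one predicate with a table-driven test makes it hard for a new source to end up with two writers, or none, by accident.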
Added
- ADR-0032 Phase 3: projector scaffold in parallel mode. New
internal/projector component tails soroban_events (the
ADR-0029 raw-event landing zone) and invokes each protocol's
existing Go decoder, then routes decoded consumer.Events
through pipeline.HandleEvent (newly exported) to the same
per-source persisters the dispatcher uses. Phase 3 runs in
parallel with the dispatcher's existing per-source sinks: both
writers race for the same per-source PKs and ON CONFLICT DO
NOTHING absorbs duplicates, so projector lag versus the live
tip can be measured before Phase 4 flips the writer primary.
New [ingestion.projector] enabled config knob defaults to off;
cmd/ratesengine-indexer/main.go wires + drains the goroutine
on shutdown.
- Projector observability. Four new metrics
(ratesengine_projector_lag_ledgers, _runs_total,
_events_decoded_total, _cycle_duration_seconds) plus
paired alerts (ratesengine_projector_lag_high +
ratesengine_projector_error_rate_high, both P3) and the
projector-lag runbook.