Release Rates Engine v0.5.0-rc.51 · StellarIndex/stellar-index

[v0.5.0-rc.51] — 2026-05-14

Changed

verify-archive Tier A is now incremental. Pre-fix the nightly
systemd unit re-walked the entire chain from genesis every night,
taking ~13.8h of wall time and ~7h of CPU time per pass (67% of
every day; visible as a sustained load-average drag on r1). Past
LCM files are immutable, so re-hashing them is wasted compute.

New scheme: ratesengine-ops verify-archive accepts
-state-file PATH -from-last-verified [-safety-overlap N].
Reads the prior run's high-water mark from a small JSON file,
computes -from = max(2, last_verified - safety_overlap), and
verifies only the new tail. The resume-from-hash from prior
state is plumbed through so cross-run chain continuity is
preserved (the next incremental run's first chunk must chain to
the previous run's last verified hash).
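
The resume computation can be sketched as follows. This is a minimal illustration, not the shipped code: the `verifyState` struct and its JSON field names are assumptions, since the notes do not give the state file's layout.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// verifyState is a hypothetical layout for the state file; the real
// field names are not given in these notes.
type verifyState struct {
	LastVerified int64  `json:"last_verified"`
	LastHash     string `json:"last_hash"` // carried forward for chain continuity
}

// resumeFrom computes max(2, lastVerified - safetyOverlap).
func resumeFrom(lastVerified, safetyOverlap int64) int64 {
	const firstLedger = 2
	from := lastVerified - safetyOverlap
	if from < firstLedger {
		return firstLedger
	}
	return from
}

func main() {
	raw := []byte(`{"last_verified": 62500000, "last_hash": "abc123"}`)
	var st verifyState
	if err := json.Unmarshal(raw, &st); err != nil {
		panic(err)
	}
	fmt.Println("verify from ledger", resumeFrom(st.LastVerified, 5000))
	fmt.Println("prior run's last verified hash:", st.LastHash)
}
```

Clamping at 2 keeps a fresh or near-genesis state file from producing a start ledger below the first verifiable one.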

Default safety overlap: 5000 ledgers (~17h of chain), which catches
any anomalies that snuck in just before the last run's tip.

systemd unit defaults updated:
VERIFY_ARCHIVE_STATE_FILE=/var/lib/ratesengine/verify-archive-state.json,
VERIFY_ARCHIVE_MAX_RUNTIME=4h (down from 16h). A typical
incremental pass covers ~24h of new ledgers in minutes.

A weekly full-archive re-pass (defense-in-depth against silent
corruption in older chunks) remains a TBD sibling unit.

backfill_coverage[].density_pct replaces coverage_pct on
/v1/diagnostics/ingestion. Pre-fix the metric was
(latest - earliest) / (tip - genesis), which measures endpoint
span, not data density. A
source with one trade at genesis and one trade at tip scored
100% even with the whole interior empty (caught live 2026-05-14
when SDEX backfill was still running but coverage showed 99.8%
and aquarius/comet/phoenix/soroswap all showed 99.99%).

New metric: union of completed portions of all backfill cursor
intervals that include this source in their decoder set, clamped
to [genesis, tip], divided by tip - genesis + 1. Hits 100%
only when backfill ranges actually cover the whole interval.
Sparse sources no longer score 100% just for having endpoint
trades — they score by what fraction of ledgers their backfill
has processed, which is the question operators actually want
answered.
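
A rough sketch of the density calculation, assuming each backfill cursor contributes one inclusive [lo, hi] range of completed ledgers. The `interval` type and function names are illustrative, not the engine's own.

```go
package main

import (
	"fmt"
	"sort"
)

// interval is a hypothetical inclusive ledger range already processed
// by one backfill cursor.
type interval struct{ lo, hi int64 }

// coveredLedgers returns the size of the union of ranges after clamping
// each one to [genesis, tip].
func coveredLedgers(ranges []interval, genesis, tip int64) int64 {
	clamped := make([]interval, 0, len(ranges))
	for _, r := range ranges {
		if r.lo < genesis {
			r.lo = genesis
		}
		if r.hi > tip {
			r.hi = tip
		}
		if r.lo <= r.hi {
			clamped = append(clamped, r)
		}
	}
	sort.Slice(clamped, func(i, j int) bool { return clamped[i].lo < clamped[j].lo })

	var covered int64
	next := genesis // first ledger not yet counted
	for _, r := range clamped {
		if r.hi < next {
			continue // fully inside an earlier range
		}
		if r.lo > next {
			next = r.lo
		}
		covered += r.hi - next + 1
		next = r.hi + 1
	}
	return covered
}

// densityPct divides covered ledgers by tip - genesis + 1.
func densityPct(ranges []interval, genesis, tip int64) float64 {
	expected := tip - genesis + 1
	return 100 * float64(coveredLedgers(ranges, genesis, tip)) / float64(expected)
}

func main() {
	// One ledger at genesis and one at tip: endpoint span is full, density is not.
	sparse := []interval{{1, 1}, {100, 100}}
	fmt.Printf("density: %.2f%%\n", densityPct(sparse, 1, 100))
}
```

Overlapping ranges are counted once, so re-running a backfill over an already-covered range does not inflate the figure.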

Wire: coverage_pct retained as a transitional field for one
release. New fields: density_pct, covered_ledgers,
expected_ledgers. Status page updated to render the new
density (tooltip exposes the absolute "covered / expected"
numerator + denominator).

Added

Chainlink ingest source (internal/sources/external/chainlink/).
Promotes the formerly-divergence-only Chainlink reference into a
full ingest source — writes canonical.OracleUpdate rows to
oracle_updates on its own poller goroutine alongside Reflector /
Redstone / Band. Implements external.Poller; lives parallel to
the existing internal/divergence/chainlink.go cross-check (which
stays in place for synchronous divergence_warning checks).

Wire shape: poll AggregatorV3.latestRoundData() over JSON-RPC,
dedupe by (feed_address, roundId), project to canonical with
synthetic deterministic tx_hash (sha256(feed || roundId)) for
idempotent restart. Default 30s cadence; per-feed Decimals/Invert
overrides via TOML. Default endpoint is Cloudflare public; operator
drops an Alchemy URL (with embedded API key) into r1's TOML or via
CHAINLINK_RPC_URL env. Bounded concurrency (8) per tick.

Backfill: new ratesengine-ops backfill-chainlink subcommand walks
AnswerUpdated event logs via chunked eth_getLogs (5k blocks /
call, the safe default for Alchemy / Infura / QuickNode response-
size caps). ~33k RPC calls and ~7h wall time for all-time backfill
of the default 6 majors on Alchemy free tier (~19% of monthly
quota); scale linearly with feed count up to all 516 ETH-mainnet
Chainlink feeds within the same free-tier envelope. Idempotent on
the oracle_updates PK; safe to re-run over already-covered ranges.
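
The chunking itself is simple to illustrate. The helper below is a sketch with hypothetical names that splits an inclusive block range into 5k-block windows, each suitable for one eth_getLogs call.

```go
package main

import "fmt"

// blockRange is an inclusive [From, To] window for one eth_getLogs call.
type blockRange struct{ From, To uint64 }

// chunkRanges splits [from, to] into windows of at most size blocks.
func chunkRanges(from, to, size uint64) []blockRange {
	if size == 0 || from > to {
		return nil
	}
	var out []blockRange
	for start := from; ; {
		end := start + size - 1
		if end > to || end < start { // clamp to the tip; also guards uint64 overflow
			end = to
		}
		out = append(out, blockRange{start, end})
		if end == to {
			return out
		}
		start = end + 1
	}
}

func main() {
	for _, r := range chunkRanges(0, 12499, 5000) {
		fmt.Printf("eth_getLogs fromBlock=%d toBlock=%d\n", r.From, r.To)
	}
}
```

Windows never overlap, so a re-run over the same range issues the same calls and relies on the oracle_updates PK for idempotency.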

Surface: registered in external.Registry as
ClassOracle / BackfillSafe=true / IncludeInVWAP=false. Picked up
by /v1/sources?class=oracle automatically — explorer's /oracles
page surfaces it without UI changes.

Oracle CAGG ladder (migration 0034). Seven continuous
aggregates on oracle_updates at the standard
1m/15m/1h/4h/1d/1w/1mo tiers — sister to the trade CAGG ladder
in migration 0002. Closes the gap where every /v1/oracle/*
history query was scanning raw oracle_updates; manageable at
~3 oracle sources × ~860 rows/day each, untenable once Chainlink
arrives at scale.

Aggregation semantics differ from trades: oracles are
point-in-time observations, so each bucket carries
first / last / min / max / last_decimals / count (no VWAP / TWAP
because there is no volume dimension). One row per
(source, asset, quote, bucket);
per-source identity preserved so cross-oracle comparison stays
meaningful. Refresh policies match the trade ladder; no retention
on sub-1h tiers (matches the operator's "store everything forever"
decision in migration 0031), indefinite for 1h+ per the proposal.
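
To make the bucket semantics concrete, here is an in-memory sketch of the same fold for a single (source, asset, quote). The real aggregation runs in the database as continuous aggregates; the types here are hypothetical and prices are simplified to int64.

```go
package main

import "fmt"

// observation is a hypothetical, simplified oracle update: an integer
// price plus its decimals, timestamped in Unix seconds.
type observation struct {
	unix     int64
	price    int64
	decimals int32
}

type bucket struct {
	first, last, min, max int64
	lastDecimals          int32
	count                 int
}

// aggregate folds time-ordered observations into fixed-width buckets
// keyed by bucket start. It assumes non-negative timestamps and a
// constant decimals value within each bucket.
func aggregate(obs []observation, widthSec int64) map[int64]*bucket {
	out := make(map[int64]*bucket)
	for _, o := range obs {
		start := o.unix - o.unix%widthSec
		b, ok := out[start]
		if !ok {
			out[start] = &bucket{
				first: o.price, last: o.price, min: o.price, max: o.price,
				lastDecimals: o.decimals, count: 1,
			}
			continue
		}
		b.last, b.lastDecimals = o.price, o.decimals
		if o.price < b.min {
			b.min = o.price
		}
		if o.price > b.max {
			b.max = o.price
		}
		b.count++
	}
	return out
}

func main() {
	obs := []observation{{0, 100, 7}, {10, 90, 7}, {59, 110, 7}, {60, 105, 7}}
	for start, b := range aggregate(obs, 60) {
		fmt.Printf("bucket %d: %+v\n", start, *b)
	}
}
```

There is no weighting step anywhere in the fold, which is the practical meaning of "no volume dimension".
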

DeFindex vault decoder (internal/sources/defindex/).
Event-based decoder (dispatcher.Decoder, NOT
ContractCallDecoder) for paltalabs/defindex's autocompound
vaults. Phase A matches ("DeFindexVault","deposit") and
("DeFindexVault","withdraw") events on the 3 known vaults
(USDC / EURC / XLM autocompound). Decoder pulls
depositor / withdrawer, multi-asset amounts vec
(i128, no truncation per ADR-0003), and the share-token
delta (df_tokens_minted / _burned) by name from the
body Map (decode-by-name per
contract-schema-evolution.md). Phase B will tag matching
same-tx Blend / Soroswap legs as
routed_via=defindex-{vault} and write
aggregator_exposures rows from a separate periodic
ticker. Pre-seed migration 0033_seed_defindex_vaults.up.sql
populates the 3 vaults in the routers registry as
kind='aggregator-vault'. WASM-history audit started at
docs/operations/wasm-audits/defindex.md;
BackfillSafe=false until the per-hash review lands.

Soroswap Router decoder (internal/sources/soroswap_router/).
New ContractCallDecoder following the Band oracle pattern —
matches by (contract_id, function_name) and decodes
swap_exact_tokens_for_tokens / swap_tokens_for_exact_tokens
invocations on the canonical pubnet router
(CAG5LRYQ5JVEUI5TEID72EYOVX44TTUJT5BQR2J6J77FH65PCCFAJDDH).
Phase A is log-only — every routed swap surfaces an INFO line
with path, in/out amounts (i128, no truncation per ADR-0003),
recipient, and deadline. Phase B will tag matching same-tx
trades.routed_via rows via the existing migration-0025 column.
New ClassRouter taxonomy in internal/sources/external/
(alongside the existing ClassLending); router class is
attribution-only, never contributes to VWAP. Pre-seed migration
0032_seed_soroswap_router.up.sql populates the routers
registry. WASM-history audit started at
docs/operations/wasm-audits/soroswap-router.md;
BackfillSafe=false until the per-hash review lands.

Changed

Raw trades retention removed (migration 0031). Pre-fix the
trades hypertable aged out at 90 days; we relied on the
hourly+ CAGGs to preserve historical OHLC. Operator wants raw
per-trade fidelity preserved indefinitely (regulatory + proof-
of-pricing queries can't be reconstructed from CAGGs).
Justification: r1's postgres data dir is on a 1.5 TB ZFS volume
with 4% used. The earlier "no room" analysis was wrong: it
measured the OS root disk (49 GB), not the postgres data
volume. Status page coverage panel relabeled from "Raw-trades
coverage (last 90 days)" → "Raw-trades coverage — genesis →
tip"; coverage_pct grows monotonically as backfills land.
Compression policy on chunks > 7d is unchanged (~5x reduction).

Added

CEX pair coverage — cross-fiat majors. All four CEX
connectors (binance/bitstamp/coinbase/kraken) now stream BTC
and ETH against EUR + GBP in addition to USD. Pre-fix, only
Bitstamp published BTC/EUR — every aggregator tick on
crypto:BTC/fiat:EUR was single-source, which falsely tripped
Phase 2 freeze permanently. Bitstamp + Coinbase + Kraken +
Binance all support these pairs natively; we just hadn't
enumerated them in the connector defaults.

Stop-gap pre-Tier-3. The next change in this area will replace
the hand-curated DefaultPairs() maps with auto-discovery from
each exchange's pair-catalogue endpoint
(/api/v3/exchangeInfo / /products / /0/public/AssetPairs /
/api/v2/trading-pairs-info), filtered by an allow-list of
quote assets. That move expands coverage from ~50 hand-curated
pairs/exchange to ~200-1500 active pairs/exchange. Storage
scales with PAIR COUNT (CAGG rows, ~50 MB/year for 1500 pairs)
not raw trade volume (90-day retention), so the cost is
bounded.

Fixed

Backfill auto-refresh: three bugs caught on first real run.
Yesterday's commit added refresh_continuous_aggregate calls
after each backfill chunk but every CAGG refresh failed. Three
fixes from the live test:
1. 42P18: could not determine data type of parameter $1
  — lib/pq's CALL syntax doesn't propagate the procedure
  signature's parameter types, so untyped placeholders fail.
  Fix: explicit ::timestamptz casts in the SQL.
2. 22023: refresh window too small for prices_4h /
  _1d / _1w / _1mo — Timescale rejects refresh windows
  narrower than 2× bucket width. A 10k-ledger chunk's ts
  span (~4h) was fine for prices_1h but failed every
  coarser CAGG. Fix: per-CAGG MinWindow declared in
  CAGGsLiveForever; new PadRefreshWindow helper expands
  the chunk's window to that minimum centered on the
  chunk's midpoint. Padded area materialises as empty
  buckets (cheap).
3. 55P03: concurrent refresh — with -parallel N,
  multiple chunks race on the same coarse CAGG (prices_1mo
  was the worst — chunks finishing close together all want
  to refresh the same monthly bucket). Fix: retry-on-55P03
  with exponential backoff (200ms → 1.6s × 5 attempts).
End-to-end verified live: 10k-ledger SDEX backfill at
ledgers 50,000,000-50,010,000 inserted 718,873 trades AND
populated 66,513 prices_1h buckets + 22,005 prices_1d
buckets — those CAGGs will now persist past the 90-day raw
retention. Yesterday's claim "auto-refresh now works" was
premature; this commit is what makes it true.

Live-site QA pass: F-01/F-03/F-04 resolved, F-02 partial.
Working through docs/review-2026-05-13-live-site-qa.md:
- F-01 (degraded state invisible in explorer): new
  DegradedBanner component polls /v1/status every 60s and
  renders a fixed band between Navbar and content when
  overall ≠ "ok". Tone (amber/red) keys off pageCount > 0.
  Includes top alert name + link to status page. Quiet when
  everything's fine; noisy enough to set expectations when
  it isn't.
- F-02 (pools 503 silently rendered as "No pools matched"):
  DexesView now branches on q.isError and surfaces an
  explicit error card with retry + link to status. Empty-
  state path is gated behind !q.isError. Backend perf
  (the underlying 7s cold-cache p99) tracked alongside the
  api_cache_miss_rate_high workstream.
- F-03 (CORS credentials mismatch): explorer's useMe()
  no longer sends credentials: include against an API that
  explicitly refuses credentialed CORS. Cost: signed-in
  users see signed-out CTAs in the explorer navbar
  (dashboard.ratesengine.net is unaffected — same-origin).
  Inline comment documents the cross-origin cookie work
  needed to re-enable session detection (Domain=
  .ratesengine.net + ACA-Credentials + SameSite=None).
- F-04 (deep_link API path leaked to next/link):
  NetworksPanel no longer feeds API deep_link values
  (e.g. /v1/assets/USDC-GA5Z…) into <Link>. Stellar
  rows now build the explorer route explicitly
  (/assets/{slug}/stellar); the API deep_link stays in
  the JSON for programmatic consumers.

Incident triage sweep: 9 active alerts → root-cause +
preventatives. Worked through every alert firing on r1 today
and either resolved the root cause, codified prevention in
ansible, or filed it as a known-real signal needing follow-up:
- node_root_disk_warning — disk 81% → 62% by truncating a
  7.3GB syslog. Root cause: Loki running at log_level=debug
  spamming ~4M caller=mock.go msg=Get key=collectors/...
  lines/day into syslog. Fix: set Loki to warn
  (configs/ansible/roles/loki/templates/loki-config.yaml.j2)
  and add a defense-in-depth rsyslog filter so even an
  accidental level regression can't reach /var/log/syslog
  (configs/ansible/roles/archival-node/tasks/15-log-discipline.yml).
  Also pruned 36 old binary backups + 9 stale toml backups +
  vacuumed journal to 7 days.
- verify_archive_unit_failed: root cause was an 8h max-runtime
  cap too tight for ~62.5M-ledger pubnet. A fresh run completed
  34.7M ledgers in 8h (1207 l/s aggregate at 8 workers), then
  exited 1/FAILURE on context deadline, the same failure the
  previously-rotated journal would have shown. Bumped
  defaults to 12 workers + 16h cap (sits inside the 24h
  timer cadence with headroom). Updated both the in-repo
  unit (deploy/systemd/verify-archive-tier-a.service) and
  the live r1 drop-in. Started a fresh run on the new
  settings; the alert clears when it finishes.
- sla_probe_unit_failed_alert — REAL: /v1/markets,
  /v1/assets cold-cache p99 spikes (~5s, ~2.4s) breach
  the 500ms target on the probe's first sample after each
  30s cache-TTL window. Filed as a perf workstream — needs
  /v1/assets + /v1/issuers cache wrappers + prewarm.
- api_cache_miss_rate_high — REAL: prewarm covers
  markets/all_pools for limits {5,25,100,200} but
  markets/asset_markets and markets/source_markets
  ops aren't prewarmed at all; user-facing requests with
  novel param tuples miss cache. Same perf workstream.
- anomaly_freeze_sustained / anomaly_freeze_engaged —
  REAL but invisible: 1892 freeze decisions emitted, zero
  Redis markers, zero freeze_events rows. Phase 2's
  baseline z-score is unstable because we only have 7 days
  of prices_1h data (root cause = the SDEX backfill bug
  from the previous session). Added an INFO log in
  markPhase2Freeze so operators can grep
  journalctl -u ratesengine-aggregator | grep "phase2 freeze"
  to see which pairs are firing. Updated the alert
  annotation (both repo + R1 overlay) to call out the
  cold-baseline pattern + triage steps.
- aggregator_supply_refresh_never_initialized — gated by
  [supply].aggregator_refresh_enabled = false (default).
  Enabling it requires the on-chain supply observers to be
  backfilled across the watched accounts; same workstream as
  the SDEX backfill. Not a quick fix; documented for follow-up.
- supply_snapshot_never_initialized — RESOLVED: the
  supply-snapshot.service was running daily and exiting 0,
  but /etc/default/supply-snapshot didn't set TEXTFILE_OUTPUT,
  so the binary skipped the metric write. Wired the textfile
  path; metric now emits. Codified in
  configs/ansible/roles/archival-node/tasks/10-observability.yml
  so a rebuilt host gets the wiring automatically.
- slo_latency_burn_slow — same family as the SLA-probe
  perf finding; will track with that workstream.

Backfill status surfaces "stalled" vs "running" separately.
BackfillDecoderState (the per-decoder row on
/v1/diagnostics/ingestion) decomposes the previously-opaque
ranges_active count into ranges_complete (done),
ranges_running (incomplete + updated within 10 min), and
ranges_stalled (incomplete + idle > 10 min — needs
ratesengine-ops backfill -resume). Status page renders three
separate columns with green/blue/red coloring. The old
ranges_active field stays on the wire for back-compat.

Backfill auto-refreshes the long-lived CAGGs (prices_1h /
prices_4h / prices_1d / prices_1w / prices_1mo) at the
end of every chunk. Without this, historical inserts get
dropped by the 90-day raw-trades retention policy before the
CAGG policy refresher's natural cadence picks them up — which
is what happened to the May 6-11 2026 SDEX backfill (cursors
hit last_ledger == range_end for every range, ~80M trades
inserted, retention dropped them within 24h, no CAGG
materialisation, ~5d of wall-clock work lost; trades
MIN(ledger) for sdex collapsed back to 61,191,617).

Backfill tool changes:
- New -refresh-caggs flag (default true). After each
  chunk's trade-insert loop, derives the actual ts range from
  the inserted rows (Store.LedgerRangeToTimeRange) and
  force-refreshes every long-lived CAGG over that window
  (Store.RefreshContinuousAggregate).
- Per-view soft-fail so one wedged CAGG doesn't block the
  others.
- Procedure doc rewritten: manual CALL refresh_continuous_aggregate
  step removed (now automatic).

Diagnostics endpoint additions:
- cagg_coverage field reports prices_1h MIN/MAX bucket +
  row count — the real source-of-truth answer to "do we have
  historical OHLC since genesis?" (raw trades only spans
  the last 90 days; hourly+ CAGGs are retained forever).

Added

Backfill coverage on /v1/diagnostics/ingestion + status page.
New backfill_coverage[] array on the diagnostics endpoint
reports per-source MIN/MAX ledger from the trades hypertable,
joined with an operator-curated map of source genesis ledgers
(1 for SDEX, contract deploy ledger for each Soroban DEX), with
a derived coverage_pct so the answer to "do we have data from
ledger 1 to tip?" is one column. CEX/FX sources surface as
applies=false (their trades have no Stellar-ledger context).
Backed by a process-local cache refreshed every 5 min in a
background goroutine — the underlying SQL is 2-3s on a populated
trades hypertable, too slow for the request path.

Status page renders a new "Coverage — ledger genesis → tip"
table with per-source progress bars (green ≥99%, amber ≥50%,
red <50%). Today's r1 reading: SDEX 2.18% covered (61.2M → 62.5M
out of 1 → 62.5M), Soroban DEXes 15-17%, off-chain sources N/A.

Status page: per-region "Ingestion" section. Polls each
region's /v1/diagnostics/ingestion every 30s and renders a
panel with: binary version + commit, live ledger card (latest,
lag, 24h volume, indexed markets/assets), FX backfill coverage
(date range, currencies, total quotes), CoinGecko market-cap
cache state (entries, newest/oldest fetch age), supply observer
counts, per-decoder backfill table (ranges total/active, oldest
lag), and per-source health table joined with trailing-24h
trades/volume/markets. Region list is a single REGIONS const, so
r2/r3 join by appending a row, no other code changes needed.

GET /v1/diagnostics/ingestion: single-fetch ingestion
snapshot for the region. Composes: region label, binary version,
live ledger tip + lag, per-decoder backfill state (ranges
total/active, oldest lag), Frankfurter / fx_quotes coverage
(earliest/latest dates, total quotes, distinct currencies),
market-cap cache state, supply observer coverage (classic vs
SEP-41 counts, last snapshot age), and the full source registry
joined with trailing-24h trades/volume/markets.

Designed as the only call the status page makes for its
per-region ingestion panel, so operators no longer have to scrape
/v1/network/stats + /v1/sources + /v1/diagnostics/cursors +
/v1/version and reconcile by hand. New storage helpers:
FXCoverageStats, SupplyCoverageStats (one query each, ~1ms
on populated tables). Cache: public, max-age=15.

Fixed

/assets/{slug} for catalogue slugs (usdc, chinese-yuan,
btc, …) now renders the real cross-chain view instead of the
"Asset not found" fallback. The page's fetchGlobalAsset
was firing a per-slug /v1/assets/{slug} request at build
time, just like [network] was before its consolidation —
with ~1000 prerendered routes that storm tripped r1's anon
rate limit and every catalogue page baked in the not-found
fallback. Extracted the catalogue source to
web/explorer/src/app/assets/catalogue.ts (shared module,
single /v1/assets/verified call, memoised promise, 429-aware
retry). Both [slug] and [slug]/[network] now read from the
same map.

/assets/{slug} and /assets/{slug}/{network} now resolve in
both case variants for catalogue entries. Previously only the
uppercase form (/assets/USDC/) was prerendered because dedup in
generateStaticParams picked first-seen casing, so user-typed
lowercase URLs (/assets/usdc/) and any links pointing at the
catalogue's canonical lowercase slugs returned 404. Now both
cases get a route per catalogue entry; non-catalogue Stellar
assets keep their listing casing as before.

/assets/{slug} for verified-catalogue currencies now renders
the cross-chain identity view, not the Stellar-issuer view.
The dispatcher used to fall through to AssetDetail (with the
IssuerPanel) whenever /v1/coins returned a row, even when
the slug also matched a catalogue entry. Result: /assets/USDC/
was showing Circle's Stellar issuer detail instead of the
cross-chain page. The [network] route (/assets/USDC/Stellar/)
is now the only place per-issuer detail lives. Title +
description for catalogue slugs now use cross-chain framing
(USDC — Stablecoin) instead of Stellar-only framing.

Changed

Ansible template now bakes in anon_rate_limit_per_min = 600
/ key_rate_limit_per_min = 6000. Codifies the live r1 bump
applied 2026-05-13. The prior defaults (60 / 1000 per min) were
too tight for any consumer doing a static build or dashboard
refresh from a single IP — the explorer Cloudflare Pages build
was the canary.

Fixed

Explorer build no longer 429s on /assets/[slug]/[network]
prerender. Next.js opts out of its built-in fetch dedup when
signal is set, so each prerendered slug+network page was
separately re-fetching /v1/assets/{slug} and the build was
firing hundreds of requests in parallel — far above r1's
anonymous-tier rate limit (60 req/min). Result: every
[slug]/[network] route prerendered as a "Not found" page on
prod. Fix consolidates the catalogue source: a single
/v1/assets/verified call (with 429-aware retry) populates a
module-level Map from which both generateStaticParams and
per-page fetchGlobalAsset read. Concurrent r1 config bump
(anon_rate_limit_per_min = 600, key_rate_limit_per_min = 6000)
gives real consumers headroom too: the prior 60/min
was unworkable for any client doing a static build or
dashboard refresh.
