Release Rates Engine v0.5.0-rc.49 · StellarIndex/stellar-index

[v0.5.0-rc.49] — 2026-05-12

audit-2026-05-12 remediation pass: 27 audit findings closed in
code/docs/config, 14 more verified as already resolved, plus the
pre-flip blocker F-1201 explorer migration that unblocks the
rc.48 deploy to R1.

Fixed

Explorer migrated off /v1/currencies (F-1201 — pre-flip
blocker). rc.48 removed the /v1/coins + /v1/currencies
HTTP surface. The explorer had eight files still making live
calls against /v1/currencies — every one would 404 the
moment rc.48 deploys to R1. Migrated:
- HomeCurrencies.tsx → /v1/price/batch?asset_ids=fiat:EUR,…&quote=fiat:USD
  (single RT, names hardcoded for the 6-tile home strip).
- sitemap.ts → /v1/assets/verified filtered to class=fiat.
- HomeTryAPI.tsx → updated example paths to /v1/assets/verified + /v1/assets/euro.
- embed/currency/[ticker]/page.tsx → /v1/assets/{ticker}
  (GlobalAssetView). Sparkline + 24h/7d change degrade
  gracefully to a price-only widget; chart hookup is a follow-up.
- AssetConverter.tsx → /v1/price/batch for the FX rate table
  (inverts so the converter's rate_usd = 1 USD = N target
  contract stays unchanged).
- convert/[from]/[to]/ConvertPair.tsx → /v1/price/batch
  for the live from→to rate (one pair vs the old cross_rates
  bulk).
- convert/[from]/[to]/page.tsx → /v1/assets/{from} for
  identity + /v1/price/batch for the singleton cross-rate.
- SearchModal.tsx → /v1/assets/verified filtered to fiat
  for the ticker→/currencies/X affordance.
Zero remaining live calls to the removed routes. Typecheck +
lint + build all green.

Changed

Multi-region tooling now handles single-region operation
gracefully (F-1234). Until R2/R3 bring-up, only R1 is deployed.
scripts/dev/verify-cross-region.sh, ratesengine-ops cross-region-check, and ratesengine-ops cross-region-monitor
all used to fail with "need at least 2 regions to compare"
even when called against the only deployed region. Operators
who triggered them got a confusing failure and learned to
ignore the family. Now: each command logs a one-line
"single-region — pre-launch posture, see r2-r3-bringup.md"
notice and exits 0. The check fires for real once a second
region URL is supplied. Default R1 URL in
verify-cross-region.sh points at the live public hostname
(api.ratesengine.net); R2/R3 default to empty.
R1 prometheus.r1.yml scrape coverage + rule_files path
(F-1219 + F-1220). Added scrape jobs for redis_exporter
(port 9121, installed by the redis-sentinel ansible role),
alertmanager self-scrape (so we can alert on alertmanager
being down), postgres_exporter / pgbackrest_exporter
placeholder slots (operator deploys the exporters, scrape
picks them up), and minio cluster metrics with bearer-token
auth path. Each job has a one-line comment naming the alert
family it feeds. rule_files path changed from the empty-
opt-in glob /etc/prometheus/rules.d/*.yml to the canonical
/etc/prometheus/rules.r1/*.yml matching the deployed-asset
path so operators no longer have to symlink the full
configs/prometheus/rules.r1/ set into a parallel directory.
Prometheus multi-host ↔ R1 overlay drift caught at CI (F-1222).
Multi-host rules in deploy/monitoring/rules/ use underscored
job labels (ratesengine_api) matching the multi-host ansible
scrape config; R1's single-host overlay at
configs/prometheus/rules.r1/ uses hyphenated labels matching
the R1 systemd units. Editing only one half silently breaks the
other deployment shape. New header notes pin the convention in
the canonical files; scripts/ci/lint-docs.sh now flags any
multi-host rule file without an R1-overlay sibling so the gap
surfaces at CI time. Created R1 overlays for cache.yml,
stellar.yml, storage.yml to satisfy the new pairing
check — the underlying rules already match upstream metric
names so no expression changes were needed.
Tailored error for supply-observer backfill attempts (F-1243).
ratesengine-ops backfill accounts (or any of the six supply
observers: accounts, trustlines, claimable_balances,
sac_balances, sep41_supply, liquidity_pools) used to fail
with the generic "WASM-hash audit pending" error — misleading,
since supply observers aren't Soroban price/oracle sources at
all. They plug into different dispatcher hooks
(LedgerEntryChange / OpDecoder / SEP-41) and have no
historical replay path through this command. New
checkBackfillSources helper distinguishes the two cases and
emits a supply-observer-specific message pointing operators
at the supply-snapshot timer (or a future supply-backfill
command for SEP-41 windows). 3 new unit tests cover the
closed name set, the tailored error path, and the unchanged
WASM-audit gate.

Added

Customer-webhook delivery worker wired into the API binary
(F-1270 follow-up). The worker that drains the
webhook_deliveries queue (HMAC sign + POST + retry) now runs
as a goroutine in cmd/ratesengine-api/main.go whenever the
dashboard surface comes up (Postgres reachable). Previously,
operators had to launch the worker as a separate process per
the docblock "operator-launched via internal/customerwebhook.New".
The single-binary deploy now runs it inline with the same
context, lifecycle, and logger, so there is one less ansible task.
Customer-webhook delivery alerts + runbook (F-1270 follow-up).
Two new Prometheus alerts wired into both the multi-host and R1
rules: _delivery_failing (P3) fires when 5xx + network-error
attempts exceed 0.1/s for 15+ min (single-customer outage);
_delivery_exhausted (informational) fires when a delivery hits
the 15-attempt retry budget. New customer-webhook-delivery-failing.md
runbook covers the SQL to identify the failing webhook, the
customer-outreach template, and the worker-vs-customer triage
tree. Catalogued in alerts-catalog.md so operators see them
alongside the rest of the API alerts.

Tested

internal/usage package gains unit-test coverage. The
per-subject daily usage counter (Redis-backed) was the one
remaining no-test package in internal/. 8 tests cover the
Increment/Read round-trip, day-boundary handling, empty-subject
no-op, retention clamp, key-prefix isolation, URL-encoded
subjects (so : inside IPv6 addresses doesn't collide on the
date separator), and the 35-day retention TTL applied on every
key.

Documented

ADR-0012 placeholder (F-1262). Filled the numeric gap in
docs/adr/ — 0011 jumped to 0013 with no file at 0012, even
though docs/adr/README.md had listed the slot as Planned
(reserved for Quorum-set composition per ADR-0004 Phase 3)
since the initial audit. The placeholder documents what the
future ADR must cover (third-party validator selection,
HALT-LIVE-DROP scoring, cross-region quorum overlap, stellar-
core [QUORUM_SET] thresholds) and what invariants it must
preserve (Tier-1 independence, no self-included validators,
≤ 33% effective weight per validator). README index now links
to the file.
Dashboard surface bypasses the v1 envelope on purpose (F-1235).
/v1/dashboard/keys* handlers write bare JSON rather than the
data / as_of / flags envelope used by market-data
endpoints. Documented the rationale in
docs/reference/api-design.md §4.1: different audience
(dashboard React app, not SDK), session-scoped data (no
market-quality flags to carry), distinct auth path (session
cookie vs API key). Writing this down keeps future contributors
from "fixing" the perceived drift. RFC 9457 problem responses,
Cache-Control: no-store, and X-Request-Id correlation are
preserved.

Fixed

Guard test for flags.stale=true on every fallback path (F-1254).
The stale-flag fix in internal/api/v1/price.go:287-299 (the
May-10 SEV-2 lesson — "fallback chain is itself the staleness
signal") had no regression test. New
TestPrice_FallbackChainSetsStaleFlag covers both the
triangulated and direct-rewrite fallback paths and asserts
flags.stale=true on each — so a future change that re-clears
the flag is caught immediately.
API reference doc drift after rc.48 (F-1246). Regenerated
docs/reference/api/rates-engine.v1.yaml to match the current
OpenAPI source — three residual /v1/coins references in the
generated file (an ?issuer= description, the home-page
summary, and an error-envelope instance example) lingered
after rc.48 removed the route. Pure regen; OpenAPI source was
already clean.
Postman collection drift (F-1247). make docs-postman now
writes to the customer-facing canonical at
examples/postman/rates-engine.postman_collection.json —
previously it wrote to a gitignored docs/reference/api/...
path, so the tracked customer copy drifted silently every time
the OpenAPI spec moved. The docs-site build pipeline runs its
own regeneration; the in-repo file is for customers who clone
the repo to import the collection. README + Makefile docstring
updated; canonical refreshed (656k bytes).
classic_assets.first_seen_* ordering bug under chunked
backfill (F-1239). The ON CONFLICT clause in
registerClassicAssetSeen previously updated only last_seen_*
(GREATEST) and observation_count — leaving first_seen_*
pinned to whichever ledger first hit the row, regardless of
actual chronology. Out-of-order parallel backfill (chunked
ranges processed in parallel) could leave first_seen_ledger
higher than the asset's true first observation. Fix: also
update first_seen_* with LEAST(existing, incoming). Idempotent
for forward ingest (incoming is always ≥ existing → no-op).
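Why LEAST/GREATEST fixes the chunked case can be shown with a small in-memory model of the upsert. seenRow and mergeSeen are hypothetical names. The sketch tracks ledger numbers only and assumes observation_count grows by one per observation.

```go
package main

// seenRow models the first_seen / last_seen / count columns of
// classic_assets (ledger numbers only; the real table also has timestamps).
type seenRow struct {
	FirstLedger, LastLedger uint32
	Count                   int64
}

// mergeSeen mirrors the fixed ON CONFLICT update:
// first = LEAST(existing, incoming), last = GREATEST(existing, incoming),
// count accumulates. min and max are commutative and associative, so the
// final row does not depend on the order parallel chunks arrive in.
func mergeSeen(existing *seenRow, ledger uint32) *seenRow {
	if existing == nil { // plain INSERT path: first observation of the asset
		return &seenRow{FirstLedger: ledger, LastLedger: ledger, Count: 1}
	}
	out := *existing
	if ledger < out.FirstLedger {
		out.FirstLedger = ledger
	}
	if ledger > out.LastLedger {
		out.LastLedger = ledger
	}
	out.Count++
	return &out
}
```

With the pre-fix behaviour, the first call would pin FirstLedger forever, so processing ledger 500 before ledger 100 would report 500.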

Removed

Unused GIN indexes on blend_auctions.bid / .lot (F-1238).
Migration 0029 drops the two JSONB GIN indexes from migration
0009. No reader in internal/storage/timescale/ queries those
columns by content — LatestBlendAuctionEvent and
ListBlendPools both filter only by pool / auction_type /
user_address / ts. The indexes added write amplification to
every blend-auction INSERT for a read path that never
materialised. Down migration restores them.

Changed

ratesengine_ingestion_source_stopped alert window widened
to 30m × 15m (F-1212b). The pre-existing 5-min window
produced routine false positives on low-volume Soroban / FX
sources (blend auctions, ECB FX dailies, Band oracle pushes,
Comet pool swaps, Phoenix off-peak windows). On R1 this
manifested as 5 simultaneous ticket-tier alerts at any given
time; operators learned to ignore the family. The new window
waits past the natural quiet-window cadence for these sources;
total-outage coverage stays tight via the separate 3-min
ratesengine_ingestion_all_sources_stopped (P1). Rule updated
in both deploy/monitoring/rules/ingestion.yml and
configs/prometheus/rules.r1/ingestion.yml. Runbook + alerts
catalog updated to reflect the new threshold and rationale.

Documented

RFP F4.2 one-year retention catch-up procedure (F-1265).
docs/operations/backfill-procedure.md gains a new section
walking through the 1-year catch-up backfill needed to meet
Freighter RFP F4.2's ≥1y retention commitment. Covers
resolving the target ledger window from the Galexie archive
manifest, sanity-checking upstream archive completeness, row-
count estimation, the chunked-by-week run loop with -resume
so a mid-chunk crash doesn't re-do 12 hours of work, the CAGG
force-refresh sequence, and a /v1/chart?timeframe=1y
verification step. Pre-flip operator step; the code path is
unchanged.

Added

R1 TOML supply.watched_* defaults (F-1266). The
archival-node ansible role's ratesengine.toml.j2 template
gains a [supply] block with sensible launch-day defaults:
watched_classic_assets populated with the top Stellar-classic
verified currencies (USDC / EURC / AQUA / yXLM / VELO / BLND /
PHO / KALE) mirroring internal/currency/data/seed.yaml. Plus
inventory-overridable knobs for watched_sep41_contracts,
sdf_reserve_accounts, and reserve_balances_stroops.
Pre-F-1266: R1's TOML had no supply block → every F2 field
(market_cap_usd, fdv_usd, circulating_supply,
total_supply, max_supply) returned NULL even though the
code path is correct. The next archival-node role run will
flip every one of those fields from NULL to a real value for
the 8 watched currencies.
Opt-in Redis ACL lockdown template (F-1213). Closes the
pre-flip Redis-ACL gap on R1 by codifying a narrow ACL config
in the redis-sentinel ansible role. New redis_acl_lockdown
flag (default false for backward compat) renders
templates/users.acl.j2 to /etc/redis/users.acl, references
it from redis.conf.j2 via aclfile, and:
- Disables the legacy default user (off nopass nocommands)
  so no password-less access path remains.
- Creates a ratesengine application user with
  +@read +@write +@scripting +@pubsub +@connection minus the
  @admin + @dangerous families, scoped via ~prefix:* to
  exactly the cache key prefixes the application uses (vwap,
  confidence, freeze, div, ratelimit, signup-ip, toml, meta,
  price, apikey, health, oracle, subscriber) plus the pub/sub
  channels (&closed-bucket-*, &stream-*).
Application binaries get a new [storage].redis_username TOML
key (default empty = legacy path; operators set it to
ratesengine when they flip the lockdown). redisclient.Build
threads it into both the FailoverClient and single-node code
paths. Commented-out per-component (re_aggregator /
re_api / re_indexer) users in the template show the
follow-on split when operators add per-binary passwords.
L2.2 Phase 2 FX-anchor USD volume coverage (F-1268). New
timescale.VWAPUSDFXResolver implements the pre-existing
USDVolumeFXResolver interface against the prices_1m
CAGG: for any on-chain quote asset not already on the
operator's [trades].usd_pegged_classic_assets list, the
resolver looks up <quote>/<USD-peg> at the trade's
timestamp and supplies a per-minute-bucket-cached USD rate
that tradeUSDVolume multiplies through. Pre-Phase-2: only
CEX/FX + operator-allow-listed pegs contributed to
volume_24h_usd; an EURC/XLM Soroswap trade contributed 0
even though we had a fresh EURC/USDC VWAP one minute earlier.
Now it inherits USD value through the peg chain. Wired
alongside the Phase 1 quote spec in
cmd/ratesengine-indexer/main.go whenever
usd_pegged_classic_assets is non-empty (no new config
knob). 7 unit tests cover defaults, cache hits, negative
cache, TTL expiry, minute-bucket key stability. The
AssetDetail volume_24h_usd docstring rewritten to
document the three-tier coverage chain (Phase 1 off-chain
+ Phase 1 on-chain pegs + Phase 2 FX-anchor).
Customer-facing dashboard webhook CRUD handlers (F-1270
complete). New internal/api/v1/dashboardwebhooks package
mounts five routes: GET/POST/PATCH/DELETE
/v1/dashboard/webhooks + GET
/v1/dashboard/webhooks/{id}/deliveries. Session-gated,
role-gated (Owner/Admin/Member create; Viewer/Billing 403),
cross-account 404 (no existence-leak), 10-per-account quota,
HTTPS-only URLs, closed-set event validation, secret returned
ONCE on create. OpenAPI spec adds 5 paths + 5 schemas;
postman + api docs regenerated. 8 unit tests cover happy
path, 401, 403, malformed URL, unknown event, quota, list
scoping, cross-account delete. Wired into the v1 server via
the same DashboardAuthMounter pattern as keys, and into
main.go's buildDashboardBundle so the handlers come up
whenever Postgres is reachable.
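The create-time checks can be sketched as a single validation function. The event names and the 10-per-account quota come from these entries. The function and error names are hypothetical, and the real handler presumably maps such errors to RFC 9457 problem responses.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// allowedEvents is the closed event set named in the WebhookStore entry.
var allowedEvents = map[string]bool{
	"incident.sev1":     true,
	"incident.resolved": true,
	"anomaly.freeze":    true,
	"divergence.firing": true,
}

const maxWebhooksPerAccount = 10

var (
	errQuota  = errors.New("webhook quota reached")
	errBadURL = errors.New("webhook URL must be an absolute https URL")
)

// validateCreateSketch (hypothetical) applies quota, HTTPS-only URL, and
// closed-set event checks, in that order.
func validateCreateSketch(existing int, rawURL string, events []string) error {
	if existing >= maxWebhooksPerAccount {
		return errQuota
	}
	u, err := url.Parse(rawURL)
	if err != nil || u.Scheme != "https" || u.Host == "" {
		return errBadURL
	}
	for _, ev := range events {
		if !allowedEvents[ev] {
			return fmt.Errorf("unknown event %q", ev)
		}
	}
	return nil
}
```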
Customer-webhook delivery worker (F-1270 close-out).
New internal/customerwebhook package drains the queue the
store wrote in the prior commit: poll-loop drains
ListPendingDeliveries, HMAC-SHA-256 signs the payload,
POSTs to the customer URL with X-RatesEngine-Signature +
X-RatesEngine-Event + X-RatesEngine-Delivery-Id headers,
marks delivered on 2xx, schedules retry on 5xx/network
(exponential backoff 30s → 1h cap, 15-attempt budget),
terminates on 4xx / disabled-webhook / missing-webhook /
malformed-URL. New
ratesengine_customer_webhook_delivery_attempts_total
counter labelled by 10 outcomes; documented in metrics ref
with two alert recipes. 5 unit tests cover the happy path,
5xx-retry, 4xx-terminal, disabled-webhook, missing-webhook.
postgresstore.WebhookStore customer-webhook data plane
(F-1270 partial). Implements the existing
platform.WebhookStore interface against the
customer_webhooks + webhook_deliveries tables from
migration 0027: Create / Get / List / Update / Delete on the
registry; Enqueue / ListPending / MarkDelivered /
MarkAttemptFailed on the delivery queue; Append / Update /
ListDeliveries on the dashboard delivery log. Four new
WebhookEventType constants (incident.sev1,
incident.resolved, anomaly.freeze, divergence.firing)
pin the closed event set without forcing the schema to use an
enum. New integration subtest
WebhookStore/CRUD+queue covers the full lifecycle:
create → list → update → enqueue → fail-with-retry →
enqueue → mark-delivered → list-history → delete-cascades.
RotateWebhookSecret is a tagged stub pending the dashboard
CRUD handlers. Delivery worker + customer-facing API are
follow-up commits.
Inline price_usd on /v1/assets/{id} (F-1271). The
asset-detail body now carries price_usd whenever the price
lookup succeeds — previously it only surfaced via the optional
coins-overlay block (assets not in the coins catalogue had a
null price_usd even though the same handler was already
fetching the price for market_cap_usd). Freighter wallet +
retail apps that just want the current price no longer pay
a second /v1/price round-trip on every asset-detail render.
Extracted populatePriceUSD runs before the supply early-return
so off-chain assets without a supply snapshot also get
the field; populateMarketCap now re-uses the already-inlined
price instead of paying for a second lookup. OpenAPI spec
updated; postman + api docs regenerated. 1 new unit test
covers the no-supply path.
postgresstore.BillingStore subscription mirror (F-1231).
UpsertSubscription and GetActiveSubscriptionForAccount,
previously stubbed, now hit the subscriptions table from
migration 0027. UPSERT is idempotent on stripe_subscription_id
so a re-delivered webhook updates plan + period without
duplicating rows. GetActiveSubscriptionForAccount enforces
both the period-end and canceled-at semantics from
platform.Subscription.IsActive. The Stripe webhook handler
wire-up (which would need to resolve stripe_customer_id →
account_id + extract subscription IDs from the event
payload) is the next layer; this commit lands the store half
so the data path is end-to-end-ready. New integration
subtest BillingStore/Subscription/UpsertAndGetActive covers
insert / idempotent update / expired / validation paths.
Stripe webhook tier-upgrade audit log (F-1240). New
internal/platform/postgresstore.AuditStore implements the
platform.AuditStore interface against the audit_log table
from migration 0027 (Append / AppendBatch / List). The Stripe
webhook handler now writes one plan.upgrade audit row per
successful upgrade event (one row per event, not per key —
metadata carries identifier + tier + key counts so the
dashboard can render "the upgrade happened" without N rows
for a customer holding N keys). StripeWebhookConfig.Audit
is a narrow StripeAuditSink interface so the v1 package
doesn't import the full audit-store surface. Append failures
log at WARN and never block the webhook ack — audit-log
unavailability must not turn a successful Stripe upgrade
into a Stripe retry storm. 3 new unit tests cover the happy
path, the nil-sink legacy fallback, and the swallowed-error
contract.
Depeg-scenario test wiring stablecoin late binding ↔ divergence
worker (F-1230). ADR-0026's stablecoin late binding deliberately
conceals stablecoin↔fiat drift so XLM/USDC trades flow into the
same XLM/USD bucket as XLM/USDT — the divergence worker is the
designed safety net that fires flags.divergence_warning when
the concealed price drifts from external references. The two
components had no test wiring them together; nothing would catch
a regression that broke either side. New
divergence/depeg_test.go exercises the round-trip:
- TestStablecoinDepeg_DivergenceWorkerFires — aggregate.ProxyTrade
  rewrites XLM/USDC → XLM/fiat:USD, the aggregator publishes
  a price assuming USDC=$1, references show the true XLM/USD
  after USDC depegged to $0.95, and the worker fires
  WarningFired=true on the resulting ~5.3% delta.
- TestStablecoinPegHolds_DivergenceWorkerStaysQuiet — symmetric
  negative case so a future change can't make the warning fire
  on the steady state.
Guard tests for two CLAUDE.md surprises (F-1242).
Locks behaviours that no production test previously asserted:
- comet.TestDecodeSwap_DispatchIsByTopicNotContract proves
  that two events with different ContractIDs but the same
  (POOL, swap) topic both decode to Source="comet" — i.e.,
  the Comet decoder is generic Balancer-v1, not contract-
  specific. A future change that narrows the decoder to a
  specific allow-list would silently drop trades from any new
  Balancer-v1 deployment; this test fires first.
- sep41_supply.TestDecoder_CAP67_FourTopic_BackCompat exercises
  mint / burn / clawback events with both pre-P23 (3/2/3 topics)
  and post-P23/CAP-67 (4/3/4 topics with sep0011_asset)
  arities. The decoder reads counterparty positionally and
  must ignore the optional 4th topic; a future contributor who
  naively asserts topic length would break the post-P23 path.
  The third surprise (SEP-41 transfer dual i128/Map shape) has
  no production transfer-amount decoder yet, so the dual-shape
  guard already lives in sac_balances.TestObserver_Decode{I128, MapVal}; documented as such in the audit register.
Per-request CORS observability metric (F-1244). New
ratesengine_api_cors_decisions_total{outcome} counter wired
into the CORS middleware. Outcomes: no_origin /
allowed_origin / allowed_wildcard / denied. The
pre-existing warnOpenCORS startup-only check fires once at
boot then drifts out of memory; this counter is the per-request
companion so operators can dashboard real cross-origin traffic
and alert when a wildcard policy starts handling actual cross-
origin requests in production (the silent failure mode of
RATESENGINE_ALLOWED_ORIGINS=* slipping in alongside
credentialed auth_mode). Wired into the existing middleware
without changing public CORS behaviour; one new test case covers
all four outcomes.
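Here is a sketch of the per-request classification, assuming the allow-list is a set of exact origins plus a wildcard flag. corsOutcome is a hypothetical name. The four return values are the outcome labels from the entry.

```go
package main

// corsOutcome picks the ratesengine_api_cors_decisions_total label for one
// request. An exact allow-list match is reported before the wildcard, so
// allowed_wildcard only counts origins that got through because of "*".
func corsOutcome(origin string, allowed map[string]bool, wildcard bool) string {
	switch {
	case origin == "":
		return "no_origin" // same-origin or non-browser caller
	case allowed[origin]:
		return "allowed_origin"
	case wildcard:
		return "allowed_wildcard"
	default:
		return "denied"
	}
}
```

Under this ordering, a non-zero allowed_wildcard rate is the direct signal for the "RATESENGINE_ALLOWED_ORIGINS=* in production" failure mode.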
Freeze EventSink LKG VWAP + recovery worker (F-1228 + F-1229).
freeze.EventSink.RecordFreeze and freeze.Writer.Mark now
carry the last-known-good VWAP we're freezing on as a
fixed-precision decimal string (orchestrator passes
formatRatFixed(prev, 12)); the timescale sink stamps it on
the new freeze_events row instead of the previous hardcoded
frozen_value = 0. The recovery worker is the inverse half:
every 60s it lists open freeze_events rows, checks whether
the Redis marker still exists, and calls MarkRecovered when
the marker is gone (TTL elapsed → underlying anomaly cleared).
Without it, durable rows accumulated forever and the explorer
/anomalies timeline showed resolved freezes as still-firing.
New metrics ratesengine_anomaly_freeze_recovered_total and
ratesengine_anomaly_freeze_recovery_sweeps_total{outcome},
new alert ratesengine_anomaly_freeze_recovery_stalled (P3),
new runbook freeze-recovery-stalled.md. Two phase-1 + phase-2
orchestrator callers updated to thread the prevVWAP through.
3 new unit tests + extended existing freeze + orchestrator
tests.
Per-IP signup throttle (F-1232). New v1.SignupIPThrottle
interface + auth.RedisSignupIPThrottle Redis-backed
implementation. Default: 5 signups per IP per hour, counted
with INCR + EXPIRE. The global anonymous rate
limit (60/min/IP) is plenty for browsing public surfaces but
lets a single IP bulk-mint 3,600 email→key_id pairs/hour via
signup. The new throttle closes that vector without affecting
other anonymous traffic; falls open on Redis errors. Wired in
cmd/ratesengine-api/main.go whenever Redis is available.
New auth.ErrSignupRateLimited sentinel + exported
middleware.RemoteIP for handlers needing trusted-proxy-aware
client IP outside the middleware chain. 5 unit tests
(under-cap, over-cap, distinct IPs, empty IP falls open,
defaults applied).
Stripe webhook event dedupe (F-1227). New
internal/platform/postgresstore.BillingStore implements the
AppendStripeEvent / MarkStripeEventProcessed /
MarkStripeEventFailed triple from
internal/platform/billing.go against the stripe_event_log
table from migration 0027. The webhook handler now claims a
dedupe slot with INSERT INTO stripe_event_log BEFORE running
any side effects; ErrAlreadyProcessed (Postgres
23505 unique_violation) signals "we've already done this work"
and acks 200 immediately without re-running the upgrade. Stripe
at-least-once delivery means the same event can land hours
later — without this guard, a manual operator-side downgrade
between original delivery and redelivery silently re-upgrades
the customer. Wired in cmd/ratesengine-api/main.go to the same
*sql.DB the timescale store uses; falls open to the legacy
"rely on idempotent UpdateRateLimit" path when Postgres is
absent. Two new unit tests pin the contract
(duplicate-doesn't-reupgrade + nil-events-store-falls-back).
UpsertSubscription + GetActiveSubscriptionForAccount stubbed
pending Phase-2 / F-1231.
SEP-10 challenge-replay defence (F-1224). Added a
sep10.ReplayGuard interface + sep10.RedisReplayGuard Redis-
backed implementation. After a challenge XDR clears
txnbuild.VerifyChallengeTxSigners, the validator hashes the
signed XDR with SHA-256 and SETNX's the dedupe key
(sep10:seen:<base64-url-no-pad>) with TTL = ChallengeTTL.
A second submission of the same signed XDR finds the slot
taken and returns auth.ErrUnauthorized instead of minting a
fresh JWT. Wired in cmd/ratesengine-api/main.go to the
same Redis client the rest of the auth subsystem uses;
initial validator construction at main.go:144 happens before
rdb is available, so the validator is rebuilt with the guard
once rdb exists. miniredis-backed unit tests pin four
contracts: first claim ok, replay rejected, TTL expiry allows
a fresh claim, and distinct hashes don't collide.
ratesengine_aggregator_vwap_cache_write_errors_total metric +
paired ratesengine_aggregator_cache_write_errors page-tier
alert. The May-10 SEV-2 (Redis BGSAVE blocked by full root FS for
~9 h → every cache Set returned MISCONF → /v1/price 404'd on
every rewritten / triangulated / stablecoin-proxy pair) had no
upstream signal in monitoring: flags.stale did not flip
because the aggregator process was alive and ticking, just unable
to publish. The post-mortem (internal/incidents/data/2026-05-10-redis-writes-blocked-disk-full.md)
explicitly recommended "alert on aggregator WARN rate (not just
service-up status)"; this counter realises that recommendation
as the cleanest signal: any non-zero rate(...[5m]) for ≥ 2 min
pages. Increments at the single cache-write failure point in
internal/aggregate/orchestrator/orchestrator.go:653. Closes
audit-2026-05-12 F-1253; supports F-1254 (flags.stale semantic
bug, separate fix).

Fixed

Postgres max_locks_per_transaction = 256 codified (F-1251).
The 2026-05-06 SEV-3 (internal/incidents/data/2026-05-06-postgres-lock-table-full.md)
hit out of shared memory (53200) when the per-tx lock table
saturated under concurrent ingest from many sources. The
operator bumped this knob to 256 by hand on R1; un-codified, a
from-scratch R1 rebuild or R2/R3 cutover would inherit the
Postgres default of 64 and re-experience the same incident
class. Now templated by archival-node/templates/postgresql.conf.j2
with default postgres_max_locks_per_transaction: 256 (4×
headroom; 51,200-entry lock table at the current 200-connection
limit). Paired with new ratesengine_timescale_lock_table_pressure
Prometheus alert at 70% saturation so the next bump is
forecast not forced — depends on postgres_exporter (not yet
scraped on R1; rule lights up when the exporter lands).
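The 51,200 figure follows PostgreSQL's sizing rule for the shared lock table, max_locks_per_transaction × (max_connections + max_prepared_transactions), assuming max_prepared_transactions is left at its default of 0. The helper names are illustrative.

```go
package main

// lockTableSlots applies PostgreSQL's shared lock table sizing rule:
// max_locks_per_transaction * (max_connections + max_prepared_transactions).
func lockTableSlots(maxLocks, maxConns, maxPrepared int) int {
	return maxLocks * (maxConns + maxPrepared)
}

// alertThreshold is the slot count at which a pct% saturation alert fires.
func alertThreshold(slots, pct int) int {
	return slots * pct / 100
}
```

At the old default of 64, the same 200-connection limit gives only 12,800 slots. If the new 70% rule measures against the whole table (an assumption about the expression), it fires at 35,840 held locks.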
web/status/wrangler.toml added (F-1245). Mirrors the
explorer + dashboard wrangler.toml shape so Cloudflare Pages
git-integration deploy works without manual project setup.
web/explorer/src/app/oracles/OraclesView.tsx ESLint
react-hooks/exhaustive-deps warning fixed (F-1258).
streamRows was a fresh [] on every render when
streams.data was undefined, causing the downstream useMemo
to recompute every tick. Wrapped in its own useMemo for
referential stability.
internal/sources/comet/adapter_test.go: pin
topic-vs-contract-id contract (F-1242). New
TestDecoder_Decode_NoContractIDDiscrimination makes the
CLAUDE.md surprise-list claim ("Comet decoder matches by
topic, not contract address") executable. Any future change
adding a contract-id allow-list at the decoder layer (instead
of downstream filtering) MUST flip the assertion.
F-1228 + F-1229 acknowledged but deferred to a separate
refactor. freeze_events.frozen_value always written as 0 +
MarkRecovered has zero callers. The structural fix
(extend freeze.EventSink.RecordFreeze to accept the LKG
VWAP, plus wire a recovery worker that calls MarkRecovered
on Redis-marker TTL expiry) touches the EventSink interface
used by 3 packages + tests. Both medium-severity, neither
blocks the public flip.

Investigated, no code change

F-1262 (ADR-0012 missing from disk) turns out to be an
intentional reservation, documented in docs/adr/README.md:56:
"0012 | Planned | Quorum-set composition (referenced by
multi-region-topology) | —". Per the ADR README's
"gaps allowed when reserved" rule. F-1262 closed as invalid.
flags.stale semantic bug fixed (F-1254).
internal/api/v1/price.go reset stale = false after falling
through to priceFallback (last-trade / stablecoin proxy /
triangulation). The May-10 SEV-2 (Redis BGSAVE blocked → cache
empty → every closed-bucket read hit ErrPriceNotFound →
priceFallback served last-trade for ~9 h) hit this path: the
customer-visible response was the fallback, but flags.stale
was false. Per ADR-0018 §"flags.stale semantic" and the doc
comment on Flags.Stale, fallback responses ARE stale by
definition. Set stale = ok on the fallback branch in both
the single-asset and /v1/price/batch paths so any non-VWAP
response now correctly carries stale=true. Companion fix
to F-1253's cache-write-error counter (the upstream signal)
and F-1252 (the alert-routing the May-10 incident exposed).
/v1/price/batch sources nondeterminism (F-1259).
internal/api/v1/price.go:902-905 lookupPriceBatch unioned
per-row sources through a map[string]struct{} and emitted
them in map-iteration order, breaking the ADR-0015
byte-identical cross-region property for batch responses.
Added sort.Strings(srcs) before writeJSON so batch
responses match the single-asset path's stable lexical order
(set by timescale.normalizeVwapSources at the storage
boundary per F-0016 closure).
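The fix in miniature: union through a set, then sort before serialising. unionSources and the source names in the test are illustrative.

```go
package main

import "sort"

// unionSources merges per-row source lists through a set, then sorts the
// result so the output no longer depends on Go's randomised map iteration
// order (required for byte-identical responses across regions).
func unionSources(rows [][]string) []string {
	set := make(map[string]struct{})
	for _, row := range rows {
		for _, s := range row {
			set[s] = struct{}{}
		}
	}
	srcs := make([]string, 0, len(set))
	for s := range set {
		srcs = append(srcs, s)
	}
	sort.Strings(srcs) // stable lexical order, matching the single-asset path
	return srcs
}
```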
Cache-Control gap on credential surfaces (F-1225).
/v1/auth/login, /v1/auth/callback, /v1/auth/logout,
/v1/dashboard/keys*, /v1/signup, /v1/webhooks/stripe,
/v1/price/stream, /v1/methodology, and /v1/incidents.atom
all fell through policyForPath's switch with no case match,
emitting no Cache-Control header. Most concerning was
/v1/auth/callback: a CDN in front of the API could have
cached the magic-link consume response and re-issued the
session cookie to subsequent requests. Added explicit cases:
every /v1/auth/* and /v1/dashboard/* and the two
state-changing surfaces (/v1/signup, /v1/webhooks/stripe)
use private, no-store; /v1/price/stream uses no-store;
/v1/methodology and /v1/incidents.atom get explicit
public-cache policies appropriate to their content cadence.
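The new switch cases can be sketched as follows, assuming prefix matching for the /v1/auth/ and /v1/dashboard/ families. The max-age values for the two public routes are placeholders because the entry does not state them. policyForPathSketch is a hypothetical name.

```go
package main

import "strings"

// policyForPathSketch covers only the cases added by F-1225; every other
// route falls through to the existing policy table (not modelled here).
func policyForPathSketch(path string) string {
	switch {
	case strings.HasPrefix(path, "/v1/auth/"),
		strings.HasPrefix(path, "/v1/dashboard/"),
		path == "/v1/signup",
		path == "/v1/webhooks/stripe":
		return "private, no-store" // credentials and state changes: never cache
	case path == "/v1/price/stream":
		return "no-store"
	case path == "/v1/methodology":
		return "public, max-age=3600" // placeholder cadence
	case path == "/v1/incidents.atom":
		return "public, max-age=60" // placeholder cadence
	default:
		return ""
	}
}
```

The private, no-store pairing on /v1/auth/callback is what stops a shared CDN from replaying a magic-link response with its session cookie.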
All 4 make test-integration failures fixed (F-1250).
- TestPlatformPostgresStores/APIKey/CRUD+revoke+touch —
  test/integration/platform_postgres_stores_test.go:400,510
  constructed key IDs as "kid_" + uuid.New().String()[:12]
  which contains a hyphen at position 9, violating
  migration-0027 check id ~ '^kid_[a-f0-9]{12,}$'. Switched
  to strings.ReplaceAll(uuid.New().String(), "-", "")[:12]
  (12 hex chars).
- TestEndToEnd_LedgerstreamToTimescale/soroban_LCM_with_reflector_FX_update
  and TestTradesInRangeAndMarkets both used hand-crafted
  G-strkeys (GA7QYNF7…UWDA and GA5ZSEJYB…ZVM) with invalid
  CRCs. The strkey package now enforces CRC; tests switched
  to AQUA's real mainnet G-strkey
  (GBNZILSTVQZ4R7IKQDGHYGY2QXL5QOFJYQMXPKWRRM5PAV7Y4M67AQUA)
  which round-trips cleanly and is distinct from USDC's
  issuer.
- TestSupplyStorageRoundTrip — schema/reader drift:
  migration 0005_create_asset_supply_history.up.sql:60
  creates a UNIQUE index on (asset_key, ledger_sequence, time)
  (TimescaleDB requires the partition column in any unique
  index on a hypertable), but internal/storage/timescale/supply.go:47
  used ON CONFLICT (asset_key, ledger_sequence) DO NOTHING.
  Postgres requires an exact column-set match; the INSERT
  failed with 42P10. Updated the conflict target to all 3
  columns and revised the doc comment to explain how the
  uniqueness invariant is preserved.
- Also in TestTradesInRangeAndMarkets: after the strkey fix,
  DistinctPairs returned 0 markets because the test inserted
  into trades directly, but DistinctPairs reads from the
  prices_1m continuous aggregate (since rc.45, commit
  8717bc20). Added a CALL refresh_continuous_aggregate('prices_1m', NULL, NULL)
  before the assertion, mirroring test/integration/api_test.go:65-74.
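The key-ID bug from the first item above is easy to reproduce. The sketch below uses a fixed UUID string instead of uuid.New() so that the result is deterministic. The function names are hypothetical, and Go's RE2 regexp stands in for the Postgres ~ check (for this simple pattern the two behave the same).

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// keyIDPattern mirrors the migration-0027 check constraint quoted above.
var keyIDPattern = regexp.MustCompile(`^kid_[a-f0-9]{12,}$`)

// oldKeyID reproduces the broken construction: the first 12 characters of an
// 8-4-4-4-12 UUID string include the hyphen at index 8 (position 9).
func oldKeyID(uuidStr string) string { return "kid_" + uuidStr[:12] }

// newKeyID strips hyphens first, so the 12-character slice is pure hex.
func newKeyID(uuidStr string) string {
	return "kid_" + strings.ReplaceAll(uuidStr, "-", "")[:12]
}

func main() {
	// Fixed sample UUID for a deterministic demo; the tests call uuid.New().
	const u = "3f2504e0-4f89-41d3-9a0c-0305e82c3301"
	fmt.Println(oldKeyID(u), keyIDPattern.MatchString(oldKeyID(u))) // kid_3f2504e0-4f8 false
	fmt.Println(newKeyID(u), keyIDPattern.MatchString(newKeyID(u))) // kid_3f2504e04f89 true
}
```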
R1 alert blackout closed: 9 alert families wired up, textfile
evidence chain repaired (F-1219 + F-1220 + F-1221 + F-1252).
Before this change, R1 loaded only 6 of 18 rule families
(aggregator/api/infra/ingestion/meta/slo); every alert in the
anomaly, divergence, external-pollers, supply,
supply-snapshot, supply-refresh, archive-completeness,
verify-archive, and sla-probe families was permanently silent. The
SLA-evidence chain in particular was broken end to end: the probe
binary supports -textfile-output (cmd/ratesengine-sla-probe/textfile.go:190 writeTextfileAtomic), but the R1 wrapper at
configs/healthchecks/sla-probe.sh never set it, the
textfile-collector directory didn't exist, and node_exporter ran
without --collector.textfile. Three changes close the chain:
- configs/ansible/roles/archival-node/tasks/10-observability.yml
  now provisions /var/lib/node_exporter/textfile_collector/
  and adds --collector.textfile + --collector.textfile.directory
  to the node_exporter systemd unit.
- configs/healthchecks/sla-probe.sh now defaults
  SLA_PROBE_TEXTFILE_OUTPUT=/var/lib/node_exporter/textfile_collector/sla_probe.prom
  and passes -textfile-output $value only when the value is
  non-empty (preserving the opt-out for operators who set the env var blank).
- configs/prometheus/rules.r1/ gains 9 rule files copied
  verbatim from deploy/monitoring/rules/ (none of them had
  job-label references requiring single-host adaptation). The README
  table is updated; cache.yml, storage.yml, and stellar.yml
  stay excluded, with a note explaining why (redis_exporter,
  postgres_exporter, and stellar-core-prometheus-exporter are
  not on R1).
Source-stopped alert false positives on low-volume
Soroban contracts (F-1212b). ratesengine_ingestion_source_stopped
used a 5-min rate window, which routinely false-fired on
band, blend, comet, ecb, and phoenix (legitimate 5+-minute
gaps during quiet trading windows; the source-stopped runbook
itself acknowledges this at line 60). Widened to a 30-min rate
window with a 15-min for: duration in both deploy/monitoring/rules/ingestion.yml
and configs/prometheus/rules.r1/ingestion.yml. Total-outage
coverage stays tight through the separate _all_sources_stopped
alert at 3 min, which continues to catch the
upstream-broke-across-the-fleet case.
Multi-host alert rule job labels (F-1222).
deploy/monitoring/rules/api.yml / aggregator.yml /
ingestion.yml / slo.yml / meta.yml referenced job="api"
/ "aggregator" / "indexer" but the multi-host ansible
prometheus role's scrape config uses ratesengine_api /
ratesengine_aggregator / ratesengine_indexer (underscores).
Those rules could never have fired on a multi-host deploy.
Renamed the job labels in the canonical multi-host rules to match
the scrape config, and updated meta.yml's scrape-failing regex to
the actual exporter job names (postgres_exporter, redis_exporter,
node_exporter, minio). R1's configs/prometheus/rules.r1/
copies already used the correct hyphenated R1 names and are
unaffected.
rc.48 dead-route cleanup follow-up. rc.48 removed the
/v1/coins + /v1/currencies HTTP surface but left several
stale references behind: cmd/ratesengine-sla-probe was still
probing /coins (which would 404 after the rc.48 deploy and make the
SLA probe fail availability permanently); examples/curl/04-coins.sh and its
README entry still advertised the removed route; the web/status synthetic
smoke probe still pointed at /v1/coins?limit=1; openapi/rates-engine.v1.yaml
carried 3 stale /v1/coins text references (including the rate-limit
example's instance field); and internal/api/v1/server.go Options
doc comments still said "backs GET /v1/coins" / "backs /v1/currencies"
even though those seams now feed /v1/assets and /v1/chart.
All were migrated to live equivalents:
- cmd/ratesengine-sla-probe/main.go staticEndpoints switches
  /coins → /assets (same fan-out coverage; comment explains
  the rc.47 → rc.48 → rc.49 progression).
- examples/curl/04-coins.sh deleted; replaced with 04-assets.sh
  using ?order=volume_24h_usd:desc.
- web/status/src/app/page.tsx synthetic-probe entry switched
  to /v1/assets?limit=1 with the same Catalogue group.
- openapi/rates-engine.v1.yaml lines 193 / 1602 / 2608
  updated.
- internal/api/v1/server.go Options.Coins / .Currencies /
  .FXHistory doc comments rewritten to describe the actual
  /v1/assets + /v1/chart consumers.
Net: make verify clean; go test ./internal/api/v1/... +
./cmd/ratesengine-sla-probe/... green.
Closes audit findings F-1202, F-1210 (cosmetic doc-text portion),
F-1211, F-1223, F-1245 (smoke surface), F-RFP-0017.

Tooling

docs/reference/api/rates-engine.v1.yaml regenerated
from openapi/rates-engine.v1.yaml via make docs-api. The
checked-in copy had drifted ~990 lines (561 ins / 429 del) since
the last regeneration. web/explorer/src/api/types.ts (the
openapi-typescript output) auto-regenerated as a transitive
consequence (~415 lines lighter; pnpm typecheck clean). Closes
F-1246.
docs/reference/config/README.md regenerated from
internal/config/config.go via make docs-config (+6 lines).
Closes F-1255.