Skip to content

cost_health field regression: missing from /health endpoint response #26

@stackbilt-admin

Description

@stackbilt-admin

Summary

The cost_health field is missing from the live /health?format=json response, despite being documented as present in CHANGELOG entry #373 ("Cost health in /health endpoint — cost_health field in JSON response with per-provider spend, burn rate, depletion projection, and active downgrades").

Verified 2026-04-11

Live: GET https://aegis.stackbilt.dev/health?format=json

Returned keys:

["docs_sync_status", "kernel", "mode", "service", "status", "tasks_24h", "timestamp", "version"]

Expected per CHANGELOG: a cost_health key with per-provider { spend_usd, monthly_budget, threshold_tier, projected_depletion_days, burn_rate_per_hour }.

Version running: 1.96.2. The field is not present with or without ?verbose=1.

Data exists — just not exposed

cost_snapshots and cost_budgets D1 tables are being populated correctly by runCostMonitor. Manual query confirms current state:

provider current_spend monthly_budget threshold_tier
anthropic $2.23 $20 ok
cerebras $0.63 $10 ok
groq $0.03 $5 ok
workers_ai $0.18 $0 (free) ok

So the data pipeline is healthy — the route handler is the gap.

Impact

  • Agenda goal #217 ("Review AEGIS LLM provider cost health for Cerebras and Anthropic") is a scheduled check that was supposed to hit /health?format=json and parse cost_health. It cannot execute its specified logic because the field isn't there. Today it was resolved manually by querying D1 directly — that's not a scalable path for scheduled checks.
  • External monitors (if any) that were wired to cost_health silently report nothing.
  • CHANGELOG vs reality drift — we document a feature that no longer exists. Erodes trust in the CHANGELOG as source of truth.

Hypothesis

Likely candidates for where the regression happened:

  1. Kernel shadow collapse phase — the /health route handler got rewritten from daemon into aegis-core and the cost_health block was not ported
  2. Someone removed it intentionally and CHANGELOG wasn't updated
  3. Conditional gate (env flag, version check) that's never true

First step: grep cost_health across aegis-oss/src — if it's gone, it was pruned; if it's there, the route isn't calling it.

Proposed fix

  1. Locate /health route handler in aegis-core (likely src/routes/health.ts or similar)
  2. Re-add a loadCostHealth(db) call that selects from cost_budgets joined with recent cost_snapshots for burn rate
  3. Shape the response to match the original CHANGELOG entry: cost_health: { [provider]: { spend_usd, monthly_budget, threshold_tier, projected_depletion_days, burn_rate_per_hour } }
  4. Add a smoke test that hits /health?format=json and asserts cost_health key present

Acceptance

  • curl /health?format=json | jq .cost_health returns an object with entries for anthropic, cerebras, groq, workers_ai
  • Each entry has the 5 fields listed in CHANGELOG
  • CHANGELOG is updated if the shape changed

Filed from Co-Founder Brief triage 2026-04-11.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions