Background
Cost guardrail warnings fire at the configured thresholds (default 50% / 80% of daily / monthly cap). The warning is logged as a JSONL record in <agent>/log/YYYY-MM/YYYY-MM-DD.jsonl with severity INFO/WARN (verified — atomic_agents/agent.py:1707-1714).
The JSONL log is the only delivery channel. There is no webhook, no callback, and no integration point for an external alerter (Telegram bot, Slack webhook, PagerDuty, email).
Why it matters
The reason warnings exist is to give the operator time to react before the cap blocks runs. Burying them in a JSONL file the operator reads only when they remember to check defeats the purpose.
For a personal-deployment use case (Dan's gizmo), Telegram alerting is non-negotiable: silent cron failures and silent threshold crossings are both known sources of operational pain. For a SaaS use case, alerting needs to route per tenant.
What to change
- Config schema: add an optional alert_hooks block to cost_guardrails in model.md:

    cost_guardrails:
      daily_cap_usd: 5.00
      monthly_cap_usd: 100.00
      warning_thresholds: [0.50, 0.80]
      alert_hooks:
        - type: webhook
          url: https://hooks.slack.com/...
          on: [threshold_cross, cap_blocked]
        - type: webhook
          url: https://api.telegram.org/bot.../sendMessage
          on: [threshold_cross, cap_blocked]
          template: "atomic-agents alert: {agent} at {pct}% of {period} cap"

- Runtime: _fire_cost_warning() in agent.py, for each alert event:
  - writes the JSONL log entry as today (don't regress)
  - walks alert_hooks and sends a webhook POST with the JSON payload {agent, period, pct, threshold, severity, ts} to each hook subscribed to that event
  - handles failures: an alerter timeout/error is logged and doesn't block the agent run
- Library hook (orthogonal): AtomicAgent accepts an optional on_cost_alert: Callable[[CostAlert], None] parameter for programmatic use. Hub-wrapped invocations register their own callable; bare-CLI uses the YAML config.
- Spec doc update: docs/spec/05-cost-guardrails.md documents the alert hook contract.
- Sample: Caldwell's model.md shows a commented-out Telegram webhook example.
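A minimal sketch of the config parsing and runtime dispatch described above, in Python since agent.py is Python. Everything beyond the names in this issue is hypothetical: the field types of CostAlert, AlertHook, parse_alert_hooks, dispatch_cost_alert, the logger name, the 5-second timeout, defaulting a hook with no on list to both events, and reading cost_alert_dispatched as "at least one delivery succeeded". It assumes model.md's YAML has already been loaded into a dict. It also guards against one real pitfall: PyYAML follows YAML 1.1, which loads an unquoted on: key as the boolean True, so the sample config above arrives keyed by True unless the key is quoted.

```python
import json
import logging
import urllib.request
from dataclasses import asdict, dataclass
from typing import Any, Callable, Optional

log = logging.getLogger("atomic_agents.cost_alerts")  # hypothetical logger name

KNOWN_EVENTS = ("threshold_cross", "cap_blocked")


@dataclass(frozen=True)
class CostAlert:
    # Mirrors the proposed payload {agent, period, pct, threshold, severity, ts}.
    agent: str
    period: str       # "daily" or "monthly"
    pct: float        # spend as a percentage of the cap, e.g. 80.0
    threshold: float  # configured threshold that was crossed, e.g. 0.80
    severity: str     # "INFO" / "WARN", as in the JSONL record
    ts: str           # ISO-8601 timestamp


@dataclass(frozen=True)
class AlertHook:
    type: str
    url: str
    on: tuple[str, ...]
    template: Optional[str] = None


def parse_alert_hooks(cost_guardrails: dict[Any, Any]) -> list[AlertHook]:
    """Parse the optional alert_hooks block; a missing or empty block yields []."""
    hooks = []
    for raw in cost_guardrails.get("alert_hooks") or []:
        if raw.get("type") != "webhook":
            raise ValueError(f"unsupported alert hook type: {raw.get('type')!r}")
        # PyYAML (YAML 1.1) loads an unquoted "on:" key as True, so accept both.
        # Assumption: a hook without an on list subscribes to every event.
        events = raw.get("on", raw.get(True, list(KNOWN_EVENTS)))
        unknown = set(events) - set(KNOWN_EVENTS)
        if unknown:
            raise ValueError(f"unknown alert events: {sorted(unknown)}")
        hooks.append(AlertHook(raw["type"], raw["url"], tuple(events), raw.get("template")))
    return hooks


def dispatch_cost_alert(
    alert: CostAlert,
    event: str,
    hooks: list[AlertHook],
    on_cost_alert: Optional[Callable[[CostAlert], None]] = None,
    timeout_s: float = 5.0,
) -> bool:
    """Best effort, no retries, never raises. Returns a value for cost_alert_dispatched."""
    dispatched = False
    if on_cost_alert is not None:
        try:
            on_cost_alert(alert)
            dispatched = True
        except Exception:
            log.exception("on_cost_alert callback failed")
    body = json.dumps(asdict(alert)).encode("utf-8")
    for i, hook in enumerate(hooks):
        if event not in hook.on:
            continue  # hook is not subscribed to this event
        req = urllib.request.Request(
            hook.url, data=body, headers={"Content-Type": "application/json"}, method="POST"
        )
        try:
            # timeout bounds each blocking socket operation, so an unresponsive alerter can't hang the run.
            with urllib.request.urlopen(req, timeout=timeout_s) as resp:
                resp.read()
            dispatched = True
        except Exception:
            # Log the hook index, not the URL: Telegram bot URLs embed the bot token.
            log.warning("alert hook #%d (%s) failed for %s", i, hook.type, event, exc_info=True)
    return dispatched
```

Delivery here is synchronous, so an unresponsive hook can add a few seconds to the run. A background thread would avoid that, but a one-shot cron process could then exit before the alert is sent. Telegram's sendMessage expects chat_id and text parameters rather than this generic payload, so a Telegram hook needs its own body; template rendering is not shown in this sketch.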
Acceptance
- model.md parses correctly with or without alert_hooks (backward compatible)
- Webhook POST happens on threshold cross and on the cap-blocked event; the payload schema is documented
- Webhook failure does NOT block the agent run
- Programmatic on_cost_alert hook fires for both events
- Tests cover: hook fires once per threshold per day (not on every run), hook failure is logged, webhook timeout doesn't hang the run
- New JSONL field cost_alert_dispatched: true/false, for audit
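The "once per threshold per day" criterion can be met with a latch keyed by period, calendar bucket, and threshold. A minimal sketch; ThresholdLatch and period_key are hypothetical names, and the cap_blocked event could reuse the same latch with a threshold of 1.0. The fired set lives in memory, which suits a long-running hub process but not bare-CLI cron runs, where every invocation is a fresh process: there it would have to be rebuilt from the current day's and month's JSONL records (for example, entries carrying cost_alert_dispatched) or persisted some other way. The bucket should also use the same day boundary (timezone) as the YYYY-MM-DD log files.

```python
from dataclasses import dataclass, field
from datetime import datetime


def period_key(period: str, now: datetime) -> str:
    """De-duplication bucket: one per day for daily caps, one per month for monthly caps."""
    if period == "daily":
        return now.strftime("%Y-%m-%d")
    if period == "monthly":
        return now.strftime("%Y-%m")
    raise ValueError(f"unknown period: {period!r}")


@dataclass
class ThresholdLatch:
    """Remembers which (period, bucket, threshold) alerts have already fired."""

    fired: set[tuple[str, str, float]] = field(default_factory=set)

    def due(self, period: str, spend: float, cap: float,
            thresholds: list[float], now: datetime) -> list[float]:
        """Return thresholds that spend has reached and that have not fired in this bucket."""
        bucket = period_key(period, now)
        newly_due = []
        for t in sorted(thresholds):
            key = (period, bucket, t)
            if spend >= t * cap and key not in self.fired:
                self.fired.add(key)  # latch: later runs in the same bucket stay quiet
                newly_due.append(t)
        return newly_due
```

For example, with a $5.00 daily cap and thresholds [0.50, 0.80], a run that brings spend to $2.60 yields [0.5]; a later run that same day at $3.00 yields nothing; the first run the next day starts a fresh bucket.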
Open questions
- Telegram webhook needs a chat_id per operator: does it go in alert_hooks.url (chat_id baked into the URL) or in a separate field? (Probably the URL; standard Telegram pattern.)
- Per-agent vs per-deployment alert config: today guardrails are per-agent. Hooks probably want to be per-agent too (different agents → different routing), but with a global default at deployment level (atomic-agents.toml or env). Defer the global default until needed.
- Webhook retry policy: probably "best effort, don't retry, don't block"; the alerter is responsible for not dropping alerts. Document this.
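Even if chat_id is baked into the URL, a Telegram hook cannot take the generic payload as-is, because the Bot API's sendMessage method requires chat_id and text parameters. A sketch of the conversion, assuming chat_id (and any other Telegram parameters) sits in the hook URL's query string and the template is rendered with Python's str.format_map over the alert fields. telegram_request and the fallback template are hypothetical; the fallback simply reuses the template string from the sample config.

```python
from typing import Any, Optional
from urllib.parse import parse_qs, urlsplit, urlunsplit

# Assumed fallback; the sample config supplies this same string as its template.
DEFAULT_TEMPLATE = "atomic-agents alert: {agent} at {pct}% of {period} cap"


def telegram_request(url: str, template: Optional[str],
                     alert_fields: dict[str, Any]) -> tuple[str, dict[str, str]]:
    """Turn a Telegram hook URL with chat_id in its query into (post_url, json_body)."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    if "chat_id" not in params:
        raise ValueError("Telegram hook URL has no chat_id query parameter")
    # Carry every query parameter (chat_id, plus e.g. parse_mode) into the JSON body.
    body = {key: values[0] for key, values in params.items()}
    # format_map raises KeyError if the template names a field the alert lacks.
    body["text"] = (template or DEFAULT_TEMPLATE).format_map(alert_fields)
    # Strip the query so parameters are not sent twice.
    post_url = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return post_url, body
```

If the open question resolves toward a separate chat_id field instead, only the source of chat_id changes; the body shape Telegram expects stays the same.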
Context
- Surfaced in deployment-readiness review (2026-05-08), gap E
- Telegram alerting is non-negotiable for Dan's gizmo deployment
- Pattern reference: similar webhook/callback hooks would be useful for other framework events (run_failed, dream_completed, eval_failed) — track here as future-but-not-this-PR