feat(sinks): Slack/Discord webhook sink for state-transition notifications #32

Merged
jiunbae merged 1 commit into main from feat/webhook-sink on May 5, 2026


@jiunbae commented May 5, 2026

Summary

  • New [sinks.webhook] HTTP sink that POSTs to a Slack/Discord webhook on agent state transitions, so operators get phone alerts the moment an agent flips to WaitingInput/Error while they're AFK from muxa watch.
  • Slack ({"text": "..."}) and Discord ({"content": "..."}) wire shapes auto-detected from the URL; flavor = "generic" posts the full Transition JSON for n8n/Zapier-style receivers.
  • Best-effort: failed POSTs log at WARN and drop. No on-disk queue, no retry backoff.
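The URL-based detection and per-flavor body can be sketched roughly as below. This is a minimal std-only illustration, not the merged code: `Flavor`, `infer_flavor`, `json_escape`, and `build_body` are hypothetical names, the host patterns are the ones quoted in the commit message, and the real sink presumably serializes with a JSON library rather than by hand.

```rust
/// Wire shapes named in the summary. The enum itself is a hypothetical stand-in.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Flavor {
    Slack,
    Discord,
    Generic,
}

/// Guess the flavor from the webhook URL, falling through to Generic.
pub fn infer_flavor(url: &str) -> Flavor {
    if url.contains("hooks.slack.com") {
        Flavor::Slack
    } else if url.contains("discord.com/api/webhooks") {
        Flavor::Discord
    } else {
        Flavor::Generic
    }
}

/// Minimal escaping for text embedded in a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build the POST body. `transition_json` stands in for the serialized Transition
/// (an assumption: its real fields are not shown in this PR description).
pub fn build_body(flavor: Flavor, message: &str, transition_json: &str) -> String {
    match flavor {
        Flavor::Slack => format!("{{\"text\": \"{}\"}}", json_escape(message)),
        Flavor::Discord => format!("{{\"content\": \"{}\"}}", json_escape(message)),
        Flavor::Generic => transition_json.to_string(),
    }
}
```

An explicit `flavor` setting would simply bypass `infer_flavor`, for example `explicit.unwrap_or_else(|| infer_flavor(url))`.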

Config

[sinks.webhook]
enabled         = true
endpoint        = "https://hooks.slack.com/services/T0/B0/abc"   # OR
endpoint_env    = "MUXA_SLACK_URL"                               # preferred
flavor          = "slack"            # slack | discord | generic
on_states       = ["WaitingInput", "Error"]
rate_limit_secs = 60

endpoint_env wins when both are set — the webhook URL is the secret on Slack/Discord, so the env-var path keeps it out of shared dotfiles.
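A rough sketch of that precedence rule follows. The struct and function names are hypothetical, and the behaviour when `endpoint_env` names an unset variable (error rather than fall back to the inline URL) is an assumption, not something this PR states.

```rust
/// Hypothetical mirror of the two endpoint fields in `[sinks.webhook]`.
pub struct WebhookEndpointConfig {
    pub endpoint: Option<String>,
    pub endpoint_env: Option<String>,
}

/// Resolve the URL to POST to. `lookup` abstracts environment access so the
/// logic can be tested; production code could pass `|k| std::env::var(k).ok()`.
pub fn resolve_endpoint<F>(cfg: &WebhookEndpointConfig, lookup: F) -> Result<String, String>
where
    F: Fn(&str) -> Option<String>,
{
    // endpoint_env wins when both fields are set.
    if let Some(var) = &cfg.endpoint_env {
        // Assumption: an unset variable is a config error, not a silent fallback.
        return lookup(var).ok_or_else(|| format!("environment variable {} is not set", var));
    }
    cfg.endpoint
        .clone()
        .ok_or_else(|| "enabled webhook sink needs endpoint or endpoint_env".to_string())
}
```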

Filter + rate-limit

Two intentional pieces of policy live in this sink:

  1. on_states filter (default ["WaitingInput", "Error"]): most transitions are routine Idle ↔ Working and would spam.
  2. Per-(kind, session_id, state) rate-limit (default 60s), kept in an in-task HashMap. WaitingInput ↔ Working flapping is common during permission-grant loops; without the limiter one flaky agent can fire 30 push notifications a minute.
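The two policies above can be sketched together as a single gate. This is an illustration under assumptions: `Gate` and `should_send` are hypothetical names, the key fields are modelled as plain strings, and the clock is passed in explicitly so the window can be exercised without sleeping.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Rate-limit key: (kind, session_id, state). String fields are an assumption.
type Key = (String, String, String);

pub struct Gate {
    on_states: Vec<String>,
    window: Duration,
    last_sent: HashMap<Key, Instant>,
}

impl Gate {
    pub fn new(on_states: &[&str], rate_limit_secs: u64) -> Self {
        Gate {
            on_states: on_states.iter().map(|s| s.to_string()).collect(),
            window: Duration::from_secs(rate_limit_secs),
            last_sent: HashMap::new(),
        }
    }

    /// Returns true if a transition into `state` should be POSTed at time `now`.
    pub fn should_send(&mut self, kind: &str, session_id: &str, state: &str, now: Instant) -> bool {
        // 1. Drop target states the operator did not ask for.
        if !self.on_states.iter().any(|s| s == state) {
            return false;
        }
        // 2. Suppress repeats of the same key inside the window.
        let key = (kind.to_string(), session_id.to_string(), state.to_string());
        if let Some(prev) = self.last_sent.get(&key) {
            if now.duration_since(*prev) < self.window {
                return false;
            }
        }
        // Only successful passes refresh the timestamp, so a flapping agent
        // still gets through once per window.
        self.last_sent.insert(key, now);
        true
    }
}
```

Because the key includes the state, an Error for a session is not suppressed by a recent WaitingInput for the same session.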

We don't queue or retry on failure: Slack/Discord both retry on their side, and a buffered alert delivered ten minutes late is worse than no alert at all (operators have already moved on). One bounded HashMap per task, no shared state.
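The drop-on-failure policy amounts to a single attempt with no retry path. A minimal sketch, assuming a hypothetical `post` closure in place of the real HTTP call and `eprintln!` in place of the WARN-level log:

```rust
/// Outcome of one delivery attempt. There is deliberately no "retry later" variant.
#[derive(Debug, PartialEq, Eq)]
pub enum Delivery {
    Sent,
    Dropped,
}

/// Try the POST exactly once. `post` stands in for the HTTP call (hypothetical);
/// on failure the error is logged and the alert is discarded.
pub fn deliver_once<F, E>(body: &str, post: F) -> Delivery
where
    F: FnOnce(&str) -> Result<(), E>,
    E: std::fmt::Display,
{
    match post(body) {
        Ok(()) => Delivery::Sent,
        Err(err) => {
            // Stand-in for a WARN log line; nothing is queued for later.
            eprintln!("WARN webhook POST failed, dropping alert: {}", err);
            Delivery::Dropped
        }
    }
}
```

Taking `post` as `FnOnce` makes the single-attempt contract visible in the signature: the closure cannot be called a second time.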

Test plan

  • cargo build --workspace
  • cargo test --workspace (260 + 7 + 177 + 8 + 2 unit/e2e tests pass)
  • cargo clippy --workspace --all-targets -- -D warnings
  • cargo fmt --all -- --check
  • New unit tests cover: format_message_for_waiting_input, format_message_for_error, should_forward_filters_idle_transitions, rate-limiter suppress/release/per-key behaviour, flavor_inference_from_url, payload shape per flavor (Slack text / Discord content / Generic full Transition), config validation (enabled requires endpoint or endpoint_env, env wins over inline, invalid URL rejected).
  • Manual: enable with a real Slack incoming-webhook URL, run muxa watch, and trigger a WaitingInput transition. Confirm the push arrives once, then again on the next WaitingInput outside the rate-limit window.
  • Manual: same with a Discord webhook URL — verify the body uses content, not text.

🤖 Generated with Claude Code

Push a Slack or Discord message to the operator's phone the moment an
agent flips to `WaitingInput` or `Error` while they're AFK from `muxa
watch` — grant permission from the couch, or come back to the desk
knowing what needs you. Off by default; opt in via `[sinks.webhook]`.

Auto-detects the wire shape from the URL (`hooks.slack.com` →
`{"text": "..."}`, `discord.com/api/webhooks` → `{"content": "..."}`)
and falls through to a generic flavor that posts the full `Transition`
JSON for n8n/Zapier-style receivers. Explicit `flavor = "..."` overrides
the inference. The webhook URL is itself the secret on Slack and
Discord, so `endpoint_env = "..."` is preferred over inline TOML and
wins when both are set.

Two design choices worth calling out:

1. Filter by `to`-state up front via `on_states` (default
   `["WaitingInput", "Error"]`). Most transitions are routine
   `Idle ↔ Working` and would spam.
2. Per-`(kind, session_id, state)` rate-limit (default 60s) lives in an
   in-task `HashMap` keyed off the Transition. `WaitingInput` ↔
   `Working` flap-flap is common during permission loops; without the
   limiter one flaky agent can fire 30 push notifications a minute.

Best-effort delivery — failed POSTs log at WARN and drop. We
deliberately don't queue or retry: Slack/Discord both have their own
retry semantics on the receiving end, and we'd rather lose one alert
than buffer 1000 of them while Slack is down and then thunder-herd the
operator with a backlog when service comes back. The sink is for
"page me now", not "audit log".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jiunbae merged commit 876508e into main on May 5, 2026
6 checks passed