feat(listen): respect backend kill-switch in ax listen mention gate #29

Merged
madtank merged 1 commit into main from orion/cli-killswitch-backend-gate
Apr 9, 2026

@madtank madtank commented Apr 9, 2026

Summary

ax listen --exec previously ignored the backend kill-switch. When a user clicked Disable / Break on an agent in the UI, or the concierge called the MCP agents.set_control tool, nothing changed for ax listen-style agents like ping_bot: they kept replying while the backend reported them as disabled.

This PR adds a client-side gate: before invoking the handler on each matched mention, ax listen fetches the agent's current control state and drops the mention if disabled. Covers both entry points (UI click + MCP tool) with one fix.

Changes

  • AxClient.get_agent_control(agent_id) — new method hitting GET /auth/agents/{id}/control
  • _is_backend_disabled() helper in listen.py with 5s TTL cache (fail-open on transient errors)
  • Gate call in _worker right after the existing local pause-file gate, DROP semantics (match UI affordance "taking a break")
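
As an illustration of the helper described above, here is a minimal sketch of a TTL-cached, fail-open control check. It is not the code merged in this PR: the response fields `disabled` and `reason`, the `now` parameter, and the cache layout are assumptions made for the example, and only the `get_agent_control(agent_id)` method name, the 5s TTL, and the `(True, reason)` return shape come from the description.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

CACHE_TTL_SECONDS = 5.0  # TTL stated in the PR description

ControlResult = Tuple[bool, str]
# Hypothetical cache layout: agent_id -> (time fetched, result)
ControlCache = Dict[str, Tuple[float, ControlResult]]


def is_backend_disabled(
    client: Any,
    agent_id: Optional[str],
    cache: ControlCache,
    now: Callable[[], float] = time.monotonic,
) -> ControlResult:
    """Return (disabled, reason) for agent_id, using a short-lived cache."""
    if not agent_id:
        # Without an agent id there is nothing to look up; treat as active.
        return (False, "")

    cached = cache.get(agent_id)
    if cached is not None and now() - cached[0] < CACHE_TTL_SECONDS:
        return cached[1]  # fresh entry: skip the API call

    try:
        state = client.get_agent_control(agent_id)
    except Exception:
        # Fail open: a transient error must not silence the agent.
        # The failure is not cached, so the next mention retries.
        return (False, "")

    # Assumed response shape: {"disabled": bool, "reason": str}
    result = (bool(state.get("disabled")), str(state.get("reason") or ""))
    cache[agent_id] = (now(), result)
    return result
```

Not caching the failure path is a deliberate choice in this sketch: an outage then costs one extra request per mention, but the agent picks up a real disable as soon as the endpoint recovers.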

Test plan

  • Unit tests for _is_backend_disabled: disabled, active, cache hit, null agent_id, network error fail-open — all pass
  • End-to-end against staging: PATCH /auth/agents/{id}/control with scope: agent, disabled: true, disabled_until: ... → GET reflects disabled state → _is_backend_disabled returns (True, reason) → cleanup PATCH re-enables cleanly
  • Manual test after merge: restart ping_bot with the new code, click Break for 5 min in UI on dev.paxai.app, send @ping_bot test, verify ping_bot drops the mention (log shows DROPPED — @ping_bot backend-disabled). Re-enable, send again, verify it replies pong.

Context

Root cause diagnosis and architectural discussion in orion's delivery-management session with madtank (2026-04-09). This is the short-term fix per the bar "good enough to promote to prod, not perfect." Long-term the cleanest answer would be per-subscriber filtered SSE at the backend, but that's a spec, not a cycle.

Doesn't fix

  • Non-ax-cli SSE clients (each one needs its own equivalent gate until backend-side filtering lands).
  • The local pause-file gate is unchanged; it still serves as the hard stop for operator intervention.

`ax listen --exec` subscribes to the generic SSE message stream and
filters for @mentions client-side. The local pause-file gate
(_is_paused) covers operator intervention on the host, but the aX
platform backend also has its own kill switch: users can disable/break
an agent by clicking the agent badge in the UI, and the concierge can
disable noisy agents via the MCP `agents.set_control` tool. Both write
to AgentControlService in Redis.

For agents that receive work via backend dispatch (cloud sentinels,
webhook agents), the backend enforces this directly in the dispatch
loop (messages_notifications.py ~L1696). But ax listen bypasses that
loop entirely — the backend control state is invisible unless the
client checks.

This patch adds that explicit check:

- AxClient gets a new `get_agent_control(agent_id)` method that hits
  GET /auth/agents/{id}/control (the existing endpoint served by
  agent_control_service, already wired to Redis).
- listen.py gets a new `_is_backend_disabled(client, agent_id, cache)`
  helper with a 5-second TTL cache to avoid hammering the API during
  mention bursts. Fail-open on transient errors — prefer to reply
  rather than silently drop mentions, since the local pause-file gate
  still covers hard operator intervention.
- The worker loop in _worker adds a backend-disabled gate right after
  the local pause-file gate. When the backend says the agent is
  disabled or on break, the mention is DROPPED (not deferred) with a
  log line naming the reason. This matches the UI affordance "this
  agent is taking a break" — the user intent is not to queue work for
  replay on resume, just to stop it.
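
The gate ordering in the last bullet can be sketched roughly as follows. This is a simplified stand-in for the real `_worker` loop, not the merged code: `process_mention`, its callable parameters, the return strings, and the exact log wording are hypothetical names chosen for illustration, and the handling of the paused case is left abstract because it is not specified here.

```python
from typing import Callable, Tuple


def process_mention(
    agent_name: str,
    is_paused: Callable[[], bool],
    backend_disabled: Callable[[], Tuple[bool, str]],
    handler: Callable[[], None],
    log: Callable[[str], None] = print,
) -> str:
    """Run one matched mention through both gates, then the handler."""
    # Gate 1: the existing local pause-file gate (operator hard stop).
    if is_paused():
        return "paused"

    # Gate 2: the backend control state, checked only when not paused.
    disabled, reason = backend_disabled()
    if disabled:
        # Drop rather than defer: nothing is queued for replay on resume.
        log(f"DROPPED @{agent_name} backend-disabled ({reason or 'no reason given'})")
        return "dropped"

    handler()
    return "handled"
```

Passing the two checks in as callables keeps the ordering testable without a live SSE stream or API client.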

Verified end-to-end against staging:
- get_agent_control returns the full state dict from the live endpoint
- PATCH /auth/agents/{id}/control to set break → GET reflects it
- _is_backend_disabled correctly returns (True, reason) when disabled
- Cleanup PATCH re-enables cleanly

Unit tests for _is_backend_disabled cover: disabled state, active
state, cache hit, null agent_id, and network-error fail-open path.

Unlocks:
- UI click disable/break on @ping_bot (or any ax listen agent)
  actually stops the agent from replying.
- MCP `agents.set_control` disable from the concierge flows through
  the same path — same fix serves both entry points.
- Covers every ax listen consumer automatically, not just ping_bot.

Does not fix:
- Non-ax-cli SSE clients (future consideration — cleanest long-term is
  per-subscriber filtered SSE at the backend).
@madtank madtank merged commit ace5aa9 into main Apr 9, 2026
3 of 4 checks passed
@madtank madtank deleted the orion/cli-killswitch-backend-gate branch April 9, 2026 20:17