feat(listen): respect backend kill-switch in ax listen mention gate by madtank · Pull Request #29 · ax-platform/ax-cli

madtank · 2026-04-09T20:17:28Z

Summary

ax listen --exec previously ignored the backend kill-switch. A user clicking Disable / Break on an agent in the UI, or the concierge calling the MCP agents.set_control tool, had no effect on ax listen-style agents such as ping_bot: they kept replying while the backend reported them as disabled.

This PR adds a client-side gate: before invoking the handler on each matched mention, ax listen checks the agent's current control state (cached for 5 seconds) and drops the mention if the agent is disabled. One fix covers both entry points (UI click and MCP tool).

Changes

AxClient.get_agent_control(agent_id) — new method hitting GET /auth/agents/{id}/control
_is_backend_disabled() helper in listen.py with 5s TTL cache (fail-open on transient errors)
Gate call in _worker right after the existing local pause-file gate, DROP semantics (match UI affordance "taking a break")
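
For readers who want a concrete picture of the helper listed above, here is a minimal sketch of a TTL-cached, fail-open check. It is not the code from this PR. The cache layout, the injectable `now` clock, the `disabled` and `reason` response keys, and the choice not to cache failures are all assumptions made for illustration; only the `get_agent_control(agent_id)` call, the 5s TTL, and the `(True, reason)` return shape come from the description.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

CACHE_TTL_SECONDS = 5.0  # the 5s TTL named in the PR description

# Hypothetical cache layout: agent_id -> (fetched_at, (disabled, reason))
ControlCache = Dict[str, Tuple[float, Tuple[bool, str]]]


def is_backend_disabled(
    client: Any,
    agent_id: Optional[str],
    cache: ControlCache,
    now: Callable[[], float] = time.monotonic,
) -> Tuple[bool, str]:
    """Return (disabled, reason) for an agent, failing open on errors."""
    # Without an agent id there is nothing to look up; treat as enabled.
    if not agent_id:
        return (False, "")

    # Serve a fresh cached answer to avoid one API call per mention in a burst.
    entry = cache.get(agent_id)
    if entry is not None and now() - entry[0] < CACHE_TTL_SECONDS:
        return entry[1]

    try:
        state = client.get_agent_control(agent_id)
        # "disabled" and "reason" are assumed keys of the state dict.
        disabled = bool(state.get("disabled"))
        reason = str(state.get("reason") or "backend-disabled") if disabled else ""
    except Exception:
        # Fail open: a transient error must not silence the agent.
        # Assumption: failures are not cached, so the next mention retries.
        return (False, "")

    result = (disabled, reason)
    cache[agent_id] = (now(), result)
    return result
```

Passing the clock as a parameter keeps the TTL behaviour testable without sleeping. Whether the real helper caches failed lookups is not stated in the PR.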

Test plan

Unit tests for _is_backend_disabled: disabled, active, cache hit, null agent_id, network error fail-open — all pass
End-to-end against staging: PATCH /auth/agents/{id}/control with scope: agent, disabled: true, disabled_until: ... → GET reflects disabled state → _is_backend_disabled returns (True, reason) → cleanup PATCH re-enables cleanly
Manual test after merge: restart ping_bot with the new code, click Break for 5 min in UI on dev.paxai.app, send @ping_bot test, verify ping_bot drops the mention (log shows DROPPED — @ping_bot backend-disabled). Re-enable, send again, verify it replies pong.

Context

Root cause diagnosis and architectural discussion in orion's delivery-management session with madtank (2026-04-09). This is the short-term fix per the bar "good enough to promote to prod, not perfect." Long-term the cleanest answer would be per-subscriber filtered SSE at the backend, but that's a spec, not a cycle.

Doesn't fix

Non-ax-cli SSE clients (each one needs its own equivalent gate until backend-side filtering lands).
The local pause-file gate is unchanged; it still serves as the hard stop for operator intervention.

`ax listen --exec` subscribes to the generic SSE message stream and filters for @mentions client-side. The local pause-file gate (_is_paused) covers operator intervention on the host, but the aX platform backend also has its own kill switch: users can disable/break an agent by clicking the agent badge in the UI, and the concierge can disable noisy agents via the MCP `agents.set_control` tool. Both write to AgentControlService in Redis.

For agents that receive work via backend dispatch (cloud sentinels, webhook agents), the backend enforces this directly in the dispatch loop (messages_notifications.py ~L1696). But ax listen bypasses that loop entirely, so the backend control state is invisible unless the client checks.

This patch adds that explicit check:

- AxClient gets a new `get_agent_control(agent_id)` method that hits GET /auth/agents/{id}/control (the existing endpoint served by agent_control_service, already wired to Redis).
- listen.py gets a new `_is_backend_disabled(client, agent_id, cache)` helper with a 5-second TTL cache to avoid hammering the API during mention bursts. It fails open on transient errors: prefer to reply rather than silently drop mentions, since the local pause-file gate still covers hard operator intervention.
- The worker loop in _worker adds a backend-disabled gate right after the local pause-file gate. When the backend says the agent is disabled or on break, the mention is DROPPED (not deferred) with a log line naming the reason. This matches the UI affordance "this agent is taking a break": the user intent is not to queue work for replay on resume, just to stop it.

Verified end-to-end against staging:

- get_agent_control returns the full state dict from the live endpoint
- PATCH /auth/agents/{id}/control to set break → GET reflects it
- _is_backend_disabled correctly returns (True, reason) when disabled
- Cleanup PATCH re-enables cleanly

Unit tests for _is_backend_disabled cover: disabled state, active state, cache hit, null agent_id, and network-error fail-open path.

Unlocks:

- UI click disable/break on an @ping_bot (or any ax listen agent) actually stops the agent from replying.
- MCP `agents.set_control` disable from the concierge flows through the same path, so the same fix serves both entry points.
- Covers every ax listen consumer automatically, not just ping_bot.

Does not fix:

- Non-ax-cli SSE clients (future consideration; cleanest long-term is per-subscriber filtered SSE at the backend).
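
The gate ordering described in the commit message can be pictured roughly as follows. This is a sketch under stated assumptions, not the actual `_worker` code: `gate_mention`, its parameters, its string return values, and the log format are hypothetical. The only details taken from the PR are the order (local pause-file gate first, backend gate second) and the drop-not-defer behaviour.

```python
import logging
from typing import Callable, Tuple

log = logging.getLogger("listen_sketch")  # hypothetical logger name


def gate_mention(
    agent_name: str,
    mention: str,
    is_paused: Callable[[], bool],
    backend_disabled: Callable[[], Tuple[bool, str]],
    handler: Callable[[str], None],
) -> str:
    """Run one mention through both gates and report what happened."""
    # Gate 1: local pause file, the operator's hard stop on the host.
    # What the real worker does with a paused mention is outside this sketch.
    if is_paused():
        return "paused"

    # Gate 2: backend kill-switch. Only consulted when not locally paused.
    disabled, reason = backend_disabled()
    if disabled:
        # Drop, do not defer: nothing is queued for replay on resume.
        log.info("DROPPED @%s %s", agent_name, reason)
        return "dropped"

    handler(mention)
    return "handled"
```

Because a dropped mention is never stored anywhere, re-enabling the agent only affects mentions that arrive afterwards, which is the "taking a break" behaviour the commit message describes.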

madtank merged commit ace5aa9 into main Apr 9, 2026
3 of 4 checks passed

madtank deleted the orion/cli-killswitch-backend-gate branch April 9, 2026 20:17

This was referenced Apr 9, 2026

Revert "feat(listen): respect backend kill-switch in ax listen mention gate (#29)" #30

Merged

fix(listen): trust backend mentions array instead of content regex #31

Merged

madtank merged 1 commit into main from orion/cli-killswitch-backend-gate
