
feat(daemon): enforce Slack reply before clean exit (silent-exit gate) #68

Merged
spashii merged 2 commits into main from sam/enforce-slack-reply-before-exit
May 24, 2026

@spashii spashii commented May 24, 2026

What this enables

Slack-triggered sessions can't exit cleanly anymore without closing the loop — defined as: a chat.postMessage / chat.update must come AFTER the last substantive outward-facing tool call in the session. An ACK at the start followed by work + silence does NOT close the loop.

The corrected rule (timing-based, not 'any post')

Substantive outward-facing (operator expects a report):

  • gh pr create / edit / merge, other non-Slack bash
  • consult_opus
  • worker / parallel_workers
  • edit_file / write_file outside /data/journal/
  • fetch_url

Inward (operator doesn't need a report):

  • read_file, grep, glob_files
  • Journal writes (/data/journal/)
  • Slack housekeeping bash (setStatus, reactions.add, conversations.replies)

closed_loop is True iff a post comes AFTER the last outward call (or no outward work happened at all — a question Sam answered without tools).
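
A minimal sketch of the timing rule, assuming each tool call in the session has already been labelled `post`, `outward`, or `inward`. The labels and the function name are illustrative, not the names used in src/runtime/session.py. For a session with no outward work, the sketch takes the reading from the Consequences section below, where any single post is enough.

```python
from typing import Iterable


def closed_loop(kinds: Iterable[str]) -> bool:
    """Return True iff a Slack post comes after the last outward call.

    `kinds` is the session's tool calls in order, each labelled
    "post", "outward", or "inward" (hypothetical labels).
    """
    last_post = -1
    last_outward = -1
    for index, kind in enumerate(kinds):
        if kind == "post":
            last_post = index
        elif kind == "outward":
            last_outward = index
    # With no outward work, last_outward stays at -1, so any post
    # at all satisfies the comparison.
    return last_post > last_outward
```

Under this sketch, `["post", "outward"]` (ACK, then work, then silence) is not closed, while `["post", "outward", "post"]` is.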

If the gate fires, the daemon spawns a retry whose only job is to read the previous session's audit-log slice and post the summary. The retry agent is told explicitly not to trust the journal (since the journal-claiming-without-evidence pattern is the parent failure mode).

Consequences

  • Today's failure mode (Sam runs 30+ tool calls, opens a PR, exits clean, ✅ fires, operator sees no reply) is structurally blocked. Two sessions hit it today: f31a21c5 (after 'Can you update to add these tools?') and 624e27ec (after 'Use opus to review this'). Both opened PRs (feat(runtime): support opencode skill structure with symlinks #66, docs(identity): forbid unsourced quantitative or factual claims #67) but never posted. After this PR, such sessions are caught and retried.
  • The ACK-first rule from SAM-32 doesn't false-pass the gate anymore. Posting an ACK doesn't satisfy the rule unless a final reply also comes after the substantive work.
  • Sessions with no outward work (read_file only or no tools at all) just need any single post — they're 'Sam answered a question' shapes.
  • Scheduled (daily-maintenance) and retry sessions are exempt — silence is allowed for routine work, retries have their own failure path.

How to verify

  • pytest tests/ — 119 passed (22 new in test_silent_exit.py + 97 existing, no regressions). The new tests include 4 explicit ACK-then-work-then-silent cases that would have false-passed the earlier rule.
  • On the next Slack mention where Sam opens a PR but forgets to post, the logs will show 'session exited cleanly but never posted to Slack; spawning retry to narrate (session=...)', and the retry will post the summary in-thread.

Tier

Tier 3 (src/runtime/session.py, src/runtime/daemon.py, src/runtime/prompts.py) + Tier 1 (src/capabilities/slack.md documents the gate in Sam's source).

Closes the silent-exit class of bug surfaced by the 2026-05-24 14:30 and 15:55 sessions.

Detect Slack-triggered sessions that exit cleanly but never call
chat.postMessage / chat.update. Treat as failure → spawn a retry
whose job is to read the previous session's audit-log slice and
post the summary.

Today (2026-05-24) two sessions opened PRs + filed Linear issues
but never posted to the thread:
- f31a21c5 (opencode-tools-integration): 31 tool calls, PR #66, 0 posts
- 624e27ec (opus-thread-budget-review): 59 tool calls, consult_opus
  + PR #67, 0 posts

Both journals claimed Sam posted. Neither did. The ✅
reaction fired anyway because it's keyed on exit_code=0, not on
'did the operator actually get a reply.' Operator's framing: 'it
gave green check' but no Slack reply.

Detection:
- _classify_tool_use now returns a 5th value, slack_posted, set
  when any bash call contains 'chat.postMessage' or 'chat.update'
  (curl or python3 -c both register; setStatus / reactions.add /
  conversations.replies / other housekeeping endpoints do NOT).
- SessionResult.slack_posted carries the flag through.
- SessionResult.silent_exit(message=...) returns True when:
  (a) session not failed, (b) not scheduled, (c) not retry,
  (d) slack_posted is False.
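
The four conditions above could be sketched roughly as follows. The `Message` class and every field name other than `slack_posted` are assumptions made for illustration; the real `SessionResult` and `IncomingMessage` are not shown in this PR text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    """Hypothetical stand-in for the triggering message."""
    scheduled: bool = False
    retry_context: Optional[dict] = None


@dataclass
class SessionResult:
    """Hypothetical subset of the real SessionResult."""
    failed: bool
    slack_posted: bool

    def silent_exit(self, message: Message) -> bool:
        if self.failed:
            return False  # (a) failures take the normal failure path
        if message.scheduled:
            return False  # (b) scheduled work may stay silent
        if message.retry_context is not None:
            return False  # (c) retries never re-trigger the gate
        return not self.slack_posted  # (d) clean exit without a post
```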

Enforcement:
- Daemon's _worker treats silent_exit the same as failed → spawns
  retry with retry_context={'silent_exit': True, 'previous_session_id'}.
- silent_exit() returns False for retries, preventing recursion.
- If the retry ITSELF silent-exits, operator-alert fires directly
  (the daemon checks slack_posted on retry_result, bypassing the
  retry-context guard).
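
As a rough illustration of the enforcement flow for a Slack-triggered, non-scheduled session, the sketch below retries once and alerts the operator if the retry also ends without a post. `Result`, `run_with_gate`, and the two callables are hypothetical; only the `retry_context` keys come from the description above, and how the daemon handles a retry that fails outright is left out.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Result:
    """Hypothetical stand-in for SessionResult."""
    failed: bool
    slack_posted: bool


def run_with_gate(
    run_session: Callable[[Optional[dict]], Result],
    alert_operator: Callable[[str], None],
    session_id: str,
) -> str:
    first = run_session(None)
    silent = not first.failed and not first.slack_posted
    if not first.failed and not silent:
        return "ok"

    # A silent exit is treated like a failure: spawn one retry.
    retry_context = {"previous_session_id": session_id}
    if silent:
        retry_context["silent_exit"] = True
    retry = run_session(retry_context)

    # silent_exit() would return False for a retry, so the flag is
    # checked directly on the retry's result instead.
    if not retry.slack_posted:
        alert_operator(f"retry for session {session_id} ended without a Slack post")
        return "alerted"
    return "retried"
```

Checking the flag directly on the retry result is what keeps the no-recursion guard from also hiding a second silent exit.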

Retry shape:
- New SILENT_EXIT_INTRO / SILENT_EXIT_OUTRO templates in prompts.py.
- _format_silent_exit_message() in IncomingMessage dispatches on
  retry_context.silent_exit, distinct from the failure-narration
  retry path.
- The retry agent is told: read /data/tool_calls/<today>.jsonl
  filtered by previous_session_id, reconstruct what got done,
  post ONE summary. Audit log is ground truth; do not infer from
  journal text (the journal is the failure mode this retry exists
  to surface).
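
For illustration, the audit-log read the retry agent is asked to do might look like the sketch below. It assumes one JSON object per line and a `session_id` key on each record; the key name and the function name are guesses, since the log schema is not shown here.

```python
import json
from pathlib import Path


def load_session_calls(log_path: str, session_id: str) -> list:
    """Return the audit-log records belonging to one session, in file order."""
    calls = []
    with Path(log_path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines in the log
            record = json.loads(line)
            # "session_id" is an assumed field name.
            if record.get("session_id") == session_id:
                calls.append(record)
    return calls
```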

Documentation:
- src/capabilities/slack.md gets a new subsection naming the gate
  and the structural enforcement. So Sam reads about the rule
  from its own source on every session.

Tests: 15 new in test_silent_exit.py + 97 existing all pass (112).
@spashii spashii enabled auto-merge May 24, 2026 14:59
Operator caught the hole before merge: the previous gate counted
'at least one chat.postMessage = closed.' That false-passes the
ACK-then-work-no-reply pattern — an ACK at index 0, then gh pr create
at index 5, then silent exit. Operator sees 'got it' + green check
but never hears about the PR.

New rule (timing-based):

  closed_loop iff (chat.postMessage / chat.update was issued AFTER
  the last substantive outward-facing tool call)

Substantive outward-facing tool calls:
- worker / parallel_workers (delegated work)
- fetch_url
- consult_opus
- non-Slack bash (gh, git, curl-non-slack, etc.)
- write_file / edit_file outside /data/journal/

Inward (operator doesn't need a report for these):
- read_file, grep, glob_files
- write_file / edit_file to /data/journal/
- Slack housekeeping bash (setStatus, reactions.add, conversations.replies)
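
One way to picture the outward/inward split is a small labelling function like the one below. It is a sketch, not the real `_classify_tool_use`: the argument keys `command` and `path` are assumed, and treating unknown tools as outward is a conservative choice made here, not something this commit states.

```python
POST_ENDPOINTS = ("chat.postMessage", "chat.update")
HOUSEKEEPING_ENDPOINTS = ("setStatus", "reactions.add", "conversations.replies")
INWARD_TOOLS = {"read_file", "grep", "glob_files"}
JOURNAL_PREFIX = "/data/journal/"


def classify(tool: str, args: dict) -> str:
    """Label one tool call as "post", "outward", or "inward"."""
    if tool == "bash":
        command = args.get("command", "")  # assumed argument key
        if any(endpoint in command for endpoint in POST_ENDPOINTS):
            return "post"
        if any(endpoint in command for endpoint in HOUSEKEEPING_ENDPOINTS):
            return "inward"
        return "outward"  # gh, git, non-Slack curl, etc.
    if tool in ("write_file", "edit_file"):
        path = args.get("path", "")  # assumed argument key
        return "inward" if path.startswith(JOURNAL_PREFIX) else "outward"
    if tool in INWARD_TOOLS:
        return "inward"
    # worker, parallel_workers, fetch_url, consult_opus, and (by
    # assumption) any tool not listed above.
    return "outward"
```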

Field renamed slack_posted → closed_loop everywhere (PR not merged yet,
no external consumers). Docstring + slack.md text rewritten to name
the timing rule explicitly.

Tests: 22 in test_silent_exit.py (up from 15), including 4 explicit
ACK-then-work cases (the headline failure mode). 119 total pass.
@spashii spashii disabled auto-merge May 24, 2026 15:58
@spashii spashii merged commit 6705738 into main May 24, 2026
2 checks passed
@spashii spashii deleted the sam/enforce-slack-reply-before-exit branch May 24, 2026 15:58
spashii added a commit that referenced this pull request May 24, 2026
## What this enables

Sam reads a new principle when proposing source fixes: identity /
capability / skill changes are prompts (not deterministic). For
recurring behavioral failures, look for the systemic lever (a daemon
check, a runtime gate) before reaching for prose.

## Consequences

- Sam stops shipping Tier-1 prose patches as the *whole* fix for
behaviors the LLM already 'knew' about and failed on anyway (today's
silent-exit class).
- When Sam DOES ship prose, the PR description names whether there's a
paired Tier-3 enforcement, or honestly flags 'no systemic lever
available; expect partial compliance.'
- The two-layer pattern (system enforces, prose explains why) becomes
the default mental model for behavioral fixes.

## Lives in self-maintenance.md

Inserted between *'Where does a change belong?'* (which picks the target
tier) and *'The flow'* (mechanics). Chose self-maintenance.md over
identity.md because the principle is about *how to choose where to fix
something* — exactly self-maintenance's domain. Sam reads this when
proposing changes, not on every session.

## The silent-exit example is baked in

The PR-#68 silent-exit gate is named explicitly as the canonical case:
prose-only would have been 'add a stronger rule to slack.md.' The real
fix is the daemon's audit-log check + retry. Future-Sam reading this
principle has a concrete case to anchor on.

## Tier

Tier 1 (capability prose). 21 lines added. Source-integrity tests pass.

## Verify

`pytest tests/runtime/test_source_integrity.py` — 25 passed (frontmatter
intact, no dangling refs).
spashii added a commit that referenced this pull request May 24, 2026
… (v2) (#71)

## What this enables

Sam can break out of a session mid-work for genuine unknown unknowns,
post a question to Slack, exit cleanly. Operator replies whenever —
daemon detects the continuation, Sam picks up via the audit log. **Slack
reactions are the source of truth** for paused state; no in-memory
daemon map, no parallel state store, no special-case boot recovery.

## How the lifecycle works

```
Sam pauses → ask_operator tool posts question + adds 💬 atomically
            → session exits clean, ✅ on inbound
            → ledger entry has ask_operator_called: true

  ... operator replies whenever (works across daemon restarts) ...

Operator reply → daemon fetches thread_history (already happens)
              → finds the bot message with 💬 from this bot
              → looks up session_id from sessions.jsonl
                 (filter thread_ts + ask_operator_called + ts_start strictly before post)
              → injects paused_session_id into IncomingMessage
              → calls reactions.remove on the question post (marks resolved)
              → continuation prompt fires: read audit log, apply answer, continue
```
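
The "find the bot message with 💬" step can be sketched as below. This assumes thread history arrives as Slack message dicts carrying `user`, `ts`, and an optional `reactions` list, and that the bot's own user ID is available; the function is illustrative and not the real `_find_active_paused_question`.

```python
from typing import Optional

PAUSE_REACTION = "speech_balloon"  # Slack's short name for 💬


def find_active_paused_question(history: list, bot_user_id: str) -> Optional[dict]:
    """Return the most recent bot message that the bot itself marked with 💬."""
    best = None
    for message in history:
        if message.get("user") != bot_user_id:
            continue  # only the bot's own posts can be paused questions
        marked = any(
            reaction.get("name") == PAUSE_REACTION
            and bot_user_id in reaction.get("users", [])
            for reaction in message.get("reactions", [])
        )
        if not marked:
            continue
        # Slack timestamps are strings such as "1716560000.000100".
        if best is None or float(message["ts"]) > float(best["ts"]):
            best = message
    return best
```

Comparing `ts` values instead of relying on list order means the most recent paused question wins regardless of how the history is sorted.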

## Consequences

- New ADK tool `ask_operator(question: str)` on the main agent only —
workers/pro_executor/mentor cannot escalate to operator directly.
- Tool atomically posts question + adds `💬`
(race-mitigation per your design call).
- `SessionLedgerEntry.ask_operator_called: bool` is the new ledger field
the daemon's lookup correlates on.
- `_find_active_paused_question` + `_lookup_paused_session_id` replace
the deleted `_paused_threads` map. Reactions on Slack + ledger on GCS
both persist across daemon restarts.
- Daily-maintenance §6 handles abandoned questions (>24h) — Sam
enumerates 💬 via reactions.list, decides per-question
whether to remind / mark abandoned / escalate. Daemon mechanical, Sam
judgment.
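
A hedged sketch of the ledger lookup follows, using the filter described in the lifecycle (same thread, `ask_operator_called`, started strictly before the question post, most recent match). It assumes ledger entries are dicts with `session_id`, `thread_ts`, `ask_operator_called`, and a numeric `ts_start`; the actual `sessions.jsonl` schema may differ.

```python
from typing import Optional


def lookup_paused_session_id(
    entries: list, thread_ts: str, question_ts: float
) -> Optional[str]:
    """Return the session that asked the question, or None if no entry matches."""
    best = None
    for entry in entries:
        if entry.get("thread_ts") != thread_ts:
            continue
        if not entry.get("ask_operator_called", False):
            continue
        # The session must have started strictly before the question
        # was posted; later sessions are skipped.
        if entry["ts_start"] >= question_ts:
            continue
        if best is None or entry["ts_start"] > best["ts_start"]:
            best = entry
    return best["session_id"] if best is not None else None
```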

## Composition with existing gates

- silent-exit gate (PR #68): `ask_operator` counts as a post → closes
the loop for that turn.
- ack-first rule (PR #44): unaffected.
- retry / silent-exit retry: takes precedence over continuation (failure
narration wins over resumption — defends against a failed pause spawning
a continuation loop).
- recovered=True (boot recovery): paused_session_id takes precedence
(more specific signal).
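
The precedence described in this list could be expressed as a single dispatch function, sketched here with hypothetical names and return labels. Only the ordering (retry first, then continuation, then boot recovery) is taken from the text above.

```python
from typing import Optional


def choose_prompt(
    retry_context: Optional[dict],
    paused_session_id: Optional[str],
    recovered: bool,
) -> str:
    """Pick which prompt shape applies to an incoming message."""
    if retry_context is not None:
        # Failure narration wins over resumption.
        if retry_context.get("silent_exit"):
            return "silent_exit_retry"
        return "failure_retry"
    if paused_session_id is not None:
        # More specific than a generic boot recovery.
        return "continuation"
    if recovered:
        return "boot_recovery"
    return "normal"
```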

## Tier

Tier 3 (`src/runtime/`) + Tier 1 (`src/capabilities/slack.md`,
`src/skills/daily-maintenance/skill.md`). Both layers: system enforces
routing; prose explains the rule and Sam's review responsibility.

## Test plan

- [x] `pytest tests/` — 159 passed (24 new in `test_ask_operator.py`, no
regressions)
- [x] `_find_active_paused_question` defended for: empty history, no bot
messages, no reaction, reaction from another user, multiple paused
(most-recent wins), missing reactions field
- [x] `_lookup_paused_session_id` defended for: missing ledger, matching
session, future-skipping, most-recent-match
- [x] SessionLedgerEntry field defaulting + plumbing
- [ ] Live: Sam mid-work calls ask_operator → 💬 appears →
operator reply → continuation runs + clears reaction

Closes the async-question class of bug. Three new tickets queued for
follow-up work (24h reminder, introspect tool, skill descriptions audit)
tracked separately.