feat(capability/slack): ACK-first rule on tool-using sessions (SAM-32) #44
Merged
Add "First reply on a tool-using task" section to `src/capabilities/slack.md`.

Closes the operator-UX gap surfaced today: on the README + image ask, the operator waited ~4 minutes seeing only 👀 + ⌛ with no signal of what Sam understood, what approach Sam was taking, or whether to redirect. By the time Sam replied, Sam had spent 11 min going down a misread "ur in gcp" hint into the wrong metadata-server token path; the operator had no window to course-correct.

The rule: any session that'll involve real work posts a substantive first reply within ~5s that restates the ask, names the approach, and commits to a follow-up. Then the work proceeds. The end-of-session substantive reply is unchanged.

Sits under "Posting to Slack" / before "Live UX" in `slack.md` so it lands on the Sam-as-coworker behavior in the right narrative spot. Distinct from the existing "Status indicator — always" rule (which only covers the status pill, not a substantive restatement of intent).

Validation is deferred to the SAM-26 eval harness: golden-set ambiguous-bucket sessions can score whether Sam ACKed and whether the ACK was substantive. If the rate is consistently low after this lands, SAM-32 has a follow-on (Option C in the ticket: daemon-driven floor).

Refs: SAM-32. Related: SAM-31 (mid-flight steering depends on the ACK existing to anchor against), SAM-26 (validation).
dembrane-sam-bot approved these changes on May 23, 2026
5 tasks
spashii added a commit that referenced this pull request on May 24, 2026:
… (v2) (#71)

## What this enables

Sam can break out of a session mid-work for genuine unknown unknowns, post a question to Slack, and exit cleanly. The operator replies whenever; the daemon detects the continuation, and Sam picks up via the audit log.

**Slack reactions are the source of truth** for paused state; no in-memory daemon map, no parallel state store, no special-case boot recovery.

## How the lifecycle works

```
Sam pauses
  → ask_operator tool posts question + adds 💬 atomically
  → session exits clean, ✅ on inbound
  → ledger entry has ask_operator_called: true

... operator replies whenever (works across daemon restarts) ...

Operator reply
  → daemon fetches thread_history (already happens)
  → finds the bot message with 💬 from this bot
  → looks up session_id from sessions.jsonl
    (filter thread_ts + ask_operator_called + ts_start strictly before post)
  → injects paused_session_id into IncomingMessage
  → calls reactions.remove on the question post (marks resolved)
  → continuation prompt fires: read audit log, apply answer, continue
```

## Consequences

- New ADK tool `ask_operator(question: str)` on the main agent only; workers/pro_executor/mentor cannot escalate to operator directly.
- Tool atomically posts question + adds `💬` (race-mitigation per your design call).
- `SessionLedgerEntry.ask_operator_called: bool` is the new ledger field the daemon's lookup correlates on.
- `_find_active_paused_question` + `_lookup_paused_session_id` replace the deleted `_paused_threads` map. Reactions on Slack + ledger on GCS both persist across daemon restarts.
- Daily-maintenance §6 handles abandoned questions (>24h): Sam enumerates 💬 via reactions.list, decides per-question whether to remind / mark abandoned / escalate. Daemon mechanical, Sam judgment.

## Composition with existing gates

- silent-exit gate (PR #68): `ask_operator` counts as a post → closes the loop for that turn.
- ack-first rule (PR #44): unaffected.
- retry / silent-exit retry: takes precedence over continuation (failure narration wins over resumption; defends against a failed pause spawning a continuation loop).
- recovered=True (boot recovery): paused_session_id takes precedence (more specific signal).

## Tier

Tier 3 (`src/runtime/`) + Tier 1 (`src/capabilities/slack.md`, `src/skills/daily-maintenance/skill.md`). Both layers: system enforces routing; prose explains the rule and Sam's review responsibility.

## Test plan

- [x] `pytest tests/`: 159 passed (24 new in `test_ask_operator.py`, no regressions)
- [x] `_find_active_paused_question` defended for: empty history, no bot messages, no reaction, reaction from another user, multiple paused (most-recent wins), missing reactions field
- [x] `_lookup_paused_session_id` defended for: missing ledger, matching session, future-skipping, most-recent-match
- [x] SessionLedgerEntry field defaulting + plumbing
- [ ] Live: Sam mid-work calls ask_operator → 💬 appears → operator reply → continuation runs + clears reaction

Closes the async-question class of bug. Three new tickets queued for follow-up work (24h reminder, introspect tool, skill descriptions audit) tracked separately.
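The two lookups named in the commit message can be sketched roughly as below. This is a minimal illustration, not the repository's code: the function signatures, the `speech_balloon` reaction name for 💬, the use of the message `user` field to identify the bot, and the numeric `ts_start` ledger field are all assumptions. Only the filter rules (bot's own 💬, most recent wins, same `thread_ts`, `ask_operator_called`, `ts_start` strictly before the question post) come from the description above.

```python
import json
from typing import Iterable, Optional

# Assumption: 💬 is stored under Slack's "speech_balloon" reaction name.
PAUSE_REACTION = "speech_balloon"


def find_active_paused_question(history: Iterable[dict], bot_user_id: str) -> Optional[dict]:
    """Return the most recent bot message that still carries the bot's own 💬."""
    paused = []
    for msg in history:
        if msg.get("user") != bot_user_id:
            continue  # not posted by this bot
        # The "reactions" key may be missing entirely, so default to [].
        for reaction in msg.get("reactions", []):
            if reaction.get("name") == PAUSE_REACTION and bot_user_id in reaction.get("users", []):
                paused.append(msg)
                break
    # Slack timestamps are decimal strings, so compare them numerically.
    return max(paused, key=lambda m: float(m["ts"]), default=None)


def lookup_paused_session_id(
    ledger_lines: Iterable[str], thread_ts: str, question_ts: str
) -> Optional[str]:
    """Return the newest matching session that started before the question post."""
    best = None
    for line in ledger_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the JSONL ledger
        entry = json.loads(line)
        if entry.get("thread_ts") != thread_ts:
            continue  # different thread
        if not entry.get("ask_operator_called", False):
            continue  # session never paused on a question
        if float(entry["ts_start"]) >= float(question_ts):
            continue  # "future" session: started at or after the question post
        if best is None or float(entry["ts_start"]) > float(best["ts_start"]):
            best = entry
    return best["session_id"] if best else None
```

Because both functions read only the thread history and the ledger lines passed in, they hold no state of their own, which is the property the commit relies on to survive daemon restarts.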
Summary
Adds a behavioral rule to `src/capabilities/slack.md`: any Sam session that'll involve real work posts a substantive first reply within ~5s that restates the ask, names the approach, and commits to a follow-up, before the long work begins.

Why now

Today's README + image ask exposed the gap concretely. The operator sent two messages, waited ~4 minutes seeing only :eyes: → :hourglass: reactions, and only then got a reply. That reply was wrong: Sam had spent 11 minutes going down a misread "ur in gcp" hint into the wrong metadata-server token path. The operator had no window to course-correct because Sam never signaled its plan.

The ACK is what closes that gap. In ~5 seconds, the operator now knows what Sam heard, what Sam plans to do, and roughly when to expect the result. If the plan is wrong, the operator can redirect before Sam burns 10 minutes acting on it.
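To make the three parts of the ACK concrete, here is a small sketch of how such a reply could be modeled and checked. Everything in it is hypothetical: the `Ack` class, the "Heard / Plan / Next" wording, and the `within_budget` helper are illustrative names, and the rule itself is prose in `slack.md`, not code. The only grounded parts are the three required elements and the ~5s target.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ack:
    """Hypothetical shape for a first reply on a tool-using task."""
    ask: str        # restates what the operator asked for
    approach: str   # names how Sam intends to do it
    follow_up: str  # commits to when or what comes next

    def render(self) -> str:
        # Illustrative layout only; the actual wording is up to Sam.
        return f"Heard: {self.ask}\nPlan: {self.approach}\nNext: {self.follow_up}"


def is_substantive(ack: Ack) -> bool:
    """An ACK counts as substantive here only if all three parts are non-empty."""
    return all(part.strip() for part in (ack.ask, ack.approach, ack.follow_up))


def within_budget(received_at: float, posted_at: float, budget_s: float = 5.0) -> bool:
    """True if the first reply landed within the ~5s target (epoch seconds)."""
    return 0.0 <= posted_at - received_at <= budget_s
```

A reply that only says "on it" would fail `is_substantive`, since it restates nothing and commits to nothing, which is the distinction the rule draws against the status pill.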
The change
One section added to `src/capabilities/slack.md`, between "Posting to Slack" and "Live UX." Sits where the Sam-as-coworker behavior is described.

The rule is explicit about what the first reply contains: it restates the ask, names the approach, and commits to a follow-up. The end-of-session substantive reply is unchanged.

The existing "Status indicator — always" rule (which sets the :thinking: pill) is unchanged. That rule covers "something is happening"; this new rule covers "here's what and roughly how long."

Validation

Deferred to the SAM-26 eval harness. Golden-set ambiguous-bucket sessions can score:

- whether Sam ACKed
- whether the ACK was substantive

If the rate is consistently low after this lands, SAM-32 has a runtime-floor fallback (Option C: daemon posts a templated ACK if Sam hasn't within N seconds).
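The scoring and the Option C fallback might look something like the sketch below. It is not taken from the SAM-26 harness or the daemon: the `acked` / `substantive` record keys, the choice to compute both rates over all sessions, and the `floor_ack_due` helper are assumptions made for illustration, and N is left as a parameter because the PR does not fix a value.

```python
from typing import Iterable, Optional, Tuple


def ack_rates(sessions: Iterable[dict]) -> Tuple[float, float]:
    """Return (ACK rate, substantive-ACK rate), both as fractions of all sessions.

    Each record is assumed to carry boolean "acked" and "substantive" keys.
    """
    total = acked = substantive = 0
    for session in sessions:
        total += 1
        if session.get("acked", False):
            acked += 1
            # A substantive ACK only counts if an ACK was posted at all.
            if session.get("substantive", False):
                substantive += 1
    if total == 0:
        return 0.0, 0.0
    return acked / total, substantive / total


def floor_ack_due(
    session_start: float,
    first_reply_at: Optional[float],
    now: float,
    floor_seconds: float,
) -> bool:
    """True when the daemon should post a templated ACK on Sam's behalf."""
    if first_reply_at is not None:
        return False  # Sam already replied, so the floor never fires
    return now - session_start >= floor_seconds
```

Keeping the floor check a pure function of timestamps means the daemon can call it on any polling tick without tracking extra state.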
Test plan
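`tool_calls/<date>.jsonl` early in the session_id's trace, not at the end.

Tier

2: prose change to `src/capabilities/`.

Refs

SAM-32. Related: SAM-31 (mid-flight steering depends on the ACK existing to anchor against), SAM-26 (validation).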
🤖 Generated with Claude Code