
feat(capability/slack): ACK-first rule on tool-using sessions (SAM-32)#44

Merged
dembrane-sam-bot merged 1 commit into main from sam/sam-32-ack-first on May 23, 2026
spashii commented May 23, 2026

Summary

Adds a behavioral rule to src/capabilities/slack.md: any Sam session that will involve real (tool-using) work posts a substantive first reply within ~5s that restates the ask, names the approach, and commits to a follow-up, all before the long work begins.

Why now

Today's README + image ask exposed the gap concretely. The operator sent two messages, waited ~4 minutes seeing only :eyes: and :hourglass: reactions, and only then got a reply. That reply was wrong: Sam had spent 11 minutes acting on a misread "ur in gcp" hint, which sent it down the wrong metadata-server token path. The operator had no window to course-correct because Sam never signaled its plan.

The ACK is what closes that gap:

"Got it — drafting PR against Dembrane/sam with the image under docs/, back in a few minutes."

In ~5 seconds, the operator now knows what Sam heard, what Sam plans to do, and roughly when to expect the result. If the plan is wrong, the operator can redirect before Sam burns 10 minutes acting on it.

The change

One section added to src/capabilities/slack.md, between "Posting to Slack" and "Live UX". It sits where the Sam-as-coworker behavior is described.

The rule is explicit about:

  • Three parts: restate ask, name approach, commit follow-up.
  • Not preamble: "Sure!" / "Got it!" alone don't count — the ACK earns its place via information content (restated understanding + plan).
  • Skip conditions: replies that finish in under 10s, one-tool read-and-respond turns, and non-tool-using replies (a reaction suffices).
  • The ACK is a commitment to a direction, not a contract — if work surfaces something different, the end-of-session reply can say "I switched to Y because…" honestly.
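The rule itself is prose guidance in slack.md, not code, but its decision logic is small enough to sketch. The snippet below is a hypothetical illustration only: `PlannedTurn`, `needs_ack`, `compose_ack`, and the exact ACK wording are made-up names, assuming Sam can estimate a turn's tool use and duration up front. It encodes the three skip conditions and refuses to build an ACK that is missing any of the three required parts.

```python
from dataclasses import dataclass


@dataclass
class PlannedTurn:
    """Hypothetical summary of what Sam expects a turn to involve."""
    uses_tools: bool
    expected_tool_calls: int
    expected_seconds: float


def needs_ack(turn: PlannedTurn) -> bool:
    """Apply the skip conditions; any turn that survives them gets an ACK."""
    if not turn.uses_tools:
        return False  # non-tool-using reply: a reaction suffices
    if turn.expected_seconds < 10:
        return False  # the real reply arrives about as fast as an ACK would
    if turn.expected_tool_calls <= 1:
        return False  # one-tool read-and-respond
    return True


def compose_ack(restated_ask: str, approach: str, follow_up: str) -> str:
    """Build an ACK from its three parts; bare preamble ("Sure!") is rejected."""
    parts = [restated_ask.strip(), approach.strip(), follow_up.strip()]
    if not all(parts):
        raise ValueError("an ACK needs a restated ask, an approach, and a follow-up")
    return f"Got it: {parts[0]}. Plan: {parts[1]}. {parts[2]}."
```

The ACK carries information rather than politeness: the validation in `compose_ack` is the code analogue of "Got it!" alone not counting.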

The existing "Status indicator — always" rule (which sets the :thinking: pill) is unchanged. That rule signals "something is happening"; this new rule says "here's what, and roughly how long."

Validation

Deferred to the SAM-26 eval harness. Golden-set ambiguous-bucket sessions can score:

  1. Did Sam post an ACK within 10s? (binary)
  2. Did the ACK restate the ask? (0–3)
  3. Did the ACK name an approach? (0–3)
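One way the harness could record and aggregate these three criteria is sketched below. It assumes each golden-set session is scored into a single record; `AckScore` and `summarize` are hypothetical names, and averaging the 0–3 rubric scores only over sessions that actually ACKed is a design choice of this sketch, not something SAM-26 specifies.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class AckScore:
    """One golden-set session scored on the three criteria above."""
    acked_within_10s: bool  # criterion 1 (binary)
    restated_ask: int       # criterion 2 (0-3)
    named_approach: int     # criterion 3 (0-3)

    def __post_init__(self) -> None:
        for value in (self.restated_ask, self.named_approach):
            if not 0 <= value <= 3:
                raise ValueError("rubric scores must be in 0..3")


def summarize(scores: list[AckScore]) -> dict[str, float]:
    """ACK rate over all sessions, plus mean rubric scores over ACKed ones."""
    if not scores:
        raise ValueError("no sessions scored")
    acked = [s for s in scores if s.acked_within_10s]
    return {
        "ack_rate": len(acked) / len(scores),
        "mean_restated": float(mean(s.restated_ask for s in acked)) if acked else 0.0,
        "mean_approach": float(mean(s.named_approach for s in acked)) if acked else 0.0,
    }
```

A consistently low `ack_rate` here is the signal that would trigger the fallback described next.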

If the rate is consistently low after this lands, SAM-32 has a runtime-floor fallback (Option C — daemon posts a templated ACK if Sam hasn't within N seconds).
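A minimal sketch of what an Option C floor could look like, assuming the daemon gets an `asyncio.Event` that is set when Sam posts its own ACK and an async callable that posts to the thread. Both hooks, the template text, and the function name are hypothetical; the ticket only says the daemon posts a templated ACK if Sam hasn't within N seconds.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical template; the real wording would be decided on the ticket.
TEMPLATE_ACK = "Got it, working on this now. I'll follow up in this thread when done."


async def ack_floor(
    sam_acked: asyncio.Event,
    post_to_thread: Callable[[str], Awaitable[None]],
    floor_seconds: float,
) -> bool:
    """Post a templated ACK if Sam has not ACKed within floor_seconds.

    Returns True if the daemon had to post the fallback.
    """
    try:
        await asyncio.wait_for(sam_acked.wait(), timeout=floor_seconds)
        return False  # Sam ACKed in time; the daemon stays quiet
    except asyncio.TimeoutError:
        await post_to_thread(TEMPLATE_ACK)
        return True
```

The floor only guarantees that something is said; it cannot restate the ask or name an approach, which is why the prose rule stays the primary mechanism.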

Test plan

  • Live trigger after merge: a real tool-using Slack mention to Sam should produce an ACK within 10s, then the substantive reply at session close.
  • Audit log check: the ACK Slack post lands in tool_calls/<date>.jsonl early in the session_id's trace, not at the end.
  • Conversational check: a non-tool-using reply ("thanks" → reaction) should not produce an ACK.
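The audit-log check could be scripted roughly as follows. This sketch assumes each line of tool_calls/<date>.jsonl is a JSON object with `session_id` and `tool` fields and that Slack posts appear under a tool named `slack_post`; all three names are placeholders, not the actual log schema.

```python
import json
from typing import Iterable, Optional


def ack_position(lines: Iterable[str], session_id: str,
                 post_tool: str = "slack_post") -> Optional[tuple[int, int]]:
    """Return (index of the first Slack post, total calls) for one session,
    or None if the session never posted."""
    calls = [rec for rec in (json.loads(line) for line in lines if line.strip())
             if rec.get("session_id") == session_id]
    for i, rec in enumerate(calls):
        if rec.get("tool") == post_tool:
            return i, len(calls)
    return None


def ack_is_early(lines: Iterable[str], session_id: str, max_index: int = 2) -> bool:
    """True if the first post is among the first few calls and is not the last one."""
    pos = ack_position(lines, session_id)
    return pos is not None and pos[0] <= max_index and pos[0] < pos[1] - 1
```

The "not the last call" condition is what separates an ACK from the end-of-session reply.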

Tier

2 — prose change to src/capabilities/.

Refs

  • Closes: SAM-32
  • Related: SAM-31 (mid-flight steering depends on the ACK existing to anchor against)
  • Related: SAM-26 (validation harness)

🤖 Generated with Claude Code

Add "First reply on a tool-using task" section to src/capabilities/slack.md.

Closes the operator-UX gap surfaced today: on the README + image ask, the
operator waited ~4 minutes seeing only 👀 + ⌛ with no signal
of what Sam understood, what approach Sam was taking, or whether to
redirect. By the time Sam replied, it had spent 11 min acting on a
misread "ur in gcp" hint that led down the wrong metadata-server token
path, and the operator had had no window to course-correct.

The rule: any session that'll involve real work posts a substantive first
reply within ~5s that restates the ask, names the approach, and commits
to a follow-up. Then the work proceeds. The end-of-session substantive
reply is unchanged.

Sits under "Posting to Slack" / before "Live UX" in slack.md so it lands
on the Sam-as-coworker behavior in the right narrative spot. Distinct
from the existing "Status indicator — always" rule (which only covers the
status pill, not a substantive restatement of intent).

Validation is deferred to the SAM-26 eval harness — golden-set
ambiguous-bucket sessions can score whether Sam ACKed and whether the
ACK was substantive. If the rate is consistently low after this lands,
SAM-32 has a follow-on (Option C in the ticket — daemon-driven floor).

Refs: SAM-32. Related: SAM-31 (mid-flight steering depends on the ACK
existing to anchor against), SAM-26 (validation).
linear Bot commented May 23, 2026

SAM-32

dembrane-sam-bot self-requested a review May 23, 2026 14:43
dembrane-sam-bot added this pull request to the merge queue May 23, 2026
Merged via the queue into main with commit ebec7ce May 23, 2026
2 checks passed
dembrane-sam-bot deleted the sam/sam-32-ack-first branch May 23, 2026 14:47
spashii added a commit that referenced this pull request May 24, 2026
… (v2) (#71)

## What this enables

Sam can break out of a session mid-work when it hits a genuine unknown
unknown: post a question to Slack and exit cleanly. The operator replies
whenever; the daemon detects the continuation and Sam picks up via the
audit log. **Slack reactions are the source of truth** for paused state:
no in-memory daemon map, no parallel state store, no special-case boot recovery.

## How the lifecycle works

```
Sam pauses → ask_operator tool posts question + adds 💬 atomically
            → session exits clean, ✅ on inbound
            → ledger entry has ask_operator_called: true

  ... operator replies whenever (works across daemon restarts) ...

Operator reply → daemon fetches thread_history (already happens)
              → finds the bot message with 💬 from this bot
              → looks up session_id from sessions.jsonl
                 (filter thread_ts + ask_operator_called + ts_start strictly before post)
              → injects paused_session_id into IncomingMessage
              → calls reactions.remove on the question post (marks resolved)
              → continuation prompt fires: read audit log, apply answer, continue
```
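The daemon side of this lifecycle can be sketched as two pure functions over data the daemon already has. These are illustrative stand-ins for `_find_active_paused_question` and `_lookup_paused_session_id`, not their actual code. The sketch assumes thread history comes as Slack message dicts (`ts`, `user`, optional `reactions` with `name` and `users`), that 💬 is Slack's `speech_balloon` emoji, and that ledger rows are dicts carrying `session_id`, `thread_ts`, `ask_operator_called`, and a numeric-string `ts_start`.

```python
from typing import Any, Optional

PAUSE_REACTION = "speech_balloon"  # Slack's name for the 💬 emoji


def find_active_paused_question(history: list[dict[str, Any]],
                                bot_user_id: str) -> Optional[dict[str, Any]]:
    """Return the most recent message from this bot that still carries a 💬
    reaction added by this bot, or None."""
    for msg in sorted(history, key=lambda m: float(m["ts"]), reverse=True):
        if msg.get("user") != bot_user_id:
            continue  # not our post
        for reaction in msg.get("reactions", []):  # field may be missing
            if reaction.get("name") == PAUSE_REACTION and bot_user_id in reaction.get("users", []):
                return msg
    return None


def lookup_paused_session_id(ledger: list[dict[str, Any]], thread_ts: str,
                             question_ts: str) -> Optional[str]:
    """Most recent session in this thread that called ask_operator and
    started strictly before the question was posted."""
    candidates = [
        e for e in ledger
        if e.get("thread_ts") == thread_ts
        and e.get("ask_operator_called")
        and float(e["ts_start"]) < float(question_ts)  # skip "future" sessions
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda e: float(e["ts_start"]))["session_id"]
```

Because both inputs (Slack reactions and the ledger) persist outside the daemon's memory, the same two calls work unchanged after a restart.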

## Consequences

- New ADK tool `ask_operator(question: str)` on the main agent only —
workers/pro_executor/mentor cannot escalate to operator directly.
- Tool atomically posts question + adds `💬`
(race-mitigation per your design call).
- `SessionLedgerEntry.ask_operator_called: bool` is the new ledger field
the daemon's lookup correlates on.
- `_find_active_paused_question` + `_lookup_paused_session_id` replace
the deleted `_paused_threads` map. Reactions on Slack + ledger on GCS
both persist across daemon restarts.
- Daily-maintenance §6 handles abandoned questions (>24h) — Sam
enumerates 💬 via reactions.list, decides per-question
whether to remind / mark abandoned / escalate. Daemon mechanical, Sam
judgment.
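The posting half of `ask_operator` and the daily-maintenance staleness scan might look roughly like this. The client only needs methods named like slack_sdk's `WebClient` (`chat_postMessage`, `reactions_add`, `chat_delete`), so a fake can stand in for tests. Two loud assumptions: the delete-on-failure rollback is this sketch's way of approximating "atomically" and is not taken from the PR, and `stale_questions` merely flags candidates from `reactions.list`-style items, leaving remind / abandon / escalate to Sam.

```python
import time
from typing import Any, Optional, Protocol

PAUSE_REACTION = "speech_balloon"  # 💬
ABANDON_AFTER_SECONDS = 24 * 60 * 60


class SlackClient(Protocol):
    """The subset of slack_sdk.WebClient methods this sketch relies on."""
    def chat_postMessage(self, *, channel: str, text: str, thread_ts: str) -> Any: ...
    def reactions_add(self, *, channel: str, name: str, timestamp: str) -> Any: ...
    def chat_delete(self, *, channel: str, ts: str) -> Any: ...


def ask_operator(client: SlackClient, channel: str, thread_ts: str, question: str) -> str:
    """Post the question and mark it with 💬 as one unit; return its ts."""
    resp = client.chat_postMessage(channel=channel, text=question, thread_ts=thread_ts)
    ts = resp["ts"]
    try:
        client.reactions_add(channel=channel, name=PAUSE_REACTION, timestamp=ts)
    except Exception:
        # Assumed rollback: never leave a question without its paused marker.
        client.chat_delete(channel=channel, ts=ts)
        raise
    return ts


def stale_questions(items: list[dict[str, Any]], now: Optional[float] = None) -> list[str]:
    """Return ts values of 💬-marked messages older than 24h."""
    now = time.time() if now is None else now
    stale = []
    for item in items:
        msg = item.get("message", {})
        names = {r.get("name") for r in msg.get("reactions", [])}
        if PAUSE_REACTION in names and now - float(msg["ts"]) > ABANDON_AFTER_SECONDS:
            stale.append(msg["ts"])
    return stale
```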

## Composition with existing gates

- silent-exit gate (PR #68): `ask_operator` counts as a post → closes
the loop for that turn.
- ack-first rule (PR #44): unaffected.
- retry / silent-exit retry: takes precedence over continuation (failure
narration wins over resumption — defends against a failed pause spawning
a continuation loop).
- recovered=True (boot recovery): paused_session_id takes precedence
(more specific signal).
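The precedence order above reduces to a short decision function. This is a hypothetical sketch: `InboundSignals` is a stand-in for the relevant fields of the daemon's `IncomingMessage`, `is_retry` is an invented flag covering both retry kinds, and the returned mode strings are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InboundSignals:
    """Hypothetical subset of the inbound-message fields that can overlap."""
    is_retry: bool = False                   # retry / silent-exit retry
    paused_session_id: Optional[str] = None  # set when a paused question is answered
    recovered: bool = False                  # boot recovery


def choose_prompt_mode(sig: InboundSignals) -> str:
    """Resolve overlapping signals in the order listed above."""
    if sig.is_retry:
        return "retry"         # failure narration wins over resumption
    if sig.paused_session_id is not None:
        return "continuation"  # more specific than boot recovery
    if sig.recovered:
        return "recovered"
    return "fresh"
```

Checking retry first is what prevents a failed pause from spawning a continuation loop.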

## Tier

Tier 3 (`src/runtime/`) + Tier 1 (`src/capabilities/slack.md`,
`src/skills/daily-maintenance/skill.md`). Both layers: system enforces
routing; prose explains the rule and Sam's review responsibility.

## Test plan

- [x] `pytest tests/` — 159 passed (24 new in `test_ask_operator.py`, no
regressions)
- [x] `_find_active_paused_question` defended for: empty history, no bot
messages, no reaction, reaction from another user, multiple paused
(most-recent wins), missing reactions field
- [x] `_lookup_paused_session_id` defended for: missing ledger, matching
session, future-skipping, most-recent-match
- [x] SessionLedgerEntry field defaulting + plumbing
- [ ] Live: Sam mid-work calls ask_operator → 💬 appears →
operator reply → continuation runs + clears reaction

Closes the async-question class of bug. Three new tickets queued for
follow-up work (24h reminder, introspect tool, skill descriptions audit)
tracked separately.
