Merged
20 changes: 20 additions & 0 deletions memory/zo-platform/DECISION_LOG.md
@@ -1173,3 +1173,23 @@ The `--no-headlines` flag is preserved (not removed) for backwards compatibility
- Category-allowlist-only (skip blocklist) — rejected; defence-in-depth wants generic-category AND blocklist-clear AND no-blocklist-refusal.

**Outcome:** modified `src/zo/{orchestrator,experiments,cli,draft}.py`; new `src/zo/promote.py`; docs (`the-team`, `memory-and-continuity`, `overview`, `introduction`, `installation`, `demo.html`, `COMMANDS.md`, `README.md`); `.gitignore`; new `tests/unit/test_promote.py` (15) + additions to test_orchestrator/test_experiments/test_training_metrics/test_cli. **+32 tests (780 → 812 + 7 skipped), green on Python 3.11 AND 3.12, ruff `src/` clean, validate-docs 0 failures.** PR-041 added. **Deferred (audit #13/#15/#16/#17):** semantic reindex at session-end, agent failure-reporting protocol, `end_session` DECISION_LOG/PRIORS integration, `zo retrospective` CLI. Branch `claude/self-evolution`, stacked on #95.

---

## Decision: 2026-06-01T11:30:00Z
**Type:** BUGFIX
**Title:** Add startup grace + confirmation debounce to tmux session-liveness detection
**Decision:** Rewrite `LifecycleWrapper._wait_tmux` to (1) ignore negative liveness readings for the first `_STARTUP_GRACE_POLLS=2` polls and (2) require `_DEAD_CONFIRM_POLLS=2` consecutive negatives (re-checked every `_DEAD_RECHECK_INTERVAL=2.0`s) before concluding the session ended.
**Rationale:** `zo continue` in tmux mode launched the lead session correctly, then logged "Lead session completed, agent window closed" ~15ms later and killed the window — the first liveness poll fired before Claude's TUI claimed the pane (and while a one-time workspace-trust dialog could be up), and a single negative reading was treated as a real `/exit`. The launch path already polls carefully for *readiness*; the completion path must be equally skeptical about "it died."
**Alternatives considered:** (1) Pre-trust `--add-dir` directories to suppress the trust dialog — fixes one trigger, not the class (any transient still tears down). (2) Time-based grace only — breaks under mocked time in tests; chose poll-count-based grace + debounce.
**Outcome:** 53 wrapper tests pass (old complete-on-first-negative test updated + 2 regressions added), ruff clean, validate-docs 0 failures. PR-042 in PRIORS.md. Branch `claude/tmux-liveness-grace`, PR opened.

---

## Decision: 2026-06-01T12:30:00Z
**Type:** BUGFIX
**Title:** Persist bypass-permissions disclaimer acceptance to suppress Claude's startup consent dialog (root cause of the dying-session report)
**Decision:** Add `ensure_bypass_disclaimer_accepted()` to `permissions_overlay.py` (sets `bypassPermissionsModeAccepted: true` in `~/.claude.json`, idempotent, key-preserving, refuses to overwrite corrupt JSON) and call it from `_launch_tmux` and `_launch_headless` whenever `bypass_permissions` is True.
**Rationale:** `--bypass-permissions` makes Claude Code 2.1.159 show a one-time interactive consent dialog (default "No, exit"). The tmux launcher mistakes the dialog for the ready TUI (`_wait_for_tui_ready`) and pastes the lead prompt followed by Enter; the Enter confirms the default option, so Claude quits on startup and the session dies before any work is done. The lead Claude's session transcript (killed mid-Bash-call) and live instrumentation of the real launch path confirmed the mechanism; PR-042's grace/debounce had only delayed the symptom (15ms → 22s). Passing the flag IS the user's consent, so persisting it is faithful to intent.
**Alternatives considered:** (1) Screen-scrape the dialog and send "2"+Enter to accept — works but is fragile to menu-layout changes across Claude versions. (2) Rely on PR-042 grace/debounce alone — rejected: it never stops Claude from exiting, only delays detection. (3) Use `--dangerously-skip-permissions` CLI flag in tmux — historically exits immediately in interactive mode.
**Outcome:** End-to-end verified (flag removed → wrapper re-set it → Claude alive 32s, past the old 22s death). 53 wrapper + 16 overlay tests pass, ruff clean, validate-docs 0 failures. PR-043 added. Stacked on PR-042 in PR #97.
55 changes: 55 additions & 0 deletions memory/zo-platform/PRIORS.md
@@ -1298,3 +1298,58 @@ For PR #92 the corrective sequence was: identify the actual CI gates by reading
### Verified Solution

Seed (`_maybe_seed_priors`), load (`_prompt_memory` injects project priors), write (`_record_learning` on loop DEAD_END/PLATEAU via `EvolutionEngine.record_failure` + `append_prior`), promote (`src/zo/promote.py` fail-closed sanitizer + `zo learnings promote`). 4 `log_error(message=)` → `description=` fixes + a forced-failure-branch regression test. New `tests/unit/test_promote.py` (15 adversarial cases) + seed/load/write/bug-fix tests. +32 tests (780 → 812, both Python 3.11 & 3.12). **Cross-reference:** PR-009 (built ≠ wired), PR-005/PR-035/PR-040 (enforcement over aspiration), PR-024/PR-030 (confidentiality — the promotion sanitizer reuses the validate-docs client blocklist).

---

## PR-042: tmux Session-Liveness Detection Must Have a Startup Grace + Confirmation Debounce — A Single Instant Reading Must Not Tear Down a Healthy Session

**Source:** Session 036 (2026-06-01), prod project continue run
**Root cause category:** incomplete_rule (fragile liveness detection)

**Failure:** `zo continue --repo . --bypass-permissions` launched the lead session in tmux ("TUI ready after 5s", "Launched lead session in tmux pane=…") and then logged "Lead session completed, agent window closed" **~15ms later**, closing immediately with no visible Claude window. An earlier same-day run with the same binary had run for 4+ hours, so it was not a version/config issue. Root cause: `LifecycleWrapper._wait_tmux` polled liveness with **zero grace period and zero debounce** — its first poll fired milliseconds after launch (before Claude's TUI had fully claimed the pane, and while a one-time workspace-trust dialog for the newly `--add-dir`'d delivery repo could still be up), and a *single* negative reading (`not pane_exists or not claude_running`) immediately concluded the session ended and killed the window. Any transient — startup race, trust dialog, momentary `tmux display-message` hiccup, brief foreground flip — looked identical to a real `/exit`.

### Rules

1. **Liveness/teardown decisions driven by polling an external async process need a startup grace window — the first poll often fires before the thing being watched has finished starting.** The launch path already polled carefully for *readiness* (`_wait_for_tui_ready` requires stable content) but the *completion* path trusted an instantaneous first reading. Asymmetric care is the bug: be at least as skeptical about "it died" as about "it's ready."
   - **Why:** `_wait_tmux`'s first iteration runs ~15ms after `_launch_tmux` returns; `pane_current_command` and pane existence are not reliably settled that early, and a trust dialog can transiently alter what tmux reports.
   - **How to apply:** ignore negative readings for the first `_STARTUP_GRACE_POLLS` polls; never conclude "exited" inside the grace window.

2. **Never tear down on a single negative sample — require consecutive confirmations (debounce).** A momentary subprocess-query failure or a one-frame foreground flip is not an exit. Demand `_DEAD_CONFIRM_POLLS` consecutive negatives, and re-check on a short interval (`_DEAD_RECHECK_INTERVAL`) so a genuine exit is still detected promptly.
   - **How to apply:** maintain a `consecutive_dead` counter that resets on any positive reading; only conclude when it reaches the threshold. Keep responsiveness by shortening the sleep while a suspected exit is being confirmed.

3. **Treat a tmux/subprocess query failure as "unknown," not "dead."** The debounce already absorbs this, but it is the explicit reasoning: `tmux` returning non-zero / timing out means "I couldn't tell," which must not be coerced into a teardown.
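
Rule 3 can be illustrated with a tri-state probe. This is a minimal sketch, not the wrapper's actual code: `Liveness`, `classify_pane`, the injected `run` callable, and the `expected="claude"` command name are all hypothetical names chosen for illustration. The only real external piece is `tmux display-message -p -t <pane> '#{pane_current_command}'`.

```python
import enum
import subprocess
from typing import Callable, Sequence


class Liveness(enum.Enum):
    ALIVE = "alive"
    DEAD = "dead"
    UNKNOWN = "unknown"  # "I couldn't tell"; must never trigger teardown by itself


# Hypothetical seam so the probe can be exercised without a real tmux server.
Runner = Callable[[Sequence[str]], subprocess.CompletedProcess]


def _run(cmd: Sequence[str]) -> subprocess.CompletedProcess:
    return subprocess.run(cmd, capture_output=True, text=True, timeout=5)


def classify_pane(pane_id: str, run: Runner = _run, expected: str = "claude") -> Liveness:
    """Classify a pane as ALIVE, DEAD, or UNKNOWN.

    `expected` is an assumption about what tmux reports as the foreground
    command while the session is healthy; adjust it to the real value.
    """
    cmd = ["tmux", "display-message", "-p", "-t", pane_id, "#{pane_current_command}"]
    try:
        result = run(cmd)
    except (subprocess.TimeoutExpired, OSError):
        return Liveness.UNKNOWN  # query timed out or tmux could not be executed
    if result.returncode != 0:
        return Liveness.UNKNOWN  # tmux answered with an error, not with a fact
    current = result.stdout.strip()
    if not current:
        return Liveness.UNKNOWN  # empty answer carries no information
    return Liveness.ALIVE if expected in current else Liveness.DEAD
```

A caller would feed only `DEAD` readings into the confirmation counter. Whether a long run of `UNKNOWN` readings should eventually count as an exit is a separate policy decision that this sketch leaves open.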

### Verified Solution

`_wait_tmux` rewritten with `_STARTUP_GRACE_POLLS=2` (ignore negatives for ~first 20s at the 10s default interval), `_DEAD_CONFIRM_POLLS=2` (consecutive negatives required), and `_DEAD_RECHECK_INTERVAL=2.0` (fast re-poll while confirming). The old test that asserted complete-on-first-negative was updated to expect confirmed-exit-on-4th-poll; added `test_tmux_wait_ignores_negative_during_startup_grace` (regression for the 15ms teardown) and `test_tmux_wait_debounces_transient_negative`. 53 wrapper tests pass, ruff clean. **Cross-reference:** PR-040/PR-041 family is about wiring/enforcement; this one is about *robustness of the monitoring loop itself* — the watcher must not be more fragile than the thing it watches.
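
The grace-plus-debounce loop described above can be sketched as follows. This is an illustrative reconstruction from the constants named in this entry, not the real `_wait_tmux`: the function name `wait_until_exited`, the `is_alive` callable, and the injectable `sleep` are assumptions made so the logic can run under mocked time.

```python
import time
from typing import Callable

STARTUP_GRACE_POLLS = 2      # negatives ignored for the first N polls
DEAD_CONFIRM_POLLS = 2       # consecutive negatives required to conclude "exited"
DEAD_RECHECK_INTERVAL = 2.0  # seconds between polls while confirming an exit


def wait_until_exited(
    is_alive: Callable[[], bool],
    poll_interval: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Block until the watched session is confirmed dead; return the poll count."""
    polls = 0
    consecutive_dead = 0
    while True:
        polls += 1
        if is_alive():
            consecutive_dead = 0  # any positive reading resets the debounce
        elif polls > STARTUP_GRACE_POLLS:
            consecutive_dead += 1
            if consecutive_dead >= DEAD_CONFIRM_POLLS:
                return polls
        # Inside the grace window a negative reading is ignored entirely.
        # Poll faster while a suspected exit is being confirmed.
        sleep(DEAD_RECHECK_INTERVAL if consecutive_dead else poll_interval)
```

Because the grace window is counted in polls rather than wall-clock time, a test can pass `sleep=recorded.append` and a scripted sequence of readings. With readings that are negative from the start, the sketch returns on the 4th poll, matching the updated test expectation described above.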

---

## PR-043: --bypass-permissions Triggers Claude's Startup Consent Dialog Which the tmux Launcher Dismisses as "No, exit" — Persist the Acceptance Flag

**Source:** Session 036 (2026-06-01), prod project continue run (root cause of the PR-042 symptom)
**Root cause category:** novel_case (CLI dialog interaction) + incomplete_rule (PR-042 treated the symptom)

**Failure:** `zo continue --repo . --bypass-permissions` launched the lead session and it died seconds later with no usable Claude window — repeatedly. PR-042 (startup grace + debounce in `_wait_tmux`) only *delayed* the death from ~15ms to ~22s; it never addressed why Claude was leaving. The lead Claude's own session transcript (`~/.claude/projects/-home-sam-zero-operators/<id>.jsonl`) was the key evidence: Claude read the plan, ran a Bash tool call, then the transcript **stopped mid-work** — it was *killed*, not crashed. Instrumenting the real launch path (overlay applied, real prompt, both `--add-dir`s) showed `pane_current_command='bash'` at t~0s and a captured pane containing:
```
WARNING: Claude Code running in Bypass Permissions mode
❯ 1. No, exit
2. Yes, I accept
Enter to confirm · Esc to cancel
```
**Mechanism:** `apply_bypass_overlay` writes `permissions.defaultMode: "bypassPermissions"`. On startup in that mode Claude Code 2.1.159 shows a one-time interactive consent dialog whose default-highlighted option is **"1. No, exit"**. `_wait_for_tui_ready` mistakes the dialog for the ready TUI (it is ~1231 chars and stable, the exact "1231 chars" seen in the field logs), so the launcher pastes the 24KB lead prompt and presses Enter → confirms the default → **Claude quits on startup** → pane falls back to `bash` → `_wait_tmux` correctly sees "not claude" and tears down.

The failure was intermittent because Claude remembers acceptance once given (`bypassPermissionsModeAccepted` in `~/.claude.json`); the field machine had never accepted it.

The misleading "Session summary: I don't see a prior agent session… fresh start" was a *separate* red herring: `_generate_session_summary` runs `claude -p --model haiku` over the buffered comms events, and with the session killed before emitting any, Haiku summarized an empty buffer.

### Rules

1. **When you drive an interactive CLI by pasting into its TTY, any startup dialog you didn't anticipate will silently eat your input — and the default action is often the destructive one ("exit").** A "ready" heuristic based on "screen has stabilized with N chars" cannot tell a consent dialog from an input prompt. Suppress known dialogs at the source rather than relying on the paste landing in the right place.
   - **Why:** `_wait_for_tui_ready` keys off pane content length/stability; a modal dialog satisfies it. The pasted prompt + Enter then drives whatever modal is focused.
   - **How to apply:** for every mode/flag ZO passes to Claude, enumerate the startup dialogs it can trigger (trust, bypass-consent) and pre-satisfy them via config (`hasTrustDialogAccepted`, `bypassPermissionsModeAccepted` in `~/.claude.json`) before launch. Passing `--bypass-permissions` IS the user's consent, so persisting it is faithful to intent, not a security bypass.

2. **A fix that makes a failure slower is not a fix — confirm the mechanism, don't just stop the bleeding.** PR-042's grace+debounce looked plausible and shipped green, but the session still died (just later). The mechanism was only found by reading Claude's *own* session transcript and instrumenting the real launch path (not a synthetic one) — synthetic repros missed it because they lacked the bypass overlay, so no dialog appeared.
   - **How to apply:** when a fix doesn't resolve a user-reported failure, treat the new symptom timing as a clue and reproduce the *exact* production path (same flags, same overlay, same prompt). Read the subprocess's own logs/transcripts. PR-042 keeps its value as defense-in-depth, but the root cause was elsewhere.

3. **Read-modify-write of a shared user config must preserve all keys and refuse to clobber a corrupt file.** `ensure_bypass_disclaimer_accepted` loads `~/.claude.json`, sets only the one flag, rewrites preserving everything, is idempotent, and bails (no write) if the JSON is unreadable rather than destroying a 700-startup user profile.

### Verified Solution

`ensure_bypass_disclaimer_accepted(config_path=None)` in `permissions_overlay.py` sets `bypassPermissionsModeAccepted: true` in `~/.claude.json` (idempotent, preserves all keys, refuses to overwrite corrupt JSON). Called from `_launch_tmux` (logs a checkpoint when newly set) and `_launch_headless` whenever `bypass_permissions` is True. **End-to-end verified:** with the flag removed to simulate a fresh machine, a full wrapper launch re-set it and Claude stayed alive 16/16 polls (32s, past the old 22s death). +4 overlay tests (69 overlay+wrapper tests pass), ruff clean. Ships in PR #97 alongside PR-042. **Cross-reference:** PR-001 (Claude CLI interactive-mode constraints — same family: the TUI has launch-time interaction quirks the wrapper must handle), PR-042 (the grace/debounce defense-in-depth this supersedes as root cause).
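
The read-modify-write contract described here (idempotent, key-preserving, no write on corrupt JSON) might look roughly like the sketch below. It is not the code in `permissions_overlay.py`: the name `ensure_flag_accepted`, the boolean return value, the create-if-missing behaviour, and the atomic temp-file write are all assumptions added for illustration.

```python
import json
import os
import tempfile
from pathlib import Path

FLAG = "bypassPermissionsModeAccepted"


def ensure_flag_accepted(config_path: Path, flag: str = FLAG) -> bool:
    """Set `flag` to true in a JSON config. Return True only if a write happened."""
    if config_path.exists():
        try:
            data = json.loads(config_path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            return False  # unreadable or corrupt: refuse to clobber it
        if not isinstance(data, dict):
            return False  # valid JSON but not an object: leave it alone
    else:
        data = {}  # assumption: a missing file is treated as an empty profile

    if data.get(flag) is True:
        return False  # already accepted: idempotent, no write

    data[flag] = True  # touch only this key; every other key is preserved

    # Assumption: write via a temp file plus os.replace so a crash mid-write
    # cannot leave a truncated config behind.
    fd, tmp = tempfile.mkstemp(dir=config_path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(data, fh, indent=2)
        os.replace(tmp, config_path)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    return True
```

Returning whether the flag was newly set gives the caller a natural hook for the "logs a checkpoint when newly set" behaviour mentioned above.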