achieve: fix #0184 -- separate stdout/stderr in pick_session_role (#179)

Merged: fazxes merged 2 commits into main from achieve/fix-0184-pick-role-stderr-separation on Apr 7, 2026
fazxes commented Apr 7, 2026

Summary

  • Fixes pentest watch item #184: pick_session_role() in daemon.sh previously merged stdout and stderr from pick-role.py via 2>&1, then used tail -1 to extract SESSION_ROLE. Any unexpected stdout line (atexit handler, uncaught exception after sys.exit, future library print) would corrupt SESSION_ROLE to garbage, silently falling back to build for all subsequent cycles and losing OVERSEE, STRATEGIZE, and ACHIEVE scheduling until a human debugged it.
  • pick-role.py already separates concerns: role name → stdout, reasoning log → stderr. The fix redirects stderr to a temporary file created with mktemp, so the captured stdout contains only the role name.
  • 2 new contract tests in TestPickSessionRoleStderrSeparation.
  • Also: autonomy score corrected from 81 → 85 (task-gen and healer rescored 3→5 with evidence).
  • False positive documented: #185 is not a real bug. _require_int() in config.py rejects a non-integer eval_frequency before it reaches the shell arithmetic.
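
To make the failure mode concrete, here is a minimal Python sketch (not the daemon's actual shell code). The child script is a hypothetical stand-in for pick-role.py, and the role string and log lines are illustrative assumptions. It compares taking the last line of a merged stream against taking the last line of stdout alone:

```python
import subprocess
import sys
from typing import Tuple

# Hypothetical stand-in for pick-role.py: reasoning goes to stderr, the role
# name goes to stdout, and one more stderr line is emitted at interpreter exit.
CHILD = r"""
import atexit, sys
atexit.register(lambda: print("late log line from atexit", file=sys.stderr))
print("reasoning: eval is stale", file=sys.stderr)
print("oversee", flush=True)
"""


def last_line(text: str) -> str:
    """Rough equivalent of `tail -1` on captured text."""
    lines = text.strip().splitlines()
    return lines[-1] if lines else ""


def pick_role_merged() -> str:
    """Mimics `2>&1 | tail -1`: stderr is folded into the same stream as stdout."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        check=True,
    )
    return last_line(proc.stdout)


def pick_role_separated() -> Tuple[str, str]:
    """Keeps the streams apart: the role comes from stdout, the log from stderr."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD],
        capture_output=True,
        text=True,
        check=True,
    )
    return last_line(proc.stdout), proc.stderr


if __name__ == "__main__":
    print("merged    ->", repr(pick_role_merged()))
    role, log = pick_role_separated()
    print("separated ->", repr(role))
```

In the merged case the last line is the late log line rather than the role, which is the kind of value that would fall through to the build default.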

Changes

  • scripts/daemon.sh: pick_session_role() uses mktemp instead of 2>&1
  • tests/test_nightshift.py: 2 new contract tests
  • docs/autonomy/2026-04-07.md: Autonomy report with score, false-positive analysis
  • docs/changelog/v0.0.8.md: Entry for this fix
  • docs/learnings/2026-04-07-stdout-stderr-separation-role-selection.md: Learning
  • docs/handoffs/LATEST.md: Handoff for next session

Test plan

  • make check passes: 1134 tests, all checks
  • python3 -m nightshift run --dry-run --agent codex OK
  • python3 -m nightshift run --dry-run --agent claude OK
  • test_pick_session_role_does_not_merge_stderr_into_stdout passes
  • test_pick_session_role_captures_stderr_via_temp_file passes
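
For readers unfamiliar with static contract tests, the sketch below shows one way such checks could look. It is an assumption-laden illustration, not the tests added in this PR: DAEMON_SNIPPET is a hypothetical excerpt standing in for pick_session_role() (the real function body is not shown here), and function_body() is a helper invented for the example.

```python
import re

# Hypothetical excerpt standing in for pick_session_role() in daemon.sh.
# The variable name _pick_stderr comes from the PR text; the rest is assumed.
DAEMON_SNIPPET = r'''
pick_session_role() {
    local _pick_stderr
    _pick_stderr=$(mktemp) || _pick_stderr="/dev/null"
    SESSION_ROLE=$(python3 pick-role.py 2>"$_pick_stderr" | tail -1)
    cat "$_pick_stderr"
    [ "$_pick_stderr" != "/dev/null" ] && rm -f "$_pick_stderr"
}
'''


def function_body(script: str, name: str) -> str:
    """Return the text between `name() {` and the closing `}` at column 0."""
    pattern = rf"^{re.escape(name)}\(\)\s*\{{(.*?)^\}}"
    match = re.search(pattern, script, re.DOTALL | re.MULTILINE)
    if match is None:
        raise ValueError(f"function {name!r} not found")
    return match.group(1)


def test_does_not_merge_stderr_into_stdout() -> None:
    body = function_body(DAEMON_SNIPPET, "pick_session_role")
    # The contract: stderr must never be folded into the captured stdout.
    assert "2>&1" not in body


def test_captures_stderr_via_temp_file() -> None:
    body = function_body(DAEMON_SNIPPET, "pick_session_role")
    # The contract: a temp file is created and stderr is redirected into it.
    assert "mktemp" in body
    assert re.search(r'2>\s*"\$_pick_stderr"', body)


if __name__ == "__main__":
    test_does_not_merge_stderr_into_stdout()
    test_captures_stderr_via_temp_file()
    print("contract checks passed")
```

Checks like these only inspect the script text, so they catch a regression to 2>&1 but do not prove the runtime behavior.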

fazxes added 2 commits April 6, 2026 21:51
Previously daemon.sh:83 merged pick-role.py stdout+stderr with 2>&1 and
used tail -1 to extract SESSION_ROLE.  Any unexpected stdout line (atexit
handler, uncaught exception after sys.exit, future library print) would
cause tail -1 to return the wrong line, silently corrupting SESSION_ROLE
to an unknown string that falls through to the build default -- locking
the daemon into BUILD-only mode indefinitely with no error.

pick-role.py already writes reasoning to stderr and the role name to
stdout.  The fix honors that separation: mktemp captures stderr; stdout
is captured cleanly; cat prints the reasoning log in controlled order.

Also includes autonomy score correction: 81 -> 85 (task-gen and healer
rescored 3->5 with evidence from session index and healer log).

2 new contract tests in TestPickSessionRoleStderrSeparation.
…llow-up)

Code review (PR #179) flagged two advisory items:
1. mktemp failure under set -uo pipefail would abort the daemon; add
   || _pick_stderr="/dev/null" fallback so stderr is silently dropped
   rather than killing the role-selection flow.
2. Contract tests are static-only; add task #192 to track behavioral
   test for the stdout isolation guarantee.
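
A behavioral check for both advisory items might look roughly like the following Python analogue. It is a sketch under stated assumptions, not the planned test for task #192: run_role_picker(), open_stderr_sink(), and the make_temp parameter are hypothetical names, and os.devnull plays the part of the /dev/null fallback.

```python
import os
import subprocess
import sys
import tempfile
from typing import Callable, List, Tuple


def open_stderr_sink(make_temp: Callable = tempfile.mkstemp) -> Tuple[str, bool]:
    """Return (path, is_temp); fall back to os.devnull if temp creation fails."""
    try:
        fd, path = make_temp()
    except OSError:
        return os.devnull, False
    os.close(fd)
    return path, True


def run_role_picker(
    argv: List[str], make_temp: Callable = tempfile.mkstemp
) -> Tuple[str, str]:
    """Run argv, take the role from stdout only, and return (role, stderr log)."""
    path, is_temp = open_stderr_sink(make_temp)
    try:
        with open(path, "w") as sink:
            proc = subprocess.run(argv, stdout=subprocess.PIPE, stderr=sink, text=True)
        with open(path) as handle:
            log = handle.read()  # empty when the sink is os.devnull
    finally:
        if is_temp:
            os.unlink(path)
    lines = proc.stdout.strip().splitlines()
    role = lines[-1] if lines else ""
    return role, log


def failing_mktemp() -> Tuple[int, str]:
    """Simulates mktemp failing, for example on a full or read-only temp dir."""
    raise OSError("simulated temp file failure")


# Hypothetical child process: reasoning on stderr, role name on stdout.
CHILD_ARGV = [
    sys.executable,
    "-c",
    "import sys; print('why: eval overdue', file=sys.stderr); print('strategize')",
]

if __name__ == "__main__":
    print(run_role_picker(CHILD_ARGV))
    print(run_role_picker(CHILD_ARGV, make_temp=failing_mktemp))
```

With the fallback in place the role is still selected when the temp file cannot be created; only the reasoning log is lost.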
fazxes merged commit df70ee4 into main Apr 7, 2026
fazxes deleted the achieve/fix-0184-pick-role-stderr-separation branch April 7, 2026 01:54
fazxes added a commit that referenced this pull request Apr 7, 2026
…st tasks

Queue cleanup after session #103 major restructuring:

Closed (8):
- #73: AGENTS.md exists (commit 38e1fe5)
- #88: duplicate of #69 (auto-changelog)
- #141: obsolete (docs/prompt/evolve.md deleted)
- #157: obsolete (docs/prompt/feedback/ deleted)
- #159: consolidated into #190
- #161: consolidated into #190
- #181: obsolete (docs/prompt/unified.md deleted)
- #184: done (fixed by PR #179)

Path updates (30+ tasks):
- docs/ -> .recursive/ or Recursive/ops/
- scripts/daemon.sh -> Recursive/engine/daemon.sh
- scripts/lib-agent.sh -> Recursive/engine/lib-agent.sh
- .nightshift.json -> .recursive.json
- nightshift/*.py -> nightshift/{core,owl,raven,infra}/*.py

Pentest tasks created (4):
- #194: budget limiter triple-failure (CONFIRMED)
- #195: python3 -c path interpolation (CONFIRMED)
- #196: .recursive.json prompt guard (THEORETICAL)
- #197: costs.json negative value validation (THEORETICAL)
fazxes added a commit that referenced this pull request Apr 9, 2026
…one)

Queue before: 72 pending + 9 wontfix-in-active-dir
Queue after: 65 pending + 0 wontfix (all converted to done for archiving)

Merged into primary tasks (5 closures):
- #175 -> #174: both add tests to TestAuthFailureDetection, same PR
- #163 -> #162: both are scoring module tests from PR #158 review, same PR
- #124 -> #122: both validate doc snapshot consistency, same PR scope
- #196 -> #173: both add entries to PROMPT_GUARD_FILES in lib-agent.sh
- #180 -> #179: both touch _is_valid_eval_file() in pick-role.py, same PR

Closed as obsolete (1):
- #78: references non-existent "evolve.md Step 8" and the multi-agent
  review panel replaced by unified review in PR #107

Closed as low-value (1):
- #230: _DELEGATION_ROLE_MAP covers all 8 current agent types; new agent
  types require major framework work making the map update obvious

Converted wontfix -> done for archiving (9):
- #77, #80, #107, #111, #115, #119, #127, #129, #134
  All had wontfix status with rationale already documented; changed to
  done so daemon's archive_done_tasks() housekeeping removes them