Add --budget-conscious CLI flag for session-aware experiment execution #346

Closed
lukeinglis wants to merge 4 commits into main from factory/run-310e5ca3

@lukeinglis
Collaborator

@lukeinglis lukeinglis commented May 25, 2026

Closes #349

Changes

  • Added --budget-conscious boolean flag to both factory ceo and factory run argparse definitions, following the same pattern as --no-github
  • When set, _build_ceo_task() appends a ## Budget-Conscious Mode markdown section to the CEO task string with three throttles:
    1. Skip Reviewer agent (step 2e) for diffs under 50 lines when CEO's own review verdict is CLEAN/PROCEED
    2. Cap REDIRECT iterations at 1 per agent (instead of default 2)
    3. Defer operational execution: produce code-only PRs with execution instructions in PR description
  • Added a ## Budget-Conscious Mode section to the CEO prompt (factory/agents/prompts/ceo.md) defining the protocol
  • Added 8 unit tests validating parser flags, task injection, and end-to-end flag propagation through cmd_ceo and cmd_run

Add pre-respawn verification checks to the Error Recovery section
that prevent duplicate Builder spawns when failures are caused by
rate limits or timeouts on the final notification token rather than
genuine build failures.

The protocol requires checking git log and gh pr list before any
re-spawn decision, and skips directly to review if work exists.

Signed-off-by: Luke Inglis <lukeinglis21@yahoo.com>
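
The pre-respawn check in this commit is a prompt instruction for the CEO agent, not code. Purely as an illustration of the decision it describes, here is a hedged Python sketch. The function names, the `run` hook, and the branch argument are hypothetical; the `git log` and `gh pr list` invocations use standard options of those tools.

```python
import subprocess
from typing import Callable, Sequence

# A runner takes an argv list and returns the command's stdout.
Runner = Callable[[Sequence[str]], str]


def run_command(argv: Sequence[str]) -> str:
    """Run a command and return its stdout, or '' if it fails or is missing."""
    try:
        result = subprocess.run(argv, capture_output=True, text=True, check=False)
    except OSError:
        return ""
    return result.stdout if result.returncode == 0 else ""


def work_already_exists(branch: str, base: str = "main", run: Runner = run_command) -> bool:
    """Return True if the failed Builder left commits or an open PR behind."""
    # Commits on the branch that are not on the base branch.
    commits = run(["git", "log", "--oneline", f"{base}..{branch}"])
    # Open PRs whose head is the branch.
    prs = run(["gh", "pr", "list", "--head", branch, "--state", "open"])
    # OR-based decision: either signal is enough to skip the re-spawn.
    return bool(commits.strip()) or bool(prs.strip())


def next_action(branch: str, run: Runner = run_command) -> str:
    """Decide between going straight to review and re-spawning the Builder."""
    return "review" if work_already_exists(branch, run=run) else "respawn"
```

Injecting `run` keeps the decision logic testable without a real repository; in practice the agent would run the two commands directly.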
@codecov
codecov Bot commented May 25, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.54%. Comparing base (f72f5e6) to head (466dff5).
⚠️ Report is 126 commits behind head on main.

Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main     #346      +/-   ##
==========================================
+ Coverage   86.97%   87.54%   +0.57%
==========================================
  Files          51       60       +9
  Lines        7276     9148    +1872
==========================================
+ Hits         6328     8009    +1681
- Misses        948     1139     +191
```
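
As a sanity check on the figures above, coverage is hits divided by lines. The sketch below recomputes both percentages; truncating to two decimals rather than rounding is an assumption, chosen because it reproduces the reported values.

```python
def coverage_bp(hits: int, lines: int) -> int:
    """Coverage in hundredths of a percent, truncated (integer arithmetic)."""
    return hits * 10_000 // lines


base = coverage_bp(6328, 7276)  # main
head = coverage_bp(8009, 9148)  # #346

print(f"{base / 100:.2f}%")             # 86.97%
print(f"{head / 100:.2f}%")             # 87.54%
print(f"{(head - base) / 100:+.2f}%")   # +0.57%
```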


@lukeinglis
Collaborator Author

✅ Factory Review: KEEP

Verdict: KEEP
Reason: Prompt-only change adds sound Builder failure recovery protocol; no code changes, guard check clean, no eval regressions

Experiment: #1
Hypothesis: Better Builder failure recovery in CEO playbook: require git log + gh pr list checks before re-spawning a Builder after failure notification (issue #294 ask 3)

Score Comparison

| Metric | Value |
|---|---|
| Before | N/A (prompt-only, no eval scores) |
| After | N/A |
| Delta | N/A |
| Threshold | 0.8000 |

Guard Checks

| Check | Result |
|---|---|
| eval_immutable | ✅ PASS |
| scope | ✅ PASS |
| git_clean | ✅ PASS |

Code Review Notes

  • Well-structured recovery protocol with concrete git log + gh pr list checks
  • OR-based decision logic correctly handles partial completion scenarios
  • Placed correctly in error recovery section hierarchy
  • UNVERIFIED: Behavioral effectiveness depends on CEO agent following instructions at runtime

Posted by Factory Reviewer

@lukeinglis lukeinglis marked this pull request as ready for review May 25, 2026 02:39
…ecution

Add --budget-conscious flag to both `factory ceo` and `factory run` CLI
commands. When set, injects a Budget-Conscious Mode section into the CEO
task string with three throttles: skip Reviewer for small diffs, cap
REDIRECT iterations at 1, and defer operational execution to PRs.

Closes #349

Signed-off-by: Luke Inglis <lukeinglis21@yahoo.com>
@lukeinglis lukeinglis changed the title Add Builder failure recovery protocol to CEO prompt Add --budget-conscious CLI flag for session-aware experiment execution May 25, 2026
@lukeinglis
Collaborator Author

✅ Factory Review: KEEP

Verdict: KEEP
Reason: Clean implementation: CLI flag plumbed correctly through parser/cmd_ceo/cmd_run/_build_ceo_task, CEO prompt updated with well-scoped throttle rules, 8 tests covering all paths, all 135 CLI tests pass, guard check clean, no scope violations

Experiment: #2
Hypothesis: --budget-conscious CLI flag: skip Reviewer for small clean diffs, cap redirects at 1, defer operational execution to user

Guard Checks

| Check | Result |
|---|---|
| eval_immutable | ✅ PASS |
| scope | ✅ PASS |
| git_clean | ✅ PASS |

Code Quality Assessment

  • Critical issues: 0
  • Important issues: 0
  • Minor issues: 0

Code Review Notes

  • Flag flows parser → cmd_ceo/cmd_run → _build_ceo_task → task string injection correctly
  • CEO prompt adds Budget-Conscious Mode section with 3 clear throttles, placed before Sacred Rules
  • Builder Failure Recovery Protocol is a separate additive improvement referencing issue #294 (Subscription-plan users (Max/Pro) hit session limits mid-experiment, needs visibility + budget controls)
  • 8 new tests cover parser defaults, flag acceptance, task injection, and e2e propagation for both ceo and run subcommands
  • No existing tests modified or broken (135/135 pass)
  • Style follows existing --no-github pattern exactly

Test Results

135 passed in 61.54s

Posted by Factory Reviewer

Add `factory handoff <path>` that synthesizes .factory/ state into
a structured markdown brief printed to stdout. Reads cycle state,
checkpoint, events, strategy, backlog, reviews, git branch, and
open PRs. Missing files are handled gracefully. Includes 4 tests.

Closes #352

Signed-off-by: Luke Inglis <linglis@redhat.com>
Signed-off-by: Luke Inglis <lukeinglis21@yahoo.com>
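
To make the handoff idea concrete, here is a minimal sketch of synthesizing a `.factory/` directory into a markdown brief while tolerating missing files. Everything in it is an assumption for illustration: the file names in `SECTIONS` (apart from `strategy/idea.md`, which is mentioned later in this thread), the helper names, and the output layout. It also omits the events, reviews, git branch, and open PR parts of the real command.

```python
import json
from pathlib import Path

# Hypothetical layout; the real file names under .factory/ are not given here.
SECTIONS = {
    "Cycle state": "cycle_state.json",
    "Checkpoint": "checkpoint.json",
    "Strategy": "strategy/idea.md",
    "Backlog": "backlog.md",
}


def read_section(path: Path) -> str:
    """Return file contents for the brief, or a placeholder if unreadable."""
    try:
        text = path.read_text(encoding="utf-8").strip()
    except OSError:
        # Missing or unreadable files degrade to a placeholder, not an error.
        return "_(missing)_"
    if path.suffix == ".json":
        try:
            return json.dumps(json.loads(text), indent=2, sort_keys=True)
        except json.JSONDecodeError:
            return text  # Fall back to the raw text for malformed JSON.
    return text or "_(empty)_"


def build_handoff(project: Path) -> str:
    """Assemble one markdown section per known state file."""
    state_dir = project / ".factory"
    lines = [f"# Handoff: {project.name}", ""]
    for title, rel in SECTIONS.items():
        lines += [f"## {title}", "", read_section(state_dir / rel), ""]
    return "\n".join(lines)
```

Calling `print(build_handoff(Path(".")))` would then write the brief to stdout, matching the "printed to stdout" behavior the commit describes.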
Add structured logging to 5 uninstrumented modules:
- factory/runners/protocol.py: module-level logger
- factory/runners/__init__.py: runner selection/registration
- factory/runners/_stream.py: stream lifecycle events
- factory/agents/plugin.py: plugin config loading and generation
- factory/mcp_server.py: MCP tool call and server start logging

12 new log statements using structured key-value pairs covering
runner dispatch, stream start/end, plugin registration, and
MCP tool invocations.

Closes #354

Signed-off-by: Luke Inglis <linglis@redhat.com>
Signed-off-by: Luke Inglis <lukeinglis21@yahoo.com>
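
For readers unfamiliar with the key-value logging style this commit describes, here is a small sketch. The closing comment of this PR calls the change structlog instrumentation; to stay dependency-free, this example swaps in the standard library `logging` module with a hand-rolled key=value formatter. The `kv` helper, the event names, and `select_runner` are hypothetical.

```python
import logging


def kv(event: str, **fields: object) -> str:
    """Render an event name plus sorted key=value pairs as one log message."""
    pairs = " ".join(f"{key}={value!r}" for key, value in sorted(fields.items()))
    return f"{event} {pairs}".rstrip()


# Module-level logger, named after the module being instrumented.
logger = logging.getLogger("factory.runners")


def select_runner(name: str, registry: dict) -> object:
    """Look up a runner by name, logging the dispatch decision."""
    logger.info(kv("runner_selected", runner=name, registered=len(registry)))
    return registry[name]
```

Sorting the keys keeps the rendered messages stable, which makes them easier to grep and to assert on in tests.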
@lukeinglis
Collaborator Author

Closing: decomposed into separate PRs

This PR bundled 4 independent experiments into a single changeset. Each piece now has its own home:

| Change | New PR | Status |
|---|---|---|
| Builder failure recovery protocol (exp #1) | Already on main | Merged |
| --budget-conscious flag (exp #2) | #399 (renamed to --lite with v2 design) | Open |
| Handoff command (exp #3) | #373 | Open |
| Structlog instrumentation (exp #4) | #374 | Open |

PR #399 redesigns the budget-conscious concept into a proper lite mode with 40-50% token reduction (5 agents instead of 12, review pipeline reduction, Archivist consolidation, archive-only Researcher, baseline eval skip, invocation budget). Full spec at .factory/strategy/idea.md.

@lukeinglis lukeinglis closed this May 28, 2026