feat: raise per-cycle limit to 8, remove daily limit#136
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #136 +/- ##
==========================================
+ Coverage 86.23% 86.26% +0.02%
==========================================
Files 45 45
Lines 6307 6304 -3
==========================================
- Hits 5439 5438 -1
+ Misses 868 866 -2 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
colehurwitz
left a comment
There was a problem hiding this comment.
Review: feat: raise per-cycle limit to 8, remove daily limit
Summary
This PR consolidates work from #132 and #134 to simplify Bob Shell guardrails: per-cycle limit raised from 3→8, daily limit removed, and an early warning mechanism added when ≤2 invocations remain.
Observations
Good:
- Clean removal of daily ceiling complexity — the per-cycle limit is the more meaningful guardrail for preventing runaway loops
CeilingWarningdataclass and event emission (bob.ceiling_warning) provide useful observability before hitting the hard limit- Test coverage is comprehensive, including edge cases for warning at exactly 1 and 2 remaining invocations
_budget_allows_respawn()simplification is sound — per-cycle enforcement belongs in BobRunner, not in the respawn check
Questions/Concerns:
-
Unrelated change in
factory.md— MovingREADME.mdfrom read-only to writable scope and addingdocs/**doesn't relate to Bob Shell limits. Should this be a separate PR, or is there context I'm missing? -
Behavioral change in
count_cycle_invocations()— Previously, passingcycle_start=Nonefell back tocount_today_invocations()as an approximation. Now it returns 0. Any callers relying on the old fallback behavior would silently get 0 counts. Worth confirming no other code paths depend on this. -
Warning threshold hardcoded at 2 — The
≤2 remainingthreshold is reasonable but not configurable. If someone setsFACTORY_BOB_MAX_INVOCATIONS_PER_CYCLE=3, they'd get a warning on the very first invocation (3-0=3, next check 3-1=2 → warning). Minor, but could be noisy for low limits.
Verdict
Changes look correct and well-tested. The concerns above are minor — would appreciate clarification on the factory.md scope change.
colehurwitz
left a comment
There was a problem hiding this comment.
Thanks for the thorough review! Addressing all three concerns:
Concern 1 (factory.md scope change):
Reverted. The scope expansion for README.md and docs/** came from a separate documentation task (commit 8b92df8) and doesn't belong in this PR.
Concern 2 (count_cycle_invocations returning 0):
No callers rely on the old fallback. Traced all call sites:
BobRunneralways setsself.cycle_start = cycle_start or datetime.now(timezone.utc)in__init__(line 170)ceo_completion.pypasses explicitdatetime.now(timezone.utc)(line 337)- Only the old tests passed None, and they've been updated
The fallback to count_today_invocations() was dead code.
Concern 3 (hardcoded warning threshold):
Acknowledged as minor. With default limit of 8, warning fires at invocation 6 (75% usage). For edge cases with limit=3, the warning fires earlier but this is acceptable since the PR explicitly raises the default. No change made per reviewer guidance.
akashgit
left a comment
There was a problem hiding this comment.
Review — PR #136
Overview
Consolidates #132 and #134 into a cleaner approach: removes the daily/session limit entirely, keeps only per-cycle ceiling (raised from 3→8), adds near-ceiling warnings (CeilingWarning dataclass + bob.ceiling_warning event), and simplifies _budget_allows_respawn to always return True.
Issue: Per-cycle ceiling is silently ineffective
get_runner() in factory/runners/__init__.py creates BobRunner() with no arguments:
return _RUNNERS[resolved]()This means cycle_start defaults to datetime.now(timezone.utc) — the moment the runner is instantiated. But get_runner() is called inside invoke_agent() (line 129 of factory/agents/runner.py), which runs per agent invocation. So each agent call creates a fresh BobRunner with cycle_start=now(), resetting the cycle counter to zero every time.
With a per-cycle ceiling of 8 and each agent invocation starting its own "cycle" from zero, the ceiling never triggers — each agent only makes one bob invocation per invoke_agent call.
The test works because it manually passes cycle_start=datetime.now() - timedelta(seconds=5) and pre-logs a usage entry. But that's not how production uses it.
Fix options:
- Pass
cycle_startfrom the CEO's cycle state throughinvoke_agent→get_runner→BobRunner - Store cycle_start in
.factory/state/cycle.json(which already exists from the completion guard) and haveBobRunner.__init__read it - Use the completion guard's
CycleState.started_atas the authoritative cycle start
Without this fix, the only ceiling is effectively "no ceiling" since the daily/session limits were removed and the per-cycle limit doesn't accumulate across agent invocations.
Everything else looks good
- Removing
count_today_invocationsandget_daily_ceiling— clean deletion _budget_allows_respawnalways returning True with per-cycle-only limits — logically correct since respawns start new cyclescount_cycle_invocationsreturning 0 whencycle_start is None— safe default- CeilingWarning with lazy
emit_eventimport andexcept Exception— same pattern from #132, still good - Tests properly updated: budget tests simplified, ceiling tests pass
cycle_start, warning tests cover near/sufficient/exact-one cases - CLAUDE.md accurately reflects the change (only per-cycle listed, warning documented)
✅ Factory Review: KEEPExperiment: #11 Score Comparison
Guard Checks
CEO Override RationalePrecheck failed on
Addresses Review IssueFixes @akashgit's critical finding: the per-cycle ceiling was silently ineffective because each Verdict: KEEP — Ready for human review and merge. Posted by Factory CEO |
colehurwitz
left a comment
There was a problem hiding this comment.
Code Review: PR #136
Overview
This PR raises the Bob Shell per-cycle invocation limit from 3 to 8, adds an early warning mechanism when ≤2 invocations remain, and removes the daily limit entirely. Importantly, it also fixes a bug where the per-cycle ceiling wasn't accumulating correctly across invoke_agent calls.
Analysis
Bug Fix (Key Change)
The core fix addresses a subtle issue where each get_runner() call created a fresh BobRunner with cycle_start=now(), causing every invocation to be counted as the "first" in a cycle. The fix passes project_path to BobRunner, which now reads the cycle's started_at from .factory/state/cycle.json. This ensures all invocations within a cycle share the same start time and accumulate correctly.
Code Quality
- Clean refactoring of
_budget_allows_respawn()— it now always returnsTruesince per-cycle limits are enforced withinBobRunner.headless()rather than at respawn time - The
CeilingWarningdataclass and event emission add good observability - The
count_cycle_invocationsreturning 0 whencycle_start is Noneis correct — no cycle context means no per-cycle enforcement
Test Coverage
TestCeilingAccumulationAcrossInvocationsdirectly reproduces and verifies the fixTestCeilingWarningcovers the new warning functionality- Tests for removed daily ceiling are appropriately removed
- Mock signature updates for
get_runnerare correct (lambda *args, **kwargs: MockRunner())
Suggestions
-
Minor: In
bob.pyline 178, consider adding a debug log when reading cycle_start from cycle.json — useful for troubleshooting ceiling issues. -
Consider: The
count_cycle_invocationsreturning 0 whencycle_start is Nonemeans no ceiling enforcement without explicit cycle tracking. This is intentional but worth noting in the docstring (the current docstring is good).
Potential Risks
- Removing the daily limit removes a safety net for runaway loops. The per-cycle limit (8) still provides protection, but an infinite loop of cycles could theoretically run unbounded. This is per user preference, so acceptable.
- The raised limit (3→8) is configurable via env var, so users can adjust as needed.
Verdict
LGTM — The bug fix is correct and well-tested. The limit changes are per user preference with appropriate configurability. The warning mechanism adds useful observability.
|
Closing — stale factory experiment branch. Reopen or recreate if still needed. |
|
@akashgit my last review was ltgm if you are good with it |
- Update guardrails table to match PR #136 final state (per-cycle=8, remove session row) - Add Bob Shell docs link (https://bob.ibm.com/docs/shell) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add Runners section documenting Claude Code (default) and Bob Shell backends - Document runner selection via env var (FACTORY_RUNNER) and CLI flag (--runner) - Document Bob Shell setup including API key and auth persistence - Document guardrails (per-cycle limit of 8) and dry-run mode - Move README.md from read-only to modifiable scope in factory.md - Add docs/** to modifiable scope Addresses review feedback from #129: - Per-cycle default updated to 8 (matches #136) - Bob Shell docs link added (https://bob.ibm.com/docs/shell) - Session/daily limit row removed (per #136 final state) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
colehurwitz
left a comment
There was a problem hiding this comment.
Code Review: PR #136
feat: raise per-cycle limit to 8, remove daily limit
Overview
This PR consolidates several related changes to Bob Shell guardrails:
- Increases per-cycle invocation limit from 3 → 8
- Removes the daily limit entirely
- Adds early warning mechanism when ≤2 invocations remain
- Fixes a bug where
get_runner()created freshBobRunnerinstances withcycle_start=now(), preventing ceiling accumulation - Adds Runners documentation to README.md
Code Quality & Correctness
Strengths:
-
Bug fix is sound (
factory/runners/__init__.py:52-54,bob.py:186-208): Passingproject_paththroughget_runner()and readingcycle.jsonensures all invocations within a cycle share the samecycle_start. This correctly fixes the accumulation bug. -
Clean CeilingWarning design (
usage.py:92-99): The dataclass is simple and effective. Emitting events toevents.jsonlmaintains observability. -
Excellent test coverage: New tests in
TestCeilingWarningandTestCeilingAccumulationAcrossInvocationsthoroughly verify the new behavior, including the specific bug scenario. -
Simplified respawn logic (
ceo_completion.py:315-134): Removing the pre-check makes sense since per-cycle limits are now enforced withinBobRunner.headless().
Concerns:
-
Lazy import in
__init__(bob.py:204-207):from factory.ceo_completion import read_cycle_state state = read_cycle_state(project_path)
This works but is a code smell. Consider moving to module level with proper import ordering, or document why it's necessary for circular dependency avoidance.
-
Silent exception swallowing (
usage.py:116):except Exception: log.debug("Failed to emit ceiling warning event", exc_info=True)
This is intentional (don't fail the invocation just because event emission failed), but logging at
debuglevel may make issues hard to diagnose in production. Considerwarninglevel. -
Daily limit removal risk: Without a daily limit, a bug causing rapid cycles could burn through tokens quickly. The per-cycle limit of 8 is reasonable, but consider whether there should be any circuit breaker for runaway scenarios.
Style & Conventions
- Follows project conventions (snake_case, structured logging, Pydantic models)
- Documentation updates are well-written and comprehensive
- Test mock updates (
*args, **kwargs) are correct for the new signature
Performance
No concerns. The read_cycle_state() call in BobRunner.__init__ is a single file read, which is negligible.
Security
No security concerns. The changes don't affect authentication or expose sensitive data.
Suggestions
-
Minor: In
check_ceilings(), the return type change fromNonetoCeilingWarning | Nonecould surprise callers who ignore the return value. The current call sites don't use the return value, so this is safe, but consider documenting this more explicitly. -
Consider: Adding a configurable "soft daily cap" as an optional safety net that logs warnings but doesn't block, for users who want some visibility into daily usage without hard limits.
-
Test improvement (
test_runners.py:672-741): The integration testtest_ceiling_accumulates_across_invoke_agent_callsis thorough but quite long. Consider extracting the setup into a fixture.
Verdict
Approve with minor suggestions. The core fix is correct, well-tested, and the documentation is thorough. The concerns raised are minor polish items rather than blockers.
1. bob.py: Clarify lazy import comment to explain the specific circular dependency (ceo_completion imports runners) 2. usage.py: Elevate ceiling warning event failure log from debug to warning level for production diagnosability Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Review Feedback AddressedCommit 8b4e834 addresses the minor suggestions from the latest review:
The third concern (daily limit removal risk) was acknowledged as a design tradeoff per user preference with per-cycle limits still providing protection. Ready for merge. |
colehurwitz
left a comment
There was a problem hiding this comment.
Code Review: PR #136
Overview
This PR raises the Bob Shell per-cycle limit from 3→8, removes the daily limit entirely, and adds early warnings when ≤2 invocations remain. It also fixes a bug where BobRunner wasn't tracking cycle_start correctly across invocations, consolidates runner documentation into README.md, and updates factory.md scope.
Analysis
Correctness
- The core bug fix is solid:
get_runner()now passesproject_pathtoBobRunner, which readscycle_startfromcycle.json. This ensures invocations accumulate correctly within a cycle. - The lazy import in
factory/runners/bob.py:204-207is necessary to avoid circular imports — good approach. count_cycle_invocations()returning 0 whencycle_start is Noneis correct — without a cycle start, there's nothing to count.
Style & Conventions
- Uses
X | Yunion types ✓ - Follows project structlog pattern ✓
- Test classes are well-organized with descriptive docstrings ✓
Potential Issues
-
Return type change in
check_ceilings(): Changed from-> Noneto-> CeilingWarning | None. This is a breaking change for any external callers that might be checking the return value. However, the function previously had no return statement, so existing callers ignoring the return should still work. -
Silenced exception in
_emit_warning_event(): Appropriate — failing to emit a warning shouldn't abort the invocation. -
_budget_allows_respawn()simplification: Now always returnsTrue. The docstring correctly explains why: per-cycle limits are enforced withinBobRunnerduring execution, not at respawn decision time.
Test Coverage
- Excellent coverage of the new behavior
TestCeilingAccumulationAcrossInvocationsis a good regression test for the bug being fixed- Tests verify both the warning case (≤2 remaining) and no-warning case (>2 remaining)
Documentation
- README.md Runners section is clear and well-organized
- CLAUDE.md updated appropriately
factory.mdscope change makes sense (README.md → modifiable)
Suggestions
-
Minor: The comment in
count_cycle_invocations()could be slightly clearer — consider: "Without cycle_start, we can't determine which invocations are in-cycle, so return 0." -
Consider: The warning threshold of
≤2is hardcoded. If the per-cycle limit is lowered (e.g., to 3), ≤2 remaining means a warning fires on the first successful invocation. This is fine behavior, but worth documenting.
Verdict
Approve — This is a well-structured PR that fixes a real bug (cycle_start not persisting across invocations), simplifies the codebase by removing the daily limit, adds useful early warnings, and includes thorough tests. The documentation consolidation from PR #129 is a nice addition.
Remove README.md from writable scope and docs/** addition — these belong in a separate PR. Restores factory.md to match main. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ion changes Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Merges the best of PRs #132 and #134: - Raised per-cycle invocation limit from 3 to 8 (from #132) - Added early warning when ≤2 invocations remain (from #132) - Removed daily limit entirely (user preference - no session limit either) Per-cycle limit is enforced within BobRunner during execution. Cross-cycle respawn is always allowed (no daily/session cap). Closes #131, #133 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The per-cycle ceiling was silently ineffective because each get_runner() call created a fresh BobRunner with cycle_start=now(), resetting the invocation counter. The fix threads project_path through the call stack so BobRunner can read started_at from .factory/state/cycle.json. Changes: - BobRunner.__init__: Accept project_path parameter; when cycle_start is None and project_path is provided, lazy-import read_cycle_state and use state.started_at if available, falling back to now() when no cycle.json - get_runner(): Accept project_path and forward it to BobRunner - invoke_agent(): Pass project_path to get_runner() - Tests: Add TestCeilingAccumulationAcrossInvocations with three tests verifying ceiling accumulates across invocations, BobRunner reads from cycle.json, and graceful fallback when no cycle.json exists Closes #138 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add Runners section documenting Claude Code (default) and Bob Shell backends - Document runner selection via env var (FACTORY_RUNNER) and CLI flag (--runner) - Document Bob Shell setup including API key and auth persistence - Document guardrails (per-cycle limit of 8) and dry-run mode - Move README.md from read-only to modifiable scope in factory.md - Add docs/** to modifiable scope Addresses review feedback from #129: - Per-cycle default updated to 8 (matches #136) - Bob Shell docs link added (https://bob.ibm.com/docs/shell) - Session/daily limit row removed (per #136 final state) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
1. bob.py: Clarify lazy import comment to explain the specific circular dependency (ceo_completion imports runners) 2. usage.py: Elevate ceiling warning event failure log from debug to warning level for production diagnosability Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Remove README.md from writable scope and docs/** addition — these belong in a separate PR. Restores factory.md to match main. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
d790906 to
c63f9ae
Compare
akashgit
left a comment
There was a problem hiding this comment.
All issues addressed: per-cycle ceiling accumulation fix is solid (threads project_path → BobRunner → cycle.json), factory.md scope leak reverted, minor items (lazy import comment, log.warning) done. Rebased on main, all tests pass. Ship it.
Summary
This merges the best of PRs #132 and #134 while respecting the user's preference for no daily limit.
Changes:
factory/runners/usage.py: Raised default, removed daily ceiling, added CeilingWarningfactory/ceo_completion.py: Simplified_budget_allows_respawn(always allow respawn)CLAUDE.md: Updated documentationREADME.md: Added Runners section with runner selection, Bob Shell setup, guardrails, and dry-run docsfactory.md: Updated scope (README.md and docs/** now modifiable)Closes #131, #133
Supersedes #132, #134
Consolidates #129
Test plan
pytest tests/test_runners.py -v— 43 tests passpytest tests/test_ceo_completion.py -v— 43 tests passpytest -v— all 1145 tests passruff checkandmypypass🤖 Generated with Claude Code