refactor(goal): remove token budget subsystem #165
Merged
…jective + status only
Token budget added surface area without payoff: it required token accounting, extend_budget control commands, BudgetLimited state, render_budget_limit_prompt injection, UI usage display, and per-tool plumbing, all to enforce a soft cap the user could always override. Removed end-to-end.
Protocol: ThreadGoal loses token_budget / tokens_used / time_used_ms; ThreadGoalStatus loses BudgetLimited; GoalTransitionReason loses BudgetExhausted / UserExtendedBudget / UsageUpdated; ControlCommand loses GoalExtendBudget + GoalCreate.token_budget.
Runtime: delete goal/session_writes.rs (add_usage / extend_budget) and agent_loop/goal_accounting.rs (token-delta flushing + budget_limit warning injection); strip token_baseline / cumulative_charged_to_goal / budget_limit_warning_pushed from TurnContext; drop charge_goal_for_turn from turn_telemetry.
Barren behavior: barren continuations now transition Active → Complete (reason=BarrenContinuation) instead of demoting to BudgetLimited. The goal is auto-closed when continuations stop producing tool calls.
Tool surface: create_goal loses token_budget param; get_goal response loses remaining_tokens; update_goal description no longer warns against budget-driven completion; InvalidBudget error variant removed.
TUI: status bar drops [budget] indicator and used/total token display; /goal command drops --budget=N flag and the extend N subcommand.
Config: GoalSettings.default_token_budget and barren_continuation_limit were never read (runner used DEFAULT_MAX_BARREN_CONTINUATIONS=2); delete the whole GoalSettings struct and Settings.goals field. Old settings.json with a "goals" block deserializes cleanly (serde ignores unknown keys); likewise old goal.json with token_budget/tokens_used fields loads fine.
Tests: drop goal_session_accounting_test.rs entirely; rewrite goal_e2e_test::barren path to assert auto-Complete instead of BudgetLimited; trim budget assertions from continuation / lifecycle / session tests; simplify FakeGoalSession::with_active signature.
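The barren-continuation rule described above can be sketched as a small state machine. This is a minimal illustration, not the code from this PR: the Goal struct, its barren_streak field, and record_continuation are hypothetical names, and two details are assumptions (that a tool call resets the streak, and that the goal closes once the streak reaches the limit). Only the status, reason, and constant names come from the description.

```rust
// Simplified stand-ins for the protocol types named in the description.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ThreadGoalStatus {
    Active,
    Complete,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum GoalTransitionReason {
    BarrenContinuation,
}

// Value taken from the PR description.
const DEFAULT_MAX_BARREN_CONTINUATIONS: u32 = 2;

// Hypothetical goal record: objective omitted, only what the rule needs.
#[derive(Debug)]
struct Goal {
    status: ThreadGoalStatus,
    barren_streak: u32,
}

impl Goal {
    fn new() -> Self {
        Goal { status: ThreadGoalStatus::Active, barren_streak: 0 }
    }

    /// Record one continuation turn. Returns the transition reason if this
    /// turn closed the goal, otherwise None.
    fn record_continuation(&mut self, tool_calls: usize) -> Option<GoalTransitionReason> {
        if self.status != ThreadGoalStatus::Active {
            return None; // Already terminal: nothing to transition.
        }
        if tool_calls > 0 {
            self.barren_streak = 0; // Assumption: productive turns reset the streak.
            return None;
        }
        self.barren_streak += 1;
        if self.barren_streak >= DEFAULT_MAX_BARREN_CONTINUATIONS {
            // Active -> Complete; there is no BudgetLimited state to demote to.
            self.status = ThreadGoalStatus::Complete;
            return Some(GoalTransitionReason::BarrenContinuation);
        }
        None
    }
}
```

With the limit of 2, two consecutive continuations without tool calls close the goal, while a productive turn in between restarts the count.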
yishuiliunian
added a commit
that referenced
this pull request
May 21, 2026
…equest_idle for long-running safety
Background: macmini-03-64 session 89e6dc04 ran 10h53m and produced 1395
copies of "16/100. Not complete. No action." because the runtime had
five overlapping architectural gaps:
1. max_barren_continuations was a dead field — counter accumulated but
never enforced.
2. LoopDetector only watched tool_use, never assistant text.
3. ESC was a turn-level interrupt; cron / goal_continuation / rewake
kept waking the agent.
4. ThreadGoalStatus had no "structurally infeasible" terminal — the
LLM could only lie via update_goal(complete) or keep emitting filler.
5. The LLM had no protocol to declare "I have nothing to do right now;
don't push me until an external signal arrives."
This change introduces three orthogonal mechanisms:
• ContinuationGate (session-scoped injection valve)
Controls whether goal_continuation re-prompts the LLM. Closed by
DegenerationDetector, by /suspend, or by request_idle. Reopens on
any envelope (user/cron/rewake) OR when a deadline elapses.
Type-level invariant via GateClose enum: only UserSuspend may carry
no wake_deadline; all other closures MUST provide one.
• DegenerationDetector (Governance trait extension)
Observes TurnHistory via the new on_after_turn hook. Detects
RepeatedText (a run of turns whose assistant text hashes identically)
and BarrenStreak (a run of turns with no tool_use). On detection it
emits a DegenerationDetected event, closes the gate, and injects a
governance feedback note for the LLM. It NEVER mutates goal status
(PR #169 contract preserved).
• request_idle tool (RunnerDirect intercept)
Lets the LLM declare session-level idle without changing the goal.
Mandatory max_idle_duration_secs ∈ [60, 86400] gives the runtime a
deadline to fall back to if no external signal arrives — prevents
idle-deadlock in cron-less sessions.
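The type-level invariant on GateClose and the request_idle range check can be sketched as follows. This is a sketch under assumptions rather than the committed code: the variant payloads, the method names (close, on_envelope, allows_continuation), and the free function request_idle are hypothetical. Only the names ContinuationGate, GateClose, UserSuspend, wake_deadline, and max_idle_duration_secs, plus the [60, 86400] bounds, come from the commit message.

```rust
use std::time::{Duration, Instant};

/// Why the gate was closed. Only UserSuspend holds an Option; the other
/// variants hold a plain Instant, so a closure without a deadline cannot
/// be constructed for them.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum GateClose {
    UserSuspend { wake_deadline: Option<Instant> },
    Degeneration { wake_deadline: Instant },
    RequestIdle { wake_deadline: Instant },
}

impl GateClose {
    fn wake_deadline(&self) -> Option<Instant> {
        match *self {
            GateClose::UserSuspend { wake_deadline } => wake_deadline,
            GateClose::Degeneration { wake_deadline }
            | GateClose::RequestIdle { wake_deadline } => Some(wake_deadline),
        }
    }
}

/// Session-scoped valve: None means open, Some(reason) means closed.
#[derive(Debug, Default)]
struct ContinuationGate {
    closed: Option<GateClose>,
}

impl ContinuationGate {
    fn close(&mut self, reason: GateClose) {
        self.closed = Some(reason);
    }

    /// An external envelope (user / cron / rewake) reopens the gate.
    fn on_envelope(&mut self) {
        self.closed = None;
    }

    /// May goal_continuation be injected at `now`? Reopens the gate as a
    /// side effect once the wake deadline has elapsed.
    fn allows_continuation(&mut self, now: Instant) -> bool {
        if let Some(reason) = self.closed {
            match reason.wake_deadline() {
                Some(deadline) if now >= deadline => self.closed = None,
                _ => return false,
            }
        }
        true
    }
}

const MIN_IDLE_SECS: u64 = 60;
const MAX_IDLE_SECS: u64 = 86_400;

/// Validate the mandatory idle duration and build the matching closure.
fn request_idle(now: Instant, max_idle_duration_secs: u64) -> Result<GateClose, String> {
    if !(MIN_IDLE_SECS..=MAX_IDLE_SECS).contains(&max_idle_duration_secs) {
        return Err(format!(
            "max_idle_duration_secs must be in [{MIN_IDLE_SECS}, {MAX_IDLE_SECS}], got {max_idle_duration_secs}"
        ));
    }
    Ok(GateClose::RequestIdle {
        wake_deadline: now + Duration::from_secs(max_idle_duration_secs),
    })
}
```

Encoding the deadline as a non-optional field means the compiler, not a runtime check, rules out an indefinite closure by the detector or by request_idle. A user suspend with no deadline stays closed until an envelope arrives.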
Supporting changes:
• Protocol: ThreadGoalStatus +Infeasible terminal status; AgentStatus
+Suspended; ControlCommand +Suspend/+Unsuspend; new event variants
DegenerationDetected and ContinuationGateChanged with payloads in
event_summary.rs to keep event_payload.rs ≤ 200 lines.
• MessageSource +wakes_suspended_session() / +is_ephemeral_in_history()
predicates so policy lives on the protocol, not inlined in ingest.
• Message +ephemeral_in_history (write-time snapshot of source class)
so all envelopes — including goal_continuation — persist, enabling
forensic replay of what the LLM actually saw.
• Configurable thresholds via HarnessConfig (degeneration_*).
• TUI /suspend + /unsuspend commands; status renders Suspended; goal
renders Infeasible.
• Test infrastructure: harness wiring now actually wires the Governance
chain (silent long-standing bug); SpawnedHarness exposes
InterruptSignal so tests can drive ESC; llm_chunk_delay + new
wait_for_stream_event give event-driven test pacing (replaces
sleep-based timing).
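The detection half of DegenerationDetector, driven by configurable thresholds, might look roughly like the sketch below. Everything beyond the names RepeatedText and BarrenStreak is assumed for illustration: the DegenerationThresholds struct and its field names, the TurnRecord type, the use of the standard library's DefaultHasher over trimmed assistant text, and checking RepeatedText before BarrenStreak. The sketch only classifies history; emitting the event, closing the gate, and injecting the feedback note are left out.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical thresholds; the commit only says they are configurable
/// via HarnessConfig (degeneration_*).
#[derive(Debug, Clone, Copy)]
struct DegenerationThresholds {
    repeated_text_run: usize,
    barren_streak_run: usize,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Degeneration {
    RepeatedText { run: usize },
    BarrenStreak { run: usize },
}

/// Per-turn summary: a hash of the assistant text and the tool call count.
#[derive(Debug, Clone)]
struct TurnRecord {
    text_hash: u64,
    tool_calls: usize,
}

impl TurnRecord {
    fn new(assistant_text: &str, tool_calls: usize) -> Self {
        let mut hasher = DefaultHasher::new();
        assistant_text.trim().hash(&mut hasher);
        TurnRecord { text_hash: hasher.finish(), tool_calls }
    }
}

/// Length of the run of most recent turns that all satisfy `pred`.
fn trailing_run(history: &[TurnRecord], pred: impl Fn(&TurnRecord) -> bool) -> usize {
    history.iter().rev().take_while(|t| pred(*t)).count()
}

/// Classify the tail of the history. Read-only: goal status is never touched.
fn detect(history: &[TurnRecord], cfg: DegenerationThresholds) -> Option<Degeneration> {
    let last = history.last()?;
    let same_text = trailing_run(history, |t| t.text_hash == last.text_hash);
    if same_text >= cfg.repeated_text_run {
        return Some(Degeneration::RepeatedText { run: same_text });
    }
    let barren = trailing_run(history, |t| t.tool_calls == 0);
    if barren >= cfg.barren_streak_run {
        return Some(Degeneration::BarrenStreak { run: barren });
    }
    None
}
```

Keeping detect a pure function over the turn history makes it straightforward to unit test without a running session, and keeps the "never mutates goal status" contract visible in the signature.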
Removed: goal_barren.rs (dead barren counter), GoalSettings (dead config
with default_token_budget + barren_continuation_limit — both refer to
subsystems already removed by PR #165 and PR #169).
Coverage: 16 e2e tests covering 12 core invariants. ESC interrupt test
runs 5/5 PASS with 0.0s deviation (event-driven sync). All bazel
test //... + clippy + rustfmt pass. New files ≤ 200 lines.
yishuiliunian
added a commit
that referenced
this pull request
May 21, 2026
…equest_idle for long-running safety (#180)
Summary
• Remove the token budget subsystem from the goal feature: token_budget / tokens_used / time_used_ms fields, BudgetLimited state, extend_budget control, budget-limit warning injection, UI usage indicator, and /goal --budget=N CLI flag.
• Barren continuations now transition Active → Complete (reason=BarrenContinuation) instead of demoting to BudgetLimited. Goal lifecycle reduces to objective + status.
• The GoalSettings struct (default_token_budget, barren_continuation_limit) was never read; the runner uses a hardcoded DEFAULT_MAX_BARREN_CONTINUATIONS = 2 constant.
Changes
• Protocol: ThreadGoal slimmed to 6 fields; ThreadGoalStatus::{BudgetLimited}, GoalTransitionReason::{BudgetExhausted, UserExtendedBudget, UsageUpdated}, and ControlCommand::GoalExtendBudget removed.
• Runtime: deleted goal/session_writes.rs (add_usage / extend_budget) and agent_loop/goal_accounting.rs (token charging + budget warning). TurnContext loses 3 budget-tracking fields.
• Barren behavior: goal_barren::complete_goal_on_barren replaces transition_goal_to_budget_limited.
• Tool surface: create_goal drops token_budget param; get_goal drops remaining_tokens; update_goal description trimmed; InvalidBudget error removed.
• TUI: /goal --budget=N / extend N flags removed; usage simplified.
• Config: removed GoalSettings struct + Settings.goals field. Old settings.json / goal.json files with legacy fields deserialize cleanly (serde ignores unknowns).
• Tests: deleted goal_session_accounting_test.rs; rewrote E2E barren_continuations_auto_complete_goal; trimmed budget assertions across continuation / lifecycle / session / TUI tests.
Net diff: +81 / −1209 across 48 files.
Test plan
• bazel build //...
• bazel build //... --config=clippy
• bazel build //... --config=rustfmt
• bazel test //... (81 test targets)