refactor(goal): remove token budget subsystem #165

Merged
yishuiliunian merged 1 commit into main from refactor/goal-remove-token-budget
May 17, 2026

@yishuiliunian
Contributor

Summary

  • Removes the entire token budget subsystem from the goal feature: token_budget / tokens_used / time_used_ms fields, BudgetLimited state, extend_budget control, budget-limit warning injection, UI usage indicator, and /goal --budget=N CLI flag.
  • Barren-continuation now auto-transitions the goal Active → Complete (reason=BarrenContinuation) instead of demoting to BudgetLimited. Goal lifecycle reduces to objective + status.
  • Pre-existing dead config removed: GoalSettings struct (default_token_budget, barren_continuation_limit) was never read; the runner uses a hardcoded DEFAULT_MAX_BARREN_CONTINUATIONS = 2 constant.

Changes

  • protocol: ThreadGoal slimmed to 6 fields; ThreadGoalStatus::{BudgetLimited}, GoalTransitionReason::{BudgetExhausted, UserExtendedBudget, UsageUpdated}, and ControlCommand::GoalExtendBudget removed.
  • runtime: delete goal/session_writes.rs (add_usage / extend_budget) and agent_loop/goal_accounting.rs (token charging + budget warning). TurnContext loses 3 budget-tracking fields. goal_barren::complete_goal_on_barren replaces transition_goal_to_budget_limited.
  • tools: create_goal drops token_budget param; get_goal drops remaining_tokens; update_goal description trimmed; InvalidBudget error removed.
  • tui: status bar drops budget indicator; /goal --budget=N / extend N flags removed; usage simplified.
  • config: delete GoalSettings struct + Settings.goals field. Old settings.json / goal.json files with legacy fields still deserialize cleanly (serde ignores unknown fields by default).
  • tests: drop goal_session_accounting_test.rs; rewrite E2E barren_continuations_auto_complete_goal; trim budget assertions across continuation / lifecycle / session / TUI tests.

Net diff: +81 / −1209 across 48 files.

Test plan

  • CI: bazel build //...
  • CI: bazel build //... --config=clippy
  • CI: bazel build //... --config=rustfmt
  • CI: bazel test //... (81 test targets)

…jective + status only

Token budget added surface area without payoff: it required token accounting,
extend_budget control commands, BudgetLimited state, render_budget_limit_prompt
injection, UI usage display, and per-tool plumbing — all to enforce a soft cap
the user could always override. Removed end-to-end.

Protocol: ThreadGoal loses token_budget / tokens_used / time_used_ms;
ThreadGoalStatus loses BudgetLimited; GoalTransitionReason loses
BudgetExhausted / UserExtendedBudget / UsageUpdated; ControlCommand loses
GoalExtendBudget + GoalCreate.token_budget.

Runtime: delete goal/session_writes.rs (add_usage / extend_budget) and
agent_loop/goal_accounting.rs (token-delta flushing + budget_limit warning
injection); strip token_baseline / cumulative_charged_to_goal /
budget_limit_warning_pushed from TurnContext; drop charge_goal_for_turn from
turn_telemetry.

Barren behavior: barren continuations now transition Active → Complete
(reason=BarrenContinuation) instead of demoting to BudgetLimited. Goal is
auto-closed when continuations stop producing tool calls.
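
A minimal sketch of the barren path described above, not the actual runner code. Only ThreadGoalStatus, GoalTransitionReason::BarrenContinuation and DEFAULT_MAX_BARREN_CONTINUATIONS come from this PR; the Goal struct, the barren_streak field, on_continuation, the reset-on-tool-call rule and the `>=` comparison are assumptions made for illustration.

```rust
/// Constant named in the PR; the runner is said to hardcode it to 2.
const DEFAULT_MAX_BARREN_CONTINUATIONS: u32 = 2;

/// Reduced to the two statuses this sketch needs; the real enum has more.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ThreadGoalStatus {
    Active,
    Complete,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum GoalTransitionReason {
    BarrenContinuation,
}

/// Hypothetical goal record: status plus a barren-streak counter.
#[derive(Debug)]
struct Goal {
    status: ThreadGoalStatus,
    barren_streak: u32,
}

impl Goal {
    fn new() -> Self {
        Goal { status: ThreadGoalStatus::Active, barren_streak: 0 }
    }

    /// Record one continuation turn. Returns the transition reason when the
    /// goal auto-closes, and None otherwise.
    fn on_continuation(&mut self, tool_calls: usize) -> Option<GoalTransitionReason> {
        if self.status != ThreadGoalStatus::Active {
            return None; // already terminal, nothing to do
        }
        if tool_calls > 0 {
            self.barren_streak = 0; // assumed: productive turns reset the streak
            return None;
        }
        self.barren_streak += 1;
        if self.barren_streak >= DEFAULT_MAX_BARREN_CONTINUATIONS {
            self.status = ThreadGoalStatus::Complete;
            return Some(GoalTransitionReason::BarrenContinuation);
        }
        None
    }
}
```

Under these assumptions, two consecutive continuations without tool calls close the goal, and there is no intermediate BudgetLimited state to recover from.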

Tool surface: create_goal loses token_budget param; get_goal response loses
remaining_tokens; update_goal description no longer warns against
budget-driven completion; InvalidBudget error variant removed.

TUI: status bar drops [budget] indicator and used/total token display;
/goal command drops --budget=N flag and the extend N subcommand.

Config: GoalSettings.default_token_budget and barren_continuation_limit
were never read (runner used DEFAULT_MAX_BARREN_CONTINUATIONS=2); delete
the whole GoalSettings struct and Settings.goals field. Old settings.json
with a "goals" block deserializes cleanly (serde ignores unknown keys);
likewise old goal.json with token_budget/tokens_used fields loads fine.

Tests: drop goal_session_accounting_test.rs entirely; rewrite
goal_e2e_test::barren path to assert auto-Complete instead of
BudgetLimited; trim budget assertions from continuation / lifecycle /
session tests; simplify FakeGoalSession::with_active signature.
@yishuiliunian yishuiliunian merged commit 9cfbe1f into main May 17, 2026
4 checks passed
@yishuiliunian yishuiliunian deleted the refactor/goal-remove-token-budget branch May 17, 2026 08:02
yishuiliunian added a commit that referenced this pull request May 21, 2026
…equest_idle for long-running safety

Background: macmini-03-64 session 89e6dc04 ran 10h53m and produced 1395
copies of "16/100. Not complete. No action." because the runtime had
five overlapping architectural gaps:

  1. max_barren_continuations was a dead field — counter accumulated but
     never enforced.
  2. LoopDetector only watched tool_use, never assistant text.
  3. ESC was a turn-level interrupt; cron / goal_continuation / rewake
     kept waking the agent.
  4. ThreadGoalStatus had no "structurally infeasible" terminal — the
     LLM could only lie via update_goal(complete) or keep emitting filler.
  5. The LLM had no protocol to declare "I have nothing to do right now;
     don't push me until an external signal arrives."

This change introduces three orthogonal mechanisms:

  • ContinuationGate (session-scoped injection valve)
      Controls whether goal_continuation re-prompts the LLM. Closed by
      DegenerationDetector, by /suspend, or by request_idle. Reopens on
      any envelope (user/cron/rewake) OR when a deadline elapses.
      Type-level invariant via GateClose enum: only UserSuspend may carry
      no wake_deadline; all other closures MUST provide one.

  • DegenerationDetector (Governance trait extension)
      Observes TurnHistory via the new on_after_turn hook. Detects
      RepeatedText (a run of turns with the same text hash) and
      BarrenStreak (a run of turns with no tool_use),
      emits DegenerationDetected event, closes the gate, injects a
      governance feedback note for the LLM. NEVER mutates goal status
      (PR #169 contract preserved).

  • request_idle tool (RunnerDirect intercept)
      Lets the LLM declare session-level idle without changing the goal.
      Mandatory max_idle_duration_secs ∈ [60, 86400] gives the runtime a
      deadline to fall back to if no external signal arrives — prevents
      idle-deadlock in cron-less sessions.
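
The gate and request_idle rules above can be sketched roughly as follows. This is an illustration under stated assumptions, not the shipped code: GateClose, UserSuspend, wake_deadline, ContinuationGate and the [60, 86400] bounds come from the commit message, while the Degeneration and RequestIdle variant names, the method names and the String error type are hypothetical.

```rust
use std::time::{Duration, Instant};

/// Bounds quoted in the commit message for max_idle_duration_secs.
const MIN_IDLE_SECS: u64 = 60;
const MAX_IDLE_SECS: u64 = 86_400;

/// Why the gate was closed. Only UserSuspend lacks a deadline field, so a
/// deadline-less closure from any other source cannot be constructed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum GateClose {
    UserSuspend,
    Degeneration { wake_deadline: Instant }, // hypothetical variant name
    RequestIdle { wake_deadline: Instant },  // hypothetical variant name
}

impl GateClose {
    fn wake_deadline(self) -> Option<Instant> {
        match self {
            GateClose::UserSuspend => None,
            GateClose::Degeneration { wake_deadline }
            | GateClose::RequestIdle { wake_deadline } => Some(wake_deadline),
        }
    }
}

/// Session-scoped valve: None means open.
#[derive(Debug, Default)]
struct ContinuationGate {
    closed: Option<GateClose>,
}

impl ContinuationGate {
    fn close(&mut self, why: GateClose) {
        self.closed = Some(why);
    }

    /// Any incoming envelope (user / cron / rewake) reopens the gate.
    fn on_envelope(&mut self) {
        self.closed = None;
    }

    /// May goal_continuation re-prompt the LLM at `now`? Reopens the gate
    /// as a side effect once the wake deadline has elapsed.
    fn allows_continuation(&mut self, now: Instant) -> bool {
        if let Some(close) = self.closed {
            match close.wake_deadline() {
                Some(deadline) if now >= deadline => self.closed = None,
                _ => return false,
            }
        }
        true
    }
}

/// Validate a request_idle call and turn it into a gate closure.
fn request_idle(now: Instant, max_idle_duration_secs: u64) -> Result<GateClose, String> {
    if !(MIN_IDLE_SECS..=MAX_IDLE_SECS).contains(&max_idle_duration_secs) {
        return Err(format!(
            "max_idle_duration_secs must be in [{}, {}], got {}",
            MIN_IDLE_SECS, MAX_IDLE_SECS, max_idle_duration_secs
        ));
    }
    Ok(GateClose::RequestIdle {
        wake_deadline: now + Duration::from_secs(max_idle_duration_secs),
    })
}
```

Putting the deadline inside the variants rather than in an Option on the gate is what makes the invariant type-level: a UserSuspend closure waits for an envelope indefinitely, while every other closure wakes on its own deadline even in a cron-less session.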

Supporting changes:

  • Protocol: ThreadGoalStatus +Infeasible terminal status; AgentStatus
    +Suspended; ControlCommand +Suspend/+Unsuspend; new event variants
    DegenerationDetected and ContinuationGateChanged with payloads in
    event_summary.rs to keep event_payload.rs ≤ 200 lines.
  • MessageSource +wakes_suspended_session() / +is_ephemeral_in_history()
    predicates so policy lives on the protocol, not inlined in ingest.
  • Message +ephemeral_in_history (write-time snapshot of source class)
    so all envelopes — including goal_continuation — persist, enabling
    forensic replay of what the LLM actually saw.
  • Configurable thresholds via HarnessConfig (degeneration_*).
  • TUI /suspend + /unsuspend commands; status renders Suspended; goal
    renders Infeasible.
  • Test infrastructure: harness wiring now actually wires the Governance
    chain (silent long-standing bug); SpawnedHarness exposes
    InterruptSignal so tests can drive ESC; llm_chunk_delay + new
    wait_for_stream_event give event-driven test pacing (replaces
    sleep-based timing).
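
For illustration, the two MessageSource predicates and the write-time ephemeral_in_history snapshot might look like the sketch below. The predicate and field names come from the commit message. The variant list and which sources count as waking or ephemeral are assumptions, as is the visible_history helper.

```rust
/// Hypothetical variant set; the real enum likely has more sources.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MessageSource {
    User,
    Cron,
    Rewake,
    GoalContinuation,
}

impl MessageSource {
    /// Assumed policy: external signals wake a suspended session, while the
    /// runtime's own goal_continuation injection does not.
    fn wakes_suspended_session(self) -> bool {
        !matches!(self, MessageSource::GoalContinuation)
    }

    /// Assumed policy: only goal_continuation envelopes are ephemeral.
    fn is_ephemeral_in_history(self) -> bool {
        matches!(self, MessageSource::GoalContinuation)
    }
}

#[derive(Debug)]
struct Message {
    source: MessageSource,
    text: String,
    /// Snapshot taken when the message is written, so a later policy change
    /// does not retroactively reclassify persisted messages.
    ephemeral_in_history: bool,
}

impl Message {
    fn new(source: MessageSource, text: &str) -> Self {
        Message {
            source,
            text: text.to_string(),
            ephemeral_in_history: source.is_ephemeral_in_history(),
        }
    }
}

/// Hypothetical helper: every message stays in the persisted log, and
/// ephemeral ones are only skipped when building a later view of it.
fn visible_history(log: &[Message]) -> Vec<&str> {
    log.iter()
        .filter(|m| !m.ephemeral_in_history)
        .map(|m| m.text.as_str())
        .collect()
}
```

Keeping the classification on MessageSource means ingest code asks the protocol type instead of matching on variants inline, which is the design point the commit message makes.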

Removed: goal_barren.rs (dead barren counter), GoalSettings (dead config
with default_token_budget + barren_continuation_limit — both refer to
subsystems already removed by PR #165 and PR #169).
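
With the dead barren counter gone, run detection lives in the DegenerationDetector described earlier. A rough sketch of the two checks follows, assuming a simplified Turn record and explicit thresholds; the Turn struct, detect and text_hash are hypothetical, and std's DefaultHasher stands in for whatever hash the runtime actually uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical per-turn record; the real TurnHistory entry differs.
struct Turn {
    text: String,
    tool_uses: usize,
}

#[derive(Debug, PartialEq, Eq)]
enum Degeneration {
    RepeatedText { run: usize },
    BarrenStreak { run: usize },
}

/// Hash of the trimmed assistant text (stand-in hash function).
fn text_hash(text: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    text.trim().hash(&mut hasher);
    hasher.finish()
}

/// Inspect the trailing run of turns. The thresholds play the role of the
/// degeneration_* settings; their exact names are not given in the source.
fn detect(history: &[Turn], repeat_limit: usize, barren_limit: usize) -> Option<Degeneration> {
    let last = history.last()?;
    let last_hash = text_hash(&last.text);

    // Count trailing turns whose text hashes equal the latest turn's hash.
    let repeated = history
        .iter()
        .rev()
        .take_while(|t| text_hash(&t.text) == last_hash)
        .count();
    if repeated >= repeat_limit {
        return Some(Degeneration::RepeatedText { run: repeated });
    }

    // Count trailing turns that made no tool calls at all.
    let barren = history
        .iter()
        .rev()
        .take_while(|t| t.tool_uses == 0)
        .count();
    if barren >= barren_limit {
        return Some(Degeneration::BarrenStreak { run: barren });
    }
    None
}
```

The function only reports a finding. In line with the contract stated above, a caller would close the gate and inject feedback rather than touch goal status.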

Coverage: 16 e2e tests covering 12 core invariants. ESC interrupt test
runs 5/5 PASS with 0.0s deviation (event-driven sync). All bazel
test //... + clippy + rustfmt pass. New files ≤ 200 lines.
yishuiliunian added a commit that referenced this pull request May 21, 2026
…equest_idle for long-running safety (#180)
