refactor(goal): drop barren-driven auto-complete #169
Merged
…ned only by user/model

Barren-continuation safety net used to silently transition Active → Complete once the LLM produced two unproductive continuation turns in a row. This overloaded the terminal Complete status with a system-driven "stalled" meaning, surprised both users and the model (the goal reads "complete" while the objective is clearly unfinished), and after the recent Complete → Active reopen edge it created an oscillating anomaly: every reopen would be re-stamped Complete on the next idle.

Removes the auto-complete edge entirely. Complete is now reachable only via ModelCompleted (update_goal) or UserCompleted (/goal complete); the existing ModelReopened/UserReopened reopen edges from #167 are preserved.

The continuation pump itself stays: pump shape is the value of thread goals. Barren-tracking infrastructure (counter, threshold, compute_next_barren_count, record_turn_for_barren_tracking, control-command reset hooks) is retained as an observation primitive for future safety designs that act without mutating goal status.

Protocol: drop GoalTransitionReason::BarrenContinuation; can_transition_to loses its (Active, Complete, BarrenContinuation) match arm while keeping (Complete, Active, ModelReopened | UserReopened).

Runtime: drop complete_goal_on_barren method and the threshold check in goal_continuation_check.

Tests: thread_goal_test removes BarrenContinuation from NON_REOPEN_REASONS; goal_e2e_test replaces barren_continuations_auto_complete_goal with a negative barren_threshold_does_not_complete_goal proving three unproductive turns followed by ModelCompleted yields exactly UserCreated + ModelCompleted transitions. mock_provider PendingStream comment loses the now-defunct "barren-continuation demotion" reference.
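The transition rules described in this commit message can be sketched roughly as below. This is a minimal illustration, not the code in thread_goal.rs: the two-state `ThreadGoalStatus`, the method signature of `can_transition_to`, and the exact set of reasons are assumptions reduced to what the message names.

```rust
/// Hypothetical, trimmed-down mirror of the goal status enum.
/// The real type may carry more states than these two.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ThreadGoalStatus {
    Active,
    Complete,
}

/// Only the reasons named in this PR; `BarrenContinuation` is absent on purpose.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GoalTransitionReason {
    UserCreated,
    ModelCompleted,
    UserCompleted,
    ModelReopened,
    UserReopened,
}

impl ThreadGoalStatus {
    /// Returns true when `self -> next` is a legal edge for `reason`.
    /// Assumed signature; the real method may differ.
    pub fn can_transition_to(self, next: Self, reason: GoalTransitionReason) -> bool {
        use GoalTransitionReason::*;
        use ThreadGoalStatus::*;
        matches!(
            (self, next, reason),
            // Complete is assigned only by the model or the user.
            (Active, Complete, ModelCompleted | UserCompleted)
                // Reopen edges from #167 are preserved.
                | (Complete, Active, ModelReopened | UserReopened)
        )
    }
}
```

With the barren arm gone, no system-driven reason can move a goal from Active to Complete in this sketch; every accepted edge names either the model or the user.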
force-pushed from 7207745 to 85b91d6
yishuiliunian added a commit that referenced this pull request on May 21, 2026
…equest_idle for long-running safety
Background: macmini-03-64 session 89e6dc04 ran 10h53m and produced 1395
copies of "16/100. Not complete. No action." because the runtime had
five overlapping architectural gaps:
1. max_barren_continuations was a dead field — counter accumulated but
never enforced.
2. LoopDetector only watched tool_use, never assistant text.
3. ESC was a turn-level interrupt; cron / goal_continuation / rewake
kept waking the agent.
4. ThreadGoalStatus had no "structurally infeasible" terminal — the
LLM could only lie via update_goal(complete) or keep emitting filler.
5. The LLM had no protocol to declare "I have nothing to do right now;
don't push me until an external signal arrives."
This change introduces three orthogonal mechanisms:
• ContinuationGate (session-scoped injection valve)
Controls whether goal_continuation re-prompts the LLM. Closed by
DegenerationDetector, by /suspend, or by request_idle. Reopens on
any envelope (user/cron/rewake) OR when a deadline elapses.
Type-level invariant via GateClose enum: only UserSuspend may carry
no wake_deadline; all other closures MUST provide one.
• DegenerationDetector (Governance trait extension)
Observes TurnHistory via the new on_after_turn hook. Detects
RepeatedText (a run of turns with the same text hash) and BarrenStreak
(a run of turns with no tool_use), emits a DegenerationDetected event,
closes the gate, and injects a governance feedback note for the LLM.
NEVER mutates goal status (PR #169 contract preserved).
• request_idle tool (RunnerDirect intercept)
Lets the LLM declare session-level idle without changing the goal.
Mandatory max_idle_duration_secs ∈ [60, 86400] gives the runtime a
deadline to fall back to if no external signal arrives — prevents
idle-deadlock in cron-less sessions.
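The ContinuationGate and request_idle rules above can be illustrated with a small sketch. It is written under assumptions: the `GateClose` variant names other than `UserSuspend`, the method names, and the choice to reject (rather than clamp) an out-of-range duration are all hypothetical. Only the `[60, 86400]` bounds and the "only UserSuspend may omit a deadline" invariant come from the commit message.

```rust
use std::time::{Duration, Instant};

/// Bounds quoted in the commit message for `max_idle_duration_secs`.
pub const MIN_IDLE_SECS: u64 = 60;
pub const MAX_IDLE_SECS: u64 = 86_400;

/// Why the gate closed. Only `UserSuspend` lacks a deadline field, so a
/// deadline-less closure from any other source cannot be constructed.
/// Variant names besides `UserSuspend` are hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GateClose {
    UserSuspend,
    Degeneration { wake_deadline: Instant },
    RequestIdle { wake_deadline: Instant },
}

impl GateClose {
    pub fn wake_deadline(&self) -> Option<Instant> {
        match *self {
            GateClose::UserSuspend => None,
            GateClose::Degeneration { wake_deadline }
            | GateClose::RequestIdle { wake_deadline } => Some(wake_deadline),
        }
    }
}

/// Validates the `request_idle` argument (assumed to reject, not clamp).
pub fn validate_idle_secs(secs: u64) -> Result<Duration, String> {
    if (MIN_IDLE_SECS..=MAX_IDLE_SECS).contains(&secs) {
        Ok(Duration::from_secs(secs))
    } else {
        Err(format!(
            "max_idle_duration_secs must be in [{}, {}], got {}",
            MIN_IDLE_SECS, MAX_IDLE_SECS, secs
        ))
    }
}

/// Session-scoped valve: `None` means open.
#[derive(Debug, Default)]
pub struct ContinuationGate {
    closed: Option<GateClose>,
}

impl ContinuationGate {
    pub fn close(&mut self, reason: GateClose) {
        self.closed = Some(reason);
    }

    /// Any incoming envelope (user, cron, rewake) reopens the gate.
    pub fn on_envelope(&mut self) {
        self.closed = None;
    }

    /// True when goal_continuation may re-prompt the LLM at `now`.
    /// A closure whose deadline has elapsed reopens the gate as a side effect.
    pub fn allows_continuation(&mut self, now: Instant) -> bool {
        if let Some(reason) = self.closed {
            match reason.wake_deadline() {
                Some(deadline) if now >= deadline => self.closed = None,
                _ => return false,
            }
        }
        true
    }
}
```

Encoding the deadline in the enum variants, rather than as a separate `Option<Instant>` field next to a reason tag, is what makes the invariant type-level in this sketch: a degeneration or idle closure without a fallback deadline does not compile.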
Supporting changes:
• Protocol: ThreadGoalStatus +Infeasible terminal status; AgentStatus
+Suspended; ControlCommand +Suspend/+Unsuspend; new event variants
DegenerationDetected and ContinuationGateChanged with payloads in
event_summary.rs to keep event_payload.rs ≤ 200 lines.
• MessageSource +wakes_suspended_session() / +is_ephemeral_in_history()
predicates so policy lives on the protocol, not inlined in ingest.
• Message +ephemeral_in_history (write-time snapshot of source class)
so all envelopes — including goal_continuation — persist, enabling
forensic replay of what the LLM actually saw.
• Configurable thresholds via HarnessConfig (degeneration_*).
• TUI /suspend + /unsuspend commands; status renders Suspended; goal
renders Infeasible.
• Test infrastructure: harness wiring now actually wires the Governance
chain (silent long-standing bug); SpawnedHarness exposes
InterruptSignal so tests can drive ESC; llm_chunk_delay + new
wait_for_stream_event give event-driven test pacing (replaces
sleep-based timing).
Removed: goal_barren.rs (dead barren counter), GoalSettings (dead config
with default_token_budget + barren_continuation_limit — both refer to
subsystems already removed by PR #165 and PR #169).
Coverage: 16 e2e tests covering 12 core invariants. ESC interrupt test
runs 5/5 PASS with 0.0s deviation (event-driven sync). All bazel
test //... + clippy + rustfmt pass. New files ≤ 200 lines.
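As an illustration of the DegenerationDetector described in this commit message, the sketch below checks a turn history for the two patterns it names. The `TurnRecord` shape, the threshold field names, the hashing choice, and the `on_after_turn` signature are assumptions made for the example; the real hook belongs to the Governance trait and reads thresholds from HarnessConfig.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical per-turn summary: a hash of the assistant text plus
/// whether the turn contained any tool_use.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TurnRecord {
    pub text_hash: u64,
    pub used_tool: bool,
}

impl TurnRecord {
    pub fn new(assistant_text: &str, used_tool: bool) -> Self {
        let mut hasher = DefaultHasher::new();
        assistant_text.trim().hash(&mut hasher);
        Self { text_hash: hasher.finish(), used_tool }
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Degeneration {
    /// The last `run` turns all produced the same text hash.
    RepeatedText { run: usize },
    /// The last `run` turns contained no tool_use.
    BarrenStreak { run: usize },
}

/// Threshold names are illustrative stand-ins for the `degeneration_*` config.
pub struct DegenerationDetector {
    pub repeated_text_threshold: usize,
    pub barren_streak_threshold: usize,
}

impl DegenerationDetector {
    /// Inspects the history after a turn. It only reports; it takes no goal
    /// handle, so it cannot mutate goal status.
    pub fn on_after_turn(&self, history: &[TurnRecord]) -> Option<Degeneration> {
        let last = history.last()?;
        // Length of the trailing run of turns whose text matches the last turn.
        let repeated = history
            .iter()
            .rev()
            .take_while(|t| t.text_hash == last.text_hash)
            .count();
        if repeated >= self.repeated_text_threshold {
            return Some(Degeneration::RepeatedText { run: repeated });
        }
        // Length of the trailing run of turns without any tool_use.
        let barren = history.iter().rev().take_while(|t| !t.used_tool).count();
        if barren >= self.barren_streak_threshold {
            return Some(Degeneration::BarrenStreak { run: barren });
        }
        None
    }
}
```

In this sketch the caller would react to `Some(_)` by emitting the event and closing the gate with a deadline; a session repeating "16/100. Not complete. No action." would trip `RepeatedText` after the configured number of identical turns.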
yishuiliunian added a commit that referenced this pull request on May 21, 2026
…equest_idle for long-running safety (#180)
Summary
Complete is now reachable only via ModelCompleted (update_goal) or UserCompleted (/goal complete); GoalTransitionReason::BarrenContinuation is gone.

Why
The auto-complete edge overloaded the terminal Complete status with a system-driven "stalled" meaning, which surprised both the model and the user (the goal reads complete while the objective is clearly unfinished). After the recent Complete → Active reopen edge (#167) it produced an oscillating anomaly: every model reopen would be re-stamped Complete on the next idle. The continuation pump itself stays: pump shape is the value of thread goals; the bad part was reusing a terminal status as a circuit breaker.

Changes
Protocol (crates/loopal-protocol/src/thread_goal.rs)
• Drop the GoalTransitionReason::BarrenContinuation enum variant
• can_transition_to loses its (Active, Complete, BarrenContinuation) match arm

Runtime (crates/loopal-runtime/src/agent_loop/)
• goal_continuation.rs: remove threshold check + complete_goal_on_barren call
• goal_barren.rs: remove complete_goal_on_barren method; keep compute_next_barren_count + record_turn_for_barren_tracking

Tests
• loopal-protocol/tests/suite/thread_goal_test.rs: purge BarrenContinuation references
• loopal-runtime/tests/agent_loop/goal_e2e_test.rs: replace barren_continuations_auto_complete_goal with barren_threshold_does_not_complete_goal: three unproductive turns followed by ModelCompleted yields exactly UserCreated + ModelCompleted transitions
• loopal-test-support/src/mock_provider.rs: drop the now-defunct "barren-continuation demotion" mention in PendingStream doc

Test plan
• bazel build //... ✅
• bazel build //... --config=clippy ✅ zero warnings
• bazel build //... --config=rustfmt ✅
• bazel test //... ✅ 81/81 passing (incl. new negative test)
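The property checked by the new negative test can be shown with a toy model. This is not the e2e test from goal_e2e_test.rs: `Goal`, `Status`, `Reason`, and their methods are hypothetical stand-ins, and the real test drives the runtime through a mock provider. The sketch only demonstrates the invariant that barren turns are counted but never recorded as a transition.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Active,
    Complete,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Reason {
    UserCreated,
    ModelCompleted,
}

/// Toy goal that logs every transition reason it has seen.
#[derive(Debug)]
pub struct Goal {
    pub status: Status,
    pub barren_count: u32,
    pub transitions: Vec<Reason>,
}

impl Goal {
    pub fn create() -> Self {
        Goal {
            status: Status::Active,
            barren_count: 0,
            transitions: vec![Reason::UserCreated],
        }
    }

    /// Records a continuation turn. Unproductive turns only bump the
    /// counter; the status and the transition log are left untouched.
    pub fn record_turn(&mut self, productive: bool) {
        self.barren_count = if productive { 0 } else { self.barren_count + 1 };
    }

    /// Stands in for the model calling update_goal(complete).
    pub fn model_complete(&mut self) {
        if self.status == Status::Active {
            self.status = Status::Complete;
            self.transitions.push(Reason::ModelCompleted);
        }
    }
}

/// Three unproductive turns, then an explicit completion by the model.
pub fn barren_threshold_does_not_complete_goal() -> Goal {
    let mut goal = Goal::create();
    for _ in 0..3 {
        goal.record_turn(false);
        // The goal must still be Active after every barren turn.
        assert_eq!(goal.status, Status::Active);
    }
    goal.model_complete();
    goal
}
```

Calling `barren_threshold_does_not_complete_goal()` leaves the transition log with exactly two entries, `UserCreated` then `ModelCompleted`, while the barren counter still reads 3 as a pure observation.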