
refactor(goal): drop barren-driven auto-complete #169

Merged: yishuiliunian merged 1 commit into main from refactor/goal-drop-barren-auto-complete on May 18, 2026

Conversation

@yishuiliunian
Contributor

Summary

  • Removes the barren-continuation safety net that auto-transitioned Active → Complete after two unproductive continuation turns.
  • Complete is now reachable only via ModelCompleted (update_goal) or UserCompleted (/goal complete); GoalTransitionReason::BarrenContinuation is gone.
  • Barren-tracking counter + threshold + reset hooks are retained as observation primitives for future safety designs that don't mutate goal status.

Why

The auto-complete edge overloaded the terminal Complete status with a system-driven "stalled" meaning, which surprised both the model and the user: the goal reads complete while the objective is clearly unfinished. After the recent Complete → Active reopen edge (#167) it also produced an oscillating anomaly, in which every model reopen was re-stamped Complete as soon as the goal went idle again. The continuation pump itself stays, because the pump shape is the value of thread goals; the bad part was reusing a terminal status as a circuit breaker.

Changes

Protocol (crates/loopal-protocol/src/thread_goal.rs)

  • Drop GoalTransitionReason::BarrenContinuation enum variant
  • can_transition_to loses its (Active, Complete, BarrenContinuation) match arm
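
As a rough illustration of the resulting transition table, here is a minimal sketch. The variant names come from this PR, but the enum layout and the `can_transition_to` signature are assumptions made for the example; the real definitions in `thread_goal.rs` may differ.

```rust
// Hypothetical mirror of the post-PR transition table. Only the variant
// names are taken from the PR text; everything else is illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ThreadGoalStatus {
    Active,
    Complete,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum GoalTransitionReason {
    UserCreated,
    ModelCompleted,
    UserCompleted,
    ModelReopened,
    UserReopened,
    // BarrenContinuation used to be listed here; it no longer exists.
}

impl ThreadGoalStatus {
    /// Returns true when `self -> next` is a legal edge for `reason`.
    fn can_transition_to(self, next: ThreadGoalStatus, reason: GoalTransitionReason) -> bool {
        use GoalTransitionReason::*;
        use ThreadGoalStatus::*;
        matches!(
            (self, next, reason),
            // Complete is reachable only by an explicit model or user action.
            (Active, Complete, ModelCompleted | UserCompleted)
                // The reopen edges from #167 are kept.
                | (Complete, Active, ModelReopened | UserReopened)
        )
    }
}
```

With the variant gone, a system-driven completion is not merely rejected at runtime; it cannot be expressed at all.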

Runtime (crates/loopal-runtime/src/agent_loop/)

  • goal_continuation.rs: remove threshold check + complete_goal_on_barren call
  • goal_barren.rs: remove complete_goal_on_barren method; keep compute_next_barren_count + record_turn_for_barren_tracking

Tests

  • loopal-protocol/tests/suite/thread_goal_test.rs: purge BarrenContinuation references
  • loopal-runtime/tests/agent_loop/goal_e2e_test.rs: replace barren_continuations_auto_complete_goal with barren_threshold_does_not_complete_goal — three unproductive turns followed by ModelCompleted yields exactly UserCreated + ModelCompleted transitions
  • loopal-test-support/src/mock_provider.rs: drop the now-defunct "barren-continuation demotion" mention in PendingStream doc
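
The scenario behind the new negative test can be simulated in a few lines. This is a sketch under stated assumptions, not the actual e2e test: `Goal`, `record_turn`, `model_complete`, and `run_scenario` are hypothetical names, and resetting the counter on a productive turn is an assumed behavior.

```rust
// Hypothetical stand-ins for the real goal types; only the reason names
// UserCreated and ModelCompleted are taken from the PR text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    Active,
    Complete,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Reason {
    UserCreated,
    ModelCompleted,
}

struct Goal {
    status: Status,
    transitions: Vec<Reason>,
    barren_count: u32, // observed only, never acted on
}

impl Goal {
    fn create() -> Self {
        Goal {
            status: Status::Active,
            transitions: vec![Reason::UserCreated],
            barren_count: 0,
        }
    }

    /// Records one continuation turn. An unproductive turn increments the
    /// counter; a productive one resets it (assumed). Status is untouched.
    fn record_turn(&mut self, productive: bool) {
        self.barren_count = if productive { 0 } else { self.barren_count + 1 };
    }

    /// Explicit completion by the model: the only system-visible way to
    /// reach Complete in this sketch.
    fn model_complete(&mut self) {
        if self.status == Status::Active {
            self.status = Status::Complete;
            self.transitions.push(Reason::ModelCompleted);
        }
    }
}

/// Three unproductive turns, then an explicit ModelCompleted.
fn run_scenario() -> Goal {
    let mut goal = Goal::create();
    for _ in 0..3 {
        goal.record_turn(false);
        // Crossing any barren threshold must not complete the goal.
        assert_eq!(goal.status, Status::Active);
    }
    goal.model_complete();
    goal
}
```

After `run_scenario()`, the transition log holds exactly `UserCreated` and `ModelCompleted`, while the counter still reports the three barren turns it observed.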

Test plan

  • bazel build //...
  • bazel build //... --config=clippy ✅ zero warnings
  • bazel build //... --config=rustfmt
  • bazel test //... ✅ 81/81 passing (incl. new negative test)
  • CI passes

…ned only by user/model

Barren-continuation safety net used to silently transition Active → Complete
once the LLM produced two unproductive continuation turns in a row. This
overloaded the terminal Complete status with a system-driven "stalled"
meaning, surprised both users and the model (the goal reads "complete"
while the objective is clearly unfinished), and after the recent
Complete → Active reopen edge it created an oscillating anomaly: every
reopen would be re-stamped Complete on the next idle.

Removes the auto-complete edge entirely. Complete is now reachable only
via ModelCompleted (update_goal) or UserCompleted (/goal complete); the
existing ModelReopened/UserReopened reopen edges from #167 are preserved.
The continuation pump itself stays — pump shape is the value of thread
goals. Barren-tracking infrastructure (counter, threshold,
compute_next_barren_count, record_turn_for_barren_tracking,
control-command reset hooks) is retained as an observation primitive for
future safety designs that act without mutating goal status.

Protocol: drop GoalTransitionReason::BarrenContinuation; can_transition_to
loses its (Active, Complete, BarrenContinuation) match arm while keeping
(Complete, Active, ModelReopened | UserReopened).
Runtime: drop complete_goal_on_barren method and the threshold check in
goal_continuation_check.
Tests: thread_goal_test removes BarrenContinuation from NON_REOPEN_REASONS;
goal_e2e_test replaces barren_continuations_auto_complete_goal with a
negative barren_threshold_does_not_complete_goal proving three
unproductive turns followed by ModelCompleted yields exactly
UserCreated + ModelCompleted transitions. mock_provider PendingStream
comment loses the now-defunct "barren-continuation demotion" reference.
yishuiliunian force-pushed the refactor/goal-drop-barren-auto-complete branch from 7207745 to 85b91d6 on May 18, 2026 09:02
yishuiliunian merged commit c0eda19 into main on May 18, 2026 (7 of 8 checks passed)
yishuiliunian deleted the refactor/goal-drop-barren-auto-complete branch on May 18, 2026 09:23
yishuiliunian added a commit that referenced this pull request May 21, 2026
…equest_idle for long-running safety

Background: macmini-03-64 session 89e6dc04 ran 10h53m and produced 1395
copies of "16/100. Not complete. No action." because the runtime had
five overlapping architectural gaps:

  1. max_barren_continuations was a dead field — counter accumulated but
     never enforced.
  2. LoopDetector only watched tool_use, never assistant text.
  3. ESC was a turn-level interrupt; cron / goal_continuation / rewake
     kept waking the agent.
  4. ThreadGoalStatus had no "structurally infeasible" terminal — the
     LLM could only lie via update_goal(complete) or keep emitting filler.
  5. The LLM had no protocol to declare "I have nothing to do right now;
     don't push me until an external signal arrives."

This change introduces three orthogonal mechanisms:

  • ContinuationGate (session-scoped injection valve)
      Controls whether goal_continuation re-prompts the LLM. Closed by
      DegenerationDetector, by /suspend, or by request_idle. Reopens on
      any envelope (user/cron/rewake) OR when a deadline elapses.
      Type-level invariant via GateClose enum: only UserSuspend may carry
      no wake_deadline; all other closures MUST provide one.

  • DegenerationDetector (Governance trait extension)
      Observes TurnHistory via the new on_after_turn hook. Detects
      RepeatedText (a run of turns whose text hashes are identical) and
      BarrenStreak (a run of turns with no tool_use), emits a
      DegenerationDetected event, closes the gate, and injects a
      governance feedback note for the LLM. NEVER mutates goal status
      (PR #169 contract preserved).

  • request_idle tool (RunnerDirect intercept)
      Lets the LLM declare session-level idle without changing the goal.
      Mandatory max_idle_duration_secs ∈ [60, 86400] gives the runtime a
      deadline to fall back to if no external signal arrives — prevents
      idle-deadlock in cron-less sessions.
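
The gate invariant and the request_idle bounds described above can be sketched as follows. This is an illustration under assumptions, not the runtime's code: the variant names `Degeneration` and `RequestIdle`, the field layout, and every method name are hypothetical, and the sketch reopens on any envelope without modelling which sources may wake a user-suspended session.

```rust
use std::time::{Duration, Instant};

/// Why the gate is closed. Only `UserSuspend` lacks a wake deadline, so a
/// closure that could strand the session cannot be built by accident.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum GateClose {
    UserSuspend,
    Degeneration { wake_deadline: Instant },
    RequestIdle { wake_deadline: Instant },
}

impl GateClose {
    fn wake_deadline(self) -> Option<Instant> {
        match self {
            GateClose::UserSuspend => None,
            GateClose::Degeneration { wake_deadline }
            | GateClose::RequestIdle { wake_deadline } => Some(wake_deadline),
        }
    }
}

#[derive(Debug, Default)]
struct ContinuationGate {
    closed: Option<GateClose>,
}

impl ContinuationGate {
    fn close(&mut self, reason: GateClose) {
        self.closed = Some(reason);
    }

    /// An incoming envelope (user/cron/rewake) reopens the gate.
    fn on_envelope(&mut self) {
        self.closed = None;
    }

    /// True when goal_continuation may re-prompt. Reopens lazily once the
    /// wake deadline, if any, has elapsed.
    fn allows_continuation(&mut self, now: Instant) -> bool {
        if let Some(deadline) = self.closed.and_then(GateClose::wake_deadline) {
            if now >= deadline {
                self.closed = None;
            }
        }
        self.closed.is_none()
    }
}

/// Validates the mandatory idle bound and closes the gate with a deadline.
fn request_idle(
    gate: &mut ContinuationGate,
    now: Instant,
    max_idle_duration_secs: u64,
) -> Result<(), String> {
    if !(60..=86_400).contains(&max_idle_duration_secs) {
        return Err(format!(
            "max_idle_duration_secs must be in [60, 86400], got {max_idle_duration_secs}"
        ));
    }
    gate.close(GateClose::RequestIdle {
        wake_deadline: now + Duration::from_secs(max_idle_duration_secs),
    });
    Ok(())
}
```

Encoding the deadline in the enum variants, rather than as an optional field on the gate, is what makes the "all other closures MUST provide one" rule a compile-time property in this sketch.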

Supporting changes:

  • Protocol: ThreadGoalStatus +Infeasible terminal status; AgentStatus
    +Suspended; ControlCommand +Suspend/+Unsuspend; new event variants
    DegenerationDetected and ContinuationGateChanged with payloads in
    event_summary.rs to keep event_payload.rs ≤ 200 lines.
  • MessageSource +wakes_suspended_session() / +is_ephemeral_in_history()
    predicates so policy lives on the protocol, not inlined in ingest.
  • Message +ephemeral_in_history (write-time snapshot of source class)
    so all envelopes — including goal_continuation — persist, enabling
    forensic replay of what the LLM actually saw.
  • Configurable thresholds via HarnessConfig (degeneration_*).
  • TUI /suspend + /unsuspend commands; status renders Suspended; goal
    renders Infeasible.
  • Test infrastructure: harness wiring now actually wires the Governance
    chain (silent long-standing bug); SpawnedHarness exposes
    InterruptSignal so tests can drive ESC; llm_chunk_delay + new
    wait_for_stream_event give event-driven test pacing (replaces
    sleep-based timing).
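
To make the detector's two checks and their configurable thresholds concrete, here is a minimal sketch. It assumes a simplified turn record and a plain threshold struct; `TurnRecord`, `Thresholds`, `trailing_run`, and `detect` are hypothetical names, and the threshold values used with it are illustrative, not the HarnessConfig defaults.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// The two degeneration kinds named in the commit message.
#[derive(Debug, PartialEq, Eq)]
enum Degeneration {
    RepeatedText { run: usize },
    BarrenStreak { run: usize },
}

/// Hypothetical stand-in for the degeneration_* settings.
struct Thresholds {
    repeated_text_run: usize,
    barren_streak_run: usize,
}

/// Simplified view of one assistant turn.
struct TurnRecord {
    text_hash: u64,
    used_tool: bool,
}

impl TurnRecord {
    fn new(text: &str, used_tool: bool) -> Self {
        let mut hasher = DefaultHasher::new();
        text.hash(&mut hasher);
        TurnRecord { text_hash: hasher.finish(), used_tool }
    }
}

/// Length of the run of most recent turns that satisfy `pred`.
fn trailing_run(history: &[TurnRecord], pred: impl Fn(&TurnRecord) -> bool) -> usize {
    history.iter().rev().take_while(|t| pred(*t)).count()
}

/// Pure observation: reports degeneration, never touches goal status.
fn detect(history: &[TurnRecord], limits: &Thresholds) -> Option<Degeneration> {
    let last = history.last()?;
    let repeated = trailing_run(history, |t| t.text_hash == last.text_hash);
    if repeated >= limits.repeated_text_run {
        return Some(Degeneration::RepeatedText { run: repeated });
    }
    let barren = trailing_run(history, |t| !t.used_tool);
    if barren >= limits.barren_streak_run {
        return Some(Degeneration::BarrenStreak { run: barren });
    }
    None
}
```

Fed a history of identical "16/100. Not complete. No action." turns, `detect` reports `RepeatedText` as soon as the run reaches the configured length, which is the point at which the runtime would close the gate instead of re-prompting.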

Removed: goal_barren.rs (dead barren counter), GoalSettings (dead config
with default_token_budget + barren_continuation_limit — both refer to
subsystems already removed by PR #165 and PR #169).

Coverage: 16 e2e tests covering 12 core invariants. ESC interrupt test
runs 5/5 PASS with 0.0s deviation (event-driven sync). All bazel
test //... + clippy + rustfmt pass. New files ≤ 200 lines.
yishuiliunian added a commit that referenced this pull request May 21, 2026
…equest_idle for long-running safety (#180)