feat(agent-loop): ContinuationGate + DegenerationDetector + request_idle#180

Merged
yishuiliunian merged 1 commit into main from worktree-dreamy-painting-kite on May 21, 2026
@yishuiliunian
Contributor

Summary

  • Fix long-running agent degeneration loop (root case: macmini-03-64 / 10h53m / 1395 reps).
  • Three orthogonal mechanisms: ContinuationGate (session-level injection valve), DegenerationDetector (Governance trait extension, observes TurnHistory, never mutates goal), request_idle tool (LLM-declared session idle with mandatory deadline).
  • Protocol extensions: ThreadGoalStatus::Infeasible, AgentStatus::Suspended, ControlCommand::{Suspend,Unsuspend}, new event variants, MessageSource predicate methods.

Changes

New mechanisms (crates/loopal-runtime/src/agent_loop/):

  • continuation_gate.rs — Session-scoped valve; GateClose enum makes "UserSuspend has no deadline" a type-level invariant.
  • turn_history.rs — Circular buffer of TurnRecord (metrics + text_hash) for cross-turn governance decisions.
  • degeneration_detector.rs — Detects RepeatedText / BarrenStreak streaks; on_envelope_received resets silenced flag so /unsuspend + relapse can re-trigger.
  • degeneration_feedback.rs — User-facing feedback templates.
  • handle_request_idle.rs — RunnerDirect intercept; validates max_idle_duration_secs ∈ [60, 86400].
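
As a rough illustration of how a turn history and a streak detector could fit together, here is a minimal sketch. It is not the code in turn_history.rs or degeneration_detector.rs: the field names, the `detect` function, and the use of `VecDeque` as the circular buffer are all assumptions made for the example, and the silenced-flag handling is not modelled.

```rust
use std::collections::VecDeque;

/// Hypothetical per-turn record: a hash of the assistant text plus a tool-use count.
#[derive(Clone, Debug, PartialEq)]
pub struct TurnRecord {
    pub text_hash: u64,
    pub tool_uses: u32,
}

/// Fixed-capacity history; the oldest record is evicted once the buffer is full.
pub struct TurnHistory {
    cap: usize,
    records: VecDeque<TurnRecord>,
}

impl TurnHistory {
    pub fn new(cap: usize) -> Self {
        let cap = cap.max(1);
        Self { cap, records: VecDeque::with_capacity(cap) }
    }

    pub fn push(&mut self, rec: TurnRecord) {
        if self.records.len() == self.cap {
            self.records.pop_front();
        }
        self.records.push_back(rec);
    }

    /// Length of the trailing run of turns that share the newest text hash.
    pub fn repeated_text_run(&self) -> usize {
        match self.records.back() {
            None => 0,
            Some(last) => self
                .records
                .iter()
                .rev()
                .take_while(|r| r.text_hash == last.text_hash)
                .count(),
        }
    }

    /// Length of the trailing run of turns that used no tools.
    pub fn barren_run(&self) -> usize {
        self.records.iter().rev().take_while(|r| r.tool_uses == 0).count()
    }
}

#[derive(Debug, PartialEq)]
pub enum Degeneration {
    RepeatedText { run: usize },
    BarrenStreak { run: usize },
}

/// Reports the first streak that reaches its threshold. The history is only read.
pub fn detect(
    history: &TurnHistory,
    duplicate_text_threshold: usize,
    barren_threshold: usize,
) -> Option<Degeneration> {
    let repeated = history.repeated_text_run();
    if repeated >= duplicate_text_threshold {
        return Some(Degeneration::RepeatedText { run: repeated });
    }
    let barren = history.barren_run();
    if barren >= barren_threshold {
        return Some(Degeneration::BarrenStreak { run: barren });
    }
    None
}
```

In a layout like this the buffer capacity has to be at least as large as the larger threshold, otherwise a streak can never grow long enough to trigger.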

Protocol (crates/loopal-protocol/):

  • ThreadGoalStatus::Infeasible + ModelInfeasible/UserInfeasible reasons.
  • AgentStatus::Suspended; ControlCommand::Suspend/Unsuspend.
  • MessageSource::wakes_suspended_session() + is_ephemeral_in_history() — policy moved off ingest.rs inline matches.
  • Message::ephemeral_in_history — write-time snapshot, enables forensic replay.
  • New event payloads: DegenerationDetected(DegenerationSummary), ContinuationGateChanged(ContinuationGateSummary).
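
To make the "policy on the protocol type" idea concrete, the sketch below puts both predicates on the enum and snapshots one of them when a message is constructed. The variant list and the specific policy choices (only Human wakes a suspended session, only goal_continuation is ephemeral) are assumptions inferred from this description, not the actual definitions in loopal-protocol.

```rust
/// Hypothetical source classes; the real enum may have more or different variants.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum MessageSource {
    Human,
    Cron,
    Rewake,
    GoalContinuation,
}

impl MessageSource {
    /// Whether an envelope from this source lifts a user suspension.
    /// Assumption: only human input does.
    pub fn wakes_suspended_session(self) -> bool {
        matches!(self, MessageSource::Human)
    }

    /// Whether a message from this source is marked ephemeral in history.
    /// Assumption: only runtime-injected continuation prompts are.
    pub fn is_ephemeral_in_history(self) -> bool {
        matches!(self, MessageSource::GoalContinuation)
    }
}

pub struct Message {
    pub source: MessageSource,
    pub text: String,
    /// Snapshot taken at write time, so replay does not depend on later policy changes.
    pub ephemeral_in_history: bool,
}

impl Message {
    pub fn new(source: MessageSource, text: impl Into<String>) -> Self {
        Self {
            source,
            text: text.into(),
            ephemeral_in_history: source.is_ephemeral_in_history(),
        }
    }
}
```

Storing the flag on the message, rather than recomputing it from the source on read, is what lets an old transcript be replayed exactly as it was written.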

Tool layer:

  • New crate crates/tools/agent/idle/: RequestIdleTool with a typed schema.
  • update_goal extended to accept status: infeasible.
  • GoalSession trait + adapter add mark_infeasible_by_model.
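
A minimal sketch of the range check on max_idle_duration_secs, assuming a plain struct for the tool arguments. The `RequestIdleArgs`, `IdleError`, and `validate_idle` names are hypothetical; only the field name and the [60, 86400] bounds come from this PR.

```rust
use std::time::Duration;

pub const MIN_IDLE_SECS: u64 = 60;
pub const MAX_IDLE_SECS: u64 = 86_400;

/// Hypothetical typed arguments for a request_idle call.
#[derive(Debug, PartialEq)]
pub struct RequestIdleArgs {
    pub max_idle_duration_secs: u64,
}

#[derive(Debug, PartialEq)]
pub enum IdleError {
    OutOfRange { got: u64 },
}

/// Accepts only durations inside the inclusive range [60, 86400] seconds.
pub fn validate_idle(args: &RequestIdleArgs) -> Result<Duration, IdleError> {
    let secs = args.max_idle_duration_secs;
    if (MIN_IDLE_SECS..=MAX_IDLE_SECS).contains(&secs) {
        Ok(Duration::from_secs(secs))
    } else {
        Err(IdleError::OutOfRange { got: secs })
    }
}
```

Because the field is required and bounded, every accepted idle request gives the runtime a finite deadline to fall back on.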

Governance:

  • Governance::on_after_turn(record, history) -> PostTurnAction — default no-op, binary-compatible.
  • PostTurnAction::Degeneration carries DegenerationSummary + feedback string.
  • build_governance registers DegenerationDetector alongside existing LoopDetector.
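
The default-method approach can be sketched as follows. This is an illustration under assumptions, not the real trait: the `&mut self` receiver, the slice-typed history, the `Continue` variant, and the `LegacyDetector`, `BarrenDetector`, and `run_chain` names are all invented for the example, and the history is assumed to already include the current record.

```rust
pub struct TurnRecord {
    pub text_hash: u64,
    pub tool_uses: u32,
}

#[derive(Debug, PartialEq)]
pub struct DegenerationSummary {
    pub barren_run: usize,
}

#[derive(Debug, PartialEq)]
pub enum PostTurnAction {
    Continue,
    Degeneration { summary: DegenerationSummary, feedback: String },
}

pub trait Governance {
    /// Default no-op, so implementors written before this hook compile unchanged.
    fn on_after_turn(&mut self, _record: &TurnRecord, _history: &[TurnRecord]) -> PostTurnAction {
        PostTurnAction::Continue
    }
}

/// Stand-in for a pre-existing governor that does not override the new hook.
pub struct LegacyDetector;
impl Governance for LegacyDetector {}

/// Simplified detector: flags a trailing run of turns without tool use.
pub struct BarrenDetector {
    pub threshold: usize,
}

impl Governance for BarrenDetector {
    fn on_after_turn(&mut self, _record: &TurnRecord, history: &[TurnRecord]) -> PostTurnAction {
        let run = history.iter().rev().take_while(|r| r.tool_uses == 0).count();
        if run >= self.threshold {
            PostTurnAction::Degeneration {
                summary: DegenerationSummary { barren_run: run },
                feedback: format!("{run} consecutive turns without tool use"),
            }
        } else {
            PostTurnAction::Continue
        }
    }
}

/// Runs every governor in order and returns the first non-default action.
pub fn run_chain(
    chain: &mut [Box<dyn Governance>],
    record: &TurnRecord,
    history: &[TurnRecord],
) -> PostTurnAction {
    for g in chain.iter_mut() {
        let action = g.on_after_turn(record, history);
        if action != PostTurnAction::Continue {
            return action;
        }
    }
    PostTurnAction::Continue
}
```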

Runtime control plane:

  • select_input short-circuits on Suspended (cron/rewake blocked).
  • ingest_message opens gate, auto-unsuspends on Human envelope.
  • goal_continuation_check checks gate + auto-reopens at deadline.
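
One way to picture the gate check with auto-reopen at the deadline is the sketch below. The variant names other than UserSuspend, the `allows_continuation` and `close_idle` methods, and the choice of `Instant` for deadlines are assumptions; the point is only that a closure reason without a deadline field cannot be built for anything except a user suspend.

```rust
use std::time::{Duration, Instant};

/// Why the gate closed. Only `UserSuspend` has no deadline field, so every
/// other closure has to supply one in order to be constructed at all.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum GateClose {
    UserSuspend,
    Degeneration { wake_deadline: Instant },
    RequestIdle { wake_deadline: Instant },
}

impl GateClose {
    pub fn wake_deadline(&self) -> Option<Instant> {
        match *self {
            GateClose::UserSuspend => None,
            GateClose::Degeneration { wake_deadline }
            | GateClose::RequestIdle { wake_deadline } => Some(wake_deadline),
        }
    }
}

#[derive(Debug, Default)]
pub struct ContinuationGate {
    closed: Option<GateClose>,
}

impl ContinuationGate {
    pub fn close(&mut self, reason: GateClose) {
        self.closed = Some(reason);
    }

    /// Convenience for an idle request: the deadline is `now + max_idle`.
    pub fn close_idle(&mut self, now: Instant, max_idle: Duration) {
        self.close(GateClose::RequestIdle { wake_deadline: now + max_idle });
    }

    /// Called when an incoming envelope should reopen the gate.
    pub fn open(&mut self) {
        self.closed = None;
    }

    /// True if a continuation prompt may be injected at `now`.
    /// A closure whose deadline has passed is cleared as a side effect.
    pub fn allows_continuation(&mut self, now: Instant) -> bool {
        match self.closed {
            None => true,
            Some(reason) => match reason.wake_deadline() {
                Some(deadline) if now >= deadline => {
                    self.closed = None;
                    true
                }
                _ => false,
            },
        }
    }
}
```

Passing `now` in explicitly, instead of calling `Instant::now()` inside the gate, keeps the deadline logic testable without sleeping.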

TUI:

  • /suspend + /unsuspend commands.
  • Status renders Suspended ⏸; goal renders Infeasible (red).

Test infrastructure:

  • wiring.rs now actually wires the Governance chain (silent long-standing bug — all previous tests with governance-dependent assertions ran with empty governance vec).
  • SpawnedHarness exposes InterruptSignal so tests can drive ESC.
  • HarnessBuilder::llm_chunk_delay() — controllable stream pacing.
  • e2e_event_waiters.rs — shared async waiters with scope-guard comment.

Tests: 16 e2e + ~45 unit tests covering 12 core invariants. ESC interrupt test stable at 5/5 PASS with 0.0s deviation (event-driven sync via wait_for_stream_event, which replaced sleep-based timing).

Config:

  • HarnessConfig.degeneration_{barren_threshold, duplicate_text_threshold, wake_after_secs} defaults 50 / 5 / 3600s.
  • Removed dead GoalSettings (held default_token_budget from removed token-budget subsystem and barren_continuation_limit from removed barren counter).
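
For reference, the three defaults could be expressed as a `Default` impl along these lines. Only the field names and default values are taken from this PR; the integer types and the reduced struct (the real HarnessConfig presumably has other fields) are assumptions.

```rust
/// Reduced, hypothetical view of the config: only the degeneration_* fields.
#[derive(Debug, Clone, PartialEq)]
pub struct HarnessConfig {
    /// Consecutive turns without tool use before a BarrenStreak is reported.
    pub degeneration_barren_threshold: u32,
    /// Consecutive turns with identical text before RepeatedText is reported.
    pub degeneration_duplicate_text_threshold: u32,
    /// Seconds after which a gate closed by the detector reopens.
    pub degeneration_wake_after_secs: u64,
}

impl Default for HarnessConfig {
    fn default() -> Self {
        Self {
            degeneration_barren_threshold: 50,
            degeneration_duplicate_text_threshold: 5,
            degeneration_wake_after_secs: 3600,
        }
    }
}
```

A test that wants a faster trigger could then override a single field with struct update syntax, for example `HarnessConfig { degeneration_duplicate_text_threshold: 2, ..Default::default() }`.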

Deleted:

  • goal_barren.rs + goal_kickoff_barren_test.rs (barren counter never enforced; replaced by DegenerationDetector).

Test plan

  • CI: bazel test //... passes
  • CI: bazel build //... --config=clippy passes
  • CI: bazel build //... --config=rustfmt passes
  • All 16 new e2e tests pass
  • ESC interrupt test stable across 5+ runs (already verified locally with deviation 0.0s)
  • file_size_cap_test still passes (event_payload.rs / event_summary.rs / control.rs all ≤ 200 lines)

Design references

…equest_idle for long-running safety

Background: macmini-03-64 session 89e6dc04 ran 10h53m and produced 1395
copies of "16/100. Not complete. No action." because the runtime had
five overlapping architectural gaps:

  1. max_barren_continuations was a dead field — counter accumulated but
     never enforced.
  2. LoopDetector only watched tool_use, never assistant text.
  3. ESC was a turn-level interrupt; cron / goal_continuation / rewake
     kept waking the agent.
  4. ThreadGoalStatus had no "structurally infeasible" terminal — the
     LLM could only lie via update_goal(complete) or keep emitting filler.
  5. The LLM had no protocol to declare "I have nothing to do right now;
     don't push me until an external signal arrives."

This change introduces three orthogonal mechanisms:

  • ContinuationGate (session-scoped injection valve)
      Controls whether goal_continuation re-prompts the LLM. Closed by
      DegenerationDetector, by /suspend, or by request_idle. Reopens on
      any envelope (user/cron/rewake) OR when a deadline elapses.
      Type-level invariant via GateClose enum: only UserSuspend may carry
      no wake_deadline; all other closures MUST provide one.

  • DegenerationDetector (Governance trait extension)
      Observes TurnHistory via the new on_after_turn hook. Detects
      RepeatedText (same hash run) and BarrenStreak (no tool_use run),
      emits DegenerationDetected event, closes the gate, injects a
      governance feedback note for the LLM. NEVER mutates goal status
      (PR #169 contract preserved).

  • request_idle tool (RunnerDirect intercept)
      Lets the LLM declare session-level idle without changing the goal.
      Mandatory max_idle_duration_secs ∈ [60, 86400] gives the runtime a
      deadline to fall back to if no external signal arrives — prevents
      idle-deadlock in cron-less sessions.

Supporting changes:

  • Protocol: ThreadGoalStatus +Infeasible terminal status; AgentStatus
    +Suspended; ControlCommand +Suspend/+Unsuspend; new event variants
    DegenerationDetected and ContinuationGateChanged with payloads in
    event_summary.rs to keep event_payload.rs ≤ 200 lines.
  • MessageSource +wakes_suspended_session() / +is_ephemeral_in_history()
    predicates so policy lives on the protocol, not inlined in ingest.
  • Message +ephemeral_in_history (write-time snapshot of source class)
    so all envelopes — including goal_continuation — persist, enabling
    forensic replay of what the LLM actually saw.
  • Configurable thresholds via HarnessConfig (degeneration_*).
  • TUI /suspend + /unsuspend commands; status renders Suspended; goal
    renders Infeasible.
  • Test infrastructure: harness wiring now actually wires the Governance
    chain (silent long-standing bug); SpawnedHarness exposes
    InterruptSignal so tests can drive ESC; llm_chunk_delay + new
    wait_for_stream_event give event-driven test pacing (replaces
    sleep-based timing).

Removed: goal_barren.rs (dead barren counter), GoalSettings (dead config
with default_token_budget + barren_continuation_limit — both refer to
subsystems already removed by PR #165 and PR #169).

Coverage: 16 e2e tests covering 12 core invariants. ESC interrupt test
runs 5/5 PASS with 0.0s deviation (event-driven sync). All bazel
test //... + clippy + rustfmt pass. New files ≤ 200 lines.
@yishuiliunian force-pushed the worktree-dreamy-painting-kite branch from ed7fc51 to d7974ac on May 21, 2026 09:25
@yishuiliunian merged commit dc5f3c3 into main on May 21, 2026
4 checks passed
@yishuiliunian deleted the worktree-dreamy-painting-kite branch on May 21, 2026 09:45
yishuiliunian added a commit that referenced this pull request on May 26, 2026
Closes the e2e coverage gap identified in the post-deletion audit:
since its tests were deleted in #179 (Vault+MCP-to-Hub migration),
`build_kernel_from_config` has had no direct tests for its depth- and
production-based selection of the MCP backend.

Restored 7 tests covering still-applicable invariants:

- depth_zero_uses_local_backend_with_manager — root agent always owns a
  LocalMcpProvider (mcp_manager().is_some())
- depth_gt0_with_hub_client_uses_proxy_backend — sub-agent + hub_client →
  McpProxyClient, no local manager
- depth_gt0_without_hub_client_falls_back_to_local — defensive fallback
- non_production_skips_mcp_entirely — test mode keeps Local backend but
  skips MCP server spawn
- build_kernel_with_slow_mcp_server_returns_within_bounded_wait — core
  startup-resilience promise: 1s budget + overhead ≤3s wall-clock
- build_kernel_mixed_servers_completes_within_bounded_wait — even with
  multiple problem servers, build respects bounded wait
- sub_agent_build_with_slow_root_config_does_not_spawn_local_mcp — anti-
  process-explosion: depth>0 + hub_client skips local MCP spawn entirely
  (chrome-devtools-mcp duplication scenario from the original incident)
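
The depth-based part of the selection that the first three tests pin down can be
summarised in a small sketch. The `McpBackend` enum and `select_backend` function
are hypothetical names for illustration; the production/test-mode branch is left
out because its exact interaction with depth is not spelled out here.

```rust
#[derive(Debug, PartialEq)]
pub enum McpBackend {
    /// The agent owns a local provider.
    Local,
    /// The agent forwards MCP calls through the hub.
    Proxy,
}

/// Root agents (depth 0) always get the local backend. Sub-agents use the
/// proxy only when a hub client is available, and otherwise fall back to local.
pub fn select_backend(depth: u32, has_hub_client: bool) -> McpBackend {
    if depth > 0 && has_hub_client {
        McpBackend::Proxy
    } else {
        McpBackend::Local
    }
}
```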

Two tests from the original suite were genuinely obsolete and were NOT restored:
- build_kernel_with_failing_mcp_server_does_not_propagate_error
- build_kernel_skips_disabled_servers_entirely

Both asserted on `mcp_provider().snapshot()` showing per-server status.
Post-#179, MCP server spawning happens in `loopal-agent-hub::mcp_service`
(Hub side), not in the agent process. The agent's
`build_kernel_from_config` only selects the provider type — actual
spawn/failure-reporting is now the Hub's responsibility and is covered by
`loopal-agent-hub` tests.

The audit also confirmed that deleting `goal_kickoff_barren_test` in #180
was legitimate: `barren_continuation_count` was replaced by
ContinuationGate + DegenerationDetector, and that behavior is now covered
by `degeneration_e2e_test.rs` and `idle_e2e_test.rs`. No restoration is
needed there.

Adjusted McpServerConfig::Stdio variant to include the new fields
(`sharing`, `cwd_isolation`) introduced after 5e41a0f.