feat(agent-loop): ContinuationGate + DegenerationDetector + request_idle by yishuiliunian · Pull Request #180 · AgentsMesh/Loopal

yishuiliunian · 2026-05-21T09:14:10Z

Summary

Fix long-running agent degeneration loop (root case: macmini-03-64 / 10h53m / 1395 reps).
Three orthogonal mechanisms: ContinuationGate (session-level injection valve), DegenerationDetector (Governance trait extension, observes TurnHistory, never mutates goal), request_idle tool (LLM-declared session idle with mandatory deadline).
Protocol extensions: ThreadGoalStatus::Infeasible, AgentStatus::Suspended, ControlCommand::{Suspend,Unsuspend}, new event variants, MessageSource predicate methods.

Changes

New mechanisms (crates/loopal-runtime/src/agent_loop/):

continuation_gate.rs — Session-scoped valve; GateClose enum makes "UserSuspend has no deadline" a type-level invariant.
turn_history.rs — Circular buffer of TurnRecord (metrics + text_hash) for cross-turn governance decisions.
degeneration_detector.rs — Detects RepeatedText / BarrenStreak streaks; on_envelope_received resets silenced flag so /unsuspend + relapse can re-trigger.
degeneration_feedback.rs — User-facing feedback templates.
handle_request_idle.rs — RunnerDirect intercept; validates max_idle_duration_secs ∈ [60, 86400].
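
The "UserSuspend has no deadline" invariant can be illustrated with a small sketch. This is not the code in continuation_gate.rs: the variant names other than UserSuspend, the field name wake_deadline as a struct field, and the method allows_continuation are assumptions made for illustration. The point is only that the deadline lives inside the variants that need one, so a closure without a deadline cannot be constructed for anything but a user suspend.

```rust
use std::time::Instant;

/// Why the gate is closed. Only `UserSuspend` lacks a deadline field, so
/// "every other closure carries a wake deadline" is enforced by the type.
/// (Variant names besides `UserSuspend` are hypothetical.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GateClose {
    UserSuspend,
    Degeneration { wake_deadline: Instant },
    RequestIdle { wake_deadline: Instant },
}

impl GateClose {
    /// The deadline after which the gate may reopen on its own, if any.
    pub fn wake_deadline(&self) -> Option<Instant> {
        match *self {
            GateClose::UserSuspend => None,
            GateClose::Degeneration { wake_deadline }
            | GateClose::RequestIdle { wake_deadline } => Some(wake_deadline),
        }
    }
}

/// Session-scoped valve: `None` means open.
#[derive(Debug, Default)]
pub struct ContinuationGate {
    closed: Option<GateClose>,
}

impl ContinuationGate {
    pub fn close(&mut self, reason: GateClose) {
        self.closed = Some(reason);
    }

    pub fn open(&mut self) {
        self.closed = None;
    }

    /// Returns true when a continuation prompt may be injected at `now`.
    /// A closure whose deadline has elapsed reopens the gate as a side effect.
    pub fn allows_continuation(&mut self, now: Instant) -> bool {
        if let Some(reason) = self.closed {
            match reason.wake_deadline() {
                Some(deadline) if now >= deadline => self.closed = None,
                _ => return false,
            }
        }
        true
    }
}
```

Under these assumptions, a gate closed with RequestIdle stays shut until the deadline passes, while a gate closed with UserSuspend stays shut until open() is called explicitly.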

Protocol (crates/loopal-protocol/):

ThreadGoalStatus::Infeasible + ModelInfeasible/UserInfeasible reasons.
AgentStatus::Suspended; ControlCommand::Suspend/Unsuspend.
MessageSource::wakes_suspended_session() + is_ephemeral_in_history() — policy moved off ingest.rs inline matches.
Message::ephemeral_in_history — write-time snapshot, enables forensic replay.
New event payloads: DegenerationDetected(DegenerationSummary), ContinuationGateChanged(ContinuationGateSummary).
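
A rough sketch of what "policy on the protocol type" could look like follows. The variant list and the specific true/false answers are assumptions inferred from the description (Human envelopes auto-unsuspend, cron/rewake are blocked while Suspended, goal_continuation envelopes are persisted but flagged); the real enum in crates/loopal-protocol may differ.

```rust
/// Hypothetical subset of message sources named in this PR description.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MessageSource {
    Human,
    Cron,
    Rewake,
    GoalContinuation,
}

impl MessageSource {
    /// Assumption: only a human envelope lifts a suspended session.
    pub fn wakes_suspended_session(self) -> bool {
        matches!(self, MessageSource::Human)
    }

    /// Assumption: runtime-generated continuation prompts are the
    /// ephemeral class; everything else is ordinary history.
    pub fn is_ephemeral_in_history(self) -> bool {
        matches!(self, MessageSource::GoalContinuation)
    }
}

/// A stored message. The flag is captured once, at write time, so a later
/// change to the policy above does not rewrite what old sessions recorded.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Message {
    pub source: MessageSource,
    pub text: String,
    pub ephemeral_in_history: bool,
}

impl Message {
    pub fn new(source: MessageSource, text: impl Into<String>) -> Self {
        Self {
            source,
            text: text.into(),
            ephemeral_in_history: source.is_ephemeral_in_history(),
        }
    }
}
```

Keeping the predicates on the enum means ingest code asks the source a question instead of repeating a match on variants, and the snapshot on Message keeps replay faithful to what was decided when the envelope was written.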

Tool layer:

New crate crates/tools/agent/idle/ — RequestIdleTool typed schema.
update_goal extended to accept status: infeasible.
GoalSession trait + adapter add mark_infeasible_by_model.
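
The range check on max_idle_duration_secs is simple enough to sketch. The bounds [60, 86400] and the fact that the field is mandatory come from this PR; the function name, error type, and the use of Option to model a missing argument are hypothetical.

```rust
use std::time::Duration;

/// Bounds stated in the PR: one minute to one day, inclusive.
pub const MIN_IDLE_SECS: u64 = 60;
pub const MAX_IDLE_SECS: u64 = 86_400;

/// Hypothetical error type for a rejected request_idle call.
#[derive(Debug, PartialEq, Eq)]
pub enum IdleRequestError {
    /// The deadline is mandatory; a call without it is rejected.
    MissingDuration,
    /// The value fell outside [MIN_IDLE_SECS, MAX_IDLE_SECS].
    OutOfRange { got: u64 },
}

/// Validates the tool argument and converts it to a `Duration` that the
/// runtime can add to "now" to obtain the fallback wake deadline.
pub fn validate_idle_duration(
    max_idle_duration_secs: Option<u64>,
) -> Result<Duration, IdleRequestError> {
    let secs = max_idle_duration_secs.ok_or(IdleRequestError::MissingDuration)?;
    if (MIN_IDLE_SECS..=MAX_IDLE_SECS).contains(&secs) {
        Ok(Duration::from_secs(secs))
    } else {
        Err(IdleRequestError::OutOfRange { got: secs })
    }
}
```

Rejecting a missing or out-of-range value up front is what guarantees that every idle request leaves the runtime with a finite deadline to fall back on.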

Governance:

Governance::on_after_turn(record, history) -> PostTurnAction — default no-op, binary-compatible.
PostTurnAction::Degeneration carries DegenerationSummary + feedback string.
build_governance registers DegenerationDetector alongside existing LoopDetector.
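
To make the hook concrete, here is a minimal sketch of a post-turn governance check over a bounded history. Only the names Governance, on_after_turn, PostTurnAction, TurnRecord, TurnHistory, DegenerationDetector, RepeatedText, BarrenStreak, and on_envelope_received come from this PR; the field layout, the Continue variant, the streak field, and the exact silencing behavior are assumptions. The sketch also assumes the current record has already been pushed into the history before the hook runs.

```rust
use std::collections::VecDeque;

/// Hypothetical per-turn record ("metrics + text_hash" in the PR).
#[derive(Debug, Clone, Copy)]
pub struct TurnRecord {
    pub text_hash: u64,
    pub tool_use_count: u32,
}

/// Fixed-capacity ring of the most recent turns.
pub struct TurnHistory {
    cap: usize,
    records: VecDeque<TurnRecord>,
}

impl TurnHistory {
    pub fn new(cap: usize) -> Self {
        let cap = cap.max(1);
        Self { cap, records: VecDeque::with_capacity(cap) }
    }

    pub fn push(&mut self, record: TurnRecord) {
        if self.records.len() >= self.cap {
            self.records.pop_front(); // evict the oldest turn
        }
        self.records.push_back(record);
    }

    /// Number of most-recent consecutive turns that satisfy `pred`.
    pub fn trailing_run(&self, pred: impl Fn(&TurnRecord) -> bool) -> usize {
        self.records.iter().rev().take_while(|r| pred(*r)).count()
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum DegenerationKind {
    RepeatedText,
    BarrenStreak,
}

#[derive(Debug, PartialEq, Eq)]
pub enum PostTurnAction {
    Continue,
    Degeneration { kind: DegenerationKind, streak: usize },
}

pub trait Governance {
    /// Default is a no-op, so existing implementors need no changes.
    fn on_after_turn(&mut self, _record: &TurnRecord, _history: &TurnHistory) -> PostTurnAction {
        PostTurnAction::Continue
    }
}

pub struct DegenerationDetector {
    duplicate_text_threshold: usize,
    barren_threshold: usize,
    silenced: bool,
}

impl DegenerationDetector {
    pub fn new(duplicate_text_threshold: usize, barren_threshold: usize) -> Self {
        Self { duplicate_text_threshold, barren_threshold, silenced: false }
    }

    /// A new envelope re-arms the detector so a relapse can trigger again.
    pub fn on_envelope_received(&mut self) {
        self.silenced = false;
    }
}

impl Governance for DegenerationDetector {
    fn on_after_turn(&mut self, record: &TurnRecord, history: &TurnHistory) -> PostTurnAction {
        if self.silenced {
            return PostTurnAction::Continue;
        }
        let repeated = history.trailing_run(|r| r.text_hash == record.text_hash);
        let barren = history.trailing_run(|r| r.tool_use_count == 0);
        let action = if repeated >= self.duplicate_text_threshold {
            PostTurnAction::Degeneration { kind: DegenerationKind::RepeatedText, streak: repeated }
        } else if barren >= self.barren_threshold {
            PostTurnAction::Degeneration { kind: DegenerationKind::BarrenStreak, streak: barren }
        } else {
            PostTurnAction::Continue
        };
        if action != PostTurnAction::Continue {
            self.silenced = true; // report once, then stay quiet until re-armed
        }
        action
    }
}
```

Note that the detector only returns an action. It has no handle on the goal, which is one way to keep the "never mutates goal" contract structural rather than a matter of discipline.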

Runtime control plane:

select_input short-circuits on Suspended (cron/rewake blocked).
ingest_message opens gate, auto-unsuspends on Human envelope.
goal_continuation_check checks gate + auto-reopens at deadline.

TUI:

/suspend + /unsuspend commands.
Status renders Suspended ⏸; goal renders Infeasible (red).

Test infrastructure:

wiring.rs now actually wires the Governance chain (silent long-standing bug — all previous tests with governance-dependent assertions ran with empty governance vec).
SpawnedHarness exposes InterruptSignal so tests can drive ESC.
HarnessBuilder::llm_chunk_delay() — controllable stream pacing.
e2e_event_waiters.rs — shared async waiters with scope-guard comment.

Tests: 16 e2e + ~45 unit tests covering 12 core invariants. ESC interrupt test stable at 5/5 PASS with 0.0s deviation (event-driven sync via wait_for_stream_event, replacing sleep-based timing).

Config:

HarnessConfig.degeneration_{barren_threshold, duplicate_text_threshold, wake_after_secs} defaults 50 / 5 / 3600s.
Removed dead GoalSettings (held default_token_budget from removed token-budget subsystem and barren_continuation_limit from removed barren counter).
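
For reference, the three defaults can be written out as a Default impl. The field names and values (50 / 5 / 3600s) are taken from the line above; the struct name, the integer types, and the wake_after helper are assumptions, and the real HarnessConfig carries other fields not shown here.

```rust
use std::time::Duration;

/// Hypothetical stand-in for the degeneration_* fields of HarnessConfig.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DegenerationThresholds {
    /// Consecutive turns without tool_use before BarrenStreak fires.
    pub degeneration_barren_threshold: u32,
    /// Consecutive turns with identical text hash before RepeatedText fires.
    pub degeneration_duplicate_text_threshold: u32,
    /// Seconds until a detector-closed gate reopens on its own.
    pub degeneration_wake_after_secs: u64,
}

impl Default for DegenerationThresholds {
    fn default() -> Self {
        Self {
            degeneration_barren_threshold: 50,
            degeneration_duplicate_text_threshold: 5,
            degeneration_wake_after_secs: 3600,
        }
    }
}

impl DegenerationThresholds {
    /// Convenience conversion for computing a wake deadline.
    pub fn wake_after(&self) -> Duration {
        Duration::from_secs(self.degeneration_wake_after_secs)
    }
}
```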

Deleted:

goal_barren.rs + goal_kickoff_barren_test.rs (barren counter never enforced; replaced by DegenerationDetector).

Test plan

CI: bazel test //... passes
CI: bazel build //... --config=clippy passes
CI: bazel build //... --config=rustfmt passes
All 16 new e2e tests pass
ESC interrupt test stable across 5+ runs (already verified locally with deviation 0.0s)
file_size_cap_test still passes (event_payload.rs / event_summary.rs / control.rs all ≤ 200 lines)

Design references

PR refactor(goal): remove token budget subsystem #165 — token budget removal (informed: do not re-introduce budget surface)
PR refactor(goal): drop barren-driven auto-complete #169 — barren-driven auto-complete removal ("retain barren tracking as observation primitive for future safety designs that act without mutating goal status") — this PR is exactly that design.

…equest_idle for long-running safety

Background: macmini-03-64 session 89e6dc04 ran 10h53m and produced 1395 copies of "16/100. Not complete. No action." because the runtime had five overlapping architectural gaps:

1. max_barren_continuations was a dead field: the counter accumulated but was never enforced.
2. LoopDetector only watched tool_use, never assistant text.
3. ESC was a turn-level interrupt; cron / goal_continuation / rewake kept waking the agent.
4. ThreadGoalStatus had no "structurally infeasible" terminal: the LLM could only lie via update_goal(complete) or keep emitting filler.
5. The LLM had no protocol to declare "I have nothing to do right now; don't push me until an external signal arrives."

This change introduces three orthogonal mechanisms:

• ContinuationGate (session-scoped injection valve)
Controls whether goal_continuation re-prompts the LLM. Closed by DegenerationDetector, by /suspend, or by request_idle. Reopens on any envelope (user/cron/rewake) OR when a deadline elapses. Type-level invariant via GateClose enum: only UserSuspend may carry no wake_deadline; all other closures MUST provide one.

• DegenerationDetector (Governance trait extension)
Observes TurnHistory via the new on_after_turn hook. Detects RepeatedText (same hash run) and BarrenStreak (no tool_use run), emits DegenerationDetected event, closes the gate, injects a governance feedback note for the LLM. NEVER mutates goal status (PR #169 contract preserved).

• request_idle tool (RunnerDirect intercept)
Lets the LLM declare session-level idle without changing the goal. Mandatory max_idle_duration_secs ∈ [60, 86400] gives the runtime a deadline to fall back to if no external signal arrives, which prevents idle-deadlock in cron-less sessions.

Supporting changes:

• Protocol: ThreadGoalStatus +Infeasible terminal status; AgentStatus +Suspended; ControlCommand +Suspend/+Unsuspend; new event variants DegenerationDetected and ContinuationGateChanged with payloads in event_summary.rs to keep event_payload.rs ≤ 200 lines.
• MessageSource +wakes_suspended_session() / +is_ephemeral_in_history() predicates so policy lives on the protocol, not inlined in ingest.
• Message +ephemeral_in_history (write-time snapshot of source class) so all envelopes, including goal_continuation, persist, enabling forensic replay of what the LLM actually saw.
• Configurable thresholds via HarnessConfig (degeneration_*).
• TUI /suspend + /unsuspend commands; status renders Suspended; goal renders Infeasible.
• Test infrastructure: harness wiring now actually wires the Governance chain (silent long-standing bug); SpawnedHarness exposes InterruptSignal so tests can drive ESC; llm_chunk_delay + new wait_for_stream_event give event-driven test pacing (replaces sleep-based timing).

Removed: goal_barren.rs (dead barren counter), GoalSettings (dead config with default_token_budget + barren_continuation_limit; both refer to subsystems already removed by PR #165 and PR #169).

Coverage: 16 e2e tests covering 12 core invariants. ESC interrupt test runs 5/5 PASS with 0.0s deviation (event-driven sync). All bazel test //... + clippy + rustfmt pass. New files ≤ 200 lines.

Closes the e2e coverage gap identified in the post-deletion audit: `build_kernel_from_config` had no direct tests for its depth-/production-based branch selection of MCP backend after its deletion in #179 (Vault+MCP-to-Hub migration).

Restored 7 tests covering still-applicable invariants:

- depth_zero_uses_local_backend_with_manager: root agent always owns a LocalMcpProvider (mcp_manager().is_some())
- depth_gt0_with_hub_client_uses_proxy_backend: sub-agent + hub_client → McpProxyClient, no local manager
- depth_gt0_without_hub_client_falls_back_to_local: defensive fallback
- non_production_skips_mcp_entirely: test mode keeps Local backend but skips MCP server spawn
- build_kernel_with_slow_mcp_server_returns_within_bounded_wait: core startup-resilience promise: 1s budget + overhead ≤3s wall-clock
- build_kernel_mixed_servers_completes_within_bounded_wait: even with multiple problem servers, build respects bounded wait
- sub_agent_build_with_slow_root_config_does_not_spawn_local_mcp: anti-process-explosion: depth>0 + hub_client skips local MCP spawn entirely (chrome-devtools-mcp duplication scenario from the original incident)

Two tests from the original were genuinely obsolete and NOT restored:

- build_kernel_with_failing_mcp_server_does_not_propagate_error
- build_kernel_skips_disabled_servers_entirely

Both asserted on `mcp_provider().snapshot()` showing per-server status. Post-#179, MCP server spawning happens in `loopal-agent-hub::mcp_service` (Hub side), not in the agent process. The agent's `build_kernel_from_config` only selects the provider type; actual spawn/failure-reporting is now the Hub's responsibility and is covered by `loopal-agent-hub` tests.

Audit also confirmed that `goal_kickoff_barren_test` deleted in #180 is legitimate: `barren_continuation_count` was replaced by ContinuationGate + DegenerationDetector, and is now covered by `degeneration_e2e_test.rs` and `idle_e2e_test.rs`. No restoration needed there.

Adjusted McpServerConfig::Stdio variant to include the new fields (`sharing`, `cwd_isolation`) introduced after 5e41a0f.

yishuiliunian force-pushed the worktree-dreamy-painting-kite branch from ed7fc51 to d7974ac May 21, 2026 09:25

yishuiliunian merged commit dc5f3c3 into main May 21, 2026
4 checks passed

yishuiliunian deleted the worktree-dreamy-painting-kite branch May 21, 2026 09:45

yishuiliunian mentioned this pull request May 26, 2026

test: restore build_kernel_depth_test (7 tests, originally from #179) #185

Merged

2 tasks
