feat(agent-loop): ContinuationGate + DegenerationDetector + request_idle #180
Merged
…equest_idle for long-running safety
Background: macmini-03-64 session 89e6dc04 ran 10h53m and produced 1395
copies of "16/100. Not complete. No action." because the runtime had
five overlapping architectural gaps:
1. max_barren_continuations was a dead field: the counter accumulated but
was never enforced.
2. LoopDetector only watched tool_use, never assistant text.
3. ESC was only a turn-level interrupt, so cron, goal_continuation, and
rewake kept waking the agent.
4. ThreadGoalStatus had no "structurally infeasible" terminal status, so
the LLM could only lie via update_goal(complete) or keep emitting filler.
5. The LLM had no protocol to declare "I have nothing to do right now;
don't push me until an external signal arrives."
This change introduces three orthogonal mechanisms:
• ContinuationGate (session-scoped injection valve)
Controls whether goal_continuation re-prompts the LLM. Closed by
DegenerationDetector, by /suspend, or by request_idle. Reopens on
any envelope (user/cron/rewake) OR when a deadline elapses.
Type-level invariant via the GateClose enum: only UserSuspend may omit
a wake_deadline; every other closure MUST provide one.
• DegenerationDetector (Governance trait extension)
Observes TurnHistory via the new on_after_turn hook. Detects
RepeatedText (a run of turns with the same text hash) and BarrenStreak
(a run of turns with no tool_use); on detection it emits a
DegenerationDetected event, closes the gate, and injects a governance
feedback note for the LLM. NEVER mutates goal status
(PR #169 contract preserved).
• request_idle tool (RunnerDirect intercept)
Lets the LLM declare session-level idle without changing the goal.
Mandatory max_idle_duration_secs ∈ [60, 86400] gives the runtime a
deadline to fall back to if no external signal arrives — prevents
idle-deadlock in cron-less sessions.
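The interplay between the gate's type-level invariant and the request_idle bounds can be illustrated with a minimal, std-only sketch. It assumes a simplified gate that stores at most one closure reason. The variant names `Degeneration` and `RequestIdle`, the method names, and this `handle_request_idle` signature are hypothetical, not the crate's actual API. Because only `UserSuspend` lacks a `wake_deadline` field, a closure from the detector or from request_idle cannot be constructed without a deadline.

```rust
use std::time::{Duration, Instant};

/// Why the gate is closed (hypothetical variant names). Only `UserSuspend`
/// has no `wake_deadline`, so the other closures cannot be built without one.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum GateClose {
    UserSuspend,
    Degeneration { wake_deadline: Instant },
    RequestIdle { wake_deadline: Instant },
}

impl GateClose {
    fn wake_deadline(&self) -> Option<Instant> {
        match self {
            GateClose::UserSuspend => None,
            GateClose::Degeneration { wake_deadline }
            | GateClose::RequestIdle { wake_deadline } => Some(*wake_deadline),
        }
    }
}

/// Session-scoped valve; `closed == None` means goal_continuation may re-prompt.
#[derive(Debug, Default)]
pub struct ContinuationGate {
    closed: Option<GateClose>,
}

impl ContinuationGate {
    pub fn close(&mut self, reason: GateClose) {
        self.closed = Some(reason);
    }

    /// Any inbound envelope (user, cron, rewake) reopens the gate.
    pub fn on_envelope(&mut self) {
        self.closed = None;
    }

    /// Checked before each goal_continuation. Auto-reopens once the
    /// deadline has elapsed; returns true if continuation may proceed.
    pub fn allows_continuation(&mut self, now: Instant) -> bool {
        let deadline = match &self.closed {
            None => return true,
            Some(reason) => reason.wake_deadline(),
        };
        match deadline {
            None => false, // UserSuspend: only an envelope reopens it
            Some(d) if now >= d => {
                self.closed = None;
                true
            }
            Some(_) => false,
        }
    }
}

const MIN_IDLE_SECS: u64 = 60;
const MAX_IDLE_SECS: u64 = 86_400;

/// Hypothetical request_idle handler: rejects out-of-range durations, then
/// closes the gate with a mandatory deadline and returns that deadline.
pub fn handle_request_idle(
    gate: &mut ContinuationGate,
    max_idle_duration_secs: u64,
    now: Instant,
) -> Result<Instant, String> {
    if !(MIN_IDLE_SECS..=MAX_IDLE_SECS).contains(&max_idle_duration_secs) {
        return Err(format!(
            "max_idle_duration_secs must be in [{}, {}], got {}",
            MIN_IDLE_SECS, MAX_IDLE_SECS, max_idle_duration_secs
        ));
    }
    let wake_deadline = now + Duration::from_secs(max_idle_duration_secs);
    gate.close(GateClose::RequestIdle { wake_deadline });
    Ok(wake_deadline)
}
```

Under this sketch, a `/suspend` closure stays shut until an envelope arrives, while detector and request_idle closures reopen on their own once the deadline passes. That self-reopening is the property that prevents idle-deadlock in cron-less sessions.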
Supporting changes:
• Protocol: ThreadGoalStatus +Infeasible terminal status; AgentStatus
+Suspended; ControlCommand +Suspend/+Unsuspend; new event variants
DegenerationDetected and ContinuationGateChanged with payloads in
event_summary.rs to keep event_payload.rs ≤ 200 lines.
• MessageSource +wakes_suspended_session() / +is_ephemeral_in_history()
predicates so policy lives on the protocol, not inlined in ingest.
• Message +ephemeral_in_history (write-time snapshot of source class)
so all envelopes — including goal_continuation — persist, enabling
forensic replay of what the LLM actually saw.
• Configurable thresholds via HarnessConfig (degeneration_*).
• TUI /suspend + /unsuspend commands; status renders Suspended; goal
renders Infeasible.
• Test infrastructure: harness wiring now actually wires the Governance
chain (silent long-standing bug); SpawnedHarness exposes
InterruptSignal so tests can drive ESC; llm_chunk_delay + new
wait_for_stream_event give event-driven test pacing (replaces
sleep-based timing).
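The following is a minimal sketch of how RepeatedText and BarrenStreak detection over a bounded turn history might work, using configurable thresholds in the spirit of `HarnessConfig.degeneration_*` (the PR's defaults are 5 for duplicate text and 50 for barren turns). Several assumptions apply:

- `TurnRecord` is reduced to a text hash plus a tool-use count.
- The hash is std's `DefaultHasher` over the trimmed text.
- Turn history and detector are merged into one struct.
- RepeatedText is checked before BarrenStreak.
- The `silenced` flag mirrors the described `on_envelope_received` reset.

The detector only reports. Closing the gate and injecting feedback are left to the caller, consistent with the rule that it never mutates goal status.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::VecDeque;
use std::hash::{Hash, Hasher};

/// Simplified per-turn record (hypothetical): a hash of the assistant text
/// plus the number of tool_use blocks emitted in that turn.
#[derive(Debug, Clone)]
pub struct TurnRecord {
    text_hash: u64,
    tool_use_count: usize,
}

impl TurnRecord {
    pub fn new(assistant_text: &str, tool_use_count: usize) -> Self {
        let mut hasher = DefaultHasher::new();
        assistant_text.trim().hash(&mut hasher);
        Self { text_hash: hasher.finish(), tool_use_count }
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Degeneration {
    RepeatedText { run: usize },
    BarrenStreak { run: usize },
}

/// Bounded turn history plus threshold checks. Reports only; it never
/// touches goal status.
pub struct DegenerationDetector {
    history: VecDeque<TurnRecord>,
    capacity: usize,
    duplicate_text_threshold: usize,
    barren_threshold: usize,
    silenced: bool,
}

impl DegenerationDetector {
    pub fn new(duplicate_text_threshold: usize, barren_threshold: usize) -> Self {
        // Keep just enough turns to observe the longer of the two runs.
        let capacity = duplicate_text_threshold.max(barren_threshold).max(1);
        Self {
            history: VecDeque::with_capacity(capacity),
            capacity,
            duplicate_text_threshold,
            barren_threshold,
            silenced: false,
        }
    }

    /// Records a finished turn; returns a detection the first time a
    /// trailing run reaches its threshold, then stays silent.
    pub fn on_after_turn(&mut self, record: TurnRecord) -> Option<Degeneration> {
        if self.history.len() == self.capacity {
            self.history.pop_front(); // circular-buffer eviction
        }
        self.history.push_back(record);
        if self.silenced {
            return None;
        }
        let detected = self.detect()?;
        self.silenced = true;
        Some(detected)
    }

    /// An external envelope (e.g. after /unsuspend) re-arms detection.
    pub fn on_envelope_received(&mut self) {
        self.silenced = false;
    }

    fn detect(&self) -> Option<Degeneration> {
        let last_hash = self.history.back()?.text_hash;
        let repeated = self.history.iter().rev().take_while(|r| r.text_hash == last_hash).count();
        if repeated >= self.duplicate_text_threshold {
            return Some(Degeneration::RepeatedText { run: repeated });
        }
        let barren = self.history.iter().rev().take_while(|r| r.tool_use_count == 0).count();
        (barren >= self.barren_threshold).then_some(Degeneration::BarrenStreak { run: barren })
    }
}
```

With a duplicate-text threshold of 5, the fifth identical "16/100. Not complete. No action." turn triggers RepeatedText. Further repeats stay silent until an envelope re-arms the detector, which is the relapse path the PR describes.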
Removed: goal_barren.rs (dead barren counter), GoalSettings (dead config
with default_token_budget + barren_continuation_limit — both refer to
subsystems already removed by PR #165 and PR #169).
Coverage: 16 e2e tests covering 12 core invariants. The ESC interrupt test
runs 5/5 PASS with 0.0s deviation (event-driven sync). bazel test //...,
clippy, and rustfmt all pass. All new files are ≤ 200 lines.
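The move from sleep-based pacing to event-driven sync can be sketched with a std-only waiter. The real `wait_for_stream_event` in the test harness is async, and its signature is not shown here. This synchronous `std::sync::mpsc` version and the `StreamEvent` enum are hypothetical stand-ins. The idea is that a test blocks until the specific event it needs arrives, bounded by a timeout, instead of sleeping a fixed interval and hoping the stream has caught up.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the harness's stream events.
#[derive(Debug, Clone, PartialEq)]
pub enum StreamEvent {
    TextDelta(String),
    ToolUse(String),
    TurnComplete,
}

/// Blocks until an event satisfying `pred` arrives or `timeout` elapses.
/// Non-matching events are consumed and discarded.
pub fn wait_for_stream_event<F>(
    rx: &Receiver<StreamEvent>,
    timeout: Duration,
    mut pred: F,
) -> Result<StreamEvent, String>
where
    F: FnMut(&StreamEvent) -> bool,
{
    let deadline = Instant::now() + timeout;
    loop {
        // Shrink the wait on each iteration so the overall timeout holds.
        let remaining = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(remaining) {
            Ok(ev) if pred(&ev) => return Ok(ev),
            Ok(_) => continue,
            Err(RecvTimeoutError::Timeout) => {
                return Err(format!("no matching event within {:?}", timeout))
            }
            Err(RecvTimeoutError::Disconnected) => {
                return Err("event stream closed".to_string())
            }
        }
    }
}
```

Because the test proceeds as soon as the awaited event is observed, the ESC interrupt can be fired at a deterministic point in the stream rather than after a wall-clock guess.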
ed7fc51 to d7974ac
yishuiliunian added a commit that referenced this pull request on May 26, 2026:
Closes the e2e coverage gap identified in the post-deletion audit: after its tests were deleted in #179 (Vault+MCP-to-Hub migration), `build_kernel_from_config` had no direct tests for its depth- and production-based selection of the MCP backend.
Restored 7 tests covering still-applicable invariants:
- depth_zero_uses_local_backend_with_manager: the root agent always owns a LocalMcpProvider (mcp_manager().is_some())
- depth_gt0_with_hub_client_uses_proxy_backend: sub-agent + hub_client → McpProxyClient, no local manager
- depth_gt0_without_hub_client_falls_back_to_local: defensive fallback
- non_production_skips_mcp_entirely: test mode keeps the Local backend but skips MCP server spawn
- build_kernel_with_slow_mcp_server_returns_within_bounded_wait: the core startup-resilience promise (1s budget; ≤ 3s wall-clock including overhead)
- build_kernel_mixed_servers_completes_within_bounded_wait: even with multiple problem servers, the build respects the bounded wait
- sub_agent_build_with_slow_root_config_does_not_spawn_local_mcp: anti-process-explosion; depth>0 + hub_client skips local MCP spawn entirely (the chrome-devtools-mcp duplication scenario from the original incident)
Two tests from the original were genuinely obsolete and were NOT restored:
- build_kernel_with_failing_mcp_server_does_not_propagate_error
- build_kernel_skips_disabled_servers_entirely
Both asserted on `mcp_provider().snapshot()` showing per-server status. Post-#179, MCP server spawning happens in `loopal-agent-hub::mcp_service` (Hub side), not in the agent process. The agent's `build_kernel_from_config` only selects the provider type; actual spawning and failure reporting are now the Hub's responsibility and are covered by `loopal-agent-hub` tests.
The audit also confirmed that deleting `goal_kickoff_barren_test` in #180 was legitimate: `barren_continuation_count` was replaced by ContinuationGate + DegenerationDetector, and that behavior is now covered by `degeneration_e2e_test.rs` and `idle_e2e_test.rs`. No restoration needed there.
Adjusted McpServerConfig::Stdio variant to include the new fields (`sharing`, `cwd_isolation`) introduced after 5e41a0f.
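The branch selection pinned down by the restored tests can be summarized as a small decision function. This is a hypothetical sketch. `select_mcp_backend`, `McpBackend` and `HubClient` are illustrative names, not the actual signature of `build_kernel_from_config`. The sketch also makes two assumptions: the non-production check runs before the depth check, and the depth>0 fallback without a hub client behaves like a root agent.

```rust
/// Which MCP provider the kernel gets (hypothetical model of the choice).
#[derive(Debug, PartialEq, Eq)]
pub enum McpBackend {
    /// Agent owns a LocalMcpProvider (mcp_manager().is_some()).
    Local { spawn_servers: bool },
    /// Sub-agent forwards MCP calls to the Hub via McpProxyClient.
    HubProxy,
}

/// Placeholder for the real hub client handle.
pub struct HubClient;

pub fn select_mcp_backend(depth: u32, production: bool, hub_client: Option<&HubClient>) -> McpBackend {
    if !production {
        // Test mode: keep the Local backend but never spawn MCP servers.
        return McpBackend::Local { spawn_servers: false };
    }
    match (depth, hub_client) {
        // Root agent always owns its local provider.
        (0, _) => McpBackend::Local { spawn_servers: true },
        // Sub-agent with a hub connection: proxy only, no local spawn
        // (avoids duplicate MCP server processes per sub-agent).
        (_, Some(_)) => McpBackend::HubProxy,
        // Defensive fallback (assumed to mirror the root-agent behavior).
        (_, None) => McpBackend::Local { spawn_servers: true },
    }
}
```

Keeping the decision in one pure function like this makes each restored invariant a one-line assertion, independent of actually spawning servers.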
Summary
`ThreadGoalStatus::Infeasible`, `AgentStatus::Suspended`, `ControlCommand::{Suspend,Unsuspend}`, new event variants, `MessageSource` predicate methods.
Changes
New mechanisms (`crates/loopal-runtime/src/agent_loop/`):
- `continuation_gate.rs`: session-scoped valve; the `GateClose` enum makes "UserSuspend has no deadline" a type-level invariant.
- `turn_history.rs`: circular buffer of `TurnRecord` (metrics + text_hash) for cross-turn governance decisions.
- `degeneration_detector.rs`: detects RepeatedText / BarrenStreak streaks; `on_envelope_received` resets the silenced flag so that /unsuspend followed by a relapse can re-trigger detection.
- `degeneration_feedback.rs`: user-facing feedback templates.
- `handle_request_idle.rs`: RunnerDirect intercept; validates `max_idle_duration_secs ∈ [60, 86400]`.
Protocol (`crates/loopal-protocol/`):
- `ThreadGoalStatus::Infeasible` + `ModelInfeasible` / `UserInfeasible` reasons.
- `AgentStatus::Suspended`; `ControlCommand::Suspend` / `Unsuspend`.
- `MessageSource::wakes_suspended_session()` + `is_ephemeral_in_history()`: policy moved out of inline matches in `ingest.rs`.
- `Message::ephemeral_in_history`: write-time snapshot that enables forensic replay.
- `DegenerationDetected(DegenerationSummary)`, `ContinuationGateChanged(ContinuationGateSummary)`.
Tool layer:
- `crates/tools/agent/idle/`: `RequestIdleTool` typed schema.
- `update_goal` extended to accept `status: infeasible`.
- `GoalSession` trait + adapter add `mark_infeasible_by_model`.
Governance:
- `Governance::on_after_turn(record, history) -> PostTurnAction`: default no-op, so existing implementors compile unchanged.
- `PostTurnAction::Degeneration` carries a `DegenerationSummary` + feedback string.
- `build_governance` registers `DegenerationDetector` alongside the existing `LoopDetector`.
Runtime control plane:
- `select_input` short-circuits on `Suspended` (cron/rewake blocked).
- `ingest_message` opens the gate and auto-unsuspends on a Human envelope.
- `goal_continuation_check` checks the gate and auto-reopens it at the deadline.
TUI:
- `/suspend` + `/unsuspend` commands.
- Status renders `Suspended ⏸`; goal renders `Infeasible` (red).
Test infrastructure:
- `wiring.rs` now actually wires the Governance chain (silent long-standing bug: all previous tests with governance-dependent assertions ran with an empty governance vec).
- `SpawnedHarness` exposes `InterruptSignal` so tests can drive ESC.
- `HarnessBuilder::llm_chunk_delay()`: controllable stream pacing.
- `e2e_event_waiters.rs`: shared async waiters with a scope-guard comment.
Tests: 16 e2e + ~45 unit tests covering 12 core invariants. The ESC interrupt test is stable at 5/5 PASS with 0.0s deviation (event-driven sync via `wait_for_stream_event`, replacing sleep-based timing).
Config:
- `HarnessConfig.degeneration_{barren_threshold, duplicate_text_threshold, wake_after_secs}` defaults: 50 / 5 / 3600s.
- Removed `GoalSettings` (held `default_token_budget` from the removed token-budget subsystem and `barren_continuation_limit` from the removed barren counter).
Deleted:
- `goal_barren.rs` + `goal_kickoff_barren_test.rs` (barren counter never enforced; replaced by DegenerationDetector).
Test plan
Design references