feat(sprout-acp): parallel agent pool, heartbeat timer, and just goose recipes #64

Merged
tlongwell-block merged 8 commits into main from tyler/parallel-agents-heartbeat on Mar 14, 2026

tlongwell-block (Collaborator) commented Mar 14, 2026

Summary

Run up to N agent subprocesses in parallel with optional periodic heartbeat prompts. Adds just goose convenience recipes for launching agent harnesses.

N=1 preserves fully backward-compatible behavior — no config changes needed for existing deployments.

What Changed

| File | Change | Δ |
|---|---|---|
| pool.rs | New: AgentPool, OwnedAgent, take-and-return ownership, run_prompt_task | +551 |
| main.rs | Rewritten: 5-branch biased select!, dispatch/recovery helpers, shutdown sequence | +456 −284 |
| queue.rs | Multi-channel in-flight (HashSet), retry throttle, requeue_preserve_timestamps | +371 −49 |
| config.rs | --agents, --heartbeat-interval, --heartbeat-prompt, --heartbeat-prompt-file | +127 −1 |
| README.md | Parallel agents docs, config examples, shared identity note, heartbeat semantics | +57 −4 |
| justfile | just goose (foreground) + just goose-bg (screen session) recipes | +45 |
| acp.rs | #[allow(dead_code)] on pre-existing unused Timeout variant (clippy fix) | +1 |

Architecture

                  ┌──────────────────────────────────────────────────┐
                  │              sprout-acp harness                  │
                  │                                                  │
Relay ──WS──────▶│  EventQueue ──▶ AgentPool ─┬─ Agent[0] → goose  │
                  │       ▲           ▲  │     ├─ Agent[1] → goose  │
heartbeat tick ─▶│       │           │  │     └─ Agent[2] → goose  │
                  │       │    result_rx ◀┘                          │
                  │       └───────────┘                              │
                  │     redispatch on every result                   │
                  └──────────────────────────────────────────────────┘

5-branch biased; select loop:

  1. Results (highest priority) — drain completions before accepting new work
  2. Panics: JoinSet + task_map with JoinError::id() → O(1) agent identification and recovery
  3. Relay events → queue → dispatch to available agents
  4. Heartbeat (lower priority) — skipped when all agents busy
  5. Shutdown — unified SIGINT + SIGTERM via watch::channel, grace period + JoinSet::shutdown()
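The priority order above can be modelled without an async runtime. The sketch below is a synchronous stand-in, not the harness's actual select! loop: it assumes each branch's readiness is known as a boolean and returns the first ready branch in the listed order. The `Ready` struct, the `Branch` enum, and `next_branch` are hypothetical names introduced for illustration only.

```rust
/// Which branch runs next. Variant names mirror the list above; the enum
/// itself is hypothetical.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Branch {
    Result,
    Panic,
    RelayEvent,
    Heartbeat,
    Shutdown,
    Idle,
}

/// Snapshot of what is ready right now. The real loop awaits futures;
/// plain flags are an assumption made to keep the sketch std-only.
#[derive(Default, Clone, Copy)]
struct Ready {
    result: bool,
    tasks_in_flight: bool, // false models an empty JoinSet
    task_finished: bool,
    relay_event: bool,
    heartbeat_tick: bool,
    all_agents_busy: bool,
    shutdown: bool,
}

/// Check branches top to bottom, as a biased select does, and return the
/// first one that is both enabled and ready.
fn next_branch(r: Ready) -> Branch {
    if r.result {
        return Branch::Result;
    }
    // The join branch is disabled entirely when no tasks are in flight.
    if r.tasks_in_flight && r.task_finished {
        return Branch::Panic;
    }
    if r.relay_event {
        return Branch::RelayEvent;
    }
    // A tick is ignored while every agent is busy.
    if r.heartbeat_tick && !r.all_agents_busy {
        return Branch::Heartbeat;
    }
    if r.shutdown {
        return Branch::Shutdown;
    }
    Branch::Idle
}
```

With a result and a relay event both ready, `next_branch` returns `Branch::Result`, which is the "drain completions before accepting new work" behaviour described in item 1.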

Key Design Decisions

  • Take-and-return pool: AcpClient is not Clone. Agent moves out of slot on claim, back on return. No Arc<Mutex<>>.
  • rx_and_join_set() split-borrow — polls result channel and JoinSet in one select without double-borrowing the pool
  • Per-channel in-flight: HashSet<Uuid> ensures the same channel is never processed by two agents simultaneously
  • Channel affinity — best-effort: prefer the agent that already has a session for that channel
  • Retry throttle: HashMap<Uuid, Instant> with 5s backoff on failed channels, prevents tight retry loops
  • !join_set.is_empty() guard — prevents 100% CPU spin when no tasks are in flight (JoinSet returns None immediately when empty)
  • Box<PromptResult> in PoolEvent — clippy large_enum_variant: PromptResult (~528B) vs JoinError (~24B), boxed to avoid bloating the enum
  • Heartbeat: at-most-one globally — next tick suppressed until current completes; lower priority than queued events
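The take-and-return ownership idea from the first bullet can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code in pool.rs: `Agent` is a stand-in for the real non-Clone client, the slot index doubles as the agent id, and `idle_count` is a hypothetical helper (the PR's own `live_count` may count something different).

```rust
/// Stand-in for the real agent client; deliberately not Clone.
#[derive(Debug)]
struct Agent {
    id: usize,
}

/// Each slot holds its agent only while that agent is idle.
struct AgentPool {
    slots: Vec<Option<Agent>>,
}

impl AgentPool {
    fn new(n: usize) -> Self {
        Self {
            slots: (0..n).map(|id| Some(Agent { id })).collect(),
        }
    }

    /// Move the first idle agent out of its slot, leaving None behind.
    /// Returns None when every agent is already checked out.
    fn try_claim(&mut self) -> Option<Agent> {
        self.slots.iter_mut().find_map(|slot| slot.take())
    }

    /// Move an agent back into the slot it came from.
    fn return_agent(&mut self, agent: Agent) {
        let idx = agent.id;
        debug_assert!(self.slots[idx].is_none(), "slot {} already occupied", idx);
        self.slots[idx] = Some(agent);
    }

    /// Number of agents currently sitting in their slots (hypothetical helper).
    fn idle_count(&self) -> usize {
        self.slots.iter().filter(|s| s.is_some()).count()
    }
}
```

Because the claimed `Agent` is moved out by value, the task that runs a prompt owns it exclusively, so no lock is needed around the agent while it is in use.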

Configuration

# Default (backward compatible)
sprout-acp

# 4 parallel agents
sprout-acp --agents 4

# 2 agents with 5-minute heartbeat
sprout-acp --agents 2 --heartbeat-interval 300

# Via justfile
just goose agents=2 heartbeat=300

| Flag | Env Var | Default | Description |
|---|---|---|---|
| --agents | SPROUT_ACP_AGENTS | 1 | Agent subprocess count (1–32) |
| --heartbeat-interval | SPROUT_ACP_HEARTBEAT_INTERVAL | 0 | Seconds between heartbeats (0=off, ≥10) |
| --heartbeat-prompt | SPROUT_ACP_HEARTBEAT_PROMPT | built-in | Custom heartbeat prompt text |
| --heartbeat-prompt-file | SPROUT_ACP_HEARTBEAT_PROMPT_FILE | | Read heartbeat prompt from file |
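The documented ranges suggest a small validation step at startup. The sketch below assumes that "(0=off, ≥10)" means 0 disables the heartbeat and any other value must be at least 10 seconds; the function names and error messages are hypothetical and are not taken from config.rs.

```rust
use std::time::Duration;

/// Check an agent count against the documented 1–32 range.
fn validate_agents(n: u32) -> Result<u32, String> {
    if (1..=32).contains(&n) {
        Ok(n)
    } else {
        Err(format!("--agents must be between 1 and 32, got {}", n))
    }
}

/// 0 disables the heartbeat (None); any other value must be at least
/// 10 seconds. This reading of the flag table is an assumption.
fn validate_heartbeat_interval(secs: u64) -> Result<Option<Duration>, String> {
    match secs {
        0 => Ok(None),
        s if s >= 10 => Ok(Some(Duration::from_secs(s))),
        s => Err(format!(
            "--heartbeat-interval must be 0 or at least 10, got {}",
            s
        )),
    }
}
```

Returning `Option<Duration>` keeps the "off" case explicit, so the main loop can skip creating a timer at all rather than special-casing a zero interval.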

Testing

  • 116 unit tests pass (24 queue, 31 config, 61 existing)
  • E2E verified against live relay:
    • N=1: backward compatible, 0% CPU idle ✅
    • N=2: concurrent channel processing ✅
    • Heartbeat (30s interval): fires on schedule, 0% CPU between ticks ✅
  • Crossfire reviewed: Opus 9/10 APPROVE, Codex 9/10 APPROVE

Follow-Up (deferred from review)

  • Unit tests for AgentPool::try_claim / return_agent / live_count
  • Carry task_id through PromptResult for explicit invariant checking
  • Document shutdown event-drop tradeoff in operator docs
  • Respawn backoff on repeated agent crashes

Steps 1-3 of parallel agents + heartbeat implementation:

- queue.rs: HashSet<Uuid> in_flight_channels, retry_after throttle,
  requeue_preserve_timestamps, has_flushable_work, Clone derives
- pool.rs: AgentPool, OwnedAgent, PromptResult, run_prompt_task
- config.rs: --agents, --heartbeat-interval, --heartbeat-prompt flags
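
The queue.rs changes above combine an in-flight set with a retry throttle. A minimal sketch of how the two could gate dispatch follows, assuming a `u64` stand-in for `Uuid` and the 5s backoff mentioned in the PR description. `ChannelGate`, `try_begin`, and `finish` are hypothetical names, not the queue's actual API.

```rust
use std::collections::{HashMap, HashSet};
use std::time::{Duration, Instant};

/// Stand-in for Uuid so the sketch needs only the standard library.
type ChannelId = u64;

const RETRY_BACKOFF: Duration = Duration::from_secs(5);

#[derive(Default)]
struct ChannelGate {
    in_flight: HashSet<ChannelId>,
    retry_after: HashMap<ChannelId, Instant>,
}

impl ChannelGate {
    /// Returns true and marks the channel in flight if it may be
    /// dispatched at `now`; returns false if another agent already holds
    /// it or its backoff window has not elapsed.
    fn try_begin(&mut self, ch: ChannelId, now: Instant) -> bool {
        if self.in_flight.contains(&ch) {
            return false;
        }
        if let Some(&until) = self.retry_after.get(&ch) {
            if now < until {
                return false;
            }
            self.retry_after.remove(&ch);
        }
        self.in_flight.insert(ch)
    }

    /// Release the channel; a failure starts a new backoff window.
    fn finish(&mut self, ch: ChannelId, ok: bool, now: Instant) {
        self.in_flight.remove(&ch);
        if ok {
            self.retry_after.remove(&ch);
        } else {
            self.retry_after.insert(ch, now + RETRY_BACKOFF);
        }
    }
}
```

Passing `now` in as a parameter, rather than calling `Instant::now()` inside, keeps the gate deterministic under test.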

All 116 tests pass. main.rs loop rewrite (Step 4) follows.

Step 4: Replace 2-branch select with 5-branch biased select.

- N-agent startup with AgentPool
- dispatch_pending: flush queued work to idle agents
- handle_prompt_result: reclaim agent, requeue on failure, respawn on exit
- recover_panicked_agent: JoinError::id() -> task_map O(1) lookup
- drain_ready_join_results: now_or_never() prevents panic starvation
- dispatch_heartbeat: at-most-one-globally guard
- Unified SIGINT+SIGTERM shutdown via watch channel
- Grace period drain + JoinSet::shutdown() abort
- rx_and_join_set() split-borrow helper for select!
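
The at-most-one-globally guard in dispatch_heartbeat can be pictured as a single flag checked on every tick. The sketch below is an assumption-laden model: it treats "lower priority than queued events" as "skip the tick while events are queued", and the type, method, and variant names are hypothetical rather than taken from main.rs.

```rust
/// Outcome of one heartbeat tick (variant names are hypothetical).
#[derive(Debug, PartialEq, Eq)]
enum HeartbeatDecision {
    Fire,
    SkipInFlight,
    SkipQueuedWork,
    SkipNoIdleAgent,
}

#[derive(Default)]
struct HeartbeatState {
    in_flight: bool,
}

impl HeartbeatState {
    /// Decide what to do with a tick given the current queue depth and
    /// number of idle agents. Sets the flag only when the tick fires.
    fn on_tick(&mut self, queued_events: usize, idle_agents: usize) -> HeartbeatDecision {
        if self.in_flight {
            return HeartbeatDecision::SkipInFlight;
        }
        if queued_events > 0 {
            return HeartbeatDecision::SkipQueuedWork;
        }
        if idle_agents == 0 {
            return HeartbeatDecision::SkipNoIdleAgent;
        }
        self.in_flight = true;
        HeartbeatDecision::Fire
    }

    /// Called when the heartbeat prompt finishes, re-arming the next tick.
    fn on_complete(&mut self) {
        self.in_flight = false;
    }
}
```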

All 116 tests pass.

Steps 5-6: Add structured tracing events for pool health monitoring
and update README with parallel agents + heartbeat documentation.

- 7 new structured log events (agent_claimed, agent_returned,
  heartbeat_fired, heartbeat_skipped_*, dispatch_pending, pool_exhausted)
- has_session_for() helper in pool.rs for affinity_hit detection
- README: new flags table, config examples, shared identity note,
  heartbeat semantics, choosing N guidance
- Fix dead_code warnings in pool.rs

…-heartbeat

* origin/main:
  feat: NIP-29 native compatibility — standard nostr clients can chat on Sprout (#63)

join_set.join_next() returns None immediately when the JoinSet has no
tasks. Without a guard, the biased select loop spins at 100% CPU in
the idle state (no in-flight prompts). Add a !is_empty() precondition
so the branch is disabled when there are no tasks to join.

Convenience recipes to launch a goose agent connected to a Sprout relay.
Accepts relay URL, agent count, system prompt, private key, and API token.
goose-bg runs in a detached screen session.

- README: update stale 'How It Works' for multi-agent semantics
- pool.rs: debug_assert slot empty in return_agent
- pool.rs: remove dead next_result/result_rx_mut methods
- main.rs: pool_exhausted log warn→debug (normal under load)
- justfile: unique screen session name, add heartbeat param

tlongwell-block merged commit 3c68325 into main on Mar 14, 2026
8 checks passed
tlongwell-block deleted the tyler/parallel-agents-heartbeat branch on March 14, 2026 22:19