ClaudeTalker registry + session_alive + halt on leak (closes #473) by FidoCanCode · Pull Request #475 · FidoCanCode/home

FidoCanCode · 2026-04-14T17:33:23Z

Redesigns claude-subprocess tracking per #473.

What's in

ClaudeTalker registry in `kennel/claude.py` — at most one thread per repo may drive a claude subprocess at any moment
`ClaudeSession.enter` registers a `worker`-kind talker; `_run_streaming` registers a `webhook`-kind talker when the thread-local `current_repo()` is set
Thread-local repo is pinned by `WorkerThread.run` + `WebhookHandler._process_action`
Halt on leak: concurrent registration raises `ClaudeLeakError` → `os._exit(3)`
`/status` gains `session_alive` (subprocess up) + `claude_talker` (structured snapshot of who's driving)
`session.owner` now reads under lock via the talker registry — correct under free-threaded 3.14t (noted in CLAUDE.md)
Drive-by: `pytest-xdist` wired with `addopts = -n auto` (37s → 16s)

What's not in (follow-ups)

Routing webhook handlers through the persistent session (so they stop using one-shot `print_prompt` entirely) is the follow-up — the leak detector will scream the moment a webhook and worker collide, which motivates that change.

Also filed #474 to kill `test_raises_on_idle_timeout`'s 10s `sleep 60` (caps xdist parallelism).

Closes #473.

Add per-repo ClaudeTalker exclusivity tracking: at most one thread may drive a claude subprocess per repo at any moment, enforced at the point of registration. ClaudeSession.__enter__ registers a "worker" talker; _run_streaming registers a "webhook" talker when current_repo() is set via a thread-local pinned at the webhook handler and worker thread entrypoints. Concurrent registration for the same repo raises ClaudeLeakError, which the worker thread and webhook dispatcher catch and handle by os._exit(3) — a loud halt so a leaked sub-claude can't silently proliferate. Surface in /status: new session_alive (bool, "subprocess up") and claude_talker (object, authoritative who-is-talking snapshot) fields per repo. ClaudeSession.owner now reads from the talker registry under _talkers_lock rather than a best-effort unsynchronized attribute — correct under free-threaded Python 3.14t with no GIL (noted in CLAUDE.md). pytest-xdist wired with addopts = -n auto. Suite drops 37s → 16s. Closes #473.

#477) (#478) Fixes the \"setup produced no tasks\" regression from #456. ## Problem When the persistent \`ClaudeSession\` was introduced in #456, \`claude_start\` / \`claude_run\` started sending only \`fido_dir/prompt\` (context) to the session. The sub-skill instructions — \`setup.md\`, \`task.md\`, \`ci.md\`, \`comments.md\`, \`resume.md\` — were built into \`fido_dir/system\` by \`build_prompt\` but **never delivered** to the session. Claude saw only the bare \"Request: ... Repo: ... Branch: ...\" with no instructions, produced empty output, and \`find_or_create_pr\` raised \`setup produced no tasks\`. Observed live after #475 landed: fido stuck on #465, watchdog kept restarting, three abandoned branches. ## Fix The persistent session already has \`persona.md\` as its system prompt, so prepending the full \`fido_dir/system\` (persona + sub-skill) would duplicate the persona. Instead: - \`build_prompt\` now writes a third file \`fido_dir/skill\` containing just the sub-skill content (\`setup.md\`, \`task.md\`, …) - New \`_session_turn_prompt\` helper reads \`fido_dir/skill\` + \`fido_dir/prompt\` and joins them with \`\\n\\n---\\n\\n\` - Both \`claude_start\` and \`claude_run\` persistent-session paths now send \`_session_turn_prompt(fido_dir)\` Tests updated: \`_setup_fido_dir\` fixtures now write \`skill\`, and the \`session_path_sends_prompt_content\` tests expect the combined \`<skill>\\n\\n---\\n\\n<prompt>\` body. 100% coverage, suite passes. Closes #477. --------- Co-authored-by: Fido Can Code <190991155+FidoCanCode@users.noreply.github.com>

…reempt (closes #479) (#481) Fulfills the 'one claude per repo' invariant from #473 / #475 by making webhook handler prompt calls share the worker's persistent \`ClaudeSession\` instead of spawning one-shot subprocesses. ## Steal / handoff \`ClaudeSession.prompt(content, model=..., system_prompt=...)\` runs one full turn on the persistent session: 1. Set the cancel event — wakes any current holder out of \`iter_events\`. 2. Acquire the session as a context manager (blocks until previous holder's \`__exit__\` releases the lock + unregisters its \`ClaudeTalker\`). 3. Send stream-json \`control_request\` interrupt + drain stale events so stdout is clean. 4. Switch model if requested, send content, return result text. ## Resume on preempt When \`iter_events\` exits early because \`_cancel\` was set, a new \`last_turn_cancelled\` property records it. The worker's \`claude_start\` / \`claude_run\` now wrap the persistent-session path in \`_run_session_turn\`, which retries preempted turns up to \`_MAX_PREEMPT_RETRIES\` times — 'hand the session back to the worker and ask it to resume what it was doing'. No more crashing on empty result after a webhook steals the session. ## Auto-routing print_prompt \`claude.print_prompt\` now checks for a session on the current thread's repo via a new \`set_session_resolver()\` hook. When one exists and is alive, the call routes through \`session.prompt()\` instead of spawning a one-shot subprocess. Falls back to one-shot on \`ClaudeStreamError\` / \`ClaudeLeakError\` (returns \`\"\"\` like the existing failure path). ## Wiring - \`WorkerRegistry.get_session(repo_name)\` exposes the worker thread's \`ClaudeSession\`. - \`server._run()\` calls \`claude.set_session_resolver(registry.get_session)\` right after constructing the registry. Closes #479. Co-authored-by: Fido Can Code <190991155+FidoCanCode@users.noreply.github.com>

Fixes #467. pytest-xdist and `-n auto` were already added in #475 — this PR fixes the last slow-test pattern by adding `poll_interval=0.01` to `_make_unregistered_server`, which was blocking 0.5s per test on shutdown. Shaves ~1s off wall time and closes out the parallel-pytest work. --- ## Work queue  <details><summary>Completed (1)</summary> - [x] Use fast poll_interval in _make_unregistered_server test helper  </details>  Co-authored-by: Fido Can Code <190991155+FidoCanCode@users.noreply.github.com>

FidoCanCode marked this pull request as ready for review April 14, 2026 17:33

FidoCanCode requested a review from rhencke April 14, 2026 17:33

rhencke approved these changes Apr 14, 2026

View reviewed changes

FidoCanCode merged commit bf513f3 into main Apr 14, 2026
2 checks passed

FidoCanCode deleted the fix-session-owner-telemetry branch April 14, 2026 17:34

This was referenced Apr 14, 2026

Route webhook handlers through the persistent ClaudeSession with steal/handoff semantics #479

Closed

Route webhook handlers through persistent ClaudeSession + resume on preempt (closes #479) #481

Merged

FidoCanCode mentioned this pull request Apr 14, 2026

Run pytest in parallel (pytest-xdist) (closes #467) #514

Merged

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ClaudeTalker registry + session_alive + halt on leak (closes #473)#475

ClaudeTalker registry + session_alive + halt on leak (closes #473)#475
FidoCanCode merged 1 commit into
mainfrom
fix-session-owner-telemetry

FidoCanCode commented Apr 14, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

FidoCanCode commented Apr 14, 2026

What's in

What's not in (follow-ups)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants