ClaudeTalker registry + session_alive + halt on leak (closes #473)#475
Merged
Conversation
Add per-repo ClaudeTalker exclusivity tracking: at most one thread may drive a claude subprocess per repo at any moment, enforced at the point of registration. ClaudeSession.__enter__ registers a "worker" talker; _run_streaming registers a "webhook" talker when current_repo() is set via a thread-local pinned at the webhook handler and worker thread entrypoints. Concurrent registration for the same repo raises ClaudeLeakError, which the worker thread and webhook dispatcher catch and handle by os._exit(3) — a loud halt so a leaked sub-claude can't silently proliferate. Surface in /status: new session_alive (bool, "subprocess up") and claude_talker (object, authoritative who-is-talking snapshot) fields per repo. ClaudeSession.owner now reads from the talker registry under _talkers_lock rather than a best-effort unsynchronized attribute — correct under free-threaded Python 3.14t with no GIL (noted in CLAUDE.md). pytest-xdist wired with addopts = -n auto. Suite drops 37s → 16s. Closes #473.
rhencke
approved these changes
Apr 14, 2026
This was referenced Apr 14, 2026
Persistent ClaudeSession ignores the sub-skill system prompt, causing 'setup produced no tasks'
#477
Closed
FidoCanCode
added a commit
that referenced
this pull request
Apr 14, 2026
#477) (#478) Fixes the \"setup produced no tasks\" regression from #456. ## Problem When the persistent \`ClaudeSession\` was introduced in #456, \`claude_start\` / \`claude_run\` started sending only \`fido_dir/prompt\` (context) to the session. The sub-skill instructions — \`setup.md\`, \`task.md\`, \`ci.md\`, \`comments.md\`, \`resume.md\` — were built into \`fido_dir/system\` by \`build_prompt\` but **never delivered** to the session. Claude saw only the bare \"Request: ... Repo: ... Branch: ...\" with no instructions, produced empty output, and \`find_or_create_pr\` raised \`setup produced no tasks\`. Observed live after #475 landed: fido stuck on #465, watchdog kept restarting, three abandoned branches. ## Fix The persistent session already has \`persona.md\` as its system prompt, so prepending the full \`fido_dir/system\` (persona + sub-skill) would duplicate the persona. Instead: - \`build_prompt\` now writes a third file \`fido_dir/skill\` containing just the sub-skill content (\`setup.md\`, \`task.md\`, …) - New \`_session_turn_prompt\` helper reads \`fido_dir/skill\` + \`fido_dir/prompt\` and joins them with \`\\n\\n---\\n\\n\` - Both \`claude_start\` and \`claude_run\` persistent-session paths now send \`_session_turn_prompt(fido_dir)\` Tests updated: \`_setup_fido_dir\` fixtures now write \`skill\`, and the \`session_path_sends_prompt_content\` tests expect the combined \`<skill>\\n\\n---\\n\\n<prompt>\` body. 100% coverage, suite passes. Closes #477. --------- Co-authored-by: Fido Can Code <190991155+FidoCanCode@users.noreply.github.com>
This was referenced Apr 14, 2026
Closed
FidoCanCode
added a commit
that referenced
this pull request
Apr 14, 2026
…reempt (closes #479) (#481) Fulfills the 'one claude per repo' invariant from #473 / #475 by making webhook handler prompt calls share the worker's persistent \`ClaudeSession\` instead of spawning one-shot subprocesses. ## Steal / handoff \`ClaudeSession.prompt(content, model=..., system_prompt=...)\` runs one full turn on the persistent session: 1. Set the cancel event — wakes any current holder out of \`iter_events\`. 2. Acquire the session as a context manager (blocks until previous holder's \`__exit__\` releases the lock + unregisters its \`ClaudeTalker\`). 3. Send stream-json \`control_request\` interrupt + drain stale events so stdout is clean. 4. Switch model if requested, send content, return result text. ## Resume on preempt When \`iter_events\` exits early because \`_cancel\` was set, a new \`last_turn_cancelled\` property records it. The worker's \`claude_start\` / \`claude_run\` now wrap the persistent-session path in \`_run_session_turn\`, which retries preempted turns up to \`_MAX_PREEMPT_RETRIES\` times — 'hand the session back to the worker and ask it to resume what it was doing'. No more crashing on empty result after a webhook steals the session. ## Auto-routing print_prompt \`claude.print_prompt\` now checks for a session on the current thread's repo via a new \`set_session_resolver()\` hook. When one exists and is alive, the call routes through \`session.prompt()\` instead of spawning a one-shot subprocess. Falls back to one-shot on \`ClaudeStreamError\` / \`ClaudeLeakError\` (returns \`\"\"\` like the existing failure path). ## Wiring - \`WorkerRegistry.get_session(repo_name)\` exposes the worker thread's \`ClaudeSession\`. - \`server._run()\` calls \`claude.set_session_resolver(registry.get_session)\` right after constructing the registry. Closes #479. Co-authored-by: Fido Can Code <190991155+FidoCanCode@users.noreply.github.com>
1 task
FidoCanCode
added a commit
that referenced
this pull request
Apr 14, 2026
Fixes #467. pytest-xdist and `-n auto` were already added in #475 — this PR fixes the last slow-test pattern by adding `poll_interval=0.01` to `_make_unregistered_server`, which was blocking 0.5s per test on shutdown. Shaves ~1s off wall time and closes out the parallel-pytest work. --- ## Work queue <!-- WORK_QUEUE_START --> <details><summary>Completed (1)</summary> - [x] Use fast poll_interval in _make_unregistered_server test helper <!-- type:spec --> </details> <!-- WORK_QUEUE_END --> Co-authored-by: Fido Can Code <190991155+FidoCanCode@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Redesigns claude-subprocess tracking per #473.
What's in
What's not in (follow-ups)
Routing webhook handlers through the persistent session (so they stop using one-shot `print_prompt` entirely) is the follow-up — the leak detector will scream the moment a webhook and worker collide, which motivates that change.
Also filed #474 to kill `test_raises_on_idle_timeout`'s 10s `sleep 60` (caps xdist parallelism).
Closes #473.