Rectify: Stale Detector Kills Actively Working Sessions During run_skill by Trecek · Pull Request #3391 · TalonT-Org/AutoSkillit

Trecek · 2026-05-31T04:41:08Z

Summary

The stale detector uses JSONL file-size growth as its sole liveness signal. When a session is blocked on a long-running MCP tool call (e.g., run_skill executing analyze-pipeline-health), the JSONL stops growing because Claude writes only on conversation turns, not while awaiting tool results. The stale detector interprets this silence as staleness and kills the session.

The architectural weakness: dispatch_food_truck creates a dispatch marker that suppresses stale kills in all three watchers (_session_log_monitor, _watch_stdout_idle, _watch_child_activity), but run_skill creates no such marker. Furthermore, the run_skill call chain through DefaultHeadlessExecutor.run() → run_headless_core() never passes marker_dir or caller_session_id to _execute_claude_headless() — even though that function already accepts both parameters. This means the monitoring tasks inside run_managed_async receive marker_dir=None and skip all marker-based suppression unconditionally.

The solution: extend the execution marker protocol to run_skill sessions by (1) renaming and generalizing the marker detection function, (2) threading marker_dir and caller_session_id through the run() call chain, and (3) creating markers in run_skill before blocking on executor.run().

Requirements

Two dispatch_food_truck sessions (issue #3246, issue #3317) completed ALL substantive work — planning, implementation, code review, PR merge via merge queue, and issue release — but were killed by the stale detector during the final run_diagnostic step. The orchestrator then triggered unnecessary resume dispatches, wasting tokens on sessions whose PRs were already merged 18–70 minutes earlier.

Closes #3349

Implementation Plan

Plan file: /home/talon/projects/autoskillit-runs/remediation-20260530-174510-498188/.autoskillit/temp/rectify/rectify_stale_detector_mcp_blind_spot_2026-05-30_174510_part_a.md

🤖 Generated with Claude Code via AutoSkillit

Token Usage Summary

Step	Model	count	uncached	output	cache_read	peak_ctx	turns	cache_write	time
rectify*	opus[1m]	1	7.6k	26.4k	1.9M	124.2k	263	108.4k	15m 40s
review_approach*	sonnet	1	9.9k	7.1k	225.0k	50.5k	105	38.6k	6m 42s
dry_walkthrough*	opus	4	207	46.3k	3.8M	85.2k	498	208.0k	25m 59s
implement*	sonnet	4	4.1k	112.9k	18.4M	173.2k	512	366.9k	41m 41s
audit_impl*	sonnet	4	2.8k	60.5k	2.2M	86.7k	342	295.9k	29m 35s
make_plan*	sonnet	2	1.3k	29.8k	1.4M	80.1k	130	142.2k	15m 22s
prepare_pr*	sonnet	1	66.8k	5.1k	163.9k	27.8k	17	15.9k	1m 27s
compose_pr*	sonnet	1	57.8k	3.9k	191.7k	27.8k	17	15.5k	1m 12s
review_pr*	sonnet	2	388	62.4k	3.6M	128.3k	206	289.7k	22m 29s
resolve_review*	opus	2	173	40.2k	7.8M	118.8k	186	228.7k	40m 39s
Total			151.0k	394.6k	39.6M	173.2k		1.7M	3h 20m

* Step used a non-Anthropic provider; caching behavior may differ.

Token Efficiency

Step	LoC Changed	cache_read/LoC	cache_write/LoC	output/LoC
rectify	0	—	—	—
review_approach	0	—	—	—
dry_walkthrough	0	—	—	—
implement	767	23963.4	478.4	147.1
audit_impl	0	—	—	—
make_plan	0	—	—	—
prepare_pr	0	—	—	—
compose_pr	0	—	—	—
review_pr	0	—	—	—
resolve_review	93	84195.7	2458.7	432.3
Total	860	46006.4	1988.0	458.8

Model Usage Breakdown

Model	steps	uncached	output	cache_read	cache_write	time
opus[1m]	1	7.6k	26.4k	1.9M	108.4k	15m 40s
sonnet	7	143.1k	281.7k	26.1M	1.2M	1h 58m
opus	2	380	86.5k	11.6M	436.6k	1h 6m

Adds TDD tests that fail until the execution marker glob is generalized: - test_stale_NOT_fired_when_execution_marker_active: session monitor must suppress stale when run-skill-in-progress-* marker is fresh - test_stale_fired_when_execution_marker_expired: marker present but stale still fires when execution marker is expired - test_execution_marker_suppression_bounded_by_max_suppression_seconds: suppression is bounded even with continuously-touched run-skill marker - test_stdout_idle_NOT_fired_when_execution_marker_active: idle watchdog must suppress IDLE_STALL when run-skill-in-progress-* marker is fresh Current dispatch-in-progress-* glob does not match run-skill-in-progress-* markers, so tests 1a, 1c, 1d fail before the fix. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…_marker Generalizes the marker detection function and its glob pattern to match both dispatch-in-progress-* and run-skill-in-progress-* markers: Old glob: dispatch-in-progress-{session_id}-*.marker New glob: *-in-progress-{session_id}-*.marker This makes the stale detector's suppression mechanism work for any execution context, not just fleet dispatches. All callers in _process_race.py (_watch_stdout_idle, _watch_child_activity) and _process_monitor.py (_session_log_monitor) are updated. Updated all test references: monkeypatch targets, imports, class names, frozenset, arch guard predicate string, and CLAUDE.md table entry. Adds parametrized backward-compat test confirming both prefixes match and a wrong-session-id guard test. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Adds marker_dir: Path | None and caller_session_id: str | None to: - HeadlessExecutor.run() Protocol - DefaultHeadlessExecutor.run() concrete method - run_headless_core() function Both params are forwarded to _execute_claude_headless() as marker_dir and session_id respectively, making the stale detector's marker-based suppression available for run_skill sessions (not just dispatch_food_truck). Adds integration tests verifying the forwarding chain: - test_run_headless_core_forwards_marker_dir_and_caller_session_id - test_default_executor_run_forwards_marker_dir_and_caller_session_id - test_headless_executor_protocol_includes_marker_dir_params Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ker creation Create core/_execution_marker.py with a shared async context manager that handles marker write, heartbeat, and cleanup. Refactor fleet/_api.py to use execution_marker instead of inline marker logic, and wire run_skill in tools_execution.py to create markers before blocking on executor.run(). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…fication gap Verifies that execution_marker context manager creates a marker file on entry and deletes it on exit — the lifecycle unit test that was missing from the stale-detector run_skill marker blind spot audit. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…d_dispatch Add TestResumeSuccessGuard to test_resume_preflight.py covering the _run_dispatch early-return path when prior dispatch already succeeded. test_resume_rejected_when_prior_dispatch_succeeded fails until the guard is wired in Phase 2. Add TestHasCompletedDispatch to test_state.py and implement has_completed_dispatch in state_recovery.py (name-specific SUCCESS check, fail-open on missing/corrupted state). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…e-exports Wire _build_success_short_circuit into _run_dispatch: when prior dispatch already has DispatchStatus.SUCCESS, return the cached result immediately without launching a subprocess. Add has_completed_dispatch re-exports through state.py and fleet/__init__ so callers can use `from autoskillit.fleet import has_completed_dispatch`. Add dispatch_food_truck MCP tool pre-flight guard: if campaign state already shows SUCCESS for effective_name, return the cached result before execute_dispatch is called (prevents semaphore acquisition and state handle creation for duplicate calls). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…scanner depth Add stale_threshold: 2400 to all run_diagnostic* steps in implementation, remediation, and merge-prs recipes (10 steps total). The default 1200s is too short for multi-subagent scanner work; 2400s matches the existing convention for long-running steps (implement, retry_worktree). Add scanner maxTurns and wall-clock soft-deadline guidance to analyze-pipeline-health SKILL.md to bound investigation per scanner and prevent the stale detector misfiring window. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Move DispatchStateHandle.open_continued inside the try block so that FileNotFoundError (a subclass of OSError) is caught by the existing except handler. Add a create_fresh fallback in the except so handle is always bound after the try/except, enabling fail-open resume when the prior state file is absent. Extract recipe_snapshot before the if/else so it is available in both the except fallback and the else branch. Add test_resume_proceeds_when_prior_state_missing to TestResumeSuccessGuard to verify the fail-open behaviour. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Replace duplicate label suffix in marker filename with uuid4 hex to prevent collisions between concurrent executions sharing the same label + session_id. Replace asyncio.create_task with anyio task group for structured concurrency and proper exception visibility. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add find_completed_dispatch helper that returns the full DispatchRecord for a completed dispatch. Use it in the dispatch_food_truck MCP tool pre-flight guard to forward actual dispatch_id and dispatched_session_id instead of hardcoded empty strings. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…ch fields Add monitor_status emptiness assertion in test_stale_NOT_fired_when_execution_marker_active to distinguish STALE from other early-exit conditions. Add dispatch_id, dispatched_session_id, and reason field assertions in test_resume_rejected_when_prior_dispatch_succeeded. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Revert from anyio task group (yield inside task group body causes cleanup issues with @asynccontextmanager) to asyncio.create_task with split exception handling: CancelledError silenced, other exceptions logged. Update tests to match new _touch_marker 2-arg signature. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…ppression test Adds assert elapsed < 5.0 after the lower-bound check so a broken suppression path (marker never matched) cannot silently pass when the test happens to run slowly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…checks Adds assert "marker_dir" in sig.parameters and assert "caller_session_id" in sig.parameters before accessing .default, so a missing parameter produces an actionable assertion message instead of an uninformative KeyError. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

New core/ module _execution_marker was missing from MODULE_CASCADE_CORE, causing test_all_core_stems_classified to fail. Maps it to the four consuming packages: core, execution, fleet, server. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Trecek force-pushed the stale-detector-kills-actively-working-sessions-during-run-di/3349 branch from af5551c to 81fc2d9 Compare May 31, 2026 05:11

Trecek and others added 17 commits May 30, 2026 22:45

chore: commit pending session changes

a5f301a

Trecek force-pushed the stale-detector-kills-actively-working-sessions-during-run-di/3349 branch from 81fc2d9 to e2edc1b Compare May 31, 2026 05:45

Trecek added this pull request to the merge queue May 31, 2026

Merged via the queue into develop with commit 1c665ae May 31, 2026
3 checks passed

Trecek deleted the stale-detector-kills-actively-working-sessions-during-run-di/3349 branch May 31, 2026 05:55

Trecek mentioned this pull request May 31, 2026

ProcessStaleError false positive on resume: mtime-based detection trips on external git pull #3427

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rectify: Stale Detector Kills Actively Working Sessions During run_skill#3391

Rectify: Stale Detector Kills Actively Working Sessions During run_skill#3391
Trecek merged 17 commits into
developfrom
stale-detector-kills-actively-working-sessions-during-run-di/3349

Trecek commented May 31, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Trecek commented May 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Requirements

Implementation Plan

Token Usage Summary

Token Efficiency

Model Usage Breakdown

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Trecek commented May 31, 2026 •

edited

Loading