Channel B false-fires on stale completion marker when resuming sessions #3360

## Bug Report

**Severity:** High. Resume is completely broken for food truck dispatches: every resumed session is killed after ~14s with 0 tokens, before Claude Code can make any API call.

**Discovered:** 2026-05-30, during dispatch of issue #99 in TalonT-Org/api-simulator.

**Related issues:**

- #2400 (staged; cross-session JSONL recovery for resume dispatches) addresses result parsing after the kill but does NOT prevent the kill itself.
- #2940 (staged; stream parser captures wrong UUID on resume) is a separate resume bug in session ID resolution.
- #2538 (staged; food truck timeout→resume gap) covers different resume gaps.

This issue is the root cause that makes every resume attempt dead on arrival, regardless of those fixes.

## Symptom

When `dispatch_food_truck` is called with `resume_session_id` to resume a previously-failed food truck session, the result is:

- `lifespan_started: false`
- `input_tokens: 0`, `output_tokens: 0`
- `exit_code: 143` (SIGTERM)
- `kill_reason: kill_after_completion`
- `duration_seconds: ~14` (just startup overhead)
- `cli_subtype: unparseable`

The session is killed by autoskillit's own process monitor before Claude Code reaches the API.
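For triage, the signature above can be checked mechanically. The following is a minimal sketch, assuming the dispatch result is available as a flat mapping with the field names listed above; the helper name `looks_like_stale_marker_kill` and the 30-second duration cutoff are illustrative assumptions, not part of autoskillit.

```python
from typing import Any, Mapping


def looks_like_stale_marker_kill(
    result: Mapping[str, Any],
    max_duration: float = 30.0,  # assumed cutoff; the report observed ~14s
) -> bool:
    """Return True when a result matches the symptom listed in this report."""
    return (
        result.get("lifespan_started") is False
        and result.get("input_tokens") == 0
        and result.get("output_tokens") == 0
        and result.get("exit_code") == 143  # SIGTERM
        and result.get("kill_reason") == "kill_after_completion"
        and float(result.get("duration_seconds", 0.0)) <= max_duration
    )


# Values taken from the Symptom list above.
resume_result = {
    "lifespan_started": False,
    "input_tokens": 0,
    "output_tokens": 0,
    "exit_code": 143,
    "kill_reason": "kill_after_completion",
    "duration_seconds": 14,
    "cli_subtype": "unparseable",
}
# A healthy run for contrast (token counts from the original session).
healthy_result = {
    "lifespan_started": True,
    "input_tokens": 130,
    "output_tokens": 8201,
    "exit_code": 0,
    "kill_reason": "natural_exit",
    "duration_seconds": 1881,
}

matches_resume = looks_like_stale_marker_kill(resume_result)
matches_healthy = looks_like_stale_marker_kill(healthy_result)
```

A match is only a hint: other failures could in principle produce the same combination of fields, so the JSONL file should still be inspected for a pre-existing marker.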

## Root Cause

**Channel B's `_session_log_monitor` has no resume-boundary awareness.** Phase 2 unconditionally initializes `scan_pos = 0` and reads the entire JSONL file on its first poll — including content from the **original** session that already contains the completion marker.

### Causal Chain

1. **Marker reuse:** `_run_dispatch` reconstructs `DispatchIdentity` from `prior_dispatch_id` via `DispatchStateHandle.open_continued` → `DispatchIdentity.from_dispatch_id()` (`core/types/_type_dispatch_identity.py:60-68`). This deterministically produces the **same** `completion_marker` (`%%L3_DONE::{dispatch_id[:8]}%%`) that the original session already emitted.

2. **Same JSONL file:** `claude --resume <session_id>` appends to the existing session JSONL at `~/.claude/projects/<project-dir>/<session_id>.jsonl`. The file already contains the old completion marker in an `assistant`-type record.

3. **Phase 1 discovery:** `_session_log_monitor` Phase 1 (`_process_monitor.py:234-238`) filters JSONL files by `st_ctime > spawn_time`. The resumed session's JSONL passes this filter because Claude Code updates `st_ctime` when it appends new records. (Confirmed: on Linux, appending to a file updates both `st_ctime` and `st_mtime`. Note that `st_ctime` is the inode change time, not the creation time, so the filter does not exclude files that existed before the spawn.)

4. **Phase 2 false-fire:** Phase 2 initializes `scan_pos = 0` (`_process_monitor.py:289`). On the first poll, `current_size > last_size` holds (since `last_size = 0` and the file has pre-existing content), so the monitor reads `content[0:]`, i.e. the **entire** file. `_jsonl_contains_marker` finds the old `%%L3_DONE::...%%` in a historical `assistant` record (only `assistant`-type records are scanned, per `session_record_types=frozenset({"assistant"})` in `CLAUDE_CODE_CAPABILITIES`), and the monitor returns `ChannelBStatus.COMPLETION` immediately (`_process_monitor.py:324`).

5. **Kill:** `resolve_termination` → `COMPLETED` → `DRAIN_THEN_KILL_IF_ALIVE` → SIGTERM after the drain window. The process exits with code 143 before Claude Code has made any API call.
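Steps 1, 2, and 4 of the chain can be reproduced in isolation. The sketch below is a simplified model, not the real implementation: `completion_marker` and `jsonl_contains_marker` are stand-ins written for this example (only the marker format and the `assistant`-only filter are taken from the report), the record shape `{"type": ..., "text": ...}` is assumed, and the dispatch ID suffix is made up.

```python
import json

# Value quoted from CLAUDE_CODE_CAPABILITIES in the report.
RECORD_TYPES = frozenset({"assistant"})


def completion_marker(dispatch_id: str) -> str:
    """Marker format from step 1: depends only on the dispatch ID."""
    return f"%%L3_DONE::{dispatch_id[:8]}%%"


def jsonl_contains_marker(content: str, marker: str) -> bool:
    """Simplified stand-in for `_jsonl_contains_marker`."""
    for line in content.splitlines():
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial or malformed lines
        if (
            isinstance(record, dict)
            and record.get("type") in RECORD_TYPES
            and marker in line
        ):
            return True
    return False


# Hypothetical dispatch ID; only the 8-character prefix comes from the report.
dispatch_id = "b2fc2669-example-suffix"
original = completion_marker(dispatch_id)  # emitted by the first session
resumed = completion_marker(dispatch_id)   # rebuilt from prior_dispatch_id

# The session file: history from the original run, then the resume prompt.
history = json.dumps({"type": "assistant", "text": f"All done. {original}"}) + "\n"
appended = json.dumps({"type": "user", "text": f"Resume, then emit {resumed}"}) + "\n"
content = history + appended

scan_pos = 0  # what Phase 2 does today
fires_on_first_poll = jsonl_contains_marker(content[scan_pos:], resumed)
# Scanning only the appended part does not fire: the user record is filtered.
fires_on_new_content = jsonl_contains_marker(content[len(history):], resumed)
```

With `scan_pos = 0` the first poll fires on the historical `assistant` record even though the resumed process has produced no assistant output of its own.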

**Channel A is NOT affected:** The stdout heartbeat (`_heartbeat`) reads from a fresh `tempfile.NamedTemporaryFile` created per dispatch (`_process_io.py:36-38`), so there is no pre-existing content to false-fire on.

### Evidence from the failing session

| Field | Original session (`67156089`) | Resume attempt |
|-------|------|------|
| Session ID | `67156089-92d3-429c-a5f2-cfeb63860041` | Same (resumed) |
| Duration | 1881s | 14s |
| Tokens | 130 in / 8201 out | 0 / 0 |
| Exit code | 0 | 143 |
| Kill reason | natural_exit | kill_after_completion |
| JSONL marker | `%%L3_DONE::b2fc2669%%` at line 72 | Same file, marker already present |

The JSONL file at `~/.claude/projects/-home-talon-projects-api-simulator/67156089-92d3-429c-a5f2-cfeb63860041.jsonl` shows the resume user message was written (lines 74-76) but no assistant response was ever generated — the process was killed first. The `proc_trace.jsonl` for the resume session shows Claude Code was alive and working at +5s (89 ESTABLISHED connections, 19 threads, 18.6% CPU, 110KB I/O) — it was killed externally, not self-terminated.

The completion marker also appears verbatim in the resume prompt injected into `queue-operation` (line 74) and `user` (line 76) records, but these are correctly filtered out by `session_record_types=frozenset({"assistant"})`. The false-fire is exclusively from the original session's `assistant` record at line 72.

## Why `prior_completion_markers` doesn't help

The `prior_completion_markers` parameter is threaded through `_run_dispatch` → `dispatch_food_truck` → `_execute_claude_headless` → `_build_skill_result`. However, it is used **only in post-hoc result adjudication** (`_headless_result.py`, `_session_content.py:140-146,210-211`), never in Channel B's real-time monitor. Confirmed: `prior_completion_markers` does not appear in `_process_monitor.py`, `_process_race.py`, or the `run_managed_async` function signature. The fix was applied at the wrong layer: the result parser can tolerate old markers, but the process monitor kills the session before the result parser ever runs.

## Affected Code

| File | Line(s) | Role |
|------|---------|------|
| `src/autoskillit/execution/process/_process_monitor.py` | 289 | `scan_pos = 0` — no resume offset |
| `src/autoskillit/execution/process/_process_jsonl.py` | 39-73 | `_jsonl_contains_marker` — no time-boundary awareness |
| `src/autoskillit/execution/process/_process_race.py` | 426-496 | `resolve_termination` — cannot distinguish stale vs fresh markers |
| `src/autoskillit/core/types/_type_dispatch_identity.py` | 60-68 | `from_dispatch_id` — deterministically reproduces same marker |
| `src/autoskillit/fleet/_api.py` | 497 | `completion_marker = identity.completion_marker` — reuses original |

## Recommended Fix

**Initialize `scan_pos` to the file's existing byte length when Phase 1 discovers the JSONL file.**

After Phase 1 selects the session file (`_process_monitor.py:284`), read the file's current size in bytes and set `scan_pos` (and `last_size`) to that value before entering Phase 2. Phase 2 then scans only content appended after discovery, skipping all historical records, including stale completion markers. Records written between spawn and discovery are skipped as well.
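As a rough illustration of the proposed initialization, the sketch below starts an incremental reader at the file's size at discovery time. `JsonlTail` and `skip_existing` are hypothetical names for this example, not identifiers from `_process_monitor.py`, and the partial-line handling is an extra assumption about how a poll might treat a record that is still being written.

```python
import os
import tempfile


class JsonlTail:
    """Incremental reader whose offset starts at the size seen at discovery."""

    def __init__(self, path: str, skip_existing: bool = True) -> None:
        self.path = path
        # Proposed change: begin after the bytes already on disk, not at 0.
        self.scan_pos = os.path.getsize(path) if skip_existing else 0

    def poll(self) -> list[str]:
        """Return complete lines appended since the previous poll."""
        size = os.path.getsize(self.path)
        if size <= self.scan_pos:
            return []
        with open(self.path, "rb") as fh:
            fh.seek(self.scan_pos)
            data = fh.read(size - self.scan_pos)
        end = data.rfind(b"\n")
        if end == -1:
            return []  # only a partial line so far; retry on the next poll
        self.scan_pos += end + 1  # consume up to the last complete line
        return data[: end + 1].decode("utf-8", errors="replace").splitlines()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "session.jsonl")
    with open(path, "w", encoding="utf-8") as fh:
        # Pre-existing history containing the stale marker.
        fh.write('{"type": "assistant", "text": "%%L3_DONE::b2fc2669%%"}\n')

    tail = JsonlTail(path)  # discovery: offset set to the current size
    first = tail.poll()     # nothing appended yet

    with open(path, "a", encoding="utf-8") as fh:
        fh.write('{"type": "user", "text": "resume"}\n')
        fh.write('{"type": "assi')  # record still being written
    second = tail.poll()

    with open(path, "a", encoding="utf-8") as fh:
        fh.write('stant", "text": "working"}\n')
    third = tail.poll()
```

The stale marker line is never returned by `poll()`, so a marker check applied to its output cannot see it. One caveat of this sketch: if the file happens to end mid-record at discovery time, the first returned line would be a fragment, which a JSON-parsing scanner would simply skip.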

This is the minimal correct fix — it doesn't require changes to the marker identity chain, the race resolution logic, or the `prior_completion_markers` threading. It works for all resume scenarios (food truck, skill, campaign) because the root cause is universal: Phase 2 starts at byte 0 regardless of file history.

A regression test should create a JSONL file with a pre-existing completion marker, spawn a monitored process against it, and assert that Channel B does NOT fire on the stale marker.
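One possible shape for that test is sketched below. It does not spawn a process or call the real monitor: `scan_new_assistant_records` is a hypothetical stand-in for a single Channel B poll, and the record layout is assumed. A real test would exercise `_session_log_monitor` itself. The sketch also includes a positive control so that the fix cannot pass by never firing at all.

```python
import json
import os
import tempfile

MARKER = "%%L3_DONE::b2fc2669%%"  # marker value from the failing session


def scan_new_assistant_records(path: str, start: int, marker: str) -> tuple[bool, int]:
    """Stand-in for one poll: scan bytes from `start`, return (fired, new_pos)."""
    with open(path, "rb") as fh:
        fh.seek(start)
        data = fh.read()
    new_pos = start + len(data)
    for raw in data.splitlines():
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            continue
        if (
            isinstance(record, dict)
            and record.get("type") == "assistant"
            and marker in raw.decode("utf-8", errors="replace")
        ):
            return True, new_pos
    return False, new_pos


def append_record(path: str, record_type: str, text: str) -> None:
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps({"type": record_type, "text": text}) + "\n")


def test_channel_b_ignores_stale_marker() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "session.jsonl")
        append_record(path, "assistant", f"done {MARKER}")  # original session
        start = os.path.getsize(path)  # offset captured at discovery
        append_record(path, "user", f"resume, then emit {MARKER}")

        fired, pos = scan_new_assistant_records(path, start, MARKER)
        assert not fired, "stale marker must not trigger COMPLETION"

        # Positive control: a fresh assistant marker must still be detected.
        append_record(path, "assistant", f"finished {MARKER}")
        fired, _ = scan_new_assistant_records(path, pos, MARKER)
        assert fired, "fresh marker must still trigger COMPLETION"


test_channel_b_ignores_stale_marker()
```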

## Test Gap

No existing test exercises the resume path where Channel B monitors a JSONL file that already contains a completion marker from a prior session.

