Rectify: SkillResult.session_id Channel B Backfill & Prompt Contract Enforcement#535
Conversation
| ) | ||
| assert skill_result.session_id == stdout_session # NOT channel_b_uuid | ||
|
|
||
| def test_make_result_exposes_channel_b_session_id(self): |
There was a problem hiding this comment.
[warning] tests: test_make_result_exposes_channel_b_session_id tests the conftest _make_result helper rather than production code. It verifies test infrastructure, not the production SubprocessResult constructor contract. Remove or replace with a test that exercises production code directly.
There was a problem hiding this comment.
Investigated — this is intentional. The test calls _make_result() from tests/conftest.py (lines 123-140), which directly constructs the production SubprocessResult — it is not an opaque mock. Asserting result.channel_b_session_id == 'test-uuid-123' verifies the production dataclass field is correctly threaded through the conftest helper. If this wiring were broken, all four companion Channel B path tests in commit 6f283e0 would silently receive '' instead of the injected UUID, making their assertions meaningless. This is an intentional fixture-contract guard, not a test of test infrastructure.
| is_error=True, | ||
| result="", | ||
| session_id="", | ||
| session_id=result.channel_b_session_id, |
There was a problem hiding this comment.
[warning] cohesion: Asymmetric session_id resolution — stale path (L516) calls _resolve_session_id(None, result) while the timeout path here calls result.channel_b_session_id directly, bypassing the helper. Both produce identical results when session=None, but _resolve_session_id is not the single authoritative resolver for all non-session paths. Use the helper consistently or inline both.
There was a problem hiding this comment.
Investigated — this is intentional. _resolve_session_id(None, result) trivially reduces to result.channel_b_session_id (the session!=None guard is skipped, line 436 returns directly). However, the two paths operate at different abstraction levels: the stale path (L510) builds a SkillResult and returns early — the helper applies Channel A > Channel B precedence at that final construction point. The timeout path (L534) builds a ClaudeSessionResult that flows through further processing including potential stdout recovery (L530) and synthesis recovery (L551); using the helper here would be cosmetically consistent but architecturally misleading, since the timeout path may subsequently override session_id downstream. The asymmetry is intentional design, not an accidental inconsistency.
Trecek
left a comment
There was a problem hiding this comment.
AutoSkillit review found 5 blocking issues. See inline comments. (Note: changes_requested verdict — GitHub prevents requesting changes on own PR, posting as comment.)
…paths - Add _resolve_session_id(session, result) helper that prefers stdout-parsed UUID and falls back to result.channel_b_session_id - Fix stale non-recovery path: was hardcoding session_id=""; now uses _resolve_session_id(None, result) - Fix TIMED_OUT empty-stdout path: ClaudeSessionResult was constructed with session_id=""; now uses result.channel_b_session_id - Remove effective_session_id local variable from run_headless_core; flush_session_log now uses skill_result.session_id directly (which is always the best-available value) - Fix open_kitchen positional call in _prompts.py to keyword form open_kitchen(name=...) - Extend _make_result() and _sr() fixtures with channel_b_session_id parameter - Add TestResolveSessionId and five new TestBuildSkillResult tests covering stale path, TIMED_OUT path, preference ordering, and fixture contract Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When needs_retry=true and retry_reason=resume, the orchestrator now checks subtype to discriminate stale (hung-process kill → re-execute step) from context_exhaustion and other subtypes (→ on_context_limit as before). Adds tests: test_orchestrator_prompt_stale_retries_not_routed_to_context_limit, test_orchestrator_prompt_context_exhaustion_still_routes_to_context_limit, test_sous_chef_stale_routing_rule_present.
The test asserts "retry" appears in the 300-char window starting at the first "subtype" occurrence. The implementation used "Re-execute" which does not contain "retry" as a substring. Also trim pre-existing E501 violations in test comments introduced by commit 4063165 on integration. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…uting assertion Asserts that 'subtype=stale' appears as the compound discriminant and that the prompt explicitly prohibits routing stale sessions to on_context_limit, rather than a 300-char window proximity check that could pass even with incorrect routing. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ef SKILL.md Individual word presence checks for 'stale' and 'subtype' could be satisfied by headings or descriptions without the routing rule being wired. Assert the compound phrase 'subtype: stale' or 'subtype=stale' which only appears in the operational routing rule block. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
eeabcae to
766a359
Compare
…paths The rebase against integration replaced _resolve_session_id(None, result) with result.session_id at the stale and TIMED_OUT construction sites. result.session_id is only populated when _run_subprocess_async resolves it at the transport layer; tests that construct SubprocessResult directly set channel_b_session_id without session_id, exercising the fallback path that was lost during conflict resolution.
Integration added transport-level session_id resolution in process.py, populating SubprocessResult.session_id from both Channel A and B. The headless.py _resolve_session_id helper only checked channel_b_session_id, missing the already-resolved result.session_id. Precedence is now: stdout session_id > result.session_id > channel_b.
Summary
Three related bugs cluster around the same structural weakness: contracts that exist semantically but are not enforced structurally at the boundary where they matter.
run_headless_corecomputeseffective_session_idas a Channel B fallback forSkillResult.session_id, correctly passes it toflush_session_log, but never writes it back to the returnedSkillResult. Callers (includingtools_github.py) receivesession_id=""on all stale and empty-stdout timeout paths, even when Channel B discovered the real UUID from the JSONL filename. The fix moves resolution to construction time via a new_resolve_session_idhelper and eliminates the post-hoc enrichment anti-pattern entirely.A secondary fix corrects
_prompts.py:74which instructs Claude to callopen_kitchen('{recipe_name}')in positional form — now enforced to keyword formopen_kitchen(name=...). The sous-chef routing table is also extended to a 2D key(retry_reason, subtype)so stale sessions (watchdog kills) are re-executed rather than routed toon_context_limit.Architecture Impact
State Lifecycle Diagram
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%% flowchart TB classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff; classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff; classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff; classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff; classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff; classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000; classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff; subgraph Sources ["INIT_ONLY SOURCE FIELDS"] direction LR ChannelA["ClaudeSessionResult<br/>━━━━━━━━━━<br/>.session_id (str)<br/>stdout-parsed UUID<br/>(Channel A — preferred)"] ChannelB["SubprocessResult<br/>━━━━━━━━━━<br/>.channel_b_session_id (str)<br/>JSONL filename stem<br/>(Channel B — fallback)"] end subgraph Gate ["★ RESOLUTION GATE — _resolve_session_id (NEW)"] direction TB Resolver["★ _resolve_session_id(session, result)<br/>━━━━━━━━━━<br/>if session and session.session_id:<br/> return session.session_id<br/>return result.channel_b_session_id<br/>Single source of truth for priority rule"] end subgraph BuildSites ["● CONSTRUCTION SITES — _build_skill_result (MODIFIED)"] direction TB StaleSuccess["Stale recovery success path<br/>━━━━━━━━━━<br/>session_id=stale_session.session_id<br/>(stdout parse wins; no fallback needed)"] StaleFailure["● Stale failure path<br/>━━━━━━━━━━<br/>session_id=_resolve_session_id(None, result)<br/>NOW uses Channel B fallback"] TimeoutPath["● TIMED_OUT + empty-stdout path<br/>━━━━━━━━━━<br/>ClaudeSessionResult(session_id=<br/> result.channel_b_session_id)<br/>Seeds session for normal constructor"] NormalPath["Normal path<br/>━━━━━━━━━━<br/>session_id=session.session_id<br/>(unchanged — already correct)"] end AntiPattern["effective_session_id [REMOVED]<br/>━━━━━━━━━━<br/>was: local var computed post-construction<br/>was: used for flush_session_log only<br/>was: NEVER written back to SkillResult"] subgraph Contract ["● INIT_ONLY CONTRACT — SkillResult.session_id (ENFORCED)"] direction TB SkillResultNode["● SkillResult<br/>━━━━━━━━━━<br/>.session_id: str ← INIT_ONLY<br/>Set once at _build_skill_result<br/>Never mutated post-return<br/>Invariant: non-empty when Channel B has UUID"] end subgraph Consumers ["DOWNSTREAM CONSUMERS"] direction LR FlushLog["flush_session_log<br/>━━━━━━━━━━<br/>session_id=skill_result.session_id<br/>(previously: effective_session_id)"] ToolsGithub["tools_github.py<br/>━━━━━━━━━━<br/>propagated to caller<br/>& _read_session_diagnostics"] DebugLog["debug log<br/>━━━━━━━━━━<br/>session_id now accurate<br/>on all termination paths"] end ChannelA --> Resolver ChannelB --> Resolver Resolver --> StaleFailure Resolver --> TimeoutPath StaleSuccess --> SkillResultNode StaleFailure --> SkillResultNode TimeoutPath --> SkillResultNode NormalPath --> SkillResultNode AntiPattern -.->|"eliminated"| SkillResultNode SkillResultNode --> FlushLog SkillResultNode --> ToolsGithub SkillResultNode --> DebugLog class ChannelA,ChannelB cli; class Resolver newComponent; class StaleSuccess,NormalPath phase; class StaleFailure,TimeoutPath handler; class SkillResultNode stateNode; class FlushLog,ToolsGithub,DebugLog output; class AntiPattern gap;Color Legend:
ClaudeSessionResult.session_id(Channel A) andSubprocessResult.channel_b_session_id(Channel B)★ _resolve_session_id— new resolution helper enforcing priority rule● Stale failureand● TIMED_OUTpaths now use Channel B fallback● SkillResult.session_id— INIT_ONLY field with enforced contracteffective_session_idlocal variable — anti-pattern removedProcess Flow Diagram
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%% flowchart TB classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff; classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff; classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff; classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff; classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff; classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000; START([run_skill invoked]) subgraph Subprocess ["SUBPROCESS EXECUTION"] direction TB Launch["run_headless_core<br/>━━━━━━━━━━<br/>Launch Claude subprocess<br/>Channel A (stdout) + Channel B (JSONL)"] RaceResolve["resolve_termination<br/>━━━━━━━━━━<br/>RaceAccumulator signals<br/>→ TerminationReason"] TermDecision{"Termination?<br/>━━━━━━━━━━<br/>STALE / TIMED_OUT / NATURAL"} end subgraph BuildResult ["● _build_skill_result (MODIFIED — session_id backfill)"] direction TB StaleRecovery{"● STALE<br/>━━━━━━━━━━<br/>stdout has valid result?"} StaleSuccess["Recovered from stale<br/>━━━━━━━━━━<br/>success=True<br/>retry_reason=NONE<br/>subtype=recovered_from_stale"] StaleFailPath["● Stale failure<br/>━━━━━━━━━━<br/>success=False<br/>needs_retry=True<br/>retry_reason=RESUME<br/>subtype=stale<br/>session_id=_resolve_session_id(None, result)"] TimeoutBranch["● TIMED_OUT<br/>━━━━━━━━━━<br/>empty stdout? → bare ClaudeSessionResult(<br/> session_id=result.channel_b_session_id)<br/>non-empty? → parse + force subtype=TIMEOUT"] NormalBranch["Normal completion<br/>━━━━━━━━━━<br/>parse_session_result(stdout)<br/>→ ClaudeSessionResult"] ComputeOutcome["_compute_outcome<br/>━━━━━━━━━━<br/>SUCCEEDED / RETRIABLE / FAILED<br/>(recovery steps applied first)"] end subgraph Gates ["POST-OUTCOME GATES"] direction TB PathGate{"path_contamination?"} ContractGate{"CONTRACT_RECOVERY?<br/>━━━━━━━━━━<br/>write evidence + adjudicated_failure"} ZeroWriteGate{"zero_writes?<br/>━━━━━━━━━━<br/>success=True but write_count=0"} BudgetGuard["Budget guard<br/>━━━━━━━━━━<br/>retries exhausted?<br/>→ flip needs_retry=False<br/> retry_reason=BUDGET_EXHAUSTED"] end SkillResultOut["SkillResult<br/>━━━━━━━━━━<br/>(success, needs_retry,<br/>retry_reason, subtype,<br/>session_id — now accurate)"] subgraph SousChef ["● SOUS-CHEF ROUTING (MODIFIED — 2D key: retry_reason × subtype)"] direction TB NeedsRetryDecision{"needs_retry=True?"} RetryReasonDecision{"● retry_reason?<br/>━━━━━━━━━━<br/>2D routing key"} ResumeStale["● resume + stale<br/>━━━━━━━━━━<br/>Watchdog kill (NOT context limit)<br/>→ RE-EXECUTE step<br/> (decrement retries)"] ResumeContextLimit["resume + !stale<br/>━━━━━━━━━━<br/>Context limit hit<br/>→ on_context_limit<br/> (or on_failure if none)"] DrainRace["drain_race<br/>━━━━━━━━━━<br/>→ on_context_limit<br/> (or on_failure if none)"] OtherReasons["empty_output / path_contamination<br/>/ early_stop / zero_writes<br/>━━━━━━━━━━<br/>→ on_failure (never context limit)"] RetryExhausted{"Retries exhausted?"} end COMPLETE([COMPLETE — success]) CONTEXT_LIMIT([on_context_limit route]) FAILURE([on_failure route]) EXHAUSTED([on_exhausted route]) START --> Launch Launch --> RaceResolve RaceResolve --> TermDecision TermDecision -->|"STALE"| StaleRecovery TermDecision -->|"TIMED_OUT"| TimeoutBranch TermDecision -->|"NATURAL_EXIT"| NormalBranch StaleRecovery -->|"recovery OK"| StaleSuccess StaleRecovery -->|"no recovery"| StaleFailPath StaleSuccess --> SkillResultOut StaleFailPath --> BudgetGuard TimeoutBranch --> ComputeOutcome NormalBranch --> ComputeOutcome ComputeOutcome --> PathGate PathGate -->|"yes"| BudgetGuard PathGate -->|"no"| ContractGate ContractGate -->|"yes"| BudgetGuard ContractGate -->|"no"| ZeroWriteGate ZeroWriteGate -->|"yes"| BudgetGuard ZeroWriteGate -->|"no"| SkillResultOut BudgetGuard --> SkillResultOut SkillResultOut --> NeedsRetryDecision NeedsRetryDecision -->|"no"| COMPLETE NeedsRetryDecision -->|"yes"| RetryReasonDecision RetryReasonDecision -->|"resume"| ResumeStale RetryReasonDecision -->|"resume + !stale"| ResumeContextLimit RetryReasonDecision -->|"drain_race"| DrainRace RetryReasonDecision -->|"other"| OtherReasons ResumeStale --> RetryExhausted RetryExhausted -->|"no — re-execute"| Launch RetryExhausted -->|"yes"| EXHAUSTED ResumeContextLimit --> CONTEXT_LIMIT DrainRace --> CONTEXT_LIMIT OtherReasons --> FAILURE class START,COMPLETE,CONTEXT_LIMIT,FAILURE,EXHAUSTED terminal; class Launch,NormalBranch,StaleSuccess phase; class StaleRecovery,TermDecision,NeedsRetryDecision,RetryReasonDecision,PathGate,ContractGate,ZeroWriteGate,RetryExhausted stateNode; class StaleFailPath,TimeoutBranch handler; class ComputeOutcome,BudgetGuard detector; class ResumeStale,ResumeContextLimit,DrainRace,OtherReasons newComponent; class SkillResultOut output;Color Legend:
● Stale failureand● TIMED_OUTpaths with session_id backfill_compute_outcomeandBudget guardvalidation gatesSkillResultproduced from_build_skill_resultOperational Diagram
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%% flowchart TB classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff; classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff; classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff; classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff; classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff; classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff; classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000; subgraph CLI ["CLI ENTRY POINTS"] direction LR OrderCmd["autoskillit order<br/>━━━━━━━━━━<br/>[recipe] — recipe name or picker<br/>app.py:order command"] CookCmd["autoskillit cook<br/>━━━━━━━━━━<br/>Free-form interactive session<br/>(no _build_orchestrator_prompt)"] end subgraph OrderFlow ["ORDER COMMAND WORKFLOW"] direction TB RecipeResolve["Recipe resolution<br/>━━━━━━━━━━<br/>load_recipe + validate_recipe<br/>user picker if no recipe given"] SubsetGate["Subset gate<br/>━━━━━━━━━━<br/>show disabled subsets<br/>user opt-in prompt"] PreviewRender["show_cook_preview<br/>━━━━━━━━━━<br/>flow diagram + ingredients table<br/>rendered to terminal"] UserConfirm{"[Enter/n]<br/>━━━━━━━━━━<br/>launch confirm?"} end subgraph PromptBuild ["● PROMPT CONSTRUCTION (_build_orchestrator_prompt MODIFIED)"] direction TB PromptFn["● _build_orchestrator_prompt(recipe_name)<br/>━━━━━━━━━━<br/>Reads sous-chef/SKILL.md (bundled)<br/>Builds --append-system-prompt string"] OpenKitchenLine["● Mandatory first-action line<br/>━━━━━━━━━━<br/>open_kitchen(name='{recipe_name}')<br/>keyword arg form — FIXED<br/>(was: positional → Claude guessed param name)"] OrchestratorRules["Orchestrator rules injected<br/>━━━━━━━━━━<br/>tool restrictions (AskUserQuestion only)<br/>routing rules, failure predicates<br/>context-limit + stale retry routing"] end subgraph SessionLaunch ["SESSION LAUNCH"] direction TB LaunchCook["_launch_cook_session<br/>━━━━━━━━━━<br/>claude --plugin-dir {pkg_root}<br/>--tools AskUserQuestion<br/>--append-system-prompt {prompt}"] ClaudeSession["Claude Code interactive session<br/>━━━━━━━━━━<br/>MCP server: autoskillit<br/>Kitchen gated until open_kitchen(name=...) called"] end subgraph KitchenActivation ["MCP KITCHEN — open_kitchen tool"] direction TB OpenKitchenTool["open_kitchen(name=...)<br/>━━━━━━━━━━<br/>Reveals 40 kitchen MCP tools<br/>Loads recipe from recipes/ directory<br/>name= kwarg — unambiguous param"] PreviousGap["open_kitchen('{recipe_name}') [OLD]<br/>━━━━━━━━━━<br/>Positional arg → Claude infers param<br/>Sometimes guesses 'recipe_name' not 'name'<br/>→ tool call fails, wastes retry"] end subgraph Monitoring ["OBSERVABILITY"] direction TB SessionLogs["Session diagnostics logs<br/>━━━━━━━━━━<br/>~/.local/share/autoskillit/logs/<br/>sessions/{session_id}/"] TelemetryLogs[".autoskillit/temp/<br/>━━━━━━━━━━<br/>Token/timing summaries<br/>Audit logs, PR bodies"] end OrderCmd --> RecipeResolve RecipeResolve --> SubsetGate SubsetGate --> PreviewRender PreviewRender --> UserConfirm UserConfirm -->|"confirmed"| PromptFn UserConfirm -->|"abort"| ABORT([ABORT]) PromptFn --> OpenKitchenLine PromptFn --> OrchestratorRules OpenKitchenLine --> LaunchCook OrchestratorRules --> LaunchCook LaunchCook --> ClaudeSession ClaudeSession --> OpenKitchenTool PreviousGap -.->|"eliminated"| OpenKitchenTool ClaudeSession --> SessionLogs ClaudeSession --> TelemetryLogs CookCmd -.->|"no prompt injection"| ClaudeSession class OrderCmd,CookCmd cli; class RecipeResolve,SubsetGate,PreviewRender phase; class UserConfirm stateNode; class PromptFn handler; class OpenKitchenLine newComponent; class OrchestratorRules phase; class LaunchCook handler; class ClaudeSession stateNode; class OpenKitchenTool output; class PreviousGap gap; class SessionLogs,TelemetryLogs output; class ABORT detector;Color Legend:
autoskillit orderandcookentry points_build_orchestrator_promptand_launch_cook_session● Mandatory first-action line— keyword argument form fixopen_kitchentool and observability logsCloses #527
Implementation Plan
Plan file:
/home/talon/projects/autoskillit-runs/remediation-20260326-210914-908214/.autoskillit/temp/rectify/rectify_session_id_channel_b_backfill_2026-03-26_210914_part_a.md🤖 Generated with Claude Code via AutoSkillit