fix(headless): accept clean exit + empty parent stream for context:fork skills (#160)#813
Merged
Merged
Conversation
… context:fork skills (#160) Headless tasks that invoke skills with `context: fork` in their frontmatter were failing with HTTP 500 "Task returned empty response", silently failing every scheduled invocation. The issue documented 8 consecutive daily failures on one agent. Root cause: the fork mechanism in Claude Code runs the skill's work in a sub-context whose output never reaches the parent stdout. The parent process exits cleanly with `return_code == 0` and a populated `result` line (`cost_usd` and `duration_ms` set — that's why `_classify_empty_result` returns None and execution falls through to the "build response_text" block). Pre-#160 the unconditional `if not response_text: raise 500` at the bottom of `_finalize_headless_result` had no notion of fork-style skills. Fix: when the parent process reports clean completion (`return_code == 0` AND `metadata.cost_usd is not None`), trust it. Synthesize a short placeholder response instead of raising 500. Real plumbing failures (lost result line, dropped stdout pipe, child held stdout) are already caught by `_classify_empty_result` earlier in the function and never reach this branch — the new code path is exclusively for the "claude completed cleanly with no parent-stream output" case. Tests: tests/unit/test_headless_context_fork_empty_response.py (5 cases). Pins the placeholder fires only when both guards hold, the real-output case is untouched, non-zero exits still error, and the #520 lost-result case still returns 502 via `_classify_empty_result`. Relates to #160. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same fix as the #807 follow-up: tests/lint_sys_modules.py flags bare sys.modules mutations at module scope. tests/unit/conftest.py's _preload_real_agent_server() already registers the namespace package, so plain `from agent_server.<sub> import X` resolves. 5 tests still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
vybe
approved these changes
May 12, 2026
Contributor
vybe
left a comment
There was a problem hiding this comment.
Validated via /validate-pr. Two-condition guard (return_code == 0 AND cost_usd is not None) is the right signal — correctly rules out the #520 lost-result-line case. 5 test cases cover all the relevant branches including the regression pin. LGTM.
AndriiPasternak31
added a commit
that referenced
this pull request
May 13, 2026
PR #797 (#678) changed _classify_empty_result's HTTP 502 detail from a plain string to a structured dict carrying salvage telemetry. The test added by #813 (test_clean_exit_but_missing_cost_falls_to_empty_result_classifier) still called .lower() on the raw detail and now hits AttributeError. Mirror the canonical pattern from tests/unit/test_empty_result_classification.py: assert detail is a dict, then read the human-readable text out of detail['message'] before doing substring checks. No production behavior change — only the assertion plumbing was stale. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AndriiPasternak31
added a commit
that referenced
this pull request
May 13, 2026
PR #797 (#678) changed _classify_empty_result's HTTP 502 detail from a plain string to a structured dict carrying salvage telemetry. The test added by #813 (test_clean_exit_but_missing_cost_falls_to_empty_result_classifier) still called .lower() on the raw detail and now hits AttributeError. Mirror the canonical pattern from tests/unit/test_empty_result_classification.py: assert detail is a dict, then read the human-readable text out of detail['message'] before doing substring checks. No production behavior change — only the assertion plumbing was stale. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AndriiPasternak31
added a commit
that referenced
this pull request
May 13, 2026
PR #797 (#678) changed _classify_empty_result's HTTP 502 detail from a plain string to a structured dict carrying salvage telemetry. The test added by #813 (test_clean_exit_but_missing_cost_falls_to_empty_result_classifier) still called .lower() on the raw detail and now hits AttributeError. Mirror the canonical pattern from tests/unit/test_empty_result_classification.py: assert detail is a dict, then read the human-readable text out of detail['message'] before doing substring checks. No production behavior change — only the assertion plumbing was stale. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes the failure mode in #160: scheduled headless tasks that invoke
skills with `context: fork` in their frontmatter were 500'ing with
"Task returned empty response". The issue documented 8 consecutive
daily failures on one agent.
Root cause
Claude Code's `context: fork` mechanism runs the skill's work in a
sub-context whose output never reaches the parent process's stdout.
The parent exits cleanly with `return_code == 0` and a populated
`result` line — so `_classify_empty_result` returns None and the
flow falls through to the end of `_finalize_headless_result`, where
the unconditional `if not response_text: raise 500` fires.
Fix
When the parent process reports clean completion (`return_code == 0`
AND `metadata.cost_usd is not None`), synthesize a placeholder
response instead of raising 500. Other failure modes — non-zero exits,
auth failures, the #520 lost-result-line case — are caught by branches
earlier in `_finalize_headless_result` and never reach this code.
Why not Option C ("check tool_count > 0")
`metadata.tool_count` reflects only parent-stream tools. Fork-context
tools don't emit to the parent stream, so they aren't counted. Using
`tool_count > 0` would defeat the purpose for the exact case the
issue reports. The clean-exit + cost-set guard is the strongest signal
we have that claude itself reported success.
Test plan
Relates to #160.
🤖 Generated with Claude Code