Skip to content

fix(headless): accept clean exit + empty parent stream for context:fork skills (#160)#813

Merged
vybe merged 2 commits into
devfrom
feature/160-headless-context-fork-empty-response
May 12, 2026
Merged

fix(headless): accept clean exit + empty parent stream for context:fork skills (#160)#813
vybe merged 2 commits into
devfrom
feature/160-headless-context-fork-empty-response

Conversation

@dolho
Copy link
Copy Markdown
Contributor

@dolho dolho commented May 12, 2026

Summary

Closes the failure mode in #160: scheduled headless tasks that invoke
skills with `context: fork` in their frontmatter were 500'ing with
"Task returned empty response". The issue documented 8 consecutive
daily failures on one agent.

Root cause

Claude Code's `context: fork` mechanism runs the skill's work in a
sub-context whose output never reaches the parent process's stdout.
The parent exits cleanly with `return_code == 0` and a populated
`result` line — so `_classify_empty_result` returns None and the
flow falls through to the end of `_finalize_headless_result`, where
the unconditional `if not response_text: raise 500` fires.

Fix

When the parent process reports clean completion (`return_code == 0`
AND `metadata.cost_usd is not None`), synthesize a placeholder
response instead of raising 500. Other failure modes — non-zero exits,
auth failures, the #520 lost-result-line case — are caught by branches
earlier in `_finalize_headless_result` and never reach this code.

Why not Option C ("check tool_count > 0")

`metadata.tool_count` reflects only parent-stream tools. Fork-context
tools don't emit to the parent stream, so they aren't counted. Using
`tool_count > 0` would defeat the purpose for the exact case the
issue reports. The clean-exit + cost-set guard is the strongest signal
we have that claude itself reported success.

Test plan

  • `tests/unit/test_headless_context_fork_empty_response.py` — 5 cases:
    • Clean exit + empty parts → placeholder
    • Clean exit + real output → unchanged
    • Non-zero exit + empty parts → still errors (does NOT placeholder)
    • Clean exit but missing cost (lost result line) → 502 via classifier
    • Source-level pin that the branch + its guards survive a future refactor
  • Neighboring tests (`test_empty_result_classification`, `test_claude_code_result_recovery`) still green — no contract break.
  • Real-world: a fleet agent with a known `context: fork` skill runs to success via the scheduler.

Relates to #160.

🤖 Generated with Claude Code

… context:fork skills (#160)

Headless tasks that invoke skills with `context: fork` in their frontmatter
were failing with HTTP 500 "Task returned empty response", silently
failing every scheduled invocation. The issue documented 8 consecutive
daily failures on one agent.

Root cause: the fork mechanism in Claude Code runs the skill's work in
a sub-context whose output never reaches the parent stdout. The parent
process exits cleanly with `return_code == 0` and a populated `result`
line (`cost_usd` and `duration_ms` set — that's why
`_classify_empty_result` returns None and execution falls through to
the "build response_text" block). Pre-#160 the unconditional
`if not response_text: raise 500` at the bottom of
`_finalize_headless_result` had no notion of fork-style skills.

Fix: when the parent process reports clean completion (`return_code == 0`
AND `metadata.cost_usd is not None`), trust it. Synthesize a short
placeholder response instead of raising 500. Real plumbing failures
(lost result line, dropped stdout pipe, child held stdout) are already
caught by `_classify_empty_result` earlier in the function and never
reach this branch — the new code path is exclusively for the
"claude completed cleanly with no parent-stream output" case.

Tests: tests/unit/test_headless_context_fork_empty_response.py (5 cases).
Pins the placeholder fires only when both guards hold, the real-output
case is untouched, non-zero exits still error, and the #520 lost-result
case still returns 502 via `_classify_empty_result`.

Relates to #160.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same fix as the #807 follow-up: tests/lint_sys_modules.py flags bare
sys.modules mutations at module scope. tests/unit/conftest.py's
_preload_real_agent_server() already registers the namespace package,
so plain `from agent_server.<sub> import X` resolves.

5 tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@dolho dolho requested a review from vybe May 12, 2026 11:04
Copy link
Copy Markdown
Contributor

@vybe vybe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Validated via /validate-pr. Two-condition guard (return_code == 0 AND cost_usd is not None) is the right signal — correctly rules out the #520 lost-result-line case. 5 test cases cover all the relevant branches including the regression pin. LGTM.

@vybe vybe merged commit cd71d0c into dev May 12, 2026
9 checks passed
AndriiPasternak31 added a commit that referenced this pull request May 13, 2026
PR #797 (#678) changed _classify_empty_result's HTTP 502 detail from a
plain string to a structured dict carrying salvage telemetry. The test
added by #813 (test_clean_exit_but_missing_cost_falls_to_empty_result_classifier)
still called .lower() on the raw detail and now hits AttributeError.

Mirror the canonical pattern from tests/unit/test_empty_result_classification.py:
assert detail is a dict, then read the human-readable text out of
detail['message'] before doing substring checks. No production behavior
change — only the assertion plumbing was stale.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AndriiPasternak31 added a commit that referenced this pull request May 13, 2026
PR #797 (#678) changed _classify_empty_result's HTTP 502 detail from a
plain string to a structured dict carrying salvage telemetry. The test
added by #813 (test_clean_exit_but_missing_cost_falls_to_empty_result_classifier)
still called .lower() on the raw detail and now hits AttributeError.

Mirror the canonical pattern from tests/unit/test_empty_result_classification.py:
assert detail is a dict, then read the human-readable text out of
detail['message'] before doing substring checks. No production behavior
change — only the assertion plumbing was stale.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AndriiPasternak31 added a commit that referenced this pull request May 13, 2026
PR #797 (#678) changed _classify_empty_result's HTTP 502 detail from a
plain string to a structured dict carrying salvage telemetry. The test
added by #813 (test_clean_exit_but_missing_cost_falls_to_empty_result_classifier)
still called .lower() on the raw detail and now hits AttributeError.

Mirror the canonical pattern from tests/unit/test_empty_result_classification.py:
assert detail is a dict, then read the human-readable text out of
detail['message'] before doing substring checks. No production behavior
change — only the assertion plumbing was stale.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants