Skip to content

fix: surface stderr excerpts from hand-rolled exec failure wraps#1072

Merged
thymikee merged 2 commits into
mainfrom
claude/magical-roentgen-29adbe
Jul 4, 2026
Merged

fix: surface stderr excerpts from hand-rolled exec failure wraps#1072
thymikee merged 2 commits into
mainfrom
claude/magical-roentgen-29adbe

Conversation

@thymikee

@thymikee thymikee commented Jul 4, 2026

Copy link
Copy Markdown
Member

What

Closes the biggest gap from the July 2026 error-system audit: ~42 hand-rolled COMMAND_FAILED wrap sites now set processExitError: true, so normalizeError surfaces the first meaningful stderr line to the user instead of a generic message like xcrun exited with code 22.

  • New helper execFailureDetails(result, extra?) in src/utils/exec.ts: returns the {stdout, stderr, exitCode} spread plus processExitError: true. Extras spread after the base keys, so call sites keep their stderr truncation (.slice(0, 500)) or String() normalization as overrides. createExitError now uses it too.
  • Apple (13 files, 27 sites): apps.ts launch/terminate/uninstall/clipboard paths (the audit's worst offenders), devicectl.ts, simulator.ts boot/bootstatus, perf.ts, perf-xctrace.ts, screenshot-status-bar.ts, runner-icon/runner-transport/runner-artifact-env, dsym.ts, symbolication.ts, macOS host-provider.ts.
  • Android/web (10 files, 15 sites): app-helpers, logcat, devices, perf, settings, snapshot(-helper) modules, app-lifecycle, device-input-state, multitouch-helper, agent-browser-provider.

Why

Users (and agents driving the CLI) were getting COMMAND_FAILED: xcrun exited with code 22 with a generic hint while the actionable cause sat unread in details.stderr. The kernel enrichment already existed — only createExitError set the flag it keys on.

What was deliberately NOT converted

Only throws guarded purely by exitCode !== 0 got the flag. Left alone, with reasons:

  • throws reachable at exit 0 (parse failures, still booting stdout checks, simctl privacy-services detection) — flag would be semantically wrong;
  • failures.push aggregations (simctl biometric, android settings) — not single-exec throws;
  • timeout/forced-kill errors — their specific message beats a stderr line (matches the existing unflagged createTimeoutError);
  • macos/helper.ts — already builds a richer message from the helper's JSON error envelope;
  • runner-contract.ts xcodebuild-early-exit factory — nested xcodebuild details, exit 0 possible;
  • maestro run-script.ts — crafted message with deliberate stdout omission.

Reviewer notes

  • Converted sites that previously omitted stdout/exitCode from details (e.g. runner-icon.ts, dsym.ts) now include them — strictly more diagnostic info, and details.stdout/stderr still pass through the existing redaction in normalizeError.
  • refactor: split oversized session and apple app files #1061 splits apple/core/apps.ts; the six conversions there are small per-site replacements, so whichever lands second rebases with local conflicts only at those throw sites.
  • The audit referenced docs/adr/0010-error-system.md, but that ADR never landed on any branch; this follows the kernel's observable convention directly.

Testing

  • pnpm check:quick (oxlint + tsc) clean, pnpm format applied.
  • Full unit suite: 316 files, 3056 tests pass.

@thymikee

thymikee commented Jul 4, 2026

Copy link
Copy Markdown
Member Author

This PR is currently merge-dirty against main, so it is not ready for review/CI judgment yet. Please rebase or merge current origin/main, resolve conflicts, and push the updated branch before requesting review.

normalizeError only replaces a generic COMMAND_FAILED message with the
first meaningful stderr line when details.processExitError is true, but
only createExitError in utils/exec.ts set that flag. Hand-rolled wrap
sites (throw new AppError('COMMAND_FAILED', msg, {stdout, stderr,
exitCode}) after an allowFailure run) missed it, so users saw messages
like "xcrun exited with code 22" with a generic hint instead of the
actual stderr excerpt.

Add execFailureDetails(result, extra?) to utils/exec.ts returning the
stdout/stderr/exitCode spread plus processExitError: true (extras spread
after the base keys so call sites can keep truncation or String()
normalization), and convert ~42 exit-guarded wrap sites across the apple
platform (apps launch/terminate paths, devicectl, simulator boot, perf,
perf-xctrace, runner modules, dsym/symbolication, macOS host provider)
and android/web (app-helpers, logcat, devices, perf, settings,
snapshot helpers, app-lifecycle, device-input-state, multitouch-helper,
agent-browser provider).

Sites reachable at exit 0 (parse/stdout checks), failure aggregations,
timeout/forced-kill errors, and sites that already build richer messages
(macos helper JSON, runner-contract early-exit, maestro run-script) are
deliberately left unflagged so stderr enrichment never replaces a more
specific message.
@thymikee thymikee force-pushed the claude/magical-roentgen-29adbe branch from e78d76f to 89d742e Compare July 4, 2026 07:25
@github-actions

github-actions Bot commented Jul 4, 2026

Copy link
Copy Markdown

Size Report

Metric Base Current Diff
JS raw 1.5 MB 1.5 MB -1.4 kB
JS gzip 479.6 kB 479.6 kB 0 B
npm tarball 581.1 kB 581.0 kB -98 B
npm unpacked 2.1 MB 2.1 MB -1.4 kB

Startup median (7 runs, lower is better):

Scenario Base Current Diff
CLI --version 25.9 ms 26.3 ms +0.4 ms
CLI --help 47.3 ms 47.0 ms -0.3 ms

Top changed chunks:

Chunk Raw diff Gzip diff
dist/src/9722.js -893 B -75 B
dist/src/8875.js +113 B +41 B
dist/src/apps.js -377 B -30 B
dist/src/android.js -116 B +12 B
dist/src/debug-symbols.js +2 B +2 B

…cing them

Spreading processExitError to curated call sites made normalizeError
replace specific messages like 'uiautomator dump did not return XML'
with the bare stderr line, losing the operation context (caught by the
provider-failures integration test). Keep wholesale replacement only for
the information-free generic '<tool> exited with code N' wraps; curated
messages now keep their specific reason and gain the stderr excerpt as a
': <excerpt>' suffix (skipped when already present).

Separate commit rather than amend so the semantics change to
maybeEnrichCommandFailedMessage stays individually reviewable on the PR.
@thymikee thymikee added the ready-for-human Valid work that needs human implementation, judgment, or maintainer merge label Jul 4, 2026
@thymikee thymikee merged commit 50a6a35 into main Jul 4, 2026
21 checks passed
@thymikee thymikee deleted the claude/magical-roentgen-29adbe branch July 4, 2026 08:25
@github-actions

github-actions Bot commented Jul 4, 2026

Copy link
Copy Markdown
PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-07-04 08:25 UTC

thymikee added a commit that referenced this pull request Jul 4, 2026
Review follow-up closing two classifier coverage gaps:

1. allowFailure paths bypassed the classifier: the executor wrapper only
enriches THROWN errors, but many Android flows run adb with allowFailure,
inspect the nonzero result, and throw a fresh AppError built from its
stdout/stderr — so e.g. "device offline" during uninstall still surfaced
the generic hint. There was no common construction point for these
errors, so androidAdbResultError in adb-executor.ts now IS that point:
it builds the COMMAND_FAILED error from a tolerated result and runs it
through the classifier, keeping hint strings central. The
result-inspecting throw sites (app launch/uninstall, appearance read,
keyboard/clipboard queries, package listing, logcat capture, helper APK
installs, heap-dump pull, uiautomator dump) route through it; a site
hint (heap-dump pull) still wins because attachAdbFailureHint never
overwrites.

Composes with #1072 rather than replacing it: androidAdbResultError
builds its details via execFailureDetails for nonzero exits, so
normalizeError still suffixes the curated message with the first stderr
line while the classified hint/retriable/adbFailure ride along. Semantic
failures thrown at exit 0 (the am start error-on-stdout path) stay
unflagged, matching #1072's deliberate exit-0 exclusion.

2. Provider-scoped enrichment only wrapped exec, but the transfer
helpers prefer the semantic provider methods, so a provider whose
install rejects with INSTALL_FAILED output kept the generic hint.
Enrichment now lives in normalizeAndroidAdbProvider — the single funnel
every provider passes through (scope installation and explicitly passed
providers alike) — wrapping exec plus pull/install/installBundle; a
WeakSet keeps repeated normalization from stacking wrappers. The local
provider needs no wrap since its methods delegate to the already
enriched serial executor.

Tests: androidAdbResultError classification, hint-plus-excerpt
composition through normalizeError, exit-0 no-excerpt guard, site-hint
precedence, and provider install/pull classification in
adb-executor.test.ts; an allowFailure regression on a real production
path (keyboard state query with a nonzero device-offline result) in
device-input-state.test.ts.
thymikee added a commit that referenced this pull request Jul 4, 2026
…el (#1067)

* feat: classify adb failures with actionable hints at the executor level

Android COMMAND_FAILED errors surfaced raw adb stderr with only the
generic retry-with-debug hint; the sole stderr classification lived in
snapshot.ts and was retry-only. Add a central classifier in
adb-executor.ts that recognizes the common adb failure families (device
offline/unauthorized/not found, more than one device, no devices,
server version mismatch, connection drops, INSTALL_FAILED_* variants)
and attaches the resolved hint — plus retriable for clearly transient
families — to the AppError details in both executor funnels (local
serial exec and provider-scoped exec), so every adb call site benefits
without per-site changes. The adb devices -l discovery call is wrapped
too, where server-mismatch/no-devices failures actually surface.

snapshot.ts now consumes the shared classifier for its retry decision,
keeping only its snapshot-specific dump-race patterns. normalizeError
lifts details.retriable to the top-level error (mirroring the hint
lift) and the daemon typed-error graft lets a throw-site classification
win over the code-level policy. Also fix the port-reverse removal path
throwing a bare Error that surfaced as UNKNOWN.

Update session-appstate-input-perf.test.ts: its Android sampling-failure
fixture uses stderr "error: device offline", which the classifier now
recognizes, so error.hint carries the actionable adb-reconnect hint
instead of the generic "retry with --debug". The result shape (hint
string, details.metric/package, code) is unchanged — only the hint text
improved, which is this change's intent — so the test expectation moves
to the new hint rather than reverting production.

* feat: classify tolerated adb failures and semantic provider methods

Review follow-up closing two classifier coverage gaps:

1. allowFailure paths bypassed the classifier: the executor wrapper only
enriches THROWN errors, but many Android flows run adb with allowFailure,
inspect the nonzero result, and throw a fresh AppError built from its
stdout/stderr — so e.g. "device offline" during uninstall still surfaced
the generic hint. There was no common construction point for these
errors, so androidAdbResultError in adb-executor.ts now IS that point:
it builds the COMMAND_FAILED error from a tolerated result and runs it
through the classifier, keeping hint strings central. The
result-inspecting throw sites (app launch/uninstall, appearance read,
keyboard/clipboard queries, package listing, logcat capture, helper APK
installs, heap-dump pull, uiautomator dump) route through it; a site
hint (heap-dump pull) still wins because attachAdbFailureHint never
overwrites.

Composes with #1072 rather than replacing it: androidAdbResultError
builds its details via execFailureDetails for nonzero exits, so
normalizeError still suffixes the curated message with the first stderr
line while the classified hint/retriable/adbFailure ride along. Semantic
failures thrown at exit 0 (the am start error-on-stdout path) stay
unflagged, matching #1072's deliberate exit-0 exclusion.

2. Provider-scoped enrichment only wrapped exec, but the transfer
helpers prefer the semantic provider methods, so a provider whose
install rejects with INSTALL_FAILED output kept the generic hint.
Enrichment now lives in normalizeAndroidAdbProvider — the single funnel
every provider passes through (scope installation and explicitly passed
providers alike) — wrapping exec plus pull/install/installBundle; a
WeakSet keeps repeated normalization from stacking wrappers. The local
provider needs no wrap since its methods delegate to the already
enriched serial executor.

Tests: androidAdbResultError classification, hint-plus-excerpt
composition through normalizeError, exit-0 no-excerpt guard, site-hint
precedence, and provider install/pull classification in
adb-executor.test.ts; an allowFailure regression on a real production
path (keyboard state query with a nonzero device-offline result) in
device-input-state.test.ts.
thymikee added a commit that referenced this pull request Jul 4, 2026
Follow-up to #1072, closing the structural gaps that let the
missing-processExitError bug class exist:

- requireExecSuccess(result, message, extra?) in utils/exec.ts: guards an
  allowFailure result and throws the curated COMMAND_FAILED (flag set via
  execFailureDetails) itself. Result-side by design — tool providers and
  executor overrides return results without throwing, so an ExecOptions
  knob interpreted at spawn time would silently not fire on those paths.
  ~30 pure guard-and-throw sites across apple/android/web converted; sites
  with tolerance branches, cleanup, or exit-0 reachability keep their
  explicit shape.

- Source-scan guard (src/__tests__/exec-wrap-guard.test.ts): fails when an
  AppError('COMMAND_FAILED', ...) details literal rebuilds the
  stdout/stderr/exitCode trio inline without the helper; intentional
  holdouts opt out with a documented exec-guard-allow comment. The guard
  immediately found three wrap sites the July audit missed (runtime-hints
  run-as probe, device-ready not_ready branch, perf export table) — now
  converted.

- coerceExecResult at the provider boundaries (executor overrides, apple
  tool provider scope, android adb provider scope): SDK-supplied callbacks
  cross an unchecked boundary; coercing once there replaces the per-site
  String(result.stdout ?? '') defensiveness, which is removed.

- normalizeError stderr excerpts now strip noise prefixes (adb:/xcrun:/
  simctl:/error:) before rendering; skip/strip lists stay in the kernel
  deliberately (platforms cannot hook normalizeError) with a comment
  marking the registry escape hatch if they grow.

- AppErrorDetails exported and documented in kernel/errors.ts: the magic
  keys (hint, processExitError, retriable, reason, diagnosticId, logPath,
  stdout/stderr/exitCode) now carry types and doc comments.

- runIosDevicectl gains tolerateOutput; the devicectl uninstall path in
  app-install.ts reuses it instead of duplicating the wrap + hint logic
  (its failure hint now falls back to the devicectl default hint).
thymikee added a commit that referenced this pull request Jul 4, 2026
* refactor: harden exec failure wrapping end to end

Follow-up to #1072, closing the structural gaps that let the
missing-processExitError bug class exist:

- requireExecSuccess(result, message, extra?) in utils/exec.ts: guards an
  allowFailure result and throws the curated COMMAND_FAILED (flag set via
  execFailureDetails) itself. Result-side by design — tool providers and
  executor overrides return results without throwing, so an ExecOptions
  knob interpreted at spawn time would silently not fire on those paths.
  ~30 pure guard-and-throw sites across apple/android/web converted; sites
  with tolerance branches, cleanup, or exit-0 reachability keep their
  explicit shape.

- Source-scan guard (src/__tests__/exec-wrap-guard.test.ts): fails when an
  AppError('COMMAND_FAILED', ...) details literal rebuilds the
  stdout/stderr/exitCode trio inline without the helper; intentional
  holdouts opt out with a documented exec-guard-allow comment. The guard
  immediately found three wrap sites the July audit missed (runtime-hints
  run-as probe, device-ready not_ready branch, perf export table) — now
  converted.

- coerceExecResult at the provider boundaries (executor overrides, apple
  tool provider scope, android adb provider scope): SDK-supplied callbacks
  cross an unchecked boundary; coercing once there replaces the per-site
  String(result.stdout ?? '') defensiveness, which is removed.

- normalizeError stderr excerpts now strip noise prefixes (adb:/xcrun:/
  simctl:/error:) before rendering; skip/strip lists stay in the kernel
  deliberately (platforms cannot hook normalizeError) with a comment
  marking the registry escape hatch if they grow.

- AppErrorDetails exported and documented in kernel/errors.ts: the magic
  keys (hint, processExitError, retriable, reason, diagnosticId, logPath,
  stdout/stderr/exitCode) now carry types and doc comments.

- runIosDevicectl gains tolerateOutput; the devicectl uninstall path in
  app-install.ts reuses it instead of duplicating the wrap + hint logic
  (its failure hint now falls back to the devicectl default hint).

* fix: reconcile exec hardening with the adb failure classifier

Main landed the central adb failure classifier (androidAdbResultError +
withAdbFailureHintProvider) while this branch was in flight. Resolution:

- Android call sites keep main's androidAdbResultError form — it composes
  execFailureDetails with the classified adb hint, which is strictly
  richer than a plain requireExecSuccess conversion for adb invocations.
  requireExecSuccess remains the shape for non-adb tools and apple/web.
- Provider result coercion folds into withAdbFailureHintProvider's
  enrichment pass (exec/pull/install), so the existing WeakSet memo also
  prevents coercer stacking.
- The reverse-remove wrap in adb-executor now uses androidAdbResultError
  instead of hand-rolling the trio (flagged by the exec-wrap guard).
- Guard-test scan split into helpers (fallow cyclomatic threshold) and
  the perf artifact-tail clone carries fallow-ignore markers on both
  platforms, matching the pre-existing marker on the apple side.
- Two expectations updated for the stderr noise-prefix strip: classifier
  compose test and perf sampling reason ('device offline', no 'error:').
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-human Valid work that needs human implementation, judgment, or maintainer merge

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant