
perf: speed up iOS replay runner #557

Merged: thymikee merged 4 commits into main from codex/maestro-benchmark-rebase-guard on May 19, 2026

@thymikee thymikee commented May 18, 2026

Summary

Speeds up the iOS replay runner while keeping the reliability fixes that fell out of the Maestro comparison.

High-level changes that moved the needle:

  • Reuses XCTest runner build products safely with version/source/build-setting metadata and artifact manifests, so normal local runs avoid DerivedData cleanup and recursive xctestrun scans.
  • Adds build-time artifact manifests too, so pnpm build:xcuitest leaves the runtime cache in the fast-path state instead of forcing one follow-up scan.
  • Routes simple iOS selectors (id, label, text, value) through direct runner queries/taps for click, get, is, and the first wait probe, while preserving snapshot fallback for misses and transport failures.
  • Keeps wait/find snapshot polling fresh, so the 750ms snapshot cache cannot return stale UI during selector polling.
  • Preserves hard selector failures like ambiguous matches instead of silently falling back to snapshot and potentially touching/reading a different node.
  • Scopes replay actions to the active runner/provider, reducing repeated provider/session resolution in replay loops.
  • Keeps the iOS runner app headless for app sessions when an appBundleId is known, avoiding the visible helper-app hop.
  • Adds gesture stabilization and keyboard-aware dragging so faster replay actions still wait for the UI state that matters.
  • Improves iOS text entry by pinning coordinate-targeted fields, waiting for keyboard readiness, verifying append/replacement where appropriate, and repairing only through the pinned target.
  • Adds Maestro YAML replay support for the benchmark subset so the same form flow can be compared against Maestro.
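
The stale-snapshot point above can be illustrated with a small sketch. It assumes a time-to-live cache in front of the snapshot capture and a `fresh` option that polling code passes to bypass it. `SnapshotCache`, `capture`, and the `fresh` flag are hypothetical names for illustration; only the 750ms window comes from this description.

```typescript
// Hypothetical sketch: a TTL cache that selector polling can bypass.
type Snapshot = { nodes: string[]; capturedAt: number };

class SnapshotCache {
  private cached: Snapshot | undefined;
  private readonly capture: () => string[];
  private readonly now: () => number;
  private readonly ttlMs: number;

  constructor(capture: () => string[], now: () => number = Date.now, ttlMs = 750) {
    this.capture = capture;
    this.now = now;
    this.ttlMs = ttlMs;
  }

  // Reuses the cached snapshot while it is younger than ttlMs,
  // unless the caller explicitly asks for a fresh capture.
  get(options: { fresh?: boolean } = {}): Snapshot {
    const cached = this.cached;
    if (!options.fresh && cached !== undefined && this.now() - cached.capturedAt < this.ttlMs) {
      return cached;
    }
    const snapshot: Snapshot = { nodes: this.capture(), capturedAt: this.now() };
    this.cached = snapshot;
    return snapshot;
  }
}
```

Under this assumption, one-off reads call `get()` and benefit from the cache, while a wait/find loop calls `get({ fresh: true })` on every iteration so it never evaluates a selector against UI captured before the last action.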

Touched scope: 64 files. Scope expanded beyond the original benchmark prototype because the comparison exposed iOS runner startup, selector fallback, gesture, and text-entry reliability issues that affected the measured path.

Validation

Validated with cache-focused unit coverage, selector/interaction unit coverage, tooling checks, XCTest builds, and repeated iOS benchmark runs.

Checks that passed:

  • pnpm exec vitest run src/platforms/ios/__tests__/runner-client.test.ts src/platforms/ios/__tests__/runner-xctestrun.test.ts src/platforms/ios/__tests__/index.test.ts src/daemon/handlers/__tests__/snapshot-handler.test.ts
  • pnpm exec vitest run src/daemon/handlers/__tests__/interaction.test.ts src/daemon/handlers/__tests__/snapshot-handler.test.ts src/utils/__tests__/interactors.test.ts
  • node --check scripts/write-xcuitest-cache-metadata.mjs
  • pnpm build:xcuitest
  • pnpm format
  • pnpm check:quick
  • zsh -n .tmp/maestro-prototype/run-ios-benchmark.zsh
  • git diff --check

Cache reliability notes:

  • The xctestrun cache is invalidated by package version, runner source fingerprint, platform/device kind, destination family, bundle ids, signing settings, and performance build settings.
  • Cached artifact paths are trusted only when the xctestrun and every referenced product still exist with matching mtimes.
  • Stale/missing manifests fall back to scan or rebuild; package-version mismatches rebuild; stale worktree xctestruns are rejected; expected macOS runner repair failures rebuild once.
  • The only cache issue found in this pass was performance-only: build-time metadata previously omitted artifact paths, so the next runtime command had to scan DerivedData once. This PR now writes artifact manifests from the build script too. Verified locally that both iOS and macOS metadata contain an xctestrun path and three existing product paths after pnpm build:xcuitest.
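
As a rough illustration of the two cache rules above, the sketch below derives a cache key from the invalidation inputs and trusts a manifest only when every recorded file still exists with the recorded mtime. The manifest shape, the field names, and the use of SHA-256 are assumptions made for the example, not the format written by scripts/write-xcuitest-cache-metadata.mjs.

```typescript
import { createHash } from "node:crypto";
import { statSync } from "node:fs";

// Hypothetical manifest shape, assumed for illustration only.
type ArtifactEntry = { path: string; mtimeMs: number };
type ArtifactManifest = { xctestrun: ArtifactEntry; products: ArtifactEntry[] };

// Any change to an input (package version, source fingerprint, signing
// settings, ...) yields a different key, which invalidates the cache.
function cacheKey(inputs: Record<string, string>): string {
  const sorted = Object.entries(inputs).sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  return createHash("sha256").update(JSON.stringify(sorted)).digest("hex");
}

function recordEntry(path: string): ArtifactEntry {
  return { path, mtimeMs: statSync(path).mtimeMs };
}

function entryIsCurrent(entry: ArtifactEntry): boolean {
  // With throwIfNoEntry: false, statSync returns undefined for a missing file.
  const stats = statSync(entry.path, { throwIfNoEntry: false });
  return stats !== undefined && stats.mtimeMs === entry.mtimeMs;
}

// The fast path is allowed only when the xctestrun and every product match.
function manifestIsTrusted(manifest: ArtifactManifest): boolean {
  return [manifest.xctestrun, ...manifest.products].every(entryIsCurrent);
}
```

A caller in this sketch would fall back to a scan or rebuild whenever `manifestIsTrusted` returns false or the stored key differs from the key computed for the current inputs.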

Latest serial benchmark, after dropping one local close outlier:

| Case | Device / port | Average | Improvement vs 0.14.9 |
| --- | --- | --- | --- |
| Local worktree agent-device | iPhone 17 Pro / 8082 | 15.223s | +74.4% |
| Maestro | iPhone 17 / 8083 | 21.306s | +64.1% |
| Global agent-device 0.14.9 baseline | iPhone 16e / 8084 | 59.372s | baseline |
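
The improvement figures are consistent with a relative reduction in average time against the 0.14.9 baseline. A minimal sketch of that arithmetic, assuming this is how the column was derived (`improvementPercent` is a hypothetical helper, not part of the benchmark script):

```typescript
// Relative reduction in average run time versus the baseline, in percent.
function improvementPercent(baselineSeconds: number, measuredSeconds: number): number {
  return ((baselineSeconds - measuredSeconds) / baselineSeconds) * 100;
}

const baseline = 59.372; // Global agent-device 0.14.9
console.log(improvementPercent(baseline, 15.223).toFixed(1)); // prints "74.4" (local worktree)
console.log(improvementPercent(baseline, 21.306).toFixed(1)); // prints "64.1" (Maestro)
```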

Parallel benchmark mode is also available for visual side-by-side inspection, but it is not the fair timing source on this machine: under three-simulator load the local runner replay stayed stable around 13.7s while app readiness moved between open and explicit wait.

@thymikee force-pushed the codex/maestro-benchmark-rebase-guard branch from ecba590 to 955a1f8 on May 18, 2026 17:40

github-actions Bot commented May 18, 2026

PR Preview Action v1.8.1

🚀 View preview at
https://callstackincubator.github.io/agent-device/pr-preview/pr-557/

Built to branch gh-pages at 2026-05-18 21:04 UTC.
Preview will be ready when the GitHub Pages deployment is complete.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ecba5906b8

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/daemon/selector-runtime.ts Outdated
Comment on lines +322 to +337
    while (Date.now() - startedAt < timeoutMs) {
      const result = await queryDirectIosSelector(params, params.session!, selector);
      if (result?.found) {
        const payload = {
          kind: 'selector',
          selector: selector.raw,
          waitedMs: Date.now() - startedAt,
          selectorChain: [selector.raw],
          directSelector: true,
        };
        recordIfSession(params.sessionStore, params.sessionName, params.req, payload);
        return payload;
      }
      await sleep(DIRECT_WAIT_POLL_INTERVAL_MS);
    }
    throw new AppError('COMMAND_FAILED', `wait timed out for selector: ${selector.raw}`);
P2: Fall back when direct iOS wait cannot resolve

For simple iOS selectors, this direct path now owns the whole wait loop. However, queryDirectIosSelector returns null on runner errors and ambiguous matches, and returns found: false for matches the runner filters out as non-hittable. In both cases the loop keeps polling until it throws instead of falling back to the snapshot selector runtime. That regresses cases the existing waitForSelector path supports, such as a selector that matches multiple nodes or a visible non-hittable/container node: findSelectorChainMatch only needs at least one snapshot match, while the runner-side findElement requires a unique hittable element.
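
One possible shape for the fallback requested here is a single direct probe followed by snapshot polling. The sketch below is a sketch under assumptions, not the code that was merged: `WaitDeps`, `queryDirect`, `matchSnapshot`, and the default poll interval are hypothetical stand-ins for queryDirectIosSelector, the snapshot selector runtime, and DIRECT_WAIT_POLL_INTERVAL_MS.

```typescript
// Hypothetical stand-ins for the runner query and the snapshot selector runtime.
type DirectResult = { found: boolean } | null;

type WaitDeps = {
  queryDirect: () => Promise<DirectResult>;
  matchSnapshot: () => Promise<boolean>;
  sleep: (ms: number) => Promise<void>;
  now: () => number;
};

type WaitResult = { waitedMs: number; directSelector: boolean };

async function waitWithFallback(
  deps: WaitDeps,
  timeoutMs: number,
  pollIntervalMs = 100, // assumed value, not taken from the PR
): Promise<WaitResult> {
  const startedAt = deps.now();

  // Fast path: one direct probe against the runner.
  const direct = await deps.queryDirect();
  if (direct?.found) {
    return { waitedMs: deps.now() - startedAt, directSelector: true };
  }

  // null (runner could not resolve) and found: false (no unique hittable
  // element) both hand the rest of the wait to snapshot polling.
  while (deps.now() - startedAt < timeoutMs) {
    if (await deps.matchSnapshot()) {
      return { waitedMs: deps.now() - startedAt, directSelector: false };
    }
    await deps.sleep(pollIntervalMs);
  }
  throw new Error(`wait timed out after ${timeoutMs}ms`);
}
```

Injecting `sleep` and `now` keeps the loop testable with a fake clock, and the `directSelector` flag mirrors the payload field in the quoted diff so callers can tell which path satisfied the wait.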

@thymikee force-pushed the codex/maestro-benchmark-rebase-guard branch 4 times, most recently from 6326daf to c68fd4c on May 18, 2026 18:42
@thymikee thymikee merged commit 094c290 into main May 19, 2026
19 checks passed
@thymikee thymikee deleted the codex/maestro-benchmark-rebase-guard branch May 19, 2026 09:36