
fix: macOS Tart E2E infrastructure - lsappinfo, a11y via SSH, provisioning #90

Merged
Miyamura80 merged 16 commits into master from fix/macos-tart-e2e-infra on Mar 31, 2026



@Eito-Test-Account
Contributor

Summary

Fixes the macOS Tart VM E2E testing infrastructure so that the full pipeline works end-to-end: VM boot → desktop readiness → app deploy → agent loop (with accessibility tree) → evaluation → artifact collection.

These changes were developed and tested on an Apple Silicon Mac mini running macOS 26.2 with Tart 2.32.0.

What was done

1. Replace osascript with lsappinfo for GUI process detection (readiness.rs)

  • get_gui_process_list() used osascript to query System Events, which requires TCC Automation permission
  • In Tart VMs, this hangs indefinitely because the vm-agent (a Python LaunchAgent) cannot be granted Automation permission programmatically
  • Replaced with lsappinfo visibleProcessList which provides the same information without any TCC permissions
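A minimal Rust sketch of the lsappinfo-based check, assuming each entry in `lsappinfo visibleProcessList` output carries the app name in double quotes (for example `ASN:0x0-0x1001:"Finder":`). The function names `parse_visible_process_list` and `gui_process_list` are hypothetical and not the actual readiness.rs API.

```rust
use std::io;
use std::process::Command;

/// Extract every double-quoted app name from `lsappinfo visibleProcessList` output.
/// Assumes names are wrapped in `"..."`; text outside quotes (ASN ids, colons) is ignored.
fn parse_visible_process_list(output: &str) -> Vec<String> {
    output
        .split('"')
        .skip(1) // text before the first quote is never a name
        .step_by(2) // every other segment sits between a pair of quotes
        .filter(|name| !name.is_empty())
        .map(str::to_string)
        .collect()
}

/// Hypothetical wrapper: run lsappinfo (no TCC permission needed) and parse its stdout.
fn gui_process_list() -> io::Result<Vec<String>> {
    let out = Command::new("lsappinfo").arg("visibleProcessList").output()?;
    if !out.status.success() {
        return Err(io::Error::new(io::ErrorKind::Other, "lsappinfo exited with an error"));
    }
    Ok(parse_visible_process_list(&String::from_utf8_lossy(&out.stdout)))
}
```

Splitting on quotes rather than on whitespace keeps multi-word names such as "Activity Monitor" intact.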

2. Enable accessibility tree extraction via SSH localhost (observation.rs)

  • The vm-agent runs as a LaunchAgent, which gets a restricted Aqua session from macOS
  • Direct subprocess calls from this context return empty/minimal accessibility trees (only headers, no element data)
  • SSH sessions inherit full TCC permissions from sshd-keygen-wrapper (pre-granted in Tart base images)
  • MACOS_A11Y_CMD now uses ssh -o BatchMode=yes localhost /usr/local/bin/a11y-helper, which yields 976+ lines of UI element data vs. ~22 near-empty lines before
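A sketch of how such a command string could be turned into a `std::process::Command`. The exact constant value is an assumption: the review below mentions a 5s timeout, and `ConnectTimeout=5` is only a guess at how that is expressed. `a11y_command` is a hypothetical helper.

```rust
use std::process::Command;

/// Assumed shape of the command string; the real constant lives in observation.rs.
/// ConnectTimeout=5 is a guess at how the reviewed "5s timeout" is expressed.
const MACOS_A11Y_CMD: &str =
    "ssh -o BatchMode=yes -o ConnectTimeout=5 localhost /usr/local/bin/a11y-helper";

/// Build a Command from the whitespace-separated string.
/// BatchMode=yes disables password and host-key prompts, so a missing key
/// makes ssh fail fast instead of hanging the agent loop on a prompt.
fn a11y_command() -> Command {
    let mut parts = MACOS_A11Y_CMD.split_whitespace();
    let program = parts.next().expect("command string is not empty");
    let mut cmd = Command::new(program);
    cmd.args(parts);
    cmd
}
```

Running a11y-helper as the ssh remote command means it executes inside the sshd session rather than the LaunchAgent's restricted Aqua session, which is the point of the workaround.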

3. Fix golden image provisioning (init_macos.rs)

  • Install execute-action.py: the PyAutoGUI executor script was never copied to the VM, so the agent loop couldn't execute any actions
  • SSH key setup: passwordless ed25519 keys for localhost, required by the SSH-based a11y extraction
  • TCC permission grants: insert entries with proper csreq blobs (generated via codesign -d -r- + csreq tool) for a11y-helper, Python, and screencapture
  • Homebrew PATH: add /opt/homebrew/bin to /etc/paths.d/homebrew so all processes find the right python3
  • Updated success message: now mentions TCC grants are automated (requires SIP disabled in base image)
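A Rust sketch of the TCC insert that a generated provisioning script could run, showing the csreq hex blob and single-quote escaping. The column list and the numeric values (client_type 1 = path, auth_value 2 = allowed, auth_reason 4, auth_version 1) are assumptions about the TCC.db schema, and `tcc_insert_sql` is a hypothetical name. The csreq bytes themselves would come from `codesign -d -r-` plus the `csreq` tool, as described above.

```rust
/// Escape a value for use inside a single-quoted SQLite string literal.
fn sql_quote(value: &str) -> String {
    format!("'{}'", value.replace('\'', "''"))
}

/// Encode bytes as an SQLite blob literal, e.g. X'FADE0C00'.
fn sql_blob(bytes: &[u8]) -> String {
    let hex: String = bytes.iter().map(|b| format!("{:02X}", b)).collect();
    format!("X'{}'", hex)
}

/// Build an INSERT OR REPLACE for the TCC `access` table.
/// Returns an error instead of writing NULL when no csreq blob is available,
/// because rows with a NULL csreq are not honored on modern macOS.
/// Column list and numeric values are assumptions about the schema.
fn tcc_insert_sql(service: &str, client_path: &str, csreq: Option<&[u8]>) -> Result<String, String> {
    let blob = match csreq {
        Some(bytes) if !bytes.is_empty() => sql_blob(bytes),
        _ => return Err(format!("no csreq blob for {client_path}; refusing to insert NULL")),
    };
    Ok(format!(
        "INSERT OR REPLACE INTO access \
         (service, client, client_type, auth_value, auth_reason, auth_version, csreq) \
         VALUES ({}, {}, 1, 2, 4, 1, {});",
        sql_quote(service),
        sql_quote(client_path),
        blob
    ))
}
```

Failing loudly when the blob is missing avoids quietly recreating the NULL-csreq state that the PR set out to fix.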

4. Fix vm-agent LaunchAgent PATH (vm-agent-install.sh)

  • The LaunchAgent plist had no EnvironmentVariables, so the default PATH (/usr/bin:/bin:/usr/sbin:/sbin) was used
  • Subprocesses (like python3 /usr/local/bin/execute-action) resolved to system Python which doesn't have PyAutoGUI
  • Added EnvironmentVariables with Homebrew PATH to the plist
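The actual change is in vm-agent-install.sh; this Rust sketch only illustrates the shape of the plist fragment. The PATH value in `HOMEBREW_PATH` is an assumption (Homebrew's Apple Silicon prefix first, then the launchd default PATH listed above), and `environment_variables_fragment` is a hypothetical helper.

```rust
/// Assumed PATH: Homebrew first so `python3` resolves to the interpreter with PyAutoGUI.
const HOMEBREW_PATH: &str = "/opt/homebrew/bin:/opt/homebrew/sbin:/usr/bin:/bin:/usr/sbin:/sbin";

/// Minimal XML escaping for the <string> value.
fn xml_escape(s: &str) -> String {
    s.replace('&', "&amp;").replace('<', "&lt;").replace('>', "&gt;")
}

/// Render the EnvironmentVariables dict that goes inside the LaunchAgent's top-level <dict>.
fn environment_variables_fragment(path: &str) -> String {
    [
        "<key>EnvironmentVariables</key>".to_string(),
        "<dict>".to_string(),
        "    <key>PATH</key>".to_string(),
        format!("    <string>{}</string>", xml_escape(path)),
        "</dict>".to_string(),
    ]
    .join("\n")
}
```

Putting Homebrew ahead of /usr/bin matters because /usr/bin/python3 exists on macOS and would otherwise win the lookup.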

5. Make a11y-helper trust check non-fatal (main.swift)

  • AXIsProcessTrustedWithOptions returns false in LaunchAgent context even when actual AX API calls succeed
  • Changed from exit(1) to a stderr warning; tree extraction continues and produces data

Issues encountered and resolved

| Issue | Root cause | Fix |
|---|---|---|
| osascript hangs in VM | TCC Automation permission cannot be granted programmatically to LaunchAgent processes | Replaced with lsappinfo (no TCC needed) |
| Empty accessibility tree | LaunchAgent gets a restricted Aqua session; AX API returns minimal data | Route a11y-helper through ssh localhost, which gets a full session |
| execute-action not found | docker/execute-action.py was never copied during provisioning | Added a copy step to provision_vm() and an install step to the provisioning script |
| PyAutoGUI import errors | vm-agent subprocesses used system Python (no PyAutoGUI) | Added EnvironmentVariables with Homebrew PATH to the LaunchAgent plist |
| TCC DB entries not honored | Entries had a NULL csreq column; modern macOS requires code signing requirement blobs | Generate csreq via codesign -d -r- + csreq tool and insert it as a hex blob |
| SSH keys not persisting | tart stop doesn't flush the VM filesystem | Use sudo shutdown -h now for a graceful shutdown before tart clone |
| AXIsProcessTrustedWithOptions spurious failure | LaunchAgent context reports untrusted even when AX calls work | Made the check non-fatal (warning instead of exit) |

What is NOT solved yet

  • 60s step timeout too tight for claude-cli provider: with full a11y data (~976 lines), Claude CLI calls sometimes exceed 60s, causing retries and eventual timeout. Needs a configurable or provider-aware timeout.
  • Agent struggles with macOS TextEdit UI: the agent can't reliably open a new document (File > New / Cmd+N don't seem to work). This is an agent behavior issue, not infrastructure. May need macOS-specific agent prompting or a simpler test app.
  • /home/tester artifact collection: artifacts.rs still tries to collect from the Linux home directory path for Tart sessions. It should be skipped or use /Users/admin.
  • Golden image not versioned: desktest-macos:latest has no version tagging strategy, risking drift.
  • Electron test image not created: desktest-macos-electron:latest doesn't exist yet (would need --with-electron flag during init).
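One possible shape for the configurable or provider-aware timeout mentioned in the first item, sketched in Rust. Everything here is hypothetical: `step_timeout`, its parameters, and the resolution order (explicit setting, then provider default, then the current 60s) are illustrative, and no provider-specific values are proposed.

```rust
use std::time::Duration;

/// Current fixed per-step timeout described above.
const DEFAULT_STEP_TIMEOUT: Duration = Duration::from_secs(60);

/// Hypothetical resolution order: explicit user setting wins, then a
/// per-provider default (e.g. for claude-cli), then the global 60s fallback.
fn step_timeout(user_override_secs: Option<u64>, provider_default_secs: Option<u64>) -> Duration {
    user_override_secs
        .or(provider_default_secs)
        .map(Duration::from_secs)
        .unwrap_or(DEFAULT_STEP_TIMEOUT)
}
```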

Test plan

  • cargo test: all 524 unit tests + 3 validation tests pass
  • doctor_shows_tart_status β€” passes on Apple Silicon with Tart installed
  • Manual E2E: desktest run examples/macos-textedit.json --provider claude-cli. The infrastructure works (VM boots, a11y tree populated with 976+ lines, agent loop runs, evaluation works). The test fails at the agent level (can't complete the TextEdit task), not in the infrastructure.
  • desktest init-macos fresh provisioning: not re-run after the code changes (would take ~10 min to pull + provision). Manual provisioning verified that all steps work individually.

🤖 Generated with Claude Code

Edison and others added 2 commits March 31, 2026 00:15
osascript calls to System Events require TCC Automation permission,
which hangs indefinitely in Tart VMs without pre-granted TCC access.
lsappinfo visibleProcessList provides the same information without
any TCC permissions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The vm-agent runs as a LaunchAgent which gets a restricted Aqua session,
returning an empty accessibility tree even with TCC permissions granted.
SSH sessions inherit full TCC permissions from sshd-keygen-wrapper, so
running a11y-helper via `ssh localhost` gives complete UI element data.

Changes:
- observation.rs: MACOS_A11Y_CMD now uses ssh localhost to invoke a11y-helper
- init_macos.rs: provisioning sets up passwordless SSH keys, installs
  execute-action.py, grants TCC permissions with proper csreq blobs,
  and configures Homebrew PATH
- vm-agent-install.sh: add EnvironmentVariables with Homebrew PATH to
  the LaunchAgent plist so subprocesses find the right python3
- a11y-helper main.swift: make AXIsProcessTrustedWithOptions check
  non-fatal (warning instead of exit) since AX API calls may succeed
  even when the check returns false

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

greptile-apps bot commented Mar 31, 2026

Greptile Summary

This PR completes the macOS Tart VM E2E testing infrastructure, resolving a set of interconnected issues discovered during a first real E2E run on Apple Silicon. The changes address seven distinct failure modes, ranging from osascript hanging indefinitely due to missing TCC Automation permission, to empty accessibility trees from a restricted LaunchAgent Aqua session, to missing executor scripts and incorrect TCC database entries. Together they produce a working pipeline from VM boot through agent loop and evaluation.

Key changes:

  • readiness.rs: osascript → lsappinfo visibleProcessList (no TCC required)
  • observation.rs: a11y extraction routed through ssh localhost to obtain a proper Aqua session (976+ lines of UI data vs ~22 previously)
  • init_macos.rs: provisioning now installs execute-action.py, sets up idempotent SSH keys, writes Homebrew PATH to /etc/paths.d, inserts TCC grants with csreq blobs, and ends with sudo shutdown -h now for guaranteed filesystem flush before tart clone
  • vm-agent-install.sh: LaunchAgent plist gains EnvironmentVariables with Homebrew PATH so subprocesses resolve the correct python3
  • main.swift: AXIsProcessTrustedWithOptions check demoted from exit(1) to a stderr warning

Minor findings:

  • The second line of the /etc/paths.d/homebrew write uses tee -a (append), so re-running provisioning will accumulate a duplicate /opt/homebrew/sbin entry. This is harmless due to path_helper deduplication, but inconsistent with the careful idempotency applied to authorized_keys
  • If codesign -d -r- or the csreq tool fails silently inside grant_tcc, CSREQ_SQL falls back to NULL with no diagnostic output, which is exactly the broken state this PR fixes for TCC grants on modern macOS
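The idempotent alternative to a bare `tee -a` can be sketched as an "append only if missing" merge. This is a hypothetical Rust helper, not code from init_macos.rs; the same check would apply to the authorized_keys dedup described in the follow-up commit.

```rust
/// Return `existing` with every line in `wanted` present exactly once,
/// appending only the ones that are missing. Running it twice yields the
/// same content, unlike a bare `tee -a`.
fn ensure_lines(existing: &str, wanted: &[&str]) -> String {
    let mut lines: Vec<&str> = existing.lines().collect();
    for line in wanted {
        if !lines.contains(line) {
            lines.push(*line);
        }
    }
    if lines.is_empty() {
        return String::new();
    }
    let mut out = lines.join("\n");
    out.push('\n'); // keep a trailing newline, as paths.d files conventionally have
    out
}
```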

Confidence Score: 5/5

Safe to merge; all remaining findings are P2 style/observability issues with no functional impact on the happy path.

Previously flagged P1 issues (authorized_keys idempotency, SQL injection via unsanitized paths, PYTHON_BIN resolving the wrong binary) are all addressed in prior commits. The two remaining findings are P2: the tee -a duplication is neutralized by path_helper deduplication, and the silent csreq fallback only manifests if codesign/csreq tooling fails (unlikely on a Homebrew-equipped image). All 524 unit tests pass and the infrastructure is manually verified end-to-end.

src/init_macos.rs β€” two minor idempotency/observability issues in the generated provisioning script

Important Files Changed

| Filename | Overview |
|---|---|
| src/init_macos.rs | Core provisioning logic overhauled: adds execute-action.py copy, SSH key setup (now idempotent), Homebrew PATH config (minor re-run idempotency issue on sbin line), TCC grants with csreq blobs (csreq failures are silent), and graceful VM shutdown. Two minor P2 issues found. |
| src/observation.rs | MACOS_A11Y_CMD updated to route a11y-helper through ssh localhost with BatchMode and 5s timeout; test assertions updated to match. Well-documented rationale for the SSH workaround. |
| src/tart/readiness.rs | Replaces osascript/System Events with lsappinfo visibleProcessList; output parsing correctly extracts quoted app names from ASN format lines. |
| macos/vm-agent-install.sh | Adds EnvironmentVariables dict with Homebrew PATH to the LaunchAgent plist, fixing the system-Python resolution issue for vm-agent subprocesses. |
| macos/a11y-helper/Sources/A11yHelperCLI/main.swift | Changes AXIsProcessTrustedWithOptions check from a fatal exit to a stderr warning, allowing a11y tree extraction to proceed even when the trust API returns false in LaunchAgent context. |
| src/orchestration.rs | Minor cosmetic refactor: shortens expect() panic messages to avoid rustfmt line-length divergence between platforms. No behavioral change. |

Sequence Diagram

```mermaid
sequenceDiagram
    participant Host as Host (desktest CLI)
    participant TartVM as Tart VM (vm-agent LaunchAgent)
    participant SSH as SSH Session (localhost)
    participant A11y as a11y-helper
    participant TCC as TCC Database

    Note over Host,TCC: Golden image provisioning (init-macos)
    Host->>TartVM: Copy execute-action.py, a11y-helper, provision.sh
    TartVM->>TartVM: ssh-keygen (ed25519, guarded)
    TartVM->>TartVM: authorized_keys append (idempotent grep check)
    TartVM->>TCC: INSERT OR REPLACE with csreq blob (codesign + csreq tool)
    TartVM->>TartVM: sudo shutdown -h now (flush filesystem)
    Host->>Host: tart clone → desktest-macos:latest

    Note over Host,A11y: Test run (desktest run)
    Host->>TartVM: tart run + shared dir mount
    TartVM->>TartVM: vm-agent polls shared/requests/ (Homebrew PATH)
    Host->>TartVM: lsappinfo visibleProcessList (readiness check)
    Host->>TartVM: screencapture -x /tmp/screenshot.png
    TartVM->>SSH: ssh -o BatchMode=yes localhost /usr/local/bin/a11y-helper
    SSH->>A11y: AXUIElement queries (full Aqua session via sshd-keygen-wrapper)
    A11y-->>SSH: 976+ lines UI element tree
    SSH-->>Host: a11y tree (via shared dir)
    Host->>TartVM: execute-action (PyAutoGUI via Homebrew python3)
```

Reviews (4): Last reviewed commit: "docs: update ci.md for macOS support (no..."

Comment thread src/init_macos.rs Outdated
Comment thread src/init_macos.rs
Edison and others added 7 commits March 31, 2026 12:26
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Deduplicate authorized_keys: check before appending SSH public key
  to prevent accumulation on re-provisioning runs
- Escape single quotes in SQL variables: prevent malformed SQL if
  paths from `command -v python3` ever contain single quotes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
rustfmt produces different method chain formatting on x86_64-linux vs
aarch64-darwin for the same Rust version (1.94.1). Add #[rustfmt::skip]
to pin the format that CI (linux) expects.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ence

The long expect strings caused rustfmt to produce different chain
formatting on x86_64-linux vs aarch64-darwin (same Rust 1.94.1).
Shortened the messages to stay well under the threshold where both
platforms agree on the chain style.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Shorten expect message in the Hybrid evaluator block that came from
master's new monitor code, matching the style used elsewhere.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The provisioning script now ends with `sudo shutdown -h now` so the
guest OS flushes TCC DB writes, SSH keys, and other artifacts to disk
before the host clones the image.

The Rust side waits up to 60s for the `tart run` child to exit
naturally (guest powered off) before falling back to force-kill.
This prevents the race where `tart stop` + `child.kill()` could
interrupt the VM mid-shutdown.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
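The wait-then-force-kill behavior described in this commit can be sketched with only the standard library. `wait_or_kill` is a hypothetical name and the 100 ms polling interval is an arbitrary choice; the actual init_macos.rs code may be structured differently.

```rust
use std::io;
use std::process::Child;
use std::thread;
use std::time::{Duration, Instant};

/// Wait up to `timeout` for `child` (e.g. the `tart run` process) to exit on its own.
/// If it is still running at the deadline, kill it and reap it.
/// Returns Ok(true) if it exited naturally, Ok(false) if it had to be killed.
fn wait_or_kill(mut child: Child, timeout: Duration) -> io::Result<bool> {
    let deadline = Instant::now() + timeout;
    loop {
        // try_wait never blocks: Some(status) means the guest powered off and tart exited.
        if child.try_wait()?.is_some() {
            return Ok(true);
        }
        if Instant::now() >= deadline {
            child.kill()?;
            child.wait()?; // reap the killed process so it does not linger as a zombie
            return Ok(false);
        }
        thread::sleep(Duration::from_millis(100));
    }
}
```

Calling it with a 60s timeout mirrors the commit: the guest gets time to flush TCC DB writes and SSH keys before any kill is attempted.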
@Eito-Test-Account
Contributor Author

@greptileai

Comment thread src/init_macos.rs Outdated
command -v python3 resolves to /usr/bin/python3 (macOS stub) during
provisioning because /etc/paths.d/homebrew hasn't taken effect yet.
The vm-agent uses /opt/homebrew/bin/python3, so TCC grants must
target that binary. Prefer the well-known Homebrew path with a
fallback to command -v.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
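The "prefer Homebrew, fall back to `command -v`" rule from this commit, sketched in Rust. `pick_python` is hypothetical; the `exists` check is injected so the logic can be exercised off-macOS, and in practice it would be something like `|p| p.is_file()`.

```rust
use std::path::{Path, PathBuf};

/// Well-known Homebrew interpreter on Apple Silicon.
const HOMEBREW_PYTHON: &str = "/opt/homebrew/bin/python3";

/// Prefer the Homebrew interpreter when it exists; otherwise use whatever
/// `command -v python3` printed (trimmed), if anything.
fn pick_python(exists: impl Fn(&Path) -> bool, command_v: Option<&str>) -> Option<PathBuf> {
    let homebrew = Path::new(HOMEBREW_PYTHON);
    if exists(homebrew) {
        return Some(homebrew.to_path_buf());
    }
    command_v.map(|p| p.trim()).filter(|p| !p.is_empty()).map(PathBuf::from)
}
```

Checking the fixed path first sidesteps the problem that /etc/paths.d/homebrew is not yet in effect during provisioning.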
@Eito-Test-Account
Contributor Author

@greptileai

Edison and others added 5 commits March 31, 2026 16:03
…infra

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- TCC permissions section: note that init-macos handles grants
  automatically, list all four permission types granted
- TCC Database Setup: rewrite with csreq blob generation (old
  instructions used NULL csreq which modern macOS ignores), add
  grant_tcc helper function, document Homebrew Python path
- Add SSH localhost section explaining why a11y-helper needs it
  (LaunchAgent restricted Aqua session)
- Update golden image saving: document graceful shutdown requirement
- Update limitations table: a11y tree is no longer "limited"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- macOS requirements: remove "planned" label, add sshpass and
  init-macos details, mention claude-cli provider
- App types table: macos_tart no longer marked as planned
- Architecture diagram: show Linux and macOS paths side by side
- CLI commands: add init-macos

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Mark all 5 phases as complete
- Add "Post-Phase 5: E2E Infrastructure Fixes" section documenting
  issues discovered during first real E2E run on Apple Silicon
- Document the SSH localhost workaround for LaunchAgent Aqua sessions
- Update Phase 4 readiness: osascript replaced with lsappinfo
- Update risks table with newly discovered risks and mitigations

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove "Planned" label from macOS section
- GitHub Actions example: add sshpass install, use desktest init-macos
  instead of manual tart pull, add caching tip
- Golden Image section: rewrite to document init-macos automated
  provisioning instead of manual setup steps

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Eito-Test-Account
Copy link
Copy Markdown
Contributor Author

@greptileai

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Miyamura80 merged commit fe1a3c5 into master on Mar 31, 2026
7 of 8 checks passed