fix: macOS Tart E2E infrastructure — lsappinfo, a11y via SSH, provisioning by Eito-Test-Account · Pull Request #90 · Edison-Watch/desktest

Eito-Test-Account · 2026-03-31T11:15:45Z

Summary

Fixes the macOS Tart VM E2E testing infrastructure so that the full pipeline works end-to-end: VM boot → desktop readiness → app deploy → agent loop (with accessibility tree) → evaluation → artifact collection.

These changes were developed and tested on an Apple Silicon Mac mini running macOS 26.2 with Tart 2.32.0.

What was done

1. Replace `osascript` with `lsappinfo` for GUI process detection (`readiness.rs`)

- `get_gui_process_list()` used `osascript` → System Events, which requires TCC Automation permission
- In Tart VMs, this hangs indefinitely because the vm-agent (Python LaunchAgent) can't get Automation permission programmatically
- Replaced with `lsappinfo visibleProcessList`, which provides the same information without any TCC permissions
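
A minimal Python sketch of the parsing side of this change, for illustration only. The real implementation is in `readiness.rs` (Rust) and is not reproduced here. The function names and the sample output layout are assumptions; the parser relies only on app names appearing in double quotes in the `lsappinfo` output.

```python
import re
import subprocess

# Matches any double-quoted name. The exact ASN line layout is an assumption,
# so the parser depends only on the quoting, not on the surrounding tokens.
_QUOTED_NAME = re.compile(r'"([^"]+)"')


def parse_visible_process_list(output: str) -> list[str]:
    """Return quoted app names in order of appearance, without duplicates."""
    names: list[str] = []
    for name in _QUOTED_NAME.findall(output):
        if name not in names:
            names.append(name)
    return names


def get_gui_process_list(timeout_secs: float = 5.0) -> list[str]:
    """Run lsappinfo (macOS only) with a timeout and parse its output."""
    result = subprocess.run(
        ["lsappinfo", "visibleProcessList"],
        capture_output=True,
        text=True,
        timeout=timeout_secs,
        check=True,
    )
    return parse_visible_process_list(result.stdout)
```

The explicit timeout is a choice of this sketch: it keeps a readiness probe from blocking indefinitely, which was the failure mode of the `osascript` approach.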

2. Enable accessibility tree extraction via SSH localhost (`observation.rs`)

- The vm-agent runs as a LaunchAgent, which gets a restricted Aqua session from macOS
- Direct subprocess calls from this context return empty/minimal accessibility trees (only headers, no element data)
- SSH sessions inherit full TCC permissions from `sshd-keygen-wrapper` (pre-granted in Tart base images)
- `MACOS_A11Y_CMD` now uses `ssh -o BatchMode=yes localhost /usr/local/bin/a11y-helper`; this yields 976+ lines of UI element data vs ~22 empty lines before
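
As a rough illustration of the SSH route, the sketch below builds the command described above and runs it as a subprocess. It is written in Python rather than the Rust of `observation.rs`. The function names are hypothetical, and the `ConnectTimeout` option is an assumption about how a connection timeout could be expressed; only `BatchMode=yes`, `localhost`, and the helper path come from the PR text.

```python
import subprocess

A11Y_HELPER = "/usr/local/bin/a11y-helper"  # path taken from the PR description


def build_a11y_command(connect_timeout_secs: int = 5) -> list[str]:
    """Build the ssh argv; BatchMode makes ssh fail instead of prompting."""
    return [
        "ssh",
        "-o", "BatchMode=yes",
        "-o", f"ConnectTimeout={connect_timeout_secs}",  # assumed option
        "localhost",
        A11Y_HELPER,
    ]


def fetch_a11y_tree(run_timeout_secs: float = 30.0) -> str:
    """Run the helper through ssh and return its stdout (the a11y tree)."""
    result = subprocess.run(
        build_a11y_command(),
        capture_output=True,
        text=True,
        timeout=run_timeout_secs,
    )
    if result.returncode != 0:
        raise RuntimeError(f"a11y-helper via ssh failed: {result.stderr.strip()}")
    return result.stdout
```

`BatchMode=yes` matters here because the call runs unattended: without a usable key, ssh exits with an error rather than waiting on a password prompt.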

3. Fix golden image provisioning (`init_macos.rs`)

- Install `execute-action.py`: the PyAutoGUI executor script was never copied to the VM, so the agent loop couldn't execute any actions
- SSH key setup: passwordless ed25519 keys for localhost, required by the SSH-based a11y extraction
- TCC permission grants: insert entries with proper `csreq` blobs (generated via `codesign -d -r-` + the `csreq` tool) for a11y-helper, Python, and screencapture
- Homebrew PATH: add `/opt/homebrew/bin` to `/etc/paths.d/homebrew` so all processes find the right `python3`
- Updated success message: now mentions that TCC grants are automated (requires SIP disabled in the base image)
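
The csreq step can be sketched as follows. This is a hedged Python illustration, not the provisioning script from `init_macos.rs`, which is not shown in this PR page. All function names are hypothetical; the `codesign -d -r-` and `csreq -r- -b <file>` invocations follow the commonly used recipe and should be checked against the man pages on the target macOS version. The sketch stops at producing the SQL literal and does not assume anything about the TCC table layout.

```python
import os
import subprocess
import sys
import tempfile
from typing import Optional

_MARKER = "designated => "


def extract_designated_requirement(codesign_output: str) -> Optional[str]:
    """Pull the requirement string out of `codesign -d -r-` output."""
    for line in codesign_output.splitlines():
        _, found, requirement = line.partition(_MARKER)
        if found and requirement.strip():
            return requirement.strip()
    return None


def csreq_sql_literal(blob: Optional[bytes]) -> str:
    """Render a compiled requirement as a SQLite hex blob literal."""
    if not blob:
        # Warn loudly: a NULL csreq is the broken state described in this PR.
        print("warning: no csreq blob, falling back to NULL", file=sys.stderr)
        return "NULL"
    return "X'" + blob.hex() + "'"


def compile_requirement(binary_path: str) -> Optional[bytes]:
    """macOS only: compile the designated requirement of a binary."""
    shown = subprocess.run(
        ["codesign", "-d", "-r-", binary_path], capture_output=True, text=True
    )
    requirement = extract_designated_requirement(shown.stdout + shown.stderr)
    if requirement is None:
        return None
    with tempfile.TemporaryDirectory() as workdir:
        blob_path = os.path.join(workdir, "csreq.bin")
        compiled = subprocess.run(
            ["csreq", "-r-", "-b", blob_path],
            input=requirement.encode(),
            capture_output=True,
        )
        if compiled.returncode != 0 or not os.path.exists(blob_path):
            return None
        with open(blob_path, "rb") as handle:
            return handle.read()
```

Emitting a warning on the NULL fallback is a design choice of this sketch, so that a failed `codesign` or `csreq` call is visible in the provisioning log.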

4. Fix vm-agent LaunchAgent PATH (`vm-agent-install.sh`)

- The LaunchAgent plist had no `EnvironmentVariables`, so the default PATH (`/usr/bin:/bin:/usr/sbin:/sbin`) was used
- Subprocesses (like `python3 /usr/local/bin/execute-action`) resolved to the system Python, which doesn't have PyAutoGUI
- Added `EnvironmentVariables` with the Homebrew PATH to the plist
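
For illustration, a LaunchAgent plist with an `EnvironmentVariables` entry could be generated like this. The sketch uses Python's `plistlib` instead of the shell heredoc that `vm-agent-install.sh` presumably uses; the label, the program arguments, and the exact PATH ordering are assumptions, not values from the PR.

```python
import plistlib

DEFAULT_PATH = "/usr/bin:/bin:/usr/sbin:/sbin"
HOMEBREW_DIRS = "/opt/homebrew/bin:/opt/homebrew/sbin"


def build_launch_agent_plist(label: str, program_args: list[str]) -> bytes:
    """Serialize a LaunchAgent plist whose PATH puts Homebrew first."""
    agent = {
        "Label": label,                    # hypothetical label
        "ProgramArguments": program_args,  # hypothetical agent command
        "RunAtLoad": True,
        # Homebrew first, so `python3` resolves to the interpreter with PyAutoGUI.
        "EnvironmentVariables": {"PATH": f"{HOMEBREW_DIRS}:{DEFAULT_PATH}"},
    }
    return plistlib.dumps(agent)
```

Placing the Homebrew directories before the default PATH is what makes a bare `python3` resolve to the Homebrew interpreter instead of the system one.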

5. Make a11y-helper trust check non-fatal (`main.swift`)

- `AXIsProcessTrustedWithOptions` returns false in the LaunchAgent context even when actual AX API calls succeed
- Changed from `exit(1)` to a stderr warning; the tree extraction continues and produces data

Issues encountered and resolved

| Issue | Root cause | Fix |
|---|---|---|
| `osascript` hangs in VM | TCC Automation permission not grantable programmatically for LaunchAgent processes | Replaced with `lsappinfo` (no TCC needed) |
| Empty accessibility tree | LaunchAgent gets restricted Aqua session, AX API returns minimal data | Route a11y-helper through `ssh localhost`, which gets a proper session |
| `execute-action` not found | `docker/execute-action.py` was never copied during provisioning | Added copy step to `provision_vm()` and install step to provisioning script |
| PyAutoGUI import errors | vm-agent subprocess used system Python (no PyAutoGUI) | Added `EnvironmentVariables` with Homebrew PATH to LaunchAgent plist |
| TCC DB entries not honored | Entries had NULL `csreq` column; modern macOS requires code signing requirement blobs | Generate csreq via `codesign -d -r-` + `csreq` tool and insert as hex blob |
| SSH keys not persisting | `tart stop` doesn't flush VM filesystem | Use `sudo shutdown -h now` for graceful shutdown before `tart clone` |
| `AXIsProcessTrustedWithOptions` false negative | LaunchAgent context reports untrusted even when AX calls work | Made check non-fatal (warning instead of exit) |

What is NOT solved yet

- 60s step timeout too tight for the claude-cli provider: with full a11y data (~976 lines), Claude CLI calls sometimes exceed 60s, causing retries and eventual timeout. Needs a configurable or provider-aware timeout.
- Agent struggles with macOS TextEdit UI: the agent can't reliably open a new document (File > New / Cmd+N don't seem to work). This is an agent behavior issue, not infrastructure. May need macOS-specific agent prompting or a simpler test app.
- `/home/tester` artifact collection: `artifacts.rs` still tries to collect from the Linux home dir path for Tart sessions. Should be skipped or use `/Users/admin`.
- Golden image not versioned: `desktest-macos:latest` has no version tagging strategy, risking drift.
- Electron test image not created: `desktest-macos-electron:latest` doesn't exist yet (would need the `--with-electron` flag during init).

Test plan

- `cargo test`: all 524 unit tests + 3 validation tests pass
- `doctor_shows_tart_status`: passes on Apple Silicon with Tart installed
- Manual E2E: `desktest run examples/macos-textedit.json --provider claude-cli`. Infrastructure works (VM boots, a11y tree populated with 976+ lines, agent loop runs, evaluation works). The test fails at the agent level (can't complete the TextEdit task), not infrastructure.
- `desktest init-macos` fresh provisioning: not re-run after code changes (would take ~10 min to pull + provision). Manual provisioning verified that all steps work individually.

🤖 Generated with Claude Code

osascript calls to System Events require TCC Automation permission, which hangs indefinitely in Tart VMs without pre-granted TCC access. lsappinfo visibleProcessList provides the same information without any TCC permissions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The vm-agent runs as a LaunchAgent which gets a restricted Aqua session, returning an empty accessibility tree even with TCC permissions granted. SSH sessions inherit full TCC permissions from sshd-keygen-wrapper, so running a11y-helper via `ssh localhost` gives complete UI element data.

Changes:
- observation.rs: MACOS_A11Y_CMD now uses ssh localhost to invoke a11y-helper
- init_macos.rs: provisioning sets up passwordless SSH keys, installs execute-action.py, grants TCC permissions with proper csreq blobs, and configures Homebrew PATH
- vm-agent-install.sh: add EnvironmentVariables with Homebrew PATH to the LaunchAgent plist so subprocesses find the right python3
- a11y-helper main.swift: make AXIsProcessTrustedWithOptions check non-fatal (warning instead of exit) since AX API calls may succeed even when the check returns false

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

greptile-apps · 2026-03-31T11:25:44Z

Greptile Summary

This PR completes the macOS Tart VM E2E testing infrastructure, resolving a set of interconnected issues discovered during a first real E2E run on Apple Silicon. The changes address seven distinct failure modes — from osascript hanging indefinitely due to missing TCC Automation permission, to empty accessibility trees from a restricted LaunchAgent Aqua session, to missing executor scripts and incorrect TCC database entries — producing a working pipeline from VM boot through agent loop and evaluation.

Key changes:

- `readiness.rs`: `osascript` → `lsappinfo visibleProcessList` (no TCC required)
- `observation.rs`: a11y extraction routed through `ssh localhost` to obtain a proper Aqua session (976+ lines of UI data vs ~22 previously)
- `init_macos.rs`: provisioning now installs `execute-action.py`, sets up idempotent SSH keys, writes Homebrew PATH to `/etc/paths.d`, inserts TCC grants with csreq blobs, and ends with `sudo shutdown -h now` for guaranteed filesystem flush before `tart clone`
- `vm-agent-install.sh`: LaunchAgent plist gains `EnvironmentVariables` with Homebrew PATH so subprocesses resolve the correct `python3`
- `main.swift`: `AXIsProcessTrustedWithOptions` failure demoted from `exit(1)` to a stderr warning

Minor findings:

- The second line of the `/etc/paths.d/homebrew` write uses `tee -a` (append), so re-running provisioning will accumulate a duplicate `/opt/homebrew/sbin` entry. This is harmless due to `path_helper` deduplication, but inconsistent with the careful idempotency applied to `authorized_keys`
- If `codesign -d -r-` or the `csreq` tool fails silently inside `grant_tcc`, `CSREQ_SQL` falls back to NULL with no diagnostic output, which is the exact broken state this PR fixes for TCC grants on modern macOS
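
One way to make the `/etc/paths.d/homebrew` write idempotent is to compute the full file contents from the existing contents plus the wanted entries, then overwrite the file in one step instead of appending. The Python sketch below shows only that merge logic; the function name is hypothetical and the actual provisioning script is shell.

```python
def merge_path_entries(existing: str, wanted: list[str]) -> str:
    """Return file contents holding each path entry exactly once, in order."""
    merged: list[str] = []
    for entry in [*existing.splitlines(), *wanted]:
        entry = entry.strip()
        if entry and entry not in merged:
            merged.append(entry)
    return "".join(f"{entry}\n" for entry in merged)
```

Running the merge twice yields the same contents, so a re-provisioning run leaves the file unchanged.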

Confidence Score: 5/5

Safe to merge; all remaining findings are P2 style/observability issues with no functional impact on the happy path.

Previously flagged P1 issues (authorized_keys idempotency, SQL injection via unsanitized paths, PYTHON_BIN resolving the wrong binary) are all addressed in prior commits. The two remaining findings are P2: the tee -a duplication is neutralized by path_helper deduplication, and the silent csreq fallback only manifests if codesign/csreq tooling fails (unlikely on a Homebrew-equipped image). All 524 unit tests pass and the infrastructure is manually verified end-to-end.

src/init_macos.rs — two minor idempotency/observability issues in the generated provisioning script

Important Files Changed

| Filename | Overview |
|---|---|
| src/init_macos.rs | Core provisioning logic overhauled: adds execute-action.py copy, SSH key setup (now idempotent), Homebrew PATH config (minor re-run idempotency issue on sbin line), TCC grants with csreq blobs (csreq failures are silent), and graceful VM shutdown. Two minor P2 issues found. |
| src/observation.rs | MACOS_A11Y_CMD updated to route a11y-helper through `ssh localhost` with BatchMode and 5s timeout; test assertions updated to match. Well-documented rationale for the SSH workaround. |
| src/tart/readiness.rs | Replaces osascript/System Events with `lsappinfo visibleProcessList`; output parsing correctly extracts quoted app names from ASN format lines. |
| macos/vm-agent-install.sh | Adds EnvironmentVariables dict with Homebrew PATH to the LaunchAgent plist, fixing the system-Python resolution issue for vm-agent subprocesses. |
| macos/a11y-helper/Sources/A11yHelperCLI/main.swift | Changes AXIsProcessTrustedWithOptions check from a fatal exit to a stderr warning, allowing a11y tree extraction to proceed even when the trust API returns false in LaunchAgent context. |
| src/orchestration.rs | Minor cosmetic refactor: shortens `expect()` panic messages to avoid rustfmt line-length divergence between platforms. No behavioral change. |

Sequence Diagram

sequenceDiagram
    participant Host as Host (desktest CLI)
    participant TartVM as Tart VM (vm-agent LaunchAgent)
    participant SSH as SSH Session (localhost)
    participant A11y as a11y-helper
    participant TCC as TCC Database

    Note over Host,TCC: Golden image provisioning (init-macos)
    Host->>TartVM: Copy execute-action.py, a11y-helper, provision.sh
    TartVM->>TartVM: ssh-keygen (ed25519, guarded)
    TartVM->>TartVM: authorized_keys append (idempotent grep check)
    TartVM->>TCC: INSERT OR REPLACE with csreq blob (codesign + csreq tool)
    TartVM->>TartVM: sudo shutdown -h now (flush filesystem)
    Host->>Host: tart clone → desktest-macos:latest

    Note over Host,A11y: Test run (desktest run)
    Host->>TartVM: tart run + shared dir mount
    TartVM->>TartVM: vm-agent polls shared/requests/ (Homebrew PATH)
    Host->>TartVM: lsappinfo visibleProcessList (readiness check)
    Host->>TartVM: screencapture -x /tmp/screenshot.png
    TartVM->>SSH: ssh -o BatchMode=yes localhost /usr/local/bin/a11y-helper
    SSH->>A11y: AXUIElement queries (full Aqua session via sshd-keygen-wrapper)
    A11y-->>SSH: 976+ lines UI element tree
    SSH-->>Host: a11y tree (via shared dir)
    Host->>TartVM: execute-action (PyAutoGUI via Homebrew python3)

Reviews (4): Last reviewed commit: "docs: update ci.md for macOS support (no..."

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Deduplicate authorized_keys: check before appending SSH public key to prevent accumulation on re-provisioning runs
- Escape single quotes in SQL variables: prevent malformed SQL if paths from `command -v python3` ever contain single quotes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
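
Both fixes in this commit are small enough to sketch. The Python below is an illustration under assumptions, not the shell code from the provisioning script: `append_key_once` and `sql_quote` are hypothetical names, and the key string in the usage example is a placeholder.

```python
def append_key_once(authorized_keys: str, public_key: str) -> str:
    """Append public_key to the file contents only if it is not already there."""
    key = public_key.strip()
    present = {line.strip() for line in authorized_keys.splitlines()}
    if key in present:
        return authorized_keys
    prefix = authorized_keys
    if prefix and not prefix.endswith("\n"):
        prefix += "\n"
    return f"{prefix}{key}\n"


def sql_quote(value: str) -> str:
    """Quote a string for SQLite by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"
```

For example, `sql_quote("/tmp/o'brien/python3")` yields `'/tmp/o''brien/python3'`, which SQLite reads back as the original path.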

rustfmt produces different method chain formatting on x86_64-linux vs aarch64-darwin for the same Rust version (1.94.1). Add #[rustfmt::skip] to pin the format that CI (linux) expects. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ence The long expect strings caused rustfmt to produce different chain formatting on x86_64-linux vs aarch64-darwin (same Rust 1.94.1). Shortened the messages to stay well under the threshold where both platforms agree on the chain style. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…infra

Shorten expect message in the Hybrid evaluator block that came from master's new monitor code, matching the style used elsewhere. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The provisioning script now ends with `sudo shutdown -h now` so the guest OS flushes TCC DB writes, SSH keys, and other artifacts to disk before the host clones the image. The Rust side waits up to 60s for the `tart run` child to exit naturally (guest powered off) before falling back to force-kill. This prevents the race where `tart stop` + `child.kill()` could interrupt the VM mid-shutdown. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
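
The wait-then-kill behaviour described in this commit message can be sketched as below. This is a Python analogue of the idea, not the Rust code; the function name is hypothetical, and the 60s default mirrors the figure quoted above.

```python
import subprocess
import sys


def wait_for_exit_or_kill(child: subprocess.Popen, grace_secs: float = 60.0) -> bool:
    """Return True if the child exited on its own, False if it was killed."""
    try:
        child.wait(timeout=grace_secs)
        return True
    except subprocess.TimeoutExpired:
        print("child still running after grace period; killing", file=sys.stderr)
        child.kill()
        child.wait()  # reap the killed process so it does not linger
        return False
```

Waiting first gives the guest time to power off and flush its disk; the kill is only a fallback for a VM that never shuts down.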

Eito-Test-Account · 2026-03-31T13:46:21Z

@greptileai

command -v python3 resolves to /usr/bin/python3 (macOS stub) during provisioning because /etc/paths.d/homebrew hasn't taken effect yet. The vm-agent uses /opt/homebrew/bin/python3, so TCC grants must target that binary. Prefer the well-known Homebrew path with a fallback to command -v. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
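
A small Python sketch of the "prefer the well-known path, fall back to lookup" rule described here. The function name is hypothetical; `shutil.which` stands in for the shell's `command -v`.

```python
import os
import shutil
from typing import Optional

HOMEBREW_PYTHON = "/opt/homebrew/bin/python3"  # path named in the commit message


def resolve_python_bin(preferred: str = HOMEBREW_PYTHON) -> Optional[str]:
    """Prefer the Homebrew interpreter; otherwise use whatever PATH provides."""
    if os.path.isfile(preferred) and os.access(preferred, os.X_OK):
        return preferred
    return shutil.which("python3")
```

Checking the fixed path first sidesteps the ordering problem: it does not depend on `/etc/paths.d/homebrew` having taken effect in the current shell.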

Eito-Test-Account · 2026-03-31T13:55:21Z

@greptileai

…infra Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- TCC permissions section: note that init-macos handles grants automatically, list all four permission types granted
- TCC Database Setup: rewrite with csreq blob generation (old instructions used NULL csreq which modern macOS ignores), add grant_tcc helper function, document Homebrew Python path
- Add SSH localhost section explaining why a11y-helper needs it (LaunchAgent restricted Aqua session)
- Update golden image saving: document graceful shutdown requirement
- Update limitations table: a11y tree is no longer "limited"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- macOS requirements: remove "planned" label, add sshpass and init-macos details, mention claude-cli provider
- App types table: macos_tart no longer marked as planned
- Architecture diagram: show Linux and macOS paths side by side
- CLI commands: add init-macos

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Mark all 5 phases as complete
- Add "Post-Phase 5: E2E Infrastructure Fixes" section documenting issues discovered during first real E2E run on Apple Silicon
- Document the SSH localhost workaround for LaunchAgent Aqua sessions
- Update Phase 4 readiness: osascript replaced with lsappinfo
- Update risks table with newly discovered risks and mitigations

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Remove "Planned" label from macOS section
- GitHub Actions example: add sshpass install, use desktest init-macos instead of manual tart pull, add caching tip
- Golden Image section: rewrite to document init-macos automated provisioning instead of manual setup steps

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Eito-Test-Account · 2026-03-31T16:42:50Z

@greptileai

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Edison and others added 2 commits March 31, 2026 00:15

greptile-apps bot reviewed Mar 31, 2026

Edison and others added 7 commits March 31, 2026 12:26

style: cargo fmt

8106c48

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Merge remote-tracking branch 'origin/master' into fix/macos-tart-e2e-…

7f37956

…infra

style: fix rustfmt after merging master

7d9a56e

greptile-apps bot reviewed Mar 31, 2026

Edison and others added 5 commits March 31, 2026 16:03

Merge remote-tracking branch 'origin/master' into fix/macos-tart-e2e-…

b12e3ca

docs: wrap macOS requirements in collapsible details tag

7b68194

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Miyamura80 merged commit fe1a3c5 into master Mar 31, 2026
7 of 8 checks passed

Miyamura80 merged 16 commits into master from fix/macos-tart-e2e-infra
