RFC: ADR 0011 — interaction guarantee contract (path × guarantee matrix) by thymikee · Pull Request #1080 · callstack/agent-device

thymikee · 2026-07-04T10:41:47Z

~~Stacked on #1075~~ #1075 merged — this branch is now a single commit on main (ADR + registry + gate only). The gate test passing post-rebase confirms every symbol the registry references survived the squash-merge.

Why

Every interaction bug this week was the same shape: a guarantee enforced on one dispatch path and silently absent on a sibling. Offscreen refusal on @ref but not selectors and not the runner's native tap (#1075); --verify evidence on press @ref but dropped by fill @ref's hand-built response (#1064); adb classification on thrown errors but not allowFailure results (#1067); wait budgets honored for snapshot/replay envelopes but not wait itself, which additionally sat on the daemon-reset timeout path (#1075). Bandaids fix cells. Nothing watches the matrix.

What

ADR 0011 (in this PR) proposes three layers; this PR lands Layer 1 so the design is falsifiable rather than aspirational:

Declare — src/contracts/interaction-guarantees.ts: 6 dispatch paths × 7 guarantees, every cell classified as runtime (points at the shared TS symbol), runner (Swift symbol + future parity table), delegated (e.g. direct path → runtime on --verify and on ELEMENT_OFFSCREEN), inapplicable, or waived(reason). Completeness is a compile error (Record over the guarantee union).
Gate — interaction-guarantees.test.ts: via symbols must actually be exported (TS) or present (Swift sources); delegations must land on a path that enforces the guarantee; waivers need substantive reasons; and the gap: list is pinned — it can only change in a reviewed diff that edits the pin. The initial classification is honest: 10 acknowledged gaps, mostly on the direct-iOS and native-ref fast paths (no occlusion check, no nonHittable promotion/annotation, missing refLabel/selectorChain, error shapes without hints).
Layers 2–3 (follow-up PRs, per the ADR migration plan): single buildInteractionResponseData construction site + hand-rolled-literal guard; golden JSON fixture tables consumed by vitest AND the runner's Swift unit tests AND the provider harness's fake runner (cross-language parity without simulators); a contract scenario suite whose coverage of non-waived cells is generated from the registry; and descriptor-declared timeout policy replacing the two hand-maintained client lists.

The pattern is deliberately the one this repo already trusts: the integration-progress flag gate (which caught #1064's unclassified --verify) and the ADR-0009 apple-leak guard — make the gap declare itself.

Ask

Design review of the ADR — especially: (a) the guarantee list (right granularity?), (b) whether the direct-path gaps should be closed by delegation-on-error (cheap, one more round trip on failure only) vs Swift parity (fast, more machinery), (c) whether the pinned gap list should require linked issues from day one.

github-actions · 2026-07-04T10:43:07Z

Size Report

Metric	Base	Current
JS raw	1.5 MB	1.5 MB
JS gzip	485.8 kB	485.8 kB
npm tarball	589.3 kB	589.3 kB
npm unpacked	2.1 MB	2.1 MB

Startup median (7 runs, lower is better):

Scenario	Base	Current	Diff
CLI --version	25.5 ms	25.5 ms	-0.0 ms
CLI --help	46.5 ms	46.7 ms	+0.2 ms

Top changed chunks: no changes in the largest emitted chunks.

…and gate Design for making interaction guarantees hold across every dispatch path (runtime selector/ref, direct iOS selector, native ref, coordinate, maestro fallback) instead of eroding at path boundaries one incident at a time — every interaction bug this week was a (path, guarantee) cell nobody was watching. Three layers (ADR 0011): declare the path x guarantee matrix as a typed registry whose completeness is a compile error; share one implementation per rule on both sides of the wire with golden fixture tables proving TS/Swift parity; prove every non-waived cell with contract scenarios generated from the registry. This lands Layer 1: the registry with an HONEST initial classification — ten cells are acknowledged gap waivers (direct-path disambiguation/ occlusion/nonHittable/responseFields/errorTaxonomy, native-ref guards, coordinate bounds) — plus the gate test that keeps entries truthful: referenced TS symbols must be exported, runner symbols must exist in the Swift sources, delegations must land on paths that actually enforce the guarantee, and the gap list is pinned so it can only change explicitly in a reviewed diff.

- Frame Layer 1 as an honesty/completeness gate, not a truth gate: it proves every path declared a stance and referenced symbols exist; behavioral parity starts with the Layer-2/3 fixture and scenario work. - Split responseFields into responseConstruction (one shared response construction site — a single Layer-2 refactor) and responseIdentity (which identity fields a path can provide — per-path capability work); note the anticipated errorTaxonomy split (codes vs diagnostics). - Encode the hybrid gap-closure strategy: runner-side parity for geometry-local rules, delegation-on-error for semantic failures (with the explicit caveat that delegation-on-error is NOT success-path parity), and a shared runtime preflight for native-ref where a silent backend success means delegation never triggers. - Gap waivers now require a trackingIssue (gate-enforced URL); all 16 pinned gaps link the umbrella issue #1081. The honest reclassification grew the pin list from 10 to 16 — responseConstruction is a gap on every path including runtime ones, which is exactly the partial progress the coarser guarantee was hiding. - Align ADR wording with the code: parityTable is optional until Layer 3, required once a runner cell claims parity.

thymikee · 2026-07-04T11:43:42Z

Design review applied in e43ce24 (review given out-of-band; recording the disposition here):

(framing) Layer 1 is now described — in the ADR, the registry header, and the gate — as an honesty/completeness gate, not a truth gate: it proves every path declared a stance and referenced symbols exist; behavioral parity starts with Layers 2–3.
(a) responseFields split into responseConstruction (one shared construction site — a single Layer-2 refactor) and responseIdentity (per-path capability). The split immediately did its job: the honest pin list grew from 10 to 16 gaps, because responseConstruction is a gap on every path including the runtime ones — exactly the partial progress the coarse guarantee was hiding. errorTaxonomy's anticipated codes-vs-diagnostics split is noted inline for when direct paths close codes ahead of diagnostics.
(b) The hybrid closure strategy is now a Decision section: runner-side parity for geometry-local rules; delegation-on-error for semantic failure shapes; the explicit caveat that delegation-on-error is not success-path parity (the direct-ios-selector/disambiguation waiver spells out the one-hittable-candidate case and stays a gap until proven); and the shared runtime preflight for native-ref, since a silently-succeeding backend fast path never triggers error delegation.
(c) gap: waivers now require a trackingIssue (gate-enforced URL format). All 16 link the umbrella issue Interaction guarantee gaps (ADR 0011 umbrella) #1081, which groups them by closure strategy; sub-issues can split off as work is scheduled.
(nit) ADR and code now agree: parityTable is optional until Layer 3, required once a runner cell claims parity.

Marked ready for review. Status stays Proposed for your formal accept.

thymikee · 2026-07-04T11:53:37Z

Review pass after latest changes: the revised framing is much stronger, and the Layer-1 gate passes locally on e43ce2403 (pnpm exec vitest run src/contracts/__tests__/interaction-guarantees.test.ts, 7 tests). CI is also green.

I found two things I would address before treating the registry as review-ready:

maestro-non-hittable-fallback/disambiguation looks overclaimed. The registry marks it as runner via RunnerTests+Interaction.swift#findElement, but the guarantee is defined as visible-first/deepest/smallest disambiguation. The same Swift finder just scans allElementsBoundByIndex, picks the single hittable match or the single non-hittable fallback, and reports ambiguity on a second match; it does not implement the runtime ranking semantics. The direct selector path correctly keeps this as a success-path parity gap for the same reason. I think the maestro fallback should either waive this as an intentional Maestro semantics divergence, or the guarantee needs to be split into weaker unique-or-ambiguous vs runtime-ranking disambiguation.
verifyEvidence is declared at path level for runtime-selector, runtime-ref, and coordinate, whose commands arrays include longpress, but longpress does not expose verify in command metadata/CLI and LongPressCommandOptions has no verify field. That makes the matrix read as if every command on those paths carries the guarantee. I would either make guarantees command-scoped/qualified (e.g. appliesTo: ['click', 'press', 'fill']) or explicitly remove commands without the guarantee from that path-level claim.

No other blockers from the ADR/registry/test pass. The issue-linking and responseConstruction/responseIdentity split addressed the earlier design concerns well.

…d-scoped verify 1. maestro-non-hittable-fallback/disambiguation was overclaimed: the guarantee is defined as visible-first/deepest/smallest ranking, but findElement only implements unique-or-ambiguous scanning. Reclassified as an intentional waiver (deliberate Maestro-semantics divergence), mirroring how the direct path keeps its success-path parity gap. 2. verifyEvidence was claimed path-wide on paths that dispatch longpress, which has no --verify. Cells can now be command-scoped via appliesTo (non-empty strict subset of the path's commands, gate-enforced), and the three affected cells scope to press/click/fill.

thymikee · 2026-07-04T11:57:06Z

Both findings addressed in the latest commit:

maestro-non-hittable-fallback/disambiguation — you're right, that was an overclaim and a good demonstration of the gate's stated limit (symbol existence ≠ semantic match). findElement implements unique-or-ambiguous scanning, not the ranking the guarantee is defined as. Reclassified as an intentional waiver ("deliberate Maestro-semantics divergence from runtime ranking") rather than runner enforcement — chosen over splitting the guarantee, since a weaker unique-or-ambiguous tier would only ever have this one occupant; if a second path wants the weaker semantics later, splitting is the right move then.
verifyEvidence vs longpress — added command scoping: cells may carry appliesTo (gate-enforced: non-empty, strict subset of the path's commands, and rejected if it redundantly lists every command). The three affected cells now scope to press/click/fill. ADR documents the qualifier.

Gate is now 8 tests; suites/fallow/typecheck green.

thymikee · 2026-07-04T12:04:50Z

Re-review on latest head f1ff6791: previous findings are resolved.

maestro-non-hittable-fallback/disambiguation is now an intentional waiver instead of overclaiming runner parity.
verifyEvidence now uses gate-checked appliesTo command scoping so longpress is not implied to support --verify.

Validation: pnpm exec vitest run src/contracts/__tests__/interaction-guarantees.test.ts passes locally (8 tests), and GitHub checks are green (21/21). Added ready-for-human.

github-actions · 2026-07-04T12:06:34Z

PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-07-04 12:06 UTC

thymikee force-pushed the design/interaction-guarantee-contract branch from 4bb8fda to a99a303 Compare July 4, 2026 11:32

thymikee mentioned this pull request Jul 4, 2026

Interaction guarantee gaps (ADR 0011 umbrella) #1081

Closed

thymikee marked this pull request as ready for review July 4, 2026 11:43

thymikee added the ready-for-human Valid work that needs human implementation, judgment, or maintainer merge label Jul 4, 2026

thymikee merged commit c506ddf into main Jul 4, 2026
21 checks passed

thymikee deleted the design/interaction-guarantee-contract branch July 4, 2026 12:06

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

RFC: ADR 0011 — interaction guarantee contract (path × guarantee matrix)#1080

RFC: ADR 0011 — interaction guarantee contract (path × guarantee matrix)#1080
thymikee merged 3 commits into
mainfrom
design/interaction-guarantee-contract

thymikee commented Jul 4, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jul 4, 2026 •

edited

Loading

Uh oh!

thymikee commented Jul 4, 2026

Uh oh!

thymikee commented Jul 4, 2026

Uh oh!

thymikee commented Jul 4, 2026

Uh oh!

thymikee commented Jul 4, 2026

Uh oh!

Uh oh!

github-actions Bot commented Jul 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

thymikee commented Jul 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Why

What

Ask

Uh oh!

github-actions Bot commented Jul 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Size Report

Uh oh!

thymikee commented Jul 4, 2026

Uh oh!

thymikee commented Jul 4, 2026

Uh oh!

thymikee commented Jul 4, 2026

Uh oh!

thymikee commented Jul 4, 2026

Uh oh!

Uh oh!

github-actions Bot commented Jul 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

thymikee commented Jul 4, 2026 •

edited

Loading

github-actions Bot commented Jul 4, 2026 •

edited

Loading