Skip to content

RFC: ADR 0011 — interaction guarantee contract (path × guarantee matrix)#1080

Merged
thymikee merged 3 commits into
mainfrom
design/interaction-guarantee-contract
Jul 4, 2026
Merged

RFC: ADR 0011 — interaction guarantee contract (path × guarantee matrix)#1080
thymikee merged 3 commits into
mainfrom
design/interaction-guarantee-contract

Conversation

@thymikee

@thymikee thymikee commented Jul 4, 2026

Copy link
Copy Markdown
Member

Stacked on #1075 #1075 merged — this branch is now a single commit on main (ADR + registry + gate only). The gate test passing post-rebase confirms every symbol the registry references survived the squash-merge.

Why

Every interaction bug this week was the same shape: a guarantee enforced on one dispatch path and silently absent on a sibling. Offscreen refusal on @ref but not selectors and not the runner's native tap (#1075); --verify evidence on press @ref but dropped by fill @ref's hand-built response (#1064); adb classification on thrown errors but not allowFailure results (#1067); wait budgets honored for snapshot/replay envelopes but not wait itself, which additionally sat on the daemon-reset timeout path (#1075). Bandaids fix cells. Nothing watches the matrix.

What

ADR 0011 (in this PR) proposes three layers; this PR lands Layer 1 so the design is falsifiable rather than aspirational:

  • Declaresrc/contracts/interaction-guarantees.ts: 6 dispatch paths × 7 guarantees, every cell classified as runtime (points at the shared TS symbol), runner (Swift symbol + future parity table), delegated (e.g. direct path → runtime on --verify and on ELEMENT_OFFSCREEN), inapplicable, or waived(reason). Completeness is a compile error (Record over the guarantee union).
  • Gateinteraction-guarantees.test.ts: via symbols must actually be exported (TS) or present (Swift sources); delegations must land on a path that enforces the guarantee; waivers need substantive reasons; and the gap: list is pinned — it can only change in a reviewed diff that edits the pin. The initial classification is honest: 10 acknowledged gaps, mostly on the direct-iOS and native-ref fast paths (no occlusion check, no nonHittable promotion/annotation, missing refLabel/selectorChain, error shapes without hints).
  • Layers 2–3 (follow-up PRs, per the ADR migration plan): single buildInteractionResponseData construction site + hand-rolled-literal guard; golden JSON fixture tables consumed by vitest AND the runner's Swift unit tests AND the provider harness's fake runner (cross-language parity without simulators); a contract scenario suite whose coverage of non-waived cells is generated from the registry; and descriptor-declared timeout policy replacing the two hand-maintained client lists.

The pattern is deliberately the one this repo already trusts: the integration-progress flag gate (which caught #1064's unclassified --verify) and the ADR-0009 apple-leak guard — make the gap declare itself.

Ask

Design review of the ADR — especially: (a) the guarantee list (right granularity?), (b) whether the direct-path gaps should be closed by delegation-on-error (cheap, one more round trip on failure only) vs Swift parity (fast, more machinery), (c) whether the pinned gap list should require linked issues from day one.

@github-actions

github-actions Bot commented Jul 4, 2026

Copy link
Copy Markdown

Size Report

Metric Base Current Diff
JS raw 1.5 MB 1.5 MB 0 B
JS gzip 485.8 kB 485.8 kB 0 B
npm tarball 589.3 kB 589.3 kB 0 B
npm unpacked 2.1 MB 2.1 MB 0 B

Startup median (7 runs, lower is better):

Scenario Base Current Diff
CLI --version 25.5 ms 25.5 ms -0.0 ms
CLI --help 46.5 ms 46.7 ms +0.2 ms

Top changed chunks: no changes in the largest emitted chunks.

…and gate

Design for making interaction guarantees hold across every dispatch path
(runtime selector/ref, direct iOS selector, native ref, coordinate,
maestro fallback) instead of eroding at path boundaries one incident at
a time — every interaction bug this week was a (path, guarantee) cell
nobody was watching.

Three layers (ADR 0011): declare the path x guarantee matrix as a typed
registry whose completeness is a compile error; share one implementation
per rule on both sides of the wire with golden fixture tables proving
TS/Swift parity; prove every non-waived cell with contract scenarios
generated from the registry.

This lands Layer 1: the registry with an HONEST initial classification —
ten cells are acknowledged gap waivers (direct-path disambiguation/
occlusion/nonHittable/responseFields/errorTaxonomy, native-ref guards,
coordinate bounds) — plus the gate test that keeps entries truthful:
referenced TS symbols must be exported, runner symbols must exist in the
Swift sources, delegations must land on paths that actually enforce the
guarantee, and the gap list is pinned so it can only change explicitly
in a reviewed diff.
@thymikee thymikee force-pushed the design/interaction-guarantee-contract branch from 4bb8fda to a99a303 Compare July 4, 2026 11:32
- Frame Layer 1 as an honesty/completeness gate, not a truth gate:
  it proves every path declared a stance and referenced symbols exist;
  behavioral parity starts with the Layer-2/3 fixture and scenario work.
- Split responseFields into responseConstruction (one shared response
  construction site — a single Layer-2 refactor) and responseIdentity
  (which identity fields a path can provide — per-path capability work);
  note the anticipated errorTaxonomy split (codes vs diagnostics).
- Encode the hybrid gap-closure strategy: runner-side parity for
  geometry-local rules, delegation-on-error for semantic failures (with
  the explicit caveat that delegation-on-error is NOT success-path
  parity), and a shared runtime preflight for native-ref where a silent
  backend success means delegation never triggers.
- Gap waivers now require a trackingIssue (gate-enforced URL); all 16
  pinned gaps link the umbrella issue #1081. The honest reclassification
  grew the pin list from 10 to 16 — responseConstruction is a gap on
  every path including runtime ones, which is exactly the partial
  progress the coarser guarantee was hiding.
- Align ADR wording with the code: parityTable is optional until
  Layer 3, required once a runner cell claims parity.
@thymikee thymikee marked this pull request as ready for review July 4, 2026 11:43
@thymikee

thymikee commented Jul 4, 2026

Copy link
Copy Markdown
Member Author

Design review applied in e43ce24 (review given out-of-band; recording the disposition here):

  • (framing) Layer 1 is now described — in the ADR, the registry header, and the gate — as an honesty/completeness gate, not a truth gate: it proves every path declared a stance and referenced symbols exist; behavioral parity starts with Layers 2–3.
  • (a) responseFields split into responseConstruction (one shared construction site — a single Layer-2 refactor) and responseIdentity (per-path capability). The split immediately did its job: the honest pin list grew from 10 to 16 gaps, because responseConstruction is a gap on every path including the runtime ones — exactly the partial progress the coarse guarantee was hiding. errorTaxonomy's anticipated codes-vs-diagnostics split is noted inline for when direct paths close codes ahead of diagnostics.
  • (b) The hybrid closure strategy is now a Decision section: runner-side parity for geometry-local rules; delegation-on-error for semantic failure shapes; the explicit caveat that delegation-on-error is not success-path parity (the direct-ios-selector/disambiguation waiver spells out the one-hittable-candidate case and stays a gap until proven); and the shared runtime preflight for native-ref, since a silently-succeeding backend fast path never triggers error delegation.
  • (c) gap: waivers now require a trackingIssue (gate-enforced URL format). All 16 link the umbrella issue Interaction guarantee gaps (ADR 0011 umbrella) #1081, which groups them by closure strategy; sub-issues can split off as work is scheduled.
  • (nit) ADR and code now agree: parityTable is optional until Layer 3, required once a runner cell claims parity.

Marked ready for review. Status stays Proposed for your formal accept.

@thymikee

thymikee commented Jul 4, 2026

Copy link
Copy Markdown
Member Author

Review pass after latest changes: the revised framing is much stronger, and the Layer-1 gate passes locally on e43ce2403 (pnpm exec vitest run src/contracts/__tests__/interaction-guarantees.test.ts, 7 tests). CI is also green.

I found two things I would address before treating the registry as review-ready:

  1. maestro-non-hittable-fallback/disambiguation looks overclaimed. The registry marks it as runner via RunnerTests+Interaction.swift#findElement, but the guarantee is defined as visible-first/deepest/smallest disambiguation. The same Swift finder just scans allElementsBoundByIndex, picks the single hittable match or the single non-hittable fallback, and reports ambiguity on a second match; it does not implement the runtime ranking semantics. The direct selector path correctly keeps this as a success-path parity gap for the same reason. I think the maestro fallback should either waive this as an intentional Maestro semantics divergence, or the guarantee needs to be split into weaker unique-or-ambiguous vs runtime-ranking disambiguation.

  2. verifyEvidence is declared at path level for runtime-selector, runtime-ref, and coordinate, whose commands arrays include longpress, but longpress does not expose verify in command metadata/CLI and LongPressCommandOptions has no verify field. That makes the matrix read as if every command on those paths carries the guarantee. I would either make guarantees command-scoped/qualified (e.g. appliesTo: ['click', 'press', 'fill']) or explicitly remove commands without the guarantee from that path-level claim.

No other blockers from the ADR/registry/test pass. The issue-linking and responseConstruction/responseIdentity split addressed the earlier design concerns well.

…d-scoped verify

1. maestro-non-hittable-fallback/disambiguation was overclaimed: the
   guarantee is defined as visible-first/deepest/smallest ranking, but
   findElement only implements unique-or-ambiguous scanning. Reclassified
   as an intentional waiver (deliberate Maestro-semantics divergence),
   mirroring how the direct path keeps its success-path parity gap.

2. verifyEvidence was claimed path-wide on paths that dispatch longpress,
   which has no --verify. Cells can now be command-scoped via appliesTo
   (non-empty strict subset of the path's commands, gate-enforced), and
   the three affected cells scope to press/click/fill.
@thymikee

thymikee commented Jul 4, 2026

Copy link
Copy Markdown
Member Author

Both findings addressed in the latest commit:

  1. maestro-non-hittable-fallback/disambiguation — you're right, that was an overclaim and a good demonstration of the gate's stated limit (symbol existence ≠ semantic match). findElement implements unique-or-ambiguous scanning, not the ranking the guarantee is defined as. Reclassified as an intentional waiver ("deliberate Maestro-semantics divergence from runtime ranking") rather than runner enforcement — chosen over splitting the guarantee, since a weaker unique-or-ambiguous tier would only ever have this one occupant; if a second path wants the weaker semantics later, splitting is the right move then.

  2. verifyEvidence vs longpress — added command scoping: cells may carry appliesTo (gate-enforced: non-empty, strict subset of the path's commands, and rejected if it redundantly lists every command). The three affected cells now scope to press/click/fill. ADR documents the qualifier.

Gate is now 8 tests; suites/fallow/typecheck green.

@thymikee thymikee added the ready-for-human Valid work that needs human implementation, judgment, or maintainer merge label Jul 4, 2026
@thymikee

thymikee commented Jul 4, 2026

Copy link
Copy Markdown
Member Author

Re-review on latest head f1ff6791: previous findings are resolved.

  • maestro-non-hittable-fallback/disambiguation is now an intentional waiver instead of overclaiming runner parity.
  • verifyEvidence now uses gate-checked appliesTo command scoping so longpress is not implied to support --verify.

Validation: pnpm exec vitest run src/contracts/__tests__/interaction-guarantees.test.ts passes locally (8 tests), and GitHub checks are green (21/21). Added ready-for-human.

@thymikee thymikee merged commit c506ddf into main Jul 4, 2026
21 checks passed
@thymikee thymikee deleted the design/interaction-guarantee-contract branch July 4, 2026 12:06
@github-actions

github-actions Bot commented Jul 4, 2026

Copy link
Copy Markdown
PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-07-04 12:06 UTC

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-human Valid work that needs human implementation, judgment, or maintainer merge

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant