[No QA] Add agent-device skill and flow metadata framework by kacper-mikolajczak · Pull Request #88474 · Expensify/App

kacper-mikolajczak · 2026-04-21T19:53:15Z

Note

Depends on #87662 - that PR introduces the agent-device skill, its flows/ directory, and the initial sign-in.ad recording. This PR layers a metadata framework on top. Review and merge #87662 first; once it lands, the diff here collapses to just the framework-specific work (headers, matcher loop, split peer flows, complete-onboarding.ad).

Details

Layers a snapshot-driven, composable flow framework on top of the agent-device skill added in #87662. Flows now self-describe their preconditions, postconditions, baked-in parameters, and tags via comment headers, enabling an agent to pre-filter which flow applies to the current screen before executing anything - making in-app automation token-efficient and resilient to UI drift.

Explanation of Change

This PR extends agent-device flows with a metadata layer. It does not reintroduce the base skill (that ships in #87662); it adds:

- A comment-header convention on .ad files (# @desc / # @pre / # @post / # @param / # @tag) that the replay parser already treats as no-ops, so headers cost nothing at runtime.
- An agent decision loop documented in SKILL.md that uses the headers plus existing CLI primitives (agent-device snapshot -i, agent-device is exists, agent-device replay) to match, pick, execute, and verify a flow against the current state.
- A peer-flow split: the flat sign-in.ad from #87662 becomes sign-in-new.ad / sign-in-returning.ad so agents can pick by @param account_state and fall back on @post mismatch. A new complete-onboarding.ad lands the user on Home.

Why flows need metadata

A flow without metadata is an opaque script. The agent cannot tell, from current state, whether the flow will do the right thing. That forces the agent to either read English prose and guess, or replay optimistically and hope. Both waste tokens and frequently land the session in a bad state.

The framework addresses this by asking every flow to declare, in # @-prefixed comment headers, the conditions under which it applies (@pre), where it will leave the app (@post), the constants it bakes in (@param), and a free-form category (@tag). The replay parser already treats # lines as no-ops, so headers are free at runtime.

Flow file anatomy

flows/sign-in-returning.ad
┌────────────────────────────────────────────────────────────────┐
│ # @desc    Sign in with the shared test account (returning).   │  ← Metadata
│ # @pre     role="textfield" label="Phone or email"             │    headers:
│ # @pre     role="button" label="Continue"                      │    parser
│ # @post    text="Home"                                         │    treats as
│ # @post    role="button" label="Search"                        │    comments;
│ # @param   email=agent-device-testing@gmail.com                │    agent reads
│ # @param   account_state=returning                             │    via grep.
│ # @tag     auth                                                │
├────────────────────────────────────────────────────────────────┤
│ fill "id=\"username\" || role=\"textfield\"..." "..."          │  ← Body:
│ press "role=\"button\" label=\"Continue\" || ..."              │    executed
└────────────────────────────────────────────────────────────────┘    verbatim
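To illustrate how an agent-side helper might consume these headers, here is a minimal Python sketch. It is not part of this PR: the `parse_headers` function and its regular expression are hypothetical, and it only assumes what the anatomy above shows, namely that header lines start with `# @` followed by a key and a value.

```python
import re
from collections import defaultdict

# Hypothetical helper: matches header lines such as
#   # @pre     role="button" label="Continue"
HEADER_RE = re.compile(r"^# @(\w+)\s+(.*\S)\s*$")


def parse_headers(flow_text):
    """Collect '# @key value' lines into {key: [values, ...]}; body lines are ignored."""
    headers = defaultdict(list)
    for line in flow_text.splitlines():
        match = HEADER_RE.match(line)
        if match:
            key, value = match.groups()
            headers[key].append(value)
    return dict(headers)


# Header lines copied from the anatomy above; the body line is abbreviated.
SAMPLE_FLOW = '''\
# @desc    Sign in with the shared test account (returning).
# @pre     role="textfield" label="Phone or email"
# @pre     role="button" label="Continue"
# @post    text="Home"
# @post    role="button" label="Search"
# @param   email=agent-device-testing@gmail.com
# @param   account_state=returning
# @tag     auth
press "role=\\"button\\" label=\\"Continue\\""
'''

headers = parse_headers(SAMPLE_FLOW)
print(headers["pre"])
print(headers["param"])
```

Repeated keys such as `@pre` are kept as lists, since a flow may declare several selectors that must all hold.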

Agent decision loop

With the metadata in place, an agent follows a single loop before touching the UI manually:

           ┌───────────────────┐
           │ snapshot current  │
           │ state             │
           └─────────┬─────────┘
                     ▼
           ┌───────────────────┐
           │ grep '^# @' over  │
           │ flows/*.ad        │
           └─────────┬─────────┘
                     ▼
           ┌───────────────────┐   none pass
           │ filter by @pre    │─────────────┐
           │ (is exists ...)   │             │
           └─────────┬─────────┘             │
                     ▼ some pass             │
           ┌───────────────────┐   mismatch  │
           │ filter by @param  │─────────────┤
           │ vs user intent    │             │
           └─────────┬─────────┘             │
                     ▼ match                 │
           ┌───────────────────┐             │
           │ pick by @post     │             │
           │ goal proximity    │             │
           └─────────┬─────────┘             │
                     ▼                       │
           ┌───────────────────┐             │
           │ agent-device      │             │
           │ replay <path>     │             │
           └─────────┬─────────┘             │
                     ▼                       │
           ┌───────────────────┐    fail     │
           │ verify @post      │─────────┐   │
           │ (is exists ...)   │         │   │
           └─────────┬─────────┘         ▼   ▼
                     ▼ pass          ┌──────────────┐
           ┌───────────────────┐     │ try peer,    │
           │ goal reached?     │     │ else go      │
           │ yes → done        │     │ manual       │
           │ no  → loop        │     └──────────────┘
           └───────────────────┘
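The filtering steps of the loop above can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation in SKILL.md: `select_flows`, `verify_post`, and the `exists` callback are hypothetical names, with `exists` standing in for a call to `agent-device is exists <sel>`. The `@post` selectors shown for sign-in-new are placeholders, since the PR text only describes them as "Welcome + Join".

```python
def params_of(flow):
    """Turn ['key=value', ...] @param entries into a dict."""
    return dict(entry.split("=", 1) for entry in flow.get("param", []))


def select_flows(catalog, exists, intent):
    """Return names of flows whose @pre selectors all hold and whose
    @param values do not contradict the user's intent."""
    chosen = []
    for name, flow in catalog.items():
        if not all(exists(sel) for sel in flow.get("pre", [])):
            continue  # filter by @pre
        params = params_of(flow)
        if any(k in params and params[k] != v for k, v in intent.items()):
            continue  # filter by @param vs user intent
        chosen.append(name)
    return chosen


def verify_post(flow, exists):
    """After replay, every @post selector should be present."""
    return all(exists(sel) for sel in flow.get("post", []))


AUTH_WALL = ['role="textfield" label="Phone or email"', 'role="button" label="Continue"']

catalog = {
    "sign-in-new": {
        "pre": AUTH_WALL,
        "param": ["account_state=new"],
        # Placeholder selectors (assumption): the real headers may differ.
        "post": ['text="Welcome"', 'text="Join"'],
    },
    "sign-in-returning": {
        "pre": AUTH_WALL,
        "param": ["email=agent-device-testing@gmail.com", "account_state=returning"],
        "post": ['text="Home"', 'role="button" label="Search"'],
    },
}

# A fake screen replaces the real device in this sketch.
screen = set(AUTH_WALL)
picked = select_flows(catalog, screen.__contains__, {"account_state": "returning"})
print(picked)
```

Injecting `exists` as a callback keeps the matching logic testable without a running device; a real caller would presumably wrap the CLI there.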

Composition

Flows are narrow snippets, not self-contained scripts. They have no open / close / context and no fixed wait calls - the caller owns the session. That keeps them chainable:

          [auth wall]
              │
              ▼
  ┌─────────────────────────┐   @pre: textfield + Continue
  │ sign-in-new.ad          │   @param: account_state=new
  │                         │   @post: Welcome + Join
  └───────────┬─────────────┘
              ▼
       [Welcome / Join]
              │  (one manual tap - documented gap)
              ▼
    [onboarding step 1]
              │
              ▼
  ┌─────────────────────────┐   @pre: "What's your work email?"
  │ complete-onboarding.ad  │   @param: purpose, first_name, last_name
  │                         │   @post: Home + Search
  └───────────┬─────────────┘
              ▼
          [Home]  ✓ goal
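The rule that flows carry no open / close / context and no fixed wait calls could be checked mechanically. The Python sketch below is a hypothetical lint, not something this PR ships. It assumes that each body line of an .ad file starts with its command name, and it takes the forbidden command names directly from the sentence above; the actual .ad grammar may differ.

```python
# Command names taken from the prose above (assumption: they appear as the
# first token of a body line in .ad files).
FORBIDDEN = {"open", "close", "context", "wait"}


def session_commands(flow_text):
    """Return (line_number, command) pairs for body lines that start with a
    session-owning or fixed-wait command."""
    hits = []
    for number, line in enumerate(flow_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blanks and header/comment lines
        command = stripped.split()[0]
        if command in FORBIDDEN:
            hits.append((number, command))
    return hits


# Illustrative flow text; the 'open' and 'wait' lines are made up for the check.
BAD_FLOW = '''\
# @tag     auth
open
press "role=\\"button\\" label=\\"Continue\\""
wait 1000
'''

print(session_commands(BAD_FLOW))
```

A flow that passes this check leaves session setup and teardown to the caller, which is what makes chaining possible.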

Peer flows (e.g. sign-in-new / sign-in-returning) share the same @pre but differ on @param and @post. The agent tries the param-matching peer first and falls back to the other when the post-check fails - the decision loop catches the miss before the session is corrupted.
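The peer fallback described above might look like the following Python sketch. The names `order_peers` and `run_with_fallback` are hypothetical, and `replay` and `post_holds` are injected callbacks standing in for `agent-device replay <path>` and the `@post` check; nothing here is taken from the PR's own code.

```python
def order_peers(peers, intent):
    """Put peers whose @param values match the intent first (stable sort)."""
    def mismatches(item):
        _, params = item
        return sum(1 for key, value in intent.items() if params.get(key) != value)
    return [name for name, _ in sorted(peers.items(), key=mismatches)]


def run_with_fallback(order, replay, post_holds):
    """Replay peers in order; return the first whose @post check passes.
    Returns None when every peer misses, so the caller can go manual."""
    for name in order:
        replay(name)
        if post_holds(name):
            return name
    return None


peers = {
    "sign-in-returning": {"account_state": "returning"},
    "sign-in-new": {"account_state": "new"},
}

# The user asked for a new account, but in this simulated run the account
# already exists, so only the returning flow's @post holds.
order = order_peers(peers, {"account_state": "new"})
replayed = []
result = run_with_fallback(
    order,
    replay=replayed.append,
    post_holds=lambda name: name == "sign-in-returning",
)
print(order, result)
```

Returning `None` rather than raising mirrors the "try peer, else go manual" branch of the decision loop.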

Matching primitives

Nothing new was added to the agent-device CLI. The framework uses commands already shipped with the tool:

| Primitive | Purpose |
| --- | --- |
| `grep '^# @' flows/*.ad` | Discover the whole catalog in one read. |
| `agent-device snapshot -i` | See current UI state. |
| `agent-device is exists <sel>` | Check a single `@pre` or `@post` selector. |
| `agent-device replay <path>` | Execute the flow body. |

Fixed Issues

$ #88388
PROPOSAL:

Tests

Offline tests

N/A - changes are agent-tooling under .claude/ and do not affect app runtime behavior or network state.

QA Steps

N/A - no shipped-app changes. Title includes [No QA].

Verify that no errors appear in the JS console

PR Author Checklist

Screenshots/Videos

Android: Native

Android: mWeb Chrome

iOS: Native

iOS: mWeb Safari

MacOS: Chrome / Safari

Replace the inlined sign-in walkthrough in SKILL.md with a pointer to a flows/ directory of .ad replay recordings. Each flow is invoked on explicit developer intent (not via snapshot matching) to keep the deterministic path free of LLM reasoning. Adds flows/README.md as the index; actual .ad recordings will be added once captured against a running app.

Replace the manual "ask the agent to run agent-device --version and npm root -g" instructions with dynamic context injection using the !`cmd` syntax. Commands run at skill load time (preprocessing), so the resolved version and canonical skill path land in the skill content directly - no tool call required from the agent. Pre-approves Bash(agent-device *) via allowed-tools in the skill frontmatter and also via .claude/settings.json so fresh checkouts do not get a permission prompt during the preprocessing step. Addresses Expensify#87662 (comment)

Add a Mobile Device Testing subsection parallel to Browser Testing in CLAUDE.md, and an optional AI-assisted testing callout in README after Platform-Specific Setup. Makes the agent-device skill discoverable for Claude Code users without claiming it's required setup.

Introduce `# @desc` / `# @pre` / `# @post` / `# @param` / `# @tag` comment headers in `.ad` flows. The replay parser already treats `#` lines as no-ops, so headers cost nothing at replay time while giving agents a machine-matchable catalog.

- `@pre` / `@post` are selectors (same syntax as the flow body) that agents verify with `agent-device is exists`. This enables catalog filtering by current snapshot state and post-replay success checks.
- `@param` advertises baked-in constants (email, account_state, names) so agents can match flows to user intent and skip when mismatched.
- `@tag` supports free-form coarse categorization.

Document the matcher loop in `SKILL.md`: snapshot -> grep catalog -> filter by `@pre` -> filter by `@param` -> pick by `@post` goal -> replay -> verify.

Flesh out `flows/README.md` with the header spec, authoring rules, and updated recording workflow.

Split the prior `sign-in.ad` into peer flows `sign-in-new.ad` and `sign-in-returning.ad` that share `@pre` (auth wall) but differ on `@param account_state` and `@post`.

Add `complete-onboarding.ad` that skips the work-email step, picks a generic purpose, fills placeholder name fields, and lands on Home.


kacper-mikolajczak · 2026-04-23T07:50:52Z

Currently we are not able to parametrise the flows, so they consist of hard-coded values. We are considering an upstream feature to enable that (for more details, please see callstackincubator/agent-device#432).


kacper-mikolajczak added 4 commits April 20, 2026 18:59

melvin-bot Bot assigned kacper-mikolajczak Apr 21, 2026

kacper-mikolajczak mentioned this pull request Apr 21, 2026

[No QA] Add agent-device skill and flow metadata framework #88403

Closed

51 tasks

BartekObudzinski and others added 5 commits April 22, 2026 10:58

Add flows for navigating and interacting with the Inbox, that cover important sentry spans

4692e36

Add new flows for creating expenses, navigating back, capturing odometer photos, opening reports, scanning receipts, and submitting expenses. Each flow includes metadata headers for better organization and tracking of Sentry spans.

9acaa0f

Merge remote-tracking branch 'upstream/main' into agent-device-flow-metadata # Conflicts: # .claude/settings.json # .claude/skills/agent-device/SKILL.md

0f24363

Update agent-device flows to replace # @span with # @tag for Sentry integration. This change enhances tracking and organization of flows related to creating expenses, opening reports, scanning receipts, and navigating the app.

acc223b

Merge branch 'agent-device-flow-metadata' of github.com:callstack-internal/Expensify-App into agent-device-flow-metadata

fc578ed

Refactor agent-device flows to improve button role and label handling. Updated various flows to streamline button interactions and enhance metadata for Sentry tracking, including changes to expense creation, navigation, and report opening processes.

5881464

kacper-mikolajczak wants to merge 10 commits into Expensify:main from callstack-internal:agent-device-flow-metadata
