[No QA] Add agent-device skill and flow metadata framework #88474
Draft
kacper-mikolajczak wants to merge 10 commits into Expensify:main
Replace the inlined sign-in walkthrough in SKILL.md with a pointer to a flows/ directory of .ad replay recordings. Each flow is invoked on explicit developer intent (not via snapshot matching) to keep the deterministic path free of LLM reasoning. Adds flows/README.md as the index; actual .ad recordings will be added once captured against a running app.
Replace the manual "ask the agent to run agent-device --version and npm root -g" instructions with dynamic context injection using the !\`cmd\` syntax. Commands run at skill load time (preprocessing), so the resolved version and canonical skill path land in the skill content directly - no tool call required from the agent. Pre-approves Bash(agent-device *) via allowed-tools in the skill frontmatter and also via .claude/settings.json so fresh checkouts do not get a permission prompt during the preprocessing step. Addresses Expensify#87662 (comment)
Add a Mobile Device Testing subsection parallel to Browser Testing in CLAUDE.md, and an optional AI-assisted testing callout in README after Platform-Specific Setup. Makes the agent-device skill discoverable for Claude Code users without claiming it's required setup.
Introduce `# @desc` / `# @pre` / `# @post` / `# @param` / `# @tag` comment headers in `.ad` flows. The replay parser already treats `#` lines as no-ops, so headers cost nothing at replay time while giving agents a machine-matchable catalog.
- `@pre` / `@post` are selectors (same syntax as the flow body) that agents verify with `agent-device is exists`. This enables catalog filtering by current snapshot state and post-replay success checks.
- `@param` advertises baked-in constants (email, account_state, names) so agents can match flows to user intent and skip when mismatched.
- `@tag` supports free-form coarse categorization.
Document the matcher loop in `SKILL.md`: snapshot -> grep catalog -> filter by `@pre` -> filter by `@param` -> pick by `@post` goal -> replay -> verify.
Flesh out `flows/README.md` with the header spec, authoring rules, and updated recording workflow.
Split the prior `sign-in.ad` into peer flows `sign-in-new.ad` and `sign-in-returning.ad` that share `@pre` (auth wall) but differ on `@param account_state` and `@post`.
Add `complete-onboarding.ad` that skips the work-email step, picks a generic purpose, fills placeholder name fields, and lands on Home.
…mportant sentry spans
…ter photos, opening reports, scanning receipts, and submitting expenses. Each flow includes metadata headers for better organization and tracking of Sentry spans.
…etadata # Conflicts: # .claude/settings.json # .claude/skills/agent-device/SKILL.md
…ry integration. This change enhances tracking and organization of flows related to creating expenses, opening reports, scanning receipts, and navigating the app.
…ernal/Expensify-App into agent-device-flow-metadata
Currently we cannot parametrise the flows, so they consist of hard-coded values. We are considering an upstream feature to enable that (for more details, see callstackincubator/agent-device#432).
…. Updated various flows to streamline button interactions and enhance metadata for Sentry tracking, including changes to expense creation, navigation, and report opening processes.
Note
Depends on #87662 - that PR introduces the `agent-device` skill, its `flows/` directory, and the initial `sign-in.ad` recording. This PR layers a metadata framework on top. Review and merge #87662 first; once it lands, the diff here collapses to just the framework-specific work (headers, matcher loop, split peer flows, `complete-onboarding.ad`).
Details
Layers a snapshot-driven, composable flow framework on top of the `agent-device` skill added in #87662. Flows now self-describe their preconditions, postconditions, baked-in parameters, and tags via comment headers, enabling an agent to pre-filter which flow applies to the current screen before executing anything - making in-app automation token-efficient and resilient to UI drift.
Explanation of Change
This PR extends `agent-device` flows with a metadata layer. It does not reintroduce the base skill (that ships in #87662); it adds:
- Comment headers in `.ad` files (`# @desc` / `# @pre` / `# @post` / `# @param` / `# @tag`) that the replay parser already treats as no-ops, so headers cost nothing at runtime.
- A matcher loop in `SKILL.md` that uses the headers plus existing CLI primitives (`agent-device snapshot -i`, `agent-device is exists`, `agent-device replay`) to match, pick, execute, and verify a flow against the current state.
- Split peer flows: `sign-in.ad` from #87662 ([NoQA] Add agent-device glue-code skill for mobile testing) becomes `sign-in-new.ad` / `sign-in-returning.ad` so agents can pick by `@param account_state` and fall back on `@post` mismatch. A new `complete-onboarding.ad` lands the user on Home.
Why flows need metadata
A flow without metadata is an opaque script. The agent cannot tell, from current state, whether the flow will do the right thing. That forces the agent to either read English prose and guess, or replay optimistically and hope. Both waste tokens and frequently land the session in a bad state.
The framework addresses this by asking every flow to declare, in `# @`-prefixed comment headers, the conditions under which it applies (`@pre`), where it will leave the app (`@post`), the constants it bakes in (`@param`), and a free-form category (`@tag`). The replay parser already treats `#` lines as no-ops, so headers are free at runtime.
Flow file anatomy
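As a rough illustration of how cheaply such headers can be read, here is a minimal Python sketch that collects the `# @` lines of one flow into a dictionary. The sample flow text, its selector placeholders, the `key=value` shape of `@param`, and the `parse_headers` helper are all hypothetical; only the header keys (`@desc`, `@pre`, `@post`, `@param`, `@tag`) come from this PR, and the real `.ad` body syntax is not shown.

```python
import re
from collections import defaultdict

# Hypothetical flow text. The header keys come from this PR; the selector
# placeholders, the key=value form of @param, and the elided body are
# stand-ins, not real .ad syntax.
SAMPLE_FLOW = """\
# @desc Sign in as a returning user
# @pre <auth-wall-selector>
# @post <home-screen-selector>
# @param account_state=returning
# @tag auth
# (flow body elided)
"""

# Matches lines of the form "# @key value".
HEADER_RE = re.compile(r"^#\s*@(\w+)\s+(.*\S)\s*$")


def parse_headers(text: str) -> dict[str, list[str]]:
    """Collect `# @key value` lines; a key may repeat, so values are lists."""
    headers: dict[str, list[str]] = defaultdict(list)
    for line in text.splitlines():
        match = HEADER_RE.match(line)
        if match:
            headers[match.group(1)].append(match.group(2))
    return dict(headers)


headers = parse_headers(SAMPLE_FLOW)
print(headers["pre"])    # selectors to check before replaying
print(headers["param"])  # baked-in constants
```

Plain comment lines without an `@` key are ignored, which mirrors the way the replay parser skips every `#` line.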
Agent decision loop
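With the metadata in place, an agent follows a single loop before touching the UI manually: snapshot, grep the catalog, filter by `@pre`, filter by `@param`, pick by `@post` goal, replay, and verify.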
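The selection part of this loop can be sketched in Python as a pure function. This is a minimal sketch under stated assumptions: `Flow`, `pick_flow`, the `key=value` shape of the parameters, and the `exists` callback (a stand-in for running `agent-device is exists <sel>` against the live app) are hypothetical names introduced for illustration, not part of the skill.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Flow:
    """One catalog entry, as read from a flow's `# @` headers (hypothetical shape)."""
    path: str
    pre: list[str] = field(default_factory=list)
    post: list[str] = field(default_factory=list)
    params: dict[str, str] = field(default_factory=dict)


def pick_flow(
    catalog: list[Flow],
    exists: Callable[[str], bool],
    wanted_params: dict[str, str],
    goal: str,
) -> Optional[Flow]:
    """Filter by @pre, then by @param, then pick a flow whose @post names the goal."""
    for flow in catalog:
        # @pre: every precondition selector must be present on the current screen.
        if not all(exists(sel) for sel in flow.pre):
            continue
        # @param: skip flows whose baked-in constants contradict the user's intent.
        if any(flow.params.get(k, v) != v for k, v in wanted_params.items()):
            continue
        # @post: the flow must promise to land on the requested goal.
        if goal in flow.post:
            return flow
    return None  # no match: the agent drives the UI manually


# Placeholder selectors; real ones use the same syntax as the flow body.
catalog = [
    Flow("flows/sign-in-new.ad", ["<auth-wall>"], ["<onboarding>"], {"account_state": "new"}),
    Flow("flows/sign-in-returning.ad", ["<auth-wall>"], ["<home>"], {"account_state": "returning"}),
]
on_screen = {"<auth-wall>"}
chosen = pick_flow(catalog, on_screen.__contains__, {"account_state": "returning"}, "<home>")
print(chosen.path if chosen else "no matching flow")
```

After replaying the chosen flow, the agent would run the same `exists` check against its `@post` selectors to confirm the replay landed where the header promised.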
Composition
Flows are narrow snippets, not self-contained scripts. They have no `open`/`close`/`context` and no fixed `wait` calls - the caller owns the session. That keeps them chainable.
Peer flows (e.g. `sign-in-new`/`sign-in-returning`) share the same `@pre` but differ on `@param` and `@post`. The agent tries the param-matching peer first and falls back to the other when the post-check fails - the decision loop catches the miss before the session is corrupted.
Matching primitives
Nothing new was added to the `agent-device` CLI. The framework uses commands already shipped with the tool:
- `grep '^# @' flows/*.ad`
- `agent-device snapshot -i`
- `agent-device is exists <sel>`: verifies a `@pre` or `@post` selector.
- `agent-device replay <path>`
Fixed Issues
$ #88388
PROPOSAL:
Tests
Offline tests
N/A - changes are agent-tooling under `.claude/` and do not affect app runtime behavior or network state.
QA Steps
N/A - no shipped-app changes. Title includes `[No QA]`.
PR Author Checklist
Screenshots/Videos
Android: Native
Android: mWeb Chrome
iOS: Native
iOS: mWeb Safari
MacOS: Chrome / Safari