diff --git a/AGENTS.md b/AGENTS.md index f537530b..968bb560 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -153,7 +153,9 @@ Command-only flags (like `find --first`) that don't flow to the platform layer o - Do not duplicate `makeSessionStore`, `makeSession`, or device constants when a shared helper already exists. ## Testing Matrix -- Docs/skills only: no tests required. +- Docs/skills only: no tests required unless a more specific rule below applies. +- CLI help/guidance changes in `src/utils/command-schema.ts`: run `pnpm exec vitest run src/utils/__tests__/args.test.ts`. +- SkillGym prompt/assertion changes: run the touched `--case` checks; for broad validation, run cases in batches of 20 or fewer because full-suite runs can hang. - Non-TS, no behavior impact: no tests unless requested. - Keep tests behavioral; do not assert shapes or cases TypeScript already proves. - Any TS change: `pnpm typecheck` or `pnpm check:quick`. @@ -182,18 +184,18 @@ Command-only flags (like `find --first`) that don't flow to the platform layer o - Changing `tsconfig.lib.json`/build tooling without running `pnpm check:tooling`; declaration generation is stricter than `tsc --noEmit`. ## Docs & Skills -- For behavior/CLI surface changes, evaluate docs/skills updates. -- Update `README.md` and relevant `website/docs/**` pages for command behavior/flags/aliases/workflows. -- Update relevant `skills/**/SKILL.md` when usage examples/workflow recommendations change. -- Keep skill docs task-first: - - top-level `SKILL.md` should stay a thin router, not a full manual. - - keep detailed workflows/troubleshooting in a `references/` folder instead of growing the router. - - isolate true platform/infra exceptions (for example macOS-only or remote-tenancy-only guidance) in dedicated files. - - do not delete high-value operational guidance during refactors; move or condense it unless the behavior is obsolete. 
-- Optimize skills for cheap, less capable models: - - keep routing explicit, shallow, and easy to follow in one pass. - - prefer short task-first steps, concrete commands, and low-ambiguity wording over dense prose. - - avoid long reference chains or “figure it out” guidance when a direct next action can be stated. +- Versioned CLI help is the agent-facing source of truth. Put workflow guidance in `src/utils/command-schema.ts` help topics and assert important copy in `src/utils/__tests__/args.test.ts`. +- Skills are thin routers. Keep `skills/**/SKILL.md` focused on when to use the skill, version gating, which `agent-device help ` page to read, and a short default loop. Do not duplicate full CLI manuals in skills. +- For behavior/CLI surface changes, update `README.md`, relevant `website/docs/**`, and router skills only when their short routing guidance or version assumptions change. +- For command-planning guidance changes, update `test/skillgym/suites/agent-device-smoke-suite.ts` when the change should alter what an agent plans. +- Keep SkillGym cases behavioral and command-planning oriented. Prefer prompts that assert the user-visible contract and expected command family over brittle exact output, but forbid known bad patterns. +- Build before SkillGym when local CLI help is needed: `pnpm build`, then `pnpm exec skillgym run ... --case `. +- Run SkillGym broad validation in batches of 20 cases or fewer using repeated `--case` runs; do not rely on one full-suite invocation for large runs. +- Preserve current high-value workflow guidance: + - iOS Expo Go dogfood: prefer `agent-device open "Expo Go" --platform ios` when the shell is known, then `snapshot -i` to confirm the project UI rather than the runner splash. + - `keyboard dismiss` is best-effort on iOS; prefer a visible app dismiss control, or `back --system` only when system navigation is acceptable. + - Empty replacement is not a supported clear-field command; do not document or test `fill ""` as clearing. 
Prefer visible clear/reset controls or report the tool gap. + - Mutating commands against one session must run serially. Parallelize only read-only commands or commands on separate sessions/devices. - In final summaries, state whether docs/skills were updated; if not, explain why. ## When Blocked diff --git a/skills/agent-device/SKILL.md b/skills/agent-device/SKILL.md index ced7e006..4c6d2643 100644 --- a/skills/agent-device/SKILL.md +++ b/skills/agent-device/SKILL.md @@ -31,4 +31,4 @@ agent-device help dogfood Default loop: `open -> snapshot/-i -> get/is/find or press/fill/scroll/wait -> verify -> close`. -Keep refs current, prefer selectors/refs over coordinates, use `fill` to replace text, and use `back` for app-owned navigation. Let `help workflow` provide the exact command shapes. +Use this skill only to route into version-matched CLI help. Let `help workflow` provide exact command shapes, platform limits, and current workflow guidance. diff --git a/skills/dogfood/SKILL.md b/skills/dogfood/SKILL.md index a2deaf87..2295edda 100644 --- a/skills/dogfood/SKILL.md +++ b/skills/dogfood/SKILL.md @@ -22,4 +22,4 @@ agent-device help dogfood Loop: open named session -> snapshot -i + screenshot -> explore flows -> capture evidence per issue -> close. -Target app is required; infer platform or ask. Default output is `./dogfood-output/`. Findings must come from runtime behavior, not source reads. Re-snapshot after mutations. Use logs, network, trace, perf, overlay screenshots, or react-devtools only when they add evidence. +Target app is required; infer platform or ask. Findings must come from runtime behavior, not source reads. Let `help dogfood` provide exact report shape, evidence commands, and current workflow guidance. 
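The SkillGym batching rule in AGENTS.md says to run at most 20 cases per run, use repeated `--case` runs, and never rely on one large full-suite invocation. It can be sketched as a small TypeScript planner. This helper is hypothetical and not part of the repository. `planSkillGymBatches` and `runArgs` are illustrative names. `runArgs` stands in for the suite/config arguments that the guidance leaves as `...`. The sketch also assumes `skillgym run` accepts several `--case` flags in one invocation.

```typescript
// Hypothetical planner: split SkillGym case ids into batches of at most
// `maxPerBatch` and build one argv per batch. Assumes `skillgym run` accepts
// repeated `--case <id>` flags; `runArgs` is a placeholder for the suite/config
// arguments elided in AGENTS.md ("skillgym run ... --case").
function planSkillGymBatches(
  caseIds: readonly string[],
  runArgs: readonly string[],
  maxPerBatch = 20,
): string[][] {
  if (!Number.isInteger(maxPerBatch) || maxPerBatch < 1) {
    throw new RangeError(`maxPerBatch must be a positive integer, got ${maxPerBatch}`);
  }
  const batches: string[][] = [];
  for (let start = 0; start < caseIds.length; start += maxPerBatch) {
    const argv = ['pnpm', 'exec', 'skillgym', 'run', ...runArgs];
    // One `--case <id>` pair per case in this slice of at most maxPerBatch ids.
    for (const id of caseIds.slice(start, start + maxPerBatch)) {
      argv.push('--case', id);
    }
    batches.push(argv);
  }
  return batches;
}
```

Run `pnpm build` once first. Then execute each returned argv one after another, for example with Node's `child_process.spawnSync`. If a run hangs, only one batch of at most 20 cases is affected.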
diff --git a/src/utils/__tests__/args.test.ts b/src/utils/__tests__/args.test.ts index 968e8d55..fa0e256b 100644 --- a/src/utils/__tests__/args.test.ts +++ b/src/utils/__tests__/args.test.ts @@ -796,9 +796,11 @@ test('usage includes agent workflows, config, environment, and examples footers' assert.match(usageText, /Truncated text\/input preview: expand first with snapshot -s @e12/); assert.match(usageText, /RN warning\/error overlays can block taps: snapshot -i/); assert.match(usageText, /Expo Go\/dev clients: use the provided URL when given/); - assert.match(usageText, /if only a target name is given, open that target/); + assert.match(usageText, /on iOS prefer open "Expo Go" /); assert.match(usageText, /Install flows: install\/install-from-source first/); assert.match(usageText, /fill 'id="field-email"' "qa@example\.com" replaces/); + assert.match(usageText, /do not use fill ""/); + assert.match(usageText, /Run mutating commands serially against one session/); assert.match(usageText, /After mutation: diff snapshot -i/); assert.match(usageText, /app-owned back uses back/); assert.match(usageText, /logs clear --restart\/mark\/path/); @@ -856,6 +858,15 @@ test('usageForCommand resolves workflow help topic', () => { assert.match(help, /report that gap instead of typing\/searching\/navigating/); assert.match(help, /If snapshot -i shows one, dismiss\/close its visible control/); assert.match(help, /iOS Allow Paste prompt cannot be exercised under XCUITest/); + assert.match(help, /Empty replacement is not a supported clear-field command/); + assert.match(help, /do not plan fill ""/); + assert.match(help, /iOS keyboard dismiss is best-effort/); + assert.match(help, /UNSUPPORTED_OPERATION/); + assert.match(help, /Stateful commands against one --session must run serially/); + assert.match( + help, + /Do not run open\/press\/fill\/type\/scroll\/back\/alert\/replay\/batch\/close commands in parallel/, + ); assert.match(help, /agent-device clipboard write "some text"/); 
assert.match(help, /trusted ADB keyboard IME/); assert.match(help, /if no URL is provided but a target\/app name is provided, open that target/); @@ -863,6 +874,8 @@ test('usageForCommand resolves workflow help topic', () => { assert.match(help, /do not write network log headers/); assert.match(help, /agent-device open exp:\/\/127\.0\.0\.1:8081 --platform ios/); assert.match(help, /agent-device open "Expo Go" exp:\/\/127\.0\.0\.1:8081 --platform ios/); + assert.match(help, /direct URL open can report success while leaving the runner\/shell focused/); + assert.match(help, /verify with snapshot -i after opening/); assert.match(help, /agent-device open exp:\/\/127\.0\.0\.1:8081 --platform android/); assert.match(help, /apps lookup misses the project but shows Expo Go\/dev-client/); assert.match(help, /metro prepare --kind expo/); @@ -909,6 +922,8 @@ test('usageForCommand resolves dogfood help topic', () => { assert.match(help, /Static\/on-load issues can use one screenshot/); assert.match(help, /React Native warning\/error overlays can be real findings/); assert.match(help, /Expo Go\/dev-client shells/); + assert.match(help, /Keep stateful commands serial within the same session/); + assert.match(help, /prefer agent-device open "Expo Go" /); assert.match(help, /dogfood-output\/report\.md/); assert.match(help, /ID, severity, category, title, affected flow\/screen/); assert.match(help, /Never delete screenshots, videos, traces, or report artifacts/); diff --git a/src/utils/command-schema.ts b/src/utils/command-schema.ts index e5d84939..cbe9f6a0 100644 --- a/src/utils/command-schema.ts +++ b/src/utils/command-schema.ts @@ -180,9 +180,11 @@ const AGENT_QUICKSTART_LINES = [ 'Read-only visible/state question: use snapshot/get/is/find; use snapshot -i only when refs are needed.', 'Truncated text/input preview: expand first with snapshot -s @e12, not get text.', 'RN warning/error overlays can block taps: snapshot -i, dismiss/close, then diff snapshot -i.', - 'Expo Go/dev 
clients: use the provided URL when given; if only a target name is given, open that target and do not search project files for a URL.', + 'Expo Go/dev clients: use the provided URL when given; on iOS prefer open "Expo Go" when the host shell is known.', 'Install flows: install/install-from-source first, then open the installed id with --relaunch.', 'Text: fill \'id="field-email"\' "qa@example.com" replaces; type appends after press.', + 'Clearing text: do not use fill ""; use a visible clear/reset control or report that clearing is unsupported.', + 'Run mutating commands serially against one session; parallelize only read-only commands or separate sessions.', 'Clipboard limits: iOS Allow Paste cannot be automated through XCUITest; prefill with clipboard write. Android non-ASCII should use fill/type, not raw adb input.', 'After mutation: diff snapshot -i. Off-screen hints: scroll, then snapshot -i.', 'Raw coordinates are fallback-only: use snapshot -i -c --json rects when iOS refs no-op or child refs are missing.', @@ -283,11 +285,17 @@ Text entry: agent-device fill 'id="field-email"' "qa@example.com" agent-device press 'id="product-note"' agent-device type "Handle with care" --delay-ms 80 - Debounced field with no result selector: agent-device wait 1000. Keyboard read-only: keyboard status/get. Blocked control: keyboard dismiss. + Empty replacement is not a supported clear-field command: do not plan fill "" or fill ''. Prefer a visible clear/reset control; if the app exposes none, report the tool gap instead of inventing a clear command. + Debounced field with no result selector: agent-device wait 1000. Keyboard read-only: keyboard status/get. Blocked control: try keyboard dismiss when supported. + iOS keyboard dismiss is best-effort and can return UNSUPPORTED_OPERATION when no native dismiss gesture/control is available. Prefer a visible app dismiss control, or use back --system only when system navigation is an acceptable side effect. 
Search-as-you-type fields on iOS can drop characters when driven too fast; use --delay-ms on fill/type before trying clipboard paste. iOS Allow Paste prompt cannot be exercised under XCUITest. To test paste-driven app behavior, prefill first with agent-device clipboard write "some text"; test the system prompt manually. Android non-ASCII can fail on some system images. Try fill/type normally; agent-device uses safer fallbacks. If the shell reports unsupported non-ASCII input, configure a trusted ADB keyboard IME outside the command plan and restore the previous IME afterward. +Session ordering: + Stateful commands against one --session must run serially. Do not run open/press/fill/type/scroll/back/alert/replay/batch/close commands in parallel against the same session. + It is fine to parallelize independent read-only collection or commands that use different sessions/devices. + Read-only and waits: Read-only visible/state question: use snapshot/get/is/find. agent-device snapshot @@ -334,9 +342,11 @@ React Native dev loop: agent-device find "Home" Do not use agent-device reload. Use open --relaunch for native startup reset. Warning/error overlays can obscure UI and intercept taps. If snapshot -i shows one, dismiss/close its visible control (for example Dismiss or Close) if it is not the task target, then diff snapshot -i or snapshot -i before tapping the real UI. - Expo Go is a host shell. Use a provided project URL instead of inventing a bundle id; if no URL is provided but a target/app name is provided, open that target and do not inspect project files to find one. iOS simulators can open a URL directly; use host + URL when targeting a specific host shell: - agent-device open exp://127.0.0.1:8081 --platform ios + Expo Go is a host shell. Use a provided project URL instead of inventing a bundle id; if no URL is provided but a target/app name is provided, open that target and do not inspect project files to find one. 
On iOS, prefer host + URL when the host shell is known because direct URL open can report success while leaving the runner/shell focused; verify with snapshot -i after opening: agent-device open "Expo Go" exp://127.0.0.1:8081 --platform ios + agent-device snapshot -i --platform ios + Direct iOS URL open remains valid when no host shell is known, but verify that the app UI loaded: + agent-device open exp://127.0.0.1:8081 --platform ios Android uses the URL target directly; do not write open there: agent-device open exp://127.0.0.1:8081 --platform android If apps lookup misses the project but shows Expo Go/dev-client and a project URL is available, open the URL/host shell; if no URL is available, ask instead of inventing an app id. @@ -536,11 +546,12 @@ Loop: 4. Map top-level navigation, then exercise primary flows and edge states. 5. For each issue, capture evidence and write the finding immediately, then continue. 6. Close the session and reconcile the report summary. + Keep stateful commands serial within the same session. Parallel runs can pollute text fields, focus, alerts, and navigation state. Coverage: Navigation, forms, empty/error/loading states, offline or retry behavior, permissions, settings, accessibility labels, orientation/keyboard, and obvious performance stalls. React Native warning/error overlays can be real findings or test blockers. Capture them, dismiss if unrelated, re-snapshot, and report them. - Expo Go/dev-client shells: use the provided exp:// or dev-client URL and record whether the shell, project load, or app UI is being tested. + Expo Go/dev-client shells: use the provided exp:// or dev-client URL and record whether the shell, project load, or app UI is being tested. On iOS dogfood, prefer agent-device open "Expo Go" when Expo Go is the known shell, then snapshot -i to confirm the project UI rather than the runner splash. Categories: visual, functional, UX, content, performance, diagnostics, permissions, accessibility. 
Severity: critical blocks a core flow/data/crashes; high breaks a major feature; medium has friction or workaround; low is polish. diff --git a/test/skillgym/README.md b/test/skillgym/README.md index 3e7bfe12..c24158dd 100644 --- a/test/skillgym/README.md +++ b/test/skillgym/README.md @@ -16,7 +16,7 @@ The included suite focuses on the first two layers so it stays stable and CI-saf - `../../examples/test-app/`: minimal Expo SDK 55 fixture app for broad UI coverage - `skillgym.config.ts`: starter config that runs Codex and Claude Haiku against this repo -- `suites/agent-device-smoke-suite.ts`: 66-case suite for skill routing, fixture-aware planning, and skill-guidance regressions +- `suites/agent-device-smoke-suite.ts`: planning suite for skill routing, fixture-aware flows, and skill-guidance regressions ## Current coverage @@ -28,7 +28,7 @@ Fixture smoke cases cover concrete app surfaces: - banners, alerts, toggles, and quick actions on Home - search debounce, filters, long-list scroll, favorites, and cart updates in Catalog - detail navigation, quantity edits, note append, and save-to-cart on Product -- form validation, success submit, keyboard dismiss, and reset on Checkout form +- form validation, success submit, iOS keyboard-dismiss fallback, and reset on Checkout form - diagnostics load/error/retry plus reset alert handling in Settings - accessibility audit via screenshot + snapshot @@ -36,11 +36,11 @@ Skill-guidance regression cases cover distinct command-planning habits: - read-only inspection versus mutation - fresh `@ref` targeting, durable selectors, raw-rect fallbacks, and off-screen scroll recovery -- text replacement, append semantics, keyboard status, and keyboard dismiss -- install/open setup, app discovery, session scoping, and app-owned navigation fallbacks +- text replacement, append semantics, supported field clearing, keyboard status, and keyboard fallback +- install/open setup, Expo Go host-shell launch, app discovery, session scoping, and 
app-owned navigation fallbacks - Metro reload, logs, network dump, alert fallback, and screenshot evidence - performance metrics, React DevTools profiling, gestures, settings, and trace capture -- remote config, macOS menu bar surfaces, replay update, and batch schema/recording +- remote config, macOS menu bar surfaces, replay update, same-session mutation ordering, and batch schema/recording `assertAgentDeviceEvidence` is intentionally soft when a runner does not expose skill-detection telemetry. When telemetry exists, the suite asserts that `agent-device` was loaded; when it is absent, the cases still judge command-planning output instead of failing on missing runner metadata. diff --git a/test/skillgym/suites/agent-device-smoke-suite.ts b/test/skillgym/suites/agent-device-smoke-suite.ts index a67b044b..bc7b4db9 100644 --- a/test/skillgym/suites/agent-device-smoke-suite.ts +++ b/test/skillgym/suites/agent-device-smoke-suite.ts @@ -115,6 +115,8 @@ const RAW_COORDINATE_TARGET = /(?:^|\n)(?:agent-device\s+)?(?:click|fill|press)\s+-?\d+(?:\.\d+)?\s+-?\d+(?:\.\d+)?/i; const PSEUDO_ASSERTION_COMMAND = /(?:^|\n)\s*(?:assert|assertVisible|waitFor|waitForText)\b/i; const COMPACT_RECT_SNAPSHOT = /snapshot\b(?=[^\n]*(?:-c\b|--compact\b))(?=[^\n]*(?:--json|--raw))/i; +const IOS_EXPO_GO_OPEN = + /(?:^|\n)(?:agent-device\s+)?open\s+["']Expo Go["']\s+["']?exp:\/\/127\.0\.0\.1:8081["']?/i; function makeCase(options: { id: string; @@ -139,9 +141,14 @@ function makeCase(options: { const FIXTURE_SMOKE_CASES: TestCase[] = [ makeCase({ id: 'open-and-snapshot', - contract: ['App name: Agent Device Tester', 'Platform: iOS', 'Launch context: Expo Go'], - task: 'Plan the commands to open Agent Device Tester in Expo Go on iOS, take a snapshot -i, then close.', - outputs: [commandPattern('open'), /snapshot -i/i, commandPattern('close')], + contract: [ + 'App name: Agent Device Tester', + 'Platform: iOS', + 'Launch context: Expo Go', + 'Project URL: exp://127.0.0.1:8081', + ], + task: 'Plan 
the commands to open Agent Device Tester in Expo Go on iOS, take a snapshot -i to verify the app UI loaded, then close.', + outputs: [IOS_EXPO_GO_OPEN, /snapshot -i/i, commandPattern('close')], }), makeCase({ id: 'home-dismiss-notice', @@ -165,9 +172,15 @@ const FIXTURE_SMOKE_CASES: TestCase[] = [ 'Current screen: Home tab', 'testID=home-open-modal', 'Opening it shows a native confirmation alert', + 'Home tab selector: label="Home"', + ], + task: 'Assume Agent Device Tester is already open on the Home tab. Plan the commands to open the confirmation alert, dismiss it using alert wait + alert dismiss, then verify the app is still on Home.', + outputs: [ + /home-open-modal/i, + commandPattern('alert wait'), + commandPattern('alert dismiss'), + /label=(?:["']Home["']|Home)/i, ], - task: 'Assume Agent Device Tester is already open on the Home tab. Plan the commands to open the confirmation alert and dismiss it using alert wait + alert dismiss.', - outputs: [/home-open-modal/i, commandPattern('alert wait'), commandPattern('alert dismiss')], }), makeCase({ id: 'home-refresh-metrics', @@ -267,8 +280,14 @@ const FIXTURE_SMOKE_CASES: TestCase[] = [ 'testID=quantity-decrease', 'testID=quantity-value', ], - task: 'Assume Agent Device Tester is already on a product detail screen. Plan the commands to increase quantity once, decrease it once, and get the quantity value.', - outputs: [/quantity-increase/i, /quantity-decrease/i, /quantity-value/i], + task: 'Assume Agent Device Tester is already on a product detail screen. 
Plan the commands to increase quantity once, decrease it once, and get the quantity value through the durable quantity-value id rather than ambiguous visible number text.', + outputs: [ + /quantity-increase/i, + /quantity-decrease/i, + commandPattern('get text'), + /id=(?:["']quantity-value["']|quantity-value)/i, + ], + forbiddenOutputs: [/get text ['"]?2['"]?/i, /wait text ['"]?2['"]?/i, /label=["']2["']/i], }), makeCase({ id: 'product-note-append', @@ -318,16 +337,18 @@ const FIXTURE_SMOKE_CASES: TestCase[] = [ outputs: [/field-name/i, /field-email/i, /checkbox-agree/i, /form-success/i], }), makeCase({ - id: 'form-keyboard-dismiss', + id: 'form-keyboard-dismiss-ios-fallback', contract: [ 'App name: Agent Device Tester', + 'Platform: iOS simulator', 'Current screen: Checkout form tab', 'testID=field-name', - 'keyboard can be dismissed after focusing the field', + 'keyboard dismiss already returned UNSUPPORTED_OPERATION', + 'visible app keyboard close control: Done', ], - task: 'Assume Agent Device Tester is on the Checkout form tab. Plan the commands to focus the Full name field and dismiss the keyboard using keyboard dismiss.', - outputs: [/field-name/i, /keyboard dismiss/i], - forbiddenOutputs: [commandPattern('back')], + task: 'Assume Agent Device Tester is on the Checkout form tab. 
Plan the fallback commands to focus the Full name field, close the iOS keyboard through the visible app control, and verify the field remains visible.', + outputs: [/field-name/i, /Done/i, commandAlternativesPattern(['press', 'click'])], + forbiddenOutputs: [commandPattern('keyboard dismiss'), commandPattern('back')], }), makeCase({ id: 'form-reset', @@ -335,10 +356,17 @@ const FIXTURE_SMOKE_CASES: TestCase[] = [ 'App name: Agent Device Tester', 'Current screen: Checkout form tab', 'testID=reset-form', + 'Full name field can be checked through id="field-name" or id="full-name"', + 'Validation errors can be checked through id="form-errors", id="full-name-error", or visible text "Required"', 'toast text after reset: Form cleared', ], - task: 'Assume Agent Device Tester is on the Checkout form tab. Plan the commands to press Reset form and verify the Form cleared toast appears.', - outputs: [/reset-form/i, /Form cleared/i], + task: 'Assume Agent Device Tester is on the Checkout form tab after validation errors. 
Plan the commands to press Reset form, verify the Form cleared toast appears, verify validation errors are hidden, and verify the Full name field state is cleared.', + outputs: [ + /reset-form/i, + /Form cleared/i, + /(?:form-errors|full-name-error|Required)/i, + /(?:field-name|full-name)/i, + ], }), makeCase({ id: 'settings-toggle-preferences', @@ -508,6 +536,27 @@ const SKILL_GUIDANCE_CASES: TestCase[] = [ ], forbiddenOutputs: [commandPattern('type'), /(?:^|\n)(?:agent-device\s+)?fill\s+\d+\s+\d+/i], }), + makeCase({ + id: 'empty-fill-not-clear-field', + contract: [ + 'App name: Agent Device Tester', + 'Current screen: Catalog tab', + 'Search field selector: id="catalog-search"', + 'Visible clear control selector: id="clear-search"', + 'Need to clear the existing search text', + 'fill "" is not supported', + ], + task: 'Plan the supported command to clear the search field without using empty fill replacement.', + outputs: [ + /id=(?:["']clear-search["']|clear-search)/i, + commandAlternativesPattern(['press', 'click']), + ], + forbiddenOutputs: [ + /fill\b[^\n]*(?:id=["']catalog-search["']|catalog-search)[^\n]*(?:""|''|\s$)/i, + commandPattern('type'), + /\bclear\s+field\b/i, + ], + }), makeCase({ id: 'ios-allow-paste-prefill-only', contract: [ @@ -731,15 +780,35 @@ const SKILL_GUIDANCE_CASES: TestCase[] = [ 'Launch context: Expo Go', 'Project URL: exp://127.0.0.1:8081', 'The native bundle id for the project is not installed separately', + 'The final command must include --platform ios', ], - task: 'Plan the command to launch the Expo project in Expo Go without inventing a native bundle id.', - outputs: [commandPattern('open'), /exp:\/\/127\.0\.0\.1:8081/i, /--platform ios/i], + task: 'Plan the command to launch the Expo project in Expo Go on iOS without inventing a native bundle id.', + outputs: [IOS_EXPO_GO_OPEN, /--platform ios/i], forbiddenOutputs: [ /open\s+Agent Device Tester/i, /host\.exp\.Exponent/i, /com\.(?:callstack|example|agent)/i, ], }), + makeCase({ 
+ id: 'expo-go-ios-runner-splash-retry-host-shell', + contract: [ + 'Platform: iOS simulator', + 'Launch context: Expo Go', + 'Project URL: exp://127.0.0.1:8081', + 'Previous command open exp://127.0.0.1:8081 returned Opened', + 'Fresh snapshot -i showed only the Agent Device Runner splash', + 'Expo Go is available as a host shell', + ], + task: 'Plan the next commands to recover by opening the project through Expo Go and verifying the app UI loaded.', + outputs: [IOS_EXPO_GO_OPEN, /snapshot -i/i], + forbiddenOutputs: [ + /open\s+Agent Device Runner/i, + /open\s+Agent Device Tester/i, + /com\.(?:callstack|example|agent)/i, + /host\.exp\.Exponent/i, + ], + }), makeCase({ id: 'expo-go-ios-after-app-id-miss', contract: [ @@ -750,7 +819,7 @@ const SKILL_GUIDANCE_CASES: TestCase[] = [ 'Project URL: exp://127.0.0.1:8081', ], task: 'Plan the next command to launch the project after the app-id lookup miss without inventing a native bundle id.', - outputs: [commandPattern('open'), /exp:\/\/127\.0\.0\.1:8081/i], + outputs: [IOS_EXPO_GO_OPEN], forbiddenOutputs: [ /open\s+Agent Device Tester/i, /com\.(?:callstack|example|agent)/i, @@ -888,15 +957,16 @@ const SKILL_GUIDANCE_CASES: TestCase[] = [ 'Current screen: onboarding carousel', 'Need to advance and return across pages repeatedly', 'Gesture should use a swipe series, not scroll', + 'Use one direct swipe command with --count and --pattern; do not use batch', ], - task: 'Plan the gesture command to swipe horizontally across the carousel eight times with a 30ms pause and ping-pong pattern.', + task: 'Plan one direct gesture command to swipe horizontally across the carousel eight times with a 30ms pause and ping-pong pattern.', outputs: [ commandPattern('swipe'), /--count\s+8/i, /--pause-ms\s+30/i, /--pattern\s+ping-pong/i, ], - forbiddenOutputs: [commandPattern('scroll'), RAW_COORDINATE_TARGET], + forbiddenOutputs: [commandPattern('scroll'), commandPattern('batch'), RAW_COORDINATE_TARGET], }), makeCase({ id: 
'gesture-longpress-context-menu', @@ -1088,6 +1158,30 @@ const SKILL_GUIDANCE_CASES: TestCase[] = [ ], forbiddenOutputs: [PSEUDO_ASSERTION_COMMAND, /workflow batch/i, commandPattern('trace')], }), + makeCase({ + id: 'same-session-mutations-serial', + contract: [ + 'Session: dogfood-test-app', + 'Current screen: Checkout form tab', + 'Name field selector: id="field-name"', + 'Email field selector: id="field-email"', + 'Submit button selector: id="submit-order"', + 'Need to fill name, fill email, and press submit as three separate commands', + 'All commands mutate the same active device session', + 'Parallel same-session mutations can pollute focus and field state', + 'Do not use batch for this case; demonstrate serial command ordering', + ], + task: 'Plan the three separate serial commands for this same-session form flow using the durable selectors.', + outputs: [/--session dogfood-test-app/i, /field-name/i, /field-email/i, /submit-order/i], + forbiddenOutputs: [ + /Based on my/i, + /Let me/i, + /Promise\.all/i, + /(?:^|\n).*(?:fill|press).*(?:&|&&).*(?:fill|press)/i, + /parallel/i, + commandPattern('batch'), + ], + }), makeCase({ id: 'batch-inline-step-schema-positionals', contract: [ diff --git a/website/docs/docs/commands.md b/website/docs/docs/commands.md index 5425ae0a..10b9d425 100644 --- a/website/docs/docs/commands.md +++ b/website/docs/docs/commands.md @@ -536,9 +536,10 @@ agent-device keyboard dismiss - `keyboard status` (or `keyboard get`) returns keyboard visibility and best-effort input type classification on Android. - `keyboard dismiss` attempts a non-navigation keyboard dismissal on Android and a native dismiss gesture/control on iOS, then confirms the keyboard is hidden. - If the keyboard remains visible after the platform-native dismiss path, the command returns an explicit `UNSUPPORTED_OPERATION` error instead of falling back to back navigation. 
+- On iOS, `keyboard dismiss` is best-effort and can fail when the active app exposes no native dismiss gesture/control. Prefer a visible app dismiss control, or use `back --system` only when system navigation is an acceptable side effect. - Works with active sessions and explicit selectors (`--platform`, `--device`, `--udid`, `--serial`). - `keyboard status|get` is supported on Android emulator/device. -- `keyboard dismiss` is supported on Android emulator/device and iOS simulator/device. +- `keyboard dismiss` is supported on Android emulator/device and best-effort on iOS simulator/device. ## Performance metrics diff --git a/website/docs/docs/introduction.md b/website/docs/docs/introduction.md index a97cd426..975676f7 100644 --- a/website/docs/docs/introduction.md +++ b/website/docs/docs/introduction.md @@ -36,7 +36,7 @@ For agent-oriented operating guidance, start with `agent-device help` or `agent- - Physical-device recording defaults to 15 FPS and supports `--fps` caps. - `record start --quality <5-10>` scales recording resolution from 50% through native resolution; omitting it keeps native/current resolution. - Android supports the same core interaction set, plus `rotate`, `push` notification simulation, `clipboard read/write`, and `keyboard status|get|dismiss`. -- iOS supports `keyboard dismiss` through the XCTest runner when the on-screen keyboard is visible. +- iOS `keyboard dismiss` is best-effort through the XCTest runner and can fail when the app exposes no native dismiss gesture/control. - App-event triggers are available on iOS and Android through app-defined deep-link hooks (`trigger-app-event`), using active session context or explicit device selectors. 
## Architecture (high level) diff --git a/website/docs/docs/sessions.md b/website/docs/docs/sessions.md index d388ec70..d60fcfe1 100644 --- a/website/docs/docs/sessions.md +++ b/website/docs/docs/sessions.md @@ -35,6 +35,6 @@ Notes: - On iOS devices, `http(s)://` URLs open in Safari when no app is active. Custom scheme URLs require an active app in the session. - On iOS, `appstate` is session-scoped and requires a matching active session on the target device. - For remote `connect --remote-config` sessions, see [Commands](/docs/commands#remote-metro-workflow). -- Use `--session ` to run multiple sessions in parallel. +- Use `--session ` to run multiple sessions in parallel. Do not parallelize mutating commands against the same session; serialize stateful actions such as open, press, fill, type, scroll, back, alert, replay, batch, and close. For replay scripts and deterministic E2E guidance, see [Replay & E2E (Experimental)](/docs/replay-e2e).
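The same-session ordering rule serializes stateful commands on one `--session` and parallelizes only read-only work or separate sessions/devices. It maps onto a per-session promise chain. This sketch is hypothetical. `SessionCommandScheduler` and the injected `Runner` are illustrative names, and the sketch assumes the caller supplies a function that actually invokes `agent-device` for a given session. The mutating set mirrors the commands listed in sessions.md. Treating every other command as read-only is a simplification.

```typescript
// Stateful commands listed in sessions.md; everything else is treated as read-only here.
const MUTATING_COMMANDS = new Set([
  'open', 'press', 'fill', 'type', 'scroll', 'back', 'alert', 'replay', 'batch', 'close',
]);

// Hypothetical runner supplied by the caller (e.g. a wrapper that spawns agent-device).
type Runner = (session: string, command: string, args: string[]) => Promise<string>;

class SessionCommandScheduler {
  private readonly run: Runner;
  // Tail of the mutation chain for each session name.
  private readonly tails = new Map<string, Promise<unknown>>();

  constructor(run: Runner) {
    this.run = run;
  }

  submit(session: string, command: string, args: string[] = []): Promise<string> {
    if (!MUTATING_COMMANDS.has(command)) {
      // Read-only commands are not queued and may overlap with other commands.
      return this.run(session, command, args);
    }
    const previous = this.tails.get(session) ?? Promise.resolve();
    // Start only after the previous mutation on this session settles, even if it failed.
    const next = previous.then(
      () => this.run(session, command, args),
      () => this.run(session, command, args),
    );
    // Keep the chain alive after failures so later commands still run in order.
    this.tails.set(session, next.catch(() => undefined));
    return next;
  }
}
```

The chain advances on both fulfilment and rejection. A single failed `fill` therefore cannot block later commands on the same session. Different session names never share a chain, so they keep running concurrently.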