From 29ffc01209a3db47048083b7b2cb952e918bee8d Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Pierzcha=C5=82a?= Date: Thu, 26 Feb 2026 11:41:39 +0100 Subject: [PATCH] feat: add Android keyboard status and dismiss command --- README.md | 8 + skills/agent-device/SKILL.md | 3 + src/core/__tests__/capabilities.test.ts | 7 + src/core/capabilities.ts | 1 + src/core/dispatch.ts | 35 ++++ src/daemon/handlers/__tests__/session.test.ts | 106 +++++++++++ src/daemon/handlers/session.ts | 14 ++ src/platforms/android/__tests__/index.test.ts | 165 ++++++++++++++++++ src/platforms/android/index.ts | 131 ++++++++++++++ src/utils/__tests__/args.test.ts | 18 ++ src/utils/command-schema.ts | 6 + website/docs/docs/commands.md | 13 ++ website/docs/docs/introduction.md | 2 +- 13 files changed, 508 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 2bac44363..5582e93de 100644 --- a/README.md +++ b/README.md @@ -17,6 +17,7 @@ The project is in early development and considered experimental. Pull requests a - Core commands: `open`, `back`, `home`, `app-switcher`, `press`, `long-press`, `focus`, `type`, `fill`, `scroll`, `scrollintoview`, `wait`, `alert`, `screenshot`, `close`, `reinstall`, `push`, `trigger-app-event`. - Inspection commands: `snapshot` (accessibility tree), `diff snapshot` (structural baseline diff), `appstate`, `apps`, `devices`. - Clipboard commands: `clipboard read`, `clipboard write `. +- Keyboard commands: `keyboard status|get|dismiss` (Android). - Performance command: `perf` (alias: `metrics`) returns a metrics JSON blob for the active session; startup timing is currently sampled. - App logs and traffic inspection: `logs path` returns session log metadata; `logs start` / `logs stop` stream app output; `logs clear` truncates session app logs; `logs clear --restart` resets and restarts stream in one step; `logs doctor` checks readiness; `logs mark` writes timeline markers; `network dump` parses recent HTTP(s) entries from session logs. 
- Device tooling: `adb` (Android), `simctl`/`devicectl` (iOS via Xcode). @@ -152,6 +153,7 @@ agent-device scrollintoview @e42 - `trace start`, `trace stop` - `logs path`, `logs start`, `logs stop`, `logs clear`, `logs clear --restart`, `logs doctor`, `logs mark` (session app log file for grep; iOS simulator + iOS device + Android) - `clipboard read`, `clipboard write ` (iOS simulator + Android) +- `keyboard [status|get|dismiss]` (Android emulator/device) - `network dump [limit] [summary|headers|body|all]`, `network log ...` (best-effort HTTP(s) parsing from session app log) - `settings wifi|airplane|location on|off` - `settings appearance light|dark|toggle` @@ -418,6 +420,12 @@ Clipboard: - Supported on Android emulator/device and iOS simulator. - iOS physical devices currently return `UNSUPPORTED_OPERATION` for clipboard commands. +Keyboard: +- `keyboard status` (or `keyboard get`) reports Android keyboard visibility and best-effort input type classification (`text`, `number`, `email`, `phone`, `password`, `datetime`). +- `keyboard dismiss` issues Android back keyevent only when keyboard is visible, then verifies hidden state. +- Works with an active session device or explicit selectors (`--platform`, `--device`, `--udid`, `--serial`). +- Supported on Android emulator/device. + ## Debug - **App logs (token-efficient):** Logging is off by default in normal flows. Enable it on demand when debugging. With an active session, run `logs path` to get path + state metadata (e.g. `/sessions//app.log`). Run `logs start` to stream app output to that file; use `logs stop` to stop. Run `logs clear` to truncate `app.log` (and remove rotated `app.log.N` files) before a new repro window. Run `logs doctor` for tool/runtime checks and `logs mark "step"` to insert timeline markers. Grep the file when you need to inspect errors (e.g. `grep -n "Error\|Exception" `) instead of pulling full logs into context. Supported on iOS simulator, iOS physical device, and Android. 
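Review note: the input type classification described in the README keyboard notes can be checked by hand against the hex values used in the tests. This is a minimal sketch, assuming the standard Android `InputType` bit layout (class in the low 4 bits, variation in bits 4 to 11) that the mask constants in this patch reflect. `decodeInputType` is a hypothetical helper name for illustration and is not part of the patch.

```typescript
// Hypothetical helper (not in the patch); the masks mirror
// ANDROID_INPUT_TYPE_CLASS_MASK and ANDROID_INPUT_TYPE_VARIATION_MASK.
const CLASS_MASK = 0x0000000f;
const VARIATION_MASK = 0x00000ff0;

function decodeInputType(hex: string): { inputClass: number; variation: number } {
  const parsed = Number.parseInt(hex.replace(/^0x/i, ''), 16);
  if (Number.isNaN(parsed)) throw new Error(`not a hex inputType: ${hex}`);
  return {
    inputClass: parsed & CLASS_MASK,
    variation: parsed & VARIATION_MASK,
  };
}

// 0x21: class 0x1 (text) with variation 0x20 (email address)
const email = decodeInputType('0x21');
// 0x2: class 0x2 (number) with no variation bits set
const num = decodeInputType('0x2');
// 0x81: class 0x1 (text) with variation 0x80 (password)
const password = decodeInputType('0x81');
```

With this split, `0x21` lands in the `email` bucket and `0x2` in `number`, which matches what the mocked `dumpsys input_method` tests in this patch expect.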
diff --git a/skills/agent-device/SKILL.md b/skills/agent-device/SKILL.md index 4e0d4fd2a..12a196b35 100644 --- a/skills/agent-device/SKILL.md +++ b/skills/agent-device/SKILL.md @@ -138,6 +138,8 @@ agent-device is visible 'id="anchor"' agent-device appstate agent-device clipboard read agent-device clipboard write "token" +agent-device keyboard status +agent-device keyboard dismiss agent-device perf --json agent-device network dump [limit] [summary|headers|body|all] agent-device push @@ -169,6 +171,7 @@ agent-device batch --steps-file /tmp/batch-steps.json --json - Use `fill` for clear-then-type semantics; use `type` for focused append typing. - iOS `appstate` is session-scoped; Android `appstate` is live foreground state. - Clipboard helpers: `clipboard read` / `clipboard write ` are supported on Android and iOS simulators; iOS physical devices are not supported yet. +- Android keyboard helpers: `keyboard status|get|dismiss` report keyboard visibility/type and dismiss via keyevent when visible. - `network dump` is best-effort and parses HTTP(s) entries from the session app log file. - Biometric settings: iOS simulator supports `settings faceid|touchid `; Android supports `settings fingerprint ` where runtime tooling is available. - For AndroidTV/tvOS selection, always pair `--target` with `--platform` (`ios`, `android`, or `apple` alias); target-only selection is invalid. 
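Review note: the `keyboard dismiss` result fields returned by the dispatcher in this patch (`attempts`, `wasVisible`, `dismissed`, `visible`) can be folded into a single outcome on the caller side. This is a rough sketch under stated assumptions: `KeyboardDismissResult`, `DismissOutcome`, and `summarizeDismiss` are hypothetical names, and the three outcome labels are chosen for illustration only.

```typescript
// Field names mirror the object returned for `keyboard dismiss` in
// src/core/dispatch.ts; the type name itself is hypothetical.
type KeyboardDismissResult = {
  attempts: number;
  wasVisible: boolean;
  dismissed: boolean;
  visible: boolean;
  inputType?: string;
  type?: string;
};

// Illustrative labels, not part of the patch.
type DismissOutcome = 'already-hidden' | 'dismissed' | 'still-visible';

function summarizeDismiss(result: KeyboardDismissResult): DismissOutcome {
  // No keyevent is sent when the keyboard was hidden to begin with.
  if (!result.wasVisible) return 'already-hidden';
  // Otherwise the final visibility check decides the outcome.
  return result.visible ? 'still-visible' : 'dismissed';
}

const hidden = summarizeDismiss({ attempts: 0, wasVisible: false, dismissed: false, visible: false });
const closed = summarizeDismiss({ attempts: 1, wasVisible: true, dismissed: true, visible: false });
const stuck = summarizeDismiss({ attempts: 2, wasVisible: true, dismissed: false, visible: true });
```

The `still-visible` case corresponds to the retry loop running out of attempts while the keyboard is still reported as shown.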
diff --git a/src/core/__tests__/capabilities.test.ts b/src/core/__tests__/capabilities.test.ts index 7c0008454..2d2bede54 100644 --- a/src/core/__tests__/capabilities.test.ts +++ b/src/core/__tests__/capabilities.test.ts @@ -56,6 +56,12 @@ test('simulator-only iOS commands with Android support reject iOS devices', () = } }); +test('keyboard command is Android-only', () => { + assert.equal(isCommandSupportedOnDevice('keyboard', iosSimulator), false, 'keyboard on iOS sim'); + assert.equal(isCommandSupportedOnDevice('keyboard', iosDevice), false, 'keyboard on iOS device'); + assert.equal(isCommandSupportedOnDevice('keyboard', androidDevice), true, 'keyboard on Android'); +}); + test('swipe supports iOS simulator, iOS device, and Android', () => { assert.equal(isCommandSupportedOnDevice('swipe', iosSimulator), true, 'swipe on iOS sim'); assert.equal(isCommandSupportedOnDevice('swipe', iosDevice), true, 'swipe on iOS device'); @@ -127,6 +133,7 @@ test('tvOS follows iOS capability matrix by device kind', () => { for (const cmd of ['pinch', 'push', 'settings', 'alert']) { assert.equal(isCommandSupportedOnDevice(cmd, tvOsSimulator), true, `${cmd} on tvOS simulator`); } + assert.equal(isCommandSupportedOnDevice('keyboard', tvOsSimulator), false, 'keyboard on tvOS simulator'); }); test('unknown commands default to supported', () => { diff --git a/src/core/capabilities.ts b/src/core/capabilities.ts index 1c28165f4..c271d9a40 100644 --- a/src/core/capabilities.ts +++ b/src/core/capabilities.ts @@ -22,6 +22,7 @@ const COMMAND_CAPABILITY_MATRIX: Record = { boot: { ios: { simulator: true, device: true }, android: { emulator: true, device: true, unknown: true } }, click: { ios: { simulator: true, device: true }, android: { emulator: true, device: true, unknown: true } }, clipboard: { ios: { simulator: true }, android: { emulator: true, device: true, unknown: true } }, + keyboard: { ios: {}, android: { emulator: true, device: true, unknown: true } }, close: { ios: { simulator: 
true, device: true }, android: { emulator: true, device: true, unknown: true } }, fill: { ios: { simulator: true, device: true }, android: { emulator: true, device: true, unknown: true } }, diff: { ios: { simulator: true, device: true }, android: { emulator: true, device: true, unknown: true } }, diff --git a/src/core/dispatch.ts b/src/core/dispatch.ts index 512e7da7d..93cefa847 100644 --- a/src/core/dispatch.ts +++ b/src/core/dispatch.ts @@ -6,7 +6,9 @@ import { listAndroidDevices } from '../platforms/android/devices.ts'; import { appSwitcherAndroid, backAndroid, + dismissAndroidKeyboard, ensureAdb, + getAndroidKeyboardState, homeAndroid, pushAndroidNotification, readAndroidClipboardText, @@ -461,6 +463,39 @@ export async function dispatchCommand( else await writeAndroidClipboardText(device, text); return { action, textLength: Array.from(text).length }; } + case 'keyboard': { + if (device.platform !== 'android') { + throw new AppError('UNSUPPORTED_OPERATION', 'keyboard is currently supported only on Android'); + } + const action = (positionals[0] ?? 
'status').toLowerCase(); + if (action !== 'status' && action !== 'get' && action !== 'dismiss') { + throw new AppError('INVALID_ARGS', 'keyboard requires a subcommand: status, get, or dismiss'); + } + if (positionals.length > 1) { + throw new AppError('INVALID_ARGS', 'keyboard accepts at most one subcommand argument'); + } + if (action === 'dismiss') { + const result = await dismissAndroidKeyboard(device); + return { + platform: 'android', + action: 'dismiss', + attempts: result.attempts, + wasVisible: result.wasVisible, + dismissed: result.dismissed, + visible: result.visible, + inputType: result.inputType, + type: result.type, + }; + } + const state = await getAndroidKeyboardState(device); + return { + platform: 'android', + action: 'status', + visible: state.visible, + inputType: state.inputType, + type: state.type, + }; + } case 'settings': { const [setting, state, target, mode, appBundleId] = positionals; const permissionOptions = diff --git a/src/daemon/handlers/__tests__/session.test.ts b/src/daemon/handlers/__tests__/session.test.ts index 7edfeac55..361837110 100644 --- a/src/daemon/handlers/__tests__/session.test.ts +++ b/src/daemon/handlers/__tests__/session.test.ts @@ -838,6 +838,112 @@ test('clipboard requires an active session or explicit device selector', async ( } }); +test('keyboard requires an active session or explicit device selector', async () => { + const sessionStore = makeSessionStore(); + const response = await handleSessionCommands({ + req: { + token: 't', + session: 'default', + command: 'keyboard', + positionals: ['status'], + flags: {}, + }, + sessionName: 'default', + logPath: path.join(os.tmpdir(), 'daemon.log'), + sessionStore, + invoke: noopInvoke, + }); + + assert.ok(response); + assert.equal(response?.ok, false); + if (response && !response.ok) { + assert.equal(response.error.code, 'INVALID_ARGS'); + assert.match(response.error.message, /keyboard requires an active session or an explicit device selector/i); + } +}); + 
+test('keyboard dismiss supports explicit selector without active session', async () => { + const sessionStore = makeSessionStore(); + const selectedDevice: SessionState['device'] = { + platform: 'android', + id: 'emulator-5554', + name: 'Pixel Emulator', + kind: 'emulator', + booted: true, + }; + + const response = await handleSessionCommands({ + req: { + token: 't', + session: 'default', + command: 'keyboard', + positionals: ['dismiss'], + flags: { platform: 'android', serial: 'emulator-5554' }, + }, + sessionName: 'default', + logPath: path.join(os.tmpdir(), 'daemon.log'), + sessionStore, + invoke: noopInvoke, + ensureReady: async () => {}, + resolveTargetDevice: async () => selectedDevice, + dispatch: async (device, command, positionals) => { + assert.equal(device.id, 'emulator-5554'); + assert.equal(command, 'keyboard'); + assert.deepEqual(positionals, ['dismiss']); + return { platform: 'android', action: 'dismiss', dismissed: true, visible: false }; + }, + }); + + assert.ok(response); + assert.equal(response?.ok, true); + if (response && response.ok) { + assert.equal(response.data?.platform, 'android'); + assert.equal(response.data?.action, 'dismiss'); + assert.equal(response.data?.dismissed, true); + assert.equal(response.data?.visible, false); + } +}); + +test('keyboard rejects unsupported iOS simulator devices', async () => { + const sessionStore = makeSessionStore(); + const sessionName = 'ios-sim-session'; + sessionStore.set( + sessionName, + makeSession(sessionName, { + platform: 'ios', + id: 'sim-1', + name: 'iPhone 17 Pro', + kind: 'simulator', + booted: true, + }), + ); + + const response = await handleSessionCommands({ + req: { + token: 't', + session: sessionName, + command: 'keyboard', + positionals: ['status'], + flags: {}, + }, + sessionName, + logPath: path.join(os.tmpdir(), 'daemon.log'), + sessionStore, + invoke: noopInvoke, + ensureReady: async () => {}, + dispatch: async () => { + throw new Error('dispatch should not run for unsupported 
targets'); + }, + }); + + assert.ok(response); + assert.equal(response?.ok, false); + if (response && !response.ok) { + assert.equal(response.error.code, 'UNSUPPORTED_OPERATION'); + assert.match(response.error.message, /keyboard is not supported on this device/i); + } +}); + test('clipboard read uses active session device', async () => { const sessionStore = makeSessionStore(); const sessionName = 'ios-sim-session'; diff --git a/src/daemon/handlers/session.ts b/src/daemon/handlers/session.ts index 74e51353f..00b1c1332 100644 --- a/src/daemon/handlers/session.ts +++ b/src/daemon/handlers/session.ts @@ -806,6 +806,20 @@ export async function handleSessionCommands(params: { }); } + if (command === 'keyboard') { + return await runSessionOrSelectorDispatch({ + req, + sessionName, + logPath, + sessionStore, + ensureReady, + resolveDevice, + dispatch, + command: 'keyboard', + positionals: req.positionals ?? [], + }); + } + if (command === 'perf') { const session = sessionStore.get(sessionName); if (!session) { diff --git a/src/platforms/android/__tests__/index.test.ts b/src/platforms/android/__tests__/index.test.ts index f2af52ad0..049a440db 100644 --- a/src/platforms/android/__tests__/index.test.ts +++ b/src/platforms/android/__tests__/index.test.ts @@ -4,7 +4,9 @@ import { promises as fs } from 'node:fs'; import os from 'node:os'; import path from 'node:path'; import { + dismissAndroidKeyboard, fillAndroid, + getAndroidKeyboardState, inferAndroidAppName, isAmStartError, listAndroidApps, @@ -918,6 +920,169 @@ test('readAndroidClipboardText uses adb cmd clipboard get text', async () => { }, ); }); + +test('getAndroidKeyboardState reads visibility and input type', async () => { + await withMockedAdb( + 'agent-device-android-keyboard-state-', + [ + '#!/bin/sh', + 'if [ "$1" = "-s" ]; then', + ' shift', + ' shift', + 'fi', + 'if [ "$1" = "shell" ] && [ "$2" = "dumpsys" ] && [ "$3" = "input_method" ]; then', + ' echo "mInputShown=true mIsInputViewShown=true"', + ' echo 
"inputType=0x21 imeOptions=0x12000000 privateImeOptions=null"', + ' exit 0', + 'fi', + 'echo "unexpected args: $@" >&2', + 'exit 1', + '', + ].join('\n'), + async ({ device }) => { + const state = await getAndroidKeyboardState(device); + assert.equal(state.visible, true); + assert.equal(state.inputType, '0x21'); + assert.equal(state.type, 'email'); + }, + ); +}); + +test('getAndroidKeyboardState falls back to mImeWindowVis flag', async () => { + await withMockedAdb( + 'agent-device-android-keyboard-window-vis-', + [ + '#!/bin/sh', + 'if [ "$1" = "-s" ]; then', + ' shift', + ' shift', + 'fi', + 'if [ "$1" = "shell" ] && [ "$2" = "dumpsys" ] && [ "$3" = "input_method" ]; then', + ' echo "mImeWindowVis=0x1"', + ' echo "inputType=0x2"', + ' exit 0', + 'fi', + 'echo "unexpected args: $@" >&2', + 'exit 1', + '', + ].join('\n'), + async ({ device }) => { + const state = await getAndroidKeyboardState(device); + assert.equal(state.visible, true); + assert.equal(state.inputType, '0x2'); + assert.equal(state.type, 'number'); + }, + ); +}); + +test('getAndroidKeyboardState uses latest visibility value when dumpsys contains duplicates', async () => { + await withMockedAdb( + 'agent-device-android-keyboard-duplicate-visibility-', + [ + '#!/bin/sh', + 'if [ "$1" = "-s" ]; then', + ' shift', + ' shift', + 'fi', + 'if [ "$1" = "shell" ] && [ "$2" = "dumpsys" ] && [ "$3" = "input_method" ]; then', + ' echo "mInputShown=true"', + ' echo "mInputShown=false"', + ' echo "mIsInputViewShown=false"', + ' echo "inputType=0x21"', + ' exit 0', + 'fi', + 'echo "unexpected args: $@" >&2', + 'exit 1', + '', + ].join('\n'), + async ({ device }) => { + const state = await getAndroidKeyboardState(device); + assert.equal(state.visible, false); + assert.equal(state.inputType, '0x21'); + assert.equal(state.type, 'email'); + }, + ); +}); + +test('dismissAndroidKeyboard skips keyevent when keyboard is already hidden', async () => { + await withMockedAdb( + 
'agent-device-android-keyboard-dismiss-hidden-', + [ + '#!/bin/sh', + 'printf "__CMD__\\n" >> "$AGENT_DEVICE_TEST_ARGS_FILE"', + 'printf "%s\\n" "$@" >> "$AGENT_DEVICE_TEST_ARGS_FILE"', + 'if [ "$1" = "-s" ]; then', + ' shift', + ' shift', + 'fi', + 'if [ "$1" = "shell" ] && [ "$2" = "dumpsys" ] && [ "$3" = "input_method" ]; then', + ' echo "mInputShown=false mIsInputViewShown=false"', + ' exit 0', + 'fi', + 'if [ "$1" = "shell" ] && [ "$2" = "input" ] && [ "$3" = "keyevent" ] && [ "$4" = "4" ]; then', + ' echo "unexpected keyevent" >&2', + ' exit 1', + 'fi', + 'echo "unexpected args: $@" >&2', + 'exit 1', + '', + ].join('\n'), + async ({ argsLogPath, device }) => { + const result = await dismissAndroidKeyboard(device); + assert.equal(result.attempts, 0); + assert.equal(result.wasVisible, false); + assert.equal(result.dismissed, false); + assert.equal(result.visible, false); + + const logged = await fs.readFile(argsLogPath, 'utf8'); + assert.doesNotMatch(logged, /shell\ninput\nkeyevent\n4/); + }, + ); +}); + +test('dismissAndroidKeyboard sends back keyevent and confirms hidden state', async () => { + await withMockedAdb( + 'agent-device-android-keyboard-dismiss-visible-', + [ + '#!/bin/sh', + 'STATE_FILE="$(dirname "$AGENT_DEVICE_TEST_ARGS_FILE")/keyboard_hidden.txt"', + 'printf "__CMD__\\n" >> "$AGENT_DEVICE_TEST_ARGS_FILE"', + 'printf "%s\\n" "$@" >> "$AGENT_DEVICE_TEST_ARGS_FILE"', + 'if [ "$1" = "-s" ]; then', + ' shift', + ' shift', + 'fi', + 'if [ "$1" = "shell" ] && [ "$2" = "dumpsys" ] && [ "$3" = "input_method" ]; then', + ' if [ -f "$STATE_FILE" ]; then', + ' echo "mInputShown=false mIsInputViewShown=false"', + ' exit 0', + ' fi', + ' echo "mInputShown=true mIsInputViewShown=true"', + ' echo "inputType=0x2"', + ' exit 0', + 'fi', + 'if [ "$1" = "shell" ] && [ "$2" = "input" ] && [ "$3" = "keyevent" ] && [ "$4" = "4" ]; then', + ' touch "$STATE_FILE"', + ' exit 0', + 'fi', + 'echo "unexpected args: $@" >&2', + 'exit 1', + '', + ].join('\n'), + async ({ 
argsLogPath, device }) => { + const result = await dismissAndroidKeyboard(device); + assert.equal(result.attempts, 1); + assert.equal(result.wasVisible, true); + assert.equal(result.dismissed, true); + assert.equal(result.visible, false); + + const logged = await fs.readFile(argsLogPath, 'utf8'); + assert.match(logged, /shell\ndumpsys\ninput_method/); + assert.match(logged, /shell\ninput\nkeyevent\n4/); + }, + ); +}); + test('setAndroidSetting permission grant camera uses pm grant', async () => { await withMockedAdb( 'agent-device-android-permission-camera-', diff --git a/src/platforms/android/index.ts b/src/platforms/android/index.ts index 6121f66c2..52177d2b8 100644 --- a/src/platforms/android/index.ts +++ b/src/platforms/android/index.ts @@ -20,6 +20,19 @@ const ALIASES: Record = { const ANDROID_LAUNCHER_CATEGORY = 'android.intent.category.LAUNCHER'; const ANDROID_LEANBACK_CATEGORY = 'android.intent.category.LEANBACK_LAUNCHER'; const ANDROID_DEFAULT_CATEGORY = 'android.intent.category.DEFAULT'; +const ANDROID_INPUT_TYPE_CLASS_MASK = 0x0000000f; +const ANDROID_INPUT_TYPE_CLASS_TEXT = 0x00000001; +const ANDROID_INPUT_TYPE_CLASS_NUMBER = 0x00000002; +const ANDROID_INPUT_TYPE_CLASS_PHONE = 0x00000003; +const ANDROID_INPUT_TYPE_CLASS_DATETIME = 0x00000004; +const ANDROID_INPUT_TYPE_VARIATION_MASK = 0x00000ff0; +const ANDROID_TEXT_VARIATION_EMAIL_ADDRESS = 0x00000020; +const ANDROID_TEXT_VARIATION_WEB_EMAIL_ADDRESS = 0x000000d0; +const ANDROID_TEXT_VARIATION_PASSWORD = 0x00000080; +const ANDROID_TEXT_VARIATION_WEB_PASSWORD = 0x000000e0; +const ANDROID_TEXT_VARIATION_VISIBLE_PASSWORD = 0x00000090; +const ANDROID_KEYBOARD_DISMISS_MAX_ATTEMPTS = 2; +const ANDROID_KEYBOARD_DISMISS_RETRY_DELAY_MS = 120; type AndroidBroadcastPayload = { action?: string; @@ -27,6 +40,14 @@ type AndroidBroadcastPayload = { extras?: Record; }; +type AndroidKeyboardType = 'text' | 'number' | 'email' | 'phone' | 'password' | 'datetime' | 'unknown'; + +export type AndroidKeyboardState = { + 
visible: boolean; + inputType?: string; + type?: AndroidKeyboardType; +}; + function adbArgs(device: DeviceInfo, args: string[]): string[] { return ['-s', device.id, ...args]; } @@ -185,6 +206,116 @@ export async function getAndroidAppState( return {}; } +export async function getAndroidKeyboardState( + device: DeviceInfo, +): Promise<AndroidKeyboardState> { + const result = await runCmd('adb', adbArgs(device, ['shell', 'dumpsys', 'input_method']), { + allowFailure: true, + }); + if (result.exitCode !== 0) { + throw new AppError('COMMAND_FAILED', 'Failed to query Android keyboard state', { + stdout: result.stdout, + stderr: result.stderr, + exitCode: result.exitCode, + }); + } + return parseAndroidKeyboardState(result.stdout); +} + +export async function dismissAndroidKeyboard(device: DeviceInfo): Promise<{ + attempts: number; + wasVisible: boolean; + dismissed: boolean; + visible: boolean; + inputType?: string; + type?: AndroidKeyboardType; +}> { + const initialState = await getAndroidKeyboardState(device); + let state = initialState; + let attempts = 0; + + while (state.visible && attempts < ANDROID_KEYBOARD_DISMISS_MAX_ATTEMPTS) { + await backAndroid(device); + attempts += 1; + await sleep(ANDROID_KEYBOARD_DISMISS_RETRY_DELAY_MS); + state = await getAndroidKeyboardState(device); + } + + return { + attempts, + wasVisible: initialState.visible, + dismissed: initialState.visible && !state.visible, + visible: state.visible, + inputType: state.inputType, + type: state.type, + }; +} + +function parseAndroidKeyboardState(stdout: string): AndroidKeyboardState { + const visibility = parseAndroidKeyboardVisibility(stdout); + let visible = visibility ?? 
false; + if (visibility === null) { + const imeWindowVisibility = stdout.match(/\bmImeWindowVis=0x([0-9a-fA-F]+)\b/); + if (imeWindowVisibility?.[1]) { + const flags = Number.parseInt(imeWindowVisibility[1], 16); + if (!Number.isNaN(flags)) { + visible = (flags & 0x1) !== 0; + } + } + } + + const inputTypeMatches = Array.from(stdout.matchAll(/\binputType=0x([0-9a-fA-F]+)\b/gi)); + const lastInputType = inputTypeMatches.length > 0 + ? inputTypeMatches[inputTypeMatches.length - 1]?.[1] + : undefined; + const inputType = lastInputType ? `0x${lastInputType.toLowerCase()}` : undefined; + + return { + visible, + inputType, + type: inputType ? classifyAndroidKeyboardType(inputType) : undefined, + }; +} + +function parseAndroidKeyboardVisibility(stdout: string): boolean | null { + const latestByKey = new Map<string, boolean>(); + const pattern = /\b(mInputShown|mIsInputViewShown|isInputViewShown)=([a-zA-Z]+)\b/g; + for (const match of stdout.matchAll(pattern)) { + const key = match[1]; + const value = match[2]?.toLowerCase(); + if (!key || (value !== 'true' && value !== 'false')) continue; + latestByKey.set(key, value === 'true'); + } + if (latestByKey.size === 0) return null; + for (const visible of latestByKey.values()) { + if (visible) return true; + } + return false; +} + +function classifyAndroidKeyboardType(inputType: string): AndroidKeyboardType { + const parsed = Number.parseInt(inputType.replace(/^0x/i, ''), 16); + if (Number.isNaN(parsed)) return 'unknown'; + const inputClass = parsed & ANDROID_INPUT_TYPE_CLASS_MASK; + if (inputClass === ANDROID_INPUT_TYPE_CLASS_NUMBER) return 'number'; + if (inputClass === ANDROID_INPUT_TYPE_CLASS_PHONE) return 'phone'; + if (inputClass === ANDROID_INPUT_TYPE_CLASS_DATETIME) return 'datetime'; + if (inputClass !== ANDROID_INPUT_TYPE_CLASS_TEXT) return 'unknown'; + + const variation = parsed & ANDROID_INPUT_TYPE_VARIATION_MASK; + if (variation === ANDROID_TEXT_VARIATION_EMAIL_ADDRESS || variation === ANDROID_TEXT_VARIATION_WEB_EMAIL_ADDRESS) { + 
return 'email'; + } + if ( + variation === ANDROID_TEXT_VARIATION_PASSWORD || + variation === ANDROID_TEXT_VARIATION_WEB_PASSWORD || + variation === ANDROID_TEXT_VARIATION_VISIBLE_PASSWORD + ) { + return 'password'; + } + return 'text'; +} + async function readAndroidFocus( device: DeviceInfo, commands: string[][], diff --git a/src/utils/__tests__/args.test.ts b/src/utils/__tests__/args.test.ts index ca26c09c3..b58db205d 100644 --- a/src/utils/__tests__/args.test.ts +++ b/src/utils/__tests__/args.test.ts @@ -79,6 +79,16 @@ test('parseArgs accepts clipboard subcommands', () => { assert.deepEqual(write.positionals, ['write', 'otp', '123456']); }); +test('parseArgs accepts keyboard subcommands', () => { + const status = parseArgs(['keyboard', 'status'], { strictFlags: true }); + assert.equal(status.command, 'keyboard'); + assert.deepEqual(status.positionals, ['status']); + + const dismiss = parseArgs(['keyboard', 'dismiss'], { strictFlags: true }); + assert.equal(dismiss.command, 'keyboard'); + assert.deepEqual(dismiss.positionals, ['dismiss']); +}); + test('parseArgs recognizes --debug alias for verbose mode', () => { const parsed = parseArgs(['open', 'settings', '--debug']); assert.equal(parsed.command, 'open'); @@ -309,6 +319,7 @@ test('usage includes --relaunch flag', () => { assert.match(usage(), /network dump/); assert.match(usage(), /--save-script \[path\]/); assert.match(usage(), /clipboard read \| clipboard write /); + assert.match(usage(), /keyboard \[status\|get\|dismiss\]/); assert.match(usage(), /trigger-app-event \[payloadJson\]/); assert.match(usage(), /pinch \[x\] \[y\]/); assert.match(usage(), /--state-dir /); @@ -476,6 +487,13 @@ test('clipboard command usage is documented', () => { assert.match(help, /Read or write device clipboard text/); }); +test('keyboard command usage is documented', () => { + const help = usageForCommand('keyboard'); + if (help === null) throw new Error('Expected command help text'); + assert.match(help, /keyboard 
\[status\|get\|dismiss\]/); + assert.match(help, /Inspect Android keyboard visibility\/type or dismiss it/); +}); + test('settings usage documents canonical faceid states', () => { const help = usageForCommand('settings'); if (help === null) throw new Error('Expected command help text'); diff --git a/src/utils/command-schema.ts b/src/utils/command-schema.ts index b8804a955..a0a69ba64 100644 --- a/src/utils/command-schema.ts +++ b/src/utils/command-schema.ts @@ -528,6 +528,12 @@ const COMMAND_SCHEMAS: Record = { allowsExtraPositionals: true, allowedFlags: [], }, + keyboard: { + usageOverride: 'keyboard [status|get|dismiss]', + description: 'Inspect Android keyboard visibility/type or dismiss it', + positionalArgs: ['action?'], + allowedFlags: [], + }, perf: { description: 'Show session performance metrics (startup timing)', positionalArgs: [], diff --git a/website/docs/docs/commands.md b/website/docs/docs/commands.md index 722bc4472..5812c4521 100644 --- a/website/docs/docs/commands.md +++ b/website/docs/docs/commands.md @@ -258,6 +258,19 @@ agent-device clipboard write "" # clear clipboard - Supported on Android emulator/device and iOS simulator. - iOS physical devices currently return `UNSUPPORTED_OPERATION` for clipboard commands. +## Keyboard (Android) + +```bash +agent-device keyboard status +agent-device keyboard get +agent-device keyboard dismiss +``` + +- `keyboard status` (or `keyboard get`) returns keyboard visibility and best-effort input type classification. +- `keyboard dismiss` dismisses keyboard with Android back keyevent only when the keyboard is visible, then confirms hidden state. +- Works with active sessions and explicit selectors (`--platform`, `--device`, `--udid`, `--serial`). +- Supported on Android emulator/device. 
+ ## Performance metrics ```bash diff --git a/website/docs/docs/introduction.md b/website/docs/docs/introduction.md index 1e8b40a6d..b70d966c7 100644 --- a/website/docs/docs/introduction.md +++ b/website/docs/docs/introduction.md @@ -30,7 +30,7 @@ For exploratory QA and bug-hunting workflows, see `skills/dogfood/SKILL.md` in t - Physical devices use runner screenshot capture (`XCUIScreen.main.screenshot()` frames) stitched into MP4, so FPS is best-effort (not guaranteed 60 even with `--fps 60`). - Physical-device recording requires an active app session context (`open ` first). - Physical-device recording defaults to uncapped (max available) FPS and supports `--fps` caps. -- Android supports the same core interaction set, plus `push` notification simulation and `clipboard read/write` via adb shell commands. +- Android supports the same core interaction set, plus `push` notification simulation, `clipboard read/write`, and `keyboard status|get|dismiss` via adb shell commands. - App-event triggers are available on iOS and Android through app-defined deep-link hooks (`trigger-app-event`), using active session context or explicit device selectors. ## Architecture (high level)