fix: improve Android text entry stability #540

Merged
thymikee merged 7 commits into main from fix/android-text-entry-provider on May 15, 2026

@thymikee
Member

@thymikee thymikee commented May 14, 2026

Summary

  • make Android IME-owned input a fast terminal failure and add provider-native text injection support
  • replace Android text fallback with chunk-safe ASCII shell input from the first attempt; no raw adb, clipboard, paste, speculative helper fallback, or provider result object is documented as an agent path
  • require settled Android fill verification so React Native controlled inputs cannot report success for near-complete prefixes such as "filed the expens" (one character short of "filed the expense")
  • keep keyboard next-action guidance derived at the CLI/error boundary instead of storing it in Android keyboard state
  • treat active IME package from dumpsys as the durable input-ownership signal; use known IME packages only as a diagnosed fallback when dumpsys omits the active method
  • add richer keyboard status, helper snapshot diagnostics, and clearer empty interactive snapshot guidance
  • default app discovery to user-installed apps, ignore Android package-query metadata rows, and fix Android apps/runtime import paths in the packaged daemon
  • stabilize Metro companion tunnel test cleanup, debug log tailing, and refreshed Fallow baselines for the changed files
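
The "settled fill verification" bullet above can be illustrated with a small sketch. This is not the PR's implementation: `verifySettledFill`, `FillVerdict`, and the `requiredStableReads` threshold are hypothetical names chosen for illustration, and the function works on a list of read-backs that a caller would collect by polling the field. The idea is that one matching read is not enough, and a near-complete prefix never counts as success.

```typescript
// Hypothetical sketch; names and thresholds are illustrative, not the PR's API.
type FillVerdict =
  | { ok: true; settledAfterReads: number }
  | { ok: false; reason: 'mismatch' | 'not_settled'; lastObserved: string };

/**
 * Decide whether a fill succeeded from successive read-backs of the field.
 * Success requires the last `requiredStableReads` observations to all equal
 * `expected` exactly, so a prefix such as "filed the expens" is rejected.
 */
function verifySettledFill(
  expected: string,
  observations: readonly string[],
  requiredStableReads = 2,
): FillVerdict {
  const lastObserved = observations[observations.length - 1] ?? '';

  // Too few reads to call the value stable yet.
  if (observations.length < requiredStableReads) {
    return { ok: false, reason: 'not_settled', lastObserved };
  }

  const tail = observations.slice(-requiredStableReads);
  if (tail.every((value) => value === expected)) {
    return { ok: true, settledAfterReads: observations.length };
  }

  // The latest read matches but earlier reads in the window did not:
  // the value may still be changing. Otherwise it is a plain mismatch.
  const reason = lastObserved === expected ? 'not_settled' : 'mismatch';
  return { ok: false, reason, lastObserved };
}
```

A caller would keep polling while the verdict is `not_settled` and fail the fill on `mismatch` once its time budget runs out.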

Closes

Related

Validation

  • /bin/zsh -lc 'PATH=/Users/thymikee/.nvm/versions/node/v24.13.0/bin:$PATH ./node_modules/.bin/vitest run'
  • /bin/zsh -lc 'PATH=/Users/thymikee/.nvm/versions/node/v24.13.0/bin:$PATH ./node_modules/.bin/vitest run src/platforms/android/tests/index.test.ts src/platforms/android/tests/input-actions-fill.test.ts src/tests/cli-client-commands.test.ts'
  • /bin/zsh -lc 'PATH=/Users/thymikee/.nvm/versions/node/v24.13.0/bin:$PATH ./node_modules/.bin/tsc -p tsconfig.json'
  • /bin/zsh -lc 'PATH=/Users/thymikee/.nvm/versions/node/v24.13.0/bin:$PATH ./node_modules/.bin/oxlint . --deny-warnings'
  • /bin/zsh -lc 'PATH=/Users/thymikee/.nvm/versions/node/v24.13.0/bin:$PATH ./node_modules/.bin/rslib build'
  • /bin/zsh -lc 'PATH=/Users/thymikee/.nvm/versions/node/v24.13.0/bin:$PATH ./node_modules/.bin/fallow audit --changed-since 1987198 --summary'
  • node --test test/integration/smoke-*.test.ts
  • git diff --check
  • live Android test app: opened Expo test app on port 8082, filled checkout form name field with "filed the expense", and verified the snapshot showed the complete text
  • live Android Expensify full flow on com.expensify.chat.dev: opened Adam Horodyski chat, created a camera-scan expense from the composer action sheet, verified Outstanding increased from 21 to 22 expenses, sent "filed the expense", captured React DevTools profile and RedBox artifact
  • live Android Expensify post-fix text-entry check: rebuilt/restarted daemon, filled composer with "filed the expense", verified snapshot showed the complete text, then cleared the unsent validation text

@github-actions

github-actions Bot commented May 14, 2026

PR Preview Action v1.8.1

🚀 View preview at
https://callstackincubator.github.io/agent-device/pr-preview/pr-540/

Built to branch gh-pages at 2026-05-15 09:20 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

@thymikee force-pushed the fix/android-text-entry-provider branch 2 times, most recently from 114856b to 1e9ddca on May 15, 2026 06:58
Member Author

Architecture Review

Read the PR description, file list, and the patches for the architecturally significant files (input-actions.ts, new input-ownership.ts, device-input-state.ts, fill-verification.ts, adb-executor.ts, snapshot.ts, client-types.ts, core/dispatch.ts, cli/commands/client-command.ts), plus context from Interactor and the iOS runner to compare patterns. Did not read every test file or the Fallow baselines — they're mechanical.

Direction overall

Solid. The PR replaces a guessing-game (clipboard-paste fallback, speculative IME detection) with a fast-fail, structured-error contract plus an explicit provider hook for native injection. That's the right direction for an agent-driven tool: deterministic failure beats silent half-success.

What's well done

  • input-ownership.ts extracted as a shared module; both fill-verification and device-input-state now share one classifier.
  • Fast-fail KEYBOARD_OVERLAY_BLOCKING with failureReason: 'ime_capture' gives the agent a structured signal instead of letting it type into Gboard suggestions.
  • nextAction natural-language hint on keyboard status is pragmatic — the consumer is an LLM.
  • emitDiagnostic timers on snapshot helper resolution/install/capture make the slow path observable.
  • ADB-shell restricted to printable ASCII + \n with a clean assertAndroidShellTextSupported boundary — no more guessing whether input text will survive.
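
As a rough illustration of the "printable ASCII + \n" boundary mentioned in the last bullet, the sketch below validates text up front and then plans chunked input. It is not the PR's `assertAndroidShellTextSupported`: the names `assertShellTextSupportedSketch`, `planShellInput`, and `ShellInputStep`, the chunk size of 16, and the choice to model newlines as a separate "enter" step are all assumptions made for this example.

```typescript
// Hypothetical sketch. Assumption: "printable ASCII" means U+0020..U+007E,
// and newline (U+000A) is the only other accepted character.
type ShellInputStep = { kind: 'text'; value: string } | { kind: 'enter' };

function assertShellTextSupportedSketch(text: string): void {
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    const printable = code >= 0x20 && code <= 0x7e;
    if (!printable && code !== 0x0a) {
      // Fail before anything is typed, so the caller gets a clean error
      // instead of a partially entered string.
      throw new Error(
        `Unsupported character U+${code.toString(16).toUpperCase().padStart(4, '0')} at index ${i}`,
      );
    }
  }
}

/**
 * Split validated text into small text chunks, with an explicit "enter"
 * step wherever the input contained a newline.
 */
function planShellInput(text: string, maxChunkLength = 16): ShellInputStep[] {
  if (maxChunkLength < 1) throw new RangeError('maxChunkLength must be >= 1');
  assertShellTextSupportedSketch(text);

  const steps: ShellInputStep[] = [];
  const lines = text.split('\n');
  lines.forEach((line, lineIndex) => {
    for (let start = 0; start < line.length; start += maxChunkLength) {
      steps.push({ kind: 'text', value: line.slice(start, start + maxChunkLength) });
    }
    if (lineIndex < lines.length - 1) steps.push({ kind: 'enter' });
  });
  return steps;
}
```

Validating the whole string before planning any step is what makes the boundary "clean": unsupported input is rejected as a unit rather than discovered halfway through typing.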

Concerns worth addressing

  1. Wrong layer for AndroidTextInjector. It's mounted on AndroidAdbProvider, but native Android text injection is fundamentally an accessibility-service / instrumentation-helper concern, not an adb-transport concern. iOS already routes type/fill through RunnerCommand on the runner protocol. Long term you'll likely want one cross-platform "text-injection backend" on the Interactor/runner layer rather than a per-platform side-channel on the transport provider. Right now it works because the tunnel happens to wrap adb, but the coupling is conceptually misplaced and will get awkward when a non-adb backend appears (scrcpy, Vysor, dedicated companion).

  2. Hardcoded IME allowlist will rot. input-ownership.ts hardcodes Gboard/Honeyboard/SwiftKey package names and a substring check on 'inputmethod'. You already parse mCurMethodId from dumpsys and pass it in as activeInputMethodPackage — that's the durable signal. Make the allowlist a last-resort fallback when activeInputMethodPackage is missing, and emit a diagnostic when it's used so unknown IMEs surface in telemetry. Also: normalizedPackageName.includes('inputmethod') will false-positive on legitimate apps containing that substring.

  3. nextAction is prose baked into platform code. It mixes presentation with data and isn't testable as a contract. Consider a discriminated union:

    nextAction: { kind: 'dismiss-ime' | 'focus-field' | 'proceed'; message: string }

    so consumers can switch on kind while prose evolves freely.

  4. KeyboardCommandResult is becoming an Android grab-bag. inputMethodPackage, focusedPackage, focusedResourceId, inputOwner, nextAction are all Android-only optionals on a shared type. The CLI already discriminates on platform === 'android'. A discriminated union ({ platform: 'android'; android: {...} } | { platform: 'ios'; ios: {...} }) keeps the shared type honest.

  5. AndroidTextInjectionResult is too thin. Returns { backend, textLength } only. The shell path has rich verification with reasons/attempts; the provider path opaquely "did it." Future-proof with optional structured outcome: method?: 'commit' | 'setText' | 'paste', attempts?, durationMs? — same instrumentation surface across backends.

  6. assertAndroidShellInputIsAppOwned swallows dumpsys failures silently (try { … } catch { return; }). That's exactly when IME-capture might be invisibly breaking us. At minimum emitDiagnostic({ level: 'warn', phase: 'input_ownership_probe_failed' }) so flakes are debuggable.

  7. Provider fill path skips the multi-strategy retry loop that the shell path uses (input_text → chunked_input with re-clear). If the provider has a transient focus-loss, there's no second chance. Consider restructuring fillAndroid so the verify/retry loop is the outer shell and "provider", "input_text", "chunked_input" are strategies inside it.

  8. emitAndroidTextDiagnostic for shell fills doesn't include the strategy (input_text vs chunked_input). Telemetry can't tell which strategy is winning.

  9. parseAndroidLaunchablePackages contract changed (now requires / and .) — bundled into a "text entry stability" PR. The change is probably fine, but it deserves a dedicated test asserting the intent and a line in the PR body so reviewers don't miss it.
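
One possible shape for the restructuring suggested in concern 7 is sketched below: the clear/enter/verify loop is the outer shell, and each backend is a strategy inside it. This is an assumption-laden illustration, not the PR's `fillAndroid`: `fillWithStrategies`, `FillStrategy`, `FillDeps`, and `FillOutcome` are hypothetical names, and the sketch is synchronous for brevity where real device I/O would be asynchronous.

```typescript
// Hypothetical sketch of "verify/retry loop outside, strategies inside".
interface FillStrategy {
  name: string; // e.g. "provider", "input_text", "chunked_input"
  enter: (text: string) => void; // types the text; assumed to throw on hard failure
}

interface FillDeps {
  clear: () => void; // re-clear the field before each attempt
  readBack: () => string; // read the field's current value
}

type FillOutcome =
  | { ok: true; strategy: string; attempts: number }
  | { ok: false; attempts: number; failures: Array<{ strategy: string; observed: string }> };

function fillWithStrategies(
  text: string,
  strategies: readonly FillStrategy[],
  deps: FillDeps,
): FillOutcome {
  const failures: Array<{ strategy: string; observed: string }> = [];
  let attempts = 0;

  for (const strategy of strategies) {
    attempts += 1;
    deps.clear();
    strategy.enter(text);

    // Every strategy goes through the same verification, so a provider
    // path cannot report success without a matching read-back.
    const observed = deps.readBack();
    if (observed === text) {
      return { ok: true, strategy: strategy.name, attempts };
    }
    failures.push({ strategy: strategy.name, observed });
  }

  return { ok: false, attempts, failures };
}
```

Because the winning strategy name is part of the outcome, the same structure would also cover concern 8: telemetry can record which strategy succeeded.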

Forward-looking suggestion

The cleanest evolution would be:

  • Promote text injection to Interactor-level concept (parallel to how iOS does it via RunnerCommand), with AndroidTextInjector registered through the companion/runner protocol, not the adb provider.
  • Treat input-ownership classification as signal-first (active IME package from dumpsys) with the hardcoded list as a fallback.
  • Promote nextAction to a discriminated union before agents bake the current prose into prompts.

None of these are blockers — they're "do this before the next person adds a third backend" cleanups. The current PR is a real improvement; ship it, then refactor on top.
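
To make the nextAction suggestion concrete, here is a minimal sketch of a consumer that switches on `kind` and never inspects the prose. The `kind` values come from the review comment above; the `AgentStep` type and `planFromNextAction` function are hypothetical and only exist for this example.

```typescript
// The kind/message shape follows the review's proposal; everything else is illustrative.
type NextActionKind = 'dismiss-ime' | 'focus-field' | 'proceed';

interface NextAction {
  kind: NextActionKind;
  message: string; // human/LLM-readable prose, free to change between releases
}

// Hypothetical set of steps an agent could take in response.
type AgentStep = 'press-back' | 'tap-field' | 'continue';

function planFromNextAction(action: NextAction): AgentStep {
  switch (action.kind) {
    case 'dismiss-ime':
      return 'press-back';
    case 'focus-field':
      return 'tap-field';
    case 'proceed':
      return 'continue';
    default: {
      // Exhaustiveness check: adding a new kind without handling it
      // becomes a compile-time error here.
      const unreachable: never = action.kind;
      throw new Error(`Unhandled nextAction kind: ${String(unreachable)}`);
    }
  }
}
```

Two results with the same `kind` but different `message` text map to the same step, which is the property that lets the prose evolve without breaking consumers.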


Generated by Claude Code

@thymikee
Member Author

Addressed the review follow-ups in d49a3513:

  • made active mCurMethodId the primary IME ownership signal, moved known IME packages to last-resort fallback only, removed the broad inputmethod substring match, and added a diagnostic when the fallback is used
  • added a diagnostic when assertAndroidShellInputIsAppOwned cannot probe dumpsys state instead of swallowing the failure silently
  • added parser coverage for parseAndroidLaunchablePackages ignoring cmd package query metadata rows, and called that out in the PR body
  • added regression coverage for the fallback IME diagnostic and the removed inputmethod false positive
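
The signal-first classification described in these bullets might look roughly like the sketch below. It is an illustration under assumptions, not the code in input-ownership.ts: `classifyInputOwner`, `OwnershipResult`, the diagnostic phase name, and the contents of the fallback set are hypothetical (the three package names are the commonly published IDs for Gboard, Samsung Keyboard, and SwiftKey, used here only as examples).

```typescript
// Hypothetical sketch of signal-first IME ownership classification.
interface Diagnostic {
  level: 'warn';
  phase: string;
  detail?: string;
}

interface OwnershipResult {
  owner: 'ime' | 'app';
  source: 'active-ime' | 'known-ime-fallback' | 'none';
}

// Illustrative last-resort list; the PR's actual list is not shown in this thread.
const KNOWN_IME_PACKAGES_FALLBACK: ReadonlySet<string> = new Set([
  'com.google.android.inputmethod.latin',
  'com.samsung.android.honeyboard',
  'com.touchtype.swiftkey',
]);

function classifyInputOwner(
  focusedPackage: string,
  activeInputMethodPackage: string | undefined,
  emitDiagnostic: (diagnostic: Diagnostic) => void,
): OwnershipResult {
  // Primary signal: the active IME package reported by the system.
  if (activeInputMethodPackage !== undefined) {
    const owner = focusedPackage === activeInputMethodPackage ? 'ime' : 'app';
    return { owner, source: 'active-ime' };
  }

  // Fallback: exact match against known IME packages, with a diagnostic so
  // reliance on the list is visible. No substring matching on "inputmethod".
  if (KNOWN_IME_PACKAGES_FALLBACK.has(focusedPackage)) {
    emitDiagnostic({
      level: 'warn',
      phase: 'input_ownership_known_ime_fallback',
      detail: focusedPackage,
    });
    return { owner: 'ime', source: 'known-ime-fallback' };
  }

  return { owner: 'app', source: 'none' };
}
```

With exact matching, an app whose package name merely contains "inputmethod" is classified as app-owned, which is the false positive the regression test above is described as covering.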

Already covered by the previous cleanup commit on this PR:

  • removed AndroidTextInjectionResult rather than growing a thin provider result object
  • moved keyboard nextAction prose out of Android keyboard state and into CLI/error presentation
  • removed named shell pseudo-strategies; shell text entry is now chunk-safe from the first attempt, so there is no separate input_text versus chunked_input strategy to report

Left as follow-up design work rather than expanding this PR:

  • moving Android provider text injection from the adb provider seam to a future interactor/runner-level backend
  • reshaping KeyboardCommandResult into a discriminated platform union
  • provider-level retry orchestration around transient native helper failures

@thymikee force-pushed the fix/android-text-entry-provider branch from 1f6372b to e562273 on May 15, 2026 09:20
@thymikee thymikee merged commit 7e14dec into main May 15, 2026
18 checks passed
@thymikee thymikee deleted the fix/android-text-entry-provider branch May 15, 2026 10:16