[NoQA] Add Android emulator smoke workflow (agent-device · Phase 0) #89896

Draft
rustam-callstack wants to merge 6 commits into Expensify:main from rustam-callstack:feat/agent-device-smoke-android-phase0

rustam-callstack commented May 7, 2026

Explanation of Change

Adds a Phase-0 build-health canary that runs on every PR and on manual dispatch:

  1. Pulls the developmentDebug APK from Rock's S3 cache (npx rock build:android --variant developmentDebug) — falls back to a local Gradle build if the fingerprint misses.
  2. Boots a Pixel 8 / API 35 / google_apis / x86_64 emulator on blacksmith-4vcpu-ubuntu-2404 via reactivecircus/android-emulator-runner@v2 with the standard two-stage AVD-cache pattern.
  3. Inside the emulator's script: block, the new .github/scripts/agent-device-smoke.sh:
    • tees adb logcat into artifacts/logcat.txt
    • installs the APK with adb install and runs adb reverse tcp:8081 tcp:8081
    • brings up Metro and gates on /status reaching packager-status:running
    • launches via agent-device open … --relaunch (pinned to agent-device@0.14.7)
    • captures a screenshot, accessibility-tree snapshot, foreground-app dump, and a short cold-start MP4
    • fails the run if the app is not in the foreground after launch (catches install-but-immediate-crash regressions).
  4. Uploads artifacts/ with if: always() so failures are debuggable.
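
The Metro gate in step 3 can be sketched as a small polling loop. This is a minimal sketch under stated assumptions, not the script added by this PR: the function name `wait_for_metro`, its arguments, the default timeout, and the `METRO_FETCH` override are hypothetical. Only port 8081, the `/status` endpoint, and the `packager-status:running` response come from the description above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not the script from this PR): poll Metro's /status
# endpoint until it answers "packager-status:running" or the timeout expires.
# METRO_FETCH is an assumed override hook so the loop can run without Metro.
METRO_FETCH="${METRO_FETCH:-curl -fsS --max-time 2}"

wait_for_metro() {
  local url="${1:-http://localhost:8081/status}"
  local timeout="${2:-60}"
  local deadline body
  deadline=$(( $(date +%s) + timeout ))
  while [ "$(date +%s)" -lt "$deadline" ]; do
    # A failed fetch (Metro not listening yet) is not fatal; retry.
    # METRO_FETCH is intentionally unquoted so its flags split into words.
    body="$($METRO_FETCH "$url" 2>/dev/null || true)"
    case "$body" in
      *packager-status:running*) return 0 ;;
    esac
    sleep 1
  done
  echo "Metro did not report packager-status:running within ${timeout}s" >&2
  return 1
}
```

A caller would run something like `wait_for_metro http://localhost:8081/status 120 || exit 1` before launching the app, so a bundler that never comes up fails the run early instead of surfacing later as a blank screen.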

Phase 0 explicitly does not log in or run a tab tour: the magic-code 6-cell input rejects all programmatic input we tested locally (agent-device fill, adb shell input text, paste). The full tab-tour smoke is deferred to a separate Phase 1 change, which would add a build variant, Onyx-state seed delivery, and 1Password rotation. This PR is the workflow harness only.

Cost guards

  • concurrency.cancel-in-progress keyed by ref: newer pushes to the same PR cancel older smoke runs
  • timeout-minutes: 35 — hard ceiling
  • paths-ignore for docs/**, help/**, .github/**, contributingGuides/**, tests/**, **.md, **.sh (matches reassurePerformanceTests.yml)
  • if: github.actor != 'OSBotify' (matches reassurePerformanceTests.yml)

Required secrets

All already used by other workflows in this repo: OS_BOTIFY_TOKEN, MAPBOX_SDK_DOWNLOAD_TOKEN, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY.

Fixed Issues

$
PROPOSAL:

No tracking issue exists for this work yet. Opening as a draft so it can be discussed before going through the formal Bug Zero / proposal flow. Happy to move it to ready-for-review and link an issue once direction is confirmed.

Tests

This PR introduces a CI workflow only, with no runtime app code changes. Verification consists of exercising the workflow itself.

  1. From this branch, run Actions → Android Smoke (agent-device · Phase 0) → Run workflow (workflow_dispatch).
  2. Confirm the run lands on blacksmith-4vcpu-ubuntu-2404.
  3. Confirm the Verify KVM step succeeds (/dev/kvm exists and is writable).
  4. Confirm the Build / fetch developmentDebug APK via Rock step logs Downloaded cached build from S3 (…) and that an APK lands at android/app/build/outputs/apk/development/debug/*.apk.
  5. First run only: confirm Prime AVD snapshot completes in ≤ 5 min (cache miss path); subsequent runs hit the AVD cache and skip this step.
  6. Confirm the Run smoke step writes:
    • artifacts/landing.png showing the Expensify SignIn screen
    • artifacts/snapshot.txt containing an accessibility tree with refs like @e<n> [text-field] "Phone or email" [editable]
    • artifacts/appstate.txt containing Foreground app: com.expensify.chat.dev
    • artifacts/logcat.txt (non-empty)
    • artifacts/cold-start.mp4 (≥ 1 KB)
  7. Confirm the artifact smoke-android-<run_id>-<run_attempt> uploads successfully (visible in the run summary).
  8. Open a no-op PR (e.g. a whitespace change in src/) and confirm the workflow auto-fires.
  9. Push a follow-up commit to the same PR before the run completes; confirm cancel-in-progress cancels the prior run.
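
The artifact checks in step 6 could also be scripted rather than eyeballed. The following is a minimal sketch, assuming the artifact names and package name listed above; the function name `verify_smoke_artifacts` and its error messages are hypothetical and not part of this PR. It cannot confirm that landing.png actually shows the SignIn screen, only that the files exist and pass basic sanity checks.

```shell
#!/usr/bin/env bash
# Hypothetical post-run check (not part of this PR): assert the artifacts
# listed in step 6 exist and look sane. Paths and the package name come from
# the steps above; the function name and messages are assumptions.
verify_smoke_artifacts() {
  local dir="${1:-artifacts}"
  local pkg="${2:-com.expensify.chat.dev}"
  local f size

  # Every artifact must exist and be non-empty.
  for f in landing.png snapshot.txt appstate.txt logcat.txt cold-start.mp4; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      return 1
    fi
  done

  # The app must be reported as the foreground app after launch.
  if ! grep -Fq "Foreground app: $pkg" "$dir/appstate.txt"; then
    echo "$pkg is not the foreground app" >&2
    return 1
  fi

  # The recording must be at least 1 KB (1024 bytes).
  # Arithmetic expansion strips any padding that wc prints on some systems.
  size=$(( $(wc -c < "$dir/cold-start.mp4") ))
  if [ "$size" -lt 1024 ]; then
    echo "cold-start.mp4 is only $size bytes" >&2
    return 1
  fi
}
```

Running `verify_smoke_artifacts artifacts` after downloading the uploaded bundle returns non-zero on the first failed check and prints the reason to stderr.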

Failure-mode probe (optional):

  • Toggle emulator-options: -accel off to force the boot to fail; confirm timeout-minutes kicks in and the Upload artifacts step still runs (if: always()).

  • Verify that no errors appear in the JS console

Offline tests

N/A — this PR adds CI infrastructure only. No app runtime is changed.

QA Steps

N/A — files under .github/ are not shipped to staging or production.

  • Verify that no errors appear in the JS console

PR Author Checklist

  • I linked the correct issue in the `### Fixed Issues` section above

    No tracking issue exists yet; opened as a draft to discuss direction first.

  • I wrote clear testing steps that cover the changes made in this PR

    • I added steps for local testing in the `Tests` section
    • I added steps for the expected offline behavior in the `Offline steps` section
    • I added steps for Staging and/or Production testing in the `QA steps` section
    • I added steps to cover failure scenarios
  • I included screenshots or videos for tests on all platforms — N/A, CI workflow change only

  • I ran the tests on all platforms — N/A, CI workflow change only:

    • Android: Native — N/A
    • Android: mWeb Chrome — N/A
    • iOS: Native — N/A
    • iOS: mWeb Safari — N/A
    • MacOS: Chrome / Safari — N/A
  • I verified there are no console errors

  • I followed proper code patterns

  • I followed the guidelines as stated in the Review Guidelines

  • I tested other components that can be impacted by my changes — N/A

  • I verified all code is DRY

  • I verified any variables that can be defined as constants are

  • I verified that if a function's arguments changed that all usages have also been updated correctly

  • If any new file was added I verified that:

    • The file has a description of what it does and/or why is needed at the top of the file if the code is not self explanatory

Screenshots/Videos

Android: Native

N/A — CI workflow change only. The workflow itself produces landing.png + cold-start.mp4 artifacts that demonstrate end-to-end behavior; those will be attached after the first successful run from this branch.

Android: mWeb Chrome

N/A

iOS: Native

N/A — Phase 0 is Android only (see Phase 1 plan in the workflow file's header comment).

iOS: mWeb Safari

N/A

MacOS: Chrome / Safari

N/A

Phase-0 build-health canary: pulls the developmentDebug APK from Rock's
S3 cache, boots a Pixel 8 / API 35 emulator on a Blacksmith Docker
runner, installs the APK, launches via agent-device, and uploads a
landing-screen screenshot, accessibility tree, foreground-app dump,
logcat, and a short cold-start MP4. Does NOT log in or run a tab
tour — that's Phase 1, deferred until an Onyx-seed login bypass
exists.

Triggered on PR open/synchronize (with the same paths-ignore /
branches-ignore / OSBotify-skip guards as reassurePerformanceTests.yml)
and via workflow_dispatch.

Three cost guards: concurrency cancel-in-progress keyed by ref,
35-min timeout, paths-ignore for docs/tests/.github/.sh.

Files:
- .github/workflows/smokeAndroid.yml
- .github/scripts/agent-device-smoke.sh

Rustam Zeinalov added 4 commits May 7, 2026 15:22
Replaces the bare sleep 8 (which captured the splash on slower runners)
with a snapshot poll up to 120s waiting for the email text-field to
appear. After it lands:

- screenshot landing.png + snapshot-signin.txt
- agent-device fill 'label="Phone or email"' rustam.zeinalov@callstack.com
- agent-device press 'label="Continue"'
- snapshot poll up to 30s waiting for the magic-code field
- screenshot post-continue.png + snapshot-magic-code.txt

Hard fails on either timeout. The flow stops short of typing the magic
code itself: that 6-cell composite input rejects every programmatic
input we tested. Solving it is Phase 1 (Onyx-seed login bypass).
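
The snapshot poll described in this commit might look roughly like the sketch below. It is an illustration under stated assumptions, not the code from the commit: `poll_snapshot_for`, the 2-second interval, and the `SNAPSHOT_CMD` variable are hypothetical, and `SNAPSHOT_CMD` stands in for whichever agent-device invocation prints the accessibility tree.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not the script from this PR): re-run a snapshot
# command until its output contains a marker string or the timeout expires.
# SNAPSHOT_CMD is an assumption; substitute whatever agent-device invocation
# prints the accessibility tree.
poll_snapshot_for() {
  local needle="$1" timeout="$2" out_file="$3"
  local deadline
  deadline=$(( $(date +%s) + timeout ))
  while [ "$(date +%s)" -lt "$deadline" ]; do
    # Keep the latest snapshot on disk so a timeout is still debuggable.
    $SNAPSHOT_CMD > "$out_file" 2>/dev/null || true
    if grep -Fq "$needle" "$out_file"; then
      return 0
    fi
    sleep 2
  done
  echo "timed out after ${timeout}s waiting for: $needle" >&2
  return 1
}
```

Compared with a fixed `sleep 8`, the poll returns as soon as the field appears on a fast runner and keeps waiting on a slow one, and the last snapshot written before a timeout shows what was on screen instead.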

Hermes-engine's configureCMakeRelease[x86_64] task pins CMake 3.30.5
exactly. Pre-installed runner images don't always ship it (GitHub bumps
ubuntu-latest periodically; Blacksmith image contents may also shift).
If Rock's S3 fingerprint misses and we fall back to local Gradle, the
build dies with [CXX1300] CMake '3.30.5' was not found.

sdkmanager --install "cmake;3.30.5" lands the binaries under
$ANDROID_HOME/cmake/3.30.5/bin where Gradle auto-detects them.
Idempotent on cached runners; ~30s overhead on cold.
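
The idempotent install can be sketched as follows. This is a hedged illustration, not the workflow step from this PR: the function name `ensure_cmake` and the `SDKMANAGER` override are hypothetical. The package string and the install path are the ones quoted in the commit message, and `ANDROID_HOME` is assumed to be set by the runner.

```shell
#!/usr/bin/env bash
# Hypothetical step (not the workflow from this PR): install the CMake
# version Hermes pins, skipping the download when it is already present.
# SDKMANAGER is an assumed override hook; ANDROID_HOME must be set.
SDKMANAGER="${SDKMANAGER:-sdkmanager}"

ensure_cmake() {
  local version="${1:-3.30.5}"
  local cmake_bin="$ANDROID_HOME/cmake/$version/bin/cmake"
  if [ -x "$cmake_bin" ]; then
    echo "CMake $version already installed"
    return 0
  fi
  "$SDKMANAGER" --install "cmake;$version" || return 1
  # Confirm the binary landed where Gradle looks for it.
  [ -x "$cmake_bin" ]
}
```

Checking for the binary first keeps the step cheap on cached runners, and the final test turns a silent partial install into an immediate failure rather than a later [CXX1300] error from Gradle.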

Free ubuntu-latest is significantly slower than the 4-core Blacksmith
runner. The 120s budget covers it on the upstream path but leaves the
fork-test capturing only the splash screen at timeout (verified in
artifacts: green E logo, no SignIn UI).

Also bumped post-Continue timeout to 60s for symmetry; the magic-code
screen typically appears within 5s but a slow runner could need more.

Last run hit a Diagnostic at fill, because Android's text-field label is
"Phone or email," (with trailing comma) while iOS has no comma, and
agent-device's selector form does exact-match. Cross-platform fix:
grep the @e ref out of the snapshot we already captured (refs are
stable inside one session as long as we don't re-snapshot in between)
and act on it directly.

Also bumped SIGNIN_LOAD_TIMEOUT 300 -> 360. Last run reached SignIn at
291s, dangerously close to the previous 300s ceiling.
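
Pulling the ref out of an already-captured snapshot might look like the sketch below. It assumes snapshot lines shaped like the example in the Tests section (`@e<n> [text-field] "Phone or email" [editable]`); the function name `find_field_ref` is hypothetical and this is not the code from the commit. Matching on a label prefix is what makes the trailing comma on Android irrelevant.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not the script from this PR): print the first @e<n>
# ref for a text-field whose label starts with the given text, so a trailing
# comma on Android does not matter. The line shape is assumed from the
# snapshot example in the Tests section.
find_field_ref() {
  local snapshot_file="$1" label_prefix="$2"
  grep -F '[text-field]' "$snapshot_file" \
    | grep -F "\"$label_prefix" \
    | grep -oE '@e[0-9]+' \
    | head -n 1
}
```

The function prints nothing when no field matches, so a caller should treat an empty result as a failure before passing the ref to the fill step in place of the label selector.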

Bash has no native block-comment, but `: <<'COMMENT' ... COMMENT` is
the canonical idiom (ShellCheck recognizes it). Mirrors the TS PR's
recent comment-style change for consistency. Single-line `#` comments
left as-is — no clean equivalent. Shebang and ShellCheck directives
preserved. `bash -n` syntax check still passes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
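
For readers unfamiliar with the idiom, here is a minimal illustration. The `greet` function and the comment text are made up for the example and do not come from the script in this PR.

```shell
#!/usr/bin/env bash
: <<'COMMENT'
Everything up to the closing delimiter is fed to the no-op builtin ":" as a
here-document and discarded. Quoting the delimiter disables expansion, so
$(date) and $UNSET_VAR inside this block are never evaluated.
COMMENT

greet() {
  # Single-line comments still use "#".
  echo "hello"
}
```

The quoted delimiter matters: with an unquoted `COMMENT`, the shell would still perform command substitution inside the block even though the text is thrown away.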