4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -10,6 +10,10 @@ Full module documentation: [hexdocs.pm/mob](https://hexdocs.pm/mob).

## [Unreleased]

### Added
- **Element positions without a screenshot.** `element_frames/0` NIF surfaced as `Mob.Test.element_frames/1` (`%{id => {x,y,w,h}}`), `frame/2`, and `tap_id/2` (drive by id at real coordinates). Any rendered node given an `:id` reports its live on-screen frame (logical points iOS / dp Android) to a registry the agent reads over dist — a compact structured map instead of image bytes, with no accessibility activation. The renderer also sets the `:id` as the element's accessibility identifier (iOS `accessibilityIdentifier`, Android Compose `testTag`), so the same tags are visible to XCUITest/Espresso. Opt-in per element: untagged nodes cost nothing (the tracking modifier only attaches when an `:id` is present). iOS records the full element frame via a `GeometryReader` background; Android via `Modifier.onGloballyPositioned`. Verified on iOS sim, Android device, and a physical iPhone. The Android Kotlin side lives in the `mob_new` `MobBridge.kt.eex` template.
- **In-process screenshot + scroll control over dist (no adb/xcrun).** Three test-harness NIFs (`screenshot/3`, `scroll_info/1`, `scroll_to/3`) surfaced as `Mob.Test.screenshot/2`, `scroll_info/2`, `scroll_to/4`, and `screenshot_tour/3`. A remotely-connected agent gets pixels and deterministic scroll entirely over Erlang distribution — the capability Sloppy Joe and WireTap need to drive a device an agent can only reach over dist. Capture is in-process (iOS `UIGraphicsImageRenderer` + `drawViewHierarchy`; Android `PixelCopy` against the activity window). Scroll views are addressed by their `:id` prop; `scroll_info` reports `kind: :pixel` (iOS `UIScrollView`, Android `verticalScroll`) or `:index` (Android `LazyColumn`, where y is an item index and viewport is the visible-item count). Captures the app's own surface only — `FLAG_SECURE`/secure fields render blank, and a backgrounded app returns `{:error, :no_window}`. The Android Kotlin side (`screenshot`/`scrollInfo`/`scrollTo`) lives in the `mob_new` `MobBridge.kt.eex` template; existing apps pick it up on regeneration. Debug-only (iOS `#if !MOB_RELEASE`). See `decisions/2026-05-29-bridge-nif-screenshot-scroll.md`.

### Changed
- **`Mob.Bt` extracted to standalone `mob_bluetooth` plugin.** See `plugin_extraction_plan.md` Wave 1. Session A moved the Elixir wrappers (`Mob.Bt`, `Mob.Bt.Hfp`, `Mob.Bt.Hid`, `Mob.Bt.Spp`) out of core into a separate repo as `MobBluetooth.*`; the Zig NIF (`android/jni/mob_nif.zig`) and the iOS stubs (`ios/mob_nif.m`) stay here until Session B promotes the plugin to tier-1. Apps that used `Mob.Bt.*` should add `{:mob_bluetooth, path: "..."}` and rename their references to `MobBluetooth.*` — there is intentionally no compatibility shim.

137 changes: 137 additions & 0 deletions android/jni/mob_nif.zig
@@ -295,6 +295,10 @@ pub const BridgeMethods = extern struct {
clear_text: jni.JMethodID = null,
long_press_xy: jni.JMethodID = null,
swipe_xy: jni.JMethodID = null,
screenshot: jni.JMethodID = null,
scroll_info: jni.JMethodID = null,
scroll_to: jni.JMethodID = null,
element_frames: jni.JMethodID = null,
// ── Mob.Peripheral.VendorUsb ─────────────────────────────────────────
// Each takes a pid as jlong (so Kotlin can echo it back when calling
// mob_deliver_vendor_usb_*) plus the operation's typed payload.
@@ -590,6 +594,131 @@ export fn nif_screen_info(
return erts.makeMap(env, &keys, &vvals) orelse erts.atom(env, "error");
}

// ── In-process screenshot + scroll control (agent driving over dist) ─────────
//
// Mirrors the iOS NIFs. These delegate to MobBridge (PixelCopy for capture,
// Compose scroll state for scroll) so a remotely-connected agent gets pixels +
// deterministic scroll with no adb/xcrun. Bridge methods are optional: apps
// generated before these existed return {:error, :not_loaded}.

// Copy an id binary into a NUL-terminated buffer; returns the C string and an
// optional heap pointer the caller must free. Mirrors the nif_tap/type_text idiom.
const IdBuf = struct { cstr: [*:0]const u8, heap: ?*anyopaque };

fn idCString(bin: erts.ErlNifBinary, stack_buf: []u8) ?IdBuf {
const use_heap = bin.size + 1 > stack_buf.len;
const heap_buf: ?*anyopaque = if (use_heap) jni.malloc(bin.size + 1) else null;
if (use_heap and heap_buf == null) return null;
const buf_ptr: [*]u8 = if (use_heap) @ptrCast(heap_buf) else stack_buf.ptr;
@memcpy(buf_ptr[0..bin.size], bin.data[0..bin.size]);
buf_ptr[bin.size] = 0;
return .{ .cstr = @ptrCast(buf_ptr), .heap = heap_buf };
}

// nif_screenshot/3 — capture the activity window; returns PNG/JPEG bytes.
export fn nif_screenshot(
env: ?*erts.ErlNifEnv,
argc: c_int,
argv: [*]const erts.ERL_NIF_TERM,
) callconv(.c) erts.ERL_NIF_TERM {
_ = argc;
if (Bridge.screenshot == null) return notLoaded(env);

var fmt: [8]u8 = @splat(0);
if (erts.enif_get_atom(env, argv[0], &fmt, fmt.len, erts.ERL_NIF_LATIN1) == 0)
return erts.badarg(env);
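// quality and scale are lenient: a non-integer quality keeps the default 90,
// and a non-numeric scale falls back to 1.0 instead of raising badarg.
var quality: c_int = 90;
_ = erts.enif_get_int(env, argv[1], &quality);
const scale = erts.getNumber(env, argv[2]) orelse 1.0;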
const fmt_cstr: [*:0]const u8 = @ptrCast(&fmt);

var attached: c_int = 0;
const jenv = get_jenv(&attached) orelse return erts.atom(env, "error");
defer detachIfAttached(attached);

const jfmt = jni.newStringUTF(jenv, fmt_cstr);
const jbytes = jenv.*.CallStaticObjectMethod.?(jenv, Bridge.cls, Bridge.screenshot, jfmt, @as(jni.JInt, @intCast(quality)), @as(f64, scale));
jni.deleteLocalRef(jenv, jfmt);
if (jbytes == null) return errorAtom(env, "no_window");

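const len = jni.getArrayLength(jenv, jbytes);
var bin: erts.ErlNifBinary = undefined;
// enif_alloc_binary returns 0 on allocation failure; bail out rather than
// copying into (and returning) an uninitialised binary.
if (erts.enif_alloc_binary(@intCast(len), &bin) == 0) {
jni.deleteLocalRef(jenv, jbytes);
return erts.atom(env, "error");
}
if (len > 0) jni.getByteArrayRegion(jenv, jbytes, 0, len, @ptrCast(bin.data));
jni.deleteLocalRef(jenv, jbytes);
return erts.enif_make_binary(env, &bin);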
}

// nif_scroll_info/1 — read a scroll view's offset/extent (JSON string by :id).
export fn nif_scroll_info(
env: ?*erts.ErlNifEnv,
argc: c_int,
argv: [*]const erts.ERL_NIF_TERM,
) callconv(.c) erts.ERL_NIF_TERM {
_ = argc;
if (Bridge.scroll_info == null) return notLoaded(env);
var bin: erts.ErlNifBinary = undefined;
if (erts.enif_inspect_binary(env, argv[0], &bin) == 0) return erts.badarg(env);

var stack_buf: [256]u8 = undefined;
const id = idCString(bin, &stack_buf) orelse return erts.atom(env, "error");
defer if (id.heap) |h| jni.free(h);

var attached: c_int = 0;
const jenv = get_jenv(&attached) orelse return erts.atom(env, "error");
defer detachIfAttached(attached);

const jid = jni.newStringUTF(jenv, id.cstr);
const jresult = jenv.*.CallStaticObjectMethod.?(jenv, Bridge.cls, Bridge.scroll_info, jid);
jni.deleteLocalRef(jenv, jid);
if (jresult == null) return errorAtom(env, "scroll_view_not_found");
return jstringToBin(env, jenv, jresult); // releases jresult
}

// nif_scroll_to/3 — scroll a view (by :id) to absolute (x, y).
export fn nif_scroll_to(
env: ?*erts.ErlNifEnv,
argc: c_int,
argv: [*]const erts.ERL_NIF_TERM,
) callconv(.c) erts.ERL_NIF_TERM {
_ = argc;
if (Bridge.scroll_to == null) return notLoaded(env);
var bin: erts.ErlNifBinary = undefined;
if (erts.enif_inspect_binary(env, argv[0], &bin) == 0) return erts.badarg(env);
const x = erts.getNumber(env, argv[1]) orelse return erts.badarg(env);
const y = erts.getNumber(env, argv[2]) orelse return erts.badarg(env);

var stack_buf: [256]u8 = undefined;
const id = idCString(bin, &stack_buf) orelse return erts.atom(env, "error");
defer if (id.heap) |h| jni.free(h);

var attached: c_int = 0;
const jenv = get_jenv(&attached) orelse return erts.atom(env, "error");
defer detachIfAttached(attached);

const jid = jni.newStringUTF(jenv, id.cstr);
const ok = jenv.*.CallStaticBooleanMethod.?(jenv, Bridge.cls, Bridge.scroll_to, jid, @as(f64, x), @as(f64, y));
jni.deleteLocalRef(jenv, jid);
return if (ok != 0) erts.ok(env) else errorAtom(env, "scroll_view_not_found");
}

// nif_element_frames/0 — JSON {id:[x,y,w,h],...} of tagged element frames (dp).
export fn nif_element_frames(
env: ?*erts.ErlNifEnv,
argc: c_int,
argv: [*]const erts.ERL_NIF_TERM,
) callconv(.c) erts.ERL_NIF_TERM {
_ = argc;
_ = argv;
if (Bridge.element_frames == null) return notLoaded(env);
var attached: c_int = 0;
const jenv = get_jenv(&attached) orelse return erts.atom(env, "error");
defer detachIfAttached(attached);
const jresult = jenv.*.CallStaticObjectMethod.?(jenv, Bridge.cls, Bridge.element_frames);
// Guard against a null result before converting, as nif_scroll_info does.
if (jresult == null) return erts.atom(env, "error");
return jstringToBin(env, jenv, jresult); // releases jresult
}

// nif_ax_action/2 + nif_ax_action_at_xy/3 — Android stubs.
//
// Both are iOS-only today. Compose semantics walker (the proper Android
@@ -4886,6 +5015,10 @@ fn nifLoad(env: ?*erts.ErlNifEnv, priv: *?*anyopaque, info: erts.ERL_NIF_TERM) c
cacheOptional(jenv, "uiTree", "()Ljava/lang/String;", &Bridge.ui_tree);
cacheOptional(jenv, "uiViewTree", "()Ljava/lang/String;", &Bridge.ui_view_tree);
cacheOptional(jenv, "screenInfo", "()[F", &Bridge.screen_info);
cacheOptional(jenv, "screenshot", "(Ljava/lang/String;ID)[B", &Bridge.screenshot);
cacheOptional(jenv, "scrollInfo", "(Ljava/lang/String;)Ljava/lang/String;", &Bridge.scroll_info);
cacheOptional(jenv, "scrollTo", "(Ljava/lang/String;DD)Z", &Bridge.scroll_to);
cacheOptional(jenv, "elementFrames", "()Ljava/lang/String;", &Bridge.element_frames);
cacheOptional(jenv, "tapXy", "(FF)Z", &Bridge.tap_xy);
cacheOptional(jenv, "tapByLabel", "(Ljava/lang/String;)Z", &Bridge.tap_by_label);
cacheOptional(jenv, "typeText", "(Ljava/lang/String;)Z", &Bridge.type_text);
@@ -4921,6 +5054,10 @@ const nif_funcs = [_]erts.ErlNifFunc{
.{ .name = "clear_text", .arity = 0, .fptr = nif_clear_text, .flags = 0 },
.{ .name = "long_press_xy", .arity = 3, .fptr = nif_long_press_xy, .flags = 0 },
.{ .name = "swipe_xy", .arity = 4, .fptr = nif_swipe_xy, .flags = 0 },
.{ .name = "screenshot", .arity = 3, .fptr = nif_screenshot, .flags = erts.ERL_NIF_DIRTY_JOB_CPU_BOUND },
.{ .name = "scroll_info", .arity = 1, .fptr = nif_scroll_info, .flags = 0 },
.{ .name = "scroll_to", .arity = 3, .fptr = nif_scroll_to, .flags = 0 },
.{ .name = "element_frames", .arity = 0, .fptr = nif_element_frames, .flags = erts.ERL_NIF_DIRTY_JOB_CPU_BOUND },
// Core mob functions.
.{ .name = "platform", .arity = 0, .fptr = nif_platform, .flags = 0 },
.{ .name = "color_scheme", .arity = 0, .fptr = nif_color_scheme, .flags = 0 },
7 changes: 6 additions & 1 deletion android/jni/mob_zig.zig
@@ -511,7 +511,8 @@ pub const JNINativeInterface = extern struct {
ReleaseFloatArrayElements: ?*anyopaque,
ReleaseDoubleArrayElements: ?*anyopaque,
GetBooleanArrayRegion: ?*anyopaque,
- GetByteArrayRegion: ?*anyopaque,
+ // Typed (used by nif_screenshot to read a Kotlin byte[] into a binary).
+ GetByteArrayRegion: ?*const fn (env: *JNIEnv, arr: JByteArray, start: JInt, len: JInt, buf: [*]JByte) callconv(.c) void,
GetCharArrayRegion: ?*anyopaque,
GetShortArrayRegion: ?*anyopaque,
GetIntArrayRegion: ?*anyopaque,
@@ -618,6 +619,10 @@ pub inline fn getFloatArrayRegion(env: *JNIEnv, arr: JObject, start: JInt, len:
env.*.GetFloatArrayRegion.?(env, arr, start, len, buf);
}

pub inline fn getByteArrayRegion(env: *JNIEnv, arr: JByteArray, start: JInt, len: JInt, buf: [*]JByte) void {
env.*.GetByteArrayRegion.?(env, arr, start, len, buf);
}

pub inline fn newByteArray(env: *JNIEnv, len: JSize) JByteArray {
return env.*.NewByteArray.?(env, len);
}
69 changes: 69 additions & 0 deletions decisions/2026-05-29-bridge-nif-screenshot-scroll.md
@@ -0,0 +1,69 @@
# In-process screenshot + scroll control via the bridge NIF

- Date: 2026-05-29
- Status: accepted

## Context

`Mob.Test` already drives Mob apps fully over Erlang distribution (state reads,
taps, navigation, synthetic touches) with no adb/xcrun. The one remaining hard
dependency on external device tooling was the *observe-visually* half of the
agent loop: `PLAN.md`'s Layer 5 (Visual) is "MCP, external" — screenshots came
only from `xcrun simctl io` / `adb screencap`. There was also no way over dist
to read a scroll view's offset/extent or command it to a position (only the
imprecise `swipe_xy` and iOS-AX-only `ax_action :scroll_*`).

This blocks Sloppy Joe and WireTap, which must be programmable by a remote agent
that can only reach the device over dist. The agent needs eyes and deterministic
scroll through the bridge NIF itself.

## Decision

Add three test-harness NIFs, surfaced on `Mob.Test`:

- `screenshot/3` (format, quality, scale) → PNG/JPEG bytes, returned over dist.
- `scroll_info/1` (id) → flat JSON `{offset,content,viewport,max,kind}`.
- `scroll_to/3` (id, x, y) → absolute offset (clamped by the Elixir wrapper).

`Mob.Test` adds `screenshot/2`, `scroll_info/2`, `scroll_to/4`, and
`screenshot_tour/3` (page top→bottom, capture each). Target resolution
(`:top`/`:bottom`/`{:page,n}`/`{x,y}`) and the tour paging are pure, unit-tested
helpers; the NIF stays a dumb absolute setter.
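
The target resolution and tour paging described above can be sketched as pure functions. This is a minimal illustration, not the actual `Mob.Test` implementation: the module name `ScrollTargetSketch`, the vertical-only `%{viewport: ..., max: ...}` info map, and zero-based page numbering are all assumptions made for the example.

```elixir
defmodule ScrollTargetSketch do
  # Hypothetical helper module; names and shapes are illustrative only.

  # Resolve a symbolic or absolute target into a clamped {x, y} offset.
  def resolve(:top, _info), do: {0, 0}
  def resolve(:bottom, %{max: limit}), do: {0, limit}

  # Assumption: pages are zero-based and one page equals one viewport.
  def resolve({:page, n}, %{viewport: vp, max: limit}) when is_integer(n),
    do: {0, clamp(n * vp, limit)}

  def resolve({x, y}, %{max: limit}) when is_number(x) and is_number(y),
    do: {max(x, 0), clamp(y, limit)}

  # Offsets a top-to-bottom tour would visit: one per viewport, ending at max.
  def tour_offsets(%{viewport: vp, max: limit}) when vp > 0 do
    0
    |> Stream.iterate(&(&1 + vp))
    |> Enum.take_while(&(&1 < limit))
    |> Kernel.++([limit])
  end

  # Keep a requested offset inside the scrollable range 0..limit.
  defp clamp(value, limit), do: value |> max(0) |> min(limit)
end
```

Because clamping happens before anything reaches the NIF, a caller can pass an out-of-range page or offset and the setter still receives a valid absolute position.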

Scroll views are addressed by their `:id` prop:

- **iOS**: the SwiftUI renderer applies `node.nativeViewId` as the scroll view's
`accessibilityIdentifier`; the NIF walks `UIScrollView`s and matches it. In
practice SwiftUI does **not** reliably propagate `.accessibilityIdentifier` onto
the backing `UIScrollView` (verified on-device 2026-05-29), so the NIF falls back
to the largest scroll view (the main content scroller) when an explicit id does
not match — correct for the common one-scroll-per-screen case. Pixel units.
- **Android**: the Compose renderer registers each `:scroll`/lazy-list state in an
id-keyed registry in `MobBridge` (with the measured viewport for `ScrollState`,
which doesn't expose it). `kind` is `"pixel"` for `verticalScroll`/`ScrollState`
and `"index"` for `LazyColumn`/`LazyListState` (y is an item index, viewport is
the visible-item count). The `kind` field makes the asymmetry explicit so paging
stays coherent in either unit.
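
To illustrate why the `kind` field keeps paging coherent, here is a small sketch that works on an already-decoded `scroll_info` map. It assumes string keys and scalar `offset`/`viewport`/`max` values; the module name `ScrollKindSketch` and both functions are hypothetical and not part of `Mob.Test`.

```elixir
defmodule ScrollKindSketch do
  # Hypothetical helper module; assumes the JSON has already been decoded
  # into a map with string keys.

  # Turn the decoded payload into a map with an atom kind.
  def normalize(%{"kind" => kind, "offset" => offset, "viewport" => vp, "max" => limit}) do
    %{kind: kind_atom(kind), offset: offset, viewport: vp, max: limit}
  end

  # Advance by one viewport in whatever unit the view reports
  # (pixels for :pixel, items for :index), never past max.
  def next_page(%{offset: offset, viewport: vp, max: limit}), do: min(offset + vp, limit)

  defp kind_atom("pixel"), do: :pixel
  defp kind_atom("index"), do: :index
end
```

The paging arithmetic is identical for both kinds; only the unit differs, so callers that need real pixel distances should check `kind` before interpreting the numbers.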

Capture is in-process: iOS `UIGraphicsImageRenderer` + `drawViewHierarchy`;
Android `PixelCopy` against the activity window (decor-view `draw` fallback
pre-API-26). Both are debug-only harness code (iOS `#if !MOB_RELEASE`).

This is core test-harness work (same bucket as `ui_tree`/`tap_xy`), not a
plugin-shaped feature, so it lands under the current plugin-first hold.

## Consequences

- A remote agent gets pixels + deterministic scroll with zero adb/xcrun — the
capability `wiretap_screenshot` will build on.
- Capture is the app's own surface only; `FLAG_SECURE` (Android) and secure text
fields (iOS) render blank. A backgrounded app has no window, so capture returns
`{:error, :no_window}` (or a `not_found` error).
- Cross-repo: the Android side spans `mob` (Zig NIF) and the `mob_new`
`MobBridge.kt.eex` template; existing apps pick it up on regeneration or a
manual `MobBridge.kt` patch.
- `:scroll` (ScrollState) is not persisted across BEAM re-renders the way lists
are; the registry holds the live state, which is current during a scroll→shot
tour. Persisting it by id is a possible follow-up.
- The Compose-semantics walker for arbitrary (non-Mob) apps remains deferred to
WireTap (see `future_developments.md`); this change covers Mob-rendered apps.
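
As a rough illustration of the capture consequences above, a caller might classify a raw capture result before using it. This sketch assumes the result is either the image binary or an `{:error, reason}` tuple; the module name `CaptureResultSketch` and the `:unavailable` tag are hypothetical. Only the PNG and JPEG magic bytes are standard facts.

```elixir
defmodule CaptureResultSketch do
  # Hypothetical helper; the return tags are illustrative only.

  # PNG files start with the 8-byte signature 89 50 4E 47 0D 0A 1A 0A.
  def classify(<<0x89, "PNG\r\n", 0x1A, "\n", _rest::binary>>), do: {:ok, :png}

  # JPEG files start with the bytes FF D8 FF.
  def classify(<<0xFF, 0xD8, 0xFF, _rest::binary>>), do: {:ok, :jpeg}

  # No window (backgrounded app) or a bridge generated before these NIFs existed.
  def classify({:error, reason}) when reason in [:no_window, :not_loaded],
    do: {:unavailable, reason}

  def classify(_other), do: {:error, :unrecognized}
end
```

Note that a blank frame from `FLAG_SECURE` or a secure text field is still a well-formed image, so a check like this cannot detect it.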
6 changes: 6 additions & 0 deletions ios/MobDemo-Bridging-Header.h
@@ -21,3 +21,9 @@ void mob_send_component_event(int handle, const char *event, const char *payload
// the OS appearance toggles (light/dark). Dispatches to Mob.Device subscribers.
// `scheme` is "light" or "dark".
void mob_notify_color_scheme(const char *scheme);

// Called from MobFrameTracker (SwiftUI) as a tagged element lays out, recording
// its on-screen frame (logical points) keyed by the element's :id. Read back via
// the element_frames NIF so an agent can locate/drive elements without a
// screenshot. Implemented in mob_nif.m.
void mob_register_frame(const char *id, double x, double y, double w, double h);
34 changes: 34 additions & 0 deletions ios/MobRootView.swift
@@ -343,6 +343,9 @@
.scrollDismissesKeyboard(.interactively)
.padding(node.paddingEdgeInsets)
.background(node.backgroundColor.map { Color($0) } ?? Color.clear)
// Expose the node :id on the backing UIScrollView so the test
// harness (Mob.Test.scroll_info/scroll_to) can address it by id.
.ifLet(node.nativeViewId) { view, id in view.accessibilityIdentifier(id) }
// ── Batch 5 Tier 1: scroll position observation ──
// SwiftUI's onScrollGeometryChange is iOS 18+. On older iOS
// there's no clean SwiftUI API for raw offset; UIKit-backed
@@ -405,6 +408,7 @@
.frame(maxHeight: .infinity)
.padding(node.paddingEdgeInsets)
.background(node.backgroundColor.map { Color($0) } ?? Color.clear)
.ifLet(node.nativeViewId) { view, id in view.accessibilityIdentifier(id) }

case .progress:
let trackColor = node.color.map { Color($0) } ?? Color.accentColor
@@ -470,6 +474,36 @@
// (0, 0) which is a no-op. Used by SquareTriangle's hexagonal
// snowflake to position rings absolutely within a center-aligned box.
.offset(x: CGFloat(node.offsetX), y: CGFloat(node.offsetY))
// Record on-screen frame + set accessibilityIdentifier for any node
// carrying an :id, so the agent can read positions via the
// element_frames NIF without a screenshot.
.modifier(MobFrameTracker(node: node))
}
}

// MobFrameTracker — for any node with an :id, set it as the accessibility
// identifier and report the element's global frame (logical points) to the C
// registry as it lays out / moves. Untagged nodes pass through untouched, so
// there's no cost unless a dev opts an element in by giving it an :id.
private struct MobFrameTracker: ViewModifier {
let node: MobNode

func body(content: Content) -> some View {
if let id = node.nativeViewId {
content
.accessibilityIdentifier(id)
.background(
GeometryReader { geo in
Color.clear.onChange(of: geo.frame(in: .global), initial: true) { _, frame in
mob_register_frame(
id, Double(frame.minX), Double(frame.minY),
Double(frame.width), Double(frame.height))
}
}
)
} else {
content
}
}
}

@@ -869,7 +903,7 @@
// no manual frame management required.
private class CameraPreviewUIView: UIView {
override class var layerClass: AnyClass { AVCaptureVideoPreviewLayer.self }
var cameraLayer: AVCaptureVideoPreviewLayer { layer as! AVCaptureVideoPreviewLayer }

}

private struct MobCameraPreviewView: UIViewRepresentable {