Skip to content

Update mobile screen-control client for H.264 streaming#11

Closed
Copilot wants to merge 9 commits intopoc-appsfrom
copilot/update-mobile-screen-control
Closed

Update mobile screen-control client for H.264 streaming#11
Copilot wants to merge 9 commits intopoc-appsfrom
copilot/update-mobile-screen-control

Conversation

Copy link
Copy Markdown

Copilot AI commented Mar 14, 2026

The mobile screen-control client used a removed tile-polling API (getWindowSnapshot, RemoteAppWindowUIState). Updates it to consume the new ReadableStream<Uint8Array>-based H.264 streaming API matching the web module.

Native H.264 decoders

  • iOS (H264Decoder.swift): VideoToolbox VTDecompressionSession, Annex B → AVCC conversion, outputs raw JPEG bytes. Uses nested withUnsafeBufferPointer for correct C pointer lifetime in CMVideoFormatDescriptionCreateFromH264ParameterSets. Clears lastDecodedImage before each decode to prevent stale frame returns.
  • Android (H264Decoder.kt): MediaCodec hardware decoder, YUV420→JPEG conversion via YuvImage.compressToJpeg() (native libjpeg) for efficient 30fps throughput, real SPS dimension parsing with exp-Golomb decoding and emulation prevention byte removal
  • Native decoders return binary Uint8Array/Data/ByteArray (matching the existing generateThumbnailJpeg pattern) — base64 conversion for data URIs is done on the JS side
  • Zero new dependencies — uses only built-in platform codecs

Session-based decoder API

  • Decoder instances are managed per-session via h264DecoderCreate() → sessionId, h264DecoderDecode(sessionId, ...), h264DecoderDestroy(sessionId) — supports concurrent window captures (e.g. multiple child window modals) without interference
  • Both platforms maintain a dictionary/map of decoder instances keyed by session ID, matching the pattern used for TCP connections
  • iOS thread safety: All decoder operations (create/decode/destroy) are serialized via a dedicated serial DispatchQueue (decoderQueue), preventing races between concurrent sessions and between decode/destroy on the same session

useWindowCapture hook rewrite

  • Reads ReadableStream from startStreamingSession(windowId) instead of polling getWindowSnapshot()
  • Parses chunks via decodeMediaChunk from shared/mediaStream
  • Creates a dedicated native decoder session per capture, feeds H.264 payloads, converts returned JPEG bytes to data URIs via Buffer.from().toString('base64')
  • Heartbeat via streamControl(windowId, 30, 0.8) every 3s — 30fps/0.8 quality for full-fidelity streaming
  • Auto-reconnect with backoff and captureId token to prevent stale stream loops from interfering
  • Exposes isReconnecting, retryAttempt, and cancelReconnect for reconnection UX
  • Error state is cleared on successful frame decode to prevent stale errors after stream recovery

Screen control UI

  • Single expo-image frame replaces tile grid (tileImages, TILE_SIZE, positioned <Image> components all removed)
  • WindowCanvas, ChildWindowModal, ScreenControlScreen updated to new WindowFrameState interface
  • DPI-aware coordinate mapping: Canvas layout and input mapping use logical dimensions (width / dpi, height / dpi) matching the web implementation — touch/pointer coordinates are sent in logical points, not pixels
  • Reconnection state shows "Reconnecting (attempt N)..." with cancel button overlaid on the dimmed last frame using StyleSheet.absoluteFillObject with semi-transparent background
  • expo-image uses cachePolicy="none" with stable recyclingKey to prevent memory growth from rapidly-changing base64 data URIs during streaming
  • All touch interaction logic (pointer/touch modes, scroll, keyboard) unchanged
// Old (removed API)
const session = await sc.apps.startStreamingSession(windowId, TILE_SIZE);
const snapshot = await sc.apps.getWindowSnapshot(wId, session.key, QUALITY);

// New (stream-based, session-scoped decoder)
const decoderSession = superman.h264DecoderCreate();
const { stream, width, height, dpi } = await sc.apps.startStreamingSession(windowId);
const reader = stream.getReader();
const { metadata, payload } = decodeMediaChunk((await reader.read()).value);
const jpegBytes = await superman.h264DecoderDecode(decoderSession, payload, metadata.type === 'keyframe');
const base64 = Buffer.from(jpegBytes).toString('base64');

Note: Files under modules/superman/ios/ and android/ are gitignored by the expo prebuild patterns in mobile/.gitignore — the native decoder files are force-tracked.

Original prompt

Update the mobile screen-control client for the new streaming approach, referring the web module to check. Keeping efficiency, performance and app size the priority.


💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.

Copilot AI and others added 3 commits March 14, 2026 08:16
- Add native H.264 decoder (iOS VideoToolbox, Android MediaCodec)
- Rewrite useWindowCapture hook to use ReadableStream-based API
- Replace tile-based rendering with single-frame Image display
- Use decodeMediaChunk from shared/mediaStream for chunk parsing
- Request mobile-optimized stream params (8fps, 0.4 quality)

Co-authored-by: asrient <44570278+asrient@users.noreply.github.com>
Co-authored-by: asrient <44570278+asrient@users.noreply.github.com>
…nd docs

Co-authored-by: asrient <44570278+asrient@users.noreply.github.com>
Copilot AI changed the title [WIP] Update mobile screen-control client for new streaming approach Update mobile screen-control client for H.264 streaming Mar 14, 2026
Copilot AI requested a review from asrient March 14, 2026 08:23
import UIKit

/// Lightweight H.264 Annex B decoder using VideoToolbox hardware acceleration.
/// Decodes frames synchronously and returns JPEG-compressed base64 strings for display.
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we passing the jpeg as base64 string? Why cant we just pass and use binary buffer. Also how often does it produce the jpeg for 30fps? 30 jpeg per sec?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in f01b7f4 — native decoders now return raw JPEG bytes (Uint8Array/Data/ByteArray) instead of base64 strings, matching the existing generateThumbnailJpeg pattern. The base64 conversion for the data URI happens on the JS side via Buffer.from(jpegBytes).toString('base64').

Regarding frame rate: we request 8fps (not 30) from the server via streamControl(windowId, 8, 0.4) — so it's ~8 JPEG decodes per second, tuned for mobile bridge throughput.

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's increase it to 30fps and quality 0.8

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 2d41e41MOBILE_STREAM_FPS bumped to 30 and MOBILE_STREAM_QUALITY to 0.8. Also optimized the Android YUV→JPEG path (replaced per-pixel Kotlin loop with YuvImage.compressToJpeg() which uses native libjpeg) to handle 30fps throughput.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already done in 2d41e41MOBILE_STREAM_FPS = 30 and MOBILE_STREAM_QUALITY = 0.8.

Native decoders now return raw Uint8Array/Data/ByteArray (matching the
existing generateThumbnailJpeg pattern). Base64 encoding for the data
URI is done on the JS side via Buffer.from().toString('base64').
This avoids unnecessary base64 encoding overhead on the native side.

Co-authored-by: asrient <44570278+asrient@users.noreply.github.com>
Copilot AI requested a review from asrient March 14, 2026 08:49
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates the mobile screen-control client to consume the new ReadableStream<Uint8Array> H.264 streaming API (matching the web approach) by adding native H.264 decode paths and rewriting the mobile capture/render loop to display a single decoded frame.

Changes:

  • Replaced tile-polling window capture with a stream reader + decodeMediaChunk pipeline and a heartbeat/reconnect loop.
  • Added native H.264 decoders to the superman module (iOS VideoToolbox, Android MediaCodec) returning base64 JPEG frames.
  • Updated screen-control UI to render a single expo-image frame and adapt to the new WindowFrameState interface.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 11 comments.

Show a summary per file
File Description
mobile/package-lock.json Lockfile updates (devOptional flags; adds peer entries like react-dom/scheduler).
mobile/modules/superman/src/SupermanModule.ts Extends the typed native module surface with H.264 decode/destroy APIs.
mobile/modules/superman/ios/SupermanModule.swift Wires JS-accessible decode/destroy functions to an iOS H.264 decoder instance.
mobile/modules/superman/ios/H264Decoder.swift Implements Annex B → decoded frame → base64 JPEG pipeline via VideoToolbox.
mobile/modules/superman/android/src/main/java/com/asrient/superman/SupermanModule.kt Wires JS-accessible decode/destroy functions to an Android H.264 decoder instance.
mobile/modules/superman/android/src/main/java/com/asrient/superman/H264Decoder.kt Implements MediaCodec decode + YUV→JPEG→base64 conversion for streamed frames.
mobile/hooks/useApps.ts Rewrites useWindowCapture to read the streaming API, decode chunks, and surface WindowFrameState.
mobile/app/device/[fingerprint]/screen-control.tsx Migrates screen-control UI from tile grid to a single streamed frame render.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 677 to 680
const { frameState, isConnecting, error, startCapture, stopCapture } = useWindowCapture(
selectedWindowId,
deviceFingerprint,
);
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot apply changes based on this feedback

Comment on lines +503 to 506
const { frameState, isConnecting, error, startCapture, stopCapture } = useWindowCapture(
childWindow.id,
deviceFingerprint,
);
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix this

Comment thread mobile/hooks/useApps.ts
Comment on lines +293 to +296

// Stream ended normally
if (!cancelled && isMountedRef.current) {
setError('Window stream ended.');
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better ux is to show user the connection is broken and it's attempting reconnect with option to cancel.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented in fa8db58 — the hook now exposes isReconnecting, retryAttempt, and cancelReconnect. The UI shows "Reconnecting (attempt N)..." with a cancel button overlaid on the dimmed last frame. A captureId token prevents stale stream loops from setting spurious error states during reconnect.

Comment on lines +60 to +62
// H.264 decoder
private val h264Decoder = H264Decoder()

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should either use a class/obj or use session keys, currently entire impl is global

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented in fa8db58 — the decoder API is now session-based: h264DecoderCreate() → sessionId, h264DecoderDecode(sessionId, ...), h264DecoderDestroy(sessionId). Both iOS and Android modules maintain a dictionary/map of decoder instances keyed by session ID, matching the pattern used for TCP connections. Each useWindowCapture hook creates its own session on start and destroys it on cleanup.

inputBuffer.clear()
inputBuffer.put(annexBData)

val flags = if (isKeyframe) MediaCodec.BUFFER_FLAG_KEY_FRAME else 0
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consult android and ios docs for correct api usage across all changes.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in fa8db58 — Android: removed incorrect BUFFER_FLAG_KEY_FRAME on decoder input (per MediaCodec docs, decoders infer frame type from NAL headers; KEY_FRAME is an output/encoder flag). Also fixed UV plane stride/pixelStride to use separate values per plane per the Image.Plane docs, implemented real SPS dimension parsing with emulation prevention byte removal and exp-Golomb decoding per ITU-T H.264 §7.3.2.1, and added SPS/PPS change detection for mid-stream reconfiguration. iOS: removed unused DispatchSemaphore.

Comment on lines +231 to +245
* Falls back to 1920x1080 which is used only for initial MediaCodec configuration.
* MediaCodec adapts to actual stream dimensions from the SPS/PPS data in csd-0/csd-1,
* so this fallback is safe — the decoder will produce correctly sized output regardless.
*/
private fun parseSPSDimensions(sps: ByteArray): Pair<Int, Int> {
try {
if (sps.size < 4) return Pair(1920, 1080)
// Simplified: use MediaFormat to extract dimensions when possible
// For now, use a reasonable default and let MediaCodec handle the actual SPS
// The decoder will adapt to the actual frame dimensions
return Pair(1920, 1080)
} catch (_: Exception) {
return Pair(1920, 1080)
}
}
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix this

Comment thread mobile/hooks/useApps.ts
Comment on lines +210 to 220
const cleanup = useCallback(() => {
if (heartbeatTimerRef.current) {
clearInterval(heartbeatTimerRef.current);
heartbeatTimerRef.current = null;
}

if (isMountedRef.current && windowIdRef.current === wId) {
captureTimerRef.current = setTimeout(captureLoop, CAPTURE_INTERVAL_MS);
if (readerRef.current) {
readerRef.current.cancel().catch(() => {});
readerRef.current = null;
}
superman.h264DecoderDestroy();
}, []);
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need multiple decoder instances that can run/be used simultaneously.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already implemented in fa8db58 — the decoder API is session-based: h264DecoderCreate() → sessionId, h264DecoderDecode(sessionId, ...), h264DecoderDestroy(sessionId). Both platforms maintain a dictionary/map of decoder instances keyed by session ID. Each useWindowCapture hook creates its own session, so concurrent child window captures work independently.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already session-based since fa8db58. In aab9677, further hardened with thread safety via a serial decoderQueue on iOS that serializes all create/decode/destroy operations across concurrent sessions.

Comment on lines 5 to 9
private let tcpNetworking = TcpNetworking()
private let udpNetworking = UdpNetworking()
private let thumbnailGenerator = ThumbnailGenerator()
private let h264Decoder = H264Decoder()

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix this

Comment on lines +142 to +145
val uvRowStride = uPlane.rowStride
val uvPixelStride = uPlane.pixelStride

// Convert YUV420 to ARGB bitmap
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix this


// Decode synchronously
var decodedImage: CGImage?
let semaphore = DispatchSemaphore(value: 0)
@asrient asrient requested a review from Copilot March 14, 2026 09:10
Copy link
Copy Markdown
Owner

@asrient asrient left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix needed

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates the mobile screen-control client to consume the new ReadableStream<Uint8Array> H.264 streaming API (matching the web approach), replacing the removed tile/snapshot polling flow.

Changes:

  • Added native H.264 decoder APIs to the superman Expo module (iOS VideoToolbox, Android MediaCodec).
  • Rewrote useWindowCapture to read/decode HCMediaStream chunks, decode H.264 frames natively, and render a single JPEG frame URI.
  • Updated screen control UI to render a single expo-image frame instead of a tiled grid.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
mobile/modules/superman/src/SupermanModule.ts Adds JS typings for H.264 decoder functions.
mobile/modules/superman/ios/SupermanModule.swift Exposes iOS decoder functions to JS via Expo Modules.
mobile/modules/superman/ios/H264Decoder.swift New iOS VideoToolbox-based H.264 decoder producing JPEG bytes.
mobile/modules/superman/android/src/main/java/com/asrient/superman/SupermanModule.kt Exposes Android decoder functions to JS via Expo Modules.
mobile/modules/superman/android/src/main/java/com/asrient/superman/H264Decoder.kt New Android MediaCodec-based H.264 decoder producing JPEG bytes.
mobile/hooks/useApps.ts Replaces polling capture loop with stream reader + decoder integration + heartbeat/reconnect.
mobile/app/device/[fingerprint]/screen-control.tsx Switches UI rendering from tiles to a single streamed frame image.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +161 to +168
return CMVideoFormatDescriptionCreateFromH264ParameterSets(
allocator: kCFAllocatorDefault,
parameterSetCount: 2,
parameterSetPointers: &pointers,
parameterSetSizes: &sizes,
nalUnitHeaderLength: 4,
formatDescriptionOut: &formatDesc
)
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix this

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in aab9677 — now uses nested withUnsafeBufferPointer on both the pointers and sizes arrays to pass stable baseAddress pointers to CMVideoFormatDescriptionCreateFromH264ParameterSets, per the VideoToolbox C API contract. Also cleared lastDecodedImage before each decode to prevent stale frames.

Comment on lines +141 to +157
val yRowStride = yPlane.rowStride
val uvRowStride = uPlane.rowStride
val uvPixelStride = uPlane.pixelStride

// Convert YUV420 to ARGB bitmap
val argb = IntArray(width * height)
for (row in 0 until height) {
for (col in 0 until width) {
val yIdx = row * yRowStride + col
val uvRow = row / 2
val uvCol = col / 2
val uIdx = uvRow * uvRowStride + uvCol * uvPixelStride
val vIdx = uvRow * uvRowStride + uvCol * uvPixelStride

val y = (yBuffer.get(yIdx).toInt() and 0xFF) - 16
val u = (uBuffer.get(uIdx).toInt() and 0xFF) - 128
val v = (vBuffer.get(vIdx).toInt() and 0xFF) - 128
Comment thread mobile/hooks/useApps.ts
Comment on lines +280 to +289
if (jpegBytes && jpegBytes.byteLength > 0) {
const base64 = Buffer.from(jpegBytes).toString('base64');
setFrameState({
frameUri: `data:image/jpeg;base64,${base64}`,
width: currentWidth,
height: currentHeight,
dpi: currentDpi,
});
setIsConnecting(false);
}
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix this

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already fixed in 2d41e41setError(null) is called at line 310 on successful frame decode. Verified still present in current code.

Comment on lines 6 to 9
private let udpNetworking = UdpNetworking()
private let thumbnailGenerator = ThumbnailGenerator()
private let h264Decoder = H264Decoder()

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix this

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already session-based since fa8db58. In aab9677, further hardened with a serial decoderQueue that serializes all create/decode/destroy operations, preventing race conditions between concurrent sessions or between decode and destroy on the same session.

Comment on lines +40 to +41
val flags = if (isKeyframe) MediaCodec.BUFFER_FLAG_KEY_FRAME else 0
codec.queueInputBuffer(inputIndex, 0, annexBData.size, 0, flags)
Comment thread mobile/hooks/useApps.ts
Comment on lines +210 to 220
const cleanup = useCallback(() => {
if (heartbeatTimerRef.current) {
clearInterval(heartbeatTimerRef.current);
heartbeatTimerRef.current = null;
}

if (isMountedRef.current && windowIdRef.current === wId) {
captureTimerRef.current = setTimeout(captureLoop, CAPTURE_INTERVAL_MS);
if (readerRef.current) {
readerRef.current.cancel().catch(() => {});
readerRef.current = null;
}
superman.h264DecoderDestroy();
}, []);
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix this

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already session-based since fa8db58 — each useWindowCapture hook creates its own decoder session via h264DecoderCreate() and only destroys that session on cleanup. In aab9677, added serial queue synchronization on the iOS side to prevent races.

Comment thread mobile/hooks/useApps.ts Outdated
Comment on lines +294 to +297
// Stream ended normally
if (!cancelled && isMountedRef.current) {
setError('Window stream ended.');
}
Comment on lines +145 to +175
// Convert YUV420 to ARGB bitmap
val argb = IntArray(width * height)
for (row in 0 until height) {
for (col in 0 until width) {
val yIdx = row * yRowStride + col
val uvRow = row / 2
val uvCol = col / 2
val uIdx = uvRow * uvRowStride + uvCol * uvPixelStride
val vIdx = uvRow * uvRowStride + uvCol * uvPixelStride

val y = (yBuffer.get(yIdx).toInt() and 0xFF) - 16
val u = (uBuffer.get(uIdx).toInt() and 0xFF) - 128
val v = (vBuffer.get(vIdx).toInt() and 0xFF) - 128

var r = (1.164 * y + 1.596 * v).toInt()
var g = (1.164 * y - 0.813 * v - 0.391 * u).toInt()
var b = (1.164 * y + 2.018 * u).toInt()

r = r.coerceIn(0, 255)
g = g.coerceIn(0, 255)
b = b.coerceIn(0, 255)

argb[row * width + col] = (0xFF shl 24) or (r shl 16) or (g shl 8) or b
}
}

val bitmap = Bitmap.createBitmap(argb, width, height, Bitmap.Config.ARGB_8888)
val outputStream = ByteArrayOutputStream()
bitmap.compress(Bitmap.CompressFormat.JPEG, 70, outputStream)
bitmap.recycle()

Comment on lines +60 to +62
// H.264 decoder
private val h264Decoder = H264Decoder()

Copilot AI and others added 2 commits March 14, 2026 09:17
- Make H.264 decoder instance-based with session keys (h264DecoderCreate/
  h264DecoderDecode/h264DecoderDestroy) instead of global singleton —
  supports concurrent child window captures without interference
- Android: remove incorrect BUFFER_FLAG_KEY_FRAME input flag (decoder
  infers frame type from NAL headers), fix UV plane stride/pixelStride
  using separate values for V plane, implement real SPS dimension parsing
  with exp-Golomb decoding, cache SPS/PPS and reconfigure on change
- iOS: remove unused DispatchSemaphore variable
- Add isReconnecting/retryAttempt state to useWindowCapture hook with
  cancelReconnect callback; use captureId token to prevent stale stream
  loops from setting error state during reconnect
- Screen-control UI shows "Reconnecting (attempt N)..." with cancel
  button overlay on the dimmed last frame during reconnection

Co-authored-by: asrient <44570278+asrient@users.noreply.github.com>
Co-authored-by: asrient <44570278+asrient@users.noreply.github.com>
Copy link
Copy Markdown
Author

Copilot AI commented Mar 14, 2026

Fix needed

All review items addressed in fa8db58 and a4bee7b — session-based decoders, Android API fixes (input flags, UV strides, SPS parsing, reconfiguration), iOS cleanup, and reconnection UX with cancel support.

Copilot AI requested a review from asrient March 14, 2026 09:21
Copy link
Copy Markdown
Owner

@asrient asrient left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are using a custom pipeline for H.265/264 stream to jpeg and then render the jpeg. Is this the standard approach for expo/react native? Any better approach or lib to use?
Also look at copilot comments to fix.

…Android YUV→JPEG via YuvImage

Co-authored-by: asrient <44570278+asrient@users.noreply.github.com>
Copy link
Copy Markdown
Author

Copilot AI commented Mar 14, 2026

We are using a custom pipeline for H.265/264 stream to jpeg and then render the jpeg. Is this the standard approach for expo/react native? Any better approach or lib to use?
Also look at copilot comm...

On the approach: The H.264→JPEG→expo-image pipeline is the practical choice for this setup because the H.264 stream arrives over the RPC bridge as binary chunks (not a standard media URL), so standard React Native video players (expo-av, react-native-video) can't consume it directly — they expect HTTP URLs or local files. There's no mature RN library for decoding raw H.264 NAL units from an arbitrary byte stream. The web client uses WebCodecs VideoDecoder + canvas, which doesn't exist in RN. So the native decoder (VideoToolbox/MediaCodec) → JPEG → expo-image path is the standard workaround for this kind of custom streaming pipeline. If performance needs increase beyond what this can handle, the next step would be rendering decoded frames directly to a native view (SurfaceView/MTKView) via a custom Expo view module — but that's significantly more complexity for marginal gain at current frame rates.

Copilot reviewer fixes in 2d41e41:

  • Error state is now cleared on successful frame decode (prevents stale errors after stream recovery)
  • Android YUV→JPEG optimized via YuvImage.compressToJpeg() (native libjpeg) replacing the per-pixel Kotlin loop + Bitmap allocation — critical for 30fps throughput

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates the mobile screen-control client to consume the new ReadableStream<Uint8Array>-based H.264 streaming API (matching the web approach), with native H.264 decoding on iOS/Android and a simplified single-frame UI render path.

Changes:

  • Added session-based native H.264 decoder APIs (create/decode/destroy) to the Superman Expo module on iOS and Android.
  • Rewrote useWindowCapture to read the streaming session, parse media chunks, decode via native decoders, and expose reconnect UX state.
  • Updated the mobile screen-control UI to render a single expo-image frame (removing the old tile grid approach) and show reconnect overlay controls.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
mobile/modules/superman/src/SupermanModule.ts Extends the JS native module interface with H.264 decoder session methods.
mobile/modules/superman/ios/SupermanModule.swift Adds per-session decoder management and exposes create/decode/destroy functions.
mobile/modules/superman/ios/H264Decoder.swift New VideoToolbox-based Annex B H.264 decoder emitting JPEG bytes.
mobile/modules/superman/android/src/main/java/com/asrient/superman/SupermanModule.kt Adds per-session decoder management and exposes create/decode/destroy functions.
mobile/modules/superman/android/src/main/java/com/asrient/superman/H264Decoder.kt New MediaCodec-based Annex B H.264 decoder emitting JPEG bytes.
mobile/hooks/useApps.ts Replaces tile polling capture with stream consumption + native decoder session + reconnect logic.
mobile/app/device/[fingerprint]/screen-control.tsx Switches UI rendering from tiles to a single frame image, and adds reconnect overlay UI.
Comments suppressed due to low confidence (1)

mobile/app/device/[fingerprint]/screen-control.tsx:736

  • The corrective resize logic uses frameState.width/height directly, but those values are pixel dimensions; comparing them against idealSize (logical) will cause incorrect correction/resizes on HiDPI hosts. Use logical sizes (width/dpi, height/dpi) for these comparisons and any Resize actions.
        const actualWidth = frameState.width;
        const actualHeight = frameState.height;

        // If the width was clamped wider than requested, compensate with more height
        if (actualWidth > idealSize.width * 1.05) {
            hasCorrectedRef.current.add(selectedWindowId);
            const correctedHeight = Math.round(actualWidth / idealSize.aspect);


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +1096 to +1098
position: 'absolute',
justifyContent: 'center',
alignItems: 'center',
Comment on lines +160 to +167
return CMVideoFormatDescriptionCreateFromH264ParameterSets(
allocator: kCFAllocatorDefault,
parameterSetCount: 2,
parameterSetPointers: &pointers,
parameterSetSizes: &sizes,
nalUnitHeaderLength: 4,
formatDescriptionOut: &formatDesc
)
Comment on lines +114 to +117
if status == noErr {
VTDecompressionSessionWaitForAsynchronousFrames(session)
decodedImage = lastDecodedImage
}
Comment on lines +177 to +183
// H.264 decoder functions — session-based for concurrent window captures
Function("h264DecoderCreate") {
self.decoderIdCounter += 1
let sessionId = "h264-\(self.decoderIdCounter)"
self.h264Decoders[sessionId] = H264Decoder()
return sessionId
}
Comment on lines +185 to +196
AsyncFunction("h264DecoderDecode") { (sessionId: String, data: Data, isKeyframe: Bool, promise: Promise) in
guard let decoder = self.h264Decoders[sessionId] else {
promise.reject("ERR_NO_SESSION", "No decoder session: \(sessionId)")
return
}
DispatchQueue.global(qos: .userInteractive).async {
if let jpegData = decoder.decode(annexBData: data, isKeyframe: isKeyframe) {
promise.resolve(jpegData)
} else {
promise.resolve(nil)
}
}
Comment on lines +57 to +71
// Scale factor: fit remote window into canvasWidth/canvasHeight
// width/height are in pixel space; divide by dpi for logical dimensions
const scale = useMemo(() => {
if (!uiState) return 1;
const sx = canvasWidth / uiState.width;
const sy = canvasHeight / uiState.height;
if (!frameState) return 1;
const sx = canvasWidth / frameState.width;
const sy = canvasHeight / frameState.height;
return Math.min(sx, sy, 1);
}, [uiState, canvasWidth, canvasHeight]);
}, [frameState, canvasWidth, canvasHeight]);

const displayWidth = uiState.width * scale;
const displayHeight = uiState.height * scale;
const displayWidth = frameState.width * scale;
const displayHeight = frameState.height * scale;

// Pointer mode state
const pointerPos = useRef<{ x: number; y: number }>({
x: uiState.width / 2,
y: uiState.height / 2,
x: frameState.width / 2,
y: frameState.height / 2,
source={{ uri: frameState.frameUri }}
style={{ width: displayWidth, height: displayHeight }}
contentFit="contain"
cachePolicy="memory"
Copy link
Copy Markdown
Owner

@asrient asrient left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are using a custom pipeline for H.265/264 stream to jpeg and then render the jpeg. Is this the standard approach for expo/react native? Any better approach or lib to use?
Also look at copilot comments to fix.

…itioning, and cache policy

Co-authored-by: asrient <44570278+asrient@users.noreply.github.com>
Copy link
Copy Markdown
Author

Copilot AI commented Mar 14, 2026

We are using a custom pipeline for H.265/264 stream to jpeg and then render the jpeg. Is this the standard approach for expo/react native? Any better approach or lib to use?
Also look at copilot comme...

On the approach: The H.264→JPEG→expo-image pipeline is the practical standard for this kind of custom byte-stream decoding in Expo/RN. Standard RN video players (expo-av, react-native-video) require HTTP URLs or local files — they can't consume raw H.264 NAL units from an RPC byte stream. There's no mature RN library for decoding arbitrary H.264 byte streams. The web client uses WebCodecs VideoDecoder + canvas, which doesn't exist in RN. For higher performance the next step would be rendering decoded frames directly to a native SurfaceView/MTKView via a custom Expo view module, but that's significantly more complexity for marginal gain at current frame rates.

Copilot reviewer fixes addressed in aab9677:

  • iOS pointer passing: nested withUnsafeBufferPointer for CMVideoFormatDescriptionCreateFromH264ParameterSets
  • iOS stale frame: lastDecodedImage = nil before each decode
  • iOS thread safety: serial decoderQueue for all create/decode/destroy operations (prevents races)
  • DPI coordinates: canvas layout and input mapping now use logical dimensions (width/dpi, height/dpi), matching the web implementation
  • reconnectOverlay: absoluteFillObject + semi-transparent background
  • expo-image cache: cachePolicy="none" with stable recyclingKey to avoid memory growth

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants