144 changes: 144 additions & 0 deletions docs/ios-runner-protocol-optimizations.md
@@ -0,0 +1,144 @@
# iOS runner protocol optimization plan

Issue #656 is now split into protocol infrastructure plus follow-up optimizations. The lifecycle
protocol makes commands identifiable, but the performance wins come from changing when the daemon
runs `uptime` preflights, retries commands, invalidates sessions, and asks the runner for lifecycle
status.

## Work slices

### 1. Status-before-invalidate recovery

Status: in progress on `codex/ios-runner-status-recovery`.

Goal: when a command has been sent and the HTTP response is lost, ask the runner for
`status(statusCommandId)` before invalidating the session or surfacing an ambiguous transport
failure.

Acceptance criteria:

- Post-send retryable transport failures issue one bounded `status` probe with the original
`commandId` before session invalidation.
- `completed` with retained small response JSON returns the recovered command result without
invalidating or resending the command.
- `failed` returns the runner failure code/message/hint instead of a generic transport failure.
- `notAccepted`, status timeout, or status transport failure preserves the existing invalidation
behavior.
- Read-only commands whose response was not retained keep the existing retry behavior.
- Status recovery probes are short-budget and do not consume the full command timeout.
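
The criteria above can be read as a single decision over the status probe result. The sketch below is one possible shape, not the daemon's actual control flow: `decideRecovery`, `RecoveryAction`, and the `'COMMAND_FAILED'` fallback are hypothetical, and treating `accepted` the same as `started` is an assumption. The `lifecycle*` field names and the `recovery` strings follow the test fixtures in this change.

```ts
// Lifecycle states and field names follow the test fixtures in this change.
type LifecycleState = 'notAccepted' | 'accepted' | 'started' | 'completed' | 'failed';

interface StatusProbeResult {
  lifecycleState: LifecycleState;
  lifecycleResponseJson?: string;
  lifecycleErrorCode?: string;
  lifecycleErrorMessage?: string;
  lifecycleErrorHint?: string;
}

// Hypothetical action type; the real daemon may structure this differently.
type RecoveryAction =
  | { kind: 'returnRecovered'; data: unknown }
  | { kind: 'reportRunnerFailure'; code: string; message: string; hint: string | undefined }
  | { kind: 'reportWithoutReplay'; recovery: string }
  | { kind: 'retryReadOnly' }
  | { kind: 'invalidate' };

// `probe` is null when the bounded status probe timed out or failed in transport.
function decideRecovery(probe: StatusProbeResult | null, isReadOnly: boolean): RecoveryAction {
  if (probe === null) return { kind: 'invalidate' };

  switch (probe.lifecycleState) {
    case 'completed': {
      if (probe.lifecycleResponseJson !== undefined) {
        // Assumes the retained JSON is an `{ ok, data }` response envelope.
        const envelope = JSON.parse(probe.lifecycleResponseJson) as { data?: unknown };
        return { kind: 'returnRecovered', data: envelope.data };
      }
      // Nothing retained: read-only commands may retry, mutating commands must not replay.
      return isReadOnly
        ? { kind: 'retryReadOnly' }
        : { kind: 'reportWithoutReplay', recovery: 'completed_without_retained_response' };
    }
    case 'failed':
      return {
        kind: 'reportRunnerFailure',
        code: probe.lifecycleErrorCode ?? 'COMMAND_FAILED', // fallback code is an assumption
        message: probe.lifecycleErrorMessage ?? 'Runner reported a failure',
        hint: probe.lifecycleErrorHint,
      };
    case 'accepted':
    case 'started':
      return isReadOnly
        ? { kind: 'retryReadOnly' }
        : { kind: 'reportWithoutReplay', recovery: 'command_still_in_flight' };
    default:
      // 'notAccepted' keeps the existing invalidation behavior.
      return { kind: 'invalidate' };
  }
}
```

Keeping the decision a pure function of the probe result and the command's read-only flag makes each acceptance criterion testable without a live runner.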

iOS simulator validation:

- Unit: `pnpm exec vitest run src/platforms/ios/__tests__/runner-command-retry.test.ts`.
- Unit bundle: `pnpm exec vitest run src/platforms/ios/__tests__/runner-client.test.ts src/platforms/ios/__tests__/runner-session.test.ts src/platforms/ios/__tests__/runner-command-retry.test.ts src/platforms/ios/__tests__/runner-provider.test.ts`.
- Build: `pnpm build:xcuitest`.
- Manual sim smoke after build:
  - `pnpm build`
  - `pnpm clean:daemon`
  - run a simple iOS simulator session against Settings with `open`, `snapshot -i`, one selector
    interaction, and `close`.
  - confirm there is no visible behavior change and diagnostics show no unexpected session
    invalidation.

### 2. Adaptive `uptime` preflight policy

Goal: stop paying eager `uptime` before low-risk mutating commands when the runner has recently
completed a command, relying on status-before-invalidate recovery for the rare ambiguous transport
failure.

Acceptance criteria:

- Existing first-command/startup readiness behavior is preserved.
- Existing failed-preflight stale-session recovery is preserved.
- Repeated hot interactions skip `uptime` when the runner has a recent successful response.
- Commands that still need conservative readiness checks remain preflighted until measured.
- A transport failure after skipping preflight runs status recovery before invalidation.
- Diagnostics expose whether a command used, skipped, or recovered from a readiness preflight.
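
A minimal sketch of how such a policy could be expressed, assuming a per-session record of the last successful runner response. `decidePreflight`, `HOT_COMMAND_ALLOWLIST`, and `RECENT_SUCCESS_WINDOW_MS` are hypothetical names, and the allowlist contents and the 5-second window are placeholders rather than measured values.

```ts
// Hypothetical allowlist and freshness window; real values should come from simulator measurements.
const HOT_COMMAND_ALLOWLIST: string[] = ['tap'];
const RECENT_SUCCESS_WINDOW_MS = 5000;

interface PreflightInput {
  command: string;
  // Timestamp of the runner's last successful response, or null before the first one.
  lastSuccessAtMs: number | null;
  nowMs: number;
}

// 'used' means the command pays the `uptime` preflight; 'skipped' means it does not.
type PreflightDecision = 'used' | 'skipped';

function decidePreflight(input: PreflightInput): PreflightDecision {
  // First command or startup: keep the existing readiness behavior.
  if (input.lastSuccessAtMs === null) return 'used';
  // Commands outside the allowlist stay preflighted until measured.
  if (HOT_COMMAND_ALLOWLIST.indexOf(input.command) === -1) return 'used';
  // A stale (or clock-skewed) success is not evidence that the runner is ready.
  const ageMs = input.nowMs - input.lastSuccessAtMs;
  return ageMs >= 0 && ageMs <= RECENT_SUCCESS_WINDOW_MS ? 'skipped' : 'used';
}
```

The returned value can double as the diagnostics field, so a `'skipped'` decision maps directly onto an event such as `ios_runner_readiness_preflight_skipped`.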

iOS simulator validation:

- Start a fresh simulator session and run one interaction: verify the first mutating command still
preflights.
- Run a hot loop of repeated selector interactions against the same visible control: verify only
the first command pays `uptime`, subsequent commands emit `ios_runner_readiness_preflight_skipped`,
and the UI still responds correctly.
- Compare median command latency for a hot interaction loop before and after the change. A useful
threshold is at least one fewer runner request per hot command and no increase in failure rate.

### 3. Status-visible transport path

Goal: make `accepted` and `started` states practically observable while a command is still running.
The Swift journal already records these states, but the runner currently serializes connection
handling, so a concurrent status request can be blocked behind the command it is querying.

Acceptance criteria:

- `status` can be answered while another runner command is waiting on main-thread XCTest work.
- The status path remains journal-only and does not touch app activation, XCTest dispatch, or
command retry logic.
- Long-running command status can report `accepted` or `started` before the command reaches a
terminal state.
- Existing command execution remains serial where mutation ordering matters.
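
The separation these criteria describe can be modeled in a few lines. The real runner is Swift and would need a concurrent connection path; the single-threaded TypeScript model below only illustrates the invariant that `status` reads the journal and never enters the command queue. `RunnerModel`, `accept`, and `runNext` are hypothetical names.

```ts
type LifecycleState = 'accepted' | 'started' | 'completed' | 'failed';
type StatusAnswer = LifecycleState | 'notAccepted';

class RunnerModel {
  private journal: Record<string, LifecycleState | undefined> = {};
  private queue: Array<{ commandId: string; work: () => void }> = [];

  // Journal the command as accepted and queue it; nothing runs yet.
  accept(commandId: string, work: () => void): void {
    this.journal[commandId] = 'accepted';
    this.queue.push({ commandId, work });
  }

  // Runs exactly one queued command to a terminal state, so execution stays serial.
  runNext(): void {
    const next = this.queue.shift();
    if (next === undefined) return;
    this.journal[next.commandId] = 'started';
    try {
      next.work();
      this.journal[next.commandId] = 'completed';
    } catch {
      this.journal[next.commandId] = 'failed';
    }
  }

  // Journal-only read: does not touch the queue or the work being executed.
  status(commandId: string): StatusAnswer {
    return this.journal[commandId] ?? 'notAccepted';
  }
}
```

In this model, a `status` call made from inside a command's `work` callback observes `started`, which is the behavior the validation steps look for from a second request while a long command is in flight.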

iOS simulator validation:

- Run a deliberately long runner command in one request.
- While it is in flight, query `status(statusCommandId)` from another request.
- Verify status returns before the long command completes and reports `accepted` or `started`.
- Verify normal command ordering is unchanged for back-to-back mutating commands.

### 4. Session invalidation reduction

Goal: avoid tearing down otherwise healthy runner sessions when lifecycle status proves the command
completed or failed cleanly.

Acceptance criteria:

- Completed/failed lifecycle status suppresses invalidation for ambiguous post-send transport
errors when the runner remains reachable.
- Unknown status states still invalidate to preserve current safety.
- Diagnostics record why invalidation was skipped or retained.
- No command is replayed after an observed mutating `accepted`, `started`, `completed`, or `failed`
state.

iOS simulator validation:

- Inject or simulate a lost response after a command completes.
- Verify status recovery prevents runner restart.
- Run the next command in the same session and verify it succeeds without re-launching xcodebuild.

### 5. Response retention tuning

Goal: retain enough small command results for useful recovery without making the runner retain large
snapshots or binary-like payloads.

Acceptance criteria:

- Small scalar responses can be recovered from `lifecycleResponseJson`.
- Snapshot node trees and screenshots are not serialized or retained in the journal.
- Journal memory remains bounded by caps on entry count and response JSON size.
- Retention policy is documented in tests or runner fixtures so future commands do not accidentally
store large payloads.
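
One way to make the retention rules concrete is a journal bounded on both axes. This is a sketch under assumptions, not the Swift journal's implementation: `BoundedJournal` and its constructor parameters are hypothetical, and size is measured in string length (UTF-16 code units) rather than bytes.

```ts
interface RetainedEntry {
  commandId: string;
  state: 'completed';
  // Present only when the response was small enough to retain.
  responseJson?: string;
}

class BoundedJournal {
  private entries: RetainedEntry[] = [];
  private maxEntries: number;
  private maxResponseChars: number;

  constructor(maxEntries: number, maxResponseChars: number) {
    this.maxEntries = maxEntries;
    this.maxResponseChars = maxResponseChars;
  }

  // Pass null for commands whose payloads must never be serialized (snapshots, screenshots).
  recordCompleted(commandId: string, responseJson: string | null): void {
    const entry: RetainedEntry = { commandId, state: 'completed' };
    if (responseJson !== null && responseJson.length <= this.maxResponseChars) {
      entry.responseJson = responseJson;
    }
    this.entries.push(entry);
    // Evict the oldest entries so memory stays bounded by entry count.
    while (this.entries.length > this.maxEntries) this.entries.shift();
  }

  // Returns the newest entry for the command, or undefined if it was evicted or never recorded.
  find(commandId: string): RetainedEntry | undefined {
    for (let i = this.entries.length - 1; i >= 0; i--) {
      const entry = this.entries[i];
      if (entry !== undefined && entry.commandId === commandId) return entry;
    }
    return undefined;
  }
}
```

An oversized or non-retainable response still produces a terminal `completed` entry, so status can report the outcome even when there is no JSON to recover.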

iOS simulator validation:

- Run small-result commands and verify status can recover retained JSON.
- Run snapshot-heavy commands and verify status reports terminal state without retained response JSON.
- Confirm the runner remains responsive after repeated snapshots.

## Suggested ordering

1. Land status-before-invalidate recovery first. It is the safety net needed before reducing
defensive preflights.
2. Add diagnostics/metrics for preflight use, skipped preflights, status recovery, and invalidation
reason. This can happen alongside slice 1 or 2.
3. Reduce `uptime` for hot interaction loops with a conservative command allowlist.
4. Make the status transport path observable during long-running commands.
5. Broaden the preflight policy only after simulator measurements show stable behavior.

## Side-by-side work

- Status recovery and diagnostics can be developed together or separately.
- Transport status visibility can proceed independently once the protocol is on `main`.
- Adaptive `uptime` should wait for status recovery, because it relies on the same recovery path for
ambiguous post-send failures.
- Response retention tuning can proceed independently as long as it preserves the current caps.
8 changes: 2 additions & 6 deletions src/compat/maestro/__tests__/runtime-assertions.test.ts
@@ -123,9 +123,7 @@ test('invokeMaestroAssertVisible does not dismiss React Native overlays during n
});

assert.equal(response.ok, false);
-assert.deepEqual(calls, [
-['wait', ['Ready', '60000']],
-]);
+assert.deepEqual(calls, [['wait', ['Ready', '60000']]]);
});

test('invokeMaestroAssertVisible uses snapshot resolution for short iOS assertions', async () => {
@@ -267,9 +265,7 @@ test('invokeMaestroAssertVisible fails fast when a RedBox has no dismiss target'
if (!response.ok) {
assert.match(response.error.message, /React Native overlay is covering app content/);
}
-assert.deepEqual(calls, [
-['snapshot', []],
-]);
+assert.deepEqual(calls, [['snapshot', []]]);
});

test('invokeMaestroAssertNotVisible passes after a slow hidden sample exhausts the timeout', async () => {
4 changes: 1 addition & 3 deletions src/compat/maestro/runtime-assertions.ts
@@ -151,9 +151,7 @@ function handleFailedVisibleSample(
args: MaestroVisibilityAssertionArgs,
sample: Exclude<MaestroVisibilitySample, { visible: true }>,
startedAt: number,
-):
-| { kind: 'continue' }
-| { kind: 'return'; response: DaemonResponse } {
+): { kind: 'continue' } | { kind: 'return'; response: DaemonResponse } {
if (isReactNativeOverlayBlockingAssertion(sample.response)) {
return { kind: 'return', response: sample.response };
}
1 change: 0 additions & 1 deletion src/compat/maestro/runtime-interactions.ts
@@ -481,7 +481,6 @@ async function clickMaestroSnapshotTarget(
};
}


async function invokeMaestroFuzzyTapOn(
params: MaestroTapOnParams,
query: string,
171 changes: 165 additions & 6 deletions src/platforms/ios/__tests__/runner-command-retry.test.ts
@@ -135,9 +135,9 @@ test('mutating commands do not restart or replay after command send failure', as
const session = makeRunnerSession({ port: 8100, ready: true });

mockEnsureRunnerSession.mockResolvedValueOnce(session);
-mockExecuteRunnerCommandWithSession.mockRejectedValueOnce(
-new AppError('COMMAND_FAILED', 'fetch failed'),
-);
+mockExecuteRunnerCommandWithSession
+.mockRejectedValueOnce(new AppError('COMMAND_FAILED', 'fetch failed'))
+.mockResolvedValueOnce({ lifecycleState: 'notAccepted' });

await assert.rejects(() =>
runIosRunnerCommand(IOS_SIMULATOR, { command: 'tap', x: 120, y: 240 }),
@@ -150,7 +150,165 @@ test('mutating commands do not restart or replay after command send failure', as
'transport_error_after_command_send',
]);
assert.equal(mockStopRunnerSession.mock.calls.length, 0);
-assert.equal(mockExecuteRunnerCommandWithSession.mock.calls.length, 1);
+assert.equal(mockExecuteRunnerCommandWithSession.mock.calls.length, 2);
});

test('mutating commands recover cached responses before invalidating after command send failure', async () => {
const session = makeRunnerSession({ port: 8100, ready: true });

mockEnsureRunnerSession.mockResolvedValueOnce(session);
mockExecuteRunnerCommandWithSession
.mockRejectedValueOnce(new AppError('COMMAND_FAILED', 'fetch failed'))
.mockResolvedValueOnce({
lifecycleState: 'completed',
lifecycleResponseJson: JSON.stringify({ ok: true, data: { message: 'tapped' } }),
});

const result = await runIosRunnerCommand(IOS_SIMULATOR, { command: 'tap', x: 120, y: 240 });

assert.deepEqual(result, { message: 'tapped' });
assert.equal(mockInvalidateRunnerSession.mock.calls.length, 0);
assert.equal(mockExecuteRunnerCommandWithSession.mock.calls.length, 2);
const sentCommand = mockExecuteRunnerCommandWithSession.mock.calls[0]?.[2];
const statusCommand = mockExecuteRunnerCommandWithSession.mock.calls[1]?.[2];
assert.equal(statusCommand.command, 'status');
assert.equal(statusCommand.statusCommandId, sentCommand.commandId);
});

test('mutating commands keep invalidating when status cannot find the command', async () => {
const session = makeRunnerSession({ port: 8100, ready: true });

mockEnsureRunnerSession.mockResolvedValueOnce(session);
mockExecuteRunnerCommandWithSession
.mockRejectedValueOnce(new AppError('COMMAND_FAILED', 'fetch failed'))
.mockResolvedValueOnce({
lifecycleState: 'notAccepted',
});

await assert.rejects(() =>
runIosRunnerCommand(IOS_SIMULATOR, { command: 'tap', x: 120, y: 240 }),
);

assert.deepEqual(mockInvalidateRunnerSession.mock.calls, [
[session, 'transport_error_after_command_send'],
]);
assert.equal(mockExecuteRunnerCommandWithSession.mock.calls.length, 2);
});

test('read-only commands retry when completed status has no retained response', async () => {
const session = makeRunnerSession({ port: 8100, ready: true });

mockEnsureRunnerSession.mockResolvedValue(session);
mockExecuteRunnerCommandWithSession
.mockRejectedValueOnce(new AppError('COMMAND_FAILED', 'fetch failed'))
.mockResolvedValueOnce({ lifecycleState: 'completed' })
.mockResolvedValueOnce({ nodes: [], truncated: false });

const result = await runIosRunnerCommand(IOS_SIMULATOR, { command: 'snapshot' });

assert.deepEqual(result, { nodes: [], truncated: false });
assert.equal(mockInvalidateRunnerSession.mock.calls.length, 0);
assert.equal(mockExecuteRunnerCommandWithSession.mock.calls.length, 3);
assert.equal(mockExecuteRunnerCommandWithSession.mock.calls[1]?.[2].command, 'status');
assert.equal(mockExecuteRunnerCommandWithSession.mock.calls[2]?.[2].command, 'snapshot');
});

test('read-only commands retry when status shows in-flight work', async () => {
const session = makeRunnerSession({ port: 8100, ready: true });

mockEnsureRunnerSession.mockResolvedValue(session);
mockExecuteRunnerCommandWithSession
.mockRejectedValueOnce(new AppError('COMMAND_FAILED', 'fetch failed'))
.mockResolvedValueOnce({ lifecycleState: 'started' })
.mockResolvedValueOnce({ nodes: [], truncated: false });

const result = await runIosRunnerCommand(IOS_SIMULATOR, { command: 'snapshot' });

assert.deepEqual(result, { nodes: [], truncated: false });
assert.equal(mockInvalidateRunnerSession.mock.calls.length, 0);
assert.equal(mockExecuteRunnerCommandWithSession.mock.calls.length, 3);
assert.equal(mockExecuteRunnerCommandWithSession.mock.calls[1]?.[2].command, 'status');
assert.equal(mockExecuteRunnerCommandWithSession.mock.calls[2]?.[2].command, 'snapshot');
});

test('mutating commands report recovery guidance when completed status has no retained response', async () => {
const session = makeRunnerSession({ port: 8100, ready: true });

mockEnsureRunnerSession.mockResolvedValueOnce(session);
mockExecuteRunnerCommandWithSession
.mockRejectedValueOnce(new AppError('COMMAND_FAILED', 'fetch failed'))
.mockResolvedValueOnce({ lifecycleState: 'completed' });

await assert.rejects(
() => runIosRunnerCommand(IOS_SIMULATOR, { command: 'tap', x: 120, y: 240 }),
(error: unknown) => {
assert.ok(error instanceof AppError);
assert.match(error.message, /"tap" completed after the transport response was lost/);
assert.equal(error.details?.recovery, 'completed_without_retained_response');
assert.match(String(error.details?.hint), /will not replay/);
assert.match(String(error.details?.hint), /snapshot -i/);
assert.equal(error.details?.transportError, 'fetch failed');
return true;
},
);

assert.equal(mockInvalidateRunnerSession.mock.calls.length, 0);
assert.equal(mockExecuteRunnerCommandWithSession.mock.calls.length, 2);
});

test('mutating commands preserve runner failure details from status recovery', async () => {
const session = makeRunnerSession({ port: 8100, ready: true });

mockEnsureRunnerSession.mockResolvedValueOnce(session);
mockExecuteRunnerCommandWithSession
.mockRejectedValueOnce(new AppError('COMMAND_FAILED', 'fetch failed'))
.mockResolvedValueOnce({
lifecycleState: 'failed',
lifecycleErrorCode: 'AMBIGUOUS_MATCH',
lifecycleErrorMessage: 'Found 2 matching buttons',
lifecycleErrorHint: 'Use a more specific selector.',
});

await assert.rejects(
() => runIosRunnerCommand(IOS_SIMULATOR, { command: 'tap', x: 120, y: 240 }),
(error: unknown) => {
assert.ok(error instanceof AppError);
assert.equal(error.code, 'AMBIGUOUS_MATCH');
assert.equal(error.message, 'Found 2 matching buttons');
assert.equal(error.details?.recovery, 'runner_reported_failure');
assert.equal(error.details?.hint, 'Use a more specific selector.');
assert.equal(error.details?.transportError, 'fetch failed');
return true;
},
);

assert.equal(mockInvalidateRunnerSession.mock.calls.length, 0);
assert.equal(mockExecuteRunnerCommandWithSession.mock.calls.length, 2);
});

test('mutating commands report wait-and-inspect guidance when status shows in-flight work', async () => {
const session = makeRunnerSession({ port: 8100, ready: true });

mockEnsureRunnerSession.mockResolvedValueOnce(session);
mockExecuteRunnerCommandWithSession
.mockRejectedValueOnce(new AppError('COMMAND_FAILED', 'fetch failed'))
.mockResolvedValueOnce({ lifecycleState: 'started' });

await assert.rejects(
() => runIosRunnerCommand(IOS_SIMULATOR, { command: 'tap', x: 120, y: 240 }),
(error: unknown) => {
assert.ok(error instanceof AppError);
assert.match(error.message, /"tap" is still started/);
assert.equal(error.details?.recovery, 'command_still_in_flight');
assert.match(String(error.details?.hint), /may still finish/);
assert.match(String(error.details?.hint), /snapshot -i/);
assert.equal(error.details?.transportError, 'fetch failed');
return true;
},
);

assert.equal(mockInvalidateRunnerSession.mock.calls.length, 0);
assert.equal(mockExecuteRunnerCommandWithSession.mock.calls.length, 2);
});

test('mutating commands invalidate the retry session without replaying again', async () => {
@@ -160,7 +318,8 @@ test('mutating commands invalidate the retry session without replaying again', a
mockEnsureRunnerSession.mockResolvedValueOnce(staleSession).mockResolvedValueOnce(freshSession);
mockExecuteRunnerCommandWithSession
.mockRejectedValueOnce(new AppError('COMMAND_FAILED', 'Runner did not accept connection'))
-.mockRejectedValueOnce(new AppError('COMMAND_FAILED', 'fetch failed'));
+.mockRejectedValueOnce(new AppError('COMMAND_FAILED', 'fetch failed'))
+.mockResolvedValueOnce({ lifecycleState: 'notAccepted' });

await assert.rejects(() =>
runIosRunnerCommand(IOS_SIMULATOR, { command: 'tap', x: 120, y: 240 }),
@@ -171,7 +330,7 @@ test('mutating commands invalidate the retry session without replaying again', a
[staleSession, 'runner_connect_failed_before_command_send'],
[freshSession, 'transport_error_after_retry_command_send'],
]);
-assert.equal(mockExecuteRunnerCommandWithSession.mock.calls.length, 2);
+assert.equal(mockExecuteRunnerCommandWithSession.mock.calls.length, 3);
});

function makeRunnerSession(overrides: Partial<RunnerSession> = {}): RunnerSession {