Skip to content

Wait for host AX publish before rescheduling, not a fixed 150ms#379

Merged
FuJacob merged 1 commit into
mainfrom
fix/chrome-ax-snapshot-gate
May 28, 2026
Merged

Wait for host AX publish before rescheduling, not a fixed 150ms#379
FuJacob merged 1 commit into
mainfrom
fix/chrome-ax-snapshot-gate

Conversation

@FuJacob
Copy link
Copy Markdown
Owner

@FuJacob FuJacob commented May 28, 2026

Summary

Replaces the fixed 150ms post-divergence reschedule delay from PR #376 with a snapshot-changed gate: capture the focused element identity, preceding text, and selection at keystroke time, then poll focusModel.snapshot every 30ms until any of those actually move on. Fires schedulePrediction() as soon as the host has published, with a 400ms ceiling so a silent host never hangs the pipeline.

The logs in #381 showed work=93 and work=95 generating against the same oij3 prefix because Chrome hadn't published the new keystroke by the time the deferred reschedule fired — the rescheduled suggestion then appears with stale context, which reads to the user as the character being swallowed. This patch waits for evidence the publish actually happened instead of guessing a fixed delay.

Validation

swiftlint lint --quiet
# exit 0, no new warnings

xcodebuild -project Cotabby.xcodeproj -scheme Cotabby -destination 'platform=macOS' build
# ** BUILD SUCCEEDED **

xcodebuild -project Cotabby.xcodeproj -scheme Cotabby -destination 'platform=macOS' build-for-testing
# ** TEST BUILD SUCCEEDED **

Manual reproduction required: type rapidly in a Chrome contenteditable, confirm regenerated suggestions reflect the latest typed character instead of the prior state.

Linked issues

Refs #381

Risk / rollout notes

  • All three call sites in handleInputEvent (no-session path, active-session .textMutation, active-session .shortcutMutation) share the helper, so the gate carries to every place the old fixed delay applied.
  • 400ms ceiling is the fallback when AX never moves. Without it a dead-key composition or a host that genuinely produces no change would hang the pipeline. Generation against pre-keystroke text in that edge case matches the pre-fix behavior, which is acceptable since the user can always type again.
  • AX refresh cost. Each poll calls focusModel.refreshNow() which is 5–15ms based on logged resolve timings. With a 30ms cadence and 400ms ceiling that's up to ~13 polls, so worst-case ~200ms of AX work concentrated in the rare hung case — still well within the AX poll interval budget elsewhere in the pipeline.
  • Selection-change detection uses selection.location only. Length changes during a multi-key paste won't trigger the gate by themselves, but the trailing text change will, so behavior is preserved.

Greptile Summary

This PR replaces the fixed 150ms post-keystroke reschedule delay with a polling gate that captures the AX state (preceding text, element identity, selection) at keystroke time and waits for any of those fields to change before calling schedulePrediction(), falling back to a 400ms ceiling if the host never publishes.

  • The core logic — snapshot baseline, poll every 30ms via asyncAfter, fire on first detected change — is sound and directly addresses the oij3 prefix stale-context regression from [Bug] The app is blocking typing than it suddenly crashes/close #381.
  • The first pollForHostPublish call fires synchronously inside handleInputEvent, adding a 5–15ms synchronous AX read to every keystroke's event-handler path; since the CGEvent tap fires before Chrome processes the key this first read almost always finds no change and is wasted.
  • Each schedulePredictionAfterHostPublishDelay call spawns an independent 400ms chain; with rapid typing multiple chains run concurrently, multiplying the AX refresh rate beyond the single-keystroke budget analysis in the PR description.

Confidence Score: 4/5

Safe to merge for the targeted regression fix; the polling logic is correct and the ceiling prevents pipeline hangs, though the synchronous first AX read and potential for concurrent loops in rapid typing are worth a follow-up.

The change correctly solves the stated problem and all three call sites are covered. The main concerns are performance-shaped rather than correctness-shaped: a wasted synchronous AX read on every keystroke handler and unbounded concurrent poll loops during rapid typing bursts. Neither breaks prediction correctness, but the concurrent-loops case multiplies AX work beyond what the PR's budget analysis accounts for.

Cotabby/App/Coordinators/SuggestionCoordinator+Input.swift — specifically the synchronous first poll in schedulePredictionAfterHostPublishDelay and the lack of a guard against concurrent polling chains.

Important Files Changed

Filename Overview
Cotabby/App/Coordinators/SuggestionCoordinator+Input.swift Replaces the fixed 150ms post-keystroke reschedule delay with a polling loop that gates on AX snapshot changes; logic is sound but the first poll fires synchronously (blocking the main actor with an AX read on every keystroke), and multiple overlapping loops accumulate for rapid typing bursts.

Sequence Diagram

sequenceDiagram
    participant CGEventTap as CGEvent Tap
    participant HC as handleInputEvent (MainActor)
    participant SPHD as schedulePredictionAfterHostPublishDelay
    participant PFH as pollForHostPublish
    participant FM as focusModel.refreshNow()
    participant Chrome as Chrome AX

    CGEventTap->>HC: keystroke event
    HC->>SPHD: "(elapsedMs=0, synchronous)"
    SPHD->>SPHD: "baseline = focusModel.snapshot.context"
    SPHD->>PFH: "pollForHostPublish(baseline, elapsedMs=0)"
    PFH->>FM: refreshNow() [5-15ms, SYNCHRONOUS]
    FM-->>PFH: snapshot (still pre-keystroke)
    PFH->>PFH: "textChanged=false, schedule next poll (30ms)"

    Note over Chrome: Chrome processes keystroke, publishes AX update

    PFH->>FM: "refreshNow() [t=30ms]"
    FM-->>PFH: snapshot (post-keystroke)
    PFH->>PFH: "textChanged=true, schedulePrediction()"

    alt Ceiling path (host never publishes)
        loop "every 30ms until nextElapsed >= 400"
            PFH->>FM: refreshNow()
            FM-->>PFH: snapshot (unchanged)
        end
        PFH->>PFH: schedulePrediction() at ceiling (~390ms)
    end
Loading

Fix All in Codex Fix All in Claude Code

Reviews (1): Last reviewed commit: "Wait for the host's AX publish before re..." | Re-trigger Greptile

Greptile also left 3 inline comments on this PR.

PR #376 added a fixed 150ms delay before the post-divergence reschedule so Chromium-based editors had time to publish updated AX text. The logs in #381 show the publish lag can exceed that ceiling on a busy page — `work=93` and `work=95` both generated against the same `oij3` prefix because Chrome hadn't published the user's new keystroke by the time the deferred reschedule fired. The rescheduled suggestion appears with stale context, which reads to the user as their character being swallowed.

Replace the fixed delay with a snapshot-changed gate: capture the focused element identity, preceding text, and selection at keystroke time, then poll `focusModel.snapshot` every 30ms until any of those move on. Fires `schedulePrediction()` as soon as the host has actually published, with a 400ms ceiling so a silent host never hangs the pipeline (we generate against whatever's there once the ceiling is hit, matching the old fixed-delay behavior). All three call sites in the input handler use the same helper, so this fix carries to both the active-session and no-session paths.
@FuJacob FuJacob merged commit ac1f3ee into main May 28, 2026
4 checks passed
@FuJacob FuJacob deleted the fix/chrome-ax-snapshot-gate branch May 28, 2026 11:33
Comment on lines +212 to +218
private func pollForHostPublish(
baselineText: String?,
baselineElementID: String?,
baselineSelectionLocation: Int?,
elapsedMs: Int
) {
focusModel.refreshNow()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 First poll fires synchronously, blocking the main actor

pollForHostPublish is called synchronously inside handleInputEvent (both are @MainActor), so focusModel.refreshNow() — a full AX tree walk of 5–15ms — runs before the main run loop returns from keyboard event handling. Since the CGEvent tap fires before Chrome processes the keystroke, this first refresh will almost always see unchanged text, making the read wasted. The old asyncAfter(150ms) design never touched AX during event delivery. With rapid typing this also means one synchronous AX read is added to every single keystroke's event-handler cost.

A simple fix is to skip the synchronous call entirely and start with a 30ms asyncAfter (same as every subsequent poll), since there is no information to be gained at t=0 before Chrome has processed the key.

Fix in Codex Fix in Claude Code

Comment on lines 199 to +207
private func schedulePredictionAfterHostPublishDelay() {
DispatchQueue.main.asyncAfter(deadline: .now() + .milliseconds(150)) { [weak self] in
guard let self else { return }
self.focusModel.refreshNow()
self.schedulePrediction()
let baseline = focusModel.snapshot.context
pollForHostPublish(
baselineText: baseline?.precedingText,
baselineElementID: baseline?.elementIdentifier,
baselineSelectionLocation: baseline?.selection.location,
elapsedMs: 0
)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Concurrent poll loops accumulate for rapid typing

Each call to schedulePredictionAfterHostPublishDelay (invoked on every shouldSchedulePrediction keystroke across all three call sites) spawns an independent 400ms polling chain. Rapid typing — say 8 keys in 200ms — leaves up to 8 simultaneous loops each calling focusModel.refreshNow() every 30ms. That is ~8× the budgeted AX read rate during a burst, and refreshNow() resets the tracker's idle backoff on every call, so the normal back-off never gets a chance to engage. The PR's "up to ~13 polls" budget analysis only covers the single-keystroke case.

Consider tracking whether a poll loop is already in flight (e.g., a Bool flag set in schedulePredictionAfterHostPublishDelay and cleared when schedulePrediction() fires) and returning early when one is already running. The replaceDebouncedWork inside schedulePrediction() already collapses generations, so only the most recent baseline matters anyway.

Fix in Codex Fix in Claude Code

Comment on lines +234 to +238
let nextElapsed = elapsedMs + Self.hostPublishPollIntervalMs
guard nextElapsed < Self.hostPublishWaitCeilingMs else {
schedulePrediction()
return
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 elapsedMs tracks logical intervals, not wall time

elapsedMs increments by 30ms per iteration regardless of how long each AX read takes. Since refreshNow() resolves in 5–15ms, the real wall-clock duration of the ceiling path is up to ≈390ms + (13 polls × 15ms) ≈ 585ms — substantially more than the 400ms stated in comments and the PR description. The discrepancy matters when callers reason about "how long can a stale prediction be generated" in the dead-key / non-publishing host path.

Fix in Codex Fix in Claude Code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant