fix(ui): harden permission event reconciliation #422
Permission prompts could remain visible or reappear when SSE events arrived out of order: local replies only cleared the global queue, duplicate permission updates were discarded even when they carried better metadata, and delayed permission updates could resurrect already-handled requests.

Keep the global permission queue and message-v2 store in sync after successful replies, tombstone recently replied permission IDs to ignore stale updates, update duplicate permission payloads and pending counters, and reconcile permissions whenever tool parts arrive just like questions. Message-v2 now keeps one attachment per permission ID and recalculates active permission ordering from the queue so inline tool-call, timeline, and permission-center state converge after out-of-order events.

Validated with `git diff --check`, `npm run typecheck --workspace @codenomad/ui`, and `npm run build --workspace @codenomad/ui`.
YOLO auto-accept was driven by an InstanceShell effect, so permission replies could be delayed by heavy message rendering or permission UI churn even though the mode itself is session-scoped and always replies once.

Move the drain trigger into the permission queue flow and run it from permission sync, live permission enqueue, and the status-tab toggle. Keep the product behavior unchanged: YOLO remains per session and sends only allow-once replies. Add a bounded retry backoff for failed auto-accept replies so transient failures can recover without immediate reactive retry loops that can amplify UI freezes.

Validated with `git diff --check`, `npm run typecheck --workspace @codenomad/ui`, and `npm run build --workspace @codenomad/ui`.
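For illustration only, the bounded retry backoff mentioned above could look roughly like the sketch below. The helper name `nextRetryDelayMs` and the constants `BASE_DELAY_MS`, `MAX_DELAY_MS`, and `MAX_ATTEMPTS` are assumptions made for this example; the thread does not state the values or names the PR actually uses.

```typescript
// Hypothetical constants: the real values are not stated in this thread.
const BASE_DELAY_MS = 500
const MAX_DELAY_MS = 8000
const MAX_ATTEMPTS = 5

/**
 * Returns the delay before retry number `attempt` (1-based),
 * or null once the attempt budget is exhausted.
 */
function nextRetryDelayMs(attempt: number): number | null {
  if (attempt < 1 || attempt > MAX_ATTEMPTS) return null
  // Exponential growth, capped so a failing reply is never retried in a tight loop
  // and never waits longer than MAX_DELAY_MS.
  return Math.min(BASE_DELAY_MS * 2 ** (attempt - 1), MAX_DELAY_MS)
}
```

Returning `null` after the last attempt is one way to make the retry loop stop on its own instead of relying on a reactive effect to notice the failure.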
PR builds are available as GitHub Actions artifacts: https://github.com/NeuralNomadsAI/CodeNomad/actions/runs/25610676103 Artifacts expire in 7 days.
Keep duplicate permission events from downgrading richer payloads so late or partial SSE updates do not detach pending approvals from their session, message, or tool metadata.

Preserve optimistic user message parts when temporary message IDs are replaced, then remove matching synthetic parts only after official server parts arrive. This avoids dropping resolved attachments during event reordering without leaving duplicate user input behind.

Clean up YOLO auto-accept retries when permissions are removed, queues are cleared, or per-session auto-accept is disabled, while keeping the behavior session-scoped and replying with once-only approvals.

Validated with `git diff --check`, UI typecheck, UI build, and a final read-only gatekeeper review.
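A minimal sketch of the "do not downgrade richer payloads" rule, assuming a simplified permission shape. The `PermissionLike` interface, its field names, and `mergePermission` are hypothetical and only illustrate the merge direction; they are not the project's real types.

```typescript
// Hypothetical shape: field names are illustrative, not the project's real type.
interface PermissionLike {
  id: string
  sessionID?: string
  messageID?: string
  callID?: string
}

/** Prefer the incoming value only when it actually carries information. */
function pick(known: string | undefined, incoming: string | undefined): string | undefined {
  return incoming !== undefined && incoming !== "" ? incoming : known
}

/** Merge a duplicate event into the known record without losing routing metadata. */
function mergePermission(existing: PermissionLike, incoming: PermissionLike): PermissionLike {
  return {
    id: existing.id,
    sessionID: pick(existing.sessionID, incoming.sessionID),
    messageID: pick(existing.messageID, incoming.messageID),
    callID: pick(existing.callID, incoming.callID),
  }
}
```

With this rule, a late partial event that only carries an `id` leaves the previously known session, message, and tool routing untouched.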
Final gatekeeper pass is clear after the latest fixes. Validated locally:
Latest commit pushed: |
I want you to perform the gatekeeper checks on the latest PR state and check for regressions.
Gatekeeper review for latest PR state complete.

What I understood: you wanted a fresh, ruthless read-only review of PR #422’s current head, focused on regressions and whether the permission/message-v2/YOLO implementation is sound. I treated this as a review only; no code changes were made.

Findings: No blocking regressions found.

Non-blocking quality notes:
Regression review notes:
Validation run:
Build notes:
Worktree note:
Gatekeeper verdict: clear to proceed. The implementation is more complex than ideal, but the complexity is targeted at real event-ordering failures and I do not see a simpler safe implementation that preserves the same behavior without reintroducing stale permission/UI races. -- |
Keep replied permission tombstones bounded in long-lived UI sessions by pruning expired entries whenever replies are marked or checked, and clear the map when an instance is removed. Also removes a dead local variable from the permission removal path noted during gatekeeper review. The permission merge helper duplication is intentionally left untouched to avoid adding cross-store churn in this PR. Validated with git diff --check, UI typecheck, and UI build.
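As a rough illustration of the pruning strategy described above, the sketch below prunes expired tombstones on every mark and every check, and exposes a `clear()` for instance removal. The class name `ReplyTombstones` and the `TOMBSTONE_TTL_MS` value are assumptions for this example only.

```typescript
// Hypothetical TTL: the thread does not state the real expiry window.
const TOMBSTONE_TTL_MS = 30000

class ReplyTombstones {
  // permission ID -> timestamp (ms) at which the tombstone expires
  private readonly expiresAt = new Map<string, number>()

  /** Drop entries whose TTL has elapsed so the map stays bounded. */
  private prune(now: number): void {
    this.expiresAt.forEach((expiry, id) => {
      if (expiry <= now) this.expiresAt.delete(id)
    })
  }

  markReplied(id: string, now: number = Date.now()): void {
    this.prune(now)
    this.expiresAt.set(id, now + TOMBSTONE_TTL_MS)
  }

  wasRecentlyReplied(id: string, now: number = Date.now()): boolean {
    this.prune(now)
    return this.expiresAt.has(id)
  }

  /** Called when the owning instance is removed. */
  clear(): void {
    this.expiresAt.clear()
  }

  get size(): number {
    return this.expiresAt.size
  }
}
```

Pruning inside the two hot paths avoids a separate timer, at the cost of a linear scan per call; that trade-off seems reasonable while the number of recently replied permissions stays small.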
Addressed the non-blocking cleanup notes from the latest gatekeeper pass in
I intentionally left the duplicate permission merge helper in place for this PR because extracting a shared helper would touch both queue/message-v2 boundaries again and add churn after the behavior is already validated. If this logic grows further, a shared helper is the right follow-up. Revalidated:
PR builds are available as GitHub Actions artifacts: https://github.com/NeuralNomadsAI/CodeNomad/actions/runs/25611188333 Artifacts expire in 7 days.
@CodeNomadBot |
Fresh gatekeeper review after

What I understood: rerun the same read-only gatekeeper review after the cleanup commit that addressed the prior non-blocking notes. I reviewed the latest branch state only; no code changes were made.

Findings: No blocking regressions found.

Prior cleanup notes status:
New non-blocking note:
Regression review notes:
Validation run:
Build notes:
Gatekeeper verdict: clear to proceed. The cleanup commit resolves the actionable notes from the previous review, and I do not see a blocker in the latest PR state. -- |
@pascalandr - Were you able to reproduce this issue? |
We already had this discussion in #367 (comment). I had already run into all kinds of buggy permission behavior, which is why I dug into it and found several issues.
TBH, this is touching too many core parts and I am not sure why synthetic parts are touched while fixing permissions? |
Well, I described the permission problems I had observed, then explained what I thought could be causing them and looked for existing bugs or weak spots matching those behaviors. |
I split this broad PR into smaller targeted PRs for better review:
As a reminder, these are issues I have personally hit multiple times. I cannot reproduce them deterministically on demand because they happen when the UI is under heavy pressure, but they happened often enough to reveal a consistent pattern.
## Summary
- Split out from #422 as the YOLO-only permission fix.
- Moves YOLO auto-accept draining out of `InstanceShell` render effects and into the permission queue flow.
- Keeps behavior unchanged: YOLO remains per-session and replies with `once` only.
- Keeps in-flight cleanup for auto-accept attempts so duplicate sends are guarded outside UI render timing.

## Why
In YOLO mode, the app could still appear to wait on a permission or block the UI, even though auto-accept was enabled and the permission was already queued.

The auto-accept drain was tied to an `InstanceShell` render effect, so the reply path depended on a specific UI shell rendering. Draining from permission sync/enqueue and from the YOLO toggle makes auto-accept run from the permission queue itself instead of UI render timing.

## Validation
- `git diff --check`
- `npm run typecheck --workspace @codenomad/ui`
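A minimal sketch of a queue-driven drain with an in-flight guard, under the assumption that replies are sent through an injected async function. `AutoAcceptDrainer`, `ReplyFn`, and `drain` are hypothetical names for this example and are not taken from `permission-auto-accept.ts`.

```typescript
// Hypothetical reply transport: in the real UI this would send a `once` reply to the server.
type ReplyFn = (permissionId: string) => Promise<void>

class AutoAcceptDrainer {
  // IDs with a reply currently being sent; guards against duplicate sends.
  private readonly inFlight = new Set<string>()
  private readonly reply: ReplyFn

  constructor(reply: ReplyFn) {
    this.reply = reply
  }

  /** Can be called from sync, enqueue, or the toggle; overlapping calls are safe. */
  async drain(pendingIds: string[]): Promise<void> {
    const toSend = pendingIds.filter((id) => !this.inFlight.has(id))
    toSend.forEach((id) => this.inFlight.add(id))
    await Promise.all(
      toSend.map(async (id) => {
        try {
          await this.reply(id)
        } finally {
          // Always release the guard so a failed send can be retried later.
          this.inFlight.delete(id)
        }
      }),
    )
  }
}
```

Because the guard lives next to the queue rather than in a render effect, calling `drain` from several triggers at once should not send the same reply twice.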
## Summary
- Split out from #422 as the stale-permission-events fix.
- Clears permission UI state immediately after a successful local permission reply instead of waiting for `permission.replied` SSE.
- Tracks replied permission IDs until a newer pending-permission sync observes that the server no longer reports them pending.
- Marks SSE `permission.replied` events into the same replied-ID path so delayed pending events/sync results cannot resurrect prompts that were just answered.

## Why
A permission prompt could remain on screen after clicking Allow/Deny, or clicking Allow could look like it did not take effect and require another click. Reloading could fix the UI, which pointed to stale local permission state rather than the server still waiting.

The UI receives permission state from local replies, SSE events, and pending-permission sync. If an older pending event or sync result is processed after a confirmed reply, the UI can re-add a permission that was already answered. Replied IDs stay suppressed until a sync started after the local reply proves the server has dropped that permission from the pending list.

## Validation
- `git diff --check`
- `node --test packages/ui/src/stores/permission-replies.test.ts`
- `npm run typecheck --workspace @codenomad/ui`
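The suppression rule above can be sketched as follows. This is an illustration under assumptions, not the contents of `permission-replies.ts`: the class `RepliedPermissionTracker` and its method names are invented for the example, and timestamps are passed in explicitly to keep it deterministic.

```typescript
class RepliedPermissionTracker {
  // permission ID -> time (ms) at which the reply was confirmed locally or via SSE
  private readonly repliedAt = new Map<string, number>()

  markReplied(id: string, at: number): void {
    this.repliedAt.set(id, at)
  }

  /** True when an incoming pending event for `id` should be ignored. */
  isSuppressed(id: string): boolean {
    return this.repliedAt.has(id)
  }

  /**
   * Applies a pending-permission sync result and returns the IDs the UI may show.
   * A replied ID is released only when the sync started after the reply
   * and the server no longer lists that ID as pending.
   */
  applySync(syncStartedAt: number, pendingIds: string[]): string[] {
    const pending = new Set(pendingIds)
    this.repliedAt.forEach((at, id) => {
      if (syncStartedAt > at && !pending.has(id)) this.repliedAt.delete(id)
    })
    return pendingIds.filter((id) => !this.repliedAt.has(id))
  }
}
```

Comparing the sync start time (rather than its completion time) against the reply time is what prevents a slow, already-stale sync from releasing or resurrecting an answered prompt.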
## Summary
- Split out from #422 as the permission/tool-call reconciliation fix.
- Reconciles pending permissions whenever live tool parts update, matching the existing question re-link path.
- Keeps one message-v2 attachment per server permission ID and recalculates the active permission from queue order.
- Conservatively merges duplicate or out-of-order permission updates so known session/message/tool routing metadata is not lost.
- Fixes #290

## Why
The observed failure shape is that permission prompts can appear missing, frozen, or attached in unexpected places when permission events and tool-call parts are observed in different orders. In those cases, the server-side permission may exist, but the UI can temporarily attach it globally, attach it to the wrong tool location, or fail to move it when the matching tool part arrives later.

This PR focuses on the UI-side attachment/order problem: one UI attachment per server permission ID, re-linking permissions when tool parts arrive, and preserving known routing metadata across duplicate/out-of-order updates. It does not attempt semantic deduplication across different permission IDs that happen to ask for the same logical approval.

## Validation
- `git diff --check`
- `npm exec --no -- tsx --test packages/ui/src/types/permission.test.ts packages/ui/src/stores/message-v2/instance-store.test.ts`
- `node --test packages/ui/src/stores/permission-replies.test.ts`
- `npm run typecheck --workspace @codenomad/ui`
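To make the "one attachment per permission ID" and "active permission from queue order" ideas concrete, here is a small sketch. The names `PermissionAttachments`, `QueuedPermission`, and `activePermissionId` are hypothetical, and ordering the queue by a `createdAt` timestamp is an assumption; the real message-v2 store may key and sort differently.

```typescript
// Hypothetical queue entry: the real store may order by a different field.
interface QueuedPermission {
  id: string
  createdAt: number
}

class PermissionAttachments {
  // permission ID -> message ID; keying by permission ID guarantees a single attachment.
  private readonly byPermission = new Map<string, string>()

  /** Attaching again (for example when the tool part arrives late) moves the attachment. */
  attach(permissionId: string, messageId: string): void {
    this.byPermission.set(permissionId, messageId)
  }

  messageFor(permissionId: string): string | undefined {
    return this.byPermission.get(permissionId)
  }

  get count(): number {
    return this.byPermission.size
  }
}

/** Recompute the active permission from the queue instead of tracking it incrementally. */
function activePermissionId(queue: QueuedPermission[]): string | undefined {
  const sorted = [...queue].sort(
    (a, b) => a.createdAt - b.createdAt || a.id.localeCompare(b.id),
  )
  return sorted.length > 0 ? sorted[0].id : undefined
}
```

Deriving the active permission from the sorted queue on every change means event arrival order cannot leave a stale "active" pointer behind.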
Fixes #290
Supersedes #367
## Summary

## Bugs and Fragile Flows Fixed
- `permission.replied` SSE: `sendPermissionResponse()` removed only the global permission queue entry, so if the replied event was delayed or missed, inline tool-call UI could keep showing a stale permission.
- `permission.updated` for an already accepted permission could re-add it locally until reload corrected state from `permission.list()`.
- `message.part.updated` re-linked questions, but permissions depended on narrower incidental rebind paths or full message reload.
- `message.updated` could replace a synthetic message ID before user part events arrived and clear the optimistic parts, including resolved pasted/file attachment content.
- `instance-shell2`, so heavy message rendering or permission UI churn could delay replies even though the request was already queued.

## Implementation
- `instances.ts` and `message-v2` after successful replies so the UI does not depend on receiving `permission.replied`.
- `reconcilePendingPermissionsV2()` for every live tool part update.
- `message-v2` enforce one `byMessage` attachment per permission ID and recalculate active permission from the sorted queue.
- `partId` exists but whose `byMessage` attachment was lost.
- `permission-auto-accept.ts` and trigger it from permission sync, live permission enqueue, and the Status-tab toggle.

## YOLO Mode
- `once` only.

## Validation
- `git diff --check`
- `npm run typecheck --workspace @codenomad/ui`
- `npm run build --workspace @codenomad/ui`

## Notes
- `packages/ui` does not currently define a test script/runner, so this PR is validated with typecheck/build rather than new automated unit tests.
- `virtua` JSX import-source warning and large chunk warnings.