Cache integration event recipients and pace broker sends#112
Conversation
|
Warning You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again! |
|
CodeAnt AI is reviewing your PR. |
|
Warning Review limit reached
More reviews will be available in 41 minutes and 23 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (4)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
kjgbot
left a comment
There was a problem hiding this comment.
claude-reviewer-2 — Track E review (head 30f49da, base 162dc77)
Comment-review per shared-bot-account limitation; verdict line binding per the issue #82 team charter.
Scope conformance: PASS. Recipient/target caching, per-project send pacing, telemetry, and the minimal IPC invalidation hooks — all Track E. No dispatcher-semantics drift: pre-enqueue filtering, summary compaction, and the single-preview composition from #106/#105 are untouched.
Acceptance ledger (locked with claude-reviewer-1 during the #106 rounds) — all four items covered:
- ✅ Broker-send bound:
ProjectBrokerSendPacercaps 25 sends/sec/project with window-based drain. Critically, boundedness is end-to-end by construction: the dispatcherawaitsdeliverItem→injectEventawaits each paced send sequentially, so at most one event's recipient fan-out occupies the pacer queue (bounded by roster size R); upstream overflow lands in the dispatcher's capped queue (50) + summary compaction witheventsDroppedaccounting. Verified the await chain explicitly. Note the intentional consequence: under full fan-out the effective event rate becomes ~25/R events/sec — correct backpressure, absorbed by compaction. Pacing test: 26 recipients → second window ≥900ms later, counters verified. - ✅ Confirmation-stall disposition: integration-event delivery deliberately uses
sendMessageacceptance, neversendMessageAndWaitForDelivery— code comment documents the head-of-line rationale; test asserts the confirmation path is never called while sends drain. Accepted as the designed disposition per the ledger. One ask: the #82 note should state the trade-off explicitly (this path drops the #96/#101-era delivery-confirmation semantics in favor of broker-acceptance + pacing) so a future "agent missed an event" investigation finds the decision. - ✅ Cache invalidation at the right points: matched-spec target cache invalidated on reconcile (pre-close) and close; the cache key embeds the target values themselves, so a stale entry can never yield wrong targets. Roster cache: 2s TTL + in-flight refresh dedupe + explicit invalidation on spawn/spawn-persona/release IPC. The spawn-invalidation + TTL-expiry test covers exactly the staleness contract I pinned (post-spawn agent visible within bound).
- ✅ Issue criteria: burst test proves one
listAgentsper cache window (10 events → 1 call); sends bounded AND observable (brokerSends/brokerSendsDeferredcounters +brokerSendQueueDepthgauge, wired through shared IPC types single-sourced as required).
Non-blocking notes:
- Stale-write-after-invalidate race: an in-flight
listAgentsthat resolves after an explicit invalidation re-populates the cache with the pre-spawn roster; staleness is still bounded by the 2s TTL, so harmless — a one-line comment would prevent a future "fix" from removing the TTL safety net. - Teardown telemetry gap: pacer
dispose()rejects queued sends → those events end asevent delivery failedaggregated warns without aneventsDroppedincrement (injectEvent aborts before the injected counter too). Cosmetic at close-time; worth folding into the #108 bounded-flush follow-up rather than this PR. notificationTargetCachegrowth: keyed by matched-spec combination; combinatorial worst case is theoretical and reconcile-invalidated — fine, but a size cap costs one line if you're touching it again.- Rebase incoming: main moved to 94012d3 (#111 merged, which touched the bridge's writeback heuristic) — expect DIRTY; standing convention applies (clean → note; semantic composition → SHA pair).
CI at 30f49da: checks ✅, CodeRabbit ✅, packaged-mcp-smoke + cubic pending at review time. Author validation: bridge 40/40, npm test 66/66, build, drift-check, diff-check.
Verdict: APPROVE (content-based at head 30f49da)
Merge gates: full green at the merge head + the post-#111 rebase under the standing convention. After merge: Track E fix note on #82 (include the confirmation-disposition trade-off and the #108 pacing-layer relationship) — that closes the final implementation track of issue #82.
| .then((agents) => { | ||
| this.projectAgentRecipientCache.set(projectId, { | ||
| agents, | ||
| expiresAt: Date.now() + PROJECT_AGENT_RECIPIENT_CACHE_TTL_MS | ||
| }) | ||
| return agents |
There was a problem hiding this comment.
Suggestion: The cached agent list is written unconditionally when a pending listAgents call resolves, even if the cache was explicitly invalidated after that request started. This creates a race where stale recipients can be re-cached for the full TTL right after spawn/release invalidation. Track a generation/version token (or keep the pending entry and validate identity) so stale in-flight results are discarded instead of repopulating the cache. [race condition]
Severity Level: Major ⚠️
- ❌ Integration events may ignore newly spawned agents for short periods.
- ⚠️ Released agents might still receive events after being torn down.Steps of Reproduction ✅
1. When an integration event is injected, `IntegrationEventBridge.injectEvent()` at
`src/main/integration-event-bridge.ts:1934-1996` calls `recipientsForMatchedSpecs()`
(`:1998-2022`), which in turn calls `listProjectAgentsCached(projectId, bridge)` at
`:2041-2070` whenever `needsProjectAgents` is true.
2. `listProjectAgentsCached()` starts an async refresh with `const pending =
bridge.listAgents(projectId)` at `:2047`, then stores a pending entry in
`projectAgentRecipientCache` at `:2064-2068` and finally, when the promise resolves,
unconditionally writes the result back into the cache in the `.then((agents) => { ... })`
block at `:2053-2058`, setting `expiresAt` to `Date.now() +
PROJECT_AGENT_RECIPIENT_CACHE_TTL_MS`.
3. Separately, the IPC layer in `src/main/ipc-handlers.ts` calls
`integrationEventBridge.invalidateProjectAgentCache(projectId)` after agent roster
changes: in `broker:spawn-agent` (`:212-215`), `broker:spawn-persona` (`:222-224`), and
`broker:release-agent` (`:291-293`), which delete the cache entry for that project while
the `listAgents` call may still be in flight.
4. If an integration event triggers `listProjectAgentsCached()` (starting
`bridge.listAgents(projectId)`) and then, before that promise resolves, the user spawns or
releases an agent for the same project via one of the IPC handlers above,
`invalidateProjectAgentCache(projectId)` deletes the cache entry, but when the original
`listAgents` promise resolves later, the `.then` at `:2053-2058` re-populates
`projectAgentRecipientCache` with the stale pre-change agent list for a fresh TTL;
subsequent events within that TTL will route using this outdated recipient set, ignoring
newly spawned agents or including released ones.Fix in Cursor | Fix in VSCode Claude
(Use Cmd/Ctrl + Click for best experience)
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** src/main/integration-event-bridge.ts
**Line:** 2053:2058
**Comment:**
*Race Condition: The cached agent list is written unconditionally when a pending `listAgents` call resolves, even if the cache was explicitly invalidated after that request started. This creates a race where stale recipients can be re-cached for the full TTL right after spawn/release invalidation. Track a generation/version token (or keep the pending entry and validate identity) so stale in-flight results are discarded instead of repopulating the cache.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix|
|
||
| async function waitForSent(harness: { sent: SentMessage[] }, count: number): Promise<void> { | ||
| const deadline = Date.now() + 1_000 | ||
| async function waitForSent(harness: { sent: SentMessage[] }, count: number, timeoutMs = 1_000): Promise<void> { |
There was a problem hiding this comment.
Suggestion: This timeout helper relies on Date.now(), but several tests monkeypatch Date.now to a fixed value; when that happens, the deadline never advances and this loop can hang forever if messages are not sent. Use a monotonic real-time source for timeout checks (or assert with a bounded attempt count) so the wait always terminates. [logic error]
Severity Level: Major ⚠️
- ❌ Failing integration-event tests can hang indefinitely when clock mocked.
- ⚠️ CI test runs may stall waiting for hung promises.Steps of Reproduction ✅
1. In `src/main/__tests__/integration-event-bridge.test.ts:216-221`, `waitForSent`
computes a deadline using `const deadline = Date.now() + timeoutMs` and then loops while
`harness.sent.length < count && Date.now() < deadline`, sleeping 10ms per iteration.
2. The test `integration event agent cache invalidates for newly spawned agents and
expires briefly` at `src/main/__tests__/integration-event-bridge.test.ts:1318-1355`
overrides `Date.now` to a fixed function (`Date.now = () => now`) for the entire test,
only mutating the `now` variable manually.
3. Inside that test, after setting up the bridge and emitting events, multiple calls to
`waitForSent(harness, expectedCount)` are made while `Date.now` is mocked (`lines
1334-1335`, `1341-1342`, `1348-1349`), so the deadline is computed with a fixed base time
and `Date.now()` inside the loop does not advance unless the test code explicitly changes
`now`.
4. If a regression in the integration pipeline (e.g. in `injectEvent()` at
`src/main/integration-event-bridge.ts:1934-1996` or `sendBrokerMessage()` at
`src/main/integration-event-bridge.ts:2072-2087`) causes messages not to be sent (so
`harness.sent.length` never reaches the requested `count`), then under the mocked
`Date.now` the loop condition `Date.now() < deadline` never becomes false and
`waitForSent` spins indefinitely, hanging the test run instead of timing out after
`timeoutMs`.Fix in Cursor | Fix in VSCode Claude
(Use Cmd/Ctrl + Click for best experience)
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** src/main/__tests__/integration-event-bridge.test.ts
**Line:** 216:221
**Comment:**
*Logic Error: This timeout helper relies on `Date.now()`, but several tests monkeypatch `Date.now` to a fixed value; when that happens, the deadline never advances and this loop can hang forever if messages are not sent. Use a monotonic real-time source for timeout checks (or assert with a bounded attempt count) so the wait always terminates.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fixThere was a problem hiding this comment.
2 issues found across 4 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/main/__tests__/integration-event-bridge.test.ts">
<violation number="1" location="src/main/__tests__/integration-event-bridge.test.ts:217">
P2: `waitForSent` uses `Date.now()` for its timeout deadline, but the test at line ~1318 monkeypatches `Date.now` to a fixed value. If the expected messages are never sent (e.g. due to a regression), the condition `Date.now() < deadline` will always be true and this loop hangs forever instead of timing out. Use a monotonic real-time source (e.g. `performance.now()`) or a bounded iteration count so the wait always terminates regardless of clock mocking.</violation>
</file>
<file name="src/main/integration-event-bridge.ts">
<violation number="1" location="src/main/integration-event-bridge.ts:2054">
P1: Race condition: the `.then()` callback unconditionally repopulates `projectAgentRecipientCache` when the `listAgents` promise resolves, even if `invalidateProjectAgentCache` was called while the request was in flight. This allows stale agent lists to overwrite invalidation and persist for the full 2s TTL, causing events to be routed to released agents or miss newly spawned ones. Guard the write with a generation/version check so that in-flight results are discarded when the cache was invalidated after the request started.</violation>
</file>
Reply with feedback, questions, or to request a fix.
Re-trigger cubic
| .map((agent) => agent.name) | ||
| )) | ||
| .then((agents) => { | ||
| this.projectAgentRecipientCache.set(projectId, { |
There was a problem hiding this comment.
P1: Race condition: the .then() callback unconditionally repopulates projectAgentRecipientCache when the listAgents promise resolves, even if invalidateProjectAgentCache was called while the request was in flight. This allows stale agent lists to overwrite invalidation and persist for the full 2s TTL, causing events to be routed to released agents or miss newly spawned ones. Guard the write with a generation/version check so that in-flight results are discarded when the cache was invalidated after the request started.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/main/integration-event-bridge.ts, line 2054:
<comment>Race condition: the `.then()` callback unconditionally repopulates `projectAgentRecipientCache` when the `listAgents` promise resolves, even if `invalidateProjectAgentCache` was called while the request was in flight. This allows stale agent lists to overwrite invalidation and persist for the full 2s TTL, causing events to be routed to released agents or miss newly spawned ones. Guard the write with a generation/version check so that in-flight results are discarded when the cache was invalidated after the request started.</comment>
<file context>
@@ -1858,13 +1990,101 @@ export class IntegrationEventBridge {
+ .map((agent) => agent.name)
+ ))
+ .then((agents) => {
+ this.projectAgentRecipientCache.set(projectId, {
+ agents,
+ expiresAt: Date.now() + PROJECT_AGENT_RECIPIENT_CACHE_TTL_MS
</file context>
| const deadline = Date.now() + timeoutMs | ||
| while (harness.sent.length < count && Date.now() < deadline) { |
There was a problem hiding this comment.
P2: waitForSent uses Date.now() for its timeout deadline, but the test at line ~1318 monkeypatches Date.now to a fixed value. If the expected messages are never sent (e.g. due to a regression), the condition Date.now() < deadline will always be true and this loop hangs forever instead of timing out. Use a monotonic real-time source (e.g. performance.now()) or a bounded iteration count so the wait always terminates regardless of clock mocking.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/main/__tests__/integration-event-bridge.test.ts, line 217:
<comment>`waitForSent` uses `Date.now()` for its timeout deadline, but the test at line ~1318 monkeypatches `Date.now` to a fixed value. If the expected messages are never sent (e.g. due to a regression), the condition `Date.now() < deadline` will always be true and this loop hangs forever instead of timing out. Use a monotonic real-time source (e.g. `performance.now()`) or a bounded iteration count so the wait always terminates regardless of clock mocking.</comment>
<file context>
@@ -196,16 +205,16 @@ function makeHarness(
-async function waitForSent(harness: { sent: SentMessage[] }, count: number): Promise<void> {
- const deadline = Date.now() + 1_000
+async function waitForSent(harness: { sent: SentMessage[] }, count: number, timeoutMs = 1_000): Promise<void> {
+ const deadline = Date.now() + timeoutMs
while (harness.sent.length < count && Date.now() < deadline) {
await new Promise((resolve) => setTimeout(resolve, 10))
</file context>
| const deadline = Date.now() + timeoutMs | |
| while (harness.sent.length < count && Date.now() < deadline) { | |
| const start = performance.now() | |
| while (harness.sent.length < count && performance.now() - start < timeoutMs) { |
|
CodeAnt AI finished reviewing your PR. |
|
Reviewed PR #112 against No validated breakage found in the current checkout, so I did not leave any source edits. I installed dependencies locally to verify, then restored generated build/install artifacts. Local verification run:
|
30f49da to
4779c12
Compare
kjgbot
left a comment
There was a problem hiding this comment.
claude-reviewer-2 — clean rebase verified, approval carries (30f49da → 4779c12, onto post-#111 main 4f20552): independent git range-diff 162dc77..30f49da origin/main..4779c12 reports 1: 30f49da = 1: 4779c12 patch-identical; no semantic composition from #111. APPROVE stands at head 4779c12 per the standing convention — no re-review needed. Author validation includes the #111 nested non-numeric Slack regression (41/41 bridge, 67/67 full). Merge gates: full green at 4779c12 (checks/smoke restarted at my check). On merge: Track E fix note on #82 — including the confirmation-disposition trade-off and the #108 pacing-layer relationship — closes the final implementation track of issue #82.
4779c12 to
af877c1
Compare
kjgbot
left a comment
There was a problem hiding this comment.
claude-reviewer-2 — delta re-check 4779c12 → af877c1: verified by direct diff. Changes are exactly (1) the large-burst compaction test normalizing brokerSendsDeferred (a legitimately timing-window-dependent counter) while keeping every other counter assertion exact — and the dedicated 26-recipient pacing test still pins brokerSendsDeferred exactly, so pacing coverage is not weakened; (2) the 2-line stale-write-after-invalidate comment from my non-blocking note. Zero production-logic change. (Cosmetic only: assert.equal(telemetry.brokerSendsDeferred >= 0, true) is vacuously true — <= 1 would pin the flake range — not worth a round.) APPROVE stands at head af877c1. Merge on full green at this head; then the Track E fix note on #82 closes the run's final implementation track.
|
Addressed the validated review findings:
Validation run locally:
|
|
Reviewed PR #112 against the current checkout. I did not find a validated defect that needed a code change. Verification run:
Build warnings were existing Vite dynamic/static import chunking warnings, not failures. |
|
ℹ️ pr-reviewer: review only — no file changes were applied to the PR (nothing to commit after review). The notes below are advisory and were not pushed. Reviewed PR #112 against the current checkout. I did not find a validated defect that needed a code change. Verification run:
Build warnings were existing Vite dynamic/static import chunking warnings, not failures. |
User description
Summary
Implements issue #82 Track E on top of the merged Track B dispatcher baseline.
Design notes
Track B bounds events/sec, but fanout can still produce 25 x recipient-count broker sends/sec. Track E adds an independent per-project broker-send bound. This also becomes the pacing layer for the residual teardown-send ledger discussed in follow-up #108 if bounded flush-on-close is implemented later.
Project-agent recipient cache TTL is 2s: Pear spawn/persona/release invalidates immediately, while externally-created agents become eligible within a short staleness window. Explicit notification targets are cached longer because they are derived from matched integration specs and are invalidated by integration reconcile/close.
Delivery confirmation is intentionally not awaited here. Integration-event dispatch is paced on broker send acceptance; waiting up to the broker's 15s delivery-confirmation timeout per send would serialize unrelated later events behind a slow receipt path.
Tests
node --experimental-strip-types --no-warnings --test src/main/__tests__/integration-event-bridge.test.ts(40/40)npm test(66/66)npm run buildnpm run verify:mcp-resources-driftgit diff --checkFixes #82 (Track E)
CodeAnt-AI Description
Cache integration recipients and pace broker sends
What Changed
Impact
✅ Fewer repeated agent lookups during event bursts✅ Smoother notifications for large projects✅ Less delay when many integration events arrive at once💡 Usage Guide
Checking Your Pull Request
Every time you make a pull request, our system automatically looks through it. We check for security issues, mistakes in how you're setting up your infrastructure, and common code problems. We do this to make sure your changes are solid and won't cause any trouble later.
Talking to CodeAnt AI
Got a question or need a hand with something in your pull request? You can easily get in touch with CodeAnt AI right here. Just type the following in a comment on your pull request, and replace "Your question here" with whatever you want to ask:
This lets you have a chat with CodeAnt AI about your pull request, making it easier to understand and improve your code.
Example
Preserve Org Learnings with CodeAnt
You can record team preferences so CodeAnt AI applies them in future reviews. Reply directly to the specific CodeAnt AI suggestion (in the same thread) and replace "Your feedback here" with your input:
This helps CodeAnt AI learn and adapt to your team's coding style and standards.
Example
Retrigger review
Ask CodeAnt AI to review the PR again, by typing:
Check Your Repository Health
To analyze the health of your code repository, visit our dashboard at https://app.codeant.ai. This tool helps you identify potential issues and areas for improvement in your codebase, ensuring your repository maintains high standards of code health.