Use one-shot latest-event lookup to seed mount cursor#196
Conversation
resolveLatestEventCursor walked /fs/events page-by-page (limit=1000) to
find the workspace's tail event id. At ~437ms per page, the 20s
cursorTimeout (default) buys ~50 pages = ~50k events; workspaces past
that threshold deterministically time out every cycle. The fast path
post-bootstrap therefore never engages on busy workspaces, and the
daemon stays in degraded periodic-full-reconcile mode.
Verified on rw_fc7b534b: 132,972 events in the feed, eventsCursor was
None for 6+ hours despite bootstrap=True. After this change the cursor
seeds in ~ms via /fs/events?direction=desc&limit=1 (cloud#927).
Adds LatestEventID to RemoteClient with HTTPClient, fakeClient, and
bootstrapClient implementations. Tries the one-shot first; on
BadRequest / NotFound (older self-host cloud without direction=desc)
silently falls through to the legacy page-walk, so behaviour is strictly
additive against any backend.
Closes AgentWorkforce/cloud#926.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Warning Rate limit exceeded
You’ve run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (3)
📝 WalkthroughWalkthroughAdds a one-shot LatestEventID API to RemoteClient and HTTPClient (GET events?direction=desc&limit=1), server/store support for descending tail queries, updates resolveLatestEventCursor to prefer the one-shot call with 400/404 fallback to ListEvents, and extends tests and fakes. ChangesLatestEventID API and cursor resolution
Sequence DiagramsequenceDiagram
participant Caller
participant resolveLatestEventCursor
participant HTTPClient
participant ServerHandler
participant Store
Caller->>resolveLatestEventCursor: resolve cursor (workspace, provider)
resolveLatestEventCursor->>HTTPClient: LatestEventID(ctx, workspace, provider)
HTTPClient->>ServerHandler: GET /v1/workspaces/{id}/fs/events?direction=desc&limit=1
ServerHandler->>Store: GetEventsTail(workspace, provider, limit=1)
Store-->>ServerHandler: EventFeed (newest-first, maybe empty)
alt Success (event returned)
ServerHandler-->>HTTPClient: 200 {events: [{eventId: "latest-id"}, ...]}
HTTPClient-->>resolveLatestEventCursor: "latest-id"
resolveLatestEventCursor-->>Caller: "latest-id"
else Unsupported (400/404)
ServerHandler-->>HTTPClient: 400/404 error
HTTPClient-->>resolveLatestEventCursor: HTTPError{Status: 400/404}
resolveLatestEventCursor->>HTTPClient: ListEvents(ctx, ...) [paginated walk]
HTTPClient->>ServerHandler: GET /v1/workspaces/{id}/fs/events (paginated)
ServerHandler->>Store: GetEvents(workspace, provider, cursor, limit)
Store-->>ServerHandler: EventFeed (asc pages)
ServerHandler-->>HTTPClient: events pages
HTTPClient-->>resolveLatestEventCursor: latest id from page-walk
resolveLatestEventCursor-->>Caller: latest id
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Relayfile Eval ReviewRun: Passed: 4 | Needs human: 0 | Reviewable: 0 | Missing output: 0 | Failed: 0 | Skipped: 0 Human Review CasesNo reviewable human-review cases captured Relayfile output. |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
internal/mountsync/syncer_test.go (1)
814-846: ⚡ Quick winAdd explicit 404 unsupported fallback coverage.
This test currently verifies only the 400 path; the contract also relies on 404 triggering the same legacy fallback. Please add a sibling test for a 404
LatestEventIDerror to lock that behavior in.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/mountsync/syncer_test.go` around lines 814 - 846, Add a sibling unit test that mirrors TestResolveLatestEventCursorFallsBackOnUnsupported but forces the fakeClient to return a 404 from LatestEventID (not 400) and asserts the same fallback behavior: call NewSyncer with the fakeClient, call syncer.resolveLatestEventCursor, expect the returned cursor to be "evt_3", assert client.latestEventIDCalls == 1, and assert client.listEventsCalls > 0; name it e.g. TestResolveLatestEventCursorFallsBackOnNotFound and reuse the same fakeClient/events setup but configure the fakeClient to produce a 404 LatestEventID response so the legacy page-walk is exercised.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@internal/mountsync/syncer_test.go`:
- Around line 814-846: Add a sibling unit test that mirrors
TestResolveLatestEventCursorFallsBackOnUnsupported but forces the fakeClient to
return a 404 from LatestEventID (not 400) and asserts the same fallback
behavior: call NewSyncer with the fakeClient, call
syncer.resolveLatestEventCursor, expect the returned cursor to be "evt_3",
assert client.latestEventIDCalls == 1, and assert client.listEventsCalls > 0;
name it e.g. TestResolveLatestEventCursorFallsBackOnNotFound and reuse the same
fakeClient/events setup but configure the fakeClient to produce a 404
LatestEventID response so the legacy page-walk is exercised.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: cebf82d9-04f3-4df2-ba1f-f4b4d86af4bb
📒 Files selected for processing (3)
internal/mountsync/bootstrap_test.gointernal/mountsync/syncer.gointernal/mountsync/syncer_test.go
| if latest, err := s.client.LatestEventID(cctx, s.workspace, s.eventProvider); err == nil { | ||
| return latest, nil | ||
| } else { | ||
| var httpErr *HTTPError | ||
| if !errors.As(err, &httpErr) || (httpErr.StatusCode != http.StatusBadRequest && httpErr.StatusCode != http.StatusNotFound) { | ||
| return "", err | ||
| } | ||
| // Fall through: older self-host cloud may not yet support | ||
| // direction=desc; degrade to the legacy page-walk. | ||
| } |
There was a problem hiding this comment.
🔴 Self-hosted server silently ignores direction=desc, causing LatestEventID to return the oldest event instead of the latest
The HTTPClient.LatestEventID sends direction=desc&limit=1 to /fs/events (internal/mountsync/syncer.go:322), but the self-hosted server's handleEvents (internal/httpapi/server.go:1822-1830) does not read the direction query parameter — it only parses limit, provider, and cursor. The server silently ignores direction=desc and returns events in ascending order with limit=1, yielding the oldest event rather than the newest. Because no error is returned (HTTP 200 OK), the fallback to the legacy page-walk at syncer.go:3378-3385 never triggers (it only activates on HTTP 400 or 404). The syncer then stores the oldest event ID as EventsCursor (syncer.go:2228 or syncer.go:2267), causing every subsequent incremental sync cycle to replay all events from the beginning — the exact O(N) behaviour this optimisation was meant to eliminate.
Server handler that silently ignores direction
internal/httpapi/server.go:1822-1830:
func (s *Server) handleEvents(w http.ResponseWriter, r *http.Request, workspaceID, correlationID string) {
limit := parseBoundedInt(r.URL.Query().Get("limit"), 200, 1, 1000)
feed, err := s.store.GetEvents(workspaceID, r.URL.Query().Get("provider"), r.URL.Query().Get("cursor"), limit)
// direction is never read ^
}Prompt for agents
The self-hosted server in internal/httpapi/server.go handleEvents (line 1822) does not read the direction query parameter. When the client sends direction=desc&limit=1, the server silently ignores it and returns events in ascending order, yielding the oldest event instead of the newest. The fallback logic in resolveLatestEventCursor (syncer.go:3378-3385) only triggers on HTTP 400/404 errors, so a successful 200 response with wrong data is never caught.
To fix this, either:
1. Implement direction=desc support in handleEvents (server.go:1822) and store.GetEvents (store.go:1906), then update the OpenAPI spec at openapi/relayfile-v1.openapi.yaml to document the direction parameter on /fs/events.
2. Or, if the self-hosted server intentionally doesn't support direction=desc, have it return HTTP 400 when it receives an unknown direction parameter, so the client fallback triggers correctly.
Approach 1 is preferred since it fixes the performance issue for self-hosted deployments as well.
Was this helpful? React with 👍 or 👎 to provide feedback.
There was a problem hiding this comment.
Good catch — fixed in 4af7383. The self-hosted Go server's handleEvents now dispatches on direction (asc/desc/400) and Store.GetEventsTail implements the desc semantics. Server test asserts direction=desc&limit=1 returns the newest event id (not the oldest), and direction=sideways returns 400. OpenAPI updated to document the new param + the 400 response. Picked option (1) from your prompt.
…sing test fakes Addresses CI failures and the Devin review finding on PR #196. CI compile failures: Three RemoteClient fakes in internal/mountfuse (fakeRemoteClient, fakeLazyRemoteClient via embedding, layoutRemoteClient) did not pick up the new LatestEventID method. `go build ./...` does not compile _test.go files, which is why the worktree build was clean while CI failed. `go vet ./...` catches it; added LatestEventID stubs to all three fakes. Silent-ignore bug (Devin review): HTTPClient.LatestEventID sends `direction=desc&limit=1` to /fs/events. The self-hosted Go server's handleEvents ignored the `direction` query string entirely — it returned HTTP 200 with limit=1 events in ASCENDING order, i.e. the OLDEST event. The fallback in resolveLatestEventCursor only triggers on 400/404, so self-hosted daemons would have silently seeded EventsCursor to the oldest event and replayed ALL events on every cycle — the exact O(N) behaviour this PR was meant to eliminate. Implement direction end-to-end: - Store.GetEventsTail returns the last `limit` events for the workspace (provider-filtered, newest-first). Mirrors the cloud's /fs/events?direction=desc semantics. - handleEvents dispatches on `direction`: "" / "asc" route through the existing GetEvents; "desc" routes through GetEventsTail; anything else returns 400 bad_request so a misspelled value surfaces loudly instead of being silently ignored (and so that the daemon's BadRequest fallback triggers correctly on older self-hosts without this change). - OpenAPI: document the `direction` parameter on /fs/events with enum [asc, desc] default asc, plus 400 in the responses. - Server tests: assert direction=desc&limit=1 returns the newest event id (not the oldest) and direction=sideways returns 400. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/relayfile/store.go`:
- Around line 1964-2004: GetEventsTail currently emits NextCursor but lacks a
cursor parameter so descending pagination repeats the newest page; update the
GetEventsTail function signature to accept a cursor (e.g., cursor *string) and
use that cursor to compute the start index into the events slice (find the index
of the event with EventID == *cursor after applying normalizeProvider/event
filtering), then page backward from that index by limit, build the chunk in
descending order as currently done, and compute NextCursor from the event
immediately before the returned range (or nil at start); ensure EventFeed and
any call-sites/handler that call GetEventsTail are updated to pass and consume
the cursor and that eventProvider/normalizeProvider filtering is applied before
locating the cursor.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 8511d40e-0f19-43d0-86cd-477508a1bad7
📒 Files selected for processing (6)
internal/httpapi/server.gointernal/httpapi/server_test.gointernal/mountfuse/fuse_test.gointernal/mountfuse/layout_test.gointernal/relayfile/store.goopenapi/relayfile-v1.openapi.yaml
CodeRabbit caught that the original GetEventsTail emitted a NextCursor hint but the signature took no cursor input, so any client that followed NextCursor on /fs/events?direction=desc would loop on the newest page forever. The daemon's primary use case (limit=1 for cursor seeding) was unaffected, but any other paginating caller was broken. Take the cursor as an exclusive upper bound: only events older than the cursored event are returned. handleEvents now forwards the ?cursor= query string. Server test asserts that a follow-up call using page1's NextCursor returns a strictly older event id than page1's. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tus (#199) The desc&limit=1 assertion in TestEventsAndSyncStatus (introduced in #196) computed `newest = ascFeed.Events[len(...)-1]` and asserted the desc call returned the same id. That races with any background event emission between the asc and desc calls — observed in CI failing with: desc&limit=1 must return newest event ("evt_3"), got "evt_4" Switch to a race-tolerant assertion: extract the numeric ordinal from the "evt_N" id and require desc_id >= asc_tail_id. The bug the assertion was designed to catch — server silently ignoring direction and returning the OLDEST event — still fails loudly (desc_id would be 1, well below the asc tail). Verified by running TestEventsAndSyncStatus 20× consecutively, all green. Co-authored-by: Proactive Runtime Bot <agent@agent-relay.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
`resolveLatestEventCursor` walks `/fs/events` page by page (limit=1000) just to find the workspace's tail event id. At ~437 ms per page, the 20 s `cursorTimeout` (default) buys ~50 pages = ~50 k events; workspaces past that threshold time out on every cycle. The post-bootstrap fast path therefore never engages on busy workspaces and the daemon stays stuck in degraded periodic-full-reconcile mode.
Live evidence on `rw_fc7b534b`:
Change
Test plan
Related
🤖 Generated with Claude Code