Skip to content

Use one-shot latest-event lookup to seed mount cursor#196

Merged
khaliqgant merged 3 commits into
mainfrom
codex/issue-926-events-latest
May 22, 2026
Merged

Use one-shot latest-event lookup to seed mount cursor#196
khaliqgant merged 3 commits into
mainfrom
codex/issue-926-events-latest

Conversation

@khaliqgant
Copy link
Copy Markdown
Member

Summary

`resolveLatestEventCursor` walks `/fs/events` page by page (limit=1000) just to find the workspace's tail event id. At ~437 ms per page, the 20 s `cursorTimeout` (default) buys ~50 pages = ~50 k events; workspaces past that threshold time out on every cycle. The post-bootstrap fast path therefore never engages on busy workspaces and the daemon stays stuck in degraded periodic-full-reconcile mode.

Live evidence on `rw_fc7b534b`:

  • Events feed: 132,972 events
  • Pre-fix: `eventsCursor: None` for 6+ hours despite `bootstrap: True`; each cycle logged `restart fast-path: cursor resolution failed (context deadline exceeded); falling through to full pull`.
  • Post-fix (this PR, freshly built binary): `eventsCursor: evt_132972` populated in ~milliseconds on daemon startup.

Change

  • Add `LatestEventID(ctx, workspaceID, provider)` to `RemoteClient`. Implemented by `HTTPClient` via `/fs/events?direction=desc&limit=1` (the endpoint shape that landed in AgentWorkforce/cloud#927), plus matching fakes in `fakeClient` and `bootstrapClient`.
  • `resolveLatestEventCursor` calls `LatestEventID` first. On `400 bad_request` / `404 not_found` (older self-host cloud without `direction=desc`), it transparently falls through to the existing page-walk so the change is strictly additive against any backend.

Test plan

  • `go test ./internal/mountsync/... -count=1` — full package green.
  • New tests:
    • `TestResolveLatestEventCursorPrefersLatestEventID` — asserts the one-shot is called and ListEvents is not invoked when supported.
    • `TestResolveLatestEventCursorFallsBackOnUnsupported` — asserts a 400 from LatestEventID transparently engages the legacy page-walk and returns the same cursor.
    • Existing `TestResolveLatestEventCursorIndependentTimeout` still passes — no regression.
  • Live-verified on `rw_fc7b534b` (see Summary): cursor seeded in ms after binary swap; previously stuck at `None` for hours.

Related

  • Closes AgentWorkforce/cloud#926.
  • Consumes AgentWorkforce/cloud#927 (`fix(relayfile): support latest fs event lookup` — adds `?direction=desc` to `/fs/events`, merged 21:58Z).

🤖 Generated with Claude Code

resolveLatestEventCursor walked /fs/events page-by-page (limit=1000) to
find the workspace's tail event id. At ~437ms per page, the 20s
cursorTimeout (default) buys ~50 pages = ~50k events; workspaces past
that threshold deterministically time out every cycle. The fast path
post-bootstrap therefore never engages on busy workspaces, and the
daemon stays in degraded periodic-full-reconcile mode.

Verified on rw_fc7b534b: 132,972 events in the feed, eventsCursor was
None for 6+ hours despite bootstrap=True. After this change the cursor
seeds in ~ms via /fs/events?direction=desc&limit=1 (cloud#927).

Adds LatestEventID to RemoteClient with HTTPClient, fakeClient, and
bootstrapClient implementations. Tries the one-shot first; on
BadRequest / NotFound (older self-host cloud without direction=desc)
silently falls through to the legacy page-walk, so behaviour is strictly
additive against any backend.

Closes AgentWorkforce/cloud#926.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 21, 2026

Review Change Stack

Warning

Rate limit exceeded

@khaliqgant has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 53 minutes and 30 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 8bd1d7fa-b627-4c1e-9dfc-19f8afd9b648

📥 Commits

Reviewing files that changed from the base of the PR and between 4af7383 and c3a4c5b.

📒 Files selected for processing (3)
  • internal/httpapi/server.go
  • internal/httpapi/server_test.go
  • internal/relayfile/store.go
📝 Walkthrough

Walkthrough

Adds a one-shot LatestEventID API to RemoteClient and HTTPClient (GET events?direction=desc&limit=1), server/store support for descending tail queries, updates resolveLatestEventCursor to prefer the one-shot call with 400/404 fallback to ListEvents, and extends tests and fakes.

Changes

LatestEventID API and cursor resolution

Layer / File(s) Summary
RemoteClient LatestEventID contract
internal/mountsync/syncer.go
RemoteClient gains LatestEventID to fetch the newest event id with a descending single-result query; returns "" when empty and allows 400/404 to indicate unsupported query semantics.
Server, store, and OpenAPI: descending feed support
internal/httpapi/server.go, internal/relayfile/store.go, openapi/relayfile-v1.openapi.yaml
handleEvents accepts direction (asc
HTTPClient LatestEventID and cursor resolution integration
internal/mountsync/syncer.go
HTTPClient.LatestEventID calls events endpoint with direction=desc&limit=1 and returns the first eventId or ""; resolveLatestEventCursor prefers this call and falls back to legacy ListEvents only for 400/404 errors.
Test doubles and cursor resolution tests
internal/mountsync/bootstrap_test.go, internal/mountsync/syncer_test.go, internal/mountfuse/*, internal/httpapi/server_test.go
Test doubles implement LatestEventID (configurable behavior); new tests assert prefer-LatestEventID success path and fallback when unsupported; server tests validate direction=desc&limit=1 and invalid-direction handling.

Sequence Diagram

sequenceDiagram
  participant Caller
  participant resolveLatestEventCursor
  participant HTTPClient
  participant ServerHandler
  participant Store

  Caller->>resolveLatestEventCursor: resolve cursor (workspace, provider)
  resolveLatestEventCursor->>HTTPClient: LatestEventID(ctx, workspace, provider)
  HTTPClient->>ServerHandler: GET /v1/workspaces/{id}/fs/events?direction=desc&limit=1
  ServerHandler->>Store: GetEventsTail(workspace, provider, limit=1)
  Store-->>ServerHandler: EventFeed (newest-first, maybe empty)
  alt Success (event returned)
    ServerHandler-->>HTTPClient: 200 {events: [{eventId: "latest-id"}, ...]}
    HTTPClient-->>resolveLatestEventCursor: "latest-id"
    resolveLatestEventCursor-->>Caller: "latest-id"
  else Unsupported (400/404)
    ServerHandler-->>HTTPClient: 400/404 error
    HTTPClient-->>resolveLatestEventCursor: HTTPError{Status: 400/404}
    resolveLatestEventCursor->>HTTPClient: ListEvents(ctx, ...) [paginated walk]
    HTTPClient->>ServerHandler: GET /v1/workspaces/{id}/fs/events (paginated)
    ServerHandler->>Store: GetEvents(workspace, provider, cursor, limit)
    Store-->>ServerHandler: EventFeed (asc pages)
    ServerHandler-->>HTTPClient: events pages
    HTTPClient-->>resolveLatestEventCursor: latest id from page-walk
    resolveLatestEventCursor-->>Caller: latest id
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • AgentWorkforce/relayfile#166: Modifies resolveLatestEventCursor and cursor timeout/deadline behavior; directly related to cursor-resolution changes.
  • AgentWorkforce/relayfile#92: Previously adjusted cursor-seeding/list-events probe logic; relevant to the new LatestEventID preference.

Poem

🐰 I probed the tail with one quick hop,
No more long walks where old events stop.
If servers say "I don't know that way,"
I'll backtrack kindly, continue the play.
Hoppity-sync, onward we stray.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 40.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: adding a one-shot LatestEventID lookup to efficiently seed the mount cursor instead of paging through events.
Description check ✅ Passed The description is directly related to the changeset, explaining the motivation (timeout issues on busy workspaces), the solution (one-shot LatestEventID via direction=desc), implementation details, testing, and live verification.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch codex/issue-926-events-latest

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 21, 2026

Relayfile Eval Review

Run: .relayfile/evals/runs/2026-05-22T06-19-45-802Z-HEAD-provider
Mode: provider
Git SHA: 074ae50

Passed: 4 | Needs human: 0 | Reviewable: 0 | Missing output: 0 | Failed: 0 | Skipped: 0

Human Review Cases

No reviewable human-review cases captured Relayfile output.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
internal/mountsync/syncer_test.go (1)

814-846: ⚡ Quick win

Add explicit 404 unsupported fallback coverage.

This test currently verifies only the 400 path; the contract also relies on 404 triggering the same legacy fallback. Please add a sibling test for a 404 LatestEventID error to lock that behavior in.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/mountsync/syncer_test.go` around lines 814 - 846, Add a sibling unit
test that mirrors TestResolveLatestEventCursorFallsBackOnUnsupported but forces
the fakeClient to return a 404 from LatestEventID (not 400) and asserts the same
fallback behavior: call NewSyncer with the fakeClient, call
syncer.resolveLatestEventCursor, expect the returned cursor to be "evt_3",
assert client.latestEventIDCalls == 1, and assert client.listEventsCalls > 0;
name it e.g. TestResolveLatestEventCursorFallsBackOnNotFound and reuse the same
fakeClient/events setup but configure the fakeClient to produce a 404
LatestEventID response so the legacy page-walk is exercised.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@internal/mountsync/syncer_test.go`:
- Around line 814-846: Add a sibling unit test that mirrors
TestResolveLatestEventCursorFallsBackOnUnsupported but forces the fakeClient to
return a 404 from LatestEventID (not 400) and asserts the same fallback
behavior: call NewSyncer with the fakeClient, call
syncer.resolveLatestEventCursor, expect the returned cursor to be "evt_3",
assert client.latestEventIDCalls == 1, and assert client.listEventsCalls > 0;
name it e.g. TestResolveLatestEventCursorFallsBackOnNotFound and reuse the same
fakeClient/events setup but configure the fakeClient to produce a 404
LatestEventID response so the legacy page-walk is exercised.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: cebf82d9-04f3-4df2-ba1f-f4b4d86af4bb

📥 Commits

Reviewing files that changed from the base of the PR and between a264e22 and f4e401d.

📒 Files selected for processing (3)
  • internal/mountsync/bootstrap_test.go
  • internal/mountsync/syncer.go
  • internal/mountsync/syncer_test.go

Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 potential issue.

View 4 additional findings in Devin Review.

Open in Devin Review

Comment on lines +3376 to +3385
if latest, err := s.client.LatestEventID(cctx, s.workspace, s.eventProvider); err == nil {
return latest, nil
} else {
var httpErr *HTTPError
if !errors.As(err, &httpErr) || (httpErr.StatusCode != http.StatusBadRequest && httpErr.StatusCode != http.StatusNotFound) {
return "", err
}
// Fall through: older self-host cloud may not yet support
// direction=desc; degrade to the legacy page-walk.
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Self-hosted server silently ignores direction=desc, causing LatestEventID to return the oldest event instead of the latest

The HTTPClient.LatestEventID sends direction=desc&limit=1 to /fs/events (internal/mountsync/syncer.go:322), but the self-hosted server's handleEvents (internal/httpapi/server.go:1822-1830) does not read the direction query parameter — it only parses limit, provider, and cursor. The server silently ignores direction=desc and returns events in ascending order with limit=1, yielding the oldest event rather than the newest. Because no error is returned (HTTP 200 OK), the fallback to the legacy page-walk at syncer.go:3378-3385 never triggers (it only activates on HTTP 400 or 404). The syncer then stores the oldest event ID as EventsCursor (syncer.go:2228 or syncer.go:2267), causing every subsequent incremental sync cycle to replay all events from the beginning — the exact O(N) behaviour this optimisation was meant to eliminate.

Server handler that silently ignores direction

internal/httpapi/server.go:1822-1830:

func (s *Server) handleEvents(w http.ResponseWriter, r *http.Request, workspaceID, correlationID string) {
	limit := parseBoundedInt(r.URL.Query().Get("limit"), 200, 1, 1000)
	feed, err := s.store.GetEvents(workspaceID, r.URL.Query().Get("provider"), r.URL.Query().Get("cursor"), limit)
	// direction is never read ^
}
Prompt for agents
The self-hosted server in internal/httpapi/server.go handleEvents (line 1822) does not read the direction query parameter. When the client sends direction=desc&limit=1, the server silently ignores it and returns events in ascending order, yielding the oldest event instead of the newest. The fallback logic in resolveLatestEventCursor (syncer.go:3378-3385) only triggers on HTTP 400/404 errors, so a successful 200 response with wrong data is never caught.

To fix this, either:
1. Implement direction=desc support in handleEvents (server.go:1822) and store.GetEvents (store.go:1906), then update the OpenAPI spec at openapi/relayfile-v1.openapi.yaml to document the direction parameter on /fs/events.
2. Or, if the self-hosted server intentionally doesn't support direction=desc, have it return HTTP 400 when it receives an unknown direction parameter, so the client fallback triggers correctly.

Approach 1 is preferred since it fixes the performance issue for self-hosted deployments as well.
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — fixed in 4af7383. The self-hosted Go server's handleEvents now dispatches on direction (asc/desc/400) and Store.GetEventsTail implements the desc semantics. Server test asserts direction=desc&limit=1 returns the newest event id (not the oldest), and direction=sideways returns 400. OpenAPI updated to document the new param + the 400 response. Picked option (1) from your prompt.

…sing test fakes

Addresses CI failures and the Devin review finding on PR #196.

CI compile failures:
  Three RemoteClient fakes in internal/mountfuse (fakeRemoteClient,
  fakeLazyRemoteClient via embedding, layoutRemoteClient) did not pick
  up the new LatestEventID method. `go build ./...` does not compile
  _test.go files, which is why the worktree build was clean while CI
  failed. `go vet ./...` catches it; added LatestEventID stubs to all
  three fakes.

Silent-ignore bug (Devin review):
  HTTPClient.LatestEventID sends `direction=desc&limit=1` to /fs/events.
  The self-hosted Go server's handleEvents ignored the `direction` query
  string entirely — it returned HTTP 200 with limit=1 events in
  ASCENDING order, i.e. the OLDEST event. The fallback in
  resolveLatestEventCursor only triggers on 400/404, so self-hosted
  daemons would have silently seeded EventsCursor to the oldest event
  and replayed ALL events on every cycle — the exact O(N) behaviour
  this PR was meant to eliminate.

  Implement direction end-to-end:
    - Store.GetEventsTail returns the last `limit` events for the
      workspace (provider-filtered, newest-first). Mirrors the cloud's
      /fs/events?direction=desc semantics.
    - handleEvents dispatches on `direction`: "" / "asc" route through
      the existing GetEvents; "desc" routes through GetEventsTail;
      anything else returns 400 bad_request so a misspelled value
      surfaces loudly instead of being silently ignored (and so that
      the daemon's BadRequest fallback triggers correctly on older
      self-hosts without this change).
    - OpenAPI: document the `direction` parameter on /fs/events with
      enum [asc, desc] default asc, plus 400 in the responses.
    - Server tests: assert direction=desc&limit=1 returns the newest
      event id (not the oldest) and direction=sideways returns 400.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/relayfile/store.go`:
- Around line 1964-2004: GetEventsTail currently emits NextCursor but lacks a
cursor parameter so descending pagination repeats the newest page; update the
GetEventsTail function signature to accept a cursor (e.g., cursor *string) and
use that cursor to compute the start index into the events slice (find the index
of the event with EventID == *cursor after applying normalizeProvider/event
filtering), then page backward from that index by limit, build the chunk in
descending order as currently done, and compute NextCursor from the event
immediately before the returned range (or nil at start); ensure EventFeed and
any call-sites/handler that call GetEventsTail are updated to pass and consume
the cursor and that eventProvider/normalizeProvider filtering is applied before
locating the cursor.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 8511d40e-0f19-43d0-86cd-477508a1bad7

📥 Commits

Reviewing files that changed from the base of the PR and between f4e401d and 4af7383.

📒 Files selected for processing (6)
  • internal/httpapi/server.go
  • internal/httpapi/server_test.go
  • internal/mountfuse/fuse_test.go
  • internal/mountfuse/layout_test.go
  • internal/relayfile/store.go
  • openapi/relayfile-v1.openapi.yaml

Comment thread internal/relayfile/store.go Outdated
CodeRabbit caught that the original GetEventsTail emitted a NextCursor
hint but the signature took no cursor input, so any client that
followed NextCursor on /fs/events?direction=desc would loop on the
newest page forever. The daemon's primary use case (limit=1 for cursor
seeding) was unaffected, but any other paginating caller was broken.

Take the cursor as an exclusive upper bound: only events older than the
cursored event are returned. handleEvents now forwards the
?cursor= query string. Server test asserts that a follow-up call using
page1's NextCursor returns a strictly older event id than page1's.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@khaliqgant khaliqgant merged commit d46a992 into main May 22, 2026
9 checks passed
@khaliqgant khaliqgant deleted the codex/issue-926-events-latest branch May 22, 2026 06:22
khaliqgant added a commit that referenced this pull request May 22, 2026
…tus (#199)

The desc&limit=1 assertion in TestEventsAndSyncStatus (introduced in
#196) computed `newest = ascFeed.Events[len(...)-1]` and asserted the
desc call returned the same id. That races with any background event
emission between the asc and desc calls — observed in CI failing with:

  desc&limit=1 must return newest event ("evt_3"), got "evt_4"

Switch to a race-tolerant assertion: extract the numeric ordinal from
the "evt_N" id and require desc_id >= asc_tail_id. The bug the
assertion was designed to catch — server silently ignoring direction
and returning the OLDEST event — still fails loudly (desc_id would be
1, well below the asc tail).

Verified by running TestEventsAndSyncStatus 20× consecutively, all
green.

Co-authored-by: Proactive Runtime Bot <agent@agent-relay.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant