fix(read-path): reach complete threads, dense-second timelines, and all people in the GUI#1418
Conversation
Co-authored-by: Tyler Longwell <tlongwell@block.xyz> Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
Co-authored-by: Tyler Longwell <tlongwell@block.xyz> Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
Threads rendered silently incomplete because the desktop assembled them
from the channel cache instead of fetching the reply subtree. This wires
a first-class server-side thread read (`get_thread_replies` over
`thread_metadata`) through the bridge and a new Tauri command, and fixes
a correctness hole in its pagination.
The pagination bug: `get_thread_replies` keyset-ordered on
`event_created_at` alone, with `WHERE event_created_at > $cursor` and no
tiebreak. Thread replies routinely share a `created_at` second (bursty
threads), so once a page ended mid-second the cursor advanced past the
whole second and every remaining tied reply was silently dropped —
reproduced live on a seeded 40-reply thread spanning two seconds
(24 + 16): paging at limit 10 lost 14 of the 24 tied replies. That is
the exact "missed messages" class this work targets.
Fix (Tier-1, no schema change): composite `(event_created_at, event_id)`
keyset. The query orders and compares on the pair
(`WHERE (event_created_at, event_id) > ($ts, $id)
ORDER BY event_created_at ASC, event_id ASC`), and the cursor carries the
last reply's event id as the tiebreak end to end:
- buzz-db: composite cursor decode (8-byte BE seconds + raw event id) and
row-comparison keyset; a bare 8-byte cursor still works for back-compat.
- buzz-relay bridge: `extract_thread_cursor` reads `thread_cursor` +
`thread_cursor_id` and encodes the composite bytes.
- desktop: `ThreadCursor { created_at, event_id }` request/response type;
the `get_thread_replies` command derives the next cursor from the last
event's `(created_at, id)` transparently — no server-issued token.
Verified end to end against a live relay: paging the seeded thread at
limit 10 returns a set byte-identical to the full-thread oracle (40/40,
no gaps, no duplicates). New DB regression test pins the same-second-tie
case; +5 bridge unit tests cover the composite cursor encoding.
Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
…the GUI Second half of the Tier-1 read-path work (the backend thread fetch + composite keyset landed in the prior commit). This wires the frontend to those server-side reads and adds a channel-timeline escape hatch, closing the two remaining "missed messages" gaps in the desktop GUI without a schema change or a hot-path rewrite. Threads (descendant gap): the thread panel derives its replies from the channel cache, and `useLoadMissingAncestors` only backfills *ancestors* (walking `e`-tags upward). Replies that fell outside the channel cold-load window were therefore never fetched — deep or old threads rendered silently incomplete. `useThreadReplies` closes this using the same cache seam: on thread open it pages `get_thread_replies` to the floor over the gap-free `(created_at, event_id)` keyset and merges each event into the channel cache (`mergeMessages`, dedupe by id). Idempotent per (channel, root); retries on error rather than caching a partial subtree. All downstream grouping/ordering/unread derivation is unchanged — the thread simply becomes complete. Channel timeline (dense-second wall): scrollback pages history over WS `REQ` with a bare `until` (`created_at`) cursor. Ordinary same-second overlap is duplicate-safe (relay `until` is inclusive), so the timeline does not drop messages in the normal case. The one genuinely unreachable case is a single `created_at` second holding more messages than one WS page: `until` keeps returning that second's newest slice, the boundary never advances, and everything behind it is stuck. `get_channel_messages_before` gives the pager a bridge composite-keyset fallback that advances within a tied second via `id > before_id` under the relay's `created_at DESC, id ASC` order. It activates *only* on a full-page same-second stall (the proven wall), seeds from the max event id already known at the boundary second (not `baseline[0].id`, which would re-request held rows), and drains that entire boundary second before honoring the per-pass row-floor/batch budget — yielding mid-second would strand the tail behind the same stall with no guaranteed sentinel re-arm. Older history then resumes on the normal budget. The WS live-tail/optimistic/cold-load hot path is untouched. Verification: new Playwright parity test seeds a 450-message dense second (>2 WS pages) behind the cold-load window and proves the keyset fallback fires and every row is reachable. It caught a real tail-stranding bug in the first cut (drain honored the batch budget mid-second, stranding 50 of 450 permanently); the boundary-second-drain invariant is the fix. Green: tsc, biome, file-sizes, px-text, cargo check, 1399 unit tests, and the new spec. `tauri.ts`'s existing file-size override is bumped for the two load-bearing bindings (split is known debt, a separate PR). Co-authored-by: Tyler Longwell <tlongwell@block.xyz> Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
Lane 2 (Wren): page user search so add-users / tag-users interactions reach every match instead of a hacky first-page limit. Independent of Lane 1's message/thread read-path (disjoint files); merged to land the read-path work as one PR. Co-authored-by: Tyler Longwell <tlongwell@block.xyz> Signed-off-by: Tyler Longwell <tlongwell@block.xyz> * feat/read-path-people-model: test(desktop): cover people search pagination fix(desktop): page user search reachability
Keep the read-path branch mergeable so CI materializes the merge ref. Relay-only change, disjoint from the desktop read-path work. Co-authored-by: Tyler Longwell <tlongwell@block.xyz> Signed-off-by: Tyler Longwell <tlongwell@block.xyz> * origin/main: fix(relay): enable Redis TLS for rediss:// (ElastiCache) (#1417)
…eads The channel-timeline dense-second fallback sent its composite keyset tiebreak under the filter key `n`, but the relay bridge's `extract_before_id` reads it from `before_id` (crates/buzz-relay/src/api/ bridge.rs). So on the real `/query` path `query.before_id` was never set: the fallback silently degraded to a bare inclusive `until`, re-returned the same full dense-second page, re-emitted the same cursor, and `drainOlderViaKeyset` could spin inside the boundary-second drain loop instead of breaking the wall — the exact case the escape hatch exists to fix. Caught by Wren in PR review. The Playwright spec passed anyway because its mock-store path reimplements the keyset in JS (never exercising the relay filter field), and config mode sent the same wrong key. Fix both call sites to send `before_id`, and add a Rust unit test that pins the field name so the client/relay contract can't drift silently again — extracting `build_channel_messages_before_filter` to make the emitted filter testable, mirroring `build_search_messages_filter`. Verified: 863/863 desktop cargo tests (incl. the 2 new filter tests), tsc, biome, file-sizes, px-text, 1399 node unit tests, dense-second Playwright. Co-authored-by: Tyler Longwell <tlongwell@block.xyz> Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
The empty-query people directory listing was terminal: it could never page past its first page, so on busy relays not all people were reachable via the DM picker / add-user / tag flows (Tyler's "all people" goal). Two halves, both fixed: - Relay: the non-search general query path (POST /query, no `search` field) built an EventQuery but never read the raw `page` extension, so no SQL OFFSET was set. Only `search`-bearing filters route to handle_bridge_search (which has its own page/per_page). Empty kind:0 listings fell through here. New `extract_page_offset` wires page -> (page-1)*limit into query.offset; query_events already applies OFFSET with deterministic `created_at DESC, id ASC` ordering, so offset paging is stable. Only fires when `page` > 1 is explicitly present, so keyset/general queries are untouched. - Desktop: search_users' empty-query branch called list_user_search_results, which always returned next_cursor: None, so the frontend could never request page 2. Now emits Some(page+1) when the relay returned a full page (raw events.len() >= max), mirroring the existing NIP-50 search branch. Raw length is the correct fullness signal since list_user_search_results dedupes/truncates. Config-mode e2eBridge delegates empty-query paging to the real relay /query, so this fix makes that path correct too; the mock-store path already paged locally. Added Rust unit tests pinning the page->offset contract (absent/page-1 -> None, page N -> (N-1)*limit, missing limit -> None) — the durable server-side guard the JS mock structurally can't verify. Co-authored-by: Tyler Longwell <tlongwell@block.xyz> Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
…nings Rust Lint CI runs clippy with -D warnings, so the pre-existing `.as_ref().map(|c| c.as_slice())` on the Lane 1 thread-cursor path (bridge.rs) was a red gate regardless of provenance. `as_deref()` is the exact equivalent (Option<Vec<u8>> -> Option<&[u8]>). Co-authored-by: Tyler Longwell <tlongwell@block.xyz> Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
The headline Lane-1 thread-backfill fix (get_thread_replies / useThreadReplies) was dead against a real relay: the desktop command built a KINDLESS /query filter, and the relay bridge p-gate (p_gated_filters_authorized) rejects any kindless filter with 403 before routing — a kindless filter 'could match' a p-gated kind, so it demands a #p tag we don't send. The relay routes the thread query purely off #e + depth_limit (kind does not filter it), but the gate still runs first. The e2e mock didn't model p-gating, so CI stayed green on a 403. Fixes: - messages.rs: give get_thread_replies' filter the non-p-gated TIMELINE_KINDS (extracted a shared const + build_thread_replies_filter helper mirroring the channel sibling, so the two filters can't drift). Guard unit tests assert the filter carries kinds and every kind is NOT in P_GATED_KINDS (the real invariant), plus composite-cursor paging. - e2eBridge.ts: model the relay p-gate for /query and WS REQ so a kindless #e read is now rejected in the mock too — closes the CI blind spot. Also fix the mock subtree walk to traverse nested replies transitively. - useThreadReplies.ts: clear the fetched-root mark on cancellation unless the paging loop completed, so an interrupted first fetch retries on reopen instead of silently stranding the feature. Narrow effect deps to primitive channel id/type to avoid object-identity churn. - Correct the false 'includes root at depth 0' contract at all three doc sites (messages.rs / tauri.ts / types.ts): the query keys on root_event_id, which root rows lack, so results are replies-only (depth >= 1). - thread.rs: fix the nested root-stub INSERT in insert_thread_metadata (column list omitted community_id while binding it, scrambling every placeholder) to match the correct production sibling in event.rs. Add depth-2 reachability and a direct stub-branch guard (Postgres-gated). - check-file-sizes.mjs: record load-bearing overrides for messages.rs (fix + guard tests crossed 1000 from 995) and tauri.ts (+3 doc-only lines). Verified: tauri 864/0, buzz-db 124/0 (incl. Postgres, serialized, with a red-check confirming the stub test fails on the pre-fix SQL), and a live clean-room relay at this tip: kindless #e -> 403, shipped kinded filter -> 200 with 250/250 dense same-second depth-1 replies + a depth-2 nested reply reached, root excluded. Co-authored-by: Tyler Longwell <tlongwell@block.xyz> Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
|
Non-blocking adversarial note: the new In I don't see a current leak path from this PR because the sensitive p-gated / author-only kinds I checked don't appear to be channel-scoped thread rows today. But the invariant feels easy to regress later: a future sensitive channel-scoped kind with thread metadata could be returned here even though the request advertised only safe Suggested hardening: mirror the general path's result-level filtering/auth checks in the thread route before serializing each reply. That keeps the auth gate and emitted rows coupled, rather than relying on today's set of thread-metadata-capable kinds. |
Brings the branch current with main (~20 commits, incl. relay perf #1453/#1454 and mention ranking #1431). One conflict resolved in useMentions.ts: kept main's rankMentionCandidates pipeline, re-applied this branch's suggestion slice change (Math.max(MENTION_SUGGESTION_LIMIT, mentionCandidates.length)). Verified post-merge: tsc --noEmit clean, biome check clean, desktop unit tests 1475/1475, cargo check --workspace clean. Co-authored-by: Tyler Longwell <tlongwell@block.xyz> Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
…into HEAD * origin/paul/nip-am-agent-turn-metrics: fix(profile): consolidate agent profile runtime metadata (#1451) fix(desktop): simplify workspace rail badges (#1462) perf(desktop): instant channel switching — non-blocking first paint, persisted snapshots (#1452) perf(relay): bounded-concurrency multi-filter query execution (S2) (#1457) fix(desktop): classify timeline prepends so history loads don't bump unread (#1416) fix(desktop): quiet gate for workspace switches instead of boot splash (#1449) fix(read-path): reach complete threads, dense-second timelines, and all people in the GUI (#1418) E1+E3: reduce relay ingest/fan-out DB round trips; ack p99 −7–16%, fd p99 −6–28%, p999 tails −29–53% vs PR #1453 tip (#1454) perf(relay): defer post-commit dispatch and avoid verify clone (#1453) fix(relay): include git hook tools in runtime image (#1326) feat(chart): per-pod emptyDir git scratch when persistence disabled (multi-replica HA) (#1450) fix(relay): remove media bearer-token auth (#1444) fix(desktop): stop search shortcut from hijacking the sidebar (#1447) Co-authored-by: Will Pfleger <pfleger.will@gmail.com> Signed-off-by: Will Pfleger <pfleger.will@gmail.com>
What & why
This PR fundamentally fixes how the desktop GUI reaches channel messages, thread replies, and people — removing the "hacky limits" that silently miss messages/people. This is the Tier-1 read-path work: reachability via keyset pagination over the already-merged relay substrate (PR #1321), no schema change, no hot-path rewrite. Two lanes, one PR.
Lane 1 — messages + threads (Eva)
Threads (descendant gap). The thread panel derived replies from the channel cache, and
useLoadMissingAncestorsonly backfilled ancestors. Replies outside the cold-load window were never fetched, so deep/old threads rendered silently incomplete.useThreadRepliescloses it via the same cache seam: pageget_thread_repliesto the floor over the gap-free(created_at, event_id)keyset and merge into the channel cache (dedupe by id, idempotent per root, retry on error). Downstream grouping/ordering/unread unchanged.get_thread_replieskeyset-ordered onevent_created_atalone with no tiebreak, so a page ending mid-second dropped every remaining tied reply. Composite(event_created_at, event_id)keyset closes it. The query keys onroot_event_id(which root rows lack), so results are replies-only (depth >= 1) — root excluded by construction.Channel timeline (dense-second wall). Ordinary same-second overlap is duplicate-safe (relay
untilis inclusive), so the timeline doesn't drop messages normally. The one genuinely unreachable case: a singlecreated_atsecond with more messages than one WS page —untilnever advances, everything behind it is stuck.get_channel_messages_beforegives the pager a composite-keyset fallback that:id > before_idundercreated_at DESC, id ASC),Lane 2 — people search / mentions / members (Wren)
Pages user search so add-users / tag-users / mention interactions reach every match instead of a hacky first-page limit (
user_search.rs,useMentions.ts, members/mention/DM UIs,profile/hooks.ts). Empty-query directory browse is now paged end-to-end (relaypage→offset on the non-search/querypath; desktop emits a next cursor on full raw pages). E2E coverage inchannels.spec.ts+mentions.spec.ts.The blocker a live relay surfaced (and its fix)
Static review passed the thread path, but a dynamic redteam against a real relay caught a production-fatal blocker the mock hid:
get_thread_repliesbuilt a kindless/queryfilter, and the relay bridge p-gate (p_gated_filters_authorized) 403s any kindless filter before routing — a kindless filter "could match" a p-gated kind, so the gate demands a#p=selftag the thread read doesn't send. The relay routes the thread query purely off#e+depth_limit(kind doesn't filter it), but the gate runs first. The e2e mock never modeled p-gating, so CI stayed green on a 403 — every thread-backfill call would have failed in prod.Fixes:
messages.rs— give the thread filter the non-p-gatedTIMELINE_KINDS(via a shared const +build_thread_replies_filterhelper mirroring the channel-messages sibling, so the two p-gate filters can't drift). Guard unit tests assert the filter carries kinds and that every kind is absent fromP_GATED_KINDS(the real invariant), plus composite-cursor paging.e2eBridge.ts— model the relay p-gate for/queryand WS REQ so a kindless#eread is now rejected in the mock too (faithful port ofp_gated_filters_authorized), closing the CI blind spot. Also fix the mock subtree walk to traverse nested replies transitively (frontier BFS for depth > 1).useThreadReplies.ts— clear the fetched-root mark on cancellation unless the paging loop completed, so an interrupted first fetch retries on reopen instead of stranding the feature. Narrow effect deps to primitive channel id/type.messages.rs/tauri.ts/types.ts): results are replies-only.Verification
Live re-verify at the final tip (the gate that caught the blocker): built the relay at this SHA on a clean-room DB, seeded root + 250 dense same-second depth-1 replies + 1 depth-2 grandchild.
#efilter → HTTP 403 (blocker reproduced exactly),Green on the integrated branch:
tsc, biome, file-sizes, px-text,cargo check, unit suites (buzz-db 124/0 incl. Postgres with a red-check proving the stub test fails on the pre-fix SQL; desktop 864/0), the dense-second Playwright spec, and all pre-push CI hooks (desktop-test, rust-tests, mobile-test, desktop-tauri-test).Notes for review
relay_seqwork (Tier-2) — never implemented (spec-only), named as follow-on, not reopened here.messages.rs,tauri.ts) for load-bearing bindings/fixes from both lanes. Splitting the central bindings is known debt / a separate PR — no opportunistic refactor here.scroll-history.spec.ts"preserves user scroll" / "does not teleport" flake on local runners at the clean baseline too (wheel timing); pass under CI's retry policy. Not touched by this change.Final score: 9/10.