fix: channel member search via NIP-50 + Typesense indexer fix for kind:0 #569
Merged
# Problem
The 'Add members' search in channels silently hid users. Two compounding
bugs, plus a third discovered while testing:
1. **Backend cap of 1000 events.** The Tauri `search_users` command
fetched up to 2000 kind:0 events via POST /query and grepped them
client-side, but the HTTP bridge clamps queries at 1000 events. On
relays with more than 1000 live profiles, the rest were invisible.
2. **Frontend cap of 8 results.** `ChannelMemberInviteCard` requested
only 8 results from `useUserSearchQuery`, with no relevance ranking.
3. **Typesense tokenizer doesn't extract names from raw JSON content.**
This one cost the most time. Kind:0 events store their structured
data as a JSON blob (`{"display_name":"alice",...}`), and Typesense's
default tokenizer glues the leading `"` onto the next word, producing
a token like `"alice` that doesn't match a clean `q=alice`. Empirical
result before this fix: NIP-50 search for "Bob" finds zero users
named Bob, even though their docs are indexed and retrievable.
# Fix
Three changes, smallest surface that actually solves the bug:
**1. Desktop `search_users` uses NIP-50** (`commands/profile.rs`).
Sends `{kinds:[0], search:q, limit:50}` instead of fetching every
kind:0 and grepping. The bridge already routes `search` to Typesense
(`api/bridge.rs::handle_bridge_search`) — we just weren't using it.
Off the 1000-event cap. Scales to any relay size.
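A minimal sketch of the filter shape described above, using only the standard library. The helper names `json_escape` and `build_user_search_filter` are hypothetical and are not taken from `commands/profile.rs`; the real command most likely builds the filter with a JSON library.

```rust
/// Escape a string so it can be embedded in a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build a NIP-50 style filter: kind:0 only, free-text `search`, bounded `limit`.
/// (Hypothetical helper for illustration.)
fn build_user_search_filter(query: &str, limit: u32) -> String {
    format!(
        "{{\"kinds\":[0],\"search\":\"{}\",\"limit\":{}}}",
        json_escape(query),
        limit
    )
}
```

Because the relay does the matching, the request size stays constant no matter how many profiles the relay holds; only the `limit` bounds the response.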
**2. New client-side ranker** (`nostr_convert.rs::rank_user_search_results`).
Re-ranks the ≤50 Typesense hits by exact > prefix > substring on
display_name (or name) > nip05 > pubkey-hex. Drops hits that only
matched on `about`/`website` — those are noise in autocomplete.
Dedupes by pubkey defensively. 13 new unit tests cover scoring,
dedupe, edge cases (empty query, non-kind:0 hits, name-only profiles).
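The ranking rules above could look roughly like the sketch below. It is an illustration under assumptions, not the code in `rank_user_search_results`: the `UserHit` struct, the numeric tiers, and the choice to fall back to `name` only when `display_name` is absent are all hypothetical.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, PartialEq)]
struct UserHit {
    pubkey: String,
    display_name: Option<String>,
    name: Option<String>,
    nip05: Option<String>,
}

/// Strength of one field against a lowercased query: exact > prefix > substring.
fn match_strength(field: &str, q: &str) -> Option<u32> {
    let f = field.to_lowercase();
    if f == q {
        Some(3)
    } else if f.starts_with(q) {
        Some(2)
    } else if f.contains(q) {
        Some(1)
    } else {
        None
    }
}

/// Best score for a hit, or None when no name-like field matched.
fn score(hit: &UserHit, q: &str) -> Option<u32> {
    // Assumption: display_name wins, name is only a fallback.
    let label = hit.display_name.as_deref().or(hit.name.as_deref());
    // Assumed field tiers: name fields (30) > nip05 (20) > pubkey hex (10).
    let candidates = [
        (30u32, label),
        (20u32, hit.nip05.as_deref()),
        (10u32, Some(hit.pubkey.as_str())),
    ];
    let mut best: Option<u32> = None;
    for (tier, field) in candidates {
        if let Some(s) = field.and_then(|f| match_strength(f, q)) {
            best = best.max(Some(tier + s));
        }
    }
    best
}

/// Dedupe by pubkey, drop hits with no name-like match, sort best first.
fn rank_hits(hits: Vec<UserHit>, query: &str) -> Vec<UserHit> {
    let q = query.trim().to_lowercase();
    if q.is_empty() {
        return Vec::new();
    }
    let mut seen = HashSet::new();
    let mut scored: Vec<(u32, UserHit)> = hits
        .into_iter()
        .filter(|h| seen.insert(h.pubkey.clone()))
        .filter_map(|h| score(&h, &q).map(|s| (s, h)))
        .collect();
    // Stable sort: equal scores keep the order the server returned.
    scored.sort_by(|a, b| b.0.cmp(&a.0));
    scored.into_iter().map(|(_, h)| h).collect()
}
```

A hit that Typesense matched only on `about` or `website` gets no score from any of the three fields, so it falls out of the result without needing a separate filter.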
**3. Indexer flattens kind:0 content for tokenization**
(`sprout-search/src/index.rs::flatten_kind0_for_indexing`). For
kind:0 only, parse the JSON and append `display_name`/`name`/`nip05`
values to the indexed `content` string with whitespace separators.
The Typesense `content` field is write-only (the bridge fetches
canonical events from Postgres by id after Typesense returns hits —
bridge.rs:471), so appending derived tokens is safe. `about` and
`website` are deliberately excluded to avoid name-prefix false
positives. 10 new unit tests including malformed JSON tolerance,
non-kind:0 untouched, and ordering of appended tokens.
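The flattening step can be sketched as follows. This is a simplified illustration, not the body of `flatten_kind0_for_indexing`: it assumes the JSON has already been parsed into a hypothetical `ProfileFields` struct (the real code presumably uses a JSON library), and the append order is an assumption.

```rust
/// Name-like profile fields, assumed to be already parsed out of the kind:0 JSON.
/// `None` for the whole struct models content that failed to parse.
struct ProfileFields<'a> {
    display_name: Option<&'a str>,
    name: Option<&'a str>,
    nip05: Option<&'a str>,
}

/// Return the string to index: for kind:0, the raw content followed by the
/// name-like values as whitespace-separated tokens; otherwise the raw content.
fn flatten_for_indexing(kind: u32, content: &str, fields: Option<&ProfileFields<'_>>) -> String {
    // Non-kind:0 events and unparseable profiles are indexed unchanged.
    let fields = match (kind, fields) {
        (0, Some(f)) => f,
        _ => return content.to_string(),
    };
    let mut out = content.to_string();
    // `about` and `website` are intentionally not appended.
    for value in [fields.display_name, fields.name, fields.nip05].iter().flatten() {
        let value = value.trim();
        if !value.is_empty() {
            out.push(' ');
            out.push_str(value);
        }
    }
    out
}
```

The appended values stand alone between spaces, so any tokenizer that splits on whitespace sees `alice` as its own token regardless of how it treats the quotes inside the JSON blob.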
Frontend limit bumped 8 → 25 so the client-side ranker has room to
refine the server's results before truncation.
# Backfill
New / updated kind:0 events index correctly automatically. Existing
docs need one-time reindex:
    just reindex-kind0
The new `sprout-reindex-kind0` binary streams kind:0s from Postgres
in 500-row batches through `SearchService::index_batch` and exits.
Idempotent (Typesense upsert).
# Tests / verification
- `cargo test -p sprout-search --lib` → 21 pass (10 new)
- `cargo test --lib nostr_convert` (desktop) → 37 pass (13 new)
- `cargo test -p sprout-relay --lib` → 147 pass (no regressions)
- `cargo fmt --all --check` → clean
- `cargo clippy -p sprout-search -p sprout-relay --all-targets -- -D warnings` → clean
- `pnpm typecheck` + `pnpm check` → clean
- End-to-end: built relay in screen, seeded 8 kind:0 profiles with
varied name patterns, ran `just reindex-kind0`, then queried via
`sprout get-users --name` for: alice, Bob, Lev, Banana, Zed, mal,
testbot, charlie. All returned the right user(s). Before the indexer
fix the same queries returned 0 or only false-positive bio matches.
# Replaces
Closes #567 — same UX bug; this PR's solution is more correct (server
side instead of larger-fetch client side) and now also handles the
single-word-name case that #567 didn't notice.
# Things to watch
- The reindex must be run once after deploy for the fix to take effect
on already-indexed profiles. Documented in justfile + binary header.
- `flatten_kind0_for_indexing` only extracts `display_name`/`name`/`nip05`.
If we ever want fuzzy bio search we'd need a different field weight
approach (likely dedicated Typesense fields). Out of scope here.
Signed-off-by: Tyler Longwell <109685178+tlongwell-block@users.noreply.github.com>
Switches the kind:0 backfill from OFFSET-based paging to a snapshot ceiling (`until = Utc::now()` at start) plus a keyset cursor over `(created_at, id)` matching the underlying `ORDER BY created_at DESC, id ASC` index.

Closes the only correctness footgun raised in review (3-way review with Sami + Quinn): under live write traffic, OFFSET could skip a row when a new kind:0 arrived mid-run and shifted the window. Snapshot + keyset gives strict run-once semantics: new arrivals fall outside the snapshot and are handled by the live index path, so there are no skips and no duplicates at page boundaries. Uses `query_events`'s existing `until` + `before_id` composite-cursor support; that API was designed for exactly this pattern.

Also documents `SearchHit.content` to flag that for kind:0 it contains the appended-token form (display_name/name/nip05), not the canonical event content. All production read paths refetch the canonical StoredEvent from Postgres by id, so this is invisible today, but the doc-comment prevents a future feature from accidentally trusting the field.

Signed-off-by: Tyler Longwell <109685178+tlongwell-block@users.noreply.github.com>
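The snapshot-plus-keyset paging described in this commit can be sketched against an in-memory table. This is an illustration only: `Row`, `fetch_page`, and `backfill` are hypothetical stand-ins and do not reproduce the `query_events` API or its `until` / `before_id` parameters.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Row {
    created_at: i64,
    id: String,
}

/// Return up to `limit` rows at or below the snapshot ceiling and strictly
/// after the cursor, in (created_at DESC, id ASC) order.
fn fetch_page(
    table: &[Row],
    ceiling: i64,
    cursor: Option<&(i64, String)>,
    limit: usize,
) -> Vec<Row> {
    let mut rows: Vec<Row> = table
        .iter()
        // Snapshot ceiling: rows written after the run started are ignored.
        .filter(|r| r.created_at <= ceiling)
        // Keyset predicate: strictly after the last row of the previous page.
        .filter(|r| match cursor {
            None => true,
            Some((ts, id)) => r.created_at < *ts || (r.created_at == *ts && r.id > *id),
        })
        .cloned()
        .collect();
    rows.sort_by(|a, b| b.created_at.cmp(&a.created_at).then_with(|| a.id.cmp(&b.id)));
    rows.truncate(limit);
    rows
}

/// Walk the whole snapshot page by page and return the ids in visit order.
fn backfill(table: &[Row], ceiling: i64, page_size: usize) -> Vec<String> {
    let mut indexed = Vec::new();
    let mut cursor: Option<(i64, String)> = None;
    loop {
        let page = fetch_page(table, ceiling, cursor.as_ref(), page_size);
        let last = match page.last() {
            Some(r) => r,
            None => break, // empty page: the snapshot is exhausted
        };
        cursor = Some((last.created_at, last.id.clone()));
        indexed.extend(page.into_iter().map(|r| r.id));
    }
    indexed
}
```

The cursor names a position in the sort order rather than a row count, so a row inserted between two page fetches cannot shift what the next page returns.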
# Problem
The 'Add members' search in channels silently hid users. Three compounding bugs:
1. **Backend cap of 1000 events.** The Tauri `search_users` command fetched up to 2000 kind:0 events via POST /query and grepped them client-side, but the HTTP bridge clamps queries at 1000 events. On relays with more than 1000 live profiles, the rest were invisible.
2. **Frontend cap of 8 results.** `ChannelMemberInviteCard` requested only 8 results from `useUserSearchQuery`, with no relevance ranking.
3. **Typesense tokenizer doesn't extract names from raw JSON content.** This one cost the most time. Kind:0 events store their structured data as a JSON blob (`{"display_name":"alice",...}`), and Typesense's default tokenizer glues the leading `"` onto the next word, producing a token like `"alice` that doesn't match a clean `q=alice`. Empirical result before this fix: NIP-50 search for "Bob" finds zero users named Bob, even though their docs are indexed and retrievable. Only multi-word names like "Alice Wonderland" matched, on the second word ("Wonderland").
# Fix
Three changes, smallest surface that actually solves the bug:
**1. Desktop `search_users` uses NIP-50** (`commands/profile.rs`).
Sends `{kinds:[0], search:q, limit:50}` instead of fetching every kind:0 and grepping. The bridge already routes `search` to Typesense (`api/bridge.rs::handle_bridge_search`); we just weren't using it. Off the 1000-event cap. Scales to any relay size.
**2. New client-side ranker** (`nostr_convert.rs::rank_user_search_results`).
Re-ranks the ≤50 Typesense hits by exact > prefix > substring on display_name (or name) > nip05 > pubkey-hex. Drops hits that only matched on `about`/`website`; those are noise in autocomplete. Dedupes by pubkey defensively.
**3. Indexer flattens kind:0 content for tokenization** (`sprout-search/src/index.rs::flatten_kind0_for_indexing`).
For kind:0 only, parse the JSON and append `display_name`/`name`/`nip05` values to the indexed `content` string with whitespace separators. The Typesense `content` field is write-only (the bridge fetches canonical events from Postgres by id after Typesense returns hits, see bridge.rs:471), so appending derived tokens is safe and doesn't affect any read path. `about` and `website` are deliberately excluded to avoid name-prefix false positives.
Frontend limit bumped 8 → 25 so the client-side ranker has room to refine the server's results before truncation.
# Backfill: required once after deploy
New / updated kind:0 events index correctly automatically. Existing kind:0 docs need a one-time reindex: `just reindex-kind0`
The new `sprout-reindex-kind0` binary streams kind:0s from Postgres in 500-row batches through `SearchService::index_batch` and exits. Idempotent (Typesense upsert). Safe to run repeatedly.
# Tests
- `cargo test -p sprout-search --lib` → 21 pass (10 new)
- `cargo test --lib nostr_convert` (desktop) → 37 pass (13 new)
- `cargo test -p sprout-relay --lib` → 147 pass (no regressions)
- `cargo fmt --all --check` → clean
- `cargo clippy -p sprout-search -p sprout-relay --all-targets -- -D warnings` → clean
- `pnpm typecheck` + `pnpm check` → clean
# Manual e2e
Seeded 8 kind:0 profiles locally with varied name patterns (alice, Bob, Lev, Banana Joe, Malice, alicia, alice-old, Zed). Ran `just reindex-kind0`. Queried `sprout get-users --name <q>` for: alice, Bob, Lev, Banana, Zed, mal, testbot, charlie. All returned the right user(s). Before the indexer fix the same queries returned 0 results (single-word names) or only false-positive bio matches.
# Replaces
Closes #567. Same UX bug; the new PR is the more correct fix (server-side, indexed) and additionally handles the single-word-name case that #567 didn't catch.
# Things to watch
- `flatten_kind0_for_indexing` only extracts `display_name`/`name`/`nip05`. If we ever want fuzzy bio search we'd need a different approach (likely dedicated Typesense fields with weighted `query_by`). Out of scope here.