fix(desktop): rank and dedupe channel member search results#567

Closed
tlongwell-block wants to merge 3 commits into main from fix/channel-add-members-search-ranking

Conversation

@tlongwell-block
Collaborator

Problem

The "Add members" search in the channel UI silently hid users. When a user typed a name or pubkey, some people consistently never appeared in the dropdown while others did — and the hidden set was stable across attempts.

Two compounding bugs:

  1. Frontend cap of 8. ChannelMemberInviteCard requested only 8 results from useUserSearchQuery (which also defaults to 8). The backend would have accepted up to 50.

  2. Backend had no ranking and truncated in arrival order. commands::profile::search_users fetched up to 2000 kind:0 events from the relay, scanned them in whatever order the relay returned them, and broke out of the loop as soon as users.len() >= max. If the match for "alice" was relay event #1500 and 8 other matches appeared earlier, alice was lost every time. Relay return order is stable-ish, so the same people stayed hidden, which matches the reported symptom exactly.

    The DB-side function in crates/sprout-db/src/user.rs already had proper ranking (exact > prefix > contains). The Tauri relay path just wasn't using equivalent logic.
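
To make the second bug concrete, here is a minimal sketch of the two shapes of the search loop. It is not the code from commands::profile::search_users: the function names and the plain string inputs are hypothetical stand-ins, and the ranking is reduced to exact > prefix > substring on a single field.

```rust
/// Buggy shape (illustrative): scan in arrival order, stop at `limit` matches.
fn first_matches<'a>(names: &[&'a str], query: &str, limit: usize) -> Vec<&'a str> {
    let q = query.to_lowercase();
    let mut out = Vec::new();
    for &name in names {
        if out.len() >= limit {
            break; // anything arriving later is never looked at
        }
        if name.to_lowercase().contains(&q) {
            out.push(name);
        }
    }
    out
}

/// Fixed shape (illustrative): collect every match, rank, then truncate.
fn ranked_matches<'a>(names: &[&'a str], query: &str, limit: usize) -> Vec<&'a str> {
    let q = query.to_lowercase();
    let mut hits: Vec<(u8, &'a str)> = names
        .iter()
        .filter_map(|&name| {
            let n = name.to_lowercase();
            // Lower rank sorts first: exact (0), prefix (1), substring (2).
            let rank = if n == q {
                0
            } else if n.starts_with(&q) {
                1
            } else if n.contains(&q) {
                2
            } else {
                return None;
            };
            Some((rank, name))
        })
        .collect();
    // Stable sort keeps arrival order within the same rank.
    hits.sort_by_key(|&(rank, _)| rank);
    hits.truncate(limit);
    hits.into_iter().map(|(_, name)| name).collect()
}
```

With the input `["malice", "alice_w", "alice"]`, query `alice`, and a limit of 2, the first function never reaches the exact match because it arrives last, while the second returns it first.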

Fix

  • Frontend: bump ChannelMemberInviteCard from 8 → 25. Backend already clamps at 50.
  • Backend: replace the inline arrival-order filter in commands::profile::search_users with a new pure helper nostr_convert::filter_and_rank_user_search(events, query, limit) that:
    1. Dedupes to the latest kind:0 per pubkey first (max created_at, tiebreak min event id per NIP-01). kind:0 is replaceable, so a stale older profile must never outrank the live one regardless of how well its content happens to match.
    2. Scores every remaining match (exact > prefix > substring; display_name > nip05 > pubkey-hex), sorts, then truncates.

The ranking mirrors the ORDER BY in sprout-db::user::search_users so the relay path and the DB path stay consistent.
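
The dedupe step described above could look roughly like the following. This is a sketch under assumptions, not the helper from nostr_convert.rs: `ProfileEvent`, `latest_per_pubkey`, and `replaces` are hypothetical names, and the event is reduced to the four fields the rule needs.

```rust
use std::collections::HashMap;

/// Hypothetical, trimmed-down stand-in for a kind:0 event.
#[derive(Clone, Debug, PartialEq)]
struct ProfileEvent {
    id: String,           // hex event id
    pubkey: String,       // hex pubkey
    created_at: u64,      // unix seconds
    display_name: String, // already extracted from the JSON content
}

/// True when `new` should replace `cur` for the same pubkey:
/// newer `created_at` wins, and on a tie the lower event id wins.
fn replaces(new: &ProfileEvent, cur: &ProfileEvent) -> bool {
    new.created_at > cur.created_at
        || (new.created_at == cur.created_at && new.id < cur.id)
}

/// Keep exactly one event per pubkey, independent of input order.
fn latest_per_pubkey(events: Vec<ProfileEvent>) -> Vec<ProfileEvent> {
    let mut latest: HashMap<String, ProfileEvent> = HashMap::new();
    for ev in events {
        let keep_new = match latest.get(&ev.pubkey) {
            Some(cur) => replaces(&ev, cur),
            None => true,
        };
        if keep_new {
            latest.insert(ev.pubkey.clone(), ev);
        }
    }
    let mut out: Vec<ProfileEvent> = latest.into_values().collect();
    // Sort only to make the output deterministic for this sketch.
    out.sort_by(|a, b| a.pubkey.cmp(&b.pubkey));
    out
}
```

Because matching runs only on the survivors, a stale profile whose content matches the query cannot resurface: if the live profile does not match, the pubkey simply drops out of the results.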

Tests

12 new unit tests in nostr_convert::tests, including:

  • Direct regression for the late-match-dropped-under-limit bug.
  • Three dedupe-semantics tests for replaceable kind:0 (stale-better-rank loses, live profile wins, stale-only-match drops the pubkey).
  • NIP-01 id-tiebreak when two kind:0s for one pubkey share created_at.
  • Exact > prefix > substring ordering across display_name, nip05, and pubkey-hex.
  • Empty-query and zero-limit guards.
cargo test nostr_convert::tests::      ->  35/35 pass (12 new)
cargo fmt --check                       ->  clean
pnpm typecheck                          ->  clean
pnpm check (biome + file-sizes)         ->  clean
pre-push hook (full workspace tests +   ->  all green
  clippy + builds for web/desktop/mobile)
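
For readers who want to see what the ordering tests exercise, the scoring can be sketched as a pair of ordered enums. How the two axes combine is an assumption here (match kind first, field as the tiebreak); the real helper mirrors the ORDER BY in sprout-db and may weigh them differently. All names below are hypothetical.

```rust
/// How well a field matched; earlier variants sort first.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum MatchKind {
    Exact,
    Prefix,
    Substring,
}

/// Which field matched; earlier variants sort first.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Field {
    DisplayName,
    Nip05,
    Pubkey,
}

/// Case-insensitive match of one field against an already-lowercased query.
fn match_kind(value: &str, query_lc: &str) -> Option<MatchKind> {
    let v = value.to_lowercase();
    if v == query_lc {
        Some(MatchKind::Exact)
    } else if v.starts_with(query_lc) {
        Some(MatchKind::Prefix)
    } else if v.contains(query_lc) {
        Some(MatchKind::Substring)
    } else {
        None
    }
}

/// Best score for one profile, or None for an empty query or no match.
/// Assumption: match kind dominates, field breaks ties.
fn score(
    display_name: &str,
    nip05: &str,
    pubkey_hex: &str,
    query: &str,
) -> Option<(MatchKind, Field)> {
    let q = query.trim().to_lowercase();
    if q.is_empty() {
        return None;
    }
    [
        (display_name, Field::DisplayName),
        (nip05, Field::Nip05),
        (pubkey_hex, Field::Pubkey),
    ]
    .iter()
    .filter_map(|&(value, field)| match_kind(value, &q).map(|kind| (kind, field)))
    .min()
}
```

Sorting candidates by this tuple and then truncating to the limit gives the "score, sort, then truncate" order described in the Fix section.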

Notes / things to watch

  • Pre-existing, not regressed: relay query is still capped at 2000 kind:0 events. If a relay has more profiles than that, results outside the 2000-event window remain invisible. Out of scope for this PR.
  • NewDirectMessageDialog still passes limit: 8 to the same hook by design (DM groups are capped at 8 recipients). Same backend bug used to affect it; with this fix the ranking is correct globally, so its limit-of-8 now reflects the intended UX.
  • Minor cosmetic: the DB-path tiebreak sorts on original-case display_name, while the new client path sorts on the lowercased name, so mixed-case names within the same rank tier can come back in a different order. Not user-visible at limit 25; left as-is for simpler code.
  • Bumped the file-size override on nostr_convert.rs (870 → 1210) to fit the new helper and tests.
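
The case-sensitivity note can be illustrated with a small sketch. It assumes a byte-wise string comparison as a stand-in for the DB-path ordering; the actual database collation may behave differently, and both function names are hypothetical.

```rust
/// Stand-in for the DB-path tiebreak: compare names as written
/// (assumption: byte-wise order, so uppercase sorts before lowercase).
fn sort_original_case(mut names: Vec<&str>) -> Vec<&str> {
    names.sort();
    names
}

/// Stand-in for the client-path tiebreak: compare lowercased names.
fn sort_lowercased(mut names: Vec<&str>) -> Vec<&str> {
    names.sort_by_key(|n| n.to_lowercase());
    names
}
```

For `["bob", "Zed", "alice"]`, the first ordering puts `Zed` ahead of `alice` and `bob`, while the second puts it last. The two paths only disagree within a rank tier, which is why the difference is treated as cosmetic.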

Files

desktop/scripts/check-file-sizes.mjs                              |   2 +-
desktop/src-tauri/src/commands/profile.rs                         |  41 +--
desktop/src-tauri/src/nostr_convert.rs                            | 336 +++++++++++++++++++++
desktop/src/features/channels/ui/ChannelMemberInviteCard.tsx      |   5 +-
4 files changed, 353 insertions(+), 31 deletions(-)

The 'Add members' search in the channel UI was silently hiding people:
relay-returned kind:0 events were filtered in arrival order and the
loop broke as soon as it had `limit` matches. Match #1500 was lost
behind 8 earlier matches every time, producing the reported
'consistent' missing-user behavior.

Two compounding bugs:

1. `ChannelMemberInviteCard` requested only 8 results (hook default).
   Bumped to 25; the backend already clamps at 50.

2. `commands::profile::search_users` had no ranking and truncated in
   arrival order. Replaced the inline filter with a new pure helper
   `nostr_convert::filter_and_rank_user_search` that:

   a. Dedupes the incoming events to the latest kind:0 per pubkey
      (max created_at, tiebreak min event id) — kind:0 is replaceable
      per NIP-01, so an older stale profile must never outrank the
      live one regardless of how well its content happens to match.
   b. Scores every remaining match (exact > prefix > substring;
      display_name > nip05 > pubkey), sorts, then truncates.

   The ranking mirrors the ORDER BY in `sprout-db::user::search_users`
   so the relay path and the DB path stay consistent.

Also bumped the file-size override on `nostr_convert.rs` (870->1170)
to fit the new helper and 11 unit tests, including direct regressions
for the late-match-dropped-under-limit bug and the three replaceable-
event dedupe semantics (stale-better-rank loses, live profile wins,
stale-only-match drops the pubkey).

cargo test nostr_convert::tests::  ->  35/35 pass
pnpm typecheck + pnpm check (biome + file-sizes)  ->  clean
cargo fmt --check  ->  clean

Signed-off-by: Tyler Longwell <109685178+tlongwell-block@users.noreply.github.com>
Self-review of the channel-member-search ranker turned up one
under-tested corner: the NIP-01 'lowest event id retained' tiebreak
that kicks in when two replaceable kind:0 events for one pubkey share
`created_at`. The dedupe logic handled it, but nothing locked the
behavior in, and silent regression there would put a stale profile
back on top.

Test builds two events with the same Keys + custom_created_at, derives
the expected winner from `a.id < b.id` (so the test is independent of
which random hash happens to be smaller), then asserts both input
orderings produce the same single result with the expected display name.

Bumps the file-size override on nostr_convert.rs from 1170 to 1210 to
fit the new test.

Signed-off-by: Tyler Longwell <109685178+tlongwell-block@users.noreply.github.com>
* origin/main:
  chore(acp): raise default idle timeout from 320s to 620s (#566)
  fix(cli): derive thread root from parent event tags (#564)
  fix: skip empty assistant turns instead of placeholder space (#560)

Signed-off-by: Tyler Longwell <109685178+tlongwell-block@users.noreply.github.com>
tlongwell-block force-pushed the fix/channel-add-members-search-ranking branch from d2304d4 to 9d12187 on May 13, 2026 15:16
@tlongwell-block
Collaborator Author

Closing in favor of #__ (link to follow) — same bug, but the new PR catches a third compounding issue (Typesense tokenization of raw kind:0 JSON content, which made vanilla NIP-50 silently miss single-word display names). Net result: server-side ranking via NIP-50 + indexer fix + a smaller client-side ranker than this PR's helper. Branch is fix/channel-add-members-nip50.

@tlongwell-block
Collaborator Author

Replacement PR: #569

tlongwell-block added a commit that referenced this pull request May 13, 2026
# Problem

The 'Add members' search in channels silently hid users. Two compounding
bugs, plus a third discovered while testing:

1. **Backend cap of 1000 events.** The Tauri `search_users` command
   fetched up to 2000 kind:0 events via POST /query and grepped them
   client-side, but the HTTP bridge clamps queries at 1000 events. On
   relays with more than 1000 live profiles, the rest were invisible.

2. **Frontend cap of 8 results.** `ChannelMemberInviteCard` requested
   only 8 results from `useUserSearchQuery`, with no relevance ranking.

3. **Typesense tokenizer doesn't extract names from raw JSON content.**
   This one cost the most time. Kind:0 events store their structured
   data as a JSON blob (`{"display_name":"alice",...}`), and Typesense's
   default tokenizer glues the leading `"` onto the next word, producing
   a token like `"alice` that doesn't match a clean `q=alice`. Empirical
   result before this fix: NIP-50 search for "Bob" finds zero users
   named Bob, even though their docs are indexed and retrievable.

# Fix

Three changes, smallest surface that actually solves the bug:

**1. Desktop `search_users` uses NIP-50** (`commands/profile.rs`).
   Sends `{kinds:[0], search:q, limit:50}` instead of fetching every
   kind:0 and grepping. The bridge already routes `search` to Typesense
   (`api/bridge.rs::handle_bridge_search`) — we just weren't using it.
   Off the 1000-event cap. Scales to any relay size.

**2. New client-side ranker** (`nostr_convert.rs::rank_user_search_results`).
   Re-ranks the ≤50 Typesense hits by exact > prefix > substring on
   display_name (or name) > nip05 > pubkey-hex. Drops hits that only
   matched on `about`/`website` — those are noise in autocomplete.
   Dedupes by pubkey defensively. 13 new unit tests cover scoring,
   dedupe, edge cases (empty query, non-kind:0 hits, name-only profiles).

**3. Indexer flattens kind:0 content for tokenization**
   (`sprout-search/src/index.rs::flatten_kind0_for_indexing`). For
   kind:0 only, parse the JSON and append `display_name`/`name`/`nip05`
   values to the indexed `content` string with whitespace separators.
   The Typesense `content` field is write-only (the bridge fetches
   canonical events from Postgres by id after Typesense returns hits —
   bridge.rs:471), so appending derived tokens is safe. `about` and
   `website` are deliberately excluded to avoid name-prefix false
   positives. 10 new unit tests including malformed JSON tolerance,
   non-kind:0 untouched, and ordering of appended tokens.
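
As a rough illustration of the flattening idea in item 3, the sketch below appends name-like values to the indexed text so they stand alone as whitespace-separated tokens. It is not `flatten_kind0_for_indexing`: JSON parsing is skipped entirely, `ProfileFields` is a hypothetical holder for values already parsed out of the content, and the append order (display_name, name, nip05) is an assumption.

```rust
/// Hypothetical holder for fields already parsed from kind:0 JSON content.
struct ProfileFields<'a> {
    display_name: Option<&'a str>,
    name: Option<&'a str>,
    nip05: Option<&'a str>,
}

/// Return the raw content followed by each present, non-empty field,
/// separated by single spaces. `about` and `website` are not part of
/// `ProfileFields`, so they are never appended.
fn flatten_for_indexing(raw_content: &str, fields: &ProfileFields) -> String {
    let mut out = String::from(raw_content);
    for value in [fields.display_name, fields.name, fields.nip05]
        .iter()
        .flatten()
    {
        let v = value.trim();
        if !v.is_empty() {
            out.push(' ');
            out.push_str(v);
        }
    }
    out
}
```

The raw JSON stays at the front of the string, so anything that previously matched still matches; the appended values only add clean tokens such as `alice` for the tokenizer to pick up.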

Frontend limit bumped 8 → 25 so the client-side ranker has room to refine
the server ranking before truncation.

# Backfill

New and updated kind:0 events are indexed correctly automatically.
Existing docs need a one-time reindex:

    just reindex-kind0

The new `sprout-reindex-kind0` binary streams kind:0s from Postgres
in 500-row batches through `SearchService::index_batch` and exits.
Idempotent (Typesense upsert).
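
The batching and idempotency properties can be sketched without Postgres or Typesense. Everything below is a hypothetical in-memory stand-in: `InMemoryIndex` plays the role of the search index, `(id, content)` tuples play the role of rows, and the real binary streams rows rather than holding them in a slice.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for a search index keyed by event id.
#[derive(Default)]
struct InMemoryIndex {
    docs: HashMap<String, String>,
    batches_seen: usize,
}

impl InMemoryIndex {
    /// Upsert semantics: indexing an existing id overwrites the old doc.
    fn index_batch(&mut self, batch: &[(String, String)]) {
        self.batches_seen += 1;
        for (id, content) in batch {
            self.docs.insert(id.clone(), content.clone());
        }
    }
}

/// Feed `rows` to the index in fixed-size batches.
fn reindex(rows: &[(String, String)], batch_size: usize, index: &mut InMemoryIndex) {
    // `chunks` panics on a size of 0, so clamp to at least 1.
    for batch in rows.chunks(batch_size.max(1)) {
        index.index_batch(batch);
    }
}
```

Running `reindex` twice over the same 1200 rows with a batch size of 500 issues three batches per run and leaves 1200 documents both times, which is the property that makes a repeated backfill safe.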

# Tests / verification

- `cargo test -p sprout-search --lib`           → 21 pass (10 new)
- `cargo test --lib nostr_convert` (desktop)    → 37 pass (13 new)
- `cargo test -p sprout-relay --lib`            → 147 pass (no regressions)
- `cargo fmt --all --check`                     → clean
- `cargo clippy -p sprout-search -p sprout-relay --all-targets -- -D warnings` → clean
- `pnpm typecheck` + `pnpm check`                → clean
- End-to-end: built relay in screen, seeded 8 kind:0 profiles with
  varied name patterns, ran `just reindex-kind0`, then queried via
  `sprout get-users --name` for: alice, Bob, Lev, Banana, Zed, mal,
  testbot, charlie. All returned the right user(s). Before the indexer
  fix the same queries returned 0 or only false-positive bio matches.

# Replaces

Closes #567 — same UX bug; this PR's solution is more correct (server
side instead of larger-fetch client side) and now also handles the
single-word-name case that #567 didn't notice.

# Things to watch

- The reindex must be run once after deploy for the fix to take effect
  on already-indexed profiles. Documented in justfile + binary header.
- `flatten_kind0_for_indexing` only extracts `display_name`/`name`/`nip05`.
  If we ever want fuzzy bio search we'd need a different field weight
  approach (likely dedicated Typesense fields). Out of scope here.

Signed-off-by: Tyler Longwell <109685178+tlongwell-block@users.noreply.github.com>