feat(search): IndexedDB-backed persistent search index for all rooms #881

Open
Just-Insane wants to merge 23 commits into SableClient:dev from Just-Insane:feat/encrypted-search-idb

Conversation

@Just-Insane
Contributor

@Just-Insane Just-Insane commented May 19, 2026

Description

Adds a Phase 2 IndexedDB-backed persistent search index, building on top of the in-memory encrypted-room search (#871). A MiniSearch 7.2.0 web worker owns the index and IDB persistence with multi-tab write safety via navigator.locks, debounced flushing, and per-room LRU eviction. Gated behind a second experimental settings toggle ("Message Search Index").

The index covers all rooms (not just encrypted ones), which deserves a note on rationale:

For unencrypted rooms the Matrix SDK's own IndexedDB sync cache already holds plaintext event bodies, so at first glance indexing them again looks redundant. However the SDK's IDB is an opaque sync store — it isn't queryable for full-text search or chip-filter scans (Has: Image / File / Audio / Video / Link) without loading events page-by-page into memory, which means results are limited to what happens to be in the live timeline. The MiniSearch index buys O(1) chip-filter lookups across full backfilled history regardless of room type. Only a small set of fields are stored (eventId, roomId, sender, msgtype, ts, body), so the overhead is modest, and the result is consistent search depth across encrypted and unencrypted rooms alike.

Fixes #

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings

AI disclosure:

  • Fully AI generated (explain what all the generated code does in moderate detail).
  • Partially AI assisted (clarify which code was AI assisted and briefly explain what it does).

The worker owns a MiniSearch instance and IDB object stores for document records and a pending-flush queue. The main thread posts ADD_EVENTS messages after timeline updates; the worker debounces these into batches, acquires a navigator.locks exclusive write lock, and flushes to IDB. On startup the worker rehydrates MiniSearch by reading all stored document records from IDB. An LRU counter per room tracks total stored events; when the cap is exceeded the worker evicts the room with the oldest last-access timestamp and removes its records from IDB and MiniSearch.
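
The debounce step described above can be sketched roughly as follows. This is a minimal illustration, not the worker's actual code: `DebouncedBatcher`, `Schedule`, and the trimmed `IndexableEvent` shape are hypothetical names, and the timer is injected so the logic can be exercised without real timers, IndexedDB, or `navigator.locks`. In the real worker the `flush` callback would presumably run inside the exclusive write lock.

```typescript
// Hypothetical, trimmed-down event shape; field names follow the PR description.
interface IndexableEvent {
  eventId: string;
  roomId: string;
  body: string;
}

type Cancel = () => void;
// Injectable timer (assumption) so the sketch does not depend on real timers.
type Schedule = (fn: () => void, delayMs: number) => Cancel;

class DebouncedBatcher {
  private pending: IndexableEvent[] = [];
  private cancel: Cancel | null = null;
  private readonly flush: (batch: IndexableEvent[]) => void;
  private readonly schedule: Schedule;
  private readonly delayMs: number;

  constructor(
    flush: (batch: IndexableEvent[]) => void,
    schedule: Schedule,
    delayMs = 5000,
  ) {
    this.flush = flush;
    this.schedule = schedule;
    this.delayMs = delayMs;
  }

  // Called once per ADD_EVENTS message; each call restarts the debounce window.
  add(events: IndexableEvent[]): void {
    this.pending.push(...events);
    if (this.cancel) this.cancel();
    this.cancel = this.schedule(() => this.flushNow(), this.delayMs);
  }

  // Hands everything collected so far to `flush` as a single batch.
  flushNow(): void {
    if (this.cancel) this.cancel();
    this.cancel = null;
    if (this.pending.length === 0) return;
    const batch = this.pending;
    this.pending = [];
    this.flush(batch);
  }
}
```

Calling `flushNow()` directly is one way an unload-time flush could bypass the debounce window.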

Split the search room list into encrypted / plaintext buckets.
Server search covers plaintext rooms unchanged. Encrypted rooms are
searched synchronously against their in-memory live timeline so
decrypted content is always available.

Key details:
- partitionRoomsByEncryption() splits the room filter; for global
  search (rooms=undefined) all joined encrypted rooms are scanned
- In-memory results are merged into the first page only (no
  pagination token for local results)
- For 'recent' order, groups are interleaved by timestamp; for
  'rank' order, server results come first
- An info banner is shown when encrypted rooms were searched so
  users know coverage is limited to cached messages
- Controlled by features.encryptedSearch in config.json (default true)
- 18 unit tests covering matching, filtering, partitioning, merging
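
A rough sketch of the partition and first-page merge rules listed above. The real `partitionRoomsByEncryption()` signature is not shown in this PR, so the `RoomLike` and `ResultGroup` shapes and the `mergeFirstPage` helper are assumptions made for illustration only.

```typescript
type Order = "recent" | "rank";

// Hypothetical shapes; the real code works on Matrix SDK rooms and search result groups.
interface RoomLike {
  roomId: string;
  encrypted: boolean;
}

interface ResultGroup {
  roomId: string;
  latestTs: number; // timestamp of the newest match in the group, in ms
  source: "server" | "local";
}

function partitionRoomsByEncryption(rooms: RoomLike[]): {
  encrypted: string[];
  plaintext: string[];
} {
  const encrypted: string[] = [];
  const plaintext: string[] = [];
  for (const room of rooms) {
    (room.encrypted ? encrypted : plaintext).push(room.roomId);
  }
  return { encrypted, plaintext };
}

// Local (in-memory) results have no pagination token, so they only join page one.
function mergeFirstPage(
  server: ResultGroup[],
  local: ResultGroup[],
  order: Order,
): ResultGroup[] {
  if (order === "rank") return [...server, ...local]; // server results first
  // 'recent': interleave both sources by timestamp, newest first.
  return [...server, ...local].sort((a, b) => b.latestTs - a.latestTs);
}
```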
…arch

- Adds 'Encrypted Room Search' toggle to Settings > Experimental
- Setting defaults to true; operator can hard-disable via
  config.json features.encryptedSearch = false
- Lock icon shown next to encrypted rooms in the search room picker
  when the feature is active, indicating local-cache coverage
- useMessageSearch now checks both the operator flag and user setting
Removes the guard that hid the search icon in encrypted room headers.
Encrypted rooms now navigate to message search pre-filtered to that
room, showing in-memory results when the feature is enabled.
Tooltip reads "Search (local cache)" for encrypted rooms.
- Fix DM rooms missing from search: replace useRooms (excludes DMs)
  with useSelectedRooms+isRoom selector so DM room IDs pass URL param
  validation; room picker always uses the full allRooms list
- Add SearchHasType (image/file/audio/video/link) to searchEncryptedRooms.ts
  with mEventMatchesHasTypes filtering in in-memory timeline search
- Add hasTypes to MessageSearchParams; pass contains_url:true for has:link
  on server requests; post-filter server results by msgtype/URL pattern
- Add HasFilterChips and SelectSenderButton components to SearchFilters;
  new has: row with Image/File/Audio/Video/Link toggles plus From: sender
  chips with Matrix ID input popup
- Wire has URL param through MessageSearch: parse, encode, pass to
  SearchFilters and msgSearchParams; add handleHasTypesChange/handleSendersChange
- Fix mDirects undefined crash in MessageSearch (re-add atom import)
- Allow has: filters to trigger search without a text term
  - searchEncryptedRooms: skip body text check when lowerTerm is empty
  - useMessageSearch: only early-return when both term and hasTypes are absent
  - When no term: skip server search (server requires search_term), in-memory only
  - MessageSearch: enable query when hasTypes is set even without a term
- Add DM search page at /direct/search/
  - DIRECT_SEARCH_PATH constant in paths.ts
  - getDirectSearchPath() helper in pathUtils.ts
  - useDirectSearchSelected() hook in useDirectSelected.ts
  - DirectSearch component (scoped to DM rooms)
  - Route registered in Router.tsx
  - 'Message Search' nav item added to Direct Messages panel
  - RoomViewHeader: clicking search in a DM navigates to DM search
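
The `has:` filtering and the "no text term" rule from the commits above can be illustrated like this. It is a sketch under stated assumptions: `MessageLike`, `matchesHasTypes`, `matchesSearch`, and the URL pattern are hypothetical, and treating multiple chips as OR is a guess, since the PR does not say how chips combine.

```typescript
type SearchHasType = "image" | "file" | "audio" | "video" | "link";

// Hypothetical minimal message shape.
interface MessageLike {
  msgtype: string;
  body: string;
}

const MSGTYPE_BY_HAS: Record<Exclude<SearchHasType, "link">, string> = {
  image: "m.image",
  file: "m.file",
  audio: "m.audio",
  video: "m.video",
};

// Assumed link detector; the PR only mentions a "URL pattern" without giving it.
const URL_PATTERN = /https?:\/\/\S+/i;

function matchesHasTypes(msg: MessageLike, hasTypes: SearchHasType[]): boolean {
  if (hasTypes.length === 0) return true; // no chips selected: nothing to filter
  // Assumption: a message passes if it satisfies ANY selected chip.
  return hasTypes.some((t) =>
    t === "link" ? URL_PATTERN.test(msg.body) : msg.msgtype === MSGTYPE_BY_HAS[t],
  );
}

function matchesSearch(
  msg: MessageLike,
  lowerTerm: string,
  hasTypes: SearchHasType[],
): boolean {
  // An empty term skips the body check, so chips alone can drive a search.
  const textOk = lowerTerm === "" || msg.body.toLowerCase().includes(lowerTerm);
  return textOk && matchesHasTypes(msg, hasTypes);
}
```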
…display names; fix DM create button alignment
Typing > in the room search modal switches to message search mode.
A 'Search messages: <query>' item appears; pressing Enter or clicking
it navigates to the context-appropriate message search page with the
term pre-filled:
- /direct/ context → DM message search
- /:spaceIdOrAlias/ context → space message search
- /home/ or other → home message search

The hint text is updated to include > for messages. The prefix is
disabled when the modal is used for room-picking (forwarding).
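
One way the `>` prefix handling and the context routing above could look. This is only a sketch: both function names are hypothetical, and recognising a space route by a leading `!` or `#` sigil in an already-decoded first path segment is an assumption about the router, not something the PR states.

```typescript
type SearchContext = "direct" | "space" | "home";

// Returns the message-search term, or null when the input is a normal room search.
// `allowPrefix` is false when the modal is used for room-picking (forwarding).
function parseMessageSearchQuery(input: string, allowPrefix: boolean): string | null {
  if (!allowPrefix || !input.startsWith(">")) return null;
  return input.slice(1).trim();
}

function resolveSearchContext(pathname: string): SearchContext {
  const segments = pathname.split("/").filter((s) => s.length > 0);
  const first = segments[0] ?? "";
  if (first === "direct") return "direct";
  // Assumption: space IDs and aliases carry the Matrix sigils "!" or "#".
  if (first.startsWith("!") || first.startsWith("#")) return "space";
  return "home"; // "/home/" and anything else
}
```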
- Install MiniSearch 7.2.0 for TypeScript-native full-text search
- Add idbSearchIndex and searchIndexMessageLimit to settings atom
- Create SearchIndexToggle experimental toggle (second opt-in)
- Add searchWorker.ts Web Worker owning MiniSearch index + IDB persistence
  - IDB schema: 'index' store (serialised index + room queues), 'backfill' store
  - Multi-tab write safety via navigator.locks
  - Debounced flush (5s) + beforeunload flush
  - Per-room LRU eviction when queue exceeds 110% of configured limit
- Create useSearchIndex.tsx React context + hook
  - Live indexing via RoomEvent.Timeline listener
  - Headless EventTimelineSet backfill in idle callbacks
  - Query, getStats, clearIndex public API
- Wrap ClientNonUIFeatures in SearchIndexProvider
- Add SearchIndexCache to Developer Tools: stats, per-room limit selector,
  backfill progress, clear button (auto-refreshes every 5s)
- Wire useMessageSearch to use IDB index when idbSearchIndex is enabled
- Export EMPTY_CONTEXT from searchEncryptedRooms for reuse
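
The 110% eviction threshold mentioned above might work along these lines. The sketch assumes each room keeps a queue of event IDs ordered oldest first and that eviction trims the queue back down to the configured limit; neither detail is confirmed by the commit message, and `trimRoomQueue` is a hypothetical name.

```typescript
interface TrimResult {
  kept: string[];
  evicted: string[];
}

// `queue` holds event IDs for one room, oldest first (assumed ordering).
function trimRoomQueue(queue: string[], limit: number): TrimResult {
  // Integer arithmetic avoids floating-point surprises such as 100 * 1.1.
  const threshold = Math.floor((limit * 110) / 100);
  if (queue.length <= threshold) return { kept: queue, evicted: [] };

  // Over the threshold: drop the oldest entries until the queue is back at `limit`.
  const excess = queue.length - limit;
  return { kept: queue.slice(excess), evicted: queue.slice(0, excess) };
}
```

The 10% slack means eviction runs in occasional bursts rather than on every insert once a room reaches its limit; the evicted IDs would then be removed from both MiniSearch and IDB.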
@Just-Insane Just-Insane changed the title feat(search): IndexedDB-backed persistent search index for encrypted rooms feat(search): IndexedDB-backed persistent search index for all rooms May 19, 2026
…pted; update UI text

- Remove isRoomEncrypted guard from indexEvent and startBackfill so all
  non-space rooms are backfilled and live-indexed (not just encrypted ones)
- Add IDB chip-only query path for unencrypted rooms in useMessageSearch
  (useIdbSearch flag, usedIdbForUnencrypted for accurate inMemoryRoomCount)
- Rename 'Encrypted Search Index' → 'Message Search Index' throughout UI
- Update SearchIndexToggle description to disclose plaintext IDB storage
- Update EncryptedSearch description to clarify in-memory-only (no write)
- Remove stale mx dep from indexEvent useCallback (isRoomEncrypted removed)
- Cap simultaneous room backfills at 2 (MAX_CONCURRENT_BACKFILLS) so the
  HTTP connection pool is never saturated by pagination requests, keeping
  the /sync long-poll responsive on mobile.
- Track Matrix sync state in syncStateRef; pause backfill when sync is
  unhealthy (Error / Reconnecting) and automatically resume via a
  ClientEvent.Sync listener when it recovers.
- Raise the requestIdleCallback fallback delay from 200 ms to 1 s for
  environments (iOS Safari) that lack the API.
- Replace the 'schedule all at once' loop in startBackfill with a proper
  queue (backfillQueueRef) drained by resumeBackfill().
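
A simplified model of the queue, concurrency cap, and sync-aware pause described above. `BackfillQueue` and its method names are hypothetical; the actual hook uses refs (`backfillQueueRef`, `syncStateRef`) and `resumeBackfill()`, whose internals are not shown here.

```typescript
class BackfillQueue {
  private readonly waiting: string[] = [];
  private readonly active = new Set<string>();
  private paused = false;
  private readonly maxConcurrent: number;
  private readonly start: (roomId: string) => void;

  constructor(maxConcurrent: number, start: (roomId: string) => void) {
    this.maxConcurrent = maxConcurrent;
    this.start = start;
  }

  enqueue(roomId: string): void {
    if (this.active.has(roomId) || this.waiting.includes(roomId)) return;
    this.waiting.push(roomId);
    this.drain();
  }

  // Called when a room's backfill finishes, freeing a slot.
  complete(roomId: string): void {
    this.active.delete(roomId);
    this.drain();
  }

  // Mirrors the sync listener: pause on Error / Reconnecting, resume on recovery.
  setSyncHealthy(healthy: boolean): void {
    this.paused = !healthy;
    if (healthy) this.drain();
  }

  // Starts queued rooms until the cap is reached or the queue is paused or empty.
  private drain(): void {
    while (!this.paused && this.active.size < this.maxConcurrent) {
      const next = this.waiting.shift();
      if (next === undefined) return;
      this.active.add(next);
      this.start(next);
    }
  }
}
```

With a cap of 2, at most two rooms paginate at once, which leaves connections free for the `/sync` long-poll.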
idbEventsToGroups now looks up each event via mx.getRoom().findEventById()
and calls toSearchEvent() for full decrypted content (url, file, info).
Falls back to msgtype m.text when the event is no longer in memory,
preventing BrokenContent from showing 'Broken message: [filename]'.

Regression introduced in 544658d which added ev.msgtype to the
synthetic event without providing the media fields renderers require.
…results

Extended IndexableEvent with url/file/info/filename fields so media events
render correctly from IDB without requiring the live room timeline cache.

Changes:
- types.ts: add optional url/file/info/filename to IndexableEvent
- toIndexableEvent: extract media fields from getContent() for m.image,
  m.file, m.audio, m.video
- searchWorker: add new fields to storeFields; bump IDB schema to v3
  (clears old index so all rooms re-backfill with full media content)
- idbEventsToGroups: reconstruct full content from stored IDB fields;
  only fall back to m.text for pre-v3 entries that lack media fields

Previously only events still in the live timeline window rendered as
images — all older history showed 'Broken message'. After re-backfill,
all indexed media events will render with full thumbnails and previews.
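
The reconstruction and fallback logic described above could be sketched as follows. The field list comes from the PR text, but `contentFromIndexed` is a hypothetical helper, and detecting a pre-v3 entry by the absence of both `url` and `file` is an assumption about how the real code tells old entries apart.

```typescript
interface IndexableEvent {
  eventId: string;
  roomId: string;
  sender: string;
  msgtype: string;
  ts: number;
  body: string;
  // Added in schema v3; absent on older entries.
  url?: string;
  file?: unknown;
  info?: unknown;
  filename?: string;
}

type Content = Record<string, unknown>;

const MEDIA_MSGTYPES = new Set(["m.image", "m.file", "m.audio", "m.video"]);

function contentFromIndexed(ev: IndexableEvent): Content {
  const isMedia = MEDIA_MSGTYPES.has(ev.msgtype);
  const hasMediaSource = ev.url !== undefined || ev.file !== undefined;

  // Assumed pre-v3 detection: a media msgtype without url/file cannot be
  // rendered as media, so present it as plain text instead of a broken message.
  if (isMedia && !hasMediaSource) {
    return { msgtype: "m.text", body: ev.body };
  }

  // Otherwise rebuild the content from whatever fields were stored.
  const content: Content = { msgtype: ev.msgtype, body: ev.body };
  if (ev.url !== undefined) content["url"] = ev.url;
  if (ev.file !== undefined) content["file"] = ev.file;
  if (ev.info !== undefined) content["info"] = ev.info;
  if (ev.filename !== undefined) content["filename"] = ev.filename;
  return content;
}
```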
…uestIdleCallback is available

MAX_CONCURRENT_BACKFILLS is now Infinity on desktop/Android (where the
browser's idle scheduler is the natural throttle) and 4 on iOS (where we
cap concurrency to protect the HTTP connection pool). Also restores the
iOS fallback delay to 150ms (was raised to 1000ms in bf4d8d6, making
backfill ~5x slower with no benefit beyond caution).
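
As a hedged illustration of the tuning above: the sketch assumes the platform split is decided by feature-detecting `requestIdleCallback` (which the truncated commit title suggests) rather than by user-agent sniffing. `BackfillTuning` and `pickBackfillTuning` are hypothetical names.

```typescript
interface BackfillTuning {
  maxConcurrent: number;
  idleFallbackDelayMs: number;
}

function pickBackfillTuning(hasIdleCallback: boolean): BackfillTuning {
  return {
    // With an idle scheduler the browser throttles work itself; without one
    // (e.g. iOS Safari) concurrency is capped to protect the connection pool.
    maxConcurrent: hasIdleCallback ? Infinity : 4,
    // Delay used in place of requestIdleCallback where the API is missing.
    idleFallbackDelayMs: 150,
  };
}

// Feature detection; false in environments that lack the API.
const hasIdleCallback = "requestIdleCallback" in globalThis;
const tuning = pickBackfillTuning(hasIdleCallback);
```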
@Just-Insane Just-Insane marked this pull request as ready for review May 20, 2026 02:49
@Just-Insane Just-Insane requested review from 7w1 and hazre as code owners May 20, 2026 02:49
Copilot AI review requested due to automatic review settings May 20, 2026 02:49

…useSearchIndex

- MessageSearch.tsx: move VALID_HAS_TYPES to module scope as
  Set<SearchHasType> (prefer-set-has); remove redundant !! on isRoom()
  call (no-unnecessary-type-conversion).
- useMessageSearch.ts: remove unnecessary 'as IResultContext' on
  EMPTY_CONTEXT; remove two unnecessary non-null assertions on searchIndex.
- searchEncryptedRooms.ts: fix no-unsafe-enum-comparison — cast
  mEvent.getType() to EventType before comparing with EventType.RoomMessage.
- useSearchIndex.tsx: replace multi-line worker.postMessage with postToWorker()
  (eliminates multi-line oxlint-disable-next-line scope issue); add
  postToWorker to useEffect dep array; return () => {} in early exits
  for consistent-return.
…ing sync begins with an initial window of 100 rooms, so
`startBackfill` only sees those 100 when it first runs. Additional
rooms are loaded progressively as the list window expands, firing
`ClientEvent.Room` on the Matrix client. A new listener for that event
enqueues each newly-discovered room (using its persisted backfill state
if present, or a fresh default state) so all rooms are eventually
indexed, not just the initial 100.
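
The "persisted state if present, otherwise a fresh default" rule for late-discovered rooms might look like this. The `BackfillState` fields and the `onRoomDiscovered` helper are hypothetical; the PR does not show the real state shape or listener body.

```typescript
// Hypothetical per-room backfill state.
interface BackfillState {
  roomId: string;
  paginationToken: string | null;
  done: boolean;
}

function stateForDiscoveredRoom(
  roomId: string,
  persisted: Map<string, BackfillState>,
): BackfillState {
  // Reuse saved progress when available, else start from scratch.
  return persisted.get(roomId) ?? { roomId, paginationToken: null, done: false };
}

// Intended to be called from a listener for the room-discovered client event.
function onRoomDiscovered(
  roomId: string,
  persisted: Map<string, BackfillState>,
  queue: BackfillState[],
): void {
  if (queue.some((s) => s.roomId === roomId)) return; // already queued
  const state = stateForDiscoveredRoom(roomId, persisted);
  if (!state.done) queue.push(state); // fully backfilled rooms need no work
}
```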