MFT/USN-journal indexing for local NTFS roots (+ search/index fixes & perf) #10
Adds the approved design for an alternative indexing backend that reads the NTFS MFT directly (full size/date parity) and tracks changes via the USN journal, with auto-detect of elevation + NTFS and transparent fallback to the existing parallel crawler. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…xing
Two phased, TDD, bite-sized plans:
- Phase 1 (mft-bulk-scan): pure parsers (data-run, FILE record, path build, entry assembler, strategy selector) + Win32 volume interop + crawler fallback.
- Phase 2 (usn-incremental): frn/journal schema, USN parser, journal interop, delta application, incremental-first refresh with rescan-on-gap.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Lets callers build an entry from already-known parts (name + parent dir), avoiding per-file path parsing. FromFileSystem now delegates to it. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Crawler: work-queue parallel directory enumeration (uses CrawlParallelism, default raised to 8) with serialized onBatch; cheaper entry construction via shared parent string + FromComponents.
- IndexStore: drop 4 secondary indexes never used by any query (search is in-memory), giving ~1.5x faster bulk inserts; add temp_store/mmap/cache pragmas.
- SearchEngine: PLINQ filter (AsParallel.AsOrdered) above 20k entries.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
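The size-gated PLINQ filter mentioned for SearchEngine could look roughly like the sketch below. The `Entry` record and `ThresholdFilter` class are hypothetical stand-ins, not the PR's types; only the 20k cutoff and the `AsParallel().AsOrdered()` combination come from the commit message.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the index entry type; the real FileEntry has more fields.
public sealed record Entry(string Name, long Size);

public static class ThresholdFilter
{
    // Cutoff taken from the "above 20k entries" note in the commit message.
    public const int ParallelThreshold = 20_000;

    public static List<Entry> Filter(IReadOnlyList<Entry> entries, Func<Entry, bool> predicate)
    {
        // Small inputs: sequential LINQ avoids PLINQ's partitioning and merge overhead.
        if (entries.Count <= ParallelThreshold)
            return entries.Where(predicate).ToList();

        // Large inputs: evaluate the predicate across cores.
        // AsOrdered keeps results in source order, so result ordering does not change.
        return entries.AsParallel().AsOrdered().Where(predicate).ToList();
    }
}
```

The predicate has to be free of side effects for the parallel branch to be safe, which is one reason to keep the filter a pure function of the entry.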
- Content search now scans the whole index filtered by size/date/type (not the name match), so it actually finds text inside files; made cancellable.
- Placeholder hint aligned to the real caret origin (border + padding).
- RefreshAsync builds the crawler with the configured parallelism.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Pre-flight review caught a prefix-match bug in the Phase 1 entry assembler (C:\Me would capture C:\MeToo); use an exact-or-separator-boundary check and add a sibling-prefix test case. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
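For illustration, an exact-or-separator-boundary check of the kind described above might be written as follows. `PathScope.IsUnderRoot` is a hypothetical name, and the case-insensitive comparison is an assumption based on NTFS's default behavior rather than something the PR states.

```csharp
using System;

public static class PathScope
{
    // True when `path` is `root` itself or lies beneath it.
    public static bool IsUnderRoot(string path, string root)
    {
        // Normalize so "C:\Me" and "C:\Me\" compare the same.
        string r = root.TrimEnd('\\');

        // Assumption: compare case-insensitively, as NTFS does by default.
        if (!path.StartsWith(r, StringComparison.OrdinalIgnoreCase))
            return false;

        // Accept an exact match, or require a separator right after the prefix,
        // so that "C:\MeToo" is not captured by the root "C:\Me".
        return path.Length == r.Length || path[r.Length] == '\\';
    }
}
```

With this check, `C:\Me\a.txt` is inside the root `C:\Me`, while the sibling `C:\MeToo\a.txt` is not.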
… tests Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…y parser Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The plan's example read volume geometry from the wrong offsets; align it with the implemented NtfsVolumeData.Parse (BytesPerCluster@0x2C, BytesPerFileRecordSegment@0x30, MftStartLcn@0x40). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
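A minimal sketch of reading those three fields at the stated offsets is shown below. `VolumeGeometry` is a hypothetical type, not the PR's NtfsVolumeData, and the field widths (two 32-bit values and one 64-bit value, little-endian) are assumptions based on the documented NTFS_VOLUME_DATA_BUFFER layout.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical shape; the PR's NtfsVolumeData may expose different members.
public readonly record struct VolumeGeometry(
    uint BytesPerCluster, uint BytesPerFileRecordSegment, long MftStartLcn)
{
    public static VolumeGeometry Parse(ReadOnlySpan<byte> buf)
    {
        // MftStartLcn ends at 0x48, so anything shorter cannot hold all three fields.
        if (buf.Length < 0x48)
            throw new ArgumentException("buffer too short for NTFS volume data");

        return new VolumeGeometry(
            BinaryPrimitives.ReadUInt32LittleEndian(buf.Slice(0x2C, 4)),  // BytesPerCluster
            BinaryPrimitives.ReadUInt32LittleEndian(buf.Slice(0x30, 4)),  // BytesPerFileRecordSegment
            BinaryPrimitives.ReadInt64LittleEndian(buf.Slice(0x40, 8)));  // MftStartLcn
    }
}
```

Keeping the parser a pure function over a byte span is what makes it testable against synthetic buffers, without a real volume handle.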
…a tested parser Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The plan read the cursor from 0x08 (FirstUsn); NextUsn is at 0x10. Align with the implemented pure UsnJournalData.Parse. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
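The offset fix can be sketched as a pure parser over the journal query buffer. `JournalCursor` is a hypothetical type, not the PR's UsnJournalData, and the journal ID at offset 0x00 is an assumption based on the documented USN_JOURNAL_DATA layout; the commit message only confirms FirstUsn at 0x08 and NextUsn at 0x10.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical shape; the PR's UsnJournalData may differ.
public readonly record struct JournalCursor(ulong JournalId, long FirstUsn, long NextUsn)
{
    public static JournalCursor Parse(ReadOnlySpan<byte> buf)
    {
        // NextUsn ends at 0x18.
        if (buf.Length < 0x18)
            throw new ArgumentException("buffer too short for USN journal data");

        return new JournalCursor(
            BinaryPrimitives.ReadUInt64LittleEndian(buf.Slice(0x00, 8)),  // UsnJournalID (assumed offset)
            BinaryPrimitives.ReadInt64LittleEndian(buf.Slice(0x08, 8)),   // FirstUsn: oldest record still in the journal
            BinaryPrimitives.ReadInt64LittleEndian(buf.Slice(0x10, 8)));  // NextUsn: the cursor to persist
    }
}
```

Persisting FirstUsn instead of NextUsn would make each refresh replay the whole journal, which is why the two offsets must not be confused.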
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 84c5dc0c14
var frns = changes.Select(c => c.Frn).Distinct().ToList();
// Old rows for every touched FRN go first (handles delete + rename-away cleanly).
_store.RemoveByFrn(rootId, frns);
Reconcile subtree paths for directory USN changes
When the USN batch contains a directory rename or move, Windows reports the changed directory FRN but not every descendant, while this code deletes/reinserts only the directly touched FRNs. Because child rows store the old ParentPath, all files under that directory remain indexed at their previous paths until each child happens to change; directory moves out of the indexed root similarly leave stale descendants. Directory FRNs need a subtree update/rescan rather than only RemoveByFrn on the touched set.
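One possible shape for the subtree update is sketched below, working on in-memory rows. `Row` and `SubtreeRename.Apply` are hypothetical and not part of the PR; a real fix would more likely run as a store-level update, or fall back to a rescan of the affected directory.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row shape: each entry stores its parent directory path.
public sealed record Row(string Name, string ParentPath);

public static class SubtreeRename
{
    // Rewrites ParentPath for every descendant of oldDir so it points under newDir.
    // The directory's own row is assumed to be handled by the existing
    // delete-then-reinsert by FRN.
    public static List<Row> Apply(IEnumerable<Row> rows, string oldDir, string newDir)
    {
        return rows.Select(r =>
        {
            // Direct children of the renamed directory.
            if (r.ParentPath.Equals(oldDir, StringComparison.OrdinalIgnoreCase))
                return r with { ParentPath = newDir };

            // Deeper descendants: match on a separator boundary so that
            // a sibling such as "C:\DataOld" is not rewritten along with "C:\Data".
            if (r.ParentPath.StartsWith(oldDir + "\\", StringComparison.OrdinalIgnoreCase))
                return r with { ParentPath = newDir + r.ParentPath.Substring(oldDir.Length) };

            return r;
        }).ToList();
    }
}
```

A move out of the indexed root would need the same descendant match, with the matched rows deleted rather than rewritten.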
if (!vol.DeviceControl(NativeMethods.FSCTL_READ_USN_JOURNAL, input, outBuf, out var returned))
    return (startUsn, Array.Empty<UsnChange>());
Treat failed USN reads as requiring a rescan
If the stored cursor has fallen behind the journal or FSCTL_READ_USN_JOURNAL fails transiently, this returns an empty change list with the original startUsn; the caller then applies no deltas and stores that cursor as if the refresh succeeded. In that scenario any deletes or renames in the missed range remain stale indefinitely, so a failed read should be surfaced as a journal mismatch/full-rescan condition rather than "no changes".
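A sketch of surfacing the failure explicitly, instead of folding it into "no changes", follows. All types here are hypothetical, and the `ReadJournal` delegate merely stands in for the FSCTL_READ_USN_JOURNAL call so the control flow can be exercised without a volume handle.

```csharp
using System;
using System.Collections.Generic;

public enum UsnReadStatus { Ok, RescanRequired }

// Hypothetical result type: the status tells the caller whether NextUsn is trustworthy.
public sealed record UsnReadResult(UsnReadStatus Status, long NextUsn, IReadOnlyList<long> ChangedFrns);

public static class UsnRefresh
{
    // Stand-in for the native journal read; returns false when the read fails.
    public delegate bool ReadJournal(long startUsn, out long nextUsn, out List<long> frns);

    public static UsnReadResult Read(long startUsn, ReadJournal read)
    {
        if (!read(startUsn, out long next, out List<long> frns))
        {
            // A failed read is not "no changes": report it so the caller
            // runs a full rescan rather than storing the old cursor as success.
            return new UsnReadResult(UsnReadStatus.RescanRequired, startUsn, Array.Empty<long>());
        }
        return new UsnReadResult(UsnReadStatus.Ok, next, frns);
    }
}
```

The caller would then advance the stored cursor only on `Ok`, and treat `RescanRequired` the same way as a journal ID mismatch.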
catch (Exception) // MFT path failed → transparent fallback to the crawler
{
    mgr.UpdateRoot(id, path, token, progress);
Clear USN cursors after crawler fallback
When the MFT path throws after a previous successful MFT scan, this fallback rewrites rows through the crawler, whose entries have Frn == null, but the root's stored usn_journal_id/usn_next is left intact. The next elevated refresh will take the incremental branch and RemoveByFrn cannot delete or move those null-FRN rows, leaving stale or duplicate paths for later deletes/renames; the fallback should clear the USN state or force the next MFT pass to do a full scan.
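The suggested fix might look like the sketch below: reset the per-root journal state before handing over to the crawler. `RootState` and `FallbackRefresh` are hypothetical; the PR keeps usn_journal_id/usn_next in the index store, and the two `Action` parameters stand in for the MFT scan and the crawler.

```csharp
using System;

// Hypothetical per-root state mirroring the usn_journal_id / usn_next columns.
public sealed class RootState
{
    public ulong? UsnJournalId { get; set; }
    public long? UsnNext { get; set; }
}

public static class FallbackRefresh
{
    public static void Refresh(RootState root, Action mftScan, Action crawl)
    {
        try
        {
            mftScan();
        }
        catch (Exception)
        {
            // Crawler rows carry no FRN, so a surviving cursor would send the next
            // elevated refresh down the incremental branch against rows it cannot match.
            // Clearing it forces the next MFT pass to do a full scan.
            root.UsnJournalId = null;
            root.UsnNext = null;
            crawl();
        }
    }
}
```

Clearing the state before calling the crawler means the cursor is already reset even if the crawl itself is cancelled or fails.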
Summary
Adds WizFile/Everything-style indexing for local NTFS drives by reading the MFT directly and tracking changes via the USN journal, plus a batch of search/index fixes and speedups. The fast path is pure upside: it activates only when the process is elevated and the root is a local fixed NTFS volume, and any failure transparently falls back to the existing crawler — non-elevated / network / non-NTFS roots behave exactly as before.
Delivered in three parts (see docs/superpowers/specs and docs/superpowers/plans):
Search/index fixes & perf
- CrawlParallelism (hides SMB latency).
- FileEntry construction (shared parent string) → ~5× faster; PRAGMA tuning; PLINQ filter above 20k entries.
Phase 1: MFT bulk scan
New src/NetSearch.Core/Native/ layer: pure, unit-tested parsers (DataRunParser, MftRecordParser, PathBuilder, MftEntryAssembler, NtfsVolumeData) + Win32 interop (NativeMethods, NtfsVolume, MftEnumerator) + IndexStrategySelector. Wired into IndexManager/MainViewModel with per-root backend choice and crawler fallback.
Phase 2: USN incremental
entries.frn column + per-root journal cursor, UsnRecordParser, UsnJournalData, UsnJournal interop, IndexManager.ApplyUsnDeltas (delete-then-reinsert by FRN), incremental-first refresh with full rescan on journal gap/rotation.
Testing
- dotnet build: 0 warnings, 0 errors (Core net9.0, App net9.0-windows).
- dotnet test: 83/83 passing. All byte-level parsing is unit-tested against synthetic records.
- docs/superpowers/manual-checks/ (mft-bulk-scan.md, usn-incremental.md). The transparent fallback keeps the app correct even if the native path has a latent issue.
Notes / non-blocking follow-ups
- ReadMftExtents intentionally throws on a corrupt MFT record 0 (→ crawler fallback) rather than returning empty extents, which would otherwise look like a zero-entry success.
🤖 Generated with Claude Code