Improve chunking and search retrieval quality

## Problem Statement

LibScope's current chunking and search pipelines have several limitations that reduce retrieval quality, which is the core value proposition of a knowledge base. This issue tracks a set of targeted, incremental improvements informed by a codebase audit and an independent technical review.

---

## Current State

### Chunking (`src/core/indexing.ts`)
- `chunkContent()` splits on markdown h1-h3 headings, hard-caps at 1500 chars
- Injects heading breadcrumbs as HTML comments (noisy for embeddings)
- When `maxChunkSize` is hit, cuts at the current line — no paragraph awareness
- **Chunk embeddings contain only `chunk.content`** — document title, library, version, and topic are never embedded (biggest quality hole)
- No inter-chunk overlap, even though `chunkContentStreaming()` already uses 10% overlap

### Search (`src/core/search.ts`)
- Sequential fallback: vector → FTS5 → LIKE (no hybrid fusion)
- FTS5 query uses OR logic (`"React" OR "hooks"`) — extremely noisy for multi-word queries
- Title never factors into ranking
- Count query re-executes the full search wrapped in `SELECT COUNT(*)`
- No retrieval quality tests exist — all tests are purely functional

---

## Proposed Improvements

### Improvement 0: Retrieval Quality Benchmark & Regression Gate — P0 (FIRST)

Create a curated test corpus (~10-15 realistic docs across overlapping topics), ~20 test queries with ground-truth expected results, and metric functions (Recall@k, MRR). Establish baseline quality metrics for the current FTS5 implementation, then use those baselines as **CI regression gates** — any future change that degrades retrieval quality fails CI. Thresholds ratchet upward as improvements land.

**Components:**
- `tests/fixtures/benchmark-corpus.ts` — curated test documents
- `tests/fixtures/benchmark-queries.ts` — ground-truth query set
- `tests/fixtures/benchmark-metrics.ts` — Recall@k, MRR metric functions
- `tests/benchmark/retrieval-quality.test.ts` — benchmark test suite

**Constraints:** Runs against FTS5 only (test DB has no sqlite-vec). Uses `MockEmbeddingProvider`. This is acceptable — FTS5 is the production fallback path.
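
As a starting point for `benchmark-metrics.ts`, the two metrics could be sketched as below. The function names and the choice of document IDs as plain strings are assumptions for illustration, not existing LibScope code.

```typescript
// Recall@k: fraction of the relevant IDs that appear in the top-k results.
function recallAtK(ranked: string[], relevant: Set<string>, k: number): number {
  if (relevant.size === 0) return 0;
  const topK = new Set(ranked.slice(0, k));
  const hits = Array.from(relevant).filter((id) => topK.has(id)).length;
  return hits / relevant.size;
}

// Reciprocal rank: 1 / (1-based position of the first relevant result), or 0.
function reciprocalRank(ranked: string[], relevant: Set<string>): number {
  const index = ranked.findIndex((id) => relevant.has(id));
  return index === -1 ? 0 : 1 / (index + 1);
}

// MRR: mean of the reciprocal ranks over all benchmark queries.
function meanReciprocalRank(
  runs: { ranked: string[]; relevant: Set<string> }[],
): number {
  if (runs.length === 0) return 0;
  const total = runs.reduce(
    (sum, run) => sum + reciprocalRank(run.ranked, run.relevant),
    0,
  );
  return total / runs.length;
}
```

A regression gate would then compare these values against stored baselines and fail the test when either metric drops below its threshold.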

---

### Improvement 1: Metadata Embedding — P0

Prepend structured document metadata (title, library, version, topic) to each chunk **before computing its embedding**. Store original chunk content without prefix in the DB.

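```typescript
// Build the prefix once per document; optional fields are skipped when absent.
// `input.version` and `input.topic` are assumed to mirror `input.library`.
const metadataPrefix = [
  `Title: ${input.title}`,
  input.library ? `Library: ${input.library}` : null,
  input.version ? `Version: ${input.version}` : null,
  input.topic ? `Topic: ${input.topic}` : null,
].filter(Boolean).join("\n") + "\n\n";

// The prefix is used for embedding only; the stored chunk text stays unprefixed.
const chunksForEmbedding = chunks.map(chunk => metadataPrefix + chunk);
const embeddings = await provider.embedBatch(chunksForEmbedding);
```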

**Impact:** Fixes the biggest quality hole — metadata-based queries can't match semantically today.

---

### Improvement 2: Chunk Overlap — P0

Add configurable overlap (~150 chars, ~10% of maxChunkSize) between consecutive chunks. Tail of chunk N is prepended to chunk N+1. Align `chunkContentStreaming()` to use the same configurable overlap.

**Impact:** Industry-standard RAG practice. Significantly improves recall for boundary-spanning queries.
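
A minimal sketch of the overlap step, assuming chunks are plain strings and that overlap is applied as a post-processing pass. The name `addOverlap` and its signature are hypothetical.

```typescript
// Prepend the tail of chunk N to chunk N+1.
// The tail is taken from the original previous chunk, so overlap does not compound.
function addOverlap(chunks: string[], overlapChars = 150): string[] {
  // Guard: slice(-0) would return the whole string, so treat 0 as "no overlap".
  if (overlapChars <= 0) return chunks.slice();
  return chunks.map((chunk, i) => {
    if (i === 0) return chunk;
    const tail = chunks[i - 1].slice(-overlapChars);
    return tail + chunk;
  });
}
```

Note that overlapped chunks can exceed `maxChunkSize` by up to `overlapChars`; whether the cap should include the overlap is a decision this sketch leaves open.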

---

### Improvement 3: Replace HTML Comment Breadcrumbs — P1

Replace the HTML-comment breadcrumbs with a plain-text prefix (`H1 > H2\n`). HTML comments waste embedding token budget and dilute semantic signal.
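
For illustration, the plain-text prefix could be built along these lines. `withBreadcrumb` is a hypothetical helper, and it assumes the chunker already tracks the active heading trail as an array.

```typescript
// Join the active headings into "H1 > H2" and put them on their own line
// ahead of the chunk body. Empty headings are dropped.
function withBreadcrumb(headings: string[], body: string): string {
  const trail = headings
    .map((heading) => heading.trim())
    .filter((heading) => heading.length > 0)
    .join(" > ");
  return trail ? `${trail}\n${body}` : body;
}
```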

---

### Improvement 4: Title Boosting — P1

Boost result score when query terms appear in document title (case-insensitive, configurable factor ~1.5x). Include in `scoreExplanation.boostFactors`. Trivial to implement (~30 lines).
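
One possible shape for the boost, written as a sketch. It assumes that a higher score is better, that a match on any single query term (as a substring) triggers the boost, and that `scoreExplanation.boostFactors` is a string array; the `ScoredResult` interface and the label format are illustrative only.

```typescript
interface ScoredResult {
  title: string;
  score: number;
  scoreExplanation: { boostFactors: string[] }; // shape assumed for this sketch
}

// Multiply the score by `factor` when any query term occurs in the title
// (case-insensitive), record the boost, and re-sort by descending score.
function applyTitleBoost(
  results: ScoredResult[],
  query: string,
  factor = 1.5,
): ScoredResult[] {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  if (terms.length === 0) return results.slice();
  return results
    .map((result) => {
      const title = result.title.toLowerCase();
      if (!terms.some((term) => title.includes(term))) return result;
      return {
        ...result,
        score: result.score * factor,
        scoreExplanation: {
          boostFactors: [
            ...result.scoreExplanation.boostFactors,
            `title_match x${factor}`,
          ],
        },
      };
    })
    .sort((a, b) => b.score - a.score);
}
```

The function returns new objects rather than mutating its input, which keeps it easy to unit test.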

---

### Improvement 5: FTS5 AND-by-Default — P1

Change the FTS5 query from `"w1" OR "w2"` to `"w1" "w2"` (implicit AND). Support quoted phrase search. **No auto-OR fallback:** return zero results rather than introduce confusing, non-deterministic behavior.
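
A sketch of how the MATCH expression might be built. `buildFtsQuery` is a hypothetical name; the sketch relies on two documented FTS5 behaviors: whitespace-separated phrases are combined with an implicit AND, and a double quote inside a quoted string is escaped by doubling it.

```typescript
// Turn user input into an FTS5 MATCH expression with implicit AND.
// Double-quoted segments in the input are kept together as phrases.
function buildFtsQuery(input: string): string {
  const terms: string[] = [];
  const pattern = /"([^"]+)"|(\S+)/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(input)) !== null) {
    const raw = (match[1] ?? match[2] ?? "").trim();
    // Escape stray quotes so the result is always a valid FTS5 string.
    const escaped = raw.replace(/"/g, '""');
    if (escaped) terms.push(`"${escaped}"`);
  }
  return terms.join(" ");
}
```

An empty return value means the input contained no searchable terms, so the caller should skip the MATCH query in that case.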

---

### Improvement 6: Hybrid Search (RRF) — P0-deferred

Run vector + FTS5 search, merge via Reciprocal Rank Fusion (`score = Σ 1/(60 + rank)`). Add `searchMode` option: `hybrid`/`vector`/`keyword`. Graceful fallback. Deferred until foundation is solid.

**Testing strategy:** RRF fusion function is pure (takes two ranked lists → merged list) — fully testable without sqlite-vec. Unit tests mock vector path.
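
The pure fusion function described above could look roughly like this. The name, the use of string IDs, the 1-based rank, and the alphabetical tie-break are assumptions made for the sketch.

```typescript
// Reciprocal Rank Fusion: score(id) = sum over lists of 1 / (k + rank),
// where rank is the 1-based position of the id in each list.
function reciprocalRankFusion(
  rankedLists: string[][],
  k = 60,
): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Sort by descending score; break ties by id so output is deterministic.
  return Array.from(scores, ([id, score]) => ({ id, score })).sort(
    (a, b) => b.score - a.score || a.id.localeCompare(b.id),
  );
}
```

Because the function only sees ranked ID lists, a unit test can pass a hand-written "vector" list and a real FTS5 list without loading sqlite-vec.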

---

### Improvement 7: Lazy Count — P2

Make `totalCount` optional, add `hasMore: boolean`. Skip expensive double count query by default. Opt-in via `countMode: 'exact'`.
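
One common way to compute `hasMore` without a count query is to fetch one row more than the page size. The sketch below assumes that approach; the issue does not prescribe it. `paginate`, `Page`, and the synchronous callbacks are hypothetical and stand in for the real query functions.

```typescript
interface Page<T> {
  results: T[];
  hasMore: boolean;
  totalCount?: number; // only set when an exact count was requested
}

// Fetch limit + 1 rows: the extra row signals that another page exists.
// `countExact` is only invoked when the caller opts in (countMode: 'exact').
function paginate<T>(
  fetchRows: (limit: number, offset: number) => T[],
  limit: number,
  offset = 0,
  countExact?: () => number,
): Page<T> {
  const rows = fetchRows(limit + 1, offset);
  const page: Page<T> = {
    results: rows.slice(0, limit),
    hasMore: rows.length > limit,
  };
  if (countExact) page.totalCount = countExact();
  return page;
}
```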

---

### Improvement 8: Paragraph-Boundary Splitting — P3

When maxChunkSize is hit, scan backward for `\n\n` within last 200 chars. If found, split there. No code block tracking, no strategy enum. 80% of benefit for 20% of complexity.
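
The backward scan can be sketched as follows. Function names are hypothetical, and the fallback here is a hard cut at `maxChunkSize` for simplicity, whereas the existing chunker cuts at the current line.

```typescript
// Return the index at which to cut `text`: just after the last "\n\n" found
// within the final `lookback` chars before maxChunkSize, else maxChunkSize.
function findSplitPoint(text: string, maxChunkSize: number, lookback = 200): number {
  if (text.length <= maxChunkSize) return text.length;
  const windowStart = Math.max(0, maxChunkSize - lookback);
  // Start at maxChunkSize - 2 so the whole "\n\n" fits inside the chunk.
  const breakAt = text.lastIndexOf("\n\n", maxChunkSize - 2);
  return breakAt > 0 && breakAt >= windowStart ? breakAt + 2 : maxChunkSize;
}

// Repeatedly cut the text at the chosen split points.
function splitAtParagraphs(text: string, maxChunkSize = 1500, lookback = 200): string[] {
  if (maxChunkSize <= 0) throw new Error("maxChunkSize must be positive");
  const chunks: string[] = [];
  let rest = text;
  while (rest.length > 0) {
    const cut = findSplitPoint(rest, maxChunkSize, lookback);
    chunks.push(rest.slice(0, cut));
    rest = rest.slice(cut);
  }
  return chunks;
}
```

Concatenating the returned chunks reproduces the input exactly, which makes the splitter straightforward to property-test.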

---

## Implementation Priority & Order

| # | Improvement | Impact | Effort | Priority |
|---|-------------|--------|--------|----------|
| 0 | Quality Benchmark & Regression Gate | 🔴 High | Medium | **P0 — FIRST** |
| 1 | Metadata Embedding | 🔴 High | Low | **P0** |
| 2 | Chunk Overlap | 🔴 High | Low | **P0** |
| 3 | Text Breadcrumbs | 🟠 Medium | Low | **P1** |
| 4 | Title Boosting | 🟠 Medium | Low | **P1** |
| 5 | FTS5 AND-by-Default | 🟠 Medium | Low | **P1** |
| 6 | Hybrid Search (RRF) | 🔴 High | Medium | **P0-deferred** |
| 7 | Lazy Count | 🟢 Low | Medium | **P2** |
| 8 | Paragraph Splitting | 🟢 Low | Low | **P3** |

**Order:** 0 → 1 → 2 → 3 → 4 → 5 → 6 → 7 → 8

## Notes

- **Quality-first:** Benchmark (0) is implemented first so every subsequent improvement is objectively measurable. Thresholds ratchet upward after each improvement.
- **Re-indexing:** Chunking improvements (1-3, 8) require existing docs to be re-indexed. Consider a `chunking_version` field for selective re-indexing.
- **API compatibility:** Improvement 7 changes `SearchResponse` shape — use feature flags / make `totalCount` optional.
- **Testing:** Hybrid search RRF fusion logic should be a pure function, testable without sqlite-vec.
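
To illustrate the `chunking_version` idea from the re-indexing note, a selective re-index could filter documents as sketched here. The constant, the `IndexedDoc` shape, and the convention that a missing version means "indexed before versioning" are all assumptions.

```typescript
// Hypothetical: bump this whenever the chunking output changes.
const CURRENT_CHUNKING_VERSION = 2;

interface IndexedDoc {
  id: string;
  chunkingVersion?: number; // undefined for docs indexed before versioning
}

// Select only the documents chunked by an older version of the chunker.
function docsNeedingReindex(
  docs: IndexedDoc[],
  currentVersion = CURRENT_CHUNKING_VERSION,
): IndexedDoc[] {
  return docs.filter((doc) => (doc.chunkingVersion ?? 0) < currentVersion);
}
```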
