blokzdev · blokzdev · Jun 3, 2026 · Jun 3, 2026
diff --git a/docs/BACKLOG.md b/docs/BACKLOG.md
@@ -8,6 +8,17 @@
 _(nothing active — pick the next batch from below)_
 
 ## Deferred / future refinements
+- [ ] **GraphRAG — retrieval-only answer persistence.** On low / ineligible tiers (`ragAvailability ==
+      retrievalOnly`) P13d falls back to an ephemeral "most relevant items" answer that **isn't** saved to the
+      chat history (nothing is generated to revisit). Decide at d-3 whether these should be persisted as a
+      special turn type or kept purely transient. *(From P13d-1.)*
+- [ ] **GraphRAG — per-tier history-budget tuning.** P13d-1's `fitHistory` bounds the fed-back chat history by
+      a char budget (`historyCharBudget`, default 1500). The budget should scale with the device tier / model
+      context window (flagship → deeper) rather than a single constant; tune against real models at d-3.
+      *(From P13d-1.)*
+- [ ] **GraphRAG — LLM + Cozo HNSW RAM co-residency.** "Ask your library" runs the generation model **and**
+      the live HNSW vector index in RAM together. Validate co-residency (and tune retrieval `k` / source caps)
+      on real low/mid devices so it doesn't OOM — carried from P12d-2; verified at P13d-3. *(From P13d-1.)*
 - [ ] **Library "hide / filter AI tags" facet.** P13c-2 marks AI-applied tags (`media_tags.source = 'ai'`)
       and shows a ✦ on their chips, but the library tag facet (`watchDistinctTags`) treats them like any tag.
       Add a "hide AI tags" / "AI-tagged only" filter (and maybe a bulk "remove all AI tags on this item") if

diff --git a/docs/VERIFICATION.md b/docs/VERIFICATION.md
@@ -985,11 +985,18 @@ entries, or verify after P11c lands.)*
 - [ ] A **manually-added** tag has no marker. **Default off:** downloads aren't auto-tagged; with generation
       off there's a one-time "finish setting up auto-tagging" nudge; the queue still drains.
 
+### P13d-1 — GraphRAG retrieval engine  *(CI-covered; no APK check)*
+- No on-device check: P13d-1 ships the **pure-Dart retrieval/context engine** only (no UI, schema, or
+      native path). It's exercised by unit tests (fake embedder + graph + seeded in-memory metadata). The
+      end-to-end **"Ask your library"** flow is verified at P13d-2 (chat screen + generation).
+
 ### P13 (later subphases)
 - [ ] **Transcription / summarization / translation / OCR** each work (capability-gated) and write
       results back to the item.
-- [ ] **"Ask your library"**: a natural-language question returns a grounded answer citing real
-      library items — **fully offline** (airplane mode).
+- [ ] **"Ask your library"**: a multi-turn chat answers natural-language questions with grounded answers
+      citing real library items — **fully offline** (airplane mode); conversations persist (list / continue
+      / rename / archive / delete); low / ineligible tiers fall back to a retrieval-only "most relevant
+      items" answer.
 - [ ] **Graph-clustered auto-albums**, **"Rediscover"** (centrality), and **path/bridge** discovery
       produce sensible results.
 - [ ] All P13 features gate gracefully on incapable devices.

diff --git a/docs/design/P13-PLAN.md b/docs/design/P13-PLAN.md
@@ -213,29 +213,51 @@ user-curated (they drive facets), AI tags are **marked** (provenance) rather tha
   'ai' + entry; default-off no-op). **No deps.** **Pending APK spot-check** (real download → AI-marked tags +
   facets, offline). A library "hide/filter AI tags" facet is deferred (BACKLOG).
 
-### `[ ]` P13d — Local GraphRAG "Ask your library" *(flagship; split into 3 PRs)*
+### `[~]` P13d — Local GraphRAG "Ask your library" *(flagship; split into 4 PRs)*
 The headline differentiator — natural-language Q&A grounded in the private library, fully on-device
 (AI-SPEC §6, GRAPH-SPEC §7). Sequenced **mid-phase** so the generation patterns (P13a/c) are proven first.
+**Revised target (maintainer call): a real multi-turn chat**, not single-shot — persistent conversations
+(list / continue / rename / archive / delete) on capable tiers, each turn re-retrieving **fresh RAG sources**
+plus a **bounded recent-history window** whose depth scales with the device tier; entry from the **Dashboard**.
+Incapable / low tiers fall back to an ephemeral **retrieval-only** answer (d-3).
 
-#### `[ ]` P13d-1 — Retrieval + context & citation assembly *(pure Dart; CI-verifiable)*
-- A pure-Dart **retrieval/context packer** that reuses `GraphQueryService.relatedTo` (vector + graph re-rank)
-  and `neighborhood` to select the most relevant nodes + their graph neighborhood for a query, then assembles
-  a **bounded, cited** context block (node → deep-linkable item) and the generation prompt. No UI, no model —
-  fully unit-testable.
-- **Exit / review:** for a seeded graph, the packer returns the expected relevant nodes + a well-formed,
-  size-bounded prompt with stable citations; covered by unit tests.
+#### `[~]` P13d-1 — Retrieval + context & citation assembly *(pure Dart; CI-verifiable)*
+- A pure-Dart **retrieval/context packer** that reuses the P10 semantic substrate (`embedderEngine.embed` →
+  `GraphQueryService.vectorSearch`) plus a light `relatedTo` graph re-rank to select the most relevant items
+  for a query, then assembles a **bounded, cited** context block (item → deep-linkable source) and a
+  **history-aware** generation prompt. No UI, no model, no schema — fully unit-testable.
+- **History-aware prompt builder** (`fitHistory` char-budget knob) so d-2 multi-turn drops in cleanly and the
+  per-tier history depth is a graceful budget, not a hard mode switch.
+- **Status:** implemented (CI-green). New `lib/features/ai/data/`: `rag_context.dart` (pure — `RagSource`,
+  `RagChatTurn`, `RagContext`, `kRagSystemPrompt`, `buildSourceSnippet`, `selectRagSources`, `fitHistory`,
+  `buildRagPrompt`), `rag_availability.dart` (pure — `RagAvailability {unavailable, retrievalOnly, full}` +
+  `ragAvailability(...)`, the d-3 gate), `rag_retriever.dart` (`RagRetriever` + provider: embed → vectorSearch
+  → `relatedTo` re-rank → hydrate via `MetadataRepository` → cited context; empty-sources when retrieval isn't
+  ready). Tests: prompt/snippet/`fitHistory`/`selectRagSources`, the `ragAvailability` truth table, and the
+  retriever with fake embedder + graph + seeded in-memory metadata. **No deps, no schema, no UI.**
+- **Exit / review:** for seeded sources, the retriever returns the expected ordered, cited items + a
+  well-formed, size-bounded, history-aware prompt; degrades to empty-sources when retrieval is unavailable;
+  covered by unit tests. ✓
 
-#### `[ ]` P13d-2 — Chat UI + streaming grounded answer + citations *(native; APK)*
-- A dedicated **"Ask your library"** screen (reached from Dashboard/Library) that runs P13d-1's context
-  through `GenerationEngine.generate()` and **streams** a grounded answer with **tappable citations** that
-  deep-link to the cited library items.
-- **Exit / review:** ask a natural-language question on a capable device and get a streamed, grounded answer
-  citing real library items **offline**; citations navigate correctly. APK spot-check.
+#### `[ ]` P13d-2a — Chat schema + Ask screen (single conversation) *(native; APK)*
+- Drift **`chats` + `chat_messages`** schema; a dedicated **"Ask your library"** screen from the Dashboard
+  that runs P13d-1's per-turn fresh retrieval + bounded history through `GenerationEngine.generate()` and
+  **streams** a grounded answer with **tappable citations** deep-linking to the cited items. Generation-gated
+  via `aiSummaryAction` (on-ramp when no model).
+- **Exit / review:** ask a natural-language question on a capable device → a streamed, grounded, cited answer
+  **offline**; the turn persists; citations navigate. APK spot-check.
 
-#### `[ ]` P13d-3 — Low-tier fallback + RAM co-residency validation *(native; APK)*
-- On ineligible / low tiers, fall back to **retrieval-only** ("here are the most relevant items") plus the
-  extractive summary — no generation, clearly framed. Validate **LLM + Cozo HNSW RAM co-residency** on real
-  devices (the index lives in RAM with the model — BACKLOG from P12d-2) and tune limits.
+#### `[ ]` P13d-2b — Conversation list + manage *(native)*
+- A conversation **list** with **continue / rename / archive / delete**; resuming a chat re-feeds the bounded
+  history into each new turn's prompt.
+- **Exit / review:** prior chats list, reopen and continue with retained context, and archive/delete/rename
+  behave; covered where CI can (provider/repository) + an APK spot-check for the flow.
+
+#### `[ ]` P13d-3 — Low-tier fallback + tier-aware depth + RAM co-residency *(native; APK)*
+- On ineligible / low tiers (`ragAvailability == retrievalOnly`), fall back to an ephemeral **retrieval-only**
+  answer ("here are the most relevant items") — no generation, clearly framed, nothing persisted. Tune the
+  **tier-aware history-depth** budget. Validate **LLM + Cozo HNSW RAM co-residency** on real devices (the index
+  lives in RAM with the model — BACKLOG from P12d-2) and tune limits.
 - **Exit / review:** a low-end device gives a useful retrieval-only answer without OOM; a capable device runs
   generation + the live HNSW index together within memory budget (verified on real hardware).
 

diff --git a/lib/features/ai/data/rag_availability.dart b/lib/features/ai/data/rag_availability.dart
@@ -0,0 +1,31 @@
+/// Pure decision for whether "Ask your library" (P13d) can run, and at what
+/// level — so the UI gates consistently and d-3's retrieval-only fallback has a
+/// single source of truth.
+library;
+
+/// What the Ask feature can do on this device right now.
+enum RagAvailability {
+  /// No retrieval index (no embedder / graph) — the feature can't run.
+  unavailable,
+
+  /// Retrieval works but there's no generation model (low/ineligible tier) →
+  /// answer with "most relevant items" only (d-3 fallback), no LLM.
+  retrievalOnly,
+
+  /// Full GraphRAG: retrieve + generate a grounded, cited answer.
+  full,
+}
+
+/// [generationEligible] is whether the device tier offers a generation model;
+/// [embedderReady] is whether semantic search (the query embedder) is ready;
+/// [graphAvailable] is whether the on-device graph/vector store is usable.
+RagAvailability ragAvailability({
+  required bool generationEligible,
+  required bool embedderReady,
+  required bool graphAvailable,
+}) {
+  if (!embedderReady || !graphAvailable) return RagAvailability.unavailable;
+  return generationEligible
+      ? RagAvailability.full
+      : RagAvailability.retrievalOnly;
+}
diff --git a/lib/features/ai/data/rag_context.dart b/lib/features/ai/data/rag_context.dart
@@ -0,0 +1,137 @@
+/// Pure, engine-free building blocks for the local GraphRAG "Ask your library"
+/// retrieval (P13d-1): the grounding-source + context types, the prompt builder,
+/// source selection, and history-window fitting. Kept out of the retriever/UI so
+/// the prompt shape + bounds are unit-testable in isolation.
+library;
+
+/// System instruction: answer only from the provided sources, cite them, and
+/// admit ignorance rather than invent. On-device; nothing leaves the device.
+const String kRagSystemPrompt =
+    "You answer questions about the user's personal media library using ONLY "
+    'the numbered sources provided. Cite the sources you use inline as [n]. If '
+    'the sources do not contain the answer, say you do not know — never invent '
+    'items, facts, or citations.';
+
+/// One retrieved library item used to ground an answer. [index] is its 1-based
+/// citation number; [snippet] is the compact, capped text the model sees.
+class RagSource {
+  const RagSource({
+    required this.index,
+    required this.itemId,
+    required this.title,
+    required this.snippet,
+  });
+
+  final int index;
+  final String itemId;
+  final String title;
+  final String snippet;
+}
+
+/// A prior question/answer turn, for multi-turn history (fed back, bounded).
+class RagChatTurn {
+  const RagChatTurn({required this.question, required this.answer});
+  final String question;
+  final String answer;
+}
+
+/// The assembled retrieval context + prompt for one question.
+class RagContext {
+  const RagContext({
+    required this.question,
+    required this.sources,
+    required this.systemPrompt,
+    required this.prompt,
+  });
+
+  final String question;
+  final List<RagSource> sources;
+  final String systemPrompt;
+  final String prompt;
+
+  bool get hasSources => sources.isNotEmpty;
+}
+
+/// Builds a compact, capped grounding snippet for one item from its signals.
+/// Prefers the distilled `aiSummary` over the raw description; includes a slice
+/// of the transcript + OCR text; whole thing is truncated to [maxChars].
+String buildSourceSnippet({
+  String? uploader,
+  List<String> tags = const [],
+  String? description,
+  String? transcript,
+  String? aiSummary,
+  String? ocrText,
+  int maxChars = 400,
+}) {
+  String? clean(String? s) =>
+      (s != null && s.trim().isNotEmpty) ? s.trim() : null;
+  final parts = <String>[
+    if (clean(uploader) != null) 'by ${uploader!.trim()}',
+    if (tags.isNotEmpty) 'tags: ${tags.join(', ')}',
+    ?(clean(aiSummary) ?? clean(description)),
+    if (clean(transcript) != null) transcript!.trim(),
+    if (clean(ocrText) != null) 'text in image: ${ocrText!.trim()}',
+  ];
+  final joined = parts.join(' · ');
+  return joined.length > maxChars
+      ? joined.substring(0, maxChars).trimRight()
+      : joined;
+}
+
+/// De-duplicates [orderedIds] (preserving order) and caps to [max] — the final
+/// source set, most-relevant first.
+List<String> selectRagSources(List<String> orderedIds, {int max = 6}) {
+  final seen = <String>{};
+  final out = <String>[];
+  for (final id in orderedIds) {
+    if (seen.add(id)) out.add(id);
+    if (out.length >= max) break;
+  }
+  return out;
+}
+
+/// Keeps the most **recent** history turns that fit within [charBudget]
+/// (oldest dropped first), returned chronologically. The tier knob's mechanism:
+/// a smaller budget on smaller models feeds back less history.
+List<RagChatTurn> fitHistory(List<RagChatTurn> turns, int charBudget) {
+  final kept = <RagChatTurn>[];
+  var used = 0;
+  for (final t in turns.reversed) {
+    final cost = t.question.length + t.answer.length;
+    if (used + cost > charBudget && kept.isNotEmpty) break;
+    kept.add(t);
+    used += cost;
+    if (used >= charBudget) break;
+  }
+  return kept.reversed.toList();
+}
+
+/// Assembles the user prompt: a bounded slice of prior turns (if any), the
+/// numbered sources, and the question.
+String buildRagPrompt(
+  String question,
+  List<RagSource> sources, {
+  List<RagChatTurn> history = const [],
+  int historyCharBudget = 1500,
+}) {
+  final b = StringBuffer();
+  final fitted = fitHistory(history, historyCharBudget);
+  if (fitted.isNotEmpty) {
+    b.writeln('Conversation so far:');
+    for (final t in fitted) {
+      b
+        ..writeln('Q: ${t.question}')
+        ..writeln('A: ${t.answer}');
+    }
+    b.writeln();
+  }
+  b.writeln('Sources:');
+  for (final s in sources) {
+    b.writeln('[${s.index}] ${s.title} — ${s.snippet}');
+  }
+  b
+    ..writeln()
+    ..write('Question: $question');
+  return b.toString();
+}
diff --git a/lib/features/ai/data/rag_retriever.dart b/lib/features/ai/data/rag_retriever.dart
@@ -0,0 +1,91 @@
+import 'package:flutter_riverpod/flutter_riverpod.dart';
+import 'package:grabbit/core/ai/embedder_engine_provider.dart';
+import 'package:grabbit/core/graph/graph_query_provider.dart';
+import 'package:grabbit/features/ai/data/rag_context.dart';
+import 'package:grabbit/features/library/data/metadata_repository.dart';
+import 'package:grabbit/features/library/presentation/semantic_search_provider.dart';
+
+/// Retrieves the most relevant library items for a question and assembles the
+/// grounding context + prompt for the local LLM (P13d-1) — the engine the Ask
+/// chat (d-2) drives. Reuses the existing semantic-search substrate (embed →
+/// vector search) + a light graph expansion; degrades to an empty-sources
+/// context when retrieval isn't available (no embedder / empty index) so the
+/// caller can fall back gracefully. No generation here — that's d-2.
+class RagRetriever {
+  RagRetriever(this._ref);
+
+  final Ref _ref;
+
+  /// Retrieves sources for [question] and builds the prompt. [history] (prior
+  /// turns) is folded in, bounded by [historyCharBudget] (the tier knob).
+  Future<RagContext> retrieve(
+    String question, {
+    List<RagChatTurn> history = const [],
+    int historyCharBudget = 1500,
+    int maxSources = 6,
+    int k = 30,
+  }) async {
+    final q = question.trim();
+    final empty = RagContext(
+      question: q,
+      sources: const [],
+      systemPrompt: kRagSystemPrompt,
+      prompt: '',
+    );
+    if (q.isEmpty) return empty;
+    // Retrieval needs the query embedder ready; the vector search itself returns
+    // [] when the graph/index is unavailable.
+    if (!await _ref.read(semanticSearchReadyProvider.future)) return empty;
+
+    final vector = await _ref.read(embedderEngineProvider).embed(q);
+    final query = _ref.read(graphQueryServiceProvider);
+    final hits = await query.vectorSearch(vector, k: k);
+    if (hits.isEmpty) return empty;
+
+    // Light graph re-rank: add a few items connected to the top hit so context
+    // isn't purely vector-nearest (bounded; cheap on modest libraries).
+    final related = await query.relatedTo(hits.first.id, limit: 4);
+    final ids = selectRagSources([
+      for (final h in hits) h.id,
+      ...related,
+    ], max: maxSources);
+
+    final repo = _ref.read(metadataRepositoryProvider);
+    final sources = <RagSource>[];
+    for (final id in ids) {
+      final item = await repo.mediaItemById(id);
+      if (item == null) continue;
+      final meta = await repo.metadataForItem(id);
+      final tags = await repo.tagNamesForItem(id);
+      sources.add(
+        RagSource(
+          index: sources.length + 1,
+          itemId: id,
+          title: item.title,
+          snippet: buildSourceSnippet(
+            uploader: meta?.uploader,
+            tags: tags,
+            description: meta?.description,
+            transcript: meta?.transcript,
+            aiSummary: meta?.aiSummary,
+            ocrText: meta?.ocrText,
+          ),
+        ),
+      );
+    }
+    if (sources.isEmpty) return empty;
+    return RagContext(
+      question: q,
+      sources: sources,
+      systemPrompt: kRagSystemPrompt,
+      prompt: buildRagPrompt(
+        q,
+        sources,
+        history: history,
+        historyCharBudget: historyCharBudget,
+      ),
+    );
+  }
+}
+
+final ragRetrieverProvider = Provider<RagRetriever>(RagRetriever.new);