[backend] CorpusBackend — wiki/raw knowledge storage abstracted from filesystem walks

## Background

`<agent>/wiki/` (the Atomic Wiki: a distilled, Karpathy-style knowledge corpus) and `<agent>/raw/` (source documents that feed the wiki) are plain filesystem directories. Pages are discovered by recursive directory walks and read with direct `frontmatter.load()` calls.

There's no abstraction layer between the agent and the corpus storage.

## Why it matters

The memory backend (#57) handles short-term, agent-evolving notes. Wiki/raw is a different beast: a long-term, often large knowledge corpus. Its scaling pressures differ:

- **Volume.** A wiki/raw corpus can run into hundreds of MB or GB (PDFs, transcripts, scraped content). Filesystem walks stay fast for a few hundred files but degrade as the file count grows. SQLite is fine to ~10GB; Postgres FTS handles larger corpora. A `VectorCorpusBackend` makes RAG-style retrieval possible.
- **Search semantics.** Filesystem grep is keyword-only. Operators want semantic search across the corpus ("find every wiki page where I discussed avalanche vs snowball"). That's a `query(text, top_k)` shape on a backend, not a directory walk.
- **Multi-tenant.** A SaaS deployment with shared corpora (every tenant's agent reads the same internal knowledge base) needs the wiki to live somewhere other than a per-tenant filesystem.
- **Sync / ingestion.** A backend lets ingestion be a write API, not "drop files in `raw/` and hope the watcher picks them up."
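
To make the search-semantics gap concrete, here is a minimal sketch of what a keyword-only `query(text, top_k)` over a directory walk looks like. The function name `keyword_query`, the `.md` extension, and the term-count scoring are illustrative assumptions, not existing code in the repo.

```python
import re
import tempfile
from pathlib import Path


def keyword_query(root: Path, text: str, top_k: int = 5) -> list[tuple[str, int]]:
    """Rank .md files under root by how many query-term occurrences they contain."""
    terms = set(re.findall(r"\w+", text.lower()))
    scored = []
    for path in sorted(root.rglob("*.md")):
        words = re.findall(r"\w+", path.read_text(encoding="utf-8").lower())
        score = sum(1 for word in words if word in terms)
        if score:
            scored.append((path.relative_to(root).as_posix(), score))
    # Highest score first; ties broken by name so the order is deterministic.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]


# Tiny throwaway corpus to exercise the function (hypothetical page names).
with tempfile.TemporaryDirectory() as tmp:
    wiki = Path(tmp) / "wiki"
    (wiki / "finance").mkdir(parents=True)
    (wiki / "finance" / "debt.md").write_text(
        "Avalanche vs snowball: avalanche pays highest interest first.",
        encoding="utf-8",
    )
    (wiki / "cooking.md").write_text("Notes on sourdough starters.", encoding="utf-8")
    results = keyword_query(wiki, "avalanche snowball", top_k=3)
    paraphrase = keyword_query(wiki, "debt repayment strategies", top_k=3)

print(results)
print(paraphrase)
```

The second query returns nothing even though `finance/debt.md` is clearly relevant, because none of its words appear in the page body. That gap is what a semantic `query` on a backend would close.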

## What to change

1. New module `atomic_agents/corpus/` with `backend.py` (the Protocol) and `filesystem.py` (the default implementation, wrapping the current `wiki/` + `raw/` walks).
2. `CorpusBackend` protocol exposes:
   - `list_pages(corpus="wiki" | "raw")` → list of `CorpusRef`
   - `read_page(name, corpus)` → `CorpusPage` (body + metadata)
   - `write_page(name, content, corpus)` — wiki distillation writes
   - `query(text, top_k, corpus)` — semantic search (capability-gated; FS backend can fall back to keyword)
   - `stats(corpus)` — page count, total bytes, last update
   - `ingest(source, corpus)` — operator-driven addition (capability-gated)
3. Migrate call sites: agent prompt assembly (wiki INDEX read), wiki writers (if any), dashboard wiki tab.
4. Spec doc `docs/spec/26-corpus-backend.md`.
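
One possible shape for the protocol and its filesystem default, as a sketch only. The method names come from the list above; everything else is an assumption for illustration: the field layout of `CorpusRef` and `CorpusPage`, the `CorpusStats` type, the `capabilities` set as the gating mechanism, the `.md` extension, and substring matching as the keyword fallback. Front matter parsing is left out to keep the block stdlib-only.

```python
from __future__ import annotations

import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Literal, Protocol, runtime_checkable

Corpus = Literal["wiki", "raw"]


@dataclass(frozen=True)
class CorpusRef:
    name: str
    corpus: Corpus


@dataclass
class CorpusPage:
    ref: CorpusRef
    body: str
    metadata: dict[str, Any] = field(default_factory=dict)


@dataclass
class CorpusStats:
    page_count: int
    total_bytes: int
    last_update: float | None  # POSIX mtime; None for an empty corpus


@runtime_checkable
class CorpusBackend(Protocol):
    capabilities: frozenset[str]  # hypothetical gating mechanism

    def list_pages(self, corpus: Corpus) -> list[CorpusRef]: ...
    def read_page(self, name: str, corpus: Corpus) -> CorpusPage: ...
    def write_page(self, name: str, content: str, corpus: Corpus) -> None: ...
    def query(self, text: str, top_k: int, corpus: Corpus) -> list[CorpusRef]: ...
    def stats(self, corpus: Corpus) -> CorpusStats: ...
    def ingest(self, source: str, corpus: Corpus) -> CorpusRef: ...


class FilesystemCorpusBackend:
    """Sketch of the default backend: <root>/wiki and <root>/raw as sibling dirs."""

    capabilities = frozenset({"write"})  # no "semantic_query", no "ingest"

    def __init__(self, root: Path) -> None:
        self.root = Path(root)

    def _files(self, corpus: Corpus) -> list[Path]:
        return sorted((self.root / corpus).rglob("*.md"))

    def list_pages(self, corpus: Corpus) -> list[CorpusRef]:
        base = self.root / corpus
        return [CorpusRef(p.relative_to(base).with_suffix("").as_posix(), corpus)
                for p in self._files(corpus)]

    def read_page(self, name: str, corpus: Corpus) -> CorpusPage:
        body = (self.root / corpus / f"{name}.md").read_text(encoding="utf-8")
        return CorpusPage(CorpusRef(name, corpus), body)  # front matter parsing omitted

    def write_page(self, name: str, content: str, corpus: Corpus) -> None:
        path = self.root / corpus / f"{name}.md"
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(content, encoding="utf-8")

    def query(self, text: str, top_k: int, corpus: Corpus) -> list[CorpusRef]:
        # No "semantic_query" capability here, so fall back to substring matching.
        terms = text.lower().split()
        hits = []
        for ref in self.list_pages(corpus):
            body = self.read_page(ref.name, corpus).body.lower()
            if any(term in body for term in terms):
                hits.append(ref)
        return hits[:top_k]

    def stats(self, corpus: Corpus) -> CorpusStats:
        infos = [p.stat() for p in self._files(corpus)]
        return CorpusStats(
            page_count=len(infos),
            total_bytes=sum(i.st_size for i in infos),
            last_update=max((i.st_mtime for i in infos), default=None),
        )

    def ingest(self, source: str, corpus: Corpus) -> CorpusRef:
        raise NotImplementedError("ingest is capability-gated and not supported here")


with tempfile.TemporaryDirectory() as tmp:
    backend = FilesystemCorpusBackend(Path(tmp))
    backend.write_page("finance/debt", "Avalanche vs snowball.", "wiki")
    refs = backend.list_pages("wiki")
    page = backend.read_page("finance/debt", "wiki")
    hits = backend.query("snowball", top_k=5, corpus="wiki")
    wiki_stats = backend.stats("wiki")
    conforms = isinstance(backend, CorpusBackend)
```

Passing `corpus` as a parameter rather than constructing two backend instances mirrors the sibling-directory layout. Callers can check `"semantic_query" in backend.capabilities` before relying on ranked semantic results.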

## Relationship to MemoryBackend

Memory and corpus are intentionally separate primitives. Memory is short, agent-mutable, behavioral. Corpus is long, distilled, knowledge-shaped. They share a write-path discipline but have very different access patterns. Don't collapse them into one abstraction.

## Future backends

- `SQLiteCorpusBackend` — single-box, fast keyword + FTS5 full-text search
- `PostgresCorpusBackend` — multi-tenant, full Postgres FTS
- `PgvectorCorpusBackend` — semantic search via embeddings
- `S3CorpusBackend` — large file storage with metadata in DB
- `ChromaCorpusBackend` / `WeaviateCorpusBackend` — purpose-built vector stores

## Acceptance

- Existing wiki tests + raw read paths pass with `FilesystemCorpusBackend` as default.
- Protocol conformance suite (~12 tests) — list, read, write, stats, query (with FS fallback to keyword).
- One vector-shaped mock backend proves the semantic-search capability fits.
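
A rough sketch of what the vector-shaped mock and one conformance check could look like. It is deliberately toy-sized: `embed` is a bag-of-words stand-in for a real embedding model, `MockVectorCorpusBackend` and `check_query_conformance` are hypothetical names, and pages are returned as plain name strings rather than `CorpusRef` objects so the block stays self-contained.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: bag-of-words term counts."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class MockVectorCorpusBackend:
    """In-memory mock: stores a vector per page and ranks by similarity."""

    capabilities = frozenset({"write", "semantic_query"})

    def __init__(self) -> None:
        self._pages: dict[tuple[str, str], tuple[str, Counter]] = {}

    def write_page(self, name: str, content: str, corpus: str) -> None:
        # Embed at write time, as a vector store would at ingestion.
        self._pages[(corpus, name)] = (content, embed(content))

    def list_pages(self, corpus: str) -> list[str]:
        return sorted(name for (c, name) in self._pages if c == corpus)

    def query(self, text: str, top_k: int, corpus: str) -> list[str]:
        q = embed(text)
        scored = [
            (cosine(q, vec), name)
            for (c, name), (_, vec) in self._pages.items()
            if c == corpus
        ]
        # Best match first; ties broken by name for deterministic output.
        scored.sort(key=lambda item: (-item[0], item[1]))
        return [name for score, name in scored[:top_k] if score > 0]


def check_query_conformance(backend) -> None:
    """One conformance case: query respects top_k and the corpus boundary."""
    backend.write_page("debt", "avalanche vs snowball repayment", "wiki")
    backend.write_page("bread", "sourdough starter hydration", "wiki")
    backend.write_page("notes", "avalanche safety transcript", "raw")
    assert backend.query("snowball avalanche", top_k=1, corpus="wiki") == ["debt"]
    assert backend.query("avalanche", top_k=5, corpus="raw") == ["notes"]
    assert backend.query("quantum chromodynamics", top_k=5, corpus="wiki") == []


mock = MockVectorCorpusBackend()
check_query_conformance(mock)
```

Because the check only touches `write_page` and `query`, the same function could in principle run against the filesystem backend's keyword fallback, which is the point of a shared conformance suite.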

## Open questions

- Should `wiki` and `raw` be separate corpus types in the same backend, or two backend instances? (Probably same backend, different "corpus" parameter — consistent with how filesystem implements them as sibling directories.)
- Embeddings storage: per-page embedding column vs. external vector store? Backend choice.
- Page versioning: same as memory (`.versions/` per page), or different model? Probably same to keep operators' mental model consistent.

## Context

- Surfaced in scaling review of post-#57 hardcoded items (2026-05-08), item #5 (wiki/raw portion)
- Run-history portion (outcomes/dreams/evals) split into separate LogBackend issue
- Pattern reference: `MemoryBackend` from PR #57
