|
| 1 | +## Canonical Document Model |
| 2 | + |
| 3 | +Forest now treats every multi-chunk import as a canonical document with derived segment nodes. |
| 4 | + |
| 5 | +### Schema |
| 6 | + |
| 7 | +- **documents** |
| 8 | + - `id`: canonical document identifier (root node id when present) |
| 9 | + - `title`, `body`: authoritative markdown |
| 10 | + - `metadata`: JSON blob (`chunkStrategy`, `overlap`, `autoLink`, etc.) |
| 11 | + - `version`: increments with each edit |
| 12 | + - `root_node_id`: optional pointer to the document summary node |
| 13 | + - `created_at`, `updated_at` |
| 14 | +- **document_chunks** |
| 15 | + - `document_id`, `segment_id`, `node_id` |
| 16 | + - `offset`, `length`: byte offset and byte length of the chunk within the canonical body (chunk bodies joined with `\n\n`) |
| 17 | + - `chunk_order`: rendering order |
| 18 | + - `checksum`: sha256 of the chunk body for change detection |
| 19 | + - timestamps mirror the underlying chunk node |
| 20 | + |
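As a minimal sketch of how the `document_chunks` fields relate to the canonical body, the following derives offsets, lengths, and checksums from an ordered list of chunk bodies. The `Segment` and `ChunkRecord` types and the `buildCanonical` helper are hypothetical names introduced for illustration; they are not taken from the Forest codebase, and the real implementation may differ.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shapes; field names loosely follow the schema list above.
interface Segment {
  segmentId: string;
  body: string;
}

interface ChunkRecord {
  segmentId: string;
  offset: number; // byte offset into the canonical body
  length: number; // byte length of the chunk body
  chunkOrder: number; // rendering order
  checksum: string; // hex-encoded sha256 of the chunk body
}

const SEPARATOR = "\n\n";
const encoder = new TextEncoder();

// UTF-8 byte length, so multi-byte characters are counted correctly.
function byteLength(text: string): number {
  return encoder.encode(text).length;
}

function sha256Hex(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// Joins chunk bodies with the separator and records where each one lands.
function buildCanonical(segments: Segment[]): { body: string; chunks: ChunkRecord[] } {
  const chunks: ChunkRecord[] = [];
  let offset = 0;
  segments.forEach((segment, index) => {
    const length = byteLength(segment.body);
    chunks.push({
      segmentId: segment.segmentId,
      offset,
      length,
      chunkOrder: index,
      checksum: sha256Hex(segment.body),
    });
    // The next chunk starts after this body plus the separator.
    offset += length + byteLength(SEPARATOR);
  });
  return { body: segments.map((s) => s.body).join(SEPARATOR), chunks };
}
```

Because offsets are measured in bytes rather than characters, slicing the UTF-8 encoded body at `offset` for `length` bytes should recover each chunk body exactly.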
| 21 | +### Lifecycle |
| 22 | + |
| 23 | +1. **Import** (`importDocumentCore`) |
| 24 | + - Generates root and chunk nodes. |
| 25 | + - Seeds `documents` + `document_chunks` with metadata, offsets, and checksums. |
| 26 | + - Structural edges (parent-child, sequential) are built on top of the same nodes to keep progressive IDs stable. |
| 27 | +2. **Edit via `forest node edit`** |
| 28 | + - Loads a document session, renders the full document, and lets the user edit each segment between `<!-- forest:segment ... -->` markers. |
| 29 | + - Only segments with actual textual deltas are re-embedded and rescored; the document version and chunk records are rewritten atomically. |
| 30 | +3. **Refresh via flags (`forest node refresh`)** |
| 31 | + - Keeps the legacy flag/file workflow but detects chunk membership. |
| 32 | + - After updating the chunk node, the canonical document is rebuilt (version bump, checksums + offsets updated) and a concise log is emitted. |
| 33 | +4. **Existing standalone notes** |
| 34 | + - Continue to bypass the document session; they remain untouched by the canonical pipeline. |
| 35 | + |
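The "only segments with actual textual deltas" rule in the edit step can be sketched as a checksum comparison. This is an illustrative sketch, not the actual edit pipeline: `StoredChunk`, `EditedSegment`, and `planUpdate` are hypothetical names, and bumping the version only when at least one segment changed is an assumption made here.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shapes for illustration only.
interface StoredChunk {
  segmentId: string;
  checksum: string; // sha256 recorded at the last import or edit
}

interface EditedSegment {
  segmentId: string;
  body: string; // segment text after the user's edit
}

function sha256Hex(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// Returns the segments that need re-embedding and the resulting version.
function planUpdate(
  version: number,
  stored: StoredChunk[],
  edited: EditedSegment[],
): { version: number; changed: string[] } {
  const previous = new Map<string, string>();
  for (const chunk of stored) {
    previous.set(chunk.segmentId, chunk.checksum);
  }
  const changed: string[] = [];
  for (const segment of edited) {
    // A new or modified segment has no matching stored checksum.
    if (previous.get(segment.segmentId) !== sha256Hex(segment.body)) {
      changed.push(segment.segmentId);
    }
  }
  // Assumption: the version is bumped only when something actually changed.
  return { version: changed.length > 0 ? version + 1 : version, changed };
}
```

Segments absent from `changed` would keep their existing embeddings and edges under this scheme.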
| 36 | +### Rescoring and Edge Integrity |
| 37 | + |
| 38 | +- `rescoreNode` deletes edges whose new score drops below the suggestion threshold and updates accepted edges in place. |
| 39 | +- Only chunks with edited content trigger rescoring; unchanged segments retain their embeddings and edges. |
| 40 | +- The canonical update carries over `lastEditedAt` / `lastEditedNodeId` metadata to aid observability. |
| 41 | + |
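One way to read the `rescoreNode` behavior is sketched below. The `Edge` shape, the status values, and the `rescoreEdges` helper are hypothetical, and the sketch assumes that any edge whose new score falls below the threshold is dropped while surviving edges keep their status and receive the new score; the actual function may treat accepted edges differently.

```typescript
// Hypothetical edge shape; status values are assumptions for illustration.
type EdgeStatus = "suggested" | "accepted";

interface Edge {
  id: string;
  status: EdgeStatus;
  score: number;
}

// Applies freshly computed scores to a node's edges.
function rescoreEdges(
  edges: Edge[],
  newScores: Map<string, number>,
  suggestionThreshold: number,
): Edge[] {
  const kept: Edge[] = [];
  for (const edge of edges) {
    // Edges without a new score keep their previous one.
    const score = newScores.get(edge.id) ?? edge.score;
    if (score < suggestionThreshold) {
      continue; // below the suggestion threshold: the edge is removed
    }
    kept.push({ ...edge, score }); // status is preserved, score is updated
  }
  return kept;
}
```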
| 42 | +### Migration / Backfill |
| 43 | + |
| 44 | +- On boot, `backfillCanonicalDocuments` scans for chunk nodes lacking canonical entries and populates `documents` + `document_chunks`. |
| 45 | +- Run `bun run dev -- node recent --json` after upgrade to confirm documents now show versioned metadata. |
| 46 | +- If backfill fails (e.g., due to malformed chunks), correct the data and rerun the CLI; the migration is idempotent, so a repeat run is safe. |
| 47 | + |
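The idempotence property of the backfill can be illustrated with an in-memory stand-in for the canonical tables. This is a sketch under stated assumptions, not the body of `backfillCanonicalDocuments`: the `ChunkNode` shape and the `backfill` helper are hypothetical, and a `Map` of sets replaces the real database.

```typescript
// Hypothetical chunk node shape for illustration.
interface ChunkNode {
  nodeId: string;
  documentId: string;
}

// Stand-in for the canonical tables: document id -> registered chunk node ids.
type CanonicalIndex = Map<string, Set<string>>;

// Registers chunk nodes that lack a canonical entry and returns how many were added.
function backfill(nodes: ChunkNode[], index: CanonicalIndex): number {
  let added = 0;
  for (const node of nodes) {
    let members = index.get(node.documentId);
    if (!members) {
      members = new Set<string>();
      index.set(node.documentId, members);
    }
    if (members.has(node.nodeId)) {
      continue; // already backfilled: skipping makes a repeat run a no-op
    }
    members.add(node.nodeId);
    added += 1;
  }
  return added;
}
```

Running the helper twice over the same nodes adds entries only on the first pass, which is the behavior that makes a rerun after a failed backfill safe.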
| 48 | +### Error Recovery |
| 49 | + |
| 50 | +- `forest node edit` preserves the temp file path when parsing fails so the user can inspect/repair the document. |
| 51 | +- `forest node refresh` prints a document-update summary; if no segments changed, the tool logs “document unchanged (no structural delta).” |
| 52 | +- Re-run `forest node refresh <chunk> --no-auto-link` to rebuild canonical data without rescoring if needed. |
| 53 | + |
| 54 | +### Rollout Checklist |
| 55 | + |
| 56 | +1. **Back up the existing database:** `cp forest.db forest.db.backup-$(date +%Y%m%d)` |
| 57 | +2. **Upgrade the CLI** and launch it once; the automatic backfill populates the canonical tables. |
| 58 | +3. **Verify schema:** |
| 59 | + - `bunx tsx -e "const { listDocuments } = require('./src/lib/db'); listDocuments().then(console.log);"` |
| 60 | + - Spot-check a document via `forest node read <doc-root>`; confirm the footer shows chunk IDs. |
| 61 | +4. **Smoke test editing:** edit a chunk with `forest node edit <chunk-id> --no-auto-link` and confirm the console emits the document version bump. |
| 62 | +5. **Monitor logs:** watch for `document updated:` output in CI or production logs; unexpected `document unchanged` messages may indicate user edits were reverted by tooling. |
| 63 | +6. **Re-run lint/tests:** `bun run lint` and `bun test`. |