Commit bf7dfd0

committed
first draft of document model
1 parent bd871ac commit bf7dfd0

7 files changed

Lines changed: 1303 additions & 61 deletions

docs/document-model.md

Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
## Canonical Document Model

Forest now treats every multi-chunk import as a canonical document with derived segment nodes.

### Schema

- **documents**
  - `id`: canonical document identifier (root node id when present)
  - `title`, `body`: authoritative markdown
  - `metadata`: JSON blob (`chunkStrategy`, `overlap`, `autoLink`, etc.)
  - `version`: increments with each edit
  - `root_node_id`: optional pointer to the document summary node
  - `created_at`, `updated_at`
- **document_chunks**
  - `document_id`, `segment_id`, `node_id`
  - `offset`, `length`: byte offset and byte length of the chunk within the canonical body (chunks joined with `\n\n`)
  - `chunk_order`: rendering order
  - `checksum`: sha256 of the chunk body for change detection
  - timestamps mirror the underlying chunk node

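The relationship between `offset`, `length`, and `checksum` can be sketched as follows. This is a minimal illustration, not Forest's actual implementation: the `ChunkRecord` shape, the `buildChunkRecords` helper, and the zero-based `chunkOrder` are assumptions made for the example.

```typescript
import { createHash } from "node:crypto";

// Hypothetical row shape mirroring some of the `document_chunks` fields.
interface ChunkRecord {
  segmentId: string;
  offset: number; // byte offset into the canonical body
  length: number; // byte length of the chunk body
  chunkOrder: number; // rendering order (zero-based here, an assumption)
  checksum: string; // sha256 hex digest of the chunk body
}

const SEPARATOR = "\n\n";

// Build the canonical body and one record per chunk, in rendering order.
function buildChunkRecords(
  chunks: { segmentId: string; body: string }[],
): { body: string; records: ChunkRecord[] } {
  const records: ChunkRecord[] = [];
  let offset = 0;
  chunks.forEach((chunk, index) => {
    // Measure in bytes, not UTF-16 code units, so multi-byte characters count correctly.
    const length = Buffer.byteLength(chunk.body, "utf8");
    records.push({
      segmentId: chunk.segmentId,
      offset,
      length,
      chunkOrder: index,
      checksum: createHash("sha256").update(chunk.body, "utf8").digest("hex"),
    });
    // The next chunk starts after this body plus the separator.
    offset += length + Buffer.byteLength(SEPARATOR, "utf8");
  });
  return { body: chunks.map((c) => c.body).join(SEPARATOR), records };
}
```

Because the offsets are byte-based, slicing the canonical body should go through a `Buffer` rather than `String.prototype.slice` whenever the text may contain non-ASCII characters.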
### Lifecycle

1. **Import** (`importDocumentCore`)
   - Generates root and chunk nodes.
   - Seeds `documents` + `document_chunks` with metadata, offsets, and checksums.
   - Structural edges (parent-child, sequential) are built on top of the same nodes to keep progressive IDs stable.
2. **Edit via `forest node edit`**
   - Loads a document session, renders the full document, and lets the user edit each segment between `<!-- forest:segment ... -->` markers.
   - Only segments with actual textual deltas are re-embedded and rescored; the document version and chunk records are rewritten atomically.
3. **Refresh via flags (`forest node refresh`)**
   - Keeps the legacy flag/file workflow but detects chunk membership.
   - After updating the chunk node, the canonical document is rebuilt (version bump, checksums + offsets updated) and a concise log is emitted.
4. **Existing standalone notes**
   - Continue to bypass the document session; they remain untouched by the canonical pipeline.

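The delta detection in the edit step could look roughly like the sketch below. It assumes that segments are matched to stored checksums by position and that surrounding whitespace is not significant. The marker's attributes are not specified here, so the sketch treats the marker text as opaque. `splitSegments` and `changedSegmentIndexes` are hypothetical helper names.

```typescript
import { createHash } from "node:crypto";

// Matches a whole marker comment; whatever follows `forest:segment` is kept as opaque text.
const MARKER = /<!-- forest:segment\b[\s\S]*?-->/g;

interface Segment {
  marker: string; // raw marker comment, kept verbatim
  body: string; // text between this marker and the next one (trimmed, an assumption)
}

// Split a rendered document into segments, one per marker.
function splitSegments(rendered: string): Segment[] {
  const matches = Array.from(rendered.matchAll(MARKER));
  return matches.map((match, i) => {
    const start = match.index! + match[0].length;
    const end = i + 1 < matches.length ? matches[i + 1].index! : rendered.length;
    return { marker: match[0], body: rendered.slice(start, end).trim() };
  });
}

const sha256 = (text: string): string =>
  createHash("sha256").update(text, "utf8").digest("hex");

// Return the positions of segments whose body no longer matches the stored checksum.
function changedSegmentIndexes(storedChecksums: string[], edited: Segment[]): number[] {
  return edited
    .map((segment, i) => (sha256(segment.body) === storedChecksums[i] ? -1 : i))
    .filter((i) => i >= 0);
}
```

Only the positions returned by `changedSegmentIndexes` would need re-embedding and rescoring; every other segment keeps its stored data.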
### Rescoring and Edge Integrity

- `rescoreNode` deletes edges whose new score drops below the suggestion threshold and updates accepted edges in place.
- Only chunks with edited content trigger rescoring; unchanged segments retain their embeddings and edges.
- The canonical update carries over `lastEditedAt` / `lastEditedNodeId` metadata to aid observability.

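One way to read the first rule is sketched below as a pure function. This is an interpretation, not the actual `rescoreNode`: it assumes accepted edges are always kept and only have their score updated, while suggested edges are deleted once their new score falls below the threshold. The `Edge` shape and the `applyRescore` name are hypothetical.

```typescript
type EdgeStatus = "suggested" | "accepted";

// Hypothetical edge shape for the illustration.
interface Edge {
  id: string;
  status: EdgeStatus;
  score: number;
}

// Decide which edges survive a rescore and with what score.
function applyRescore(
  edges: Edge[],
  newScores: Map<string, number>,
  suggestionThreshold: number,
): { kept: Edge[]; deletedIds: string[] } {
  const kept: Edge[] = [];
  const deletedIds: string[] = [];
  for (const edge of edges) {
    // Edges without a new score keep their previous one.
    const score = newScores.get(edge.id) ?? edge.score;
    if (edge.status === "accepted" || score >= suggestionThreshold) {
      // Assumption: accepted edges are updated in place and never deleted.
      kept.push({ ...edge, score });
    } else {
      deletedIds.push(edge.id);
    }
  }
  return { kept, deletedIds };
}
```

If accepted edges should also be dropped below the threshold, the `edge.status === "accepted"` clause is the only part that would change.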
### Migration / Backfill

- On boot, `backfillCanonicalDocuments` scans for chunk nodes lacking canonical entries and populates `documents` + `document_chunks`.
- Run `bun run dev -- node recent --json` after upgrade to confirm documents now show versioned metadata.
- If backfill fails (e.g., due to malformed chunks), rerun the CLI once the data is corrected; the migration is idempotent.

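The idempotency property can be illustrated with an in-memory sketch. It does not reproduce `backfillCanonicalDocuments`; the `ChunkNode` and `CanonicalStore` shapes, the `backfillCanonical` name, and the initial version of 1 are all assumptions for the example.

```typescript
// Hypothetical shapes standing in for chunk nodes and the canonical tables.
interface ChunkNode {
  nodeId: string;
  documentId: string;
  chunkOrder: number;
  body: string;
}

interface CanonicalStore {
  documents: Map<string, { id: string; body: string; version: number }>;
  chunks: Map<string, { documentId: string; nodeId: string; chunkOrder: number }>;
}

// Populate canonical entries for chunk nodes that lack them.
// Returns the number of documents that were (re)built in this run.
function backfillCanonical(nodes: ChunkNode[], store: CanonicalStore): number {
  // Only documents with at least one unregistered chunk node need work.
  const pending = new Set(
    nodes.filter((n) => !store.chunks.has(n.nodeId)).map((n) => n.documentId),
  );
  for (const documentId of pending) {
    const ordered = nodes
      .filter((n) => n.documentId === documentId)
      .sort((a, b) => a.chunkOrder - b.chunkOrder);
    const existing = store.documents.get(documentId);
    store.documents.set(documentId, {
      id: documentId,
      body: ordered.map((n) => n.body).join("\n\n"),
      version: existing?.version ?? 1, // assumed starting version
    });
    for (const n of ordered) {
      store.chunks.set(n.nodeId, { documentId, nodeId: n.nodeId, chunkOrder: n.chunkOrder });
    }
  }
  return pending.size;
}
```

A second run over the same nodes finds nothing pending and leaves the store untouched, which is what makes rerunning after a failed backfill safe in this sketch.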
### Error Recovery

- `forest node edit` preserves the temp file path when parsing fails so the user can inspect/repair the document.
- `forest node refresh` prints a document-update summary; if segments remain unchanged, the tool logs “document unchanged (no structural delta).”
- Re-run `forest node refresh <chunk> --no-auto-link` to rebuild canonical data without rescoring if needed.

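The temp-file behaviour in the first bullet follows a common pattern: clean up only after a successful parse. The sketch below is a generic illustration under that assumption, not Forest's code; `editWithRecovery`, the `forest-edit-` directory prefix, and the `document.md` file name are hypothetical.

```typescript
import { mkdtempSync, readFileSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

type EditResult<T> =
  | { ok: true; value: T }
  | { ok: false; tempPath: string; error: unknown };

// Write the document to a temp file, let the caller edit it, then parse the result.
function editWithRecovery<T>(
  initial: string,
  edit: (path: string) => void, // stand-in for launching the user's editor
  parse: (text: string) => T,
): EditResult<T> {
  const dir = mkdtempSync(join(tmpdir(), "forest-edit-"));
  const tempPath = join(dir, "document.md");
  writeFileSync(tempPath, initial, "utf8");
  edit(tempPath);
  try {
    const value = parse(readFileSync(tempPath, "utf8"));
    // Remove the temp directory only when parsing succeeded.
    rmSync(dir, { recursive: true, force: true });
    return { ok: true, value };
  } catch (error) {
    // Leave the file in place and report its path so the user can repair it.
    return { ok: false, tempPath, error };
  }
}
```

On failure the caller can print `tempPath` alongside the parse error, so the user's edits are not lost.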
### Rollout Checklist

1. **Back up the existing database.** `cp forest.db forest.db.backup-$(date +%Y%m%d)`
2. **Upgrade the CLI** and launch it once; the automatic backfill populates the canonical tables.
3. **Verify schema:**
   - `bunx tsx -e "const { listDocuments } = require('./src/lib/db'); listDocuments().then(console.log);"`
   - Spot-check a document via `forest node read <doc-root>`; confirm the footer shows chunk IDs.
4. **Smoke test editing:** edit a chunk with `forest node edit <chunk-id> --no-auto-link` and confirm the console emits the document version bump.
5. **Monitor logs:** watch for `document updated:` output in CI or production logs; unexpected `document unchanged` messages may indicate user edits were reverted by tooling.
6. **Re-run lint/tests:** `bun run lint` and `bun test`.
