@@ -327,13 +327,120 @@ When implementing a new feature that should be available in both CLI and API:
327327 Uses **sql.js** (SQLite compiled to WASM) with in-memory database persisted to disk on mutation.
328328
329329 **Core types:**
330- - `NodeRecord`: Nodes with id, title, body, tags, tokenCounts, optional embedding
331- - `EdgeRecord`: Edges with sourceId, targetId, score, status ('accepted' | 'suggested')
330+ - `NodeRecord`: Nodes with id, title, body, tags, tokenCounts, optional embedding, isChunk, parentDocumentId, chunkOrder
331+ - `EdgeRecord`: Edges with sourceId, targetId, score, status ('accepted' | 'suggested'), edgeType
332+ - `DocumentRecord`: Canonical documents with id, title, body, metadata, version, rootNodeId, timestamps
333+ - `DocumentChunkRecord`: Segment mappings with documentId, segmentId, nodeId, offset, length, chunkOrder, checksum, timestamps
332334
333335 **Key pattern**: Database is lazily initialized on first access. All mutations set `dirty = true` and persist to disk.
334336
335337 Database path controlled by `FOREST_DB_PATH` env var (default: `forest.db` in cwd).
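
The lazy-initialization and persist-on-mutation pattern described above might look roughly like the sketch below. It is not taken from `src/lib/db.ts`: `resolveDbPath`, `LazyStore`, and the `open`/`persist` callbacks are hypothetical names used only for illustration, and the real module works against sql.js rather than a generic handle.

```typescript
import * as path from "node:path";

// Hypothetical helper: FOREST_DB_PATH wins, otherwise fall back to forest.db in cwd.
function resolveDbPath(env: Record<string, string | undefined>, cwd: string): string {
  const configured = env.FOREST_DB_PATH;
  return configured !== undefined ? configured : path.join(cwd, "forest.db");
}

// Hypothetical wrapper: the handle is opened on first access, and every
// mutation marks the store dirty and persists it immediately.
class LazyStore<T> {
  private db: T | null = null;
  private dirty = false;

  constructor(private open: () => T, private persist: (db: T) => void) {}

  private get(): T {
    if (this.db === null) {
      this.db = this.open(); // first access triggers initialization
    }
    return this.db;
  }

  read<R>(fn: (db: T) => R): R {
    return fn(this.get());
  }

  mutate(fn: (db: T) => void): void {
    const db = this.get();
    fn(db);
    this.dirty = true;
    this.persist(db); // write the in-memory state to disk
    this.dirty = false;
  }

  isDirty(): boolean {
    return this.dirty;
  }
}
```

Passing the environment and working directory as arguments (for example `resolveDbPath(process.env, process.cwd())`) keeps the path logic easy to test.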
336338
339+ **Database Schema:**
340+ ```sql
341+ -- Core node storage
342+ nodes: id, title, body, tags, token_counts, embedding, created_at, updated_at,
343+ is_chunk, parent_document_id, chunk_order
344+
345+ -- Edge relationships
346+ edges: id, source_id, target_id, score, status, edge_type, metadata, created_at, updated_at
347+
348+ -- Canonical document storage
349+ documents: id, title, body, metadata, version, root_node_id, created_at, updated_at
350+
351+ -- Document-to-node mappings
352+ document_chunks: document_id, segment_id, node_id, offset, length, chunk_order, checksum,
353+ created_at, updated_at
354+
355+ -- Edge event history (for undo)
356+ edge_events: id, edge_id, source_id, target_id, prev_status, next_status, payload,
357+ created_at, undone
358+
359+ -- Metadata key-value store
360+ metadata: key, value
361+ ```
362+
363+ ### Canonical Document Model (src/lib/db.ts, src/core/import.ts)
364+
365+ Forest treats multi-chunk imports as first-class documents with versioned canonical storage.
366+
367+ **Architecture:**
368+ - **Canonical body**: Stored in `documents.body` as the authoritative source, reconstructed from `\n\n`-joined segment bodies
369+ - **Chunk mappings**: `document_chunks` table tracks byte offsets, lengths, and SHA-256 checksums for each segment
370+ - **Version tracking**: `documents.version` increments on every edit; metadata stores `lastEditedAt` and `lastEditedNodeId`
371+ - **Automatic backfill**: On startup, `backfillCanonicalDocuments()` scans for chunk nodes without canonical entries (idempotent)
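
As a rough illustration of how the canonical body and chunk mappings relate, the sketch below joins segment bodies with `\n\n` and derives a byte offset, byte length, and SHA-256 checksum per segment. The names `SegmentInput`, `ChunkMapping`, and `buildCanonicalDocument` are assumptions made for this example, not identifiers from the Forest codebase, and the real implementation may hash normalized rather than raw content.

```typescript
import { createHash } from "node:crypto";

// Hypothetical input shape: one entry per segment, in document order.
interface SegmentInput {
  segmentId: string;
  nodeId: string;
  body: string;
}

// Mirrors the document_chunks columns listed in the schema (timestamps omitted).
interface ChunkMapping {
  segmentId: string;
  nodeId: string;
  offset: number; // byte offset into the canonical body
  length: number; // byte length of the segment body
  chunkOrder: number;
  checksum: string; // hex-encoded SHA-256
}

const SEPARATOR = "\n\n";

function sha256Hex(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

function buildCanonicalDocument(
  segments: SegmentInput[]
): { body: string; chunks: ChunkMapping[] } {
  const chunks: ChunkMapping[] = [];
  let offset = 0;
  segments.forEach((segment, chunkOrder) => {
    const length = Buffer.byteLength(segment.body, "utf8");
    chunks.push({
      segmentId: segment.segmentId,
      nodeId: segment.nodeId,
      offset,
      length,
      chunkOrder,
      checksum: sha256Hex(segment.body),
    });
    // The next segment starts after this body plus the separator.
    offset += length + Buffer.byteLength(SEPARATOR, "utf8");
  });
  const body = segments.map((s) => s.body).join(SEPARATOR);
  return { body, chunks };
}
```

Measuring offsets in bytes rather than UTF-16 code units means a mapping can be checked by slicing a `Buffer` of the canonical body.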
372+
373+ ** Lifecycle:**
374+
375+ 1. **Import** (`importDocumentCore` in `src/core/import.ts`)
376+    - Chunks document via `chunkDocument()` using headers/size/hybrid strategy
377+    - Creates root node (optional summary) and chunk nodes with `isChunk=true`, `parentDocumentId` set
378+    - Inserts canonical document record with metadata (chunkStrategy, maxTokens, overlap, etc.)
379+    - Creates `document_chunks` mappings with offsets and checksums
380+    - Builds structural edges: parent-child (root→chunks) and sequential (chunk[i]→chunk[i+1])
381+    - Optionally auto-links against existing graph
382+
383+ 2. **Edit** (`forest node edit <chunk-id>`)
384+    - Detects chunk membership via `loadDocumentSessionForNode()`
385+    - Renders full document with `<!-- forest:segment -->` markers (HTML comments)
386+    - User edits in their preferred editor
387+    - Parser validates all segments present, IDs match, no orphans
388+    - Only modified segments get re-embedded and rescored (selective performance optimization)
389+    - Canonical body, version, and chunk records updated atomically
390+    - Console shows: "document updated: <title> (version 1 → 2, segments touched: 2)"
391+
392+ 3. **Refresh** (`forest node refresh <chunk-id>`)
393+    - Updates chunk node via flags/files/stdin
394+    - Detects chunk membership and rebuilds canonical document
395+    - Version bumps, checksums and offsets recalculated
396+    - Logs: "document updated" or "document unchanged (no structural delta)"
397+
398+ 4. **Standalone notes**
399+    - Nodes with `isChunk=false` bypass document session entirely
400+    - Edited as independent entities with no canonical storage
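
The structural edges built during the Import step (parent-child from the root to every chunk, sequential from each chunk to the next) can be sketched as follows. `buildStructuralEdges` is a hypothetical function, and the `edgeType` strings, the score of `1`, and the `'accepted'` status are illustrative assumptions; the source names the two edge kinds but does not give the stored values.

```typescript
type EdgeStatus = "accepted" | "suggested";

// Subset of the EdgeRecord fields relevant to structural edges.
interface StructuralEdge {
  sourceId: string;
  targetId: string;
  score: number;
  status: EdgeStatus;
  edgeType: string;
}

function buildStructuralEdges(rootId: string, chunkIds: string[]): StructuralEdge[] {
  const edges: StructuralEdge[] = [];

  // Parent-child: root -> each chunk.
  for (const chunkId of chunkIds) {
    edges.push({
      sourceId: rootId,
      targetId: chunkId,
      score: 1, // assumed value
      status: "accepted", // assumed value
      edgeType: "parent-child", // assumed label
    });
  }

  // Sequential: chunk[i] -> chunk[i+1].
  for (let i = 0; i + 1 < chunkIds.length; i++) {
    edges.push({
      sourceId: chunkIds[i],
      targetId: chunkIds[i + 1],
      score: 1,
      status: "accepted",
      edgeType: "sequential", // assumed label
    });
  }

  return edges;
}
```

For a document with n chunks this yields n parent-child edges and n - 1 sequential edges.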
401+
402+ **Key behaviors:**
403+ - **Selective re-embedding**: Unchanged segments retain embeddings and edges (avoids expensive recomputation)
404+ - **Checksum-based change detection**: SHA-256 of normalized content enables efficient diffing
405+ - **Temp file preservation**: Parse failures save edited content to `/tmp/forest-edit-*` for debugging
406+ - **Segment reordering**: Parser detects order changes; `chunkOrder` updated accordingly
407+ - **Error recovery**: Validation errors show line numbers; user can fix and retry
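
Checksum-based change detection and selective re-embedding fit together roughly as in this sketch: each edited segment is hashed and compared against the stored checksum, and only mismatching segments are returned for re-embedding. The normalization rule used here (CRLF to LF, then trimming) is an assumption, since the source says only that the content is normalized; `normalize`, `checksum`, and `findTouchedSegments` are hypothetical names.

```typescript
import { createHash } from "node:crypto";

interface EditedSegment {
  segmentId: string;
  body: string;
}

// Assumed normalization: unify line endings and trim surrounding whitespace.
function normalize(body: string): string {
  return body.replace(/\r\n/g, "\n").trim();
}

function checksum(body: string): string {
  return createHash("sha256").update(normalize(body), "utf8").digest("hex");
}

// Returns the IDs of segments whose content differs from the stored checksum.
// Segments with no stored checksum are also reported as touched.
function findTouchedSegments(
  storedChecksums: Record<string, string>,
  edited: EditedSegment[]
): string[] {
  return edited
    .filter((segment) => storedChecksums[segment.segmentId] !== checksum(segment.body))
    .map((segment) => segment.segmentId);
}
```

Because the hash is taken over normalized text, whitespace-only differences at the edges of a segment do not trigger re-embedding under this assumption.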
408+
409+ **Editor buffer format example:**
410+ ```markdown
411+ # Forest Document Editor
412+ # Document: My Research Paper (7fa7acb2)
413+ # Total segments: 3
414+
415+ <!-- forest:segment start segment_id=seg-1 node_id=abc123 order=0 title="Introduction" -->
416+ This is the introduction content...
417+ <!-- forest:segment end segment_id=seg-1 -->
418+
419+ <!-- forest:segment start segment_id=seg-2 node_id=def456 order=1 title="Methods" focus=true -->
420+ This is the methods section...
421+ <!-- forest:segment end segment_id=seg-2 -->
422+ ```
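
A parser for the buffer format above could be sketched as follows. This is a minimal illustration, not Forest's actual parser: `parseEditorBuffer` and its error messages are hypothetical, text outside segment markers (such as the `#` header lines) is assumed to be ignored, and only the checks named in the Edit step are covered (all segments present, end markers match start markers, no unknown segments).

```typescript
interface ParsedSegment {
  segmentId: string;
  nodeId: string;
  body: string;
}

const START_MARKER = /^<!-- forest:segment start (.*) -->$/;
const END_MARKER = /^<!-- forest:segment end segment_id=(\S+) -->$/;

// Parses key=value and key="quoted value" pairs from a start marker.
function parseAttributes(raw: string): Record<string, string> {
  const attrs: Record<string, string> = {};
  const pattern = /(\w+)=(?:"([^"]*)"|(\S+))/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(raw)) !== null) {
    attrs[match[1]] = match[2] !== undefined ? match[2] : match[3];
  }
  return attrs;
}

function parseEditorBuffer(buffer: string, expectedIds: string[]): ParsedSegment[] {
  const segments: ParsedSegment[] = [];
  let current: { segmentId: string; nodeId: string; lines: string[] } | null = null;
  const lines = buffer.split("\n");

  for (let i = 0; i < lines.length; i++) {
    const lineNo = i + 1; // 1-based, for error messages
    const marker = lines[i].trim();
    const start = START_MARKER.exec(marker);
    const end = END_MARKER.exec(marker);

    if (start) {
      if (current) throw new Error(`line ${lineNo}: segment ${current.segmentId} is not closed`);
      const attrs = parseAttributes(start[1]);
      if (!attrs.segment_id || !attrs.node_id) {
        throw new Error(`line ${lineNo}: marker is missing segment_id or node_id`);
      }
      if (expectedIds.indexOf(attrs.segment_id) === -1) {
        throw new Error(`line ${lineNo}: unknown segment ${attrs.segment_id}`);
      }
      current = { segmentId: attrs.segment_id, nodeId: attrs.node_id, lines: [] };
    } else if (end) {
      if (!current || current.segmentId !== end[1]) {
        throw new Error(`line ${lineNo}: unexpected end marker for ${end[1]}`);
      }
      segments.push({
        segmentId: current.segmentId,
        nodeId: current.nodeId,
        body: current.lines.join("\n"),
      });
      current = null;
    } else if (current) {
      current.lines.push(lines[i]);
    }
  }

  if (current) throw new Error(`segment ${current.segmentId} is not closed`);
  for (const id of expectedIds) {
    if (!segments.some((s) => s.segmentId === id)) throw new Error(`segment ${id} is missing`);
  }
  return segments;
}
```

The order of the returned array reflects the order in the edited buffer, so a caller could compare it with the stored `chunkOrder` values to detect reordering.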
423+
424+ **Metadata schema** (DocumentRecord.metadata):
425+ ```typescript
426+ {
427+   // Import settings
428+   chunkStrategy: 'headers' | 'size' | 'hybrid',
429+   maxTokens: number,
430+   overlap: number,
431+   chunkCount: number,
432+   source: 'import' | 'backfill',
433+
434+   // Edit tracking
435+   lastEditedAt: string, // ISO 8601 timestamp
436+   lastEditedNodeId: string, // UUID
437+
438+   // Backfill flags
439+   backfill: boolean,
440+   chunkOrdersProvided: boolean
441+ }
442+ ```
443+
337444 ### Scoring Algorithm (src/lib/scoring.ts)
338445
339446 **Hybrid scoring** computes edge weights between node pairs: