Commit ef66a25

bwl and claude committed
Address PR review feedback: comprehensive improvements
Implements all P0, P1, and P2 items from code review:

## P0 (Blockers)
- Add 12 comprehensive tests for document-session parser
  - Round-trip edits, multi-segment edits, error cases
  - Segment reordering, validation, whitespace handling
- Update CLAUDE.md with canonical document model section
  - Architecture, lifecycle, editor buffer format
  - Database schema with all new tables
- Fix node refresh persistence bug
  - updateNode now called to persist changes to DB

## P1 (Recommended)
- Add /api/v1/documents endpoints
  - GET /documents - list all
  - GET /documents/:id - get by ID
  - GET /documents/:id/chunks - get chunks
- Add backfill logging for observability
  - Shows count when backfilling canonical documents
- Type DocumentMetadata properly
  - Replaces Record<string, unknown> with typed interface
  - Includes import settings, edit tracking, backfill flags

## P2 (Nice to Have)
- Add CLI document commands
  - forest documents list
  - forest documents show [id] --chunks
  - forest documents stats
- Improve bun:test types
  - Replace stub with @types/bun package

Generated with Claude Code (https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
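
For reviewers who want to exercise the three document endpoints named in the commit message, a minimal client sketch follows. Only the `/api/v1/documents` routes come from the commit; the helper names (`documentUrl`, `fetchDocumentRoute`), the base URL, and the assumption that responses are JSON are hypothetical.

```typescript
// Hypothetical helper: builds the three read-only document routes named in the
// commit message. The base URL is supplied by the caller and is an assumption.
const API_PREFIX = "/api/v1";

type DocumentRoute =
  | { kind: "list" }
  | { kind: "get"; id: string }
  | { kind: "chunks"; id: string };

function documentUrl(baseUrl: string, route: DocumentRoute): string {
  // Strip trailing slashes so the prefix is not doubled up.
  const root = `${baseUrl.replace(/\/+$/, "")}${API_PREFIX}/documents`;
  switch (route.kind) {
    case "list":
      return root;
    case "get":
      return `${root}/${encodeURIComponent(route.id)}`;
    case "chunks":
      return `${root}/${encodeURIComponent(route.id)}/chunks`;
  }
}

// Fetches a route and returns the parsed body. The response shape is not
// specified by the commit, so it is left as `unknown`.
async function fetchDocumentRoute(baseUrl: string, route: DocumentRoute): Promise<unknown> {
  const response = await fetch(documentUrl(baseUrl, route));
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}
```

Keeping URL construction separate from the network call makes the routing logic testable without a running server.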
1 parent bf7dfd0 commit ef66a25

20 files changed

Lines changed: 971 additions & 28 deletions

.claude/settings.local.json

Lines changed: 3 additions & 1 deletion
@@ -8,7 +8,9 @@
       "WebFetch(domain:platform.openai.com)",
       "WebFetch(domain:raw.githubusercontent.com)",
       "Bash(gh pr view:*)",
-      "Bash(gh pr diff:*)"
+      "Bash(gh pr diff:*)",
+      "Bash(bun test:*)",
+      "Bash(bun run lint)"
     ],
     "deny": [],
     "ask": []

CLAUDE.md

Lines changed: 109 additions & 2 deletions
@@ -327,13 +327,120 @@ When implementing a new feature that should be available in both CLI and API:
  Uses **sql.js** (SQLite compiled to WASM) with in-memory database persisted to disk on mutation.

  **Core types:**
- - `NodeRecord`: Nodes with id, title, body, tags, tokenCounts, optional embedding
- - `EdgeRecord`: Edges with sourceId, targetId, score, status ('accepted' | 'suggested')
+ - `NodeRecord`: Nodes with id, title, body, tags, tokenCounts, optional embedding, isChunk, parentDocumentId, chunkOrder
+ - `EdgeRecord`: Edges with sourceId, targetId, score, status ('accepted' | 'suggested'), edgeType
+ - `DocumentRecord`: Canonical documents with id, title, body, metadata, version, rootNodeId, timestamps
+ - `DocumentChunkRecord`: Segment mappings with documentId, segmentId, nodeId, offset, length, chunkOrder, checksum, timestamps

  **Key pattern**: Database is lazily initialized on first access. All mutations set `dirty = true` and persist to disk.

  Database path controlled by `FOREST_DB_PATH` env var (default: `forest.db` in cwd).

+ **Database Schema:**
+ ```sql
+ -- Core node storage
+ nodes: id, title, body, tags, token_counts, embedding, created_at, updated_at,
+ is_chunk, parent_document_id, chunk_order
+
+ -- Edge relationships
+ edges: id, source_id, target_id, score, status, edge_type, metadata, created_at, updated_at
+
+ -- Canonical document storage
+ documents: id, title, body, metadata, version, root_node_id, created_at, updated_at
+
+ -- Document-to-node mappings
+ document_chunks: document_id, segment_id, node_id, offset, length, chunk_order, checksum,
+ created_at, updated_at
+
+ -- Edge event history (for undo)
+ edge_events: id, edge_id, source_id, target_id, prev_status, next_status, payload,
+ created_at, undone
+
+ -- Metadata key-value store
+ metadata: key, value
+ ```
+
+ ### Canonical Document Model (src/lib/db.ts, src/core/import.ts)
+
+ Forest treats multi-chunk imports as first-class documents with versioned canonical storage.
+
+ **Architecture:**
+ - **Canonical body**: Stored in `documents.body` as the authoritative source, reconstructed from `\n\n`-joined segment bodies
+ - **Chunk mappings**: `document_chunks` table tracks byte offsets, lengths, and SHA-256 checksums for each segment
+ - **Version tracking**: `documents.version` increments on every edit; metadata stores `lastEditedAt` and `lastEditedNodeId`
+ - **Automatic backfill**: On startup, `backfillCanonicalDocuments()` scans for chunk nodes without canonical entries (idempotent)
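
The canonical-body and chunk-mapping bullets above can be illustrated with a small sketch. It joins segment bodies with `\n\n` and records a byte offset, byte length, and SHA-256 checksum per segment. The `SegmentInput` and `ChunkMapping` shapes and the `buildCanonicalDocument` name are hypothetical, and the sketch hashes the raw body because the diff does not say how content is normalized.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shapes for illustration; the real records live in src/lib/db.ts.
interface SegmentInput {
  segmentId: string;
  nodeId: string;
  body: string;
}

interface ChunkMapping {
  segmentId: string;
  nodeId: string;
  offset: number; // byte offset of the segment within the canonical body
  length: number; // byte length of the segment body
  chunkOrder: number;
  checksum: string; // hex-encoded SHA-256 of the segment body
}

const SEGMENT_SEPARATOR = "\n\n";

function sha256Hex(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

function buildCanonicalDocument(
  segments: SegmentInput[],
): { body: string; chunks: ChunkMapping[] } {
  const separatorBytes = Buffer.byteLength(SEGMENT_SEPARATOR, "utf8");
  const chunks: ChunkMapping[] = [];
  let offset = 0;

  segments.forEach((segment, index) => {
    const length = Buffer.byteLength(segment.body, "utf8");
    chunks.push({
      segmentId: segment.segmentId,
      nodeId: segment.nodeId,
      offset,
      length,
      chunkOrder: index,
      checksum: sha256Hex(segment.body),
    });
    // The next segment starts after this body plus the separator.
    offset += length + separatorBytes;
  });

  return { body: segments.map((s) => s.body).join(SEGMENT_SEPARATOR), chunks };
}
```

Offsets are measured in bytes rather than UTF-16 code units, so slicing a UTF-8 buffer of the canonical body at `offset` for `length` bytes recovers the segment even when it contains multi-byte characters.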
+
+ **Lifecycle:**
+
+ 1. **Import** (`importDocumentCore` in `src/core/import.ts`)
+    - Chunks document via `chunkDocument()` using headers/size/hybrid strategy
+    - Creates root node (optional summary) and chunk nodes with `isChunk=true`, `parentDocumentId` set
+    - Inserts canonical document record with metadata (chunkStrategy, maxTokens, overlap, etc.)
+    - Creates `document_chunks` mappings with offsets and checksums
+    - Builds structural edges: parent-child (root→chunks) and sequential (chunk[i]→chunk[i+1])
+    - Optionally auto-links against existing graph
+
+ 2. **Edit** (`forest node edit <chunk-id>`)
+    - Detects chunk membership via `loadDocumentSessionForNode()`
+    - Renders full document with `<!-- forest:segment -->` markers (HTML comments)
+    - User edits in their preferred editor
+    - Parser validates all segments present, IDs match, no orphans
+    - Only modified segments get re-embedded and rescored (selective performance optimization)
+    - Canonical body, version, and chunk records updated atomically
+    - Console shows: "document updated: <title> (version 1 → 2, segments touched: 2)"
+
+ 3. **Refresh** (`forest node refresh <chunk-id>`)
+    - Updates chunk node via flags/files/stdin
+    - Detects chunk membership and rebuilds canonical document
+    - Version bumps, checksums and offsets recalculated
+    - Logs: "document updated" or "document unchanged (no structural delta)"
+
+ 4. **Standalone notes**
+    - Nodes with `isChunk=false` bypass document session entirely
+    - Edited as independent entities with no canonical storage
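
The structural-edge step of the import lifecycle (root→chunks plus chunk[i]→chunk[i+1]) can be sketched as follows. The `StructuralEdge` shape, the `buildStructuralEdges` name, and the `"parent-child"` / `"sequential"` string values are assumptions for illustration; the diff only states that such edges exist and that `EdgeRecord` carries an `edgeType`.

```typescript
// Hypothetical edge shape; the string values for edgeType are assumed.
type StructuralEdgeType = "parent-child" | "sequential";

interface StructuralEdge {
  sourceId: string;
  targetId: string;
  edgeType: StructuralEdgeType;
}

function buildStructuralEdges(rootId: string, chunkIds: string[]): StructuralEdge[] {
  const edges: StructuralEdge[] = [];

  // Parent-child: one edge from the root node to every chunk.
  for (const chunkId of chunkIds) {
    edges.push({ sourceId: rootId, targetId: chunkId, edgeType: "parent-child" });
  }

  // Sequential: one edge from each chunk to the chunk that follows it.
  for (let i = 0; i + 1 < chunkIds.length; i++) {
    edges.push({ sourceId: chunkIds[i], targetId: chunkIds[i + 1], edgeType: "sequential" });
  }

  return edges;
}
```

For a document with n chunks this yields n parent-child edges and n - 1 sequential edges.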
+
+ **Key behaviors:**
+ - **Selective re-embedding**: Unchanged segments retain embeddings and edges (avoids expensive recomputation)
+ - **Checksum-based change detection**: SHA-256 of normalized content enables efficient diffing
+ - **Temp file preservation**: Parse failures save edited content to `/tmp/forest-edit-*` for debugging
+ - **Segment reordering**: Parser detects order changes; `chunkOrder` updated accordingly
+ - **Error recovery**: Validation errors show line numbers; user can fix and retry
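
Checksum-based change detection and selective re-embedding fit together roughly as sketched here: compare each edited segment's checksum with the stored one and re-embed only the segments that differ. The normalization rule (unify line endings, trim surrounding whitespace) and the names `normalizeContent`, `contentChecksum`, and `findTouchedSegments` are assumptions; the diff does not specify how content is normalized.

```typescript
import { createHash } from "node:crypto";

// Assumed normalization: unify line endings and trim surrounding whitespace.
function normalizeContent(text: string): string {
  return text.replace(/\r\n/g, "\n").trim();
}

function contentChecksum(text: string): string {
  return createHash("sha256").update(normalizeContent(text), "utf8").digest("hex");
}

interface EditedSegment {
  segmentId: string;
  body: string;
}

// Returns the IDs of segments whose checksum differs from the stored value,
// including segments that have no stored checksum at all.
function findTouchedSegments(
  storedChecksums: Map<string, string>,
  edited: EditedSegment[],
): string[] {
  return edited
    .filter((segment) => storedChecksums.get(segment.segmentId) !== contentChecksum(segment.body))
    .map((segment) => segment.segmentId);
}
```

Only the returned IDs would need new embeddings and rescoring; every other segment keeps its existing embedding and edges.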
+
+ **Editor buffer format example:**
+ ```markdown
+ # Forest Document Editor
+ # Document: My Research Paper (7fa7acb2)
+ # Total segments: 3
+
+ <!-- forest:segment start segment_id=seg-1 node_id=abc123 order=0 title="Introduction" -->
+ This is the introduction content...
+ <!-- forest:segment end segment_id=seg-1 -->
+
+ <!-- forest:segment start segment_id=seg-2 node_id=def456 order=1 title="Methods" focus=true -->
+ This is the methods section...
+ <!-- forest:segment end segment_id=seg-2 -->
+ ```
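
A buffer in this format could be parsed along the lines of the sketch below. It pairs start and end markers, reports line numbers for structural errors, and checks that every expected segment ID is present with no unknown ones. The function name `parseEditorBuffer`, the regular expressions, and the error wording are assumptions; this is not the repository's document-session parser.

```typescript
interface ParsedSegment {
  segmentId: string;
  nodeId: string;
  body: string;
}

// Marker patterns inferred from the example buffer; attribute order is assumed fixed.
const START_MARKER = /^<!-- forest:segment start segment_id=(\S+) node_id=(\S+)\s.*-->$/;
const END_MARKER = /^<!-- forest:segment end segment_id=(\S+) -->$/;

function parseEditorBuffer(buffer: string, expectedIds: string[]): ParsedSegment[] {
  const segments: ParsedSegment[] = [];
  let open: { segmentId: string; nodeId: string; lines: string[] } | null = null;
  const lines = buffer.split(/\r?\n/);

  for (let i = 0; i < lines.length; i++) {
    const line = lines[i];
    const start = START_MARKER.exec(line.trim());
    const end = END_MARKER.exec(line.trim());
    if (start) {
      if (open) throw new Error(`line ${i + 1}: segment ${open.segmentId} was never closed`);
      open = { segmentId: start[1], nodeId: start[2], lines: [] };
    } else if (end) {
      if (!open || open.segmentId !== end[1]) {
        throw new Error(`line ${i + 1}: end marker for ${end[1]} has no matching start`);
      }
      segments.push({
        segmentId: open.segmentId,
        nodeId: open.nodeId,
        body: open.lines.join("\n").trim(),
      });
      open = null;
    } else if (open) {
      open.lines.push(line);
    }
    // Lines outside any segment (such as the "# ..." header) are ignored.
  }
  if (open) throw new Error(`segment ${open.segmentId} was never closed`);

  const seen = new Set(segments.map((s) => s.segmentId));
  for (const id of expectedIds) {
    if (!seen.has(id)) throw new Error(`missing segment ${id}`);
  }
  for (const segment of segments) {
    if (!expectedIds.includes(segment.segmentId)) {
      throw new Error(`unknown segment ${segment.segmentId}`);
    }
  }
  return segments;
}
```

The returned array preserves buffer order, so a caller could compare each segment's position with its stored `chunkOrder` to detect reordering.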
+
+ **Metadata schema** (DocumentRecord.metadata):
+ ```typescript
+ {
+   // Import settings
+   chunkStrategy: 'headers' | 'size' | 'hybrid',
+   maxTokens: number,
+   overlap: number,
+   chunkCount: number,
+   source: 'import' | 'backfill',
+
+   // Edit tracking
+   lastEditedAt: ISO8601 timestamp,
+   lastEditedNodeId: UUID,
+
+   // Backfill flags
+   backfill: boolean,
+   chunkOrdersProvided: boolean
+ }
+ ```
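
The schema above is written as documentation shorthand (`ISO8601 timestamp` and `UUID` are not TypeScript types). One compilable reading is sketched below. Marking the edit-tracking and backfill fields optional is an assumption, as is the `recordEdit` helper; the typed interface added by this commit may differ.

```typescript
type ChunkStrategy = "headers" | "size" | "hybrid";

// A compilable reading of the documented schema. Optionality is assumed.
interface DocumentMetadata {
  // Import settings
  chunkStrategy: ChunkStrategy;
  maxTokens: number;
  overlap: number;
  chunkCount: number;
  source: "import" | "backfill";

  // Edit tracking
  lastEditedAt?: string; // ISO 8601 timestamp
  lastEditedNodeId?: string; // UUID of the node that was edited

  // Backfill flags
  backfill?: boolean;
  chunkOrdersProvided?: boolean;
}

// Hypothetical helper: returns a copy of the metadata with edit tracking updated.
function recordEdit(
  metadata: DocumentMetadata,
  nodeId: string,
  now: Date = new Date(),
): DocumentMetadata {
  return { ...metadata, lastEditedAt: now.toISOString(), lastEditedNodeId: nodeId };
}
```

Returning a new object instead of mutating the input keeps the previous metadata available for comparison when deciding whether a version bump is needed.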
+
  ### Scoring Algorithm (src/lib/scoring.ts)

  **Hybrid scoring** computes edge weights between node pairs:

bun.lock

Lines changed: 9 additions & 0 deletions
Some generated files are not rendered by default.

package.json

Lines changed: 3 additions & 1 deletion
@@ -10,7 +10,8 @@
     "lint": "bunx tsc --noEmit",
     "dev": "bunx tsx src/index.ts",
     "dev:server": "bunx tsx src/server/index.ts",
-    "start": "bun run dist/index.js"
+    "start": "bun run dist/index.js",
+    "test": "bun test"
   },
   "keywords": [
     "graph",
@@ -34,6 +35,7 @@
     "sql.js": "^1.11.0"
   },
   "devDependencies": {
+    "@types/bun": "^1.3.0",
     "@types/marked-terminal": "^6.1.1",
     "@types/node": "^20.11.20",
     "@types/sql.js": "^1.4.9",
