
Commit 8de2525

bwl and claude committed
feat: Add database schema for document-aware chunking (Phase 1)
This commit introduces the foundational schema and types needed to support document chunking while maintaining backward compatibility with existing data.

Schema Changes:
- Added `is_chunk`, `parent_document_id`, `chunk_order` columns to nodes table
- Added `edge_type` column to edges table (semantic, parent-child, sequential, manual)
- All new columns have appropriate defaults for backward compatibility

Type Updates:
- NodeRecord now includes chunking metadata fields
- EdgeRecord now includes edgeType discriminator
- Created parseNodeRow() and parseEdgeRow() helper functions

Core Infrastructure:
- src/core/import.ts - Document import with chunking and structural edges
- src/lib/chunking.ts - Header-based and size-based chunking strategies

Updated all edge/node creation sites to include new required fields:
- Semantic edges marked as 'semantic' (auto-linking)
- Manual edges marked as 'manual' (user-created links)
- Parent-child edges marked as 'parent-child' (document structure)
- Sequential edges marked as 'sequential' (chunk ordering)
- All nodes marked with isChunk=false unless explicitly chunked

Next phases will add:
- Document reconstruction utilities
- UX updates to hide chunks and show parent documents
- forest import command for explicit document ingestion

🤖 Generated with Claude Code (https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
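The row-parsing helpers named in the message are not shown in this excerpt. As a rough illustration of how snake_case columns might map onto the new record fields, here is a minimal sketch. The raw row shapes, the `EDGE_TYPES` list, and the fallback to `'semantic'` for rows without an `edge_type` value are all assumptions made for this example; the actual `parseNodeRow()` and `parseEdgeRow()` in the repository may differ.

```typescript
// Hypothetical sketch: column names come from the commit message; the row
// shapes, defaults, and function bodies are assumptions, not the repo's code.
type EdgeType = 'semantic' | 'parent-child' | 'sequential' | 'manual';

interface NodeChunkFields {
  isChunk: boolean;
  parentDocumentId: string | null;
  chunkOrder: number | null;
}

// Raw rows as a SQLite-style driver might return them (assumed shape).
interface RawNodeRow {
  id: string;
  is_chunk?: number | null;
  parent_document_id?: string | null;
  chunk_order?: number | null;
}

interface RawEdgeRow {
  id: string;
  edge_type?: string | null;
}

const EDGE_TYPES: readonly EdgeType[] = ['semantic', 'parent-child', 'sequential', 'manual'];

// Rows written before the migration lack the new columns, so every field
// falls back to the "not a chunk" values.
function parseNodeRow(row: RawNodeRow): { id: string } & NodeChunkFields {
  return {
    id: row.id,
    isChunk: row.is_chunk === 1,
    parentDocumentId: row.parent_document_id ?? null,
    chunkOrder: row.chunk_order ?? null,
  };
}

// Unknown or missing edge types fall back to 'semantic' (an assumed default).
function parseEdgeRow(row: RawEdgeRow): { id: string; edgeType: EdgeType } {
  const candidate = row.edge_type ?? 'semantic';
  const edgeType = EDGE_TYPES.find((t) => t === candidate) ?? 'semantic';
  return { id: row.id, edgeType };
}
```

Centralizing the column-to-field mapping in helpers like these would mean a legacy row and a freshly chunked row both produce a fully populated record, which is one way to keep the new fields required on the TypeScript side.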
1 parent 80ce3e4 commit 8de2525

10 files changed

Lines changed: 1014 additions & 74 deletions


src/cli/commands/admin-recompute-embeddings.ts

Lines changed: 1 addition & 0 deletions
@@ -90,6 +90,7 @@ async function runAdminRecomputeEmbeddings(flags: AdminRecomputeEmbeddingsFlags)
     targetId,
     score,
     status: status as EdgeStatus,
+    edgeType: 'semantic',
     metadata: { components },
     createdAt: new Date().toISOString(),
     updatedAt: new Date().toISOString(),

src/cli/commands/capture.ts

Lines changed: 3 additions & 0 deletions
@@ -130,6 +130,9 @@ async function runCapture(flags: CaptureFlags) {
     embedding,
     createdAt: new Date().toISOString(),
     updatedAt: new Date().toISOString(),
+    isChunk: false,
+    parentDocumentId: null,
+    chunkOrder: null,
   };
 
   const existingNodes = await listNodes();
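Captured notes take the non-chunk defaults shown in this hunk. For contrast, the sketch below shows how an imported document might be split on Markdown headers into chunk nodes linked by 'parent-child' and 'sequential' edges. It is illustrative only: `splitByHeaders`, `buildChunks`, the `ChunkNode` and `StructuralEdge` shapes, the `documentId#index` id scheme, and the zero-based `chunkOrder` are hypothetical and are not taken from src/lib/chunking.ts or src/core/import.ts.

```typescript
// Hypothetical sketch of header-based chunking with structural edges.
// None of these names or shapes are confirmed by the diff.
interface ChunkNode {
  id: string;
  body: string;
  isChunk: boolean;
  parentDocumentId: string | null;
  chunkOrder: number | null;
}

interface StructuralEdge {
  sourceId: string;
  targetId: string;
  edgeType: 'parent-child' | 'sequential';
}

// Start a new chunk at every Markdown header line ('#' through '######').
function splitByHeaders(markdown: string): string[] {
  const chunks: string[] = [];
  let current: string[] = [];
  for (const line of markdown.split('\n')) {
    if (/^#{1,6}\s/.test(line) && current.length > 0) {
      chunks.push(current.join('\n'));
      current = [];
    }
    current.push(line);
  }
  if (current.length > 0) chunks.push(current.join('\n'));
  // Drop chunks that contain only whitespace.
  return chunks.filter((c) => c.trim().length > 0);
}

function buildChunks(
  documentId: string,
  markdown: string,
): { nodes: ChunkNode[]; edges: StructuralEdge[] } {
  const nodes: ChunkNode[] = splitByHeaders(markdown).map((body, i) => ({
    id: `${documentId}#${i}`, // assumed id scheme
    body,
    isChunk: true,
    parentDocumentId: documentId,
    chunkOrder: i, // assumed zero-based
  }));

  const edges: StructuralEdge[] = [];
  let previousId: string | null = null;
  for (const node of nodes) {
    // Every chunk hangs off its parent document.
    edges.push({ sourceId: documentId, targetId: node.id, edgeType: 'parent-child' });
    // Consecutive chunks are chained to preserve reading order.
    if (previousId !== null) {
      edges.push({ sourceId: previousId, targetId: node.id, edgeType: 'sequential' });
    }
    previousId = node.id;
  }
  return { nodes, edges };
}
```

For example, `buildChunks('doc', '# A\nalpha\n## B\nbeta')` would yield two chunk nodes and three edges under these assumptions: two 'parent-child' edges from the document and one 'sequential' edge between the chunks.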

src/cli/commands/edges.ts

Lines changed: 2 additions & 0 deletions
@@ -769,6 +769,7 @@ async function runEdgesUndo(ref: string | undefined) {
     targetId,
     score: (event.payload?.score as number) ?? 0,
     status: 'suggested',
+    edgeType: 'semantic',
     metadata: event.payload?.metadata ?? null,
     createdAt: new Date().toISOString(),
     updatedAt: new Date().toISOString(),
@@ -786,6 +787,7 @@ async function runEdgesUndo(ref: string | undefined) {
     targetId,
     score: (event.payload?.score as number) ?? 0,
     status: 'suggested',
+    edgeType: 'semantic',
     metadata: event.payload?.metadata ?? null,
     createdAt: new Date().toISOString(),
     updatedAt: new Date().toISOString(),
