feat: wire TreeSitterChunker into LibScopeLite.index() via preChunked #461

Merged
RobertLD merged 1 commit into fix/export-treesitter-chunker from feat/treesitter-preChunked-lite-index
Mar 19, 2026
@RobertLD
Owner

Summary

  • Add preChunked?: string[] to IndexDocumentInput — when provided, indexDocument uses these chunks directly, bypassing the markdown chunker
  • LibScopeLite.index() now checks doc.language: if it is set and supported by TreeSitterChunker, index() pre-chunks the content at function/class boundaries and passes the result as preChunked; on any error it falls back silently to the text chunker
  • Update LiteDoc docs (lite.md, lite-api.md) to mark setting language as the preferred approach over using TreeSitterChunker directly
  • 7 new tests across lite.test.ts and indexing.test.ts
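
The chunk-selection rule described in the summary can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `resolveChunks` and `chunkText` are hypothetical names, the `IndexDocumentInput` shape is cut down to the fields mentioned above (with `title` and `content` assumed), and the blank-line splitter only stands in for the real markdown chunker.

```typescript
// Hypothetical, trimmed-down stand-in for the real IndexDocumentInput.
interface IndexDocumentInput {
  title: string;
  content: string;
  // Caller-supplied chunks; when non-empty they are used verbatim.
  preChunked?: string[];
}

// Placeholder for the existing markdown/text chunker (assumption):
// splits on blank lines and drops empty parts.
function chunkText(content: string): string[] {
  return content
    .split(/\n\s*\n/)
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
}

// Prefer the caller's chunks; an empty or undefined preChunked
// falls through to normal text chunking.
function resolveChunks(input: IndexDocumentInput): string[] {
  if (input.preChunked !== undefined && input.preChunked.length > 0) {
    return input.preChunked;
  }
  return chunkText(input.content);
}
```

Treating an empty array the same as `undefined` mirrors the test plan entry for empty/undefined `preChunked`.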

Test plan

  • npm run typecheck — no new errors
  • npm test — 1488 tests pass (7 new)
  • index() with supported language → chunk() called, results passed as preChunked
  • index() with unsupported/no language → text chunker used, no exception
  • index() when tree-sitter throws → falls back silently, indexing succeeds
  • indexDocument with preChunked → chunks stored verbatim in DB
  • indexDocument with empty/undefined preChunked → normal text chunking
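
The three `index()` cases in the test plan (supported language, unsupported or missing language, chunker throws) could be modelled along these lines. Everything here is an assumption made for illustration: `tryPreChunk`, the `CodeChunker` interface, and `toyChunker` are hypothetical, the `LiteDoc` shape is reduced to three fields, and the real `TreeSitterChunker` API may well be asynchronous and return richer chunk objects.

```typescript
// Hypothetical, reduced LiteDoc: only the fields this sketch needs.
interface LiteDoc {
  title: string;
  content: string;
  language?: string;
}

// Hypothetical chunker interface; not the real TreeSitterChunker API.
interface CodeChunker {
  supports(language: string): boolean;
  chunk(content: string, language: string): string[];
}

// Returns code-aware chunks, or undefined to signal
// "use the normal text chunker instead".
function tryPreChunk(doc: LiteDoc, chunker: CodeChunker): string[] | undefined {
  const language = doc.language;
  if (language === undefined || !chunker.supports(language)) {
    return undefined;
  }
  try {
    const chunks = chunker.chunk(doc.content, language);
    return chunks.length > 0 ? chunks : undefined;
  } catch {
    // Silent fallback: a missing parser or parse failure must not fail indexing.
    return undefined;
  }
}

// Toy chunker used only to exercise the three cases.
const toyChunker: CodeChunker = {
  supports: (language) => language === "cpp",
  chunk: (content) => {
    if (content.includes("<<broken>>")) {
      throw new Error("parse failure");
    }
    return content.split("\n\n").filter((part) => part.trim().length > 0);
  },
};
```

With `toyChunker`, a document with `language: "cpp"` yields one chunk per blank-line-separated block, while a document with no language, an unsupported language, or content that makes the chunker throw yields `undefined`, so the caller would index it through the text chunker.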

🤖 Generated with Claude Code

Add `preChunked?: string[]` to `IndexDocumentInput` — when provided,
`indexDocument` skips the markdown chunker and uses the caller's chunks
directly.

`LibScopeLite.index()` now checks `doc.language`: if set and supported,
it pre-chunks the content with `TreeSitterChunker` and passes the result
as `preChunked`. Falls back silently to the text chunker on any error
(tree-sitter not installed, parse failure, etc.).

Consumers set `language: "cpp"` (or any supported alias) on their
`LiteDoc` and get function/class-boundary chunks automatically.
Docs updated to note this as the preferred approach over using
`TreeSitterChunker` directly.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@vercel

vercel bot commented Mar 19, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
libscope Ignored Ignored Preview Mar 19, 2026 9:25pm

@RobertLD RobertLD merged commit e14cfe6 into fix/export-treesitter-chunker Mar 19, 2026
2 checks passed
@RobertLD RobertLD deleted the feat/treesitter-preChunked-lite-index branch March 19, 2026 21:23

RobertLD added a commit that referenced this pull request Mar 19, 2026
* fix: export TreeSitterChunker and CodeChunk from libscope/lite

TreeSitterChunker was compiled but not re-exported from the ./lite
entry point, making it inaccessible to consumers using the package
exports map.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* feat: wire TreeSitterChunker into LibScopeLite.index() via preChunked (#461)

Add `preChunked?: string[]` to `IndexDocumentInput` — when provided,
`indexDocument` skips the markdown chunker and uses the caller's chunks
directly.

`LibScopeLite.index()` now checks `doc.language`: if set and supported,
it pre-chunks the content with `TreeSitterChunker` and passes the result
as `preChunked`. Falls back silently to the text chunker on any error
(tree-sitter not installed, parse failure, etc.).

Consumers set `language: "cpp"` (or any supported alias) on their
`LiteDoc` and get function/class-boundary chunks automatically.
Docs updated to note this as the preferred approach over using
`TreeSitterChunker` directly.

Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* style: fix prettier formatting

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>