Code aware chunking in RAG strategies by krissetto · Pull Request #967 · docker/docker-agent

krissetto · 2025-11-27T13:23:29Z

The goal of this change is to let RAG strategy chunking configurations declare that they are code_aware.

When code_aware: true is set, the RAG system uses tree-sitter bindings to parse the code's AST during chunking, so that logical blocks of code such as functions are not split up.

Only Go code is supported in this initial implementation, but we can add support for more languages fairly easily in follow-up PRs.
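
To illustrate the idea of not splitting logical blocks, here is a minimal sketch of AST-aware chunking for Go source. It is not the implementation from this PR: it swaps the tree-sitter bindings for Go's standard go/parser package so that it runs without CGO, and the names chunkGoSource and declStart are hypothetical. It assumes a chunk size measured in bytes and that a declaration larger than the limit is kept whole rather than cut.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// declStart returns the position where a declaration begins,
// including its doc comment when one is attached.
func declStart(d ast.Decl) token.Pos {
	switch n := d.(type) {
	case *ast.FuncDecl:
		if n.Doc != nil {
			return n.Doc.Pos()
		}
	case *ast.GenDecl:
		if n.Doc != nil {
			return n.Doc.Pos()
		}
	}
	return d.Pos()
}

// chunkGoSource (hypothetical helper) packs top-level declarations into
// chunks of at most maxSize bytes. A declaration is never split: if it is
// larger than maxSize on its own, it becomes an oversized chunk.
// The package clause is skipped in this sketch.
func chunkGoSource(src string, maxSize int) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "input.go", src, parser.ParseComments)
	if err != nil {
		return nil, err
	}

	var chunks []string
	current := ""
	for _, decl := range file.Decls {
		start := fset.Position(declStart(decl)).Offset
		end := fset.Position(decl.End()).Offset
		text := src[start:end]

		// Flush the current chunk if adding this declaration would exceed the limit.
		if current != "" && len(current)+len("\n\n")+len(text) > maxSize {
			chunks = append(chunks, current)
			current = ""
		}
		if current != "" {
			current += "\n\n"
		}
		current += text
	}
	if current != "" {
		chunks = append(chunks, current)
	}
	return chunks, nil
}

func main() {
	src := "package demo\n\n// A returns one.\nfunc A() int {\n\treturn 1\n}\n\nfunc B() int {\n\treturn 2\n}\n"
	chunks, err := chunkGoSource(src, 40)
	if err != nil {
		panic(err)
	}
	for i, c := range chunks {
		fmt.Printf("--- chunk %d (%d bytes) ---\n%s\n", i, len(c), c)
	}
}
```

A plain size-based splitter could cut either function in the middle once the limit is reached; here the boundary always falls between declarations, which is the property the code_aware option is meant to provide.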

Example:

...
rag:
  codebase:
    docs: [./src]
    strategies:
      - type: chunked-embeddings
        embedding_model: openai/text-embedding-3-small
        database: ./code.db
        chunking:
          size: 2000
          code_aware: true    # <- Enable AST-based chunking
    results:
      limit: 5

The tree-sitter AST parsing will also be used in follow-up RAG strategies to gather the semantic meaning of the analyzed code before generating embeddings 😏

krissetto · 2025-11-27T14:02:29Z

needs some extra work to prepare for using CGO in CI

Signed-off-by: Christopher Petito <chrisjpetito@gmail.com>

krissetto requested a review from a team as a code owner November 27, 2025 13:23

krissetto force-pushed the code-aware-chunking branch from b132740 to 87d9f29 Compare November 27, 2025 14:00

Code aware chunking in RAG strategies using treesitter

bc2b832

Signed-off-by: Christopher Petito <chrisjpetito@gmail.com>

krissetto force-pushed the code-aware-chunking branch from 97b6e92 to bc2b832 Compare November 28, 2025 12:16

dgageot approved these changes Nov 28, 2025


dgageot merged commit fa676c0 into docker:main Nov 28, 2025
5 checks passed

thaJeztah mentioned this pull request Dec 2, 2025

build(deps): bump softprops/action-gh-release from 2.4.2 to 2.5.0 docker/packaging#329

Merged

crazy-max mentioned this pull request Dec 5, 2025

cagent: fix build since switch to CGo docker/packaging#333

Merged

dgageot merged 1 commit into docker:main from krissetto:code-aware-chunking