feat(search): Implement semantic chunk merging and strict search modes by doITmagic · Pull Request #34 · doITmagic/rag-code-mcp

doITmagic · 2026-03-07T17:38:11Z

Description

This PR addresses the issue of search result "pollution" when querying large documents. Previously, large Markdown, HTML, YAML, or JSON files indexed with TreeSitter would dominate the top search results with multiple fragmented chunks, pushing out relevant backend code.

Changes included:

- Tree-based Chunk Merging (Deduplication): Adds a post-retrieval step, groupDocsByTree, that merges adjacent or overlapping chunks from the same file and AST signature back into a single cohesive block. Any gap lines missing between chunks are read directly from disk.
- Strict Search Modes: Adds a new Mode field to the rag_search tool (strict_code, strict_docs, all), allowing AI agents to explicitly filter out documentation or code to avoid context pollution.
- Tests: Adds unit tests verifying gap retrieval, graceful fallbacks, and that chunks from different files are never merged.
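The merging step described above can be sketched as interval merging over line ranges. This is a minimal illustration, not the code in this PR: the `Chunk` type, the `mergeChunks` function, and the `maxGap` parameter are hypothetical names, and the sketch only merges line ranges without loading any text from disk.

```go
package main

import (
	"fmt"
	"sort"
)

// Chunk is a hypothetical stand-in for one retrieved search result.
type Chunk struct {
	FilePath  string
	Signature string
	StartLine int // 1-indexed, inclusive
	EndLine   int // inclusive
}

// groupKey identifies chunks that are allowed to merge with each other.
// A struct key cannot collide the way a "path|signature" string could.
type groupKey struct {
	filePath  string
	signature string
}

// mergeChunks merges chunks that share a file and signature and whose line
// ranges overlap or are separated by at most maxGap lines (an assumption).
func mergeChunks(chunks []Chunk, maxGap int) []Chunk {
	groups := make(map[groupKey][]Chunk)
	var order []groupKey // keeps groups in first-seen order
	for _, c := range chunks {
		k := groupKey{c.FilePath, c.Signature}
		if _, seen := groups[k]; !seen {
			order = append(order, k)
		}
		groups[k] = append(groups[k], c)
	}

	var out []Chunk
	for _, k := range order {
		g := groups[k]
		sort.Slice(g, func(i, j int) bool { return g[i].StartLine < g[j].StartLine })
		cur := g[0]
		for _, c := range g[1:] {
			if c.StartLine <= cur.EndLine+1+maxGap {
				// Overlapping or close enough: extend the current block.
				if c.EndLine > cur.EndLine {
					cur.EndLine = c.EndLine
				}
				continue
			}
			out = append(out, cur)
			cur = c
		}
		out = append(out, cur)
	}
	return out
}

func main() {
	chunks := []Chunk{
		{"README.md", "Install", 10, 20},
		{"README.md", "Install", 18, 30},
		{"README.md", "Install", 33, 40},
		{"other.md", "Install", 21, 25},
	}
	for _, c := range mergeChunks(chunks, 2) {
		fmt.Printf("%s %s %d-%d\n", c.FilePath, c.Signature, c.StartLine, c.EndLine)
	}
}
```

Grouping before sorting keeps chunks from different files apart even when their line ranges overlap, which is the multi-file isolation the tests in this PR are described as covering.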

Type of change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update

Checklist:

I have performed a self-review of my own code
I have formatted my code with go fmt ./...
I have run tests go test ./... and they pass
I have verified integration with Ollama/Qdrant (if applicable)
I have updated the documentation accordingly

Copilot

Pull request overview

This PR addresses search result "pollution" from large documentation files by introducing: (1) a post-retrieval chunk merging step that consolidates adjacent or overlapping chunks from the same file and AST signature into unified blocks, reading gap lines directly from disk; and (2) a new Mode field (strict_code, strict_docs, all) on the rag_search tool that explicitly filters documentation or code results.

Changes:

Added mode-based post-retrieval filtering (strict_code, strict_docs, all) in Execute.
Added groupDocsByTree to merge doc chunks by (filePath, signature) key, with disk-based gap fill and fallback to [...] concatenation.
Added TestGroupDocsByTree covering gap retrieval, multi-file isolation, and fallback behavior.
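The mode-based filtering listed above could look roughly like the following. This is a sketch under assumptions: `Result` and `filterByMode` are hypothetical names, the extension list is taken from the file types named in the PR description rather than from the actual code, and treating an empty mode as `all` is a guess.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// Result is a hypothetical stand-in for one search hit.
type Result struct {
	FilePath string
}

// isDocExtension reports whether a path looks like documentation or config.
// The extension list here is an assumption based on the PR description.
func isDocExtension(path string) bool {
	switch strings.ToLower(filepath.Ext(path)) {
	case ".md", ".html", ".yaml", ".yml", ".json":
		return true
	}
	return false
}

// filterByMode keeps only code (strict_code), only docs (strict_docs),
// or everything (all, or an empty mode).
func filterByMode(results []Result, mode string) ([]Result, error) {
	switch mode {
	case "", "all":
		return results, nil
	case "strict_code", "strict_docs":
		// handled below
	default:
		return nil, fmt.Errorf("unknown mode %q", mode)
	}

	wantDocs := mode == "strict_docs"
	var kept []Result
	for _, r := range results {
		if isDocExtension(r.FilePath) == wantDocs {
			kept = append(kept, r)
		}
	}
	return kept, nil
}

func main() {
	results := []Result{{"main.go"}, {"README.md"}, {"config.yaml"}}
	for _, mode := range []string{"all", "strict_code", "strict_docs"} {
		kept, err := filterByMode(results, mode)
		if err != nil {
			panic(err)
		}
		fmt.Println(mode, kept)
	}
}
```

Rejecting unknown modes with an error, instead of silently falling back to `all`, makes a typo in the tool input visible to the calling agent.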

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| `internal/service/tools/smart_search.go` | Adds `Mode` field to `SmartSearchInput`, mode-based result filtering in `Execute`, `readLines` helper, and `groupDocsByTree` merging function. |
| `internal/service/tools/smart_search_test.go` | Unit tests for `groupDocsByTree` covering code pass-through, doc grouping with gap fill, multi-file isolation, and disk-fallback behavior. |

- Auto-enable IncludeDocs when mode=strict_docs
- Use bufio.Scanner in readLines for memory efficiency
- Use struct groupKey instead of string separator for collision safety
- Extract shared isDocSymbolType/isDocExtension helpers for consistency
- Remove non-doc extensions (.sh, .sql, .css, .scss, .svelte) from isDocFile
- Fix misleading comment about code_block in test
- Add error check for os.WriteFile in test

doITmagic added 2 commits March 7, 2026 19:34

feat(search): implement tree-based deduplication and strict modes

0d4dd84

test: add edge case unit tests for AST chunk merging

d735d3b

Copilot AI review requested due to automatic review settings March 7, 2026 17:38

doITmagic self-assigned this Mar 7, 2026

Copilot started reviewing on behalf of doITmagic March 7, 2026 17:38

Copilot AI reviewed Mar 7, 2026

doITmagic merged commit 9a1c628 into dev Mar 7, 2026
5 checks passed

doITmagic deleted the feat/smart-search-merge branch March 7, 2026 20:02

doITmagic pushed a commit that referenced this pull request Mar 8, 2026

docs: rebuild V2 documentation and fix PR #34 comments

2200f9e

doITmagic merged 3 commits into dev from feat/smart-search-merge
