Merged
22 commits
a2cfdcd
feat: add knowledge base, reranking, and config merge
Apr 2, 2026
6215056
fix: address PR review comments and add error handling
Apr 2, 2026
7f0395b
fix segmentation fault
Apr 2, 2026
05f4eef
Merge branch 'main' of https://github.com/fsender/opencode-codebase-i…
Apr 2, 2026
264751a
fix: add parse limit to 4096 levels
Apr 2, 2026
8409030
fix: prevent segmentation fault and optimize performance
Apr 2, 2026
3206ab0
fix: watcher now accounts for additionalInclude patterns
Apr 3, 2026
8b59012
fix CI typecheck
Apr 3, 2026
2e0bbaf
docs: update ARCHITECTURE.md for new features.
fsender Apr 3, 2026
e0011da
fix: Reset recursive limit to 1024 to avoid segmentation fault
fsender Apr 3, 2026
e71f0a4
Merge branch 'main' into Resolve-conflict
fsender Apr 4, 2026
eaae319
fix: config tool reading and merging issue
fsender Apr 4, 2026
c6e9389
fix: call graph increase stability
fsender Apr 5, 2026
b14c284
fix: call-extractor
fsender Apr 5, 2026
7bee8c8
fix: remove all parent() calls and fix PHP method call detection
fsender Apr 6, 2026
bd3ceb6
fix: remove parent() calls, upgrade tree-sitter, fix PHP method call …
fsender Apr 6, 2026
c874abb
Merge branch 'main' into Resolve-conflict
fsender Apr 9, 2026
b026180
fix: TXT and MD chunking
fsender Apr 9, 2026
a98ed4f
fix: TXT and MD chunking
fsender Apr 9, 2026
e3d3f5b
Merge remote-tracking branch 'origin/Resolve-conflict'
fsender Apr 9, 2026
55d9ae2
fix: keep short txt and md chunks
Helweg Apr 11, 2026
0ffca37
fix: unblock native parser and call graph tests
Helweg Apr 11, 2026
39 changes: 36 additions & 3 deletions ARCHITECTURE.md
@@ -21,9 +21,10 @@ This document explains the architecture of opencode-codebase-index, including da
│ OpenCode Agent │
│ │
│ Tools: codebase_search, codebase_peek, find_similar, call_graph, │
│ index_codebase, index_status, index_health_check, index_metrics, │
│ index_logs │
│ Commands: /search, /find, /call-graph, /index, /status │
│ index_codebase, index_status, index_health_check, index_metrics, │
│ index_logs, add_knowledge_base, list_knowledge_bases, │
│ remove_knowledge_base │
│ Commands: /search, /find, /call-graph, /index, /status │
└─────────────────────────────────────────────────────────────────────────────┘
@@ -170,6 +171,7 @@ File system observer using chokidar:
- Watches for file changes → triggers incremental index
- Watches `.git/HEAD` → detects branch switches
- Debounces rapid changes (500ms window)
- Merges `additionalInclude` patterns with `include` patterns so that files matched only by `additionalInclude` are also watched
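
A minimal sketch of the pattern merge described in the last bullet, assuming both options are plain arrays of glob strings. The helper name `mergeIncludePatterns` is hypothetical and not taken from the plugin source; the actual watcher may combine the lists differently.

```typescript
// Hypothetical helper: combines the base `include` globs with the optional
// `additionalInclude` globs, dropping duplicates and keeping first-seen order.
function mergeIncludePatterns(
  include: string[],
  additionalInclude: string[] = [],
): string[] {
  const merged: string[] = [];
  for (const pattern of include.concat(additionalInclude)) {
    // Skip globs that were already added so the watcher sees each one once.
    if (merged.indexOf(pattern) === -1) {
      merged.push(pattern);
    }
  }
  return merged;
}

// Example: "**/*.md" appears in both lists but is kept only once.
const watchPatterns = mergeIncludePatterns(
  ["**/*.ts", "**/*.md"],
  ["**/*.proto", "**/*.md"],
);
console.log(watchPatterns);
```

The merged list would then be used for both the indexer's file filter and the watcher's, so the two cannot disagree about which files are tracked.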

## Design Decisions

@@ -223,6 +225,21 @@ BM25 hybrid provides:
- Better results for technical queries
- Configurable weighting (hybridWeight)

### Why Optimized Tool Return Formats?

Problem: Redundant prompt phrases in tool responses increase token usage and may cause LLMs to exit reasoning prematurely.

Solution:
- **Remove summary phrases**: e.g., "Found X results", "Index status:", "Health check complete:"
- **Return raw data**: Direct result lists without introductory text
- **Maintain clarity**: Keep essential context for unambiguous results

Benefits:
- Fewer tokens consumed per tool call
- Less text for the LLM to process in each response
- Lower risk that the model treats a summary phrase as a final answer and ends its reasoning early
- Same underlying data, with cleaner output
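
The difference between the two return styles can be sketched as follows. The `SearchResult` shape and both function names are assumptions made for illustration; they are not the plugin's actual types or formatters.

```typescript
// Hypothetical result shape, for illustration only.
interface SearchResult {
  file: string;
  line: number;
  snippet: string;
}

// Renders one result per line as "file:line snippet".
function renderLines(results: SearchResult[]): string {
  return results.map((r) => r.file + ":" + r.line + " " + r.snippet).join("\n");
}

// Before: a summary phrase precedes the data.
function formatVerbose(query: string, results: SearchResult[]): string {
  return "Found " + results.length + " results for '" + query + "':\n" + renderLines(results);
}

// After: only the result lines are returned, with no introductory text.
function formatRaw(results: SearchResult[]): string {
  return renderLines(results);
}
```

Both functions carry the same data; the raw form simply omits the leading sentence that the model would otherwise have to read on every call.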

### Why Branch-Aware Indexing?

Problem: Switching branches changes code but embeddings are expensive.
@@ -284,6 +301,21 @@ Benefits:

For a typical 500-file codebase (~5000 chunks): ~30MB total

### Tool Call Performance

Tool return formats are optimized to reduce token usage:

| Tool | Before Optimization | After Optimization | Token Savings |
|------|---------------------|-------------------|---------------|
| `codebase_search` | "Found X results for 'query': ..." | Raw result list | ~15-20 tokens |
| `codebase_peek` | "Found X locations for 'query': ..." | Raw result list | ~15-20 tokens |
| `find_similar` | "Found X similar code blocks: ..." | Raw result list | ~15-20 tokens |
| `call_graph` | "X calls Y function(s): ..." | Raw result list | ~10-15 tokens |
| `index_status` | "Index status: ..." | Raw data | ~5-10 tokens |
| `formatHealthCheck` | "Health check complete: ..." | Raw data | ~5-10 tokens |

**Impact**: Each call saves only a handful of tokens, but the savings accumulate over sessions with many tool calls, reducing LLM context size and API cost.

## Security Considerations

### What Gets Sent to Cloud
@@ -321,6 +353,7 @@ No credentials are stored by the plugin.
- `ts_language()` match arm
- `is_comment_node()` patterns
- `is_semantic_node()` patterns
- Note: Recursion depth is limited to 1024 levels to prevent stack overflow
4. Add tests in `native/src/parser.rs`
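
The recursion limit mentioned in the note can be illustrated with a depth-guarded traversal. This is a TypeScript sketch only: the real parser lives in Rust (`native/src/parser.rs`), and the `SyntaxNode` shape and `collectKinds` function here are hypothetical.

```typescript
// Hypothetical node shape; the real parser walks tree-sitter nodes in Rust.
interface SyntaxNode {
  kind: string;
  children: SyntaxNode[];
}

// Limit stated in the note above.
const MAX_DEPTH = 1024;

// Collects node kinds depth-first and stops descending once `maxDepth` is
// exceeded, so pathologically nested input cannot exhaust the call stack.
function collectKinds(
  node: SyntaxNode,
  depth: number = 0,
  maxDepth: number = MAX_DEPTH,
  out: string[] = [],
): string[] {
  if (depth > maxDepth) {
    return out; // subtree is too deep: skip it instead of recursing further
  }
  out.push(node.kind);
  for (const child of node.children) {
    collectKinds(child, depth + 1, maxDepth, out);
  }
  return out;
}
```

When adding a language whose grammar nests deeply, nodes below the limit are skipped rather than causing a crash, which is worth keeping in mind when writing parser tests.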

### Adding a New Embedding Provider
12 changes: 12 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,18 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

### Added
- **Knowledge base support**: Added `add_knowledge_base`, `list_knowledge_bases`, and `remove_knowledge_base` tools to manage external document folders indexed alongside the project
- **Reranking with SiliconFlow**: Added `BAAI/bge-reranker-v2-m3` reranking support via SiliconFlow API for improved search result quality
- **TXT/HTML file support**: Added `*.txt`, `*.html`, `*.htm` to default include patterns for document indexing
- **Config merging**: Global and project configs are now merged, allowing shared provider settings at global level and knowledge base paths at project level
- **Hidden file exclusion**: Files and folders starting with `.` are now excluded from indexing and file watching
- **Build folder exclusion**: Folders containing "build" in their name (e.g., `build`, `mingwBuildDebug`) are now excluded from indexing and file watching
- **additionalInclude config**: Added new config option to extend default file patterns without replacing them
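
A rough sketch of how the config merging and the two exclusion rules above might behave. Everything here is an assumption for illustration: the `IndexConfig` fields other than `additionalInclude`, the precedence (project overrides global for scalars, arrays are concatenated), and the case-insensitive match on "build" (suggested by the `mingwBuildDebug` example) are not confirmed by the changelog.

```typescript
// Hypothetical config shape; only `additionalInclude` is named in the changelog.
interface IndexConfig {
  provider?: string;
  knowledgeBases?: string[];
  additionalInclude?: string[];
}

// Assumed precedence: project scalars win, array fields are concatenated.
function mergeConfigs(globalCfg: IndexConfig, projectCfg: IndexConfig): IndexConfig {
  return {
    provider: projectCfg.provider ?? globalCfg.provider,
    knowledgeBases: (globalCfg.knowledgeBases ?? []).concat(projectCfg.knowledgeBases ?? []),
    additionalInclude: (globalCfg.additionalInclude ?? []).concat(projectCfg.additionalInclude ?? []),
  };
}

// Returns true when any segment of a normalized, "/"-separated relative path
// starts with "." or when a folder segment contains "build" (case-insensitive).
function isExcluded(relativePath: string): boolean {
  const segments = relativePath.split("/").filter((s) => s.length > 0);
  return segments.some((segment, i) => {
    if (segment.charAt(0) === ".") {
      return true; // hidden file or folder
    }
    const isFolder = i < segments.length - 1; // last segment is treated as the file
    return isFolder && segment.toLowerCase().indexOf("build") !== -1;
  });
}
```

With these assumptions, a global config can hold the provider settings while each project only lists its knowledge base paths, and a path such as `mingwBuildDebug/main.o` is skipped while `src/builder.ts` is still indexed.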

### Changed
- **Default verbose=false**: Changed `/index` command default to `verbose=false` to reduce token consumption

## [0.6.1] - 2026-03-29

### Added