
Search tool: add projectDir, rebuild, and search-first prompt nudge #121

Merged
m-aebrer merged 7 commits into master from
feature/issue-119-120-search-tool-improvements
Apr 8, 2026

Conversation

m-aebrer commented Apr 8, 2026

Closes #119
Closes #120

Three bundled improvements to the search tool:

  1. projectDir parameter — lets the model index/search arbitrary directories (critical for Telegram sessions where cwd is ~/)
  2. rebuild parameter — force a clean index rebuild when results look wrong (recovery path for stale/corrupt indexes)
  3. Prompt guideline update — gently encourage search-first exploration over grep/find for broad questions

Implementation plan posted as a comment below.


m-aebrer commented Apr 8, 2026

Implementation Plan

Three bundled changes: projectDir parameter (issue 119), rebuild parameter (issue 120), and a prompt nudge toward search-first exploration.

Key Scenario

In Telegram sessions (cwd=~/), the engine creates a single index at ~/.dreb/index/search.db covering everything under home. If results for a specific project go stale, there is no way to rebuild just that project — the only option is nuking the entire home index. The combo of projectDir + rebuild solves this:

  • search(query, projectDir: "~/projects/foo") → separate index at ~/projects/foo/.dreb/index/search.db
  • search(query, projectDir: "~/projects/foo", rebuild: true) → nukes only that project index, home index untouched
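
The index layout in this scenario can be sketched as follows. This is a minimal illustration, assuming the index always lives at `<root>/.dreb/index/search.db` as the plan states; `indexDbPath` is a hypothetical helper name, not a function from the codebase.

```typescript
import { join } from "node:path";

// Hypothetical helper: where the index DB for a given root would live,
// assuming the <root>/.dreb/index/search.db layout described in the plan.
function indexDbPath(projectRoot: string): string {
    return join(projectRoot, ".dreb", "index", "search.db");
}

// Two roots map to two distinct DB files, so deleting one leaves the other alone.
const homeDb = indexDbPath(join("/home", "user"));
const projectDb = indexDbPath(join("/home", "user", "projects", "foo"));
```

Because the DB path is derived from the root, a rebuild scoped to `projectDir` only ever has one file to delete.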

Deliverables

  1. projectDir parameter — optional string on the search tool that overrides the index root directory
  2. rebuild parameter — optional boolean that forces a clean index rebuild
  3. resetIndex() method on SearchEngine — closes the IndexManager (preserves embedder), deletes the DB file
  4. Prompt guideline update — nudge toward search for initial exploration, grep/find for exact matches
  5. Tests for all new behavior

Files to Modify

| File | Changes |
| --- | --- |
| packages/coding-agent/src/core/tools/search.ts | Add projectDir and rebuild to schema; resolve projectDir via resolveToCwd; pass resolved dir to getSearchEngine; call engine.resetIndex() when rebuild: true; update formatSearchCall; update promptGuidelines |
| packages/coding-agent/src/core/search/search.ts | Add resetIndex() method: close IndexManager (not embedder), delete DB file, null out indexManager so lazy init re-creates |
| packages/coding-agent/src/core/system-prompt.ts | Update hardcoded exploration guideline to mention search when available |
| packages/coding-agent/test/search/search-tool.test.ts | Add tests for both new parameters and the key multi-project scenario |

Change Details

search.ts (tool) — Schema:

  • Add projectDir: Type.Optional(Type.String(...)) — directory to index/search instead of cwd
  • Add rebuild: Type.Optional(Type.Boolean(...)) — force clean rebuild

search.ts (tool) — Execute:

  • Resolve: const resolvedProjectDir = projectDir ? resolveToCwd(projectDir, cwd) : cwd (same pattern as grep/find)
  • Pass resolvedProjectDir to getSearchEngine() instead of bare cwd
  • Before engine.search(): if rebuild, call engine.resetIndex()
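
The execute steps above can be sketched roughly as below. This is not the PR's code: `FakeEngine`, `getEngine`, and `executeSearch` are hypothetical names, and `path.resolve` stands in for `resolveToCwd`, whose exact behavior (for example `~` expansion) is not shown in this thread.

```typescript
import { resolve } from "node:path";

// Stand-in for the real SearchEngine; it only tracks what the sketch needs.
class FakeEngine {
    resetCount = 0;
    readonly root: string;
    constructor(root: string) {
        this.root = root;
    }
    resetIndex(): void {
        this.resetCount++;
    }
    search(query: string): string[] {
        return [`${this.root}: ${query}`];
    }
}

// One engine per resolved project directory (the cache key).
const engines = new Map<string, FakeEngine>();

function getEngine(root: string): FakeEngine {
    let engine = engines.get(root);
    if (!engine) {
        engine = new FakeEngine(root);
        engines.set(root, engine);
    }
    return engine;
}

interface SearchArgs {
    query: string;
    projectDir?: string;
    rebuild?: boolean;
}

// Assumption: path.resolve stands in for resolveToCwd(projectDir, cwd).
function executeSearch(args: SearchArgs, cwd: string): string[] {
    const root = args.projectDir ? resolve(cwd, args.projectDir) : cwd;
    const engine = getEngine(root);
    if (args.rebuild) engine.resetIndex(); // reset before searching
    return engine.search(args.query);
}
```

Keying the cache on the resolved directory is what keeps a rebuild of one project from touching the engine (and index) of another.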

search.ts (tool) — formatSearchCall:

  • Show projectDir when provided (e.g., search "query" in ~/projects/foo)
  • Show [rebuild] indicator when rebuild is true
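
A minimal sketch of such a renderer, following the plan's example output (`search "query" in ~/projects/foo`). The `Theme` shape, the color names, and `formatSearchCallSketch` are assumptions for illustration; the real function's label format may differ.

```typescript
// Minimal theme shape assumed for this sketch; the real theme type may differ.
interface Theme {
    fg(color: string, text: string): string;
    bold(text: string): string;
}

interface SearchCallArgs {
    query: string;
    projectDir?: string;
    rebuild?: boolean;
}

// Illustrative renderer: query first, then the optional directory and rebuild marker.
function formatSearchCallSketch(args: SearchCallArgs, theme: Theme): string {
    let text = `${theme.bold("search")} "${args.query}"`;
    if (args.projectDir) text += ` in ${theme.fg("accent", args.projectDir)}`;
    if (args.rebuild) text += ` ${theme.fg("warning", "[rebuild]")}`;
    return text;
}

// Identity theme: drops styling so the output can be compared as plain text.
const plainTheme: Theme = { fg: (_color, text) => text, bold: (text) => text };
```

Since the function is pure, an identity theme like `plainTheme` is enough to assert on each branch without a terminal.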

search.ts (tool) — promptGuidelines:

  • Replace existing guidelines with:
    • "Use search for initial exploration and broad questions about the codebase. Use grep when you know the exact text or pattern."
    • "The first search query builds an index (may take 10-60s). Subsequent queries are fast."

search/search.ts — New resetIndex() method:

  • Close indexManager (DB connection) but preserve embedder (expensive ONNX model, unrelated to index corruption)
  • Delete <projectRoot>/.dreb/index/search.db via unlinkSync
  • Null out indexManager so next search() call triggers lazy re-creation → fresh DB
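
The reset sequence can be illustrated with a small stand-in class. `EngineSketch` and `IndexHandle` are hypothetical; the real SearchEngine and IndexManager are not shown in this thread, so treat this as a sketch of the ordering only.

```typescript
import { existsSync, mkdtempSync, unlinkSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Stand-in for the real IndexManager: only the close() call matters here.
interface IndexHandle {
    close(): void;
}

class EngineSketch {
    indexManager: IndexHandle | null = null;
    readonly embedder = { loaded: true }; // expensive model, kept across resets
    readonly dbPath: string;

    constructor(dbPath: string) {
        this.dbPath = dbPath;
    }

    resetIndex(): void {
        // Close the DB handle first so the file is not deleted while open.
        this.indexManager?.close();
        this.indexManager = null; // a later search() would lazily re-create it
        if (existsSync(this.dbPath)) unlinkSync(this.dbPath);
    }
}

// Usage: simulate an existing index file, then reset it.
const dir = mkdtempSync(join(tmpdir(), "reset-sketch-"));
const engine = new EngineSketch(join(dir, "search.db"));
writeFileSync(engine.dbPath, "stale");
let closeCalls = 0;
engine.indexManager = { close: () => { closeCalls++; } };
engine.resetIndex();
```

The optional call and the `existsSync` guard also make a reset on a fresh engine (no manager, no DB file) a no-op rather than an error.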

system-prompt.ts — Guideline tweak:

  • When search is in the tool list, update the exploration guideline to mention it: something like "Prefer search for exploration, grep/find for exact matches, all over bash"
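
One way to picture the switch, as a sketch only: the two strings are the guidelines quoted in the QA report later in this thread, while `explorationGuideline` and its selection logic are assumptions about how system-prompt.ts might choose between them.

```typescript
// Guideline strings as quoted in the QA report; the selection logic is assumed.
const SEARCH_FIRST =
    "Start with `search` to explore and understand the codebase. Use grep/find/ls for exact text matches and specific file lookups. Prefer all of these over bash.";
const GREP_FIRST = "Prefer grep/find/ls tools over bash for file exploration";

// Pick the search-first wording only when the search tool is actually available.
function explorationGuideline(selectedTools: readonly string[]): string {
    return selectedTools.includes("search") ? SEARCH_FIRST : GREP_FIRST;
}
```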

Testing Approach

Schema tests:

  • projectDir is optional string
  • rebuild is optional boolean

Execute tests (using existing fixture project pattern):

  • projectDir pointing to fixture dir produces results when cwd passed to factory is a different directory
  • projectDir pointing to nonexistent dir returns empty results gracefully
  • rebuild: true produces valid results (index rebuilt from scratch)
  • rebuild: true after a prior search still works (engine cache stays valid, embedder preserved)
  • When neither new param is provided, behavior unchanged (regression)

Multi-project scenario:

  • Create two separate fixture dirs
  • Search with projectDir pointing to each → each gets its own index
  • rebuild: true on one does not affect the other

formatSearchCall rendering:

  • Shows project dir when provided
  • Shows rebuild indicator when true

Acceptance Criteria

  • search tool accepts optional projectDir string parameter
  • When projectDir is provided, search indexes and searches that directory
  • Engine cache key accounts for project directory (separate index per project)
  • Existing behavior unchanged when projectDir is not provided
  • search tool accepts optional rebuild boolean parameter
  • When rebuild: true, existing index DB is deleted before re-indexing
  • Normal (non-rebuild) searches retain incremental behavior
  • The embedder is NOT disposed on rebuild (only the index)
  • projectDir + rebuild targets only that project's DB, leaving other indexes untouched
  • System prompt gently encourages search-first exploration
  • All new behavior has test coverage
  • npm test passes, npx biome check clean on touched files

Risks

  • Path resolution edge cases — mitigated by using resolveToCwd (proven in grep/find)
  • Open DB handle on rebuild — mitigated by closing IndexManager before deleting file; lazy init re-creates on next search
  • Prompt over-rotation — keep the nudge gentle; search and grep have complementary strengths

Plan created by mach6

- Add projectDir parameter to search tool — lets the model index/search
  arbitrary directories instead of cwd (critical for Telegram sessions)
- Add rebuild parameter — force clean index rebuild for stale/corrupt recovery
- Add resetIndex() to SearchEngine — closes DB, deletes index, preserves embedder
- Update system prompt to position search as default exploration tool
- Add 13 new tests covering schema, projectDir, rebuild, isolation, guidelines

m-aebrer commented Apr 8, 2026

Progress Update

All three deliverables implemented and tested:

Issue 119 — projectDir parameter:

  • Added projectDir optional string to search schema
  • Resolves via resolveToCwd (same as grep/find) and uses the resolved path as the engine cache key
  • Each unique projectDir gets its own SearchEngine + index DB

Issue 120 — rebuild parameter:

  • Added rebuild optional boolean to search schema
  • New resetIndex() on SearchEngine: closes IndexManager, deletes search.db, preserves embedder
  • Lazy init rebuilds fresh index on next search() call

Prompt nudge — search-first exploration:

  • Search tool guideline: "Use search as your default exploration tool — for understanding code, finding where things are, and answering questions about the codebase. Use grep when you already know the exact text or pattern."
  • System prompt: when search is available, replaces grep/find guidance with "Start with search to explore and understand the codebase"

Tests: 13 new tests — schema (2), projectDir (2), rebuild (2), multi-project isolation (2), system prompt guidelines (2), plus existing tests all passing (2276 passed, 0 failed).

Commit: 1878ca0


Progress tracked by mach6


m-aebrer commented Apr 8, 2026

QA Report — Manual Testing via dreb -p

Each test was run against the built binary (dreb -p --no-session), so the agent exercised the new features itself as a "man on the inside" QA tester.

Test 1: Basic Search — Tool Parameters Visible

Prompt: Run a basic search and report all accepted parameters.

Result: ✅ Pass

  • Tool executed successfully, returned 20 results
  • Agent confirmed all 5 parameters visible in the schema: query (required), path, limit, projectDir (new), rebuild (new)

Test 2: projectDir Parameter — Index Scoping

Prompt: Search for "render" with projectDir pointing to packages/tui, then again without projectDir.

Result: ✅ Pass

  • With projectDir: 20 results, all scoped to the tui package (e.g. src/tui.ts, test/tui-render.test.ts). No leakage from other packages.
  • Without projectDir: 20 results spanning the full repo (packages/coding-agent/, packages/tui/, etc.)
  • Confirms separate indexes are built per project directory.

Test 3: rebuild Parameter — Force Fresh Index

Prompt: Search for "SearchEngine" with rebuild: true, then again without rebuild.

Result: ✅ Pass

  • With rebuild: Succeeded, indexBuilt: true, 20 results. Full re-index was triggered.
  • Without rebuild: Succeeded, index already up-to-date (no rebuild needed), 20 results. Confirms the freshly rebuilt index is reused normally.

Test 4: System Prompt — Search-First Guidance

Prompt: Inspect your own guidelines section and check for the new search-first guidance.

Result: ✅ Pass

  • New guideline present: "Start with `search` to explore and understand the codebase. Use grep/find/ls for exact text matches and specific file lookups. Prefer all of these over bash."
  • Old guideline "Prefer grep/find/ls tools over bash for file exploration" confirmed absent (replaced).

Summary

| Test | Feature | Status |
| --- | --- | --- |
| Basic search | All params visible | ✅ Pass |
| projectDir | Index scoped to target dir | ✅ Pass |
| rebuild | Forces fresh index build | ✅ Pass |
| System prompt | Search-first guidance | ✅ Pass |

All features working as designed in the built binary.


QA tested by mach6 via dreb -p

m-aebrer marked this pull request as ready for review April 8, 2026 14:46

m-aebrer commented Apr 8, 2026

Code Review

Critical

No critical findings.

Important

Finding 1 — Duplicate search-first guideline in real sessions (code-reviewer, confidence 95)
system-prompt.ts injects "Start with `search` to explore..." when the search tool is present, AND the search.ts tool emits its own promptGuidelines[0]: "Use `search` as your default exploration tool...". Both pass deduplication (different wording, same meaning). Real sessions show two near-identical bullets. Fix: remove promptGuidelines[0] from the search tool, since system-prompt.ts now owns cross-tool guidance. Keep only promptGuidelines[1] (the timing note).

Finding 2 — Non-existent projectDir silently creates directory tree and returns empty results (error-auditor, confidence 85)
When projectDir resolves to a non-existent path: IndexManager.open() calls mkdirSync(indexDir, { recursive: true }) creating .dreb/index/ inside a fabricated directory, scanProject() swallows ENOENT and returns 0 files, tool returns "No results found" — indistinguishable from a legitimate empty project. The engine is also cached for the session. Fix: validate resolvedProjectDir exists and is a directory before constructing the engine; return an explicit error message if not.

Finding 3 — resetIndex() failure leaves engine in inconsistent state (error-auditor, confidence 80)
If unlinkSync throws (EPERM, EACCES, EROFS), indexManager is already null but the DB file persists. The raw exception propagates uncaught. On the next search() call, a new IndexManager opens the old (undeleted) DB and does an incremental update — silently returning stale results. Fix: wrap engine.resetIndex() in try/catch in the execute handler and return a descriptive tool result on failure.

Suggestions

Finding 4 — formatSearchCall() rendering branches completely untested (test-reviewer, confidence 92)
Four conditional branches (projectDir, searchPath, rebuild, limit) are untested. The function is pure — a stub theme where fg(_, s) => s and bold(s) => s would make this trivially testable.

Finding 5 — isSearchAvailable() === false execution path untested (test-reviewer, confidence 87)
The early-return branch returning "Semantic search requires Node.js 22+..." is never exercised. Mock isSearchAvailable to return false and verify the degraded-mode message.

Finding 6 — limit parameter is schema-tested but never behavior-tested (test-reviewer, confidence 82)
No execute test passes a non-default limit and checks that the result count is constrained. With the 2-file fixture, limit: 1 should reliably return exactly 1 result.

Finding 7 — rebuild: true as the very first call (no prior index) untested (test-reviewer, confidence 83)
All rebuild tests first build an index, then rebuild. The path where rebuild: true is the first call — resetIndex() on a fresh engine with null indexManager and no DB — is not tested. The code handles it correctly (?.close() + existsSync), but the behavior is unverified.

Finding 8 — Relative projectDir path not tested (test-reviewer, confidence 80)
All projectDir tests pass absolute paths (from mkdtempSync). The resolveToCwd code path for relative projectDir is never exercised.

Finding 9 — System-prompt tests do not cover the duplication path (code-reviewer, confidence 85)
Tests call buildSystemPrompt without promptGuidelines, so they cannot detect finding 1. A test that passes the search tool's actual promptGuidelines alongside selectedTools: ["search"] and asserts search-first guidance appears exactly once would catch this.

Strengths

  • Clean implementation following existing patterns (resolveToCwd, engine cache)
  • resetIndex() correctly preserves the expensive embedder while only resetting the index
  • WAL/SHM sidecar files handled correctly — close() checkpoints before unlinkSync
  • Thorough multi-project isolation tests
  • QA testing via dreb -p caught real end-to-end behavior
  • All acceptance criteria from both issues are met (verified by completeness-checker)

Agents run: code-reviewer, error-auditor, test-reviewer, completeness-checker


Reviewed by mach6


m-aebrer commented Apr 8, 2026

Review Assessment

Review comment

Classifications

| Finding | Classification | Reasoning |
| --- | --- | --- |
| 1. Duplicate search-first guideline | Rejected by user | User determined this is intentional: the two guidelines serve complementary purposes (cross-tool guidance vs tool-specific guidance). |
| 2. Non-existent projectDir silently creates directory tree | Genuine issue | Confirmed. IndexManager.open() calls mkdirSync creating .dreb/index/ inside the non-existent path. walkDirectory swallows ENOENT, tool returns "No results found", indistinguishable from an empty project. A typo silently creates junk directories. |
| 3. resetIndex() failure leaves inconsistent state | False positive | No try/catch around engine.resetIndex(), so unlinkSync failures propagate as tool errors visible to the agent. The "inconsistent state" on retry would do an incremental update on the existing DB, which is functionally equivalent to not rebuilding, not corruption. |
| 4. formatSearchCall() rendering branches untested | Genuine issue | Four conditional branches (projectDir, searchPath, rebuild, limit) with zero tests. Function is pure and trivially testable. New code in this PR should ship with tests per project rules. |
| 5. isSearchAvailable() === false path untested | Nitpick | Trivial guard returning a static string. Testing requires mocking node:sqlite availability, which is fragile and low value relative to cost. |
| 6. limit parameter behavior-untested | Nitpick | limit is passed through to engine.search() which does .slice(0, limit). Testing this in the tool layer would be testing the engine, not the tool. Schema test exists. |
| 7. rebuild: true as first call untested | Genuine issue | All rebuild tests first build an index, then rebuild. The cold-start path (null indexManager, no DB file) is handled correctly by guards (?.close(), existsSync) but is unverified by any test. |
| 8. Relative projectDir path not tested | Nitpick | resolveToCwd is a well-tested utility shared across tools. Testing relative path resolution here would be testing the utility, not the search tool. |
| 9. System-prompt tests don't cover duplication | Rejected by user | Linked to finding 1 which was rejected. No test gap to address. |

Action Plan

  1. Validate projectDir existence (finding 2) — After resolveToCwd, check that the resolved path exists and is a directory. Return an explicit error message if not. Prevents junk directory creation and misleading empty results.
  2. Add formatSearchCall() tests (finding 4) — Test the four conditional rendering branches with a stub theme. Pure function, straightforward to test.
  3. Add rebuild-as-first-call test (finding 7) — Create a fresh tool, immediately call with rebuild: true, verify valid results and no errors.

Assessment by mach6


m-aebrer commented Apr 8, 2026

Progress Update

Fixed 3 review findings and a gitignore gap:

Finding 2 — projectDir validation: Added existsSync + statSync().isDirectory() check after resolveToCwd. Returns clear error "projectDir does not exist or is not a directory: <path>" instead of silently creating junk directories. Two new tests (non-existent path, file-instead-of-directory).
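
A sketch of the check described above, using the same two calls and the same error text. `checkProjectDir` is a hypothetical helper name; in the PR the check sits inline in the execute handler.

```typescript
import { existsSync, mkdtempSync, statSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Returns an error message for an unusable projectDir, or null when it is fine.
function checkProjectDir(resolvedProjectDir: string): string | null {
    if (!existsSync(resolvedProjectDir) || !statSync(resolvedProjectDir).isDirectory()) {
        return `projectDir does not exist or is not a directory: ${resolvedProjectDir}`;
    }
    return null;
}

// Usage against a throwaway directory: a real dir, a file, and a missing path.
const root = mkdtempSync(join(tmpdir(), "projectdir-sketch-"));
const file = join(root, "notes.txt");
writeFileSync(file, "not a directory");
const results = [root, file, join(root, "missing")].map(checkProjectDir);
```

Running the check before the engine is constructed is what prevents the `.dreb/index/` tree from being created under a mistyped path.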

Finding 4 — formatSearchCall tests: Exported formatSearchCall (@internal) and added 7 tests with stub theme covering all branches: basic query, projectDir, searchPath, rebuild true/false, limit, all options combined.

Finding 7 — Cold-start rebuild test: Added test for rebuild: true as very first call on a fresh engine (no prior index). Verifies valid results with indexBuilt: true.

Gitignore fix: Changed .dreb/index/ and .dreb/agent/ patterns to **/.dreb/index/ and **/.dreb/agent/ so .dreb dirs in any subdirectory (e.g. packages/tui/.dreb/) are ignored.

Commit: 697cbbf


Progress tracked by mach6


m-aebrer commented Apr 8, 2026

Code Review (Re-review)

Critical

No critical findings.

Important

No important findings.

Suggestions

Finding 1 — Duplicate assertion in formatSearchCall projectDir test (code-reviewer + test-reviewer, confidence 90/92)
The "renders projectDir when provided" test has two identical assertions:

expect(result).toContain("project");
expect(result).toContain("project"); // shortenPath output

The second was meant to verify the shortened path, but checks the same substring. Both match the label prefix "project", not the actual path rendering. The second assertion adds zero coverage.
Fix: expect(result).toContain("project /home/user/project") or use os.homedir() to test actual shortening with ~/.
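
To show why the home-directory variant is the stronger assertion, here is a sketch with a stand-in for shortenPath. `shortenPathSketch` is hypothetical and assumes the real helper replaces the home prefix with `~`; that behavior is inferred from the finding, not confirmed by the source.

```typescript
import { homedir } from "node:os";
import { join, sep } from "node:path";

// Stand-in for shortenPath: assumed to replace the home prefix with "~".
function shortenPathSketch(p: string, home: string = homedir()): string {
    if (p === home) return "~";
    return p.startsWith(home + sep) ? `~${p.slice(home.length)}` : p;
}

// Asserting on label plus shortened path fails if the path is not rendered,
// unlike a bare check for the substring "project".
const rendered = `project ${shortenPathSketch(join(homedir(), "project"))}`;
```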

Finding 2 — TOCTOU window on existsSync/statSync (error-auditor, confidence 82)
existsSync and statSync are two separate syscalls. If the directory is deleted between them (by another OS process), statSync throws uncaught ENOENT. Practical probability is near-zero, but can be eliminated at zero cost by collapsing to a single try/catch around statSync:

if (projectDir) {
    let isDir = false;
    try { isDir = statSync(resolvedProjectDir).isDirectory(); } catch {}
    if (!isDir) { return { ... }; }
}

This is also simpler: one syscall instead of two.

Finding 3 — formatSearchResult / renderResult path untested (test-reviewer, confidence 85)
formatSearchResult handles TUI-visible behaviors with no test coverage: collapsed-mode truncation (20 lines max with "... (N more lines)") and index stats footer ([Index: N files, N chunks]). The indexStats field was added in this PR. Could be exported and tested with a stub theme, same pattern as formatSearchCall.
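
The behaviors named in this finding can be sketched as a pure function. `formatSearchResultSketch` and `IndexStats` are illustrative names; only the 20-line limit, the "... (N more lines)" marker, and the `[Index: N files, N chunks]` footer come from the review text.

```typescript
interface IndexStats {
    files: number;
    chunks: number;
}

const COLLAPSED_MAX_LINES = 20;

// Illustrative only: collapsed mode keeps the first 20 lines and notes how many were cut.
function formatSearchResultSketch(output: string, expanded: boolean, stats?: IndexStats): string {
    const lines = output === "" ? [] : output.split("\n");
    const hidden = expanded ? 0 : Math.max(0, lines.length - COLLAPSED_MAX_LINES);
    const shown = hidden > 0 ? lines.slice(0, COLLAPSED_MAX_LINES) : lines;
    if (hidden > 0) shown.push(`... (${hidden} more lines)`);
    if (stats) shown.push(`[Index: ${stats.files} files, ${stats.chunks} chunks]`);
    return shown.join("\n");
}
```

Each branch (empty output, truncation, expanded mode, stats footer) maps to one direct assertion, which is the coverage the finding asks for.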

Strengths

  • All 3 prior genuine findings (projectDir validation, formatSearchCall tests, cold-start rebuild) were fixed correctly
  • projectDir validation correctly handles broken symlinks (existsSync returns false), symlinks to files (statSync follows, isDirectory returns false), and symlinks to directories (accepted)
  • Cold-start rebuild test properly exercises the null-indexManager path through resetIndex()
  • .gitignore glob widening to **/.dreb/index/ correctly covers nested projectDir indexes
  • All acceptance criteria from both issues verified as met
  • 34 tests covering the new functionality, all meaningful

Agents run: code-reviewer, error-auditor, test-reviewer, completeness-checker


Reviewed by mach6 (re-review)


m-aebrer commented Apr 8, 2026

Review Assessment (Re-review)

Review comment

Classifications

| Finding | Classification | Reasoning |
| --- | --- | --- |
| 1. Duplicate assertion in formatSearchCall projectDir test | Genuine issue | Two identical expect(result).toContain("project") assertions. The second was meant to verify shortenPath output but checks the same substring. Should verify the full rendered format like "project /home/user/project". |
| 2. TOCTOU window on existsSync/statSync | Nitpick | Technically a race between two syscalls, but this is a dev tool validating an LLM-provided path. A directory vanishing in microseconds between syscalls is not a realistic failure mode. The uncaught exception surfaces as a tool error anyway. |
| 3. formatSearchResult / renderResult untested | Genuine issue | formatSearchResult has branching logic (empty output, 20-line truncation, expanded mode, indexStats footer), all new PR code with no test coverage. Per project rules, tests ship with features. |

Action Plan

  1. Add formatSearchResult tests (finding 3) — Export the function, add tests for: empty output, truncation at 20 lines, expanded mode, indexStats footer.
  2. Fix duplicate assertion (finding 1) — Change second assertion to verify actual rendered format e.g. "project /home/user/project".

Assessment by mach6


m-aebrer commented Apr 8, 2026

Addendum: .dreb/ gitignore + tool visibility (Option E)

Problem

Adding **/.dreb/ to .gitignore (so nothing in .dreb/ is tracked by git) breaks dreb's own tools — grep (ripgrep), find (fd), and search (scanner) all respect .gitignore and will skip .dreb/memory/, .dreb/agents/, .dreb/extensions/, etc. These contain user-authored content that tools must be able to search.

Solution: Explicit Path Include (Option E)

When rg/fd receive an explicit path argument, they bypass .gitignore for that path. The search scanner already has scanMemoryDir() which bypasses gitignore for global memory. We extend all three tools to explicitly include tool-visible .dreb/ subdirectories.

Tool-visible vs tool-hidden .dreb/ subdirectories

| Visible (include) | Hidden (skip) |
| --- | --- |
| memory/ (project knowledge files) | index/ (SQLite search DB) |
| agents/ (custom agent definitions) | agent/ (sessions, models, auth, bins) |
| extensions/ (user extension code) | secrets/ (API keys) |

Shared constant

Create a shared list of tool-visible .dreb/ subdirectories that all tools reference:

// packages/coding-agent/src/core/tools/dreb-paths.ts (new file)
export const DREB_TOOL_VISIBLE_DIRS = ["memory", "agents", "extensions"] as const;
export const DREB_DIR = ".dreb";

Deliverables

1. .gitignore — change to blanket **/.dreb/

Replace the current specific patterns:

-**/.dreb/index/
-**/.dreb/agent/
+**/.dreb/

Also untrack the .dreb/memory/ files that are currently committed:

git rm --cached -r .dreb/memory/

2. Scanner (packages/coding-agent/src/core/search/scanner.ts)

After the main walkDirectory call in scanProject(), explicitly scan tool-visible .dreb/ subdirectories. Follow the existing scanMemoryDir() pattern (which already bypasses gitignore for ~/.dreb/memory/):

// After the main walk (line ~124), add:
for (const subdir of DREB_TOOL_VISIBLE_DIRS) {
    const dir = join(projectRoot, DREB_DIR, subdir);
    if (existsSync(dir)) {
        scanMemoryDir(dir, projectRoot, results);
    }
}

scanMemoryDir already handles recursive scanning of .md files with no gitignore filtering. It should also work for agents/ and extensions/, which contain .md and code files, as long as it picks up non-.md files (see the implementation notes below). The key point: this runs AFTER the main walk, so gitignore rules don't apply.

Note: .dreb/index is already in SKIP_DIRS (defense-in-depth). With **/.dreb/ in gitignore, the main walk never enters .dreb/ at all, so SKIP_DIRS is redundant but harmless.

3. grep tool (packages/coding-agent/src/core/tools/grep.ts)

When building rg arguments, append tool-visible .dreb/ subdirectories as additional search paths. rg bypasses gitignore for explicit path arguments.

Do this only when the user hasn't restricted the search to a specific subdirectory (i.e., no path parameter, or path is the project root). If the user passes path: "src/", don't inject .dreb/ results.

In the rg arg construction (around line ~214):

args.push(pattern, searchPath);
// Include tool-visible .dreb/ subdirs when searching from project root
if (!pathParam || searchPath === cwd) {
    for (const subdir of DREB_TOOL_VISIBLE_DIRS) {
        const drebSubdir = path.join(cwd, DREB_DIR, subdir);
        if (existsSync(drebSubdir)) {
            args.push(drebSubdir);
        }
    }
}

rg accepts multiple paths after the pattern: rg pattern path1 path2 path3. Results merge into the same JSON stream. The existing match limit applies across all paths combined.

4. find tool (packages/coding-agent/src/core/tools/find.ts)

Same approach as grep. fd also supports multiple search paths and bypasses gitignore for explicit paths.

In the fd arg construction (around line ~200):

args.push(pattern, searchPath);
// Include tool-visible .dreb/ subdirs when searching from project root
if (!patternParam?.includes(".dreb") && (!pathParam || searchPath === cwd)) {
    for (const subdir of DREB_TOOL_VISIBLE_DIRS) {
        const drebSubdir = path.join(cwd, DREB_DIR, subdir);
        if (existsSync(drebSubdir)) {
            args.push(drebSubdir);
        }
    }
}

The existing result limit applies across combined results.

5. Tests

Scanner test (test/search/search-tool.test.ts or new test/search/scanner.test.ts):

  • Create a fixture with .gitignore containing .dreb/ and a .dreb/memory/test.md file
  • Verify scanProject() includes the memory file despite gitignore
  • Verify .dreb/index/ content is NOT included

Grep test (test/tools/grep.test.ts or existing):

  • Create a fixture with .gitignore containing .dreb/ and a .dreb/memory/test.md file
  • Verify grep finds content inside .dreb/memory/ when no path restriction
  • Verify grep does NOT find .dreb/ content when path is restricted to a subdirectory

Find test (similar):

  • Verify find discovers files inside .dreb/memory/ despite gitignore
  • Verify find does NOT include .dreb/ files when path is restricted
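
The fixture these tests share could be built along these lines. `createDrebFixture` and the file contents are hypothetical; the layout (a `.gitignore` containing `.dreb/` plus a `.dreb/memory/test.md` file) follows the bullets above.

```typescript
import { existsSync, mkdirSync, mkdtempSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Builds a throwaway project whose .gitignore hides .dreb/ but which still
// contains a tool-visible memory file and a tool-hidden index file.
function createDrebFixture(): string {
    const root = mkdtempSync(join(tmpdir(), "dreb-fixture-"));
    writeFileSync(join(root, ".gitignore"), ".dreb/\n");
    mkdirSync(join(root, ".dreb", "memory"), { recursive: true });
    mkdirSync(join(root, ".dreb", "index"), { recursive: true });
    mkdirSync(join(root, "src"), { recursive: true });
    writeFileSync(join(root, ".dreb", "memory", "test.md"), "# Notes\nneedle-in-memory\n");
    writeFileSync(join(root, ".dreb", "index", "search.db"), "binary placeholder");
    writeFileSync(join(root, "src", "main.ts"), "export const x = 1;\n");
    return root;
}

// Usage: each tool test can start from the same layout.
const fixtureRoot = createDrebFixture();
const hasMemoryFile = existsSync(join(fixtureRoot, ".dreb", "memory", "test.md"));
```

A unique marker string such as `needle-in-memory` makes it easy to assert that a grep or search hit really came from `.dreb/memory/` and not from `src/`.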

6. README update

Update the .gitignore guidance in the README's Semantic Search section to say **/.dreb/ (already done in this PR).

Key implementation notes

  • scanMemoryDir handles .md files and recurses into subdirectories. For agents/ (which contain .md agent definitions) and extensions/ (which may contain .ts/.js code), it should work since scanMemoryDir already handles various text file types via the extension map. Verify this handles non-.md files in extensions — if not, consider using walkDirectory with an empty ignore matcher instead.
  • Match limits: rg and fd apply their limits across all paths combined. No special handling needed — the existing limit logic works.
  • Path deduplication: If the user explicitly passes path: ".dreb/memory", don't add .dreb/memory again as an extra path. Check for overlap.
  • Relative paths in results: rg and fd will emit paths relative to each search root. When .dreb/memory/ is passed as a separate absolute path, rg emits results with the absolute path. The grep tool's result formatting may need to handle this — convert absolute .dreb/ paths back to relative-to-cwd.
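
The overlap check from the path deduplication note might look like this. It is a sketch under one assumption: only an exact match with the user's search path is treated as a duplicate, because extras nested under a gitignored parent would still need to be passed explicitly. `withoutDuplicate` is a hypothetical name.

```typescript
import { join, resolve } from "node:path";

// Drop any extra path that is the same directory as the user's search path.
// resolve() normalizes both sides, so trailing slashes and relative forms match.
function withoutDuplicate(searchPath: string, extras: string[]): string[] {
    const target = resolve(searchPath);
    return extras.filter((extra) => resolve(extra) !== target);
}

// Usage: the user searched .dreb/memory directly, so it is not appended again.
const projectRoot = resolve("project");
const extraDirs = [join(projectRoot, ".dreb", "memory"), join(projectRoot, ".dreb", "agents")];
const remaining = withoutDuplicate(join(projectRoot, ".dreb", "memory"), extraDirs);
```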

Acceptance criteria

  • .gitignore uses **/.dreb/ (blanket ignore)
  • .dreb/memory/ files no longer tracked by git
  • search tool indexes .dreb/memory/, .dreb/agents/, .dreb/extensions/ content
  • grep tool finds matches inside tool-visible .dreb/ subdirs
  • find tool discovers files inside tool-visible .dreb/ subdirs
  • None of the tools expose .dreb/index/, .dreb/agent/, or .dreb/secrets/
  • Tool-visible dirs only included when searching project root (not narrowed subdirectory)
  • All new behavior has test coverage
  • npm test passes

Plan addendum by mach6


m-aebrer commented Apr 8, 2026

Addendum v2: .dreb/ gitignore + tool visibility (blocklist approach)

Supersedes the previous addendum. This version uses a blocklist (exclude known-bad) instead of an allowlist (include known-good), making it future-proof for new .dreb/ subdirectories.

Problem

Adding **/.dreb/ to .gitignore (so nothing in .dreb/ is tracked/pushed) breaks dreb's own tools — grep (ripgrep), find (fd), and search (scanner) all respect .gitignore and skip .dreb/ entirely. But .dreb/ contains user-authored content (memory, agent definitions, extensions, CONTEXT.md) that tools must be able to search.

Key Insight

When rg/fd receive an explicit path argument, they bypass .gitignore for that path:

# .gitignore has: .dreb/
rg --hidden pattern . ./.dreb/memory/    # finds matches — gitignore bypassed for explicit path
fd --hidden . . ./.dreb/agents/          # finds files — same behavior

The search scanner already does this for global memory via scanMemoryDir() which bypasses gitignore entirely.

Approach: Dynamic Discovery + Blocklist

Instead of hardcoding which .dreb/ subdirs to include (allowlist), we dynamically list all .dreb/ subdirectories at runtime and exclude only the ones we know are sensitive or binary-heavy (blocklist).

// Shared constant — the only dirs we HIDE from tools
const DREB_HIDDEN_SUBDIRS = new Set(["index", "agent", "secrets"]);

Three items cover it:

  • index — SQLite search DB (binary, large)
  • agent — session logs, model cache, auth tokens, downloaded binaries
  • secrets — API keys

Everything else is automatically tool-visible: memory/, agents/, extensions/, plus anything the user creates in the future. No code change needed when new .dreb/ subdirs are added.

At runtime, for each tool:

const drebDir = path.join(root, ".dreb");
if (existsSync(drebDir)) {
    const entries = readdirSync(drebDir, { withFileTypes: true });
    for (const entry of entries) {
        if (entry.isDirectory() && !DREB_HIDDEN_SUBDIRS.has(entry.name)) {
            additionalPaths.push(path.join(drebDir, entry.name));
        }
    }
    // Also include top-level .dreb/ files (CONTEXT.md etc) — see per-tool notes
}

Current tool behavior (read these files for full context)

| Tool | File | Mechanism | Respects .gitignore? |
| --- | --- | --- | --- |
| grep | packages/coding-agent/src/core/tools/grep.ts | ripgrep (rg) with --hidden flag | Yes (rg default) |
| find | packages/coding-agent/src/core/tools/find.ts | fd with --hidden flag | Yes (fd default) |
| search | packages/coding-agent/src/core/search/scanner.ts | ignore npm package + hardcoded SKIP_DIRS | Yes |
| ls | packages/coding-agent/src/core/tools/ls.ts | readdirSync directly | No (unaffected) |
| read | packages/coding-agent/src/core/tools/read.ts | fs.readFile directly | No (unaffected) |

Only grep, find, and search need changes.

Deliverables

1. Shared constant — new file packages/coding-agent/src/core/tools/dreb-paths.ts

import { existsSync, readdirSync } from "node:fs";
import path from "node:path";

/** .dreb/ subdirectories that tools must NOT expose (sensitive/binary content). */
const DREB_HIDDEN_SUBDIRS = new Set(["index", "agent", "secrets"]);

/**
 * Get tool-visible .dreb/ subdirectory paths for a project root.
 * Dynamically lists .dreb/ entries, excludes blocklisted ones.
 * Returns absolute paths. Returns empty array if .dreb/ doesn't exist.
 */
export function getDrebToolVisibleDirs(projectRoot: string): string[] {
    const drebDir = path.join(projectRoot, ".dreb");
    if (!existsSync(drebDir)) return [];

    const paths: string[] = [];
    let entries;
    try {
        entries = readdirSync(drebDir, { withFileTypes: true });
    } catch {
        return [];
    }

    for (const entry of entries) {
        if (entry.isDirectory() && !DREB_HIDDEN_SUBDIRS.has(entry.name)) {
            paths.push(path.join(drebDir, entry.name));
        }
    }
    return paths;
}

2. .gitignore — blanket .dreb/ ignore

-# dreb runtime data — index, cached models, session logs
-**/.dreb/index/
-**/.dreb/agent/
+# dreb runtime data
+**/.dreb/

Also untrack the .dreb/memory/ files currently committed:

git rm --cached -r .dreb/memory/

The files remain on disk (local only), just removed from git tracking.

3. grep tool (packages/coding-agent/src/core/tools/grep.ts)

rg accepts multiple search paths: rg pattern path1 path2 path3. Results merge into the same JSON output stream.

In the rg argument construction (around line 214, where args.push(pattern, searchPath) is called):

args.push(pattern, searchPath);

// Include tool-visible .dreb/ subdirs when searching from project root
// Only when user hasn't restricted search to a specific subdirectory
if (!userPath) {
    for (const dir of getDrebToolVisibleDirs(cwd)) {
        args.push(dir);
    }
}

Where userPath is the path parameter from the schema — when the user passes path: "src/", don't inject .dreb/ results.

Path formatting note: rg emits paths relative to each search root. When .dreb/memory/ is passed as an absolute path, rg may emit absolute paths for those matches. The grep tool's result formatting needs to handle this — convert any absolute paths that start with cwd back to relative paths. Check how searchPath is currently resolved and how results are formatted (look for path manipulation in the rg JSON output parsing).
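
The normalization described in the path formatting note can be sketched as a small helper. This is a minimal sketch under assumptions, not the code in grep.ts: toCwdRelative is a hypothetical name, and it assumes each result path is available as a plain string after the rg JSON output is parsed.

```typescript
import * as path from "node:path";

/**
 * Convert an absolute path located under `cwd` into a cwd-relative path.
 * Relative inputs and paths outside `cwd` are returned unchanged.
 * NOTE: `toCwdRelative` is a hypothetical helper, not an existing function in grep.ts.
 */
export function toCwdRelative(filePath: string, cwd: string): string {
    if (!path.isAbsolute(filePath)) return filePath;
    const rel = path.relative(cwd, filePath);
    // A leading ".." segment (or an absolute result on another drive) means "outside cwd".
    const outside = rel === ".." || rel.startsWith(`..${path.sep}`) || path.isAbsolute(rel);
    // An empty result means filePath is cwd itself; leave it as-is.
    return rel === "" || outside ? filePath : rel;
}
```

Paths outside cwd are left untouched rather than rewritten with ".." segments, so matches from an unrelated absolute search root stay recognizable in the output.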

Match limit: The existing limit applies across all paths combined — no special handling needed.

4. find tool (packages/coding-agent/src/core/tools/find.ts)

fd also supports multiple search paths. Same approach:

args.push(pattern, searchPath);

// Include tool-visible .dreb/ subdirs when searching from project root
if (!userPath) {
    for (const dir of getDrebToolVisibleDirs(cwd)) {
        args.push(dir);
    }
}

Same path formatting concern: fd may emit absolute paths for the extra search roots. Ensure results are normalized to relative-to-cwd paths.

Result limit: fd's --max-results applies across all paths combined.

5. Search scanner (packages/coding-agent/src/core/search/scanner.ts)

The scanner already has scanMemoryDir() (line ~333) which recursively scans a directory for text files with no gitignore filtering. It's called for ~/.dreb/memory/ (global memory, line ~127).

After the main walkDirectory call in scanProject() (around line 124), add:

// Include tool-visible .dreb/ subdirs (bypasses gitignore)
for (const dir of getDrebToolVisibleDirs(projectRoot)) {
    scanMemoryDir(dir, projectRoot, results);
}

Important: verify that scanMemoryDir handles non-.md files correctly. It needs to handle .ts/.js files in extensions/ and .md files in agents/ and memory/. Read the scanMemoryDir function (line ~333) to check its file type filtering. If it only handles .md, it needs to be generalized — or use a variant that checks the EXTENSION_MAP (line ~57) for supported file types.
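
If scanMemoryDir turns out to be .md-only, the generalized variant mentioned above could look roughly like this. It is a sketch under assumptions: scanVisibleDir and SUPPORTED_EXTENSIONS are hypothetical names standing in for the real scanner function and EXTENSION_MAP, and results is assumed to be a list of project-relative paths, which may not match the scanner's actual result type.

```typescript
import { mkdirSync, mkdtempSync, readdirSync, writeFileSync } from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical stand-in for the scanner's EXTENSION_MAP; the real entries may differ.
const SUPPORTED_EXTENSIONS = new Set([".md", ".ts", ".js"]);

/** List a directory, treating missing or unreadable directories as empty. */
function listDir(dir: string) {
    try {
        return readdirSync(dir, { withFileTypes: true });
    } catch {
        return [];
    }
}

/** Recursively collect supported files under `dir` as paths relative to `projectRoot`. */
export function scanVisibleDir(dir: string, projectRoot: string, results: string[]): void {
    for (const entry of listDir(dir)) {
        const full = path.join(dir, entry.name);
        if (entry.isDirectory()) {
            scanVisibleDir(full, projectRoot, results);
        } else if (entry.isFile() && SUPPORTED_EXTENSIONS.has(path.extname(entry.name).toLowerCase())) {
            results.push(path.relative(projectRoot, full));
        }
    }
}

/** Build a throwaway project and return the sorted scan results for .dreb/extensions. */
export function demoScan(): string[] {
    const root = mkdtempSync(path.join(os.tmpdir(), "dreb-scan-"));
    const extDir = path.join(root, ".dreb", "extensions", "hello");
    mkdirSync(extDir, { recursive: true });
    writeFileSync(path.join(extDir, "index.ts"), "export const hello = 1;\n");
    writeFileSync(path.join(extDir, "README.md"), "# hello\n");
    writeFileSync(path.join(extDir, "logo.png"), "binary placeholder");
    const results: string[] = [];
    scanVisibleDir(path.join(root, ".dreb", "extensions"), root, results);
    return results.sort();
}
```

In the demo, the .ts and .md files are collected and logo.png is skipped because its extension is not in the set.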

Note: SKIP_DIRS already has ".dreb/index" — this is now redundant (gitignore stops the main walk before reaching .dreb/) but keep it as defense-in-depth.

6. Tests

Fixture setup for all tests: Create a temp directory with:

  • A .gitignore file containing **/.dreb/
  • .dreb/memory/knowledge.md — should be tool-visible
  • .dreb/agents/custom.md — should be tool-visible
  • .dreb/index/search.db — should be tool-hidden
  • .dreb/agent/sessions/log.jsonl — should be tool-hidden
  • .dreb/secrets/keys.json — should be tool-hidden
  • A regular src/main.ts — baseline
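
The fixture list above could be materialized with a helper along these lines. This is a sketch, not the test code from the PR: createDrebFixture, fixtureHas, and the marker strings are hypothetical, chosen so that grep-style tests have distinct content to look for in each file.

```typescript
import { existsSync, mkdirSync, mkdtempSync, writeFileSync } from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Fixture files keyed by project-relative path (forward slashes), with marker content.
const FIXTURE_FILES: Record<string, string> = {
    ".gitignore": "**/.dreb/\n",
    ".dreb/memory/knowledge.md": "visible-memory-marker\n",
    ".dreb/agents/custom.md": "visible-agents-marker\n",
    ".dreb/index/search.db": "hidden-index-marker\n",
    ".dreb/agent/sessions/log.jsonl": '{"marker":"hidden-agent-marker"}\n',
    ".dreb/secrets/keys.json": '{"marker":"hidden-secrets-marker"}\n',
    "src/main.ts": 'export const marker = "baseline-marker";\n',
};

/** Create the fixture in a fresh temp directory and return its root path. */
export function createDrebFixture(): string {
    const root = mkdtempSync(path.join(os.tmpdir(), "dreb-fixture-"));
    for (const [rel, content] of Object.entries(FIXTURE_FILES)) {
        const full = path.join(root, ...rel.split("/"));
        mkdirSync(path.dirname(full), { recursive: true });
        writeFileSync(full, content);
    }
    return root;
}

/** Check whether a fixture-relative path exists under `root`. */
export function fixtureHas(root: string, rel: string): boolean {
    return existsSync(path.join(root, ...rel.split("/")));
}
```

Each test can call createDrebFixture() for an isolated root. Cleanup of the temp directory is left out of the sketch.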

Scanner test (new test or add to test/search/search-tool.test.ts):

  • scanProject() includes .dreb/memory/knowledge.md despite gitignore
  • scanProject() includes .dreb/agents/custom.md
  • scanProject() does NOT include .dreb/index/, .dreb/agent/, .dreb/secrets/ content

Grep test (add to existing grep tests):

  • grep finds content inside .dreb/memory/ when no path restriction
  • grep does NOT find .dreb/ content when path is restricted to a subdirectory (e.g. path: "src/")
  • grep does NOT find content in .dreb/agent/ or .dreb/secrets/

Find test (add to existing find tests):

  • find discovers files inside .dreb/memory/ and .dreb/agents/ when no path restriction
  • find does NOT discover files in .dreb/index/, .dreb/agent/, .dreb/secrets/
  • find does NOT include .dreb/ files when path is restricted

Unit test for getDrebToolVisibleDirs:

  • Returns empty array when .dreb/ doesn't exist
  • Includes memory/, agents/, extensions/ subdirs
  • Excludes index/, agent/, secrets/ subdirs
  • Handles .dreb/ existing but empty

7. README update

The Semantic Search section (already updated in this PR) should say **/.dreb/ for the gitignore guidance. Verify this is correct after the change.

Acceptance criteria

  • .gitignore uses **/.dreb/ (blanket ignore)
  • .dreb/memory/ files no longer tracked by git (but still exist on disk)
  • getDrebToolVisibleDirs() dynamically discovers non-blocklisted .dreb/ subdirs
  • search tool indexes content from tool-visible .dreb/ subdirs
  • grep tool finds matches inside tool-visible .dreb/ subdirs
  • find tool discovers files inside tool-visible .dreb/ subdirs
  • None of the tools expose index/, agent/, or secrets/ content
  • Tool-visible dirs only included when searching project root (not narrowed subdirectory)
  • All new behavior has test coverage
  • npm test passes, npx biome check clean

Plan addendum v2 by mach6

@m-aebrer
Collaborator Author

m-aebrer commented Apr 8, 2026

Progress Update

Implemented .dreb/ gitignore + tool visibility (Option E, blocklist approach):

New shared utility (dreb-paths.ts): getDrebToolVisibleDirs() dynamically lists .dreb/ subdirectories, excluding a blocklist of index, agent, secrets. Everything else is automatically tool-visible — future-proof for new subdirs.

.gitignore: Changed to blanket **/.dreb/. Untracked .dreb/memory/ files (still exist locally).

Scanner (scanner.ts): After main walk, calls scanMemoryDir() for each tool-visible .dreb/ subdir. Bypasses gitignore. Skipped in home-dir mode (global memory already handled separately).

Grep (grep.ts): Appends tool-visible .dreb/ subdirs as additional rg path args when user has not restricted search with path parameter. rg bypasses gitignore for explicit paths.

Find (find.ts): Same approach with fd.

Tests: 13 new tests across 2 files:

  • dreb-paths.test.ts — 6 unit tests for the utility
  • scanner-dreb-visibility.test.ts — 7 integration tests verifying visible dirs included, hidden dirs excluded

All 2299 tests pass. Build clean.

Commit: 2221056


Progress tracked by mach6

@m-aebrer
Collaborator Author

m-aebrer commented Apr 8, 2026

Code Review

Critical

No critical findings.

Important

Finding 1 — grep and find .dreb/ injection has no integration tests (test-reviewer + completeness-checker, confidence 97)
Both grep.ts and find.ts call getDrebToolVisibleDirs(cwd) and append paths to rg/fd args when !searchDir. This is the end-to-end path that makes .dreb/memory/, .dreb/agents/, and .dreb/extensions/ visible through grep and find. There are unit tests for getDrebToolVisibleDirs and scanner visibility, but zero tests exercise the actual grep/find integration — "grep for a pattern, get a match from .dreb/memory/". The plan addendum v2 explicitly specified grep and find test scenarios. A subtle regression (wrong condition, wrong arg position) would silently break .dreb/ visibility for these tools while all tests pass.

Finding 2 — formatSearchResult still unexported and untested (code-reviewer + test-reviewer, confidence 100)
Prior action item from round 2. formatSearchResult remains a private function with zero tests. It handles TUI-visible behaviors: 20-line collapsed truncation with "... (N more lines)", expanded mode, and [Index: N files, N chunks] footer (added in this PR). formatSearchCall was correctly exported and tested; the same treatment should apply to formatSearchResult.
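
To make the truncation behavior concrete, here is a stand-in for the collapse logic that such tests would exercise. It is not the real formatSearchResult, whose signature is not shown in this thread: collapseLines and COLLAPSED_MAX_LINES are hypothetical names, and only the 20-line limit and the "... (N more lines)" suffix come from the description above.

```typescript
// Assumed collapsed-mode limit, taken from the "20-line collapsed truncation" description.
const COLLAPSED_MAX_LINES = 20;

/**
 * Return `text` unchanged in expanded mode or when it fits within `maxLines`.
 * Otherwise keep the first `maxLines` lines and append a "... (N more lines)" marker.
 * NOTE: hypothetical helper, not the project's formatSearchResult.
 */
export function collapseLines(text: string, expanded: boolean, maxLines: number = COLLAPSED_MAX_LINES): string {
    const lines = text.split("\n");
    if (expanded || lines.length <= maxLines) return text;
    const remaining = lines.length - maxLines;
    return [...lines.slice(0, maxLines), `... (${remaining} more lines)`].join("\n");
}
```

Tests for the real function would presumably cover the same three cases: short input, input over the limit in collapsed mode, and the same input in expanded mode.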

Finding 3 — Symlinked subdirectories inside .dreb/ silently excluded from tool visibility (error-auditor, confidence 92)
getDrebToolVisibleDirs() uses entry.isDirectory() which returns false for symlinks (lstat semantics). If .dreb/memory is a symlink to a directory (common in dotfile management), it is silently skipped and all three tools (grep, find, search) see no memory content. Fix: also check entry.isSymbolicLink() and verify the target with statSync().isDirectory().
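
The suggested fix might look like the following sketch. It is an illustrative variant, not the merged code: getDrebToolVisibleDirsWithSymlinks, isDirectoryLike, and demoSymlinkedMemory are hypothetical names, and the blocklist is copied from the plan.

```typescript
import { mkdirSync, mkdtempSync, readdirSync, statSync, symlinkSync } from "node:fs";
import type { Dirent } from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

const DREB_HIDDEN_SUBDIRS = new Set(["index", "agent", "secrets"]);

/** True for real directories and for symlinks whose target is a directory. */
function isDirectoryLike(parent: string, entry: Dirent): boolean {
    if (entry.isDirectory()) return true;
    if (!entry.isSymbolicLink()) return false;
    try {
        // statSync follows the link and throws for dangling symlinks.
        return statSync(path.join(parent, entry.name)).isDirectory();
    } catch {
        return false;
    }
}

/** List .dreb/ entries, treating a missing or unreadable .dreb/ as empty. */
function listEntries(dir: string): Dirent[] {
    try {
        return readdirSync(dir, { withFileTypes: true });
    } catch {
        return [];
    }
}

/** Hypothetical symlink-aware variant of getDrebToolVisibleDirs. */
export function getDrebToolVisibleDirsWithSymlinks(projectRoot: string): string[] {
    const drebDir = path.join(projectRoot, ".dreb");
    return listEntries(drebDir)
        .filter((entry) => !DREB_HIDDEN_SUBDIRS.has(entry.name) && isDirectoryLike(drebDir, entry))
        .map((entry) => path.join(drebDir, entry.name));
}

/** Demo: .dreb/memory is a symlink to a directory outside .dreb/. */
export function demoSymlinkedMemory(): string[] {
    const root = mkdtempSync(path.join(os.tmpdir(), "dreb-symlink-"));
    const realMemory = path.join(root, "dotfiles-memory");
    mkdirSync(realMemory);
    mkdirSync(path.join(root, ".dreb", "agents"), { recursive: true });
    mkdirSync(path.join(root, ".dreb", "index"));
    symlinkSync(realMemory, path.join(root, ".dreb", "memory"), "dir");
    return getDrebToolVisibleDirsWithSymlinks(root).map((p) => path.basename(p)).sort();
}
```

The blocklist is still applied by entry name, so a symlink named secrets stays hidden regardless of its target.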

Suggestions

Finding 4 — Duplicate assertion in formatSearchCall projectDir test still present (code-reviewer + test-reviewer, confidence 100)
Prior action item from round 2. The "renders projectDir when provided" test has two identical assertions:

expect(result).toContain("project");
expect(result).toContain("project"); // shortenPath output

The second was meant to verify shortenPath output but checks the same substring. Provides zero incremental coverage.

Finding 5 — Explicit path: "." silently bypasses .dreb/ injection in grep and find (error-auditor + code-reviewer, confidence 95/85)
The injection condition is !searchDir. Passing path: "." resolves to the same root as omitting path, but "." is truthy so .dreb/ injection is skipped. A model call grep({pattern: "foo", path: "."}) gets fewer results than grep({pattern: "foo"}) with no diagnostic. Fix: compare resolved paths instead of checking for undefined, or add a comment explaining the intentional exclusion.
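
The "compare resolved paths" option could be sketched as below. shouldInjectDrebDirs is a hypothetical helper name. It assumes searchDir is the raw path parameter and that cwd is the project root used for injection.

```typescript
import * as path from "node:path";

/**
 * Decide whether tool-visible .dreb/ dirs should be appended to the rg/fd path arguments.
 * Injection happens when no path was given, or when the given path resolves to cwd itself.
 * NOTE: hypothetical helper, not code from grep.ts or find.ts.
 */
export function shouldInjectDrebDirs(searchDir: string | undefined, cwd: string): boolean {
    if (!searchDir) return true;
    // path.resolve normalizes ".", "./" and trailing separators before comparison.
    return path.resolve(cwd, searchDir) === path.resolve(cwd);
}
```

With this check, path: "." and an omitted path behave the same, while path: "src/" still suppresses injection. The comparison is purely lexical, so a symlink pointing back at cwd would not be recognized. Handling that would need a realpath call, which this sketch leaves out.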

Finding 6 — readdirSync failure in getDrebToolVisibleDirs silently hides all .dreb/ content (error-auditor, confidence 82)
If .dreb/ exists but cannot be read (permissions, NFS hiccup, replaced by file between existsSync and readdirSync), the catch returns [] with no log entry. All three tools proceed normally but without .dreb/ content. A console.warn or process.stderr.write would help users diagnose missing results.
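
One way to surface such failures without wiring a logger into the utility is an optional callback, sketched here. getDrebToolVisibleDirsReporting and its onError parameter are hypothetical, and writing to stderr by default is an assumption rather than an established project convention.

```typescript
import { readdirSync } from "node:fs";
import * as path from "node:path";

const DREB_HIDDEN_SUBDIRS = new Set(["index", "agent", "secrets"]);

/**
 * Hypothetical variant of getDrebToolVisibleDirs that reports unexpected read failures.
 * A missing .dreb/ (ENOENT) is the normal case and stays silent.
 */
export function getDrebToolVisibleDirsReporting(
    projectRoot: string,
    onError: (message: string) => void = (message) => {
        process.stderr.write(`${message}\n`);
    },
): string[] {
    const drebDir = path.join(projectRoot, ".dreb");
    try {
        return readdirSync(drebDir, { withFileTypes: true })
            .filter((entry) => entry.isDirectory() && !DREB_HIDDEN_SUBDIRS.has(entry.name))
            .map((entry) => path.join(drebDir, entry.name));
    } catch (err) {
        const code = (err as NodeJS.ErrnoException).code;
        if (code !== "ENOENT") {
            onError(`dreb: cannot list ${drebDir} (${code ?? "unknown error"}); .dreb/ content is hidden from tools`);
        }
        return [];
    }
}
```

The function still degrades to an empty list so the tools keep working on the rest of the project. Only the diagnostic is new.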

Strengths

  • Blocklist approach in dreb-paths.ts is well-designed — future-proof for new .dreb/ subdirs
  • Path handling for rg/fd results is correct — .dreb/ matches produce clean relative paths
  • scanMemoryDir reuse for non-memory dirs works correctly with proper relativization
  • resetIndex() / engineCache interaction is correct — embedder preserved, DB deleted, lazy rebuild
  • Scanner correctly skips .dreb/ injection in home-dir mode to avoid double-scanning global memory
  • All prior genuine findings (projectDir validation, formatSearchCall tests, cold-start rebuild) were fixed correctly
  • Thorough multi-project isolation tests verify separate indexes per projectDir

Agents run: code-reviewer, error-auditor, test-reviewer, completeness-checker


Reviewed by mach6

@m-aebrer
Collaborator Author

m-aebrer commented Apr 8, 2026

Review Assessment

Review comment

Classifications

| Finding | Classification | Reasoning |
| --- | --- | --- |
| 1. grep/find .dreb/ injection has no integration tests | Genuine issue | Confirmed: zero tests exercise the getDrebToolVisibleDirs injection path in grep or find. dreb-paths.test.ts only tests the utility in isolation. Plan addendum v2 explicitly listed grep/find test scenarios as deliverables, and project rules require tests to ship with features. |
| 2. formatSearchResult still unexported and untested | Genuine issue | Confirmed: formatSearchResult has no export keyword and zero tests. Handles TUI truncation (20-line collapse), expanded mode, and indexStats footer. Was flagged as action item in prior review round 2 and listed in the action plan but not implemented. |
| 3. Symlinked .dreb/ subdirs silently excluded | Nitpick | entry.isDirectory() does return false for symlinks (lstat semantics), but dreb creates .dreb/ subdirs itself as real directories. Dotfile managers typically symlink the entire .dreb/ dir (which works fine, since its contents are real dirs within the symlinked tree), not individual subdirs. Unlikely to affect real users. |
| 4. Duplicate assertion in projectDir test | Genuine issue | Confirmed: lines 156-158 have identical expect(result).toContain("project") twice. Comment says second should verify shortenPath output. Was flagged as action item in prior round 2 but not fixed. Low severity but real test quality issue. |
| 5. Explicit path: "." bypasses .dreb/ injection | Nitpick | The guard !searchDir evaluates to false for "." (a non-empty string is truthy), so injection is skipped even though the resolved path equals cwd. However, path: "." is an unusual explicit input: LLMs use path for specific subdirectories, not to redundantly specify the default. The behavioral difference is real but unlikely to matter in practice. |
| 6. readdirSync failure silently hides .dreb/ content | Nitpick | Silent degradation is arguably correct: tools should still work on the rest of the project. Adding logging would require injecting a logger into a pure utility. Low-value change. |

Action Plan

  1. Add grep and find .dreb/ integration tests (finding 1) — Create test scenarios with .gitignore listing .dreb/, files in .dreb/memory/, verify grep and find return results from tool-visible dirs and exclude blocklisted dirs. This was an explicit plan deliverable.
  2. Export formatSearchResult and add tests (finding 2) — Add export, add tests for truncation, expanded mode, indexStats footer, and empty output. Same pattern as formatSearchCall.
  3. Fix duplicate assertion (finding 4) — Change second assertion to verify distinct content (e.g. the shortenPath output for the given path).

Assessment by mach6

@m-aebrer
Collaborator Author

m-aebrer commented Apr 8, 2026

Progress Update

Fixed 3 review findings from round 3:

Finding 1 — grep/find .dreb/ integration tests: Added 6 integration tests to tools.test.ts (3 grep, 3 find). Each test initializes a git repo with .gitignore blocking .dreb/, creates tool-visible (memory) and tool-hidden (index, secrets) subdirectories. Tests verify: visible dirs included in results, hidden dirs excluded, and .dreb/ injection skipped when path is restricted.

Finding 2 — formatSearchResult exported and tested: Exported formatSearchResult (@internal) and added 7 tests covering: empty content, missing text, short content in collapsed mode, 20-line truncation with remaining count, expanded mode showing all lines, indexStats footer rendering, and absent indexStats.

Finding 4 — duplicate assertion fixed: Changed the duplicate expect(result).toContain("project") assertion in the "renders projectDir when provided" test to verify the actual path string "/home/user/project".

All 2312 tests pass. Build and lint clean.

Commit: 2a77414


Progress tracked by mach6

@m-aebrer m-aebrer merged commit 6b9340f into master Apr 8, 2026
2 checks passed
@m-aebrer m-aebrer deleted the feature/issue-119-120-search-tool-improvements branch April 8, 2026 17:57

Development

Successfully merging this pull request may close these issues.

  • Add force-rebuild option to search tool for index recovery
  • Add projectDir argument to search tool for indexing arbitrary directories

1 participant