ast-outline v0.3.0
Three new subcommands turn ast-outline from a structural-only tool into a full code navigation and discovery toolkit.
NEW IN v0.3.0
- `ast-outline search "<query>"`: hybrid BM25 + dense semantic search over the repo. Auto-detects symbol queries vs. natural language: `HandlerStack` leans BM25 (alpha 0.3), "how does login work" balances both (alpha 0.5). Full ranking pipeline: RRF fusion (k=60), file-coherence boost, definition boost (3× when a chunk defines the queried symbol), embedded-symbol boosts for NL queries, and path penalties (test files 0.3×, `.d.ts` 0.7×, `__init__.py` 0.5×, file-saturation decay 0.5^extra).
- `ast-outline find-related <FILE>:<LINE>`: the same engine in semantic-only mode, language-filtered, with the source chunk excluded. Its argument can be pasted directly from `search` output. Useful for "what else looks like this?" navigation.
- `ast-outline index`: explicit build / refresh / inspect. `--rebuild` drops the cache, `--stats` prints chunk count + model + build time. The index is built lazily on first `search` / `find-related` if missing.
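As a rough illustration of the ranking arithmetic described for `search`, here is a minimal sketch of weighted reciprocal rank fusion plus the path penalties. It is not the project's implementation. It assumes that alpha weights the dense (semantic) list and `1 - alpha` the BM25 list, that penalties multiply when several apply, and that `extra` counts how many chunks of the same file already rank higher. The names `rrf_fuse` and `path_penalty` are hypothetical.

```rust
use std::collections::HashMap;

/// RRF constant quoted in the release notes (k=60).
const RRF_K: f64 = 60.0;

/// Fuse two ranked lists of chunk ids (best first) with reciprocal rank fusion.
/// ASSUMPTION: `alpha` weights the dense list and `1 - alpha` the BM25 list.
fn rrf_fuse(bm25: &[&str], dense: &[&str], alpha: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for (rank, id) in bm25.iter().enumerate() {
        // RRF uses 1-based ranks: weight / (k + rank).
        *scores.entry(id.to_string()).or_insert(0.0) +=
            (1.0 - alpha) / (RRF_K + rank as f64 + 1.0);
    }
    for (rank, id) in dense.iter().enumerate() {
        *scores.entry(id.to_string()).or_insert(0.0) +=
            alpha / (RRF_K + rank as f64 + 1.0);
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest score first; ties broken by id so the order is deterministic.
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then_with(|| a.0.cmp(&b.0)));
    fused
}

/// Path-based multiplier built from the factors listed above.
/// ASSUMPTION: test-file detection happens elsewhere (`is_test`), and
/// `extra` is the number of higher-ranked chunks from the same file.
fn path_penalty(path: &str, is_test: bool, extra: u32) -> f64 {
    let mut m = 1.0;
    if is_test {
        m *= 0.3;
    }
    if path.ends_with(".d.ts") {
        m *= 0.7;
    }
    if path.ends_with("__init__.py") {
        m *= 0.5;
    }
    // File-saturation decay: 0.5^extra.
    m * 0.5f64.powi(extra as i32)
}
```

With alpha 0.3 a chunk ranked first by BM25 and second by the dense list outranks the reverse case, which matches the "leans BM25" behaviour described for symbol queries.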
EMBEDDING MODEL
- Uses `minishlab/potion-code-16M`, a static (no neural-net inference) `vocab × 256` model that runs on CPU in microseconds. Downloaded once (~64 MB) on first use to `~/.cache/ast-outline/models/`.
- Corp-network friendly: tries HuggingFace first, falls back to `hf-mirror.com` if blocked. TLS verification is disabled by default so corporate MITM proxies don't break setup (common in banks and other regulated environments). Integrity is enforced via a SHA-256 manifest written after first download, so subsequent loads detect tampering even if the original fetch was over a MITM channel. Set `AST_OUTLINE_TLS_STRICT=1` to enforce strict TLS. Full rationale at `wiki/network-security.md`.
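To make "static, no neural-net inference" concrete: a static embedding model is essentially a lookup table with one row per vocabulary id. The sketch below assumes the sentence vector is the mean of the token rows followed by L2 normalisation, which is a common scheme for static models but is not confirmed here for this tool. `StaticModel`, `embed`, and `dot` are hypothetical names, and `DIM` is 4 instead of 256 to keep the example readable.

```rust
/// Embedding width. The real model is described as `vocab × 256`;
/// 4 is used here only to keep the example small.
const DIM: usize = 4;

/// A static embedding table: one fixed row per vocabulary id.
struct StaticModel {
    /// Row-major `vocab × DIM` matrix.
    table: Vec<f32>,
}

impl StaticModel {
    /// Borrow the embedding row for one token id.
    fn row(&self, token_id: usize) -> &[f32] {
        &self.table[token_id * DIM..(token_id + 1) * DIM]
    }

    /// ASSUMPTION: mean-pool the token rows, then L2-normalise.
    /// No matrix multiplications or layers are involved, only lookups and adds.
    fn embed(&self, token_ids: &[usize]) -> [f32; DIM] {
        let mut out = [0.0f32; DIM];
        if token_ids.is_empty() {
            return out;
        }
        for &id in token_ids {
            for (o, v) in out.iter_mut().zip(self.row(id)) {
                *o += *v;
            }
        }
        let n = token_ids.len() as f32;
        for o in out.iter_mut() {
            *o /= n;
        }
        let norm = out.iter().map(|v| v * v).sum::<f32>().sqrt();
        if norm > 0.0 {
            for o in out.iter_mut() {
                *o /= norm;
            }
        }
        out
    }
}

/// Dot product; equals cosine similarity for unit-length vectors.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}
```

Because the work per token is one row lookup and `DIM` additions, embedding a short query this way costs very little CPU time.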
PER-REPO INDEX
- Lives at `.ast-outline/index/` (auto-gitignored on first build). On-disk: `meta.json` + bincode `chunks.bin` / `bm25.bin` / `files.bin` + a header-less `embeddings.f32` matrix. Forward-compatible: a v2 partial-rebuild path can swap in without invalidating caches.
- Auto-refreshes on every `search` / `find-related` call. Steady-state cost on an unchanged 10k-file repo: ~30 ms of stat syscalls. Files where `(mtime, size)` differ get an xxhash3 check; only real edits trigger a rebuild.
- Concurrent searches are safe via an `fs2` advisory exclusive lock; writes use `.tmp` + atomic rename so a SIGKILL mid-write leaves the previous index intact.
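The refresh and write strategy above can be sketched with the standard library alone. This is an illustration under assumptions, not the tool's code: `FileStamp`, `Delta`, `classify`, and `write_atomic` are hypothetical names, the content hash is passed in as a closure instead of calling xxhash3, and the temporary file name is derived by swapping the extension for `tmp`.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::Path;

/// What the index remembers about a file from the previous build.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct FileStamp {
    mtime_ns: u128,
    size: u64,
    content_hash: u64,
}

#[derive(Debug, PartialEq, Eq)]
enum Delta {
    /// (mtime, size) match: no read needed.
    Unchanged,
    /// Metadata changed but the bytes hash the same: no rebuild.
    TouchedOnly,
    /// Content really changed: re-chunk and re-embed this file.
    Edited,
}

/// Two-stage check: cheap stat comparison first, hashing only on mismatch.
/// `hash_file` stands in for an xxhash3 pass over the file contents.
fn classify(old: &FileStamp, mtime_ns: u128, size: u64, hash_file: impl FnOnce() -> u64) -> Delta {
    if old.mtime_ns == mtime_ns && old.size == size {
        return Delta::Unchanged;
    }
    if hash_file() == old.content_hash {
        Delta::TouchedOnly
    } else {
        Delta::Edited
    }
}

/// Write to a sibling temporary file, flush it, then rename over the target.
/// A crash before the rename leaves the previous file untouched.
/// ASSUMPTION: the temporary name is `<stem>.tmp` next to the target.
fn write_atomic(path: &Path, bytes: &[u8]) -> io::Result<()> {
    let tmp = path.with_extension("tmp");
    let mut file = fs::File::create(&tmp)?;
    file.write_all(bytes)?;
    file.sync_all()?;
    fs::rename(&tmp, path)
}
```

The hash closure is only invoked when the stat data differs, which is why an unchanged repo costs little more than the stat calls themselves.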
WHAT GETS WALKED — UNIFIED ACROSS ALL COMMANDS
`outline`, `digest`, `show`, `implements`, `search`, `find-related`, `index` now all honour the same five-layer ignore pipeline:
- `.gitignore` (and `.ignore`, global gitignore, `.git/info/exclude`)
- Hardcoded denylist (`node_modules`, `target`, `dist`, `.venv`, `__pycache__`, `.next`, `.cache`, `.gradle`, `.idea`, `.vscode`, `.eggs`, `.tox`, `.mypy_cache`, `.pytest_cache`, `.ruff_cache`, `.parcel-cache`, `.turbo`, `.nuxt`, `out`, `build`, `.ast-outline`, …), applied even when `.gitignore` doesn't list them
- `.ast-outline-ignore`: new per-repo escape hatch (gitignore syntax)
- Extension allowlist (only files we know how to chunk structurally)
- Per-file guards (size cap, generated-file heuristics)

- Search supports broader languages than outline does: anything `ast-grep` parses + markdown (~25 languages incl. bash, cpp, css, dart, elixir, haskell, lua, php, ruby, swift, yaml, …). Outline-family commands stay limited to the 9 languages + markdown that have hand-written adapters.
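The five-layer ignore pipeline can be pictured as a chain of early returns. The sketch below is illustrative only: gitignore matching is reduced to two booleans supplied by the caller, the denylist and extension list are small subsets, the 1 MB size cap is a made-up value, the generated-file heuristics are omitted, and `check` and `Verdict` are hypothetical names.

```rust
use std::path::Path;

/// Subset of the hardcoded denylist from the release notes.
const DENYLIST: &[&str] = &["node_modules", "target", "dist", ".venv", "__pycache__", ".ast-outline"];
/// HYPOTHETICAL subset of the extension allowlist.
const ALLOWED_EXT: &[&str] = &["rs", "py", "ts", "md"];
/// HYPOTHETICAL size cap; the real value is not stated in the notes.
const MAX_BYTES: u64 = 1_000_000;

#[derive(Debug, PartialEq, Eq)]
enum Verdict {
    Keep,
    /// Carries the name of the layer that rejected the file.
    Skip(&'static str),
}

/// Apply the layers in the order they are listed; the first match wins.
/// `gitignored` and `locally_ignored` stand in for real gitignore-style
/// matching against `.gitignore` and `.ast-outline-ignore`.
fn check(path: &Path, size: u64, gitignored: bool, locally_ignored: bool) -> Verdict {
    if gitignored {
        return Verdict::Skip("gitignore");
    }
    // Denylist applies to any path component, regardless of .gitignore.
    let denied = path.components().any(|c| {
        c.as_os_str()
            .to_str()
            .map_or(false, |s| DENYLIST.iter().any(|d| *d == s))
    });
    if denied {
        return Verdict::Skip("denylist");
    }
    if locally_ignored {
        return Verdict::Skip(".ast-outline-ignore");
    }
    let ext_ok = path
        .extension()
        .and_then(|e| e.to_str())
        .map_or(false, |e| ALLOWED_EXT.iter().any(|a| *a == e));
    if !ext_ok {
        return Verdict::Skip("extension");
    }
    if size > MAX_BYTES {
        return Verdict::Skip("size cap");
    }
    Verdict::Keep
}
```

Returning the rejecting layer's name makes it easy to explain why a given file never shows up in results.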
MCP TOOLS
- 3 new tools (`search`, `find_related`, `index`) registered alongside the existing four. Same JSON schemas as the CLI's `--json` output: identical bytes between surfaces.
- 4 new JSON schemas: `ast-outline.search.v1`, `ast-outline.related.v1`, `ast-outline.index-stats.v1`, plus the on-disk `ast-outline.search-index.v1`.
NUMBERS
- 144 unit tests + 7 integration tests + 6 network-gated tests against the real HF model. End-to-end smoke (3-file fixture repo): build → search → find-related → re-open + delta check, all in ~2 s including model load.
- Release binary grew from ~7 MB to ~8 MB. New deps: `reqwest` (rustls), `tokenizers`, `safetensors`, `memmap2`, `wide`, `regex`, `bincode 2`, `xxhash-rust`, `fs2`, `sha2`.
DOCS
- New wiki pages: `wiki/search.md` (architecture deep-dive), `wiki/network-security.md` (TLS policy + corp-network rationale), `wiki/file-filtering.md` (the 5-layer walker).
- README adds a "Semantic search" section + a "What gets walked" section. Self-contained for crates.io readers; wiki pages linked via GitHub URLs.
- Agent prompt (`ast-outline prompt`) now includes guidance for steps 5 (search) and 6 (find-related).