
ast-outline v0.3.0


@aeroxy released this on 02 May 09:30 (133 commits to main since this release)

Three new subcommands turn ast-outline from a structural-only tool into a full code navigation and discovery toolkit.

NEW IN v0.3.0

  • ast-outline search "<query>": hybrid BM25 + dense semantic search over the repo. It auto-detects symbol queries vs. natural language and weights the two rankings accordingly. A symbol query such as HandlerStack leans toward BM25 (alpha 0.3, i.e. 0.7 weight on BM25), while "how does login work" balances both (alpha 0.5). The full ranking pipeline applies RRF fusion (k=60), a file-coherence boost, a definition boost (3× when a chunk defines the queried symbol), embedded-symbol boosts for NL queries, and penalties: test files 0.3×, .d.ts 0.7×, __init__.py 0.5×, plus a file-saturation decay of 0.5^extra.
  • ast-outline find-related <FILE>:<LINE>: the same engine in semantic-only mode, filtered to the source file's language, with the source chunk itself excluded. The <FILE>:<LINE> argument can be pasted directly from search output, which makes it useful for "what else looks like this?" navigation.
  • ast-outline index: explicit build / refresh / inspect. --rebuild drops the cache and builds from scratch; --stats prints chunk count, model, and build time. If the index is missing, it is built lazily on the first search or find-related call.
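The ranking steps above can be sketched in Rust as a minimal illustration, not ast-outline's own code. Only the constants (alpha 0.3 / 0.5, k=60, 3×, 0.3×, 0.7×, 0.5×, 0.5^extra) come from these notes. The symbol-query heuristic, the way alpha weights each RRF term, the test-file detection, and every function and type name are hypothetical. The file-coherence and embedded-symbol boosts are left out.

```rust
use std::collections::HashMap;

/// RRF constant from the release notes.
const RRF_K: f64 = 60.0;

/// Hypothetical heuristic: a single token made only of identifier-like
/// characters (no whitespace) is treated as a symbol query.
fn is_symbol_query(query: &str) -> bool {
    let q = query.trim();
    !q.is_empty()
        && q.chars()
            .all(|c| c.is_alphanumeric() || c == '_' || c == '.' || c == ':')
}

/// Weight given to the dense ranking; BM25 gets `1.0 - alpha`.
fn alpha_for(query: &str) -> f64 {
    if is_symbol_query(query) { 0.3 } else { 0.5 }
}

/// Weighted reciprocal-rank fusion of two ranked lists (best first).
/// Ranks are 1-based, so the top hit contributes weight / (k + 1).
fn weighted_rrf(bm25: &[&str], dense: &[&str], alpha: f64) -> HashMap<String, f64> {
    let mut fused: HashMap<String, f64> = HashMap::new();
    for (list, weight) in [(bm25, 1.0 - alpha), (dense, alpha)] {
        for (i, id) in list.iter().enumerate() {
            let rank = (i + 1) as f64;
            *fused.entry(id.to_string()).or_insert(0.0) += weight / (RRF_K + rank);
        }
    }
    fused
}

/// Hypothetical test-file detection.
fn looks_like_test(path: &str) -> bool {
    let file = path.rsplit('/').next().unwrap_or(path);
    path.split('/').any(|seg| seg == "test" || seg == "tests" || seg == "__tests__")
        || file.starts_with("test_")
        || file.contains("_test.")
        || file.contains(".test.")
        || file.contains(".spec.")
}

/// Path penalty multipliers from the release notes.
fn path_penalty(path: &str) -> f64 {
    let file = path.rsplit('/').next().unwrap_or(path);
    if looks_like_test(path) {
        0.3
    } else if file.ends_with(".d.ts") {
        0.7
    } else if file == "__init__.py" {
        0.5
    } else {
        1.0
    }
}

struct Candidate {
    id: String,
    path: String,
    defines_query_symbol: bool,
}

/// Apply the definition boost and path penalty, then the file-saturation
/// decay: the n-th extra chunk from the same file is scaled by 0.5^n.
fn rerank(cands: &[Candidate], fused: &HashMap<String, f64>) -> Vec<(String, f64)> {
    let mut scored: Vec<(&Candidate, f64)> = cands
        .iter()
        .map(|c| {
            let base = fused.get(&c.id).copied().unwrap_or(0.0);
            let boost = if c.defines_query_symbol { 3.0 } else { 1.0 };
            (c, base * boost * path_penalty(&c.path))
        })
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    let mut seen: HashMap<String, i32> = HashMap::new();
    for (c, score) in scored.iter_mut() {
        let extra = seen.entry(c.path.clone()).or_insert(0);
        *score *= 0.5f64.powi(*extra);
        *extra += 1;
    }
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.into_iter().map(|(c, s)| (c.id.clone(), s)).collect()
}
```

Because the decay is applied after sorting, the best chunk from each file keeps its full score, and each further chunk from that file is halved once more.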

EMBEDDING MODEL

  • Uses minishlab/potion-code-16M, a static embedding model (a vocab × 256 lookup table, no neural-network inference) that runs on CPU in microseconds. It is downloaded once (~64 MB) on first use to ~/.cache/ast-outline/models/.
  • Corp-network friendly: it tries HuggingFace first and falls back to hf-mirror.com if blocked. TLS verification is disabled by default so that corporate MITM proxies, common in banks and other regulated environments, don't break setup. Integrity is enforced via a SHA-256 manifest written after the first download, so subsequent loads detect tampering with the cached files even if the original fetch went through a MITM channel; the first download itself is trusted as received. Set AST_OUTLINE_TLS_STRICT=1 to enforce strict TLS. Full rationale at wiki/network-security.md.
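A static model replaces inference with a table lookup. The sketch below is a minimal illustration under stated assumptions: it assumes mean pooling of token rows followed by L2 normalisation, and it takes token ids directly instead of running the real tokenizer. StaticEmbeddings and cosine are hypothetical names. As a sanity check on the download size, if the 16M in the model name counts f32 parameters, 16M × 4 bytes comes to about 64 MB.

```rust
/// Hypothetical in-memory form of a static embedding model:
/// a row-major `vocab × dim` table of f32 (dim is 256 for potion-code-16M).
struct StaticEmbeddings {
    dim: usize,
    table: Vec<f32>,
}

impl StaticEmbeddings {
    /// The embedding row for one token id: a slice lookup, no inference.
    fn row(&self, id: usize) -> &[f32] {
        &self.table[id * self.dim..(id + 1) * self.dim]
    }

    /// Embed a tokenised chunk: mean of its token rows, then L2-normalised.
    fn embed(&self, token_ids: &[usize]) -> Vec<f32> {
        let mut out = vec![0.0f32; self.dim];
        for &id in token_ids {
            for (o, v) in out.iter_mut().zip(self.row(id)) {
                *o += *v;
            }
        }
        if !token_ids.is_empty() {
            let n = token_ids.len() as f32;
            out.iter_mut().for_each(|o| *o /= n);
        }
        let norm = out.iter().map(|x| x * x).sum::<f32>().sqrt();
        if norm > 0.0 {
            out.iter_mut().for_each(|o| *o /= norm);
        }
        out
    }
}

/// Cosine similarity; for unit-length vectors this is just the dot product.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}
```

Every embedding costs one slice lookup and one add per token, which is why a model like this can run on CPU without a neural-network runtime.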

PER-REPO INDEX

  • Lives at .ast-outline/index/ (auto-gitignored on first build). On disk: meta.json, the bincode-encoded chunks.bin / bm25.bin / files.bin, and a header-less embeddings.f32 matrix. The layout is forward-compatible: a future v2 partial-rebuild path can be swapped in without invalidating existing caches.
  • Auto-refreshes on every search / find-related call. Steady-state cost on an unchanged 10k-file repo is ~30 ms of stat syscalls. Files whose (mtime, size) differ get an xxhash3 content check, so only real edits trigger a rebuild.
  • Concurrent searches are safe via an fs2 advisory exclusive lock; writes use .tmp + atomic rename so a SIGKILL mid-write leaves the previous index intact.
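The refresh check and the crash-safe write described above can be sketched with only the standard library. This is a minimal illustration, not the project's code. FileRecord, Change, check_file and write_atomic are hypothetical names. Std's DefaultHasher stands in for xxhash3 (its output is not guaranteed stable across Rust releases, so it would be unsuitable for a persisted index). The fs2 advisory lock is omitted.

```rust
use std::collections::hash_map::DefaultHasher;
use std::ffi::OsString;
use std::fs;
use std::hash::{Hash, Hasher};
use std::io::{self, Write};
use std::path::{Path, PathBuf};
use std::time::SystemTime;

/// What the index remembers per file (hypothetical layout).
#[derive(Clone, Debug, PartialEq)]
struct FileRecord {
    mtime: SystemTime,
    size: u64,
    content_hash: u64,
}

#[derive(Debug, PartialEq)]
enum Change {
    /// (mtime, size) match: only a stat syscall was needed.
    Unchanged,
    /// Metadata changed but content is identical: update the record, skip re-indexing.
    Touched(FileRecord),
    /// Content really changed: re-chunk and re-embed this file.
    Edited(FileRecord),
}

/// Stand-in for xxhash3 (hypothetical; not stable across Rust releases).
fn content_hash(bytes: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    bytes.hash(&mut h);
    h.finish()
}

fn check_file(path: &Path, old: &FileRecord) -> io::Result<Change> {
    let meta = fs::metadata(path)?;
    let (mtime, size) = (meta.modified()?, meta.len());
    if mtime == old.mtime && size == old.size {
        return Ok(Change::Unchanged); // cheap fast path: no file read
    }
    let hash = content_hash(&fs::read(path)?);
    let record = FileRecord { mtime, size, content_hash: hash };
    Ok(if hash == old.content_hash { Change::Touched(record) } else { Change::Edited(record) })
}

/// Write to `<dest>.tmp`, fsync, then rename over `dest`. On POSIX filesystems
/// the rename is atomic, so a crash mid-write leaves the previous file intact.
fn write_atomic(dest: &Path, bytes: &[u8]) -> io::Result<()> {
    let mut tmp_name: OsString = dest.as_os_str().to_owned();
    tmp_name.push(".tmp");
    let tmp = PathBuf::from(tmp_name);
    let mut f = fs::File::create(&tmp)?;
    f.write_all(bytes)?;
    f.sync_all()?; // make the bytes durable before the rename publishes them
    fs::rename(&tmp, dest)
}
```

Separating Touched from Edited lets a refresh update stale metadata (so the next call hits the stat-only fast path again) without re-embedding a file whose content did not change.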

WHAT GETS WALKED — UNIFIED ACROSS ALL COMMANDS

  • outline, digest, show, implements, search, find-related, index now all honour the same five-layer ignore pipeline:
    1. .gitignore (and .ignore, global gitignore, .git/info/exclude)
    2. Hardcoded denylist (node_modules, target, dist, .venv, __pycache__, .next, .cache, .gradle, .idea, .vscode, .eggs, .tox, .mypy_cache, .pytest_cache, .ruff_cache, .parcel-cache, .turbo, .nuxt, out, build, .ast-outline, …) — applied even when .gitignore doesn't list them
    3. .ast-outline-ignore — new per-repo escape hatch (gitignore syntax)
    4. Extension allowlist (only files we know how to chunk structurally)
    5. Per-file guards (size cap, generated-file heuristics)
  • Search supports more languages than outline does: anything ast-grep parses, plus markdown (~25 languages, incl. bash, cpp, css, dart, elixir, haskell, lua, php, ruby, swift, yaml, …). Outline-family commands stay limited to the 9 languages (plus markdown) that have hand-written adapters.
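Layers 2, 4 and 5 of the walker reduce to cheap per-path checks. The sketch below is a minimal illustration: the denylist is a subset of the one listed above, while the extension allowlist, the 1 MiB size cap, the .min.js generated-file rule and the passes_layers name are hypothetical placeholders. Layers 1 and 3 need a gitignore-syntax matcher and are omitted.

```rust
use std::path::Path;

/// Layer 2: a subset of the hardcoded directory denylist from the release notes.
const DENY_DIRS: &[&str] = &[
    "node_modules", "target", "dist", ".venv", "__pycache__", ".next",
    ".cache", ".gradle", ".idea", ".vscode", "out", "build", ".ast-outline",
];

/// Layer 4: hypothetical extension allowlist (illustrative subset).
const ALLOW_EXTS: &[&str] = &["rs", "py", "ts", "js", "go", "java", "md"];

/// Layer 5: hypothetical size cap of 1 MiB.
const MAX_BYTES: u64 = 1 << 20;

/// Returns true if a repo-relative path survives layers 2, 4 and 5.
fn passes_layers(rel_path: &Path, size: u64) -> bool {
    // Layer 2: reject if any path component is a denylisted directory,
    // whether or not .gitignore mentions it.
    let denied = rel_path
        .components()
        .any(|c| c.as_os_str().to_str().map_or(false, |s| DENY_DIRS.contains(&s)));
    if denied {
        return false;
    }
    // Layer 4: only extensions we know how to chunk.
    let ext_ok = rel_path
        .extension()
        .and_then(|e| e.to_str())
        .map_or(false, |e| ALLOW_EXTS.contains(&e));
    if !ext_ok {
        return false;
    }
    // Layer 5: size cap plus a hypothetical generated-file rule (minified bundles).
    let name = rel_path.file_name().and_then(|n| n.to_str()).unwrap_or("");
    size <= MAX_BYTES && !name.ends_with(".min.js")
}
```

Checking the denylist per component means a directory such as node_modules is skipped at any depth, which is what lets the hardcoded layer work even when .gitignore is silent.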

MCP TOOLS

  • 3 new tools (search, find_related, index) are registered alongside the existing four. They use the same JSON schemas as the CLI's --json output, with byte-identical results on both surfaces.
  • 4 new JSON schemas: ast-outline.search.v1, ast-outline.related.v1, ast-outline.index-stats.v1, plus the on-disk ast-outline.search-index.v1.

NUMBERS

  • 144 unit tests + 7 integration tests + 6 network-gated tests against the real HF model. End-to-end smoke test (3-file fixture repo): build → search → find-related → re-open + delta check, all in ~2 s including model load.
  • Release binary grew from ~7 MB to ~8 MB. New deps: reqwest (rustls), tokenizers, safetensors, memmap2, wide, regex, bincode 2, xxhash-rust, fs2, sha2.

DOCS

  • New wiki pages: wiki/search.md (architecture deep-dive), wiki/network-security.md (TLS policy + corp-network rationale), wiki/file-filtering.md (the 5-layer walker).
  • README adds "Semantic search" and "What gets walked" sections. It stays self-contained for crates.io readers, with wiki pages linked via GitHub URLs.
  • Agent prompt (ast-outline prompt) now includes guidance for steps 5 (search) and 6 (find-related).