The S3-native search runtime for AI agents.
Ripgrep linked in-process · Tree-sitter spans · Turbopuffer-style centroid vector index · One MCP server, every agent runtime.
By 2026, every major coding agent — Claude Code, Cursor, Codex CLI, Devin, Aider, OpenCode, Continue — has converged on the same retrieval loop:

```
agent → parallel(grep, glob, read) → tree-sitter spans → reason → repeat
```
The bottleneck is the latency and ergonomics of the file tools the agent calls, especially when the agent's "filesystem" is an S3 bucket. Existing options force a trade-off:
- Code-search engines (Probe, native `rg`) — fast, but local FS only.
- Vector DBs (Qdrant, Milvus, Redis) — fast warm queries, but ~$1,600 / TB / month in RAM.
- S3-native vector DBs (Turbopuffer, Vespa Cloud) — cheap storage, but proprietary or warm p50 ~8 ms.
agentic-search runs all three retrieval shapes — grep, AST spans, centroid vector — directly against S3 / R2 / GCS / local FS, in one Rust binary, with the same wire surface (MCP + REST) every agent SDK already speaks.
SIFT-1M canonical ANN benchmark — 1 M × 128 d, ground truth from corpus-texmex.irisa.fr
| system | recall@10 | warm p50 | storage |
|---|---|---|---|
| agentic-search (probe=32) | 97.25 % | 4.85 ms | object |
| Qdrant (HNSW) | ~98 % | 4 ms | RAM |
| Milvus (HNSW) | ~98 % | 6 ms | RAM |
| Turbopuffer (SPFresh) | 90-95 % | 8 ms | object |
| Redis Vector | ~98 % | 1-5 ms | RAM |
At probe=32 we match Qdrant's in-memory HNSW recall and latency while keeping the index on object storage, and beat Turbopuffer's centroid-on-S3 by 1.6× on latency at materially higher recall. Reproduce with:

```sh
cargo run --release -p bench --bin sift1m -- --probe 32 --queries 2000
```

Full recall/latency curve and the perf changelog in docs/BENCHMARKS.md.
| endpoint | p50 ms | p95 ms | notes |
|---|---|---|---|
| `POST /grep` | 31.1 | 47.2 | ripgrep-as-library + JSON spans |
| `POST /grep` (`ast: true`, warm cache) | 88.3 | 146.1 | tree-sitter widening + parse-cache + drift check |
| `rg` (subprocess) | 32.4 | 53.2 | native ripgrep, raw line output |
`/grep` lands within 5 % of native `rg` (slightly ahead here) while adding JSON spans, parallel fan-out, JoinSet cancellation, and tier-cache plumbing. The agent never sees a slow subprocess.
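For harnesses that speak REST directly, a warm `/grep` round-trip is a single POST. A minimal Python sketch follows; the field names (`root`, `pattern`, `ast`, `hits`, `path`, `line`) are illustrative assumptions, not the schema-locked wire format:

```python
import json
import urllib.request

# Field names below are assumptions for illustration; the pinned wire
# schema lives in crates/as-server/tests/mcp_schema.rs.
def grep_request(root: str, pattern: str, ast: bool = False) -> bytes:
    """Build a JSON body for POST /grep."""
    return json.dumps({"root": root, "pattern": pattern, "ast": ast}).encode()

def top_spans(response_body: str, limit: int = 5) -> list:
    """Pull (path, line) pairs out of a span-shaped JSON response."""
    hits = json.loads(response_body).get("hits", [])
    return [(h["path"], h["line"]) for h in hits[:limit]]

if __name__ == "__main__":
    req = urllib.request.Request(
        "http://127.0.0.1:8787/grep",
        data=grep_request("s3://corp/", "HS256"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(top_spans(resp.read().decode()))
```

The same shape works from any language with an HTTP client, which is why the SDK adapters stay thin.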
```sh
# install the `agentic-search` binary from crates.io
cargo install agentic-search-cli --locked
agentic-search --version
```

Or build from source:

```sh
git clone https://github.com/CREVIOS/agentic-search
cd agentic-search
cargo install --path crates/as-cli --locked   # builds agentic-search-cli locally
```

Pre-built binaries for linux / darwin / windows (x86_64 + arm64) attach to every GitHub Release.
```sh
pip install claude-agent-search       # Claude Agent SDK
pip install openai-agentic-search     # OpenAI Agents SDK
pip install deepagents-search         # DeepAgents
pip install langchain-agentic-search  # LangChain Retriever + Tool
pip install crewai-agentic-search     # CrewAI tool wrapper
```

```sh
pnpm add @agentic-search/sdk
```

```sh
go get github.com/CREVIOS/agentic-search/sdks/go/agenticsearch
```

```sh
docker run --rm -p 127.0.0.1:8787:8787 ghcr.io/crevios/agentic-search:latest

# or with persistent NVMe LRU cache:
docker compose up -d

curl -s http://127.0.0.1:8787/health
```

Multi-arch (linux/amd64 + linux/arm64) images publish on every tagged release.
```sh
agentic-search ls s3://my-corpus/docs/
agentic-search glob s3://my-corpus/docs/ "**/*.md"
agentic-search grep s3://my-corpus/ "TODO\\(security\\)"
agentic-search find s3://my-repo/src/ --symbol verify_jwt

# build a prefix manifest so cold listing collapses to one GET
agentic-search index-manifest s3://my-corpus/

# expose everything to any MCP host (Claude Code, Cursor, Cline, …)
agentic-search serve --mcp

# or as a REST server (default 127.0.0.1:8787)
agentic-search serve
```

```json
{
  "mcpServers": {
    "agentic-search": {
      "command": "agentic-search",
      "args": ["serve", "--mcp"]
    }
  }
}
```
```python
from claude_agent_sdk import ClaudeAgentOptions, query
from claude_agent_search import mcp_server_config, as_tools

opts = ClaudeAgentOptions(
    mcp_servers=mcp_server_config(),
    allowed_tools=as_tools(),
)
async for msg in query(
    prompt="Find every place we still use HS256 in s3://corp/ and summarize.",
    options=opts,
):
    print(msg)
```

```python
from deepagents import create_deep_agent
from deepagents_search import search_tool

agent = create_deep_agent(tools=[search_tool()])
```

```ts
import { AgenticSearchClient } from "@agentic-search/sdk";

const client = new AgenticSearchClient("http://127.0.0.1:8787");
const hits = await client.grep("s3://corp/", "HS256");
```

```go
import "github.com/CREVIOS/agentic-search/sdks/go/agenticsearch"

c := agenticsearch.New("http://127.0.0.1:8787")
hits, _ := c.Grep(ctx, "s3://corp/", "HS256", nil)
```

Six layers, from agent down to bytes:
- SDK adapters. Claude Agent SDK, OpenAI Agents, DeepAgents, LangChain, CrewAI, Node, Go, raw CLI/MCP. Wire-compatible with whatever the host runtime already expects.
- Tool surface. `ls`, `read`, `grep`, `find_symbol`, `search`, `delegate`. Exposed twice — once over MCP stdio, once as REST. The schema-lock test (`crates/as-server/tests/mcp_schema.rs`) pins the set so renames break CI.
- Planner. Parallel `JoinSet` fan-out across probes. Stage budgets per call. RRF fusion of multi-stage results, NaN-safe `total_cmp` everywhere. Cancel-on-drop.
- Search stages.
  - Grep — `grep-searcher` linked in-process. No subprocess tax. Optional content-hash stamping so AST widening downstream can drop drifted spans.
  - AST — tree-sitter spans (`function` / `method` / `class`). `ContainerIndex` parses each file once; `SpanCache` is content-addressed so vendored files share one parse across the workspace.
  - Vector — Turbopuffer-shaped centroid index. K-means clusters on object storage, 2-roundtrip query, `cluster_cache` sized to `min(k, MAX)`. SIFT-1M result above.
- Tier cache. Memory LRU (`parking_lot`) → NVMe LRU with mtime-LRU sweep + touch-on-hit → object store. Warm queries land in microseconds; cold queries do one S3 GET.
- Object store. `s3://`, `r2://`, `gs://`, `file://` behind one trait. AWS SDK auth on remote schemes; mmap fast-path on local FS (drops the tokio scheduler round-trip).
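The tier-cache read path above can be sketched as a toy read-through cache. This is a Python stand-in under loud assumptions: the `TierCache` class below is hypothetical and models only memory LRU, disk spill, and a single origin fetch, not the real `as-cache` internals:

```python
import collections
import os
import tempfile

class TierCache:
    """Toy read-through tier cache: memory LRU -> disk -> origin fetch.

    Hypothetical sketch only; the real as-cache adds mtime-LRU sweeps
    and object-store semantics on the cold path.
    """

    def __init__(self, fetch, mem_capacity=2, disk_dir=None):
        self.fetch = fetch                    # origin, e.g. one S3 GET
        self.mem = collections.OrderedDict()  # in-process LRU (tier 1)
        self.cap = mem_capacity
        self.disk = disk_dir or tempfile.mkdtemp()  # NVMe stand-in (tier 2)

    def _disk_path(self, key):
        return os.path.join(self.disk, key.replace("/", "_"))

    def get(self, key):
        if key in self.mem:                   # memory hit: microseconds
            self.mem.move_to_end(key)         # touch-on-hit
            return self.mem[key]
        path = self._disk_path(key)
        if os.path.exists(path):              # disk hit: no origin call
            with open(path, "rb") as f:
                data = f.read()
        else:                                 # cold: exactly one origin GET
            data = self.fetch(key)
            with open(path, "wb") as f:
                f.write(data)
        self.mem[key] = data
        if len(self.mem) > self.cap:
            self.mem.popitem(last=False)      # evict least-recently-used
        return data
```

Keys evicted from memory still avoid the origin because the disk tier retains them; only a genuinely cold key pays the object-store round-trip.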
Crate-level breakdown in docs/ARCHITECTURE.md.
| Crate | Purpose |
|---|---|
| `as-core` | Shared error / result / `Hit` / `Config` types |
| `as-store` | Object-store trait + S3/GCS/R2/local backends, prefix manifest |
| `as-fs` | `ls` / `glob` / `read` surface, manifest-aware listing |
| `as-grep` | Ripgrep-as-library, parallel scan, span emission |
| `as-ast` | Tree-sitter spans, `ContainerIndex`, content-addressed cache |
| `as-cache` | Tiered cache: memory LRU → NVMe LRU with mtime-sweep |
| `as-embed` | fastembed-rs (ONNX, BGE-small-en) |
| `as-vec` | Centroid (clustered) vector index on object storage |
| `as-plan` | Planner: parallel fan-out, stage budgets, stable RRF |
| `as-server` | REST + MCP stdio server |
| `as-cli` | `agentic-search` binary |
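The stable RRF fusion that `as-plan` performs can be sketched in a few lines. This is a Python stand-in for the Rust, assuming the conventional `k=60` constant; with pure-integer ranks the NaN-safety concern that motivates `total_cmp` does not arise:

```python
def rrf(rankings, k=60):
    """Reciprocal-rank fusion: each stage contributes 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # score descending; doc id as a deterministic tiebreaker (stable RRF)
    return sorted(scores, key=lambda d: (-scores[d], d))

grep_hits = ["auth.rs", "jwt.rs", "main.rs"]   # stage 1 ranking
vec_hits = ["jwt.rs", "verify.rs", "auth.rs"]  # stage 2 ranking
print(rrf([grep_hits, vec_hits]))
# → ['jwt.rs', 'auth.rs', 'verify.rs', 'main.rs']
```

A document that appears near the top of two stages (`jwt.rs`) outranks one that tops only a single stage, which is exactly why RRF suits fusing grep, AST, and vector hits without score calibration.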
- Beats native `rg` on the agent-loop shape (warm-server `/grep` p50 31.1 ms vs. `rg` 32.4 ms) while emitting JSON spans, with parallel fan-out, JoinSet cancellation, and tier-cache plumbing.
- Matches Qdrant in-memory HNSW on SIFT-1M (97 % recall, 4.85 ms warm p50) — on object storage.
- Beats Turbopuffer's centroid-on-S3 by 1.6× latency at materially higher recall on the same benchmark.
- Tree-sitter spans that widen a line hit to its enclosing `fn` / `class` / `method`. Content-addressed parse cache — vendored / duplicated files share one parse process-wide.
- Cold-S3 manifest. Optional prefix manifest collapses `ListObjectsV2` paging to one GET; strict count validation + DoS-bounded buffer + live-list fallback on truncation.
- Production security defaults. Server binds `127.0.0.1` unless `--allow-public`. Path-escape rejection on every `file://` read. CI runs `cargo-deny`, `gitleaks`, `pip-audit` (5 Python adapters), `pnpm audit`, `govulncheck`.
- One binary, every agent. MCP stdio + REST + Python (×5 adapters) + Node + Go SDKs + Docker image on GHCR. Wire-compatible with Claude Code, Cursor, Cline, Claude Desktop, OpenAI Agents SDK, LangChain, CrewAI, DeepAgents.
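The two-roundtrip centroid query shape behind the SIFT-1M numbers can be illustrated with a toy IVF-style index. This is a NumPy sketch under simplifying assumptions (plain Lloyd k-means, exhaustive scan inside probed clusters), not the `as-vec` implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
vectors = rng.normal(size=(1000, 16)).astype(np.float32)

# Build: k-means over the corpus. The centroids are the tiny, cacheable
# artifact; cluster postings live as records on object storage.
k = 8
centroids = vectors[rng.choice(len(vectors), k, replace=False)].copy()
for _ in range(10):
    assign = np.argmin(((vectors[:, None] - centroids) ** 2).sum(-1), axis=1)
    for c in range(k):
        if (assign == c).any():
            centroids[c] = vectors[assign == c].mean(axis=0)
# Final assignment against the final centroids.
assign = np.argmin(((vectors[:, None] - centroids) ** 2).sum(-1), axis=1)
clusters = {c: np.where(assign == c)[0] for c in range(k)}

def search(query, probe=2, topk=5):
    # Round-trip 1: rank centroids (cheap, cacheable client-side).
    order = np.argsort(((centroids - query) ** 2).sum(-1))[:probe]
    # Round-trip 2: fetch only the probed cluster records, scan exactly.
    cand = np.concatenate([clusters[c] for c in order])
    dist = ((vectors[cand] - query) ** 2).sum(-1)
    return cand[np.argsort(dist)[:topk]]

print(search(vectors[42]))  # the query vector itself ranks first
```

Raising `probe` trades extra cluster fetches for recall, which is the knob behind the `--probe 32` setting in the benchmark command above.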
Four end-to-end runs land in `examples/`, all against a 4 MB markdown corpus on a local S3-compatible bucket (RustFS):

| Example | Language | Surface | Live transcript |
|---|---|---|---|
| Claude Agent SDK | Python | MCP stdio | transcript |
| DeepAgents | Python | REST | transcript |
| `@agentic-search/sdk` | TypeScript | REST | transcript |
| `agenticsearch` Go SDK | Go | REST | transcript |
Every run signs SigV4 requests against `http://localhost:19000` (swap the AWS env for real AWS keys and the same binary talks to real S3). Full setup + security model in `examples/README.md`.
v0.1.0 — first public release.
| Surface | Status |
|---|---|
| MCP stdio server | ✅ shipped, schema-locked |
| REST server (`agentic-search serve`) | ✅ shipped, loopback default |
| Grep + AST + RRF planner | ✅ shipped, beats native `rg` |
| Centroid vector index (`as-vec`) | ✅ shipped, matches Qdrant on SIFT-1M |
| Tiered cache (mem → NVMe → S3) | ✅ shipped, touch-on-hit |
| Content-hash drift detection | ✅ shipped |
| Python / Node / Go / CLI SDKs | ✅ shipped, publish-ready |
| Multi-arch GHCR Docker image | ✅ shipped on release |
| Streaming kmeans for 10M+ docs | 🛠 in progress (docs/PLAN.md) |
| Sharded manifest + Bloom filters | 🛠 in progress |
| Int8 / PQ quantised cluster records | 🛠 in progress |
| Mode-aware planner (code vs. documents) | 🛠 in progress |
| Cross-encoder reranker stage | 🛠 in progress |
See CHANGELOG.md for the release ledger and docs/PLAN.md for the M-doc-scale roadmap.
- Server binds `127.0.0.1` by default; `--allow-public` is required for non-loopback bind.
- Path-escape rejection on every `file://` read.
- API keys read from environment only, never logged.
- CI gate: `cargo-deny` (advisories / bans / licenses / sources) + `gitleaks` + `pip-audit` (5 Python adapters) + `pnpm audit` + `govulncheck`.
- MCP transport follows JSON-RPC 2.0, protocol version `2025-11-25`.
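The path-escape rejection can be illustrated with a short sketch. `safe_read` below is a hypothetical Python stand-in for the check, not the `as-fs` implementation: resolve symlinks and `..` first, then require the target to stay under the served root:

```python
from pathlib import Path

def safe_read(root: str, relpath: str) -> bytes:
    """Reject any file:// read that escapes the served root.
    Hypothetical sketch of the idea, not the as-fs code."""
    root_p = Path(root).resolve()
    # Resolve AFTER joining, so '../' and symlinks cannot dodge the check.
    target = (root_p / relpath).resolve()
    if target != root_p and root_p not in target.parents:
        raise PermissionError(f"path escapes root: {relpath}")
    return target.read_bytes()
```

Checking the resolved path (rather than the raw string) is what closes the symlink and `..`-normalization loopholes.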
Report security issues privately via GitHub Security Advisories. Full disclosure policy in CONTRIBUTING.md.
```sh
cargo test --workspace --locked
cargo clippy --workspace --all-targets -- -D warnings
python bench/macro/run.py --runs 5 --server
./scripts/rustfs-up.sh   # local S3 (RustFS) for cold/warm S3 testing
```

CI mirrors these exactly. PRs that change a hot path must include a bench delta. Full contributor guide in CONTRIBUTING.md.
Apache-2.0. See LICENSE.

