claw-vector is the semantic memory engine behind ClawDB. It combines SQLite-backed metadata and FTS5 keyword search, memory-mapped vector persistence, adaptive Flat and HNSW indexing, hybrid retrieval, metadata filtering, reranking, and a Python embedding microservice exposed over gRPC and HTTP.
- Persistent vector storage with SQLite metadata and mmap-backed raw vectors.
- Automatic index selection: Flat for small collections, HNSW once collections grow.
- ANN, filtered, and hybrid vector plus keyword retrieval.
- Search responses that include both ranked hits and execution metrics.
- A Python embedding service built on sentence-transformers with Prometheus metrics and health probes.
- Rust library and gRPC server entrypoints for application integration.
```
                           +----------------------------------+
                           |     Python embedding service     |
Text ingest and queries -->|   sentence-transformers / ONNX   |
                           |   gRPC :50051      HTTP :8080    |
                           +----------------+-----------------+
                                            |
                                            v
               +------------------------+     +------------------------------+
               |      VectorEngine      |---->|       CollectionManager      |
               |   Rust API and gRPC    |     |   lifecycle, persistence,    |
               |  ANN and hybrid search |     |   index selection, restore   |
               +-----------+------------+     +-----------+------------------+
                           |                              |
                           v                              v
              +--------------------------+    +--------------------------+
              | ANN and rerank pipeline  |    |  SQLite metadata store   |
              |       Flat or HNSW       |    |  collections, records,   |
              |  filters, hybrid fusion  |    |    UUID mapping, FTS5    |
              +-------------+------------+    +-------------+------------+
                            |                               |
                            v                               v
                   +------------------+           +---------------------+
                   | mmap vector file |           |  persisted indexes  |
                   | raw f32 payloads |           | Flat and HNSW state |
                   +------------------+           +---------------------+
```
Collection definitions, text payloads, metadata, UUID mappings, and keyword-search state live in SQLite. The database is opened in WAL mode and schema migrations are applied automatically on startup.
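A minimal sketch of that open-in-WAL step, assuming a rusqlite-style connection (the actual store layer may use a different SQLite crate, and it also applies its embedded migrations at this point):

```rust
use rusqlite::Connection;

// Sketch only: open the metadata database and enable WAL mode so
// readers can proceed while a single writer appends.
fn open_metadata_db(path: &str) -> rusqlite::Result<Connection> {
    let conn = Connection::open(path)?;
    conn.pragma_update(None, "journal_mode", "WAL")?;
    Ok(conn)
}
```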
Raw vectors are stored separately in fixed-slot memory-mapped files. That keeps metadata queries cheap while letting the engine hydrate vectors only when a request actually asks for them or when reranking needs them.
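The fixed-slot layout keeps vector lookup to plain offset arithmetic. A minimal sketch with hypothetical names, assuming little-endian f32 payloads:

```rust
// Sketch of fixed-slot addressing: record i's f32 payload starts at
// byte offset i * dimensions * 4 in the mapped file.
fn slot_offset(record_index: usize, dimensions: usize) -> usize {
    record_index * dimensions * std::mem::size_of::<f32>()
}

// Decode one vector from the raw mapped bytes (assumes little-endian f32s).
fn read_vector(bytes: &[u8], record_index: usize, dimensions: usize) -> Vec<f32> {
    let start = slot_offset(record_index, dimensions);
    let end = start + dimensions * std::mem::size_of::<f32>();
    bytes[start..end]
        .chunks_exact(4)
        .map(|b| f32::from_le_bytes([b[0], b[1], b[2], b[3]]))
        .collect()
}
```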
Each collection starts on a Flat index and automatically switches to HNSW around 1,000 vectors. The collection manager persists the chosen index type so reopen and restore follow the same search path that was active before shutdown.
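A hypothetical sketch of that threshold rule; the exact cutoff and type names live in the collection manager:

```rust
// Sketch of size-based index selection: small collections brute-force
// scan a Flat index; larger ones pay HNSW's build cost for sublinear search.
const HNSW_THRESHOLD: usize = 1_000; // approximate, per the description above

enum IndexKind {
    Flat,
    Hnsw,
}

fn select_index(vector_count: usize) -> IndexKind {
    if vector_count < HNSW_THRESHOLD {
        IndexKind::Flat
    } else {
        IndexKind::Hnsw
    }
}
```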
The Rust API returns a `SearchResponse`:

```rust
pub struct SearchResponse {
    pub results: Vec<SearchResult>,
    pub metrics: SearchMetrics,
}
```

`SearchMetrics` includes query dimensionality, candidate counts, post-filter counts, and end-to-end latency in microseconds.
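A plausible shape for those metrics based on that description; only `latency_us` is confirmed by the usage example later in this document, and the other field names are assumptions:

```rust
// Hypothetical sketch of SearchMetrics; only `latency_us` is confirmed
// elsewhere in this README.
pub struct SearchMetrics {
    pub query_dimensions: usize,      // dimensionality of the query vector
    pub candidates_considered: usize, // ANN candidates before filtering
    pub results_after_filter: usize,  // hits that survive metadata filters
    pub latency_us: u64,              // end-to-end latency in microseconds
}
```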
Metadata filters support `eq`, `gt`, `lt`, `contains`, `in`, `exists`, `and`, `or`, and `not`.
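Operators compose into nested filter trees. A sketch reusing the `MetadataFilter::Eq` shape from the Rust example below; the `And` and `Gt` variant shapes are assumptions:

```rust
use claw_vector::MetadataFilter;
use serde_json::json;

// Hypothetical filter: tenant == "acme" AND priority > 2.
// Only the Eq variant's shape is confirmed elsewhere in this README;
// And and Gt are assumed to follow the same pattern.
fn tenant_priority_filter() -> MetadataFilter {
    MetadataFilter::And(vec![
        MetadataFilter::Eq {
            key: "tenant".into(),
            value: json!("acme"),
        },
        MetadataFilter::Gt {
            key: "priority".into(),
            value: json!(2),
        },
    ])
}
```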
Hybrid retrieval combines ANN candidates with SQLite FTS5 keyword hits. The `alpha` field on `HybridQuery` controls the blend between vector similarity and keyword relevance, where 1.0 is vector-only and 0.0 is keyword-only.
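Written out, that blend is a convex combination; a minimal sketch, assuming both scores are normalized to comparable ranges before fusion:

```rust
// alpha = 1.0 keeps only vector similarity; alpha = 0.0 keeps only
// keyword relevance; values in between mix the two linearly.
fn fuse_scores(alpha: f32, vector_score: f32, keyword_score: f32) -> f32 {
    alpha * vector_score + (1.0 - alpha) * keyword_score
}
```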
Available rerankers:

- `diversity` for MMR-style result diversification (sketched after this list)
- `recency` for boosting newer records
- `composite` for chaining reranking passes
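For the `diversity` reranker, MMR-style selection scores each remaining candidate by relevance minus its similarity to results already picked. An illustrative objective; the lambda weight and names here are not the crate's API:

```rust
// MMR-style objective: high lambda favors relevance, low lambda favors
// novelty relative to already-selected results.
fn mmr_score(lambda: f32, relevance: f32, max_sim_to_selected: f32) -> f32 {
    lambda * relevance - (1.0 - lambda) * max_sim_to_selected
}
```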
With Docker Compose:

```sh
docker compose up --build claw-vector-svc
```

Or run it locally:

```sh
cd python
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev,inference]"
claw-vector-svc
```

The service exposes gRPC on `127.0.0.1:50051` and HTTP on `127.0.0.1:8080` by default.
Example library usage from Rust:

```rust
use claw_vector::{
    DistanceMetric, MetadataFilter, SearchQuery, VectorConfig, VectorEngine,
};
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = VectorConfig::builder()
        .db_path("data/claw_vector.db")
        .index_dir("data/claw_vector_indices")
        .embedding_service_url("http://127.0.0.1:50051")
        .build()?;
    let engine = VectorEngine::new(config).await?;

    engine
        .create_collection("docs", 384, DistanceMetric::Cosine)
        .await?;
    engine
        .upsert(
            "docs",
            "claw-vector persists vector metadata in SQLite",
            json!({"tenant": "acme", "topic": "storage", "priority": 3}),
        )
        .await?;

    let response = engine
        .search(SearchQuery {
            collection: "docs".into(),
            vector: vec![0.12; 384],
            top_k: 5,
            filter: Some(MetadataFilter::Eq {
                key: "tenant".into(),
                value: json!("acme"),
            }),
            include_vectors: false,
            include_metadata: true,
            ef_search: Some(64),
            reranker: None,
        })
        .await?;

    println!("results={}", response.results.len());
    println!("latency_us={}", response.metrics.latency_us);

    engine.close().await?;
    Ok(())
}
```

Run the standalone Rust gRPC server with:

```sh
cargo run --bin claw-vector-server
```

The server listens on `0.0.0.0:50051` by default. Override that with `CLAW_GRPC_ADDR`.
The Python service loads the embedding model during FastAPI lifespan startup, warms it up, then starts the async gRPC server used by the Rust engine.
HTTP endpoints:

- `GET /health`
- `GET /ready`
- `POST /embed`
- `POST /batch-embed`
- `GET /model-info`
- `GET /metrics`

Authentication:

- HTTP requests accept `X-Claw-Api-Key`.
- gRPC requests accept `x-claw-api-key` (Rust server) or `authorization: Bearer <key>` (Python embedding service); see the sketch below.
- When keys are configured, unauthorized requests return `401` (HTTP) or `Unauthenticated` (gRPC).
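For gRPC clients of the Rust server, the key travels as request metadata. A minimal sketch using tonic; the helper name is hypothetical, and only the `x-claw-api-key` metadata key is taken from the list above:

```rust
use tonic::metadata::MetadataValue;
use tonic::Request;

// Wrap any request message with the API key the Rust gRPC server expects.
fn with_api_key<T>(message: T, api_key: &str) -> Request<T> {
    let mut request = Request::new(message);
    request.metadata_mut().insert(
        "x-claw-api-key",
        MetadataValue::try_from(api_key).expect("API key must be ASCII"),
    );
    request
}
```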
An example embedding request:

```sh
curl http://127.0.0.1:8080/embed \
  -H 'content-type: application/json' \
  -d '{"texts":["semantic memory for clawdb"],"normalize":true}'
```

Example response shape:
```json
{
  "vectors": [
    {
      "values": [0.01, -0.03, 0.42],
      "dimensions": 384
    }
  ],
  "model_name": "sentence-transformers/all-MiniLM-L6-v2",
  "latency_ms": 12
}
```

`VectorConfig::builder()` controls all runtime settings. The path and embedding-endpoint settings can also be supplied through environment variables.
| Setting | Default | Source | Description |
|---|---|---|---|
| `db_path` | `claw_vector.db` | builder, `CLAW_VECTOR_DB_PATH` | SQLite database file for collections, metadata, and FTS5 state. |
| `index_dir` | `claw_vector_indices` | builder, `CLAW_VECTOR_INDEX_DIR` | Directory for persisted index files and mmap vector files. |
| `embedding_service_url` | `http://localhost:50051` | builder, `CLAW_EMBEDDING_URL` | gRPC endpoint for the Python embedding service. |
| `default_dimensions` | `384` | builder | Default embedding dimensionality. |
| `ef_construction` | `200` | builder | HNSW build-time recall and speed tradeoff. |
| `m_connections` | `16` | builder | HNSW graph degree. |
| `ef_search` | `50` | builder | Default HNSW search breadth. |
| `max_elements` | `1_000_000` | builder | Maximum vectors per collection index. |
| `cache_size` | `10_000` | builder | Rust-side embedding LRU cache capacity. |
| `batch_size` | `64` | builder | Max texts per embedding request. |
| `embedding_timeout_ms` | `5000` | builder | gRPC timeout for embedding calls. |
| `num_threads` | available parallelism | builder | Rayon worker count for index operations. |
| `default_workspace_id` | `default` | builder, `CLAW_DEFAULT_WORKSPACE_ID` | Default tenant/workspace id used when a request does not provide one. |
| `api_key_store_path` | `claw_vector_auth.db` | builder, `CLAW_API_KEY_STORE_PATH` | SQLite database used by the Rust gRPC auth key store. |
| `rate_limit_rps` | `100` | builder, `CLAW_RATE_LIMIT_RPS` | Default per-workspace request rate limit for the Rust gRPC APIs. |
| `require_auth` | `true` in release, `false` in tests | builder, `CLAW_REQUIRE_AUTH` | Enable or disable auth checks (set to `false` only for local development and testing). |
| `CLAW_GRPC_ADDR` | `0.0.0.0:50051` | env | Listen address for the Rust gRPC server binary. |
| Environment variable | Default | Description |
|---|---|---|
| `MODEL_NAME` | `sentence-transformers/all-MiniLM-L6-v2` | Hugging Face model to load. |
| `DEVICE` | `cpu` | Runtime device such as `cpu`, `cuda`, or another supported backend. |
| `GRPC_HOST` | `0.0.0.0` | gRPC bind host. |
| `GRPC_PORT` | `50051` | gRPC bind port. |
| `HTTP_HOST` | `0.0.0.0` | HTTP bind host. |
| `HTTP_PORT` | `8080` | HTTP bind port. |
| `MAX_BATCH_SIZE` | `64` | Max texts accepted in a single embedding request. |
| `CACHE_SIZE` | `10000` | In-process embedding cache capacity. |
| `NORMALIZE_EMBEDDINGS` | `true` | Whether embeddings are normalized by default. |
| `ONNX_MODEL_PATH` | unset | Optional ONNX model path for alternate inference. |
| `MAX_SEQUENCE_LENGTH` | `256` | Max token length exposed in model metadata responses. |
| `CLAW_API_KEY` | unset | Single API key accepted by Python HTTP/gRPC endpoints. |
| `CLAW_API_KEYS` | unset | Comma-separated list of API keys accepted by Python HTTP/gRPC endpoints. |
| `EMBED_RATE_LIMIT_PER_MINUTE` | `200` | Per-API-key HTTP request limit for `/embed`. |
Rust:

```sh
cargo test --lib --tests
cargo check --benches
```

Python:

```sh
cd python
pytest tests/
```

The engine test suite covers collection lifecycle, persistence across reopen, hybrid search, metadata filters, embedding cache behavior, and Flat-to-HNSW migration.
Criterion benchmarks live in `benches/vector_bench.rs` and cover:
- single-vector upsert
- batch upsert of 100 records
- Flat search at 500 vectors
- HNSW search at 1,000 and 100,000 vectors
- filtered ANN search at 10,000 vectors
- embedding cache hit latency
- hybrid search at 1,000 text-backed records
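For orientation, a minimal Criterion harness in the style these benchmarks use; the measured body here is a stand-in dot product, not the engine's real search path:

```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};

// Benchmark a 384-dimensional dot product as a placeholder workload.
fn bench_dot_product(c: &mut Criterion) {
    let a = vec![0.5f32; 384];
    let b = vec![0.25f32; 384];
    c.bench_function("dot_product_384", |bencher| {
        bencher.iter(|| {
            black_box(&a)
                .iter()
                .zip(black_box(&b))
                .map(|(x, y)| x * y)
                .sum::<f32>()
        })
    });
}

criterion_group!(benches, bench_dot_product);
criterion_main!(benches);
```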
Operational targets for the current implementation:
- HNSW search p99 under 10 ms at 100k vectors
- cache-hit embedding lookups under 1 µs
- embedding throughput above 2,000 texts per minute for the default MiniLM model on suitable hardware
Use Criterion directly when you want full benchmark runs:

```sh
cargo bench --bench vector_bench
```

- Rust protobuf generation is handled in `build.rs` with vendored `protoc`, so a local protobuf installation is not required.
- SQLite migrations are embedded and applied automatically by the Rust store layer.
- `docker-compose.yml` provides a ready-to-run embedding service and an optional Rust dev container profile.