LithicDB is a disk-first vector database MVP built around a hybrid cluster-graph plus quantized payload design. It targets retrieval teams that want lower memory pressure than fully in-memory HNSW systems while still supporting online writes, deletes, metadata filtering, and benchmarkable ANN search on a single node.
The current build also includes several product-hardening features beyond the initial MVP:
- Per-collection WAL replay for crash recovery between snapshots
- Online compaction to reclaim deleted payload space and rebuild clusters
- Periodic background maintenance for tombstone-heavy collections
- Checksummed WAL frames and atomic snapshot rewrites
- Manifest-published segment generations for safer compaction cutovers
- Who it is for: teams building RAG, semantic search, or recommendation systems that need predictable single-node cost and do not want to keep the full vector payload in RAM.
- Why they would choose it: the in-memory footprint stays bounded because only routing centroids, metadata postings, and document handles live in memory while normalized vectors stay on disk.
- Technical edge: ANN search navigates a graph of coarse clusters, scans only a small set of candidate blocks, scores candidates using `q8`-compressed vectors, then reranks the finalists with exact cosine from disk-resident `f32` vectors.
- Tradeoffs: peak recall can trail mature all-memory graph indexes, and append-heavy workloads will eventually need background compaction and smarter rebalancing to maintain ideal latency.
LithicDB commits to direction C: hybrid graph + quantization with lower memory usage than standard HNSW.
The key design choice is to separate routing from payload:
- Routing layer: a compact graph over cluster centroids kept in memory.
- Payload layer: normalized vectors stored on disk in two append-only files:
  - `vectors.f32` for exact reranking and brute-force benchmarking
  - `vectors.q8` for low-memory approximate scoring
- Filtering layer: in-memory metadata posting lists for exact `key=value` constraints plus a numeric metadata index for range predicates.
This gives a useful MVP behavior envelope:
- Disk-first and low-memory
- Online inserts and deletes
- Metadata-aware ANN search
- Brute-force cosine comparison from the same collection state
LithicDB now has a GitHub release workflow that builds installable binaries for Linux, macOS, and Windows whenever you push a semantic version tag.
Standard release path:
git tag v0.1.0
git push origin v0.1.0

Manual GitHub CLI path:

gh release create v0.1.0 --generate-notes

The workflow publishes packaged artifacts containing:
- the `lithicdb` server binary
- the `benchmark` binary
- `README.md`
You can also rerun the workflow manually from GitHub Actions with workflow_dispatch and pass an existing tag.
Repo structure:
- `src/main.rs`: server entrypoint
- `src/api/routes.rs`: REST routes and handlers
- `src/engine/db.rs`: multi-collection database orchestration
- `src/engine/collection.rs`: collection lifecycle, insert/delete/search, persistence
- `src/index/graph.rs`: centroid graph traversal
- `src/index/quantizer.rs`: normalization and `q8` quantization math
- `src/storage/files.rs`: append/read primitives and snapshot persistence
- `src/models/`: API and persisted data models
- `src/bin/benchmark.rs`: ANN vs brute-force benchmark runner
- `tests/engine_tests.rs`: integration-style engine test
Collection layout on disk:
- `data/<collection>/CURRENT`: active segment manifest pointing to the current state/vector files
- `data/<collection>/GENERATIONS`: catalog of active and retired generations
- `data/<collection>/wal.bin`: write-ahead log of inserts and deletes since the last snapshot
- `data/<collection>/state-<generation>.bin`: serialized snapshot for an active generation
- `data/<collection>/vectors-<generation>.f32`: active generation exact vectors
- `data/<collection>/vectors-<generation>.q8`: active generation quantized vectors
- `data/<collection>/state.bin`, `vectors.f32`, `vectors.q8`: legacy base-generation names used for the initial generation
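Compaction cutovers hinge on the `CURRENT` manifest: a new generation only becomes visible once the manifest is rewritten in one atomic step. A minimal sketch of that publish pattern, assuming a hypothetical `publish_manifest` helper and a simple key=value manifest format (the real manifest layout may differ):

```rust
use std::fs::{self, File};
use std::io::Write;
use std::path::Path;

/// Hypothetical helper: publish a new generation by writing the manifest to a
/// temporary file, flushing it to disk, and atomically renaming it over CURRENT.
/// On crash, readers see either the old manifest or the new one, never a partial file.
fn publish_manifest(collection_dir: &Path, generation: u64) -> std::io::Result<()> {
    let tmp_path = collection_dir.join("CURRENT.tmp");
    let final_path = collection_dir.join("CURRENT");

    let mut tmp = File::create(&tmp_path)?;
    writeln!(tmp, "generation={generation}")?;
    writeln!(tmp, "state=state-{generation}.bin")?;
    writeln!(tmp, "vectors_f32=vectors-{generation}.f32")?;
    writeln!(tmp, "vectors_q8=vectors-{generation}.q8")?;
    tmp.sync_all()?; // make the bytes durable before the rename

    fs::rename(&tmp_path, &final_path) // atomic publish on the same filesystem
}
```

The rename is the commit point; retired generations can then be recorded in `GENERATIONS` and cleaned up afterwards.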
Core data structures:
- `DocumentRecord`: external id, metadata, append offsets, deletion flag
- `ClusterNode`: centroid vector, member doc ids, neighbor clusters
- `filter_index`: `key -> value -> set(doc_id)` for metadata filtering
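The shapes of these structures matter more than their exact fields. A rough Rust sketch, with illustrative field names rather than the actual definitions in `src/models/`:

```rust
use std::collections::{HashMap, HashSet};

// Illustrative shapes only; the real structs live in src/models/ and
// src/engine/collection.rs and may differ in names and types.
struct DocumentRecord {
    external_id: String,
    metadata: HashMap<String, String>,
    f32_offset: u64, // byte offset of the exact vector in vectors.f32
    q8_offset: u64,  // byte offset of the quantized vector in vectors.q8
    deleted: bool,   // logical tombstone until compaction
}

struct ClusterNode {
    centroid: Vec<f32>,  // normalized centroid used for routing
    members: Vec<u32>,   // internal doc ids assigned to this cluster
    neighbors: Vec<u32>, // neighbor cluster ids in the routing graph
}

// filter_index: key -> value -> set(doc_id), consulted before scoring.
type FilterIndex = HashMap<String, HashMap<String, HashSet<u32>>>;
```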
LithicDB uses a three-stage search pipeline:
- Normalize the query to unit length.
- Traverse the cluster graph from an entry centroid using greedy best-first expansion to select the most promising clusters.
- Score members of those clusters with quantized cosine using the `q8` payload, then rerank the best candidates against exact normalized `f32` vectors from disk.
Why this works:
- The graph narrows the search space without storing all vectors in memory.
- Quantized candidate scoring reduces disk bandwidth and CPU cost for the first pass.
- Exact reranking recovers quality on the final shortlist.
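A condensed sketch of the two-pass scoring idea, assuming unit-normalized vectors (so cosine reduces to a dot product) and a `q8` format that stores `i8` components with a per-vector scale; function names are illustrative, not the engine's actual API:

```rust
/// Cheap first-pass score against the q8 payload.
fn approx_score(query: &[f32], q8: &[i8], scale: f32) -> f32 {
    query.iter().zip(q8).map(|(q, &c)| q * (c as f32) * scale).sum()
}

/// Exact dot product against the disk-resident f32 payload.
fn exact_score(query: &[f32], exact: &[f32]) -> f32 {
    query.iter().zip(exact).map(|(a, b)| a * b).sum()
}

/// First pass produced (doc_id, approx score) pairs for candidate cluster members;
/// the second pass reranks only the top `rerank` of them with exact f32 reads.
fn rerank_top(
    query: &[f32],
    candidates: &mut Vec<(u32, f32)>,
    rerank: usize,
    load_exact: impl Fn(u32) -> Vec<f32>, // reads the f32 payload from disk
) -> Vec<(u32, f32)> {
    candidates.sort_by(|a, b| b.1.total_cmp(&a.1));
    candidates
        .iter()
        .take(rerank)
        .map(|&(id, _)| (id, exact_score(query, &load_exact(id))))
        .collect()
}
```

The size of the rerank shortlist is the main lever: a larger shortlist costs more disk reads but recovers more of the recall lost to quantization.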
Update path:
- Append a logical insert to `wal.bin`.
- Append the normalized `f32` vector to `vectors.f32`.
- Append the `q8` vector to `vectors.q8`.
- Add metadata terms to posting lists.
- Add numeric metadata values to the range index when parsable.
- Assign the vector to the nearest cluster.
- Split an oversized cluster with a lightweight two-seed k-means style partition.
- Rewire graph neighbors for affected clusters.
- Periodically snapshot state, truncate the WAL, and roll over to a fresh write generation.
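The oversized-cluster split in the write path above can be illustrated as a single two-seed pass: pick two far-apart members, then assign every member to the nearer seed. A sketch under those assumptions, not the code in `src/engine/collection.rs`:

```rust
/// Illustrative two-seed split. Assumes unit-length vectors (so similarity is a
/// dot product) and a cluster with at least two members. The real implementation
/// also recomputes centroids and rewires graph neighbors afterwards.
fn split_two_seed(members: &[(u32, Vec<f32>)]) -> (Vec<u32>, Vec<u32>) {
    fn dot(a: &[f32], b: &[f32]) -> f32 {
        a.iter().zip(b).map(|(x, y)| x * y).sum()
    }

    // Seed A: the first member. Seed B: the member least similar to A.
    let seed_a = &members[0].1;
    let seed_b = &members
        .iter()
        .min_by(|x, y| dot(seed_a, &x.1).total_cmp(&dot(seed_a, &y.1)))
        .unwrap()
        .1;

    let (mut left, mut right) = (Vec::new(), Vec::new());
    for (id, v) in members {
        if dot(seed_a, v) >= dot(seed_b, v) {
            left.push(*id);
        } else {
            right.push(*id);
        }
    }
    (left, right)
}
```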
Delete path:
- Append a logical delete to `wal.bin`.
- Mark the document deleted in memory.
- Remove metadata postings.
- Remove the doc id from any cluster membership list.
- Recompute centroids for affected clusters and drop empty clusters.
- Snapshot and truncate the WAL on the configured interval.
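The cluster upkeep on the delete path amounts to recomputing centroids from whatever members remain and dropping the cluster when nothing is left. A minimal sketch with hypothetical signatures; the engine reads member vectors from the `f32` payload rather than taking them as a parameter:

```rust
/// Recompute a centroid after a member was removed. Returns None for an empty
/// cluster, which the caller then drops from the routing graph.
fn recompute_centroid(members: &[Vec<f32>], dimension: usize) -> Option<Vec<f32>> {
    if members.is_empty() {
        return None; // empty clusters are dropped entirely
    }
    let mut centroid = vec![0.0_f32; dimension];
    for v in members {
        for (c, x) in centroid.iter_mut().zip(v) {
            *c += *x;
        }
    }
    // Renormalize so the centroid stays comparable under dot-product routing.
    let norm = centroid.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        centroid.iter_mut().for_each(|x| *x /= norm);
    }
    Some(centroid)
}
```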
Recovery path:
- Load `state.bin`.
- Replay `wal.bin` entries in order.
- Persist a fresh snapshot.
- Truncate `wal.bin`.
Integrity guards:
- Each WAL frame stores `length + checksum + payload`
- Replay aborts if a checksum does not match
- Snapshot writes go to a temporary file and are then atomically renamed into place
- Compaction writes a new generation and publishes it by atomically rewriting `CURRENT`
- Retired generations are tracked in `GENERATIONS` and cleaned up after publish
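A sketch of how a frame reader with these guards might look. The field widths and checksum algorithm (FNV-1a here) are assumptions for illustration; the actual on-disk frame format may differ:

```rust
use std::io::{self, Read};

/// Illustrative checksum only; the engine's algorithm may differ.
fn fnv1a(bytes: &[u8]) -> u32 {
    bytes
        .iter()
        .fold(0x811c9dc5u32, |h, b| (h ^ *b as u32).wrapping_mul(0x0100_0193))
}

/// Read one `length + checksum + payload` frame. Returns Ok(None) on a clean end
/// of file; returns an error (and replay aborts) if the checksum does not match.
fn read_frame(r: &mut impl Read) -> io::Result<Option<Vec<u8>>> {
    let mut header = [0u8; 8];
    if let Err(e) = r.read_exact(&mut header) {
        return if e.kind() == io::ErrorKind::UnexpectedEof { Ok(None) } else { Err(e) };
    }
    let len = u32::from_le_bytes(header[0..4].try_into().unwrap()) as usize;
    let expected = u32::from_le_bytes(header[4..8].try_into().unwrap());

    let mut payload = vec![0u8; len];
    r.read_exact(&mut payload)?;

    if fnv1a(&payload) != expected {
        // Do not apply a torn or corrupt frame.
        return Err(io::Error::new(io::ErrorKind::InvalidData, "WAL checksum mismatch"));
    }
    Ok(Some(payload))
}
```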
GET /healthz
POST /collections
{
"name": "docs",
"dimension": 128,
"max_cluster_size": 256,
"graph_degree": 8
}

POST /collections/docs/vectors
{
"records": [
{
"id": "doc-1",
"vector": [0.1, 0.2, 0.3],
"metadata": {
"category": "finance"
}
}
]
}

DELETE /collections/docs/vectors/doc-1
POST /collections/docs/search
{
"vector": [0.1, 0.2, 0.3],
"k": 5,
"filter": {
"category": "finance"
},
"entry_points": 4,
"ef_search": 24,
"probe_clusters": 12
}

Structured filter form with exact plus numeric range:
{
"vector": [0.1, 0.2, 0.3],
"k": 5,
"filter": {
"must": [
{ "op": "eq", "field": "category", "value": "finance" },
{ "op": "range", "field": "price", "gte": 20.0, "lte": 30.0 }
]
}
}

GET /collections/docs/vectors/doc-1
GET /collections/docs/stats
GET /collections/docs/diagnostics
POST /collections/docs/compact
POST /admin/collections/docs/backup
POST /admin/collections/restore
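The structured filter body shown above maps naturally onto a small tagged type. A hypothetical mirror of that payload, assuming `serde` is available; the real request models live in `src/models/` and may use different names:

```rust
use serde::Deserialize;

/// Hypothetical mirror of the structured filter request shown above.
#[derive(Deserialize)]
#[serde(tag = "op", rename_all = "lowercase")]
enum FilterClause {
    Eq { field: String, value: String },
    Range { field: String, gte: Option<f64>, lte: Option<f64> },
}

#[derive(Deserialize)]
struct StructuredFilter {
    must: Vec<FilterClause>, // every clause must hold for a doc to survive filtering
}
```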
Prerequisites:
- Rust toolchain with `cargo` and `rustc`
Build:
cargo build --release

Run the server:

cargo run --release -- --data-dir ./data --bind 127.0.0.1:8080 --maintenance-interval-secs 30

Require an API key for mutating and admin endpoints:

LITHICDB_API_KEY=secret \
cargo run --release -- --data-dir ./data --bind 127.0.0.1:8080

Disable background maintenance:

cargo run --release -- --data-dir ./data --bind 127.0.0.1:8080 --maintenance-interval-secs 0

Create a collection:
curl -X POST http://127.0.0.1:8080/collections \
-H 'content-type: application/json' \
-d '{
"name":"docs",
"dimension":3,
"max_cluster_size":128,
"graph_degree":8
}'

Insert vectors:
curl -X POST http://127.0.0.1:8080/collections/docs/vectors \
-H 'content-type: application/json' \
-d '{
"records":[
{"id":"a","vector":[1.0,0.0,0.0],"metadata":{"category":"finance"}},
{"id":"b","vector":[0.9,0.1,0.0],"metadata":{"category":"finance"}},
{"id":"c","vector":[0.0,1.0,0.0],"metadata":{"category":"legal"}}
]
}'

Search with filtering:
curl -X POST http://127.0.0.1:8080/collections/docs/search \
-H 'content-type: application/json' \
-d '{
"vector":[1.0,0.0,0.0],
"k":2,
"filter":{"category":"finance"},
"ef_search":24,
"probe_clusters":12
}'

Fetch by id:

curl http://127.0.0.1:8080/collections/docs/vectors/a

Delete by id:

curl -X DELETE http://127.0.0.1:8080/collections/docs/vectors/a

Read collection stats:

curl http://127.0.0.1:8080/collections/docs/stats

Read collection diagnostics:

curl http://127.0.0.1:8080/collections/docs/diagnostics

Run compaction:

curl -X POST http://127.0.0.1:8080/collections/docs/compact

Create a backup:
curl -X POST http://127.0.0.1:8080/admin/collections/docs/backup \
-H 'x-api-key: secret'

Restore a backup:
curl -X POST http://127.0.0.1:8080/admin/collections/restore \
-H 'content-type: application/json' \
-H 'x-api-key: secret' \
-d '{
"backup_name":"docs-1712345678",
"target_name":"docs-restore"
}'

Run tests:

cargo test

Run the benchmark at 100k vectors:
cargo run --release --bin benchmark -- \
--data-dir ./data/bench \
--vectors 100000 \
--dimension 128 \
--queries 200 \
--k 10

Benchmark output includes:
- ANN average latency
- brute-force average latency
- recall@k
- disk footprint
- simple memory estimate for routing structures
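Recall@k in this kind of benchmark treats the brute-force cosine results as ground truth and measures how many of those ids the ANN search also returned. A sketch of that computation, not the exact code in `src/bin/benchmark.rs`:

```rust
/// Fraction of the top-k brute-force ids that the ANN results also contain.
fn recall_at_k(ann_ids: &[&str], exact_ids: &[&str], k: usize) -> f64 {
    let truth: std::collections::HashSet<_> = exact_ids.iter().take(k).collect();
    let hits = ann_ids.iter().take(k).filter(|id| truth.contains(*id)).count();
    hits as f64 / k.min(exact_ids.len()).max(1) as f64
}
```

Averaging this value over all benchmark queries gives the reported recall@k figure.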
- Collection creation with fixed vector dimension
- Vector insert with id and metadata
- Disk persistence for vectors and index state
- WAL-backed crash recovery between snapshots
- Online delete
- Online compaction to reclaim deleted logical state
- Background compaction when tombstones cross a threshold
- Approximate cosine search with exact reranking
- Exact `key=value` metadata filtering
- Structured metadata filters with exact match and numeric range predicates
- Numeric metadata indexing for range-filter pruning
- Fetch by id
- Collection stats for observability
- Collection diagnostics for generation and maintenance visibility
- Generation rollover for incremental persistence
- Multi-entry ANN routing via `entry_points`
- Optional API-key protection for mutating/admin endpoints
- Local backup and restore
- Brute-force baseline over the same stored vectors
- 100k vector benchmark path
- Vectors are normalized on write so cosine similarity becomes a dot product.
- ANN quality depends on `max_cluster_size`, `graph_degree`, `ef_search`, and `probe_clusters`.
- Persistence uses snapshots plus a checksummed WAL; snapshot commits are temp-file writes followed by an atomic rename. This gives basic crash recovery, but a production system should still add segment versioning and stronger fsync policy controls.
- Deletes are logical until compaction rebuilds the payload files.
- Background maintenance compacts collections when deleted vectors exceed either 20% of docs or 64 tombstones.
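Written out, that trigger is a simple threshold check. The constants mirror the documented defaults; whether the comparison is strict and whether the ratio uses live or total documents are assumptions in this sketch:

```rust
/// Compact when deletions exceed either 20% of the document count or an
/// absolute 64-tombstone floor.
fn should_compact(total_docs: usize, tombstones: usize) -> bool {
    const RATIO: f64 = 0.20;
    const ABSOLUTE: usize = 64;
    tombstones > ABSOLUTE
        || (total_docs > 0 && tombstones as f64 / total_docs as f64 > RATIO)
}
```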
- Add versioned WAL segments with truncation-safe tail handling and recovery tooling.
- Move from full snapshots to segment-based incremental persistence.
- Add online compaction without full collection rewrite pauses.
- Introduce adaptive product quantization instead of scalar `q8`.
- Add concurrent ingest pipelines with per-collection write workers.
- Improve graph routing with multi-entry search and better split heuristics.
- Add richer metadata filtering with numeric ranges and boolean expressions.
- Add collection diagnostics, graph introspection, and tunable search profiles.
- Add gRPC and bulk import/export tools.
- Extend to replication and object-store backed cold segments.
- Numeric filtering is currently evaluated against stored metadata values at query time rather than through a dedicated numeric index.
- Snapshot persistence is periodic, but still collection-wide.
- Compaction still rewrites the whole collection and runs inline when triggered, but now publishes a new generation through a single active-manifest update and explicit generation catalog.
- No authentication, TLS, or replication.
- Benchmark memory is estimated from structure sizes rather than sampled from the OS.