A multi-model NoSQL database built from scratch in Rust.
Documents. Graphs. Full-Text Search. Vector Embeddings. One engine.
dllb is a multi-model NoSQL database management system that natively supports four data models in a single, unified engine:
- Documents -- schemaless or schemafull JSON/MessagePack records with secondary B-tree indexes
- Native Graphs -- first-class edges with properties, bidirectional traversal, BFS/DFS, path finding, and pattern matching
- Full-Text Search -- BM25-scored inverted indexes powered by Tantivy, with configurable analyzers and stemming
- Vector Embeddings -- HNSW approximate nearest neighbor index for dense vectors (cosine, L2, dot product); SIMD-accelerated distance computation is planned
All four models are stored as binary key-value pairs in a single sorted keyspace backed by redb (pure-Rust, ACID, crash-safe, copy-on-write B-trees). Different "models" are simply different key layouts and query patterns over the same byte stream.
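As an illustration of the vector model, the three metrics named above can be sketched in scalar Rust, together with the kind of brute-force nearest-neighbor scan an HNSW index is usually checked against. This is a minimal sketch, not dllb's implementation: the function names are hypothetical, there is no SIMD, and returning a cosine distance of 1.0 for zero vectors is an assumed convention.

```rust
/// Dot product of two equal-length vectors.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have equal dimension");
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Euclidean (L2) distance.
fn l2_distance(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have equal dimension");
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y) * (x - y))
        .sum::<f32>()
        .sqrt()
}

/// Cosine distance, defined here as 1 - cosine similarity.
/// Assumption: a zero vector is treated as maximally distant (1.0).
fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    let norm = (dot(a, a) * dot(b, b)).sqrt();
    if norm == 0.0 {
        return 1.0;
    }
    1.0 - dot(a, b) / norm
}

/// Exact k-nearest-neighbor search by linear scan (L2 distance).
/// Returns (index, distance) pairs, nearest first.
fn knn_brute_force(points: &[Vec<f32>], query: &[f32], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = points
        .iter()
        .enumerate()
        .map(|(i, p)| (i, l2_distance(p, query)))
        .collect();
    // total_cmp gives a total order on f32, so NaN cannot panic the sort.
    scored.sort_by(|a, b| a.1.total_cmp(&b.1));
    scored.truncate(k);
    scored
}
```

A linear scan like `knn_brute_force` costs one distance computation per stored vector, which is why an approximate index such as HNSW is used once the collection grows; the exact scan remains useful as a ground truth for measuring recall.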
Most real-world applications need more than one data model. A social network needs documents (profiles), graphs (relationships), full-text (search), and vectors (recommendations). The traditional answer is polyglot persistence -- stitching together MongoDB, Neo4j, Elasticsearch, and Pinecone. That means four connection pools, four consistency models, four failure modes, and ETL pipelines to keep them in sync.
dllb eliminates the seams. One query can combine a vector similarity search with a graph traversal and a full-text match:
SELECT id, name,
vector::distance::knn() AS vec_score,
search::score(1) AS ft_score
FROM ast_node
WHERE embedding <|20,50|> $query_vec
AND source_text @1@ 'async trait'
AND ->calls->fn_node.module = 'core'
ORDER BY (1.0 - vec_score) * 0.6 + ft_score * 0.4 DESC
LIMIT 10;
dllb is designed from the ground up as a first-class store for AST and MetaAST embeddings of source code. Each AST node (function, class, module, trait) is a document. Structural relationships (call graph, containment, imports, type references) are graph edges. Code embeddings (CodeBERT, StarCoder, etc.) are vectors. Source text and docstrings are full-text indexed with a code-aware tokenizer that understands camelCase/snake_case boundaries.
This enables queries like:
-- Find functions similar to this one, called by modules importing 'tokio'
SELECT id, name, file_path, vector::distance::knn() AS similarity
FROM ast_node
WHERE kind = 'function'
AND source_embedding <|20,100|> $my_fn_embedding
AND <-contains<-module<-imports<-module[WHERE name CONTAINS 'tokio']
ORDER BY similarity
LIMIT 10;
block-beta
columns 1
block:client["Client Layer"]
columns 3
tcp["TCP/WebSocket"] rest["REST API"] embed["Embedded API"]
end
block:query["Query Engine"]
columns 4
lexer["Lexer/Parser"] planner["Planner"] optimizer["Optimizer"] executor["Executor"]
end
block:model["Model Layer"]
columns 5
doc["Document"] graph["Graph"] fts["Full-Text"] vec["Vector/HNSW"] ast["AST/Code Intel"]
end
block:actors["Actor System (joerl)"]
columns 3
sup["Supervision Trees"] gen["GenServer"] links["Links & Monitors"]
end
txn["Transaction Manager (MVCC)"]
block:storage["Storage Engine"]
columns 4
redb["redb (CoW B-trees)"] wal["WAL"] mvcc["MVCC"] compact["Compaction"]
end
client --> query --> model --> actors --> txn --> storage
The runtime is structured as a joerl supervision tree -- an Erlang/OTP-inspired actor model providing automatic crash recovery, isolated failure domains, and structured concurrency:
graph TD
dllb_sup["dllb_sup<br/><i>OneForAll</i>"] --> storage_sup["storage_sup<br/><i>OneForOne</i>"]
dllb_sup --> index_sup["index_sup<br/><i>OneForOne</i>"]
dllb_sup --> client_sup["client_sup<br/><i>OneForOne</i>"]
storage_sup --> StorageWriter["StorageWriter<br/><i>GenServer</i>"]
index_sup --> FtsActor["FtsActor<br/><i>GenServer</i>"]
index_sup --> HnswActor["HnswActor<br/><i>GenServer</i>"]
index_sup --> GcActor["GcActor<br/><i>periodic</i>"]
client_sup --> ConnectionActor["ConnectionActor<br/><i>per client</i>"]
Actors manage stateful subsystems (storage writes, index maintenance, client connections, background GC). Hot-path operations (key encoding, distance computation, query parsing) remain direct function calls -- no mailbox overhead.
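The split between mailbox-driven actors and direct calls can be illustrated with the standard library alone. The sketch below does not use joerl and does not reflect its API; `WriterMsg`, `spawn_writer`, `put`, `get`, and `encode_key` are hypothetical names, and supervision and restarts are left out entirely. It only shows one thread owning mutable state behind a channel, while a pure helper is called directly.

```rust
use std::collections::BTreeMap;
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// Messages understood by the hypothetical writer actor.
enum WriterMsg {
    Put { key: Vec<u8>, value: Vec<u8>, reply: Sender<usize> },
    Get { key: Vec<u8>, reply: Sender<Option<Vec<u8>>> },
    Stop,
}

/// Spawns a thread that owns the map. Every mutation goes through its
/// mailbox, so writes are serialized without locks.
fn spawn_writer() -> (Sender<WriterMsg>, JoinHandle<()>) {
    let (tx, rx) = mpsc::channel::<WriterMsg>();
    let handle = thread::spawn(move || {
        let mut store: BTreeMap<Vec<u8>, Vec<u8>> = BTreeMap::new();
        for msg in rx {
            match msg {
                WriterMsg::Put { key, value, reply } => {
                    store.insert(key, value);
                    let _ = reply.send(store.len());
                }
                WriterMsg::Get { key, reply } => {
                    let _ = reply.send(store.get(&key).cloned());
                }
                WriterMsg::Stop => break,
            }
        }
    });
    (tx, handle)
}

/// Synchronous request/reply: send a Put and wait for the new entry count.
fn put(writer: &Sender<WriterMsg>, key: &[u8], value: &[u8]) -> usize {
    let (reply_tx, reply_rx) = mpsc::channel();
    writer
        .send(WriterMsg::Put { key: key.to_vec(), value: value.to_vec(), reply: reply_tx })
        .expect("writer actor has stopped");
    reply_rx.recv().expect("writer actor dropped the reply channel")
}

/// Synchronous request/reply: look up a key through the actor.
fn get(writer: &Sender<WriterMsg>, key: &[u8]) -> Option<Vec<u8>> {
    let (reply_tx, reply_rx) = mpsc::channel();
    writer
        .send(WriterMsg::Get { key: key.to_vec(), reply: reply_tx })
        .expect("writer actor has stopped");
    reply_rx.recv().expect("writer actor dropped the reply channel")
}

/// Hot path: a pure function called directly, with no message passing.
fn encode_key(table: &str, id: &str) -> Vec<u8> {
    let mut key = table.as_bytes().to_vec();
    key.push(b'*');
    key.extend_from_slice(id.as_bytes());
    key
}
```

A caller would build the key with `encode_key("user", "alice")` on its own thread and then hand it to `put`; only the state change pays the cost of a channel round trip.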
All data lives in a single sorted keyspace. Type tags in keys distinguish models:
| Tag | Byte | Purpose |
|---|---|---|
| `*` | 0x2A | Document record |
| `~` | 0x7E | Graph edge pointer |
| `+` | 0x2B | Index entry (B-tree, HNSW, full-text) |
| `!` | 0x21 | Metadata (schema, table definitions) |
Key structure:
[namespace][0x00][database][0x00][table][tag][record_id][...extra]
Graph traversals, document lookups, and index scans all reduce to the same primitive: prefix range scans over contiguous byte slices.
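The key layout and the prefix-scan primitive can be sketched as follows. This is an illustration under assumptions rather than dllb's actual encoder: a `BTreeMap` stands in for the redb-backed keyspace, record ids are treated as plain UTF-8 strings, and `table_prefix`, `record_key`, `prefix_end`, and `scan_prefix` are hypothetical names.

```rust
use std::collections::BTreeMap;

/// Type tags from the table above.
#[allow(dead_code)]
#[derive(Clone, Copy)]
enum Tag {
    Document = 0x2A, // '*'
    Edge = 0x7E,     // '~'
    Index = 0x2B,    // '+'
    Meta = 0x21,     // '!'
}

/// Builds [namespace][0x00][database][0x00][table][tag]: the prefix shared
/// by every key of one kind in one table.
fn table_prefix(ns: &str, db: &str, table: &str, tag: Tag) -> Vec<u8> {
    let mut key = Vec::with_capacity(ns.len() + db.len() + table.len() + 3);
    key.extend_from_slice(ns.as_bytes());
    key.push(0x00);
    key.extend_from_slice(db.as_bytes());
    key.push(0x00);
    key.extend_from_slice(table.as_bytes());
    key.push(tag as u8);
    key
}

/// Appends the record id to the table prefix (no extra components).
fn record_key(ns: &str, db: &str, table: &str, tag: Tag, record_id: &str) -> Vec<u8> {
    let mut key = table_prefix(ns, db, table, tag);
    key.extend_from_slice(record_id.as_bytes());
    key
}

/// Smallest byte string that sorts after every key starting with `prefix`.
/// Returns None when the prefix is empty or consists only of 0xFF bytes,
/// in which case the scan has no upper bound.
fn prefix_end(prefix: &[u8]) -> Option<Vec<u8>> {
    let mut end = prefix.to_vec();
    while let Some(last) = end.pop() {
        if last < 0xFF {
            end.push(last + 1);
            return Some(end);
        }
    }
    None
}

/// Range scan over all entries whose key starts with `prefix`, in key order.
fn scan_prefix<'a>(
    store: &'a BTreeMap<Vec<u8>, Vec<u8>>,
    prefix: &[u8],
) -> Vec<(&'a Vec<u8>, &'a Vec<u8>)> {
    match prefix_end(prefix) {
        Some(end) => store.range(prefix.to_vec()..end).collect(),
        None => store.range(prefix.to_vec()..).collect(),
    }
}
```

Under this layout, listing every document in a table is `scan_prefix(&store, &table_prefix(ns, db, "user", Tag::Document))`, and following the edges of one record would be the same call with `Tag::Edge` and the record id appended to the prefix.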
dllb/
Cargo.toml # Workspace manifest
PLAN.md # Detailed implementation plan
README.md # This file
crates/
core/ # RecordId, Value, Error, Schema, FieldType::Vector
storage/ # KvStore trait, redb backend, WAL, key encoding
transaction/ # MVCC, transaction manager, conflict detection
document/ # Document model, CRUD, secondary indexes
graph/ # Graph model, edge storage, traversal engine
search/ # Tantivy integration, full-text index management
vector/ # HNSW index, distance metrics, quantization
code-intel/ # AST/MetaAST schemas, code-aware tokenizer
query/ # Lexer, parser, planner, optimizer, executor
server/ # TCP/WebSocket server (binary)
cli/ # Interactive REPL (binary)
| Component | Choice | Rationale |
|---|---|---|
| Storage | redb | Pure Rust, ACID, MVCC, crash-safe, zero C deps |
| Full-text | Tantivy | Lucene-class, BM25, 15K+ stars, MIT |
| Actor system | joerl | Erlang/OTP supervision, GenServer, telemetry |
| Async runtime | Tokio | Industry standard |
| Serialization | MessagePack + JSON | Compact internal, readable external |
| Vector math | SIMD (planned) | Hardware-accelerated distance computation |
- Rust 1.89+ (edition 2024)
- Linux, macOS, or Windows
git clone <repo-url> dllb
cd dllb
cargo build --release
cargo run --release -p dllb-server
cargo run --release -p dllb-cli
cargo test --workspace
dllb uses a SQL-like declarative language inspired by SurrealQL:
-- Documents
CREATE user SET name = 'Alice', age = 30;
SELECT name, age FROM user WHERE age > 25;
UPDATE user:alice SET age = 31;
DELETE user:alice;
-- Graphs
RELATE user:alice->purchased->product:widget SET quantity = 2;
SELECT ->purchased->product.name FROM user:alice;
-- Full-text search
SELECT * FROM article WHERE content @@ 'distributed consensus';
-- Vector similarity (KNN)
SELECT id, vector::distance::knn() AS dist
FROM ast_node
WHERE embedding <|10,100|> $query_embedding
ORDER BY dist;
-- Cross-model: all four in one query
SELECT id, name,
vector::distance::knn() AS vec_score,
search::score(1) AS ft_score
FROM ast_node
WHERE kind = 'function'
AND embedding <|20,50|> $query_vec
AND source_text @1@ 'async fn'
AND ->calls->fn_node.module = 'core'
ORDER BY (1.0 - vec_score) * 0.6 + ft_score * 0.4 DESC
LIMIT 10;
- Project structure and workspace setup
- Core types (RecordId, Value, Schema, Error)
- KvStore trait and key encoding
- redb backend implementation
- Document CRUD with MessagePack serialization, schema validation, secondary indexes
- Graph edge storage and traversal (bidirectional, multi-hop walk, filtered)
- Tantivy full-text integration (BM25 scoring, language stemming, multi-index)
- HNSW vector index (distance metrics, brute-force baseline, in-memory HNSW with recall tests)
- AST/MetaAST code intelligence layer (38 MetaAST node types, code tokenizer, schemas, extraction)
- Query parser and executor (tokenizer, recursive-descent parser, direct executor)
- TCP server and CLI REPL (tokio TCP, line protocol, JSON responses, rustyline REPL)
- Reactivity (LIVE SELECT, events, changefeeds)
- Geo-spatial (R-tree, GeoJSON)
- Object-oriented concepts (inheritance, computed fields)
- Distributed clustering via joerl's EPMD-based node discovery
Contributions welcome. See PLAN.md for the detailed technical plan and current phase.
MIT
