GitHub - Oeditus/dllb: Multi-Model NoSQL DBMS in Rust

A multi-model NoSQL database built from scratch in Rust.
Documents. Graphs. Full-Text Search. Vector Embeddings. One engine.

Detailed Plan · Architecture · Storage Engine · Crate Reference · Getting Started · License

What is dllb?

dllb is a multi-model NoSQL database management system that natively supports four data models in a single, unified engine:

Documents -- schemaless or schemafull JSON/MessagePack records with secondary B-tree indexes
Native Graphs -- first-class edges with properties, bidirectional traversal, BFS/DFS, path finding, and pattern matching
Full-Text Search -- BM25-scored inverted indexes powered by Tantivy, with configurable analyzers and stemming
Vector Embeddings -- HNSW approximate nearest neighbor index for dense vectors (cosine, L2, dot product), with SIMD-accelerated distance computation

All four models are stored as binary key-value pairs in a single sorted keyspace backed by redb (pure-Rust, ACID, crash-safe, copy-on-write B-trees). Different "models" are simply different key layouts and query patterns over the same byte stream.

Why dllb?

Most real-world applications need more than one data model. A social network needs documents (profiles), graphs (relationships), full-text (search), and vectors (recommendations). The traditional answer is polyglot persistence -- stitching together MongoDB, Neo4j, Elasticsearch, and Pinecone. That means six connection pools, six consistency models, six failure modes, and ETL pipelines to keep them in sync.

dllb eliminates the seams. One query can combine a vector similarity search with a graph traversal and a full-text match:

SELECT id, name,
    vector::distance::knn() AS vec_score,
    search::score(1) AS ft_score
  FROM ast_node
  WHERE embedding <|20,50|> $query_vec
  AND source_text @1@ 'async trait'
  AND ->calls->fn_node.module = 'core'
  ORDER BY (1.0 - vec_score) * 0.6 + ft_score * 0.4 DESC
  LIMIT 10;

Code Intelligence First-Class Citizen

dllb is designed from the ground up as a first-class store for AST and MetaAST embeddings of source code. Each AST node (function, class, module, trait) is a document. Structural relationships (call graph, containment, imports, type references) are graph edges. Code embeddings (CodeBERT, StarCoder, etc.) are vectors. Source text and docstrings are full-text indexed with a code-aware tokenizer that understands camelCase/snake_case boundaries.

This enables queries like:

-- Find functions similar to this one, called by modules importing 'tokio'
SELECT id, name, file_path, vector::distance::knn() AS similarity
  FROM ast_node
  WHERE kind = 'function'
  AND source_embedding <|20,100|> $my_fn_embedding
  AND <-contains<-module<-imports<-module[WHERE name CONTAINS 'tokio']
  ORDER BY similarity
  LIMIT 10;

Architecture

block-beta
  columns 1
  block:client["Client Layer"]
    columns 3
    tcp["TCP/WebSocket"] rest["REST API"] embed["Embedded API"]
  end
  block:query["Query Engine"]
    columns 4
    lexer["Lexer/Parser"] planner["Planner"] optimizer["Optimizer"] executor["Executor"]
  end
  block:model["Model Layer"]
    columns 5
    doc["Document"] graph["Graph"] fts["Full-Text"] vec["Vector/HNSW"] ast["AST/Code Intel"]
  end
  block:actors["Actor System (joerl)"]
    columns 3
    sup["Supervision Trees"] gen["GenServer"] links["Links & Monitors"]
  end
  txn["Transaction Manager (MVCC)"]
  block:storage["Storage Engine"]
    columns 4
    redb["redb (CoW B-trees)"] wal["WAL"] mvcc["MVCC"] compact["Compaction"]
  end

  client --> query --> model --> actors --> txn --> storage

Actor-Based Fault Tolerance

The runtime is structured as a joerl supervision tree -- an Erlang/OTP-inspired actor model providing automatic crash recovery, isolated failure domains, and structured concurrency:

graph TD
  dllb_sup["dllb_sup<br/><i>OneForAll</i>"] --> storage_sup["storage_sup<br/><i>OneForOne</i>"]
  dllb_sup --> index_sup["index_sup<br/><i>OneForOne</i>"]
  dllb_sup --> client_sup["client_sup<br/><i>OneForOne</i>"]

  storage_sup --> StorageWriter["StorageWriter<br/><i>GenServer</i>"]

  index_sup --> FtsActor["FtsActor<br/><i>GenServer</i>"]
  index_sup --> HnswActor["HnswActor<br/><i>GenServer</i>"]
  index_sup --> GcActor["GcActor<br/><i>periodic</i>"]

  client_sup --> ConnectionActor["ConnectionActor<br/><i>per client</i>"]

Actors manage stateful subsystems (storage writes, index maintenance, client connections, background GC). Hot-path operations (key encoding, distance computation, query parsing) remain direct function calls -- no mailbox overhead.

Key Encoding

All data lives in a single sorted keyspace. Type tags in keys distinguish models:

Tag	Byte	Purpose
`*`	0x2A	Document record
`~`	0x7E	Graph edge pointer
`+`	0x2B	Index entry (B-tree, HNSW, full-text)
`!`	0x21	Metadata (schema, table definitions)

Key structure: [namespace][0x00][database][0x00][table][tag][record_id][...extra]

Graph traversals, document lookups, and index scans all reduce to the same primitive: prefix range scans over contiguous byte slices.

Project Structure

dllb/
  Cargo.toml              # Workspace manifest
  PLAN.md                 # Detailed implementation plan
  README.md               # This file
  crates/
    core/                 # RecordId, Value, Error, Schema, FieldType::Vector
    storage/              # KvStore trait, redb backend, WAL, key encoding
    transaction/          # MVCC, transaction manager, conflict detection
    document/             # Document model, CRUD, secondary indexes
    graph/                # Graph model, edge storage, traversal engine
    search/               # Tantivy integration, full-text index management
    vector/               # HNSW index, distance metrics, quantization
    code-intel/           # AST/MetaAST schemas, code-aware tokenizer
    query/                # Lexer, parser, planner, optimizer, executor
    server/               # TCP/WebSocket server (binary)
    cli/                  # Interactive REPL (binary)

Technology Stack

Component	Choice	Rationale
Storage	redb	Pure Rust, ACID, MVCC, crash-safe, zero C deps
Full-text	Tantivy	Lucene-class, BM25, 15K+ stars, MIT
Actor system	joerl	Erlang/OTP supervision, GenServer, telemetry
Async runtime	Tokio	Industry standard
Serialization	MessagePack + JSON	Compact internal, readable external
Vector math	SIMD (planned)	Hardware-accelerated distance computation

Getting Started

Prerequisites

Rust 1.89+ (edition 2024)
Linux, macOS, or Windows

Build

git clone <repo-url> dllb
cd dllb
cargo build --release

Run the Server

cargo run --release -p dllb-server

Run the CLI

cargo run --release -p dllb-cli

Run Tests

cargo test --workspace

Query Language

dllb uses a SQL-like declarative language inspired by SurrealQL:

-- Documents
CREATE user SET name = 'Alice', age = 30;
SELECT name, age FROM user WHERE age > 25;
UPDATE user:alice SET age = 31;
DELETE user:alice;

-- Graphs
RELATE user:alice->purchased->product:widget SET quantity = 2;
SELECT ->purchased->product.name FROM user:alice;

-- Full-text search
SELECT * FROM article WHERE content @@ 'distributed consensus';

-- Vector similarity (KNN)
SELECT id, vector::distance::knn() AS dist
  FROM ast_node
  WHERE embedding <|10,100|> $query_embedding
  ORDER BY dist;

-- Cross-model: all four in one query
SELECT id, name,
    vector::distance::knn() AS vec_score,
    search::score(1) AS ft_score
  FROM ast_node
  WHERE kind = 'function'
  AND embedding <|20,50|> $query_vec
  AND source_text @1@ 'async fn'
  AND ->calls->fn_node.module = 'core'
  ORDER BY vec_score * 0.6 + ft_score * 0.4 DESC
  LIMIT 10;

Roadmap

Prototype (In Progress)

Future

Reactivity (LIVE SELECT, events, changefeeds)
Geo-spatial (R-tree, GeoJSON)
Object-oriented concepts (inheritance, computed fields)
Distributed clustering via joerl's EPMD-based node discovery

Contributing

Contributions welcome. See PLAN.md for the detailed technical plan and current phase.

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
crates		crates
docs		docs
stuff		stuff
.dllb_history		.dllb_history
.gitignore		.gitignore
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
PLAN.md		PLAN.md
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

What is dllb?

Why dllb?

Code Intelligence First-Class Citizen

Architecture

Actor-Based Fault Tolerance

Key Encoding

Project Structure

Technology Stack

Getting Started

Prerequisites

Build

Run the Server

Run the CLI

Run Tests

Query Language

Roadmap

Prototype (In Progress)

Future

Contributing

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

What is dllb?

Why dllb?

Code Intelligence First-Class Citizen

Architecture

Actor-Based Fault Tolerance

Key Encoding

Project Structure

Technology Stack

Getting Started

Prerequisites

Build

Run the Server

Run the CLI

Run Tests

Query Language

Roadmap

Prototype (In Progress)

Future

Contributing

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages