OpenGraphDB v0.6.0

A single-binary embedded graph database in Rust — openCypher + vector + full-text + MCP + RDF. Apache-2.0, no JVM, no AGPL, one file. This is the first public release: the milestone where correctness, reachable authentication, and AI-native retrieval all became true over the wire, not just in the engine.

We lead with honesty. Everything below is backed by a reproducible test or a live transcript, and the "Honest limitations" section is as much a feature of this release as the highlights. Where a competitor would round up, we tell you exactly where the edges are.

Pre-1.0 notice. OpenGraphDB v0.6.0 is single-process and embedded. The API and on-disk format may still change, durability and concurrency are not yet hardened for untrusted multi-tenant network exposure, and several capabilities below are new in this release. Run it where that posture fits — local-first apps, AI agent memory, GraphRAG, a graph that ships inside your binary — and read the limitations before you put it on a shared network.

Why OpenGraphDB

It's the graph database shaped the way an AI actually works: understand knowledge and code, retrieve it efficiently, keep it secured and shareable under access control — all from one file you drop into a binary instead of a cluster you operate.

One Rust binary. No JVM, no sidecar, no server farm. The visual playground SPA ships inside the binary.
openCypher with documented dialect limits, plus RDF (Turtle/OWL) import/export round-trip.
AI-native retrieval: HNSW vectors + BM25 full-text + graph-walk, fused with Reciprocal Rank Fusion (RRF).
A built-in MCP server so AI agents (Claude Desktop, Cursor, the MCP SDKs) connect directly.
Apache-2.0. A graph database you can embed, fork, and ship without an AGPL or JVM tax — the Neo4j alternative for people who don't want a cluster.

Highlights

🔐 Authentication that is actually reachable and enforced

RBAC existed in the engine in earlier releases, but nothing in a shipped binary could activate it; the token path was effectively dead code. v0.6.0 wires it to a real front door:

ogdb user add / grant / revoke / list — creating the first user activates RBAC for that database. Roles are admin, read_write, read_only. user list never prints token values.
--require-auth (or OGDB_REQUIRE_AUTH=1) is fail-closed: the server refuses to start if auth is required but no users exist, and every data route returns 401 to anonymous or invalid requests. /health stays open for liveness.
--require-auth-reads gates reads (/query, /schema, /metrics) too — while the open, anonymous-read playground remains the explicit default so the demo experience is unchanged unless you opt in.
Label-injection hardening: the MCP search_nodes tool validates labels against ^[A-Za-z_][A-Za-z0-9_]*$ before interpolation, so a crafted label is rejected with 400, never executed.
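
The label check above can be sketched in a few lines of Python. This is an illustration of the validation idea, not OpenGraphDB's implementation: only the pattern comes from the release notes, while `build_label_query`, `InvalidLabel`, and the query template are hypothetical names.

```python
import re

# Pattern quoted in the release notes for MCP search_nodes labels.
LABEL_PATTERN = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


class InvalidLabel(ValueError):
    """Raised when a label fails validation (a server would map this to HTTP 400)."""


def build_label_query(label: str) -> str:
    # Hypothetical helper: reject anything that is not a plain identifier
    # before it is interpolated into query text.
    # fullmatch is used because in Python "$" also matches just before a
    # trailing newline, which re.match alone would let through.
    if not LABEL_PATTERN.fullmatch(label):
        raise InvalidLabel(f"invalid label: {label!r}")
    return f"MATCH (n:{label}) RETURN n"
```

The point of validating before interpolation is that a rejected label never reaches the query text at all, so there is nothing to escape or sanitize afterwards.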

🧠 Vector retrieval, end to end over the wire

The HNSW engine (three distance metrics, 1–4096 dimensions, crash-safe persistence) was always strong — but you couldn't create a vector index from any client. Now you can:

CREATE VECTOR INDEX and CREATE FULLTEXT INDEX parse, route, and execute (previously they failed at the parser).
kNN over the wire: CALL db.index.vector.queryNodes(name, vec, k) returns correctly ranked nearest neighbors.
Numeric Cypher list literals written as embeddings are coerced to indexable vectors at harvest time, so vectors you author in Cypher actually get indexed.
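
As a rough mental model of the last two points, the sketch below coerces a numeric list into a float vector and runs an exact (brute-force) cosine kNN, which is the ranking an HNSW index approximates. It is not OpenGraphDB code: the function names are hypothetical, the 4096 bound is the one stated above, and treating a smaller score as closer is an inference from the tour's `ORDER BY score ASC`.

```python
import math

MAX_DIMENSIONS = 4096  # upper bound stated in the release notes


def coerce_embedding(value):
    """Return a list of floats if `value` looks like an embedding, else None."""
    if not isinstance(value, list) or not (1 <= len(value) <= MAX_DIMENSIONS):
        return None
    # bool is a subclass of int in Python, so exclude it explicitly.
    if any(isinstance(x, bool) or not isinstance(x, (int, float)) for x in value):
        return None
    return [float(x) for x in value]


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm  # assumes neither vector is all zeros


def exact_knn(query, vectors, k):
    """Brute-force k nearest neighbors by cosine distance (smaller is closer)."""
    scored = [(cosine_distance(query, vec), node_id) for node_id, vec in vectors.items()]
    return [node_id for _, node_id in sorted(scored)[:k]]
```

A brute-force pass like this is also a handy way to spot-check the results an approximate index returns on a small dataset.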

This makes the bring-your-own-vectors hybrid retrieval story true end-to-end: store your embeddings, then fuse vector + graph-walk + BM25 with RRF in one round-trip. (Bring-your-own-model — managed auto-embeddings — is next; see What's next.)
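
For readers new to Reciprocal Rank Fusion, here is a minimal sketch of the technique: each retriever contributes 1 / (k + rank) for every result it returns, and the contributions are summed per id. The constant k = 60 is the value commonly used in the RRF literature; the constant and tie-breaking OpenGraphDB uses are not stated here, so treat both as assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of ids; each list is ordered best-first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score descending; break ties by id for a stable result.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Three hypothetical result lists: vector kNN, BM25, and a graph walk.
fused = reciprocal_rank_fusion([
    ["d1", "d2", "d3"],
    ["d2", "d4", "d1"],
    ["d2", "d1"],
])
```

Because RRF only looks at ranks, it needs no score normalization across retrievers, which is why it suits mixing cosine distances, BM25 scores, and graph-walk orderings.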

🤝 An MCP server that stock AI clients connect to cleanly

The MCP front door is now spec-conformant for the path real clients use (initialize → notifications/initialized → tools/list → tools/call):

protocolVersion is negotiated, not hardcoded — the server echoes a supported version the client asks for.
Capabilities are objects, notifications correctly get no reply, and tools/call results are wrapped in the proper MCP content envelope with structuredContent.
Clean typed cells: query results come back as native JSON (30, "Alice") instead of double-encoded, type-prefixed strings ("i64:30", "string:Alice").
stdout is pure JSON-RPC — a stray status line that used to corrupt the stream for line-by-line clients now goes to stderr.

Point Claude Desktop or Cursor at a single-file graph and it just works.
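
The message flow described above can be sketched as plain JSON-RPC payloads. This is an illustration under assumptions, not the server's code: the version strings are published MCP revisions, but which ones OpenGraphDB accepts is an assumption, and the `search_nodes` argument shape and the `rows` key are hypothetical.

```python
import json

# Assumption: the revisions this hypothetical server supports, newest last.
SUPPORTED_VERSIONS = ["2024-11-05", "2025-03-26", "2025-06-18"]


def negotiate_version(requested: str) -> str:
    # Echo the client's version when supported; otherwise offer our latest.
    return requested if requested in SUPPORTED_VERSIONS else SUPPORTED_VERSIONS[-1]


def handshake_messages(protocol_version: str):
    """JSON-RPC messages a client sends, in order, one per line on stdio."""
    return [
        {"jsonrpc": "2.0", "id": 1, "method": "initialize",
         "params": {"protocolVersion": protocol_version, "capabilities": {},
                    "clientInfo": {"name": "example-client", "version": "0.0.1"}}},
        # Notifications carry no "id", so the server must not reply to them.
        {"jsonrpc": "2.0", "method": "notifications/initialized"},
        {"jsonrpc": "2.0", "id": 2, "method": "tools/list"},
        {"jsonrpc": "2.0", "id": 3, "method": "tools/call",
         "params": {"name": "search_nodes", "arguments": {"label": "Person"}}},
    ]


def wrap_tool_result(rows):
    # Tool results go in a content envelope: a text rendering plus structuredContent.
    return {"content": [{"type": "text", "text": json.dumps(rows)}],
            "structuredContent": {"rows": rows}, "isError": False}
```

Note that cells inside `structuredContent` stay native JSON values (for example 30 rather than "i64:30"), which is the typed-cell behavior described in the list above.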

📡 Opt-in realtime change feed + a live-polling playground

GET /changes exposes a monotonic mutation counter, enabled with --enable-changes (or OGDB_ENABLE_CHANGES=1). Reads don't bump it; writes do. Disabled by default — when off, the route returns a clear 404 explaining how to turn it on.
The visual playground auto-refreshes when the counter advances: write a node from a separate terminal and watch the canvas update within ~1 second, no manual re-run. The badge advances "Live → Polling" when subscribed, and silently falls back to plain "Live" against a server that hasn't enabled the feed.
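
A client can consume the counter with a simple polling loop. The sketch below assumes the response body is JSON with an integer `seq` field, as in the tour's `{"seq": 0}` example; the helper names are hypothetical, and `fetch` is injected so the loop can be exercised without a running server.

```python
import json
import time
import urllib.request


def fetch_seq(base_url):
    # Assumes the body is JSON with an integer "seq", as in the tour's example.
    # A server without the feed enabled answers 404, which urlopen raises
    # as urllib.error.HTTPError.
    with urllib.request.urlopen(f"{base_url}/changes") as resp:
        return json.load(resp)["seq"]


def watch_changes(fetch, on_change, polls, interval=1.0, sleep=time.sleep):
    """Poll `fetch()` `polls` times and call `on_change(seq)` when it advances."""
    last = fetch()
    for _ in range(polls):
        sleep(interval)
        current = fetch()
        if current > last:
            on_change(current)
            last = current
    return last
```

In real use, `fetch` would be something like `lambda: fetch_seq(base_url)` with `base_url` pointing at your own server, and `on_change` would re-run whatever query feeds the view.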

✅ Seven engine-correctness fixes + two crash fixes

These are the ones that matter most for anything built on top, because they were silently wrong:

labels(), id(), type(), and toString() previously returned NULL instead of real values, which quietly broke search_nodes and any query relying on them. They are now implemented and verified (labels(n) → [Person], not NULL).
ORDER BY <projection alias> (e.g. RETURN n.name AS myname ORDER BY myname) is resolved correctly instead of erroring or falling back to insertion order.
Parser-routing fixes: a trailing semicolon and UNION queries no longer silently fall back to the legacy parser.
Importer phantom nodes: an edge referencing a non-existent node is now rejected instead of fabricating endpoints, and gap-filled node counts are reported instead of happening silently.
Two crash fixes: a client that disconnects mid-write no longer takes the whole server down, and the MCP stdout-pollution issue above is closed.
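
The phantom-node fix comes down to checking edge endpoints against the set of known nodes before anything is written. A minimal sketch of that check follows; the function name and the `(src, dst, type)` tuple layout are hypothetical and not the importer's actual data model.

```python
def validate_edges(node_ids, edges):
    """Split edges into accepted and rejected based on endpoint existence."""
    known = set(node_ids)
    accepted, rejected = [], []
    for src, dst, edge_type in edges:
        missing = [n for n in (src, dst) if n not in known]
        if missing:
            # Report which endpoints are unknown instead of creating them.
            rejected.append(((src, dst, edge_type), missing))
        else:
            accepted.append((src, dst, edge_type))
    return accepted, rejected
```

Returning the rejected edges together with their missing endpoints keeps the failure visible to the caller, which is the opposite of silently fabricating nodes.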

🛡️ Secure-by-default networking + a TLS deployment recipe

Bolt and gRPC now bind 127.0.0.1 by default (previously 0.0.0.0). Exposing all interfaces requires an explicit --bind 0.0.0.0:<port> and prints a loud warning.
A reverse-proxy TLS recipe ships in SECURITY.md (nginx/Caddy, including SSE buffering and a Bolt L4 note) — the supported secure posture today is bind-loopback + terminate TLS at the proxy. (Native in-process TLS is deliberately deferred — see below.)
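
If you script your own deployment checks, a loopback test on the bind address is easy to write with Python's standard `ipaddress` module. This is a hypothetical helper, not part of the ogdb CLI; the port in the examples is a placeholder, and hostnames such as `localhost` are out of scope (they raise `ValueError`).

```python
import ipaddress


def is_loopback_bind(bind: str) -> bool:
    """True when a `host:port` bind address only accepts local connections."""
    host, _, _port = bind.rpartition(":")
    host = host.strip("[]")  # allow bracketed IPv6 such as [::1]:7687
    return ipaddress.ip_address(host).is_loopback


local_only = is_loopback_bind("127.0.0.1:7687")
exposed = not is_loopback_bind("0.0.0.0:7687")
```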

📊 Honest, reproducible benchmarks

Measured on a fixed box (i9-10920X, powersave governor, N=5, lower-median; CPU-bound and reproduces closely):

| Operation | p50 | Notes |
| --- | --- | --- |
| Point read (`neighbors()`, 10k graph) | 5.8 µs | ~166k qps |
| 2-hop traversal | 22.9 µs | ~48k qps |
| Hybrid retrieval (RRF) | 204 µs | published number is conservative; verified runs ~2× faster |
| Footprint (10k graph) | — | ~28 MB RSS, ~39 MB on disk, sub-second load |
| Visual playground | — | 5,000 nodes @ 58–61 fps (17k edges @ 53–59 fps) |

We publish our methodology and the harness to reproduce these, including the numbers that aren't flattering. We do not publish head-to-head "X× faster than $competitor" bars: any such comparison we could offer today would be directional, not measured.

Honest limitations / not yet

We'd rather you find these here than discover them in production.

No native in-process TLS. HTTP, Bolt, and MCP are cleartext on the wire. The supported secure deployment is bind-loopback + a reverse proxy terminating TLS (recipe in SECURITY.md). Native TLS is on the roadmap and was deliberately not half-built — half a TLS stack is worse than none.
Durability is not 1.0-grade. In particular, edge type and properties are not yet WAL-logged and can be lost if the sidecar is corrupted after a crash. Single-process embedded durability, not a replicated cluster.
Concurrency is single-process. This is an embedded database. There is no multi-tenant isolation, per-object ACL, or row/node-level security — process-per-tenant is the supported isolation model.
RBAC is coarse. Database-wide read / write / admin. There is no per-label or per-row enforcement yet, and per-message Bolt/gRPC gating beyond the existing token check is not part of this release.
Bring your own vectors, not your own model (yet). There is no built-in embedder. You supply the vectors; OGDB indexes and retrieves them. Managed auto-embeddings are the next headline.
Not a wire-compatible Neo4j driver drop-in. The Bolt server negotiates v1 only, so modern Neo4j 5.x drivers won't connect. The honest claim is "openCypher dialect familiar to Neo4j users, no JVM, no AGPL, single file" — a migration target, not a drop-in driver replacement.
$param binding is not implemented. To avoid a silent correctness bug, $param now hard-errors loudly instead of mis-resolving to a literal string. Real parameter binding is roadmapped.
Cypher dialect limits. Some reserved-word labels (:Order, :CONTAINS) require escaping; the limitation is documented in the quickstart and migration guide.
Bitemporal / time-travel is not a claimed feature in this release. AT TIME is not yet a verified, shipping capability — it is deliberately left off the highlights rather than advertised as working.
Visualization ceiling. The 2D canvas is verified at 5,000 nodes @ ~58 fps and degrades beyond that. We claim the number we measured, not more.
Language bindings (ogdb-node, ogdb-python) are preview / build-from-source. The first-class AI integration surface is the MCP server, which ships inside the ogdb binary.

Getting started

Install

# From crates.io (binary name: ogdb)
cargo install ogdb-cli

# Or download a prebuilt binary for your platform from the GitHub Release
# (Linux x86_64/arm64, macOS arm64/x86_64, Windows x86_64) and verify it
# against the attached SHA256SUMS.txt before running.

A 60-second tour

# 1. Run a query against an embedded database file
ogdb query mydb.ogdb "CREATE (a:Person {name:'Alice', age:30})"
ogdb query mydb.ogdb "MATCH (n:Person) RETURN n.name AS name, labels(n) AS labels"
#   name=Alice   labels=[Person]

# 2. Serve it over HTTP (binds 127.0.0.1 by default)
ogdb serve mydb.ogdb --http

# 3. Turn on authentication (fail-closed)
ogdb user add alice mydb.ogdb --role admin --token s3cret
ogdb serve mydb.ogdb --http --require-auth
#   anonymous request -> 401 ; Authorization: Bearer s3cret -> 200

# 4. Vector search, end to end
ogdb query mydb.ogdb "CREATE VECTOR INDEX docvec FOR (n:Doc) ON (n.embedding) OPTIONS {dimensions: 3, similarity: 'cosine'}"
# ...insert Doc nodes whose `embedding` is a numeric list...
ogdb query mydb.ogdb "CALL db.index.vector.queryNodes('docvec', [1.0,0.0,0.0], 3) YIELD node, score RETURN node, score ORDER BY score ASC"

# 5. Opt-in realtime change feed
ogdb serve mydb.ogdb --http --enable-changes
#   GET /changes -> {"seq": 0} ; writes bump seq, reads don't

# 6. Connect an AI client over MCP (stdio)
ogdb mcp --stdio mydb.ogdb

Then open the bundled visual playground served by the binary, pick a dataset (MovieLens, Air Routes, Game of Thrones, Wikidata, or the OpenGraphDB codebase graph), and run real Cypher against it. There is no signup and no separate frontend to host.

If you're coming from Neo4j: most openCypher you know works as-is. Check the migration guide for the documented dialect limits (reserved-word label escaping, $param, Bolt v1) before you port a large workload.

What's next

The roadmap, in rough order — and named honestly as roadmap, not as shipped:

Pluggable embeddings (the next headline). CREATE EMBEDDING MODEL backed by an OpenAI-compatible provider abstraction — OpenAI, Ollama, LM Studio, HF TEI, vLLM, or any custom endpoint — so OGDB embeds your text with your chosen model. Embed-on-write and embed-on-query (db.index.vector.queryText(...)) turn bring-your-own-vectors into bring-your-own-model, with NL-in / results-out hybrid search.
Native in-process TLS and at-rest encryption, so a reverse proxy is optional rather than required.
Modern Bolt (v4/v5) driver compatibility and federation (WAL replication exposure, federated reads).
Durability and concurrency hardening toward a 1.0 that's safe under load and on shared networks.
Finer-grained access control: per-role HTTP boundary enforcement, and a path toward per-object / row-level security and multi-tenancy.
The operational knowledge-graph vision: a graph that an AI agent owns end-to-end — sync collectors, an ops view over a live system, conversation ingestion, and pattern mining — so the database becomes the operational memory of the systems it watches. Shipping as a demonstrated vision today; building toward a shipped capability.

OpenGraphDB is Apache-2.0. Try it, break it, and tell us where the limitations bite — the limitations page is a living document, and that's the point.
