v0.6.0

github-actions released this 14 Jun 18:38

OpenGraphDB v0.6.0

A single-binary embedded graph database in Rust — openCypher + vector + full-text + MCP + RDF. Apache-2.0, no JVM, no AGPL, one file. This is the first public release: the milestone where correctness, reachable authentication, and AI-native retrieval all became true over the wire, not just in the engine.

We lead with honesty. Everything below is backed by a reproducible test or a live transcript, and the "Honest limitations" section is as much a feature of this release as the highlights. Where a competitor would round up, we tell you exactly where the edges are.

Pre-1.0 notice. OpenGraphDB v0.6.0 is single-process and embedded. The API and on-disk format may still change, durability and concurrency are not yet hardened for untrusted multi-tenant network exposure, and several capabilities below are new in this release. Run it where that posture fits — local-first apps, AI agent memory, GraphRAG, a graph that ships inside your binary — and read the limitations before you put it on a shared network.


Why OpenGraphDB

OpenGraphDB is shaped around what an AI workload needs: understanding knowledge and code, retrieving it efficiently, and keeping it secured and shareable under access control. All of that comes from one file you drop into a binary instead of a cluster you operate.

  • One Rust binary. No JVM, no sidecar, no server farm. The visual playground SPA ships inside the binary.
  • openCypher with documented dialect limits, plus RDF (Turtle/OWL) import/export round-trip.
  • AI-native retrieval: HNSW vectors + BM25 full-text + graph-walk, fused with Reciprocal Rank Fusion (RRF).
  • A built-in MCP server so AI agents (Claude Desktop, Cursor, the MCP SDKs) connect directly.
  • Apache-2.0. A graph database you can embed, fork, and ship without an AGPL or JVM tax — the Neo4j alternative for people who don't want a cluster.

Highlights

🔐 Authentication that is actually reachable and enforced

RBAC existed in the engine for releases, but nothing in a shipped binary could activate it — the token path was effectively dead code. v0.6.0 wires it to a real front door:

  • ogdb user add / grant / revoke / list — creating the first user activates RBAC for that database. Roles are admin, read_write, read_only. user list never prints token values.
  • --require-auth (or OGDB_REQUIRE_AUTH=1) is fail-closed: the server refuses to start if auth is required but no users exist, and every data route returns 401 to anonymous or invalid requests. /health stays open for liveness.
  • --require-auth-reads also gates reads (/query, /schema, /metrics). The open, anonymous-read playground remains the explicit default, so the demo experience is unchanged unless you opt in.
  • Label-injection hardening: the MCP search_nodes tool validates labels against ^[A-Za-z_][A-Za-z0-9_]*$ before interpolation, so a crafted label is rejected with 400, never executed.
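
The label check in the last bullet can be sketched as follows. This is a minimal illustration of allow-list validation before string interpolation, not the server's actual code; `is_safe_label` and `build_label_query` are hypothetical names, and the query text is only an example.

```python
import re

# Same character class the release notes quote for search_nodes.
# fullmatch() anchors both ends and, unlike `$` with re.match,
# does not accept a trailing newline.
LABEL_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_safe_label(label: str) -> bool:
    """Return True only if the label is a plain identifier."""
    return LABEL_RE.fullmatch(label) is not None


def build_label_query(label: str) -> str:
    """Interpolate a label into Cypher only after it passes validation.

    Hypothetical helper: a real server would map the ValueError to a 400.
    """
    if not is_safe_label(label):
        raise ValueError(f"invalid label: {label!r}")
    return f"MATCH (n:{label}) RETURN n LIMIT 25"
```

Rejecting the input outright, rather than trying to escape it, keeps the rule easy to audit: anything outside the identifier grammar never reaches the query string.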

🧠 Vector retrieval, end to end over the wire

The HNSW engine (three distance metrics, 1–4096 dimensions, crash-safe persistence) was always strong — but you couldn't create a vector index from any client. Now you can:

  • CREATE VECTOR INDEX and CREATE FULLTEXT INDEX parse, route, and execute (previously they failed at the parser).
  • kNN over the wire: CALL db.index.vector.queryNodes(name, vec, k) returns correctly ranked nearest neighbors.
  • Numeric Cypher list literals written as embeddings are coerced to indexable vectors at harvest time, so vectors you author in Cypher actually get indexed.
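
For checking kNN results by hand, an exact brute-force search is a useful reference. The sketch below assumes cosine similarity and plain Python lists as vectors; it is not how the HNSW index works internally (HNSW is approximate), and `knn` is a hypothetical helper, not an OGDB API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("dimension mismatch")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("zero vector has no direction")
    return dot / (norm_a * norm_b)


def knn(vectors, query, k):
    """Exact top-k neighbors, most similar first.

    `vectors` maps a node id to its embedding (illustrative data shape).
    """
    scored = [(node_id, cosine_similarity(vec, query))
              for node_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

Comparing this exact ranking against what an index returns for the same small dataset is a quick sanity check on recall.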

This makes the bring-your-own-vectors hybrid retrieval story true end-to-end: store your embeddings, then fuse vector + graph-walk + BM25 with RRF in one round-trip. (Bring-your-own-model — managed auto-embeddings — is next; see What's next.)
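
Reciprocal Rank Fusion itself is small enough to sketch. Each result list contributes 1 / (k + rank) for every item it contains, and the sums decide the fused order. The constant k = 60 below is the value commonly used in the RRF literature; whether OGDB uses the same constant is an assumption, and `rrf_fuse` is a hypothetical name.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked id lists with Reciprocal Rank Fusion.

    rankings: iterable of lists, each ordered best-first.
    Returns (id, score) pairs, highest fused score first; ties break by id.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Illustrative inputs: one list per retriever (vector, BM25, graph-walk).
fused = rrf_fuse([
    ["d1", "d2", "d3"],
    ["d2", "d4", "d1"],
    ["d2", "d1"],
])
```

Because RRF only looks at ranks, the three retrievers do not need comparable score scales, which is what makes it a convenient way to combine vector distance, BM25 relevance, and graph proximity.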

🤝 An MCP server that stock AI clients connect to cleanly

The MCP front door is now spec-conformant for the path real clients use (initialize → notifications/initialized → tools/list → tools/call):

  • protocolVersion is negotiated, not hardcoded — the server echoes a supported version the client asks for.
  • Capabilities are objects, notifications correctly get no reply, and tools/call results are wrapped in the proper MCP content envelope with structuredContent.
  • Clean typed cells: query results come back as native JSON (30, "Alice") instead of double-encoded, type-prefixed strings ("i64:30", "string:Alice").
  • stdout is pure JSON-RPC — a stray status line that used to corrupt the stream for line-by-line clients now goes to stderr.

Point Claude Desktop or Cursor at a single-file graph and it just works.
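
The message sequence above can be sketched as JSON-RPC 2.0 payloads. This is an illustration of the client side under stated assumptions: the list of supported protocol versions, the client name, and the arguments passed to search_nodes are all hypothetical, and the server's real negotiation logic may differ.

```python
import json

# Hypothetical set of protocol revisions a server might accept, newest last.
SUPPORTED_VERSIONS = ("2024-11-05", "2025-03-26", "2025-06-18")


def negotiate_version(requested, supported=SUPPORTED_VERSIONS):
    """Echo the client's version if supported, else offer the newest one."""
    return requested if requested in supported else supported[-1]


def request(msg_id, method, params=None):
    """A JSON-RPC request: carries an id, so the peer must reply."""
    msg = {"jsonrpc": "2.0", "id": msg_id, "method": method}
    if params is not None:
        msg["params"] = params
    return msg


def notification(method):
    """A JSON-RPC notification: no id, so the peer must not reply."""
    return {"jsonrpc": "2.0", "method": method}


def handshake(tool_name, arguments):
    """The four messages a client sends on the path described above."""
    return [
        request(1, "initialize", {
            "protocolVersion": "2025-06-18",
            "capabilities": {},  # an object, not a list or boolean
            "clientInfo": {"name": "example-client", "version": "0.0.1"},
        }),
        notification("notifications/initialized"),
        request(2, "tools/list"),
        request(3, "tools/call", {"name": tool_name, "arguments": arguments}),
    ]


# Over stdio, each message is serialized as one line of JSON.
lines = [json.dumps(m) for m in handshake("search_nodes", {"label": "Person"})]
```

Keeping stdout to exactly these newline-delimited JSON objects is why a stray status line matters: a line-by-line client will try to parse it as a message.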

📡 Opt-in realtime change feed + a live-polling playground

  • GET /changes exposes a monotonic mutation counter, enabled with --enable-changes (or OGDB_ENABLE_CHANGES=1). Reads don't bump it; writes do. Disabled by default — when off, the route returns a clear 404 explaining how to turn it on.
  • The visual playground auto-refreshes when the counter advances: write a node from a separate terminal and watch the canvas update within ~1 second, no manual re-run. The badge advances "Live → Polling" when subscribed, and silently falls back to plain "Live" against a server that hasn't enabled the feed.
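
A client can follow the same counter with a small polling loop. The sketch below assumes the response shape `{"seq": <int>}` shown in the tour further down; the base URL, the one-second interval, and the function names are hypothetical. When the feed is disabled the route returns 404, which `urllib` surfaces as an `HTTPError`.

```python
import json
import time
import urllib.request


def fetch_seq(base_url):
    """GET /changes and return the mutation counter (assumed JSON shape)."""
    with urllib.request.urlopen(f"{base_url}/changes", timeout=5) as resp:
        return json.load(resp)["seq"]


def watch(fetch, on_change, polls, interval=1.0, sleep=time.sleep):
    """Poll `fetch` and call on_change(old, new) whenever the counter moves.

    `fetch` and `sleep` are injectable so the loop can be tested offline.
    Comparing with != rather than > also notices a counter reset.
    """
    last = fetch()
    for _ in range(polls):
        sleep(interval)
        current = fetch()
        if current != last:
            on_change(last, current)
            last = current
    return last
```

Against a live server this could be driven with `watch(lambda: fetch_seq("http://127.0.0.1:8080"), print, polls=60)`, where the port is a placeholder for whatever address the server reports at startup.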

✅ Seven engine-correctness fixes + two crash fixes

These are the ones that matter most for anything built on top, because they were silently wrong:

  • labels(), id(), type(), toString() returned NULL instead of real values, which quietly broke search_nodes and any query relying on them. Now implemented and verified (labels(n) → [Person], not NULL).
  • ORDER BY <projection alias> (e.g. RETURN n.name AS myname ORDER BY myname) is resolved correctly instead of erroring or falling back to insertion order.
  • Parser-routing fixes: a trailing semicolon and UNION queries no longer silently fall back to the legacy parser.
  • Importer phantom nodes: an edge referencing a non-existent node is now rejected instead of fabricating endpoints, and gap-filled node counts are reported instead of happening silently.
  • Two crash fixes: a client that disconnects mid-write no longer takes the whole server down, and the MCP stdout-pollution issue above is closed.
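
The importer rule in the list above (reject an edge whose endpoints do not exist, rather than inventing them) can be sketched like this. The tuple layout and the function name are hypothetical and do not reflect the importer's real data structures.

```python
def validate_edges(node_ids, edges):
    """Split edges into accepted and rejected based on endpoint existence.

    node_ids: iterable of known node ids.
    edges: iterable of (source, target, edge_type) tuples.
    No node is ever created here; unknown endpoints only cause rejection.
    """
    known = set(node_ids)
    accepted, rejected = [], []
    for edge in edges:
        source, target, _edge_type = edge
        if source in known and target in known:
            accepted.append(edge)
        else:
            rejected.append(edge)
    return accepted, rejected
```

Returning the rejected edges, instead of dropping them silently, lets the caller report exactly which rows were refused.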

🛡️ Secure-by-default networking + a TLS deployment recipe

  • Bolt and gRPC now bind 127.0.0.1 by default (previously 0.0.0.0). Exposing all interfaces requires an explicit --bind 0.0.0.0:<port> and prints a loud warning.
  • A reverse-proxy TLS recipe ships in SECURITY.md (nginx/Caddy, including SSE buffering and a Bolt L4 note) — the supported secure posture today is bind-loopback + terminate TLS at the proxy. (Native in-process TLS is deliberately deferred — see below.)
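
The bind policy above comes down to classifying an address before listening on it. A minimal sketch using the standard `ipaddress` module follows; the function name and the port numbers in the usage note are illustrative, and the server's own check is not shown in these notes.

```python
import ipaddress


def classify_bind(addr):
    """Classify a 'host:port' bind address by how exposed it is.

    Accepts IPv4 ('127.0.0.1:7687') and bracketed IPv6 ('[::1]:7687').
    """
    host, _, _port = addr.rpartition(":")
    ip = ipaddress.ip_address(host.strip("[]"))
    if ip.is_loopback:
        return "loopback"          # reachable only from this machine
    if ip.is_unspecified:
        return "all-interfaces"    # 0.0.0.0 or ::, deserves a loud warning
    return "specific-interface"
```

For example, `classify_bind("0.0.0.0:7687")` falls into the all-interfaces case, which is the situation where a warning and a TLS-terminating proxy are appropriate.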

📊 Honest, reproducible benchmarks

Measured on a fixed box (i9-10920X, powersave governor, N=5, lower-median). The workload is CPU-bound, so results reproduce closely:

Operation                            | p50     | Notes
Point read (neighbors(), 10k graph)  | 5.8 µs  | ~166k qps
2-hop traversal                      | 22.9 µs | ~48k qps
Hybrid retrieval (RRF)               | 204 µs  | published number is conservative; verified runs ~2× faster
Footprint (10k graph)                | -       | ~28 MB RSS, ~39 MB on disk, sub-second load
Visual playground                    | -       | 5,000 nodes @ 58–61 fps (17k edges @ 53–59 fps)

We publish our methodology and the harness to reproduce these — including the numbers that aren't flattering. We do not publish head-to-head "X× faster than $competitor" bars; those comparisons are directional, not measured.
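
One reading of "N=5, lower-median" is: run the benchmark five times and report the lower median of the per-run results. A minimal sketch of that summary step, using hypothetical sample values rather than the project's data:

```python
import statistics


def lower_median(samples):
    """Return the lower median: the middle value for odd N,
    the smaller of the two middle values for even N."""
    return statistics.median_low(samples)


# Hypothetical per-run p50 latencies in microseconds (not measured data).
runs_us = [6.1, 5.8, 5.9, 5.7, 6.4]
reported = lower_median(runs_us)
```

Unlike a mean, this always reports a value that was actually observed and is not pulled upward by a single slow run.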


Honest limitations / not yet

We'd rather you find these here than discover them in production.

  • No native in-process TLS. HTTP, Bolt, and MCP are cleartext on the wire. The supported secure deployment is bind-loopback + a reverse proxy terminating TLS (recipe in SECURITY.md). Native TLS is on the roadmap and was deliberately not half-built — half a TLS stack is worse than none.
  • Durability is not 1.0-grade. In particular, edge type and properties are not yet WAL-logged and can be lost if the sidecar is corrupted after a crash. Single-process embedded durability, not a replicated cluster.
  • Concurrency is single-process. This is an embedded database. There is no multi-tenant isolation, per-object ACL, or row/node-level security — process-per-tenant is the supported isolation model.
  • RBAC is coarse. Database-wide read / write / admin. There is no per-label or per-row enforcement yet, and per-message Bolt/gRPC gating beyond the existing token check is not part of this release.
  • Bring your own vectors, not your own model (yet). There is no built-in embedder. You supply the vectors; OGDB indexes and retrieves them. Managed auto-embeddings are the next headline.
  • Not a wire-compatible Neo4j driver drop-in. The Bolt server negotiates v1 only, so modern Neo4j 5.x drivers won't connect. The honest claim is "openCypher dialect familiar to Neo4j users, no JVM, no AGPL, single file" — a migration target, not a drop-in driver replacement.
  • $param binding is not implemented. To avoid a silent correctness bug, a query that uses $param now fails with an explicit error instead of mis-resolving to a literal string. Real parameter binding is on the roadmap.
  • Cypher dialect limits. Some reserved-word labels (:Order, :CONTAINS) require escaping; the limitation is documented in the quickstart and migration guide.
  • Bitemporal / time-travel is not a claimed feature in this release. AT TIME is not yet a verified, shipping capability — it is deliberately left off the highlights rather than advertised as working.
  • Visualization ceiling. The 2D canvas is verified at 5,000 nodes @ ~58 fps and degrades beyond that. We claim the number we measured, not more.
  • Language bindings (ogdb-node, ogdb-python) are preview / build-from-source. The first-class AI integration surface is the MCP server, which ships inside the ogdb binary.

Getting started

Install

# From crates.io (binary name: ogdb)
cargo install ogdb-cli

# Or download a prebuilt binary for your platform from the GitHub Release
# (Linux x86_64/arm64, macOS arm64/x86_64, Windows x86_64) and verify it
# against the attached SHA256SUMS.txt before running.
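
Checksum verification can be scripted if you prefer not to do it by hand. The sketch below assumes SHA256SUMS.txt uses the common `sha256sum` layout (hex digest, whitespace, file name); the function names are hypothetical and the artifact file names depend on the release assets you download.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large binaries need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_sums(text):
    """Parse 'digest  name' lines into a {name: digest} mapping."""
    sums = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        digest, name = line.split(maxsplit=1)
        sums[name.strip().lstrip("*")] = digest.lower()
    return sums


def verify_download(binary_path, sums_path):
    """True only if the file is listed and its digest matches."""
    expected = parse_sums(Path(sums_path).read_text()).get(Path(binary_path).name)
    return expected is not None and sha256_of(binary_path) == expected
```

A missing entry is treated the same as a mismatch, so an unlisted file never passes by accident.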

A 60-second tour

# 1. Run a query against an embedded database file
ogdb query mydb.ogdb "CREATE (a:Person {name:'Alice', age:30})"
ogdb query mydb.ogdb "MATCH (n:Person) RETURN n.name AS name, labels(n) AS labels"
#   name=Alice   labels=[Person]

# 2. Serve it over HTTP (binds 127.0.0.1 by default)
ogdb serve mydb.ogdb --http

# 3. Turn on authentication (fail-closed)
ogdb user add alice mydb.ogdb --role admin --token s3cret
ogdb serve mydb.ogdb --http --require-auth
#   anonymous request -> 401 ; Authorization: Bearer s3cret -> 200

# 4. Vector search, end to end
ogdb query mydb.ogdb "CREATE VECTOR INDEX docvec FOR (n:Doc) ON (n.embedding) OPTIONS {dimensions: 3, similarity: 'cosine'}"
# ...insert Doc nodes whose `embedding` is a numeric list...
ogdb query mydb.ogdb "CALL db.index.vector.queryNodes('docvec', [1.0,0.0,0.0], 3) YIELD node, score RETURN node, score ORDER BY score ASC"

# 5. Opt-in realtime change feed
ogdb serve mydb.ogdb --http --enable-changes
#   GET /changes -> {"seq": 0} ; writes bump seq, reads don't

# 6. Connect an AI client over MCP (stdio)
ogdb mcp --stdio mydb.ogdb

Then open the bundled visual playground served by the binary, pick a dataset (MovieLens, Air Routes, Game of Thrones, Wikidata, or the OpenGraphDB codebase graph), and run real Cypher against it. There is no signup and no separate frontend to host.

If you're coming from Neo4j: most openCypher you know works as-is. Check the migration guide for the documented dialect limits (reserved-word label escaping, $param, Bolt v1) before you port a large workload.
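
Step 3 of the tour shows that an authenticated request carries `Authorization: Bearer <token>`. A client-side sketch of building such a request is below. The /query route and the header come from these notes; the POST method, the `{"query": ...}` JSON body, and the base URL are assumptions made for illustration, so check the HTTP documentation for the real request shape.

```python
import json
import urllib.request


def build_query_request(base_url, token, cypher):
    """Build (but do not send) an authenticated request to /query.

    Body shape and HTTP method are assumptions, not the documented API.
    """
    body = json.dumps({"query": cypher}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/query",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder address and the token from the tour.
req = build_query_request("http://127.0.0.1:8080", "s3cret",
                          "MATCH (n:Person) RETURN n.name AS name")
```

Sending it with `urllib.request.urlopen(req)` raises `urllib.error.HTTPError` with code 401 when the token is missing or invalid, which matches the fail-closed behavior described above.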


What's next

The roadmap, in rough order — and named honestly as roadmap, not as shipped:

  • Pluggable embeddings (the next headline). CREATE EMBEDDING MODEL backed by an OpenAI-compatible provider abstraction — OpenAI, Ollama, LM Studio, HF TEI, vLLM, or any custom endpoint — so OGDB embeds your text with your chosen model. Embed-on-write and embed-on-query (db.index.vector.queryText(...)) turn bring-your-own-vectors into bring-your-own-model, with NL-in / results-out hybrid search.
  • Native in-process TLS and at-rest encryption, so a reverse proxy is optional rather than required.
  • Modern Bolt (v4/v5) driver compatibility and federation (WAL replication exposure, federated reads).
  • Durability and concurrency hardening toward a 1.0 that's safe under load and on shared networks.
  • Finer-grained access control: per-role HTTP boundary enforcement, and a path toward per-object / row-level security and multi-tenancy.
  • The operational knowledge-graph vision: a graph that an AI agent owns end-to-end (sync collectors, an ops view over a live system, conversation ingestion, and pattern mining), so the database becomes the operational memory of the systems it watches. Today this exists as a demonstration, not a shipped capability; shipping it is the goal.

OpenGraphDB is Apache-2.0. Try it, break it, and tell us where the limitations bite — the limitations page is a living document, and that's the point.