This is a rough, work-in-progress redesign of the node-neo4j API.
This driver will aim to generally be stateless and functional, inspired by React.js. Some context doesn't typically change, though (e.g. the URL to the database), so this driver supports maintaining such context as simple state.
This driver continues to accept standard Node.js callbacks of the form `function (error, result)`, for straightforward development using standard Node.js control flow tools and libraries.
In addition, however, this driver also supports streaming use cases now. Where specified, callbacks can be omitted to get back streams instead. And importantly, in that case, this driver will take special care not to buffer results in memory, ensuring high performance and low memory usage.
This v2 driver converges on an options-/hash-based API for most/all methods. This both conveys clearer intent and leaves room for future additions. For the sake of this rough spec, you can assume most options are optional, but this will be documented more precisely in the final API documentation.
## Let me make a "connection" to the database.
```js
var neo4j = require('neo4j');

var db = new neo4j.GraphDatabase({
    url: 'http://localhost:7474',
    auth: null,     // optional; see below for more details
    headers: {},    // optional defaults, e.g. User-Agent
    proxy: null,    // optional URL
    agent: null,    // optional http.Agent instance, for custom socket pooling
});
```
To specify auth credentials, the username and password can be provided either directly in the URL, e.g. `'http://user:pass@localhost:7474'`, or via an `auth` option, which can be either a `'username:password'` string or a `{username, password}` object. The `auth` option takes precedence.

If credentials are given in any form, they will be normalized to a `{username, password}` object and set as the `auth` property on the constructed `GraphDatabase` instance. In addition, the `url` property will have its credentials cleared if an `auth` option was provided as well.

An empty string/object `auth` can be provided to clear auth in this way; the `auth` property will be normalized to `null` in this case.
The current v1 of the driver is hypermedia-driven, so it discovers the `/db/data` endpoint. We may hardcode that in v2 for efficiency and simplicity, but if we do, do we need to make that customizable/overridable too?
## Let me manage database authentication.
Note that this section only matters for managing auth, not specifying it.
```js
function cbBool(err, bool) {}
function cbDone(err) {}

db.checkPasswordChangeNeeded(cbBool);
db.changePassword({password}, cbDone);
```
For convenience, changing the password will automatically update the `auth` property on the `GraphDatabase` instance, so that subsequent requests will work.
## Let me make arbitrary HTTP requests to the REST API.
This will allow callers to make any API requests; no one will be blocked by this driver not supporting a particular API.
It'll also allow callers to interface with arbitrary Neo4j plugins (e.g. neo4j-spatial), even custom ones.
```js
function cb(err, body) {}

var req = db.http({method, path, headers, body, raw}, cb);
```
This method returns a Request.js instance, which is a duplex stream to and from which both request and response data can be streamed. (Request.js provides a number of benefits over the native HTTP `ClientRequest` and `IncomingMessage` classes, such as proxy support, gzip decompression, simpler writes, and a single, unified `'error'` event.)

In addition, if a callback is given, it will be called with the final result. By default, this result will be the HTTP response body (parsed as JSON), with nodes and relationships transformed to `Node` and `Relationship` objects, or an `Error` object if there was either a native error (e.g. DNS) or a `4xx` or `5xx` HTTP status code.

Alternately, `raw: true` can be passed to have the callback receive the full HTTP response, with `statusCode`, `headers`, and `body` properties. The `body` will still be parsed as JSON, but nodes, relationships, and errors will not be transformed to node-neo4j objects in this case. In addition, `4xx` and `5xx` status codes will not yield an error.
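To make the default-vs.-raw distinction concrete, here's an illustrative sketch (not driver code) of how a response might be interpreted; `interpretResponse` is a hypothetical name:

```js
// Illustrative sketch of the callback semantics described above.
function interpretResponse(opts, response) {
    if (opts.raw) {
        // Raw mode: full response, no transformation, no status-code errors.
        return {
            statusCode: response.statusCode,
            headers: response.headers,
            body: response.body,    // still JSON-parsed
        };
    }
    // Default mode: 4xx/5xx become errors; otherwise just the body.
    if (response.statusCode >= 400) {
        throw new Error('HTTP ' + response.statusCode);
    }
    return response.body;   // nodes & rels would be transformed here
}
```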
## Give me useful objects, instead of raw JSON.
Transform the Neo4j REST API's raw JSON format for nodes and relationships into useful `Node` and `Relationship` objects. These objects don't have to do anything / have any mutating methods; they're just container objects that serve to organize information.

```
class Node {_id, labels, properties}
class Relationship {_id, type, properties, _fromId, _toId}
```
TODO: Transform path JSON into `Path` objects too? We have this in v1, but is it really useful and functional these days? E.g. see issue #57.
Importantly, using Neo4j's native IDs is strongly discouraged these days, so v2 of this driver makes that an explicit private property. It's still there if you need it, e.g. for the old-school traversal API.
That extends to relationships: they explicitly do not link to `Node` instances anymore (since that data is frequently not available), only IDs. Those IDs are thus private too.
Also importantly, there's no more notion of persistence or updating (e.g. no more `save()` method) in this v2 API. If you want to update data and persist it back to Neo4j, you do that the same way as you would without these objects. This driver is not an ORM/OGM. Those problems must be solved separately.

It's similarly no longer possible to create or get an instance of either of these classes that represents data that isn't persisted yet (like the current v1's synchronous `db.createNode()` method returns). These classes are only instantiated internally, and only returned async'ly from database responses.
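As plain containers, the two classes could look roughly like this. This is a sketch of their shape only, not the actual implementation (whose constructors are internal):

```js
// Sketch of the container classes' shape; the real constructors are internal.
function Node(_id, labels, properties) {
    this._id = _id;             // native ID -- private; use is discouraged
    this.labels = labels;
    this.properties = properties;
}

function Relationship(_id, type, properties, _fromId, _toId) {
    this._id = _id;
    this.type = type;
    this.properties = properties;
    this._fromId = _fromId;     // endpoint IDs only, not Node instances
    this._toId = _toId;
}
```

Since there are no mutating or persistence methods, reading data like `node.properties.name` is all these objects are for.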
## Let me make simple, parametrized Cypher queries.
```js
function cb(err, results) {}

var stream = db.cypher({query, params, headers, lean}, cb);

// Alternate simple version -- no params, no headers, not lean:
var stream = db.cypher(query, cb);
```
If a callback is given, it'll be called with the array of results, or an error if there is one. Alternately, the results can be streamed back by omitting the callback. In that case, a stream will be returned, which will emit a `data` event for each result, or an `error` event if there is one.

In both cases, each result will be a dictionary from column name to row value for that column. In addition, by default, nodes and relationships will be transformed to `Node` and `Relationship` objects. If you don't need the full knowledge of node and relationship metadata (labels, types, native IDs), you can bypass that by specifying `lean: true`, which will return just property data, for a potential performance gain.
TODO: Should we formalize the streaming case into a documented Stream class? Or should we just link to cypher-stream?
TODO: Should we allow access to other underlying data formats, e.g. "graph"?
## Let me make multiple, separate Cypher queries in one network request.
```js
function cbMany(err, arraysOfResults) {}

var streams = db.cypher({queries, headers}, cbMany);

// Alternate simple version -- no custom headers:
var streams = db.cypher(queries, cbMany);
```
In both cases, `queries` is an array of queries, where each query can be a `{query, params, lean}` object or a simple string.
Important: batch queries are executed transactionally — either they all succeed, or they all fail.
If a callback is given, it'll be called with an array containing an array of results for each query (in matching order), or an error if there is any. Alternately, the results can be streamed back by omitting the callback. In that case, an array will be returned, containing a stream for each query (in matching order).
TODO: Is it valuable to return a stream of results across all queries?
## Let me make multiple queries, across multiple network requests, all within a single transaction.
```js
var tx = db.beginTransaction();
```
This method is named "begin" instead of "create" to reflect that it returns immediately, and has not actually persisted anything to the database yet.
This method returns a `Transaction` object, which mainly just encapsulates the state of a "transaction ID" returned by Neo4j from the first query. In addition, though, `Transaction` instances expose helpful properties to let you know a transaction's precise state, e.g. whether it was automatically rolled back by Neo4j due to a fatal error. This precise state tracking also lets this driver prevent predictable errors, which may be signs of code bugs, and provide more helpful error messaging.
```
class Transaction {_id, expiresAt, expiresIn, state}
# `expiresAt` is a Date, while `expiresIn` is a millisecond count.
# `state` is one of 'open', 'pending', 'committed', 'rolled back', or 'expired',
# all of which are constant properties on instances, e.g. `STATE_ROLLED_BACK`.
```
```js
function cbResults(err, results) {}
function cbDone(err) {}

var stream = tx.cypher({query, params, headers, lean, commit}, cbResults);

tx.commit(cbDone);
tx.rollback(cbDone);
tx.renew(cbDone);
```
The transactional `cypher` method is just like the regular `cypher` method (e.g. it supports simple strings, as well as batching), except that it supports an additional `commit` option, which can be set to `true` to automatically attempt to commit the transaction with this query.
Otherwise, transactions can be committed and rolled back independently. They can also be explicitly renewed, as Neo4j expires them if idle too long, but every query in a transaction implicitly renews the transaction as well.
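The "prevent predictable errors" idea can be sketched as a small transition table. The states come from the `Transaction` class above, but the table itself is illustrative, not the driver's actual logic; in particular, allowing `rollback` on an expired transaction is an assumption here:

```js
// Illustrative sketch of which operations each transaction state might allow.
// 'pending' means a request is currently in flight.
var ALLOWED = {
    open:          ['cypher', 'commit', 'rollback', 'renew'],
    pending:       [],
    committed:     [],
    'rolled back': [],
    expired:       ['rollback'],    // assumption: cleanup may still be attempted
};

function canCall(state, method) {
    return ALLOWED[state].indexOf(method) !== -1;
}
```

Rejecting e.g. `tx.cypher()` on a committed transaction locally, before any network request, lets the driver surface a likely code bug with a clear message.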
## Throw meaningful and semantic errors.
Background reading — a huge source of inspiration that informs much of the API design here:
http://www.joyent.com/developers/node/design/errors
Neo4j v2 provides excellent error info for (transactional) Cypher requests:
http://neo4j.com/docs/stable/status-codes.html
Essentially, errors are grouped into three "classifications": client errors, database errors, and transient errors. There are additional "categories" and "titles", but the high-level classifications are just the right level of granularity for decision-making (e.g. whether to convey the error to the user, fail fast, or retry).
```json
{
    "code": "Neo.ClientError.Statement.EntityNotFound",
    "message": "Node with id 741073"
}
```
Unfortunately, other endpoints return errors in a completely different format and style. E.g.:

- `404` `NodeNotFoundException`
- `409` `OperationFailureException`
- `400` `PropertyValueException`
- `400` `BadInputException` w/ nested `ConstraintViolationException` and `IllegalTokenNameException`
```json
{
    "exception": "BadInputException",
    "fullname": "org.neo4j.server.rest.repr.BadInputException",
    "message": "Unable to add label, see nested exception.",
    "stacktrace": [
        "org.neo4j.server.rest.web.DatabaseActions.addLabelToNode(DatabaseActions.java:328)",
        "org.neo4j.server.rest.web.RestfulGraphDatabase.addNodeLabel(RestfulGraphDatabase.java:447)",
        "java.lang.reflect.Method.invoke(Method.java:606)",
        "org.neo4j.server.rest.transactional.TransactionalRequestDispatcher.dispatch(TransactionalRequestDispatcher.java:139)",
        "java.lang.Thread.run(Thread.java:744)"
    ],
    "cause": {...}
}
```
One important distinction is that (transactional) Cypher errors don't have any associated HTTP status code (since the results are streamed), while the "legacy" exceptions do. Fortunately, HTTP 4xx and 5xx status codes map almost directly to "client error" and "database error" classifications, while "transient" errors can be detected by name.
So when it comes to designing this driver's v2 error API, there are two open questions:

- Should this driver abstract away this discrepancy in Neo4j error formats, and present a uniform error API across the two? Or should it expose these two different formats?
- Should this driver return standard `Error` objects decorated w/ e.g. a `neo4j` property? Or should it define its own `Error` subclasses?
The current design of this API chooses to present a uniform (but minimal) API using `Error` subclasses. Importantly:

- The subclasses correspond to the client/database/transient classifications mentioned above, for easy decision-making via either the `instanceof` operator or the `name` property.
- Special care is taken to provide `message` and `stack` properties rich in info, so that no special serialization is needed to debug production errors. That is, simply logging the `stack` (as most error handlers tend to do) should get you meaningful and useful data.
- Structured info returned by Neo4j is also available on the `Error` instances under a `neo4j` property, for deeper introspection and analysis if desired.
```
class Error {name, message, stack, neo4j}
class ClientError extends Error
class DatabaseError extends Error
class TransientError extends Error
```
Note that these class names do not have a `Neo4j` prefix, since they'll only be exposed via this driver's `module.exports`, but for convenience, instances' `name` properties do have a `neo4j.` prefix, e.g. `neo4j.ClientError`.
## Let me manage the database schema: labels, indexes, and constraints.
Much of this is already possible with Cypher, but a few things aren't (e.g. listing all labels). Even for the things that are, this driver can provide convenience methods for those boilerplate Cypher queries.
```js
function cb(err, labels) {}

db.getLabels(cb);
```
Methods to manipulate labels on nodes are purposely not provided, as those start to get into the territory of operations that should really be performed atomically within a Cypher query. (E.g. labels should generally be set on nodes right when they're created.) Conveniences for those kinds of things are perfect for an ORM/OGM to solve.
Labels are simple strings.
```js
function cbOne(err, index) {}
function cbMany(err, indexes) {}
function cbBool(err, bool) {}
function cbDone(err) {}

db.getIndexes(cbMany);          // across all labels
db.getIndexes({label}, cbMany); // for a particular label
db.hasIndex({label, property}, cbBool);
db.createIndex({label, property}, cbOne);
    // callback receives null if this index already exists
db.dropIndex({label, property}, cbBool);
    // callback receives true if this index existed and was dropped,
    // false if it didn't exist in the first place
```
Returned indexes are minimal `Index` objects:

```
class Index {label, property}
```
TODO: Neo4j's REST API actually takes and returns arrays of properties, but AFAIK, all indexes today only deal with a single property. Should multiple properties be supported?
TODO: Today, there's no need for a `db.getIndex()` method, since the parameters you need to fetch an index are the same ones that are returned. But if Neo4j adds extra info to the response (e.g. online/offline status), we should add that method, along with that extra info in our `Index` class.
The only constraint type implemented by Neo4j today is the uniqueness constraint, so this API assumes that. Because of that, it's possible that this API may have to break whenever new constraint types are added (whatever they may be).
```js
function cbOne(err, constraint) {}
function cbMany(err, constraints) {}
function cbBool(err, bool) {}
function cbDone(err) {}

db.getConstraints(cbMany);          // across all labels
db.getConstraints({label}, cbMany); // for a particular label
db.hasConstraint({label, property}, cbBool);
db.createConstraint({label, property}, cbOne);
    // callback receives null if this constraint already exists
db.dropConstraint({label, property}, cbBool);
    // callback receives true if this constraint existed and was dropped,
    // false if it didn't exist in the first place
```
Returned constraints are minimal `Constraint` objects:

```
class Constraint {label, property}
```
TODO: Neo4j's REST API actually takes and returns arrays of properties, but uniqueness constraints today only deal with a single property. Should multiple properties be supported?
TODO: Today, there's no need for a `db.getConstraint()` method, since the parameters you need to fetch a constraint are the same ones that are returned. But if Neo4j adds extra info to the response (e.g. online/offline status), we should add that method, along with that extra info in our `Constraint` class.
```js
function cbKeys(err, keys) {}
function cbTypes(err, types) {}

db.getPropertyKeys(cbKeys);
db.getRelationshipTypes(cbTypes);
```
Neo4j v2's constraint-based indexing has yet to implement much of the functionality provided by Neo4j v1's legacy indexing (e.g. relationships, support for arrays, fulltext indexing). This driver thus provides legacy indexing APIs.
```js
function cbOne(err, index) {}
function cbMany(err, indexes) {}
function cbDone(err) {}

db.getLegacyNodeIndexes(cbMany);
db.getLegacyNodeIndex({name}, cbOne);
db.createLegacyNodeIndex({name, config}, cbOne);
db.deleteLegacyNodeIndex({name}, cbDone);

db.getLegacyRelationshipIndexes(cbMany);
db.getLegacyRelationshipIndex({name}, cbOne);
db.createLegacyRelationshipIndex({name, config}, cbOne);
db.deleteLegacyRelationshipIndex({name}, cbDone);
```
Both returned legacy node indexes and legacy relationship indexes are minimal `LegacyIndex` objects:

```
class LegacyIndex {name, config}
```

The `config` property is e.g. `{provider: 'lucene', type: 'fulltext'}`; full documentation here.
```js
function cbOne(err, node_or_rel) {}
function cbMany(err, nodes_or_rels) {}
function cbDone(err) {}

db.addNodeToLegacyIndex({name, key, value, _id}, cbOne);
db.getNodesFromLegacyIndex({name, key, value}, cbMany); // key-value lookup
db.getNodesFromLegacyIndex({name, query}, cbMany);      // arbitrary Lucene query
db.removeNodeFromLegacyIndex({name, key, value, _id}, cbDone); // key, value optional

db.addRelationshipToLegacyIndex({name, key, value, _id}, cbOne);
db.getRelationshipsFromLegacyIndex({name, key, value}, cbMany); // key-value lookup
db.getRelationshipsFromLegacyIndex({name, query}, cbMany);      // arbitrary Lucene query
db.removeRelationshipFromLegacyIndex({name, key, value, _id}, cbDone); // key, value optional
```
Neo4j v1's legacy indexing provides a uniqueness constraint for adding (existing) and creating (new) nodes and relationships. It has two modes:
- "Get or create": if an existing node or relationship is found for the given key and value, return it; otherwise add this node or relationship (creating it if it's new).
- "Create or fail": if an existing node or relationship is found for the given key and value, fail; otherwise add this node or relationship (creating it if it's new).
For adding existing nodes or relationships, simply pass `unique: true` to the `add` method.

```js
function cb(err, node_or_rel) {}

db.addNodeToLegacyIndex({name, key, value, _id, unique: true}, cb);
db.addRelationshipToLegacyIndex({name, key, value, _id, unique: true}, cb);
```
This uses the "create or fail" mode. It's hard to imagine a real-world use case for "get or create" when adding existing nodes, but please offer feedback if you have one.
For creating new nodes or relationships, the `create` method below corresponds with "create or fail", while `getOrCreate` corresponds with "get or create":

```js
function cb(err, node_or_rel) {}

db.createNodeFromLegacyIndex({name, key, value, properties}, cb);
db.getOrCreateNodeFromLegacyIndex({name, key, value, properties}, cb);

db.createRelationshipFromLegacyIndex({name, key, value, type, properties, _fromId, _toId}, cb);
db.getOrCreateRelationshipFromLegacyIndex({name, key, value, type, properties, _fromId, _toId}, cb);
```
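The difference between the two modes can be sketched with an in-memory map standing in for the index's (key, value) entries. This is purely illustrative; the real uniqueness check happens inside Neo4j:

```js
// Illustrative: an in-memory map stands in for a legacy index's
// (key, value) -> entity mapping, to contrast the two uniqueness modes.
function getOrCreate(index, key, value, makeEntity) {
    var k = key + '=' + value;
    if (index[k]) return index[k];      // found: return the existing entity
    return index[k] = makeEntity();     // not found: create and add it
}

function createOrFail(index, key, value, makeEntity) {
    var k = key + '=' + value;
    if (index[k]) throw new Error('already exists: ' + k);  // found: fail
    return index[k] = makeEntity();     // not found: create and add it
}
```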
Neo4j provides two automatic legacy indexes: `node_auto_index` and `relationship_auto_index`. Instead of hardcoding those index names in your app, Neo4j provides separate legacy auto-indexing APIs, which this driver exposes as well.

The APIs are effectively the same as the above; just replace `LegacyIndex` with `LegacyAutoIndex` in all method names, then omit the `name` parameter.
```js
function cbOne(err, node_or_rel) {}
function cbMany(err, nodes_or_rels) {}

db.getNodesFromLegacyAutoIndex({key, value}, cbMany); // key-value lookup
db.getNodesFromLegacyAutoIndex({query}, cbMany);      // arbitrary Lucene query
db.createNodeFromLegacyAutoIndex({key, value, properties}, cbOne);
db.getOrCreateNodeFromLegacyAutoIndex({key, value, properties}, cbOne);

db.getRelationshipsFromLegacyAutoIndex({key, value}, cbMany); // key-value lookup
db.getRelationshipsFromLegacyAutoIndex({query}, cbMany);      // arbitrary Lucene query
db.createRelationshipFromLegacyAutoIndex({key, value, type, properties, _fromId, _toId}, cbOne);
db.getOrCreateRelationshipFromLegacyAutoIndex({key, value, type, properties, _fromId, _toId}, cbOne);
```