Different behavior between Tinkergraph, Neo4j (embedded) and neo4j-gremlin-bolt #52

dlutz2 · 2017-03-31T18:06:30Z

In trying to transition from using Neo4j as a local store to a remote store using neo4j-gremlin-bolt, we have noticed that neo4j-gremlin-bolt behaves differently from Tinkergraph and Neo4j (embedded) when creating vertices and these vertices becoming visible in the (in memory) graph. The attached code illustrates the difference.
NeoBoltTest.txt

The expected behavior is that at the end of the loop, the in-memory graph contains 10 vertices, which is what is seen with Tinkergraph and Neo4j (embedded). With neo4j-gremlin-bolt, the vertex added does not seem to be immediately visible in the graph. The vertices do appear in the db and a second run of the code pulls them back but otherwise the behavior is the same.,
I have tried various combinations of try-with-resource, commit() and close(). The only combination I have found that produces the expected behavior is to commit() and then close() the graph after every addVertex(), which seems odd (although there is an issue with Tinkerpop regarding the vagueness of the semantics of close(): https://issues.apache.org/jira/browse/TINKERPOP-789 )

Perhaps I have missed something?

rjbaucells · 2017-03-31T18:14:09Z

The neo4j-gremlin-bolt driver does not send any CYPHER statements to the server (CREATE, MERGE) while the transaction is open, it does a flush on commit updating the server with the in-memory changes.

dlutz2 · 2017-03-31T18:52:13Z

I think the difference is in the in-memory behavior e.g. adding a vertex and then immediately checking to see if that vertex is there before doing a commit.
Does that mean that any in-memory changes are not reflected in the in-memory graph until they have been committed and somehow "brought back" in memory?

rjbaucells · 2017-03-31T19:26:52Z

If you add a Vertex it will be in memory only until you commit. There is no need to fetch the vertices after commit, they will stay in memory until the Graph is closed.

dlutz2 · 2017-04-03T13:18:45Z

That is the behavior I expected but what I see with this code:

for (int i = 0; i < 10; i++) {
  System.out.println("Before " + + g.V().toList().size());
  Vertex v = graph.addVertex(label);
  System.out.println("After " +  + g.V().toList().size() + "\n");
}

Is that the size of the in memory graph is always 1 (after the first addVertex).
If I commit after the loop, 10 instances show up in Neo.
If I run it a second time, the loop is always 11 (after the first addVertex), 20 instance show up in Neo.
The behavior with commits is what I expect, its the behavior of the in memory graph between instantiation and commits which is confusing me and which differs from the Tinkergraph and Neo4j embedded.
Am I doing something embarrassingly wrong here?

dlutz2 · 2017-04-06T17:45:39Z

Forgot to mention I was using the Neo4JNativeElementIdProvider. Poking around in the code, I see that
Neo4JSession.addVertex() creates the vertex (new Neo4JVertex(...) ) and then registers the vertex in Neo4JSession.vertices:

vertices.put(vertex.id(), vertex);

but using a Neo4JNativeElementIdProvider that vertex id will always be null at this point (since it hasn't been committed yet), so every new vertex overwrites the preceding ones in the vertices Map. I also see that the vertex is add to transientVertices but I can't tell if transient vertices participate in any use of Traversals. These transient vertices are then committed. This is consistent with the behavior I see where only the last vertex added is seen in the in-memory graph but all vertices get committed. Also consistent with the observed behavior that previously committed vertices (which do have Ids) are seen in the graph on subsequent runs.

rjbaucells · 2017-04-06T19:46:35Z

Yes, I was aware of that problem and I was working on a solution for the native id provider. I have a local branch with the fix but I still need to work on the unit tests, some of the Tinkerpop tests are failing after the fix.

dlutz2 · 2017-04-06T19:51:55Z

Thanks very much.

fenallen · 2017-04-07T10:28:53Z

Just found this after much head scratching. I am falling foul of exactly the same issue.
Looking forward to the fix.

rjbaucells · 2017-04-07T16:16:03Z

I have just released v0.2.23 with the fix for this bug. Wait for the release to be available on Maven Central

dlutz2 · 2017-04-07T16:22:31Z

Excellent. Thanks for the quick response. I'll try it out as soon as it appears

dlutz2 · 2017-04-14T16:28:40Z

Tried this out. Although it fixed the issue with adding vertices to the graph, newly created (uncommitted) vertices still have null ids, which causes null pointer exceptions when used in a traversal, e.g.

 Vertex v = graph.addVertex(label);
  g.V(v.id()).toList(); // should find newly created vertex

I think the underlying issue is that ids have to be assigned at vertex creation. Would you like me to open a new issue for this?

rjbaucells · 2017-04-14T18:29:24Z

This is the behavior for the native id provider, I do not think there is much we can do in this case. IDs will be generated on the database after a CREATE statement is executed, something we can only do once the API consumer explicitly commit the transaction. Trying to execute CREATE statements without knowing all required fields have been set is not possible for the moment.

dlutz2 · 2017-04-17T13:39:24Z

Is this due to a difference in behavior/capability between the Bolt driver API and the embedded Neo4J API? The embedded implementation of Tinkerpop handles creates and transactions independently i.e. it creates a Neo4j vertex on graph.AddVertex() which may or may not get committed later.

rjbaucells · 2017-04-17T14:11:41Z

Embedded driver will create an actual Element on addVertex() so it will be able to assign an identifier at that time, BOLT driver requires a CREATE statement execution (database is on the other side of the network) and an explicit RETURN clause to retrieve the identifier. We cannot issue the CREATE statement until all required properties are populated, action that is indicated by an explicit commit call from the API consumer. Change the ID provider to one that allocates the identifier on node creation by implementing Neo4JElementIdProvider.generate()

rjbaucells added the bug label Apr 7, 2017

rjbaucells mentioned this issue Apr 7, 2017

transient element index #55

Merged

rjbaucells closed this as completed in #55 Apr 7, 2017

dlutz2 mentioned this issue Apr 21, 2017

Plans for a Bolt-based api impl? neo4j-contrib/neo4j-tinkerpop-api-impl#5

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Different behavior between Tinkergraph, Neo4j (embedded) and neo4j-gremlin-bolt #52

Different behavior between Tinkergraph, Neo4j (embedded) and neo4j-gremlin-bolt #52

dlutz2 commented Mar 31, 2017 •

edited

rjbaucells commented Mar 31, 2017

dlutz2 commented Mar 31, 2017 •

edited

rjbaucells commented Mar 31, 2017

dlutz2 commented Apr 3, 2017 •

edited

dlutz2 commented Apr 6, 2017

rjbaucells commented Apr 6, 2017

dlutz2 commented Apr 6, 2017

fenallen commented Apr 7, 2017

rjbaucells commented Apr 7, 2017

dlutz2 commented Apr 7, 2017

dlutz2 commented Apr 14, 2017

rjbaucells commented Apr 14, 2017

dlutz2 commented Apr 17, 2017

rjbaucells commented Apr 17, 2017

Different behavior between Tinkergraph, Neo4j (embedded) and neo4j-gremlin-bolt #52

Different behavior between Tinkergraph, Neo4j (embedded) and neo4j-gremlin-bolt #52

Comments

dlutz2 commented Mar 31, 2017 • edited

rjbaucells commented Mar 31, 2017

dlutz2 commented Mar 31, 2017 • edited

rjbaucells commented Mar 31, 2017

dlutz2 commented Apr 3, 2017 • edited

dlutz2 commented Apr 6, 2017

rjbaucells commented Apr 6, 2017

dlutz2 commented Apr 6, 2017

fenallen commented Apr 7, 2017

rjbaucells commented Apr 7, 2017

dlutz2 commented Apr 7, 2017

dlutz2 commented Apr 14, 2017

rjbaucells commented Apr 14, 2017

dlutz2 commented Apr 17, 2017

rjbaucells commented Apr 17, 2017

dlutz2 commented Mar 31, 2017 •

edited

dlutz2 commented Mar 31, 2017 •

edited

dlutz2 commented Apr 3, 2017 •

edited