Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Different behavior between Tinkergraph, Neo4j (embedded) and neo4j-gremlin-bolt #52

Closed
dlutz2 opened this issue Mar 31, 2017 · 14 comments · Fixed by #55
Closed

Different behavior between Tinkergraph, Neo4j (embedded) and neo4j-gremlin-bolt #52

dlutz2 opened this issue Mar 31, 2017 · 14 comments · Fixed by #55
Labels

Comments

@dlutz2
Copy link

dlutz2 commented Mar 31, 2017

In trying to transition from using Neo4j as a local store to a remote store using neo4j-gremlin-bolt, we have noticed that neo4j-gremlin-bolt behaves differently from Tinkergraph and Neo4j (embedded) when creating vertices and these vertices becoming visible in the (in memory) graph. The attached code illustrates the difference.
NeoBoltTest.txt

The expected behavior is that at the end of the loop, the in-memory graph contains 10 vertices, which is what is seen with Tinkergraph and Neo4j (embedded). With neo4j-gremlin-bolt, the vertex added does not seem to be immediately visible in the graph. The vertices do appear in the db and a second run of the code pulls them back but otherwise the behavior is the same.,
I have tried various combinations of try-with-resource, commit() and close(). The only combination I have found that produces the expected behavior is to commit() and then close() the graph after every addVertex(), which seems odd (although there is an issue with Tinkerpop regarding the vagueness of the semantics of close(): https://issues.apache.org/jira/browse/TINKERPOP-789 )

Perhaps I have missed something?

@rjbaucells
Copy link
Contributor

The neo4j-gremlin-bolt driver does not send any CYPHER statements to the server (CREATE, MERGE) while the transaction is open, it does a flush on commit updating the server with the in-memory changes.

@dlutz2
Copy link
Author

dlutz2 commented Mar 31, 2017

I think the difference is in the in-memory behavior e.g. adding a vertex and then immediately checking to see if that vertex is there before doing a commit.
Does that mean that any in-memory changes are not reflected in the in-memory graph until they have been committed and somehow "brought back" in memory?

@rjbaucells
Copy link
Contributor

If you add a Vertex it will be in memory only until you commit. There is no need to fetch the vertices after commit, they will stay in memory until the Graph is closed.

@dlutz2
Copy link
Author

dlutz2 commented Apr 3, 2017

That is the behavior I expected but what I see with this code:

for (int i = 0; i < 10; i++) {
  System.out.println("Before " + + g.V().toList().size());
  Vertex v = graph.addVertex(label);
  System.out.println("After " +  + g.V().toList().size() + "\n");
}

Is that the size of the in memory graph is always 1 (after the first addVertex).
If I commit after the loop, 10 instances show up in Neo.
If I run it a second time, the loop is always 11 (after the first addVertex), 20 instance show up in Neo.
The behavior with commits is what I expect, its the behavior of the in memory graph between instantiation and commits which is confusing me and which differs from the Tinkergraph and Neo4j embedded.
Am I doing something embarrassingly wrong here?

@dlutz2
Copy link
Author

dlutz2 commented Apr 6, 2017

Forgot to mention I was using the Neo4JNativeElementIdProvider. Poking around in the code, I see that
Neo4JSession.addVertex() creates the vertex (new Neo4JVertex(...) ) and then registers the vertex in Neo4JSession.vertices:

vertices.put(vertex.id(), vertex);

but using a Neo4JNativeElementIdProvider that vertex id will always be null at this point (since it hasn't been committed yet), so every new vertex overwrites the preceding ones in the vertices Map. I also see that the vertex is add to transientVertices but I can't tell if transient vertices participate in any use of Traversals. These transient vertices are then committed. This is consistent with the behavior I see where only the last vertex added is seen in the in-memory graph but all vertices get committed. Also consistent with the observed behavior that previously committed vertices (which do have Ids) are seen in the graph on subsequent runs.

@rjbaucells
Copy link
Contributor

Yes, I was aware of that problem and I was working on a solution for the native id provider. I have a local branch with the fix but I still need to work on the unit tests, some of the Tinkerpop tests are failing after the fix.

@dlutz2
Copy link
Author

dlutz2 commented Apr 6, 2017

Thanks very much.

@fenallen
Copy link

fenallen commented Apr 7, 2017

Just found this after much head scratching. I am falling foul of exactly the same issue.
Looking forward to the fix.

@rjbaucells
Copy link
Contributor

I have just released v0.2.23 with the fix for this bug. Wait for the release to be available on Maven Central

@dlutz2
Copy link
Author

dlutz2 commented Apr 7, 2017

Excellent. Thanks for the quick response. I'll try it out as soon as it appears

@dlutz2
Copy link
Author

dlutz2 commented Apr 14, 2017

Tried this out. Although it fixed the issue with adding vertices to the graph, newly created (uncommitted) vertices still have null ids, which causes null pointer exceptions when used in a traversal, e.g.

 Vertex v = graph.addVertex(label);
  g.V(v.id()).toList(); // should find newly created vertex

I think the underlying issue is that ids have to be assigned at vertex creation. Would you like me to open a new issue for this?

@rjbaucells
Copy link
Contributor

This is the behavior for the native id provider, I do not think there is much we can do in this case. IDs will be generated on the database after a CREATE statement is executed, something we can only do once the API consumer explicitly commit the transaction. Trying to execute CREATE statements without knowing all required fields have been set is not possible for the moment.

@dlutz2
Copy link
Author

dlutz2 commented Apr 17, 2017

Is this due to a difference in behavior/capability between the Bolt driver API and the embedded Neo4J API? The embedded implementation of Tinkerpop handles creates and transactions independently i.e. it creates a Neo4j vertex on graph.AddVertex() which may or may not get committed later.

@rjbaucells
Copy link
Contributor

Embedded driver will create an actual Element on addVertex() so it will be able to assign an identifier at that time, BOLT driver requires a CREATE statement execution (database is on the other side of the network) and an explicit RETURN clause to retrieve the identifier. We cannot issue the CREATE statement until all required properties are populated, action that is indicated by an explicit commit call from the API consumer. Change the ID provider to one that allocates the identifier on node creation by implementing Neo4JElementIdProvider.generate()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants