Skip to content

Remove null values to fix TINKERPOP-1923#818

Closed
jbaker-nca wants to merge 1 commit intoapache:tp32from
jbaker-nca:TINKERPOP-1923
Closed

Remove null values to fix TINKERPOP-1923#818
jbaker-nca wants to merge 1 commit intoapache:tp32from
jbaker-nca:TINKERPOP-1923

Conversation

@jbaker-nca
Copy link
Copy Markdown

No description provided.

@spmallette
Copy link
Copy Markdown
Contributor

Thanks for looking into this issue. I saw your issue in JIRA and your SO question, but just haven't had a chance to really dig into it. I can see how your solution would get rid of the NullPointerException, but do you understand the root cause of why this happens in the first place? Could you perhaps add your reasoning as to why this is the right fix in that context in the description of this pull request?

Also, if this is the right fix, could you get rid of the use of stream() in favor of TinkerPop's IteratorUtils.filter() - we've found stream() to sadly be far less performant.

Finally, could you please retarget this PR to the tp32 branch as I think this fix should be applied across all currently maintained versions? Note that we have a release coming in the next two weeks so this PR arrived just in time - thanks again.

@jbaker-nca jbaker-nca changed the base branch from master to tp32 March 16, 2018 11:18
@jbaker-nca
Copy link
Copy Markdown
Author

Hi @spmallette, I've made the changes requested.

My understanding is that in some cases, when removing nodes and/or edges, the collection of edges on a vertex can end up containing null values (I believe they should be properly removed rather than replaced with nulls, but my knowledge of the Tinkerpop codebase is very limited so I wasn't sure why this would be happening). These null values were causing issues when looping across the collection, which was resulting in the exception thrown.

@robertdale
Copy link
Copy Markdown
Member

@jbaker-nca Is there a test case that would fail without this fix so it can catch regressions?

@spmallette
Copy link
Copy Markdown
Contributor

when removing nodes and/or edges, the collection of edges on a vertex can end up containing null values

right, i think i followed that part - i was asking more about the "why" nulls were showing up in the first place, but it sounds like you hadn't gotten that far (not a problem - i understand that you're new to the code base). I think we need to sort out the "why" before we merge a "fix" or else that fix might not be the right one.

@rcaunt
Copy link
Copy Markdown

rcaunt commented Mar 22, 2018

@spmallette I've been working with @jbaker-nca to understand this issue a bit more.

Our use case for discovering this bug was essentially merging two separate graphs into a single graph.

When two vertices are to be merged we copy the edges from the original vertex to a new one and then proceed to remove the edges on that vertex using the following:

Object[] edgeIdsArr = edgeIds.toArray();	
if(edgeIdsArr.length > 0)	
    graph.traversal().E(edgeIdsArr).drop().iterate();

Once the vertices are merged we then remove the original vertices from the graph:

Object[] ids = mergeGroup.stream().map(Element::id).filter(Objects::nonNull).toArray();
if(ids != null && ids.length > 0)
    graph.traversal().V(ids).drop().iterate();

This second traversal on vertex removal is where the error would occur yielding the following exception:

Exception in thread "main" java.lang.NegativeArraySizeException
        at java.util.AbstractCollection.toArray(Unknown Source)
        at java.util.ArrayList.addAll(Unknown Source)
        at java.lang.Iterable.forEach(Unknown Source)
        at org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerHelper.getEdges(TinkerHelper.java:173)
        at org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex.edges(TinkerVertex.java:146)
        at org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex.remove(TinkerVertex.java:131)
        at org.apache.tinkerpop.gremlin.process.traversal.step.filter.DropStep.filter(DropStep.java:67)
        at org.apache.tinkerpop.gremlin.process.traversal.step.filter.FilterStep.processNextStart(FilterStep.java:38)
        at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.next(AbstractStep.java:128)
        at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.next(AbstractStep.java:38)
        at org.apache.tinkerpop.gremlin.process.traversal.Traversal.iterate(Traversal.java:198)
        at org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversal.iterate(GraphTraversal.java:2677)
        at org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversal$Admin.iterate(GraphTraversal.java:176)
        at org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.DefaultGraphTraversal.iterate(DefaultGraphTraversal.java:48)

I noticed the following Google Groups post - https://groups.google.com/forum/#!topic/gremlin-users/CGkuoBkNMGI - which indicates that when removing a vertex the edges should be removed at the same time. Therefore I modified our code to no longer remove the edges and only remove the vertices.

This appears to have fixed our issue and we have reverted back to Tinkerpop 3.3.1 without any further issues. However, I suspect that one should be able to remove edges and then vertices without encountering the issue.

@spmallette
Copy link
Copy Markdown
Contributor

@rcaunt how did you do this part:

copy the edges from the original vertex to a new one

@jbaker-nca
Copy link
Copy Markdown
Author

The copying code is as follows (I've simplified a little to make it clearer what's happening, as it's actually part of a bigger chunk code):

        List<Edge> newEdges = new ArrayList<>();
        //Copy edges from orig onto onto v
        for (Edge eOrig : graph.traversal().V(orig.id()).inE().toSet()) {
           if (newEdges.contains(eOrig))
                continue;

            LOGGER.debug("Copying IN edge {}, from {}", eOrig.id(), eOrig.outVertex().id());
            Vertex vOut = eOrig.outVertex();

            Edge e = vOut.addEdge(eOrig.label(), v);
            ElementUtils.copyProperties(eOrig, e);

            LOGGER.debug("Recording new edge {}", eOrig.id());
            newEdges.add(e);
        }

        for (Edge eOrig : graph.traversal().V(orig.id()).outE().toSet()) {
            if (newEdges.contains(eOrig))
                continue;

            LOGGER.debug("Copying OUT edge {}, to {}", eOrig.id(), eOrig.inVertex().id());
            Vertex vIn = eOrig.inVertex();

            Edge e = v.addEdge(eOrig.label(), vIn);
            ElementUtils.copyProperties(eOrig, e);

            LOGGER.debug("Recording new edge {}", eOrig.id());
            newEdges.add(e);
        }

        LOGGER.debug("{} edges copied", newEdges.size());

@spmallette
Copy link
Copy Markdown
Contributor

I think I'm going to have to close this because I can't recreate the problem with the current information supplied and I don't want to merge this PR as-is because it seems like it's just making the issue go away without actually addressing the root of the issue. As there is a workaround (i.e. drop the vertex which drops the edges with it) I don't think we have a bad blocker here. If the issue arises in the future and can be recreated, I'd be happy to re-open this.

I still feel like there's something multi-threaded at the root of this somehow, but I can't think of how since you've confirmed that you don't have anything like that going on. If the issue does stem from multiple threads modifying some aspect of the TinkerGraph at the same time then this problem isn't an issue as TinkerGraph is not meant to be thread-safe for mutations.

@asfgit asfgit closed this in da91a53 Apr 25, 2018
@zhulik
Copy link
Copy Markdown

zhulik commented Jun 13, 2022

The issue is still there and happens very often with the tinkerpop/gremlin-server:3.5 docker image:

project-gremlin_server-1  | [WARN] TraversalOpProcessor - Exception processing a Traversal on iteration for request [6564f5d1-73f7-44d5-ad1d-97bb2aadd582].
project-gremlin_server-1  | java.lang.NegativeArraySizeException: -1
project-gremlin_server-1  |      at java.base/java.util.AbstractCollection.toArray(AbstractCollection.java:139)
project-gremlin_server-1  |      at java.base/java.util.ArrayList.addAll(ArrayList.java:702)
project-gremlin_server-1  |      at java.base/java.util.HashMap$Values.forEach(HashMap.java:977)
project-gremlin_server-1  |      at org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerHelper.getEdges(TinkerHelper.java:172)
project-gremlin_server-1  |      at org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex.edges(TinkerVertex.java:158)
project-gremlin_server-1  |      at org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex.remove(TinkerVertex.java:143)
project-gremlin_server-1  |      at org.apache.tinkerpop.gremlin.process.traversal.step.filter.DropStep.filter(DropStep.java:68)
project-gremlin_server-1  |      at org.apache.tinkerpop.gremlin.process.traversal.step.filter.FilterStep.processNextStart(FilterStep.java:38)
project-gremlin_server-1  |      at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:150)
project-gremlin_server-1  |      at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.next(ExpandableStepIterator.java:55)
project-gremlin_server-1  |      at org.apache.tinkerpop.gremlin.process.traversal.step.filter.FilterStep.processNextStart(FilterStep.java:37)
project-gremlin_server-1  |      at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:150)
project-gremlin_server-1  |      at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.next(ExpandableStepIterator.java:55)
project-gremlin_server-1  |      at org.apache.tinkerpop.gremlin.process.traversal.step.map.ScalarMapStep.processNextStart(ScalarMapStep.java:39)
project-gremlin_server-1  |      at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:150)
project-gremlin_server-1  |      at org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversal.hasNext(DefaultTraversal.java:226)
project-gremlin_server-1  |      at org.apache.tinkerpop.gremlin.server.util.TraverserIterator.fillBulker(TraverserIterator.java:69)
project-gremlin_server-1  |      at org.apache.tinkerpop.gremlin.server.util.TraverserIterator.hasNext(TraverserIterator.java:56)
project-gremlin_server-1  |      at org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor.handleIterator(TraversalOpProcessor.java:302)
project-gremlin_server-1  |      at org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor.lambda$iterateBytecodeTraversal$0(TraversalOpProcessor.java:222)
project-gremlin_server-1  |      at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
project-gremlin_server-1  |      at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
project-gremlin_server-1  |      at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
project-gremlin_server-1  |      at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
project-gremlin_server-1  |      at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
project-gremlin_server-1  |      at java.base/java.lang.Thread.run(Thread.java:829)

I use gremlin-server as a replacement for AWS Neptune locally and on CI. Main use case - create tens of thousands of vertices and nodes(in parallel from multiple processes), run some queries, drop all the data with g.V().drop(). The error occurs even if I delete edges with g.E().drop() first.

The script that imports the data heavily relies on the upsert pattern, multiple processes may try to upsert same vertices and edges in parallel.

After the first error, I'm able to drop all vertices one by one except the one which probably caused the error. Attempts to drop it lead to the same error. That vertex was created with an upsert. Server has to be restarted to act normally.

From my experience the error occurs more often if I use more parallel processes to import my data.

@zhulik
Copy link
Copy Markdown

zhulik commented Jun 13, 2022

I prepared a script that 100% reproduces the issue: https://github.com/zhulik/gremlin_server_error_demo

@zhulik
Copy link
Copy Markdown

zhulik commented Jun 14, 2022

Screenshot_20220614_174204

Well, wrapping lines that may cause the issue in a try...catch block fixes it 😆

@spmallette
Copy link
Copy Markdown
Contributor

@zhulik This appears to be a TinkerGraph problem. If the issue occurs when doing parallel mutations to the graph then that likely expected given how TinkerGraph is designed. It is not really safe for multi-threaded writes (multi-threaded reads should typically not be a problem). I'd recommend that if you use Gremlin Server for testing that you switch from TinkerGraph to something more robust like neo4j, bitsy, janusgraph, etc. - i.e. a graph more suited to handling multi-threading.

@zhulik
Copy link
Copy Markdown

zhulik commented Jun 16, 2022

@spmallette well, I've just tested neo4j, bitsy and janusgraph and they don't fit: I need user provided identifiers and I need identifiers to be strings. From what I see only Neptune and Tinkergraph have full support for them

@spmallette
Copy link
Copy Markdown
Contributor

even TinkerGraph isn't a good match as it doesn't support multi-request transactions, but perhaps you aren't using those. i'm not sure what else to advise at the moment i'm afraid.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants