Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TINKERPOP-1585 & TINKERPOP-1590: DedupGlobalStep and Distributed TinkerMemory #524

Merged
merged 16 commits into from
Jan 9, 2017

Conversation

okram
Copy link
Contributor

@okram okram commented Jan 6, 2017

https://issues.apache.org/jira/browse/TINKERPOP-1585
https://issues.apache.org/jira/browse/TINKERPOP-1590

This was a week long project with @dkuppitz. In short, we made DedupGlobalStep a bit better. It had a bug where it was deduping (in OLAP) twice at the master traversals. That is no longer and issue. Moreover, I added an explicit barrier data structure to reduce the amount of data movement between the master barrier and the master step. When working on this, I originally thought the bug was in TinkerMemory (TinkerGraphComputer) where there is lock contention for barrier data. There is! and I was able to seep up a toy traversal we had from 7 seconds to 2.9 seconds cause of it by having each TinkerWorker maintain its own memory data structure that is then merged into the master TinkerMemory when the iteration is complete. Standard, we just never thought to do this. Lots of other nick nack goodies...

* Fixed an optimization bug in OLAP-based `DedupGlobalStep` where deduping occurred twice.
* `MemoryComputeKey` now implements `Cloneable` which is useful for `BiOperator` reducers that maintain thread-unsafe state.
* `TinkerGraphComputer` now supports distributed `Memory` with lock-free partition aggregation.

VOTE +1.

… which is already dedup'd is added to the master traveral's dedup-step and then dedup'd again. I simple boolean flag added to determine if the master traversal's dedup is getting data from workers or not and thus, if so, don't dedup again.
…read before propagated Memory to TinkerMemory. This reduces synchronization issues due all threads contending to mutate the master memory.
… method). TinkerGraphComputer now how a sound distributed Memory system where each worker/thread aggregates without concurrency locally and then, at the end of the iteration, the thread-distributed memories are aggregated into the main TinkerMemory.
…ally important we have NOT been cloning the BiOperators of OrderGlobalStep and GroupStep. We have just been 'getting lucky' in that Spark and Giraph use Serialization and thus we get a clone for free. However, for parallelization within a JVM, we woulld have issues except we never realized because we had a single global Memory for TinkerGraph. Now we don't and clone()ing bi operators works.
…dkuppitz to verify performance improvement of DedupGlobalStep on SparkGraphComputer.
…r. Also, realized some dumb things I was doing in TraversalVertexProgram. Its crazy, for this benchmark that @dkuppitz and i have, if I don't touch vertex.properties that are compute keys: millisecond return times. If I do, seconds return times.... Need to figure out how to partition TinkerGraphView... Perhaps thread local..dah.
…ead has their own Iterator<Vertex> partition. In TinkerPop 3.3.0 (with Partitioner) this will be replaced (easily).
…the addition of worker barriers into the master traversal. A slight performance increase.
…rs are the next thing. Also, detachment factory stuff will help reduce barrier sizes. Found a bug in SparkInterceptorStrategyTest. Fixed.
this.barrier = null;
}
while (this.barrierIterator != null && this.barrierIterator.hasNext()) {
if (null == this.barrierIterator)
Copy link
Contributor

@dkuppitz dkuppitz Jan 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this.barrierIterator can never be null within the while() loop. Unless I overlooked something fundamental, processNextStart can be simplified to:

protected Traverser.Admin<S> processNextStart() {
    if (null != this.barrier) {
        this.barrierIterator = this.barrier.entrySet().iterator();
        this.barrier = null;
        while (this.barrierIterator.hasNext()) {
            final Map.Entry<Object, Traverser.Admin<S>> entry = this.barrierIterator.next();
            if (this.duplicateSet.add(entry.getKey()))
                return PathProcessor.processTraverserPathLabels(entry.getValue(), this.keepLabels);
        }
    }
    return PathProcessor.processTraverserPathLabels(super.processNextStart(), this.keepLabels);
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your code fails. Did you run it? ... The problem is that this.barrier is null'd but the barrierIterator still exists and you still need to fetch results from it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't run it. Just thought we can get rid of the multiple null checks.

@@ -142,7 +142,7 @@ public void shouldSuccessfullyEvaluateInterceptedTraversals() throws Exception {
test(6l, g.V().out().values("name").count());
test(2l, g.V().out("knows").values("name").count());
test(3l, g.V().in().has("name", "marko").count());
test(6l, g.V().repeat(__.dedup()).times(2).count());
test(0l, g.V().repeat(__.dedup()).times(2).count());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did that change? 0 as the expected result looks kinda wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is crazy. The right answer is 0. Think about it. It goes through once, fine, all 6 vertices. The second time -- already seem them! Filter. Thus, 0.

However, I just checked OLTP and it assumes 6. Wondering what is "right" ?. ....

gremlin> g = TinkerFactory.createModern().traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> g.V().repeat(dedup()).times(2)
==>v[1]
==>v[2]
==>v[3]
==>v[4]
==>v[5]
==>v[6]
gremlin> g.withComputer().V().repeat(dedup()).times(2)
gremlin>

Whatever, is decided as the correct answer, we should definitely put this into RepeatTest. I just randomly had this query in a Spark test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed in IM. This is actually a bug in RepeatUnrollStrategy that was introduced in 3.2, but doesn't exist in 3.1. Added a test case to DedupTest and both OLTP and OLAP now behave as expected.

…ke DedupGlobalStep can not be unrolled else you have split the global state amongst times(x) local steps. Added a test case to DedupGlobalStep g.V().repeat(dedup()).times(2).
@dkuppitz
Copy link
Contributor

dkuppitz commented Jan 7, 2017

docker/build.sh -t -i succeeded.

VOTE: +1

@twilmes
Copy link
Contributor

twilmes commented Jan 8, 2017

Nice set of updates.

docker/build.sh -t -i success

VOTE: +1

@asfgit asfgit merged commit 3fd74fc into tp32 Jan 9, 2017
@asfgit asfgit deleted the TINKERPOP-1585 branch February 21, 2017 14:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants