Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TINKERPOP-1822: Change default RepeatStep to DFS #838

Closed
wants to merge 1 commit into from

Conversation

krlohnes
Copy link
Contributor

No description provided.

@krlohnes
Copy link
Contributor Author

I picked up where #715 left off. I'm getting some errors around

Class is not registered: org.apache.tinkerpop.gremlin.process.traversal.SearchAlgo
Note: To register this class use: kryo.register(org.apache.tinkerpop.gremlin.process.traversal.SearchAlgo.class);

when I run the tests.

I'm not sure where I need to register that SearchAlgo enum, if someone could point me in the right direction for that. I'm still working through getting the tests running, but I wanted to get some feedback on this. It works with ./bin/gremlin.sh just fine, I think it's test configuration at this point. I haven't implemented this for the computer algorithm yet either, so I don't know if that will impact the test runs.

I wanted to get some feedback on this before pushing forward with figuring out all of the tests.

@mpollmeier This is what I came up with since I last commented on your PR.

Copy link
Contributor

@mpollmeier mpollmeier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! This works on the console, I've also tested my previous test case (see below).

As discussed on https://groups.google.com/forum/#!topic/gremlin-users/gLSLxH_K-wE I actually think we should make DFS the default for OLTP, since that's clearly what the user would expect and the docs suggest.

I may be able to help with the tests, will have a look over the weekend. Looking forward to comments from others.

IMO this is a more obvious test:
Example graph: v5 <- v3 <- v1 <- v0 -> v2 -> v4 -> v6

graph = TinkerGraph.open()
v0 = graph.addVertex("l0")
v1 = graph.addVertex("l1")
v2 = graph.addVertex("l1")
v3 = graph.addVertex("l2")
v4 = graph.addVertex("l2")
v5 = graph.addVertex("l3")
v6 = graph.addVertex("l3")
v0.addEdge("e", v2)
v2.addEdge("e", v4)
v4.addEdge("e", v6)
v0.addEdge("e", v1)
v1.addEdge("e", v3)
v3.addEdge("e", v5)

graph.traversal().V(v0).emit().repeat(sideEffect{println("inside repeat at " + it)}.out()).times(3)

==>v[0]
inside repeat at v[0]
==>v[2]
inside repeat at v[2]
==>v[4]
inside repeat at v[4]
==>v[6]
==>v[1]
inside repeat at v[1]
==>v[3]
inside repeat at v[3]
==>v[5]

* @param algo either DFS or BFS
*
* @see <a href="http://tinkerpop.apache.org/docs/${project.version}/reference/#order-step" target="_blank">Reference Documentation - Order Step</a>
* @since 3.0.0-incubating
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@since 3.3.3

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed


@Override
public boolean hasNext() {
return super.hasNext() || this.stashedStarts.peek() != null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this.stashedStarts.isEmpty()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this.stashedStarts.peek() != null would be equivalent to !this.stashedStarts.isEmpty(), no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Fixed that.

@krlohnes krlohnes force-pushed the add_optional_depth_first branch 3 times, most recently from 861dbfd to 27111dc Compare April 13, 2018 14:12
@krlohnes
Copy link
Contributor Author

krlohnes commented Apr 13, 2018

I figured out the errors that I was getting with serialization. I think I just missed registering SearchAlgo in a few spots. I have some legitimate test failures now I think, which makes fixing things quite a bit easier, I can take a swing at those over the weekend. @mpollmeier

I do like the test you're suggesting, I was trying to come up with a test case for an existing dataset in the test infrastructure to make sure things worked there for now. I think testing on a k-ary tree might be worth while too. A lot of bugs I found when I was testing the general idea were found in that case.

As far as making DFS the default, my only concern would be breaking existing traversals. I don't know if there are traversals out there that depend on repeat being BFS that would need to be fixed with a change for the default. I'd think there would be a desire to not change that behavior default until the next minor version bump. I'd want to give users a bit of time to know this exists, and make the switch after there's some awareness of the option to toggle between the two.

@krlohnes
Copy link
Contributor Author

I'm definitely still missing something around serialization. All of the traversals that are failing in the gryo side of things work in the console. I'm fairly unfamiliar with that part of the code, but I'll look a bit this weekend if I have some time. If someone more familiar with it sees something obvious I've missed, let me know and I'm happy to add it.

@robertdale
Copy link
Member

Look at the Serializers in ./gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/io/

@mpollmeier
Copy link
Contributor

If I run mvn clean test I also get heaps of compilation errors, but that's the same with master for me. E.g.

[ERROR]   symbol:   class JsonGenerator
[ERROR]   location: class org.apache.tinkerpop.gremlin.structure.io.graphson.TraversalSerializersV3d0.TraversalJacksonSerializer
[ERROR] /home/mp/Projects/tinkerpop/tinkerpop3/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/io/graphson/TraversalSerializersV3d0.java:[74,99] cannot find symbol
[ERROR]   symbol:   class SerializerProvider
[ERROR]   location: class org.apache.tinkerpop.gremlin.structure.io.graphson.TraversalSerializersV3d0.TraversalJacksonSerializer
[ERROR] -> [Help 1]

Is that the same you're getting @krlohnes?
If you just run mvn clean install as per http://tinkerpop.apache.org/docs/current/dev/developer/#building-testing it compiles fine (for me) and also runs tests which all pass, but then it stalls at TinkerGraphProcessComputerTest.

Travis reports four failed builds. Two of them time out, no idea what's going on there. The other two fail for a missing integration into .net and JS. Not really my cup of tea...

Note: I also had to comment out <revapi.skip>false</revapi.skip> in gremlin-core/pom.xml. No idea what that is, but it runs into a NPE for me.

@robertdale how do you normally run the tests, and how can we fix the .net/JS stuff?

@krlohnes
Copy link
Contributor Author

@mpollmeier I was running mvn clean install locally, and had the same issue with that test stalling out. A few builds before this had different errors in Travis, not just stalling out, so it looks like some of the changes i pushed EOD took care of those.

I can take care of the js issue and the .net stuff.

I'm not sure about the TinkerGraphProcessComputerTest stalling out though or the Travis stuff stalling out.

I'll take a swing at the js and .net stuff and see how that plays out for now.

@spmallette
Copy link
Contributor

Sorry, but I haven't had time to dig into this into any great detail. Hopefully, sometime this week. Here's some quick comments based on the discussion:

I actually think we should make DFS the default for OLTP
I'd want to give users a bit of time to know this exists, and make the switch after there's some awareness of the option to toggle between the two.

That's a topic that is probably worth a DISCUSS thread on the dev mailing list. I typically reference threads like that as a link in the associated JIRA.

mvn clean test

That should work, but typically only after mvn clean install of a new branch. gremlin-shaded and hadoop-gremlin typically cause that problem, as a result, you really can only do mvn clean install at least until 3.4.x when i get some changes in (maybe...hopefully).

@spmallette
Copy link
Contributor

I'll take a swing at the js and .net stuff and see how that plays out for now.

i meant to mention this in my last post but forgot....just a suggestion but you might want to wait a bit for feedback before forging ahead too much further. maybe let that DISCUSS thread you started play its course a bit first. already some questions popping up over there and i know i have some myself. don't want you to have to do more work that is necessary on this in case something has to change.

@krlohnes
Copy link
Contributor Author

Thanks @spmallette I'll hold off on any more changes until things are worked out with that DISCUSS thread

@krlohnes krlohnes force-pushed the add_optional_depth_first branch 4 times, most recently from b75885c to 64a7a60 Compare May 20, 2018 20:33
@krlohnes krlohnes changed the title TINKERPOP-1822: Add Depth First Search repeat step option TINKERPOP-1822: Change default RepeatStep to DFS May 20, 2018
final Traverser.Admin<S> start = this.starts.next();
final Traverser.Admin<S> start = nextStart(repeatStep, useDfs);
if (useDfs) {
final List<Traverser.Admin<S>> localStarts = new ArrayList<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason you need to allocate a List here? could you not write directly to stashedStarts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this could be done with an index and writing directly to stashed starts, definitely. will fix.

@@ -273,11 +300,40 @@ public RepeatEndStep(final Traversal.Admin traversal) {
super(traversal);
}

final LinkedList<Traverser.Admin<S>> stashedStarts = new LinkedList<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you given any thought to the memory requirements of stashedStarts? Seems like that approach could be really intensive for a large graph (i.e. a traversal that touches a lot of data)? any thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay, things have gotten fairly busy for me. I'd given a little bit of thought around this, but not a whole lot. I don't think this needs to be a linked list, it could likely an ArrayList or an ArrayDeque and the entries would take up less memory. ArrayDeque may be preferable for large graphs since it'll resize less frequently than an ArrayList, but it will consume more memory than an ArrayList by default.

However, I think having a stashedStarts here is necessary from the view of "one piece of code serving 2 algorithms" If this were to be completely rewritten as DFS (I don't know how to do that at this point, but for the sake of argument), and there was a desire to use the same code to utilize BFS, there'd likely be a need to have stashed starts to achieve BFS.

That's about as far as I've gotten in thinking about this for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An ArrayList sounds reasonable to me as far as the right List implementation for how stashedStarts is being used. When I'd posed the question I was more thinking about the more general increased memory requirements for doing DFS in this fashion as we basically have to accumulate what could be a fairly large list in memory in order to perform this operation. I suppose that we do such things in fold()-ing steps but the user is explicitly aware of their choice to do that when they use such a step. In the case of stashedStarts the memory requirement for choosing DFS is somewhat hidden as it's not clear from the Gremlin they've written that an internal list is being built. Perhaps that's simply a side-effect of allowing this to work in the way that it does (as you alluded to in your comment).

I'm still thinking that DFS will be something that users will invoke in specific use cases where they will be more aware of the consequences of what it is that they are doing. If that is the case, then this would perhaps just be one more consequence of making that choice to consider.

I'd be curious to see some JFRs around the different modes of execution that we now have. Perhaps some microbenchmarks are in order too. And then the fun part....does OLAP still work without any change? 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@spmallette Any thoughts on what might be some decent data/traversals for the JFRs/microbenchmarks around this?

As far as OLAP goes, what are the expectations there? I haven't made any code changes to the computer algorithm yet since I'm not terribly familiar with that side of TinkerPop, so I think things should still be working working as before. I guess the question here is code changes or documentation changes when it comes to OLAP?

I'm hopefully going to spend some time this weekend or the next around this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any thoughts on what might be some decent data/traversals for the JFRs/microbenchmarks around this?

Maybe just start with the Grateful Dead dataset? I think it might be sufficiently complex to yield a good test of the different approaches we have now. If not, maybe we need to generate something artificial.

Personally, I'd love to see a JFR that executes the same traversal with each of the three configurations that we now have with a Thread.sleep() between them so that we can easily distinguish when one traversal stops and the next starts. Not sure what the traversal (or traversals) needs to be - I guess I'd just like to easily compare what happens from a processing/memory perspective with each of the configurations we've talked about and then true that up with the expectations that we have regarding each configuration that we have.

As far as OLAP goes, what are the expectations there?

I was just curious if it all the tests still pass there or not. I'd assume so given that you didn't make changes there, but I just wanted to be sure.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@spmallette Those results are definitely...interesting to say the least. I think the tests themselves are reasonable, though, as a comment, I'm not typically using a repeat that's going to be able to utilize the RepeatUnrollStrategy. At least not for what I originally started investigating this for.

That said, I took a step back and worked on performance between the BFS and DFS, and have gotten them much closer. On my local machine that BFS test was returning 889 from the counter. With the latest commit I added, DFS is returning 758. That's obviously not coming close to the default "let the strategies do their thing" performance, but it's significantly better than the ops counts being in the teens for DFS. Given that I was expecting slightly degraded performance with DFS, I think this is in a much better place performance wise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this a bit more, it looks like I misread some of the test query results I had, and the new commit doesn't work to make the repeat step depth first, so ignore that last comment. I'm still working on trying to figure out a new approach.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to remove RepeatUnrollStrategy because it adds barriers occasionally as do other strategies - that really needs to be fixed as something separate.

https://issues.apache.org/jira/browse/TINKERPOP-2004

I think that if you had a specific use case in mind when you started doing this it would be cool if you did a performance test on that and shared your results if possible.

I also think that the queries in my tests weren't really presenting scenarios where someone would want to do DFS. I'm imaging that the only time DFS will be used is when the user is knowledgeable and has advanced understanding of their data to know that DFS will out perform the default. I assume that your specific use case was falling into that scenario. As i said, it would be nice to see that in action.

So, that said, it would be great to see DFS perform more quickly for that case I presented for the JFRs, but that may not be an explicit requirement. It may be more important to simply demonstrate that DFS has a use case where it can shine.

I didn't study the JFRs for too long as the performance struck me as a point of discussion first. If you have any thoughts to share on those specifically, that would also be cool. Thanks again!

@spmallette
Copy link
Contributor

@krlohnes not sure if you're still thinking about this one, but if you are and you get a chance, could you please rebase your branch to freshen things up a bit?

@krlohnes
Copy link
Contributor Author

@spmallette Yeah, I'm still giving this some thought. Just rebased.

@krlohnes krlohnes closed this Mar 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants