The current JanusGraph documentation on Hadoop-Gremlin is very short and doesn't really explain how to execute a vertex program on JanusGraph or how to execute traversals in OLAP mode. This has led to many questions on the mailing list and on StackOverflow. It also doesn't help that a variety of configuration options specific to Hadoop-Gremlin aren't explained anywhere [1], and that the documentation is split between the JanusGraph and TinkerPop docs, where the TinkerPop docs of course lack JanusGraph-specific config options.
I would suggest adding at least one minimal but complete example for each of the different Hadoop-Gremlin use cases to the JanusGraph documentation:
Importing data into JanusGraph
Exporting data with JanusGraph
Executing OLAP traversals (a g.V().out().out().count() should be enough)
Executing a VertexProgram like PageRank
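To give a rough idea of what such a minimal OLAP example could look like, here is a sketch for the Gremlin Console. It assumes a JanusGraph distribution with the Hadoop and Spark plugins available and a Hadoop-Gremlin properties file such as conf/hadoop-graph/read-cassandra.properties; the exact file name depends on the JanusGraph version and storage backend, so treat the path as a placeholder.

```groovy
// Assumption: run inside the Gremlin Console shipped with JanusGraph.
// Activate the Hadoop and Spark plugins if they are not already active.
:plugin use tinkerpop.hadoop
:plugin use tinkerpop.spark

// Assumption: this properties file configures a HadoopGraph that reads from
// the JanusGraph storage backend; adjust the path to your installation.
graph = GraphFactory.open('conf/hadoop-graph/read-cassandra.properties')

// Route traversals through Spark instead of the default OLTP engine.
g = graph.traversal().withComputer(SparkGraphComputer)

// The OLAP traversal mentioned in the list above.
g.V().out().out().count()
```

The documentation would still need to explain the individual settings in the properties file, since that is where most of the confusion seems to come from.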
Another aspect concerning Hadoop-Gremlin is that, since TinkerPop 3.1.1, HDFS is no longer required with Spark in all cases. A short explanation of when HDFS can be omitted might also be a good idea here.
Apart from that, I also noticed that the JanusGraph compatibility matrix lacks information about the supported Hadoop and Spark versions. Of course, this is specific to TinkerPop, but wouldn't it make sense to also include the information in this matrix? Otherwise every user has to search the TinkerPop documentation for this information and, as far as I know, it is only provided in TinkerPop's changelog, which is not really the first place a typical JanusGraph user will look.
In short: I think that getting JanusGraph to work with Spark (or Giraph) is one of the most complicated aspects of JanusGraph, and good documentation would keep users from stumbling over the same problems again and again.
[1] For example, cassandra.input.predicate, which led to some confusion in the Aurelius Google Group.