Suggestion: Extend the Hadoop-Gremlin documentation #253

FlorianHockmann · 2017-05-02T17:11:24Z

The current JanusGraph documentation on Hadoop-Gremlin is very short and doesn't really explain how to execute a vertex program on JanusGraph or how to execute traversals in OLAP mode. This has led to many questions on the mailing list and on StackOverflow. It also doesn't help that there are a variety of configuration options specific to Hadoop-Gremlin that aren't explained anywhere [1], and that the documentation is split between the JanusGraph and the TinkerPop docs, where the TinkerPop docs of course lack JanusGraph-specific config options.

I would suggest adding at least one minimal but complete example for each of the different use cases of Hadoop-Gremlin to the JanusGraph documentation:

- Importing data into JanusGraph
- Exporting data from JanusGraph
- Executing OLAP traversals (a g.V().out().out().count() should be enough)
- Executing a VertexProgram like PageRank
- As soon as it's supported: modifying the graph, for example for the repair jobs that are already mentioned in the docs
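To make the two OLAP use cases above concrete, here is a toy, in-memory Python sketch of what those jobs compute. It is not JanusGraph, TinkerPop, or Spark code: the graph `EDGES` and the functions `count_two_hop_paths` and `page_rank` are hypothetical names used only for illustration. Both functions mimic the vertex-centric message passing a GraphComputer performs. `count_two_hop_paths` moves traversers along out-edges twice, so, like `g.V().out().out().count()`, it counts paths of length 2 rather than distinct vertices. `page_rank` is a standard damped PageRank; its damping factor and iteration count are assumptions, not TinkerPop's defaults.

```python
from collections import defaultdict

# Hypothetical toy directed graph: vertex -> list of out-neighbours.
EDGES = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}


def vertices(graph):
    """All vertex ids, including those that only appear as edge targets."""
    vs = set(graph)
    for targets in graph.values():
        vs.update(targets)
    return sorted(vs)


def count_two_hop_paths(graph):
    """Mimics g.V().out().out().count(): counts traversers (length-2 paths).

    Every vertex starts with one traverser; each out() step moves all
    traversers along out-edges, and the final count sums what is left.
    """
    traversers = {v: 1 for v in vertices(graph)}
    for _ in range(2):  # two out() steps
        moved = defaultdict(int)
        for v, n in traversers.items():
            for w in graph.get(v, []):
                moved[w] += n
        traversers = moved
    return sum(traversers.values())


def page_rank(graph, alpha=0.85, iterations=30):
    """Vertex-centric PageRank: in each iteration every vertex sends
    rank / out_degree to its out-neighbours; the rank held by vertices
    without out-edges (dangling) is spread uniformly over all vertices."""
    vs = vertices(graph)
    n = len(vs)
    rank = {v: 1.0 / n for v in vs}
    for _ in range(iterations):
        incoming = defaultdict(float)
        dangling = 0.0
        for v in vs:
            out = graph.get(v, [])
            if out:
                share = rank[v] / len(out)
                for w in out:
                    incoming[w] += share
            else:
                dangling += rank[v]
        rank = {
            v: (1 - alpha) / n + alpha * (incoming[v] + dangling / n)
            for v in vs
        }
    return rank


if __name__ == "__main__":
    print(count_two_hop_paths(EDGES))
    for v, r in sorted(page_rank(EDGES).items()):
        print(v, round(r, 4))
```

Running the file prints the two-hop count (6 for this graph) followed by one rank per vertex. In a real setup, the same computations would instead be submitted to JanusGraph through Hadoop-Gremlin.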

Another aspect concerning Hadoop-Gremlin is that, since TinkerPop 3.1.1, HDFS is no longer required in all cases when using Spark. A short explanation of when HDFS can be omitted might also be a good idea here.

Apart from that, I also noticed that the JanusGraph compatibility matrix lacks information about the supported Hadoop and Spark versions. Of course, this is specific to TinkerPop, but wouldn't it make sense to also include this information in the matrix? Otherwise every user has to search the TinkerPop documentation for it, and as far as I know, it is only provided in TinkerPop's changelog, which is not really the first place a typical JanusGraph user would look.

In short: I think that getting JanusGraph to work with Spark (or Giraph) is one of the most complicated aspects of JanusGraph, and that good documentation would keep users from stumbling over the same problems again and again.

[1] For example cassandra.input.predicate, which led to some confusion in the Aurelius Google Group.


pluradj · 2017-07-18T14:19:20Z

The low-hanging fruit is to add an example OLAP properties file for HBase, similar to the one that's already there for Cassandra.

HadoopMarc covered it in his recent blog post http://yaaics.blogspot.nl/2017/07/configuring-janusgraph-for-spark-yarn.html

pluradj mentioned this issue on May 9, 2017 in "Issues in sample bulk loader with HBase #268" (closed)

sjudeng added the area/docs label Aug 13, 2017

pluradj mentioned this issue on Nov 1, 2017 in "JanusGraph 0.2.0 Spark failed to connect to master #685" (closed)

farodin91 added this to To do in Overhaul JanusGraph Documentation Dec 30, 2020
