Suggestion: Extend the Hadoop-Gremlin documentation #253

Open
FlorianHockmann opened this issue May 2, 2017 · 1 comment

Comments

@FlorianHockmann
Member

The current JanusGraph documentation on Hadoop-Gremlin is very short and doesn't really explain how to execute a vertex program on JanusGraph or how to execute traversals in OLAP mode. This has led to many questions on the mailing list and on StackOverflow. It also doesn't help that a variety of configuration options specific to Hadoop-Gremlin aren't explained anywhere [1], and that the documentation is split between the JanusGraph and the TinkerPop docs, where the TinkerPop docs of course lack JanusGraph-specific config options.

I would suggest adding at least one minimal but complete example for each of the different Hadoop-Gremlin use cases to the JanusGraph documentation:

  • Importing data into JanusGraph
  • Exporting data with JanusGraph
  • Executing OLAP traversals (a g.V().out().out().count() should be enough)
  • Executing a VertexProgram like PageRank
  • As soon as it's supported: Modifying the graph, for example for the repair jobs that are already mentioned in the docs
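
To make the suggested minimal traversal concrete, here is a plain Python sketch of what g.V().out().out().count() computes: the number of directed two-edge paths, counted once per path without deduplicating vertices. It does not use JanusGraph, TinkerPop, or Spark; the toy edge list and the function name two_hop_count are hypothetical and only illustrate the expected result, not how Hadoop-Gremlin executes the traversal.

```python
# Hypothetical toy graph given as directed edges (source, target).
EDGES = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("c", "a")]


def two_hop_count(edges):
    """Count directed two-edge paths, i.e. the traversers left after out().out()."""
    # Build an adjacency list of outgoing neighbours per vertex.
    out = {}
    for src, dst in edges:
        out.setdefault(src, []).append(dst)
    # Each (src -> mid -> end) path contributes one traverser; nothing is deduplicated.
    return sum(len(out.get(mid, ())) for mids in out.values() for mid in mids)


if __name__ == "__main__":
    print(two_hop_count(EDGES))
```

A documented OLAP example could state the expected count for its sample graph in the same way, so that users can check whether their Spark setup returns the right answer.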

Another aspect concerning Hadoop-Gremlin is that, since TinkerPop 3.1.1, HDFS is no longer required in all cases when using Spark. A short explanation of when HDFS can be omitted might also be a good idea here.

Apart from that, I also noticed that the JanusGraph compatibility matrix lacks information about the supported Hadoop and Spark versions. Of course, this is specific to TinkerPop, but wouldn't it make sense to also include this information in the matrix? Otherwise every user has to search the TinkerPop documentation for it, and as far as I know it is only provided in TinkerPop's changelog, which is not really the first place a typical JanusGraph user will look.

In short: I think that getting JanusGraph to work with Spark (or Giraph) is one of the most complicated aspects of JanusGraph, and that good documentation would keep users from stumbling over the same problems again and again.

[1] For example, cassandra.input.predicate, which led to some confusion in the Aurelius Google Group.

@pluradj
Member

pluradj commented Jul 18, 2017

Low-hanging fruit would be to add an example OLAP properties file for HBase, similar to the one that's already there for Cassandra.

HadoopMarc covered it in his recent blog post http://yaaics.blogspot.nl/2017/07/configuring-janusgraph-for-spark-yarn.html
