
OLAP with Cassandra 3 input fails with Guava error #1159

Closed
pluradj opened this issue Jul 9, 2018 · 7 comments

@pluradj
Member

pluradj commented Jul 9, 2018

This was mentioned on janusgraph-users.

The scenario is running a SparkGraphComputer job on the master branch (0.3.0-SNAPSHOT) against a Spark 2.2.0 cluster, using conf/hadoop-graph/read-cassandra-3.properties.

graph = GraphFactory.open('conf/hadoop-graph/read-cassandra-3.properties')
g = graph.traversal().withComputer(SparkGraphComputer)
g.V().count()
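
For reference, the relevant parts of that properties file look roughly like this (a sketch, not a verbatim copy; the hostname, Spark master URL, and exact input-format class should be taken from the conf/hadoop-graph/read-cassandra-3.properties shipped with the distribution):

# Hadoop-Gremlin input/output (sketch; see the shipped read-cassandra-3.properties)
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.cassandra.Cassandra3InputFormat
gremlin.hadoop.graphWriter=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output

# JanusGraph/Cassandra storage the input format reads from (illustrative values)
janusgraphmr.ioformat.conf.storage.backend=cassandra
janusgraphmr.ioformat.conf.storage.hostname=127.0.0.1

# Spark settings (illustrative values)
spark.master=spark://spark-master:7077
spark.serializer=org.apache.spark.serializer.KryoSerializer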

The error most readily seen from the Gremlin Console is:

java.lang.NoClassDefFoundError: Could not initialize class com.datastax.driver.core.Cluster

The NoClassDefFoundError seems to indicate that a jar file is missing, but that is misleading; looking more closely at the nested exceptions in the Spark worker shows the real cause:

java.lang.ExceptionInInitializerError
        at com.datastax.driver.core.Cluster.<clinit>(Cluster.java:68)
        at org.janusgraph.hadoop.formats.cassandra.CqlBridgeRecordReader.initialize(CqlBridgeRecordReader.java:127)
        at org.janusgraph.hadoop.formats.util.GiraphRecordReader.initialize(GiraphRecordReader.java:60)
        at org.apache.spark.rdd.NewHadoopRDD$$anon$1.liftedTree1$1(NewHadoopRDD.scala:182)
        at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:179)
        at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:134)
        at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:69)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        ... snip ...
Caused by: com.datastax.driver.core.exceptions.DriverInternalError: Detected incompatible version of Guava in the classpath. You need 16.0.1 or higher.
        at com.datastax.driver.core.GuavaCompatibility.selectImplementation(GuavaCompatibility.java:138)
        at com.datastax.driver.core.GuavaCompatibility.<clinit>(GuavaCompatibility.java:52)
        ... 30 more

Trying to run with the other Cassandra input format:

graph = GraphFactory.open('conf/hadoop-graph/read-cassandra.properties')
g = graph.traversal().withComputer(SparkGraphComputer)
g.V().count()

Leads to this exception:

java.lang.NoSuchMethodError: com.google.common.base.Stopwatch.createStarted()Lcom/google/common/base/Stopwatch;
	at org.janusgraph.graphdb.database.idassigner.StandardIDPool.waitForIDBlockGetter(StandardIDPool.java:136)
	at org.janusgraph.graphdb.database.idassigner.StandardIDPool.close(StandardIDPool.java:229)
	at org.janusgraph.graphdb.database.idassigner.VertexIDAssigner.close(VertexIDAssigner.java:140)
	at org.janusgraph.util.system.IOUtils.closeQuietly(IOUtils.java:63)
	at org.janusgraph.graphdb.database.StandardJanusGraph.closeInternal(StandardJanusGraph.java:237)
	at org.janusgraph.graphdb.database.StandardJanusGraph.close(StandardJanusGraph.java:197)
	at org.janusgraph.hadoop.formats.util.input.current.JanusGraphHadoopSetupImpl.close(JanusGraphHadoopSetupImpl.java:118)
	at org.janusgraph.hadoop.formats.util.JanusGraphVertexDeserializer.close(JanusGraphVertexDeserializer.java:221)
	at org.janusgraph.hadoop.formats.util.GiraphInputFormat$RefCountedCloseable.release(GiraphInputFormat.java:122)
	at org.janusgraph.hadoop.formats.util.GiraphRecordReader.close(GiraphRecordReader.java:105)
	at org.apache.spark.rdd.NewHadoopRDD$$anon$1.org$apache$spark$rdd$NewHadoopRDD$$anon$$close(NewHadoopRDD.scala:244)
	at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:219)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)

JanusGraph packages Guava 18.0, but Spark 2.2.0 ships Guava 14.0.1, and it appears the Spark worker is picking up Spark's older Guava. That would explain both failures: the DataStax driver refuses anything below Guava 16.0.1, and Stopwatch.createStarted() was only added in Guava 15.
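
One way to confirm which Guava wins on a given JVM is to ask the classloader where the class came from (this only inspects the JVM it runs in, e.g. the Gremlin Console driver, so to prove what the worker sees it would have to be run inside an executor):

// Prints the jar that com.google.common.base.Stopwatch was loaded from.
// A path under Spark's jars directory (e.g. guava-14.0.1.jar) means Spark's
// Guava is winning over the guava-18.0.jar shipped with JanusGraph.
com.google.common.base.Stopwatch.class.getProtectionDomain().getCodeSource().getLocation()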

@debasishdebs
Contributor

@pluradj can you close out this issue? I tried connecting to a Spark cluster (2.2.0 standalone) using JanusGraph 0.3.0, and it works now. I only needed to add "spark.executor.extraClassPath" as a property, the same way I had to for JanusGraph 0.2.1.

Once the connection was established, I was able to run OLAP queries.
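
Concretely, that meant adding something like this to the hadoop-graph properties file (the path below is just an example; point it at wherever the JanusGraph lib jars live on each worker node):

# Hypothetical location of the JanusGraph lib directory on the Spark worker nodes
spark.executor.extraClassPath=/opt/janusgraph/lib/*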

@pluradj
Member Author

pluradj commented Jul 17, 2018

Let's use this issue to update the documentation @debasishdebs

@hxcoder65

@debasishdebs , can you share your spark-submit parameter details? I was not able to reproduce your result on a Spark 2.3.1 cluster using JanusGraph 0.3.0.
@pluradj , I think it was too hasty to close this issue without sufficient verification.

@hxcoder65

@debasishdebs , any ideas? I really need your detailed settings to work out how you got it working.

@debasishdebs
Contributor

@hxcoder65 : A few points from your comments. Why are you using Spark 2.3.1 when JanusGraph 0.3.0 supports Spark 2.2.0?

I don't use spark-submit; I used the Gremlin Console to execute the OLAP queries. I no longer have access to the VMs where I had the environment set up (Spark 2.2.0 cluster, Cassandra cluster, ES cluster, JanusGraph client), so I can't verify this right now. As far as I remember, though: can you try migrating to Spark 2.2.0 and follow the docs released with JanusGraph 0.3.0 to the letter? If that still doesn't work, can you share the stack trace you get?

The docs are here!

@debasishdebs
Contributor

@hxcoder65 : Any success with your task?

@hxcoder65

hxcoder65 commented Oct 12, 2018 via email
