
ClassNotFoundException Errors running the alpha release #200

Closed
thanigaiv opened this issue Mar 31, 2015 · 9 comments

Comments

@thanigaiv

When I run the ALS example using the alpha release, I get the error below. I get the same error with both the downloaded jars and jars built from source. I used the following command to run the layer:

./run.sh --layer-jar oryx-batch-2.0.0-alpha-1.jar --conf example.conf

Exception in thread "main" java.lang.IllegalStateException: No valid com.cloudera.oryx.app.batch.mllib.als.ALSUpdate exists
at com.cloudera.oryx.common.lang.ClassUtils.loadClass(ClassUtils.java:43)
at com.cloudera.oryx.lambda.BatchLayer.loadUpdateInstance(BatchLayer.java:254)
at com.cloudera.oryx.lambda.BatchLayer.start(BatchLayer.java:168)
at com.cloudera.oryx.batch.Main.main(Main.java:34)
Caused by: java.lang.ClassNotFoundException: com.cloudera.oryx.app.batch.mllib.als.ALSUpdate
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:191)
at com.cloudera.oryx.common.lang.ClassUtils.loadClass(ClassUtils.java:40)

@srowen
Member

srowen commented Mar 31, 2015

Hm, that should be inside the .jar file you have there. Is the compute-classpath.sh script there in the same dir? Can you jar tf the jar file to see if this class exists inside? It should.
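
For reference, one quick way to check, using the jar name from the run command above:

jar tf oryx-batch-2.0.0-alpha-1.jar | grep ALSUpdate

With the alpha-1 jar this should list the class, though (as it turns out below) under its old package: com/cloudera/oryx/app/mllib/als/ALSUpdate.class.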

@thanigaiv
Author

Found the issue - the problem was with the example.conf in https://github.com/OryxProject/oryx/blob/master/app/conf/als-example.conf

The update-class should be "com.cloudera.oryx.app.mllib.als.ALSUpdate", but it is given as "com.cloudera.oryx.app.batch.mllib.als.ALSUpdate".

Changing the above config property fixed the issue.
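
For anyone else hitting this with the alpha-1 jars, the relevant line of the conf should therefore read (a minimal sketch; only the key discussed here is shown, using the old package name that matches the classes actually shipped in oryx-batch-2.0.0-alpha-1.jar):

update-class = "com.cloudera.oryx.app.mllib.als.ALSUpdate"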

@srowen
Member

srowen commented Mar 31, 2015

Ah, thank you. Yes, this class was moved. The example shows the new location; it is in the old location in the release you have in hand. I should probably caveat that in the comments, at the least.

@thanigaiv
Author

Now I get another error when the batch layer runs for the first time after the file is ingested:

2015-03-31 17:00:06,524 WARN TaskSetManager:71 Lost task 6.1 in stage 19.0 (TID 182, 1-p-d1hadoop09.art.com): FetchFailed(BlockManagerId(1, 1-p-d1hadoop09.art.com, 59700), shuffleId=5, mapId=0, reduceId=6, message=
org.apache.spark.shuffle.FetchFailedException: Error in opening FileSegmentManagedBuffer{file=/data/yarn/nm/usercache/root/appcache/application_1426812029369_0893/spark-local-20150331165623-6233/19/shuffle_5_0_0.data, offset=3437, length=532}
at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$.org$apache$spark$shuffle$hash$BlockStoreShuffleFetcher$$unpackBlock$1(BlockStoreShuffleFetcher.scala:67)
at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:83)
at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:83)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:125)
at org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:58)
at org.apache.spark.shuffle.hash.HashShuffleReader.read(HashShuffleReader.scala:46)
at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:92)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
at org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$2.apply(CoGroupedRDD.scala:130)
at org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$2.apply(CoGroupedRDD.scala:127)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
at org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:127)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
at org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
at org.apache.spark.rdd.FlatMappedValuesRDD.compute(FlatMappedValuesRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
at org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:61)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:228)
at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:56)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Error in opening FileSegmentManagedBuffer{file=/data/yarn/nm/usercache/root/appcache/application_1426812029369_0893/spark-local-20150331165623-6233/19/shuffle_5_0_0.data, offset=3437, length=532}
at org.apache.spark.network.buffer.FileSegmentManagedBuffer.createInputStream(FileSegmentManagedBuffer.java:113)
at org.apache.spark.storage.ShuffleBlockFetcherIterator$$anonfun$3.apply(ShuffleBlockFetcherIterator.scala:299)
at org.apache.spark.storage.ShuffleBlockFetcherIterator$$anonfun$3.apply(ShuffleBlockFetcherIterator.scala:299)
at scala.util.Try$.apply(Try.scala:161)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:299)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:53)
... 37 more
Caused by: java.io.FileNotFoundException: /data/yarn/nm/usercache/root/appcache/application_1426812029369_0893/spark-local-20150331165623-6233/19/shuffle_5_0_0.data (No such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.&lt;init&gt;(FileInputStream.java:146)
at org.apache.spark.network.buffer.FileSegmentManagedBuffer.createInputStream(FileSegmentManagedBuffer.java:98)
... 42 more

@srowen
Member

srowen commented Apr 1, 2015

This happened in the past when using a hash-based shuffle and a large shuffle occurred. However, Spark should be using a sort-based shuffle by default in 1.2, yet this trace shows a hash-based shuffle. The app doesn't change this setting, but it does require 1.2. How are you running the app -- what Hadoop / Spark release?


@thanigaiv
Author

I'm running on CDH 5.3, which has Spark 1.2. Should I manually set spark.shuffle.manager to SORT?

@thanigaiv
Author

Yes, adding the config setting worked - Thanks!
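
For readers following along, one common place to make that setting is spark-defaults.conf on the submitting machine (a sketch; exactly where you pass Spark properties depends on your deployment):

# Force the sort-based shuffle; it is the upstream default from Spark 1.2,
# but the trace above shows a hash-based shuffle was in effect here.
spark.shuffle.manager sort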

@smarthi closed this as completed Apr 1, 2015
@srowen
Member

srowen commented Apr 1, 2015

Hm. I suppose I'm surprised it defaulted to hash, but maybe that was held back until 1.3 to be conservative. OK, well yeah, the sort shuffle has proved reliable and has a big advantage in the number of open files, so yes I would use it. I can probably change the app to set it to 'sort' by default anyway.


@linxixiong

Hi, I met this problem too. Setting spark.shuffle.manager to SORT did not work. Spark 1.2.1.

org.apache.spark.shuffle.FetchFailedException: Error in opening FileSegmentManagedBuffer{file=/data2/hadoop/hd_space/tmp/nm-local-dir/usercache/linxixiong/appcache/application_1432021926412_9808/spark-82b1829f-7662-4480-82ca-9abf7cc6618d/10/shuffle_5_106_0.data, offset=17462106, length=2529501}
at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$.org$apache$spark$shuffle$hash$BlockStoreShuffleFetcher$$unpackBlock$1(BlockStoreShuffleFetcher.scala:67)
at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:83)
at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:83)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:125)
at org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:60)
at org.apache.spark.shuffle.hash.HashShuffleReader.read(HashShuffleReader.scala:46)
at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:92)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
at org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$2.apply(CoGroupedRDD.scala:130)
at org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$2.apply(CoGroupedRDD.scala:127)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
at org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:127)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
at org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
at org.apache.spark.rdd.FlatMappedValuesRDD.compute(FlatMappedValuesRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
at org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:61)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:245)
at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:56)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.io.IOException: Error in opening FileSegmentManagedBuffer{file=/data2/hadoop/hd_space/tmp/nm-local-dir/usercache/linxixiong/appcache/application_1432021926412_9808/spark-82b1829f-7662-4480-82ca-9abf7cc6618d/10/shuffle_5_106_0.data, offset=17462106, length=2529501}
at org.apache.spark.network.buffer.FileSegmentManagedBuffer.createInputStream(FileSegmentManagedBuffer.java:113)
at org.apache.spark.storage.ShuffleBlockFetcherIterator$$anonfun$3.apply(ShuffleBlockFetcherIterator.scala:299)
at org.apache.spark.storage.ShuffleBlockFetcherIterator$$anonfun$3.apply(ShuffleBlockFetcherIterator.scala:299)
at scala.util.Try$.apply(Try.scala:161)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:299)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:53)
... 37 more
Caused by: java.io.FileNotFoundException: /data2/hadoop/hd_space/tmp/nm-local-dir/usercache/linxixiong/appcache/application_1432021926412_9808/spark-82b1829f-7662-4480-82ca-9abf7cc6618d/10/shuffle_5_106_0.data (No such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.&lt;init&gt;(FileInputStream.java:146)
at org.apache.spark.network.buffer.FileSegmentManagedBuffer.createInputStream(FileSegmentManagedBuffer.java:98)
... 42 more
