This repository has been archived by the owner on Nov 8, 2018. It is now read-only.

Problem with running mnist.py example #48

Open
menon92 opened this issue Dec 14, 2017 · 2 comments

Comments

@menon92

menon92 commented Dec 14, 2017

I am trying to run mnist.py in standalone cluster mode. For this I changed master = "local[*]" to "spark://MY_MASTER_IP:7077", and I submit my job with the following command:

spark-submit --master spark://192.168.1.101:7077 \
    --num-executors 1 \
    --executor-memory 8G \
    --executor-cores 2 \
    --driver-memory 8G PATH_TO/mnist.py

But I get the following error:

INFO DAGScheduler: ResultStage 1 (load at NativeMethodAccessorImpl.java:0) failed in 2.200 s due to Job aborted due to stage failure: Task 5 in stage 1.0 failed 4 times, most recent failure: Lost task 5.3 in stage 1.0 (TID 13, 192.168.1.102, executor 1): java.io.FileNotFoundException: File file:/home/semanticslab3/development/spark/spark-2.2.1-bin-hadoop2.7/bin/data/mnist_train.csv does not exist
It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved.
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:127)
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:174)
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:105)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
	at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:461)
	at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:461)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
	at scala.collection.Iterator$class.foreach(Iterator.scala:893)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
	at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
	at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1336)
	at scala.collection.TraversableOnce$class.aggregate(TraversableOnce.scala:214)
	at scala.collection.AbstractIterator.aggregate(Iterator.scala:1336)
	at org.apache.spark.rdd.RDD$$anonfun$aggregate$1$$anonfun$22.apply(RDD.scala:1113)
	at org.apache.spark.rdd.RDD$$anonfun$aggregate$1$$anonfun$22.apply(RDD.scala:1113)
	at org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:2125)
	at org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:2125)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:108)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

     Driver stacktrace:
    17/12/14 16:29:32 INFO DAGScheduler: Job 1 failed: load at NativeMethodAccessorImpl.java:0, took   2.211758 s

With master = "local[*]" it works fine, but it uses only one worker. I want to use both of my workers.

Thanks
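(For context on the trace above: the data/mnist_train.csv path is resolved on the executors, so the file has to exist on every worker machine, not only on the driver. Below is a minimal, hypothetical PySpark sketch of reading the file from a location all executors can reach; the hdfs:// and file:// paths are placeholders, not taken from mnist.py.)

    # Hypothetical sketch only -- the paths below are placeholders, not from mnist.py.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Read from shared storage that every executor can reach (HDFS as an example).
    df = spark.read.csv("hdfs:///data/mnist_train.csv", header=True, inferSchema=True)

    # Or copy the file to the same absolute path on every worker and read it there.
    # df = spark.read.csv("file:///opt/data/mnist_train.csv", header=True, inferSchema=True)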

@JoeriHermans
Collaborator

Hi,

Did you modify the example script in any way, besides changing the master address? The script is not intended to be submitted through spark-submit.

Joeri

@menon92
Author

menon92 commented Dec 14, 2017

Hello @JoeriHermans ,

Thanks for your quick reply. I just added an import statement, from pyspark.sql import SparkSession, in mnist.py and changed the master address, nothing more than that.

You said the script is not intended to be submitted through spark-submit, so why does this code run when I use local[*] as the master address?

If the script is not intended to be submitted through spark-submit, how can I run mnist.py so that it runs on two cluster nodes?

Or is it not possible to run it on a cluster?

Thanks
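(A minimal, hypothetical sketch of the alternative hinted at above: setting the standalone master inside the script and launching it with a plain Python interpreter instead of spark-submit. The builder options are standard Spark properties used for illustration; the actual setup in mnist.py may differ.)

    # Hypothetical sketch -- standard Spark configuration, not the exact mnist.py setup.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder \
        .master("spark://192.168.1.101:7077") \
        .appName("mnist-example") \
        .config("spark.executor.memory", "8g") \
        .config("spark.executor.cores", "2") \
        .getOrCreate()

    # Launched with:  python mnist.py
    # The training data must still be readable from every worker node.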
