That JAR exists, and I've successfully added it to the classpath in YARN-backed notebooks like the one where I see the error above; I don't know why it took so long, why it ultimately failed, or why `sc` was no longer in scope afterward.
All my ~100 executors seemed to be failing to communicate with the driver in the above app; they all had stack traces like:
```
15/08/18 00:16:07 INFO util.Utils: Successfully started service 'driverPropsFetcher' on port 58410.
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1643)
	at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:65)
	at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:146)
	at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:245)
	at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [120 seconds]
	at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
	at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
	at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
	at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
	at scala.concurrent.Await$.result(package.scala:107)
	at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:97)
	at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:159)
	at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:66)
	at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:65)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
	... 4 more
```
I have no idea why that would happen; I regularly run Spark apps via {adam,spark}-{submit,shell} on this YARN cluster with the same config params with no issues.
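In case it helps anyone hitting the same `Futures timed out after [120 seconds]` failure: the 120s figure matches Spark's default `spark.network.timeout`, so one thing worth trying while debugging is raising it at launch. A minimal sketch, not a verified fix; the `600s` value and the client master are just placeholders for whatever config the cluster actually needs:

```
# Hedged sketch: raise the network/RPC timeout above the 120s default
# so slow executor <-> driver registration has more headroom.
spark-shell \
  --master yarn-client \
  --conf spark.network.timeout=600s
```

If the executors still time out with a much larger value, the problem is more likely connectivity (e.g. executors unable to reach the driver's host/port) than slowness.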
Update: I originally included a further repro case here, but I had an incorrect path to my JAR in the `:cp` command; the only symptom of that typo was that the `:cp` command finished without the JAR I wanted actually being on the classpath, which was confusing. Also, the `:cp` command started a second YARN app, re-ran a previous Spark job I'd run, then tore down the 2nd YARN app and started a 3rd, which seems like strange behavior to me.
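One way to avoid that silent-bad-path confusion is to sanity-check that the JAR actually exists on disk before handing it to `:cp` (or to pass it at launch via `--jars`, which sidesteps `:cp`'s app-restart/replay behavior entirely). A quick sketch; the JAR path here is hypothetical:

```shell
# Hypothetical JAR path -- substitute your own.
JAR=/path/to/my-assembly.jar

# :cp appears to "succeed" even on a nonexistent path, so check first.
if [ -f "$JAR" ]; then
  echo "ok: $JAR"       # safe to :cp (or spark-shell --jars "$JAR")
else
  echo "missing: $JAR"
fi
```

Running this before starting the notebook would have caught my typo immediately.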