
Notebook (kernel) not starting up in yarn-client mode #735

Closed
raproth opened this issue Nov 18, 2016 · 2 comments

Comments


raproth commented Nov 18, 2016

We use spark notebook 0.63 with spark 1.6.1.

Under certain circumstances (which I cannot reproduce yet), our notebooks (running as yarn-client) won't start up.

In the notebook, it says Kernel starting, please wait.

In the logfile there are just these lines:

2016-11-18 10:19:22,182  INFO [main] (notebook.kernel.pfork.BetterFork$) - Remote process starting
2016-11-18 10:19:22,516  INFO [Remote-akka.actor.default-dispatcher-4] (akka.event.slf4j.Slf4jLogger) - Slf4jLogger started
2016-11-18 10:19:22,551  INFO [Remote-akka.actor.default-dispatcher-4] (Remoting) - Starting remoting
2016-11-18 10:19:22,674  INFO [Remote-akka.actor.default-dispatcher-2] (Remoting) - Remoting started; listening on addresses :[akka.tcp://Remote@127.0.0.1:34898]
2016-11-18 10:19:22,675  INFO [Remote-akka.actor.default-dispatcher-2] (Remoting) - Remoting now listens on addresses: [akka.tcp://Remote@127.0.0.1:34898]

In the log of the notebook server, I don't see anything suspicious:

16/11/18 10:19:21 INFO application: getNotebook at path mynotebook.snb
16/11/18 10:19:21 INFO application: Loading notebook at path mynotebook.snb

The process seems to be waiting for something, but I cannot figure out what it is (the driver machine is not under heavy load, there are plenty of free executors...).

Any hints on how to solve this?
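(Editorial note, not from the original report: when a JVM sits idle like this, a thread dump usually shows what it is blocked on. This is a diagnostic sketch, assuming a JDK with `jps`/`jstack` on the PATH; the `Remote` grep pattern is a guess based on the `akka.tcp://Remote@...` address in the kernel log above.)

```shell
# Find the forked kernel JVM and dump its threads for inspection.
# The 'Remote' pattern is an assumption; adjust it to match your process list.
pid=$(jps -lm 2>/dev/null | grep -i 'Remote' | awk '{print $1}' | head -1)
if [ -n "$pid" ]; then
  jstack "$pid" > kernel-threads.txt
  echo "thread dump written to kernel-threads.txt"
else
  echo "no matching kernel JVM found"
fi
```

Looking for threads stuck in `Object.wait` or socket reads in the dump often narrows down what the kernel is waiting for.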

@kevinwkc

I have the same issue:
[info] application - Closing websockets for kernel c1bdaf31-12cc-4b96-8fc0-2998d34bd642
[info] application - Closing kernel c1bdaf31-12cc-4b96-8fc0-2998d34bd642
[info] application - UN-registering web-socket (notebook.server.WebSockWrapperImpl@3777af) in service notebook.server.CalcWebSocketService$CalcActor@173440c (current count is 1)
[error] play - Cannot invoke the action, eventually got an error: java.util.NoSuchElementException: key not found: c1bdaf31-12cc-4b96-8fc0-2998d34bd642
[error] application -

! @7334hk13d - Internal server error, for (POST) [/api/kernels/c1bdaf31-12cc-4b96-8fc0-2998d34bd642/restart] ->

play.api.Application$$anon$1: Execution exception[[NoSuchElementException: key not found: c1bdaf31-12cc-4b96-8fc0-2998d34bd642]]
    at play.api.Application$class.handleError(Application.scala:296) ~[com.typesafe.play.play_2.10-2.3.10.jar:2.3.10]
    at play.api.DefaultApplication.handleError(Application.scala:402) [com.typesafe.play.play_2.10-2.3.10.jar:2.3.10]
    at play.core.server.netty.PlayDefaultUpstreamHandler$$anonfun$3$$anonfun$applyOrElse$4.apply(PlayDefaultUpstreamHandler.scala:320) [com.typesafe.play.play_2.10-2.3.10.jar:2.3.10]
    at play.core.server.netty.PlayDefaultUpstreamHandler$$anonfun$3$$anonfun$applyOrElse$4.apply(PlayDefaultUpstreamHandler.scala:320) [com.typesafe.play.play_2.10-2.3.10.jar:2.3.10]
    at scala.Option.map(Option.scala:145) [org.scala-lang.scala-library-2.10.6.jar:na]
Caused by: java.util.NoSuchElementException: key not found: c1bdaf31-12cc-4b96-8fc0-2998d34bd642
    at scala.collection.MapLike$class.default(MapLike.scala:228) ~[org.scala-lang.scala-library-2.10.6.jar:na]
    at scala.collection.AbstractMap.default(Map.scala:58) ~[org.scala-lang.scala-library-2.10.6.jar:na]
    at scala.collection.mutable.HashMap.apply(HashMap.scala:64) ~[org.scala-lang.scala-library-2.10.6.jar:na]
    at controllers.Application$$anonfun$9$$anonfun$apply$5.apply(Application.scala:129) ~[nooostab.spark-notebook-0.7.0-scala-2.10.6-spark-2.0.1-hadoop-2.6.0.jar:0.7.0-scala-2.10.6-spark-2.0.1-hadoop-2.6.0]
    at controllers.Application$$anonfun$9$$anonfun$apply$5.apply(Application.scala:128) ~[nooostab.spark-notebook-0.7.0-scala-2.10.6-spark-2.0.1-hadoop-2.6.0.jar:0.7.0-scala-2.10.6-spark-2.0.1-hadoop-2.6.0]
[info] application - View notebook '\Flow Example.snb', presentation: 'None'
Java HotSpot(TM) Server VM warning: ignoring option MaxPermSize=1073741824; support was removed in 8.0
[info] application - View notebook '\Flow Example.snb', presentation: 'None'
[debug] application - Termination of op calculator


vidma commented Feb 12, 2017

We used spark-notebook (SN) with YARN without any problems, but that was in yarn-client mode, in which the driver runs ON THE SAME MACHINE as spark-notebook, so that machine must have all the YARN/Hadoop configs available (e.g. via HADOOP_CONF_DIR).
P.S. In production, the SN server should also have plenty of RAM for the drivers (preferably 128+ GB if many people use it simultaneously).

I can't comment on yarn-cluster mode, as I haven't personally tested it (it didn't seem worth it: harder to debug, view logs, kill applications, etc.).

Here's a subset of a sample config:

export HADOOP_USER_NAME=someuser
# optional, depending on network configuration - executors MUST be able to reach the driver (i.e. SN)
# export SPARK_LOCAL_IP=x.x.x.x
# export SPARK_PUBLIC_DNS=notebook.some.com
export LD_LIBRARY_PATH=/usr/lib/hadoop/lib/native/
export HADOOP_CONF_DIR=/etc/hadoop/conf:/etc/hive/conf
./bin/spark-notebook \
  -Dmanager.tachyon.enabled=false \
  -Dmanager.kernel.autostartOnNotebookOpen=true \
  -Dmanager.notebooks.override.sparkConf.spark.port.maxRetries=100 \
  -Dmanager.notebooks.override.sparkConf.spark.dynamicAllocation.initialExecutors=1 \
  -Dmanager.notebooks.override.sparkConf.spark.dynamicAllocation.minExecutors=0 \
  -Dmanager.notebooks.override.sparkConf.spark.serializer=org.apache.spark.serializer.KryoSerializer \
  -Dmanager.notebooks.override.sparkConf.spark.rdd.compress=true

and append something like this to the cluster conf (i.e. as seen under notebook -> Edit metadata):

  "customSparkConf": {
    "spark.app.name": "Spark Notebook",
    "spark.master": "yarn-client",
    "spark.yarn.jar": "hdfs:///path-to/spark-assembly-1.6.1-hadoop2.6.0-cdh5.X.X.jar"
  },
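(Editorial note: for context, the `customSparkConf` block above sits inside the notebook's top-level `metadata` object when you edit the `.snb` file directly. The surrounding field names here are illustrative, not taken from the thread.)

```json
{
  "metadata": {
    "name": "mynotebook",
    "customSparkConf": {
      "spark.app.name": "Spark Notebook",
      "spark.master": "yarn-client",
      "spark.yarn.jar": "hdfs:///path-to/spark-assembly-1.6.1-hadoop2.6.0-cdh5.X.X.jar"
    }
  },
  "cells": []
}
```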

vidma closed this as completed Feb 22, 2017