
Compatibility with Spark 2.0 and Scala 2.11 #56

Merged
merged 1 commit into apache:master from Spark2.0-compat on Sep 8, 2016

Conversation

mariusvniekerk
Member

Took an initial stab at seeing what is needed to get Spark 2.0-SNAPSHOT building

Removed code that referred to a classServerURI which no longer exists as of SPARK-11563
Had to remove the -deprecation flag temporarily.
Probably want to change the initialization for this to create a SparkSession in 2.0 rather than a SparkContext and then a SQLContext.
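
For illustration, a minimal sketch of the Spark 2.0-style entry point (the app name and master are illustrative, not the actual Toree wiring): a single SparkSession replaces the separate SparkContext/SQLContext pair, and the old handles remain reachable from the session.

import org.apache.spark.sql.SparkSession

// Sketch only: build one SparkSession instead of a SparkContext + SQLContext.
val spark = SparkSession.builder()
  .appName("toree-kernel")   // hypothetical app name, for illustration
  .master("local[*]")
  .getOrCreate()

// The pre-2.0 handles can still be obtained from the session where needed.
val sc = spark.sparkContext
val sqlContext = spark.sqlContext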

@mariusvniekerk
Member Author

Unsure why this is not testing properly on Jenkins.

When running tests locally on Windows I get initialization issues for AddExternalJarMagicSpecForIntegration.

@mariusvniekerk
Member Author

@Lull3rSkat3r

I've had endless issues getting sbt to resolve properly using coursier -- so I've disabled that for now.

Okay, I've changed the internals of Toree to create a SparkSession instead of a SQLContext.

All the tests pass locally for me now. (Well aside from the 3 ignored ones).

@Lull3rSkat3r
Member

@mariusvniekerk, reading the Spark 2.0 documentation it says, "For the Scala API, Spark 2.0.0-preview uses Scala 2.11. You will need to use a compatible Scala version (2.11.x)." I see that there is a 2.10 version in Maven Central, so from these docs it isn't clear whether we can expect the 2.10 libraries to be stable.

The topic of dropping 2.10 support has come up in the dev mailing list, but has not been finalized based on the comments in apache/spark#11542. I haven't seen any other threads discussing this, but just want to keep this in the back of our heads.

I will try the PR and see how it works for me and give you an update.

@mariusvniekerk
Member Author

mariusvniekerk commented Jul 6, 2016

So Scala 2.11 is the "default" build now. You can still compile Spark for 2.10 (much like you could compile for 2.11 in the 1.x series).

Spark 2.0 does require Java 8.

@chipsenkbeil
Contributor

Is the 2.11 support just for Spark 2.0, or is it also for Spark 1.x? Also, with automated tests failing, does this mean that 2.11 passes those tests and 2.10 is now broken? Just trying to figure out what the state is.

@mariusvniekerk
Member Author

So we probably need to change the automated test setup to use sbt +test so it runs against all the cross-built Scala versions.
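
For reference, a minimal build.sbt sketch of what cross-building could look like (the version numbers are illustrative, not necessarily Toree's actual settings); with crossScalaVersions declared, prefixing a task with +, as in sbt +test, re-runs it once per listed Scala version.

// Illustrative cross-build settings; `sbt +test` then runs the tests once per Scala version.
scalaVersion       := "2.11.8"
crossScalaVersions := Seq("2.10.6", "2.11.8")

// Spark 2.0 itself requires Java 8, so target it explicitly for the Java sources.
javacOptions ++= Seq("-source", "1.8", "-target", "1.8")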

@mariusvniekerk
Member Author

I'm getting some test failures around the %AddJar stuff; not certain what actually needs to change.

@chipsenkbeil
Contributor

chipsenkbeil commented Jul 19, 2016

%AddJar uses some code specific to Scala 2.10. It was removed in 2.11, but a fix was added to later releases of 2.11 to support the same functionality. I believe it was reintroduced in Scala 2.11.5 via scala/scala#4051, which added a different method for dynamically adding jars (used to power the REPL).

So, we'd need to provide different implementations for Scala 2.10 vs. 2.11, which is doable with the version of sbt we have. We just need a method that is implemented differently between the two.

The original issue that added this functionality (that we also have in the kernel) is here: apache/spark#1929
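
As a rough sketch of that seam (names here are illustrative, not Toree's actual code, and the 2.11 method name is recalled from scala/scala#4051 from memory, so treat it as an assumption):

// src/main/scala-2.11/.../AddJarSupport.scala -- hypothetical per-version trait
import java.net.URL
import scala.tools.nsc.interpreter.IMain

trait AddJarSupport {
  def iMain: IMain

  // Scala 2.11.5+ exposes a hook for extending the running REPL's classpath
  // (added via scala/scala#4051); the exact method name here is an assumption.
  def addJarToRepl(jar: URL): Unit = iMain.addUrlsToClassPath(jar)
}

// A src/main/scala-2.10 twin would provide the same addJarToRepl signature,
// backed by the older 2.10-specific mechanism.

sbt then compiles whichever version-specific source tree matches the Scala version being built.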

@mariusvniekerk
Member Author

mariusvniekerk commented Jul 19, 2016

So I'm not sure what is causing test errors like this:

[info] Exception encountered when attempting to run a suite with class name: org.scalatest.DeferredAbortedSuite *** ABORTED *** (119 milliseconds)
[info]   java.lang.ClassCastException: interface akka.actor.Scheduler is not assignable from class akka.actor.LightArrayRevolverScheduler
[info]   at akka.actor.ReflectiveDynamicAccess$$anonfun$getClassFor$1.apply(DynamicAccess.scala:69)
[info]   at akka.actor.ReflectiveDynamicAccess$$anonfun$getClassFor$1.apply(DynamicAccess.scala:66)
[info]   at scala.util.Try$.apply(Try.scala:192)
[info]   at akka.actor.ReflectiveDynamicAccess.getClassFor(DynamicAccess.scala:66)
[info]   at akka.actor.ReflectiveDynamicAccess.createInstanceFor(DynamicAccess.scala:84)
[info]   at akka.actor.ActorSystemImpl.createScheduler(ActorSystem.scala:677)
[info]   at akka.actor.ActorSystemImpl.<init>(ActorSystem.scala:576)
[info]   at akka.actor.ActorSystem$.apply(ActorSystem.scala:142)
[info]   at akka.actor.ActorSystem$.apply(ActorSystem.scala:119)
[info]   at org.apache.toree.kernel.protocol.v5.kernel.socket.HeartbeatSpec.<init>(HeartbeatSpec.scala:38)
[info]   ...

Any hints? I saw multiple versions of Akka being used and cleaned that up.
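
For reference, one common way mixed Akka versions get pinned in sbt -- a sketch only; 2.4.8 is the version the final commit summary mentions, and the exact artifact list here is illustrative:

// build.sbt sketch: force a single Akka version across the dependency graph.
dependencyOverrides ++= Set(
  "com.typesafe.akka" %% "akka-actor"   % "2.4.8",
  "com.typesafe.akka" %% "akka-slf4j"   % "2.4.8",
  "com.typesafe.akka" %% "akka-testkit" % "2.4.8"
)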

@mariusvniekerk
Member Author

@chipsenkbeil I suspect that the remaining test failure may be due to the ScalaTestJar.jar in resources.

scala-interpreter/src/test/resources

How is that built?
Can we rework it so we have Scala 2.10 and 2.11 versions?

@chipsenkbeil
Contributor

@mariusvniekerk, I can't even remember how we created that. I should have left the sources in the resource directory. I believe it had a single class with a method just to verify that adding an external jar written in Scala worked. So, yes, we could definitely produce 2.10 and 2.11 versions. From there, we could have a test that runs for 2.10 versus 2.11. It's as easy as adding scala-2.10 and scala-2.11 directories to the test folder alongside scala, which is run for all versions of Scala.
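
A small build.sbt sketch of that layout (recent sbt releases already pick up the scala-<binary version> directories by default; the explicit setting below just makes the convention visible and is not necessarily Toree's exact configuration):

// Pick up version-specific test sources such as src/test/scala-2.10 and
// src/test/scala-2.11, next to the shared src/test/scala directory.
unmanagedSourceDirectories in Test +=
  (sourceDirectory in Test).value / s"scala-${scalaBinaryVersion.value}"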

@mariusvniekerk
Member Author

@chipsenkbeil So it doesn't seem to be Scala-version related.

I can run that test suite in IntelliJ (and it passes), just not in sbt.
This seems to be related to how sbt runs these tests.

I'm going to replace the Spark snapshot with the released 2.0.0 artifacts later this week.

@mariusvniekerk
Member Author

@chipsenkbeil So all the tests pass locally. Not sure what we want to do about the docker system-test target, since it relies on a docker image that doesn't contain Spark 2.0.

@chipsenkbeil
Contributor

Sounds like it needs to be updated. I wonder if we can support both in one Docker image. @Lull3rSkat3r, @lbustelo?

@lbustelo
Contributor

Is it as simple as setting SPARK_HOME to the right version?

@mariusvniekerk
Member Author

mariusvniekerk commented Jul 28, 2016

Yes, it should be simple to allow switching inside the docker image. This branch, though, does not run on anything before 2.0.
We set the SPARK_HOME in the kernel_spec.

For now, to get the tests to work properly on Travis, I'm just installing Spark 2.0.0 during the docker-based test.

@lbustelo
Contributor

@mariusvniekerk This is pretty awesome work. Regarding the system-test target, I recommend that we add a target similar to example-image that does the installation of Spark 2.0, and then have example-image depend on that.

That way we build the image once, and it can be used multiple times downstream.

As for merging this: I want to eventually/finally cut a release of Toree 0.1.0, and that would be the current state of master supporting Spark 1.6.x. The best thing to do is for us to branch master out, and then we can begin talking about merging this into master. From that point on, master supports Spark 2.

@mariusvniekerk
Member Author

I can roll the docker changes into a separate PR if this one is getting too big.

@mariusvniekerk
Member Author

@lbustelo Okay, done. I think the Python test failure at the end is just transient?

@mariusvniekerk changed the title from "[WIP] Compatibility with Spark 2.0" to "Compatibility with Spark 2.0 and Scala 2.11" on Aug 10, 2016
@Brian-Burns-Bose
Contributor

I made a build of this pull request and tried it out using a custom interpreter. Our interpreter works with the regular build, but now I'm getting the following error:

Uncaught error from thread [spark-kernel-actor-system-akka.actor.default-dispatcher-2] shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for ActorSystem[spark-kernel-actor-system]
java.lang.AbstractMethodError: org.eclairjs.nashorn.JavascriptInterpreter.interpret(Ljava/lang/String;ZLscala/Option;)Lscala/Tuple2;
at org.apache.toree.kernel.protocol.v5.interpreter.tasks.ExecuteRequestTaskActor$$anonfun$receive$1.applyOrElse(ExecuteRequestTaskActor.scala:79)
at akka.actor.Actor$class.aroundReceive(Actor.scala:484)
at org.apache.toree.kernel.protocol.v5.interpreter.tasks.ExecuteRequestTaskActor.aroundReceive(ExecuteRequestTaskActor.scala:35)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
at akka.actor.ActorCell.invoke(ActorCell.scala:495)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
at akka.dispatch.Mailbox.run(Mailbox.scala:224)
at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

@mariusvniekerk
Member Author

JavaScript interpreter? There are some changes in the interpreter API as well as a bump to Scala 2.11.

@Brian-Burns-Bose
Contributor

Yes, indeed a JavascriptInterpreter. I can see some of the changes to the Interpreter trait. My JavascriptInterpreter is in a Maven project in which I have a Toree dependency. I used to be able to do a "make sbt-publishM2" in Toree to get the jars into my local repo. That doesn't seem to be working anymore; I get a bunch of scaladoc errors.

@mariusvniekerk
Member Author

I'll see what those are and fix them. Odd that the existing build on Travis succeeds.

@Brian-Burns-Bose
Contributor

In the meantime I just hardcoded the Toree assembly jar onto my system path. I fixed our interpreter to conform to the new API. I'm getting much further now...

@mariusvniekerk
Member Author

@lbustelo @chipsenkbeil So I'm a bit stumped at the moment. I realized that I didn't have the kernel tests for Scala running.

Binding in the kernel to the interpreter fails with something like

16/08/15 18:59:43 [ERROR] o.a.t.k.i.s.ScalaInterpreter - java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:786)
        at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.callEither(IMain.scala:790)
        at org.apache.toree.kernel.interpreter.scala.ScalaInterpreterSpecific$class.bind(ScalaInterpreterSpecific.scala:156)
        at org.apache.toree.kernel.interpreter.scala.ScalaInterpreter.bind(ScalaInterpreter.scala:41)
        at org.apache.toree.kernel.interpreter.scala.ScalaInterpreter$$anonfun$bindKernelVariable$1.apply$mcV$sp(ScalaInterpreter.scala:127)
        ...
Caused by: java.lang.ClassCastException: org.apache.toree.boot.layer.StandardComponentInitialization$$anon$1 cannot be cast to org.apache.toree.kernel.api.Kernel
        at $line2.$eval$.set(<console>:7)
        at $line2.$eval.set(<console>)
        ... 45 more

Seems this is caused by the interpreter not being able to cast the kernel to a Kernel?

@m3cinc

m3cinc commented Aug 15, 2016

Just noticed the Python kernel works but not the Scala kernel...

@mariusvniekerk
Member Author

A bit further now.

I suspect that the classloader used by the interpreter isn't consulted when trying to do Spark things.
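
A minimal sketch of that suspicion (the helper name is hypothetical, not Toree code): run Spark-facing work with the interpreter's classloader installed as the thread's context classloader so that classes defined in the REPL are visible.

// Hypothetical helper: swap in the interpreter's classloader for the duration of `body`.
def withInterpreterClassLoader[A](interpreterLoader: ClassLoader)(body: => A): A = {
  val previous = Thread.currentThread().getContextClassLoader
  Thread.currentThread().setContextClassLoader(interpreterLoader)
  try body
  finally Thread.currentThread().setContextClassLoader(previous)
}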

@lbustelo
Contributor

@mariusvniekerk Just wanted to let you know that I branched master off and created a 0.1.x branch. We will do our first release off of that. That way, master is now free to take in your PR. Let me know when you think this is ready.

@Brian-Burns-Bose
Contributor

@mariusvniekerk yes

@mariusvniekerk
Member Author

@lbustelo I think this is good to merge to master now

@lbustelo
Contributor

@mariusvniekerk I'll take a look at it tomorrow

@lbustelo
Contributor

@mariusvniekerk

So I did a make clean dev, and when I went into a new notebook, the kernel dies. I get this error:

[I 18:46:25.705 NotebookApp] Loading urth_import server extension.
[I 18:46:25.708 NotebookApp] Serving notebooks from local directory: /srv/toree/etc/examples/notebooks
[I 18:46:25.708 NotebookApp] 0 active kernels 
[I 18:46:25.709 NotebookApp] The Jupyter Notebook is running at: http://[all ip addresses on your system]:8888/
[I 18:46:25.709 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[I 18:47:07.628 NotebookApp] 302 GET / (192.168.99.1) 0.82ms
[I 18:59:57.013 NotebookApp] Writing notebook-signing key to /root/.local/share/jupyter/notebook_secret
[W 18:59:57.014 NotebookApp] Notebook Untitled.ipynb is not trusted
[I 18:59:57.136 NotebookApp] Kernel started: 097d9211-f1cb-4bd4-8079-883975a257f4
Starting Spark Kernel with SPARK_HOME=/usr/local/spark
Listening for transport dt_socket at address: 5005
Exception in thread "main" java.lang.NoSuchMethodError: scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object;
    at org.apache.toree.boot.CommandLineOptions.toConfig(CommandLineOptions.scala:142)
    at org.apache.toree.Main$$anon$1.<init>(Main.scala:35)
    at org.apache.toree.Main$.delayedEndpoint$org$apache$toree$Main$1(Main.scala:35)
    at org.apache.toree.Main$delayedInit$body.apply(Main.scala:24)
    at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
    at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
    at scala.App$$anonfun$main$1.apply(App.scala:71)
    at scala.App$$anonfun$main$1.apply(App.scala:71)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:32)
    at scala.App$class.main(App.scala:71)
    at org.apache.toree.Main$.main(Main.scala:24)
    at org.apache.toree.Main.main(Main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

@mariusvniekerk
Member Author

mariusvniekerk commented Aug 17, 2016

That's caused by an incompatible version of Scala. You may have a Spark compiled against 2.10.

I've been mostly working against

make system-test
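
As a quick sanity check (a sketch, not part of the PR), the Scala and Spark versions actually on the classpath can be printed from a Scala cell or spark-shell -- the ArrowAssoc NoSuchMethodError above is the classic symptom of mixing 2.10 and 2.11 binaries:

// Expected values are illustrative; 2.11.x is what this branch requires.
println(scala.util.Properties.versionNumberString) // e.g. "2.11.8"
println(org.apache.spark.SPARK_VERSION)            // e.g. "2.0.0"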

@lbustelo
Contributor

That's it... so I rely on docker images to test Toree; I don't run it directly on my Mac. We need to update the image used for the dev target to something that installs Spark 2.0. The image in the Makefile is still bringing in 1.6. https://github.com/jupyter/docker-stacks/blob/master/pyspark-notebook/Dockerfile#L10

@lbustelo
Contributor

Will build and test with jupyter/docker-stacks#263

@lbustelo
Contributor

So far so good...

[screenshot attached, 2016-08-18]

@mariusvniekerk
Member Author

I've removed the sqlContext bind and replaced it with a spark: SparkSession one -- seems to be the way Databricks does it now.

@lbustelo
Contributor

@mariusvniekerk So what are the implicit vals that are created?

@mariusvniekerk
Member Author

kernel, sc and spark
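
For illustration, a notebook cell using those bindings might look like the following (the data is made up; this is a sketch, not a test from the PR):

// spark: SparkSession -- Dataset/DataFrame entry point
import spark.implicits._
val df = spark.range(0, 100).toDF("n")
println(df.filter($"n" % 2 === 0).count())   // 50

// sc: SparkContext -- classic RDD API
val rdd = sc.parallelize(1 to 10)
println(rdd.sum())                           // 55.0

// kernel: the Toree kernel API itself (magics, comm channels, streams).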

@lbustelo
Contributor

I'm thinking that for now, we should have our own Dockerfile to use for make dev and not rely on docker-stacks.

@mariusvniekerk
Member Author

Yeah, I basically did that to get system-test to pass. I'm travelling a bit at the moment, but I'll see what I can do once I have real internet again.

@lbustelo
Contributor

@mariusvniekerk You should rebase on master to pick up #59

@mariusvniekerk
Member Author

Rebasing this is getting terrifying :)


@maver1ck
Contributor

maver1ck commented Aug 24, 2016

I don't think we need to rebase.
This patch doesn't have any changes in MagicParser.scala and should merge cleanly.

@lbustelo
Contributor

You are right... no need to rebase... but since master has been very quiet since @mariusvniekerk started this, it may not be that bad. I'll leave it up to you guys. It just makes my job easier when I merge this in.

@lbustelo
Contributor

lbustelo commented Sep 7, 2016

I'm going to merge this and have the make dev fix go in separately.

@nitind, the PR for your makefile fixes will then go on top of master.

@lbustelo
Contributor

lbustelo commented Sep 7, 2016

@mariusvniekerk I tried for almost an hour to merge this branch into master... it is giving me all sorts of problems. Specifically, it complains about project/Common.scala.

I even tried squashing your commits... but still got

gMac:toree ginobustelo$ git rebase -i HEAD~40
error: refusing to lose untracked file at 'project/Common.scala'
error: could not apply c1f85ed... Added cross compiling version

When you have resolved this problem, run "git rebase --continue".
If you prefer to skip this patch, run "git rebase --skip" instead.
To check out the original branch and stop rebasing, run "git rebase --abort".

Could not apply c1f85edd1c05e7bc88204f247ee656ff07f216a6... Added cross compiling version

Since I'm doing this blind... I would rather you clean up this PR by squashing your commits and rebasing on master. Remember that this repo is really a mirror of the internal Apache repo, so there is no easy button for me. I have to manually merge to master. I will merge this in as soon as you address these issues. Thanks!

@mariusvniekerk
Member Author

mariusvniekerk commented Sep 7, 2016

Okay I'll do this tomorrow morning

Removed SparkR copy
Migrated to Scala 2.11
Migrated to Apache Spark 2.0.x
Migrated to Akka 2.4.8
@mariusvniekerk
Member Author

Squashed everything together with a mixed reset (git reset --mixed).

@asfgit merged commit 01936c1 into apache:master on Sep 8, 2016
@lbustelo
Contributor

lbustelo commented Sep 8, 2016

merged!!! @mariusvniekerk Thanks again for all your hard work.

@nitind Since this is now in master, fix the make dev target with a PR to master.

@mariusvniekerk
Member Author

Thanks guys

@mariusvniekerk deleted the Spark2.0-compat branch on September 21, 2016