Compatibility with Spark 2.0 and Scala 2.11 #56
Unsure why this is not testing properly on Jenkins. When running tests locally on Windows I get initialization issues for AddExternalJarMagicSpecForIntegration.
I've had endless issues getting sbt to resolve properly using coursier -- so I've disabled that for now. Okay, I've changed the internals of Toree to make a SparkSession instead of a sqlContext. All the tests pass locally for me now (well, aside from the 3 ignored ones).
@mariusvniekerk, reading the Spark 2.0 documentation, it says, "For the Scala API, Spark 2.0.0-preview uses Scala 2.11. You will need to use a compatible Scala version (2.11.x)." I see that there is a 2.10 version in Maven Central, so from these docs I don't know if this means we can expect the 2.10 libraries to be stable. The topic of dropping 2.10 support has come up on the dev mailing list, but has not been finalized based on the comments in apache/spark#11542. I haven't seen any other threads discussing this, but I just want to keep this in the back of our minds. I will try the PR, see how it works for me, and give you an update.
So Scala 2.11 is the "default" build now. You can still compile Spark for 2.10 (much like you could compile for 2.11 in the 1.x series). Spark does require Java 8 under 2.0.
Is the 2.11 support just for Spark 2.0, or is it also for Spark 1.x? Also, with automated tests failing, does this mean that 2.11 passes those tests and 2.10 is now broken? Just trying to figure out what the state is.
So we probably need to change the automated test setup to use sbt +test so everything runs across Scala versions.
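For reference, cross-building in sbt just needs the versions declared; a minimal build.sbt sketch (the exact version numbers here are assumptions, not necessarily what Toree pins):

```scala
// Hypothetical build.sbt fragment: with crossScalaVersions declared,
// `sbt +test` compiles and runs the test suite once per listed version.
scalaVersion := "2.11.8"
crossScalaVersions := Seq("2.10.6", "2.11.8")
```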
I'm getting some test failures around the %AddJar stuff; not certain what actually needs to change.
%AddJar uses some code specific to Scala 2.10. It was removed in 2.11, but a fix was added to later releases of 2.11 to support the same implementation; I believe it was reintroduced in Scala 2.11.5 via scala/scala#4051. That added a different method to support dynamically adding jars (used to power the REPL). So, we'd need to provide separate implementations for Scala 2.10 and 2.11, which is doable with the version of sbt we have; we just need to provide a method that is implemented differently between the two. The original issue that added this functionality (which we also have in the kernel) is here: apache/spark#1929
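A minimal sketch of one way to do that in sbt, assuming version-specific source directories (the directory names and setting below are illustrative, not Toree's actual build):

```scala
// Hypothetical build.sbt fragment: add an extra source directory per
// Scala binary version, so src/main/scala-2.10 and src/main/scala-2.11
// can each hold their own implementation of the %AddJar internals.
unmanagedSourceDirectories in Compile += {
  val base = (sourceDirectory in Compile).value
  CrossVersion.partialVersion(scalaVersion.value) match {
    case Some((2, 10)) => base / "scala-2.10"
    case _             => base / "scala-2.11"
  }
}
```

Each implementation then only ever compiles against its own Scala version, while the shared code calls one common method signature.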
So I'm not sure what is causing the test errors like this,
Any hints? I saw multiple versions of Akka being used and cleaned that up.
@chipsenkbeil I suspect that the remaining test failure may be due to the ScalaTestJar.jar in resources. How is that built?
@mariusvniekerk, I can't even remember how we created that. I should have left the sources in the resource directory. I believe it had a single class with a method just to verify that adding an external jar written in Scala worked. So, yeah, we could definitely produce 2.10 and 2.11 versions. From there, we could have a test that runs for 2.10 versus 2.11. It's as easy as adding
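For reference, something like this is presumably all the jar contains (the package, class, and method names here are guesses, not the original source):

```scala
// Hypothetical reconstruction of the test jar's contents: a single class
// with one method, compiled once with 2.10 and once with 2.11 to produce
// version-specific jars for the %AddJar integration test.
package com.testcase

class TestClass {
  def sayHello(name: String): String = s"Hello, $name"
}
```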
@chipsenkbeil So it doesn't seem to be Scala-version related. I can run that test suite in IntelliJ (and the tests pass), just not in sbt. I'm going to replace the Spark snapshot with the 2.0.0 release later this week.
@chipsenkbeil So all the tests pass locally. Not sure what we want to do about the Docker system-test target, since it relies on a Docker image that doesn't contain Spark 2.0.
Sounds like it needs to be updated. I wonder if we can support both in one Docker image. @Lull3rSkat3r, @lbustelo?
Is it as simple as setting SPARK_HOME to the right version?
Yes, it should be simple to allow switching inside the Docker image. This branch, though, does not run on pre-2.0. For now, to get the tests to work properly on Travis, I'm just installing Spark 2.0.0 during the Docker-based test.
@mariusvniekerk This is pretty awesome work. Regarding the system-test target, I recommend that we add a similar target so that we have the image ready to go once, and it can be used multiple times downstream. As for merging this: I want to eventually/finally cut a release of Toree 0.1.0, and that would be the current state of master supporting Spark 1.6.x. The best thing to do is for us to branch master out, and then we can begin talking about merging this into master. From that point on, master supports Spark 2.
I can roll the Docker changes into a separate PR if this guy is getting too big.
(force-pushed from 4d92832 to 17dbbf9)
@lbustelo Okay, done that. I think the Python test failure at the end is just transient?
I made a build of this pull request and tried it out using a custom interpreter. Our interpreter works with the regular build, but now I'm getting the following error:
Uncaught error from thread [spark-kernel-actor-system-akka.actor.default-dispatcher-2] shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for ActorSystem[spark-kernel-actor-system]
JavaScript interpreter? There are some changes in the interpreter API, as well as a bump to Scala 2.11.
Yes, indeed a JavascriptInterpreter. I can see some of the changes to the Interpreter trait. My JavascriptInterpreter is in a Maven project in which I have a Toree dependency. I used to be able to do a "make sbt-publishM2" in Toree to get the jars into my local repo. That doesn't seem to be working anymore? I get a bunch of Scaladoc errors.
I'll see what those are and fix them. Odd that the existing build on Travis
In the meantime, I just hardcoded the Toree assembly jar onto my system path. I fixed our interpreter to conform to the new API. I'm getting much further now...
@lbustelo @chipsenkbeil So I'm a bit stumped at the moment. I realized that I didn't have the kernel tests for Scala running. Binding the kernel into the interpreter fails with something like
Seems this is caused by the interpreter not being able to cast the kernel as a kernel?
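For what it's worth, here's a self-contained sketch of that failure mode: the same class loaded by two different classloaders is two distinct runtime types, so the cast blows up even though the names match (this is an illustration of the symptom, not Toree code; it assumes the classes live on an ordinary file classpath):

```scala
import java.net.URLClassLoader

// A class we will deliberately load twice.
class Payload

object ClassCastDemo extends App {
  val where = classOf[Payload].getProtectionDomain.getCodeSource.getLocation
  // parent = null: this loader re-reads Payload from disk instead of
  // delegating to the loader that already defined it.
  val isolated = new URLClassLoader(Array(where), null)
  val reloaded = isolated.loadClass("Payload").newInstance()
  try {
    reloaded.asInstanceOf[Payload] // same class name, different loader
  } catch {
    case e: ClassCastException => println(s"Cast failed as expected: $e")
  }
}
```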
Just noticed the Python kernel works but not the Scala kernel...
Getting further now. I suspect that the classloader used by the interpreter isn't consulted when Spark tries to do its work.
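If that's the case, a sketch of the usual fix, assuming the interpreter exposes its own loader (interpreterClassLoader is a hypothetical name, not an actual Toree field):

```scala
// Hypothetical sketch: make the interpreter's ClassLoader the thread's
// context ClassLoader before running Spark code, so reflection-based
// lookups inside Spark can see REPL-defined classes, then restore it.
def withInterpreterLoader[A](interpreterClassLoader: ClassLoader)(body: => A): A = {
  val previous = Thread.currentThread().getContextClassLoader
  Thread.currentThread().setContextClassLoader(interpreterClassLoader)
  try body
  finally Thread.currentThread().setContextClassLoader(previous)
}
```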
@mariusvniekerk Just wanted to let you know that I branched master off and created a new branch.
@mariusvniekerk yes
@lbustelo I think this is good to merge to master now.
@mariusvniekerk I'll take a look at it tomorrow.
So I did a
That's caused by an incompatible version of Scala. You may have a Spark compiled against 2.10; I've been mostly working against
That's it... So I rely on Docker images to test Toree; I don't run it directly on my Mac. We need to update the image being used for the
Will build and test with jupyter/docker-stacks#263.
I've removed the sqlContext bind and replaced it with a spark: SparkSession one -- seems to be the way Databricks does it now.
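Roughly, that change looks like this (a minimal sketch of the Spark 2.0 API, not the exact Toree code; the app name is illustrative):

```scala
import org.apache.spark.sql.SparkSession

// Spark 2.0 entry point: one SparkSession instead of a bound sqlContext.
val spark = SparkSession.builder()
  .appName("toree-kernel")
  .getOrCreate()

// Anything that still needs the old handles can get them from the session:
val sc = spark.sparkContext
val sqlContext = spark.sqlContext
```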
@mariusvniekerk So what are the implicit vals that are created?
kernel, sc, and spark
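So a notebook cell can use them directly, along these lines (illustrative only; the three values are pre-bound by the kernel rather than defined in the cell):

```scala
// Illustrative cell using the pre-bound values: `spark` (SparkSession),
// `sc` (SparkContext), and `kernel` (the Toree kernel API).
val df = spark.range(10).toDF("n")
println(df.count())                   // via the SparkSession
println(sc.parallelize(1 to 5).sum()) // via the SparkContext
```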
I'm thinking that for now, we should have our own Dockerfile to use for
Yeah, I basically did that to get the system test to pass. I'm travelling a bit at the moment, but I'll see what I can do once I have real internet again.
@mariusvniekerk You should rebase on master to pick up #59.
Rebasing this is getting terrifying :)
I don't think we need to rebase.
You are right... no need to rebase... but since master has been very quiet since @mariusvniekerk started this, it may not be that bad. I'll leave it up to you guys. It just makes my job easier when I merge this in.
I'm going to merge this. @nitind, the PR for your makefile fixes will then go on top of master.
@mariusvniekerk I tried for almost an hour to merge this branch into master... it is giving me all sorts of problems. Specifically, it complains about
I even tried squashing your commits... but still got
Since I'm doing this blind, I would rather you clean up this PR by squashing your commits and rebasing on master. Remember that this repo is really a mirror of the internal Apache repo, so there is no easy button for me; I have to manually merge to master. I will merge this in as soon as you address these issues. Thanks!
Okay, I'll do this tomorrow morning.
Removed SparkR copy
Migrated to Scala 2.11
Migrated to Apache Spark 2.0.x
Migrated to Akka 2.4.8
(force-pushed from 28bb827 to 01936c1)
Squashed everything together with a mixed reset (git reset --mixed).
Merged!!! @mariusvniekerk Thanks again for all your hard work. @nitind Since this is now in master, fix the
Thanks guys!
Took an initial stab at seeing what is needed to get Spark 2.0-SNAPSHOT building
Removed code that referred to a classServerURI which no longer exists as of SPARK-11563
Had to remove the -deprecation flag temporarily.
Probably want to change over initialization for this to make a SparkSession in 2.0 rather than a SparkContext and then a SqlContext, as sketched below.
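As a sketch of what that change looks like (illustrative of the Spark APIs, not the actual kernel bootstrap code; the app name is an assumption):

```scala
// Spark 1.x style: build a SparkContext, then wrap it in a SQLContext.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

def oldInit(): SQLContext = {
  val conf = new SparkConf().setAppName("toree")
  val sc = new SparkContext(conf)
  new SQLContext(sc)
}

// Spark 2.0 style: a single SparkSession subsumes both.
import org.apache.spark.sql.SparkSession

def newInit(): SparkSession =
  SparkSession.builder().appName("toree").getOrCreate()
```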