[MAHOUT-1894] Add Support for Spark 2.x #271
Conversation
MAHOUT-1894 Add support for spark 2.x
I'm soooo into dropping a special Mahout shell. Do your comments mean we just run Mahout classes in the Spark shell for Spark 2.x? Does this work with and without (@andrewpalumbo 's case) Zeppelin? IF we can compile Mahout with Scala 2.11 fairly easily (excluding the shell) and IF we can run Mahout with some helper scripts in the Spark shell, we can drop the Mahout Shell code and get all the advantages of using the plain Spark shell with our extensions. Can/should this be done? I realize I've asked these before, but this seems the best forum.
@pferrel In short, yes. The idea here is that we drop the Mahout Shell entirely. It was also the blocker for upgrading to Spark 2.x. The Zeppelin integration, for all intents and purposes, is a Spark shell plus some imports and setting up the distributed context. So that is what we're doing here. Hopefully removing the shell will also clear the way for the Scala 2.11 upgrade / profile.
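For reference, an init script of the sort described would mostly be imports plus wiring up the distributed context. A minimal sketch, assuming Mahout's Spark bindings are on the classpath and that the `sparkbindings` package exposes the usual `sc2sdc` helper (names here are my reading of the bindings and may differ from the actual bin/load-shell.scala):

```scala
// Sketch of a spark-shell init script (assumed names, not the actual file):
// import the Samsara DSL and lift spark-shell's SparkContext (`sc`) into a
// Mahout distributed context so DRM operations can run against it.
import org.apache.mahout.math._
import org.apache.mahout.math.scalabindings._
import org.apache.mahout.math.scalabindings.RLikeOps._
import org.apache.mahout.math.drm._
import org.apache.mahout.math.drm.RLikeDrmOps._
import org.apache.mahout.sparkbindings._

implicit val sdc: SparkDistributedContext = sc2sdc(sc)
```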
hmm.. just tried to launch into
Something in the script? Spark seems to think that spark-submit is a directory ...
Possibly a regression from last night, when I moved the location / changed the name of load.scala -> bin/load-shell.scala
Confirmed shell explosion; fixed by deleting $MAHOUT_HOME/bin/metastore_db. My shell explosion was a slightly different flavor, though. Can you try the above?
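For anyone hitting the same explosion, the fix above amounts to deleting the stale Derby metastore directory that an earlier spark-shell run left behind (path taken from the comment; `MAHOUT_HOME` is assumed to be set):

```shell
# Remove the stale Derby metastore directory left by a previous
# spark-shell run; a fresh one is created on the next launch.
rm -rf "$MAHOUT_HOME/bin/metastore_db"
```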
@@ -211,10 +207,6 @@
</dependency>
There is another shell dependency at line 158 that needs to come out.
@rawkintrevo I was wondering how we will evolve the shell when a new Spark version comes out. I am also wondering what the use cases are for mahout-shell; it seems like most people use Mahout as an embedded application or a library, so is the shell just to test out a few things? I would actually be all for removing the shell altogether: less code to maintain in the long run. Let me know if I am missing something here.
@skanjila the shell is useful enough that I'd like to keep it around if possible. Some reasons off the cuff:
To your point: yes, 86ing the entire shell module certainly poses some very attractive advantages. What we're seeing in this PR is an opportunity to get the best of both worlds (no code, but still have a shell). We just need to work out some kinks on getting it working with spark-shell correctly.
@rawkintrevo The notion of interactive data science is very interesting to me, as that's what I do at work. But what is the advantage of using Mahout for that versus doing it directly in the Spark shell with Spark SQL or the ML algorithms in Spark? Is that where Samsara comes in? I'm just trying to understand the tradeoffs between the Spark and Mahout worlds.
@skanjila Yes. In short: SparkML has a few non-extensible algorithms with limited functionality. Mahout lets you write your own algorithms; at the moment there are some amazing tools to help you do that (distributed SVD), but not many pre-baked algorithms (you still need to go back to SparkML for Random Forests, etc.). With the new algorithms framework, I hope to see Mahout catch up to and exceed SparkML's pre-canned algorithm collection in short order, driven primarily by community involvement.
@andrewpalumbo Checked against Spark 1.6.1 / 2.0.2 / 2.1.0 on another box; no issues. Can someone else help test this?
Great! Is there anything left to do here?
+1
I'd like someone else to test this. Also curious if this solves MAHOUT-1897. This DOES NOT solve MAHOUT-1892 (serialization when doing a map block in the shell).
@rawkintrevo I was going to test this on an azure vm, do you guys still need help testing? |
@rawkintrevo here's what I see when testing on an azure vm:

[spark-shell ASCII welcome banner]
Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_80)
That file does not exist
scala>

Looks good to me, +1 (non-binding of course). Perhaps we should try some heavy matrix ops on it for further testing.
@skanjila Thanks for testing! I usually run the OLS example, though another type of test is probably advisable to truly detect bugs. Could you also confirm that it works in the following ways: Thanks again!
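For concreteness, the OLS smoke test mentioned here looks roughly like the ordinary-least-squares walkthrough from the Samsara tutorial; the data values below are invented, and the session assumes the Mahout Samsara imports and an implicit distributed context are already loaded (as the shell init script does):

```scala
// Solve the normal equations X'X beta = X'y on a small distributed matrix.
// Assumes the Samsara DSL imports and an implicit DistributedContext.
val drmData = drmParallelize(dense(
  (1.0, 2.0, 5.1),
  (2.0, 1.0, 4.2),
  (3.0, 4.0, 11.3),
  (4.0, 3.0, 10.1)), numPartitions = 2)

val drmX = drmData(::, 0 until 2)   // feature columns as a DRM
val y = drmData.collect(::, 2)      // target column as an in-core vector
val drmXtX = drmX.t %*% drmX        // distributed X'X
val drmXty = drmX.t %*% y           // distributed X'y
val beta = solve(drmXtX, drmXty)    // in-core solve of the normal equations
```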
Here's the results with Spark 2.0.2:

That file does not exist
[spark-shell ASCII "Welcome to ..." banner]
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_80)
scala>
And here's the results for Spark 2.1.0:

That file does not exist
[spark-shell ASCII "Welcome to ..." banner]
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_80)
scala>
@rawkintrevo Is there any other help I can provide on this? Maybe run through some example .mscala scripts; let me know.
Yeah, I'm getting a result for all three versions of Spark, but the welcome banner situation could use some work. I'd like to remove the "That file does not exist" message, and with 1.6.3 the Spark banner shows up before the Mahout banner, while with 2.x the Mahout banner shows up first. Perhaps suppressing the Mahout banner makes sense.
As long as we're sticking to Scala 2.10, running mahout on spark 2.x is simply a matter of
mvn clean package -Dspark.version=2.0.2
or
mvn clean package -Dspark.version=2.1.0
The trouble comes with the shell...
I checked Apache Zeppelin to see how they handle multiple spark/scala versions...
a brief preview of the descent into hell that is having a shell that handles multiple spark/scala versions
So I took an alternate route. I dropped the Mahout shell altogether, changed the mahout bin file to load the spark-shell directly, and pass it a Scala script that takes care of our imports.
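The relevant bit of the bin file might look something like the sketch below. The paths and jar names are my assumptions, not the actual script; the one concrete piece is that spark-shell inherits the Scala REPL's -i flag, which runs a script before handing over the prompt.

```shell
# Hypothetical sketch of the launcher logic: hand control to the plain
# spark-shell, preloading Mahout's jars and an init script of imports.
# SPARK_HOME, MAHOUT_HOME, and the jar list are assumptions.
exec "$SPARK_HOME/bin/spark-shell" \
  --jars "$MAHOUT_HOME/mahout-math-scala_2.10.jar,$MAHOUT_HOME/mahout-spark_2.10.jar" \
  -i "$MAHOUT_HOME/bin/load-shell.scala"
```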
When building there is a single deprecation warning regarding the sqlContext and how it is created in the spark-bindings.
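The deprecation in question is presumably the standard Spark 2.x one: constructing a SQLContext directly is deprecated in favor of SparkSession. Silencing it would look something like this (a sketch against Spark's public API, not the actual spark-bindings code; the app name is illustrative):

```scala
import org.apache.spark.sql.SparkSession

// Spark 2.x unified entry point; `new SQLContext(sc)` is deprecated
// in its favor.
val spark = SparkSession.builder()
  .appName("mahout-spark-bindings")  // illustrative name
  .getOrCreate()

// Kept for callers that still expect a SQLContext.
val sqlContext = spark.sqlContext
```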
I think we should add binaries for Spark 2.0 and Spark 2.1, as a matter of convenience and for the Zeppelin integration.