MMLSpark on Cloudera #311
Hey @moyanojv, thanks for reaching out! MMLSpark should be entirely compatible with YARN, as we do not rely on a particular scheduler. Are you able to install other Spark Packages on your system? Do you get a particular error message?
Thanks @mhamilton723 for your help. Right now I'm a little lost. As far as I can see, this package contains python code, so I'm not sure how to install it. Do I have to install it as a python package in my environment?
@moyanojv to add a python+scala library to Spark, you use "Spark Packages": when you create or spin up your Spark session, you use the --packages flag to attach our maven coordinate. If you are using pyspark, attaching this maven package will automatically load the python bindings into your interpreter. Here are the sections in the readme that describe the process: https://github.com/Azure/mmlspark#spark-package and https://github.com/Azure/mmlspark#python Hope this helps!
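As a sketch, the launch step described above looks like the following. The coordinate `Azure:mmlspark:0.12` is the version discussed in this thread; substitute the release you actually need, and note this assumes `pyspark`/`spark-submit` are already on your PATH:

```shell
# Attach the MMLSpark Spark Package at launch time; the same --packages
# flag works for both the interactive shell and batch submission.
pyspark --packages Azure:mmlspark:0.12

# or, for a batch job (my_job.py is a placeholder script name):
spark-submit --packages Azure:mmlspark:0.12 my_job.py
```

With `--packages`, Spark resolves the jar from the Spark Packages repository at startup and ships it to the executors, so no manual jar copying should be needed.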
@mhamilton723 I used this command on my Cloudera cluster: pyspark2 --master yarn --deploy-mode client --packages Azure:mmlspark:0.12 and the shell comes up:
As you can see, the package is downloaded, and it seems that it is also correctly installed. But when I follow the tutorial:
Am I doing anything wrong? Thanks for your help.
Hmm, the first line looks right, but when you use pyspark as a command you don't need to recreate the spark object, as it already exists. Try just import mmlspark and see if that works.
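A hedged sketch of that check: launch the shell with the package attached and only import the bindings, reusing the `spark` session that pyspark already created (the `pyspark2` command name and the 0.12 coordinate are taken from the commands earlier in this thread; adjust for your environment):

```shell
# Feed a minimal script to the interactive shell; do NOT rebuild the
# SparkSession -- pyspark has already created one named `spark`.
pyspark2 --master yarn --deploy-mode client --packages Azure:mmlspark:0.12 <<'EOF'
import mmlspark                  # succeeds only if the python bindings were attached
print(mmlspark.__name__)         # quick sanity check that the module resolved
EOF
```

If the import fails here even though the jar downloaded, it usually means the python files inside the package were not placed on the interpreter's path, which is what the pip-wheel suggestion later in the thread works around.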
@mhamilton723 here is the result:
I have attached the spark environment information. PySparkShell - Environment.pdf Thanks for your help. |
Thanks for the quick reply! Is it possible to try this out with spark 2.2? That's what our package was built against. |
I'm sorry but right now this is not possible. @mhamilton723 many thanks for your help. Regards |
@moyanojv perhaps also try installing the pip package directly, as it seems your spark-submit is not installing the python bits as anticipated: https://mmlspark.azureedge.net/pip/mmlspark-0.12-py2.py3-none-any.whl
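Sketched out, the workaround is a plain pip install of the wheel URL given above, run with the same python interpreter that your pyspark shell uses (the URL is the 0.12 wheel from this comment; later comments link newer versions):

```shell
# Install the python bindings directly so `import mmlspark` works even
# when --packages fails to ship the python files to the interpreter.
pip install https://mmlspark.azureedge.net/pip/mmlspark-0.12-py2.py3-none-any.whl
```

Note that this only provides the python side; the scala/JVM jar must still reach the cluster via --packages (or spark.jars.packages), since the python bindings call into it.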
I am new to mmlspark. Can I have some help with this, please?
@apremgeorge it looks like you are running into a similar issue, can you try to install the latest pip package for the v0.17 version here: https://mmlspark.azureedge.net/pip/mmlspark-0.17-py2.py3-none-any.whl |
@apremgeorge also, how did you install the package in Cloudera? Did you specify the Spark Package maven coordinates somewhere? Also, do you know if the scala bindings are working and you are only having trouble with the pyspark python bindings?
@imatiach-msft Thank you very much for the reply, |
@imatiach-msft Hi~ I run into the same problem with 0.18.1. Where can I get a 0.18.1 wheel file? Thank you! |
Is there a reference anywhere to what wheel files are available at https://mmlspark.azureedge.net/pip/** ? |
(mmlspark) [root@hadoop51]# spark2-submit --master yarn --conf spark.pyspark.python=/usr/lib/anaconda2/envs/mmlspark/bin/python --num-executors 10 --executor-memory 15G test_mmlspark.py
(environment excerpt: certifi 2016.2.28, future 0.18.2)
We are trying to use mmlspark in a Cloudera environment using Hue pyspark notebooks through livy.
All our efforts have failed and we wonder if this option is possible. The only way we've got it working is to use pyspark without yarn.
Tested but not working:
We have modified the Spark 2 Client Advanced Configuration Snippet (Safety Valve) in Cloudera Manager to add --packages Azure:mmlspark:0.12 (spark.jars.packages=Azure:mmlspark:0.12). With this property our Livy session downloads the packages and dependencies, but we don't see anything regarding mmlspark in the session property spark.submit.pyFiles.
Here are the Spark properties of the environment of a Livy session created using the approach described above:
livy-session-7 - Environment.pdf
And here is a screenshot of a working environment of a pyspark2 session using a different approach (pyspark2 --master local --deploy-mode client --packages Azure:mmlspark:0.12):
pyspark - Environment.pdf
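One hedged alternative to the client safety valve is to pass `spark.jars.packages` in the Livy session-creation request itself, so the setting travels with the session rather than the client config. A sketch of such a request body follows; the `livy-host:8998` endpoint is the standard Livy default and an assumption here, so confirm it for your cluster:

```shell
# Write a Livy session-creation payload carrying the same package
# coordinate used with --packages elsewhere in this thread.
cat > livy_payload.json <<'EOF'
{
  "kind": "pyspark",
  "conf": { "spark.jars.packages": "Azure:mmlspark:0.12" }
}
EOF

# Then POST it to the Livy REST endpoint (host/port are assumptions):
# curl -s -X POST -H 'Content-Type: application/json' \
#      -d @livy_payload.json http://livy-host:8998/sessions
```

If Hue creates the Livy session for you, the equivalent is putting `spark.jars.packages` in the session's Spark conf, which should make it appear in the session's environment properties like the ones attached above.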
So, here is my question: is it possible to use mmlspark in a Cloudera environment using Hue pyspark notebooks through Livy?
Thanks in advance.