Spark 1.4.1 release candidates #142
Comments
+1 Spark 1.4.1 is now released... Really appreciate if you can quickly include this one... Thank you |
Waiting for this release on AWS EMR... have major bug fixes. Thanks |
Waiting as well for 1.4.1 due to the several bug fixes. Thanks |
+1 |
It's coming... |
Great............ |
is it there? |
@christopherbozeman Can you please tell me how much time it'll take? Thanks |
Any update on this issue? I'd really appreciate the possibility to use the latest bug-fixed version.. Thanks |
Spark 1.4.1 is now available as native application with EMR's new release, see https://forums.aws.amazon.com/ann.jspa?annID=3160. |
Hi Chris, |
@PKUKILLA @christopherbozeman answered that in #32 with the links: |
@christopherbozeman This page needs updating for the dynamic feature because it calls out a fixed instance-count, true? http://docs.aws.amazon.com/ElasticMapReduce/latest/ReleaseGuide/emr-spark-launch.html
|
Thanks @christopherbozeman. Do you think you are also going to add 1.4.1 support to the |
thanks it works |
@christopherbozeman Does the EMR 4.0.0 version of Spark contain the patch from christopherbozeman/spark@316b2e0? It doesn't look like it does, as when I insert into a Hive table using Spark SQL, it creates temporary files in S3 and then appears to get stuck when trying to move them to their right place. |
Considering also the issues reported for Hive (#154) and Ganglia (#153), is there any possibility of getting Spark 1.4.1 as a bootstrap action (a.k.a. "the usual way"), so that in the meantime it works on the 3.8.0 AMI (and Hadoop 2.4)? Our system upgrade is stuck because of this: 1.4.0 has known blocking bugs, so there is no chance to move forward from Spark 1.3.1 until you kindly upgrade the emr-bootstrap-action support as well. I think many people would really appreciate it. Thanks. |
Stuck on the upgrade to Spark 1.4.0 using AMI 3.8.0 due to https://issues.apache.org/jira/browse/SPARK-8368, so we can't move forward to 1.4.0 either unless we switch to AMI 4.0.0. PLEASE upgrade the emr-bootstrap-action to support Spark 1.4.1 on AMI 3.8.0; this is really a big issue for many people! |
Same here, would appreciate getting 1.4.1 integrated here while EMR 4.0 matures. |
Furthermore, we actually CAN'T switch to AMI 4.0.0, since we rely on DataPipeline, which currently doesn't support that AMI version; see https://forums.aws.amazon.com/thread.jspa?messageID=662004 and https://forums.aws.amazon.com/thread.jspa?messageID=658891 for reference. |
+1, I think EMR 4.0.0 is not mature enough to replace all the applications available on AMI 3.8.0. |
Spark 1.4.1 is now available for the Spark bootstrap action for EMR AMI 3.x and can be requested by version "1.4.1.a". |
@christopherbozeman Can you answer my previous question about the EMR 4.0 version of Spark containing christopherbozeman/spark@316b2e0? |
Hi christopherbozeman, Thank you for the update. Could you please provide some feedback on the configuration that I am trying to use. My Configuration: |
@Sazpaimon I dug into your comment on #142 (comment) and determined that christopherbozeman/spark@316b2e0 is a NOOP (the underlying RDD interaction with the Hadoop output format takes care of the S3 direct write). What performs the magic of not creating extra temporary paths when writing to S3 is the code that EMR added to Hive, which gets included by the Spark BA when the -h option is supplied. This is what is missing from EMR release 4.0.0. Also, Spark 1.4 only supports up to Hive 0.13 (https://issues.apache.org/jira/browse/SPARK-8065), so the native Spark in EMR release 4.0.0 cannot just use the Hive 1.0 jars to fix the issue. I'll report this issue internally to the development team so it is resolved in a future EMR release. At this time, the ugly workaround would be to take the Hive jars from EMR AMI 3.x with Hive 0.13 that are pruned for Spark (~spark/classpath/hive/*), copy them to the master of an EMR 4.0.0 cluster, and then append the jars to the Spark classpath. |
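For anyone trying the workaround above, the "append the jars to the Spark classpath" step could look roughly like the sketch below. It is not an EMR-provided tool and has not been tested on a cluster. The directory `/home/hadoop/hive-0.13-jars` and both function names are hypothetical, and it assumes the copied jars are added through the standard `spark.driver.extraClassPath` and `spark.executor.extraClassPath` properties in `spark-defaults.conf`.

```python
import glob
import os


def hive_jars_classpath(jar_dir):
    """Return every *.jar in jar_dir joined with ':' (sorted for stable output)."""
    jars = sorted(glob.glob(os.path.join(jar_dir, "*.jar")))
    return ":".join(jars)


def extra_classpath_lines(existing, jar_dir):
    """Build spark-defaults.conf lines that append the jars to an existing classpath.

    Returns an empty list when there is nothing to add.
    """
    extra = hive_jars_classpath(jar_dir)
    combined = ":".join(part for part in (existing, extra) if part)
    if not combined:
        return []
    return [
        f"spark.driver.extraClassPath {combined}",
        f"spark.executor.extraClassPath {combined}",
    ]


# Hypothetical location where the jars copied from the AMI 3.x cluster were placed.
for line in extra_classpath_lines("", "/home/hadoop/hive-0.13-jars"):
    print(line)
```

The printed lines would still have to be merged by hand with whatever classpath settings the cluster already has.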
@rajatdt - why are you avoiding Hadoop 2.6.0? |
Hi, can you please point me to the comment that I made on this issue? I think you have the wrong guy here. Regards, Rajat Dikshit |
@rajatdt - in reference to #142 (comment). Why do you need to use Hadoop 2.4.0? |
I was trying to work on a project with outdated instructions, so I started working and lost track of the updated versions. |
@christopherbozeman Thanks. I know exactly the piece of code you're talking about (I've had to decompile Amazon's Hive distribution for debugging purposes more times than I'd care to admit) and I'll give your suggestion a shot next time I need EMR 4.0 |
@christopherbozeman: Unfortunately I'm facing issues with this
Please note that the very same app, built for and deployed on Spark 1.3.1 (AMI 3.7.0), always used to work smoothly on EMR. Also, the same app built for Spark 1.4.1 has been successfully run and tested on a private physical cluster (CentOS based, with Hadoop 2.4, Hive 0.13, Java 7, Scala 2.10). Any hints? Thanks in advance! |
@erond the error is likely a version conflict/mismatch on dependencies. Can I have your spark-submit arguments? |
of course @christopherbozeman. I launch the Spark driver as an EmrActivity's step within a DataPipeline: "step" : ["
s3://elasticmapreduce/libs/script-runner/script-runner.jar,
file:///home/hadoop/spark/bin/spark-submit,
--class,myCompany.myPackage.BatchSparkDriver,
--name,\"BatchSparkDriver on DP #{runsOn.@pipelineId}\",
--files,/home/hadoop/spark/conf/hive-site.xml,
--driver-class-path,/home/hadoop/spark/lib/datanucleus-api-jdo-3.2.6.jar:/home/hadoop/spark/lib/datanucleus-core-3.2.10.jar:/home/hadoop/spark/lib/datanucleus-rdbms-3.2.9.jar:/home/hadoop/spark/classpath/emr/mysql-connector-java-5.1.30.jar:hive-site.xml,
--master,yarn-cluster,
--driver-memory,512m,
--num-executors,3,
--executor-memory,2176m,
s3://myCompany-bucket/path/to/my-app-1.2.3-SNAPSHOT.jar,
(then driver's args)
"] |
@christopher, On Wed, Sep 9, 2015 at 8:43 PM, Roberto Coluccio notifications@github.com
|
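The EmrActivity step quoted above is a single comma-separated string, with the driver classpath joined by colons inside it. A minimal sketch of assembling such a string programmatically follows. The helper names `build_classpath` and `build_step` are hypothetical, and the sketch assumes no individual argument contains a comma, since a comma would be read as an argument separator.

```python
def build_classpath(entries):
    """Join classpath entries with ':' as used in --driver-class-path above."""
    return ":".join(entries)


def build_step(args):
    """Join step arguments with commas, rejecting any argument that contains one."""
    for arg in args:
        if "," in arg:
            raise ValueError(f"argument contains a comma: {arg!r}")
    return ",".join(args)


# Values copied from the step posted in this thread (shortened).
driver_classpath = build_classpath([
    "/home/hadoop/spark/lib/datanucleus-api-jdo-3.2.6.jar",
    "/home/hadoop/spark/lib/datanucleus-core-3.2.10.jar",
    "/home/hadoop/spark/lib/datanucleus-rdbms-3.2.9.jar",
    "hive-site.xml",
])

step = build_step([
    "s3://elasticmapreduce/libs/script-runner/script-runner.jar",
    "file:///home/hadoop/spark/bin/spark-submit",
    "--class", "myCompany.myPackage.BatchSparkDriver",
    "--files", "/home/hadoop/spark/conf/hive-site.xml",
    "--driver-class-path", driver_classpath,
    "--master", "yarn-cluster",
    "--num-executors", "3",
    "s3://myCompany-bucket/path/to/my-app-1.2.3-SNAPSHOT.jar",
])
print(step)
```

Building the string from a list makes it easier to diff the arguments between a working 1.3.1 deployment and a failing 1.4.1 one.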
@christopherbozeman to the best of your knowledge, did anyone else experience the same problem I reported when upgrading to 1.4.1.e? Do you have any advice? Thank you very much. |
When can we expect Spark 1.5.0 on emr? |
@christopherbozeman thanks for your update. Unfortunately, it still fails with the very same error, with both AMI 3.7.0 and 3.8.0. |
Hi there - thanks for your contribution. We're updating this repository to include more relevant and recent information. As such, we're cleaning up and closing old issues and PRs. Feel free to open an issue if you still use EMR and would like to see an example of something! |
Would love to have the option to launch with Spark 1.4.1 RCs! I looked through the S3 buckets and noticed there are only 1.4.0 builds at the moment.