Spark 1.4.1 release candidates #142

Closed

tyro89 opened this issue Jul 11, 2015 · 40 comments
@tyro89
Contributor

tyro89 commented Jul 11, 2015

Would love to have the option to launch with Spark 1.4.1 RCs! I looked through the S3 buckets and noticed there are only 1.4.0 builds at the moment.

tyro89 changed the title from "Spark 1.4.1" to "Spark 1.4.1 release candidates" on Jul 11, 2015
@ankurmitujjain

+1. Spark 1.4.1 is now released... I'd really appreciate it if you could include this one quickly...

Thank you

@mkanchwala

Waiting for this release on AWS EMR... it has major bug fixes.

Thanks

@erond

erond commented Jul 16, 2015

Waiting as well for 1.4.1 because of the many bug fixes it contains. Thanks

@MattFlower

+1

@christopherbozeman
Contributor

It's coming...

@ankurmitujjain

Great............

@ankurmitujjain

Is it there?

@mkanchwala

@christopherbozeman Can you please tell me how much time it'll take?

Thanks

@erond

erond commented Jul 23, 2015

Any update on this issue? I'd really appreciate being able to use the latest bug-fixed version. Thanks

@christopherbozeman
Contributor

Spark 1.4.1 is now available as a native application with EMR's new release; see https://forums.aws.amazon.com/ann.jspa?annID=3160.

@PKUKILLA

Hi Chris,
How do I enable dynamic allocation? It requires copying the shuffle jar and adding the following to yarn-site.xml (reference: http://www.slideshare.net/ozax86/spark-on-yarn-with-dynamic-resource-allocation):
 
    <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>spark_shuffle,mapreduce_shuffle</value>
    </property>
    <property>
      <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
      <value>org.apache.spark.network.yarn.YarnShuffleService</value>
    </property>
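For reference, a minimal sketch of the job-side settings that usually go with the shuffle service configured above (spark.dynamicAllocation.* and spark.shuffle.service.enabled are standard Spark settings; the class and jar names are placeholders, and the exact shuffle-jar/classpath steps should be taken from the linked slides):

    spark-submit \
      --master yarn-cluster \
      --conf spark.shuffle.service.enabled=true \
      --conf spark.dynamicAllocation.enabled=true \
      --conf spark.dynamicAllocation.minExecutors=1 \
      --conf spark.dynamicAllocation.maxExecutors=10 \
      --class com.example.MyApp my-app.jar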
       
       

@jkleckner

@christopherbozeman This page needs updating for the dynamic allocation feature because it calls out a fixed instance count, true?

http://docs.aws.amazon.com/ElasticMapReduce/latest/ReleaseGuide/emr-spark-launch.html

Create the cluster with the following command:

aws emr create-cluster --name "Spark cluster" --release-label emr-4.0.0 --applications Name=Spark --ec2-attributes KeyName=myKey --instance-type m3.xlarge --instance-count 3 --use-default-roles
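A hedged variant of that command, showing how the Spark defaults for dynamic allocation could be supplied through the emr-4.0.0 configurations API; the spark-defaults classification and property names are assumptions based on standard Spark settings, not taken from the linked page:

    aws emr create-cluster --name "Spark cluster" --release-label emr-4.0.0 \
      --applications Name=Spark --ec2-attributes KeyName=myKey \
      --instance-type m3.xlarge --instance-count 3 --use-default-roles \
      --configurations '[{"Classification":"spark-defaults","Properties":{
        "spark.dynamicAllocation.enabled":"true",
        "spark.shuffle.service.enabled":"true"}}]'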

@erond

erond commented Jul 25, 2015

Thanks @christopherbozeman. Do you think you are also going to add 1.4.1 support to the "old" bootstrap action provided by this GitHub project? We use it heavily, and we are not yet ready to move to Hadoop 2.6 and Hive 1.0. It would be great to have both the "automated" way and the "manual" way to install Spark, so that we can test all the pieces step by step before moving a production system to a new set of upgraded frameworks. Also, can you give the community any hints about how long this project (the emr-bootstrap-action to install Spark) will still be maintained and offered? Much appreciated. Thanks as always.

@PKUKILLA

Thanks, it works.

@Sazpaimon

@christopherbozeman Does the EMR 4.0.0 version of Spark contain the patch from christopherbozeman/spark@316b2e0? It doesn't look like it does: when I insert into a Hive table using Spark SQL, it creates temporary files in S3 and then appears to get stuck while moving them to their final location.

@erond

erond commented Aug 13, 2015

Considering also the issues reported for Hive (#154) and Ganglia (#153), is there any possibility of getting Spark 1.4.1 available as a bootstrap action (a.k.a. "the usual way"), so that in the meantime it works on the 3.8.0 AMI (and Hadoop 2.4)? The upgrade of our system is stuck because of this: 1.4.0 has known blocking bugs, so there is no way to move forward from Spark 1.3.1 until you kindly upgrade the emr-bootstrap-actions support as well. I think many people would really appreciate it. Thanks.

@erond

erond commented Aug 18, 2015

Stuck on the upgrade to Spark 1.4.0 using AMI 3.8.0 due to https://issues.apache.org/jira/browse/SPARK-8368, so we can't move forward even to 1.4.0 unless we switch to AMI 4.0.0. PLEASE upgrade the emr-bootstrap-action to support Spark 1.4.1 on AMI 3.8.0; this is really a big issue for many people!

@knowak

knowak commented Aug 19, 2015

Same here, would appreciate getting 1.4.1 integrated here while EMR 4.0 matures.

@erond

erond commented Aug 19, 2015

Furthermore, we actually CAN'T switch to AMI 4.0.0, since we are leveraging DataPipeline, which obviously doesn't currently support that AMI version; see https://forums.aws.amazon.com/thread.jspa?messageID=662004 and https://forums.aws.amazon.com/thread.jspa?messageID=658891 for references.

@ankurmitujjain

+1. I think EMR 4.0.0 is not mature enough to replace all the applications available on AMI 3.8.0.

@christopherbozeman
Contributor

Spark 1.4.1 is now available for the Spark bootstrap action on EMR AMI 3.x and can be requested with version "1.4.1.a".
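A minimal sketch of requesting that build with the bootstrap action; the install-spark script location and the -v argument are assumed from this repository's usual usage, so check the README for the exact invocation:

    aws emr create-cluster --name "Spark 1.4.1.a cluster" --ami-version 3.8.0 \
      --ec2-attributes KeyName=myKey --instance-type m3.xlarge --instance-count 3 \
      --use-default-roles \
      --bootstrap-actions Path=s3://support.elasticmapreduce/spark/install-spark,Args=[-v,1.4.1.a]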

@Sazpaimon

@christopherbozeman Can you answer my previous question about the EMR 4.0 version of Spark containing christopherbozeman/spark@316b2e0?

@rajatdt

rajatdt commented Sep 8, 2015

Hi christopherbozeman,

Thank you for the update. Could you please provide some feedback on the configuration I am trying to use? My configuration:
- ami-version: 3.3 (which defaults to 3.3.2)
- spark: 1.4.1.a
- etc.
The question is: should I use ami-version 3.3 or the latest ami-version? I want to use the emr-4.0.0 release label since it provides Spark 1.4.1, but it comes with Hadoop 2.6.0, whereas I want to use 2.4.0.

@christopherbozeman
Contributor

@Sazpaimon I dug into your comment on #142 (comment) and determined that christopherbozeman/spark@316b2e0 is a NOOP (the underlying RDD interaction with the Hadoop output format takes care of the S3 direct write). What performs the magic of not creating extra temporary paths when writing to S3 is code that EMR added to Hive, which gets included by the Spark BA when the -h option is supplied. That is what is missing from EMR release 4.0.0. Also, since Spark 1.4 only supports up to Hive 0.13 (https://issues.apache.org/jira/browse/SPARK-8065), the native Spark in EMR release 4.0.0 cannot simply use the Hive 1.0 jars to fix the issue. I'll report this issue internally with the development team so it is resolved in a future EMR release. For now, the ugly workaround would be to take the Hive jars from EMR AMI 3.x with Hive 0.13 that are pruned for Spark (~spark/classpath/hive/*), copy them to the master node of an EMR 4.0.0 cluster, and then append them to the Spark classpath.
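A rough sketch of that workaround under assumed default paths; the hostname, key file, destination directory, and the use of --driver-class-path / spark.executor.extraClassPath are illustrative, not a tested EMR procedure:

    # On the AMI 3.x master: collect the Hive 0.13 jars pruned for Spark.
    tar czf hive013-for-spark.tar.gz -C /home/hadoop/spark/classpath/hive .

    # Copy to the EMR 4.0.0 master (hostname is a placeholder) and unpack.
    scp -i myKey.pem hive013-for-spark.tar.gz hadoop@emr4-master:/home/hadoop/
    ssh -i myKey.pem hadoop@emr4-master \
      'mkdir -p /home/hadoop/hive013-jars && tar xzf hive013-for-spark.tar.gz -C /home/hadoop/hive013-jars'

    # Append the jars when submitting (wildcard classpath entries are a Java 6+ feature;
    # class and application jar are placeholders).
    spark-submit \
      --driver-class-path '/home/hadoop/hive013-jars/*' \
      --conf spark.executor.extraClassPath='/home/hadoop/hive013-jars/*' \
      --class com.example.MyApp my-app.jar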

@christopherbozeman
Contributor

@rajatdt - why are you avoiding Hadoop 2.6.0?

@rajatdt

rajatdt commented Sep 8, 2015

Hi,

Can you please specify the comment that I made on this issue? I think you have the wrong guy here.

Regards

Rajat Dikshit


@christopherbozeman
Contributor

@rajatdt - in reference to #142 (comment): why do you need to use Hadoop 2.4.0?

@rajatdt

rajatdt commented Sep 8, 2015

I was trying to work on a project with outdated instructions, so I started working and lost track of the updated versions.


@Sazpaimon

@christopherbozeman Thanks. I know exactly the piece of code you're talking about (I've had to decompile Amazon's Hive distribution for debugging purposes more times than I'd care to admit), and I'll give your suggestion a shot the next time I need EMR 4.0.

@erond

erond commented Sep 9, 2015

@christopherbozeman: Unfortunately I'm facing issues with 1.4.1.a when trying to run my Spark driver (yarn-cluster mode) on EMR with both AMI 3.7.0 and 3.8.0. In particular, when trying to create a Hive external table backed by S3, I get:

15/09/09 10:09:57 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.NoSuchMethodError: org.apache.http.params.HttpConnectionParams.setSoKeepalive(Lorg/apache/http/params/HttpParams;Z)V
java.lang.NoSuchMethodError: org.apache.http.params.HttpConnectionParams.setSoKeepalive(Lorg/apache/http/params/HttpParams;Z)V
at com.amazonaws.http.HttpClientFactory.createHttpClient(HttpClientFactory.java:95)
at com.amazonaws.http.AmazonHttpClient.<init>(AmazonHttpClient.java:198)
at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:132)
at com.amazonaws.services.s3.AmazonS3Client.<init>(AmazonS3Client.java:431)
at com.amazon.ws.emr.hadoop.fs.guice.EmrFSProdModule.createAmazonS3Client(EmrFSProdModule.java:125)
at com.amazon.ws.emr.hadoop.fs.guice.EmrFSProdModule.createAmazonS3(EmrFSProdModule.java:165)
at com.amazon.ws.emr.hadoop.fs.guice.EmrFSBaseModule.provideAmazonS3(EmrFSBaseModule.java:81)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.google.inject.internal.ProviderMethod.get(ProviderMethod.java:104)
at com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:40)
at com.google.inject.internal.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:46)
at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1031)
at com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40)
at com.google.inject.Scopes$1$1.get(Scopes.java:65)
at com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:40)
at com.google.inject.internal.SingleFieldInjector.inject(SingleFieldInjector.java:53)
at com.google.inject.internal.MembersInjectorImpl.injectMembers(MembersInjectorImpl.java:110)
at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:94)
at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:254)
at com.google.inject.internal.FactoryProxy.get(FactoryProxy.java:54)
at com.google.inject.internal.InjectorImpl$4$1.call(InjectorImpl.java:978)
at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1024)
at com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:974)
at com.google.inject.internal.InjectorImpl.getInstance(InjectorImpl.java:1009)
at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.initialize(EmrFileSystem.java:105)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2445)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:88)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2479)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2461)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:372)
at org.apache.hadoop.hive.common.FileUtils.isLocalFile(FileUtils.java:430)
at org.apache.hadoop.hive.common.FileUtils.isLocalFile(FileUtils.java:414)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:9887)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9180)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:422)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:322)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:975)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1040)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$runHive$1.apply(ClientWrapper.scala:345)
at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$runHive$1.apply(ClientWrapper.scala:326)
at org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:155)
at org.apache.spark.sql.hive.client.ClientWrapper.runHive(ClientWrapper.scala:326)
at org.apache.spark.sql.hive.client.ClientWrapper.runSqlHive(ClientWrapper.scala:316)
at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:473)
at org.apache.spark.sql.hive.execution.HiveNativeCommand.run(HiveNativeCommand.scala:33)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:57)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:57)
at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:68)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:87)
at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:950)
at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:950)
at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:144)
at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:128)
at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:755)
at myCompany.myPackage.otherPackage.ReadStuffUsingHive.apply(ReadStuffUsingHive.scala:12)
at myCompany.myPackage.BatchSparkDriver$.main(BatchSparkDriver.scala:200)
at myCompany.myPackage.BatchSparkDriver.main(BatchSparkDriver.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:483)
15/09/09 10:09:57 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.lang.NoSuchMethodError: org.apache.http.params.HttpConnectionParams.setSoKeepalive(Lorg/apache/http/params/HttpParams;Z)V)
15/09/09 10:09:57 INFO spark.SparkContext: Invoking stop() from shutdown hook

Please note that the very same app, built for and deployed on Spark 1.3.1 (AMI 3.7.0), always used to work smoothly on EMR. Also, the same app built for Spark 1.4.1 has been successfully run and tested on a private physical cluster (CentOS based, with Hadoop 2.4, Hive 0.13, Java 7, Scala 2.10).

Any hints? Thanks in advance!

@christopherbozeman
Contributor

@erond The error is likely a version conflict or mismatch in dependencies. Can I have your spark-submit arguments?
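As a hedged aside, one quick way to check whether the application jar bundles its own (older) httpclient, which commonly causes this kind of NoSuchMethodError; the jar name is taken from the step below and is only illustrative:

    # List any Apache HttpComponents classes packaged inside the assembly jar.
    unzip -l my-app-1.2.3-SNAPSHOT.jar | grep -i 'org/apache/http/' | head

    # If httpclient is bundled, read its version from the embedded Maven metadata.
    unzip -p my-app-1.2.3-SNAPSHOT.jar \
      META-INF/maven/org.apache.httpcomponents/httpclient/pom.properties 2>/dev/null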

@erond

erond commented Sep 9, 2015

Of course, @christopherbozeman. I launch the Spark driver as an EmrActivity step within a DataPipeline:

"step" : ["
   s3://elasticmapreduce/libs/script-runner/script-runner.jar,
   file:///home/hadoop/spark/bin/spark-submit,
    --class,myCompany.myPackage.BatchSparkDriver,
    --name,\"BatchSparkDriver on DP #{runsOn.@pipelineId}\",
    --files,/home/hadoop/spark/conf/hive-site.xml,
    --driver-class-path,/home/hadoop/spark/lib/datanucleus-api-jdo-3.2.6.jar:/home/hadoop/spark/lib/datanucleus-core-3.2.10.jar:/home/hadoop/spark/lib/datanucleus-rdbms-3.2.9.jar:/home/hadoop/spark/classpath/emr/mysql-connector-java-5.1.30.jar:hive-site.xml,
    --master,yarn-cluster,
    --driver-memory,512m,
    --num-executors,3,
    --executor-memory,2176m,
    s3://myCompany-bucket/path/to/my-app-1.2.3-SNAPSHOT.jar,
    (then driver's args)
  "]

@PKUKILLA

@christopher,
Is there any way to use Spark 1.5.0 with EMR?


@erond

erond commented Sep 17, 2015

@christopherbozeman to the best of your knowledge, has anyone else experienced the same issue I reported when upgrading to 1.4.1.e? Do you have any advice? Thank you very much.

@njvijay

njvijay commented Sep 18, 2015

When can we expect Spark 1.5.0 on EMR?

@christopherbozeman
Contributor

@erond Please try build 1.4.1.b, which was pushed with #163, to see if it resolves the issue.

@christopherbozeman
Contributor

@njvijay and @PKUKILLA see issue #160 regarding Spark 1.5.

@erond

erond commented Sep 29, 2015

@christopherbozeman thanks for the update. Unfortunately, it still fails with the very same error on both AMI 3.7.0 and 3.8.0.

@dacort
Contributor

dacort commented Apr 28, 2023

Hi there - thanks for your contribution. We're updating this repository to include more relevant and recent information.

As such, we're cleaning up and closing old issues and PRs.

Feel free to open an issue if you still use EMR and would like to see an example of something!

dacort closed this as not planned on Apr 28, 2023