
SPARK-5134 [BUILD] Bump default Hadoop version to 2+ #5027

Closed
wants to merge 1 commit

Conversation

@srowen (Member) commented Mar 14, 2015

Bump default Hadoop version to 2.2.0. (This is already the dependency version reported by published Maven artifacts.) See JIRA for further discussion.
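
For context, a rough sketch of the build-level effect, using invocations in the style of the Spark build docs of this era (treat the exact profiles and flags as assumptions that may vary by branch). A plain build now targets Hadoop 2.2.0 by default rather than 1.0.4:

    # After this change, a plain build compiles against Hadoop 2.2.0
    mvn -DskipTests clean package

    # Other Hadoop 2.x lines are still opted into explicitly, e.g.:
    mvn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package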

@SparkQA commented Mar 14, 2015

Test build #28609 has finished for PR 5027 at commit acbee14.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen (Member, Author) commented Mar 19, 2015

I want to double- and triple-check this. I'm in favor, and I think @pwendell is too, since it reflects how Spark is already published against Hadoop 2.2. It doesn't remove support for older Hadoop versions. I'd like to merge tomorrow.

@asfgit closed this in d08e3eb Mar 20, 2015
@srowen deleted the SPARK-5134 branch March 20, 2015 15:09
@pwendell (Contributor) commented

Looks good - thanks for committing this, Sean.

@nchammas (Contributor) commented

This PR seems to have broken spark-perf. Not sure why, but the executor stderr logs have the following:

15/04/14 19:14:46 INFO executor.CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT]
Exception in thread "main" java.lang.ExceptionInInitializerError
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:128)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:224)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
    at org.apache.hadoop.security.Groups.<init>(Groups.java:55)
    at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:182)
    at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:235)
    at org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:249)
    at org.apache.spark.deploy.SparkHadoopUtil.<init>(SparkHadoopUtil.scala:44)
    at org.apache.spark.deploy.SparkHadoopUtil$.<init>(SparkHadoopUtil.scala:220)
    at org.apache.spark.deploy.SparkHadoopUtil$.<clinit>(SparkHadoopUtil.scala)
    ... 3 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:129)
    ... 10 more
Caused by: java.lang.UnsatisfiedLinkError: org.apache.hadoop.security.JniBasedUnixGroupsMapping.anchorNative()V
    at org.apache.hadoop.security.JniBasedUnixGroupsMapping.anchorNative(Native Method)
    at org.apache.hadoop.security.JniBasedUnixGroupsMapping.<clinit>(JniBasedUnixGroupsMapping.java:49)
    at org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback.<init>(JniBasedUnixGroupsMappingWithFallback.java:38)
    ... 15 more

cc @JoshRosen

@nchammas (Contributor) commented

My suspicion is that it's just a Hadoop 1 vs. 2 issue, since spark-ec2 (which we use for spark-perf testing) launches clusters with Hadoop 1 by default.

Will confirm.

@nchammas (Contributor) commented

Confirmed. Simply building Spark with the Hadoop version explicitly set to 1.0.4 resolves this issue.
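
For reference, a hedged sketch of that workaround, following the standard Spark build instructions of the time (the exact invocation used for the spark-perf runs is an assumption):

    # Pin the build back to the Hadoop version the spark-ec2 clusters actually run
    mvn -Dhadoop.version=1.0.4 -DskipTests clean package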

@srowen (Member, Author) commented Apr 15, 2015

How about setting up Hadoop 2 on EC2 by default?
Alternatively, at the least, you'd want to specify a particular Hadoop version when one is needed.

@nchammas (Contributor) commented

Yeah, I asked about that some time ago, and I believe the concern was about surprising users (by changing defaults), plus the fact that the Hadoop 2 distro used by spark-ec2 is somehow not a "real" distro. @shivaram could explain more.
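
For reference, a hedged sketch of the switch in question, based on the spark_ec2.py of this era; the flag name and its default are assumptions from memory of that script, and the key pair and cluster name are placeholders:

    # spark-ec2 defaults to --hadoop-major-version=1; "2" selects the
    # CDH4-based distro referred to above.
    ./spark-ec2 --key-pair=my-key --identity-file=my-key.pem \
      --hadoop-major-version=2 launch my-cluster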

@shivaram (Contributor) commented

Yeah, spark-ec2 does not support Hadoop 2 right now, though there has been a patch sitting around for a while.
http://apache-spark-developers-list.1001551.n3.nabble.com/spark-ec2-default-to-Hadoop-2-td10824.html has more details.
