[SPARK-15818] [BUILD] Upgrade to Hadoop 2.7.2 #13556

Closed
wants to merge 5 commits into from

a-roberts
Contributor

What changes were proposed in this pull request?

Updating the Hadoop version from 2.7.0 to 2.7.2 if we use the Hadoop-2.7 build profile

How was this patch tested?

Existing tests.

I'd like us to use Hadoop 2.7.2 owing to the Hadoop release notes stating Hadoop 2.7.0 is not ready for production use

https://hadoop.apache.org/docs/r2.7.0/ states

"Apache Hadoop 2.7.0 is a minor release in the 2.x.y release line, building upon the previous stable release 2.6.0.
This release is not yet ready for production use. Production users should use 2.7.1 release and beyond."

Hadoop 2.7.1 release notes:
"Apache Hadoop 2.7.1 is a minor release in the 2.x.y release line, building upon the previous release 2.7.0. This is the next stable release after Apache Hadoop 2.6.x."

And then Hadoop 2.7.2 release notes:
"Apache Hadoop 2.7.2 is a minor release in the 2.x.y release line, building upon the previous stable release 2.7.1."

I've tested this on Intel hardware with IBM Java 8, so let's test it with OpenJDK as well. Ideally this will be pushed to branch-2.0 and master.

@a-roberts changed the title from "[SPARK-15818] Upgrade to Hadoop 2.7.2" to "[SPARK-15818] [BUILD] Upgrade to Hadoop 2.7.2" on Jun 8, 2016
@srowen
Member

srowen commented Jun 8, 2016

That's probably fine. I suppose we should do this for all of the 2.x Hadoop branches?

@a-roberts
Contributor Author

a-roberts commented Jun 8, 2016

Good point, I see the latest Hadoop 2.6 version is 2.6.4 (mentions fixing critical bugs) and for 2.5 it's 2.5.2 (same story), so ideally we'd ensure each "best for profile" version gets used.

Should I submit the other version changes with this pull request, so that it becomes an "Update Hadoop versions" change instead?

Does your build farm exercise all Hadoop permutations (so we can test the best available 2.3.x, 2.4.x, 2.5.x, 2.6.x, 2.7.x)?

@a-roberts
Contributor Author

a-roberts commented Jun 8, 2016

Best versions for Hadoop profiles since 2.3 are as follows

2.7: 2.7.2
2.6: 2.6.4
2.5: 2.5.2 (although there's no 2.5 profile)
2.4: 2.4.1 (users immediately encouraged to move up for a security fix)
2.3: 2.3.0 (as is)
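
The list above can be read as a simple lookup from Hadoop minor line to the release to build against. A minimal sketch in Python, for illustration only: the names `RECOMMENDED` and `recommended_version` are hypothetical, and the `hadoop-2.7` style of profile name is assumed from the "Hadoop-2.7 build profile" wording in the description. The version numbers are the ones given in this comment.

```python
# Recommended Hadoop release per minor line, copied from the list above.
RECOMMENDED = {
    "2.7": "2.7.2",
    "2.6": "2.6.4",
    "2.5": "2.5.2",  # noted above: there is no 2.5 build profile
    "2.4": "2.4.1",
    "2.3": "2.3.0",
}


def recommended_version(profile: str) -> str:
    """Map a profile name such as 'hadoop-2.7' (or just '2.7') to a release.

    Raises KeyError for a minor line that is not in the table.
    """
    prefix = "hadoop-"  # assumed profile naming, not taken from the build files
    line = profile[len(prefix):] if profile.startswith(prefix) else profile
    return RECOMMENDED[line]
```

For example, `recommended_version("hadoop-2.7")` looks up the `"2.7"` entry and returns `"2.7.2"`.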

Use the latest recommended versions for these Hadoop profiles too
@srowen
Member

srowen commented Jun 8, 2016

Jenkins test this please

@srowen
Member

srowen commented Jun 8, 2016

Jenkins add to whitelist

@SparkQA

SparkQA commented Jun 8, 2016

Test build #60165 has finished for PR 13556 at commit 8829086.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Jun 8, 2016

You'll have to run ./dev/test-dependencies.sh --replace-manifest to update the expected dependencies. And we can see if somehow a transitive dependency changed.
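
To illustrate the idea behind a dependency manifest check, here is a conceptual sketch in Python. It is not the logic of `./dev/test-dependencies.sh`, which is not shown in this thread; the function name `manifest_changes` and the in-memory lists are assumptions made for the example. It compares an expected jar list against the jars a build actually resolved and reports the differences in `-`/`+` form.

```python
def manifest_changes(expected, actual):
    """Return '-jar' lines for jars that disappeared and '+jar' lines for new ones.

    `expected` is the checked-in manifest, `actual` is what the build resolved.
    An empty result means the manifest is up to date.
    """
    removed = sorted(set(expected) - set(actual))
    added = sorted(set(actual) - set(expected))
    return [f"-{jar}" for jar in removed] + [f"+{jar}" for jar in added]


if __name__ == "__main__":
    expected = ["hadoop-auth-2.7.0.jar", "hadoop-client-2.7.0.jar"]
    actual = ["hadoop-auth-2.7.2.jar", "hadoop-client-2.7.2.jar"]
    for line in manifest_changes(expected, actual):
        print(line)
```

A non-empty result is what makes a check of this kind fail until the expected manifest is regenerated and committed.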

@a-roberts
Contributor Author

Done. It didn't diff automatically, so I ran git diff myself and see plenty of differences (as you'd expect, the Hadoop transitive dependencies are now, say, 2.4.1 rather than 2.4.0, 2.7.2 rather than 2.7.0, etc.).

Where do we update the expected values? Grepping for "hadoop-annotations" for example only shows dev/deps/spark-deps_jar name_.

Looking in the script I see pr-deps is created based on the Hadoop profiles, so perhaps we need to add the explicit Hadoop version when we define the Hadoop profiles...

Sample output for 2.7:

-hadoop-annotations-2.7.0.jar
-hadoop-auth-2.7.0.jar
-hadoop-client-2.7.0.jar
-hadoop-common-2.7.0.jar
-hadoop-hdfs-2.7.0.jar
-hadoop-mapreduce-client-app-2.7.0.jar
-hadoop-mapreduce-client-common-2.7.0.jar
-hadoop-mapreduce-client-core-2.7.0.jar
-hadoop-mapreduce-client-jobclient-2.7.0.jar
-hadoop-mapreduce-client-shuffle-2.7.0.jar
-hadoop-yarn-api-2.7.0.jar
-hadoop-yarn-client-2.7.0.jar
-hadoop-yarn-common-2.7.0.jar
-hadoop-yarn-server-common-2.7.0.jar
-hadoop-yarn-server-web-proxy-2.7.0.jar
+hadoop-annotations-2.7.2.jar
+hadoop-auth-2.7.2.jar
+hadoop-client-2.7.2.jar
+hadoop-common-2.7.2.jar
+hadoop-hdfs-2.7.2.jar
+hadoop-mapreduce-client-app-2.7.2.jar
+hadoop-mapreduce-client-common-2.7.2.jar
+hadoop-mapreduce-client-core-2.7.2.jar
+hadoop-mapreduce-client-jobclient-2.7.2.jar
+hadoop-mapreduce-client-shuffle-2.7.2.jar
+hadoop-yarn-api-2.7.2.jar
+hadoop-yarn-client-2.7.2.jar
+hadoop-yarn-common-2.7.2.jar
+hadoop-yarn-server-common-2.7.2.jar
+hadoop-yarn-server-web-proxy-2.7.2.jar
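
When reviewing a diff like the one above, it can help to pair each removed jar with its replacement and confirm that only the version changed. A minimal sketch in Python, assuming the `-name-version.jar` / `+name-version.jar` line format shown in the sample output; the function name `version_bumps` is hypothetical, and jars whose version is not purely dotted numbers (for example a qualifier suffix) are skipped by this simple pattern.

```python
import re

# Matches lines such as "-hadoop-auth-2.7.0.jar" or "+hadoop-auth-2.7.2.jar".
JAR_LINE = re.compile(r"^([+-])(.+)-(\d+(?:\.\d+)*)\.jar$")


def version_bumps(diff_lines):
    """Return {artifact: (old_version, new_version)} from manifest diff lines.

    A side is None when the artifact appears only as removed or only as added.
    """
    old, new = {}, {}
    for line in diff_lines:
        match = JAR_LINE.match(line.strip())
        if match is None:
            continue  # not a recognised jar line; ignore it
        sign, artifact, version = match.groups()
        (old if sign == "-" else new)[artifact] = version
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(old.keys() | new.keys())
    }
```

An artifact with `None` on either side would point to a transitive dependency that was added or dropped rather than simply bumped, which is the case worth a closer look.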

@srowen
Member

srowen commented Jun 8, 2016

If you run that script it will update the deps/* files for you automatically. Just push the results so we can review them here.

@srowen
Member

srowen commented Jun 8, 2016

LGTM

@SparkQA

SparkQA commented Jun 8, 2016

Test build #60174 has finished for PR 13556 at commit 402463e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Jun 9, 2016

Merged to master/2.0

asfgit pushed a commit that referenced this pull request Jun 9, 2016
Author: Adam Roberts <aroberts@uk.ibm.com>

Closes #13556 from a-roberts/patch-2.

(cherry picked from commit 147c020)
Signed-off-by: Sean Owen <sowen@cloudera.com>
@asfgit asfgit closed this in 147c020 Jun 9, 2016
zjffdu pushed a commit to zjffdu/spark that referenced this pull request Jun 10, 2016
Closes apache#13556 from a-roberts/patch-2.