
[SPARK-37600][BUILD] Upgrade to Hadoop 3.3.2 #34855

Closed · wants to merge 10 commits

Conversation

sunchao
Member

@sunchao sunchao commented Dec 9, 2021

What changes were proposed in this pull request?

This PR aims to upgrade to Hadoop 3.3.2. In addition, it also removes the LZ4 wrapper classes added in SPARK-36669, therefore fixing SPARK-36679.

Why are the changes needed?

Hadoop 3.3.2 contains many bug fixes, and it also lets us remove our internal hacked Hadoop codecs.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Pass the CIs.

@github-actions github-actions bot added the BUILD label Dec 9, 2021
@sunchao sunchao changed the title [SPARK-37600][BUILD] Upgrade to Hadoop 3.3.2 [WIP][SPARK-37600][BUILD] Upgrade to Hadoop 3.3.2 Dec 9, 2021
@SparkQA

SparkQA commented Dec 9, 2021

Test build #146048 has finished for PR 34855 at commit 997590e.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@github-actions github-actions bot added the SQL label Dec 9, 2021
@SparkQA

SparkQA commented Dec 9, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50523/

@SparkQA

SparkQA commented Dec 9, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50525/

@SparkQA

SparkQA commented Dec 9, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50523/

@SparkQA

SparkQA commented Dec 9, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50525/

@SparkQA

SparkQA commented Dec 9, 2021

Test build #146050 has finished for PR 34855 at commit be530f0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

pom.xml Outdated
@@ -309,6 +309,17 @@
</extraJavaTestArgs>
</properties>
<repositories>
<repository>
Member

We'd remove this before merging, right? After 3.3.2 is released?

Member Author

Yes, will remove this section once the official 3.3.2 release is out.
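For context, staging artifacts for a not-yet-released Hadoop version are typically pulled in via a temporary `<repository>` entry like the sketch below. The repository id and URL are illustrative, not the actual diff; the real ASF staging repository for the 3.3.2 RC may differ:

```xml
<!-- Temporary: resolve Hadoop 3.3.2 RC artifacts from ASF staging.
     Remove this block once the official 3.3.2 release is published. -->
<repository>
  <id>staged-release</id>
  <name>ASF Staging Repository</name>
  <url>https://repository.apache.org/content/repositories/staging/</url>
  <releases>
    <enabled>true</enabled>
  </releases>
  <snapshots>
    <enabled>false</enabled>
  </snapshots>
</repository>
```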

dev/deps/spark-deps-hadoop-3-hive-2.3 (resolved)
@@ -120,7 +120,7 @@
<sbt.project.name>spark</sbt.project.name>
<slf4j.version>1.7.30</slf4j.version>
<log4j.version>1.2.17</log4j.version>
<hadoop.version>3.3.1</hadoop.version>
<hadoop.version>3.3.2</hadoop.version>
Member

should we update #34830 (comment) together?

Member Author

Good point. Will do.

Contributor

> should we update #34830 (comment) together?

+1 on this.

@sunchao
Member Author

sunchao commented Mar 4, 2022

Hmm, somehow YarnClusterSuite started failing after 3.3.2. I'll need to check what caused the issue.

@dongjoon-hyun
Member

Is there any update, @sunchao ?

Member

@dongjoon-hyun dongjoon-hyun left a comment

It seems to pass locally. Could you re-trigger the test simply, @sunchao ?

[info] YarnClusterSuite:
[info] - run Spark in yarn-client mode (10 seconds, 131 milliseconds)
[info] - run Spark in yarn-cluster mode (9 seconds, 90 milliseconds)
[info] - run Spark in yarn-client mode with unmanaged am (8 seconds, 78 milliseconds)
[info] - run Spark in yarn-client mode with different configurations, ensuring redaction (10 seconds, 102 milliseconds)
[info] - run Spark in yarn-cluster mode with different configurations, ensuring redaction (10 seconds, 96 milliseconds)
[info] - yarn-cluster should respect conf overrides in SparkHadoopUtil (SPARK-16414, SPARK-23630) (9 seconds, 116 milliseconds)
[info] - SPARK-35672: run Spark in yarn-client mode with additional jar using URI scheme 'local' (10 seconds, 111 milliseconds)
[info] - SPARK-35672: run Spark in yarn-cluster mode with additional jar using URI scheme 'local' (9 seconds, 96 milliseconds)
[info] - SPARK-35672: run Spark in yarn-client mode with additional jar using URI scheme 'local' and gateway-replacement path (8 seconds, 79 milliseconds)
[info] - SPARK-35672: run Spark in yarn-cluster mode with additional jar using URI scheme 'local' and gateway-replacement path (9 seconds, 90 milliseconds)
[info] - SPARK-35672: run Spark in yarn-cluster mode with additional jar using URI scheme 'local' and gateway-replacement path containing an environment variable (9 seconds, 100 milliseconds)
...

orc-core/1.7.3//orc-core-1.7.3.jar
orc-mapreduce/1.7.3//orc-mapreduce-1.7.3.jar
orc-shims/1.7.3//orc-shims-1.7.3.jar
org.jacoco.agent/0.8.5/runtime/org.jacoco.agent-0.8.5-runtime.jar
Contributor

jacoco is a Java code coverage library; I was surprised that it would become a dependency.

Member

Oh, I missed this. Do you want to exclude this as a workaround, @sunchao ?

Member Author

Hmm let me check.

Member Author

This is a test-only dependency brought in by aliyun-java-sdk-core in hadoop-cloud-storage.

Member

Are you sure it's test only? I didn't think those appeared in these deps files.

Member Author

I tracked it down to the commit which introduced it: aliyun/aliyun-openapi-java-sdk@e0d21a3, which looks like it's only used in tests?
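If the exclusion workaround is taken, a Maven `<exclusions>` entry on the transitive path would be the usual fix — roughly like this sketch (the exact dependency declaration and coordinates in hadoop-cloud/pom.xml are assumptions, not the actual diff):

```xml
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-cloud-storage</artifactId>
  <version>${hadoop.version}</version>
  <exclusions>
    <!-- jacoco is only needed by aliyun-java-sdk-core's own tests,
         so exclude it from the runtime dependency tree -->
    <exclusion>
      <groupId>org.jacoco</groupId>
      <artifactId>org.jacoco.agent</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```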

@LuciferYang
Contributor

> It seems to pass locally. Could you re-trigger the test simply, @sunchao ?

I manually tested with mvn locally, and there was a UT failure:

YarnClusterSuite:
- run Spark in yarn-client mode
- run Spark in yarn-cluster mode
- run Spark in yarn-client mode with unmanaged am
- run Spark in yarn-client mode with different configurations, ensuring redaction
- run Spark in yarn-cluster mode with different configurations, ensuring redaction
- yarn-cluster should respect conf overrides in SparkHadoopUtil (SPARK-16414, SPARK-23630)
- SPARK-35672: run Spark in yarn-client mode with additional jar using URI scheme 'local'
- SPARK-35672: run Spark in yarn-cluster mode with additional jar using URI scheme 'local'
- SPARK-35672: run Spark in yarn-client mode with additional jar using URI scheme 'local' and gateway-replacement path
- SPARK-35672: run Spark in yarn-cluster mode with additional jar using URI scheme 'local' and gateway-replacement path
- SPARK-35672: run Spark in yarn-cluster mode with additional jar using URI scheme 'local' and gateway-replacement path containing an environment variable
- SPARK-35672: run Spark in yarn-client mode with additional jar using URI scheme 'file'
- SPARK-35672: run Spark in yarn-cluster mode with additional jar using URI scheme 'file'
- run Spark in yarn-cluster mode unsuccessfully
- run Spark in yarn-cluster mode failure after sc initialized
- run Python application in yarn-client mode *** FAILED ***
  LOST did not equal FINISHED SLF4J: Class path contains multiple SLF4J bindings.
  SLF4J: Found binding in [jar:file:/Users/xxx/spark-source/assembly/target/scala-2.12/jars/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  SLF4J: Found binding in [jar:file:/Users/xxx/.m2/repository/org/apache/logging/log4j/log4j-slf4j-impl/2.17.1/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory] (BaseYarnClusterSuite.scala:233)	

@dongjoon-hyun
Member

Oh...

@LuciferYang
Contributor

#34855 (comment)

Sorry, this may be my bad. I re-ran it twice and it succeeded.

@dongjoon-hyun
Member

Ya, your failed test case already passed in the original GitHub Actions run. You might have hit a flaky test case that still exists in this module.

@sunchao
Member Author

sunchao commented Mar 7, 2022

Thanks for helping to verify this @LuciferYang @dongjoon-hyun ! yea it seems a bit flaky. I tried to look into the YARN logs locally but couldn't find anything interesting. Let me try to re-trigger the GitHub workflow.

Member

@dongjoon-hyun dongjoon-hyun left a comment

@sunchao . All tests except pyspark-pandas-slow seem to have passed, and that failure looks irrelevant.
Could you remove [WIP] and re-trigger once more, please?

@sunchao sunchao changed the title [WIP][SPARK-37600][BUILD] Upgrade to Hadoop 3.3.2 [SPARK-37600][BUILD] Upgrade to Hadoop 3.3.2 Mar 7, 2022
@sunchao sunchao marked this pull request as ready for review March 7, 2022 23:46
@sunchao
Member Author

sunchao commented Mar 7, 2022

Sure @dongjoon-hyun . Just re-triggered the jobs.

Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM (Pending CIs). Thank you, @sunchao .

@dongjoon-hyun
Member

I believe this is almost done. Could you review this once more, @viirya , @srowen , @HyukjinKwon , @AngersZhuuuu , @LuciferYang ?

@dongjoon-hyun
Member

To @sunchao . It seems that it's not re-triggered yet. You may want to add an empty commit.

@sunchao
Member Author

sunchao commented Mar 7, 2022

Re-triggered via an empty commit. I had done it manually by clicking the "Re-run all jobs" button, but somehow that wasn't reflected here.
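For reference, re-triggering CI with an empty commit is just the following (the branch name is hypothetical):

```shell
# Create a commit with no changes; pushing it kicks off CI again
git commit --allow-empty -m "Retrigger CI"
git push origin SPARK-37600   # hypothetical branch name
```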

dev/deps/spark-deps-hadoop-3-hive-2.3 (resolved)
dev/deps/spark-deps-hadoop-3-hive-2.3 (resolved)
hadoop-cloud/pom.xml (resolved)
Contributor

@LuciferYang LuciferYang left a comment

LGTM +1

Member

@srowen srowen left a comment

Licenses look OK

@dongjoon-hyun
Member

@srowen , do you have any other concerns? Or, the last issue (LICENSE) is resolved and we are good to go?

[Screenshot attached: Screen Shot 2022-03-08 at 6 45 14 PM]

@@ -69,7 +69,7 @@ private[hive] object IsolatedClientLoader extends Logging {
// If the error message contains hadoop, it is probably because the hadoop
// version cannot be resolved.
val fallbackVersion = if (VersionUtils.isHadoop3) {
"3.3.1"
"3.3.2"
} else {
"2.7.4"
Contributor

By the way, can we read the Hadoop version from the project configuration here?

Member

That sounds like an independent improvement idea. Could you file a JIRA for that?

Contributor

> That sounds like an independent improvement idea. Could you file a JIRA for that?

Yea, will try to do this.

Member Author

I'm not sure this is easy: in this case the Hadoop version specified via hadoop.version in pom.xml is customized and is not 3.3.2, which is why it can't be fetched from Maven.

@dongjoon-hyun
Member

dongjoon-hyun commented Mar 9, 2022

Thank you, @sunchao , @viirya , @srowen , @HyukjinKwon , @LuciferYang , @AngersZhuuuu .
Merged to master for Apache Spark 3.3.

Also, cc @MaxGekk since he is the release manager for Apache Spark 3.3.

7 participants