Skip to content

[SPARK-37264][BUILD] Exclude hadoop-client-api transitive dependency from orc-core#34541

Closed
sarutak wants to merge 3 commits intoapache:masterfrom
sarutak:disable-test-dependencies-for-java17
Closed

[SPARK-37264][BUILD] Exclude hadoop-client-api transitive dependency from orc-core#34541
sarutak wants to merge 3 commits intoapache:masterfrom
sarutak:disable-test-dependencies-for-java17

Conversation

@sarutak
Copy link
Copy Markdown
Member

@sarutak sarutak commented Nov 10, 2021

What changes were proposed in this pull request?

Like hadoop-common and hadoop-hdfs, this PR proposes to exclude hadoop-client-api transitive dependency from orc-core.

Why are the changes needed?

Since Apache Hadoop 2.7 doesn't work on Java 17, Apache ORC has a dependency on Hadoop 3.3.1.
This causes test-dependencies.sh failure on Java 17 on hadoop-2.7 profile. As a result, run-tests.py also fails.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Confirmed that test-dependencyies.sh works on Java 17 correctly because the transitive dependency is cut.

JAVA_HOME=/path/to/java17/ build/mvn -Phadoop-2.7 dependency:tree
...
[INFO] +- org.apache.orc:orc-core:jar:1.7.1:compile
[INFO] |  +- org.apache.orc:orc-shims:jar:1.7.1:compile
[INFO] |  +- com.google.protobuf:protobuf-java:jar:2.5.0:compile
[INFO] |  +- io.airlift:aircompressor:jar:0.21:compile
[INFO] |  +- org.jetbrains:annotations:jar:17.0.0:compile
[INFO] |  \- org.threeten:threeten-extra:jar:1.5.0:compile

...

@github-actions github-actions bot added the BUILD label Nov 10, 2021
Copy link
Copy Markdown
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we cut the transitive dependency explicitly instead?

@dongjoon-hyun
Copy link
Copy Markdown
Member

I guess it only fails at Hadoop 2.7 profile, right?

@sarutak
Copy link
Copy Markdown
Member Author

sarutak commented Nov 10, 2021

Shall we cut the transitive dependency explicitly instead?

If we can do it, it's better. The dependency doesn't matter the usage of ORC in Spark right?

I guess it only fails at Hadoop 2.7 profile, right?

Yes. But test-dependencies runs both 3.2 and 2.7.
I wonder we should skip only for 2.7. WDYT?

@dongjoon-hyun
Copy link
Copy Markdown
Member

Yes, the dependency doesn't matter. ORC has it because Hadoop 2.7 doesn't work on Java 17.

@sarutak
Copy link
Copy Markdown
Member Author

sarutak commented Nov 10, 2021

because Hadoop 2.7 doesn't work on Java 17.

Ah, right. Let's cut the transitive dependency.

@dongjoon-hyun
Copy link
Copy Markdown
Member

Since we already cut the Hadoop from ORC, I'll add the missing part additionally there.

      <dependency>
        <groupId>org.apache.orc</groupId>
        <artifactId>orc-core</artifactId>
        <version>${orc.version}</version>
        <scope>${orc.deps.scope}</scope>
        <exclusions>
          <exclusion>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
          </exclusion>
          <exclusion>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
          </exclusion>
          <exclusion>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-storage-api</artifactId>
          </exclusion>
        </exclusions>
      </dependency>

@dongjoon-hyun
Copy link
Copy Markdown
Member

dongjoon-hyun commented Nov 10, 2021

Since you are already started, could you revise this PR like the following?

     <dependency>
        <groupId>org.apache.orc</groupId>
        <artifactId>orc-core</artifactId>
        <version>${orc.version}</version>
        <scope>${orc.deps.scope}</scope>
        <exclusions>
          <exclusion>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
          </exclusion>
          <exclusion>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
          </exclusion>
          <exclusion>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client-api</artifactId>
          </exclusion>
          <exclusion>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-storage-api</artifactId>
          </exclusion>
        </exclusions>
      </dependency>

This is the diff.

$ git diff
diff --git a/pom.xml b/pom.xml
index d4bb6b3d82..b2e846d8f7 100644
--- a/pom.xml
+++ b/pom.xml
@@ -2327,6 +2327,10 @@
             <groupId>org.apache.hadoop</groupId>
             <artifactId>hadoop-hdfs</artifactId>
           </exclusion>
+          <exclusion>
+            <groupId>org.apache.hadoop</groupId>
+            <artifactId>hadoop-client-api</artifactId>
+          </exclusion>
           <exclusion>
             <groupId>org.apache.hive</groupId>
             <artifactId>hive-storage-api</artifactId>

With the above change, I confirmed that the dependency is the same with the master branch with Java 17.

@sarutak
Copy link
Copy Markdown
Member Author

sarutak commented Nov 10, 2021

@dongjoon-hyun Thank you. Yeah, I'm just doing it.

@dongjoon-hyun
Copy link
Copy Markdown
Member

Thank you for the updates. Please revise the PR title and description accordingly.

@sarutak sarutak changed the title [SPARK-37264][TESTS] Skip dependency testing on Java 17 temporarily [SPARK-37264][TESTS] Cut the transitive dependency on hadoop-client-api which orc-shims depends on only for Java 17 with hadoop-2.7 Nov 10, 2021
@dongjoon-hyun
Copy link
Copy Markdown
Member

dongjoon-hyun commented Nov 10, 2021

BTW, @sarutak . ORC has that dependency always on Java 17. It's irrelevant to Spark's hadoop2.7.

which orc-shims depends on only for hadoop-2.7 profile.

@sarutak
Copy link
Copy Markdown
Member Author

sarutak commented Nov 10, 2021

@dongjoon-hyun Oh, I see. Thank you.

@sarutak sarutak changed the title [SPARK-37264][TESTS] Cut the transitive dependency on hadoop-client-api which orc-shims depends on only for Java 17 with hadoop-2.7 [SPARK-37264][TESTS] Cut the transitive dependency on hadoop-client-api which orc-shims depends on only for Java 17 Nov 10, 2021
@dongjoon-hyun
Copy link
Copy Markdown
Member

If you don't mind, may I revise the PR title and description a little?

@sarutak
Copy link
Copy Markdown
Member Author

sarutak commented Nov 10, 2021

I don't mind. It might be better. Thank you.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-37264][TESTS] Cut the transitive dependency on hadoop-client-api which orc-shims depends on only for Java 17 [SPARK-37264][TESTS] Exclude hadoop-client-api transitive dependency from orc-core Nov 10, 2021
Copy link
Copy Markdown
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. I verified manually too.

@dongjoon-hyun
Copy link
Copy Markdown
Member

I finished revision and approved. You can revise more if you want. You can merge this because the PR builder is not running Java 17 and I verified together with you.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-37264][TESTS] Exclude hadoop-client-api transitive dependency from orc-core [SPARK-37264][BUILD] Exclude hadoop-client-api transitive dependency from orc-core Nov 10, 2021
@sarutak
Copy link
Copy Markdown
Member Author

sarutak commented Nov 10, 2021

I see. Merging to master. Thank you @dongjoon-hyun for all the help !

@dongjoon-hyun
Copy link
Copy Markdown
Member

Thank YOU, @sarutak !

@sarutak sarutak closed this in b89f415 Nov 10, 2021
@SparkQA
Copy link
Copy Markdown

SparkQA commented Nov 10, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49518/

@SparkQA
Copy link
Copy Markdown

SparkQA commented Nov 10, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49520/

@SparkQA
Copy link
Copy Markdown

SparkQA commented Nov 10, 2021

Test build #145047 has finished for PR 34541 at commit d997ca9.

  • This patch fails from timeout after a configured wait of 500m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link
Copy Markdown

SparkQA commented Nov 10, 2021

Test build #145049 has finished for PR 34541 at commit 2a9e425.

  • This patch fails from timeout after a configured wait of 500m.
  • This patch merges cleanly.
  • This patch adds no public classes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants