
[SPARK-33343][BUILD] Fix the build with sbt to copy hadoop-client-runtime.jar #30250

Closed
wants to merge 1 commit

Conversation

@sarutak
Member

sarutak commented Nov 4, 2020

What changes were proposed in this pull request?

This PR fixes an issue where spark-shell doesn't work when Spark is built with sbt package (without any profiles specified).
The cause is that hadoop-client-runtime.jar is not copied to assembly/target/scala-2.12/jars.

$ bin/spark-shell
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/shaded/com/ctc/wstx/io/InputBootstrapper
	at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:426)
	at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:877)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1013)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1022)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.shaded.com.ctc.wstx.io.InputBootstrapper
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
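As an illustration of the failure mode, the shaded Woodstox class in the stack trace ships only inside hadoop-client-runtime.jar, so a missing copy in the jars directory surfaces as a NoClassDefFoundError the first time a Hadoop Configuration is created. A minimal sketch of the check (the helper function and the temporary directory are hypothetical; the real path is assembly/target/scala-2.12/jars as reported above):

```shell
check_runtime_jar() {
  # Succeeds (exit 0) only if a hadoop-client-runtime jar exists in the given dir.
  ls "$1"/hadoop-client-runtime*.jar >/dev/null 2>&1
}

# Demo against a temporary directory standing in for the assembly jars dir.
jars_dir=$(mktemp -d)
if ! check_runtime_jar "$jars_dir"; then
  echo "hadoop-client-runtime.jar missing from $jars_dir"
fi
touch "$jars_dir/hadoop-client-runtime-3.2.0.jar"
if check_runtime_jar "$jars_dir"; then
  echo "jar present"
fi
rm -rf "$jars_dir"
```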

Why are the changes needed?

This is a bug.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Ran spark-shell and confirmed it works.

@sarutak
Member Author

sarutak commented Nov 4, 2020

I don't know whether this is the best way to fix this issue.
Please let me know if there is a better way.

@SparkQA

SparkQA commented Nov 4, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35208/

@SparkQA

SparkQA commented Nov 4, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35208/

@SparkQA

SparkQA commented Nov 4, 2020

Test build #130607 has finished for PR 30250 at commit babab2e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sarutak
Member Author

sarutak commented Nov 4, 2020

retest this please.

@dongjoon-hyun
Member

cc @sunchao

@dongjoon-hyun
Member

For me, master branch works like the following. How did you build, @sarutak ?

$ build/sbt -Pyarn -Phive -Phive-thriftserver -Psparkr test:package

$ bin/spark-shell
20/11/04 08:56:42 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context available as 'sc' (master = local[*], app id = local-1604509006686).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.1.0-SNAPSHOT
      /_/

Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_272)
Type in expressions to have them evaluated.
Type :help for more information.

scala> :quit

@sarutak
Member Author

sarutak commented Nov 4, 2020

I built with build/sbt package.
I'll update the description.

@dongjoon-hyun
Member

Thank you, @sarutak. I also confirmed the issue.

@sarutak
Member Author

sarutak commented Nov 4, 2020

According to the pom.xml of the hive sub-project, it also seems to depend on hadoop-client-runtime, but the scope is ${hadoop.deps.scope}, like the change proposed by this PR. That is why spark-shell built with build/sbt -Phive works.

    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>${hadoop-client-runtime.artifact}</artifactId>
      <scope>${hadoop.deps.scope}</scope>
    </dependency>
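For context, a minimal sketch of why the ${hadoop.deps.scope} property matters here (the default value shown is an assumption about Spark's root pom, not taken from this thread): the scope is an ordinary Maven property, so a dependency declared this way typically resolves to compile scope in a plain build, and compile-scoped dependencies are the ones sbt copies into the assembly jars directory.

    <!-- Sketch, assumed root-pom default (hypothetical for illustration):
         modules declaring <scope>${hadoop.deps.scope}</scope> resolve to
         compile scope unless a profile overrides the property. -->
    <properties>
      <hadoop.deps.scope>compile</hadoop.deps.scope>
    </properties>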

@SparkQA

SparkQA commented Nov 4, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35210/

@SparkQA

SparkQA commented Nov 4, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35210/

@sarutak
Member Author

sarutak commented Nov 4, 2020

retest this please.

@sunchao
Member

sunchao commented Nov 4, 2020

Thanks @sarutak for reporting the issue. Yeah, I was able to reproduce it as well with SBT, but not with Maven. In particular, hadoop-client-runtime-3.2.0.jar is not copied to assembly/target/scala-2.12/jars with SBT.

@SparkQA

SparkQA commented Nov 4, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35215/

@SparkQA

SparkQA commented Nov 4, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35215/

@sunchao
Member

sunchao commented Nov 4, 2020

This was discussed some time back in sbt/sbt-assembly#120, but I don't know whether there is a fix for it yet.

@SparkQA

SparkQA commented Nov 4, 2020

Test build #130609 has finished for PR 30250 at commit babab2e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 4, 2020

Test build #130614 has finished for PR 30250 at commit babab2e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

cc @srowen

@dongjoon-hyun
Member

Thank you, @sarutak and @srowen ! Merged to master.

5 participants