[SPARK-4048] Enhance and extend hadoop-provided profile. #2982
Conversation
Test build #22371 has started for PR 2982 at commit
Test build #22371 has finished for PR 2982 at commit
Test PASSed.
+1
Test build #22836 has started for PR 2982 at commit
Test build #22836 has finished for PR 2982 at commit
Test FAILed.
Test build #22838 has started for PR 2982 at commit
Test build #22838 has finished for PR 2982 at commit
Test PASSed.
Test build #22884 has started for PR 2982 at commit
Test build #22884 has finished for PR 2982 at commit
Test PASSed.
Test build #22893 has started for PR 2982 at commit
Test build #22893 has finished for PR 2982 at commit
Test PASSed.
Test build #22952 has started for PR 2982 at commit
Test build #22952 has finished for PR 2982 at commit
Test PASSed.
Test build #23016 has started for PR 2982 at commit
Test build #23016 has finished for PR 2982 at commit
Test PASSed.
@@ -41,10 +41,6 @@
      <version>${project.version}</version>
    </dependency>
    <dependency>
      <groupId>org.eclipse.jetty</groupId>
Why was this removed?
I did not see any direct dependency on jetty from the streaming code. I'll double check, though.
So, streaming does not depend on Jetty directly, but it does use the servlet APIs. Spark doesn't pull in any servlet API dependency of its own; instead it relies on the transitive one provided by Jetty, which is a Jetty repackaging of the servlet APIs rather than a common, standalone dependency. (BTW, it works without the explicit dependency because it was coming in as a transitive dependency of spark-core.)
In light of that I'll restore this dependency, even though I don't really like it very much (it's pulling a dependency that is too coarse for what it needs).
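For reference, restoring it would look roughly like this in streaming's pom.xml. This is a sketch only: the diff above shows just the groupId, so the artifactId below is an assumption, and version/scope would be inherited from the parent pom's dependency management.

```xml
<!-- Hypothetical sketch: streaming needs the servlet APIs that Jetty
     repackages, so the Jetty dependency is declared explicitly instead
     of relying on the transitive one from spark-core. The artifactId
     here is an assumption, not taken from the PR diff. -->
<dependency>
  <groupId>org.eclipse.jetty</groupId>
  <artifactId>jetty-server</artifactId>
</dependency>
```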
Test build #23243 has started for PR 2982 at commit
Test build #23244 has started for PR 2982 at commit
Friendly ping. I still have to double check TD's question, but even pending that, the code builds and runs correctly.
Test build #23243 has finished for PR 2982 at commit
Test PASSed.
// compute-classpath.{cmd,sh} and makes all needed jars available to child processes
// when the assembly is built with the "*-provided" profiles enabled.
if (sys.props.contains("spark.testing")) {
  environment.put("SPARK_DIST_CLASSPATH", sys.props("java.class.path"))
Is this needed in all cases, or only when you are actually running tests with a "hadoop-provided" build and you've supplied your own SPARK_DIST_CLASSPATH for the test JVM? If the latter, can we just propagate the value of SPARK_DIST_CLASSPATH from the parent to the child? It's a bit weird here because you're setting SPARK_DIST_CLASSPATH to also include all of the normal Spark classes. Then you would just check whether SPARK_DIST_CLASSPATH is set and, if so, propagate its value to the child environment.
I used to run into problems with tests spawning child processes that referenced hadoop classes, and those not being available when the "hadoop-provided" profile was enabled. Let me try without this and see how it goes, since I noticed some changes in the configuration of the test plugins in the root pom.
Even if it is still a problem, my question was: can we just set SPARK_DIST_CLASSPATH in the child to the value of SPARK_DIST_CLASSPATH in the parent, instead of setting it to the value of java.class.path?
But SPARK_DIST_CLASSPATH is not set in the parent (the parent being the surefire/scalatest runner, in this case). The goal of this code is to add the classpath built by those runners (which includes all the dependencies from the pom) to the child processes without having to modify the test code to do it manually.
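The behavior under discussion can be sketched as follows. This is a minimal, self-contained illustration rather than the PR's exact code; `environment` here is a plain mutable map standing in for the child process's environment.

```scala
import scala.collection.mutable

// Stand-in for the child process's environment map.
val environment = mutable.Map[String, String]()

// SPARK_DIST_CLASSPATH is not set in the parent test JVM (the
// surefire/scalatest runner), so it cannot simply be propagated.
// Instead, when "spark.testing" is set, the parent's full classpath --
// which surefire/scalatest built from the pom dependencies -- is
// exported to children via SPARK_DIST_CLASSPATH, so spawned processes
// can still see hadoop classes under the "*-provided" profiles.
if (sys.props.contains("spark.testing")) {
  environment.put("SPARK_DIST_CLASSPATH", sys.props("java.class.path"))
}
```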
Hey @vanzin. I left a comment about the test set-up but overall this looks good. Fine to keep
Conflicts: yarn/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala
Test build #25258 has started for PR 2982 at commit
Test build #25258 has finished for PR 2982 at commit
Test FAILed.
Test build #25263 has started for PR 2982 at commit
Test build #25263 has finished for PR 2982 at commit
Test PASSed.
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-install-plugin</artifactId>
<configuration>
  <skip>true</skip>
Is this change a separate issue?
Test build #25273 has started for PR 2982 at commit
Test build #25273 has finished for PR 2982 at commit
Test PASSed.
Thanks @vanzin for this cleanup - I'll merge this.
After #2982 (SPARK-4048) we rely on the newer HBase packaging format.
This change does a few things to make the hadoop-provided profile more useful:
classpath for Spark jobs and daemons.