
[SPARK-1681] Include datanucleus jars in Spark Hive distribution #610

Closed · wants to merge 7 commits

Conversation

andrewor14
Contributor

This copies the datanucleus jars over from lib_managed into dist/lib, if any. The CLASSPATH must also be updated to reflect this change.
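A minimal sketch of that copy step, assuming the `lib_managed` and `dist/lib` layout described above (the function name and argument handling are illustrative, not the actual make-distribution.sh contents):

```shell
# Sketch of the copy step; the directory layout follows the PR description,
# but this helper function is illustrative, not taken from the script itself.
copy_datanucleus_jars() {
  local src_dir="$1" dest_dir="$2"
  mkdir -p "$dest_dir"
  # Copy only the datanucleus jars, and only if the Hive build produced any.
  for jar in "$src_dir"/datanucleus-*.jar; do
    [ -e "$jar" ] && cp "$jar" "$dest_dir/"
  done
  return 0
}

copy_datanucleus_jars "lib_managed/jars" "dist/lib"
```

The `[ -e "$jar" ]` guard handles the "if any" case: when no datanucleus jars exist, the glob does not expand and nothing is copied.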

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.


if [ -n "$datanucleus_jars" ]; then
an_assembly_jar=${ASSEMBLY_JAR:-$DEPS_ASSEMBLY_JAR}
hive_files=$(jar tvf "$an_assembly_jar" org/apache/hadoop/hive/ql/exec)
Contributor

We should probably still pipe `2>/dev/null` here as before; I believe this throws an error if the jar doesn't exist.
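The suggested change would look roughly like this (a sketch based on the diff above; the `2>/dev/null` keeps a missing assembly jar from spilling an error):

```shell
# Suppress jar's stderr so a missing assembly jar yields an empty listing
# instead of an error message (matching the script's earlier behavior).
an_assembly_jar=${ASSEMBLY_JAR:-$DEPS_ASSEMBLY_JAR}
hive_files=$(jar tvf "$an_assembly_jar" org/apache/hadoop/hive/ql/exec 2>/dev/null)
```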

Contributor Author

forgot, thanks

@andrewor14 andrewor14 changed the title [SPARK-1681] Include datanucleus jars in Spark distribution built with Hive support [SPARK-1681] Include datanucleus jars in Spark Hive distribution May 1, 2014
@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@@ -29,6 +29,7 @@ FWDIR="$(cd `dirname $0`/..; pwd)"

# Build up classpath
CLASSPATH="$SPARK_CLASSPATH:$SPARK_SUBMIT_CLASSPATH:$FWDIR/conf"
CLASSPATH=$(echo "$CLASSPATH" | sed s/::/:/g)
Contributor

Is this just for aesthetics or is there a correctness issue here?

Contributor Author

aesthetics... @aarondav also pointed out that this is a little confusing. Maybe I should remove it.
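For context, the `sed` pass only collapses the empty entry left behind when one of the joined variables is unset, e.g.:

```shell
# When SPARK_SUBMIT_CLASSPATH is unset, joining with ':' leaves '::' behind;
# the sed pass collapses it to a single separator (purely cosmetic).
CLASSPATH="/opt/spark/conf::/opt/spark/jars"
CLASSPATH=$(echo "$CLASSPATH" | sed 's/::/:/g')
echo "$CLASSPATH"   # /opt/spark/conf:/opt/spark/jars
```

An empty classpath entry is harmless to the JVM (it just means the current directory), which is why this is an aesthetic rather than a correctness fix.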

@pwendell
Contributor

pwendell commented May 1, 2014

@andrewor14 looks good! Just a few small questions.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14605/

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14606/

@srowen
Member

srowen commented May 1, 2014

Sorry for a dumb question, but is this the place to configure what goes into artifacts? The Maven build is where the rest of that action happens, right? Or is this a necessary special case?

@witgo
Contributor

witgo commented May 1, 2014

There is another solution: #598.

@pwendell
Contributor

pwendell commented May 1, 2014

Right now we actually build distributions with make-distribution.sh, which calls the Maven build and then moves some things around. In this case the datanucleus jars need to be handled in a special way because they don't assemble well.

I think for 1.1 the goal should be to both consolidate the builds and potentially migrate this entire thing to Maven using release profiles. But for now I want a surgical fix that we can pull into the 1.0 branch.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14609/

@pwendell
Contributor

pwendell commented May 3, 2014

@andrewor14 @aarondav I tried this locally and ran into a bug. The use of `jar` to read the index of the files doesn't honor JAVA_HOME. In my case the system default is Java 6, whereas JAVA_HOME is set to Java 7. When it tried to get the index of the jar, it failed silently. The actual error was this (related to the 65k-entry jar limit):

java.util.zip.ZipException: invalid CEN header (bad signature)
    at java.util.zip.ZipFile.open(Native Method)
    at java.util.zip.ZipFile.<init>(ZipFile.java:132)
    at java.util.zip.ZipFile.<init>(ZipFile.java:93)
    at sun.tools.jar.Main.list(Main.java:997)
    at sun.tools.jar.Main.run(Main.java:242)
    at sun.tools.jar.Main.main(Main.java:1167)
  1. Is there a reason we intentionally swallow stderr from the jar command?
  2. Could we have the jar command respect JAVA_HOME, similar to spark-class?
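The spark-class-style resolution being asked for in (2) would look something like this (a sketch; `JAR_CMD` is the variable name the later diff in this thread uses):

```shell
# Resolve jar relative to JAVA_HOME when it is set, so the jar binary used to
# inspect the assembly matches the JVM that will run Spark; fall back to PATH.
if [ -n "${JAVA_HOME:-}" ]; then
  JAR_CMD="$JAVA_HOME/bin/jar"
else
  JAR_CMD="jar"
fi
```

This mirrors how spark-class picks `$JAVA_HOME/bin/java` over the system default, which is exactly the mismatch that produced the silent failure above.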

@aarondav
Contributor

aarondav commented May 3, 2014

Regarding question 1, it was just to avoid having to nest another check for the existence of the dependency jar. More of a hack than anything.

@pwendell
Contributor

pwendell commented May 3, 2014

As a more general solution for the problem I encountered I've created:
https://issues.apache.org/jira/browse/SPARK-1703

With that in mind I think it's fine to just address (2) here, since we can assume this error message will be found elsewhere.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

fi

# Verify that versions of java used to build the jars and run Spark are compatible
jar_error_check=$("$JAR_CMD" -tf "$ASSEMBLY_JAR" scala/AnyVal 2>&1)
Contributor

As per our offline discussion, let's just change this to unused/class/path or something.
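With that rename, the version check would read roughly as follows (a sketch; the default variable values and the error-matching string are assumptions based on the stack trace above, not the exact script contents):

```shell
# Probe the assembly with jar: listing a path that is never present still
# forces jar to read the archive index, so a Java-version mismatch (e.g. the
# "invalid CEN header" failure above) surfaces on stderr, which we capture.
JAR_CMD="${JAR_CMD:-jar}"
ASSEMBLY_JAR="${ASSEMBLY_JAR:-spark-assembly.jar}"
jar_error_check=$("$JAR_CMD" -tf "$ASSEMBLY_JAR" unused/class/path 2>&1 || true)
if echo "$jar_error_check" | grep -q "invalid CEN header"; then
  echo "Loading the Spark assembly failed; the Java version used to run Spark may differ from the one used to build it." >&2
fi
```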

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14669/

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14674/

@asfgit asfgit closed this in cf0a8f0 May 6, 2014
asfgit pushed a commit that referenced this pull request May 6, 2014
This copies the datanucleus jars over from `lib_managed` into `dist/lib`, if any. The `CLASSPATH` must also be updated to reflect this change.

Author: Andrew Or <andrewor14@gmail.com>

Closes #610 from andrewor14/hive-distribution and squashes the following commits:

a4bc96f [Andrew Or] Rename search path in jar error check
fa205e1 [Andrew Or] Merge branch 'master' of github.com:apache/spark into hive-distribution
7855f58 [Andrew Or] Have jar command respect JAVA_HOME + check for jar errors both cases
c16bbfd [Andrew Or] Merge branch 'master' of github.com:apache/spark into hive-distribution
32f6826 [Andrew Or] Leave the double colons
940a1bb [Andrew Or] Add back 2>/dev/null
58357cc [Andrew Or] Include datanucleus jars in Spark distribution built with Hive support
(cherry picked from commit cf0a8f0)

Signed-off-by: Patrick Wendell <pwendell@gmail.com>
@andrewor14 andrewor14 deleted the hive-distribution branch May 6, 2014 17:10
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
bzhaoopenstack pushed a commit to bzhaoopenstack/spark that referenced this pull request Sep 11, 2019
Now many projects use Scala, so it's good to add a common role.
Since Scala depends on Java, add the install step inside the openjdk role.
Users can install Scala after openjdk with the var `with_scala`.