
[SPARK-1681] Include datanucleus jars in Spark Hive distribution #610

Closed · wants to merge 7 commits

Conversation

andrewor14
Contributor

This copies the datanucleus jars over from lib_managed into dist/lib, if any. The CLASSPATH must also be updated to reflect this change.
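A minimal sketch of that copy step, assuming the `lib_managed` and `dist/lib` layout described above (the function name and argument handling are illustrative, not the actual make-distribution.sh contents):

```shell
# Sketch of the copy step; the directory layout follows the PR description,
# but this helper function is illustrative, not taken from the script itself.
copy_datanucleus_jars() {
  local src_dir="$1" dest_dir="$2"
  mkdir -p "$dest_dir"
  # Copy only the datanucleus jars, and only if the Hive build produced any.
  for jar in "$src_dir"/datanucleus-*.jar; do
    [ -e "$jar" ] && cp "$jar" "$dest_dir/"
  done
  return 0
}

copy_datanucleus_jars "lib_managed/jars" "dist/lib"
```

The `[ -e "$jar" ]` guard handles the "if any" case: when no datanucleus jars exist, the glob does not expand and nothing is copied.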

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.


if [ -n "$datanucleus_jars" ]; then
an_assembly_jar=${ASSEMBLY_JAR:-$DEPS_ASSEMBLY_JAR}
hive_files=$(jar tvf "$an_assembly_jar" org/apache/hadoop/hive/ql/exec)
Contributor

We should probably still pipe `2>/dev/null` here as before; I believe this throws an error if the jar doesn't exist.
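The suggested change would look roughly like this (a sketch based on the diff above; the `2>/dev/null` keeps a missing assembly jar from spilling an error):

```shell
# Suppress jar's stderr so a missing assembly jar yields an empty listing
# instead of an error message (matching the script's earlier behavior).
an_assembly_jar=${ASSEMBLY_JAR:-$DEPS_ASSEMBLY_JAR}
hive_files=$(jar tvf "$an_assembly_jar" org/apache/hadoop/hive/ql/exec 2>/dev/null)
```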

Contributor Author

forgot, thanks

@andrewor14 andrewor14 changed the title [SPARK-1681] Include datanucleus jars in Spark distribution built with Hive support [SPARK-1681] Include datanucleus jars in Spark Hive distribution May 1, 2014
@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@@ -29,6 +29,7 @@ FWDIR="$(cd `dirname $0`/..; pwd)"

# Build up classpath
CLASSPATH="$SPARK_CLASSPATH:$SPARK_SUBMIT_CLASSPATH:$FWDIR/conf"
CLASSPATH=$(echo "$CLASSPATH" | sed s/::/:/g)
Contributor

Is this just for aesthetics or is there a correctness issue here?

Contributor Author

aesthetics... @aarondav also pointed out that this is a little confusing. Maybe I should remove it.
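For context, the `sed` pass only collapses the empty entry left behind when one of the joined variables is unset, e.g.:

```shell
# When SPARK_SUBMIT_CLASSPATH is unset, joining with ':' leaves '::' behind;
# the sed pass collapses it to a single separator (purely cosmetic).
CLASSPATH="/opt/spark/conf::/opt/spark/jars"
CLASSPATH=$(echo "$CLASSPATH" | sed 's/::/:/g')
echo "$CLASSPATH"   # /opt/spark/conf:/opt/spark/jars
```

An empty classpath entry is harmless to the JVM (it just means the current directory), which is why this is an aesthetic rather than a correctness fix.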

@pwendell
Contributor

pwendell commented May 1, 2014

@andrewor14 looks good! Just a few small questions.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14605/

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14606/

@srowen
Member

srowen commented May 1, 2014

Sorry for a dumb question, but is this the place to configure what goes into artifacts? The Maven build is where the rest of that action happens, right? Or is this a necessary special case?

@witgo
Contributor

witgo commented May 1, 2014

There is another solution: #598.

@pwendell
Contributor

pwendell commented May 1, 2014

Right now we actually build distributions with make-distribution.sh, which calls the Maven build and then moves some things around. In this case the datanucleus jars need to be handled in a special way because they don't assemble well.

I think for 1.1 the goal should be to both consolidate the builds and potentially migrate this entire thing to Maven using release profiles. But for now I want a surgical fix that we can pull into the 1.0 branch.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14609/

@pwendell
Contributor

pwendell commented May 3, 2014

@andrewor14 @aarondav I tried this locally and ran into a bug. The use of `jar` to read the index of the files doesn't honor JAVA_HOME. In my case the system default is Java 6, whereas JAVA_HOME is set to Java 7. When it tried to get the index of the jar, it failed silently. The actual error was this (related to the 65k-entry jar limit):

java.util.zip.ZipException: invalid CEN header (bad signature)
    at java.util.zip.ZipFile.open(Native Method)
    at java.util.zip.ZipFile.<init>(ZipFile.java:132)
    at java.util.zip.ZipFile.<init>(ZipFile.java:93)
    at sun.tools.jar.Main.list(Main.java:997)
    at sun.tools.jar.Main.run(Main.java:242)
    at sun.tools.jar.Main.main(Main.java:1167)
  1. Is there a reason we intentionally swallow stderr from the jar command?
  2. Could we have the jar command respect JAVA_HOME, similar to spark-class?
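The spark-class-style resolution being asked for in (2) would look something like this (a sketch; `JAR_CMD` is the variable name the later diff in this thread uses):

```shell
# Resolve jar relative to JAVA_HOME when it is set, so the jar binary used to
# inspect the assembly matches the JVM that will run Spark; fall back to PATH.
if [ -n "${JAVA_HOME:-}" ]; then
  JAR_CMD="$JAVA_HOME/bin/jar"
else
  JAR_CMD="jar"
fi
```

This mirrors how spark-class picks `$JAVA_HOME/bin/java` over the system default, which is exactly the mismatch that produced the silent failure above.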

@aarondav
Contributor

aarondav commented May 3, 2014

Regarding question 1, it was just to avoid having to nest another check for the existence of the dependency jar. More of a hack than anything.

@pwendell
Contributor

pwendell commented May 3, 2014

As a more general solution for the problem I encountered I've created:
https://issues.apache.org/jira/browse/SPARK-1703

With that in mind I think it's fine to just address (2) here, since we can assume this error message will be found elsewhere.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

fi

# Verify that versions of java used to build the jars and run Spark are compatible
jar_error_check=$("$JAR_CMD" -tf "$ASSEMBLY_JAR" scala/AnyVal 2>&1)
Contributor

As per our offline discussion, let's just change this to unused/class/path or something.
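With that rename, the version check would read roughly as follows (a sketch; the default variable values and the error-matching string are assumptions based on the stack trace above, not the exact script contents):

```shell
# Probe the assembly with jar: listing a path that is never present still
# forces jar to read the archive index, so a Java-version mismatch (e.g. the
# "invalid CEN header" failure above) surfaces on stderr, which we capture.
JAR_CMD="${JAR_CMD:-jar}"
ASSEMBLY_JAR="${ASSEMBLY_JAR:-spark-assembly.jar}"
jar_error_check=$("$JAR_CMD" -tf "$ASSEMBLY_JAR" unused/class/path 2>&1 || true)
if echo "$jar_error_check" | grep -q "invalid CEN header"; then
  echo "Loading the Spark assembly failed; the Java version used to run Spark may differ from the one used to build it." >&2
fi
```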

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14669/

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14674/

@asfgit asfgit closed this in cf0a8f0 May 6, 2014
asfgit pushed a commit that referenced this pull request May 6, 2014
This copies the datanucleus jars over from `lib_managed` into `dist/lib`, if any. The `CLASSPATH` must also be updated to reflect this change.

Author: Andrew Or <andrewor14@gmail.com>

Closes #610 from andrewor14/hive-distribution and squashes the following commits:

a4bc96f [Andrew Or] Rename search path in jar error check
fa205e1 [Andrew Or] Merge branch 'master' of github.com:apache/spark into hive-distribution
7855f58 [Andrew Or] Have jar command respect JAVA_HOME + check for jar errors both cases
c16bbfd [Andrew Or] Merge branch 'master' of github.com:apache/spark into hive-distribution
32f6826 [Andrew Or] Leave the double colons
940a1bb [Andrew Or] Add back 2>/dev/null
58357cc [Andrew Or] Include datanucleus jars in Spark distribution built with Hive support
(cherry picked from commit cf0a8f0)

Signed-off-by: Patrick Wendell <pwendell@gmail.com>
@andrewor14 andrewor14 deleted the hive-distribution branch May 6, 2014 17:10
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
bzhaoopenstack pushed a commit to bzhaoopenstack/spark that referenced this pull request Sep 11, 2019
Now many projects use Scala, so it's good to add a common role.
Since Scala depends on Java, add the install step inside the openjdk role.
Users can install Scala after openjdk with the var `with_scala`.