This repository has been archived by the owner on Jan 9, 2020. It is now read-only.

Conversation

kimoonkim
Member

@kimoonkim kimoonkim commented Feb 8, 2017

@ash211 @cvpatel @ssuchter

What changes were proposed in this pull request?

Currently, the kubernetes integration test runs in the maven test phase and fails because the test-job jars and other jars are missing from the target dir. (See #74) Those jars are supposed to be copied in the pre-integration-test phase, which comes after the test phase.

This change fixes the issue by triggering the scalatest plugin in the integration-test phase. The target directory now has the needed jars.
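A sketch of the kind of pom.xml execution this describes, for the integration-tests module (illustrative only; the execution id and layout here are assumptions, not the exact committed diff):

```xml
<!-- Illustrative sketch: bind an extra scalatest-maven-plugin execution
     to the integration-test phase, so KubernetesSuite runs only after
     the pre-integration-test copy steps have populated target/. -->
<plugin>
  <groupId>org.scalatest</groupId>
  <artifactId>scalatest-maven-plugin</artifactId>
  <executions>
    <execution>
      <id>integration-test</id>
      <phase>integration-test</phase>
      <goals>
        <goal>test</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```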

How was this patch tested?

Ran the integration test build command and saw that it got past the previously failing point.

$ build/mvn -B clean integration-test -Pkubernetes -Pkubernetes-integration-tests -pl resource-managers/kubernetes/integration-tests -am -Dtest=none -DwildcardSuites=org.apache.spark.deploy.kubernetes.integrationtest.KubernetesSuite

@ash211

ash211 commented Feb 10, 2017

@kimoonkim I ran the build command you listed on a fresh checkout of the k8s-support-alternate-incremental branch (without this change) and didn't see any failures.

The exact command was: build/mvn clean; build/mvn -B integration-test -Pkubernetes -Pkubernetes-integration-tests -pl resource-managers/kubernetes/integration-tests -am -Dtest=none -DwildcardSuites=org.apache.spark.deploy.kubernetes.integrationtest.KubernetesSuite

and the output included

KubernetesSuite:
- Run a simple example
- Run using spark-submit
- Run using spark-submit with the examples jar on the docker image
- Run with custom labels
- Enable SSL on the driver submit server
- Added files should exist on the driver.
Run completed in 10 minutes, 48 seconds.
Total number of tests run: 6
Suites: completed 2, aborted 0
Tests: succeeded 6, failed 0, canceled 0, ignored 0, pending 0

Do you have any ideas why that might be? Am I not cleaning intermediate testing artifacts properly?

I'd expect this command to fail before your patch and succeed afterwards, but what I'm observing is that it's actually succeeding before.

@kimoonkim
Member Author

@ash211 Ah, thanks for trying out the commands. (I am also going to do that myself to verify :-)).

I noticed your first $ mvn clean command did not specify -Pkubernetes-integration-tests, which means it won't remove the resource-managers/kubernetes/integration-tests/target dir if that dir already exists. Any chance the target dir was pre-populated? If so, I think that would explain why it succeeds.

Can you also try doing clean and integration-test in a single command, i.e.

$ build/mvn -B clean integration-test -Pkubernetes -Pkubernetes-integration-tests -pl resource-managers/kubernetes/integration-tests -am -Dtest=none -DwildcardSuites=org.apache.spark.deploy.kubernetes.integrationtest.KubernetesSuite

@kimoonkim
Member Author

FYI, I did try the single $ mvn clean integration-test command above on a fresh clone and reproduced the failure.

@kimoonkim
Member Author

Reproduced the failure also with the two commands above issued on another fresh clone.

Please let me know if I should upload the full log.

@kimoonkim
Member Author

kimoonkim commented Feb 10, 2017

I found a tool that can display the Maven build plan. To use it, one just needs to add a few lines to ~/.m2/settings.xml:

  <pluginGroups>
    <pluginGroup>fr.jcgay.maven.plugins</pluginGroup>
  </pluginGroups>

Then, issue $ mvn buildplan:list -Pkubernetes -Pkubernetes-integration-tests, which will show the build plan in time order.

Before this change, the integration-tests project shows the following. Notice scalatest-maven-plugin runs in the test phase, which is before maven-dependency-plugin runs copy-test-spark-jobs in the pre-integration-test phase:

PLUGIN PHASE ID GOAL
maven-enforcer-plugin validate enforce-versions enforce
scala-maven-plugin initialize eclipse-add-source add-source
maven-dependency-plugin generate-sources default-cli build-classpath
maven-remote-resources-plugin generate-resources default process
maven-resources-plugin process-resources default-resources resources
scala-maven-plugin process-resources scala-compile-first compile
maven-compiler-plugin compile default-compile compile
maven-antrun-plugin generate-test-resources create-tmp-dir run
maven-resources-plugin process-test-resources default-testResources testResources
scala-maven-plugin process-test-resources scala-test-compile-first testCompile
maven-compiler-plugin test-compile default-testCompile testCompile
maven-dependency-plugin test-compile generate-test-classpath build-classpath
maven-surefire-plugin test default-test test
maven-surefire-plugin test test test
scalatest-maven-plugin test test test
maven-jar-plugin prepare-package prepare-test-jar test-jar
maven-jar-plugin package default-jar jar
maven-site-plugin package attach-descriptor attach-descriptor
maven-shade-plugin package default shade
maven-source-plugin package create-source-jar jar-no-fork
maven-source-plugin package create-source-jar test-jar-no-fork
maven-dependency-plugin pre-integration-test copy-test-spark-jobs copy
maven-dependency-plugin pre-integration-test unpack-docker-driver-bundle unpack
maven-dependency-plugin pre-integration-test unpack-docker-executor-bundle unpack
download-maven-plugin pre-integration-test download-minikube-linux wget
download-maven-plugin pre-integration-test download-minikube-darwin wget
scala-maven-plugin verify attach-scaladocs doc-jar
scalastyle-maven-plugin verify default check
maven-checkstyle-plugin verify default check
maven-install-plugin install default-install install
maven-deploy-plugin deploy default-deploy deploy

Here is the build plan after this change. Notice there is one more run of scalatest-maven-plugin, triggered in the integration-test phase, which comes after copy-test-spark-jobs. With this change, that's where KubernetesSuite will run:

PLUGIN PHASE ID GOAL
maven-enforcer-plugin validate enforce-versions enforce
scala-maven-plugin initialize eclipse-add-source add-source
maven-dependency-plugin generate-sources default-cli build-classpath
maven-remote-resources-plugin generate-resources default process
maven-resources-plugin process-resources default-resources resources
scala-maven-plugin process-resources scala-compile-first compile
maven-compiler-plugin compile default-compile compile
maven-antrun-plugin generate-test-resources create-tmp-dir run
maven-resources-plugin process-test-resources default-testResources testResources
scala-maven-plugin process-test-resources scala-test-compile-first testCompile
maven-compiler-plugin test-compile default-testCompile testCompile
maven-dependency-plugin test-compile generate-test-classpath build-classpath
maven-surefire-plugin test default-test test
maven-surefire-plugin test test test
scalatest-maven-plugin test test test
maven-jar-plugin prepare-package prepare-test-jar test-jar
maven-jar-plugin package default-jar jar
maven-site-plugin package attach-descriptor attach-descriptor
maven-shade-plugin package default shade
maven-source-plugin package create-source-jar jar-no-fork
maven-source-plugin package create-source-jar test-jar-no-fork
maven-dependency-plugin pre-integration-test copy-test-spark-jobs copy
maven-dependency-plugin pre-integration-test unpack-docker-driver-bundle unpack
maven-dependency-plugin pre-integration-test unpack-docker-executor-bundle unpack
download-maven-plugin pre-integration-test download-minikube-linux wget
download-maven-plugin pre-integration-test download-minikube-darwin wget
scalatest-maven-plugin integration-test integration-test test
scala-maven-plugin verify attach-scaladocs doc-jar
scalastyle-maven-plugin verify default check
maven-checkstyle-plugin verify default check
maven-install-plugin install default-install install
maven-deploy-plugin deploy default-deploy deploy

See copy-test-spark-jobs execution of maven-dependency-plugin above. -->
<groupId>org.scalatest</groupId>
<artifactId>scalatest-maven-plugin</artifactId>
<configuration>...</configuration>

What do the three dots stand for?

Member Author


Ah. It turned out they aren't needed. I got this piece from a scalatest user forum, and the three dots were there just to indicate omitted text.

I am surprised that the presence of three dots in the config did not break Maven. Thanks for pushing me to look at this. I'll remove them in the next patch.

Member Author


Removed this in the new patch.

<goal>test</goal>
</goals>
<configuration>
<suffixes>(?<!Suite)</suffixes>

I guess the purpose of this negative pattern is to prevent the KubernetesSuite from being run in the test phase. Better to add a comment explaining this?
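The lookbehind mechanics can be illustrated outside Maven. Below is a rough Python sketch (not the plugin's actual matching code), assuming the suffixes pattern is anchored at the end of each discovered class name:

```python
import re

# The scalatest-maven-plugin "suffixes" setting is a regex matched against
# the end of each discovered class name. These two zero-width lookbehinds
# split the suites between the two executions:
#   (?<!Suite) matches only where the preceding text is NOT "Suite"
#   (?<=Suite) matches only where the preceding text IS "Suite"
UNIT_PHASE = re.compile(r"(?<!Suite)$")         # test phase: everything but *Suite
INTEGRATION_PHASE = re.compile(r"(?<=Suite)$")  # integration-test phase: only *Suite

def runs_in(pattern, class_name):
    """True if the class name's suffix matches the given pattern."""
    return pattern.search(class_name) is not None

print(runs_in(UNIT_PHASE, "KubernetesSuite"))         # False: deferred
print(runs_in(INTEGRATION_PHASE, "KubernetesSuite"))  # True: runs here
```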

Member Author


Thanks for the suggestion. Added a comment in the new patch.

<goal>test</goal>
</goals>
<configuration>
<suffixes>(?<=Suite)</suffixes>

Do we need to set this explicitly? I don't find it in the scalatest-maven-plugin section of the top-level pom.xml.

Member Author


Removed this in the new patch.

@lins05

lins05 commented Feb 11, 2017

@kimoonkim good to know the buildplan plugin, very nice tool!

@ash211

ash211 commented Feb 13, 2017

This does seem to make the tests run successfully now, though being newer to Maven I'm not fully following why.

@mccheah can you please take a look?

@kimoonkim
Member Author

@lins05 Thanks for taking a look. Addressed your comments in the new patch.

@kimoonkim
Member Author

@ash211 @mccheah Thanks for taking a look. I am relatively new to Maven myself, and I must say Maven is quite complicated.

Let me try to explain what I think is happening before this change. Suppose we issue a command like below (which will fail):

$ build/mvn -B clean integration-test -Pkubernetes -Pkubernetes-integration-tests  \
   -pl resource-managers/kubernetes/integration-tests -am -Dtest=none  \
   -DwildcardSuites=org.apache.spark.deploy.kubernetes.integrationtest.KubernetesSuite

Maven does three high level things:

  1. It activates the specified profiles, namely kubernetes and kubernetes-integration-tests. This enables the kubernetes maven modules such as resource-managers/kubernetes/core, resource-managers/kubernetes/integration-tests, resource-managers/kubernetes/integration-tests-spark-jobs, etc.
  2. Then, maven sorts and builds these modules in dependency order, where resource-managers/kubernetes/integration-tests-spark-jobs is built before resource-managers/kubernetes/integration-tests. The latter needs the test jobs jar from the former. Note maven builds one module at a time, so integration-tests-spark-jobs completes before integration-tests starts. This is why the test jobs jar is available when integration-tests starts.
  3. For each module, maven goes through build phases in a pre-defined order.

For (3), you can find the full list on the linked page, but here are some that are relevant to us:

compile	compile the source code of the project.
test-compile	compile the test source code into the test destination directory
test	run tests using a suitable unit testing framework. These tests should not require the code be packaged or deployed.
prepare-package	perform any operations necessary to prepare a package prior to the actual packaging. This often results in an unpacked, processed version of the package. (Maven 2.1 and above)
package	take the compiled code and package it in its distributable format, such as a JAR.
pre-integration-test	perform actions required before integration tests are executed. This may involve things such as setting up the required environment.
integration-test	process and deploy the package if necessary into an environment where integration tests can be run.

Notice the test -> pre-integration-test -> integration-test ordering.

Now, let's see what the resource-managers/kubernetes/integration-tests module's pom.xml does. The pom.xml specifies a few plugins to download and copy a number of tarballs and jars. Here's one example. Notice it specifies the pre-integration-test phase:

     <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-dependency-plugin</artifactId>
        <executions>
          <execution>
            <id>copy-test-spark-jobs</id>
            <phase>pre-integration-test</phase>
            <goals>
              <goal>copy</goal>
            </goals>
            <configuration>
              <artifactItems>
                <artifactItem>
                  <groupId>org.apache.spark</groupId>
                  <artifactId>spark-kubernetes-integration-tests-spark-jobs_${scala.binary.version}</artifactId>
                  <version>${project.version}</version>
                  <type>jar</type>
                  <outputDirectory>${project.build.directory}/integration-tests-spark-jobs</outputDirectory>
                </artifactItem>

As shown above, those tarballs and jars will be copied or unpacked into ${project.build.directory}, which is the resource-managers/kubernetes/integration-tests/target dir.

These are required inputs to KubernetesSuite. The test jobs jar is referenced by the KubernetesSuite code at lines 48-51 below:

 46 private[spark] class KubernetesSuite extends SparkFunSuite with BeforeAndAfter {
 47
 48   private val EXAMPLES_JAR = Paths.get("target", "integration-tests-spark-jobs")
 49     .toFile
 50     .listFiles()(0)
 51     .getAbsolutePath

The problem is that the scalatest plugin, which executes KubernetesSuite, triggers only in the test phase by default. (The pom.xml of resource-managers/kubernetes/integration-tests currently does not specify anything about the scalatest plugin; the setting is inherited from the top-level project pom.xml.)

This plugin ordering is displayed well by the build plan plugin in a previous comment. Copying the relevant part here again:

scalatest-maven-plugin	test	test	test
...
maven-dependency-plugin	pre-integration-test	copy-test-spark-jobs	copy
maven-dependency-plugin	pre-integration-test	unpack-docker-driver-bundle	unpack
maven-dependency-plugin	pre-integration-test	unpack-docker-executor-bundle	unpack
download-maven-plugin	pre-integration-test	download-minikube-linux	wget
download-maven-plugin	pre-integration-test	download-minikube-darwin	wget

So the above KubernetesSuite code will find the target directory missing, leading to the following exception: listFiles() on line 50 returns null for a nonexistent directory, and indexing the null result throws an NPE:

  java.lang.RuntimeException: Unable to load a Suite class that was discovered in the runpath: org.apache.spark.deploy.kubernetes.integrationtest.KubernetesSuite
  at org.scalatest.tools.DiscoverySuite$.getSuiteInstance(DiscoverySuite.scala:84)
  at org.scalatest.tools.DiscoverySuite$$anonfun$1.apply(DiscoverySuite.scala:38)
  at org.scalatest.tools.DiscoverySuite$$anonfun$1.apply(DiscoverySuite.scala:37)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.Iterator$class.foreach(Iterator.scala:893)
  at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
  at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
  at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
  ...
  Cause: java.lang.NullPointerException:
  at org.apache.spark.deploy.kubernetes.integrationtest.KubernetesSuite.<init>(KubernetesSuite.scala:50)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
  at java.lang.Class.newInstance(Class.java:442)
  at org.scalatest.tools.DiscoverySuite$.getSuiteInstance(DiscoverySuite.scala:69)
  at org.scalatest.tools.DiscoverySuite$$anonfun$1.apply(DiscoverySuite.scala:38)
  at org.scalatest.tools.DiscoverySuite$$anonfun$1.apply(DiscoverySuite.scala:37)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  ...

Now, some of us did not encounter this exception before. How is that possible? Here's one sequence of commands that will hide the problem (I tried this sequence on my local checkout, and it does hide the problem).

  1. Issue a maven build command targeting pre-integration-test while specifying -DskipTests. This will allow the test jobs jar to be copied into place while skipping KubernetesSuite.
    $ build/mvn clean pre-integration-test -DskipTests -Pkubernetes -Pkubernetes-integration-tests -pl resource-managers/kubernetes/integration-tests -am
  2. Then, issue another maven command specifying the integration-test target, but without the clean goal. This time, maven will run KubernetesSuite (again in the test phase), but the exception won't happen because the test jobs jar was pre-populated by the command above.
    $ build/mvn -B integration-test -Pkubernetes -Pkubernetes-integration-tests -pl resource-managers/kubernetes/integration-tests -am -Dtest=none -DwildcardSuites=org.apache.spark.deploy.kubernetes.integrationtest.KubernetesSuite

One more related command that can add to the confusion is running mvn clean between (1) and (2) without specifying the two kubernetes profiles, i.e. $ build/mvn clean. This disables the kubernetes maven modules and thus won't wipe out target dirs like resource-managers/kubernetes/integration-tests/target. So doing (2) after cleaning this way will still hide the problem.

@mccheah

mccheah commented Feb 13, 2017

+1 this makes sense - the Maven pom was originally created under the assumption that the build process would execute both the test phase and the integration-test phase, but it looks like they just invoke the Scalatest plugin and that doesn't trigger integration-test by default. Should we not just change our tests to target the test phase and not the integration-test phase? Thus instead of putting the build preparation steps in the pre-integration-test phase, is there some equivalent like a pre-test phase?

@kimoonkim
Member Author

@mccheah Yes, targeting the test phase is one possible solution. There are phases like generate-test-resources or process-test-resources that run before the test phase. So we could use them.

From the same maven phase list web page above:

generate-test-resources	create resources for testing.
process-test-resources	copy and process the resources into the test destination directory.
test-compile	compile the test source code into the test destination directory
process-test-classes	post-process the generated files from test compilation, for example to do bytecode enhancement on Java classes. For Maven 2.0.5 and above.
test	run tests using a suitable unit testing framework. These tests should not require the code be packaged or deployed.
...
package	take the compiled code and package it in its distributable format, such as a JAR.
pre-integration-test	perform actions required before integration tests are executed. This may involve things such as setting up the required environment.
integration-test	process and deploy the package if necessary into an environment where integration tests can be run.

The downside is that there is a subtle usage issue with this approach. If a user issues a maven command with the test target, then the other modules like resource-managers/kubernetes/integration-tests-spark-jobs might not produce the jars that resource-managers/kubernetes/integration-tests needs. Imagine the following command:

$ build/mvn -B clean test -Pkubernetes -Pkubernetes-integration-tests  \
   -pl resource-managers/kubernetes/integration-tests -am -Dtest=none  \
   -DwildcardSuites=org.apache.spark.deploy.kubernetes.integrationtest.KubernetesSuite

With this command, the resource-managers/kubernetes/integration-tests-spark-jobs module only runs until the test phase. The test jobs jar is produced at the package phase that comes after the test phase.

So when the resource-managers/kubernetes/integration-tests module starts, the test jobs jar will be missing. The copy-test-spark-jobs execution (now, say, at the process-test-resources phase) will fail as a result.

The failure can be avoided if one issues a maven command specifying the package target or any subsequent one, like below. But this is a bit counter-intuitive. People may try the above command first and do unnecessary troubleshooting before they reach here:

$ build/mvn -B clean package -Pkubernetes -Pkubernetes-integration-tests  \
   -pl resource-managers/kubernetes/integration-tests -am -Dtest=none  \
   -DwildcardSuites=org.apache.spark.deploy.kubernetes.integrationtest.KubernetesSuite

Please let me know what you guys think.

@mccheah

mccheah commented Feb 13, 2017

I thought copy-test-spark-jobs depends on the artifact from the integration-tests-spark-jobs module in a way such that it requires the equivalent phase of integration-tests-spark-jobs to be executed first?

Alternatively, we can try to take a compile-time dependency on integration-tests-spark-jobs and depend on that jar in a way that doesn't require a separate copy. For example, is it the case that in a multi-module build, if module B depends on module A, then does a.jar exist somewhere in module B's subtree in a location that we could easily reference from the integration test code? Maybe under target/?

@kimoonkim
Member Author

@mccheah How exactly the Maven reactor handles multi-module dependencies is a bit mysterious to me. I found a blog post saying the following:

As part of all the refactoring in Maven 3, a dependency resolution has been reworked to consistently check the reactor output. Apparently, the reactor output depends on the lifecycle phases that a project has completed. So if you invoke mvn compile or mvn test on a multi-module project, the loose class files from target/classes and target/test-classes, respectively, are used to create the required class path. As soon as the actual artifact has been assembled which usually happens during the package phase, dependency resolution will use this file.

I think we still need to run mvn package to use the jar. Can we directly use the target/classes of integration-tests-spark-jobs? Probably not, given spark-submit needs the jar?

@mccheah

mccheah commented Feb 14, 2017

Right - the test expects to ship the jar over as the application dependency of the tests.

@lins05

lins05 commented Feb 14, 2017

I also wanted to propose moving the copy-test-spark-jobs action to a phase like generate-test-resources that happens before test, which could effectively solve the problem this PR tries to address. But it seems pretty hard to get the spark-integration-test-jobs jar without running the package phase.

What if, in the spark-integration-test-jobs pom, we attach the jar:jar goal to the generate-test-resources phase?
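For illustration, that suggestion would look roughly like the following in the spark-integration-test-jobs pom. This is a hypothetical, untested sketch (the early-jar execution id is made up), and it is not clear the reactor would pick up a jar produced this early:

```xml
<!-- Hypothetical sketch of the idea above: run maven-jar-plugin's jar goal
     early, during generate-test-resources, so the test jobs jar exists
     before downstream modules reach their test phase. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-jar-plugin</artifactId>
  <executions>
    <execution>
      <id>early-jar</id>
      <phase>generate-test-resources</phase>
      <goals>
        <goal>jar</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```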

@kimoonkim
Member Author

@lins05 Creating jars at the generate-test-resources phase sounds promising. The only concern is whether the Maven reactor will recognize jars produced in non-package phases. I think we can try and see. Also note this approach will lead to a slightly larger change, since we are going to touch other modules. @mccheah, what do you think of this suggestion? If you agree, I can write a new sister PR so we can compare it with this one.

Also, what do we think is the upside of using the test phase? I like that the overall build time can be shorter. Anything else? I am just curious.

@lins05

lins05 commented Feb 14, 2017

Also note this approach will lead to a slightly large change since we are going to touch other modules

Emm, what other modules? IIUC the affected modules would only be integration-tests, integration-tests-spark-jobs, and integration-tests-spark-jobs-helpers.

@kimoonkim
Member Author

Those are what I meant. Should we add docker-minimal-bundle there too?

@lins05

lins05 commented Feb 14, 2017

I see, then that's ok.

docker-minimal-bundle there too?

Right.

@kimoonkim
Member Author

kimoonkim commented Feb 14, 2017

@lins05 @mccheah
I think docker-minimal-bundle may be a show-stopper. The tarballs from that module contain jars from all other spark modules (and their dependencies). Building these spark module jars requires mvn package.

$ tar tvfz docker-minimal-bundle/target/spark-docker-minimal-bundle_2.11-2.2.0-SNAPSHOT-driver-docker-dist.tar.gz | grep spark-.*.jar | head
-rw-r--r-- kimoonkim/staff 12015040 2017-02-13 09:30 jars/spark-core_2.11-2.2.0-SNAPSHOT.jar
-rw-r--r-- kimoonkim/staff 66585 2017-02-13 09:27 jars/spark-launcher_2.11-2.2.0-SNAPSHOT.jar
-rw-r--r-- kimoonkim/staff 15490 2017-02-13 09:26 jars/spark-tags_2.11-2.2.0-SNAPSHOT.jar
-rw-r--r-- kimoonkim/staff 2367947 2017-02-13 09:26 jars/spark-network-common_2.11-2.2.0-SNAPSHOT.jar
-rw-r--r-- kimoonkim/staff 61878 2017-02-13 09:26 jars/spark-network-shuffle_2.11-2.2.0-SNAPSHOT.jar
-rw-r--r-- kimoonkim/staff 44552 2017-02-13 09:26 jars/spark-unsafe_2.11-2.2.0-SNAPSHOT.jar
-rw-r--r-- kimoonkim/staff 6195220 2017-02-13 09:39 jars/spark-mllib_2.11-2.2.0-SNAPSHOT.jar
-rw-r--r-- kimoonkim/staff 2175666 2017-02-13 09:31 jars/spark-streaming_2.11-2.2.0-SNAPSHOT.jar
-rw-r--r-- kimoonkim/staff 6847170 2017-02-13 09:37 jars/spark-sql_2.11-2.2.0-SNAPSHOT.jar
-rw-r--r-- kimoonkim/staff 30029 2017-02-13 09:26 jars/spark-sketch_2.11-2.2.0-SNAPSHOT.jar

$ tar tvfz docker-minimal-bundle/target/spark-docker-minimal-bundle_2.11-2.2.0-SNAPSHOT-driver-docker-dist.tar.gz | grep .jar | wc -l
163

@kimoonkim
Member Author

@lins05 I looked at docker-minimal-bundle. It uses maven-assembly-plugin to put all spark jars in the tarballs. The driver-assembly.xml puts in all the jars it depends on, except a few:

<dependencySets>
    <dependencySet>
      <outputDirectory>jars</outputDirectory>
      <useTransitiveDependencies>true</useTransitiveDependencies>
      <unpack>false</unpack>
      <scope>runtime</scope>
      <useProjectArtifact>false</useProjectArtifact>
      <excludes>
        <exclude>org.apache.spark:spark-assembly_${scala.binary.version}:pom</exclude>
        <exclude>org.spark-project.spark:unused</exclude>
        <exclude>org.apache.spark:spark-examples_${scala.binary.version}</exclude>
      </excludes>
    </dependencySet>

And the pom.xml specifies spark-assembly as its main dependency, from which all other spark module jars are pulled:

   <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-assembly_${scala.binary.version}</artifactId>
      <version>${project.version}</version>
      <type>pom</type>
    </dependency>

I can't imagine building all these jars in a non-package phase. That's just too many pom.xml's to touch.

If we only make docker-minimal-bundle run maven-assembly-plugin in the generate-test-resources phase, I think it'll try to put target/classes of all other spark modules into the tarball. I don't know if that will even succeed, but even if it does, it doesn't sound like what we want to have inside a docker image for testing.

I am afraid this does look like a show-stopper. Thoughts?

@mccheah

mccheah commented Feb 14, 2017

Hm, yeah, I suppose our findings show that the integration-test phase is the right one to use: resource preparation should be in pre-integration-test, and Scalatest should just be invoked in our integration-test phase.

@kimoonkim
Member Author

@mccheah SGTM. Then this PR is ready for merge?

@kimoonkim
Member Author

@lins05 @mccheah Thanks for the discussion so far. Do we have any more questions or feedback?

Given our findings so far, I believe this PR is useful as is. FYI, the patch was updated earlier to address @lins05's comments. Can you give it one more look?

@ash211

ash211 commented Feb 16, 2017

@kimoonkim an enormous thank you for all your work on this PR! Clearly you've put a lot of effort and research into getting this right.

I can't say I'm familiar enough with Maven to say this is right, but whether it's perfect or not, it's certainly a step in the right direction. Let's merge and move closer to running integration tests in Travis (one of the goals coming out of this week's weekly meeting).

Thanks again for the well-researched contribution!

@ash211 ash211 merged commit 9d250a2 into apache-spark-on-k8s:k8s-support-alternate-incremental Feb 16, 2017
@kimoonkim
Member Author

@ash211 I am not a big fan of Maven either, but it was a great learning experience for me :-) Thank you, @mccheah and @lins05, for asking the right questions and discussing this together.

@kimoonkim kimoonkim deleted the run-scalatest-on-integration-test-phase branch February 17, 2017 23:22
ash211 pushed a commit that referenced this pull request Mar 8, 2017
* Trigger scalatest plugin in the integration-test phase

* Clean up unnecessary config section
foxish pushed a commit that referenced this pull request Jul 24, 2017
* Trigger scalatest plugin in the integration-test phase

* Clean up unnecessary config section
ifilonenko pushed a commit to ifilonenko/spark that referenced this pull request Feb 25, 2019
…on-k8s#93)

* Trigger scalatest plugin in the integration-test phase

* Clean up unnecessary config section
puneetloya pushed a commit to puneetloya/spark that referenced this pull request Mar 11, 2019
…on-k8s#93)

* Trigger scalatest plugin in the integration-test phase

* Clean up unnecessary config section
ifilonenko pushed a commit to ifilonenko/spark that referenced this pull request Dec 4, 2019