Skip to content

Parallelize Java precommit integration tests#5731

Merged
kennknowles merged 1 commit intoapache:masterfrom
udim:java-precommits
Jun 22, 2018
Merged

Parallelize Java precommit integration tests#5731
kennknowles merged 1 commit intoapache:masterfrom
udim:java-precommits

Conversation

@udim
Copy link
Member

@udim udim commented Jun 22, 2018

Splits out slow Dataflow precommit tasks into separate Gradle
subprojects.
Adds a 'publish' flag to JavaNatureConfiguration, that controls Maven
publishing.


Follow this checklist to help us incorporate your contribution quickly and easily:

  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

It will help us expedite review of your Pull Request if you tag someone (e.g. @username) to look at it.

@udim
Copy link
Member Author

udim commented Jun 22, 2018

R: @boyuanzz, @pabloem
CC: @kennknowles

// apexRunner - https://issues.apache.org/jira/browse/BEAM-3583
def preCommitRunners = ["directRunner", "flinkRunner", "sparkRunner"]
// Dataflow runner based tests are defined in their own subprojects so that
// they can run in parallel because they are slow.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$0.02: I love having them in separate projects anyhow. No slapping together strings to come up with magic strings.

apply plugin: org.apache.beam.gradle.BeamModulePlugin
applyJavaNature(failOnWarning: true, publish: false)
// Evaluate the given project before this one, to allow referencing
// "sourceSets.test.output" directly.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this comment.

settings.gradle Outdated
include "beam-runners-google-cloud-dataflow-java"
project(":beam-runners-google-cloud-dataflow-java").dir = file("runners/google-cloud-dataflow-java")
include "beam-runners-google-cloud-dataflow-java-examples"
project(":beam-runners-google-cloud-dataflow-java-examples").dir = file("runners/google-cloud-dataflow-java/examples")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW this long names are just a maven hack as I understand it. Mapping them here saves us ~2 lines in the publishing config where we'd have to set the artifactId. So for things that aren't published, the best name for them is no name and gradle automatically treats a directory foo/baz/bizzle as the project foo:baz:bizzle.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Removed special naming.

testRuntimeOnly project(path: ":beam-runners-google-cloud-dataflow-java", configuration: "shadow")
}

task preCommit(type: Test) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gonna need bits like this:

 def gcsProject = project.findProperty('gcsProject') ?: 'apache-beam-testing'	 def gcsProject = project.findProperty('gcsProject') ?: 'apache-beam-testing'
 def gcsTempRoot = project.findProperty('gcsTempRoot') ?: 'gs://temp-storage-for-end-to-end-tests/'

It looks as though this whole block is actually pretty generic, so you could move it into the plugin and just specify the tests to include as a param.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I'm totally happy to merge it with duplication to speed things up, and then refactor later)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like gcsProject is a misspelling, and should be gcpProject.

I'm not sure if adding gcpProject and gcpTempRoot everywhere is a good idea. Will these properties appear in a usage help message, such as --help? If so, it might confuse users into thinking that these options do something.

So I'm duplicating the two in the Dataflow runner projects.

@udim udim force-pushed the java-precommits branch 2 times, most recently from 4064017 to b1d54d2 Compare June 22, 2018 16:22
@udim
Copy link
Member Author

udim commented Jun 22, 2018

@kennknowles note that I removed WindowedWordCountIT from examples-streaming, since it runs the same tests (!) regardless of --streaming setting. I would similarly like to modify WordCountIT to have batch and streaming test cases and remove examples-streaming precommit altogether.

@udim
Copy link
Member Author

udim commented Jun 22, 2018

retest this please

1 similar comment
@udim
Copy link
Member Author

udim commented Jun 22, 2018

retest this please

settings.gradle Outdated
project(":beam-runners-google-cloud-dataflow-java").dir = file("runners/google-cloud-dataflow-java")
// These 2 projects will not be published to Maven so they don't get a special
// dashed name.
include ":runners:google-cloud-dataflow-java:examples:preCommit"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you intended to add the ":preCommit" on the end.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

@kennknowles
Copy link
Member

The reason that doesn't work is that not all runners have distinct streaming and non-streaming modes. It wouldn't make sense to run the same test twice on e.g. the DirectRunner, Gearpump, Apex, Samza, Flink*.

*FlinkRunner does have streaming/non-streaming modes but I don't think it needs them so they might go away and become automatic.

@udim udim force-pushed the java-precommits branch from b1d54d2 to ec5f06e Compare June 22, 2018 18:36
@udim
Copy link
Member Author

udim commented Jun 22, 2018

PTAL, precommit tests should start soon

@udim udim changed the title Parallelize Java precommits Parallelize Java precommit integration tests Jun 22, 2018
Copy link
Member

@pabloem pabloem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM : )
I like the build scan


if (isRelease(project) || project.hasProperty('publishing')) {
if ((isRelease(project) || project.hasProperty('publishing')) &&
configuration.publish) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this so we'll avoid publishing with languages other than Java?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, its just not needed for certain subprojects because they do not produce any artifacts.

* Some runners are run from separate projects, see the preCommit task below
* for details.
*/
// apexRunner - https://issues.apache.org/jira/browse/BEAM-3583
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a TODO? so it's well discoverable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Splits out slow Dataflow precommit tasks into separate Gradle
subprojects.
Adds a 'publish' flag to JavaNatureConfiguration, that controls Maven
publishing.
@udim udim force-pushed the java-precommits branch from ec5f06e to 60dd45b Compare June 22, 2018 22:23
@udim
Copy link
Member Author

udim commented Jun 22, 2018

Ready to merge.

@kennknowles
Copy link
Member

Yea, nice.

@kennknowles kennknowles merged commit bf1f999 into apache:master Jun 22, 2018
@udim udim deleted the java-precommits branch June 22, 2018 23:10
charlesccychen pushed a commit to charlesccychen/beam that referenced this pull request Jul 26, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants