Parallelize Java precommit integration tests #5731
R: @boyuanzz, @pabloem
examples/java/build.gradle
Outdated
// apexRunner - https://issues.apache.org/jira/browse/BEAM-3583
def preCommitRunners = ["directRunner", "flinkRunner", "sparkRunner"]
// Dataflow runner based tests are defined in their own subprojects so that
// they can run in parallel because they are slow.
$0.02: I love having them in separate projects anyhow. No slapping together strings to come up with magic strings.
apply plugin: org.apache.beam.gradle.BeamModulePlugin
applyJavaNature(failOnWarning: true, publish: false)
// Evaluate the given project before this one, to allow referencing
// "sourceSets.test.output" directly.
Thank you for this comment.
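For readers unfamiliar with the pattern that code comment describes, here is a minimal sketch of forcing evaluation order so that another project's test output can be referenced directly. The project path `:beam-examples-java`, the task body, and the classpath wiring are assumptions for illustration, not lines quoted from this PR.

```groovy
// build.gradle of a project that reuses tests compiled elsewhere (sketch).
apply plugin: 'java'

// ASSUMPTION: ":beam-examples-java" is the project that owns the tests.
def examplesProject = ":beam-examples-java"

// Configure the examples project first; otherwise its sourceSets may not
// exist yet when this script reads them.
evaluationDependsOn(examplesProject)

dependencies {
  // Put the compiled test classes and resources of the examples project on
  // this project's test runtime classpath.
  testRuntimeOnly project(examplesProject).sourceSets.test.output
}

task preCommit(type: Test) {
  // Run the test classes compiled in the examples project, not local ones.
  testClassesDirs = files(project(examplesProject).sourceSets.test.output.classesDirs)
  classpath = configurations.testRuntimeClasspath
}
```

Without the `evaluationDependsOn` call, whether `sourceSets` is available on the other project depends on the order in which Gradle happens to configure the projects.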
settings.gradle
Outdated
include "beam-runners-google-cloud-dataflow-java"
project(":beam-runners-google-cloud-dataflow-java").dir = file("runners/google-cloud-dataflow-java")
include "beam-runners-google-cloud-dataflow-java-examples"
project(":beam-runners-google-cloud-dataflow-java-examples").dir = file("runners/google-cloud-dataflow-java/examples")
FWIW, these long names are just a Maven hack, as I understand it. Mapping them here saves us ~2 lines in the publishing config, where we'd otherwise have to set the artifactId. So for things that aren't published, the best name is no name: Gradle automatically treats a directory foo/baz/bizzle as the project foo:baz:bizzle.
Done. Removed special naming.
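As a rough illustration of the two naming styles discussed in this thread, the sketch below contrasts a mapped, dashed name with a plain path-based include. The names `my-published-lib`, `libs/published`, and `foo/baz/bizzle` are hypothetical, and the sketch uses the documented `projectDir` property of the project descriptor.

```groovy
// settings.gradle (sketch; all project names and directories are hypothetical)

// Published module: a flat, dashed project name mapped onto a nested
// directory, so the project name can double as the Maven artifactId.
include "my-published-lib"
project(":my-published-lib").projectDir = file("libs/published")

// Unpublished module: no mapping needed. Gradle resolves the project path
// :foo:baz:bizzle to the directory foo/baz/bizzle under the root project.
include ":foo:baz:bizzle"
```

With the path-based form, tasks are addressed through the same path, for example `:foo:baz:bizzle:preCommit`.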
  testRuntimeOnly project(path: ":beam-runners-google-cloud-dataflow-java", configuration: "shadow")
}

task preCommit(type: Test) {
Gonna need bits like this:
def gcsProject = project.findProperty('gcsProject') ?: 'apache-beam-testing'
def gcsTempRoot = project.findProperty('gcsTempRoot') ?: 'gs://temp-storage-for-end-to-end-tests/'
It looks as though this whole block is actually pretty generic, so you could move it into the plugin and just specify the tests to include as a param.
(I'm totally happy to merge it with duplication to speed things up, and then refactor later)
Seems like gcsProject is a misspelling, and should be gcpProject.
I'm not sure if adding gcpProject and gcpTempRoot everywhere is a good idea. Will these properties appear in a usage help message, such as --help? If so, it might confuse users into thinking that these options do something.
So I'm duplicating the two in the Dataflow runner projects.
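A minimal sketch of what such a duplicated block could look like in one of the Dataflow runner subprojects, using the corrected `gcpProject` spelling. The option names passed to the tests, the `beamTestPipelineOptions` system property, and the `include` pattern are assumptions for illustration rather than lines taken from this PR.

```groovy
import groovy.json.JsonOutput

apply plugin: 'java'

// Overridable from the command line with -PgcpProject=... and
// -PgcsTempRoot=...; the defaults are the values quoted in the review.
def gcpProject = project.findProperty('gcpProject') ?: 'apache-beam-testing'
def gcsTempRoot = project.findProperty('gcsTempRoot') ?: 'gs://temp-storage-for-end-to-end-tests/'

task preCommit(type: Test) {
  // ASSUMPTION: pipeline options reach the tests as a JSON list stored in
  // the "beamTestPipelineOptions" system property.
  def pipelineOptions = [
      "--project=${gcpProject}",
      "--tempRoot=${gcsTempRoot}",
      "--runner=TestDataflowRunner",
  ]
  systemProperty "beamTestPipelineOptions", JsonOutput.toJson(pipelineOptions)

  // ASSUMPTION: the integration tests to run; adjust the pattern as needed.
  include "**/WordCountIT.class"
}
```

If the block were later moved into the shared plugin, as suggested above, the `include` patterns would be the natural parameter to leave to each subproject.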
(force-pushed from 4064017 to b1d54d2)
@kennknowles note that I removed WindowedWordCountIT from examples-streaming, since it runs the same tests (!) regardless of the --streaming setting. I would similarly like to modify WordCountIT to have batch and streaming test cases and remove the examples-streaming precommit altogether.
retest this please
settings.gradle
Outdated
project(":beam-runners-google-cloud-dataflow-java").dir = file("runners/google-cloud-dataflow-java")
// These 2 projects will not be published to Maven so they don't get a special
// dashed name.
include ":runners:google-cloud-dataflow-java:examples:preCommit"
I don't think you intended to add the ":preCommit" on the end.
The reason that doesn't work is that not all runners have distinct streaming and non-streaming modes. It wouldn't make sense to run the same test twice on e.g. the DirectRunner, Gearpump, Apex, Samza, Flink*.
*FlinkRunner does have streaming/non-streaming modes, but I don't think it needs them, so they might go away and become automatic.
PTAL, precommit tests should start soon
pabloem left a comment
LGTM : )
I like the build scan
- if (isRelease(project) || project.hasProperty('publishing')) {
+ if ((isRelease(project) || project.hasProperty('publishing')) &&
+     configuration.publish) {
Is this so we'll avoid publishing with languages other than Java?
No, it's just not needed for certain subprojects because they do not produce any artifacts.
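To make the role of the flag concrete, here is a hedged sketch of how a `publish` option could gate the publishing setup. The field defaults, the `isRelease` stand-in, and the use of the `maven-publish` plugin are assumptions for illustration; the actual plugin code may differ.

```groovy
// Sketch only: field defaults and the isRelease helper are assumptions.
class JavaNatureConfiguration {
  boolean failOnWarning = false
  boolean publish = true   // ASSUMPTION: publishing is on unless opted out
}

// Hypothetical stand-in for the plugin's release check.
def isRelease = { proj -> !proj.version.toString().endsWith('-SNAPSHOT') }

def applyJavaNature = { Map args = [:] ->
  // Groovy's map constructor copies named arguments onto matching fields.
  def configuration = new JavaNatureConfiguration(args)

  if ((isRelease(project) || project.hasProperty('publishing')) &&
      configuration.publish) {
    project.apply plugin: 'maven-publish'
    // ... artifact and repository setup would go here ...
  }
}

// A test-only subproject that produces no artifacts opts out:
applyJavaNature(failOnWarning: true, publish: false)
```

Keeping the flag on the configuration object means each subproject states its intent at the call site, and the release check stays in one place.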
examples/java/build.gradle
Outdated
 * Some runners are run from separate projects, see the preCommit task below
 * for details.
 */
// apexRunner - https://issues.apache.org/jira/browse/BEAM-3583
Maybe add a TODO so it's easily discoverable?
Splits out slow Dataflow precommit tasks into separate Gradle subprojects. Adds a 'publish' flag to JavaNatureConfiguration that controls Maven publishing.
Ready to merge.
Yea, nice.