New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BEAM-4059] Reduce number of ValidatesRunner tests and reorganize them for better parallelization #5193
Conversation
Run Dataflow ValidatesRunner |
4b2facc
to
7d9cea8
Compare
Run Dataflow ValidatesRunner |
R: @kennknowles |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extraordinarily helpful. Great. I had a couple things - PAssert and Flatten - that really need to be VR in my opinion. Then Combine is a gray area worth a follow-up discussion or factoring, and Create is similar but less important. And those TestPipeline test methods just need fixing.
@@ -75,6 +76,8 @@ dependencies { | |||
needsRunner project(path: ":beam-sdks-java-core", configuration: "shadowTest") | |||
needsRunner project(path: project.path, configuration: "shadow") | |||
needsRunner project(path: project.path, configuration: "shadowTest") | |||
validatesRunner project(path: ":beam-sdks-java-core", configuration: "shadowTest") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will duplicate them, I think. Did you confirm? Because ValidatesRunner extends NeedsRunner, all the VR tests should be pulled in by the other execution. I think your version is cleaner, so I would agree to break this relationship. We could do NR minus VR for the needsRunner gradle suite, or just solve it with documentation/rename.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a comment; the intention is to have a DirectRunner version of the @ValidatesRunner
suite so that I can run the same set of tests but in a faster in-memory runner. It's not being used for Jenkins.
@@ -672,13 +672,13 @@ public void populateDisplayData(DisplayData.Builder builder) { | |||
} | |||
|
|||
@Test | |||
@Category({ValidatesRunner.class, UsesTestStream.class}) | |||
@Category({NeedsRunner.class, UsesTestStream.class}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to this and all tests that are not one of the core primitives. It is true that complex transforms sometimes exercise primitives in exciting ways, but we can and should run integration tests separately from VR which are basically compliance unit tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a few that seemed like they may be exercising per-runner functionality. I agree they probably shouldn't be necessary, but I'm not ready to go through and validate that we have sufficient integration test coverage before removing.
@@ -173,7 +173,7 @@ public void testSuccessEncodedDecoded() throws IOException { | |||
* serializable. | |||
*/ | |||
@Test | |||
@Category(ValidatesRunner.class) | |||
@Category(NeedsRunner.class) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually think that PAssertTest might be an exception - even though it is a composite transform, it is important to run alongside the VR runner suite to make sure the VR suite is meaningful. At least the particular pipelines that make sure suites are vacuous.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverted.
@@ -208,21 +208,21 @@ public String apply(final String input) { | |||
@Rule | |||
public final transient RuleChain chain = RuleChain.outerRule(exception).around(pipeline); | |||
|
|||
@Category(ValidatesRunner.class) | |||
@Category(NeedsRunner.class) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is also iffy, maybe? But really these tests are also violating the core tenet of tests which is straight-line readability. So I can't actually tell at a glance what kind of test they really are. So go ahead and leave this as you have it, I guess.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
@@ -184,13 +184,13 @@ public void testSimpleCombineWithContext() { | |||
} | |||
|
|||
@Test | |||
@Category(ValidatesRunner.class) | |||
@Category(NeedsRunner.class) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For Combine there's an argument to bring some of them back since this is almost always implemented as a primitive, even though it is not one, technically. I would favor a JIRA TODO for factoring Combine into two suites - one for the composite's various configs but one that is focused on testing a runner's basic primitive implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I've reverted these changes and instead subdivided the tests into subclasses.
@@ -224,7 +224,7 @@ public UnserializableRecord decode( | |||
} | |||
|
|||
@Test | |||
@Category(ValidatesRunner.class) | |||
@Category(NeedsRunner.class) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How many runners are currently overriding Create as a primitive? If they aren't, then done! If they are, then (and this is a new thought that I thought of after my prior comments) another place for the tests to move is into ITs for just that runner, via some organization of modules. This is after-this-PR suggestion. I'm OK with this as-is unless runners are all turning it back into a primitive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack, I'll leave as-is.
@@ -138,7 +138,7 @@ public void testFlattenPCollectionsEmpty() { | |||
} | |||
|
|||
@Test | |||
@Category(ValidatesRunner.class) | |||
@Category(NeedsRunner.class) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Flatten.pCollections()
is a primitive that should be ValidatesRunner.
Flatten.iterables()
is unrelated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack, reverted.
7d9cea8
to
c91697e
Compare
Thanks for the quick review! All feedback is addressed; PTAL @kennknowles |
run java precommit |
It seems the |
c32d444
to
dfbdc02
Compare
dfbdc02
to
5242334
Compare
Run Dataflow ValidatesRunner |
Java precommit now fails on only a single test case, which is unrelated to this change. |
Run Dataflow ValidatesRunner |
@kennknowles please take a look. Still waiting on Dataflow ValidatesRunner results, but if the current state is a clear improvement we should merge it since this test suite is currently executing at around 5 hours. |
82b8456
to
2157a66
Compare
FYI, current Dataflow ValidatesRunner failure is due to the fact that current Gradle is set to max 2 workers. This is being addressed in #5218. |
Run Flink ValidatesRunner |
I think someone told me that the failure which you encountered is on |
This is useful for local validation. It is not yet run via Jenkins.
This is an effort to reduce the overhead of our ValidatesRunner test suite. By manual analysis of the @ValidatesRunner test, I've identified a number of tests which don't seem to validate runner-specific functionality and are better suited as @NeedsRunner tests. As a result, these tests will only run on a single runner (DirectRunner), which executes efficiently in-memory.
2157a66
to
4167138
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@swegner thanks for that !
I only quickly looked at the ValidatesRunner tests that I wrote (you modified none) and the ones that impact my ongoing work (metrics).
} | ||
|
||
@Test | ||
@Category({NeedsRunner.class, UsesAttemptedMetrics.class, UsesCounterMetrics.class}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this one is a ValidatesRunner
because PipelineResults.metrics()
actually uses MetricsContainerStepMap
which is a runner-core componant and which is set by the different runners with their own aggregators to support metrics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For MetricsTest
, I tried to reduce what I felt was redundant validation of the runner-provided pieces. I kept a few @ValidatesRunner
tests exercise different parts of the MetricsContainerStepMap
; in particular:
testAllAttemptedMetrics()
testAllCommittedMetrics()
testAttempted[Counter|Distribution|Gauge]Metrics()
testCommitted[Counter|Distribution|Gauge]Metrics()
This test, testBoundedSourceMetrics()
is validating that metrics from BoundedSources
get added into the container. I don't believe this exercises any new runner-behavior not already covered by the other tests. Which is why I converted it.
Let me know if you disagree. If there is runner-behavior that needs to be validated we shouldn't be shy about keeping it as @ValidatesRunner
. I think it's better to have long test runs than gaps in our validation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bit late for this comment but +1:
indeed the tests you pointed actually test the containers and you left them annotated with ValidatesRunner, so it is ok. You're right to say that the other tests that use PipelineResults.metrics()
test other parts that the metrics aggregation with the runners.
Also I agree to keep validatesRunner tests as short as possible and remove any redundancy; only one validatesRunner test is enough to test the metrics aggregation.
|
||
MetricQueryResults metrics = | ||
pipelineResult | ||
.metrics() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above. Actually, I think this is true for all the tests using PepelineResults.metrics()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment above. If the tests are exercising different behaviors of PipelineResults.metrics()
that are runner-provided functionality, we should keep them all as @ValidatesRunner
. Otherwise, having at least one as @ValidatesRunner
is sufficient and the rest can be @NeedsRunner
Ping @echauchot; take a look at my comments above and let me know what you think. |
Ping @echauchot @kennknowles I believe all feedback has been addressed. Can you let me know if this is ready to merge? |
Well, it seems to have sat for 4 days. So I think we could move forward and then could change those two test cases back if decided. |
Dataflow ValidatesRunner test suite has gotten obscenely slow due to the number of ValidatesRunner tests. There are currently over 250, and each one takes at least 3 minutes to run on Dataflow. Many of them don't actually need to be run on every worker, and this test converts them to
@NeedsRunner
tests instead.Gradle also parallelizes tests differently, parallelizing at the test class level rather than per test case. As a result, the largest test class will be a bottleneck for the overall execution. This PR splits up large
@ValidatesRunner
test classes into scenario-based subclasses.Follow this checklist to help us incorporate your contribution quickly and easily:
[BEAM-XXX] Fixes bug in ApproximateQuantiles
, where you replaceBEAM-XXX
with the appropriate JIRA issue../gradlew build
to make sure basic checks pass. A more thorough check will be performed on your pull request automatically.