[BEAM-1582, BEAM-1562] Stop streaming tests on EOT Watermark. #2168

amitsela · 2017-03-06T12:55:51Z

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

Make sure the PR title is formatted like:
[BEAM-<Jira issue #>] Description of pull request
Make sure tests pass via mvn clean verify. (Even better, enable
Travis-CI on your fork and ensure the whole test matrix passes).
Replace <Jira issue #> in the title with the actual Jira issue
number, if there is one.
If this contribution is large, please file an Apache
Individual Contributor License Agreement.

coveralls · 2017-03-06T13:39:54Z

Coverage remained the same at 70.098% when pulling 3bc74ab on amitsela:stop-streaming-tests into 34b38ef on apache:master.

asfbot · 2017-03-06T13:42:52Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/8141/

Failed Tests: 2

beam_PreCommit_Java_MavenInstall/org.apache.beam:beam-runners-spark: 2


amitsela · 2017-03-06T13:44:09Z

Pushed an update that should avoid multi-context issues.

amitsela · 2017-03-06T14:06:36Z

Run Spark RunnableOnService

coveralls · 2017-03-06T14:24:19Z

Coverage decreased (-0.003%) to 70.095% when pulling 3bc74ab on amitsela:stop-streaming-tests into 34b38ef on apache:master.

asfbot · 2017-03-06T14:31:13Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/8143/

amitsela · 2017-03-06T14:31:52Z

Run Spark RunnableOnService

asfbot · 2017-03-06T15:04:20Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PostCommit_Java_RunnableOnService_Spark/1149/

amitsela · 2017-03-06T15:06:38Z

retest this please

coveralls · 2017-03-06T15:53:32Z

Coverage decreased (-0.003%) to 70.095% when pulling 3bc74ab on amitsela:stop-streaming-tests into 34b38ef on apache:master.

asfbot · 2017-03-06T15:56:55Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/8146/

amitsela · 2017-03-06T16:49:59Z

Run Spark RunnableOnService

asfbot · 2017-03-06T17:20:29Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PostCommit_Java_RunnableOnService_Spark/1150/

amitsela · 2017-03-06T17:21:03Z

retest this please

coveralls · 2017-03-06T18:36:42Z

Coverage decreased (-0.05%) to 70.053% when pulling 3bc74ab on amitsela:stop-streaming-tests into 34b38ef on apache:master.

asfbot · 2017-03-06T18:39:15Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/8148/

amitsela · 2017-03-06T19:19:59Z

Run Spark RunnableOnService

amitsela · 2017-03-06T20:04:22Z

retest this please

asfbot · 2017-03-06T20:21:08Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PostCommit_Java_RunnableOnService_Spark/1154/

amitsela · 2017-03-06T20:55:53Z

Okay, ran 3xROS and 3xMaven_Install. Both run all the streaming tests, so that's 6 runs in total. All green. Do I believe it's totally stable now? Hopeful.

R: @staslev note the changes I made to SparkPipelineResult and breaking the PipelineResult API, which is up for redesign anyway in BEAM-849 and doesn't seem to make complete sense in its current state (the way I'm using it in TestSparkRunner is a good example of how I thought it could be used).
CC: @kennknowles following streaming test issues, this should help stabilize them.
CC: @jkff we chatted over a JIRA about this, so this is using watermarks to force termination of streaming pipelines (or pipelines reading from unbounded sources 😉 ).

coveralls · 2017-03-06T21:01:27Z

Coverage decreased (-0.05%) to 70.049% when pulling 3bc74ab on amitsela:stop-streaming-tests into 34b38ef on apache:master.

asfbot · 2017-03-06T21:04:10Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/8152/

kennknowles

A couple of comments on points of customization that I don't really understand.

Since it is in TestSparkOptions I don't care too much, and deflaking definitely is worth the tradeoff in the near term.

So LGTM to deflake but I am still interested in the answers to my questions, and I suspect the options should either be more heavily documented or JIRAs filed to remove.

kennknowles · 2017-03-07T03:18:45Z

runners/spark/src/main/java/org/apache/beam/runners/spark/TestSparkPipelineOptions.java

@@ -32,4 +36,22 @@
  boolean isForceStreaming();
  void setForceStreaming(boolean forceStreaming);

+  @Description("A hard-coded expected number of assertions for this test pipeline.")
+  @Nullable
+  Integer getExpectedAssertions();


Since we have PAssert.countAssertions(Pipeline) what is this for?

Some of the tests for the Spark runner test recovery/resume from a checkpoint. So while "countAssertions" might expect 1 assertion, it is fair that this assertion would only happen after recovery: the first execution of the pipeline has 0 assertions, and the following one, which resumes from the checkpoint, has the actual expected 1 assertion. A manual override seemed to fit here.
Usually I'd expect runners not to test the underlying engine's features (such as recovery from checkpoint), but since Spark doesn't provide things like recovering metric values and the runner implements those we have to test them.
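
To make the override described above concrete, here is a minimal sketch, not the code from this PR. The class and method names are hypothetical; `override` stands in for the nullable value returned by getExpectedAssertions(), and `counted` stands in for whatever PAssert.countAssertions(Pipeline) would report.

```java
public class ExpectedAssertions {

    /**
     * Picks the assertion count a single execution is checked against.
     * A null override means "trust the counted value".
     * Both parameters are illustrative stand-ins, not Beam API calls.
     */
    public static int resolve(Integer override, int counted) {
        return override != null ? override : counted;
    }

    public static void main(String[] args) {
        int counted = 1; // the pipeline declares one assertion

        // First execution: stopped "mid-life", so the assertion never fires.
        System.out.println(resolve(0, counted));

        // Second execution: resumes from the checkpoint, no override needed.
        System.out.println(resolve(null, counted));
    }
}
```

With this shape, only the test that deliberately stops early has to set the option; every other test keeps the counted default.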

kennknowles · 2017-03-07T03:19:24Z

runners/spark/src/main/java/org/apache/beam/runners/spark/TestSparkPipelineOptions.java

+  Integer getExpectedAssertions();
+  void setExpectedAssertions(Integer expectedAssertions);
+
+  @Description("A customizable EOT watermark in Millis.")


What is this for? If it stays, I'd spell out "end of time" in the docstring.

Again, relating to my previous comment, I'd like to be able to terminate the pipeline on the "end-of-time" watermark. But for the first execution I'd like to do it "mid-life" to simulate a failure/stop, so the watermark is not at infinity yet.

I really don't know why I decided to save words in documentation 🙃, I'll fix that.

amitsela · 2017-03-07T09:17:28Z

@kennknowles your comments all relate to a test that tests resuming from checkpoint - ResumeFromCheckpointStreamingTest.
This test reads a bunch of elements from Kafka, keeps the state in state internals implementation (checkpointed), and recovers to read more and fire in the second execution (with watermark hitting infinity).
To orchestrate this, while being able to terminate every run once its "complete", and eventually assert across multiple executions, I had to put in hooks to manually override the watermark that terminates pipeline execution and the expected number of assertions.

asfbot · 2017-03-07T10:19:53Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/8166/

Build result: FAILURE

[...truncated 953.97 KB...]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.maven.plugin.compiler.CompilationFailureException: Compilation failure
/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Java_MavenInstall/runners/spark/src/test/java/org/apache/beam/runners/spark/translation/streaming/ResumeFromCheckpointStreamingTest.java:[246,14] cannot find symbol
  symbol: method setEndOfTimeWatermark(long)
  location: variable options of type org.apache.beam.runners.spark.TestSparkPipelineOptions
    at org.apache.maven.plugin.compiler.AbstractCompilerMojo.execute(AbstractCompilerMojo.java:1029)
    at org.apache.maven.plugin.compiler.TestCompilerMojo.execute(TestCompilerMojo.java:170)
    at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
    ... 31 more
2017-03-07T10:14:31.734 [ERROR]
2017-03-07T10:14:31.734 [ERROR] Re-run Maven using the -X switch to enable full debug logging.
2017-03-07T10:14:31.734 [ERROR]
2017-03-07T10:14:31.734 [ERROR] For more information about the errors and possible solutions, please read the following articles:
2017-03-07T10:14:31.735 [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
2017-03-07T10:14:31.735 [ERROR]
2017-03-07T10:14:31.735 [ERROR] After correcting the problems, you can resume the build with the command
2017-03-07T10:14:31.735 [ERROR] mvn -rf :beam-runners-spark
channel stopped
Setting status of b39783e to FAILURE with url https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/8166/ and message: 'Build finished. '
Using context: Jenkins: Maven clean install

asfbot · 2017-03-07T12:17:47Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/8168/

Build result: FAILURE

[...truncated 953.19 KB...]
    at hudson.remoting.UserRequest.perform(UserRequest.java:153)
    at hudson.remoting.UserRequest.perform(UserRequest.java:50)
    at hudson.remoting.Request$2.run(Request.java:336)
    at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.maven.plugin.MojoFailureException: You have 1 Checkstyle violation.
    at org.apache.maven.plugin.checkstyle.CheckstyleViolationCheckMojo.execute(CheckstyleViolationCheckMojo.java:588)
    at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
    ... 31 more
2017-03-07T12:13:48.085 [ERROR]
2017-03-07T12:13:48.085 [ERROR] Re-run Maven using the -X switch to enable full debug logging.
2017-03-07T12:13:48.085 [ERROR]
2017-03-07T12:13:48.085 [ERROR] For more information about the errors and possible solutions, please read the following articles:
2017-03-07T12:13:48.085 [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
2017-03-07T12:13:48.086 [ERROR]
2017-03-07T12:13:48.086 [ERROR] After correcting the problems, you can resume the build with the command
2017-03-07T12:13:48.086 [ERROR] mvn -rf :beam-runners-spark
channel stopped
Setting status of c3bc49a to FAILURE with url https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/8168/ and message: 'Build finished. '
Using context: Jenkins: Maven clean install

coveralls · 2017-03-07T13:30:09Z

Coverage decreased (-0.003%) to 70.051% when pulling b079349 on amitsela:stop-streaming-tests into 1fd52f5 on apache:master.

asfbot · 2017-03-07T13:35:41Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/8169/

kennknowles · 2017-03-08T02:46:08Z

Just noting this still LGTM, and thanks for indulging my curiosity. Not sure what you want to do as far as squashing (looks like there's at least one fixup commit) so I'm not merging - go ahead when ready.

amitsela · 2017-03-08T06:32:16Z

Waiting for @staslev to review as well regarding SparkPipelineResult. I'll merge once he LGTM's

staslev · 2017-03-09T11:00:45Z

runners/spark/src/main/java/org/apache/beam/runners/spark/stateful/SparkTimerInternals.java

@@ -92,6 +93,13 @@ public static SparkTimerInternals forStreamFromSources(
        slowestLowWatermark, slowestHighWatermark, synchronizedProcessingTime);
  }

+  /** Build a global {@link TimerInternals} for all feeding streams.*/
+  public static SparkTimerInternals forStreamFromSources(


forStreamFromSources's parameters are not sources per se; perhaps it could be renamed to reflect its params better.

Changed this specific method to global, since it really provides a "global" SparkTimerInternals.

staslev · 2017-03-09T11:11:04Z

runners/spark/src/main/java/org/apache/beam/runners/spark/TestSparkRunner.java

+        } while ((timeoutMillis -= batchDurationMillis) > 0
+            && globalWatermark.isBefore(stopPipelineWatermark));
+
+        result.stop();


L127 - L145 could be encapsulated in a neat method so it looks something like:

result = delegate.run(pipeline);
awaitWatermark(testSparkPipelineOptions, result);
result.stop();
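
One possible shape for such a helper, written as a self-contained sketch rather than the code that was merged. It mirrors the loop condition from the diff above, but everything else is an assumption made for illustration: the class name, the parameter list, the use of java.time.Instant in place of the runner's own watermark type, and a Supplier standing in for however the runner reads its global watermark.

```java
import java.time.Instant;
import java.util.function.Supplier;

public class AwaitWatermark {

    /**
     * Polls the global watermark once per batch interval until it reaches
     * stopWatermark or the timeout budget runs out.
     * Returns true if the watermark reached stopWatermark, false on timeout
     * or interruption. All names here are hypothetical.
     */
    public static boolean awaitWatermark(
            Supplier<Instant> globalWatermark,
            Instant stopWatermark,
            long timeoutMillis,
            long batchDurationMillis) {
        Instant current;
        do {
            try {
                // Wait for roughly one micro-batch before looking again.
                Thread.sleep(batchDurationMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
            current = globalWatermark.get();
        } while ((timeoutMillis -= batchDurationMillis) > 0
                && current.isBefore(stopWatermark));
        return !current.isBefore(stopWatermark);
    }

    public static void main(String[] args) {
        // A fake watermark that advances by 10 ms on every read.
        long[] now = {0};
        Supplier<Instant> advancing = () -> Instant.ofEpochMilli(now[0] += 10);

        boolean reached = awaitWatermark(advancing, Instant.ofEpochMilli(30), 1_000, 5);
        System.out.println("reached stop watermark: " + reached);
    }
}
```

Returning a boolean lets the caller distinguish "stopped on watermark" from "stopped on timeout" before calling result.stop(); whether the real runner needs that distinction is not stated in this thread.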

staslev · 2017-03-09T11:27:36Z

runners/spark/src/test/java/org/apache/beam/runners/spark/PipelineRule.java

-
-  private PipelineRule(Duration forcedTimeout) {
-    this.delegate = new SparkStreamingPipelineRule(forcedTimeout, testName);
+  private PipelineRule(SparkPipelineRule delegate) {


Don't PipelineRule.SparkStreamingPipelineRule#after(..) and TestSparkRunner#run() both try to delete the checkpoint? Do we need both?

Well, that's me being "over protective" 😄
PipelineRule might not work for all test scenarios, so just in case. Better to remove an empty directory (or none) rather than risking junk in Jenkins.
Would you agree?

staslev · 2017-03-09T11:39:18Z

...a/org/apache/beam/runners/spark/translation/streaming/ResumeFromCheckpointStreamingTest.java

@@ -102,11 +101,9 @@
  private static final String TOPIC = "kafka_beam_test_topic";

  @Rule
-  public TemporaryFolder tmpFolder = new TemporaryFolder();
+  public final transient ReuseSparkContextRule noContextResue = ReuseSparkContextRule.no();


Typo noContextRe*sue* => noContextReuse

I wonder if ReuseSparkContextRule can be merged into PipelineRule to simplify things and have a single rule that can be used in the context of Spark pipeline tests.

At the moment we have cases with up to 3 different rules in a single test (e.g. CreateStreamTest).

This could be a separate ticket: "enhance PipelineRule".
Should be doable since it uses RuleChain
Could you open a ticket?

staslev · 2017-03-09T11:46:21Z

runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineResult.java

@@ -149,6 +150,9 @@ public MetricResults metrics() {
    @Override
    protected void stop() {
      SparkContextFactory.stopSparkContext(javaSparkContext);
+      if (Objects.equals(state, State.RUNNING)) {


state == State.RUNNING ?
http://stackoverflow.com/questions/34486832/objects-equals-and-object-equals

I'll leave it that way until the PipelineResult API officially stops using null as a valid State
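
For readers following the linked question, a small sketch of how the two comparisons behave when the state is null. The enum below is a hypothetical two-value stand-in, not Beam's PipelineResult.State. As far as plain Java semantics go, both Objects.equals and == tolerate a null left-hand side for enum constants; only calling equals on the possibly-null reference throws.

```java
import java.util.Objects;

public class StateComparison {

    /** Hypothetical stand-in for the real state enum. */
    public enum State { RUNNING, DONE }

    public static boolean viaObjectsEquals(State s) {
        return Objects.equals(s, State.RUNNING); // null-safe
    }

    public static boolean viaIdentity(State s) {
        return s == State.RUNNING; // also null-safe; enum constants are singletons
    }

    public static boolean viaInstanceEquals(State s) {
        return s.equals(State.RUNNING); // throws NullPointerException when s is null
    }

    public static void main(String[] args) {
        System.out.println(viaObjectsEquals(null));
        System.out.println(viaIdentity(null));
        System.out.println(viaObjectsEquals(State.RUNNING));
        System.out.println(viaIdentity(State.RUNNING));
    }
}
```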

staslev · 2017-03-09T11:46:34Z

runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineResult.java

+        throw beamExceptionFrom(e);
+      } finally {
+        SparkContextFactory.stopSparkContext(javaSparkContext);
+        if (Objects.equals(state, State.RUNNING)) {


state == State.RUNNING ?

staslev · 2017-03-09T12:01:03Z

runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineResult.java

+           this.state = State.DONE;
+           break;
+         default:
+           this.state = null;


I think we can avoid mutating state here, and just return the appropriate State back to the caller.
Since SparkPipelineResult#waitUntilFinish() stores the state returned by awaitTermination(...) anyway, the bottom line should be the same, with the advantage that we'll only have one place where state is set.

Also perhaps if the state is neither DONE nor RUNNING, then UNKNOWN should be returned so that users are not taken by surprise getting a null here.
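
A rough sketch of that suggestion, under stated assumptions: the mapping function returns a state instead of assigning a field, and anything unrecognised maps to UNKNOWN rather than null. Both enums and all method names are hypothetical stand-ins; the thread does not show what the real switch statement switches on.

```java
public class TerminationState {

    /** Hypothetical stand-in for the pipeline state enum. */
    public enum State { UNKNOWN, RUNNING, DONE }

    /** Hypothetical stand-in for whatever status the engine reports. */
    public enum EngineStatus { INITIALIZED, ACTIVE, STOPPED }

    private State state = State.RUNNING;

    /** Pure mapping: no field is mutated here, and null is never returned. */
    public static State toState(EngineStatus status) {
        switch (status) {
            case ACTIVE:
                return State.RUNNING;
            case STOPPED:
                return State.DONE;
            default:
                return State.UNKNOWN;
        }
    }

    /** The single place where the field is assigned. */
    public State waitUntilFinish(EngineStatus statusAfterWait) {
        state = toState(statusAfterWait);
        return state;
    }

    public static void main(String[] args) {
        TerminationState result = new TerminationState();
        System.out.println(result.waitUntilFinish(EngineStatus.STOPPED));
        System.out.println(toState(EngineStatus.INITIALIZED));
    }
}
```

Keeping the assignment in one method makes it easier to reason about which call changed the state, which is the advantage the comment above points at.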

amitsela · 2017-03-09T14:46:24Z

@staslev I addressed some of your comments, and responded on others where I thought it's a no-op.
Once this LGTY I will squash and merge, thanks!

staslev · 2017-03-09T15:32:56Z

LGTM, pending the tests to confirm.

coveralls · 2017-03-09T15:49:48Z

Coverage increased (+0.007%) to 70.17% when pulling 157ab10 on amitsela:stop-streaming-tests into 7954896 on apache:master.

asfbot · 2017-03-09T15:54:57Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/8264/

amitsela · 2017-03-09T15:56:45Z

Merging, thanks!

amitsela force-pushed the stop-streaming-tests branch from 4d1222f to 3bc74ab Compare March 6, 2017 13:31

kennknowles reviewed Mar 7, 2017


amitsela force-pushed the stop-streaming-tests branch from b39783e to c3bc49a Compare March 7, 2017 11:22

amitsela force-pushed the stop-streaming-tests branch from c3bc49a to b079349 Compare March 7, 2017 12:27

Sela added 7 commits March 7, 2017 16:32

Test runner to stop on EOT watermark, or timeout.

2e5bc9c

Remove timeout since it is already a pipeline option.

Advance to infinity at the end of pipelines.

6988d21

Add EOT watermark and expected assertions test options.

130f2fc

SparkPipelineResult should avoid returning null, and handle exceptions better.

22bfcdf

Make ResumeFromCheckpointStreamingTest use TestSparkRunner and stop on EOT watermark.

11ed06b

Stop the context and update the state in finally.

1cd899e

Addressed comments - better name for a watermark that stops execution.

a67eabe

staslev requested changes Mar 9, 2017


amitsela added 2 commits March 9, 2017 16:35

fixup! addressed comments

8a348c0

fixup! typo

157ab10

amitsela force-pushed the stop-streaming-tests branch from b079349 to 157ab10 Compare March 9, 2017 14:44

asfgit closed this in efc701e Mar 9, 2017

amitsela deleted the stop-streaming-tests branch March 9, 2017 17:13
