Deflake progress metrics test with retry. #7455
Conversation
R: @tvalentyn
Force-pushed from 73ea215 to f55f46e
    DEFAULT_SAMPLING_PERIOD_MS = 0

    def retry(attempts):
Consider moving this to third_party/py/apache_beam/testing/util.py. If you do so:
- Consider adding a docstring with sample usage, and clarify that the input param is the total number of attempts, not the number of retries. Perhaps a different name (total_attempts?) would help disambiguate.
- I would add a comment to use this very sparingly, since we may otherwise start sweeping flakes under the rug; these may be legitimate bugs that happen due to rare race conditions.
- Consider adding a test for the decorator to util_test.py (sketched below).
If you think that in general this is a good idea but don't have time to do it, I can do that in a follow-up PR.
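For illustration only, a minimal sketch of what such a decorator test might look like. The import path follows the reviewer's suggestion and is hypothetical (in the PR itself the decorator lives in the test module), and it assumes the decorator retries on assertion failures:

```python
import unittest

# Hypothetical location, per the suggestion above; not the actual Beam code.
from apache_beam.testing.util import retry


class RetryDecoratorTest(unittest.TestCase):

  def test_succeeds_after_transient_failure(self):
    calls = []

    @retry(3)  # 3 total attempts, i.e. at most 2 retries.
    def flaky():
      calls.append(None)
      if len(calls) < 2:
        raise AssertionError('transient failure')

    flaky()
    self.assertEqual(2, len(calls))

  def test_reraises_after_exhausting_attempts(self):
    @retry(2)
    def always_fails():
      raise AssertionError('permanent failure')

    with self.assertRaises(AssertionError):
      always_fails()


if __name__ == '__main__':
  unittest.main()
```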
Looks like there are a few libraries that offer retry decorators for the same purpose; we can consider adopting one of them, see: https://lists.apache.org/thread.html/16060fd7f4d408857a5e4a2598cc96ebac0f744b65bf4699001350af@%3Cdev.beam.apache.org%3E
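For comparison, a hedged sketch of how one such library, tenacity, looks in this role (this PR does not actually adopt it; the flaky_check function is made up for illustration):

```python
import random

from tenacity import retry, stop_after_attempt


# reraise=True surfaces the final AssertionError itself rather than
# wrapping it in tenacity's RetryError.
@retry(stop=stop_after_attempt(3), reraise=True)
def flaky_check():
  # Stand-in for a flaky assertion with a ~1% failure rate per run.
  assert random.random() > 0.01, 'rare sampling race'


flaky_check()
```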
I thought about test_util, but that's primarily user-facing. Let's see where the discussion goes on tenacity.
    pcoll_b = p | 'b' >> beam.Create(['b'])
    assert_that((pcoll_a, pcoll_b) | First(), equal_to(['a']))

    @retry(2)
Let's add a comment from the PR description here to explain why we had to add a retry.
A second comment on the same @retry(2) line:
I think you meant to decorate the test_progress_metrics method instead.
Oops, yes.
    def apply(f):
      @functools.wraps(f)
      def wrapper(*args):
        for _ in range(attempts - 1):
Do we need to return the value of the decorated function, or pass kwargs? I guess neither is relevant for test methods.
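For illustration, a sketch of a variant that does both, applied to the test method as discussed above. This is an assumption-laden sketch, not the exact code in the PR: it retries on any exception, and the enclosing class name is hypothetical.

```python
import functools
import unittest


def retry(attempts):
  """Re-runs the decorated function up to `attempts` times in total.

  Use very sparingly: retries can hide legitimate bugs caused by rare
  race conditions.
  """
  def apply(f):
    @functools.wraps(f)
    def wrapper(*args, **kwargs):  # forward kwargs as well
      for _ in range(attempts - 1):
        try:
          return f(*args, **kwargs)  # propagate the return value
        except Exception:  # assumption: retry on any failure
          pass
      # Final attempt runs unguarded so the last failure propagates.
      return f(*args, **kwargs)
    return wrapper
  return apply


class ProgressMetricsTest(unittest.TestCase):  # hypothetical class name

  # Due to the not perfectly deterministic nature of state sampling
  # and sleep, this test has a ~1% failure rate; a single retry drops
  # that to roughly 1 in 10,000.
  @retry(2)
  def test_progress_metrics(self):
    self.assertTrue(True)  # placeholder for the real metric assertions


if __name__ == '__main__':
  unittest.main()
```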
Force-pushed from f55f46e to 07ef317
PTAL. This should look very similar to #7492 now :).
tvalentyn left a comment:
LGTM. I assume the failing PreCommit test test_error_traceback_includes_user_code is a preexisting failure, but we should fix that before merge to make sure it's not a surprising side effect...
retest this please
    all_metrics_via_montoring_infos, namespace, name, step='MyStep')

    # Due to somewhat non-deterministic nature of state sampling and sleep,
    # this test is flaky when state duraiton is low.
nit: there was a typo in my PR; it should be "duration" on line 572.
Due to the not perfectly deterministic nature of state sampling and sleep, this test has a 1% failure rate. Retrying with three total attempts drops that to one in a million.
Force-pushed from 07ef317 to e0f64e9
Tenacity is breaking the Flink Python tests, see: https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/console
@angoenka we should fix the Flink test suite, see: https://issues.apache.org/jira/browse/BEAM-6469.
We may consider replacing apache_beam.utils.retry with tenacity at some point.
Due to the not perfectly deterministic nature of state sampling and sleep,
this test has a 1% failure rate. Adding a single retry drops that to
1 in 10,000.
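The arithmetic behind both of the figures quoted in this thread, assuming attempts fail independently with probability p = 0.01: the chance that all n attempts fail is p**n. A quick check:

```python
p = 0.01  # observed per-run failure rate

# Probability that every one of n independent attempts fails is p**n.
for attempts in (1, 2, 3):
  print(attempts, p ** attempts)  # ~1e-2, ~1e-4 (1 in 10,000), ~1e-6 (one in a million)
```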