
Conversation

@robertwb (Contributor)

Due to the not perfectly deterministic nature of state sampling and sleep,
this test has a 1% failure rate. Adding a single retry drops that to
1 in 10,000.
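
As a back-of-envelope check of that math (a sketch assuming the attempts fail independently; the 1% figure is the observed rate from this description):

  p_fail = 0.01        # observed single-run failure rate (~1%)
  print(p_fail ** 2)   # 0.0001 -> ~1 in 10,000 with a single retry
  print(p_fail ** 3)   # 1e-06  -> ~1 in 1,000,000 with two retries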


Follow this checklist to help us incorporate your contribution quickly and easily:

  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

It will help us expedite review of your Pull Request if you tag someone (e.g. @username) to look at it.

Post-Commit Tests Status (on master branch)

| Lang   | SDK          | Apex         | Dataflow     | Flink        | Gearpump     | Samza        | Spark        |
| ------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ |
| Go     | Build Status | ---          | ---          | ---          | ---          | ---          | ---          |
| Java   | Build Status | Build Status | Build Status | Build Status | Build Status | Build Status | Build Status |
| Python | Build Status | ---          | Build Status | Build Status | ---          | ---          | ---          |

@robertwb (Contributor Author)

R: @tvalentyn

DEFAULT_SAMPLING_PERIOD_MS = 0


def retry(attempts):
Contributor:

Consider moving this to third_party/py/apache_beam/testing/util.py. If you do so:

  • Consider adding a docstring with sample usage, and clarify that the input param is the total number of attempts, not the number of retries. Perhaps a different name (total_attempts?) would help disambiguate.
  • I would add a comment to use this very sparingly, since we may otherwise start sweeping flakes under the rug; these may be legitimate bugs caused by rare race conditions.
  • Consider adding a test for the decorator to util_test.py.

If you think that in general this is a good idea but don't have time to do it, I can do that in a follow-up PR.
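
For reference, a minimal sketch of what such a helper could look like. The name total_attempts follows the suggestion above, and the exception handling is an assumption; this is not code from this PR:

  import functools

  def retry(total_attempts):
    """Reruns the decorated test, up to `total_attempts` times in total.

    Use very sparingly: retries can sweep flakes under the rug, and those
    may be legitimate bugs caused by rare race conditions.

    Sample usage:
      @retry(total_attempts=2)
      def test_flaky(self):
        ...
    """
    def apply(f):
      @functools.wraps(f)
      def wrapper(*args, **kwargs):
        for _ in range(total_attempts - 1):
          try:
            return f(*args, **kwargs)
          except Exception:
            pass  # Swallow failures on all but the last attempt.
        return f(*args, **kwargs)  # Last attempt propagates any failure.
      return wrapper
    return apply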

Contributor:

Looks like there are a few libraries that offer retry decorators for the same purpose; we can consider adopting one of them. See: https://lists.apache.org/thread.html/16060fd7f4d408857a5e4a2598cc96ebac0f744b65bf4699001350af@%3Cdev.beam.apache.org%3E
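
For example, with tenacity (one of the libraries discussed on that thread) the same effect might look like this; the exact parameters here are an illustrative assumption:

  from tenacity import retry, stop_after_attempt

  # Rerun the test up to 3 times; reraise=True surfaces the original
  # exception if every attempt fails.
  @retry(stop=stop_after_attempt(3), reraise=True)
  def test_progress_metrics(self):
    ...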

Contributor Author:

I thought about test_util, but that's primarily user-facing. Let's see where the discussion goes on tenacity.

pcoll_b = p | 'b' >> beam.Create(['b'])
assert_that((pcoll_a, pcoll_b) | First(), equal_to(['a']))

@retry(2)
Contributor:

Let's add a comment from the PR description here to explain why we had to add a retry.

pcoll_b = p | 'b' >> beam.Create(['b'])
assert_that((pcoll_a, pcoll_b) | First(), equal_to(['a']))

@retry(2)
Contributor:

I think you meant to decorate the test_progress_metrics method instead.

Contributor Author:

Oops, yes.

def apply(f):
  @functools.wraps(f)
  def wrapper(*args):
    for _ in range(attempts - 1):
Contributor:

Do we need to return the value of the decorated function, or pass kwargs? I guess neither is relevant for test methods.

@robertwb (Contributor Author)

PTAL. This should look very similar to #7492 now :).

robertwb changed the title from "Add a retry to the metrics test." to "Deflake progress metrics test with retry." on Jan 14, 2019
@tvalentyn (Contributor) left a comment:

LGTM. I assume the failing PreCommit test test_error_traceback_includes_user_code is a preexisting failure, but we should fix that before merging to make sure it's not a surprising side effect...

@tvalentyn (Contributor)

retest this please

all_metrics_via_montoring_infos, namespace, name, step='MyStep')

# Due to somewhat non-deterministic nature of state sampling and sleep,
# this test is flaky when state duraiton is low.
Contributor:

nit: there was a typo in my PR, should be duration in line 572.

Due to the not perfectly deterministic nature of state sampling and sleep,
this test has a 1% failure rate. Allowing three attempts drops that to
one in a million.
robertwb merged commit e1486fc into apache:master on Jan 18, 2019
@angoenka (Contributor)

Tenacity is breaking the Flink Python tests; see https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/console.
Shall we move to the internal retry logic in apache_beam.utils?
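
For reference, the internal helper mentioned above could be used roughly like this. The with_exponential_backoff decorator does live in apache_beam/utils/retry.py, but applying it to this test is a hypothetical sketch, not code from this PR:

  from apache_beam.utils import retry

  # Rerun the test up to 2 extra times, with exponential backoff between
  # attempts, before letting the failure propagate.
  @retry.with_exponential_backoff(num_retries=2)
  def test_progress_metrics(self):
    ...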

@tvalentyn (Contributor)

@angoenka we should fix the Flink test suite; see: https://issues.apache.org/jira/browse/BEAM-6469.

@tvalentyn (Contributor)

We may consider replacing apache_beam.utils.retry with tenacity at some point.
