Skip to content

Conversation

@tvalentyn
Copy link
Contributor

@tvalentyn tvalentyn commented Jan 12, 2019

  1. Add a dependency on Tenacity, a general-purpose retry library: https://github.com/jd/tenacity.
  2. Make statesampler_test easier to configure and adjust it to reduce flakiness: make the fastest state last 1 second, or total test runtime to last 6 seconds, and retry once if the test fails. This should reduce flakiness below 1 in 100000. Alternative considered: keep current state duration, but retry 5 times. The alternative would be less sensitive to a potential change in statesampler that would result in real loss of precision in measurement.

Follow this checklist to help us incorporate your contribution quickly and easily:

  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

It will help us expedite review of your Pull Request if you tag someone (e.g. @username) to look at it.

Post-Commit Tests Status (on master branch)

Lang SDK Apex Dataflow Flink Gearpump Samza Spark
Go Build Status --- --- --- --- --- ---
Java Build Status Build Status Build Status Build Status
Build Status
Build Status
Build Status Build Status Build Status
Python Build Status --- Build Status
Build Status
Build Status --- --- ---

@tvalentyn tvalentyn force-pushed the deflake_statesampler_test branch from dc0f0d4 to 0912d52 Compare January 12, 2019 01:44
…last 1 second, retry once on failure to reduce flakiness
@tvalentyn tvalentyn force-pushed the deflake_statesampler_test branch from 0912d52 to 7d15def Compare January 12, 2019 01:46
@tvalentyn
Copy link
Contributor Author

R: @aaltay
cc: @robertwb


# Due to somewhat non-deterministic nature of state sampling and sleep,
# this test is flaky when state duraiton is low.
# Since increasing state duration significantly would als slow down
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: als -> also

# this test is flaky when state duraiton is low.
# Since increasing state duration significantly would als slow down
# the test suite, we are retrying once on failure as a mitigation.
@retry(reraise=True, stop=stop_after_attempt(2))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this try a total of 2 or 3 times? I recommend 3 if that is not the case already.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This gives the test two attempts total to pass. Originally, I intentionally set a low amount of retries, in case there is some issue we may want to catch, related to original purpose of this test. Increasing to 3.

Copy link
Member

@aaltay aaltay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can merge after tests pass. Thank you!

@aaltay aaltay merged commit d25d800 into apache:master Jan 13, 2019
@tvalentyn tvalentyn deleted the deflake_statesampler_test branch October 23, 2019 00:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants