[BEAM-12603] Add retry on grpc data channel and remove retry from test. #17537
I was able to get 100 successful runs with this patch. Would you mind taking a look to help me validate it and decide whether this is an acceptable fix?
Codecov Report
@@            Coverage Diff             @@
##           master   #17537      +/-   ##
==========================================
+ Coverage   73.85%   73.89%   +0.03%
==========================================
  Files         691      691
  Lines       91255    91547     +292
==========================================
+ Hits        67396    67645     +249
- Misses      22626    22669      +43
  Partials     1233     1233
Thanks! I think this workaround is preferable to retrying the entire test. A couple of questions/concerns:
- Are we sure it's safe to retry these UNAVAILABLE responses? The gRPC docs note "it is not always safe to retry non-idempotent operations."
- I think this is a better workaround, but it's still a little concerning that we don't understand the root cause - any thoughts on how we could dig deeper?
I believe UNAVAILABLE in this case means that the underlying TCP connection was broken before the request was handled (I'm not sure why, and it's unclear how to debug that), which means it should be retriable. I've had more than 250 runs with the retry added and didn't see any side effects such as wrong results. I also enabled the gRPC debug log but didn't find anything interesting.
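For readers following along, here is a minimal sketch of how a retry limited to UNAVAILABLE can be expressed through a gRPC service config passed as channel options. This is an illustration under assumptions, not necessarily what this patch does: the helper name `build_retry_channel_options`, the backoff values, and the target address in the comment are all hypothetical.

```python
import json


def build_retry_channel_options(max_attempts=5,
                                initial_backoff_s=0.1,
                                max_backoff_s=1.0,
                                backoff_multiplier=2.0):
    """Builds gRPC channel options that retry only on UNAVAILABLE.

    Hypothetical helper for illustration; the values are assumptions,
    not the ones used in this PR.
    """
    service_config = {
        "methodConfig": [{
            # An empty name entry applies the policy to all methods.
            "name": [{}],
            "retryPolicy": {
                "maxAttempts": max_attempts,
                "initialBackoff": f"{initial_backoff_s}s",
                "maxBackoff": f"{max_backoff_s}s",
                "backoffMultiplier": backoff_multiplier,
                # Only this status code triggers a retry.
                "retryableStatusCodes": ["UNAVAILABLE"],
            },
        }]
    }
    return [
        ("grpc.enable_retries", 1),
        ("grpc.service_config", json.dumps(service_config)),
    ]


options = build_retry_channel_options()
# With grpcio installed, the options would be passed when creating the channel,
# for example (address is a placeholder):
# channel = grpc.insecure_channel("localhost:50051", options=options)
```

Restricting `retryableStatusCodes` to UNAVAILABLE keeps other failures visible; whether retrying is safe still depends on the request not having been handled, which is the concern raised above.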