rgw: avoid stuck worker thread infinitely when there are PG stuck at peering #5501
TODO:
@liewegas, @yehudasa, @athanatos, would you please take a look and tell me whether the general flow makes sense?
@yehudasa is the rgw piece that simple?
The only place I see EAGAIN currently returned by the OSD is when you are sending to replicas and they tell you to go back to the primary. We probably do want to use a different error code. EHOSTDOWN? Or we could define our own...
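A minimal sketch of the client-side distinction being discussed, in Python for brevity: -EAGAIN keeps its current meaning (a replica redirecting the client to the primary), while a separate code, EHOSTDOWN as floated in the comment, would mean the PG is not active and the op asked not to wait. The `Action` enum and `classify_reply` function are hypothetical and not part of librados.

```python
import errno
from enum import Enum


class Action(Enum):
    DONE = "done"
    RESEND_TO_PRIMARY = "resend_to_primary"
    PG_UNAVAILABLE = "pg_unavailable"
    FAIL = "fail"


def classify_reply(rc):
    """Map an OSD reply code (negative errno) to a client action.

    -EAGAIN keeps its current meaning: a replica told us to go to the primary.
    -EHOSTDOWN is the candidate code for "PG not active and the op asked not
    to wait"; that mapping is a proposal, not existing OSD behaviour.
    """
    if rc >= 0:
        return Action.DONE
    if rc == -errno.EAGAIN:
        return Action.RESEND_TO_PRIMARY
    if rc == -errno.EHOSTDOWN:
        return Action.PG_UNAVAILABLE
    return Action.FAIL
```

Keeping the two codes distinct would leave the existing redirect handling untouched, so only callers that set the no-wait flag would ever see the new code.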
Hi @liewegas,
Fixed according to @liewegas's comments; added two commits:
@guangyy this needs rebasing.
…wants to wait upon unavailability or not Fixes: #12623 Signed-off-by: Guang Yang <yguang@yahoo-inc.com>
Fixes: #12623 Signed-off-by: Guang Yang <yguang@yahoo-inc.com> Author: Guang Yang <yguang@yahoo-inc.com>
…hread infinitely by stuck op Fixes: #12623 Signed-off-by: Guang Yang <yguang@yahoo-inc.com>
Signed-off-by: Guang Yang <yguang@yahoo-inc.com>
Signed-off-by: Guang Yang <yguang@yahoo-inc.com>
@dachary, sorry for the delayed response. Rebased.
@guangyy thanks :-)
I'm a bit worried about this: doesn't it break write ordering on a single object? You wouldn't be able to pipeline ops on a single object while using this feature. That seems awfully brittle.
Yeah, this would reorder operations. Unless it was only used on reads? (It's OK to reorder those, as we do with RWORDERED vs. !RWORDERED.)
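To make the ordering concern concrete, here is a toy Python model (not Ceph code) in which a client pipelines write A, then write B, to one object. With the default behaviour both wait in the PG's queue and are applied in order; with a fail-fast flag, A is bounced during peering, B lands after peering ends, and the client's retry of A is applied last. `ToyPG`, `pipelined_writes`, and the arrival timing are assumptions chosen to expose the race.

```python
import errno
from collections import deque


class ToyPG:
    """Toy PG (not Ceph code): ops arriving while peering either wait or bounce."""

    def __init__(self):
        self.active = False
        self.waiting = deque()  # ops parked until peering finishes
        self.applied = []       # order in which writes took effect on the object

    def submit(self, op, no_wait):
        if self.active:
            self.applied.append(op)
            return 0
        if no_wait:
            return -errno.EAGAIN    # bounced back to the client
        self.waiting.append(op)     # default: queue, preserving arrival order
        return 0

    def activate(self):
        """Peering finished: drain the wait list in arrival order."""
        self.active = True
        while self.waiting:
            self.applied.append(self.waiting.popleft())


def pipelined_writes(no_wait):
    """Client pipelines write A, then write B, to a single object.

    Assumed timing: A reaches the PG during peering, B after peering ends,
    and the client's retry of a bounced A happens after B was sent.
    """
    pg = ToyPG()
    to_retry = []
    if pg.submit("A", no_wait) < 0:
        to_retry.append("A")
    pg.activate()
    pg.submit("B", no_wait)
    for op in to_retry:
        pg.submit(op, no_wait)
    return pg.applied
```

`pipelined_writes(no_wait=False)` returns `['A', 'B']`, while `pipelined_writes(no_wait=True)` returns `['B', 'A']`. Reads do not change object state, which is why restricting the flag to reads would sidestep this problem.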
Closing this pull request since it appears to be abandoned. If it is not abandoned, and you are still interested in championing this change, please address the review comments so far and reopen the pull request.
At the OSD side - if the op carries a flag saying it does not want to wait, and the PG it targets is peering, reply immediately with -EAGAIN rather than queuing it in the waiting list.
At the librados side - extend the API to accept a timeout from the caller; once it is set, the op switches to 'do not wait' mode and the client retries (polls) until the timeout expires.
At radosgw - use the new API and add a configuration option that enables the timeout when communicating with the OSDs.
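A rough Python sketch of the three pieces working together, under stated assumptions: `PeeringPG` stands in for an OSD whose PG finishes peering after a fixed number of ops, `op_with_timeout` models the proposed librados timeout mode (poll with the no-wait flag until success or the deadline), and `rgw_fetch` reads a configuration value to decide which path to take. All of these names, including the `rgw_osd_op_timeout` option, are hypothetical and do not correspond to existing librados or radosgw APIs.

```python
import errno
import time


class PeeringPG:
    """Toy stand-in for an OSD-side PG (hypothetical, not Ceph code).

    The PG counts as peering for its first `active_after` ops and is active
    afterwards; this activation rule is an assumption for the demo.
    """

    def __init__(self, active_after):
        self.seen = 0
        self.active_after = active_after

    def handle_op(self, no_wait):
        self.seen += 1
        if self.seen > self.active_after:
            return 0                   # PG active: the op is served
        if no_wait:
            return -errno.EAGAIN       # OSD side: reply immediately, do not queue
        return None                    # default: op parked in the wait list, no reply yet


def op_with_timeout(pg, timeout_s, poll_interval_s=0.01):
    """Hypothetical librados-side helper: a caller-supplied timeout switches
    the op to no-wait mode, and the client polls until it succeeds or the
    deadline passes. Returning -ETIMEDOUT on expiry is an assumption.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        rc = pg.handle_op(no_wait=True)
        if rc != -errno.EAGAIN:
            return rc
        if time.monotonic() >= deadline:
            return -errno.ETIMEDOUT
        time.sleep(poll_interval_s)


def rgw_fetch(pg, conf):
    """Hypothetical radosgw-side switch driven by a config value."""
    timeout = conf.get("rgw_osd_op_timeout", 0)   # hypothetical option name
    if timeout > 0:
        return op_with_timeout(pg, timeout)
    return pg.handle_op(no_wait=False)            # legacy path: may never reply
```

In this sketch the OSD never parks a no-wait op; all waiting moves to the client, which can give up with -ETIMEDOUT instead of tying up an rgw worker thread indefinitely.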