rgw: avoid stuck worker thread infinitely when there are PG stuck at peering #5501

Closed
guangyy wants to merge 5 commits


Contributor

guangyy commented Aug 6, 2015

On the OSD side: if the op carries a flag saying it does not want to wait, and the PG it targets is peering, reply immediately with -EAGAIN rather than queuing the op on the waiting list.

On the librados side: extend the API to accept a timeout from the caller. Once it is set, the client switches to 'do not wait' mode and retries (polls) on the client side until the timeout expires.

In radosgw: use the new API and add a configuration option that enables the timeout when communicating with the OSD.
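
The flow described above can be pictured with a small toy model. This is a sketch in Python, not Ceph code: `FLAG_NO_WAIT`, `ToyOSD`, and `submit_with_timeout` are hypothetical names, the clock is a plain counter, and returning `-ETIMEDOUT` once the deadline passes is an assumption rather than something the PR states.

```python
import errno

FLAG_NO_WAIT = 0x1  # hypothetical flag name, not a real Ceph constant


class ToyOSD:
    """Stand-in for an OSD whose PG is peering until a given fake-clock time."""

    def __init__(self, peering_until):
        self.peering_until = peering_until  # fake-clock time at which peering ends
        self.waiting = []                   # ops parked until peering completes

    def handle_op(self, op, flags, now):
        if now < self.peering_until:
            if flags & FLAG_NO_WAIT:
                return -errno.EAGAIN        # bounce immediately instead of queuing
            self.waiting.append(op)         # default behaviour: park the op
            return None                     # no reply yet; the caller would block
        return 0                            # PG is active: the op succeeds


def submit_with_timeout(osd, op, timeout, poll_interval=1):
    """Client side: poll in 'do not wait' mode until success or timeout."""
    now = 0
    while True:
        r = osd.handle_op(op, FLAG_NO_WAIT, now)
        if r != -errno.EAGAIN:
            return r
        if now + poll_interval > timeout:
            return -errno.ETIMEDOUT         # assumed error code for an expired timeout
        now += poll_interval                # stands in for sleeping on a real clock
```

With `ToyOSD(peering_until=3)` and a timeout of 10, the call succeeds on the fourth poll; with a PG that never leaves peering, the caller gets control back after the timeout instead of blocking forever, which is the point of the change for an rgw worker thread.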

Contributor Author

guangyy commented Aug 6, 2015

TODO:

  1. Testing - this is completely untested so far.
  2. At librados, instead of busy polling, we probably want some back-off wait.
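
For the second TODO item, one common shape for a back-off wait is a capped exponential delay that never sleeps past the caller's deadline. The sketch below illustrates that idea in Python; it is not the librados change itself. `backoff_delays` and `retry_until` are hypothetical names, the clock and sleep functions are injected so the loop can be driven by a fake clock, and `-ETIMEDOUT` is an assumed return value.

```python
import errno


def backoff_delays(initial, factor, cap):
    """Yield an endless series of delays: initial, initial*factor, ..., capped at cap."""
    delay = initial
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


def retry_until(op, timeout, clock, sleep, delays):
    """Call op() until it stops returning -EAGAIN or the deadline passes."""
    deadline = clock() + timeout
    for delay in delays:
        r = op()
        if r != -errno.EAGAIN:
            return r
        remaining = deadline - clock()
        if remaining <= 0:
            return -errno.ETIMEDOUT       # assumed error code for an expired timeout
        sleep(min(delay, remaining))      # never sleep past the deadline
```

A caller would pass something like `backoff_delays(0.01, 2.0, 1.0)` together with `time.monotonic` and `time.sleep`. The cap keeps the worst-case extra latency after peering completes bounded by one capped delay.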

Contributor Author

guangyy commented Aug 6, 2015

@liewegas, @yehudasa, @athanatos, could you please take a look and say whether the general flow makes sense?

Member

liewegas commented Aug 6, 2015

@yehudasa is the rgw piece that simple?

Member

liewegas commented Aug 6, 2015

The only place I see that EAGAIN is currently returned by the OSD is when you are sending to replicas and they tell you to go back to the primary. We probably do want to use a different error code. EHOSTDOWN? Or we could define our own...
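
To illustrate why a separate error code helps, here is a small Python sketch of how a client might choose its next step from the reply code. The action strings and both function names are hypothetical, and the mapping is only a reading of the comment above, not actual Objecter behaviour.

```python
import errno

# Hypothetical client-side actions; the names are illustrative only.
DONE = "done"
RESEND_TO_PRIMARY = "resend to primary"
RETRY_LATER = "retry later (PG peering)"
FAIL = "fail"


def action_overloaded(rc):
    """If peering also returned -EAGAIN, both causes collapse into one branch."""
    if rc >= 0:
        return DONE
    if rc == -errno.EAGAIN:
        return RESEND_TO_PRIMARY  # cannot tell a replica redirect from peering
    return FAIL


def action_distinct(rc):
    """With a separate code (EHOSTDOWN here), each cause gets its own branch."""
    if rc >= 0:
        return DONE
    if rc == -errno.EAGAIN:
        return RESEND_TO_PRIMARY
    if rc == -errno.EHOSTDOWN:
        return RETRY_LATER
    return FAIL
```

With the overloaded code, a reply caused by a peering PG would be handled as a redirect to the primary; with a distinct code, the client can back off and retry against the same target instead.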

Member

yehudasa commented Aug 7, 2015

@liewegas @guangyy not sure if we want the timeout to apply to all operations or just very specific operations. If we want it to apply to all rados operations then maybe there are a few more places that need to be changed?

guangyy changed the title from "[DNM] rgw: avoid stuck worker thread infinitely when there are PG stuck at peering" to "rgw: avoid stuck worker thread infinitely when there are PG stuck at peering" on Aug 10, 2015
Contributor Author

guangyy commented Aug 10, 2015

Hi @liewegas ,
Could you take a look at this one? Thanks.

Contributor Author

guangyy commented Aug 13, 2015

Fixed according to @liewegas's comments and added two commits:

  1. Expose the option in the rados tool
  2. Add a test to cover the newly added option


ghost commented Sep 1, 2015

@guangyy this needs rebasing

Guang Yang added 5 commits September 4, 2015 17:06
…wants to wait upon unavailibity or not

Fixes: #12623
Signed-off-by: Guang Yang <yguang@yahoo-inc.com>
Fixes: #12623
Signed-off-by: Guang Yang <yguang@yahoo-inc.com>

Author:    Guang Yang <yguang@yahoo-inc.com>
…hread infinitely by stuck op

Fixes: #12623
Signed-off-by: Guang Yang <yguang@yahoo-inc.com>
Signed-off-by: Guang Yang <yguang@yahoo-inc.com>
Signed-off-by: Guang Yang <yguang@yahoo-inc.com>
Contributor Author

guangyy commented Sep 4, 2015

@dachary , sorry for the delayed response. Rebased.


ghost commented Sep 4, 2015

@guangyy thanks :-)

ghost added the rgw label Sep 6, 2015
@athanatos
Contributor

I'm a bit worried about this: doesn't it break write ordering on a single object? You wouldn't be able to pipeline ops on a single object while using this feature. That seems awfully brittle.

Member

liewegas commented Jan 7, 2016

Yeah, this would reorder operations. Unless it was only used on reads? (It's ok to reorder those, as we do with RWORDERED vs !RWORDERED)
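
The reordering concern can be reproduced with a toy simulation. The Python sketch below is not Ceph code: `apply_pipelined_writes` is a hypothetical helper, times are abstract ticks, and it assumes a bounced write is simply resent `retry_delay` ticks later (with `retry_delay > 0`).

```python
def apply_pipelined_writes(arrivals, peering_until, retry_delay):
    """Return the order in which a toy OSD applies writes to one object.

    arrivals: list of (time, name) for writes sent back-to-back by one client.
    A write that arrives while the PG is peering is bounced and re-arrives
    retry_delay ticks later; a write that arrives afterwards is applied at once.
    """
    pending = sorted(arrivals)
    applied = []
    while pending:
        t, name = pending.pop(0)
        if t < peering_until:
            pending.append((t + retry_delay, name))  # bounced: client retries later
            pending.sort()
        else:
            applied.append(name)
    return applied
```

If `w1` arrives at tick 0 while the PG is peering until tick 1, and `w2` arrives at tick 2, then with a retry delay of 5 the OSD applies `w2` before the retried `w1`. When the OSD queues ops on its waiting list instead, `w1` stays ahead of `w2`. For reads, the swap is harmless in the sense described above.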

Member

liewegas commented May 3, 2016

Closing this pull request since it appears to be abandoned. If it is not abandoned, and you are still interested in championing this change, please address any review comments so far and reopen the pull request.

liewegas closed this May 3, 2016
5 participants