Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose Conflict errors and allow wrapping get/replace with backoff #34

Closed
clux opened this issue May 31, 2019 · 9 comments · May be fixed by #314
Closed

Expose Conflict errors and allow wrapping get/replace with backoff #34

clux opened this issue May 31, 2019 · 9 comments · May be fixed by #314
Labels
api Api abstraction related client http issues with the client

Comments

@clux
Copy link
Member

clux commented May 31, 2019

client-go encourages RetryOnConflict which is a reasonable pattern - code's also not that advanced.

Rust has:

We could probably make wrappers around that..

AFAIKT it's just a wrapper around a deliberate error match case, where the backoff continues. That should be easy to emulate now that we've exposed api error helpers on our Error type. Backoff seems like a good fit atm.

@clux
Copy link
Member Author

clux commented Jul 28, 2020

https://docs.rs/backoff/0.2.1/backoff/future/fn.retry.html looks pretty nice now. Want to get this into one of the bigger examples and close this.

@clux
Copy link
Member Author

clux commented Jul 28, 2020

Actually, I guess the key point here is to figure out where to use this. If we just put retry inside the reconciliation loop it wouldn't matter much, because that kind of has backoff built in.

However, as the go side lets you wrap a RetryOnConflict to retry the get + replace until it works. That should be less of an issue with server side apply. Still, a util that wraps Api::replace with something like that might be useful.

@clux clux changed the title exponential backoff Expose Conflict errors and allow wrapping get/replace with backoff Jul 28, 2020
@clux clux added api Api abstraction related help wanted Not immediately prioritised, please help! labels Jul 28, 2020
@clux clux removed their assignment Jul 28, 2020
@nightkr
Copy link
Member

nightkr commented Jul 28, 2020

IMO the "obvious" place to put this would be to provide it as an error_policy helper. The current controller machinery isn't well-suited for long-running reconcilers, since that would block reconciling for all watched objects, not just that one. The main problem then would be to have a good way to associate the controller::Error back to the ObjectRef, but that would be nice to have regardless.

@nightkr
Copy link
Member

nightkr commented Jul 28, 2020

Maybe consider also adding backoff to watcher, actually..

@clux clux removed the help wanted Not immediately prioritised, please help! label Feb 28, 2021
@kazk
Copy link
Member

kazk commented Jun 7, 2021

Maybe try RetryLayer? It's per request and Policy looks flexible (return Option<Future<Policy>> based on request/response).

@clux
Copy link
Member Author

clux commented Jun 7, 2021

Oh, that's nice! We could even use the Policy to do error matching and globally set RetryOnConflict equivalents if we modified Api::replace methods to optionally do an Api::get right before to get the latest resource version.

@clux clux added the client http issues with the client label Jun 7, 2021
@nightkr
Copy link
Member

nightkr commented Jun 7, 2021

Maybe try RetryLayer? It's per request and Policy looks flexible (return Option<Future> based on request/response).

Isn't the layer stack decided when building the Client, IIRC?

This needs cooperation from the user code, since only the user knows what exact changes were made, and why. At the very least, you'd need to replace the current Api<K>::replace(new: K) with a Api<K>::update(mutate: impl FnMut(K) -> K), which could call mutate on every conflict.

We could even use the Policy to do error matching and globally set RetryOnConflict equivalents if we modified Api::replace methods to optionally do an Api::get right before to get the latest resource version.

If you want to steamroll the conflicts (which is the only thing that we can do with our current API) then you don't need any of this machinery anyway; just set resource_version to None.

In order to actually handle the conflicts, you need to understand the mutation. For example, with Controller you'd let it crash and retry the whole reconciliation. I'm not saying Controller is right for every use case, but I think that's roughly the level where you'd need to hook in conflict resolution in any case.


Of course, there are still transient errors that do warrant transparent retries (network errors, and so on). But it's important that we don't get overzealous.

@kazk
Copy link
Member

kazk commented Jun 8, 2021

Yeah, sorry, I didn't understand the requirements and the Policy trait correctly. We can't use RetryLayer for this. (RetryLayer can be used for other retries, though. e.g., retrying GET requests on Retry-After response like client-go.)

This needs cooperation from the user code, since only the user knows what exact changes were made, and why.

Yeah, I thought users can provide their Policy when creating a client used by the controller. But that Policy won't have enough information to do what we want, and the trait doesn't allow it either.

@clux
Copy link
Member Author

clux commented Nov 21, 2021

Going to close this. It's not worth doing anything smart for get/replace now that server-side apply is ubiquitous.

The way to do this for individual calls is using backoff::future::retry. Based on the error type you set there, it will either retry (for a configurable amount of time/times), or bubble up the error.

Watcher has a similar backoff_watch system (via #577 ) that also uses the same backoff library.

Note that StatusDetails struct (from the api error) will include a retry_after_seconds that indicates approximate wait time for "alternate actions"

@clux clux closed this as completed Nov 21, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api Api abstraction related client http issues with the client
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

3 participants