Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

backoff in the transaction loop is problematic #188

Open
ajwerner opened this issue Oct 3, 2022 · 5 comments
Open

backoff in the transaction loop is problematic #188

ajwerner opened this issue Oct 3, 2022 · 5 comments

Comments

@ajwerner
Copy link

ajwerner commented Oct 3, 2022

In cockroach, when using the SAVEPOINT cockroach_restart technique to recover from serializable restarts, the transaction's locks are held across restarts. This enables liveness among contended transactions. In the driver today, there's exponential backoff in those restarts. That means that locks will be held for a long period of inactivity, exacerbating the cost of contention potentially greatly.

See

def retry_exponential_backoff(retry_count: int, max_backoff: int = 0) -> None:
"""
This is a function for an exponential back-off whenever we encounter a retry error.
So we sleep for a bit before retrying,
and the sleep time varies for each failed transaction
capped by the max_backoff parameter.
:param retry_count: The number for the current retry count
:param max_backoff: The capped number of seconds for the exponential back-off
:return: None
"""
sleep_secs = uniform(0, min(max_backoff, 0.1 * (2 ** retry_count)))
sleep(sleep_secs)

@gordthompson
Copy link
Collaborator

Thanks for your comments. As noted in the docstring, there are arguments to limit the number of retries and their (maximum) duration, which could be used to mitigate the "cost of contention" to which you refer. What would you suggest as an alternative?

@rafiss
Copy link
Contributor

rafiss commented Oct 13, 2022

It should not perform any exponential backoff. It should retry right away, to minimize the amount of time that the client is asleep while holding these locks in the database.

@gordthompson
Copy link
Collaborator

sleep_secs = uniform(0, min(max_backoff, 0.1 * (2 ** retry_count)))

Specifying max_backoff=1 would ensure that the wait time is no more than one second, and would be 0.5 seconds on average. I'd want to understand why @bdarnell implemented the exponential backoff before considering removing it completely.

@rafiss
Copy link
Contributor

rafiss commented Oct 13, 2022

@bdarnell did not add it (see #19)

It was added in #115 and I approved it before I had the correct understanding of how our SAVEPOINT retries worked. Very much a mistake/oversight on my part.

@rafiss
Copy link
Contributor

rafiss commented Oct 13, 2022

The exponential backoff potentially could make more sense if the transaction was fully rolled back, and then a new one was started.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants