backoff in the transaction loop is problematic #188

ajwerner · 2022-10-03T14:03:06Z

In cockroach, when using the SAVEPOINT cockroach_restart technique to recover from serializable restarts, the transaction's locks are held across restarts. This enables liveness among contended transactions. In the driver today, there's exponential backoff in those restarts. That means that locks will be held for a long period of inactivity, exacerbating the cost of contention potentially greatly.

See

sqlalchemy-cockroachdb/sqlalchemy_cockroachdb/transaction.py

Lines 77 to 90 in 19e66cc

    
           def retry_exponential_backoff(retry_count: int, max_backoff: int = 0) -> None: 
        
               """ 
        
               This is a function for an exponential back-off whenever we encounter a retry error. 
        
               So we sleep for a bit before retrying, 
        
               and the sleep time varies for each failed transaction 
        
               capped by the max_backoff parameter. 
        
               :param retry_count: The number for the current retry count 
        
               :param max_backoff: The capped number of seconds for the exponential back-off 
        
               :return: None 
        
               """ 
        
               sleep_secs = uniform(0, min(max_backoff, 0.1 * (2 ** retry_count))) 
        
               sleep(sleep_secs)

The text was updated successfully, but these errors were encountered:

gordthompson · 2022-10-13T16:10:09Z

Thanks for your comments. As noted in the docstring, there are arguments to limit the number of retries and their (maximum) duration, which could be used to mitigate the "cost of contention" to which you refer. What would you suggest as an alternative?

rafiss · 2022-10-13T16:38:09Z

It should not perform any exponential backoff. It should retry right away, to minimize the amount of time that the client is asleep while holding these locks in the database.

gordthompson · 2022-10-13T17:28:29Z

sqlalchemy-cockroachdb/sqlalchemy_cockroachdb/transaction.py

Line 89 in 19e66cc

sleep_secs = uniform(0, min(max_backoff, 0.1 * (2 ** retry_count)))

Specifying max_backoff=1 would ensure that the wait time is no more than one second, and would be 0.5 seconds on average. I'd want to understand why @bdarnell implemented the exponential backoff before considering removing it completely.

rafiss · 2022-10-13T17:50:58Z

@bdarnell did not add it (see #19)

It was added in #115 and I approved it before I had the correct understanding of how our SAVEPOINT retries worked. Very much a mistake/oversight on my part.

rafiss · 2022-10-13T17:52:05Z

The exponential backoff potentially could make more sense if the transaction was fully rolled back, and then a new one was started.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

backoff in the transaction loop is problematic #188

backoff in the transaction loop is problematic #188

ajwerner commented Oct 3, 2022

gordthompson commented Oct 13, 2022

rafiss commented Oct 13, 2022

gordthompson commented Oct 13, 2022

rafiss commented Oct 13, 2022

rafiss commented Oct 13, 2022

backoff in the transaction loop is problematic #188

backoff in the transaction loop is problematic #188

Comments

ajwerner commented Oct 3, 2022

gordthompson commented Oct 13, 2022

rafiss commented Oct 13, 2022

gordthompson commented Oct 13, 2022

rafiss commented Oct 13, 2022

rafiss commented Oct 13, 2022