New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Identify errors which indicate the transaction should be retried #1095

Open
Diggsey opened this Issue Aug 13, 2017 · 8 comments

Comments

Projects
None yet
3 participants
@Diggsey
Contributor

Diggsey commented Aug 13, 2017

It's very important to be able to distinguish "transient" errors, such as the transaction being rolled back due to a deadlock, or because of a serialization conflict, from other errors, such as constraint violations, malformed queries, etc. where retrying makes no sense.

I think there should probably be three categories:

  1. Transaction should be retried (conflicting transactions)
  2. Connection needs to be re-established (network issues)
  3. Operation should never be retried (everything else)

The first kind is the most important, since these errors are expected to happen as part of normal database usage.

@killercup

This comment has been minimized.

Member

killercup commented Aug 13, 2017

Sounds good. What other ORMs do this and can we learn from their API design?

@Diggsey

This comment has been minimized.

Contributor

Diggsey commented Aug 13, 2017

I'm not aware of any which make these categorisations explicit. However, with most ORMs/database connectors you can simply catch an exception of the appropriate type.

The closest might be sqlalchemy, which has an "invalidates_connection" flag which is set for exceptions wrapping database errors: http://docs.sqlalchemy.org/en/latest/core/exceptions.html#sqlalchemy.exc.DBAPIError

For detecting deadlocks, etc. you have to look at the inner exception raised by the database connector, or just catch all OperationalErrors and InternalErrors and accept that there will extra retries, neither of which is a good solution.

@sgrif

This comment has been minimized.

Member

sgrif commented Aug 16, 2017

I don't think it makes sense for Diesel to arbitrarily retry anything by default, but I'd be fine with adding a RetryTransaction error variant. If nothing else we should potentially change transaction to take a FnMut as a defensive measure so we can change this later.

@Diggsey

This comment has been minimized.

Contributor

Diggsey commented Aug 16, 2017

I think changing transaction in the future to call-back multiple times would be too much of a breaking change, and FnMut is much more restrictive that FnOnce. If you want automatic retries, I think just adding a second method would be the way to go.

@sgrif

This comment has been minimized.

Member

sgrif commented Dec 16, 2017

Update on this: We didn't change the signature of transaction to take FnMut, so if we want to implement something that does retries, it will need to be a separate method.

@Diggsey

This comment has been minimized.

Contributor

Diggsey commented Apr 18, 2018

This is also very important when using a connection pool: currently r2d2_diesel implements the has_broken method on connections to always return false 😢

Something like an invalidates_connection property on errors is really needed, and for diesel to use that to set a broken flag on connections. It can then implement has_broken using that flag, and there should also be a method to manually mark a connection as broken.

For example, if rollback fails on a transaction, then the connection really needs to be marked as broken, or else it will be returned to the connection pool!

@sgrif

This comment has been minimized.

Member

sgrif commented Apr 19, 2018

@Diggsey It is completely valid to return false from has_broken. is_valid will prevent the broken connection from being reused.

@Diggsey

This comment has been minimized.

Contributor

Diggsey commented Apr 19, 2018

@sgrif I am worried about transactions not being closed properly, ie. if a connection is returned to the pool with an open transaction, then SELECT 1 will still succeed. This can be fixed by the other issue I opened re: rolling back transactions on panic.

Another issue is if the "transaction rollback" call fails, then even if the is_valid check catches it, the connection will sit indefinitely in the connection pool, holding onto any locks acquired during the transaction.

A better option would be to immediately close the connection any time an unrecoverable error occurs: then has_broken can be implemented to just check whether the connection is still open. Any errors from rolling back the transaction should be counted as unrecoverable.

If this is combined with the changes to r2d2 to discard connections dropped during a panic, then I think that covers all the edge cases?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment