
Adds consecutive exception tracker to connections #249

Closed
wants to merge 3 commits

Conversation

egalpin

@egalpin commented Jan 31, 2018

This change might not be for everyone, so I respect the right of the core maintainers to downvote/close.

The backstory here is that I am using asyncpg to connect to an HA Postgres cluster. In the event of a failover, the writer (which the asyncpg connection pool is connected to) crashes and is rebooted, and the reader is promoted to writer. The DB DNS is updated automatically, but this takes time. In addition, the connections in the pool continue operating once the crashed DB recovers, except that DB is now the reader (i.e. `SHOW transaction_read_only;` -> `on`). As a result, INSERT/UPDATE/DELETE operations result in:

```
asyncpg.exceptions.ReadOnlySQLTransactionError: cannot execute INSERT in a read-only transaction
```

Other errors, like timeouts, are also possible offenders. Unfortunately, the only way I've found to refresh the connections is to set a low `max_queries` config param. This is generally OK, but it degrades performance due to the increased churn of closing and opening connections.
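For illustration, the recycling behavior that `max_queries` provides can be sketched in pure Python (names here are illustrative; this is not asyncpg's implementation):

```python
# Illustrative sketch of the max_queries recycling semantics described
# above: after N queries, the connection is closed and replaced with a
# fresh one. A low N works around stale read-only connections at the
# cost of extra connection churn.
class RecyclingConnection:
    def __init__(self, max_queries=50):
        self.max_queries = max_queries
        self.queries_run = 0
        self.generation = 0  # bumped each time the connection is recycled

    def execute(self, query):
        self.queries_run += 1
        if self.queries_run >= self.max_queries:
            self._recycle()

    def _recycle(self):
        # In asyncpg, the pool closes the connection and opens a new one;
        # here we just track the churn.
        self.generation += 1
        self.queries_run = 0

conn = RecyclingConnection(max_queries=5)
for _ in range(12):
    conn.execute("INSERT ...")
print(conn.generation)  # 2: two full recycles after 12 queries
```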

With this PR, a configurable `max_consecutive_exceptions` config param is introduced. This param is checked every time `_do_execute` results in an exception of any kind. I would very much like to be more precise with the exception handling. Initially, I wanted to except `exceptions.__all__`, but some exception classes do not inherit from `Exception`, which causes errors of its own. I would certainly appreciate input on how to make the `_do_execute` catch-all more limited to asyncpg exceptions.
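The counting logic the PR describes could look roughly like this (a minimal sketch with illustrative names, not the exact code in the diff):

```python
# Sketch of a consecutive-exception tracker: back-to-back failures are
# counted, and any success resets the streak. Past the threshold, the
# connection would be torn down so the pool opens a fresh one.
class ConsecutiveExceptionTracker:
    def __init__(self, max_consecutive_exceptions=5):
        self.max_consecutive_exceptions = max_consecutive_exceptions
        self.consecutive_exceptions = 0

    def record_success(self):
        self.consecutive_exceptions = 0  # any success resets the counter

    def record_exception(self):
        self.consecutive_exceptions += 1
        # True means "close this connection now".
        return self.consecutive_exceptions >= self.max_consecutive_exceptions

tracker = ConsecutiveExceptionTracker(max_consecutive_exceptions=3)
results = [tracker.record_exception() for _ in range(3)]
print(results)  # [False, False, True]
```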


@TheKevJames left a comment


I'm personally very +1 on this change overall, given my review comments. The failover case, especially, is something that doesn't seem to be handleable with the asyncpg API otherwise.

```
@@ -1431,6 +1447,11 @@ def _drop_global_statement_cache(self):
            default). Pass ``0`` to allow all statements to be cached
            regardless of their size.

        :param int max_consecutive_exceptions:
```


Probably important to allow users to disable this, i.e. by passing `0` in the same way as many of the rest of these parameters.

@asyncpg/maintainers: personally, I think a default of 5-ish is pretty reasonable here, but depending on your preference on backwards compatibility we might want to default this to `0` for no behavior change.
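The "`0` disables it" convention suggested here could be sketched as (illustrative; the function name is a stand-in):

```python
# Sketch of the "0 disables the feature" convention, mirroring other
# asyncpg parameters where 0 means "no limit" (e.g.
# max_cacheable_statement_size). A 0 default also preserves the old
# behavior for existing users.
def should_close(consecutive_exceptions, max_consecutive_exceptions):
    if max_consecutive_exceptions <= 0:
        return False  # feature disabled: never close on error streaks
    return consecutive_exceptions >= max_consecutive_exceptions

print(should_close(100, 0))  # False: disabled, no matter the streak
print(should_close(5, 5))    # True: threshold reached
```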

@egalpin (Author)


Great idea with the 0 setting.

```
@@ -1331,6 +1332,8 @@ def _drop_global_statement_cache(self):
            # It is not possible to recover (the statement is already done at
            # the server's side), the only way is to drop our caches and
            # reraise the exception to the caller.
            #
            await self._maybe_close_bad_connection()
```


I'm unsure OutdatedSchemaCacheError is necessarily an indicator of a bad connection. Thoughts on this?

```
@@ -1356,15 +1359,27 @@ def _drop_global_statement_cache(self):
            # and https://github.com/MagicStack/asyncpg/issues/76
            # for discussion.
            #
            await self._maybe_close_bad_connection()
```


I'm unsure OutdatedSchemaCacheError is necessarily an indicator of a bad connection. Maybe move this down a few lines into the `if not retry` branch? Thoughts on this?

```
            self._drop_global_statement_cache()
            if self._protocol.is_in_transaction() or not retry:
                raise
            else:
                return await self._do_execute(
                    query, executor, timeout, retry=False)
        except:
```


As far as I can see from my (admittedly cursory) glance at the custom exceptions here, it looks like they all inherit from `Exception`. At the very least, catching that instead of a bare `except:` would be great, although it might be even better to go through and mark some as "potentially signifying a bad connection" and catch only those.

Just grabbing a random exception from that file as an example: `DeprecatedFeature` probably shouldn't lead to killing and restarting the connection, though (as you commented) `ReadOnlySQLTransactionError` definitely could indicate this case, and I am +1 on reacting to it.
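The narrower catch suggested here might look like this sketch, with stand-in classes rather than asyncpg's real exception types:

```python
# Sketch of classifying exceptions before tearing down a connection.
# These classes are stand-ins for asyncpg's exception hierarchy, used
# only to illustrate the "catch only bad-connection errors" idea.
class PostgresError(Exception):
    """Stand-in for asyncpg's server-error base class."""

class ReadOnlySQLTransactionError(PostgresError):
    """Stand-in: raised when writing to a read-only replica."""

class DeprecatedFeature(PostgresError):
    """Stand-in: a server notice that does not imply a dead connection."""

# Only these types would trigger connection teardown in this sketch.
BAD_CONNECTION_ERRORS = (ReadOnlySQLTransactionError, ConnectionError,
                         TimeoutError)

def indicates_bad_connection(exc):
    return isinstance(exc, BAD_CONNECTION_ERRORS)

print(indicates_bad_connection(ReadOnlySQLTransactionError()))  # True
print(indicates_bad_connection(DeprecatedFeature()))            # False
```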

```
            self._drop_global_statement_cache()
            if self._protocol.is_in_transaction() or not retry:
                raise
            else:
                return await self._do_execute(
                    query, executor, timeout, retry=False)
        except:
            logging.warning('exception - maybe closing bad connection')
```


I would move this into the `_maybe_close_bad_connection` method, and only log a warning when we actually close the connection.

Also -- this needs an `import logging`. Even better, initialize a logger before using it:

```python
import logging
# ...
log = logging.getLogger(__name__)
# ...
log.warning("I'm a configured log warning!")
```
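Putting both suggestions together, a sketch of logging only on an actual close could look like this (illustrative; the class and `_maybe_close_bad_connection` here are stand-ins, not asyncpg's code):

```python
# Sketch: the warning fires only when the connection is really closed,
# not on every exception, and 0 disables the feature entirely.
import logging

log = logging.getLogger(__name__)

class TrackedConnection:
    """Illustrative only; not asyncpg's Connection class."""

    def __init__(self, max_consecutive_exceptions=5):
        self._max_consecutive_exceptions = max_consecutive_exceptions
        self._consecutive_exceptions = 0
        self.closed = False

    def _maybe_close_bad_connection(self):
        self._consecutive_exceptions += 1
        if (self._max_consecutive_exceptions > 0
                and self._consecutive_exceptions
                >= self._max_consecutive_exceptions):
            # Warn only when we actually close, per the review comment.
            log.warning('closing connection after %d consecutive errors',
                        self._consecutive_exceptions)
            self.closed = True

conn = TrackedConnection(max_consecutive_exceptions=2)
conn._maybe_close_bad_connection()
conn._maybe_close_bad_connection()
print(conn.closed)  # True
```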

@1st1
Member

1st1 commented Jan 31, 2018

I have a strong feeling that this feature should be part of the connection pool. It already has the machinery to overload all methods on the Connection object. That way the Connection object (which already has a very complex implementation) stays unmodified.

There are many different things we want to do about the pool in asyncpg, including factoring it out into a separate generic package. That way we'll be able to have different connection pool implementations fine-tuned for specific use cases. So although I understand the motivation for this PR, I'm -1 on it in its current form.
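The pool-level approach described above could be sketched as a proxy that wraps `execute` (illustrative only; not asyncpg's actual pool or its connection proxy):

```python
# Sketch: a pool-side wrapper counts consecutive failures so the
# Connection class itself stays untouched. The pool would replace any
# connection flagged as needs_recycle.
class FailureTrackingProxy:
    def __init__(self, conn, max_consecutive_exceptions=5):
        self._conn = conn
        self._max = max_consecutive_exceptions
        self._failures = 0
        self.needs_recycle = False

    def execute(self, query):
        try:
            result = self._conn.execute(query)
        except Exception:
            self._failures += 1
            if self._max > 0 and self._failures >= self._max:
                self.needs_recycle = True  # pool should drop this connection
            raise
        self._failures = 0  # any success resets the streak
        return result

class FlakyConn:
    """Simulates a connection stuck on a read-only replica."""
    def execute(self, query):
        raise RuntimeError('read-only replica')

proxy = FailureTrackingProxy(FlakyConn(), max_consecutive_exceptions=2)
for _ in range(2):
    try:
        proxy.execute('INSERT ...')
    except RuntimeError:
        pass
print(proxy.needs_recycle)  # True
```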

@egalpin
Author

egalpin commented Feb 2, 2018

@1st1 Fair points. I'm working towards implementing this in the pool itself, just getting a bit lost in the pool/ch/proxy implementation at the moment.

@egalpin
Author

egalpin commented Feb 2, 2018

Rather than reverting work, it was simpler to just open a new PR. See #253

@egalpin closed this Feb 2, 2018
@1st1
Member

1st1 commented Feb 2, 2018

> @1st1 Fair points. I'm working towards implementing this in the pool itself, just getting a bit lost in the pool/ch/proxy implementation at the moment.

TBH I'd like to put this on hold till we figure out how exactly we want asyncpg pools to be refactored. For instance, I'm not sure that asyncpg.Pool should have this functionality, maybe we need a new asyncpg.HAPool with extra-stuff.
