
Adds consecutive exception tracker to connections #249

Closed
wants to merge 3 commits

Conversation

egalpin

@egalpin commented Jan 31, 2018

This change might not be for everyone, so I respect the right of the core maintainers to downvote/close.

The backstory here is that I am using asyncpg to connect to an HA Postgres cluster. In the event of a failover, the writer (which the asyncpg connection pool is connected to) crashes and is rebooted, and the reader is promoted to writer. The DB DNS is updated automatically, but this takes time. In addition, the connections in the pool continue operating once the crashed DB recovers, except that DB is now the reader (i.e. `SHOW transaction_read_only;` -> `on`). As a result, INSERT/UPDATE/DELETE operations result in:

```
asyncpg.exceptions.ReadOnlySQLTransactionError: cannot execute INSERT in a read-only transaction
```

Other errors, like timeouts, are also possible offenders. Unfortunately, the only way I've found to refresh the connections is to set a low `max_queries` config param. This is generally OK, but it degrades performance due to the increased churn of closing and opening connections.
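For illustration, the recycling behavior that `max_queries` provides can be sketched in pure Python (names here are illustrative; this is not asyncpg's implementation):

```python
# Illustrative sketch of the max_queries recycling semantics described
# above: after N queries, the connection is closed and replaced with a
# fresh one. A low N works around stale read-only connections at the
# cost of extra connection churn.
class RecyclingConnection:
    def __init__(self, max_queries=50):
        self.max_queries = max_queries
        self.queries_run = 0
        self.generation = 0  # bumped each time the connection is recycled

    def execute(self, query):
        self.queries_run += 1
        if self.queries_run >= self.max_queries:
            self._recycle()

    def _recycle(self):
        # In asyncpg, the pool closes the connection and opens a new one;
        # here we just track the churn.
        self.generation += 1
        self.queries_run = 0

conn = RecyclingConnection(max_queries=5)
for _ in range(12):
    conn.execute("INSERT ...")
print(conn.generation)  # 2: two full recycles after 12 queries
```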

With this PR, a configurable `max_consecutive_exceptions` config param is introduced. This param is checked every time `_do_execute` results in an exception of any kind. I would very much like to be more precise with the exception handling. Initially, I wanted to except `exceptions.__all__`, but some exception classes do not inherit from `Exception`, which causes errors of its own. I would certainly appreciate input on how to make the `_do_execute` catch-all more limited to asyncpg exceptions.
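The counting logic the PR describes could look roughly like this (a minimal sketch with illustrative names, not the exact code in the diff):

```python
# Sketch of a consecutive-exception tracker: back-to-back failures are
# counted, and any success resets the streak. Past the threshold, the
# connection would be torn down so the pool opens a fresh one.
class ConsecutiveExceptionTracker:
    def __init__(self, max_consecutive_exceptions=5):
        self.max_consecutive_exceptions = max_consecutive_exceptions
        self.consecutive_exceptions = 0

    def record_success(self):
        self.consecutive_exceptions = 0  # any success resets the counter

    def record_exception(self):
        self.consecutive_exceptions += 1
        # True means "close this connection now".
        return self.consecutive_exceptions >= self.max_consecutive_exceptions

tracker = ConsecutiveExceptionTracker(max_consecutive_exceptions=3)
results = [tracker.record_exception() for _ in range(3)]
print(results)  # [False, False, True]
```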


@TheKevJames left a comment


I'm personally very +1 on this change overall, given my review comments. The failover case, especially, is something that doesn't seem to be handleable with the asyncpg API otherwise.

```
@@ -1431,6 +1447,11 @@ def _drop_global_statement_cache(self):
            default). Pass ``0`` to allow all statements to be cached
            regardless of their size.

        :param int max_consecutive_exceptions:
```


Probably important to allow users to disable this, i.e. by passing `0` in the same way as many of the rest of these parameters.

@asyncpg/maintainers: personally, I think a default of 5-ish is pretty reasonable here, but depending on your preference on backwards compatibility we might want to default this to `0` for no behavior change.
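The "`0` disables it" convention suggested here could be sketched as (illustrative; the function name is a stand-in):

```python
# Sketch of the "0 disables the feature" convention, mirroring other
# asyncpg parameters where 0 means "no limit" (e.g.
# max_cacheable_statement_size). A 0 default also preserves the old
# behavior for existing users.
def should_close(consecutive_exceptions, max_consecutive_exceptions):
    if max_consecutive_exceptions <= 0:
        return False  # feature disabled: never close on error streaks
    return consecutive_exceptions >= max_consecutive_exceptions

print(should_close(100, 0))  # False: disabled, no matter the streak
print(should_close(5, 5))    # True: threshold reached
```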

@egalpin (Author)


Great idea with the 0 setting.

```
@@ -1331,6 +1332,8 @@ def _drop_global_statement_cache(self):
            # It is not possible to recover (the statement is already done at
            # the server's side), the only way is to drop our caches and
            # reraise the exception to the caller.
            #
            await self._maybe_close_bad_connection()
```


I'm unsure OutdatedSchemaCacheError is necessarily an indicator of a bad connection. Thoughts on this?

```
@@ -1356,15 +1359,27 @@ def _drop_global_statement_cache(self):
            # and https://github.com/MagicStack/asyncpg/issues/76
            # for discussion.
            #
            await self._maybe_close_bad_connection()
```


I'm unsure OutdatedSchemaCacheError is necessarily an indicator of a bad connection. Maybe move this down a few lines into the `if not retry` branch? Thoughts on this?

```
            self._drop_global_statement_cache()
            if self._protocol.is_in_transaction() or not retry:
                raise
            else:
                return await self._do_execute(
                    query, executor, timeout, retry=False)
        except:
```


As far as I can see from my (admittedly cursory) glance at the custom exceptions here, it looks like they all inherit from `Exception`. At the very least, catching that instead of a bare `except:` would be great, although it might be even better to go through and mark some as "potentially signifying a bad connection" and catch only those.

Just grabbing a random exception from that file as an example: `DeprecatedFeature` probably shouldn't lead to killing and restarting the connection, though (as you commented) `ReadOnlySQLTransactionError` definitely could indicate this case, and I am +1 on reacting to it.
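The narrower catch suggested here might look like this sketch, with stand-in classes rather than asyncpg's real exception types:

```python
# Sketch of classifying exceptions before tearing down a connection.
# These classes are stand-ins for asyncpg's exception hierarchy, used
# only to illustrate the "catch only bad-connection errors" idea.
class PostgresError(Exception):
    """Stand-in for asyncpg's server-error base class."""

class ReadOnlySQLTransactionError(PostgresError):
    """Stand-in: raised when writing to a read-only replica."""

class DeprecatedFeature(PostgresError):
    """Stand-in: a server notice that does not imply a dead connection."""

# Only these types would trigger connection teardown in this sketch.
BAD_CONNECTION_ERRORS = (ReadOnlySQLTransactionError, ConnectionError,
                         TimeoutError)

def indicates_bad_connection(exc):
    return isinstance(exc, BAD_CONNECTION_ERRORS)

print(indicates_bad_connection(ReadOnlySQLTransactionError()))  # True
print(indicates_bad_connection(DeprecatedFeature()))            # False
```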

```
            self._drop_global_statement_cache()
            if self._protocol.is_in_transaction() or not retry:
                raise
            else:
                return await self._do_execute(
                    query, executor, timeout, retry=False)
        except:
            logging.warning('exception - maybe closing bad connection')
```


I would move this into the `_maybe_close_bad_connection` method, and only log a warning when we actually close the connection.

Also -- this needs an `import logging`. Even better, initialize a logger before using it:

```python
import logging
# ...
log = logging.getLogger(__name__)
# ...
log.warning("I'm a configured log warning!")
```
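Putting both suggestions together, a sketch of logging only on an actual close could look like this (illustrative; the class and `_maybe_close_bad_connection` here are stand-ins, not asyncpg's code):

```python
# Sketch: the warning fires only when the connection is really closed,
# not on every exception, and 0 disables the feature entirely.
import logging

log = logging.getLogger(__name__)

class TrackedConnection:
    """Illustrative only; not asyncpg's Connection class."""

    def __init__(self, max_consecutive_exceptions=5):
        self._max_consecutive_exceptions = max_consecutive_exceptions
        self._consecutive_exceptions = 0
        self.closed = False

    def _maybe_close_bad_connection(self):
        self._consecutive_exceptions += 1
        if (self._max_consecutive_exceptions > 0
                and self._consecutive_exceptions
                >= self._max_consecutive_exceptions):
            # Warn only when we actually close, per the review comment.
            log.warning('closing connection after %d consecutive errors',
                        self._consecutive_exceptions)
            self.closed = True

conn = TrackedConnection(max_consecutive_exceptions=2)
conn._maybe_close_bad_connection()
conn._maybe_close_bad_connection()
print(conn.closed)  # True
```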

@1st1
Member

1st1 commented Jan 31, 2018

I have a strong feeling that this feature should be part of the connection pool. It already has the machinery to overload all methods on the Connection object. That way the Connection object (which already has a very complex implementation) stays unmodified.

There are many different things we want to do about the pool in asyncpg, including factoring it out into a separate generic package. That way we'll be able to have different connection pool implementations fine-tuned for specific use cases. So although I understand the motivation for this PR, I'm -1 on it in its current form.
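The pool-level approach described above could be sketched as a proxy that wraps `execute` (illustrative only; not asyncpg's actual pool or its connection proxy):

```python
# Sketch: a pool-side wrapper counts consecutive failures so the
# Connection class itself stays untouched. The pool would replace any
# connection flagged as needs_recycle.
class FailureTrackingProxy:
    def __init__(self, conn, max_consecutive_exceptions=5):
        self._conn = conn
        self._max = max_consecutive_exceptions
        self._failures = 0
        self.needs_recycle = False

    def execute(self, query):
        try:
            result = self._conn.execute(query)
        except Exception:
            self._failures += 1
            if self._max > 0 and self._failures >= self._max:
                self.needs_recycle = True  # pool should drop this connection
            raise
        self._failures = 0  # any success resets the streak
        return result

class FlakyConn:
    """Simulates a connection stuck on a read-only replica."""
    def execute(self, query):
        raise RuntimeError('read-only replica')

proxy = FailureTrackingProxy(FlakyConn(), max_consecutive_exceptions=2)
for _ in range(2):
    try:
        proxy.execute('INSERT ...')
    except RuntimeError:
        pass
print(proxy.needs_recycle)  # True
```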

@egalpin
Author

egalpin commented Feb 2, 2018

@1st1 Fair points. I'm working towards implementing this in the pool itself, just getting a bit lost in the pool/ch/proxy implementation at the moment.

@egalpin
Author

egalpin commented Feb 2, 2018

Rather than reverting work, it was simpler to just open a new PR. See #253

@egalpin closed this Feb 2, 2018
@1st1
Member

1st1 commented Feb 2, 2018

> @1st1 Fair points. I'm working towards implementing this in the pool itself, just getting a bit lost in the pool/ch/proxy implementation at the moment.

TBH I'd like to put this on hold till we figure out how exactly we want asyncpg pools to be refactored. For instance, I'm not sure that asyncpg.Pool should have this functionality, maybe we need a new asyncpg.HAPool with extra-stuff.
