
Proposal for disabling long-lived prepared statements as a cache by default for Connection methods #76

Closed
vitaly-burovoy opened this issue Feb 7, 2017 · 5 comments

@vitaly-burovoy (Contributor)

  • asyncpg version: asyncpg-0.8.4
  • PostgreSQL version: any
  • Python version: CPython-3.5
  • Platform: Linux
  • Do you use pgbouncer?: No
  • Do you use the wheel package or did you build locally (if locally, which version of Cython was used)?: Cython-0.25.1
  • Can your issue be reproduced when running under uvloop?: Yes

While commenting on #72, I noticed that asyncpg caches prepared statements by default for Connection methods like fetch and execute. I.e., it uses long-lived objects by default instead of just doing what it declares (sending a query to a connection); for explicitly long-lived statements there is already a dedicated "prepare" method.
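
A minimal sketch of the behaviour in question (the DSN is illustrative; statement_cache_size is the connect-time knob discussed below):

import asyncio
import asyncpg

async def main():
    # Default: each distinct query string passed to fetch()/execute()
    # is prepared once and kept in a per-connection LRU cache.
    conn = await asyncpg.connect('postgresql://localhost/test')
    await conn.fetch('SELECT $1::int', 1)   # prepares and caches the statement
    await conn.fetch('SELECT $1::int', 2)   # silently reuses the cached statement

    # The only opt-out is at connect time:
    conn2 = await asyncpg.connect('postgresql://localhost/test',
                                  statement_cache_size=0)
    await conn2.fetch('SELECT $1::int', 1)  # no long-lived statement is kept

asyncio.get_event_loop().run_until_complete(main())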

Prepared statements allow separating a query from its parameters, but they also have non-obvious disadvantages.

The first is that a PS pins the types of its result rows. If database objects have changed since the first run of a prepared statement ("PS"), all subsequent executions of that PS fail with PG's error, mapped to asyncpg.exceptions.FeatureNotSupportedError: cached plan must not change result type (see #72).
The most common cause of schema changes is a migration tool, and migration tools are used in every non-"Hello, world" project.
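
A minimal reproduction sketch (the table name is illustrative; in practice the ALTER usually comes from another connection, e.g. a migration tool):

>>> await conn.execute("CREATE TABLE tbl(a int)")
>>> await conn.fetch("SELECT * FROM tbl")    # the statement is prepared and cached
>>> await conn.execute("ALTER TABLE tbl ADD COLUMN b int")
>>> await conn.fetch("SELECT * FROM tbl")    # the stale cached statement is reused

asyncpg.exceptions.FeatureNotSupportedError: cached plan must not change result type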

The second is that a PS holds a "generic" plan[1] after five[2] runs, which is inefficient under some circumstances, because the generic plan does not check whether the concrete parameters match the table's statistics (see below). This can be very dramatic for OLAP queries. It is made worse by the inability to flush a Connection's cache (other than disabling it at connect time by passing statement_cache_size=0 to Connection.__init__).
Also note that many users use the Connection class indirectly through the Pool class and can easily miss the statement_cache_size argument.
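
For example, the knob has to be threaded through pool creation too (a sketch; create_pool forwards connection keyword arguments, and the DSN is illustrative):

pool = await asyncpg.create_pool('postgresql://localhost/test',
                                 statement_cache_size=0)
async with pool.acquire() as conn:
    await conn.fetch('SELECT i FROM abc WHERE i=$1 LIMIT 1', 1)  # nothing is cached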

The third is the inability to use external connection poolers like pgbouncer. There are many such issues in the tracker, and the new-issue template even has a dedicated question about it.

The fourth is that the cache silently occupies PS names, which can lead to errors:

>>> await conn.execute("CREATE TABLE IF NOT EXISTS abc(i int)")
>>> await conn.execute("PREPARE stmt_1 AS SELECT * from abc")
>>> await conn.fetch("EXECUTE stmt_1")

asyncpg.exceptions.DuplicatePreparedStatementError: prepared statement "stmt_1" already exists

So users may consciously avoid PSs in their own SQL because of these disadvantages, yet they can still end up with either a broken service (until restart), because all important queries fail after an external migration tool runs, or a dramatically reduced speed due to an inefficient generic plan. And all of that just because they did not pay attention to the statement_cache_size argument.

From my (a user's) point of view, these issues appear out of nowhere, because I work with Postgres through asyncpg's invisible (but important) mediation. In my mind I do not work with the library directly, just as I do not consciously move my wrist when I move a pointer across a screen.

The behavior with the cache disabled could also be made more efficient. libpq's PQsendQueryParams uses unnamed prepared statements, which adds only one round trip to the server (the "parse" stage) compared to the cached-PS behavior, whereas asyncpg sends one more request (besides "prepare") to Postgres to clear the just-used PS. An unnamed PS lasts only until the next "parse" command, so it does not require explicit cleanup.
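
Roughly, the extended-protocol message flow for an unnamed statement looks like this (an empty string selects the unnamed statement):

Parse("", query, param_types)   # the unnamed statement is simply replaced
                                # by the next Parse, so no Close is needed
Bind("", "", params)            # bind it to the unnamed portal
Describe("P", "")               # row description for the portal
Execute("")
Sync                            # a single ReadyForQuery at the end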

Your library can be fast. That is great! But make it stable by default, with minimal side effects from its internals, and let people make the decision themselves. Not all of them want top speed (some are ready to be slower in exchange for convenience, e.g. having their requests parsed; see issue #9).

Then, if you declare that it is possible to cache execution plans using PSs, I (as a user) expect the library to do its best to preserve the original behaviour (i.e. to catch "cached plan must not change result type"), except where that is genuinely impossible (inside transactions, except for the first command there; see the discussion in #72).

===
My proposal:

  1. Set statement_cache_size to 0 by default.
  2. If statement_cache_size == 0, use unnamed prepared statements and don't send a cleanup message (currently this takes an extra round trip).
  3. (optional) If statement_cache_size > 0, catch the discussed exception, at least outside transactions, and try to do so inside them as well.
  4. Of course, document the tradeoffs of statement_cache_size in its documentation section.

P.S.: About the solution proposed at #72 (comment):
I don't think it is wise to offer event triggers; that is the application level, not the library level.
Also:

  1. What is your plan for avoiding name collisions (with users' database objects)?
  2. The library allows connecting to PG 9.2, but event triggers are not supported there.
  3. Do users really have to take these steps just to use the library (add an event trigger to a migration tool and put a NOTIFY channel name into the connection params)?
  4. It is not guaranteed that the library receives a notification before sending a query (assembling a protocol message). By the time the server receives a message with an old prepared statement name, the schema object may already have changed, which raises "cached plan must not change result type". So notifications keep the system in a working state (not constantly failing until restart/reconnect), but they do not remove the source of the issue.
  5. Finally, it does not work at all for READ COMMITTED transactions. The notification is received after commit, but the new version of pg_catalog becomes visible in the middle of a transaction. I have just verified this.

P.P.S.:
I wrote a little bench which shows the speed decrease caused by generic plans for prepared statements in specific circumstances.
I got the following results:

Stage 1
elapsed:  3.405ms - (SELECT i FROM abc WHERE i=$1 LIMIT 1) [1]
elapsed:  6.975ms - (SELECT i FROM abc WHERE i=$1 LIMIT 1) [1]
elapsed:  0.923ms - (SELECT i FROM abc WHERE i=$1 LIMIT 1) [1]
elapsed:  0.590ms - (SELECT i FROM abc WHERE i=$1 LIMIT 1) [1]
elapsed:  0.508ms - (SELECT i FROM abc WHERE i=$1 LIMIT 1) [1]
Elapsed: 12.402ms

Stage 2
elapsed: 33.525ms - (SELECT i FROM abc WHERE i=$1 LIMIT 1) [0]
elapsed:  0.380ms - (SELECT i FROM abc WHERE i=$1 LIMIT 1) [1]
elapsed: 14.121ms - (SELECT i FROM abc WHERE i=$1 LIMIT 1) [2]
elapsed: 14.376ms - (SELECT i FROM abc WHERE i=$1 LIMIT 1) [3]
elapsed: 14.240ms - (SELECT i FROM abc WHERE i=$1 LIMIT 1) [4]
Elapsed: 76.640ms

Stage 3
elapsed:  1.050ms - (SELECT i FROM abc WHERE i=$1 LIMIT +1) [0]
elapsed:  0.459ms - (SELECT i FROM abc WHERE i=$1 LIMIT +1) [1]
elapsed:  0.455ms - (SELECT i FROM abc WHERE i=$1 LIMIT +1) [2]
elapsed:  0.375ms - (SELECT i FROM abc WHERE i=$1 LIMIT +1) [3]
elapsed:  0.288ms - (SELECT i FROM abc WHERE i=$1 LIMIT +1) [4]
Elapsed:  2.627ms

Stage 4
elapsed:  0.538ms - (SELECT i FROM abc WHERE i=$1 LIMIT 1 ) [0]
elapsed:  0.513ms - (SELECT i FROM abc WHERE i=$1 LIMIT 1  ) [1]
elapsed:  0.999ms - (SELECT i FROM abc WHERE i=$1 LIMIT 1   ) [2]
elapsed:  0.575ms - (SELECT i FROM abc WHERE i=$1 LIMIT 1    ) [3]
elapsed:  0.781ms - (SELECT i FROM abc WHERE i=$1 LIMIT 1     ) [4]
Elapsed:  3.406ms

Stage 5 (statement_cache_size=0)
elapsed:  1.327ms - (SELECT i FROM abc WHERE i=$1 LIMIT 1) [0]
elapsed:  0.858ms - (SELECT i FROM abc WHERE i=$1 LIMIT 1) [1]
elapsed:  0.834ms - (SELECT i FROM abc WHERE i=$1 LIMIT 1) [2]
elapsed:  0.861ms - (SELECT i FROM abc WHERE i=$1 LIMIT 1) [3]
elapsed:  0.777ms - (SELECT i FROM abc WHERE i=$1 LIMIT 1) [4]
Elapsed:  4.656ms

Stage 6 (parameters in SQL)
elapsed:  0.236ms - (SELECT i FROM abc WHERE i=0 LIMIT 1) [None]
elapsed:  0.232ms - (SELECT i FROM abc WHERE i=1 LIMIT 1) [None]
elapsed:  0.234ms - (SELECT i FROM abc WHERE i=2 LIMIT 1) [None]
elapsed:  0.226ms - (SELECT i FROM abc WHERE i=3 LIMIT 1) [None]
elapsed:  0.244ms - (SELECT i FROM abc WHERE i=4 LIMIT 1) [None]
Elapsed:  1.172ms

Concrete values vary from run to run, but Stage 2 is always much bigger than the other ones (with uvloop it is even more dramatic).

Pay attention to the difference between stages 2 and 3.
The second stage is slow because it uses a prepared statement that holds a generic plan (acquired after the first stage) with a SeqScan. I recommend repeating this with real prepared statements, running EXPLAIN ANALYZE on them instead of a plain EXECUTE; see [1].
The third stage uses a different prepared statement (thanks to the '+' sign) with a different execution plan in the same connection.
The fourth stage uses completely different query strings and imitates libpq's PQsendQueryParams. There is a 1.5x to 2x slowdown, but I do not know how much of that is Connection._get_statement internals (PG's parse time is 0.040ms, plus round-trip time).
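
Following that suggestion, the plan switch can be observed directly with an explicit PREPARE and EXPLAIN ANALYZE EXECUTE (a sketch; the DSN and data are illustrative, and the cache is disabled so asyncpg's own statements don't interfere):

import asyncio
import asyncpg

async def main():
    conn = await asyncpg.connect('postgresql://localhost/test',
                                 statement_cache_size=0)
    await conn.execute('CREATE TABLE abc(i int)')
    # Heavily skewed data: value 1 dominates, values 2..10 are rare.
    await conn.execute('INSERT INTO abc SELECT 1 FROM generate_series(1, 1000000)')
    await conn.execute('INSERT INTO abc SELECT g FROM generate_series(2, 10) g')
    await conn.execute('CREATE INDEX ON abc(i)')
    await conn.execute('ANALYZE abc')
    await conn.execute('PREPARE stmt(int) AS SELECT i FROM abc WHERE i = $1 LIMIT 1')
    for run in range(1, 7):
        # The first five executions are planned for the concrete parameter
        # (an Index Scan for a rare value); from the sixth one the server
        # may switch to a cached generic plan (a Seq Scan).
        rows = await conn.fetch('EXPLAIN ANALYZE EXECUTE stmt(2)')
        print('run', run, '->', rows[0][0])

asyncio.get_event_loop().run_until_complete(main())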

For this concrete example, the speed loss between:

  • stages 3 and 4 is 1.27x
  • stages 3 and 5 is 1.77x
  • stages 4 and 2 is 22x!
  • stages 5 and 2 is 16x!

I think that when users hit that speed loss, they will look anywhere except at the statement_cache_size argument.
Also, those who know their data and require truly maximal speed will use approaches like Stage 6 (2x faster than Stage 3!), i.e. they will not use your cache at all.

[1] https://www.postgresql.org/docs/current/static/sql-prepare.html#SQL-PREPARE-NOTES
[2] https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/utils/cache/plancache.c#l1037
If the file has changed, search for "if (plansource->num_custom_plans < 5)" near that location.

@elprans (Member) commented Feb 7, 2017

@vitaly-burovoy First of all, thanks for looking into these things! This is a good discussion to have.

That said, I don't think that all your points are necessarily valid grounds for disabling the statement cache by default. I think this topic warrants further discussion and feedback from the users.

I'll go through each point.

> The first is that a PS pins the types of its result rows. If database objects have changed since the first run of a prepared statement ("PS"), all subsequent executions of that PS fail with PG's error, mapped to asyncpg.exceptions.FeatureNotSupportedError: cached plan must not change result type (see #72).
> The most common cause of schema changes is a migration tool, and migration tools are used in every non-"Hello, world" project.

Postgres JDBC [1] and Rails ActiveRecord [2] use cached prepared statements by default. I'd like to think that this is proof that it can be done and can work beneficially. Specifically, AR handles the "cached plan" issue [3] with the following algorithm (see the sketch after this list):

  1. Catch the exception.
  2. Drop the prepared statement cache.
  3. If not in a transaction, re-plan and re-run the query.
  4. If inside a transaction, rollback and raise a special exception that can be caught and handled by the application.
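
In asyncpg terms the recovery logic might look roughly like the sketch below (the wrapper name is illustrative and this is not asyncpg's actual API; a real driver would also drop its internal statement cache in step 2, for which there is no public hook here):

import asyncpg

async def fetch_with_replan(conn, query, *args):
    try:
        return await conn.fetch(query, *args)
    except asyncpg.FeatureNotSupportedError as e:
        # 1. Catch only the stale-plan error.
        if 'cached plan must not change result type' not in str(e):
            raise
        # 4. Inside a transaction the connection is in an error state;
        #    let the caller roll back and retry the whole transaction.
        if conn.is_in_transaction():
            raise
        # 3. Outside a transaction it is safe to re-plan and re-run.
        return await conn.fetch(query, *args)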

The maximum risk with this approach is that a very limited number of requests to the database might fail right after a live migration. But that can happen for a bunch of other reasons as well, and if that part of your application is so critical that it cannot tolerate a failed transaction, you should be writing retry logic anyway.

> The second is that a PS holds a "generic" plan after five runs, which is inefficient under some circumstances, because the generic plan does not check whether the concrete parameters match the table's statistics (see below). This can be very dramatic for OLAP queries. It is made worse by the inability to flush a Connection's cache (other than disabling it at connect time by passing statement_cache_size=0 to Connection.__init__).
> Also note that many users use the Connection class indirectly through the Pool class and can easily miss the statement_cache_size argument.

If 99.99% of your table column consists of the same value, then creating an index on it is pointless and will just hurt write performance for no benefit. Your example is an extremely pathological case. In practice, Postgres' query cost stats are usually fairly accurate, and the server will only choose the generic plan if the last five custom plans were of similar cost.

Thus, I'm not convinced. Given [1] and [2], I'd like to see some real reports of users experiencing significant performance issues on their data before making a conclusion that caching prepared statements by default is bad for the majority of cases.

> The third is the inability to use external connection poolers like pgbouncer. There are many such issues in the tracker, and the new-issue template even has a dedicated question about it.

pgbouncer does not support prepared statements, period. Disabling the cache will not help, as any query with parameters will still go through the PREPARE/EXECUTE path. I think pgbouncer is broken, as it violates the PostgreSQL protocol. There is an alternative that works: pgpool.

> The fourth is that the cache silently occupies PS names, which can lead to errors:

This is trivially fixable by choosing a prefix that is unlikely to collide: '__asyncpg__' or something like that.

Bottom line: a) there are straightforward fixes for the "cached plan" issue and the name collision issue; b) the generic plan issue is debatable, and we'd like to see some real-world feedback before making a decision; c) we can't do anything about pgbouncer anyway.


[1] JDBC connection parameters
[2] RoR docs, blog post on the topic
[3] rails/rails#22170

1st1 added a commit that referenced this issue Feb 22, 2017
@vitaly-burovoy (Contributor, Author) commented Feb 28, 2017

> I think this topic warrants further discussion and feedback from the users.

I'm sorry for the very late response. Unfortunately, nobody has joined the discussion to support either side.

> Postgres JDBC [1] and ... use cached prepared statements by default.

If I'm wrong, please correct me; I'm neither a Java user nor a PgJDBC user.

What I found at link [1]: prepareThreshold, preparedStatementCacheQueries and preparedStatementCacheSizeMiB affect prepareStatement() calls, so that these parameters do not have to be set for each created prepared statement object.

But there is a property [2] whose default is extended, and a comment there explicitly explains that it differs from the extendedCacheEverything value (which means "try to cache every statement"). This is also confirmed by a check in the executeCachedSql function [3] for SQL passed as a simple string.

> ... and Rails ActiveRecord use cached prepared statements by default.

No doubt about that. But ActiveRecord is a higher-level abstraction (I'd compare ActiveRecord-to-asyncpg as HTTP-to-TCP/IP). Users who adopt it read its docs to learn how it works and what its side effects are (e.g. special exceptions). So this example is not applicable.

> Specifically, AR handles the "cached plan" issue with the following algorithm:

Yes, it is a good algorithm for the case statement_cache_size > 0.

> ... and if that part of your application is so critical that it cannot tolerate a failed transaction, you should be writing retry logic anyway.

Sure. I created this issue because I expect the cached plan must not change result type exception for explicitly prepared statements, and would write a retry wrapper there, not for simple queries, where that exception should not be raised by default. If I decide to set statement_cache_size > 0 after reading the docs (where a note should be placed), I will expect that exception for simple queries too, because I will know they are in fact prepared statements.

> If 99.99% of your table column consists of the same value, then creating an index on it is pointless and will just hurt write performance for no benefit.

OK, change:

- await conn.execute("CREATE INDEX ON abc(i);")
+ await conn.execute("CREATE INDEX ON abc(i) WHERE i <> 1;")

and get the same result without impacting write performance.

Also note that statistics change while a table is being filled/updated and during (auto-)ANALYZE, but a prepared statement will still use its generic plan. And pay attention that my examples use very simple queries; OLAP queries can be several kB long, with multiple joins, for which a static generic plan is pretty bad.

> Your example is an extremely pathological case.

Yes, because it is an example. I can't give you my real data, but such cases do occur.

> Disabling the cache will not help, as any query with parameters will still go through the PREPARE/EXECUTE path.

Yes, but if there is no "Sync" message between them, it is a single transaction ("ReadyForQuery" is sent only once, after EXECUTE), and if unnamed statements are used, they cannot conflict with prepared statement names from other connections (if other services also use asyncpg, conflicts happen no matter which name prefix is used, "stmt_" or "_asyncpg_stmt").

> I think pgbouncer is broken, as it violates the PostgreSQL protocol.

I don't think so. It just pays attention to the 'ReadyForQuery' message [4], which is sent as the answer to the 'Sync' message. If the current state is "idle", pgbouncer can reuse the server connection for another client.

Bottom line:
a) PgJDBC does not cache queries in a connection by default;
b) ActiveRecord is a wrapper above a driver to Postgres;
c) a driver for Postgres should provide maximum features, be safe by default, and offer optimizations as opt-in;
d) it is possible to change asyncpg to work with pgbouncer.


[1] JDBC connection parameters
[2] "preferQueryMode" property
[3] "shouldCache" check for executing "String sql"
[4] pgbouncer's pooling decisions

elprans added a commit that referenced this issue Mar 30, 2017
PostgreSQL will raise an exception when it detects that the result type of the
query has changed from when the statement was prepared.  This may happen, for
example, after an ALTER TABLE or SET search_path.

When this happens, and there is no transaction running, we can simply
re-prepare the statement and try again.

If the transaction _is_ running, this error will put it into an error state,
and we have no choice but to raise an exception.  The original error is
somewhat cryptic, so we raise a custom InvalidCachedStatementError with the
original server exception as context.

In either case we clear the statement cache for this connection and all other
connections of the pool this connection belongs to (if any).

See #72 and #76 for discussion.

Fixes: #72.
1st1 added a commit that referenced this issue Mar 31, 2017
The max_cached_statement_lifetime parameter allows asyncpg to refresh cached prepared statements periodically.

See also issue #76.
@1st1 (Member) commented Apr 5, 2017

In asyncpg 0.10.0 we made some changes to address some of the problems you highlighted, specifically:

  • Add max_cached_statement_lifetime parameter to asyncpg.connect().
  • Invalidate statement cache on schema changes affecting statement result.
  • Use anonymous prepared statements when the cache is off.
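
A usage sketch of the new knobs (parameter names as listed above; the DSN is illustrative):

conn = await asyncpg.connect(
    'postgresql://localhost/test',
    statement_cache_size=100,            # LRU capacity; 0 disables the cache
    max_cached_statement_lifetime=300,   # re-prepare cached statements after 300s
)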

I believe that at this point the discussion boils down to one question: should we disable the statement cache by default or not. Our position is to have the cache enabled by default, because:

  1. in most cases the cache positively affects performance;
  2. even if you use anonymous prepared statements there's still a race condition, because the schema might change between the statement being parsed and the statement being executed.

Thank you for the discussion, Vitaly, it really helped us make asyncpg better. We might reconsider if there's more evidence to support disabling the cache (feel free to reopen the issue).

1st1 closed this as completed Apr 5, 2017
@vitaly-burovoy (Contributor, Author)

> Our position is to have the cache enabled by default, because:

OK, at least now people who need pgbouncer (or who know for sure that caching does not fit their case) can set statement_cache_size to 0 after reading the docs and get correct results.

> if there's more evidence

I don't think people will post issues about it, since they can simply disable the cache.
Moreover, nobody besides ourselves took part in this discussion.
Nevertheless, let's leave it enabled if you think it helps new users get better results.

> Thank you for the discussion,

Thank you very much! It was a very exciting discussion.

@davidtgq commented Nov 6, 2017

How about using a keyword argument to give the user explicit control over whether the cache should be used?

connection.fetch('SQL', *args, cache_ps=True/False)

If cache_ps is unspecified, use the connection's default cache_ps; if that is also unspecified, fall back to the existing LRU cache behavior.
