is there any way asyncpg could be configured to not do a giant query for PG custom types when you prepare #1078

zzzeek · 2023-09-17T14:11:24Z

hey there -

as you know, SQLAlchemy relies upon knowing what the names of columns will be in result sets, and for asycnpg that forces us to use PreparedStatements for statements that we want to fetch rows from.

When the preparedstatement has a custom type like an ENUM inside of it, asyncpg does a giant query up front to cache information about the datatype, here it is from my test case below inside my PG SQL log:

2023-09-17 10:04:41.545 EDT [1728595] LOG:  execute __asyncpg_stmt_3__: WITH RECURSIVE typeinfo_tree(
	    oid, ns, name, kind, basetype, elemtype, elemdelim,
	    range_subtype, attrtypoids, attrnames, depth)
	AS (
	    SELECT
	        ti.oid, ti.ns, ti.name, ti.kind, ti.basetype,
	        ti.elemtype, ti.elemdelim, ti.range_subtype,
	        ti.attrtypoids, ti.attrnames, 0
	    FROM
	            (
	        SELECT
	            t.oid                           AS oid,
	            ns.nspname                      AS ns,
	            t.typname                       AS name,
	            t.typtype                       AS kind,
	            (CASE WHEN t.typtype = 'd' THEN
	                (WITH RECURSIVE typebases(oid, depth) AS (
	                    SELECT
	                        t2.typbasetype      AS oid,
	                        0                   AS depth
	                    FROM
	                        pg_type t2
	                    WHERE
	                        t2.oid = t.oid
	
	                    UNION ALL
	
	                    SELECT
	                        t2.typbasetype      AS oid,
	                        tb.depth + 1        AS depth
	                    FROM
	                        pg_type t2,
	                        typebases tb
	                    WHERE
	                       tb.oid = t2.oid
	                       AND t2.typbasetype != 0
	               ) SELECT oid FROM typebases ORDER BY depth DESC LIMIT 1)
	
	               ELSE NULL
	            END)                            AS basetype,
	            t.typelem                       AS elemtype,
	            elem_t.typdelim                 AS elemdelim,
	            COALESCE(
	                range_t.rngsubtype,
	                multirange_t.rngsubtype)    AS range_subtype,
	            (CASE WHEN t.typtype = 'c' THEN
	                (SELECT
	                    array_agg(ia.atttypid ORDER BY ia.attnum)
	                FROM
	                    pg_attribute ia
	                    INNER JOIN pg_class c
	                        ON (ia.attrelid = c.oid)
	                WHERE
	                    ia.attnum > 0 AND NOT ia.attisdropped
	                    AND c.reltype = t.oid)
	
	                ELSE NULL
	            END)                            AS attrtypoids,
	            (CASE WHEN t.typtype = 'c' THEN
	                (SELECT
	                    array_agg(ia.attname::text ORDER BY ia.attnum)
	                FROM
	                    pg_attribute ia
	                    INNER JOIN pg_class c
	                        ON (ia.attrelid = c.oid)
	                WHERE
	                    ia.attnum > 0 AND NOT ia.attisdropped
	                    AND c.reltype = t.oid)
	
	                ELSE NULL
	            END)                            AS attrnames
	        FROM
	            pg_catalog.pg_type AS t
	            INNER JOIN pg_catalog.pg_namespace ns ON (
	                ns.oid = t.typnamespace)
	            LEFT JOIN pg_type elem_t ON (
	                t.typlen = -1 AND
	                t.typelem != 0 AND
	                t.typelem = elem_t.oid
	            )
	            LEFT JOIN pg_range range_t ON (
	                t.oid = range_t.rngtypid
	            )
	            LEFT JOIN pg_range multirange_t ON (
	                t.oid = multirange_t.rngmultitypid
	            )
	    )
	 AS ti
	    WHERE
	        ti.oid = any($1::oid[])
	
	    UNION ALL
	
	    SELECT
	        ti.oid, ti.ns, ti.name, ti.kind, ti.basetype,
	        ti.elemtype, ti.elemdelim, ti.range_subtype,
	        ti.attrtypoids, ti.attrnames, tt.depth + 1
	    FROM
	            (
	        SELECT
	            t.oid                           AS oid,
	            ns.nspname                      AS ns,
	            t.typname                       AS name,
	            t.typtype                       AS kind,
	            (CASE WHEN t.typtype = 'd' THEN
	                (WITH RECURSIVE typebases(oid, depth) AS (
	                    SELECT
	                        t2.typbasetype      AS oid,
	                        0                   AS depth
	                    FROM
	                        pg_type t2
	                    WHERE
	                        t2.oid = t.oid
	
	                    UNION ALL
	
	                    SELECT
	                        t2.typbasetype      AS oid,
	                        tb.depth + 1        AS depth
	                    FROM
	                        pg_type t2,
	                        typebases tb
	                    WHERE
	                       tb.oid = t2.oid
	                       AND t2.typbasetype != 0
	               ) SELECT oid FROM typebases ORDER BY depth DESC LIMIT 1)
	
	               ELSE NULL
	            END)                            AS basetype,
	            t.typelem                       AS elemtype,
	            elem_t.typdelim                 AS elemdelim,
	            COALESCE(
	                range_t.rngsubtype,
	                multirange_t.rngsubtype)    AS range_subtype,
	            (CASE WHEN t.typtype = 'c' THEN
	                (SELECT
	                    array_agg(ia.atttypid ORDER BY ia.attnum)
	                FROM
	                    pg_attribute ia
	                    INNER JOIN pg_class c
	                        ON (ia.attrelid = c.oid)
	                WHERE
	                    ia.attnum > 0 AND NOT ia.attisdropped
	                    AND c.reltype = t.oid)
	
	                ELSE NULL
	            END)                            AS attrtypoids,
	            (CASE WHEN t.typtype = 'c' THEN
	                (SELECT
	                    array_agg(ia.attname::text ORDER BY ia.attnum)
	                FROM
	                    pg_attribute ia
	                    INNER JOIN pg_class c
	                        ON (ia.attrelid = c.oid)
	                WHERE
	                    ia.attnum > 0 AND NOT ia.attisdropped
	                    AND c.reltype = t.oid)
	
	                ELSE NULL
	            END)                            AS attrnames
	        FROM
	            pg_catalog.pg_type AS t
	            INNER JOIN pg_catalog.pg_namespace ns ON (
	                ns.oid = t.typnamespace)
	            LEFT JOIN pg_type elem_t ON (
	                t.typlen = -1 AND
	                t.typelem != 0 AND
	                t.typelem = elem_t.oid
	            )
	            LEFT JOIN pg_range range_t ON (
	                t.oid = range_t.rngtypid
	            )
	            LEFT JOIN pg_range multirange_t ON (
	                t.oid = multirange_t.rngmultitypid
	            )
	    )
	 ti,
	        typeinfo_tree tt
	    WHERE
	        (tt.elemtype IS NOT NULL AND ti.oid = tt.elemtype)
	        OR (tt.attrtypoids IS NOT NULL AND ti.oid = any(tt.attrtypoids))
	        OR (tt.range_subtype IS NOT NULL AND ti.oid = tt.range_subtype)
	        OR (tt.basetype IS NOT NULL AND ti.oid = tt.basetype)
	)
	
	SELECT DISTINCT
	    *,
	    basetype::regtype::text AS basetype_name,
	    elemtype::regtype::text AS elemtype_name,
	    range_subtype::regtype::text AS range_subtype_name
	FROM
	    typeinfo_tree
	ORDER BY
	    depth DESC
	
2023-09-17 10:04:41.545 EDT [1728595] DETAIL:  parameters: $1 = '{2732440}'

This is I assume once per connection/type, but we still are getting performance concerns about it as we see in #10356. SQLAlchemy uses a connection pool by default however this is still an upfront cost and some folks don't use the pool.

I would assume the purpose of this query has to do with get_attributes() having all the information about the oid for the type.

We don't actually need the "type" part of get_attributes(), just the names (we already know the types on our end). Is there a possibility asyncpg could have some kind of connection parameter, or prepared statement parameter, that is something to the effect use_varchar_for_custom_type or omit_custom_type or something such that this giant query on prepare can be skipped?

Demo that produces the query in question:

import asyncio

import asyncpg


async def main():
    conn = await asyncpg.connect(
        user="scott", password="tiger", database="test"
    )

    await conn.execute("DROP TABLE IF EXISTS mood_test")
    await conn.execute(
        "DROP TYPE IF EXISTS mood"
    )
    await conn.execute(
        "CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy')"
    )

    await conn.execute(
        "CREATE TABLE mood_test (id SERIAL primary key, current_mood mood)"
    )

    await conn.fetch("SELECT 'right before the prepare'")
    pp = await conn.prepare("SELECT id, current_mood FROM mood_test")

    await conn.fetch("SELECT 'right after the prepare'")

    cols = pp.get_attributes()

    await conn.fetch("SELECT 'right after the get_attributes'")

    print(f"attributes: {cols}")

    await conn.close()

asyncio.run(main())

The text was updated successfully, but these errors were encountered:

elprans · 2023-09-19T05:25:03Z

The issue is the Postgres JIT that gets erroneously invoked on the query. Turning jit off makes it run under a millisecond. I have a patch that explicitly disables jit for introspection queries. Will push soon.

zzzeek · 2023-09-19T15:03:49Z

oh i see, it would still run the query but the PG server side of it will just push it through without trying to optimize it, IIUC

elprans · 2023-09-19T15:24:27Z

oh i see, it would still run the query but the PG server side of it will just push it through without trying to optimize it, IIUC

Yes. The fact that Postgres is even trying to JIT-compile this query is arguably a Postgres cost estimation bug, because the cost of compilation here far outweighs any runtime benefits. The query is plenty fast as-is and there isn't any heavy expression computation that might benefit from compiling it to machine code.

elprans · 2023-09-19T15:33:42Z

Also, to better answer your original question, the introspection query is necessary because Postgres does not send any type information in the course of normal query flow other than type OIDs. In order to actually be able to decode data bytes, asyncpg must know what it is decoding, so the introspection query is ran whenever there is a previously unseen type OID (neither a standard type nor a previously introspected one).

You can remove the need for introspection by pre-populating the type OID mapping via set_type_codec() or set_builtin_type_codec(), however that requires you to know which types are what kind ahead of time.

zzzeek · 2023-09-19T15:42:40Z

by "decode" I assume you mean on the result fetching side, does that mean this same introspection occurs when using conn.fetch() without a prepared statement? Other drivers seem to be willing to let unknown OIDs be passed through as bytes and/or bytes that they somehow guess can be treated as strings, but I haven't confirmed they aren't sneaking in OID instrospection in there somehow

zzzeek · 2023-09-19T15:46:36Z

that is, we have some such codecs set up in our impl for this, like this:

        await asyncpg_connection.set_type_codec(
            "cidr",
            encoder=lambda s: s,
            decoder=lambda s: s,
            schema="pg_catalog",
            format="text",
        )

I guess other drivers do essentially that for "unknown" oids. If we had that ability here, I dont think it would cause problems for us, since we dont expect custom types besides ENUM to have any particular behaviors.

doesn't matter much, if the query's time is reduced then we wont get more complaints :)

The misapplication of JIT to asyncpg introspection queries has been a constant source of user complaints. Closes: #530 Closes: #1078 Previously: #875, #794, #782, #741, #727 (and probably more).

amotl mentioned this issue Sep 21, 2023

asyncpg issues a WITH RECURSIVE query when using float[] crate/crate-clients-tools#52

Open

elprans mentioned this issue Sep 23, 2023

Disable JIT while doing type introspection #1082

Merged

elprans closed this as completed in #1082 Oct 7, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

is there any way asyncpg could be configured to not do a giant query for PG custom types when you prepare #1078

is there any way asyncpg could be configured to not do a giant query for PG custom types when you prepare #1078

zzzeek commented Sep 17, 2023 •

edited

Loading

elprans commented Sep 19, 2023

zzzeek commented Sep 19, 2023

elprans commented Sep 19, 2023

elprans commented Sep 19, 2023 •

edited

Loading

zzzeek commented Sep 19, 2023

zzzeek commented Sep 19, 2023

is there any way asyncpg could be configured to not do a giant query for PG custom types when you prepare #1078

is there any way asyncpg could be configured to not do a giant query for PG custom types when you prepare #1078

Comments

zzzeek commented Sep 17, 2023 • edited Loading

elprans commented Sep 19, 2023

zzzeek commented Sep 19, 2023

elprans commented Sep 19, 2023

elprans commented Sep 19, 2023 • edited Loading

zzzeek commented Sep 19, 2023

zzzeek commented Sep 19, 2023

zzzeek commented Sep 17, 2023 •

edited

Loading

elprans commented Sep 19, 2023 •

edited

Loading