Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slow SELECT query on large table #741

Closed
mlexs opened this issue Apr 20, 2021 · 8 comments
Closed

Slow SELECT query on large table #741

mlexs opened this issue Apr 20, 2021 · 8 comments

Comments

@mlexs
Copy link

mlexs commented Apr 20, 2021

  • asyncpg version: 0.22.0
  • PostgreSQL version: 11.10
  • Do you use a PostgreSQL SaaS? If so, which? Can you reproduce
    the issue with a local PostgreSQL install?
    : No, and I was able to reproduce it locally.
  • Python version: 3.7.9
  • Platform: Fedora 31
  • Do you use pgbouncer?: No
  • Did you install asyncpg with pip?: Yes
  • If you built asyncpg locally, which version of Cython did you use?:
  • Can the issue be reproduced under both asyncio and
    uvloop?
    : We use Starlette for serving API requests.

We have a large (~30 million rows) database table. I have noticed the SQL would run terribly slow when executed via asyncpg but very fast via psql.

Schema for table:

CREATE TABLE addresses(
    zipcode VARCHAR(12),
    line1 text,
    -- 13x more fields of text type
);
CREATE INDEX z_idx ON addresses(zipcode, line1);

Query to execute.

SELECT line_1, ..., zipcode FROM addresses WHERE REPLACE(zipcode, ' ', '')=$1 GROUP BY A, B, line1 ORDER BY A, B

This query never results in more than 100 rows (out of mentioned 30 million).

Time to execute via psql: less than 1 ms
Same query via asyncpg: more than 6 seconds (!)

I have not looked at source code for asyncpg so not sure what's going on here.

Can someone tell me why asyncpg runs this query so slow? Thanks!

FYI I have fixed this "temporarily" by some quick data normalisation where I added zipcode_x that contains no space so I could got rid of the SQL's REPLACE function.

SELECT line_1, ..., zipcode FROM addresses WHERE zipcode=$1 GROUP BY A, B, line1 ORDER BY A, B

and now asyncpg is very fast (as it should be).

@elprans
Copy link
Member

elprans commented Apr 20, 2021

The REPLACE operation effectively negates the index you created on zipcode. Try creating an expression index instead: CREATE INDEX z_idx ON addresses(REPLACE(zipcode, ' ', ''), line1);

@mlexs
Copy link
Author

mlexs commented Apr 21, 2021

@elprans thanks. your solution does not solve the issue we had with asyncpg. NB the REPLACE in pure SQL is fast. same SQL in asyncpg is slow. that's the issue here.

@elprans
Copy link
Member

elprans commented Apr 21, 2021

Well, my best guess would be that Postgres is grossly mis-planning the query somehow if a prepared statement is used. Try statement_cache_size=0 when connecting and see if that helps.

@mlexs
Copy link
Author

mlexs commented Apr 29, 2021

This does not use a prepared statement. Tried statement_cache_size & hadn't helped.

@mlexs
Copy link
Author

mlexs commented Apr 29, 2021

TL;DR our query is fine in PSQL client, but somehow slow when executed via asyncpg. 👀

@hipertracker
Copy link

hipertracker commented May 16, 2021

Did you try to reproduce that issue with the newer versions of Postgres (13.3) and Python (3.9.5)?

@Tomcat-Engineering
Copy link

I just had very similar symptoms, with a query that was fast in psql taking 6 seconds via asyncpg.

After a lot of testing I narrowed it down to having ENUM columns in the query results. My query returning 16 columns took 15ms... adding an enum column made that into 1.5 seconds, and adding a second enum column made that into 6 seconds!

The fix discussed at #530 (comment) seemed to work, and reduced the time back down to 20ms, so I guess this is related to the postgres JIT rather than something in asyncpg. Perhaps that might be the cause in your query too?

@mlexs
Copy link
Author

mlexs commented Jun 17, 2021

Thanks @Tomcat-Engineering - the JIT issue sounds like a good rabbit hole to follow 😃

Unfortunately, I no longer have means to test the above using asyncpq .... since I switched to Sqlite as our huge database is generated once a month and it's read-only 100% of time so the little dB engine is more suited for now.

@mlexs mlexs closed this as completed Jun 17, 2021
elprans added a commit that referenced this issue Sep 23, 2023
The misapplication of JIT to asyncpg introspection queries has been a
constant source of user complaints.

Closes: #530
Closes: #1078
Previously: #875, #794, #782, #741, #727 (and probably more).
elprans added a commit that referenced this issue Oct 7, 2023
The misapplication of JIT to asyncpg introspection queries has been a
constant source of user complaints.

Closes: #530
Closes: #1078
Previously: #875, #794, #782, #741, #727 (and probably more).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants