Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Basic collection-membership query can be very slow #92

Open
nevali opened this issue Sep 19, 2016 · 7 comments
Open

Basic collection-membership query can be very slow #92

nevali opened this issue Sep 19, 2016 · 7 comments

Comments

@nevali
Copy link
Member

nevali commented Sep 19, 2016

Actual query executed by the Spindle module for Quilt:

SELECT "i"."id", "i"."classes", "i"."title", "i"."description", "i"."coordinates", "i"."modified" FROM "index" "i" INNER JOIN "membership" "cm" ON ("i"."id" = "cm"."id" AND "cm"."collection" = 'cbe0cdeca6b340b19ed57d7290fcceaf') WHERE "i"."score" <= 40 ORDER BY "modified" DESC LIMIT 26;

Notably, this actually returns an empty set (because nothing is a member of cbe0cdeca6b340b19ed57d7290fcceaf), but takes some time to do it. A query only on the membership table returns instantly, suggesting that the query planner is performing the join in a non-optimal order.

We should investigate how best to hint to the planner that the collection-membership constraint is one which has a low cost and high benefit in terms of filtering rows from the joined tables.

Note that reversing the join order doesn't appear to make a material difference to query performance.

Related to #72.
Possibly the cause of some of the issues ascribed to #87.

Internal tracking: RESDATA-1093

@nevali
Copy link
Member Author

nevali commented Sep 20, 2016

Even after a VACUUM ANALYZE and ANALYZE of the index and membership tables this is still very slow on PostgreSQL.

@CygnusAlpha
Copy link

CygnusAlpha commented Oct 12, 2016

Would an index on 'score' and 'modified' help?

https://www.postgresql.org/docs/8.3/static/indexes-ordering.html

@rjpwork
Copy link

rjpwork commented Nov 28, 2016

partir01=# explain SELECT "i"."id", "i"."classes", "i"."title", "i"."description", "i"."coordinates", "i"."modified" FROM "index" "i" INNER JOIN "membership" "cm" ON ("i"."id" = "cm"."id" AND "cm"."collection" = 'cbe0cdeca6b340b19ed57d7290fcceaf') WHERE "i"."score" <= 40 ORDER BY "modified" DESC LIMIT 26;
                                                QUERY PLAN                                                
----------------------------------------------------------------------------------------------------------
 Limit  (cost=30246.79..30246.86 rows=26 width=160)
   ->  Sort  (cost=30246.79..30249.88 rows=1236 width=160)
         Sort Key: i.modified DESC
         ->  Nested Loop  (cost=71.43..30211.56 rows=1236 width=160)
               ->  Bitmap Heap Scan on membership cm  (cost=71.01..10953.88 rows=2896 width=16)
                     Recheck Cond: (collection = 'cbe0cdec-a6b3-40b1-9ed5-7d7290fcceaf'::uuid)
                     ->  Bitmap Index Scan on membership_collection  (cost=0.00..70.28 rows=2896 width=0)
                           Index Cond: (collection = 'cbe0cdec-a6b3-40b1-9ed5-7d7290fcceaf'::uuid)
               ->  Index Scan using index_id on index i  (cost=0.42..6.64 rows=1 width=160)
                     Index Cond: (id = cm.id)
                     Filter: (score <= 40)
(11 rows)

partir01=# drop index membership_collection;
DROP INDEX

partir01=# explain SELECT "i"."id", "i"."classes", "i"."title", "i"."description", "i"."coordinates", "i"."modified" FROM "index" "i" INNER JOIN "membership" "cm" ON ("i"."id" = "cm"."id" AND "cm"."collection" = 'cbe0cdeca6b340b19ed57d7290fcceaf') WHERE "i"."score" <= 40 ORDER BY "modified" DESC LIMIT 26;
                                                    QUERY PLAN                                                     
-------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.99..37092.72 rows=26 width=160)
   ->  Nested Loop  (cost=0.99..1763284.90 rows=1236 width=160)
         ->  Index Scan Backward using index_modified_temp on index i  (cost=0.42..79722.21 rows=257181 width=160)
               Filter: (score <= 40)
         ->  Index Only Scan using membership_pkey on membership cm  (cost=0.56..6.54 rows=1 width=16)
               Index Cond: ((id = i.id) AND (collection = 'cbe0cdec-a6b3-40b1-9ed5-7d7290fcceaf'::uuid))
(6 rows)

@nevali nevali added the triaged label Feb 7, 2017
@rjpwork
Copy link

rjpwork commented Feb 8, 2017

I think the basic problem here might be the 50M+ extraneous rows in membership which seem to be related to the billings collection bug.

@nevali
Copy link
Member Author

nevali commented Feb 8, 2017

fifty million is not an especially big number

@rjpwork
Copy link

rjpwork commented Feb 8, 2017

Sure, if they're intraneous. If they're extraneous, it's a big number.

@nevali
Copy link
Member Author

nevali commented Feb 8, 2017

yes, but the point is — querying that table at that size shouldn't be nearly as expensive an operation as it is; tidying up the collections masks, but doesn't solve, the problem with this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants