Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SMWSQLStore3Readers::getInProperties performance #1234

Closed
benhutchins opened this issue Oct 27, 2015 · 2 comments
Closed

SMWSQLStore3Readers::getInProperties performance #1234

benhutchins opened this issue Oct 27, 2015 · 2 comments

Comments

@benhutchins
Copy link

I'm working with a wiki that has a large data set, and seeing that currently I'm unable to edit a large number of pages due to a performance issue resulting from SMWSQLStore3Readers::getInProperties.

includes/storage/SQLStore/SMW_SQLStore3_Readers.php:getInProperties results in a generated SQL statement that uses a very inefficient INNER JOIN.

The generated query is SELECT DISTINCT smw_title,smw_sortkey FROM "smw_object_ids" INNER JOIN "smw_di_wikipage" AS t1 ON t1.p_id=smw_id WHERE t1.o_id='19683';

EXPLAIN ANALYZE VERBOSE SELECT DISTINCT smw_title,smw_sortkey FROM "smw_object_ids" INNER JOIN "smw_di_wikipage" AS t1 ON t1.p_id=smw_id WHERE t1.o_id='19683';

                                                                             QUERY PLAN                                                                             
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=27277.13..28047.51 rows=66035 width=54) (actual time=49404.123..78673.174 rows=647 loops=1)
   Output: smw_object_ids.smw_title, smw_object_ids.smw_sortkey
   ->  Sort  (cost=27277.13..27533.92 rows=102717 width=54) (actual time=49404.119..76402.701 rows=101614 loops=1)
         Output: smw_object_ids.smw_title, smw_object_ids.smw_sortkey
         Sort Key: smw_object_ids.smw_title, smw_object_ids.smw_sortkey
         Sort Method: external merge  Disk: 5040kB
         ->  Hash Join  (cost=8849.69..15212.81 rows=102717 width=54) (actual time=971.847..4724.746 rows=101614 loops=1)
               Output: smw_object_ids.smw_title, smw_object_ids.smw_sortkey
               Hash Cond: (t1.p_id = smw_object_ids.smw_id)
               ->  Bitmap Heap Scan on mediawiki.smw_di_wikipage t1  (cost=1940.48..4644.44 rows=102717 width=4) (actual time=24.473..1161.580 rows=101614 loops=1)
                     Output: t1.p_id
                     Recheck Cond: (t1.o_id = 19683)
                     ->  Bitmap Index Scan on smw_di_wikipage_index0  (cost=0.00..1914.80 rows=102717 width=0) (actual time=24.193..24.193 rows=101614 loops=1)
                           Index Cond: (t1.o_id = 19683)
               ->  Hash  (cost=4343.65..4343.65 rows=110365 width=58) (actual time=947.249..947.249 rows=110174 loops=1)
                     Output: smw_object_ids.smw_title, smw_object_ids.smw_sortkey, smw_object_ids.smw_id
                     Buckets: 2048  Batches: 16  Memory Usage: 629kB
                     ->  Seq Scan on mediawiki.smw_object_ids  (cost=0.00..4343.65 rows=110365 width=58) (actual time=0.009..377.686 rows=110174 loops=1)
                           Output: smw_object_ids.smw_title, smw_object_ids.smw_sortkey, smw_object_ids.smw_id
 Total runtime: 78676.119 ms
(20 rows)

The slowness here is a result of the larger data set, I have a lot of articles referencing each other. This isn't a test set either, this data resulted from active use. select count(*) from mediawiki.smw_di_wikipage where o_id=19683; shows I have 101614 objects. So this INNER JOIN is an obvious killer.

Rewriting this to use a subquery instead of a join shows obvious performance improvements.

EXPLAIN ANALYZE VERBOSE SELECT DISTINCT smw_title,smw_sortkey FROM mediawiki."smw_object_ids" WHERE smw_id IN (SELECT p_id FROM mediawiki."smw_di_wikipage" WHERE o_id='19683’);

QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=5173.18..5184.58 rows=1140 width=54) (actual time=2737.383..2738.172 rows=647 loops=1)
   Output: smw_object_ids.smw_title, smw_object_ids.smw_sortkey
   ->  Nested Loop  (cost=4901.65..5167.48 rows=1140 width=54) (actual time=2620.646..2736.223 rows=647 loops=1)
         Output: smw_object_ids.smw_title, smw_object_ids.smw_sortkey
         ->  HashAggregate  (cost=4901.23..4905.70 rows=447 width=4) (actual time=2620.612..2621.517 rows=647 loops=1)
               Output: smw_di_wikipage.p_id
               ->  Bitmap Heap Scan on mediawiki.smw_di_wikipage  (cost=1940.48..4644.44 rows=102717 width=4) (actual time=108.031..1515.608 rows=101614 loops=1)
                     Output: smw_di_wikipage.s_id, smw_di_wikipage.p_id, smw_di_wikipage.o_id
                     Recheck Cond: (smw_di_wikipage.o_id = 19683)
                     ->  Bitmap Index Scan on smw_di_wikipage_index0  (cost=0.00..1914.80 rows=102717 width=0) (actual time=107.776..107.776 rows=101614 loops=1)
                           Index Cond: (smw_di_wikipage.o_id = 19683)
         ->  Index Scan using smw_object_ids_index0 on mediawiki.smw_object_ids  (cost=0.42..0.58 rows=1 width=58) (actual time=0.004..0.172 rows=1 loops=647)
               Output: smw_object_ids.smw_id, smw_object_ids.smw_namespace, smw_object_ids.smw_title, smw_object_ids.smw_iw, smw_object_ids.smw_subobject, smw_object_ids.smw_sortkey, smw_object_ids.smw_proptable_hash
               Index Cond: (smw_object_ids.smw_id = smw_di_wikipage.p_id)
 Total runtime: 2738.955 ms
(15 rows)

This subquery saves ~76 seconds and is 2800% faster. An obvious win overall. I'm prepared to rewrite this entire function, and am wondering if there is a reason I'm missing why a join is being used instead of a subquery before doing so.

@mwjames
Copy link
Contributor

mwjames commented Oct 28, 2015

@mwjames
Copy link
Contributor

mwjames commented May 14, 2017

I expect this to addressed with #2461 (3.0.x).

@mwjames mwjames closed this as completed May 14, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants