improve performance of lastrevisionbefore/firstrevisionsince for small result sets #29

Merged
Alexia merged 1 commit into Alexia:master from cotto:rev-optimizations
Mar 10, 2017

@cotto cotto commented Mar 8, 2017

MySQL's query optimizer isn't quite smart enough to see through the subqueries generated for lastrevisionbefore and firstrevisionsince. This PR adds an extra WHERE clause to give it a hint. In my staging environment, this causes a drastic performance improvement (1.9s before vs. 0.003s after) when picking a date that filters out most revisions (e.g. firstrevisionsince=2017-01-01).
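
For readers who want to see the shape of the change, here is a minimal sketch, not the code from this PR. It assumes toy `page` and `revision` tables with only the columns the queries touch, and it runs on SQLite through Python's `sqlite3` module rather than on MySQL, so it says nothing about the speedup. It only checks that the extra outer predicate is redundant for correctness: any row that equals the MIN of timestamps at or after the cutoff is itself at or after the cutoff. The names `make_db` and `first_revisions_since` are hypothetical.

```python
import sqlite3

# Toy stand-ins for MediaWiki's `page` and `revision` tables (assumption:
# only the columns used below; the real schema has many more).
SCHEMA = """
CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT);
CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER,
                       rev_timestamp TEXT);
"""

# Correlated subquery in the style of firstrevisionsince: keep the earliest
# revision of each page at or after the cutoff.
SUBQUERY = """rev.rev_timestamp = (
    SELECT MIN(aux.rev_timestamp) FROM revision AS aux
    WHERE aux.rev_page = rev.rev_page AND aux.rev_timestamp >= :cutoff)"""

BASE = """SELECT page.page_title, rev.rev_id, rev.rev_timestamp
FROM revision AS rev JOIN page ON page.page_id = rev.rev_page
WHERE {where}
ORDER BY rev.rev_id"""

BEFORE = BASE.format(where=SUBQUERY)
# The hinted form repeats the cutoff as a plain predicate on the outer table.
AFTER = BASE.format(where="rev.rev_timestamp >= :cutoff AND " + SUBQUERY)


def make_db():
    """Build an in-memory database with three pages and five revisions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO page VALUES (?, ?)",
                     [(1, "Old"), (2, "Mixed"), (3, "New")])
    conn.executemany("INSERT INTO revision VALUES (?, ?, ?)", [
        (1, 1, "20090101000000"),
        (2, 2, "20090601000000"),
        (3, 2, "20170105000000"),
        (4, 2, "20170210000000"),
        (5, 3, "20170301000000"),
    ])
    return conn


def first_revisions_since(conn, cutoff, hinted):
    """Run the subquery-only form or the hinted form and return all rows."""
    sql = AFTER if hinted else BEFORE
    return conn.execute(sql, {"cutoff": cutoff}).fetchall()


conn = make_db()
before = first_revisions_since(conn, "20170101000000", hinted=False)
after = first_revisions_since(conn, "20170101000000", hinted=True)
print(before == after)
```

Both forms return the same rows on this data. On MySQL the difference would show up in the query plan and timing rather than in the result set, which is what the EXPLAIN comparison later in this thread is about.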

Author

cotto commented Mar 8, 2017

Scratch that. I may have been using non-representative data. I'll come back to this if there's a real perf improvement to be had.

@cotto cotto closed this Mar 8, 2017
Author

cotto commented Mar 8, 2017

After a bit of testing, it looks like this is still a worthwhile performance improvement, just not quite as earth-shattering as I thought. The most dramatic improvements will come from queries that return a small number of articles (~100 or fewer). Larger results will see more modest (but still measurable) improvements and I don't believe that this will make any queries slower.

@cotto cotto reopened this Mar 8, 2017
@cotto cotto changed the title from "improve performance of lastrevisionbefore and firstrevisionsince" to "improve performance of lastrevisionbefore/firstrevisionsince for small result sets" Mar 8, 2017
Owner

Alexia commented Mar 10, 2017

Cotto, this looks good. Before I merge it in, could I get the DPL parser tag/call you are using so I can run it over here? I have a few small and large wikis I can run it against for comparison purposes.

Author

cotto commented Mar 10, 2017

Sure. Here's what I've got. I was using "firstrevisionsince=last month" for testing because I don't have any live users in my dev environment ;). I was planning on using "firstrevisionsince=last day" in production. To test without #31, just sub in the appropriate absolute date.

{{#dpl:
|namespace=
|allowcachedresults=false
|ordermethod=lastedit
|order=descending
|firstrevisionsince=last month
|debug=0
|count=10
}}

edit: To clarify, I'm using allrevisionssince now. I found the optimization before I realized that firstrevisionsince wasn't the right selector to use in the first place.

Author

cotto commented Mar 10, 2017

I remembered that firstrevisionsince wasn't the selector I actually wanted, since it doesn't show all articles with revisions since the given date. allrevisionssince ended up being the right selector. The performance improvement for firstrevisionsince/lastrevisionbefore still holds, though.

Owner

Alexia commented Mar 10, 2017

I am seeing nearly identical query times before and after on several data sets. The EXPLAIN queries below are from zelda.gamepedia.com, before and after respectively. Do you get the same output from the EXPLAIN queries? I am wondering if the Percona MySQL build used for the wiki data sets I am testing already does this optimization.

{{#dpl:
|namespace=
|allowcachedresults=false
|ordermethod=lastedit
|order=descending
|firstrevisionsince=2010-01-01
}}
mysql> EXPLAIN SELECT DISTINCT rev.rev_timestamp,rev.rev_id,`page`.page_namespace AS `page_namespace`,`page`.page_id AS `page_id`,`page`.page_title AS `page_title` FROM `revision` `rev`,`page` WHERE (`page`.page_id = rev.rev_page) AND (rev.rev_timestamp = (SELECT MAX(rev_aux.rev_timestamp) FROM `revision` AS rev_aux WHERE rev_aux.rev_page = rev.rev_page)) AND `page`.page_is_redirect = '0' AND `page`.page_namespace = '0' AND (`page`.page_id = rev.rev_page) AND (rev.rev_timestamp = (SELECT MIN(rev_aux_snc.rev_timestamp) FROM `revision` AS rev_aux_snc WHERE rev_aux_snc.rev_page=rev.rev_page AND rev_aux_snc.rev_timestamp >= '20100101000000')) ORDER BY rev.rev_timestamp DESC LIMIT 500;
+----+--------------------+-------------+------+----------------------------------------------------------+-----------------------------+---------+---------------------------------+------+---------------------------------+
| id | select_type        | table       | type | possible_keys                                            | key                         | key_len | ref                             | rows | Extra                           |
+----+--------------------+-------------+------+----------------------------------------------------------+-----------------------------+---------+---------------------------------+------+---------------------------------+
|  1 | PRIMARY            | page        | ref  | PRIMARY,name_title,page_redirect_namespace_len           | page_redirect_namespace_len | 5       | const,const                     | 4698 | Using temporary; Using filesort |
|  1 | PRIMARY            | rev         | ref  | PRIMARY,page_timestamp,page_user_timestamp               | page_user_timestamp         | 4       | zelda_gamepedia_en.page.page_id |    2 | Using where; Using index        |
|  3 | DEPENDENT SUBQUERY | rev_aux_snc | ref  | PRIMARY,rev_timestamp,page_timestamp,page_user_timestamp | page_user_timestamp         | 4       | zelda_gamepedia_en.rev.rev_page |    2 | Using where; Using index        |
|  2 | DEPENDENT SUBQUERY | rev_aux     | ref  | PRIMARY,page_timestamp,page_user_timestamp               | page_user_timestamp         | 4       | zelda_gamepedia_en.rev.rev_page |    2 | Using index                     |
+----+--------------------+-------------+------+----------------------------------------------------------+-----------------------------+---------+---------------------------------+------+---------------------------------+
4 rows in set (0.00 sec)

mysql> EXPLAIN SELECT DISTINCT rev.rev_timestamp,rev.rev_id,`page`.page_namespace AS `page_namespace`,`page`.page_id AS `page_id`,`page`.page_title AS `page_title` FROM `revision` `rev`,`page` WHERE (`page`.page_id = rev.rev_page) AND (rev.rev_timestamp = (SELECT MAX(rev_aux.rev_timestamp) FROM `revision` AS rev_aux WHERE rev_aux.rev_page = rev.rev_page)) AND `page`.page_is_redirect = '0' AND `page`.page_namespace = '0' AND (`page`.page_id = rev.rev_page) AND (rev.rev_timestamp >= '20100101000000') AND (`page`.page_id = rev.rev_page) AND (rev.rev_timestamp = (SELECT MIN(rev_aux_snc.rev_timestamp) FROM `revision` AS rev_aux_snc WHERE rev_aux_snc.rev_page=rev.rev_page AND rev_aux_snc.rev_timestamp >= '20100101000000')) 
+----+--------------------+-------------+------+----------------------------------------------------------+-----------------------------+---------+---------------------------------+------+---------------------------------+
| id | select_type        | table       | type | possible_keys                                            | key                         | key_len | ref                             | rows | Extra                           |
+----+--------------------+-------------+------+----------------------------------------------------------+-----------------------------+---------+---------------------------------+------+---------------------------------+
|  1 | PRIMARY            | page        | ref  | PRIMARY,name_title,page_redirect_namespace_len           | page_redirect_namespace_len | 5       | const,const                     | 4698 | Using temporary; Using filesort |
|  1 | PRIMARY            | rev         | ref  | PRIMARY,rev_timestamp,page_timestamp,page_user_timestamp | page_user_timestamp         | 4       | zelda_gamepedia_en.page.page_id |    2 | Using where; Using index        |
|  3 | DEPENDENT SUBQUERY | rev_aux_snc | ref  | PRIMARY,rev_timestamp,page_timestamp,page_user_timestamp | page_user_timestamp         | 4       | zelda_gamepedia_en.rev.rev_page |    2 | Using where; Using index        |
|  2 | DEPENDENT SUBQUERY | rev_aux     | ref  | PRIMARY,page_timestamp,page_user_timestamp               | page_user_timestamp         | 4       | zelda_gamepedia_en.rev.rev_page |    2 | Using index                     |
+----+--------------------+-------------+------+----------------------------------------------------------+-----------------------------+---------+---------------------------------+------+---------------------------------+

Author

cotto commented Mar 10, 2017

@Alexia, that's expected for a large data set. Try a much smaller one, e.g. firstrevisionsince=2017-01-01

Owner

Alexia commented Mar 10, 2017

I see it now, sorry. This is a huge performance increase on small queries in large data sets. On Zelda Wiki it went from 10.00 seconds average to 0.14 seconds average for firstrevisionsince=2017-02-01.

@Alexia Alexia merged commit d001a13 into Alexia:master Mar 10, 2017