improve performance of lastrevisionbefore/firstrevisionsince for small result sets #29
|
Scratch that. I may have been using non-representative data. I'll come back to this if there's a real perf improvement to be had. |
|
After a bit of testing, it looks like this is still a worthwhile performance improvement, just not quite as earth-shattering as I thought. The most dramatic improvements will come from queries that return a small number of articles (~100 or fewer). Larger results will see more modest (but still measurable) improvements and I don't believe that this will make any queries slower. |
|
Cotto, this looks good. Before I merge it in, could I get the DPL parser tag/call you are using so I can run it over here? I have a few small and large wikis I can run it against for comparison purposes. |
|
Sure. Here's what I've got. I was using "firstrevisionsince=last month" for testing because I don't have any live users in my dev environment ;). I was planning on using "firstrevisionsince=last day" in production. To test without #31, just sub in the appropriate absolute date. edit: To clarify, I'm using allrevisionssince now. I found the optimization before I realized that firstrevisionsince wasn't the right selector to use in the first place. |
|
I remembered that I realized that firstrevisionsince wasn't the selector I actually wanted, since it doesn't show all articles with revisions since the given date. allrevisionssince ended up being the right selector. The performance improvement for firstrevisionsince/lastrevisionbefore still holds though. |
|
I am showing nearly identical query times before and after on several data sets. The explain queries below are from zelda.gamepedia.com before and after respectively. Do you get the same output from the explain queries? I am wondering if the usage of Percona MySQL on the wiki data sets I am testing does this optimization already. |
|
@Alexia, that's expected for a large data set. Try a much smaller one, e.g. firstrevisionsince=2017-01-01 |
|
I see it now, sorry. This is a huge performance increase on small queries in large data sets. On Zelda Wiki it went from 10.00 seconds average to 0.14 seconds average for firstrevisionsince=2017-02-01. |
MySQL's query optimizer isn't quite smart enough to look inside the subqueries generated for lastrevisionbefore and firstrevisionsince. This PR adds an extra WHERE clause to give it a hint. In my staging environment, this causes a drastic performance improvement (1.9s before vs. 0.003s after) when picking a date that filters out most revisions (e.g. firstrevisionsince=2017-01-01).
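
For anyone reading along, here is a rough sketch of the idea, not the actual diff. It assumes the firstrevisionsince condition is a correlated `MIN(rev_timestamp)` subquery on the `revision` table; the exact SQL that DPL generates may differ. The toy data, the `aux` alias, and the helper names `build_db` and `run` are made up for illustration, and it uses SQLite through Python only so it is easy to run. It shows that repeating the date bound on the outer query is redundant (same rows come back), which is why it is safe to add as a hint.

```python
import sqlite3


def build_db():
    """Create a tiny in-memory stand-in for MediaWiki's revision table (toy data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE revision ("
        "rev_id INTEGER PRIMARY KEY, rev_page INTEGER, rev_timestamp TEXT)"
    )
    conn.execute("CREATE INDEX rev_ts ON revision (rev_timestamp)")
    conn.executemany(
        "INSERT INTO revision VALUES (?, ?, ?)",
        [
            (1, 1, "20160105000000"),
            (2, 1, "20170110000000"),
            (3, 1, "20170215000000"),
            (4, 2, "20151201000000"),
            (5, 3, "20170301000000"),
        ],
    )
    return conn


# Assumed shape of the condition before the change: the date bound
# only appears inside the correlated subquery.
SUBQUERY_ONLY = """
SELECT rev.rev_page, rev.rev_timestamp
FROM revision AS rev
WHERE rev.rev_timestamp = (
    SELECT MIN(aux.rev_timestamp)
    FROM revision AS aux
    WHERE aux.rev_page = rev.rev_page
      AND aux.rev_timestamp >= :since
)
ORDER BY rev.rev_page
"""

# Same query with the date bound repeated on the outer table. It is
# logically redundant, but it lets the database discard old revisions
# (ideally via an index) before evaluating the subquery per row.
WITH_HINT = """
SELECT rev.rev_page, rev.rev_timestamp
FROM revision AS rev
WHERE rev.rev_timestamp = (
    SELECT MIN(aux.rev_timestamp)
    FROM revision AS aux
    WHERE aux.rev_page = rev.rev_page
      AND aux.rev_timestamp >= :since
)
  AND rev.rev_timestamp >= :since
ORDER BY rev.rev_page
"""


def run(conn, sql, since):
    """Run one of the queries with the given timestamp bound."""
    return conn.execute(sql, {"since": since}).fetchall()


if __name__ == "__main__":
    conn = build_db()
    before = run(conn, SUBQUERY_ONLY, "20170101000000")
    after = run(conn, WITH_HINT, "20170101000000")
    print(before == after, after)
```

The two queries return the same rows for any date; the difference should only show up in the query plan, and how much it helps will depend on the database engine and the data set.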