Please give us a description of what happened.
The performance of Yoast sitemaps is hurting servers on larger websites.
A client of mine has 1.4 million products, and Yoast causes a table scan on every segment of data loaded. It's good that Yoast builds the sitemap in chunks, but it's bad that a table scan happens for each of these chunks.
The table scans are causing sitemaps to not work on large websites, and worse, they're hurting other user queries.
Please describe what you expected to happen and why.
I expect index-seeks and good performance regardless of how large websites get.
I expect the sitemaps to work.
How can we reproduce this behavior?
The problem is two-fold:
I've posted a full analysis here:
Continuing our discussion from your post: in the past I worked on refactoring the sitemaps module and specifically paid close attention to feedback on performance at the scale you are talking about.
For the record, I am not a DB specialist, so my experience with this is a combination of my own observations, extensive feedback from the plugin's users, and some reaching out to people who are specialists.
To reiterate, your claim from the post is that the main query for the sitemap post page is unoptimized and doesn't scale to large numbers of posts.
The query in question is (you chose to run only the subquery part of it in your post):
SELECT l.ID, post_title, post_content, post_name, post_parent, post_author, post_modified_gmt, post_date, post_date_gmt
FROM (
    SELECT wp_posts.ID
    FROM wp_posts
    WHERE wp_posts.post_status = 'publish'
      AND wp_posts.post_type = 'test'
      AND wp_posts.post_password = ''
      AND wp_posts.post_date != '0000-00-00 00:00:00'
    ORDER BY wp_posts.post_modified ASC
    LIMIT 100 OFFSET 0
) o
JOIN wp_posts l ON l.ID = o.ID
The main challenge of SQL for sitemaps is that MySQL doesn't scale well for OFFSET operations. The more rows you have the more rows it walks through to reach your offset. The queries in the plugin had been very specifically written and audited to account for that.
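To make the shape of that query concrete, here is a minimal sketch of the same "page through IDs first, then join back for the full rows" pattern. It rests on several assumptions: it runs against an in-memory SQLite table standing in for wp_posts (not MySQL, so it says nothing about MySQL's timings or query plans), the column list is trimmed, the helper names build_demo_db and fetch_page are hypothetical, and an outer ORDER BY has been added only so the demo output is deterministic.

```python
import sqlite3
from datetime import datetime, timedelta

# Same shape as the sitemap query: the inner SELECT walks only IDs to reach
# the offset, and the outer JOIN fetches full rows for that single page.
PAGE_QUERY = """
SELECT l.ID, l.post_title, l.post_modified
FROM (
    SELECT wp_posts.ID
    FROM wp_posts
    WHERE wp_posts.post_status = 'publish'
      AND wp_posts.post_type = ?
      AND wp_posts.post_password = ''
      AND wp_posts.post_date != '0000-00-00 00:00:00'
    ORDER BY wp_posts.post_modified ASC
    LIMIT ? OFFSET ?
) o
JOIN wp_posts l ON l.ID = o.ID
ORDER BY l.post_modified ASC  -- assumption: added for deterministic demo output
"""


def build_demo_db(n_rows=1000):
    """Create an in-memory stand-in for wp_posts (SQLite, not MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE wp_posts (
               ID INTEGER PRIMARY KEY,
               post_title TEXT,
               post_status TEXT,
               post_type TEXT,
               post_password TEXT,
               post_date TEXT,
               post_modified TEXT
           )"""
    )
    base = datetime(2019, 1, 1)
    rows = []
    for i in range(1, n_rows + 1):
        # One second apart, so ordering by post_modified matches ordering by ID.
        ts = (base + timedelta(seconds=i)).strftime("%Y-%m-%d %H:%M:%S")
        rows.append((i, f"Post {i}", "publish", "test", "", ts, ts))
    conn.executemany("INSERT INTO wp_posts VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def fetch_page(conn, post_type, page, per_page=100):
    """Return one sitemap page; the offset grows linearly with the page number."""
    offset = page * per_page
    return conn.execute(PAGE_QUERY, (post_type, per_page, offset)).fetchall()


if __name__ == "__main__":
    conn = build_demo_db(250)
    for page in range(3):
        rows = fetch_page(conn, "test", page)
        print(page, len(rows), rows[0][0], rows[-1][0])
```

The point of the pattern is that the rows skipped to reach the offset are read as bare IDs rather than as full posts with their content; the cost of walking to the offset itself remains.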
I didn't have a cool 1M posts on hand, but I generated 92K, which is more than sufficient to illustrate the rate of performance drop-off:
So with a considerable offset, performance dropped off by 50 milliseconds, while your post claims a performance drop-off of multiple minutes (at larger sizes, but certainly not consistent with the pace I observe).
Do I believe that you observe poor performance at a large site? Sure, that is certainly possible.
Do I believe that this query is inherently problematic? No, not from my experience or any feedback I've gotten before.
I suspect the question to explore here is why that specific server runs this query slowly, not what's wrong with the query.
I didn't mean this dismissively, I apologize for it coming across like that. I believe your observed results, though I disagree on conclusions so far (the need to change the query and add custom index).
Gotcha. From the point of view of the plugin in general, that would have two problems:
I can't speak for your server stack. I can only say that there is extensive feedback on record from large sites to the contrary, with sitemaps performing sufficiently well. As such, I would suspect an individual issue with the environment or otherwise, not a systematic one with the implementation.
Since I am no longer involved with development, I think I'll shut up now. :) Thank you for taking the time to research and report. I (always) hope there are opportunities to improve sitemap performance for the general case; it was my pet module. :)
This issue occurs for me and can be observed on large sites with several hundred thousand pages simply by visiting the sitemap_index.xml page. It loads very slowly. It's trivial to replicate this by inserting 1M+ records into the wp_posts table or by using the WP API. There is a workaround I found: https://markjaquith.wordpress.com/2018/01/22/how-i-fixed-yoast-seo-sitemaps-on-a-large-wordpress-site/
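For anyone who wants to reproduce this on a throwaway database, one possible way to produce that many rows is to generate batched INSERT statements and pipe them into the mysql client. This is only a sketch under assumptions: it assumes the default wp_ table prefix and the stock wp_posts columns, post_author 1 exists, the helper name insert_statements is hypothetical, and values are interpolated directly into the SQL, which is acceptable only because every value here is generated test data.

```python
from datetime import datetime, timedelta

# Stock wp_posts columns that a bulk insert typically has to fill in.
COLUMNS = (
    "post_author", "post_date", "post_date_gmt", "post_content",
    "post_title", "post_excerpt", "post_status", "post_name", "post_type",
    "to_ping", "pinged", "post_modified", "post_modified_gmt",
    "post_content_filtered",
)


def insert_statements(total, batch_size=1000, post_type="test", table="wp_posts"):
    """Yield multi-row INSERT statements covering `total` generated posts."""
    base = datetime(2019, 1, 1)
    for start in range(0, total, batch_size):
        values = []
        for i in range(start, min(start + batch_size, total)):
            # Distinct timestamps so ORDER BY post_modified is well defined.
            ts = (base + timedelta(seconds=i)).strftime("%Y-%m-%d %H:%M:%S")
            values.append(
                f"(1, '{ts}', '{ts}', 'Filler content {i}', 'Test post {i}', "
                f"'', 'publish', 'test-post-{i}', '{post_type}', "
                f"'', '', '{ts}', '{ts}', '')"
            )
        yield (
            f"INSERT INTO {table} ({', '.join(COLUMNS)}) VALUES\n"
            + ",\n".join(values)
            + ";"
        )


if __name__ == "__main__":
    # Example: python generate_posts.py | mysql your_test_database
    for statement in insert_statements(1_000_000):
        print(statement)
```

Rows inserted this way bypass WordPress hooks and caches, so they are only suitable for exercising the sitemap queries, not for testing the rest of the site.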
I have the same issue, and it is highly penalising in terms of loading time (15 minutes per sitemap).
Have you recently tested the solution at https://markjaquith.wordpress.com/2018/01/22/how-i-fixed-yoast-seo-sitemaps-on-a-large-wordpress-site/ ?
I tried to implement it, however the command line is failing, executing:
Am I missing something?
https://www.wpintense.com/2019/02/04/performance-optimisation-for-various-xml-sitemap-plugins/ (see the section "Yoast SEO XML Sitemaps performance"; this is the same as what he mentioned above in this issue thread)
I eventually dropped Yoast's sitemap and used Google XML Sitemaps, as mentioned in the link as well. This was the best solution for me, despite the bugs in the 3-year-old plugin.
Hoping though that someone would adopt the Google XML Sitemaps plugin through WP official so the plugin gets updated and its bugs fixed since its performance really outperforms Yoast's sitemap generator.
I also got this problem with 500k+ posts. I think it's caused by Google fetching the posts, and it's 100% related to the XML sitemap.