Consider not storing full sitemap XML #110
Right now, sitemap XML is generated asynchronously and stored in the database so it can be served super quickly. The downside is that any code change that modifies the XML output means all sitemaps need to be regenerated, which can be a very slow, time-consuming process on really large sites with thousands of sitemaps.
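To make the trade-off concrete, here's the current approach in miniature — a hedged sketch, not the plugin's actual code (the function names and the dict standing in for the database table are made up). The full XML is rendered once and the serialized string is stored, so serving is just echoing a string, but every template change invalidates every stored row:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap_xml(posts):
    """posts: list of dicts with 'loc' and 'lastmod' keys (illustrative shape)."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for post in posts:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = post["loc"]
        ET.SubElement(url, "lastmod").text = post["lastmod"]
    return ET.tostring(urlset, encoding="unicode")

# Stand-in for the database rows holding one pre-rendered sitemap each.
sitemap_store = {}
sitemap_store["2019-06"] = build_sitemap_xml([
    {"loc": "https://example.com/hello-world", "lastmod": "2019-06-01"},
])
# Serving is a plain lookup of the stored string -- fast, but any change to
# build_sitemap_xml means regenerating every entry in sitemap_store.
```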
We should explore alternate ways to handle this (while maintaining backwards compat with existing actions/filters) and evaluate whether those approaches make sense.
Sitemaps have to be regenerated when the template changes; there's no way around that. The trick here is to figure out how to do that efficiently.
Right now the flow is as follows (simplified version):
Here's what I'm proposing:
What happens if the URL structure changes? How do we get other related data like the post modified date from just the URL?
Do we get any major benefits from switching to a hard-coded template? How will we maintain backwards compatibility (e.g. some filters pass in the simplexml object that sites use to add things like images)?
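One way to keep that filter surface alive with a hard-coded template is to run registered callbacks on each entry as it's rendered. This is an illustrative sketch only — the function names (`add_url_filter`, `render_sitemap`) and the callback signature are made up, and Python's ElementTree stands in for the SimpleXML object the real filters receive:

```python
import xml.etree.ElementTree as ET

url_filters = []  # callbacks with signature (url_element, post) -> None

def add_url_filter(fn):
    """Register a callback that can mutate each <url> element before output."""
    url_filters.append(fn)

def render_sitemap(posts):
    urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for post in posts:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = post["loc"]
        ET.SubElement(url, "lastmod").text = post["lastmod"]
        for fn in url_filters:  # backwards-compat hook point per entry
            fn(url, post)
    return ET.tostring(urlset, encoding="unicode")

# A site-level filter adding an image to an entry, analogous to the filters
# that receive the SimpleXML object today:
def add_image(url_el, post):
    if "image" in post:
        ET.SubElement(url_el, "image").text = post["image"]

add_url_filter(add_image)
sitemap_xml = render_sitemap([{
    "loc": "https://example.com/p1",
    "lastmod": "2019-06-01",
    "image": "https://example.com/p1.jpg",
}])
```

The key property is that the per-entry hook fires at render time, so existing customizations keep working even if the stored representation changes.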
This is probably the biggest thing we'll need to watch. Some of the sites using this plugin have millions of posts dating back 5/10/20 years. If the newer method is significantly slower, it may not be worth it, so it would be good to gather and compare some data as we work on this.
A few others asked some of the same questions you did on the internal site.
For the URLs, that's just an example; we'll just need to make sure we store the right data, and decide where it lives (post_meta vs. post_content).
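As a hypothetical sketch of that direction (the key names and helpers here are made up, and a plain dict stands in for the wp_postmeta table): if the modified date and URL are stored as per-post data, answering "what's the lastmod for this entry?" becomes a lookup rather than something parsed back out of the URL.

```python
post_meta = {}  # {post_id: {meta_key: value}} -- stand-in for wp_postmeta

def update_post_meta(post_id, key, value):
    post_meta.setdefault(post_id, {})[key] = value

def get_post_meta(post_id, key):
    return post_meta.get(post_id, {}).get(key)

# Store the data the sitemap template needs, keyed by post ID:
update_post_meta(101, "sitemap_loc", "https://example.com/hello-world")
update_post_meta(101, "sitemap_lastmod", "2019-06-01")

# Rendering an entry then reads stored meta; a URL-structure change doesn't
# strand any data, because nothing is derived from the URL itself:
entry = {
    "loc": get_post_meta(101, "sitemap_loc"),
    "lastmod": get_post_meta(101, "sitemap_lastmod"),
}
```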
Good catch about the backwards compatibility on the SimpleXML objects, I'm going to take a look at that.