Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue 110 #116

Open
wants to merge 4 commits into
base: master
from

Conversation

Projects
None yet
2 participants
@systemseven
Copy link

commented Mar 23, 2017

I've switched the generation to generate serialized data. Then that get's formatted on the front end with SimpleXML so we retain the benefits and abilities of using that.

Performance wise your inserts/generation should be unaffected. (Or if it is affected, it's only marginal). We're still inserting text data but it's not formatted as XML anymore

The front end render seems to be okay, the default 500 page limit is around .002 seconds to render and if we up that to 5000 it's around .04 seconds to render.

@mjangda

This comment has been minimized.

Copy link
Member

commented Mar 23, 2017

Thanks for taking the initial pass at this.

With the updated approach here, how do we handle this scenario that I noted in #110?

What happens if the URL structure changes? How do we get other related data like the post modified date from just the URL?

It would also be good to do more thorough benchmarks with a really large scale dataset (e.g. for a site with 100K+ posts, does that change the measurements at all?) and ideally include details on how you're measuring as well.

@systemseven

This comment has been minimized.

Copy link
Author

commented Mar 23, 2017

@mjangda

This comment has been minimized.

Copy link
Member

commented Mar 23, 2017

Can you clarify the "what happens if the URL changes?" Meaning the site map URL structure or the posts URL structure?

The posts URL structure. Example: going from /2017/01/post-name/ to /category/post-name/.

For other related data the site map generator has access to the whole post
object (I'm not storing just post urls like I originally stated) so if we
wanted extra data we could get it when we generate the sitemap.

How would someone hooking into this filter (https://github.com/Automattic/msm-sitemap/pull/116/files#diff-67d63994628dd99d09072eee489c7487R706) access the post object? Previously, we were running the filter in The Loop, so you could access the global post object.

@systemseven

This comment has been minimized.

Copy link
Author

commented Mar 24, 2017

I think after looking at this, that 1 fix could solve both things. So rather than just put an array of strings that are specific to the sitemap in the database (ie: loc, url, etc...) I can serialize the whole post object.

That way when it's popped back out, I could pass it as a parameter to the apply_filters callback.

The other way this helps is with generating the link at runtime. So rather than write the permalink to the DB we can get the permalink at runtime on the sitemap. Since we have the whole post object, we can pass it to get_permalink() and it should NOT run another DB query according to what I've read

Thoughts here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.