This repository has been archived by the owner on Sep 14, 2021. It is now read-only.

Implement a performant handling of sitemap pages #39

Closed
svandragt opened this issue Oct 31, 2019 · 5 comments
Labels: Type: Enhancement (enhancement to an existing feature)

Comments

@svandragt (Contributor) commented Oct 31, 2019

Description

A performant, scalable way of assigning posts of all registered content types to sitemap pages, and of processing updates and deletions.

A sitemap page is a sitemap linked from the index, containing a subset of posts. A post is a piece of content of any registered post type.
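
For illustration, a sitemap index referencing two pages might look like this (the URL scheme shown is only an example, not the plugin's final one):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- Each <sitemap> entry is one "page": a sitemap holding a subset of posts. -->
  <sitemap>
    <loc>https://example.com/sitemap-posts-1.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-posts-2.xml</loc>
  </sitemap>
</sitemapindex>
```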

Which feature is your enhancement request related to?
#31

Describe the solution you'd like
WIP

Acceptance Criteria

  • When a post is updated, it must be added to a page.
  • When a post is deleted, it must be removed from its page.
  • When the feature is activated, all existing posts must be assigned to pages.
  • Each page must only contain posts of a single post type.
  • Each page must contain at most X posts (currently 2,000; the sitemap protocol allows at most 50,000).
  • An index must contain no more than 50k pages.
  • Posts should not carry post meta about the page relationship (that would be slow and add many database rows).
  • Pages may carry post meta (there is only a limited number of pages).
  • The solution must scale to a large number of posts (5m?).
  • The solution must scale to a large number of post types (10?).
@svandragt added the "Type: Enhancement" label Oct 31, 2019
@svandragt self-assigned this Oct 31, 2019
@svandragt changed the title from "Distribution strategy for the post sitemap relationship" to "Distribution strategy for the post <> sitemap-page relationship" Oct 31, 2019
@svandragt (Contributor, Author)

v3

"Supporting 50k/(26*26) ≈ 74 post types. Scales up to 26*26*2000 ≈ 1.3 million posts per post type."

  • md5 of the post type and ID
  • $length = 1 (26 pages; max length is 2)
  • for each post:
    • the page ID is post_type + the first $length characters of $hash
    • assign the post to that page
    • if the page's post count >= 2000 - (26 * $length):
      • $length++
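
For illustration, here is a minimal runnable PHP sketch of this scheme, using an in-memory array as a stand-in for the page store (the function name and the driver loop are hypothetical, not part of the plugin):

```php
<?php
// Sketch of the v3 bucketing idea: derive a sitemap-page ID from the
// post type plus a prefix of the md5 hash of "type:id".
// (Hypothetical helper, not the plugin's actual API.)
function sitemap_page_id( string $post_type, int $post_id, int $length ): string {
	$hash = md5( $post_type . ':' . $post_id );
	return $post_type . '-' . substr( $hash, 0, $length );
}

$pages  = array(); // page ID => list of assigned post IDs (in-memory stand-in)
$length = 1;       // start with a 1-character hash prefix; max length is 2

foreach ( range( 1, 10000 ) as $post_id ) {
	$page_id             = sitemap_page_id( 'post', $post_id, $length );
	$pages[ $page_id ][] = $post_id;

	// Grow the prefix once a page approaches capacity, so that
	// subsequent assignments split into more, smaller buckets.
	if ( count( $pages[ $page_id ] ) >= 2000 - ( 26 * $length ) ) {
		$length++;
	}
}
```

One caveat with the arithmetic above: md5() emits hexadecimal characters, so a one-character prefix yields at most 16 distinct pages per post type rather than 26, and the stated capacities would shift accordingly.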

@svandragt mentioned this issue Oct 31, 2019
@svandragt (Contributor, Author)

Blocked until the technical doc is updated with @joemcgill's and @swissspidy's thoughts.

@joemcgill (Contributor)

Thanks for kicking off this discussion, @svandragt.

If I'm understanding your description above correctly, you're exploring the idea of a hash lookup table: we would automatically create sitemap buckets that evenly distribute a large number of URLs into groups, so that the location of each object can be looked up quickly with a deterministic algorithm (in this case, one based on object type and ID).

This is a really smart solution for fast lookups, but I'm concerned that on sites with many custom post types and/or custom taxonomies we'll end up with a large number of buckets containing artificially low numbers of objects, which could create performance issues when generating the sitemap index.

Ideally, I think we want a solution that optimizes the objects:buckets ratio, packing a large number of objects into the smallest possible number of buckets while still being able to quickly look up which bucket an object is in, so we can update/delete buckets whenever an object within them is updated/deleted.

The simplest solution for looking up an object's bucket would be to save the bucket ID as metadata on the object (e.g., post meta for a post), but as you pointed out in the requirements above, that would lead to a huge increase in meta rows in the database as we add a reference for each object.

If we're storing each bucket as a post of a custom post type, perhaps we can save the maximum and minimum post ID of each bucket in that bucket's post meta, and give each bucket a name that identifies the object type it contains. We could then look up all buckets for a particular object type using a LIKE query on wp_posts.post_name, which is indexed, and loop through the post meta values until we find the bucket whose min/max ID range contains the ID of the object we're modifying. For active buckets (i.e., the newest one being filled), each time we add an object we'll update the max ID in that bucket's post meta to match the ID of the post being published, assuming it's larger than the existing max ID.
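
As a rough sketch of that lookup, assuming WordPress is loaded and that buckets are stored under a hypothetical `sitemap_bucket` post type with illustrative `min_id`/`max_id` meta keys (none of these names are decided):

```php
<?php
// Illustrative lookup: find the sitemap bucket (stored as a post of a
// custom post type) whose min/max post ID range contains $object_id.
// The post type name and meta keys here are assumptions for the sketch.
function find_bucket_for_object( string $object_type, int $object_id ): ?int {
	global $wpdb;

	// Fetch all buckets for this object type via a prefix LIKE on post_name.
	$bucket_ids = $wpdb->get_col(
		$wpdb->prepare(
			"SELECT ID FROM {$wpdb->posts}
			 WHERE post_type = 'sitemap_bucket'
			   AND post_name LIKE %s",
			$wpdb->esc_like( $object_type ) . '%'
		)
	);

	// Loop through each bucket's min/max meta until we find the range
	// that contains the object we're modifying.
	foreach ( $bucket_ids as $bucket_id ) {
		$min = (int) get_post_meta( $bucket_id, 'min_id', true );
		$max = (int) get_post_meta( $bucket_id, 'max_id', true );
		if ( $object_id >= $min && $object_id <= $max ) {
			return (int) $bucket_id;
		}
	}

	return null; // No existing bucket covers this ID.
}
```

The LIKE query stays cheap because the pattern is anchored at the start of post_name, so MySQL can still use the index on that column.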

@joemcgill (Contributor)

I've started on a proof of concept in #64, based on a more fleshed-out description in the (still in progress) design document from #11.

@joemcgill changed the title from "Distribution strategy for the post <> sitemap-page relationship" to "Implement a performant handling of sitemap pages" Nov 14, 2019
@swissspidy (Contributor)

Closing this one, as this optimization is off the table for now.
