sitemap.xml doesn't scale #118

Open
tbille opened this issue Dec 2, 2021 · 3 comments

Comments

@tbille
Contributor

tbille commented Dec 2, 2021

Generating a sitemap is really expensive: every time /sitemap.xml is loaded, we make:

  • an API call to the index topic to get the list of URLs and topics
  • an API call to the topic of EACH post to get its last modified date

This becomes very expensive as the number of topics grows. For example, the engage pages have 278 Discourse topics, so generating their sitemap takes 279 API calls (the index topic plus each topic). This makes the engage page sitemap time out constantly (https://ubuntu.com/engage/sitemap.xml).

To solve this we could paginate the sitemap:

  • the endpoint /sitemap.xml would become a sitemap index
  • we create the endpoint /sitemap-<PAGE>.xml, where each PAGE lists 10 topics (exact number to be defined)
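A minimal sketch of the proposed split, assuming a page size of 10 and that the topic list from the index topic is already available as a Python list. The function names (`page_count`, `sitemap_index`, `topics_for_page`, `sitemap_page`) are hypothetical and not part of canonicalwebteam.discourse; only the standard library is used to render the XML in the sitemaps.org format.

```python
import math
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def page_count(n_topics: int, page_size: int = 10) -> int:
    """Number of /sitemap-<PAGE>.xml pages needed to list n_topics."""
    return max(1, math.ceil(n_topics / page_size))


def sitemap_index(base_url: str, n_topics: int, page_size: int = 10) -> str:
    """Render /sitemap.xml as a sitemap index pointing at every page."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for page in range(1, page_count(n_topics, page_size) + 1):
        sitemap = ET.SubElement(root, "sitemap")
        ET.SubElement(sitemap, "loc").text = f"{base_url}/sitemap-{page}.xml"
    return ET.tostring(root, encoding="unicode")


def topics_for_page(topics: list, page: int, page_size: int = 10) -> list:
    """Slice of topics listed on /sitemap-<page>.xml (pages are 1-indexed)."""
    start = (page - 1) * page_size
    return topics[start:start + page_size]


def sitemap_page(entries) -> str:
    """Render one page; entries is an iterable of (url, lastmod) pairs."""
    root = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, lastmod in entries:
        node = ET.SubElement(root, "url")
        ET.SubElement(node, "loc").text = url
        if lastmod:  # lastmod is optional in the sitemap protocol
            ET.SubElement(node, "lastmod").text = lastmod
    return ET.tostring(root, encoding="unicode")
```

With 278 topics and a page size of 10, the index lists 28 pages. Under these assumptions, rendering a single page would still need the index topic, but only up to 10 per-topic calls instead of 278.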

This would make the sitemaps lighter, much faster to load and easier to parse in case we need to process them.

@minkyngkm

@tbille @nottrobin Do you have any update about this issue by any chance?

@tbille
Contributor Author

tbille commented Jan 6, 2022

This issue also affects snapcraft.io/docs/sitemap.xml, which contains a lot of documentation pages and fails as well.

@edlerd
Contributor

edlerd commented Sep 30, 2022

We are already using the Discourse Data Explorer plugin [1] with a custom query to fetch multiple topics [2]. Why not create another custom query to get the last updated at field for multiple topics? It should be very cheap to execute in batch, and the response would be much smaller if we limit the query to a single date field.

[1] https://meta.discourse.org/t/discourse-data-explorer/32566
[2] https://github.com/canonical/canonicalwebteam.discourse/blob/main/canonicalwebteam/discourse/models.py#L45
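A rough sketch of the batching idea, assuming a custom Data Explorer query that accepts a list of topic IDs and returns `(topic_id, updated_at)` rows. The `run_query` callable, the row shape, and the batch size of 100 are all hypothetical; the HTTP call to the plugin is deliberately left out so this does not guess at its API.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Hypothetical: runs the custom query for a batch of topic IDs and
# returns rows of (topic_id, updated_at).
RunQuery = Callable[[List[int]], Iterable[Tuple[int, str]]]


def chunks(ids: List[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive slices of ids, each at most `size` long."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def last_modified(
    topic_ids: List[int], run_query: RunQuery, batch_size: int = 100
) -> Dict[int, str]:
    """Collect topic_id -> updated_at with one query per batch."""
    result: Dict[int, str] = {}
    for batch in chunks(topic_ids, batch_size):
        for topic_id, updated_at in run_query(batch):
            result[topic_id] = updated_at
    return result
```

With a batch size of 100, the 278 engage topics would need 3 query calls (plus the index topic) rather than one call per topic.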


5 participants