feat(utils): add sitemapFilter option to parseSitemap#3557
Merged
janbuchar merged 2 commits into apify:master on Apr 13, 2026
Conversation
Add an optional `sitemapFilter` callback to `ParseSitemapOptions` that allows filtering which nested sitemaps from sitemap index files are followed. This is useful when a sitemap index contains many irrelevant child sitemaps (e.g., video sitemaps) that should be skipped. Made-with: Cursor
Contributor
I haven't read the code yet, but do I understand it correctly that this new callback is invoked for |
Contributor
Author
Yes, that is correct. It is invoked only for child sitemap URLs in a sitemap index.
janbuchar
reviewed
Apr 9, 2026
Contributor
janbuchar
left a comment
LGTM, but let's think about the naming.
     * Return `true` to include the sitemap, `false` to skip it.
     * If not provided, all nested sitemaps are followed.
     */
    sitemapFilter?: (sitemapUrl: string) => boolean;
Contributor
Let's think about alternative names for this option. How about `nestedSitemapFilter`?
Contributor
Author
Yes, that makes sense. A nested sitemap filter is exactly what it is.
Contributor
Cool, let's go with that then
janbuchar
approved these changes
Apr 13, 2026
Motivation
When working with sitemap index files, `parseSitemap` currently follows all child sitemaps unconditionally. Sometimes sitemap indexes contain hundreds of child sitemaps, for instance, a child sitemap for every month going back 15 years (e.g., `/articles-2010-01.xml` through `/articles-2026-03.xml`). If you're only interested in the last 2 years of content, there's no way to skip the irrelevant ones without fetching and parsing all of them.

This PR adds a `sitemapFilter` callback option that lets you control which child sitemaps to skip, based on their URL.

Changes
- Adds `sitemapFilter?: (sitemapUrl: string) => boolean` to `ParseSitemapOptions`.
- The callback returns `true` to include a child sitemap, `false` to skip it.

Example usage
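As a minimal sketch of the kind of filter motivated above, the TypeScript below builds a predicate with the `(sitemapUrl: string) => boolean` shape. The helper name `makeRecentSitemapFilter`, the cutoff values, and the `example.com` URLs are illustrative assumptions, not part of the PR. The commented-out `parseSitemap` call shows an assumed way to pass the predicate; the option name may differ in a released version, since the review proposes `nestedSitemapFilter`.

```typescript
// Hypothetical helper (not part of Crawlee): returns a predicate with the
// `(sitemapUrl: string) => boolean` shape described in this PR.
function makeRecentSitemapFilter(
    minYear: number,
    minMonth: number,
): (sitemapUrl: string) => boolean {
    // Assumed URL pattern, taken from the motivation: /articles-YYYY-MM.xml
    const pattern = /\/articles-(\d{4})-(\d{2})\.xml(?:[?#].*)?$/;

    return (sitemapUrl: string): boolean => {
        const match = pattern.exec(sitemapUrl);
        // Child sitemaps that do not follow the dated pattern are still followed.
        if (!match) return true;

        const year = Number(match[1]);
        const month = Number(match[2]);
        // Include only sitemaps dated at or after the cutoff month.
        return year > minYear || (year === minYear && month >= minMonth);
    };
}

// Keep roughly the last two years relative to /articles-2026-03.xml.
const keepRecent = makeRecentSitemapFilter(2024, 4);

// Assumed call shape; verify the signature and option name against your
// Crawlee version before relying on it:
//
// import { parseSitemap } from '@crawlee/utils';
// for await (const item of parseSitemap(
//     [{ type: 'url', url: 'https://example.com/sitemap-index.xml' }],
//     undefined,
//     { sitemapFilter: keepRecent },
// )) {
//     console.log(item.loc);
// }
```

Letting non-matching URLs through is a deliberate choice in this sketch: it keeps the filter from silently dropping child sitemaps whose naming scheme it does not recognize.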