feat(webcrawler-sitemaps): Ability to handle sitemap index files #377

Merged (2 commits) on Feb 22, 2024


surukonda
Contributor

When a website has multiple sitemap files, a sitemap index file is created to list them. This change allows the crawler to crawl a website when a sitemap index URL is submitted, and to handle compressed sitemap files that the index references.

Description of changes:

  • Handle sitemap index files
  • Handle compressed sitemap files when specified as part of the sitemap index
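
For readers unfamiliar with sitemap index files, the two behaviours listed above can be sketched roughly as follows. This is an illustrative sketch only, not the code merged in this PR: `extract_urls`, `decode_sitemap`, `local_name`, the `fetch` callable, and the `max_depth` guard are all hypothetical names chosen for the example. It assumes compressed sitemaps are gzip files, detected by their magic bytes, and that `fetch` returns the raw response body as bytes.

```python
import gzip
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional, Set

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def decode_sitemap(raw: bytes) -> bytes:
    """Return plain XML bytes, decompressing gzip content when detected."""
    if raw[:2] == GZIP_MAGIC:
        return gzip.decompress(raw)
    return raw


def local_name(tag: str) -> str:
    """Strip the XML namespace from a tag such as '{ns}loc'."""
    return tag.rsplit("}", 1)[-1]


def extract_urls(
    sitemap_url: str,
    fetch: Callable[[str], bytes],  # hypothetical: returns the raw body for a URL
    seen: Optional[Set[str]] = None,
    max_depth: int = 3,  # assumed guard against runaway nesting
) -> List[str]:
    """Collect page URLs from a sitemap or, recursively, a sitemap index."""
    seen = set() if seen is None else seen
    if sitemap_url in seen or max_depth < 0:
        return []
    seen.add(sitemap_url)

    root = ET.fromstring(decode_sitemap(fetch(sitemap_url)))

    # Read <loc> only from direct <sitemap>/<url> entries of the root element.
    locs: List[str] = []
    for entry in root:
        if local_name(entry.tag) not in ("sitemap", "url"):
            continue
        for child in entry:
            if local_name(child.tag) == "loc" and child.text:
                locs.append(child.text.strip())

    if local_name(root.tag) == "sitemapindex":
        # Each <loc> in an index points at another sitemap, possibly compressed.
        urls: List[str] = []
        for child_sitemap in locs:
            urls.extend(extract_urls(child_sitemap, fetch, seen, max_depth - 1))
        return urls

    # A plain <urlset>: the <loc> values are the pages to crawl.
    return locs
```

Passing `fetch` in as a parameter keeps the sketch independent of any particular HTTP client; in practice it could wrap whatever request function the crawler already uses.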

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Collaborator

massi-ang left a comment


Looks good to me

spugachev merged commit bc6f120 into aws-samples:main on Feb 22, 2024
1 check passed