Scrape all the URLs from a sitemap or a sitemap index
README.md
extract_urls_sitemap.py


extract_urls_from_sitemap_index

Scrape all the URLs from a sitemap index or a single sitemap.xml. The script takes one parameter: the URL of the sitemap index (or of the sitemap). Only the XML format is supported. The script outputs an Excel file with three columns:

  • ID
  • Sitemap: the sitemap in which the URL was found
  • Url: each URL that appears in the sitemap(s)
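
The flow described above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of extract_urls_sitemap.py: the names `fetch_xml` and `extract_rows` are hypothetical, it uses only the Python standard library, and it assumes the sitemap follows the sitemaps.org XML namespace. It builds the three-column rows (ID, Sitemap, Url) and leaves writing the Excel file aside.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen

# XML namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def fetch_xml(url):
    """Download a sitemap and return its raw bytes (hypothetical helper)."""
    with urlopen(url) as response:
        return response.read()


def extract_rows(sitemap_url, fetch=fetch_xml):
    """Return (id, sitemap, url) rows for a sitemap or a sitemap index."""
    root = ET.fromstring(fetch(sitemap_url))

    if root.tag.endswith("sitemapindex"):
        # A sitemap index lists its child sitemaps in <sitemap><loc> elements.
        sitemaps = [
            loc.text.strip()
            for loc in root.findall("sm:sitemap/sm:loc", NS)
            if loc.text
        ]
    else:
        # A plain sitemap: treat the given URL as the only sitemap.
        sitemaps = [sitemap_url]

    rows = []
    for sitemap in sitemaps:
        # Reuse the already parsed document when it is the sitemap itself.
        tree = root if sitemap == sitemap_url else ET.fromstring(fetch(sitemap))
        for loc in tree.findall("sm:url/sm:loc", NS):
            if loc.text:
                rows.append((len(rows) + 1, sitemap, loc.text.strip()))
    return rows
```

Calling `extract_rows("https://example.com/sitemap_index.xml")` (a placeholder URL) would return a list of tuples, one per URL. One possible way to turn those rows into an Excel file, assuming pandas and openpyxl are installed, is `pandas.DataFrame(rows, columns=["ID", "Sitemap", "Url"]).to_excel("urls.xlsx", index=False)`; the output file name here is also an assumption.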