Ultimate Website Sitemap Parser



Website sitemap parser for Python 3.5+.

Features

Installation

pip install ultimate-sitemap-parser

Usage

from usp.tree import sitemap_tree_for_homepage

tree = sitemap_tree_for_homepage('https://www.nytimes.com/')
print(tree)

sitemap_tree_for_homepage() returns a tree of AbstractSitemap subclass objects that represents the sitemap hierarchy found on the website; see the reference for AbstractSitemap subclasses.
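To get a feel for the hierarchy, the tree can be walked recursively. The sketch below is illustrative only: outline() is a hypothetical helper, and it assumes each sitemap object exposes a url attribute and that index sitemaps expose their children as sub_sitemaps (check the AbstractSitemap reference before relying on either name). Stand-in objects are used so the example runs without network access.

```python
from types import SimpleNamespace


def outline(sitemap, depth=0):
    """Return indented lines describing a sitemap and any nested sitemaps.

    Assumption: sitemaps have a ``url`` attribute and index sitemaps have a
    ``sub_sitemaps`` list. Missing attributes are tolerated via getattr().
    """
    url = getattr(sitemap, "url", "<unknown URL>")
    lines = ["{}{}: {}".format("  " * depth, type(sitemap).__name__, url)]
    for child in getattr(sitemap, "sub_sitemaps", []):
        lines.extend(outline(child, depth + 1))
    return lines


# Stand-in objects (not real usp classes) so the sketch runs offline.
leaf = SimpleNamespace(url="https://example.com/sitemap-news.xml")
root = SimpleNamespace(url="https://example.com/sitemap.xml", sub_sitemaps=[leaf])

print("\n".join(outline(root)))
```

With a real tree, outline(tree) would be called on the object returned by sitemap_tree_for_homepage(), provided the assumed attribute names hold.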

To list every page found across all of the website's sitemaps, use the all_pages() method:

# all_pages() returns an Iterator
for page in tree.all_pages():
    print(page)

The all_pages() method returns an iterator that yields SitemapPage objects; see the SitemapPage reference.
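Because all_pages() returns an iterator, the results are consumed once and the same page may show up in more than one sitemap. A minimal sketch of collecting distinct URLs follows; unique_page_urls() is a hypothetical helper, and it assumes each SitemapPage exposes its address as a url attribute. A stand-in tree is used so the example runs without network access.

```python
from types import SimpleNamespace


def unique_page_urls(tree):
    """Collect page URLs from tree.all_pages(), dropping repeats.

    Assumption: every yielded page object has a ``url`` attribute.
    First-seen order is preserved.
    """
    seen = set()
    urls = []
    for page in tree.all_pages():
        if page.url not in seen:
            seen.add(page.url)
            urls.append(page.url)
    return urls


# Stand-in tree (not a real usp object): all_pages() yields three pages,
# one of which is a duplicate.
fake_pages = [
    SimpleNamespace(url="https://example.com/a"),
    SimpleNamespace(url="https://example.com/b"),
    SimpleNamespace(url="https://example.com/a"),
]
fake_tree = SimpleNamespace(all_pages=lambda: iter(fake_pages))

print(unique_page_urls(fake_tree))
```

For very large sites, keeping every URL in memory may not be practical; in that case, processing pages one at a time inside the loop is the safer choice.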
