A very simple web crawler. It takes a URL as input and outputs the site map. It uses Scrapy and PyGraphviz.
It ignores URLs whose domain differs from that of the starting URL.
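The same-domain rule can be sketched with the standard library alone. This is only an illustration, not the code in simple-web-crawler.py: the function name `same_domain_links` is hypothetical, and it assumes "same domain" means an exact host match, so a subdomain would count as a different domain.

```python
try:
    from urllib.parse import urljoin, urlparse  # Python 3
except ImportError:
    from urlparse import urljoin, urlparse  # Python 2


def same_domain_links(start_url, hrefs):
    """Resolve hrefs against start_url and keep those on the same host.

    Hypothetical helper for illustration; assumes an exact host match.
    """
    start_host = urlparse(start_url).netloc.lower()
    kept = []
    for href in hrefs:
        # Relative links are resolved against the starting URL first.
        absolute = urljoin(start_url, href)
        if urlparse(absolute).netloc.lower() == start_host:
            kept.append(absolute)
    return kept


links = same_domain_links(
    "https://www.lipsum.com/",
    ["/feed/html", "https://example.org/", "banners/"],
)
print(links)
```

Here the link to example.org is dropped, while the two relative links are resolved and kept.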
Please follow the installation guides for Scrapy and Graphviz.
Then, run the following:
pip install -r requirements.txt
$ python2 simple-web-crawler.py -h
usage: simple-web-crawler.py [-h] [-o output_filename] url
E.g.:
python2 simple-web-crawler.py https://www.lipsum.com/ -o lipsum-output.svg
This produces lipsum-output.svg.
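To give a rough idea of what goes into such a site map, the sketch below turns a list of (page, linked page) pairs into Graphviz DOT text. It is not how the script itself works: the script uses PyGraphviz, whereas this builds the DOT source by hand, and the names `to_dot` and `edges` are made up for the example.

```python
def to_dot(edges):
    """Return DOT source for a directed graph given (source, target) pairs.

    Hypothetical helper for illustration; it writes DOT text directly
    instead of going through PyGraphviz.
    """
    lines = ["digraph sitemap {"]
    for source, target in edges:
        # Escape double quotes so each URL stays a valid quoted DOT identifier.
        source = source.replace('"', '\\"')
        target = target.replace('"', '\\"')
        lines.append('    "%s" -> "%s";' % (source, target))
    lines.append("}")
    return "\n".join(lines)


# Assumed sample data: one page linking to another on the same site.
edges = [("https://www.lipsum.com/", "https://www.lipsum.com/feed/html")]
dot_source = to_dot(edges)
print(dot_source)
```

Saved to a file such as sitemap.dot, the output can be rendered with Graphviz, for example with `dot -Tsvg sitemap.dot -o sitemap.svg`.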