
ExtremeDie/simple-web-crawler

 
 


Simple Web Crawler

A very simple web crawler. It takes a URL as input and outputs the site map. It uses Scrapy and PyGraphviz.

It ignores URLs whose domain differs from that of the starting URL.
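
As a rough illustration of the same-domain rule, the sketch below compares the host part of each discovered link with that of the starting URL. It is not taken from the project's source: the helper names `same_domain` and `filter_links` are hypothetical, the exact-host comparison is an assumption (the project may treat subdomains differently), and the example paths are made up.

```python
try:
    from urllib.parse import urljoin, urlparse  # Python 3
except ImportError:
    from urlparse import urljoin, urlparse  # Python 2


def same_domain(start_url, candidate_url):
    """Return True if both URLs share the same host (assumed rule: exact match)."""
    start_host = urlparse(start_url).netloc.lower()
    candidate_host = urlparse(candidate_url).netloc.lower()
    return start_host == candidate_host


def filter_links(start_url, page_url, hrefs):
    """Resolve hrefs against page_url and keep only same-domain links."""
    kept = []
    for href in hrefs:
        absolute = urljoin(page_url, href)  # handles relative links
        if same_domain(start_url, absolute):
            kept.append(absolute)
    return kept


# Hypothetical links found on a hypothetical page of the example site.
links = filter_links(
    "https://www.lipsum.com/",
    "https://www.lipsum.com/feed/html",
    ["/banners", "https://example.org/elsewhere", "privacy"],
)
print(links)
```

Scrapy spiders also support an `allowed_domains` attribute for this kind of filtering; whether this project relies on it is not stated here.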

Installation

First install Scrapy and Graphviz by following their respective installation guides.

Then, run the following:

pip install -r requirements.txt

Usage

$ python2 simple-web-crawler.py -h
usage: simple-web-crawler.py [-h] [-o output_filename] url
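
For readers curious how a command line of this shape can be declared, here is a minimal `argparse` sketch that accepts the same arguments as the usage line above. It is an illustration, not the project's actual code: the function name `build_parser`, the `dest` name `output`, the help strings, and the default file name `sitemap.svg` are all assumptions.

```python
import argparse


def build_parser():
    """Build a parser matching: simple-web-crawler.py [-h] [-o output_filename] url"""
    parser = argparse.ArgumentParser(prog="simple-web-crawler.py")
    parser.add_argument(
        "-o",
        dest="output",
        metavar="output_filename",
        default="sitemap.svg",  # assumed default; the real one is not documented
        help="file to write the site map to",
    )
    parser.add_argument("url", help="starting URL of the crawl")
    return parser


# Parse an explicit argument list instead of sys.argv, for demonstration.
args = build_parser().parse_args(
    ["https://www.lipsum.com/", "-o", "lipsum-output.svg"]
)
print("%s -> %s" % (args.url, args.output))
```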

E.g.:

python2 simple-web-crawler.py https://www.lipsum.com/ -o lipsum-output.svg

This writes the site map to lipsum-output.svg.
