Python web scraper
Python
.gitattributes
.gitignore
README
scrape.py

README

Scrapes the pages and resources on a domain, starting from the provided URL.
The local directory structure mirrors the URL paths as closely as possible.
Each HTML page is inspected for src and href attributes to find further pages and resources.
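
The behaviour described above can be illustrated with a minimal sketch. This is
not the code in scrape.py; the names LinkCollector, same_domain and local_path,
and the index.html default for directory URLs, are assumptions made for
illustration only.

```python
from html.parser import HTMLParser
from pathlib import PurePosixPath
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collects the value of every src and href attribute in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("src", "href") and value:
                # Resolve relative references against the page URL.
                self.links.append(urljoin(self.base_url, value))


def same_domain(url, domain):
    """True if the URL's host is the domain or one of its subdomains."""
    host = urlsplit(url).hostname or ""
    return host == domain or host.endswith("." + domain)


def local_path(url, index_name="index.html"):
    """Map a URL to a relative local path that mirrors the URL path."""
    path = urlsplit(url).path
    if not path or path.endswith("/"):
        # Assumption: directory-style URLs are saved under a default file name.
        path += index_name
    # Drop the root and any ".." so the result stays inside the output directory.
    parts = [p for p in PurePosixPath(path).parts if p not in ("/", "..")]
    if not parts:
        parts = [index_name]
    return str(PurePosixPath(*parts))
```

For example, a page at http://example.com/docs/ that links to page.html would
yield http://example.com/docs/page.html, which local_path maps to
docs/page.html under the output directory.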

Usage: scrape.py [OPTIONS] domain url

Options:
  -h, --help  show the help message and exit
  --out  output directory; defaults to the working directory if not provided
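
A command line like the one above could be declared roughly as follows. This is
a sketch using the standard argparse module; scrape.py itself may parse its
arguments differently, and build_parser is a hypothetical name.

```python
import argparse
import os


def build_parser():
    """Build a parser for: scrape.py [OPTIONS] domain url."""
    parser = argparse.ArgumentParser(
        prog="scrape.py",
        description="Scrape the pages and resources on a domain.",
    )
    # Optional output directory; falls back to the current working directory.
    parser.add_argument("--out", default=os.getcwd(),
                        help="output directory (default: working directory)")
    # The two required positional arguments, in the order shown in the usage line.
    parser.add_argument("domain", help="domain to scrape, e.g. google.com")
    parser.add_argument("url", help="URL to start scraping from")
    return parser


# Parse an explicit argument list instead of sys.argv, for demonstration.
args = build_parser().parse_args(
    ["--out", "./github", "github.com", "http://github.com/"]
)
```

With this argument list, args.out is "./github", args.domain is "github.com"
and args.url is "http://github.com/".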

Examples:

Scrape the google.com domain, starting at http://google.com/:
  python ./scrape.py google.com http://google.com/  

Scrape the github.com domain, store in the provided directory:
  python ./scrape.py --out ./github github.com http://github.com/