grepurl is a command line tool that extracts URLs from a web page (or a
local HTML file).

    grepurl http://example.com/  # extract all URLs from links and images
    grepurl -a http://example.com/foo.htm  # only extract from <a> tags (i.e. links)
    grepurl -i http://example.com/bar.htm  # only extract from <img> tags (i.e. images)
    grepurl -r "\.py$" http://example.com/  # only extract URLs that end in '.py'
    grepurl -r "\.zip$" -d http://example.com/  # download all zip files
    grepurl -r "\.zip$" -d -o download_dir http://example.com/  # download all zip files into download_dir

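To illustrate what the options above do conceptually, here is a minimal sketch using only the Python standard library. It is not grepurl's actual implementation: the names `URLCollector` and `extract_urls` and the parameters `links`, `images` and `regex` are hypothetical, chosen only to mirror the behaviour described for `-a`, `-i` and `-r`.

```python
import re
from html.parser import HTMLParser


class URLCollector(HTMLParser):
    """Collect href values from <a> tags and src values from <img> tags."""

    def __init__(self, links=True, images=True):
        super().__init__()
        self.links = links      # hypothetical switch, roughly what -a selects
        self.images = images    # hypothetical switch, roughly what -i selects
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.links and tag == "a" and attrs.get("href"):
            self.urls.append(attrs["href"])
        elif self.images and tag == "img" and attrs.get("src"):
            self.urls.append(attrs["src"])


def extract_urls(html, links=True, images=True, regex=None):
    """Return the URLs found in html, optionally filtered by a regex."""
    collector = URLCollector(links=links, images=images)
    collector.feed(html)
    collector.close()
    if regex is None:
        return collector.urls
    pattern = re.compile(regex)
    # Keep only URLs in which the pattern matches somewhere, as with -r
    return [url for url in collector.urls if pattern.search(url)]


if __name__ == "__main__":
    page = '<a href="a.py">x</a><img src="b.png"><a href="c.zip">z</a>'
    print(extract_urls(page, regex=r"\.zip$"))
```

The sketch returns URLs exactly as they appear in the HTML; resolving relative URLs against the page address and downloading the matches (the `-d` and `-o` options) are left out.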
To install the released package with pip:

    pip install grepurl

To install from source in editable mode:

    git clone https://github.com/arne-cl/grepurl
    cd grepurl
    pip install -e .

License: GPLv2 or later.
Gerome Fournier (original author). His implementation is only available via the Internet Archive.
Arne Neumann (added -l option for local files, minor changes).
GPT-4 (rewrote the script for Python 3 compatibility).