scrapAFile

A basic multithreaded, configurable web crawler in Python for crawling files of a particular type. Currently in beta. For example, to get a list of PDFs from [this] awesome resource:

$ python file_scraper.py pdf -S <list of space separated urls> -t 4 -depth 3 -output <folder to download files>

The URLs are given as a space-separated list. Note: this script does not honor robots.txt right now and isn't entirely honest about its user-agent string either. I will open an issue for that.
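The repository's actual implementation isn't reproduced here, but a minimal sketch of the idea — multithreaded fetching, filtering links by extension, and a robots.txt check that would address the note above — could use only the Python standard library. The helper names (`crawl`, `fetch`, `allowed`, `LinkParser`) are illustrative and are not part of file_scraper.py:

```python
# Hypothetical sketch -- NOT the file_scraper.py implementation.
import urllib.robotparser
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import Request, urlopen


class LinkParser(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def allowed(url, agent="scrapAFile"):
    """Check robots.txt before fetching (the script currently skips this)."""
    parts = urlparse(url)
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url(f"{parts.scheme}://{parts.netloc}/robots.txt")
    try:
        rp.read()
    except OSError:
        return True  # robots.txt unreachable; assume allowed
    return rp.can_fetch(agent, url)


def fetch(url):
    """Download a page as text, with an honest user-agent string."""
    if not allowed(url):
        return ""
    try:
        with urlopen(Request(url, headers={"User-Agent": "scrapAFile"})) as resp:
            return resp.read().decode("utf-8", "replace")
    except OSError:
        return ""


def crawl(seeds, ext, depth, threads=4):
    """BFS up to `depth` hops from `seeds`; return links ending in `.ext`."""
    found, seen = set(), set(seeds)
    frontier = list(seeds)
    for _ in range(depth):
        # Fetch the whole frontier concurrently, one thread per worker.
        with ThreadPoolExecutor(max_workers=threads) as pool:
            pages = list(pool.map(fetch, frontier))
        next_frontier = []
        for base, html in zip(frontier, pages):
            parser = LinkParser()
            parser.feed(html)
            for link in parser.links:
                target = urljoin(base, link)
                if target in seen:
                    continue
                seen.add(target)
                if target.lower().endswith("." + ext):
                    found.add(target)  # matching file type: collect it
                else:
                    next_frontier.append(target)  # keep crawling
        frontier = next_frontier
    return found
```

This mirrors the command-line shape above: seed URLs, a target extension, a thread count, and a depth limit; downloading the collected URLs to an output folder would be a separate step.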
