scrapAFile

A basic multithreaded, configurable web crawler in Python for crawling files of a particular type. Currently in beta. For example, to get a list of PDFs from [this] awesome resource:

$ python file_scraper.py pdf -S <list of space separated urls> -t 4 -depth 3 -output <folder to download files>

The URLs are given as a space-separated list. Note: this script does not honor robots.txt right now and isn't entirely honest about its user-agent string either. I will open an issue for that.
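The repository's actual implementation isn't reproduced here, but a minimal sketch of the idea — multithreaded fetching, filtering links by extension, and a robots.txt check that would address the note above — could use only the Python standard library. The helper names (`crawl`, `fetch`, `allowed`, `LinkParser`) are illustrative and are not part of file_scraper.py:

```python
# Hypothetical sketch -- NOT the file_scraper.py implementation.
import urllib.robotparser
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import Request, urlopen


class LinkParser(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def allowed(url, agent="scrapAFile"):
    """Check robots.txt before fetching (the script currently skips this)."""
    parts = urlparse(url)
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url(f"{parts.scheme}://{parts.netloc}/robots.txt")
    try:
        rp.read()
    except OSError:
        return True  # robots.txt unreachable; assume allowed
    return rp.can_fetch(agent, url)


def fetch(url):
    """Download a page as text, with an honest user-agent string."""
    if not allowed(url):
        return ""
    try:
        with urlopen(Request(url, headers={"User-Agent": "scrapAFile"})) as resp:
            return resp.read().decode("utf-8", "replace")
    except OSError:
        return ""


def crawl(seeds, ext, depth, threads=4):
    """BFS up to `depth` hops from `seeds`; return links ending in `.ext`."""
    found, seen = set(), set(seeds)
    frontier = list(seeds)
    for _ in range(depth):
        # Fetch the whole frontier concurrently, one thread per worker.
        with ThreadPoolExecutor(max_workers=threads) as pool:
            pages = list(pool.map(fetch, frontier))
        next_frontier = []
        for base, html in zip(frontier, pages):
            parser = LinkParser()
            parser.feed(html)
            for link in parser.links:
                target = urljoin(base, link)
                if target in seen:
                    continue
                seen.add(target)
                if target.lower().endswith("." + ext):
                    found.add(target)  # matching file type: collect it
                else:
                    next_frontier.append(target)  # keep crawling
        frontier = next_frontier
    return found
```

This mirrors the command-line shape above: seed URLs, a target extension, a thread count, and a depth limit; downloading the collected URLs to an output folder would be a separate step.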
