TorScraper is a utility for multi-threaded scraping via the Tor network.
It provides a clean interface for anonymously scraping data from the web simultaneously through multiple Tor exit nodes. This is useful for maintaining privacy, circumventing IP blocking and various forms of censorship, and going beyond what rate limits would otherwise allow.
Note: other environments will very likely work with little or no change, but TorScraper was developed and tested with the following:
- Ubuntu Trusty 14.04
- Python 2.7.6
- Third-party libraries
  - PyYAML 3.10
  - pycurl 7.43.0
  - stem 1.4.0
See tor_scraper.py for example usage.
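
As a rough illustration of the general idea (not TorScraper's actual API; tor_scraper.py remains the authoritative example), the sketch below spreads a list of URLs across several local Tor SOCKS ports and fetches them in parallel threads with pycurl. The function names, the port layout starting at 9050, and the assumption that one Tor instance is already listening on each port are all hypothetical and chosen only for this example.

```python
import threading

# Assumption: one local Tor instance per SOCKS port, starting at 9050.
# This layout is illustrative and is not taken from TorScraper itself.
BASE_SOCKS_PORT = 9050


def socks_ports(num_instances, base_port=BASE_SOCKS_PORT):
    """Return one SOCKS port per (hypothetical) Tor instance."""
    return [base_port + i for i in range(num_instances)]


def assign_urls(urls, ports):
    """Distribute URLs across the given ports round-robin."""
    buckets = dict((port, []) for port in ports)
    for i, url in enumerate(urls):
        buckets[ports[i % len(ports)]].append(url)
    return buckets


def fetch_via_tor(url, socks_port):
    """Fetch a URL through the Tor SOCKS proxy at 127.0.0.1:socks_port."""
    # Imported lazily so the helpers above can be used without pycurl.
    import pycurl
    from io import BytesIO

    buf = BytesIO()
    curl = pycurl.Curl()
    curl.setopt(pycurl.URL, url)
    curl.setopt(pycurl.PROXY, "127.0.0.1")
    curl.setopt(pycurl.PROXYPORT, socks_port)
    # SOCKS5_HOSTNAME lets the proxy (Tor) resolve hostnames as well,
    # so DNS lookups do not bypass the Tor network.
    curl.setopt(pycurl.PROXYTYPE, pycurl.PROXYTYPE_SOCKS5_HOSTNAME)
    curl.setopt(pycurl.WRITEFUNCTION, buf.write)
    try:
        curl.perform()
    finally:
        curl.close()
    return buf.getvalue()


def scrape(urls, num_instances, fetch=fetch_via_tor):
    """Fetch all URLs using one worker thread per SOCKS port."""
    results = {}
    lock = threading.Lock()

    def worker(port, bucket):
        for url in bucket:
            body = fetch(url, port)
            with lock:  # protect the shared results dict
                results[url] = body

    buckets = assign_urls(urls, socks_ports(num_instances))
    threads = [threading.Thread(target=worker, args=(port, bucket))
               for port, bucket in buckets.items()]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Calling scrape(list_of_urls, 3) would, under these assumptions, return a dict mapping each URL to its response body, with requests split across ports 9050 to 9052. The fetch parameter exists so that the threading logic can be exercised with a stub when no Tor instance is available.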
- Fork it!
- Create your feature branch:
    git checkout -b my-new-feature
- Commit your changes:
    git commit -am 'Add some feature.'
- Push to the branch:
    git push origin my-new-feature
- Submit a pull request :D