Scrapy integration with Tor for anonymous web scraping
tor fixed comments Nov 17, 2015
.gitignore init project files Nov 17, 2015
README.md Created readme.md Nov 17, 2015
scrapy.cfg init project files Nov 17, 2015


scrapy-tor

This is a Scrapy project skeleton with Tor integration.

How to get started

Because Scrapy does not work with SOCKS proxies, you'll need to set up an HTTP proxy server that relays requests to Tor. You can install Polipo, a lightweight web proxy, and then point it at Tor's SOCKS listening port, which is 9050 by default.

To set up Polipo, uncomment or add the following lines in its config file, /etc/polipo/config:

socksParentProxy = localhost:9050
disableLocalInterface = true
diskCacheRoot = ""

ProxyMiddleware, defined in middlewares.py, relays all of Scrapy's requests to Polipo, which listens on port 8123 by default.
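
For orientation, here is a minimal sketch of what a middleware like this could look like. It is not copied from this project's middlewares.py, which may differ; the POLIPO_PROXY constant and the 127.0.0.1 address are assumptions made for illustration.

```python
# Sketch only: the project's real middlewares.py may differ.
# Assumption: Polipo runs locally on its default port, 8123.
POLIPO_PROXY = "http://127.0.0.1:8123"


class ProxyMiddleware:
    """Downloader middleware that routes every request through Polipo."""

    def process_request(self, request, spider):
        # Scrapy's downloader honours the "proxy" key in request.meta.
        request.meta["proxy"] = POLIPO_PROXY
        # Returning None tells Scrapy to keep processing the request.
        return None
```

A middleware of this kind only takes effect once it is listed in the DOWNLOADER_MIDDLEWARES setting in settings.py; the exact module path depends on how the project is laid out.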

Don't forget to start Polipo and Tor before scraping!
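
If you want to confirm that both services are up before launching a spider, a quick check along these lines may help. This is a sketch rather than part of the project: port_open is a hypothetical helper, and the ports assume the defaults mentioned above (9050 for Tor, 8123 for Polipo) on the local machine.

```python
import socket


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Assumed defaults: Tor on 9050, Polipo on 8123, both on localhost.
    for name, port in (("Tor", 9050), ("Polipo", 8123)):
        state = "listening" if port_open("127.0.0.1", port) else "not reachable"
        print(f"{name} on port {port}: {state}")
```

This only verifies that something accepts connections on each port; it does not prove that traffic is actually leaving through Tor.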