A very simple demo that uses Scrapy together with gocr to read proxy server addresses from torvpn.com, where they are displayed as image files rather than text. It uses my personal proxy API. I'll leave the API open for now, but I will close access immediately if I see too many requests coming in. "Too many" is a totally arbitrary number decided by me.

Requirements:

- gocr
- virtualenv
- Python libraries listed in requirements.txt

Don't ruin it for everyone. If you start using my proxydump API heavily, I will shut it down. I'm sure it's only a matter of time before a bot picks it up, though.

```
cd ocr/ocr/spiders
scrapy runspider torvpn_com.py
```

Downloaded images get saved to ocr/ocr/spiders/proxy_images/address.png
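
For anyone curious what the OCR step might look like outside the spider, here is a minimal sketch. It is not the code in torvpn_com.py. It assumes gocr is on your PATH and can read the saved PNG (gocr may need the netpbm tools installed to handle PNG input), and that the image contains an IPv4 address with an optional port. The helper names `parse_proxy` and `ocr_image` are made up for this example.

```python
import re
import subprocess

# IPv4 address with an optional ":port" suffix (assumed image content).
PROXY_RE = re.compile(r"(\d{1,3}(?:\.\d{1,3}){3})(?::(\d{1,5}))?")


def parse_proxy(ocr_text):
    """Return 'ip' or 'ip:port' found in raw OCR text, or None."""
    # OCR output often contains stray spaces and newlines; drop them first.
    compact = re.sub(r"\s+", "", ocr_text)
    match = PROXY_RE.search(compact)
    if match is None:
        return None
    ip, port = match.groups()
    # Reject values that look like an address but are out of range.
    if any(int(octet) > 255 for octet in ip.split(".")):
        return None
    if port is not None and int(port) > 65535:
        return None
    return ip if port is None else f"{ip}:{port}"


def ocr_image(path):
    """Run gocr on an image file (hypothetical helper) and parse its output."""
    result = subprocess.run(
        ["gocr", "-i", path],  # -i selects the input image file
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_proxy(result.stdout)


if __name__ == "__main__":
    # Path taken from the note above; run this from the repository root.
    print(ocr_image("ocr/ocr/spiders/proxy_images/address.png"))
```

Keeping the parsing separate from the gocr call makes it easy to check the cleanup logic without having gocr installed.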