
# PythonScrapyBasicSetup

Basic setup with random user agents and proxy addresses for the Python Scrapy framework.

#### Setup

1. Install Scrapy Framework

   ```
   pip install Scrapy
   ```

   Detailed installation guide

2. Install Beautiful Soup 4

   ```
   pip install beautifulsoup4
   ```

   Detailed installation guide

#### Usage

To see what it does, just run:

```
python run.py
```

The project contains two middleware classes in middlewares.py. ProxyMiddleware downloads a list of proxy IP addresses and chooses one at random before each request is processed. RandomUserAgentMiddleware works the same way: it downloads user agent strings, saves them into the 'USER_AGENT_LIST' setting, and selects one at random before each request is processed. Both middlewares are activated in settings.py (a minimal sketch of this pattern is shown after the commands below). The project also contains two spiders for testing purposes, spiders/iptester.py and spiders/uatester.py. You can run them individually:

```
scrapy crawl UAtester
scrapy crawl IPtester
```
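
The actual classes live in middlewares.py and fetch their lists at runtime; the sketch below omits the download step and simply reads the lists from settings. It only illustrates the general pattern (random selection in `process_request`); the `PROXY_LIST` setting name is an assumption for this example.

```python
import random


class RandomUserAgentMiddleware:
    """Pick a random user agent string for every outgoing request."""

    def __init__(self, user_agents):
        self.user_agents = user_agents

    @classmethod
    def from_crawler(cls, crawler):
        # Assumes USER_AGENT_LIST has already been populated with UA strings.
        return cls(crawler.settings.getlist('USER_AGENT_LIST'))

    def process_request(self, request, spider):
        if self.user_agents:
            request.headers['User-Agent'] = random.choice(self.user_agents)


class ProxyMiddleware:
    """Route every outgoing request through a randomly chosen proxy."""

    def __init__(self, proxies):
        # e.g. ['http://1.2.3.4:8080', ...]; PROXY_LIST is a hypothetical setting name.
        self.proxies = proxies

    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler.settings.getlist('PROXY_LIST'))

    def process_request(self, request, spider):
        if self.proxies:
            request.meta['proxy'] = random.choice(self.proxies)
```

Activating downloader middlewares in settings.py then looks roughly like this (the module path is assumed from the project name):

```python
DOWNLOADER_MIDDLEWARES = {
    'PythonScrapyBasicSetup.middlewares.ProxyMiddleware': 543,
    'PythonScrapyBasicSetup.middlewares.RandomUserAgentMiddleware': 544,
}
```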

The run.py file is also a good example of how to include and run your spiders sequentially from one script; a rough sketch of that pattern follows.
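
The repository's run.py is the authoritative version; as a general idea, running spiders one after another from a single script usually follows Scrapy's CrawlerRunner pattern, something like:

```python
from twisted.internet import defer, reactor
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from scrapy.utils.project import get_project_settings

configure_logging()
runner = CrawlerRunner(get_project_settings())


@defer.inlineCallbacks
def crawl():
    # Run the spiders sequentially: the second starts only after the first finishes.
    yield runner.crawl('UAtester')
    yield runner.crawl('IPtester')
    reactor.stop()


crawl()
reactor.run()
```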

If you have any questions or problems, feel free to ask me via email.
