This bot demonstrates techniques for obtaining data by brute force. Please note that this project is purely for academic purposes; any data obtained from the sample website belongs to its owner. DO NOT NUKE THE WEBSITE.
Edit the database settings in EAAScraper/settings.py to match your MongoDB setup. Default configuration:
MONGO_URI = 'mongodb://localhost:27017/'
MONGO_DATABASE = 'eaa'
MONGO_COLLECTION = 'companies'
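For orientation, here is a minimal sketch of how an item pipeline could pick up these three settings. It follows the general pattern shown in Scrapy's item pipeline documentation; the class name `MongoPipeline` and its attribute names are assumptions for illustration, not necessarily what this project's pipeline looks like.

```python
class MongoPipeline:
    """Hypothetical pipeline that writes each scraped item to MongoDB."""

    def __init__(self, mongo_uri, mongo_db, mongo_collection):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db
        self.mongo_collection = mongo_collection
        self.client = None
        self.collection = None

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this hook and exposes settings.py via crawler.settings.
        return cls(
            mongo_uri=crawler.settings.get("MONGO_URI"),
            mongo_db=crawler.settings.get("MONGO_DATABASE"),
            mongo_collection=crawler.settings.get("MONGO_COLLECTION"),
        )

    def open_spider(self, spider):
        # Imported here so the module loads even where pymongo is absent.
        import pymongo

        self.client = pymongo.MongoClient(self.mongo_uri)
        self.collection = self.client[self.mongo_db][self.mongo_collection]

    def close_spider(self, spider):
        self.client.close()

    def process_item(self, item, spider):
        # Store a plain-dict copy and pass the item on to later pipelines.
        self.collection.insert_one(dict(item))
        return item
```

A pipeline like this only runs if it is listed in `ITEM_PIPELINES` in settings.py; the exact module path depends on the project layout.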
After setting up, run:
service mongod start
cd EAAScraper
pip install -r requirements.txt
scrapy crawl eaa
Techniques demonstrated:
- Submitting multipart/form-data
- Dealing with session authentication - the example site uses pre-negotiated values for form requests
- Using generators instead of a pre-computed list to enumerate all possible inputs efficiently
- Inserting valid records into MongoDB using pipelines
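To illustrate the first item in the list, the sketch below builds a multipart/form-data body by hand using only the standard library. The helper name `encode_multipart` and the field name `licence_no` are hypothetical, and the spider in this project may construct its requests differently.

```python
import uuid


def encode_multipart(fields, boundary=None):
    """Encode simple text fields as a multipart/form-data body.

    Returns (body_bytes, content_type_header). Only plain text fields
    are handled; file uploads and name escaping are out of scope.
    """
    boundary = boundary or uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines.append(f"--{boundary}")
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("")  # blank line separates part headers from the value
        lines.append(str(value))
    lines.append(f"--{boundary}--")  # closing delimiter
    lines.append("")
    body = "\r\n".join(lines).encode("utf-8")
    content_type = f"multipart/form-data; boundary={boundary}"
    return body, content_type


# 'licence_no' is a made-up field name used only for this example.
body, content_type = encode_multipart({"licence_no": "E-000001"}, boundary="demo")
```

In a Scrapy spider, a body like this could be sent with `scrapy.Request(url, method="POST", body=body, headers={"Content-Type": content_type})`, together with whatever session values the target form expects.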
This scraper queries the Estate Agents Authority website to look up real estate agents in Hong Kong.
Normally, you look up an agent's details manually by entering a valid licence number in the form.
The spider loops through all possible licence numbers and submits a form request for each one. For demonstration purposes, it only loops over one letter by default.
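The enumeration can be sketched with a generator, which yields one candidate at a time instead of building the whole list in memory. The `<letter>-<six digits>` layout and the function name `licence_numbers` are assumptions for illustration only; the real licence number format and the spider's actual loop may differ.

```python
import itertools


def licence_numbers(prefixes=("E",), digits=6):
    """Lazily yield candidate licence numbers such as 'E-000000'.

    The '<letter>-<digits>' layout is an assumed format, not taken
    from the target site.
    """
    for prefix in prefixes:
        for n in range(10 ** digits):
            yield f"{prefix}-{n:0{digits}d}"


# Peek at the first three candidates without generating the rest.
first_three = list(itertools.islice(licence_numbers(), 3))
print(first_three)  # ['E-000000', 'E-000001', 'E-000002']
```

Passing more letters in `prefixes` widens the search, which mirrors the default of looping over a single letter.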