Use Scrapy to crawl text reviews & images from Dianping.com and generate pretty static pages!
Images
- The downloaded images will be stored under ../imgs/, sorted by /user/shop/
- You can also customize the image path by changing IMAGES_STORE in settings.py
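As a rough illustration of the IMAGES_STORE option, the fragment below shows what the change in settings.py might look like. The folder name `dianping_imgs` is a hypothetical example, not something defined by this project.

```python
# settings.py (fragment, illustrative only)
import os

# Default location described above:
# IMAGES_STORE = "../imgs/"

# Hypothetical alternative: keep the images in a folder under the home directory.
# "dianping_imgs" is an assumed name chosen for this example.
IMAGES_STORE = os.path.join(os.path.expanduser("~"), "dianping_imgs")
```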
Text Reviews
- The text reviews are exported in JSON format to review.json
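A minimal sketch of how the exported file could be inspected afterwards, assuming review.json holds a single JSON array of items (which is what Scrapy's JSON feed export writes). The helper names `load_reviews` and `field_counts` are made up for this example, and no particular field names are assumed.

```python
import json
from collections import Counter


def load_reviews(path="review.json"):
    """Load the exported items from a JSON array file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def field_counts(reviews):
    """Count how often each field name appears across all items."""
    counts = Counter()
    for item in reviews:
        counts.update(item.keys())
    return counts


if __name__ == "__main__":
    reviews = load_reviews()
    print(f"{len(reviews)} reviews loaded")
    for field, n in field_counts(reviews).items():
        print(f"{field}: present in {n} items")
```

Running the script next to review.json prints the number of reviews and the fields the spider actually exported, which is a quick way to check a crawl before building anything on top of it.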
To Be Done..
- Install Python 3.6
- Install Scrapy following the tutorial
- Set start_urls in dianping_spider.py to the URL of the review page that you want to crawl, e.g., click here to view my Dianping reviews page
Under ../Dianping-Gallery/dianping_gallery/dianping_gallery/spiders, run:
scrapy runspider dianping_spider.py -o review.json
The download progress will then be shown in the terminal.