Macy's
Tips: there is a small difference between the database used for the rank-1 URLs and what is written in _01_request_of_prod_page.py, so you have two choices:
- copy the rank-1 category URLs ("Women", "Men", "Toys", etc.) and insert them into the database manually, or
- change the database to match the one written in the code.
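The first option can be sketched as a small seeding script. This is a minimal sketch only: the table name `rank1_urls`, its columns, and the category URLs are illustrative assumptions; the real schema comes from the .docx design document, and the real database may not be SQLite.

```python
import sqlite3

# Illustrative rank-1 category URLs -- placeholders, not guaranteed to be
# Macy's actual category URLs.
CATEGORIES = [
    ("Women", "https://www.macys.com/shop/womens-clothing"),
    ("Men", "https://www.macys.com/shop/mens-clothing"),
    ("Toys", "https://www.macys.com/shop/kids-clothes/toys"),
]

def seed_rank1_urls(conn):
    """Create the (assumed) rank-1 table and insert the category URLs."""
    conn.execute("CREATE TABLE IF NOT EXISTS rank1_urls (name TEXT, url TEXT)")
    conn.executemany("INSERT INTO rank1_urls (name, url) VALUES (?, ?)", CATEGORIES)
    conn.commit()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # point this at your real database file
    seed_rank1_urls(conn)
    print(conn.execute("SELECT COUNT(*) FROM rank1_urls").fetchone()[0])
```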
Environment: conda install requests
Run the code:
- unzip readme.zip
- create a database with the schema described in readme/爬取Macy网用户评价日志(1)需求存储数据库设计.docx (the database design document for storing the Macy's user-review crawl)
- run _01_request_of_prod_page.py to download the rank-1 URLs, e.g. "Women", "Men", "Toys", etc.
- run _02_main.py to download the rank-2 URLs, e.g. "Women --> Tops", "Women --> Hoodies", etc.
- run _03_main.py to download the rank-3 URLs, e.g. "Tops 1", "Tops 2", etc.
- run _04_main.py to download the rank-4 URLs, which are the actual product-page URLs.
- run _04_spider_of_rank4_prod_info_for_pic.py to download the product images.
- run _04_spider_of_rank4_prod_info_main.py to download product information such as price, name, etc.
- run _04_extractor_main.py to extract the review information into the database.
- run the remaining scripts, i.e. any file containing: if __name__ == "__main__":
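The core of each rank step above is the same: download a page and harvest the links to the next rank. A minimal stdlib sketch of that harvesting step is shown below; the `/shop/` URL prefix and the sample HTML are assumptions for illustration, not Macy's actual page structure (the real scripts fetch the HTML with requests first).

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect absolute URLs for every <a> whose href looks like a category link."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            # "/shop/" is an assumed marker for category/product links.
            if href and href.startswith("/shop/"):
                self.links.append("https://www.macys.com" + href)

# Hard-coded sample standing in for a downloaded rank-1 page.
SAMPLE_HTML = """
<ul>
  <li><a href="/shop/womens-clothing/tops">Tops</a></li>
  <li><a href="/shop/womens-clothing/hoodies">Hoodies</a></li>
  <li><a href="#top">Back to top</a></li>
</ul>
"""

if __name__ == "__main__":
    parser = LinkExtractor()
    parser.feed(SAMPLE_HTML)
    print(parser.links)
```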
Tips 2:
- Consider using Scrapy: it is more robust than the plain-requests approach in the scripts above, with built-in retries, throttling, and concurrent downloads.