Crawler_of_Product_Comment

Target site: Macy's

Tip 1: the rank-1 URL table in the database differs slightly from the one _01_request_of_prod_page.py expects. You have two options:

1. Copy the URLs of the rank-1 categories ("Women", "Men", "Toys", etc.) from the site and insert them into the database yourself, or
2. Change the database to match the schema used in the code.
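For the first option, the rank-1 URLs can be inserted with a few lines of code. The sketch below is hypothetical: it uses Python's built-in sqlite3 module as a stand-in for the project's MySQL database, and the table name rank1_url, its columns, and the example.com URLs are placeholders, not the schema from the design document.

```python
import sqlite3

# Placeholder rows (hypothetical): replace the example.com URLs with the
# real rank-1 URLs copied from the site.
RANK1_ROWS = [
    ("Women", "https://example.com/women"),
    ("Men", "https://example.com/men"),
    ("Toys", "https://example.com/toys"),
]


def insert_rank1_urls(conn, rows):
    """Create the hypothetical rank1_url table if needed and insert rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS rank1_url ("
        "name TEXT PRIMARY KEY, url TEXT NOT NULL)"
    )
    # INSERT OR IGNORE skips names that already exist, so re-running is safe.
    conn.executemany(
        "INSERT OR IGNORE INTO rank1_url (name, url) VALUES (?, ?)", rows
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    insert_rank1_urls(conn, RANK1_ROWS)
    print(conn.execute("SELECT COUNT(*) FROM rank1_url").fetchone()[0])
```

With a MySQL driver such as PyMySQL the placeholders become %s and INSERT OR IGNORE becomes INSERT IGNORE; the table and column names must match whatever schema you actually created.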

Environment: install the requests library (conda install requests).

To run the code:

1. Unzip readme.zip.
2. Create a database matching the design in readme/爬取Macy网用户评价日志（1）需求存储数据库设计.docx ("Log of crawling Macy's user reviews (1): database design for storing the required data").
3. Run _01_request_of_prod_page.py to download the rank-1 URLs (top-level categories such as "Women", "Men", "Toys").
4. Run _02_main.py to download the rank-2 URLs (subcategories such as "Women > Tops", "Women > Hoodies").
5. Run _03_main.py to download the rank-3 URLs (e.g. "tops1", "tops2").
6. Run _04_main.py to download the rank-4 URLs, i.e. the URLs of the individual product pages.
7. Run _04_spider_of_rank4_prod_info_for_pic.py to download the product images.
8. Run _04_spider_of_rank4_prod_info_main.py to download product information such as name and price.
9. Run _04_extractor_main.py to extract the review information into the database.
10. Run any remaining script that contains an if __name__ == "__main__": block.
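The ordered steps can be chained in a small driver script. This is a hypothetical run_all.py, not part of the repository: it assumes the scripts sit in the current working directory and that each stage should start only after the previous one exits successfully, since later stages appear to depend on URLs saved by earlier ones.

```python
import subprocess
import sys

# Stage scripts in the order listed in the run instructions.
PIPELINE = [
    "_01_request_of_prod_page.py",
    "_02_main.py",
    "_03_main.py",
    "_04_main.py",
    "_04_spider_of_rank4_prod_info_for_pic.py",
    "_04_spider_of_rank4_prod_info_main.py",
    "_04_extractor_main.py",
]


def run_pipeline(scripts, runner=subprocess.run):
    """Run each script with the current interpreter, stopping at the first failure.

    Returns the list of scripts that finished with exit code 0.
    """
    done = []
    for script in scripts:
        print(f"running {script} ...")
        result = runner([sys.executable, script])
        if result.returncode != 0:
            print(f"{script} exited with code {result.returncode}; stopping.")
            break
        done.append(script)
    return done


if __name__ == "__main__":
    completed = run_pipeline(PIPELINE)
    sys.exit(0 if len(completed) == len(PIPELINE) else 1)
```

Stopping at the first failure keeps later stages from running against a partially filled database; the runner parameter exists only so the loop can be exercised without launching real crawls.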

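Tip 2:

Consider using Scrapy instead. It provides automatic retries, request throttling and concurrent downloads out of the box, which makes it more robust than the plain requests-based approach used in these scripts.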
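In Scrapy, much of that robustness is configured in the project's settings.py rather than written by hand. The fragment below is a sketch: the setting names are standard Scrapy settings, but the values are illustrative guesses, not tuned for Macy's.

```python
# settings.py (sketch): illustrative values, not tuned for any particular site.

# Retry failed requests (connection errors and the listed HTTP status codes).
RETRY_ENABLED = True
RETRY_TIMES = 3
RETRY_HTTP_CODES = [429, 500, 502, 503, 504]

# Let AutoThrottle adapt the delay to server latency.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 1.0
AUTOTHROTTLE_MAX_DELAY = 30.0

# Cap parallel requests per domain; with AutoThrottle enabled,
# DOWNLOAD_DELAY acts as the minimum delay between requests.
CONCURRENT_REQUESTS_PER_DOMAIN = 4
DOWNLOAD_DELAY = 0.5
```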

Repository files:

README.md
_00_record_of_agent_pool.py
_00_record_of_all_mysql.py
_00_record_of_small_functions.py
_00_record_of_xpath_and_re_dict.py
_01_request_of_prod_page.py
_02_crawler_of_rank2.py
_02_get_rank2_xpath.py
_02_main.py
_03_main.py
_03_mysql_of_rank3.py
_03_spider_of_rank3.py
_04_extractor_info_from_review.py
_04_extractor_main.py
_04_extractor_review_from_review.py
_04_main.py
_04_mysql_of_rank4.py
_04_spider_of_rank4_prod_info.py
_04_spider_of_rank4_prod_info_for_pic.py
_04_spider_of_rank4_prod_info_main.py
readme.zip

Repository: AtwoodZhang/Crawler_of_Product_Comment

Folders and files

Latest commit

History

Repository files navigation

Crawler_of_Product_Comment

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages