This is a web scraper built in Python that extracts reviews from the online shopping website https://www.amazon.in/.
The scraper uses a YAML file to create an extractor. The file defines almost all the attributes a user is likely to need, and it can be modified to add or drop data fields as required.
The output is saved in CSV format and contains the following 14 data fields:
Product Details:
- product_name
- product_variant
- product_image
- avg_reviews - average review rating for the product
Review Details:
- review_title
- star_rating - star rating given by the reviewer
- review_date
- review_content
- helpful score of the review
- variant of the product for which the review was given
- verified - whether the purchase is verified or not
- review_images - images posted by the reviewer
- author_profile - profile of the reviewer
- author_name - name of the reviewer
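One way to produce a CSV with these 14 columns is the standard library's `csv.DictWriter`. The sketch below assumes one row per review, with the product details repeated on every row. The column names `helpful_score` and `review_variant` are hypothetical, since the list above does not name those two fields.

```python
import csv
import io

# Column order follows the field list above. "helpful_score" and
# "review_variant" are hypothetical names chosen for this sketch.
FIELDNAMES = [
    "product_name", "product_variant", "product_image", "avg_reviews",
    "review_title", "star_rating", "review_date", "review_content",
    "helpful_score", "review_variant", "verified", "review_images",
    "author_profile", "author_name",
]


def write_reviews_csv(product, reviews, out):
    """Write a header plus one CSV row per review to the file-like object `out`."""
    # extrasaction="ignore" drops keys that are not in FIELDNAMES;
    # fields missing from a row are written as empty cells.
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES, extrasaction="ignore")
    writer.writeheader()
    for review in reviews:
        # Merge product-level and review-level fields into a single row.
        writer.writerow({**product, **review})


buffer = io.StringIO()
write_reviews_csv(
    {"product_name": "Example Phone", "avg_reviews": "4.2 out of 5"},
    [{"review_title": "Great battery", "star_rating": "5.0", "verified": True}],
    buffer,
)
print(buffer.getvalue())
```

When writing to a real file instead of a `StringIO` buffer, open it with `open(path, "w", newline="", encoding="utf-8")` so the `csv` module controls line endings and non-ASCII review text is preserved.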
Requirements:
- Python Requests, to send HTTP requests and download the HTML content of the pages (http://docs.python-requests.org/en/master/user/install/)
- LXML, for parsing the HTML tree structure using XPaths
- Python Dateutil, for parsing review dates (https://github.com/dateutil/dateutil/)
- Selectorlib, to extract data from the HTML using the selectors defined in the YAML file (https://pypi.org/project/selectorlib/)
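As an example of the date-parsing step, the sketch below uses `dateutil.parser.parse` to turn a review date line into a `date`. It assumes the raw text on Amazon.in looks like "Reviewed in India on 5 March 2021" and simply drops everything up to the last " on "; the exact wording on the live site may differ.

```python
from datetime import date

from dateutil import parser


def parse_review_date(raw: str) -> date:
    """Convert a review date line into a date object.

    Assumes text such as "Reviewed in India on 5 March 2021"; if there is
    no " on " in the string, the whole string is parsed as a date.
    """
    date_text = raw.rsplit(" on ", 1)[-1].strip()
    # dayfirst=True only matters for all-numeric dates such as "05/03/2021",
    # which Indian pages would usually write day first.
    return parser.parse(date_text, dayfirst=True).date()


print(parse_review_date("Reviewed in India on 5 March 2021"))
```

Storing the result as a `date` rather than the raw string makes the `review_date` column sortable and easy to filter by range.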