Web_Scraper

This is a web scraper built in Python that extracts product reviews from the online shopping website https://www.amazon.in/

The scraper uses a YAML file to define an extractor. The file already covers most of the attributes a user is likely to need, and it can be edited to add or drop data fields as required.

The output will be saved in CSV format. The generated output contains the following 14 data fields:

Product Details:

product_name
product_variant
product_image
avg_reviews for that product

Review Details:

review_title
star_rating given by the reviewer
review_date
review_content
helpful score of the review
variant of the product for which review is given
verified - whether the purchase is verified or not
review_images - images posted by the reviewer
author_profile - profile of the reviewer
author_name - name of the reviewer
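One possible way to lay these 14 fields out in a CSV file is to repeat the product details on every review row. The sketch below uses only the standard library. The column names `helpful` and `variant`, the nested `reviews` key, and the sample record are assumptions made for the example, not names taken from the project.

```python
import csv
import io

# Column names follow the field list above; "helpful" and "variant" are
# assumed names, since the list only describes those two fields.
PRODUCT_FIELDS = ["product_name", "product_variant", "product_image", "avg_reviews"]
REVIEW_FIELDS = [
    "review_title", "star_rating", "review_date", "review_content",
    "helpful", "variant", "verified", "review_images",
    "author_profile", "author_name",
]


def rows_from_extract(data):
    """Yield one flat row per review, repeating the product details."""
    product = {key: data.get(key) for key in PRODUCT_FIELDS}
    for review in data.get("reviews") or []:
        row = dict(product)
        row.update({key: review.get(key) for key in REVIEW_FIELDS})
        yield row


def write_csv(data, fileobj):
    """Write a header plus one line per review to an open text file."""
    writer = csv.DictWriter(fileobj, fieldnames=PRODUCT_FIELDS + REVIEW_FIELDS)
    writer.writeheader()
    writer.writerows(rows_from_extract(data))


# Made-up record in the assumed shape; missing fields become empty cells.
sample = {
    "product_name": "Example Phone",
    "avg_reviews": "4.2 out of 5",
    "reviews": [
        {"review_title": "Great value", "star_rating": "5.0", "author_name": "A. Buyer"},
        {"review_title": "Average", "star_rating": "3.0", "author_name": "B. Buyer"},
    ],
}

buffer = io.StringIO()
write_csv(sample, buffer)
print(buffer.getvalue())
```

To write to disk instead of an in-memory buffer, pass a file opened with `open("reviews.csv", "w", newline="", encoding="utf-8")`.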

Libraries used

Python Requests, to make requests and download the HTML content of the pages (http://docs.python-requests.org/en/master/user/install/)
LXML, for parsing the HTML tree structure using XPath expressions
Python Dateutil, for parsing review dates (https://github.com/dateutil/dateutil/)
Selectorlib, to extract data from the downloaded pages using the selectors defined in the YAML file (https://pypi.org/project/selectorlib/)
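To give a sense of the date-parsing step, here is a minimal sketch using Dateutil. It assumes the raw review date arrives as text such as "Reviewed in India on 12 March 2021"; that format and the helper name `parse_review_date` are assumptions for illustration and may not match what the scraper actually receives.

```python
from dateutil import parser as date_parser


def parse_review_date(raw):
    """Turn a raw review date string into a datetime.date.

    Assumes the date follows the last " on " in the string; if there is
    no such prefix, the whole string is parsed.
    """
    date_text = raw.split(" on ")[-1]
    # dayfirst=True matches the day-month-year order assumed here.
    return date_parser.parse(date_text, dayfirst=True).date()


print(parse_review_date("Reviewed in India on 12 March 2021"))
```

Normalising dates to a single format this way keeps the review_date column sortable in the CSV output.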
