Amazon Research

Repo for the Amazon Top Reviewer prediction project

Overview

This project combines web scraping, text mining, and predictive modeling to predict whether a reviewer will review a product ("Review", y/n) and/or the rating they will give (number of stars). The goal is to help sellers target the reviewers most likely to give their product a high rating in a review that other shoppers will also find helpful.

Data

The data must be scraped from Amazon.com's list of Top Reviewers: we first identify the top reviewers, then gather their reviews. For each review, we want the review text, rating, percentage and absolute number of helpful votes, product, product metadata, and any available user (reviewer) information.
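As a rough illustration of the fields listed above, one scraped review could be held in a record like the following. This is a minimal sketch in Python; the class name, field names, and sample values are assumptions made for illustration, not the project's actual schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Review:
    """One scraped review; field names are illustrative only."""
    reviewer_id: str
    product_id: str
    text: str
    rating: int                  # number of stars, 1 to 5
    helpful_votes: int           # absolute number of "helpful" votes
    total_votes: int             # all votes cast on the review
    product_metadata: dict = field(default_factory=dict)

    def helpful_percent(self) -> Optional[float]:
        """Percent of votes marked helpful, or None when nobody has voted."""
        if self.total_votes == 0:
            return None
        return 100.0 * self.helpful_votes / self.total_votes


# Made-up example values
review = Review("R1", "P1", "Works as described.", 5, 18, 20, {"category": "Books"})
print(review.helpful_percent())  # 90.0
```

Keeping both the absolute count and the total lets the percentage be recomputed later, and makes the "no votes yet" case explicit instead of dividing by zero.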

Method

As a first approximation, we'll apply the random forest (RF) algorithm, chosen for its out-of-the-box performance and relative ease of application. The modeling approach is expected to evolve as the work progresses. The vast majority of the programming will be done in R.
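To make the idea concrete, here is a toy random forest built from one-split trees (decision stumps) using only the Python standard library. It is a sketch of the technique, not the project's model: real work would more likely use a library such as R's randomForest package or scikit-learn's RandomForestClassifier. The two features and all data values are invented for illustration.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature):
    """Pick the threshold on one feature that misclassifies the fewest rows."""
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return feature, threshold, left_label, right_label


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train stumps on bootstrap samples, each using one randomly chosen feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]   # sample rows with replacement
        feature = rng.randrange(len(rows[0]))            # random feature for this stump
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature))
    return forest


def predict_proba(forest, row):
    """Fraction of stumps voting for class 1 (labels must be 0 or 1)."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return sum(votes) / len(votes)


# Toy, made-up data: [average past rating, helpful-vote fraction] per reviewer
rows = [[4.8, 0.90], [4.5, 0.80], [4.6, 0.85], [2.0, 0.30], [2.5, 0.20], [1.5, 0.40]]
labels = [1, 1, 1, 0, 0, 0]
forest = fit_forest(rows, labels)
print(predict_proba(forest, [4.7, 0.9]), predict_proba(forest, [1.0, 0.1]))
```

The two ingredients that make this a random forest are the bootstrap sample per tree and the random feature choice. A full implementation grows deeper trees and samples a feature subset at every split.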

Storage and Computing

The ideal solution would involve collecting all reviews from all Top Reviewers. If this is achieved, the data set will be very large relative to R's in-memory paradigm. Moreover, the nature of the data could be hard to fit into the fixed schema that a traditional RDBMS requires. As a result, MongoDB hosted on a cloud service such as AWS might be a solution.
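A small sketch of why a document store might suit this data: product metadata differs from one category to the next, so nested documents avoid forcing every review into the same columns. The documents below are invented examples with hypothetical field names, and the code uses only Python's json module rather than any MongoDB client.

```python
import json

# Two made-up reviews whose product metadata have different shapes
reviews = [
    {
        "reviewer_id": "R1",
        "rating": 5,
        "votes": {"helpful": 18, "total": 20},
        "product": {"id": "P1", "category": "Books", "metadata": {"pages": 320}},
    },
    {
        "reviewer_id": "R1",
        "rating": 3,
        "votes": {"helpful": 2, "total": 9},
        "product": {
            "id": "P2",
            "category": "Electronics",
            "metadata": {"battery_hours": 12, "colors": ["black", "red"]},
        },
    },
]

# JSON Lines: one document per line, so a large scrape can be read back
# one review at a time instead of loading everything into memory.
lines = [json.dumps(doc) for doc in reviews]
metadata_keys = [sorted(json.loads(line)["product"]["metadata"]) for line in lines]
print(metadata_keys)  # [['pages'], ['battery_hours', 'colors']]
```

In a relational design, each new metadata key would need a new column or a separate key-value table. As documents, the two reviews simply carry different nested fields.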

In addition, processing a data set of this size locally will strain computing resources. Thus, a service like AWS EC2 could provide efficiency gains.

Final Product

A score for each reviewer, for a particular product, representing the probability of a highly rated, helpful review.
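One way to turn this into something a model can learn is to label each past review as a "highly rated, helpful" hit or miss, then rank reviewers by the predicted probability. The sketch below assumes thresholds of at least 4 stars and at least 80% helpful votes; those cutoffs, the function names, and the sample scores are illustrative assumptions, not values from the project.

```python
def is_target(rating, helpful_votes, total_votes, min_rating=4, min_helpful=0.8):
    """True when a review is both highly rated and mostly voted helpful."""
    if total_votes == 0:
        return False  # no votes yet, so helpfulness is unknown
    return rating >= min_rating and helpful_votes / total_votes >= min_helpful


def rank_reviewers(scores):
    """Sort (reviewer_id, probability) pairs from most to least promising."""
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Made-up model outputs for one product
scores = [("R1", 0.35), ("R2", 0.82), ("R3", 0.60)]
ranked = rank_reviewers(scores)
print([reviewer for reviewer, _ in ranked])  # ['R2', 'R3', 'R1']
```

A seller would then approach reviewers from the top of the ranked list for that product.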

fdac15/AmazonProductReviews
